Namespace(batch_size=50, data_name='CR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 105
Done! Tokenizing Time=0.06s, #Sentences=3775
SentimentNet(
(embedding): Embedding(5343 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/62] avg loss 0.013445, throughput 0.471727K wps
[Epoch 0 Batch 60/62] avg loss 0.0131968, throughput 4.84626K wps
Begin Testing...
[Epoch 0] train avg loss 0.0135073, dev acc 0.6372, dev avg loss 0.655936, throughput 0.558127K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130377, throughput 4.91513K wps
[Epoch 1 Batch 60/62] avg loss 0.0130932, throughput 4.81334K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131987, dev acc 0.6372, dev avg loss 0.652218, throughput 4.87065K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0129942, throughput 4.95888K wps
[Epoch 2 Batch 60/62] avg loss 0.0129855, throughput 4.83108K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131511, dev acc 0.6372, dev avg loss 0.647381, throughput 4.90005K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0128415, throughput 4.91883K wps
[Epoch 3 Batch 60/62] avg loss 0.0128294, throughput 4.84425K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130041, dev acc 0.6372, dev avg loss 0.64255, throughput 4.88807K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0126786, throughput 4.96171K wps
[Epoch 4 Batch 60/62] avg loss 0.0127723, throughput 4.83611K wps
Begin Testing...
[Epoch 4] train avg loss 0.0128726, dev acc 0.6372, dev avg loss 0.638311, throughput 4.90359K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126234, throughput 4.92492K wps
[Epoch 5 Batch 60/62] avg loss 0.0124364, throughput 4.82995K wps
Begin Testing...
[Epoch 5] train avg loss 0.012706, dev acc 0.6372, dev avg loss 0.634129, throughput 4.87534K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0124799, throughput 4.79373K wps
[Epoch 6 Batch 60/62] avg loss 0.0123978, throughput 4.68229K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125991, dev acc 0.6372, dev avg loss 0.629978, throughput 4.74414K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0121903, throughput 4.37484K wps
[Epoch 7 Batch 60/62] avg loss 0.0123959, throughput 4.62811K wps
Begin Testing...
[Epoch 7] train avg loss 0.0124428, dev acc 0.6372, dev avg loss 0.624923, throughput 4.51607K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0122846, throughput 4.96264K wps
[Epoch 8 Batch 60/62] avg loss 0.0121282, throughput 4.79527K wps
Begin Testing...
[Epoch 8] train avg loss 0.0123774, dev acc 0.6372, dev avg loss 0.619865, throughput 4.88361K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0121771, throughput 4.95395K wps
[Epoch 9 Batch 60/62] avg loss 0.0119131, throughput 4.84895K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121916, dev acc 0.6372, dev avg loss 0.614844, throughput 4.90712K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0120003, throughput 4.93726K wps
[Epoch 10 Batch 60/62] avg loss 0.0120468, throughput 4.82593K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121914, dev acc 0.6490, dev avg loss 0.60958, throughput 4.88997K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0118796, throughput 4.94148K wps
[Epoch 11 Batch 60/62] avg loss 0.0117989, throughput 4.80922K wps
Begin Testing...
[Epoch 11] train avg loss 0.0119515, dev acc 0.6372, dev avg loss 0.604868, throughput 4.88105K wps
[Epoch 12 Batch 30/62] avg loss 0.0117181, throughput 4.9344K wps
[Epoch 12 Batch 60/62] avg loss 0.0116583, throughput 4.83048K wps
Begin Testing...
[Epoch 12] train avg loss 0.0118604, dev acc 0.6490, dev avg loss 0.598476, throughput 4.88592K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0115948, throughput 4.93865K wps
[Epoch 13 Batch 60/62] avg loss 0.0113715, throughput 4.84105K wps
Begin Testing...
[Epoch 13] train avg loss 0.0115705, dev acc 0.6490, dev avg loss 0.593178, throughput 4.89779K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0112728, throughput 4.95158K wps
[Epoch 14 Batch 60/62] avg loss 0.0111997, throughput 4.8281K wps
Begin Testing...
[Epoch 14] train avg loss 0.0113931, dev acc 0.6549, dev avg loss 0.585561, throughput 4.89568K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0113973, throughput 4.9172K wps
[Epoch 15 Batch 60/62] avg loss 0.0110716, throughput 4.85717K wps
Begin Testing...
[Epoch 15] train avg loss 0.0114255, dev acc 0.6962, dev avg loss 0.579586, throughput 4.89334K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0108623, throughput 4.96316K wps
[Epoch 16 Batch 60/62] avg loss 0.0113374, throughput 4.86042K wps
Begin Testing...
[Epoch 16] train avg loss 0.0112716, dev acc 0.7139, dev avg loss 0.573322, throughput 4.91661K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0108886, throughput 4.94509K wps
[Epoch 17 Batch 60/62] avg loss 0.0108646, throughput 4.83962K wps
Begin Testing...
[Epoch 17] train avg loss 0.0110222, dev acc 0.6991, dev avg loss 0.565283, throughput 4.89782K wps
[Epoch 18 Batch 30/62] avg loss 0.010731, throughput 4.96271K wps
[Epoch 18 Batch 60/62] avg loss 0.0106331, throughput 4.83561K wps
Begin Testing...
[Epoch 18] train avg loss 0.0108652, dev acc 0.7139, dev avg loss 0.561672, throughput 4.90353K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0105682, throughput 4.94056K wps
[Epoch 19 Batch 60/62] avg loss 0.0103771, throughput 4.82665K wps
Begin Testing...
[Epoch 19] train avg loss 0.0106028, dev acc 0.7168, dev avg loss 0.550879, throughput 4.88809K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0102601, throughput 4.91329K wps
[Epoch 20 Batch 60/62] avg loss 0.0103047, throughput 4.82117K wps
Begin Testing...
[Epoch 20] train avg loss 0.0104176, dev acc 0.7227, dev avg loss 0.543837, throughput 4.87274K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0100937, throughput 4.93931K wps
[Epoch 21 Batch 60/62] avg loss 0.0100922, throughput 4.84118K wps
Begin Testing...
[Epoch 21] train avg loss 0.0102422, dev acc 0.7345, dev avg loss 0.537139, throughput 4.8972K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00992095, throughput 4.94586K wps
[Epoch 22 Batch 60/62] avg loss 0.00988475, throughput 4.82724K wps
Begin Testing...
[Epoch 22] train avg loss 0.0100151, dev acc 0.7198, dev avg loss 0.530422, throughput 4.89234K wps
[Epoch 23 Batch 30/62] avg loss 0.00997493, throughput 4.95337K wps
[Epoch 23 Batch 60/62] avg loss 0.00965959, throughput 4.83017K wps
Begin Testing...
[Epoch 23] train avg loss 0.00998559, dev acc 0.7227, dev avg loss 0.524708, throughput 4.89973K wps
[Epoch 24 Batch 30/62] avg loss 0.00972962, throughput 4.94553K wps
[Epoch 24 Batch 60/62] avg loss 0.00943127, throughput 4.83315K wps
Begin Testing...
[Epoch 24] train avg loss 0.00968126, dev acc 0.7434, dev avg loss 0.517007, throughput 4.89504K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00953306, throughput 4.93025K wps
[Epoch 25 Batch 60/62] avg loss 0.00934566, throughput 4.80385K wps
Begin Testing...
[Epoch 25] train avg loss 0.00956199, dev acc 0.7286, dev avg loss 0.510836, throughput 4.87335K wps
[Epoch 26 Batch 30/62] avg loss 0.00907633, throughput 4.9455K wps
[Epoch 26 Batch 60/62] avg loss 0.00952389, throughput 4.84178K wps
Begin Testing...
[Epoch 26] train avg loss 0.00938982, dev acc 0.7345, dev avg loss 0.505575, throughput 4.89909K wps
[Epoch 27 Batch 30/62] avg loss 0.00928468, throughput 4.94761K wps
[Epoch 27 Batch 60/62] avg loss 0.00899859, throughput 4.8399K wps
Begin Testing...
[Epoch 27] train avg loss 0.00923328, dev acc 0.7434, dev avg loss 0.499525, throughput 4.89968K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00892942, throughput 4.94443K wps
[Epoch 28 Batch 60/62] avg loss 0.00905691, throughput 4.84321K wps
Begin Testing...
[Epoch 28] train avg loss 0.00906026, dev acc 0.7404, dev avg loss 0.494729, throughput 4.90138K wps
[Epoch 29 Batch 30/62] avg loss 0.00882908, throughput 4.9525K wps
[Epoch 29 Batch 60/62] avg loss 0.0088607, throughput 4.85042K wps
Begin Testing...
[Epoch 29] train avg loss 0.00898893, dev acc 0.7434, dev avg loss 0.490111, throughput 4.90763K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00874423, throughput 4.98039K wps
[Epoch 30 Batch 60/62] avg loss 0.00839061, throughput 4.86983K wps
Begin Testing...
[Epoch 30] train avg loss 0.00865087, dev acc 0.7434, dev avg loss 0.485349, throughput 4.93074K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00830643, throughput 4.97675K wps
[Epoch 31 Batch 60/62] avg loss 0.00873809, throughput 4.87517K wps
Begin Testing...
[Epoch 31] train avg loss 0.0086694, dev acc 0.7375, dev avg loss 0.482564, throughput 4.93307K wps
[Epoch 32 Batch 30/62] avg loss 0.00835041, throughput 4.96316K wps
[Epoch 32 Batch 60/62] avg loss 0.0083072, throughput 4.82533K wps
Begin Testing...
[Epoch 32] train avg loss 0.00840796, dev acc 0.7404, dev avg loss 0.477304, throughput 4.89862K wps
[Epoch 33 Batch 30/62] avg loss 0.0081551, throughput 4.92545K wps
[Epoch 33 Batch 60/62] avg loss 0.00829042, throughput 4.83263K wps
Begin Testing...
[Epoch 33] train avg loss 0.00833375, dev acc 0.7670, dev avg loss 0.472417, throughput 4.88726K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00809826, throughput 4.92582K wps
[Epoch 34 Batch 60/62] avg loss 0.00823976, throughput 4.82162K wps
Begin Testing...
[Epoch 34] train avg loss 0.00821343, dev acc 0.7375, dev avg loss 0.471272, throughput 4.88108K wps
[Epoch 35 Batch 30/62] avg loss 0.00789119, throughput 4.93705K wps
[Epoch 35 Batch 60/62] avg loss 0.00797774, throughput 4.82862K wps
Begin Testing...
[Epoch 35] train avg loss 0.00812172, dev acc 0.7581, dev avg loss 0.465611, throughput 4.88903K wps
[Epoch 36 Batch 30/62] avg loss 0.00769126, throughput 4.93575K wps
[Epoch 36 Batch 60/62] avg loss 0.00796227, throughput 4.82045K wps
Begin Testing...
[Epoch 36] train avg loss 0.00790419, dev acc 0.7434, dev avg loss 0.465442, throughput 4.88425K wps
[Epoch 37 Batch 30/62] avg loss 0.00782121, throughput 4.93549K wps
[Epoch 37 Batch 60/62] avg loss 0.00761437, throughput 4.81188K wps
Begin Testing...
[Epoch 37] train avg loss 0.00779516, dev acc 0.7434, dev avg loss 0.46251, throughput 4.87863K wps
[Epoch 38 Batch 30/62] avg loss 0.00744394, throughput 4.92002K wps
[Epoch 38 Batch 60/62] avg loss 0.00772991, throughput 4.80913K wps
Begin Testing...
[Epoch 38] train avg loss 0.00768352, dev acc 0.7788, dev avg loss 0.459294, throughput 4.87127K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00737345, throughput 4.97096K wps
[Epoch 39 Batch 60/62] avg loss 0.007457, throughput 4.86727K wps
Begin Testing...
[Epoch 39] train avg loss 0.00755137, dev acc 0.7758, dev avg loss 0.453665, throughput 4.92608K wps
[Epoch 40 Batch 30/62] avg loss 0.00737379, throughput 4.9837K wps
[Epoch 40 Batch 60/62] avg loss 0.00726226, throughput 4.87156K wps
Begin Testing...
[Epoch 40] train avg loss 0.00738417, dev acc 0.7906, dev avg loss 0.451054, throughput 4.93344K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00715502, throughput 4.98009K wps
[Epoch 41 Batch 60/62] avg loss 0.00745742, throughput 4.861K wps
Begin Testing...
[Epoch 41] train avg loss 0.00738155, dev acc 0.7611, dev avg loss 0.452702, throughput 4.92783K wps
[Epoch 42 Batch 30/62] avg loss 0.00702534, throughput 4.98742K wps
[Epoch 42 Batch 60/62] avg loss 0.00747432, throughput 4.8624K wps
Begin Testing...
[Epoch 42] train avg loss 0.0073331, dev acc 0.7847, dev avg loss 0.446736, throughput 4.92998K wps
[Epoch 43 Batch 30/62] avg loss 0.00692755, throughput 4.98862K wps
[Epoch 43 Batch 60/62] avg loss 0.00695682, throughput 4.88248K wps
Begin Testing...
[Epoch 43] train avg loss 0.00700193, dev acc 0.7699, dev avg loss 0.445013, throughput 4.94176K wps
[Epoch 44 Batch 30/62] avg loss 0.00670062, throughput 4.9822K wps
[Epoch 44 Batch 60/62] avg loss 0.00712595, throughput 4.88549K wps
Begin Testing...
[Epoch 44] train avg loss 0.00699366, dev acc 0.7817, dev avg loss 0.44334, throughput 4.94104K wps
[Epoch 45 Batch 30/62] avg loss 0.00686528, throughput 4.97675K wps
[Epoch 45 Batch 60/62] avg loss 0.00677299, throughput 4.83285K wps
Begin Testing...
[Epoch 45] train avg loss 0.0068584, dev acc 0.7729, dev avg loss 0.440649, throughput 4.90872K wps
[Epoch 46 Batch 30/62] avg loss 0.00640212, throughput 4.95008K wps
[Epoch 46 Batch 60/62] avg loss 0.00668036, throughput 4.81475K wps
Begin Testing...
[Epoch 46] train avg loss 0.0066247, dev acc 0.7876, dev avg loss 0.438343, throughput 4.88771K wps
[Epoch 47 Batch 30/62] avg loss 0.00660311, throughput 4.93405K wps
[Epoch 47 Batch 60/62] avg loss 0.00646501, throughput 4.81428K wps
Begin Testing...
[Epoch 47] train avg loss 0.00659378, dev acc 0.7788, dev avg loss 0.436129, throughput 4.8808K wps
[Epoch 48 Batch 30/62] avg loss 0.00639406, throughput 4.91493K wps
[Epoch 48 Batch 60/62] avg loss 0.00637038, throughput 4.80639K wps
Begin Testing...
[Epoch 48] train avg loss 0.00645187, dev acc 0.7758, dev avg loss 0.436345, throughput 4.86633K wps
[Epoch 49 Batch 30/62] avg loss 0.0065051, throughput 4.8828K wps
[Epoch 49 Batch 60/62] avg loss 0.00611506, throughput 4.8076K wps
Begin Testing...
[Epoch 49] train avg loss 0.0064, dev acc 0.7906, dev avg loss 0.431317, throughput 4.85318K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00648552, throughput 4.92371K wps
[Epoch 50 Batch 60/62] avg loss 0.00620265, throughput 4.8164K wps
Begin Testing...
[Epoch 50] train avg loss 0.00638058, dev acc 0.7788, dev avg loss 0.435056, throughput 4.87537K wps
[Epoch 51 Batch 30/62] avg loss 0.00615561, throughput 4.9197K wps
[Epoch 51 Batch 60/62] avg loss 0.00605422, throughput 4.81603K wps
Begin Testing...
[Epoch 51] train avg loss 0.00617264, dev acc 0.7906, dev avg loss 0.42798, throughput 4.87347K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00608989, throughput 4.92474K wps
[Epoch 52 Batch 60/62] avg loss 0.00612328, throughput 4.81134K wps
Begin Testing...
[Epoch 52] train avg loss 0.0061573, dev acc 0.7906, dev avg loss 0.426614, throughput 4.87399K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00613385, throughput 4.91569K wps
[Epoch 53 Batch 60/62] avg loss 0.00584593, throughput 4.82171K wps
Begin Testing...
[Epoch 53] train avg loss 0.00609652, dev acc 0.7817, dev avg loss 0.426982, throughput 4.87503K wps
[Epoch 54 Batch 30/62] avg loss 0.00585307, throughput 4.935K wps
[Epoch 54 Batch 60/62] avg loss 0.00584116, throughput 4.82292K wps
Begin Testing...
[Epoch 54] train avg loss 0.00590052, dev acc 0.7847, dev avg loss 0.426442, throughput 4.88429K wps
[Epoch 55 Batch 30/62] avg loss 0.00565125, throughput 4.93897K wps
[Epoch 55 Batch 60/62] avg loss 0.00573727, throughput 4.8622K wps
Begin Testing...
[Epoch 55] train avg loss 0.00583393, dev acc 0.7935, dev avg loss 0.424057, throughput 4.90746K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00569146, throughput 4.93934K wps
[Epoch 56 Batch 60/62] avg loss 0.00553693, throughput 4.84011K wps
Begin Testing...
[Epoch 56] train avg loss 0.00570647, dev acc 0.7876, dev avg loss 0.423241, throughput 4.89687K wps
[Epoch 57 Batch 30/62] avg loss 0.00545584, throughput 4.97103K wps
[Epoch 57 Batch 60/62] avg loss 0.00585332, throughput 4.88425K wps
Begin Testing...
[Epoch 57] train avg loss 0.00571544, dev acc 0.7847, dev avg loss 0.420496, throughput 4.93435K wps
[Epoch 58 Batch 30/62] avg loss 0.00543689, throughput 4.98044K wps
[Epoch 58 Batch 60/62] avg loss 0.0054878, throughput 4.88719K wps
Begin Testing...
[Epoch 58] train avg loss 0.00549895, dev acc 0.7935, dev avg loss 0.417901, throughput 4.93885K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00531911, throughput 5.0013K wps
[Epoch 59 Batch 60/62] avg loss 0.00544832, throughput 4.8692K wps
Begin Testing...
[Epoch 59] train avg loss 0.00544736, dev acc 0.7935, dev avg loss 0.416824, throughput 4.94126K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00504661, throughput 4.98266K wps
[Epoch 60 Batch 60/62] avg loss 0.00539697, throughput 4.89258K wps
Begin Testing...
[Epoch 60] train avg loss 0.00529213, dev acc 0.7906, dev avg loss 0.416071, throughput 4.94371K wps
[Epoch 61 Batch 30/62] avg loss 0.00530804, throughput 4.98295K wps
[Epoch 61 Batch 60/62] avg loss 0.00508263, throughput 4.83853K wps
Begin Testing...
[Epoch 61] train avg loss 0.00522678, dev acc 0.7994, dev avg loss 0.414187, throughput 4.91503K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00504957, throughput 4.93522K wps
[Epoch 62 Batch 60/62] avg loss 0.00510101, throughput 4.80079K wps
Begin Testing...
[Epoch 62] train avg loss 0.00520337, dev acc 0.7935, dev avg loss 0.414048, throughput 4.87308K wps
[Epoch 63 Batch 30/62] avg loss 0.00488011, throughput 4.93389K wps
[Epoch 63 Batch 60/62] avg loss 0.0052202, throughput 4.80206K wps
Begin Testing...
[Epoch 63] train avg loss 0.00515046, dev acc 0.7906, dev avg loss 0.412618, throughput 4.87438K wps
[Epoch 64 Batch 30/62] avg loss 0.00483845, throughput 4.92132K wps
[Epoch 64 Batch 60/62] avg loss 0.00502515, throughput 4.81619K wps
Begin Testing...
[Epoch 64] train avg loss 0.00495068, dev acc 0.7935, dev avg loss 0.412102, throughput 4.87568K wps
[Epoch 65 Batch 30/62] avg loss 0.00481287, throughput 4.9322K wps
[Epoch 65 Batch 60/62] avg loss 0.00490264, throughput 4.82885K wps
Begin Testing...
[Epoch 65] train avg loss 0.00496641, dev acc 0.8053, dev avg loss 0.418704, throughput 4.88616K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00467902, throughput 4.9391K wps
[Epoch 66 Batch 60/62] avg loss 0.00470211, throughput 4.82495K wps
Begin Testing...
[Epoch 66] train avg loss 0.00474762, dev acc 0.7965, dev avg loss 0.411497, throughput 4.88862K wps
[Epoch 67 Batch 30/62] avg loss 0.0047562, throughput 4.92613K wps
[Epoch 67 Batch 60/62] avg loss 0.00462049, throughput 4.8139K wps
Begin Testing...
[Epoch 67] train avg loss 0.00471963, dev acc 0.7994, dev avg loss 0.410209, throughput 4.87687K wps
[Epoch 68 Batch 30/62] avg loss 0.00445426, throughput 4.93308K wps
[Epoch 68 Batch 60/62] avg loss 0.0046298, throughput 4.84473K wps
Begin Testing...
[Epoch 68] train avg loss 0.00459832, dev acc 0.7935, dev avg loss 0.408507, throughput 4.89701K wps
[Epoch 69 Batch 30/62] avg loss 0.00464487, throughput 4.99189K wps
[Epoch 69 Batch 60/62] avg loss 0.00429537, throughput 4.86371K wps
Begin Testing...
[Epoch 69] train avg loss 0.00452614, dev acc 0.7994, dev avg loss 0.407721, throughput 4.93426K wps
[Epoch 70 Batch 30/62] avg loss 0.00426489, throughput 4.97545K wps
[Epoch 70 Batch 60/62] avg loss 0.00444868, throughput 4.82434K wps
Begin Testing...
[Epoch 70] train avg loss 0.00437552, dev acc 0.8024, dev avg loss 0.408463, throughput 4.90606K wps
[Epoch 71 Batch 30/62] avg loss 0.00425345, throughput 4.95247K wps
[Epoch 71 Batch 60/62] avg loss 0.00432024, throughput 4.8293K wps
Begin Testing...
[Epoch 71] train avg loss 0.00436143, dev acc 0.7965, dev avg loss 0.406426, throughput 4.89527K wps
[Epoch 72 Batch 30/62] avg loss 0.00420103, throughput 4.94874K wps
[Epoch 72 Batch 60/62] avg loss 0.00427249, throughput 4.87087K wps
Begin Testing...
[Epoch 72] train avg loss 0.00431095, dev acc 0.7994, dev avg loss 0.41164, throughput 4.91802K wps
[Epoch 73 Batch 30/62] avg loss 0.00420223, throughput 4.97549K wps
[Epoch 73 Batch 60/62] avg loss 0.00418222, throughput 4.86391K wps
Begin Testing...
[Epoch 73] train avg loss 0.00425525, dev acc 0.7965, dev avg loss 0.405059, throughput 4.926K wps
[Epoch 74 Batch 30/62] avg loss 0.00413097, throughput 4.96214K wps
[Epoch 74 Batch 60/62] avg loss 0.00404736, throughput 4.8315K wps
Begin Testing...
[Epoch 74] train avg loss 0.0041412, dev acc 0.7965, dev avg loss 0.404888, throughput 4.90268K wps
[Epoch 75 Batch 30/62] avg loss 0.00389628, throughput 4.9392K wps
[Epoch 75 Batch 60/62] avg loss 0.0043514, throughput 4.84072K wps
Begin Testing...
[Epoch 75] train avg loss 0.00413485, dev acc 0.7994, dev avg loss 0.404551, throughput 4.89702K wps
[Epoch 76 Batch 30/62] avg loss 0.0039169, throughput 4.9499K wps
[Epoch 76 Batch 60/62] avg loss 0.00400583, throughput 4.80024K wps
Begin Testing...
[Epoch 76] train avg loss 0.00399436, dev acc 0.7994, dev avg loss 0.403852, throughput 4.881K wps
[Epoch 77 Batch 30/62] avg loss 0.00402453, throughput 4.9366K wps
[Epoch 77 Batch 60/62] avg loss 0.00374487, throughput 4.82652K wps
Begin Testing...
[Epoch 77] train avg loss 0.00396336, dev acc 0.8053, dev avg loss 0.402713, throughput 4.88698K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/62] avg loss 0.00372348, throughput 4.92524K wps
[Epoch 78 Batch 60/62] avg loss 0.0039152, throughput 4.81877K wps
Begin Testing...
[Epoch 78] train avg loss 0.00383342, dev acc 0.8024, dev avg loss 0.403372, throughput 4.87807K wps
[Epoch 79 Batch 30/62] avg loss 0.00360691, throughput 4.9172K wps
[Epoch 79 Batch 60/62] avg loss 0.00380993, throughput 4.81862K wps
Begin Testing...
[Epoch 79] train avg loss 0.00377186, dev acc 0.8024, dev avg loss 0.401945, throughput 4.87375K wps
[Epoch 80 Batch 30/62] avg loss 0.00363161, throughput 4.91291K wps
[Epoch 80 Batch 60/62] avg loss 0.00372225, throughput 4.83944K wps
Begin Testing...
[Epoch 80] train avg loss 0.00373056, dev acc 0.7965, dev avg loss 0.402562, throughput 4.88255K wps
[Epoch 81 Batch 30/62] avg loss 0.00358091, throughput 4.93753K wps
[Epoch 81 Batch 60/62] avg loss 0.00379092, throughput 4.85232K wps
Begin Testing...
[Epoch 81] train avg loss 0.0037086, dev acc 0.8112, dev avg loss 0.407002, throughput 4.90013K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00341959, throughput 4.93654K wps
[Epoch 82 Batch 60/62] avg loss 0.00362189, throughput 4.86557K wps
Begin Testing...
[Epoch 82] train avg loss 0.00354048, dev acc 0.8112, dev avg loss 0.402683, throughput 4.90737K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.00362322, throughput 4.93393K wps
[Epoch 83 Batch 60/62] avg loss 0.00351045, throughput 4.83923K wps
Begin Testing...
[Epoch 83] train avg loss 0.00359019, dev acc 0.8053, dev avg loss 0.400379, throughput 4.89349K wps
[Epoch 84 Batch 30/62] avg loss 0.00348498, throughput 4.94791K wps
[Epoch 84 Batch 60/62] avg loss 0.00347922, throughput 4.82548K wps
Begin Testing...
[Epoch 84] train avg loss 0.00347214, dev acc 0.8083, dev avg loss 0.403754, throughput 4.89166K wps
[Epoch 85 Batch 30/62] avg loss 0.00357113, throughput 4.93768K wps
[Epoch 85 Batch 60/62] avg loss 0.00326591, throughput 4.82296K wps
Begin Testing...
[Epoch 85] train avg loss 0.00349396, dev acc 0.8171, dev avg loss 0.40066, throughput 4.88639K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/62] avg loss 0.0032253, throughput 4.95436K wps
[Epoch 86 Batch 60/62] avg loss 0.00351155, throughput 4.88762K wps
Begin Testing...
[Epoch 86] train avg loss 0.00343281, dev acc 0.8083, dev avg loss 0.399794, throughput 4.92818K wps
[Epoch 87 Batch 30/62] avg loss 0.00320541, throughput 4.97434K wps
[Epoch 87 Batch 60/62] avg loss 0.00344333, throughput 4.8698K wps
Begin Testing...
[Epoch 87] train avg loss 0.00341011, dev acc 0.7994, dev avg loss 0.399896, throughput 4.9276K wps
[Epoch 88 Batch 30/62] avg loss 0.0033472, throughput 5.00638K wps
[Epoch 88 Batch 60/62] avg loss 0.00313657, throughput 4.87398K wps
Begin Testing...
[Epoch 88] train avg loss 0.0032611, dev acc 0.8024, dev avg loss 0.401921, throughput 4.94487K wps
[Epoch 89 Batch 30/62] avg loss 0.00304958, throughput 4.95111K wps
[Epoch 89 Batch 60/62] avg loss 0.00313891, throughput 4.83242K wps
Begin Testing...
[Epoch 89] train avg loss 0.003154, dev acc 0.7994, dev avg loss 0.39916, throughput 4.89843K wps
[Epoch 90 Batch 30/62] avg loss 0.00319146, throughput 4.93611K wps
[Epoch 90 Batch 60/62] avg loss 0.0030831, throughput 4.82163K wps
Begin Testing...
[Epoch 90] train avg loss 0.00316246, dev acc 0.8053, dev avg loss 0.398907, throughput 4.88463K wps
[Epoch 91 Batch 30/62] avg loss 0.00284124, throughput 4.92377K wps
[Epoch 91 Batch 60/62] avg loss 0.00317324, throughput 4.80262K wps
Begin Testing...
[Epoch 91] train avg loss 0.00308219, dev acc 0.8171, dev avg loss 0.403204, throughput 4.86933K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/62] avg loss 0.00288564, throughput 4.87798K wps
[Epoch 92 Batch 60/62] avg loss 0.00306204, throughput 4.81344K wps
Begin Testing...
[Epoch 92] train avg loss 0.00300929, dev acc 0.8024, dev avg loss 0.39859, throughput 4.85389K wps
[Epoch 93 Batch 30/62] avg loss 0.00304639, throughput 4.89393K wps
[Epoch 93 Batch 60/62] avg loss 0.00284405, throughput 4.79452K wps
Begin Testing...
[Epoch 93] train avg loss 0.00296548, dev acc 0.8053, dev avg loss 0.400506, throughput 4.85095K wps
[Epoch 94 Batch 30/62] avg loss 0.00284129, throughput 4.90905K wps
[Epoch 94 Batch 60/62] avg loss 0.00303574, throughput 4.80948K wps
Begin Testing...
[Epoch 94] train avg loss 0.00297128, dev acc 0.8083, dev avg loss 0.40273, throughput 4.86478K wps
[Epoch 95 Batch 30/62] avg loss 0.00286424, throughput 4.9441K wps
[Epoch 95 Batch 60/62] avg loss 0.00281804, throughput 4.8151K wps
Begin Testing...
[Epoch 95] train avg loss 0.00286033, dev acc 0.8053, dev avg loss 0.399327, throughput 4.88453K wps
[Epoch 96 Batch 30/62] avg loss 0.002682, throughput 4.93968K wps
[Epoch 96 Batch 60/62] avg loss 0.00280621, throughput 4.82469K wps
Begin Testing...
[Epoch 96] train avg loss 0.00277098, dev acc 0.8053, dev avg loss 0.399787, throughput 4.88719K wps
[Epoch 97 Batch 30/62] avg loss 0.00276861, throughput 4.93674K wps
[Epoch 97 Batch 60/62] avg loss 0.00273016, throughput 4.83938K wps
Begin Testing...
[Epoch 97] train avg loss 0.00276745, dev acc 0.8053, dev avg loss 0.399933, throughput 4.89485K wps
[Epoch 98 Batch 30/62] avg loss 0.00266061, throughput 4.94148K wps
[Epoch 98 Batch 60/62] avg loss 0.00264135, throughput 4.85929K wps
Begin Testing...
[Epoch 98] train avg loss 0.00269753, dev acc 0.8112, dev avg loss 0.398668, throughput 4.90815K wps
[Epoch 99 Batch 30/62] avg loss 0.00261924, throughput 4.9367K wps
[Epoch 99 Batch 60/62] avg loss 0.00257474, throughput 4.82314K wps
Begin Testing...
[Epoch 99] train avg loss 0.00265475, dev acc 0.8112, dev avg loss 0.400058, throughput 4.88547K wps
[Epoch 100 Batch 30/62] avg loss 0.00260291, throughput 4.91384K wps
[Epoch 100 Batch 60/62] avg loss 0.00257685, throughput 4.85946K wps
Begin Testing...
[Epoch 100] train avg loss 0.00264481, dev acc 0.8230, dev avg loss 0.401613, throughput 4.89451K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00273689, throughput 4.99194K wps
[Epoch 101 Batch 60/62] avg loss 0.00247731, throughput 4.86389K wps
Begin Testing...
[Epoch 101] train avg loss 0.00264166, dev acc 0.8142, dev avg loss 0.399851, throughput 4.93307K wps
[Epoch 102 Batch 30/62] avg loss 0.00246808, throughput 4.94302K wps
[Epoch 102 Batch 60/62] avg loss 0.00255604, throughput 4.83942K wps
Begin Testing...
[Epoch 102] train avg loss 0.00254966, dev acc 0.8112, dev avg loss 0.402226, throughput 4.89653K wps
[Epoch 103 Batch 30/62] avg loss 0.00244403, throughput 4.94818K wps
[Epoch 103 Batch 60/62] avg loss 0.002531, throughput 4.81661K wps
Begin Testing...
[Epoch 103] train avg loss 0.00248259, dev acc 0.8171, dev avg loss 0.401456, throughput 4.88857K wps
[Epoch 104 Batch 30/62] avg loss 0.00245059, throughput 4.91035K wps
[Epoch 104 Batch 60/62] avg loss 0.00245823, throughput 4.80514K wps
Begin Testing...
[Epoch 104] train avg loss 0.00250326, dev acc 0.8083, dev avg loss 0.400125, throughput 4.86438K wps
[Epoch 105 Batch 30/62] avg loss 0.00245116, throughput 4.91432K wps
[Epoch 105 Batch 60/62] avg loss 0.00236679, throughput 4.797K wps
Begin Testing...
[Epoch 105] train avg loss 0.00245955, dev acc 0.8083, dev avg loss 0.401209, throughput 4.86226K wps
[Epoch 106 Batch 30/62] avg loss 0.00245967, throughput 4.92792K wps
[Epoch 106 Batch 60/62] avg loss 0.00241958, throughput 4.83744K wps
Begin Testing...
[Epoch 106] train avg loss 0.00245092, dev acc 0.8053, dev avg loss 0.400825, throughput 4.88899K wps
[Epoch 107 Batch 30/62] avg loss 0.00232097, throughput 4.96835K wps
[Epoch 107 Batch 60/62] avg loss 0.00231263, throughput 4.86626K wps
Begin Testing...
[Epoch 107] train avg loss 0.00237091, dev acc 0.8142, dev avg loss 0.401292, throughput 4.92277K wps
[Epoch 108 Batch 30/62] avg loss 0.00218261, throughput 4.98262K wps
[Epoch 108 Batch 60/62] avg loss 0.00245994, throughput 4.86717K wps
Begin Testing...
[Epoch 108] train avg loss 0.00233806, dev acc 0.8112, dev avg loss 0.40157, throughput 4.93093K wps
[Epoch 109 Batch 30/62] avg loss 0.00214847, throughput 4.98316K wps
[Epoch 109 Batch 60/62] avg loss 0.00232461, throughput 4.85921K wps
Begin Testing...
[Epoch 109] train avg loss 0.00225533, dev acc 0.8171, dev avg loss 0.403396, throughput 4.92652K wps
[Epoch 110 Batch 30/62] avg loss 0.00214925, throughput 4.94703K wps
[Epoch 110 Batch 60/62] avg loss 0.00225157, throughput 4.83554K wps
Begin Testing...
[Epoch 110] train avg loss 0.00225002, dev acc 0.8201, dev avg loss 0.405584, throughput 4.89719K wps
[Epoch 111 Batch 30/62] avg loss 0.00210567, throughput 4.95789K wps
[Epoch 111 Batch 60/62] avg loss 0.00213322, throughput 4.84157K wps
Begin Testing...
[Epoch 111] train avg loss 0.00217526, dev acc 0.8142, dev avg loss 0.402819, throughput 4.90561K wps
[Epoch 112 Batch 30/62] avg loss 0.00215706, throughput 4.95956K wps
[Epoch 112 Batch 60/62] avg loss 0.0022813, throughput 4.85516K wps
Begin Testing...
[Epoch 112] train avg loss 0.00223246, dev acc 0.8112, dev avg loss 0.403293, throughput 4.91278K wps
[Epoch 113 Batch 30/62] avg loss 0.00204791, throughput 4.9515K wps
[Epoch 113 Batch 60/62] avg loss 0.00214596, throughput 4.81977K wps
Begin Testing...
[Epoch 113] train avg loss 0.00213818, dev acc 0.8083, dev avg loss 0.404342, throughput 4.89177K wps
[Epoch 114 Batch 30/62] avg loss 0.00203295, throughput 4.93008K wps
[Epoch 114 Batch 60/62] avg loss 0.0019987, throughput 4.8103K wps
Begin Testing...
[Epoch 114] train avg loss 0.00205839, dev acc 0.8171, dev avg loss 0.40833, throughput 4.87685K wps
[Epoch 115 Batch 30/62] avg loss 0.0021643, throughput 4.89277K wps
[Epoch 115 Batch 60/62] avg loss 0.00200021, throughput 4.84323K wps
Begin Testing...
[Epoch 115] train avg loss 0.00210126, dev acc 0.8142, dev avg loss 0.403812, throughput 4.87394K wps
[Epoch 116 Batch 30/62] avg loss 0.00201431, throughput 4.93547K wps
[Epoch 116 Batch 60/62] avg loss 0.00202221, throughput 4.80507K wps
Begin Testing...
[Epoch 116] train avg loss 0.0020591, dev acc 0.8142, dev avg loss 0.406041, throughput 4.87748K wps
[Epoch 117 Batch 30/62] avg loss 0.00196542, throughput 4.94477K wps
[Epoch 117 Batch 60/62] avg loss 0.00199098, throughput 4.82016K wps
Begin Testing...
[Epoch 117] train avg loss 0.00200254, dev acc 0.8171, dev avg loss 0.40592, throughput 4.88772K wps
[Epoch 118 Batch 30/62] avg loss 0.00199991, throughput 4.92471K wps
[Epoch 118 Batch 60/62] avg loss 0.0019675, throughput 4.81119K wps
Begin Testing...
[Epoch 118] train avg loss 0.00199394, dev acc 0.8171, dev avg loss 0.408684, throughput 4.87451K wps
[Epoch 119 Batch 30/62] avg loss 0.00203604, throughput 4.92393K wps
[Epoch 119 Batch 60/62] avg loss 0.00185689, throughput 4.81257K wps
Begin Testing...
[Epoch 119] train avg loss 0.00199259, dev acc 0.8201, dev avg loss 0.40845, throughput 4.87452K wps
[Epoch 120 Batch 30/62] avg loss 0.00186694, throughput 4.90486K wps
[Epoch 120 Batch 60/62] avg loss 0.00189741, throughput 4.79164K wps
Begin Testing...
[Epoch 120] train avg loss 0.00191043, dev acc 0.8201, dev avg loss 0.408325, throughput 4.85079K wps
[Epoch 121 Batch 30/62] avg loss 0.00177509, throughput 4.91359K wps
[Epoch 121 Batch 60/62] avg loss 0.00187273, throughput 4.81126K wps
Begin Testing...
[Epoch 121] train avg loss 0.00185189, dev acc 0.8230, dev avg loss 0.409625, throughput 4.86995K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/62] avg loss 0.00183536, throughput 4.92576K wps
[Epoch 122 Batch 60/62] avg loss 0.00186271, throughput 4.80947K wps
Begin Testing...
[Epoch 122] train avg loss 0.00185319, dev acc 0.8112, dev avg loss 0.407253, throughput 4.87392K wps
[Epoch 123 Batch 30/62] avg loss 0.0018143, throughput 4.93396K wps
[Epoch 123 Batch 60/62] avg loss 0.00171863, throughput 4.83571K wps
Begin Testing...
[Epoch 123] train avg loss 0.00176556, dev acc 0.8142, dev avg loss 0.408033, throughput 4.89179K wps
[Epoch 124 Batch 30/62] avg loss 0.00176602, throughput 4.9401K wps
[Epoch 124 Batch 60/62] avg loss 0.0018552, throughput 4.85136K wps
Begin Testing...
[Epoch 124] train avg loss 0.00186109, dev acc 0.8201, dev avg loss 0.407848, throughput 4.90262K wps
[Epoch 125 Batch 30/62] avg loss 0.00177371, throughput 4.95628K wps
[Epoch 125 Batch 60/62] avg loss 0.00170149, throughput 4.83568K wps
Begin Testing...
[Epoch 125] train avg loss 0.00175845, dev acc 0.8142, dev avg loss 0.406372, throughput 4.90279K wps
[Epoch 126 Batch 30/62] avg loss 0.00182704, throughput 4.93073K wps
[Epoch 126 Batch 60/62] avg loss 0.00165369, throughput 4.82077K wps
Begin Testing...
[Epoch 126] train avg loss 0.00176805, dev acc 0.8171, dev avg loss 0.407174, throughput 4.88238K wps
[Epoch 127 Batch 30/62] avg loss 0.00169655, throughput 4.92989K wps
[Epoch 127 Batch 60/62] avg loss 0.00171651, throughput 4.81877K wps
Begin Testing...
[Epoch 127] train avg loss 0.00172212, dev acc 0.8201, dev avg loss 0.407556, throughput 4.8809K wps
[Epoch 128 Batch 30/62] avg loss 0.00166197, throughput 4.92073K wps
[Epoch 128 Batch 60/62] avg loss 0.0016612, throughput 4.8245K wps
Begin Testing...
[Epoch 128] train avg loss 0.00172645, dev acc 0.8260, dev avg loss 0.407096, throughput 4.87847K wps
Observed Improvement.
Begin Testing...
[Epoch 129 Batch 30/62] avg loss 0.00165866, throughput 4.93985K wps
[Epoch 129 Batch 60/62] avg loss 0.00173905, throughput 4.82769K wps
Begin Testing...
[Epoch 129] train avg loss 0.00171076, dev acc 0.8260, dev avg loss 0.408482, throughput 4.88985K wps
Observed Improvement.
Begin Testing...
[Epoch 130 Batch 30/62] avg loss 0.00167902, throughput 4.94411K wps
[Epoch 130 Batch 60/62] avg loss 0.00166501, throughput 4.83577K wps
Begin Testing...
[Epoch 130] train avg loss 0.00170428, dev acc 0.8201, dev avg loss 0.409378, throughput 4.89377K wps
[Epoch 131 Batch 30/62] avg loss 0.00163923, throughput 4.93794K wps
[Epoch 131 Batch 60/62] avg loss 0.00158189, throughput 4.83468K wps
Begin Testing...
[Epoch 131] train avg loss 0.00162725, dev acc 0.8171, dev avg loss 0.412034, throughput 4.89201K wps
[Epoch 132 Batch 30/62] avg loss 0.0016463, throughput 4.93032K wps
[Epoch 132 Batch 60/62] avg loss 0.00155324, throughput 4.81834K wps
Begin Testing...
[Epoch 132] train avg loss 0.00164797, dev acc 0.8260, dev avg loss 0.419009, throughput 4.88081K wps
Observed Improvement.
Begin Testing...
[Epoch 133 Batch 30/62] avg loss 0.00145594, throughput 4.92639K wps
[Epoch 133 Batch 60/62] avg loss 0.00164364, throughput 4.82716K wps
Begin Testing...
[Epoch 133] train avg loss 0.00157608, dev acc 0.8260, dev avg loss 0.410458, throughput 4.88286K wps
Observed Improvement.
Begin Testing...
[Epoch 134 Batch 30/62] avg loss 0.00154391, throughput 4.92676K wps
[Epoch 134 Batch 60/62] avg loss 0.00147854, throughput 4.82157K wps
Begin Testing...
[Epoch 134] train avg loss 0.0015245, dev acc 0.8319, dev avg loss 0.414659, throughput 4.87915K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/62] avg loss 0.00151295, throughput 4.93987K wps
[Epoch 135 Batch 60/62] avg loss 0.00161193, throughput 4.82986K wps
Begin Testing...
[Epoch 135] train avg loss 0.00156672, dev acc 0.8260, dev avg loss 0.413498, throughput 4.88935K wps
[Epoch 136 Batch 30/62] avg loss 0.00155415, throughput 4.92467K wps
[Epoch 136 Batch 60/62] avg loss 0.00148812, throughput 4.83304K wps
Begin Testing...
[Epoch 136] train avg loss 0.00158763, dev acc 0.8260, dev avg loss 0.41416, throughput 4.88481K wps
[Epoch 137 Batch 30/62] avg loss 0.00144511, throughput 4.93279K wps
[Epoch 137 Batch 60/62] avg loss 0.00160545, throughput 4.83169K wps
Begin Testing...
[Epoch 137] train avg loss 0.00153991, dev acc 0.8230, dev avg loss 0.411326, throughput 4.8898K wps
[Epoch 138 Batch 30/62] avg loss 0.00138395, throughput 4.94413K wps
[Epoch 138 Batch 60/62] avg loss 0.00150656, throughput 4.83674K wps
Begin Testing...
[Epoch 138] train avg loss 0.00145115, dev acc 0.8230, dev avg loss 0.414153, throughput 4.89614K wps
[Epoch 139 Batch 30/62] avg loss 0.00137001, throughput 4.94798K wps
[Epoch 139 Batch 60/62] avg loss 0.00151116, throughput 4.8171K wps
Begin Testing...
[Epoch 139] train avg loss 0.00144808, dev acc 0.8201, dev avg loss 0.414638, throughput 4.88796K wps
[Epoch 140 Batch 30/62] avg loss 0.00136126, throughput 4.92479K wps
[Epoch 140 Batch 60/62] avg loss 0.00135073, throughput 4.82898K wps
Begin Testing...
[Epoch 140] train avg loss 0.00136242, dev acc 0.8230, dev avg loss 0.415429, throughput 4.8841K wps
[Epoch 141 Batch 30/62] avg loss 0.00132463, throughput 4.95255K wps
[Epoch 141 Batch 60/62] avg loss 0.00147527, throughput 4.83351K wps
Begin Testing...
[Epoch 141] train avg loss 0.00140349, dev acc 0.8201, dev avg loss 0.416175, throughput 4.89887K wps
[Epoch 142 Batch 30/62] avg loss 0.00146153, throughput 4.93673K wps
[Epoch 142 Batch 60/62] avg loss 0.00135837, throughput 4.80788K wps
Begin Testing...
[Epoch 142] train avg loss 0.00141772, dev acc 0.8201, dev avg loss 0.416126, throughput 4.87686K wps
[Epoch 143 Batch 30/62] avg loss 0.00132431, throughput 4.92471K wps
[Epoch 143 Batch 60/62] avg loss 0.00139188, throughput 4.8195K wps
Begin Testing...
[Epoch 143] train avg loss 0.00138705, dev acc 0.8230, dev avg loss 0.417326, throughput 4.87972K wps
[Epoch 144 Batch 30/62] avg loss 0.00135, throughput 4.95746K wps
[Epoch 144 Batch 60/62] avg loss 0.00127342, throughput 4.81419K wps
Begin Testing...
[Epoch 144] train avg loss 0.00130893, dev acc 0.8230, dev avg loss 0.416488, throughput 4.89125K wps
[Epoch 145 Batch 30/62] avg loss 0.00128519, throughput 4.92439K wps
[Epoch 145 Batch 60/62] avg loss 0.00133467, throughput 4.81471K wps
Begin Testing...
[Epoch 145] train avg loss 0.00133226, dev acc 0.8260, dev avg loss 0.417505, throughput 4.87507K wps
[Epoch 146 Batch 30/62] avg loss 0.00140574, throughput 4.9434K wps
[Epoch 146 Batch 60/62] avg loss 0.00138723, throughput 4.81833K wps
Begin Testing...
[Epoch 146] train avg loss 0.00140059, dev acc 0.8230, dev avg loss 0.417894, throughput 4.8862K wps
[Epoch 147 Batch 30/62] avg loss 0.00127563, throughput 4.97521K wps
[Epoch 147 Batch 60/62] avg loss 0.00130018, throughput 4.87532K wps
Begin Testing...
[Epoch 147] train avg loss 0.00131154, dev acc 0.8230, dev avg loss 0.418934, throughput 4.931K wps
[Epoch 148 Batch 30/62] avg loss 0.00130709, throughput 4.96927K wps
[Epoch 148 Batch 60/62] avg loss 0.00121163, throughput 4.8477K wps
Begin Testing...
[Epoch 148] train avg loss 0.00126477, dev acc 0.8230, dev avg loss 0.420909, throughput 4.91552K wps
[Epoch 149 Batch 30/62] avg loss 0.00124646, throughput 4.97422K wps
[Epoch 149 Batch 60/62] avg loss 0.00129258, throughput 4.83177K wps
Begin Testing...
[Epoch 149] train avg loss 0.00126932, dev acc 0.8260, dev avg loss 0.42212, throughput 4.90945K wps
[Epoch 150 Batch 30/62] avg loss 0.00117627, throughput 4.96937K wps
[Epoch 150 Batch 60/62] avg loss 0.00129253, throughput 4.8438K wps
Begin Testing...
[Epoch 150] train avg loss 0.00123909, dev acc 0.8260, dev avg loss 0.421011, throughput 4.91169K wps
[Epoch 151 Batch 30/62] avg loss 0.0011603, throughput 4.90742K wps
[Epoch 151 Batch 60/62] avg loss 0.00125494, throughput 4.81415K wps
Begin Testing...
[Epoch 151] train avg loss 0.00122865, dev acc 0.8348, dev avg loss 0.424816, throughput 4.86772K wps
Observed Improvement.
Begin Testing...
[Epoch 152 Batch 30/62] avg loss 0.00126421, throughput 4.94207K wps
[Epoch 152 Batch 60/62] avg loss 0.00113896, throughput 4.84206K wps
Begin Testing...
[Epoch 152] train avg loss 0.00120811, dev acc 0.8289, dev avg loss 0.42111, throughput 4.89985K wps
[Epoch 153 Batch 30/62] avg loss 0.00122426, throughput 4.96044K wps
[Epoch 153 Batch 60/62] avg loss 0.00124705, throughput 4.83794K wps
Begin Testing...
[Epoch 153] train avg loss 0.00125165, dev acc 0.8319, dev avg loss 0.423785, throughput 4.90446K wps
[Epoch 154 Batch 30/62] avg loss 0.00108365, throughput 4.93601K wps
[Epoch 154 Batch 60/62] avg loss 0.00120043, throughput 4.79192K wps
Begin Testing...
[Epoch 154] train avg loss 0.0011538, dev acc 0.8260, dev avg loss 0.423137, throughput 4.86877K wps
[Epoch 155 Batch 30/62] avg loss 0.00113152, throughput 4.91817K wps
[Epoch 155 Batch 60/62] avg loss 0.00118403, throughput 4.83032K wps
Begin Testing...
[Epoch 155] train avg loss 0.00115801, dev acc 0.8289, dev avg loss 0.422134, throughput 4.88112K wps
[Epoch 156 Batch 30/62] avg loss 0.00116643, throughput 4.92678K wps
[Epoch 156 Batch 60/62] avg loss 0.0011346, throughput 4.83118K wps
Begin Testing...
[Epoch 156] train avg loss 0.00115634, dev acc 0.8348, dev avg loss 0.422629, throughput 4.88511K wps
Observed Improvement.
Begin Testing...
[Epoch 157 Batch 30/62] avg loss 0.00109197, throughput 4.94288K wps
[Epoch 157 Batch 60/62] avg loss 0.00119816, throughput 4.79925K wps
Begin Testing...
[Epoch 157] train avg loss 0.00118505, dev acc 0.8260, dev avg loss 0.424904, throughput 4.8753K wps
[Epoch 158 Batch 30/62] avg loss 0.00108017, throughput 4.93171K wps
[Epoch 158 Batch 60/62] avg loss 0.00113251, throughput 4.8262K wps
Begin Testing...
[Epoch 158] train avg loss 0.00111975, dev acc 0.8319, dev avg loss 0.423126, throughput 4.88376K wps
[Epoch 159 Batch 30/62] avg loss 0.00107761, throughput 4.93185K wps
[Epoch 159 Batch 60/62] avg loss 0.00110841, throughput 4.83795K wps
Begin Testing...
[Epoch 159] train avg loss 0.00112279, dev acc 0.8260, dev avg loss 0.42316, throughput 4.89004K wps
[Epoch 160 Batch 30/62] avg loss 0.00105955, throughput 4.92531K wps
[Epoch 160 Batch 60/62] avg loss 0.00114383, throughput 4.86112K wps
Begin Testing...
[Epoch 160] train avg loss 0.00113028, dev acc 0.8230, dev avg loss 0.42516, throughput 4.90021K wps
[Epoch 161 Batch 30/62] avg loss 0.00106864, throughput 4.9659K wps
[Epoch 161 Batch 60/62] avg loss 0.00113653, throughput 4.85823K wps
Begin Testing...
[Epoch 161] train avg loss 0.00112438, dev acc 0.8260, dev avg loss 0.426109, throughput 4.91829K wps
[Epoch 162 Batch 30/62] avg loss 0.0011719, throughput 4.95204K wps
[Epoch 162 Batch 60/62] avg loss 0.00103751, throughput 4.83855K wps
Begin Testing...
[Epoch 162] train avg loss 0.00111564, dev acc 0.8230, dev avg loss 0.426758, throughput 4.90169K wps
[Epoch 163 Batch 30/62] avg loss 0.00112327, throughput 4.93995K wps
[Epoch 163 Batch 60/62] avg loss 0.00110703, throughput 4.84378K wps
Begin Testing...
[Epoch 163] train avg loss 0.00113555, dev acc 0.8348, dev avg loss 0.428327, throughput 4.8972K wps
Observed Improvement.
Begin Testing...
[Epoch 164 Batch 30/62] avg loss 0.00102646, throughput 4.91372K wps
[Epoch 164 Batch 60/62] avg loss 0.00105261, throughput 4.83505K wps
Begin Testing...
[Epoch 164] train avg loss 0.00104664, dev acc 0.8289, dev avg loss 0.429286, throughput 4.8798K wps
[Epoch 165 Batch 30/62] avg loss 0.000993723, throughput 4.93602K wps
[Epoch 165 Batch 60/62] avg loss 0.000969147, throughput 4.82377K wps
Begin Testing...
[Epoch 165] train avg loss 0.00104037, dev acc 0.8260, dev avg loss 0.433639, throughput 4.88556K wps
[Epoch 166 Batch 30/62] avg loss 0.00103125, throughput 4.93263K wps
[Epoch 166 Batch 60/62] avg loss 0.00101036, throughput 4.82917K wps
Begin Testing...
[Epoch 166] train avg loss 0.00102789, dev acc 0.8260, dev avg loss 0.43008, throughput 4.88701K wps
[Epoch 167 Batch 30/62] avg loss 0.00104284, throughput 4.92774K wps
[Epoch 167 Batch 60/62] avg loss 0.000958115, throughput 4.80478K wps
Begin Testing...
[Epoch 167] train avg loss 0.00101473, dev acc 0.8260, dev avg loss 0.431452, throughput 4.87371K wps
[Epoch 168 Batch 30/62] avg loss 0.000993763, throughput 4.9001K wps
[Epoch 168 Batch 60/62] avg loss 0.000949397, throughput 4.81756K wps
Begin Testing...
[Epoch 168] train avg loss 0.00100365, dev acc 0.8230, dev avg loss 0.430207, throughput 4.86588K wps
[Epoch 169 Batch 30/62] avg loss 0.00104061, throughput 4.9306K wps
[Epoch 169 Batch 60/62] avg loss 0.000938038, throughput 4.80034K wps
Begin Testing...
[Epoch 169] train avg loss 0.000988033, dev acc 0.8230, dev avg loss 0.431484, throughput 4.87179K wps
[Epoch 170 Batch 30/62] avg loss 0.000991867, throughput 4.91738K wps
[Epoch 170 Batch 60/62] avg loss 0.000932809, throughput 4.81949K wps
Begin Testing...
[Epoch 170] train avg loss 0.000974802, dev acc 0.8230, dev avg loss 0.432317, throughput 4.87494K wps
[Epoch 171 Batch 30/62] avg loss 0.000977734, throughput 4.92994K wps
[Epoch 171 Batch 60/62] avg loss 0.000941724, throughput 4.8017K wps
Begin Testing...
[Epoch 171] train avg loss 0.000980667, dev acc 0.8201, dev avg loss 0.434028, throughput 4.87207K wps
[Epoch 172 Batch 30/62] avg loss 0.000865354, throughput 4.93807K wps
[Epoch 172 Batch 60/62] avg loss 0.000983529, throughput 4.81584K wps
Begin Testing...
[Epoch 172] train avg loss 0.000930014, dev acc 0.8260, dev avg loss 0.434352, throughput 4.88195K wps
[Epoch 173 Batch 30/62] avg loss 0.00100322, throughput 4.95884K wps
[Epoch 173 Batch 60/62] avg loss 0.00098865, throughput 4.82106K wps
Begin Testing...
[Epoch 173] train avg loss 0.00101373, dev acc 0.8378, dev avg loss 0.438718, throughput 4.89485K wps
Observed Improvement.
Begin Testing...
[Epoch 174 Batch 30/62] avg loss 0.000978976, throughput 4.92989K wps
[Epoch 174 Batch 60/62] avg loss 0.000979541, throughput 4.81615K wps
Begin Testing...
[Epoch 174] train avg loss 0.000979813, dev acc 0.8230, dev avg loss 0.434806, throughput 4.88077K wps
[Epoch 175 Batch 30/62] avg loss 0.000905998, throughput 4.93048K wps
[Epoch 175 Batch 60/62] avg loss 0.000907003, throughput 4.80691K wps
Begin Testing...
[Epoch 175] train avg loss 0.000925599, dev acc 0.8230, dev avg loss 0.435277, throughput 4.87437K wps
[Epoch 176 Batch 30/62] avg loss 0.000948889, throughput 4.92552K wps
[Epoch 176 Batch 60/62] avg loss 0.000908992, throughput 4.82254K wps
Begin Testing...
[Epoch 176] train avg loss 0.000943701, dev acc 0.8260, dev avg loss 0.435107, throughput 4.87888K wps
[Epoch 177 Batch 30/62] avg loss 0.000878182, throughput 4.92796K wps
[Epoch 177 Batch 60/62] avg loss 0.000975693, throughput 4.83092K wps
Begin Testing...
[Epoch 177] train avg loss 0.000944721, dev acc 0.8201, dev avg loss 0.434872, throughput 4.8867K wps
[Epoch 178 Batch 30/62] avg loss 0.000889917, throughput 4.95787K wps
[Epoch 178 Batch 60/62] avg loss 0.000839382, throughput 4.84944K wps
Begin Testing...
[Epoch 178] train avg loss 0.000872623, dev acc 0.8230, dev avg loss 0.436346, throughput 4.90952K wps
[Epoch 179 Batch 30/62] avg loss 0.000889195, throughput 4.95148K wps
[Epoch 179 Batch 60/62] avg loss 0.000927502, throughput 4.82884K wps
Begin Testing...
[Epoch 179] train avg loss 0.000929151, dev acc 0.8201, dev avg loss 0.437887, throughput 4.89594K wps
[Epoch 180 Batch 30/62] avg loss 0.000973668, throughput 4.94167K wps
[Epoch 180 Batch 60/62] avg loss 0.000831781, throughput 4.82141K wps
Begin Testing...
[Epoch 180] train avg loss 0.000921627, dev acc 0.8230, dev avg loss 0.437316, throughput 4.88672K wps
[Epoch 181 Batch 30/62] avg loss 0.000825963, throughput 4.91906K wps
[Epoch 181 Batch 60/62] avg loss 0.000939353, throughput 4.82326K wps
Begin Testing...
[Epoch 181] train avg loss 0.000891966, dev acc 0.8230, dev avg loss 0.437434, throughput 4.87682K wps
[Epoch 182 Batch 30/62] avg loss 0.000867811, throughput 4.92518K wps
[Epoch 182 Batch 60/62] avg loss 0.000823859, throughput 4.80652K wps
Begin Testing...
[Epoch 182] train avg loss 0.000852982, dev acc 0.8289, dev avg loss 0.438244, throughput 4.87042K wps
[Epoch 183 Batch 30/62] avg loss 0.000801359, throughput 4.93218K wps
[Epoch 183 Batch 60/62] avg loss 0.000818466, throughput 4.83988K wps
Begin Testing...
[Epoch 183] train avg loss 0.000809552, dev acc 0.8289, dev avg loss 0.438928, throughput 4.89271K wps
[Epoch 184 Batch 30/62] avg loss 0.000841416, throughput 4.9406K wps
[Epoch 184 Batch 60/62] avg loss 0.000849439, throughput 4.84106K wps
Begin Testing...
[Epoch 184] train avg loss 0.000843917, dev acc 0.8289, dev avg loss 0.438152, throughput 4.89756K wps
[Epoch 185 Batch 30/62] avg loss 0.000835334, throughput 4.93186K wps
[Epoch 185 Batch 60/62] avg loss 0.000772945, throughput 4.82953K wps
Begin Testing...
[Epoch 185] train avg loss 0.000809165, dev acc 0.8260, dev avg loss 0.439601, throughput 4.88774K wps
[Epoch 186 Batch 30/62] avg loss 0.000811842, throughput 4.91802K wps
[Epoch 186 Batch 60/62] avg loss 0.000820948, throughput 4.82284K wps
Begin Testing...
[Epoch 186] train avg loss 0.000822724, dev acc 0.8319, dev avg loss 0.442136, throughput 4.87677K wps
[Epoch 187 Batch 30/62] avg loss 0.000821102, throughput 4.90497K wps
[Epoch 187 Batch 60/62] avg loss 0.000865831, throughput 4.79484K wps
Begin Testing...
[Epoch 187] train avg loss 0.000849243, dev acc 0.8230, dev avg loss 0.441778, throughput 4.8553K wps
[Epoch 188 Batch 30/62] avg loss 0.000836062, throughput 4.93783K wps
[Epoch 188 Batch 60/62] avg loss 0.000810722, throughput 4.81504K wps
Begin Testing...
[Epoch 188] train avg loss 0.000853639, dev acc 0.8289, dev avg loss 0.44292, throughput 4.88158K wps
[Epoch 189 Batch 30/62] avg loss 0.000733967, throughput 4.93635K wps
[Epoch 189 Batch 60/62] avg loss 0.000819734, throughput 4.82994K wps
Begin Testing...
[Epoch 189] train avg loss 0.000802793, dev acc 0.8289, dev avg loss 0.442318, throughput 4.88986K wps
[Epoch 190 Batch 30/62] avg loss 0.000796659, throughput 4.9462K wps
[Epoch 190 Batch 60/62] avg loss 0.000839895, throughput 4.8178K wps
Begin Testing...
[Epoch 190] train avg loss 0.000832549, dev acc 0.8260, dev avg loss 0.441422, throughput 4.88707K wps
[Epoch 191 Batch 30/62] avg loss 0.000791413, throughput 4.93073K wps
[Epoch 191 Batch 60/62] avg loss 0.000744238, throughput 4.80565K wps
Begin Testing...
[Epoch 191] train avg loss 0.000766732, dev acc 0.8289, dev avg loss 0.442979, throughput 4.87533K wps
[Epoch 192 Batch 30/62] avg loss 0.000746033, throughput 4.94018K wps
[Epoch 192 Batch 60/62] avg loss 0.00069881, throughput 4.82498K wps
Begin Testing...
[Epoch 192] train avg loss 0.000728788, dev acc 0.8260, dev avg loss 0.442207, throughput 4.88761K wps
[Epoch 193 Batch 30/62] avg loss 0.000729343, throughput 4.91415K wps
[Epoch 193 Batch 60/62] avg loss 0.000776538, throughput 4.8173K wps
Begin Testing...
[Epoch 193] train avg loss 0.000764997, dev acc 0.8319, dev avg loss 0.442535, throughput 4.87276K wps
[Epoch 194 Batch 30/62] avg loss 0.000720262, throughput 4.92807K wps
[Epoch 194 Batch 60/62] avg loss 0.000708799, throughput 4.81154K wps
Begin Testing...
[Epoch 194] train avg loss 0.000713206, dev acc 0.8319, dev avg loss 0.442688, throughput 4.87595K wps
[Epoch 195 Batch 30/62] avg loss 0.00079499, throughput 4.91599K wps
[Epoch 195 Batch 60/62] avg loss 0.000792924, throughput 4.77961K wps
Begin Testing...
[Epoch 195] train avg loss 0.00079928, dev acc 0.8230, dev avg loss 0.442847, throughput 4.85429K wps
[Epoch 196 Batch 30/62] avg loss 0.000694746, throughput 4.93557K wps
[Epoch 196 Batch 60/62] avg loss 0.000759288, throughput 4.81568K wps
Begin Testing...
[Epoch 196] train avg loss 0.00075999, dev acc 0.8260, dev avg loss 0.445062, throughput 4.88092K wps
[Epoch 197 Batch 30/62] avg loss 0.000780717, throughput 4.92897K wps
[Epoch 197 Batch 60/62] avg loss 0.000719829, throughput 4.84817K wps
Begin Testing...
[Epoch 197] train avg loss 0.000762046, dev acc 0.8230, dev avg loss 0.445097, throughput 4.89337K wps
[Epoch 198 Batch 30/62] avg loss 0.000726032, throughput 4.95112K wps
[Epoch 198 Batch 60/62] avg loss 0.000775801, throughput 4.82837K wps
Begin Testing...
[Epoch 198] train avg loss 0.000764433, dev acc 0.8348, dev avg loss 0.447533, throughput 4.89544K wps
[Epoch 199 Batch 30/62] avg loss 0.000710051, throughput 4.94191K wps
[Epoch 199 Batch 60/62] avg loss 0.000746546, throughput 4.80315K wps
Begin Testing...
[Epoch 199] train avg loss 0.000731783, dev acc 0.8289, dev avg loss 0.445931, throughput 4.87994K wps
Test loss 0.329627, test acc 0.8621
Total time cost 301.12s
[Epoch 0 Batch 30/62] avg loss 0.0134099, throughput 4.68396K wps
[Epoch 0 Batch 60/62] avg loss 0.0130446, throughput 4.82706K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133665, dev acc 0.6519, dev avg loss 0.646416, throughput 4.76497K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0132915, throughput 4.92584K wps
[Epoch 1 Batch 60/62] avg loss 0.012984, throughput 4.82071K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132989, dev acc 0.6519, dev avg loss 0.639813, throughput 4.87886K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0129344, throughput 4.94846K wps
[Epoch 2 Batch 60/62] avg loss 0.0129028, throughput 4.83946K wps
Begin Testing...
[Epoch 2] train avg loss 0.0130489, dev acc 0.6519, dev avg loss 0.634614, throughput 4.90011K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0128663, throughput 4.98224K wps
[Epoch 3 Batch 60/62] avg loss 0.0127377, throughput 4.8779K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129831, dev acc 0.6519, dev avg loss 0.628229, throughput 4.93663K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128713, throughput 4.97545K wps
[Epoch 4 Batch 60/62] avg loss 0.0125316, throughput 4.85148K wps
Begin Testing...
[Epoch 4] train avg loss 0.0129052, dev acc 0.6519, dev avg loss 0.624134, throughput 4.91802K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0125555, throughput 4.94017K wps
[Epoch 5 Batch 60/62] avg loss 0.0124649, throughput 4.81369K wps
Begin Testing...
[Epoch 5] train avg loss 0.0126641, dev acc 0.6519, dev avg loss 0.61818, throughput 4.88206K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0123767, throughput 4.93163K wps
[Epoch 6 Batch 60/62] avg loss 0.0123722, throughput 4.84408K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125669, dev acc 0.6519, dev avg loss 0.613508, throughput 4.89525K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0124189, throughput 4.95376K wps
[Epoch 7 Batch 60/62] avg loss 0.0121335, throughput 4.81237K wps
Begin Testing...
[Epoch 7] train avg loss 0.0124299, dev acc 0.6519, dev avg loss 0.608955, throughput 4.88975K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0120111, throughput 4.93913K wps
[Epoch 8 Batch 60/62] avg loss 0.0122622, throughput 4.83707K wps
Begin Testing...
[Epoch 8] train avg loss 0.0122635, dev acc 0.6519, dev avg loss 0.602553, throughput 4.8927K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0119937, throughput 4.93191K wps
[Epoch 9 Batch 60/62] avg loss 0.0119565, throughput 4.81971K wps
Begin Testing...
[Epoch 9] train avg loss 0.012096, dev acc 0.6519, dev avg loss 0.59704, throughput 4.88144K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0120836, throughput 4.94584K wps
[Epoch 10 Batch 60/62] avg loss 0.0117503, throughput 4.81265K wps
Begin Testing...
[Epoch 10] train avg loss 0.0120119, dev acc 0.6519, dev avg loss 0.592398, throughput 4.88413K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0117452, throughput 4.93483K wps
[Epoch 11 Batch 60/62] avg loss 0.0116594, throughput 4.82888K wps
Begin Testing...
[Epoch 11] train avg loss 0.0118214, dev acc 0.6519, dev avg loss 0.585172, throughput 4.88693K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0118966, throughput 4.91821K wps
[Epoch 12 Batch 60/62] avg loss 0.0114391, throughput 4.81517K wps
Begin Testing...
[Epoch 12] train avg loss 0.0118318, dev acc 0.6667, dev avg loss 0.578228, throughput 4.87262K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0114877, throughput 4.92659K wps
[Epoch 13 Batch 60/62] avg loss 0.0115528, throughput 4.78595K wps
Begin Testing...
[Epoch 13] train avg loss 0.0116332, dev acc 0.6726, dev avg loss 0.5707, throughput 4.86199K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0112106, throughput 4.90366K wps
[Epoch 14 Batch 60/62] avg loss 0.0113691, throughput 4.81136K wps
Begin Testing...
[Epoch 14] train avg loss 0.0114017, dev acc 0.6785, dev avg loss 0.563237, throughput 4.8642K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0112011, throughput 4.92966K wps
[Epoch 15 Batch 60/62] avg loss 0.0110826, throughput 4.82143K wps
Begin Testing...
[Epoch 15] train avg loss 0.0113134, dev acc 0.6932, dev avg loss 0.555771, throughput 4.88156K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0111078, throughput 4.94124K wps
[Epoch 16 Batch 60/62] avg loss 0.0108848, throughput 4.86461K wps
Begin Testing...
[Epoch 16] train avg loss 0.0111489, dev acc 0.6903, dev avg loss 0.547996, throughput 4.91023K wps
[Epoch 17 Batch 30/62] avg loss 0.0107977, throughput 4.97723K wps
[Epoch 17 Batch 60/62] avg loss 0.0107559, throughput 4.88258K wps
Begin Testing...
[Epoch 17] train avg loss 0.0109077, dev acc 0.6962, dev avg loss 0.540487, throughput 4.9368K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0105613, throughput 4.96792K wps
[Epoch 18 Batch 60/62] avg loss 0.0105882, throughput 4.86076K wps
Begin Testing...
[Epoch 18] train avg loss 0.0107087, dev acc 0.6932, dev avg loss 0.533039, throughput 4.92264K wps
[Epoch 19 Batch 30/62] avg loss 0.0102623, throughput 4.99227K wps
[Epoch 19 Batch 60/62] avg loss 0.0105188, throughput 4.85748K wps
Begin Testing...
[Epoch 19] train avg loss 0.0105213, dev acc 0.7434, dev avg loss 0.523578, throughput 4.93071K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0104173, throughput 4.97263K wps
[Epoch 20 Batch 60/62] avg loss 0.0101783, throughput 4.86806K wps
Begin Testing...
[Epoch 20] train avg loss 0.010454, dev acc 0.6991, dev avg loss 0.517278, throughput 4.92558K wps
[Epoch 21 Batch 30/62] avg loss 0.0101931, throughput 4.93235K wps
[Epoch 21 Batch 60/62] avg loss 0.010019, throughput 4.82543K wps
Begin Testing...
[Epoch 21] train avg loss 0.0102879, dev acc 0.7493, dev avg loss 0.507704, throughput 4.88582K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00992343, throughput 4.93037K wps
[Epoch 22 Batch 60/62] avg loss 0.00999161, throughput 4.8123K wps
Begin Testing...
[Epoch 22] train avg loss 0.0100846, dev acc 0.7375, dev avg loss 0.500609, throughput 4.87847K wps
[Epoch 23 Batch 30/62] avg loss 0.00988965, throughput 4.92739K wps
[Epoch 23 Batch 60/62] avg loss 0.0097332, throughput 4.81588K wps
Begin Testing...
[Epoch 23] train avg loss 0.0099604, dev acc 0.7758, dev avg loss 0.492612, throughput 4.87732K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.00955312, throughput 4.91957K wps
[Epoch 24 Batch 60/62] avg loss 0.00969825, throughput 4.82345K wps
Begin Testing...
[Epoch 24] train avg loss 0.00968257, dev acc 0.7404, dev avg loss 0.487209, throughput 4.87757K wps
[Epoch 25 Batch 30/62] avg loss 0.00964048, throughput 4.91796K wps
[Epoch 25 Batch 60/62] avg loss 0.00925508, throughput 4.80527K wps
Begin Testing...
[Epoch 25] train avg loss 0.00953425, dev acc 0.7434, dev avg loss 0.478458, throughput 4.86781K wps
[Epoch 26 Batch 30/62] avg loss 0.00911522, throughput 4.92207K wps
[Epoch 26 Batch 60/62] avg loss 0.00932824, throughput 4.82761K wps
Begin Testing...
[Epoch 26] train avg loss 0.00937965, dev acc 0.7434, dev avg loss 0.472407, throughput 4.88234K wps
[Epoch 27 Batch 30/62] avg loss 0.00907666, throughput 4.93758K wps
[Epoch 27 Batch 60/62] avg loss 0.00904748, throughput 4.83067K wps
Begin Testing...
[Epoch 27] train avg loss 0.00921966, dev acc 0.7758, dev avg loss 0.464757, throughput 4.89042K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00891607, throughput 4.94239K wps
[Epoch 28 Batch 60/62] avg loss 0.00878773, throughput 4.82987K wps
Begin Testing...
[Epoch 28] train avg loss 0.00897389, dev acc 0.7788, dev avg loss 0.458109, throughput 4.8925K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00875677, throughput 4.96438K wps
[Epoch 29 Batch 60/62] avg loss 0.00878111, throughput 4.86845K wps
Begin Testing...
[Epoch 29] train avg loss 0.00885926, dev acc 0.7876, dev avg loss 0.45283, throughput 4.92216K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00888479, throughput 4.96821K wps
[Epoch 30 Batch 60/62] avg loss 0.00831968, throughput 4.85631K wps
Begin Testing...
[Epoch 30] train avg loss 0.00872586, dev acc 0.8024, dev avg loss 0.44673, throughput 4.91931K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00848312, throughput 4.9756K wps
[Epoch 31 Batch 60/62] avg loss 0.00868877, throughput 4.89457K wps
Begin Testing...
[Epoch 31] train avg loss 0.00866296, dev acc 0.7729, dev avg loss 0.445407, throughput 4.94221K wps
[Epoch 32 Batch 30/62] avg loss 0.00835721, throughput 5.00664K wps
[Epoch 32 Batch 60/62] avg loss 0.00823196, throughput 4.9001K wps
Begin Testing...
[Epoch 32] train avg loss 0.00841306, dev acc 0.7906, dev avg loss 0.437846, throughput 4.96026K wps
[Epoch 33 Batch 30/62] avg loss 0.00808875, throughput 5.00907K wps
[Epoch 33 Batch 60/62] avg loss 0.00834186, throughput 4.88048K wps
Begin Testing...
[Epoch 33] train avg loss 0.00836758, dev acc 0.8142, dev avg loss 0.431527, throughput 4.94959K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00813909, throughput 4.97104K wps
[Epoch 34 Batch 60/62] avg loss 0.00803888, throughput 4.84622K wps
Begin Testing...
[Epoch 34] train avg loss 0.00825095, dev acc 0.8112, dev avg loss 0.427385, throughput 4.91481K wps
[Epoch 35 Batch 30/62] avg loss 0.00823723, throughput 4.9513K wps
[Epoch 35 Batch 60/62] avg loss 0.00776545, throughput 4.80653K wps
Begin Testing...
[Epoch 35] train avg loss 0.00809545, dev acc 0.8142, dev avg loss 0.424256, throughput 4.88354K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.00765785, throughput 4.91832K wps
[Epoch 36 Batch 60/62] avg loss 0.00786989, throughput 4.81636K wps
Begin Testing...
[Epoch 36] train avg loss 0.00781196, dev acc 0.7994, dev avg loss 0.42301, throughput 4.87415K wps
[Epoch 37 Batch 30/62] avg loss 0.00749041, throughput 4.93363K wps
[Epoch 37 Batch 60/62] avg loss 0.00767455, throughput 4.83585K wps
Begin Testing...
[Epoch 37] train avg loss 0.00772687, dev acc 0.8083, dev avg loss 0.419101, throughput 4.89096K wps
[Epoch 38 Batch 30/62] avg loss 0.00749659, throughput 4.94208K wps
[Epoch 38 Batch 60/62] avg loss 0.00745531, throughput 4.8274K wps
Begin Testing...
[Epoch 38] train avg loss 0.00761545, dev acc 0.8112, dev avg loss 0.412881, throughput 4.89122K wps
[Epoch 39 Batch 30/62] avg loss 0.0072405, throughput 4.95601K wps
[Epoch 39 Batch 60/62] avg loss 0.00760926, throughput 4.83294K wps
Begin Testing...
[Epoch 39] train avg loss 0.00749484, dev acc 0.8024, dev avg loss 0.40942, throughput 4.90027K wps
[Epoch 40 Batch 30/62] avg loss 0.00734128, throughput 4.92981K wps
[Epoch 40 Batch 60/62] avg loss 0.00714827, throughput 4.84475K wps
Begin Testing...
[Epoch 40] train avg loss 0.00730697, dev acc 0.8083, dev avg loss 0.409631, throughput 4.89346K wps
[Epoch 41 Batch 30/62] avg loss 0.00702218, throughput 4.95533K wps
[Epoch 41 Batch 60/62] avg loss 0.00714336, throughput 4.83847K wps
Begin Testing...
[Epoch 41] train avg loss 0.00722079, dev acc 0.8083, dev avg loss 0.403651, throughput 4.90326K wps
[Epoch 42 Batch 30/62] avg loss 0.00694606, throughput 4.96792K wps
[Epoch 42 Batch 60/62] avg loss 0.00719079, throughput 4.85211K wps
Begin Testing...
[Epoch 42] train avg loss 0.00715697, dev acc 0.8142, dev avg loss 0.402859, throughput 4.91523K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.00698728, throughput 4.95809K wps
[Epoch 43 Batch 60/62] avg loss 0.00690147, throughput 4.88938K wps
Begin Testing...
[Epoch 43] train avg loss 0.00697598, dev acc 0.8083, dev avg loss 0.401681, throughput 4.93097K wps
[Epoch 44 Batch 30/62] avg loss 0.00678272, throughput 5.0053K wps
[Epoch 44 Batch 60/62] avg loss 0.00706757, throughput 4.90283K wps
Begin Testing...
[Epoch 44] train avg loss 0.00693682, dev acc 0.8112, dev avg loss 0.396419, throughput 4.96051K wps
[Epoch 45 Batch 30/62] avg loss 0.00663867, throughput 5.00246K wps
[Epoch 45 Batch 60/62] avg loss 0.00665639, throughput 4.87719K wps
Begin Testing...
[Epoch 45] train avg loss 0.00669289, dev acc 0.8112, dev avg loss 0.396405, throughput 4.94672K wps
[Epoch 46 Batch 30/62] avg loss 0.00646484, throughput 5.00214K wps
[Epoch 46 Batch 60/62] avg loss 0.00650213, throughput 4.88385K wps
Begin Testing...
[Epoch 46] train avg loss 0.00659313, dev acc 0.8171, dev avg loss 0.391965, throughput 4.94834K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00650411, throughput 4.96076K wps
[Epoch 47 Batch 60/62] avg loss 0.00648811, throughput 4.8401K wps
Begin Testing...
[Epoch 47] train avg loss 0.00657487, dev acc 0.8201, dev avg loss 0.390306, throughput 4.90563K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00655313, throughput 4.94375K wps
[Epoch 48 Batch 60/62] avg loss 0.00617592, throughput 4.82847K wps
Begin Testing...
[Epoch 48] train avg loss 0.00645945, dev acc 0.8112, dev avg loss 0.391324, throughput 4.89251K wps
[Epoch 49 Batch 30/62] avg loss 0.00637726, throughput 4.97022K wps
[Epoch 49 Batch 60/62] avg loss 0.00621473, throughput 4.89387K wps
Begin Testing...
[Epoch 49] train avg loss 0.00641142, dev acc 0.7994, dev avg loss 0.392587, throughput 4.93825K wps
[Epoch 50 Batch 30/62] avg loss 0.00618834, throughput 4.95552K wps
[Epoch 50 Batch 60/62] avg loss 0.00594493, throughput 4.85595K wps
Begin Testing...
[Epoch 50] train avg loss 0.00612799, dev acc 0.8083, dev avg loss 0.388695, throughput 4.91137K wps
[Epoch 51 Batch 30/62] avg loss 0.00593473, throughput 4.9419K wps
[Epoch 51 Batch 60/62] avg loss 0.00587289, throughput 4.83574K wps
Begin Testing...
[Epoch 51] train avg loss 0.00598859, dev acc 0.8201, dev avg loss 0.382945, throughput 4.90224K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00615284, throughput 4.95082K wps
[Epoch 52 Batch 60/62] avg loss 0.00596559, throughput 4.82788K wps
Begin Testing...
[Epoch 52] train avg loss 0.00610373, dev acc 0.8142, dev avg loss 0.382858, throughput 4.89398K wps
[Epoch 53 Batch 30/62] avg loss 0.00602529, throughput 4.92749K wps
[Epoch 53 Batch 60/62] avg loss 0.00574941, throughput 4.8359K wps
Begin Testing...
[Epoch 53] train avg loss 0.00595755, dev acc 0.8142, dev avg loss 0.38424, throughput 4.88843K wps
[Epoch 54 Batch 30/62] avg loss 0.00581721, throughput 4.95459K wps
[Epoch 54 Batch 60/62] avg loss 0.00558414, throughput 4.8518K wps
Begin Testing...
[Epoch 54] train avg loss 0.00575721, dev acc 0.8142, dev avg loss 0.379283, throughput 4.90822K wps
[Epoch 55 Batch 30/62] avg loss 0.00545245, throughput 4.92984K wps
[Epoch 55 Batch 60/62] avg loss 0.0057634, throughput 4.84832K wps
Begin Testing...
[Epoch 55] train avg loss 0.00571465, dev acc 0.8142, dev avg loss 0.379262, throughput 4.89549K wps
[Epoch 56 Batch 30/62] avg loss 0.0055362, throughput 4.93664K wps
[Epoch 56 Batch 60/62] avg loss 0.00544183, throughput 4.81803K wps
Begin Testing...
[Epoch 56] train avg loss 0.00553538, dev acc 0.8112, dev avg loss 0.382911, throughput 4.88398K wps
[Epoch 57 Batch 30/62] avg loss 0.00543585, throughput 4.94751K wps
[Epoch 57 Batch 60/62] avg loss 0.00535671, throughput 4.87655K wps
Begin Testing...
[Epoch 57] train avg loss 0.00546414, dev acc 0.8142, dev avg loss 0.378338, throughput 4.91994K wps
[Epoch 58 Batch 30/62] avg loss 0.00547686, throughput 4.97509K wps
[Epoch 58 Batch 60/62] avg loss 0.00538484, throughput 4.89385K wps
Begin Testing...
[Epoch 58] train avg loss 0.0054941, dev acc 0.8171, dev avg loss 0.372371, throughput 4.9409K wps
[Epoch 59 Batch 30/62] avg loss 0.00523452, throughput 4.96529K wps
[Epoch 59 Batch 60/62] avg loss 0.00518318, throughput 4.85246K wps
Begin Testing...
[Epoch 59] train avg loss 0.00535765, dev acc 0.8142, dev avg loss 0.379215, throughput 4.914K wps
[Epoch 60 Batch 30/62] avg loss 0.00517399, throughput 4.95444K wps
[Epoch 60 Batch 60/62] avg loss 0.0049352, throughput 4.85281K wps
Begin Testing...
[Epoch 60] train avg loss 0.00519129, dev acc 0.8230, dev avg loss 0.370424, throughput 4.90871K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00501043, throughput 4.94992K wps
[Epoch 61 Batch 60/62] avg loss 0.00505466, throughput 4.83451K wps
Begin Testing...
[Epoch 61] train avg loss 0.00511686, dev acc 0.8201, dev avg loss 0.369856, throughput 4.8981K wps
[Epoch 62 Batch 30/62] avg loss 0.00493273, throughput 4.91589K wps
[Epoch 62 Batch 60/62] avg loss 0.00477221, throughput 4.82087K wps
Begin Testing...
[Epoch 62] train avg loss 0.00492498, dev acc 0.8171, dev avg loss 0.37668, throughput 4.87547K wps
[Epoch 63 Batch 30/62] avg loss 0.00486431, throughput 4.92583K wps
[Epoch 63 Batch 60/62] avg loss 0.00489938, throughput 4.81987K wps
Begin Testing...
[Epoch 63] train avg loss 0.00491704, dev acc 0.8171, dev avg loss 0.367958, throughput 4.8795K wps
[Epoch 64 Batch 30/62] avg loss 0.00491741, throughput 4.94578K wps
[Epoch 64 Batch 60/62] avg loss 0.00453007, throughput 4.88203K wps
Begin Testing...
[Epoch 64] train avg loss 0.00485339, dev acc 0.8201, dev avg loss 0.367605, throughput 4.92151K wps
[Epoch 65 Batch 30/62] avg loss 0.00472245, throughput 4.98626K wps
[Epoch 65 Batch 60/62] avg loss 0.00479169, throughput 4.86572K wps
Begin Testing...
[Epoch 65] train avg loss 0.004807, dev acc 0.8230, dev avg loss 0.366992, throughput 4.93116K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00441512, throughput 4.95472K wps
[Epoch 66 Batch 60/62] avg loss 0.00464475, throughput 4.84388K wps
Begin Testing...
[Epoch 66] train avg loss 0.00458398, dev acc 0.8260, dev avg loss 0.365095, throughput 4.905K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/62] avg loss 0.00432602, throughput 4.96949K wps
[Epoch 67 Batch 60/62] avg loss 0.00475072, throughput 4.82203K wps
Begin Testing...
[Epoch 67] train avg loss 0.00457005, dev acc 0.8230, dev avg loss 0.365649, throughput 4.90199K wps
[Epoch 68 Batch 30/62] avg loss 0.0043508, throughput 4.93472K wps
[Epoch 68 Batch 60/62] avg loss 0.00466272, throughput 4.83088K wps
Begin Testing...
[Epoch 68] train avg loss 0.00451397, dev acc 0.8230, dev avg loss 0.365332, throughput 4.8894K wps
[Epoch 69 Batch 30/62] avg loss 0.0043271, throughput 4.91997K wps
[Epoch 69 Batch 60/62] avg loss 0.00432996, throughput 4.83217K wps
Begin Testing...
[Epoch 69] train avg loss 0.00442936, dev acc 0.8230, dev avg loss 0.363693, throughput 4.88293K wps
[Epoch 70 Batch 30/62] avg loss 0.00411349, throughput 4.93808K wps
[Epoch 70 Batch 60/62] avg loss 0.00449746, throughput 4.84246K wps
Begin Testing...
[Epoch 70] train avg loss 0.00438911, dev acc 0.8142, dev avg loss 0.367183, throughput 4.89773K wps
[Epoch 71 Batch 30/62] avg loss 0.00407207, throughput 4.98986K wps
[Epoch 71 Batch 60/62] avg loss 0.00431133, throughput 4.90196K wps
Begin Testing...
[Epoch 71] train avg loss 0.00424653, dev acc 0.8230, dev avg loss 0.363468, throughput 4.95115K wps
[Epoch 72 Batch 30/62] avg loss 0.00408908, throughput 5.01219K wps
[Epoch 72 Batch 60/62] avg loss 0.0041106, throughput 4.8796K wps
Begin Testing...
[Epoch 72] train avg loss 0.00417799, dev acc 0.8171, dev avg loss 0.367309, throughput 4.95039K wps
[Epoch 73 Batch 30/62] avg loss 0.00398242, throughput 4.94964K wps
[Epoch 73 Batch 60/62] avg loss 0.00414229, throughput 4.84203K wps
Begin Testing...
[Epoch 73] train avg loss 0.00410849, dev acc 0.8289, dev avg loss 0.36142, throughput 4.90099K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/62] avg loss 0.00417423, throughput 4.93279K wps
[Epoch 74 Batch 60/62] avg loss 0.00386422, throughput 4.85081K wps
Begin Testing...
[Epoch 74] train avg loss 0.00403197, dev acc 0.8230, dev avg loss 0.361085, throughput 4.89957K wps
[Epoch 75 Batch 30/62] avg loss 0.00380108, throughput 4.96357K wps
[Epoch 75 Batch 60/62] avg loss 0.00398891, throughput 4.83979K wps
Begin Testing...
[Epoch 75] train avg loss 0.00394984, dev acc 0.8260, dev avg loss 0.361875, throughput 4.90696K wps
[Epoch 76 Batch 30/62] avg loss 0.00381086, throughput 4.96707K wps
[Epoch 76 Batch 60/62] avg loss 0.00391544, throughput 4.8487K wps
Begin Testing...
[Epoch 76] train avg loss 0.00388359, dev acc 0.8142, dev avg loss 0.368756, throughput 4.91384K wps
[Epoch 77 Batch 30/62] avg loss 0.00383243, throughput 4.96847K wps
[Epoch 77 Batch 60/62] avg loss 0.00366876, throughput 4.86023K wps
Begin Testing...
[Epoch 77] train avg loss 0.0037864, dev acc 0.8171, dev avg loss 0.36292, throughput 4.92065K wps
[Epoch 78 Batch 30/62] avg loss 0.00359763, throughput 4.97246K wps
[Epoch 78 Batch 60/62] avg loss 0.00387535, throughput 4.88638K wps
Begin Testing...
[Epoch 78] train avg loss 0.00378179, dev acc 0.8260, dev avg loss 0.360518, throughput 4.93644K wps
[Epoch 79 Batch 30/62] avg loss 0.0037129, throughput 5.00529K wps
[Epoch 79 Batch 60/62] avg loss 0.0035273, throughput 4.90318K wps
Begin Testing...
[Epoch 79] train avg loss 0.00367243, dev acc 0.8171, dev avg loss 0.364423, throughput 4.95608K wps
[Epoch 80 Batch 30/62] avg loss 0.00375206, throughput 4.94922K wps
[Epoch 80 Batch 60/62] avg loss 0.00346609, throughput 4.8221K wps
Begin Testing...
[Epoch 80] train avg loss 0.00373694, dev acc 0.8112, dev avg loss 0.373765, throughput 4.89147K wps
[Epoch 81 Batch 30/62] avg loss 0.00362442, throughput 4.94427K wps
[Epoch 81 Batch 60/62] avg loss 0.00337154, throughput 4.8395K wps
Begin Testing...
[Epoch 81] train avg loss 0.00352099, dev acc 0.8142, dev avg loss 0.365539, throughput 4.89854K wps
[Epoch 82 Batch 30/62] avg loss 0.003508, throughput 4.96419K wps
[Epoch 82 Batch 60/62] avg loss 0.00348821, throughput 4.86891K wps
Begin Testing...
[Epoch 82] train avg loss 0.00353442, dev acc 0.8319, dev avg loss 0.360954, throughput 4.92246K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.0032571, throughput 4.96171K wps
[Epoch 83 Batch 60/62] avg loss 0.00331252, throughput 4.84112K wps
Begin Testing...
[Epoch 83] train avg loss 0.00330893, dev acc 0.8201, dev avg loss 0.362853, throughput 4.90759K wps
[Epoch 84 Batch 30/62] avg loss 0.003276, throughput 4.96065K wps
[Epoch 84 Batch 60/62] avg loss 0.00330088, throughput 4.84292K wps
Begin Testing...
[Epoch 84] train avg loss 0.0033705, dev acc 0.8171, dev avg loss 0.362915, throughput 4.90753K wps
[Epoch 85 Batch 30/62] avg loss 0.00324174, throughput 4.95044K wps
[Epoch 85 Batch 60/62] avg loss 0.00323805, throughput 4.84705K wps
Begin Testing...
[Epoch 85] train avg loss 0.00329551, dev acc 0.8230, dev avg loss 0.362266, throughput 4.90633K wps
[Epoch 86 Batch 30/62] avg loss 0.00309364, throughput 4.96381K wps
[Epoch 86 Batch 60/62] avg loss 0.00331561, throughput 4.79428K wps
Begin Testing...
[Epoch 86] train avg loss 0.00323442, dev acc 0.8230, dev avg loss 0.359615, throughput 4.88562K wps
[Epoch 87 Batch 30/62] avg loss 0.00309573, throughput 4.91643K wps
[Epoch 87 Batch 60/62] avg loss 0.00319864, throughput 4.81687K wps
Begin Testing...
[Epoch 87] train avg loss 0.0031761, dev acc 0.8230, dev avg loss 0.359915, throughput 4.87364K wps
[Epoch 88 Batch 30/62] avg loss 0.00305051, throughput 4.94587K wps
[Epoch 88 Batch 60/62] avg loss 0.0030045, throughput 4.84338K wps
Begin Testing...
[Epoch 88] train avg loss 0.00307906, dev acc 0.8230, dev avg loss 0.360739, throughput 4.90048K wps
[Epoch 89 Batch 30/62] avg loss 0.00297628, throughput 4.94061K wps
[Epoch 89 Batch 60/62] avg loss 0.00313188, throughput 4.86182K wps
Begin Testing...
[Epoch 89] train avg loss 0.00308311, dev acc 0.8230, dev avg loss 0.35955, throughput 4.90711K wps
[Epoch 90 Batch 30/62] avg loss 0.00302483, throughput 4.98212K wps
[Epoch 90 Batch 60/62] avg loss 0.00306205, throughput 4.87855K wps
Begin Testing...
[Epoch 90] train avg loss 0.00311977, dev acc 0.8201, dev avg loss 0.366193, throughput 4.93697K wps
[Epoch 91 Batch 30/62] avg loss 0.00297114, throughput 5.00296K wps
[Epoch 91 Batch 60/62] avg loss 0.00307469, throughput 4.86875K wps
Begin Testing...
[Epoch 91] train avg loss 0.00306379, dev acc 0.8171, dev avg loss 0.361534, throughput 4.94146K wps
[Epoch 92 Batch 30/62] avg loss 0.00278875, throughput 4.99403K wps
[Epoch 92 Batch 60/62] avg loss 0.00300699, throughput 4.86496K wps
Begin Testing...
[Epoch 92] train avg loss 0.00296748, dev acc 0.8171, dev avg loss 0.364042, throughput 4.93468K wps
[Epoch 93 Batch 30/62] avg loss 0.00267786, throughput 4.95207K wps
[Epoch 93 Batch 60/62] avg loss 0.0029485, throughput 4.85672K wps
Begin Testing...
[Epoch 93] train avg loss 0.00289778, dev acc 0.8201, dev avg loss 0.360139, throughput 4.91129K wps
[Epoch 94 Batch 30/62] avg loss 0.00270546, throughput 4.94559K wps
[Epoch 94 Batch 60/62] avg loss 0.00287019, throughput 4.83256K wps
Begin Testing...
[Epoch 94] train avg loss 0.00280006, dev acc 0.8201, dev avg loss 0.369843, throughput 4.89495K wps
[Epoch 95 Batch 30/62] avg loss 0.00273958, throughput 4.94492K wps
[Epoch 95 Batch 60/62] avg loss 0.00279795, throughput 4.88401K wps
Begin Testing...
[Epoch 95] train avg loss 0.00277377, dev acc 0.8171, dev avg loss 0.362251, throughput 4.92042K wps
[Epoch 96 Batch 30/62] avg loss 0.00262148, throughput 4.99376K wps
[Epoch 96 Batch 60/62] avg loss 0.00273335, throughput 4.86255K wps
Begin Testing...
[Epoch 96] train avg loss 0.00267521, dev acc 0.8201, dev avg loss 0.36833, throughput 4.933K wps
[Epoch 97 Batch 30/62] avg loss 0.00265383, throughput 4.97K wps
[Epoch 97 Batch 60/62] avg loss 0.00272691, throughput 4.84746K wps
Begin Testing...
[Epoch 97] train avg loss 0.00271371, dev acc 0.8171, dev avg loss 0.370007, throughput 4.91366K wps
[Epoch 98 Batch 30/62] avg loss 0.00270589, throughput 4.92809K wps
[Epoch 98 Batch 60/62] avg loss 0.00245755, throughput 4.8262K wps
Begin Testing...
[Epoch 98] train avg loss 0.00260252, dev acc 0.8171, dev avg loss 0.36735, throughput 4.88362K wps
[Epoch 99 Batch 30/62] avg loss 0.00242742, throughput 4.93346K wps
[Epoch 99 Batch 60/62] avg loss 0.00264725, throughput 4.82007K wps
Begin Testing...
[Epoch 99] train avg loss 0.00261992, dev acc 0.8437, dev avg loss 0.359594, throughput 4.88396K wps
Observed Improvement.
Begin Testing...
[Epoch 100 Batch 30/62] avg loss 0.00243491, throughput 4.9144K wps
[Epoch 100 Batch 60/62] avg loss 0.00255481, throughput 4.80826K wps
Begin Testing...
[Epoch 100] train avg loss 0.00250526, dev acc 0.8112, dev avg loss 0.37034, throughput 4.86852K wps
[Epoch 101 Batch 30/62] avg loss 0.00231287, throughput 4.92267K wps
[Epoch 101 Batch 60/62] avg loss 0.00251065, throughput 4.80998K wps
Begin Testing...
[Epoch 101] train avg loss 0.00244197, dev acc 0.8142, dev avg loss 0.372541, throughput 4.87296K wps
[Epoch 102 Batch 30/62] avg loss 0.00229687, throughput 4.95362K wps
[Epoch 102 Batch 60/62] avg loss 0.00232082, throughput 4.84582K wps
Begin Testing...
[Epoch 102] train avg loss 0.00234524, dev acc 0.8407, dev avg loss 0.360421, throughput 4.90565K wps
[Epoch 103 Batch 30/62] avg loss 0.0023379, throughput 4.94966K wps
[Epoch 103 Batch 60/62] avg loss 0.00235925, throughput 4.86764K wps
Begin Testing...
[Epoch 103] train avg loss 0.00237237, dev acc 0.8260, dev avg loss 0.364179, throughput 4.91431K wps
[Epoch 104 Batch 30/62] avg loss 0.00237563, throughput 4.96121K wps
[Epoch 104 Batch 60/62] avg loss 0.0024056, throughput 4.83089K wps
Begin Testing...
[Epoch 104] train avg loss 0.00242542, dev acc 0.8230, dev avg loss 0.361763, throughput 4.90394K wps
[Epoch 105 Batch 30/62] avg loss 0.0024179, throughput 4.94471K wps
[Epoch 105 Batch 60/62] avg loss 0.00228632, throughput 4.85575K wps
Begin Testing...
[Epoch 105] train avg loss 0.0023434, dev acc 0.8201, dev avg loss 0.366378, throughput 4.90741K wps
[Epoch 106 Batch 30/62] avg loss 0.00245021, throughput 4.9804K wps
[Epoch 106 Batch 60/62] avg loss 0.00225743, throughput 4.85953K wps
Begin Testing...
[Epoch 106] train avg loss 0.00235536, dev acc 0.8112, dev avg loss 0.370379, throughput 4.92553K wps
[Epoch 107 Batch 30/62] avg loss 0.00213075, throughput 4.96067K wps
[Epoch 107 Batch 60/62] avg loss 0.00227266, throughput 4.82581K wps
Begin Testing...
[Epoch 107] train avg loss 0.00222823, dev acc 0.8260, dev avg loss 0.36305, throughput 4.89794K wps
[Epoch 108 Batch 30/62] avg loss 0.00219018, throughput 4.93601K wps
[Epoch 108 Batch 60/62] avg loss 0.00224424, throughput 4.83288K wps
Begin Testing...
[Epoch 108] train avg loss 0.00221336, dev acc 0.8260, dev avg loss 0.36468, throughput 4.89124K wps
[Epoch 109 Batch 30/62] avg loss 0.00211696, throughput 4.94358K wps
[Epoch 109 Batch 60/62] avg loss 0.00224757, throughput 4.8252K wps
Begin Testing...
[Epoch 109] train avg loss 0.00221993, dev acc 0.8230, dev avg loss 0.368901, throughput 4.89181K wps
[Epoch 110 Batch 30/62] avg loss 0.00204756, throughput 4.95236K wps
[Epoch 110 Batch 60/62] avg loss 0.00212975, throughput 4.84664K wps
Begin Testing...
[Epoch 110] train avg loss 0.0020954, dev acc 0.8112, dev avg loss 0.375635, throughput 4.90592K wps
[Epoch 111 Batch 30/62] avg loss 0.00202774, throughput 4.95668K wps
[Epoch 111 Batch 60/62] avg loss 0.00217717, throughput 4.86224K wps
Begin Testing...
[Epoch 111] train avg loss 0.00211217, dev acc 0.8201, dev avg loss 0.367426, throughput 4.91526K wps
[Epoch 112 Batch 30/62] avg loss 0.00196972, throughput 4.96434K wps
[Epoch 112 Batch 60/62] avg loss 0.00210952, throughput 4.8383K wps
Begin Testing...
[Epoch 112] train avg loss 0.0020384, dev acc 0.8260, dev avg loss 0.366433, throughput 4.90803K wps
[Epoch 113 Batch 30/62] avg loss 0.00198939, throughput 4.95624K wps
[Epoch 113 Batch 60/62] avg loss 0.00198637, throughput 4.83794K wps
Begin Testing...
[Epoch 113] train avg loss 0.00201206, dev acc 0.8289, dev avg loss 0.365268, throughput 4.90179K wps
[Epoch 114 Batch 30/62] avg loss 0.00194475, throughput 4.93809K wps
[Epoch 114 Batch 60/62] avg loss 0.00190437, throughput 4.84893K wps
Begin Testing...
[Epoch 114] train avg loss 0.00199039, dev acc 0.8289, dev avg loss 0.365977, throughput 4.90083K wps
[Epoch 115 Batch 30/62] avg loss 0.0019093, throughput 4.96807K wps
[Epoch 115 Batch 60/62] avg loss 0.00190458, throughput 4.84734K wps
Begin Testing...
[Epoch 115] train avg loss 0.00196174, dev acc 0.8112, dev avg loss 0.377706, throughput 4.91391K wps
[Epoch 116 Batch 30/62] avg loss 0.00165897, throughput 4.96577K wps
[Epoch 116 Batch 60/62] avg loss 0.00200735, throughput 4.85173K wps
Begin Testing...
[Epoch 116] train avg loss 0.00188306, dev acc 0.8260, dev avg loss 0.367118, throughput 4.91523K wps
[Epoch 117 Batch 30/62] avg loss 0.00187178, throughput 4.93041K wps
[Epoch 117 Batch 60/62] avg loss 0.00184698, throughput 4.8128K wps
Begin Testing...
[Epoch 117] train avg loss 0.00193809, dev acc 0.8289, dev avg loss 0.366651, throughput 4.87837K wps
[Epoch 118 Batch 30/62] avg loss 0.002002, throughput 4.93805K wps
[Epoch 118 Batch 60/62] avg loss 0.00191354, throughput 4.82646K wps
Begin Testing...
[Epoch 118] train avg loss 0.00198065, dev acc 0.8201, dev avg loss 0.378257, throughput 4.88706K wps
[Epoch 119 Batch 30/62] avg loss 0.0017456, throughput 4.93806K wps
[Epoch 119 Batch 60/62] avg loss 0.00170944, throughput 4.79888K wps
Begin Testing...
[Epoch 119] train avg loss 0.00175419, dev acc 0.8230, dev avg loss 0.368658, throughput 4.87595K wps
[Epoch 120 Batch 30/62] avg loss 0.00181844, throughput 4.92919K wps
[Epoch 120 Batch 60/62] avg loss 0.00185118, throughput 4.81007K wps
Begin Testing...
[Epoch 120] train avg loss 0.00187248, dev acc 0.8260, dev avg loss 0.368822, throughput 4.8752K wps
[Epoch 121 Batch 30/62] avg loss 0.00176969, throughput 4.92523K wps
[Epoch 121 Batch 60/62] avg loss 0.00167562, throughput 4.81576K wps
Begin Testing...
[Epoch 121] train avg loss 0.00172258, dev acc 0.8171, dev avg loss 0.37931, throughput 4.87791K wps
[Epoch 122 Batch 30/62] avg loss 0.00175131, throughput 4.9622K wps
[Epoch 122 Batch 60/62] avg loss 0.0016391, throughput 4.85603K wps
Begin Testing...
[Epoch 122] train avg loss 0.00169982, dev acc 0.8289, dev avg loss 0.371984, throughput 4.91569K wps
[Epoch 123 Batch 30/62] avg loss 0.00174079, throughput 5.00858K wps
[Epoch 123 Batch 60/62] avg loss 0.00153059, throughput 4.89343K wps
Begin Testing...
[Epoch 123] train avg loss 0.00169152, dev acc 0.8260, dev avg loss 0.36993, throughput 4.95669K wps
[Epoch 124 Batch 30/62] avg loss 0.00171848, throughput 4.95414K wps
[Epoch 124 Batch 60/62] avg loss 0.00163627, throughput 4.85956K wps
Begin Testing...
[Epoch 124] train avg loss 0.00167225, dev acc 0.8142, dev avg loss 0.3807, throughput 4.9136K wps
[Epoch 125 Batch 30/62] avg loss 0.00172377, throughput 4.98447K wps
[Epoch 125 Batch 60/62] avg loss 0.00161762, throughput 4.86134K wps
Begin Testing...
[Epoch 125] train avg loss 0.00166866, dev acc 0.8142, dev avg loss 0.381476, throughput 4.92716K wps
[Epoch 126 Batch 30/62] avg loss 0.00171179, throughput 4.93959K wps
[Epoch 126 Batch 60/62] avg loss 0.00166883, throughput 4.82639K wps
Begin Testing...
[Epoch 126] train avg loss 0.00168815, dev acc 0.8142, dev avg loss 0.383731, throughput 4.88922K wps
[Epoch 127 Batch 30/62] avg loss 0.00165742, throughput 5.00297K wps
[Epoch 127 Batch 60/62] avg loss 0.00156886, throughput 4.87705K wps
Begin Testing...
[Epoch 127] train avg loss 0.00162386, dev acc 0.8230, dev avg loss 0.379912, throughput 4.94682K wps
[Epoch 128 Batch 30/62] avg loss 0.00163791, throughput 5.00025K wps
[Epoch 128 Batch 60/62] avg loss 0.00158778, throughput 4.89156K wps
Begin Testing...
[Epoch 128] train avg loss 0.00164797, dev acc 0.8230, dev avg loss 0.380709, throughput 4.95266K wps
[Epoch 129 Batch 30/62] avg loss 0.00151663, throughput 5.02115K wps
[Epoch 129 Batch 60/62] avg loss 0.00161163, throughput 4.89269K wps
Begin Testing...
[Epoch 129] train avg loss 0.00160564, dev acc 0.8348, dev avg loss 0.372382, throughput 4.9626K wps
[Epoch 130 Batch 30/62] avg loss 0.00148259, throughput 5.0128K wps
[Epoch 130 Batch 60/62] avg loss 0.00152547, throughput 4.86172K wps
Begin Testing...
[Epoch 130] train avg loss 0.00153144, dev acc 0.8230, dev avg loss 0.384031, throughput 4.94211K wps
[Epoch 131 Batch 30/62] avg loss 0.00139174, throughput 4.91789K wps
[Epoch 131 Batch 60/62] avg loss 0.00163982, throughput 4.82246K wps
Begin Testing...
[Epoch 131] train avg loss 0.00155017, dev acc 0.8260, dev avg loss 0.376813, throughput 4.8772K wps
[Epoch 132 Batch 30/62] avg loss 0.00153697, throughput 4.93389K wps
[Epoch 132 Batch 60/62] avg loss 0.00148321, throughput 4.82612K wps
Begin Testing...
[Epoch 132] train avg loss 0.00153701, dev acc 0.8289, dev avg loss 0.381557, throughput 4.88515K wps
[Epoch 133 Batch 30/62] avg loss 0.00145197, throughput 4.91346K wps
[Epoch 133 Batch 60/62] avg loss 0.00152952, throughput 4.82543K wps
Begin Testing...
[Epoch 133] train avg loss 0.00149449, dev acc 0.8289, dev avg loss 0.381279, throughput 4.87691K wps
[Epoch 134 Batch 30/62] avg loss 0.00138069, throughput 4.93484K wps
[Epoch 134 Batch 60/62] avg loss 0.00131468, throughput 4.82331K wps
Begin Testing...
[Epoch 134] train avg loss 0.00135592, dev acc 0.8289, dev avg loss 0.381647, throughput 4.88456K wps
[Epoch 135 Batch 30/62] avg loss 0.00138089, throughput 4.96707K wps
[Epoch 135 Batch 60/62] avg loss 0.00134758, throughput 4.88363K wps
Begin Testing...
[Epoch 135] train avg loss 0.00146467, dev acc 0.8289, dev avg loss 0.37756, throughput 4.93291K wps
[Epoch 136 Batch 30/62] avg loss 0.00145489, throughput 4.99107K wps
[Epoch 136 Batch 60/62] avg loss 0.00138543, throughput 4.86671K wps
Begin Testing...
[Epoch 136] train avg loss 0.00143591, dev acc 0.8260, dev avg loss 0.378043, throughput 4.9338K wps
[Epoch 137 Batch 30/62] avg loss 0.00130555, throughput 4.99565K wps
[Epoch 137 Batch 60/62] avg loss 0.00133742, throughput 4.84313K wps
Begin Testing...
[Epoch 137] train avg loss 0.00133792, dev acc 0.8260, dev avg loss 0.378666, throughput 4.92306K wps
[Epoch 138 Batch 30/62] avg loss 0.00136795, throughput 4.96549K wps
[Epoch 138 Batch 60/62] avg loss 0.00139517, throughput 4.82803K wps
Begin Testing...
[Epoch 138] train avg loss 0.00137389, dev acc 0.8230, dev avg loss 0.387752, throughput 4.90246K wps
[Epoch 139 Batch 30/62] avg loss 0.0013818, throughput 4.94214K wps
[Epoch 139 Batch 60/62] avg loss 0.00123678, throughput 4.84892K wps
Begin Testing...
[Epoch 139] train avg loss 0.00134515, dev acc 0.8289, dev avg loss 0.379404, throughput 4.90258K wps
[Epoch 140 Batch 30/62] avg loss 0.00141007, throughput 4.97557K wps
[Epoch 140 Batch 60/62] avg loss 0.00128965, throughput 4.88565K wps
Begin Testing...
[Epoch 140] train avg loss 0.00135367, dev acc 0.8260, dev avg loss 0.386746, throughput 4.93733K wps
[Epoch 141 Batch 30/62] avg loss 0.0013361, throughput 5.00369K wps
[Epoch 141 Batch 60/62] avg loss 0.00125795, throughput 4.85612K wps
Begin Testing...
[Epoch 141] train avg loss 0.00130755, dev acc 0.8260, dev avg loss 0.384902, throughput 4.93409K wps
[Epoch 142 Batch 30/62] avg loss 0.00130167, throughput 4.963K wps
[Epoch 142 Batch 60/62] avg loss 0.00134115, throughput 4.84311K wps
Begin Testing...
[Epoch 142] train avg loss 0.00133582, dev acc 0.8289, dev avg loss 0.383469, throughput 4.90896K wps
[Epoch 143 Batch 30/62] avg loss 0.00135964, throughput 4.9362K wps
[Epoch 143 Batch 60/62] avg loss 0.00120727, throughput 4.83702K wps
Begin Testing...
[Epoch 143] train avg loss 0.00129011, dev acc 0.8289, dev avg loss 0.3876, throughput 4.89518K wps
[Epoch 144 Batch 30/62] avg loss 0.00127479, throughput 4.94116K wps
[Epoch 144 Batch 60/62] avg loss 0.0012092, throughput 4.82752K wps
Begin Testing...
[Epoch 144] train avg loss 0.00124823, dev acc 0.8289, dev avg loss 0.3854, throughput 4.89043K wps
[Epoch 145 Batch 30/62] avg loss 0.00114886, throughput 4.93206K wps
[Epoch 145 Batch 60/62] avg loss 0.00123138, throughput 4.8202K wps
Begin Testing...
[Epoch 145] train avg loss 0.00120522, dev acc 0.8289, dev avg loss 0.386304, throughput 4.88223K wps
[Epoch 146 Batch 30/62] avg loss 0.00117964, throughput 4.9432K wps
[Epoch 146 Batch 60/62] avg loss 0.0012608, throughput 4.83269K wps
Begin Testing...
[Epoch 146] train avg loss 0.0012493, dev acc 0.8171, dev avg loss 0.394658, throughput 4.89556K wps
[Epoch 147 Batch 30/62] avg loss 0.00126229, throughput 4.94705K wps
[Epoch 147 Batch 60/62] avg loss 0.00113098, throughput 4.84091K wps
Begin Testing...
[Epoch 147] train avg loss 0.00121684, dev acc 0.8289, dev avg loss 0.388571, throughput 4.8999K wps
[Epoch 148 Batch 30/62] avg loss 0.00113718, throughput 4.9457K wps
[Epoch 148 Batch 60/62] avg loss 0.00117189, throughput 4.83018K wps
Begin Testing...
[Epoch 148] train avg loss 0.00117428, dev acc 0.8319, dev avg loss 0.38438, throughput 4.89424K wps
[Epoch 149 Batch 30/62] avg loss 0.00114402, throughput 4.96019K wps
[Epoch 149 Batch 60/62] avg loss 0.00122404, throughput 4.82364K wps
Begin Testing...
[Epoch 149] train avg loss 0.00122732, dev acc 0.8319, dev avg loss 0.385727, throughput 4.89841K wps
[Epoch 150 Batch 30/62] avg loss 0.00121397, throughput 4.93442K wps
[Epoch 150 Batch 60/62] avg loss 0.00115732, throughput 4.8204K wps
Begin Testing...
[Epoch 150] train avg loss 0.00124702, dev acc 0.8319, dev avg loss 0.387121, throughput 4.88423K wps
[Epoch 151 Batch 30/62] avg loss 0.00110976, throughput 4.94727K wps
[Epoch 151 Batch 60/62] avg loss 0.00108431, throughput 4.846K wps
Begin Testing...
[Epoch 151] train avg loss 0.00110717, dev acc 0.8230, dev avg loss 0.40013, throughput 4.90364K wps
[Epoch 152 Batch 30/62] avg loss 0.00108214, throughput 4.95942K wps
[Epoch 152 Batch 60/62] avg loss 0.00122809, throughput 4.83873K wps
Begin Testing...
[Epoch 152] train avg loss 0.00117429, dev acc 0.8378, dev avg loss 0.387072, throughput 4.9048K wps
[Epoch 153 Batch 30/62] avg loss 0.00113748, throughput 4.98672K wps
[Epoch 153 Batch 60/62] avg loss 0.00103996, throughput 4.89937K wps
Begin Testing...
[Epoch 153] train avg loss 0.00109603, dev acc 0.8348, dev avg loss 0.386898, throughput 4.94957K wps
[Epoch 154 Batch 30/62] avg loss 0.00112056, throughput 5.00944K wps
[Epoch 154 Batch 60/62] avg loss 0.00108572, throughput 4.89812K wps
Begin Testing...
[Epoch 154] train avg loss 0.00111331, dev acc 0.8260, dev avg loss 0.391554, throughput 4.96059K wps
[Epoch 155 Batch 30/62] avg loss 0.00106212, throughput 5.01362K wps
[Epoch 155 Batch 60/62] avg loss 0.00107313, throughput 4.86392K wps
Begin Testing...
[Epoch 155] train avg loss 0.00108419, dev acc 0.8230, dev avg loss 0.393441, throughput 4.9431K wps
[Epoch 156 Batch 30/62] avg loss 0.0010319, throughput 4.95512K wps
[Epoch 156 Batch 60/62] avg loss 0.00100708, throughput 4.83873K wps
Begin Testing...
[Epoch 156] train avg loss 0.0010378, dev acc 0.8289, dev avg loss 0.387317, throughput 4.90213K wps
[Epoch 157 Batch 30/62] avg loss 0.00112725, throughput 4.92563K wps
[Epoch 157 Batch 60/62] avg loss 0.0011078, throughput 4.80323K wps
Begin Testing...
[Epoch 157] train avg loss 0.00111753, dev acc 0.8289, dev avg loss 0.390534, throughput 4.87049K wps
[Epoch 158 Batch 30/62] avg loss 0.000978177, throughput 4.92376K wps
[Epoch 158 Batch 60/62] avg loss 0.00102823, throughput 4.8161K wps
Begin Testing...
[Epoch 158] train avg loss 0.00101853, dev acc 0.8230, dev avg loss 0.399203, throughput 4.87619K wps
[Epoch 159 Batch 30/62] avg loss 0.00104599, throughput 4.9152K wps
[Epoch 159 Batch 60/62] avg loss 0.00099307, throughput 4.84216K wps
Begin Testing...
[Epoch 159] train avg loss 0.00104706, dev acc 0.8289, dev avg loss 0.389621, throughput 4.88315K wps
[Epoch 160 Batch 30/62] avg loss 0.00100813, throughput 4.95596K wps
[Epoch 160 Batch 60/62] avg loss 0.00103074, throughput 4.84255K wps
Begin Testing...
[Epoch 160] train avg loss 0.00101953, dev acc 0.8289, dev avg loss 0.393867, throughput 4.90479K wps
[Epoch 161 Batch 30/62] avg loss 0.000931261, throughput 4.92492K wps
[Epoch 161 Batch 60/62] avg loss 0.00107641, throughput 4.84485K wps
Begin Testing...
[Epoch 161] train avg loss 0.00101507, dev acc 0.8289, dev avg loss 0.397078, throughput 4.89248K wps
[Epoch 162 Batch 30/62] avg loss 0.000953774, throughput 4.91894K wps
[Epoch 162 Batch 60/62] avg loss 0.00103776, throughput 4.82938K wps
Begin Testing...
[Epoch 162] train avg loss 0.00102438, dev acc 0.8289, dev avg loss 0.39037, throughput 4.88134K wps
[Epoch 163 Batch 30/62] avg loss 0.00102271, throughput 4.9565K wps
[Epoch 163 Batch 60/62] avg loss 0.00100891, throughput 4.81853K wps
Begin Testing...
[Epoch 163] train avg loss 0.00103699, dev acc 0.8289, dev avg loss 0.399277, throughput 4.89331K wps
[Epoch 164 Batch 30/62] avg loss 0.000902989, throughput 4.92864K wps
[Epoch 164 Batch 60/62] avg loss 0.000985673, throughput 4.82617K wps
Begin Testing...
[Epoch 164] train avg loss 0.00095676, dev acc 0.8348, dev avg loss 0.396447, throughput 4.88373K wps
[Epoch 165 Batch 30/62] avg loss 0.000981116, throughput 4.94433K wps
[Epoch 165 Batch 60/62] avg loss 0.000945678, throughput 4.82164K wps
Begin Testing...
[Epoch 165] train avg loss 0.000993323, dev acc 0.8289, dev avg loss 0.399381, throughput 4.89064K wps
[Epoch 166 Batch 30/62] avg loss 0.000896876, throughput 4.92894K wps
[Epoch 166 Batch 60/62] avg loss 0.000876196, throughput 4.82561K wps
Begin Testing...
[Epoch 166] train avg loss 0.000899181, dev acc 0.8348, dev avg loss 0.395443, throughput 4.88502K wps
[Epoch 167 Batch 30/62] avg loss 0.000915247, throughput 4.94925K wps
[Epoch 167 Batch 60/62] avg loss 0.000909126, throughput 4.85687K wps
Begin Testing...
[Epoch 167] train avg loss 0.000919939, dev acc 0.8378, dev avg loss 0.397265, throughput 4.90906K wps
[Epoch 168 Batch 30/62] avg loss 0.00089364, throughput 4.95139K wps
[Epoch 168 Batch 60/62] avg loss 0.000976583, throughput 4.85228K wps
Begin Testing...
[Epoch 168] train avg loss 0.000939617, dev acc 0.8230, dev avg loss 0.405221, throughput 4.90876K wps
[Epoch 169 Batch 30/62] avg loss 0.000918062, throughput 4.98791K wps
[Epoch 169 Batch 60/62] avg loss 0.000890507, throughput 4.88186K wps
Begin Testing...
[Epoch 169] train avg loss 0.000918903, dev acc 0.8378, dev avg loss 0.400144, throughput 4.94127K wps
[Epoch 170 Batch 30/62] avg loss 0.000870888, throughput 4.96186K wps
[Epoch 170 Batch 60/62] avg loss 0.000911359, throughput 4.86712K wps
Begin Testing...
[Epoch 170] train avg loss 0.000915173, dev acc 0.8230, dev avg loss 0.415729, throughput 4.92088K wps
[Epoch 171 Batch 30/62] avg loss 0.000904477, throughput 4.94765K wps
[Epoch 171 Batch 60/62] avg loss 0.000900682, throughput 4.83125K wps
Begin Testing...
[Epoch 171] train avg loss 0.0009196, dev acc 0.8378, dev avg loss 0.395931, throughput 4.89544K wps
[Epoch 172 Batch 30/62] avg loss 0.000895919, throughput 4.96217K wps
[Epoch 172 Batch 60/62] avg loss 0.000896967, throughput 4.8759K wps
Begin Testing...
[Epoch 172] train avg loss 0.000901913, dev acc 0.8348, dev avg loss 0.398389, throughput 4.92583K wps
[Epoch 173 Batch 30/62] avg loss 0.000893121, throughput 4.98741K wps
[Epoch 173 Batch 60/62] avg loss 0.000857853, throughput 4.87685K wps
Begin Testing...
[Epoch 173] train avg loss 0.000884633, dev acc 0.8289, dev avg loss 0.402838, throughput 4.94002K wps
[Epoch 174 Batch 30/62] avg loss 0.000882989, throughput 4.98142K wps
[Epoch 174 Batch 60/62] avg loss 0.000881145, throughput 4.84058K wps
Begin Testing...
[Epoch 174] train avg loss 0.000893335, dev acc 0.8378, dev avg loss 0.402143, throughput 4.91679K wps
[Epoch 175 Batch 30/62] avg loss 0.000822045, throughput 4.94898K wps
[Epoch 175 Batch 60/62] avg loss 0.000818058, throughput 4.8505K wps
Begin Testing...
[Epoch 175] train avg loss 0.000829332, dev acc 0.8348, dev avg loss 0.403283, throughput 4.90627K wps
[Epoch 176 Batch 30/62] avg loss 0.000868944, throughput 4.98117K wps
[Epoch 176 Batch 60/62] avg loss 0.000800002, throughput 4.81659K wps
Begin Testing...
[Epoch 176] train avg loss 0.00083337, dev acc 0.8378, dev avg loss 0.403739, throughput 4.90297K wps
[Epoch 177 Batch 30/62] avg loss 0.000807821, throughput 4.95491K wps
[Epoch 177 Batch 60/62] avg loss 0.000745258, throughput 4.83031K wps
Begin Testing...
[Epoch 177] train avg loss 0.000780895, dev acc 0.8348, dev avg loss 0.401096, throughput 4.89987K wps
[Epoch 178 Batch 30/62] avg loss 0.000798535, throughput 4.93265K wps
[Epoch 178 Batch 60/62] avg loss 0.000897656, throughput 4.82989K wps
Begin Testing...
[Epoch 178] train avg loss 0.000865845, dev acc 0.8230, dev avg loss 0.423239, throughput 4.88751K wps
[Epoch 179 Batch 30/62] avg loss 0.000853633, throughput 4.92191K wps
[Epoch 179 Batch 60/62] avg loss 0.00087952, throughput 4.81598K wps
Begin Testing...
[Epoch 179] train avg loss 0.000871094, dev acc 0.8378, dev avg loss 0.403472, throughput 4.87585K wps
[Epoch 180 Batch 30/62] avg loss 0.000780214, throughput 4.94563K wps
[Epoch 180 Batch 60/62] avg loss 0.000883964, throughput 4.82807K wps
Begin Testing...
[Epoch 180] train avg loss 0.000848667, dev acc 0.8201, dev avg loss 0.412544, throughput 4.89199K wps
[Epoch 181 Batch 30/62] avg loss 0.000809414, throughput 4.93151K wps
[Epoch 181 Batch 60/62] avg loss 0.000783638, throughput 4.81005K wps
Begin Testing...
[Epoch 181] train avg loss 0.00081014, dev acc 0.8201, dev avg loss 0.414506, throughput 4.8759K wps
[Epoch 182 Batch 30/62] avg loss 0.000829732, throughput 4.9295K wps
[Epoch 182 Batch 60/62] avg loss 0.000818807, throughput 4.83141K wps
Begin Testing...
[Epoch 182] train avg loss 0.000825003, dev acc 0.8348, dev avg loss 0.406399, throughput 4.88713K wps
[Epoch 183 Batch 30/62] avg loss 0.000812505, throughput 4.93216K wps
[Epoch 183 Batch 60/62] avg loss 0.000751083, throughput 4.85021K wps
Begin Testing...
[Epoch 183] train avg loss 0.000801279, dev acc 0.8171, dev avg loss 0.41187, throughput 4.89779K wps
[Epoch 184 Batch 30/62] avg loss 0.00079525, throughput 4.92374K wps
[Epoch 184 Batch 60/62] avg loss 0.000793811, throughput 4.84611K wps
Begin Testing...
[Epoch 184] train avg loss 0.000801066, dev acc 0.8348, dev avg loss 0.409075, throughput 4.89231K wps
[Epoch 185 Batch 30/62] avg loss 0.000757636, throughput 4.95321K wps
[Epoch 185 Batch 60/62] avg loss 0.000819399, throughput 4.83732K wps
Begin Testing...
[Epoch 185] train avg loss 0.000798137, dev acc 0.8378, dev avg loss 0.406668, throughput 4.90128K wps
[Epoch 186 Batch 30/62] avg loss 0.00076862, throughput 4.94196K wps
[Epoch 186 Batch 60/62] avg loss 0.00077306, throughput 4.88902K wps
Begin Testing...
[Epoch 186] train avg loss 0.000793128, dev acc 0.8230, dev avg loss 0.416384, throughput 4.92297K wps
[Epoch 187 Batch 30/62] avg loss 0.000690729, throughput 4.98621K wps
[Epoch 187 Batch 60/62] avg loss 0.000810514, throughput 4.86966K wps
Begin Testing...
[Epoch 187] train avg loss 0.000751458, dev acc 0.8348, dev avg loss 0.408842, throughput 4.9342K wps
[Epoch 188 Batch 30/62] avg loss 0.0007756, throughput 4.97904K wps
[Epoch 188 Batch 60/62] avg loss 0.000789192, throughput 4.86432K wps
Begin Testing...
[Epoch 188] train avg loss 0.000790268, dev acc 0.8289, dev avg loss 0.412474, throughput 4.92851K wps
[Epoch 189 Batch 30/62] avg loss 0.000680799, throughput 4.94797K wps
[Epoch 189 Batch 60/62] avg loss 0.000718007, throughput 4.8312K wps
Begin Testing...
[Epoch 189] train avg loss 0.000712588, dev acc 0.8348, dev avg loss 0.410755, throughput 4.89698K wps
[Epoch 190 Batch 30/62] avg loss 0.000697277, throughput 4.96498K wps
[Epoch 190 Batch 60/62] avg loss 0.000735713, throughput 4.82463K wps
Begin Testing...
[Epoch 190] train avg loss 0.000731943, dev acc 0.8348, dev avg loss 0.408531, throughput 4.90117K wps
[Epoch 191 Batch 30/62] avg loss 0.000695868, throughput 4.96501K wps
[Epoch 191 Batch 60/62] avg loss 0.000735588, throughput 4.84373K wps
Begin Testing...
[Epoch 191] train avg loss 0.000721573, dev acc 0.8348, dev avg loss 0.413217, throughput 4.90968K wps
[Epoch 192 Batch 30/62] avg loss 0.000728491, throughput 4.92501K wps
[Epoch 192 Batch 60/62] avg loss 0.000710799, throughput 4.82952K wps
Begin Testing...
[Epoch 192] train avg loss 0.000727643, dev acc 0.8289, dev avg loss 0.416073, throughput 4.88277K wps
[Epoch 193 Batch 30/62] avg loss 0.00071659, throughput 4.94365K wps
[Epoch 193 Batch 60/62] avg loss 0.000707932, throughput 4.8206K wps
Begin Testing...
[Epoch 193] train avg loss 0.00071885, dev acc 0.8230, dev avg loss 0.419737, throughput 4.88934K wps
[Epoch 194 Batch 30/62] avg loss 0.000706563, throughput 4.94964K wps
[Epoch 194 Batch 60/62] avg loss 0.0007319, throughput 4.84253K wps
Begin Testing...
[Epoch 194] train avg loss 0.00075032, dev acc 0.8319, dev avg loss 0.410891, throughput 4.9024K wps
[Epoch 195 Batch 30/62] avg loss 0.000685402, throughput 4.94919K wps
[Epoch 195 Batch 60/62] avg loss 0.000662334, throughput 4.8086K wps
Begin Testing...
[Epoch 195] train avg loss 0.000679669, dev acc 0.8230, dev avg loss 0.418433, throughput 4.88445K wps
[Epoch 196 Batch 30/62] avg loss 0.00069371, throughput 4.94431K wps
[Epoch 196 Batch 60/62] avg loss 0.000644419, throughput 4.86421K wps
Begin Testing...
[Epoch 196] train avg loss 0.000671208, dev acc 0.8348, dev avg loss 0.414747, throughput 4.91204K wps
[Epoch 197 Batch 30/62] avg loss 0.000634128, throughput 4.96836K wps
[Epoch 197 Batch 60/62] avg loss 0.000698401, throughput 4.84457K wps
Begin Testing...
[Epoch 197] train avg loss 0.000679623, dev acc 0.8171, dev avg loss 0.433965, throughput 4.9133K wps
[Epoch 198 Batch 30/62] avg loss 0.000654084, throughput 4.94506K wps
[Epoch 198 Batch 60/62] avg loss 0.000659367, throughput 4.81872K wps
Begin Testing...
[Epoch 198] train avg loss 0.000660631, dev acc 0.8201, dev avg loss 0.424885, throughput 4.88818K wps
[Epoch 199 Batch 30/62] avg loss 0.000678743, throughput 4.94051K wps
[Epoch 199 Batch 60/62] avg loss 0.00063441, throughput 4.8346K wps
Begin Testing...
[Epoch 199] train avg loss 0.000684406, dev acc 0.8348, dev avg loss 0.417292, throughput 4.89441K wps
Test loss 0.357892, test acc 0.8541
Total time cost 276.58s
[Epoch 0 Batch 30/62] avg loss 0.0136246, throughput 4.7597K wps
[Epoch 0 Batch 60/62] avg loss 0.0130988, throughput 4.87128K wps
Begin Testing...
[Epoch 0] train avg loss 0.0135185, dev acc 0.6254, dev avg loss 0.661929, throughput 4.8244K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0129907, throughput 5.0126K wps
[Epoch 1 Batch 60/62] avg loss 0.0132814, throughput 4.8553K wps
Begin Testing...
[Epoch 1] train avg loss 0.013299, dev acc 0.6254, dev avg loss 0.654544, throughput 4.9403K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130117, throughput 5.00115K wps
[Epoch 2 Batch 60/62] avg loss 0.012957, throughput 4.89473K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131979, dev acc 0.6254, dev avg loss 0.649144, throughput 4.95463K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0127012, throughput 5.0043K wps
[Epoch 3 Batch 60/62] avg loss 0.0128852, throughput 4.87169K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129635, dev acc 0.6254, dev avg loss 0.64307, throughput 4.94364K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0127787, throughput 4.95943K wps
[Epoch 4 Batch 60/62] avg loss 0.0125598, throughput 4.8295K wps
Begin Testing...
[Epoch 4] train avg loss 0.0128253, dev acc 0.6254, dev avg loss 0.637947, throughput 4.89875K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126755, throughput 4.93895K wps
[Epoch 5 Batch 60/62] avg loss 0.0125833, throughput 4.81239K wps
Begin Testing...
[Epoch 5] train avg loss 0.0128214, dev acc 0.6254, dev avg loss 0.632538, throughput 4.87992K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.012289, throughput 4.92554K wps
[Epoch 6 Batch 60/62] avg loss 0.0125219, throughput 4.82534K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125505, dev acc 0.6254, dev avg loss 0.626297, throughput 4.88181K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0124418, throughput 4.92897K wps
[Epoch 7 Batch 60/62] avg loss 0.0122715, throughput 4.84613K wps
Begin Testing...
[Epoch 7] train avg loss 0.0125047, dev acc 0.6254, dev avg loss 0.622681, throughput 4.8935K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0122189, throughput 4.93412K wps
[Epoch 8 Batch 60/62] avg loss 0.0123794, throughput 4.83572K wps
Begin Testing...
[Epoch 8] train avg loss 0.0124647, dev acc 0.6254, dev avg loss 0.614415, throughput 4.89106K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0120207, throughput 4.9258K wps
[Epoch 9 Batch 60/62] avg loss 0.0121163, throughput 4.83224K wps
Begin Testing...
[Epoch 9] train avg loss 0.0122398, dev acc 0.6283, dev avg loss 0.607571, throughput 4.88493K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0118795, throughput 4.96364K wps
[Epoch 10 Batch 60/62] avg loss 0.0121393, throughput 4.84691K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121449, dev acc 0.6283, dev avg loss 0.601605, throughput 4.91011K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0119035, throughput 4.95631K wps
[Epoch 11 Batch 60/62] avg loss 0.0116691, throughput 4.82458K wps
Begin Testing...
[Epoch 11] train avg loss 0.0118869, dev acc 0.6254, dev avg loss 0.596481, throughput 4.89624K wps
[Epoch 12 Batch 30/62] avg loss 0.0115881, throughput 4.91791K wps
[Epoch 12 Batch 60/62] avg loss 0.0116657, throughput 4.83966K wps
Begin Testing...
[Epoch 12] train avg loss 0.0117676, dev acc 0.6431, dev avg loss 0.586842, throughput 4.88578K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0116977, throughput 4.94142K wps
[Epoch 13 Batch 60/62] avg loss 0.0113182, throughput 4.81301K wps
Begin Testing...
[Epoch 13] train avg loss 0.0116985, dev acc 0.6490, dev avg loss 0.578609, throughput 4.8828K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0115198, throughput 4.94197K wps
[Epoch 14 Batch 60/62] avg loss 0.0111791, throughput 4.83857K wps
Begin Testing...
[Epoch 14] train avg loss 0.0114911, dev acc 0.6549, dev avg loss 0.571015, throughput 4.89519K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0113056, throughput 4.91577K wps
[Epoch 15 Batch 60/62] avg loss 0.0109122, throughput 4.83478K wps
Begin Testing...
[Epoch 15] train avg loss 0.0111988, dev acc 0.6460, dev avg loss 0.566222, throughput 4.87988K wps
[Epoch 16 Batch 30/62] avg loss 0.011076, throughput 4.94004K wps
[Epoch 16 Batch 60/62] avg loss 0.0106908, throughput 4.81885K wps
Begin Testing...
[Epoch 16] train avg loss 0.0110393, dev acc 0.6932, dev avg loss 0.552579, throughput 4.88618K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0107428, throughput 4.94929K wps
[Epoch 17 Batch 60/62] avg loss 0.0107675, throughput 4.82284K wps
Begin Testing...
[Epoch 17] train avg loss 0.0108709, dev acc 0.6844, dev avg loss 0.544541, throughput 4.89075K wps
[Epoch 18 Batch 30/62] avg loss 0.0107304, throughput 4.92593K wps
[Epoch 18 Batch 60/62] avg loss 0.0105037, throughput 4.83318K wps
Begin Testing...
[Epoch 18] train avg loss 0.0107723, dev acc 0.6991, dev avg loss 0.535893, throughput 4.88845K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0105167, throughput 4.94334K wps
[Epoch 19 Batch 60/62] avg loss 0.0103892, throughput 4.8444K wps
Begin Testing...
[Epoch 19] train avg loss 0.0105942, dev acc 0.7021, dev avg loss 0.528483, throughput 4.89697K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0101509, throughput 4.93739K wps
[Epoch 20 Batch 60/62] avg loss 0.0102005, throughput 4.84759K wps
Begin Testing...
[Epoch 20] train avg loss 0.0103017, dev acc 0.7404, dev avg loss 0.518381, throughput 4.8996K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0101857, throughput 4.94397K wps
[Epoch 21 Batch 60/62] avg loss 0.0100325, throughput 4.85961K wps
Begin Testing...
[Epoch 21] train avg loss 0.0102212, dev acc 0.7404, dev avg loss 0.510309, throughput 4.90901K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00972254, throughput 4.96665K wps
[Epoch 22 Batch 60/62] avg loss 0.0101412, throughput 4.8442K wps
Begin Testing...
[Epoch 22] train avg loss 0.0100497, dev acc 0.7522, dev avg loss 0.50277, throughput 4.91076K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.00953694, throughput 4.98683K wps
[Epoch 23 Batch 60/62] avg loss 0.00982728, throughput 4.89792K wps
Begin Testing...
[Epoch 23] train avg loss 0.00982113, dev acc 0.7463, dev avg loss 0.49602, throughput 4.94933K wps
[Epoch 24 Batch 30/62] avg loss 0.00972099, throughput 4.98456K wps
[Epoch 24 Batch 60/62] avg loss 0.00947986, throughput 4.89862K wps
Begin Testing...
[Epoch 24] train avg loss 0.00965954, dev acc 0.7522, dev avg loss 0.492172, throughput 4.94791K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00937117, throughput 5.00715K wps
[Epoch 25 Batch 60/62] avg loss 0.00920938, throughput 4.87159K wps
Begin Testing...
[Epoch 25] train avg loss 0.0093643, dev acc 0.7552, dev avg loss 0.482281, throughput 4.94331K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.00921388, throughput 4.95425K wps
[Epoch 26 Batch 60/62] avg loss 0.00915643, throughput 4.84011K wps
Begin Testing...
[Epoch 26] train avg loss 0.00929161, dev acc 0.7552, dev avg loss 0.478602, throughput 4.90301K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.00916171, throughput 4.96423K wps
[Epoch 27 Batch 60/62] avg loss 0.00887427, throughput 4.84312K wps
Begin Testing...
[Epoch 27] train avg loss 0.00913143, dev acc 0.7552, dev avg loss 0.473571, throughput 4.90922K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00896058, throughput 4.94431K wps
[Epoch 28 Batch 60/62] avg loss 0.00880433, throughput 4.81703K wps
Begin Testing...
[Epoch 28] train avg loss 0.00905643, dev acc 0.7758, dev avg loss 0.465137, throughput 4.88697K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00881167, throughput 4.93282K wps
[Epoch 29 Batch 60/62] avg loss 0.008713, throughput 4.8151K wps
Begin Testing...
[Epoch 29] train avg loss 0.00898878, dev acc 0.7758, dev avg loss 0.461777, throughput 4.88094K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00856378, throughput 4.92854K wps
[Epoch 30 Batch 60/62] avg loss 0.00859768, throughput 4.85201K wps
Begin Testing...
[Epoch 30] train avg loss 0.0087814, dev acc 0.7906, dev avg loss 0.455249, throughput 4.89641K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00842174, throughput 4.91717K wps
[Epoch 31 Batch 60/62] avg loss 0.00856865, throughput 4.8234K wps
Begin Testing...
[Epoch 31] train avg loss 0.00859282, dev acc 0.8142, dev avg loss 0.451651, throughput 4.87657K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.0087062, throughput 4.9284K wps
[Epoch 32 Batch 60/62] avg loss 0.00815644, throughput 4.84177K wps
Begin Testing...
[Epoch 32] train avg loss 0.00860664, dev acc 0.7935, dev avg loss 0.446908, throughput 4.89329K wps
[Epoch 33 Batch 30/62] avg loss 0.00841059, throughput 4.98196K wps
[Epoch 33 Batch 60/62] avg loss 0.00796768, throughput 4.87043K wps
Begin Testing...
[Epoch 33] train avg loss 0.0082575, dev acc 0.7758, dev avg loss 0.447103, throughput 4.93158K wps
[Epoch 34 Batch 30/62] avg loss 0.00818577, throughput 4.95712K wps
[Epoch 34 Batch 60/62] avg loss 0.00820406, throughput 4.8632K wps
Begin Testing...
[Epoch 34] train avg loss 0.00827919, dev acc 0.7906, dev avg loss 0.439887, throughput 4.91658K wps
[Epoch 35 Batch 30/62] avg loss 0.00786219, throughput 4.97311K wps
[Epoch 35 Batch 60/62] avg loss 0.00801659, throughput 4.87085K wps
Begin Testing...
[Epoch 35] train avg loss 0.00809538, dev acc 0.7994, dev avg loss 0.43619, throughput 4.92948K wps
[Epoch 36 Batch 30/62] avg loss 0.00779679, throughput 4.96053K wps
[Epoch 36 Batch 60/62] avg loss 0.00790588, throughput 4.84659K wps
Begin Testing...
[Epoch 36] train avg loss 0.00796635, dev acc 0.7994, dev avg loss 0.432873, throughput 4.91006K wps
[Epoch 37 Batch 30/62] avg loss 0.00755252, throughput 4.95247K wps
[Epoch 37 Batch 60/62] avg loss 0.0078875, throughput 4.87777K wps
Begin Testing...
[Epoch 37] train avg loss 0.00775603, dev acc 0.7994, dev avg loss 0.430161, throughput 4.92209K wps
[Epoch 38 Batch 30/62] avg loss 0.00755672, throughput 4.9443K wps
[Epoch 38 Batch 60/62] avg loss 0.00742083, throughput 4.83627K wps
Begin Testing...
[Epoch 38] train avg loss 0.0076006, dev acc 0.8053, dev avg loss 0.425774, throughput 4.89599K wps
[Epoch 39 Batch 30/62] avg loss 0.00739415, throughput 4.93984K wps
[Epoch 39 Batch 60/62] avg loss 0.00767687, throughput 4.80741K wps
Begin Testing...
[Epoch 39] train avg loss 0.0076456, dev acc 0.8112, dev avg loss 0.422575, throughput 4.88012K wps
[Epoch 40 Batch 30/62] avg loss 0.00734668, throughput 4.93876K wps
[Epoch 40 Batch 60/62] avg loss 0.00753767, throughput 4.81035K wps
Begin Testing...
[Epoch 40] train avg loss 0.00762463, dev acc 0.8142, dev avg loss 0.420249, throughput 4.87915K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00738995, throughput 4.94173K wps
[Epoch 41 Batch 60/62] avg loss 0.00699782, throughput 4.83788K wps
Begin Testing...
[Epoch 41] train avg loss 0.00737194, dev acc 0.8171, dev avg loss 0.417051, throughput 4.895K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/62] avg loss 0.00706435, throughput 4.91188K wps
[Epoch 42 Batch 60/62] avg loss 0.00710322, throughput 4.78769K wps
Begin Testing...
[Epoch 42] train avg loss 0.00718562, dev acc 0.8201, dev avg loss 0.414253, throughput 4.85473K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.00685883, throughput 4.91911K wps
[Epoch 43 Batch 60/62] avg loss 0.00737692, throughput 4.80232K wps
Begin Testing...
[Epoch 43] train avg loss 0.00719373, dev acc 0.8201, dev avg loss 0.411407, throughput 4.86607K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/62] avg loss 0.00666758, throughput 4.92121K wps
[Epoch 44 Batch 60/62] avg loss 0.0070076, throughput 4.82357K wps
Begin Testing...
[Epoch 44] train avg loss 0.00690275, dev acc 0.8230, dev avg loss 0.409172, throughput 4.878K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.00684252, throughput 4.95973K wps
[Epoch 45 Batch 60/62] avg loss 0.00677256, throughput 4.85084K wps
Begin Testing...
[Epoch 45] train avg loss 0.0068607, dev acc 0.8230, dev avg loss 0.406921, throughput 4.91202K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.00670392, throughput 4.95451K wps
[Epoch 46 Batch 60/62] avg loss 0.00654225, throughput 4.83187K wps
Begin Testing...
[Epoch 46] train avg loss 0.00667538, dev acc 0.8230, dev avg loss 0.404761, throughput 4.89952K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00662338, throughput 4.97079K wps
[Epoch 47 Batch 60/62] avg loss 0.00646229, throughput 4.85131K wps
Begin Testing...
[Epoch 47] train avg loss 0.0066277, dev acc 0.8230, dev avg loss 0.402912, throughput 4.91706K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00644848, throughput 4.9766K wps
[Epoch 48 Batch 60/62] avg loss 0.00648874, throughput 4.84251K wps
Begin Testing...
[Epoch 48] train avg loss 0.00656981, dev acc 0.8260, dev avg loss 0.399914, throughput 4.91494K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.00630782, throughput 4.96195K wps
[Epoch 49 Batch 60/62] avg loss 0.00638978, throughput 4.86094K wps
Begin Testing...
[Epoch 49] train avg loss 0.00648186, dev acc 0.8289, dev avg loss 0.398607, throughput 4.9174K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00611637, throughput 4.96115K wps
[Epoch 50 Batch 60/62] avg loss 0.00652154, throughput 4.84959K wps
Begin Testing...
[Epoch 50] train avg loss 0.00642076, dev acc 0.8201, dev avg loss 0.397474, throughput 4.91198K wps
[Epoch 51 Batch 30/62] avg loss 0.00641268, throughput 4.98373K wps
[Epoch 51 Batch 60/62] avg loss 0.00594193, throughput 4.84799K wps
Begin Testing...
[Epoch 51] train avg loss 0.0062136, dev acc 0.8260, dev avg loss 0.394912, throughput 4.92103K wps
[Epoch 52 Batch 30/62] avg loss 0.00611266, throughput 4.95351K wps
[Epoch 52 Batch 60/62] avg loss 0.00598516, throughput 4.82589K wps
Begin Testing...
[Epoch 52] train avg loss 0.00612009, dev acc 0.8348, dev avg loss 0.392669, throughput 4.89457K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.0059536, throughput 4.93308K wps
[Epoch 53 Batch 60/62] avg loss 0.00607828, throughput 4.82632K wps
Begin Testing...
[Epoch 53] train avg loss 0.00608036, dev acc 0.8230, dev avg loss 0.391572, throughput 4.88542K wps
[Epoch 54 Batch 30/62] avg loss 0.00603172, throughput 4.94472K wps
[Epoch 54 Batch 60/62] avg loss 0.00575062, throughput 4.80969K wps
Begin Testing...
[Epoch 54] train avg loss 0.00596596, dev acc 0.8260, dev avg loss 0.389435, throughput 4.88345K wps
[Epoch 55 Batch 30/62] avg loss 0.00559612, throughput 4.94771K wps
[Epoch 55 Batch 60/62] avg loss 0.00596665, throughput 4.82873K wps
Begin Testing...
[Epoch 55] train avg loss 0.00581382, dev acc 0.8319, dev avg loss 0.386458, throughput 4.89547K wps
[Epoch 56 Batch 30/62] avg loss 0.00574731, throughput 4.9353K wps
[Epoch 56 Batch 60/62] avg loss 0.00559163, throughput 4.81633K wps
Begin Testing...
[Epoch 56] train avg loss 0.00570269, dev acc 0.8289, dev avg loss 0.384788, throughput 4.88402K wps
[Epoch 57 Batch 30/62] avg loss 0.00557284, throughput 4.93985K wps
[Epoch 57 Batch 60/62] avg loss 0.00550098, throughput 4.8454K wps
Begin Testing...
[Epoch 57] train avg loss 0.00556899, dev acc 0.8289, dev avg loss 0.384588, throughput 4.89936K wps
[Epoch 58 Batch 30/62] avg loss 0.00544979, throughput 4.94014K wps
[Epoch 58 Batch 60/62] avg loss 0.00540296, throughput 4.84015K wps
Begin Testing...
[Epoch 58] train avg loss 0.0055115, dev acc 0.8319, dev avg loss 0.382569, throughput 4.89743K wps
[Epoch 59 Batch 30/62] avg loss 0.00520629, throughput 4.96464K wps
[Epoch 59 Batch 60/62] avg loss 0.00539387, throughput 4.86075K wps
Begin Testing...
[Epoch 59] train avg loss 0.00534416, dev acc 0.8230, dev avg loss 0.380343, throughput 4.91943K wps
[Epoch 60 Batch 30/62] avg loss 0.00518665, throughput 4.96165K wps
[Epoch 60 Batch 60/62] avg loss 0.0054978, throughput 4.8415K wps
Begin Testing...
[Epoch 60] train avg loss 0.00544662, dev acc 0.8260, dev avg loss 0.382102, throughput 4.90519K wps
[Epoch 61 Batch 30/62] avg loss 0.0052626, throughput 4.91003K wps
[Epoch 61 Batch 60/62] avg loss 0.00523188, throughput 4.83772K wps
Begin Testing...
[Epoch 61] train avg loss 0.00530312, dev acc 0.8289, dev avg loss 0.378015, throughput 4.88219K wps
[Epoch 62 Batch 30/62] avg loss 0.00513604, throughput 4.94673K wps
[Epoch 62 Batch 60/62] avg loss 0.00493653, throughput 4.82272K wps
Begin Testing...
[Epoch 62] train avg loss 0.00517276, dev acc 0.8289, dev avg loss 0.376294, throughput 4.88877K wps
[Epoch 63 Batch 30/62] avg loss 0.00507511, throughput 4.93635K wps
[Epoch 63 Batch 60/62] avg loss 0.00491011, throughput 4.82267K wps
Begin Testing...
[Epoch 63] train avg loss 0.00509292, dev acc 0.8289, dev avg loss 0.378119, throughput 4.8849K wps
[Epoch 64 Batch 30/62] avg loss 0.00487321, throughput 4.94112K wps
[Epoch 64 Batch 60/62] avg loss 0.00494584, throughput 4.85251K wps
Begin Testing...
[Epoch 64] train avg loss 0.00497429, dev acc 0.8289, dev avg loss 0.374607, throughput 4.90279K wps
[Epoch 65 Batch 30/62] avg loss 0.00471733, throughput 4.95503K wps
[Epoch 65 Batch 60/62] avg loss 0.00478209, throughput 4.83429K wps
Begin Testing...
[Epoch 65] train avg loss 0.00482407, dev acc 0.8319, dev avg loss 0.374239, throughput 4.9018K wps
[Epoch 66 Batch 30/62] avg loss 0.00490059, throughput 4.94009K wps
[Epoch 66 Batch 60/62] avg loss 0.00483664, throughput 4.85425K wps
Begin Testing...
[Epoch 66] train avg loss 0.00501575, dev acc 0.8348, dev avg loss 0.372023, throughput 4.90206K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/62] avg loss 0.00477646, throughput 4.97412K wps
[Epoch 67 Batch 60/62] avg loss 0.00455691, throughput 4.83008K wps
Begin Testing...
[Epoch 67] train avg loss 0.00474738, dev acc 0.8319, dev avg loss 0.372877, throughput 4.90882K wps
[Epoch 68 Batch 30/62] avg loss 0.00454848, throughput 4.94327K wps
[Epoch 68 Batch 60/62] avg loss 0.00466745, throughput 4.85053K wps
Begin Testing...
[Epoch 68] train avg loss 0.00468807, dev acc 0.8319, dev avg loss 0.369365, throughput 4.90436K wps
[Epoch 69 Batch 30/62] avg loss 0.00474451, throughput 4.93312K wps
[Epoch 69 Batch 60/62] avg loss 0.00427, throughput 4.81791K wps
Begin Testing...
[Epoch 69] train avg loss 0.00460544, dev acc 0.8289, dev avg loss 0.368845, throughput 4.88149K wps
[Epoch 70 Batch 30/62] avg loss 0.00435062, throughput 4.93556K wps
[Epoch 70 Batch 60/62] avg loss 0.00459171, throughput 4.83569K wps
Begin Testing...
[Epoch 70] train avg loss 0.00450259, dev acc 0.8319, dev avg loss 0.368135, throughput 4.89033K wps
[Epoch 71 Batch 30/62] avg loss 0.00430683, throughput 4.92688K wps
[Epoch 71 Batch 60/62] avg loss 0.00443398, throughput 4.83246K wps
Begin Testing...
[Epoch 71] train avg loss 0.00445977, dev acc 0.8319, dev avg loss 0.366768, throughput 4.8871K wps
[Epoch 72 Batch 30/62] avg loss 0.00427437, throughput 4.9603K wps
[Epoch 72 Batch 60/62] avg loss 0.00438409, throughput 4.84805K wps
Begin Testing...
[Epoch 72] train avg loss 0.00438194, dev acc 0.8378, dev avg loss 0.368116, throughput 4.91053K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/62] avg loss 0.00413726, throughput 4.94675K wps
[Epoch 73 Batch 60/62] avg loss 0.00420249, throughput 4.81066K wps
Begin Testing...
[Epoch 73] train avg loss 0.00421384, dev acc 0.8319, dev avg loss 0.367506, throughput 4.88396K wps
[Epoch 74 Batch 30/62] avg loss 0.0042011, throughput 4.90177K wps
[Epoch 74 Batch 60/62] avg loss 0.00408001, throughput 4.84044K wps
Begin Testing...
[Epoch 74] train avg loss 0.00416309, dev acc 0.8319, dev avg loss 0.366244, throughput 4.87837K wps
[Epoch 75 Batch 30/62] avg loss 0.00390163, throughput 4.9498K wps
[Epoch 75 Batch 60/62] avg loss 0.00390946, throughput 4.83542K wps
Begin Testing...
[Epoch 75] train avg loss 0.00398007, dev acc 0.8348, dev avg loss 0.364902, throughput 4.89725K wps
[Epoch 76 Batch 30/62] avg loss 0.00385705, throughput 4.93635K wps
[Epoch 76 Batch 60/62] avg loss 0.00424465, throughput 4.83006K wps
Begin Testing...
[Epoch 76] train avg loss 0.00412365, dev acc 0.8289, dev avg loss 0.363084, throughput 4.88962K wps
[Epoch 77 Batch 30/62] avg loss 0.00393375, throughput 4.9483K wps
[Epoch 77 Batch 60/62] avg loss 0.00398754, throughput 4.86685K wps
Begin Testing...
[Epoch 77] train avg loss 0.00402638, dev acc 0.8319, dev avg loss 0.362788, throughput 4.91434K wps
[Epoch 78 Batch 30/62] avg loss 0.00379534, throughput 4.9839K wps
[Epoch 78 Batch 60/62] avg loss 0.0040791, throughput 4.85262K wps
Begin Testing...
[Epoch 78] train avg loss 0.00398461, dev acc 0.8319, dev avg loss 0.362144, throughput 4.92506K wps
[Epoch 79 Batch 30/62] avg loss 0.0037018, throughput 4.99421K wps
[Epoch 79 Batch 60/62] avg loss 0.00363991, throughput 4.86385K wps
Begin Testing...
[Epoch 79] train avg loss 0.00374858, dev acc 0.8319, dev avg loss 0.362311, throughput 4.93412K wps
[Epoch 80 Batch 30/62] avg loss 0.00378116, throughput 4.9611K wps
[Epoch 80 Batch 60/62] avg loss 0.0036025, throughput 4.83626K wps
Begin Testing...
[Epoch 80] train avg loss 0.00375162, dev acc 0.8319, dev avg loss 0.362876, throughput 4.90427K wps
[Epoch 81 Batch 30/62] avg loss 0.00361195, throughput 4.95515K wps
[Epoch 81 Batch 60/62] avg loss 0.00371996, throughput 4.82491K wps
Begin Testing...
[Epoch 81] train avg loss 0.00371382, dev acc 0.8319, dev avg loss 0.361533, throughput 4.89676K wps
[Epoch 82 Batch 30/62] avg loss 0.00361591, throughput 4.94172K wps
[Epoch 82 Batch 60/62] avg loss 0.00365782, throughput 4.84607K wps
Begin Testing...
[Epoch 82] train avg loss 0.00366201, dev acc 0.8319, dev avg loss 0.361981, throughput 4.9004K wps
[Epoch 83 Batch 30/62] avg loss 0.00342788, throughput 4.93258K wps
[Epoch 83 Batch 60/62] avg loss 0.00349177, throughput 4.85066K wps
Begin Testing...
[Epoch 83] train avg loss 0.00353232, dev acc 0.8348, dev avg loss 0.36073, throughput 4.89839K wps
[Epoch 84 Batch 30/62] avg loss 0.00350437, throughput 4.98242K wps
[Epoch 84 Batch 60/62] avg loss 0.00340229, throughput 4.88682K wps
Begin Testing...
[Epoch 84] train avg loss 0.00349365, dev acc 0.8348, dev avg loss 0.360058, throughput 4.94109K wps
[Epoch 85 Batch 30/62] avg loss 0.00331816, throughput 4.98137K wps
[Epoch 85 Batch 60/62] avg loss 0.00352501, throughput 4.83586K wps
Begin Testing...
[Epoch 85] train avg loss 0.00346281, dev acc 0.8348, dev avg loss 0.358811, throughput 4.91393K wps
[Epoch 86 Batch 30/62] avg loss 0.00340142, throughput 4.94469K wps
[Epoch 86 Batch 60/62] avg loss 0.00329901, throughput 4.85101K wps
Begin Testing...
[Epoch 86] train avg loss 0.00341149, dev acc 0.8348, dev avg loss 0.358044, throughput 4.90467K wps
[Epoch 87 Batch 30/62] avg loss 0.00328606, throughput 4.94706K wps
[Epoch 87 Batch 60/62] avg loss 0.00329748, throughput 4.83853K wps
Begin Testing...
[Epoch 87] train avg loss 0.00336583, dev acc 0.8289, dev avg loss 0.35893, throughput 4.89785K wps
[Epoch 88 Batch 30/62] avg loss 0.00314085, throughput 4.93559K wps
[Epoch 88 Batch 60/62] avg loss 0.00327883, throughput 4.84421K wps
Begin Testing...
[Epoch 88] train avg loss 0.00327284, dev acc 0.8348, dev avg loss 0.358232, throughput 4.89659K wps
[Epoch 89 Batch 30/62] avg loss 0.00315115, throughput 4.96415K wps
[Epoch 89 Batch 60/62] avg loss 0.00314602, throughput 4.83883K wps
Begin Testing...
[Epoch 89] train avg loss 0.0031887, dev acc 0.8289, dev avg loss 0.35997, throughput 4.90662K wps
[Epoch 90 Batch 30/62] avg loss 0.00293695, throughput 4.95548K wps
[Epoch 90 Batch 60/62] avg loss 0.00328732, throughput 4.85704K wps
Begin Testing...
[Epoch 90] train avg loss 0.00316991, dev acc 0.8348, dev avg loss 0.357948, throughput 4.91271K wps
[Epoch 91 Batch 30/62] avg loss 0.00306343, throughput 4.95598K wps
[Epoch 91 Batch 60/62] avg loss 0.00297575, throughput 4.82763K wps
Begin Testing...
[Epoch 91] train avg loss 0.00305344, dev acc 0.8407, dev avg loss 0.356427, throughput 4.89831K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/62] avg loss 0.00302386, throughput 4.95917K wps
[Epoch 92 Batch 60/62] avg loss 0.00293717, throughput 4.83527K wps
Begin Testing...
[Epoch 92] train avg loss 0.00303423, dev acc 0.8437, dev avg loss 0.355214, throughput 4.90433K wps
Observed Improvement.
Begin Testing...
[Epoch 93 Batch 30/62] avg loss 0.00296628, throughput 4.9868K wps
[Epoch 93 Batch 60/62] avg loss 0.00306643, throughput 4.88591K wps
Begin Testing...
[Epoch 93] train avg loss 0.00312279, dev acc 0.8407, dev avg loss 0.355707, throughput 4.94358K wps
[Epoch 94 Batch 30/62] avg loss 0.00293678, throughput 4.94065K wps
[Epoch 94 Batch 60/62] avg loss 0.00290137, throughput 4.83608K wps
Begin Testing...
[Epoch 94] train avg loss 0.00293136, dev acc 0.8348, dev avg loss 0.356203, throughput 4.89495K wps
[Epoch 95 Batch 30/62] avg loss 0.00267343, throughput 4.98708K wps
[Epoch 95 Batch 60/62] avg loss 0.00292427, throughput 4.85719K wps
Begin Testing...
[Epoch 95] train avg loss 0.00283414, dev acc 0.8437, dev avg loss 0.354532, throughput 4.92617K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/62] avg loss 0.00271892, throughput 4.96098K wps
[Epoch 96 Batch 60/62] avg loss 0.00296932, throughput 4.84737K wps
Begin Testing...
[Epoch 96] train avg loss 0.00291688, dev acc 0.8407, dev avg loss 0.355113, throughput 4.90991K wps
[Epoch 97 Batch 30/62] avg loss 0.00273099, throughput 4.94009K wps
[Epoch 97 Batch 60/62] avg loss 0.00260929, throughput 4.83771K wps
Begin Testing...
[Epoch 97] train avg loss 0.00269672, dev acc 0.8319, dev avg loss 0.355361, throughput 4.89284K wps
[Epoch 98 Batch 30/62] avg loss 0.00280453, throughput 4.93856K wps
[Epoch 98 Batch 60/62] avg loss 0.00262458, throughput 4.81244K wps
Begin Testing...
[Epoch 98] train avg loss 0.00275646, dev acc 0.8407, dev avg loss 0.353088, throughput 4.88198K wps
[Epoch 99 Batch 30/62] avg loss 0.00266587, throughput 4.94898K wps
[Epoch 99 Batch 60/62] avg loss 0.00275791, throughput 4.85941K wps
Begin Testing...
[Epoch 99] train avg loss 0.00274063, dev acc 0.8407, dev avg loss 0.353569, throughput 4.91089K wps
[Epoch 100 Batch 30/62] avg loss 0.00259267, throughput 4.93934K wps
[Epoch 100 Batch 60/62] avg loss 0.00257766, throughput 4.84644K wps
Begin Testing...
[Epoch 100] train avg loss 0.00266753, dev acc 0.8437, dev avg loss 0.353025, throughput 4.89842K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00264676, throughput 4.94768K wps
[Epoch 101 Batch 60/62] avg loss 0.00235219, throughput 4.81778K wps
Begin Testing...
[Epoch 101] train avg loss 0.00254938, dev acc 0.8378, dev avg loss 0.354979, throughput 4.88791K wps
[Epoch 102 Batch 30/62] avg loss 0.00248622, throughput 4.94946K wps
[Epoch 102 Batch 60/62] avg loss 0.00254037, throughput 4.83299K wps
Begin Testing...
[Epoch 102] train avg loss 0.0025281, dev acc 0.8378, dev avg loss 0.35392, throughput 4.89641K wps
[Epoch 103 Batch 30/62] avg loss 0.0024846, throughput 4.94448K wps
[Epoch 103 Batch 60/62] avg loss 0.00261609, throughput 4.80111K wps
Begin Testing...
[Epoch 103] train avg loss 0.00257261, dev acc 0.8407, dev avg loss 0.352921, throughput 4.87738K wps
[Epoch 104 Batch 30/62] avg loss 0.00241853, throughput 4.91622K wps
[Epoch 104 Batch 60/62] avg loss 0.00246502, throughput 4.84431K wps
Begin Testing...
[Epoch 104] train avg loss 0.00247291, dev acc 0.8289, dev avg loss 0.356384, throughput 4.8861K wps
[Epoch 105 Batch 30/62] avg loss 0.00232061, throughput 4.92803K wps
[Epoch 105 Batch 60/62] avg loss 0.00236862, throughput 4.84242K wps
Begin Testing...
[Epoch 105] train avg loss 0.00235996, dev acc 0.8407, dev avg loss 0.352739, throughput 4.89314K wps
[Epoch 106 Batch 30/62] avg loss 0.00233124, throughput 4.98947K wps
[Epoch 106 Batch 60/62] avg loss 0.00240589, throughput 4.86772K wps
Begin Testing...
[Epoch 106] train avg loss 0.00242426, dev acc 0.8289, dev avg loss 0.358569, throughput 4.93556K wps
[Epoch 107 Batch 30/62] avg loss 0.00228691, throughput 4.97401K wps
[Epoch 107 Batch 60/62] avg loss 0.00231394, throughput 4.84287K wps
Begin Testing...
[Epoch 107] train avg loss 0.0023198, dev acc 0.8289, dev avg loss 0.355746, throughput 4.91555K wps
[Epoch 108 Batch 30/62] avg loss 0.00211399, throughput 4.94629K wps
[Epoch 108 Batch 60/62] avg loss 0.00247481, throughput 4.82854K wps
Begin Testing...
[Epoch 108] train avg loss 0.00231768, dev acc 0.8466, dev avg loss 0.353044, throughput 4.89345K wps
Observed Improvement.
Begin Testing...
[Epoch 109 Batch 30/62] avg loss 0.00226659, throughput 4.96905K wps
[Epoch 109 Batch 60/62] avg loss 0.00221643, throughput 4.84269K wps
Begin Testing...
[Epoch 109] train avg loss 0.00225782, dev acc 0.8407, dev avg loss 0.352174, throughput 4.91254K wps
[Epoch 110 Batch 30/62] avg loss 0.00211821, throughput 4.96003K wps
[Epoch 110 Batch 60/62] avg loss 0.00231393, throughput 4.88219K wps
Begin Testing...
[Epoch 110] train avg loss 0.00222157, dev acc 0.8319, dev avg loss 0.355057, throughput 4.92812K wps
[Epoch 111 Batch 30/62] avg loss 0.0023288, throughput 4.99062K wps
[Epoch 111 Batch 60/62] avg loss 0.00213127, throughput 4.89439K wps
Begin Testing...
[Epoch 111] train avg loss 0.002245, dev acc 0.8289, dev avg loss 0.354935, throughput 4.94941K wps
[Epoch 112 Batch 30/62] avg loss 0.00216162, throughput 4.99057K wps
[Epoch 112 Batch 60/62] avg loss 0.00209653, throughput 4.88725K wps
Begin Testing...
[Epoch 112] train avg loss 0.00215781, dev acc 0.8407, dev avg loss 0.352948, throughput 4.94381K wps
[Epoch 113 Batch 30/62] avg loss 0.00213654, throughput 4.99042K wps
[Epoch 113 Batch 60/62] avg loss 0.00212753, throughput 4.84833K wps
Begin Testing...
[Epoch 113] train avg loss 0.00213074, dev acc 0.8378, dev avg loss 0.351535, throughput 4.92559K wps
[Epoch 114 Batch 30/62] avg loss 0.00201403, throughput 4.94284K wps
[Epoch 114 Batch 60/62] avg loss 0.00221848, throughput 4.81686K wps
Begin Testing...
[Epoch 114] train avg loss 0.0021587, dev acc 0.8496, dev avg loss 0.352635, throughput 4.88665K wps
Observed Improvement.
Begin Testing...
[Epoch 115 Batch 30/62] avg loss 0.00196071, throughput 4.93617K wps
[Epoch 115 Batch 60/62] avg loss 0.00211479, throughput 4.82856K wps
Begin Testing...
[Epoch 115] train avg loss 0.00206916, dev acc 0.8407, dev avg loss 0.351916, throughput 4.88836K wps
[Epoch 116 Batch 30/62] avg loss 0.00198463, throughput 4.94387K wps
[Epoch 116 Batch 60/62] avg loss 0.00192252, throughput 4.81391K wps
Begin Testing...
[Epoch 116] train avg loss 0.0019553, dev acc 0.8378, dev avg loss 0.354441, throughput 4.88315K wps
[Epoch 117 Batch 30/62] avg loss 0.00196306, throughput 4.8974K wps
[Epoch 117 Batch 60/62] avg loss 0.00200042, throughput 4.81059K wps
Begin Testing...
[Epoch 117] train avg loss 0.00200844, dev acc 0.8437, dev avg loss 0.352552, throughput 4.85981K wps
[Epoch 118 Batch 30/62] avg loss 0.00203232, throughput 4.94626K wps
[Epoch 118 Batch 60/62] avg loss 0.00198347, throughput 4.80289K wps
Begin Testing...
[Epoch 118] train avg loss 0.00203156, dev acc 0.8407, dev avg loss 0.35257, throughput 4.88062K wps
[Epoch 119 Batch 30/62] avg loss 0.00185509, throughput 4.92804K wps
[Epoch 119 Batch 60/62] avg loss 0.00197595, throughput 4.83266K wps
Begin Testing...
[Epoch 119] train avg loss 0.0019405, dev acc 0.8407, dev avg loss 0.352058, throughput 4.88742K wps
[Epoch 120 Batch 30/62] avg loss 0.00174728, throughput 4.96226K wps
[Epoch 120 Batch 60/62] avg loss 0.00198163, throughput 4.85685K wps
Begin Testing...
[Epoch 120] train avg loss 0.00189615, dev acc 0.8407, dev avg loss 0.352102, throughput 4.9172K wps
[Epoch 121 Batch 30/62] avg loss 0.0018573, throughput 4.97232K wps
[Epoch 121 Batch 60/62] avg loss 0.00187812, throughput 4.84545K wps
Begin Testing...
[Epoch 121] train avg loss 0.00188056, dev acc 0.8407, dev avg loss 0.352643, throughput 4.91429K wps
[Epoch 122 Batch 30/62] avg loss 0.00179544, throughput 4.9184K wps
[Epoch 122 Batch 60/62] avg loss 0.00181157, throughput 4.83068K wps
Begin Testing...
[Epoch 122] train avg loss 0.00182063, dev acc 0.8378, dev avg loss 0.351535, throughput 4.88253K wps
[Epoch 123 Batch 30/62] avg loss 0.00170733, throughput 4.94716K wps
[Epoch 123 Batch 60/62] avg loss 0.00191518, throughput 4.84262K wps
Begin Testing...
[Epoch 123] train avg loss 0.00182894, dev acc 0.8437, dev avg loss 0.35235, throughput 4.89989K wps
[Epoch 124 Batch 30/62] avg loss 0.00166166, throughput 4.93792K wps
[Epoch 124 Batch 60/62] avg loss 0.00172158, throughput 4.82818K wps
Begin Testing...
[Epoch 124] train avg loss 0.00175006, dev acc 0.8407, dev avg loss 0.35434, throughput 4.88904K wps
[Epoch 125 Batch 30/62] avg loss 0.00167396, throughput 4.94767K wps
[Epoch 125 Batch 60/62] avg loss 0.00176788, throughput 4.81663K wps
Begin Testing...
[Epoch 125] train avg loss 0.00175897, dev acc 0.8378, dev avg loss 0.353482, throughput 4.88733K wps
[Epoch 126 Batch 30/62] avg loss 0.00182579, throughput 4.9447K wps
[Epoch 126 Batch 60/62] avg loss 0.00158123, throughput 4.85208K wps
Begin Testing...
[Epoch 126] train avg loss 0.00171955, dev acc 0.8407, dev avg loss 0.35482, throughput 4.9052K wps
[Epoch 127 Batch 30/62] avg loss 0.0016342, throughput 4.94043K wps
[Epoch 127 Batch 60/62] avg loss 0.0016353, throughput 4.86121K wps
Begin Testing...
[Epoch 127] train avg loss 0.00167472, dev acc 0.8407, dev avg loss 0.357613, throughput 4.90628K wps
[Epoch 128 Batch 30/62] avg loss 0.00171042, throughput 4.98022K wps
[Epoch 128 Batch 60/62] avg loss 0.00166622, throughput 4.84883K wps
Begin Testing...
[Epoch 128] train avg loss 0.00172094, dev acc 0.8348, dev avg loss 0.353801, throughput 4.91999K wps
[Epoch 129 Batch 30/62] avg loss 0.00160306, throughput 4.95593K wps
[Epoch 129 Batch 60/62] avg loss 0.00164257, throughput 4.856K wps
Begin Testing...
[Epoch 129] train avg loss 0.00164629, dev acc 0.8466, dev avg loss 0.354735, throughput 4.91106K wps
[Epoch 130 Batch 30/62] avg loss 0.00160974, throughput 4.96104K wps
[Epoch 130 Batch 60/62] avg loss 0.00154407, throughput 4.81704K wps
Begin Testing...
[Epoch 130] train avg loss 0.00158325, dev acc 0.8407, dev avg loss 0.354694, throughput 4.89601K wps
[Epoch 131 Batch 30/62] avg loss 0.00153632, throughput 4.9328K wps
[Epoch 131 Batch 60/62] avg loss 0.00167836, throughput 4.82112K wps
Begin Testing...
[Epoch 131] train avg loss 0.00162135, dev acc 0.8466, dev avg loss 0.353439, throughput 4.8817K wps
[Epoch 132 Batch 30/62] avg loss 0.00154936, throughput 4.94427K wps
[Epoch 132 Batch 60/62] avg loss 0.00162112, throughput 4.84274K wps
Begin Testing...
[Epoch 132] train avg loss 0.00160391, dev acc 0.8407, dev avg loss 0.352571, throughput 4.90174K wps
[Epoch 133 Batch 30/62] avg loss 0.00149871, throughput 4.95595K wps
[Epoch 133 Batch 60/62] avg loss 0.00157231, throughput 4.85011K wps
Begin Testing...
[Epoch 133] train avg loss 0.00155433, dev acc 0.8378, dev avg loss 0.355121, throughput 4.90871K wps
[Epoch 134 Batch 30/62] avg loss 0.00141161, throughput 4.96439K wps
[Epoch 134 Batch 60/62] avg loss 0.00154254, throughput 4.85835K wps
Begin Testing...
[Epoch 134] train avg loss 0.00150191, dev acc 0.8437, dev avg loss 0.354728, throughput 4.91812K wps
[Epoch 135 Batch 30/62] avg loss 0.00143942, throughput 4.97722K wps
[Epoch 135 Batch 60/62] avg loss 0.00159831, throughput 4.81446K wps
Begin Testing...
[Epoch 135] train avg loss 0.00155109, dev acc 0.8407, dev avg loss 0.354482, throughput 4.90112K wps
[Epoch 136 Batch 30/62] avg loss 0.00143423, throughput 4.93469K wps
[Epoch 136 Batch 60/62] avg loss 0.00149316, throughput 4.84148K wps
Begin Testing...
[Epoch 136] train avg loss 0.00147247, dev acc 0.8407, dev avg loss 0.352811, throughput 4.89606K wps
[Epoch 137 Batch 30/62] avg loss 0.00152234, throughput 4.97848K wps
[Epoch 137 Batch 60/62] avg loss 0.00145886, throughput 4.83438K wps
Begin Testing...
[Epoch 137] train avg loss 0.00156685, dev acc 0.8437, dev avg loss 0.354365, throughput 4.91022K wps
[Epoch 138 Batch 30/62] avg loss 0.00136712, throughput 4.93635K wps
[Epoch 138 Batch 60/62] avg loss 0.00155196, throughput 4.8113K wps
Begin Testing...
[Epoch 138] train avg loss 0.00149092, dev acc 0.8407, dev avg loss 0.354537, throughput 4.8799K wps
[Epoch 139 Batch 30/62] avg loss 0.00147268, throughput 4.93781K wps
[Epoch 139 Batch 60/62] avg loss 0.00145202, throughput 4.80621K wps
Begin Testing...
[Epoch 139] train avg loss 0.00145563, dev acc 0.8407, dev avg loss 0.353891, throughput 4.87602K wps
[Epoch 140 Batch 30/62] avg loss 0.00142506, throughput 4.90937K wps
[Epoch 140 Batch 60/62] avg loss 0.00138039, throughput 4.81193K wps
Begin Testing...
[Epoch 140] train avg loss 0.00141693, dev acc 0.8407, dev avg loss 0.356183, throughput 4.86637K wps
[Epoch 141 Batch 30/62] avg loss 0.00140338, throughput 4.91701K wps
[Epoch 141 Batch 60/62] avg loss 0.00139104, throughput 4.82908K wps
Begin Testing...
[Epoch 141] train avg loss 0.00141698, dev acc 0.8378, dev avg loss 0.354429, throughput 4.8809K wps
[Epoch 142 Batch 30/62] avg loss 0.00144356, throughput 4.99987K wps
[Epoch 142 Batch 60/62] avg loss 0.0013893, throughput 4.85531K wps
Begin Testing...
[Epoch 142] train avg loss 0.00143158, dev acc 0.8348, dev avg loss 0.354782, throughput 4.9346K wps
[Epoch 143 Batch 30/62] avg loss 0.0012389, throughput 5.00934K wps
[Epoch 143 Batch 60/62] avg loss 0.0013058, throughput 4.89264K wps
Begin Testing...
[Epoch 143] train avg loss 0.00128249, dev acc 0.8348, dev avg loss 0.354569, throughput 4.95422K wps
[Epoch 144 Batch 30/62] avg loss 0.0012556, throughput 4.94358K wps
[Epoch 144 Batch 60/62] avg loss 0.00136855, throughput 4.80551K wps
Begin Testing...
[Epoch 144] train avg loss 0.00135618, dev acc 0.8437, dev avg loss 0.355727, throughput 4.88155K wps
[Epoch 145 Batch 30/62] avg loss 0.00124019, throughput 4.91893K wps
[Epoch 145 Batch 60/62] avg loss 0.00140239, throughput 4.81792K wps
Begin Testing...
[Epoch 145] train avg loss 0.00133109, dev acc 0.8378, dev avg loss 0.359299, throughput 4.87528K wps
[Epoch 146 Batch 30/62] avg loss 0.00134903, throughput 4.92083K wps
[Epoch 146 Batch 60/62] avg loss 0.00124644, throughput 4.82127K wps
Begin Testing...
[Epoch 146] train avg loss 0.00129725, dev acc 0.8466, dev avg loss 0.357661, throughput 4.87706K wps
[Epoch 147 Batch 30/62] avg loss 0.00127713, throughput 4.92963K wps
[Epoch 147 Batch 60/62] avg loss 0.00138257, throughput 4.86917K wps
Begin Testing...
[Epoch 147] train avg loss 0.00133613, dev acc 0.8348, dev avg loss 0.356098, throughput 4.9062K wps
[Epoch 148 Batch 30/62] avg loss 0.00122136, throughput 4.99423K wps
[Epoch 148 Batch 60/62] avg loss 0.00131263, throughput 4.87815K wps
Begin Testing...
[Epoch 148] train avg loss 0.00128174, dev acc 0.8378, dev avg loss 0.355688, throughput 4.94224K wps
[Epoch 149 Batch 30/62] avg loss 0.00117455, throughput 4.99203K wps
[Epoch 149 Batch 60/62] avg loss 0.00125055, throughput 4.87879K wps
Begin Testing...
[Epoch 149] train avg loss 0.00124765, dev acc 0.8407, dev avg loss 0.356376, throughput 4.9415K wps
[Epoch 150 Batch 30/62] avg loss 0.00137385, throughput 4.93729K wps
[Epoch 150 Batch 60/62] avg loss 0.00121375, throughput 4.84643K wps
Begin Testing...
[Epoch 150] train avg loss 0.00131159, dev acc 0.8378, dev avg loss 0.355629, throughput 4.89952K wps
[Epoch 151 Batch 30/62] avg loss 0.00116856, throughput 4.93129K wps
[Epoch 151 Batch 60/62] avg loss 0.00121859, throughput 4.8385K wps
Begin Testing...
[Epoch 151] train avg loss 0.00120319, dev acc 0.8378, dev avg loss 0.356709, throughput 4.89162K wps
[Epoch 152 Batch 30/62] avg loss 0.00129321, throughput 4.94804K wps
[Epoch 152 Batch 60/62] avg loss 0.00111456, throughput 4.84868K wps
Begin Testing...
[Epoch 152] train avg loss 0.00121354, dev acc 0.8407, dev avg loss 0.357433, throughput 4.90327K wps
[Epoch 153 Batch 30/62] avg loss 0.00113557, throughput 4.95296K wps
[Epoch 153 Batch 60/62] avg loss 0.00117917, throughput 4.85801K wps
Begin Testing...
[Epoch 153] train avg loss 0.00117591, dev acc 0.8378, dev avg loss 0.357085, throughput 4.91152K wps
[Epoch 154 Batch 30/62] avg loss 0.00113342, throughput 4.94578K wps
[Epoch 154 Batch 60/62] avg loss 0.0012064, throughput 4.83713K wps
Begin Testing...
[Epoch 154] train avg loss 0.00119495, dev acc 0.8378, dev avg loss 0.356481, throughput 4.89869K wps
[Epoch 155 Batch 30/62] avg loss 0.00117125, throughput 4.953K wps
[Epoch 155 Batch 60/62] avg loss 0.0012019, throughput 4.87634K wps
Begin Testing...
[Epoch 155] train avg loss 0.00119884, dev acc 0.8378, dev avg loss 0.356865, throughput 4.92203K wps
[Epoch 156 Batch 30/62] avg loss 0.00105984, throughput 4.99843K wps
[Epoch 156 Batch 60/62] avg loss 0.00107965, throughput 4.87664K wps
Begin Testing...
[Epoch 156] train avg loss 0.0010698, dev acc 0.8348, dev avg loss 0.357196, throughput 4.94419K wps
[Epoch 157 Batch 30/62] avg loss 0.00104173, throughput 4.96395K wps
[Epoch 157 Batch 60/62] avg loss 0.00123147, throughput 4.83774K wps
Begin Testing...
[Epoch 157] train avg loss 0.00114351, dev acc 0.8378, dev avg loss 0.357194, throughput 4.90665K wps
[Epoch 158 Batch 30/62] avg loss 0.00115532, throughput 4.9431K wps
[Epoch 158 Batch 60/62] avg loss 0.00107046, throughput 4.81778K wps
Begin Testing...
[Epoch 158] train avg loss 0.00112021, dev acc 0.8348, dev avg loss 0.357884, throughput 4.88682K wps
[Epoch 159 Batch 30/62] avg loss 0.0010818, throughput 4.93928K wps
[Epoch 159 Batch 60/62] avg loss 0.00107824, throughput 4.83814K wps
Begin Testing...
[Epoch 159] train avg loss 0.00110955, dev acc 0.8466, dev avg loss 0.360747, throughput 4.89586K wps
[Epoch 160 Batch 30/62] avg loss 0.00109725, throughput 4.95798K wps
[Epoch 160 Batch 60/62] avg loss 0.00114573, throughput 4.81412K wps
Begin Testing...
[Epoch 160] train avg loss 0.00113087, dev acc 0.8437, dev avg loss 0.35846, throughput 4.89304K wps
[Epoch 161 Batch 30/62] avg loss 0.000955041, throughput 4.95583K wps
[Epoch 161 Batch 60/62] avg loss 0.00120522, throughput 4.84319K wps
Begin Testing...
[Epoch 161] train avg loss 0.00107789, dev acc 0.8319, dev avg loss 0.358009, throughput 4.90522K wps
[Epoch 162 Batch 30/62] avg loss 0.00104295, throughput 4.95335K wps
[Epoch 162 Batch 60/62] avg loss 0.00101554, throughput 4.85155K wps
Begin Testing...
[Epoch 162] train avg loss 0.00104051, dev acc 0.8378, dev avg loss 0.358183, throughput 4.90906K wps
[Epoch 163 Batch 30/62] avg loss 0.0010632, throughput 4.96071K wps
[Epoch 163 Batch 60/62] avg loss 0.00103317, throughput 4.83114K wps
Begin Testing...
[Epoch 163] train avg loss 0.00107674, dev acc 0.8437, dev avg loss 0.358732, throughput 4.90283K wps
[Epoch 164 Batch 30/62] avg loss 0.00107421, throughput 4.95975K wps
[Epoch 164 Batch 60/62] avg loss 0.0010304, throughput 4.83152K wps
Begin Testing...
[Epoch 164] train avg loss 0.00104955, dev acc 0.8378, dev avg loss 0.357456, throughput 4.90026K wps
[Epoch 165 Batch 30/62] avg loss 0.0010306, throughput 4.92952K wps
[Epoch 165 Batch 60/62] avg loss 0.00100011, throughput 4.81396K wps
Begin Testing...
[Epoch 165] train avg loss 0.00101962, dev acc 0.8319, dev avg loss 0.357457, throughput 4.87671K wps
[Epoch 166 Batch 30/62] avg loss 0.00111937, throughput 4.94804K wps
[Epoch 166 Batch 60/62] avg loss 0.00103819, throughput 4.81218K wps
Begin Testing...
[Epoch 166] train avg loss 0.00108391, dev acc 0.8319, dev avg loss 0.358883, throughput 4.88596K wps
[Epoch 167 Batch 30/62] avg loss 0.00099054, throughput 4.95695K wps
[Epoch 167 Batch 60/62] avg loss 0.000994585, throughput 4.79923K wps
Begin Testing...
[Epoch 167] train avg loss 0.000999841, dev acc 0.8348, dev avg loss 0.359866, throughput 4.88412K wps
[Epoch 168 Batch 30/62] avg loss 0.000907913, throughput 4.95088K wps
[Epoch 168 Batch 60/62] avg loss 0.000995125, throughput 4.80639K wps
Begin Testing...
[Epoch 168] train avg loss 0.000959336, dev acc 0.8466, dev avg loss 0.360817, throughput 4.88494K wps
[Epoch 169 Batch 30/62] avg loss 0.00100784, throughput 4.94013K wps
[Epoch 169 Batch 60/62] avg loss 0.000972164, throughput 4.81462K wps
Begin Testing...
[Epoch 169] train avg loss 0.00100253, dev acc 0.8378, dev avg loss 0.360038, throughput 4.88305K wps
[Epoch 170 Batch 30/62] avg loss 0.000926548, throughput 4.95168K wps
[Epoch 170 Batch 60/62] avg loss 0.000994124, throughput 4.80746K wps
Begin Testing...
[Epoch 170] train avg loss 0.000966264, dev acc 0.8407, dev avg loss 0.360515, throughput 4.88582K wps
[Epoch 171 Batch 30/62] avg loss 0.000990363, throughput 4.93637K wps
[Epoch 171 Batch 60/62] avg loss 0.000927732, throughput 4.85338K wps
Begin Testing...
[Epoch 171] train avg loss 0.000968954, dev acc 0.8348, dev avg loss 0.359955, throughput 4.90228K wps
[Epoch 172 Batch 30/62] avg loss 0.000874015, throughput 4.9395K wps
[Epoch 172 Batch 60/62] avg loss 0.000960617, throughput 4.84013K wps
Begin Testing...
[Epoch 172] train avg loss 0.000926936, dev acc 0.8407, dev avg loss 0.359847, throughput 4.89629K wps
[Epoch 173 Batch 30/62] avg loss 0.000956539, throughput 4.9366K wps
[Epoch 173 Batch 60/62] avg loss 0.000982958, throughput 4.82367K wps
Begin Testing...
[Epoch 173] train avg loss 0.000969084, dev acc 0.8378, dev avg loss 0.360373, throughput 4.88641K wps
[Epoch 174 Batch 30/62] avg loss 0.000958866, throughput 4.933K wps
[Epoch 174 Batch 60/62] avg loss 0.000875272, throughput 4.83938K wps
Begin Testing...
[Epoch 174] train avg loss 0.000933448, dev acc 0.8378, dev avg loss 0.360871, throughput 4.892K wps
[Epoch 175 Batch 30/62] avg loss 0.000854597, throughput 4.9582K wps
[Epoch 175 Batch 60/62] avg loss 0.0008567, throughput 4.85202K wps
Begin Testing...
[Epoch 175] train avg loss 0.000871346, dev acc 0.8378, dev avg loss 0.359988, throughput 4.91146K wps
[Epoch 176 Batch 30/62] avg loss 0.000829197, throughput 4.97837K wps
[Epoch 176 Batch 60/62] avg loss 0.000932445, throughput 4.88678K wps
Begin Testing...
[Epoch 176] train avg loss 0.000911969, dev acc 0.8378, dev avg loss 0.359686, throughput 4.93765K wps
[Epoch 177 Batch 30/62] avg loss 0.000841944, throughput 4.99981K wps
[Epoch 177 Batch 60/62] avg loss 0.000969218, throughput 4.88815K wps
Begin Testing...
[Epoch 177] train avg loss 0.000905908, dev acc 0.8437, dev avg loss 0.359891, throughput 4.9505K wps
[Epoch 178 Batch 30/62] avg loss 0.000858847, throughput 5.01058K wps
[Epoch 178 Batch 60/62] avg loss 0.000867218, throughput 4.89247K wps
Begin Testing...
[Epoch 178] train avg loss 0.000871989, dev acc 0.8348, dev avg loss 0.362187, throughput 4.95729K wps
[Epoch 179 Batch 30/62] avg loss 0.000882015, throughput 4.99484K wps
[Epoch 179 Batch 60/62] avg loss 0.00088835, throughput 4.8943K wps
Begin Testing...
[Epoch 179] train avg loss 0.000895387, dev acc 0.8466, dev avg loss 0.361893, throughput 4.95065K wps
[Epoch 180 Batch 30/62] avg loss 0.000879647, throughput 4.94121K wps
[Epoch 180 Batch 60/62] avg loss 0.000863833, throughput 4.84707K wps
Begin Testing...
[Epoch 180] train avg loss 0.000875175, dev acc 0.8407, dev avg loss 0.361815, throughput 4.90048K wps
[Epoch 181 Batch 30/62] avg loss 0.000865496, throughput 4.93906K wps
[Epoch 181 Batch 60/62] avg loss 0.000864395, throughput 4.83843K wps
Begin Testing...
[Epoch 181] train avg loss 0.000863152, dev acc 0.8437, dev avg loss 0.362044, throughput 4.895K wps
[Epoch 182 Batch 30/62] avg loss 0.00091686, throughput 4.93622K wps
[Epoch 182 Batch 60/62] avg loss 0.000800491, throughput 4.83839K wps
Begin Testing...
[Epoch 182] train avg loss 0.000866772, dev acc 0.8407, dev avg loss 0.362089, throughput 4.89345K wps
[Epoch 183 Batch 30/62] avg loss 0.000854427, throughput 4.62023K wps
[Epoch 183 Batch 60/62] avg loss 0.000780659, throughput 4.81207K wps
Begin Testing...
[Epoch 183] train avg loss 0.000820843, dev acc 0.8407, dev avg loss 0.361058, throughput 4.72591K wps
[Epoch 184 Batch 30/62] avg loss 0.000883077, throughput 4.92906K wps
[Epoch 184 Batch 60/62] avg loss 0.000761868, throughput 4.8129K wps
Begin Testing...
[Epoch 184] train avg loss 0.000824677, dev acc 0.8437, dev avg loss 0.362174, throughput 4.87473K wps
[Epoch 185 Batch 30/62] avg loss 0.000818392, throughput 4.92476K wps
[Epoch 185 Batch 60/62] avg loss 0.000804151, throughput 4.82718K wps
Begin Testing...
[Epoch 185] train avg loss 0.000821245, dev acc 0.8407, dev avg loss 0.362666, throughput 4.88221K wps
[Epoch 186 Batch 30/62] avg loss 0.000731693, throughput 4.94496K wps
[Epoch 186 Batch 60/62] avg loss 0.000852743, throughput 4.8177K wps
Begin Testing...
[Epoch 186] train avg loss 0.000826373, dev acc 0.8407, dev avg loss 0.362878, throughput 4.88671K wps
[Epoch 187 Batch 30/62] avg loss 0.000832237, throughput 4.93603K wps
[Epoch 187 Batch 60/62] avg loss 0.000869722, throughput 4.82504K wps
Begin Testing...
[Epoch 187] train avg loss 0.000856299, dev acc 0.8407, dev avg loss 0.362544, throughput 4.88697K wps
[Epoch 188 Batch 30/62] avg loss 0.000773919, throughput 4.93851K wps
[Epoch 188 Batch 60/62] avg loss 0.000799572, throughput 4.82268K wps
Begin Testing...
[Epoch 188] train avg loss 0.000794246, dev acc 0.8407, dev avg loss 0.363054, throughput 4.88566K wps
[Epoch 189 Batch 30/62] avg loss 0.00084082, throughput 4.94739K wps
[Epoch 189 Batch 60/62] avg loss 0.000804482, throughput 4.8282K wps
Begin Testing...
[Epoch 189] train avg loss 0.000825113, dev acc 0.8407, dev avg loss 0.363496, throughput 4.89279K wps
[Epoch 190 Batch 30/62] avg loss 0.00086018, throughput 4.93874K wps
[Epoch 190 Batch 60/62] avg loss 0.000750523, throughput 4.82915K wps
Begin Testing...
[Epoch 190] train avg loss 0.000804666, dev acc 0.8437, dev avg loss 0.364754, throughput 4.88882K wps
[Epoch 191 Batch 30/62] avg loss 0.000810533, throughput 4.95268K wps
[Epoch 191 Batch 60/62] avg loss 0.000778773, throughput 4.83495K wps
Begin Testing...
[Epoch 191] train avg loss 0.000798537, dev acc 0.8466, dev avg loss 0.365699, throughput 4.89854K wps
[Epoch 192 Batch 30/62] avg loss 0.000751039, throughput 5.01719K wps
[Epoch 192 Batch 60/62] avg loss 0.000789822, throughput 4.88298K wps
Begin Testing...
[Epoch 192] train avg loss 0.000779659, dev acc 0.8407, dev avg loss 0.364325, throughput 4.95294K wps
[Epoch 193 Batch 30/62] avg loss 0.000652739, throughput 4.98021K wps
[Epoch 193 Batch 60/62] avg loss 0.00079584, throughput 4.84185K wps
Begin Testing...
[Epoch 193] train avg loss 0.000737584, dev acc 0.8496, dev avg loss 0.366584, throughput 4.91575K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/62] avg loss 0.000765425, throughput 4.96872K wps
[Epoch 194 Batch 60/62] avg loss 0.000804224, throughput 4.83111K wps
Begin Testing...
[Epoch 194] train avg loss 0.000785339, dev acc 0.8407, dev avg loss 0.365353, throughput 4.90698K wps
[Epoch 195 Batch 30/62] avg loss 0.000690845, throughput 4.9639K wps
[Epoch 195 Batch 60/62] avg loss 0.000720645, throughput 4.8502K wps
Begin Testing...
[Epoch 195] train avg loss 0.000732752, dev acc 0.8319, dev avg loss 0.370956, throughput 4.91222K wps
[Epoch 196 Batch 30/62] avg loss 0.000716375, throughput 4.94633K wps
[Epoch 196 Batch 60/62] avg loss 0.00076625, throughput 4.81989K wps
Begin Testing...
[Epoch 196] train avg loss 0.000749451, dev acc 0.8407, dev avg loss 0.36552, throughput 4.88994K wps
[Epoch 197 Batch 30/62] avg loss 0.000807481, throughput 4.96499K wps
[Epoch 197 Batch 60/62] avg loss 0.000672251, throughput 4.85679K wps
Begin Testing...
[Epoch 197] train avg loss 0.000744866, dev acc 0.8378, dev avg loss 0.365731, throughput 4.91885K wps
[Epoch 198 Batch 30/62] avg loss 0.000719073, throughput 5.00912K wps
[Epoch 198 Batch 60/62] avg loss 0.000751537, throughput 4.89248K wps
Begin Testing...
[Epoch 198] train avg loss 0.000751157, dev acc 0.8407, dev avg loss 0.365033, throughput 4.95649K wps
[Epoch 199 Batch 30/62] avg loss 0.000651629, throughput 4.97433K wps
[Epoch 199 Batch 60/62] avg loss 0.000722658, throughput 4.84944K wps
Begin Testing...
[Epoch 199] train avg loss 0.000693006, dev acc 0.8407, dev avg loss 0.365981, throughput 4.91804K wps
Test loss 0.36521, test acc 0.8382
Total time cost 277.31s
[Epoch 0 Batch 30/62] avg loss 0.0134248, throughput 4.70323K wps
[Epoch 0 Batch 60/62] avg loss 0.0129955, throughput 4.82887K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134053, dev acc 0.6578, dev avg loss 0.636985, throughput 4.77358K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130458, throughput 4.92567K wps
[Epoch 1 Batch 60/62] avg loss 0.0129564, throughput 4.82204K wps
Begin Testing...
[Epoch 1] train avg loss 0.013175, dev acc 0.6578, dev avg loss 0.632523, throughput 4.88109K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0131143, throughput 4.90754K wps
[Epoch 2 Batch 60/62] avg loss 0.0127111, throughput 4.80972K wps
Begin Testing...
[Epoch 2] train avg loss 0.0130898, dev acc 0.6578, dev avg loss 0.627036, throughput 4.86549K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0129182, throughput 4.91855K wps
[Epoch 3 Batch 60/62] avg loss 0.012611, throughput 4.82516K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129429, dev acc 0.6578, dev avg loss 0.621635, throughput 4.87735K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.012511, throughput 4.92431K wps
[Epoch 4 Batch 60/62] avg loss 0.012774, throughput 4.82258K wps
Begin Testing...
[Epoch 4] train avg loss 0.0127917, dev acc 0.6578, dev avg loss 0.617665, throughput 4.87767K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126339, throughput 4.9372K wps
[Epoch 5 Batch 60/62] avg loss 0.0123412, throughput 4.85609K wps
Begin Testing...
[Epoch 5] train avg loss 0.0127388, dev acc 0.6578, dev avg loss 0.615192, throughput 4.9026K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0124098, throughput 4.94665K wps
[Epoch 6 Batch 60/62] avg loss 0.0124742, throughput 4.81438K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125922, dev acc 0.6578, dev avg loss 0.608687, throughput 4.88587K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.012139, throughput 4.92463K wps
[Epoch 7 Batch 60/62] avg loss 0.0123457, throughput 4.84604K wps
Begin Testing...
[Epoch 7] train avg loss 0.0124253, dev acc 0.6578, dev avg loss 0.603569, throughput 4.89309K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0120979, throughput 4.96433K wps
[Epoch 8 Batch 60/62] avg loss 0.012389, throughput 4.85951K wps
Begin Testing...
[Epoch 8] train avg loss 0.0123714, dev acc 0.6578, dev avg loss 0.597638, throughput 4.91868K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0120174, throughput 4.97136K wps
[Epoch 9 Batch 60/62] avg loss 0.0120211, throughput 4.83321K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121663, dev acc 0.6578, dev avg loss 0.592273, throughput 4.90928K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.011856, throughput 4.95674K wps
[Epoch 10 Batch 60/62] avg loss 0.0118565, throughput 4.88317K wps
Begin Testing...
[Epoch 10] train avg loss 0.0119941, dev acc 0.6578, dev avg loss 0.585965, throughput 4.9276K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0117178, throughput 4.983K wps
[Epoch 11 Batch 60/62] avg loss 0.0117389, throughput 4.83758K wps
Begin Testing...
[Epoch 11] train avg loss 0.0118973, dev acc 0.6608, dev avg loss 0.580028, throughput 4.91614K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0116684, throughput 4.94783K wps
[Epoch 12 Batch 60/62] avg loss 0.0116354, throughput 4.84522K wps
Begin Testing...
[Epoch 12] train avg loss 0.0117947, dev acc 0.6667, dev avg loss 0.574216, throughput 4.90244K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0114751, throughput 4.94148K wps
[Epoch 13 Batch 60/62] avg loss 0.0113679, throughput 4.82077K wps
Begin Testing...
[Epoch 13] train avg loss 0.0115866, dev acc 0.6844, dev avg loss 0.568407, throughput 4.88765K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0112624, throughput 4.90345K wps
[Epoch 14 Batch 60/62] avg loss 0.011314, throughput 4.81803K wps
Begin Testing...
[Epoch 14] train avg loss 0.0114149, dev acc 0.6667, dev avg loss 0.560442, throughput 4.86764K wps
[Epoch 15 Batch 30/62] avg loss 0.0111431, throughput 4.90751K wps
[Epoch 15 Batch 60/62] avg loss 0.0110407, throughput 4.80937K wps
Begin Testing...
[Epoch 15] train avg loss 0.0112781, dev acc 0.7168, dev avg loss 0.554588, throughput 4.86492K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0108658, throughput 4.89896K wps
[Epoch 16 Batch 60/62] avg loss 0.0110889, throughput 4.79101K wps
Begin Testing...
[Epoch 16] train avg loss 0.0111059, dev acc 0.7080, dev avg loss 0.545946, throughput 4.85223K wps
[Epoch 17 Batch 30/62] avg loss 0.010854, throughput 4.94547K wps
[Epoch 17 Batch 60/62] avg loss 0.0107271, throughput 4.8455K wps
Begin Testing...
[Epoch 17] train avg loss 0.0109383, dev acc 0.7109, dev avg loss 0.53814, throughput 4.90236K wps
[Epoch 18 Batch 30/62] avg loss 0.0106334, throughput 4.96539K wps
[Epoch 18 Batch 60/62] avg loss 0.0103955, throughput 4.86196K wps
Begin Testing...
[Epoch 18] train avg loss 0.0106464, dev acc 0.7670, dev avg loss 0.534532, throughput 4.91807K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0105769, throughput 4.93878K wps
[Epoch 19 Batch 60/62] avg loss 0.0102713, throughput 4.82561K wps
Begin Testing...
[Epoch 19] train avg loss 0.0105479, dev acc 0.7286, dev avg loss 0.523381, throughput 4.88937K wps
[Epoch 20 Batch 30/62] avg loss 0.0101476, throughput 4.94664K wps
[Epoch 20 Batch 60/62] avg loss 0.0102713, throughput 4.8155K wps
Begin Testing...
[Epoch 20] train avg loss 0.0103163, dev acc 0.7316, dev avg loss 0.515692, throughput 4.88646K wps
[Epoch 21 Batch 30/62] avg loss 0.0100596, throughput 4.94281K wps
[Epoch 21 Batch 60/62] avg loss 0.00991204, throughput 4.81939K wps
Begin Testing...
[Epoch 21] train avg loss 0.0100657, dev acc 0.7493, dev avg loss 0.508243, throughput 4.88627K wps
[Epoch 22 Batch 30/62] avg loss 0.00983111, throughput 4.94969K wps
[Epoch 22 Batch 60/62] avg loss 0.00973906, throughput 4.86359K wps
Begin Testing...
[Epoch 22] train avg loss 0.00990731, dev acc 0.7581, dev avg loss 0.501392, throughput 4.91481K wps
[Epoch 23 Batch 30/62] avg loss 0.00978047, throughput 4.99017K wps
[Epoch 23 Batch 60/62] avg loss 0.00969917, throughput 4.87141K wps
Begin Testing...
[Epoch 23] train avg loss 0.00980323, dev acc 0.7522, dev avg loss 0.494938, throughput 4.93669K wps
[Epoch 24 Batch 30/62] avg loss 0.00941617, throughput 5.00375K wps
[Epoch 24 Batch 60/62] avg loss 0.00944548, throughput 4.87314K wps
Begin Testing...
[Epoch 24] train avg loss 0.00960487, dev acc 0.7906, dev avg loss 0.489172, throughput 4.94453K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00942747, throughput 4.9558K wps
[Epoch 25 Batch 60/62] avg loss 0.00915984, throughput 4.81326K wps
Begin Testing...
[Epoch 25] train avg loss 0.00942279, dev acc 0.7906, dev avg loss 0.483213, throughput 4.88973K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.00914885, throughput 4.92006K wps
[Epoch 26 Batch 60/62] avg loss 0.00907511, throughput 4.82309K wps
Begin Testing...
[Epoch 26] train avg loss 0.00925815, dev acc 0.7876, dev avg loss 0.476341, throughput 4.87881K wps
[Epoch 27 Batch 30/62] avg loss 0.00911732, throughput 4.93201K wps
[Epoch 27 Batch 60/62] avg loss 0.00891011, throughput 4.8429K wps
Begin Testing...
[Epoch 27] train avg loss 0.00911564, dev acc 0.7847, dev avg loss 0.470902, throughput 4.89469K wps
[Epoch 28 Batch 30/62] avg loss 0.00888765, throughput 4.97128K wps
[Epoch 28 Batch 60/62] avg loss 0.00882685, throughput 4.86595K wps
Begin Testing...
[Epoch 28] train avg loss 0.009011, dev acc 0.7906, dev avg loss 0.465449, throughput 4.92422K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00857678, throughput 4.99902K wps
[Epoch 29 Batch 60/62] avg loss 0.00879097, throughput 4.88205K wps
Begin Testing...
[Epoch 29] train avg loss 0.00876227, dev acc 0.7935, dev avg loss 0.460741, throughput 4.94669K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00852734, throughput 4.98966K wps
[Epoch 30 Batch 60/62] avg loss 0.0085544, throughput 4.88615K wps
Begin Testing...
[Epoch 30] train avg loss 0.00863088, dev acc 0.7994, dev avg loss 0.455899, throughput 4.94425K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00852633, throughput 5.01428K wps
[Epoch 31 Batch 60/62] avg loss 0.00817917, throughput 4.88244K wps
Begin Testing...
[Epoch 31] train avg loss 0.00840045, dev acc 0.8171, dev avg loss 0.451791, throughput 4.95142K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00830826, throughput 4.98734K wps
[Epoch 32 Batch 60/62] avg loss 0.00815798, throughput 4.85112K wps
Begin Testing...
[Epoch 32] train avg loss 0.00833965, dev acc 0.7817, dev avg loss 0.449873, throughput 4.92464K wps
[Epoch 33 Batch 30/62] avg loss 0.00806691, throughput 4.93371K wps
[Epoch 33 Batch 60/62] avg loss 0.00793426, throughput 4.82019K wps
Begin Testing...
[Epoch 33] train avg loss 0.0080791, dev acc 0.8024, dev avg loss 0.443174, throughput 4.88323K wps
[Epoch 34 Batch 30/62] avg loss 0.0080221, throughput 4.93084K wps
[Epoch 34 Batch 60/62] avg loss 0.00799321, throughput 4.82339K wps
Begin Testing...
[Epoch 34] train avg loss 0.00807433, dev acc 0.8053, dev avg loss 0.439763, throughput 4.8823K wps
[Epoch 35 Batch 30/62] avg loss 0.00795225, throughput 4.92709K wps
[Epoch 35 Batch 60/62] avg loss 0.00796381, throughput 4.81881K wps
Begin Testing...
[Epoch 35] train avg loss 0.00807117, dev acc 0.8230, dev avg loss 0.435796, throughput 4.87834K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.00769084, throughput 4.92491K wps
[Epoch 36 Batch 60/62] avg loss 0.00775013, throughput 4.81169K wps
Begin Testing...
[Epoch 36] train avg loss 0.00780168, dev acc 0.8289, dev avg loss 0.432152, throughput 4.87493K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/62] avg loss 0.0077268, throughput 4.92492K wps
[Epoch 37 Batch 60/62] avg loss 0.00747752, throughput 4.85262K wps
Begin Testing...
[Epoch 37] train avg loss 0.0077368, dev acc 0.8407, dev avg loss 0.433828, throughput 4.89709K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00763674, throughput 4.96175K wps
[Epoch 38 Batch 60/62] avg loss 0.00747545, throughput 4.86707K wps
Begin Testing...
[Epoch 38] train avg loss 0.0076778, dev acc 0.8348, dev avg loss 0.42528, throughput 4.92078K wps
[Epoch 39 Batch 30/62] avg loss 0.00721719, throughput 4.95179K wps
[Epoch 39 Batch 60/62] avg loss 0.00745152, throughput 4.83328K wps
Begin Testing...
[Epoch 39] train avg loss 0.00739613, dev acc 0.8260, dev avg loss 0.421856, throughput 4.89924K wps
[Epoch 40 Batch 30/62] avg loss 0.00711606, throughput 4.93667K wps
[Epoch 40 Batch 60/62] avg loss 0.00733891, throughput 4.84069K wps
Begin Testing...
[Epoch 40] train avg loss 0.00727777, dev acc 0.8230, dev avg loss 0.419058, throughput 4.89476K wps
[Epoch 41 Batch 30/62] avg loss 0.00735719, throughput 4.95526K wps
[Epoch 41 Batch 60/62] avg loss 0.00688198, throughput 4.82893K wps
Begin Testing...
[Epoch 41] train avg loss 0.00723291, dev acc 0.8289, dev avg loss 0.415544, throughput 4.89748K wps
[Epoch 42 Batch 30/62] avg loss 0.00704805, throughput 4.91509K wps
[Epoch 42 Batch 60/62] avg loss 0.006992, throughput 4.83418K wps
Begin Testing...
[Epoch 42] train avg loss 0.00710186, dev acc 0.8319, dev avg loss 0.412616, throughput 4.88174K wps
[Epoch 43 Batch 30/62] avg loss 0.00702233, throughput 4.92204K wps
[Epoch 43 Batch 60/62] avg loss 0.00692355, throughput 4.85231K wps
Begin Testing...
[Epoch 43] train avg loss 0.00704293, dev acc 0.8260, dev avg loss 0.410026, throughput 4.89559K wps
[Epoch 44 Batch 30/62] avg loss 0.00656073, throughput 4.99907K wps
[Epoch 44 Batch 60/62] avg loss 0.00706184, throughput 4.88568K wps
Begin Testing...
[Epoch 44] train avg loss 0.00687517, dev acc 0.8024, dev avg loss 0.411552, throughput 4.94767K wps
[Epoch 45 Batch 30/62] avg loss 0.00674713, throughput 5.01069K wps
[Epoch 45 Batch 60/62] avg loss 0.00678439, throughput 4.89716K wps
Begin Testing...
[Epoch 45] train avg loss 0.00683848, dev acc 0.8466, dev avg loss 0.403603, throughput 4.95995K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.00648496, throughput 5.00772K wps
[Epoch 46 Batch 60/62] avg loss 0.00663173, throughput 4.90284K wps
Begin Testing...
[Epoch 46] train avg loss 0.00658485, dev acc 0.8496, dev avg loss 0.401218, throughput 4.96208K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00654995, throughput 4.99535K wps
[Epoch 47 Batch 60/62] avg loss 0.00637602, throughput 4.88683K wps
Begin Testing...
[Epoch 47] train avg loss 0.00652224, dev acc 0.8496, dev avg loss 0.398997, throughput 4.9469K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00642541, throughput 4.93629K wps
[Epoch 48 Batch 60/62] avg loss 0.00609909, throughput 4.83209K wps
Begin Testing...
[Epoch 48] train avg loss 0.00629778, dev acc 0.8319, dev avg loss 0.398066, throughput 4.88993K wps
[Epoch 49 Batch 30/62] avg loss 0.00645494, throughput 4.93145K wps
[Epoch 49 Batch 60/62] avg loss 0.00603707, throughput 4.8367K wps
Begin Testing...
[Epoch 49] train avg loss 0.00629346, dev acc 0.8614, dev avg loss 0.394287, throughput 4.88884K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00644673, throughput 4.92666K wps
[Epoch 50 Batch 60/62] avg loss 0.0059623, throughput 4.82528K wps
Begin Testing...
[Epoch 50] train avg loss 0.00628544, dev acc 0.8584, dev avg loss 0.392237, throughput 4.88194K wps
[Epoch 51 Batch 30/62] avg loss 0.0061212, throughput 4.92191K wps
[Epoch 51 Batch 60/62] avg loss 0.00597464, throughput 4.8102K wps
Begin Testing...
[Epoch 51] train avg loss 0.00611578, dev acc 0.8437, dev avg loss 0.390014, throughput 4.87203K wps
[Epoch 52 Batch 30/62] avg loss 0.0058683, throughput 4.92008K wps
[Epoch 52 Batch 60/62] avg loss 0.00594484, throughput 4.813K wps
Begin Testing...
[Epoch 52] train avg loss 0.00594163, dev acc 0.8525, dev avg loss 0.387282, throughput 4.87234K wps
[Epoch 53 Batch 30/62] avg loss 0.0060605, throughput 4.90917K wps
[Epoch 53 Batch 60/62] avg loss 0.00570053, throughput 4.79912K wps
Begin Testing...
[Epoch 53] train avg loss 0.00594917, dev acc 0.8407, dev avg loss 0.386667, throughput 4.86088K wps
[Epoch 54 Batch 30/62] avg loss 0.00565474, throughput 4.88779K wps
[Epoch 54 Batch 60/62] avg loss 0.00579519, throughput 4.80175K wps
Begin Testing...
[Epoch 54] train avg loss 0.00579999, dev acc 0.8437, dev avg loss 0.384149, throughput 4.85088K wps
[Epoch 55 Batch 30/62] avg loss 0.00547354, throughput 4.91545K wps
[Epoch 55 Batch 60/62] avg loss 0.00565919, throughput 4.82608K wps
Begin Testing...
[Epoch 55] train avg loss 0.00565455, dev acc 0.8555, dev avg loss 0.381844, throughput 4.87838K wps
[Epoch 56 Batch 30/62] avg loss 0.00554203, throughput 4.92941K wps
[Epoch 56 Batch 60/62] avg loss 0.00567164, throughput 4.83086K wps
Begin Testing...
[Epoch 56] train avg loss 0.00567264, dev acc 0.8407, dev avg loss 0.37991, throughput 4.888K wps
[Epoch 57 Batch 30/62] avg loss 0.00538164, throughput 4.97093K wps
[Epoch 57 Batch 60/62] avg loss 0.00537435, throughput 4.84009K wps
Begin Testing...
[Epoch 57] train avg loss 0.0054187, dev acc 0.8466, dev avg loss 0.377683, throughput 4.91183K wps
[Epoch 58 Batch 30/62] avg loss 0.00529166, throughput 4.96996K wps
[Epoch 58 Batch 60/62] avg loss 0.00535392, throughput 4.84462K wps
Begin Testing...
[Epoch 58] train avg loss 0.0053731, dev acc 0.8496, dev avg loss 0.375997, throughput 4.91339K wps
[Epoch 59 Batch 30/62] avg loss 0.00531314, throughput 4.96756K wps
[Epoch 59 Batch 60/62] avg loss 0.00511158, throughput 4.87984K wps
Begin Testing...
[Epoch 59] train avg loss 0.00523788, dev acc 0.8466, dev avg loss 0.37487, throughput 4.93123K wps
[Epoch 60 Batch 30/62] avg loss 0.00529314, throughput 4.99879K wps
[Epoch 60 Batch 60/62] avg loss 0.00493589, throughput 4.8461K wps
Begin Testing...
[Epoch 60] train avg loss 0.00518825, dev acc 0.8525, dev avg loss 0.372305, throughput 4.92892K wps
[Epoch 61 Batch 30/62] avg loss 0.00526804, throughput 5.00548K wps
[Epoch 61 Batch 60/62] avg loss 0.00489864, throughput 4.89521K wps
Begin Testing...
[Epoch 61] train avg loss 0.00511618, dev acc 0.8466, dev avg loss 0.371046, throughput 4.95555K wps
[Epoch 62 Batch 30/62] avg loss 0.00486083, throughput 4.96423K wps
[Epoch 62 Batch 60/62] avg loss 0.0050685, throughput 4.88717K wps
Begin Testing...
[Epoch 62] train avg loss 0.00503187, dev acc 0.8437, dev avg loss 0.37023, throughput 4.93168K wps
[Epoch 63 Batch 30/62] avg loss 0.0047317, throughput 4.94543K wps
[Epoch 63 Batch 60/62] avg loss 0.00486321, throughput 4.83259K wps
Begin Testing...
[Epoch 63] train avg loss 0.00483214, dev acc 0.8437, dev avg loss 0.372009, throughput 4.89505K wps
[Epoch 64 Batch 30/62] avg loss 0.00480365, throughput 4.94802K wps
[Epoch 64 Batch 60/62] avg loss 0.0048448, throughput 4.85289K wps
Begin Testing...
[Epoch 64] train avg loss 0.0049005, dev acc 0.8466, dev avg loss 0.36937, throughput 4.90657K wps
[Epoch 65 Batch 30/62] avg loss 0.00477597, throughput 4.95011K wps
[Epoch 65 Batch 60/62] avg loss 0.00479248, throughput 4.82718K wps
Begin Testing...
[Epoch 65] train avg loss 0.00482836, dev acc 0.8496, dev avg loss 0.366561, throughput 4.89437K wps
[Epoch 66 Batch 30/62] avg loss 0.00478883, throughput 4.93443K wps
[Epoch 66 Batch 60/62] avg loss 0.00443601, throughput 4.81562K wps
Begin Testing...
[Epoch 66] train avg loss 0.00468764, dev acc 0.8466, dev avg loss 0.365553, throughput 4.88076K wps
[Epoch 67 Batch 30/62] avg loss 0.00463575, throughput 4.91518K wps
[Epoch 67 Batch 60/62] avg loss 0.00451125, throughput 4.8158K wps
Begin Testing...
[Epoch 67] train avg loss 0.00458485, dev acc 0.8466, dev avg loss 0.364397, throughput 4.8721K wps
[Epoch 68 Batch 30/62] avg loss 0.00453778, throughput 4.91721K wps
[Epoch 68 Batch 60/62] avg loss 0.00423871, throughput 4.83358K wps
Begin Testing...
[Epoch 68] train avg loss 0.00450812, dev acc 0.8378, dev avg loss 0.363332, throughput 4.88162K wps
[Epoch 69 Batch 30/62] avg loss 0.00443081, throughput 4.92979K wps
[Epoch 69 Batch 60/62] avg loss 0.00434729, throughput 4.83307K wps
Begin Testing...
[Epoch 69] train avg loss 0.00446857, dev acc 0.8466, dev avg loss 0.362154, throughput 4.88693K wps
[Epoch 70 Batch 30/62] avg loss 0.00414577, throughput 4.9387K wps
[Epoch 70 Batch 60/62] avg loss 0.00436839, throughput 4.81834K wps
Begin Testing...
[Epoch 70] train avg loss 0.00430083, dev acc 0.8407, dev avg loss 0.362881, throughput 4.88481K wps
[Epoch 71 Batch 30/62] avg loss 0.00426141, throughput 4.93993K wps
[Epoch 71 Batch 60/62] avg loss 0.00419056, throughput 4.83026K wps
Begin Testing...
[Epoch 71] train avg loss 0.00430248, dev acc 0.8407, dev avg loss 0.360272, throughput 4.89097K wps
[Epoch 72 Batch 30/62] avg loss 0.00405623, throughput 4.93689K wps
[Epoch 72 Batch 60/62] avg loss 0.00441676, throughput 4.83638K wps
Begin Testing...
[Epoch 72] train avg loss 0.0042803, dev acc 0.8525, dev avg loss 0.360113, throughput 4.89349K wps
[Epoch 73 Batch 30/62] avg loss 0.00420919, throughput 4.97743K wps
[Epoch 73 Batch 60/62] avg loss 0.00398447, throughput 4.84184K wps
Begin Testing...
[Epoch 73] train avg loss 0.00419579, dev acc 0.8525, dev avg loss 0.358493, throughput 4.91553K wps
[Epoch 74 Batch 30/62] avg loss 0.00413717, throughput 4.95602K wps
[Epoch 74 Batch 60/62] avg loss 0.0040902, throughput 4.8336K wps
Begin Testing...
[Epoch 74] train avg loss 0.00417022, dev acc 0.8407, dev avg loss 0.357759, throughput 4.89813K wps
[Epoch 75 Batch 30/62] avg loss 0.00381155, throughput 4.95569K wps
[Epoch 75 Batch 60/62] avg loss 0.00406991, throughput 4.83092K wps
Begin Testing...
[Epoch 75] train avg loss 0.00399059, dev acc 0.8407, dev avg loss 0.357296, throughput 4.90069K wps
[Epoch 76 Batch 30/62] avg loss 0.00401038, throughput 4.99725K wps
[Epoch 76 Batch 60/62] avg loss 0.00376797, throughput 4.88443K wps
Begin Testing...
[Epoch 76] train avg loss 0.00394205, dev acc 0.8496, dev avg loss 0.357044, throughput 4.94623K wps
[Epoch 77 Batch 30/62] avg loss 0.00365591, throughput 4.95647K wps
[Epoch 77 Batch 60/62] avg loss 0.00390241, throughput 4.80627K wps
Begin Testing...
[Epoch 77] train avg loss 0.00381919, dev acc 0.8437, dev avg loss 0.358821, throughput 4.88613K wps
[Epoch 78 Batch 30/62] avg loss 0.00374796, throughput 4.95962K wps
[Epoch 78 Batch 60/62] avg loss 0.00365315, throughput 4.842K wps
Begin Testing...
[Epoch 78] train avg loss 0.00372386, dev acc 0.8407, dev avg loss 0.355681, throughput 4.90719K wps
[Epoch 79 Batch 30/62] avg loss 0.00368263, throughput 4.97173K wps
[Epoch 79 Batch 60/62] avg loss 0.00380567, throughput 4.84837K wps
Begin Testing...
[Epoch 79] train avg loss 0.00383274, dev acc 0.8466, dev avg loss 0.354708, throughput 4.91606K wps
[Epoch 80 Batch 30/62] avg loss 0.00354907, throughput 4.99514K wps
[Epoch 80 Batch 60/62] avg loss 0.00352361, throughput 4.87986K wps
Begin Testing...
[Epoch 80] train avg loss 0.00363952, dev acc 0.8407, dev avg loss 0.354705, throughput 4.94413K wps
[Epoch 81 Batch 30/62] avg loss 0.00356058, throughput 4.99703K wps
[Epoch 81 Batch 60/62] avg loss 0.00338144, throughput 4.88228K wps
Begin Testing...
[Epoch 81] train avg loss 0.00355242, dev acc 0.8466, dev avg loss 0.353021, throughput 4.94349K wps
[Epoch 82 Batch 30/62] avg loss 0.00351976, throughput 4.9453K wps
[Epoch 82 Batch 60/62] avg loss 0.00347002, throughput 4.8465K wps
Begin Testing...
[Epoch 82] train avg loss 0.00355685, dev acc 0.8407, dev avg loss 0.353833, throughput 4.90213K wps
[Epoch 83 Batch 30/62] avg loss 0.00336547, throughput 4.95002K wps
[Epoch 83 Batch 60/62] avg loss 0.0034105, throughput 4.83815K wps
Begin Testing...
[Epoch 83] train avg loss 0.0033971, dev acc 0.8437, dev avg loss 0.352899, throughput 4.90069K wps
[Epoch 84 Batch 30/62] avg loss 0.0033147, throughput 4.94329K wps
[Epoch 84 Batch 60/62] avg loss 0.00330959, throughput 4.83035K wps
Begin Testing...
[Epoch 84] train avg loss 0.00330123, dev acc 0.8407, dev avg loss 0.353478, throughput 4.89324K wps
[Epoch 85 Batch 30/62] avg loss 0.00342343, throughput 4.92091K wps
[Epoch 85 Batch 60/62] avg loss 0.00336696, throughput 4.81191K wps
Begin Testing...
[Epoch 85] train avg loss 0.003411, dev acc 0.8407, dev avg loss 0.351944, throughput 4.87372K wps
[Epoch 86 Batch 30/62] avg loss 0.00309623, throughput 4.91385K wps
[Epoch 86 Batch 60/62] avg loss 0.00322478, throughput 4.80638K wps
Begin Testing...
[Epoch 86] train avg loss 0.00325449, dev acc 0.8437, dev avg loss 0.352392, throughput 4.86637K wps
[Epoch 87 Batch 30/62] avg loss 0.00316812, throughput 4.93078K wps
[Epoch 87 Batch 60/62] avg loss 0.00308811, throughput 4.80881K wps
Begin Testing...
[Epoch 87] train avg loss 0.00318161, dev acc 0.8437, dev avg loss 0.352448, throughput 4.87586K wps
[Epoch 88 Batch 30/62] avg loss 0.00311262, throughput 4.91366K wps
[Epoch 88 Batch 60/62] avg loss 0.00306878, throughput 4.80646K wps
Begin Testing...
[Epoch 88] train avg loss 0.00312277, dev acc 0.8437, dev avg loss 0.352271, throughput 4.86649K wps
[Epoch 89 Batch 30/62] avg loss 0.00295679, throughput 4.92021K wps
[Epoch 89 Batch 60/62] avg loss 0.00292111, throughput 4.84798K wps
Begin Testing...
[Epoch 89] train avg loss 0.0029535, dev acc 0.8466, dev avg loss 0.351522, throughput 4.89225K wps
[Epoch 90 Batch 30/62] avg loss 0.00300239, throughput 4.98061K wps
[Epoch 90 Batch 60/62] avg loss 0.00301484, throughput 4.86048K wps
Begin Testing...
[Epoch 90] train avg loss 0.00303325, dev acc 0.8437, dev avg loss 0.351212, throughput 4.92615K wps
[Epoch 91 Batch 30/62] avg loss 0.00285484, throughput 4.96444K wps
[Epoch 91 Batch 60/62] avg loss 0.00276226, throughput 4.88205K wps
Begin Testing...
[Epoch 91] train avg loss 0.00285164, dev acc 0.8437, dev avg loss 0.353126, throughput 4.92925K wps
[Epoch 92 Batch 30/62] avg loss 0.00276605, throughput 4.93075K wps
[Epoch 92 Batch 60/62] avg loss 0.00295193, throughput 4.87154K wps
Begin Testing...
[Epoch 92] train avg loss 0.00287803, dev acc 0.8466, dev avg loss 0.349943, throughput 4.90765K wps
[Epoch 93 Batch 30/62] avg loss 0.00292737, throughput 4.98114K wps
[Epoch 93 Batch 60/62] avg loss 0.00288181, throughput 4.88295K wps
Begin Testing...
[Epoch 93] train avg loss 0.00293453, dev acc 0.8466, dev avg loss 0.349526, throughput 4.93835K wps
[Epoch 94 Batch 30/62] avg loss 0.00273114, throughput 4.98189K wps
[Epoch 94 Batch 60/62] avg loss 0.00285864, throughput 4.86187K wps
Begin Testing...
[Epoch 94] train avg loss 0.0028002, dev acc 0.8378, dev avg loss 0.352589, throughput 4.92796K wps
[Epoch 95 Batch 30/62] avg loss 0.00266493, throughput 5.01354K wps
[Epoch 95 Batch 60/62] avg loss 0.00285881, throughput 4.8768K wps
Begin Testing...
[Epoch 95] train avg loss 0.00276669, dev acc 0.8466, dev avg loss 0.350496, throughput 4.95026K wps
[Epoch 96 Batch 30/62] avg loss 0.0027769, throughput 5.00844K wps
[Epoch 96 Batch 60/62] avg loss 0.00265431, throughput 4.89474K wps
Begin Testing...
[Epoch 96] train avg loss 0.00275578, dev acc 0.8437, dev avg loss 0.349482, throughput 4.9566K wps
[Epoch 97 Batch 30/62] avg loss 0.00266908, throughput 4.99962K wps
[Epoch 97 Batch 60/62] avg loss 0.00273447, throughput 4.86209K wps
Begin Testing...
[Epoch 97] train avg loss 0.00272961, dev acc 0.8437, dev avg loss 0.350264, throughput 4.93776K wps
[Epoch 98 Batch 30/62] avg loss 0.00249694, throughput 4.99667K wps
[Epoch 98 Batch 60/62] avg loss 0.0025881, throughput 4.87052K wps
Begin Testing...
[Epoch 98] train avg loss 0.00257121, dev acc 0.8466, dev avg loss 0.348928, throughput 4.93934K wps
[Epoch 99 Batch 30/62] avg loss 0.00258846, throughput 4.96955K wps
[Epoch 99 Batch 60/62] avg loss 0.0023763, throughput 4.8443K wps
Begin Testing...
[Epoch 99] train avg loss 0.00249262, dev acc 0.8437, dev avg loss 0.350701, throughput 4.91245K wps
[Epoch 100 Batch 30/62] avg loss 0.00266446, throughput 4.93755K wps
[Epoch 100 Batch 60/62] avg loss 0.0025372, throughput 4.8076K wps
Begin Testing...
[Epoch 100] train avg loss 0.00261628, dev acc 0.8437, dev avg loss 0.348268, throughput 4.87825K wps
[Epoch 101 Batch 30/62] avg loss 0.0025367, throughput 4.91362K wps
[Epoch 101 Batch 60/62] avg loss 0.00233247, throughput 4.80395K wps
Begin Testing...
[Epoch 101] train avg loss 0.00244079, dev acc 0.8466, dev avg loss 0.348395, throughput 4.86524K wps
[Epoch 102 Batch 30/62] avg loss 0.00225427, throughput 4.91425K wps
[Epoch 102 Batch 60/62] avg loss 0.00256333, throughput 4.81672K wps
Begin Testing...
[Epoch 102] train avg loss 0.00242556, dev acc 0.8437, dev avg loss 0.349441, throughput 4.87259K wps
[Epoch 103 Batch 30/62] avg loss 0.00250779, throughput 4.95057K wps
[Epoch 103 Batch 60/62] avg loss 0.00219747, throughput 4.82579K wps
Begin Testing...
[Epoch 103] train avg loss 0.00238739, dev acc 0.8496, dev avg loss 0.350481, throughput 4.89357K wps
[Epoch 104 Batch 30/62] avg loss 0.00240916, throughput 4.92922K wps
[Epoch 104 Batch 60/62] avg loss 0.00251769, throughput 4.83213K wps
Begin Testing...
[Epoch 104] train avg loss 0.00248382, dev acc 0.8496, dev avg loss 0.348282, throughput 4.88535K wps
[Epoch 105 Batch 30/62] avg loss 0.00221225, throughput 4.95104K wps
[Epoch 105 Batch 60/62] avg loss 0.0022128, throughput 4.84164K wps
Begin Testing...
[Epoch 105] train avg loss 0.00224152, dev acc 0.8407, dev avg loss 0.350242, throughput 4.90262K wps
[Epoch 106 Batch 30/62] avg loss 0.00233728, throughput 4.94611K wps
[Epoch 106 Batch 60/62] avg loss 0.00224039, throughput 4.84013K wps
Begin Testing...
[Epoch 106] train avg loss 0.00229687, dev acc 0.8437, dev avg loss 0.34867, throughput 4.89872K wps
[Epoch 107 Batch 30/62] avg loss 0.00225437, throughput 4.91885K wps
[Epoch 107 Batch 60/62] avg loss 0.00220924, throughput 4.84004K wps
Begin Testing...
[Epoch 107] train avg loss 0.00225053, dev acc 0.8437, dev avg loss 0.348049, throughput 4.88648K wps
[Epoch 108 Batch 30/62] avg loss 0.00216341, throughput 4.93503K wps
[Epoch 108 Batch 60/62] avg loss 0.00220482, throughput 4.83519K wps
Begin Testing...
[Epoch 108] train avg loss 0.00222561, dev acc 0.8437, dev avg loss 0.347792, throughput 4.8923K wps
[Epoch 109 Batch 30/62] avg loss 0.00221858, throughput 4.96444K wps
[Epoch 109 Batch 60/62] avg loss 0.00204664, throughput 4.85365K wps
Begin Testing...
[Epoch 109] train avg loss 0.00216634, dev acc 0.8437, dev avg loss 0.347468, throughput 4.91621K wps
[Epoch 110 Batch 30/62] avg loss 0.00211299, throughput 4.9819K wps
[Epoch 110 Batch 60/62] avg loss 0.00217528, throughput 4.89478K wps
Begin Testing...
[Epoch 110] train avg loss 0.00215911, dev acc 0.8437, dev avg loss 0.350516, throughput 4.94412K wps
[Epoch 111 Batch 30/62] avg loss 0.00204163, throughput 5.0051K wps
[Epoch 111 Batch 60/62] avg loss 0.00204946, throughput 4.87199K wps
Begin Testing...
[Epoch 111] train avg loss 0.00207851, dev acc 0.8437, dev avg loss 0.349691, throughput 4.9443K wps
[Epoch 112 Batch 30/62] avg loss 0.00219151, throughput 4.97937K wps
[Epoch 112 Batch 60/62] avg loss 0.00193395, throughput 4.83891K wps
Begin Testing...
[Epoch 112] train avg loss 0.0020797, dev acc 0.8466, dev avg loss 0.349071, throughput 4.91449K wps
[Epoch 113 Batch 30/62] avg loss 0.00185756, throughput 4.92668K wps
[Epoch 113 Batch 60/62] avg loss 0.00227281, throughput 4.82965K wps
Begin Testing...
[Epoch 113] train avg loss 0.00207999, dev acc 0.8466, dev avg loss 0.348281, throughput 4.88443K wps
[Epoch 114 Batch 30/62] avg loss 0.00207339, throughput 4.94805K wps
[Epoch 114 Batch 60/62] avg loss 0.00200178, throughput 4.82463K wps
Begin Testing...
[Epoch 114] train avg loss 0.00203964, dev acc 0.8437, dev avg loss 0.350811, throughput 4.89191K wps
[Epoch 115 Batch 30/62] avg loss 0.00190435, throughput 4.94052K wps
[Epoch 115 Batch 60/62] avg loss 0.00196101, throughput 4.85925K wps
Begin Testing...
[Epoch 115] train avg loss 0.00195265, dev acc 0.8437, dev avg loss 0.349799, throughput 4.90664K wps
[Epoch 116 Batch 30/62] avg loss 0.00202683, throughput 4.96542K wps
[Epoch 116 Batch 60/62] avg loss 0.00194908, throughput 4.83845K wps
Begin Testing...
[Epoch 116] train avg loss 0.00201555, dev acc 0.8437, dev avg loss 0.35071, throughput 4.90694K wps
[Epoch 117 Batch 30/62] avg loss 0.00188112, throughput 4.93533K wps
[Epoch 117 Batch 60/62] avg loss 0.00190433, throughput 4.82199K wps
Begin Testing...
[Epoch 117] train avg loss 0.00192981, dev acc 0.8437, dev avg loss 0.349579, throughput 4.88625K wps
[Epoch 118 Batch 30/62] avg loss 0.00179031, throughput 4.9534K wps
[Epoch 118 Batch 60/62] avg loss 0.00184408, throughput 4.85357K wps
Begin Testing...
[Epoch 118] train avg loss 0.00184715, dev acc 0.8496, dev avg loss 0.349197, throughput 4.91011K wps
[Epoch 119 Batch 30/62] avg loss 0.0018467, throughput 4.95836K wps
[Epoch 119 Batch 60/62] avg loss 0.00195131, throughput 4.82135K wps
Begin Testing...
[Epoch 119] train avg loss 0.0019174, dev acc 0.8496, dev avg loss 0.34938, throughput 4.89557K wps
[Epoch 120 Batch 30/62] avg loss 0.00174138, throughput 4.92404K wps
[Epoch 120 Batch 60/62] avg loss 0.00189005, throughput 4.83672K wps
Begin Testing...
[Epoch 120] train avg loss 0.00181342, dev acc 0.8466, dev avg loss 0.350598, throughput 4.88642K wps
[Epoch 121 Batch 30/62] avg loss 0.00186556, throughput 4.93213K wps
[Epoch 121 Batch 60/62] avg loss 0.00186851, throughput 4.81689K wps
Begin Testing...
[Epoch 121] train avg loss 0.00192182, dev acc 0.8496, dev avg loss 0.350228, throughput 4.88097K wps
[Epoch 122 Batch 30/62] avg loss 0.00173175, throughput 4.95089K wps
[Epoch 122 Batch 60/62] avg loss 0.00167545, throughput 4.8472K wps
Begin Testing...
[Epoch 122] train avg loss 0.00171627, dev acc 0.8437, dev avg loss 0.351786, throughput 4.90538K wps
[Epoch 123 Batch 30/62] avg loss 0.00154638, throughput 4.93864K wps
[Epoch 123 Batch 60/62] avg loss 0.00191505, throughput 4.82067K wps
Begin Testing...
[Epoch 123] train avg loss 0.00177597, dev acc 0.8437, dev avg loss 0.351687, throughput 4.88616K wps
[Epoch 124 Batch 30/62] avg loss 0.00163649, throughput 4.95002K wps
[Epoch 124 Batch 60/62] avg loss 0.00164046, throughput 4.81533K wps
Begin Testing...
[Epoch 124] train avg loss 0.00166286, dev acc 0.8407, dev avg loss 0.358712, throughput 4.88624K wps
[Epoch 125 Batch 30/62] avg loss 0.00169855, throughput 4.93753K wps
[Epoch 125 Batch 60/62] avg loss 0.00173084, throughput 4.82285K wps
Begin Testing...
[Epoch 125] train avg loss 0.00172408, dev acc 0.8378, dev avg loss 0.352381, throughput 4.88628K wps
[Epoch 126 Batch 30/62] avg loss 0.00171935, throughput 4.9451K wps
[Epoch 126 Batch 60/62] avg loss 0.00166761, throughput 4.82999K wps
Begin Testing...
[Epoch 126] train avg loss 0.00171062, dev acc 0.8496, dev avg loss 0.351637, throughput 4.89133K wps
[Epoch 127 Batch 30/62] avg loss 0.00173234, throughput 4.95469K wps
[Epoch 127 Batch 60/62] avg loss 0.00159783, throughput 4.84502K wps
Begin Testing...
[Epoch 127] train avg loss 0.00171778, dev acc 0.8437, dev avg loss 0.351481, throughput 4.90779K wps
[Epoch 128 Batch 30/62] avg loss 0.00170307, throughput 4.99641K wps
[Epoch 128 Batch 60/62] avg loss 0.0015285, throughput 4.87499K wps
Begin Testing...
[Epoch 128] train avg loss 0.00167144, dev acc 0.8496, dev avg loss 0.351187, throughput 4.94113K wps
[Epoch 129 Batch 30/62] avg loss 0.0016276, throughput 4.99166K wps
[Epoch 129 Batch 60/62] avg loss 0.00151691, throughput 4.87459K wps
Begin Testing...
[Epoch 129] train avg loss 0.00157574, dev acc 0.8437, dev avg loss 0.351997, throughput 4.94039K wps
[Epoch 130 Batch 30/62] avg loss 0.00151492, throughput 4.97738K wps
[Epoch 130 Batch 60/62] avg loss 0.00166109, throughput 4.87039K wps
Begin Testing...
[Epoch 130] train avg loss 0.00160659, dev acc 0.8466, dev avg loss 0.351465, throughput 4.92849K wps
[Epoch 131 Batch 30/62] avg loss 0.0015093, throughput 4.959K wps
[Epoch 131 Batch 60/62] avg loss 0.00153226, throughput 4.86967K wps
Begin Testing...
[Epoch 131] train avg loss 0.00154246, dev acc 0.8466, dev avg loss 0.35187, throughput 4.92068K wps
[Epoch 132 Batch 30/62] avg loss 0.00151585, throughput 4.96654K wps
[Epoch 132 Batch 60/62] avg loss 0.00153933, throughput 4.84197K wps
Begin Testing...
[Epoch 132] train avg loss 0.00153433, dev acc 0.8525, dev avg loss 0.352151, throughput 4.90845K wps
[Epoch 133 Batch 30/62] avg loss 0.00137105, throughput 4.94573K wps
[Epoch 133 Batch 60/62] avg loss 0.00152102, throughput 4.80624K wps
Begin Testing...
[Epoch 133] train avg loss 0.00146122, dev acc 0.8496, dev avg loss 0.352523, throughput 4.88148K wps
[Epoch 134 Batch 30/62] avg loss 0.00140535, throughput 4.93863K wps
[Epoch 134 Batch 60/62] avg loss 0.00146177, throughput 4.82453K wps
Begin Testing...
[Epoch 134] train avg loss 0.00144072, dev acc 0.8466, dev avg loss 0.35351, throughput 4.8872K wps
[Epoch 135 Batch 30/62] avg loss 0.00148532, throughput 4.94644K wps
[Epoch 135 Batch 60/62] avg loss 0.00141497, throughput 4.80587K wps
Begin Testing...
[Epoch 135] train avg loss 0.00146994, dev acc 0.8407, dev avg loss 0.356317, throughput 4.88173K wps
[Epoch 136 Batch 30/62] avg loss 0.00139965, throughput 4.92054K wps
[Epoch 136 Batch 60/62] avg loss 0.00138634, throughput 4.81824K wps
Begin Testing...
[Epoch 136] train avg loss 0.00144197, dev acc 0.8496, dev avg loss 0.354137, throughput 4.87566K wps
[Epoch 137 Batch 30/62] avg loss 0.00142291, throughput 4.9244K wps
[Epoch 137 Batch 60/62] avg loss 0.0013689, throughput 4.8139K wps
Begin Testing...
[Epoch 137] train avg loss 0.00140128, dev acc 0.8437, dev avg loss 0.357839, throughput 4.87549K wps
[Epoch 138 Batch 30/62] avg loss 0.00137965, throughput 4.91047K wps
[Epoch 138 Batch 60/62] avg loss 0.00140079, throughput 4.81K wps
Begin Testing...
[Epoch 138] train avg loss 0.00140398, dev acc 0.8466, dev avg loss 0.355615, throughput 4.86644K wps
[Epoch 139 Batch 30/62] avg loss 0.00135756, throughput 4.92773K wps
[Epoch 139 Batch 60/62] avg loss 0.00135865, throughput 4.85054K wps
Begin Testing...
[Epoch 139] train avg loss 0.0013593, dev acc 0.8496, dev avg loss 0.356362, throughput 4.89749K wps
[Epoch 140 Batch 30/62] avg loss 0.00138732, throughput 4.96423K wps
[Epoch 140 Batch 60/62] avg loss 0.00129008, throughput 4.87623K wps
Begin Testing...
[Epoch 140] train avg loss 0.00134989, dev acc 0.8466, dev avg loss 0.358376, throughput 4.9246K wps
[Epoch 141 Batch 30/62] avg loss 0.00126411, throughput 4.9878K wps
[Epoch 141 Batch 60/62] avg loss 0.00128757, throughput 4.8767K wps
Begin Testing...
[Epoch 141] train avg loss 0.00128717, dev acc 0.8496, dev avg loss 0.356884, throughput 4.93833K wps
[Epoch 142 Batch 30/62] avg loss 0.00130336, throughput 4.98411K wps
[Epoch 142 Batch 60/62] avg loss 0.00139027, throughput 4.84638K wps
Begin Testing...
[Epoch 142] train avg loss 0.00137737, dev acc 0.8496, dev avg loss 0.356912, throughput 4.92191K wps
[Epoch 143 Batch 30/62] avg loss 0.0012369, throughput 4.94393K wps
[Epoch 143 Batch 60/62] avg loss 0.00135572, throughput 4.83782K wps
Begin Testing...
[Epoch 143] train avg loss 0.00130874, dev acc 0.8466, dev avg loss 0.356329, throughput 4.89736K wps
[Epoch 144 Batch 30/62] avg loss 0.00129946, throughput 4.91423K wps
[Epoch 144 Batch 60/62] avg loss 0.00124438, throughput 4.84373K wps
Begin Testing...
[Epoch 144] train avg loss 0.00127794, dev acc 0.8496, dev avg loss 0.357492, throughput 4.88573K wps
[Epoch 145 Batch 30/62] avg loss 0.00131352, throughput 4.98539K wps
[Epoch 145 Batch 60/62] avg loss 0.00127124, throughput 4.88406K wps
Begin Testing...
[Epoch 145] train avg loss 0.0013178, dev acc 0.8525, dev avg loss 0.356814, throughput 4.94199K wps
[Epoch 146 Batch 30/62] avg loss 0.00129553, throughput 4.97824K wps
[Epoch 146 Batch 60/62] avg loss 0.00125453, throughput 4.9067K wps
Begin Testing...
[Epoch 146] train avg loss 0.00128637, dev acc 0.8496, dev avg loss 0.355671, throughput 4.94862K wps
[Epoch 147 Batch 30/62] avg loss 0.00121659, throughput 4.95565K wps
[Epoch 147 Batch 60/62] avg loss 0.00111632, throughput 4.8543K wps
Begin Testing...
[Epoch 147] train avg loss 0.00117113, dev acc 0.8496, dev avg loss 0.360914, throughput 4.91071K wps
[Epoch 148 Batch 30/62] avg loss 0.00115521, throughput 4.96182K wps
[Epoch 148 Batch 60/62] avg loss 0.00121702, throughput 4.81703K wps
Begin Testing...
[Epoch 148] train avg loss 0.00119355, dev acc 0.8525, dev avg loss 0.35991, throughput 4.89443K wps
[Epoch 149 Batch 30/62] avg loss 0.00117206, throughput 4.94295K wps
[Epoch 149 Batch 60/62] avg loss 0.00120765, throughput 4.8375K wps
Begin Testing...
[Epoch 149] train avg loss 0.00118535, dev acc 0.8496, dev avg loss 0.357121, throughput 4.89807K wps
[Epoch 150 Batch 30/62] avg loss 0.00111982, throughput 4.90553K wps
[Epoch 150 Batch 60/62] avg loss 0.00118642, throughput 4.83369K wps
Begin Testing...
[Epoch 150] train avg loss 0.00115492, dev acc 0.8525, dev avg loss 0.35665, throughput 4.87505K wps
[Epoch 151 Batch 30/62] avg loss 0.00124492, throughput 4.93082K wps
[Epoch 151 Batch 60/62] avg loss 0.00112874, throughput 4.82807K wps
Begin Testing...
[Epoch 151] train avg loss 0.00119589, dev acc 0.8525, dev avg loss 0.357495, throughput 4.8858K wps
[Epoch 152 Batch 30/62] avg loss 0.00116532, throughput 4.90913K wps
[Epoch 152 Batch 60/62] avg loss 0.00111146, throughput 4.81639K wps
Begin Testing...
[Epoch 152] train avg loss 0.00113922, dev acc 0.8555, dev avg loss 0.357356, throughput 4.86947K wps
[Epoch 153 Batch 30/62] avg loss 0.00113347, throughput 4.90328K wps
[Epoch 153 Batch 60/62] avg loss 0.00118935, throughput 4.78307K wps
Begin Testing...
[Epoch 153] train avg loss 0.00116122, dev acc 0.8525, dev avg loss 0.357285, throughput 4.84887K wps
[Epoch 154 Batch 30/62] avg loss 0.00113739, throughput 4.91939K wps
[Epoch 154 Batch 60/62] avg loss 0.00117978, throughput 4.82049K wps
Begin Testing...
[Epoch 154] train avg loss 0.00119309, dev acc 0.8525, dev avg loss 0.357901, throughput 4.87778K wps
[Epoch 155 Batch 30/62] avg loss 0.00108323, throughput 4.92427K wps
[Epoch 155 Batch 60/62] avg loss 0.00102655, throughput 4.80942K wps
Begin Testing...
[Epoch 155] train avg loss 0.00108046, dev acc 0.8525, dev avg loss 0.358078, throughput 4.87262K wps
[Epoch 156 Batch 30/62] avg loss 0.00109749, throughput 4.93054K wps
[Epoch 156 Batch 60/62] avg loss 0.000978527, throughput 4.82163K wps
Begin Testing...
[Epoch 156] train avg loss 0.00107738, dev acc 0.8525, dev avg loss 0.360744, throughput 4.88305K wps
[Epoch 157 Batch 30/62] avg loss 0.00106146, throughput 4.92396K wps
[Epoch 157 Batch 60/62] avg loss 0.00109502, throughput 4.81531K wps
Begin Testing...
[Epoch 157] train avg loss 0.00109032, dev acc 0.8525, dev avg loss 0.359366, throughput 4.87676K wps
[Epoch 158 Batch 30/62] avg loss 0.00100976, throughput 4.93388K wps
[Epoch 158 Batch 60/62] avg loss 0.00109244, throughput 4.83047K wps
Begin Testing...
[Epoch 158] train avg loss 0.00106008, dev acc 0.8525, dev avg loss 0.358859, throughput 4.8894K wps
[Epoch 159 Batch 30/62] avg loss 0.00115116, throughput 4.96556K wps
[Epoch 159 Batch 60/62] avg loss 0.00103097, throughput 4.83154K wps
Begin Testing...
[Epoch 159] train avg loss 0.00112737, dev acc 0.8555, dev avg loss 0.361037, throughput 4.90394K wps
[Epoch 160 Batch 30/62] avg loss 0.000995165, throughput 4.93615K wps
[Epoch 160 Batch 60/62] avg loss 0.00108051, throughput 4.87051K wps
Begin Testing...
[Epoch 160] train avg loss 0.0010414, dev acc 0.8525, dev avg loss 0.360073, throughput 4.90947K wps
[Epoch 161 Batch 30/62] avg loss 0.00113781, throughput 4.95071K wps
[Epoch 161 Batch 60/62] avg loss 0.0009962, throughput 4.82959K wps
Begin Testing...
[Epoch 161] train avg loss 0.00108412, dev acc 0.8555, dev avg loss 0.360752, throughput 4.89489K wps
[Epoch 162 Batch 30/62] avg loss 0.00107973, throughput 4.9582K wps
[Epoch 162 Batch 60/62] avg loss 0.00105766, throughput 4.85533K wps
Begin Testing...
[Epoch 162] train avg loss 0.0010755, dev acc 0.8555, dev avg loss 0.361056, throughput 4.91523K wps
[Epoch 163 Batch 30/62] avg loss 0.000954702, throughput 4.99701K wps
[Epoch 163 Batch 60/62] avg loss 0.0010329, throughput 4.84904K wps
Begin Testing...
[Epoch 163] train avg loss 0.000997671, dev acc 0.8555, dev avg loss 0.36188, throughput 4.9291K wps
[Epoch 164 Batch 30/62] avg loss 0.00103137, throughput 4.94752K wps
[Epoch 164 Batch 60/62] avg loss 0.000957481, throughput 4.85579K wps
Begin Testing...
[Epoch 164] train avg loss 0.00100551, dev acc 0.8584, dev avg loss 0.363017, throughput 4.90693K wps
[Epoch 165 Batch 30/62] avg loss 0.00096823, throughput 4.9663K wps
[Epoch 165 Batch 60/62] avg loss 0.00095885, throughput 4.86394K wps
Begin Testing...
[Epoch 165] train avg loss 0.000972986, dev acc 0.8496, dev avg loss 0.362533, throughput 4.92154K wps
[Epoch 166 Batch 30/62] avg loss 0.00093336, throughput 4.97759K wps
[Epoch 166 Batch 60/62] avg loss 0.000978057, throughput 4.8451K wps
Begin Testing...
[Epoch 166] train avg loss 0.000958312, dev acc 0.8525, dev avg loss 0.365094, throughput 4.91595K wps
[Epoch 167 Batch 30/62] avg loss 0.000921609, throughput 4.92539K wps
[Epoch 167 Batch 60/62] avg loss 0.000946178, throughput 4.82443K wps
Begin Testing...
[Epoch 167] train avg loss 0.000943818, dev acc 0.8496, dev avg loss 0.36619, throughput 4.88185K wps
[Epoch 168 Batch 30/62] avg loss 0.00096899, throughput 4.95566K wps
[Epoch 168 Batch 60/62] avg loss 0.000863765, throughput 4.87652K wps
Begin Testing...
[Epoch 168] train avg loss 0.000931802, dev acc 0.8584, dev avg loss 0.364129, throughput 4.92381K wps
[Epoch 169 Batch 30/62] avg loss 0.000893015, throughput 4.97857K wps
[Epoch 169 Batch 60/62] avg loss 0.000958698, throughput 4.81617K wps
Begin Testing...
[Epoch 169] train avg loss 0.000934032, dev acc 0.8466, dev avg loss 0.366345, throughput 4.90199K wps
[Epoch 170 Batch 30/62] avg loss 0.000959072, throughput 4.96556K wps
[Epoch 170 Batch 60/62] avg loss 0.000917472, throughput 4.87051K wps
Begin Testing...
[Epoch 170] train avg loss 0.000948289, dev acc 0.8584, dev avg loss 0.365403, throughput 4.92343K wps
[Epoch 171 Batch 30/62] avg loss 0.000994773, throughput 4.96308K wps
[Epoch 171 Batch 60/62] avg loss 0.000862364, throughput 4.83866K wps
Begin Testing...
[Epoch 171] train avg loss 0.000976503, dev acc 0.8496, dev avg loss 0.369055, throughput 4.90544K wps
[Epoch 172 Batch 30/62] avg loss 0.000868121, throughput 4.94763K wps
[Epoch 172 Batch 60/62] avg loss 0.000960358, throughput 4.82497K wps
Begin Testing...
[Epoch 172] train avg loss 0.000915413, dev acc 0.8584, dev avg loss 0.366375, throughput 4.89243K wps
[Epoch 173 Batch 30/62] avg loss 0.000917094, throughput 4.93078K wps
[Epoch 173 Batch 60/62] avg loss 0.000945143, throughput 4.83696K wps
Begin Testing...
[Epoch 173] train avg loss 0.000940101, dev acc 0.8525, dev avg loss 0.365112, throughput 4.89081K wps
[Epoch 174 Batch 30/62] avg loss 0.000896324, throughput 4.94221K wps
[Epoch 174 Batch 60/62] avg loss 0.000851704, throughput 4.84918K wps
Begin Testing...
[Epoch 174] train avg loss 0.000891067, dev acc 0.8555, dev avg loss 0.368111, throughput 4.9025K wps
[Epoch 175 Batch 30/62] avg loss 0.000868067, throughput 4.9426K wps
[Epoch 175 Batch 60/62] avg loss 0.000913061, throughput 4.82721K wps
Begin Testing...
[Epoch 175] train avg loss 0.000923215, dev acc 0.8496, dev avg loss 0.371215, throughput 4.89045K wps
[Epoch 176 Batch 30/62] avg loss 0.000797484, throughput 4.921K wps
[Epoch 176 Batch 60/62] avg loss 0.000814462, throughput 4.81908K wps
Begin Testing...
[Epoch 176] train avg loss 0.000816225, dev acc 0.8496, dev avg loss 0.366647, throughput 4.87747K wps
[Epoch 177 Batch 30/62] avg loss 0.000859596, throughput 4.92901K wps
[Epoch 177 Batch 60/62] avg loss 0.000891217, throughput 4.82504K wps
Begin Testing...
[Epoch 177] train avg loss 0.000890222, dev acc 0.8496, dev avg loss 0.37035, throughput 4.88293K wps
[Epoch 178 Batch 30/62] avg loss 0.000786948, throughput 4.94083K wps
[Epoch 178 Batch 60/62] avg loss 0.000859913, throughput 4.82102K wps
Begin Testing...
[Epoch 178] train avg loss 0.000823962, dev acc 0.8496, dev avg loss 0.368031, throughput 4.88602K wps
[Epoch 179 Batch 30/62] avg loss 0.000810704, throughput 4.92852K wps
[Epoch 179 Batch 60/62] avg loss 0.000869481, throughput 4.85688K wps
Begin Testing...
[Epoch 179] train avg loss 0.000844819, dev acc 0.8525, dev avg loss 0.373012, throughput 4.9001K wps
[Epoch 180 Batch 30/62] avg loss 0.000816938, throughput 4.97936K wps
[Epoch 180 Batch 60/62] avg loss 0.000794958, throughput 4.8591K wps
Begin Testing...
[Epoch 180] train avg loss 0.00082581, dev acc 0.8525, dev avg loss 0.368243, throughput 4.92548K wps
[Epoch 181 Batch 30/62] avg loss 0.000853153, throughput 4.94448K wps
[Epoch 181 Batch 60/62] avg loss 0.000727315, throughput 4.8515K wps
Begin Testing...
[Epoch 181] train avg loss 0.000789228, dev acc 0.8584, dev avg loss 0.370819, throughput 4.90492K wps
[Epoch 182 Batch 30/62] avg loss 0.00079335, throughput 4.97788K wps
[Epoch 182 Batch 60/62] avg loss 0.000825143, throughput 4.85278K wps
Begin Testing...
[Epoch 182] train avg loss 0.000816184, dev acc 0.8525, dev avg loss 0.370168, throughput 4.92023K wps
[Epoch 183 Batch 30/62] avg loss 0.000828182, throughput 4.95386K wps
[Epoch 183 Batch 60/62] avg loss 0.000827687, throughput 4.84305K wps
Begin Testing...
[Epoch 183] train avg loss 0.000826228, dev acc 0.8555, dev avg loss 0.371023, throughput 4.90409K wps
[Epoch 184 Batch 30/62] avg loss 0.000888575, throughput 4.96377K wps
[Epoch 184 Batch 60/62] avg loss 0.000708759, throughput 4.86405K wps
Begin Testing...
[Epoch 184] train avg loss 0.000804688, dev acc 0.8584, dev avg loss 0.370419, throughput 4.91946K wps
[Epoch 185 Batch 30/62] avg loss 0.000826821, throughput 4.93522K wps
[Epoch 185 Batch 60/62] avg loss 0.000759842, throughput 4.83059K wps
Begin Testing...
[Epoch 185] train avg loss 0.000823646, dev acc 0.8555, dev avg loss 0.372117, throughput 4.88932K wps
[Epoch 186 Batch 30/62] avg loss 0.000725019, throughput 4.96701K wps
[Epoch 186 Batch 60/62] avg loss 0.000823976, throughput 4.80762K wps
Begin Testing...
[Epoch 186] train avg loss 0.000787668, dev acc 0.8496, dev avg loss 0.367978, throughput 4.89042K wps
[Epoch 187 Batch 30/62] avg loss 0.000703557, throughput 4.94235K wps
[Epoch 187 Batch 60/62] avg loss 0.000778443, throughput 4.82023K wps
Begin Testing...
[Epoch 187] train avg loss 0.000742432, dev acc 0.8525, dev avg loss 0.368494, throughput 4.88713K wps
[Epoch 188 Batch 30/62] avg loss 0.000762836, throughput 4.94468K wps
[Epoch 188 Batch 60/62] avg loss 0.000775368, throughput 4.82243K wps
Begin Testing...
[Epoch 188] train avg loss 0.000785495, dev acc 0.8525, dev avg loss 0.368539, throughput 4.88961K wps
[Epoch 189 Batch 30/62] avg loss 0.000738442, throughput 4.94597K wps
[Epoch 189 Batch 60/62] avg loss 0.000696508, throughput 4.8761K wps
Begin Testing...
[Epoch 189] train avg loss 0.000724864, dev acc 0.8555, dev avg loss 0.368528, throughput 4.91811K wps
[Epoch 190 Batch 30/62] avg loss 0.000742363, throughput 4.97295K wps
[Epoch 190 Batch 60/62] avg loss 0.000780218, throughput 4.87509K wps
Begin Testing...
[Epoch 190] train avg loss 0.000767977, dev acc 0.8525, dev avg loss 0.369934, throughput 4.9301K wps
[Epoch 191 Batch 30/62] avg loss 0.000728441, throughput 4.98429K wps
[Epoch 191 Batch 60/62] avg loss 0.000720712, throughput 4.8624K wps
Begin Testing...
[Epoch 191] train avg loss 0.000728509, dev acc 0.8584, dev avg loss 0.374563, throughput 4.92966K wps
[Epoch 192 Batch 30/62] avg loss 0.000657113, throughput 4.98295K wps
[Epoch 192 Batch 60/62] avg loss 0.000727565, throughput 4.87962K wps
Begin Testing...
[Epoch 192] train avg loss 0.000702351, dev acc 0.8525, dev avg loss 0.373479, throughput 4.93642K wps
[Epoch 193 Batch 30/62] avg loss 0.000782234, throughput 4.96919K wps
[Epoch 193 Batch 60/62] avg loss 0.000709226, throughput 4.85692K wps
Begin Testing...
[Epoch 193] train avg loss 0.000758948, dev acc 0.8555, dev avg loss 0.372147, throughput 4.9199K wps
[Epoch 194 Batch 30/62] avg loss 0.000685212, throughput 4.95842K wps
[Epoch 194 Batch 60/62] avg loss 0.000668869, throughput 4.86029K wps
Begin Testing...
[Epoch 194] train avg loss 0.000694237, dev acc 0.8525, dev avg loss 0.373718, throughput 4.91478K wps
[Epoch 195 Batch 30/62] avg loss 0.000700579, throughput 4.94875K wps
[Epoch 195 Batch 60/62] avg loss 0.000686889, throughput 4.85977K wps
Begin Testing...
[Epoch 195] train avg loss 0.000694562, dev acc 0.8496, dev avg loss 0.372612, throughput 4.91126K wps
[Epoch 196 Batch 30/62] avg loss 0.000673388, throughput 4.95205K wps
[Epoch 196 Batch 60/62] avg loss 0.000734129, throughput 4.86091K wps
Begin Testing...
[Epoch 196] train avg loss 0.000731211, dev acc 0.8525, dev avg loss 0.373387, throughput 4.91324K wps
[Epoch 197 Batch 30/62] avg loss 0.000703814, throughput 4.95098K wps
[Epoch 197 Batch 60/62] avg loss 0.00074255, throughput 4.84678K wps
Begin Testing...
[Epoch 197] train avg loss 0.000735178, dev acc 0.8525, dev avg loss 0.37139, throughput 4.90535K wps
[Epoch 198 Batch 30/62] avg loss 0.000686603, throughput 4.95307K wps
[Epoch 198 Batch 60/62] avg loss 0.000642552, throughput 4.83903K wps
Begin Testing...
[Epoch 198] train avg loss 0.000663716, dev acc 0.8555, dev avg loss 0.372659, throughput 4.90265K wps
[Epoch 199 Batch 30/62] avg loss 0.000657044, throughput 4.96319K wps
[Epoch 199 Batch 60/62] avg loss 0.000654131, throughput 4.8699K wps
Begin Testing...
[Epoch 199] train avg loss 0.000653103, dev acc 0.8525, dev avg loss 0.373135, throughput 4.92341K wps
Test loss 0.445847, test acc 0.7772
Total time cost 276.37s
[Epoch 0 Batch 30/62] avg loss 0.0134706, throughput 4.75854K wps
[Epoch 0 Batch 60/62] avg loss 0.0129551, throughput 4.87265K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133837, dev acc 0.6254, dev avg loss 0.659225, throughput 4.82585K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0132918, throughput 4.96658K wps
[Epoch 1 Batch 60/62] avg loss 0.0128748, throughput 4.84966K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132739, dev acc 0.6254, dev avg loss 0.654024, throughput 4.91398K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130849, throughput 4.94237K wps
[Epoch 2 Batch 60/62] avg loss 0.01295, throughput 4.86405K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131942, dev acc 0.6254, dev avg loss 0.650198, throughput 4.91064K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0128868, throughput 4.95951K wps
[Epoch 3 Batch 60/62] avg loss 0.0127222, throughput 4.85791K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129564, dev acc 0.6254, dev avg loss 0.646126, throughput 4.91619K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0127841, throughput 4.99654K wps
[Epoch 4 Batch 60/62] avg loss 0.0127041, throughput 4.8774K wps
Begin Testing...
[Epoch 4] train avg loss 0.0128932, dev acc 0.6254, dev avg loss 0.640362, throughput 4.9433K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0124445, throughput 4.99817K wps
[Epoch 5 Batch 60/62] avg loss 0.0126121, throughput 4.84229K wps
Begin Testing...
[Epoch 5] train avg loss 0.0126888, dev acc 0.6254, dev avg loss 0.63685, throughput 4.92598K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0124684, throughput 4.9856K wps
[Epoch 6 Batch 60/62] avg loss 0.0125496, throughput 4.85682K wps
Begin Testing...
[Epoch 6] train avg loss 0.0126553, dev acc 0.6254, dev avg loss 0.632051, throughput 4.92473K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0124778, throughput 4.95854K wps
[Epoch 7 Batch 60/62] avg loss 0.012366, throughput 4.83342K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126409, dev acc 0.6254, dev avg loss 0.627369, throughput 4.90328K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0121037, throughput 4.95992K wps
[Epoch 8 Batch 60/62] avg loss 0.0124303, throughput 4.83665K wps
Begin Testing...
[Epoch 8] train avg loss 0.0124259, dev acc 0.6254, dev avg loss 0.623135, throughput 4.90223K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0122366, throughput 4.97369K wps
[Epoch 9 Batch 60/62] avg loss 0.0119893, throughput 4.87065K wps
Begin Testing...
[Epoch 9] train avg loss 0.0123107, dev acc 0.6254, dev avg loss 0.619085, throughput 4.92602K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0121199, throughput 4.95401K wps
[Epoch 10 Batch 60/62] avg loss 0.0118135, throughput 4.85362K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121773, dev acc 0.6283, dev avg loss 0.612611, throughput 4.91114K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.011643, throughput 4.95287K wps
[Epoch 11 Batch 60/62] avg loss 0.0119089, throughput 4.83567K wps
Begin Testing...
[Epoch 11] train avg loss 0.0119296, dev acc 0.6342, dev avg loss 0.607151, throughput 4.90032K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.011633, throughput 4.95135K wps
[Epoch 12 Batch 60/62] avg loss 0.0118256, throughput 4.85284K wps
Begin Testing...
[Epoch 12] train avg loss 0.011916, dev acc 0.6342, dev avg loss 0.601686, throughput 4.90984K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0113966, throughput 4.9582K wps
[Epoch 13 Batch 60/62] avg loss 0.0116523, throughput 4.84626K wps
Begin Testing...
[Epoch 13] train avg loss 0.0116776, dev acc 0.6431, dev avg loss 0.595741, throughput 4.90884K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0113373, throughput 4.94901K wps
[Epoch 14 Batch 60/62] avg loss 0.0115476, throughput 4.83862K wps
Begin Testing...
[Epoch 14] train avg loss 0.0115572, dev acc 0.6342, dev avg loss 0.591327, throughput 4.8992K wps
[Epoch 15 Batch 30/62] avg loss 0.0111671, throughput 4.95536K wps
[Epoch 15 Batch 60/62] avg loss 0.0112085, throughput 4.83676K wps
Begin Testing...
[Epoch 15] train avg loss 0.0113436, dev acc 0.6637, dev avg loss 0.583186, throughput 4.90101K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0113295, throughput 4.9449K wps
[Epoch 16 Batch 60/62] avg loss 0.0107763, throughput 4.82762K wps
Begin Testing...
[Epoch 16] train avg loss 0.0112475, dev acc 0.6578, dev avg loss 0.576755, throughput 4.89296K wps
[Epoch 17 Batch 30/62] avg loss 0.0107808, throughput 4.92486K wps
[Epoch 17 Batch 60/62] avg loss 0.0108972, throughput 4.84697K wps
Begin Testing...
[Epoch 17] train avg loss 0.0109932, dev acc 0.7168, dev avg loss 0.570049, throughput 4.89199K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0105767, throughput 4.96818K wps
[Epoch 18 Batch 60/62] avg loss 0.0106915, throughput 4.84052K wps
Begin Testing...
[Epoch 18] train avg loss 0.0107434, dev acc 0.6608, dev avg loss 0.565536, throughput 4.90875K wps
[Epoch 19 Batch 30/62] avg loss 0.0105921, throughput 4.94897K wps
[Epoch 19 Batch 60/62] avg loss 0.0103693, throughput 4.84601K wps
Begin Testing...
[Epoch 19] train avg loss 0.0105871, dev acc 0.6844, dev avg loss 0.55747, throughput 4.90334K wps
[Epoch 20 Batch 30/62] avg loss 0.0105721, throughput 4.94757K wps
[Epoch 20 Batch 60/62] avg loss 0.0103449, throughput 4.83536K wps
Begin Testing...
[Epoch 20] train avg loss 0.0105768, dev acc 0.7227, dev avg loss 0.549455, throughput 4.89726K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0104088, throughput 4.9324K wps
[Epoch 21 Batch 60/62] avg loss 0.00983998, throughput 4.83637K wps
Begin Testing...
[Epoch 21] train avg loss 0.0102335, dev acc 0.7109, dev avg loss 0.544983, throughput 4.89215K wps
[Epoch 22 Batch 30/62] avg loss 0.00993747, throughput 4.94616K wps
[Epoch 22 Batch 60/62] avg loss 0.0100598, throughput 4.85168K wps
Begin Testing...
[Epoch 22] train avg loss 0.0101369, dev acc 0.7493, dev avg loss 0.535799, throughput 4.90453K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.00982563, throughput 4.94258K wps
[Epoch 23 Batch 60/62] avg loss 0.00970588, throughput 4.82811K wps
Begin Testing...
[Epoch 23] train avg loss 0.00989492, dev acc 0.7611, dev avg loss 0.52966, throughput 4.89178K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.00942273, throughput 4.97041K wps
[Epoch 24 Batch 60/62] avg loss 0.00984804, throughput 4.85337K wps
Begin Testing...
[Epoch 24] train avg loss 0.00980315, dev acc 0.7670, dev avg loss 0.52512, throughput 4.91864K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00955282, throughput 4.94208K wps
[Epoch 25 Batch 60/62] avg loss 0.00931093, throughput 4.84013K wps
Begin Testing...
[Epoch 25] train avg loss 0.00957621, dev acc 0.7670, dev avg loss 0.518638, throughput 4.8971K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.00944053, throughput 4.96069K wps
[Epoch 26 Batch 60/62] avg loss 0.00937216, throughput 4.84864K wps
Begin Testing...
[Epoch 26] train avg loss 0.00948912, dev acc 0.7729, dev avg loss 0.512446, throughput 4.91083K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.00916912, throughput 4.96311K wps
[Epoch 27 Batch 60/62] avg loss 0.00943207, throughput 4.85392K wps
Begin Testing...
[Epoch 27] train avg loss 0.00938943, dev acc 0.7788, dev avg loss 0.507328, throughput 4.9145K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00906152, throughput 4.97133K wps
[Epoch 28 Batch 60/62] avg loss 0.00901025, throughput 4.88267K wps
Begin Testing...
[Epoch 28] train avg loss 0.00916377, dev acc 0.7788, dev avg loss 0.502227, throughput 4.93423K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00900117, throughput 5.00783K wps
[Epoch 29 Batch 60/62] avg loss 0.00862273, throughput 4.89035K wps
Begin Testing...
[Epoch 29] train avg loss 0.00895519, dev acc 0.7876, dev avg loss 0.497231, throughput 4.95411K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00887763, throughput 4.97538K wps
[Epoch 30 Batch 60/62] avg loss 0.00874105, throughput 4.84972K wps
Begin Testing...
[Epoch 30] train avg loss 0.00884184, dev acc 0.7935, dev avg loss 0.492596, throughput 4.91843K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00867114, throughput 4.96679K wps
[Epoch 31 Batch 60/62] avg loss 0.0085996, throughput 4.84774K wps
Begin Testing...
[Epoch 31] train avg loss 0.00871733, dev acc 0.7847, dev avg loss 0.488208, throughput 4.9137K wps
[Epoch 32 Batch 30/62] avg loss 0.008422, throughput 4.96206K wps
[Epoch 32 Batch 60/62] avg loss 0.00836393, throughput 4.84681K wps
Begin Testing...
[Epoch 32] train avg loss 0.00860112, dev acc 0.7906, dev avg loss 0.485368, throughput 4.91042K wps
[Epoch 33 Batch 30/62] avg loss 0.00853696, throughput 4.95742K wps
[Epoch 33 Batch 60/62] avg loss 0.00817685, throughput 4.85415K wps
Begin Testing...
[Epoch 33] train avg loss 0.0084511, dev acc 0.7994, dev avg loss 0.479706, throughput 4.9123K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00810995, throughput 4.94368K wps
[Epoch 34 Batch 60/62] avg loss 0.00846587, throughput 4.84246K wps
Begin Testing...
[Epoch 34] train avg loss 0.00842264, dev acc 0.7935, dev avg loss 0.477036, throughput 4.89905K wps
[Epoch 35 Batch 30/62] avg loss 0.00798978, throughput 4.95344K wps
[Epoch 35 Batch 60/62] avg loss 0.00817183, throughput 4.8349K wps
Begin Testing...
[Epoch 35] train avg loss 0.00828204, dev acc 0.7935, dev avg loss 0.47177, throughput 4.90015K wps
[Epoch 36 Batch 30/62] avg loss 0.00784715, throughput 4.93642K wps
[Epoch 36 Batch 60/62] avg loss 0.00814026, throughput 4.82957K wps
Begin Testing...
[Epoch 36] train avg loss 0.00821548, dev acc 0.7994, dev avg loss 0.468544, throughput 4.88943K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/62] avg loss 0.00775845, throughput 4.95278K wps
[Epoch 37 Batch 60/62] avg loss 0.00790624, throughput 4.83753K wps
Begin Testing...
[Epoch 37] train avg loss 0.00785014, dev acc 0.7994, dev avg loss 0.464808, throughput 4.90111K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00780501, throughput 4.94411K wps
[Epoch 38 Batch 60/62] avg loss 0.00772035, throughput 4.81847K wps
Begin Testing...
[Epoch 38] train avg loss 0.00791175, dev acc 0.7994, dev avg loss 0.463287, throughput 4.88775K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00793211, throughput 4.95885K wps
[Epoch 39 Batch 60/62] avg loss 0.00728981, throughput 4.85129K wps
Begin Testing...
[Epoch 39] train avg loss 0.00771048, dev acc 0.8024, dev avg loss 0.458037, throughput 4.90965K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00730127, throughput 4.96431K wps
[Epoch 40 Batch 60/62] avg loss 0.00765224, throughput 4.84139K wps
Begin Testing...
[Epoch 40] train avg loss 0.00752689, dev acc 0.8053, dev avg loss 0.454424, throughput 4.90852K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00716525, throughput 4.93865K wps
[Epoch 41 Batch 60/62] avg loss 0.00751035, throughput 4.83269K wps
Begin Testing...
[Epoch 41] train avg loss 0.00756227, dev acc 0.7906, dev avg loss 0.45437, throughput 4.89227K wps
[Epoch 42 Batch 30/62] avg loss 0.00718865, throughput 4.97136K wps
[Epoch 42 Batch 60/62] avg loss 0.00710109, throughput 4.88323K wps
Begin Testing...
[Epoch 42] train avg loss 0.00721657, dev acc 0.7965, dev avg loss 0.450651, throughput 4.93404K wps
[Epoch 43 Batch 30/62] avg loss 0.00688348, throughput 4.96889K wps
[Epoch 43 Batch 60/62] avg loss 0.00720956, throughput 4.88198K wps
Begin Testing...
[Epoch 43] train avg loss 0.00719003, dev acc 0.8053, dev avg loss 0.449707, throughput 4.93187K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/62] avg loss 0.00686995, throughput 4.98221K wps
[Epoch 44 Batch 60/62] avg loss 0.00715091, throughput 4.86015K wps
Begin Testing...
[Epoch 44] train avg loss 0.00720162, dev acc 0.8053, dev avg loss 0.443359, throughput 4.92732K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.00698267, throughput 5.00072K wps
[Epoch 45 Batch 60/62] avg loss 0.00704772, throughput 4.87321K wps
Begin Testing...
[Epoch 45] train avg loss 0.00704719, dev acc 0.8053, dev avg loss 0.439385, throughput 4.94422K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.00661607, throughput 4.96515K wps
[Epoch 46 Batch 60/62] avg loss 0.00706479, throughput 4.86372K wps
Begin Testing...
[Epoch 46] train avg loss 0.00690881, dev acc 0.8024, dev avg loss 0.437411, throughput 4.92127K wps
[Epoch 47 Batch 30/62] avg loss 0.00653454, throughput 4.9488K wps
[Epoch 47 Batch 60/62] avg loss 0.0068761, throughput 4.83906K wps
Begin Testing...
[Epoch 47] train avg loss 0.00674193, dev acc 0.8053, dev avg loss 0.435103, throughput 4.90006K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00652381, throughput 4.93536K wps
[Epoch 48 Batch 60/62] avg loss 0.00669766, throughput 4.85997K wps
Begin Testing...
[Epoch 48] train avg loss 0.00669959, dev acc 0.8024, dev avg loss 0.432921, throughput 4.90419K wps
[Epoch 49 Batch 30/62] avg loss 0.00627218, throughput 4.97963K wps
[Epoch 49 Batch 60/62] avg loss 0.00668108, throughput 4.83767K wps
Begin Testing...
[Epoch 49] train avg loss 0.00653645, dev acc 0.8053, dev avg loss 0.430318, throughput 4.91469K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00623241, throughput 4.96818K wps
[Epoch 50 Batch 60/62] avg loss 0.0064874, throughput 4.81644K wps
Begin Testing...
[Epoch 50] train avg loss 0.00639442, dev acc 0.8053, dev avg loss 0.429322, throughput 4.89698K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.00656816, throughput 4.9853K wps
[Epoch 51 Batch 60/62] avg loss 0.00619026, throughput 4.86601K wps
Begin Testing...
[Epoch 51] train avg loss 0.00645695, dev acc 0.8053, dev avg loss 0.426748, throughput 4.93195K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00611277, throughput 4.97481K wps
[Epoch 52 Batch 60/62] avg loss 0.00620101, throughput 4.8224K wps
Begin Testing...
[Epoch 52] train avg loss 0.00621039, dev acc 0.8053, dev avg loss 0.425508, throughput 4.90468K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00606458, throughput 4.95896K wps
[Epoch 53 Batch 60/62] avg loss 0.00619586, throughput 4.82425K wps
Begin Testing...
[Epoch 53] train avg loss 0.00617487, dev acc 0.8083, dev avg loss 0.426367, throughput 4.89501K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/62] avg loss 0.0059809, throughput 4.94363K wps
[Epoch 54 Batch 60/62] avg loss 0.005915, throughput 4.81871K wps
Begin Testing...
[Epoch 54] train avg loss 0.00607304, dev acc 0.8083, dev avg loss 0.423829, throughput 4.88706K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/62] avg loss 0.00602652, throughput 4.95029K wps
[Epoch 55 Batch 60/62] avg loss 0.00573924, throughput 4.82829K wps
Begin Testing...
[Epoch 55] train avg loss 0.00595474, dev acc 0.8112, dev avg loss 0.422312, throughput 4.89414K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00562883, throughput 4.9409K wps
[Epoch 56 Batch 60/62] avg loss 0.0060351, throughput 4.83132K wps
Begin Testing...
[Epoch 56] train avg loss 0.00592533, dev acc 0.7847, dev avg loss 0.427068, throughput 4.89368K wps
[Epoch 57 Batch 30/62] avg loss 0.00572816, throughput 4.93071K wps
[Epoch 57 Batch 60/62] avg loss 0.0057001, throughput 4.80334K wps
Begin Testing...
[Epoch 57] train avg loss 0.0057795, dev acc 0.8053, dev avg loss 0.417849, throughput 4.87295K wps
[Epoch 58 Batch 30/62] avg loss 0.00563366, throughput 4.95836K wps
[Epoch 58 Batch 60/62] avg loss 0.00559766, throughput 4.8332K wps
Begin Testing...
[Epoch 58] train avg loss 0.00564304, dev acc 0.8142, dev avg loss 0.417833, throughput 4.89986K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00530504, throughput 4.9514K wps
[Epoch 59 Batch 60/62] avg loss 0.00569789, throughput 4.82956K wps
Begin Testing...
[Epoch 59] train avg loss 0.00552043, dev acc 0.8083, dev avg loss 0.415171, throughput 4.89648K wps
[Epoch 60 Batch 30/62] avg loss 0.00519799, throughput 4.97311K wps
[Epoch 60 Batch 60/62] avg loss 0.00540238, throughput 4.8186K wps
Begin Testing...
[Epoch 60] train avg loss 0.00531453, dev acc 0.8083, dev avg loss 0.415869, throughput 4.89866K wps
[Epoch 61 Batch 30/62] avg loss 0.00520161, throughput 4.96423K wps
[Epoch 61 Batch 60/62] avg loss 0.00530917, throughput 4.81524K wps
Begin Testing...
[Epoch 61] train avg loss 0.00536195, dev acc 0.7994, dev avg loss 0.411152, throughput 4.89554K wps
[Epoch 62 Batch 30/62] avg loss 0.00511577, throughput 4.94294K wps
[Epoch 62 Batch 60/62] avg loss 0.0050287, throughput 4.85104K wps
Begin Testing...
[Epoch 62] train avg loss 0.00514656, dev acc 0.7994, dev avg loss 0.409533, throughput 4.90315K wps
[Epoch 63 Batch 30/62] avg loss 0.00522282, throughput 4.95142K wps
[Epoch 63 Batch 60/62] avg loss 0.00497292, throughput 4.84177K wps
Begin Testing...
[Epoch 63] train avg loss 0.00516275, dev acc 0.8112, dev avg loss 0.408573, throughput 4.90241K wps
[Epoch 64 Batch 30/62] avg loss 0.00495015, throughput 4.95806K wps
[Epoch 64 Batch 60/62] avg loss 0.00511433, throughput 4.82595K wps
Begin Testing...
[Epoch 64] train avg loss 0.00510507, dev acc 0.8112, dev avg loss 0.41242, throughput 4.89796K wps
[Epoch 65 Batch 30/62] avg loss 0.00496258, throughput 4.93512K wps
[Epoch 65 Batch 60/62] avg loss 0.00485047, throughput 4.83458K wps
Begin Testing...
[Epoch 65] train avg loss 0.00498315, dev acc 0.8142, dev avg loss 0.41051, throughput 4.89042K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00494901, throughput 4.94414K wps
[Epoch 66 Batch 60/62] avg loss 0.00484198, throughput 4.86123K wps
Begin Testing...
[Epoch 66] train avg loss 0.00504742, dev acc 0.8142, dev avg loss 0.410058, throughput 4.90738K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/62] avg loss 0.00470837, throughput 4.96446K wps
[Epoch 67 Batch 60/62] avg loss 0.00484249, throughput 4.90287K wps
Begin Testing...
[Epoch 67] train avg loss 0.00489348, dev acc 0.8083, dev avg loss 0.405858, throughput 4.9402K wps
[Epoch 68 Batch 30/62] avg loss 0.0044217, throughput 5.00959K wps
[Epoch 68 Batch 60/62] avg loss 0.00478192, throughput 4.87645K wps
Begin Testing...
[Epoch 68] train avg loss 0.00468815, dev acc 0.8024, dev avg loss 0.404429, throughput 4.94694K wps
[Epoch 69 Batch 30/62] avg loss 0.00476182, throughput 4.94591K wps
[Epoch 69 Batch 60/62] avg loss 0.00449231, throughput 4.85656K wps
Begin Testing...
[Epoch 69] train avg loss 0.00465758, dev acc 0.8083, dev avg loss 0.40307, throughput 4.90683K wps
[Epoch 70 Batch 30/62] avg loss 0.00474549, throughput 4.96286K wps
[Epoch 70 Batch 60/62] avg loss 0.00435531, throughput 4.82771K wps
Begin Testing...
[Epoch 70] train avg loss 0.00458304, dev acc 0.8112, dev avg loss 0.40217, throughput 4.90053K wps
[Epoch 71 Batch 30/62] avg loss 0.00425191, throughput 4.9285K wps
[Epoch 71 Batch 60/62] avg loss 0.00436058, throughput 4.81208K wps
Begin Testing...
[Epoch 71] train avg loss 0.00435568, dev acc 0.8053, dev avg loss 0.401875, throughput 4.87654K wps
[Epoch 72 Batch 30/62] avg loss 0.00422864, throughput 4.91897K wps
[Epoch 72 Batch 60/62] avg loss 0.00432583, throughput 4.79597K wps
Begin Testing...
[Epoch 72] train avg loss 0.00437135, dev acc 0.8053, dev avg loss 0.402211, throughput 4.86364K wps
[Epoch 73 Batch 30/62] avg loss 0.00410648, throughput 4.9256K wps
[Epoch 73 Batch 60/62] avg loss 0.00443074, throughput 4.8362K wps
Begin Testing...
[Epoch 73] train avg loss 0.0043137, dev acc 0.8053, dev avg loss 0.400952, throughput 4.88879K wps
[Epoch 74 Batch 30/62] avg loss 0.00419856, throughput 4.96567K wps
[Epoch 74 Batch 60/62] avg loss 0.00413073, throughput 4.85525K wps
Begin Testing...
[Epoch 74] train avg loss 0.00420222, dev acc 0.8053, dev avg loss 0.400588, throughput 4.91494K wps
[Epoch 75 Batch 30/62] avg loss 0.00410424, throughput 4.96924K wps
[Epoch 75 Batch 60/62] avg loss 0.00419665, throughput 4.85804K wps
Begin Testing...
[Epoch 75] train avg loss 0.00423848, dev acc 0.8053, dev avg loss 0.39975, throughput 4.92064K wps
[Epoch 76 Batch 30/62] avg loss 0.00402235, throughput 4.97175K wps
[Epoch 76 Batch 60/62] avg loss 0.00397224, throughput 4.81468K wps
Begin Testing...
[Epoch 76] train avg loss 0.00401739, dev acc 0.8053, dev avg loss 0.398501, throughput 4.89995K wps
[Epoch 77 Batch 30/62] avg loss 0.00377993, throughput 4.96099K wps
[Epoch 77 Batch 60/62] avg loss 0.00399721, throughput 4.85052K wps
Begin Testing...
[Epoch 77] train avg loss 0.00399311, dev acc 0.7965, dev avg loss 0.401297, throughput 4.91163K wps
[Epoch 78 Batch 30/62] avg loss 0.00379461, throughput 4.95843K wps
[Epoch 78 Batch 60/62] avg loss 0.00402223, throughput 4.84302K wps
Begin Testing...
[Epoch 78] train avg loss 0.00398325, dev acc 0.8024, dev avg loss 0.398378, throughput 4.90567K wps
[Epoch 79 Batch 30/62] avg loss 0.00366093, throughput 4.96728K wps
[Epoch 79 Batch 60/62] avg loss 0.00399671, throughput 4.81963K wps
Begin Testing...
[Epoch 79] train avg loss 0.00387192, dev acc 0.8053, dev avg loss 0.399183, throughput 4.89846K wps
[Epoch 80 Batch 30/62] avg loss 0.00357164, throughput 4.97125K wps
[Epoch 80 Batch 60/62] avg loss 0.00388346, throughput 4.87615K wps
Begin Testing...
[Epoch 80] train avg loss 0.00387709, dev acc 0.8053, dev avg loss 0.403471, throughput 4.9303K wps
[Epoch 81 Batch 30/62] avg loss 0.00362104, throughput 4.979K wps
[Epoch 81 Batch 60/62] avg loss 0.00367893, throughput 4.89652K wps
Begin Testing...
[Epoch 81] train avg loss 0.00368641, dev acc 0.8024, dev avg loss 0.400964, throughput 4.94501K wps
[Epoch 82 Batch 30/62] avg loss 0.00341301, throughput 4.97926K wps
[Epoch 82 Batch 60/62] avg loss 0.00376025, throughput 4.90012K wps
Begin Testing...
[Epoch 82] train avg loss 0.00359869, dev acc 0.8053, dev avg loss 0.397775, throughput 4.94647K wps
[Epoch 83 Batch 30/62] avg loss 0.00368801, throughput 4.98412K wps
[Epoch 83 Batch 60/62] avg loss 0.00338556, throughput 4.84156K wps
Begin Testing...
[Epoch 83] train avg loss 0.00359642, dev acc 0.7965, dev avg loss 0.401001, throughput 4.91908K wps
[Epoch 84 Batch 30/62] avg loss 0.00346718, throughput 4.95662K wps
[Epoch 84 Batch 60/62] avg loss 0.0037078, throughput 4.84896K wps
Begin Testing...
[Epoch 84] train avg loss 0.00361995, dev acc 0.8053, dev avg loss 0.398495, throughput 4.90751K wps
[Epoch 85 Batch 30/62] avg loss 0.00351964, throughput 4.93957K wps
[Epoch 85 Batch 60/62] avg loss 0.00322615, throughput 4.79K wps
Begin Testing...
[Epoch 85] train avg loss 0.00342052, dev acc 0.7994, dev avg loss 0.399554, throughput 4.87257K wps
[Epoch 86 Batch 30/62] avg loss 0.00350842, throughput 4.95368K wps
[Epoch 86 Batch 60/62] avg loss 0.00320764, throughput 4.85364K wps
Begin Testing...
[Epoch 86] train avg loss 0.00340112, dev acc 0.8024, dev avg loss 0.398922, throughput 4.90917K wps
[Epoch 87 Batch 30/62] avg loss 0.00329742, throughput 4.95836K wps
[Epoch 87 Batch 60/62] avg loss 0.00342291, throughput 4.86814K wps
Begin Testing...
[Epoch 87] train avg loss 0.00340632, dev acc 0.8083, dev avg loss 0.399263, throughput 4.92156K wps
[Epoch 88 Batch 30/62] avg loss 0.00344166, throughput 4.996K wps
[Epoch 88 Batch 60/62] avg loss 0.00308395, throughput 4.9018K wps
Begin Testing...
[Epoch 88] train avg loss 0.0033505, dev acc 0.8053, dev avg loss 0.398709, throughput 4.95576K wps
[Epoch 89 Batch 30/62] avg loss 0.00333992, throughput 4.95844K wps
[Epoch 89 Batch 60/62] avg loss 0.00305666, throughput 4.84913K wps
Begin Testing...
[Epoch 89] train avg loss 0.00323689, dev acc 0.8053, dev avg loss 0.398972, throughput 4.9101K wps
[Epoch 90 Batch 30/62] avg loss 0.00324639, throughput 4.96081K wps
[Epoch 90 Batch 60/62] avg loss 0.00298446, throughput 4.84355K wps
Begin Testing...
[Epoch 90] train avg loss 0.00313805, dev acc 0.7935, dev avg loss 0.399503, throughput 4.90842K wps
[Epoch 91 Batch 30/62] avg loss 0.00334894, throughput 4.96463K wps
[Epoch 91 Batch 60/62] avg loss 0.00285891, throughput 4.84223K wps
Begin Testing...
[Epoch 91] train avg loss 0.00317184, dev acc 0.7965, dev avg loss 0.400179, throughput 4.90782K wps
[Epoch 92 Batch 30/62] avg loss 0.00305619, throughput 4.91305K wps
[Epoch 92 Batch 60/62] avg loss 0.00296561, throughput 4.81489K wps
Begin Testing...
[Epoch 92] train avg loss 0.00304791, dev acc 0.7994, dev avg loss 0.400299, throughput 4.87276K wps
[Epoch 93 Batch 30/62] avg loss 0.00297102, throughput 4.96K wps
[Epoch 93 Batch 60/62] avg loss 0.00289404, throughput 4.83446K wps
Begin Testing...
[Epoch 93] train avg loss 0.00300699, dev acc 0.8024, dev avg loss 0.400087, throughput 4.90227K wps
[Epoch 94 Batch 30/62] avg loss 0.00289207, throughput 4.95021K wps
[Epoch 94 Batch 60/62] avg loss 0.00302043, throughput 4.8592K wps
Begin Testing...
[Epoch 94] train avg loss 0.00299271, dev acc 0.7994, dev avg loss 0.400501, throughput 4.91108K wps
[Epoch 95 Batch 30/62] avg loss 0.00303117, throughput 4.96248K wps
[Epoch 95 Batch 60/62] avg loss 0.0026589, throughput 4.8801K wps
Begin Testing...
[Epoch 95] train avg loss 0.00288437, dev acc 0.7994, dev avg loss 0.401559, throughput 4.9228K wps
[Epoch 96 Batch 30/62] avg loss 0.00299382, throughput 4.97483K wps
[Epoch 96 Batch 60/62] avg loss 0.00280482, throughput 4.80305K wps
Begin Testing...
[Epoch 96] train avg loss 0.00292952, dev acc 0.8053, dev avg loss 0.402803, throughput 4.89212K wps
[Epoch 97 Batch 30/62] avg loss 0.00277655, throughput 4.92753K wps
[Epoch 97 Batch 60/62] avg loss 0.00277669, throughput 4.85677K wps
Begin Testing...
[Epoch 97] train avg loss 0.00284238, dev acc 0.7965, dev avg loss 0.402015, throughput 4.89766K wps
[Epoch 98 Batch 30/62] avg loss 0.00260865, throughput 4.93992K wps
[Epoch 98 Batch 60/62] avg loss 0.00286661, throughput 4.84236K wps
Begin Testing...
[Epoch 98] train avg loss 0.00281431, dev acc 0.7935, dev avg loss 0.404232, throughput 4.89676K wps
[Epoch 99 Batch 30/62] avg loss 0.00247204, throughput 4.9714K wps
[Epoch 99 Batch 60/62] avg loss 0.00283333, throughput 4.83936K wps
Begin Testing...
[Epoch 99] train avg loss 0.00266069, dev acc 0.7994, dev avg loss 0.402638, throughput 4.91164K wps
[Epoch 100 Batch 30/62] avg loss 0.00252431, throughput 4.9345K wps
[Epoch 100 Batch 60/62] avg loss 0.00260102, throughput 4.82235K wps
Begin Testing...
[Epoch 100] train avg loss 0.0026622, dev acc 0.8053, dev avg loss 0.405827, throughput 4.88382K wps
[Epoch 101 Batch 30/62] avg loss 0.00256765, throughput 4.96885K wps
[Epoch 101 Batch 60/62] avg loss 0.00252881, throughput 4.85207K wps
Begin Testing...
[Epoch 101] train avg loss 0.00257619, dev acc 0.7965, dev avg loss 0.402872, throughput 4.91605K wps
[Epoch 102 Batch 30/62] avg loss 0.00257191, throughput 4.96046K wps
[Epoch 102 Batch 60/62] avg loss 0.00250945, throughput 4.86955K wps
Begin Testing...
[Epoch 102] train avg loss 0.00257757, dev acc 0.7965, dev avg loss 0.407365, throughput 4.9212K wps
[Epoch 103 Batch 30/62] avg loss 0.00240712, throughput 4.96581K wps
[Epoch 103 Batch 60/62] avg loss 0.0024683, throughput 4.83335K wps
Begin Testing...
[Epoch 103] train avg loss 0.00249428, dev acc 0.7906, dev avg loss 0.405879, throughput 4.90497K wps
[Epoch 104 Batch 30/62] avg loss 0.00226131, throughput 4.93025K wps
[Epoch 104 Batch 60/62] avg loss 0.00266665, throughput 4.86287K wps
Begin Testing...
[Epoch 104] train avg loss 0.0025694, dev acc 0.7847, dev avg loss 0.405526, throughput 4.90417K wps
[Epoch 105 Batch 30/62] avg loss 0.00254042, throughput 4.96311K wps
[Epoch 105 Batch 60/62] avg loss 0.00243154, throughput 4.84618K wps
Begin Testing...
[Epoch 105] train avg loss 0.00249342, dev acc 0.8024, dev avg loss 0.405715, throughput 4.91107K wps
[Epoch 106 Batch 30/62] avg loss 0.00223418, throughput 4.94902K wps
[Epoch 106 Batch 60/62] avg loss 0.00260384, throughput 4.84029K wps
Begin Testing...
[Epoch 106] train avg loss 0.00242926, dev acc 0.7965, dev avg loss 0.405678, throughput 4.90129K wps
[Epoch 107 Batch 30/62] avg loss 0.00225545, throughput 4.96247K wps
[Epoch 107 Batch 60/62] avg loss 0.00236681, throughput 4.83298K wps
Begin Testing...
[Epoch 107] train avg loss 0.00230826, dev acc 0.7965, dev avg loss 0.405176, throughput 4.90288K wps
[Epoch 108 Batch 30/62] avg loss 0.00232431, throughput 4.93059K wps
[Epoch 108 Batch 60/62] avg loss 0.00223023, throughput 4.84505K wps
Begin Testing...
[Epoch 108] train avg loss 0.00230825, dev acc 0.7935, dev avg loss 0.405894, throughput 4.89446K wps
[Epoch 109 Batch 30/62] avg loss 0.00231597, throughput 4.9465K wps
[Epoch 109 Batch 60/62] avg loss 0.00219495, throughput 4.85825K wps
Begin Testing...
[Epoch 109] train avg loss 0.00228143, dev acc 0.7876, dev avg loss 0.40627, throughput 4.90959K wps
[Epoch 110 Batch 30/62] avg loss 0.00214315, throughput 4.94574K wps
[Epoch 110 Batch 60/62] avg loss 0.00222608, throughput 4.81931K wps
Begin Testing...
[Epoch 110] train avg loss 0.00221542, dev acc 0.7935, dev avg loss 0.406047, throughput 4.88988K wps
[Epoch 111 Batch 30/62] avg loss 0.002158, throughput 4.93893K wps