Namespace(batch_size=50, data_name='MR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='multichannel')
Use gpu0
maximum length (in tokens): 56
Done! Tokenizing Time=0.21s, #Sentences=10662
SentimentNet(
  (embedding): Embedding(18768 -> 300, float32)
  (embedding_extend): Embedding(18768 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
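The model summary above describes a multichannel CNN text classifier. The NumPy sketch below reproduces only its forward-pass shapes, under stated assumptions: the two 300-d embedding tables (`embedding` and `embedding_extend`) are concatenated into 600 input channels, matching `Conv1D(600 -> 100, ...)`; each `HybridLambda(<lambda>)` is assumed to be max-over-time pooling; the three 100-filter branches are concatenated into a 300-d feature vector; and `Dropout(p = 0.5)` is omitted because it is the identity at inference time. All weights are random placeholders, and the helper names `conv1d` and `forward` are hypothetical, not part of the GluonNLP API.

```python
import numpy as np

# Sizes read off the log: vocab 18768, 300-d embeddings, max length 56 tokens,
# 100 filters per kernel size (3, 4, 5), 2 output classes.
VOCAB, EMB_DIM, MAX_LEN = 18768, 300, 56
NUM_FILTERS, KERNEL_SIZES, NUM_CLASSES = 100, (3, 4, 5), 2

rng = np.random.default_rng(0)

# Two embedding tables (random placeholders standing in for trained/pretrained vectors).
emb_a = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM)).astype(np.float32)
emb_b = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM)).astype(np.float32)

# One Conv1D per kernel size: weight (C_out, C_in, K), bias (C_out,).
convs = [
    (rng.normal(scale=0.01, size=(NUM_FILTERS, 2 * EMB_DIM, k)).astype(np.float32),
     np.zeros(NUM_FILTERS, dtype=np.float32))
    for k in KERNEL_SIZES
]
dense_w = rng.normal(scale=0.01, size=(NUM_CLASSES, NUM_FILTERS * len(KERNEL_SIZES))).astype(np.float32)
dense_b = np.zeros(NUM_CLASSES, dtype=np.float32)


def conv1d(x, w, b):
    """Valid 1-D cross-correlation, stride 1.

    x: (N, C_in, T), w: (C_out, C_in, K), b: (C_out,) -> (N, C_out, T - K + 1)
    """
    k = w.shape[2]
    t_out = x.shape[2] - k + 1
    # windows[n, c, t, i] == x[n, c, t + i]
    windows = np.stack([x[:, :, i:i + t_out] for i in range(k)], axis=-1)
    return np.einsum("nctk,ock->not", windows, w) + b[None, :, None]


def forward(token_ids):
    """token_ids: int array (N, T). Returns logits of shape (N, NUM_CLASSES)."""
    # Concatenate both embedding channels -> (N, T, 600), then NCW layout (N, 600, T).
    x = np.concatenate([emb_a[token_ids], emb_b[token_ids]], axis=-1).transpose(0, 2, 1)
    pooled = []
    for w, b in convs:
        h = conv1d(x, w, b)                 # (N, 100, T - K + 1)
        h = h.max(axis=2)                   # assumed max-over-time pooling
        pooled.append(np.maximum(h, 0.0))   # relu
    features = np.concatenate(pooled, axis=1)  # (N, 300)
    # Dropout is the identity at inference, so the output layer is applied directly.
    return features @ dense_w.T + dense_b


tokens = rng.integers(0, VOCAB, size=(4, MAX_LEN))
logits = forward(tokens)
print(logits.shape)  # (4, 2)
```

For a 56-token input, kernel sizes 3, 4 and 5 yield 54, 53 and 52 time steps before pooling; reducing over time is what lets the three branch outputs be concatenated into one fixed-size vector.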
[Epoch 0 Batch 30/173] avg loss 0.0140445, throughput 0.448192K wps
[Epoch 0 Batch 60/173] avg loss 0.0139668, throughput 2.23382K wps
[Epoch 0 Batch 90/173] avg loss 0.0140539, throughput 2.24226K wps
[Epoch 0 Batch 120/173] avg loss 0.013999, throughput 2.23966K wps
[Epoch 0 Batch 150/173] avg loss 0.0137923, throughput 2.23245K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139566, dev acc 0.6163, dev avg loss 0.678783, throughput 0.981298K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0137116, throughput 2.27464K wps
[Epoch 1 Batch 60/173] avg loss 0.0136219, throughput 2.24484K wps
[Epoch 1 Batch 90/173] avg loss 0.0136418, throughput 2.22274K wps
[Epoch 1 Batch 120/173] avg loss 0.0135923, throughput 2.24597K wps
[Epoch 1 Batch 150/173] avg loss 0.0134999, throughput 2.23764K wps
Begin Testing...
[Epoch 1] train avg loss 0.0136238, dev acc 0.6569, dev avg loss 0.666712, throughput 2.24446K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0133156, throughput 2.26038K wps
[Epoch 2 Batch 60/173] avg loss 0.0133563, throughput 2.23353K wps
[Epoch 2 Batch 90/173] avg loss 0.0131646, throughput 2.24637K wps
[Epoch 2 Batch 120/173] avg loss 0.0133249, throughput 2.23484K wps
[Epoch 2 Batch 150/173] avg loss 0.0132272, throughput 2.24946K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132891, dev acc 0.7101, dev avg loss 0.652805, throughput 2.24556K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0130912, throughput 2.28881K wps
[Epoch 3 Batch 60/173] avg loss 0.0129702, throughput 2.22688K wps
[Epoch 3 Batch 90/173] avg loss 0.0130545, throughput 2.24005K wps
[Epoch 3 Batch 120/173] avg loss 0.0129206, throughput 2.23713K wps
[Epoch 3 Batch 150/173] avg loss 0.0128017, throughput 2.24762K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129659, dev acc 0.6986, dev avg loss 0.64051, throughput 2.24781K wps
[Epoch 4 Batch 30/173] avg loss 0.0126543, throughput 2.2617K wps
[Epoch 4 Batch 60/173] avg loss 0.0128062, throughput 2.22977K wps
[Epoch 4 Batch 90/173] avg loss 0.0126922, throughput 2.2425K wps
[Epoch 4 Batch 120/173] avg loss 0.0128211, throughput 2.23624K wps
[Epoch 4 Batch 150/173] avg loss 0.012586, throughput 2.24132K wps
Begin Testing...
[Epoch 4] train avg loss 0.0126952, dev acc 0.7393, dev avg loss 0.624479, throughput 2.24201K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0124201, throughput 2.27566K wps
[Epoch 5 Batch 60/173] avg loss 0.0124238, throughput 2.23652K wps
[Epoch 5 Batch 90/173] avg loss 0.01227, throughput 2.24323K wps
[Epoch 5 Batch 120/173] avg loss 0.0122495, throughput 2.22352K wps
[Epoch 5 Batch 150/173] avg loss 0.0123, throughput 2.2397K wps
Begin Testing...
[Epoch 5] train avg loss 0.0123469, dev acc 0.7404, dev avg loss 0.609132, throughput 2.2434K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0118521, throughput 2.25055K wps
[Epoch 6 Batch 60/173] avg loss 0.0120329, throughput 2.23065K wps
[Epoch 6 Batch 90/173] avg loss 0.011966, throughput 2.23367K wps
[Epoch 6 Batch 120/173] avg loss 0.0120768, throughput 2.21356K wps
[Epoch 6 Batch 150/173] avg loss 0.0119139, throughput 2.21165K wps
Begin Testing...
[Epoch 6] train avg loss 0.012002, dev acc 0.7497, dev avg loss 0.592262, throughput 2.229K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0116754, throughput 2.26668K wps
[Epoch 7 Batch 60/173] avg loss 0.0119072, throughput 2.23162K wps
[Epoch 7 Batch 90/173] avg loss 0.0116192, throughput 2.2391K wps
[Epoch 7 Batch 120/173] avg loss 0.0115865, throughput 2.23499K wps
[Epoch 7 Batch 150/173] avg loss 0.0115224, throughput 2.24718K wps
Begin Testing...
[Epoch 7] train avg loss 0.011651, dev acc 0.7654, dev avg loss 0.575203, throughput 2.24267K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0113799, throughput 2.26502K wps
[Epoch 8 Batch 60/173] avg loss 0.0112618, throughput 2.23772K wps
[Epoch 8 Batch 90/173] avg loss 0.0112191, throughput 2.24192K wps
[Epoch 8 Batch 120/173] avg loss 0.0113287, throughput 2.23713K wps
[Epoch 8 Batch 150/173] avg loss 0.0111611, throughput 2.24002K wps
Begin Testing...
[Epoch 8] train avg loss 0.0112483, dev acc 0.7800, dev avg loss 0.559089, throughput 2.24287K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.0111078, throughput 2.2718K wps
[Epoch 9 Batch 60/173] avg loss 0.0109939, throughput 2.1959K wps
[Epoch 9 Batch 90/173] avg loss 0.0111072, throughput 2.23152K wps
[Epoch 9 Batch 120/173] avg loss 0.0108777, throughput 2.23937K wps
[Epoch 9 Batch 150/173] avg loss 0.0109145, throughput 2.24146K wps
Begin Testing...
[Epoch 9] train avg loss 0.0109516, dev acc 0.7862, dev avg loss 0.544108, throughput 2.23658K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0107458, throughput 2.2865K wps
[Epoch 10 Batch 60/173] avg loss 0.0104585, throughput 2.2351K wps
[Epoch 10 Batch 90/173] avg loss 0.0110036, throughput 2.2381K wps
[Epoch 10 Batch 120/173] avg loss 0.0100485, throughput 2.23501K wps
[Epoch 10 Batch 150/173] avg loss 0.0104979, throughput 2.21176K wps
Begin Testing...
[Epoch 10] train avg loss 0.0105793, dev acc 0.7800, dev avg loss 0.531031, throughput 2.23877K wps
[Epoch 11 Batch 30/173] avg loss 0.0105472, throughput 2.28801K wps
[Epoch 11 Batch 60/173] avg loss 0.0104755, throughput 2.246K wps
[Epoch 11 Batch 90/173] avg loss 0.0100533, throughput 2.24344K wps
[Epoch 11 Batch 120/173] avg loss 0.0102061, throughput 2.21581K wps
[Epoch 11 Batch 150/173] avg loss 0.00978922, throughput 2.22133K wps
Begin Testing...
[Epoch 11] train avg loss 0.0102157, dev acc 0.7842, dev avg loss 0.518264, throughput 2.24004K wps
[Epoch 12 Batch 30/173] avg loss 0.0100045, throughput 2.28637K wps
[Epoch 12 Batch 60/173] avg loss 0.00980535, throughput 2.23279K wps
[Epoch 12 Batch 90/173] avg loss 0.00994204, throughput 2.2325K wps
[Epoch 12 Batch 120/173] avg loss 0.0100127, throughput 2.22762K wps
[Epoch 12 Batch 150/173] avg loss 0.00986874, throughput 2.23676K wps
Begin Testing...
[Epoch 12] train avg loss 0.00996149, dev acc 0.7873, dev avg loss 0.507678, throughput 2.24163K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.00947031, throughput 2.29616K wps
[Epoch 13 Batch 60/173] avg loss 0.00997943, throughput 2.23198K wps
[Epoch 13 Batch 90/173] avg loss 0.00979392, throughput 2.22541K wps
[Epoch 13 Batch 120/173] avg loss 0.00961625, throughput 2.23488K wps
[Epoch 13 Batch 150/173] avg loss 0.00972815, throughput 2.2227K wps
Begin Testing...
[Epoch 13] train avg loss 0.00971204, dev acc 0.7894, dev avg loss 0.499608, throughput 2.23781K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.00971678, throughput 2.27792K wps
[Epoch 14 Batch 60/173] avg loss 0.00973095, throughput 2.22623K wps
[Epoch 14 Batch 90/173] avg loss 0.00965427, throughput 2.22815K wps
[Epoch 14 Batch 120/173] avg loss 0.00926655, throughput 2.21127K wps
[Epoch 14 Batch 150/173] avg loss 0.00919744, throughput 2.23678K wps
Begin Testing...
[Epoch 14] train avg loss 0.00953483, dev acc 0.7873, dev avg loss 0.492281, throughput 2.23596K wps
[Epoch 15 Batch 30/173] avg loss 0.00897343, throughput 2.28473K wps
[Epoch 15 Batch 60/173] avg loss 0.00929378, throughput 2.2367K wps
[Epoch 15 Batch 90/173] avg loss 0.0092159, throughput 2.21697K wps
[Epoch 15 Batch 120/173] avg loss 0.00924302, throughput 2.22313K wps
[Epoch 15 Batch 150/173] avg loss 0.00911145, throughput 2.23197K wps
Begin Testing...
[Epoch 15] train avg loss 0.00923976, dev acc 0.7935, dev avg loss 0.484909, throughput 2.23807K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.00904278, throughput 2.29359K wps
[Epoch 16 Batch 60/173] avg loss 0.00911872, throughput 2.23486K wps
[Epoch 16 Batch 90/173] avg loss 0.00911281, throughput 2.2335K wps
[Epoch 16 Batch 120/173] avg loss 0.00868047, throughput 2.23644K wps
[Epoch 16 Batch 150/173] avg loss 0.0087234, throughput 2.2223K wps
Begin Testing...
[Epoch 16] train avg loss 0.00892554, dev acc 0.7904, dev avg loss 0.479301, throughput 2.24358K wps
[Epoch 17 Batch 30/173] avg loss 0.0087448, throughput 2.26634K wps
[Epoch 17 Batch 60/173] avg loss 0.00873712, throughput 2.22198K wps
[Epoch 17 Batch 90/173] avg loss 0.00868864, throughput 2.21347K wps
[Epoch 17 Batch 120/173] avg loss 0.00883665, throughput 2.23654K wps
[Epoch 17 Batch 150/173] avg loss 0.00867303, throughput 2.23112K wps
Begin Testing...
[Epoch 17] train avg loss 0.00873826, dev acc 0.7987, dev avg loss 0.474248, throughput 2.23169K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.00846173, throughput 2.26363K wps
[Epoch 18 Batch 60/173] avg loss 0.00854592, throughput 2.2153K wps
[Epoch 18 Batch 90/173] avg loss 0.00861771, throughput 2.22382K wps
[Epoch 18 Batch 120/173] avg loss 0.00836099, throughput 2.22258K wps
[Epoch 18 Batch 150/173] avg loss 0.00900785, throughput 2.23188K wps
Begin Testing...
[Epoch 18] train avg loss 0.00858678, dev acc 0.7946, dev avg loss 0.470982, throughput 2.23141K wps
[Epoch 19 Batch 30/173] avg loss 0.00835308, throughput 2.27504K wps
[Epoch 19 Batch 60/173] avg loss 0.00876692, throughput 2.22293K wps
[Epoch 19 Batch 90/173] avg loss 0.00843297, throughput 2.21238K wps
[Epoch 19 Batch 120/173] avg loss 0.0078132, throughput 2.23796K wps
[Epoch 19 Batch 150/173] avg loss 0.00805158, throughput 2.23979K wps
Begin Testing...
[Epoch 19] train avg loss 0.00833608, dev acc 0.7967, dev avg loss 0.465947, throughput 2.23522K wps
[Epoch 20 Batch 30/173] avg loss 0.00796049, throughput 2.26823K wps
[Epoch 20 Batch 60/173] avg loss 0.00828413, throughput 2.21992K wps
[Epoch 20 Batch 90/173] avg loss 0.00816687, throughput 2.23807K wps
[Epoch 20 Batch 120/173] avg loss 0.00831611, throughput 2.23367K wps
[Epoch 20 Batch 150/173] avg loss 0.00827, throughput 2.22294K wps
Begin Testing...
[Epoch 20] train avg loss 0.00823177, dev acc 0.7967, dev avg loss 0.463099, throughput 2.23525K wps
[Epoch 21 Batch 30/173] avg loss 0.00828277, throughput 2.27736K wps
[Epoch 21 Batch 60/173] avg loss 0.00809679, throughput 2.20988K wps
[Epoch 21 Batch 90/173] avg loss 0.00816152, throughput 2.23522K wps
[Epoch 21 Batch 120/173] avg loss 0.00768061, throughput 2.23396K wps
[Epoch 21 Batch 150/173] avg loss 0.00772262, throughput 2.22275K wps
Begin Testing...
[Epoch 21] train avg loss 0.00801286, dev acc 0.7967, dev avg loss 0.459544, throughput 2.23152K wps
[Epoch 22 Batch 30/173] avg loss 0.00747858, throughput 2.25378K wps
[Epoch 22 Batch 60/173] avg loss 0.00793094, throughput 2.21362K wps
[Epoch 22 Batch 90/173] avg loss 0.00801674, throughput 2.20528K wps
[Epoch 22 Batch 120/173] avg loss 0.00778437, throughput 2.22455K wps
[Epoch 22 Batch 150/173] avg loss 0.00769051, throughput 2.23776K wps
Begin Testing...
[Epoch 22] train avg loss 0.00782729, dev acc 0.7956, dev avg loss 0.456853, throughput 2.22822K wps
[Epoch 23 Batch 30/173] avg loss 0.00740432, throughput 2.27441K wps
[Epoch 23 Batch 60/173] avg loss 0.0079993, throughput 2.2324K wps
[Epoch 23 Batch 90/173] avg loss 0.00752843, throughput 2.23668K wps
[Epoch 23 Batch 120/173] avg loss 0.00747644, throughput 2.23407K wps
[Epoch 23 Batch 150/173] avg loss 0.00756714, throughput 2.2234K wps
Begin Testing...
[Epoch 23] train avg loss 0.00766391, dev acc 0.7977, dev avg loss 0.454491, throughput 2.23739K wps
[Epoch 24 Batch 30/173] avg loss 0.00709164, throughput 2.26643K wps
[Epoch 24 Batch 60/173] avg loss 0.00765475, throughput 2.23118K wps
[Epoch 24 Batch 90/173] avg loss 0.00769988, throughput 2.22988K wps
[Epoch 24 Batch 120/173] avg loss 0.00765002, throughput 2.20659K wps
[Epoch 24 Batch 150/173] avg loss 0.00746789, throughput 2.21606K wps
Begin Testing...
[Epoch 24] train avg loss 0.00751649, dev acc 0.7998, dev avg loss 0.452234, throughput 2.22918K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.00731315, throughput 2.25994K wps
[Epoch 25 Batch 60/173] avg loss 0.00729415, throughput 2.22671K wps
[Epoch 25 Batch 90/173] avg loss 0.00750253, throughput 2.23875K wps
[Epoch 25 Batch 120/173] avg loss 0.00743295, throughput 2.23485K wps
[Epoch 25 Batch 150/173] avg loss 0.0074329, throughput 2.23313K wps
Begin Testing...
[Epoch 25] train avg loss 0.00740565, dev acc 0.7998, dev avg loss 0.450152, throughput 2.23863K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00749903, throughput 2.28432K wps
[Epoch 26 Batch 60/173] avg loss 0.00703604, throughput 2.22482K wps
[Epoch 26 Batch 90/173] avg loss 0.0070091, throughput 2.23924K wps
[Epoch 26 Batch 120/173] avg loss 0.00711496, throughput 2.20752K wps
[Epoch 26 Batch 150/173] avg loss 0.00723802, throughput 2.22888K wps
Begin Testing...
[Epoch 26] train avg loss 0.00720329, dev acc 0.7967, dev avg loss 0.448535, throughput 2.23676K wps
[Epoch 27 Batch 30/173] avg loss 0.00693395, throughput 2.28997K wps
[Epoch 27 Batch 60/173] avg loss 0.00700814, throughput 2.23929K wps
[Epoch 27 Batch 90/173] avg loss 0.00690629, throughput 2.23281K wps
[Epoch 27 Batch 120/173] avg loss 0.0068831, throughput 2.23365K wps
[Epoch 27 Batch 150/173] avg loss 0.00693784, throughput 2.22743K wps
Begin Testing...
[Epoch 27] train avg loss 0.00695862, dev acc 0.7998, dev avg loss 0.446828, throughput 2.24017K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/173] avg loss 0.00691795, throughput 2.26598K wps
[Epoch 28 Batch 60/173] avg loss 0.00646145, throughput 2.22382K wps
[Epoch 28 Batch 90/173] avg loss 0.00697219, throughput 2.23419K wps
[Epoch 28 Batch 120/173] avg loss 0.00704029, throughput 2.23253K wps
[Epoch 28 Batch 150/173] avg loss 0.00662957, throughput 2.21825K wps
Begin Testing...
[Epoch 28] train avg loss 0.00681069, dev acc 0.7977, dev avg loss 0.444757, throughput 2.23412K wps
[Epoch 29 Batch 30/173] avg loss 0.00688767, throughput 2.27379K wps
[Epoch 29 Batch 60/173] avg loss 0.00688107, throughput 2.22131K wps
[Epoch 29 Batch 90/173] avg loss 0.00651047, throughput 2.21484K wps
[Epoch 29 Batch 120/173] avg loss 0.00709216, throughput 2.22734K wps
[Epoch 29 Batch 150/173] avg loss 0.00669532, throughput 2.22029K wps
Begin Testing...
[Epoch 29] train avg loss 0.00677272, dev acc 0.7987, dev avg loss 0.443366, throughput 2.22908K wps
[Epoch 30 Batch 30/173] avg loss 0.00662066, throughput 2.2795K wps
[Epoch 30 Batch 60/173] avg loss 0.00672236, throughput 2.23405K wps
[Epoch 30 Batch 90/173] avg loss 0.00652711, throughput 2.23488K wps
[Epoch 30 Batch 120/173] avg loss 0.00651159, throughput 2.21298K wps
[Epoch 30 Batch 150/173] avg loss 0.00616764, throughput 2.23356K wps
Begin Testing...
[Epoch 30] train avg loss 0.00651774, dev acc 0.7977, dev avg loss 0.443355, throughput 2.23867K wps
[Epoch 31 Batch 30/173] avg loss 0.00589804, throughput 2.26949K wps
[Epoch 31 Batch 60/173] avg loss 0.00624159, throughput 2.23168K wps
[Epoch 31 Batch 90/173] avg loss 0.00634251, throughput 2.22154K wps
[Epoch 31 Batch 120/173] avg loss 0.00665561, throughput 2.23446K wps
[Epoch 31 Batch 150/173] avg loss 0.00673022, throughput 2.22744K wps
Begin Testing...
[Epoch 31] train avg loss 0.00640635, dev acc 0.7987, dev avg loss 0.439815, throughput 2.2347K wps
[Epoch 32 Batch 30/173] avg loss 0.00611788, throughput 2.26983K wps
[Epoch 32 Batch 60/173] avg loss 0.00638919, throughput 2.23334K wps
[Epoch 32 Batch 90/173] avg loss 0.00592876, throughput 2.23268K wps
[Epoch 32 Batch 120/173] avg loss 0.00640459, throughput 2.23207K wps
[Epoch 32 Batch 150/173] avg loss 0.00610062, throughput 2.23792K wps
Begin Testing...
[Epoch 32] train avg loss 0.00623546, dev acc 0.7925, dev avg loss 0.438721, throughput 2.24102K wps
[Epoch 33 Batch 30/173] avg loss 0.00621535, throughput 2.27151K wps
[Epoch 33 Batch 60/173] avg loss 0.00567574, throughput 2.2234K wps
[Epoch 33 Batch 90/173] avg loss 0.0058796, throughput 2.21528K wps
[Epoch 33 Batch 120/173] avg loss 0.00644757, throughput 2.22599K wps
[Epoch 33 Batch 150/173] avg loss 0.00625993, throughput 2.23397K wps
Begin Testing...
[Epoch 33] train avg loss 0.00611889, dev acc 0.7987, dev avg loss 0.439448, throughput 2.23289K wps
[Epoch 34 Batch 30/173] avg loss 0.00603358, throughput 2.28748K wps
[Epoch 34 Batch 60/173] avg loss 0.00575593, throughput 2.22453K wps
[Epoch 34 Batch 90/173] avg loss 0.00630018, throughput 2.23014K wps
[Epoch 34 Batch 120/173] avg loss 0.00578755, throughput 2.19428K wps
[Epoch 34 Batch 150/173] avg loss 0.00583772, throughput 2.22915K wps
Begin Testing...
[Epoch 34] train avg loss 0.0059869, dev acc 0.7946, dev avg loss 0.441216, throughput 2.23274K wps
[Epoch 35 Batch 30/173] avg loss 0.00584019, throughput 2.28229K wps
[Epoch 35 Batch 60/173] avg loss 0.00607417, throughput 2.23727K wps
[Epoch 35 Batch 90/173] avg loss 0.00571221, throughput 2.23389K wps
[Epoch 35 Batch 120/173] avg loss 0.0055164, throughput 2.23617K wps
[Epoch 35 Batch 150/173] avg loss 0.00572756, throughput 2.22753K wps
Begin Testing...
[Epoch 35] train avg loss 0.00583401, dev acc 0.7914, dev avg loss 0.436258, throughput 2.24145K wps
[Epoch 36 Batch 30/173] avg loss 0.00583214, throughput 2.28516K wps
[Epoch 36 Batch 60/173] avg loss 0.00574418, throughput 2.22904K wps
[Epoch 36 Batch 90/173] avg loss 0.00555358, throughput 2.22588K wps
[Epoch 36 Batch 120/173] avg loss 0.00554106, throughput 2.23257K wps
[Epoch 36 Batch 150/173] avg loss 0.00553191, throughput 2.23261K wps
Begin Testing...
[Epoch 36] train avg loss 0.00567377, dev acc 0.7956, dev avg loss 0.439226, throughput 2.2404K wps
[Epoch 37 Batch 30/173] avg loss 0.00523284, throughput 2.29006K wps
[Epoch 37 Batch 60/173] avg loss 0.00570663, throughput 2.232K wps
[Epoch 37 Batch 90/173] avg loss 0.00546511, throughput 2.22925K wps
[Epoch 37 Batch 120/173] avg loss 0.00578979, throughput 2.22537K wps
[Epoch 37 Batch 150/173] avg loss 0.00559344, throughput 2.21798K wps
Begin Testing...
[Epoch 37] train avg loss 0.00555497, dev acc 0.7935, dev avg loss 0.434911, throughput 2.23475K wps
[Epoch 38 Batch 30/173] avg loss 0.00545371, throughput 2.25143K wps
[Epoch 38 Batch 60/173] avg loss 0.00549153, throughput 2.21971K wps
[Epoch 38 Batch 90/173] avg loss 0.00545657, throughput 2.23201K wps
[Epoch 38 Batch 120/173] avg loss 0.00542068, throughput 2.23052K wps
[Epoch 38 Batch 150/173] avg loss 0.00535189, throughput 2.23075K wps
Begin Testing...
[Epoch 38] train avg loss 0.00542811, dev acc 0.7935, dev avg loss 0.434006, throughput 2.23418K wps
[Epoch 39 Batch 30/173] avg loss 0.00542817, throughput 2.28499K wps
[Epoch 39 Batch 60/173] avg loss 0.00543053, throughput 2.23443K wps
[Epoch 39 Batch 90/173] avg loss 0.00522363, throughput 2.22442K wps
[Epoch 39 Batch 120/173] avg loss 0.0052526, throughput 2.22587K wps
[Epoch 39 Batch 150/173] avg loss 0.00505871, throughput 2.22508K wps
Begin Testing...
[Epoch 39] train avg loss 0.00530479, dev acc 0.7987, dev avg loss 0.43472, throughput 2.23916K wps
[Epoch 40 Batch 30/173] avg loss 0.00500743, throughput 2.27829K wps
[Epoch 40 Batch 60/173] avg loss 0.00534279, throughput 2.21814K wps
[Epoch 40 Batch 90/173] avg loss 0.00534667, throughput 2.21179K wps
[Epoch 40 Batch 120/173] avg loss 0.00502015, throughput 2.23421K wps
[Epoch 40 Batch 150/173] avg loss 0.00519429, throughput 2.23893K wps
Begin Testing...
[Epoch 40] train avg loss 0.00522436, dev acc 0.7925, dev avg loss 0.435693, throughput 2.23621K wps
[Epoch 41 Batch 30/173] avg loss 0.00511093, throughput 2.27497K wps
[Epoch 41 Batch 60/173] avg loss 0.00499288, throughput 2.23053K wps
[Epoch 41 Batch 90/173] avg loss 0.00479956, throughput 2.21869K wps
[Epoch 41 Batch 120/173] avg loss 0.00519981, throughput 2.24002K wps
[Epoch 41 Batch 150/173] avg loss 0.00495358, throughput 2.22991K wps
Begin Testing...
[Epoch 41] train avg loss 0.00503582, dev acc 0.7987, dev avg loss 0.435928, throughput 2.23681K wps
[Epoch 42 Batch 30/173] avg loss 0.00483339, throughput 2.2685K wps
[Epoch 42 Batch 60/173] avg loss 0.00468371, throughput 2.23243K wps
[Epoch 42 Batch 90/173] avg loss 0.00489982, throughput 2.23098K wps
[Epoch 42 Batch 120/173] avg loss 0.00499753, throughput 2.23516K wps
[Epoch 42 Batch 150/173] avg loss 0.00490228, throughput 2.21604K wps
Begin Testing...
[Epoch 42] train avg loss 0.00486375, dev acc 0.7904, dev avg loss 0.43239, throughput 2.23632K wps
[Epoch 43 Batch 30/173] avg loss 0.00459193, throughput 2.26158K wps
[Epoch 43 Batch 60/173] avg loss 0.00477892, throughput 2.20924K wps
[Epoch 43 Batch 90/173] avg loss 0.00488433, throughput 2.22977K wps
[Epoch 43 Batch 120/173] avg loss 0.0050037, throughput 2.23423K wps
[Epoch 43 Batch 150/173] avg loss 0.00460709, throughput 2.23658K wps
Begin Testing...
[Epoch 43] train avg loss 0.00477911, dev acc 0.7946, dev avg loss 0.433353, throughput 2.23392K wps
[Epoch 44 Batch 30/173] avg loss 0.004663, throughput 2.26692K wps
[Epoch 44 Batch 60/173] avg loss 0.00470813, throughput 2.22275K wps
[Epoch 44 Batch 90/173] avg loss 0.00468023, throughput 2.2216K wps
[Epoch 44 Batch 120/173] avg loss 0.00468546, throughput 2.22646K wps
[Epoch 44 Batch 150/173] avg loss 0.00449779, throughput 2.23627K wps
Begin Testing...
[Epoch 44] train avg loss 0.00465948, dev acc 0.7956, dev avg loss 0.436127, throughput 2.23526K wps
[Epoch 45 Batch 30/173] avg loss 0.00453, throughput 2.28522K wps
[Epoch 45 Batch 60/173] avg loss 0.00435592, throughput 2.20625K wps
[Epoch 45 Batch 90/173] avg loss 0.00445065, throughput 2.22356K wps
[Epoch 45 Batch 120/173] avg loss 0.00467, throughput 2.23643K wps
[Epoch 45 Batch 150/173] avg loss 0.00447166, throughput 2.22698K wps
Begin Testing...
[Epoch 45] train avg loss 0.00450997, dev acc 0.7967, dev avg loss 0.433079, throughput 2.23394K wps
[Epoch 46 Batch 30/173] avg loss 0.00427027, throughput 2.28623K wps
[Epoch 46 Batch 60/173] avg loss 0.00442236, throughput 2.23278K wps
[Epoch 46 Batch 90/173] avg loss 0.00443403, throughput 2.23558K wps
[Epoch 46 Batch 120/173] avg loss 0.00399667, throughput 2.22724K wps
[Epoch 46 Batch 150/173] avg loss 0.00439641, throughput 2.23342K wps
Begin Testing...
[Epoch 46] train avg loss 0.00436667, dev acc 0.7935, dev avg loss 0.433636, throughput 2.24122K wps
[Epoch 47 Batch 30/173] avg loss 0.00401345, throughput 2.27336K wps
[Epoch 47 Batch 60/173] avg loss 0.00429187, throughput 2.22691K wps
[Epoch 47 Batch 90/173] avg loss 0.00427291, throughput 2.22163K wps
[Epoch 47 Batch 120/173] avg loss 0.00402696, throughput 2.22598K wps
[Epoch 47 Batch 150/173] avg loss 0.00441802, throughput 2.2259K wps
Begin Testing...
[Epoch 47] train avg loss 0.00424419, dev acc 0.7946, dev avg loss 0.432813, throughput 2.23423K wps
[Epoch 48 Batch 30/173] avg loss 0.00431719, throughput 2.28492K wps
[Epoch 48 Batch 60/173] avg loss 0.00414972, throughput 2.23609K wps
[Epoch 48 Batch 90/173] avg loss 0.00412465, throughput 2.23191K wps
[Epoch 48 Batch 120/173] avg loss 0.00425238, throughput 2.22914K wps
[Epoch 48 Batch 150/173] avg loss 0.00407674, throughput 2.23522K wps
Begin Testing...
[Epoch 48] train avg loss 0.00417016, dev acc 0.7987, dev avg loss 0.434322, throughput 2.2427K wps
[Epoch 49 Batch 30/173] avg loss 0.00405559, throughput 2.27748K wps
[Epoch 49 Batch 60/173] avg loss 0.00430067, throughput 2.21387K wps
[Epoch 49 Batch 90/173] avg loss 0.00394944, throughput 2.23358K wps
[Epoch 49 Batch 120/173] avg loss 0.00398366, throughput 2.22888K wps
[Epoch 49 Batch 150/173] avg loss 0.00413179, throughput 2.22594K wps
Begin Testing...
[Epoch 49] train avg loss 0.00410409, dev acc 0.7977, dev avg loss 0.434979, throughput 2.23426K wps
[Epoch 50 Batch 30/173] avg loss 0.00369852, throughput 2.25437K wps
[Epoch 50 Batch 60/173] avg loss 0.00392296, throughput 2.23359K wps
[Epoch 50 Batch 90/173] avg loss 0.00390778, throughput 2.23884K wps
[Epoch 50 Batch 120/173] avg loss 0.00405082, throughput 2.21891K wps
[Epoch 50 Batch 150/173] avg loss 0.00393457, throughput 2.23K wps
Begin Testing...
[Epoch 50] train avg loss 0.00388644, dev acc 0.7987, dev avg loss 0.436363, throughput 2.23413K wps
[Epoch 51 Batch 30/173] avg loss 0.00380551, throughput 2.26449K wps
[Epoch 51 Batch 60/173] avg loss 0.00375756, throughput 2.2233K wps
[Epoch 51 Batch 90/173] avg loss 0.00397813, throughput 2.23847K wps
[Epoch 51 Batch 120/173] avg loss 0.00388095, throughput 2.226K wps
[Epoch 51 Batch 150/173] avg loss 0.00395615, throughput 2.22825K wps
Begin Testing...
[Epoch 51] train avg loss 0.00390437, dev acc 0.7977, dev avg loss 0.437142, throughput 2.23532K wps
[Epoch 52 Batch 30/173] avg loss 0.00378183, throughput 2.2829K wps
[Epoch 52 Batch 60/173] avg loss 0.00379543, throughput 2.23422K wps
[Epoch 52 Batch 90/173] avg loss 0.00366311, throughput 2.23498K wps
[Epoch 52 Batch 120/173] avg loss 0.00373626, throughput 2.23621K wps
[Epoch 52 Batch 150/173] avg loss 0.00385049, throughput 2.22241K wps
Begin Testing...
[Epoch 52] train avg loss 0.00376569, dev acc 0.7987, dev avg loss 0.434505, throughput 2.24069K wps
[Epoch 53 Batch 30/173] avg loss 0.0036891, throughput 2.27629K wps
[Epoch 53 Batch 60/173] avg loss 0.00375955, throughput 2.226K wps
[Epoch 53 Batch 90/173] avg loss 0.00352737, throughput 2.22297K wps
[Epoch 53 Batch 120/173] avg loss 0.00343219, throughput 2.23089K wps
[Epoch 53 Batch 150/173] avg loss 0.00359294, throughput 2.23667K wps
Begin Testing...
[Epoch 53] train avg loss 0.00366662, dev acc 0.7894, dev avg loss 0.438739, throughput 2.23851K wps
[Epoch 54 Batch 30/173] avg loss 0.00351686, throughput 2.28655K wps
[Epoch 54 Batch 60/173] avg loss 0.00360826, throughput 2.21004K wps
[Epoch 54 Batch 90/173] avg loss 0.00376416, throughput 2.21224K wps
[Epoch 54 Batch 120/173] avg loss 0.00340118, throughput 2.23579K wps
[Epoch 54 Batch 150/173] avg loss 0.00327096, throughput 2.23672K wps
Begin Testing...
[Epoch 54] train avg loss 0.00353052, dev acc 0.8008, dev avg loss 0.43633, throughput 2.23389K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00328817, throughput 2.28257K wps
[Epoch 55 Batch 60/173] avg loss 0.00329368, throughput 2.2283K wps
[Epoch 55 Batch 90/173] avg loss 0.00359472, throughput 2.23051K wps
[Epoch 55 Batch 120/173] avg loss 0.00340589, throughput 2.23416K wps
[Epoch 55 Batch 150/173] avg loss 0.00349767, throughput 2.2218K wps
Begin Testing...
[Epoch 55] train avg loss 0.00339166, dev acc 0.7998, dev avg loss 0.436381, throughput 2.23844K wps
[Epoch 56 Batch 30/173] avg loss 0.00348097, throughput 2.27701K wps
[Epoch 56 Batch 60/173] avg loss 0.00333187, throughput 2.23989K wps
[Epoch 56 Batch 90/173] avg loss 0.00339152, throughput 2.23564K wps
[Epoch 56 Batch 120/173] avg loss 0.00313154, throughput 2.22366K wps
[Epoch 56 Batch 150/173] avg loss 0.00342398, throughput 2.22242K wps
Begin Testing...
[Epoch 56] train avg loss 0.00332504, dev acc 0.8008, dev avg loss 0.436359, throughput 2.23859K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/173] avg loss 0.00329448, throughput 2.27631K wps
[Epoch 57 Batch 60/173] avg loss 0.00319494, throughput 2.2126K wps
[Epoch 57 Batch 90/173] avg loss 0.00313067, throughput 2.22458K wps
[Epoch 57 Batch 120/173] avg loss 0.00321931, throughput 2.23633K wps
[Epoch 57 Batch 150/173] avg loss 0.00347141, throughput 2.22313K wps
Begin Testing...
[Epoch 57] train avg loss 0.00327301, dev acc 0.7987, dev avg loss 0.438438, throughput 2.23514K wps
[Epoch 58 Batch 30/173] avg loss 0.00310258, throughput 2.27259K wps
[Epoch 58 Batch 60/173] avg loss 0.00318054, throughput 2.2231K wps
[Epoch 58 Batch 90/173] avg loss 0.00302411, throughput 2.22788K wps
[Epoch 58 Batch 120/173] avg loss 0.00315665, throughput 2.19821K wps
[Epoch 58 Batch 150/173] avg loss 0.00329064, throughput 2.21951K wps
Begin Testing...
[Epoch 58] train avg loss 0.0031584, dev acc 0.7925, dev avg loss 0.43888, throughput 2.22487K wps
[Epoch 59 Batch 30/173] avg loss 0.00335326, throughput 2.28107K wps
[Epoch 59 Batch 60/173] avg loss 0.00290569, throughput 2.21046K wps
[Epoch 59 Batch 90/173] avg loss 0.00300699, throughput 2.21138K wps
[Epoch 59 Batch 120/173] avg loss 0.00320704, throughput 2.2074K wps
[Epoch 59 Batch 150/173] avg loss 0.00297367, throughput 2.23616K wps
Begin Testing...
[Epoch 59] train avg loss 0.00312477, dev acc 0.7946, dev avg loss 0.438562, throughput 2.2298K wps
[Epoch 60 Batch 30/173] avg loss 0.00294565, throughput 2.27255K wps
[Epoch 60 Batch 60/173] avg loss 0.00298264, throughput 2.20151K wps
[Epoch 60 Batch 90/173] avg loss 0.00295452, throughput 2.2216K wps
[Epoch 60 Batch 120/173] avg loss 0.00304741, throughput 2.22132K wps
[Epoch 60 Batch 150/173] avg loss 0.00283308, throughput 2.21296K wps
Begin Testing...
[Epoch 60] train avg loss 0.00298609, dev acc 0.7904, dev avg loss 0.442084, throughput 2.22533K wps
[Epoch 61 Batch 30/173] avg loss 0.00293615, throughput 2.25443K wps
[Epoch 61 Batch 60/173] avg loss 0.00298861, throughput 2.21268K wps
[Epoch 61 Batch 90/173] avg loss 0.00294342, throughput 2.22944K wps
[Epoch 61 Batch 120/173] avg loss 0.00301619, throughput 2.23555K wps
[Epoch 61 Batch 150/173] avg loss 0.00295745, throughput 2.23588K wps
Begin Testing...
[Epoch 61] train avg loss 0.00296298, dev acc 0.8019, dev avg loss 0.44175, throughput 2.23447K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/173] avg loss 0.00272811, throughput 2.28581K wps
[Epoch 62 Batch 60/173] avg loss 0.00295486, throughput 2.22398K wps
[Epoch 62 Batch 90/173] avg loss 0.00290226, throughput 2.22638K wps
[Epoch 62 Batch 120/173] avg loss 0.00263702, throughput 2.23817K wps
[Epoch 62 Batch 150/173] avg loss 0.0028431, throughput 2.22996K wps
Begin Testing...
[Epoch 62] train avg loss 0.00283099, dev acc 0.7946, dev avg loss 0.442329, throughput 2.23746K wps
[Epoch 63 Batch 30/173] avg loss 0.0026185, throughput 2.28439K wps
[Epoch 63 Batch 60/173] avg loss 0.00293104, throughput 2.22207K wps
[Epoch 63 Batch 90/173] avg loss 0.00275469, throughput 2.22647K wps
[Epoch 63 Batch 120/173] avg loss 0.00280134, throughput 2.2342K wps
[Epoch 63 Batch 150/173] avg loss 0.00262066, throughput 2.20575K wps
Begin Testing...
[Epoch 63] train avg loss 0.00274828, dev acc 0.7987, dev avg loss 0.442812, throughput 2.23452K wps
[Epoch 64 Batch 30/173] avg loss 0.00249727, throughput 2.27279K wps
[Epoch 64 Batch 60/173] avg loss 0.00252846, throughput 2.22794K wps
[Epoch 64 Batch 90/173] avg loss 0.00280889, throughput 2.23026K wps
[Epoch 64 Batch 120/173] avg loss 0.00266771, throughput 2.2238K wps
[Epoch 64 Batch 150/173] avg loss 0.00301123, throughput 2.22931K wps
Begin Testing...
[Epoch 64] train avg loss 0.00268507, dev acc 0.7956, dev avg loss 0.444226, throughput 2.23435K wps
[Epoch 65 Batch 30/173] avg loss 0.00264562, throughput 2.26511K wps
[Epoch 65 Batch 60/173] avg loss 0.00254158, throughput 2.22895K wps
[Epoch 65 Batch 90/173] avg loss 0.0025356, throughput 2.22275K wps
[Epoch 65 Batch 120/173] avg loss 0.00268612, throughput 2.21787K wps
[Epoch 65 Batch 150/173] avg loss 0.00268794, throughput 2.19731K wps
Begin Testing...
[Epoch 65] train avg loss 0.00262017, dev acc 0.7998, dev avg loss 0.445941, throughput 2.22493K wps
[Epoch 66 Batch 30/173] avg loss 0.00241522, throughput 2.28665K wps
[Epoch 66 Batch 60/173] avg loss 0.00254605, throughput 2.22035K wps
[Epoch 66 Batch 90/173] avg loss 0.00253968, throughput 2.22656K wps
[Epoch 66 Batch 120/173] avg loss 0.00230934, throughput 2.22984K wps
[Epoch 66 Batch 150/173] avg loss 0.00270361, throughput 2.22728K wps
Begin Testing...
[Epoch 66] train avg loss 0.00253975, dev acc 0.7977, dev avg loss 0.448583, throughput 2.23647K wps
[Epoch 67 Batch 30/173] avg loss 0.00257263, throughput 2.28776K wps
[Epoch 67 Batch 60/173] avg loss 0.00247563, throughput 2.22628K wps
[Epoch 67 Batch 90/173] avg loss 0.00248943, throughput 2.21772K wps
[Epoch 67 Batch 120/173] avg loss 0.00234227, throughput 2.23317K wps
[Epoch 67 Batch 150/173] avg loss 0.00262507, throughput 2.23959K wps
Begin Testing...
[Epoch 67] train avg loss 0.0025084, dev acc 0.7977, dev avg loss 0.448137, throughput 2.24052K wps
[Epoch 68 Batch 30/173] avg loss 0.00243553, throughput 2.26936K wps
[Epoch 68 Batch 60/173] avg loss 0.0023158, throughput 2.21818K wps
[Epoch 68 Batch 90/173] avg loss 0.00234542, throughput 2.2306K wps
[Epoch 68 Batch 120/173] avg loss 0.00235372, throughput 2.22935K wps
[Epoch 68 Batch 150/173] avg loss 0.00245706, throughput 2.23196K wps
Begin Testing...
[Epoch 68] train avg loss 0.00241735, dev acc 0.7946, dev avg loss 0.449924, throughput 2.23214K wps
[Epoch 69 Batch 30/173] avg loss 0.00233706, throughput 2.27366K wps
[Epoch 69 Batch 60/173] avg loss 0.00226196, throughput 2.22904K wps
[Epoch 69 Batch 90/173] avg loss 0.0023332, throughput 2.23361K wps
[Epoch 69 Batch 120/173] avg loss 0.00240637, throughput 2.22813K wps
[Epoch 69 Batch 150/173] avg loss 0.00240409, throughput 2.23651K wps
Begin Testing...
[Epoch 69] train avg loss 0.00237724, dev acc 0.7956, dev avg loss 0.450682, throughput 2.23952K wps
[Epoch 70 Batch 30/173] avg loss 0.00236517, throughput 2.2534K wps
[Epoch 70 Batch 60/173] avg loss 0.0021781, throughput 2.22685K wps
[Epoch 70 Batch 90/173] avg loss 0.00217187, throughput 2.21677K wps
[Epoch 70 Batch 120/173] avg loss 0.00234092, throughput 2.21709K wps
[Epoch 70 Batch 150/173] avg loss 0.00231983, throughput 2.20605K wps
Begin Testing...
[Epoch 70] train avg loss 0.00228778, dev acc 0.7956, dev avg loss 0.451614, throughput 2.22371K wps
[Epoch 71 Batch 30/173] avg loss 0.00208342, throughput 2.25753K wps
[Epoch 71 Batch 60/173] avg loss 0.00224693, throughput 2.23416K wps
[Epoch 71 Batch 90/173] avg loss 0.00231605, throughput 2.22688K wps
[Epoch 71 Batch 120/173] avg loss 0.00207698, throughput 2.20504K wps
[Epoch 71 Batch 150/173] avg loss 0.00214064, throughput 2.22143K wps
Begin Testing...
[Epoch 71] train avg loss 0.00221922, dev acc 0.7946, dev avg loss 0.452054, throughput 2.22834K wps
[Epoch 72 Batch 30/173] avg loss 0.00223211, throughput 2.28816K wps
[Epoch 72 Batch 60/173] avg loss 0.00226529, throughput 2.19405K wps
[Epoch 72 Batch 90/173] avg loss 0.00228748, throughput 2.22748K wps
[Epoch 72 Batch 120/173] avg loss 0.00220873, throughput 2.22557K wps
[Epoch 72 Batch 150/173] avg loss 0.00215467, throughput 2.22313K wps
Begin Testing...
[Epoch 72] train avg loss 0.00224056, dev acc 0.8019, dev avg loss 0.454958, throughput 2.22957K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/173] avg loss 0.00200045, throughput 2.25953K wps
[Epoch 73 Batch 60/173] avg loss 0.00218588, throughput 2.22609K wps
[Epoch 73 Batch 90/173] avg loss 0.00207761, throughput 2.22256K wps
[Epoch 73 Batch 120/173] avg loss 0.00209927, throughput 2.22395K wps
[Epoch 73 Batch 150/173] avg loss 0.00212359, throughput 2.22245K wps
Begin Testing...
[Epoch 73] train avg loss 0.00213328, dev acc 0.7987, dev avg loss 0.454442, throughput 2.23068K wps
[Epoch 74 Batch 30/173] avg loss 0.00208492, throughput 2.26953K wps
[Epoch 74 Batch 60/173] avg loss 0.00199334, throughput 2.22474K wps
[Epoch 74 Batch 90/173] avg loss 0.00204333, throughput 2.20828K wps
[Epoch 74 Batch 120/173] avg loss 0.00202969, throughput 2.19942K wps
[Epoch 74 Batch 150/173] avg loss 0.00201701, throughput 2.19126K wps
Begin Testing...
[Epoch 74] train avg loss 0.00204938, dev acc 0.7987, dev avg loss 0.4575, throughput 2.21606K wps
[Epoch 75 Batch 30/173] avg loss 0.00207421, throughput 2.25776K wps
[Epoch 75 Batch 60/173] avg loss 0.00198154, throughput 2.21427K wps
[Epoch 75 Batch 90/173] avg loss 0.00208517, throughput 2.21938K wps
[Epoch 75 Batch 120/173] avg loss 0.00209864, throughput 2.21782K wps
[Epoch 75 Batch 150/173] avg loss 0.00211409, throughput 2.20421K wps
Begin Testing...
[Epoch 75] train avg loss 0.00205758, dev acc 0.8029, dev avg loss 0.457475, throughput 2.22113K wps
Observed Improvement.
Begin Testing...
[Epoch 76 Batch 30/173] avg loss 0.00199329, throughput 2.26327K wps
[Epoch 76 Batch 60/173] avg loss 0.00180028, throughput 2.22828K wps
[Epoch 76 Batch 90/173] avg loss 0.00205349, throughput 2.22664K wps
[Epoch 76 Batch 120/173] avg loss 0.00185261, throughput 2.20478K wps
[Epoch 76 Batch 150/173] avg loss 0.00191839, throughput 2.20935K wps
Begin Testing...
[Epoch 76] train avg loss 0.00194745, dev acc 0.8008, dev avg loss 0.457307, throughput 2.22678K wps
[Epoch 77 Batch 30/173] avg loss 0.00181199, throughput 2.28631K wps
[Epoch 77 Batch 60/173] avg loss 0.00197026, throughput 2.2256K wps
[Epoch 77 Batch 90/173] avg loss 0.0018885, throughput 2.2207K wps
[Epoch 77 Batch 120/173] avg loss 0.00189443, throughput 2.22304K wps
[Epoch 77 Batch 150/173] avg loss 0.00196033, throughput 2.22583K wps
Begin Testing...
[Epoch 77] train avg loss 0.00191389, dev acc 0.7998, dev avg loss 0.461417, throughput 2.2358K wps
[Epoch 78 Batch 30/173] avg loss 0.001733, throughput 2.28444K wps
[Epoch 78 Batch 60/173] avg loss 0.00190833, throughput 2.21848K wps
[Epoch 78 Batch 90/173] avg loss 0.00187509, throughput 2.22264K wps
[Epoch 78 Batch 120/173] avg loss 0.00201371, throughput 2.22349K wps
[Epoch 78 Batch 150/173] avg loss 0.00185285, throughput 2.22169K wps
Begin Testing...
[Epoch 78] train avg loss 0.00188062, dev acc 0.7914, dev avg loss 0.460889, throughput 2.23372K wps
[Epoch 79 Batch 30/173] avg loss 0.00187168, throughput 2.26633K wps
[Epoch 79 Batch 60/173] avg loss 0.00169653, throughput 2.23163K wps
[Epoch 79 Batch 90/173] avg loss 0.00174926, throughput 2.22692K wps
[Epoch 79 Batch 120/173] avg loss 0.00202991, throughput 2.2257K wps
[Epoch 79 Batch 150/173] avg loss 0.00185348, throughput 2.20077K wps
Begin Testing...
[Epoch 79] train avg loss 0.00183648, dev acc 0.8019, dev avg loss 0.462693, throughput 2.227K wps
[Epoch 80 Batch 30/173] avg loss 0.00182401, throughput 2.26552K wps
[Epoch 80 Batch 60/173] avg loss 0.00176418, throughput 2.2287K wps
[Epoch 80 Batch 90/173] avg loss 0.00180988, throughput 2.2356K wps
[Epoch 80 Batch 120/173] avg loss 0.00188915, throughput 2.23254K wps
[Epoch 80 Batch 150/173] avg loss 0.00164774, throughput 2.2357K wps
Begin Testing...
[Epoch 80] train avg loss 0.00178528, dev acc 0.7925, dev avg loss 0.463835, throughput 2.23777K wps
[Epoch 81 Batch 30/173] avg loss 0.0017461, throughput 2.27235K wps
[Epoch 81 Batch 60/173] avg loss 0.0015701, throughput 2.21089K wps
[Epoch 81 Batch 90/173] avg loss 0.00162447, throughput 2.22451K wps
[Epoch 81 Batch 120/173] avg loss 0.00181735, throughput 2.22849K wps
[Epoch 81 Batch 150/173] avg loss 0.0016501, throughput 2.2315K wps
Begin Testing...
[Epoch 81] train avg loss 0.00171658, dev acc 0.7946, dev avg loss 0.466199, throughput 2.23342K wps
[Epoch 82 Batch 30/173] avg loss 0.00158189, throughput 2.26106K wps
[Epoch 82 Batch 60/173] avg loss 0.00177589, throughput 2.22871K wps
[Epoch 82 Batch 90/173] avg loss 0.00159114, throughput 2.23462K wps
[Epoch 82 Batch 120/173] avg loss 0.00150346, throughput 2.23099K wps
[Epoch 82 Batch 150/173] avg loss 0.00165173, throughput 2.23034K wps
Begin Testing...
[Epoch 82] train avg loss 0.00165333, dev acc 0.7967, dev avg loss 0.466283, throughput 2.23694K wps
[Epoch 83 Batch 30/173] avg loss 0.00170136, throughput 2.28201K wps
[Epoch 83 Batch 60/173] avg loss 0.00170933, throughput 2.22899K wps
[Epoch 83 Batch 90/173] avg loss 0.00157827, throughput 2.22715K wps
[Epoch 83 Batch 120/173] avg loss 0.00175026, throughput 2.23175K wps
[Epoch 83 Batch 150/173] avg loss 0.00163005, throughput 2.22429K wps
Begin Testing...
[Epoch 83] train avg loss 0.00167978, dev acc 0.7987, dev avg loss 0.468069, throughput 2.23798K wps
[Epoch 84 Batch 30/173] avg loss 0.00156262, throughput 2.26822K wps
[Epoch 84 Batch 60/173] avg loss 0.00164502, throughput 2.21681K wps
[Epoch 84 Batch 90/173] avg loss 0.00152487, throughput 2.22522K wps
[Epoch 84 Batch 120/173] avg loss 0.00171007, throughput 2.22631K wps
[Epoch 84 Batch 150/173] avg loss 0.00164874, throughput 2.19531K wps
Begin Testing...
[Epoch 84] train avg loss 0.00161931, dev acc 0.7946, dev avg loss 0.469037, throughput 2.22482K wps
[Epoch 85 Batch 30/173] avg loss 0.00150911, throughput 2.26222K wps
[Epoch 85 Batch 60/173] avg loss 0.00155036, throughput 2.23139K wps
[Epoch 85 Batch 90/173] avg loss 0.00165103, throughput 2.21737K wps
[Epoch 85 Batch 120/173] avg loss 0.00158203, throughput 2.23279K wps
[Epoch 85 Batch 150/173] avg loss 0.00149242, throughput 2.22685K wps
Begin Testing...
[Epoch 85] train avg loss 0.00156134, dev acc 0.8040, dev avg loss 0.470199, throughput 2.2324K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/173] avg loss 0.00142175, throughput 2.26633K wps
[Epoch 86 Batch 60/173] avg loss 0.00151243, throughput 2.20979K wps
[Epoch 86 Batch 90/173] avg loss 0.00153514, throughput 2.20108K wps
[Epoch 86 Batch 120/173] avg loss 0.00142266, throughput 2.23557K wps
[Epoch 86 Batch 150/173] avg loss 0.00166265, throughput 2.2234K wps
Begin Testing...
[Epoch 86] train avg loss 0.00151346, dev acc 0.7987, dev avg loss 0.472372, throughput 2.22755K wps
[Epoch 87 Batch 30/173] avg loss 0.00142197, throughput 2.26607K wps
[Epoch 87 Batch 60/173] avg loss 0.00160623, throughput 2.21243K wps
[Epoch 87 Batch 90/173] avg loss 0.00154691, throughput 2.21284K wps
[Epoch 87 Batch 120/173] avg loss 0.00153744, throughput 2.21867K wps
[Epoch 87 Batch 150/173] avg loss 0.00133294, throughput 2.2244K wps
Begin Testing...
[Epoch 87] train avg loss 0.00150496, dev acc 0.7956, dev avg loss 0.474171, throughput 2.22762K wps
[Epoch 88 Batch 30/173] avg loss 0.00145152, throughput 2.27283K wps
[Epoch 88 Batch 60/173] avg loss 0.00144183, throughput 2.2112K wps
[Epoch 88 Batch 90/173] avg loss 0.00154615, throughput 2.22288K wps
[Epoch 88 Batch 120/173] avg loss 0.00141847, throughput 2.23089K wps
[Epoch 88 Batch 150/173] avg loss 0.00137226, throughput 2.237K wps
Begin Testing...
[Epoch 88] train avg loss 0.00144698, dev acc 0.8019, dev avg loss 0.47678, throughput 2.23341K wps
[Epoch 89 Batch 30/173] avg loss 0.00143213, throughput 2.25356K wps
[Epoch 89 Batch 60/173] avg loss 0.00145997, throughput 2.20994K wps
[Epoch 89 Batch 90/173] avg loss 0.00134875, throughput 2.1981K wps
[Epoch 89 Batch 120/173] avg loss 0.00150246, throughput 2.22784K wps
[Epoch 89 Batch 150/173] avg loss 0.00148089, throughput 2.23366K wps
Begin Testing...
[Epoch 89] train avg loss 0.00144362, dev acc 0.7956, dev avg loss 0.475557, throughput 2.22583K wps
[Epoch 90 Batch 30/173] avg loss 0.00138537, throughput 2.27833K wps
[Epoch 90 Batch 60/173] avg loss 0.00133703, throughput 2.22679K wps
[Epoch 90 Batch 90/173] avg loss 0.00135308, throughput 2.23086K wps
[Epoch 90 Batch 120/173] avg loss 0.00136797, throughput 2.22368K wps
[Epoch 90 Batch 150/173] avg loss 0.00145166, throughput 2.22122K wps
Begin Testing...
[Epoch 90] train avg loss 0.00139526, dev acc 0.7987, dev avg loss 0.475419, throughput 2.23534K wps
[Epoch 91 Batch 30/173] avg loss 0.00136381, throughput 2.2679K wps
[Epoch 91 Batch 60/173] avg loss 0.00135758, throughput 2.21816K wps
[Epoch 91 Batch 90/173] avg loss 0.0012323, throughput 2.23161K wps
[Epoch 91 Batch 120/173] avg loss 0.0014494, throughput 2.23135K wps
[Epoch 91 Batch 150/173] avg loss 0.00144812, throughput 2.2328K wps
Begin Testing...
[Epoch 91] train avg loss 0.00137739, dev acc 0.7967, dev avg loss 0.47692, throughput 2.23607K wps
[Epoch 92 Batch 30/173] avg loss 0.00138, throughput 2.29003K wps
[Epoch 92 Batch 60/173] avg loss 0.00133506, throughput 2.2194K wps
[Epoch 92 Batch 90/173] avg loss 0.00131016, throughput 2.22539K wps
[Epoch 92 Batch 120/173] avg loss 0.00137311, throughput 2.22873K wps
[Epoch 92 Batch 150/173] avg loss 0.00129087, throughput 2.23327K wps
Begin Testing...
[Epoch 92] train avg loss 0.00134162, dev acc 0.7967, dev avg loss 0.479572, throughput 2.23764K wps
[Epoch 93 Batch 30/173] avg loss 0.00130794, throughput 2.28136K wps
[Epoch 93 Batch 60/173] avg loss 0.00127547, throughput 2.23332K wps
[Epoch 93 Batch 90/173] avg loss 0.0011823, throughput 2.22556K wps
[Epoch 93 Batch 120/173] avg loss 0.00141237, throughput 2.22941K wps
[Epoch 93 Batch 150/173] avg loss 0.00135657, throughput 2.21941K wps
Begin Testing...
[Epoch 93] train avg loss 0.00130285, dev acc 0.8019, dev avg loss 0.482307, throughput 2.23753K wps
[Epoch 94 Batch 30/173] avg loss 0.00131397, throughput 2.27774K wps
[Epoch 94 Batch 60/173] avg loss 0.00127705, throughput 2.22702K wps
[Epoch 94 Batch 90/173] avg loss 0.00120643, throughput 2.23537K wps
[Epoch 94 Batch 120/173] avg loss 0.00130075, throughput 2.20538K wps
[Epoch 94 Batch 150/173] avg loss 0.00119471, throughput 2.22357K wps
Begin Testing...
[Epoch 94] train avg loss 0.00126963, dev acc 0.7967, dev avg loss 0.482919, throughput 2.2305K wps
[Epoch 95 Batch 30/173] avg loss 0.00123829, throughput 2.2598K wps
[Epoch 95 Batch 60/173] avg loss 0.00126987, throughput 2.20381K wps
[Epoch 95 Batch 90/173] avg loss 0.00129948, throughput 2.21906K wps
[Epoch 95 Batch 120/173] avg loss 0.00116402, throughput 2.21863K wps
[Epoch 95 Batch 150/173] avg loss 0.00117419, throughput 2.2405K wps
Begin Testing...
[Epoch 95] train avg loss 0.00123558, dev acc 0.7956, dev avg loss 0.482843, throughput 2.22928K wps
[Epoch 96 Batch 30/173] avg loss 0.00129841, throughput 2.25472K wps
[Epoch 96 Batch 60/173] avg loss 0.00123089, throughput 2.21974K wps
[Epoch 96 Batch 90/173] avg loss 0.00124959, throughput 2.23201K wps
[Epoch 96 Batch 120/173] avg loss 0.0011425, throughput 2.23218K wps
[Epoch 96 Batch 150/173] avg loss 0.00127802, throughput 2.22177K wps
Begin Testing...
[Epoch 96] train avg loss 0.00123904, dev acc 0.7946, dev avg loss 0.4848, throughput 2.22938K wps
[Epoch 97 Batch 30/173] avg loss 0.0012028, throughput 2.24131K wps
[Epoch 97 Batch 60/173] avg loss 0.00119183, throughput 2.22532K wps
[Epoch 97 Batch 90/173] avg loss 0.00120656, throughput 2.22306K wps
[Epoch 97 Batch 120/173] avg loss 0.00113037, throughput 2.23275K wps
[Epoch 97 Batch 150/173] avg loss 0.00121333, throughput 2.22391K wps
Begin Testing...
[Epoch 97] train avg loss 0.0012088, dev acc 0.7977, dev avg loss 0.485885, throughput 2.22989K wps
[Epoch 98 Batch 30/173] avg loss 0.00121377, throughput 2.24657K wps
[Epoch 98 Batch 60/173] avg loss 0.00111819, throughput 2.22834K wps
[Epoch 98 Batch 90/173] avg loss 0.00122604, throughput 2.21809K wps
[Epoch 98 Batch 120/173] avg loss 0.00126517, throughput 2.21821K wps
[Epoch 98 Batch 150/173] avg loss 0.0012306, throughput 2.21436K wps
Begin Testing...
[Epoch 98] train avg loss 0.00119867, dev acc 0.7977, dev avg loss 0.486079, throughput 2.22659K wps
[Epoch 99 Batch 30/173] avg loss 0.00104191, throughput 2.25714K wps
[Epoch 99 Batch 60/173] avg loss 0.00123777, throughput 2.22625K wps
[Epoch 99 Batch 90/173] avg loss 0.0011443, throughput 2.23977K wps
[Epoch 99 Batch 120/173] avg loss 0.00106748, throughput 2.22417K wps
[Epoch 99 Batch 150/173] avg loss 0.00126697, throughput 2.22712K wps
Begin Testing...
[Epoch 99] train avg loss 0.00115514, dev acc 0.7977, dev avg loss 0.488726, throughput 2.23386K wps
[Epoch 100 Batch 30/173] avg loss 0.00103301, throughput 2.27976K wps
[Epoch 100 Batch 60/173] avg loss 0.00118687, throughput 2.23364K wps
[Epoch 100 Batch 90/173] avg loss 0.00109189, throughput 2.21492K wps
[Epoch 100 Batch 120/173] avg loss 0.00100999, throughput 2.22537K wps
[Epoch 100 Batch 150/173] avg loss 0.00110856, throughput 2.22624K wps
Begin Testing...
[Epoch 100] train avg loss 0.00109898, dev acc 0.8040, dev avg loss 0.494608, throughput 2.23479K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/173] avg loss 0.00116043, throughput 2.26487K wps
[Epoch 101 Batch 60/173] avg loss 0.00113528, throughput 2.21926K wps
[Epoch 101 Batch 90/173] avg loss 0.00120893, throughput 2.21872K wps
[Epoch 101 Batch 120/173] avg loss 0.00117889, throughput 2.23395K wps
[Epoch 101 Batch 150/173] avg loss 0.00121441, throughput 2.2153K wps
Begin Testing...
[Epoch 101] train avg loss 0.0011632, dev acc 0.8008, dev avg loss 0.490938, throughput 2.23087K wps
[Epoch 102 Batch 30/173] avg loss 0.00109186, throughput 2.27212K wps
[Epoch 102 Batch 60/173] avg loss 0.00104688, throughput 2.22558K wps
[Epoch 102 Batch 90/173] avg loss 0.0010756, throughput 2.20182K wps
[Epoch 102 Batch 120/173] avg loss 0.00107378, throughput 2.23699K wps
[Epoch 102 Batch 150/173] avg loss 0.00112692, throughput 2.23702K wps
Begin Testing...
[Epoch 102] train avg loss 0.00108017, dev acc 0.8008, dev avg loss 0.492995, throughput 2.23481K wps
[Epoch 103 Batch 30/173] avg loss 0.00107158, throughput 2.27808K wps
[Epoch 103 Batch 60/173] avg loss 0.00123857, throughput 2.22282K wps
[Epoch 103 Batch 90/173] avg loss 0.00101283, throughput 2.23595K wps
[Epoch 103 Batch 120/173] avg loss 0.00113891, throughput 2.23591K wps
[Epoch 103 Batch 150/173] avg loss 0.00107911, throughput 2.23274K wps
Begin Testing...
[Epoch 103] train avg loss 0.00109474, dev acc 0.7967, dev avg loss 0.494251, throughput 2.2407K wps
[Epoch 104 Batch 30/173] avg loss 0.00103703, throughput 2.26348K wps
[Epoch 104 Batch 60/173] avg loss 0.00105078, throughput 2.23047K wps
[Epoch 104 Batch 90/173] avg loss 0.000921962, throughput 2.22558K wps
[Epoch 104 Batch 120/173] avg loss 0.00107913, throughput 2.22798K wps
[Epoch 104 Batch 150/173] avg loss 0.00112243, throughput 2.2313K wps
Begin Testing...
[Epoch 104] train avg loss 0.001039, dev acc 0.7977, dev avg loss 0.49561, throughput 2.23409K wps
[Epoch 105 Batch 30/173] avg loss 0.0011182, throughput 2.25814K wps
[Epoch 105 Batch 60/173] avg loss 0.000933515, throughput 2.22121K wps
[Epoch 105 Batch 90/173] avg loss 0.00104706, throughput 2.23558K wps
[Epoch 105 Batch 120/173] avg loss 0.00100151, throughput 2.24019K wps
[Epoch 105 Batch 150/173] avg loss 0.00102834, throughput 2.23858K wps
Begin Testing...
[Epoch 105] train avg loss 0.00102423, dev acc 0.7967, dev avg loss 0.498072, throughput 2.23848K wps
[Epoch 106 Batch 30/173] avg loss 0.00101829, throughput 2.26873K wps
[Epoch 106 Batch 60/173] avg loss 0.00115833, throughput 2.22867K wps
[Epoch 106 Batch 90/173] avg loss 0.000905284, throughput 2.2277K wps
[Epoch 106 Batch 120/173] avg loss 0.000829951, throughput 2.23764K wps
[Epoch 106 Batch 150/173] avg loss 0.00100101, throughput 2.21691K wps
Begin Testing...
[Epoch 106] train avg loss 0.000976984, dev acc 0.7967, dev avg loss 0.499773, throughput 2.23394K wps
[Epoch 107 Batch 30/173] avg loss 0.000997926, throughput 2.27874K wps
[Epoch 107 Batch 60/173] avg loss 0.00106664, throughput 2.23324K wps
[Epoch 107 Batch 90/173] avg loss 0.00103335, throughput 2.22058K wps
[Epoch 107 Batch 120/173] avg loss 0.000925549, throughput 2.22662K wps
[Epoch 107 Batch 150/173] avg loss 0.00109157, throughput 2.23935K wps
Begin Testing...
[Epoch 107] train avg loss 0.00101774, dev acc 0.7987, dev avg loss 0.498672, throughput 2.23916K wps
[Epoch 108 Batch 30/173] avg loss 0.00103167, throughput 2.26778K wps
[Epoch 108 Batch 60/173] avg loss 0.000965383, throughput 2.22497K wps
[Epoch 108 Batch 90/173] avg loss 0.000925873, throughput 2.23017K wps
[Epoch 108 Batch 120/173] avg loss 0.000919533, throughput 2.23792K wps
[Epoch 108 Batch 150/173] avg loss 0.000921503, throughput 2.22784K wps
Begin Testing...
[Epoch 108] train avg loss 0.000961018, dev acc 0.7967, dev avg loss 0.501713, throughput 2.23661K wps
[Epoch 109 Batch 30/173] avg loss 0.000989905, throughput 2.27464K wps
[Epoch 109 Batch 60/173] avg loss 0.000943998, throughput 2.22755K wps
[Epoch 109 Batch 90/173] avg loss 0.000996957, throughput 2.22243K wps
[Epoch 109 Batch 120/173] avg loss 0.001026, throughput 2.23284K wps
[Epoch 109 Batch 150/173] avg loss 0.00100128, throughput 2.21515K wps
Begin Testing...
[Epoch 109] train avg loss 0.000992894, dev acc 0.7998, dev avg loss 0.501958, throughput 2.23393K wps
[Epoch 110 Batch 30/173] avg loss 0.000907067, throughput 2.25849K wps
[Epoch 110 Batch 60/173] avg loss 0.00100563, throughput 2.22283K wps
[Epoch 110 Batch 90/173] avg loss 0.00101978, throughput 2.22759K wps
[Epoch 110 Batch 120/173] avg loss 0.000937943, throughput 2.21662K wps
[Epoch 110 Batch 150/173] avg loss 0.000940836, throughput 2.21672K wps
Begin Testing...
[Epoch 110] train avg loss 0.000960533, dev acc 0.7977, dev avg loss 0.502712, throughput 2.22739K wps
[Epoch 111 Batch 30/173] avg loss 0.000870442, throughput 2.26538K wps
[Epoch 111 Batch 60/173] avg loss 0.000880836, throughput 2.23319K wps
[Epoch 111 Batch 90/173] avg loss 0.00087881, throughput 2.23695K wps
[Epoch 111 Batch 120/173] avg loss 0.000949133, throughput 2.21486K wps
[Epoch 111 Batch 150/173] avg loss 0.000925746, throughput 2.22746K wps
Begin Testing...
[Epoch 111] train avg loss 0.000903744, dev acc 0.7998, dev avg loss 0.508075, throughput 2.23468K wps
[Epoch 112 Batch 30/173] avg loss 0.000828325, throughput 2.24006K wps
[Epoch 112 Batch 60/173] avg loss 0.000839364, throughput 2.21588K wps
[Epoch 112 Batch 90/173] avg loss 0.000810049, throughput 2.21801K wps
[Epoch 112 Batch 120/173] avg loss 0.00102479, throughput 2.2259K wps
[Epoch 112 Batch 150/173] avg loss 0.000854521, throughput 2.2316K wps
Begin Testing...
[Epoch 112] train avg loss 0.000873968, dev acc 0.7998, dev avg loss 0.507996, throughput 2.22712K wps
[Epoch 113 Batch 30/173] avg loss 0.000861436, throughput 2.25114K wps
[Epoch 113 Batch 60/173] avg loss 0.000898085, throughput 2.2312K wps
[Epoch 113 Batch 90/173] avg loss 0.000895525, throughput 2.21938K wps
[Epoch 113 Batch 120/173] avg loss 0.000879803, throughput 2.23933K wps
[Epoch 113 Batch 150/173] avg loss 0.000945738, throughput 2.2327K wps
Begin Testing...
[Epoch 113] train avg loss 0.000882887, dev acc 0.8019, dev avg loss 0.507643, throughput 2.23212K wps
[Epoch 114 Batch 30/173] avg loss 0.000787733, throughput 2.25281K wps
[Epoch 114 Batch 60/173] avg loss 0.0008455, throughput 2.21575K wps
[Epoch 114 Batch 90/173] avg loss 0.00089901, throughput 2.20661K wps
[Epoch 114 Batch 120/173] avg loss 0.000828931, throughput 2.22752K wps
[Epoch 114 Batch 150/173] avg loss 0.000848995, throughput 2.23365K wps
Begin Testing...
[Epoch 114] train avg loss 0.000846313, dev acc 0.7977, dev avg loss 0.509348, throughput 2.22908K wps
[Epoch 115 Batch 30/173] avg loss 0.000874048, throughput 2.27222K wps
[Epoch 115 Batch 60/173] avg loss 0.000854789, throughput 2.23095K wps
[Epoch 115 Batch 90/173] avg loss 0.000821796, throughput 2.20326K wps
[Epoch 115 Batch 120/173] avg loss 0.000836377, throughput 2.22964K wps
[Epoch 115 Batch 150/173] avg loss 0.000815624, throughput 2.23218K wps
Begin Testing...
[Epoch 115] train avg loss 0.000845636, dev acc 0.7977, dev avg loss 0.509581, throughput 2.23117K wps
[Epoch 116 Batch 30/173] avg loss 0.000906785, throughput 2.2764K wps
[Epoch 116 Batch 60/173] avg loss 0.000796855, throughput 2.21205K wps
[Epoch 116 Batch 90/173] avg loss 0.000879626, throughput 2.20063K wps
[Epoch 116 Batch 120/173] avg loss 0.000792513, throughput 2.22894K wps
[Epoch 116 Batch 150/173] avg loss 0.000831395, throughput 2.23232K wps
Begin Testing...
[Epoch 116] train avg loss 0.00083602, dev acc 0.7998, dev avg loss 0.512, throughput 2.23019K wps
[Epoch 117 Batch 30/173] avg loss 0.000842281, throughput 2.25878K wps
[Epoch 117 Batch 60/173] avg loss 0.000790361, throughput 2.23083K wps
[Epoch 117 Batch 90/173] avg loss 0.000843635, throughput 2.23082K wps
[Epoch 117 Batch 120/173] avg loss 0.000839882, throughput 2.22802K wps
[Epoch 117 Batch 150/173] avg loss 0.000754801, throughput 2.21728K wps
Begin Testing...
[Epoch 117] train avg loss 0.000809532, dev acc 0.8040, dev avg loss 0.513479, throughput 2.22903K wps
Observed Improvement.
Begin Testing...
[Epoch 118 Batch 30/173] avg loss 0.000850983, throughput 2.29049K wps
[Epoch 118 Batch 60/173] avg loss 0.000797824, throughput 2.23115K wps
[Epoch 118 Batch 90/173] avg loss 0.000758812, throughput 2.22197K wps
[Epoch 118 Batch 120/173] avg loss 0.000959519, throughput 2.21958K wps
[Epoch 118 Batch 150/173] avg loss 0.000857963, throughput 2.21439K wps
Begin Testing...
[Epoch 118] train avg loss 0.000837334, dev acc 0.7956, dev avg loss 0.514826, throughput 2.23155K wps
[Epoch 119 Batch 30/173] avg loss 0.000777321, throughput 2.25426K wps
[Epoch 119 Batch 60/173] avg loss 0.000747858, throughput 2.23229K wps
[Epoch 119 Batch 90/173] avg loss 0.000902671, throughput 2.20118K wps
[Epoch 119 Batch 120/173] avg loss 0.000857316, throughput 2.21175K wps
[Epoch 119 Batch 150/173] avg loss 0.000841673, throughput 2.19397K wps
Begin Testing...
[Epoch 119] train avg loss 0.00082853, dev acc 0.8019, dev avg loss 0.51257, throughput 2.2177K wps
[Epoch 120 Batch 30/173] avg loss 0.00070571, throughput 2.26704K wps
[Epoch 120 Batch 60/173] avg loss 0.000664096, throughput 2.20919K wps
[Epoch 120 Batch 90/173] avg loss 0.000822171, throughput 2.21478K wps
[Epoch 120 Batch 120/173] avg loss 0.000801482, throughput 2.23619K wps
[Epoch 120 Batch 150/173] avg loss 0.000866892, throughput 2.23887K wps
Begin Testing...
[Epoch 120] train avg loss 0.000779444, dev acc 0.7956, dev avg loss 0.515108, throughput 2.23206K wps
[Epoch 121 Batch 30/173] avg loss 0.000806075, throughput 2.26499K wps
[Epoch 121 Batch 60/173] avg loss 0.000861536, throughput 2.23517K wps
[Epoch 121 Batch 90/173] avg loss 0.000819973, throughput 2.22359K wps
[Epoch 121 Batch 120/173] avg loss 0.000752081, throughput 2.22086K wps
[Epoch 121 Batch 150/173] avg loss 0.000804099, throughput 2.23568K wps
Begin Testing...
[Epoch 121] train avg loss 0.000797211, dev acc 0.7967, dev avg loss 0.514971, throughput 2.23529K wps
[Epoch 122 Batch 30/173] avg loss 0.00062859, throughput 2.26773K wps
[Epoch 122 Batch 60/173] avg loss 0.000793934, throughput 2.23011K wps
[Epoch 122 Batch 90/173] avg loss 0.00083303, throughput 2.23726K wps
[Epoch 122 Batch 120/173] avg loss 0.000804024, throughput 2.23488K wps
[Epoch 122 Batch 150/173] avg loss 0.000776021, throughput 2.23494K wps
Begin Testing...
[Epoch 122] train avg loss 0.000771586, dev acc 0.7977, dev avg loss 0.518257, throughput 2.24017K wps
[Epoch 123 Batch 30/173] avg loss 0.000704015, throughput 2.2821K wps
[Epoch 123 Batch 60/173] avg loss 0.000688333, throughput 2.20947K wps
[Epoch 123 Batch 90/173] avg loss 0.000833789, throughput 2.22003K wps
[Epoch 123 Batch 120/173] avg loss 0.000661854, throughput 2.22538K wps
[Epoch 123 Batch 150/173] avg loss 0.000757109, throughput 2.2299K wps
Begin Testing...
[Epoch 123] train avg loss 0.0007345, dev acc 0.7977, dev avg loss 0.521077, throughput 2.23308K wps
[Epoch 124 Batch 30/173] avg loss 0.000742889, throughput 2.26969K wps
[Epoch 124 Batch 60/173] avg loss 0.000696507, throughput 2.19975K wps
[Epoch 124 Batch 90/173] avg loss 0.00077872, throughput 2.21555K wps
[Epoch 124 Batch 120/173] avg loss 0.000771193, throughput 2.21715K wps
[Epoch 124 Batch 150/173] avg loss 0.000739922, throughput 2.22436K wps
Begin Testing...
[Epoch 124] train avg loss 0.000741959, dev acc 0.8008, dev avg loss 0.522003, throughput 2.22445K wps
[Epoch 125 Batch 30/173] avg loss 0.000743671, throughput 2.26232K wps
[Epoch 125 Batch 60/173] avg loss 0.000607021, throughput 2.22227K wps
[Epoch 125 Batch 90/173] avg loss 0.000697554, throughput 2.20974K wps
[Epoch 125 Batch 120/173] avg loss 0.000748077, throughput 2.21569K wps
[Epoch 125 Batch 150/173] avg loss 0.000728394, throughput 2.2196K wps
Begin Testing...
[Epoch 125] train avg loss 0.000715728, dev acc 0.7946, dev avg loss 0.524687, throughput 2.2217K wps
[Epoch 126 Batch 30/173] avg loss 0.000695355, throughput 2.24777K wps
[Epoch 126 Batch 60/173] avg loss 0.000742445, throughput 2.23779K wps
[Epoch 126 Batch 90/173] avg loss 0.000701195, throughput 2.219K wps
[Epoch 126 Batch 120/173] avg loss 0.000696395, throughput 2.2195K wps
[Epoch 126 Batch 150/173] avg loss 0.000726894, throughput 2.20817K wps
Begin Testing...
[Epoch 126] train avg loss 0.000713869, dev acc 0.7987, dev avg loss 0.52474, throughput 2.226K wps
[Epoch 127 Batch 30/173] avg loss 0.000674237, throughput 2.27024K wps
[Epoch 127 Batch 60/173] avg loss 0.000686827, throughput 2.22808K wps
[Epoch 127 Batch 90/173] avg loss 0.000697419, throughput 2.2292K wps
[Epoch 127 Batch 120/173] avg loss 0.000695141, throughput 2.2326K wps
[Epoch 127 Batch 150/173] avg loss 0.000693905, throughput 2.22257K wps
Begin Testing...
[Epoch 127] train avg loss 0.000681869, dev acc 0.8019, dev avg loss 0.526459, throughput 2.23332K wps
[Epoch 128 Batch 30/173] avg loss 0.000584098, throughput 2.26294K wps
[Epoch 128 Batch 60/173] avg loss 0.000728097, throughput 2.23195K wps
[Epoch 128 Batch 90/173] avg loss 0.000646979, throughput 2.23278K wps
[Epoch 128 Batch 120/173] avg loss 0.000680886, throughput 2.23011K wps
[Epoch 128 Batch 150/173] avg loss 0.000733354, throughput 2.22706K wps
Begin Testing...
[Epoch 128] train avg loss 0.000671842, dev acc 0.7998, dev avg loss 0.534322, throughput 2.23219K wps
[Epoch 129 Batch 30/173] avg loss 0.000712768, throughput 2.2641K wps
[Epoch 129 Batch 60/173] avg loss 0.000692312, throughput 2.21261K wps
[Epoch 129 Batch 90/173] avg loss 0.000691303, throughput 2.22749K wps
[Epoch 129 Batch 120/173] avg loss 0.00061016, throughput 2.23443K wps
[Epoch 129 Batch 150/173] avg loss 0.000659492, throughput 2.21825K wps
Begin Testing...
[Epoch 129] train avg loss 0.000674234, dev acc 0.7967, dev avg loss 0.529887, throughput 2.23079K wps
[Epoch 130 Batch 30/173] avg loss 0.000697509, throughput 2.24308K wps
[Epoch 130 Batch 60/173] avg loss 0.000789168, throughput 2.22993K wps
[Epoch 130 Batch 90/173] avg loss 0.000667295, throughput 2.1964K wps
[Epoch 130 Batch 120/173] avg loss 0.000607284, throughput 2.20657K wps
[Epoch 130 Batch 150/173] avg loss 0.000608292, throughput 2.23324K wps
Begin Testing...
[Epoch 130] train avg loss 0.000674275, dev acc 0.7977, dev avg loss 0.528392, throughput 2.2234K wps
[Epoch 131 Batch 30/173] avg loss 0.000664397, throughput 2.26244K wps
[Epoch 131 Batch 60/173] avg loss 0.000608475, throughput 2.23089K wps
[Epoch 131 Batch 90/173] avg loss 0.00071651, throughput 2.21066K wps
[Epoch 131 Batch 120/173] avg loss 0.000576096, throughput 2.22792K wps
[Epoch 131 Batch 150/173] avg loss 0.000724607, throughput 2.21003K wps
Begin Testing...
[Epoch 131] train avg loss 0.000657474, dev acc 0.7977, dev avg loss 0.531666, throughput 2.22883K wps
[Epoch 132 Batch 30/173] avg loss 0.000654956, throughput 2.28114K wps
[Epoch 132 Batch 60/173] avg loss 0.000644369, throughput 2.21121K wps
[Epoch 132 Batch 90/173] avg loss 0.000654174, throughput 2.21281K wps
[Epoch 132 Batch 120/173] avg loss 0.000609283, throughput 2.23475K wps
[Epoch 132 Batch 150/173] avg loss 0.0006644, throughput 2.2342K wps
Begin Testing...
[Epoch 132] train avg loss 0.00063955, dev acc 0.7977, dev avg loss 0.534071, throughput 2.23504K wps
[Epoch 133 Batch 30/173] avg loss 0.000642548, throughput 2.27038K wps
[Epoch 133 Batch 60/173] avg loss 0.000690074, throughput 2.22041K wps
[Epoch 133 Batch 90/173] avg loss 0.000656997, throughput 2.22449K wps
[Epoch 133 Batch 120/173] avg loss 0.000698233, throughput 2.22846K wps
[Epoch 133 Batch 150/173] avg loss 0.000661852, throughput 2.20912K wps
Begin Testing...
[Epoch 133] train avg loss 0.000656409, dev acc 0.7977, dev avg loss 0.533826, throughput 2.22831K wps
[Epoch 134 Batch 30/173] avg loss 0.000613437, throughput 2.2485K wps
[Epoch 134 Batch 60/173] avg loss 0.000648957, throughput 2.20039K wps
[Epoch 134 Batch 90/173] avg loss 0.000615871, throughput 2.2023K wps
[Epoch 134 Batch 120/173] avg loss 0.000611339, throughput 2.22752K wps
[Epoch 134 Batch 150/173] avg loss 0.00058084, throughput 2.23683K wps
Begin Testing...
[Epoch 134] train avg loss 0.000611716, dev acc 0.7967, dev avg loss 0.536837, throughput 2.22345K wps
[Epoch 135 Batch 30/173] avg loss 0.000543643, throughput 2.26891K wps
[Epoch 135 Batch 60/173] avg loss 0.00062224, throughput 2.23338K wps
[Epoch 135 Batch 90/173] avg loss 0.00062439, throughput 2.21259K wps
[Epoch 135 Batch 120/173] avg loss 0.000564579, throughput 2.21567K wps
[Epoch 135 Batch 150/173] avg loss 0.000595925, throughput 2.21982K wps
Begin Testing...
[Epoch 135] train avg loss 0.000591833, dev acc 0.7998, dev avg loss 0.540794, throughput 2.23041K wps
[Epoch 136 Batch 30/173] avg loss 0.000656124, throughput 2.26784K wps
[Epoch 136 Batch 60/173] avg loss 0.000628597, throughput 2.23307K wps
[Epoch 136 Batch 90/173] avg loss 0.000544917, throughput 2.22401K wps
[Epoch 136 Batch 120/173] avg loss 0.000619669, throughput 2.2219K wps
[Epoch 136 Batch 150/173] avg loss 0.000596994, throughput 2.22305K wps
Begin Testing...
[Epoch 136] train avg loss 0.000596331, dev acc 0.7987, dev avg loss 0.539777, throughput 2.23216K wps
[Epoch 137 Batch 30/173] avg loss 0.000560865, throughput 2.25721K wps
[Epoch 137 Batch 60/173] avg loss 0.000649031, throughput 2.2113K wps
[Epoch 137 Batch 90/173] avg loss 0.000502717, throughput 2.21119K wps
[Epoch 137 Batch 120/173] avg loss 0.000571831, throughput 2.23546K wps
[Epoch 137 Batch 150/173] avg loss 0.000627785, throughput 2.21713K wps
Begin Testing...
[Epoch 137] train avg loss 0.000584606, dev acc 0.7977, dev avg loss 0.541318, throughput 2.22561K wps
[Epoch 138 Batch 30/173] avg loss 0.000593841, throughput 2.27615K wps
[Epoch 138 Batch 60/173] avg loss 0.000591273, throughput 2.22611K wps
[Epoch 138 Batch 90/173] avg loss 0.000575161, throughput 2.2185K wps
[Epoch 138 Batch 120/173] avg loss 0.000718556, throughput 2.22113K wps
[Epoch 138 Batch 150/173] avg loss 0.000617068, throughput 2.22969K wps
Begin Testing...
[Epoch 138] train avg loss 0.000611328, dev acc 0.7946, dev avg loss 0.543295, throughput 2.23502K wps
[Epoch 139 Batch 30/173] avg loss 0.000539447, throughput 2.26136K wps
[Epoch 139 Batch 60/173] avg loss 0.000558177, throughput 2.21972K wps
[Epoch 139 Batch 90/173] avg loss 0.000663923, throughput 2.22112K wps
[Epoch 139 Batch 120/173] avg loss 0.000561346, throughput 2.21559K wps
[Epoch 139 Batch 150/173] avg loss 0.000629126, throughput 2.22378K wps
Begin Testing...
[Epoch 139] train avg loss 0.000593588, dev acc 0.7967, dev avg loss 0.544512, throughput 2.2268K wps
[Epoch 140 Batch 30/173] avg loss 0.000498812, throughput 2.2811K wps
[Epoch 140 Batch 60/173] avg loss 0.000617959, throughput 2.22406K wps
[Epoch 140 Batch 90/173] avg loss 0.00058967, throughput 2.23365K wps
[Epoch 140 Batch 120/173] avg loss 0.000498731, throughput 2.20348K wps
[Epoch 140 Batch 150/173] avg loss 0.000561025, throughput 2.23059K wps
Begin Testing...
[Epoch 140] train avg loss 0.000558205, dev acc 0.8008, dev avg loss 0.54528, throughput 2.23123K wps
[Epoch 141 Batch 30/173] avg loss 0.000556126, throughput 2.25578K wps
[Epoch 141 Batch 60/173] avg loss 0.000593734, throughput 2.21074K wps
[Epoch 141 Batch 90/173] avg loss 0.000531024, throughput 2.22078K wps
[Epoch 141 Batch 120/173] avg loss 0.00055091, throughput 2.22617K wps
[Epoch 141 Batch 150/173] avg loss 0.000527604, throughput 2.23118K wps
Begin Testing...
[Epoch 141] train avg loss 0.000551701, dev acc 0.7977, dev avg loss 0.545731, throughput 2.22885K wps
[Epoch 142 Batch 30/173] avg loss 0.000571651, throughput 2.26702K wps
[Epoch 142 Batch 60/173] avg loss 0.000525427, throughput 2.20819K wps
[Epoch 142 Batch 90/173] avg loss 0.000562005, throughput 2.23657K wps
[Epoch 142 Batch 120/173] avg loss 0.000627023, throughput 2.23075K wps
[Epoch 142 Batch 150/173] avg loss 0.000534781, throughput 2.22602K wps
Begin Testing...
[Epoch 142] train avg loss 0.000569632, dev acc 0.7977, dev avg loss 0.547326, throughput 2.23346K wps
[Epoch 143 Batch 30/173] avg loss 0.00048561, throughput 2.27091K wps
[Epoch 143 Batch 60/173] avg loss 0.000576407, throughput 2.22292K wps
[Epoch 143 Batch 90/173] avg loss 0.000577215, throughput 2.21384K wps
[Epoch 143 Batch 120/173] avg loss 0.000493792, throughput 2.21509K wps
[Epoch 143 Batch 150/173] avg loss 0.000574239, throughput 2.21142K wps
Begin Testing...
[Epoch 143] train avg loss 0.00054479, dev acc 0.7977, dev avg loss 0.547906, throughput 2.22507K wps
[Epoch 144 Batch 30/173] avg loss 0.000588605, throughput 2.27326K wps
[Epoch 144 Batch 60/173] avg loss 0.000549271, throughput 2.21338K wps
[Epoch 144 Batch 90/173] avg loss 0.000567283, throughput 2.20217K wps
[Epoch 144 Batch 120/173] avg loss 0.000581011, throughput 2.21281K wps
[Epoch 144 Batch 150/173] avg loss 0.000516869, throughput 2.21642K wps
Begin Testing...
[Epoch 144] train avg loss 0.000552857, dev acc 0.7977, dev avg loss 0.548939, throughput 2.22459K wps
[Epoch 145 Batch 30/173] avg loss 0.000561691, throughput 2.27177K wps
[Epoch 145 Batch 60/173] avg loss 0.00051192, throughput 2.21896K wps
[Epoch 145 Batch 90/173] avg loss 0.000571101, throughput 2.22561K wps
[Epoch 145 Batch 120/173] avg loss 0.000610453, throughput 2.23299K wps
[Epoch 145 Batch 150/173] avg loss 0.000505504, throughput 2.23276K wps
Begin Testing...
[Epoch 145] train avg loss 0.000546506, dev acc 0.8008, dev avg loss 0.55135, throughput 2.23545K wps
[Epoch 146 Batch 30/173] avg loss 0.000535278, throughput 2.25863K wps
[Epoch 146 Batch 60/173] avg loss 0.000503659, throughput 2.23651K wps
[Epoch 146 Batch 90/173] avg loss 0.000509844, throughput 2.23628K wps
[Epoch 146 Batch 120/173] avg loss 0.000445632, throughput 2.23603K wps
[Epoch 146 Batch 150/173] avg loss 0.000485203, throughput 2.23265K wps
Begin Testing...
[Epoch 146] train avg loss 0.000503183, dev acc 0.7998, dev avg loss 0.552546, throughput 2.23799K wps
[Epoch 147 Batch 30/173] avg loss 0.000415413, throughput 2.27454K wps
[Epoch 147 Batch 60/173] avg loss 0.000553521, throughput 2.22412K wps
[Epoch 147 Batch 90/173] avg loss 0.000515211, throughput 2.23023K wps
[Epoch 147 Batch 120/173] avg loss 0.000562557, throughput 2.21978K wps
[Epoch 147 Batch 150/173] avg loss 0.000523681, throughput 2.23597K wps
Begin Testing...
[Epoch 147] train avg loss 0.000514671, dev acc 0.7977, dev avg loss 0.55099, throughput 2.23613K wps
[Epoch 148 Batch 30/173] avg loss 0.00054457, throughput 2.27992K wps
[Epoch 148 Batch 60/173] avg loss 0.000538169, throughput 2.2225K wps
[Epoch 148 Batch 90/173] avg loss 0.000536974, throughput 2.23159K wps
[Epoch 148 Batch 120/173] avg loss 0.000500229, throughput 2.22957K wps
[Epoch 148 Batch 150/173] avg loss 0.000491957, throughput 2.22559K wps
Begin Testing...
[Epoch 148] train avg loss 0.000528121, dev acc 0.7977, dev avg loss 0.552713, throughput 2.23538K wps
[Epoch 149 Batch 30/173] avg loss 0.000526143, throughput 2.28145K wps
[Epoch 149 Batch 60/173] avg loss 0.000513716, throughput 2.22151K wps
[Epoch 149 Batch 90/173] avg loss 0.00044958, throughput 2.23658K wps
[Epoch 149 Batch 120/173] avg loss 0.0004849, throughput 2.23339K wps
[Epoch 149 Batch 150/173] avg loss 0.000554995, throughput 2.22915K wps
Begin Testing...
[Epoch 149] train avg loss 0.000512271, dev acc 0.7977, dev avg loss 0.553186, throughput 2.23991K wps
[Epoch 150 Batch 30/173] avg loss 0.000579616, throughput 2.26701K wps
[Epoch 150 Batch 60/173] avg loss 0.000481146, throughput 2.21541K wps
[Epoch 150 Batch 90/173] avg loss 0.000499742, throughput 2.22661K wps
[Epoch 150 Batch 120/173] avg loss 0.000432587, throughput 2.23672K wps
[Epoch 150 Batch 150/173] avg loss 0.000463522, throughput 2.22933K wps
Begin Testing...
[Epoch 150] train avg loss 0.00049306, dev acc 0.7987, dev avg loss 0.555351, throughput 2.23258K wps
[Epoch 151 Batch 30/173] avg loss 0.000482784, throughput 2.27823K wps
[Epoch 151 Batch 60/173] avg loss 0.000455889, throughput 2.22375K wps
[Epoch 151 Batch 90/173] avg loss 0.000490018, throughput 2.23237K wps
[Epoch 151 Batch 120/173] avg loss 0.00050513, throughput 2.23132K wps
[Epoch 151 Batch 150/173] avg loss 0.000564999, throughput 2.23243K wps
Begin Testing...
[Epoch 151] train avg loss 0.000499996, dev acc 0.7987, dev avg loss 0.558015, throughput 2.23632K wps
[Epoch 152 Batch 30/173] avg loss 0.000441061, throughput 2.27788K wps
[Epoch 152 Batch 60/173] avg loss 0.000490319, throughput 2.23437K wps
[Epoch 152 Batch 90/173] avg loss 0.000483792, throughput 2.23544K wps
[Epoch 152 Batch 120/173] avg loss 0.000510022, throughput 2.22847K wps
[Epoch 152 Batch 150/173] avg loss 0.000557897, throughput 2.20933K wps
Begin Testing...
[Epoch 152] train avg loss 0.000493062, dev acc 0.7987, dev avg loss 0.56024, throughput 2.23282K wps
[Epoch 153 Batch 30/173] avg loss 0.00051968, throughput 2.27718K wps
[Epoch 153 Batch 60/173] avg loss 0.000546831, throughput 2.23119K wps
[Epoch 153 Batch 90/173] avg loss 0.000524213, throughput 2.21706K wps
[Epoch 153 Batch 120/173] avg loss 0.000534989, throughput 2.22608K wps
[Epoch 153 Batch 150/173] avg loss 0.000514023, throughput 2.21827K wps
Begin Testing...
[Epoch 153] train avg loss 0.000516058, dev acc 0.7967, dev avg loss 0.560964, throughput 2.23099K wps
[Epoch 154 Batch 30/173] avg loss 0.000483938, throughput 2.26364K wps
[Epoch 154 Batch 60/173] avg loss 0.000460387, throughput 2.23802K wps
[Epoch 154 Batch 90/173] avg loss 0.000481072, throughput 2.22515K wps
[Epoch 154 Batch 120/173] avg loss 0.0004385, throughput 2.22678K wps
[Epoch 154 Batch 150/173] avg loss 0.000498395, throughput 2.23628K wps
Begin Testing...
[Epoch 154] train avg loss 0.000467854, dev acc 0.7977, dev avg loss 0.562162, throughput 2.23503K wps
[Epoch 155 Batch 30/173] avg loss 0.000422151, throughput 2.25431K wps
[Epoch 155 Batch 60/173] avg loss 0.000483049, throughput 2.23301K wps
[Epoch 155 Batch 90/173] avg loss 0.000410496, throughput 2.23469K wps
[Epoch 155 Batch 120/173] avg loss 0.000486719, throughput 2.21754K wps
[Epoch 155 Batch 150/173] avg loss 0.000531989, throughput 2.22576K wps
Begin Testing...
[Epoch 155] train avg loss 0.000465174, dev acc 0.7967, dev avg loss 0.562943, throughput 2.23239K wps
[Epoch 156 Batch 30/173] avg loss 0.000500461, throughput 2.27977K wps
[Epoch 156 Batch 60/173] avg loss 0.000469014, throughput 2.23563K wps
[Epoch 156 Batch 90/173] avg loss 0.000466564, throughput 2.23217K wps
[Epoch 156 Batch 120/173] avg loss 0.000455729, throughput 2.22996K wps
[Epoch 156 Batch 150/173] avg loss 0.000484451, throughput 2.23197K wps
Begin Testing...
[Epoch 156] train avg loss 0.00047329, dev acc 0.7977, dev avg loss 0.566385, throughput 2.24105K wps
[Epoch 157 Batch 30/173] avg loss 0.000458638, throughput 2.27756K wps
[Epoch 157 Batch 60/173] avg loss 0.000479892, throughput 2.22616K wps
[Epoch 157 Batch 90/173] avg loss 0.000442461, throughput 2.23587K wps
[Epoch 157 Batch 120/173] avg loss 0.000489279, throughput 2.22054K wps
[Epoch 157 Batch 150/173] avg loss 0.000477585, throughput 2.21279K wps
Begin Testing...
[Epoch 157] train avg loss 0.000477691, dev acc 0.7956, dev avg loss 0.565362, throughput 2.23038K wps
[Epoch 158 Batch 30/173] avg loss 0.000425246, throughput 2.28349K wps
[Epoch 158 Batch 60/173] avg loss 0.000473786, throughput 2.23877K wps
[Epoch 158 Batch 90/173] avg loss 0.000480362, throughput 2.23032K wps
[Epoch 158 Batch 120/173] avg loss 0.000514447, throughput 2.23675K wps
[Epoch 158 Batch 150/173] avg loss 0.000476516, throughput 2.22674K wps
Begin Testing...
[Epoch 158] train avg loss 0.000474852, dev acc 0.7946, dev avg loss 0.564894, throughput 2.24185K wps
[Epoch 159 Batch 30/173] avg loss 0.000454919, throughput 2.27448K wps
[Epoch 159 Batch 60/173] avg loss 0.000473044, throughput 2.22933K wps
[Epoch 159 Batch 90/173] avg loss 0.000476027, throughput 2.23407K wps
[Epoch 159 Batch 120/173] avg loss 0.00045606, throughput 2.23576K wps
[Epoch 159 Batch 150/173] avg loss 0.000410715, throughput 2.22991K wps
Begin Testing...
[Epoch 159] train avg loss 0.000451138, dev acc 0.7956, dev avg loss 0.565878, throughput 2.23849K wps
[Epoch 160 Batch 30/173] avg loss 0.000491487, throughput 2.26831K wps
[Epoch 160 Batch 60/173] avg loss 0.000477252, throughput 2.20026K wps
[Epoch 160 Batch 90/173] avg loss 0.000420792, throughput 2.2078K wps
[Epoch 160 Batch 120/173] avg loss 0.000476002, throughput 2.2047K wps
[Epoch 160 Batch 150/173] avg loss 0.000522435, throughput 2.22566K wps
Begin Testing...
[Epoch 160] train avg loss 0.000473852, dev acc 0.7967, dev avg loss 0.5687, throughput 2.22209K wps
[Epoch 161 Batch 30/173] avg loss 0.000505158, throughput 2.27759K wps
[Epoch 161 Batch 60/173] avg loss 0.000436145, throughput 2.21716K wps
[Epoch 161 Batch 90/173] avg loss 0.000392543, throughput 2.22739K wps
[Epoch 161 Batch 120/173] avg loss 0.000456917, throughput 2.23496K wps
[Epoch 161 Batch 150/173] avg loss 0.000458173, throughput 2.21254K wps
Begin Testing...
[Epoch 161] train avg loss 0.000451855, dev acc 0.7977, dev avg loss 0.567626, throughput 2.22807K wps
[Epoch 162 Batch 30/173] avg loss 0.00039077, throughput 2.2668K wps
[Epoch 162 Batch 60/173] avg loss 0.000432032, throughput 2.22422K wps
[Epoch 162 Batch 90/173] avg loss 0.000455845, throughput 2.22774K wps
[Epoch 162 Batch 120/173] avg loss 0.000414019, throughput 2.22715K wps
[Epoch 162 Batch 150/173] avg loss 0.000446375, throughput 2.22222K wps
Begin Testing...
[Epoch 162] train avg loss 0.000435399, dev acc 0.7946, dev avg loss 0.567173, throughput 2.23421K wps
[Epoch 163 Batch 30/173] avg loss 0.000389158, throughput 2.2735K wps
[Epoch 163 Batch 60/173] avg loss 0.000413047, throughput 2.23442K wps
[Epoch 163 Batch 90/173] avg loss 0.000416774, throughput 2.23035K wps
[Epoch 163 Batch 120/173] avg loss 0.000510442, throughput 2.23062K wps
[Epoch 163 Batch 150/173] avg loss 0.000481943, throughput 2.22413K wps
Begin Testing...
[Epoch 163] train avg loss 0.000441549, dev acc 0.7956, dev avg loss 0.569928, throughput 2.23708K wps
[Epoch 164 Batch 30/173] avg loss 0.000396846, throughput 2.26702K wps
[Epoch 164 Batch 60/173] avg loss 0.000423009, throughput 2.22264K wps
[Epoch 164 Batch 90/173] avg loss 0.000447851, throughput 2.2212K wps
[Epoch 164 Batch 120/173] avg loss 0.000397997, throughput 2.21493K wps
[Epoch 164 Batch 150/173] avg loss 0.00037515, throughput 2.23821K wps
Begin Testing...
[Epoch 164] train avg loss 0.000406031, dev acc 0.7956, dev avg loss 0.5717, throughput 2.23302K wps
[Epoch 165 Batch 30/173] avg loss 0.000387514, throughput 2.27603K wps
[Epoch 165 Batch 60/173] avg loss 0.000451938, throughput 2.23323K wps
[Epoch 165 Batch 90/173] avg loss 0.00040626, throughput 2.23518K wps
[Epoch 165 Batch 120/173] avg loss 0.000430578, throughput 2.23296K wps
[Epoch 165 Batch 150/173] avg loss 0.000462813, throughput 2.23354K wps
Begin Testing...
[Epoch 165] train avg loss 0.00042373, dev acc 0.7987, dev avg loss 0.573908, throughput 2.24166K wps
[Epoch 166 Batch 30/173] avg loss 0.000380121, throughput 2.27325K wps
[Epoch 166 Batch 60/173] avg loss 0.000424781, throughput 2.22434K wps
[Epoch 166 Batch 90/173] avg loss 0.000454049, throughput 2.23654K wps
[Epoch 166 Batch 120/173] avg loss 0.000396889, throughput 2.22721K wps
[Epoch 166 Batch 150/173] avg loss 0.000508258, throughput 2.22676K wps
Begin Testing...
[Epoch 166] train avg loss 0.000433985, dev acc 0.7956, dev avg loss 0.573229, throughput 2.23626K wps
[Epoch 167 Batch 30/173] avg loss 0.000350429, throughput 2.27189K wps
[Epoch 167 Batch 60/173] avg loss 0.000395587, throughput 2.22416K wps
[Epoch 167 Batch 90/173] avg loss 0.000489934, throughput 2.21641K wps
[Epoch 167 Batch 120/173] avg loss 0.000447041, throughput 2.22898K wps
[Epoch 167 Batch 150/173] avg loss 0.000381586, throughput 2.21479K wps
Begin Testing...
[Epoch 167] train avg loss 0.000417251, dev acc 0.7977, dev avg loss 0.574517, throughput 2.2254K wps
[Epoch 168 Batch 30/173] avg loss 0.000379237, throughput 2.26614K wps
[Epoch 168 Batch 60/173] avg loss 0.000419451, throughput 2.22287K wps
[Epoch 168 Batch 90/173] avg loss 0.000329933, throughput 2.23225K wps
[Epoch 168 Batch 120/173] avg loss 0.000452617, throughput 2.21895K wps
[Epoch 168 Batch 150/173] avg loss 0.000426862, throughput 2.23775K wps
Begin Testing...
[Epoch 168] train avg loss 0.000403863, dev acc 0.7956, dev avg loss 0.575203, throughput 2.2353K wps
[Epoch 169 Batch 30/173] avg loss 0.000419412, throughput 2.27081K wps
[Epoch 169 Batch 60/173] avg loss 0.000401468, throughput 2.22693K wps
[Epoch 169 Batch 90/173] avg loss 0.000418159, throughput 2.22574K wps
[Epoch 169 Batch 120/173] avg loss 0.000398101, throughput 2.23779K wps
[Epoch 169 Batch 150/173] avg loss 0.000366046, throughput 2.23335K wps
Begin Testing...
[Epoch 169] train avg loss 0.000402257, dev acc 0.7998, dev avg loss 0.577435, throughput 2.23867K wps
[Epoch 170 Batch 30/173] avg loss 0.000413128, throughput 2.24468K wps
[Epoch 170 Batch 60/173] avg loss 0.00036765, throughput 2.22612K wps
[Epoch 170 Batch 90/173] avg loss 0.000404675, throughput 2.23994K wps
[Epoch 170 Batch 120/173] avg loss 0.000368149, throughput 2.21968K wps
[Epoch 170 Batch 150/173] avg loss 0.000373021, throughput 2.23212K wps
Begin Testing...
[Epoch 170] train avg loss 0.000389094, dev acc 0.7977, dev avg loss 0.582828, throughput 2.23284K wps
[Epoch 171 Batch 30/173] avg loss 0.000472338, throughput 2.28002K wps
[Epoch 171 Batch 60/173] avg loss 0.000385138, throughput 2.20731K wps
[Epoch 171 Batch 90/173] avg loss 0.000418379, throughput 2.2284K wps
[Epoch 171 Batch 120/173] avg loss 0.000359949, throughput 2.22553K wps
[Epoch 171 Batch 150/173] avg loss 0.000340679, throughput 2.22389K wps
Begin Testing...
[Epoch 171] train avg loss 0.000393715, dev acc 0.7935, dev avg loss 0.576892, throughput 2.23152K wps
[Epoch 172 Batch 30/173] avg loss 0.00040236, throughput 2.27885K wps
[Epoch 172 Batch 60/173] avg loss 0.000414082, throughput 2.22141K wps
[Epoch 172 Batch 90/173] avg loss 0.000329066, throughput 2.21345K wps
[Epoch 172 Batch 120/173] avg loss 0.000386385, throughput 2.2258K wps
[Epoch 172 Batch 150/173] avg loss 0.000457344, throughput 2.23148K wps
Begin Testing...
[Epoch 172] train avg loss 0.000393087, dev acc 0.7967, dev avg loss 0.578415, throughput 2.23288K wps
[Epoch 173 Batch 30/173] avg loss 0.000492983, throughput 2.25598K wps
[Epoch 173 Batch 60/173] avg loss 0.000358904, throughput 2.22668K wps
[Epoch 173 Batch 90/173] avg loss 0.000383999, throughput 2.2368K wps
[Epoch 173 Batch 120/173] avg loss 0.00043187, throughput 2.22477K wps
[Epoch 173 Batch 150/173] avg loss 0.000357701, throughput 2.20419K wps
Begin Testing...
[Epoch 173] train avg loss 0.000399242, dev acc 0.7956, dev avg loss 0.579504, throughput 2.23046K wps
[Epoch 174 Batch 30/173] avg loss 0.000351441, throughput 2.2867K wps
[Epoch 174 Batch 60/173] avg loss 0.000357477, throughput 2.2167K wps
[Epoch 174 Batch 90/173] avg loss 0.000400437, throughput 2.23348K wps
[Epoch 174 Batch 120/173] avg loss 0.00040156, throughput 2.22785K wps
[Epoch 174 Batch 150/173] avg loss 0.000330363, throughput 2.21747K wps
Begin Testing...
[Epoch 174] train avg loss 0.000376176, dev acc 0.7987, dev avg loss 0.583038, throughput 2.23588K wps
[Epoch 175 Batch 30/173] avg loss 0.00038326, throughput 2.27694K wps
[Epoch 175 Batch 60/173] avg loss 0.000393327, throughput 2.22818K wps
[Epoch 175 Batch 90/173] avg loss 0.00040974, throughput 2.24038K wps
[Epoch 175 Batch 120/173] avg loss 0.00038229, throughput 2.2317K wps
[Epoch 175 Batch 150/173] avg loss 0.00036231, throughput 2.23082K wps
Begin Testing...
[Epoch 175] train avg loss 0.000385091, dev acc 0.7946, dev avg loss 0.579766, throughput 2.24107K wps
[Epoch 176 Batch 30/173] avg loss 0.000321847, throughput 2.28292K wps
[Epoch 176 Batch 60/173] avg loss 0.000366205, throughput 2.2094K wps
[Epoch 176 Batch 90/173] avg loss 0.000445698, throughput 2.22929K wps
[Epoch 176 Batch 120/173] avg loss 0.000340876, throughput 2.23297K wps
[Epoch 176 Batch 150/173] avg loss 0.000334489, throughput 2.22842K wps
Begin Testing...
[Epoch 176] train avg loss 0.000358604, dev acc 0.7967, dev avg loss 0.588579, throughput 2.23341K wps
[Epoch 177 Batch 30/173] avg loss 0.000356852, throughput 2.27316K wps
[Epoch 177 Batch 60/173] avg loss 0.000384461, throughput 2.23597K wps
[Epoch 177 Batch 90/173] avg loss 0.000415501, throughput 2.23323K wps
[Epoch 177 Batch 120/173] avg loss 0.000330535, throughput 2.22576K wps
[Epoch 177 Batch 150/173] avg loss 0.000378754, throughput 2.23233K wps
Begin Testing...
[Epoch 177] train avg loss 0.000370787, dev acc 0.7967, dev avg loss 0.582774, throughput 2.23882K wps
[Epoch 178 Batch 30/173] avg loss 0.000376101, throughput 2.26472K wps
[Epoch 178 Batch 60/173] avg loss 0.000337208, throughput 2.21266K wps
[Epoch 178 Batch 90/173] avg loss 0.000333528, throughput 2.20814K wps
[Epoch 178 Batch 120/173] avg loss 0.000408012, throughput 2.23232K wps
[Epoch 178 Batch 150/173] avg loss 0.000413726, throughput 2.22353K wps
Begin Testing...
[Epoch 178] train avg loss 0.000374298, dev acc 0.7946, dev avg loss 0.584005, throughput 2.22717K wps
[Epoch 179 Batch 30/173] avg loss 0.000365394, throughput 2.26921K wps
[Epoch 179 Batch 60/173] avg loss 0.000324709, throughput 2.191K wps
[Epoch 179 Batch 90/173] avg loss 0.000370915, throughput 2.2175K wps
[Epoch 179 Batch 120/173] avg loss 0.000329055, throughput 2.23427K wps
[Epoch 179 Batch 150/173] avg loss 0.000335815, throughput 2.23443K wps
Begin Testing...
[Epoch 179] train avg loss 0.000351204, dev acc 0.7977, dev avg loss 0.586926, throughput 2.23046K wps
[Epoch 180 Batch 30/173] avg loss 0.00033599, throughput 2.26304K wps
[Epoch 180 Batch 60/173] avg loss 0.000357304, throughput 2.23191K wps
[Epoch 180 Batch 90/173] avg loss 0.000354382, throughput 2.23821K wps
[Epoch 180 Batch 120/173] avg loss 0.000328735, throughput 2.21727K wps
[Epoch 180 Batch 150/173] avg loss 0.000362259, throughput 2.20237K wps
Begin Testing...
[Epoch 180] train avg loss 0.000346806, dev acc 0.7935, dev avg loss 0.586447, throughput 2.22831K wps
[Epoch 181 Batch 30/173] avg loss 0.000367634, throughput 2.28027K wps
[Epoch 181 Batch 60/173] avg loss 0.000402392, throughput 2.21974K wps
[Epoch 181 Batch 90/173] avg loss 0.000296677, throughput 2.20701K wps
[Epoch 181 Batch 120/173] avg loss 0.000356531, throughput 2.22989K wps
[Epoch 181 Batch 150/173] avg loss 0.00039482, throughput 2.23207K wps
Begin Testing...
[Epoch 181] train avg loss 0.000362309, dev acc 0.7925, dev avg loss 0.588582, throughput 2.23386K wps
[Epoch 182 Batch 30/173] avg loss 0.000371196, throughput 2.26839K wps
[Epoch 182 Batch 60/173] avg loss 0.000295068, throughput 2.22485K wps
[Epoch 182 Batch 90/173] avg loss 0.000370682, throughput 2.22893K wps
[Epoch 182 Batch 120/173] avg loss 0.000350863, throughput 2.23326K wps
[Epoch 182 Batch 150/173] avg loss 0.000392771, throughput 2.22251K wps
Begin Testing...
[Epoch 182] train avg loss 0.000358052, dev acc 0.7946, dev avg loss 0.589897, throughput 2.23519K wps
[Epoch 183 Batch 30/173] avg loss 0.000323384, throughput 2.27137K wps
[Epoch 183 Batch 60/173] avg loss 0.000385344, throughput 2.22513K wps
[Epoch 183 Batch 90/173] avg loss 0.000293494, throughput 2.22408K wps
[Epoch 183 Batch 120/173] avg loss 0.00031292, throughput 2.23854K wps
[Epoch 183 Batch 150/173] avg loss 0.000368941, throughput 2.23742K wps
Begin Testing...
[Epoch 183] train avg loss 0.00034076, dev acc 0.7967, dev avg loss 0.591201, throughput 2.23856K wps
[Epoch 184 Batch 30/173] avg loss 0.000322457, throughput 2.28216K wps
[Epoch 184 Batch 60/173] avg loss 0.00032312, throughput 2.23619K wps
[Epoch 184 Batch 90/173] avg loss 0.000367562, throughput 2.23946K wps
[Epoch 184 Batch 120/173] avg loss 0.000346689, throughput 2.22805K wps
[Epoch 184 Batch 150/173] avg loss 0.000353752, throughput 2.23755K wps
Begin Testing...
[Epoch 184] train avg loss 0.000344104, dev acc 0.7925, dev avg loss 0.590499, throughput 2.24089K wps
[Epoch 185 Batch 30/173] avg loss 0.000327425, throughput 2.26837K wps
[Epoch 185 Batch 60/173] avg loss 0.00033158, throughput 2.22827K wps
[Epoch 185 Batch 90/173] avg loss 0.000352928, throughput 2.23353K wps
[Epoch 185 Batch 120/173] avg loss 0.000321977, throughput 2.23685K wps
[Epoch 185 Batch 150/173] avg loss 0.000388053, throughput 2.23348K wps
Begin Testing...
[Epoch 185] train avg loss 0.000347938, dev acc 0.7935, dev avg loss 0.59041, throughput 2.2375K wps
[Epoch 186 Batch 30/173] avg loss 0.000340456, throughput 2.29034K wps
[Epoch 186 Batch 60/173] avg loss 0.000319377, throughput 2.2325K wps
[Epoch 186 Batch 90/173] avg loss 0.000312125, throughput 2.21119K wps
[Epoch 186 Batch 120/173] avg loss 0.00031565, throughput 2.23108K wps
[Epoch 186 Batch 150/173] avg loss 0.00028019, throughput 2.23074K wps
Begin Testing...
[Epoch 186] train avg loss 0.000320842, dev acc 0.7967, dev avg loss 0.592987, throughput 2.2357K wps
[Epoch 187 Batch 30/173] avg loss 0.000360116, throughput 2.28632K wps
[Epoch 187 Batch 60/173] avg loss 0.000336819, throughput 2.207K wps
[Epoch 187 Batch 90/173] avg loss 0.000320535, throughput 2.23102K wps
[Epoch 187 Batch 120/173] avg loss 0.000320273, throughput 2.22321K wps
[Epoch 187 Batch 150/173] avg loss 0.000381325, throughput 2.21391K wps
Begin Testing...
[Epoch 187] train avg loss 0.000339462, dev acc 0.7935, dev avg loss 0.591099, throughput 2.2312K wps
[Epoch 188 Batch 30/173] avg loss 0.000321045, throughput 2.28197K wps
[Epoch 188 Batch 60/173] avg loss 0.000332355, throughput 2.23633K wps
[Epoch 188 Batch 90/173] avg loss 0.00031969, throughput 2.22852K wps
[Epoch 188 Batch 120/173] avg loss 0.000280707, throughput 2.22566K wps
[Epoch 188 Batch 150/173] avg loss 0.000363439, throughput 2.21563K wps
Begin Testing...
[Epoch 188] train avg loss 0.000331651, dev acc 0.7925, dev avg loss 0.59434, throughput 2.23398K wps
[Epoch 189 Batch 30/173] avg loss 0.000331837, throughput 2.26829K wps
[Epoch 189 Batch 60/173] avg loss 0.000318235, throughput 2.23041K wps
[Epoch 189 Batch 90/173] avg loss 0.000310914, throughput 2.22718K wps
[Epoch 189 Batch 120/173] avg loss 0.000310365, throughput 2.22938K wps
[Epoch 189 Batch 150/173] avg loss 0.000343245, throughput 2.21996K wps
Begin Testing...
[Epoch 189] train avg loss 0.000320487, dev acc 0.7956, dev avg loss 0.595396, throughput 2.234K wps
[Epoch 190 Batch 30/173] avg loss 0.000319708, throughput 2.26975K wps
[Epoch 190 Batch 60/173] avg loss 0.00032655, throughput 2.2256K wps
[Epoch 190 Batch 90/173] avg loss 0.000304348, throughput 2.22159K wps
[Epoch 190 Batch 120/173] avg loss 0.000339659, throughput 2.22774K wps
[Epoch 190 Batch 150/173] avg loss 0.000337029, throughput 2.23182K wps
Begin Testing...
[Epoch 190] train avg loss 0.00033846, dev acc 0.7967, dev avg loss 0.600907, throughput 2.23322K wps
[Epoch 191 Batch 30/173] avg loss 0.000341517, throughput 2.28106K wps
[Epoch 191 Batch 60/173] avg loss 0.00034966, throughput 2.22548K wps
[Epoch 191 Batch 90/173] avg loss 0.000300039, throughput 2.229K wps
[Epoch 191 Batch 120/173] avg loss 0.000313747, throughput 2.23404K wps
[Epoch 191 Batch 150/173] avg loss 0.000298692, throughput 2.23282K wps
Begin Testing...
[Epoch 191] train avg loss 0.000323909, dev acc 0.7935, dev avg loss 0.596545, throughput 2.23662K wps
[Epoch 192 Batch 30/173] avg loss 0.000286625, throughput 2.28235K wps
[Epoch 192 Batch 60/173] avg loss 0.000294302, throughput 2.2342K wps
[Epoch 192 Batch 90/173] avg loss 0.000298745, throughput 2.22921K wps
[Epoch 192 Batch 120/173] avg loss 0.000320456, throughput 2.22285K wps
[Epoch 192 Batch 150/173] avg loss 0.000365019, throughput 2.23495K wps
Begin Testing...
[Epoch 192] train avg loss 0.00032031, dev acc 0.7946, dev avg loss 0.596969, throughput 2.23863K wps
[Epoch 193 Batch 30/173] avg loss 0.000385235, throughput 2.27388K wps
[Epoch 193 Batch 60/173] avg loss 0.00030304, throughput 2.22599K wps
[Epoch 193 Batch 90/173] avg loss 0.000269196, throughput 2.22952K wps
[Epoch 193 Batch 120/173] avg loss 0.000324709, throughput 2.23063K wps
[Epoch 193 Batch 150/173] avg loss 0.000340015, throughput 2.23682K wps
Begin Testing...
[Epoch 193] train avg loss 0.000326485, dev acc 0.7956, dev avg loss 0.597478, throughput 2.23764K wps
[Epoch 194 Batch 30/173] avg loss 0.000297346, throughput 2.27888K wps
[Epoch 194 Batch 60/173] avg loss 0.000303126, throughput 2.2251K wps
[Epoch 194 Batch 90/173] avg loss 0.000332981, throughput 2.22971K wps
[Epoch 194 Batch 120/173] avg loss 0.000315113, throughput 2.22146K wps
[Epoch 194 Batch 150/173] avg loss 0.000369053, throughput 2.19684K wps
Begin Testing...
[Epoch 194] train avg loss 0.000319542, dev acc 0.7925, dev avg loss 0.599458, throughput 2.22815K wps
[Epoch 195 Batch 30/173] avg loss 0.000331025, throughput 2.26942K wps
[Epoch 195 Batch 60/173] avg loss 0.000290698, throughput 2.22467K wps
[Epoch 195 Batch 90/173] avg loss 0.000309269, throughput 2.23139K wps
[Epoch 195 Batch 120/173] avg loss 0.000348577, throughput 2.22852K wps
[Epoch 195 Batch 150/173] avg loss 0.000306142, throughput 2.22687K wps
Begin Testing...
[Epoch 195] train avg loss 0.000311521, dev acc 0.7935, dev avg loss 0.600217, throughput 2.23644K wps
[Epoch 196 Batch 30/173] avg loss 0.000292224, throughput 2.26392K wps
[Epoch 196 Batch 60/173] avg loss 0.000312435, throughput 2.23673K wps
[Epoch 196 Batch 90/173] avg loss 0.000288512, throughput 2.23267K wps
[Epoch 196 Batch 120/173] avg loss 0.000282793, throughput 2.23388K wps
[Epoch 196 Batch 150/173] avg loss 0.000267823, throughput 2.22902K wps
Begin Testing...
[Epoch 196] train avg loss 0.000291186, dev acc 0.7967, dev avg loss 0.606087, throughput 2.23906K wps
[Epoch 197 Batch 30/173] avg loss 0.000340636, throughput 2.25755K wps
[Epoch 197 Batch 60/173] avg loss 0.000344928, throughput 2.23483K wps
[Epoch 197 Batch 90/173] avg loss 0.000293582, throughput 2.23223K wps
[Epoch 197 Batch 120/173] avg loss 0.000291173, throughput 2.23575K wps
[Epoch 197 Batch 150/173] avg loss 0.000305561, throughput 2.18237K wps
Begin Testing...
[Epoch 197] train avg loss 0.000313003, dev acc 0.7935, dev avg loss 0.60367, throughput 2.22764K wps
[Epoch 198 Batch 30/173] avg loss 0.000281474, throughput 2.25394K wps
[Epoch 198 Batch 60/173] avg loss 0.000301613, throughput 2.2108K wps
[Epoch 198 Batch 90/173] avg loss 0.00028792, throughput 2.23196K wps
[Epoch 198 Batch 120/173] avg loss 0.000327522, throughput 2.21814K wps
[Epoch 198 Batch 150/173] avg loss 0.000330733, throughput 2.23244K wps
Begin Testing...
[Epoch 198] train avg loss 0.000316031, dev acc 0.7946, dev avg loss 0.603998, throughput 2.23082K wps
[Epoch 199 Batch 30/173] avg loss 0.00028532, throughput 2.26693K wps
[Epoch 199 Batch 60/173] avg loss 0.000310773, throughput 2.21839K wps
[Epoch 199 Batch 90/173] avg loss 0.000329766, throughput 2.21549K wps
[Epoch 199 Batch 120/173] avg loss 0.000312421, throughput 2.21233K wps
[Epoch 199 Batch 150/173] avg loss 0.000310729, throughput 2.21054K wps
Begin Testing...
[Epoch 199] train avg loss 0.000306293, dev acc 0.7946, dev avg loss 0.605125, throughput 2.22313K wps
Test loss 0.537749, test acc 0.8021
Total time cost 909.37s
[Epoch 0 Batch 30/173] avg loss 0.0139997, throughput 1.73024K wps
[Epoch 0 Batch 60/173] avg loss 0.0138892, throughput 2.2066K wps
[Epoch 0 Batch 90/173] avg loss 0.0138991, throughput 2.2141K wps
[Epoch 0 Batch 120/173] avg loss 0.013992, throughput 2.2202K wps
[Epoch 0 Batch 150/173] avg loss 0.0137051, throughput 2.23327K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138962, dev acc 0.6246, dev avg loss 0.67755, throughput 2.11665K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0136542, throughput 2.25374K wps
[Epoch 1 Batch 60/173] avg loss 0.0136726, throughput 2.23282K wps
[Epoch 1 Batch 90/173] avg loss 0.0136019, throughput 2.23066K wps
[Epoch 1 Batch 120/173] avg loss 0.0136373, throughput 2.22444K wps
[Epoch 1 Batch 150/173] avg loss 0.0135766, throughput 2.22191K wps
Begin Testing...
[Epoch 1] train avg loss 0.0136214, dev acc 0.6861, dev avg loss 0.665228, throughput 2.23079K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0134253, throughput 2.26471K wps
[Epoch 2 Batch 60/173] avg loss 0.0133462, throughput 2.23209K wps
[Epoch 2 Batch 90/173] avg loss 0.0132524, throughput 2.20545K wps
[Epoch 2 Batch 120/173] avg loss 0.013355, throughput 2.20721K wps
[Epoch 2 Batch 150/173] avg loss 0.0133152, throughput 2.23698K wps
Begin Testing...
[Epoch 2] train avg loss 0.0133343, dev acc 0.7059, dev avg loss 0.651827, throughput 2.23016K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0131679, throughput 2.27631K wps
[Epoch 3 Batch 60/173] avg loss 0.0130297, throughput 2.21708K wps
[Epoch 3 Batch 90/173] avg loss 0.0129638, throughput 2.23238K wps
[Epoch 3 Batch 120/173] avg loss 0.012944, throughput 2.22155K wps
[Epoch 3 Batch 150/173] avg loss 0.0128992, throughput 2.23501K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129853, dev acc 0.7299, dev avg loss 0.638187, throughput 2.23668K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0129337, throughput 2.27082K wps
[Epoch 4 Batch 60/173] avg loss 0.012849, throughput 2.23101K wps
[Epoch 4 Batch 90/173] avg loss 0.0126751, throughput 2.2196K wps
[Epoch 4 Batch 120/173] avg loss 0.0127157, throughput 2.23634K wps
[Epoch 4 Batch 150/173] avg loss 0.0126982, throughput 2.22508K wps
Begin Testing...
[Epoch 4] train avg loss 0.0127543, dev acc 0.7331, dev avg loss 0.62268, throughput 2.23634K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.012601, throughput 2.28331K wps
[Epoch 5 Batch 60/173] avg loss 0.0125259, throughput 2.23333K wps
[Epoch 5 Batch 90/173] avg loss 0.0125129, throughput 2.20862K wps
[Epoch 5 Batch 120/173] avg loss 0.0121404, throughput 2.21802K wps
[Epoch 5 Batch 150/173] avg loss 0.0123439, throughput 2.22512K wps
Begin Testing...
[Epoch 5] train avg loss 0.0123986, dev acc 0.7383, dev avg loss 0.606877, throughput 2.23293K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0121504, throughput 2.26432K wps
[Epoch 6 Batch 60/173] avg loss 0.0120287, throughput 2.22897K wps
[Epoch 6 Batch 90/173] avg loss 0.0120257, throughput 2.23512K wps
[Epoch 6 Batch 120/173] avg loss 0.0120344, throughput 2.23333K wps
[Epoch 6 Batch 150/173] avg loss 0.0119479, throughput 2.22736K wps
Begin Testing...
[Epoch 6] train avg loss 0.0120594, dev acc 0.7393, dev avg loss 0.588607, throughput 2.23761K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0118671, throughput 2.26939K wps
[Epoch 7 Batch 60/173] avg loss 0.0117318, throughput 2.22481K wps
[Epoch 7 Batch 90/173] avg loss 0.011788, throughput 2.23469K wps
[Epoch 7 Batch 120/173] avg loss 0.0115098, throughput 2.23589K wps
[Epoch 7 Batch 150/173] avg loss 0.0116968, throughput 2.22767K wps
Begin Testing...
[Epoch 7] train avg loss 0.0116949, dev acc 0.7445, dev avg loss 0.571645, throughput 2.23592K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0113099, throughput 2.25237K wps
[Epoch 8 Batch 60/173] avg loss 0.0114155, throughput 2.2255K wps
[Epoch 8 Batch 90/173] avg loss 0.0114048, throughput 2.23228K wps
[Epoch 8 Batch 120/173] avg loss 0.0111953, throughput 2.23247K wps
[Epoch 8 Batch 150/173] avg loss 0.0112897, throughput 2.23026K wps
Begin Testing...
[Epoch 8] train avg loss 0.0113103, dev acc 0.7497, dev avg loss 0.554481, throughput 2.23249K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.011181, throughput 2.26701K wps
[Epoch 9 Batch 60/173] avg loss 0.011092, throughput 2.23398K wps
[Epoch 9 Batch 90/173] avg loss 0.0107743, throughput 2.2382K wps
[Epoch 9 Batch 120/173] avg loss 0.0110936, throughput 2.23137K wps
[Epoch 9 Batch 150/173] avg loss 0.0108066, throughput 2.23169K wps
Begin Testing...
[Epoch 9] train avg loss 0.0110145, dev acc 0.7581, dev avg loss 0.538823, throughput 2.2391K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0107561, throughput 2.25768K wps
[Epoch 10 Batch 60/173] avg loss 0.0108528, throughput 2.2326K wps
[Epoch 10 Batch 90/173] avg loss 0.0107582, throughput 2.2325K wps
[Epoch 10 Batch 120/173] avg loss 0.0106034, throughput 2.23378K wps
[Epoch 10 Batch 150/173] avg loss 0.0106384, throughput 2.23232K wps
Begin Testing...
[Epoch 10] train avg loss 0.0107153, dev acc 0.7685, dev avg loss 0.526672, throughput 2.23686K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/173] avg loss 0.0105011, throughput 2.26707K wps
[Epoch 11 Batch 60/173] avg loss 0.0104703, throughput 2.23614K wps
[Epoch 11 Batch 90/173] avg loss 0.0103457, throughput 2.22899K wps
[Epoch 11 Batch 120/173] avg loss 0.010557, throughput 2.23138K wps
[Epoch 11 Batch 150/173] avg loss 0.0102061, throughput 2.22471K wps
Begin Testing...
[Epoch 11] train avg loss 0.0103809, dev acc 0.7737, dev avg loss 0.511607, throughput 2.23746K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/173] avg loss 0.0101302, throughput 2.28224K wps
[Epoch 12 Batch 60/173] avg loss 0.0101261, throughput 2.23144K wps
[Epoch 12 Batch 90/173] avg loss 0.0100463, throughput 2.22233K wps
[Epoch 12 Batch 120/173] avg loss 0.0100879, throughput 2.2315K wps
[Epoch 12 Batch 150/173] avg loss 0.0101155, throughput 2.23745K wps
Begin Testing...
[Epoch 12] train avg loss 0.0101418, dev acc 0.7821, dev avg loss 0.499618, throughput 2.2379K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0101405, throughput 2.27123K wps
[Epoch 13 Batch 60/173] avg loss 0.00964011, throughput 2.23599K wps
[Epoch 13 Batch 90/173] avg loss 0.00967381, throughput 2.22435K wps
[Epoch 13 Batch 120/173] avg loss 0.00975185, throughput 2.20995K wps
[Epoch 13 Batch 150/173] avg loss 0.00965113, throughput 2.22135K wps
Begin Testing...
[Epoch 13] train avg loss 0.00980725, dev acc 0.7831, dev avg loss 0.488269, throughput 2.23255K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.0098774, throughput 2.28769K wps
[Epoch 14 Batch 60/173] avg loss 0.00940824, throughput 2.23702K wps
[Epoch 14 Batch 90/173] avg loss 0.00952075, throughput 2.22828K wps
[Epoch 14 Batch 120/173] avg loss 0.00959922, throughput 2.23429K wps
[Epoch 14 Batch 150/173] avg loss 0.0094642, throughput 2.22593K wps
Begin Testing...
[Epoch 14] train avg loss 0.00954148, dev acc 0.7883, dev avg loss 0.477846, throughput 2.24051K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.00942336, throughput 2.27073K wps
[Epoch 15 Batch 60/173] avg loss 0.00958727, throughput 2.23184K wps
[Epoch 15 Batch 90/173] avg loss 0.00934948, throughput 2.22401K wps
[Epoch 15 Batch 120/173] avg loss 0.0092464, throughput 2.22713K wps
[Epoch 15 Batch 150/173] avg loss 0.00918277, throughput 2.20748K wps
Begin Testing...
[Epoch 15] train avg loss 0.00935729, dev acc 0.7925, dev avg loss 0.471302, throughput 2.22831K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.00925031, throughput 2.26419K wps
[Epoch 16 Batch 60/173] avg loss 0.00933118, throughput 2.23461K wps
[Epoch 16 Batch 90/173] avg loss 0.00926854, throughput 2.21303K wps
[Epoch 16 Batch 120/173] avg loss 0.00898109, throughput 2.22666K wps
[Epoch 16 Batch 150/173] avg loss 0.00880357, throughput 2.23536K wps
Begin Testing...
[Epoch 16] train avg loss 0.00913428, dev acc 0.7873, dev avg loss 0.462108, throughput 2.23555K wps
[Epoch 17 Batch 30/173] avg loss 0.00875688, throughput 2.28474K wps
[Epoch 17 Batch 60/173] avg loss 0.00881291, throughput 2.21796K wps
[Epoch 17 Batch 90/173] avg loss 0.00880764, throughput 2.22854K wps
[Epoch 17 Batch 120/173] avg loss 0.00890365, throughput 2.2341K wps
[Epoch 17 Batch 150/173] avg loss 0.00893878, throughput 2.23654K wps
Begin Testing...
[Epoch 17] train avg loss 0.00883969, dev acc 0.7935, dev avg loss 0.457205, throughput 2.23882K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.00877027, throughput 2.2877K wps
[Epoch 18 Batch 60/173] avg loss 0.00849378, throughput 2.22082K wps
[Epoch 18 Batch 90/173] avg loss 0.00888257, throughput 2.21656K wps
[Epoch 18 Batch 120/173] avg loss 0.00850492, throughput 2.23232K wps
[Epoch 18 Batch 150/173] avg loss 0.00876803, throughput 2.23501K wps
Begin Testing...
[Epoch 18] train avg loss 0.00871991, dev acc 0.8029, dev avg loss 0.450596, throughput 2.23647K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.00821535, throughput 2.2401K wps
[Epoch 19 Batch 60/173] avg loss 0.00813856, throughput 2.21899K wps
[Epoch 19 Batch 90/173] avg loss 0.00852584, throughput 2.2391K wps
[Epoch 19 Batch 120/173] avg loss 0.00836456, throughput 2.2335K wps
[Epoch 19 Batch 150/173] avg loss 0.00880376, throughput 2.23795K wps
Begin Testing...
[Epoch 19] train avg loss 0.00844809, dev acc 0.7967, dev avg loss 0.446386, throughput 2.23205K wps
[Epoch 20 Batch 30/173] avg loss 0.00827338, throughput 2.26437K wps
[Epoch 20 Batch 60/173] avg loss 0.00807013, throughput 2.2372K wps
[Epoch 20 Batch 90/173] avg loss 0.00801535, throughput 2.22414K wps
[Epoch 20 Batch 120/173] avg loss 0.00827824, throughput 2.23773K wps
[Epoch 20 Batch 150/173] avg loss 0.00832289, throughput 2.23313K wps
Begin Testing...
[Epoch 20] train avg loss 0.00826288, dev acc 0.7956, dev avg loss 0.444488, throughput 2.23783K wps
[Epoch 21 Batch 30/173] avg loss 0.0082146, throughput 2.25672K wps
[Epoch 21 Batch 60/173] avg loss 0.00792529, throughput 2.21069K wps
[Epoch 21 Batch 90/173] avg loss 0.00800707, throughput 2.22951K wps
[Epoch 21 Batch 120/173] avg loss 0.00827563, throughput 2.23133K wps
[Epoch 21 Batch 150/173] avg loss 0.00817762, throughput 2.22558K wps
Begin Testing...
[Epoch 21] train avg loss 0.00812435, dev acc 0.8133, dev avg loss 0.436119, throughput 2.23033K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.00834125, throughput 2.24975K wps
[Epoch 22 Batch 60/173] avg loss 0.00783502, throughput 2.22436K wps
[Epoch 22 Batch 90/173] avg loss 0.00746491, throughput 2.22464K wps
[Epoch 22 Batch 120/173] avg loss 0.00809901, throughput 2.23323K wps
[Epoch 22 Batch 150/173] avg loss 0.00786459, throughput 2.2156K wps
Begin Testing...
[Epoch 22] train avg loss 0.00797744, dev acc 0.8040, dev avg loss 0.433298, throughput 2.22713K wps
[Epoch 23 Batch 30/173] avg loss 0.0080466, throughput 2.28107K wps
[Epoch 23 Batch 60/173] avg loss 0.00794061, throughput 2.23987K wps
[Epoch 23 Batch 90/173] avg loss 0.00755672, throughput 2.22308K wps
[Epoch 23 Batch 120/173] avg loss 0.00790889, throughput 2.21193K wps
[Epoch 23 Batch 150/173] avg loss 0.00761944, throughput 2.23142K wps
Begin Testing...
[Epoch 23] train avg loss 0.00778649, dev acc 0.8102, dev avg loss 0.428635, throughput 2.23557K wps
[Epoch 24 Batch 30/173] avg loss 0.00778314, throughput 2.27784K wps
[Epoch 24 Batch 60/173] avg loss 0.00775523, throughput 2.23015K wps
[Epoch 24 Batch 90/173] avg loss 0.00744964, throughput 2.22455K wps
[Epoch 24 Batch 120/173] avg loss 0.007385, throughput 2.23027K wps
[Epoch 24 Batch 150/173] avg loss 0.00731891, throughput 2.22531K wps
Begin Testing...
[Epoch 24] train avg loss 0.00758965, dev acc 0.8144, dev avg loss 0.425061, throughput 2.23646K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.00732762, throughput 2.25541K wps
[Epoch 25 Batch 60/173] avg loss 0.00742839, throughput 2.21839K wps
[Epoch 25 Batch 90/173] avg loss 0.00750611, throughput 2.22028K wps
[Epoch 25 Batch 120/173] avg loss 0.00724296, throughput 2.21658K wps
[Epoch 25 Batch 150/173] avg loss 0.0075996, throughput 2.23255K wps
Begin Testing...
[Epoch 25] train avg loss 0.00742809, dev acc 0.8113, dev avg loss 0.422694, throughput 2.22767K wps
[Epoch 26 Batch 30/173] avg loss 0.00693804, throughput 2.26321K wps
[Epoch 26 Batch 60/173] avg loss 0.00722499, throughput 2.22523K wps
[Epoch 26 Batch 90/173] avg loss 0.00727788, throughput 2.23783K wps
[Epoch 26 Batch 120/173] avg loss 0.00759914, throughput 2.22204K wps
[Epoch 26 Batch 150/173] avg loss 0.00699288, throughput 2.21266K wps
Begin Testing...
[Epoch 26] train avg loss 0.00723327, dev acc 0.8196, dev avg loss 0.419419, throughput 2.23021K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00742048, throughput 2.26668K wps
[Epoch 27 Batch 60/173] avg loss 0.00723464, throughput 2.23178K wps
[Epoch 27 Batch 90/173] avg loss 0.00675458, throughput 2.22733K wps
[Epoch 27 Batch 120/173] avg loss 0.0070296, throughput 2.22937K wps
[Epoch 27 Batch 150/173] avg loss 0.00704549, throughput 2.23066K wps
Begin Testing...
[Epoch 27] train avg loss 0.00709763, dev acc 0.8113, dev avg loss 0.418637, throughput 2.23614K wps
[Epoch 28 Batch 30/173] avg loss 0.00699197, throughput 2.27966K wps
[Epoch 28 Batch 60/173] avg loss 0.00709829, throughput 2.22616K wps
[Epoch 28 Batch 90/173] avg loss 0.00679782, throughput 2.1912K wps
[Epoch 28 Batch 120/173] avg loss 0.00682414, throughput 2.20554K wps
[Epoch 28 Batch 150/173] avg loss 0.00721952, throughput 2.2126K wps
Begin Testing...
[Epoch 28] train avg loss 0.00703306, dev acc 0.8196, dev avg loss 0.415129, throughput 2.22467K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.00672715, throughput 2.2814K wps
[Epoch 29 Batch 60/173] avg loss 0.00677264, throughput 2.2203K wps
[Epoch 29 Batch 90/173] avg loss 0.00682524, throughput 2.23073K wps
[Epoch 29 Batch 120/173] avg loss 0.00647444, throughput 2.22689K wps
[Epoch 29 Batch 150/173] avg loss 0.00711783, throughput 2.2234K wps
Begin Testing...
[Epoch 29] train avg loss 0.00680792, dev acc 0.8206, dev avg loss 0.412632, throughput 2.23316K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.00674273, throughput 2.25625K wps
[Epoch 30 Batch 60/173] avg loss 0.00676876, throughput 2.22666K wps
[Epoch 30 Batch 90/173] avg loss 0.00643213, throughput 2.22892K wps
[Epoch 30 Batch 120/173] avg loss 0.00671469, throughput 2.22887K wps
[Epoch 30 Batch 150/173] avg loss 0.00629745, throughput 2.2321K wps
Begin Testing...
[Epoch 30] train avg loss 0.00664031, dev acc 0.8154, dev avg loss 0.412328, throughput 2.23418K wps
[Epoch 31 Batch 30/173] avg loss 0.00662957, throughput 2.2607K wps
[Epoch 31 Batch 60/173] avg loss 0.00662312, throughput 2.21216K wps
[Epoch 31 Batch 90/173] avg loss 0.00650335, throughput 2.21999K wps
[Epoch 31 Batch 120/173] avg loss 0.00623806, throughput 2.23542K wps
[Epoch 31 Batch 150/173] avg loss 0.00657933, throughput 2.22893K wps
Begin Testing...
[Epoch 31] train avg loss 0.00655544, dev acc 0.8248, dev avg loss 0.409096, throughput 2.23248K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.00641504, throughput 2.28016K wps
[Epoch 32 Batch 60/173] avg loss 0.00636238, throughput 2.23069K wps
[Epoch 32 Batch 90/173] avg loss 0.00644208, throughput 2.22809K wps
[Epoch 32 Batch 120/173] avg loss 0.00624041, throughput 2.23817K wps
[Epoch 32 Batch 150/173] avg loss 0.00620045, throughput 2.23822K wps
Begin Testing...
[Epoch 32] train avg loss 0.00636132, dev acc 0.8175, dev avg loss 0.407811, throughput 2.2415K wps
[Epoch 33 Batch 30/173] avg loss 0.00617785, throughput 2.25946K wps
[Epoch 33 Batch 60/173] avg loss 0.00611626, throughput 2.22509K wps
[Epoch 33 Batch 90/173] avg loss 0.00609455, throughput 2.2226K wps
[Epoch 33 Batch 120/173] avg loss 0.00586923, throughput 2.22468K wps
[Epoch 33 Batch 150/173] avg loss 0.00670812, throughput 2.22768K wps
Begin Testing...
[Epoch 33] train avg loss 0.00622197, dev acc 0.8259, dev avg loss 0.404844, throughput 2.23159K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.00591922, throughput 2.23276K wps
[Epoch 34 Batch 60/173] avg loss 0.00628306, throughput 2.22445K wps
[Epoch 34 Batch 90/173] avg loss 0.00602142, throughput 2.22588K wps
[Epoch 34 Batch 120/173] avg loss 0.00612713, throughput 2.2342K wps
[Epoch 34 Batch 150/173] avg loss 0.00633808, throughput 2.2361K wps
Begin Testing...
[Epoch 34] train avg loss 0.00610797, dev acc 0.8238, dev avg loss 0.403471, throughput 2.2302K wps
[Epoch 35 Batch 30/173] avg loss 0.0057639, throughput 2.27445K wps
[Epoch 35 Batch 60/173] avg loss 0.00592834, throughput 2.22942K wps
[Epoch 35 Batch 90/173] avg loss 0.00604971, throughput 2.2088K wps
[Epoch 35 Batch 120/173] avg loss 0.0063809, throughput 2.21877K wps
[Epoch 35 Batch 150/173] avg loss 0.00589055, throughput 2.22404K wps
Begin Testing...
[Epoch 35] train avg loss 0.00596345, dev acc 0.8227, dev avg loss 0.402275, throughput 2.23125K wps
[Epoch 36 Batch 30/173] avg loss 0.00577106, throughput 2.27336K wps
[Epoch 36 Batch 60/173] avg loss 0.0056683, throughput 2.2321K wps
[Epoch 36 Batch 90/173] avg loss 0.00557511, throughput 2.22087K wps
[Epoch 36 Batch 120/173] avg loss 0.00593859, throughput 2.22696K wps
[Epoch 36 Batch 150/173] avg loss 0.00570335, throughput 2.22622K wps
Begin Testing...
[Epoch 36] train avg loss 0.0057175, dev acc 0.8238, dev avg loss 0.4009, throughput 2.23501K wps
[Epoch 37 Batch 30/173] avg loss 0.00555442, throughput 2.2862K wps
[Epoch 37 Batch 60/173] avg loss 0.00566342, throughput 2.23317K wps
[Epoch 37 Batch 90/173] avg loss 0.00548874, throughput 2.23005K wps
[Epoch 37 Batch 120/173] avg loss 0.00567055, throughput 2.23244K wps
[Epoch 37 Batch 150/173] avg loss 0.00561968, throughput 2.22653K wps
Begin Testing...
[Epoch 37] train avg loss 0.00565418, dev acc 0.8186, dev avg loss 0.402428, throughput 2.23975K wps
[Epoch 38 Batch 30/173] avg loss 0.00547898, throughput 2.28119K wps
[Epoch 38 Batch 60/173] avg loss 0.00572987, throughput 2.23684K wps
[Epoch 38 Batch 90/173] avg loss 0.00568491, throughput 2.21699K wps
[Epoch 38 Batch 120/173] avg loss 0.00534465, throughput 2.21916K wps
[Epoch 38 Batch 150/173] avg loss 0.00532508, throughput 2.21322K wps
Begin Testing...
[Epoch 38] train avg loss 0.00555318, dev acc 0.8290, dev avg loss 0.397767, throughput 2.2316K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.00542738, throughput 2.2712K wps
[Epoch 39 Batch 60/173] avg loss 0.0052576, throughput 2.22574K wps
[Epoch 39 Batch 90/173] avg loss 0.00539559, throughput 2.23273K wps
[Epoch 39 Batch 120/173] avg loss 0.00514064, throughput 2.21908K wps
[Epoch 39 Batch 150/173] avg loss 0.00526308, throughput 2.22288K wps
Begin Testing...
[Epoch 39] train avg loss 0.00532407, dev acc 0.8259, dev avg loss 0.398939, throughput 2.23523K wps
[Epoch 40 Batch 30/173] avg loss 0.00531719, throughput 2.27159K wps
[Epoch 40 Batch 60/173] avg loss 0.00511026, throughput 2.23046K wps
[Epoch 40 Batch 90/173] avg loss 0.00491083, throughput 2.2378K wps
[Epoch 40 Batch 120/173] avg loss 0.00533557, throughput 2.2283K wps
[Epoch 40 Batch 150/173] avg loss 0.00508849, throughput 2.2192K wps
Begin Testing...
[Epoch 40] train avg loss 0.00519923, dev acc 0.8290, dev avg loss 0.396415, throughput 2.23576K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/173] avg loss 0.00544923, throughput 2.27027K wps
[Epoch 41 Batch 60/173] avg loss 0.0048492, throughput 2.22634K wps
[Epoch 41 Batch 90/173] avg loss 0.00496797, throughput 2.22895K wps
[Epoch 41 Batch 120/173] avg loss 0.00503276, throughput 2.22986K wps
[Epoch 41 Batch 150/173] avg loss 0.00526748, throughput 2.22825K wps
Begin Testing...
[Epoch 41] train avg loss 0.0050851, dev acc 0.8269, dev avg loss 0.397456, throughput 2.23419K wps
[Epoch 42 Batch 30/173] avg loss 0.00485427, throughput 2.2707K wps
[Epoch 42 Batch 60/173] avg loss 0.00501164, throughput 2.23388K wps
[Epoch 42 Batch 90/173] avg loss 0.00499074, throughput 2.23713K wps
[Epoch 42 Batch 120/173] avg loss 0.00470677, throughput 2.22942K wps
[Epoch 42 Batch 150/173] avg loss 0.00512137, throughput 2.23237K wps
Begin Testing...
[Epoch 42] train avg loss 0.00495229, dev acc 0.8259, dev avg loss 0.394417, throughput 2.23998K wps
[Epoch 43 Batch 30/173] avg loss 0.00473675, throughput 2.28406K wps
[Epoch 43 Batch 60/173] avg loss 0.00467267, throughput 2.23004K wps
[Epoch 43 Batch 90/173] avg loss 0.00477988, throughput 2.23816K wps
[Epoch 43 Batch 120/173] avg loss 0.00496754, throughput 2.20822K wps
[Epoch 43 Batch 150/173] avg loss 0.0049351, throughput 2.23121K wps
Begin Testing...
[Epoch 43] train avg loss 0.00482594, dev acc 0.8238, dev avg loss 0.395458, throughput 2.23797K wps
[Epoch 44 Batch 30/173] avg loss 0.00488071, throughput 2.2746K wps
[Epoch 44 Batch 60/173] avg loss 0.00432075, throughput 2.23831K wps
[Epoch 44 Batch 90/173] avg loss 0.00481304, throughput 2.22849K wps
[Epoch 44 Batch 120/173] avg loss 0.00485953, throughput 2.22771K wps
[Epoch 44 Batch 150/173] avg loss 0.00475241, throughput 2.23155K wps
Begin Testing...
[Epoch 44] train avg loss 0.0047405, dev acc 0.8373, dev avg loss 0.393186, throughput 2.23811K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/173] avg loss 0.00457424, throughput 2.27173K wps
[Epoch 45 Batch 60/173] avg loss 0.00442983, throughput 2.24098K wps
[Epoch 45 Batch 90/173] avg loss 0.00446382, throughput 2.22384K wps
[Epoch 45 Batch 120/173] avg loss 0.00471706, throughput 2.23571K wps
[Epoch 45 Batch 150/173] avg loss 0.00461039, throughput 2.23793K wps
Begin Testing...
[Epoch 45] train avg loss 0.00458269, dev acc 0.8321, dev avg loss 0.394554, throughput 2.24135K wps
[Epoch 46 Batch 30/173] avg loss 0.00416721, throughput 2.2667K wps
[Epoch 46 Batch 60/173] avg loss 0.00441906, throughput 2.21063K wps
[Epoch 46 Batch 90/173] avg loss 0.00438904, throughput 2.23241K wps
[Epoch 46 Batch 120/173] avg loss 0.00422854, throughput 2.23982K wps
[Epoch 46 Batch 150/173] avg loss 0.00482827, throughput 2.23391K wps
Begin Testing...
[Epoch 46] train avg loss 0.00445134, dev acc 0.8311, dev avg loss 0.391718, throughput 2.23685K wps
[Epoch 47 Batch 30/173] avg loss 0.00436652, throughput 2.26636K wps
[Epoch 47 Batch 60/173] avg loss 0.00420907, throughput 2.22629K wps
[Epoch 47 Batch 90/173] avg loss 0.00418737, throughput 2.2163K wps
[Epoch 47 Batch 120/173] avg loss 0.00442181, throughput 2.21219K wps
[Epoch 47 Batch 150/173] avg loss 0.00435268, throughput 2.21967K wps
Begin Testing...
[Epoch 47] train avg loss 0.00432692, dev acc 0.8290, dev avg loss 0.391461, throughput 2.22605K wps
[Epoch 48 Batch 30/173] avg loss 0.00404163, throughput 2.26906K wps
[Epoch 48 Batch 60/173] avg loss 0.00448927, throughput 2.20944K wps
[Epoch 48 Batch 90/173] avg loss 0.00421832, throughput 2.23687K wps
[Epoch 48 Batch 120/173] avg loss 0.00424461, throughput 2.22792K wps
[Epoch 48 Batch 150/173] avg loss 0.00426911, throughput 2.23039K wps
Begin Testing...
[Epoch 48] train avg loss 0.00425222, dev acc 0.8311, dev avg loss 0.393394, throughput 2.23504K wps
[Epoch 49 Batch 30/173] avg loss 0.0041054, throughput 2.28376K wps
[Epoch 49 Batch 60/173] avg loss 0.00418734, throughput 2.2175K wps
[Epoch 49 Batch 90/173] avg loss 0.00381871, throughput 2.22071K wps
[Epoch 49 Batch 120/173] avg loss 0.0040366, throughput 2.23069K wps
[Epoch 49 Batch 150/173] avg loss 0.00419365, throughput 2.23229K wps
Begin Testing...
[Epoch 49] train avg loss 0.00407298, dev acc 0.8332, dev avg loss 0.390628, throughput 2.23657K wps
[Epoch 50 Batch 30/173] avg loss 0.00406632, throughput 2.25639K wps
[Epoch 50 Batch 60/173] avg loss 0.00403902, throughput 2.22862K wps
[Epoch 50 Batch 90/173] avg loss 0.0040141, throughput 2.22246K wps
[Epoch 50 Batch 120/173] avg loss 0.00416693, throughput 2.22997K wps
[Epoch 50 Batch 150/173] avg loss 0.00395158, throughput 2.21967K wps
Begin Testing...
[Epoch 50] train avg loss 0.00404693, dev acc 0.8290, dev avg loss 0.392366, throughput 2.23163K wps
[Epoch 51 Batch 30/173] avg loss 0.00366946, throughput 2.27052K wps
[Epoch 51 Batch 60/173] avg loss 0.00402281, throughput 2.22664K wps
[Epoch 51 Batch 90/173] avg loss 0.00402711, throughput 2.2351K wps
[Epoch 51 Batch 120/173] avg loss 0.00399359, throughput 2.23548K wps
[Epoch 51 Batch 150/173] avg loss 0.00418982, throughput 2.2337K wps
Begin Testing...
[Epoch 51] train avg loss 0.0039739, dev acc 0.8363, dev avg loss 0.39001, throughput 2.23936K wps
[Epoch 52 Batch 30/173] avg loss 0.00377711, throughput 2.26833K wps
[Epoch 52 Batch 60/173] avg loss 0.00349832, throughput 2.2178K wps
[Epoch 52 Batch 90/173] avg loss 0.00393776, throughput 2.23665K wps
[Epoch 52 Batch 120/173] avg loss 0.00395012, throughput 2.21855K wps
[Epoch 52 Batch 150/173] avg loss 0.00377121, throughput 2.23978K wps
Begin Testing...
[Epoch 52] train avg loss 0.0038486, dev acc 0.8311, dev avg loss 0.392633, throughput 2.23591K wps
[Epoch 53 Batch 30/173] avg loss 0.00391015, throughput 2.27712K wps
[Epoch 53 Batch 60/173] avg loss 0.00346215, throughput 2.22131K wps
[Epoch 53 Batch 90/173] avg loss 0.0036065, throughput 2.21771K wps
[Epoch 53 Batch 120/173] avg loss 0.0039199, throughput 2.23221K wps
[Epoch 53 Batch 150/173] avg loss 0.00368508, throughput 2.22321K wps
Begin Testing...
[Epoch 53] train avg loss 0.00372027, dev acc 0.8269, dev avg loss 0.392734, throughput 2.2339K wps
[Epoch 54 Batch 30/173] avg loss 0.00382311, throughput 2.28623K wps
[Epoch 54 Batch 60/173] avg loss 0.00357403, throughput 2.23354K wps
[Epoch 54 Batch 90/173] avg loss 0.00372292, throughput 2.22897K wps
[Epoch 54 Batch 120/173] avg loss 0.00347754, throughput 2.23088K wps
[Epoch 54 Batch 150/173] avg loss 0.0036109, throughput 2.23858K wps
Begin Testing...
[Epoch 54] train avg loss 0.00364769, dev acc 0.8342, dev avg loss 0.390665, throughput 2.24222K wps
[Epoch 55 Batch 30/173] avg loss 0.00356565, throughput 2.24954K wps
[Epoch 55 Batch 60/173] avg loss 0.00356737, throughput 2.2337K wps
[Epoch 55 Batch 90/173] avg loss 0.00362744, throughput 2.23268K wps
[Epoch 55 Batch 120/173] avg loss 0.00330504, throughput 2.22688K wps
[Epoch 55 Batch 150/173] avg loss 0.00335239, throughput 2.21144K wps
Begin Testing...
[Epoch 55] train avg loss 0.00350434, dev acc 0.8321, dev avg loss 0.390991, throughput 2.22909K wps
[Epoch 56 Batch 30/173] avg loss 0.00344843, throughput 2.28301K wps
[Epoch 56 Batch 60/173] avg loss 0.00328206, throughput 2.22072K wps
[Epoch 56 Batch 90/173] avg loss 0.00342518, throughput 2.21285K wps
[Epoch 56 Batch 120/173] avg loss 0.00347458, throughput 2.21885K wps
[Epoch 56 Batch 150/173] avg loss 0.00330482, throughput 2.21912K wps
Begin Testing...
[Epoch 56] train avg loss 0.00341596, dev acc 0.8321, dev avg loss 0.391859, throughput 2.2294K wps
[Epoch 57 Batch 30/173] avg loss 0.0031375, throughput 2.26598K wps
[Epoch 57 Batch 60/173] avg loss 0.00312153, throughput 2.23236K wps
[Epoch 57 Batch 90/173] avg loss 0.00331935, throughput 2.22294K wps
[Epoch 57 Batch 120/173] avg loss 0.0035299, throughput 2.23809K wps
[Epoch 57 Batch 150/173] avg loss 0.00321489, throughput 2.23673K wps
Begin Testing...
[Epoch 57] train avg loss 0.00328652, dev acc 0.8363, dev avg loss 0.391095, throughput 2.23652K wps
[Epoch 58 Batch 30/173] avg loss 0.00296761, throughput 2.2753K wps
[Epoch 58 Batch 60/173] avg loss 0.00313219, throughput 2.21708K wps
[Epoch 58 Batch 90/173] avg loss 0.00337056, throughput 2.23227K wps
[Epoch 58 Batch 120/173] avg loss 0.00352739, throughput 2.21277K wps
[Epoch 58 Batch 150/173] avg loss 0.00327685, throughput 2.22374K wps
Begin Testing...
[Epoch 58] train avg loss 0.00323559, dev acc 0.8332, dev avg loss 0.391921, throughput 2.23089K wps
[Epoch 59 Batch 30/173] avg loss 0.00317845, throughput 2.26642K wps
[Epoch 59 Batch 60/173] avg loss 0.00314903, throughput 2.23332K wps
[Epoch 59 Batch 90/173] avg loss 0.00307547, throughput 2.23166K wps
[Epoch 59 Batch 120/173] avg loss 0.00310878, throughput 2.21766K wps
[Epoch 59 Batch 150/173] avg loss 0.00314595, throughput 2.21221K wps
Begin Testing...
[Epoch 59] train avg loss 0.00315735, dev acc 0.8290, dev avg loss 0.392819, throughput 2.23125K wps
[Epoch 60 Batch 30/173] avg loss 0.00302203, throughput 2.25748K wps
[Epoch 60 Batch 60/173] avg loss 0.00297999, throughput 2.22059K wps
[Epoch 60 Batch 90/173] avg loss 0.0029687, throughput 2.23167K wps
[Epoch 60 Batch 120/173] avg loss 0.0031782, throughput 2.23608K wps
[Epoch 60 Batch 150/173] avg loss 0.00304142, throughput 2.21405K wps
Begin Testing...
[Epoch 60] train avg loss 0.0030415, dev acc 0.8332, dev avg loss 0.39262, throughput 2.22913K wps
[Epoch 61 Batch 30/173] avg loss 0.00291424, throughput 2.26753K wps
[Epoch 61 Batch 60/173] avg loss 0.00292245, throughput 2.21298K wps
[Epoch 61 Batch 90/173] avg loss 0.00271401, throughput 2.21232K wps
[Epoch 61 Batch 120/173] avg loss 0.0029131, throughput 2.22858K wps
[Epoch 61 Batch 150/173] avg loss 0.00313522, throughput 2.23526K wps
Begin Testing...
[Epoch 61] train avg loss 0.00291976, dev acc 0.8321, dev avg loss 0.394728, throughput 2.23213K wps
[Epoch 62 Batch 30/173] avg loss 0.00285143, throughput 2.25697K wps
[Epoch 62 Batch 60/173] avg loss 0.00312958, throughput 2.22165K wps
[Epoch 62 Batch 90/173] avg loss 0.00311724, throughput 2.22535K wps
[Epoch 62 Batch 120/173] avg loss 0.00271503, throughput 2.22791K wps
[Epoch 62 Batch 150/173] avg loss 0.00289154, throughput 2.21092K wps
Begin Testing...
[Epoch 62] train avg loss 0.00293677, dev acc 0.8321, dev avg loss 0.393284, throughput 2.22902K wps
[Epoch 63 Batch 30/173] avg loss 0.00307154, throughput 2.27225K wps
[Epoch 63 Batch 60/173] avg loss 0.00274465, throughput 2.23742K wps
[Epoch 63 Batch 90/173] avg loss 0.00267975, throughput 2.23082K wps
[Epoch 63 Batch 120/173] avg loss 0.00279929, throughput 2.22618K wps
[Epoch 63 Batch 150/173] avg loss 0.00272027, throughput 2.23241K wps
Begin Testing...
[Epoch 63] train avg loss 0.00281171, dev acc 0.8342, dev avg loss 0.393687, throughput 2.23953K wps
[Epoch 64 Batch 30/173] avg loss 0.00260868, throughput 2.26789K wps
[Epoch 64 Batch 60/173] avg loss 0.00272075, throughput 2.23131K wps
[Epoch 64 Batch 90/173] avg loss 0.002807, throughput 2.22745K wps
[Epoch 64 Batch 120/173] avg loss 0.00269767, throughput 2.22922K wps
[Epoch 64 Batch 150/173] avg loss 0.00251565, throughput 2.23117K wps
Begin Testing...
[Epoch 64] train avg loss 0.00269471, dev acc 0.8332, dev avg loss 0.394524, throughput 2.2368K wps
[Epoch 65 Batch 30/173] avg loss 0.00263198, throughput 2.27865K wps
[Epoch 65 Batch 60/173] avg loss 0.00254591, throughput 2.22018K wps
[Epoch 65 Batch 90/173] avg loss 0.00270394, throughput 2.22235K wps
[Epoch 65 Batch 120/173] avg loss 0.00247476, throughput 2.23644K wps
[Epoch 65 Batch 150/173] avg loss 0.00258348, throughput 2.23799K wps
Begin Testing...
[Epoch 65] train avg loss 0.00264016, dev acc 0.8300, dev avg loss 0.395072, throughput 2.23587K wps
[Epoch 66 Batch 30/173] avg loss 0.00282991, throughput 2.28354K wps
[Epoch 66 Batch 60/173] avg loss 0.00246168, throughput 2.22393K wps
[Epoch 66 Batch 90/173] avg loss 0.00262516, throughput 2.22713K wps
[Epoch 66 Batch 120/173] avg loss 0.00255723, throughput 2.23114K wps
[Epoch 66 Batch 150/173] avg loss 0.0024116, throughput 2.23229K wps
Begin Testing...
[Epoch 66] train avg loss 0.00257346, dev acc 0.8321, dev avg loss 0.395945, throughput 2.23921K wps
[Epoch 67 Batch 30/173] avg loss 0.00252069, throughput 2.28415K wps
[Epoch 67 Batch 60/173] avg loss 0.00264087, throughput 2.21834K wps
[Epoch 67 Batch 90/173] avg loss 0.00240081, throughput 2.2082K wps
[Epoch 67 Batch 120/173] avg loss 0.00248171, throughput 2.22313K wps
[Epoch 67 Batch 150/173] avg loss 0.00242002, throughput 2.2144K wps
Begin Testing...
[Epoch 67] train avg loss 0.00250113, dev acc 0.8321, dev avg loss 0.396742, throughput 2.22968K wps
[Epoch 68 Batch 30/173] avg loss 0.00227956, throughput 2.26617K wps
[Epoch 68 Batch 60/173] avg loss 0.00229737, throughput 2.22978K wps
[Epoch 68 Batch 90/173] avg loss 0.00259801, throughput 2.23309K wps
[Epoch 68 Batch 120/173] avg loss 0.00246104, throughput 2.22617K wps
[Epoch 68 Batch 150/173] avg loss 0.0024952, throughput 2.23227K wps
Begin Testing...
[Epoch 68] train avg loss 0.00243812, dev acc 0.8342, dev avg loss 0.395871, throughput 2.23708K wps
[Epoch 69 Batch 30/173] avg loss 0.00222739, throughput 2.2766K wps
[Epoch 69 Batch 60/173] avg loss 0.00248935, throughput 2.22933K wps
[Epoch 69 Batch 90/173] avg loss 0.00237031, throughput 2.22867K wps
[Epoch 69 Batch 120/173] avg loss 0.00242327, throughput 2.22925K wps
[Epoch 69 Batch 150/173] avg loss 0.00228667, throughput 2.22776K wps
Begin Testing...
[Epoch 69] train avg loss 0.00238386, dev acc 0.8311, dev avg loss 0.39809, throughput 2.2377K wps
[Epoch 70 Batch 30/173] avg loss 0.00231495, throughput 2.27673K wps
[Epoch 70 Batch 60/173] avg loss 0.00235811, throughput 2.23462K wps
[Epoch 70 Batch 90/173] avg loss 0.00231834, throughput 2.22955K wps
[Epoch 70 Batch 120/173] avg loss 0.00221353, throughput 2.23589K wps
[Epoch 70 Batch 150/173] avg loss 0.00236545, throughput 2.23342K wps
Begin Testing...
[Epoch 70] train avg loss 0.00232637, dev acc 0.8352, dev avg loss 0.400565, throughput 2.2401K wps
[Epoch 71 Batch 30/173] avg loss 0.00230929, throughput 2.28128K wps
[Epoch 71 Batch 60/173] avg loss 0.00225077, throughput 2.22354K wps
[Epoch 71 Batch 90/173] avg loss 0.00211218, throughput 2.21375K wps
[Epoch 71 Batch 120/173] avg loss 0.00220867, throughput 2.21306K wps
[Epoch 71 Batch 150/173] avg loss 0.00239668, throughput 2.23846K wps
Begin Testing...
[Epoch 71] train avg loss 0.00223704, dev acc 0.8342, dev avg loss 0.401759, throughput 2.23489K wps
[Epoch 72 Batch 30/173] avg loss 0.00216422, throughput 2.27544K wps
[Epoch 72 Batch 60/173] avg loss 0.00208172, throughput 2.21077K wps
[Epoch 72 Batch 90/173] avg loss 0.00220203, throughput 2.2201K wps
[Epoch 72 Batch 120/173] avg loss 0.00222919, throughput 2.23702K wps
[Epoch 72 Batch 150/173] avg loss 0.00225525, throughput 2.2372K wps
Begin Testing...
[Epoch 72] train avg loss 0.00219555, dev acc 0.8352, dev avg loss 0.400786, throughput 2.23581K wps
[Epoch 73 Batch 30/173] avg loss 0.00205757, throughput 2.27808K wps
[Epoch 73 Batch 60/173] avg loss 0.00228593, throughput 2.22059K wps
[Epoch 73 Batch 90/173] avg loss 0.00223684, throughput 2.22443K wps
[Epoch 73 Batch 120/173] avg loss 0.00215713, throughput 2.21237K wps
[Epoch 73 Batch 150/173] avg loss 0.002225, throughput 2.23038K wps
Begin Testing...
[Epoch 73] train avg loss 0.00216824, dev acc 0.8342, dev avg loss 0.399803, throughput 2.23151K wps
[Epoch 74 Batch 30/173] avg loss 0.00203613, throughput 2.2536K wps
[Epoch 74 Batch 60/173] avg loss 0.00204618, throughput 2.20906K wps
[Epoch 74 Batch 90/173] avg loss 0.0020537, throughput 2.21518K wps
[Epoch 74 Batch 120/173] avg loss 0.00203513, throughput 2.22354K wps
[Epoch 74 Batch 150/173] avg loss 0.00203046, throughput 2.23352K wps
Begin Testing...
[Epoch 74] train avg loss 0.00205645, dev acc 0.8332, dev avg loss 0.401403, throughput 2.2287K wps
[Epoch 75 Batch 30/173] avg loss 0.00196693, throughput 2.27145K wps
[Epoch 75 Batch 60/173] avg loss 0.002321, throughput 2.21526K wps
[Epoch 75 Batch 90/173] avg loss 0.00192506, throughput 2.2243K wps
[Epoch 75 Batch 120/173] avg loss 0.00203178, throughput 2.23533K wps
[Epoch 75 Batch 150/173] avg loss 0.00215404, throughput 2.22943K wps
Begin Testing...
[Epoch 75] train avg loss 0.00207066, dev acc 0.8332, dev avg loss 0.401539, throughput 2.23427K wps
[Epoch 76 Batch 30/173] avg loss 0.00194419, throughput 2.27682K wps
[Epoch 76 Batch 60/173] avg loss 0.00196272, throughput 2.23365K wps
[Epoch 76 Batch 90/173] avg loss 0.00191597, throughput 2.23327K wps
[Epoch 76 Batch 120/173] avg loss 0.00197554, throughput 2.21708K wps
[Epoch 76 Batch 150/173] avg loss 0.00196141, throughput 2.22778K wps
Begin Testing...
[Epoch 76] train avg loss 0.00197307, dev acc 0.8352, dev avg loss 0.40549, throughput 2.23611K wps
[Epoch 77 Batch 30/173] avg loss 0.00186876, throughput 2.28289K wps
[Epoch 77 Batch 60/173] avg loss 0.00204274, throughput 2.23182K wps
[Epoch 77 Batch 90/173] avg loss 0.00191432, throughput 2.22548K wps
[Epoch 77 Batch 120/173] avg loss 0.00197417, throughput 2.23839K wps
[Epoch 77 Batch 150/173] avg loss 0.00208513, throughput 2.23904K wps
Begin Testing...
[Epoch 77] train avg loss 0.00197918, dev acc 0.8363, dev avg loss 0.405777, throughput 2.24137K wps
[Epoch 78 Batch 30/173] avg loss 0.00174598, throughput 2.26587K wps
[Epoch 78 Batch 60/173] avg loss 0.00198209, throughput 2.2238K wps
[Epoch 78 Batch 90/173] avg loss 0.00186337, throughput 2.20481K wps
[Epoch 78 Batch 120/173] avg loss 0.00191121, throughput 2.23346K wps
[Epoch 78 Batch 150/173] avg loss 0.00183216, throughput 2.23519K wps
Begin Testing...
[Epoch 78] train avg loss 0.0018862, dev acc 0.8332, dev avg loss 0.406517, throughput 2.231K wps
[Epoch 79 Batch 30/173] avg loss 0.00178633, throughput 2.29297K wps
[Epoch 79 Batch 60/173] avg loss 0.00182317, throughput 2.22061K wps
[Epoch 79 Batch 90/173] avg loss 0.00192273, throughput 2.23317K wps
[Epoch 79 Batch 120/173] avg loss 0.00163082, throughput 2.23911K wps
[Epoch 79 Batch 150/173] avg loss 0.0017457, throughput 2.23277K wps
Begin Testing...
[Epoch 79] train avg loss 0.00180397, dev acc 0.8352, dev avg loss 0.404189, throughput 2.24004K wps
[Epoch 80 Batch 30/173] avg loss 0.00187875, throughput 2.26998K wps
[Epoch 80 Batch 60/173] avg loss 0.00176457, throughput 2.23488K wps
[Epoch 80 Batch 90/173] avg loss 0.00194896, throughput 2.2236K wps
[Epoch 80 Batch 120/173] avg loss 0.00171405, throughput 2.22589K wps
[Epoch 80 Batch 150/173] avg loss 0.00172642, throughput 2.23847K wps
Begin Testing...
[Epoch 80] train avg loss 0.00181368, dev acc 0.8321, dev avg loss 0.406842, throughput 2.23828K wps
[Epoch 81 Batch 30/173] avg loss 0.00170484, throughput 2.28495K wps
[Epoch 81 Batch 60/173] avg loss 0.00176211, throughput 2.23349K wps
[Epoch 81 Batch 90/173] avg loss 0.00172879, throughput 2.2082K wps
[Epoch 81 Batch 120/173] avg loss 0.00187838, throughput 2.22615K wps
[Epoch 81 Batch 150/173] avg loss 0.00165487, throughput 2.22003K wps
Begin Testing...
[Epoch 81] train avg loss 0.00175491, dev acc 0.8300, dev avg loss 0.408343, throughput 2.23216K wps
[Epoch 82 Batch 30/173] avg loss 0.00171619, throughput 2.27492K wps
[Epoch 82 Batch 60/173] avg loss 0.0016955, throughput 2.22685K wps
[Epoch 82 Batch 90/173] avg loss 0.00170193, throughput 2.23591K wps
[Epoch 82 Batch 120/173] avg loss 0.00167697, throughput 2.23196K wps
[Epoch 82 Batch 150/173] avg loss 0.00179489, throughput 2.23809K wps
Begin Testing...
[Epoch 82] train avg loss 0.00173868, dev acc 0.8321, dev avg loss 0.407056, throughput 2.24131K wps
[Epoch 83 Batch 30/173] avg loss 0.00163051, throughput 2.2566K wps
[Epoch 83 Batch 60/173] avg loss 0.00156783, throughput 2.23509K wps
[Epoch 83 Batch 90/173] avg loss 0.00175373, throughput 2.2235K wps
[Epoch 83 Batch 120/173] avg loss 0.00165645, throughput 2.23093K wps
[Epoch 83 Batch 150/173] avg loss 0.00174089, throughput 2.23657K wps
Begin Testing...
[Epoch 83] train avg loss 0.00166904, dev acc 0.8342, dev avg loss 0.412043, throughput 2.2356K wps
[Epoch 84 Batch 30/173] avg loss 0.00152868, throughput 2.26117K wps
[Epoch 84 Batch 60/173] avg loss 0.00166391, throughput 2.23189K wps
[Epoch 84 Batch 90/173] avg loss 0.00171428, throughput 2.224K wps
[Epoch 84 Batch 120/173] avg loss 0.00176125, throughput 2.1981K wps
[Epoch 84 Batch 150/173] avg loss 0.00164092, throughput 2.22525K wps
Begin Testing...
[Epoch 84] train avg loss 0.0016719, dev acc 0.8290, dev avg loss 0.407586, throughput 2.22878K wps
[Epoch 85 Batch 30/173] avg loss 0.00161351, throughput 2.28576K wps
[Epoch 85 Batch 60/173] avg loss 0.00167036, throughput 2.22151K wps
[Epoch 85 Batch 90/173] avg loss 0.00172617, throughput 2.23726K wps
[Epoch 85 Batch 120/173] avg loss 0.00168485, throughput 2.2259K wps
[Epoch 85 Batch 150/173] avg loss 0.00160266, throughput 2.23546K wps
Begin Testing...
[Epoch 85] train avg loss 0.00166578, dev acc 0.8321, dev avg loss 0.410601, throughput 2.2394K wps
[Epoch 86 Batch 30/173] avg loss 0.00161497, throughput 2.28354K wps
[Epoch 86 Batch 60/173] avg loss 0.00150461, throughput 2.23333K wps
[Epoch 86 Batch 90/173] avg loss 0.00155923, throughput 2.23961K wps
[Epoch 86 Batch 120/173] avg loss 0.0015362, throughput 2.22334K wps
[Epoch 86 Batch 150/173] avg loss 0.00166001, throughput 2.22747K wps
Begin Testing...
[Epoch 86] train avg loss 0.00159698, dev acc 0.8321, dev avg loss 0.416024, throughput 2.23943K wps
[Epoch 87 Batch 30/173] avg loss 0.00144107, throughput 2.26659K wps
[Epoch 87 Batch 60/173] avg loss 0.00141579, throughput 2.23378K wps
[Epoch 87 Batch 90/173] avg loss 0.00148251, throughput 2.22482K wps
[Epoch 87 Batch 120/173] avg loss 0.00159491, throughput 2.22677K wps
[Epoch 87 Batch 150/173] avg loss 0.00158612, throughput 2.23418K wps
Begin Testing...
[Epoch 87] train avg loss 0.00150757, dev acc 0.8352, dev avg loss 0.411712, throughput 2.23527K wps
[Epoch 88 Batch 30/173] avg loss 0.00145019, throughput 2.27027K wps
[Epoch 88 Batch 60/173] avg loss 0.00150912, throughput 2.23174K wps
[Epoch 88 Batch 90/173] avg loss 0.0015032, throughput 2.23428K wps
[Epoch 88 Batch 120/173] avg loss 0.00143544, throughput 2.22747K wps
[Epoch 88 Batch 150/173] avg loss 0.00146512, throughput 2.23065K wps
Begin Testing...
[Epoch 88] train avg loss 0.00149805, dev acc 0.8352, dev avg loss 0.411509, throughput 2.23799K wps
[Epoch 89 Batch 30/173] avg loss 0.00141814, throughput 2.27638K wps
[Epoch 89 Batch 60/173] avg loss 0.00139824, throughput 2.21759K wps
[Epoch 89 Batch 90/173] avg loss 0.00153637, throughput 2.22983K wps
[Epoch 89 Batch 120/173] avg loss 0.0015725, throughput 2.22019K wps
[Epoch 89 Batch 150/173] avg loss 0.00153173, throughput 2.23815K wps
Begin Testing...
[Epoch 89] train avg loss 0.00148681, dev acc 0.8352, dev avg loss 0.412522, throughput 2.23615K wps
[Epoch 90 Batch 30/173] avg loss 0.00142605, throughput 2.27779K wps
[Epoch 90 Batch 60/173] avg loss 0.00148903, throughput 2.22171K wps
[Epoch 90 Batch 90/173] avg loss 0.00143897, throughput 2.22949K wps
[Epoch 90 Batch 120/173] avg loss 0.0014142, throughput 2.2095K wps
[Epoch 90 Batch 150/173] avg loss 0.00136862, throughput 2.23292K wps
Begin Testing...
[Epoch 90] train avg loss 0.00143387, dev acc 0.8300, dev avg loss 0.415144, throughput 2.23349K wps
[Epoch 91 Batch 30/173] avg loss 0.00137026, throughput 2.26525K wps
[Epoch 91 Batch 60/173] avg loss 0.00142201, throughput 2.22624K wps
[Epoch 91 Batch 90/173] avg loss 0.00145411, throughput 2.22715K wps
[Epoch 91 Batch 120/173] avg loss 0.00147299, throughput 2.23825K wps
[Epoch 91 Batch 150/173] avg loss 0.00134674, throughput 2.22011K wps
Begin Testing...
[Epoch 91] train avg loss 0.0014008, dev acc 0.8321, dev avg loss 0.418513, throughput 2.2353K wps
[Epoch 92 Batch 30/173] avg loss 0.00127567, throughput 2.27549K wps
[Epoch 92 Batch 60/173] avg loss 0.00144078, throughput 2.2343K wps
[Epoch 92 Batch 90/173] avg loss 0.00146052, throughput 2.22658K wps
[Epoch 92 Batch 120/173] avg loss 0.00133454, throughput 2.22715K wps
[Epoch 92 Batch 150/173] avg loss 0.00124186, throughput 2.23042K wps
Begin Testing...
[Epoch 92] train avg loss 0.00135313, dev acc 0.8311, dev avg loss 0.418307, throughput 2.23857K wps
[Epoch 93 Batch 30/173] avg loss 0.00129361, throughput 2.27358K wps
[Epoch 93 Batch 60/173] avg loss 0.00133762, throughput 2.21522K wps
[Epoch 93 Batch 90/173] avg loss 0.00132967, throughput 2.23442K wps
[Epoch 93 Batch 120/173] avg loss 0.00132997, throughput 2.23991K wps
[Epoch 93 Batch 150/173] avg loss 0.00134581, throughput 2.23665K wps
Begin Testing...
[Epoch 93] train avg loss 0.00131048, dev acc 0.8352, dev avg loss 0.418884, throughput 2.23823K wps
[Epoch 94 Batch 30/173] avg loss 0.00128881, throughput 2.25701K wps
[Epoch 94 Batch 60/173] avg loss 0.00137762, throughput 2.23021K wps
[Epoch 94 Batch 90/173] avg loss 0.00126556, throughput 2.22583K wps
[Epoch 94 Batch 120/173] avg loss 0.00141184, throughput 2.23174K wps
[Epoch 94 Batch 150/173] avg loss 0.00134223, throughput 2.23376K wps
Begin Testing...
[Epoch 94] train avg loss 0.00132749, dev acc 0.8352, dev avg loss 0.418214, throughput 2.23597K wps
[Epoch 95 Batch 30/173] avg loss 0.00130407, throughput 2.26915K wps
[Epoch 95 Batch 60/173] avg loss 0.00124723, throughput 2.22702K wps
[Epoch 95 Batch 90/173] avg loss 0.00130315, throughput 2.23273K wps
[Epoch 95 Batch 120/173] avg loss 0.00123706, throughput 2.23105K wps
[Epoch 95 Batch 150/173] avg loss 0.00145875, throughput 2.2313K wps
Begin Testing...
[Epoch 95] train avg loss 0.00129096, dev acc 0.8342, dev avg loss 0.418329, throughput 2.23781K wps
[Epoch 96 Batch 30/173] avg loss 0.00122955, throughput 2.2865K wps
[Epoch 96 Batch 60/173] avg loss 0.0012281, throughput 2.23803K wps
[Epoch 96 Batch 90/173] avg loss 0.00136582, throughput 2.22307K wps
[Epoch 96 Batch 120/173] avg loss 0.00124296, throughput 2.22445K wps
[Epoch 96 Batch 150/173] avg loss 0.00117518, throughput 2.23042K wps
Begin Testing...
[Epoch 96] train avg loss 0.00124398, dev acc 0.8342, dev avg loss 0.420801, throughput 2.2382K wps
[Epoch 97 Batch 30/173] avg loss 0.00121805, throughput 2.27981K wps
[Epoch 97 Batch 60/173] avg loss 0.00130889, throughput 2.22945K wps
[Epoch 97 Batch 90/173] avg loss 0.00121988, throughput 2.23593K wps
[Epoch 97 Batch 120/173] avg loss 0.0012041, throughput 2.23228K wps
[Epoch 97 Batch 150/173] avg loss 0.00114292, throughput 2.23314K wps
Begin Testing...
[Epoch 97] train avg loss 0.00123548, dev acc 0.8342, dev avg loss 0.420439, throughput 2.24145K wps
[Epoch 98 Batch 30/173] avg loss 0.00116982, throughput 2.2848K wps
[Epoch 98 Batch 60/173] avg loss 0.0010478, throughput 2.23025K wps
[Epoch 98 Batch 90/173] avg loss 0.00118936, throughput 2.23682K wps
[Epoch 98 Batch 120/173] avg loss 0.00119412, throughput 2.23811K wps
[Epoch 98 Batch 150/173] avg loss 0.00112237, throughput 2.22209K wps
Begin Testing...
[Epoch 98] train avg loss 0.00115451, dev acc 0.8373, dev avg loss 0.419911, throughput 2.23991K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/173] avg loss 0.00112214, throughput 2.27475K wps
[Epoch 99 Batch 60/173] avg loss 0.00121731, throughput 2.2156K wps
[Epoch 99 Batch 90/173] avg loss 0.00122546, throughput 2.23579K wps
[Epoch 99 Batch 120/173] avg loss 0.00107279, throughput 2.23411K wps
[Epoch 99 Batch 150/173] avg loss 0.00117432, throughput 2.23906K wps
Begin Testing...
[Epoch 99] train avg loss 0.00115017, dev acc 0.8332, dev avg loss 0.422935, throughput 2.23877K wps
[Epoch 100 Batch 30/173] avg loss 0.00118783, throughput 2.26927K wps
[Epoch 100 Batch 60/173] avg loss 0.0010401, throughput 2.21994K wps
[Epoch 100 Batch 90/173] avg loss 0.00110134, throughput 2.23628K wps
[Epoch 100 Batch 120/173] avg loss 0.0011467, throughput 2.23207K wps
[Epoch 100 Batch 150/173] avg loss 0.00111396, throughput 2.22338K wps
Begin Testing...
[Epoch 100] train avg loss 0.00112391, dev acc 0.8373, dev avg loss 0.424459, throughput 2.23477K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/173] avg loss 0.00106377, throughput 2.27666K wps
[Epoch 101 Batch 60/173] avg loss 0.00113246, throughput 2.23225K wps
[Epoch 101 Batch 90/173] avg loss 0.00114036, throughput 2.22908K wps
[Epoch 101 Batch 120/173] avg loss 0.00116695, throughput 2.23704K wps
[Epoch 101 Batch 150/173] avg loss 0.0011004, throughput 2.23429K wps
Begin Testing...
[Epoch 101] train avg loss 0.00110024, dev acc 0.8332, dev avg loss 0.424671, throughput 2.2413K wps
[Epoch 102 Batch 30/173] avg loss 0.00104093, throughput 2.27569K wps
[Epoch 102 Batch 60/173] avg loss 0.00108261, throughput 2.23775K wps
[Epoch 102 Batch 90/173] avg loss 0.0010641, throughput 2.22916K wps
[Epoch 102 Batch 120/173] avg loss 0.00114049, throughput 2.23167K wps
[Epoch 102 Batch 150/173] avg loss 0.00114726, throughput 2.22906K wps
Begin Testing...
[Epoch 102] train avg loss 0.0010927, dev acc 0.8311, dev avg loss 0.424364, throughput 2.23907K wps
[Epoch 103 Batch 30/173] avg loss 0.00125568, throughput 2.2675K wps
[Epoch 103 Batch 60/173] avg loss 0.00104315, throughput 2.23209K wps
[Epoch 103 Batch 90/173] avg loss 0.00108424, throughput 2.23816K wps
[Epoch 103 Batch 120/173] avg loss 0.00113075, throughput 2.22811K wps
[Epoch 103 Batch 150/173] avg loss 0.00105507, throughput 2.2283K wps
Begin Testing...
[Epoch 103] train avg loss 0.00109555, dev acc 0.8373, dev avg loss 0.42745, throughput 2.23785K wps
Observed Improvement.
Begin Testing...
[Epoch 104 Batch 30/173] avg loss 0.00101825, throughput 2.28846K wps
[Epoch 104 Batch 60/173] avg loss 0.00100573, throughput 2.22185K wps
[Epoch 104 Batch 90/173] avg loss 0.00117838, throughput 2.23649K wps
[Epoch 104 Batch 120/173] avg loss 0.001126, throughput 2.21969K wps
[Epoch 104 Batch 150/173] avg loss 0.00106806, throughput 2.22779K wps
Begin Testing...
[Epoch 104] train avg loss 0.00108025, dev acc 0.8342, dev avg loss 0.426913, throughput 2.2387K wps
[Epoch 105 Batch 30/173] avg loss 0.000952453, throughput 2.27376K wps
[Epoch 105 Batch 60/173] avg loss 0.000984404, throughput 2.22876K wps
[Epoch 105 Batch 90/173] avg loss 0.000936111, throughput 2.2321K wps
[Epoch 105 Batch 120/173] avg loss 0.00102183, throughput 2.21057K wps
[Epoch 105 Batch 150/173] avg loss 0.00106141, throughput 2.23052K wps
Begin Testing...
[Epoch 105] train avg loss 0.00100598, dev acc 0.8342, dev avg loss 0.428793, throughput 2.23364K wps
[Epoch 106 Batch 30/173] avg loss 0.000926409, throughput 2.27962K wps
[Epoch 106 Batch 60/173] avg loss 0.00107155, throughput 2.23602K wps
[Epoch 106 Batch 90/173] avg loss 0.00101642, throughput 2.22772K wps
[Epoch 106 Batch 120/173] avg loss 0.00097523, throughput 2.23488K wps
[Epoch 106 Batch 150/173] avg loss 0.000994863, throughput 2.22043K wps
Begin Testing...
[Epoch 106] train avg loss 0.00102603, dev acc 0.8352, dev avg loss 0.429166, throughput 2.23673K wps
[Epoch 107 Batch 30/173] avg loss 0.000857798, throughput 2.27023K wps
[Epoch 107 Batch 60/173] avg loss 0.000981485, throughput 2.23913K wps
[Epoch 107 Batch 90/173] avg loss 0.000995738, throughput 2.24067K wps
[Epoch 107 Batch 120/173] avg loss 0.00099069, throughput 2.22219K wps
[Epoch 107 Batch 150/173] avg loss 0.00105281, throughput 2.22711K wps
Begin Testing...
[Epoch 107] train avg loss 0.000984347, dev acc 0.8352, dev avg loss 0.429823, throughput 2.23717K wps
[Epoch 108 Batch 30/173] avg loss 0.000977101, throughput 2.27535K wps
[Epoch 108 Batch 60/173] avg loss 0.00092173, throughput 2.23045K wps
[Epoch 108 Batch 90/173] avg loss 0.000992013, throughput 2.23168K wps
[Epoch 108 Batch 120/173] avg loss 0.000951649, throughput 2.24026K wps
[Epoch 108 Batch 150/173] avg loss 0.00103677, throughput 2.21751K wps
Begin Testing...
[Epoch 108] train avg loss 0.000971826, dev acc 0.8342, dev avg loss 0.432138, throughput 2.23789K wps
[Epoch 109 Batch 30/173] avg loss 0.00105163, throughput 2.27259K wps
[Epoch 109 Batch 60/173] avg loss 0.000915427, throughput 2.21757K wps
[Epoch 109 Batch 90/173] avg loss 0.000989083, throughput 2.2352K wps
[Epoch 109 Batch 120/173] avg loss 0.000921204, throughput 2.23615K wps
[Epoch 109 Batch 150/173] avg loss 0.000944285, throughput 2.22143K wps
Begin Testing...
[Epoch 109] train avg loss 0.000970449, dev acc 0.8311, dev avg loss 0.432562, throughput 2.23706K wps
[Epoch 110 Batch 30/173] avg loss 0.000985166, throughput 2.27213K wps
[Epoch 110 Batch 60/173] avg loss 0.000960563, throughput 2.20944K wps
[Epoch 110 Batch 90/173] avg loss 0.00102051, throughput 2.24057K wps
[Epoch 110 Batch 120/173] avg loss 0.000980004, throughput 2.23853K wps
[Epoch 110 Batch 150/173] avg loss 0.00096021, throughput 2.22846K wps
Begin Testing...
[Epoch 110] train avg loss 0.000978379, dev acc 0.8352, dev avg loss 0.434114, throughput 2.23479K wps
[Epoch 111 Batch 30/173] avg loss 0.000973219, throughput 2.2633K wps
[Epoch 111 Batch 60/173] avg loss 0.000933791, throughput 2.22418K wps
[Epoch 111 Batch 90/173] avg loss 0.00090131, throughput 2.22897K wps
[Epoch 111 Batch 120/173] avg loss 0.000943958, throughput 2.23347K wps
[Epoch 111 Batch 150/173] avg loss 0.00100982, throughput 2.23473K wps
Begin Testing...
[Epoch 111] train avg loss 0.000967593, dev acc 0.8321, dev avg loss 0.436073, throughput 2.23485K wps
[Epoch 112 Batch 30/173] avg loss 0.000908503, throughput 2.27918K wps
[Epoch 112 Batch 60/173] avg loss 0.000932867, throughput 2.23038K wps
[Epoch 112 Batch 90/173] avg loss 0.000985292, throughput 2.22732K wps
[Epoch 112 Batch 120/173] avg loss 0.000888005, throughput 2.23242K wps
[Epoch 112 Batch 150/173] avg loss 0.00102825, throughput 2.23165K wps
Begin Testing...
[Epoch 112] train avg loss 0.000945329, dev acc 0.8311, dev avg loss 0.436188, throughput 2.23812K wps
[Epoch 113 Batch 30/173] avg loss 0.000919711, throughput 2.27198K wps
[Epoch 113 Batch 60/173] avg loss 0.000899907, throughput 2.22571K wps
[Epoch 113 Batch 90/173] avg loss 0.000894291, throughput 2.23521K wps
[Epoch 113 Batch 120/173] avg loss 0.000949292, throughput 2.23459K wps
[Epoch 113 Batch 150/173] avg loss 0.000890451, throughput 2.22304K wps
Begin Testing...
[Epoch 113] train avg loss 0.000915121, dev acc 0.8352, dev avg loss 0.434779, throughput 2.23569K wps
[Epoch 114 Batch 30/173] avg loss 0.000797829, throughput 2.24511K wps
[Epoch 114 Batch 60/173] avg loss 0.000876971, throughput 2.22901K wps
[Epoch 114 Batch 90/173] avg loss 0.000823965, throughput 2.22733K wps
[Epoch 114 Batch 120/173] avg loss 0.000962386, throughput 2.21658K wps
[Epoch 114 Batch 150/173] avg loss 0.000848837, throughput 2.21534K wps
Begin Testing...
[Epoch 114] train avg loss 0.000877345, dev acc 0.8332, dev avg loss 0.437282, throughput 2.22638K wps
[Epoch 115 Batch 30/173] avg loss 0.000817217, throughput 2.27732K wps
[Epoch 115 Batch 60/173] avg loss 0.00082895, throughput 2.21873K wps
[Epoch 115 Batch 90/173] avg loss 0.00082961, throughput 2.22811K wps
[Epoch 115 Batch 120/173] avg loss 0.00088352, throughput 2.23184K wps
[Epoch 115 Batch 150/173] avg loss 0.000844006, throughput 2.23615K wps
Begin Testing...
[Epoch 115] train avg loss 0.000855847, dev acc 0.8332, dev avg loss 0.437754, throughput 2.23665K wps
[Epoch 116 Batch 30/173] avg loss 0.000807884, throughput 2.27438K wps
[Epoch 116 Batch 60/173] avg loss 0.000864472, throughput 2.22508K wps
[Epoch 116 Batch 90/173] avg loss 0.000892428, throughput 2.22674K wps
[Epoch 116 Batch 120/173] avg loss 0.000910997, throughput 2.23521K wps
[Epoch 116 Batch 150/173] avg loss 0.000818355, throughput 2.22348K wps
Begin Testing...
[Epoch 116] train avg loss 0.000853286, dev acc 0.8342, dev avg loss 0.437758, throughput 2.23564K wps
[Epoch 117 Batch 30/173] avg loss 0.000853549, throughput 2.26729K wps
[Epoch 117 Batch 60/173] avg loss 0.000835748, throughput 2.20581K wps
[Epoch 117 Batch 90/173] avg loss 0.000826613, throughput 2.23362K wps
[Epoch 117 Batch 120/173] avg loss 0.000836886, throughput 2.24463K wps
[Epoch 117 Batch 150/173] avg loss 0.000881222, throughput 2.23191K wps
Begin Testing...
[Epoch 117] train avg loss 0.000847183, dev acc 0.8332, dev avg loss 0.439368, throughput 2.23603K wps
[Epoch 118 Batch 30/173] avg loss 0.000724991, throughput 2.26894K wps
[Epoch 118 Batch 60/173] avg loss 0.000903344, throughput 2.23774K wps
[Epoch 118 Batch 90/173] avg loss 0.000842673, throughput 2.21759K wps
[Epoch 118 Batch 120/173] avg loss 0.000845396, throughput 2.22218K wps
[Epoch 118 Batch 150/173] avg loss 0.000785884, throughput 2.22912K wps
Begin Testing...
[Epoch 118] train avg loss 0.000833717, dev acc 0.8332, dev avg loss 0.440667, throughput 2.23579K wps
[Epoch 119 Batch 30/173] avg loss 0.00081902, throughput 2.27737K wps
[Epoch 119 Batch 60/173] avg loss 0.000763772, throughput 2.22551K wps
[Epoch 119 Batch 90/173] avg loss 0.000838044, throughput 2.22726K wps
[Epoch 119 Batch 120/173] avg loss 0.000811876, throughput 2.23306K wps
[Epoch 119 Batch 150/173] avg loss 0.000782866, throughput 2.23321K wps
Begin Testing...
[Epoch 119] train avg loss 0.000799287, dev acc 0.8373, dev avg loss 0.442181, throughput 2.23882K wps
Observed Improvement.
Begin Testing...
[Epoch 120 Batch 30/173] avg loss 0.000764084, throughput 2.27636K wps
[Epoch 120 Batch 60/173] avg loss 0.000747637, throughput 2.22701K wps
[Epoch 120 Batch 90/173] avg loss 0.000746404, throughput 2.22821K wps
[Epoch 120 Batch 120/173] avg loss 0.00084585, throughput 2.23681K wps
[Epoch 120 Batch 150/173] avg loss 0.000836168, throughput 2.23522K wps
Begin Testing...
[Epoch 120] train avg loss 0.000782444, dev acc 0.8363, dev avg loss 0.442246, throughput 2.2399K wps
[Epoch 121 Batch 30/173] avg loss 0.000819296, throughput 2.28262K wps
[Epoch 121 Batch 60/173] avg loss 0.00079979, throughput 2.22319K wps
[Epoch 121 Batch 90/173] avg loss 0.000739924, throughput 2.21739K wps
[Epoch 121 Batch 120/173] avg loss 0.000740666, throughput 2.21942K wps
[Epoch 121 Batch 150/173] avg loss 0.000803568, throughput 2.23256K wps
Begin Testing...
[Epoch 121] train avg loss 0.000780872, dev acc 0.8342, dev avg loss 0.443232, throughput 2.23502K wps
[Epoch 122 Batch 30/173] avg loss 0.00071972, throughput 2.28733K wps
[Epoch 122 Batch 60/173] avg loss 0.000712145, throughput 2.20423K wps
[Epoch 122 Batch 90/173] avg loss 0.000774572, throughput 2.228K wps
[Epoch 122 Batch 120/173] avg loss 0.000781601, throughput 2.22589K wps
[Epoch 122 Batch 150/173] avg loss 0.000790044, throughput 2.22479K wps
Begin Testing...
[Epoch 122] train avg loss 0.000762055, dev acc 0.8394, dev avg loss 0.445001, throughput 2.2316K wps
Observed Improvement.
Begin Testing...
[Epoch 123 Batch 30/173] avg loss 0.000672364, throughput 2.28734K wps
[Epoch 123 Batch 60/173] avg loss 0.000711918, throughput 2.23305K wps
[Epoch 123 Batch 90/173] avg loss 0.00079809, throughput 2.23599K wps
[Epoch 123 Batch 120/173] avg loss 0.000865829, throughput 2.21086K wps
[Epoch 123 Batch 150/173] avg loss 0.000696282, throughput 2.22582K wps
Begin Testing...
[Epoch 123] train avg loss 0.000760759, dev acc 0.8363, dev avg loss 0.444784, throughput 2.2367K wps
[Epoch 124 Batch 30/173] avg loss 0.000663327, throughput 2.27336K wps
[Epoch 124 Batch 60/173] avg loss 0.000713681, throughput 2.22766K wps
[Epoch 124 Batch 90/173] avg loss 0.000800872, throughput 2.23341K wps
[Epoch 124 Batch 120/173] avg loss 0.000702814, throughput 2.23451K wps
[Epoch 124 Batch 150/173] avg loss 0.000764959, throughput 2.23905K wps
Begin Testing...
[Epoch 124] train avg loss 0.000738105, dev acc 0.8384, dev avg loss 0.445966, throughput 2.2399K wps
[Epoch 125 Batch 30/173] avg loss 0.000753506, throughput 2.27471K wps
[Epoch 125 Batch 60/173] avg loss 0.000734168, throughput 2.23744K wps
[Epoch 125 Batch 90/173] avg loss 0.000717771, throughput 2.21808K wps
[Epoch 125 Batch 120/173] avg loss 0.000769809, throughput 2.23476K wps
[Epoch 125 Batch 150/173] avg loss 0.000749418, throughput 2.23381K wps
Begin Testing...
[Epoch 125] train avg loss 0.000744494, dev acc 0.8342, dev avg loss 0.444724, throughput 2.2396K wps
[Epoch 126 Batch 30/173] avg loss 0.000695899, throughput 2.26926K wps
[Epoch 126 Batch 60/173] avg loss 0.000661252, throughput 2.22777K wps
[Epoch 126 Batch 90/173] avg loss 0.000799716, throughput 2.21355K wps
[Epoch 126 Batch 120/173] avg loss 0.000764866, throughput 2.23473K wps
[Epoch 126 Batch 150/173] avg loss 0.000648182, throughput 2.23602K wps
Begin Testing...
[Epoch 126] train avg loss 0.000726213, dev acc 0.8352, dev avg loss 0.446381, throughput 2.23693K wps
[Epoch 127 Batch 30/173] avg loss 0.00068367, throughput 2.26584K wps
[Epoch 127 Batch 60/173] avg loss 0.000662946, throughput 2.23382K wps
[Epoch 127 Batch 90/173] avg loss 0.000742917, throughput 2.22125K wps
[Epoch 127 Batch 120/173] avg loss 0.000586088, throughput 2.22536K wps
[Epoch 127 Batch 150/173] avg loss 0.000744142, throughput 2.22595K wps
Begin Testing...
[Epoch 127] train avg loss 0.000702591, dev acc 0.8332, dev avg loss 0.451235, throughput 2.23322K wps
[Epoch 128 Batch 30/173] avg loss 0.000719149, throughput 2.27817K wps
[Epoch 128 Batch 60/173] avg loss 0.000619636, throughput 2.23089K wps
[Epoch 128 Batch 90/173] avg loss 0.000714885, throughput 2.2408K wps
[Epoch 128 Batch 120/173] avg loss 0.000739917, throughput 2.23168K wps
[Epoch 128 Batch 150/173] avg loss 0.000691877, throughput 2.22698K wps
Begin Testing...
[Epoch 128] train avg loss 0.000695283, dev acc 0.8300, dev avg loss 0.44867, throughput 2.23813K wps
[Epoch 129 Batch 30/173] avg loss 0.000754843, throughput 2.26825K wps
[Epoch 129 Batch 60/173] avg loss 0.00071389, throughput 2.22458K wps
[Epoch 129 Batch 90/173] avg loss 0.000729742, throughput 2.22518K wps
[Epoch 129 Batch 120/173] avg loss 0.000669248, throughput 2.23913K wps
[Epoch 129 Batch 150/173] avg loss 0.000658626, throughput 2.23528K wps
Begin Testing...
[Epoch 129] train avg loss 0.000711895, dev acc 0.8363, dev avg loss 0.4528, throughput 2.23596K wps
[Epoch 130 Batch 30/173] avg loss 0.000740094, throughput 2.28625K wps
[Epoch 130 Batch 60/173] avg loss 0.00065911, throughput 2.22518K wps
[Epoch 130 Batch 90/173] avg loss 0.000658166, throughput 2.23219K wps
[Epoch 130 Batch 120/173] avg loss 0.000658368, throughput 2.20975K wps
[Epoch 130 Batch 150/173] avg loss 0.000744003, throughput 2.22737K wps
Begin Testing...
[Epoch 130] train avg loss 0.000696632, dev acc 0.8290, dev avg loss 0.449035, throughput 2.23406K wps
[Epoch 131 Batch 30/173] avg loss 0.000714046, throughput 2.28351K wps
[Epoch 131 Batch 60/173] avg loss 0.000696354, throughput 2.22992K wps
[Epoch 131 Batch 90/173] avg loss 0.000653336, throughput 2.2227K wps
[Epoch 131 Batch 120/173] avg loss 0.000683576, throughput 2.23311K wps
[Epoch 131 Batch 150/173] avg loss 0.000794014, throughput 2.21018K wps
Begin Testing...
[Epoch 131] train avg loss 0.000706095, dev acc 0.8332, dev avg loss 0.451261, throughput 2.23653K wps
[Epoch 132 Batch 30/173] avg loss 0.000651534, throughput 2.28247K wps
[Epoch 132 Batch 60/173] avg loss 0.000570063, throughput 2.2314K wps
[Epoch 132 Batch 90/173] avg loss 0.000722961, throughput 2.22341K wps
[Epoch 132 Batch 120/173] avg loss 0.000597129, throughput 2.21861K wps
[Epoch 132 Batch 150/173] avg loss 0.000628685, throughput 2.1972K wps
Begin Testing...
[Epoch 132] train avg loss 0.000637846, dev acc 0.8352, dev avg loss 0.453818, throughput 2.22765K wps
[Epoch 133 Batch 30/173] avg loss 0.000682768, throughput 2.24456K wps
[Epoch 133 Batch 60/173] avg loss 0.000695982, throughput 2.22258K wps
[Epoch 133 Batch 90/173] avg loss 0.000628734, throughput 2.22146K wps
[Epoch 133 Batch 120/173] avg loss 0.000631879, throughput 2.22311K wps
[Epoch 133 Batch 150/173] avg loss 0.000643773, throughput 2.20527K wps
Begin Testing...
[Epoch 133] train avg loss 0.00065626, dev acc 0.8342, dev avg loss 0.45347, throughput 2.2239K wps
[Epoch 134 Batch 30/173] avg loss 0.000614763, throughput 2.27482K wps
[Epoch 134 Batch 60/173] avg loss 0.000773709, throughput 2.21348K wps
[Epoch 134 Batch 90/173] avg loss 0.000682461, throughput 2.20584K wps
[Epoch 134 Batch 120/173] avg loss 0.000648464, throughput 2.21581K wps
[Epoch 134 Batch 150/173] avg loss 0.000585303, throughput 2.22469K wps
Begin Testing...
[Epoch 134] train avg loss 0.000654429, dev acc 0.8321, dev avg loss 0.454389, throughput 2.22243K wps
[Epoch 135 Batch 30/173] avg loss 0.00064935, throughput 2.2733K wps
[Epoch 135 Batch 60/173] avg loss 0.000602024, throughput 2.22256K wps
[Epoch 135 Batch 90/173] avg loss 0.000670077, throughput 2.22237K wps
[Epoch 135 Batch 120/173] avg loss 0.000598226, throughput 2.23545K wps
[Epoch 135 Batch 150/173] avg loss 0.000575025, throughput 2.23664K wps
Begin Testing...
[Epoch 135] train avg loss 0.000629186, dev acc 0.8352, dev avg loss 0.457417, throughput 2.2372K wps
[Epoch 136 Batch 30/173] avg loss 0.000587531, throughput 2.27135K wps
[Epoch 136 Batch 60/173] avg loss 0.000593451, throughput 2.23274K wps
[Epoch 136 Batch 90/173] avg loss 0.0006726, throughput 2.23507K wps
[Epoch 136 Batch 120/173] avg loss 0.000588929, throughput 2.22317K wps
[Epoch 136 Batch 150/173] avg loss 0.000571297, throughput 2.23254K wps
Begin Testing...
[Epoch 136] train avg loss 0.000595724, dev acc 0.8311, dev avg loss 0.457047, throughput 2.23698K wps
[Epoch 137 Batch 30/173] avg loss 0.000580858, throughput 2.26779K wps
[Epoch 137 Batch 60/173] avg loss 0.000626475, throughput 2.22718K wps
[Epoch 137 Batch 90/173] avg loss 0.000581519, throughput 2.23512K wps
[Epoch 137 Batch 120/173] avg loss 0.000608565, throughput 2.22958K wps
[Epoch 137 Batch 150/173] avg loss 0.000613449, throughput 2.23316K wps
Begin Testing...
[Epoch 137] train avg loss 0.000604647, dev acc 0.8332, dev avg loss 0.458463, throughput 2.23788K wps
[Epoch 138 Batch 30/173] avg loss 0.000565469, throughput 2.25418K wps
[Epoch 138 Batch 60/173] avg loss 0.000558137, throughput 2.22324K wps
[Epoch 138 Batch 90/173] avg loss 0.00056233, throughput 2.23812K wps
[Epoch 138 Batch 120/173] avg loss 0.000674088, throughput 2.2374K wps
[Epoch 138 Batch 150/173] avg loss 0.000582443, throughput 2.2298K wps
Begin Testing...
[Epoch 138] train avg loss 0.000602005, dev acc 0.8352, dev avg loss 0.459895, throughput 2.23548K wps
[Epoch 139 Batch 30/173] avg loss 0.000593599, throughput 2.27013K wps
[Epoch 139 Batch 60/173] avg loss 0.000553704, throughput 2.22063K wps
[Epoch 139 Batch 90/173] avg loss 0.00057751, throughput 2.23558K wps
[Epoch 139 Batch 120/173] avg loss 0.000568055, throughput 2.23787K wps
[Epoch 139 Batch 150/173] avg loss 0.000622001, throughput 2.21577K wps
Begin Testing...
[Epoch 139] train avg loss 0.000589357, dev acc 0.8332, dev avg loss 0.459764, throughput 2.23498K wps
[Epoch 140 Batch 30/173] avg loss 0.000656911, throughput 2.25994K wps
[Epoch 140 Batch 60/173] avg loss 0.000620403, throughput 2.22337K wps
[Epoch 140 Batch 90/173] avg loss 0.000555767, throughput 2.23033K wps
[Epoch 140 Batch 120/173] avg loss 0.000586062, throughput 2.22512K wps
[Epoch 140 Batch 150/173] avg loss 0.000589955, throughput 2.22125K wps
Begin Testing...
[Epoch 140] train avg loss 0.00060935, dev acc 0.8352, dev avg loss 0.459466, throughput 2.23237K wps
[Epoch 141 Batch 30/173] avg loss 0.000531803, throughput 2.2871K wps
[Epoch 141 Batch 60/173] avg loss 0.000660589, throughput 2.23631K wps
[Epoch 141 Batch 90/173] avg loss 0.000663929, throughput 2.2286K wps
[Epoch 141 Batch 120/173] avg loss 0.000570625, throughput 2.22562K wps
[Epoch 141 Batch 150/173] avg loss 0.000577429, throughput 2.23726K wps
Begin Testing...
[Epoch 141] train avg loss 0.000587916, dev acc 0.8332, dev avg loss 0.460376, throughput 2.24182K wps
[Epoch 142 Batch 30/173] avg loss 0.000562492, throughput 2.26466K wps
[Epoch 142 Batch 60/173] avg loss 0.000613859, throughput 2.21587K wps
[Epoch 142 Batch 90/173] avg loss 0.000563772, throughput 2.23317K wps
[Epoch 142 Batch 120/173] avg loss 0.000526092, throughput 2.22294K wps
[Epoch 142 Batch 150/173] avg loss 0.00056728, throughput 2.22805K wps
Begin Testing...
[Epoch 142] train avg loss 0.000575456, dev acc 0.8352, dev avg loss 0.466351, throughput 2.2334K wps
[Epoch 143 Batch 30/173] avg loss 0.000528449, throughput 2.28103K wps
[Epoch 143 Batch 60/173] avg loss 0.000551542, throughput 2.22456K wps
[Epoch 143 Batch 90/173] avg loss 0.00062842, throughput 2.23206K wps
[Epoch 143 Batch 120/173] avg loss 0.000535296, throughput 2.23639K wps
[Epoch 143 Batch 150/173] avg loss 0.000530742, throughput 2.22914K wps
Begin Testing...
[Epoch 143] train avg loss 0.000552582, dev acc 0.8332, dev avg loss 0.463617, throughput 2.23891K wps
[Epoch 144 Batch 30/173] avg loss 0.000528206, throughput 2.2708K wps
[Epoch 144 Batch 60/173] avg loss 0.000546421, throughput 2.22787K wps
[Epoch 144 Batch 90/173] avg loss 0.000606461, throughput 2.22515K wps
[Epoch 144 Batch 120/173] avg loss 0.00054707, throughput 2.22402K wps
[Epoch 144 Batch 150/173] avg loss 0.000617434, throughput 2.22352K wps
Begin Testing...
[Epoch 144] train avg loss 0.000562752, dev acc 0.8332, dev avg loss 0.463465, throughput 2.23444K wps
[Epoch 145 Batch 30/173] avg loss 0.000577148, throughput 2.28379K wps
[Epoch 145 Batch 60/173] avg loss 0.000513357, throughput 2.2309K wps
[Epoch 145 Batch 90/173] avg loss 0.000473043, throughput 2.2392K wps
[Epoch 145 Batch 120/173] avg loss 0.00049904, throughput 2.21253K wps
[Epoch 145 Batch 150/173] avg loss 0.000615452, throughput 2.23677K wps
Begin Testing...
[Epoch 145] train avg loss 0.000534196, dev acc 0.8352, dev avg loss 0.464523, throughput 2.23886K wps
[Epoch 146 Batch 30/173] avg loss 0.000522166, throughput 2.26873K wps
[Epoch 146 Batch 60/173] avg loss 0.000606847, throughput 2.21645K wps
[Epoch 146 Batch 90/173] avg loss 0.000560627, throughput 2.22787K wps
[Epoch 146 Batch 120/173] avg loss 0.000599392, throughput 2.23073K wps
[Epoch 146 Batch 150/173] avg loss 0.000486422, throughput 2.23135K wps
Begin Testing...
[Epoch 146] train avg loss 0.000559671, dev acc 0.8373, dev avg loss 0.464599, throughput 2.2352K wps
[Epoch 147 Batch 30/173] avg loss 0.00051825, throughput 2.27677K wps
[Epoch 147 Batch 60/173] avg loss 0.00051116, throughput 2.22288K wps
[Epoch 147 Batch 90/173] avg loss 0.000583385, throughput 2.22721K wps
[Epoch 147 Batch 120/173] avg loss 0.000579373, throughput 2.2294K wps
[Epoch 147 Batch 150/173] avg loss 0.000579353, throughput 2.2365K wps
Begin Testing...
[Epoch 147] train avg loss 0.000544858, dev acc 0.8342, dev avg loss 0.466301, throughput 2.23871K wps
[Epoch 148 Batch 30/173] avg loss 0.000487854, throughput 2.28654K wps
[Epoch 148 Batch 60/173] avg loss 0.000537376, throughput 2.22274K wps
[Epoch 148 Batch 90/173] avg loss 0.000594836, throughput 2.20943K wps
[Epoch 148 Batch 120/173] avg loss 0.000562081, throughput 2.22508K wps
[Epoch 148 Batch 150/173] avg loss 0.000506062, throughput 2.23585K wps
Begin Testing...
[Epoch 148] train avg loss 0.000538166, dev acc 0.8352, dev avg loss 0.465789, throughput 2.23438K wps
[Epoch 149 Batch 30/173] avg loss 0.000506452, throughput 2.28466K wps
[Epoch 149 Batch 60/173] avg loss 0.0005874, throughput 2.21212K wps
[Epoch 149 Batch 90/173] avg loss 0.000540322, throughput 2.22593K wps
[Epoch 149 Batch 120/173] avg loss 0.000611738, throughput 2.21759K wps
[Epoch 149 Batch 150/173] avg loss 0.000508304, throughput 2.22724K wps
Begin Testing...
[Epoch 149] train avg loss 0.000549931, dev acc 0.8311, dev avg loss 0.466785, throughput 2.23383K wps
[Epoch 150 Batch 30/173] avg loss 0.000524153, throughput 2.25991K wps
[Epoch 150 Batch 60/173] avg loss 0.000489701, throughput 2.2337K wps
[Epoch 150 Batch 90/173] avg loss 0.000538414, throughput 2.23225K wps
[Epoch 150 Batch 120/173] avg loss 0.000511263, throughput 2.22796K wps
[Epoch 150 Batch 150/173] avg loss 0.000488887, throughput 2.23185K wps
Begin Testing...
[Epoch 150] train avg loss 0.000506517, dev acc 0.8321, dev avg loss 0.468669, throughput 2.23716K wps
[Epoch 151 Batch 30/173] avg loss 0.000504988, throughput 2.26728K wps
[Epoch 151 Batch 60/173] avg loss 0.000465229, throughput 2.21802K wps
[Epoch 151 Batch 90/173] avg loss 0.000490282, throughput 2.23611K wps
[Epoch 151 Batch 120/173] avg loss 0.000497023, throughput 2.23467K wps
[Epoch 151 Batch 150/173] avg loss 0.000508672, throughput 2.21242K wps
Begin Testing...
[Epoch 151] train avg loss 0.000505295, dev acc 0.8321, dev avg loss 0.47271, throughput 2.23189K wps
[Epoch 152 Batch 30/173] avg loss 0.000520005, throughput 2.274K wps
[Epoch 152 Batch 60/173] avg loss 0.000430818, throughput 2.22964K wps
[Epoch 152 Batch 90/173] avg loss 0.000484596, throughput 2.23703K wps
[Epoch 152 Batch 120/173] avg loss 0.000487626, throughput 2.23319K wps
[Epoch 152 Batch 150/173] avg loss 0.000515, throughput 2.23862K wps
Begin Testing...
[Epoch 152] train avg loss 0.000494284, dev acc 0.8311, dev avg loss 0.472014, throughput 2.24139K wps
[Epoch 153 Batch 30/173] avg loss 0.000472541, throughput 2.27401K wps
[Epoch 153 Batch 60/173] avg loss 0.00047903, throughput 2.21538K wps
[Epoch 153 Batch 90/173] avg loss 0.000495343, throughput 2.22515K wps
[Epoch 153 Batch 120/173] avg loss 0.000571267, throughput 2.211K wps
[Epoch 153 Batch 150/173] avg loss 0.000517548, throughput 2.22052K wps
Begin Testing...
[Epoch 153] train avg loss 0.000502977, dev acc 0.8332, dev avg loss 0.471338, throughput 2.22953K wps
[Epoch 154 Batch 30/173] avg loss 0.000472408, throughput 2.26161K wps
[Epoch 154 Batch 60/173] avg loss 0.000557172, throughput 2.23576K wps
[Epoch 154 Batch 90/173] avg loss 0.000475815, throughput 2.22645K wps
[Epoch 154 Batch 120/173] avg loss 0.000478323, throughput 2.22582K wps
[Epoch 154 Batch 150/173] avg loss 0.000603981, throughput 2.21058K wps
Begin Testing...
[Epoch 154] train avg loss 0.000515375, dev acc 0.8332, dev avg loss 0.470966, throughput 2.23171K wps
[Epoch 155 Batch 30/173] avg loss 0.00049587, throughput 2.2794K wps
[Epoch 155 Batch 60/173] avg loss 0.000404431, throughput 2.20017K wps
[Epoch 155 Batch 90/173] avg loss 0.000530085, throughput 2.22766K wps
[Epoch 155 Batch 120/173] avg loss 0.000456302, throughput 2.23512K wps
[Epoch 155 Batch 150/173] avg loss 0.000523041, throughput 2.23778K wps
Begin Testing...
[Epoch 155] train avg loss 0.000486274, dev acc 0.8321, dev avg loss 0.47329, throughput 2.23599K wps
[Epoch 156 Batch 30/173] avg loss 0.000508519, throughput 2.28464K wps
[Epoch 156 Batch 60/173] avg loss 0.000464935, throughput 2.21493K wps
[Epoch 156 Batch 90/173] avg loss 0.000506744, throughput 2.22228K wps
[Epoch 156 Batch 120/173] avg loss 0.000475134, throughput 2.23752K wps
[Epoch 156 Batch 150/173] avg loss 0.0004175, throughput 2.23868K wps
Begin Testing...
[Epoch 156] train avg loss 0.000481522, dev acc 0.8332, dev avg loss 0.474677, throughput 2.23648K wps
[Epoch 157 Batch 30/173] avg loss 0.000522403, throughput 2.26884K wps
[Epoch 157 Batch 60/173] avg loss 0.000463396, throughput 2.21245K wps
[Epoch 157 Batch 90/173] avg loss 0.000434818, throughput 2.23346K wps
[Epoch 157 Batch 120/173] avg loss 0.000488408, throughput 2.23328K wps
[Epoch 157 Batch 150/173] avg loss 0.000498275, throughput 2.22861K wps
Begin Testing...
[Epoch 157] train avg loss 0.000472157, dev acc 0.8342, dev avg loss 0.4753, throughput 2.23458K wps
[Epoch 158 Batch 30/173] avg loss 0.000441493, throughput 2.27496K wps
[Epoch 158 Batch 60/173] avg loss 0.000511455, throughput 2.23619K wps
[Epoch 158 Batch 90/173] avg loss 0.000439705, throughput 2.22989K wps
[Epoch 158 Batch 120/173] avg loss 0.00049408, throughput 2.22491K wps
[Epoch 158 Batch 150/173] avg loss 0.000419005, throughput 2.23168K wps
Begin Testing...
[Epoch 158] train avg loss 0.000460988, dev acc 0.8342, dev avg loss 0.47805, throughput 2.23963K wps
[Epoch 159 Batch 30/173] avg loss 0.000478338, throughput 2.25862K wps
[Epoch 159 Batch 60/173] avg loss 0.000457376, throughput 2.21325K wps
[Epoch 159 Batch 90/173] avg loss 0.00047341, throughput 2.23454K wps
[Epoch 159 Batch 120/173] avg loss 0.000512279, throughput 2.22612K wps
[Epoch 159 Batch 150/173] avg loss 0.000471685, throughput 2.22235K wps
Begin Testing...
[Epoch 159] train avg loss 0.000477265, dev acc 0.8311, dev avg loss 0.481048, throughput 2.22981K wps
[Epoch 160 Batch 30/173] avg loss 0.000421938, throughput 2.27696K wps
[Epoch 160 Batch 60/173] avg loss 0.000456837, throughput 2.22023K wps
[Epoch 160 Batch 90/173] avg loss 0.000476824, throughput 2.22714K wps
[Epoch 160 Batch 120/173] avg loss 0.000436707, throughput 2.23761K wps
[Epoch 160 Batch 150/173] avg loss 0.000474469, throughput 2.21113K wps
Begin Testing...
[Epoch 160] train avg loss 0.000451697, dev acc 0.8363, dev avg loss 0.478477, throughput 2.23458K wps
[Epoch 161 Batch 30/173] avg loss 0.00045094, throughput 2.25757K wps
[Epoch 161 Batch 60/173] avg loss 0.000440988, throughput 2.21608K wps
[Epoch 161 Batch 90/173] avg loss 0.00042539, throughput 2.22551K wps
[Epoch 161 Batch 120/173] avg loss 0.000488253, throughput 2.22794K wps
[Epoch 161 Batch 150/173] avg loss 0.000539993, throughput 2.22759K wps
Begin Testing...
[Epoch 161] train avg loss 0.000468235, dev acc 0.8332, dev avg loss 0.479273, throughput 2.22691K wps
[Epoch 162 Batch 30/173] avg loss 0.000441592, throughput 2.28231K wps
[Epoch 162 Batch 60/173] avg loss 0.000433456, throughput 2.22788K wps
[Epoch 162 Batch 90/173] avg loss 0.000404314, throughput 2.23681K wps
[Epoch 162 Batch 120/173] avg loss 0.000480155, throughput 2.23888K wps
[Epoch 162 Batch 150/173] avg loss 0.000453996, throughput 2.22128K wps
Begin Testing...
[Epoch 162] train avg loss 0.000455693, dev acc 0.8332, dev avg loss 0.480197, throughput 2.23985K wps
[Epoch 163 Batch 30/173] avg loss 0.000459204, throughput 2.26673K wps
[Epoch 163 Batch 60/173] avg loss 0.000441058, throughput 2.23263K wps
[Epoch 163 Batch 90/173] avg loss 0.000466259, throughput 2.23177K wps
[Epoch 163 Batch 120/173] avg loss 0.00042217, throughput 2.22459K wps
[Epoch 163 Batch 150/173] avg loss 0.000440257, throughput 2.23483K wps
Begin Testing...
[Epoch 163] train avg loss 0.000443734, dev acc 0.8363, dev avg loss 0.478984, throughput 2.23761K wps
[Epoch 164 Batch 30/173] avg loss 0.000442215, throughput 2.27504K wps
[Epoch 164 Batch 60/173] avg loss 0.000433674, throughput 2.22501K wps
[Epoch 164 Batch 90/173] avg loss 0.000480091, throughput 2.22163K wps
[Epoch 164 Batch 120/173] avg loss 0.000484159, throughput 2.23619K wps
[Epoch 164 Batch 150/173] avg loss 0.000502322, throughput 2.22889K wps
Begin Testing...
[Epoch 164] train avg loss 0.000469415, dev acc 0.8342, dev avg loss 0.478946, throughput 2.23681K wps
[Epoch 165 Batch 30/173] avg loss 0.000441921, throughput 2.272K wps
[Epoch 165 Batch 60/173] avg loss 0.000421331, throughput 2.21759K wps
[Epoch 165 Batch 90/173] avg loss 0.000415214, throughput 2.2319K wps
[Epoch 165 Batch 120/173] avg loss 0.000420139, throughput 2.20073K wps
[Epoch 165 Batch 150/173] avg loss 0.000434841, throughput 2.23531K wps
Begin Testing...
[Epoch 165] train avg loss 0.000439145, dev acc 0.8342, dev avg loss 0.481913, throughput 2.2324K wps
[Epoch 166 Batch 30/173] avg loss 0.00044041, throughput 2.28248K wps
[Epoch 166 Batch 60/173] avg loss 0.000446981, throughput 2.23393K wps
[Epoch 166 Batch 90/173] avg loss 0.000486465, throughput 2.23425K wps
[Epoch 166 Batch 120/173] avg loss 0.000424646, throughput 2.21315K wps
[Epoch 166 Batch 150/173] avg loss 0.00044796, throughput 2.2273K wps
Begin Testing...
[Epoch 166] train avg loss 0.000438212, dev acc 0.8342, dev avg loss 0.481773, throughput 2.23724K wps
[Epoch 167 Batch 30/173] avg loss 0.000469857, throughput 2.27784K wps
[Epoch 167 Batch 60/173] avg loss 0.000449928, throughput 2.23163K wps
[Epoch 167 Batch 90/173] avg loss 0.000416126, throughput 2.22791K wps
[Epoch 167 Batch 120/173] avg loss 0.000375405, throughput 2.22067K wps
[Epoch 167 Batch 150/173] avg loss 0.000375578, throughput 2.23291K wps
Begin Testing...
[Epoch 167] train avg loss 0.000422015, dev acc 0.8332, dev avg loss 0.48191, throughput 2.23772K wps
[Epoch 168 Batch 30/173] avg loss 0.000434978, throughput 2.25595K wps
[Epoch 168 Batch 60/173] avg loss 0.000433414, throughput 2.22313K wps
[Epoch 168 Batch 90/173] avg loss 0.000432001, throughput 2.23357K wps
[Epoch 168 Batch 120/173] avg loss 0.000404956, throughput 2.23615K wps
[Epoch 168 Batch 150/173] avg loss 0.000406685, throughput 2.22673K wps
Begin Testing...
[Epoch 168] train avg loss 0.000433802, dev acc 0.8332, dev avg loss 0.482675, throughput 2.23419K wps
[Epoch 169 Batch 30/173] avg loss 0.000458025, throughput 2.27716K wps
[Epoch 169 Batch 60/173] avg loss 0.000404429, throughput 2.22819K wps
[Epoch 169 Batch 90/173] avg loss 0.000411801, throughput 2.23219K wps
[Epoch 169 Batch 120/173] avg loss 0.000405558, throughput 2.23872K wps
[Epoch 169 Batch 150/173] avg loss 0.000416323, throughput 2.23777K wps
Begin Testing...
[Epoch 169] train avg loss 0.000414733, dev acc 0.8332, dev avg loss 0.484483, throughput 2.2418K wps
[Epoch 170 Batch 30/173] avg loss 0.000428173, throughput 2.25864K wps
[Epoch 170 Batch 60/173] avg loss 0.000434121, throughput 2.22921K wps
[Epoch 170 Batch 90/173] avg loss 0.000367665, throughput 2.2354K wps
[Epoch 170 Batch 120/173] avg loss 0.000439516, throughput 2.22681K wps
[Epoch 170 Batch 150/173] avg loss 0.000437793, throughput 2.22299K wps
Begin Testing...
[Epoch 170] train avg loss 0.000420311, dev acc 0.8342, dev avg loss 0.484785, throughput 2.23369K wps
[Epoch 171 Batch 30/173] avg loss 0.000428647, throughput 2.2754K wps
[Epoch 171 Batch 60/173] avg loss 0.000423464, throughput 2.2201K wps
[Epoch 171 Batch 90/173] avg loss 0.000379882, throughput 2.20237K wps
[Epoch 171 Batch 120/173] avg loss 0.000418539, throughput 2.22965K wps
[Epoch 171 Batch 150/173] avg loss 0.000435561, throughput 2.22154K wps
Begin Testing...
[Epoch 171] train avg loss 0.000414916, dev acc 0.8290, dev avg loss 0.487084, throughput 2.22658K wps
[Epoch 172 Batch 30/173] avg loss 0.000401736, throughput 2.27224K wps
[Epoch 172 Batch 60/173] avg loss 0.00040496, throughput 2.22577K wps
[Epoch 172 Batch 90/173] avg loss 0.000389374, throughput 2.22476K wps
[Epoch 172 Batch 120/173] avg loss 0.000378736, throughput 2.21715K wps
[Epoch 172 Batch 150/173] avg loss 0.000362015, throughput 2.23321K wps
Begin Testing...
[Epoch 172] train avg loss 0.000386832, dev acc 0.8332, dev avg loss 0.487604, throughput 2.23461K wps
[Epoch 173 Batch 30/173] avg loss 0.000391022, throughput 2.26739K wps
[Epoch 173 Batch 60/173] avg loss 0.000411177, throughput 2.23505K wps
[Epoch 173 Batch 90/173] avg loss 0.000356358, throughput 2.22664K wps
[Epoch 173 Batch 120/173] avg loss 0.000354065, throughput 2.23548K wps
[Epoch 173 Batch 150/173] avg loss 0.000440758, throughput 2.22667K wps
Begin Testing...
[Epoch 173] train avg loss 0.000397197, dev acc 0.8321, dev avg loss 0.489567, throughput 2.23851K wps
[Epoch 174 Batch 30/173] avg loss 0.000366463, throughput 2.25998K wps
[Epoch 174 Batch 60/173] avg loss 0.000417959, throughput 2.22857K wps
[Epoch 174 Batch 90/173] avg loss 0.00039112, throughput 2.22888K wps
[Epoch 174 Batch 120/173] avg loss 0.000431335, throughput 2.2234K wps
[Epoch 174 Batch 150/173] avg loss 0.000369694, throughput 2.23634K wps
Begin Testing...
[Epoch 174] train avg loss 0.000401247, dev acc 0.8311, dev avg loss 0.49093, throughput 2.2355K wps
[Epoch 175 Batch 30/173] avg loss 0.000379069, throughput 2.28672K wps
[Epoch 175 Batch 60/173] avg loss 0.000361573, throughput 2.22018K wps
[Epoch 175 Batch 90/173] avg loss 0.000411784, throughput 2.22167K wps
[Epoch 175 Batch 120/173] avg loss 0.000391768, throughput 2.20863K wps
[Epoch 175 Batch 150/173] avg loss 0.000386837, throughput 2.22226K wps
Begin Testing...
[Epoch 175] train avg loss 0.000378432, dev acc 0.8352, dev avg loss 0.494635, throughput 2.23156K wps
[Epoch 176 Batch 30/173] avg loss 0.000363053, throughput 2.28366K wps
[Epoch 176 Batch 60/173] avg loss 0.000384944, throughput 2.23474K wps
[Epoch 176 Batch 90/173] avg loss 0.000397218, throughput 2.20059K wps
[Epoch 176 Batch 120/173] avg loss 0.000412048, throughput 2.23751K wps
[Epoch 176 Batch 150/173] avg loss 0.00036884, throughput 2.22192K wps
Begin Testing...
[Epoch 176] train avg loss 0.000378356, dev acc 0.8332, dev avg loss 0.488624, throughput 2.23447K wps
[Epoch 177 Batch 30/173] avg loss 0.000336267, throughput 2.23366K wps
[Epoch 177 Batch 60/173] avg loss 0.000373843, throughput 2.22263K wps
[Epoch 177 Batch 90/173] avg loss 0.000356664, throughput 2.23727K wps
[Epoch 177 Batch 120/173] avg loss 0.00038234, throughput 2.22221K wps
[Epoch 177 Batch 150/173] avg loss 0.000412516, throughput 2.23863K wps
Begin Testing...
[Epoch 177] train avg loss 0.000370151, dev acc 0.8352, dev avg loss 0.490526, throughput 2.23214K wps
[Epoch 178 Batch 30/173] avg loss 0.000377824, throughput 2.28809K wps
[Epoch 178 Batch 60/173] avg loss 0.000373991, throughput 2.23711K wps
[Epoch 178 Batch 90/173] avg loss 0.000344074, throughput 2.22313K wps
[Epoch 178 Batch 120/173] avg loss 0.000376463, throughput 2.2364K wps
[Epoch 178 Batch 150/173] avg loss 0.000383105, throughput 2.23315K wps
Begin Testing...
[Epoch 178] train avg loss 0.000381475, dev acc 0.8321, dev avg loss 0.494442, throughput 2.24203K wps
[Epoch 179 Batch 30/173] avg loss 0.000376373, throughput 2.27902K wps
[Epoch 179 Batch 60/173] avg loss 0.000364212, throughput 2.23563K wps
[Epoch 179 Batch 90/173] avg loss 0.000402173, throughput 2.22172K wps
[Epoch 179 Batch 120/173] avg loss 0.000375595, throughput 2.22105K wps
[Epoch 179 Batch 150/173] avg loss 0.000405741, throughput 2.23177K wps
Begin Testing...
[Epoch 179] train avg loss 0.000386185, dev acc 0.8363, dev avg loss 0.4919, throughput 2.23687K wps
[Epoch 180 Batch 30/173] avg loss 0.000346761, throughput 2.27256K wps
[Epoch 180 Batch 60/173] avg loss 0.000374705, throughput 2.23301K wps
[Epoch 180 Batch 90/173] avg loss 0.000383273, throughput 2.2367K wps
[Epoch 180 Batch 120/173] avg loss 0.00038142, throughput 2.23188K wps
[Epoch 180 Batch 150/173] avg loss 0.000376411, throughput 2.22124K wps
Begin Testing...
[Epoch 180] train avg loss 0.000377874, dev acc 0.8342, dev avg loss 0.493676, throughput 2.23615K wps
[Epoch 181 Batch 30/173] avg loss 0.00034759, throughput 2.2791K wps
[Epoch 181 Batch 60/173] avg loss 0.000424919, throughput 2.23114K wps
[Epoch 181 Batch 90/173] avg loss 0.000356259, throughput 2.23451K wps
[Epoch 181 Batch 120/173] avg loss 0.000363165, throughput 2.22792K wps
[Epoch 181 Batch 150/173] avg loss 0.000381917, throughput 2.22031K wps
Begin Testing...
[Epoch 181] train avg loss 0.000374712, dev acc 0.8352, dev avg loss 0.494527, throughput 2.23755K wps
[Epoch 182 Batch 30/173] avg loss 0.000414133, throughput 2.2688K wps
[Epoch 182 Batch 60/173] avg loss 0.000345086, throughput 2.23585K wps
[Epoch 182 Batch 90/173] avg loss 0.000373008, throughput 2.23151K wps
[Epoch 182 Batch 120/173] avg loss 0.000337914, throughput 2.22995K wps
[Epoch 182 Batch 150/173] avg loss 0.000361962, throughput 2.23119K wps
Begin Testing...
[Epoch 182] train avg loss 0.000364746, dev acc 0.8342, dev avg loss 0.495973, throughput 2.23676K wps
[Epoch 183 Batch 30/173] avg loss 0.000369072, throughput 2.27281K wps
[Epoch 183 Batch 60/173] avg loss 0.000400985, throughput 2.22595K wps
[Epoch 183 Batch 90/173] avg loss 0.000378816, throughput 2.23627K wps
[Epoch 183 Batch 120/173] avg loss 0.000325153, throughput 2.23273K wps
[Epoch 183 Batch 150/173] avg loss 0.000345976, throughput 2.23348K wps
Begin Testing...
[Epoch 183] train avg loss 0.0003671, dev acc 0.8321, dev avg loss 0.496237, throughput 2.23952K wps
[Epoch 184 Batch 30/173] avg loss 0.000319846, throughput 2.27966K wps
[Epoch 184 Batch 60/173] avg loss 0.000371667, throughput 2.23003K wps
[Epoch 184 Batch 90/173] avg loss 0.000368361, throughput 2.21668K wps
[Epoch 184 Batch 120/173] avg loss 0.000357556, throughput 2.21646K wps
[Epoch 184 Batch 150/173] avg loss 0.000341697, throughput 2.22787K wps
Begin Testing...
[Epoch 184] train avg loss 0.000350061, dev acc 0.8332, dev avg loss 0.497673, throughput 2.234K wps
[Epoch 185 Batch 30/173] avg loss 0.000307791, throughput 2.26506K wps
[Epoch 185 Batch 60/173] avg loss 0.000384215, throughput 2.2324K wps
[Epoch 185 Batch 90/173] avg loss 0.000413736, throughput 2.22735K wps
[Epoch 185 Batch 120/173] avg loss 0.000407099, throughput 2.23307K wps
[Epoch 185 Batch 150/173] avg loss 0.000326401, throughput 2.21213K wps
Begin Testing...
[Epoch 185] train avg loss 0.000372748, dev acc 0.8342, dev avg loss 0.497598, throughput 2.23185K wps
[Epoch 186 Batch 30/173] avg loss 0.000296809, throughput 2.27661K wps
[Epoch 186 Batch 60/173] avg loss 0.000352076, throughput 2.22039K wps
[Epoch 186 Batch 90/173] avg loss 0.000313521, throughput 2.22228K wps
[Epoch 186 Batch 120/173] avg loss 0.000371469, throughput 2.22946K wps
[Epoch 186 Batch 150/173] avg loss 0.000350538, throughput 2.22311K wps
Begin Testing...
[Epoch 186] train avg loss 0.000339594, dev acc 0.8342, dev avg loss 0.500096, throughput 2.23301K wps
[Epoch 187 Batch 30/173] avg loss 0.000308051, throughput 2.26658K wps
[Epoch 187 Batch 60/173] avg loss 0.000344896, throughput 2.22442K wps
[Epoch 187 Batch 90/173] avg loss 0.000452735, throughput 2.20687K wps
[Epoch 187 Batch 120/173] avg loss 0.000340802, throughput 2.23706K wps
[Epoch 187 Batch 150/173] avg loss 0.000400176, throughput 2.22481K wps
Begin Testing...
[Epoch 187] train avg loss 0.00036236, dev acc 0.8352, dev avg loss 0.499999, throughput 2.22938K wps
[Epoch 188 Batch 30/173] avg loss 0.000364273, throughput 2.27765K wps
[Epoch 188 Batch 60/173] avg loss 0.000365135, throughput 2.23494K wps
[Epoch 188 Batch 90/173] avg loss 0.000385163, throughput 2.23849K wps
[Epoch 188 Batch 120/173] avg loss 0.000371742, throughput 2.23426K wps
[Epoch 188 Batch 150/173] avg loss 0.000319574, throughput 2.24018K wps
Begin Testing...
[Epoch 188] train avg loss 0.000354311, dev acc 0.8352, dev avg loss 0.500568, throughput 2.24083K wps
[Epoch 189 Batch 30/173] avg loss 0.000353642, throughput 2.26554K wps
[Epoch 189 Batch 60/173] avg loss 0.000397077, throughput 2.22931K wps
[Epoch 189 Batch 90/173] avg loss 0.000325235, throughput 2.22171K wps
[Epoch 189 Batch 120/173] avg loss 0.000355531, throughput 2.23987K wps
[Epoch 189 Batch 150/173] avg loss 0.000360842, throughput 2.23149K wps
Begin Testing...
[Epoch 189] train avg loss 0.000350046, dev acc 0.8352, dev avg loss 0.500684, throughput 2.23597K wps
[Epoch 190 Batch 30/173] avg loss 0.000319107, throughput 2.25837K wps
[Epoch 190 Batch 60/173] avg loss 0.000330892, throughput 2.23065K wps
[Epoch 190 Batch 90/173] avg loss 0.000326992, throughput 2.20774K wps
[Epoch 190 Batch 120/173] avg loss 0.00030113, throughput 2.23185K wps
[Epoch 190 Batch 150/173] avg loss 0.000340595, throughput 2.23389K wps
Begin Testing...
[Epoch 190] train avg loss 0.000328228, dev acc 0.8321, dev avg loss 0.503975, throughput 2.23211K wps
[Epoch 191 Batch 30/173] avg loss 0.000313574, throughput 2.26879K wps
[Epoch 191 Batch 60/173] avg loss 0.000322139, throughput 2.23352K wps
[Epoch 191 Batch 90/173] avg loss 0.000372211, throughput 2.23079K wps
[Epoch 191 Batch 120/173] avg loss 0.000354841, throughput 2.23734K wps
[Epoch 191 Batch 150/173] avg loss 0.000333631, throughput 2.22416K wps
Begin Testing...
[Epoch 191] train avg loss 0.00033943, dev acc 0.8342, dev avg loss 0.501519, throughput 2.23721K wps
[Epoch 192 Batch 30/173] avg loss 0.000387981, throughput 2.26113K wps
[Epoch 192 Batch 60/173] avg loss 0.000315699, throughput 2.21496K wps
[Epoch 192 Batch 90/173] avg loss 0.000324672, throughput 2.18802K wps
[Epoch 192 Batch 120/173] avg loss 0.000325211, throughput 2.18923K wps
[Epoch 192 Batch 150/173] avg loss 0.000279951, throughput 2.21993K wps
Begin Testing...
[Epoch 192] train avg loss 0.000330663, dev acc 0.8311, dev avg loss 0.504615, throughput 2.21719K wps
[Epoch 193 Batch 30/173] avg loss 0.000321915, throughput 2.27821K wps
[Epoch 193 Batch 60/173] avg loss 0.000282751, throughput 2.2206K wps
[Epoch 193 Batch 90/173] avg loss 0.000301081, throughput 2.21577K wps
[Epoch 193 Batch 120/173] avg loss 0.000340403, throughput 2.23231K wps
[Epoch 193 Batch 150/173] avg loss 0.000379034, throughput 2.23514K wps
Begin Testing...
[Epoch 193] train avg loss 0.000323066, dev acc 0.8311, dev avg loss 0.509111, throughput 2.2361K wps
[Epoch 194 Batch 30/173] avg loss 0.000313843, throughput 2.26589K wps
[Epoch 194 Batch 60/173] avg loss 0.000290425, throughput 2.22924K wps
[Epoch 194 Batch 90/173] avg loss 0.000330298, throughput 2.22532K wps
[Epoch 194 Batch 120/173] avg loss 0.000304887, throughput 2.21566K wps
[Epoch 194 Batch 150/173] avg loss 0.000351556, throughput 2.23399K wps
Begin Testing...
[Epoch 194] train avg loss 0.000335701, dev acc 0.8321, dev avg loss 0.505473, throughput 2.2338K wps
[Epoch 195 Batch 30/173] avg loss 0.000270814, throughput 2.27926K wps
[Epoch 195 Batch 60/173] avg loss 0.000328974, throughput 2.22911K wps
[Epoch 195 Batch 90/173] avg loss 0.000342046, throughput 2.23261K wps
[Epoch 195 Batch 120/173] avg loss 0.000310715, throughput 2.21397K wps
[Epoch 195 Batch 150/173] avg loss 0.000298572, throughput 2.2363K wps
Begin Testing...
[Epoch 195] train avg loss 0.000307213, dev acc 0.8342, dev avg loss 0.506914, throughput 2.23757K wps
[Epoch 196 Batch 30/173] avg loss 0.000289456, throughput 2.2726K wps
[Epoch 196 Batch 60/173] avg loss 0.000338895, throughput 2.20806K wps
[Epoch 196 Batch 90/173] avg loss 0.000271676, throughput 2.21819K wps
[Epoch 196 Batch 120/173] avg loss 0.00036129, throughput 2.21395K wps
[Epoch 196 Batch 150/173] avg loss 0.000312375, throughput 2.20767K wps
Begin Testing...
[Epoch 196] train avg loss 0.000317897, dev acc 0.8311, dev avg loss 0.508487, throughput 2.22441K wps
[Epoch 197 Batch 30/173] avg loss 0.000284531, throughput 2.2727K wps
[Epoch 197 Batch 60/173] avg loss 0.000299797, throughput 2.22888K wps
[Epoch 197 Batch 90/173] avg loss 0.000254073, throughput 2.22425K wps
[Epoch 197 Batch 120/173] avg loss 0.000355977, throughput 2.22372K wps
[Epoch 197 Batch 150/173] avg loss 0.000351148, throughput 2.21649K wps
Begin Testing...
[Epoch 197] train avg loss 0.000312084, dev acc 0.8363, dev avg loss 0.506856, throughput 2.23154K wps
[Epoch 198 Batch 30/173] avg loss 0.000278606, throughput 2.28381K wps
[Epoch 198 Batch 60/173] avg loss 0.000311288, throughput 2.23013K wps
[Epoch 198 Batch 90/173] avg loss 0.00033058, throughput 2.23431K wps
[Epoch 198 Batch 120/173] avg loss 0.000343534, throughput 2.23229K wps
[Epoch 198 Batch 150/173] avg loss 0.000308924, throughput 2.23829K wps
Begin Testing...
[Epoch 198] train avg loss 0.000310038, dev acc 0.8342, dev avg loss 0.507242, throughput 2.24327K wps
[Epoch 199 Batch 30/173] avg loss 0.000276266, throughput 2.28363K wps
[Epoch 199 Batch 60/173] avg loss 0.000297469, throughput 2.23133K wps
[Epoch 199 Batch 90/173] avg loss 0.0003216, throughput 2.21683K wps
[Epoch 199 Batch 120/173] avg loss 0.00029425, throughput 2.21427K wps
[Epoch 199 Batch 150/173] avg loss 0.000306782, throughput 2.21743K wps
Begin Testing...
[Epoch 199] train avg loss 0.000301745, dev acc 0.8300, dev avg loss 0.508428, throughput 2.23158K wps
Test loss 0.52068, test acc 0.7955
Total time cost 898.36s
[Epoch 0 Batch 30/173] avg loss 0.013961, throughput 1.74567K wps
[Epoch 0 Batch 60/173] avg loss 0.013923, throughput 2.22521K wps
[Epoch 0 Batch 90/173] avg loss 0.0138446, throughput 2.23496K wps
[Epoch 0 Batch 120/173] avg loss 0.0138222, throughput 2.22953K wps
[Epoch 0 Batch 150/173] avg loss 0.0137367, throughput 2.23425K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138418, dev acc 0.6538, dev avg loss 0.674554, throughput 2.12924K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0135409, throughput 2.28348K wps
[Epoch 1 Batch 60/173] avg loss 0.0136079, throughput 2.22288K wps
[Epoch 1 Batch 90/173] avg loss 0.0135214, throughput 2.22464K wps
[Epoch 1 Batch 120/173] avg loss 0.0134907, throughput 2.21087K wps
[Epoch 1 Batch 150/173] avg loss 0.0133685, throughput 2.21066K wps
Begin Testing...
[Epoch 1] train avg loss 0.0135154, dev acc 0.6809, dev avg loss 0.663157, throughput 2.2301K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0132462, throughput 2.26281K wps
[Epoch 2 Batch 60/173] avg loss 0.0133445, throughput 2.2356K wps
[Epoch 2 Batch 90/173] avg loss 0.0132507, throughput 2.21348K wps
[Epoch 2 Batch 120/173] avg loss 0.013179, throughput 2.23265K wps
[Epoch 2 Batch 150/173] avg loss 0.0131265, throughput 2.23366K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132366, dev acc 0.7007, dev avg loss 0.650164, throughput 2.23567K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0129564, throughput 2.26976K wps
[Epoch 3 Batch 60/173] avg loss 0.0129517, throughput 2.22631K wps
[Epoch 3 Batch 90/173] avg loss 0.0129422, throughput 2.227K wps
[Epoch 3 Batch 120/173] avg loss 0.01282, throughput 2.19414K wps
[Epoch 3 Batch 150/173] avg loss 0.0128115, throughput 2.22129K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129081, dev acc 0.7049, dev avg loss 0.637072, throughput 2.22883K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0126399, throughput 2.28137K wps
[Epoch 4 Batch 60/173] avg loss 0.012516, throughput 2.23163K wps
[Epoch 4 Batch 90/173] avg loss 0.0126668, throughput 2.22066K wps
[Epoch 4 Batch 120/173] avg loss 0.0124766, throughput 2.22548K wps
[Epoch 4 Batch 150/173] avg loss 0.0126868, throughput 2.23585K wps
Begin Testing...
[Epoch 4] train avg loss 0.0126289, dev acc 0.7247, dev avg loss 0.622655, throughput 2.23648K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.01242, throughput 2.28893K wps
[Epoch 5 Batch 60/173] avg loss 0.0122699, throughput 2.22629K wps
[Epoch 5 Batch 90/173] avg loss 0.0123338, throughput 2.22076K wps
[Epoch 5 Batch 120/173] avg loss 0.012281, throughput 2.23267K wps
[Epoch 5 Batch 150/173] avg loss 0.012086, throughput 2.20986K wps
Begin Testing...
[Epoch 5] train avg loss 0.0122856, dev acc 0.7424, dev avg loss 0.606882, throughput 2.23464K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0119339, throughput 2.27198K wps
[Epoch 6 Batch 60/173] avg loss 0.0121368, throughput 2.22545K wps
[Epoch 6 Batch 90/173] avg loss 0.01189, throughput 2.21736K wps
[Epoch 6 Batch 120/173] avg loss 0.0119631, throughput 2.20832K wps
[Epoch 6 Batch 150/173] avg loss 0.0118419, throughput 2.22953K wps
Begin Testing...
[Epoch 6] train avg loss 0.0119641, dev acc 0.7466, dev avg loss 0.591788, throughput 2.23039K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0116664, throughput 2.26265K wps
[Epoch 7 Batch 60/173] avg loss 0.0115842, throughput 2.21406K wps
[Epoch 7 Batch 90/173] avg loss 0.0116144, throughput 2.20587K wps
[Epoch 7 Batch 120/173] avg loss 0.0113877, throughput 2.22499K wps
[Epoch 7 Batch 150/173] avg loss 0.0114171, throughput 2.23885K wps
Begin Testing...
[Epoch 7] train avg loss 0.0115385, dev acc 0.7550, dev avg loss 0.575477, throughput 2.22876K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0112465, throughput 2.25566K wps
[Epoch 8 Batch 60/173] avg loss 0.0111867, throughput 2.22562K wps
[Epoch 8 Batch 90/173] avg loss 0.0113259, throughput 2.23496K wps
[Epoch 8 Batch 120/173] avg loss 0.0110795, throughput 2.23163K wps
[Epoch 8 Batch 150/173] avg loss 0.0110439, throughput 2.20457K wps
Begin Testing...
[Epoch 8] train avg loss 0.011166, dev acc 0.7539, dev avg loss 0.562566, throughput 2.22724K wps
[Epoch 9 Batch 30/173] avg loss 0.0109334, throughput 2.2615K wps
[Epoch 9 Batch 60/173] avg loss 0.010925, throughput 2.23225K wps
[Epoch 9 Batch 90/173] avg loss 0.0107744, throughput 2.23293K wps
[Epoch 9 Batch 120/173] avg loss 0.0107474, throughput 2.22714K wps
[Epoch 9 Batch 150/173] avg loss 0.0110784, throughput 2.2389K wps
Begin Testing...
[Epoch 9] train avg loss 0.0109123, dev acc 0.7643, dev avg loss 0.546077, throughput 2.23815K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0106028, throughput 2.25947K wps
[Epoch 10 Batch 60/173] avg loss 0.0106355, throughput 2.22727K wps
[Epoch 10 Batch 90/173] avg loss 0.0104528, throughput 2.23735K wps
[Epoch 10 Batch 120/173] avg loss 0.0105302, throughput 2.22991K wps
[Epoch 10 Batch 150/173] avg loss 0.010517, throughput 2.20877K wps
Begin Testing...
[Epoch 10] train avg loss 0.0105299, dev acc 0.7623, dev avg loss 0.534643, throughput 2.23077K wps
[Epoch 11 Batch 30/173] avg loss 0.00996028, throughput 2.2811K wps
[Epoch 11 Batch 60/173] avg loss 0.0104378, throughput 2.23533K wps
[Epoch 11 Batch 90/173] avg loss 0.0101223, throughput 2.22116K wps
[Epoch 11 Batch 120/173] avg loss 0.0101315, throughput 2.23642K wps
[Epoch 11 Batch 150/173] avg loss 0.0102942, throughput 2.22909K wps
Begin Testing...
[Epoch 11] train avg loss 0.0101561, dev acc 0.7633, dev avg loss 0.519416, throughput 2.23707K wps
[Epoch 12 Batch 30/173] avg loss 0.0101014, throughput 2.28219K wps
[Epoch 12 Batch 60/173] avg loss 0.0100123, throughput 2.23437K wps
[Epoch 12 Batch 90/173] avg loss 0.0100688, throughput 2.23277K wps
[Epoch 12 Batch 120/173] avg loss 0.00997127, throughput 2.219K wps
[Epoch 12 Batch 150/173] avg loss 0.00948561, throughput 2.22874K wps
Begin Testing...
[Epoch 12] train avg loss 0.00992464, dev acc 0.7716, dev avg loss 0.508488, throughput 2.23753K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0097646, throughput 2.26349K wps
[Epoch 13 Batch 60/173] avg loss 0.0096082, throughput 2.21838K wps
[Epoch 13 Batch 90/173] avg loss 0.00960008, throughput 2.20832K wps
[Epoch 13 Batch 120/173] avg loss 0.0096298, throughput 2.22823K wps
[Epoch 13 Batch 150/173] avg loss 0.0092892, throughput 2.2396K wps
Begin Testing...
[Epoch 13] train avg loss 0.00960149, dev acc 0.7748, dev avg loss 0.499314, throughput 2.23228K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.00924584, throughput 2.26834K wps
[Epoch 14 Batch 60/173] avg loss 0.0092516, throughput 2.22856K wps
[Epoch 14 Batch 90/173] avg loss 0.00950213, throughput 2.22917K wps
[Epoch 14 Batch 120/173] avg loss 0.00924912, throughput 2.23377K wps
[Epoch 14 Batch 150/173] avg loss 0.00941739, throughput 2.23615K wps
Begin Testing...
[Epoch 14] train avg loss 0.00933689, dev acc 0.7758, dev avg loss 0.491937, throughput 2.23731K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.00919684, throughput 2.27968K wps
[Epoch 15 Batch 60/173] avg loss 0.0092003, throughput 2.21696K wps
[Epoch 15 Batch 90/173] avg loss 0.00906032, throughput 2.23706K wps
[Epoch 15 Batch 120/173] avg loss 0.00897303, throughput 2.23771K wps
[Epoch 15 Batch 150/173] avg loss 0.00904894, throughput 2.23236K wps
Begin Testing...
[Epoch 15] train avg loss 0.00911728, dev acc 0.7821, dev avg loss 0.485418, throughput 2.23949K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.00910772, throughput 2.26999K wps
[Epoch 16 Batch 60/173] avg loss 0.00910944, throughput 2.21944K wps
[Epoch 16 Batch 90/173] avg loss 0.00858865, throughput 2.21061K wps
[Epoch 16 Batch 120/173] avg loss 0.00900284, throughput 2.23183K wps
[Epoch 16 Batch 150/173] avg loss 0.00879953, throughput 2.22348K wps
Begin Testing...
[Epoch 16] train avg loss 0.00892601, dev acc 0.7810, dev avg loss 0.477114, throughput 2.2298K wps
[Epoch 17 Batch 30/173] avg loss 0.00858413, throughput 2.26465K wps
[Epoch 17 Batch 60/173] avg loss 0.00885903, throughput 2.23508K wps
[Epoch 17 Batch 90/173] avg loss 0.00873498, throughput 2.22257K wps
[Epoch 17 Batch 120/173] avg loss 0.00870018, throughput 2.22995K wps
[Epoch 17 Batch 150/173] avg loss 0.00837414, throughput 2.23409K wps
Begin Testing...
[Epoch 17] train avg loss 0.00869209, dev acc 0.7852, dev avg loss 0.470243, throughput 2.23506K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.00862611, throughput 2.26597K wps
[Epoch 18 Batch 60/173] avg loss 0.00839325, throughput 2.22764K wps
[Epoch 18 Batch 90/173] avg loss 0.00863039, throughput 2.2277K wps
[Epoch 18 Batch 120/173] avg loss 0.00855638, throughput 2.22547K wps
[Epoch 18 Batch 150/173] avg loss 0.00832815, throughput 2.21973K wps
Begin Testing...
[Epoch 18] train avg loss 0.00850511, dev acc 0.7925, dev avg loss 0.465654, throughput 2.22999K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.00842862, throughput 2.2552K wps
[Epoch 19 Batch 60/173] avg loss 0.00828218, throughput 2.22024K wps
[Epoch 19 Batch 90/173] avg loss 0.0082912, throughput 2.2275K wps
[Epoch 19 Batch 120/173] avg loss 0.0083047, throughput 2.23416K wps
[Epoch 19 Batch 150/173] avg loss 0.00855396, throughput 2.23777K wps
Begin Testing...
[Epoch 19] train avg loss 0.00832116, dev acc 0.7894, dev avg loss 0.461275, throughput 2.23398K wps
[Epoch 20 Batch 30/173] avg loss 0.00826079, throughput 2.27354K wps
[Epoch 20 Batch 60/173] avg loss 0.00836752, throughput 2.23308K wps
[Epoch 20 Batch 90/173] avg loss 0.00809949, throughput 2.22731K wps
[Epoch 20 Batch 120/173] avg loss 0.0083387, throughput 2.23087K wps
[Epoch 20 Batch 150/173] avg loss 0.0076897, throughput 2.23289K wps
Begin Testing...
[Epoch 20] train avg loss 0.00810493, dev acc 0.7946, dev avg loss 0.456735, throughput 2.23758K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/173] avg loss 0.00812059, throughput 2.26561K wps
[Epoch 21 Batch 60/173] avg loss 0.00770212, throughput 2.23535K wps
[Epoch 21 Batch 90/173] avg loss 0.00781422, throughput 2.21624K wps
[Epoch 21 Batch 120/173] avg loss 0.00769679, throughput 2.21384K wps
[Epoch 21 Batch 150/173] avg loss 0.00817878, throughput 2.21552K wps
Begin Testing...
[Epoch 21] train avg loss 0.00793446, dev acc 0.7967, dev avg loss 0.452565, throughput 2.22641K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.0077496, throughput 2.27782K wps
[Epoch 22 Batch 60/173] avg loss 0.0079912, throughput 2.23309K wps
[Epoch 22 Batch 90/173] avg loss 0.00767561, throughput 2.2117K wps
[Epoch 22 Batch 120/173] avg loss 0.00774978, throughput 2.2334K wps
[Epoch 22 Batch 150/173] avg loss 0.00745922, throughput 2.24082K wps
Begin Testing...
[Epoch 22] train avg loss 0.00779437, dev acc 0.7956, dev avg loss 0.450386, throughput 2.23901K wps
[Epoch 23 Batch 30/173] avg loss 0.0074573, throughput 2.27598K wps
[Epoch 23 Batch 60/173] avg loss 0.00752999, throughput 2.23446K wps
[Epoch 23 Batch 90/173] avg loss 0.00771799, throughput 2.23105K wps
[Epoch 23 Batch 120/173] avg loss 0.00743211, throughput 2.23576K wps
[Epoch 23 Batch 150/173] avg loss 0.00756966, throughput 2.2186K wps
Begin Testing...
[Epoch 23] train avg loss 0.00764267, dev acc 0.7967, dev avg loss 0.447325, throughput 2.23816K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.00735797, throughput 2.27324K wps
[Epoch 24 Batch 60/173] avg loss 0.00713067, throughput 2.21107K wps
[Epoch 24 Batch 90/173] avg loss 0.00758327, throughput 2.23002K wps
[Epoch 24 Batch 120/173] avg loss 0.00751671, throughput 2.22805K wps
[Epoch 24 Batch 150/173] avg loss 0.00740697, throughput 2.22322K wps
Begin Testing...
[Epoch 24] train avg loss 0.00740674, dev acc 0.7946, dev avg loss 0.443767, throughput 2.23002K wps
[Epoch 25 Batch 30/173] avg loss 0.00721403, throughput 2.26186K wps
[Epoch 25 Batch 60/173] avg loss 0.00734792, throughput 2.22497K wps
[Epoch 25 Batch 90/173] avg loss 0.00716572, throughput 2.22551K wps
[Epoch 25 Batch 120/173] avg loss 0.0072479, throughput 2.23217K wps
[Epoch 25 Batch 150/173] avg loss 0.00725041, throughput 2.23342K wps
Begin Testing...
[Epoch 25] train avg loss 0.00726562, dev acc 0.7998, dev avg loss 0.442307, throughput 2.23545K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00717629, throughput 2.27931K wps
[Epoch 26 Batch 60/173] avg loss 0.00714037, throughput 2.23086K wps
[Epoch 26 Batch 90/173] avg loss 0.00697181, throughput 2.22841K wps
[Epoch 26 Batch 120/173] avg loss 0.00692086, throughput 2.23647K wps
[Epoch 26 Batch 150/173] avg loss 0.00717165, throughput 2.22188K wps
Begin Testing...
[Epoch 26] train avg loss 0.00710998, dev acc 0.8029, dev avg loss 0.438699, throughput 2.23714K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00709277, throughput 2.26292K wps
[Epoch 27 Batch 60/173] avg loss 0.00661003, throughput 2.23036K wps
[Epoch 27 Batch 90/173] avg loss 0.00748793, throughput 2.22008K wps
[Epoch 27 Batch 120/173] avg loss 0.00705547, throughput 2.21807K wps
[Epoch 27 Batch 150/173] avg loss 0.00684586, throughput 2.21644K wps
Begin Testing...
[Epoch 27] train avg loss 0.00700164, dev acc 0.7935, dev avg loss 0.438284, throughput 2.22997K wps
[Epoch 28 Batch 30/173] avg loss 0.00649426, throughput 2.24927K wps
[Epoch 28 Batch 60/173] avg loss 0.0066967, throughput 2.23442K wps
[Epoch 28 Batch 90/173] avg loss 0.00702805, throughput 2.23496K wps
[Epoch 28 Batch 120/173] avg loss 0.00681805, throughput 2.22347K wps
[Epoch 28 Batch 150/173] avg loss 0.0067375, throughput 2.23388K wps
Begin Testing...
[Epoch 28] train avg loss 0.00680626, dev acc 0.8040, dev avg loss 0.434337, throughput 2.23414K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.00648841, throughput 2.28136K wps
[Epoch 29 Batch 60/173] avg loss 0.00662608, throughput 2.23361K wps
[Epoch 29 Batch 90/173] avg loss 0.00648123, throughput 2.23107K wps
[Epoch 29 Batch 120/173] avg loss 0.00688973, throughput 2.23327K wps
[Epoch 29 Batch 150/173] avg loss 0.00661969, throughput 2.22163K wps
Begin Testing...
[Epoch 29] train avg loss 0.00667016, dev acc 0.7987, dev avg loss 0.432686, throughput 2.23861K wps
[Epoch 30 Batch 30/173] avg loss 0.00657988, throughput 2.27628K wps
[Epoch 30 Batch 60/173] avg loss 0.00650861, throughput 2.22091K wps
[Epoch 30 Batch 90/173] avg loss 0.00639038, throughput 2.22449K wps
[Epoch 30 Batch 120/173] avg loss 0.0064719, throughput 2.2141K wps
[Epoch 30 Batch 150/173] avg loss 0.0062187, throughput 2.23743K wps
Begin Testing...
[Epoch 30] train avg loss 0.00648107, dev acc 0.7977, dev avg loss 0.431868, throughput 2.23432K wps
[Epoch 31 Batch 30/173] avg loss 0.0063996, throughput 2.27803K wps
[Epoch 31 Batch 60/173] avg loss 0.00625054, throughput 2.23353K wps
[Epoch 31 Batch 90/173] avg loss 0.00611609, throughput 2.23346K wps
[Epoch 31 Batch 120/173] avg loss 0.00684955, throughput 2.23664K wps
[Epoch 31 Batch 150/173] avg loss 0.00586657, throughput 2.22039K wps
Begin Testing...
[Epoch 31] train avg loss 0.00630347, dev acc 0.7977, dev avg loss 0.431051, throughput 2.23961K wps
[Epoch 32 Batch 30/173] avg loss 0.0061316, throughput 2.25817K wps
[Epoch 32 Batch 60/173] avg loss 0.00626688, throughput 2.2192K wps
[Epoch 32 Batch 90/173] avg loss 0.00598082, throughput 2.18246K wps
[Epoch 32 Batch 120/173] avg loss 0.00618433, throughput 2.20473K wps
[Epoch 32 Batch 150/173] avg loss 0.00635953, throughput 2.21739K wps
Begin Testing...
[Epoch 32] train avg loss 0.00619227, dev acc 0.8019, dev avg loss 0.428363, throughput 2.21788K wps
[Epoch 33 Batch 30/173] avg loss 0.00597817, throughput 2.26994K wps
[Epoch 33 Batch 60/173] avg loss 0.00605863, throughput 2.20879K wps
[Epoch 33 Batch 90/173] avg loss 0.00623984, throughput 2.20412K wps
[Epoch 33 Batch 120/173] avg loss 0.00590757, throughput 2.23684K wps
[Epoch 33 Batch 150/173] avg loss 0.00600954, throughput 2.2341K wps
Begin Testing...
[Epoch 33] train avg loss 0.00602309, dev acc 0.8008, dev avg loss 0.426096, throughput 2.23016K wps
[Epoch 34 Batch 30/173] avg loss 0.0058407, throughput 2.2581K wps
[Epoch 34 Batch 60/173] avg loss 0.00616408, throughput 2.21187K wps
[Epoch 34 Batch 90/173] avg loss 0.0060803, throughput 2.22887K wps
[Epoch 34 Batch 120/173] avg loss 0.00563941, throughput 2.22149K wps
[Epoch 34 Batch 150/173] avg loss 0.00556852, throughput 2.20506K wps
Begin Testing...
[Epoch 34] train avg loss 0.00589285, dev acc 0.8029, dev avg loss 0.424708, throughput 2.22301K wps
[Epoch 35 Batch 30/173] avg loss 0.00580741, throughput 2.2558K wps
[Epoch 35 Batch 60/173] avg loss 0.00570555, throughput 2.23203K wps
[Epoch 35 Batch 90/173] avg loss 0.00586336, throughput 2.23066K wps
[Epoch 35 Batch 120/173] avg loss 0.00567086, throughput 2.22635K wps
[Epoch 35 Batch 150/173] avg loss 0.00591174, throughput 2.22914K wps
Begin Testing...
[Epoch 35] train avg loss 0.00579569, dev acc 0.7998, dev avg loss 0.423703, throughput 2.23452K wps
[Epoch 36 Batch 30/173] avg loss 0.00564108, throughput 2.26337K wps
[Epoch 36 Batch 60/173] avg loss 0.00560048, throughput 2.23577K wps
[Epoch 36 Batch 90/173] avg loss 0.00538441, throughput 2.23299K wps
[Epoch 36 Batch 120/173] avg loss 0.00538985, throughput 2.23046K wps
[Epoch 36 Batch 150/173] avg loss 0.00537272, throughput 2.23553K wps
Begin Testing...
[Epoch 36] train avg loss 0.00553372, dev acc 0.7977, dev avg loss 0.424785, throughput 2.23894K wps
[Epoch 37 Batch 30/173] avg loss 0.00542272, throughput 2.25748K wps
[Epoch 37 Batch 60/173] avg loss 0.00559758, throughput 2.20816K wps
[Epoch 37 Batch 90/173] avg loss 0.00532805, throughput 2.23276K wps
[Epoch 37 Batch 120/173] avg loss 0.00536248, throughput 2.23403K wps
[Epoch 37 Batch 150/173] avg loss 0.00567464, throughput 2.23809K wps
Begin Testing...
[Epoch 37] train avg loss 0.00549678, dev acc 0.8040, dev avg loss 0.426411, throughput 2.23395K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/173] avg loss 0.00522081, throughput 2.28519K wps
[Epoch 38 Batch 60/173] avg loss 0.00531002, throughput 2.23086K wps
[Epoch 38 Batch 90/173] avg loss 0.00521111, throughput 2.22716K wps
[Epoch 38 Batch 120/173] avg loss 0.00528806, throughput 2.21806K wps
[Epoch 38 Batch 150/173] avg loss 0.00509943, throughput 2.22884K wps
Begin Testing...
[Epoch 38] train avg loss 0.0052807, dev acc 0.8071, dev avg loss 0.421217, throughput 2.23598K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.00508063, throughput 2.25734K wps
[Epoch 39 Batch 60/173] avg loss 0.00497592, throughput 2.21891K wps
[Epoch 39 Batch 90/173] avg loss 0.00520261, throughput 2.23531K wps
[Epoch 39 Batch 120/173] avg loss 0.00535515, throughput 2.22249K wps
[Epoch 39 Batch 150/173] avg loss 0.00513082, throughput 2.23572K wps
Begin Testing...
[Epoch 39] train avg loss 0.00514016, dev acc 0.7998, dev avg loss 0.422417, throughput 2.23481K wps
[Epoch 40 Batch 30/173] avg loss 0.00509802, throughput 2.24994K wps
[Epoch 40 Batch 60/173] avg loss 0.0048912, throughput 2.22956K wps
[Epoch 40 Batch 90/173] avg loss 0.00512686, throughput 2.22115K wps
[Epoch 40 Batch 120/173] avg loss 0.00495865, throughput 2.2335K wps
[Epoch 40 Batch 150/173] avg loss 0.00501049, throughput 2.22053K wps
Begin Testing...
[Epoch 40] train avg loss 0.00503927, dev acc 0.8060, dev avg loss 0.419883, throughput 2.23098K wps
[Epoch 41 Batch 30/173] avg loss 0.00482251, throughput 2.2772K wps
[Epoch 41 Batch 60/173] avg loss 0.00501272, throughput 2.23189K wps
[Epoch 41 Batch 90/173] avg loss 0.00465269, throughput 2.22702K wps
[Epoch 41 Batch 120/173] avg loss 0.00515948, throughput 2.22312K wps
[Epoch 41 Batch 150/173] avg loss 0.00471916, throughput 2.22983K wps
Begin Testing...
[Epoch 41] train avg loss 0.00492721, dev acc 0.8092, dev avg loss 0.419488, throughput 2.23764K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.00472943, throughput 2.27097K wps
[Epoch 42 Batch 60/173] avg loss 0.00481681, throughput 2.23043K wps
[Epoch 42 Batch 90/173] avg loss 0.00475049, throughput 2.21925K wps
[Epoch 42 Batch 120/173] avg loss 0.00473778, throughput 2.22586K wps
[Epoch 42 Batch 150/173] avg loss 0.00493406, throughput 2.21369K wps
Begin Testing...
[Epoch 42] train avg loss 0.00477325, dev acc 0.8008, dev avg loss 0.42432, throughput 2.22745K wps
[Epoch 43 Batch 30/173] avg loss 0.00478991, throughput 2.27163K wps
[Epoch 43 Batch 60/173] avg loss 0.00467248, throughput 2.23047K wps
[Epoch 43 Batch 90/173] avg loss 0.00464773, throughput 2.22102K wps
[Epoch 43 Batch 120/173] avg loss 0.00465631, throughput 2.23534K wps
[Epoch 43 Batch 150/173] avg loss 0.00467111, throughput 2.23155K wps
Begin Testing...
[Epoch 43] train avg loss 0.00469077, dev acc 0.8050, dev avg loss 0.419383, throughput 2.2364K wps
[Epoch 44 Batch 30/173] avg loss 0.00443443, throughput 2.27389K wps
[Epoch 44 Batch 60/173] avg loss 0.00453637, throughput 2.23612K wps
[Epoch 44 Batch 90/173] avg loss 0.00471068, throughput 2.22944K wps
[Epoch 44 Batch 120/173] avg loss 0.00457516, throughput 2.21089K wps
[Epoch 44 Batch 150/173] avg loss 0.00463692, throughput 2.23202K wps
Begin Testing...
[Epoch 44] train avg loss 0.00457567, dev acc 0.8008, dev avg loss 0.420363, throughput 2.23641K wps
[Epoch 45 Batch 30/173] avg loss 0.00432093, throughput 2.26684K wps
[Epoch 45 Batch 60/173] avg loss 0.0045114, throughput 2.21354K wps
[Epoch 45 Batch 90/173] avg loss 0.00442963, throughput 2.22955K wps
[Epoch 45 Batch 120/173] avg loss 0.00441196, throughput 2.22675K wps
[Epoch 45 Batch 150/173] avg loss 0.00433539, throughput 2.23405K wps
Begin Testing...
[Epoch 45] train avg loss 0.004407, dev acc 0.8019, dev avg loss 0.421063, throughput 2.23253K wps
[Epoch 46 Batch 30/173] avg loss 0.00423428, throughput 2.28057K wps
[Epoch 46 Batch 60/173] avg loss 0.00424086, throughput 2.22247K wps
[Epoch 46 Batch 90/173] avg loss 0.00429071, throughput 2.22916K wps
[Epoch 46 Batch 120/173] avg loss 0.00434966, throughput 2.22575K wps
[Epoch 46 Batch 150/173] avg loss 0.00416224, throughput 2.23162K wps
Begin Testing...
[Epoch 46] train avg loss 0.00429296, dev acc 0.7998, dev avg loss 0.41992, throughput 2.23727K wps
[Epoch 47 Batch 30/173] avg loss 0.00433874, throughput 2.26982K wps
[Epoch 47 Batch 60/173] avg loss 0.00431087, throughput 2.23588K wps
[Epoch 47 Batch 90/173] avg loss 0.0041592, throughput 2.23516K wps
[Epoch 47 Batch 120/173] avg loss 0.00430391, throughput 2.23275K wps
[Epoch 47 Batch 150/173] avg loss 0.00400202, throughput 2.22678K wps
Begin Testing...
[Epoch 47] train avg loss 0.00421459, dev acc 0.8019, dev avg loss 0.418972, throughput 2.23771K wps
[Epoch 48 Batch 30/173] avg loss 0.00411763, throughput 2.26322K wps
[Epoch 48 Batch 60/173] avg loss 0.00417328, throughput 2.23182K wps
[Epoch 48 Batch 90/173] avg loss 0.00363686, throughput 2.22706K wps
[Epoch 48 Batch 120/173] avg loss 0.00409533, throughput 2.23174K wps
[Epoch 48 Batch 150/173] avg loss 0.00387288, throughput 2.23376K wps
Begin Testing...
[Epoch 48] train avg loss 0.003993, dev acc 0.8071, dev avg loss 0.422788, throughput 2.23317K wps
[Epoch 49 Batch 30/173] avg loss 0.0038584, throughput 2.27028K wps
[Epoch 49 Batch 60/173] avg loss 0.00421628, throughput 2.22615K wps
[Epoch 49 Batch 90/173] avg loss 0.00390728, throughput 2.22104K wps
[Epoch 49 Batch 120/173] avg loss 0.0039034, throughput 2.23076K wps
[Epoch 49 Batch 150/173] avg loss 0.00393579, throughput 2.20276K wps
Begin Testing...
[Epoch 49] train avg loss 0.00395883, dev acc 0.8050, dev avg loss 0.423503, throughput 2.23026K wps
[Epoch 50 Batch 30/173] avg loss 0.00403872, throughput 2.25523K wps
[Epoch 50 Batch 60/173] avg loss 0.00353885, throughput 2.22641K wps
[Epoch 50 Batch 90/173] avg loss 0.00368538, throughput 2.22735K wps
[Epoch 50 Batch 120/173] avg loss 0.00390044, throughput 2.22993K wps
[Epoch 50 Batch 150/173] avg loss 0.00385395, throughput 2.23073K wps
Begin Testing...
[Epoch 50] train avg loss 0.00382577, dev acc 0.8060, dev avg loss 0.423135, throughput 2.23426K wps
[Epoch 51 Batch 30/173] avg loss 0.00395419, throughput 2.27059K wps
[Epoch 51 Batch 60/173] avg loss 0.00354326, throughput 2.22263K wps
[Epoch 51 Batch 90/173] avg loss 0.00366386, throughput 2.20478K wps
[Epoch 51 Batch 120/173] avg loss 0.0037634, throughput 2.23K wps
[Epoch 51 Batch 150/173] avg loss 0.00372623, throughput 2.23269K wps
Begin Testing...
[Epoch 51] train avg loss 0.00371081, dev acc 0.7998, dev avg loss 0.421701, throughput 2.23283K wps
[Epoch 52 Batch 30/173] avg loss 0.00355556, throughput 2.26514K wps
[Epoch 52 Batch 60/173] avg loss 0.0036117, throughput 2.23172K wps
[Epoch 52 Batch 90/173] avg loss 0.0038289, throughput 2.23272K wps
[Epoch 52 Batch 120/173] avg loss 0.00344366, throughput 2.23157K wps
[Epoch 52 Batch 150/173] avg loss 0.00345362, throughput 2.21045K wps
Begin Testing...
[Epoch 52] train avg loss 0.00363495, dev acc 0.8019, dev avg loss 0.422625, throughput 2.23302K wps
[Epoch 53 Batch 30/173] avg loss 0.00346335, throughput 2.28048K wps
[Epoch 53 Batch 60/173] avg loss 0.00337899, throughput 2.23048K wps
[Epoch 53 Batch 90/173] avg loss 0.00352922, throughput 2.21993K wps
[Epoch 53 Batch 120/173] avg loss 0.00369194, throughput 2.21437K wps
[Epoch 53 Batch 150/173] avg loss 0.00362707, throughput 2.22417K wps
Begin Testing...
[Epoch 53] train avg loss 0.00352694, dev acc 0.8019, dev avg loss 0.421573, throughput 2.23399K wps
[Epoch 54 Batch 30/173] avg loss 0.00368559, throughput 2.28036K wps
[Epoch 54 Batch 60/173] avg loss 0.00359417, throughput 2.20731K wps
[Epoch 54 Batch 90/173] avg loss 0.00337163, throughput 2.23095K wps
[Epoch 54 Batch 120/173] avg loss 0.00336448, throughput 2.23383K wps
[Epoch 54 Batch 150/173] avg loss 0.00356405, throughput 2.2236K wps
Begin Testing...
[Epoch 54] train avg loss 0.00349327, dev acc 0.8019, dev avg loss 0.421318, throughput 2.23445K wps
[Epoch 55 Batch 30/173] avg loss 0.00335968, throughput 2.27297K wps
[Epoch 55 Batch 60/173] avg loss 0.00323447, throughput 2.22247K wps
[Epoch 55 Batch 90/173] avg loss 0.0033264, throughput 2.20906K wps
[Epoch 55 Batch 120/173] avg loss 0.00308654, throughput 2.22679K wps
[Epoch 55 Batch 150/173] avg loss 0.0034532, throughput 2.23307K wps
Begin Testing...
[Epoch 55] train avg loss 0.00329469, dev acc 0.8029, dev avg loss 0.423127, throughput 2.23265K wps
[Epoch 56 Batch 30/173] avg loss 0.00302003, throughput 2.26915K wps
[Epoch 56 Batch 60/173] avg loss 0.00342668, throughput 2.23316K wps
[Epoch 56 Batch 90/173] avg loss 0.00320933, throughput 2.23402K wps
[Epoch 56 Batch 120/173] avg loss 0.00352073, throughput 2.21455K wps
[Epoch 56 Batch 150/173] avg loss 0.00332563, throughput 2.22642K wps
Begin Testing...
[Epoch 56] train avg loss 0.00327708, dev acc 0.8040, dev avg loss 0.423324, throughput 2.23493K wps
[Epoch 57 Batch 30/173] avg loss 0.00308531, throughput 2.26326K wps
[Epoch 57 Batch 60/173] avg loss 0.00318005, throughput 2.23256K wps
[Epoch 57 Batch 90/173] avg loss 0.00300583, throughput 2.23296K wps
[Epoch 57 Batch 120/173] avg loss 0.00298393, throughput 2.23129K wps
[Epoch 57 Batch 150/173] avg loss 0.00318539, throughput 2.22123K wps
Begin Testing...
[Epoch 57] train avg loss 0.00311538, dev acc 0.8060, dev avg loss 0.435663, throughput 2.23473K wps
[Epoch 58 Batch 30/173] avg loss 0.00310451, throughput 2.25421K wps
[Epoch 58 Batch 60/173] avg loss 0.00303388, throughput 2.21735K wps
[Epoch 58 Batch 90/173] avg loss 0.00318099, throughput 2.22695K wps
[Epoch 58 Batch 120/173] avg loss 0.00293779, throughput 2.21779K wps
[Epoch 58 Batch 150/173] avg loss 0.0031898, throughput 2.22901K wps
Begin Testing...
[Epoch 58] train avg loss 0.00311154, dev acc 0.8029, dev avg loss 0.424849, throughput 2.22565K wps
[Epoch 59 Batch 30/173] avg loss 0.00285275, throughput 2.29K wps
[Epoch 59 Batch 60/173] avg loss 0.00282308, throughput 2.22422K wps
[Epoch 59 Batch 90/173] avg loss 0.00303636, throughput 2.22412K wps
[Epoch 59 Batch 120/173] avg loss 0.00309611, throughput 2.22894K wps
[Epoch 59 Batch 150/173] avg loss 0.00330657, throughput 2.21841K wps
Begin Testing...
[Epoch 59] train avg loss 0.00302583, dev acc 0.8008, dev avg loss 0.425926, throughput 2.23589K wps
[Epoch 60 Batch 30/173] avg loss 0.00305597, throughput 2.28246K wps
[Epoch 60 Batch 60/173] avg loss 0.00292064, throughput 2.23132K wps
[Epoch 60 Batch 90/173] avg loss 0.00284609, throughput 2.20844K wps
[Epoch 60 Batch 120/173] avg loss 0.00272716, throughput 2.20519K wps
[Epoch 60 Batch 150/173] avg loss 0.00298768, throughput 2.22757K wps
Begin Testing...
[Epoch 60] train avg loss 0.00291481, dev acc 0.8071, dev avg loss 0.429707, throughput 2.23112K wps
[Epoch 61 Batch 30/173] avg loss 0.00266917, throughput 2.2602K wps
[Epoch 61 Batch 60/173] avg loss 0.00285962, throughput 2.22407K wps
[Epoch 61 Batch 90/173] avg loss 0.00293728, throughput 2.21589K wps
[Epoch 61 Batch 120/173] avg loss 0.00300597, throughput 2.23702K wps
[Epoch 61 Batch 150/173] avg loss 0.00273591, throughput 2.23129K wps
Begin Testing...
[Epoch 61] train avg loss 0.00286419, dev acc 0.8029, dev avg loss 0.427847, throughput 2.23344K wps
[Epoch 62 Batch 30/173] avg loss 0.00284038, throughput 2.25997K wps
[Epoch 62 Batch 60/173] avg loss 0.0025468, throughput 2.22923K wps
[Epoch 62 Batch 90/173] avg loss 0.00281788, throughput 2.20622K wps
[Epoch 62 Batch 120/173] avg loss 0.00272786, throughput 2.23619K wps
[Epoch 62 Batch 150/173] avg loss 0.00279209, throughput 2.23466K wps
Begin Testing...
[Epoch 62] train avg loss 0.00275523, dev acc 0.7998, dev avg loss 0.428952, throughput 2.23174K wps
[Epoch 63 Batch 30/173] avg loss 0.00255379, throughput 2.28277K wps
[Epoch 63 Batch 60/173] avg loss 0.00265084, throughput 2.22432K wps
[Epoch 63 Batch 90/173] avg loss 0.0026695, throughput 2.22288K wps
[Epoch 63 Batch 120/173] avg loss 0.00270993, throughput 2.20785K wps
[Epoch 63 Batch 150/173] avg loss 0.00268949, throughput 2.21952K wps
Begin Testing...
[Epoch 63] train avg loss 0.00270035, dev acc 0.8123, dev avg loss 0.436584, throughput 2.22857K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/173] avg loss 0.00253463, throughput 2.27171K wps
[Epoch 64 Batch 60/173] avg loss 0.00252083, throughput 2.22138K wps
[Epoch 64 Batch 90/173] avg loss 0.00256448, throughput 2.22394K wps
[Epoch 64 Batch 120/173] avg loss 0.00260487, throughput 2.23236K wps
[Epoch 64 Batch 150/173] avg loss 0.00269319, throughput 2.22827K wps
Begin Testing...
[Epoch 64] train avg loss 0.00258555, dev acc 0.8029, dev avg loss 0.430504, throughput 2.2352K wps
[Epoch 65 Batch 30/173] avg loss 0.00256305, throughput 2.28013K wps
[Epoch 65 Batch 60/173] avg loss 0.00263219, throughput 2.23453K wps
[Epoch 65 Batch 90/173] avg loss 0.00243792, throughput 2.23072K wps
[Epoch 65 Batch 120/173] avg loss 0.00255761, throughput 2.20337K wps
[Epoch 65 Batch 150/173] avg loss 0.00238825, throughput 2.19699K wps
Begin Testing...
[Epoch 65] train avg loss 0.00250631, dev acc 0.8081, dev avg loss 0.436539, throughput 2.22894K wps
[Epoch 66 Batch 30/173] avg loss 0.00254669, throughput 2.2645K wps
[Epoch 66 Batch 60/173] avg loss 0.00237632, throughput 2.2088K wps
[Epoch 66 Batch 90/173] avg loss 0.00247299, throughput 2.21008K wps
[Epoch 66 Batch 120/173] avg loss 0.00234106, throughput 2.21049K wps
[Epoch 66 Batch 150/173] avg loss 0.00254155, throughput 2.20817K wps
Begin Testing...
[Epoch 66] train avg loss 0.0024544, dev acc 0.8071, dev avg loss 0.434946, throughput 2.22073K wps
[Epoch 67 Batch 30/173] avg loss 0.00235561, throughput 2.26867K wps
[Epoch 67 Batch 60/173] avg loss 0.00233038, throughput 2.20847K wps
[Epoch 67 Batch 90/173] avg loss 0.00232522, throughput 2.20227K wps
[Epoch 67 Batch 120/173] avg loss 0.00243613, throughput 2.19335K wps
[Epoch 67 Batch 150/173] avg loss 0.00263061, throughput 2.22221K wps
Begin Testing...
[Epoch 67] train avg loss 0.0024134, dev acc 0.8092, dev avg loss 0.434833, throughput 2.21803K wps
[Epoch 68 Batch 30/173] avg loss 0.0023913, throughput 2.25224K wps
[Epoch 68 Batch 60/173] avg loss 0.00231197, throughput 2.22097K wps
[Epoch 68 Batch 90/173] avg loss 0.00237177, throughput 2.23056K wps
[Epoch 68 Batch 120/173] avg loss 0.00238873, throughput 2.22303K wps
[Epoch 68 Batch 150/173] avg loss 0.00239716, throughput 2.21981K wps
Begin Testing...
[Epoch 68] train avg loss 0.00234887, dev acc 0.8081, dev avg loss 0.43897, throughput 2.23003K wps
[Epoch 69 Batch 30/173] avg loss 0.00226374, throughput 2.28068K wps
[Epoch 69 Batch 60/173] avg loss 0.00236977, throughput 2.2185K wps
[Epoch 69 Batch 90/173] avg loss 0.00214347, throughput 2.23428K wps
[Epoch 69 Batch 120/173] avg loss 0.00235609, throughput 2.22271K wps
[Epoch 69 Batch 150/173] avg loss 0.00231178, throughput 2.23475K wps
Begin Testing...
[Epoch 69] train avg loss 0.00228132, dev acc 0.8060, dev avg loss 0.436766, throughput 2.234K wps
[Epoch 70 Batch 30/173] avg loss 0.00229188, throughput 2.27969K wps
[Epoch 70 Batch 60/173] avg loss 0.0020759, throughput 2.21583K wps
[Epoch 70 Batch 90/173] avg loss 0.00213241, throughput 2.22287K wps
[Epoch 70 Batch 120/173] avg loss 0.00238637, throughput 2.22775K wps
[Epoch 70 Batch 150/173] avg loss 0.00249229, throughput 2.22454K wps
Begin Testing...
[Epoch 70] train avg loss 0.00225526, dev acc 0.8123, dev avg loss 0.440083, throughput 2.23313K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/173] avg loss 0.00213024, throughput 2.27761K wps
[Epoch 71 Batch 60/173] avg loss 0.00216795, throughput 2.21477K wps
[Epoch 71 Batch 90/173] avg loss 0.00216474, throughput 2.2214K wps
[Epoch 71 Batch 120/173] avg loss 0.00238052, throughput 2.23549K wps
[Epoch 71 Batch 150/173] avg loss 0.00224048, throughput 2.21724K wps
Begin Testing...
[Epoch 71] train avg loss 0.0021802, dev acc 0.8092, dev avg loss 0.443244, throughput 2.23211K wps
[Epoch 72 Batch 30/173] avg loss 0.00214022, throughput 2.27467K wps
[Epoch 72 Batch 60/173] avg loss 0.00201112, throughput 2.23123K wps
[Epoch 72 Batch 90/173] avg loss 0.00222162, throughput 2.23499K wps
[Epoch 72 Batch 120/173] avg loss 0.00223078, throughput 2.22945K wps
[Epoch 72 Batch 150/173] avg loss 0.00203821, throughput 2.21607K wps
Begin Testing...
[Epoch 72] train avg loss 0.00210878, dev acc 0.8081, dev avg loss 0.444071, throughput 2.23693K wps
[Epoch 73 Batch 30/173] avg loss 0.00192142, throughput 2.25826K wps
[Epoch 73 Batch 60/173] avg loss 0.00204512, throughput 2.23023K wps
[Epoch 73 Batch 90/173] avg loss 0.00212101, throughput 2.23545K wps
[Epoch 73 Batch 120/173] avg loss 0.00198996, throughput 2.22555K wps
[Epoch 73 Batch 150/173] avg loss 0.00196432, throughput 2.21613K wps
Begin Testing...
[Epoch 73] train avg loss 0.00201554, dev acc 0.8040, dev avg loss 0.454492, throughput 2.22932K wps
[Epoch 74 Batch 30/173] avg loss 0.00191426, throughput 2.28053K wps
[Epoch 74 Batch 60/173] avg loss 0.00189331, throughput 2.23388K wps
[Epoch 74 Batch 90/173] avg loss 0.00203154, throughput 2.23101K wps
[Epoch 74 Batch 120/173] avg loss 0.00174561, throughput 2.23444K wps
[Epoch 74 Batch 150/173] avg loss 0.002047, throughput 2.2288K wps
Begin Testing...
[Epoch 74] train avg loss 0.00196443, dev acc 0.8071, dev avg loss 0.445346, throughput 2.23861K wps
[Epoch 75 Batch 30/173] avg loss 0.00188881, throughput 2.27992K wps
[Epoch 75 Batch 60/173] avg loss 0.00200662, throughput 2.22617K wps
[Epoch 75 Batch 90/173] avg loss 0.00188361, throughput 2.22572K wps
[Epoch 75 Batch 120/173] avg loss 0.00204287, throughput 2.22655K wps
[Epoch 75 Batch 150/173] avg loss 0.00190147, throughput 2.2245K wps
Begin Testing...
[Epoch 75] train avg loss 0.00197523, dev acc 0.8071, dev avg loss 0.44373, throughput 2.23598K wps
[Epoch 76 Batch 30/173] avg loss 0.00198186, throughput 2.29031K wps
[Epoch 76 Batch 60/173] avg loss 0.00182938, throughput 2.22598K wps
[Epoch 76 Batch 90/173] avg loss 0.00169773, throughput 2.23277K wps
[Epoch 76 Batch 120/173] avg loss 0.00172784, throughput 2.23254K wps
[Epoch 76 Batch 150/173] avg loss 0.00191212, throughput 2.20665K wps
Begin Testing...
[Epoch 76] train avg loss 0.00186004, dev acc 0.8071, dev avg loss 0.447477, throughput 2.23576K wps
[Epoch 77 Batch 30/173] avg loss 0.00184417, throughput 2.27934K wps
[Epoch 77 Batch 60/173] avg loss 0.00178771, throughput 2.21588K wps
[Epoch 77 Batch 90/173] avg loss 0.00181514, throughput 2.22331K wps
[Epoch 77 Batch 120/173] avg loss 0.00191928, throughput 2.19979K wps
[Epoch 77 Batch 150/173] avg loss 0.001752, throughput 2.22312K wps
Begin Testing...
[Epoch 77] train avg loss 0.00181575, dev acc 0.8081, dev avg loss 0.454403, throughput 2.22881K wps
[Epoch 78 Batch 30/173] avg loss 0.00173329, throughput 2.27579K wps
[Epoch 78 Batch 60/173] avg loss 0.00178442, throughput 2.2213K wps
[Epoch 78 Batch 90/173] avg loss 0.00175276, throughput 2.22534K wps
[Epoch 78 Batch 120/173] avg loss 0.00177086, throughput 2.23438K wps
[Epoch 78 Batch 150/173] avg loss 0.00187641, throughput 2.23295K wps
Begin Testing...
[Epoch 78] train avg loss 0.00177811, dev acc 0.8102, dev avg loss 0.44897, throughput 2.23659K wps
[Epoch 79 Batch 30/173] avg loss 0.00178649, throughput 2.27176K wps
[Epoch 79 Batch 60/173] avg loss 0.00185711, throughput 2.22727K wps
[Epoch 79 Batch 90/173] avg loss 0.00164394, throughput 2.22377K wps
[Epoch 79 Batch 120/173] avg loss 0.00170575, throughput 2.23183K wps
[Epoch 79 Batch 150/173] avg loss 0.00175291, throughput 2.23362K wps
Begin Testing...
[Epoch 79] train avg loss 0.00177615, dev acc 0.8113, dev avg loss 0.451358, throughput 2.2369K wps
[Epoch 80 Batch 30/173] avg loss 0.00155773, throughput 2.2644K wps
[Epoch 80 Batch 60/173] avg loss 0.00181303, throughput 2.22968K wps
[Epoch 80 Batch 90/173] avg loss 0.00183247, throughput 2.23356K wps
[Epoch 80 Batch 120/173] avg loss 0.0016552, throughput 2.23567K wps
[Epoch 80 Batch 150/173] avg loss 0.00174318, throughput 2.2293K wps
Begin Testing...
[Epoch 80] train avg loss 0.00172157, dev acc 0.8102, dev avg loss 0.45086, throughput 2.23798K wps
[Epoch 81 Batch 30/173] avg loss 0.00152564, throughput 2.27964K wps
[Epoch 81 Batch 60/173] avg loss 0.00156188, throughput 2.21452K wps
[Epoch 81 Batch 90/173] avg loss 0.00170985, throughput 2.21789K wps
[Epoch 81 Batch 120/173] avg loss 0.00151587, throughput 2.23723K wps
[Epoch 81 Batch 150/173] avg loss 0.00176714, throughput 2.23541K wps
Begin Testing...
[Epoch 81] train avg loss 0.00163275, dev acc 0.8123, dev avg loss 0.453572, throughput 2.23293K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/173] avg loss 0.00160767, throughput 2.28997K wps
[Epoch 82 Batch 60/173] avg loss 0.00158405, throughput 2.23442K wps
[Epoch 82 Batch 90/173] avg loss 0.00173258, throughput 2.22869K wps
[Epoch 82 Batch 120/173] avg loss 0.00164885, throughput 2.22156K wps
[Epoch 82 Batch 150/173] avg loss 0.0016989, throughput 2.20765K wps
Begin Testing...
[Epoch 82] train avg loss 0.00165671, dev acc 0.8144, dev avg loss 0.457862, throughput 2.23514K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/173] avg loss 0.00155637, throughput 2.27153K wps
[Epoch 83 Batch 60/173] avg loss 0.00163104, throughput 2.20912K wps
[Epoch 83 Batch 90/173] avg loss 0.00155036, throughput 2.23358K wps
[Epoch 83 Batch 120/173] avg loss 0.00164909, throughput 2.22983K wps
[Epoch 83 Batch 150/173] avg loss 0.001667, throughput 2.23428K wps
Begin Testing...
[Epoch 83] train avg loss 0.00160813, dev acc 0.8092, dev avg loss 0.462031, throughput 2.23391K wps
[Epoch 84 Batch 30/173] avg loss 0.00137771, throughput 2.27025K wps
[Epoch 84 Batch 60/173] avg loss 0.00157586, throughput 2.20831K wps
[Epoch 84 Batch 90/173] avg loss 0.0015285, throughput 2.22863K wps
[Epoch 84 Batch 120/173] avg loss 0.00154218, throughput 2.22586K wps
[Epoch 84 Batch 150/173] avg loss 0.00172218, throughput 2.21864K wps
Begin Testing...
[Epoch 84] train avg loss 0.00155784, dev acc 0.8092, dev avg loss 0.457669, throughput 2.23088K wps
[Epoch 85 Batch 30/173] avg loss 0.00131453, throughput 2.28654K wps
[Epoch 85 Batch 60/173] avg loss 0.00155742, throughput 2.23282K wps
[Epoch 85 Batch 90/173] avg loss 0.00133858, throughput 2.22984K wps
[Epoch 85 Batch 120/173] avg loss 0.00160987, throughput 2.23534K wps
[Epoch 85 Batch 150/173] avg loss 0.00152762, throughput 2.22712K wps
Begin Testing...
[Epoch 85] train avg loss 0.00149213, dev acc 0.8081, dev avg loss 0.46314, throughput 2.23732K wps
[Epoch 86 Batch 30/173] avg loss 0.00145814, throughput 2.27433K wps
[Epoch 86 Batch 60/173] avg loss 0.00135428, throughput 2.23011K wps
[Epoch 86 Batch 90/173] avg loss 0.00143486, throughput 2.2211K wps
[Epoch 86 Batch 120/173] avg loss 0.0015245, throughput 2.22533K wps
[Epoch 86 Batch 150/173] avg loss 0.00155306, throughput 2.23558K wps
Begin Testing...
[Epoch 86] train avg loss 0.00145547, dev acc 0.8123, dev avg loss 0.461892, throughput 2.23757K wps
[Epoch 87 Batch 30/173] avg loss 0.00140713, throughput 2.26502K wps
[Epoch 87 Batch 60/173] avg loss 0.00138127, throughput 2.23225K wps
[Epoch 87 Batch 90/173] avg loss 0.00150961, throughput 2.22963K wps
[Epoch 87 Batch 120/173] avg loss 0.00146808, throughput 2.23281K wps
[Epoch 87 Batch 150/173] avg loss 0.00149519, throughput 2.21964K wps
Begin Testing...
[Epoch 87] train avg loss 0.00145885, dev acc 0.8019, dev avg loss 0.470976, throughput 2.23632K wps
[Epoch 88 Batch 30/173] avg loss 0.00134935, throughput 2.27906K wps
[Epoch 88 Batch 60/173] avg loss 0.00149142, throughput 2.23268K wps
[Epoch 88 Batch 90/173] avg loss 0.00146592, throughput 2.23164K wps
[Epoch 88 Batch 120/173] avg loss 0.00131268, throughput 2.22722K wps
[Epoch 88 Batch 150/173] avg loss 0.00155359, throughput 2.2334K wps
Begin Testing...
[Epoch 88] train avg loss 0.00142431, dev acc 0.8081, dev avg loss 0.469107, throughput 2.23921K wps
[Epoch 89 Batch 30/173] avg loss 0.00138722, throughput 2.26601K wps
[Epoch 89 Batch 60/173] avg loss 0.00133633, throughput 2.21827K wps
[Epoch 89 Batch 90/173] avg loss 0.00140705, throughput 2.22498K wps
[Epoch 89 Batch 120/173] avg loss 0.00132129, throughput 2.23673K wps
[Epoch 89 Batch 150/173] avg loss 0.00122398, throughput 2.22973K wps
Begin Testing...
[Epoch 89] train avg loss 0.00134061, dev acc 0.8123, dev avg loss 0.465799, throughput 2.235K wps
[Epoch 90 Batch 30/173] avg loss 0.0012051, throughput 2.28768K wps
[Epoch 90 Batch 60/173] avg loss 0.00135473, throughput 2.20958K wps
[Epoch 90 Batch 90/173] avg loss 0.00140136, throughput 2.23289K wps
[Epoch 90 Batch 120/173] avg loss 0.00135411, throughput 2.23066K wps
[Epoch 90 Batch 150/173] avg loss 0.0013242, throughput 2.22781K wps
Begin Testing...
[Epoch 90] train avg loss 0.0013442, dev acc 0.8133, dev avg loss 0.469814, throughput 2.23567K wps
[Epoch 91 Batch 30/173] avg loss 0.00129685, throughput 2.27311K wps
[Epoch 91 Batch 60/173] avg loss 0.0013497, throughput 2.22612K wps
[Epoch 91 Batch 90/173] avg loss 0.00124841, throughput 2.22736K wps
[Epoch 91 Batch 120/173] avg loss 0.00133576, throughput 2.231K wps
[Epoch 91 Batch 150/173] avg loss 0.00135715, throughput 2.2294K wps
Begin Testing...
[Epoch 91] train avg loss 0.00134165, dev acc 0.8144, dev avg loss 0.470726, throughput 2.2368K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/173] avg loss 0.00126121, throughput 2.26972K wps
[Epoch 92 Batch 60/173] avg loss 0.00130454, throughput 2.21546K wps
[Epoch 92 Batch 90/173] avg loss 0.00124308, throughput 2.22434K wps
[Epoch 92 Batch 120/173] avg loss 0.00135459, throughput 2.23235K wps
[Epoch 92 Batch 150/173] avg loss 0.00122237, throughput 2.22631K wps
Begin Testing...
[Epoch 92] train avg loss 0.00128987, dev acc 0.8102, dev avg loss 0.470617, throughput 2.23367K wps
[Epoch 93 Batch 30/173] avg loss 0.00128708, throughput 2.267K wps
[Epoch 93 Batch 60/173] avg loss 0.00131075, throughput 2.23263K wps
[Epoch 93 Batch 90/173] avg loss 0.00124445, throughput 2.21069K wps
[Epoch 93 Batch 120/173] avg loss 0.00127034, throughput 2.2238K wps
[Epoch 93 Batch 150/173] avg loss 0.0013554, throughput 2.22737K wps
Begin Testing...
[Epoch 93] train avg loss 0.00129607, dev acc 0.8081, dev avg loss 0.470558, throughput 2.22862K wps
[Epoch 94 Batch 30/173] avg loss 0.00121528, throughput 2.26585K wps
[Epoch 94 Batch 60/173] avg loss 0.00120747, throughput 2.2297K wps
[Epoch 94 Batch 90/173] avg loss 0.00128766, throughput 2.22427K wps
[Epoch 94 Batch 120/173] avg loss 0.00128973, throughput 2.22048K wps
[Epoch 94 Batch 150/173] avg loss 0.00116768, throughput 2.23315K wps
Begin Testing...
[Epoch 94] train avg loss 0.00121503, dev acc 0.8123, dev avg loss 0.473797, throughput 2.23497K wps
[Epoch 95 Batch 30/173] avg loss 0.00123801, throughput 2.28775K wps
[Epoch 95 Batch 60/173] avg loss 0.00114561, throughput 2.23632K wps
[Epoch 95 Batch 90/173] avg loss 0.00116435, throughput 2.2296K wps
[Epoch 95 Batch 120/173] avg loss 0.00129417, throughput 2.22649K wps
[Epoch 95 Batch 150/173] avg loss 0.00123168, throughput 2.21874K wps
Begin Testing...
[Epoch 95] train avg loss 0.00121106, dev acc 0.8133, dev avg loss 0.474644, throughput 2.23742K wps
[Epoch 96 Batch 30/173] avg loss 0.00109619, throughput 2.28121K wps
[Epoch 96 Batch 60/173] avg loss 0.00116149, throughput 2.23033K wps
[Epoch 96 Batch 90/173] avg loss 0.00120731, throughput 2.22113K wps
[Epoch 96 Batch 120/173] avg loss 0.00126935, throughput 2.2289K wps
[Epoch 96 Batch 150/173] avg loss 0.00119373, throughput 2.23198K wps
Begin Testing...
[Epoch 96] train avg loss 0.0011857, dev acc 0.8123, dev avg loss 0.476494, throughput 2.23786K wps
[Epoch 97 Batch 30/173] avg loss 0.00111823, throughput 2.28003K wps
[Epoch 97 Batch 60/173] avg loss 0.00117748, throughput 2.22604K wps
[Epoch 97 Batch 90/173] avg loss 0.00113854, throughput 2.23081K wps
[Epoch 97 Batch 120/173] avg loss 0.00122335, throughput 2.23127K wps
[Epoch 97 Batch 150/173] avg loss 0.0011668, throughput 2.22815K wps
Begin Testing...
[Epoch 97] train avg loss 0.0011676, dev acc 0.8102, dev avg loss 0.47695, throughput 2.2372K wps
[Epoch 98 Batch 30/173] avg loss 0.0010865, throughput 2.26999K wps
[Epoch 98 Batch 60/173] avg loss 0.00112653, throughput 2.23193K wps
[Epoch 98 Batch 90/173] avg loss 0.00113308, throughput 2.23045K wps
[Epoch 98 Batch 120/173] avg loss 0.001194, throughput 2.20097K wps
[Epoch 98 Batch 150/173] avg loss 0.00104115, throughput 2.22744K wps
Begin Testing...
[Epoch 98] train avg loss 0.00112453, dev acc 0.8113, dev avg loss 0.480179, throughput 2.22991K wps
[Epoch 99 Batch 30/173] avg loss 0.00115036, throughput 2.25698K wps
[Epoch 99 Batch 60/173] avg loss 0.00109329, throughput 2.22903K wps
[Epoch 99 Batch 90/173] avg loss 0.000979228, throughput 2.22482K wps
[Epoch 99 Batch 120/173] avg loss 0.00100697, throughput 2.22324K wps
[Epoch 99 Batch 150/173] avg loss 0.00117077, throughput 2.21412K wps
Begin Testing...
[Epoch 99] train avg loss 0.00108939, dev acc 0.8144, dev avg loss 0.481143, throughput 2.2292K wps
Observed Improvement.
Begin Testing...
[Epoch 100 Batch 30/173] avg loss 0.00104722, throughput 2.27549K wps
[Epoch 100 Batch 60/173] avg loss 0.00107668, throughput 2.21932K wps
[Epoch 100 Batch 90/173] avg loss 0.00117522, throughput 2.20825K wps
[Epoch 100 Batch 120/173] avg loss 0.00115302, throughput 2.23037K wps
[Epoch 100 Batch 150/173] avg loss 0.000986745, throughput 2.22607K wps
Begin Testing...
[Epoch 100] train avg loss 0.001108, dev acc 0.8092, dev avg loss 0.486513, throughput 2.2321K wps
[Epoch 101 Batch 30/173] avg loss 0.000996345, throughput 2.26536K wps
[Epoch 101 Batch 60/173] avg loss 0.0011219, throughput 2.22893K wps
[Epoch 101 Batch 90/173] avg loss 0.00106991, throughput 2.23288K wps
[Epoch 101 Batch 120/173] avg loss 0.00109435, throughput 2.21487K wps
[Epoch 101 Batch 150/173] avg loss 0.00120184, throughput 2.22118K wps
Begin Testing...
[Epoch 101] train avg loss 0.00109677, dev acc 0.8102, dev avg loss 0.482466, throughput 2.22983K wps
[Epoch 102 Batch 30/173] avg loss 0.00109272, throughput 2.26541K wps
[Epoch 102 Batch 60/173] avg loss 0.00103992, throughput 2.23491K wps
[Epoch 102 Batch 90/173] avg loss 0.00105947, throughput 2.23256K wps
[Epoch 102 Batch 120/173] avg loss 0.00109405, throughput 2.23542K wps
[Epoch 102 Batch 150/173] avg loss 0.00108527, throughput 2.22722K wps
Begin Testing...
[Epoch 102] train avg loss 0.00106534, dev acc 0.8102, dev avg loss 0.482074, throughput 2.23643K wps
[Epoch 103 Batch 30/173] avg loss 0.00103151, throughput 2.28827K wps
[Epoch 103 Batch 60/173] avg loss 0.000945595, throughput 2.23058K wps
[Epoch 103 Batch 90/173] avg loss 0.000942978, throughput 2.22741K wps
[Epoch 103 Batch 120/173] avg loss 0.00112082, throughput 2.22051K wps
[Epoch 103 Batch 150/173] avg loss 0.00102912, throughput 2.22312K wps
Begin Testing...
[Epoch 103] train avg loss 0.00102904, dev acc 0.8123, dev avg loss 0.486177, throughput 2.23536K wps
[Epoch 104 Batch 30/173] avg loss 0.00103425, throughput 2.27341K wps
[Epoch 104 Batch 60/173] avg loss 0.000946377, throughput 2.23197K wps
[Epoch 104 Batch 90/173] avg loss 0.00105249, throughput 2.23544K wps
[Epoch 104 Batch 120/173] avg loss 0.000997439, throughput 2.23699K wps
[Epoch 104 Batch 150/173] avg loss 0.000959585, throughput 2.23033K wps
Begin Testing...
[Epoch 104] train avg loss 0.000998945, dev acc 0.8144, dev avg loss 0.487071, throughput 2.24058K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/173] avg loss 0.00103, throughput 2.27268K wps
[Epoch 105 Batch 60/173] avg loss 0.00105591, throughput 2.22993K wps
[Epoch 105 Batch 90/173] avg loss 0.000936626, throughput 2.22602K wps
[Epoch 105 Batch 120/173] avg loss 0.00108928, throughput 2.22788K wps
[Epoch 105 Batch 150/173] avg loss 0.000933292, throughput 2.23103K wps
Begin Testing...
[Epoch 105] train avg loss 0.00100892, dev acc 0.8133, dev avg loss 0.490154, throughput 2.23534K wps
[Epoch 106 Batch 30/173] avg loss 0.00105322, throughput 2.25418K wps
[Epoch 106 Batch 60/173] avg loss 0.00101503, throughput 2.2311K wps
[Epoch 106 Batch 90/173] avg loss 0.000856627, throughput 2.20588K wps
[Epoch 106 Batch 120/173] avg loss 0.00091206, throughput 2.23463K wps
[Epoch 106 Batch 150/173] avg loss 0.000944482, throughput 2.22157K wps
Begin Testing...
[Epoch 106] train avg loss 0.000964285, dev acc 0.8144, dev avg loss 0.490189, throughput 2.23027K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/173] avg loss 0.000984201, throughput 2.27125K wps
[Epoch 107 Batch 60/173] avg loss 0.000931491, throughput 2.22481K wps
[Epoch 107 Batch 90/173] avg loss 0.000972074, throughput 2.22225K wps
[Epoch 107 Batch 120/173] avg loss 0.000964123, throughput 2.2241K wps
[Epoch 107 Batch 150/173] avg loss 0.000997783, throughput 2.22011K wps
Begin Testing...
[Epoch 107] train avg loss 0.000965326, dev acc 0.8050, dev avg loss 0.500678, throughput 2.23118K wps
[Epoch 108 Batch 30/173] avg loss 0.00095811, throughput 2.24747K wps
[Epoch 108 Batch 60/173] avg loss 0.000803249, throughput 2.20783K wps
[Epoch 108 Batch 90/173] avg loss 0.000925667, throughput 2.22044K wps
[Epoch 108 Batch 120/173] avg loss 0.000989619, throughput 2.20801K wps
[Epoch 108 Batch 150/173] avg loss 0.00086845, throughput 2.22414K wps
Begin Testing...
[Epoch 108] train avg loss 0.000914857, dev acc 0.8123, dev avg loss 0.494032, throughput 2.22122K wps
[Epoch 109 Batch 30/173] avg loss 0.000938994, throughput 2.27696K wps
[Epoch 109 Batch 60/173] avg loss 0.000917574, throughput 2.21399K wps
[Epoch 109 Batch 90/173] avg loss 0.00099867, throughput 2.22443K wps
[Epoch 109 Batch 120/173] avg loss 0.000912317, throughput 2.22028K wps
[Epoch 109 Batch 150/173] avg loss 0.000820537, throughput 2.19944K wps
Begin Testing...
[Epoch 109] train avg loss 0.000928218, dev acc 0.8092, dev avg loss 0.497072, throughput 2.22409K wps
[Epoch 110 Batch 30/173] avg loss 0.000930275, throughput 2.28525K wps
[Epoch 110 Batch 60/173] avg loss 0.000961968, throughput 2.2166K wps
[Epoch 110 Batch 90/173] avg loss 0.000892621, throughput 2.21655K wps
[Epoch 110 Batch 120/173] avg loss 0.000853498, throughput 2.22136K wps
[Epoch 110 Batch 150/173] avg loss 0.000818775, throughput 2.22272K wps
Begin Testing...
[Epoch 110] train avg loss 0.000883608, dev acc 0.8154, dev avg loss 0.496178, throughput 2.23252K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/173] avg loss 0.000813818, throughput 2.2449K wps
[Epoch 111 Batch 60/173] avg loss 0.000980035, throughput 2.19763K wps
[Epoch 111 Batch 90/173] avg loss 0.000852757, throughput 2.23398K wps
[Epoch 111 Batch 120/173] avg loss 0.000947214, throughput 2.2252K wps
[Epoch 111 Batch 150/173] avg loss 0.000804311, throughput 2.23541K wps
Begin Testing...
[Epoch 111] train avg loss 0.000880117, dev acc 0.8113, dev avg loss 0.498774, throughput 2.22709K wps
[Epoch 112 Batch 30/173] avg loss 0.000780339, throughput 2.27783K wps
[Epoch 112 Batch 60/173] avg loss 0.000917582, throughput 2.23289K wps
[Epoch 112 Batch 90/173] avg loss 0.000813258, throughput 2.23016K wps
[Epoch 112 Batch 120/173] avg loss 0.000828961, throughput 2.2314K wps
[Epoch 112 Batch 150/173] avg loss 0.000935424, throughput 2.23331K wps
Begin Testing...
[Epoch 112] train avg loss 0.000851534, dev acc 0.8123, dev avg loss 0.500099, throughput 2.23843K wps
[Epoch 113 Batch 30/173] avg loss 0.000783024, throughput 2.26171K wps
[Epoch 113 Batch 60/173] avg loss 0.000845513, throughput 2.22244K wps
[Epoch 113 Batch 90/173] avg loss 0.000784268, throughput 2.23318K wps
[Epoch 113 Batch 120/173] avg loss 0.000889957, throughput 2.23866K wps
[Epoch 113 Batch 150/173] avg loss 0.00080269, throughput 2.23493K wps
Begin Testing...
[Epoch 113] train avg loss 0.000824248, dev acc 0.8123, dev avg loss 0.502066, throughput 2.23698K wps
[Epoch 114 Batch 30/173] avg loss 0.00081055, throughput 2.27767K wps
[Epoch 114 Batch 60/173] avg loss 0.000790082, throughput 2.23242K wps
[Epoch 114 Batch 90/173] avg loss 0.000749346, throughput 2.20805K wps
[Epoch 114 Batch 120/173] avg loss 0.000797722, throughput 2.23462K wps
[Epoch 114 Batch 150/173] avg loss 0.000875993, throughput 2.2197K wps
Begin Testing...
[Epoch 114] train avg loss 0.000816716, dev acc 0.8081, dev avg loss 0.506664, throughput 2.23263K wps
[Epoch 115 Batch 30/173] avg loss 0.000719767, throughput 2.28406K wps
[Epoch 115 Batch 60/173] avg loss 0.000849429, throughput 2.23316K wps
[Epoch 115 Batch 90/173] avg loss 0.000820064, throughput 2.22351K wps
[Epoch 115 Batch 120/173] avg loss 0.000800284, throughput 2.22258K wps
[Epoch 115 Batch 150/173] avg loss 0.000809852, throughput 2.23615K wps
Begin Testing...
[Epoch 115] train avg loss 0.000810073, dev acc 0.8133, dev avg loss 0.504297, throughput 2.23556K wps
[Epoch 116 Batch 30/173] avg loss 0.000821693, throughput 2.29005K wps
[Epoch 116 Batch 60/173] avg loss 0.000825292, throughput 2.23685K wps
[Epoch 116 Batch 90/173] avg loss 0.000838653, throughput 2.23362K wps
[Epoch 116 Batch 120/173] avg loss 0.000829255, throughput 2.22745K wps
[Epoch 116 Batch 150/173] avg loss 0.000784457, throughput 2.23392K wps
Begin Testing...