Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='multichannel')
Use gpu0
maximum length (in tokens): 120
Done! Tokenizing Time=0.23s, #Sentences=10000
SentimentNet(
  (embedding): Embedding(21326 -> 300, float32)
  (embedding_extend): Embedding(21326 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/162] avg loss 0.0138964, throughput 0.380024K wps
[Epoch 0 Batch 60/162] avg loss 0.0137831, throughput 2.87618K wps
[Epoch 0 Batch 90/162] avg loss 0.0133674, throughput 2.88467K wps
[Epoch 0 Batch 120/162] avg loss 0.0133749, throughput 2.87278K wps
[Epoch 0 Batch 150/162] avg loss 0.0129653, throughput 2.87305K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134443, dev acc 0.8222, dev avg loss 0.638604, throughput 1.29787K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0128311, throughput 2.93727K wps
[Epoch 1 Batch 60/162] avg loss 0.0126233, throughput 2.87243K wps
[Epoch 1 Batch 90/162] avg loss 0.0122736, throughput 2.87761K wps
[Epoch 1 Batch 120/162] avg loss 0.0120976, throughput 2.8795K wps
[Epoch 1 Batch 150/162] avg loss 0.0119754, throughput 2.87019K wps
Begin Testing...
[Epoch 1] train avg loss 0.0122875, dev acc 0.7833, dev avg loss 0.589091, throughput 2.88476K wps
[Epoch 2 Batch 30/162] avg loss 0.0115491, throughput 2.95224K wps
[Epoch 2 Batch 60/162] avg loss 0.011373, throughput 2.86627K wps
[Epoch 2 Batch 90/162] avg loss 0.0110697, throughput 2.87124K wps
[Epoch 2 Batch 120/162] avg loss 0.0109144, throughput 2.876K wps
[Epoch 2 Batch 150/162] avg loss 0.0106293, throughput 2.88139K wps
Begin Testing...
[Epoch 2] train avg loss 0.0110937, dev acc 0.8444, dev avg loss 0.520248, throughput 2.88756K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0103331, throughput 2.9475K wps
[Epoch 3 Batch 60/162] avg loss 0.010201, throughput 2.88315K wps
[Epoch 3 Batch 90/162] avg loss 0.00986648, throughput 2.87473K wps
[Epoch 3 Batch 120/162] avg loss 0.00967512, throughput 2.87998K wps
[Epoch 3 Batch 150/162] avg loss 0.00923795, throughput 2.86988K wps
Begin Testing...
[Epoch 3] train avg loss 0.00981109, dev acc 0.8578, dev avg loss 0.46179, throughput 2.88891K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.0090932, throughput 2.94135K wps
[Epoch 4 Batch 60/162] avg loss 0.00882954, throughput 2.86802K wps
[Epoch 4 Batch 90/162] avg loss 0.00873032, throughput 2.87543K wps
[Epoch 4 Batch 120/162] avg loss 0.00864487, throughput 2.87091K wps
[Epoch 4 Batch 150/162] avg loss 0.00846915, throughput 2.87025K wps
Begin Testing...
[Epoch 4] train avg loss 0.00869354, dev acc 0.8656, dev avg loss 0.415464, throughput 2.88353K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.00813413, throughput 2.93206K wps
[Epoch 5 Batch 60/162] avg loss 0.00813665, throughput 2.86653K wps
[Epoch 5 Batch 90/162] avg loss 0.00768971, throughput 2.86447K wps
[Epoch 5 Batch 120/162] avg loss 0.00747121, throughput 2.87135K wps
[Epoch 5 Batch 150/162] avg loss 0.0076455, throughput 2.87313K wps
Begin Testing...
[Epoch 5] train avg loss 0.00779033, dev acc 0.8667, dev avg loss 0.38158, throughput 2.87909K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00758568, throughput 2.93868K wps
[Epoch 6 Batch 60/162] avg loss 0.0073088, throughput 2.87464K wps
[Epoch 6 Batch 90/162] avg loss 0.00691635, throughput 2.87527K wps
[Epoch 6 Batch 120/162] avg loss 0.00699058, throughput 2.86312K wps
[Epoch 6 Batch 150/162] avg loss 0.00713405, throughput 2.8725K wps
Begin Testing...
[Epoch 6] train avg loss 0.00717096, dev acc 0.8778, dev avg loss 0.357619, throughput 2.88453K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00712791, throughput 2.9216K wps
[Epoch 7 Batch 60/162] avg loss 0.00687071, throughput 2.8574K wps
[Epoch 7 Batch 90/162] avg loss 0.00662372, throughput 2.86601K wps
[Epoch 7 Batch 120/162] avg loss 0.00619932, throughput 2.8585K wps
[Epoch 7 Batch 150/162] avg loss 0.00663813, throughput 2.85429K wps
Begin Testing...
[Epoch 7] train avg loss 0.0067061, dev acc 0.8822, dev avg loss 0.339897, throughput 2.87037K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00641577, throughput 2.9275K wps
[Epoch 8 Batch 60/162] avg loss 0.00644268, throughput 2.85121K wps
[Epoch 8 Batch 90/162] avg loss 0.0060932, throughput 2.86631K wps
[Epoch 8 Batch 120/162] avg loss 0.00630381, throughput 2.86134K wps
[Epoch 8 Batch 150/162] avg loss 0.00647859, throughput 2.86K wps
Begin Testing...
[Epoch 8] train avg loss 0.00628918, dev acc 0.8811, dev avg loss 0.325832, throughput 2.87226K wps
[Epoch 9 Batch 30/162] avg loss 0.00608647, throughput 2.92001K wps
[Epoch 9 Batch 60/162] avg loss 0.00609688, throughput 2.8618K wps
[Epoch 9 Batch 90/162] avg loss 0.00621205, throughput 2.85325K wps
[Epoch 9 Batch 120/162] avg loss 0.00600907, throughput 2.86159K wps
[Epoch 9 Batch 150/162] avg loss 0.00573692, throughput 2.86344K wps
Begin Testing...
[Epoch 9] train avg loss 0.00600935, dev acc 0.8878, dev avg loss 0.314243, throughput 2.87031K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00551543, throughput 2.92522K wps
[Epoch 10 Batch 60/162] avg loss 0.00567959, throughput 2.85433K wps
[Epoch 10 Batch 90/162] avg loss 0.00588485, throughput 2.85746K wps
[Epoch 10 Batch 120/162] avg loss 0.00590448, throughput 2.84933K wps
[Epoch 10 Batch 150/162] avg loss 0.00561051, throughput 2.85152K wps
Begin Testing...
[Epoch 10] train avg loss 0.00569496, dev acc 0.8889, dev avg loss 0.304534, throughput 2.86622K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00550292, throughput 2.92553K wps
[Epoch 11 Batch 60/162] avg loss 0.00539554, throughput 2.8548K wps
[Epoch 11 Batch 90/162] avg loss 0.00549096, throughput 2.84922K wps
[Epoch 11 Batch 120/162] avg loss 0.00564696, throughput 2.84192K wps
[Epoch 11 Batch 150/162] avg loss 0.0052636, throughput 2.85888K wps
Begin Testing...
[Epoch 11] train avg loss 0.005459, dev acc 0.8944, dev avg loss 0.296574, throughput 2.86538K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00546465, throughput 2.92037K wps
[Epoch 12 Batch 60/162] avg loss 0.00535391, throughput 2.85136K wps
[Epoch 12 Batch 90/162] avg loss 0.00502935, throughput 2.86289K wps
[Epoch 12 Batch 120/162] avg loss 0.00493186, throughput 2.86078K wps
[Epoch 12 Batch 150/162] avg loss 0.00561903, throughput 2.85756K wps
Begin Testing...
[Epoch 12] train avg loss 0.00528647, dev acc 0.8944, dev avg loss 0.288418, throughput 2.86914K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00518073, throughput 2.92249K wps
[Epoch 13 Batch 60/162] avg loss 0.00513369, throughput 2.86416K wps
[Epoch 13 Batch 90/162] avg loss 0.00499206, throughput 2.85946K wps
[Epoch 13 Batch 120/162] avg loss 0.00525506, throughput 2.8646K wps
[Epoch 13 Batch 150/162] avg loss 0.00476097, throughput 2.84729K wps
Begin Testing...
[Epoch 13] train avg loss 0.00507242, dev acc 0.8978, dev avg loss 0.281578, throughput 2.86985K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00481947, throughput 2.92018K wps
[Epoch 14 Batch 60/162] avg loss 0.00472878, throughput 2.85724K wps
[Epoch 14 Batch 90/162] avg loss 0.00514209, throughput 2.85264K wps
[Epoch 14 Batch 120/162] avg loss 0.00471546, throughput 2.848K wps
[Epoch 14 Batch 150/162] avg loss 0.00484997, throughput 2.85418K wps
Begin Testing...
[Epoch 14] train avg loss 0.00485347, dev acc 0.8989, dev avg loss 0.276625, throughput 2.86623K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00481464, throughput 2.91748K wps
[Epoch 15 Batch 60/162] avg loss 0.00457865, throughput 2.84805K wps
[Epoch 15 Batch 90/162] avg loss 0.00476256, throughput 2.86356K wps
[Epoch 15 Batch 120/162] avg loss 0.0045796, throughput 2.85988K wps
[Epoch 15 Batch 150/162] avg loss 0.00479902, throughput 2.86236K wps
Begin Testing...
[Epoch 15] train avg loss 0.0047182, dev acc 0.9011, dev avg loss 0.271143, throughput 2.8697K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00435743, throughput 2.91524K wps
[Epoch 16 Batch 60/162] avg loss 0.00437217, throughput 2.85491K wps
[Epoch 16 Batch 90/162] avg loss 0.00512718, throughput 2.8594K wps
[Epoch 16 Batch 120/162] avg loss 0.0045019, throughput 2.85587K wps
[Epoch 16 Batch 150/162] avg loss 0.00473681, throughput 2.84722K wps
Begin Testing...
[Epoch 16] train avg loss 0.00459288, dev acc 0.9011, dev avg loss 0.266808, throughput 2.86497K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00449275, throughput 2.92351K wps
[Epoch 17 Batch 60/162] avg loss 0.00477214, throughput 2.85074K wps
[Epoch 17 Batch 90/162] avg loss 0.00483297, throughput 2.84278K wps
[Epoch 17 Batch 120/162] avg loss 0.00425088, throughput 2.84948K wps
[Epoch 17 Batch 150/162] avg loss 0.00435331, throughput 2.84833K wps
Begin Testing...
[Epoch 17] train avg loss 0.00451293, dev acc 0.9000, dev avg loss 0.263152, throughput 2.86188K wps
[Epoch 18 Batch 30/162] avg loss 0.00430086, throughput 2.93031K wps
[Epoch 18 Batch 60/162] avg loss 0.00419948, throughput 2.85798K wps
[Epoch 18 Batch 90/162] avg loss 0.00427769, throughput 2.84939K wps
[Epoch 18 Batch 120/162] avg loss 0.00466797, throughput 2.84492K wps
[Epoch 18 Batch 150/162] avg loss 0.00384848, throughput 2.82612K wps
Begin Testing...
[Epoch 18] train avg loss 0.00425581, dev acc 0.9011, dev avg loss 0.259164, throughput 2.85911K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00420898, throughput 2.91201K wps
[Epoch 19 Batch 60/162] avg loss 0.0038842, throughput 2.86071K wps
[Epoch 19 Batch 90/162] avg loss 0.00418603, throughput 2.85996K wps
[Epoch 19 Batch 120/162] avg loss 0.00430973, throughput 2.85317K wps
[Epoch 19 Batch 150/162] avg loss 0.00411015, throughput 2.84526K wps
Begin Testing...
[Epoch 19] train avg loss 0.00413265, dev acc 0.9000, dev avg loss 0.25581, throughput 2.86409K wps
[Epoch 20 Batch 30/162] avg loss 0.00393577, throughput 2.91286K wps
[Epoch 20 Batch 60/162] avg loss 0.00384582, throughput 2.8495K wps
[Epoch 20 Batch 90/162] avg loss 0.00391975, throughput 2.85073K wps
[Epoch 20 Batch 120/162] avg loss 0.00412532, throughput 2.85025K wps
[Epoch 20 Batch 150/162] avg loss 0.00438643, throughput 2.85866K wps
Begin Testing...
[Epoch 20] train avg loss 0.00401859, dev acc 0.9011, dev avg loss 0.253072, throughput 2.86393K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00422246, throughput 2.91674K wps
[Epoch 21 Batch 60/162] avg loss 0.00389015, throughput 2.85765K wps
[Epoch 21 Batch 90/162] avg loss 0.00371139, throughput 2.84156K wps
[Epoch 21 Batch 120/162] avg loss 0.00368388, throughput 2.83665K wps
[Epoch 21 Batch 150/162] avg loss 0.0039363, throughput 2.84601K wps
Begin Testing...
[Epoch 21] train avg loss 0.00391396, dev acc 0.9022, dev avg loss 0.250953, throughput 2.85915K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00380837, throughput 2.9133K wps
[Epoch 22 Batch 60/162] avg loss 0.00370024, throughput 2.85283K wps
[Epoch 22 Batch 90/162] avg loss 0.00381399, throughput 2.84094K wps
[Epoch 22 Batch 120/162] avg loss 0.00360247, throughput 2.8455K wps
[Epoch 22 Batch 150/162] avg loss 0.0040084, throughput 2.84163K wps
Begin Testing...
[Epoch 22] train avg loss 0.00377755, dev acc 0.9011, dev avg loss 0.248611, throughput 2.85775K wps
[Epoch 23 Batch 30/162] avg loss 0.00380406, throughput 2.91902K wps
[Epoch 23 Batch 60/162] avg loss 0.00358258, throughput 2.84738K wps
[Epoch 23 Batch 90/162] avg loss 0.00344865, throughput 2.85264K wps
[Epoch 23 Batch 120/162] avg loss 0.00380507, throughput 2.83859K wps
[Epoch 23 Batch 150/162] avg loss 0.00373859, throughput 2.84116K wps
Begin Testing...
[Epoch 23] train avg loss 0.00364694, dev acc 0.9067, dev avg loss 0.247619, throughput 2.85894K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/162] avg loss 0.00369344, throughput 2.91002K wps
[Epoch 24 Batch 60/162] avg loss 0.00341116, throughput 2.85587K wps
[Epoch 24 Batch 90/162] avg loss 0.0035034, throughput 2.85194K wps
[Epoch 24 Batch 120/162] avg loss 0.00383756, throughput 2.85388K wps
[Epoch 24 Batch 150/162] avg loss 0.0035659, throughput 2.8438K wps
Begin Testing...
[Epoch 24] train avg loss 0.00363778, dev acc 0.9044, dev avg loss 0.244353, throughput 2.86106K wps
[Epoch 25 Batch 30/162] avg loss 0.00325637, throughput 2.90256K wps
[Epoch 25 Batch 60/162] avg loss 0.00357636, throughput 2.84345K wps
[Epoch 25 Batch 90/162] avg loss 0.00351816, throughput 2.8565K wps
[Epoch 25 Batch 120/162] avg loss 0.00364578, throughput 2.85875K wps
[Epoch 25 Batch 150/162] avg loss 0.0034372, throughput 2.85599K wps
Begin Testing...
[Epoch 25] train avg loss 0.00347727, dev acc 0.9033, dev avg loss 0.242744, throughput 2.86182K wps
[Epoch 26 Batch 30/162] avg loss 0.00342192, throughput 2.91705K wps
[Epoch 26 Batch 60/162] avg loss 0.00303167, throughput 2.84233K wps
[Epoch 26 Batch 90/162] avg loss 0.00326156, throughput 2.84641K wps
[Epoch 26 Batch 120/162] avg loss 0.0036265, throughput 2.8515K wps
[Epoch 26 Batch 150/162] avg loss 0.00357389, throughput 2.85148K wps
Begin Testing...
[Epoch 26] train avg loss 0.00340479, dev acc 0.9022, dev avg loss 0.240839, throughput 2.85985K wps
[Epoch 27 Batch 30/162] avg loss 0.00327644, throughput 2.92387K wps
[Epoch 27 Batch 60/162] avg loss 0.00322794, throughput 2.84152K wps
[Epoch 27 Batch 90/162] avg loss 0.00337651, throughput 2.84169K wps
[Epoch 27 Batch 120/162] avg loss 0.00330357, throughput 2.85525K wps
[Epoch 27 Batch 150/162] avg loss 0.00330846, throughput 2.85538K wps
Begin Testing...
[Epoch 27] train avg loss 0.0032914, dev acc 0.9033, dev avg loss 0.239991, throughput 2.8609K wps
[Epoch 28 Batch 30/162] avg loss 0.00293917, throughput 2.90614K wps
[Epoch 28 Batch 60/162] avg loss 0.00335869, throughput 2.84474K wps
[Epoch 28 Batch 90/162] avg loss 0.00281786, throughput 2.84831K wps
[Epoch 28 Batch 120/162] avg loss 0.00331899, throughput 2.84263K wps
[Epoch 28 Batch 150/162] avg loss 0.0032328, throughput 2.84101K wps
Begin Testing...
[Epoch 28] train avg loss 0.00314656, dev acc 0.9022, dev avg loss 0.23766, throughput 2.8561K wps
[Epoch 29 Batch 30/162] avg loss 0.00340364, throughput 2.91114K wps
[Epoch 29 Batch 60/162] avg loss 0.00311617, throughput 2.85043K wps
[Epoch 29 Batch 90/162] avg loss 0.00300172, throughput 2.84253K wps
[Epoch 29 Batch 120/162] avg loss 0.00324662, throughput 2.84397K wps
[Epoch 29 Batch 150/162] avg loss 0.00305086, throughput 2.84417K wps
Begin Testing...
[Epoch 29] train avg loss 0.00312763, dev acc 0.9044, dev avg loss 0.236555, throughput 2.8577K wps
[Epoch 30 Batch 30/162] avg loss 0.00326289, throughput 2.91414K wps
[Epoch 30 Batch 60/162] avg loss 0.00278486, throughput 2.85724K wps
[Epoch 30 Batch 90/162] avg loss 0.00294169, throughput 2.85068K wps
[Epoch 30 Batch 120/162] avg loss 0.00306326, throughput 2.83032K wps
[Epoch 30 Batch 150/162] avg loss 0.00275363, throughput 2.83282K wps
Begin Testing...
[Epoch 30] train avg loss 0.00298309, dev acc 0.9033, dev avg loss 0.235307, throughput 2.85538K wps
[Epoch 31 Batch 30/162] avg loss 0.0029683, throughput 2.91721K wps
[Epoch 31 Batch 60/162] avg loss 0.00311114, throughput 2.85407K wps
[Epoch 31 Batch 90/162] avg loss 0.00271694, throughput 2.84864K wps
[Epoch 31 Batch 120/162] avg loss 0.00303651, throughput 2.84671K wps
[Epoch 31 Batch 150/162] avg loss 0.00292348, throughput 2.8427K wps
Begin Testing...
[Epoch 31] train avg loss 0.00294117, dev acc 0.9011, dev avg loss 0.234765, throughput 2.86083K wps
[Epoch 32 Batch 30/162] avg loss 0.00285044, throughput 2.9153K wps
[Epoch 32 Batch 60/162] avg loss 0.00298986, throughput 2.85618K wps
[Epoch 32 Batch 90/162] avg loss 0.00287308, throughput 2.85304K wps
[Epoch 32 Batch 120/162] avg loss 0.00290605, throughput 2.85568K wps
[Epoch 32 Batch 150/162] avg loss 0.00282027, throughput 2.8399K wps
Begin Testing...
[Epoch 32] train avg loss 0.00289567, dev acc 0.9011, dev avg loss 0.233952, throughput 2.8626K wps
[Epoch 33 Batch 30/162] avg loss 0.00272429, throughput 2.90335K wps
[Epoch 33 Batch 60/162] avg loss 0.00246595, throughput 2.84678K wps
[Epoch 33 Batch 90/162] avg loss 0.00314286, throughput 2.84563K wps
[Epoch 33 Batch 120/162] avg loss 0.0027371, throughput 2.84912K wps
[Epoch 33 Batch 150/162] avg loss 0.00278191, throughput 2.84058K wps
Begin Testing...
[Epoch 33] train avg loss 0.00271874, dev acc 0.9078, dev avg loss 0.235037, throughput 2.85573K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00269627, throughput 2.91398K wps
[Epoch 34 Batch 60/162] avg loss 0.00261127, throughput 2.84249K wps
[Epoch 34 Batch 90/162] avg loss 0.00276196, throughput 2.84896K wps
[Epoch 34 Batch 120/162] avg loss 0.00233479, throughput 2.84725K wps
[Epoch 34 Batch 150/162] avg loss 0.00276104, throughput 2.84161K wps
Begin Testing...
[Epoch 34] train avg loss 0.00266089, dev acc 0.9011, dev avg loss 0.231487, throughput 2.85688K wps
[Epoch 35 Batch 30/162] avg loss 0.00264859, throughput 2.911K wps
[Epoch 35 Batch 60/162] avg loss 0.00262733, throughput 2.84365K wps
[Epoch 35 Batch 90/162] avg loss 0.00260301, throughput 2.83675K wps
[Epoch 35 Batch 120/162] avg loss 0.00265155, throughput 2.85189K wps
[Epoch 35 Batch 150/162] avg loss 0.00240103, throughput 2.83789K wps
Begin Testing...
[Epoch 35] train avg loss 0.00260775, dev acc 0.9011, dev avg loss 0.230703, throughput 2.85477K wps
[Epoch 36 Batch 30/162] avg loss 0.0026211, throughput 2.90402K wps
[Epoch 36 Batch 60/162] avg loss 0.00262029, throughput 2.83496K wps
[Epoch 36 Batch 90/162] avg loss 0.00265928, throughput 2.84401K wps
[Epoch 36 Batch 120/162] avg loss 0.00228246, throughput 2.84515K wps
[Epoch 36 Batch 150/162] avg loss 0.00253621, throughput 2.8524K wps
Begin Testing...
[Epoch 36] train avg loss 0.00252997, dev acc 0.9089, dev avg loss 0.23292, throughput 2.85491K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00231633, throughput 2.90776K wps
[Epoch 37 Batch 60/162] avg loss 0.00250297, throughput 2.8525K wps
[Epoch 37 Batch 90/162] avg loss 0.00245545, throughput 2.84787K wps
[Epoch 37 Batch 120/162] avg loss 0.0026351, throughput 2.84928K wps
[Epoch 37 Batch 150/162] avg loss 0.00258049, throughput 2.84203K wps
Begin Testing...
[Epoch 37] train avg loss 0.00248686, dev acc 0.9044, dev avg loss 0.230062, throughput 2.85718K wps
[Epoch 38 Batch 30/162] avg loss 0.00232289, throughput 2.91189K wps
[Epoch 38 Batch 60/162] avg loss 0.00248218, throughput 2.83421K wps
[Epoch 38 Batch 90/162] avg loss 0.00242746, throughput 2.84942K wps
[Epoch 38 Batch 120/162] avg loss 0.00232132, throughput 2.8256K wps
[Epoch 38 Batch 150/162] avg loss 0.00252168, throughput 2.83975K wps
Begin Testing...
[Epoch 38] train avg loss 0.00244033, dev acc 0.9022, dev avg loss 0.228842, throughput 2.85187K wps
[Epoch 39 Batch 30/162] avg loss 0.00237648, throughput 2.90011K wps
[Epoch 39 Batch 60/162] avg loss 0.00231659, throughput 2.84885K wps
[Epoch 39 Batch 90/162] avg loss 0.00226327, throughput 2.83789K wps
[Epoch 39 Batch 120/162] avg loss 0.00245943, throughput 2.83289K wps
[Epoch 39 Batch 150/162] avg loss 0.00222034, throughput 2.83702K wps
Begin Testing...
[Epoch 39] train avg loss 0.00234437, dev acc 0.9022, dev avg loss 0.228772, throughput 2.85157K wps
[Epoch 40 Batch 30/162] avg loss 0.00230565, throughput 2.9128K wps
[Epoch 40 Batch 60/162] avg loss 0.00229289, throughput 2.84963K wps
[Epoch 40 Batch 90/162] avg loss 0.00245596, throughput 2.84491K wps
[Epoch 40 Batch 120/162] avg loss 0.00221464, throughput 2.84566K wps
[Epoch 40 Batch 150/162] avg loss 0.00221391, throughput 2.84597K wps
Begin Testing...
[Epoch 40] train avg loss 0.00230896, dev acc 0.9033, dev avg loss 0.228349, throughput 2.85798K wps
[Epoch 41 Batch 30/162] avg loss 0.0024307, throughput 2.9201K wps
[Epoch 41 Batch 60/162] avg loss 0.00203928, throughput 2.85748K wps
[Epoch 41 Batch 90/162] avg loss 0.00206588, throughput 2.85058K wps
[Epoch 41 Batch 120/162] avg loss 0.00229838, throughput 2.84417K wps
[Epoch 41 Batch 150/162] avg loss 0.00227806, throughput 2.8397K wps
Begin Testing...
[Epoch 41] train avg loss 0.00221936, dev acc 0.9067, dev avg loss 0.228118, throughput 2.86059K wps
[Epoch 42 Batch 30/162] avg loss 0.00215535, throughput 2.91609K wps
[Epoch 42 Batch 60/162] avg loss 0.00212562, throughput 2.8568K wps
[Epoch 42 Batch 90/162] avg loss 0.00191348, throughput 2.85264K wps
[Epoch 42 Batch 120/162] avg loss 0.00221609, throughput 2.83954K wps
[Epoch 42 Batch 150/162] avg loss 0.0023602, throughput 2.84435K wps
Begin Testing...
[Epoch 42] train avg loss 0.00217235, dev acc 0.9011, dev avg loss 0.227928, throughput 2.86145K wps
[Epoch 43 Batch 30/162] avg loss 0.00215674, throughput 2.90451K wps
[Epoch 43 Batch 60/162] avg loss 0.00202348, throughput 2.84345K wps
[Epoch 43 Batch 90/162] avg loss 0.00205478, throughput 2.8503K wps
[Epoch 43 Batch 120/162] avg loss 0.00199039, throughput 2.84298K wps
[Epoch 43 Batch 150/162] avg loss 0.00203692, throughput 2.84346K wps
Begin Testing...
[Epoch 43] train avg loss 0.00205275, dev acc 0.9033, dev avg loss 0.227698, throughput 2.85572K wps
[Epoch 44 Batch 30/162] avg loss 0.00200406, throughput 2.90092K wps
[Epoch 44 Batch 60/162] avg loss 0.00209073, throughput 2.84431K wps
[Epoch 44 Batch 90/162] avg loss 0.00223816, throughput 2.8316K wps
[Epoch 44 Batch 120/162] avg loss 0.00191899, throughput 2.83392K wps
[Epoch 44 Batch 150/162] avg loss 0.00210787, throughput 2.83689K wps
Begin Testing...
[Epoch 44] train avg loss 0.00204715, dev acc 0.9033, dev avg loss 0.227732, throughput 2.84836K wps
[Epoch 45 Batch 30/162] avg loss 0.00207968, throughput 2.91067K wps
[Epoch 45 Batch 60/162] avg loss 0.00212122, throughput 2.84554K wps
[Epoch 45 Batch 90/162] avg loss 0.00205177, throughput 2.83681K wps
[Epoch 45 Batch 120/162] avg loss 0.0021031, throughput 2.84666K wps
[Epoch 45 Batch 150/162] avg loss 0.00187084, throughput 2.8431K wps
Begin Testing...
[Epoch 45] train avg loss 0.00203079, dev acc 0.9056, dev avg loss 0.226374, throughput 2.85476K wps
[Epoch 46 Batch 30/162] avg loss 0.00209566, throughput 2.91496K wps
[Epoch 46 Batch 60/162] avg loss 0.00181338, throughput 2.841K wps
[Epoch 46 Batch 90/162] avg loss 0.0019891, throughput 2.84367K wps
[Epoch 46 Batch 120/162] avg loss 0.00189966, throughput 2.83885K wps
[Epoch 46 Batch 150/162] avg loss 0.00198596, throughput 2.82662K wps
Begin Testing...
[Epoch 46] train avg loss 0.00195204, dev acc 0.9067, dev avg loss 0.228293, throughput 2.85167K wps
[Epoch 47 Batch 30/162] avg loss 0.00177321, throughput 2.9056K wps
[Epoch 47 Batch 60/162] avg loss 0.00195287, throughput 2.82933K wps
[Epoch 47 Batch 90/162] avg loss 0.00193994, throughput 2.83336K wps
[Epoch 47 Batch 120/162] avg loss 0.00194607, throughput 2.83812K wps
[Epoch 47 Batch 150/162] avg loss 0.00171768, throughput 2.84913K wps
Begin Testing...
[Epoch 47] train avg loss 0.00185224, dev acc 0.9056, dev avg loss 0.227443, throughput 2.85099K wps
[Epoch 48 Batch 30/162] avg loss 0.00181275, throughput 2.89716K wps
[Epoch 48 Batch 60/162] avg loss 0.00193769, throughput 2.85171K wps
[Epoch 48 Batch 90/162] avg loss 0.00180902, throughput 2.8378K wps
[Epoch 48 Batch 120/162] avg loss 0.0019879, throughput 2.83443K wps
[Epoch 48 Batch 150/162] avg loss 0.00190835, throughput 2.84519K wps
Begin Testing...
[Epoch 48] train avg loss 0.00186112, dev acc 0.9044, dev avg loss 0.226782, throughput 2.8528K wps
[Epoch 49 Batch 30/162] avg loss 0.00174752, throughput 2.91749K wps
[Epoch 49 Batch 60/162] avg loss 0.00184221, throughput 2.8428K wps
[Epoch 49 Batch 90/162] avg loss 0.00180921, throughput 2.83457K wps
[Epoch 49 Batch 120/162] avg loss 0.00164791, throughput 2.84146K wps
[Epoch 49 Batch 150/162] avg loss 0.00186182, throughput 2.85559K wps
Begin Testing...
[Epoch 49] train avg loss 0.00177446, dev acc 0.9044, dev avg loss 0.226035, throughput 2.85764K wps
[Epoch 50 Batch 30/162] avg loss 0.00179791, throughput 2.9093K wps
[Epoch 50 Batch 60/162] avg loss 0.00173695, throughput 2.85201K wps
[Epoch 50 Batch 90/162] avg loss 0.0017309, throughput 2.84664K wps
[Epoch 50 Batch 120/162] avg loss 0.00193608, throughput 2.8413K wps
[Epoch 50 Batch 150/162] avg loss 0.00157212, throughput 2.84037K wps
Begin Testing...
[Epoch 50] train avg loss 0.0017583, dev acc 0.9044, dev avg loss 0.226271, throughput 2.85704K wps
[Epoch 51 Batch 30/162] avg loss 0.00157371, throughput 2.90069K wps
[Epoch 51 Batch 60/162] avg loss 0.00174001, throughput 2.84322K wps
[Epoch 51 Batch 90/162] avg loss 0.00166661, throughput 2.83983K wps
[Epoch 51 Batch 120/162] avg loss 0.00166755, throughput 2.843K wps
[Epoch 51 Batch 150/162] avg loss 0.0016713, throughput 2.84693K wps
Begin Testing...
[Epoch 51] train avg loss 0.00166629, dev acc 0.9044, dev avg loss 0.226261, throughput 2.85363K wps
[Epoch 52 Batch 30/162] avg loss 0.00167653, throughput 2.90778K wps
[Epoch 52 Batch 60/162] avg loss 0.00155208, throughput 2.85632K wps
[Epoch 52 Batch 90/162] avg loss 0.00181021, throughput 2.8328K wps
[Epoch 52 Batch 120/162] avg loss 0.00171234, throughput 2.84406K wps
[Epoch 52 Batch 150/162] avg loss 0.00154559, throughput 2.83784K wps
Begin Testing...
[Epoch 52] train avg loss 0.00165991, dev acc 0.9044, dev avg loss 0.226556, throughput 2.85495K wps
[Epoch 53 Batch 30/162] avg loss 0.0016806, throughput 2.90409K wps
[Epoch 53 Batch 60/162] avg loss 0.0017122, throughput 2.82664K wps
[Epoch 53 Batch 90/162] avg loss 0.0013039, throughput 2.84265K wps
[Epoch 53 Batch 120/162] avg loss 0.00170737, throughput 2.83495K wps
[Epoch 53 Batch 150/162] avg loss 0.00170253, throughput 2.84847K wps
Begin Testing...
[Epoch 53] train avg loss 0.0016021, dev acc 0.9044, dev avg loss 0.226534, throughput 2.84826K wps
[Epoch 54 Batch 30/162] avg loss 0.00142128, throughput 2.90243K wps
[Epoch 54 Batch 60/162] avg loss 0.00151494, throughput 2.8449K wps
[Epoch 54 Batch 90/162] avg loss 0.00170093, throughput 2.85423K wps
[Epoch 54 Batch 120/162] avg loss 0.00143971, throughput 2.85806K wps
[Epoch 54 Batch 150/162] avg loss 0.00164008, throughput 2.83072K wps
Begin Testing...
[Epoch 54] train avg loss 0.00152627, dev acc 0.9044, dev avg loss 0.227801, throughput 2.8577K wps
[Epoch 55 Batch 30/162] avg loss 0.00161234, throughput 2.91638K wps
[Epoch 55 Batch 60/162] avg loss 0.00147335, throughput 2.83093K wps
[Epoch 55 Batch 90/162] avg loss 0.00149758, throughput 2.85516K wps
[Epoch 55 Batch 120/162] avg loss 0.00173805, throughput 2.84504K wps
[Epoch 55 Batch 150/162] avg loss 0.00148896, throughput 2.83336K wps
Begin Testing...
[Epoch 55] train avg loss 0.00155659, dev acc 0.9056, dev avg loss 0.227007, throughput 2.8543K wps
[Epoch 56 Batch 30/162] avg loss 0.00145421, throughput 2.90646K wps
[Epoch 56 Batch 60/162] avg loss 0.00166749, throughput 2.84706K wps
[Epoch 56 Batch 90/162] avg loss 0.00147973, throughput 2.84426K wps
[Epoch 56 Batch 120/162] avg loss 0.00155913, throughput 2.83216K wps
[Epoch 56 Batch 150/162] avg loss 0.0013572, throughput 2.84849K wps
Begin Testing...
[Epoch 56] train avg loss 0.00151245, dev acc 0.9056, dev avg loss 0.228041, throughput 2.85531K wps
[Epoch 57 Batch 30/162] avg loss 0.001536, throughput 2.90923K wps
[Epoch 57 Batch 60/162] avg loss 0.00128017, throughput 2.84505K wps
[Epoch 57 Batch 90/162] avg loss 0.00136152, throughput 2.85371K wps
[Epoch 57 Batch 120/162] avg loss 0.00143625, throughput 2.84884K wps
[Epoch 57 Batch 150/162] avg loss 0.00127971, throughput 2.83854K wps
Begin Testing...
[Epoch 57] train avg loss 0.00140948, dev acc 0.9056, dev avg loss 0.22758, throughput 2.85841K wps
[Epoch 58 Batch 30/162] avg loss 0.00133701, throughput 2.90379K wps
[Epoch 58 Batch 60/162] avg loss 0.00142152, throughput 2.84346K wps
[Epoch 58 Batch 90/162] avg loss 0.0012755, throughput 2.85638K wps
[Epoch 58 Batch 120/162] avg loss 0.00144339, throughput 2.82721K wps
[Epoch 58 Batch 150/162] avg loss 0.0014658, throughput 2.82588K wps
Begin Testing...
[Epoch 58] train avg loss 0.00138407, dev acc 0.9056, dev avg loss 0.227975, throughput 2.85027K wps
[Epoch 59 Batch 30/162] avg loss 0.00139269, throughput 2.90996K wps
[Epoch 59 Batch 60/162] avg loss 0.001364, throughput 2.84579K wps
[Epoch 59 Batch 90/162] avg loss 0.00122791, throughput 2.83971K wps
[Epoch 59 Batch 120/162] avg loss 0.00143224, throughput 2.84598K wps
[Epoch 59 Batch 150/162] avg loss 0.00135992, throughput 2.85061K wps
Begin Testing...
[Epoch 59] train avg loss 0.00136698, dev acc 0.9033, dev avg loss 0.228819, throughput 2.85708K wps
[Epoch 60 Batch 30/162] avg loss 0.00128408, throughput 2.9063K wps
[Epoch 60 Batch 60/162] avg loss 0.00134202, throughput 2.82749K wps
[Epoch 60 Batch 90/162] avg loss 0.0013532, throughput 2.83965K wps
[Epoch 60 Batch 120/162] avg loss 0.00134723, throughput 2.84791K wps
[Epoch 60 Batch 150/162] avg loss 0.0012155, throughput 2.83979K wps
Begin Testing...
[Epoch 60] train avg loss 0.00130203, dev acc 0.9033, dev avg loss 0.229494, throughput 2.85008K wps
[Epoch 61 Batch 30/162] avg loss 0.00125725, throughput 2.90434K wps
[Epoch 61 Batch 60/162] avg loss 0.00135264, throughput 2.83841K wps
[Epoch 61 Batch 90/162] avg loss 0.00128029, throughput 2.83503K wps
[Epoch 61 Batch 120/162] avg loss 0.00138429, throughput 2.83953K wps
[Epoch 61 Batch 150/162] avg loss 0.00121333, throughput 2.84657K wps
Begin Testing...
[Epoch 61] train avg loss 0.00128389, dev acc 0.9056, dev avg loss 0.2292, throughput 2.85236K wps
[Epoch 62 Batch 30/162] avg loss 0.00123856, throughput 2.91338K wps
[Epoch 62 Batch 60/162] avg loss 0.00128401, throughput 2.84888K wps
[Epoch 62 Batch 90/162] avg loss 0.00127681, throughput 2.84712K wps
[Epoch 62 Batch 120/162] avg loss 0.00123668, throughput 2.83745K wps
[Epoch 62 Batch 150/162] avg loss 0.00128042, throughput 2.85682K wps
Begin Testing...
[Epoch 62] train avg loss 0.00125855, dev acc 0.9056, dev avg loss 0.229627, throughput 2.86056K wps
[Epoch 63 Batch 30/162] avg loss 0.00123118, throughput 2.90402K wps
[Epoch 63 Batch 60/162] avg loss 0.00123799, throughput 2.84184K wps
[Epoch 63 Batch 90/162] avg loss 0.0013235, throughput 2.83703K wps
[Epoch 63 Batch 120/162] avg loss 0.00114718, throughput 2.84486K wps
[Epoch 63 Batch 150/162] avg loss 0.00116034, throughput 2.85093K wps
Begin Testing...
[Epoch 63] train avg loss 0.0012114, dev acc 0.9056, dev avg loss 0.23032, throughput 2.85512K wps
[Epoch 64 Batch 30/162] avg loss 0.00124789, throughput 2.91469K wps
[Epoch 64 Batch 60/162] avg loss 0.00116881, throughput 2.8479K wps
[Epoch 64 Batch 90/162] avg loss 0.00114881, throughput 2.84149K wps
[Epoch 64 Batch 120/162] avg loss 0.00113747, throughput 2.83722K wps
[Epoch 64 Batch 150/162] avg loss 0.00119606, throughput 2.84526K wps
Begin Testing...
[Epoch 64] train avg loss 0.00116225, dev acc 0.9056, dev avg loss 0.229784, throughput 2.85513K wps
[Epoch 65 Batch 30/162] avg loss 0.00110441, throughput 2.91725K wps
[Epoch 65 Batch 60/162] avg loss 0.0011162, throughput 2.84629K wps
[Epoch 65 Batch 90/162] avg loss 0.0012144, throughput 2.84155K wps
[Epoch 65 Batch 120/162] avg loss 0.00128643, throughput 2.84296K wps
[Epoch 65 Batch 150/162] avg loss 0.0010892, throughput 2.84232K wps
Begin Testing...
[Epoch 65] train avg loss 0.00115905, dev acc 0.9078, dev avg loss 0.229836, throughput 2.85768K wps
[Epoch 66 Batch 30/162] avg loss 0.00112366, throughput 2.90305K wps
[Epoch 66 Batch 60/162] avg loss 0.00116896, throughput 2.84383K wps
[Epoch 66 Batch 90/162] avg loss 0.00100442, throughput 2.83938K wps
[Epoch 66 Batch 120/162] avg loss 0.00125454, throughput 2.85267K wps
[Epoch 66 Batch 150/162] avg loss 0.00100297, throughput 2.85048K wps
Begin Testing...
[Epoch 66] train avg loss 0.00111331, dev acc 0.9044, dev avg loss 0.23046, throughput 2.85596K wps
[Epoch 67 Batch 30/162] avg loss 0.000869507, throughput 2.9126K wps
[Epoch 67 Batch 60/162] avg loss 0.00130871, throughput 2.84791K wps
[Epoch 67 Batch 90/162] avg loss 0.00112524, throughput 2.83227K wps
[Epoch 67 Batch 120/162] avg loss 0.0010088, throughput 2.8537K wps
[Epoch 67 Batch 150/162] avg loss 0.00109492, throughput 2.84712K wps
Begin Testing...
[Epoch 67] train avg loss 0.00108939, dev acc 0.9067, dev avg loss 0.230569, throughput 2.85571K wps
[Epoch 68 Batch 30/162] avg loss 0.00104163, throughput 2.91644K wps
[Epoch 68 Batch 60/162] avg loss 0.0010048, throughput 2.84211K wps
[Epoch 68 Batch 90/162] avg loss 0.0010283, throughput 2.84313K wps
[Epoch 68 Batch 120/162] avg loss 0.00106672, throughput 2.84308K wps
[Epoch 68 Batch 150/162] avg loss 0.00104462, throughput 2.86109K wps
Begin Testing...
[Epoch 68] train avg loss 0.00105862, dev acc 0.9078, dev avg loss 0.23204, throughput 2.86078K wps
[Epoch 69 Batch 30/162] avg loss 0.00102807, throughput 2.91222K wps
[Epoch 69 Batch 60/162] avg loss 0.00105117, throughput 2.84644K wps
[Epoch 69 Batch 90/162] avg loss 0.00100211, throughput 2.82089K wps
[Epoch 69 Batch 120/162] avg loss 0.000927151, throughput 2.83955K wps
[Epoch 69 Batch 150/162] avg loss 0.00111232, throughput 2.83091K wps
Begin Testing...
[Epoch 69] train avg loss 0.00103169, dev acc 0.9067, dev avg loss 0.232467, throughput 2.8483K wps
[Epoch 70 Batch 30/162] avg loss 0.000963729, throughput 2.90687K wps
[Epoch 70 Batch 60/162] avg loss 0.00101922, throughput 2.84219K wps
[Epoch 70 Batch 90/162] avg loss 0.00101138, throughput 2.85257K wps
[Epoch 70 Batch 120/162] avg loss 0.000890679, throughput 2.85067K wps
[Epoch 70 Batch 150/162] avg loss 0.00104326, throughput 2.84168K wps
Begin Testing...
[Epoch 70] train avg loss 0.000982, dev acc 0.9067, dev avg loss 0.232434, throughput 2.85752K wps
[Epoch 71 Batch 30/162] avg loss 0.000850753, throughput 2.89637K wps
[Epoch 71 Batch 60/162] avg loss 0.000956453, throughput 2.84786K wps
[Epoch 71 Batch 90/162] avg loss 0.00109514, throughput 2.84666K wps
[Epoch 71 Batch 120/162] avg loss 0.000975199, throughput 2.85308K wps
[Epoch 71 Batch 150/162] avg loss 0.0010034, throughput 2.83541K wps
Begin Testing...
[Epoch 71] train avg loss 0.000970863, dev acc 0.9056, dev avg loss 0.232154, throughput 2.85551K wps
[Epoch 72 Batch 30/162] avg loss 0.00097514, throughput 2.90876K wps
[Epoch 72 Batch 60/162] avg loss 0.000959059, throughput 2.83813K wps
[Epoch 72 Batch 90/162] avg loss 0.00111825, throughput 2.8505K wps
[Epoch 72 Batch 120/162] avg loss 0.00096063, throughput 2.84846K wps
[Epoch 72 Batch 150/162] avg loss 0.000947403, throughput 2.83374K wps
Begin Testing...
[Epoch 72] train avg loss 0.000990612, dev acc 0.9067, dev avg loss 0.233018, throughput 2.85548K wps
[Epoch 73 Batch 30/162] avg loss 0.00088134, throughput 2.90648K wps
[Epoch 73 Batch 60/162] avg loss 0.000957602, throughput 2.85413K wps
[Epoch 73 Batch 90/162] avg loss 0.000968219, throughput 2.85478K wps
[Epoch 73 Batch 120/162] avg loss 0.000985348, throughput 2.85496K wps
[Epoch 73 Batch 150/162] avg loss 0.000857551, throughput 2.83593K wps
Begin Testing...
[Epoch 73] train avg loss 0.000935751, dev acc 0.9067, dev avg loss 0.233156, throughput 2.85895K wps
[Epoch 74 Batch 30/162] avg loss 0.000884842, throughput 2.90564K wps
[Epoch 74 Batch 60/162] avg loss 0.000933415, throughput 2.83504K wps
[Epoch 74 Batch 90/162] avg loss 0.000964425, throughput 2.85372K wps
[Epoch 74 Batch 120/162] avg loss 0.000910303, throughput 2.8401K wps
[Epoch 74 Batch 150/162] avg loss 0.000869134, throughput 2.83635K wps
Begin Testing...
[Epoch 74] train avg loss 0.000937268, dev acc 0.9067, dev avg loss 0.233367, throughput 2.85314K wps
[Epoch 75 Batch 30/162] avg loss 0.000836116, throughput 2.90709K wps
[Epoch 75 Batch 60/162] avg loss 0.000818589, throughput 2.85347K wps
[Epoch 75 Batch 90/162] avg loss 0.000945318, throughput 2.85349K wps
[Epoch 75 Batch 120/162] avg loss 0.000906659, throughput 2.84392K wps
[Epoch 75 Batch 150/162] avg loss 0.000911652, throughput 2.83683K wps
Begin Testing...
[Epoch 75] train avg loss 0.00090878, dev acc 0.9078, dev avg loss 0.232604, throughput 2.85647K wps
[Epoch 76 Batch 30/162] avg loss 0.000980741, throughput 2.91161K wps
[Epoch 76 Batch 60/162] avg loss 0.000900739, throughput 2.84527K wps
[Epoch 76 Batch 90/162] avg loss 0.000878072, throughput 2.8345K wps
[Epoch 76 Batch 120/162] avg loss 0.000939935, throughput 2.83434K wps
[Epoch 76 Batch 150/162] avg loss 0.000846443, throughput 2.84603K wps
Begin Testing...
[Epoch 76] train avg loss 0.000894067, dev acc 0.9067, dev avg loss 0.234034, throughput 2.85189K wps
[Epoch 77 Batch 30/162] avg loss 0.000785796, throughput 2.90659K wps
[Epoch 77 Batch 60/162] avg loss 0.000970566, throughput 2.84982K wps
[Epoch 77 Batch 90/162] avg loss 0.000890169, throughput 2.82178K wps
[Epoch 77 Batch 120/162] avg loss 0.000949368, throughput 2.84367K wps
[Epoch 77 Batch 150/162] avg loss 0.000889429, throughput 2.83859K wps
Begin Testing...
[Epoch 77] train avg loss 0.00087749, dev acc 0.9044, dev avg loss 0.23459, throughput 2.8513K wps
[Epoch 78 Batch 30/162] avg loss 0.0008479, throughput 2.91409K wps
[Epoch 78 Batch 60/162] avg loss 0.0010054, throughput 2.83889K wps
[Epoch 78 Batch 90/162] avg loss 0.000777277, throughput 2.84569K wps
[Epoch 78 Batch 120/162] avg loss 0.000841298, throughput 2.84167K wps
[Epoch 78 Batch 150/162] avg loss 0.000802751, throughput 2.84545K wps
Begin Testing...
[Epoch 78] train avg loss 0.000835772, dev acc 0.9056, dev avg loss 0.236631, throughput 2.85514K wps
[Epoch 79 Batch 30/162] avg loss 0.00078633, throughput 2.91907K wps
[Epoch 79 Batch 60/162] avg loss 0.000813634, throughput 2.83654K wps
[Epoch 79 Batch 90/162] avg loss 0.00080347, throughput 2.84582K wps
[Epoch 79 Batch 120/162] avg loss 0.000758097, throughput 2.84103K wps
[Epoch 79 Batch 150/162] avg loss 0.00079186, throughput 2.83574K wps
Begin Testing...
[Epoch 79] train avg loss 0.000798258, dev acc 0.9078, dev avg loss 0.235525, throughput 2.85395K wps
[Epoch 80 Batch 30/162] avg loss 0.000824573, throughput 2.90626K wps
[Epoch 80 Batch 60/162] avg loss 0.000677684, throughput 2.845K wps
[Epoch 80 Batch 90/162] avg loss 0.000805826, throughput 2.84564K wps
[Epoch 80 Batch 120/162] avg loss 0.000793358, throughput 2.85111K wps
[Epoch 80 Batch 150/162] avg loss 0.000764577, throughput 2.84197K wps
Begin Testing...
[Epoch 80] train avg loss 0.000773224, dev acc 0.9056, dev avg loss 0.23625, throughput 2.85722K wps
[Epoch 81 Batch 30/162] avg loss 0.000691456, throughput 2.91032K wps
[Epoch 81 Batch 60/162] avg loss 0.000815678, throughput 2.84062K wps
[Epoch 81 Batch 90/162] avg loss 0.000769336, throughput 2.84855K wps
[Epoch 81 Batch 120/162] avg loss 0.000715061, throughput 2.83475K wps
[Epoch 81 Batch 150/162] avg loss 0.000833244, throughput 2.83594K wps
Begin Testing...
[Epoch 81] train avg loss 0.000768904, dev acc 0.9067, dev avg loss 0.236316, throughput 2.85353K wps
[Epoch 82 Batch 30/162] avg loss 0.000740418, throughput 2.90254K wps
[Epoch 82 Batch 60/162] avg loss 0.000805191, throughput 2.84354K wps
[Epoch 82 Batch 90/162] avg loss 0.000753397, throughput 2.855K wps
[Epoch 82 Batch 120/162] avg loss 0.000833845, throughput 2.83604K wps
[Epoch 82 Batch 150/162] avg loss 0.000762782, throughput 2.8354K wps
Begin Testing...
[Epoch 82] train avg loss 0.000771611, dev acc 0.9067, dev avg loss 0.236818, throughput 2.85384K wps
[Epoch 83 Batch 30/162] avg loss 0.000749452, throughput 2.9115K wps
[Epoch 83 Batch 60/162] avg loss 0.000773576, throughput 2.8535K wps
[Epoch 83 Batch 90/162] avg loss 0.00066387, throughput 2.85514K wps
[Epoch 83 Batch 120/162] avg loss 0.0007209, throughput 2.84642K wps
[Epoch 83 Batch 150/162] avg loss 0.000729355, throughput 2.83977K wps
Begin Testing...
[Epoch 83] train avg loss 0.000721758, dev acc 0.9067, dev avg loss 0.237827, throughput 2.86001K wps
[Epoch 84 Batch 30/162] avg loss 0.000813177, throughput 2.90108K wps
[Epoch 84 Batch 60/162] avg loss 0.000739156, throughput 2.84809K wps
[Epoch 84 Batch 90/162] avg loss 0.000727374, throughput 2.85105K wps
[Epoch 84 Batch 120/162] avg loss 0.000739896, throughput 2.85226K wps
[Epoch 84 Batch 150/162] avg loss 0.000702449, throughput 2.84011K wps
Begin Testing...
[Epoch 84] train avg loss 0.000742347, dev acc 0.9056, dev avg loss 0.237648, throughput 2.85725K wps
[Epoch 85 Batch 30/162] avg loss 0.000682778, throughput 2.89985K wps
[Epoch 85 Batch 60/162] avg loss 0.000735085, throughput 2.8417K wps
[Epoch 85 Batch 90/162] avg loss 0.000728808, throughput 2.85285K wps
[Epoch 85 Batch 120/162] avg loss 0.000659668, throughput 2.83689K wps
[Epoch 85 Batch 150/162] avg loss 0.000675, throughput 2.84742K wps
Begin Testing...
[Epoch 85] train avg loss 0.000699231, dev acc 0.9056, dev avg loss 0.237472, throughput 2.85488K wps
[Epoch 86 Batch 30/162] avg loss 0.000624871, throughput 2.90598K wps
[Epoch 86 Batch 60/162] avg loss 0.000777454, throughput 2.8493K wps
[Epoch 86 Batch 90/162] avg loss 0.000697018, throughput 2.84623K wps
[Epoch 86 Batch 120/162] avg loss 0.000780973, throughput 2.84772K wps
[Epoch 86 Batch 150/162] avg loss 0.00073592, throughput 2.83671K wps
Begin Testing...
[Epoch 86] train avg loss 0.000713563, dev acc 0.9078, dev avg loss 0.240011, throughput 2.85555K wps
[Epoch 87 Batch 30/162] avg loss 0.000746554, throughput 2.89838K wps
[Epoch 87 Batch 60/162] avg loss 0.000690899, throughput 2.8417K wps
[Epoch 87 Batch 90/162] avg loss 0.000650546, throughput 2.84737K wps
[Epoch 87 Batch 120/162] avg loss 0.000738004, throughput 2.84098K wps
[Epoch 87 Batch 150/162] avg loss 0.000639157, throughput 2.84153K wps
Begin Testing...
[Epoch 87] train avg loss 0.000695644, dev acc 0.9067, dev avg loss 0.238518, throughput 2.85146K wps
[Epoch 88 Batch 30/162] avg loss 0.000663644, throughput 2.91461K wps
[Epoch 88 Batch 60/162] avg loss 0.000698833, throughput 2.85347K wps
[Epoch 88 Batch 90/162] avg loss 0.000622016, throughput 2.83487K wps
[Epoch 88 Batch 120/162] avg loss 0.000623394, throughput 2.85014K wps
[Epoch 88 Batch 150/162] avg loss 0.00067001, throughput 2.83334K wps
Begin Testing...
[Epoch 88] train avg loss 0.000654295, dev acc 0.9078, dev avg loss 0.239833, throughput 2.85535K wps
[Epoch 89 Batch 30/162] avg loss 0.000658942, throughput 2.904K wps
[Epoch 89 Batch 60/162] avg loss 0.000648115, throughput 2.83506K wps
[Epoch 89 Batch 90/162] avg loss 0.000766039, throughput 2.83887K wps
[Epoch 89 Batch 120/162] avg loss 0.000799897, throughput 2.84537K wps
[Epoch 89 Batch 150/162] avg loss 0.000545742, throughput 2.83298K wps
Begin Testing...
[Epoch 89] train avg loss 0.000679738, dev acc 0.9078, dev avg loss 0.240927, throughput 2.84974K wps
[Epoch 90 Batch 30/162] avg loss 0.000621818, throughput 2.88728K wps
[Epoch 90 Batch 60/162] avg loss 0.000529324, throughput 2.84073K wps
[Epoch 90 Batch 90/162] avg loss 0.000685422, throughput 2.84354K wps
[Epoch 90 Batch 120/162] avg loss 0.000561118, throughput 2.83761K wps
[Epoch 90 Batch 150/162] avg loss 0.000633691, throughput 2.84166K wps
Begin Testing...
[Epoch 90] train avg loss 0.000611642, dev acc 0.9078, dev avg loss 0.240396, throughput 2.84903K wps
[Epoch 91 Batch 30/162] avg loss 0.000575966, throughput 2.90765K wps
[Epoch 91 Batch 60/162] avg loss 0.000633025, throughput 2.83138K wps
[Epoch 91 Batch 90/162] avg loss 0.0006193, throughput 2.84092K wps
[Epoch 91 Batch 120/162] avg loss 0.000606862, throughput 2.82941K wps
[Epoch 91 Batch 150/162] avg loss 0.000607303, throughput 2.83981K wps
Begin Testing...
[Epoch 91] train avg loss 0.000620225, dev acc 0.9078, dev avg loss 0.240517, throughput 2.8464K wps
[Epoch 92 Batch 30/162] avg loss 0.000603279, throughput 2.90229K wps
[Epoch 92 Batch 60/162] avg loss 0.000594766, throughput 2.82886K wps
[Epoch 92 Batch 90/162] avg loss 0.000585111, throughput 2.82681K wps
[Epoch 92 Batch 120/162] avg loss 0.000592244, throughput 2.83247K wps
[Epoch 92 Batch 150/162] avg loss 0.000562384, throughput 2.82779K wps
Begin Testing...
[Epoch 92] train avg loss 0.000592072, dev acc 0.9078, dev avg loss 0.241511, throughput 2.84282K wps
[Epoch 93 Batch 30/162] avg loss 0.000599908, throughput 2.89607K wps
[Epoch 93 Batch 60/162] avg loss 0.000635065, throughput 2.84667K wps
[Epoch 93 Batch 90/162] avg loss 0.000605778, throughput 2.85198K wps
[Epoch 93 Batch 120/162] avg loss 0.000649525, throughput 2.82789K wps
[Epoch 93 Batch 150/162] avg loss 0.000633871, throughput 2.84013K wps
Begin Testing...
[Epoch 93] train avg loss 0.000615697, dev acc 0.9067, dev avg loss 0.243123, throughput 2.85075K wps
[Epoch 94 Batch 30/162] avg loss 0.000577368, throughput 2.90195K wps
[Epoch 94 Batch 60/162] avg loss 0.000566524, throughput 2.84361K wps
[Epoch 94 Batch 90/162] avg loss 0.000616806, throughput 2.84907K wps
[Epoch 94 Batch 120/162] avg loss 0.000576829, throughput 2.83335K wps
[Epoch 94 Batch 150/162] avg loss 0.000612114, throughput 2.83843K wps
Begin Testing...
[Epoch 94] train avg loss 0.000590225, dev acc 0.9067, dev avg loss 0.241955, throughput 2.85194K wps
[Epoch 95 Batch 30/162] avg loss 0.000573936, throughput 2.91684K wps
[Epoch 95 Batch 60/162] avg loss 0.000508016, throughput 2.84281K wps
[Epoch 95 Batch 90/162] avg loss 0.000504827, throughput 2.83339K wps
[Epoch 95 Batch 120/162] avg loss 0.000594794, throughput 2.85127K wps
[Epoch 95 Batch 150/162] avg loss 0.00056608, throughput 2.84213K wps
Begin Testing...
[Epoch 95] train avg loss 0.000547571, dev acc 0.9078, dev avg loss 0.242879, throughput 2.85484K wps
[Epoch 96 Batch 30/162] avg loss 0.00053317, throughput 2.91718K wps
[Epoch 96 Batch 60/162] avg loss 0.000635194, throughput 2.83836K wps
[Epoch 96 Batch 90/162] avg loss 0.000564649, throughput 2.83931K wps
[Epoch 96 Batch 120/162] avg loss 0.000638812, throughput 2.82874K wps
[Epoch 96 Batch 150/162] avg loss 0.000498342, throughput 2.83608K wps
Begin Testing...
[Epoch 96] train avg loss 0.000569574, dev acc 0.9078, dev avg loss 0.244022, throughput 2.85069K wps
[Epoch 97 Batch 30/162] avg loss 0.000505255, throughput 2.90635K wps
[Epoch 97 Batch 60/162] avg loss 0.00051611, throughput 2.85383K wps
[Epoch 97 Batch 90/162] avg loss 0.000585797, throughput 2.8507K wps
[Epoch 97 Batch 120/162] avg loss 0.000588511, throughput 2.84315K wps
[Epoch 97 Batch 150/162] avg loss 0.000586873, throughput 2.83328K wps
Begin Testing...
[Epoch 97] train avg loss 0.000556432, dev acc 0.9078, dev avg loss 0.243459, throughput 2.8569K wps
[Epoch 98 Batch 30/162] avg loss 0.000513791, throughput 2.90798K wps
[Epoch 98 Batch 60/162] avg loss 0.000519807, throughput 2.8549K wps
[Epoch 98 Batch 90/162] avg loss 0.000588907, throughput 2.8513K wps
[Epoch 98 Batch 120/162] avg loss 0.00052946, throughput 2.83847K wps
[Epoch 98 Batch 150/162] avg loss 0.000618087, throughput 2.83951K wps
Begin Testing...
[Epoch 98] train avg loss 0.00056038, dev acc 0.9078, dev avg loss 0.243213, throughput 2.85657K wps
[Epoch 99 Batch 30/162] avg loss 0.000566582, throughput 2.9065K wps
[Epoch 99 Batch 60/162] avg loss 0.000523228, throughput 2.84712K wps
[Epoch 99 Batch 90/162] avg loss 0.000508791, throughput 2.831K wps
[Epoch 99 Batch 120/162] avg loss 0.000552805, throughput 2.8247K wps
[Epoch 99 Batch 150/162] avg loss 0.000496398, throughput 2.84338K wps
Begin Testing...
[Epoch 99] train avg loss 0.000535132, dev acc 0.9078, dev avg loss 0.244146, throughput 2.84854K wps
[Epoch 100 Batch 30/162] avg loss 0.000490527, throughput 2.90456K wps
[Epoch 100 Batch 60/162] avg loss 0.00052169, throughput 2.8298K wps
[Epoch 100 Batch 90/162] avg loss 0.000532691, throughput 2.84384K wps
[Epoch 100 Batch 120/162] avg loss 0.000544286, throughput 2.8468K wps
[Epoch 100 Batch 150/162] avg loss 0.000539446, throughput 2.82592K wps
Begin Testing...
[Epoch 100] train avg loss 0.000532248, dev acc 0.9067, dev avg loss 0.245179, throughput 2.84987K wps
[Epoch 101 Batch 30/162] avg loss 0.000495329, throughput 2.89563K wps
[Epoch 101 Batch 60/162] avg loss 0.000511763, throughput 2.84669K wps
[Epoch 101 Batch 90/162] avg loss 0.000554401, throughput 2.84899K wps
[Epoch 101 Batch 120/162] avg loss 0.000558458, throughput 2.82884K wps
[Epoch 101 Batch 150/162] avg loss 0.000492754, throughput 2.84665K wps
Begin Testing...
[Epoch 101] train avg loss 0.000519695, dev acc 0.9078, dev avg loss 0.245912, throughput 2.85169K wps
[Epoch 102 Batch 30/162] avg loss 0.000447408, throughput 2.904K wps
[Epoch 102 Batch 60/162] avg loss 0.000482189, throughput 2.84492K wps
[Epoch 102 Batch 90/162] avg loss 0.000516629, throughput 2.83389K wps
[Epoch 102 Batch 120/162] avg loss 0.000608285, throughput 2.82992K wps
[Epoch 102 Batch 150/162] avg loss 0.000592384, throughput 2.85795K wps
Begin Testing...
[Epoch 102] train avg loss 0.000521914, dev acc 0.9078, dev avg loss 0.245696, throughput 2.85296K wps
[Epoch 103 Batch 30/162] avg loss 0.000514724, throughput 2.91693K wps
[Epoch 103 Batch 60/162] avg loss 0.000429483, throughput 2.84255K wps
[Epoch 103 Batch 90/162] avg loss 0.000474122, throughput 2.84494K wps
[Epoch 103 Batch 120/162] avg loss 0.000521548, throughput 2.83986K wps
[Epoch 103 Batch 150/162] avg loss 0.000478067, throughput 2.83447K wps
Begin Testing...
[Epoch 103] train avg loss 0.000484127, dev acc 0.9056, dev avg loss 0.246471, throughput 2.85462K wps
[Epoch 104 Batch 30/162] avg loss 0.000495477, throughput 2.92018K wps
[Epoch 104 Batch 60/162] avg loss 0.000462472, throughput 2.8388K wps
[Epoch 104 Batch 90/162] avg loss 0.000444543, throughput 2.83733K wps
[Epoch 104 Batch 120/162] avg loss 0.000427834, throughput 2.84602K wps
[Epoch 104 Batch 150/162] avg loss 0.000520299, throughput 2.85155K wps
Begin Testing...
[Epoch 104] train avg loss 0.000472975, dev acc 0.9078, dev avg loss 0.247498, throughput 2.85638K wps
[Epoch 105 Batch 30/162] avg loss 0.000446777, throughput 2.90277K wps
[Epoch 105 Batch 60/162] avg loss 0.000514122, throughput 2.83864K wps
[Epoch 105 Batch 90/162] avg loss 0.000469261, throughput 2.84714K wps
[Epoch 105 Batch 120/162] avg loss 0.000540689, throughput 2.85396K wps
[Epoch 105 Batch 150/162] avg loss 0.000484768, throughput 2.84225K wps
Begin Testing...
[Epoch 105] train avg loss 0.000489006, dev acc 0.9078, dev avg loss 0.247648, throughput 2.85548K wps
[Epoch 106 Batch 30/162] avg loss 0.000426952, throughput 2.90936K wps
[Epoch 106 Batch 60/162] avg loss 0.000451043, throughput 2.8342K wps
[Epoch 106 Batch 90/162] avg loss 0.000482894, throughput 2.84259K wps
[Epoch 106 Batch 120/162] avg loss 0.000507432, throughput 2.82794K wps
[Epoch 106 Batch 150/162] avg loss 0.000438614, throughput 2.83883K wps
Begin Testing...
[Epoch 106] train avg loss 0.000459588, dev acc 0.9078, dev avg loss 0.248232, throughput 2.85017K wps
[Epoch 107 Batch 30/162] avg loss 0.000462362, throughput 2.88798K wps
[Epoch 107 Batch 60/162] avg loss 0.000433992, throughput 2.83843K wps
[Epoch 107 Batch 90/162] avg loss 0.000486825, throughput 2.83097K wps
[Epoch 107 Batch 120/162] avg loss 0.000459617, throughput 2.84704K wps
[Epoch 107 Batch 150/162] avg loss 0.000430023, throughput 2.84631K wps
Begin Testing...
[Epoch 107] train avg loss 0.000460054, dev acc 0.9067, dev avg loss 0.248112, throughput 2.84917K wps
[Epoch 108 Batch 30/162] avg loss 0.000480028, throughput 2.91152K wps
[Epoch 108 Batch 60/162] avg loss 0.000449376, throughput 2.84266K wps
[Epoch 108 Batch 90/162] avg loss 0.000440152, throughput 2.83403K wps
[Epoch 108 Batch 120/162] avg loss 0.00045824, throughput 2.84516K wps
[Epoch 108 Batch 150/162] avg loss 0.000449604, throughput 2.85293K wps
Begin Testing...
[Epoch 108] train avg loss 0.000456185, dev acc 0.9067, dev avg loss 0.249347, throughput 2.85712K wps
[Epoch 109 Batch 30/162] avg loss 0.000479712, throughput 2.9128K wps
[Epoch 109 Batch 60/162] avg loss 0.000475393, throughput 2.85167K wps
[Epoch 109 Batch 90/162] avg loss 0.000404161, throughput 2.84442K wps
[Epoch 109 Batch 120/162] avg loss 0.000446042, throughput 2.84286K wps
[Epoch 109 Batch 150/162] avg loss 0.000458506, throughput 2.84371K wps
Begin Testing...
[Epoch 109] train avg loss 0.000452136, dev acc 0.9056, dev avg loss 0.24909, throughput 2.85874K wps
[Epoch 110 Batch 30/162] avg loss 0.000501257, throughput 2.92359K wps
[Epoch 110 Batch 60/162] avg loss 0.00042777, throughput 2.84788K wps
[Epoch 110 Batch 90/162] avg loss 0.000390081, throughput 2.85212K wps
[Epoch 110 Batch 120/162] avg loss 0.000465108, throughput 2.82978K wps
[Epoch 110 Batch 150/162] avg loss 0.000475942, throughput 2.83687K wps
Begin Testing...
[Epoch 110] train avg loss 0.000445391, dev acc 0.9078, dev avg loss 0.249571, throughput 2.85628K wps
[Epoch 111 Batch 30/162] avg loss 0.000450902, throughput 2.90192K wps
[Epoch 111 Batch 60/162] avg loss 0.000384977, throughput 2.84883K wps
[Epoch 111 Batch 90/162] avg loss 0.000456467, throughput 2.84164K wps
[Epoch 111 Batch 120/162] avg loss 0.00048761, throughput 2.84297K wps
[Epoch 111 Batch 150/162] avg loss 0.000462717, throughput 2.83496K wps
Begin Testing...
[Epoch 111] train avg loss 0.000444498, dev acc 0.9067, dev avg loss 0.250698, throughput 2.85254K wps
[Epoch 112 Batch 30/162] avg loss 0.000414771, throughput 2.90903K wps
[Epoch 112 Batch 60/162] avg loss 0.000415639, throughput 2.82847K wps
[Epoch 112 Batch 90/162] avg loss 0.000441349, throughput 2.83687K wps
[Epoch 112 Batch 120/162] avg loss 0.000421252, throughput 2.83837K wps
[Epoch 112 Batch 150/162] avg loss 0.00039807, throughput 2.83684K wps
Begin Testing...
[Epoch 112] train avg loss 0.000416356, dev acc 0.9056, dev avg loss 0.250708, throughput 2.84932K wps
[Epoch 113 Batch 30/162] avg loss 0.000487058, throughput 2.90997K wps
[Epoch 113 Batch 60/162] avg loss 0.000368417, throughput 2.84446K wps
[Epoch 113 Batch 90/162] avg loss 0.000427351, throughput 2.84503K wps
[Epoch 113 Batch 120/162] avg loss 0.000414962, throughput 2.83712K wps
[Epoch 113 Batch 150/162] avg loss 0.000387701, throughput 2.84608K wps
Begin Testing...
[Epoch 113] train avg loss 0.000415098, dev acc 0.9067, dev avg loss 0.251768, throughput 2.85653K wps
[Epoch 114 Batch 30/162] avg loss 0.000394797, throughput 2.89968K wps
[Epoch 114 Batch 60/162] avg loss 0.000411967, throughput 2.84457K wps
[Epoch 114 Batch 90/162] avg loss 0.00046631, throughput 2.83933K wps
[Epoch 114 Batch 120/162] avg loss 0.00037502, throughput 2.82927K wps
[Epoch 114 Batch 150/162] avg loss 0.0004259, throughput 2.84377K wps
Begin Testing...
[Epoch 114] train avg loss 0.0004154, dev acc 0.9067, dev avg loss 0.252278, throughput 2.85007K wps
[Epoch 115 Batch 30/162] avg loss 0.00037586, throughput 2.916K wps
[Epoch 115 Batch 60/162] avg loss 0.000396113, throughput 2.85318K wps
[Epoch 115 Batch 90/162] avg loss 0.000424643, throughput 2.83963K wps
[Epoch 115 Batch 120/162] avg loss 0.000403832, throughput 2.84099K wps
[Epoch 115 Batch 150/162] avg loss 0.000356379, throughput 2.85127K wps
Begin Testing...
[Epoch 115] train avg loss 0.000395086, dev acc 0.9056, dev avg loss 0.252415, throughput 2.85759K wps
[Epoch 116 Batch 30/162] avg loss 0.000402169, throughput 2.92267K wps
[Epoch 116 Batch 60/162] avg loss 0.000381511, throughput 2.84122K wps
[Epoch 116 Batch 90/162] avg loss 0.000386586, throughput 2.85396K wps
[Epoch 116 Batch 120/162] avg loss 0.000384737, throughput 2.84179K wps
[Epoch 116 Batch 150/162] avg loss 0.000361189, throughput 2.83604K wps
Begin Testing...
[Epoch 116] train avg loss 0.00039713, dev acc 0.9067, dev avg loss 0.253828, throughput 2.85857K wps
[Epoch 117 Batch 30/162] avg loss 0.000393617, throughput 2.90822K wps
[Epoch 117 Batch 60/162] avg loss 0.000371109, throughput 2.84498K wps
[Epoch 117 Batch 90/162] avg loss 0.000382363, throughput 2.86016K wps
[Epoch 117 Batch 120/162] avg loss 0.000401675, throughput 2.83629K wps
[Epoch 117 Batch 150/162] avg loss 0.00037671, throughput 2.84107K wps
Begin Testing...
[Epoch 117] train avg loss 0.000380794, dev acc 0.9056, dev avg loss 0.254055, throughput 2.85483K wps
[Epoch 118 Batch 30/162] avg loss 0.000410463, throughput 2.91829K wps
[Epoch 118 Batch 60/162] avg loss 0.000369383, throughput 2.84398K wps
[Epoch 118 Batch 90/162] avg loss 0.00037638, throughput 2.8559K wps
[Epoch 118 Batch 120/162] avg loss 0.000376124, throughput 2.84458K wps
[Epoch 118 Batch 150/162] avg loss 0.000333777, throughput 2.84186K wps
Begin Testing...
[Epoch 118] train avg loss 0.000373484, dev acc 0.9044, dev avg loss 0.254575, throughput 2.86109K wps
[Epoch 119 Batch 30/162] avg loss 0.000408638, throughput 2.89891K wps
[Epoch 119 Batch 60/162] avg loss 0.000354075, throughput 2.84878K wps
[Epoch 119 Batch 90/162] avg loss 0.00041459, throughput 2.8484K wps
[Epoch 119 Batch 120/162] avg loss 0.000362062, throughput 2.84035K wps
[Epoch 119 Batch 150/162] avg loss 0.000363761, throughput 2.84077K wps
Begin Testing...
[Epoch 119] train avg loss 0.0003815, dev acc 0.9067, dev avg loss 0.255571, throughput 2.85396K wps
[Epoch 120 Batch 30/162] avg loss 0.000365755, throughput 2.91512K wps
[Epoch 120 Batch 60/162] avg loss 0.000396961, throughput 2.85064K wps
[Epoch 120 Batch 90/162] avg loss 0.000330997, throughput 2.84436K wps
[Epoch 120 Batch 120/162] avg loss 0.000381007, throughput 2.84128K wps
[Epoch 120 Batch 150/162] avg loss 0.00034382, throughput 2.84438K wps
Begin Testing...
[Epoch 120] train avg loss 0.000363443, dev acc 0.9067, dev avg loss 0.256082, throughput 2.85838K wps
[Epoch 121 Batch 30/162] avg loss 0.00035549, throughput 2.90566K wps
[Epoch 121 Batch 60/162] avg loss 0.000426156, throughput 2.84802K wps
[Epoch 121 Batch 90/162] avg loss 0.000326886, throughput 2.85479K wps
[Epoch 121 Batch 120/162] avg loss 0.000367702, throughput 2.85067K wps
[Epoch 121 Batch 150/162] avg loss 0.000392767, throughput 2.84047K wps
Begin Testing...
[Epoch 121] train avg loss 0.00037173, dev acc 0.9067, dev avg loss 0.255784, throughput 2.85746K wps
[Epoch 122 Batch 30/162] avg loss 0.000348038, throughput 2.90299K wps
[Epoch 122 Batch 60/162] avg loss 0.000379657, throughput 2.83952K wps
[Epoch 122 Batch 90/162] avg loss 0.000385048, throughput 2.82892K wps
[Epoch 122 Batch 120/162] avg loss 0.000358905, throughput 2.85084K wps
[Epoch 122 Batch 150/162] avg loss 0.000394243, throughput 2.84651K wps
Begin Testing...
[Epoch 122] train avg loss 0.000369755, dev acc 0.9067, dev avg loss 0.255986, throughput 2.85187K wps
[Epoch 123 Batch 30/162] avg loss 0.000381306, throughput 2.91707K wps
[Epoch 123 Batch 60/162] avg loss 0.000338016, throughput 2.8493K wps
[Epoch 123 Batch 90/162] avg loss 0.000392657, throughput 2.8318K wps
[Epoch 123 Batch 120/162] avg loss 0.000358889, throughput 2.84014K wps
[Epoch 123 Batch 150/162] avg loss 0.000429914, throughput 2.84348K wps
Begin Testing...
[Epoch 123] train avg loss 0.000375861, dev acc 0.9067, dev avg loss 0.256836, throughput 2.85437K wps
[Epoch 124 Batch 30/162] avg loss 0.000469797, throughput 2.91087K wps
[Epoch 124 Batch 60/162] avg loss 0.00036796, throughput 2.84302K wps
[Epoch 124 Batch 90/162] avg loss 0.000326491, throughput 2.83399K wps
[Epoch 124 Batch 120/162] avg loss 0.000376859, throughput 2.83431K wps
[Epoch 124 Batch 150/162] avg loss 0.000331118, throughput 2.83615K wps
Begin Testing...
[Epoch 124] train avg loss 0.000376803, dev acc 0.9089, dev avg loss 0.256345, throughput 2.85061K wps
Observed Improvement.
Begin Testing...
[Epoch 125 Batch 30/162] avg loss 0.000368936, throughput 2.91203K wps
[Epoch 125 Batch 60/162] avg loss 0.000350557, throughput 2.8306K wps
[Epoch 125 Batch 90/162] avg loss 0.000336952, throughput 2.83448K wps
[Epoch 125 Batch 120/162] avg loss 0.000300936, throughput 2.82725K wps
[Epoch 125 Batch 150/162] avg loss 0.000330573, throughput 2.84562K wps
Begin Testing...
[Epoch 125] train avg loss 0.000339988, dev acc 0.9067, dev avg loss 0.256721, throughput 2.84847K wps
[Epoch 126 Batch 30/162] avg loss 0.000373878, throughput 2.90651K wps
[Epoch 126 Batch 60/162] avg loss 0.000308349, throughput 2.84314K wps
[Epoch 126 Batch 90/162] avg loss 0.000284221, throughput 2.83378K wps
[Epoch 126 Batch 120/162] avg loss 0.000367217, throughput 2.84479K wps
[Epoch 126 Batch 150/162] avg loss 0.000314446, throughput 2.83205K wps
Begin Testing...
[Epoch 126] train avg loss 0.000327092, dev acc 0.9056, dev avg loss 0.257346, throughput 2.85071K wps
[Epoch 127 Batch 30/162] avg loss 0.000367718, throughput 2.92278K wps
[Epoch 127 Batch 60/162] avg loss 0.000345982, throughput 2.83508K wps
[Epoch 127 Batch 90/162] avg loss 0.000302735, throughput 2.86086K wps
[Epoch 127 Batch 120/162] avg loss 0.000308718, throughput 2.84717K wps
[Epoch 127 Batch 150/162] avg loss 0.000305261, throughput 2.83768K wps
Begin Testing...
[Epoch 127] train avg loss 0.000323817, dev acc 0.9078, dev avg loss 0.259225, throughput 2.85808K wps
[Epoch 128 Batch 30/162] avg loss 0.000320675, throughput 2.90877K wps
[Epoch 128 Batch 60/162] avg loss 0.000347971, throughput 2.84319K wps
[Epoch 128 Batch 90/162] avg loss 0.000352805, throughput 2.85104K wps
[Epoch 128 Batch 120/162] avg loss 0.00032201, throughput 2.85384K wps
[Epoch 128 Batch 150/162] avg loss 0.000349277, throughput 2.83625K wps
Begin Testing...
[Epoch 128] train avg loss 0.000335665, dev acc 0.9067, dev avg loss 0.257742, throughput 2.85734K wps
[Epoch 129 Batch 30/162] avg loss 0.000294227, throughput 2.9157K wps
[Epoch 129 Batch 60/162] avg loss 0.00032315, throughput 2.83997K wps
[Epoch 129 Batch 90/162] avg loss 0.000361536, throughput 2.83819K wps
[Epoch 129 Batch 120/162] avg loss 0.000295063, throughput 2.84708K wps
[Epoch 129 Batch 150/162] avg loss 0.000338528, throughput 2.8187K wps
Begin Testing...
[Epoch 129] train avg loss 0.000323474, dev acc 0.9067, dev avg loss 0.258171, throughput 2.85061K wps
[Epoch 130 Batch 30/162] avg loss 0.000393662, throughput 2.9015K wps
[Epoch 130 Batch 60/162] avg loss 0.000364683, throughput 2.84665K wps
[Epoch 130 Batch 90/162] avg loss 0.000320156, throughput 2.83294K wps
[Epoch 130 Batch 120/162] avg loss 0.000327235, throughput 2.83791K wps
[Epoch 130 Batch 150/162] avg loss 0.000269438, throughput 2.85099K wps
Begin Testing...
[Epoch 130] train avg loss 0.000336236, dev acc 0.9067, dev avg loss 0.26034, throughput 2.85197K wps
[Epoch 131 Batch 30/162] avg loss 0.000300118, throughput 2.90129K wps
[Epoch 131 Batch 60/162] avg loss 0.000336409, throughput 2.84741K wps
[Epoch 131 Batch 90/162] avg loss 0.000316032, throughput 2.83059K wps
[Epoch 131 Batch 120/162] avg loss 0.000318194, throughput 2.8484K wps
[Epoch 131 Batch 150/162] avg loss 0.000342541, throughput 2.84955K wps
Begin Testing...
[Epoch 131] train avg loss 0.000320246, dev acc 0.9067, dev avg loss 0.260526, throughput 2.85494K wps
[Epoch 132 Batch 30/162] avg loss 0.000324179, throughput 2.90902K wps
[Epoch 132 Batch 60/162] avg loss 0.000283705, throughput 2.83939K wps
[Epoch 132 Batch 90/162] avg loss 0.000264948, throughput 2.84013K wps
[Epoch 132 Batch 120/162] avg loss 0.00028635, throughput 2.85064K wps
[Epoch 132 Batch 150/162] avg loss 0.00036299, throughput 2.84701K wps
Begin Testing...
[Epoch 132] train avg loss 0.000300992, dev acc 0.9056, dev avg loss 0.260514, throughput 2.85519K wps
[Epoch 133 Batch 30/162] avg loss 0.000335252, throughput 2.9197K wps
[Epoch 133 Batch 60/162] avg loss 0.000320279, throughput 2.8457K wps
[Epoch 133 Batch 90/162] avg loss 0.000336001, throughput 2.83717K wps
[Epoch 133 Batch 120/162] avg loss 0.000322627, throughput 2.83843K wps
[Epoch 133 Batch 150/162] avg loss 0.000269142, throughput 2.83963K wps
Begin Testing...
[Epoch 133] train avg loss 0.000314282, dev acc 0.9067, dev avg loss 0.261582, throughput 2.85502K wps
[Epoch 134 Batch 30/162] avg loss 0.000289855, throughput 2.90018K wps
[Epoch 134 Batch 60/162] avg loss 0.000324775, throughput 2.82626K wps
[Epoch 134 Batch 90/162] avg loss 0.000306528, throughput 2.84131K wps
[Epoch 134 Batch 120/162] avg loss 0.000297808, throughput 2.83687K wps
[Epoch 134 Batch 150/162] avg loss 0.00028666, throughput 2.83733K wps
Begin Testing...
[Epoch 134] train avg loss 0.000302004, dev acc 0.9067, dev avg loss 0.261016, throughput 2.84662K wps
[Epoch 135 Batch 30/162] avg loss 0.000323692, throughput 2.9012K wps
[Epoch 135 Batch 60/162] avg loss 0.000293747, throughput 2.84032K wps
[Epoch 135 Batch 90/162] avg loss 0.00031383, throughput 2.83165K wps
[Epoch 135 Batch 120/162] avg loss 0.000297467, throughput 2.83975K wps
[Epoch 135 Batch 150/162] avg loss 0.000298728, throughput 2.84256K wps
Begin Testing...
[Epoch 135] train avg loss 0.000311313, dev acc 0.9078, dev avg loss 0.261679, throughput 2.84996K wps
[Epoch 136 Batch 30/162] avg loss 0.000271207, throughput 2.91061K wps
[Epoch 136 Batch 60/162] avg loss 0.000306415, throughput 2.83093K wps
[Epoch 136 Batch 90/162] avg loss 0.000343057, throughput 2.8367K wps
[Epoch 136 Batch 120/162] avg loss 0.000307858, throughput 2.83407K wps
[Epoch 136 Batch 150/162] avg loss 0.000322388, throughput 2.82542K wps
Begin Testing...
[Epoch 136] train avg loss 0.000309632, dev acc 0.9044, dev avg loss 0.261185, throughput 2.84697K wps
[Epoch 137 Batch 30/162] avg loss 0.000285907, throughput 2.91854K wps
[Epoch 137 Batch 60/162] avg loss 0.000282435, throughput 2.84265K wps
[Epoch 137 Batch 90/162] avg loss 0.000334719, throughput 2.83936K wps
[Epoch 137 Batch 120/162] avg loss 0.000276332, throughput 2.82561K wps
[Epoch 137 Batch 150/162] avg loss 0.000294007, throughput 2.83184K wps
Begin Testing...
[Epoch 137] train avg loss 0.000294669, dev acc 0.9056, dev avg loss 0.261851, throughput 2.85026K wps
[Epoch 138 Batch 30/162] avg loss 0.000271962, throughput 2.90535K wps
[Epoch 138 Batch 60/162] avg loss 0.000284013, throughput 2.83249K wps
[Epoch 138 Batch 90/162] avg loss 0.000319373, throughput 2.84116K wps
[Epoch 138 Batch 120/162] avg loss 0.00030274, throughput 2.8494K wps
[Epoch 138 Batch 150/162] avg loss 0.000279626, throughput 2.83327K wps
Begin Testing...
[Epoch 138] train avg loss 0.000295095, dev acc 0.9078, dev avg loss 0.26383, throughput 2.85027K wps
[Epoch 139 Batch 30/162] avg loss 0.000294521, throughput 2.90744K wps
[Epoch 139 Batch 60/162] avg loss 0.000298922, throughput 2.84714K wps
[Epoch 139 Batch 90/162] avg loss 0.000318685, throughput 2.84818K wps
[Epoch 139 Batch 120/162] avg loss 0.000266805, throughput 2.83697K wps
[Epoch 139 Batch 150/162] avg loss 0.000300614, throughput 2.83827K wps
Begin Testing...
[Epoch 139] train avg loss 0.000290737, dev acc 0.9078, dev avg loss 0.263075, throughput 2.85493K wps
[Epoch 140 Batch 30/162] avg loss 0.000298607, throughput 2.90243K wps
[Epoch 140 Batch 60/162] avg loss 0.000300137, throughput 2.83854K wps
[Epoch 140 Batch 90/162] avg loss 0.000251534, throughput 2.84741K wps
[Epoch 140 Batch 120/162] avg loss 0.000307753, throughput 2.8364K wps
[Epoch 140 Batch 150/162] avg loss 0.00026503, throughput 2.84145K wps
Begin Testing...
[Epoch 140] train avg loss 0.000284841, dev acc 0.9056, dev avg loss 0.262503, throughput 2.85167K wps
[Epoch 141 Batch 30/162] avg loss 0.000262751, throughput 2.91444K wps
[Epoch 141 Batch 60/162] avg loss 0.00025389, throughput 2.8334K wps
[Epoch 141 Batch 90/162] avg loss 0.000286542, throughput 2.8357K wps
[Epoch 141 Batch 120/162] avg loss 0.000284786, throughput 2.84144K wps
[Epoch 141 Batch 150/162] avg loss 0.000292489, throughput 2.83803K wps
Begin Testing...
[Epoch 141] train avg loss 0.000270885, dev acc 0.9056, dev avg loss 0.263803, throughput 2.85049K wps
[Epoch 142 Batch 30/162] avg loss 0.000249419, throughput 2.90685K wps
[Epoch 142 Batch 60/162] avg loss 0.000260592, throughput 2.83827K wps
[Epoch 142 Batch 90/162] avg loss 0.000323939, throughput 2.836K wps
[Epoch 142 Batch 120/162] avg loss 0.000275588, throughput 2.8356K wps
[Epoch 142 Batch 150/162] avg loss 0.000308511, throughput 2.84198K wps
Begin Testing...
[Epoch 142] train avg loss 0.00028208, dev acc 0.9033, dev avg loss 0.263955, throughput 2.85118K wps
[Epoch 143 Batch 30/162] avg loss 0.000235346, throughput 2.90628K wps
[Epoch 143 Batch 60/162] avg loss 0.000276578, throughput 2.84208K wps
[Epoch 143 Batch 90/162] avg loss 0.000244261, throughput 2.83043K wps
[Epoch 143 Batch 120/162] avg loss 0.000274579, throughput 2.83775K wps
[Epoch 143 Batch 150/162] avg loss 0.00026329, throughput 2.83452K wps
Begin Testing...
[Epoch 143] train avg loss 0.000259395, dev acc 0.9056, dev avg loss 0.264472, throughput 2.84965K wps
[Epoch 144 Batch 30/162] avg loss 0.000276893, throughput 2.91757K wps
[Epoch 144 Batch 60/162] avg loss 0.000313899, throughput 2.84122K wps
[Epoch 144 Batch 90/162] avg loss 0.000280828, throughput 2.84267K wps
[Epoch 144 Batch 120/162] avg loss 0.000254243, throughput 2.83084K wps
[Epoch 144 Batch 150/162] avg loss 0.000302577, throughput 2.84001K wps
Begin Testing...
[Epoch 144] train avg loss 0.000289291, dev acc 0.9044, dev avg loss 0.264435, throughput 2.85163K wps
[Epoch 145 Batch 30/162] avg loss 0.00023991, throughput 2.85333K wps
[Epoch 145 Batch 60/162] avg loss 0.000323687, throughput 2.83412K wps
[Epoch 145 Batch 90/162] avg loss 0.000244118, throughput 2.84032K wps
[Epoch 145 Batch 120/162] avg loss 0.000332773, throughput 2.85232K wps
[Epoch 145 Batch 150/162] avg loss 0.000253916, throughput 2.84311K wps
Begin Testing...
[Epoch 145] train avg loss 0.000276554, dev acc 0.9078, dev avg loss 0.265128, throughput 2.84458K wps
[Epoch 146 Batch 30/162] avg loss 0.000316557, throughput 2.9104K wps
[Epoch 146 Batch 60/162] avg loss 0.000247748, throughput 2.84262K wps
[Epoch 146 Batch 90/162] avg loss 0.000296022, throughput 2.84233K wps
[Epoch 146 Batch 120/162] avg loss 0.000249963, throughput 2.84135K wps
[Epoch 146 Batch 150/162] avg loss 0.000267737, throughput 2.84114K wps
Begin Testing...
[Epoch 146] train avg loss 0.000279192, dev acc 0.9078, dev avg loss 0.265601, throughput 2.85457K wps
[Epoch 147 Batch 30/162] avg loss 0.000235808, throughput 2.91115K wps
[Epoch 147 Batch 60/162] avg loss 0.000253222, throughput 2.84776K wps
[Epoch 147 Batch 90/162] avg loss 0.000253762, throughput 2.83976K wps
[Epoch 147 Batch 120/162] avg loss 0.000254396, throughput 2.856K wps
[Epoch 147 Batch 150/162] avg loss 0.000272546, throughput 2.85122K wps
Begin Testing...
[Epoch 147] train avg loss 0.000252842, dev acc 0.9078, dev avg loss 0.266273, throughput 2.86061K wps
[Epoch 148 Batch 30/162] avg loss 0.000305343, throughput 2.9083K wps
[Epoch 148 Batch 60/162] avg loss 0.000234034, throughput 2.84188K wps
[Epoch 148 Batch 90/162] avg loss 0.000277451, throughput 2.83609K wps
[Epoch 148 Batch 120/162] avg loss 0.000281943, throughput 2.83252K wps
[Epoch 148 Batch 150/162] avg loss 0.000246564, throughput 2.8492K wps
Begin Testing...
[Epoch 148] train avg loss 0.000266876, dev acc 0.9033, dev avg loss 0.265489, throughput 2.85146K wps
[Epoch 149 Batch 30/162] avg loss 0.000235803, throughput 2.91588K wps
[Epoch 149 Batch 60/162] avg loss 0.000275087, throughput 2.83697K wps
[Epoch 149 Batch 90/162] avg loss 0.000229153, throughput 2.83923K wps
[Epoch 149 Batch 120/162] avg loss 0.000209722, throughput 2.84681K wps
[Epoch 149 Batch 150/162] avg loss 0.000238189, throughput 2.83523K wps
Begin Testing...
[Epoch 149] train avg loss 0.000239859, dev acc 0.9078, dev avg loss 0.267777, throughput 2.85272K wps
[Epoch 150 Batch 30/162] avg loss 0.000262827, throughput 2.90906K wps
[Epoch 150 Batch 60/162] avg loss 0.000261985, throughput 2.83697K wps
[Epoch 150 Batch 90/162] avg loss 0.000264218, throughput 2.85015K wps
[Epoch 150 Batch 120/162] avg loss 0.000237927, throughput 2.82699K wps
[Epoch 150 Batch 150/162] avg loss 0.000204881, throughput 2.83937K wps
Begin Testing...
[Epoch 150] train avg loss 0.000245355, dev acc 0.9078, dev avg loss 0.267207, throughput 2.85253K wps
[Epoch 151 Batch 30/162] avg loss 0.000250565, throughput 2.9061K wps
[Epoch 151 Batch 60/162] avg loss 0.00022154, throughput 2.85336K wps
[Epoch 151 Batch 90/162] avg loss 0.000284115, throughput 2.83585K wps
[Epoch 151 Batch 120/162] avg loss 0.000232265, throughput 2.82727K wps
[Epoch 151 Batch 150/162] avg loss 0.000255551, throughput 2.83903K wps
Begin Testing...
[Epoch 151] train avg loss 0.000244553, dev acc 0.9078, dev avg loss 0.26775, throughput 2.85125K wps
[Epoch 152 Batch 30/162] avg loss 0.000243477, throughput 2.89928K wps
[Epoch 152 Batch 60/162] avg loss 0.000257202, throughput 2.83513K wps
[Epoch 152 Batch 90/162] avg loss 0.000309003, throughput 2.84469K wps
[Epoch 152 Batch 120/162] avg loss 0.000276537, throughput 2.85089K wps
[Epoch 152 Batch 150/162] avg loss 0.000249805, throughput 2.83987K wps
Begin Testing...
[Epoch 152] train avg loss 0.00026244, dev acc 0.9056, dev avg loss 0.267317, throughput 2.8535K wps
[Epoch 153 Batch 30/162] avg loss 0.000278722, throughput 2.91061K wps
[Epoch 153 Batch 60/162] avg loss 0.00023377, throughput 2.84582K wps
[Epoch 153 Batch 90/162] avg loss 0.00024023, throughput 2.82666K wps
[Epoch 153 Batch 120/162] avg loss 0.000242429, throughput 2.84159K wps
[Epoch 153 Batch 150/162] avg loss 0.000256561, throughput 2.84142K wps
Begin Testing...
[Epoch 153] train avg loss 0.000251649, dev acc 0.9078, dev avg loss 0.268912, throughput 2.85163K wps
[Epoch 154 Batch 30/162] avg loss 0.000243075, throughput 2.90371K wps
[Epoch 154 Batch 60/162] avg loss 0.000230665, throughput 2.848K wps
[Epoch 154 Batch 90/162] avg loss 0.000259301, throughput 2.85033K wps
[Epoch 154 Batch 120/162] avg loss 0.000216878, throughput 2.82895K wps
[Epoch 154 Batch 150/162] avg loss 0.000249274, throughput 2.83913K wps
Begin Testing...
[Epoch 154] train avg loss 0.000239304, dev acc 0.9078, dev avg loss 0.268648, throughput 2.85288K wps
[Epoch 155 Batch 30/162] avg loss 0.000244559, throughput 2.91555K wps
[Epoch 155 Batch 60/162] avg loss 0.000238937, throughput 2.85044K wps
[Epoch 155 Batch 90/162] avg loss 0.000233634, throughput 2.84577K wps
[Epoch 155 Batch 120/162] avg loss 0.000221883, throughput 2.84183K wps
[Epoch 155 Batch 150/162] avg loss 0.000261413, throughput 2.83938K wps
Begin Testing...
[Epoch 155] train avg loss 0.000240213, dev acc 0.9078, dev avg loss 0.269811, throughput 2.85713K wps
[Epoch 156 Batch 30/162] avg loss 0.000241838, throughput 2.89958K wps
[Epoch 156 Batch 60/162] avg loss 0.00020784, throughput 2.85051K wps
[Epoch 156 Batch 90/162] avg loss 0.000220093, throughput 2.84351K wps
[Epoch 156 Batch 120/162] avg loss 0.000192363, throughput 2.83808K wps
[Epoch 156 Batch 150/162] avg loss 0.00026404, throughput 2.84585K wps
Begin Testing...
[Epoch 156] train avg loss 0.000225011, dev acc 0.9044, dev avg loss 0.269106, throughput 2.85375K wps
[Epoch 157 Batch 30/162] avg loss 0.000221151, throughput 2.90343K wps
[Epoch 157 Batch 60/162] avg loss 0.00029349, throughput 2.84585K wps
[Epoch 157 Batch 90/162] avg loss 0.000228434, throughput 2.83885K wps
[Epoch 157 Batch 120/162] avg loss 0.000202458, throughput 2.84419K wps
[Epoch 157 Batch 150/162] avg loss 0.00020499, throughput 2.84968K wps
Begin Testing...
[Epoch 157] train avg loss 0.000229825, dev acc 0.9078, dev avg loss 0.270443, throughput 2.85372K wps
[Epoch 158 Batch 30/162] avg loss 0.000249477, throughput 2.90847K wps
[Epoch 158 Batch 60/162] avg loss 0.000195599, throughput 2.84299K wps
[Epoch 158 Batch 90/162] avg loss 0.000214251, throughput 2.82672K wps
[Epoch 158 Batch 120/162] avg loss 0.000206487, throughput 2.84576K wps
[Epoch 158 Batch 150/162] avg loss 0.000199835, throughput 2.83725K wps
Begin Testing...
[Epoch 158] train avg loss 0.000209003, dev acc 0.9078, dev avg loss 0.271162, throughput 2.84982K wps
[Epoch 159 Batch 30/162] avg loss 0.000222688, throughput 2.91436K wps
[Epoch 159 Batch 60/162] avg loss 0.000235726, throughput 2.83949K wps
[Epoch 159 Batch 90/162] avg loss 0.000249131, throughput 2.83192K wps
[Epoch 159 Batch 120/162] avg loss 0.000199329, throughput 2.82938K wps
[Epoch 159 Batch 150/162] avg loss 0.000208806, throughput 2.83147K wps
Begin Testing...
[Epoch 159] train avg loss 0.00022031, dev acc 0.9078, dev avg loss 0.270798, throughput 2.848K wps
[Epoch 160 Batch 30/162] avg loss 0.000247099, throughput 2.90754K wps
[Epoch 160 Batch 60/162] avg loss 0.000190261, throughput 2.85208K wps
[Epoch 160 Batch 90/162] avg loss 0.000204877, throughput 2.83969K wps
[Epoch 160 Batch 120/162] avg loss 0.000261048, throughput 2.83997K wps
[Epoch 160 Batch 150/162] avg loss 0.000205856, throughput 2.82626K wps
Begin Testing...
[Epoch 160] train avg loss 0.000220993, dev acc 0.9078, dev avg loss 0.2707, throughput 2.85107K wps
[Epoch 161 Batch 30/162] avg loss 0.000204665, throughput 2.89596K wps
[Epoch 161 Batch 60/162] avg loss 0.00019269, throughput 2.83096K wps
[Epoch 161 Batch 90/162] avg loss 0.000185677, throughput 2.84199K wps
[Epoch 161 Batch 120/162] avg loss 0.000262338, throughput 2.82586K wps
[Epoch 161 Batch 150/162] avg loss 0.000211158, throughput 2.8369K wps
Begin Testing...
[Epoch 161] train avg loss 0.000212943, dev acc 0.9078, dev avg loss 0.270977, throughput 2.84647K wps
[Epoch 162 Batch 30/162] avg loss 0.000221041, throughput 2.90892K wps
[Epoch 162 Batch 60/162] avg loss 0.000199408, throughput 2.84587K wps
[Epoch 162 Batch 90/162] avg loss 0.000220604, throughput 2.84759K wps
[Epoch 162 Batch 120/162] avg loss 0.000204983, throughput 2.83338K wps
[Epoch 162 Batch 150/162] avg loss 0.000202634, throughput 2.842K wps
Begin Testing...
[Epoch 162] train avg loss 0.000212258, dev acc 0.9056, dev avg loss 0.270938, throughput 2.85437K wps
[Epoch 163 Batch 30/162] avg loss 0.00023317, throughput 2.91208K wps
[Epoch 163 Batch 60/162] avg loss 0.000181829, throughput 2.84452K wps
[Epoch 163 Batch 90/162] avg loss 0.00020635, throughput 2.8528K wps
[Epoch 163 Batch 120/162] avg loss 0.000227862, throughput 2.8412K wps
[Epoch 163 Batch 150/162] avg loss 0.000220282, throughput 2.84035K wps
Begin Testing...
[Epoch 163] train avg loss 0.000218575, dev acc 0.9078, dev avg loss 0.27188, throughput 2.85577K wps
[Epoch 164 Batch 30/162] avg loss 0.000166401, throughput 2.90286K wps
[Epoch 164 Batch 60/162] avg loss 0.000256204, throughput 2.85846K wps
[Epoch 164 Batch 90/162] avg loss 0.00020583, throughput 2.83498K wps
[Epoch 164 Batch 120/162] avg loss 0.000201319, throughput 2.85431K wps
[Epoch 164 Batch 150/162] avg loss 0.000205217, throughput 2.84417K wps
Begin Testing...
[Epoch 164] train avg loss 0.000204079, dev acc 0.9078, dev avg loss 0.272439, throughput 2.85805K wps
[Epoch 165 Batch 30/162] avg loss 0.000218258, throughput 2.90334K wps
[Epoch 165 Batch 60/162] avg loss 0.000194784, throughput 2.84467K wps
[Epoch 165 Batch 90/162] avg loss 0.000235216, throughput 2.83867K wps
[Epoch 165 Batch 120/162] avg loss 0.000247541, throughput 2.83889K wps
[Epoch 165 Batch 150/162] avg loss 0.000246026, throughput 2.848K wps
Begin Testing...
[Epoch 165] train avg loss 0.000223616, dev acc 0.9078, dev avg loss 0.272002, throughput 2.85301K wps
[Epoch 166 Batch 30/162] avg loss 0.000213962, throughput 2.91772K wps
[Epoch 166 Batch 60/162] avg loss 0.000210189, throughput 2.84546K wps
[Epoch 166 Batch 90/162] avg loss 0.000216821, throughput 2.84124K wps
[Epoch 166 Batch 120/162] avg loss 0.000209227, throughput 2.84427K wps
[Epoch 166 Batch 150/162] avg loss 0.000202203, throughput 2.83152K wps
Begin Testing...
[Epoch 166] train avg loss 0.000209535, dev acc 0.9078, dev avg loss 0.272741, throughput 2.85362K wps
[Epoch 167 Batch 30/162] avg loss 0.000214678, throughput 2.90313K wps
[Epoch 167 Batch 60/162] avg loss 0.000237469, throughput 2.83844K wps
[Epoch 167 Batch 90/162] avg loss 0.000201066, throughput 2.85174K wps
[Epoch 167 Batch 120/162] avg loss 0.000199166, throughput 2.82665K wps
[Epoch 167 Batch 150/162] avg loss 0.000213543, throughput 2.83873K wps
Begin Testing...
[Epoch 167] train avg loss 0.000215812, dev acc 0.9067, dev avg loss 0.272739, throughput 2.84973K wps
[Epoch 168 Batch 30/162] avg loss 0.000231325, throughput 2.90704K wps
[Epoch 168 Batch 60/162] avg loss 0.000178016, throughput 2.84007K wps
[Epoch 168 Batch 90/162] avg loss 0.000190606, throughput 2.83466K wps
[Epoch 168 Batch 120/162] avg loss 0.000201681, throughput 2.83427K wps
[Epoch 168 Batch 150/162] avg loss 0.000203821, throughput 2.84121K wps
Begin Testing...
[Epoch 168] train avg loss 0.000203243, dev acc 0.9089, dev avg loss 0.274611, throughput 2.85094K wps
Observed Improvement.
Begin Testing...
[Epoch 169 Batch 30/162] avg loss 0.000168828, throughput 2.92137K wps
[Epoch 169 Batch 60/162] avg loss 0.000170589, throughput 2.84124K wps
[Epoch 169 Batch 90/162] avg loss 0.000202912, throughput 2.83628K wps
[Epoch 169 Batch 120/162] avg loss 0.000202881, throughput 2.85406K wps
[Epoch 169 Batch 150/162] avg loss 0.000195906, throughput 2.84512K wps
Begin Testing...
[Epoch 169] train avg loss 0.000191937, dev acc 0.9056, dev avg loss 0.273605, throughput 2.85761K wps
[Epoch 170 Batch 30/162] avg loss 0.000189016, throughput 2.90481K wps
[Epoch 170 Batch 60/162] avg loss 0.000219742, throughput 2.85394K wps
[Epoch 170 Batch 90/162] avg loss 0.000214096, throughput 2.85839K wps
[Epoch 170 Batch 120/162] avg loss 0.000220952, throughput 2.83857K wps
[Epoch 170 Batch 150/162] avg loss 0.000179489, throughput 2.84336K wps
Begin Testing...
[Epoch 170] train avg loss 0.000204763, dev acc 0.9078, dev avg loss 0.274382, throughput 2.85936K wps
[Epoch 171 Batch 30/162] avg loss 0.000207602, throughput 2.90539K wps
[Epoch 171 Batch 60/162] avg loss 0.000191317, throughput 2.84168K wps
[Epoch 171 Batch 90/162] avg loss 0.000177328, throughput 2.83528K wps
[Epoch 171 Batch 120/162] avg loss 0.000210112, throughput 2.83532K wps
[Epoch 171 Batch 150/162] avg loss 0.000202393, throughput 2.83759K wps
Begin Testing...
[Epoch 171] train avg loss 0.000195032, dev acc 0.9067, dev avg loss 0.274642, throughput 2.84953K wps
[Epoch 172 Batch 30/162] avg loss 0.000186177, throughput 2.91349K wps
[Epoch 172 Batch 60/162] avg loss 0.000182164, throughput 2.8433K wps
[Epoch 172 Batch 90/162] avg loss 0.00016483, throughput 2.83517K wps
[Epoch 172 Batch 120/162] avg loss 0.000200055, throughput 2.84309K wps
[Epoch 172 Batch 150/162] avg loss 0.000210329, throughput 2.84058K wps
Begin Testing...
[Epoch 172] train avg loss 0.000187053, dev acc 0.9078, dev avg loss 0.275549, throughput 2.85238K wps
[Epoch 173 Batch 30/162] avg loss 0.000183488, throughput 2.92623K wps
[Epoch 173 Batch 60/162] avg loss 0.000176296, throughput 2.83132K wps
[Epoch 173 Batch 90/162] avg loss 0.000203013, throughput 2.84178K wps
[Epoch 173 Batch 120/162] avg loss 0.000186778, throughput 2.84096K wps
[Epoch 173 Batch 150/162] avg loss 0.000180889, throughput 2.83716K wps
Begin Testing...
[Epoch 173] train avg loss 0.000190294, dev acc 0.9078, dev avg loss 0.276293, throughput 2.85437K wps
[Epoch 174 Batch 30/162] avg loss 0.000164, throughput 2.89954K wps
[Epoch 174 Batch 60/162] avg loss 0.000195509, throughput 2.85287K wps
[Epoch 174 Batch 90/162] avg loss 0.000191087, throughput 2.84962K wps
[Epoch 174 Batch 120/162] avg loss 0.000185081, throughput 2.84828K wps
[Epoch 174 Batch 150/162] avg loss 0.000181767, throughput 2.84578K wps
Begin Testing...
[Epoch 174] train avg loss 0.000182317, dev acc 0.9089, dev avg loss 0.277223, throughput 2.8584K wps
Observed Improvement.
Begin Testing...
[Epoch 175 Batch 30/162] avg loss 0.000194202, throughput 2.89455K wps
[Epoch 175 Batch 60/162] avg loss 0.00018008, throughput 2.84411K wps
[Epoch 175 Batch 90/162] avg loss 0.000180543, throughput 2.84684K wps
[Epoch 175 Batch 120/162] avg loss 0.000177272, throughput 2.83442K wps
[Epoch 175 Batch 150/162] avg loss 0.000196939, throughput 2.83623K wps
Begin Testing...
[Epoch 175] train avg loss 0.000185325, dev acc 0.9056, dev avg loss 0.276578, throughput 2.85073K wps
[Epoch 176 Batch 30/162] avg loss 0.000209948, throughput 2.88857K wps
[Epoch 176 Batch 60/162] avg loss 0.000189742, throughput 2.83184K wps
[Epoch 176 Batch 90/162] avg loss 0.000185811, throughput 2.84631K wps
[Epoch 176 Batch 120/162] avg loss 0.000222864, throughput 2.8438K wps
[Epoch 176 Batch 150/162] avg loss 0.00016708, throughput 2.83225K wps
Begin Testing...
[Epoch 176] train avg loss 0.000195415, dev acc 0.9089, dev avg loss 0.278929, throughput 2.84674K wps
Observed Improvement.
Begin Testing...
[Epoch 177 Batch 30/162] avg loss 0.000240435, throughput 2.90222K wps
[Epoch 177 Batch 60/162] avg loss 0.000180609, throughput 2.83691K wps
[Epoch 177 Batch 90/162] avg loss 0.000190776, throughput 2.85291K wps
[Epoch 177 Batch 120/162] avg loss 0.000200874, throughput 2.83518K wps
[Epoch 177 Batch 150/162] avg loss 0.000169877, throughput 2.8431K wps
Begin Testing...
[Epoch 177] train avg loss 0.000198618, dev acc 0.9089, dev avg loss 0.278569, throughput 2.85331K wps
Observed Improvement.
Begin Testing...
[Epoch 178 Batch 30/162] avg loss 0.000202559, throughput 2.90164K wps
[Epoch 178 Batch 60/162] avg loss 0.000155214, throughput 2.84546K wps
[Epoch 178 Batch 90/162] avg loss 0.000155457, throughput 2.84798K wps
[Epoch 178 Batch 120/162] avg loss 0.000186125, throughput 2.83813K wps
[Epoch 178 Batch 150/162] avg loss 0.000186542, throughput 2.84354K wps
Begin Testing...
[Epoch 178] train avg loss 0.000181472, dev acc 0.9078, dev avg loss 0.277204, throughput 2.85398K wps
[Epoch 179 Batch 30/162] avg loss 0.000173995, throughput 2.9075K wps
[Epoch 179 Batch 60/162] avg loss 0.000179837, throughput 2.85172K wps
[Epoch 179 Batch 90/162] avg loss 0.000178124, throughput 2.84636K wps
[Epoch 179 Batch 120/162] avg loss 0.000164916, throughput 2.84751K wps
[Epoch 179 Batch 150/162] avg loss 0.000205494, throughput 2.84271K wps
Begin Testing...
[Epoch 179] train avg loss 0.000180129, dev acc 0.9089, dev avg loss 0.277624, throughput 2.85758K wps
Observed Improvement.
Begin Testing...
[Epoch 180 Batch 30/162] avg loss 0.000195082, throughput 2.90643K wps
[Epoch 180 Batch 60/162] avg loss 0.000148511, throughput 2.85038K wps
[Epoch 180 Batch 90/162] avg loss 0.000164351, throughput 2.85316K wps
[Epoch 180 Batch 120/162] avg loss 0.000157617, throughput 2.85037K wps
[Epoch 180 Batch 150/162] avg loss 0.000176931, throughput 2.83105K wps
Begin Testing...
[Epoch 180] train avg loss 0.000171094, dev acc 0.9056, dev avg loss 0.276863, throughput 2.85696K wps
[Epoch 181 Batch 30/162] avg loss 0.000170739, throughput 2.91142K wps
[Epoch 181 Batch 60/162] avg loss 0.000164124, throughput 2.84021K wps
[Epoch 181 Batch 90/162] avg loss 0.000160542, throughput 2.84848K wps
[Epoch 181 Batch 120/162] avg loss 0.000215435, throughput 2.84197K wps
[Epoch 181 Batch 150/162] avg loss 0.000156074, throughput 2.84583K wps
Begin Testing...
[Epoch 181] train avg loss 0.000175839, dev acc 0.9078, dev avg loss 0.278519, throughput 2.85588K wps
[Epoch 182 Batch 30/162] avg loss 0.000165367, throughput 2.90721K wps
[Epoch 182 Batch 60/162] avg loss 0.0001862, throughput 2.84565K wps
[Epoch 182 Batch 90/162] avg loss 0.000176996, throughput 2.8579K wps
[Epoch 182 Batch 120/162] avg loss 0.000193101, throughput 2.84439K wps
[Epoch 182 Batch 150/162] avg loss 0.00016291, throughput 2.85038K wps
Begin Testing...
[Epoch 182] train avg loss 0.000175785, dev acc 0.9078, dev avg loss 0.278199, throughput 2.85913K wps
[Epoch 183 Batch 30/162] avg loss 0.000168629, throughput 2.92469K wps
[Epoch 183 Batch 60/162] avg loss 0.000147625, throughput 2.83729K wps
[Epoch 183 Batch 90/162] avg loss 0.000185729, throughput 2.84914K wps
[Epoch 183 Batch 120/162] avg loss 0.000175921, throughput 2.84884K wps
[Epoch 183 Batch 150/162] avg loss 0.000190729, throughput 2.84057K wps
Begin Testing...
[Epoch 183] train avg loss 0.000176439, dev acc 0.9078, dev avg loss 0.278456, throughput 2.8577K wps
[Epoch 184 Batch 30/162] avg loss 0.000183585, throughput 2.91276K wps
[Epoch 184 Batch 60/162] avg loss 0.000155033, throughput 2.84875K wps
[Epoch 184 Batch 90/162] avg loss 0.000177152, throughput 2.84607K wps
[Epoch 184 Batch 120/162] avg loss 0.000168801, throughput 2.84658K wps
[Epoch 184 Batch 150/162] avg loss 0.000170027, throughput 2.8419K wps
Begin Testing...
[Epoch 184] train avg loss 0.000171754, dev acc 0.9078, dev avg loss 0.278508, throughput 2.85704K wps
[Epoch 185 Batch 30/162] avg loss 0.000162386, throughput 2.89451K wps
[Epoch 185 Batch 60/162] avg loss 0.000164014, throughput 2.84027K wps
[Epoch 185 Batch 90/162] avg loss 0.000160553, throughput 2.85143K wps
[Epoch 185 Batch 120/162] avg loss 0.000160879, throughput 2.84435K wps
[Epoch 185 Batch 150/162] avg loss 0.000171236, throughput 2.83597K wps
Begin Testing...
[Epoch 185] train avg loss 0.000163045, dev acc 0.9078, dev avg loss 0.279658, throughput 2.85298K wps
[Epoch 186 Batch 30/162] avg loss 0.000170997, throughput 2.8925K wps
[Epoch 186 Batch 60/162] avg loss 0.000175159, throughput 2.85285K wps
[Epoch 186 Batch 90/162] avg loss 0.000177304, throughput 2.85477K wps
[Epoch 186 Batch 120/162] avg loss 0.000164192, throughput 2.84283K wps
[Epoch 186 Batch 150/162] avg loss 0.000180775, throughput 2.84438K wps
Begin Testing...
[Epoch 186] train avg loss 0.000171886, dev acc 0.9078, dev avg loss 0.279894, throughput 2.85674K wps
[Epoch 187 Batch 30/162] avg loss 0.000170093, throughput 2.9084K wps
[Epoch 187 Batch 60/162] avg loss 0.000164459, throughput 2.84785K wps
[Epoch 187 Batch 90/162] avg loss 0.000196109, throughput 2.84546K wps
[Epoch 187 Batch 120/162] avg loss 0.000150167, throughput 2.84K wps
[Epoch 187 Batch 150/162] avg loss 0.000183303, throughput 2.83638K wps
Begin Testing...
[Epoch 187] train avg loss 0.000171723, dev acc 0.9078, dev avg loss 0.280518, throughput 2.85384K wps
[Epoch 188 Batch 30/162] avg loss 0.000207299, throughput 2.90686K wps
[Epoch 188 Batch 60/162] avg loss 0.000167356, throughput 2.83765K wps
[Epoch 188 Batch 90/162] avg loss 0.000164548, throughput 2.84125K wps
[Epoch 188 Batch 120/162] avg loss 0.00020828, throughput 2.83912K wps
[Epoch 188 Batch 150/162] avg loss 0.000150758, throughput 2.8597K wps
Begin Testing...
[Epoch 188] train avg loss 0.000175722, dev acc 0.9078, dev avg loss 0.280709, throughput 2.85642K wps
[Epoch 189 Batch 30/162] avg loss 0.000176482, throughput 2.91331K wps
[Epoch 189 Batch 60/162] avg loss 0.000152011, throughput 2.84853K wps
[Epoch 189 Batch 90/162] avg loss 0.000143074, throughput 2.8322K wps
[Epoch 189 Batch 120/162] avg loss 0.000196802, throughput 2.84292K wps
[Epoch 189 Batch 150/162] avg loss 0.000143118, throughput 2.84373K wps
Begin Testing...
[Epoch 189] train avg loss 0.000165083, dev acc 0.9078, dev avg loss 0.281621, throughput 2.85596K wps
[Epoch 190 Batch 30/162] avg loss 0.00017797, throughput 2.89618K wps
[Epoch 190 Batch 60/162] avg loss 0.000177971, throughput 2.84527K wps
[Epoch 190 Batch 90/162] avg loss 0.000187965, throughput 2.83758K wps
[Epoch 190 Batch 120/162] avg loss 0.000162931, throughput 2.8399K wps
[Epoch 190 Batch 150/162] avg loss 0.000192719, throughput 2.84035K wps
Begin Testing...
[Epoch 190] train avg loss 0.000177655, dev acc 0.9078, dev avg loss 0.282212, throughput 2.85021K wps
[Epoch 191 Batch 30/162] avg loss 0.000159998, throughput 2.91857K wps
[Epoch 191 Batch 60/162] avg loss 0.000169264, throughput 2.84356K wps
[Epoch 191 Batch 90/162] avg loss 0.000174707, throughput 2.83681K wps
[Epoch 191 Batch 120/162] avg loss 0.000178126, throughput 2.83474K wps
[Epoch 191 Batch 150/162] avg loss 0.000173734, throughput 2.84239K wps
Begin Testing...
[Epoch 191] train avg loss 0.000167186, dev acc 0.9078, dev avg loss 0.282264, throughput 2.85322K wps
[Epoch 192 Batch 30/162] avg loss 0.000172741, throughput 2.91296K wps
[Epoch 192 Batch 60/162] avg loss 0.000151909, throughput 2.83252K wps
[Epoch 192 Batch 90/162] avg loss 0.000153894, throughput 2.85314K wps
[Epoch 192 Batch 120/162] avg loss 0.000160595, throughput 2.85205K wps
[Epoch 192 Batch 150/162] avg loss 0.00014418, throughput 2.8362K wps
Begin Testing...
[Epoch 192] train avg loss 0.000160768, dev acc 0.9056, dev avg loss 0.2812, throughput 2.85537K wps
[Epoch 193 Batch 30/162] avg loss 0.00014153, throughput 2.89814K wps
[Epoch 193 Batch 60/162] avg loss 0.000146948, throughput 2.85136K wps
[Epoch 193 Batch 90/162] avg loss 0.000158257, throughput 2.8507K wps
[Epoch 193 Batch 120/162] avg loss 0.000155363, throughput 2.84969K wps
[Epoch 193 Batch 150/162] avg loss 0.000172048, throughput 2.83941K wps
Begin Testing...
[Epoch 193] train avg loss 0.00015732, dev acc 0.9078, dev avg loss 0.28198, throughput 2.85687K wps
[Epoch 194 Batch 30/162] avg loss 0.000195192, throughput 2.89742K wps
[Epoch 194 Batch 60/162] avg loss 0.000158463, throughput 2.83856K wps
[Epoch 194 Batch 90/162] avg loss 0.00015419, throughput 2.8283K wps
[Epoch 194 Batch 120/162] avg loss 0.000160445, throughput 2.83449K wps
[Epoch 194 Batch 150/162] avg loss 0.000122541, throughput 2.84971K wps
Begin Testing...
[Epoch 194] train avg loss 0.000156323, dev acc 0.9056, dev avg loss 0.281425, throughput 2.84966K wps
[Epoch 195 Batch 30/162] avg loss 0.000118395, throughput 2.90682K wps
[Epoch 195 Batch 60/162] avg loss 0.000150154, throughput 2.8444K wps
[Epoch 195 Batch 90/162] avg loss 0.000152519, throughput 2.82573K wps
[Epoch 195 Batch 120/162] avg loss 0.00015617, throughput 2.83508K wps
[Epoch 195 Batch 150/162] avg loss 0.000148581, throughput 2.82822K wps
Begin Testing...
[Epoch 195] train avg loss 0.000147918, dev acc 0.9078, dev avg loss 0.28259, throughput 2.8463K wps
[Epoch 196 Batch 30/162] avg loss 0.000151837, throughput 2.90208K wps
[Epoch 196 Batch 60/162] avg loss 0.000169264, throughput 2.84442K wps
[Epoch 196 Batch 90/162] avg loss 0.000152603, throughput 2.83141K wps
[Epoch 196 Batch 120/162] avg loss 0.000126748, throughput 2.84761K wps
[Epoch 196 Batch 150/162] avg loss 0.000141708, throughput 2.85281K wps
Begin Testing...
[Epoch 196] train avg loss 0.000149724, dev acc 0.9067, dev avg loss 0.282155, throughput 2.85528K wps
[Epoch 197 Batch 30/162] avg loss 0.000178407, throughput 2.89997K wps
[Epoch 197 Batch 60/162] avg loss 0.00014104, throughput 2.85221K wps
[Epoch 197 Batch 90/162] avg loss 0.000152729, throughput 2.85559K wps
[Epoch 197 Batch 120/162] avg loss 0.000194323, throughput 2.84376K wps
[Epoch 197 Batch 150/162] avg loss 0.000163701, throughput 2.84169K wps
Begin Testing...
[Epoch 197] train avg loss 0.000162205, dev acc 0.9078, dev avg loss 0.283887, throughput 2.85659K wps
[Epoch 198 Batch 30/162] avg loss 0.000146664, throughput 2.90233K wps
[Epoch 198 Batch 60/162] avg loss 0.000147207, throughput 2.85594K wps
[Epoch 198 Batch 90/162] avg loss 0.000130501, throughput 2.84648K wps
[Epoch 198 Batch 120/162] avg loss 0.000160247, throughput 2.84135K wps
[Epoch 198 Batch 150/162] avg loss 0.000160035, throughput 2.83441K wps
Begin Testing...
[Epoch 198] train avg loss 0.000149798, dev acc 0.9078, dev avg loss 0.284495, throughput 2.85595K wps
[Epoch 199 Batch 30/162] avg loss 0.000136271, throughput 2.91004K wps
[Epoch 199 Batch 60/162] avg loss 0.000145893, throughput 2.84268K wps
[Epoch 199 Batch 90/162] avg loss 0.000156019, throughput 2.85084K wps
[Epoch 199 Batch 120/162] avg loss 0.000152448, throughput 2.83978K wps
[Epoch 199 Batch 150/162] avg loss 0.000126533, throughput 2.83997K wps
Begin Testing...
[Epoch 199] train avg loss 0.00014278, dev acc 0.9078, dev avg loss 0.28423, throughput 2.85544K wps
Test loss 0.254589, test acc 0.9050
Total time cost 1420.67s
[Epoch 0 Batch 30/162] avg loss 0.0139053, throughput 2.42926K wps
[Epoch 0 Batch 60/162] avg loss 0.0135874, throughput 2.84595K wps
[Epoch 0 Batch 90/162] avg loss 0.0135093, throughput 2.84377K wps
[Epoch 0 Batch 120/162] avg loss 0.0132847, throughput 2.85405K wps
[Epoch 0 Batch 150/162] avg loss 0.0130283, throughput 2.84059K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134278, dev acc 0.7856, dev avg loss 0.635719, throughput 2.75876K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0126487, throughput 2.91045K wps
[Epoch 1 Batch 60/162] avg loss 0.0124559, throughput 2.84689K wps
[Epoch 1 Batch 90/162] avg loss 0.0123222, throughput 2.83508K wps
[Epoch 1 Batch 120/162] avg loss 0.0120374, throughput 2.84154K wps
[Epoch 1 Batch 150/162] avg loss 0.0119789, throughput 2.84432K wps
Begin Testing...
[Epoch 1] train avg loss 0.0122278, dev acc 0.8667, dev avg loss 0.571474, throughput 2.85524K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.011251, throughput 2.90007K wps
[Epoch 2 Batch 60/162] avg loss 0.0111156, throughput 2.84707K wps
[Epoch 2 Batch 90/162] avg loss 0.0110932, throughput 2.83593K wps
[Epoch 2 Batch 120/162] avg loss 0.0107475, throughput 2.84038K wps
[Epoch 2 Batch 150/162] avg loss 0.0103438, throughput 2.84127K wps
Begin Testing...
[Epoch 2] train avg loss 0.0108615, dev acc 0.8767, dev avg loss 0.503169, throughput 2.85243K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.00994741, throughput 2.90775K wps
[Epoch 3 Batch 60/162] avg loss 0.00978955, throughput 2.84747K wps
[Epoch 3 Batch 90/162] avg loss 0.00937513, throughput 2.83303K wps
[Epoch 3 Batch 120/162] avg loss 0.00969307, throughput 2.84395K wps
[Epoch 3 Batch 150/162] avg loss 0.00909974, throughput 2.83876K wps
Begin Testing...
[Epoch 3] train avg loss 0.00951132, dev acc 0.8711, dev avg loss 0.442051, throughput 2.85213K wps
[Epoch 4 Batch 30/162] avg loss 0.00880978, throughput 2.90735K wps
[Epoch 4 Batch 60/162] avg loss 0.00834898, throughput 2.82885K wps
[Epoch 4 Batch 90/162] avg loss 0.00872467, throughput 2.83942K wps
[Epoch 4 Batch 120/162] avg loss 0.00831506, throughput 2.85085K wps
[Epoch 4 Batch 150/162] avg loss 0.00826798, throughput 2.83138K wps
Begin Testing...
[Epoch 4] train avg loss 0.00846578, dev acc 0.8756, dev avg loss 0.395299, throughput 2.84994K wps
[Epoch 5 Batch 30/162] avg loss 0.00800463, throughput 2.89694K wps
[Epoch 5 Batch 60/162] avg loss 0.00763526, throughput 2.84489K wps
[Epoch 5 Batch 90/162] avg loss 0.0078611, throughput 2.83635K wps
[Epoch 5 Batch 120/162] avg loss 0.00768605, throughput 2.83703K wps
[Epoch 5 Batch 150/162] avg loss 0.00744107, throughput 2.84392K wps
Begin Testing...
[Epoch 5] train avg loss 0.00771689, dev acc 0.8756, dev avg loss 0.361703, throughput 2.84941K wps
[Epoch 6 Batch 30/162] avg loss 0.00724866, throughput 2.91873K wps
[Epoch 6 Batch 60/162] avg loss 0.00704983, throughput 2.84152K wps
[Epoch 6 Batch 90/162] avg loss 0.00710962, throughput 2.83022K wps
[Epoch 6 Batch 120/162] avg loss 0.00693489, throughput 2.843K wps
[Epoch 6 Batch 150/162] avg loss 0.00714905, throughput 2.85482K wps
Begin Testing...
[Epoch 6] train avg loss 0.00709446, dev acc 0.8822, dev avg loss 0.337325, throughput 2.85682K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00663904, throughput 2.91522K wps
[Epoch 7 Batch 60/162] avg loss 0.00716734, throughput 2.83935K wps
[Epoch 7 Batch 90/162] avg loss 0.00654383, throughput 2.83173K wps
[Epoch 7 Batch 120/162] avg loss 0.00645454, throughput 2.8459K wps
[Epoch 7 Batch 150/162] avg loss 0.0064849, throughput 2.8381K wps
Begin Testing...
[Epoch 7] train avg loss 0.00662665, dev acc 0.8878, dev avg loss 0.318516, throughput 2.85159K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00656678, throughput 2.90225K wps
[Epoch 8 Batch 60/162] avg loss 0.00615189, throughput 2.83637K wps
[Epoch 8 Batch 90/162] avg loss 0.00645508, throughput 2.84642K wps
[Epoch 8 Batch 120/162] avg loss 0.00627558, throughput 2.84555K wps
[Epoch 8 Batch 150/162] avg loss 0.00613052, throughput 2.8445K wps
Begin Testing...
[Epoch 8] train avg loss 0.00631963, dev acc 0.8911, dev avg loss 0.304511, throughput 2.85403K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00611042, throughput 2.91347K wps
[Epoch 9 Batch 60/162] avg loss 0.00559149, throughput 2.83873K wps
[Epoch 9 Batch 90/162] avg loss 0.00609243, throughput 2.84181K wps
[Epoch 9 Batch 120/162] avg loss 0.00636795, throughput 2.84163K wps
[Epoch 9 Batch 150/162] avg loss 0.00598608, throughput 2.84103K wps
Begin Testing...
[Epoch 9] train avg loss 0.00599482, dev acc 0.9000, dev avg loss 0.291812, throughput 2.85391K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00577101, throughput 2.89625K wps
[Epoch 10 Batch 60/162] avg loss 0.00628384, throughput 2.83909K wps
[Epoch 10 Batch 90/162] avg loss 0.00526795, throughput 2.85444K wps
[Epoch 10 Batch 120/162] avg loss 0.00558748, throughput 2.83856K wps
[Epoch 10 Batch 150/162] avg loss 0.00553245, throughput 2.83909K wps
Begin Testing...
[Epoch 10] train avg loss 0.00568765, dev acc 0.9000, dev avg loss 0.281725, throughput 2.85274K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00549933, throughput 2.90294K wps
[Epoch 11 Batch 60/162] avg loss 0.0055968, throughput 2.83881K wps
[Epoch 11 Batch 90/162] avg loss 0.00563016, throughput 2.8423K wps
[Epoch 11 Batch 120/162] avg loss 0.005679, throughput 2.82767K wps
[Epoch 11 Batch 150/162] avg loss 0.00517713, throughput 2.84025K wps
Begin Testing...
[Epoch 11] train avg loss 0.00552848, dev acc 0.9022, dev avg loss 0.273148, throughput 2.84854K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00533536, throughput 2.89533K wps
[Epoch 12 Batch 60/162] avg loss 0.00530231, throughput 2.82777K wps
[Epoch 12 Batch 90/162] avg loss 0.00503567, throughput 2.8356K wps
[Epoch 12 Batch 120/162] avg loss 0.00536742, throughput 2.83185K wps
[Epoch 12 Batch 150/162] avg loss 0.00525926, throughput 2.83924K wps
Begin Testing...
[Epoch 12] train avg loss 0.00526917, dev acc 0.9056, dev avg loss 0.265735, throughput 2.84379K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00502304, throughput 2.89574K wps
[Epoch 13 Batch 60/162] avg loss 0.00507359, throughput 2.8454K wps
[Epoch 13 Batch 90/162] avg loss 0.00516495, throughput 2.84051K wps
[Epoch 13 Batch 120/162] avg loss 0.0052458, throughput 2.83543K wps
[Epoch 13 Batch 150/162] avg loss 0.00476047, throughput 2.84645K wps
Begin Testing...
[Epoch 13] train avg loss 0.00507205, dev acc 0.9078, dev avg loss 0.259277, throughput 2.85187K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00443027, throughput 2.91084K wps
[Epoch 14 Batch 60/162] avg loss 0.00514703, throughput 2.83868K wps
[Epoch 14 Batch 90/162] avg loss 0.00488903, throughput 2.82869K wps
[Epoch 14 Batch 120/162] avg loss 0.00466742, throughput 2.83227K wps
[Epoch 14 Batch 150/162] avg loss 0.00534254, throughput 2.84361K wps
Begin Testing...
[Epoch 14] train avg loss 0.00488696, dev acc 0.9056, dev avg loss 0.253437, throughput 2.84872K wps
[Epoch 15 Batch 30/162] avg loss 0.0047635, throughput 2.91019K wps
[Epoch 15 Batch 60/162] avg loss 0.00446425, throughput 2.84275K wps
[Epoch 15 Batch 90/162] avg loss 0.00511211, throughput 2.84691K wps
[Epoch 15 Batch 120/162] avg loss 0.0049444, throughput 2.84222K wps
[Epoch 15 Batch 150/162] avg loss 0.00440198, throughput 2.85365K wps
Begin Testing...
[Epoch 15] train avg loss 0.00470818, dev acc 0.9078, dev avg loss 0.249351, throughput 2.85793K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00451966, throughput 2.90718K wps
[Epoch 16 Batch 60/162] avg loss 0.00465405, throughput 2.84478K wps
[Epoch 16 Batch 90/162] avg loss 0.00464047, throughput 2.83161K wps
[Epoch 16 Batch 120/162] avg loss 0.00443187, throughput 2.84767K wps
[Epoch 16 Batch 150/162] avg loss 0.00488056, throughput 2.84445K wps
Begin Testing...
[Epoch 16] train avg loss 0.00460403, dev acc 0.9089, dev avg loss 0.24418, throughput 2.85349K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00430569, throughput 2.91044K wps
[Epoch 17 Batch 60/162] avg loss 0.0044237, throughput 2.82822K wps
[Epoch 17 Batch 90/162] avg loss 0.00468259, throughput 2.83437K wps
[Epoch 17 Batch 120/162] avg loss 0.00445558, throughput 2.83652K wps
[Epoch 17 Batch 150/162] avg loss 0.00433895, throughput 2.84642K wps
Begin Testing...
[Epoch 17] train avg loss 0.00441612, dev acc 0.9100, dev avg loss 0.240313, throughput 2.85082K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00441997, throughput 2.89246K wps
[Epoch 18 Batch 60/162] avg loss 0.0042518, throughput 2.83917K wps
[Epoch 18 Batch 90/162] avg loss 0.00415994, throughput 2.84826K wps
[Epoch 18 Batch 120/162] avg loss 0.00441306, throughput 2.8415K wps
[Epoch 18 Batch 150/162] avg loss 0.00396929, throughput 2.83704K wps
Begin Testing...
[Epoch 18] train avg loss 0.00424468, dev acc 0.9122, dev avg loss 0.236465, throughput 2.85001K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00421234, throughput 2.90328K wps
[Epoch 19 Batch 60/162] avg loss 0.0041662, throughput 2.83313K wps
[Epoch 19 Batch 90/162] avg loss 0.00422461, throughput 2.83383K wps
[Epoch 19 Batch 120/162] avg loss 0.00392326, throughput 2.83816K wps
[Epoch 19 Batch 150/162] avg loss 0.00409339, throughput 2.83723K wps
Begin Testing...
[Epoch 19] train avg loss 0.00412547, dev acc 0.9122, dev avg loss 0.232744, throughput 2.84716K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.0041724, throughput 2.89851K wps
[Epoch 20 Batch 60/162] avg loss 0.00372187, throughput 2.8369K wps
[Epoch 20 Batch 90/162] avg loss 0.00396557, throughput 2.83823K wps
[Epoch 20 Batch 120/162] avg loss 0.00442529, throughput 2.84011K wps
[Epoch 20 Batch 150/162] avg loss 0.00390474, throughput 2.83353K wps
Begin Testing...
[Epoch 20] train avg loss 0.00402459, dev acc 0.9122, dev avg loss 0.229959, throughput 2.84889K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.0037796, throughput 2.90574K wps
[Epoch 21 Batch 60/162] avg loss 0.0041537, throughput 2.84355K wps
[Epoch 21 Batch 90/162] avg loss 0.0036968, throughput 2.84432K wps
[Epoch 21 Batch 120/162] avg loss 0.0038533, throughput 2.83227K wps
[Epoch 21 Batch 150/162] avg loss 0.00401509, throughput 2.83936K wps
Begin Testing...
[Epoch 21] train avg loss 0.00389126, dev acc 0.9122, dev avg loss 0.227124, throughput 2.85188K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00375962, throughput 2.90128K wps
[Epoch 22 Batch 60/162] avg loss 0.00355488, throughput 2.85505K wps
[Epoch 22 Batch 90/162] avg loss 0.00369527, throughput 2.84478K wps
[Epoch 22 Batch 120/162] avg loss 0.00399499, throughput 2.83469K wps
[Epoch 22 Batch 150/162] avg loss 0.00361508, throughput 2.83877K wps
Begin Testing...
[Epoch 22] train avg loss 0.00372205, dev acc 0.9156, dev avg loss 0.22492, throughput 2.85372K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00374691, throughput 2.91537K wps
[Epoch 23 Batch 60/162] avg loss 0.00407385, throughput 2.8355K wps
[Epoch 23 Batch 90/162] avg loss 0.00366879, throughput 2.83915K wps
[Epoch 23 Batch 120/162] avg loss 0.00378582, throughput 2.84785K wps
[Epoch 23 Batch 150/162] avg loss 0.00338132, throughput 2.84516K wps
Begin Testing...
[Epoch 23] train avg loss 0.00373302, dev acc 0.9156, dev avg loss 0.221872, throughput 2.8546K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/162] avg loss 0.0037078, throughput 2.911K wps
[Epoch 24 Batch 60/162] avg loss 0.00357885, throughput 2.83316K wps
[Epoch 24 Batch 90/162] avg loss 0.00347431, throughput 2.83857K wps
[Epoch 24 Batch 120/162] avg loss 0.00370265, throughput 2.85298K wps
[Epoch 24 Batch 150/162] avg loss 0.00346873, throughput 2.86139K wps
Begin Testing...
[Epoch 24] train avg loss 0.00357847, dev acc 0.9156, dev avg loss 0.219768, throughput 2.85849K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00361172, throughput 2.9087K wps
[Epoch 25 Batch 60/162] avg loss 0.00346003, throughput 2.85296K wps
[Epoch 25 Batch 90/162] avg loss 0.00330725, throughput 2.84138K wps
[Epoch 25 Batch 120/162] avg loss 0.00336411, throughput 2.83605K wps
[Epoch 25 Batch 150/162] avg loss 0.00347455, throughput 2.84929K wps
Begin Testing...
[Epoch 25] train avg loss 0.0034686, dev acc 0.9167, dev avg loss 0.21764, throughput 2.85699K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/162] avg loss 0.00402475, throughput 2.91732K wps
[Epoch 26 Batch 60/162] avg loss 0.00308861, throughput 2.84346K wps
[Epoch 26 Batch 90/162] avg loss 0.00344927, throughput 2.83842K wps
[Epoch 26 Batch 120/162] avg loss 0.00319433, throughput 2.83673K wps
[Epoch 26 Batch 150/162] avg loss 0.00359479, throughput 2.83281K wps
Begin Testing...
[Epoch 26] train avg loss 0.00346199, dev acc 0.9144, dev avg loss 0.215693, throughput 2.85204K wps
[Epoch 27 Batch 30/162] avg loss 0.00344918, throughput 2.92039K wps
[Epoch 27 Batch 60/162] avg loss 0.00317007, throughput 2.84601K wps
[Epoch 27 Batch 90/162] avg loss 0.00338217, throughput 2.83236K wps
[Epoch 27 Batch 120/162] avg loss 0.00352161, throughput 2.84779K wps
[Epoch 27 Batch 150/162] avg loss 0.00331657, throughput 2.8374K wps
Begin Testing...
[Epoch 27] train avg loss 0.00336357, dev acc 0.9167, dev avg loss 0.213586, throughput 2.8552K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00322677, throughput 2.90811K wps
[Epoch 28 Batch 60/162] avg loss 0.00316121, throughput 2.83138K wps
[Epoch 28 Batch 90/162] avg loss 0.00334509, throughput 2.84301K wps
[Epoch 28 Batch 120/162] avg loss 0.00319555, throughput 2.84637K wps
[Epoch 28 Batch 150/162] avg loss 0.00327188, throughput 2.83717K wps
Begin Testing...
[Epoch 28] train avg loss 0.00321139, dev acc 0.9167, dev avg loss 0.211811, throughput 2.8521K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00341063, throughput 2.89798K wps
[Epoch 29 Batch 60/162] avg loss 0.00335957, throughput 2.83847K wps
[Epoch 29 Batch 90/162] avg loss 0.00313581, throughput 2.84212K wps
[Epoch 29 Batch 120/162] avg loss 0.0029833, throughput 2.84332K wps
[Epoch 29 Batch 150/162] avg loss 0.00287808, throughput 2.83148K wps
Begin Testing...
[Epoch 29] train avg loss 0.0031229, dev acc 0.9156, dev avg loss 0.210349, throughput 2.84832K wps
[Epoch 30 Batch 30/162] avg loss 0.00310098, throughput 2.91188K wps
[Epoch 30 Batch 60/162] avg loss 0.00271556, throughput 2.83397K wps
[Epoch 30 Batch 90/162] avg loss 0.00294585, throughput 2.83365K wps
[Epoch 30 Batch 120/162] avg loss 0.003139, throughput 2.83596K wps
[Epoch 30 Batch 150/162] avg loss 0.00313642, throughput 2.85111K wps
Begin Testing...
[Epoch 30] train avg loss 0.00304875, dev acc 0.9167, dev avg loss 0.209073, throughput 2.85136K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00309116, throughput 2.89743K wps
[Epoch 31 Batch 60/162] avg loss 0.00288289, throughput 2.8317K wps
[Epoch 31 Batch 90/162] avg loss 0.00329923, throughput 2.83151K wps
[Epoch 31 Batch 120/162] avg loss 0.00282542, throughput 2.82495K wps
[Epoch 31 Batch 150/162] avg loss 0.00283972, throughput 2.84919K wps
Begin Testing...
[Epoch 31] train avg loss 0.00297711, dev acc 0.9178, dev avg loss 0.207018, throughput 2.84589K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00308695, throughput 2.90873K wps
[Epoch 32 Batch 60/162] avg loss 0.00269273, throughput 2.84269K wps
[Epoch 32 Batch 90/162] avg loss 0.00276244, throughput 2.84524K wps
[Epoch 32 Batch 120/162] avg loss 0.00300614, throughput 2.83272K wps
[Epoch 32 Batch 150/162] avg loss 0.00310024, throughput 2.84853K wps
Begin Testing...
[Epoch 32] train avg loss 0.0029569, dev acc 0.9200, dev avg loss 0.206475, throughput 2.85431K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/162] avg loss 0.00262098, throughput 2.90021K wps
[Epoch 33 Batch 60/162] avg loss 0.00292666, throughput 2.8509K wps
[Epoch 33 Batch 90/162] avg loss 0.00296333, throughput 2.83591K wps
[Epoch 33 Batch 120/162] avg loss 0.00296343, throughput 2.84537K wps
[Epoch 33 Batch 150/162] avg loss 0.00265802, throughput 2.84051K wps
Begin Testing...
[Epoch 33] train avg loss 0.00280718, dev acc 0.9200, dev avg loss 0.204304, throughput 2.85276K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00264441, throughput 2.90935K wps
[Epoch 34 Batch 60/162] avg loss 0.00250776, throughput 2.83036K wps
[Epoch 34 Batch 90/162] avg loss 0.00261728, throughput 2.83898K wps
[Epoch 34 Batch 120/162] avg loss 0.0029601, throughput 2.83824K wps
[Epoch 34 Batch 150/162] avg loss 0.00294987, throughput 2.85163K wps
Begin Testing...
[Epoch 34] train avg loss 0.00272513, dev acc 0.9233, dev avg loss 0.203841, throughput 2.85226K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00279708, throughput 2.90483K wps
[Epoch 35 Batch 60/162] avg loss 0.00272807, throughput 2.83475K wps
[Epoch 35 Batch 90/162] avg loss 0.00252305, throughput 2.84728K wps
[Epoch 35 Batch 120/162] avg loss 0.00269388, throughput 2.8422K wps
[Epoch 35 Batch 150/162] avg loss 0.00258039, throughput 2.84141K wps
Begin Testing...
[Epoch 35] train avg loss 0.00266572, dev acc 0.9211, dev avg loss 0.203053, throughput 2.8528K wps
[Epoch 36 Batch 30/162] avg loss 0.00300612, throughput 2.90105K wps
[Epoch 36 Batch 60/162] avg loss 0.00258677, throughput 2.81946K wps
[Epoch 36 Batch 90/162] avg loss 0.00269211, throughput 2.8316K wps
[Epoch 36 Batch 120/162] avg loss 0.00263407, throughput 2.82671K wps
[Epoch 36 Batch 150/162] avg loss 0.00232923, throughput 2.83979K wps
Begin Testing...
[Epoch 36] train avg loss 0.00264363, dev acc 0.9233, dev avg loss 0.20106, throughput 2.84281K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00283289, throughput 2.91156K wps
[Epoch 37 Batch 60/162] avg loss 0.00249935, throughput 2.82542K wps
[Epoch 37 Batch 90/162] avg loss 0.00222129, throughput 2.82915K wps
[Epoch 37 Batch 120/162] avg loss 0.00239332, throughput 2.83893K wps
[Epoch 37 Batch 150/162] avg loss 0.002533, throughput 2.82867K wps
Begin Testing...
[Epoch 37] train avg loss 0.0024967, dev acc 0.9233, dev avg loss 0.200431, throughput 2.84557K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/162] avg loss 0.00223765, throughput 2.89288K wps
[Epoch 38 Batch 60/162] avg loss 0.00254764, throughput 2.83167K wps
[Epoch 38 Batch 90/162] avg loss 0.00259859, throughput 2.82323K wps
[Epoch 38 Batch 120/162] avg loss 0.00240099, throughput 2.8317K wps
[Epoch 38 Batch 150/162] avg loss 0.00257976, throughput 2.83456K wps
Begin Testing...
[Epoch 38] train avg loss 0.00246492, dev acc 0.9211, dev avg loss 0.198937, throughput 2.8414K wps
[Epoch 39 Batch 30/162] avg loss 0.00238578, throughput 2.90016K wps
[Epoch 39 Batch 60/162] avg loss 0.00231533, throughput 2.83268K wps
[Epoch 39 Batch 90/162] avg loss 0.00217344, throughput 2.8202K wps
[Epoch 39 Batch 120/162] avg loss 0.0024308, throughput 2.83752K wps
[Epoch 39 Batch 150/162] avg loss 0.00263748, throughput 2.8247K wps
Begin Testing...
[Epoch 39] train avg loss 0.00237972, dev acc 0.9222, dev avg loss 0.198049, throughput 2.8414K wps
[Epoch 40 Batch 30/162] avg loss 0.00221866, throughput 2.89647K wps
[Epoch 40 Batch 60/162] avg loss 0.0021455, throughput 2.83319K wps
[Epoch 40 Batch 90/162] avg loss 0.00258721, throughput 2.83918K wps
[Epoch 40 Batch 120/162] avg loss 0.00246151, throughput 2.82607K wps
[Epoch 40 Batch 150/162] avg loss 0.0020601, throughput 2.83226K wps
Begin Testing...
[Epoch 40] train avg loss 0.00230504, dev acc 0.9244, dev avg loss 0.197067, throughput 2.84393K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/162] avg loss 0.00218509, throughput 2.90048K wps
[Epoch 41 Batch 60/162] avg loss 0.00216948, throughput 2.82798K wps
[Epoch 41 Batch 90/162] avg loss 0.00239903, throughput 2.84978K wps
[Epoch 41 Batch 120/162] avg loss 0.002418, throughput 2.83261K wps
[Epoch 41 Batch 150/162] avg loss 0.00223519, throughput 2.84038K wps
Begin Testing...
[Epoch 41] train avg loss 0.00230269, dev acc 0.9267, dev avg loss 0.197416, throughput 2.84926K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/162] avg loss 0.00219776, throughput 2.9061K wps
[Epoch 42 Batch 60/162] avg loss 0.00240096, throughput 2.8423K wps
[Epoch 42 Batch 90/162] avg loss 0.00211975, throughput 2.84782K wps
[Epoch 42 Batch 120/162] avg loss 0.00209473, throughput 2.83877K wps
[Epoch 42 Batch 150/162] avg loss 0.00223135, throughput 2.83801K wps
Begin Testing...
[Epoch 42] train avg loss 0.00221572, dev acc 0.9267, dev avg loss 0.195167, throughput 2.85404K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/162] avg loss 0.00211338, throughput 2.90355K wps
[Epoch 43 Batch 60/162] avg loss 0.00204494, throughput 2.84511K wps
[Epoch 43 Batch 90/162] avg loss 0.0021897, throughput 2.84129K wps
[Epoch 43 Batch 120/162] avg loss 0.00217545, throughput 2.8279K wps
[Epoch 43 Batch 150/162] avg loss 0.00208255, throughput 2.84821K wps
Begin Testing...
[Epoch 43] train avg loss 0.0021085, dev acc 0.9256, dev avg loss 0.194434, throughput 2.85235K wps
[Epoch 44 Batch 30/162] avg loss 0.00214858, throughput 2.91025K wps
[Epoch 44 Batch 60/162] avg loss 0.00216616, throughput 2.83251K wps
[Epoch 44 Batch 90/162] avg loss 0.00210871, throughput 2.83283K wps
[Epoch 44 Batch 120/162] avg loss 0.00210802, throughput 2.8413K wps
[Epoch 44 Batch 150/162] avg loss 0.00183967, throughput 2.84599K wps
Begin Testing...
[Epoch 44] train avg loss 0.00208623, dev acc 0.9289, dev avg loss 0.193162, throughput 2.85076K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/162] avg loss 0.00192515, throughput 2.9224K wps
[Epoch 45 Batch 60/162] avg loss 0.00202944, throughput 2.83561K wps
[Epoch 45 Batch 90/162] avg loss 0.00198063, throughput 2.84279K wps
[Epoch 45 Batch 120/162] avg loss 0.00198954, throughput 2.83484K wps
[Epoch 45 Batch 150/162] avg loss 0.00217147, throughput 2.83391K wps
Begin Testing...
[Epoch 45] train avg loss 0.00200001, dev acc 0.9267, dev avg loss 0.192648, throughput 2.8527K wps
[Epoch 46 Batch 30/162] avg loss 0.00197594, throughput 2.90633K wps
[Epoch 46 Batch 60/162] avg loss 0.00214851, throughput 2.84332K wps
[Epoch 46 Batch 90/162] avg loss 0.00168587, throughput 2.84444K wps
[Epoch 46 Batch 120/162] avg loss 0.00198328, throughput 2.84179K wps
[Epoch 46 Batch 150/162] avg loss 0.00187407, throughput 2.8435K wps
Begin Testing...
[Epoch 46] train avg loss 0.00194347, dev acc 0.9256, dev avg loss 0.192406, throughput 2.85428K wps
[Epoch 47 Batch 30/162] avg loss 0.00178976, throughput 2.90437K wps
[Epoch 47 Batch 60/162] avg loss 0.00206766, throughput 2.83877K wps
[Epoch 47 Batch 90/162] avg loss 0.00175742, throughput 2.83051K wps
[Epoch 47 Batch 120/162] avg loss 0.00189424, throughput 2.84333K wps
[Epoch 47 Batch 150/162] avg loss 0.00222871, throughput 2.83458K wps
Begin Testing...
[Epoch 47] train avg loss 0.00196204, dev acc 0.9289, dev avg loss 0.191599, throughput 2.84794K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00174392, throughput 2.88572K wps
[Epoch 48 Batch 60/162] avg loss 0.00184949, throughput 2.83097K wps
[Epoch 48 Batch 90/162] avg loss 0.00209723, throughput 2.83643K wps
[Epoch 48 Batch 120/162] avg loss 0.00188962, throughput 2.83781K wps
[Epoch 48 Batch 150/162] avg loss 0.00168696, throughput 2.84389K wps
Begin Testing...
[Epoch 48] train avg loss 0.00182395, dev acc 0.9289, dev avg loss 0.190606, throughput 2.84536K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00190139, throughput 2.89754K wps
[Epoch 49 Batch 60/162] avg loss 0.00186958, throughput 2.83751K wps
[Epoch 49 Batch 90/162] avg loss 0.00182204, throughput 2.83407K wps
[Epoch 49 Batch 120/162] avg loss 0.00166898, throughput 2.83432K wps
[Epoch 49 Batch 150/162] avg loss 0.0018332, throughput 2.83616K wps
Begin Testing...
[Epoch 49] train avg loss 0.00180121, dev acc 0.9289, dev avg loss 0.19014, throughput 2.84706K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/162] avg loss 0.00184379, throughput 2.88593K wps
[Epoch 50 Batch 60/162] avg loss 0.00171805, throughput 2.83072K wps
[Epoch 50 Batch 90/162] avg loss 0.0017333, throughput 2.84061K wps
[Epoch 50 Batch 120/162] avg loss 0.00167107, throughput 2.83265K wps
[Epoch 50 Batch 150/162] avg loss 0.00172321, throughput 2.83566K wps
Begin Testing...
[Epoch 50] train avg loss 0.00175066, dev acc 0.9289, dev avg loss 0.189733, throughput 2.8459K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/162] avg loss 0.00176195, throughput 2.91163K wps
[Epoch 51 Batch 60/162] avg loss 0.00163816, throughput 2.84795K wps
[Epoch 51 Batch 90/162] avg loss 0.00186228, throughput 2.84115K wps
[Epoch 51 Batch 120/162] avg loss 0.00166291, throughput 2.84038K wps
[Epoch 51 Batch 150/162] avg loss 0.00157746, throughput 2.84302K wps
Begin Testing...
[Epoch 51] train avg loss 0.00171292, dev acc 0.9300, dev avg loss 0.188829, throughput 2.85546K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.0017354, throughput 2.90982K wps
[Epoch 52 Batch 60/162] avg loss 0.00163848, throughput 2.84658K wps
[Epoch 52 Batch 90/162] avg loss 0.00169282, throughput 2.83037K wps
[Epoch 52 Batch 120/162] avg loss 0.00160424, throughput 2.83723K wps
[Epoch 52 Batch 150/162] avg loss 0.00155827, throughput 2.83534K wps
Begin Testing...
[Epoch 52] train avg loss 0.00164012, dev acc 0.9300, dev avg loss 0.189608, throughput 2.85091K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00166866, throughput 2.91289K wps
[Epoch 53 Batch 60/162] avg loss 0.00149184, throughput 2.84555K wps
[Epoch 53 Batch 90/162] avg loss 0.00183456, throughput 2.84221K wps
[Epoch 53 Batch 120/162] avg loss 0.0015916, throughput 2.8395K wps
[Epoch 53 Batch 150/162] avg loss 0.00146107, throughput 2.83934K wps
Begin Testing...
[Epoch 53] train avg loss 0.00160831, dev acc 0.9289, dev avg loss 0.1883, throughput 2.85417K wps
[Epoch 54 Batch 30/162] avg loss 0.00141831, throughput 2.91595K wps
[Epoch 54 Batch 60/162] avg loss 0.00164035, throughput 2.83798K wps
[Epoch 54 Batch 90/162] avg loss 0.00156381, throughput 2.84401K wps
[Epoch 54 Batch 120/162] avg loss 0.00145662, throughput 2.83589K wps
[Epoch 54 Batch 150/162] avg loss 0.00171038, throughput 2.83383K wps
Begin Testing...
[Epoch 54] train avg loss 0.00155846, dev acc 0.9311, dev avg loss 0.187836, throughput 2.85278K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/162] avg loss 0.0015412, throughput 2.88645K wps
[Epoch 55 Batch 60/162] avg loss 0.00157233, throughput 2.8359K wps
[Epoch 55 Batch 90/162] avg loss 0.00142593, throughput 2.84353K wps
[Epoch 55 Batch 120/162] avg loss 0.0015118, throughput 2.82917K wps
[Epoch 55 Batch 150/162] avg loss 0.00175022, throughput 2.83654K wps
Begin Testing...
[Epoch 55] train avg loss 0.00156815, dev acc 0.9322, dev avg loss 0.187243, throughput 2.84638K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/162] avg loss 0.00148366, throughput 2.90954K wps
[Epoch 56 Batch 60/162] avg loss 0.00160023, throughput 2.83161K wps
[Epoch 56 Batch 90/162] avg loss 0.00146048, throughput 2.83471K wps
[Epoch 56 Batch 120/162] avg loss 0.0015459, throughput 2.84578K wps
[Epoch 56 Batch 150/162] avg loss 0.0013756, throughput 2.83353K wps
Begin Testing...
[Epoch 56] train avg loss 0.00149979, dev acc 0.9289, dev avg loss 0.186689, throughput 2.84821K wps
[Epoch 57 Batch 30/162] avg loss 0.00148406, throughput 2.91554K wps
[Epoch 57 Batch 60/162] avg loss 0.00145564, throughput 2.82879K wps
[Epoch 57 Batch 90/162] avg loss 0.00148995, throughput 2.84666K wps
[Epoch 57 Batch 120/162] avg loss 0.00143445, throughput 2.84306K wps
[Epoch 57 Batch 150/162] avg loss 0.00137258, throughput 2.83666K wps
Begin Testing...
[Epoch 57] train avg loss 0.00142974, dev acc 0.9300, dev avg loss 0.185912, throughput 2.85307K wps
[Epoch 58 Batch 30/162] avg loss 0.00155744, throughput 2.90649K wps
[Epoch 58 Batch 60/162] avg loss 0.00142941, throughput 2.84063K wps
[Epoch 58 Batch 90/162] avg loss 0.00134137, throughput 2.83745K wps
[Epoch 58 Batch 120/162] avg loss 0.00149516, throughput 2.83894K wps
[Epoch 58 Batch 150/162] avg loss 0.00129279, throughput 2.84895K wps
Begin Testing...
[Epoch 58] train avg loss 0.00141236, dev acc 0.9289, dev avg loss 0.187162, throughput 2.85284K wps
[Epoch 59 Batch 30/162] avg loss 0.00149277, throughput 2.91698K wps
[Epoch 59 Batch 60/162] avg loss 0.00140666, throughput 2.83114K wps
[Epoch 59 Batch 90/162] avg loss 0.00145836, throughput 2.83552K wps
[Epoch 59 Batch 120/162] avg loss 0.00134758, throughput 2.84628K wps
[Epoch 59 Batch 150/162] avg loss 0.00137173, throughput 2.85076K wps
Begin Testing...
[Epoch 59] train avg loss 0.00140682, dev acc 0.9344, dev avg loss 0.18591, throughput 2.85394K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/162] avg loss 0.00139762, throughput 2.91004K wps
[Epoch 60 Batch 60/162] avg loss 0.00136822, throughput 2.85187K wps
[Epoch 60 Batch 90/162] avg loss 0.00151407, throughput 2.83715K wps
[Epoch 60 Batch 120/162] avg loss 0.00129289, throughput 2.84765K wps
[Epoch 60 Batch 150/162] avg loss 0.00131781, throughput 2.849K wps
Begin Testing...
[Epoch 60] train avg loss 0.00137371, dev acc 0.9311, dev avg loss 0.185464, throughput 2.85797K wps
[Epoch 61 Batch 30/162] avg loss 0.00123346, throughput 2.91253K wps
[Epoch 61 Batch 60/162] avg loss 0.0013987, throughput 2.84341K wps
[Epoch 61 Batch 90/162] avg loss 0.00137032, throughput 2.835K wps
[Epoch 61 Batch 120/162] avg loss 0.00139573, throughput 2.84088K wps
[Epoch 61 Batch 150/162] avg loss 0.00131033, throughput 2.83731K wps
Begin Testing...
[Epoch 61] train avg loss 0.00134708, dev acc 0.9322, dev avg loss 0.184912, throughput 2.85286K wps
[Epoch 62 Batch 30/162] avg loss 0.00124662, throughput 2.91968K wps
[Epoch 62 Batch 60/162] avg loss 0.00126246, throughput 2.84647K wps
[Epoch 62 Batch 90/162] avg loss 0.00129998, throughput 2.83206K wps
[Epoch 62 Batch 120/162] avg loss 0.0013145, throughput 2.84074K wps
[Epoch 62 Batch 150/162] avg loss 0.001311, throughput 2.83382K wps
Begin Testing...
[Epoch 62] train avg loss 0.001282, dev acc 0.9311, dev avg loss 0.185878, throughput 2.8532K wps
[Epoch 63 Batch 30/162] avg loss 0.00129871, throughput 2.90738K wps
[Epoch 63 Batch 60/162] avg loss 0.00126869, throughput 2.83359K wps
[Epoch 63 Batch 90/162] avg loss 0.00124071, throughput 2.83915K wps
[Epoch 63 Batch 120/162] avg loss 0.00136249, throughput 2.82676K wps
[Epoch 63 Batch 150/162] avg loss 0.00123686, throughput 2.84432K wps
Begin Testing...
[Epoch 63] train avg loss 0.00128074, dev acc 0.9322, dev avg loss 0.184222, throughput 2.84871K wps
[Epoch 64 Batch 30/162] avg loss 0.00119617, throughput 2.91825K wps
[Epoch 64 Batch 60/162] avg loss 0.00121019, throughput 2.84529K wps
[Epoch 64 Batch 90/162] avg loss 0.00130335, throughput 2.83417K wps
[Epoch 64 Batch 120/162] avg loss 0.00123781, throughput 2.83254K wps
[Epoch 64 Batch 150/162] avg loss 0.00122243, throughput 2.83216K wps
Begin Testing...
[Epoch 64] train avg loss 0.0012351, dev acc 0.9333, dev avg loss 0.183774, throughput 2.84924K wps
[Epoch 65 Batch 30/162] avg loss 0.00120271, throughput 2.90501K wps
[Epoch 65 Batch 60/162] avg loss 0.00113962, throughput 2.82632K wps
[Epoch 65 Batch 90/162] avg loss 0.00110716, throughput 2.83157K wps
[Epoch 65 Batch 120/162] avg loss 0.00129857, throughput 2.83716K wps
[Epoch 65 Batch 150/162] avg loss 0.00118523, throughput 2.83805K wps
Begin Testing...
[Epoch 65] train avg loss 0.00118407, dev acc 0.9322, dev avg loss 0.183849, throughput 2.84744K wps
[Epoch 66 Batch 30/162] avg loss 0.00106894, throughput 2.89423K wps
[Epoch 66 Batch 60/162] avg loss 0.00109213, throughput 2.84183K wps
[Epoch 66 Batch 90/162] avg loss 0.00113959, throughput 2.83937K wps
[Epoch 66 Batch 120/162] avg loss 0.00114865, throughput 2.83652K wps
[Epoch 66 Batch 150/162] avg loss 0.00118804, throughput 2.84512K wps
Begin Testing...
[Epoch 66] train avg loss 0.00112133, dev acc 0.9300, dev avg loss 0.184237, throughput 2.85003K wps
[Epoch 67 Batch 30/162] avg loss 0.00103557, throughput 2.91745K wps
[Epoch 67 Batch 60/162] avg loss 0.00114714, throughput 2.83435K wps
[Epoch 67 Batch 90/162] avg loss 0.00108848, throughput 2.83607K wps
[Epoch 67 Batch 120/162] avg loss 0.00115254, throughput 2.83294K wps
[Epoch 67 Batch 150/162] avg loss 0.00114719, throughput 2.83748K wps
Begin Testing...
[Epoch 67] train avg loss 0.00113039, dev acc 0.9311, dev avg loss 0.183341, throughput 2.85037K wps
[Epoch 68 Batch 30/162] avg loss 0.00116525, throughput 2.8904K wps
[Epoch 68 Batch 60/162] avg loss 0.00122872, throughput 2.84894K wps
[Epoch 68 Batch 90/162] avg loss 0.00102199, throughput 2.83866K wps
[Epoch 68 Batch 120/162] avg loss 0.00112037, throughput 2.84254K wps
[Epoch 68 Batch 150/162] avg loss 0.000963096, throughput 2.84745K wps
Begin Testing...
[Epoch 68] train avg loss 0.00108895, dev acc 0.9333, dev avg loss 0.183225, throughput 2.85361K wps
[Epoch 69 Batch 30/162] avg loss 0.00104626, throughput 2.90806K wps
[Epoch 69 Batch 60/162] avg loss 0.00103931, throughput 2.84582K wps
[Epoch 69 Batch 90/162] avg loss 0.00110898, throughput 2.83499K wps
[Epoch 69 Batch 120/162] avg loss 0.00110468, throughput 2.8388K wps
[Epoch 69 Batch 150/162] avg loss 0.00106734, throughput 2.83773K wps
Begin Testing...
[Epoch 69] train avg loss 0.00108243, dev acc 0.9322, dev avg loss 0.183295, throughput 2.85175K wps
[Epoch 70 Batch 30/162] avg loss 0.00100946, throughput 2.90738K wps
[Epoch 70 Batch 60/162] avg loss 0.00116452, throughput 2.83159K wps
[Epoch 70 Batch 90/162] avg loss 0.00104269, throughput 2.84556K wps
[Epoch 70 Batch 120/162] avg loss 0.00105933, throughput 2.84219K wps
[Epoch 70 Batch 150/162] avg loss 0.000918106, throughput 2.84468K wps
Begin Testing...
[Epoch 70] train avg loss 0.0010514, dev acc 0.9311, dev avg loss 0.18299, throughput 2.85193K wps
[Epoch 71 Batch 30/162] avg loss 0.00110096, throughput 2.92111K wps
[Epoch 71 Batch 60/162] avg loss 0.00102896, throughput 2.84332K wps
[Epoch 71 Batch 90/162] avg loss 0.00104232, throughput 2.84272K wps
[Epoch 71 Batch 120/162] avg loss 0.000943957, throughput 2.8439K wps
[Epoch 71 Batch 150/162] avg loss 0.00107454, throughput 2.84019K wps
Begin Testing...
[Epoch 71] train avg loss 0.00103824, dev acc 0.9344, dev avg loss 0.182898, throughput 2.85555K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/162] avg loss 0.000974019, throughput 2.89492K wps
[Epoch 72 Batch 60/162] avg loss 0.00102375, throughput 2.83115K wps
[Epoch 72 Batch 90/162] avg loss 0.000996869, throughput 2.8436K wps
[Epoch 72 Batch 120/162] avg loss 0.000863472, throughput 2.84109K wps
[Epoch 72 Batch 150/162] avg loss 0.000956585, throughput 2.83996K wps
Begin Testing...
[Epoch 72] train avg loss 0.000961987, dev acc 0.9311, dev avg loss 0.183314, throughput 2.85088K wps
[Epoch 73 Batch 30/162] avg loss 0.000997844, throughput 2.90902K wps
[Epoch 73 Batch 60/162] avg loss 0.00106573, throughput 2.8415K wps
[Epoch 73 Batch 90/162] avg loss 0.000937525, throughput 2.83516K wps
[Epoch 73 Batch 120/162] avg loss 0.000978785, throughput 2.84018K wps
[Epoch 73 Batch 150/162] avg loss 0.000957118, throughput 2.84086K wps
Begin Testing...
[Epoch 73] train avg loss 0.000985755, dev acc 0.9333, dev avg loss 0.182787, throughput 2.85098K wps
[Epoch 74 Batch 30/162] avg loss 0.000957312, throughput 2.89909K wps
[Epoch 74 Batch 60/162] avg loss 0.000862203, throughput 2.82756K wps
[Epoch 74 Batch 90/162] avg loss 0.000838705, throughput 2.82757K wps
[Epoch 74 Batch 120/162] avg loss 0.00100152, throughput 2.82397K wps
[Epoch 74 Batch 150/162] avg loss 0.000922841, throughput 2.83149K wps
Begin Testing...
[Epoch 74] train avg loss 0.000912906, dev acc 0.9311, dev avg loss 0.182811, throughput 2.83866K wps
[Epoch 75 Batch 30/162] avg loss 0.000962457, throughput 2.8984K wps
[Epoch 75 Batch 60/162] avg loss 0.000929875, throughput 2.83308K wps
[Epoch 75 Batch 90/162] avg loss 0.000927322, throughput 2.83607K wps
[Epoch 75 Batch 120/162] avg loss 0.00085694, throughput 2.84203K wps
[Epoch 75 Batch 150/162] avg loss 0.000953598, throughput 2.82874K wps
Begin Testing...
[Epoch 75] train avg loss 0.000952552, dev acc 0.9333, dev avg loss 0.182662, throughput 2.84795K wps
[Epoch 76 Batch 30/162] avg loss 0.000993292, throughput 2.89804K wps
[Epoch 76 Batch 60/162] avg loss 0.000873432, throughput 2.84643K wps
[Epoch 76 Batch 90/162] avg loss 0.000790941, throughput 2.85686K wps
[Epoch 76 Batch 120/162] avg loss 0.000874315, throughput 2.83624K wps
[Epoch 76 Batch 150/162] avg loss 0.000935164, throughput 2.84408K wps
Begin Testing...
[Epoch 76] train avg loss 0.000902938, dev acc 0.9333, dev avg loss 0.182595, throughput 2.85543K wps
[Epoch 77 Batch 30/162] avg loss 0.00100602, throughput 2.90126K wps
[Epoch 77 Batch 60/162] avg loss 0.000799757, throughput 2.8387K wps
[Epoch 77 Batch 90/162] avg loss 0.000870575, throughput 2.83516K wps
[Epoch 77 Batch 120/162] avg loss 0.000857771, throughput 2.83752K wps
[Epoch 77 Batch 150/162] avg loss 0.000819546, throughput 2.84928K wps
Begin Testing...
[Epoch 77] train avg loss 0.00089099, dev acc 0.9311, dev avg loss 0.182305, throughput 2.85127K wps
[Epoch 78 Batch 30/162] avg loss 0.000790523, throughput 2.90743K wps
[Epoch 78 Batch 60/162] avg loss 0.000871937, throughput 2.83875K wps
[Epoch 78 Batch 90/162] avg loss 0.000946967, throughput 2.83389K wps
[Epoch 78 Batch 120/162] avg loss 0.000782921, throughput 2.84535K wps
[Epoch 78 Batch 150/162] avg loss 0.000837696, throughput 2.83598K wps
Begin Testing...
[Epoch 78] train avg loss 0.000843904, dev acc 0.9322, dev avg loss 0.182282, throughput 2.85102K wps
[Epoch 79 Batch 30/162] avg loss 0.000877945, throughput 2.90446K wps
[Epoch 79 Batch 60/162] avg loss 0.000813817, throughput 2.84324K wps
[Epoch 79 Batch 90/162] avg loss 0.00085052, throughput 2.84024K wps
[Epoch 79 Batch 120/162] avg loss 0.000897492, throughput 2.83858K wps
[Epoch 79 Batch 150/162] avg loss 0.000843201, throughput 2.83659K wps
Begin Testing...
[Epoch 79] train avg loss 0.000857228, dev acc 0.9311, dev avg loss 0.183773, throughput 2.85072K wps
[Epoch 80 Batch 30/162] avg loss 0.000875932, throughput 2.90416K wps
[Epoch 80 Batch 60/162] avg loss 0.000837796, throughput 2.83608K wps
[Epoch 80 Batch 90/162] avg loss 0.000887206, throughput 2.83054K wps
[Epoch 80 Batch 120/162] avg loss 0.000869363, throughput 2.83386K wps
[Epoch 80 Batch 150/162] avg loss 0.000711856, throughput 2.84765K wps
Begin Testing...
[Epoch 80] train avg loss 0.000828721, dev acc 0.9333, dev avg loss 0.182639, throughput 2.84951K wps
[Epoch 81 Batch 30/162] avg loss 0.000877208, throughput 2.91122K wps
[Epoch 81 Batch 60/162] avg loss 0.000757106, throughput 2.83745K wps
[Epoch 81 Batch 90/162] avg loss 0.000871243, throughput 2.83635K wps
[Epoch 81 Batch 120/162] avg loss 0.000711886, throughput 2.8392K wps
[Epoch 81 Batch 150/162] avg loss 0.00073823, throughput 2.8158K wps
Begin Testing...
[Epoch 81] train avg loss 0.000806757, dev acc 0.9333, dev avg loss 0.182731, throughput 2.84449K wps
[Epoch 82 Batch 30/162] avg loss 0.00070593, throughput 2.90699K wps
[Epoch 82 Batch 60/162] avg loss 0.00081131, throughput 2.85335K wps
[Epoch 82 Batch 90/162] avg loss 0.000895346, throughput 2.84542K wps
[Epoch 82 Batch 120/162] avg loss 0.000649094, throughput 2.84051K wps
[Epoch 82 Batch 150/162] avg loss 0.000830971, throughput 2.8356K wps
Begin Testing...
[Epoch 82] train avg loss 0.000775025, dev acc 0.9333, dev avg loss 0.182877, throughput 2.85381K wps
[Epoch 83 Batch 30/162] avg loss 0.000774127, throughput 2.91022K wps
[Epoch 83 Batch 60/162] avg loss 0.00069103, throughput 2.85177K wps
[Epoch 83 Batch 90/162] avg loss 0.000844632, throughput 2.82922K wps
[Epoch 83 Batch 120/162] avg loss 0.000722067, throughput 2.84275K wps
[Epoch 83 Batch 150/162] avg loss 0.000770478, throughput 2.84719K wps
Begin Testing...
[Epoch 83] train avg loss 0.00076812, dev acc 0.9322, dev avg loss 0.182722, throughput 2.85355K wps
[Epoch 84 Batch 30/162] avg loss 0.000732252, throughput 2.91708K wps
[Epoch 84 Batch 60/162] avg loss 0.000694478, throughput 2.85017K wps
[Epoch 84 Batch 90/162] avg loss 0.000811342, throughput 2.83888K wps
[Epoch 84 Batch 120/162] avg loss 0.000747596, throughput 2.84301K wps
[Epoch 84 Batch 150/162] avg loss 0.000711958, throughput 2.83321K wps
Begin Testing...
[Epoch 84] train avg loss 0.000740365, dev acc 0.9322, dev avg loss 0.182735, throughput 2.85498K wps
[Epoch 85 Batch 30/162] avg loss 0.000695076, throughput 2.91958K wps
[Epoch 85 Batch 60/162] avg loss 0.000718542, throughput 2.83464K wps
[Epoch 85 Batch 90/162] avg loss 0.000703796, throughput 2.84631K wps
[Epoch 85 Batch 120/162] avg loss 0.000727039, throughput 2.84431K wps
[Epoch 85 Batch 150/162] avg loss 0.000809434, throughput 2.8392K wps
Begin Testing...
[Epoch 85] train avg loss 0.000725051, dev acc 0.9322, dev avg loss 0.182689, throughput 2.85511K wps
[Epoch 86 Batch 30/162] avg loss 0.000712748, throughput 2.92109K wps
[Epoch 86 Batch 60/162] avg loss 0.000778456, throughput 2.83557K wps
[Epoch 86 Batch 90/162] avg loss 0.000718751, throughput 2.84296K wps
[Epoch 86 Batch 120/162] avg loss 0.000629947, throughput 2.84982K wps
[Epoch 86 Batch 150/162] avg loss 0.000655273, throughput 2.84221K wps
Begin Testing...
[Epoch 86] train avg loss 0.000704286, dev acc 0.9344, dev avg loss 0.182772, throughput 2.85517K wps
Observed Improvement.
Begin Testing...
[Epoch 87 Batch 30/162] avg loss 0.000736317, throughput 2.89762K wps
[Epoch 87 Batch 60/162] avg loss 0.000683401, throughput 2.83176K wps
[Epoch 87 Batch 90/162] avg loss 0.000656872, throughput 2.84864K wps
[Epoch 87 Batch 120/162] avg loss 0.000734561, throughput 2.83176K wps
[Epoch 87 Batch 150/162] avg loss 0.00054551, throughput 2.8497K wps
Begin Testing...
[Epoch 87] train avg loss 0.000678004, dev acc 0.9322, dev avg loss 0.18315, throughput 2.85073K wps
[Epoch 88 Batch 30/162] avg loss 0.000694717, throughput 2.908K wps
[Epoch 88 Batch 60/162] avg loss 0.000645286, throughput 2.85121K wps
[Epoch 88 Batch 90/162] avg loss 0.000645438, throughput 2.83255K wps
[Epoch 88 Batch 120/162] avg loss 0.00066593, throughput 2.83763K wps
[Epoch 88 Batch 150/162] avg loss 0.00068665, throughput 2.85068K wps
Begin Testing...
[Epoch 88] train avg loss 0.000664278, dev acc 0.9322, dev avg loss 0.183776, throughput 2.85503K wps
[Epoch 89 Batch 30/162] avg loss 0.000634546, throughput 2.90492K wps
[Epoch 89 Batch 60/162] avg loss 0.000719976, throughput 2.84584K wps
[Epoch 89 Batch 90/162] avg loss 0.000613155, throughput 2.84093K wps
[Epoch 89 Batch 120/162] avg loss 0.000689992, throughput 2.84097K wps
[Epoch 89 Batch 150/162] avg loss 0.000679967, throughput 2.84145K wps
Begin Testing...
[Epoch 89] train avg loss 0.000670998, dev acc 0.9300, dev avg loss 0.182678, throughput 2.85377K wps
[Epoch 90 Batch 30/162] avg loss 0.000625664, throughput 2.90914K wps
[Epoch 90 Batch 60/162] avg loss 0.000681462, throughput 2.82799K wps
[Epoch 90 Batch 90/162] avg loss 0.000692526, throughput 2.8373K wps
[Epoch 90 Batch 120/162] avg loss 0.0005437, throughput 2.8435K wps
[Epoch 90 Batch 150/162] avg loss 0.000672078, throughput 2.83184K wps
Begin Testing...
[Epoch 90] train avg loss 0.000647305, dev acc 0.9333, dev avg loss 0.182869, throughput 2.84899K wps
[Epoch 91 Batch 30/162] avg loss 0.000551784, throughput 2.91363K wps
[Epoch 91 Batch 60/162] avg loss 0.00064759, throughput 2.83384K wps
[Epoch 91 Batch 90/162] avg loss 0.000644026, throughput 2.84512K wps
[Epoch 91 Batch 120/162] avg loss 0.00070908, throughput 2.84163K wps
[Epoch 91 Batch 150/162] avg loss 0.000698754, throughput 2.83258K wps
Begin Testing...
[Epoch 91] train avg loss 0.000645695, dev acc 0.9322, dev avg loss 0.183493, throughput 2.85262K wps
[Epoch 92 Batch 30/162] avg loss 0.000650779, throughput 2.91171K wps
[Epoch 92 Batch 60/162] avg loss 0.00050888, throughput 2.8397K wps
[Epoch 92 Batch 90/162] avg loss 0.000581673, throughput 2.83693K wps
[Epoch 92 Batch 120/162] avg loss 0.000581601, throughput 2.84296K wps
[Epoch 92 Batch 150/162] avg loss 0.000660745, throughput 2.84248K wps
Begin Testing...
[Epoch 92] train avg loss 0.000603448, dev acc 0.9322, dev avg loss 0.183229, throughput 2.85343K wps
[Epoch 93 Batch 30/162] avg loss 0.000626073, throughput 2.9088K wps
[Epoch 93 Batch 60/162] avg loss 0.000590895, throughput 2.84219K wps
[Epoch 93 Batch 90/162] avg loss 0.000615994, throughput 2.85271K wps
[Epoch 93 Batch 120/162] avg loss 0.000641885, throughput 2.83674K wps
[Epoch 93 Batch 150/162] avg loss 0.000575332, throughput 2.83882K wps
Begin Testing...
[Epoch 93] train avg loss 0.00061292, dev acc 0.9322, dev avg loss 0.18352, throughput 2.85421K wps
[Epoch 94 Batch 30/162] avg loss 0.000545733, throughput 2.90547K wps
[Epoch 94 Batch 60/162] avg loss 0.000572616, throughput 2.83129K wps
[Epoch 94 Batch 90/162] avg loss 0.000614079, throughput 2.8461K wps
[Epoch 94 Batch 120/162] avg loss 0.00058759, throughput 2.84342K wps
[Epoch 94 Batch 150/162] avg loss 0.000539575, throughput 2.84792K wps
Begin Testing...
[Epoch 94] train avg loss 0.000581466, dev acc 0.9322, dev avg loss 0.183869, throughput 2.85445K wps
[Epoch 95 Batch 30/162] avg loss 0.000617281, throughput 2.91273K wps
[Epoch 95 Batch 60/162] avg loss 0.000628697, throughput 2.83523K wps
[Epoch 95 Batch 90/162] avg loss 0.00059481, throughput 2.84226K wps
[Epoch 95 Batch 120/162] avg loss 0.000638422, throughput 2.84688K wps
[Epoch 95 Batch 150/162] avg loss 0.000645488, throughput 2.83761K wps
Begin Testing...
[Epoch 95] train avg loss 0.000624611, dev acc 0.9311, dev avg loss 0.184404, throughput 2.8536K wps
[Epoch 96 Batch 30/162] avg loss 0.000611832, throughput 2.89709K wps
[Epoch 96 Batch 60/162] avg loss 0.000504233, throughput 2.84354K wps
[Epoch 96 Batch 90/162] avg loss 0.000625282, throughput 2.84697K wps
[Epoch 96 Batch 120/162] avg loss 0.000520621, throughput 2.83727K wps
[Epoch 96 Batch 150/162] avg loss 0.000594488, throughput 2.83137K wps
Begin Testing...
[Epoch 96] train avg loss 0.000572791, dev acc 0.9333, dev avg loss 0.183708, throughput 2.84965K wps
[Epoch 97 Batch 30/162] avg loss 0.000499526, throughput 2.90652K wps
[Epoch 97 Batch 60/162] avg loss 0.000528186, throughput 2.85294K wps
[Epoch 97 Batch 90/162] avg loss 0.000545473, throughput 2.84057K wps
[Epoch 97 Batch 120/162] avg loss 0.000570977, throughput 2.8327K wps
[Epoch 97 Batch 150/162] avg loss 0.000618407, throughput 2.84751K wps
Begin Testing...
[Epoch 97] train avg loss 0.000558803, dev acc 0.9300, dev avg loss 0.184323, throughput 2.85319K wps
[Epoch 98 Batch 30/162] avg loss 0.000583122, throughput 2.91252K wps
[Epoch 98 Batch 60/162] avg loss 0.000491851, throughput 2.84483K wps
[Epoch 98 Batch 90/162] avg loss 0.000554512, throughput 2.83532K wps
[Epoch 98 Batch 120/162] avg loss 0.000590344, throughput 2.8517K wps
[Epoch 98 Batch 150/162] avg loss 0.000628892, throughput 2.84008K wps
Begin Testing...
[Epoch 98] train avg loss 0.000568472, dev acc 0.9311, dev avg loss 0.184716, throughput 2.85469K wps
[Epoch 99 Batch 30/162] avg loss 0.000555854, throughput 2.9122K wps
[Epoch 99 Batch 60/162] avg loss 0.000538221, throughput 2.84237K wps
[Epoch 99 Batch 90/162] avg loss 0.000549726, throughput 2.83362K wps
[Epoch 99 Batch 120/162] avg loss 0.000543228, throughput 2.84653K wps
[Epoch 99 Batch 150/162] avg loss 0.000559692, throughput 2.83795K wps
Begin Testing...
[Epoch 99] train avg loss 0.000565394, dev acc 0.9322, dev avg loss 0.184811, throughput 2.85262K wps
[Epoch 100 Batch 30/162] avg loss 0.000529773, throughput 2.91578K wps
[Epoch 100 Batch 60/162] avg loss 0.000519793, throughput 2.83184K wps
[Epoch 100 Batch 90/162] avg loss 0.000533748, throughput 2.84805K wps
[Epoch 100 Batch 120/162] avg loss 0.000507176, throughput 2.84392K wps
[Epoch 100 Batch 150/162] avg loss 0.000681935, throughput 2.83576K wps
Begin Testing...
[Epoch 100] train avg loss 0.000546981, dev acc 0.9289, dev avg loss 0.185409, throughput 2.85338K wps
[Epoch 101 Batch 30/162] avg loss 0.000510327, throughput 2.91114K wps
[Epoch 101 Batch 60/162] avg loss 0.00057231, throughput 2.83545K wps
[Epoch 101 Batch 90/162] avg loss 0.000505826, throughput 2.84776K wps
[Epoch 101 Batch 120/162] avg loss 0.000474002, throughput 2.83978K wps
[Epoch 101 Batch 150/162] avg loss 0.000506059, throughput 2.8443K wps
Begin Testing...
[Epoch 101] train avg loss 0.000508416, dev acc 0.9322, dev avg loss 0.184749, throughput 2.85412K wps
[Epoch 102 Batch 30/162] avg loss 0.000495706, throughput 2.91097K wps
[Epoch 102 Batch 60/162] avg loss 0.000567257, throughput 2.84289K wps
[Epoch 102 Batch 90/162] avg loss 0.000536775, throughput 2.84769K wps
[Epoch 102 Batch 120/162] avg loss 0.000539712, throughput 2.83632K wps
[Epoch 102 Batch 150/162] avg loss 0.000528339, throughput 2.83852K wps
Begin Testing...
[Epoch 102] train avg loss 0.000531064, dev acc 0.9322, dev avg loss 0.18469, throughput 2.85324K wps
[Epoch 103 Batch 30/162] avg loss 0.000465758, throughput 2.90252K wps
[Epoch 103 Batch 60/162] avg loss 0.000535708, throughput 2.83487K wps
[Epoch 103 Batch 90/162] avg loss 0.000491496, throughput 2.83523K wps
[Epoch 103 Batch 120/162] avg loss 0.000551284, throughput 2.83808K wps
[Epoch 103 Batch 150/162] avg loss 0.000534475, throughput 2.84185K wps
Begin Testing...
[Epoch 103] train avg loss 0.000505299, dev acc 0.9311, dev avg loss 0.184901, throughput 2.84844K wps
[Epoch 104 Batch 30/162] avg loss 0.000598224, throughput 2.90928K wps
[Epoch 104 Batch 60/162] avg loss 0.000426128, throughput 2.83176K wps
[Epoch 104 Batch 90/162] avg loss 0.000496549, throughput 2.83885K wps
[Epoch 104 Batch 120/162] avg loss 0.000526645, throughput 2.84207K wps
[Epoch 104 Batch 150/162] avg loss 0.000558884, throughput 2.83415K wps
Begin Testing...
[Epoch 104] train avg loss 0.000527997, dev acc 0.9278, dev avg loss 0.185688, throughput 2.84971K wps
[Epoch 105 Batch 30/162] avg loss 0.000514211, throughput 2.91083K wps
[Epoch 105 Batch 60/162] avg loss 0.000554959, throughput 2.8411K wps
[Epoch 105 Batch 90/162] avg loss 0.000427872, throughput 2.84625K wps
[Epoch 105 Batch 120/162] avg loss 0.000513805, throughput 2.83482K wps
[Epoch 105 Batch 150/162] avg loss 0.000512804, throughput 2.84132K wps
Begin Testing...
[Epoch 105] train avg loss 0.000504032, dev acc 0.9311, dev avg loss 0.185012, throughput 2.85556K wps
[Epoch 106 Batch 30/162] avg loss 0.000465137, throughput 2.89987K wps
[Epoch 106 Batch 60/162] avg loss 0.000500815, throughput 2.84088K wps
[Epoch 106 Batch 90/162] avg loss 0.000503217, throughput 2.83501K wps
[Epoch 106 Batch 120/162] avg loss 0.000475388, throughput 2.83549K wps
[Epoch 106 Batch 150/162] avg loss 0.000447305, throughput 2.8457K wps
Begin Testing...
[Epoch 106] train avg loss 0.000476814, dev acc 0.9322, dev avg loss 0.185221, throughput 2.8478K wps
[Epoch 107 Batch 30/162] avg loss 0.000422905, throughput 2.90087K wps
[Epoch 107 Batch 60/162] avg loss 0.000505639, throughput 2.8383K wps
[Epoch 107 Batch 90/162] avg loss 0.000511188, throughput 2.85199K wps
[Epoch 107 Batch 120/162] avg loss 0.000423785, throughput 2.84804K wps
[Epoch 107 Batch 150/162] avg loss 0.00045704, throughput 2.83651K wps
Begin Testing...
[Epoch 107] train avg loss 0.00046395, dev acc 0.9322, dev avg loss 0.185384, throughput 2.85305K wps
[Epoch 108 Batch 30/162] avg loss 0.000444335, throughput 2.90677K wps
[Epoch 108 Batch 60/162] avg loss 0.000480534, throughput 2.84226K wps
[Epoch 108 Batch 90/162] avg loss 0.000466444, throughput 2.84743K wps
[Epoch 108 Batch 120/162] avg loss 0.000474822, throughput 2.85141K wps
[Epoch 108 Batch 150/162] avg loss 0.000477699, throughput 2.84416K wps
Begin Testing...
[Epoch 108] train avg loss 0.000467543, dev acc 0.9278, dev avg loss 0.186317, throughput 2.85764K wps
[Epoch 109 Batch 30/162] avg loss 0.000413285, throughput 2.89535K wps
[Epoch 109 Batch 60/162] avg loss 0.000459082, throughput 2.83729K wps
[Epoch 109 Batch 90/162] avg loss 0.000498399, throughput 2.84902K wps
[Epoch 109 Batch 120/162] avg loss 0.000477842, throughput 2.84017K wps
[Epoch 109 Batch 150/162] avg loss 0.000449629, throughput 2.83678K wps
Begin Testing...
[Epoch 109] train avg loss 0.000467731, dev acc 0.9322, dev avg loss 0.185865, throughput 2.85251K wps
[Epoch 110 Batch 30/162] avg loss 0.000519711, throughput 2.90681K wps
[Epoch 110 Batch 60/162] avg loss 0.000382999, throughput 2.84469K wps
[Epoch 110 Batch 90/162] avg loss 0.000468246, throughput 2.84666K wps
[Epoch 110 Batch 120/162] avg loss 0.000439108, throughput 2.83436K wps
[Epoch 110 Batch 150/162] avg loss 0.000478861, throughput 2.84057K wps
Begin Testing...
[Epoch 110] train avg loss 0.000458329, dev acc 0.9278, dev avg loss 0.186841, throughput 2.85365K wps
[Epoch 111 Batch 30/162] avg loss 0.000464203, throughput 2.90419K wps
[Epoch 111 Batch 60/162] avg loss 0.000441478, throughput 2.85856K wps
[Epoch 111 Batch 90/162] avg loss 0.000374796, throughput 2.8453K wps
[Epoch 111 Batch 120/162] avg loss 0.000429272, throughput 2.84299K wps
[Epoch 111 Batch 150/162] avg loss 0.000564349, throughput 2.83958K wps
Begin Testing...
[Epoch 111] train avg loss 0.000448439, dev acc 0.9300, dev avg loss 0.186696, throughput 2.85821K wps
[Epoch 112 Batch 30/162] avg loss 0.000473743, throughput 2.91002K wps
[Epoch 112 Batch 60/162] avg loss 0.000460063, throughput 2.84238K wps
[Epoch 112 Batch 90/162] avg loss 0.000369553, throughput 2.84166K wps
[Epoch 112 Batch 120/162] avg loss 0.000417048, throughput 2.83049K wps
[Epoch 112 Batch 150/162] avg loss 0.000411311, throughput 2.85071K wps
Begin Testing...
[Epoch 112] train avg loss 0.000429492, dev acc 0.9322, dev avg loss 0.186941, throughput 2.85373K wps
[Epoch 113 Batch 30/162] avg loss 0.000464206, throughput 2.90131K wps
[Epoch 113 Batch 60/162] avg loss 0.000394278, throughput 2.84757K wps
[Epoch 113 Batch 90/162] avg loss 0.000425936, throughput 2.85661K wps
[Epoch 113 Batch 120/162] avg loss 0.000493385, throughput 2.83606K wps
[Epoch 113 Batch 150/162] avg loss 0.000497383, throughput 2.8481K wps
Begin Testing...
[Epoch 113] train avg loss 0.000452347, dev acc 0.9278, dev avg loss 0.187983, throughput 2.85692K wps
[Epoch 114 Batch 30/162] avg loss 0.00036174, throughput 2.91407K wps
[Epoch 114 Batch 60/162] avg loss 0.000427972, throughput 2.85003K wps
[Epoch 114 Batch 90/162] avg loss 0.000461767, throughput 2.85347K wps
[Epoch 114 Batch 120/162] avg loss 0.000401846, throughput 2.83157K wps
[Epoch 114 Batch 150/162] avg loss 0.000473263, throughput 2.84729K wps
Begin Testing...
[Epoch 114] train avg loss 0.000423978, dev acc 0.9311, dev avg loss 0.186506, throughput 2.85885K wps
[Epoch 115 Batch 30/162] avg loss 0.000397419, throughput 2.90908K wps
[Epoch 115 Batch 60/162] avg loss 0.000349459, throughput 2.84533K wps
[Epoch 115 Batch 90/162] avg loss 0.000508043, throughput 2.83922K wps
[Epoch 115 Batch 120/162] avg loss 0.000397317, throughput 2.82964K wps
[Epoch 115 Batch 150/162] avg loss 0.000410985, throughput 2.84502K wps
Begin Testing...
[Epoch 115] train avg loss 0.000413002, dev acc 0.9322, dev avg loss 0.186288, throughput 2.85323K wps
[Epoch 116 Batch 30/162] avg loss 0.000413871, throughput 2.90723K wps
[Epoch 116 Batch 60/162] avg loss 0.000412525, throughput 2.84705K wps
[Epoch 116 Batch 90/162] avg loss 0.000376684, throughput 2.83255K wps
[Epoch 116 Batch 120/162] avg loss 0.000433649, throughput 2.82276K wps
[Epoch 116 Batch 150/162] avg loss 0.000387486, throughput 2.84728K wps
Begin Testing...
[Epoch 116] train avg loss 0.0004116, dev acc 0.9300, dev avg loss 0.187065, throughput 2.8499K wps
[Epoch 117 Batch 30/162] avg loss 0.000415489, throughput 2.91661K wps
[Epoch 117 Batch 60/162] avg loss 0.000399719, throughput 2.84489K wps
[Epoch 117 Batch 90/162] avg loss 0.000396847, throughput 2.85166K wps
[Epoch 117 Batch 120/162] avg loss 0.000394774, throughput 2.84163K wps
[Epoch 117 Batch 150/162] avg loss 0.00041079, throughput 2.84326K wps
Begin Testing...
[Epoch 117] train avg loss 0.000399715, dev acc 0.9322, dev avg loss 0.1864, throughput 2.85953K wps
[Epoch 118 Batch 30/162] avg loss 0.000417722, throughput 2.91368K wps
[Epoch 118 Batch 60/162] avg loss 0.000448824, throughput 2.84074K wps
[Epoch 118 Batch 90/162] avg loss 0.000374592, throughput 2.83876K wps
[Epoch 118 Batch 120/162] avg loss 0.000353481, throughput 2.8338K wps
[Epoch 118 Batch 150/162] avg loss 0.000419149, throughput 2.83562K wps
Begin Testing...
[Epoch 118] train avg loss 0.000402687, dev acc 0.9311, dev avg loss 0.187243, throughput 2.8505K wps
[Epoch 119 Batch 30/162] avg loss 0.000388102, throughput 2.89383K wps
[Epoch 119 Batch 60/162] avg loss 0.000421393, throughput 2.83619K wps
[Epoch 119 Batch 90/162] avg loss 0.000347169, throughput 2.83778K wps
[Epoch 119 Batch 120/162] avg loss 0.000354497, throughput 2.84768K wps
[Epoch 119 Batch 150/162] avg loss 0.000350089, throughput 2.83895K wps
Begin Testing...
[Epoch 119] train avg loss 0.000379517, dev acc 0.9311, dev avg loss 0.187122, throughput 2.85013K wps
[Epoch 120 Batch 30/162] avg loss 0.000358106, throughput 2.90465K wps
[Epoch 120 Batch 60/162] avg loss 0.000371561, throughput 2.84517K wps
[Epoch 120 Batch 90/162] avg loss 0.000417195, throughput 2.84788K wps
[Epoch 120 Batch 120/162] avg loss 0.000342301, throughput 2.83632K wps
[Epoch 120 Batch 150/162] avg loss 0.000404652, throughput 2.84256K wps
Begin Testing...
[Epoch 120] train avg loss 0.000374561, dev acc 0.9311, dev avg loss 0.187287, throughput 2.85408K wps
[Epoch 121 Batch 30/162] avg loss 0.000345255, throughput 2.90476K wps
[Epoch 121 Batch 60/162] avg loss 0.000347101, throughput 2.84863K wps
[Epoch 121 Batch 90/162] avg loss 0.000344831, throughput 2.84609K wps
[Epoch 121 Batch 120/162] avg loss 0.000402338, throughput 2.83695K wps
[Epoch 121 Batch 150/162] avg loss 0.000359338, throughput 2.84475K wps
Begin Testing...
[Epoch 121] train avg loss 0.000360022, dev acc 0.9300, dev avg loss 0.187777, throughput 2.855K wps
[Epoch 122 Batch 30/162] avg loss 0.000371427, throughput 2.90217K wps
[Epoch 122 Batch 60/162] avg loss 0.000320245, throughput 2.85439K wps
[Epoch 122 Batch 90/162] avg loss 0.000404413, throughput 2.86189K wps
[Epoch 122 Batch 120/162] avg loss 0.000356227, throughput 2.83952K wps
[Epoch 122 Batch 150/162] avg loss 0.000367776, throughput 2.84983K wps
Begin Testing...
[Epoch 122] train avg loss 0.000360821, dev acc 0.9300, dev avg loss 0.18886, throughput 2.85981K wps
[Epoch 123 Batch 30/162] avg loss 0.000374699, throughput 2.89604K wps
[Epoch 123 Batch 60/162] avg loss 0.000427826, throughput 2.8535K wps
[Epoch 123 Batch 90/162] avg loss 0.000352377, throughput 2.85576K wps
[Epoch 123 Batch 120/162] avg loss 0.000375445, throughput 2.85094K wps
[Epoch 123 Batch 150/162] avg loss 0.000359524, throughput 2.84695K wps
Begin Testing...
[Epoch 123] train avg loss 0.000376555, dev acc 0.9278, dev avg loss 0.191196, throughput 2.85819K wps
[Epoch 124 Batch 30/162] avg loss 0.00033838, throughput 2.90768K wps
[Epoch 124 Batch 60/162] avg loss 0.000349818, throughput 2.83462K wps
[Epoch 124 Batch 90/162] avg loss 0.000332826, throughput 2.84034K wps
[Epoch 124 Batch 120/162] avg loss 0.000348835, throughput 2.84917K wps
[Epoch 124 Batch 150/162] avg loss 0.000445014, throughput 2.83854K wps
Begin Testing...
[Epoch 124] train avg loss 0.000360146, dev acc 0.9300, dev avg loss 0.188653, throughput 2.85323K wps
[Epoch 125 Batch 30/162] avg loss 0.000360215, throughput 2.90326K wps
[Epoch 125 Batch 60/162] avg loss 0.000285698, throughput 2.83993K wps
[Epoch 125 Batch 90/162] avg loss 0.000375705, throughput 2.84744K wps
[Epoch 125 Batch 120/162] avg loss 0.000299551, throughput 2.83072K wps
[Epoch 125 Batch 150/162] avg loss 0.000335209, throughput 2.84668K wps
Begin Testing...
[Epoch 125] train avg loss 0.000330596, dev acc 0.9311, dev avg loss 0.18887, throughput 2.85198K wps
[Epoch 126 Batch 30/162] avg loss 0.000367205, throughput 2.90844K wps
[Epoch 126 Batch 60/162] avg loss 0.000333055, throughput 2.8452K wps
[Epoch 126 Batch 90/162] avg loss 0.000380445, throughput 2.83401K wps
[Epoch 126 Batch 120/162] avg loss 0.000353744, throughput 2.83767K wps
[Epoch 126 Batch 150/162] avg loss 0.000325867, throughput 2.84043K wps
Begin Testing...
[Epoch 126] train avg loss 0.000350266, dev acc 0.9333, dev avg loss 0.188375, throughput 2.85145K wps
[Epoch 127 Batch 30/162] avg loss 0.000306848, throughput 2.91601K wps
[Epoch 127 Batch 60/162] avg loss 0.000390738, throughput 2.83861K wps
[Epoch 127 Batch 90/162] avg loss 0.000270065, throughput 2.83515K wps
[Epoch 127 Batch 120/162] avg loss 0.00033846, throughput 2.82937K wps
[Epoch 127 Batch 150/162] avg loss 0.000330748, throughput 2.84229K wps
Begin Testing...
[Epoch 127] train avg loss 0.00032649, dev acc 0.9311, dev avg loss 0.18871, throughput 2.85056K wps
[Epoch 128 Batch 30/162] avg loss 0.00034575, throughput 2.90684K wps
[Epoch 128 Batch 60/162] avg loss 0.000334169, throughput 2.84587K wps
[Epoch 128 Batch 90/162] avg loss 0.000287111, throughput 2.83678K wps
[Epoch 128 Batch 120/162] avg loss 0.000358143, throughput 2.83919K wps
[Epoch 128 Batch 150/162] avg loss 0.000331035, throughput 2.83177K wps
Begin Testing...
[Epoch 128] train avg loss 0.000332689, dev acc 0.9300, dev avg loss 0.189518, throughput 2.85191K wps
[Epoch 129 Batch 30/162] avg loss 0.000304716, throughput 2.91362K wps
[Epoch 129 Batch 60/162] avg loss 0.000339835, throughput 2.84039K wps
[Epoch 129 Batch 90/162] avg loss 0.00036196, throughput 2.84157K wps
[Epoch 129 Batch 120/162] avg loss 0.000380155, throughput 2.84803K wps
[Epoch 129 Batch 150/162] avg loss 0.000329311, throughput 2.85637K wps
Begin Testing...
[Epoch 129] train avg loss 0.00034207, dev acc 0.9300, dev avg loss 0.189587, throughput 2.85961K wps
[Epoch 130 Batch 30/162] avg loss 0.000334181, throughput 2.91383K wps
[Epoch 130 Batch 60/162] avg loss 0.000331928, throughput 2.84775K wps
[Epoch 130 Batch 90/162] avg loss 0.000353915, throughput 2.85636K wps
[Epoch 130 Batch 120/162] avg loss 0.000362307, throughput 2.8457K wps
[Epoch 130 Batch 150/162] avg loss 0.000342355, throughput 2.84291K wps
Begin Testing...
[Epoch 130] train avg loss 0.000344613, dev acc 0.9300, dev avg loss 0.189508, throughput 2.86148K wps
[Epoch 131 Batch 30/162] avg loss 0.000283876, throughput 2.91139K wps
[Epoch 131 Batch 60/162] avg loss 0.000371368, throughput 2.84181K wps
[Epoch 131 Batch 90/162] avg loss 0.000354941, throughput 2.83952K wps
[Epoch 131 Batch 120/162] avg loss 0.000307057, throughput 2.82869K wps
[Epoch 131 Batch 150/162] avg loss 0.000346204, throughput 2.84218K wps
Begin Testing...
[Epoch 131] train avg loss 0.000327607, dev acc 0.9278, dev avg loss 0.190565, throughput 2.85245K wps
[Epoch 132 Batch 30/162] avg loss 0.000347255, throughput 2.90935K wps
[Epoch 132 Batch 60/162] avg loss 0.000268174, throughput 2.84766K wps
[Epoch 132 Batch 90/162] avg loss 0.000326132, throughput 2.83176K wps
[Epoch 132 Batch 120/162] avg loss 0.000366651, throughput 2.84572K wps
[Epoch 132 Batch 150/162] avg loss 0.000292787, throughput 2.84751K wps
Begin Testing...
[Epoch 132] train avg loss 0.000320781, dev acc 0.9311, dev avg loss 0.189246, throughput 2.85616K wps
[Epoch 133 Batch 30/162] avg loss 0.000314804, throughput 2.90335K wps
[Epoch 133 Batch 60/162] avg loss 0.000280084, throughput 2.85361K wps
[Epoch 133 Batch 90/162] avg loss 0.000291091, throughput 2.86298K wps
[Epoch 133 Batch 120/162] avg loss 0.000390765, throughput 2.82719K wps
[Epoch 133 Batch 150/162] avg loss 0.000314733, throughput 2.83541K wps
Begin Testing...
[Epoch 133] train avg loss 0.000315851, dev acc 0.9300, dev avg loss 0.189578, throughput 2.8554K wps
[Epoch 134 Batch 30/162] avg loss 0.000320426, throughput 2.90994K wps
[Epoch 134 Batch 60/162] avg loss 0.000337219, throughput 2.83986K wps
[Epoch 134 Batch 90/162] avg loss 0.000292365, throughput 2.84547K wps
[Epoch 134 Batch 120/162] avg loss 0.000280788, throughput 2.8369K wps
[Epoch 134 Batch 150/162] avg loss 0.000278691, throughput 2.84213K wps
Begin Testing...
[Epoch 134] train avg loss 0.00030217, dev acc 0.9278, dev avg loss 0.190181, throughput 2.85473K wps
[Epoch 135 Batch 30/162] avg loss 0.000248723, throughput 2.90961K wps
[Epoch 135 Batch 60/162] avg loss 0.000292555, throughput 2.83919K wps
[Epoch 135 Batch 90/162] avg loss 0.000332161, throughput 2.83618K wps
[Epoch 135 Batch 120/162] avg loss 0.000275859, throughput 2.84977K wps
[Epoch 135 Batch 150/162] avg loss 0.000325947, throughput 2.8417K wps
Begin Testing...
[Epoch 135] train avg loss 0.000293043, dev acc 0.9289, dev avg loss 0.190186, throughput 2.85409K wps
[Epoch 136 Batch 30/162] avg loss 0.000307005, throughput 2.90259K wps
[Epoch 136 Batch 60/162] avg loss 0.000333093, throughput 2.83619K wps
[Epoch 136 Batch 90/162] avg loss 0.00031412, throughput 2.82903K wps
[Epoch 136 Batch 120/162] avg loss 0.000271974, throughput 2.84962K wps
[Epoch 136 Batch 150/162] avg loss 0.000279295, throughput 2.83781K wps
Begin Testing...
[Epoch 136] train avg loss 0.000297367, dev acc 0.9300, dev avg loss 0.190006, throughput 2.84971K wps
[Epoch 137 Batch 30/162] avg loss 0.000320024, throughput 2.90073K wps
[Epoch 137 Batch 60/162] avg loss 0.000292549, throughput 2.8406K wps
[Epoch 137 Batch 90/162] avg loss 0.000331828, throughput 2.84711K wps
[Epoch 137 Batch 120/162] avg loss 0.000286244, throughput 2.83212K wps
[Epoch 137 Batch 150/162] avg loss 0.000320118, throughput 2.84192K wps
Begin Testing...
[Epoch 137] train avg loss 0.000314775, dev acc 0.9300, dev avg loss 0.190098, throughput 2.85229K wps
[Epoch 138 Batch 30/162] avg loss 0.000257545, throughput 2.90719K wps
[Epoch 138 Batch 60/162] avg loss 0.000308076, throughput 2.84033K wps
[Epoch 138 Batch 90/162] avg loss 0.000251067, throughput 2.83634K wps
[Epoch 138 Batch 120/162] avg loss 0.000312842, throughput 2.83502K wps
[Epoch 138 Batch 150/162] avg loss 0.000274576, throughput 2.84783K wps
Begin Testing...
[Epoch 138] train avg loss 0.00028127, dev acc 0.9278, dev avg loss 0.191737, throughput 2.8533K wps
[Epoch 139 Batch 30/162] avg loss 0.000281009, throughput 2.92121K wps
[Epoch 139 Batch 60/162] avg loss 0.000283358, throughput 2.84915K wps
[Epoch 139 Batch 90/162] avg loss 0.000217785, throughput 2.84181K wps
[Epoch 139 Batch 120/162] avg loss 0.000347907, throughput 2.8413K wps
[Epoch 139 Batch 150/162] avg loss 0.000293277, throughput 2.84538K wps
Begin Testing...
[Epoch 139] train avg loss 0.000284132, dev acc 0.9289, dev avg loss 0.190739, throughput 2.85802K wps
[Epoch 140 Batch 30/162] avg loss 0.000308092, throughput 2.89814K wps
[Epoch 140 Batch 60/162] avg loss 0.000297364, throughput 2.83777K wps
[Epoch 140 Batch 90/162] avg loss 0.000288215, throughput 2.83969K wps
[Epoch 140 Batch 120/162] avg loss 0.000255655, throughput 2.84255K wps
[Epoch 140 Batch 150/162] avg loss 0.000262735, throughput 2.84701K wps
Begin Testing...
[Epoch 140] train avg loss 0.00028713, dev acc 0.9311, dev avg loss 0.190831, throughput 2.85084K wps
[Epoch 141 Batch 30/162] avg loss 0.000241726, throughput 2.91596K wps
[Epoch 141 Batch 60/162] avg loss 0.000284126, throughput 2.83511K wps
[Epoch 141 Batch 90/162] avg loss 0.000307432, throughput 2.84207K wps
[Epoch 141 Batch 120/162] avg loss 0.000278822, throughput 2.84292K wps
[Epoch 141 Batch 150/162] avg loss 0.0003274, throughput 2.84731K wps
Begin Testing...
[Epoch 141] train avg loss 0.000288768, dev acc 0.9289, dev avg loss 0.190885, throughput 2.85491K wps
[Epoch 142 Batch 30/162] avg loss 0.000287384, throughput 2.89877K wps
[Epoch 142 Batch 60/162] avg loss 0.000265452, throughput 2.82987K wps
[Epoch 142 Batch 90/162] avg loss 0.000299254, throughput 2.84632K wps
[Epoch 142 Batch 120/162] avg loss 0.000288245, throughput 2.83864K wps
[Epoch 142 Batch 150/162] avg loss 0.000276188, throughput 2.83703K wps
Begin Testing...
[Epoch 142] train avg loss 0.000282703, dev acc 0.9311, dev avg loss 0.190775, throughput 2.84844K wps
[Epoch 143 Batch 30/162] avg loss 0.000264049, throughput 2.90682K wps
[Epoch 143 Batch 60/162] avg loss 0.000289487, throughput 2.844K wps
[Epoch 143 Batch 90/162] avg loss 0.000243455, throughput 2.84402K wps
[Epoch 143 Batch 120/162] avg loss 0.000284826, throughput 2.83707K wps
[Epoch 143 Batch 150/162] avg loss 0.000272939, throughput 2.84175K wps
Begin Testing...
[Epoch 143] train avg loss 0.000269249, dev acc 0.9289, dev avg loss 0.191848, throughput 2.85375K wps
[Epoch 144 Batch 30/162] avg loss 0.000232073, throughput 2.90517K wps
[Epoch 144 Batch 60/162] avg loss 0.000289881, throughput 2.84522K wps
[Epoch 144 Batch 90/162] avg loss 0.000238605, throughput 2.8384K wps
[Epoch 144 Batch 120/162] avg loss 0.000232115, throughput 2.83787K wps
[Epoch 144 Batch 150/162] avg loss 0.000269187, throughput 2.84565K wps
Begin Testing...
[Epoch 144] train avg loss 0.000252351, dev acc 0.9278, dev avg loss 0.191825, throughput 2.85352K wps
[Epoch 145 Batch 30/162] avg loss 0.000305241, throughput 2.91482K wps
[Epoch 145 Batch 60/162] avg loss 0.000283828, throughput 2.85133K wps
[Epoch 145 Batch 90/162] avg loss 0.000258783, throughput 2.8552K wps
[Epoch 145 Batch 120/162] avg loss 0.000208, throughput 2.85046K wps
[Epoch 145 Batch 150/162] avg loss 0.000253175, throughput 2.84298K wps
Begin Testing...
[Epoch 145] train avg loss 0.00026281, dev acc 0.9278, dev avg loss 0.192516, throughput 2.86215K wps
[Epoch 146 Batch 30/162] avg loss 0.000250278, throughput 2.90875K wps
[Epoch 146 Batch 60/162] avg loss 0.00026434, throughput 2.84161K wps
[Epoch 146 Batch 90/162] avg loss 0.000258753, throughput 2.85561K wps
[Epoch 146 Batch 120/162] avg loss 0.000252489, throughput 2.83994K wps
[Epoch 146 Batch 150/162] avg loss 0.000275706, throughput 2.83425K wps
Begin Testing...
[Epoch 146] train avg loss 0.000262581, dev acc 0.9278, dev avg loss 0.192661, throughput 2.85421K wps
[Epoch 147 Batch 30/162] avg loss 0.000216559, throughput 2.90631K wps
[Epoch 147 Batch 60/162] avg loss 0.000224599, throughput 2.85013K wps
[Epoch 147 Batch 90/162] avg loss 0.000283869, throughput 2.8557K wps
[Epoch 147 Batch 120/162] avg loss 0.000262957, throughput 2.84551K wps
[Epoch 147 Batch 150/162] avg loss 0.000271608, throughput 2.83128K wps
Begin Testing...
[Epoch 147] train avg loss 0.000251905, dev acc 0.9278, dev avg loss 0.193697, throughput 2.8552K wps
[Epoch 148 Batch 30/162] avg loss 0.000237275, throughput 2.91341K wps
[Epoch 148 Batch 60/162] avg loss 0.000224778, throughput 2.848K wps
[Epoch 148 Batch 90/162] avg loss 0.000288097, throughput 2.85929K wps
[Epoch 148 Batch 120/162] avg loss 0.000228069, throughput 2.83887K wps
[Epoch 148 Batch 150/162] avg loss 0.000301661, throughput 2.84427K wps
Begin Testing...
[Epoch 148] train avg loss 0.000253056, dev acc 0.9289, dev avg loss 0.192514, throughput 2.86009K wps
[Epoch 149 Batch 30/162] avg loss 0.000266837, throughput 2.90955K wps
[Epoch 149 Batch 60/162] avg loss 0.000238807, throughput 2.84127K wps
[Epoch 149 Batch 90/162] avg loss 0.000271801, throughput 2.85025K wps
[Epoch 149 Batch 120/162] avg loss 0.000304323, throughput 2.84558K wps
[Epoch 149 Batch 150/162] avg loss 0.000255914, throughput 2.83315K wps
Begin Testing...
[Epoch 149] train avg loss 0.000274182, dev acc 0.9278, dev avg loss 0.193385, throughput 2.8552K wps
[Epoch 150 Batch 30/162] avg loss 0.000232553, throughput 2.89963K wps
[Epoch 150 Batch 60/162] avg loss 0.000258057, throughput 2.8495K wps
[Epoch 150 Batch 90/162] avg loss 0.000221153, throughput 2.84437K wps
[Epoch 150 Batch 120/162] avg loss 0.000255175, throughput 2.83229K wps
[Epoch 150 Batch 150/162] avg loss 0.000255913, throughput 2.82878K wps
Begin Testing...
[Epoch 150] train avg loss 0.000248366, dev acc 0.9311, dev avg loss 0.19258, throughput 2.84899K wps
[Epoch 151 Batch 30/162] avg loss 0.000255428, throughput 2.91188K wps
[Epoch 151 Batch 60/162] avg loss 0.000206324, throughput 2.83591K wps
[Epoch 151 Batch 90/162] avg loss 0.000247326, throughput 2.84294K wps
[Epoch 151 Batch 120/162] avg loss 0.000248599, throughput 2.851K wps
[Epoch 151 Batch 150/162] avg loss 0.000261759, throughput 2.83916K wps
Begin Testing...
[Epoch 151] train avg loss 0.000244562, dev acc 0.9300, dev avg loss 0.192869, throughput 2.85428K wps
[Epoch 152 Batch 30/162] avg loss 0.000265019, throughput 2.90654K wps
[Epoch 152 Batch 60/162] avg loss 0.000213607, throughput 2.83189K wps
[Epoch 152 Batch 90/162] avg loss 0.000226394, throughput 2.8497K wps
[Epoch 152 Batch 120/162] avg loss 0.000243852, throughput 2.85438K wps
[Epoch 152 Batch 150/162] avg loss 0.000223322, throughput 2.84076K wps
Begin Testing...
[Epoch 152] train avg loss 0.000236299, dev acc 0.9278, dev avg loss 0.193282, throughput 2.85558K wps
[Epoch 153 Batch 30/162] avg loss 0.000259043, throughput 2.90169K wps
[Epoch 153 Batch 60/162] avg loss 0.000245407, throughput 2.84144K wps
[Epoch 153 Batch 90/162] avg loss 0.000258038, throughput 2.84499K wps
[Epoch 153 Batch 120/162] avg loss 0.000235304, throughput 2.85019K wps
[Epoch 153 Batch 150/162] avg loss 0.000232244, throughput 2.84567K wps
Begin Testing...
[Epoch 153] train avg loss 0.000249195, dev acc 0.9278, dev avg loss 0.193444, throughput 2.85542K wps
[Epoch 154 Batch 30/162] avg loss 0.000252882, throughput 2.89884K wps
[Epoch 154 Batch 60/162] avg loss 0.000282669, throughput 2.84263K wps
[Epoch 154 Batch 90/162] avg loss 0.000234097, throughput 2.84957K wps
[Epoch 154 Batch 120/162] avg loss 0.000206179, throughput 2.84821K wps
[Epoch 154 Batch 150/162] avg loss 0.000227207, throughput 2.83793K wps
Begin Testing...
[Epoch 154] train avg loss 0.000241935, dev acc 0.9300, dev avg loss 0.193316, throughput 2.85597K wps
[Epoch 155 Batch 30/162] avg loss 0.000213742, throughput 2.90151K wps
[Epoch 155 Batch 60/162] avg loss 0.000235793, throughput 2.84866K wps
[Epoch 155 Batch 90/162] avg loss 0.000198635, throughput 2.83976K wps
[Epoch 155 Batch 120/162] avg loss 0.000275481, throughput 2.83625K wps
[Epoch 155 Batch 150/162] avg loss 0.000223239, throughput 2.84084K wps
Begin Testing...
[Epoch 155] train avg loss 0.000228823, dev acc 0.9289, dev avg loss 0.194376, throughput 2.85155K wps
[Epoch 156 Batch 30/162] avg loss 0.000236031, throughput 2.91375K wps
[Epoch 156 Batch 60/162] avg loss 0.000232623, throughput 2.84848K wps
[Epoch 156 Batch 90/162] avg loss 0.00023507, throughput 2.83692K wps
[Epoch 156 Batch 120/162] avg loss 0.000225807, throughput 2.83491K wps
[Epoch 156 Batch 150/162] avg loss 0.00021102, throughput 2.84192K wps
Begin Testing...
[Epoch 156] train avg loss 0.000228972, dev acc 0.9300, dev avg loss 0.193765, throughput 2.85337K wps
[Epoch 157 Batch 30/162] avg loss 0.000255383, throughput 2.90206K wps
[Epoch 157 Batch 60/162] avg loss 0.000229843, throughput 2.84512K wps
[Epoch 157 Batch 90/162] avg loss 0.000288905, throughput 2.83315K wps
[Epoch 157 Batch 120/162] avg loss 0.000204972, throughput 2.84962K wps
[Epoch 157 Batch 150/162] avg loss 0.000255722, throughput 2.84747K wps
Begin Testing...
[Epoch 157] train avg loss 0.000245766, dev acc 0.9278, dev avg loss 0.194108, throughput 2.85437K wps
[Epoch 158 Batch 30/162] avg loss 0.000196342, throughput 2.9158K wps
[Epoch 158 Batch 60/162] avg loss 0.000250998, throughput 2.83653K wps
[Epoch 158 Batch 90/162] avg loss 0.000224832, throughput 2.83458K wps
[Epoch 158 Batch 120/162] avg loss 0.000217174, throughput 2.83905K wps
[Epoch 158 Batch 150/162] avg loss 0.000226463, throughput 2.84315K wps
Begin Testing...
[Epoch 158] train avg loss 0.000226075, dev acc 0.9289, dev avg loss 0.193992, throughput 2.85227K wps
[Epoch 159 Batch 30/162] avg loss 0.000188952, throughput 2.89906K wps
[Epoch 159 Batch 60/162] avg loss 0.000210993, throughput 2.84247K wps
[Epoch 159 Batch 90/162] avg loss 0.00018622, throughput 2.84765K wps
[Epoch 159 Batch 120/162] avg loss 0.000237101, throughput 2.83473K wps
[Epoch 159 Batch 150/162] avg loss 0.000222585, throughput 2.84131K wps
Begin Testing...
[Epoch 159] train avg loss 0.000211595, dev acc 0.9278, dev avg loss 0.194348, throughput 2.85265K wps
[Epoch 160 Batch 30/162] avg loss 0.000231581, throughput 2.90176K wps
[Epoch 160 Batch 60/162] avg loss 0.000225723, throughput 2.83748K wps
[Epoch 160 Batch 90/162] avg loss 0.000248911, throughput 2.84133K wps
[Epoch 160 Batch 120/162] avg loss 0.000209071, throughput 2.83811K wps
[Epoch 160 Batch 150/162] avg loss 0.000235385, throughput 2.84748K wps
Begin Testing...
[Epoch 160] train avg loss 0.00022814, dev acc 0.9311, dev avg loss 0.193953, throughput 2.8525K wps
[Epoch 161 Batch 30/162] avg loss 0.000212818, throughput 2.92656K wps
[Epoch 161 Batch 60/162] avg loss 0.000206364, throughput 2.8496K wps
[Epoch 161 Batch 90/162] avg loss 0.000241508, throughput 2.83998K wps
[Epoch 161 Batch 120/162] avg loss 0.000255077, throughput 2.83705K wps
[Epoch 161 Batch 150/162] avg loss 0.00022405, throughput 2.84453K wps
Begin Testing...
[Epoch 161] train avg loss 0.000231191, dev acc 0.9289, dev avg loss 0.19385, throughput 2.85748K wps
[Epoch 162 Batch 30/162] avg loss 0.000236037, throughput 2.91058K wps
[Epoch 162 Batch 60/162] avg loss 0.000233683, throughput 2.84626K wps
[Epoch 162 Batch 90/162] avg loss 0.000204064, throughput 2.85141K wps
[Epoch 162 Batch 120/162] avg loss 0.000215414, throughput 2.83231K wps
[Epoch 162 Batch 150/162] avg loss 0.000224297, throughput 2.84191K wps
Begin Testing...
[Epoch 162] train avg loss 0.000221002, dev acc 0.9333, dev avg loss 0.194242, throughput 2.85615K wps
[Epoch 163 Batch 30/162] avg loss 0.000254609, throughput 2.91025K wps
[Epoch 163 Batch 60/162] avg loss 0.000231354, throughput 2.83613K wps
[Epoch 163 Batch 90/162] avg loss 0.000225421, throughput 2.83464K wps
[Epoch 163 Batch 120/162] avg loss 0.000203532, throughput 2.83716K wps
[Epoch 163 Batch 150/162] avg loss 0.000197396, throughput 2.85675K wps
Begin Testing...
[Epoch 163] train avg loss 0.000229089, dev acc 0.9289, dev avg loss 0.194852, throughput 2.85516K wps
[Epoch 164 Batch 30/162] avg loss 0.000203118, throughput 2.91192K wps
[Epoch 164 Batch 60/162] avg loss 0.000197604, throughput 2.85338K wps
[Epoch 164 Batch 90/162] avg loss 0.000197085, throughput 2.83297K wps
[Epoch 164 Batch 120/162] avg loss 0.000185756, throughput 2.84009K wps
[Epoch 164 Batch 150/162] avg loss 0.000290935, throughput 2.85414K wps
Begin Testing...
[Epoch 164] train avg loss 0.000217591, dev acc 0.9300, dev avg loss 0.194193, throughput 2.85615K wps
[Epoch 165 Batch 30/162] avg loss 0.000216189, throughput 2.904K wps
[Epoch 165 Batch 60/162] avg loss 0.000245795, throughput 2.83514K wps
[Epoch 165 Batch 90/162] avg loss 0.000240679, throughput 2.83672K wps
[Epoch 165 Batch 120/162] avg loss 0.000199668, throughput 2.84561K wps
[Epoch 165 Batch 150/162] avg loss 0.000205094, throughput 2.84033K wps
Begin Testing...
[Epoch 165] train avg loss 0.000220092, dev acc 0.9289, dev avg loss 0.194746, throughput 2.85056K wps
[Epoch 166 Batch 30/162] avg loss 0.000212229, throughput 2.9224K wps
[Epoch 166 Batch 60/162] avg loss 0.000183494, throughput 2.84563K wps
[Epoch 166 Batch 90/162] avg loss 0.000194275, throughput 2.83743K wps
[Epoch 166 Batch 120/162] avg loss 0.000236444, throughput 2.85408K wps
[Epoch 166 Batch 150/162] avg loss 0.000168795, throughput 2.84226K wps
Begin Testing...
[Epoch 166] train avg loss 0.000206789, dev acc 0.9289, dev avg loss 0.194932, throughput 2.85723K wps
[Epoch 167 Batch 30/162] avg loss 0.000237513, throughput 2.91197K wps
[Epoch 167 Batch 60/162] avg loss 0.000229203, throughput 2.84143K wps
[Epoch 167 Batch 90/162] avg loss 0.000225297, throughput 2.84029K wps
[Epoch 167 Batch 120/162] avg loss 0.000218613, throughput 2.8417K wps
[Epoch 167 Batch 150/162] avg loss 0.000221244, throughput 2.84073K wps
Begin Testing...
[Epoch 167] train avg loss 0.000219966, dev acc 0.9289, dev avg loss 0.195501, throughput 2.85343K wps
[Epoch 168 Batch 30/162] avg loss 0.000193792, throughput 2.90421K wps
[Epoch 168 Batch 60/162] avg loss 0.000215692, throughput 2.84298K wps
[Epoch 168 Batch 90/162] avg loss 0.000252731, throughput 2.84458K wps
[Epoch 168 Batch 120/162] avg loss 0.000233549, throughput 2.85341K wps
[Epoch 168 Batch 150/162] avg loss 0.000183538, throughput 2.83941K wps
Begin Testing...
[Epoch 168] train avg loss 0.000216073, dev acc 0.9300, dev avg loss 0.19553, throughput 2.85565K wps
[Epoch 169 Batch 30/162] avg loss 0.000185364, throughput 2.89928K wps
[Epoch 169 Batch 60/162] avg loss 0.000226762, throughput 2.8248K wps
[Epoch 169 Batch 90/162] avg loss 0.000194856, throughput 2.8361K wps
[Epoch 169 Batch 120/162] avg loss 0.000208881, throughput 2.84211K wps
[Epoch 169 Batch 150/162] avg loss 0.000186759, throughput 2.83466K wps
Begin Testing...
[Epoch 169] train avg loss 0.000198757, dev acc 0.9289, dev avg loss 0.195786, throughput 2.84725K wps
[Epoch 170 Batch 30/162] avg loss 0.000209526, throughput 2.89939K wps
[Epoch 170 Batch 60/162] avg loss 0.000191342, throughput 2.84889K wps
[Epoch 170 Batch 90/162] avg loss 0.00019715, throughput 2.83076K wps
[Epoch 170 Batch 120/162] avg loss 0.000194971, throughput 2.84243K wps
[Epoch 170 Batch 150/162] avg loss 0.000235097, throughput 2.84491K wps
Begin Testing...
[Epoch 170] train avg loss 0.000205509, dev acc 0.9278, dev avg loss 0.196928, throughput 2.85179K wps
[Epoch 171 Batch 30/162] avg loss 0.000205006, throughput 2.89309K wps
[Epoch 171 Batch 60/162] avg loss 0.000212882, throughput 2.83513K wps
[Epoch 171 Batch 90/162] avg loss 0.000187303, throughput 2.85077K wps
[Epoch 171 Batch 120/162] avg loss 0.000210347, throughput 2.84721K wps
[Epoch 171 Batch 150/162] avg loss 0.000169853, throughput 2.83964K wps
Begin Testing...
[Epoch 171] train avg loss 0.000194863, dev acc 0.9278, dev avg loss 0.196541, throughput 2.85234K wps
[Epoch 172 Batch 30/162] avg loss 0.000178182, throughput 2.89887K wps
[Epoch 172 Batch 60/162] avg loss 0.000175459, throughput 2.84346K wps
[Epoch 172 Batch 90/162] avg loss 0.000198492, throughput 2.8419K wps
[Epoch 172 Batch 120/162] avg loss 0.00019082, throughput 2.83697K wps
[Epoch 172 Batch 150/162] avg loss 0.000194167, throughput 2.83591K wps
Begin Testing...
[Epoch 172] train avg loss 0.000185135, dev acc 0.9289, dev avg loss 0.197975, throughput 2.85023K wps
[Epoch 173 Batch 30/162] avg loss 0.000203252, throughput 2.90973K wps
[Epoch 173 Batch 60/162] avg loss 0.000191438, throughput 2.83083K wps
[Epoch 173 Batch 90/162] avg loss 0.000247944, throughput 2.84073K wps
[Epoch 173 Batch 120/162] avg loss 0.000193332, throughput 2.82575K wps
[Epoch 173 Batch 150/162] avg loss 0.000190904, throughput 2.83071K wps
Begin Testing...
[Epoch 173] train avg loss 0.00020455, dev acc 0.9278, dev avg loss 0.197082, throughput 2.84637K wps
[Epoch 174 Batch 30/162] avg loss 0.000181666, throughput 2.89994K wps
[Epoch 174 Batch 60/162] avg loss 0.00019809, throughput 2.83493K wps
[Epoch 174 Batch 90/162] avg loss 0.000183533, throughput 2.84501K wps
[Epoch 174 Batch 120/162] avg loss 0.000169454, throughput 2.83567K wps
[Epoch 174 Batch 150/162] avg loss 0.000186045, throughput 2.82571K wps
Begin Testing...
[Epoch 174] train avg loss 0.000185419, dev acc 0.9278, dev avg loss 0.197741, throughput 2.84724K wps
[Epoch 175 Batch 30/162] avg loss 0.000207718, throughput 2.91392K wps
[Epoch 175 Batch 60/162] avg loss 0.000162048, throughput 2.84466K wps
[Epoch 175 Batch 90/162] avg loss 0.000170627, throughput 2.82709K wps
[Epoch 175 Batch 120/162] avg loss 0.00019171, throughput 2.83985K wps
[Epoch 175 Batch 150/162] avg loss 0.00022704, throughput 2.83987K wps
Begin Testing...
[Epoch 175] train avg loss 0.000191214, dev acc 0.9289, dev avg loss 0.197376, throughput 2.85208K wps
[Epoch 176 Batch 30/162] avg loss 0.000174717, throughput 2.90873K wps
[Epoch 176 Batch 60/162] avg loss 0.000192097, throughput 2.83841K wps
[Epoch 176 Batch 90/162] avg loss 0.000182247, throughput 2.84425K wps
[Epoch 176 Batch 120/162] avg loss 0.000206032, throughput 2.84301K wps
[Epoch 176 Batch 150/162] avg loss 0.000143757, throughput 2.84564K wps
Begin Testing...
[Epoch 176] train avg loss 0.000178782, dev acc 0.9300, dev avg loss 0.197329, throughput 2.85237K wps
[Epoch 177 Batch 30/162] avg loss 0.000172912, throughput 2.90152K wps
[Epoch 177 Batch 60/162] avg loss 0.000208667, throughput 2.84551K wps
[Epoch 177 Batch 90/162] avg loss 0.000169887, throughput 2.8444K wps
[Epoch 177 Batch 120/162] avg loss 0.000179619, throughput 2.83731K wps
[Epoch 177 Batch 150/162] avg loss 0.000210354, throughput 2.83671K wps
Begin Testing...
[Epoch 177] train avg loss 0.000188112, dev acc 0.9278, dev avg loss 0.197718, throughput 2.8525K wps
[Epoch 178 Batch 30/162] avg loss 0.000178548, throughput 2.91411K wps
[Epoch 178 Batch 60/162] avg loss 0.000175445, throughput 2.84872K wps
[Epoch 178 Batch 90/162] avg loss 0.000185885, throughput 2.84775K wps
[Epoch 178 Batch 120/162] avg loss 0.000207287, throughput 2.835K wps
[Epoch 178 Batch 150/162] avg loss 0.000195862, throughput 2.83798K wps
Begin Testing...
[Epoch 178] train avg loss 0.000190231, dev acc 0.9267, dev avg loss 0.198327, throughput 2.85572K wps
[Epoch 179 Batch 30/162] avg loss 0.000180186, throughput 2.90435K wps
[Epoch 179 Batch 60/162] avg loss 0.000181474, throughput 2.83446K wps
[Epoch 179 Batch 90/162] avg loss 0.000172114, throughput 2.84102K wps
[Epoch 179 Batch 120/162] avg loss 0.000165119, throughput 2.84148K wps
[Epoch 179 Batch 150/162] avg loss 0.000184943, throughput 2.84975K wps
Begin Testing...
[Epoch 179] train avg loss 0.000173694, dev acc 0.9278, dev avg loss 0.198129, throughput 2.85363K wps
[Epoch 180 Batch 30/162] avg loss 0.000202294, throughput 2.90306K wps
[Epoch 180 Batch 60/162] avg loss 0.000205664, throughput 2.83914K wps
[Epoch 180 Batch 90/162] avg loss 0.000180947, throughput 2.84367K wps
[Epoch 180 Batch 120/162] avg loss 0.000173043, throughput 2.84744K wps
[Epoch 180 Batch 150/162] avg loss 0.000168928, throughput 2.85041K wps
Begin Testing...
[Epoch 180] train avg loss 0.000187292, dev acc 0.9267, dev avg loss 0.198042, throughput 2.85489K wps
[Epoch 181 Batch 30/162] avg loss 0.000195125, throughput 2.91559K wps
[Epoch 181 Batch 60/162] avg loss 0.000166462, throughput 2.85251K wps
[Epoch 181 Batch 90/162] avg loss 0.000172344, throughput 2.84167K wps
[Epoch 181 Batch 120/162] avg loss 0.00023023, throughput 2.83292K wps
[Epoch 181 Batch 150/162] avg loss 0.000197418, throughput 2.84899K wps
Begin Testing...
[Epoch 181] train avg loss 0.000190568, dev acc 0.9267, dev avg loss 0.198114, throughput 2.85671K wps
[Epoch 182 Batch 30/162] avg loss 0.000159523, throughput 2.92265K wps
[Epoch 182 Batch 60/162] avg loss 0.000195332, throughput 2.83268K wps
[Epoch 182 Batch 90/162] avg loss 0.000173457, throughput 2.84881K wps
[Epoch 182 Batch 120/162] avg loss 0.000181764, throughput 2.83528K wps
[Epoch 182 Batch 150/162] avg loss 0.000186501, throughput 2.82785K wps
Begin Testing...
[Epoch 182] train avg loss 0.000179102, dev acc 0.9278, dev avg loss 0.197849, throughput 2.84968K wps
[Epoch 183 Batch 30/162] avg loss 0.000158263, throughput 2.90395K wps
[Epoch 183 Batch 60/162] avg loss 0.000189481, throughput 2.84245K wps
[Epoch 183 Batch 90/162] avg loss 0.000193333, throughput 2.84131K wps
[Epoch 183 Batch 120/162] avg loss 0.000189622, throughput 2.83575K wps
[Epoch 183 Batch 150/162] avg loss 0.000152242, throughput 2.8326K wps
Begin Testing...
[Epoch 183] train avg loss 0.000174736, dev acc 0.9300, dev avg loss 0.198225, throughput 2.84891K wps
[Epoch 184 Batch 30/162] avg loss 0.000178275, throughput 2.89456K wps
[Epoch 184 Batch 60/162] avg loss 0.000171624, throughput 2.83018K wps
[Epoch 184 Batch 90/162] avg loss 0.000185028, throughput 2.83374K wps
[Epoch 184 Batch 120/162] avg loss 0.000178715, throughput 2.84161K wps
[Epoch 184 Batch 150/162] avg loss 0.00018043, throughput 2.82576K wps
Begin Testing...
[Epoch 184] train avg loss 0.00017611, dev acc 0.9278, dev avg loss 0.198755, throughput 2.84319K wps
[Epoch 185 Batch 30/162] avg loss 0.000167945, throughput 2.8931K wps
[Epoch 185 Batch 60/162] avg loss 0.000171338, throughput 2.82679K wps
[Epoch 185 Batch 90/162] avg loss 0.000165677, throughput 2.82962K wps
[Epoch 185 Batch 120/162] avg loss 0.000149759, throughput 2.8344K wps
[Epoch 185 Batch 150/162] avg loss 0.000192046, throughput 2.84546K wps
Begin Testing...
[Epoch 185] train avg loss 0.000169814, dev acc 0.9278, dev avg loss 0.199286, throughput 2.84565K wps
[Epoch 186 Batch 30/162] avg loss 0.000161716, throughput 2.90985K wps
[Epoch 186 Batch 60/162] avg loss 0.00017267, throughput 2.83536K wps
[Epoch 186 Batch 90/162] avg loss 0.000178584, throughput 2.84784K wps
[Epoch 186 Batch 120/162] avg loss 0.000181837, throughput 2.83657K wps
[Epoch 186 Batch 150/162] avg loss 0.000159493, throughput 2.82857K wps
Begin Testing...
[Epoch 186] train avg loss 0.000171093, dev acc 0.9278, dev avg loss 0.19956, throughput 2.85004K wps
[Epoch 187 Batch 30/162] avg loss 0.000155255, throughput 2.89969K wps
[Epoch 187 Batch 60/162] avg loss 0.000212989, throughput 2.83143K wps
[Epoch 187 Batch 90/162] avg loss 0.000166558, throughput 2.83703K wps
[Epoch 187 Batch 120/162] avg loss 0.000164242, throughput 2.83722K wps
[Epoch 187 Batch 150/162] avg loss 0.000169081, throughput 2.8451K wps
Begin Testing...
[Epoch 187] train avg loss 0.000179565, dev acc 0.9289, dev avg loss 0.200893, throughput 2.84855K wps
[Epoch 188 Batch 30/162] avg loss 0.000194742, throughput 2.91978K wps
[Epoch 188 Batch 60/162] avg loss 0.000148407, throughput 2.84847K wps
[Epoch 188 Batch 90/162] avg loss 0.000136785, throughput 2.82857K wps
[Epoch 188 Batch 120/162] avg loss 0.000164868, throughput 2.83797K wps
[Epoch 188 Batch 150/162] avg loss 0.000164879, throughput 2.83704K wps
Begin Testing...
[Epoch 188] train avg loss 0.000163937, dev acc 0.9289, dev avg loss 0.199697, throughput 2.85342K wps
[Epoch 189 Batch 30/162] avg loss 0.00018879, throughput 2.9094K wps
[Epoch 189 Batch 60/162] avg loss 0.000150953, throughput 2.83667K wps
[Epoch 189 Batch 90/162] avg loss 0.000143316, throughput 2.84201K wps
[Epoch 189 Batch 120/162] avg loss 0.00016459, throughput 2.84513K wps
[Epoch 189 Batch 150/162] avg loss 0.00013669, throughput 2.83788K wps
Begin Testing...
[Epoch 189] train avg loss 0.000159522, dev acc 0.9289, dev avg loss 0.199532, throughput 2.85182K wps
[Epoch 190 Batch 30/162] avg loss 0.000156837, throughput 2.89214K wps
[Epoch 190 Batch 60/162] avg loss 0.000196536, throughput 2.84583K wps
[Epoch 190 Batch 90/162] avg loss 0.000172017, throughput 2.83021K wps
[Epoch 190 Batch 120/162] avg loss 0.000154374, throughput 2.84096K wps
[Epoch 190 Batch 150/162] avg loss 0.000173556, throughput 2.84954K wps
Begin Testing...
[Epoch 190] train avg loss 0.000170988, dev acc 0.9278, dev avg loss 0.200431, throughput 2.85117K wps
[Epoch 191 Batch 30/162] avg loss 0.000152832, throughput 2.90535K wps
[Epoch 191 Batch 60/162] avg loss 0.000167901, throughput 2.84793K wps
[Epoch 191 Batch 90/162] avg loss 0.00018606, throughput 2.83817K wps
[Epoch 191 Batch 120/162] avg loss 0.000179294, throughput 2.84364K wps
[Epoch 191 Batch 150/162] avg loss 0.000159033, throughput 2.84101K wps
Begin Testing...
[Epoch 191] train avg loss 0.000169701, dev acc 0.9278, dev avg loss 0.20039, throughput 2.85217K wps
[Epoch 192 Batch 30/162] avg loss 0.000182594, throughput 2.90559K wps
[Epoch 192 Batch 60/162] avg loss 0.000155131, throughput 2.83659K wps
[Epoch 192 Batch 90/162] avg loss 0.000176508, throughput 2.82329K wps
[Epoch 192 Batch 120/162] avg loss 0.000152633, throughput 2.82904K wps
[Epoch 192 Batch 150/162] avg loss 0.000183379, throughput 2.84019K wps
Begin Testing...
[Epoch 192] train avg loss 0.000168968, dev acc 0.9278, dev avg loss 0.200808, throughput 2.84672K wps
[Epoch 193 Batch 30/162] avg loss 0.000146459, throughput 2.90039K wps
[Epoch 193 Batch 60/162] avg loss 0.000150141, throughput 2.84531K wps
[Epoch 193 Batch 90/162] avg loss 0.000146722, throughput 2.8369K wps
[Epoch 193 Batch 120/162] avg loss 0.00016749, throughput 2.82955K wps
[Epoch 193 Batch 150/162] avg loss 0.000160312, throughput 2.83894K wps
Begin Testing...
[Epoch 193] train avg loss 0.000164186, dev acc 0.9289, dev avg loss 0.200328, throughput 2.84908K wps
[Epoch 194 Batch 30/162] avg loss 0.000192453, throughput 2.90529K wps
[Epoch 194 Batch 60/162] avg loss 0.000153334, throughput 2.84139K wps
[Epoch 194 Batch 90/162] avg loss 0.000196251, throughput 2.83275K wps
[Epoch 194 Batch 120/162] avg loss 0.000139574, throughput 2.83744K wps
[Epoch 194 Batch 150/162] avg loss 0.00013208, throughput 2.82532K wps
Begin Testing...
[Epoch 194] train avg loss 0.000161257, dev acc 0.9278, dev avg loss 0.200448, throughput 2.84642K wps
[Epoch 195 Batch 30/162] avg loss 0.000145504, throughput 2.91109K wps
[Epoch 195 Batch 60/162] avg loss 0.000141232, throughput 2.83623K wps
[Epoch 195 Batch 90/162] avg loss 0.000158807, throughput 2.80954K wps
[Epoch 195 Batch 120/162] avg loss 0.000180762, throughput 2.84468K wps
[Epoch 195 Batch 150/162] avg loss 0.000157619, throughput 2.83779K wps
Begin Testing...
[Epoch 195] train avg loss 0.000158038, dev acc 0.9267, dev avg loss 0.200764, throughput 2.84636K wps
[Epoch 196 Batch 30/162] avg loss 0.00017448, throughput 2.89637K wps
[Epoch 196 Batch 60/162] avg loss 0.000167398, throughput 2.84124K wps
[Epoch 196 Batch 90/162] avg loss 0.000167289, throughput 2.84874K wps
[Epoch 196 Batch 120/162] avg loss 0.000177866, throughput 2.82936K wps
[Epoch 196 Batch 150/162] avg loss 0.000144705, throughput 2.83537K wps
Begin Testing...
[Epoch 196] train avg loss 0.000164056, dev acc 0.9289, dev avg loss 0.201732, throughput 2.85116K wps
[Epoch 197 Batch 30/162] avg loss 0.000156198, throughput 2.90648K wps
[Epoch 197 Batch 60/162] avg loss 0.000164167, throughput 2.83554K wps
[Epoch 197 Batch 90/162] avg loss 0.000180859, throughput 2.82892K wps
[Epoch 197 Batch 120/162] avg loss 0.000172542, throughput 2.83979K wps
[Epoch 197 Batch 150/162] avg loss 0.000138017, throughput 2.84135K wps
Begin Testing...
[Epoch 197] train avg loss 0.000161513, dev acc 0.9267, dev avg loss 0.201006, throughput 2.84982K wps
[Epoch 198 Batch 30/162] avg loss 0.000142549, throughput 2.90129K wps
[Epoch 198 Batch 60/162] avg loss 0.000188285, throughput 2.84648K wps
[Epoch 198 Batch 90/162] avg loss 0.00015599, throughput 2.83105K wps
[Epoch 198 Batch 120/162] avg loss 0.000146221, throughput 2.85088K wps
[Epoch 198 Batch 150/162] avg loss 0.000152628, throughput 2.8255K wps
Begin Testing...
[Epoch 198] train avg loss 0.000155093, dev acc 0.9278, dev avg loss 0.201331, throughput 2.84929K wps
[Epoch 199 Batch 30/162] avg loss 0.000154718, throughput 2.90325K wps
[Epoch 199 Batch 60/162] avg loss 0.000156322, throughput 2.8443K wps
[Epoch 199 Batch 90/162] avg loss 0.00014461, throughput 2.84656K wps
[Epoch 199 Batch 120/162] avg loss 0.00013137, throughput 2.8335K wps
[Epoch 199 Batch 150/162] avg loss 0.000153806, throughput 2.84148K wps
Begin Testing...
[Epoch 199] train avg loss 0.000149218, dev acc 0.9278, dev avg loss 0.201532, throughput 2.8527K wps
Test loss 0.229886, test acc 0.9180
Total time cost 1416.29s
[Epoch 0 Batch 30/162] avg loss 0.0139928, throughput 2.42357K wps
[Epoch 0 Batch 60/162] avg loss 0.0136382, throughput 2.85513K wps
[Epoch 0 Batch 90/162] avg loss 0.0135613, throughput 2.83501K wps
[Epoch 0 Batch 120/162] avg loss 0.013214, throughput 2.84076K wps
[Epoch 0 Batch 150/162] avg loss 0.013078, throughput 2.84397K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134628, dev acc 0.7656, dev avg loss 0.634989, throughput 2.75399K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0126921, throughput 2.90308K wps
[Epoch 1 Batch 60/162] avg loss 0.0126255, throughput 2.84615K wps
[Epoch 1 Batch 90/162] avg loss 0.0122611, throughput 2.83533K wps
[Epoch 1 Batch 120/162] avg loss 0.0119908, throughput 2.83377K wps
[Epoch 1 Batch 150/162] avg loss 0.0118279, throughput 2.84045K wps
Begin Testing...
[Epoch 1] train avg loss 0.0122354, dev acc 0.8522, dev avg loss 0.576682, throughput 2.85056K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0113894, throughput 2.90757K wps
[Epoch 2 Batch 60/162] avg loss 0.0112074, throughput 2.83624K wps
[Epoch 2 Batch 90/162] avg loss 0.011013, throughput 2.84014K wps
[Epoch 2 Batch 120/162] avg loss 0.010666, throughput 2.84718K wps
[Epoch 2 Batch 150/162] avg loss 0.0105835, throughput 2.83765K wps
Begin Testing...
[Epoch 2] train avg loss 0.0109395, dev acc 0.8611, dev avg loss 0.51191, throughput 2.85154K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0101858, throughput 2.88505K wps
[Epoch 3 Batch 60/162] avg loss 0.0101342, throughput 2.84407K wps
[Epoch 3 Batch 90/162] avg loss 0.00941517, throughput 2.82388K wps
[Epoch 3 Batch 120/162] avg loss 0.00947321, throughput 2.84615K wps
[Epoch 3 Batch 150/162] avg loss 0.00908063, throughput 2.83778K wps
Begin Testing...
[Epoch 3] train avg loss 0.00964916, dev acc 0.8667, dev avg loss 0.449545, throughput 2.84478K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.00925274, throughput 2.91897K wps
[Epoch 4 Batch 60/162] avg loss 0.00859359, throughput 2.83671K wps
[Epoch 4 Batch 90/162] avg loss 0.00872379, throughput 2.8415K wps
[Epoch 4 Batch 120/162] avg loss 0.00850744, throughput 2.84671K wps
[Epoch 4 Batch 150/162] avg loss 0.00827162, throughput 2.84372K wps
Begin Testing...
[Epoch 4] train avg loss 0.00862298, dev acc 0.8722, dev avg loss 0.401285, throughput 2.85626K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.00820157, throughput 2.89892K wps
[Epoch 5 Batch 60/162] avg loss 0.0079874, throughput 2.83777K wps
[Epoch 5 Batch 90/162] avg loss 0.00778605, throughput 2.83931K wps
[Epoch 5 Batch 120/162] avg loss 0.00758078, throughput 2.82136K wps
[Epoch 5 Batch 150/162] avg loss 0.00733559, throughput 2.82052K wps
Begin Testing...
[Epoch 5] train avg loss 0.00777852, dev acc 0.8822, dev avg loss 0.366939, throughput 2.84166K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00703992, throughput 2.89922K wps
[Epoch 6 Batch 60/162] avg loss 0.00743656, throughput 2.82931K wps
[Epoch 6 Batch 90/162] avg loss 0.00712293, throughput 2.83323K wps
[Epoch 6 Batch 120/162] avg loss 0.00716823, throughput 2.82538K wps
[Epoch 6 Batch 150/162] avg loss 0.00705884, throughput 2.83026K wps
Begin Testing...
[Epoch 6] train avg loss 0.00716806, dev acc 0.8878, dev avg loss 0.340562, throughput 2.84296K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00705861, throughput 2.89938K wps
[Epoch 7 Batch 60/162] avg loss 0.00691282, throughput 2.82578K wps
[Epoch 7 Batch 90/162] avg loss 0.00669264, throughput 2.83328K wps
[Epoch 7 Batch 120/162] avg loss 0.00646913, throughput 2.83489K wps
[Epoch 7 Batch 150/162] avg loss 0.00656076, throughput 2.83597K wps
Begin Testing...
[Epoch 7] train avg loss 0.00674167, dev acc 0.8944, dev avg loss 0.321678, throughput 2.84403K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00656871, throughput 2.89576K wps
[Epoch 8 Batch 60/162] avg loss 0.0066884, throughput 2.83534K wps
[Epoch 8 Batch 90/162] avg loss 0.00627374, throughput 2.84114K wps
[Epoch 8 Batch 120/162] avg loss 0.00599318, throughput 2.8421K wps
[Epoch 8 Batch 150/162] avg loss 0.00603882, throughput 2.83533K wps
Begin Testing...
[Epoch 8] train avg loss 0.00634184, dev acc 0.8933, dev avg loss 0.307152, throughput 2.84924K wps
[Epoch 9 Batch 30/162] avg loss 0.00625776, throughput 2.90231K wps
[Epoch 9 Batch 60/162] avg loss 0.00605372, throughput 2.83996K wps
[Epoch 9 Batch 90/162] avg loss 0.00590293, throughput 2.83224K wps
[Epoch 9 Batch 120/162] avg loss 0.00560495, throughput 2.84093K wps
[Epoch 9 Batch 150/162] avg loss 0.0061111, throughput 2.84004K wps
Begin Testing...
[Epoch 9] train avg loss 0.0059759, dev acc 0.9022, dev avg loss 0.293582, throughput 2.85106K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00593607, throughput 2.90239K wps
[Epoch 10 Batch 60/162] avg loss 0.00610846, throughput 2.83416K wps
[Epoch 10 Batch 90/162] avg loss 0.00558912, throughput 2.84139K wps
[Epoch 10 Batch 120/162] avg loss 0.00506218, throughput 2.83799K wps
[Epoch 10 Batch 150/162] avg loss 0.00573599, throughput 2.84548K wps
Begin Testing...
[Epoch 10] train avg loss 0.00570317, dev acc 0.9056, dev avg loss 0.283202, throughput 2.85113K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00528995, throughput 2.90904K wps
[Epoch 11 Batch 60/162] avg loss 0.00533007, throughput 2.84828K wps
[Epoch 11 Batch 90/162] avg loss 0.00583882, throughput 2.84953K wps
[Epoch 11 Batch 120/162] avg loss 0.00560021, throughput 2.82983K wps
[Epoch 11 Batch 150/162] avg loss 0.00530019, throughput 2.83044K wps
Begin Testing...
[Epoch 11] train avg loss 0.00549055, dev acc 0.9078, dev avg loss 0.274427, throughput 2.85154K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00538134, throughput 2.89602K wps
[Epoch 12 Batch 60/162] avg loss 0.0053174, throughput 2.84233K wps
[Epoch 12 Batch 90/162] avg loss 0.00522508, throughput 2.84164K wps
[Epoch 12 Batch 120/162] avg loss 0.00543311, throughput 2.82487K wps
[Epoch 12 Batch 150/162] avg loss 0.0050057, throughput 2.81496K wps
Begin Testing...
[Epoch 12] train avg loss 0.00526925, dev acc 0.9078, dev avg loss 0.266698, throughput 2.84344K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00517258, throughput 2.90112K wps
[Epoch 13 Batch 60/162] avg loss 0.00513476, throughput 2.83614K wps
[Epoch 13 Batch 90/162] avg loss 0.00481706, throughput 2.84681K wps
[Epoch 13 Batch 120/162] avg loss 0.00516643, throughput 2.83095K wps
[Epoch 13 Batch 150/162] avg loss 0.00496147, throughput 2.83475K wps
Begin Testing...
[Epoch 13] train avg loss 0.0050657, dev acc 0.9089, dev avg loss 0.260187, throughput 2.85017K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00473346, throughput 2.9098K wps
[Epoch 14 Batch 60/162] avg loss 0.00493008, throughput 2.83524K wps
[Epoch 14 Batch 90/162] avg loss 0.0050811, throughput 2.82701K wps
[Epoch 14 Batch 120/162] avg loss 0.00451625, throughput 2.83053K wps
[Epoch 14 Batch 150/162] avg loss 0.00499171, throughput 2.82846K wps
Begin Testing...
[Epoch 14] train avg loss 0.0048715, dev acc 0.9100, dev avg loss 0.253944, throughput 2.84629K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00461456, throughput 2.89825K wps
[Epoch 15 Batch 60/162] avg loss 0.0048179, throughput 2.83419K wps
[Epoch 15 Batch 90/162] avg loss 0.00473146, throughput 2.82676K wps
[Epoch 15 Batch 120/162] avg loss 0.00479764, throughput 2.82535K wps
[Epoch 15 Batch 150/162] avg loss 0.00481071, throughput 2.8378K wps
Begin Testing...
[Epoch 15] train avg loss 0.00474529, dev acc 0.9100, dev avg loss 0.251876, throughput 2.84165K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00460058, throughput 2.88713K wps
[Epoch 16 Batch 60/162] avg loss 0.00449844, throughput 2.82619K wps
[Epoch 16 Batch 90/162] avg loss 0.00473765, throughput 2.82897K wps
[Epoch 16 Batch 120/162] avg loss 0.00448688, throughput 2.82582K wps
[Epoch 16 Batch 150/162] avg loss 0.0048505, throughput 2.83477K wps
Begin Testing...
[Epoch 16] train avg loss 0.00460528, dev acc 0.9111, dev avg loss 0.244768, throughput 2.8391K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00395343, throughput 2.89601K wps
[Epoch 17 Batch 60/162] avg loss 0.00459927, throughput 2.82296K wps
[Epoch 17 Batch 90/162] avg loss 0.00441842, throughput 2.83218K wps
[Epoch 17 Batch 120/162] avg loss 0.00440981, throughput 2.82764K wps
[Epoch 17 Batch 150/162] avg loss 0.00472089, throughput 2.82329K wps
Begin Testing...
[Epoch 17] train avg loss 0.00442239, dev acc 0.9111, dev avg loss 0.240153, throughput 2.83979K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00485319, throughput 2.90456K wps
[Epoch 18 Batch 60/162] avg loss 0.00415564, throughput 2.83029K wps
[Epoch 18 Batch 90/162] avg loss 0.0041886, throughput 2.82661K wps
[Epoch 18 Batch 120/162] avg loss 0.00442258, throughput 2.82538K wps
[Epoch 18 Batch 150/162] avg loss 0.00379512, throughput 2.83177K wps
Begin Testing...
[Epoch 18] train avg loss 0.00432728, dev acc 0.9122, dev avg loss 0.236525, throughput 2.84198K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00454695, throughput 2.89671K wps
[Epoch 19 Batch 60/162] avg loss 0.00396343, throughput 2.83273K wps
[Epoch 19 Batch 90/162] avg loss 0.00403596, throughput 2.82511K wps
[Epoch 19 Batch 120/162] avg loss 0.0043382, throughput 2.81935K wps
[Epoch 19 Batch 150/162] avg loss 0.00414964, throughput 2.83078K wps
Begin Testing...
[Epoch 19] train avg loss 0.00421737, dev acc 0.9200, dev avg loss 0.236523, throughput 2.83929K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00411707, throughput 2.89886K wps
[Epoch 20 Batch 60/162] avg loss 0.0038819, throughput 2.83115K wps
[Epoch 20 Batch 90/162] avg loss 0.00421721, throughput 2.83606K wps
[Epoch 20 Batch 120/162] avg loss 0.00381494, throughput 2.81886K wps
[Epoch 20 Batch 150/162] avg loss 0.00425598, throughput 2.82691K wps
Begin Testing...
[Epoch 20] train avg loss 0.0040617, dev acc 0.9111, dev avg loss 0.229554, throughput 2.84131K wps
[Epoch 21 Batch 30/162] avg loss 0.0038524, throughput 2.89965K wps
[Epoch 21 Batch 60/162] avg loss 0.00403452, throughput 2.82646K wps
[Epoch 21 Batch 90/162] avg loss 0.0038408, throughput 2.82941K wps
[Epoch 21 Batch 120/162] avg loss 0.00437686, throughput 2.82486K wps
[Epoch 21 Batch 150/162] avg loss 0.00388231, throughput 2.8398K wps
Begin Testing...
[Epoch 21] train avg loss 0.00398927, dev acc 0.9144, dev avg loss 0.227186, throughput 2.84337K wps
[Epoch 22 Batch 30/162] avg loss 0.00400883, throughput 2.92075K wps
[Epoch 22 Batch 60/162] avg loss 0.00387048, throughput 2.8377K wps
[Epoch 22 Batch 90/162] avg loss 0.00372328, throughput 2.83639K wps
[Epoch 22 Batch 120/162] avg loss 0.00403168, throughput 2.83591K wps
[Epoch 22 Batch 150/162] avg loss 0.00366322, throughput 2.84312K wps
Begin Testing...
[Epoch 22] train avg loss 0.00388014, dev acc 0.9200, dev avg loss 0.225192, throughput 2.85382K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00362744, throughput 2.92064K wps
[Epoch 23 Batch 60/162] avg loss 0.00375972, throughput 2.84173K wps
[Epoch 23 Batch 90/162] avg loss 0.0038462, throughput 2.82761K wps
[Epoch 23 Batch 120/162] avg loss 0.00375908, throughput 2.83525K wps
[Epoch 23 Batch 150/162] avg loss 0.00367569, throughput 2.84815K wps
Begin Testing...
[Epoch 23] train avg loss 0.00374616, dev acc 0.9222, dev avg loss 0.223616, throughput 2.85317K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/162] avg loss 0.00365919, throughput 2.9167K wps
[Epoch 24 Batch 60/162] avg loss 0.00368303, throughput 2.83755K wps
[Epoch 24 Batch 90/162] avg loss 0.0035171, throughput 2.84651K wps
[Epoch 24 Batch 120/162] avg loss 0.00342818, throughput 2.85644K wps
[Epoch 24 Batch 150/162] avg loss 0.00408807, throughput 2.83329K wps
Begin Testing...
[Epoch 24] train avg loss 0.00364399, dev acc 0.9167, dev avg loss 0.219271, throughput 2.8562K wps
[Epoch 25 Batch 30/162] avg loss 0.0036545, throughput 2.90789K wps
[Epoch 25 Batch 60/162] avg loss 0.00359894, throughput 2.83712K wps
[Epoch 25 Batch 90/162] avg loss 0.00341322, throughput 2.84245K wps
[Epoch 25 Batch 120/162] avg loss 0.00376222, throughput 2.85395K wps
[Epoch 25 Batch 150/162] avg loss 0.00334377, throughput 2.83581K wps
Begin Testing...
[Epoch 25] train avg loss 0.0035235, dev acc 0.9189, dev avg loss 0.217585, throughput 2.85361K wps
[Epoch 26 Batch 30/162] avg loss 0.00310864, throughput 2.91248K wps
[Epoch 26 Batch 60/162] avg loss 0.00378495, throughput 2.83429K wps
[Epoch 26 Batch 90/162] avg loss 0.00345359, throughput 2.83847K wps
[Epoch 26 Batch 120/162] avg loss 0.00372753, throughput 2.83261K wps
[Epoch 26 Batch 150/162] avg loss 0.00321691, throughput 2.83877K wps
Begin Testing...
[Epoch 26] train avg loss 0.00344522, dev acc 0.9300, dev avg loss 0.216205, throughput 2.85127K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00346308, throughput 2.90296K wps
[Epoch 27 Batch 60/162] avg loss 0.00320654, throughput 2.84704K wps
[Epoch 27 Batch 90/162] avg loss 0.00320069, throughput 2.84359K wps
[Epoch 27 Batch 120/162] avg loss 0.00332923, throughput 2.83635K wps
[Epoch 27 Batch 150/162] avg loss 0.00349292, throughput 2.84448K wps
Begin Testing...
[Epoch 27] train avg loss 0.00332218, dev acc 0.9244, dev avg loss 0.212943, throughput 2.85409K wps
[Epoch 28 Batch 30/162] avg loss 0.0033811, throughput 2.90994K wps
[Epoch 28 Batch 60/162] avg loss 0.00305814, throughput 2.84424K wps
[Epoch 28 Batch 90/162] avg loss 0.00330583, throughput 2.85359K wps
[Epoch 28 Batch 120/162] avg loss 0.00302133, throughput 2.83396K wps
[Epoch 28 Batch 150/162] avg loss 0.00339346, throughput 2.83964K wps
Begin Testing...
[Epoch 28] train avg loss 0.00326727, dev acc 0.9256, dev avg loss 0.211338, throughput 2.85548K wps
[Epoch 29 Batch 30/162] avg loss 0.00318898, throughput 2.90483K wps
[Epoch 29 Batch 60/162] avg loss 0.00285024, throughput 2.84594K wps
[Epoch 29 Batch 90/162] avg loss 0.00305893, throughput 2.84962K wps
[Epoch 29 Batch 120/162] avg loss 0.00340251, throughput 2.83545K wps
[Epoch 29 Batch 150/162] avg loss 0.00304593, throughput 2.84487K wps
Begin Testing...
[Epoch 29] train avg loss 0.0031054, dev acc 0.9278, dev avg loss 0.20969, throughput 2.85483K wps
[Epoch 30 Batch 30/162] avg loss 0.0029205, throughput 2.90381K wps
[Epoch 30 Batch 60/162] avg loss 0.00326903, throughput 2.83523K wps
[Epoch 30 Batch 90/162] avg loss 0.00305568, throughput 2.83212K wps
[Epoch 30 Batch 120/162] avg loss 0.00299213, throughput 2.83539K wps
[Epoch 30 Batch 150/162] avg loss 0.00332984, throughput 2.84403K wps
Begin Testing...
[Epoch 30] train avg loss 0.00308083, dev acc 0.9278, dev avg loss 0.208049, throughput 2.84798K wps
[Epoch 31 Batch 30/162] avg loss 0.00286663, throughput 2.89995K wps
[Epoch 31 Batch 60/162] avg loss 0.00294428, throughput 2.83628K wps
[Epoch 31 Batch 90/162] avg loss 0.00274796, throughput 2.8247K wps
[Epoch 31 Batch 120/162] avg loss 0.00300673, throughput 2.83132K wps
[Epoch 31 Batch 150/162] avg loss 0.00336077, throughput 2.82804K wps
Begin Testing...
[Epoch 31] train avg loss 0.00295267, dev acc 0.9267, dev avg loss 0.207317, throughput 2.84266K wps
[Epoch 32 Batch 30/162] avg loss 0.00305193, throughput 2.91007K wps
[Epoch 32 Batch 60/162] avg loss 0.00294869, throughput 2.8276K wps
[Epoch 32 Batch 90/162] avg loss 0.00283511, throughput 2.82674K wps
[Epoch 32 Batch 120/162] avg loss 0.0027215, throughput 2.84393K wps
[Epoch 32 Batch 150/162] avg loss 0.0029846, throughput 2.83253K wps
Begin Testing...
[Epoch 32] train avg loss 0.0029074, dev acc 0.9278, dev avg loss 0.206096, throughput 2.84736K wps
[Epoch 33 Batch 30/162] avg loss 0.00300061, throughput 2.90377K wps
[Epoch 33 Batch 60/162] avg loss 0.00283444, throughput 2.83805K wps
[Epoch 33 Batch 90/162] avg loss 0.00285563, throughput 2.83838K wps
[Epoch 33 Batch 120/162] avg loss 0.00278824, throughput 2.83367K wps
[Epoch 33 Batch 150/162] avg loss 0.00288621, throughput 2.84496K wps
Begin Testing...
[Epoch 33] train avg loss 0.00289912, dev acc 0.9311, dev avg loss 0.204491, throughput 2.85085K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00285122, throughput 2.91035K wps
[Epoch 34 Batch 60/162] avg loss 0.00292098, throughput 2.84196K wps
[Epoch 34 Batch 90/162] avg loss 0.002954, throughput 2.84096K wps
[Epoch 34 Batch 120/162] avg loss 0.00251253, throughput 2.83667K wps
[Epoch 34 Batch 150/162] avg loss 0.0026252, throughput 2.84392K wps
Begin Testing...
[Epoch 34] train avg loss 0.00278006, dev acc 0.9300, dev avg loss 0.203562, throughput 2.85313K wps
[Epoch 35 Batch 30/162] avg loss 0.00248231, throughput 2.91249K wps
[Epoch 35 Batch 60/162] avg loss 0.00261722, throughput 2.82976K wps
[Epoch 35 Batch 90/162] avg loss 0.00287417, throughput 2.84069K wps
[Epoch 35 Batch 120/162] avg loss 0.00269421, throughput 2.83972K wps
[Epoch 35 Batch 150/162] avg loss 0.00265433, throughput 2.83545K wps
Begin Testing...
[Epoch 35] train avg loss 0.00268253, dev acc 0.9289, dev avg loss 0.20219, throughput 2.8497K wps
[Epoch 36 Batch 30/162] avg loss 0.00257519, throughput 2.9085K wps
[Epoch 36 Batch 60/162] avg loss 0.0024469, throughput 2.84602K wps
[Epoch 36 Batch 90/162] avg loss 0.00249098, throughput 2.83854K wps
[Epoch 36 Batch 120/162] avg loss 0.00286094, throughput 2.83592K wps
[Epoch 36 Batch 150/162] avg loss 0.00248257, throughput 2.85327K wps
Begin Testing...
[Epoch 36] train avg loss 0.00258522, dev acc 0.9244, dev avg loss 0.202355, throughput 2.85511K wps
[Epoch 37 Batch 30/162] avg loss 0.00281521, throughput 2.91566K wps
[Epoch 37 Batch 60/162] avg loss 0.00271529, throughput 2.84199K wps
[Epoch 37 Batch 90/162] avg loss 0.00238027, throughput 2.83925K wps
[Epoch 37 Batch 120/162] avg loss 0.00255207, throughput 2.84104K wps
[Epoch 37 Batch 150/162] avg loss 0.00234365, throughput 2.84478K wps
Begin Testing...
[Epoch 37] train avg loss 0.00253408, dev acc 0.9244, dev avg loss 0.201354, throughput 2.85549K wps
[Epoch 38 Batch 30/162] avg loss 0.00218363, throughput 2.91247K wps
[Epoch 38 Batch 60/162] avg loss 0.00240268, throughput 2.82886K wps
[Epoch 38 Batch 90/162] avg loss 0.00252517, throughput 2.84038K wps
[Epoch 38 Batch 120/162] avg loss 0.00245679, throughput 2.855K wps
[Epoch 38 Batch 150/162] avg loss 0.00255354, throughput 2.8446K wps
Begin Testing...
[Epoch 38] train avg loss 0.00246162, dev acc 0.9322, dev avg loss 0.199354, throughput 2.8557K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00243351, throughput 2.90423K wps
[Epoch 39 Batch 60/162] avg loss 0.00257842, throughput 2.8445K wps
[Epoch 39 Batch 90/162] avg loss 0.00225049, throughput 2.834K wps
[Epoch 39 Batch 120/162] avg loss 0.0023536, throughput 2.83632K wps
[Epoch 39 Batch 150/162] avg loss 0.0023885, throughput 2.85783K wps
Begin Testing...
[Epoch 39] train avg loss 0.00236342, dev acc 0.9256, dev avg loss 0.199487, throughput 2.85509K wps
[Epoch 40 Batch 30/162] avg loss 0.0023467, throughput 2.90722K wps
[Epoch 40 Batch 60/162] avg loss 0.00218647, throughput 2.85288K wps
[Epoch 40 Batch 90/162] avg loss 0.0020303, throughput 2.83982K wps
[Epoch 40 Batch 120/162] avg loss 0.00238408, throughput 2.83417K wps
[Epoch 40 Batch 150/162] avg loss 0.00239861, throughput 2.85367K wps
Begin Testing...
[Epoch 40] train avg loss 0.00230726, dev acc 0.9300, dev avg loss 0.197916, throughput 2.85684K wps
[Epoch 41 Batch 30/162] avg loss 0.00224149, throughput 2.89425K wps
[Epoch 41 Batch 60/162] avg loss 0.00212258, throughput 2.84269K wps
[Epoch 41 Batch 90/162] avg loss 0.00227718, throughput 2.83252K wps
[Epoch 41 Batch 120/162] avg loss 0.0022838, throughput 2.84293K wps
[Epoch 41 Batch 150/162] avg loss 0.00212119, throughput 2.84514K wps
Begin Testing...
[Epoch 41] train avg loss 0.00221524, dev acc 0.9244, dev avg loss 0.198109, throughput 2.85025K wps
[Epoch 42 Batch 30/162] avg loss 0.00226345, throughput 2.90295K wps
[Epoch 42 Batch 60/162] avg loss 0.00203802, throughput 2.83815K wps
[Epoch 42 Batch 90/162] avg loss 0.00208614, throughput 2.84535K wps
[Epoch 42 Batch 120/162] avg loss 0.0022674, throughput 2.85014K wps
[Epoch 42 Batch 150/162] avg loss 0.00217561, throughput 2.8436K wps
Begin Testing...
[Epoch 42] train avg loss 0.002176, dev acc 0.9222, dev avg loss 0.199852, throughput 2.85452K wps
[Epoch 43 Batch 30/162] avg loss 0.00206273, throughput 2.90346K wps
[Epoch 43 Batch 60/162] avg loss 0.00238845, throughput 2.8329K wps
[Epoch 43 Batch 90/162] avg loss 0.00193491, throughput 2.84453K wps
[Epoch 43 Batch 120/162] avg loss 0.00230589, throughput 2.84258K wps
[Epoch 43 Batch 150/162] avg loss 0.0019363, throughput 2.83726K wps
Begin Testing...
[Epoch 43] train avg loss 0.00211187, dev acc 0.9244, dev avg loss 0.197981, throughput 2.85124K wps
[Epoch 44 Batch 30/162] avg loss 0.00181491, throughput 2.91506K wps
[Epoch 44 Batch 60/162] avg loss 0.00200574, throughput 2.84001K wps
[Epoch 44 Batch 90/162] avg loss 0.00223407, throughput 2.8469K wps
[Epoch 44 Batch 120/162] avg loss 0.00228357, throughput 2.83484K wps
[Epoch 44 Batch 150/162] avg loss 0.00192744, throughput 2.83185K wps
Begin Testing...
[Epoch 44] train avg loss 0.00207105, dev acc 0.9300, dev avg loss 0.195274, throughput 2.85132K wps
[Epoch 45 Batch 30/162] avg loss 0.00218516, throughput 2.91064K wps
[Epoch 45 Batch 60/162] avg loss 0.00210842, throughput 2.84376K wps
[Epoch 45 Batch 90/162] avg loss 0.00196149, throughput 2.84575K wps
[Epoch 45 Batch 120/162] avg loss 0.00182232, throughput 2.83635K wps
[Epoch 45 Batch 150/162] avg loss 0.00221925, throughput 2.83503K wps
Begin Testing...
[Epoch 45] train avg loss 0.00202177, dev acc 0.9322, dev avg loss 0.19432, throughput 2.85265K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00183547, throughput 2.89671K wps
[Epoch 46 Batch 60/162] avg loss 0.00205168, throughput 2.83789K wps
[Epoch 46 Batch 90/162] avg loss 0.00188434, throughput 2.84944K wps
[Epoch 46 Batch 120/162] avg loss 0.00223543, throughput 2.84945K wps
[Epoch 46 Batch 150/162] avg loss 0.00200827, throughput 2.84879K wps
Begin Testing...
[Epoch 46] train avg loss 0.00200457, dev acc 0.9300, dev avg loss 0.193725, throughput 2.85335K wps
[Epoch 47 Batch 30/162] avg loss 0.00173613, throughput 2.91329K wps
[Epoch 47 Batch 60/162] avg loss 0.00200863, throughput 2.83394K wps
[Epoch 47 Batch 90/162] avg loss 0.00185788, throughput 2.83107K wps
[Epoch 47 Batch 120/162] avg loss 0.00194544, throughput 2.84348K wps
[Epoch 47 Batch 150/162] avg loss 0.00195375, throughput 2.84297K wps
Begin Testing...
[Epoch 47] train avg loss 0.00191363, dev acc 0.9333, dev avg loss 0.193247, throughput 2.8526K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00169033, throughput 2.89876K wps
[Epoch 48 Batch 60/162] avg loss 0.0019369, throughput 2.84639K wps
[Epoch 48 Batch 90/162] avg loss 0.00178443, throughput 2.83294K wps
[Epoch 48 Batch 120/162] avg loss 0.00188788, throughput 2.84458K wps
[Epoch 48 Batch 150/162] avg loss 0.00187129, throughput 2.8363K wps
Begin Testing...
[Epoch 48] train avg loss 0.00184032, dev acc 0.9300, dev avg loss 0.192993, throughput 2.85058K wps
[Epoch 49 Batch 30/162] avg loss 0.00161602, throughput 2.90971K wps
[Epoch 49 Batch 60/162] avg loss 0.00172586, throughput 2.85271K wps
[Epoch 49 Batch 90/162] avg loss 0.00183849, throughput 2.84032K wps
[Epoch 49 Batch 120/162] avg loss 0.00198112, throughput 2.83475K wps
[Epoch 49 Batch 150/162] avg loss 0.0018389, throughput 2.84438K wps
Begin Testing...
[Epoch 49] train avg loss 0.00180243, dev acc 0.9300, dev avg loss 0.192381, throughput 2.85506K wps
[Epoch 50 Batch 30/162] avg loss 0.00187042, throughput 2.92083K wps
[Epoch 50 Batch 60/162] avg loss 0.00172579, throughput 2.83871K wps
[Epoch 50 Batch 90/162] avg loss 0.00185189, throughput 2.83488K wps
[Epoch 50 Batch 120/162] avg loss 0.00179601, throughput 2.85046K wps
[Epoch 50 Batch 150/162] avg loss 0.00168084, throughput 2.83314K wps
Begin Testing...
[Epoch 50] train avg loss 0.00177443, dev acc 0.9311, dev avg loss 0.192889, throughput 2.85374K wps
[Epoch 51 Batch 30/162] avg loss 0.00179222, throughput 2.91865K wps
[Epoch 51 Batch 60/162] avg loss 0.00189296, throughput 2.84245K wps
[Epoch 51 Batch 90/162] avg loss 0.00185902, throughput 2.84644K wps
[Epoch 51 Batch 120/162] avg loss 0.00172088, throughput 2.83012K wps
[Epoch 51 Batch 150/162] avg loss 0.00146218, throughput 2.83393K wps
Begin Testing...
[Epoch 51] train avg loss 0.00174066, dev acc 0.9322, dev avg loss 0.192039, throughput 2.85289K wps
[Epoch 52 Batch 30/162] avg loss 0.00166336, throughput 2.89746K wps
[Epoch 52 Batch 60/162] avg loss 0.00153732, throughput 2.84613K wps
[Epoch 52 Batch 90/162] avg loss 0.00169885, throughput 2.85303K wps
[Epoch 52 Batch 120/162] avg loss 0.0018212, throughput 2.84516K wps
[Epoch 52 Batch 150/162] avg loss 0.00189509, throughput 2.84099K wps
Begin Testing...
[Epoch 52] train avg loss 0.00173152, dev acc 0.9322, dev avg loss 0.191337, throughput 2.85355K wps
[Epoch 53 Batch 30/162] avg loss 0.00143452, throughput 2.90171K wps
[Epoch 53 Batch 60/162] avg loss 0.00157486, throughput 2.82947K wps
[Epoch 53 Batch 90/162] avg loss 0.00183577, throughput 2.83404K wps
[Epoch 53 Batch 120/162] avg loss 0.0015689, throughput 2.85587K wps
[Epoch 53 Batch 150/162] avg loss 0.00170633, throughput 2.84931K wps
Begin Testing...
[Epoch 53] train avg loss 0.00161989, dev acc 0.9322, dev avg loss 0.191038, throughput 2.85372K wps
[Epoch 54 Batch 30/162] avg loss 0.00182627, throughput 2.91048K wps
[Epoch 54 Batch 60/162] avg loss 0.00142795, throughput 2.8349K wps
[Epoch 54 Batch 90/162] avg loss 0.00158435, throughput 2.8331K wps
[Epoch 54 Batch 120/162] avg loss 0.00157243, throughput 2.84027K wps
[Epoch 54 Batch 150/162] avg loss 0.00139806, throughput 2.85231K wps
Begin Testing...
[Epoch 54] train avg loss 0.00157782, dev acc 0.9289, dev avg loss 0.193254, throughput 2.85257K wps
[Epoch 55 Batch 30/162] avg loss 0.00166427, throughput 2.90661K wps
[Epoch 55 Batch 60/162] avg loss 0.00160601, throughput 2.84159K wps
[Epoch 55 Batch 90/162] avg loss 0.00146459, throughput 2.82336K wps
[Epoch 55 Batch 120/162] avg loss 0.00141308, throughput 2.82866K wps
[Epoch 55 Batch 150/162] avg loss 0.00160502, throughput 2.83819K wps
Begin Testing...
[Epoch 55] train avg loss 0.00155765, dev acc 0.9322, dev avg loss 0.191196, throughput 2.84684K wps
[Epoch 56 Batch 30/162] avg loss 0.0014786, throughput 2.90671K wps
[Epoch 56 Batch 60/162] avg loss 0.00143728, throughput 2.84119K wps
[Epoch 56 Batch 90/162] avg loss 0.00140232, throughput 2.84182K wps
[Epoch 56 Batch 120/162] avg loss 0.00161205, throughput 2.84233K wps
[Epoch 56 Batch 150/162] avg loss 0.00152445, throughput 2.83944K wps
Begin Testing...
[Epoch 56] train avg loss 0.00149946, dev acc 0.9289, dev avg loss 0.191669, throughput 2.85254K wps
[Epoch 57 Batch 30/162] avg loss 0.00149272, throughput 2.89672K wps
[Epoch 57 Batch 60/162] avg loss 0.00153574, throughput 2.84218K wps
[Epoch 57 Batch 90/162] avg loss 0.00138288, throughput 2.84303K wps
[Epoch 57 Batch 120/162] avg loss 0.00153503, throughput 2.84862K wps
[Epoch 57 Batch 150/162] avg loss 0.00151799, throughput 2.84307K wps
Begin Testing...
[Epoch 57] train avg loss 0.00147931, dev acc 0.9300, dev avg loss 0.190972, throughput 2.85257K wps
[Epoch 58 Batch 30/162] avg loss 0.00147405, throughput 2.901K wps
[Epoch 58 Batch 60/162] avg loss 0.00148826, throughput 2.83919K wps
[Epoch 58 Batch 90/162] avg loss 0.00150919, throughput 2.84492K wps
[Epoch 58 Batch 120/162] avg loss 0.00142502, throughput 2.84078K wps
[Epoch 58 Batch 150/162] avg loss 0.00124202, throughput 2.83116K wps
Begin Testing...
[Epoch 58] train avg loss 0.00142026, dev acc 0.9322, dev avg loss 0.190711, throughput 2.85055K wps
[Epoch 59 Batch 30/162] avg loss 0.00129977, throughput 2.90543K wps
[Epoch 59 Batch 60/162] avg loss 0.00135113, throughput 2.83785K wps
[Epoch 59 Batch 90/162] avg loss 0.00127396, throughput 2.84712K wps
[Epoch 59 Batch 120/162] avg loss 0.00149435, throughput 2.84666K wps
[Epoch 59 Batch 150/162] avg loss 0.00153365, throughput 2.83762K wps
Begin Testing...
[Epoch 59] train avg loss 0.00138443, dev acc 0.9322, dev avg loss 0.190942, throughput 2.85415K wps
[Epoch 60 Batch 30/162] avg loss 0.00127163, throughput 2.9024K wps
[Epoch 60 Batch 60/162] avg loss 0.00132824, throughput 2.83468K wps
[Epoch 60 Batch 90/162] avg loss 0.00147686, throughput 2.84191K wps
[Epoch 60 Batch 120/162] avg loss 0.00130174, throughput 2.83597K wps
[Epoch 60 Batch 150/162] avg loss 0.00138283, throughput 2.84343K wps
Begin Testing...
[Epoch 60] train avg loss 0.00135421, dev acc 0.9333, dev avg loss 0.191195, throughput 2.85195K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/162] avg loss 0.00127314, throughput 2.90939K wps
[Epoch 61 Batch 60/162] avg loss 0.00127464, throughput 2.8513K wps
[Epoch 61 Batch 90/162] avg loss 0.0012435, throughput 2.84099K wps
[Epoch 61 Batch 120/162] avg loss 0.0012406, throughput 2.83685K wps
[Epoch 61 Batch 150/162] avg loss 0.00130037, throughput 2.83896K wps
Begin Testing...
[Epoch 61] train avg loss 0.00126872, dev acc 0.9289, dev avg loss 0.191447, throughput 2.85423K wps
[Epoch 62 Batch 30/162] avg loss 0.00150992, throughput 2.91646K wps
[Epoch 62 Batch 60/162] avg loss 0.00119333, throughput 2.84527K wps
[Epoch 62 Batch 90/162] avg loss 0.00129922, throughput 2.85354K wps
[Epoch 62 Batch 120/162] avg loss 0.00112504, throughput 2.83484K wps
[Epoch 62 Batch 150/162] avg loss 0.00134332, throughput 2.84037K wps
Begin Testing...
[Epoch 62] train avg loss 0.00127899, dev acc 0.9333, dev avg loss 0.191195, throughput 2.85808K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/162] avg loss 0.0011986, throughput 2.92041K wps
[Epoch 63 Batch 60/162] avg loss 0.00139055, throughput 2.85732K wps
[Epoch 63 Batch 90/162] avg loss 0.00118797, throughput 2.85163K wps
[Epoch 63 Batch 120/162] avg loss 0.00133062, throughput 2.84678K wps
[Epoch 63 Batch 150/162] avg loss 0.00123852, throughput 2.82906K wps
Begin Testing...
[Epoch 63] train avg loss 0.00126424, dev acc 0.9300, dev avg loss 0.191379, throughput 2.85741K wps
[Epoch 64 Batch 30/162] avg loss 0.00130914, throughput 2.90806K wps
[Epoch 64 Batch 60/162] avg loss 0.00116428, throughput 2.83415K wps
[Epoch 64 Batch 90/162] avg loss 0.001239, throughput 2.83313K wps
[Epoch 64 Batch 120/162] avg loss 0.00115247, throughput 2.84427K wps
[Epoch 64 Batch 150/162] avg loss 0.00116988, throughput 2.85084K wps
Begin Testing...
[Epoch 64] train avg loss 0.00122435, dev acc 0.9311, dev avg loss 0.190917, throughput 2.85293K wps
[Epoch 65 Batch 30/162] avg loss 0.00114701, throughput 2.89995K wps
[Epoch 65 Batch 60/162] avg loss 0.00122024, throughput 2.83807K wps
[Epoch 65 Batch 90/162] avg loss 0.00112659, throughput 2.8359K wps
[Epoch 65 Batch 120/162] avg loss 0.0012036, throughput 2.84623K wps
[Epoch 65 Batch 150/162] avg loss 0.00107218, throughput 2.8404K wps
Begin Testing...
[Epoch 65] train avg loss 0.00116414, dev acc 0.9300, dev avg loss 0.190766, throughput 2.85026K wps
[Epoch 66 Batch 30/162] avg loss 0.0010973, throughput 2.89833K wps
[Epoch 66 Batch 60/162] avg loss 0.0011643, throughput 2.83329K wps
[Epoch 66 Batch 90/162] avg loss 0.001153, throughput 2.85269K wps
[Epoch 66 Batch 120/162] avg loss 0.00115264, throughput 2.84383K wps
[Epoch 66 Batch 150/162] avg loss 0.00121304, throughput 2.83937K wps
Begin Testing...
[Epoch 66] train avg loss 0.00116925, dev acc 0.9300, dev avg loss 0.191857, throughput 2.8518K wps
[Epoch 67 Batch 30/162] avg loss 0.00117332, throughput 2.90893K wps
[Epoch 67 Batch 60/162] avg loss 0.00101617, throughput 2.8372K wps
[Epoch 67 Batch 90/162] avg loss 0.00115833, throughput 2.83039K wps
[Epoch 67 Batch 120/162] avg loss 0.00112885, throughput 2.82185K wps
[Epoch 67 Batch 150/162] avg loss 0.0011446, throughput 2.82966K wps
Begin Testing...
[Epoch 67] train avg loss 0.00112546, dev acc 0.9289, dev avg loss 0.191311, throughput 2.84432K wps
[Epoch 68 Batch 30/162] avg loss 0.00104208, throughput 2.91583K wps
[Epoch 68 Batch 60/162] avg loss 0.00125531, throughput 2.83265K wps
[Epoch 68 Batch 90/162] avg loss 0.00107396, throughput 2.84195K wps
[Epoch 68 Batch 120/162] avg loss 0.00112183, throughput 2.84028K wps
[Epoch 68 Batch 150/162] avg loss 0.00107062, throughput 2.82894K wps
Begin Testing...
[Epoch 68] train avg loss 0.00111313, dev acc 0.9322, dev avg loss 0.191836, throughput 2.85069K wps
[Epoch 69 Batch 30/162] avg loss 0.00104433, throughput 2.89792K wps
[Epoch 69 Batch 60/162] avg loss 0.00111516, throughput 2.83381K wps
[Epoch 69 Batch 90/162] avg loss 0.00105723, throughput 2.84672K wps
[Epoch 69 Batch 120/162] avg loss 0.00111427, throughput 2.82953K wps
[Epoch 69 Batch 150/162] avg loss 0.000990804, throughput 2.82506K wps
Begin Testing...
[Epoch 69] train avg loss 0.00106752, dev acc 0.9278, dev avg loss 0.193182, throughput 2.84308K wps
[Epoch 70 Batch 30/162] avg loss 0.00101679, throughput 2.90157K wps
[Epoch 70 Batch 60/162] avg loss 0.00109452, throughput 2.83498K wps
[Epoch 70 Batch 90/162] avg loss 0.0010392, throughput 2.83569K wps
[Epoch 70 Batch 120/162] avg loss 0.00105055, throughput 2.8325K wps
[Epoch 70 Batch 150/162] avg loss 0.00104098, throughput 2.83588K wps
Begin Testing...
[Epoch 70] train avg loss 0.00104602, dev acc 0.9311, dev avg loss 0.191671, throughput 2.8484K wps
[Epoch 71 Batch 30/162] avg loss 0.00105953, throughput 2.90869K wps
[Epoch 71 Batch 60/162] avg loss 0.00099359, throughput 2.82474K wps
[Epoch 71 Batch 90/162] avg loss 0.00112173, throughput 2.83906K wps
[Epoch 71 Batch 120/162] avg loss 0.000931667, throughput 2.84374K wps
[Epoch 71 Batch 150/162] avg loss 0.00107702, throughput 2.83836K wps
Begin Testing...
[Epoch 71] train avg loss 0.00104008, dev acc 0.9300, dev avg loss 0.191984, throughput 2.85029K wps
[Epoch 72 Batch 30/162] avg loss 0.00105131, throughput 2.91958K wps
[Epoch 72 Batch 60/162] avg loss 0.000969809, throughput 2.83815K wps
[Epoch 72 Batch 90/162] avg loss 0.000908887, throughput 2.82976K wps
[Epoch 72 Batch 120/162] avg loss 0.000907022, throughput 2.84302K wps
[Epoch 72 Batch 150/162] avg loss 0.000986533, throughput 2.8356K wps
Begin Testing...
[Epoch 72] train avg loss 0.000969041, dev acc 0.9300, dev avg loss 0.191885, throughput 2.8502K wps
[Epoch 73 Batch 30/162] avg loss 0.000950372, throughput 2.90781K wps
[Epoch 73 Batch 60/162] avg loss 0.000946988, throughput 2.84877K wps
[Epoch 73 Batch 90/162] avg loss 0.000955546, throughput 2.854K wps
[Epoch 73 Batch 120/162] avg loss 0.000992349, throughput 2.81492K wps
[Epoch 73 Batch 150/162] avg loss 0.00089613, throughput 2.8414K wps
Begin Testing...
[Epoch 73] train avg loss 0.000955056, dev acc 0.9289, dev avg loss 0.193801, throughput 2.85083K wps
[Epoch 74 Batch 30/162] avg loss 0.000911584, throughput 2.90474K wps
[Epoch 74 Batch 60/162] avg loss 0.000911047, throughput 2.83161K wps
[Epoch 74 Batch 90/162] avg loss 0.00086194, throughput 2.83064K wps
[Epoch 74 Batch 120/162] avg loss 0.000983392, throughput 2.82815K wps
[Epoch 74 Batch 150/162] avg loss 0.000885804, throughput 2.82766K wps
Begin Testing...
[Epoch 74] train avg loss 0.000916721, dev acc 0.9289, dev avg loss 0.192128, throughput 2.84322K wps
[Epoch 75 Batch 30/162] avg loss 0.000911392, throughput 2.89231K wps
[Epoch 75 Batch 60/162] avg loss 0.000894977, throughput 2.81942K wps
[Epoch 75 Batch 90/162] avg loss 0.000944106, throughput 2.83455K wps
[Epoch 75 Batch 120/162] avg loss 0.000928492, throughput 2.83301K wps
[Epoch 75 Batch 150/162] avg loss 0.000910011, throughput 2.82929K wps
Begin Testing...
[Epoch 75] train avg loss 0.000936629, dev acc 0.9289, dev avg loss 0.192076, throughput 2.83982K wps
[Epoch 76 Batch 30/162] avg loss 0.000996658, throughput 2.90151K wps
[Epoch 76 Batch 60/162] avg loss 0.000991996, throughput 2.83839K wps
[Epoch 76 Batch 90/162] avg loss 0.00087823, throughput 2.8369K wps
[Epoch 76 Batch 120/162] avg loss 0.000906215, throughput 2.83747K wps
[Epoch 76 Batch 150/162] avg loss 0.00081713, throughput 2.84069K wps
Begin Testing...
[Epoch 76] train avg loss 0.000913438, dev acc 0.9300, dev avg loss 0.192766, throughput 2.8497K wps
[Epoch 77 Batch 30/162] avg loss 0.000946979, throughput 2.90184K wps
[Epoch 77 Batch 60/162] avg loss 0.000865504, throughput 2.82801K wps
[Epoch 77 Batch 90/162] avg loss 0.000808511, throughput 2.82689K wps
[Epoch 77 Batch 120/162] avg loss 0.000811412, throughput 2.82402K wps
[Epoch 77 Batch 150/162] avg loss 0.000945418, throughput 2.84178K wps
Begin Testing...
[Epoch 77] train avg loss 0.000877787, dev acc 0.9311, dev avg loss 0.192548, throughput 2.8435K wps
[Epoch 78 Batch 30/162] avg loss 0.000838857, throughput 2.91227K wps
[Epoch 78 Batch 60/162] avg loss 0.000818354, throughput 2.85241K wps
[Epoch 78 Batch 90/162] avg loss 0.000896271, throughput 2.83381K wps
[Epoch 78 Batch 120/162] avg loss 0.000918771, throughput 2.83211K wps
[Epoch 78 Batch 150/162] avg loss 0.000851035, throughput 2.84601K wps
Begin Testing...
[Epoch 78] train avg loss 0.000857075, dev acc 0.9278, dev avg loss 0.194475, throughput 2.85373K wps
[Epoch 79 Batch 30/162] avg loss 0.000798796, throughput 2.91044K wps
[Epoch 79 Batch 60/162] avg loss 0.00083322, throughput 2.8371K wps
[Epoch 79 Batch 90/162] avg loss 0.000750204, throughput 2.84176K wps
[Epoch 79 Batch 120/162] avg loss 0.000915273, throughput 2.83483K wps
[Epoch 79 Batch 150/162] avg loss 0.000887646, throughput 2.84448K wps
Begin Testing...
[Epoch 79] train avg loss 0.000824163, dev acc 0.9300, dev avg loss 0.192776, throughput 2.85317K wps
[Epoch 80 Batch 30/162] avg loss 0.000797107, throughput 2.90862K wps
[Epoch 80 Batch 60/162] avg loss 0.000839879, throughput 2.83947K wps
[Epoch 80 Batch 90/162] avg loss 0.000882234, throughput 2.83472K wps
[Epoch 80 Batch 120/162] avg loss 0.000936942, throughput 2.84139K wps
[Epoch 80 Batch 150/162] avg loss 0.000809625, throughput 2.84404K wps
Begin Testing...
[Epoch 80] train avg loss 0.000847811, dev acc 0.9300, dev avg loss 0.193616, throughput 2.85126K wps
[Epoch 81 Batch 30/162] avg loss 0.000784868, throughput 2.90565K wps
[Epoch 81 Batch 60/162] avg loss 0.000874822, throughput 2.83911K wps
[Epoch 81 Batch 90/162] avg loss 0.000807453, throughput 2.8442K wps
[Epoch 81 Batch 120/162] avg loss 0.000783686, throughput 2.84821K wps
[Epoch 81 Batch 150/162] avg loss 0.000733944, throughput 2.83286K wps
Begin Testing...
[Epoch 81] train avg loss 0.000799921, dev acc 0.9311, dev avg loss 0.19343, throughput 2.85188K wps
[Epoch 82 Batch 30/162] avg loss 0.000729198, throughput 2.8937K wps
[Epoch 82 Batch 60/162] avg loss 0.000777468, throughput 2.84383K wps
[Epoch 82 Batch 90/162] avg loss 0.000727605, throughput 2.84442K wps
[Epoch 82 Batch 120/162] avg loss 0.000777951, throughput 2.84393K wps
[Epoch 82 Batch 150/162] avg loss 0.000797325, throughput 2.84894K wps
Begin Testing...
[Epoch 82] train avg loss 0.000770489, dev acc 0.9300, dev avg loss 0.193436, throughput 2.85469K wps
[Epoch 83 Batch 30/162] avg loss 0.00085636, throughput 2.91034K wps
[Epoch 83 Batch 60/162] avg loss 0.000753205, throughput 2.85061K wps
[Epoch 83 Batch 90/162] avg loss 0.000795905, throughput 2.84316K wps
[Epoch 83 Batch 120/162] avg loss 0.000775844, throughput 2.84168K wps
[Epoch 83 Batch 150/162] avg loss 0.000727382, throughput 2.83401K wps
Begin Testing...
[Epoch 83] train avg loss 0.000782229, dev acc 0.9322, dev avg loss 0.193738, throughput 2.85297K wps
[Epoch 84 Batch 30/162] avg loss 0.00077262, throughput 2.91165K wps
[Epoch 84 Batch 60/162] avg loss 0.000780806, throughput 2.83292K wps
[Epoch 84 Batch 90/162] avg loss 0.000773829, throughput 2.83416K wps
[Epoch 84 Batch 120/162] avg loss 0.000756693, throughput 2.85314K wps
[Epoch 84 Batch 150/162] avg loss 0.000695648, throughput 2.84305K wps
Begin Testing...
[Epoch 84] train avg loss 0.000753278, dev acc 0.9311, dev avg loss 0.193928, throughput 2.8533K wps
[Epoch 85 Batch 30/162] avg loss 0.000723417, throughput 2.90176K wps
[Epoch 85 Batch 60/162] avg loss 0.000767381, throughput 2.8447K wps
[Epoch 85 Batch 90/162] avg loss 0.000705946, throughput 2.84119K wps
[Epoch 85 Batch 120/162] avg loss 0.000704305, throughput 2.84449K wps
[Epoch 85 Batch 150/162] avg loss 0.00085845, throughput 2.83757K wps
Begin Testing...
[Epoch 85] train avg loss 0.000755857, dev acc 0.9333, dev avg loss 0.194388, throughput 2.85277K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/162] avg loss 0.000775477, throughput 2.89743K wps
[Epoch 86 Batch 60/162] avg loss 0.000736429, throughput 2.83785K wps
[Epoch 86 Batch 90/162] avg loss 0.000618404, throughput 2.83622K wps
[Epoch 86 Batch 120/162] avg loss 0.000724496, throughput 2.82691K wps
[Epoch 86 Batch 150/162] avg loss 0.00072435, throughput 2.83087K wps
Begin Testing...
[Epoch 86] train avg loss 0.00071435, dev acc 0.9256, dev avg loss 0.19553, throughput 2.84411K wps
[Epoch 87 Batch 30/162] avg loss 0.000644266, throughput 2.90893K wps
[Epoch 87 Batch 60/162] avg loss 0.000689644, throughput 2.8292K wps
[Epoch 87 Batch 90/162] avg loss 0.000658473, throughput 2.8406K wps
[Epoch 87 Batch 120/162] avg loss 0.000666594, throughput 2.84016K wps
[Epoch 87 Batch 150/162] avg loss 0.000701381, throughput 2.83245K wps
Begin Testing...
[Epoch 87] train avg loss 0.000675512, dev acc 0.9333, dev avg loss 0.195381, throughput 2.84988K wps
Observed Improvement.
Begin Testing...
[Epoch 88 Batch 30/162] avg loss 0.000746529, throughput 2.88445K wps
[Epoch 88 Batch 60/162] avg loss 0.000712203, throughput 2.84447K wps
[Epoch 88 Batch 90/162] avg loss 0.000666906, throughput 2.83592K wps
[Epoch 88 Batch 120/162] avg loss 0.000652626, throughput 2.83451K wps
[Epoch 88 Batch 150/162] avg loss 0.00072123, throughput 2.84113K wps
Begin Testing...
[Epoch 88] train avg loss 0.000701353, dev acc 0.9322, dev avg loss 0.195478, throughput 2.84673K wps
[Epoch 89 Batch 30/162] avg loss 0.000582881, throughput 2.90584K wps
[Epoch 89 Batch 60/162] avg loss 0.000636638, throughput 2.84546K wps
[Epoch 89 Batch 90/162] avg loss 0.000686606, throughput 2.8367K wps
[Epoch 89 Batch 120/162] avg loss 0.000659656, throughput 2.83393K wps
[Epoch 89 Batch 150/162] avg loss 0.000713546, throughput 2.82553K wps
Begin Testing...
[Epoch 89] train avg loss 0.000656982, dev acc 0.9322, dev avg loss 0.195677, throughput 2.84831K wps
[Epoch 90 Batch 30/162] avg loss 0.000715529, throughput 2.90467K wps
[Epoch 90 Batch 60/162] avg loss 0.000638485, throughput 2.83153K wps
[Epoch 90 Batch 90/162] avg loss 0.000703567, throughput 2.83565K wps
[Epoch 90 Batch 120/162] avg loss 0.000678631, throughput 2.82978K wps
[Epoch 90 Batch 150/162] avg loss 0.000647268, throughput 2.84347K wps
Begin Testing...
[Epoch 90] train avg loss 0.000670234, dev acc 0.9322, dev avg loss 0.19569, throughput 2.84937K wps
[Epoch 91 Batch 30/162] avg loss 0.000694188, throughput 2.90212K wps
[Epoch 91 Batch 60/162] avg loss 0.000550544, throughput 2.84196K wps
[Epoch 91 Batch 90/162] avg loss 0.000675088, throughput 2.84827K wps
[Epoch 91 Batch 120/162] avg loss 0.000526305, throughput 2.84301K wps
[Epoch 91 Batch 150/162] avg loss 0.000676792, throughput 2.8445K wps
Begin Testing...
[Epoch 91] train avg loss 0.00062819, dev acc 0.9311, dev avg loss 0.19624, throughput 2.85345K wps
[Epoch 92 Batch 30/162] avg loss 0.00061805, throughput 2.90844K wps
[Epoch 92 Batch 60/162] avg loss 0.000618813, throughput 2.84713K wps
[Epoch 92 Batch 90/162] avg loss 0.000574191, throughput 2.84098K wps
[Epoch 92 Batch 120/162] avg loss 0.000708599, throughput 2.84349K wps
[Epoch 92 Batch 150/162] avg loss 0.000622935, throughput 2.85525K wps
Begin Testing...
[Epoch 92] train avg loss 0.000624005, dev acc 0.9333, dev avg loss 0.196582, throughput 2.85776K wps
Observed Improvement.
Begin Testing...
[Epoch 93 Batch 30/162] avg loss 0.000643842, throughput 2.9126K wps
[Epoch 93 Batch 60/162] avg loss 0.000569074, throughput 2.84001K wps
[Epoch 93 Batch 90/162] avg loss 0.000662199, throughput 2.84687K wps
[Epoch 93 Batch 120/162] avg loss 0.000607817, throughput 2.83828K wps
[Epoch 93 Batch 150/162] avg loss 0.000611783, throughput 2.83935K wps
Begin Testing...
[Epoch 93] train avg loss 0.000614834, dev acc 0.9289, dev avg loss 0.196697, throughput 2.85474K wps
[Epoch 94 Batch 30/162] avg loss 0.000531496, throughput 2.91292K wps
[Epoch 94 Batch 60/162] avg loss 0.000676753, throughput 2.8528K wps
[Epoch 94 Batch 90/162] avg loss 0.000558893, throughput 2.83915K wps
[Epoch 94 Batch 120/162] avg loss 0.000584341, throughput 2.83141K wps
[Epoch 94 Batch 150/162] avg loss 0.000662533, throughput 2.84204K wps
Begin Testing...
[Epoch 94] train avg loss 0.000594625, dev acc 0.9322, dev avg loss 0.196562, throughput 2.85377K wps
[Epoch 95 Batch 30/162] avg loss 0.00060762, throughput 2.91686K wps
[Epoch 95 Batch 60/162] avg loss 0.000560398, throughput 2.84098K wps
[Epoch 95 Batch 90/162] avg loss 0.000521856, throughput 2.84134K wps
[Epoch 95 Batch 120/162] avg loss 0.000622863, throughput 2.84539K wps
[Epoch 95 Batch 150/162] avg loss 0.000555369, throughput 2.84713K wps
Begin Testing...
[Epoch 95] train avg loss 0.000582866, dev acc 0.9333, dev avg loss 0.197688, throughput 2.85732K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/162] avg loss 0.000626494, throughput 2.91589K wps
[Epoch 96 Batch 60/162] avg loss 0.00059418, throughput 2.85311K wps
[Epoch 96 Batch 90/162] avg loss 0.00052177, throughput 2.82862K wps
[Epoch 96 Batch 120/162] avg loss 0.000600089, throughput 2.84793K wps
[Epoch 96 Batch 150/162] avg loss 0.000597514, throughput 2.84016K wps
Begin Testing...
[Epoch 96] train avg loss 0.000587416, dev acc 0.9278, dev avg loss 0.198074, throughput 2.8556K wps
[Epoch 97 Batch 30/162] avg loss 0.000598399, throughput 2.90733K wps
[Epoch 97 Batch 60/162] avg loss 0.000579634, throughput 2.83133K wps
[Epoch 97 Batch 90/162] avg loss 0.000556007, throughput 2.84326K wps
[Epoch 97 Batch 120/162] avg loss 0.000538909, throughput 2.83449K wps
[Epoch 97 Batch 150/162] avg loss 0.000549149, throughput 2.84106K wps
Begin Testing...
[Epoch 97] train avg loss 0.000570842, dev acc 0.9333, dev avg loss 0.197919, throughput 2.85126K wps
Observed Improvement.
Begin Testing...
[Epoch 98 Batch 30/162] avg loss 0.0005598, throughput 2.8999K wps
[Epoch 98 Batch 60/162] avg loss 0.000540679, throughput 2.85319K wps
[Epoch 98 Batch 90/162] avg loss 0.000561321, throughput 2.84732K wps
[Epoch 98 Batch 120/162] avg loss 0.000597679, throughput 2.8521K wps
[Epoch 98 Batch 150/162] avg loss 0.000591357, throughput 2.83724K wps
Begin Testing...
[Epoch 98] train avg loss 0.000566515, dev acc 0.9344, dev avg loss 0.198471, throughput 2.8559K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/162] avg loss 0.000575171, throughput 2.91214K wps
[Epoch 99 Batch 60/162] avg loss 0.000580405, throughput 2.83933K wps
[Epoch 99 Batch 90/162] avg loss 0.000480292, throughput 2.8562K wps
[Epoch 99 Batch 120/162] avg loss 0.000579281, throughput 2.85397K wps
[Epoch 99 Batch 150/162] avg loss 0.000545233, throughput 2.84371K wps
Begin Testing...
[Epoch 99] train avg loss 0.000554765, dev acc 0.9322, dev avg loss 0.199046, throughput 2.85943K wps
[Epoch 100 Batch 30/162] avg loss 0.000490124, throughput 2.91551K wps
[Epoch 100 Batch 60/162] avg loss 0.00052283, throughput 2.83888K wps
[Epoch 100 Batch 90/162] avg loss 0.000586794, throughput 2.84975K wps
[Epoch 100 Batch 120/162] avg loss 0.000506837, throughput 2.85604K wps
[Epoch 100 Batch 150/162] avg loss 0.000491384, throughput 2.85415K wps
Begin Testing...
[Epoch 100] train avg loss 0.000533356, dev acc 0.9300, dev avg loss 0.199241, throughput 2.86201K wps
[Epoch 101 Batch 30/162] avg loss 0.000553622, throughput 2.91806K wps
[Epoch 101 Batch 60/162] avg loss 0.000542275, throughput 2.83779K wps
[Epoch 101 Batch 90/162] avg loss 0.000447697, throughput 2.83674K wps
[Epoch 101 Batch 120/162] avg loss 0.000534559, throughput 2.84775K wps
[Epoch 101 Batch 150/162] avg loss 0.000573029, throughput 2.85482K wps
Begin Testing...
[Epoch 101] train avg loss 0.000535844, dev acc 0.9311, dev avg loss 0.199707, throughput 2.85745K wps
[Epoch 102 Batch 30/162] avg loss 0.000538994, throughput 2.90549K wps
[Epoch 102 Batch 60/162] avg loss 0.000501123, throughput 2.84692K wps
[Epoch 102 Batch 90/162] avg loss 0.000562239, throughput 2.83075K wps
[Epoch 102 Batch 120/162] avg loss 0.000541878, throughput 2.84754K wps
[Epoch 102 Batch 150/162] avg loss 0.000477789, throughput 2.84244K wps
Begin Testing...
[Epoch 102] train avg loss 0.000520788, dev acc 0.9344, dev avg loss 0.200546, throughput 2.85347K wps
Observed Improvement.
Begin Testing...
[Epoch 103 Batch 30/162] avg loss 0.000547483, throughput 2.91106K wps
[Epoch 103 Batch 60/162] avg loss 0.000523828, throughput 2.83908K wps
[Epoch 103 Batch 90/162] avg loss 0.000519514, throughput 2.84046K wps
[Epoch 103 Batch 120/162] avg loss 0.000504736, throughput 2.84222K wps
[Epoch 103 Batch 150/162] avg loss 0.000547366, throughput 2.85067K wps
Begin Testing...
[Epoch 103] train avg loss 0.000527887, dev acc 0.9311, dev avg loss 0.200665, throughput 2.85479K wps
[Epoch 104 Batch 30/162] avg loss 0.000479231, throughput 2.91285K wps
[Epoch 104 Batch 60/162] avg loss 0.0005476, throughput 2.83659K wps
[Epoch 104 Batch 90/162] avg loss 0.000471811, throughput 2.84158K wps
[Epoch 104 Batch 120/162] avg loss 0.000420862, throughput 2.85042K wps
[Epoch 104 Batch 150/162] avg loss 0.000467948, throughput 2.83841K wps
Begin Testing...
[Epoch 104] train avg loss 0.000485978, dev acc 0.9311, dev avg loss 0.200821, throughput 2.85471K wps
[Epoch 105 Batch 30/162] avg loss 0.000493451, throughput 2.92408K wps
[Epoch 105 Batch 60/162] avg loss 0.000506471, throughput 2.83004K wps
[Epoch 105 Batch 90/162] avg loss 0.000480015, throughput 2.84076K wps
[Epoch 105 Batch 120/162] avg loss 0.000457264, throughput 2.84351K wps
[Epoch 105 Batch 150/162] avg loss 0.000517348, throughput 2.85255K wps
Begin Testing...
[Epoch 105] train avg loss 0.000493457, dev acc 0.9289, dev avg loss 0.201506, throughput 2.85569K wps
[Epoch 106 Batch 30/162] avg loss 0.000458023, throughput 2.91663K wps
[Epoch 106 Batch 60/162] avg loss 0.000545652, throughput 2.85248K wps
[Epoch 106 Batch 90/162] avg loss 0.000486261, throughput 2.83049K wps
[Epoch 106 Batch 120/162] avg loss 0.000464083, throughput 2.85221K wps
[Epoch 106 Batch 150/162] avg loss 0.000469136, throughput 2.85188K wps
Begin Testing...
[Epoch 106] train avg loss 0.000480765, dev acc 0.9322, dev avg loss 0.201133, throughput 2.85927K wps
[Epoch 107 Batch 30/162] avg loss 0.000548017, throughput 2.91955K wps
[Epoch 107 Batch 60/162] avg loss 0.000468785, throughput 2.84439K wps
[Epoch 107 Batch 90/162] avg loss 0.000482218, throughput 2.83168K wps
[Epoch 107 Batch 120/162] avg loss 0.000475788, throughput 2.8421K wps
[Epoch 107 Batch 150/162] avg loss 0.000538576, throughput 2.84187K wps
Begin Testing...
[Epoch 107] train avg loss 0.000500532, dev acc 0.9333, dev avg loss 0.201352, throughput 2.85507K wps
[Epoch 108 Batch 30/162] avg loss 0.000457232, throughput 2.91747K wps
[Epoch 108 Batch 60/162] avg loss 0.000469672, throughput 2.85248K wps
[Epoch 108 Batch 90/162] avg loss 0.000435039, throughput 2.83239K wps
[Epoch 108 Batch 120/162] avg loss 0.000541026, throughput 2.84507K wps
[Epoch 108 Batch 150/162] avg loss 0.000581083, throughput 2.8406K wps
Begin Testing...
[Epoch 108] train avg loss 0.000496154, dev acc 0.9333, dev avg loss 0.201621, throughput 2.85621K wps
[Epoch 109 Batch 30/162] avg loss 0.000487643, throughput 2.92329K wps
[Epoch 109 Batch 60/162] avg loss 0.000484392, throughput 2.83565K wps
[Epoch 109 Batch 90/162] avg loss 0.000414888, throughput 2.83778K wps
[Epoch 109 Batch 120/162] avg loss 0.000394029, throughput 2.84161K wps
[Epoch 109 Batch 150/162] avg loss 0.000463902, throughput 2.83945K wps
Begin Testing...
[Epoch 109] train avg loss 0.000454044, dev acc 0.9322, dev avg loss 0.201877, throughput 2.85347K wps
[Epoch 110 Batch 30/162] avg loss 0.00042774, throughput 2.91423K wps
[Epoch 110 Batch 60/162] avg loss 0.000444865, throughput 2.83716K wps
[Epoch 110 Batch 90/162] avg loss 0.000487617, throughput 2.84233K wps
[Epoch 110 Batch 120/162] avg loss 0.00047429, throughput 2.84051K wps
[Epoch 110 Batch 150/162] avg loss 0.000550691, throughput 2.83746K wps
Begin Testing...
[Epoch 110] train avg loss 0.000474421, dev acc 0.9344, dev avg loss 0.202774, throughput 2.8531K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/162] avg loss 0.000491011, throughput 2.88949K wps
[Epoch 111 Batch 60/162] avg loss 0.000426401, throughput 2.84175K wps
[Epoch 111 Batch 90/162] avg loss 0.000482084, throughput 2.83853K wps
[Epoch 111 Batch 120/162] avg loss 0.000415557, throughput 2.82666K wps
[Epoch 111 Batch 150/162] avg loss 0.000436968, throughput 2.85359K wps
Begin Testing...
[Epoch 111] train avg loss 0.000452822, dev acc 0.9311, dev avg loss 0.203018, throughput 2.84884K wps