Namespace(batch_size=50, data_name='MPQA', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/mpqa/all-bcbfeed8.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mpqa/all-bcbfeed8.zip...
maximum length (in tokens): 36
Done! Tokenizing Time=0.09s, #Sentences=10606
SentimentNet(
  (embedding): Embedding(6250 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/172] avg loss 0.0125234, throughput 0.532946K wps
[Epoch 0 Batch 60/172] avg loss 0.0123375, throughput 2.73026K wps
[Epoch 0 Batch 90/172] avg loss 0.0125022, throughput 2.64208K wps
[Epoch 0 Batch 120/172] avg loss 0.0124643, throughput 2.7225K wps
[Epoch 0 Batch 150/172] avg loss 0.0126737, throughput 2.71777K wps
Begin Testing...
[Epoch 0] train avg loss 0.0125385, dev acc 0.7013, dev avg loss 0.61156, throughput 1.17897K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0124262, throughput 2.53087K wps
[Epoch 1 Batch 60/172] avg loss 0.0125075, throughput 2.76563K wps
[Epoch 1 Batch 90/172] avg loss 0.0124945, throughput 2.80791K wps
[Epoch 1 Batch 120/172] avg loss 0.0122504, throughput 2.74929K wps
[Epoch 1 Batch 150/172] avg loss 0.0123345, throughput 2.79849K wps
Begin Testing...
[Epoch 1] train avg loss 0.0124487, dev acc 0.7013, dev avg loss 0.606113, throughput 2.72698K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0124421, throughput 2.78425K wps
[Epoch 2 Batch 60/172] avg loss 0.0124828, throughput 2.76233K wps
[Epoch 2 Batch 90/172] avg loss 0.0122618, throughput 2.82739K wps
[Epoch 2 Batch 120/172] avg loss 0.0123862, throughput 2.78934K wps
[Epoch 2 Batch 150/172] avg loss 0.0124548, throughput 2.72758K wps
Begin Testing...
[Epoch 2] train avg loss 0.0124261, dev acc 0.7013, dev avg loss 0.604503, throughput 2.77554K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0120688, throughput 2.67048K wps
[Epoch 3 Batch 60/172] avg loss 0.0127105, throughput 2.80974K wps
[Epoch 3 Batch 90/172] avg loss 0.0121919, throughput 2.75623K wps
[Epoch 3 Batch 120/172] avg loss 0.0120656, throughput 2.6694K wps
[Epoch 3 Batch 150/172] avg loss 0.0126343, throughput 2.77675K wps
Begin Testing...
[Epoch 3] train avg loss 0.0123567, dev acc 0.7013, dev avg loss 0.604481, throughput 2.73895K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0124398, throughput 2.7458K wps
[Epoch 4 Batch 60/172] avg loss 0.0124111, throughput 2.76109K wps
[Epoch 4 Batch 90/172] avg loss 0.012225, throughput 2.74066K wps
[Epoch 4 Batch 120/172] avg loss 0.0121885, throughput 2.73473K wps
[Epoch 4 Batch 150/172] avg loss 0.0123972, throughput 2.7748K wps
Begin Testing...
[Epoch 4] train avg loss 0.0123362, dev acc 0.7013, dev avg loss 0.602229, throughput 2.75117K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0121021, throughput 2.65915K wps
[Epoch 5 Batch 60/172] avg loss 0.0124908, throughput 2.74588K wps
[Epoch 5 Batch 90/172] avg loss 0.0122019, throughput 2.82335K wps
[Epoch 5 Batch 120/172] avg loss 0.0123642, throughput 2.74124K wps
[Epoch 5 Batch 150/172] avg loss 0.0122706, throughput 2.77318K wps
Begin Testing...
[Epoch 5] train avg loss 0.0123136, dev acc 0.7013, dev avg loss 0.600941, throughput 2.74239K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0126877, throughput 2.79821K wps
[Epoch 6 Batch 60/172] avg loss 0.0120788, throughput 2.77256K wps
[Epoch 6 Batch 90/172] avg loss 0.0122436, throughput 2.82114K wps
[Epoch 6 Batch 120/172] avg loss 0.012223, throughput 2.79777K wps
[Epoch 6 Batch 150/172] avg loss 0.0123582, throughput 2.78504K wps
Begin Testing...
[Epoch 6] train avg loss 0.0123023, dev acc 0.7013, dev avg loss 0.599614, throughput 2.77199K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0124714, throughput 2.81606K wps
[Epoch 7 Batch 60/172] avg loss 0.0125008, throughput 2.69251K wps
[Epoch 7 Batch 90/172] avg loss 0.0122732, throughput 2.85829K wps
[Epoch 7 Batch 120/172] avg loss 0.0123425, throughput 2.81452K wps
[Epoch 7 Batch 150/172] avg loss 0.0120754, throughput 2.77939K wps
Begin Testing...
[Epoch 7] train avg loss 0.0122823, dev acc 0.7013, dev avg loss 0.598647, throughput 2.77765K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.0123134, throughput 2.82997K wps
[Epoch 8 Batch 60/172] avg loss 0.0120824, throughput 2.71314K wps
[Epoch 8 Batch 90/172] avg loss 0.0122549, throughput 2.78056K wps
[Epoch 8 Batch 120/172] avg loss 0.012074, throughput 2.72063K wps
[Epoch 8 Batch 150/172] avg loss 0.0123536, throughput 2.75479K wps
Begin Testing...
[Epoch 8] train avg loss 0.0122415, dev acc 0.7013, dev avg loss 0.598124, throughput 2.7414K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.0119945, throughput 2.79765K wps
[Epoch 9 Batch 60/172] avg loss 0.0122404, throughput 2.72272K wps
[Epoch 9 Batch 90/172] avg loss 0.0122949, throughput 2.64853K wps
[Epoch 9 Batch 120/172] avg loss 0.0119399, throughput 2.74107K wps
[Epoch 9 Batch 150/172] avg loss 0.0124093, throughput 2.65057K wps
Begin Testing...
[Epoch 9] train avg loss 0.0122194, dev acc 0.7013, dev avg loss 0.597554, throughput 2.69691K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.0119387, throughput 2.7963K wps
[Epoch 10 Batch 60/172] avg loss 0.0121337, throughput 2.69098K wps
[Epoch 10 Batch 90/172] avg loss 0.012159, throughput 2.70415K wps
[Epoch 10 Batch 120/172] avg loss 0.0123574, throughput 2.73506K wps
[Epoch 10 Batch 150/172] avg loss 0.0121218, throughput 2.76543K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121956, dev acc 0.7013, dev avg loss 0.596687, throughput 2.72741K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.0123897, throughput 2.78472K wps
[Epoch 11 Batch 60/172] avg loss 0.0125163, throughput 2.74534K wps
[Epoch 11 Batch 90/172] avg loss 0.0119692, throughput 2.75486K wps
[Epoch 11 Batch 120/172] avg loss 0.0120943, throughput 2.70859K wps
[Epoch 11 Batch 150/172] avg loss 0.0121416, throughput 2.72455K wps
Begin Testing...
[Epoch 11] train avg loss 0.012151, dev acc 0.7013, dev avg loss 0.594291, throughput 2.74418K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.0120726, throughput 2.79527K wps
[Epoch 12 Batch 60/172] avg loss 0.0121274, throughput 2.73738K wps
[Epoch 12 Batch 90/172] avg loss 0.0123905, throughput 2.74507K wps
[Epoch 12 Batch 120/172] avg loss 0.0118897, throughput 2.73189K wps
[Epoch 12 Batch 150/172] avg loss 0.0120882, throughput 2.74583K wps
Begin Testing...
[Epoch 12] train avg loss 0.0121439, dev acc 0.7013, dev avg loss 0.594053, throughput 2.7569K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.0120156, throughput 2.8246K wps
[Epoch 13 Batch 60/172] avg loss 0.0120928, throughput 2.68178K wps
[Epoch 13 Batch 90/172] avg loss 0.0118816, throughput 2.81849K wps
[Epoch 13 Batch 120/172] avg loss 0.0122216, throughput 2.72555K wps
[Epoch 13 Batch 150/172] avg loss 0.0121958, throughput 2.72375K wps
Begin Testing...
[Epoch 13] train avg loss 0.0121045, dev acc 0.7013, dev avg loss 0.59278, throughput 2.7548K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.0118724, throughput 2.74629K wps
[Epoch 14 Batch 60/172] avg loss 0.012457, throughput 2.77046K wps
[Epoch 14 Batch 90/172] avg loss 0.011952, throughput 2.69614K wps
[Epoch 14 Batch 120/172] avg loss 0.0123069, throughput 2.79288K wps
[Epoch 14 Batch 150/172] avg loss 0.0117188, throughput 2.79521K wps
Begin Testing...
[Epoch 14] train avg loss 0.0120636, dev acc 0.7013, dev avg loss 0.59024, throughput 2.75834K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.0120394, throughput 2.76196K wps
[Epoch 15 Batch 60/172] avg loss 0.0118965, throughput 2.80098K wps
[Epoch 15 Batch 90/172] avg loss 0.0121855, throughput 2.80632K wps
[Epoch 15 Batch 120/172] avg loss 0.011999, throughput 2.72787K wps
[Epoch 15 Batch 150/172] avg loss 0.0119812, throughput 2.75652K wps
Begin Testing...
[Epoch 15] train avg loss 0.0120139, dev acc 0.7013, dev avg loss 0.588459, throughput 2.76624K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.0122381, throughput 2.81215K wps
[Epoch 16 Batch 60/172] avg loss 0.0118447, throughput 2.7257K wps
[Epoch 16 Batch 90/172] avg loss 0.012033, throughput 2.68364K wps
[Epoch 16 Batch 120/172] avg loss 0.0119547, throughput 2.82964K wps
[Epoch 16 Batch 150/172] avg loss 0.0117765, throughput 2.69394K wps
Begin Testing...
[Epoch 16] train avg loss 0.0119742, dev acc 0.7013, dev avg loss 0.58634, throughput 2.74455K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.0117084, throughput 2.73558K wps
[Epoch 17 Batch 60/172] avg loss 0.0119114, throughput 2.72147K wps
[Epoch 17 Batch 90/172] avg loss 0.0119837, throughput 2.73322K wps
[Epoch 17 Batch 120/172] avg loss 0.0120529, throughput 2.74129K wps
[Epoch 17 Batch 150/172] avg loss 0.0119982, throughput 2.75959K wps
Begin Testing...
[Epoch 17] train avg loss 0.0119209, dev acc 0.7013, dev avg loss 0.584274, throughput 2.74142K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.0118328, throughput 2.81491K wps
[Epoch 18 Batch 60/172] avg loss 0.0116814, throughput 2.74601K wps
[Epoch 18 Batch 90/172] avg loss 0.012126, throughput 2.77129K wps
[Epoch 18 Batch 120/172] avg loss 0.0119319, throughput 2.75486K wps
[Epoch 18 Batch 150/172] avg loss 0.0119306, throughput 2.66037K wps
Begin Testing...
[Epoch 18] train avg loss 0.0118822, dev acc 0.7013, dev avg loss 0.581357, throughput 2.75315K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.0116737, throughput 2.74086K wps
[Epoch 19 Batch 60/172] avg loss 0.0119015, throughput 2.7371K wps
[Epoch 19 Batch 90/172] avg loss 0.0118344, throughput 2.75859K wps
[Epoch 19 Batch 120/172] avg loss 0.0116907, throughput 2.76613K wps
[Epoch 19 Batch 150/172] avg loss 0.0119185, throughput 2.74799K wps
Begin Testing...
[Epoch 19] train avg loss 0.0118007, dev acc 0.7013, dev avg loss 0.578956, throughput 2.74756K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.0118221, throughput 2.81938K wps
[Epoch 20 Batch 60/172] avg loss 0.0118231, throughput 2.68809K wps
[Epoch 20 Batch 90/172] avg loss 0.0118279, throughput 2.66738K wps
[Epoch 20 Batch 120/172] avg loss 0.0116419, throughput 2.83029K wps
[Epoch 20 Batch 150/172] avg loss 0.01138, throughput 2.66707K wps
Begin Testing...
[Epoch 20] train avg loss 0.0117266, dev acc 0.7013, dev avg loss 0.576035, throughput 2.73429K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.0115038, throughput 2.76238K wps
[Epoch 21 Batch 60/172] avg loss 0.0115127, throughput 2.74374K wps
[Epoch 21 Batch 90/172] avg loss 0.0119344, throughput 2.69764K wps
[Epoch 21 Batch 120/172] avg loss 0.0116156, throughput 2.7254K wps
[Epoch 21 Batch 150/172] avg loss 0.0115362, throughput 2.82109K wps
Begin Testing...
[Epoch 21] train avg loss 0.0116494, dev acc 0.7044, dev avg loss 0.571996, throughput 2.73586K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.0113752, throughput 2.80124K wps
[Epoch 22 Batch 60/172] avg loss 0.0116105, throughput 2.74118K wps
[Epoch 22 Batch 90/172] avg loss 0.011633, throughput 2.80225K wps
[Epoch 22 Batch 120/172] avg loss 0.0117054, throughput 2.79247K wps
[Epoch 22 Batch 150/172] avg loss 0.0113992, throughput 2.78329K wps
Begin Testing...
[Epoch 22] train avg loss 0.0115311, dev acc 0.7075, dev avg loss 0.566758, throughput 2.78693K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.0116247, throughput 2.83932K wps
[Epoch 23 Batch 60/172] avg loss 0.011285, throughput 2.7212K wps
[Epoch 23 Batch 90/172] avg loss 0.0116083, throughput 2.63K wps
[Epoch 23 Batch 120/172] avg loss 0.0109613, throughput 2.84354K wps
[Epoch 23 Batch 150/172] avg loss 0.0111687, throughput 2.78526K wps
Begin Testing...
[Epoch 23] train avg loss 0.0114148, dev acc 0.7128, dev avg loss 0.563873, throughput 2.75609K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.0113561, throughput 2.76758K wps
[Epoch 24 Batch 60/172] avg loss 0.0113431, throughput 2.70782K wps
[Epoch 24 Batch 90/172] avg loss 0.0114733, throughput 2.65082K wps
[Epoch 24 Batch 120/172] avg loss 0.0112568, throughput 2.66161K wps
[Epoch 24 Batch 150/172] avg loss 0.0111883, throughput 2.7013K wps
Begin Testing...
[Epoch 24] train avg loss 0.0112974, dev acc 0.7159, dev avg loss 0.556681, throughput 2.70075K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.0112018, throughput 2.73745K wps
[Epoch 25 Batch 60/172] avg loss 0.0111738, throughput 2.6569K wps
[Epoch 25 Batch 90/172] avg loss 0.0111478, throughput 2.71414K wps
[Epoch 25 Batch 120/172] avg loss 0.0110669, throughput 2.76936K wps
[Epoch 25 Batch 150/172] avg loss 0.0113439, throughput 2.54911K wps
Begin Testing...
[Epoch 25] train avg loss 0.0111527, dev acc 0.7222, dev avg loss 0.549711, throughput 2.66479K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/172] avg loss 0.0110108, throughput 2.78729K wps
[Epoch 26 Batch 60/172] avg loss 0.010885, throughput 2.66511K wps
[Epoch 26 Batch 90/172] avg loss 0.0110166, throughput 2.71026K wps
[Epoch 26 Batch 120/172] avg loss 0.0108388, throughput 2.77367K wps
[Epoch 26 Batch 150/172] avg loss 0.0110184, throughput 2.71449K wps
Begin Testing...
[Epoch 26] train avg loss 0.0109869, dev acc 0.7327, dev avg loss 0.543464, throughput 2.72218K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.0111009, throughput 2.84391K wps
[Epoch 27 Batch 60/172] avg loss 0.0108483, throughput 2.79316K wps
[Epoch 27 Batch 90/172] avg loss 0.0107005, throughput 2.79349K wps
[Epoch 27 Batch 120/172] avg loss 0.0106716, throughput 2.77641K wps
[Epoch 27 Batch 150/172] avg loss 0.0107991, throughput 2.78387K wps
Begin Testing...
[Epoch 27] train avg loss 0.0107896, dev acc 0.7338, dev avg loss 0.533942, throughput 2.78861K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.0107249, throughput 2.79486K wps
[Epoch 28 Batch 60/172] avg loss 0.0106721, throughput 2.76519K wps
[Epoch 28 Batch 90/172] avg loss 0.0107402, throughput 2.67084K wps
[Epoch 28 Batch 120/172] avg loss 0.0107449, throughput 2.67183K wps
[Epoch 28 Batch 150/172] avg loss 0.0102663, throughput 2.77276K wps
Begin Testing...
[Epoch 28] train avg loss 0.0105996, dev acc 0.7411, dev avg loss 0.525269, throughput 2.73759K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/172] avg loss 0.0104617, throughput 2.74195K wps
[Epoch 29 Batch 60/172] avg loss 0.0102693, throughput 2.76233K wps
[Epoch 29 Batch 90/172] avg loss 0.0105656, throughput 2.73738K wps
[Epoch 29 Batch 120/172] avg loss 0.00994724, throughput 2.74553K wps
[Epoch 29 Batch 150/172] avg loss 0.0101816, throughput 2.75075K wps
Begin Testing...
[Epoch 29] train avg loss 0.0103344, dev acc 0.7683, dev avg loss 0.520219, throughput 2.73736K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/172] avg loss 0.0104474, throughput 2.84378K wps
[Epoch 30 Batch 60/172] avg loss 0.00985168, throughput 2.68396K wps
[Epoch 30 Batch 90/172] avg loss 0.0097767, throughput 2.7933K wps
[Epoch 30 Batch 120/172] avg loss 0.0102141, throughput 2.73989K wps
[Epoch 30 Batch 150/172] avg loss 0.0100118, throughput 2.78573K wps
Begin Testing...
[Epoch 30] train avg loss 0.0100836, dev acc 0.7683, dev avg loss 0.507031, throughput 2.75228K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/172] avg loss 0.00994007, throughput 2.70832K wps
[Epoch 31 Batch 60/172] avg loss 0.00981119, throughput 2.74173K wps
[Epoch 31 Batch 90/172] avg loss 0.00984882, throughput 2.72679K wps
[Epoch 31 Batch 120/172] avg loss 0.00942978, throughput 2.72604K wps
[Epoch 31 Batch 150/172] avg loss 0.00946801, throughput 2.65571K wps
Begin Testing...
[Epoch 31] train avg loss 0.00977988, dev acc 0.7893, dev avg loss 0.502625, throughput 2.70833K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/172] avg loss 0.00948219, throughput 2.76734K wps
[Epoch 32 Batch 60/172] avg loss 0.00956166, throughput 2.7418K wps
[Epoch 32 Batch 90/172] avg loss 0.00958388, throughput 2.7588K wps
[Epoch 32 Batch 120/172] avg loss 0.00958764, throughput 2.77274K wps
[Epoch 32 Batch 150/172] avg loss 0.0096425, throughput 2.63893K wps
Begin Testing...
[Epoch 32] train avg loss 0.00950067, dev acc 0.7683, dev avg loss 0.483888, throughput 2.73053K wps
[Epoch 33 Batch 30/172] avg loss 0.00924403, throughput 2.63207K wps
[Epoch 33 Batch 60/172] avg loss 0.00914182, throughput 2.78959K wps
[Epoch 33 Batch 90/172] avg loss 0.00917831, throughput 2.79195K wps
[Epoch 33 Batch 120/172] avg loss 0.00883342, throughput 2.66807K wps
[Epoch 33 Batch 150/172] avg loss 0.00943098, throughput 2.80474K wps
Begin Testing...
[Epoch 33] train avg loss 0.00919553, dev acc 0.7830, dev avg loss 0.473822, throughput 2.74314K wps
[Epoch 34 Batch 30/172] avg loss 0.0093112, throughput 2.78715K wps
[Epoch 34 Batch 60/172] avg loss 0.00895518, throughput 2.62124K wps
[Epoch 34 Batch 90/172] avg loss 0.00871367, throughput 2.68739K wps
[Epoch 34 Batch 120/172] avg loss 0.00865459, throughput 2.76356K wps
[Epoch 34 Batch 150/172] avg loss 0.00883846, throughput 2.74959K wps
Begin Testing...
[Epoch 34] train avg loss 0.00890301, dev acc 0.7883, dev avg loss 0.461503, throughput 2.72381K wps
[Epoch 35 Batch 30/172] avg loss 0.00877244, throughput 2.78607K wps
[Epoch 35 Batch 60/172] avg loss 0.00846479, throughput 2.71978K wps
[Epoch 35 Batch 90/172] avg loss 0.00863009, throughput 2.67673K wps
[Epoch 35 Batch 120/172] avg loss 0.00854281, throughput 2.72952K wps
[Epoch 35 Batch 150/172] avg loss 0.00848359, throughput 2.75945K wps
Begin Testing...
[Epoch 35] train avg loss 0.00860068, dev acc 0.7987, dev avg loss 0.451166, throughput 2.73378K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/172] avg loss 0.00838694, throughput 2.78686K wps
[Epoch 36 Batch 60/172] avg loss 0.00808305, throughput 2.75895K wps
[Epoch 36 Batch 90/172] avg loss 0.00804617, throughput 2.77486K wps
[Epoch 36 Batch 120/172] avg loss 0.00851341, throughput 2.71442K wps
[Epoch 36 Batch 150/172] avg loss 0.00811115, throughput 2.66443K wps
Begin Testing...
[Epoch 36] train avg loss 0.00822449, dev acc 0.8197, dev avg loss 0.44086, throughput 2.72725K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00814049, throughput 2.65972K wps
[Epoch 37 Batch 60/172] avg loss 0.00779859, throughput 2.77428K wps
[Epoch 37 Batch 90/172] avg loss 0.00792489, throughput 2.74936K wps
[Epoch 37 Batch 120/172] avg loss 0.00764064, throughput 2.73101K wps
[Epoch 37 Batch 150/172] avg loss 0.00788295, throughput 2.73973K wps
Begin Testing...
[Epoch 37] train avg loss 0.00788764, dev acc 0.7966, dev avg loss 0.431679, throughput 2.71114K wps
[Epoch 38 Batch 30/172] avg loss 0.00792389, throughput 2.62416K wps
[Epoch 38 Batch 60/172] avg loss 0.00730882, throughput 2.63129K wps
[Epoch 38 Batch 90/172] avg loss 0.00774611, throughput 2.7071K wps
[Epoch 38 Batch 120/172] avg loss 0.00749192, throughput 2.75308K wps
[Epoch 38 Batch 150/172] avg loss 0.00743677, throughput 2.7318K wps
Begin Testing...
[Epoch 38] train avg loss 0.00761001, dev acc 0.8312, dev avg loss 0.419913, throughput 2.69172K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.00750853, throughput 2.79368K wps
[Epoch 39 Batch 60/172] avg loss 0.00731334, throughput 2.76832K wps
[Epoch 39 Batch 90/172] avg loss 0.00719182, throughput 2.76325K wps
[Epoch 39 Batch 120/172] avg loss 0.00733836, throughput 2.79812K wps
[Epoch 39 Batch 150/172] avg loss 0.00721348, throughput 2.79853K wps
Begin Testing...
[Epoch 39] train avg loss 0.00727218, dev acc 0.8019, dev avg loss 0.414396, throughput 2.77218K wps
[Epoch 40 Batch 30/172] avg loss 0.00725083, throughput 2.77514K wps
[Epoch 40 Batch 60/172] avg loss 0.00701027, throughput 2.83591K wps
[Epoch 40 Batch 90/172] avg loss 0.00686752, throughput 2.8169K wps
[Epoch 40 Batch 120/172] avg loss 0.00701266, throughput 2.76276K wps
[Epoch 40 Batch 150/172] avg loss 0.00655243, throughput 2.71995K wps
Begin Testing...
[Epoch 40] train avg loss 0.00696034, dev acc 0.8312, dev avg loss 0.40082, throughput 2.77624K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/172] avg loss 0.00679633, throughput 2.87007K wps
[Epoch 41 Batch 60/172] avg loss 0.00672662, throughput 2.74982K wps
[Epoch 41 Batch 90/172] avg loss 0.00690358, throughput 2.7187K wps
[Epoch 41 Batch 120/172] avg loss 0.00623771, throughput 2.67511K wps
[Epoch 41 Batch 150/172] avg loss 0.00690873, throughput 2.7658K wps
Begin Testing...
[Epoch 41] train avg loss 0.00667646, dev acc 0.8438, dev avg loss 0.392877, throughput 2.75613K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00612954, throughput 2.76637K wps
[Epoch 42 Batch 60/172] avg loss 0.00659402, throughput 2.74608K wps
[Epoch 42 Batch 90/172] avg loss 0.00647457, throughput 2.67788K wps
[Epoch 42 Batch 120/172] avg loss 0.00639373, throughput 2.79499K wps
[Epoch 42 Batch 150/172] avg loss 0.00645127, throughput 2.77756K wps
Begin Testing...
[Epoch 42] train avg loss 0.00638207, dev acc 0.8470, dev avg loss 0.385755, throughput 2.74065K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/172] avg loss 0.00584914, throughput 2.74244K wps
[Epoch 43 Batch 60/172] avg loss 0.00615525, throughput 2.70107K wps
[Epoch 43 Batch 90/172] avg loss 0.00600459, throughput 2.79258K wps
[Epoch 43 Batch 120/172] avg loss 0.00601312, throughput 2.80383K wps
[Epoch 43 Batch 150/172] avg loss 0.00621801, throughput 2.80218K wps
Begin Testing...
[Epoch 43] train avg loss 0.00609518, dev acc 0.8438, dev avg loss 0.377661, throughput 2.77298K wps
[Epoch 44 Batch 30/172] avg loss 0.00594195, throughput 2.66553K wps
[Epoch 44 Batch 60/172] avg loss 0.00589407, throughput 2.715K wps
[Epoch 44 Batch 90/172] avg loss 0.00609345, throughput 2.77547K wps
[Epoch 44 Batch 120/172] avg loss 0.00595247, throughput 2.60027K wps
[Epoch 44 Batch 150/172] avg loss 0.00589622, throughput 2.73835K wps
Begin Testing...
[Epoch 44] train avg loss 0.00592505, dev acc 0.8512, dev avg loss 0.370725, throughput 2.70598K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00571473, throughput 2.72799K wps
[Epoch 45 Batch 60/172] avg loss 0.00523574, throughput 2.72493K wps
[Epoch 45 Batch 90/172] avg loss 0.00576828, throughput 2.74529K wps
[Epoch 45 Batch 120/172] avg loss 0.00570738, throughput 2.64173K wps
[Epoch 45 Batch 150/172] avg loss 0.00556366, throughput 2.59969K wps
Begin Testing...
[Epoch 45] train avg loss 0.00562988, dev acc 0.8564, dev avg loss 0.365093, throughput 2.69789K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00538643, throughput 2.77833K wps
[Epoch 46 Batch 60/172] avg loss 0.00533658, throughput 2.72003K wps
[Epoch 46 Batch 90/172] avg loss 0.00539873, throughput 2.71313K wps
[Epoch 46 Batch 120/172] avg loss 0.0056407, throughput 2.68407K wps
[Epoch 46 Batch 150/172] avg loss 0.00532169, throughput 2.62555K wps
Begin Testing...
[Epoch 46] train avg loss 0.00540146, dev acc 0.8627, dev avg loss 0.359591, throughput 2.70864K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/172] avg loss 0.00492514, throughput 2.80648K wps
[Epoch 47 Batch 60/172] avg loss 0.00519278, throughput 2.73315K wps
[Epoch 47 Batch 90/172] avg loss 0.00489949, throughput 2.72097K wps
[Epoch 47 Batch 120/172] avg loss 0.00519263, throughput 2.77429K wps
[Epoch 47 Batch 150/172] avg loss 0.00538152, throughput 2.79204K wps
Begin Testing...
[Epoch 47] train avg loss 0.00517187, dev acc 0.8532, dev avg loss 0.358777, throughput 2.76595K wps
[Epoch 48 Batch 30/172] avg loss 0.00518588, throughput 2.78825K wps
[Epoch 48 Batch 60/172] avg loss 0.00479815, throughput 2.77226K wps
[Epoch 48 Batch 90/172] avg loss 0.00519246, throughput 2.81648K wps
[Epoch 48 Batch 120/172] avg loss 0.00473664, throughput 2.76987K wps
[Epoch 48 Batch 150/172] avg loss 0.00514404, throughput 2.79641K wps
Begin Testing...
[Epoch 48] train avg loss 0.0049523, dev acc 0.8679, dev avg loss 0.350469, throughput 2.78304K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/172] avg loss 0.00487056, throughput 2.86023K wps
[Epoch 49 Batch 60/172] avg loss 0.00501513, throughput 2.78634K wps
[Epoch 49 Batch 90/172] avg loss 0.00454426, throughput 2.79481K wps
[Epoch 49 Batch 120/172] avg loss 0.00467377, throughput 2.66214K wps
[Epoch 49 Batch 150/172] avg loss 0.00480287, throughput 2.71615K wps
Begin Testing...
[Epoch 49] train avg loss 0.00475993, dev acc 0.8627, dev avg loss 0.346938, throughput 2.76429K wps
[Epoch 50 Batch 30/172] avg loss 0.00468657, throughput 2.85105K wps
[Epoch 50 Batch 60/172] avg loss 0.00450293, throughput 2.77091K wps
[Epoch 50 Batch 90/172] avg loss 0.00443477, throughput 2.60597K wps
[Epoch 50 Batch 120/172] avg loss 0.00485379, throughput 2.70564K wps
[Epoch 50 Batch 150/172] avg loss 0.00448217, throughput 2.75877K wps
Begin Testing...
[Epoch 50] train avg loss 0.0045829, dev acc 0.8564, dev avg loss 0.345196, throughput 2.73539K wps
[Epoch 51 Batch 30/172] avg loss 0.00436078, throughput 2.80012K wps
[Epoch 51 Batch 60/172] avg loss 0.00432292, throughput 2.73029K wps
[Epoch 51 Batch 90/172] avg loss 0.00425862, throughput 2.66061K wps
[Epoch 51 Batch 120/172] avg loss 0.00419824, throughput 2.76275K wps
[Epoch 51 Batch 150/172] avg loss 0.00440706, throughput 2.78618K wps
Begin Testing...
[Epoch 51] train avg loss 0.00438455, dev acc 0.8585, dev avg loss 0.344602, throughput 2.74934K wps
[Epoch 52 Batch 30/172] avg loss 0.00394756, throughput 2.79465K wps
[Epoch 52 Batch 60/172] avg loss 0.00419856, throughput 2.80391K wps
[Epoch 52 Batch 90/172] avg loss 0.0044361, throughput 2.80457K wps
[Epoch 52 Batch 120/172] avg loss 0.00412083, throughput 2.78804K wps
[Epoch 52 Batch 150/172] avg loss 0.00430769, throughput 2.7058K wps
Begin Testing...
[Epoch 52] train avg loss 0.00417777, dev acc 0.8595, dev avg loss 0.339339, throughput 2.75984K wps
[Epoch 53 Batch 30/172] avg loss 0.00424448, throughput 2.78318K wps
[Epoch 53 Batch 60/172] avg loss 0.00376644, throughput 2.83872K wps
[Epoch 53 Batch 90/172] avg loss 0.00384269, throughput 2.79816K wps
[Epoch 53 Batch 120/172] avg loss 0.00402926, throughput 2.79143K wps
[Epoch 53 Batch 150/172] avg loss 0.00399768, throughput 2.73836K wps
Begin Testing...
[Epoch 53] train avg loss 0.00397107, dev acc 0.8658, dev avg loss 0.344282, throughput 2.76208K wps
[Epoch 54 Batch 30/172] avg loss 0.00415488, throughput 2.71713K wps
[Epoch 54 Batch 60/172] avg loss 0.0040807, throughput 2.76627K wps
[Epoch 54 Batch 90/172] avg loss 0.00401539, throughput 2.69072K wps
[Epoch 54 Batch 120/172] avg loss 0.00363349, throughput 2.71321K wps
[Epoch 54 Batch 150/172] avg loss 0.00364145, throughput 2.75657K wps
Begin Testing...
[Epoch 54] train avg loss 0.00389302, dev acc 0.8585, dev avg loss 0.336901, throughput 2.72616K wps
[Epoch 55 Batch 30/172] avg loss 0.00324891, throughput 2.81536K wps
[Epoch 55 Batch 60/172] avg loss 0.00406706, throughput 2.74056K wps
[Epoch 55 Batch 90/172] avg loss 0.00399513, throughput 2.74386K wps
[Epoch 55 Batch 120/172] avg loss 0.00371696, throughput 2.74897K wps
[Epoch 55 Batch 150/172] avg loss 0.0037602, throughput 2.73653K wps
Begin Testing...
[Epoch 55] train avg loss 0.00375307, dev acc 0.8585, dev avg loss 0.336379, throughput 2.74197K wps
[Epoch 56 Batch 30/172] avg loss 0.00385181, throughput 2.78604K wps
[Epoch 56 Batch 60/172] avg loss 0.00342037, throughput 2.75758K wps
[Epoch 56 Batch 90/172] avg loss 0.00349121, throughput 2.65617K wps
[Epoch 56 Batch 120/172] avg loss 0.00406349, throughput 2.75439K wps
[Epoch 56 Batch 150/172] avg loss 0.00354956, throughput 2.77137K wps
Begin Testing...
[Epoch 56] train avg loss 0.00364575, dev acc 0.8690, dev avg loss 0.338318, throughput 2.74174K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00342441, throughput 2.786K wps
[Epoch 57 Batch 60/172] avg loss 0.00356364, throughput 2.72396K wps
[Epoch 57 Batch 90/172] avg loss 0.0035619, throughput 2.66582K wps
[Epoch 57 Batch 120/172] avg loss 0.00349683, throughput 2.74254K wps
[Epoch 57 Batch 150/172] avg loss 0.00333489, throughput 2.73612K wps
Begin Testing...
[Epoch 57] train avg loss 0.00350828, dev acc 0.8658, dev avg loss 0.333282, throughput 2.73652K wps
[Epoch 58 Batch 30/172] avg loss 0.00336653, throughput 2.82459K wps
[Epoch 58 Batch 60/172] avg loss 0.00333926, throughput 2.61873K wps
[Epoch 58 Batch 90/172] avg loss 0.00342524, throughput 2.80095K wps
[Epoch 58 Batch 120/172] avg loss 0.00326574, throughput 2.69457K wps
[Epoch 58 Batch 150/172] avg loss 0.00342173, throughput 2.63049K wps
Begin Testing...
[Epoch 58] train avg loss 0.0034042, dev acc 0.8543, dev avg loss 0.336748, throughput 2.71346K wps
[Epoch 59 Batch 30/172] avg loss 0.00328317, throughput 2.78631K wps
[Epoch 59 Batch 60/172] avg loss 0.00311177, throughput 2.78131K wps
[Epoch 59 Batch 90/172] avg loss 0.0032612, throughput 2.76988K wps
[Epoch 59 Batch 120/172] avg loss 0.00345579, throughput 2.74091K wps
[Epoch 59 Batch 150/172] avg loss 0.00341887, throughput 2.73597K wps
Begin Testing...
[Epoch 59] train avg loss 0.00330643, dev acc 0.8669, dev avg loss 0.333526, throughput 2.76153K wps
[Epoch 60 Batch 30/172] avg loss 0.0032509, throughput 2.84452K wps
[Epoch 60 Batch 60/172] avg loss 0.00318152, throughput 2.72848K wps
[Epoch 60 Batch 90/172] avg loss 0.00328768, throughput 2.72636K wps
[Epoch 60 Batch 120/172] avg loss 0.00306365, throughput 2.72796K wps
[Epoch 60 Batch 150/172] avg loss 0.00357174, throughput 2.77882K wps
Begin Testing...
[Epoch 60] train avg loss 0.00321843, dev acc 0.8658, dev avg loss 0.340111, throughput 2.76139K wps
[Epoch 61 Batch 30/172] avg loss 0.00304537, throughput 2.80769K wps
[Epoch 61 Batch 60/172] avg loss 0.002795, throughput 2.77357K wps
[Epoch 61 Batch 90/172] avg loss 0.00302494, throughput 2.78652K wps
[Epoch 61 Batch 120/172] avg loss 0.00335501, throughput 2.79796K wps
[Epoch 61 Batch 150/172] avg loss 0.00349244, throughput 2.77694K wps
Begin Testing...
[Epoch 61] train avg loss 0.00310395, dev acc 0.8679, dev avg loss 0.349527, throughput 2.76893K wps
[Epoch 62 Batch 30/172] avg loss 0.00287152, throughput 2.83081K wps
[Epoch 62 Batch 60/172] avg loss 0.00289144, throughput 2.7754K wps
[Epoch 62 Batch 90/172] avg loss 0.00321855, throughput 2.77988K wps
[Epoch 62 Batch 120/172] avg loss 0.00284643, throughput 2.73675K wps
[Epoch 62 Batch 150/172] avg loss 0.00322759, throughput 2.73674K wps
Begin Testing...
[Epoch 62] train avg loss 0.00298748, dev acc 0.8690, dev avg loss 0.336302, throughput 2.77207K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/172] avg loss 0.00283236, throughput 2.76718K wps
[Epoch 63 Batch 60/172] avg loss 0.00322029, throughput 2.74635K wps
[Epoch 63 Batch 90/172] avg loss 0.0026903, throughput 2.75897K wps
[Epoch 63 Batch 120/172] avg loss 0.00310923, throughput 2.75218K wps
[Epoch 63 Batch 150/172] avg loss 0.00306929, throughput 2.75531K wps
Begin Testing...
[Epoch 63] train avg loss 0.00295408, dev acc 0.8627, dev avg loss 0.342923, throughput 2.7525K wps
[Epoch 64 Batch 30/172] avg loss 0.00309647, throughput 2.69895K wps
[Epoch 64 Batch 60/172] avg loss 0.00280733, throughput 2.67248K wps
[Epoch 64 Batch 90/172] avg loss 0.00293607, throughput 2.845K wps
[Epoch 64 Batch 120/172] avg loss 0.00271846, throughput 2.66619K wps
[Epoch 64 Batch 150/172] avg loss 0.00298055, throughput 2.8067K wps
Begin Testing...
[Epoch 64] train avg loss 0.00288868, dev acc 0.8679, dev avg loss 0.336463, throughput 2.73374K wps
[Epoch 65 Batch 30/172] avg loss 0.00249416, throughput 2.63975K wps
[Epoch 65 Batch 60/172] avg loss 0.00295161, throughput 2.76881K wps
[Epoch 65 Batch 90/172] avg loss 0.00307445, throughput 2.80018K wps
[Epoch 65 Batch 120/172] avg loss 0.00280866, throughput 2.68868K wps
[Epoch 65 Batch 150/172] avg loss 0.0028223, throughput 2.8189K wps
Begin Testing...
[Epoch 65] train avg loss 0.00285309, dev acc 0.8627, dev avg loss 0.344285, throughput 2.7347K wps
[Epoch 66 Batch 30/172] avg loss 0.00270249, throughput 2.69554K wps
[Epoch 66 Batch 60/172] avg loss 0.00264868, throughput 2.78984K wps
[Epoch 66 Batch 90/172] avg loss 0.00253907, throughput 2.7437K wps
[Epoch 66 Batch 120/172] avg loss 0.0028522, throughput 2.78426K wps
[Epoch 66 Batch 150/172] avg loss 0.00264956, throughput 2.69926K wps
Begin Testing...
[Epoch 66] train avg loss 0.00272097, dev acc 0.8637, dev avg loss 0.345405, throughput 2.75211K wps
[Epoch 67 Batch 30/172] avg loss 0.00245408, throughput 2.61136K wps
[Epoch 67 Batch 60/172] avg loss 0.00283128, throughput 2.65552K wps
[Epoch 67 Batch 90/172] avg loss 0.00263061, throughput 2.77875K wps
[Epoch 67 Batch 120/172] avg loss 0.00288607, throughput 2.746K wps
[Epoch 67 Batch 150/172] avg loss 0.00245977, throughput 2.74795K wps
Begin Testing...
[Epoch 67] train avg loss 0.00267532, dev acc 0.8648, dev avg loss 0.341481, throughput 2.70581K wps
[Epoch 68 Batch 30/172] avg loss 0.00270889, throughput 2.66541K wps
[Epoch 68 Batch 60/172] avg loss 0.00245259, throughput 2.79243K wps
[Epoch 68 Batch 90/172] avg loss 0.00261085, throughput 2.77864K wps
[Epoch 68 Batch 120/172] avg loss 0.00263567, throughput 2.71775K wps
[Epoch 68 Batch 150/172] avg loss 0.00274977, throughput 2.82964K wps
Begin Testing...
[Epoch 68] train avg loss 0.00262869, dev acc 0.8679, dev avg loss 0.351465, throughput 2.76005K wps
[Epoch 69 Batch 30/172] avg loss 0.00257165, throughput 2.80514K wps
[Epoch 69 Batch 60/172] avg loss 0.00267527, throughput 2.76523K wps
[Epoch 69 Batch 90/172] avg loss 0.00271117, throughput 2.72641K wps
[Epoch 69 Batch 120/172] avg loss 0.00225517, throughput 2.72031K wps
[Epoch 69 Batch 150/172] avg loss 0.0024171, throughput 2.81706K wps
Begin Testing...
[Epoch 69] train avg loss 0.00254481, dev acc 0.8669, dev avg loss 0.356109, throughput 2.7621K wps
[Epoch 70 Batch 30/172] avg loss 0.00253194, throughput 2.84426K wps
[Epoch 70 Batch 60/172] avg loss 0.0023303, throughput 2.77487K wps
[Epoch 70 Batch 90/172] avg loss 0.00242167, throughput 2.68531K wps
[Epoch 70 Batch 120/172] avg loss 0.00255038, throughput 2.61834K wps
[Epoch 70 Batch 150/172] avg loss 0.00258237, throughput 2.8216K wps
Begin Testing...
[Epoch 70] train avg loss 0.00252293, dev acc 0.8669, dev avg loss 0.350911, throughput 2.75083K wps
[Epoch 71 Batch 30/172] avg loss 0.00236049, throughput 2.78017K wps
[Epoch 71 Batch 60/172] avg loss 0.0020036, throughput 2.78478K wps
[Epoch 71 Batch 90/172] avg loss 0.0024269, throughput 2.76265K wps
[Epoch 71 Batch 120/172] avg loss 0.00269523, throughput 2.77314K wps
[Epoch 71 Batch 150/172] avg loss 0.00252176, throughput 2.7089K wps
Begin Testing...
[Epoch 71] train avg loss 0.0024496, dev acc 0.8658, dev avg loss 0.354979, throughput 2.76076K wps
[Epoch 72 Batch 30/172] avg loss 0.00221487, throughput 2.75301K wps
[Epoch 72 Batch 60/172] avg loss 0.00234466, throughput 2.70556K wps
[Epoch 72 Batch 90/172] avg loss 0.00259357, throughput 2.7301K wps
[Epoch 72 Batch 120/172] avg loss 0.00256938, throughput 2.72183K wps
[Epoch 72 Batch 150/172] avg loss 0.00252747, throughput 2.63087K wps
Begin Testing...
[Epoch 72] train avg loss 0.00242101, dev acc 0.8700, dev avg loss 0.351882, throughput 2.71506K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/172] avg loss 0.00211708, throughput 2.82841K wps
[Epoch 73 Batch 60/172] avg loss 0.0023388, throughput 2.81434K wps
[Epoch 73 Batch 90/172] avg loss 0.0021861, throughput 2.81305K wps
[Epoch 73 Batch 120/172] avg loss 0.00260183, throughput 2.74778K wps
[Epoch 73 Batch 150/172] avg loss 0.00256401, throughput 2.81462K wps
Begin Testing...
[Epoch 73] train avg loss 0.00236486, dev acc 0.8679, dev avg loss 0.35211, throughput 2.80558K wps
[Epoch 74 Batch 30/172] avg loss 0.00214884, throughput 2.7528K wps
[Epoch 74 Batch 60/172] avg loss 0.00222897, throughput 2.80337K wps
[Epoch 74 Batch 90/172] avg loss 0.00219156, throughput 2.75892K wps
[Epoch 74 Batch 120/172] avg loss 0.00239297, throughput 2.75999K wps
[Epoch 74 Batch 150/172] avg loss 0.00238646, throughput 2.77342K wps
Begin Testing...
[Epoch 74] train avg loss 0.00231734, dev acc 0.8595, dev avg loss 0.351624, throughput 2.76942K wps
[Epoch 75 Batch 30/172] avg loss 0.00212461, throughput 2.81487K wps
[Epoch 75 Batch 60/172] avg loss 0.00227549, throughput 2.75072K wps
[Epoch 75 Batch 90/172] avg loss 0.00228443, throughput 2.5743K wps
[Epoch 75 Batch 120/172] avg loss 0.00243451, throughput 2.59007K wps
[Epoch 75 Batch 150/172] avg loss 0.00207521, throughput 2.75455K wps
Begin Testing...
[Epoch 75] train avg loss 0.00223508, dev acc 0.8679, dev avg loss 0.358579, throughput 2.69754K wps
[Epoch 76 Batch 30/172] avg loss 0.00189613, throughput 2.62567K wps
[Epoch 76 Batch 60/172] avg loss 0.00228818, throughput 2.77835K wps
[Epoch 76 Batch 90/172] avg loss 0.00264577, throughput 2.77128K wps
[Epoch 76 Batch 120/172] avg loss 0.00234416, throughput 2.76795K wps
[Epoch 76 Batch 150/172] avg loss 0.00219354, throughput 2.7669K wps
Begin Testing...
[Epoch 76] train avg loss 0.00224595, dev acc 0.8658, dev avg loss 0.367465, throughput 2.74117K wps
[Epoch 77 Batch 30/172] avg loss 0.00214593, throughput 2.72799K wps
[Epoch 77 Batch 60/172] avg loss 0.00211649, throughput 2.7135K wps
[Epoch 77 Batch 90/172] avg loss 0.00230052, throughput 2.75195K wps
[Epoch 77 Batch 120/172] avg loss 0.00201824, throughput 2.77635K wps
[Epoch 77 Batch 150/172] avg loss 0.00239219, throughput 2.82635K wps
Begin Testing...
[Epoch 77] train avg loss 0.00222857, dev acc 0.8679, dev avg loss 0.3589, throughput 2.76528K wps
[Epoch 78 Batch 30/172] avg loss 0.00199748, throughput 2.85762K wps
[Epoch 78 Batch 60/172] avg loss 0.00233719, throughput 2.78829K wps
[Epoch 78 Batch 90/172] avg loss 0.00195838, throughput 2.78756K wps
[Epoch 78 Batch 120/172] avg loss 0.00225986, throughput 2.77733K wps
[Epoch 78 Batch 150/172] avg loss 0.00245358, throughput 2.76165K wps
Begin Testing...
[Epoch 78] train avg loss 0.00217402, dev acc 0.8669, dev avg loss 0.386531, throughput 2.79462K wps
[Epoch 79 Batch 30/172] avg loss 0.00206987, throughput 2.68981K wps
[Epoch 79 Batch 60/172] avg loss 0.00210264, throughput 2.66295K wps
[Epoch 79 Batch 90/172] avg loss 0.00193894, throughput 2.74998K wps
[Epoch 79 Batch 120/172] avg loss 0.00212541, throughput 2.7182K wps
[Epoch 79 Batch 150/172] avg loss 0.00230131, throughput 2.6288K wps
Begin Testing...
[Epoch 79] train avg loss 0.00212297, dev acc 0.8700, dev avg loss 0.368525, throughput 2.6984K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/172] avg loss 0.00190773, throughput 2.81186K wps
[Epoch 80 Batch 60/172] avg loss 0.00204583, throughput 2.77205K wps
[Epoch 80 Batch 90/172] avg loss 0.00203419, throughput 2.76227K wps
[Epoch 80 Batch 120/172] avg loss 0.00225498, throughput 2.76275K wps
[Epoch 80 Batch 150/172] avg loss 0.00209642, throughput 2.78569K wps
Begin Testing...
[Epoch 80] train avg loss 0.00208132, dev acc 0.8658, dev avg loss 0.367116, throughput 2.78141K wps
[Epoch 81 Batch 30/172] avg loss 0.00185957, throughput 2.84526K wps
[Epoch 81 Batch 60/172] avg loss 0.00218968, throughput 2.78435K wps
[Epoch 81 Batch 90/172] avg loss 0.00197393, throughput 2.79424K wps
[Epoch 81 Batch 120/172] avg loss 0.0021763, throughput 2.74758K wps
[Epoch 81 Batch 150/172] avg loss 0.00198424, throughput 2.68355K wps
Begin Testing...
[Epoch 81] train avg loss 0.00203681, dev acc 0.8679, dev avg loss 0.367066, throughput 2.77259K wps
[Epoch 82 Batch 30/172] avg loss 0.0020596, throughput 2.7266K wps
[Epoch 82 Batch 60/172] avg loss 0.00209197, throughput 2.78782K wps
[Epoch 82 Batch 90/172] avg loss 0.00194811, throughput 2.77429K wps
[Epoch 82 Batch 120/172] avg loss 0.00192688, throughput 2.71724K wps
[Epoch 82 Batch 150/172] avg loss 0.0020989, throughput 2.66389K wps
Begin Testing...
[Epoch 82] train avg loss 0.00202262, dev acc 0.8658, dev avg loss 0.374415, throughput 2.72294K wps
[Epoch 83 Batch 30/172] avg loss 0.00182549, throughput 2.66118K wps
[Epoch 83 Batch 60/172] avg loss 0.0018533, throughput 2.76941K wps
[Epoch 83 Batch 90/172] avg loss 0.00187356, throughput 2.78238K wps
[Epoch 83 Batch 120/172] avg loss 0.00206684, throughput 2.66198K wps
[Epoch 83 Batch 150/172] avg loss 0.0024217, throughput 2.81113K wps
Begin Testing...
[Epoch 83] train avg loss 0.00200355, dev acc 0.8679, dev avg loss 0.373013, throughput 2.73953K wps
[Epoch 84 Batch 30/172] avg loss 0.00205802, throughput 2.77832K wps
[Epoch 84 Batch 60/172] avg loss 0.00176647, throughput 2.78839K wps
[Epoch 84 Batch 90/172] avg loss 0.00192376, throughput 2.76701K wps
[Epoch 84 Batch 120/172] avg loss 0.00175628, throughput 2.6906K wps
[Epoch 84 Batch 150/172] avg loss 0.0020472, throughput 2.69543K wps
Begin Testing...
[Epoch 84] train avg loss 0.0019226, dev acc 0.8658, dev avg loss 0.382334, throughput 2.75065K wps
[Epoch 85 Batch 30/172] avg loss 0.00193856, throughput 2.79557K wps
[Epoch 85 Batch 60/172] avg loss 0.00171239, throughput 2.72049K wps
[Epoch 85 Batch 90/172] avg loss 0.00196436, throughput 2.74715K wps
[Epoch 85 Batch 120/172] avg loss 0.00229798, throughput 2.75974K wps
[Epoch 85 Batch 150/172] avg loss 0.00192994, throughput 2.69215K wps
Begin Testing...
[Epoch 85] train avg loss 0.00194353, dev acc 0.8669, dev avg loss 0.3841, throughput 2.74056K wps
[Epoch 86 Batch 30/172] avg loss 0.0016427, throughput 2.69952K wps
[Epoch 86 Batch 60/172] avg loss 0.0019298, throughput 2.76279K wps
[Epoch 86 Batch 90/172] avg loss 0.00168878, throughput 2.60156K wps
[Epoch 86 Batch 120/172] avg loss 0.00199934, throughput 2.80914K wps
[Epoch 86 Batch 150/172] avg loss 0.00211546, throughput 2.79478K wps
Begin Testing...
[Epoch 86] train avg loss 0.00190676, dev acc 0.8679, dev avg loss 0.389887, throughput 2.74063K wps
[Epoch 87 Batch 30/172] avg loss 0.0020578, throughput 2.72945K wps
[Epoch 87 Batch 60/172] avg loss 0.00186589, throughput 2.81551K wps
[Epoch 87 Batch 90/172] avg loss 0.00178669, throughput 2.82344K wps
[Epoch 87 Batch 120/172] avg loss 0.00171245, throughput 2.80202K wps
[Epoch 87 Batch 150/172] avg loss 0.00199732, throughput 2.79851K wps
Begin Testing...
[Epoch 87] train avg loss 0.00189076, dev acc 0.8679, dev avg loss 0.388809, throughput 2.78944K wps
[Epoch 88 Batch 30/172] avg loss 0.00189228, throughput 2.75688K wps
[Epoch 88 Batch 60/172] avg loss 0.00195045, throughput 2.76136K wps
[Epoch 88 Batch 90/172] avg loss 0.0020434, throughput 2.74303K wps
[Epoch 88 Batch 120/172] avg loss 0.00170828, throughput 2.69318K wps
[Epoch 88 Batch 150/172] avg loss 0.00189269, throughput 2.82261K wps
Begin Testing...
[Epoch 88] train avg loss 0.00186786, dev acc 0.8658, dev avg loss 0.388238, throughput 2.7571K wps
[Epoch 89 Batch 30/172] avg loss 0.0017551, throughput 2.73495K wps
[Epoch 89 Batch 60/172] avg loss 0.00175821, throughput 2.79225K wps
[Epoch 89 Batch 90/172] avg loss 0.00210224, throughput 2.75778K wps
[Epoch 89 Batch 120/172] avg loss 0.00187586, throughput 2.78771K wps
[Epoch 89 Batch 150/172] avg loss 0.00181701, throughput 2.80724K wps
Begin Testing...
[Epoch 89] train avg loss 0.00183282, dev acc 0.8658, dev avg loss 0.388814, throughput 2.7798K wps
[Epoch 90 Batch 30/172] avg loss 0.00180178, throughput 2.87793K wps
[Epoch 90 Batch 60/172] avg loss 0.00167905, throughput 2.77798K wps
[Epoch 90 Batch 90/172] avg loss 0.00197271, throughput 2.78347K wps
[Epoch 90 Batch 120/172] avg loss 0.00195005, throughput 2.71351K wps
[Epoch 90 Batch 150/172] avg loss 0.00180674, throughput 2.85661K wps
Begin Testing...
[Epoch 90] train avg loss 0.00182503, dev acc 0.8679, dev avg loss 0.39442, throughput 2.80243K wps
[Epoch 91 Batch 30/172] avg loss 0.00175911, throughput 2.7759K wps
[Epoch 91 Batch 60/172] avg loss 0.00172725, throughput 2.80544K wps
[Epoch 91 Batch 90/172] avg loss 0.00160701, throughput 2.72256K wps
[Epoch 91 Batch 120/172] avg loss 0.00193925, throughput 2.73205K wps
[Epoch 91 Batch 150/172] avg loss 0.00160124, throughput 2.71703K wps
Begin Testing...
[Epoch 91] train avg loss 0.00174852, dev acc 0.8711, dev avg loss 0.411123, throughput 2.74295K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/172] avg loss 0.00178776, throughput 2.71412K wps
[Epoch 92 Batch 60/172] avg loss 0.00149778, throughput 2.79396K wps
[Epoch 92 Batch 90/172] avg loss 0.00198841, throughput 2.68361K wps
[Epoch 92 Batch 120/172] avg loss 0.00175526, throughput 2.7398K wps
[Epoch 92 Batch 150/172] avg loss 0.00163689, throughput 2.67712K wps
Begin Testing...
[Epoch 92] train avg loss 0.00174354, dev acc 0.8690, dev avg loss 0.384806, throughput 2.70753K wps
[Epoch 93 Batch 30/172] avg loss 0.0018247, throughput 2.81205K wps
[Epoch 93 Batch 60/172] avg loss 0.00162321, throughput 2.75013K wps
[Epoch 93 Batch 90/172] avg loss 0.00169272, throughput 2.77235K wps
[Epoch 93 Batch 120/172] avg loss 0.00164715, throughput 2.79014K wps
[Epoch 93 Batch 150/172] avg loss 0.00175018, throughput 2.74454K wps
Begin Testing...
[Epoch 93] train avg loss 0.00175348, dev acc 0.8711, dev avg loss 0.401649, throughput 2.77202K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/172] avg loss 0.00157387, throughput 2.75795K wps
[Epoch 94 Batch 60/172] avg loss 0.00159258, throughput 2.7823K wps
[Epoch 94 Batch 90/172] avg loss 0.00192482, throughput 2.75017K wps
[Epoch 94 Batch 120/172] avg loss 0.00189938, throughput 2.74839K wps
[Epoch 94 Batch 150/172] avg loss 0.00179928, throughput 2.76319K wps
Begin Testing...
[Epoch 94] train avg loss 0.00175907, dev acc 0.8669, dev avg loss 0.386215, throughput 2.74242K wps
[Epoch 95 Batch 30/172] avg loss 0.00165885, throughput 2.81225K wps
[Epoch 95 Batch 60/172] avg loss 0.00180196, throughput 2.75523K wps
[Epoch 95 Batch 90/172] avg loss 0.00170022, throughput 2.76916K wps
[Epoch 95 Batch 120/172] avg loss 0.00168879, throughput 2.66984K wps
[Epoch 95 Batch 150/172] avg loss 0.00179507, throughput 2.66048K wps
Begin Testing...
[Epoch 95] train avg loss 0.00171469, dev acc 0.8721, dev avg loss 0.41491, throughput 2.7397K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/172] avg loss 0.00154357, throughput 2.73713K wps
[Epoch 96 Batch 60/172] avg loss 0.00151487, throughput 2.70203K wps
[Epoch 96 Batch 90/172] avg loss 0.00173715, throughput 2.72022K wps
[Epoch 96 Batch 120/172] avg loss 0.00203957, throughput 2.77145K wps
[Epoch 96 Batch 150/172] avg loss 0.00182013, throughput 2.72794K wps
Begin Testing...
[Epoch 96] train avg loss 0.00174024, dev acc 0.8690, dev avg loss 0.425262, throughput 2.73909K wps
[Epoch 97 Batch 30/172] avg loss 0.0014959, throughput 2.73419K wps
[Epoch 97 Batch 60/172] avg loss 0.00160602, throughput 2.70796K wps
[Epoch 97 Batch 90/172] avg loss 0.00185314, throughput 2.74369K wps
[Epoch 97 Batch 120/172] avg loss 0.00154991, throughput 2.68545K wps
[Epoch 97 Batch 150/172] avg loss 0.00179694, throughput 2.76323K wps
Begin Testing...
[Epoch 97] train avg loss 0.00170059, dev acc 0.8700, dev avg loss 0.408222, throughput 2.73071K wps
[Epoch 98 Batch 30/172] avg loss 0.00156108, throughput 2.84512K wps
[Epoch 98 Batch 60/172] avg loss 0.00162206, throughput 2.79296K wps
[Epoch 98 Batch 90/172] avg loss 0.00164101, throughput 2.67424K wps
[Epoch 98 Batch 120/172] avg loss 0.00170672, throughput 2.77717K wps
[Epoch 98 Batch 150/172] avg loss 0.00155256, throughput 2.74549K wps
Begin Testing...
[Epoch 98] train avg loss 0.00164857, dev acc 0.8669, dev avg loss 0.413748, throughput 2.76928K wps
[Epoch 99 Batch 30/172] avg loss 0.00158185, throughput 2.78372K wps
[Epoch 99 Batch 60/172] avg loss 0.00163447, throughput 2.77933K wps
[Epoch 99 Batch 90/172] avg loss 0.00179904, throughput 2.76377K wps
[Epoch 99 Batch 120/172] avg loss 0.00162322, throughput 2.74224K wps
[Epoch 99 Batch 150/172] avg loss 0.00142269, throughput 2.76448K wps
Begin Testing...
[Epoch 99] train avg loss 0.00164266, dev acc 0.8721, dev avg loss 0.421545, throughput 2.76842K wps
Observed Improvement.
Begin Testing...
[Epoch 100 Batch 30/172] avg loss 0.00175439, throughput 2.80582K wps
[Epoch 100 Batch 60/172] avg loss 0.00167182, throughput 2.75613K wps
[Epoch 100 Batch 90/172] avg loss 0.00153234, throughput 2.73363K wps
[Epoch 100 Batch 120/172] avg loss 0.00149287, throughput 2.75653K wps
[Epoch 100 Batch 150/172] avg loss 0.00151952, throughput 2.76027K wps
Begin Testing...
[Epoch 100] train avg loss 0.00164188, dev acc 0.8690, dev avg loss 0.411225, throughput 2.75388K wps
[Epoch 101 Batch 30/172] avg loss 0.00157855, throughput 2.81754K wps
[Epoch 101 Batch 60/172] avg loss 0.00157649, throughput 2.78212K wps
[Epoch 101 Batch 90/172] avg loss 0.0014094, throughput 2.75562K wps
[Epoch 101 Batch 120/172] avg loss 0.00170998, throughput 2.75725K wps
[Epoch 101 Batch 150/172] avg loss 0.00182548, throughput 2.77402K wps
Begin Testing...
[Epoch 101] train avg loss 0.00161998, dev acc 0.8690, dev avg loss 0.429442, throughput 2.75955K wps
[Epoch 102 Batch 30/172] avg loss 0.00158096, throughput 2.74468K wps
[Epoch 102 Batch 60/172] avg loss 0.00156114, throughput 2.75399K wps
[Epoch 102 Batch 90/172] avg loss 0.00169189, throughput 2.7597K wps
[Epoch 102 Batch 120/172] avg loss 0.00152834, throughput 2.75795K wps
[Epoch 102 Batch 150/172] avg loss 0.0016661, throughput 2.69173K wps
Begin Testing...
[Epoch 102] train avg loss 0.00160757, dev acc 0.8700, dev avg loss 0.438453, throughput 2.7428K wps
[Epoch 103 Batch 30/172] avg loss 0.00136607, throughput 2.81188K wps
[Epoch 103 Batch 60/172] avg loss 0.00169118, throughput 2.81105K wps
[Epoch 103 Batch 90/172] avg loss 0.00151434, throughput 2.75588K wps
[Epoch 103 Batch 120/172] avg loss 0.0016833, throughput 2.82661K wps
[Epoch 103 Batch 150/172] avg loss 0.00170825, throughput 2.77186K wps
Begin Testing...
[Epoch 103] train avg loss 0.00158846, dev acc 0.8679, dev avg loss 0.431645, throughput 2.79655K wps
[Epoch 104 Batch 30/172] avg loss 0.00137644, throughput 2.86236K wps
[Epoch 104 Batch 60/172] avg loss 0.00141906, throughput 2.75485K wps
[Epoch 104 Batch 90/172] avg loss 0.00186041, throughput 2.74954K wps
[Epoch 104 Batch 120/172] avg loss 0.00166621, throughput 2.79489K wps
[Epoch 104 Batch 150/172] avg loss 0.00172439, throughput 2.77766K wps
Begin Testing...
[Epoch 104] train avg loss 0.00159787, dev acc 0.8627, dev avg loss 0.414743, throughput 2.78355K wps
[Epoch 105 Batch 30/172] avg loss 0.00132607, throughput 2.76343K wps
[Epoch 105 Batch 60/172] avg loss 0.00156049, throughput 2.75306K wps
[Epoch 105 Batch 90/172] avg loss 0.0014853, throughput 2.7668K wps
[Epoch 105 Batch 120/172] avg loss 0.00175617, throughput 2.75366K wps
[Epoch 105 Batch 150/172] avg loss 0.00142494, throughput 2.73221K wps
Begin Testing...
[Epoch 105] train avg loss 0.00156137, dev acc 0.8669, dev avg loss 0.432189, throughput 2.755K wps
[Epoch 106 Batch 30/172] avg loss 0.00135368, throughput 2.75341K wps
[Epoch 106 Batch 60/172] avg loss 0.00138209, throughput 2.79076K wps
[Epoch 106 Batch 90/172] avg loss 0.00143817, throughput 2.72022K wps
[Epoch 106 Batch 120/172] avg loss 0.00166343, throughput 2.81062K wps
[Epoch 106 Batch 150/172] avg loss 0.00177798, throughput 2.82275K wps
Begin Testing...
[Epoch 106] train avg loss 0.00152072, dev acc 0.8700, dev avg loss 0.437157, throughput 2.78287K wps
[Epoch 107 Batch 30/172] avg loss 0.00148984, throughput 2.88544K wps
[Epoch 107 Batch 60/172] avg loss 0.00133462, throughput 2.83234K wps
[Epoch 107 Batch 90/172] avg loss 0.00160738, throughput 2.82759K wps
[Epoch 107 Batch 120/172] avg loss 0.00155045, throughput 2.80284K wps
[Epoch 107 Batch 150/172] avg loss 0.00153195, throughput 2.81191K wps
Begin Testing...
[Epoch 107] train avg loss 0.00155406, dev acc 0.8648, dev avg loss 0.420569, throughput 2.82724K wps
[Epoch 108 Batch 30/172] avg loss 0.00144999, throughput 2.71341K wps
[Epoch 108 Batch 60/172] avg loss 0.00155853, throughput 2.81065K wps
[Epoch 108 Batch 90/172] avg loss 0.00153835, throughput 2.78068K wps
[Epoch 108 Batch 120/172] avg loss 0.00143677, throughput 2.75014K wps
[Epoch 108 Batch 150/172] avg loss 0.00150537, throughput 2.76975K wps
Begin Testing...
[Epoch 108] train avg loss 0.00153278, dev acc 0.8669, dev avg loss 0.425648, throughput 2.75521K wps
[Epoch 109 Batch 30/172] avg loss 0.00141753, throughput 2.7662K wps
[Epoch 109 Batch 60/172] avg loss 0.00147153, throughput 2.70383K wps
[Epoch 109 Batch 90/172] avg loss 0.00146852, throughput 2.7373K wps
[Epoch 109 Batch 120/172] avg loss 0.0016427, throughput 2.75381K wps
[Epoch 109 Batch 150/172] avg loss 0.0012587, throughput 2.75005K wps
Begin Testing...
[Epoch 109] train avg loss 0.00149783, dev acc 0.8700, dev avg loss 0.435713, throughput 2.7429K wps
[Epoch 110 Batch 30/172] avg loss 0.0014928, throughput 2.63653K wps
[Epoch 110 Batch 60/172] avg loss 0.00140875, throughput 2.7643K wps
[Epoch 110 Batch 90/172] avg loss 0.00142276, throughput 2.74981K wps
[Epoch 110 Batch 120/172] avg loss 0.00167996, throughput 2.75059K wps
[Epoch 110 Batch 150/172] avg loss 0.00159977, throughput 2.74081K wps
Begin Testing...
[Epoch 110] train avg loss 0.00152886, dev acc 0.8690, dev avg loss 0.455723, throughput 2.73194K wps
[Epoch 111 Batch 30/172] avg loss 0.00127628, throughput 2.78279K wps
[Epoch 111 Batch 60/172] avg loss 0.00143815, throughput 2.80418K wps
[Epoch 111 Batch 90/172] avg loss 0.00135344, throughput 2.83241K wps
[Epoch 111 Batch 120/172] avg loss 0.00158792, throughput 2.78423K wps
[Epoch 111 Batch 150/172] avg loss 0.00142942, throughput 2.74996K wps
Begin Testing...
[Epoch 111] train avg loss 0.001434, dev acc 0.8679, dev avg loss 0.432873, throughput 2.78198K wps
[Epoch 112 Batch 30/172] avg loss 0.00142081, throughput 2.84048K wps
[Epoch 112 Batch 60/172] avg loss 0.00149502, throughput 2.75757K wps
[Epoch 112 Batch 90/172] avg loss 0.00133771, throughput 2.7666K wps
[Epoch 112 Batch 120/172] avg loss 0.00141163, throughput 2.64767K wps
[Epoch 112 Batch 150/172] avg loss 0.00140702, throughput 2.77036K wps
Begin Testing...
[Epoch 112] train avg loss 0.00146708, dev acc 0.8679, dev avg loss 0.42446, throughput 2.74714K wps
[Epoch 113 Batch 30/172] avg loss 0.00123966, throughput 2.75834K wps
[Epoch 113 Batch 60/172] avg loss 0.00134367, throughput 2.80484K wps
[Epoch 113 Batch 90/172] avg loss 0.00149931, throughput 2.71411K wps
[Epoch 113 Batch 120/172] avg loss 0.00156194, throughput 2.74098K wps
[Epoch 113 Batch 150/172] avg loss 0.00171982, throughput 2.77109K wps
Begin Testing...
[Epoch 113] train avg loss 0.00145187, dev acc 0.8679, dev avg loss 0.449659, throughput 2.74881K wps
[Epoch 114 Batch 30/172] avg loss 0.00146125, throughput 2.68488K wps
[Epoch 114 Batch 60/172] avg loss 0.00139838, throughput 2.75839K wps
[Epoch 114 Batch 90/172] avg loss 0.00119797, throughput 2.70846K wps
[Epoch 114 Batch 120/172] avg loss 0.00146822, throughput 2.83648K wps
[Epoch 114 Batch 150/172] avg loss 0.00148187, throughput 2.73769K wps
Begin Testing...
[Epoch 114] train avg loss 0.00140803, dev acc 0.8658, dev avg loss 0.428214, throughput 2.75343K wps
[Epoch 115 Batch 30/172] avg loss 0.00131952, throughput 2.7777K wps
[Epoch 115 Batch 60/172] avg loss 0.00147748, throughput 2.75056K wps
[Epoch 115 Batch 90/172] avg loss 0.00129724, throughput 2.76315K wps
[Epoch 115 Batch 120/172] avg loss 0.00135723, throughput 2.80933K wps
[Epoch 115 Batch 150/172] avg loss 0.00141646, throughput 2.79368K wps
Begin Testing...
[Epoch 115] train avg loss 0.00141641, dev acc 0.8627, dev avg loss 0.442431, throughput 2.7775K wps
[Epoch 116 Batch 30/172] avg loss 0.00144851, throughput 2.86291K wps
[Epoch 116 Batch 60/172] avg loss 0.00136963, throughput 2.80426K wps
[Epoch 116 Batch 90/172] avg loss 0.0011251, throughput 2.77846K wps
[Epoch 116 Batch 120/172] avg loss 0.00141087, throughput 2.81598K wps
[Epoch 116 Batch 150/172] avg loss 0.00157314, throughput 2.79092K wps
Begin Testing...
[Epoch 116] train avg loss 0.00139116, dev acc 0.8637, dev avg loss 0.436621, throughput 2.80801K wps
[Epoch 117 Batch 30/172] avg loss 0.00133047, throughput 2.74191K wps
[Epoch 117 Batch 60/172] avg loss 0.00145654, throughput 2.789K wps
[Epoch 117 Batch 90/172] avg loss 0.00124972, throughput 2.7608K wps
[Epoch 117 Batch 120/172] avg loss 0.00147219, throughput 2.57894K wps
[Epoch 117 Batch 150/172] avg loss 0.00127165, throughput 2.80914K wps
Begin Testing...
[Epoch 117] train avg loss 0.00138344, dev acc 0.8690, dev avg loss 0.45604, throughput 2.71646K wps
[Epoch 118 Batch 30/172] avg loss 0.00118076, throughput 2.65811K wps
[Epoch 118 Batch 60/172] avg loss 0.00145188, throughput 2.75297K wps
[Epoch 118 Batch 90/172] avg loss 0.0014676, throughput 2.72856K wps
[Epoch 118 Batch 120/172] avg loss 0.00143735, throughput 2.74441K wps
[Epoch 118 Batch 150/172] avg loss 0.00123066, throughput 2.71874K wps
Begin Testing...
[Epoch 118] train avg loss 0.00139307, dev acc 0.8711, dev avg loss 0.46358, throughput 2.71517K wps
[Epoch 119 Batch 30/172] avg loss 0.00120218, throughput 2.81857K wps
[Epoch 119 Batch 60/172] avg loss 0.00117665, throughput 2.75494K wps
[Epoch 119 Batch 90/172] avg loss 0.00159392, throughput 2.74362K wps
[Epoch 119 Batch 120/172] avg loss 0.00150532, throughput 2.73357K wps
[Epoch 119 Batch 150/172] avg loss 0.00148193, throughput 2.75454K wps
Begin Testing...
[Epoch 119] train avg loss 0.00137854, dev acc 0.8690, dev avg loss 0.488061, throughput 2.75955K wps
[Epoch 120 Batch 30/172] avg loss 0.00123224, throughput 2.82751K wps
[Epoch 120 Batch 60/172] avg loss 0.00136494, throughput 2.81853K wps
[Epoch 120 Batch 90/172] avg loss 0.00149695, throughput 2.7483K wps
[Epoch 120 Batch 120/172] avg loss 0.0010076, throughput 2.79754K wps
[Epoch 120 Batch 150/172] avg loss 0.0014802, throughput 2.8009K wps
Begin Testing...
[Epoch 120] train avg loss 0.00132758, dev acc 0.8627, dev avg loss 0.441888, throughput 2.80038K wps
[Epoch 121 Batch 30/172] avg loss 0.00137882, throughput 2.70887K wps
[Epoch 121 Batch 60/172] avg loss 0.0012554, throughput 2.80052K wps
[Epoch 121 Batch 90/172] avg loss 0.00142969, throughput 2.798K wps
[Epoch 121 Batch 120/172] avg loss 0.00139167, throughput 2.81037K wps
[Epoch 121 Batch 150/172] avg loss 0.00138669, throughput 2.75919K wps
Begin Testing...
[Epoch 121] train avg loss 0.00136077, dev acc 0.8658, dev avg loss 0.45857, throughput 2.77554K wps
[Epoch 122 Batch 30/172] avg loss 0.00150651, throughput 2.83099K wps
[Epoch 122 Batch 60/172] avg loss 0.0012732, throughput 2.72396K wps
[Epoch 122 Batch 90/172] avg loss 0.00131551, throughput 2.75024K wps
[Epoch 122 Batch 120/172] avg loss 0.00110643, throughput 2.70342K wps
[Epoch 122 Batch 150/172] avg loss 0.00132761, throughput 2.77934K wps
Begin Testing...
[Epoch 122] train avg loss 0.00131406, dev acc 0.8679, dev avg loss 0.470476, throughput 2.75901K wps
[Epoch 123 Batch 30/172] avg loss 0.00135657, throughput 2.82388K wps
[Epoch 123 Batch 60/172] avg loss 0.00116422, throughput 2.76766K wps
[Epoch 123 Batch 90/172] avg loss 0.00146405, throughput 2.81467K wps
[Epoch 123 Batch 120/172] avg loss 0.00132446, throughput 2.80785K wps
[Epoch 123 Batch 150/172] avg loss 0.00135672, throughput 2.80432K wps
Begin Testing...
[Epoch 123] train avg loss 0.00130737, dev acc 0.8637, dev avg loss 0.440591, throughput 2.80482K wps
[Epoch 124 Batch 30/172] avg loss 0.00096292, throughput 2.81483K wps
[Epoch 124 Batch 60/172] avg loss 0.00126739, throughput 2.80992K wps
[Epoch 124 Batch 90/172] avg loss 0.00125344, throughput 2.80928K wps
[Epoch 124 Batch 120/172] avg loss 0.00147027, throughput 2.80719K wps
[Epoch 124 Batch 150/172] avg loss 0.00148024, throughput 2.753K wps
Begin Testing...
[Epoch 124] train avg loss 0.00129875, dev acc 0.8606, dev avg loss 0.44458, throughput 2.80677K wps
[Epoch 125 Batch 30/172] avg loss 0.0011079, throughput 2.85649K wps
[Epoch 125 Batch 60/172] avg loss 0.00115371, throughput 2.80237K wps
[Epoch 125 Batch 90/172] avg loss 0.00153399, throughput 2.67765K wps
[Epoch 125 Batch 120/172] avg loss 0.00114923, throughput 2.80141K wps
[Epoch 125 Batch 150/172] avg loss 0.00143865, throughput 2.64377K wps
Begin Testing...
[Epoch 125] train avg loss 0.00131788, dev acc 0.8606, dev avg loss 0.45551, throughput 2.76471K wps
[Epoch 126 Batch 30/172] avg loss 0.0012911, throughput 2.64795K wps
[Epoch 126 Batch 60/172] avg loss 0.0010927, throughput 2.72116K wps
[Epoch 126 Batch 90/172] avg loss 0.00137132, throughput 2.72582K wps
[Epoch 126 Batch 120/172] avg loss 0.00126858, throughput 2.71735K wps
[Epoch 126 Batch 150/172] avg loss 0.00133981, throughput 2.63818K wps
Begin Testing...
[Epoch 126] train avg loss 0.00130617, dev acc 0.8658, dev avg loss 0.470507, throughput 2.6745K wps
[Epoch 127 Batch 30/172] avg loss 0.00117776, throughput 2.78241K wps
[Epoch 127 Batch 60/172] avg loss 0.00114003, throughput 2.78181K wps
[Epoch 127 Batch 90/172] avg loss 0.00136418, throughput 2.75416K wps
[Epoch 127 Batch 120/172] avg loss 0.00144091, throughput 2.78603K wps
[Epoch 127 Batch 150/172] avg loss 0.00142714, throughput 2.83217K wps
Begin Testing...
[Epoch 127] train avg loss 0.00129208, dev acc 0.8648, dev avg loss 0.453399, throughput 2.79093K wps
[Epoch 128 Batch 30/172] avg loss 0.00113304, throughput 2.81027K wps
[Epoch 128 Batch 60/172] avg loss 0.0012253, throughput 2.85755K wps
[Epoch 128 Batch 90/172] avg loss 0.00121869, throughput 2.77028K wps
[Epoch 128 Batch 120/172] avg loss 0.00144768, throughput 2.82013K wps
[Epoch 128 Batch 150/172] avg loss 0.00119348, throughput 2.80491K wps
Begin Testing...
[Epoch 128] train avg loss 0.00125736, dev acc 0.8637, dev avg loss 0.458493, throughput 2.80864K wps
[Epoch 129 Batch 30/172] avg loss 0.00127636, throughput 2.74746K wps
[Epoch 129 Batch 60/172] avg loss 0.00115443, throughput 2.76951K wps
[Epoch 129 Batch 90/172] avg loss 0.00111141, throughput 2.73153K wps
[Epoch 129 Batch 120/172] avg loss 0.00159274, throughput 2.81559K wps
[Epoch 129 Batch 150/172] avg loss 0.00109475, throughput 2.76384K wps
Begin Testing...
[Epoch 129] train avg loss 0.00126623, dev acc 0.8637, dev avg loss 0.478137, throughput 2.76285K wps
[Epoch 130 Batch 30/172] avg loss 0.00120388, throughput 2.77747K wps
[Epoch 130 Batch 60/172] avg loss 0.00118673, throughput 2.68584K wps
[Epoch 130 Batch 90/172] avg loss 0.00116816, throughput 2.74319K wps
[Epoch 130 Batch 120/172] avg loss 0.00133745, throughput 2.74727K wps
[Epoch 130 Batch 150/172] avg loss 0.00137445, throughput 2.75065K wps
Begin Testing...
[Epoch 130] train avg loss 0.00126046, dev acc 0.8648, dev avg loss 0.482057, throughput 2.73103K wps
[Epoch 131 Batch 30/172] avg loss 0.00105604, throughput 2.70682K wps
[Epoch 131 Batch 60/172] avg loss 0.00130924, throughput 2.66922K wps
[Epoch 131 Batch 90/172] avg loss 0.00123498, throughput 2.76107K wps
[Epoch 131 Batch 120/172] avg loss 0.00136988, throughput 2.7777K wps
[Epoch 131 Batch 150/172] avg loss 0.00138618, throughput 2.78451K wps
Begin Testing...
[Epoch 131] train avg loss 0.00126982, dev acc 0.8637, dev avg loss 0.488382, throughput 2.7368K wps
[Epoch 132 Batch 30/172] avg loss 0.00125156, throughput 2.83095K wps
[Epoch 132 Batch 60/172] avg loss 0.00103834, throughput 2.82169K wps
[Epoch 132 Batch 90/172] avg loss 0.00132278, throughput 2.81541K wps
[Epoch 132 Batch 120/172] avg loss 0.00113666, throughput 2.83122K wps
[Epoch 132 Batch 150/172] avg loss 0.00134771, throughput 2.70753K wps
Begin Testing...
[Epoch 132] train avg loss 0.00124638, dev acc 0.8616, dev avg loss 0.467145, throughput 2.80115K wps
[Epoch 133 Batch 30/172] avg loss 0.00115069, throughput 2.78501K wps
[Epoch 133 Batch 60/172] avg loss 0.0013076, throughput 2.75943K wps
[Epoch 133 Batch 90/172] avg loss 0.00120724, throughput 2.67978K wps
[Epoch 133 Batch 120/172] avg loss 0.00103458, throughput 2.70702K wps
[Epoch 133 Batch 150/172] avg loss 0.0013716, throughput 2.7542K wps
Begin Testing...
[Epoch 133] train avg loss 0.00122483, dev acc 0.8679, dev avg loss 0.472714, throughput 2.74453K wps
[Epoch 134 Batch 30/172] avg loss 0.00154008, throughput 2.65784K wps
[Epoch 134 Batch 60/172] avg loss 0.00125795, throughput 2.84725K wps
[Epoch 134 Batch 90/172] avg loss 0.001262, throughput 2.73035K wps
[Epoch 134 Batch 120/172] avg loss 0.00128945, throughput 2.76691K wps
[Epoch 134 Batch 150/172] avg loss 0.00117981, throughput 2.72957K wps
Begin Testing...
[Epoch 134] train avg loss 0.00125809, dev acc 0.8606, dev avg loss 0.464685, throughput 2.74251K wps
[Epoch 135 Batch 30/172] avg loss 0.00107784, throughput 2.76938K wps
[Epoch 135 Batch 60/172] avg loss 0.00126748, throughput 2.7518K wps
[Epoch 135 Batch 90/172] avg loss 0.00119683, throughput 2.74651K wps
[Epoch 135 Batch 120/172] avg loss 0.00120285, throughput 2.68186K wps
[Epoch 135 Batch 150/172] avg loss 0.00133784, throughput 2.75746K wps
Begin Testing...
[Epoch 135] train avg loss 0.00122909, dev acc 0.8669, dev avg loss 0.478792, throughput 2.74266K wps
[Epoch 136 Batch 30/172] avg loss 0.00111317, throughput 2.77967K wps
[Epoch 136 Batch 60/172] avg loss 0.00114261, throughput 2.7724K wps
[Epoch 136 Batch 90/172] avg loss 0.00126644, throughput 2.73444K wps
[Epoch 136 Batch 120/172] avg loss 0.00131283, throughput 2.77714K wps
[Epoch 136 Batch 150/172] avg loss 0.00126777, throughput 2.69816K wps
Begin Testing...
[Epoch 136] train avg loss 0.00120445, dev acc 0.8637, dev avg loss 0.478885, throughput 2.75924K wps
[Epoch 137 Batch 30/172] avg loss 0.0011804, throughput 2.84391K wps
[Epoch 137 Batch 60/172] avg loss 0.00116544, throughput 2.75581K wps
[Epoch 137 Batch 90/172] avg loss 0.00124037, throughput 2.70159K wps
[Epoch 137 Batch 120/172] avg loss 0.00134227, throughput 2.67566K wps
[Epoch 137 Batch 150/172] avg loss 0.00111462, throughput 2.76015K wps
Begin Testing...
[Epoch 137] train avg loss 0.00120251, dev acc 0.8648, dev avg loss 0.470982, throughput 2.74624K wps
[Epoch 138 Batch 30/172] avg loss 0.00105026, throughput 2.79274K wps
[Epoch 138 Batch 60/172] avg loss 0.00119695, throughput 2.72802K wps
[Epoch 138 Batch 90/172] avg loss 0.00118416, throughput 2.80048K wps
[Epoch 138 Batch 120/172] avg loss 0.00134083, throughput 2.76361K wps
[Epoch 138 Batch 150/172] avg loss 0.00110031, throughput 2.71986K wps
Begin Testing...
[Epoch 138] train avg loss 0.00117501, dev acc 0.8669, dev avg loss 0.500083, throughput 2.76451K wps
[Epoch 139 Batch 30/172] avg loss 0.000931767, throughput 2.78235K wps
[Epoch 139 Batch 60/172] avg loss 0.00116197, throughput 2.75133K wps
[Epoch 139 Batch 90/172] avg loss 0.00118057, throughput 2.77561K wps
[Epoch 139 Batch 120/172] avg loss 0.00116282, throughput 2.77555K wps
[Epoch 139 Batch 150/172] avg loss 0.00148159, throughput 2.70604K wps
Begin Testing...
[Epoch 139] train avg loss 0.00119357, dev acc 0.8616, dev avg loss 0.469714, throughput 2.77065K wps
[Epoch 140 Batch 30/172] avg loss 0.00107311, throughput 2.83563K wps
[Epoch 140 Batch 60/172] avg loss 0.0012934, throughput 2.7179K wps
[Epoch 140 Batch 90/172] avg loss 0.000976281, throughput 2.78126K wps
[Epoch 140 Batch 120/172] avg loss 0.00123194, throughput 2.78169K wps
[Epoch 140 Batch 150/172] avg loss 0.00125248, throughput 2.7842K wps
Begin Testing...
[Epoch 140] train avg loss 0.00116992, dev acc 0.8627, dev avg loss 0.474145, throughput 2.77987K wps
[Epoch 141 Batch 30/172] avg loss 0.00105949, throughput 2.84408K wps
[Epoch 141 Batch 60/172] avg loss 0.000980555, throughput 2.81715K wps
[Epoch 141 Batch 90/172] avg loss 0.00127437, throughput 2.79565K wps
[Epoch 141 Batch 120/172] avg loss 0.00104523, throughput 2.76748K wps
[Epoch 141 Batch 150/172] avg loss 0.00126297, throughput 2.7535K wps
Begin Testing...
[Epoch 141] train avg loss 0.00114252, dev acc 0.8616, dev avg loss 0.477848, throughput 2.77637K wps
[Epoch 142 Batch 30/172] avg loss 0.00112872, throughput 2.79689K wps
[Epoch 142 Batch 60/172] avg loss 0.00103112, throughput 2.67563K wps
[Epoch 142 Batch 90/172] avg loss 0.00140657, throughput 2.64039K wps
[Epoch 142 Batch 120/172] avg loss 0.00101819, throughput 2.62415K wps
[Epoch 142 Batch 150/172] avg loss 0.00130495, throughput 2.73915K wps
Begin Testing...
[Epoch 142] train avg loss 0.00117149, dev acc 0.8648, dev avg loss 0.524774, throughput 2.70218K wps
[Epoch 143 Batch 30/172] avg loss 0.0011299, throughput 2.7445K wps
[Epoch 143 Batch 60/172] avg loss 0.00117304, throughput 2.78935K wps
[Epoch 143 Batch 90/172] avg loss 0.00105273, throughput 2.73166K wps
[Epoch 143 Batch 120/172] avg loss 0.00121601, throughput 2.79462K wps
[Epoch 143 Batch 150/172] avg loss 0.00104664, throughput 2.70329K wps
Begin Testing...
[Epoch 143] train avg loss 0.00114743, dev acc 0.8606, dev avg loss 0.466609, throughput 2.75117K wps
[Epoch 144 Batch 30/172] avg loss 0.00120468, throughput 2.71974K wps
[Epoch 144 Batch 60/172] avg loss 0.00114483, throughput 2.76037K wps
[Epoch 144 Batch 90/172] avg loss 0.00112609, throughput 2.78652K wps
[Epoch 144 Batch 120/172] avg loss 0.00144175, throughput 2.67945K wps
[Epoch 144 Batch 150/172] avg loss 0.00113508, throughput 2.7304K wps
Begin Testing...
[Epoch 144] train avg loss 0.00119748, dev acc 0.8637, dev avg loss 0.478122, throughput 2.73645K wps
[Epoch 145 Batch 30/172] avg loss 0.000937737, throughput 2.61733K wps
[Epoch 145 Batch 60/172] avg loss 0.00114361, throughput 2.76096K wps
[Epoch 145 Batch 90/172] avg loss 0.0010304, throughput 2.76915K wps
[Epoch 145 Batch 120/172] avg loss 0.00128043, throughput 2.74685K wps
[Epoch 145 Batch 150/172] avg loss 0.00120153, throughput 2.71001K wps
Begin Testing...
[Epoch 145] train avg loss 0.00111037, dev acc 0.8627, dev avg loss 0.498438, throughput 2.72176K wps
[Epoch 146 Batch 30/172] avg loss 0.000934389, throughput 2.78032K wps
[Epoch 146 Batch 60/172] avg loss 0.00125938, throughput 2.76354K wps
[Epoch 146 Batch 90/172] avg loss 0.00107257, throughput 2.76577K wps
[Epoch 146 Batch 120/172] avg loss 0.00125222, throughput 2.77967K wps
[Epoch 146 Batch 150/172] avg loss 0.00112765, throughput 2.7997K wps
Begin Testing...
[Epoch 146] train avg loss 0.00112385, dev acc 0.8627, dev avg loss 0.484596, throughput 2.76187K wps
[Epoch 147 Batch 30/172] avg loss 0.00109981, throughput 2.79369K wps
[Epoch 147 Batch 60/172] avg loss 0.00120868, throughput 2.85212K wps
[Epoch 147 Batch 90/172] avg loss 0.00129511, throughput 2.80262K wps
[Epoch 147 Batch 120/172] avg loss 0.000888982, throughput 2.72428K wps
[Epoch 147 Batch 150/172] avg loss 0.00116262, throughput 2.80945K wps
Begin Testing...
[Epoch 147] train avg loss 0.00114989, dev acc 0.8512, dev avg loss 0.474581, throughput 2.80643K wps
[Epoch 148 Batch 30/172] avg loss 0.00118974, throughput 2.80319K wps
[Epoch 148 Batch 60/172] avg loss 0.000892228, throughput 2.82556K wps
[Epoch 148 Batch 90/172] avg loss 0.00102043, throughput 2.76412K wps
[Epoch 148 Batch 120/172] avg loss 0.00122248, throughput 2.77403K wps
[Epoch 148 Batch 150/172] avg loss 0.00129609, throughput 2.78111K wps
Begin Testing...
[Epoch 148] train avg loss 0.00110769, dev acc 0.8658, dev avg loss 0.493494, throughput 2.78673K wps
[Epoch 149 Batch 30/172] avg loss 0.00103408, throughput 2.62993K wps
[Epoch 149 Batch 60/172] avg loss 0.00108791, throughput 2.86103K wps
[Epoch 149 Batch 90/172] avg loss 0.000918551, throughput 2.77935K wps
[Epoch 149 Batch 120/172] avg loss 0.00111484, throughput 2.79765K wps
[Epoch 149 Batch 150/172] avg loss 0.00124812, throughput 2.817K wps
Begin Testing...
[Epoch 149] train avg loss 0.00110132, dev acc 0.8553, dev avg loss 0.480669, throughput 2.7796K wps
[Epoch 150 Batch 30/172] avg loss 0.000965184, throughput 2.78133K wps
[Epoch 150 Batch 60/172] avg loss 0.00121883, throughput 2.6764K wps
[Epoch 150 Batch 90/172] avg loss 0.000987356, throughput 2.80038K wps
[Epoch 150 Batch 120/172] avg loss 0.00114331, throughput 2.73682K wps
[Epoch 150 Batch 150/172] avg loss 0.00108193, throughput 2.818K wps
Begin Testing...
[Epoch 150] train avg loss 0.00108769, dev acc 0.8648, dev avg loss 0.507664, throughput 2.76303K wps
[Epoch 151 Batch 30/172] avg loss 0.00120267, throughput 2.75076K wps
[Epoch 151 Batch 60/172] avg loss 0.000974662, throughput 2.76261K wps
[Epoch 151 Batch 90/172] avg loss 0.000846486, throughput 2.81545K wps
[Epoch 151 Batch 120/172] avg loss 0.00119696, throughput 2.78493K wps
[Epoch 151 Batch 150/172] avg loss 0.00105429, throughput 2.79672K wps
Begin Testing...
[Epoch 151] train avg loss 0.00108366, dev acc 0.8606, dev avg loss 0.489022, throughput 2.78102K wps
[Epoch 152 Batch 30/172] avg loss 0.00112221, throughput 2.76572K wps
[Epoch 152 Batch 60/172] avg loss 0.00108643, throughput 2.79257K wps
[Epoch 152 Batch 90/172] avg loss 0.000994792, throughput 2.74739K wps
[Epoch 152 Batch 120/172] avg loss 0.000978995, throughput 2.74995K wps
[Epoch 152 Batch 150/172] avg loss 0.00119039, throughput 2.70591K wps
Begin Testing...
[Epoch 152] train avg loss 0.00107808, dev acc 0.8616, dev avg loss 0.506662, throughput 2.74075K wps
[Epoch 153 Batch 30/172] avg loss 0.00105799, throughput 2.75614K wps
[Epoch 153 Batch 60/172] avg loss 0.00100364, throughput 2.68369K wps
[Epoch 153 Batch 90/172] avg loss 0.00119946, throughput 2.73983K wps
[Epoch 153 Batch 120/172] avg loss 0.00119146, throughput 2.75801K wps
[Epoch 153 Batch 150/172] avg loss 0.00109494, throughput 2.76125K wps
Begin Testing...
[Epoch 153] train avg loss 0.00106406, dev acc 0.8637, dev avg loss 0.551416, throughput 2.74053K wps
[Epoch 154 Batch 30/172] avg loss 0.000942014, throughput 2.82527K wps
[Epoch 154 Batch 60/172] avg loss 0.000951942, throughput 2.67755K wps
[Epoch 154 Batch 90/172] avg loss 0.00124573, throughput 2.76083K wps
[Epoch 154 Batch 120/172] avg loss 0.000969255, throughput 2.74787K wps
[Epoch 154 Batch 150/172] avg loss 0.0011396, throughput 2.72536K wps
Begin Testing...
[Epoch 154] train avg loss 0.00108737, dev acc 0.8595, dev avg loss 0.509761, throughput 2.7322K wps
[Epoch 155 Batch 30/172] avg loss 0.00100321, throughput 2.81041K wps
[Epoch 155 Batch 60/172] avg loss 0.000904133, throughput 2.78166K wps
[Epoch 155 Batch 90/172] avg loss 0.00107956, throughput 2.79351K wps
[Epoch 155 Batch 120/172] avg loss 0.00114717, throughput 2.76296K wps
[Epoch 155 Batch 150/172] avg loss 0.00115236, throughput 2.75382K wps
Begin Testing...
[Epoch 155] train avg loss 0.00107891, dev acc 0.8585, dev avg loss 0.509302, throughput 2.77786K wps
[Epoch 156 Batch 30/172] avg loss 0.0009484, throughput 2.85046K wps
[Epoch 156 Batch 60/172] avg loss 0.00101657, throughput 2.80581K wps
[Epoch 156 Batch 90/172] avg loss 0.00111618, throughput 2.81629K wps
[Epoch 156 Batch 120/172] avg loss 0.00114863, throughput 2.80882K wps
[Epoch 156 Batch 150/172] avg loss 0.00109615, throughput 2.82011K wps
Begin Testing...
[Epoch 156] train avg loss 0.00106586, dev acc 0.8637, dev avg loss 0.5168, throughput 2.81699K wps
[Epoch 157 Batch 30/172] avg loss 0.000853207, throughput 2.85949K wps
[Epoch 157 Batch 60/172] avg loss 0.00120337, throughput 2.80014K wps
[Epoch 157 Batch 90/172] avg loss 0.00109518, throughput 2.69582K wps
[Epoch 157 Batch 120/172] avg loss 0.00103176, throughput 2.8354K wps
[Epoch 157 Batch 150/172] avg loss 0.00101086, throughput 2.80187K wps
Begin Testing...
[Epoch 157] train avg loss 0.00103174, dev acc 0.8606, dev avg loss 0.506729, throughput 2.79997K wps
[Epoch 158 Batch 30/172] avg loss 0.00106667, throughput 2.82374K wps
[Epoch 158 Batch 60/172] avg loss 0.000991872, throughput 2.82167K wps
[Epoch 158 Batch 90/172] avg loss 0.0010401, throughput 2.78511K wps
[Epoch 158 Batch 120/172] avg loss 0.00106576, throughput 2.74721K wps
[Epoch 158 Batch 150/172] avg loss 0.0011207, throughput 2.84966K wps
Begin Testing...
[Epoch 158] train avg loss 0.00106147, dev acc 0.8606, dev avg loss 0.526696, throughput 2.80563K wps
[Epoch 159 Batch 30/172] avg loss 0.0010007, throughput 2.75697K wps
[Epoch 159 Batch 60/172] avg loss 0.000890256, throughput 2.78645K wps
[Epoch 159 Batch 90/172] avg loss 0.000963639, throughput 2.78818K wps
[Epoch 159 Batch 120/172] avg loss 0.00128611, throughput 2.77585K wps
[Epoch 159 Batch 150/172] avg loss 0.000918196, throughput 2.81597K wps
Begin Testing...
[Epoch 159] train avg loss 0.00104455, dev acc 0.8627, dev avg loss 0.52079, throughput 2.78518K wps
[Epoch 160 Batch 30/172] avg loss 0.00088234, throughput 2.82474K wps
[Epoch 160 Batch 60/172] avg loss 0.00101749, throughput 2.79994K wps
[Epoch 160 Batch 90/172] avg loss 0.000862782, throughput 2.77111K wps
[Epoch 160 Batch 120/172] avg loss 0.00132739, throughput 2.7898K wps
[Epoch 160 Batch 150/172] avg loss 0.00118941, throughput 2.66277K wps
Begin Testing...
[Epoch 160] train avg loss 0.00108085, dev acc 0.8595, dev avg loss 0.504304, throughput 2.774K wps
[Epoch 161 Batch 30/172] avg loss 0.000865573, throughput 2.86498K wps
[Epoch 161 Batch 60/172] avg loss 0.000948601, throughput 2.78201K wps
[Epoch 161 Batch 90/172] avg loss 0.00114542, throughput 2.78905K wps
[Epoch 161 Batch 120/172] avg loss 0.00117897, throughput 2.80391K wps
[Epoch 161 Batch 150/172] avg loss 0.0010912, throughput 2.74818K wps
Begin Testing...
[Epoch 161] train avg loss 0.00101905, dev acc 0.8637, dev avg loss 0.523293, throughput 2.80332K wps
[Epoch 162 Batch 30/172] avg loss 0.00115971, throughput 2.86547K wps
[Epoch 162 Batch 60/172] avg loss 0.000873389, throughput 2.81585K wps
[Epoch 162 Batch 90/172] avg loss 0.00114328, throughput 2.7939K wps
[Epoch 162 Batch 120/172] avg loss 0.000985196, throughput 2.74842K wps
[Epoch 162 Batch 150/172] avg loss 0.00105661, throughput 2.78409K wps
Begin Testing...
[Epoch 162] train avg loss 0.00105271, dev acc 0.8627, dev avg loss 0.529079, throughput 2.80112K wps
[Epoch 163 Batch 30/172] avg loss 0.000752396, throughput 2.75443K wps
[Epoch 163 Batch 60/172] avg loss 0.000970293, throughput 2.7866K wps
[Epoch 163 Batch 90/172] avg loss 0.000907935, throughput 2.79384K wps
[Epoch 163 Batch 120/172] avg loss 0.00117592, throughput 2.8042K wps
[Epoch 163 Batch 150/172] avg loss 0.00133722, throughput 2.74066K wps
Begin Testing...
[Epoch 163] train avg loss 0.00104349, dev acc 0.8574, dev avg loss 0.514297, throughput 2.77772K wps
[Epoch 164 Batch 30/172] avg loss 0.000869439, throughput 2.83545K wps
[Epoch 164 Batch 60/172] avg loss 0.00105822, throughput 2.77132K wps
[Epoch 164 Batch 90/172] avg loss 0.000868877, throughput 2.77151K wps
[Epoch 164 Batch 120/172] avg loss 0.00108157, throughput 2.74662K wps
[Epoch 164 Batch 150/172] avg loss 0.000993453, throughput 2.76781K wps
Begin Testing...
[Epoch 164] train avg loss 0.000978538, dev acc 0.8564, dev avg loss 0.511266, throughput 2.77313K wps
[Epoch 165 Batch 30/172] avg loss 0.000779865, throughput 2.77642K wps
[Epoch 165 Batch 60/172] avg loss 0.00114254, throughput 2.79764K wps
[Epoch 165 Batch 90/172] avg loss 0.000846766, throughput 2.77396K wps
[Epoch 165 Batch 120/172] avg loss 0.000927087, throughput 2.64429K wps
[Epoch 165 Batch 150/172] avg loss 0.00109746, throughput 2.64167K wps
Begin Testing...
[Epoch 165] train avg loss 0.000996743, dev acc 0.8595, dev avg loss 0.535949, throughput 2.72539K wps
[Epoch 166 Batch 30/172] avg loss 0.000941604, throughput 2.79907K wps
[Epoch 166 Batch 60/172] avg loss 0.00107765, throughput 2.77149K wps
[Epoch 166 Batch 90/172] avg loss 0.000941051, throughput 2.81931K wps
[Epoch 166 Batch 120/172] avg loss 0.000946808, throughput 2.83024K wps
[Epoch 166 Batch 150/172] avg loss 0.00105762, throughput 2.76468K wps
Begin Testing...
[Epoch 166] train avg loss 0.00100368, dev acc 0.8574, dev avg loss 0.53818, throughput 2.7914K wps
[Epoch 167 Batch 30/172] avg loss 0.000917767, throughput 2.8297K wps
[Epoch 167 Batch 60/172] avg loss 0.00104629, throughput 2.81933K wps
[Epoch 167 Batch 90/172] avg loss 0.00102473, throughput 2.77921K wps
[Epoch 167 Batch 120/172] avg loss 0.000788869, throughput 2.80375K wps
[Epoch 167 Batch 150/172] avg loss 0.00117948, throughput 2.77289K wps
Begin Testing...
[Epoch 167] train avg loss 0.00100727, dev acc 0.8637, dev avg loss 0.557599, throughput 2.79897K wps
[Epoch 168 Batch 30/172] avg loss 0.000852139, throughput 2.78272K wps
[Epoch 168 Batch 60/172] avg loss 0.000807359, throughput 2.68801K wps
[Epoch 168 Batch 90/172] avg loss 0.00100867, throughput 2.78903K wps
[Epoch 168 Batch 120/172] avg loss 0.00106903, throughput 2.77256K wps
[Epoch 168 Batch 150/172] avg loss 0.00118202, throughput 2.64061K wps
Begin Testing...
[Epoch 168] train avg loss 0.00096495, dev acc 0.8637, dev avg loss 0.559119, throughput 2.74049K wps
[Epoch 169 Batch 30/172] avg loss 0.000700178, throughput 2.77492K wps
[Epoch 169 Batch 60/172] avg loss 0.00125569, throughput 2.75539K wps
[Epoch 169 Batch 90/172] avg loss 0.00100031, throughput 2.73438K wps
[Epoch 169 Batch 120/172] avg loss 0.000848816, throughput 2.75858K wps
[Epoch 169 Batch 150/172] avg loss 0.00114246, throughput 2.76002K wps
Begin Testing...
[Epoch 169] train avg loss 0.000980991, dev acc 0.8595, dev avg loss 0.521881, throughput 2.75658K wps
[Epoch 170 Batch 30/172] avg loss 0.000938659, throughput 2.73538K wps
[Epoch 170 Batch 60/172] avg loss 0.000833077, throughput 2.65112K wps
[Epoch 170 Batch 90/172] avg loss 0.000951932, throughput 2.60962K wps
[Epoch 170 Batch 120/172] avg loss 0.0011477, throughput 2.78592K wps
[Epoch 170 Batch 150/172] avg loss 0.000938139, throughput 2.80552K wps
Begin Testing...
[Epoch 170] train avg loss 0.000976687, dev acc 0.8564, dev avg loss 0.518298, throughput 2.71762K wps
[Epoch 171 Batch 30/172] avg loss 0.000811109, throughput 2.87904K wps
[Epoch 171 Batch 60/172] avg loss 0.00089394, throughput 2.74399K wps
[Epoch 171 Batch 90/172] avg loss 0.000923004, throughput 2.78739K wps
[Epoch 171 Batch 120/172] avg loss 0.00110631, throughput 2.77061K wps
[Epoch 171 Batch 150/172] avg loss 0.00105074, throughput 2.77167K wps
Begin Testing...
[Epoch 171] train avg loss 0.000944067, dev acc 0.8543, dev avg loss 0.522346, throughput 2.76965K wps
[Epoch 172 Batch 30/172] avg loss 0.001022, throughput 2.7782K wps
[Epoch 172 Batch 60/172] avg loss 0.00108753, throughput 2.79445K wps
[Epoch 172 Batch 90/172] avg loss 0.000975494, throughput 2.76812K wps
[Epoch 172 Batch 120/172] avg loss 0.000756656, throughput 2.76726K wps
[Epoch 172 Batch 150/172] avg loss 0.000784355, throughput 2.75306K wps
Begin Testing...
[Epoch 172] train avg loss 0.000963056, dev acc 0.8616, dev avg loss 0.569028, throughput 2.77671K wps
[Epoch 173 Batch 30/172] avg loss 0.000962659, throughput 2.80263K wps
[Epoch 173 Batch 60/172] avg loss 0.000834368, throughput 2.70382K wps
[Epoch 173 Batch 90/172] avg loss 0.000929873, throughput 2.73873K wps
[Epoch 173 Batch 120/172] avg loss 0.000945673, throughput 2.76499K wps
[Epoch 173 Batch 150/172] avg loss 0.00125699, throughput 2.76022K wps
Begin Testing...
[Epoch 173] train avg loss 0.000987751, dev acc 0.8616, dev avg loss 0.559382, throughput 2.75309K wps
[Epoch 174 Batch 30/172] avg loss 0.000982446, throughput 2.77097K wps
[Epoch 174 Batch 60/172] avg loss 0.000881761, throughput 2.76117K wps
[Epoch 174 Batch 90/172] avg loss 0.00095442, throughput 2.75954K wps
[Epoch 174 Batch 120/172] avg loss 0.000897926, throughput 2.79037K wps
[Epoch 174 Batch 150/172] avg loss 0.00107953, throughput 2.77942K wps
Begin Testing...
[Epoch 174] train avg loss 0.00094656, dev acc 0.8585, dev avg loss 0.531566, throughput 2.77469K wps
[Epoch 175 Batch 30/172] avg loss 0.000807074, throughput 2.82302K wps
[Epoch 175 Batch 60/172] avg loss 0.000911487, throughput 2.75915K wps
[Epoch 175 Batch 90/172] avg loss 0.00100717, throughput 2.80099K wps
[Epoch 175 Batch 120/172] avg loss 0.00101458, throughput 2.75107K wps
[Epoch 175 Batch 150/172] avg loss 0.000990125, throughput 2.80595K wps
Begin Testing...
[Epoch 175] train avg loss 0.000950518, dev acc 0.8595, dev avg loss 0.54119, throughput 2.78403K wps
[Epoch 176 Batch 30/172] avg loss 0.000813895, throughput 2.74024K wps
[Epoch 176 Batch 60/172] avg loss 0.000895701, throughput 2.80655K wps
[Epoch 176 Batch 90/172] avg loss 0.00100915, throughput 2.78868K wps
[Epoch 176 Batch 120/172] avg loss 0.0011897, throughput 2.69936K wps
[Epoch 176 Batch 150/172] avg loss 0.000725248, throughput 2.79678K wps
Begin Testing...
[Epoch 176] train avg loss 0.000941, dev acc 0.8574, dev avg loss 0.529814, throughput 2.76569K wps
[Epoch 177 Batch 30/172] avg loss 0.000831326, throughput 2.73699K wps
[Epoch 177 Batch 60/172] avg loss 0.000862821, throughput 2.78341K wps
[Epoch 177 Batch 90/172] avg loss 0.000979234, throughput 2.71582K wps
[Epoch 177 Batch 120/172] avg loss 0.00106814, throughput 2.65195K wps
[Epoch 177 Batch 150/172] avg loss 0.00108773, throughput 2.71619K wps
Begin Testing...
[Epoch 177] train avg loss 0.000944659, dev acc 0.8606, dev avg loss 0.571162, throughput 2.72728K wps
[Epoch 178 Batch 30/172] avg loss 0.000825057, throughput 2.70456K wps
[Epoch 178 Batch 60/172] avg loss 0.000940621, throughput 2.68534K wps
[Epoch 178 Batch 90/172] avg loss 0.00103907, throughput 2.84018K wps
[Epoch 178 Batch 120/172] avg loss 0.00106503, throughput 2.75823K wps
[Epoch 178 Batch 150/172] avg loss 0.00083369, throughput 2.57727K wps
Begin Testing...
[Epoch 178] train avg loss 0.000932406, dev acc 0.8585, dev avg loss 0.568385, throughput 2.72341K wps
[Epoch 179 Batch 30/172] avg loss 0.00095282, throughput 2.77727K wps
[Epoch 179 Batch 60/172] avg loss 0.000934508, throughput 2.75645K wps
[Epoch 179 Batch 90/172] avg loss 0.00108686, throughput 2.70905K wps
[Epoch 179 Batch 120/172] avg loss 0.000818604, throughput 2.77403K wps
[Epoch 179 Batch 150/172] avg loss 0.001149, throughput 2.75282K wps
Begin Testing...
[Epoch 179] train avg loss 0.000946572, dev acc 0.8543, dev avg loss 0.533075, throughput 2.73827K wps
[Epoch 180 Batch 30/172] avg loss 0.000836081, throughput 2.79284K wps
[Epoch 180 Batch 60/172] avg loss 0.000882444, throughput 2.80618K wps
[Epoch 180 Batch 90/172] avg loss 0.000910226, throughput 2.82701K wps
[Epoch 180 Batch 120/172] avg loss 0.00102508, throughput 2.79696K wps
[Epoch 180 Batch 150/172] avg loss 0.000857523, throughput 2.82154K wps
Begin Testing...
[Epoch 180] train avg loss 0.000904522, dev acc 0.8616, dev avg loss 0.547433, throughput 2.81051K wps
[Epoch 181 Batch 30/172] avg loss 0.000816359, throughput 2.85612K wps
[Epoch 181 Batch 60/172] avg loss 0.000953771, throughput 2.76019K wps
[Epoch 181 Batch 90/172] avg loss 0.00087123, throughput 2.67152K wps
[Epoch 181 Batch 120/172] avg loss 0.000950504, throughput 2.82928K wps
[Epoch 181 Batch 150/172] avg loss 0.000804935, throughput 2.68068K wps
Begin Testing...
[Epoch 181] train avg loss 0.000910992, dev acc 0.8553, dev avg loss 0.554276, throughput 2.75658K wps
[Epoch 182 Batch 30/172] avg loss 0.00099786, throughput 2.73228K wps
[Epoch 182 Batch 60/172] avg loss 0.000862715, throughput 2.80297K wps
[Epoch 182 Batch 90/172] avg loss 0.00080288, throughput 2.77883K wps
[Epoch 182 Batch 120/172] avg loss 0.000890398, throughput 2.78118K wps
[Epoch 182 Batch 150/172] avg loss 0.000837645, throughput 2.74777K wps
Begin Testing...
[Epoch 182] train avg loss 0.000916953, dev acc 0.8585, dev avg loss 0.55172, throughput 2.76312K wps
[Epoch 183 Batch 30/172] avg loss 0.00075779, throughput 2.77859K wps
[Epoch 183 Batch 60/172] avg loss 0.000793193, throughput 2.67977K wps
[Epoch 183 Batch 90/172] avg loss 0.000924918, throughput 2.68423K wps
[Epoch 183 Batch 120/172] avg loss 0.00105112, throughput 2.78622K wps
[Epoch 183 Batch 150/172] avg loss 0.00087742, throughput 2.77948K wps
Begin Testing...
[Epoch 183] train avg loss 0.000923188, dev acc 0.8532, dev avg loss 0.530477, throughput 2.73685K wps
[Epoch 184 Batch 30/172] avg loss 0.000861903, throughput 2.78576K wps
[Epoch 184 Batch 60/172] avg loss 0.000908813, throughput 2.67654K wps
[Epoch 184 Batch 90/172] avg loss 0.000953649, throughput 2.83292K wps
[Epoch 184 Batch 120/172] avg loss 0.000921233, throughput 2.75807K wps
[Epoch 184 Batch 150/172] avg loss 0.000812052, throughput 2.7091K wps
Begin Testing...
[Epoch 184] train avg loss 0.000907469, dev acc 0.8585, dev avg loss 0.54433, throughput 2.75983K wps
[Epoch 185 Batch 30/172] avg loss 0.000916974, throughput 2.74833K wps
[Epoch 185 Batch 60/172] avg loss 0.000786332, throughput 2.79411K wps
[Epoch 185 Batch 90/172] avg loss 0.000961467, throughput 2.78659K wps
[Epoch 185 Batch 120/172] avg loss 0.000852124, throughput 2.78041K wps
[Epoch 185 Batch 150/172] avg loss 0.000870403, throughput 2.80395K wps
Begin Testing...
[Epoch 185] train avg loss 0.000921351, dev acc 0.8585, dev avg loss 0.54519, throughput 2.78333K wps
[Epoch 186 Batch 30/172] avg loss 0.000990951, throughput 2.84509K wps
[Epoch 186 Batch 60/172] avg loss 0.000946962, throughput 2.79728K wps
[Epoch 186 Batch 90/172] avg loss 0.000893484, throughput 2.74889K wps
[Epoch 186 Batch 120/172] avg loss 0.000831503, throughput 2.81963K wps
[Epoch 186 Batch 150/172] avg loss 0.000813647, throughput 2.73988K wps
Begin Testing...
[Epoch 186] train avg loss 0.000877706, dev acc 0.8627, dev avg loss 0.625405, throughput 2.7736K wps
[Epoch 187 Batch 30/172] avg loss 0.000766145, throughput 2.77369K wps
[Epoch 187 Batch 60/172] avg loss 0.00105204, throughput 2.73962K wps
[Epoch 187 Batch 90/172] avg loss 0.000988674, throughput 2.72291K wps
[Epoch 187 Batch 120/172] avg loss 0.000797886, throughput 2.72215K wps
[Epoch 187 Batch 150/172] avg loss 0.000850038, throughput 2.78282K wps
Begin Testing...
[Epoch 187] train avg loss 0.000886488, dev acc 0.8595, dev avg loss 0.543757, throughput 2.75526K wps
[Epoch 188 Batch 30/172] avg loss 0.000772567, throughput 2.85256K wps
[Epoch 188 Batch 60/172] avg loss 0.000883711, throughput 2.73311K wps
[Epoch 188 Batch 90/172] avg loss 0.000931352, throughput 2.7509K wps
[Epoch 188 Batch 120/172] avg loss 0.000875128, throughput 2.80376K wps
[Epoch 188 Batch 150/172] avg loss 0.000828918, throughput 2.83085K wps
Begin Testing...
[Epoch 188] train avg loss 0.000890722, dev acc 0.8658, dev avg loss 0.588008, throughput 2.795K wps
[Epoch 189 Batch 30/172] avg loss 0.000782562, throughput 2.68542K wps
[Epoch 189 Batch 60/172] avg loss 0.000859125, throughput 2.78043K wps
[Epoch 189 Batch 90/172] avg loss 0.000663137, throughput 2.70517K wps
[Epoch 189 Batch 120/172] avg loss 0.000873202, throughput 2.8308K wps
[Epoch 189 Batch 150/172] avg loss 0.00113188, throughput 2.80511K wps
Begin Testing...
[Epoch 189] train avg loss 0.000861916, dev acc 0.8658, dev avg loss 0.574295, throughput 2.76396K wps
[Epoch 190 Batch 30/172] avg loss 0.000799824, throughput 2.87276K wps
[Epoch 190 Batch 60/172] avg loss 0.000721802, throughput 2.75532K wps
[Epoch 190 Batch 90/172] avg loss 0.000916817, throughput 2.7681K wps
[Epoch 190 Batch 120/172] avg loss 0.000960583, throughput 2.76736K wps
[Epoch 190 Batch 150/172] avg loss 0.000935623, throughput 2.78741K wps
Begin Testing...
[Epoch 190] train avg loss 0.000881267, dev acc 0.8553, dev avg loss 0.554098, throughput 2.78651K wps
[Epoch 191 Batch 30/172] avg loss 0.000733624, throughput 2.82321K wps
[Epoch 191 Batch 60/172] avg loss 0.000775376, throughput 2.62187K wps
[Epoch 191 Batch 90/172] avg loss 0.000694212, throughput 2.77612K wps
[Epoch 191 Batch 120/172] avg loss 0.000909683, throughput 2.62674K wps
[Epoch 191 Batch 150/172] avg loss 0.00101691, throughput 2.81301K wps
Begin Testing...
[Epoch 191] train avg loss 0.00084032, dev acc 0.8595, dev avg loss 0.585912, throughput 2.73563K wps
[Epoch 192 Batch 30/172] avg loss 0.00069402, throughput 2.7355K wps
[Epoch 192 Batch 60/172] avg loss 0.000973095, throughput 2.74502K wps
[Epoch 192 Batch 90/172] avg loss 0.000738252, throughput 2.7018K wps
[Epoch 192 Batch 120/172] avg loss 0.000912504, throughput 2.7732K wps
[Epoch 192 Batch 150/172] avg loss 0.000736305, throughput 2.76788K wps
Begin Testing...
[Epoch 192] train avg loss 0.000838792, dev acc 0.8616, dev avg loss 0.58302, throughput 2.73276K wps
[Epoch 193 Batch 30/172] avg loss 0.000907716, throughput 2.81986K wps
[Epoch 193 Batch 60/172] avg loss 0.000775616, throughput 2.68477K wps
[Epoch 193 Batch 90/172] avg loss 0.000915416, throughput 2.81558K wps
[Epoch 193 Batch 120/172] avg loss 0.000760827, throughput 2.71237K wps
[Epoch 193 Batch 150/172] avg loss 0.00092635, throughput 2.82799K wps
Begin Testing...
[Epoch 193] train avg loss 0.00084878, dev acc 0.8595, dev avg loss 0.569651, throughput 2.77593K wps
[Epoch 194 Batch 30/172] avg loss 0.000740477, throughput 2.79423K wps
[Epoch 194 Batch 60/172] avg loss 0.00103763, throughput 2.63272K wps
[Epoch 194 Batch 90/172] avg loss 0.000682177, throughput 2.76207K wps
[Epoch 194 Batch 120/172] avg loss 0.000760008, throughput 2.76089K wps
[Epoch 194 Batch 150/172] avg loss 0.000797904, throughput 2.78594K wps
Begin Testing...
[Epoch 194] train avg loss 0.000836925, dev acc 0.8616, dev avg loss 0.587793, throughput 2.75235K wps
[Epoch 195 Batch 30/172] avg loss 0.000871203, throughput 2.8668K wps
[Epoch 195 Batch 60/172] avg loss 0.000793901, throughput 2.73848K wps
[Epoch 195 Batch 90/172] avg loss 0.000865376, throughput 2.69886K wps
[Epoch 195 Batch 120/172] avg loss 0.000762619, throughput 2.73919K wps
[Epoch 195 Batch 150/172] avg loss 0.00081019, throughput 2.68369K wps
Begin Testing...
[Epoch 195] train avg loss 0.000846456, dev acc 0.8553, dev avg loss 0.555604, throughput 2.74125K wps
[Epoch 196 Batch 30/172] avg loss 0.00072615, throughput 2.74568K wps
[Epoch 196 Batch 60/172] avg loss 0.00109047, throughput 2.72023K wps
[Epoch 196 Batch 90/172] avg loss 0.000775995, throughput 2.78101K wps
[Epoch 196 Batch 120/172] avg loss 0.000918599, throughput 2.6786K wps
[Epoch 196 Batch 150/172] avg loss 0.000676152, throughput 2.76983K wps
Begin Testing...
[Epoch 196] train avg loss 0.00082827, dev acc 0.8595, dev avg loss 0.572257, throughput 2.72318K wps
[Epoch 197 Batch 30/172] avg loss 0.000702107, throughput 2.78899K wps
[Epoch 197 Batch 60/172] avg loss 0.000833803, throughput 2.76854K wps
[Epoch 197 Batch 90/172] avg loss 0.00103564, throughput 2.67809K wps
[Epoch 197 Batch 120/172] avg loss 0.000940762, throughput 2.76K wps
[Epoch 197 Batch 150/172] avg loss 0.000871611, throughput 2.68543K wps
Begin Testing...
[Epoch 197] train avg loss 0.000848355, dev acc 0.8564, dev avg loss 0.565254, throughput 2.73703K wps
[Epoch 198 Batch 30/172] avg loss 0.000740981, throughput 2.70917K wps
[Epoch 198 Batch 60/172] avg loss 0.000985854, throughput 2.74518K wps
[Epoch 198 Batch 90/172] avg loss 0.000608375, throughput 2.66374K wps
[Epoch 198 Batch 120/172] avg loss 0.000886726, throughput 2.79052K wps
[Epoch 198 Batch 150/172] avg loss 0.000962206, throughput 2.70507K wps
Begin Testing...
[Epoch 198] train avg loss 0.000830097, dev acc 0.8574, dev avg loss 0.58082, throughput 2.73251K wps
[Epoch 199 Batch 30/172] avg loss 0.000995874, throughput 2.81498K wps
[Epoch 199 Batch 60/172] avg loss 0.000883311, throughput 2.69166K wps
[Epoch 199 Batch 90/172] avg loss 0.000714063, throughput 2.79521K wps
[Epoch 199 Batch 120/172] avg loss 0.000774207, throughput 2.75575K wps
[Epoch 199 Batch 150/172] avg loss 0.000823143, throughput 2.813K wps
Begin Testing...
[Epoch 199] train avg loss 0.000845947, dev acc 0.8564, dev avg loss 0.558647, throughput 2.77454K wps
Test loss 0.43285, test acc 0.8632
Total time cost 477.40s
[Epoch 0 Batch 30/172] avg loss 0.0130318, throughput 2.53654K wps
[Epoch 0 Batch 60/172] avg loss 0.012723, throughput 2.80975K wps
[Epoch 0 Batch 90/172] avg loss 0.0124958, throughput 2.74629K wps
[Epoch 0 Batch 120/172] avg loss 0.012199, throughput 2.75841K wps
[Epoch 0 Batch 150/172] avg loss 0.0121218, throughput 2.7964K wps
Begin Testing...
[Epoch 0] train avg loss 0.0125479, dev acc 0.7044, dev avg loss 0.603158, throughput 2.74251K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0123074, throughput 2.85173K wps
[Epoch 1 Batch 60/172] avg loss 0.0121898, throughput 2.80214K wps
[Epoch 1 Batch 90/172] avg loss 0.0123258, throughput 2.77924K wps
[Epoch 1 Batch 120/172] avg loss 0.0122273, throughput 2.77496K wps
[Epoch 1 Batch 150/172] avg loss 0.0125604, throughput 2.77444K wps
Begin Testing...
[Epoch 1] train avg loss 0.0123876, dev acc 0.7044, dev avg loss 0.60621, throughput 2.7957K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0123828, throughput 2.86488K wps
[Epoch 2 Batch 60/172] avg loss 0.0124145, throughput 2.79776K wps
[Epoch 2 Batch 90/172] avg loss 0.0121499, throughput 2.78371K wps
[Epoch 2 Batch 120/172] avg loss 0.0123536, throughput 2.72671K wps
[Epoch 2 Batch 150/172] avg loss 0.012285, throughput 2.77451K wps
Begin Testing...
[Epoch 2] train avg loss 0.0123732, dev acc 0.7044, dev avg loss 0.602932, throughput 2.78476K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0126286, throughput 2.76166K wps
[Epoch 3 Batch 60/172] avg loss 0.0121688, throughput 2.78857K wps
[Epoch 3 Batch 90/172] avg loss 0.0123288, throughput 2.76665K wps
[Epoch 3 Batch 120/172] avg loss 0.0122264, throughput 2.76753K wps
[Epoch 3 Batch 150/172] avg loss 0.0121185, throughput 2.76646K wps
Begin Testing...
[Epoch 3] train avg loss 0.0123243, dev acc 0.7044, dev avg loss 0.599901, throughput 2.7748K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0124816, throughput 2.87302K wps
[Epoch 4 Batch 60/172] avg loss 0.0121898, throughput 2.78533K wps
[Epoch 4 Batch 90/172] avg loss 0.0121783, throughput 2.78467K wps
[Epoch 4 Batch 120/172] avg loss 0.0121864, throughput 2.78794K wps
[Epoch 4 Batch 150/172] avg loss 0.0124458, throughput 2.77978K wps
Begin Testing...
[Epoch 4] train avg loss 0.0122926, dev acc 0.7044, dev avg loss 0.59798, throughput 2.80255K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0123552, throughput 2.77514K wps
[Epoch 5 Batch 60/172] avg loss 0.0122941, throughput 2.73759K wps
[Epoch 5 Batch 90/172] avg loss 0.0124344, throughput 2.73713K wps
[Epoch 5 Batch 120/172] avg loss 0.0121884, throughput 2.72808K wps
[Epoch 5 Batch 150/172] avg loss 0.0120046, throughput 2.88128K wps
Begin Testing...
[Epoch 5] train avg loss 0.0122775, dev acc 0.7044, dev avg loss 0.597519, throughput 2.76019K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0121889, throughput 2.86182K wps
[Epoch 6 Batch 60/172] avg loss 0.0123952, throughput 2.7938K wps
[Epoch 6 Batch 90/172] avg loss 0.012173, throughput 2.77412K wps
[Epoch 6 Batch 120/172] avg loss 0.012123, throughput 2.76907K wps
[Epoch 6 Batch 150/172] avg loss 0.0122545, throughput 2.78441K wps
Begin Testing...
[Epoch 6] train avg loss 0.0122506, dev acc 0.7044, dev avg loss 0.595978, throughput 2.79795K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0120082, throughput 2.81035K wps
[Epoch 7 Batch 60/172] avg loss 0.0120749, throughput 2.79337K wps
[Epoch 7 Batch 90/172] avg loss 0.0124247, throughput 2.6997K wps
[Epoch 7 Batch 120/172] avg loss 0.0123193, throughput 2.74964K wps
[Epoch 7 Batch 150/172] avg loss 0.0120923, throughput 2.68179K wps
Begin Testing...
[Epoch 7] train avg loss 0.0122165, dev acc 0.7044, dev avg loss 0.59603, throughput 2.75451K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.0118238, throughput 2.80292K wps
[Epoch 8 Batch 60/172] avg loss 0.0125306, throughput 2.71815K wps
[Epoch 8 Batch 90/172] avg loss 0.0124714, throughput 2.80555K wps
[Epoch 8 Batch 120/172] avg loss 0.0118106, throughput 2.69215K wps
[Epoch 8 Batch 150/172] avg loss 0.0121778, throughput 2.75756K wps
Begin Testing...
[Epoch 8] train avg loss 0.0121905, dev acc 0.7044, dev avg loss 0.594429, throughput 2.75496K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.0120847, throughput 2.79425K wps
[Epoch 9 Batch 60/172] avg loss 0.0121795, throughput 2.80913K wps
[Epoch 9 Batch 90/172] avg loss 0.0124528, throughput 2.78525K wps
[Epoch 9 Batch 120/172] avg loss 0.0121981, throughput 2.77694K wps
[Epoch 9 Batch 150/172] avg loss 0.0119764, throughput 2.75556K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121662, dev acc 0.7044, dev avg loss 0.59186, throughput 2.78299K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.0120391, throughput 2.88219K wps
[Epoch 10 Batch 60/172] avg loss 0.0118634, throughput 2.82077K wps
[Epoch 10 Batch 90/172] avg loss 0.0120985, throughput 2.81945K wps
[Epoch 10 Batch 120/172] avg loss 0.0122045, throughput 2.79164K wps
[Epoch 10 Batch 150/172] avg loss 0.0122421, throughput 2.79489K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121422, dev acc 0.7044, dev avg loss 0.593256, throughput 2.81834K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.0117819, throughput 2.81443K wps
[Epoch 11 Batch 60/172] avg loss 0.0122522, throughput 2.78486K wps
[Epoch 11 Batch 90/172] avg loss 0.0120446, throughput 2.78858K wps
[Epoch 11 Batch 120/172] avg loss 0.0121763, throughput 2.75738K wps
[Epoch 11 Batch 150/172] avg loss 0.0121101, throughput 2.78637K wps
Begin Testing...
[Epoch 11] train avg loss 0.0121147, dev acc 0.7044, dev avg loss 0.590425, throughput 2.78939K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.0120599, throughput 2.79036K wps
[Epoch 12 Batch 60/172] avg loss 0.0122144, throughput 2.75551K wps
[Epoch 12 Batch 90/172] avg loss 0.011763, throughput 2.77306K wps
[Epoch 12 Batch 120/172] avg loss 0.0119891, throughput 2.76802K wps
[Epoch 12 Batch 150/172] avg loss 0.0121522, throughput 2.75673K wps
Begin Testing...
[Epoch 12] train avg loss 0.0120836, dev acc 0.7044, dev avg loss 0.589237, throughput 2.76743K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.0120385, throughput 2.65556K wps
[Epoch 13 Batch 60/172] avg loss 0.0121853, throughput 2.79861K wps
[Epoch 13 Batch 90/172] avg loss 0.0123253, throughput 2.72823K wps
[Epoch 13 Batch 120/172] avg loss 0.0120593, throughput 2.75025K wps
[Epoch 13 Batch 150/172] avg loss 0.0120565, throughput 2.80963K wps
Begin Testing...
[Epoch 13] train avg loss 0.0120505, dev acc 0.7044, dev avg loss 0.586021, throughput 2.75326K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.0120149, throughput 2.824K wps
[Epoch 14 Batch 60/172] avg loss 0.011854, throughput 2.78922K wps
[Epoch 14 Batch 90/172] avg loss 0.012022, throughput 2.79445K wps
[Epoch 14 Batch 120/172] avg loss 0.0120762, throughput 2.79474K wps
[Epoch 14 Batch 150/172] avg loss 0.012139, throughput 2.80387K wps
Begin Testing...
[Epoch 14] train avg loss 0.0120138, dev acc 0.7044, dev avg loss 0.58507, throughput 2.79078K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.012136, throughput 2.86689K wps
[Epoch 15 Batch 60/172] avg loss 0.0119815, throughput 2.69115K wps
[Epoch 15 Batch 90/172] avg loss 0.0117505, throughput 2.80043K wps
[Epoch 15 Batch 120/172] avg loss 0.0118279, throughput 2.75772K wps
[Epoch 15 Batch 150/172] avg loss 0.0121027, throughput 2.67776K wps
Begin Testing...
[Epoch 15] train avg loss 0.0119644, dev acc 0.7044, dev avg loss 0.583444, throughput 2.7554K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.0118157, throughput 2.8058K wps
[Epoch 16 Batch 60/172] avg loss 0.012307, throughput 2.77681K wps
[Epoch 16 Batch 90/172] avg loss 0.0117514, throughput 2.78646K wps
[Epoch 16 Batch 120/172] avg loss 0.0116761, throughput 2.67546K wps
[Epoch 16 Batch 150/172] avg loss 0.012005, throughput 2.75094K wps
Begin Testing...
[Epoch 16] train avg loss 0.0119178, dev acc 0.7044, dev avg loss 0.581485, throughput 2.73602K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.0118279, throughput 2.77193K wps
[Epoch 17 Batch 60/172] avg loss 0.0117031, throughput 2.69202K wps
[Epoch 17 Batch 90/172] avg loss 0.011839, throughput 2.68063K wps
[Epoch 17 Batch 120/172] avg loss 0.0120684, throughput 2.70455K wps
[Epoch 17 Batch 150/172] avg loss 0.01175, throughput 2.7752K wps
Begin Testing...
[Epoch 17] train avg loss 0.0118543, dev acc 0.7044, dev avg loss 0.578854, throughput 2.70765K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.0117893, throughput 2.7486K wps
[Epoch 18 Batch 60/172] avg loss 0.01179, throughput 2.69778K wps
[Epoch 18 Batch 90/172] avg loss 0.0117752, throughput 2.73951K wps
[Epoch 18 Batch 120/172] avg loss 0.0117845, throughput 2.77527K wps
[Epoch 18 Batch 150/172] avg loss 0.0116596, throughput 2.77835K wps
Begin Testing...
[Epoch 18] train avg loss 0.0117976, dev acc 0.7044, dev avg loss 0.576266, throughput 2.75238K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.0117193, throughput 2.83194K wps
[Epoch 19 Batch 60/172] avg loss 0.0116171, throughput 2.75084K wps
[Epoch 19 Batch 90/172] avg loss 0.012084, throughput 2.76188K wps
[Epoch 19 Batch 120/172] avg loss 0.0113454, throughput 2.71992K wps
[Epoch 19 Batch 150/172] avg loss 0.011688, throughput 2.77598K wps
Begin Testing...
[Epoch 19] train avg loss 0.0117219, dev acc 0.7044, dev avg loss 0.573895, throughput 2.77328K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.0117525, throughput 2.82024K wps
[Epoch 20 Batch 60/172] avg loss 0.0117848, throughput 2.64572K wps
[Epoch 20 Batch 90/172] avg loss 0.0116221, throughput 2.86058K wps
[Epoch 20 Batch 120/172] avg loss 0.011516, throughput 2.78404K wps
[Epoch 20 Batch 150/172] avg loss 0.0115308, throughput 2.80015K wps
Begin Testing...
[Epoch 20] train avg loss 0.0116513, dev acc 0.7044, dev avg loss 0.569632, throughput 2.78098K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.011671, throughput 2.67419K wps
[Epoch 21 Batch 60/172] avg loss 0.011482, throughput 2.71963K wps
[Epoch 21 Batch 90/172] avg loss 0.0112229, throughput 2.69292K wps
[Epoch 21 Batch 120/172] avg loss 0.0118976, throughput 2.78138K wps
[Epoch 21 Batch 150/172] avg loss 0.0113208, throughput 2.73533K wps
Begin Testing...
[Epoch 21] train avg loss 0.0115623, dev acc 0.7055, dev avg loss 0.566564, throughput 2.73639K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.0113629, throughput 2.80542K wps
[Epoch 22 Batch 60/172] avg loss 0.0115613, throughput 2.76463K wps
[Epoch 22 Batch 90/172] avg loss 0.0114238, throughput 2.72951K wps
[Epoch 22 Batch 120/172] avg loss 0.011516, throughput 2.81926K wps
[Epoch 22 Batch 150/172] avg loss 0.0114715, throughput 2.80336K wps
Begin Testing...
[Epoch 22] train avg loss 0.0114654, dev acc 0.7055, dev avg loss 0.561421, throughput 2.7829K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.0116199, throughput 2.82079K wps
[Epoch 23 Batch 60/172] avg loss 0.011202, throughput 2.75494K wps
[Epoch 23 Batch 90/172] avg loss 0.0114974, throughput 2.76628K wps
[Epoch 23 Batch 120/172] avg loss 0.0112012, throughput 2.72562K wps
[Epoch 23 Batch 150/172] avg loss 0.0113531, throughput 2.79032K wps
Begin Testing...
[Epoch 23] train avg loss 0.0113541, dev acc 0.7159, dev avg loss 0.556411, throughput 2.77297K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.0112682, throughput 2.64177K wps
[Epoch 24 Batch 60/172] avg loss 0.0110862, throughput 2.86631K wps
[Epoch 24 Batch 90/172] avg loss 0.0110494, throughput 2.78489K wps
[Epoch 24 Batch 120/172] avg loss 0.0110469, throughput 2.74995K wps
[Epoch 24 Batch 150/172] avg loss 0.0113561, throughput 2.68312K wps
Begin Testing...
[Epoch 24] train avg loss 0.0111856, dev acc 0.7191, dev avg loss 0.551594, throughput 2.75155K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.011132, throughput 2.85442K wps
[Epoch 25 Batch 60/172] avg loss 0.0110563, throughput 2.73541K wps
[Epoch 25 Batch 90/172] avg loss 0.011171, throughput 2.69774K wps
[Epoch 25 Batch 120/172] avg loss 0.0111879, throughput 2.78542K wps
[Epoch 25 Batch 150/172] avg loss 0.0108291, throughput 2.76673K wps
Begin Testing...
[Epoch 25] train avg loss 0.0110418, dev acc 0.7222, dev avg loss 0.545042, throughput 2.77031K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/172] avg loss 0.010594, throughput 2.84376K wps
[Epoch 26 Batch 60/172] avg loss 0.0109992, throughput 2.80029K wps
[Epoch 26 Batch 90/172] avg loss 0.0110067, throughput 2.7648K wps
[Epoch 26 Batch 120/172] avg loss 0.0108124, throughput 2.77751K wps
[Epoch 26 Batch 150/172] avg loss 0.0110077, throughput 2.62065K wps
Begin Testing...
[Epoch 26] train avg loss 0.0108814, dev acc 0.7411, dev avg loss 0.543273, throughput 2.74754K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.0107655, throughput 2.70951K wps
[Epoch 27 Batch 60/172] avg loss 0.0106942, throughput 2.83095K wps
[Epoch 27 Batch 90/172] avg loss 0.0106132, throughput 2.77188K wps
[Epoch 27 Batch 120/172] avg loss 0.0107501, throughput 2.78123K wps
[Epoch 27 Batch 150/172] avg loss 0.0107352, throughput 2.76893K wps
Begin Testing...
[Epoch 27] train avg loss 0.01072, dev acc 0.7379, dev avg loss 0.531688, throughput 2.77327K wps
[Epoch 28 Batch 30/172] avg loss 0.0103709, throughput 2.8426K wps
[Epoch 28 Batch 60/172] avg loss 0.0105488, throughput 2.76951K wps
[Epoch 28 Batch 90/172] avg loss 0.0105579, throughput 2.83222K wps
[Epoch 28 Batch 120/172] avg loss 0.0104146, throughput 2.82452K wps
[Epoch 28 Batch 150/172] avg loss 0.0101342, throughput 2.69939K wps
Begin Testing...
[Epoch 28] train avg loss 0.0104452, dev acc 0.7474, dev avg loss 0.522937, throughput 2.79129K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/172] avg loss 0.0104669, throughput 2.85878K wps
[Epoch 29 Batch 60/172] avg loss 0.0104298, throughput 2.80178K wps
[Epoch 29 Batch 90/172] avg loss 0.0105082, throughput 2.79376K wps
[Epoch 29 Batch 120/172] avg loss 0.0101689, throughput 2.80382K wps
[Epoch 29 Batch 150/172] avg loss 0.0098588, throughput 2.77253K wps
Begin Testing...
[Epoch 29] train avg loss 0.0102332, dev acc 0.7505, dev avg loss 0.511988, throughput 2.80461K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/172] avg loss 0.0100881, throughput 2.86525K wps
[Epoch 30 Batch 60/172] avg loss 0.0100242, throughput 2.77809K wps
[Epoch 30 Batch 90/172] avg loss 0.00986396, throughput 2.78843K wps
[Epoch 30 Batch 120/172] avg loss 0.00998088, throughput 2.74108K wps
[Epoch 30 Batch 150/172] avg loss 0.00991164, throughput 2.65724K wps
Begin Testing...
[Epoch 30] train avg loss 0.00994409, dev acc 0.7631, dev avg loss 0.501655, throughput 2.76531K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/172] avg loss 0.00990516, throughput 2.78895K wps
[Epoch 31 Batch 60/172] avg loss 0.00983939, throughput 2.74881K wps
[Epoch 31 Batch 90/172] avg loss 0.00976225, throughput 2.641K wps
[Epoch 31 Batch 120/172] avg loss 0.00983422, throughput 2.72945K wps
[Epoch 31 Batch 150/172] avg loss 0.00924849, throughput 2.74544K wps
Begin Testing...
[Epoch 31] train avg loss 0.00968892, dev acc 0.7673, dev avg loss 0.490908, throughput 2.73115K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/172] avg loss 0.00932525, throughput 2.78522K wps
[Epoch 32 Batch 60/172] avg loss 0.00956401, throughput 2.73522K wps
[Epoch 32 Batch 90/172] avg loss 0.00912387, throughput 2.82275K wps
[Epoch 32 Batch 120/172] avg loss 0.00980927, throughput 2.81023K wps
[Epoch 32 Batch 150/172] avg loss 0.00915825, throughput 2.73385K wps
Begin Testing...
[Epoch 32] train avg loss 0.00936896, dev acc 0.7799, dev avg loss 0.480578, throughput 2.77514K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/172] avg loss 0.0093867, throughput 2.78871K wps
[Epoch 33 Batch 60/172] avg loss 0.00929272, throughput 2.84246K wps
[Epoch 33 Batch 90/172] avg loss 0.00900685, throughput 2.83068K wps
[Epoch 33 Batch 120/172] avg loss 0.0088392, throughput 2.82106K wps
[Epoch 33 Batch 150/172] avg loss 0.00874101, throughput 2.82007K wps
Begin Testing...
[Epoch 33] train avg loss 0.00904292, dev acc 0.7851, dev avg loss 0.46785, throughput 2.82002K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/172] avg loss 0.00836751, throughput 2.75651K wps
[Epoch 34 Batch 60/172] avg loss 0.00893504, throughput 2.76624K wps
[Epoch 34 Batch 90/172] avg loss 0.0089009, throughput 2.64146K wps
[Epoch 34 Batch 120/172] avg loss 0.00871853, throughput 2.76166K wps
[Epoch 34 Batch 150/172] avg loss 0.00883466, throughput 2.8248K wps
Begin Testing...
[Epoch 34] train avg loss 0.00873271, dev acc 0.7914, dev avg loss 0.456667, throughput 2.74893K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/172] avg loss 0.00836482, throughput 2.79995K wps
[Epoch 35 Batch 60/172] avg loss 0.00863083, throughput 2.74468K wps
[Epoch 35 Batch 90/172] avg loss 0.00829384, throughput 2.75808K wps
[Epoch 35 Batch 120/172] avg loss 0.00861772, throughput 2.78245K wps
[Epoch 35 Batch 150/172] avg loss 0.00823399, throughput 2.78636K wps
Begin Testing...
[Epoch 35] train avg loss 0.00840797, dev acc 0.7987, dev avg loss 0.445572, throughput 2.77244K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/172] avg loss 0.00811264, throughput 2.75886K wps
[Epoch 36 Batch 60/172] avg loss 0.00812095, throughput 2.75729K wps
[Epoch 36 Batch 90/172] avg loss 0.00795025, throughput 2.75635K wps
[Epoch 36 Batch 120/172] avg loss 0.00822255, throughput 2.7132K wps
[Epoch 36 Batch 150/172] avg loss 0.00787449, throughput 2.75229K wps
Begin Testing...
[Epoch 36] train avg loss 0.0080596, dev acc 0.8061, dev avg loss 0.434864, throughput 2.74738K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00767836, throughput 2.78967K wps
[Epoch 37 Batch 60/172] avg loss 0.00790922, throughput 2.78107K wps
[Epoch 37 Batch 90/172] avg loss 0.00746159, throughput 2.79328K wps
[Epoch 37 Batch 120/172] avg loss 0.0080727, throughput 2.77981K wps
[Epoch 37 Batch 150/172] avg loss 0.00765029, throughput 2.78366K wps
Begin Testing...
[Epoch 37] train avg loss 0.00774238, dev acc 0.8208, dev avg loss 0.424791, throughput 2.78319K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/172] avg loss 0.00755986, throughput 2.83399K wps
[Epoch 38 Batch 60/172] avg loss 0.00778318, throughput 2.79213K wps
[Epoch 38 Batch 90/172] avg loss 0.00712719, throughput 2.78393K wps
[Epoch 38 Batch 120/172] avg loss 0.00714314, throughput 2.78087K wps
[Epoch 38 Batch 150/172] avg loss 0.00717886, throughput 2.79546K wps
Begin Testing...
[Epoch 38] train avg loss 0.0073214, dev acc 0.8218, dev avg loss 0.414364, throughput 2.7948K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.00733604, throughput 2.79417K wps
[Epoch 39 Batch 60/172] avg loss 0.0069018, throughput 2.42344K wps
[Epoch 39 Batch 90/172] avg loss 0.00713221, throughput 2.80121K wps
[Epoch 39 Batch 120/172] avg loss 0.00693105, throughput 2.78739K wps
[Epoch 39 Batch 150/172] avg loss 0.00712417, throughput 2.75519K wps
Begin Testing...
[Epoch 39] train avg loss 0.00708356, dev acc 0.8260, dev avg loss 0.406668, throughput 2.71538K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/172] avg loss 0.00677608, throughput 2.82896K wps
[Epoch 40 Batch 60/172] avg loss 0.00681857, throughput 2.65856K wps
[Epoch 40 Batch 90/172] avg loss 0.00665977, throughput 2.81038K wps
[Epoch 40 Batch 120/172] avg loss 0.00669704, throughput 2.5961K wps
[Epoch 40 Batch 150/172] avg loss 0.00697358, throughput 2.79683K wps
Begin Testing...
[Epoch 40] train avg loss 0.00676314, dev acc 0.8365, dev avg loss 0.39795, throughput 2.74041K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/172] avg loss 0.00677896, throughput 2.81865K wps
[Epoch 41 Batch 60/172] avg loss 0.00619297, throughput 2.72386K wps
[Epoch 41 Batch 90/172] avg loss 0.0064497, throughput 2.70023K wps
[Epoch 41 Batch 120/172] avg loss 0.00639552, throughput 2.70225K wps
[Epoch 41 Batch 150/172] avg loss 0.00641768, throughput 2.76536K wps
Begin Testing...
[Epoch 41] train avg loss 0.00646568, dev acc 0.8302, dev avg loss 0.391741, throughput 2.74135K wps
[Epoch 42 Batch 30/172] avg loss 0.00638978, throughput 2.74675K wps
[Epoch 42 Batch 60/172] avg loss 0.00649168, throughput 2.72627K wps
[Epoch 42 Batch 90/172] avg loss 0.00591874, throughput 2.76673K wps
[Epoch 42 Batch 120/172] avg loss 0.00635447, throughput 2.74858K wps
[Epoch 42 Batch 150/172] avg loss 0.00611645, throughput 2.80021K wps
Begin Testing...
[Epoch 42] train avg loss 0.00621504, dev acc 0.8428, dev avg loss 0.383702, throughput 2.76488K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/172] avg loss 0.00563124, throughput 2.77953K wps
[Epoch 43 Batch 60/172] avg loss 0.00585664, throughput 2.68314K wps
[Epoch 43 Batch 90/172] avg loss 0.00610949, throughput 2.87957K wps
[Epoch 43 Batch 120/172] avg loss 0.00574157, throughput 2.82549K wps
[Epoch 43 Batch 150/172] avg loss 0.00612357, throughput 2.64788K wps
Begin Testing...
[Epoch 43] train avg loss 0.00589282, dev acc 0.8333, dev avg loss 0.381773, throughput 2.7744K wps
[Epoch 44 Batch 30/172] avg loss 0.00569903, throughput 2.86076K wps
[Epoch 44 Batch 60/172] avg loss 0.00580534, throughput 2.80407K wps
[Epoch 44 Batch 90/172] avg loss 0.00542592, throughput 2.79898K wps
[Epoch 44 Batch 120/172] avg loss 0.00547734, throughput 2.78978K wps
[Epoch 44 Batch 150/172] avg loss 0.00583773, throughput 2.80227K wps
Begin Testing...
[Epoch 44] train avg loss 0.00562026, dev acc 0.8512, dev avg loss 0.3739, throughput 2.80661K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00510275, throughput 2.84088K wps
[Epoch 45 Batch 60/172] avg loss 0.00571689, throughput 2.78839K wps
[Epoch 45 Batch 90/172] avg loss 0.0052655, throughput 2.78352K wps
[Epoch 45 Batch 120/172] avg loss 0.0055726, throughput 2.74847K wps
[Epoch 45 Batch 150/172] avg loss 0.00532766, throughput 2.7759K wps
Begin Testing...
[Epoch 45] train avg loss 0.00537375, dev acc 0.8522, dev avg loss 0.368233, throughput 2.78494K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00513424, throughput 2.73991K wps
[Epoch 46 Batch 60/172] avg loss 0.00491772, throughput 2.76284K wps
[Epoch 46 Batch 90/172] avg loss 0.00492751, throughput 2.64164K wps
[Epoch 46 Batch 120/172] avg loss 0.00547913, throughput 2.64171K wps
[Epoch 46 Batch 150/172] avg loss 0.00502856, throughput 2.7792K wps
Begin Testing...
[Epoch 46] train avg loss 0.00510144, dev acc 0.8553, dev avg loss 0.364871, throughput 2.70561K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/172] avg loss 0.00483384, throughput 2.81409K wps
[Epoch 47 Batch 60/172] avg loss 0.00497971, throughput 2.6525K wps
[Epoch 47 Batch 90/172] avg loss 0.00493741, throughput 2.69296K wps
[Epoch 47 Batch 120/172] avg loss 0.00497475, throughput 2.66954K wps
[Epoch 47 Batch 150/172] avg loss 0.00488806, throughput 2.80182K wps
Begin Testing...
[Epoch 47] train avg loss 0.00490918, dev acc 0.8512, dev avg loss 0.36069, throughput 2.73299K wps
[Epoch 48 Batch 30/172] avg loss 0.00491722, throughput 2.82315K wps
[Epoch 48 Batch 60/172] avg loss 0.00441252, throughput 2.78679K wps
[Epoch 48 Batch 90/172] avg loss 0.00454311, throughput 2.79108K wps
[Epoch 48 Batch 120/172] avg loss 0.00501461, throughput 2.80105K wps
[Epoch 48 Batch 150/172] avg loss 0.00459895, throughput 2.77448K wps
Begin Testing...
[Epoch 48] train avg loss 0.00470009, dev acc 0.8532, dev avg loss 0.359656, throughput 2.79889K wps
[Epoch 49 Batch 30/172] avg loss 0.00475514, throughput 2.85475K wps
[Epoch 49 Batch 60/172] avg loss 0.00449658, throughput 2.73167K wps
[Epoch 49 Batch 90/172] avg loss 0.00446357, throughput 2.64669K wps
[Epoch 49 Batch 120/172] avg loss 0.00448045, throughput 2.76431K wps
[Epoch 49 Batch 150/172] avg loss 0.00434927, throughput 2.78364K wps
Begin Testing...
[Epoch 49] train avg loss 0.00449421, dev acc 0.8480, dev avg loss 0.362601, throughput 2.7542K wps
[Epoch 50 Batch 30/172] avg loss 0.00431621, throughput 2.60419K wps
[Epoch 50 Batch 60/172] avg loss 0.00433772, throughput 2.81978K wps
[Epoch 50 Batch 90/172] avg loss 0.00445921, throughput 2.79389K wps
[Epoch 50 Batch 120/172] avg loss 0.00433113, throughput 2.78012K wps
[Epoch 50 Batch 150/172] avg loss 0.00441886, throughput 2.70516K wps
Begin Testing...
[Epoch 50] train avg loss 0.00432152, dev acc 0.8543, dev avg loss 0.355676, throughput 2.73441K wps
[Epoch 51 Batch 30/172] avg loss 0.00399616, throughput 2.82046K wps
[Epoch 51 Batch 60/172] avg loss 0.00412928, throughput 2.77203K wps
[Epoch 51 Batch 90/172] avg loss 0.00394323, throughput 2.75941K wps
[Epoch 51 Batch 120/172] avg loss 0.00427727, throughput 2.74996K wps
[Epoch 51 Batch 150/172] avg loss 0.0040187, throughput 2.76269K wps
Begin Testing...
[Epoch 51] train avg loss 0.00411325, dev acc 0.8532, dev avg loss 0.353695, throughput 2.77201K wps
[Epoch 52 Batch 30/172] avg loss 0.00416828, throughput 2.80245K wps
[Epoch 52 Batch 60/172] avg loss 0.00402583, throughput 2.71958K wps
[Epoch 52 Batch 90/172] avg loss 0.00358416, throughput 2.72932K wps
[Epoch 52 Batch 120/172] avg loss 0.00416666, throughput 2.82018K wps
[Epoch 52 Batch 150/172] avg loss 0.00405954, throughput 2.8171K wps
Begin Testing...
[Epoch 52] train avg loss 0.00398059, dev acc 0.8532, dev avg loss 0.352539, throughput 2.78371K wps
[Epoch 53 Batch 30/172] avg loss 0.0038973, throughput 2.83812K wps
[Epoch 53 Batch 60/172] avg loss 0.00382616, throughput 2.82536K wps
[Epoch 53 Batch 90/172] avg loss 0.00351437, throughput 2.81961K wps
[Epoch 53 Batch 120/172] avg loss 0.00384392, throughput 2.80716K wps
[Epoch 53 Batch 150/172] avg loss 0.00375708, throughput 2.82377K wps
Begin Testing...
[Epoch 53] train avg loss 0.00383923, dev acc 0.8543, dev avg loss 0.352666, throughput 2.82094K wps
[Epoch 54 Batch 30/172] avg loss 0.00379488, throughput 2.71739K wps
[Epoch 54 Batch 60/172] avg loss 0.00362516, throughput 2.72985K wps
[Epoch 54 Batch 90/172] avg loss 0.00365141, throughput 2.68364K wps
[Epoch 54 Batch 120/172] avg loss 0.00358223, throughput 2.84222K wps
[Epoch 54 Batch 150/172] avg loss 0.00365075, throughput 2.64907K wps
Begin Testing...
[Epoch 54] train avg loss 0.00367865, dev acc 0.8564, dev avg loss 0.351997, throughput 2.73492K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00353412, throughput 2.8215K wps
[Epoch 55 Batch 60/172] avg loss 0.00382661, throughput 2.71693K wps
[Epoch 55 Batch 90/172] avg loss 0.00329685, throughput 2.79369K wps
[Epoch 55 Batch 120/172] avg loss 0.00361209, throughput 2.75969K wps
[Epoch 55 Batch 150/172] avg loss 0.00358006, throughput 2.71709K wps
Begin Testing...
[Epoch 55] train avg loss 0.0035376, dev acc 0.8574, dev avg loss 0.352903, throughput 2.77111K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/172] avg loss 0.00334257, throughput 2.80902K wps
[Epoch 56 Batch 60/172] avg loss 0.00324868, throughput 2.8091K wps
[Epoch 56 Batch 90/172] avg loss 0.00356033, throughput 2.81225K wps
[Epoch 56 Batch 120/172] avg loss 0.00388562, throughput 2.80786K wps
[Epoch 56 Batch 150/172] avg loss 0.00363004, throughput 2.81615K wps
Begin Testing...
[Epoch 56] train avg loss 0.00350469, dev acc 0.8543, dev avg loss 0.356877, throughput 2.81149K wps
[Epoch 57 Batch 30/172] avg loss 0.00317602, throughput 2.85607K wps
[Epoch 57 Batch 60/172] avg loss 0.00355452, throughput 2.70596K wps
[Epoch 57 Batch 90/172] avg loss 0.00346284, throughput 2.86232K wps
[Epoch 57 Batch 120/172] avg loss 0.0032947, throughput 2.76296K wps
[Epoch 57 Batch 150/172] avg loss 0.00307208, throughput 2.82181K wps
Begin Testing...
[Epoch 57] train avg loss 0.00335482, dev acc 0.8532, dev avg loss 0.355561, throughput 2.79391K wps
[Epoch 58 Batch 30/172] avg loss 0.00332864, throughput 2.76881K wps
[Epoch 58 Batch 60/172] avg loss 0.00318451, throughput 2.79109K wps
[Epoch 58 Batch 90/172] avg loss 0.00305202, throughput 2.76812K wps
[Epoch 58 Batch 120/172] avg loss 0.00329387, throughput 2.6767K wps
[Epoch 58 Batch 150/172] avg loss 0.00337807, throughput 2.80319K wps
Begin Testing...
[Epoch 58] train avg loss 0.00325654, dev acc 0.8553, dev avg loss 0.356594, throughput 2.7659K wps
[Epoch 59 Batch 30/172] avg loss 0.00300983, throughput 2.65882K wps
[Epoch 59 Batch 60/172] avg loss 0.00286199, throughput 2.77897K wps
[Epoch 59 Batch 90/172] avg loss 0.00328784, throughput 2.73214K wps
[Epoch 59 Batch 120/172] avg loss 0.00310039, throughput 2.63126K wps
[Epoch 59 Batch 150/172] avg loss 0.00312036, throughput 2.7716K wps
Begin Testing...
[Epoch 59] train avg loss 0.0031474, dev acc 0.8585, dev avg loss 0.36066, throughput 2.72082K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/172] avg loss 0.0029928, throughput 2.74743K wps
[Epoch 60 Batch 60/172] avg loss 0.00297766, throughput 2.79557K wps
[Epoch 60 Batch 90/172] avg loss 0.00298154, throughput 2.77941K wps
[Epoch 60 Batch 120/172] avg loss 0.0030359, throughput 2.76397K wps
[Epoch 60 Batch 150/172] avg loss 0.0031198, throughput 2.79838K wps
Begin Testing...
[Epoch 60] train avg loss 0.00307227, dev acc 0.8512, dev avg loss 0.359787, throughput 2.78243K wps
[Epoch 61 Batch 30/172] avg loss 0.00280551, throughput 2.85297K wps
[Epoch 61 Batch 60/172] avg loss 0.00327829, throughput 2.7981K wps
[Epoch 61 Batch 90/172] avg loss 0.00279482, throughput 2.77199K wps
[Epoch 61 Batch 120/172] avg loss 0.00298416, throughput 2.80314K wps
[Epoch 61 Batch 150/172] avg loss 0.00307518, throughput 2.78053K wps
Begin Testing...
[Epoch 61] train avg loss 0.00298244, dev acc 0.8564, dev avg loss 0.370874, throughput 2.79049K wps
[Epoch 62 Batch 30/172] avg loss 0.00289193, throughput 2.73136K wps
[Epoch 62 Batch 60/172] avg loss 0.00271175, throughput 2.73881K wps
[Epoch 62 Batch 90/172] avg loss 0.00278206, throughput 2.79041K wps
[Epoch 62 Batch 120/172] avg loss 0.00301651, throughput 2.79719K wps
[Epoch 62 Batch 150/172] avg loss 0.00282561, throughput 2.6841K wps
Begin Testing...
[Epoch 62] train avg loss 0.00289275, dev acc 0.8585, dev avg loss 0.363303, throughput 2.74943K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/172] avg loss 0.00267898, throughput 2.78912K wps
[Epoch 63 Batch 60/172] avg loss 0.0025908, throughput 2.76973K wps
[Epoch 63 Batch 90/172] avg loss 0.00275887, throughput 2.78168K wps
[Epoch 63 Batch 120/172] avg loss 0.00294736, throughput 2.79603K wps
[Epoch 63 Batch 150/172] avg loss 0.00321986, throughput 2.71769K wps
Begin Testing...
[Epoch 63] train avg loss 0.00281463, dev acc 0.8585, dev avg loss 0.371154, throughput 2.77324K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/172] avg loss 0.00250858, throughput 2.78518K wps
[Epoch 64 Batch 60/172] avg loss 0.0027578, throughput 2.79859K wps
[Epoch 64 Batch 90/172] avg loss 0.00297407, throughput 2.78145K wps
[Epoch 64 Batch 120/172] avg loss 0.00252449, throughput 2.78578K wps
[Epoch 64 Batch 150/172] avg loss 0.00273859, throughput 2.81673K wps
Begin Testing...
[Epoch 64] train avg loss 0.0027705, dev acc 0.8564, dev avg loss 0.371739, throughput 2.79575K wps
[Epoch 65 Batch 30/172] avg loss 0.0025436, throughput 2.84035K wps
[Epoch 65 Batch 60/172] avg loss 0.00269091, throughput 2.73617K wps
[Epoch 65 Batch 90/172] avg loss 0.00258917, throughput 2.79177K wps
[Epoch 65 Batch 120/172] avg loss 0.0023748, throughput 2.80316K wps
[Epoch 65 Batch 150/172] avg loss 0.00261109, throughput 2.77371K wps
Begin Testing...
[Epoch 65] train avg loss 0.00260658, dev acc 0.8564, dev avg loss 0.374248, throughput 2.78889K wps
[Epoch 66 Batch 30/172] avg loss 0.00275382, throughput 2.64037K wps
[Epoch 66 Batch 60/172] avg loss 0.00260705, throughput 2.81936K wps
[Epoch 66 Batch 90/172] avg loss 0.00226699, throughput 2.81486K wps
[Epoch 66 Batch 120/172] avg loss 0.0028488, throughput 2.79223K wps
[Epoch 66 Batch 150/172] avg loss 0.00227241, throughput 2.75973K wps
Begin Testing...
[Epoch 66] train avg loss 0.00259295, dev acc 0.8564, dev avg loss 0.374861, throughput 2.76745K wps
[Epoch 67 Batch 30/172] avg loss 0.00267203, throughput 2.8345K wps
[Epoch 67 Batch 60/172] avg loss 0.00243444, throughput 2.73272K wps
[Epoch 67 Batch 90/172] avg loss 0.0023595, throughput 2.80771K wps
[Epoch 67 Batch 120/172] avg loss 0.00276188, throughput 2.69775K wps
[Epoch 67 Batch 150/172] avg loss 0.00221201, throughput 2.66511K wps
Begin Testing...
[Epoch 67] train avg loss 0.00250495, dev acc 0.8553, dev avg loss 0.378304, throughput 2.75012K wps
[Epoch 68 Batch 30/172] avg loss 0.00274329, throughput 2.7708K wps
[Epoch 68 Batch 60/172] avg loss 0.00258074, throughput 2.77818K wps
[Epoch 68 Batch 90/172] avg loss 0.00243517, throughput 2.66531K wps
[Epoch 68 Batch 120/172] avg loss 0.00256939, throughput 2.7511K wps
[Epoch 68 Batch 150/172] avg loss 0.00252197, throughput 2.79854K wps
Begin Testing...
[Epoch 68] train avg loss 0.00254629, dev acc 0.8585, dev avg loss 0.384099, throughput 2.73835K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/172] avg loss 0.00260705, throughput 2.83767K wps
[Epoch 69 Batch 60/172] avg loss 0.00244436, throughput 2.77071K wps
[Epoch 69 Batch 90/172] avg loss 0.00258896, throughput 2.77649K wps
[Epoch 69 Batch 120/172] avg loss 0.00245831, throughput 2.78193K wps
[Epoch 69 Batch 150/172] avg loss 0.00234755, throughput 2.75324K wps
Begin Testing...
[Epoch 69] train avg loss 0.00247461, dev acc 0.8564, dev avg loss 0.387912, throughput 2.78423K wps
[Epoch 70 Batch 30/172] avg loss 0.00239411, throughput 2.81098K wps
[Epoch 70 Batch 60/172] avg loss 0.00246083, throughput 2.79412K wps
[Epoch 70 Batch 90/172] avg loss 0.0021288, throughput 2.73648K wps
[Epoch 70 Batch 120/172] avg loss 0.00249688, throughput 2.80107K wps
[Epoch 70 Batch 150/172] avg loss 0.00252019, throughput 2.77662K wps
Begin Testing...
[Epoch 70] train avg loss 0.0024073, dev acc 0.8553, dev avg loss 0.386068, throughput 2.78092K wps
[Epoch 71 Batch 30/172] avg loss 0.00229342, throughput 2.83506K wps
[Epoch 71 Batch 60/172] avg loss 0.00223355, throughput 2.81066K wps
[Epoch 71 Batch 90/172] avg loss 0.00251328, throughput 2.80655K wps
[Epoch 71 Batch 120/172] avg loss 0.0025438, throughput 2.7477K wps
[Epoch 71 Batch 150/172] avg loss 0.00226123, throughput 2.79726K wps
Begin Testing...
[Epoch 71] train avg loss 0.00236535, dev acc 0.8532, dev avg loss 0.383933, throughput 2.80024K wps
[Epoch 72 Batch 30/172] avg loss 0.00225487, throughput 2.78428K wps
[Epoch 72 Batch 60/172] avg loss 0.00257686, throughput 2.79987K wps
[Epoch 72 Batch 90/172] avg loss 0.00220141, throughput 2.78185K wps
[Epoch 72 Batch 120/172] avg loss 0.00242473, throughput 2.86323K wps
[Epoch 72 Batch 150/172] avg loss 0.0021677, throughput 2.80675K wps
Begin Testing...
[Epoch 72] train avg loss 0.00230356, dev acc 0.8512, dev avg loss 0.393263, throughput 2.80586K wps
[Epoch 73 Batch 30/172] avg loss 0.00212694, throughput 2.68301K wps
[Epoch 73 Batch 60/172] avg loss 0.00240204, throughput 2.81559K wps
[Epoch 73 Batch 90/172] avg loss 0.00228495, throughput 2.8085K wps
[Epoch 73 Batch 120/172] avg loss 0.00212711, throughput 2.81373K wps
[Epoch 73 Batch 150/172] avg loss 0.00225892, throughput 2.82799K wps
Begin Testing...
[Epoch 73] train avg loss 0.0022241, dev acc 0.8553, dev avg loss 0.405777, throughput 2.79342K wps
[Epoch 74 Batch 30/172] avg loss 0.00223747, throughput 2.86353K wps
[Epoch 74 Batch 60/172] avg loss 0.00231692, throughput 2.78116K wps
[Epoch 74 Batch 90/172] avg loss 0.00219605, throughput 2.7818K wps
[Epoch 74 Batch 120/172] avg loss 0.00225846, throughput 2.763K wps
[Epoch 74 Batch 150/172] avg loss 0.00221707, throughput 2.79749K wps
Begin Testing...
[Epoch 74] train avg loss 0.00222014, dev acc 0.8543, dev avg loss 0.398976, throughput 2.79637K wps
[Epoch 75 Batch 30/172] avg loss 0.00199584, throughput 2.77816K wps
[Epoch 75 Batch 60/172] avg loss 0.00208869, throughput 2.76168K wps
[Epoch 75 Batch 90/172] avg loss 0.00236424, throughput 2.782K wps
[Epoch 75 Batch 120/172] avg loss 0.00201825, throughput 2.78514K wps
[Epoch 75 Batch 150/172] avg loss 0.00244447, throughput 2.75868K wps
Begin Testing...
[Epoch 75] train avg loss 0.00220606, dev acc 0.8574, dev avg loss 0.403477, throughput 2.76766K wps
[Epoch 76 Batch 30/172] avg loss 0.00207266, throughput 2.78762K wps
[Epoch 76 Batch 60/172] avg loss 0.00198917, throughput 2.77456K wps
[Epoch 76 Batch 90/172] avg loss 0.0021562, throughput 2.71936K wps
[Epoch 76 Batch 120/172] avg loss 0.00228205, throughput 2.72925K wps
[Epoch 76 Batch 150/172] avg loss 0.00220887, throughput 2.71371K wps
Begin Testing...
[Epoch 76] train avg loss 0.00214062, dev acc 0.8532, dev avg loss 0.399009, throughput 2.72416K wps
[Epoch 77 Batch 30/172] avg loss 0.00226673, throughput 2.82525K wps
[Epoch 77 Batch 60/172] avg loss 0.0019605, throughput 2.77358K wps
[Epoch 77 Batch 90/172] avg loss 0.0023134, throughput 2.77641K wps
[Epoch 77 Batch 120/172] avg loss 0.0019601, throughput 2.79641K wps
[Epoch 77 Batch 150/172] avg loss 0.00209309, throughput 2.76903K wps
Begin Testing...
[Epoch 77] train avg loss 0.00212055, dev acc 0.8595, dev avg loss 0.405702, throughput 2.77489K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/172] avg loss 0.00185711, throughput 2.78491K wps
[Epoch 78 Batch 60/172] avg loss 0.00228426, throughput 2.75424K wps
[Epoch 78 Batch 90/172] avg loss 0.00177839, throughput 2.76047K wps
[Epoch 78 Batch 120/172] avg loss 0.00195903, throughput 2.76875K wps
[Epoch 78 Batch 150/172] avg loss 0.00245505, throughput 2.71792K wps
Begin Testing...
[Epoch 78] train avg loss 0.00206677, dev acc 0.8564, dev avg loss 0.419811, throughput 2.74213K wps
[Epoch 79 Batch 30/172] avg loss 0.00217515, throughput 2.81756K wps
[Epoch 79 Batch 60/172] avg loss 0.00208889, throughput 2.77842K wps
[Epoch 79 Batch 90/172] avg loss 0.00206939, throughput 2.78402K wps
[Epoch 79 Batch 120/172] avg loss 0.00186191, throughput 2.67203K wps
[Epoch 79 Batch 150/172] avg loss 0.00201022, throughput 2.75474K wps
Begin Testing...
[Epoch 79] train avg loss 0.00206921, dev acc 0.8532, dev avg loss 0.408702, throughput 2.75105K wps
[Epoch 80 Batch 30/172] avg loss 0.00197172, throughput 2.72037K wps
[Epoch 80 Batch 60/172] avg loss 0.00228612, throughput 2.81235K wps
[Epoch 80 Batch 90/172] avg loss 0.00195805, throughput 2.76647K wps
[Epoch 80 Batch 120/172] avg loss 0.00184428, throughput 2.74586K wps
[Epoch 80 Batch 150/172] avg loss 0.00206568, throughput 2.74805K wps
Begin Testing...
[Epoch 80] train avg loss 0.00204836, dev acc 0.8574, dev avg loss 0.420838, throughput 2.75318K wps
[Epoch 81 Batch 30/172] avg loss 0.00184133, throughput 2.84368K wps
[Epoch 81 Batch 60/172] avg loss 0.00194186, throughput 2.80375K wps
[Epoch 81 Batch 90/172] avg loss 0.00204449, throughput 2.81442K wps
[Epoch 81 Batch 120/172] avg loss 0.00225872, throughput 2.74238K wps
[Epoch 81 Batch 150/172] avg loss 0.00191668, throughput 2.86867K wps
Begin Testing...
[Epoch 81] train avg loss 0.00200572, dev acc 0.8512, dev avg loss 0.408228, throughput 2.79694K wps
[Epoch 82 Batch 30/172] avg loss 0.00240031, throughput 2.90511K wps
[Epoch 82 Batch 60/172] avg loss 0.0016265, throughput 2.79717K wps
[Epoch 82 Batch 90/172] avg loss 0.00177204, throughput 2.80613K wps
[Epoch 82 Batch 120/172] avg loss 0.00188563, throughput 2.82959K wps
[Epoch 82 Batch 150/172] avg loss 0.0021581, throughput 2.8032K wps
Begin Testing...
[Epoch 82] train avg loss 0.00199321, dev acc 0.8449, dev avg loss 0.413585, throughput 2.82674K wps
[Epoch 83 Batch 30/172] avg loss 0.00198583, throughput 2.8798K wps
[Epoch 83 Batch 60/172] avg loss 0.00167505, throughput 2.7847K wps
[Epoch 83 Batch 90/172] avg loss 0.00194638, throughput 2.83369K wps
[Epoch 83 Batch 120/172] avg loss 0.00178785, throughput 2.78752K wps
[Epoch 83 Batch 150/172] avg loss 0.00216529, throughput 2.80366K wps
Begin Testing...
[Epoch 83] train avg loss 0.00196764, dev acc 0.8532, dev avg loss 0.425875, throughput 2.81406K wps
[Epoch 84 Batch 30/172] avg loss 0.00182669, throughput 2.8363K wps
[Epoch 84 Batch 60/172] avg loss 0.00178677, throughput 2.78276K wps
[Epoch 84 Batch 90/172] avg loss 0.0019615, throughput 2.76534K wps
[Epoch 84 Batch 120/172] avg loss 0.00159961, throughput 2.81554K wps
[Epoch 84 Batch 150/172] avg loss 0.00211299, throughput 2.7973K wps
Begin Testing...
[Epoch 84] train avg loss 0.00191434, dev acc 0.8543, dev avg loss 0.421226, throughput 2.80321K wps
[Epoch 85 Batch 30/172] avg loss 0.00202293, throughput 2.79618K wps
[Epoch 85 Batch 60/172] avg loss 0.00195881, throughput 2.87783K wps
[Epoch 85 Batch 90/172] avg loss 0.00173179, throughput 2.71345K wps
[Epoch 85 Batch 120/172] avg loss 0.00195328, throughput 2.78543K wps
[Epoch 85 Batch 150/172] avg loss 0.0018488, throughput 2.81384K wps
Begin Testing...
[Epoch 85] train avg loss 0.00194933, dev acc 0.8512, dev avg loss 0.422417, throughput 2.79174K wps
[Epoch 86 Batch 30/172] avg loss 0.00181388, throughput 2.86274K wps
[Epoch 86 Batch 60/172] avg loss 0.00186304, throughput 2.79766K wps
[Epoch 86 Batch 90/172] avg loss 0.00171824, throughput 2.80873K wps
[Epoch 86 Batch 120/172] avg loss 0.00180305, throughput 2.7848K wps
[Epoch 86 Batch 150/172] avg loss 0.00180778, throughput 2.77083K wps
Begin Testing...
[Epoch 86] train avg loss 0.00187822, dev acc 0.8553, dev avg loss 0.431874, throughput 2.78225K wps
[Epoch 87 Batch 30/172] avg loss 0.00161285, throughput 2.77312K wps
[Epoch 87 Batch 60/172] avg loss 0.00232709, throughput 2.79362K wps
[Epoch 87 Batch 90/172] avg loss 0.00191916, throughput 2.77985K wps
[Epoch 87 Batch 120/172] avg loss 0.00171423, throughput 2.76561K wps
[Epoch 87 Batch 150/172] avg loss 0.00170114, throughput 2.73777K wps
Begin Testing...
[Epoch 87] train avg loss 0.00187273, dev acc 0.8553, dev avg loss 0.43114, throughput 2.76685K wps
[Epoch 88 Batch 30/172] avg loss 0.00187652, throughput 2.69452K wps
[Epoch 88 Batch 60/172] avg loss 0.001943, throughput 2.78247K wps
[Epoch 88 Batch 90/172] avg loss 0.00182147, throughput 2.75899K wps
[Epoch 88 Batch 120/172] avg loss 0.0016524, throughput 2.79646K wps
[Epoch 88 Batch 150/172] avg loss 0.00162679, throughput 2.75645K wps
Begin Testing...
[Epoch 88] train avg loss 0.00182373, dev acc 0.8501, dev avg loss 0.425533, throughput 2.7596K wps
[Epoch 89 Batch 30/172] avg loss 0.00169332, throughput 2.79305K wps
[Epoch 89 Batch 60/172] avg loss 0.00162346, throughput 2.71839K wps
[Epoch 89 Batch 90/172] avg loss 0.00181771, throughput 2.74478K wps
[Epoch 89 Batch 120/172] avg loss 0.00199249, throughput 2.773K wps
[Epoch 89 Batch 150/172] avg loss 0.00185743, throughput 2.69158K wps
Begin Testing...
[Epoch 89] train avg loss 0.00178681, dev acc 0.8543, dev avg loss 0.433176, throughput 2.74961K wps
[Epoch 90 Batch 30/172] avg loss 0.0019748, throughput 2.73026K wps
[Epoch 90 Batch 60/172] avg loss 0.00196715, throughput 2.74465K wps
[Epoch 90 Batch 90/172] avg loss 0.00168299, throughput 2.78258K wps
[Epoch 90 Batch 120/172] avg loss 0.00194275, throughput 2.75607K wps
[Epoch 90 Batch 150/172] avg loss 0.00181625, throughput 2.76936K wps
Begin Testing...
[Epoch 90] train avg loss 0.00183534, dev acc 0.8522, dev avg loss 0.442319, throughput 2.7506K wps
[Epoch 91 Batch 30/172] avg loss 0.00175401, throughput 2.80142K wps
[Epoch 91 Batch 60/172] avg loss 0.00163978, throughput 2.79153K wps
[Epoch 91 Batch 90/172] avg loss 0.00166959, throughput 2.75411K wps
[Epoch 91 Batch 120/172] avg loss 0.00195054, throughput 2.79457K wps
[Epoch 91 Batch 150/172] avg loss 0.001846, throughput 2.692K wps
Begin Testing...
[Epoch 91] train avg loss 0.00179072, dev acc 0.8512, dev avg loss 0.450453, throughput 2.77235K wps
[Epoch 92 Batch 30/172] avg loss 0.00143114, throughput 2.78458K wps
[Epoch 92 Batch 60/172] avg loss 0.00162279, throughput 2.76913K wps
[Epoch 92 Batch 90/172] avg loss 0.00147248, throughput 2.76946K wps
[Epoch 92 Batch 120/172] avg loss 0.00197342, throughput 2.76692K wps
[Epoch 92 Batch 150/172] avg loss 0.00202102, throughput 2.75196K wps
Begin Testing...
[Epoch 92] train avg loss 0.0017256, dev acc 0.8470, dev avg loss 0.436765, throughput 2.76099K wps
[Epoch 93 Batch 30/172] avg loss 0.00166414, throughput 2.67024K wps
[Epoch 93 Batch 60/172] avg loss 0.0017465, throughput 2.58358K wps
[Epoch 93 Batch 90/172] avg loss 0.00176592, throughput 2.79554K wps
[Epoch 93 Batch 120/172] avg loss 0.00188228, throughput 2.73292K wps
[Epoch 93 Batch 150/172] avg loss 0.00152507, throughput 2.67194K wps
Begin Testing...
[Epoch 93] train avg loss 0.00173339, dev acc 0.8512, dev avg loss 0.444486, throughput 2.71399K wps
[Epoch 94 Batch 30/172] avg loss 0.00161463, throughput 2.80748K wps
[Epoch 94 Batch 60/172] avg loss 0.0016636, throughput 2.65392K wps
[Epoch 94 Batch 90/172] avg loss 0.00179421, throughput 2.66488K wps
[Epoch 94 Batch 120/172] avg loss 0.00152922, throughput 2.85183K wps
[Epoch 94 Batch 150/172] avg loss 0.00165871, throughput 2.81323K wps
Begin Testing...
[Epoch 94] train avg loss 0.00169002, dev acc 0.8459, dev avg loss 0.440114, throughput 2.76224K wps
[Epoch 95 Batch 30/172] avg loss 0.00144027, throughput 2.85899K wps
[Epoch 95 Batch 60/172] avg loss 0.00160958, throughput 2.80169K wps
[Epoch 95 Batch 90/172] avg loss 0.00166145, throughput 2.79652K wps
[Epoch 95 Batch 120/172] avg loss 0.00188655, throughput 2.7911K wps
[Epoch 95 Batch 150/172] avg loss 0.00171287, throughput 2.77556K wps
Begin Testing...
[Epoch 95] train avg loss 0.00170552, dev acc 0.8585, dev avg loss 0.48441, throughput 2.79108K wps
[Epoch 96 Batch 30/172] avg loss 0.00172678, throughput 2.75649K wps
[Epoch 96 Batch 60/172] avg loss 0.00172365, throughput 2.77362K wps
[Epoch 96 Batch 90/172] avg loss 0.00145648, throughput 2.7688K wps
[Epoch 96 Batch 120/172] avg loss 0.00183894, throughput 2.75118K wps
[Epoch 96 Batch 150/172] avg loss 0.0016633, throughput 2.79754K wps
Begin Testing...
[Epoch 96] train avg loss 0.00168614, dev acc 0.8616, dev avg loss 0.469563, throughput 2.77065K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/172] avg loss 0.0016575, throughput 2.81383K wps
[Epoch 97 Batch 60/172] avg loss 0.00151322, throughput 2.85917K wps
[Epoch 97 Batch 90/172] avg loss 0.00143286, throughput 2.79784K wps
[Epoch 97 Batch 120/172] avg loss 0.00184432, throughput 2.69474K wps
[Epoch 97 Batch 150/172] avg loss 0.00161452, throughput 2.80068K wps
Begin Testing...
[Epoch 97] train avg loss 0.00164428, dev acc 0.8449, dev avg loss 0.446684, throughput 2.78264K wps
[Epoch 98 Batch 30/172] avg loss 0.00155678, throughput 2.81498K wps
[Epoch 98 Batch 60/172] avg loss 0.00146458, throughput 2.80207K wps
[Epoch 98 Batch 90/172] avg loss 0.00159762, throughput 2.80439K wps
[Epoch 98 Batch 120/172] avg loss 0.00176321, throughput 2.80443K wps
[Epoch 98 Batch 150/172] avg loss 0.00176106, throughput 2.79103K wps
Begin Testing...
[Epoch 98] train avg loss 0.00164482, dev acc 0.8564, dev avg loss 0.477735, throughput 2.78213K wps
[Epoch 99 Batch 30/172] avg loss 0.00153714, throughput 2.7756K wps
[Epoch 99 Batch 60/172] avg loss 0.0014093, throughput 2.78956K wps
[Epoch 99 Batch 90/172] avg loss 0.0013797, throughput 2.80113K wps
[Epoch 99 Batch 120/172] avg loss 0.0017832, throughput 2.78978K wps
[Epoch 99 Batch 150/172] avg loss 0.00183663, throughput 2.81821K wps
Begin Testing...
[Epoch 99] train avg loss 0.001625, dev acc 0.8459, dev avg loss 0.454886, throughput 2.79762K wps
[Epoch 100 Batch 30/172] avg loss 0.00137739, throughput 2.78045K wps
[Epoch 100 Batch 60/172] avg loss 0.00169164, throughput 2.81538K wps
[Epoch 100 Batch 90/172] avg loss 0.00167677, throughput 2.74861K wps
[Epoch 100 Batch 120/172] avg loss 0.00159828, throughput 2.77787K wps
[Epoch 100 Batch 150/172] avg loss 0.00190474, throughput 2.79467K wps
Begin Testing...
[Epoch 100] train avg loss 0.00163519, dev acc 0.8574, dev avg loss 0.471755, throughput 2.76927K wps
[Epoch 101 Batch 30/172] avg loss 0.0017422, throughput 2.86265K wps
[Epoch 101 Batch 60/172] avg loss 0.0018118, throughput 2.74919K wps
[Epoch 101 Batch 90/172] avg loss 0.0013432, throughput 2.69017K wps
[Epoch 101 Batch 120/172] avg loss 0.00140408, throughput 2.82513K wps
[Epoch 101 Batch 150/172] avg loss 0.00169375, throughput 2.75743K wps
Begin Testing...
[Epoch 101] train avg loss 0.00160148, dev acc 0.8595, dev avg loss 0.486482, throughput 2.75505K wps
[Epoch 102 Batch 30/172] avg loss 0.00163292, throughput 2.81175K wps
[Epoch 102 Batch 60/172] avg loss 0.00145703, throughput 2.73978K wps
[Epoch 102 Batch 90/172] avg loss 0.00164209, throughput 2.75117K wps
[Epoch 102 Batch 120/172] avg loss 0.00152506, throughput 2.69359K wps
[Epoch 102 Batch 150/172] avg loss 0.0016375, throughput 2.79425K wps
Begin Testing...
[Epoch 102] train avg loss 0.00159693, dev acc 0.8459, dev avg loss 0.463825, throughput 2.75895K wps
[Epoch 103 Batch 30/172] avg loss 0.00149193, throughput 2.80055K wps
[Epoch 103 Batch 60/172] avg loss 0.00163156, throughput 2.78498K wps
[Epoch 103 Batch 90/172] avg loss 0.00173442, throughput 2.78647K wps
[Epoch 103 Batch 120/172] avg loss 0.00132965, throughput 2.76065K wps
[Epoch 103 Batch 150/172] avg loss 0.00161284, throughput 2.76846K wps
Begin Testing...
[Epoch 103] train avg loss 0.00157735, dev acc 0.8480, dev avg loss 0.464823, throughput 2.78107K wps
[Epoch 104 Batch 30/172] avg loss 0.00145308, throughput 2.86637K wps
[Epoch 104 Batch 60/172] avg loss 0.00122332, throughput 2.7745K wps
[Epoch 104 Batch 90/172] avg loss 0.00149027, throughput 2.74842K wps
[Epoch 104 Batch 120/172] avg loss 0.00164449, throughput 2.79412K wps
[Epoch 104 Batch 150/172] avg loss 0.00142747, throughput 2.80485K wps
Begin Testing...
[Epoch 104] train avg loss 0.00153377, dev acc 0.8501, dev avg loss 0.477602, throughput 2.79495K wps
[Epoch 105 Batch 30/172] avg loss 0.00147327, throughput 2.78444K wps
[Epoch 105 Batch 60/172] avg loss 0.00156106, throughput 2.84013K wps
[Epoch 105 Batch 90/172] avg loss 0.00134349, throughput 2.82084K wps
[Epoch 105 Batch 120/172] avg loss 0.00166739, throughput 2.77122K wps
[Epoch 105 Batch 150/172] avg loss 0.00154221, throughput 2.71226K wps
Begin Testing...
[Epoch 105] train avg loss 0.00156411, dev acc 0.8470, dev avg loss 0.468014, throughput 2.78519K wps
[Epoch 106 Batch 30/172] avg loss 0.00116511, throughput 2.85536K wps
[Epoch 106 Batch 60/172] avg loss 0.00138591, throughput 2.8193K wps
[Epoch 106 Batch 90/172] avg loss 0.00138158, throughput 2.77811K wps
[Epoch 106 Batch 120/172] avg loss 0.00186614, throughput 2.79801K wps
[Epoch 106 Batch 150/172] avg loss 0.00163254, throughput 2.684K wps
Begin Testing...
[Epoch 106] train avg loss 0.00151895, dev acc 0.8512, dev avg loss 0.486273, throughput 2.79595K wps
[Epoch 107 Batch 30/172] avg loss 0.00131636, throughput 2.83753K wps
[Epoch 107 Batch 60/172] avg loss 0.0015319, throughput 2.7986K wps
[Epoch 107 Batch 90/172] avg loss 0.00167269, throughput 2.79615K wps
[Epoch 107 Batch 120/172] avg loss 0.00130368, throughput 2.78353K wps
[Epoch 107 Batch 150/172] avg loss 0.00167583, throughput 2.7824K wps
Begin Testing...
[Epoch 107] train avg loss 0.00149646, dev acc 0.8449, dev avg loss 0.470784, throughput 2.79584K wps
[Epoch 108 Batch 30/172] avg loss 0.00140766, throughput 2.7682K wps
[Epoch 108 Batch 60/172] avg loss 0.00143295, throughput 2.68494K wps
[Epoch 108 Batch 90/172] avg loss 0.00159878, throughput 2.81284K wps
[Epoch 108 Batch 120/172] avg loss 0.00151408, throughput 2.80027K wps
[Epoch 108 Batch 150/172] avg loss 0.00174759, throughput 2.69125K wps
Begin Testing...
[Epoch 108] train avg loss 0.00152397, dev acc 0.8585, dev avg loss 0.507341, throughput 2.75024K wps
[Epoch 109 Batch 30/172] avg loss 0.00110864, throughput 2.85255K wps
[Epoch 109 Batch 60/172] avg loss 0.00162099, throughput 2.80243K wps
[Epoch 109 Batch 90/172] avg loss 0.00142263, throughput 2.6609K wps
[Epoch 109 Batch 120/172] avg loss 0.00150159, throughput 2.89553K wps
[Epoch 109 Batch 150/172] avg loss 0.00150398, throughput 2.82813K wps
Begin Testing...
[Epoch 109] train avg loss 0.00148395, dev acc 0.8459, dev avg loss 0.476089, throughput 2.80788K wps
[Epoch 110 Batch 30/172] avg loss 0.00151697, throughput 2.80172K wps
[Epoch 110 Batch 60/172] avg loss 0.00140675, throughput 2.79186K wps
[Epoch 110 Batch 90/172] avg loss 0.00149666, throughput 2.78496K wps
[Epoch 110 Batch 120/172] avg loss 0.00143838, throughput 2.78977K wps
[Epoch 110 Batch 150/172] avg loss 0.00155045, throughput 2.69033K wps
Begin Testing...
[Epoch 110] train avg loss 0.00148037, dev acc 0.8501, dev avg loss 0.485993, throughput 2.7742K wps
[Epoch 111 Batch 30/172] avg loss 0.00152863, throughput 2.80187K wps
[Epoch 111 Batch 60/172] avg loss 0.0014479, throughput 2.76341K wps
[Epoch 111 Batch 90/172] avg loss 0.00159062, throughput 2.7748K wps
[Epoch 111 Batch 120/172] avg loss 0.00127525, throughput 2.75735K wps
[Epoch 111 Batch 150/172] avg loss 0.00159328, throughput 2.75753K wps
Begin Testing...
[Epoch 111] train avg loss 0.00147545, dev acc 0.8564, dev avg loss 0.506926, throughput 2.77131K wps
[Epoch 112 Batch 30/172] avg loss 0.0015876, throughput 2.7861K wps
[Epoch 112 Batch 60/172] avg loss 0.00130336, throughput 2.77401K wps
[Epoch 112 Batch 90/172] avg loss 0.00152712, throughput 2.75391K wps
[Epoch 112 Batch 120/172] avg loss 0.00125919, throughput 2.77001K wps
[Epoch 112 Batch 150/172] avg loss 0.00139735, throughput 2.74704K wps
Begin Testing...
[Epoch 112] train avg loss 0.00143492, dev acc 0.8501, dev avg loss 0.488071, throughput 2.77215K wps
[Epoch 113 Batch 30/172] avg loss 0.00142827, throughput 2.67835K wps
[Epoch 113 Batch 60/172] avg loss 0.0013144, throughput 2.78199K wps
[Epoch 113 Batch 90/172] avg loss 0.00139412, throughput 2.77191K wps
[Epoch 113 Batch 120/172] avg loss 0.00147755, throughput 2.72826K wps
[Epoch 113 Batch 150/172] avg loss 0.00159197, throughput 2.71312K wps
Begin Testing...
[Epoch 113] train avg loss 0.00142863, dev acc 0.8480, dev avg loss 0.484919, throughput 2.73941K wps
[Epoch 114 Batch 30/172] avg loss 0.00141735, throughput 2.78963K wps
[Epoch 114 Batch 60/172] avg loss 0.00171106, throughput 2.73481K wps
[Epoch 114 Batch 90/172] avg loss 0.00150194, throughput 2.78864K wps
[Epoch 114 Batch 120/172] avg loss 0.0012736, throughput 2.78964K wps
[Epoch 114 Batch 150/172] avg loss 0.00133386, throughput 2.79411K wps
Begin Testing...
[Epoch 114] train avg loss 0.00141924, dev acc 0.8501, dev avg loss 0.499669, throughput 2.77053K wps
[Epoch 115 Batch 30/172] avg loss 0.00107659, throughput 2.66156K wps
[Epoch 115 Batch 60/172] avg loss 0.00152043, throughput 2.73603K wps
[Epoch 115 Batch 90/172] avg loss 0.00134659, throughput 2.79619K wps
[Epoch 115 Batch 120/172] avg loss 0.00144279, throughput 2.81132K wps
[Epoch 115 Batch 150/172] avg loss 0.00147162, throughput 2.78978K wps
Begin Testing...
[Epoch 115] train avg loss 0.0013946, dev acc 0.8522, dev avg loss 0.517282, throughput 2.7616K wps
[Epoch 116 Batch 30/172] avg loss 0.00142935, throughput 2.82452K wps
[Epoch 116 Batch 60/172] avg loss 0.00125462, throughput 2.79502K wps
[Epoch 116 Batch 90/172] avg loss 0.00131833, throughput 2.80466K wps
[Epoch 116 Batch 120/172] avg loss 0.00145291, throughput 2.78438K wps
[Epoch 116 Batch 150/172] avg loss 0.00140798, throughput 2.77563K wps
Begin Testing...
[Epoch 116] train avg loss 0.00139515, dev acc 0.8491, dev avg loss 0.498938, throughput 2.79743K wps
[Epoch 117 Batch 30/172] avg loss 0.00123784, throughput 2.82488K wps
[Epoch 117 Batch 60/172] avg loss 0.00138898, throughput 2.78831K wps
[Epoch 117 Batch 90/172] avg loss 0.00140464, throughput 2.79355K wps
[Epoch 117 Batch 120/172] avg loss 0.00114451, throughput 2.77301K wps
[Epoch 117 Batch 150/172] avg loss 0.00145557, throughput 2.71026K wps
Begin Testing...
[Epoch 117] train avg loss 0.00135851, dev acc 0.8470, dev avg loss 0.49153, throughput 2.78143K wps
[Epoch 118 Batch 30/172] avg loss 0.00108225, throughput 2.83857K wps
[Epoch 118 Batch 60/172] avg loss 0.00168966, throughput 2.79681K wps
[Epoch 118 Batch 90/172] avg loss 0.00131065, throughput 2.74627K wps
[Epoch 118 Batch 120/172] avg loss 0.0013969, throughput 2.74659K wps
[Epoch 118 Batch 150/172] avg loss 0.00140541, throughput 2.65117K wps
Begin Testing...
[Epoch 118] train avg loss 0.0013638, dev acc 0.8480, dev avg loss 0.499954, throughput 2.74839K wps
[Epoch 119 Batch 30/172] avg loss 0.00124304, throughput 2.77282K wps
[Epoch 119 Batch 60/172] avg loss 0.00140699, throughput 2.7535K wps
[Epoch 119 Batch 90/172] avg loss 0.00154985, throughput 2.77447K wps
[Epoch 119 Batch 120/172] avg loss 0.00121202, throughput 2.70849K wps
[Epoch 119 Batch 150/172] avg loss 0.00124532, throughput 2.78864K wps
Begin Testing...
[Epoch 119] train avg loss 0.00135343, dev acc 0.8438, dev avg loss 0.491903, throughput 2.76523K wps
[Epoch 120 Batch 30/172] avg loss 0.00143872, throughput 2.72137K wps
[Epoch 120 Batch 60/172] avg loss 0.0011473, throughput 2.75005K wps
[Epoch 120 Batch 90/172] avg loss 0.00139421, throughput 2.81112K wps
[Epoch 120 Batch 120/172] avg loss 0.00156958, throughput 2.79341K wps
[Epoch 120 Batch 150/172] avg loss 0.00124533, throughput 2.80209K wps
Begin Testing...
[Epoch 120] train avg loss 0.00135529, dev acc 0.8470, dev avg loss 0.506798, throughput 2.7749K wps
[Epoch 121 Batch 30/172] avg loss 0.00109357, throughput 2.88571K wps
[Epoch 121 Batch 60/172] avg loss 0.00115863, throughput 2.80003K wps
[Epoch 121 Batch 90/172] avg loss 0.00141558, throughput 2.7388K wps
[Epoch 121 Batch 120/172] avg loss 0.00133856, throughput 2.85399K wps
[Epoch 121 Batch 150/172] avg loss 0.0015414, throughput 2.84924K wps
Begin Testing...
[Epoch 121] train avg loss 0.00134273, dev acc 0.8512, dev avg loss 0.523265, throughput 2.824K wps
[Epoch 122 Batch 30/172] avg loss 0.00108773, throughput 2.76391K wps
[Epoch 122 Batch 60/172] avg loss 0.00122462, throughput 2.77095K wps
[Epoch 122 Batch 90/172] avg loss 0.00138646, throughput 2.79418K wps
[Epoch 122 Batch 120/172] avg loss 0.00153056, throughput 2.74706K wps
[Epoch 122 Batch 150/172] avg loss 0.00153209, throughput 2.84135K wps
Begin Testing...
[Epoch 122] train avg loss 0.001331, dev acc 0.8459, dev avg loss 0.497834, throughput 2.77629K wps
[Epoch 123 Batch 30/172] avg loss 0.00135395, throughput 2.7017K wps
[Epoch 123 Batch 60/172] avg loss 0.00112234, throughput 2.79886K wps
[Epoch 123 Batch 90/172] avg loss 0.00120826, throughput 2.77491K wps
[Epoch 123 Batch 120/172] avg loss 0.00126, throughput 2.77189K wps
[Epoch 123 Batch 150/172] avg loss 0.00162077, throughput 2.63548K wps
Begin Testing...
[Epoch 123] train avg loss 0.00131901, dev acc 0.8470, dev avg loss 0.510296, throughput 2.7368K wps
[Epoch 124 Batch 30/172] avg loss 0.00124544, throughput 2.84984K wps
[Epoch 124 Batch 60/172] avg loss 0.00110453, throughput 2.8145K wps
[Epoch 124 Batch 90/172] avg loss 0.0011314, throughput 2.74986K wps
[Epoch 124 Batch 120/172] avg loss 0.00125062, throughput 2.73605K wps
[Epoch 124 Batch 150/172] avg loss 0.00136838, throughput 2.76123K wps
Begin Testing...
[Epoch 124] train avg loss 0.00128475, dev acc 0.8375, dev avg loss 0.499548, throughput 2.7851K wps
[Epoch 125 Batch 30/172] avg loss 0.00110675, throughput 2.69973K wps
[Epoch 125 Batch 60/172] avg loss 0.00128188, throughput 2.62361K wps
[Epoch 125 Batch 90/172] avg loss 0.00125778, throughput 2.78934K wps
[Epoch 125 Batch 120/172] avg loss 0.00116241, throughput 2.81823K wps
[Epoch 125 Batch 150/172] avg loss 0.00157154, throughput 2.81027K wps
Begin Testing...
[Epoch 125] train avg loss 0.00132245, dev acc 0.8438, dev avg loss 0.504569, throughput 2.75179K wps
[Epoch 126 Batch 30/172] avg loss 0.00127882, throughput 2.83066K wps
[Epoch 126 Batch 60/172] avg loss 0.0013382, throughput 2.81288K wps
[Epoch 126 Batch 90/172] avg loss 0.00147315, throughput 2.8022K wps
[Epoch 126 Batch 120/172] avg loss 0.00131263, throughput 2.80159K wps
[Epoch 126 Batch 150/172] avg loss 0.00134174, throughput 2.80299K wps
Begin Testing...
[Epoch 126] train avg loss 0.00130546, dev acc 0.8522, dev avg loss 0.53097, throughput 2.81139K wps
[Epoch 127 Batch 30/172] avg loss 0.00125081, throughput 2.88073K wps
[Epoch 127 Batch 60/172] avg loss 0.00136642, throughput 2.8049K wps
[Epoch 127 Batch 90/172] avg loss 0.00114614, throughput 2.78915K wps
[Epoch 127 Batch 120/172] avg loss 0.00138103, throughput 2.7932K wps
[Epoch 127 Batch 150/172] avg loss 0.0010781, throughput 2.80189K wps
Begin Testing...
[Epoch 127] train avg loss 0.00125644, dev acc 0.8449, dev avg loss 0.515012, throughput 2.81228K wps
[Epoch 128 Batch 30/172] avg loss 0.00100452, throughput 2.77112K wps
[Epoch 128 Batch 60/172] avg loss 0.00114813, throughput 2.83788K wps
[Epoch 128 Batch 90/172] avg loss 0.00134955, throughput 2.71316K wps
[Epoch 128 Batch 120/172] avg loss 0.00128862, throughput 2.77597K wps
[Epoch 128 Batch 150/172] avg loss 0.00146039, throughput 2.77983K wps
Begin Testing...
[Epoch 128] train avg loss 0.00125894, dev acc 0.8438, dev avg loss 0.504302, throughput 2.76951K wps
[Epoch 129 Batch 30/172] avg loss 0.00127291, throughput 2.74683K wps
[Epoch 129 Batch 60/172] avg loss 0.0012614, throughput 2.72872K wps
[Epoch 129 Batch 90/172] avg loss 0.00113492, throughput 2.76591K wps
[Epoch 129 Batch 120/172] avg loss 0.00114825, throughput 2.77844K wps
[Epoch 129 Batch 150/172] avg loss 0.00134204, throughput 2.67657K wps
Begin Testing...
[Epoch 129] train avg loss 0.00125687, dev acc 0.8627, dev avg loss 0.56358, throughput 2.74392K wps
Observed Improvement.
Begin Testing...
[Epoch 130 Batch 30/172] avg loss 0.00104253, throughput 2.80468K wps
[Epoch 130 Batch 60/172] avg loss 0.00120995, throughput 2.73943K wps
[Epoch 130 Batch 90/172] avg loss 0.00125671, throughput 2.77781K wps
[Epoch 130 Batch 120/172] avg loss 0.00140732, throughput 2.71479K wps
[Epoch 130 Batch 150/172] avg loss 0.0012539, throughput 2.82905K wps
Begin Testing...
[Epoch 130] train avg loss 0.00123537, dev acc 0.8459, dev avg loss 0.519006, throughput 2.75144K wps
[Epoch 131 Batch 30/172] avg loss 0.00129485, throughput 2.86253K wps
[Epoch 131 Batch 60/172] avg loss 0.00112352, throughput 2.74346K wps
[Epoch 131 Batch 90/172] avg loss 0.00128166, throughput 2.80356K wps
[Epoch 131 Batch 120/172] avg loss 0.00134968, throughput 2.81023K wps
[Epoch 131 Batch 150/172] avg loss 0.00128287, throughput 2.8401K wps
Begin Testing...
[Epoch 131] train avg loss 0.00127711, dev acc 0.8480, dev avg loss 0.529986, throughput 2.81251K wps
[Epoch 132 Batch 30/172] avg loss 0.00132039, throughput 2.76695K wps
[Epoch 132 Batch 60/172] avg loss 0.00139956, throughput 2.81434K wps
[Epoch 132 Batch 90/172] avg loss 0.0012546, throughput 2.83294K wps
[Epoch 132 Batch 120/172] avg loss 0.000999553, throughput 2.76687K wps
[Epoch 132 Batch 150/172] avg loss 0.00110922, throughput 2.77838K wps
Begin Testing...
[Epoch 132] train avg loss 0.0012176, dev acc 0.8532, dev avg loss 0.532873, throughput 2.79356K wps
[Epoch 133 Batch 30/172] avg loss 0.0012127, throughput 2.67678K wps
[Epoch 133 Batch 60/172] avg loss 0.00132943, throughput 2.8491K wps
[Epoch 133 Batch 90/172] avg loss 0.00111204, throughput 2.68336K wps
[Epoch 133 Batch 120/172] avg loss 0.00125411, throughput 2.72003K wps
[Epoch 133 Batch 150/172] avg loss 0.00126494, throughput 2.74819K wps
Begin Testing...
[Epoch 133] train avg loss 0.0012421, dev acc 0.8532, dev avg loss 0.529878, throughput 2.73553K wps
[Epoch 134 Batch 30/172] avg loss 0.00118771, throughput 2.80391K wps
[Epoch 134 Batch 60/172] avg loss 0.00106597, throughput 2.79363K wps
[Epoch 134 Batch 90/172] avg loss 0.00121101, throughput 2.7016K wps
[Epoch 134 Batch 120/172] avg loss 0.00125361, throughput 2.69613K wps
[Epoch 134 Batch 150/172] avg loss 0.00154804, throughput 2.75793K wps
Begin Testing...
[Epoch 134] train avg loss 0.00124197, dev acc 0.8501, dev avg loss 0.527287, throughput 2.75607K wps
[Epoch 135 Batch 30/172] avg loss 0.00120278, throughput 2.85478K wps
[Epoch 135 Batch 60/172] avg loss 0.0011765, throughput 2.80202K wps
[Epoch 135 Batch 90/172] avg loss 0.000917657, throughput 2.74103K wps
[Epoch 135 Batch 120/172] avg loss 0.00111865, throughput 2.79684K wps
[Epoch 135 Batch 150/172] avg loss 0.00152156, throughput 2.77955K wps
Begin Testing...
[Epoch 135] train avg loss 0.00119733, dev acc 0.8491, dev avg loss 0.533372, throughput 2.79348K wps
[Epoch 136 Batch 30/172] avg loss 0.00107363, throughput 2.75366K wps
[Epoch 136 Batch 60/172] avg loss 0.00133915, throughput 2.64023K wps
[Epoch 136 Batch 90/172] avg loss 0.000969411, throughput 2.62756K wps
[Epoch 136 Batch 120/172] avg loss 0.00126374, throughput 2.80991K wps
[Epoch 136 Batch 150/172] avg loss 0.00132362, throughput 2.80347K wps
Begin Testing...
[Epoch 136] train avg loss 0.00120454, dev acc 0.8501, dev avg loss 0.54217, throughput 2.73672K wps
[Epoch 137 Batch 30/172] avg loss 0.00117114, throughput 2.83175K wps
[Epoch 137 Batch 60/172] avg loss 0.00103555, throughput 2.82066K wps
[Epoch 137 Batch 90/172] avg loss 0.00117356, throughput 2.79347K wps
[Epoch 137 Batch 120/172] avg loss 0.00123688, throughput 2.76978K wps
[Epoch 137 Batch 150/172] avg loss 0.00115717, throughput 2.73133K wps
Begin Testing...
[Epoch 137] train avg loss 0.00118796, dev acc 0.8428, dev avg loss 0.525475, throughput 2.79027K wps
[Epoch 138 Batch 30/172] avg loss 0.000961918, throughput 2.64929K wps
[Epoch 138 Batch 60/172] avg loss 0.00111471, throughput 2.84543K wps
[Epoch 138 Batch 90/172] avg loss 0.00111208, throughput 2.80889K wps
[Epoch 138 Batch 120/172] avg loss 0.00139795, throughput 2.83484K wps
[Epoch 138 Batch 150/172] avg loss 0.00127707, throughput 2.8269K wps
Begin Testing...
[Epoch 138] train avg loss 0.00115178, dev acc 0.8512, dev avg loss 0.545775, throughput 2.79695K wps
[Epoch 139 Batch 30/172] avg loss 0.00103981, throughput 2.86443K wps
[Epoch 139 Batch 60/172] avg loss 0.00125212, throughput 2.80158K wps
[Epoch 139 Batch 90/172] avg loss 0.000996743, throughput 2.80846K wps
[Epoch 139 Batch 120/172] avg loss 0.00122858, throughput 2.81064K wps
[Epoch 139 Batch 150/172] avg loss 0.00125045, throughput 2.78787K wps
Begin Testing...
[Epoch 139] train avg loss 0.00119916, dev acc 0.8553, dev avg loss 0.574703, throughput 2.81143K wps
[Epoch 140 Batch 30/172] avg loss 0.000907177, throughput 2.72674K wps
[Epoch 140 Batch 60/172] avg loss 0.00123156, throughput 2.73631K wps
[Epoch 140 Batch 90/172] avg loss 0.00099573, throughput 2.78798K wps
[Epoch 140 Batch 120/172] avg loss 0.00116607, throughput 2.66184K wps
[Epoch 140 Batch 150/172] avg loss 0.00140521, throughput 2.77438K wps
Begin Testing...
[Epoch 140] train avg loss 0.00116982, dev acc 0.8480, dev avg loss 0.537768, throughput 2.74699K wps
[Epoch 141 Batch 30/172] avg loss 0.00105752, throughput 2.81541K wps
[Epoch 141 Batch 60/172] avg loss 0.00102398, throughput 2.72957K wps
[Epoch 141 Batch 90/172] avg loss 0.00117083, throughput 2.77363K wps
[Epoch 141 Batch 120/172] avg loss 0.00131509, throughput 2.82627K wps
[Epoch 141 Batch 150/172] avg loss 0.00127301, throughput 2.83047K wps
Begin Testing...
[Epoch 141] train avg loss 0.00117435, dev acc 0.8470, dev avg loss 0.538548, throughput 2.79685K wps
[Epoch 142 Batch 30/172] avg loss 0.00108601, throughput 2.68031K wps
[Epoch 142 Batch 60/172] avg loss 0.00102931, throughput 2.89402K wps
[Epoch 142 Batch 90/172] avg loss 0.00114781, throughput 2.84313K wps
[Epoch 142 Batch 120/172] avg loss 0.00114097, throughput 2.82396K wps
[Epoch 142 Batch 150/172] avg loss 0.000949148, throughput 2.84231K wps
Begin Testing...
[Epoch 142] train avg loss 0.00116941, dev acc 0.8407, dev avg loss 0.526628, throughput 2.81871K wps
[Epoch 143 Batch 30/172] avg loss 0.00116425, throughput 2.8833K wps
[Epoch 143 Batch 60/172] avg loss 0.0010636, throughput 2.69524K wps
[Epoch 143 Batch 90/172] avg loss 0.00108915, throughput 2.78683K wps
[Epoch 143 Batch 120/172] avg loss 0.00127657, throughput 2.86422K wps
[Epoch 143 Batch 150/172] avg loss 0.00137158, throughput 2.80574K wps
Begin Testing...
[Epoch 143] train avg loss 0.00116092, dev acc 0.8491, dev avg loss 0.548825, throughput 2.79533K wps
[Epoch 144 Batch 30/172] avg loss 0.0010496, throughput 2.78486K wps
[Epoch 144 Batch 60/172] avg loss 0.00105601, throughput 2.77066K wps
[Epoch 144 Batch 90/172] avg loss 0.00121178, throughput 2.79978K wps
[Epoch 144 Batch 120/172] avg loss 0.00109008, throughput 2.84113K wps
[Epoch 144 Batch 150/172] avg loss 0.00129771, throughput 2.82578K wps
Begin Testing...
[Epoch 144] train avg loss 0.00114651, dev acc 0.8480, dev avg loss 0.55525, throughput 2.80861K wps
[Epoch 145 Batch 30/172] avg loss 0.000892761, throughput 2.82271K wps
[Epoch 145 Batch 60/172] avg loss 0.00116008, throughput 2.80967K wps
[Epoch 145 Batch 90/172] avg loss 0.00110011, throughput 2.77397K wps
[Epoch 145 Batch 120/172] avg loss 0.00108628, throughput 2.83405K wps
[Epoch 145 Batch 150/172] avg loss 0.00128204, throughput 2.82656K wps
Begin Testing...
[Epoch 145] train avg loss 0.00112926, dev acc 0.8480, dev avg loss 0.549864, throughput 2.81619K wps
[Epoch 146 Batch 30/172] avg loss 0.00112927, throughput 2.80279K wps
[Epoch 146 Batch 60/172] avg loss 0.000988192, throughput 2.84064K wps
[Epoch 146 Batch 90/172] avg loss 0.00114271, throughput 2.83417K wps
[Epoch 146 Batch 120/172] avg loss 0.00117124, throughput 2.80276K wps
[Epoch 146 Batch 150/172] avg loss 0.00130219, throughput 2.76435K wps
Begin Testing...
[Epoch 146] train avg loss 0.00115186, dev acc 0.8459, dev avg loss 0.54571, throughput 2.80492K wps
[Epoch 147 Batch 30/172] avg loss 0.00109639, throughput 2.73288K wps
[Epoch 147 Batch 60/172] avg loss 0.00109686, throughput 2.77443K wps
[Epoch 147 Batch 90/172] avg loss 0.00103692, throughput 2.73485K wps
[Epoch 147 Batch 120/172] avg loss 0.00107843, throughput 2.688K wps
[Epoch 147 Batch 150/172] avg loss 0.00128213, throughput 2.76874K wps
Begin Testing...
[Epoch 147] train avg loss 0.00112218, dev acc 0.8459, dev avg loss 0.546597, throughput 2.74301K wps
[Epoch 148 Batch 30/172] avg loss 0.000872697, throughput 2.83477K wps
[Epoch 148 Batch 60/172] avg loss 0.00104002, throughput 2.72331K wps
[Epoch 148 Batch 90/172] avg loss 0.00132328, throughput 2.72967K wps
[Epoch 148 Batch 120/172] avg loss 0.00105306, throughput 2.66911K wps
[Epoch 148 Batch 150/172] avg loss 0.00116314, throughput 2.81199K wps
Begin Testing...
[Epoch 148] train avg loss 0.00109156, dev acc 0.8480, dev avg loss 0.564226, throughput 2.75161K wps
[Epoch 149 Batch 30/172] avg loss 0.00101001, throughput 2.87566K wps
[Epoch 149 Batch 60/172] avg loss 0.00103395, throughput 2.78116K wps
[Epoch 149 Batch 90/172] avg loss 0.00119277, throughput 2.80251K wps
[Epoch 149 Batch 120/172] avg loss 0.00103026, throughput 2.80844K wps
[Epoch 149 Batch 150/172] avg loss 0.00126741, throughput 2.80154K wps
Begin Testing...
[Epoch 149] train avg loss 0.00108578, dev acc 0.8470, dev avg loss 0.555052, throughput 2.79997K wps
[Epoch 150 Batch 30/172] avg loss 0.00112123, throughput 2.84526K wps
[Epoch 150 Batch 60/172] avg loss 0.00120428, throughput 2.79071K wps
[Epoch 150 Batch 90/172] avg loss 0.00124506, throughput 2.78625K wps
[Epoch 150 Batch 120/172] avg loss 0.000929322, throughput 2.79733K wps
[Epoch 150 Batch 150/172] avg loss 0.000949418, throughput 2.75498K wps
Begin Testing...
[Epoch 150] train avg loss 0.00113141, dev acc 0.8386, dev avg loss 0.538553, throughput 2.79152K wps
[Epoch 151 Batch 30/172] avg loss 0.00106638, throughput 2.8136K wps
[Epoch 151 Batch 60/172] avg loss 0.000969445, throughput 2.70649K wps
[Epoch 151 Batch 90/172] avg loss 0.00113403, throughput 2.73485K wps
[Epoch 151 Batch 120/172] avg loss 0.000912069, throughput 2.76775K wps
[Epoch 151 Batch 150/172] avg loss 0.00131975, throughput 2.7922K wps
Begin Testing...
[Epoch 151] train avg loss 0.00107198, dev acc 0.8428, dev avg loss 0.544654, throughput 2.76574K wps
[Epoch 152 Batch 30/172] avg loss 0.00106815, throughput 2.88663K wps
[Epoch 152 Batch 60/172] avg loss 0.000987178, throughput 2.7363K wps
[Epoch 152 Batch 90/172] avg loss 0.00106266, throughput 2.851K wps
[Epoch 152 Batch 120/172] avg loss 0.000957282, throughput 2.81595K wps
[Epoch 152 Batch 150/172] avg loss 0.00104319, throughput 2.79808K wps
Begin Testing...
[Epoch 152] train avg loss 0.00104577, dev acc 0.8417, dev avg loss 0.553887, throughput 2.81716K wps
[Epoch 153 Batch 30/172] avg loss 0.000979902, throughput 2.75327K wps
[Epoch 153 Batch 60/172] avg loss 0.00108708, throughput 2.82087K wps
[Epoch 153 Batch 90/172] avg loss 0.000976649, throughput 2.80183K wps
[Epoch 153 Batch 120/172] avg loss 0.00102102, throughput 2.82004K wps
[Epoch 153 Batch 150/172] avg loss 0.00144386, throughput 2.78842K wps
Begin Testing...
[Epoch 153] train avg loss 0.00109608, dev acc 0.8459, dev avg loss 0.554825, throughput 2.80128K wps
[Epoch 154 Batch 30/172] avg loss 0.00103166, throughput 2.85845K wps
[Epoch 154 Batch 60/172] avg loss 0.000936495, throughput 2.82241K wps
[Epoch 154 Batch 90/172] avg loss 0.0010561, throughput 2.7739K wps
[Epoch 154 Batch 120/172] avg loss 0.00110294, throughput 2.84969K wps
[Epoch 154 Batch 150/172] avg loss 0.0010161, throughput 2.82355K wps
Begin Testing...
[Epoch 154] train avg loss 0.00107406, dev acc 0.8470, dev avg loss 0.557366, throughput 2.8238K wps
[Epoch 155 Batch 30/172] avg loss 0.00104735, throughput 2.83515K wps
[Epoch 155 Batch 60/172] avg loss 0.000943673, throughput 2.82718K wps
[Epoch 155 Batch 90/172] avg loss 0.00128282, throughput 2.8061K wps
[Epoch 155 Batch 120/172] avg loss 0.000926361, throughput 2.82328K wps
[Epoch 155 Batch 150/172] avg loss 0.000892859, throughput 2.70268K wps
Begin Testing...
[Epoch 155] train avg loss 0.00106164, dev acc 0.8470, dev avg loss 0.558218, throughput 2.7948K wps
[Epoch 156 Batch 30/172] avg loss 0.000939164, throughput 2.69291K wps
[Epoch 156 Batch 60/172] avg loss 0.00111452, throughput 2.8177K wps
[Epoch 156 Batch 90/172] avg loss 0.000964252, throughput 2.81718K wps
[Epoch 156 Batch 120/172] avg loss 0.00104562, throughput 2.82662K wps
[Epoch 156 Batch 150/172] avg loss 0.00112173, throughput 2.82658K wps
Begin Testing...
[Epoch 156] train avg loss 0.00102303, dev acc 0.8470, dev avg loss 0.572051, throughput 2.80051K wps
[Epoch 157 Batch 30/172] avg loss 0.000978403, throughput 2.90471K wps
[Epoch 157 Batch 60/172] avg loss 0.000907415, throughput 2.8418K wps
[Epoch 157 Batch 90/172] avg loss 0.00125089, throughput 2.80891K wps
[Epoch 157 Batch 120/172] avg loss 0.00101393, throughput 2.76423K wps
[Epoch 157 Batch 150/172] avg loss 0.00119631, throughput 2.8985K wps
Begin Testing...
[Epoch 157] train avg loss 0.00107893, dev acc 0.8459, dev avg loss 0.562369, throughput 2.82692K wps
[Epoch 158 Batch 30/172] avg loss 0.000957545, throughput 2.8808K wps
[Epoch 158 Batch 60/172] avg loss 0.000978831, throughput 2.78382K wps
[Epoch 158 Batch 90/172] avg loss 0.00121445, throughput 2.78499K wps
[Epoch 158 Batch 120/172] avg loss 0.000963657, throughput 2.78706K wps
[Epoch 158 Batch 150/172] avg loss 0.00101303, throughput 2.74982K wps
Begin Testing...
[Epoch 158] train avg loss 0.00103681, dev acc 0.8501, dev avg loss 0.565034, throughput 2.7769K wps
[Epoch 159 Batch 30/172] avg loss 0.000963313, throughput 2.85434K wps
[Epoch 159 Batch 60/172] avg loss 0.00126617, throughput 2.74226K wps
[Epoch 159 Batch 90/172] avg loss 0.00100338, throughput 2.79994K wps
[Epoch 159 Batch 120/172] avg loss 0.00122765, throughput 2.79836K wps
[Epoch 159 Batch 150/172] avg loss 0.000980455, throughput 2.78026K wps
Begin Testing...
[Epoch 159] train avg loss 0.00106929, dev acc 0.8470, dev avg loss 0.569488, throughput 2.79398K wps
[Epoch 160 Batch 30/172] avg loss 0.000896183, throughput 2.81028K wps
[Epoch 160 Batch 60/172] avg loss 0.00106516, throughput 2.67387K wps
[Epoch 160 Batch 90/172] avg loss 0.000987925, throughput 2.65509K wps
[Epoch 160 Batch 120/172] avg loss 0.00123569, throughput 2.71048K wps
[Epoch 160 Batch 150/172] avg loss 0.000922348, throughput 2.75852K wps
Begin Testing...
[Epoch 160] train avg loss 0.00102622, dev acc 0.8480, dev avg loss 0.566657, throughput 2.71651K wps
[Epoch 161 Batch 30/172] avg loss 0.000958526, throughput 2.82248K wps
[Epoch 161 Batch 60/172] avg loss 0.00105611, throughput 2.79221K wps
[Epoch 161 Batch 90/172] avg loss 0.00119219, throughput 2.75344K wps
[Epoch 161 Batch 120/172] avg loss 0.00104305, throughput 2.7859K wps
[Epoch 161 Batch 150/172] avg loss 0.0012327, throughput 2.70197K wps
Begin Testing...
[Epoch 161] train avg loss 0.0010795, dev acc 0.8501, dev avg loss 0.562325, throughput 2.76773K wps
[Epoch 162 Batch 30/172] avg loss 0.000884172, throughput 2.76886K wps
[Epoch 162 Batch 60/172] avg loss 0.00107107, throughput 2.7369K wps
[Epoch 162 Batch 90/172] avg loss 0.00118764, throughput 2.76349K wps
[Epoch 162 Batch 120/172] avg loss 0.00113304, throughput 2.76121K wps
[Epoch 162 Batch 150/172] avg loss 0.000976019, throughput 2.78646K wps
Begin Testing...
[Epoch 162] train avg loss 0.00103034, dev acc 0.8470, dev avg loss 0.570814, throughput 2.76378K wps
[Epoch 163 Batch 30/172] avg loss 0.00101427, throughput 2.72095K wps
[Epoch 163 Batch 60/172] avg loss 0.00110325, throughput 2.74295K wps
[Epoch 163 Batch 90/172] avg loss 0.000896279, throughput 2.77173K wps
[Epoch 163 Batch 120/172] avg loss 0.000817928, throughput 2.69064K wps
[Epoch 163 Batch 150/172] avg loss 0.00112862, throughput 2.78297K wps
Begin Testing...
[Epoch 163] train avg loss 0.000999954, dev acc 0.8438, dev avg loss 0.565127, throughput 2.74451K wps
[Epoch 164 Batch 30/172] avg loss 0.000651699, throughput 2.80382K wps
[Epoch 164 Batch 60/172] avg loss 0.00105789, throughput 2.75201K wps
[Epoch 164 Batch 90/172] avg loss 0.0010887, throughput 2.76078K wps
[Epoch 164 Batch 120/172] avg loss 0.00120974, throughput 2.82887K wps
[Epoch 164 Batch 150/172] avg loss 0.000978455, throughput 2.83084K wps
Begin Testing...
[Epoch 164] train avg loss 0.00103298, dev acc 0.8459, dev avg loss 0.568612, throughput 2.79492K wps
[Epoch 165 Batch 30/172] avg loss 0.000872494, throughput 2.71995K wps
[Epoch 165 Batch 60/172] avg loss 0.00086021, throughput 2.78141K wps
[Epoch 165 Batch 90/172] avg loss 0.00111058, throughput 2.82734K wps
[Epoch 165 Batch 120/172] avg loss 0.00116141, throughput 2.8181K wps
[Epoch 165 Batch 150/172] avg loss 0.000946216, throughput 2.83329K wps
Begin Testing...
[Epoch 165] train avg loss 0.00101836, dev acc 0.8501, dev avg loss 0.58441, throughput 2.7982K wps
[Epoch 166 Batch 30/172] avg loss 0.00106947, throughput 2.87041K wps
[Epoch 166 Batch 60/172] avg loss 0.00096341, throughput 2.70626K wps
[Epoch 166 Batch 90/172] avg loss 0.000984825, throughput 2.81973K wps
[Epoch 166 Batch 120/172] avg loss 0.000930136, throughput 2.82705K wps
[Epoch 166 Batch 150/172] avg loss 0.00106846, throughput 2.8104K wps
Begin Testing...
[Epoch 166] train avg loss 0.0010178, dev acc 0.8491, dev avg loss 0.575923, throughput 2.81019K wps
[Epoch 167 Batch 30/172] avg loss 0.000891206, throughput 2.90038K wps
[Epoch 167 Batch 60/172] avg loss 0.000928676, throughput 2.80433K wps
[Epoch 167 Batch 90/172] avg loss 0.000838786, throughput 2.81723K wps
[Epoch 167 Batch 120/172] avg loss 0.00111921, throughput 2.82311K wps
[Epoch 167 Batch 150/172] avg loss 0.000935186, throughput 2.83757K wps
Begin Testing...
[Epoch 167] train avg loss 0.000972826, dev acc 0.8417, dev avg loss 0.563355, throughput 2.83295K wps
[Epoch 168 Batch 30/172] avg loss 0.000921359, throughput 2.84318K wps
[Epoch 168 Batch 60/172] avg loss 0.000775784, throughput 2.83483K wps
[Epoch 168 Batch 90/172] avg loss 0.0010189, throughput 2.73287K wps
[Epoch 168 Batch 120/172] avg loss 0.0010835, throughput 2.88065K wps
[Epoch 168 Batch 150/172] avg loss 0.000935866, throughput 2.79897K wps
Begin Testing...
[Epoch 168] train avg loss 0.00097192, dev acc 0.8449, dev avg loss 0.574112, throughput 2.8129K wps
[Epoch 169 Batch 30/172] avg loss 0.00097475, throughput 2.86598K wps
[Epoch 169 Batch 60/172] avg loss 0.000899198, throughput 2.76667K wps
[Epoch 169 Batch 90/172] avg loss 0.000912795, throughput 2.69292K wps
[Epoch 169 Batch 120/172] avg loss 0.00114375, throughput 2.76333K wps
[Epoch 169 Batch 150/172] avg loss 0.000974283, throughput 2.75411K wps
Begin Testing...
[Epoch 169] train avg loss 0.000992687, dev acc 0.8501, dev avg loss 0.574197, throughput 2.76936K wps
[Epoch 170 Batch 30/172] avg loss 0.000968187, throughput 2.81754K wps
[Epoch 170 Batch 60/172] avg loss 0.000854627, throughput 2.8639K wps
[Epoch 170 Batch 90/172] avg loss 0.000798741, throughput 2.82036K wps
[Epoch 170 Batch 120/172] avg loss 0.00110643, throughput 2.82212K wps
[Epoch 170 Batch 150/172] avg loss 0.00107434, throughput 2.79205K wps
Begin Testing...
[Epoch 170] train avg loss 0.000985249, dev acc 0.8470, dev avg loss 0.571574, throughput 2.80306K wps
[Epoch 171 Batch 30/172] avg loss 0.000918363, throughput 2.87767K wps
[Epoch 171 Batch 60/172] avg loss 0.000898663, throughput 2.82053K wps
[Epoch 171 Batch 90/172] avg loss 0.000914834, throughput 2.80132K wps
[Epoch 171 Batch 120/172] avg loss 0.00128041, throughput 2.80664K wps
[Epoch 171 Batch 150/172] avg loss 0.000902943, throughput 2.72987K wps
Begin Testing...
[Epoch 171] train avg loss 0.00100686, dev acc 0.8438, dev avg loss 0.576533, throughput 2.80748K wps
[Epoch 172 Batch 30/172] avg loss 0.000862541, throughput 2.75402K wps
[Epoch 172 Batch 60/172] avg loss 0.000981187, throughput 2.75589K wps
[Epoch 172 Batch 90/172] avg loss 0.00115705, throughput 2.75486K wps
[Epoch 172 Batch 120/172] avg loss 0.000974359, throughput 2.75656K wps
[Epoch 172 Batch 150/172] avg loss 0.000960964, throughput 2.72046K wps
Begin Testing...
[Epoch 172] train avg loss 0.000985906, dev acc 0.8459, dev avg loss 0.575187, throughput 2.74612K wps
[Epoch 173 Batch 30/172] avg loss 0.000868353, throughput 2.73365K wps
[Epoch 173 Batch 60/172] avg loss 0.000873119, throughput 2.67132K wps
[Epoch 173 Batch 90/172] avg loss 0.00111357, throughput 2.71384K wps
[Epoch 173 Batch 120/172] avg loss 0.000822759, throughput 2.72842K wps
[Epoch 173 Batch 150/172] avg loss 0.0010423, throughput 2.75726K wps
Begin Testing...
[Epoch 173] train avg loss 0.000978655, dev acc 0.8491, dev avg loss 0.603581, throughput 2.7169K wps
[Epoch 174 Batch 30/172] avg loss 0.000908358, throughput 2.68043K wps
[Epoch 174 Batch 60/172] avg loss 0.00085952, throughput 2.77708K wps
[Epoch 174 Batch 90/172] avg loss 0.000910524, throughput 2.68012K wps
[Epoch 174 Batch 120/172] avg loss 0.000953097, throughput 2.77561K wps
[Epoch 174 Batch 150/172] avg loss 0.00113775, throughput 2.69379K wps
Begin Testing...
[Epoch 174] train avg loss 0.000952048, dev acc 0.8470, dev avg loss 0.587249, throughput 2.71456K wps
[Epoch 175 Batch 30/172] avg loss 0.000940039, throughput 2.73226K wps
[Epoch 175 Batch 60/172] avg loss 0.000838438, throughput 2.70121K wps
[Epoch 175 Batch 90/172] avg loss 0.000946834, throughput 2.69571K wps
[Epoch 175 Batch 120/172] avg loss 0.00109747, throughput 2.77772K wps
[Epoch 175 Batch 150/172] avg loss 0.00102949, throughput 2.70614K wps
Begin Testing...
[Epoch 175] train avg loss 0.000968819, dev acc 0.8438, dev avg loss 0.579456, throughput 2.73156K wps
[Epoch 176 Batch 30/172] avg loss 0.00105406, throughput 2.70619K wps
[Epoch 176 Batch 60/172] avg loss 0.000892113, throughput 2.68718K wps
[Epoch 176 Batch 90/172] avg loss 0.000830166, throughput 2.83099K wps
[Epoch 176 Batch 120/172] avg loss 0.00105255, throughput 2.70807K wps
[Epoch 176 Batch 150/172] avg loss 0.00105835, throughput 2.84509K wps
Begin Testing...
[Epoch 176] train avg loss 0.000965043, dev acc 0.8470, dev avg loss 0.584967, throughput 2.76516K wps
[Epoch 177 Batch 30/172] avg loss 0.00104776, throughput 2.89919K wps
[Epoch 177 Batch 60/172] avg loss 0.000829644, throughput 2.81835K wps
[Epoch 177 Batch 90/172] avg loss 0.000921871, throughput 2.82032K wps
[Epoch 177 Batch 120/172] avg loss 0.00110123, throughput 2.82009K wps
[Epoch 177 Batch 150/172] avg loss 0.000902682, throughput 2.71949K wps
Begin Testing...
[Epoch 177] train avg loss 0.000955807, dev acc 0.8512, dev avg loss 0.622547, throughput 2.81939K wps
[Epoch 178 Batch 30/172] avg loss 0.000965378, throughput 2.88808K wps
[Epoch 178 Batch 60/172] avg loss 0.000846312, throughput 2.74856K wps
[Epoch 178 Batch 90/172] avg loss 0.00104964, throughput 2.81792K wps
[Epoch 178 Batch 120/172] avg loss 0.000825329, throughput 2.81677K wps
[Epoch 178 Batch 150/172] avg loss 0.000747604, throughput 2.80278K wps
Begin Testing...
[Epoch 178] train avg loss 0.000930798, dev acc 0.8459, dev avg loss 0.584035, throughput 2.79524K wps
[Epoch 179 Batch 30/172] avg loss 0.000780939, throughput 2.83682K wps
[Epoch 179 Batch 60/172] avg loss 0.00104028, throughput 2.69321K wps
[Epoch 179 Batch 90/172] avg loss 0.00098449, throughput 2.72054K wps
[Epoch 179 Batch 120/172] avg loss 0.000874144, throughput 2.76952K wps
[Epoch 179 Batch 150/172] avg loss 0.00101729, throughput 2.81238K wps
Begin Testing...
[Epoch 179] train avg loss 0.000950436, dev acc 0.8491, dev avg loss 0.614846, throughput 2.75813K wps
[Epoch 180 Batch 30/172] avg loss 0.000951086, throughput 2.88062K wps
[Epoch 180 Batch 60/172] avg loss 0.00080088, throughput 2.8128K wps
[Epoch 180 Batch 90/172] avg loss 0.000943583, throughput 2.77305K wps
[Epoch 180 Batch 120/172] avg loss 0.00105253, throughput 2.8731K wps
[Epoch 180 Batch 150/172] avg loss 0.000894192, throughput 2.8316K wps
Begin Testing...
[Epoch 180] train avg loss 0.000959013, dev acc 0.8491, dev avg loss 0.590396, throughput 2.82846K wps
[Epoch 181 Batch 30/172] avg loss 0.000811486, throughput 2.85127K wps
[Epoch 181 Batch 60/172] avg loss 0.00106819, throughput 2.82211K wps
[Epoch 181 Batch 90/172] avg loss 0.000852337, throughput 2.83174K wps
[Epoch 181 Batch 120/172] avg loss 0.000995107, throughput 2.81582K wps
[Epoch 181 Batch 150/172] avg loss 0.00101429, throughput 2.84033K wps
Begin Testing...
[Epoch 181] train avg loss 0.000954661, dev acc 0.8543, dev avg loss 0.602611, throughput 2.82239K wps
[Epoch 182 Batch 30/172] avg loss 0.000989367, throughput 2.85151K wps
[Epoch 182 Batch 60/172] avg loss 0.00103781, throughput 2.8035K wps
[Epoch 182 Batch 90/172] avg loss 0.000833404, throughput 2.79901K wps
[Epoch 182 Batch 120/172] avg loss 0.000847888, throughput 2.8314K wps
[Epoch 182 Batch 150/172] avg loss 0.000978465, throughput 2.79849K wps
Begin Testing...
[Epoch 182] train avg loss 0.000918059, dev acc 0.8501, dev avg loss 0.603869, throughput 2.80114K wps
[Epoch 183 Batch 30/172] avg loss 0.000822235, throughput 2.85357K wps
[Epoch 183 Batch 60/172] avg loss 0.00096846, throughput 2.82615K wps
[Epoch 183 Batch 90/172] avg loss 0.000903342, throughput 2.80472K wps
[Epoch 183 Batch 120/172] avg loss 0.000942011, throughput 2.7028K wps
[Epoch 183 Batch 150/172] avg loss 0.000775541, throughput 2.77327K wps
Begin Testing...
[Epoch 183] train avg loss 0.000896337, dev acc 0.8522, dev avg loss 0.599652, throughput 2.78706K wps
[Epoch 184 Batch 30/172] avg loss 0.00106664, throughput 2.79722K wps
[Epoch 184 Batch 60/172] avg loss 0.000857227, throughput 2.77465K wps
[Epoch 184 Batch 90/172] avg loss 0.000800851, throughput 2.76161K wps
[Epoch 184 Batch 120/172] avg loss 0.00101755, throughput 2.74429K wps
[Epoch 184 Batch 150/172] avg loss 0.000813409, throughput 2.77348K wps
Begin Testing...
[Epoch 184] train avg loss 0.000918978, dev acc 0.8480, dev avg loss 0.607956, throughput 2.75783K wps
[Epoch 185 Batch 30/172] avg loss 0.000742604, throughput 2.83033K wps
[Epoch 185 Batch 60/172] avg loss 0.000825443, throughput 2.74657K wps
[Epoch 185 Batch 90/172] avg loss 0.000845865, throughput 2.75192K wps
[Epoch 185 Batch 120/172] avg loss 0.000911936, throughput 2.76669K wps
[Epoch 185 Batch 150/172] avg loss 0.00120993, throughput 2.72306K wps
Begin Testing...
[Epoch 185] train avg loss 0.000931038, dev acc 0.8470, dev avg loss 0.602948, throughput 2.75274K wps
[Epoch 186 Batch 30/172] avg loss 0.000876207, throughput 2.77306K wps
[Epoch 186 Batch 60/172] avg loss 0.000750352, throughput 2.7568K wps
[Epoch 186 Batch 90/172] avg loss 0.00098817, throughput 2.78408K wps
[Epoch 186 Batch 120/172] avg loss 0.000856023, throughput 2.69589K wps
[Epoch 186 Batch 150/172] avg loss 0.000998309, throughput 2.79239K wps
Begin Testing...
[Epoch 186] train avg loss 0.000903912, dev acc 0.8501, dev avg loss 0.659189, throughput 2.7512K wps
[Epoch 187 Batch 30/172] avg loss 0.000915582, throughput 2.78537K wps
[Epoch 187 Batch 60/172] avg loss 0.00083595, throughput 2.67757K wps
[Epoch 187 Batch 90/172] avg loss 0.000793271, throughput 2.76678K wps
[Epoch 187 Batch 120/172] avg loss 0.00080959, throughput 2.64156K wps
[Epoch 187 Batch 150/172] avg loss 0.00104819, throughput 2.82097K wps
Begin Testing...
[Epoch 187] train avg loss 0.00089903, dev acc 0.8491, dev avg loss 0.614071, throughput 2.74222K wps
[Epoch 188 Batch 30/172] avg loss 0.00103196, throughput 2.83593K wps
[Epoch 188 Batch 60/172] avg loss 0.000878909, throughput 2.77454K wps
[Epoch 188 Batch 90/172] avg loss 0.000813681, throughput 2.76215K wps
[Epoch 188 Batch 120/172] avg loss 0.000904173, throughput 2.75922K wps
[Epoch 188 Batch 150/172] avg loss 0.00076131, throughput 2.66165K wps
Begin Testing...
[Epoch 188] train avg loss 0.000909037, dev acc 0.8501, dev avg loss 0.609645, throughput 2.75877K wps
[Epoch 189 Batch 30/172] avg loss 0.000824921, throughput 2.84645K wps
[Epoch 189 Batch 60/172] avg loss 0.00109485, throughput 2.71594K wps
[Epoch 189 Batch 90/172] avg loss 0.00086523, throughput 2.67092K wps
[Epoch 189 Batch 120/172] avg loss 0.000957542, throughput 2.65517K wps
[Epoch 189 Batch 150/172] avg loss 0.00104903, throughput 2.72304K wps
Begin Testing...
[Epoch 189] train avg loss 0.000923309, dev acc 0.8491, dev avg loss 0.627049, throughput 2.72648K wps
[Epoch 190 Batch 30/172] avg loss 0.000806124, throughput 2.67808K wps
[Epoch 190 Batch 60/172] avg loss 0.000880539, throughput 2.81125K wps
[Epoch 190 Batch 90/172] avg loss 0.00100538, throughput 2.76508K wps
[Epoch 190 Batch 120/172] avg loss 0.000865019, throughput 2.67675K wps
[Epoch 190 Batch 150/172] avg loss 0.000830285, throughput 2.80847K wps
Begin Testing...
[Epoch 190] train avg loss 0.000900446, dev acc 0.8480, dev avg loss 0.60256, throughput 2.75361K wps
[Epoch 191 Batch 30/172] avg loss 0.000733134, throughput 2.68698K wps
[Epoch 191 Batch 60/172] avg loss 0.00068407, throughput 2.74915K wps
[Epoch 191 Batch 90/172] avg loss 0.00106056, throughput 2.79584K wps
[Epoch 191 Batch 120/172] avg loss 0.000875668, throughput 2.76654K wps
[Epoch 191 Batch 150/172] avg loss 0.00101181, throughput 2.79299K wps
Begin Testing...
[Epoch 191] train avg loss 0.000896785, dev acc 0.8532, dev avg loss 0.621955, throughput 2.76254K wps
[Epoch 192 Batch 30/172] avg loss 0.000767704, throughput 2.84462K wps
[Epoch 192 Batch 60/172] avg loss 0.000557343, throughput 2.77209K wps
[Epoch 192 Batch 90/172] avg loss 0.00103866, throughput 2.79759K wps
[Epoch 192 Batch 120/172] avg loss 0.00108612, throughput 2.79348K wps
[Epoch 192 Batch 150/172] avg loss 0.00105144, throughput 2.76429K wps
Begin Testing...
[Epoch 192] train avg loss 0.000909223, dev acc 0.8512, dev avg loss 0.60459, throughput 2.78414K wps
[Epoch 193 Batch 30/172] avg loss 0.000717992, throughput 2.88416K wps
[Epoch 193 Batch 60/172] avg loss 0.000929996, throughput 2.83663K wps
[Epoch 193 Batch 90/172] avg loss 0.000979847, throughput 2.83657K wps
[Epoch 193 Batch 120/172] avg loss 0.000847838, throughput 2.79823K wps
[Epoch 193 Batch 150/172] avg loss 0.000857675, throughput 2.74587K wps
Begin Testing...
[Epoch 193] train avg loss 0.000851408, dev acc 0.8532, dev avg loss 0.627292, throughput 2.83125K wps
[Epoch 194 Batch 30/172] avg loss 0.000879286, throughput 2.77278K wps
[Epoch 194 Batch 60/172] avg loss 0.000977141, throughput 2.78062K wps
[Epoch 194 Batch 90/172] avg loss 0.000733239, throughput 2.79452K wps
[Epoch 194 Batch 120/172] avg loss 0.000978519, throughput 2.77668K wps
[Epoch 194 Batch 150/172] avg loss 0.000966449, throughput 2.68082K wps
Begin Testing...
[Epoch 194] train avg loss 0.000905048, dev acc 0.8470, dev avg loss 0.606288, throughput 2.76498K wps
[Epoch 195 Batch 30/172] avg loss 0.00115162, throughput 2.80618K wps
[Epoch 195 Batch 60/172] avg loss 0.000709707, throughput 2.85924K wps
[Epoch 195 Batch 90/172] avg loss 0.000757715, throughput 2.79979K wps
[Epoch 195 Batch 120/172] avg loss 0.000902893, throughput 2.79328K wps
[Epoch 195 Batch 150/172] avg loss 0.000877188, throughput 2.78061K wps
Begin Testing...
[Epoch 195] train avg loss 0.000854244, dev acc 0.8512, dev avg loss 0.637075, throughput 2.80778K wps
[Epoch 196 Batch 30/172] avg loss 0.000790337, throughput 2.87723K wps
[Epoch 196 Batch 60/172] avg loss 0.000746528, throughput 2.81398K wps
[Epoch 196 Batch 90/172] avg loss 0.000834202, throughput 2.82077K wps
[Epoch 196 Batch 120/172] avg loss 0.0011017, throughput 2.7695K wps
[Epoch 196 Batch 150/172] avg loss 0.000794615, throughput 2.74437K wps
Begin Testing...
[Epoch 196] train avg loss 0.000878906, dev acc 0.8501, dev avg loss 0.62926, throughput 2.80051K wps
[Epoch 197 Batch 30/172] avg loss 0.000806942, throughput 2.83611K wps
[Epoch 197 Batch 60/172] avg loss 0.000928586, throughput 2.79561K wps
[Epoch 197 Batch 90/172] avg loss 0.00100061, throughput 2.80459K wps
[Epoch 197 Batch 120/172] avg loss 0.000751803, throughput 2.78731K wps
[Epoch 197 Batch 150/172] avg loss 0.000949848, throughput 2.77226K wps
Begin Testing...
[Epoch 197] train avg loss 0.000876325, dev acc 0.8470, dev avg loss 0.608644, throughput 2.79832K wps
[Epoch 198 Batch 30/172] avg loss 0.000679452, throughput 2.74468K wps
[Epoch 198 Batch 60/172] avg loss 0.000929797, throughput 2.748K wps
[Epoch 198 Batch 90/172] avg loss 0.000674891, throughput 2.77948K wps
[Epoch 198 Batch 120/172] avg loss 0.000970079, throughput 2.76328K wps
[Epoch 198 Batch 150/172] avg loss 0.000959213, throughput 2.72625K wps
Begin Testing...
[Epoch 198] train avg loss 0.000864908, dev acc 0.8512, dev avg loss 0.615578, throughput 2.75215K wps
[Epoch 199 Batch 30/172] avg loss 0.000889918, throughput 2.85482K wps
[Epoch 199 Batch 60/172] avg loss 0.000967303, throughput 2.73039K wps
[Epoch 199 Batch 90/172] avg loss 0.000831455, throughput 2.74196K wps
[Epoch 199 Batch 120/172] avg loss 0.000798077, throughput 2.8139K wps
[Epoch 199 Batch 150/172] avg loss 0.000884591, throughput 2.79371K wps
Begin Testing...
[Epoch 199] train avg loss 0.000875039, dev acc 0.8470, dev avg loss 0.607424, throughput 2.78445K wps
Test loss 0.573109, test acc 0.8462
Total time cost 467.11s
[Epoch 0 Batch 30/172] avg loss 0.01257, throughput 2.63855K wps
[Epoch 0 Batch 60/172] avg loss 0.0123698, throughput 2.78639K wps
[Epoch 0 Batch 90/172] avg loss 0.0123906, throughput 2.78736K wps
[Epoch 0 Batch 120/172] avg loss 0.0124176, throughput 2.76803K wps
[Epoch 0 Batch 150/172] avg loss 0.0124156, throughput 2.78484K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124551, dev acc 0.6771, dev avg loss 0.628875, throughput 2.75651K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0121932, throughput 2.75057K wps
[Epoch 1 Batch 60/172] avg loss 0.0122418, throughput 2.67406K wps
[Epoch 1 Batch 90/172] avg loss 0.0126019, throughput 2.82398K wps
[Epoch 1 Batch 120/172] avg loss 0.0123901, throughput 2.76013K wps
[Epoch 1 Batch 150/172] avg loss 0.0123511, throughput 2.83646K wps
Begin Testing...
[Epoch 1] train avg loss 0.0123514, dev acc 0.6771, dev avg loss 0.626983, throughput 2.77244K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0122399, throughput 2.86694K wps
[Epoch 2 Batch 60/172] avg loss 0.0122625, throughput 2.82131K wps
[Epoch 2 Batch 90/172] avg loss 0.0124953, throughput 2.81539K wps
[Epoch 2 Batch 120/172] avg loss 0.0124738, throughput 2.8176K wps
[Epoch 2 Batch 150/172] avg loss 0.0123543, throughput 2.74352K wps
Begin Testing...
[Epoch 2] train avg loss 0.012318, dev acc 0.6771, dev avg loss 0.628122, throughput 2.80981K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0123392, throughput 2.73718K wps
[Epoch 3 Batch 60/172] avg loss 0.0121192, throughput 2.82851K wps
[Epoch 3 Batch 90/172] avg loss 0.0122724, throughput 2.82124K wps
[Epoch 3 Batch 120/172] avg loss 0.0123735, throughput 2.73053K wps
[Epoch 3 Batch 150/172] avg loss 0.0122864, throughput 2.71256K wps
Begin Testing...
[Epoch 3] train avg loss 0.0122885, dev acc 0.6771, dev avg loss 0.626071, throughput 2.76727K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0120917, throughput 2.86586K wps
[Epoch 4 Batch 60/172] avg loss 0.0126742, throughput 2.77242K wps
[Epoch 4 Batch 90/172] avg loss 0.0123575, throughput 2.84088K wps
[Epoch 4 Batch 120/172] avg loss 0.0120999, throughput 2.78098K wps
[Epoch 4 Batch 150/172] avg loss 0.01229, throughput 2.85689K wps
Begin Testing...
[Epoch 4] train avg loss 0.0122775, dev acc 0.6771, dev avg loss 0.626163, throughput 2.82301K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0120921, throughput 2.79728K wps
[Epoch 5 Batch 60/172] avg loss 0.0120615, throughput 2.85236K wps
[Epoch 5 Batch 90/172] avg loss 0.0123808, throughput 2.82861K wps
[Epoch 5 Batch 120/172] avg loss 0.0119265, throughput 2.81244K wps
[Epoch 5 Batch 150/172] avg loss 0.0124696, throughput 2.81009K wps
Begin Testing...
[Epoch 5] train avg loss 0.0122288, dev acc 0.6771, dev avg loss 0.625028, throughput 2.82157K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0119878, throughput 2.88081K wps
[Epoch 6 Batch 60/172] avg loss 0.0121926, throughput 2.81593K wps
[Epoch 6 Batch 90/172] avg loss 0.0119485, throughput 2.77365K wps
[Epoch 6 Batch 120/172] avg loss 0.0124742, throughput 2.87736K wps
[Epoch 6 Batch 150/172] avg loss 0.0121448, throughput 2.78793K wps
Begin Testing...
[Epoch 6] train avg loss 0.0122063, dev acc 0.6771, dev avg loss 0.62425, throughput 2.82893K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0122955, throughput 2.7727K wps
[Epoch 7 Batch 60/172] avg loss 0.0121749, throughput 2.81518K wps
[Epoch 7 Batch 90/172] avg loss 0.0120202, throughput 2.77018K wps
[Epoch 7 Batch 120/172] avg loss 0.0123672, throughput 2.84515K wps
[Epoch 7 Batch 150/172] avg loss 0.0121322, throughput 2.81684K wps
Begin Testing...
[Epoch 7] train avg loss 0.0121995, dev acc 0.6771, dev avg loss 0.623823, throughput 2.78765K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.0120003, throughput 2.79688K wps
[Epoch 8 Batch 60/172] avg loss 0.0122605, throughput 2.78377K wps
[Epoch 8 Batch 90/172] avg loss 0.012302, throughput 2.78045K wps
[Epoch 8 Batch 120/172] avg loss 0.012165, throughput 2.74774K wps
[Epoch 8 Batch 150/172] avg loss 0.0120877, throughput 2.77829K wps
Begin Testing...
[Epoch 8] train avg loss 0.0121619, dev acc 0.6771, dev avg loss 0.623536, throughput 2.75893K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.0119314, throughput 2.7669K wps
[Epoch 9 Batch 60/172] avg loss 0.0122677, throughput 2.74465K wps
[Epoch 9 Batch 90/172] avg loss 0.0123783, throughput 2.71405K wps
[Epoch 9 Batch 120/172] avg loss 0.0122382, throughput 2.76216K wps
[Epoch 9 Batch 150/172] avg loss 0.0119157, throughput 2.76513K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121352, dev acc 0.6771, dev avg loss 0.622476, throughput 2.74938K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.0118455, throughput 2.7484K wps
[Epoch 10 Batch 60/172] avg loss 0.0123157, throughput 2.75456K wps
[Epoch 10 Batch 90/172] avg loss 0.012156, throughput 2.68753K wps
[Epoch 10 Batch 120/172] avg loss 0.0122212, throughput 2.78634K wps
[Epoch 10 Batch 150/172] avg loss 0.0120892, throughput 2.80445K wps
Begin Testing...
[Epoch 10] train avg loss 0.0120957, dev acc 0.6771, dev avg loss 0.622062, throughput 2.74702K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.0115574, throughput 2.85715K wps
[Epoch 11 Batch 60/172] avg loss 0.0123716, throughput 2.79378K wps
[Epoch 11 Batch 90/172] avg loss 0.012127, throughput 2.79012K wps
[Epoch 11 Batch 120/172] avg loss 0.0121093, throughput 2.80068K wps
[Epoch 11 Batch 150/172] avg loss 0.0121174, throughput 2.80278K wps
Begin Testing...
[Epoch 11] train avg loss 0.0120693, dev acc 0.6771, dev avg loss 0.620032, throughput 2.78321K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.0121492, throughput 2.82508K wps
[Epoch 12 Batch 60/172] avg loss 0.0118055, throughput 2.77473K wps
[Epoch 12 Batch 90/172] avg loss 0.0120563, throughput 2.7167K wps
[Epoch 12 Batch 120/172] avg loss 0.0121195, throughput 2.64807K wps
[Epoch 12 Batch 150/172] avg loss 0.012193, throughput 2.77867K wps
Begin Testing...
[Epoch 12] train avg loss 0.0120515, dev acc 0.6771, dev avg loss 0.618832, throughput 2.75357K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.0118997, throughput 2.76537K wps
[Epoch 13 Batch 60/172] avg loss 0.0118397, throughput 2.78539K wps
[Epoch 13 Batch 90/172] avg loss 0.0122091, throughput 2.74864K wps
[Epoch 13 Batch 120/172] avg loss 0.0118866, throughput 2.75697K wps
[Epoch 13 Batch 150/172] avg loss 0.012131, throughput 2.78796K wps
Begin Testing...
[Epoch 13] train avg loss 0.0119971, dev acc 0.6771, dev avg loss 0.617282, throughput 2.76216K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.0120529, throughput 2.88059K wps
[Epoch 14 Batch 60/172] avg loss 0.0122982, throughput 2.72298K wps
[Epoch 14 Batch 90/172] avg loss 0.0117825, throughput 2.79667K wps
[Epoch 14 Batch 120/172] avg loss 0.0119513, throughput 2.82899K wps
[Epoch 14 Batch 150/172] avg loss 0.0115912, throughput 2.75536K wps
Begin Testing...
[Epoch 14] train avg loss 0.0119597, dev acc 0.6771, dev avg loss 0.615618, throughput 2.80726K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.0119744, throughput 2.77352K wps
[Epoch 15 Batch 60/172] avg loss 0.0118059, throughput 2.8149K wps
[Epoch 15 Batch 90/172] avg loss 0.0117753, throughput 2.83323K wps
[Epoch 15 Batch 120/172] avg loss 0.0120008, throughput 2.83735K wps
[Epoch 15 Batch 150/172] avg loss 0.01207, throughput 2.82187K wps
Begin Testing...
[Epoch 15] train avg loss 0.0119134, dev acc 0.6771, dev avg loss 0.613712, throughput 2.81746K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.0116266, throughput 2.83802K wps
[Epoch 16 Batch 60/172] avg loss 0.0121046, throughput 2.85335K wps
[Epoch 16 Batch 90/172] avg loss 0.0118664, throughput 2.81959K wps
[Epoch 16 Batch 120/172] avg loss 0.0117313, throughput 2.7542K wps
[Epoch 16 Batch 150/172] avg loss 0.0119623, throughput 2.82863K wps
Begin Testing...
[Epoch 16] train avg loss 0.0118575, dev acc 0.6771, dev avg loss 0.611447, throughput 2.8195K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.0118147, throughput 2.82473K wps
[Epoch 17 Batch 60/172] avg loss 0.0115181, throughput 2.82976K wps
[Epoch 17 Batch 90/172] avg loss 0.0121663, throughput 2.79971K wps
[Epoch 17 Batch 120/172] avg loss 0.0118539, throughput 2.8351K wps
[Epoch 17 Batch 150/172] avg loss 0.0117093, throughput 2.83259K wps
Begin Testing...
[Epoch 17] train avg loss 0.0118135, dev acc 0.6771, dev avg loss 0.608714, throughput 2.8238K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.0117874, throughput 2.88408K wps
[Epoch 18 Batch 60/172] avg loss 0.0116729, throughput 2.81045K wps
[Epoch 18 Batch 90/172] avg loss 0.0116942, throughput 2.76077K wps
[Epoch 18 Batch 120/172] avg loss 0.0118293, throughput 2.78823K wps
[Epoch 18 Batch 150/172] avg loss 0.011905, throughput 2.76235K wps
Begin Testing...
[Epoch 18] train avg loss 0.0117482, dev acc 0.6771, dev avg loss 0.606422, throughput 2.79689K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.011335, throughput 2.82909K wps
[Epoch 19 Batch 60/172] avg loss 0.0117301, throughput 2.72444K wps
[Epoch 19 Batch 90/172] avg loss 0.0117732, throughput 2.70708K wps
[Epoch 19 Batch 120/172] avg loss 0.011467, throughput 2.82629K wps
[Epoch 19 Batch 150/172] avg loss 0.0118449, throughput 2.6773K wps
Begin Testing...
[Epoch 19] train avg loss 0.0116881, dev acc 0.6771, dev avg loss 0.602418, throughput 2.75242K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.0117269, throughput 2.74973K wps
[Epoch 20 Batch 60/172] avg loss 0.0115333, throughput 2.81583K wps
[Epoch 20 Batch 90/172] avg loss 0.0115836, throughput 2.76828K wps
[Epoch 20 Batch 120/172] avg loss 0.0114886, throughput 2.75144K wps
[Epoch 20 Batch 150/172] avg loss 0.0117822, throughput 2.75127K wps
Begin Testing...
[Epoch 20] train avg loss 0.0116146, dev acc 0.6771, dev avg loss 0.598553, throughput 2.76695K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.0111563, throughput 2.6821K wps
[Epoch 21 Batch 60/172] avg loss 0.0115666, throughput 2.69482K wps
[Epoch 21 Batch 90/172] avg loss 0.011885, throughput 2.71769K wps
[Epoch 21 Batch 120/172] avg loss 0.0115035, throughput 2.78502K wps
[Epoch 21 Batch 150/172] avg loss 0.0114233, throughput 2.80398K wps
Begin Testing...
[Epoch 21] train avg loss 0.011509, dev acc 0.6845, dev avg loss 0.594103, throughput 2.74677K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.01149, throughput 2.87358K wps
[Epoch 22 Batch 60/172] avg loss 0.0113507, throughput 2.71019K wps
[Epoch 22 Batch 90/172] avg loss 0.0115482, throughput 2.78905K wps
[Epoch 22 Batch 120/172] avg loss 0.0113622, throughput 2.8207K wps
[Epoch 22 Batch 150/172] avg loss 0.0112423, throughput 2.81734K wps
Begin Testing...
[Epoch 22] train avg loss 0.0113996, dev acc 0.6855, dev avg loss 0.589091, throughput 2.80126K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.0113744, throughput 2.85223K wps
[Epoch 23 Batch 60/172] avg loss 0.0114974, throughput 2.76521K wps
[Epoch 23 Batch 90/172] avg loss 0.0110774, throughput 2.77546K wps
[Epoch 23 Batch 120/172] avg loss 0.0113594, throughput 2.78132K wps
[Epoch 23 Batch 150/172] avg loss 0.0113465, throughput 2.75699K wps
Begin Testing...
[Epoch 23] train avg loss 0.0112838, dev acc 0.6866, dev avg loss 0.583294, throughput 2.78661K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.0112137, throughput 2.81536K wps
[Epoch 24 Batch 60/172] avg loss 0.0111325, throughput 2.73803K wps
[Epoch 24 Batch 90/172] avg loss 0.0112094, throughput 2.76312K wps
[Epoch 24 Batch 120/172] avg loss 0.0112445, throughput 2.75627K wps
[Epoch 24 Batch 150/172] avg loss 0.0111759, throughput 2.75361K wps
Begin Testing...
[Epoch 24] train avg loss 0.0111521, dev acc 0.6876, dev avg loss 0.576864, throughput 2.76552K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.0114522, throughput 2.80714K wps
[Epoch 25 Batch 60/172] avg loss 0.0108274, throughput 2.74146K wps
[Epoch 25 Batch 90/172] avg loss 0.0110044, throughput 2.76332K wps
[Epoch 25 Batch 120/172] avg loss 0.0111026, throughput 2.74171K wps
[Epoch 25 Batch 150/172] avg loss 0.0109279, throughput 2.77528K wps
Begin Testing...
[Epoch 25] train avg loss 0.0110035, dev acc 0.6908, dev avg loss 0.569642, throughput 2.76215K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/172] avg loss 0.0110284, throughput 2.79974K wps
[Epoch 26 Batch 60/172] avg loss 0.0110478, throughput 2.82346K wps
[Epoch 26 Batch 90/172] avg loss 0.0105849, throughput 2.80429K wps
[Epoch 26 Batch 120/172] avg loss 0.0109237, throughput 2.78931K wps
[Epoch 26 Batch 150/172] avg loss 0.0108645, throughput 2.7511K wps
Begin Testing...
[Epoch 26] train avg loss 0.0108355, dev acc 0.7034, dev avg loss 0.561556, throughput 2.79458K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.0106721, throughput 2.79864K wps
[Epoch 27 Batch 60/172] avg loss 0.0104717, throughput 2.83346K wps
[Epoch 27 Batch 90/172] avg loss 0.0104056, throughput 2.78937K wps
[Epoch 27 Batch 120/172] avg loss 0.0108197, throughput 2.83185K wps
[Epoch 27 Batch 150/172] avg loss 0.0108659, throughput 2.83847K wps
Begin Testing...
[Epoch 27] train avg loss 0.0106576, dev acc 0.7191, dev avg loss 0.551158, throughput 2.82082K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.0105979, throughput 2.77114K wps
[Epoch 28 Batch 60/172] avg loss 0.0105651, throughput 2.7901K wps
[Epoch 28 Batch 90/172] avg loss 0.0105737, throughput 2.78884K wps
[Epoch 28 Batch 120/172] avg loss 0.0104757, throughput 2.85282K wps
[Epoch 28 Batch 150/172] avg loss 0.0100799, throughput 2.80979K wps
Begin Testing...
[Epoch 28] train avg loss 0.0104376, dev acc 0.7212, dev avg loss 0.5406, throughput 2.80804K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/172] avg loss 0.0100976, throughput 2.88017K wps
[Epoch 29 Batch 60/172] avg loss 0.0101323, throughput 2.78568K wps
[Epoch 29 Batch 90/172] avg loss 0.0102414, throughput 2.77317K wps
[Epoch 29 Batch 120/172] avg loss 0.0101364, throughput 2.77457K wps
[Epoch 29 Batch 150/172] avg loss 0.0104273, throughput 2.689K wps
Begin Testing...
[Epoch 29] train avg loss 0.0101746, dev acc 0.7390, dev avg loss 0.528837, throughput 2.77776K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/172] avg loss 0.00991154, throughput 2.84452K wps
[Epoch 30 Batch 60/172] avg loss 0.0100886, throughput 2.71906K wps
[Epoch 30 Batch 90/172] avg loss 0.00982403, throughput 2.79424K wps
[Epoch 30 Batch 120/172] avg loss 0.0101171, throughput 2.79098K wps
[Epoch 30 Batch 150/172] avg loss 0.0100173, throughput 2.79601K wps
Begin Testing...
[Epoch 30] train avg loss 0.00996931, dev acc 0.7631, dev avg loss 0.517414, throughput 2.79136K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/172] avg loss 0.0097238, throughput 2.79394K wps
[Epoch 31 Batch 60/172] avg loss 0.00974465, throughput 2.77913K wps
[Epoch 31 Batch 90/172] avg loss 0.00942907, throughput 2.79449K wps
[Epoch 31 Batch 120/172] avg loss 0.00966714, throughput 2.77665K wps
[Epoch 31 Batch 150/172] avg loss 0.00952697, throughput 2.69829K wps
Begin Testing...
[Epoch 31] train avg loss 0.00964122, dev acc 0.7516, dev avg loss 0.504008, throughput 2.77426K wps
[Epoch 32 Batch 30/172] avg loss 0.00915505, throughput 2.79996K wps
[Epoch 32 Batch 60/172] avg loss 0.00918063, throughput 2.82177K wps
[Epoch 32 Batch 90/172] avg loss 0.00949615, throughput 2.77916K wps
[Epoch 32 Batch 120/172] avg loss 0.0094845, throughput 2.80201K wps
[Epoch 32 Batch 150/172] avg loss 0.00948598, throughput 2.71247K wps
Begin Testing...
[Epoch 32] train avg loss 0.00936154, dev acc 0.7893, dev avg loss 0.491915, throughput 2.78661K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/172] avg loss 0.00914173, throughput 2.84012K wps
[Epoch 33 Batch 60/172] avg loss 0.00916346, throughput 2.80307K wps
[Epoch 33 Batch 90/172] avg loss 0.00926647, throughput 2.79715K wps
[Epoch 33 Batch 120/172] avg loss 0.00850782, throughput 2.78249K wps
[Epoch 33 Batch 150/172] avg loss 0.0088587, throughput 2.76706K wps
Begin Testing...
[Epoch 33] train avg loss 0.00900897, dev acc 0.7809, dev avg loss 0.475937, throughput 2.79669K wps
[Epoch 34 Batch 30/172] avg loss 0.00862031, throughput 2.85789K wps
[Epoch 34 Batch 60/172] avg loss 0.00869039, throughput 2.72187K wps
[Epoch 34 Batch 90/172] avg loss 0.00916359, throughput 2.82121K wps
[Epoch 34 Batch 120/172] avg loss 0.00866677, throughput 2.769K wps
[Epoch 34 Batch 150/172] avg loss 0.00870005, throughput 2.81833K wps
Begin Testing...
[Epoch 34] train avg loss 0.00870091, dev acc 0.8103, dev avg loss 0.465006, throughput 2.79358K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/172] avg loss 0.0086835, throughput 2.83587K wps
[Epoch 35 Batch 60/172] avg loss 0.00816618, throughput 2.75674K wps
[Epoch 35 Batch 90/172] avg loss 0.00844238, throughput 2.72739K wps
[Epoch 35 Batch 120/172] avg loss 0.00832206, throughput 2.76871K wps
[Epoch 35 Batch 150/172] avg loss 0.00820539, throughput 2.78413K wps
Begin Testing...
[Epoch 35] train avg loss 0.00835819, dev acc 0.7883, dev avg loss 0.452162, throughput 2.77725K wps
[Epoch 36 Batch 30/172] avg loss 0.00805265, throughput 2.77237K wps
[Epoch 36 Batch 60/172] avg loss 0.00795264, throughput 2.77228K wps
[Epoch 36 Batch 90/172] avg loss 0.00796517, throughput 2.79286K wps
[Epoch 36 Batch 120/172] avg loss 0.00806925, throughput 2.67018K wps
[Epoch 36 Batch 150/172] avg loss 0.00792425, throughput 2.7636K wps
Begin Testing...
[Epoch 36] train avg loss 0.00803845, dev acc 0.8176, dev avg loss 0.435606, throughput 2.75402K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00747797, throughput 2.74576K wps
[Epoch 37 Batch 60/172] avg loss 0.00782386, throughput 2.81234K wps
[Epoch 37 Batch 90/172] avg loss 0.00808831, throughput 2.73958K wps
[Epoch 37 Batch 120/172] avg loss 0.00738673, throughput 2.76519K wps
[Epoch 37 Batch 150/172] avg loss 0.00763435, throughput 2.77455K wps
Begin Testing...
[Epoch 37] train avg loss 0.00766432, dev acc 0.8270, dev avg loss 0.4234, throughput 2.75292K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/172] avg loss 0.00732712, throughput 2.76457K wps
[Epoch 38 Batch 60/172] avg loss 0.00749765, throughput 2.78513K wps
[Epoch 38 Batch 90/172] avg loss 0.00724621, throughput 2.78413K wps
[Epoch 38 Batch 120/172] avg loss 0.00737033, throughput 2.76123K wps
[Epoch 38 Batch 150/172] avg loss 0.00737739, throughput 2.75598K wps
Begin Testing...
[Epoch 38] train avg loss 0.00735674, dev acc 0.8396, dev avg loss 0.415524, throughput 2.76817K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.00724498, throughput 2.79559K wps
[Epoch 39 Batch 60/172] avg loss 0.00679503, throughput 2.77115K wps
[Epoch 39 Batch 90/172] avg loss 0.00726699, throughput 2.82579K wps
[Epoch 39 Batch 120/172] avg loss 0.00702016, throughput 2.74464K wps
[Epoch 39 Batch 150/172] avg loss 0.00713743, throughput 2.77312K wps
Begin Testing...
[Epoch 39] train avg loss 0.00706446, dev acc 0.8375, dev avg loss 0.401541, throughput 2.78194K wps
[Epoch 40 Batch 30/172] avg loss 0.0070581, throughput 2.82589K wps
[Epoch 40 Batch 60/172] avg loss 0.00658181, throughput 2.77559K wps
[Epoch 40 Batch 90/172] avg loss 0.00662633, throughput 2.76468K wps
[Epoch 40 Batch 120/172] avg loss 0.00666847, throughput 2.81123K wps
[Epoch 40 Batch 150/172] avg loss 0.00667579, throughput 2.79345K wps
Begin Testing...
[Epoch 40] train avg loss 0.00670334, dev acc 0.8428, dev avg loss 0.392334, throughput 2.79434K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/172] avg loss 0.00644755, throughput 2.82378K wps
[Epoch 41 Batch 60/172] avg loss 0.00646199, throughput 2.78451K wps
[Epoch 41 Batch 90/172] avg loss 0.00656591, throughput 2.79251K wps
[Epoch 41 Batch 120/172] avg loss 0.00678735, throughput 2.81231K wps
[Epoch 41 Batch 150/172] avg loss 0.00626837, throughput 2.77817K wps
Begin Testing...
[Epoch 41] train avg loss 0.00650543, dev acc 0.8438, dev avg loss 0.382876, throughput 2.79622K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00618258, throughput 2.88867K wps
[Epoch 42 Batch 60/172] avg loss 0.0063385, throughput 2.80122K wps
[Epoch 42 Batch 90/172] avg loss 0.00589665, throughput 2.77846K wps
[Epoch 42 Batch 120/172] avg loss 0.00632738, throughput 2.72077K wps
[Epoch 42 Batch 150/172] avg loss 0.00601894, throughput 2.69158K wps
Begin Testing...
[Epoch 42] train avg loss 0.0061496, dev acc 0.8553, dev avg loss 0.378058, throughput 2.78324K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/172] avg loss 0.00579926, throughput 2.77358K wps
[Epoch 43 Batch 60/172] avg loss 0.00601614, throughput 2.7525K wps
[Epoch 43 Batch 90/172] avg loss 0.00591932, throughput 2.737K wps
[Epoch 43 Batch 120/172] avg loss 0.0059246, throughput 2.77263K wps
[Epoch 43 Batch 150/172] avg loss 0.0057473, throughput 2.76074K wps
Begin Testing...
[Epoch 43] train avg loss 0.00591829, dev acc 0.8532, dev avg loss 0.368084, throughput 2.75622K wps
[Epoch 44 Batch 30/172] avg loss 0.00583818, throughput 2.82122K wps
[Epoch 44 Batch 60/172] avg loss 0.00554757, throughput 2.65979K wps
[Epoch 44 Batch 90/172] avg loss 0.00558169, throughput 2.78993K wps
[Epoch 44 Batch 120/172] avg loss 0.00537738, throughput 2.73621K wps
[Epoch 44 Batch 150/172] avg loss 0.00535506, throughput 2.82446K wps
Begin Testing...
[Epoch 44] train avg loss 0.00560569, dev acc 0.8585, dev avg loss 0.360341, throughput 2.76421K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00555223, throughput 2.81467K wps
[Epoch 45 Batch 60/172] avg loss 0.00542113, throughput 2.76929K wps
[Epoch 45 Batch 90/172] avg loss 0.00535511, throughput 2.79444K wps
[Epoch 45 Batch 120/172] avg loss 0.00530621, throughput 2.78398K wps
[Epoch 45 Batch 150/172] avg loss 0.0055071, throughput 2.77797K wps
Begin Testing...
[Epoch 45] train avg loss 0.0054448, dev acc 0.8595, dev avg loss 0.356156, throughput 2.77568K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00495911, throughput 2.86513K wps
[Epoch 46 Batch 60/172] avg loss 0.00525172, throughput 2.80723K wps
[Epoch 46 Batch 90/172] avg loss 0.00529351, throughput 2.77977K wps
[Epoch 46 Batch 120/172] avg loss 0.00529499, throughput 2.82834K wps
[Epoch 46 Batch 150/172] avg loss 0.00507804, throughput 2.76618K wps
Begin Testing...
[Epoch 46] train avg loss 0.0051837, dev acc 0.8585, dev avg loss 0.350329, throughput 2.81231K wps
[Epoch 47 Batch 30/172] avg loss 0.00498851, throughput 2.80846K wps
[Epoch 47 Batch 60/172] avg loss 0.00530755, throughput 2.71932K wps
[Epoch 47 Batch 90/172] avg loss 0.00490553, throughput 2.7881K wps
[Epoch 47 Batch 120/172] avg loss 0.00482992, throughput 2.84315K wps
[Epoch 47 Batch 150/172] avg loss 0.00496976, throughput 2.79913K wps
Begin Testing...
[Epoch 47] train avg loss 0.0049748, dev acc 0.8616, dev avg loss 0.352101, throughput 2.79221K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/172] avg loss 0.0048628, throughput 2.86549K wps
[Epoch 48 Batch 60/172] avg loss 0.0046855, throughput 2.81507K wps
[Epoch 48 Batch 90/172] avg loss 0.00458268, throughput 2.78027K wps
[Epoch 48 Batch 120/172] avg loss 0.00500083, throughput 2.78904K wps
[Epoch 48 Batch 150/172] avg loss 0.00440674, throughput 2.80957K wps
Begin Testing...
[Epoch 48] train avg loss 0.00474726, dev acc 0.8616, dev avg loss 0.344478, throughput 2.81027K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/172] avg loss 0.00443171, throughput 2.76301K wps
[Epoch 49 Batch 60/172] avg loss 0.00443686, throughput 2.76209K wps
[Epoch 49 Batch 90/172] avg loss 0.00492612, throughput 2.78852K wps
[Epoch 49 Batch 120/172] avg loss 0.00417784, throughput 2.82162K wps
[Epoch 49 Batch 150/172] avg loss 0.0048037, throughput 2.80947K wps
Begin Testing...
[Epoch 49] train avg loss 0.00455315, dev acc 0.8658, dev avg loss 0.342993, throughput 2.78436K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/172] avg loss 0.00453991, throughput 2.84462K wps
[Epoch 50 Batch 60/172] avg loss 0.00440803, throughput 2.82352K wps
[Epoch 50 Batch 90/172] avg loss 0.00434066, throughput 2.81574K wps
[Epoch 50 Batch 120/172] avg loss 0.00431991, throughput 2.74735K wps
[Epoch 50 Batch 150/172] avg loss 0.00448712, throughput 2.74511K wps
Begin Testing...
[Epoch 50] train avg loss 0.00440511, dev acc 0.8637, dev avg loss 0.338743, throughput 2.78987K wps
[Epoch 51 Batch 30/172] avg loss 0.00414351, throughput 2.74347K wps
[Epoch 51 Batch 60/172] avg loss 0.00403137, throughput 2.76062K wps
[Epoch 51 Batch 90/172] avg loss 0.00414633, throughput 2.69504K wps
[Epoch 51 Batch 120/172] avg loss 0.00438921, throughput 2.69189K wps
[Epoch 51 Batch 150/172] avg loss 0.00428068, throughput 2.74094K wps
Begin Testing...
[Epoch 51] train avg loss 0.00421318, dev acc 0.8627, dev avg loss 0.33556, throughput 2.72286K wps
[Epoch 52 Batch 30/172] avg loss 0.00376748, throughput 2.77545K wps
[Epoch 52 Batch 60/172] avg loss 0.00364228, throughput 2.76596K wps
[Epoch 52 Batch 90/172] avg loss 0.00442606, throughput 2.78188K wps
[Epoch 52 Batch 120/172] avg loss 0.00431899, throughput 2.7697K wps
[Epoch 52 Batch 150/172] avg loss 0.00386553, throughput 2.76917K wps
Begin Testing...
[Epoch 52] train avg loss 0.00402267, dev acc 0.8637, dev avg loss 0.336354, throughput 2.77016K wps
[Epoch 53 Batch 30/172] avg loss 0.00397309, throughput 2.82257K wps
[Epoch 53 Batch 60/172] avg loss 0.00382461, throughput 2.76063K wps
[Epoch 53 Batch 90/172] avg loss 0.0037469, throughput 2.67681K wps
[Epoch 53 Batch 120/172] avg loss 0.00376973, throughput 2.74309K wps
[Epoch 53 Batch 150/172] avg loss 0.00398498, throughput 2.76929K wps
Begin Testing...
[Epoch 53] train avg loss 0.0039032, dev acc 0.8637, dev avg loss 0.333844, throughput 2.76813K wps
[Epoch 54 Batch 30/172] avg loss 0.00379169, throughput 2.8589K wps
[Epoch 54 Batch 60/172] avg loss 0.00360022, throughput 2.81618K wps
[Epoch 54 Batch 90/172] avg loss 0.00384834, throughput 2.72441K wps
[Epoch 54 Batch 120/172] avg loss 0.00392672, throughput 2.816K wps
[Epoch 54 Batch 150/172] avg loss 0.00363249, throughput 2.79625K wps
Begin Testing...
[Epoch 54] train avg loss 0.00373247, dev acc 0.8658, dev avg loss 0.334851, throughput 2.78835K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00378747, throughput 2.86015K wps
[Epoch 55 Batch 60/172] avg loss 0.00341655, throughput 2.81015K wps
[Epoch 55 Batch 90/172] avg loss 0.00351793, throughput 2.81527K wps
[Epoch 55 Batch 120/172] avg loss 0.0036868, throughput 2.81782K wps
[Epoch 55 Batch 150/172] avg loss 0.00409474, throughput 2.81077K wps
Begin Testing...
[Epoch 55] train avg loss 0.00366964, dev acc 0.8648, dev avg loss 0.334057, throughput 2.81862K wps
[Epoch 56 Batch 30/172] avg loss 0.00360283, throughput 2.75036K wps
[Epoch 56 Batch 60/172] avg loss 0.00334888, throughput 2.7122K wps
[Epoch 56 Batch 90/172] avg loss 0.00337111, throughput 2.85091K wps
[Epoch 56 Batch 120/172] avg loss 0.00357466, throughput 2.77139K wps
[Epoch 56 Batch 150/172] avg loss 0.003373, throughput 2.81027K wps
Begin Testing...
[Epoch 56] train avg loss 0.00349823, dev acc 0.8658, dev avg loss 0.332951, throughput 2.78005K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00350564, throughput 2.85929K wps
[Epoch 57 Batch 60/172] avg loss 0.00290241, throughput 2.77296K wps
[Epoch 57 Batch 90/172] avg loss 0.00394034, throughput 2.79785K wps
[Epoch 57 Batch 120/172] avg loss 0.00327601, throughput 2.79271K wps
[Epoch 57 Batch 150/172] avg loss 0.00353437, throughput 2.79899K wps
Begin Testing...
[Epoch 57] train avg loss 0.00339362, dev acc 0.8669, dev avg loss 0.334216, throughput 2.78303K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/172] avg loss 0.00337228, throughput 2.84943K wps
[Epoch 58 Batch 60/172] avg loss 0.00315315, throughput 2.82686K wps
[Epoch 58 Batch 90/172] avg loss 0.00311373, throughput 2.81664K wps
[Epoch 58 Batch 120/172] avg loss 0.00353992, throughput 2.78229K wps
[Epoch 58 Batch 150/172] avg loss 0.00340581, throughput 2.75304K wps
Begin Testing...
[Epoch 58] train avg loss 0.00327518, dev acc 0.8690, dev avg loss 0.335285, throughput 2.8041K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/172] avg loss 0.00337661, throughput 2.7965K wps
[Epoch 59 Batch 60/172] avg loss 0.003106, throughput 2.81166K wps
[Epoch 59 Batch 90/172] avg loss 0.00313017, throughput 2.79821K wps
[Epoch 59 Batch 120/172] avg loss 0.00309028, throughput 2.80028K wps
[Epoch 59 Batch 150/172] avg loss 0.00320406, throughput 2.83042K wps
Begin Testing...
[Epoch 59] train avg loss 0.00320572, dev acc 0.8690, dev avg loss 0.337636, throughput 2.80872K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/172] avg loss 0.0030339, throughput 2.82823K wps
[Epoch 60 Batch 60/172] avg loss 0.00283424, throughput 2.75388K wps
[Epoch 60 Batch 90/172] avg loss 0.00279092, throughput 2.80897K wps
[Epoch 60 Batch 120/172] avg loss 0.00337567, throughput 2.80412K wps
[Epoch 60 Batch 150/172] avg loss 0.00324471, throughput 2.79929K wps
Begin Testing...
[Epoch 60] train avg loss 0.00310649, dev acc 0.8658, dev avg loss 0.336796, throughput 2.79236K wps
[Epoch 61 Batch 30/172] avg loss 0.00279192, throughput 2.79121K wps
[Epoch 61 Batch 60/172] avg loss 0.00327009, throughput 2.83857K wps
[Epoch 61 Batch 90/172] avg loss 0.0028791, throughput 2.80222K wps
[Epoch 61 Batch 120/172] avg loss 0.00293246, throughput 2.81621K wps
[Epoch 61 Batch 150/172] avg loss 0.00315142, throughput 2.8091K wps
Begin Testing...
[Epoch 61] train avg loss 0.00300418, dev acc 0.8627, dev avg loss 0.336352, throughput 2.80741K wps
[Epoch 62 Batch 30/172] avg loss 0.00276619, throughput 2.76849K wps
[Epoch 62 Batch 60/172] avg loss 0.00286006, throughput 2.7559K wps
[Epoch 62 Batch 90/172] avg loss 0.0031049, throughput 2.73138K wps
[Epoch 62 Batch 120/172] avg loss 0.00286202, throughput 2.77595K wps
[Epoch 62 Batch 150/172] avg loss 0.00284848, throughput 2.79332K wps
Begin Testing...
[Epoch 62] train avg loss 0.00288846, dev acc 0.8679, dev avg loss 0.346437, throughput 2.76655K wps
[Epoch 63 Batch 30/172] avg loss 0.002753, throughput 2.82626K wps
[Epoch 63 Batch 60/172] avg loss 0.00301076, throughput 2.80965K wps
[Epoch 63 Batch 90/172] avg loss 0.00271557, throughput 2.67148K wps
[Epoch 63 Batch 120/172] avg loss 0.00313857, throughput 2.84744K wps
[Epoch 63 Batch 150/172] avg loss 0.00285072, throughput 2.79615K wps
Begin Testing...
[Epoch 63] train avg loss 0.00287341, dev acc 0.8637, dev avg loss 0.339722, throughput 2.77482K wps
[Epoch 64 Batch 30/172] avg loss 0.00257541, throughput 2.84078K wps
[Epoch 64 Batch 60/172] avg loss 0.0030165, throughput 2.79182K wps
[Epoch 64 Batch 90/172] avg loss 0.0025529, throughput 2.67651K wps
[Epoch 64 Batch 120/172] avg loss 0.00288853, throughput 2.85079K wps
[Epoch 64 Batch 150/172] avg loss 0.00286592, throughput 2.81421K wps
Begin Testing...
[Epoch 64] train avg loss 0.00279997, dev acc 0.8648, dev avg loss 0.341522, throughput 2.79573K wps
[Epoch 65 Batch 30/172] avg loss 0.00275258, throughput 2.77907K wps
[Epoch 65 Batch 60/172] avg loss 0.00281491, throughput 2.74992K wps
[Epoch 65 Batch 90/172] avg loss 0.00251111, throughput 2.82743K wps
[Epoch 65 Batch 120/172] avg loss 0.00292785, throughput 2.74149K wps
[Epoch 65 Batch 150/172] avg loss 0.00302932, throughput 2.75424K wps
Begin Testing...
[Epoch 65] train avg loss 0.00279914, dev acc 0.8690, dev avg loss 0.34957, throughput 2.77128K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/172] avg loss 0.0026667, throughput 2.72784K wps
[Epoch 66 Batch 60/172] avg loss 0.00266968, throughput 2.65754K wps
[Epoch 66 Batch 90/172] avg loss 0.00245405, throughput 2.81713K wps
[Epoch 66 Batch 120/172] avg loss 0.00288721, throughput 2.74156K wps
[Epoch 66 Batch 150/172] avg loss 0.0027185, throughput 2.74861K wps
Begin Testing...
[Epoch 66] train avg loss 0.0026827, dev acc 0.8690, dev avg loss 0.352194, throughput 2.7336K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/172] avg loss 0.00268951, throughput 2.69799K wps
[Epoch 67 Batch 60/172] avg loss 0.0024989, throughput 2.76455K wps
[Epoch 67 Batch 90/172] avg loss 0.0025545, throughput 2.67724K wps
[Epoch 67 Batch 120/172] avg loss 0.00244354, throughput 2.81498K wps
[Epoch 67 Batch 150/172] avg loss 0.00262836, throughput 2.7297K wps
Begin Testing...
[Epoch 67] train avg loss 0.00258341, dev acc 0.8648, dev avg loss 0.372333, throughput 2.74237K wps
[Epoch 68 Batch 30/172] avg loss 0.00241194, throughput 2.81143K wps
[Epoch 68 Batch 60/172] avg loss 0.00260778, throughput 2.69484K wps
[Epoch 68 Batch 90/172] avg loss 0.00256922, throughput 2.67358K wps
[Epoch 68 Batch 120/172] avg loss 0.00256512, throughput 2.82818K wps
[Epoch 68 Batch 150/172] avg loss 0.00263957, throughput 2.72987K wps
Begin Testing...
[Epoch 68] train avg loss 0.00258592, dev acc 0.8669, dev avg loss 0.358102, throughput 2.72754K wps
[Epoch 69 Batch 30/172] avg loss 0.00239171, throughput 2.74978K wps
[Epoch 69 Batch 60/172] avg loss 0.00263398, throughput 2.67982K wps
[Epoch 69 Batch 90/172] avg loss 0.00232969, throughput 2.75898K wps
[Epoch 69 Batch 120/172] avg loss 0.00265785, throughput 2.78026K wps
[Epoch 69 Batch 150/172] avg loss 0.0024935, throughput 2.77428K wps
Begin Testing...
[Epoch 69] train avg loss 0.0024954, dev acc 0.8595, dev avg loss 0.350895, throughput 2.75199K wps
[Epoch 70 Batch 30/172] avg loss 0.00221731, throughput 2.76851K wps
[Epoch 70 Batch 60/172] avg loss 0.00248513, throughput 2.79887K wps
[Epoch 70 Batch 90/172] avg loss 0.00216858, throughput 2.81509K wps
[Epoch 70 Batch 120/172] avg loss 0.0022879, throughput 2.7667K wps
[Epoch 70 Batch 150/172] avg loss 0.00287197, throughput 2.76389K wps
Begin Testing...
[Epoch 70] train avg loss 0.00245688, dev acc 0.8595, dev avg loss 0.352932, throughput 2.7804K wps
[Epoch 71 Batch 30/172] avg loss 0.00279275, throughput 2.8084K wps
[Epoch 71 Batch 60/172] avg loss 0.00218205, throughput 2.80725K wps
[Epoch 71 Batch 90/172] avg loss 0.00226588, throughput 2.81065K wps
[Epoch 71 Batch 120/172] avg loss 0.00234304, throughput 2.82397K wps
[Epoch 71 Batch 150/172] avg loss 0.00266491, throughput 2.81598K wps
Begin Testing...
[Epoch 71] train avg loss 0.00241804, dev acc 0.8574, dev avg loss 0.356836, throughput 2.81477K wps
[Epoch 72 Batch 30/172] avg loss 0.00263763, throughput 2.86156K wps
[Epoch 72 Batch 60/172] avg loss 0.00226926, throughput 2.82947K wps
[Epoch 72 Batch 90/172] avg loss 0.00226994, throughput 2.82308K wps
[Epoch 72 Batch 120/172] avg loss 0.00241809, throughput 2.81694K wps
[Epoch 72 Batch 150/172] avg loss 0.00236893, throughput 2.81894K wps
Begin Testing...
[Epoch 72] train avg loss 0.00235661, dev acc 0.8648, dev avg loss 0.364803, throughput 2.82644K wps
[Epoch 73 Batch 30/172] avg loss 0.00248691, throughput 2.86417K wps
[Epoch 73 Batch 60/172] avg loss 0.00212897, throughput 2.83544K wps
[Epoch 73 Batch 90/172] avg loss 0.00256181, throughput 2.79447K wps
[Epoch 73 Batch 120/172] avg loss 0.00222815, throughput 2.77425K wps
[Epoch 73 Batch 150/172] avg loss 0.00222804, throughput 2.85001K wps
Begin Testing...
[Epoch 73] train avg loss 0.00238912, dev acc 0.8595, dev avg loss 0.365964, throughput 2.82432K wps
[Epoch 74 Batch 30/172] avg loss 0.00221499, throughput 2.84767K wps
[Epoch 74 Batch 60/172] avg loss 0.00213225, throughput 2.83434K wps
[Epoch 74 Batch 90/172] avg loss 0.00238447, throughput 2.80928K wps
[Epoch 74 Batch 120/172] avg loss 0.00235644, throughput 2.79332K wps
[Epoch 74 Batch 150/172] avg loss 0.00219579, throughput 2.81565K wps
Begin Testing...
[Epoch 74] train avg loss 0.00226007, dev acc 0.8595, dev avg loss 0.368065, throughput 2.82059K wps
[Epoch 75 Batch 30/172] avg loss 0.00220352, throughput 2.80905K wps
[Epoch 75 Batch 60/172] avg loss 0.00231412, throughput 2.82636K wps
[Epoch 75 Batch 90/172] avg loss 0.00228409, throughput 2.83456K wps
[Epoch 75 Batch 120/172] avg loss 0.00197362, throughput 2.78754K wps
[Epoch 75 Batch 150/172] avg loss 0.00229139, throughput 2.83638K wps
Begin Testing...
[Epoch 75] train avg loss 0.00225527, dev acc 0.8553, dev avg loss 0.364161, throughput 2.82069K wps
[Epoch 76 Batch 30/172] avg loss 0.00213414, throughput 2.86454K wps
[Epoch 76 Batch 60/172] avg loss 0.00216009, throughput 2.82426K wps
[Epoch 76 Batch 90/172] avg loss 0.00208092, throughput 2.82313K wps
[Epoch 76 Batch 120/172] avg loss 0.00220359, throughput 2.79988K wps
[Epoch 76 Batch 150/172] avg loss 0.00229447, throughput 2.79259K wps
Begin Testing...
[Epoch 76] train avg loss 0.00215841, dev acc 0.8606, dev avg loss 0.373186, throughput 2.81951K wps
[Epoch 77 Batch 30/172] avg loss 0.00218754, throughput 2.86423K wps
[Epoch 77 Batch 60/172] avg loss 0.00220621, throughput 2.79967K wps
[Epoch 77 Batch 90/172] avg loss 0.00207878, throughput 2.77777K wps
[Epoch 77 Batch 120/172] avg loss 0.00215402, throughput 2.7587K wps
[Epoch 77 Batch 150/172] avg loss 0.00237065, throughput 2.75077K wps
Begin Testing...
[Epoch 77] train avg loss 0.00218325, dev acc 0.8585, dev avg loss 0.370591, throughput 2.78239K wps
[Epoch 78 Batch 30/172] avg loss 0.00219462, throughput 2.76555K wps
[Epoch 78 Batch 60/172] avg loss 0.00207359, throughput 2.7486K wps
[Epoch 78 Batch 90/172] avg loss 0.00227165, throughput 2.68982K wps
[Epoch 78 Batch 120/172] avg loss 0.00209877, throughput 2.72707K wps
[Epoch 78 Batch 150/172] avg loss 0.00232099, throughput 2.71706K wps
Begin Testing...
[Epoch 78] train avg loss 0.00214705, dev acc 0.8585, dev avg loss 0.37828, throughput 2.71799K wps
[Epoch 79 Batch 30/172] avg loss 0.00180234, throughput 2.63203K wps
[Epoch 79 Batch 60/172] avg loss 0.00206294, throughput 2.61453K wps
[Epoch 79 Batch 90/172] avg loss 0.0021629, throughput 2.80494K wps
[Epoch 79 Batch 120/172] avg loss 0.00210286, throughput 2.73579K wps
[Epoch 79 Batch 150/172] avg loss 0.00228468, throughput 2.78261K wps
Begin Testing...
[Epoch 79] train avg loss 0.00211854, dev acc 0.8627, dev avg loss 0.388365, throughput 2.72003K wps
[Epoch 80 Batch 30/172] avg loss 0.00214771, throughput 2.69779K wps
[Epoch 80 Batch 60/172] avg loss 0.00188688, throughput 2.69431K wps
[Epoch 80 Batch 90/172] avg loss 0.00207307, throughput 2.8157K wps
[Epoch 80 Batch 120/172] avg loss 0.00213332, throughput 2.79218K wps
[Epoch 80 Batch 150/172] avg loss 0.00241133, throughput 2.77828K wps
Begin Testing...
[Epoch 80] train avg loss 0.00209974, dev acc 0.8627, dev avg loss 0.384726, throughput 2.74784K wps
[Epoch 81 Batch 30/172] avg loss 0.00208944, throughput 2.81371K wps
[Epoch 81 Batch 60/172] avg loss 0.00186493, throughput 2.77444K wps
[Epoch 81 Batch 90/172] avg loss 0.00227357, throughput 2.72812K wps
[Epoch 81 Batch 120/172] avg loss 0.00190915, throughput 2.77533K wps
[Epoch 81 Batch 150/172] avg loss 0.00206847, throughput 2.75701K wps
Begin Testing...
[Epoch 81] train avg loss 0.00206147, dev acc 0.8543, dev avg loss 0.375611, throughput 2.77184K wps
[Epoch 82 Batch 30/172] avg loss 0.00215131, throughput 2.89401K wps
[Epoch 82 Batch 60/172] avg loss 0.00179437, throughput 2.83776K wps
[Epoch 82 Batch 90/172] avg loss 0.001919, throughput 2.84347K wps
[Epoch 82 Batch 120/172] avg loss 0.00233269, throughput 2.77955K wps
[Epoch 82 Batch 150/172] avg loss 0.00182539, throughput 2.79875K wps
Begin Testing...
[Epoch 82] train avg loss 0.00204895, dev acc 0.8585, dev avg loss 0.387332, throughput 2.82368K wps
[Epoch 83 Batch 30/172] avg loss 0.00168998, throughput 2.80939K wps
[Epoch 83 Batch 60/172] avg loss 0.00203243, throughput 2.80316K wps
[Epoch 83 Batch 90/172] avg loss 0.00209701, throughput 2.79168K wps
[Epoch 83 Batch 120/172] avg loss 0.00193825, throughput 2.7954K wps
[Epoch 83 Batch 150/172] avg loss 0.00194703, throughput 2.79755K wps
Begin Testing...
[Epoch 83] train avg loss 0.00199476, dev acc 0.8595, dev avg loss 0.38654, throughput 2.79817K wps
[Epoch 84 Batch 30/172] avg loss 0.00204851, throughput 2.84215K wps
[Epoch 84 Batch 60/172] avg loss 0.00193269, throughput 2.78533K wps
[Epoch 84 Batch 90/172] avg loss 0.00190559, throughput 2.78061K wps
[Epoch 84 Batch 120/172] avg loss 0.00199694, throughput 2.79627K wps
[Epoch 84 Batch 150/172] avg loss 0.00176606, throughput 2.79077K wps
Begin Testing...
[Epoch 84] train avg loss 0.00196213, dev acc 0.8585, dev avg loss 0.389968, throughput 2.78474K wps
[Epoch 85 Batch 30/172] avg loss 0.00179558, throughput 2.7999K wps
[Epoch 85 Batch 60/172] avg loss 0.00174818, throughput 2.77005K wps
[Epoch 85 Batch 90/172] avg loss 0.00196641, throughput 2.74127K wps
[Epoch 85 Batch 120/172] avg loss 0.00201874, throughput 2.80226K wps
[Epoch 85 Batch 150/172] avg loss 0.00198079, throughput 2.83992K wps
Begin Testing...
[Epoch 85] train avg loss 0.00191404, dev acc 0.8595, dev avg loss 0.39879, throughput 2.79224K wps
[Epoch 86 Batch 30/172] avg loss 0.00210144, throughput 2.87973K wps
[Epoch 86 Batch 60/172] avg loss 0.00174698, throughput 2.78588K wps
[Epoch 86 Batch 90/172] avg loss 0.00162808, throughput 2.7444K wps
[Epoch 86 Batch 120/172] avg loss 0.00198212, throughput 2.77991K wps
[Epoch 86 Batch 150/172] avg loss 0.00212199, throughput 2.78006K wps
Begin Testing...
[Epoch 86] train avg loss 0.00190903, dev acc 0.8585, dev avg loss 0.398281, throughput 2.78874K wps
[Epoch 87 Batch 30/172] avg loss 0.00176298, throughput 2.83358K wps
[Epoch 87 Batch 60/172] avg loss 0.00201982, throughput 2.81646K wps
[Epoch 87 Batch 90/172] avg loss 0.00179356, throughput 2.81993K wps
[Epoch 87 Batch 120/172] avg loss 0.0019059, throughput 2.77176K wps
[Epoch 87 Batch 150/172] avg loss 0.00198653, throughput 2.87428K wps
Begin Testing...
[Epoch 87] train avg loss 0.0019093, dev acc 0.8606, dev avg loss 0.398663, throughput 2.8242K wps
[Epoch 88 Batch 30/172] avg loss 0.00177893, throughput 2.74333K wps
[Epoch 88 Batch 60/172] avg loss 0.00162028, throughput 2.64998K wps
[Epoch 88 Batch 90/172] avg loss 0.00183103, throughput 2.74596K wps
[Epoch 88 Batch 120/172] avg loss 0.00179876, throughput 2.88059K wps
[Epoch 88 Batch 150/172] avg loss 0.00234452, throughput 2.82851K wps
Begin Testing...
[Epoch 88] train avg loss 0.00186147, dev acc 0.8616, dev avg loss 0.41406, throughput 2.77324K wps
[Epoch 89 Batch 30/172] avg loss 0.00150702, throughput 2.85136K wps
[Epoch 89 Batch 60/172] avg loss 0.00199992, throughput 2.8096K wps
[Epoch 89 Batch 90/172] avg loss 0.00185616, throughput 2.80563K wps
[Epoch 89 Batch 120/172] avg loss 0.00197031, throughput 2.8145K wps
[Epoch 89 Batch 150/172] avg loss 0.00179446, throughput 2.81628K wps
Begin Testing...
[Epoch 89] train avg loss 0.00185945, dev acc 0.8574, dev avg loss 0.396291, throughput 2.8089K wps
[Epoch 90 Batch 30/172] avg loss 0.00137833, throughput 2.86942K wps
[Epoch 90 Batch 60/172] avg loss 0.0018791, throughput 2.78281K wps
[Epoch 90 Batch 90/172] avg loss 0.00179993, throughput 2.7928K wps
[Epoch 90 Batch 120/172] avg loss 0.00175334, throughput 2.8268K wps
[Epoch 90 Batch 150/172] avg loss 0.00214998, throughput 2.78917K wps
Begin Testing...
[Epoch 90] train avg loss 0.00183379, dev acc 0.8595, dev avg loss 0.417134, throughput 2.79979K wps
[Epoch 91 Batch 30/172] avg loss 0.00177299, throughput 2.85622K wps
[Epoch 91 Batch 60/172] avg loss 0.00181126, throughput 2.75891K wps
[Epoch 91 Batch 90/172] avg loss 0.00179211, throughput 2.75275K wps
[Epoch 91 Batch 120/172] avg loss 0.00192307, throughput 2.82239K wps
[Epoch 91 Batch 150/172] avg loss 0.00193193, throughput 2.76841K wps
Begin Testing...
[Epoch 91] train avg loss 0.00181014, dev acc 0.8606, dev avg loss 0.422338, throughput 2.78944K wps
[Epoch 92 Batch 30/172] avg loss 0.00201558, throughput 2.73848K wps
[Epoch 92 Batch 60/172] avg loss 0.00173293, throughput 2.78716K wps
[Epoch 92 Batch 90/172] avg loss 0.00167046, throughput 2.7828K wps
[Epoch 92 Batch 120/172] avg loss 0.00175442, throughput 2.79074K wps
[Epoch 92 Batch 150/172] avg loss 0.00168036, throughput 2.79336K wps
Begin Testing...
[Epoch 92] train avg loss 0.00180432, dev acc 0.8606, dev avg loss 0.404468, throughput 2.77807K wps
[Epoch 93 Batch 30/172] avg loss 0.00200851, throughput 2.83366K wps
[Epoch 93 Batch 60/172] avg loss 0.00152573, throughput 2.80479K wps
[Epoch 93 Batch 90/172] avg loss 0.00159933, throughput 2.80468K wps
[Epoch 93 Batch 120/172] avg loss 0.00181557, throughput 2.80722K wps
[Epoch 93 Batch 150/172] avg loss 0.00186441, throughput 2.78744K wps
Begin Testing...
[Epoch 93] train avg loss 0.00174499, dev acc 0.8627, dev avg loss 0.412472, throughput 2.80434K wps
[Epoch 94 Batch 30/172] avg loss 0.00165295, throughput 2.8125K wps
[Epoch 94 Batch 60/172] avg loss 0.00174764, throughput 2.77182K wps
[Epoch 94 Batch 90/172] avg loss 0.00159943, throughput 2.76033K wps
[Epoch 94 Batch 120/172] avg loss 0.00170218, throughput 2.81432K wps
[Epoch 94 Batch 150/172] avg loss 0.00190426, throughput 2.79118K wps
Begin Testing...
[Epoch 94] train avg loss 0.00173377, dev acc 0.8595, dev avg loss 0.412491, throughput 2.77213K wps
[Epoch 95 Batch 30/172] avg loss 0.00150477, throughput 2.7752K wps
[Epoch 95 Batch 60/172] avg loss 0.00168778, throughput 2.77108K wps
[Epoch 95 Batch 90/172] avg loss 0.00156368, throughput 2.67994K wps
[Epoch 95 Batch 120/172] avg loss 0.00193814, throughput 2.79417K wps
[Epoch 95 Batch 150/172] avg loss 0.00194676, throughput 2.72393K wps
Begin Testing...
[Epoch 95] train avg loss 0.00171249, dev acc 0.8595, dev avg loss 0.414087, throughput 2.75398K wps
[Epoch 96 Batch 30/172] avg loss 0.00166849, throughput 2.78878K wps
[Epoch 96 Batch 60/172] avg loss 0.00165689, throughput 2.77082K wps
[Epoch 96 Batch 90/172] avg loss 0.00169572, throughput 2.79876K wps
[Epoch 96 Batch 120/172] avg loss 0.00175666, throughput 2.74591K wps
[Epoch 96 Batch 150/172] avg loss 0.00159605, throughput 2.73983K wps
Begin Testing...
[Epoch 96] train avg loss 0.00173145, dev acc 0.8501, dev avg loss 0.404928, throughput 2.76419K wps
[Epoch 97 Batch 30/172] avg loss 0.00173227, throughput 2.77933K wps
[Epoch 97 Batch 60/172] avg loss 0.00172322, throughput 2.71707K wps
[Epoch 97 Batch 90/172] avg loss 0.00163657, throughput 2.74945K wps
[Epoch 97 Batch 120/172] avg loss 0.00165267, throughput 2.78631K wps
[Epoch 97 Batch 150/172] avg loss 0.00179446, throughput 2.756K wps
Begin Testing...
[Epoch 97] train avg loss 0.00171341, dev acc 0.8616, dev avg loss 0.45167, throughput 2.7565K wps
[Epoch 98 Batch 30/172] avg loss 0.00136678, throughput 2.79065K wps
[Epoch 98 Batch 60/172] avg loss 0.00163099, throughput 2.763K wps
[Epoch 98 Batch 90/172] avg loss 0.0016605, throughput 2.7561K wps
[Epoch 98 Batch 120/172] avg loss 0.00173158, throughput 2.73594K wps
[Epoch 98 Batch 150/172] avg loss 0.00176538, throughput 2.77199K wps
Begin Testing...
[Epoch 98] train avg loss 0.00167226, dev acc 0.8648, dev avg loss 0.431758, throughput 2.7676K wps
[Epoch 99 Batch 30/172] avg loss 0.00138681, throughput 2.7084K wps
[Epoch 99 Batch 60/172] avg loss 0.00168882, throughput 2.76076K wps
[Epoch 99 Batch 90/172] avg loss 0.00163138, throughput 2.7962K wps
[Epoch 99 Batch 120/172] avg loss 0.00177695, throughput 2.73113K wps
[Epoch 99 Batch 150/172] avg loss 0.00179285, throughput 2.66628K wps
Begin Testing...
[Epoch 99] train avg loss 0.00162942, dev acc 0.8637, dev avg loss 0.428132, throughput 2.74473K wps
[Epoch 100 Batch 30/172] avg loss 0.00150338, throughput 2.85004K wps
[Epoch 100 Batch 60/172] avg loss 0.0017442, throughput 2.71969K wps
[Epoch 100 Batch 90/172] avg loss 0.00186082, throughput 2.83K wps
[Epoch 100 Batch 120/172] avg loss 0.00163796, throughput 2.80356K wps
[Epoch 100 Batch 150/172] avg loss 0.00163825, throughput 2.77207K wps
Begin Testing...
[Epoch 100] train avg loss 0.00165871, dev acc 0.8585, dev avg loss 0.416576, throughput 2.79617K wps
[Epoch 101 Batch 30/172] avg loss 0.00150254, throughput 2.82397K wps
[Epoch 101 Batch 60/172] avg loss 0.00168317, throughput 2.75405K wps
[Epoch 101 Batch 90/172] avg loss 0.00155392, throughput 2.81873K wps
[Epoch 101 Batch 120/172] avg loss 0.00166081, throughput 2.76406K wps
[Epoch 101 Batch 150/172] avg loss 0.00173959, throughput 2.68682K wps
Begin Testing...
[Epoch 101] train avg loss 0.00163614, dev acc 0.8585, dev avg loss 0.433231, throughput 2.77878K wps
[Epoch 102 Batch 30/172] avg loss 0.00123633, throughput 2.84857K wps
[Epoch 102 Batch 60/172] avg loss 0.00157843, throughput 2.79221K wps
[Epoch 102 Batch 90/172] avg loss 0.00158149, throughput 2.82884K wps
[Epoch 102 Batch 120/172] avg loss 0.00166579, throughput 2.79061K wps
[Epoch 102 Batch 150/172] avg loss 0.00197332, throughput 2.778K wps
Begin Testing...
[Epoch 102] train avg loss 0.00162923, dev acc 0.8616, dev avg loss 0.452404, throughput 2.80931K wps
[Epoch 103 Batch 30/172] avg loss 0.00149525, throughput 2.88699K wps
[Epoch 103 Batch 60/172] avg loss 0.00131503, throughput 2.82512K wps
[Epoch 103 Batch 90/172] avg loss 0.00158131, throughput 2.83016K wps
[Epoch 103 Batch 120/172] avg loss 0.00151078, throughput 2.81759K wps
[Epoch 103 Batch 150/172] avg loss 0.00161391, throughput 2.80104K wps
Begin Testing...
[Epoch 103] train avg loss 0.00159748, dev acc 0.8595, dev avg loss 0.424343, throughput 2.83241K wps
[Epoch 104 Batch 30/172] avg loss 0.00140247, throughput 2.88363K wps
[Epoch 104 Batch 60/172] avg loss 0.00151131, throughput 2.82215K wps
[Epoch 104 Batch 90/172] avg loss 0.00154046, throughput 2.82411K wps
[Epoch 104 Batch 120/172] avg loss 0.00158382, throughput 2.73168K wps
[Epoch 104 Batch 150/172] avg loss 0.00152955, throughput 2.84386K wps
Begin Testing...
[Epoch 104] train avg loss 0.00151913, dev acc 0.8595, dev avg loss 0.483535, throughput 2.81206K wps
[Epoch 105 Batch 30/172] avg loss 0.00147812, throughput 2.7514K wps
[Epoch 105 Batch 60/172] avg loss 0.00151491, throughput 2.80619K wps
[Epoch 105 Batch 90/172] avg loss 0.00141918, throughput 2.77497K wps
[Epoch 105 Batch 120/172] avg loss 0.00184541, throughput 2.75758K wps
[Epoch 105 Batch 150/172] avg loss 0.00158473, throughput 2.80011K wps
Begin Testing...
[Epoch 105] train avg loss 0.0015578, dev acc 0.8606, dev avg loss 0.426482, throughput 2.76979K wps
[Epoch 106 Batch 30/172] avg loss 0.0011922, throughput 2.84135K wps
[Epoch 106 Batch 60/172] avg loss 0.00146357, throughput 2.78143K wps
[Epoch 106 Batch 90/172] avg loss 0.00170464, throughput 2.70347K wps
[Epoch 106 Batch 120/172] avg loss 0.00155412, throughput 2.73988K wps
[Epoch 106 Batch 150/172] avg loss 0.00169653, throughput 2.77125K wps
Begin Testing...
[Epoch 106] train avg loss 0.00157688, dev acc 0.8627, dev avg loss 0.461055, throughput 2.76288K wps
[Epoch 107 Batch 30/172] avg loss 0.00140659, throughput 2.78868K wps
[Epoch 107 Batch 60/172] avg loss 0.0014582, throughput 2.82284K wps
[Epoch 107 Batch 90/172] avg loss 0.0014401, throughput 2.7447K wps
[Epoch 107 Batch 120/172] avg loss 0.00157663, throughput 2.7468K wps
[Epoch 107 Batch 150/172] avg loss 0.00163926, throughput 2.79485K wps
Begin Testing...
[Epoch 107] train avg loss 0.00156163, dev acc 0.8616, dev avg loss 0.450995, throughput 2.77888K wps
[Epoch 108 Batch 30/172] avg loss 0.0013832, throughput 2.84691K wps
[Epoch 108 Batch 60/172] avg loss 0.00167671, throughput 2.78046K wps
[Epoch 108 Batch 90/172] avg loss 0.00149728, throughput 2.78414K wps
[Epoch 108 Batch 120/172] avg loss 0.00167762, throughput 2.65964K wps
[Epoch 108 Batch 150/172] avg loss 0.00154374, throughput 2.81757K wps
Begin Testing...
[Epoch 108] train avg loss 0.00153777, dev acc 0.8616, dev avg loss 0.476178, throughput 2.77753K wps
[Epoch 109 Batch 30/172] avg loss 0.00150274, throughput 2.79282K wps
[Epoch 109 Batch 60/172] avg loss 0.0014819, throughput 2.77272K wps
[Epoch 109 Batch 90/172] avg loss 0.00164037, throughput 2.77927K wps
[Epoch 109 Batch 120/172] avg loss 0.00143464, throughput 2.79043K wps
[Epoch 109 Batch 150/172] avg loss 0.00136884, throughput 2.81509K wps
Begin Testing...
[Epoch 109] train avg loss 0.00153885, dev acc 0.8658, dev avg loss 0.467544, throughput 2.79315K wps
[Epoch 110 Batch 30/172] avg loss 0.00155154, throughput 2.77423K wps
[Epoch 110 Batch 60/172] avg loss 0.00134196, throughput 2.79456K wps
[Epoch 110 Batch 90/172] avg loss 0.00149724, throughput 2.78963K wps
[Epoch 110 Batch 120/172] avg loss 0.00140146, throughput 2.69684K wps
[Epoch 110 Batch 150/172] avg loss 0.00170149, throughput 2.79259K wps
Begin Testing...
[Epoch 110] train avg loss 0.00150517, dev acc 0.8574, dev avg loss 0.431552, throughput 2.77109K wps
[Epoch 111 Batch 30/172] avg loss 0.0014007, throughput 2.80389K wps
[Epoch 111 Batch 60/172] avg loss 0.00148468, throughput 2.6319K wps
[Epoch 111 Batch 90/172] avg loss 0.0016877, throughput 2.64697K wps
[Epoch 111 Batch 120/172] avg loss 0.00143828, throughput 2.76296K wps
[Epoch 111 Batch 150/172] avg loss 0.00153545, throughput 2.82427K wps
Begin Testing...
[Epoch 111] train avg loss 0.00148996, dev acc 0.8627, dev avg loss 0.448375, throughput 2.73136K wps
[Epoch 112 Batch 30/172] avg loss 0.00137281, throughput 2.85548K wps
[Epoch 112 Batch 60/172] avg loss 0.00148362, throughput 2.7965K wps
[Epoch 112 Batch 90/172] avg loss 0.00141073, throughput 2.83723K wps
[Epoch 112 Batch 120/172] avg loss 0.00152416, throughput 2.75891K wps
[Epoch 112 Batch 150/172] avg loss 0.00154706, throughput 2.79333K wps
Begin Testing...
[Epoch 112] train avg loss 0.00148276, dev acc 0.8627, dev avg loss 0.447057, throughput 2.81K wps
[Epoch 113 Batch 30/172] avg loss 0.00164607, throughput 2.7562K wps
[Epoch 113 Batch 60/172] avg loss 0.00134454, throughput 2.75411K wps
[Epoch 113 Batch 90/172] avg loss 0.00143952, throughput 2.83528K wps
[Epoch 113 Batch 120/172] avg loss 0.00156216, throughput 2.82869K wps
[Epoch 113 Batch 150/172] avg loss 0.00145512, throughput 2.82623K wps
Begin Testing...
[Epoch 113] train avg loss 0.00145751, dev acc 0.8648, dev avg loss 0.467216, throughput 2.80007K wps
[Epoch 114 Batch 30/172] avg loss 0.00158613, throughput 2.85527K wps
[Epoch 114 Batch 60/172] avg loss 0.00130753, throughput 2.82301K wps
[Epoch 114 Batch 90/172] avg loss 0.00151165, throughput 2.81546K wps
[Epoch 114 Batch 120/172] avg loss 0.00154679, throughput 2.74254K wps
[Epoch 114 Batch 150/172] avg loss 0.00162395, throughput 2.84011K wps
Begin Testing...
[Epoch 114] train avg loss 0.00151442, dev acc 0.8669, dev avg loss 0.448799, throughput 2.81438K wps
[Epoch 115 Batch 30/172] avg loss 0.00124387, throughput 2.76436K wps
[Epoch 115 Batch 60/172] avg loss 0.00145629, throughput 2.87927K wps
[Epoch 115 Batch 90/172] avg loss 0.00156109, throughput 2.73194K wps
[Epoch 115 Batch 120/172] avg loss 0.00136339, throughput 2.86961K wps
[Epoch 115 Batch 150/172] avg loss 0.001256, throughput 2.79946K wps
Begin Testing...
[Epoch 115] train avg loss 0.001429, dev acc 0.8669, dev avg loss 0.470947, throughput 2.80435K wps
[Epoch 116 Batch 30/172] avg loss 0.00125989, throughput 2.8565K wps
[Epoch 116 Batch 60/172] avg loss 0.00147412, throughput 2.80776K wps
[Epoch 116 Batch 90/172] avg loss 0.00154624, throughput 2.80025K wps
[Epoch 116 Batch 120/172] avg loss 0.0014178, throughput 2.77978K wps
[Epoch 116 Batch 150/172] avg loss 0.00129099, throughput 2.79942K wps
Begin Testing...
[Epoch 116] train avg loss 0.00142636, dev acc 0.8648, dev avg loss 0.478401, throughput 2.80997K wps
[Epoch 117 Batch 30/172] avg loss 0.00157651, throughput 2.80891K wps
[Epoch 117 Batch 60/172] avg loss 0.00121471, throughput 2.78609K wps
[Epoch 117 Batch 90/172] avg loss 0.00135994, throughput 2.78556K wps
[Epoch 117 Batch 120/172] avg loss 0.00143175, throughput 2.7637K wps
[Epoch 117 Batch 150/172] avg loss 0.00140175, throughput 2.77838K wps
Begin Testing...
[Epoch 117] train avg loss 0.00140145, dev acc 0.8669, dev avg loss 0.466166, throughput 2.78789K wps
[Epoch 118 Batch 30/172] avg loss 0.00138297, throughput 2.82867K wps
[Epoch 118 Batch 60/172] avg loss 0.00132947, throughput 2.78359K wps
[Epoch 118 Batch 90/172] avg loss 0.00161737, throughput 2.84148K wps
[Epoch 118 Batch 120/172] avg loss 0.00126392, throughput 2.82405K wps
[Epoch 118 Batch 150/172] avg loss 0.00127401, throughput 2.81848K wps
Begin Testing...
[Epoch 118] train avg loss 0.00141469, dev acc 0.8616, dev avg loss 0.456481, throughput 2.81733K wps
[Epoch 119 Batch 30/172] avg loss 0.00120293, throughput 2.73795K wps
[Epoch 119 Batch 60/172] avg loss 0.00123296, throughput 2.79516K wps
[Epoch 119 Batch 90/172] avg loss 0.00153363, throughput 2.83222K wps
[Epoch 119 Batch 120/172] avg loss 0.00129215, throughput 2.79571K wps
[Epoch 119 Batch 150/172] avg loss 0.00159474, throughput 2.75704K wps
Begin Testing...
[Epoch 119] train avg loss 0.00139234, dev acc 0.8648, dev avg loss 0.478834, throughput 2.78282K wps
[Epoch 120 Batch 30/172] avg loss 0.0011614, throughput 2.81581K wps
[Epoch 120 Batch 60/172] avg loss 0.00129188, throughput 2.69522K wps
[Epoch 120 Batch 90/172] avg loss 0.00138962, throughput 2.76241K wps
[Epoch 120 Batch 120/172] avg loss 0.00152041, throughput 2.65289K wps
[Epoch 120 Batch 150/172] avg loss 0.00155472, throughput 2.81426K wps
Begin Testing...
[Epoch 120] train avg loss 0.00137501, dev acc 0.8616, dev avg loss 0.510579, throughput 2.74806K wps
[Epoch 121 Batch 30/172] avg loss 0.00151002, throughput 2.75063K wps
[Epoch 121 Batch 60/172] avg loss 0.00139521, throughput 2.68538K wps
[Epoch 121 Batch 90/172] avg loss 0.0012198, throughput 2.78504K wps
[Epoch 121 Batch 120/172] avg loss 0.00143061, throughput 2.77648K wps
[Epoch 121 Batch 150/172] avg loss 0.00146451, throughput 2.74578K wps
Begin Testing...
[Epoch 121] train avg loss 0.00140906, dev acc 0.8585, dev avg loss 0.452227, throughput 2.7398K wps
[Epoch 122 Batch 30/172] avg loss 0.00112676, throughput 2.79737K wps
[Epoch 122 Batch 60/172] avg loss 0.00133522, throughput 2.7586K wps
[Epoch 122 Batch 90/172] avg loss 0.00124262, throughput 2.76092K wps
[Epoch 122 Batch 120/172] avg loss 0.00164194, throughput 2.74589K wps
[Epoch 122 Batch 150/172] avg loss 0.00142541, throughput 2.7586K wps
Begin Testing...
[Epoch 122] train avg loss 0.00136359, dev acc 0.8679, dev avg loss 0.475732, throughput 2.761K wps
[Epoch 123 Batch 30/172] avg loss 0.00145175, throughput 2.75591K wps
[Epoch 123 Batch 60/172] avg loss 0.00129927, throughput 2.78239K wps
[Epoch 123 Batch 90/172] avg loss 0.00141649, throughput 2.74215K wps
[Epoch 123 Batch 120/172] avg loss 0.00139509, throughput 2.75917K wps
[Epoch 123 Batch 150/172] avg loss 0.00124158, throughput 2.79076K wps
Begin Testing...
[Epoch 123] train avg loss 0.00136176, dev acc 0.8627, dev avg loss 0.465717, throughput 2.76865K wps
[Epoch 124 Batch 30/172] avg loss 0.00121513, throughput 2.85265K wps
[Epoch 124 Batch 60/172] avg loss 0.00147399, throughput 2.81671K wps
[Epoch 124 Batch 90/172] avg loss 0.00119587, throughput 2.76764K wps
[Epoch 124 Batch 120/172] avg loss 0.0012317, throughput 2.793K wps
[Epoch 124 Batch 150/172] avg loss 0.00145384, throughput 2.78076K wps
Begin Testing...
[Epoch 124] train avg loss 0.00132305, dev acc 0.8627, dev avg loss 0.48501, throughput 2.80085K wps
[Epoch 125 Batch 30/172] avg loss 0.0013192, throughput 2.83015K wps
[Epoch 125 Batch 60/172] avg loss 0.00137806, throughput 2.77035K wps
[Epoch 125 Batch 90/172] avg loss 0.0013106, throughput 2.83351K wps
[Epoch 125 Batch 120/172] avg loss 0.00103137, throughput 2.8182K wps
[Epoch 125 Batch 150/172] avg loss 0.00139583, throughput 2.8174K wps
Begin Testing...
[Epoch 125] train avg loss 0.00130653, dev acc 0.8648, dev avg loss 0.467836, throughput 2.8026K wps
[Epoch 126 Batch 30/172] avg loss 0.00131396, throughput 2.83926K wps
[Epoch 126 Batch 60/172] avg loss 0.00117792, throughput 2.80078K wps
[Epoch 126 Batch 90/172] avg loss 0.0012977, throughput 2.8168K wps
[Epoch 126 Batch 120/172] avg loss 0.00149862, throughput 2.67983K wps
[Epoch 126 Batch 150/172] avg loss 0.00133078, throughput 2.84485K wps
Begin Testing...
[Epoch 126] train avg loss 0.00131052, dev acc 0.8553, dev avg loss 0.457139, throughput 2.79558K wps
[Epoch 127 Batch 30/172] avg loss 0.00119142, throughput 2.88535K wps
[Epoch 127 Batch 60/172] avg loss 0.00126306, throughput 2.79339K wps
[Epoch 127 Batch 90/172] avg loss 0.0011928, throughput 2.85412K wps
[Epoch 127 Batch 120/172] avg loss 0.00121895, throughput 2.80982K wps
[Epoch 127 Batch 150/172] avg loss 0.00160088, throughput 2.79011K wps
Begin Testing...
[Epoch 127] train avg loss 0.00132275, dev acc 0.8658, dev avg loss 0.481337, throughput 2.82217K wps