Namespace(batch_size=50, data_name='SST-2', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 53
Done! Tokenizing Time=0.88s, #Sentences=76961
Done! Tokenizing Time=0.03s, #Sentences=1821
Done! Tokenizing Time=0.01s, #Sentences=872
SentimentNet(
  (embedding): Embedding(17244 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/1540] avg loss 0.0138731, throughput 0.740285K wps
[Epoch 0 Batch 60/1540] avg loss 0.013804, throughput 2.82758K wps
[Epoch 0 Batch 90/1540] avg loss 0.0137744, throughput 2.76342K wps
[Epoch 0 Batch 120/1540] avg loss 0.0137246, throughput 2.83075K wps
[Epoch 0 Batch 150/1540] avg loss 0.0137565, throughput 2.8483K wps
[Epoch 0 Batch 180/1540] avg loss 0.0136401, throughput 2.77658K wps
[Epoch 0 Batch 210/1540] avg loss 0.0135375, throughput 2.76395K wps
[Epoch 0 Batch 240/1540] avg loss 0.0136515, throughput 2.8218K wps
[Epoch 0 Batch 270/1540] avg loss 0.0134824, throughput 2.82323K wps
[Epoch 0 Batch 300/1540] avg loss 0.0133838, throughput 2.76673K wps
[Epoch 0 Batch 330/1540] avg loss 0.0134695, throughput 2.7997K wps
[Epoch 0 Batch 360/1540] avg loss 0.0133678, throughput 2.84522K wps
[Epoch 0 Batch 390/1540] avg loss 0.0134588, throughput 2.83796K wps
[Epoch 0 Batch 420/1540] avg loss 0.0131752, throughput 2.84253K wps
[Epoch 0 Batch 450/1540] avg loss 0.0131372, throughput 2.8436K wps
[Epoch 0 Batch 480/1540] avg loss 0.0132807, throughput 2.84514K wps
[Epoch 0 Batch 510/1540] avg loss 0.013217, throughput 2.83586K wps
[Epoch 0 Batch 540/1540] avg loss 0.0130389, throughput 2.82719K wps
[Epoch 0 Batch 570/1540] avg loss 0.0130783, throughput 2.78674K wps
[Epoch 0 Batch 600/1540] avg loss 0.0128821, throughput 2.79215K wps
[Epoch 0 Batch 630/1540] avg loss 0.012887, throughput 2.76447K wps
[Epoch 0 Batch 660/1540] avg loss 0.0129701, throughput 2.82761K wps
[Epoch 0 Batch 690/1540] avg loss 0.0128866, throughput 2.76208K wps
[Epoch 0 Batch 720/1540] avg loss 0.0127259, throughput 2.8017K wps
[Epoch 0 Batch 750/1540] avg loss 0.0128263, throughput 2.84323K wps
[Epoch 0 Batch 780/1540] avg loss 0.0128494, throughput 2.80767K wps
[Epoch 0 Batch 810/1540] avg loss 0.0125534, throughput 2.84531K wps
[Epoch 0 Batch 840/1540] avg loss 0.012692, throughput 2.84407K wps
[Epoch 0 Batch 870/1540] avg loss 0.0125014, throughput 2.77246K wps
[Epoch 0 Batch 900/1540] avg loss 0.0126015, throughput 2.748K wps
[Epoch 0 Batch 930/1540] avg loss 0.0125336, throughput 2.75852K wps
[Epoch 0 Batch 960/1540] avg loss 0.0123933, throughput 2.78244K wps
[Epoch 0 Batch 990/1540] avg loss 0.0124793, throughput 2.80191K wps
[Epoch 0 Batch 1020/1540] avg loss 0.0122712, throughput 2.79577K wps
[Epoch 0 Batch 1050/1540] avg loss 0.0121843, throughput 2.83849K wps
[Epoch 0 Batch 1080/1540] avg loss 0.0122296, throughput 2.84532K wps
[Epoch 0 Batch 1110/1540] avg loss 0.0121444, throughput 2.80751K wps
[Epoch 0 Batch 1140/1540] avg loss 0.0120085, throughput 2.75181K wps
[Epoch 0 Batch 1170/1540] avg loss 0.0120781, throughput 2.8277K wps
[Epoch 0 Batch 1200/1540] avg loss 0.0120245, throughput 2.84675K wps
[Epoch 0 Batch 1230/1540] avg loss 0.0120974, throughput 2.83822K wps
[Epoch 0 Batch 1260/1540] avg loss 0.0119134, throughput 2.83829K wps
[Epoch 0 Batch 1290/1540] avg loss 0.0117129, throughput 2.81373K wps
[Epoch 0 Batch 1320/1540] avg loss 0.0118652, throughput 2.82253K wps
[Epoch 0 Batch 1350/1540] avg loss 0.0116027, throughput 2.84118K wps
[Epoch 0 Batch 1380/1540] avg loss 0.0115631, throughput 2.83108K wps
[Epoch 0 Batch 1410/1540] avg loss 0.0114917, throughput 2.82823K wps
[Epoch 0 Batch 1440/1540] avg loss 0.011658, throughput 2.84457K wps
[Epoch 0 Batch 1470/1540] avg loss 0.0114241, throughput 2.7956K wps
[Epoch 0 Batch 1500/1540] avg loss 0.0114143, throughput 2.81744K wps
[Epoch 0 Batch 1530/1540] avg loss 0.0112091, throughput 2.77831K wps
Begin Testing...
[Epoch 0] train avg loss 0.0126735, dev acc 0.7695, dev avg loss 0.568902, throughput 2.57861K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 1 Batch 30/1540] avg loss 0.0110631, throughput 2.82143K wps
[Epoch 1 Batch 60/1540] avg loss 0.0110126, throughput 2.75633K wps
[Epoch 1 Batch 90/1540] avg loss 0.0109376, throughput 2.75374K wps
[Epoch 1 Batch 120/1540] avg loss 0.0106767, throughput 2.79305K wps
[Epoch 1 Batch 150/1540] avg loss 0.0108214, throughput 2.78207K wps
[Epoch 1 Batch 180/1540] avg loss 0.0108471, throughput 2.82535K wps
[Epoch 1 Batch 210/1540] avg loss 0.0108645, throughput 2.84781K wps
[Epoch 1 Batch 240/1540] avg loss 0.0106732, throughput 2.83708K wps
[Epoch 1 Batch 270/1540] avg loss 0.0104913, throughput 2.81022K wps
[Epoch 1 Batch 300/1540] avg loss 0.0106523, throughput 2.83896K wps
[Epoch 1 Batch 330/1540] avg loss 0.0104403, throughput 2.84723K wps
[Epoch 1 Batch 360/1540] avg loss 0.0103332, throughput 2.82321K wps
[Epoch 1 Batch 390/1540] avg loss 0.0105556, throughput 2.77974K wps
[Epoch 1 Batch 420/1540] avg loss 0.0102342, throughput 2.75196K wps
[Epoch 1 Batch 450/1540] avg loss 0.0103945, throughput 2.84025K wps
[Epoch 1 Batch 480/1540] avg loss 0.00997052, throughput 2.81207K wps
[Epoch 1 Batch 510/1540] avg loss 0.0102112, throughput 2.80379K wps
[Epoch 1 Batch 540/1540] avg loss 0.00995837, throughput 2.84143K wps
[Epoch 1 Batch 570/1540] avg loss 0.00998, throughput 2.83346K wps
[Epoch 1 Batch 600/1540] avg loss 0.00985854, throughput 2.83426K wps
[Epoch 1 Batch 630/1540] avg loss 0.00981498, throughput 2.8374K wps
[Epoch 1 Batch 660/1540] avg loss 0.00962526, throughput 2.84628K wps
[Epoch 1 Batch 690/1540] avg loss 0.00947853, throughput 2.84196K wps
[Epoch 1 Batch 720/1540] avg loss 0.00981046, throughput 2.79852K wps
[Epoch 1 Batch 750/1540] avg loss 0.00991648, throughput 2.83888K wps
[Epoch 1 Batch 780/1540] avg loss 0.00950724, throughput 2.8308K wps
[Epoch 1 Batch 810/1540] avg loss 0.00960921, throughput 2.83367K wps
[Epoch 1 Batch 840/1540] avg loss 0.00962353, throughput 2.80737K wps
[Epoch 1 Batch 870/1540] avg loss 0.00952186, throughput 2.84285K wps
[Epoch 1 Batch 900/1540] avg loss 0.00952161, throughput 2.83325K wps
[Epoch 1 Batch 930/1540] avg loss 0.00975803, throughput 2.84156K wps
[Epoch 1 Batch 960/1540] avg loss 0.00901519, throughput 2.84162K wps
[Epoch 1 Batch 990/1540] avg loss 0.00927324, throughput 2.84285K wps
[Epoch 1 Batch 1020/1540] avg loss 0.00905527, throughput 2.84115K wps
[Epoch 1 Batch 1050/1540] avg loss 0.00880264, throughput 2.84458K wps
[Epoch 1 Batch 1080/1540] avg loss 0.00933026, throughput 2.84655K wps
[Epoch 1 Batch 1110/1540] avg loss 0.00892712, throughput 2.84498K wps
[Epoch 1 Batch 1140/1540] avg loss 0.00908068, throughput 2.77923K wps
[Epoch 1 Batch 1170/1540] avg loss 0.00882108, throughput 2.81561K wps
[Epoch 1 Batch 1200/1540] avg loss 0.008659, throughput 2.84511K wps
[Epoch 1 Batch 1230/1540] avg loss 0.00884134, throughput 2.7662K wps
[Epoch 1 Batch 1260/1540] avg loss 0.00876531, throughput 2.80655K wps
[Epoch 1 Batch 1290/1540] avg loss 0.00843336, throughput 2.8286K wps
[Epoch 1 Batch 1320/1540] avg loss 0.00872508, throughput 2.82533K wps
[Epoch 1 Batch 1350/1540] avg loss 0.00895164, throughput 2.75756K wps
[Epoch 1 Batch 1380/1540] avg loss 0.00836612, throughput 2.7535K wps
[Epoch 1 Batch 1410/1540] avg loss 0.00812081, throughput 2.75101K wps
[Epoch 1 Batch 1440/1540] avg loss 0.00865269, throughput 2.82781K wps
[Epoch 1 Batch 1470/1540] avg loss 0.00856407, throughput 2.78925K wps
[Epoch 1 Batch 1500/1540] avg loss 0.00844483, throughput 2.82981K wps
[Epoch 1 Batch 1530/1540] avg loss 0.00853974, throughput 2.79599K wps
Begin Testing...
[Epoch 1] train avg loss 0.00964274, dev acc 0.8188, dev avg loss 0.444348, throughput 2.81574K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 2 Batch 30/1540] avg loss 0.00818177, throughput 2.82178K wps
[Epoch 2 Batch 60/1540] avg loss 0.008221, throughput 2.79026K wps
[Epoch 2 Batch 90/1540] avg loss 0.00843351, throughput 2.82738K wps
[Epoch 2 Batch 120/1540] avg loss 0.00849677, throughput 2.84514K wps
[Epoch 2 Batch 150/1540] avg loss 0.00808841, throughput 2.84661K wps
[Epoch 2 Batch 180/1540] avg loss 0.00823058, throughput 2.8396K wps
[Epoch 2 Batch 210/1540] avg loss 0.00798294, throughput 2.83631K wps
[Epoch 2 Batch 240/1540] avg loss 0.00830489, throughput 2.80703K wps
[Epoch 2 Batch 270/1540] avg loss 0.00864171, throughput 2.84102K wps
[Epoch 2 Batch 300/1540] avg loss 0.00823962, throughput 2.83228K wps
[Epoch 2 Batch 330/1540] avg loss 0.00800677, throughput 2.84485K wps
[Epoch 2 Batch 360/1540] avg loss 0.00794529, throughput 2.78546K wps
[Epoch 2 Batch 390/1540] avg loss 0.00822323, throughput 2.79183K wps
[Epoch 2 Batch 420/1540] avg loss 0.00777398, throughput 2.7759K wps
[Epoch 2 Batch 450/1540] avg loss 0.00773781, throughput 2.7665K wps
[Epoch 2 Batch 480/1540] avg loss 0.00753834, throughput 2.74395K wps
[Epoch 2 Batch 510/1540] avg loss 0.00790738, throughput 2.82738K wps
[Epoch 2 Batch 540/1540] avg loss 0.00766277, throughput 2.83038K wps
[Epoch 2 Batch 570/1540] avg loss 0.0079503, throughput 2.84075K wps
[Epoch 2 Batch 600/1540] avg loss 0.00767657, throughput 2.84514K wps
[Epoch 2 Batch 630/1540] avg loss 0.0081198, throughput 2.84201K wps
[Epoch 2 Batch 660/1540] avg loss 0.00799797, throughput 2.84325K wps
[Epoch 2 Batch 690/1540] avg loss 0.00756495, throughput 2.8427K wps
[Epoch 2 Batch 720/1540] avg loss 0.00801837, throughput 2.79144K wps
[Epoch 2 Batch 750/1540] avg loss 0.00786093, throughput 2.79745K wps
[Epoch 2 Batch 780/1540] avg loss 0.00758438, throughput 2.83218K wps
[Epoch 2 Batch 810/1540] avg loss 0.00806319, throughput 2.83987K wps
[Epoch 2 Batch 840/1540] avg loss 0.00772142, throughput 2.84249K wps
[Epoch 2 Batch 870/1540] avg loss 0.00734938, throughput 2.8283K wps
[Epoch 2 Batch 900/1540] avg loss 0.00802911, throughput 2.75281K wps
[Epoch 2 Batch 930/1540] avg loss 0.00772419, throughput 2.76998K wps
[Epoch 2 Batch 960/1540] avg loss 0.00769413, throughput 2.8393K wps
[Epoch 2 Batch 990/1540] avg loss 0.00781773, throughput 2.76975K wps
[Epoch 2 Batch 1020/1540] avg loss 0.00790665, throughput 2.84538K wps
[Epoch 2 Batch 1050/1540] avg loss 0.0078324, throughput 2.83929K wps
[Epoch 2 Batch 1080/1540] avg loss 0.00775323, throughput 2.83749K wps
[Epoch 2 Batch 1110/1540] avg loss 0.00756242, throughput 2.84694K wps
[Epoch 2 Batch 1140/1540] avg loss 0.00737152, throughput 2.82654K wps
[Epoch 2 Batch 1170/1540] avg loss 0.00750688, throughput 2.75609K wps
[Epoch 2 Batch 1200/1540] avg loss 0.00762412, throughput 2.75336K wps
[Epoch 2 Batch 1230/1540] avg loss 0.00738885, throughput 2.74863K wps
[Epoch 2 Batch 1260/1540] avg loss 0.00738919, throughput 2.78644K wps
[Epoch 2 Batch 1290/1540] avg loss 0.00760591, throughput 2.83327K wps
[Epoch 2 Batch 1320/1540] avg loss 0.00757396, throughput 2.83019K wps
[Epoch 2 Batch 1350/1540] avg loss 0.00747815, throughput 2.75181K wps
[Epoch 2 Batch 1380/1540] avg loss 0.00733852, throughput 2.80396K wps
[Epoch 2 Batch 1410/1540] avg loss 0.00702982, throughput 2.74454K wps
[Epoch 2 Batch 1440/1540] avg loss 0.0077316, throughput 2.80546K wps
[Epoch 2 Batch 1470/1540] avg loss 0.00712889, throughput 2.83939K wps
[Epoch 2 Batch 1500/1540] avg loss 0.00687012, throughput 2.8442K wps
[Epoch 2 Batch 1530/1540] avg loss 0.00718566, throughput 2.84481K wps
Begin Testing...
[Epoch 2] train avg loss 0.00778807, dev acc 0.8257, dev avg loss 0.4072, throughput 2.81287K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 3 Batch 30/1540] avg loss 0.00650701, throughput 2.90134K wps
[Epoch 3 Batch 60/1540] avg loss 0.00690751, throughput 2.84811K wps
[Epoch 3 Batch 90/1540] avg loss 0.00707867, throughput 2.77755K wps
[Epoch 3 Batch 120/1540] avg loss 0.00699722, throughput 2.81227K wps
[Epoch 3 Batch 150/1540] avg loss 0.00673549, throughput 2.81814K wps
[Epoch 3 Batch 180/1540] avg loss 0.00705919, throughput 2.73529K wps
[Epoch 3 Batch 210/1540] avg loss 0.00670478, throughput 2.77665K wps
[Epoch 3 Batch 240/1540] avg loss 0.00768604, throughput 2.82866K wps
[Epoch 3 Batch 270/1540] avg loss 0.00719327, throughput 2.74269K wps
[Epoch 3 Batch 300/1540] avg loss 0.00677073, throughput 2.834K wps
[Epoch 3 Batch 330/1540] avg loss 0.00713877, throughput 2.842K wps
[Epoch 3 Batch 360/1540] avg loss 0.00690095, throughput 2.84193K wps
[Epoch 3 Batch 390/1540] avg loss 0.0071021, throughput 2.82094K wps
[Epoch 3 Batch 420/1540] avg loss 0.00721587, throughput 2.77263K wps
[Epoch 3 Batch 450/1540] avg loss 0.00725034, throughput 2.84355K wps
[Epoch 3 Batch 480/1540] avg loss 0.00709763, throughput 2.78135K wps
[Epoch 3 Batch 510/1540] avg loss 0.00735823, throughput 2.75882K wps
[Epoch 3 Batch 540/1540] avg loss 0.00695172, throughput 2.83119K wps
[Epoch 3 Batch 570/1540] avg loss 0.0066859, throughput 2.83291K wps
[Epoch 3 Batch 600/1540] avg loss 0.00699404, throughput 2.74093K wps
[Epoch 3 Batch 630/1540] avg loss 0.00698417, throughput 2.76972K wps
[Epoch 3 Batch 660/1540] avg loss 0.00698647, throughput 2.83276K wps
[Epoch 3 Batch 690/1540] avg loss 0.00698776, throughput 2.75973K wps
[Epoch 3 Batch 720/1540] avg loss 0.00685158, throughput 2.76591K wps
[Epoch 3 Batch 750/1540] avg loss 0.00641646, throughput 2.83451K wps
[Epoch 3 Batch 780/1540] avg loss 0.00659425, throughput 2.80751K wps
[Epoch 3 Batch 810/1540] avg loss 0.00648315, throughput 2.77291K wps
[Epoch 3 Batch 840/1540] avg loss 0.00688017, throughput 2.84335K wps
[Epoch 3 Batch 870/1540] avg loss 0.00667048, throughput 2.80089K wps
[Epoch 3 Batch 900/1540] avg loss 0.00707814, throughput 2.84334K wps
[Epoch 3 Batch 930/1540] avg loss 0.00658276, throughput 2.84348K wps
[Epoch 3 Batch 960/1540] avg loss 0.00663884, throughput 2.8419K wps
[Epoch 3 Batch 990/1540] avg loss 0.00663324, throughput 2.83602K wps
[Epoch 3 Batch 1020/1540] avg loss 0.00723247, throughput 2.81221K wps
[Epoch 3 Batch 1050/1540] avg loss 0.00653339, throughput 2.79634K wps
[Epoch 3 Batch 1080/1540] avg loss 0.00675507, throughput 2.80582K wps
[Epoch 3 Batch 1110/1540] avg loss 0.00693025, throughput 2.84088K wps
[Epoch 3 Batch 1140/1540] avg loss 0.00651204, throughput 2.84062K wps
[Epoch 3 Batch 1170/1540] avg loss 0.00623055, throughput 2.78628K wps
[Epoch 3 Batch 1200/1540] avg loss 0.00633631, throughput 2.78252K wps
[Epoch 3 Batch 1230/1540] avg loss 0.00668963, throughput 2.80335K wps
[Epoch 3 Batch 1260/1540] avg loss 0.00684858, throughput 2.84077K wps
[Epoch 3 Batch 1290/1540] avg loss 0.00670825, throughput 2.84699K wps
[Epoch 3 Batch 1320/1540] avg loss 0.00688778, throughput 2.84558K wps
[Epoch 3 Batch 1350/1540] avg loss 0.00677907, throughput 2.83668K wps
[Epoch 3 Batch 1380/1540] avg loss 0.00662707, throughput 2.84857K wps
[Epoch 3 Batch 1410/1540] avg loss 0.00703374, throughput 2.78376K wps
[Epoch 3 Batch 1440/1540] avg loss 0.00648778, throughput 2.75579K wps
[Epoch 3 Batch 1470/1540] avg loss 0.00640228, throughput 2.75355K wps
[Epoch 3 Batch 1500/1540] avg loss 0.00675922, throughput 2.74651K wps
[Epoch 3 Batch 1530/1540] avg loss 0.00665724, throughput 2.78368K wps
Begin Testing...
[Epoch 3] train avg loss 0.00683756, dev acc 0.8188, dev avg loss 0.402912, throughput 2.80763K wps
[Epoch 4 Batch 30/1540] avg loss 0.0070664, throughput 2.85584K wps
[Epoch 4 Batch 60/1540] avg loss 0.00640009, throughput 2.81517K wps
[Epoch 4 Batch 90/1540] avg loss 0.00675213, throughput 2.80524K wps
[Epoch 4 Batch 120/1540] avg loss 0.00627249, throughput 2.74292K wps
[Epoch 4 Batch 150/1540] avg loss 0.00652009, throughput 2.74275K wps
[Epoch 4 Batch 180/1540] avg loss 0.00624357, throughput 2.78095K wps
[Epoch 4 Batch 210/1540] avg loss 0.00625948, throughput 2.78858K wps
[Epoch 4 Batch 240/1540] avg loss 0.00651692, throughput 2.83823K wps
[Epoch 4 Batch 270/1540] avg loss 0.00598243, throughput 2.83895K wps
[Epoch 4 Batch 300/1540] avg loss 0.0060029, throughput 2.84018K wps
[Epoch 4 Batch 330/1540] avg loss 0.00618207, throughput 2.81049K wps
[Epoch 4 Batch 360/1540] avg loss 0.00658243, throughput 2.81754K wps
[Epoch 4 Batch 390/1540] avg loss 0.00614248, throughput 2.84292K wps
[Epoch 4 Batch 420/1540] avg loss 0.00626546, throughput 2.83165K wps
[Epoch 4 Batch 450/1540] avg loss 0.00661307, throughput 2.8413K wps
[Epoch 4 Batch 480/1540] avg loss 0.00613066, throughput 2.84277K wps
[Epoch 4 Batch 510/1540] avg loss 0.00633939, throughput 2.7795K wps
[Epoch 4 Batch 540/1540] avg loss 0.00570709, throughput 2.81474K wps
[Epoch 4 Batch 570/1540] avg loss 0.00686641, throughput 2.79457K wps
[Epoch 4 Batch 600/1540] avg loss 0.00616017, throughput 2.8191K wps
[Epoch 4 Batch 630/1540] avg loss 0.00615288, throughput 2.84053K wps
[Epoch 4 Batch 660/1540] avg loss 0.00611493, throughput 2.82609K wps
[Epoch 4 Batch 690/1540] avg loss 0.00604461, throughput 2.75461K wps
[Epoch 4 Batch 720/1540] avg loss 0.00644325, throughput 2.81819K wps
[Epoch 4 Batch 750/1540] avg loss 0.00649496, throughput 2.83104K wps
[Epoch 4 Batch 780/1540] avg loss 0.00601517, throughput 2.75783K wps
[Epoch 4 Batch 810/1540] avg loss 0.00613582, throughput 2.83228K wps
[Epoch 4 Batch 840/1540] avg loss 0.00624138, throughput 2.7806K wps
[Epoch 4 Batch 870/1540] avg loss 0.00625881, throughput 2.83229K wps
[Epoch 4 Batch 900/1540] avg loss 0.00596458, throughput 2.84009K wps
[Epoch 4 Batch 930/1540] avg loss 0.0061375, throughput 2.8413K wps
[Epoch 4 Batch 960/1540] avg loss 0.00653639, throughput 2.77821K wps
[Epoch 4 Batch 990/1540] avg loss 0.00635336, throughput 2.76418K wps
[Epoch 4 Batch 1020/1540] avg loss 0.0064093, throughput 2.76565K wps
[Epoch 4 Batch 1050/1540] avg loss 0.00626846, throughput 2.77886K wps
[Epoch 4 Batch 1080/1540] avg loss 0.00590319, throughput 2.75833K wps
[Epoch 4 Batch 1110/1540] avg loss 0.00632697, throughput 2.84152K wps
[Epoch 4 Batch 1140/1540] avg loss 0.00604994, throughput 2.84226K wps
[Epoch 4 Batch 1170/1540] avg loss 0.00612059, throughput 2.8147K wps
[Epoch 4 Batch 1200/1540] avg loss 0.00615736, throughput 2.84107K wps
[Epoch 4 Batch 1230/1540] avg loss 0.00636214, throughput 2.79322K wps
[Epoch 4 Batch 1260/1540] avg loss 0.00602757, throughput 2.77119K wps
[Epoch 4 Batch 1290/1540] avg loss 0.0061409, throughput 2.74143K wps
[Epoch 4 Batch 1320/1540] avg loss 0.00584609, throughput 2.83463K wps
[Epoch 4 Batch 1350/1540] avg loss 0.00621956, throughput 2.75585K wps
[Epoch 4 Batch 1380/1540] avg loss 0.00596616, throughput 2.77082K wps
[Epoch 4 Batch 1410/1540] avg loss 0.00640039, throughput 2.78971K wps
[Epoch 4 Batch 1440/1540] avg loss 0.00584694, throughput 2.75081K wps
[Epoch 4 Batch 1470/1540] avg loss 0.00620773, throughput 2.74708K wps
[Epoch 4 Batch 1500/1540] avg loss 0.00579899, throughput 2.79835K wps
[Epoch 4 Batch 1530/1540] avg loss 0.00595611, throughput 2.84056K wps
Begin Testing...
[Epoch 4] train avg loss 0.00623336, dev acc 0.8234, dev avg loss 0.390886, throughput 2.80321K wps
[Epoch 5 Batch 30/1540] avg loss 0.00573322, throughput 2.86606K wps
[Epoch 5 Batch 60/1540] avg loss 0.00591252, throughput 2.83332K wps
[Epoch 5 Batch 90/1540] avg loss 0.00555733, throughput 2.82441K wps
[Epoch 5 Batch 120/1540] avg loss 0.00576541, throughput 2.80072K wps
[Epoch 5 Batch 150/1540] avg loss 0.0054876, throughput 2.84172K wps
[Epoch 5 Batch 180/1540] avg loss 0.00580715, throughput 2.7618K wps
[Epoch 5 Batch 210/1540] avg loss 0.00579395, throughput 2.82513K wps
[Epoch 5 Batch 240/1540] avg loss 0.00592773, throughput 2.84117K wps
[Epoch 5 Batch 270/1540] avg loss 0.00562191, throughput 2.80522K wps
[Epoch 5 Batch 300/1540] avg loss 0.0054913, throughput 2.78779K wps
[Epoch 5 Batch 330/1540] avg loss 0.00597087, throughput 2.82979K wps
[Epoch 5 Batch 360/1540] avg loss 0.00585512, throughput 2.81606K wps
[Epoch 5 Batch 390/1540] avg loss 0.00645596, throughput 2.80459K wps
[Epoch 5 Batch 420/1540] avg loss 0.00589891, throughput 2.82741K wps
[Epoch 5 Batch 450/1540] avg loss 0.0055794, throughput 2.80342K wps
[Epoch 5 Batch 480/1540] avg loss 0.00584099, throughput 2.80395K wps
[Epoch 5 Batch 510/1540] avg loss 0.00562716, throughput 2.83374K wps
[Epoch 5 Batch 540/1540] avg loss 0.00598099, throughput 2.84314K wps
[Epoch 5 Batch 570/1540] avg loss 0.00548728, throughput 2.81558K wps
[Epoch 5 Batch 600/1540] avg loss 0.00622653, throughput 2.81845K wps
[Epoch 5 Batch 630/1540] avg loss 0.00615867, throughput 2.75013K wps
[Epoch 5 Batch 660/1540] avg loss 0.00601488, throughput 2.73393K wps
[Epoch 5 Batch 690/1540] avg loss 0.00572056, throughput 2.78089K wps
[Epoch 5 Batch 720/1540] avg loss 0.00596925, throughput 2.82875K wps
[Epoch 5 Batch 750/1540] avg loss 0.00561353, throughput 2.83939K wps
[Epoch 5 Batch 780/1540] avg loss 0.00600702, throughput 2.83124K wps
[Epoch 5 Batch 810/1540] avg loss 0.00552706, throughput 2.83969K wps
[Epoch 5 Batch 840/1540] avg loss 0.00609256, throughput 2.83689K wps
[Epoch 5 Batch 870/1540] avg loss 0.00580939, throughput 2.82218K wps
[Epoch 5 Batch 900/1540] avg loss 0.00577055, throughput 2.77796K wps
[Epoch 5 Batch 930/1540] avg loss 0.00587308, throughput 2.82682K wps
[Epoch 5 Batch 960/1540] avg loss 0.00583056, throughput 2.84056K wps
[Epoch 5 Batch 990/1540] avg loss 0.00535506, throughput 2.84084K wps
[Epoch 5 Batch 1020/1540] avg loss 0.00505792, throughput 2.84189K wps
[Epoch 5 Batch 1050/1540] avg loss 0.00535917, throughput 2.8109K wps
[Epoch 5 Batch 1080/1540] avg loss 0.00557203, throughput 2.83639K wps
[Epoch 5 Batch 1110/1540] avg loss 0.0057495, throughput 2.83182K wps
[Epoch 5 Batch 1140/1540] avg loss 0.00601303, throughput 2.84189K wps
[Epoch 5 Batch 1170/1540] avg loss 0.00578698, throughput 2.84595K wps
[Epoch 5 Batch 1200/1540] avg loss 0.00505328, throughput 2.83203K wps
[Epoch 5 Batch 1230/1540] avg loss 0.00562174, throughput 2.78082K wps
[Epoch 5 Batch 1260/1540] avg loss 0.00576919, throughput 2.83505K wps
[Epoch 5 Batch 1290/1540] avg loss 0.00574465, throughput 2.83771K wps
[Epoch 5 Batch 1320/1540] avg loss 0.00589139, throughput 2.83433K wps
[Epoch 5 Batch 1350/1540] avg loss 0.00552789, throughput 2.83847K wps
[Epoch 5 Batch 1380/1540] avg loss 0.0054342, throughput 2.78751K wps
[Epoch 5 Batch 1410/1540] avg loss 0.00543635, throughput 2.74686K wps
[Epoch 5 Batch 1440/1540] avg loss 0.00617062, throughput 2.83126K wps
[Epoch 5 Batch 1470/1540] avg loss 0.00566744, throughput 2.7678K wps
[Epoch 5 Batch 1500/1540] avg loss 0.00522805, throughput 2.74401K wps
[Epoch 5 Batch 1530/1540] avg loss 0.00573517, throughput 2.79484K wps
Begin Testing...
[Epoch 5] train avg loss 0.0057364, dev acc 0.8177, dev avg loss 0.397413, throughput 2.81427K wps
[Epoch 6 Batch 30/1540] avg loss 0.00568228, throughput 2.88918K wps
[Epoch 6 Batch 60/1540] avg loss 0.00536375, throughput 2.84012K wps
[Epoch 6 Batch 90/1540] avg loss 0.00564413, throughput 2.84078K wps
[Epoch 6 Batch 120/1540] avg loss 0.00530766, throughput 2.76192K wps
[Epoch 6 Batch 150/1540] avg loss 0.00591277, throughput 2.8089K wps
[Epoch 6 Batch 180/1540] avg loss 0.00512063, throughput 2.77527K wps
[Epoch 6 Batch 210/1540] avg loss 0.00515196, throughput 2.80345K wps
[Epoch 6 Batch 240/1540] avg loss 0.00610002, throughput 2.84134K wps
[Epoch 6 Batch 270/1540] avg loss 0.00549361, throughput 2.78808K wps
[Epoch 6 Batch 300/1540] avg loss 0.00530961, throughput 2.74754K wps
[Epoch 6 Batch 330/1540] avg loss 0.00513106, throughput 2.77207K wps
[Epoch 6 Batch 360/1540] avg loss 0.0055239, throughput 2.84131K wps
[Epoch 6 Batch 390/1540] avg loss 0.0055103, throughput 2.84297K wps
[Epoch 6 Batch 420/1540] avg loss 0.00557913, throughput 2.84385K wps
[Epoch 6 Batch 450/1540] avg loss 0.00520877, throughput 2.83338K wps
[Epoch 6 Batch 480/1540] avg loss 0.00572912, throughput 2.84188K wps
[Epoch 6 Batch 510/1540] avg loss 0.00493386, throughput 2.83716K wps
[Epoch 6 Batch 540/1540] avg loss 0.00519416, throughput 2.77703K wps
[Epoch 6 Batch 570/1540] avg loss 0.00497387, throughput 2.81258K wps
[Epoch 6 Batch 600/1540] avg loss 0.00526083, throughput 2.81135K wps
[Epoch 6 Batch 630/1540] avg loss 0.00530803, throughput 2.84002K wps
[Epoch 6 Batch 660/1540] avg loss 0.00512266, throughput 2.83478K wps
[Epoch 6 Batch 690/1540] avg loss 0.00517126, throughput 2.75093K wps
[Epoch 6 Batch 720/1540] avg loss 0.00555066, throughput 2.74736K wps
[Epoch 6 Batch 750/1540] avg loss 0.00526289, throughput 2.80421K wps
[Epoch 6 Batch 780/1540] avg loss 0.00554143, throughput 2.84396K wps
[Epoch 6 Batch 810/1540] avg loss 0.00538314, throughput 2.84514K wps
[Epoch 6 Batch 840/1540] avg loss 0.00549669, throughput 2.84437K wps
[Epoch 6 Batch 870/1540] avg loss 0.00572638, throughput 2.80801K wps
[Epoch 6 Batch 900/1540] avg loss 0.00525683, throughput 2.82246K wps
[Epoch 6 Batch 930/1540] avg loss 0.00514122, throughput 2.8438K wps
[Epoch 6 Batch 960/1540] avg loss 0.00542771, throughput 2.83966K wps
[Epoch 6 Batch 990/1540] avg loss 0.00543039, throughput 2.80074K wps
[Epoch 6 Batch 1020/1540] avg loss 0.00534982, throughput 2.81302K wps
[Epoch 6 Batch 1050/1540] avg loss 0.00529939, throughput 2.82662K wps
[Epoch 6 Batch 1080/1540] avg loss 0.00566534, throughput 2.84275K wps
[Epoch 6 Batch 1110/1540] avg loss 0.0052197, throughput 2.74667K wps
[Epoch 6 Batch 1140/1540] avg loss 0.00566072, throughput 2.78472K wps
[Epoch 6 Batch 1170/1540] avg loss 0.00508595, throughput 2.77437K wps
[Epoch 6 Batch 1200/1540] avg loss 0.00539457, throughput 2.74597K wps
[Epoch 6 Batch 1230/1540] avg loss 0.00510512, throughput 2.76977K wps
[Epoch 6 Batch 1260/1540] avg loss 0.00544075, throughput 2.8374K wps
[Epoch 6 Batch 1290/1540] avg loss 0.00552506, throughput 2.84661K wps
[Epoch 6 Batch 1320/1540] avg loss 0.00513002, throughput 2.79523K wps
[Epoch 6 Batch 1350/1540] avg loss 0.00557773, throughput 2.74037K wps
[Epoch 6 Batch 1380/1540] avg loss 0.00501288, throughput 2.77885K wps
[Epoch 6 Batch 1410/1540] avg loss 0.00525864, throughput 2.806K wps
[Epoch 6 Batch 1440/1540] avg loss 0.00534559, throughput 2.8357K wps
[Epoch 6 Batch 1470/1540] avg loss 0.00510577, throughput 2.834K wps
[Epoch 6 Batch 1500/1540] avg loss 0.00513261, throughput 2.84338K wps
[Epoch 6 Batch 1530/1540] avg loss 0.00545519, throughput 2.81792K wps
Begin Testing...
[Epoch 6] train avg loss 0.00536431, dev acc 0.8257, dev avg loss 0.394096, throughput 2.81091K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 7 Batch 30/1540] avg loss 0.00520467, throughput 2.85959K wps
[Epoch 7 Batch 60/1540] avg loss 0.00565679, throughput 2.83216K wps
[Epoch 7 Batch 90/1540] avg loss 0.00473224, throughput 2.82341K wps
[Epoch 7 Batch 120/1540] avg loss 0.00486042, throughput 2.80058K wps
[Epoch 7 Batch 150/1540] avg loss 0.00530847, throughput 2.83308K wps
[Epoch 7 Batch 180/1540] avg loss 0.00489425, throughput 2.8141K wps
[Epoch 7 Batch 210/1540] avg loss 0.00498323, throughput 2.75105K wps
[Epoch 7 Batch 240/1540] avg loss 0.00562446, throughput 2.75656K wps
[Epoch 7 Batch 270/1540] avg loss 0.00534929, throughput 2.8354K wps
[Epoch 7 Batch 300/1540] avg loss 0.00518119, throughput 2.77657K wps
[Epoch 7 Batch 330/1540] avg loss 0.00471672, throughput 2.77139K wps
[Epoch 7 Batch 360/1540] avg loss 0.00482522, throughput 2.73721K wps
[Epoch 7 Batch 390/1540] avg loss 0.00474128, throughput 2.73411K wps
[Epoch 7 Batch 420/1540] avg loss 0.00542026, throughput 2.79028K wps
[Epoch 7 Batch 450/1540] avg loss 0.00532826, throughput 2.83497K wps
[Epoch 7 Batch 480/1540] avg loss 0.00490905, throughput 2.79477K wps
[Epoch 7 Batch 510/1540] avg loss 0.00533997, throughput 2.83197K wps
[Epoch 7 Batch 540/1540] avg loss 0.00487463, throughput 2.76506K wps
[Epoch 7 Batch 570/1540] avg loss 0.00514006, throughput 2.8096K wps
[Epoch 7 Batch 600/1540] avg loss 0.00505469, throughput 2.83085K wps
[Epoch 7 Batch 630/1540] avg loss 0.00522607, throughput 2.84316K wps
[Epoch 7 Batch 660/1540] avg loss 0.00526874, throughput 2.83504K wps
[Epoch 7 Batch 690/1540] avg loss 0.00514582, throughput 2.73421K wps
[Epoch 7 Batch 720/1540] avg loss 0.00543294, throughput 2.79255K wps
[Epoch 7 Batch 750/1540] avg loss 0.00527122, throughput 2.75046K wps
[Epoch 7 Batch 780/1540] avg loss 0.00512951, throughput 2.7434K wps
[Epoch 7 Batch 810/1540] avg loss 0.00450739, throughput 2.76623K wps
[Epoch 7 Batch 840/1540] avg loss 0.00519998, throughput 2.83816K wps
[Epoch 7 Batch 870/1540] avg loss 0.0055776, throughput 2.8438K wps
[Epoch 7 Batch 900/1540] avg loss 0.0048801, throughput 2.76281K wps
[Epoch 7 Batch 930/1540] avg loss 0.004381, throughput 2.75144K wps
[Epoch 7 Batch 960/1540] avg loss 0.00534786, throughput 2.74786K wps
[Epoch 7 Batch 990/1540] avg loss 0.00494561, throughput 2.77967K wps
[Epoch 7 Batch 1020/1540] avg loss 0.00482273, throughput 2.82822K wps
[Epoch 7 Batch 1050/1540] avg loss 0.00488815, throughput 2.76369K wps
[Epoch 7 Batch 1080/1540] avg loss 0.00461612, throughput 2.76377K wps
[Epoch 7 Batch 1110/1540] avg loss 0.00484443, throughput 2.8424K wps
[Epoch 7 Batch 1140/1540] avg loss 0.0051826, throughput 2.7899K wps
[Epoch 7 Batch 1170/1540] avg loss 0.00495444, throughput 2.84343K wps
[Epoch 7 Batch 1200/1540] avg loss 0.00497228, throughput 2.79229K wps
[Epoch 7 Batch 1230/1540] avg loss 0.00500082, throughput 2.84229K wps
[Epoch 7 Batch 1260/1540] avg loss 0.0052297, throughput 2.7931K wps
[Epoch 7 Batch 1290/1540] avg loss 0.0052132, throughput 2.82882K wps
[Epoch 7 Batch 1320/1540] avg loss 0.00540347, throughput 2.79896K wps
[Epoch 7 Batch 1350/1540] avg loss 0.00455569, throughput 2.83949K wps
[Epoch 7 Batch 1380/1540] avg loss 0.00499058, throughput 2.83887K wps
[Epoch 7 Batch 1410/1540] avg loss 0.00459107, throughput 2.79412K wps
[Epoch 7 Batch 1440/1540] avg loss 0.00520236, throughput 2.7479K wps
[Epoch 7 Batch 1470/1540] avg loss 0.004996, throughput 2.77207K wps
[Epoch 7 Batch 1500/1540] avg loss 0.0048442, throughput 2.81626K wps
[Epoch 7 Batch 1530/1540] avg loss 0.00494675, throughput 2.83506K wps
Begin Testing...
[Epoch 7] train avg loss 0.00505297, dev acc 0.8268, dev avg loss 0.395769, throughput 2.79784K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 8 Batch 30/1540] avg loss 0.00445546, throughput 2.80387K wps
[Epoch 8 Batch 60/1540] avg loss 0.00469889, throughput 2.78246K wps
[Epoch 8 Batch 90/1540] avg loss 0.00440946, throughput 2.84441K wps
[Epoch 8 Batch 120/1540] avg loss 0.00508429, throughput 2.84282K wps
[Epoch 8 Batch 150/1540] avg loss 0.00480997, throughput 2.83741K wps
[Epoch 8 Batch 180/1540] avg loss 0.00484465, throughput 2.76764K wps
[Epoch 8 Batch 210/1540] avg loss 0.0048201, throughput 2.75157K wps
[Epoch 8 Batch 240/1540] avg loss 0.00472864, throughput 2.78635K wps
[Epoch 8 Batch 270/1540] avg loss 0.0044455, throughput 2.82811K wps
[Epoch 8 Batch 300/1540] avg loss 0.00484938, throughput 2.8389K wps
[Epoch 8 Batch 330/1540] avg loss 0.00460504, throughput 2.80984K wps
[Epoch 8 Batch 360/1540] avg loss 0.00454008, throughput 2.75564K wps
[Epoch 8 Batch 390/1540] avg loss 0.00425204, throughput 2.7607K wps
[Epoch 8 Batch 420/1540] avg loss 0.0045722, throughput 2.82334K wps
[Epoch 8 Batch 450/1540] avg loss 0.00477629, throughput 2.78755K wps
[Epoch 8 Batch 480/1540] avg loss 0.0047431, throughput 2.81841K wps
[Epoch 8 Batch 510/1540] avg loss 0.00480707, throughput 2.77462K wps
[Epoch 8 Batch 540/1540] avg loss 0.00473215, throughput 2.78127K wps
[Epoch 8 Batch 570/1540] avg loss 0.00501918, throughput 2.82213K wps
[Epoch 8 Batch 600/1540] avg loss 0.00515938, throughput 2.78283K wps
[Epoch 8 Batch 630/1540] avg loss 0.00489268, throughput 2.83584K wps
[Epoch 8 Batch 660/1540] avg loss 0.00475874, throughput 2.83402K wps
[Epoch 8 Batch 690/1540] avg loss 0.00505327, throughput 2.81876K wps
[Epoch 8 Batch 720/1540] avg loss 0.0047118, throughput 2.83506K wps
[Epoch 8 Batch 750/1540] avg loss 0.0049703, throughput 2.83559K wps
[Epoch 8 Batch 780/1540] avg loss 0.00479523, throughput 2.76266K wps
[Epoch 8 Batch 810/1540] avg loss 0.00500654, throughput 2.74973K wps
[Epoch 8 Batch 840/1540] avg loss 0.00466969, throughput 2.826K wps
[Epoch 8 Batch 870/1540] avg loss 0.00504605, throughput 2.8127K wps
[Epoch 8 Batch 900/1540] avg loss 0.0043166, throughput 2.74917K wps
[Epoch 8 Batch 930/1540] avg loss 0.00479088, throughput 2.74246K wps
[Epoch 8 Batch 960/1540] avg loss 0.00436859, throughput 2.81916K wps
[Epoch 8 Batch 990/1540] avg loss 0.00512153, throughput 2.83686K wps
[Epoch 8 Batch 1020/1540] avg loss 0.00521028, throughput 2.84118K wps
[Epoch 8 Batch 1050/1540] avg loss 0.00492888, throughput 2.77998K wps
[Epoch 8 Batch 1080/1540] avg loss 0.0043597, throughput 2.74982K wps
[Epoch 8 Batch 1110/1540] avg loss 0.00481133, throughput 2.77467K wps
[Epoch 8 Batch 1140/1540] avg loss 0.00484113, throughput 2.84128K wps
[Epoch 8 Batch 1170/1540] avg loss 0.00487179, throughput 2.83969K wps
[Epoch 8 Batch 1200/1540] avg loss 0.00560276, throughput 2.84421K wps
[Epoch 8 Batch 1230/1540] avg loss 0.00527435, throughput 2.84342K wps
[Epoch 8 Batch 1260/1540] avg loss 0.00466746, throughput 2.78885K wps
[Epoch 8 Batch 1290/1540] avg loss 0.00460431, throughput 2.81327K wps
[Epoch 8 Batch 1320/1540] avg loss 0.00500118, throughput 2.83839K wps
[Epoch 8 Batch 1350/1540] avg loss 0.00471581, throughput 2.77175K wps
[Epoch 8 Batch 1380/1540] avg loss 0.00501826, throughput 2.78393K wps
[Epoch 8 Batch 1410/1540] avg loss 0.00438607, throughput 2.83875K wps
[Epoch 8 Batch 1440/1540] avg loss 0.00451276, throughput 2.73838K wps
[Epoch 8 Batch 1470/1540] avg loss 0.00493861, throughput 2.80573K wps
[Epoch 8 Batch 1500/1540] avg loss 0.00463572, throughput 2.83691K wps
[Epoch 8 Batch 1530/1540] avg loss 0.00449114, throughput 2.83629K wps
Begin Testing...
[Epoch 8] train avg loss 0.00477993, dev acc 0.8280, dev avg loss 0.396625, throughput 2.80419K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 9 Batch 30/1540] avg loss 0.00457333, throughput 2.89519K wps
[Epoch 9 Batch 60/1540] avg loss 0.00451278, throughput 2.84091K wps
[Epoch 9 Batch 90/1540] avg loss 0.00428362, throughput 2.80774K wps
[Epoch 9 Batch 120/1540] avg loss 0.004699, throughput 2.81925K wps
[Epoch 9 Batch 150/1540] avg loss 0.004886, throughput 2.84161K wps
[Epoch 9 Batch 180/1540] avg loss 0.0045021, throughput 2.79442K wps
[Epoch 9 Batch 210/1540] avg loss 0.00465881, throughput 2.75136K wps
[Epoch 9 Batch 240/1540] avg loss 0.00423326, throughput 2.8215K wps
[Epoch 9 Batch 270/1540] avg loss 0.00440065, throughput 2.76607K wps
[Epoch 9 Batch 300/1540] avg loss 0.00456845, throughput 2.75196K wps
[Epoch 9 Batch 330/1540] avg loss 0.00435031, throughput 2.75337K wps
[Epoch 9 Batch 360/1540] avg loss 0.00454499, throughput 2.82265K wps
[Epoch 9 Batch 390/1540] avg loss 0.00418916, throughput 2.78223K wps
[Epoch 9 Batch 420/1540] avg loss 0.00429985, throughput 2.83323K wps
[Epoch 9 Batch 450/1540] avg loss 0.00431411, throughput 2.79547K wps
[Epoch 9 Batch 480/1540] avg loss 0.0047041, throughput 2.82384K wps
[Epoch 9 Batch 510/1540] avg loss 0.00473975, throughput 2.84064K wps
[Epoch 9 Batch 540/1540] avg loss 0.00434383, throughput 2.8355K wps
[Epoch 9 Batch 570/1540] avg loss 0.00480871, throughput 2.81281K wps
[Epoch 9 Batch 600/1540] avg loss 0.00453547, throughput 2.77966K wps
[Epoch 9 Batch 630/1540] avg loss 0.00423553, throughput 2.78968K wps
[Epoch 9 Batch 660/1540] avg loss 0.00493691, throughput 2.77816K wps
[Epoch 9 Batch 690/1540] avg loss 0.00413638, throughput 2.83233K wps
[Epoch 9 Batch 720/1540] avg loss 0.00476516, throughput 2.83408K wps
[Epoch 9 Batch 750/1540] avg loss 0.00409829, throughput 2.84486K wps
[Epoch 9 Batch 780/1540] avg loss 0.00440653, throughput 2.78198K wps
[Epoch 9 Batch 810/1540] avg loss 0.00468679, throughput 2.74225K wps
[Epoch 9 Batch 840/1540] avg loss 0.0041498, throughput 2.74637K wps
[Epoch 9 Batch 870/1540] avg loss 0.00444711, throughput 2.74421K wps
[Epoch 9 Batch 900/1540] avg loss 0.00447343, throughput 2.79913K wps
[Epoch 9 Batch 930/1540] avg loss 0.00496936, throughput 2.78265K wps
[Epoch 9 Batch 960/1540] avg loss 0.00442585, throughput 2.84319K wps
[Epoch 9 Batch 990/1540] avg loss 0.00465202, throughput 2.7847K wps
[Epoch 9 Batch 1020/1540] avg loss 0.00438483, throughput 2.75455K wps
[Epoch 9 Batch 1050/1540] avg loss 0.00493468, throughput 2.75295K wps
[Epoch 9 Batch 1080/1540] avg loss 0.00449724, throughput 2.79065K wps
[Epoch 9 Batch 1110/1540] avg loss 0.00428149, throughput 2.80414K wps
[Epoch 9 Batch 1140/1540] avg loss 0.00470543, throughput 2.83069K wps
[Epoch 9 Batch 1170/1540] avg loss 0.00471419, throughput 2.84056K wps
[Epoch 9 Batch 1200/1540] avg loss 0.00452752, throughput 2.83136K wps
[Epoch 9 Batch 1230/1540] avg loss 0.00461886, throughput 2.83398K wps
[Epoch 9 Batch 1260/1540] avg loss 0.00484141, throughput 2.82932K wps
[Epoch 9 Batch 1290/1540] avg loss 0.00421861, throughput 2.78367K wps
[Epoch 9 Batch 1320/1540] avg loss 0.00460272, throughput 2.78433K wps
[Epoch 9 Batch 1350/1540] avg loss 0.00422318, throughput 2.77164K wps
[Epoch 9 Batch 1380/1540] avg loss 0.00475474, throughput 2.79631K wps
[Epoch 9 Batch 1410/1540] avg loss 0.00459386, throughput 2.77649K wps
[Epoch 9 Batch 1440/1540] avg loss 0.00455562, throughput 2.83996K wps
[Epoch 9 Batch 1470/1540] avg loss 0.00466989, throughput 2.83656K wps
[Epoch 9 Batch 1500/1540] avg loss 0.00453766, throughput 2.84204K wps
[Epoch 9 Batch 1530/1540] avg loss 0.00426955, throughput 2.83884K wps
Begin Testing...
[Epoch 9] train avg loss 0.00452796, dev acc 0.8257, dev avg loss 0.402884, throughput 2.80395K wps
[Epoch 10 Batch 30/1540] avg loss 0.00404999, throughput 2.80331K wps
[Epoch 10 Batch 60/1540] avg loss 0.00434378, throughput 2.77984K wps
[Epoch 10 Batch 90/1540] avg loss 0.00463123, throughput 2.8121K wps
[Epoch 10 Batch 120/1540] avg loss 0.00398319, throughput 2.76246K wps
[Epoch 10 Batch 150/1540] avg loss 0.00426683, throughput 2.76045K wps
[Epoch 10 Batch 180/1540] avg loss 0.00371342, throughput 2.79399K wps
[Epoch 10 Batch 210/1540] avg loss 0.00394379, throughput 2.76114K wps
[Epoch 10 Batch 240/1540] avg loss 0.0047569, throughput 2.81751K wps
[Epoch 10 Batch 270/1540] avg loss 0.00420555, throughput 2.82318K wps
[Epoch 10 Batch 300/1540] avg loss 0.00415605, throughput 2.83717K wps
[Epoch 10 Batch 330/1540] avg loss 0.00387746, throughput 2.83875K wps
[Epoch 10 Batch 360/1540] avg loss 0.00424602, throughput 2.83219K wps
[Epoch 10 Batch 390/1540] avg loss 0.00430127, throughput 2.80013K wps
[Epoch 10 Batch 420/1540] avg loss 0.00428835, throughput 2.78904K wps
[Epoch 10 Batch 450/1540] avg loss 0.00430162, throughput 2.767K wps
[Epoch 10 Batch 480/1540] avg loss 0.00464627, throughput 2.83174K wps
[Epoch 10 Batch 510/1540] avg loss 0.00384929, throughput 2.77601K wps
[Epoch 10 Batch 540/1540] avg loss 0.00467797, throughput 2.77299K wps
[Epoch 10 Batch 570/1540] avg loss 0.004292, throughput 2.75746K wps
[Epoch 10 Batch 600/1540] avg loss 0.00479075, throughput 2.81237K wps
[Epoch 10 Batch 630/1540] avg loss 0.00504451, throughput 2.83832K wps
[Epoch 10 Batch 660/1540] avg loss 0.00441538, throughput 2.82657K wps
[Epoch 10 Batch 690/1540] avg loss 0.00408778, throughput 2.84049K wps
[Epoch 10 Batch 720/1540] avg loss 0.00436651, throughput 2.83446K wps
[Epoch 10 Batch 750/1540] avg loss 0.00410101, throughput 2.80224K wps
[Epoch 10 Batch 780/1540] avg loss 0.00441403, throughput 2.83804K wps
[Epoch 10 Batch 810/1540] avg loss 0.00404082, throughput 2.80981K wps
[Epoch 10 Batch 840/1540] avg loss 0.00394178, throughput 2.83201K wps
[Epoch 10 Batch 870/1540] avg loss 0.00474074, throughput 2.83648K wps
[Epoch 10 Batch 900/1540] avg loss 0.00433438, throughput 2.84091K wps
[Epoch 10 Batch 930/1540] avg loss 0.00410834, throughput 2.79754K wps
[Epoch 10 Batch 960/1540] avg loss 0.00426186, throughput 2.81463K wps
[Epoch 10 Batch 990/1540] avg loss 0.00431206, throughput 2.7757K wps
[Epoch 10 Batch 1020/1540] avg loss 0.00453587, throughput 2.80255K wps
[Epoch 10 Batch 1050/1540] avg loss 0.00437808, throughput 2.80807K wps
[Epoch 10 Batch 1080/1540] avg loss 0.00507696, throughput 2.8345K wps
[Epoch 10 Batch 1110/1540] avg loss 0.00431406, throughput 2.83838K wps
[Epoch 10 Batch 1140/1540] avg loss 0.00405776, throughput 2.80904K wps
[Epoch 10 Batch 1170/1540] avg loss 0.00435042, throughput 2.74604K wps
[Epoch 10 Batch 1200/1540] avg loss 0.00430052, throughput 2.74479K wps
[Epoch 10 Batch 1230/1540] avg loss 0.00415985, throughput 2.77302K wps
[Epoch 10 Batch 1260/1540] avg loss 0.0043815, throughput 2.7954K wps
[Epoch 10 Batch 1290/1540] avg loss 0.00455683, throughput 2.75935K wps
[Epoch 10 Batch 1320/1540] avg loss 0.00440808, throughput 2.8275K wps
[Epoch 10 Batch 1350/1540] avg loss 0.00410695, throughput 2.81305K wps
[Epoch 10 Batch 1380/1540] avg loss 0.00414027, throughput 2.78434K wps
[Epoch 10 Batch 1410/1540] avg loss 0.00415287, throughput 2.77797K wps
[Epoch 10 Batch 1440/1540] avg loss 0.00398985, throughput 2.80772K wps
[Epoch 10 Batch 1470/1540] avg loss 0.00418318, throughput 2.84206K wps
[Epoch 10 Batch 1500/1540] avg loss 0.00448284, throughput 2.80257K wps
[Epoch 10 Batch 1530/1540] avg loss 0.00421464, throughput 2.73919K wps
Begin Testing...
[Epoch 10] train avg loss 0.00430427, dev acc 0.8234, dev avg loss 0.404862, throughput 2.80183K wps
[Epoch 11 Batch 30/1540] avg loss 0.00429176, throughput 2.84942K wps
[Epoch 11 Batch 60/1540] avg loss 0.0045061, throughput 2.79672K wps
[Epoch 11 Batch 90/1540] avg loss 0.00393104, throughput 2.79976K wps
[Epoch 11 Batch 120/1540] avg loss 0.00444577, throughput 2.80845K wps
[Epoch 11 Batch 150/1540] avg loss 0.00433763, throughput 2.78474K wps
[Epoch 11 Batch 180/1540] avg loss 0.00409561, throughput 2.77171K wps
[Epoch 11 Batch 210/1540] avg loss 0.00412813, throughput 2.79273K wps
[Epoch 11 Batch 240/1540] avg loss 0.00401006, throughput 2.83346K wps
[Epoch 11 Batch 270/1540] avg loss 0.00394486, throughput 2.82034K wps
[Epoch 11 Batch 300/1540] avg loss 0.00392187, throughput 2.83276K wps
[Epoch 11 Batch 330/1540] avg loss 0.00439889, throughput 2.82849K wps
[Epoch 11 Batch 360/1540] avg loss 0.00389768, throughput 2.81374K wps
[Epoch 11 Batch 390/1540] avg loss 0.00414078, throughput 2.83756K wps
[Epoch 11 Batch 420/1540] avg loss 0.00423887, throughput 2.83288K wps
[Epoch 11 Batch 450/1540] avg loss 0.00415232, throughput 2.83297K wps
[Epoch 11 Batch 480/1540] avg loss 0.00404521, throughput 2.77469K wps
[Epoch 11 Batch 510/1540] avg loss 0.00429682, throughput 2.75914K wps
[Epoch 11 Batch 540/1540] avg loss 0.0040787, throughput 2.78007K wps
[Epoch 11 Batch 570/1540] avg loss 0.00356136, throughput 2.83682K wps
[Epoch 11 Batch 600/1540] avg loss 0.00360597, throughput 2.7485K wps
[Epoch 11 Batch 630/1540] avg loss 0.00396982, throughput 2.83543K wps
[Epoch 11 Batch 660/1540] avg loss 0.00422583, throughput 2.77412K wps
[Epoch 11 Batch 690/1540] avg loss 0.00395218, throughput 2.79436K wps
[Epoch 11 Batch 720/1540] avg loss 0.00379211, throughput 2.81958K wps
[Epoch 11 Batch 750/1540] avg loss 0.00390468, throughput 2.82481K wps
[Epoch 11 Batch 780/1540] avg loss 0.00373197, throughput 2.84364K wps
[Epoch 11 Batch 810/1540] avg loss 0.00432007, throughput 2.84826K wps
[Epoch 11 Batch 840/1540] avg loss 0.00387938, throughput 2.83962K wps
[Epoch 11 Batch 870/1540] avg loss 0.00384119, throughput 2.75358K wps
[Epoch 11 Batch 900/1540] avg loss 0.00428336, throughput 2.81232K wps
[Epoch 11 Batch 930/1540] avg loss 0.00420879, throughput 2.7715K wps
[Epoch 11 Batch 960/1540] avg loss 0.00422392, throughput 2.74374K wps
[Epoch 11 Batch 990/1540] avg loss 0.00434466, throughput 2.7483K wps
[Epoch 11 Batch 1020/1540] avg loss 0.00451735, throughput 2.80122K wps
[Epoch 11 Batch 1050/1540] avg loss 0.00399469, throughput 2.83653K wps
[Epoch 11 Batch 1080/1540] avg loss 0.00390676, throughput 2.7987K wps
[Epoch 11 Batch 1110/1540] avg loss 0.0040662, throughput 2.80051K wps
[Epoch 11 Batch 1140/1540] avg loss 0.00439763, throughput 2.76444K wps
[Epoch 11 Batch 1170/1540] avg loss 0.00438829, throughput 2.7854K wps
[Epoch 11 Batch 1200/1540] avg loss 0.00404161, throughput 2.73634K wps
[Epoch 11 Batch 1230/1540] avg loss 0.00382283, throughput 2.82758K wps
[Epoch 11 Batch 1260/1540] avg loss 0.00409043, throughput 2.8379K wps
[Epoch 11 Batch 1290/1540] avg loss 0.00409543, throughput 2.74465K wps
[Epoch 11 Batch 1320/1540] avg loss 0.00408258, throughput 2.7434K wps
[Epoch 11 Batch 1350/1540] avg loss 0.00454729, throughput 2.80054K wps
[Epoch 11 Batch 1380/1540] avg loss 0.00401724, throughput 2.79185K wps
[Epoch 11 Batch 1410/1540] avg loss 0.00402934, throughput 2.83849K wps
[Epoch 11 Batch 1440/1540] avg loss 0.00428705, throughput 2.81795K wps
[Epoch 11 Batch 1470/1540] avg loss 0.00415758, throughput 2.76681K wps
[Epoch 11 Batch 1500/1540] avg loss 0.00422309, throughput 2.83763K wps
[Epoch 11 Batch 1530/1540] avg loss 0.00437778, throughput 2.81957K wps
Begin Testing...
[Epoch 11] train avg loss 0.00411635, dev acc 0.8222, dev avg loss 0.411877, throughput 2.80112K wps
[Epoch 12 Batch 30/1540] avg loss 0.00380671, throughput 2.80818K wps
[Epoch 12 Batch 60/1540] avg loss 0.00405056, throughput 2.7979K wps
[Epoch 12 Batch 90/1540] avg loss 0.00450817, throughput 2.79375K wps
[Epoch 12 Batch 120/1540] avg loss 0.0042853, throughput 2.84117K wps
[Epoch 12 Batch 150/1540] avg loss 0.00437062, throughput 2.84295K wps
[Epoch 12 Batch 180/1540] avg loss 0.00384883, throughput 2.83921K wps
[Epoch 12 Batch 210/1540] avg loss 0.0036374, throughput 2.83264K wps
[Epoch 12 Batch 240/1540] avg loss 0.00395431, throughput 2.75621K wps
[Epoch 12 Batch 270/1540] avg loss 0.00385069, throughput 2.74784K wps
[Epoch 12 Batch 300/1540] avg loss 0.00402883, throughput 2.83084K wps
[Epoch 12 Batch 330/1540] avg loss 0.00397979, throughput 2.80897K wps
[Epoch 12 Batch 360/1540] avg loss 0.00368586, throughput 2.84049K wps
[Epoch 12 Batch 390/1540] avg loss 0.00368034, throughput 2.78377K wps
[Epoch 12 Batch 420/1540] avg loss 0.00431371, throughput 2.74627K wps
[Epoch 12 Batch 450/1540] avg loss 0.00411431, throughput 2.74999K wps
[Epoch 12 Batch 480/1540] avg loss 0.004497, throughput 2.84179K wps
[Epoch 12 Batch 510/1540] avg loss 0.0036069, throughput 2.77979K wps
[Epoch 12 Batch 540/1540] avg loss 0.00369155, throughput 2.75829K wps
[Epoch 12 Batch 570/1540] avg loss 0.00401245, throughput 2.75074K wps
[Epoch 12 Batch 600/1540] avg loss 0.00391403, throughput 2.78249K wps
[Epoch 12 Batch 630/1540] avg loss 0.00420638, throughput 2.83487K wps
[Epoch 12 Batch 660/1540] avg loss 0.00412267, throughput 2.82862K wps
[Epoch 12 Batch 690/1540] avg loss 0.00373393, throughput 2.84K wps
[Epoch 12 Batch 720/1540] avg loss 0.00399554, throughput 2.80435K wps
[Epoch 12 Batch 750/1540] avg loss 0.00366271, throughput 2.74819K wps
[Epoch 12 Batch 780/1540] avg loss 0.00361945, throughput 2.74362K wps
[Epoch 12 Batch 810/1540] avg loss 0.0041949, throughput 2.75607K wps
[Epoch 12 Batch 840/1540] avg loss 0.00411831, throughput 2.74651K wps
[Epoch 12 Batch 870/1540] avg loss 0.00448377, throughput 2.7451K wps
[Epoch 12 Batch 900/1540] avg loss 0.00410668, throughput 2.82585K wps
[Epoch 12 Batch 930/1540] avg loss 0.00374271, throughput 2.83772K wps
[Epoch 12 Batch 960/1540] avg loss 0.00394915, throughput 2.819K wps
[Epoch 12 Batch 990/1540] avg loss 0.00394037, throughput 2.75718K wps
[Epoch 12 Batch 1020/1540] avg loss 0.00382545, throughput 2.79017K wps
[Epoch 12 Batch 1050/1540] avg loss 0.00359189, throughput 2.8422K wps
[Epoch 12 Batch 1080/1540] avg loss 0.00380519, throughput 2.82318K wps
[Epoch 12 Batch 1110/1540] avg loss 0.00406976, throughput 2.80019K wps
[Epoch 12 Batch 1140/1540] avg loss 0.00369401, throughput 2.83427K wps
[Epoch 12 Batch 1170/1540] avg loss 0.00395149, throughput 2.81652K wps
[Epoch 12 Batch 1200/1540] avg loss 0.00382967, throughput 2.83104K wps
[Epoch 12 Batch 1230/1540] avg loss 0.00396218, throughput 2.84058K wps
[Epoch 12 Batch 1260/1540] avg loss 0.00396803, throughput 2.75828K wps
[Epoch 12 Batch 1290/1540] avg loss 0.00421415, throughput 2.83724K wps
[Epoch 12 Batch 1320/1540] avg loss 0.00360586, throughput 2.79144K wps
[Epoch 12 Batch 1350/1540] avg loss 0.0040514, throughput 2.74918K wps
[Epoch 12 Batch 1380/1540] avg loss 0.00386425, throughput 2.78397K wps
[Epoch 12 Batch 1410/1540] avg loss 0.0038789, throughput 2.81266K wps
[Epoch 12 Batch 1440/1540] avg loss 0.00355372, throughput 2.82315K wps
[Epoch 12 Batch 1470/1540] avg loss 0.00361506, throughput 2.84077K wps
[Epoch 12 Batch 1500/1540] avg loss 0.00394488, throughput 2.84287K wps
[Epoch 12 Batch 1530/1540] avg loss 0.0038878, throughput 2.83521K wps
Begin Testing...
[Epoch 12] train avg loss 0.00394352, dev acc 0.8291, dev avg loss 0.415417, throughput 2.8006K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 13 Batch 30/1540] avg loss 0.0036989, throughput 2.79602K wps
[Epoch 13 Batch 60/1540] avg loss 0.00410115, throughput 2.76109K wps
[Epoch 13 Batch 90/1540] avg loss 0.00427799, throughput 2.77041K wps
[Epoch 13 Batch 120/1540] avg loss 0.00362975, throughput 2.84134K wps
[Epoch 13 Batch 150/1540] avg loss 0.00375893, throughput 2.76932K wps
[Epoch 13 Batch 180/1540] avg loss 0.00397839, throughput 2.7469K wps
[Epoch 13 Batch 210/1540] avg loss 0.00387289, throughput 2.80087K wps
[Epoch 13 Batch 240/1540] avg loss 0.00375101, throughput 2.84137K wps
[Epoch 13 Batch 270/1540] avg loss 0.00359727, throughput 2.84103K wps
[Epoch 13 Batch 300/1540] avg loss 0.00389767, throughput 2.83465K wps
[Epoch 13 Batch 330/1540] avg loss 0.00370883, throughput 2.84146K wps
[Epoch 13 Batch 360/1540] avg loss 0.0035583, throughput 2.74642K wps
[Epoch 13 Batch 390/1540] avg loss 0.00369682, throughput 2.75827K wps
[Epoch 13 Batch 420/1540] avg loss 0.00408053, throughput 2.78878K wps
[Epoch 13 Batch 450/1540] avg loss 0.00356142, throughput 2.78746K wps
[Epoch 13 Batch 480/1540] avg loss 0.00372789, throughput 2.80488K wps
[Epoch 13 Batch 510/1540] avg loss 0.00397909, throughput 2.81226K wps
[Epoch 13 Batch 540/1540] avg loss 0.00387917, throughput 2.83777K wps
[Epoch 13 Batch 570/1540] avg loss 0.0037723, throughput 2.83968K wps
[Epoch 13 Batch 600/1540] avg loss 0.00345128, throughput 2.82687K wps
[Epoch 13 Batch 630/1540] avg loss 0.00418718, throughput 2.83422K wps
[Epoch 13 Batch 660/1540] avg loss 0.00378507, throughput 2.84308K wps
[Epoch 13 Batch 690/1540] avg loss 0.00361521, throughput 2.84248K wps
[Epoch 13 Batch 720/1540] avg loss 0.00380045, throughput 2.8327K wps
[Epoch 13 Batch 750/1540] avg loss 0.00369191, throughput 2.76622K wps
[Epoch 13 Batch 780/1540] avg loss 0.0036062, throughput 2.83971K wps
[Epoch 13 Batch 810/1540] avg loss 0.00361372, throughput 2.83417K wps
[Epoch 13 Batch 840/1540] avg loss 0.00373413, throughput 2.83931K wps
[Epoch 13 Batch 870/1540] avg loss 0.00397958, throughput 2.81404K wps
[Epoch 13 Batch 900/1540] avg loss 0.00365204, throughput 2.74314K wps
[Epoch 13 Batch 930/1540] avg loss 0.0038307, throughput 2.74759K wps
[Epoch 13 Batch 960/1540] avg loss 0.00320756, throughput 2.8089K wps
[Epoch 13 Batch 990/1540] avg loss 0.00347082, throughput 2.76991K wps
[Epoch 13 Batch 1020/1540] avg loss 0.00380585, throughput 2.82663K wps
[Epoch 13 Batch 1050/1540] avg loss 0.00354978, throughput 2.84702K wps
[Epoch 13 Batch 1080/1540] avg loss 0.00382514, throughput 2.85063K wps
[Epoch 13 Batch 1110/1540] avg loss 0.00364803, throughput 2.82111K wps
[Epoch 13 Batch 1140/1540] avg loss 0.00406116, throughput 2.84521K wps
[Epoch 13 Batch 1170/1540] avg loss 0.00389939, throughput 2.80159K wps
[Epoch 13 Batch 1200/1540] avg loss 0.00411847, throughput 2.75537K wps
[Epoch 13 Batch 1230/1540] avg loss 0.00371876, throughput 2.76502K wps
[Epoch 13 Batch 1260/1540] avg loss 0.00367094, throughput 2.75605K wps
[Epoch 13 Batch 1290/1540] avg loss 0.0039963, throughput 2.8203K wps
[Epoch 13 Batch 1320/1540] avg loss 0.00390812, throughput 2.82554K wps
[Epoch 13 Batch 1350/1540] avg loss 0.00331576, throughput 2.76483K wps
[Epoch 13 Batch 1380/1540] avg loss 0.00389223, throughput 2.84388K wps
[Epoch 13 Batch 1410/1540] avg loss 0.00368224, throughput 2.82687K wps
[Epoch 13 Batch 1440/1540] avg loss 0.00433739, throughput 2.83848K wps
[Epoch 13 Batch 1470/1540] avg loss 0.00336801, throughput 2.84336K wps
[Epoch 13 Batch 1500/1540] avg loss 0.0036589, throughput 2.84297K wps
[Epoch 13 Batch 1530/1540] avg loss 0.00357652, throughput 2.84775K wps
Begin Testing...
[Epoch 13] train avg loss 0.00376927, dev acc 0.8222, dev avg loss 0.418496, throughput 2.80925K wps
[Epoch 14 Batch 30/1540] avg loss 0.00375589, throughput 2.83508K wps
[Epoch 14 Batch 60/1540] avg loss 0.003248, throughput 2.77985K wps
[Epoch 14 Batch 90/1540] avg loss 0.00338376, throughput 2.78526K wps
[Epoch 14 Batch 120/1540] avg loss 0.00409107, throughput 2.7904K wps
[Epoch 14 Batch 150/1540] avg loss 0.00362764, throughput 2.7547K wps
[Epoch 14 Batch 180/1540] avg loss 0.00358958, throughput 2.76629K wps
[Epoch 14 Batch 210/1540] avg loss 0.00354955, throughput 2.84135K wps
[Epoch 14 Batch 240/1540] avg loss 0.00351558, throughput 2.80586K wps
[Epoch 14 Batch 270/1540] avg loss 0.00397982, throughput 2.81493K wps
[Epoch 14 Batch 300/1540] avg loss 0.00393555, throughput 2.80477K wps
[Epoch 14 Batch 330/1540] avg loss 0.00363977, throughput 2.80546K wps
[Epoch 14 Batch 360/1540] avg loss 0.00355703, throughput 2.84141K wps
[Epoch 14 Batch 390/1540] avg loss 0.00357388, throughput 2.84536K wps
[Epoch 14 Batch 420/1540] avg loss 0.00358615, throughput 2.82546K wps
[Epoch 14 Batch 450/1540] avg loss 0.00408796, throughput 2.84417K wps
[Epoch 14 Batch 480/1540] avg loss 0.0034912, throughput 2.77129K wps
[Epoch 14 Batch 510/1540] avg loss 0.00356565, throughput 2.83961K wps
[Epoch 14 Batch 540/1540] avg loss 0.00367149, throughput 2.83733K wps
[Epoch 14 Batch 570/1540] avg loss 0.00389759, throughput 2.80747K wps
[Epoch 14 Batch 600/1540] avg loss 0.00356899, throughput 2.82956K wps
[Epoch 14 Batch 630/1540] avg loss 0.00355452, throughput 2.75685K wps
[Epoch 14 Batch 660/1540] avg loss 0.00375699, throughput 2.81986K wps
[Epoch 14 Batch 690/1540] avg loss 0.00352298, throughput 2.82956K wps
[Epoch 14 Batch 720/1540] avg loss 0.0036656, throughput 2.75043K wps
[Epoch 14 Batch 750/1540] avg loss 0.00383437, throughput 2.74541K wps
[Epoch 14 Batch 780/1540] avg loss 0.00339625, throughput 2.84515K wps
[Epoch 14 Batch 810/1540] avg loss 0.00333852, throughput 2.84702K wps
[Epoch 14 Batch 840/1540] avg loss 0.00327385, throughput 2.84778K wps
[Epoch 14 Batch 870/1540] avg loss 0.0031771, throughput 2.84299K wps
[Epoch 14 Batch 900/1540] avg loss 0.00335976, throughput 2.84115K wps
[Epoch 14 Batch 930/1540] avg loss 0.00346387, throughput 2.84651K wps
[Epoch 14 Batch 960/1540] avg loss 0.00371097, throughput 2.8417K wps
[Epoch 14 Batch 990/1540] avg loss 0.00400183, throughput 2.83246K wps
[Epoch 14 Batch 1020/1540] avg loss 0.00360309, throughput 2.82715K wps
[Epoch 14 Batch 1050/1540] avg loss 0.00323007, throughput 2.82123K wps
[Epoch 14 Batch 1080/1540] avg loss 0.00396065, throughput 2.81067K wps
[Epoch 14 Batch 1110/1540] avg loss 0.00366688, throughput 2.80444K wps
[Epoch 14 Batch 1140/1540] avg loss 0.0041739, throughput 2.84289K wps
[Epoch 14 Batch 1170/1540] avg loss 0.00356667, throughput 2.83896K wps
[Epoch 14 Batch 1200/1540] avg loss 0.00374966, throughput 2.824K wps
[Epoch 14 Batch 1230/1540] avg loss 0.00360782, throughput 2.83745K wps
[Epoch 14 Batch 1260/1540] avg loss 0.00335417, throughput 2.83739K wps
[Epoch 14 Batch 1290/1540] avg loss 0.00389921, throughput 2.849K wps
[Epoch 14 Batch 1320/1540] avg loss 0.00330678, throughput 2.84415K wps
[Epoch 14 Batch 1350/1540] avg loss 0.00344476, throughput 2.78438K wps
[Epoch 14 Batch 1380/1540] avg loss 0.00381081, throughput 2.84776K wps
[Epoch 14 Batch 1410/1540] avg loss 0.00359967, throughput 2.84503K wps
[Epoch 14 Batch 1440/1540] avg loss 0.00410466, throughput 2.83881K wps
[Epoch 14 Batch 1470/1540] avg loss 0.00357106, throughput 2.77798K wps
[Epoch 14 Batch 1500/1540] avg loss 0.00354261, throughput 2.79968K wps
[Epoch 14 Batch 1530/1540] avg loss 0.00326459, throughput 2.84476K wps
Begin Testing...
[Epoch 14] train avg loss 0.00362778, dev acc 0.8280, dev avg loss 0.425213, throughput 2.81794K wps
[Epoch 15 Batch 30/1540] avg loss 0.00347744, throughput 2.89295K wps
[Epoch 15 Batch 60/1540] avg loss 0.00333828, throughput 2.84847K wps
[Epoch 15 Batch 90/1540] avg loss 0.00354703, throughput 2.84396K wps
[Epoch 15 Batch 120/1540] avg loss 0.00387464, throughput 2.83999K wps
[Epoch 15 Batch 150/1540] avg loss 0.00417817, throughput 2.84535K wps
[Epoch 15 Batch 180/1540] avg loss 0.00352527, throughput 2.8416K wps
[Epoch 15 Batch 210/1540] avg loss 0.00362242, throughput 2.78901K wps
[Epoch 15 Batch 240/1540] avg loss 0.00344395, throughput 2.7527K wps
[Epoch 15 Batch 270/1540] avg loss 0.00340689, throughput 2.76448K wps
[Epoch 15 Batch 300/1540] avg loss 0.00330769, throughput 2.78729K wps
[Epoch 15 Batch 330/1540] avg loss 0.00349941, throughput 2.77637K wps
[Epoch 15 Batch 360/1540] avg loss 0.00357591, throughput 2.79333K wps
[Epoch 15 Batch 390/1540] avg loss 0.00342728, throughput 2.82592K wps
[Epoch 15 Batch 420/1540] avg loss 0.00343582, throughput 2.84492K wps
[Epoch 15 Batch 450/1540] avg loss 0.00361587, throughput 2.84561K wps
[Epoch 15 Batch 480/1540] avg loss 0.00321183, throughput 2.8361K wps
[Epoch 15 Batch 510/1540] avg loss 0.00321344, throughput 2.79655K wps
[Epoch 15 Batch 540/1540] avg loss 0.0032286, throughput 2.80118K wps
[Epoch 15 Batch 570/1540] avg loss 0.00353111, throughput 2.84981K wps
[Epoch 15 Batch 600/1540] avg loss 0.00349107, throughput 2.84394K wps
[Epoch 15 Batch 630/1540] avg loss 0.00334554, throughput 2.76961K wps
[Epoch 15 Batch 660/1540] avg loss 0.00374302, throughput 2.75879K wps
[Epoch 15 Batch 690/1540] avg loss 0.00332448, throughput 2.7537K wps
[Epoch 15 Batch 720/1540] avg loss 0.00386207, throughput 2.75055K wps
[Epoch 15 Batch 750/1540] avg loss 0.0031384, throughput 2.75414K wps
[Epoch 15 Batch 780/1540] avg loss 0.00346361, throughput 2.76803K wps
[Epoch 15 Batch 810/1540] avg loss 0.00361852, throughput 2.75072K wps
[Epoch 15 Batch 840/1540] avg loss 0.00358818, throughput 2.82005K wps
[Epoch 15 Batch 870/1540] avg loss 0.00381341, throughput 2.84905K wps
[Epoch 15 Batch 900/1540] avg loss 0.00372609, throughput 2.83991K wps
[Epoch 15 Batch 930/1540] avg loss 0.00382002, throughput 2.80411K wps
[Epoch 15 Batch 960/1540] avg loss 0.00362912, throughput 2.76092K wps
[Epoch 15 Batch 990/1540] avg loss 0.00333824, throughput 2.8451K wps
[Epoch 15 Batch 1020/1540] avg loss 0.00321054, throughput 2.84406K wps
[Epoch 15 Batch 1050/1540] avg loss 0.00347868, throughput 2.84278K wps
[Epoch 15 Batch 1080/1540] avg loss 0.00332613, throughput 2.78265K wps
[Epoch 15 Batch 1110/1540] avg loss 0.00341738, throughput 2.77986K wps
[Epoch 15 Batch 1140/1540] avg loss 0.00361888, throughput 2.84376K wps
[Epoch 15 Batch 1170/1540] avg loss 0.00360009, throughput 2.84403K wps
[Epoch 15 Batch 1200/1540] avg loss 0.00295218, throughput 2.80983K wps
[Epoch 15 Batch 1230/1540] avg loss 0.00366093, throughput 2.81368K wps
[Epoch 15 Batch 1260/1540] avg loss 0.00329321, throughput 2.75506K wps
[Epoch 15 Batch 1290/1540] avg loss 0.00361915, throughput 2.83616K wps
[Epoch 15 Batch 1320/1540] avg loss 0.00325281, throughput 2.84841K wps
[Epoch 15 Batch 1350/1540] avg loss 0.00338151, throughput 2.83379K wps
[Epoch 15 Batch 1380/1540] avg loss 0.00360927, throughput 2.84617K wps
[Epoch 15 Batch 1410/1540] avg loss 0.00402024, throughput 2.83851K wps
[Epoch 15 Batch 1440/1540] avg loss 0.00333344, throughput 2.83106K wps
[Epoch 15 Batch 1470/1540] avg loss 0.00351912, throughput 2.75682K wps
[Epoch 15 Batch 1500/1540] avg loss 0.00356688, throughput 2.78719K wps
[Epoch 15 Batch 1530/1540] avg loss 0.00355468, throughput 2.7708K wps
Begin Testing...
[Epoch 15] train avg loss 0.0035069, dev acc 0.8268, dev avg loss 0.425293, throughput 2.8091K wps
[Epoch 16 Batch 30/1540] avg loss 0.00331084, throughput 2.85539K wps
[Epoch 16 Batch 60/1540] avg loss 0.0030325, throughput 2.80337K wps
[Epoch 16 Batch 90/1540] avg loss 0.00343813, throughput 2.77014K wps
[Epoch 16 Batch 120/1540] avg loss 0.00351497, throughput 2.83179K wps
[Epoch 16 Batch 150/1540] avg loss 0.00318089, throughput 2.83792K wps
[Epoch 16 Batch 180/1540] avg loss 0.00333679, throughput 2.84871K wps
[Epoch 16 Batch 210/1540] avg loss 0.00336464, throughput 2.84536K wps
[Epoch 16 Batch 240/1540] avg loss 0.00361588, throughput 2.84324K wps
[Epoch 16 Batch 270/1540] avg loss 0.00316605, throughput 2.77265K wps
[Epoch 16 Batch 300/1540] avg loss 0.00369889, throughput 2.75658K wps
[Epoch 16 Batch 330/1540] avg loss 0.00328536, throughput 2.82445K wps
[Epoch 16 Batch 360/1540] avg loss 0.0030264, throughput 2.79686K wps
[Epoch 16 Batch 390/1540] avg loss 0.00318841, throughput 2.76328K wps
[Epoch 16 Batch 420/1540] avg loss 0.00354234, throughput 2.8068K wps
[Epoch 16 Batch 450/1540] avg loss 0.00343012, throughput 2.75429K wps
[Epoch 16 Batch 480/1540] avg loss 0.00337229, throughput 2.76479K wps
[Epoch 16 Batch 510/1540] avg loss 0.0035189, throughput 2.84672K wps
[Epoch 16 Batch 540/1540] avg loss 0.00334218, throughput 2.84333K wps
[Epoch 16 Batch 570/1540] avg loss 0.00336614, throughput 2.8119K wps
[Epoch 16 Batch 600/1540] avg loss 0.00328434, throughput 2.78284K wps
[Epoch 16 Batch 630/1540] avg loss 0.00335851, throughput 2.78705K wps
[Epoch 16 Batch 660/1540] avg loss 0.00334822, throughput 2.7745K wps
[Epoch 16 Batch 690/1540] avg loss 0.00351528, throughput 2.78779K wps
[Epoch 16 Batch 720/1540] avg loss 0.00310939, throughput 2.75887K wps
[Epoch 16 Batch 750/1540] avg loss 0.00322301, throughput 2.84515K wps
[Epoch 16 Batch 780/1540] avg loss 0.00305656, throughput 2.84615K wps
[Epoch 16 Batch 810/1540] avg loss 0.0031835, throughput 2.8457K wps
[Epoch 16 Batch 840/1540] avg loss 0.00336167, throughput 2.84523K wps
[Epoch 16 Batch 870/1540] avg loss 0.00340439, throughput 2.83439K wps
[Epoch 16 Batch 900/1540] avg loss 0.00343672, throughput 2.82337K wps
[Epoch 16 Batch 930/1540] avg loss 0.00358226, throughput 2.84506K wps
[Epoch 16 Batch 960/1540] avg loss 0.00351756, throughput 2.8135K wps
[Epoch 16 Batch 990/1540] avg loss 0.00333323, throughput 2.78075K wps
[Epoch 16 Batch 1020/1540] avg loss 0.00331252, throughput 2.82522K wps
[Epoch 16 Batch 1050/1540] avg loss 0.00346721, throughput 2.77774K wps
[Epoch 16 Batch 1080/1540] avg loss 0.00375027, throughput 2.7604K wps
[Epoch 16 Batch 1110/1540] avg loss 0.00372633, throughput 2.80242K wps
[Epoch 16 Batch 1140/1540] avg loss 0.0033429, throughput 2.82722K wps
[Epoch 16 Batch 1170/1540] avg loss 0.00319781, throughput 2.76622K wps
[Epoch 16 Batch 1200/1540] avg loss 0.00354752, throughput 2.79613K wps
[Epoch 16 Batch 1230/1540] avg loss 0.0032722, throughput 2.78485K wps
[Epoch 16 Batch 1260/1540] avg loss 0.00313615, throughput 2.84203K wps
[Epoch 16 Batch 1290/1540] avg loss 0.0032758, throughput 2.8474K wps
[Epoch 16 Batch 1320/1540] avg loss 0.00373953, throughput 2.84398K wps
[Epoch 16 Batch 1350/1540] avg loss 0.00298315, throughput 2.77224K wps
[Epoch 16 Batch 1380/1540] avg loss 0.00305148, throughput 2.84305K wps
[Epoch 16 Batch 1410/1540] avg loss 0.00364609, throughput 2.8251K wps
[Epoch 16 Batch 1440/1540] avg loss 0.00332048, throughput 2.8436K wps
[Epoch 16 Batch 1470/1540] avg loss 0.00320433, throughput 2.84211K wps
[Epoch 16 Batch 1500/1540] avg loss 0.00349318, throughput 2.84K wps
[Epoch 16 Batch 1530/1540] avg loss 0.00331897, throughput 2.84302K wps
Begin Testing...
[Epoch 16] train avg loss 0.00336228, dev acc 0.8372, dev avg loss 0.428157, throughput 2.81218K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 17 Batch 30/1540] avg loss 0.00292926, throughput 2.85196K wps
[Epoch 17 Batch 60/1540] avg loss 0.00330681, throughput 2.84059K wps
[Epoch 17 Batch 90/1540] avg loss 0.00336299, throughput 2.8251K wps
[Epoch 17 Batch 120/1540] avg loss 0.00312601, throughput 2.83667K wps
[Epoch 17 Batch 150/1540] avg loss 0.00343347, throughput 2.84932K wps
[Epoch 17 Batch 180/1540] avg loss 0.00341627, throughput 2.83811K wps
[Epoch 17 Batch 210/1540] avg loss 0.00331586, throughput 2.79721K wps
[Epoch 17 Batch 240/1540] avg loss 0.00294444, throughput 2.81147K wps
[Epoch 17 Batch 270/1540] avg loss 0.00271829, throughput 2.84659K wps
[Epoch 17 Batch 300/1540] avg loss 0.00312847, throughput 2.78619K wps
[Epoch 17 Batch 330/1540] avg loss 0.00340055, throughput 2.84093K wps
[Epoch 17 Batch 360/1540] avg loss 0.00289972, throughput 2.84408K wps
[Epoch 17 Batch 390/1540] avg loss 0.00293258, throughput 2.78791K wps
[Epoch 17 Batch 420/1540] avg loss 0.00301331, throughput 2.78194K wps
[Epoch 17 Batch 450/1540] avg loss 0.00316686, throughput 2.84647K wps
[Epoch 17 Batch 480/1540] avg loss 0.00323884, throughput 2.82298K wps
[Epoch 17 Batch 510/1540] avg loss 0.0033132, throughput 2.8334K wps
[Epoch 17 Batch 540/1540] avg loss 0.00309709, throughput 2.77685K wps
[Epoch 17 Batch 570/1540] avg loss 0.00301774, throughput 2.80671K wps
[Epoch 17 Batch 600/1540] avg loss 0.00319555, throughput 2.84221K wps
[Epoch 17 Batch 630/1540] avg loss 0.00316867, throughput 2.84144K wps
[Epoch 17 Batch 660/1540] avg loss 0.00351241, throughput 2.77585K wps
[Epoch 17 Batch 690/1540] avg loss 0.00318071, throughput 2.80672K wps
[Epoch 17 Batch 720/1540] avg loss 0.00321907, throughput 2.84569K wps
[Epoch 17 Batch 750/1540] avg loss 0.00310959, throughput 2.83151K wps
[Epoch 17 Batch 780/1540] avg loss 0.00334146, throughput 2.82971K wps
[Epoch 17 Batch 810/1540] avg loss 0.00333791, throughput 2.84096K wps
[Epoch 17 Batch 840/1540] avg loss 0.00317562, throughput 2.83034K wps
[Epoch 17 Batch 870/1540] avg loss 0.00334432, throughput 2.84116K wps
[Epoch 17 Batch 900/1540] avg loss 0.00334301, throughput 2.82143K wps
[Epoch 17 Batch 930/1540] avg loss 0.00302672, throughput 2.84452K wps
[Epoch 17 Batch 960/1540] avg loss 0.00325875, throughput 2.78906K wps
[Epoch 17 Batch 990/1540] avg loss 0.00325256, throughput 2.84385K wps
[Epoch 17 Batch 1020/1540] avg loss 0.00354869, throughput 2.77684K wps
[Epoch 17 Batch 1050/1540] avg loss 0.00356823, throughput 2.76037K wps
[Epoch 17 Batch 1080/1540] avg loss 0.00302643, throughput 2.82061K wps
[Epoch 17 Batch 1110/1540] avg loss 0.00347567, throughput 2.84333K wps
[Epoch 17 Batch 1140/1540] avg loss 0.00318284, throughput 2.83336K wps
[Epoch 17 Batch 1170/1540] avg loss 0.00315772, throughput 2.77397K wps
[Epoch 17 Batch 1200/1540] avg loss 0.0037661, throughput 2.78451K wps
[Epoch 17 Batch 1230/1540] avg loss 0.00344642, throughput 2.84294K wps
[Epoch 17 Batch 1260/1540] avg loss 0.00357293, throughput 2.81985K wps
[Epoch 17 Batch 1290/1540] avg loss 0.00294052, throughput 2.79591K wps
[Epoch 17 Batch 1320/1540] avg loss 0.00315572, throughput 2.84036K wps
[Epoch 17 Batch 1350/1540] avg loss 0.0031937, throughput 2.84179K wps
[Epoch 17 Batch 1380/1540] avg loss 0.00331831, throughput 2.84312K wps
[Epoch 17 Batch 1410/1540] avg loss 0.00335509, throughput 2.7751K wps
[Epoch 17 Batch 1440/1540] avg loss 0.00370649, throughput 2.75456K wps
[Epoch 17 Batch 1470/1540] avg loss 0.0038141, throughput 2.79423K wps
[Epoch 17 Batch 1500/1540] avg loss 0.00340644, throughput 2.76745K wps
[Epoch 17 Batch 1530/1540] avg loss 0.00373137, throughput 2.7593K wps
Begin Testing...
[Epoch 17] train avg loss 0.00326857, dev acc 0.8314, dev avg loss 0.434033, throughput 2.81579K wps
[Epoch 18 Batch 30/1540] avg loss 0.00316388, throughput 2.84807K wps
[Epoch 18 Batch 60/1540] avg loss 0.00288998, throughput 2.83799K wps
[Epoch 18 Batch 90/1540] avg loss 0.00277676, throughput 2.84543K wps
[Epoch 18 Batch 120/1540] avg loss 0.00315002, throughput 2.84319K wps
[Epoch 18 Batch 150/1540] avg loss 0.00322311, throughput 2.83925K wps
[Epoch 18 Batch 180/1540] avg loss 0.00291506, throughput 2.83831K wps
[Epoch 18 Batch 210/1540] avg loss 0.002897, throughput 2.79692K wps
[Epoch 18 Batch 240/1540] avg loss 0.00315096, throughput 2.77815K wps
[Epoch 18 Batch 270/1540] avg loss 0.00322685, throughput 2.80163K wps
[Epoch 18 Batch 300/1540] avg loss 0.0030956, throughput 2.80782K wps
[Epoch 18 Batch 330/1540] avg loss 0.00311885, throughput 2.78733K wps
[Epoch 18 Batch 360/1540] avg loss 0.00335317, throughput 2.84449K wps
[Epoch 18 Batch 390/1540] avg loss 0.00327014, throughput 2.83766K wps
[Epoch 18 Batch 420/1540] avg loss 0.003091, throughput 2.84279K wps
[Epoch 18 Batch 450/1540] avg loss 0.00315899, throughput 2.82101K wps
[Epoch 18 Batch 480/1540] avg loss 0.00321562, throughput 2.75342K wps
[Epoch 18 Batch 510/1540] avg loss 0.00287454, throughput 2.81221K wps
[Epoch 18 Batch 540/1540] avg loss 0.00332914, throughput 2.84647K wps
[Epoch 18 Batch 570/1540] avg loss 0.00295907, throughput 2.80007K wps
[Epoch 18 Batch 600/1540] avg loss 0.00319503, throughput 2.82489K wps
[Epoch 18 Batch 630/1540] avg loss 0.00307769, throughput 2.83867K wps
[Epoch 18 Batch 660/1540] avg loss 0.00323254, throughput 2.81613K wps
[Epoch 18 Batch 690/1540] avg loss 0.00321698, throughput 2.83369K wps
[Epoch 18 Batch 720/1540] avg loss 0.00312866, throughput 2.8479K wps
[Epoch 18 Batch 750/1540] avg loss 0.00266894, throughput 2.83268K wps
[Epoch 18 Batch 780/1540] avg loss 0.00282837, throughput 2.78205K wps
[Epoch 18 Batch 810/1540] avg loss 0.00322583, throughput 2.8376K wps
[Epoch 18 Batch 840/1540] avg loss 0.00337163, throughput 2.83768K wps
[Epoch 18 Batch 870/1540] avg loss 0.00321502, throughput 2.84879K wps
[Epoch 18 Batch 900/1540] avg loss 0.0034066, throughput 2.84848K wps
[Epoch 18 Batch 930/1540] avg loss 0.00310792, throughput 2.84386K wps
[Epoch 18 Batch 960/1540] avg loss 0.00299468, throughput 2.84479K wps
[Epoch 18 Batch 990/1540] avg loss 0.00359021, throughput 2.80106K wps
[Epoch 18 Batch 1020/1540] avg loss 0.00319617, throughput 2.79196K wps
[Epoch 18 Batch 1050/1540] avg loss 0.00331894, throughput 2.79564K wps
[Epoch 18 Batch 1080/1540] avg loss 0.00324225, throughput 2.76063K wps
[Epoch 18 Batch 1110/1540] avg loss 0.00308812, throughput 2.84319K wps
[Epoch 18 Batch 1140/1540] avg loss 0.00331726, throughput 2.84657K wps
[Epoch 18 Batch 1170/1540] avg loss 0.00275696, throughput 2.84223K wps
[Epoch 18 Batch 1200/1540] avg loss 0.00348029, throughput 2.83154K wps
[Epoch 18 Batch 1230/1540] avg loss 0.00322277, throughput 2.81307K wps
[Epoch 18 Batch 1260/1540] avg loss 0.0031324, throughput 2.76147K wps
[Epoch 18 Batch 1290/1540] avg loss 0.00335996, throughput 2.80868K wps
[Epoch 18 Batch 1320/1540] avg loss 0.00262167, throughput 2.81143K wps
[Epoch 18 Batch 1350/1540] avg loss 0.00341869, throughput 2.8407K wps
[Epoch 18 Batch 1380/1540] avg loss 0.00320679, throughput 2.82906K wps
[Epoch 18 Batch 1410/1540] avg loss 0.00311348, throughput 2.75487K wps
[Epoch 18 Batch 1440/1540] avg loss 0.00320897, throughput 2.82834K wps
[Epoch 18 Batch 1470/1540] avg loss 0.00313995, throughput 2.83984K wps
[Epoch 18 Batch 1500/1540] avg loss 0.00326442, throughput 2.84716K wps
[Epoch 18 Batch 1530/1540] avg loss 0.0029565, throughput 2.83097K wps
Begin Testing...
[Epoch 18] train avg loss 0.0031429, dev acc 0.8303, dev avg loss 0.448871, throughput 2.82137K wps
[Epoch 19 Batch 30/1540] avg loss 0.00314068, throughput 2.89818K wps
[Epoch 19 Batch 60/1540] avg loss 0.00284234, throughput 2.84186K wps
[Epoch 19 Batch 90/1540] avg loss 0.00304088, throughput 2.84785K wps
[Epoch 19 Batch 120/1540] avg loss 0.00295029, throughput 2.8018K wps
[Epoch 19 Batch 150/1540] avg loss 0.00280568, throughput 2.79164K wps
[Epoch 19 Batch 180/1540] avg loss 0.00299376, throughput 2.79135K wps
[Epoch 19 Batch 210/1540] avg loss 0.00324096, throughput 2.82377K wps
[Epoch 19 Batch 240/1540] avg loss 0.00294114, throughput 2.84805K wps
[Epoch 19 Batch 270/1540] avg loss 0.00332724, throughput 2.80918K wps
[Epoch 19 Batch 300/1540] avg loss 0.00293145, throughput 2.84144K wps
[Epoch 19 Batch 330/1540] avg loss 0.00310246, throughput 2.82896K wps
[Epoch 19 Batch 360/1540] avg loss 0.00310853, throughput 2.806K wps
[Epoch 19 Batch 390/1540] avg loss 0.00301485, throughput 2.79396K wps
[Epoch 19 Batch 420/1540] avg loss 0.00316634, throughput 2.84413K wps
[Epoch 19 Batch 450/1540] avg loss 0.00354894, throughput 2.80987K wps
[Epoch 19 Batch 480/1540] avg loss 0.00294124, throughput 2.84719K wps
[Epoch 19 Batch 510/1540] avg loss 0.00325429, throughput 2.84435K wps
[Epoch 19 Batch 540/1540] avg loss 0.00328002, throughput 2.79573K wps
[Epoch 19 Batch 570/1540] avg loss 0.00323148, throughput 2.83992K wps
[Epoch 19 Batch 600/1540] avg loss 0.00296336, throughput 2.79831K wps
[Epoch 19 Batch 630/1540] avg loss 0.00292891, throughput 2.77912K wps
[Epoch 19 Batch 660/1540] avg loss 0.00303365, throughput 2.75679K wps
[Epoch 19 Batch 690/1540] avg loss 0.00313932, throughput 2.76368K wps
[Epoch 19 Batch 720/1540] avg loss 0.00281654, throughput 2.79679K wps
[Epoch 19 Batch 750/1540] avg loss 0.00317858, throughput 2.75666K wps
[Epoch 19 Batch 780/1540] avg loss 0.00287117, throughput 2.75174K wps
[Epoch 19 Batch 810/1540] avg loss 0.00318845, throughput 2.75801K wps
[Epoch 19 Batch 840/1540] avg loss 0.00308172, throughput 2.75649K wps
[Epoch 19 Batch 870/1540] avg loss 0.00322895, throughput 2.78931K wps
[Epoch 19 Batch 900/1540] avg loss 0.00311454, throughput 2.77323K wps
[Epoch 19 Batch 930/1540] avg loss 0.00301103, throughput 2.76846K wps
[Epoch 19 Batch 960/1540] avg loss 0.00289474, throughput 2.80406K wps
[Epoch 19 Batch 990/1540] avg loss 0.00291569, throughput 2.84456K wps
[Epoch 19 Batch 1020/1540] avg loss 0.00311459, throughput 2.8405K wps
[Epoch 19 Batch 1050/1540] avg loss 0.00268002, throughput 2.83979K wps
[Epoch 19 Batch 1080/1540] avg loss 0.00342322, throughput 2.84432K wps
[Epoch 19 Batch 1110/1540] avg loss 0.00312959, throughput 2.76096K wps
[Epoch 19 Batch 1140/1540] avg loss 0.00305805, throughput 2.83442K wps
[Epoch 19 Batch 1170/1540] avg loss 0.00323874, throughput 2.80788K wps
[Epoch 19 Batch 1200/1540] avg loss 0.00303611, throughput 2.75922K wps
[Epoch 19 Batch 1230/1540] avg loss 0.00276795, throughput 2.76002K wps
[Epoch 19 Batch 1260/1540] avg loss 0.00293297, throughput 2.82282K wps
[Epoch 19 Batch 1290/1540] avg loss 0.00321199, throughput 2.76629K wps
[Epoch 19 Batch 1320/1540] avg loss 0.00329922, throughput 2.75393K wps
[Epoch 19 Batch 1350/1540] avg loss 0.00335673, throughput 2.82063K wps
[Epoch 19 Batch 1380/1540] avg loss 0.00301048, throughput 2.78191K wps
[Epoch 19 Batch 1410/1540] avg loss 0.00289952, throughput 2.75743K wps
[Epoch 19 Batch 1440/1540] avg loss 0.00292873, throughput 2.81713K wps
[Epoch 19 Batch 1470/1540] avg loss 0.00300025, throughput 2.84306K wps
[Epoch 19 Batch 1500/1540] avg loss 0.00302098, throughput 2.84456K wps
[Epoch 19 Batch 1530/1540] avg loss 0.00342132, throughput 2.83353K wps
Begin Testing...
[Epoch 19] train avg loss 0.00307221, dev acc 0.8337, dev avg loss 0.44789, throughput 2.80502K wps
[Epoch 20 Batch 30/1540] avg loss 0.00326547, throughput 2.8738K wps
[Epoch 20 Batch 60/1540] avg loss 0.00254785, throughput 2.76934K wps
[Epoch 20 Batch 90/1540] avg loss 0.00299128, throughput 2.75503K wps
[Epoch 20 Batch 120/1540] avg loss 0.00262275, throughput 2.84024K wps
[Epoch 20 Batch 150/1540] avg loss 0.0028109, throughput 2.84102K wps
[Epoch 20 Batch 180/1540] avg loss 0.00304741, throughput 2.84173K wps
[Epoch 20 Batch 210/1540] avg loss 0.00332597, throughput 2.77778K wps
[Epoch 20 Batch 240/1540] avg loss 0.00279887, throughput 2.80034K wps
[Epoch 20 Batch 270/1540] avg loss 0.00331562, throughput 2.80719K wps
[Epoch 20 Batch 300/1540] avg loss 0.00287641, throughput 2.84412K wps
[Epoch 20 Batch 330/1540] avg loss 0.00297958, throughput 2.83447K wps
[Epoch 20 Batch 360/1540] avg loss 0.00304854, throughput 2.76512K wps
[Epoch 20 Batch 390/1540] avg loss 0.00305781, throughput 2.8032K wps
[Epoch 20 Batch 420/1540] avg loss 0.00295303, throughput 2.80876K wps
[Epoch 20 Batch 450/1540] avg loss 0.00287698, throughput 2.84489K wps
[Epoch 20 Batch 480/1540] avg loss 0.00298673, throughput 2.84336K wps
[Epoch 20 Batch 510/1540] avg loss 0.00300638, throughput 2.84656K wps
[Epoch 20 Batch 540/1540] avg loss 0.00311853, throughput 2.77624K wps
[Epoch 20 Batch 570/1540] avg loss 0.00303417, throughput 2.78199K wps
[Epoch 20 Batch 600/1540] avg loss 0.00307036, throughput 2.79977K wps
[Epoch 20 Batch 630/1540] avg loss 0.0033637, throughput 2.82136K wps
[Epoch 20 Batch 660/1540] avg loss 0.00292001, throughput 2.75904K wps
[Epoch 20 Batch 690/1540] avg loss 0.00307096, throughput 2.79724K wps
[Epoch 20 Batch 720/1540] avg loss 0.0033873, throughput 2.78958K wps
[Epoch 20 Batch 750/1540] avg loss 0.00316917, throughput 2.77178K wps
[Epoch 20 Batch 780/1540] avg loss 0.00267293, throughput 2.80027K wps
[Epoch 20 Batch 810/1540] avg loss 0.0028375, throughput 2.75645K wps
[Epoch 20 Batch 840/1540] avg loss 0.00292135, throughput 2.82193K wps
[Epoch 20 Batch 870/1540] avg loss 0.0026182, throughput 2.78382K wps
[Epoch 20 Batch 900/1540] avg loss 0.00308727, throughput 2.8024K wps
[Epoch 20 Batch 930/1540] avg loss 0.00265783, throughput 2.85055K wps
[Epoch 20 Batch 960/1540] avg loss 0.00299123, throughput 2.7801K wps
[Epoch 20 Batch 990/1540] avg loss 0.00315632, throughput 2.81272K wps
[Epoch 20 Batch 1020/1540] avg loss 0.00278662, throughput 2.84234K wps
[Epoch 20 Batch 1050/1540] avg loss 0.00307836, throughput 2.78137K wps
[Epoch 20 Batch 1080/1540] avg loss 0.00285306, throughput 2.81619K wps
[Epoch 20 Batch 1110/1540] avg loss 0.00251057, throughput 2.83506K wps
[Epoch 20 Batch 1140/1540] avg loss 0.00333968, throughput 2.83364K wps
[Epoch 20 Batch 1170/1540] avg loss 0.00329233, throughput 2.79114K wps
[Epoch 20 Batch 1200/1540] avg loss 0.00300691, throughput 2.84095K wps
[Epoch 20 Batch 1230/1540] avg loss 0.00285111, throughput 2.84145K wps
[Epoch 20 Batch 1260/1540] avg loss 0.00252214, throughput 2.84525K wps
[Epoch 20 Batch 1290/1540] avg loss 0.00302976, throughput 2.83871K wps
[Epoch 20 Batch 1320/1540] avg loss 0.00305814, throughput 2.78733K wps
[Epoch 20 Batch 1350/1540] avg loss 0.0029608, throughput 2.7985K wps
[Epoch 20 Batch 1380/1540] avg loss 0.00291907, throughput 2.84739K wps
[Epoch 20 Batch 1410/1540] avg loss 0.00322359, throughput 2.79079K wps
[Epoch 20 Batch 1440/1540] avg loss 0.00291873, throughput 2.80861K wps
[Epoch 20 Batch 1470/1540] avg loss 0.00294896, throughput 2.84547K wps
[Epoch 20 Batch 1500/1540] avg loss 0.00296053, throughput 2.77774K wps
[Epoch 20 Batch 1530/1540] avg loss 0.00303375, throughput 2.81218K wps
Begin Testing...
[Epoch 20] train avg loss 0.00297683, dev acc 0.8326, dev avg loss 0.45415, throughput 2.81046K wps
[Epoch 21 Batch 30/1540] avg loss 0.00282821, throughput 2.83971K wps
[Epoch 21 Batch 60/1540] avg loss 0.00292441, throughput 2.84614K wps
[Epoch 21 Batch 90/1540] avg loss 0.00275239, throughput 2.76135K wps
[Epoch 21 Batch 120/1540] avg loss 0.00274287, throughput 2.76343K wps
[Epoch 21 Batch 150/1540] avg loss 0.00312652, throughput 2.75517K wps
[Epoch 21 Batch 180/1540] avg loss 0.00292015, throughput 2.76415K wps
[Epoch 21 Batch 210/1540] avg loss 0.00316187, throughput 2.80561K wps
[Epoch 21 Batch 240/1540] avg loss 0.0034788, throughput 2.80806K wps
[Epoch 21 Batch 270/1540] avg loss 0.00256883, throughput 2.81494K wps
[Epoch 21 Batch 300/1540] avg loss 0.00245609, throughput 2.83266K wps
[Epoch 21 Batch 330/1540] avg loss 0.00301083, throughput 2.84155K wps
[Epoch 21 Batch 360/1540] avg loss 0.00279609, throughput 2.84182K wps
[Epoch 21 Batch 390/1540] avg loss 0.00274882, throughput 2.84262K wps
[Epoch 21 Batch 420/1540] avg loss 0.00293468, throughput 2.79854K wps
[Epoch 21 Batch 450/1540] avg loss 0.00288869, throughput 2.7542K wps
[Epoch 21 Batch 480/1540] avg loss 0.00288559, throughput 2.77619K wps
[Epoch 21 Batch 510/1540] avg loss 0.00271629, throughput 2.79311K wps
[Epoch 21 Batch 540/1540] avg loss 0.00290964, throughput 2.78249K wps
[Epoch 21 Batch 570/1540] avg loss 0.00276795, throughput 2.82281K wps
[Epoch 21 Batch 600/1540] avg loss 0.00275836, throughput 2.8341K wps
[Epoch 21 Batch 630/1540] avg loss 0.00283422, throughput 2.84337K wps
[Epoch 21 Batch 660/1540] avg loss 0.00299301, throughput 2.82471K wps
[Epoch 21 Batch 690/1540] avg loss 0.00315116, throughput 2.79695K wps
[Epoch 21 Batch 720/1540] avg loss 0.00258499, throughput 2.81753K wps
[Epoch 21 Batch 750/1540] avg loss 0.00276687, throughput 2.84532K wps
[Epoch 21 Batch 780/1540] avg loss 0.00292862, throughput 2.83743K wps
[Epoch 21 Batch 810/1540] avg loss 0.00291906, throughput 2.84102K wps
[Epoch 21 Batch 840/1540] avg loss 0.00319008, throughput 2.83571K wps
[Epoch 21 Batch 870/1540] avg loss 0.00285624, throughput 2.79486K wps
[Epoch 21 Batch 900/1540] avg loss 0.00303904, throughput 2.80494K wps
[Epoch 21 Batch 930/1540] avg loss 0.00273475, throughput 2.83261K wps
[Epoch 21 Batch 960/1540] avg loss 0.00307752, throughput 2.76825K wps
[Epoch 21 Batch 990/1540] avg loss 0.00326203, throughput 2.76758K wps
[Epoch 21 Batch 1020/1540] avg loss 0.00260155, throughput 2.84654K wps
[Epoch 21 Batch 1050/1540] avg loss 0.00268848, throughput 2.78221K wps
[Epoch 21 Batch 1080/1540] avg loss 0.00287666, throughput 2.78902K wps
[Epoch 21 Batch 1110/1540] avg loss 0.00283114, throughput 2.8446K wps
[Epoch 21 Batch 1140/1540] avg loss 0.00281071, throughput 2.84572K wps
[Epoch 21 Batch 1170/1540] avg loss 0.00326468, throughput 2.82826K wps
[Epoch 21 Batch 1200/1540] avg loss 0.00276821, throughput 2.84716K wps
[Epoch 21 Batch 1230/1540] avg loss 0.00289428, throughput 2.77039K wps
[Epoch 21 Batch 1260/1540] avg loss 0.00256034, throughput 2.75735K wps
[Epoch 21 Batch 1290/1540] avg loss 0.00290482, throughput 2.8428K wps
[Epoch 21 Batch 1320/1540] avg loss 0.00317809, throughput 2.84073K wps
[Epoch 21 Batch 1350/1540] avg loss 0.00280456, throughput 2.83856K wps
[Epoch 21 Batch 1380/1540] avg loss 0.00285372, throughput 2.7999K wps
[Epoch 21 Batch 1410/1540] avg loss 0.00281429, throughput 2.76119K wps
[Epoch 21 Batch 1440/1540] avg loss 0.00281823, throughput 2.82455K wps
[Epoch 21 Batch 1470/1540] avg loss 0.00268235, throughput 2.848K wps
[Epoch 21 Batch 1500/1540] avg loss 0.00258286, throughput 2.83847K wps
[Epoch 21 Batch 1530/1540] avg loss 0.00286663, throughput 2.84001K wps
Begin Testing...
[Epoch 21] train avg loss 0.00287243, dev acc 0.8280, dev avg loss 0.458131, throughput 2.81223K wps
[Epoch 22 Batch 30/1540] avg loss 0.00292104, throughput 2.90204K wps
[Epoch 22 Batch 60/1540] avg loss 0.00282632, throughput 2.84602K wps
[Epoch 22 Batch 90/1540] avg loss 0.00315287, throughput 2.82792K wps
[Epoch 22 Batch 120/1540] avg loss 0.00273487, throughput 2.84184K wps
[Epoch 22 Batch 150/1540] avg loss 0.00311342, throughput 2.7976K wps
[Epoch 22 Batch 180/1540] avg loss 0.0026141, throughput 2.83259K wps
[Epoch 22 Batch 210/1540] avg loss 0.00264761, throughput 2.76338K wps
[Epoch 22 Batch 240/1540] avg loss 0.00264657, throughput 2.76141K wps
[Epoch 22 Batch 270/1540] avg loss 0.00297746, throughput 2.80078K wps
[Epoch 22 Batch 300/1540] avg loss 0.00252396, throughput 2.83522K wps
[Epoch 22 Batch 330/1540] avg loss 0.00286938, throughput 2.84174K wps
[Epoch 22 Batch 360/1540] avg loss 0.00321115, throughput 2.83925K wps
[Epoch 22 Batch 390/1540] avg loss 0.00259006, throughput 2.83902K wps
[Epoch 22 Batch 420/1540] avg loss 0.0023917, throughput 2.84061K wps
[Epoch 22 Batch 450/1540] avg loss 0.00252242, throughput 2.80519K wps
[Epoch 22 Batch 480/1540] avg loss 0.00265508, throughput 2.83173K wps
[Epoch 22 Batch 510/1540] avg loss 0.0029741, throughput 2.81002K wps
[Epoch 22 Batch 540/1540] avg loss 0.00293983, throughput 2.75943K wps
[Epoch 22 Batch 570/1540] avg loss 0.00260073, throughput 2.75699K wps
[Epoch 22 Batch 600/1540] avg loss 0.00302816, throughput 2.77584K wps
[Epoch 22 Batch 630/1540] avg loss 0.00293648, throughput 2.83281K wps
[Epoch 22 Batch 660/1540] avg loss 0.00303563, throughput 2.77663K wps
[Epoch 22 Batch 690/1540] avg loss 0.00306034, throughput 2.8422K wps
[Epoch 22 Batch 720/1540] avg loss 0.00257992, throughput 2.83257K wps
[Epoch 22 Batch 750/1540] avg loss 0.00281392, throughput 2.81615K wps
[Epoch 22 Batch 780/1540] avg loss 0.00300888, throughput 2.81586K wps
[Epoch 22 Batch 810/1540] avg loss 0.00291847, throughput 2.76704K wps
[Epoch 22 Batch 840/1540] avg loss 0.00229043, throughput 2.80623K wps
[Epoch 22 Batch 870/1540] avg loss 0.00308537, throughput 2.78787K wps
[Epoch 22 Batch 900/1540] avg loss 0.00251948, throughput 2.8013K wps
[Epoch 22 Batch 930/1540] avg loss 0.00308875, throughput 2.7961K wps
[Epoch 22 Batch 960/1540] avg loss 0.00291763, throughput 2.84247K wps
[Epoch 22 Batch 990/1540] avg loss 0.00276473, throughput 2.76779K wps
[Epoch 22 Batch 1020/1540] avg loss 0.00252109, throughput 2.75313K wps
[Epoch 22 Batch 1050/1540] avg loss 0.00304625, throughput 2.74671K wps
[Epoch 22 Batch 1080/1540] avg loss 0.00305487, throughput 2.80007K wps
[Epoch 22 Batch 1110/1540] avg loss 0.00280639, throughput 2.84374K wps
[Epoch 22 Batch 1140/1540] avg loss 0.00251799, throughput 2.76743K wps
[Epoch 22 Batch 1170/1540] avg loss 0.00289242, throughput 2.81466K wps
[Epoch 22 Batch 1200/1540] avg loss 0.002977, throughput 2.8169K wps
[Epoch 22 Batch 1230/1540] avg loss 0.00264512, throughput 2.84359K wps
[Epoch 22 Batch 1260/1540] avg loss 0.00299371, throughput 2.81499K wps
[Epoch 22 Batch 1290/1540] avg loss 0.00274127, throughput 2.81269K wps
[Epoch 22 Batch 1320/1540] avg loss 0.00295013, throughput 2.84456K wps
[Epoch 22 Batch 1350/1540] avg loss 0.00306481, throughput 2.84839K wps
[Epoch 22 Batch 1380/1540] avg loss 0.00271423, throughput 2.84308K wps
[Epoch 22 Batch 1410/1540] avg loss 0.00254468, throughput 2.83267K wps
[Epoch 22 Batch 1440/1540] avg loss 0.0027715, throughput 2.83341K wps
[Epoch 22 Batch 1470/1540] avg loss 0.00279163, throughput 2.76655K wps
[Epoch 22 Batch 1500/1540] avg loss 0.00285437, throughput 2.76149K wps
[Epoch 22 Batch 1530/1540] avg loss 0.00297377, throughput 2.79098K wps
Begin Testing...
[Epoch 22] train avg loss 0.00282008, dev acc 0.8280, dev avg loss 0.467494, throughput 2.81016K wps
[Epoch 23 Batch 30/1540] avg loss 0.00252937, throughput 2.88865K wps
[Epoch 23 Batch 60/1540] avg loss 0.00269034, throughput 2.81927K wps
[Epoch 23 Batch 90/1540] avg loss 0.00271377, throughput 2.80957K wps
[Epoch 23 Batch 120/1540] avg loss 0.00268616, throughput 2.81611K wps
[Epoch 23 Batch 150/1540] avg loss 0.00274397, throughput 2.80894K wps
[Epoch 23 Batch 180/1540] avg loss 0.00272946, throughput 2.84303K wps
[Epoch 23 Batch 210/1540] avg loss 0.00232483, throughput 2.79447K wps
[Epoch 23 Batch 240/1540] avg loss 0.00267967, throughput 2.81009K wps
[Epoch 23 Batch 270/1540] avg loss 0.00264609, throughput 2.7737K wps
[Epoch 23 Batch 300/1540] avg loss 0.00236484, throughput 2.84169K wps
[Epoch 23 Batch 330/1540] avg loss 0.00260678, throughput 2.8512K wps
[Epoch 23 Batch 360/1540] avg loss 0.00317031, throughput 2.75393K wps
[Epoch 23 Batch 390/1540] avg loss 0.00290491, throughput 2.82852K wps
[Epoch 23 Batch 420/1540] avg loss 0.00270531, throughput 2.83434K wps
[Epoch 23 Batch 450/1540] avg loss 0.00260512, throughput 2.84688K wps
[Epoch 23 Batch 480/1540] avg loss 0.00279146, throughput 2.84471K wps
[Epoch 23 Batch 510/1540] avg loss 0.00255143, throughput 2.81014K wps
[Epoch 23 Batch 540/1540] avg loss 0.0030512, throughput 2.78716K wps
[Epoch 23 Batch 570/1540] avg loss 0.00242128, throughput 2.82467K wps
[Epoch 23 Batch 600/1540] avg loss 0.00222187, throughput 2.80277K wps
[Epoch 23 Batch 630/1540] avg loss 0.00284357, throughput 2.75552K wps
[Epoch 23 Batch 660/1540] avg loss 0.00296659, throughput 2.76364K wps
[Epoch 23 Batch 690/1540] avg loss 0.00231248, throughput 2.8244K wps
[Epoch 23 Batch 720/1540] avg loss 0.00310263, throughput 2.83174K wps
[Epoch 23 Batch 750/1540] avg loss 0.0032179, throughput 2.84835K wps
[Epoch 23 Batch 780/1540] avg loss 0.00265761, throughput 2.84379K wps
[Epoch 23 Batch 810/1540] avg loss 0.00265191, throughput 2.845K wps
[Epoch 23 Batch 840/1540] avg loss 0.00343039, throughput 2.84329K wps
[Epoch 23 Batch 870/1540] avg loss 0.00266458, throughput 2.84321K wps
[Epoch 23 Batch 900/1540] avg loss 0.00265667, throughput 2.84117K wps
[Epoch 23 Batch 930/1540] avg loss 0.00248653, throughput 2.80461K wps
[Epoch 23 Batch 960/1540] avg loss 0.00255022, throughput 2.77651K wps
[Epoch 23 Batch 990/1540] avg loss 0.00304497, throughput 2.8079K wps
[Epoch 23 Batch 1020/1540] avg loss 0.00269182, throughput 2.84202K wps
[Epoch 23 Batch 1050/1540] avg loss 0.00287002, throughput 2.84608K wps
[Epoch 23 Batch 1080/1540] avg loss 0.00272732, throughput 2.834K wps
[Epoch 23 Batch 1110/1540] avg loss 0.00279119, throughput 2.83587K wps
[Epoch 23 Batch 1140/1540] avg loss 0.0027772, throughput 2.84467K wps
[Epoch 23 Batch 1170/1540] avg loss 0.00236012, throughput 2.84742K wps
[Epoch 23 Batch 1200/1540] avg loss 0.0029666, throughput 2.78784K wps
[Epoch 23 Batch 1230/1540] avg loss 0.0031152, throughput 2.84626K wps
[Epoch 23 Batch 1260/1540] avg loss 0.00326324, throughput 2.83636K wps
[Epoch 23 Batch 1290/1540] avg loss 0.0025789, throughput 2.79384K wps
[Epoch 23 Batch 1320/1540] avg loss 0.00273719, throughput 2.84481K wps
[Epoch 23 Batch 1350/1540] avg loss 0.00254679, throughput 2.76476K wps
[Epoch 23 Batch 1380/1540] avg loss 0.002655, throughput 2.76429K wps
[Epoch 23 Batch 1410/1540] avg loss 0.00278044, throughput 2.82979K wps
[Epoch 23 Batch 1440/1540] avg loss 0.00257154, throughput 2.84297K wps
[Epoch 23 Batch 1470/1540] avg loss 0.00270513, throughput 2.83804K wps
[Epoch 23 Batch 1500/1540] avg loss 0.00306855, throughput 2.84551K wps
[Epoch 23 Batch 1530/1540] avg loss 0.00270604, throughput 2.80673K wps
Begin Testing...
[Epoch 23] train avg loss 0.00273906, dev acc 0.8222, dev avg loss 0.467165, throughput 2.82055K wps
[Epoch 24 Batch 30/1540] avg loss 0.00245239, throughput 2.89123K wps
[Epoch 24 Batch 60/1540] avg loss 0.00271375, throughput 2.80766K wps
[Epoch 24 Batch 90/1540] avg loss 0.00244518, throughput 2.82438K wps
[Epoch 24 Batch 120/1540] avg loss 0.00250591, throughput 2.82823K wps
[Epoch 24 Batch 150/1540] avg loss 0.00253623, throughput 2.75656K wps
[Epoch 24 Batch 180/1540] avg loss 0.00235221, throughput 2.78659K wps
[Epoch 24 Batch 210/1540] avg loss 0.00243374, throughput 2.8044K wps
[Epoch 24 Batch 240/1540] avg loss 0.0023034, throughput 2.81714K wps
[Epoch 24 Batch 270/1540] avg loss 0.00258924, throughput 2.84434K wps
[Epoch 24 Batch 300/1540] avg loss 0.00232756, throughput 2.84815K wps
[Epoch 24 Batch 330/1540] avg loss 0.0027609, throughput 2.81833K wps
[Epoch 24 Batch 360/1540] avg loss 0.00299418, throughput 2.78909K wps
[Epoch 24 Batch 390/1540] avg loss 0.0028147, throughput 2.76585K wps
[Epoch 24 Batch 420/1540] avg loss 0.00272886, throughput 2.82862K wps
[Epoch 24 Batch 450/1540] avg loss 0.00308042, throughput 2.81832K wps
[Epoch 24 Batch 480/1540] avg loss 0.00266039, throughput 2.7943K wps
[Epoch 24 Batch 510/1540] avg loss 0.00283853, throughput 2.84183K wps
[Epoch 24 Batch 540/1540] avg loss 0.00263657, throughput 2.84194K wps
[Epoch 24 Batch 570/1540] avg loss 0.00284534, throughput 2.78278K wps
[Epoch 24 Batch 600/1540] avg loss 0.00270481, throughput 2.84315K wps
[Epoch 24 Batch 630/1540] avg loss 0.00299726, throughput 2.76018K wps
[Epoch 24 Batch 660/1540] avg loss 0.00284803, throughput 2.82183K wps
[Epoch 24 Batch 690/1540] avg loss 0.00285506, throughput 2.84537K wps
[Epoch 24 Batch 720/1540] avg loss 0.00268373, throughput 2.83444K wps
[Epoch 24 Batch 750/1540] avg loss 0.00261479, throughput 2.77308K wps
[Epoch 24 Batch 780/1540] avg loss 0.00244806, throughput 2.82736K wps
[Epoch 24 Batch 810/1540] avg loss 0.00270319, throughput 2.83097K wps
[Epoch 24 Batch 840/1540] avg loss 0.00236657, throughput 2.78125K wps
[Epoch 24 Batch 870/1540] avg loss 0.00267735, throughput 2.75688K wps
[Epoch 24 Batch 900/1540] avg loss 0.00271469, throughput 2.75836K wps
[Epoch 24 Batch 930/1540] avg loss 0.00282071, throughput 2.7609K wps
[Epoch 24 Batch 960/1540] avg loss 0.00264839, throughput 2.75652K wps
[Epoch 24 Batch 990/1540] avg loss 0.00277264, throughput 2.84583K wps
[Epoch 24 Batch 1020/1540] avg loss 0.00304678, throughput 2.79882K wps
[Epoch 24 Batch 1050/1540] avg loss 0.00269816, throughput 2.81626K wps
[Epoch 24 Batch 1080/1540] avg loss 0.00257271, throughput 2.76653K wps
[Epoch 24 Batch 1110/1540] avg loss 0.00306696, throughput 2.75846K wps
[Epoch 24 Batch 1140/1540] avg loss 0.00277325, throughput 2.84479K wps
[Epoch 24 Batch 1170/1540] avg loss 0.00249581, throughput 2.7817K wps
[Epoch 24 Batch 1200/1540] avg loss 0.00217259, throughput 2.80165K wps
[Epoch 24 Batch 1230/1540] avg loss 0.00288878, throughput 2.8325K wps
[Epoch 24 Batch 1260/1540] avg loss 0.00254453, throughput 2.84525K wps
[Epoch 24 Batch 1290/1540] avg loss 0.0027355, throughput 2.84667K wps
[Epoch 24 Batch 1320/1540] avg loss 0.00238582, throughput 2.84014K wps
[Epoch 24 Batch 1350/1540] avg loss 0.00245044, throughput 2.83593K wps
[Epoch 24 Batch 1380/1540] avg loss 0.00256904, throughput 2.77559K wps
[Epoch 24 Batch 1410/1540] avg loss 0.00254986, throughput 2.7826K wps
[Epoch 24 Batch 1440/1540] avg loss 0.00264926, throughput 2.75663K wps
[Epoch 24 Batch 1470/1540] avg loss 0.00304982, throughput 2.78943K wps
[Epoch 24 Batch 1500/1540] avg loss 0.00260325, throughput 2.78755K wps
[Epoch 24 Batch 1530/1540] avg loss 0.00262798, throughput 2.84925K wps
Begin Testing...
[Epoch 24] train avg loss 0.00266711, dev acc 0.8268, dev avg loss 0.477991, throughput 2.80756K wps
[Epoch 25 Batch 30/1540] avg loss 0.00270358, throughput 2.90232K wps
[Epoch 25 Batch 60/1540] avg loss 0.00232147, throughput 2.79711K wps
[Epoch 25 Batch 90/1540] avg loss 0.00269513, throughput 2.76577K wps
[Epoch 25 Batch 120/1540] avg loss 0.00252635, throughput 2.74954K wps
[Epoch 25 Batch 150/1540] avg loss 0.00235238, throughput 2.79214K wps
[Epoch 25 Batch 180/1540] avg loss 0.00250503, throughput 2.84383K wps
[Epoch 25 Batch 210/1540] avg loss 0.0026406, throughput 2.77428K wps
[Epoch 25 Batch 240/1540] avg loss 0.0025912, throughput 2.7612K wps
[Epoch 25 Batch 270/1540] avg loss 0.00279242, throughput 2.82489K wps
[Epoch 25 Batch 300/1540] avg loss 0.00254673, throughput 2.84922K wps
[Epoch 25 Batch 330/1540] avg loss 0.00225854, throughput 2.82718K wps
[Epoch 25 Batch 360/1540] avg loss 0.00237961, throughput 2.79878K wps
[Epoch 25 Batch 390/1540] avg loss 0.00237669, throughput 2.79191K wps
[Epoch 25 Batch 420/1540] avg loss 0.0024978, throughput 2.76106K wps
[Epoch 25 Batch 450/1540] avg loss 0.0024718, throughput 2.76172K wps
[Epoch 25 Batch 480/1540] avg loss 0.00263759, throughput 2.76151K wps
[Epoch 25 Batch 510/1540] avg loss 0.00247999, throughput 2.84002K wps
[Epoch 25 Batch 540/1540] avg loss 0.00279206, throughput 2.84331K wps
[Epoch 25 Batch 570/1540] avg loss 0.00242034, throughput 2.83243K wps
[Epoch 25 Batch 600/1540] avg loss 0.00236725, throughput 2.84413K wps
[Epoch 25 Batch 630/1540] avg loss 0.00209452, throughput 2.78502K wps
[Epoch 25 Batch 660/1540] avg loss 0.00255842, throughput 2.82269K wps
[Epoch 25 Batch 690/1540] avg loss 0.00286621, throughput 2.83646K wps
[Epoch 25 Batch 720/1540] avg loss 0.00283023, throughput 2.84819K wps
[Epoch 25 Batch 750/1540] avg loss 0.00309049, throughput 2.84703K wps
[Epoch 25 Batch 780/1540] avg loss 0.00261636, throughput 2.84088K wps
[Epoch 25 Batch 810/1540] avg loss 0.00261038, throughput 2.82597K wps
[Epoch 25 Batch 840/1540] avg loss 0.00254358, throughput 2.76858K wps
[Epoch 25 Batch 870/1540] avg loss 0.00240945, throughput 2.78797K wps
[Epoch 25 Batch 900/1540] avg loss 0.00251794, throughput 2.84548K wps
[Epoch 25 Batch 930/1540] avg loss 0.0027304, throughput 2.84521K wps
[Epoch 25 Batch 960/1540] avg loss 0.00249344, throughput 2.84714K wps
[Epoch 25 Batch 990/1540] avg loss 0.00246032, throughput 2.78448K wps
[Epoch 25 Batch 1020/1540] avg loss 0.00263897, throughput 2.77369K wps
[Epoch 25 Batch 1050/1540] avg loss 0.00253242, throughput 2.83863K wps
[Epoch 25 Batch 1080/1540] avg loss 0.00256358, throughput 2.84244K wps
[Epoch 25 Batch 1110/1540] avg loss 0.00259349, throughput 2.83441K wps
[Epoch 25 Batch 1140/1540] avg loss 0.00247265, throughput 2.80569K wps
[Epoch 25 Batch 1170/1540] avg loss 0.00276106, throughput 2.77694K wps
[Epoch 25 Batch 1200/1540] avg loss 0.00280521, throughput 2.79081K wps
[Epoch 25 Batch 1230/1540] avg loss 0.00276511, throughput 2.7875K wps
[Epoch 25 Batch 1260/1540] avg loss 0.00301446, throughput 2.84595K wps
[Epoch 25 Batch 1290/1540] avg loss 0.00277155, throughput 2.77575K wps
[Epoch 25 Batch 1320/1540] avg loss 0.00275268, throughput 2.82954K wps
[Epoch 25 Batch 1350/1540] avg loss 0.00236504, throughput 2.8177K wps
[Epoch 25 Batch 1380/1540] avg loss 0.00254585, throughput 2.83413K wps
[Epoch 25 Batch 1410/1540] avg loss 0.00272724, throughput 2.79467K wps
[Epoch 25 Batch 1440/1540] avg loss 0.00262649, throughput 2.76879K wps
[Epoch 25 Batch 1470/1540] avg loss 0.00274306, throughput 2.7719K wps
[Epoch 25 Batch 1500/1540] avg loss 0.00287183, throughput 2.84351K wps
[Epoch 25 Batch 1530/1540] avg loss 0.00265935, throughput 2.84384K wps
Begin Testing...
[Epoch 25] train avg loss 0.00259702, dev acc 0.8257, dev avg loss 0.482647, throughput 2.81124K wps
[Epoch 26 Batch 30/1540] avg loss 0.00211267, throughput 2.88465K wps
[Epoch 26 Batch 60/1540] avg loss 0.00261784, throughput 2.83414K wps
[Epoch 26 Batch 90/1540] avg loss 0.00236211, throughput 2.84568K wps
[Epoch 26 Batch 120/1540] avg loss 0.002459, throughput 2.8462K wps
[Epoch 26 Batch 150/1540] avg loss 0.00239096, throughput 2.84518K wps
[Epoch 26 Batch 180/1540] avg loss 0.00231911, throughput 2.82446K wps
[Epoch 26 Batch 210/1540] avg loss 0.00256332, throughput 2.79234K wps
[Epoch 26 Batch 240/1540] avg loss 0.00268995, throughput 2.81576K wps
[Epoch 26 Batch 270/1540] avg loss 0.00251815, throughput 2.84569K wps
[Epoch 26 Batch 300/1540] avg loss 0.00238149, throughput 2.79624K wps
[Epoch 26 Batch 330/1540] avg loss 0.0025308, throughput 2.8183K wps
[Epoch 26 Batch 360/1540] avg loss 0.00240885, throughput 2.80478K wps
[Epoch 26 Batch 390/1540] avg loss 0.00271993, throughput 2.84125K wps
[Epoch 26 Batch 420/1540] avg loss 0.00296547, throughput 2.84293K wps
[Epoch 26 Batch 450/1540] avg loss 0.0025044, throughput 2.84721K wps
[Epoch 26 Batch 480/1540] avg loss 0.0022987, throughput 2.8419K wps
[Epoch 26 Batch 510/1540] avg loss 0.00242769, throughput 2.84151K wps
[Epoch 26 Batch 540/1540] avg loss 0.00253474, throughput 2.80287K wps
[Epoch 26 Batch 570/1540] avg loss 0.0025712, throughput 2.82964K wps
[Epoch 26 Batch 600/1540] avg loss 0.00231208, throughput 2.84453K wps
[Epoch 26 Batch 630/1540] avg loss 0.00239558, throughput 2.83609K wps
[Epoch 26 Batch 660/1540] avg loss 0.00235619, throughput 2.81162K wps
[Epoch 26 Batch 690/1540] avg loss 0.00234738, throughput 2.76209K wps
[Epoch 26 Batch 720/1540] avg loss 0.00271669, throughput 2.83682K wps
[Epoch 26 Batch 750/1540] avg loss 0.00222749, throughput 2.8072K wps
[Epoch 26 Batch 780/1540] avg loss 0.00214718, throughput 2.79101K wps
[Epoch 26 Batch 810/1540] avg loss 0.00251348, throughput 2.81847K wps
[Epoch 26 Batch 840/1540] avg loss 0.0024513, throughput 2.84007K wps
[Epoch 26 Batch 870/1540] avg loss 0.00223514, throughput 2.84734K wps
[Epoch 26 Batch 900/1540] avg loss 0.00250709, throughput 2.84807K wps
[Epoch 26 Batch 930/1540] avg loss 0.00244786, throughput 2.79911K wps
[Epoch 26 Batch 960/1540] avg loss 0.00246638, throughput 2.80859K wps
[Epoch 26 Batch 990/1540] avg loss 0.00240543, throughput 2.84409K wps
[Epoch 26 Batch 1020/1540] avg loss 0.00279408, throughput 2.79943K wps
[Epoch 26 Batch 1050/1540] avg loss 0.00241898, throughput 2.78654K wps
[Epoch 26 Batch 1080/1540] avg loss 0.00235875, throughput 2.81983K wps
[Epoch 26 Batch 1110/1540] avg loss 0.00239126, throughput 2.83673K wps
[Epoch 26 Batch 1140/1540] avg loss 0.00230044, throughput 2.84613K wps
[Epoch 26 Batch 1170/1540] avg loss 0.00306088, throughput 2.84493K wps
[Epoch 26 Batch 1200/1540] avg loss 0.00251507, throughput 2.80587K wps
[Epoch 26 Batch 1230/1540] avg loss 0.00255382, throughput 2.79028K wps
[Epoch 26 Batch 1260/1540] avg loss 0.00287032, throughput 2.774K wps
[Epoch 26 Batch 1290/1540] avg loss 0.00291206, throughput 2.78349K wps
[Epoch 26 Batch 1320/1540] avg loss 0.00288007, throughput 2.83555K wps
[Epoch 26 Batch 1350/1540] avg loss 0.00274449, throughput 2.83077K wps
[Epoch 26 Batch 1380/1540] avg loss 0.00264041, throughput 2.75835K wps
[Epoch 26 Batch 1410/1540] avg loss 0.00282926, throughput 2.75986K wps
[Epoch 26 Batch 1440/1540] avg loss 0.00238688, throughput 2.79934K wps
[Epoch 26 Batch 1470/1540] avg loss 0.00250295, throughput 2.83576K wps
[Epoch 26 Batch 1500/1540] avg loss 0.00262591, throughput 2.81011K wps
[Epoch 26 Batch 1530/1540] avg loss 0.00236542, throughput 2.76027K wps
Begin Testing...
[Epoch 26] train avg loss 0.00250858, dev acc 0.8291, dev avg loss 0.48273, throughput 2.81839K wps
[Epoch 27 Batch 30/1540] avg loss 0.00251709, throughput 2.89826K wps
[Epoch 27 Batch 60/1540] avg loss 0.00225112, throughput 2.83306K wps
[Epoch 27 Batch 90/1540] avg loss 0.00238731, throughput 2.79262K wps
[Epoch 27 Batch 120/1540] avg loss 0.002469, throughput 2.81759K wps
[Epoch 27 Batch 150/1540] avg loss 0.00235461, throughput 2.84761K wps
[Epoch 27 Batch 180/1540] avg loss 0.00212135, throughput 2.83577K wps
[Epoch 27 Batch 210/1540] avg loss 0.00262046, throughput 2.84158K wps
[Epoch 27 Batch 240/1540] avg loss 0.00237199, throughput 2.84617K wps
[Epoch 27 Batch 270/1540] avg loss 0.00238529, throughput 2.77451K wps
[Epoch 27 Batch 300/1540] avg loss 0.0024338, throughput 2.84555K wps
[Epoch 27 Batch 330/1540] avg loss 0.00231718, throughput 2.83153K wps
[Epoch 27 Batch 360/1540] avg loss 0.00236505, throughput 2.81463K wps
[Epoch 27 Batch 390/1540] avg loss 0.00282356, throughput 2.84251K wps
[Epoch 27 Batch 420/1540] avg loss 0.00243458, throughput 2.82004K wps
[Epoch 27 Batch 450/1540] avg loss 0.00262797, throughput 2.8454K wps
[Epoch 27 Batch 480/1540] avg loss 0.00238448, throughput 2.77942K wps
[Epoch 27 Batch 510/1540] avg loss 0.00255772, throughput 2.75544K wps
[Epoch 27 Batch 540/1540] avg loss 0.00256573, throughput 2.75865K wps
[Epoch 27 Batch 570/1540] avg loss 0.00211865, throughput 2.81525K wps
[Epoch 27 Batch 600/1540] avg loss 0.00242537, throughput 2.79383K wps
[Epoch 27 Batch 630/1540] avg loss 0.00247113, throughput 2.75219K wps
[Epoch 27 Batch 660/1540] avg loss 0.00207383, throughput 2.79148K wps
[Epoch 27 Batch 690/1540] avg loss 0.00215255, throughput 2.84127K wps
[Epoch 27 Batch 720/1540] avg loss 0.00262097, throughput 2.82631K wps
[Epoch 27 Batch 750/1540] avg loss 0.00253934, throughput 2.81692K wps
[Epoch 27 Batch 780/1540] avg loss 0.00275775, throughput 2.83992K wps
[Epoch 27 Batch 810/1540] avg loss 0.002506, throughput 2.76372K wps
[Epoch 27 Batch 840/1540] avg loss 0.00248134, throughput 2.81931K wps
[Epoch 27 Batch 870/1540] avg loss 0.00278656, throughput 2.84766K wps
[Epoch 27 Batch 900/1540] avg loss 0.00253623, throughput 2.83916K wps
[Epoch 27 Batch 930/1540] avg loss 0.00254971, throughput 2.79661K wps
[Epoch 27 Batch 960/1540] avg loss 0.0028248, throughput 2.84483K wps
[Epoch 27 Batch 990/1540] avg loss 0.00238771, throughput 2.84054K wps
[Epoch 27 Batch 1020/1540] avg loss 0.00244567, throughput 2.84666K wps
[Epoch 27 Batch 1050/1540] avg loss 0.0022713, throughput 2.7916K wps
[Epoch 27 Batch 1080/1540] avg loss 0.00230763, throughput 2.74859K wps
[Epoch 27 Batch 1110/1540] avg loss 0.00233812, throughput 2.78227K wps
[Epoch 27 Batch 1140/1540] avg loss 0.0026285, throughput 2.76044K wps
[Epoch 27 Batch 1170/1540] avg loss 0.00252527, throughput 2.84434K wps
[Epoch 27 Batch 1200/1540] avg loss 0.00218549, throughput 2.76477K wps
[Epoch 27 Batch 1230/1540] avg loss 0.00225222, throughput 2.81121K wps
[Epoch 27 Batch 1260/1540] avg loss 0.00259294, throughput 2.75769K wps
[Epoch 27 Batch 1290/1540] avg loss 0.00244737, throughput 2.81198K wps
[Epoch 27 Batch 1320/1540] avg loss 0.00225749, throughput 2.82784K wps
[Epoch 27 Batch 1350/1540] avg loss 0.00303857, throughput 2.75365K wps
[Epoch 27 Batch 1380/1540] avg loss 0.00252588, throughput 2.82576K wps
[Epoch 27 Batch 1410/1540] avg loss 0.0026263, throughput 2.75404K wps
[Epoch 27 Batch 1440/1540] avg loss 0.00282669, throughput 2.80825K wps
[Epoch 27 Batch 1470/1540] avg loss 0.00245709, throughput 2.80648K wps
[Epoch 27 Batch 1500/1540] avg loss 0.00269945, throughput 2.80946K wps
[Epoch 27 Batch 1530/1540] avg loss 0.00211008, throughput 2.75272K wps
Begin Testing...
[Epoch 27] train avg loss 0.0024689, dev acc 0.8234, dev avg loss 0.501765, throughput 2.80841K wps
[Epoch 28 Batch 30/1540] avg loss 0.00248777, throughput 2.82293K wps
[Epoch 28 Batch 60/1540] avg loss 0.00233514, throughput 2.84319K wps
[Epoch 28 Batch 90/1540] avg loss 0.00256196, throughput 2.83108K wps
[Epoch 28 Batch 120/1540] avg loss 0.00216628, throughput 2.79025K wps
[Epoch 28 Batch 150/1540] avg loss 0.00232645, throughput 2.7695K wps
[Epoch 28 Batch 180/1540] avg loss 0.00230701, throughput 2.82426K wps
[Epoch 28 Batch 210/1540] avg loss 0.00234777, throughput 2.79739K wps
[Epoch 28 Batch 240/1540] avg loss 0.0024355, throughput 2.76431K wps
[Epoch 28 Batch 270/1540] avg loss 0.00185195, throughput 2.78381K wps
[Epoch 28 Batch 300/1540] avg loss 0.0023317, throughput 2.84381K wps
[Epoch 28 Batch 330/1540] avg loss 0.00223914, throughput 2.83955K wps
[Epoch 28 Batch 360/1540] avg loss 0.0020751, throughput 2.83364K wps
[Epoch 28 Batch 390/1540] avg loss 0.00268217, throughput 2.76546K wps
[Epoch 28 Batch 420/1540] avg loss 0.00247023, throughput 2.77362K wps
[Epoch 28 Batch 450/1540] avg loss 0.00253983, throughput 2.81341K wps
[Epoch 28 Batch 480/1540] avg loss 0.00217791, throughput 2.82267K wps
[Epoch 28 Batch 510/1540] avg loss 0.00234461, throughput 2.79135K wps
[Epoch 28 Batch 540/1540] avg loss 0.00236176, throughput 2.83961K wps
[Epoch 28 Batch 570/1540] avg loss 0.00247637, throughput 2.79279K wps
[Epoch 28 Batch 600/1540] avg loss 0.00272412, throughput 2.8379K wps
[Epoch 28 Batch 630/1540] avg loss 0.00226355, throughput 2.82166K wps
[Epoch 28 Batch 660/1540] avg loss 0.00260962, throughput 2.78637K wps
[Epoch 28 Batch 690/1540] avg loss 0.00278607, throughput 2.84481K wps
[Epoch 28 Batch 720/1540] avg loss 0.0025293, throughput 2.8472K wps
[Epoch 28 Batch 750/1540] avg loss 0.00269563, throughput 2.76699K wps
[Epoch 28 Batch 780/1540] avg loss 0.00245155, throughput 2.83581K wps
[Epoch 28 Batch 810/1540] avg loss 0.00245555, throughput 2.76013K wps
[Epoch 28 Batch 840/1540] avg loss 0.00238099, throughput 2.75626K wps
[Epoch 28 Batch 870/1540] avg loss 0.00225821, throughput 2.8215K wps
[Epoch 28 Batch 900/1540] avg loss 0.00226572, throughput 2.84573K wps
[Epoch 28 Batch 930/1540] avg loss 0.00256207, throughput 2.80665K wps
[Epoch 28 Batch 960/1540] avg loss 0.00226695, throughput 2.80017K wps
[Epoch 28 Batch 990/1540] avg loss 0.00257276, throughput 2.78633K wps
[Epoch 28 Batch 1020/1540] avg loss 0.00238254, throughput 2.77445K wps
[Epoch 28 Batch 1050/1540] avg loss 0.00240136, throughput 2.78521K wps
[Epoch 28 Batch 1080/1540] avg loss 0.00234418, throughput 2.84511K wps
[Epoch 28 Batch 1110/1540] avg loss 0.00264258, throughput 2.83834K wps
[Epoch 28 Batch 1140/1540] avg loss 0.0023383, throughput 2.77383K wps
[Epoch 28 Batch 1170/1540] avg loss 0.00251055, throughput 2.83844K wps
[Epoch 28 Batch 1200/1540] avg loss 0.00218791, throughput 2.83701K wps
[Epoch 28 Batch 1230/1540] avg loss 0.00240196, throughput 2.83477K wps
[Epoch 28 Batch 1260/1540] avg loss 0.00250204, throughput 2.79798K wps
[Epoch 28 Batch 1290/1540] avg loss 0.00231728, throughput 2.84557K wps
[Epoch 28 Batch 1320/1540] avg loss 0.00243519, throughput 2.81192K wps
[Epoch 28 Batch 1350/1540] avg loss 0.00253346, throughput 2.77665K wps
[Epoch 28 Batch 1380/1540] avg loss 0.00238109, throughput 2.8388K wps
[Epoch 28 Batch 1410/1540] avg loss 0.00254549, throughput 2.79613K wps
[Epoch 28 Batch 1440/1540] avg loss 0.00279034, throughput 2.75187K wps
[Epoch 28 Batch 1470/1540] avg loss 0.00246373, throughput 2.76242K wps
[Epoch 28 Batch 1500/1540] avg loss 0.0025537, throughput 2.84522K wps
[Epoch 28 Batch 1530/1540] avg loss 0.00247711, throughput 2.79575K wps
Begin Testing...
[Epoch 28] train avg loss 0.00241982, dev acc 0.8314, dev avg loss 0.504491, throughput 2.80795K wps
[Epoch 29 Batch 30/1540] avg loss 0.00220972, throughput 2.82922K wps
[Epoch 29 Batch 60/1540] avg loss 0.00194123, throughput 2.81838K wps
[Epoch 29 Batch 90/1540] avg loss 0.00201429, throughput 2.80936K wps
[Epoch 29 Batch 120/1540] avg loss 0.00243545, throughput 2.80531K wps
[Epoch 29 Batch 150/1540] avg loss 0.00199602, throughput 2.84117K wps
[Epoch 29 Batch 180/1540] avg loss 0.00234776, throughput 2.81507K wps
[Epoch 29 Batch 210/1540] avg loss 0.00254796, throughput 2.76497K wps
[Epoch 29 Batch 240/1540] avg loss 0.002515, throughput 2.76467K wps
[Epoch 29 Batch 270/1540] avg loss 0.00201389, throughput 2.81913K wps
[Epoch 29 Batch 300/1540] avg loss 0.00213519, throughput 2.8272K wps
[Epoch 29 Batch 330/1540] avg loss 0.00225861, throughput 2.80784K wps
[Epoch 29 Batch 360/1540] avg loss 0.00215235, throughput 2.78245K wps
[Epoch 29 Batch 390/1540] avg loss 0.0025099, throughput 2.76501K wps
[Epoch 29 Batch 420/1540] avg loss 0.00234727, throughput 2.84014K wps
[Epoch 29 Batch 450/1540] avg loss 0.00241067, throughput 2.84744K wps
[Epoch 29 Batch 480/1540] avg loss 0.00217088, throughput 2.84702K wps
[Epoch 29 Batch 510/1540] avg loss 0.00256412, throughput 2.84843K wps
[Epoch 29 Batch 540/1540] avg loss 0.00238971, throughput 2.84554K wps
[Epoch 29 Batch 570/1540] avg loss 0.00251107, throughput 2.76335K wps
[Epoch 29 Batch 600/1540] avg loss 0.00238634, throughput 2.82807K wps
[Epoch 29 Batch 630/1540] avg loss 0.00247346, throughput 2.84019K wps
[Epoch 29 Batch 660/1540] avg loss 0.00243482, throughput 2.84458K wps
[Epoch 29 Batch 690/1540] avg loss 0.00258458, throughput 2.8344K wps
[Epoch 29 Batch 720/1540] avg loss 0.00219688, throughput 2.84594K wps
[Epoch 29 Batch 750/1540] avg loss 0.00264835, throughput 2.83092K wps
[Epoch 29 Batch 780/1540] avg loss 0.00259549, throughput 2.84266K wps
[Epoch 29 Batch 810/1540] avg loss 0.00226447, throughput 2.77308K wps
[Epoch 29 Batch 840/1540] avg loss 0.00208759, throughput 2.83974K wps
[Epoch 29 Batch 870/1540] avg loss 0.00258097, throughput 2.79537K wps
[Epoch 29 Batch 900/1540] avg loss 0.00271768, throughput 2.83883K wps
[Epoch 29 Batch 930/1540] avg loss 0.00231667, throughput 2.78303K wps
[Epoch 29 Batch 960/1540] avg loss 0.00247163, throughput 2.75891K wps
[Epoch 29 Batch 990/1540] avg loss 0.00247567, throughput 2.82717K wps
[Epoch 29 Batch 1020/1540] avg loss 0.00175011, throughput 2.84937K wps
[Epoch 29 Batch 1050/1540] avg loss 0.00230877, throughput 2.83724K wps
[Epoch 29 Batch 1080/1540] avg loss 0.00228362, throughput 2.82165K wps
[Epoch 29 Batch 1110/1540] avg loss 0.00262033, throughput 2.77123K wps
[Epoch 29 Batch 1140/1540] avg loss 0.00237632, throughput 2.83606K wps
[Epoch 29 Batch 1170/1540] avg loss 0.00235447, throughput 2.84508K wps
[Epoch 29 Batch 1200/1540] avg loss 0.00220494, throughput 2.82951K wps
[Epoch 29 Batch 1230/1540] avg loss 0.0023085, throughput 2.76053K wps
[Epoch 29 Batch 1260/1540] avg loss 0.00233883, throughput 2.80086K wps
[Epoch 29 Batch 1290/1540] avg loss 0.00246959, throughput 2.80884K wps
[Epoch 29 Batch 1320/1540] avg loss 0.00228024, throughput 2.80487K wps
[Epoch 29 Batch 1350/1540] avg loss 0.00239246, throughput 2.84056K wps
[Epoch 29 Batch 1380/1540] avg loss 0.00228853, throughput 2.84681K wps
[Epoch 29 Batch 1410/1540] avg loss 0.00196983, throughput 2.84665K wps
[Epoch 29 Batch 1440/1540] avg loss 0.00230802, throughput 2.84484K wps
[Epoch 29 Batch 1470/1540] avg loss 0.00258167, throughput 2.8445K wps
[Epoch 29 Batch 1500/1540] avg loss 0.00262131, throughput 2.84587K wps
[Epoch 29 Batch 1530/1540] avg loss 0.00233425, throughput 2.8087K wps
Begin Testing...
[Epoch 29] train avg loss 0.00234486, dev acc 0.8303, dev avg loss 0.506551, throughput 2.81841K wps
[Epoch 30 Batch 30/1540] avg loss 0.00234406, throughput 2.84276K wps
[Epoch 30 Batch 60/1540] avg loss 0.00210107, throughput 2.83107K wps
[Epoch 30 Batch 90/1540] avg loss 0.00203821, throughput 2.79238K wps
[Epoch 30 Batch 120/1540] avg loss 0.00235609, throughput 2.83591K wps
[Epoch 30 Batch 150/1540] avg loss 0.00261074, throughput 2.84313K wps
[Epoch 30 Batch 180/1540] avg loss 0.00211441, throughput 2.74602K wps
[Epoch 30 Batch 210/1540] avg loss 0.00218147, throughput 2.82835K wps
[Epoch 30 Batch 240/1540] avg loss 0.00178625, throughput 2.83371K wps
[Epoch 30 Batch 270/1540] avg loss 0.00223586, throughput 2.84507K wps
[Epoch 30 Batch 300/1540] avg loss 0.00200498, throughput 2.8468K wps
[Epoch 30 Batch 330/1540] avg loss 0.00241427, throughput 2.84558K wps
[Epoch 30 Batch 360/1540] avg loss 0.00238687, throughput 2.78718K wps
[Epoch 30 Batch 390/1540] avg loss 0.00222429, throughput 2.76261K wps
[Epoch 30 Batch 420/1540] avg loss 0.00242043, throughput 2.81594K wps
[Epoch 30 Batch 450/1540] avg loss 0.00227107, throughput 2.83445K wps
[Epoch 30 Batch 480/1540] avg loss 0.00247188, throughput 2.83824K wps
[Epoch 30 Batch 510/1540] avg loss 0.00236978, throughput 2.84516K wps
[Epoch 30 Batch 540/1540] avg loss 0.00235304, throughput 2.84424K wps
[Epoch 30 Batch 570/1540] avg loss 0.00235426, throughput 2.84879K wps
[Epoch 30 Batch 600/1540] avg loss 0.00213518, throughput 2.82207K wps
[Epoch 30 Batch 630/1540] avg loss 0.00199712, throughput 2.76414K wps
[Epoch 30 Batch 660/1540] avg loss 0.00234557, throughput 2.80251K wps
[Epoch 30 Batch 690/1540] avg loss 0.00225466, throughput 2.82459K wps
[Epoch 30 Batch 720/1540] avg loss 0.00206943, throughput 2.83375K wps
[Epoch 30 Batch 750/1540] avg loss 0.00224855, throughput 2.82407K wps
[Epoch 30 Batch 780/1540] avg loss 0.00214796, throughput 2.84554K wps
[Epoch 30 Batch 810/1540] avg loss 0.0025113, throughput 2.84423K wps
[Epoch 30 Batch 840/1540] avg loss 0.00225774, throughput 2.84461K wps
[Epoch 30 Batch 870/1540] avg loss 0.00229273, throughput 2.83862K wps
[Epoch 30 Batch 900/1540] avg loss 0.00251882, throughput 2.83349K wps
[Epoch 30 Batch 930/1540] avg loss 0.00249651, throughput 2.75846K wps
[Epoch 30 Batch 960/1540] avg loss 0.00215172, throughput 2.75098K wps
[Epoch 30 Batch 990/1540] avg loss 0.00259097, throughput 2.76305K wps
[Epoch 30 Batch 1020/1540] avg loss 0.00239313, throughput 2.78541K wps
[Epoch 30 Batch 1050/1540] avg loss 0.00245391, throughput 2.82193K wps
[Epoch 30 Batch 1080/1540] avg loss 0.00213404, throughput 2.79472K wps
[Epoch 30 Batch 1110/1540] avg loss 0.00214298, throughput 2.77547K wps
[Epoch 30 Batch 1140/1540] avg loss 0.00231276, throughput 2.75825K wps
[Epoch 30 Batch 1170/1540] avg loss 0.00252334, throughput 2.7704K wps
[Epoch 30 Batch 1200/1540] avg loss 0.00246191, throughput 2.84661K wps
[Epoch 30 Batch 1230/1540] avg loss 0.00242574, throughput 2.83075K wps
[Epoch 30 Batch 1260/1540] avg loss 0.00274792, throughput 2.84622K wps
[Epoch 30 Batch 1290/1540] avg loss 0.00213783, throughput 2.84104K wps
[Epoch 30 Batch 1320/1540] avg loss 0.00265359, throughput 2.83375K wps
[Epoch 30 Batch 1350/1540] avg loss 0.00229219, throughput 2.77961K wps
[Epoch 30 Batch 1380/1540] avg loss 0.0022685, throughput 2.76397K wps
[Epoch 30 Batch 1410/1540] avg loss 0.00233188, throughput 2.8483K wps
[Epoch 30 Batch 1440/1540] avg loss 0.0021793, throughput 2.84807K wps
[Epoch 30 Batch 1470/1540] avg loss 0.00252942, throughput 2.84378K wps
[Epoch 30 Batch 1500/1540] avg loss 0.00216974, throughput 2.83703K wps
[Epoch 30 Batch 1530/1540] avg loss 0.00222304, throughput 2.84656K wps
Begin Testing...
[Epoch 30] train avg loss 0.00230467, dev acc 0.8303, dev avg loss 0.506473, throughput 2.81716K wps
[Epoch 31 Batch 30/1540] avg loss 0.00214647, throughput 2.84443K wps
[Epoch 31 Batch 60/1540] avg loss 0.00202009, throughput 2.84389K wps
[Epoch 31 Batch 90/1540] avg loss 0.00189426, throughput 2.76398K wps
[Epoch 31 Batch 120/1540] avg loss 0.00207496, throughput 2.82853K wps
[Epoch 31 Batch 150/1540] avg loss 0.00216786, throughput 2.77399K wps
[Epoch 31 Batch 180/1540] avg loss 0.00194086, throughput 2.83078K wps
[Epoch 31 Batch 210/1540] avg loss 0.00200222, throughput 2.84121K wps
[Epoch 31 Batch 240/1540] avg loss 0.00226906, throughput 2.82259K wps
[Epoch 31 Batch 270/1540] avg loss 0.00221628, throughput 2.80671K wps
[Epoch 31 Batch 300/1540] avg loss 0.00225601, throughput 2.78243K wps
[Epoch 31 Batch 330/1540] avg loss 0.00240355, throughput 2.79924K wps
[Epoch 31 Batch 360/1540] avg loss 0.00220498, throughput 2.84552K wps
[Epoch 31 Batch 390/1540] avg loss 0.0020314, throughput 2.8437K wps
[Epoch 31 Batch 420/1540] avg loss 0.00213647, throughput 2.83427K wps
[Epoch 31 Batch 450/1540] avg loss 0.002054, throughput 2.74937K wps
[Epoch 31 Batch 480/1540] avg loss 0.00210301, throughput 2.79779K wps
[Epoch 31 Batch 510/1540] avg loss 0.00201497, throughput 2.80652K wps
[Epoch 31 Batch 540/1540] avg loss 0.00241116, throughput 2.76257K wps
[Epoch 31 Batch 570/1540] avg loss 0.00217715, throughput 2.76115K wps
[Epoch 31 Batch 600/1540] avg loss 0.00233754, throughput 2.76158K wps
[Epoch 31 Batch 630/1540] avg loss 0.00244983, throughput 2.75891K wps
[Epoch 31 Batch 660/1540] avg loss 0.00229733, throughput 2.76268K wps
[Epoch 31 Batch 690/1540] avg loss 0.00235847, throughput 2.83174K wps
[Epoch 31 Batch 720/1540] avg loss 0.00245073, throughput 2.83391K wps
[Epoch 31 Batch 750/1540] avg loss 0.0022165, throughput 2.8476K wps
[Epoch 31 Batch 780/1540] avg loss 0.00262221, throughput 2.84222K wps
[Epoch 31 Batch 810/1540] avg loss 0.00209679, throughput 2.83617K wps
[Epoch 31 Batch 840/1540] avg loss 0.00207629, throughput 2.82624K wps
[Epoch 31 Batch 870/1540] avg loss 0.00226369, throughput 2.80692K wps
[Epoch 31 Batch 900/1540] avg loss 0.00224028, throughput 2.78281K wps
[Epoch 31 Batch 930/1540] avg loss 0.00220911, throughput 2.7704K wps
[Epoch 31 Batch 960/1540] avg loss 0.00229715, throughput 2.84928K wps
[Epoch 31 Batch 990/1540] avg loss 0.00206049, throughput 2.82469K wps
[Epoch 31 Batch 1020/1540] avg loss 0.00244146, throughput 2.84806K wps
[Epoch 31 Batch 1050/1540] avg loss 0.00200877, throughput 2.84016K wps
[Epoch 31 Batch 1080/1540] avg loss 0.00221657, throughput 2.81711K wps
[Epoch 31 Batch 1110/1540] avg loss 0.0020603, throughput 2.84592K wps
[Epoch 31 Batch 1140/1540] avg loss 0.00260213, throughput 2.828K wps
[Epoch 31 Batch 1170/1540] avg loss 0.00242844, throughput 2.80684K wps
[Epoch 31 Batch 1200/1540] avg loss 0.00229708, throughput 2.84492K wps
[Epoch 31 Batch 1230/1540] avg loss 0.00250528, throughput 2.84788K wps
[Epoch 31 Batch 1260/1540] avg loss 0.00210089, throughput 2.7922K wps
[Epoch 31 Batch 1290/1540] avg loss 0.00227086, throughput 2.85059K wps
[Epoch 31 Batch 1320/1540] avg loss 0.00209592, throughput 2.84223K wps
[Epoch 31 Batch 1350/1540] avg loss 0.00239978, throughput 2.8398K wps
[Epoch 31 Batch 1380/1540] avg loss 0.00238362, throughput 2.84641K wps
[Epoch 31 Batch 1410/1540] avg loss 0.00240514, throughput 2.84643K wps
[Epoch 31 Batch 1440/1540] avg loss 0.00270325, throughput 2.81382K wps
[Epoch 31 Batch 1470/1540] avg loss 0.00232762, throughput 2.81021K wps
[Epoch 31 Batch 1500/1540] avg loss 0.00256414, throughput 2.81706K wps
[Epoch 31 Batch 1530/1540] avg loss 0.00221956, throughput 2.77407K wps
Begin Testing...
[Epoch 31] train avg loss 0.0022406, dev acc 0.8268, dev avg loss 0.521969, throughput 2.81532K wps
[Epoch 32 Batch 30/1540] avg loss 0.00193458, throughput 2.83659K wps
[Epoch 32 Batch 60/1540] avg loss 0.00184664, throughput 2.84735K wps
[Epoch 32 Batch 90/1540] avg loss 0.00200394, throughput 2.81596K wps
[Epoch 32 Batch 120/1540] avg loss 0.00239238, throughput 2.84984K wps
[Epoch 32 Batch 150/1540] avg loss 0.00184872, throughput 2.84456K wps
[Epoch 32 Batch 180/1540] avg loss 0.00207099, throughput 2.77482K wps
[Epoch 32 Batch 210/1540] avg loss 0.00219272, throughput 2.79664K wps
[Epoch 32 Batch 240/1540] avg loss 0.00227045, throughput 2.84228K wps
[Epoch 32 Batch 270/1540] avg loss 0.00241065, throughput 2.84521K wps
[Epoch 32 Batch 300/1540] avg loss 0.00215428, throughput 2.83752K wps
[Epoch 32 Batch 330/1540] avg loss 0.00215163, throughput 2.84949K wps
[Epoch 32 Batch 360/1540] avg loss 0.00193622, throughput 2.84991K wps
[Epoch 32 Batch 390/1540] avg loss 0.0019368, throughput 2.84948K wps
[Epoch 32 Batch 420/1540] avg loss 0.0020355, throughput 2.80627K wps
[Epoch 32 Batch 450/1540] avg loss 0.00194791, throughput 2.78058K wps
[Epoch 32 Batch 480/1540] avg loss 0.00207699, throughput 2.81227K wps
[Epoch 32 Batch 510/1540] avg loss 0.00221648, throughput 2.84732K wps
[Epoch 32 Batch 540/1540] avg loss 0.00202153, throughput 2.76131K wps
[Epoch 32 Batch 570/1540] avg loss 0.00237878, throughput 2.75292K wps
[Epoch 32 Batch 600/1540] avg loss 0.0022959, throughput 2.75792K wps
[Epoch 32 Batch 630/1540] avg loss 0.00201846, throughput 2.83014K wps
[Epoch 32 Batch 660/1540] avg loss 0.0021708, throughput 2.83691K wps
[Epoch 32 Batch 690/1540] avg loss 0.00251282, throughput 2.84421K wps
[Epoch 32 Batch 720/1540] avg loss 0.0023498, throughput 2.84639K wps
[Epoch 32 Batch 750/1540] avg loss 0.00185292, throughput 2.84528K wps
[Epoch 32 Batch 780/1540] avg loss 0.00216047, throughput 2.84161K wps
[Epoch 32 Batch 810/1540] avg loss 0.00213404, throughput 2.76485K wps
[Epoch 32 Batch 840/1540] avg loss 0.00210905, throughput 2.83551K wps
[Epoch 32 Batch 870/1540] avg loss 0.00241869, throughput 2.84823K wps
[Epoch 32 Batch 900/1540] avg loss 0.00211175, throughput 2.84723K wps
[Epoch 32 Batch 930/1540] avg loss 0.00257522, throughput 2.76692K wps
[Epoch 32 Batch 960/1540] avg loss 0.00222797, throughput 2.78857K wps
[Epoch 32 Batch 990/1540] avg loss 0.00249203, throughput 2.84811K wps
[Epoch 32 Batch 1020/1540] avg loss 0.00250777, throughput 2.84268K wps
[Epoch 32 Batch 1050/1540] avg loss 0.00189881, throughput 2.85035K wps
[Epoch 32 Batch 1080/1540] avg loss 0.00201707, throughput 2.80427K wps
[Epoch 32 Batch 1110/1540] avg loss 0.00195933, throughput 2.78125K wps
[Epoch 32 Batch 1140/1540] avg loss 0.00243836, throughput 2.75678K wps
[Epoch 32 Batch 1170/1540] avg loss 0.00217932, throughput 2.83756K wps
[Epoch 32 Batch 1200/1540] avg loss 0.00224328, throughput 2.8261K wps
[Epoch 32 Batch 1230/1540] avg loss 0.00212833, throughput 2.81896K wps
[Epoch 32 Batch 1260/1540] avg loss 0.00271038, throughput 2.84101K wps
[Epoch 32 Batch 1290/1540] avg loss 0.00228401, throughput 2.8413K wps
[Epoch 32 Batch 1320/1540] avg loss 0.00222, throughput 2.83935K wps
[Epoch 32 Batch 1350/1540] avg loss 0.00237554, throughput 2.8483K wps
[Epoch 32 Batch 1380/1540] avg loss 0.00222472, throughput 2.84641K wps
[Epoch 32 Batch 1410/1540] avg loss 0.00228346, throughput 2.84743K wps
[Epoch 32 Batch 1440/1540] avg loss 0.00217388, throughput 2.82721K wps
[Epoch 32 Batch 1470/1540] avg loss 0.00230318, throughput 2.81231K wps
[Epoch 32 Batch 1500/1540] avg loss 0.00241836, throughput 2.81957K wps
[Epoch 32 Batch 1530/1540] avg loss 0.0022803, throughput 2.83998K wps
Begin Testing...
[Epoch 32] train avg loss 0.00219624, dev acc 0.8280, dev avg loss 0.522943, throughput 2.82256K wps
[Epoch 33 Batch 30/1540] avg loss 0.00247531, throughput 2.88136K wps
[Epoch 33 Batch 60/1540] avg loss 0.00220865, throughput 2.80376K wps
[Epoch 33 Batch 90/1540] avg loss 0.00223303, throughput 2.77633K wps
[Epoch 33 Batch 120/1540] avg loss 0.00228469, throughput 2.85053K wps
[Epoch 33 Batch 150/1540] avg loss 0.00207245, throughput 2.84731K wps
[Epoch 33 Batch 180/1540] avg loss 0.00194527, throughput 2.843K wps
[Epoch 33 Batch 210/1540] avg loss 0.00202045, throughput 2.83826K wps
[Epoch 33 Batch 240/1540] avg loss 0.00208328, throughput 2.77362K wps
[Epoch 33 Batch 270/1540] avg loss 0.00221033, throughput 2.8516K wps
[Epoch 33 Batch 300/1540] avg loss 0.00215807, throughput 2.84882K wps
[Epoch 33 Batch 330/1540] avg loss 0.00200212, throughput 2.76693K wps
[Epoch 33 Batch 360/1540] avg loss 0.00201484, throughput 2.78907K wps
[Epoch 33 Batch 390/1540] avg loss 0.00210145, throughput 2.79843K wps
[Epoch 33 Batch 420/1540] avg loss 0.00237557, throughput 2.84846K wps
[Epoch 33 Batch 450/1540] avg loss 0.00228238, throughput 2.84587K wps
[Epoch 33 Batch 480/1540] avg loss 0.00224261, throughput 2.83939K wps
[Epoch 33 Batch 510/1540] avg loss 0.00189367, throughput 2.76426K wps
[Epoch 33 Batch 540/1540] avg loss 0.00204162, throughput 2.764K wps
[Epoch 33 Batch 570/1540] avg loss 0.00206594, throughput 2.84234K wps
[Epoch 33 Batch 600/1540] avg loss 0.00219562, throughput 2.79149K wps
[Epoch 33 Batch 630/1540] avg loss 0.00196933, throughput 2.76776K wps
[Epoch 33 Batch 660/1540] avg loss 0.0021358, throughput 2.75605K wps
[Epoch 33 Batch 690/1540] avg loss 0.00221527, throughput 2.77081K wps
[Epoch 33 Batch 720/1540] avg loss 0.00245959, throughput 2.85254K wps
[Epoch 33 Batch 750/1540] avg loss 0.00213415, throughput 2.84992K wps
[Epoch 33 Batch 780/1540] avg loss 0.00225854, throughput 2.84685K wps
[Epoch 33 Batch 810/1540] avg loss 0.00225874, throughput 2.84431K wps
[Epoch 33 Batch 840/1540] avg loss 0.00224966, throughput 2.85115K wps
[Epoch 33 Batch 870/1540] avg loss 0.00209259, throughput 2.82651K wps
[Epoch 33 Batch 900/1540] avg loss 0.00210768, throughput 2.84557K wps
[Epoch 33 Batch 930/1540] avg loss 0.00222564, throughput 2.85169K wps
[Epoch 33 Batch 960/1540] avg loss 0.00207568, throughput 2.85345K wps
[Epoch 33 Batch 990/1540] avg loss 0.0024798, throughput 2.84959K wps
[Epoch 33 Batch 1020/1540] avg loss 0.00224239, throughput 2.84776K wps
[Epoch 33 Batch 1050/1540] avg loss 0.00223689, throughput 2.82209K wps
[Epoch 33 Batch 1080/1540] avg loss 0.00233426, throughput 2.76235K wps
[Epoch 33 Batch 1110/1540] avg loss 0.00224778, throughput 2.77012K wps
[Epoch 33 Batch 1140/1540] avg loss 0.00222959, throughput 2.8494K wps
[Epoch 33 Batch 1170/1540] avg loss 0.00214001, throughput 2.8475K wps
[Epoch 33 Batch 1200/1540] avg loss 0.00224428, throughput 2.84824K wps
[Epoch 33 Batch 1230/1540] avg loss 0.00201527, throughput 2.83769K wps
[Epoch 33 Batch 1260/1540] avg loss 0.0023347, throughput 2.80788K wps
[Epoch 33 Batch 1290/1540] avg loss 0.00236442, throughput 2.8185K wps
[Epoch 33 Batch 1320/1540] avg loss 0.00238928, throughput 2.85032K wps
[Epoch 33 Batch 1350/1540] avg loss 0.00223027, throughput 2.82626K wps
[Epoch 33 Batch 1380/1540] avg loss 0.00227162, throughput 2.83826K wps
[Epoch 33 Batch 1410/1540] avg loss 0.00197495, throughput 2.83739K wps
[Epoch 33 Batch 1440/1540] avg loss 0.00222973, throughput 2.82074K wps
[Epoch 33 Batch 1470/1540] avg loss 0.00220944, throughput 2.85193K wps
[Epoch 33 Batch 1500/1540] avg loss 0.0018906, throughput 2.78062K wps
[Epoch 33 Batch 1530/1540] avg loss 0.00244601, throughput 2.75646K wps
Begin Testing...
[Epoch 33] train avg loss 0.00219133, dev acc 0.8303, dev avg loss 0.523926, throughput 2.82085K wps
[Epoch 34 Batch 30/1540] avg loss 0.00185257, throughput 2.89914K wps
[Epoch 34 Batch 60/1540] avg loss 0.00200239, throughput 2.81651K wps
[Epoch 34 Batch 90/1540] avg loss 0.00237462, throughput 2.79095K wps
[Epoch 34 Batch 120/1540] avg loss 0.00235732, throughput 2.76242K wps
[Epoch 34 Batch 150/1540] avg loss 0.00214749, throughput 2.74915K wps
[Epoch 34 Batch 180/1540] avg loss 0.00182026, throughput 2.76392K wps
[Epoch 34 Batch 210/1540] avg loss 0.0022652, throughput 2.75813K wps
[Epoch 34 Batch 240/1540] avg loss 0.0019517, throughput 2.81664K wps
[Epoch 34 Batch 270/1540] avg loss 0.00214594, throughput 2.83311K wps
[Epoch 34 Batch 300/1540] avg loss 0.00193297, throughput 2.84848K wps
[Epoch 34 Batch 330/1540] avg loss 0.00197947, throughput 2.79876K wps
[Epoch 34 Batch 360/1540] avg loss 0.00174994, throughput 2.76464K wps
[Epoch 34 Batch 390/1540] avg loss 0.00169107, throughput 2.83613K wps
[Epoch 34 Batch 420/1540] avg loss 0.00191131, throughput 2.84302K wps
[Epoch 34 Batch 450/1540] avg loss 0.00200085, throughput 2.83228K wps
[Epoch 34 Batch 480/1540] avg loss 0.00216505, throughput 2.75619K wps
[Epoch 34 Batch 510/1540] avg loss 0.00238751, throughput 2.83038K wps
[Epoch 34 Batch 540/1540] avg loss 0.00231744, throughput 2.85119K wps
[Epoch 34 Batch 570/1540] avg loss 0.00205776, throughput 2.84998K wps
[Epoch 34 Batch 600/1540] avg loss 0.00204049, throughput 2.84109K wps
[Epoch 34 Batch 630/1540] avg loss 0.00197148, throughput 2.84508K wps
[Epoch 34 Batch 660/1540] avg loss 0.00193723, throughput 2.85066K wps
[Epoch 34 Batch 690/1540] avg loss 0.00210741, throughput 2.84987K wps
[Epoch 34 Batch 720/1540] avg loss 0.00239315, throughput 2.76316K wps
[Epoch 34 Batch 750/1540] avg loss 0.00211983, throughput 2.82717K wps
[Epoch 34 Batch 780/1540] avg loss 0.00222726, throughput 2.79964K wps
[Epoch 34 Batch 810/1540] avg loss 0.00242984, throughput 2.81668K wps
[Epoch 34 Batch 840/1540] avg loss 0.0024128, throughput 2.84679K wps
[Epoch 34 Batch 870/1540] avg loss 0.00208335, throughput 2.84171K wps
[Epoch 34 Batch 900/1540] avg loss 0.00192856, throughput 2.84489K wps
[Epoch 34 Batch 930/1540] avg loss 0.00223089, throughput 2.85374K wps
[Epoch 34 Batch 960/1540] avg loss 0.00197818, throughput 2.82273K wps
[Epoch 34 Batch 990/1540] avg loss 0.00233749, throughput 2.77662K wps
[Epoch 34 Batch 1020/1540] avg loss 0.00218155, throughput 2.82263K wps
[Epoch 34 Batch 1050/1540] avg loss 0.00202698, throughput 2.84438K wps
[Epoch 34 Batch 1080/1540] avg loss 0.0020755, throughput 2.77494K wps
[Epoch 34 Batch 1110/1540] avg loss 0.00212386, throughput 2.84652K wps
[Epoch 34 Batch 1140/1540] avg loss 0.00207808, throughput 2.819K wps
[Epoch 34 Batch 1170/1540] avg loss 0.00225252, throughput 2.85186K wps
[Epoch 34 Batch 1200/1540] avg loss 0.00253419, throughput 2.8372K wps
[Epoch 34 Batch 1230/1540] avg loss 0.00240142, throughput 2.85179K wps
[Epoch 34 Batch 1260/1540] avg loss 0.00222277, throughput 2.84044K wps
[Epoch 34 Batch 1290/1540] avg loss 0.0019883, throughput 2.76863K wps
[Epoch 34 Batch 1320/1540] avg loss 0.00229228, throughput 2.8168K wps
[Epoch 34 Batch 1350/1540] avg loss 0.00224135, throughput 2.81653K wps
[Epoch 34 Batch 1380/1540] avg loss 0.00181813, throughput 2.84558K wps
[Epoch 34 Batch 1410/1540] avg loss 0.00219695, throughput 2.79514K wps
[Epoch 34 Batch 1440/1540] avg loss 0.00210751, throughput 2.76388K wps
[Epoch 34 Batch 1470/1540] avg loss 0.00253195, throughput 2.7682K wps
[Epoch 34 Batch 1500/1540] avg loss 0.00219463, throughput 2.82956K wps
[Epoch 34 Batch 1530/1540] avg loss 0.00233928, throughput 2.83507K wps
Begin Testing...
[Epoch 34] train avg loss 0.0021385, dev acc 0.8314, dev avg loss 0.529468, throughput 2.81757K wps
[Epoch 35 Batch 30/1540] avg loss 0.00183097, throughput 2.8321K wps
[Epoch 35 Batch 60/1540] avg loss 0.00203188, throughput 2.79435K wps
[Epoch 35 Batch 90/1540] avg loss 0.0019919, throughput 2.84177K wps
[Epoch 35 Batch 120/1540] avg loss 0.00187176, throughput 2.85045K wps
[Epoch 35 Batch 150/1540] avg loss 0.00215391, throughput 2.78839K wps
[Epoch 35 Batch 180/1540] avg loss 0.00245215, throughput 2.81483K wps
[Epoch 35 Batch 210/1540] avg loss 0.00181785, throughput 2.82285K wps
[Epoch 35 Batch 240/1540] avg loss 0.0019102, throughput 2.84089K wps
[Epoch 35 Batch 270/1540] avg loss 0.00182893, throughput 2.84757K wps
[Epoch 35 Batch 300/1540] avg loss 0.00177837, throughput 2.84636K wps
[Epoch 35 Batch 330/1540] avg loss 0.00228875, throughput 2.84523K wps
[Epoch 35 Batch 360/1540] avg loss 0.00203297, throughput 2.78864K wps
[Epoch 35 Batch 390/1540] avg loss 0.00204948, throughput 2.81981K wps
[Epoch 35 Batch 420/1540] avg loss 0.00183758, throughput 2.83904K wps
[Epoch 35 Batch 450/1540] avg loss 0.00220679, throughput 2.8447K wps
[Epoch 35 Batch 480/1540] avg loss 0.00220373, throughput 2.84863K wps
[Epoch 35 Batch 510/1540] avg loss 0.00189896, throughput 2.79202K wps
[Epoch 35 Batch 540/1540] avg loss 0.00221356, throughput 2.82059K wps
[Epoch 35 Batch 570/1540] avg loss 0.00184909, throughput 2.751K wps
[Epoch 35 Batch 600/1540] avg loss 0.00219891, throughput 2.75895K wps
[Epoch 35 Batch 630/1540] avg loss 0.00187402, throughput 2.83273K wps
[Epoch 35 Batch 660/1540] avg loss 0.00208276, throughput 2.84394K wps
[Epoch 35 Batch 690/1540] avg loss 0.00199035, throughput 2.83083K wps
[Epoch 35 Batch 720/1540] avg loss 0.00195975, throughput 2.75853K wps
[Epoch 35 Batch 750/1540] avg loss 0.00213588, throughput 2.75494K wps
[Epoch 35 Batch 780/1540] avg loss 0.00205943, throughput 2.825K wps
[Epoch 35 Batch 810/1540] avg loss 0.00228114, throughput 2.81348K wps
[Epoch 35 Batch 840/1540] avg loss 0.0022137, throughput 2.78837K wps
[Epoch 35 Batch 870/1540] avg loss 0.00179388, throughput 2.82591K wps
[Epoch 35 Batch 900/1540] avg loss 0.00197603, throughput 2.77771K wps
[Epoch 35 Batch 930/1540] avg loss 0.00202171, throughput 2.80459K wps
[Epoch 35 Batch 960/1540] avg loss 0.00222456, throughput 2.75763K wps
[Epoch 35 Batch 990/1540] avg loss 0.00204271, throughput 2.75656K wps
[Epoch 35 Batch 1020/1540] avg loss 0.00207401, throughput 2.76046K wps
[Epoch 35 Batch 1050/1540] avg loss 0.00212444, throughput 2.7879K wps
[Epoch 35 Batch 1080/1540] avg loss 0.00166034, throughput 2.75227K wps
[Epoch 35 Batch 1110/1540] avg loss 0.00178274, throughput 2.75326K wps
[Epoch 35 Batch 1140/1540] avg loss 0.00213631, throughput 2.816K wps
[Epoch 35 Batch 1170/1540] avg loss 0.00226476, throughput 2.8448K wps
[Epoch 35 Batch 1200/1540] avg loss 0.00218475, throughput 2.84895K wps
[Epoch 35 Batch 1230/1540] avg loss 0.00239684, throughput 2.84567K wps
[Epoch 35 Batch 1260/1540] avg loss 0.00219101, throughput 2.84436K wps
[Epoch 35 Batch 1290/1540] avg loss 0.00217812, throughput 2.84987K wps
[Epoch 35 Batch 1320/1540] avg loss 0.00206654, throughput 2.84743K wps
[Epoch 35 Batch 1350/1540] avg loss 0.00200638, throughput 2.80977K wps
[Epoch 35 Batch 1380/1540] avg loss 0.002184, throughput 2.85141K wps
[Epoch 35 Batch 1410/1540] avg loss 0.00230517, throughput 2.84903K wps
[Epoch 35 Batch 1440/1540] avg loss 0.00168898, throughput 2.78563K wps
[Epoch 35 Batch 1470/1540] avg loss 0.00207873, throughput 2.80437K wps
[Epoch 35 Batch 1500/1540] avg loss 0.00184302, throughput 2.75456K wps
[Epoch 35 Batch 1530/1540] avg loss 0.00234773, throughput 2.76333K wps
Begin Testing...
[Epoch 35] train avg loss 0.00205485, dev acc 0.8360, dev avg loss 0.533749, throughput 2.80995K wps
[Epoch 36 Batch 30/1540] avg loss 0.00201958, throughput 2.84412K wps
[Epoch 36 Batch 60/1540] avg loss 0.0017904, throughput 2.84063K wps
[Epoch 36 Batch 90/1540] avg loss 0.00208785, throughput 2.78971K wps
[Epoch 36 Batch 120/1540] avg loss 0.00192959, throughput 2.84667K wps
[Epoch 36 Batch 150/1540] avg loss 0.00198384, throughput 2.84806K wps
[Epoch 36 Batch 180/1540] avg loss 0.00193435, throughput 2.81533K wps
[Epoch 36 Batch 210/1540] avg loss 0.00195056, throughput 2.82425K wps
[Epoch 36 Batch 240/1540] avg loss 0.00186693, throughput 2.76008K wps
[Epoch 36 Batch 270/1540] avg loss 0.00188217, throughput 2.76984K wps
[Epoch 36 Batch 300/1540] avg loss 0.00197067, throughput 2.79116K wps
[Epoch 36 Batch 330/1540] avg loss 0.00199021, throughput 2.8441K wps
[Epoch 36 Batch 360/1540] avg loss 0.00176994, throughput 2.83365K wps
[Epoch 36 Batch 390/1540] avg loss 0.00227791, throughput 2.7943K wps
[Epoch 36 Batch 420/1540] avg loss 0.00207773, throughput 2.79755K wps
[Epoch 36 Batch 450/1540] avg loss 0.00186614, throughput 2.84873K wps
[Epoch 36 Batch 480/1540] avg loss 0.00205818, throughput 2.79005K wps
[Epoch 36 Batch 510/1540] avg loss 0.00197208, throughput 2.83241K wps
[Epoch 36 Batch 540/1540] avg loss 0.00210111, throughput 2.8485K wps
[Epoch 36 Batch 570/1540] avg loss 0.00183356, throughput 2.83127K wps
[Epoch 36 Batch 600/1540] avg loss 0.00221025, throughput 2.80629K wps
[Epoch 36 Batch 630/1540] avg loss 0.00188562, throughput 2.8023K wps
[Epoch 36 Batch 660/1540] avg loss 0.00210901, throughput 2.80191K wps
[Epoch 36 Batch 690/1540] avg loss 0.00194899, throughput 2.85064K wps
[Epoch 36 Batch 720/1540] avg loss 0.00190677, throughput 2.76069K wps
[Epoch 36 Batch 750/1540] avg loss 0.00237186, throughput 2.79962K wps
[Epoch 36 Batch 780/1540] avg loss 0.00198094, throughput 2.80438K wps
[Epoch 36 Batch 810/1540] avg loss 0.0020074, throughput 2.8475K wps
[Epoch 36 Batch 840/1540] avg loss 0.00217309, throughput 2.85275K wps
[Epoch 36 Batch 870/1540] avg loss 0.00192458, throughput 2.8459K wps
[Epoch 36 Batch 900/1540] avg loss 0.00189296, throughput 2.83756K wps
[Epoch 36 Batch 930/1540] avg loss 0.00220729, throughput 2.76399K wps
[Epoch 36 Batch 960/1540] avg loss 0.00194019, throughput 2.81669K wps
[Epoch 36 Batch 990/1540] avg loss 0.00220479, throughput 2.84339K wps
[Epoch 36 Batch 1020/1540] avg loss 0.00201358, throughput 2.79364K wps
[Epoch 36 Batch 1050/1540] avg loss 0.00246635, throughput 2.84144K wps
[Epoch 36 Batch 1080/1540] avg loss 0.00229551, throughput 2.85215K wps
[Epoch 36 Batch 1110/1540] avg loss 0.00217478, throughput 2.83885K wps
[Epoch 36 Batch 1140/1540] avg loss 0.00182772, throughput 2.79375K wps
[Epoch 36 Batch 1170/1540] avg loss 0.00215045, throughput 2.8025K wps
[Epoch 36 Batch 1200/1540] avg loss 0.00203557, throughput 2.79596K wps
[Epoch 36 Batch 1230/1540] avg loss 0.00202088, throughput 2.79423K wps
[Epoch 36 Batch 1260/1540] avg loss 0.00198755, throughput 2.7695K wps
[Epoch 36 Batch 1290/1540] avg loss 0.00217052, throughput 2.77645K wps
[Epoch 36 Batch 1320/1540] avg loss 0.00187435, throughput 2.8438K wps
[Epoch 36 Batch 1350/1540] avg loss 0.00257457, throughput 2.85274K wps
[Epoch 36 Batch 1380/1540] avg loss 0.00251256, throughput 2.8461K wps
[Epoch 36 Batch 1410/1540] avg loss 0.00212215, throughput 2.85K wps
[Epoch 36 Batch 1440/1540] avg loss 0.00235179, throughput 2.84682K wps
[Epoch 36 Batch 1470/1540] avg loss 0.00209841, throughput 2.77732K wps
[Epoch 36 Batch 1500/1540] avg loss 0.00205731, throughput 2.84438K wps
[Epoch 36 Batch 1530/1540] avg loss 0.00210823, throughput 2.7976K wps
Begin Testing...
[Epoch 36] train avg loss 0.00205824, dev acc 0.8337, dev avg loss 0.545448, throughput 2.81744K wps
[Epoch 37 Batch 30/1540] avg loss 0.00220961, throughput 2.87415K wps
[Epoch 37 Batch 60/1540] avg loss 0.00192537, throughput 2.85414K wps
[Epoch 37 Batch 90/1540] avg loss 0.00213752, throughput 2.849K wps
[Epoch 37 Batch 120/1540] avg loss 0.00192377, throughput 2.832K wps
[Epoch 37 Batch 150/1540] avg loss 0.00217799, throughput 2.80768K wps
[Epoch 37 Batch 180/1540] avg loss 0.00196647, throughput 2.77703K wps
[Epoch 37 Batch 210/1540] avg loss 0.00172459, throughput 2.84824K wps
[Epoch 37 Batch 240/1540] avg loss 0.00153869, throughput 2.84751K wps
[Epoch 37 Batch 270/1540] avg loss 0.00211237, throughput 2.85146K wps
[Epoch 37 Batch 300/1540] avg loss 0.00200643, throughput 2.82849K wps
[Epoch 37 Batch 330/1540] avg loss 0.00183488, throughput 2.82031K wps
[Epoch 37 Batch 360/1540] avg loss 0.00186878, throughput 2.84815K wps
[Epoch 37 Batch 390/1540] avg loss 0.00220239, throughput 2.84626K wps
[Epoch 37 Batch 420/1540] avg loss 0.0022669, throughput 2.84235K wps
[Epoch 37 Batch 450/1540] avg loss 0.00187319, throughput 2.79249K wps
[Epoch 37 Batch 480/1540] avg loss 0.00247539, throughput 2.8305K wps
[Epoch 37 Batch 510/1540] avg loss 0.00202546, throughput 2.78742K wps
[Epoch 37 Batch 540/1540] avg loss 0.00163383, throughput 2.82355K wps
[Epoch 37 Batch 570/1540] avg loss 0.00185614, throughput 2.8003K wps
[Epoch 37 Batch 600/1540] avg loss 0.00196616, throughput 2.76309K wps
[Epoch 37 Batch 630/1540] avg loss 0.00185776, throughput 2.76531K wps
[Epoch 37 Batch 660/1540] avg loss 0.00196453, throughput 2.83553K wps
[Epoch 37 Batch 690/1540] avg loss 0.00189738, throughput 2.84871K wps
[Epoch 37 Batch 720/1540] avg loss 0.00179459, throughput 2.80705K wps
[Epoch 37 Batch 750/1540] avg loss 0.00182822, throughput 2.83879K wps
[Epoch 37 Batch 780/1540] avg loss 0.00200114, throughput 2.85235K wps
[Epoch 37 Batch 810/1540] avg loss 0.00212045, throughput 2.8076K wps
[Epoch 37 Batch 840/1540] avg loss 0.00193063, throughput 2.79787K wps
[Epoch 37 Batch 870/1540] avg loss 0.00200622, throughput 2.8502K wps
[Epoch 37 Batch 900/1540] avg loss 0.00224456, throughput 2.84379K wps
[Epoch 37 Batch 930/1540] avg loss 0.00179568, throughput 2.83287K wps
[Epoch 37 Batch 960/1540] avg loss 0.00202029, throughput 2.8173K wps
[Epoch 37 Batch 990/1540] avg loss 0.00181177, throughput 2.85065K wps
[Epoch 37 Batch 1020/1540] avg loss 0.0020924, throughput 2.84682K wps
[Epoch 37 Batch 1050/1540] avg loss 0.0019734, throughput 2.83201K wps
[Epoch 37 Batch 1080/1540] avg loss 0.00187298, throughput 2.75787K wps
[Epoch 37 Batch 1110/1540] avg loss 0.00228998, throughput 2.78593K wps
[Epoch 37 Batch 1140/1540] avg loss 0.0022497, throughput 2.84694K wps
[Epoch 37 Batch 1170/1540] avg loss 0.0020201, throughput 2.8483K wps
[Epoch 37 Batch 1200/1540] avg loss 0.00194021, throughput 2.82224K wps
[Epoch 37 Batch 1230/1540] avg loss 0.00210231, throughput 2.80845K wps
[Epoch 37 Batch 1260/1540] avg loss 0.00221637, throughput 2.75118K wps
[Epoch 37 Batch 1290/1540] avg loss 0.00201065, throughput 2.76225K wps
[Epoch 37 Batch 1320/1540] avg loss 0.00193513, throughput 2.81726K wps
[Epoch 37 Batch 1350/1540] avg loss 0.00195282, throughput 2.84051K wps
[Epoch 37 Batch 1380/1540] avg loss 0.00196262, throughput 2.80417K wps
[Epoch 37 Batch 1410/1540] avg loss 0.00212627, throughput 2.81256K wps
[Epoch 37 Batch 1440/1540] avg loss 0.00207587, throughput 2.75846K wps
[Epoch 37 Batch 1470/1540] avg loss 0.00214208, throughput 2.81538K wps
[Epoch 37 Batch 1500/1540] avg loss 0.00172697, throughput 2.84657K wps
[Epoch 37 Batch 1530/1540] avg loss 0.00219845, throughput 2.80802K wps
Begin Testing...
[Epoch 37] train avg loss 0.00199913, dev acc 0.8326, dev avg loss 0.555795, throughput 2.82017K wps
[Epoch 38 Batch 30/1540] avg loss 0.00190577, throughput 2.90221K wps
[Epoch 38 Batch 60/1540] avg loss 0.00172061, throughput 2.84119K wps
[Epoch 38 Batch 90/1540] avg loss 0.00197401, throughput 2.84194K wps
[Epoch 38 Batch 120/1540] avg loss 0.00198111, throughput 2.84314K wps
[Epoch 38 Batch 150/1540] avg loss 0.00205782, throughput 2.84641K wps
[Epoch 38 Batch 180/1540] avg loss 0.00185977, throughput 2.84364K wps
[Epoch 38 Batch 210/1540] avg loss 0.00167944, throughput 2.82877K wps
[Epoch 38 Batch 240/1540] avg loss 0.00218445, throughput 2.83519K wps
[Epoch 38 Batch 270/1540] avg loss 0.00184305, throughput 2.79215K wps
[Epoch 38 Batch 300/1540] avg loss 0.0016497, throughput 2.83573K wps
[Epoch 38 Batch 330/1540] avg loss 0.00190425, throughput 2.82316K wps
[Epoch 38 Batch 360/1540] avg loss 0.00195656, throughput 2.75857K wps
[Epoch 38 Batch 390/1540] avg loss 0.00178514, throughput 2.76565K wps
[Epoch 38 Batch 420/1540] avg loss 0.00204986, throughput 2.7757K wps
[Epoch 38 Batch 450/1540] avg loss 0.00187752, throughput 2.80466K wps
[Epoch 38 Batch 480/1540] avg loss 0.00227677, throughput 2.75349K wps
[Epoch 38 Batch 510/1540] avg loss 0.00204422, throughput 2.85118K wps
[Epoch 38 Batch 540/1540] avg loss 0.00203024, throughput 2.84081K wps
[Epoch 38 Batch 570/1540] avg loss 0.00181364, throughput 2.77244K wps
[Epoch 38 Batch 600/1540] avg loss 0.00192433, throughput 2.76842K wps
[Epoch 38 Batch 630/1540] avg loss 0.00187917, throughput 2.80789K wps
[Epoch 38 Batch 660/1540] avg loss 0.00205328, throughput 2.79251K wps
[Epoch 38 Batch 690/1540] avg loss 0.0020035, throughput 2.76433K wps
[Epoch 38 Batch 720/1540] avg loss 0.0021338, throughput 2.81251K wps
[Epoch 38 Batch 750/1540] avg loss 0.00191749, throughput 2.85082K wps
[Epoch 38 Batch 780/1540] avg loss 0.00227072, throughput 2.81466K wps
[Epoch 38 Batch 810/1540] avg loss 0.00209846, throughput 2.77317K wps
[Epoch 38 Batch 840/1540] avg loss 0.00205149, throughput 2.80138K wps
[Epoch 38 Batch 870/1540] avg loss 0.00203929, throughput 2.81524K wps
[Epoch 38 Batch 900/1540] avg loss 0.00203505, throughput 2.76121K wps
[Epoch 38 Batch 930/1540] avg loss 0.00177516, throughput 2.8185K wps
[Epoch 38 Batch 960/1540] avg loss 0.00217169, throughput 2.84863K wps
[Epoch 38 Batch 990/1540] avg loss 0.00203099, throughput 2.80034K wps
[Epoch 38 Batch 1020/1540] avg loss 0.00246233, throughput 2.83841K wps
[Epoch 38 Batch 1050/1540] avg loss 0.00219456, throughput 2.84534K wps
[Epoch 38 Batch 1080/1540] avg loss 0.00185811, throughput 2.85419K wps
[Epoch 38 Batch 1110/1540] avg loss 0.00207639, throughput 2.84399K wps
[Epoch 38 Batch 1140/1540] avg loss 0.00181508, throughput 2.84649K wps
[Epoch 38 Batch 1170/1540] avg loss 0.00196453, throughput 2.82383K wps
[Epoch 38 Batch 1200/1540] avg loss 0.00180876, throughput 2.84877K wps
[Epoch 38 Batch 1230/1540] avg loss 0.00203565, throughput 2.85097K wps
[Epoch 38 Batch 1260/1540] avg loss 0.00194966, throughput 2.85092K wps
[Epoch 38 Batch 1290/1540] avg loss 0.00158833, throughput 2.84211K wps
[Epoch 38 Batch 1320/1540] avg loss 0.00195679, throughput 2.84409K wps
[Epoch 38 Batch 1350/1540] avg loss 0.00220503, throughput 2.77611K wps
[Epoch 38 Batch 1380/1540] avg loss 0.00206299, throughput 2.8344K wps
[Epoch 38 Batch 1410/1540] avg loss 0.00183156, throughput 2.83686K wps
[Epoch 38 Batch 1440/1540] avg loss 0.00191179, throughput 2.78841K wps
[Epoch 38 Batch 1470/1540] avg loss 0.00185371, throughput 2.84372K wps
[Epoch 38 Batch 1500/1540] avg loss 0.00231084, throughput 2.78493K wps
[Epoch 38 Batch 1530/1540] avg loss 0.00167097, throughput 2.81495K wps
Begin Testing...
[Epoch 38] train avg loss 0.00196878, dev acc 0.8280, dev avg loss 0.555718, throughput 2.8185K wps
[Epoch 39 Batch 30/1540] avg loss 0.00158559, throughput 2.90977K wps
[Epoch 39 Batch 60/1540] avg loss 0.00193361, throughput 2.8068K wps
[Epoch 39 Batch 90/1540] avg loss 0.00173123, throughput 2.78857K wps
[Epoch 39 Batch 120/1540] avg loss 0.00213765, throughput 2.78046K wps
[Epoch 39 Batch 150/1540] avg loss 0.00156095, throughput 2.85221K wps
[Epoch 39 Batch 180/1540] avg loss 0.00169888, throughput 2.82884K wps
[Epoch 39 Batch 210/1540] avg loss 0.00171465, throughput 2.81925K wps
[Epoch 39 Batch 240/1540] avg loss 0.0015858, throughput 2.79045K wps
[Epoch 39 Batch 270/1540] avg loss 0.00175633, throughput 2.84676K wps
[Epoch 39 Batch 300/1540] avg loss 0.00199587, throughput 2.79162K wps
[Epoch 39 Batch 330/1540] avg loss 0.0020554, throughput 2.81122K wps
[Epoch 39 Batch 360/1540] avg loss 0.00208936, throughput 2.85042K wps
[Epoch 39 Batch 390/1540] avg loss 0.00223205, throughput 2.84967K wps
[Epoch 39 Batch 420/1540] avg loss 0.00156623, throughput 2.83056K wps
[Epoch 39 Batch 450/1540] avg loss 0.00197754, throughput 2.78531K wps
[Epoch 39 Batch 480/1540] avg loss 0.0019651, throughput 2.77837K wps
[Epoch 39 Batch 510/1540] avg loss 0.00182856, throughput 2.77347K wps
[Epoch 39 Batch 540/1540] avg loss 0.00210719, throughput 2.83528K wps
[Epoch 39 Batch 570/1540] avg loss 0.00228829, throughput 2.80252K wps
[Epoch 39 Batch 600/1540] avg loss 0.00187455, throughput 2.79796K wps
[Epoch 39 Batch 630/1540] avg loss 0.00183765, throughput 2.848K wps
[Epoch 39 Batch 660/1540] avg loss 0.00198889, throughput 2.78008K wps
[Epoch 39 Batch 690/1540] avg loss 0.00179247, throughput 2.76964K wps
[Epoch 39 Batch 720/1540] avg loss 0.0017069, throughput 2.81336K wps
[Epoch 39 Batch 750/1540] avg loss 0.00225182, throughput 2.76907K wps
[Epoch 39 Batch 780/1540] avg loss 0.00199454, throughput 2.84429K wps
[Epoch 39 Batch 810/1540] avg loss 0.00187683, throughput 2.85473K wps
[Epoch 39 Batch 840/1540] avg loss 0.00204761, throughput 2.85063K wps
[Epoch 39 Batch 870/1540] avg loss 0.00199406, throughput 2.83913K wps
[Epoch 39 Batch 900/1540] avg loss 0.0019661, throughput 2.77167K wps
[Epoch 39 Batch 930/1540] avg loss 0.00191154, throughput 2.8402K wps
[Epoch 39 Batch 960/1540] avg loss 0.00174284, throughput 2.80337K wps
[Epoch 39 Batch 990/1540] avg loss 0.00197311, throughput 2.77779K wps
[Epoch 39 Batch 1020/1540] avg loss 0.00219378, throughput 2.83865K wps
[Epoch 39 Batch 1050/1540] avg loss 0.001711, throughput 2.85196K wps
[Epoch 39 Batch 1080/1540] avg loss 0.00192771, throughput 2.84812K wps
[Epoch 39 Batch 1110/1540] avg loss 0.00175945, throughput 2.85004K wps
[Epoch 39 Batch 1140/1540] avg loss 0.00195166, throughput 2.8101K wps
[Epoch 39 Batch 1170/1540] avg loss 0.00191617, throughput 2.79061K wps
[Epoch 39 Batch 1200/1540] avg loss 0.00198085, throughput 2.81109K wps
[Epoch 39 Batch 1230/1540] avg loss 0.00234794, throughput 2.8485K wps
[Epoch 39 Batch 1260/1540] avg loss 0.001922, throughput 2.83775K wps
[Epoch 39 Batch 1290/1540] avg loss 0.00207202, throughput 2.77866K wps
[Epoch 39 Batch 1320/1540] avg loss 0.00218867, throughput 2.81453K wps
[Epoch 39 Batch 1350/1540] avg loss 0.00225775, throughput 2.84826K wps
[Epoch 39 Batch 1380/1540] avg loss 0.00228181, throughput 2.84552K wps
[Epoch 39 Batch 1410/1540] avg loss 0.00198836, throughput 2.84444K wps
[Epoch 39 Batch 1440/1540] avg loss 0.00182203, throughput 2.83811K wps
[Epoch 39 Batch 1470/1540] avg loss 0.00231262, throughput 2.83693K wps
[Epoch 39 Batch 1500/1540] avg loss 0.0019758, throughput 2.82439K wps
[Epoch 39 Batch 1530/1540] avg loss 0.00192658, throughput 2.80185K wps
Begin Testing...
[Epoch 39] train avg loss 0.00194608, dev acc 0.8200, dev avg loss 0.555062, throughput 2.81906K wps
[Epoch 40 Batch 30/1540] avg loss 0.00182899, throughput 2.8166K wps
[Epoch 40 Batch 60/1540] avg loss 0.00208879, throughput 2.76356K wps
[Epoch 40 Batch 90/1540] avg loss 0.00192449, throughput 2.75668K wps
[Epoch 40 Batch 120/1540] avg loss 0.00178601, throughput 2.79593K wps
[Epoch 40 Batch 150/1540] avg loss 0.00191922, throughput 2.80836K wps
[Epoch 40 Batch 180/1540] avg loss 0.00207194, throughput 2.84719K wps
[Epoch 40 Batch 210/1540] avg loss 0.00154029, throughput 2.85591K wps
[Epoch 40 Batch 240/1540] avg loss 0.00204846, throughput 2.81969K wps
[Epoch 40 Batch 270/1540] avg loss 0.0016137, throughput 2.77593K wps
[Epoch 40 Batch 300/1540] avg loss 0.00157264, throughput 2.82597K wps
[Epoch 40 Batch 330/1540] avg loss 0.00167458, throughput 2.7558K wps
[Epoch 40 Batch 360/1540] avg loss 0.00216792, throughput 2.84709K wps
[Epoch 40 Batch 390/1540] avg loss 0.00175569, throughput 2.78276K wps
[Epoch 40 Batch 420/1540] avg loss 0.00195028, throughput 2.78474K wps
[Epoch 40 Batch 450/1540] avg loss 0.00199464, throughput 2.84644K wps
[Epoch 40 Batch 480/1540] avg loss 0.00167836, throughput 2.84667K wps
[Epoch 40 Batch 510/1540] avg loss 0.00203574, throughput 2.84709K wps
[Epoch 40 Batch 540/1540] avg loss 0.00180037, throughput 2.80436K wps
[Epoch 40 Batch 570/1540] avg loss 0.00191803, throughput 2.84944K wps
[Epoch 40 Batch 600/1540] avg loss 0.00214702, throughput 2.85208K wps
[Epoch 40 Batch 630/1540] avg loss 0.00213618, throughput 2.85368K wps
[Epoch 40 Batch 660/1540] avg loss 0.00216429, throughput 2.84749K wps
[Epoch 40 Batch 690/1540] avg loss 0.00195883, throughput 2.77914K wps
[Epoch 40 Batch 720/1540] avg loss 0.00201292, throughput 2.81406K wps
[Epoch 40 Batch 750/1540] avg loss 0.00167193, throughput 2.75835K wps
[Epoch 40 Batch 780/1540] avg loss 0.00166566, throughput 2.7575K wps
[Epoch 40 Batch 810/1540] avg loss 0.00165718, throughput 2.76386K wps
[Epoch 40 Batch 840/1540] avg loss 0.00197514, throughput 2.81068K wps
[Epoch 40 Batch 870/1540] avg loss 0.00190509, throughput 2.85006K wps
[Epoch 40 Batch 900/1540] avg loss 0.00154689, throughput 2.84945K wps
[Epoch 40 Batch 930/1540] avg loss 0.0018323, throughput 2.84771K wps
[Epoch 40 Batch 960/1540] avg loss 0.00168994, throughput 2.78812K wps
[Epoch 40 Batch 990/1540] avg loss 0.00183087, throughput 2.84839K wps
[Epoch 40 Batch 1020/1540] avg loss 0.00180247, throughput 2.84395K wps
[Epoch 40 Batch 1050/1540] avg loss 0.00166934, throughput 2.8055K wps
[Epoch 40 Batch 1080/1540] avg loss 0.00201464, throughput 2.84635K wps
[Epoch 40 Batch 1110/1540] avg loss 0.0020407, throughput 2.85393K wps
[Epoch 40 Batch 1140/1540] avg loss 0.00199479, throughput 2.84626K wps
[Epoch 40 Batch 1170/1540] avg loss 0.00188415, throughput 2.85008K wps
[Epoch 40 Batch 1200/1540] avg loss 0.00207595, throughput 2.84846K wps
[Epoch 40 Batch 1230/1540] avg loss 0.00198067, throughput 2.83125K wps
[Epoch 40 Batch 1260/1540] avg loss 0.00193752, throughput 2.84749K wps
[Epoch 40 Batch 1290/1540] avg loss 0.00182463, throughput 2.83985K wps
[Epoch 40 Batch 1320/1540] avg loss 0.00168967, throughput 2.76183K wps
[Epoch 40 Batch 1350/1540] avg loss 0.00199949, throughput 2.77182K wps
[Epoch 40 Batch 1380/1540] avg loss 0.00198774, throughput 2.75525K wps
[Epoch 40 Batch 1410/1540] avg loss 0.00195836, throughput 2.82879K wps
[Epoch 40 Batch 1440/1540] avg loss 0.00177431, throughput 2.84696K wps
[Epoch 40 Batch 1470/1540] avg loss 0.00183352, throughput 2.84539K wps
[Epoch 40 Batch 1500/1540] avg loss 0.00207883, throughput 2.76107K wps
[Epoch 40 Batch 1530/1540] avg loss 0.00184948, throughput 2.80741K wps
Begin Testing...
[Epoch 40] train avg loss 0.00188292, dev acc 0.8326, dev avg loss 0.564454, throughput 2.81627K wps
[Epoch 41 Batch 30/1540] avg loss 0.00150428, throughput 2.8943K wps
[Epoch 41 Batch 60/1540] avg loss 0.00178529, throughput 2.84811K wps
[Epoch 41 Batch 90/1540] avg loss 0.00160759, throughput 2.78078K wps
[Epoch 41 Batch 120/1540] avg loss 0.00175243, throughput 2.79995K wps
[Epoch 41 Batch 150/1540] avg loss 0.00191919, throughput 2.85269K wps
[Epoch 41 Batch 180/1540] avg loss 0.00158944, throughput 2.84938K wps
[Epoch 41 Batch 210/1540] avg loss 0.00198113, throughput 2.79437K wps
[Epoch 41 Batch 240/1540] avg loss 0.0019426, throughput 2.79675K wps
[Epoch 41 Batch 270/1540] avg loss 0.00205669, throughput 2.83795K wps
[Epoch 41 Batch 300/1540] avg loss 0.00181296, throughput 2.84828K wps
[Epoch 41 Batch 330/1540] avg loss 0.00177882, throughput 2.84867K wps
[Epoch 41 Batch 360/1540] avg loss 0.00172817, throughput 2.79347K wps
[Epoch 41 Batch 390/1540] avg loss 0.00171636, throughput 2.84945K wps
[Epoch 41 Batch 420/1540] avg loss 0.00162239, throughput 2.75438K wps
[Epoch 41 Batch 450/1540] avg loss 0.00186877, throughput 2.76468K wps
[Epoch 41 Batch 480/1540] avg loss 0.00151276, throughput 2.82627K wps
[Epoch 41 Batch 510/1540] avg loss 0.00187356, throughput 2.85423K wps
[Epoch 41 Batch 540/1540] avg loss 0.00195736, throughput 2.81592K wps
[Epoch 41 Batch 570/1540] avg loss 0.00191859, throughput 2.76024K wps
[Epoch 41 Batch 600/1540] avg loss 0.00147564, throughput 2.8463K wps
[Epoch 41 Batch 630/1540] avg loss 0.0019324, throughput 2.8443K wps
[Epoch 41 Batch 660/1540] avg loss 0.00180108, throughput 2.81663K wps
[Epoch 41 Batch 690/1540] avg loss 0.00178002, throughput 2.83957K wps
[Epoch 41 Batch 720/1540] avg loss 0.00177321, throughput 2.78851K wps
[Epoch 41 Batch 750/1540] avg loss 0.0017383, throughput 2.84122K wps
[Epoch 41 Batch 780/1540] avg loss 0.00158517, throughput 2.84796K wps
[Epoch 41 Batch 810/1540] avg loss 0.00214905, throughput 2.8336K wps
[Epoch 41 Batch 840/1540] avg loss 0.00167424, throughput 2.77637K wps
[Epoch 41 Batch 870/1540] avg loss 0.00193907, throughput 2.8035K wps
[Epoch 41 Batch 900/1540] avg loss 0.00174059, throughput 2.84926K wps
[Epoch 41 Batch 930/1540] avg loss 0.00175191, throughput 2.80794K wps
[Epoch 41 Batch 960/1540] avg loss 0.00227098, throughput 2.82502K wps
[Epoch 41 Batch 990/1540] avg loss 0.00174886, throughput 2.80065K wps
[Epoch 41 Batch 1020/1540] avg loss 0.00191397, throughput 2.77412K wps
[Epoch 41 Batch 1050/1540] avg loss 0.00208317, throughput 2.79211K wps
[Epoch 41 Batch 1080/1540] avg loss 0.00192625, throughput 2.76115K wps
[Epoch 41 Batch 1110/1540] avg loss 0.00189837, throughput 2.76682K wps
[Epoch 41 Batch 1140/1540] avg loss 0.00231459, throughput 2.84579K wps
[Epoch 41 Batch 1170/1540] avg loss 0.00213785, throughput 2.84112K wps
[Epoch 41 Batch 1200/1540] avg loss 0.00197748, throughput 2.82488K wps
[Epoch 41 Batch 1230/1540] avg loss 0.00220064, throughput 2.84992K wps
[Epoch 41 Batch 1260/1540] avg loss 0.00218357, throughput 2.84509K wps
[Epoch 41 Batch 1290/1540] avg loss 0.00184258, throughput 2.84893K wps
[Epoch 41 Batch 1320/1540] avg loss 0.00195472, throughput 2.85114K wps
[Epoch 41 Batch 1350/1540] avg loss 0.00197764, throughput 2.84778K wps
[Epoch 41 Batch 1380/1540] avg loss 0.00208039, throughput 2.84391K wps
[Epoch 41 Batch 1410/1540] avg loss 0.00219475, throughput 2.84272K wps
[Epoch 41 Batch 1440/1540] avg loss 0.00177994, throughput 2.79097K wps
[Epoch 41 Batch 1470/1540] avg loss 0.00183806, throughput 2.8297K wps
[Epoch 41 Batch 1500/1540] avg loss 0.00161357, throughput 2.78458K wps
[Epoch 41 Batch 1530/1540] avg loss 0.00201139, throughput 2.76203K wps
Begin Testing...
[Epoch 41] train avg loss 0.00186933, dev acc 0.8314, dev avg loss 0.574567, throughput 2.8186K wps
[Epoch 42 Batch 30/1540] avg loss 0.00176358, throughput 2.90507K wps
[Epoch 42 Batch 60/1540] avg loss 0.00166222, throughput 2.84769K wps
[Epoch 42 Batch 90/1540] avg loss 0.00153637, throughput 2.83795K wps
[Epoch 42 Batch 120/1540] avg loss 0.00160656, throughput 2.84953K wps
[Epoch 42 Batch 150/1540] avg loss 0.00166014, throughput 2.8551K wps
[Epoch 42 Batch 180/1540] avg loss 0.00194656, throughput 2.80423K wps
[Epoch 42 Batch 210/1540] avg loss 0.00178222, throughput 2.77567K wps
[Epoch 42 Batch 240/1540] avg loss 0.00180687, throughput 2.78774K wps
[Epoch 42 Batch 270/1540] avg loss 0.00165901, throughput 2.8289K wps
[Epoch 42 Batch 300/1540] avg loss 0.00206215, throughput 2.84977K wps
[Epoch 42 Batch 330/1540] avg loss 0.00170211, throughput 2.85048K wps
[Epoch 42 Batch 360/1540] avg loss 0.00207273, throughput 2.8409K wps
[Epoch 42 Batch 390/1540] avg loss 0.00183507, throughput 2.82741K wps
[Epoch 42 Batch 420/1540] avg loss 0.00188576, throughput 2.84637K wps
[Epoch 42 Batch 450/1540] avg loss 0.00170245, throughput 2.83891K wps
[Epoch 42 Batch 480/1540] avg loss 0.00188041, throughput 2.76002K wps
[Epoch 42 Batch 510/1540] avg loss 0.00186778, throughput 2.80123K wps
[Epoch 42 Batch 540/1540] avg loss 0.00164357, throughput 2.84113K wps
[Epoch 42 Batch 570/1540] avg loss 0.0016527, throughput 2.83603K wps
[Epoch 42 Batch 600/1540] avg loss 0.0017901, throughput 2.85154K wps
[Epoch 42 Batch 630/1540] avg loss 0.00201549, throughput 2.84211K wps
[Epoch 42 Batch 660/1540] avg loss 0.00205021, throughput 2.84624K wps
[Epoch 42 Batch 690/1540] avg loss 0.00165729, throughput 2.81688K wps
[Epoch 42 Batch 720/1540] avg loss 0.00187337, throughput 2.84708K wps
[Epoch 42 Batch 750/1540] avg loss 0.00189625, throughput 2.83991K wps
[Epoch 42 Batch 780/1540] avg loss 0.00188631, throughput 2.78869K wps
[Epoch 42 Batch 810/1540] avg loss 0.00224466, throughput 2.84183K wps
[Epoch 42 Batch 840/1540] avg loss 0.00179208, throughput 2.83725K wps
[Epoch 42 Batch 870/1540] avg loss 0.00171869, throughput 2.84244K wps
[Epoch 42 Batch 900/1540] avg loss 0.00188225, throughput 2.77764K wps
[Epoch 42 Batch 930/1540] avg loss 0.00156122, throughput 2.85077K wps
[Epoch 42 Batch 960/1540] avg loss 0.00181817, throughput 2.84304K wps
[Epoch 42 Batch 990/1540] avg loss 0.00217812, throughput 2.85105K wps
[Epoch 42 Batch 1020/1540] avg loss 0.00189008, throughput 2.80865K wps
[Epoch 42 Batch 1050/1540] avg loss 0.00187938, throughput 2.84486K wps
[Epoch 42 Batch 1080/1540] avg loss 0.00178835, throughput 2.76098K wps
[Epoch 42 Batch 1110/1540] avg loss 0.0021599, throughput 2.8439K wps
[Epoch 42 Batch 1140/1540] avg loss 0.00156762, throughput 2.83926K wps
[Epoch 42 Batch 1170/1540] avg loss 0.00189195, throughput 2.76153K wps
[Epoch 42 Batch 1200/1540] avg loss 0.00185168, throughput 2.801K wps
[Epoch 42 Batch 1230/1540] avg loss 0.00186371, throughput 2.85137K wps
[Epoch 42 Batch 1260/1540] avg loss 0.00177642, throughput 2.79765K wps
[Epoch 42 Batch 1290/1540] avg loss 0.00186995, throughput 2.78665K wps
[Epoch 42 Batch 1320/1540] avg loss 0.00186701, throughput 2.85082K wps
[Epoch 42 Batch 1350/1540] avg loss 0.00168264, throughput 2.84944K wps
[Epoch 42 Batch 1380/1540] avg loss 0.00208823, throughput 2.85248K wps
[Epoch 42 Batch 1410/1540] avg loss 0.00190566, throughput 2.84644K wps
[Epoch 42 Batch 1440/1540] avg loss 0.00196113, throughput 2.8141K wps
[Epoch 42 Batch 1470/1540] avg loss 0.00187544, throughput 2.78284K wps
[Epoch 42 Batch 1500/1540] avg loss 0.00160482, throughput 2.77352K wps
[Epoch 42 Batch 1530/1540] avg loss 0.00191599, throughput 2.7564K wps
Begin Testing...
[Epoch 42] train avg loss 0.00183572, dev acc 0.8257, dev avg loss 0.576211, throughput 2.82435K wps
[Epoch 43 Batch 30/1540] avg loss 0.00158416, throughput 2.90978K wps
[Epoch 43 Batch 60/1540] avg loss 0.00149476, throughput 2.761K wps
[Epoch 43 Batch 90/1540] avg loss 0.00154072, throughput 2.83866K wps
[Epoch 43 Batch 120/1540] avg loss 0.00182533, throughput 2.84673K wps
[Epoch 43 Batch 150/1540] avg loss 0.00172408, throughput 2.8136K wps
[Epoch 43 Batch 180/1540] avg loss 0.00174132, throughput 2.76369K wps
[Epoch 43 Batch 210/1540] avg loss 0.00190266, throughput 2.81568K wps
[Epoch 43 Batch 240/1540] avg loss 0.00188403, throughput 2.81909K wps
[Epoch 43 Batch 270/1540] avg loss 0.00171135, throughput 2.84595K wps
[Epoch 43 Batch 300/1540] avg loss 0.00167997, throughput 2.84714K wps
[Epoch 43 Batch 330/1540] avg loss 0.00194262, throughput 2.78686K wps
[Epoch 43 Batch 360/1540] avg loss 0.00183352, throughput 2.76375K wps
[Epoch 43 Batch 390/1540] avg loss 0.00182042, throughput 2.78374K wps
[Epoch 43 Batch 420/1540] avg loss 0.00181112, throughput 2.79906K wps
[Epoch 43 Batch 450/1540] avg loss 0.00155349, throughput 2.81619K wps
[Epoch 43 Batch 480/1540] avg loss 0.00181259, throughput 2.85435K wps
[Epoch 43 Batch 510/1540] avg loss 0.00149718, throughput 2.8524K wps
[Epoch 43 Batch 540/1540] avg loss 0.00210118, throughput 2.84781K wps
[Epoch 43 Batch 570/1540] avg loss 0.00155177, throughput 2.8134K wps
[Epoch 43 Batch 600/1540] avg loss 0.00181225, throughput 2.83447K wps
[Epoch 43 Batch 630/1540] avg loss 0.00158797, throughput 2.77747K wps
[Epoch 43 Batch 660/1540] avg loss 0.00180398, throughput 2.76073K wps
[Epoch 43 Batch 690/1540] avg loss 0.00169512, throughput 2.79147K wps
[Epoch 43 Batch 720/1540] avg loss 0.00173963, throughput 2.85173K wps
[Epoch 43 Batch 750/1540] avg loss 0.00198158, throughput 2.84279K wps
[Epoch 43 Batch 780/1540] avg loss 0.001979, throughput 2.8374K wps
[Epoch 43 Batch 810/1540] avg loss 0.00204175, throughput 2.84634K wps
[Epoch 43 Batch 840/1540] avg loss 0.00190363, throughput 2.80721K wps
[Epoch 43 Batch 870/1540] avg loss 0.00173605, throughput 2.8348K wps
[Epoch 43 Batch 900/1540] avg loss 0.00182789, throughput 2.76712K wps
[Epoch 43 Batch 930/1540] avg loss 0.00192291, throughput 2.84768K wps
[Epoch 43 Batch 960/1540] avg loss 0.00182942, throughput 2.85052K wps
[Epoch 43 Batch 990/1540] avg loss 0.00175716, throughput 2.8466K wps
[Epoch 43 Batch 1020/1540] avg loss 0.0018757, throughput 2.83601K wps
[Epoch 43 Batch 1050/1540] avg loss 0.00157399, throughput 2.83902K wps
[Epoch 43 Batch 1080/1540] avg loss 0.00213809, throughput 2.8328K wps
[Epoch 43 Batch 1110/1540] avg loss 0.00207529, throughput 2.84679K wps
[Epoch 43 Batch 1140/1540] avg loss 0.00166487, throughput 2.80661K wps
[Epoch 43 Batch 1170/1540] avg loss 0.00218082, throughput 2.80344K wps
[Epoch 43 Batch 1200/1540] avg loss 0.00205401, throughput 2.84558K wps
[Epoch 43 Batch 1230/1540] avg loss 0.00189945, throughput 2.79055K wps
[Epoch 43 Batch 1260/1540] avg loss 0.0020602, throughput 2.83695K wps
[Epoch 43 Batch 1290/1540] avg loss 0.00194378, throughput 2.81881K wps
[Epoch 43 Batch 1320/1540] avg loss 0.00198755, throughput 2.8477K wps
[Epoch 43 Batch 1350/1540] avg loss 0.00182391, throughput 2.85089K wps
[Epoch 43 Batch 1380/1540] avg loss 0.00192746, throughput 2.81183K wps
[Epoch 43 Batch 1410/1540] avg loss 0.00181055, throughput 2.79295K wps
[Epoch 43 Batch 1440/1540] avg loss 0.00180624, throughput 2.79752K wps
[Epoch 43 Batch 1470/1540] avg loss 0.00182475, throughput 2.79535K wps
[Epoch 43 Batch 1500/1540] avg loss 0.00191297, throughput 2.84545K wps
[Epoch 43 Batch 1530/1540] avg loss 0.00177173, throughput 2.82553K wps
Begin Testing...
[Epoch 43] train avg loss 0.00182693, dev acc 0.8200, dev avg loss 0.599599, throughput 2.82074K wps
[Epoch 44 Batch 30/1540] avg loss 0.0017877, throughput 2.83509K wps
[Epoch 44 Batch 60/1540] avg loss 0.00166068, throughput 2.81511K wps
[Epoch 44 Batch 90/1540] avg loss 0.00161005, throughput 2.76867K wps
[Epoch 44 Batch 120/1540] avg loss 0.00174273, throughput 2.76947K wps
[Epoch 44 Batch 150/1540] avg loss 0.00168898, throughput 2.75991K wps
[Epoch 44 Batch 180/1540] avg loss 0.00162278, throughput 2.84205K wps
[Epoch 44 Batch 210/1540] avg loss 0.00176291, throughput 2.84063K wps
[Epoch 44 Batch 240/1540] avg loss 0.00185442, throughput 2.81482K wps
[Epoch 44 Batch 270/1540] avg loss 0.00194329, throughput 2.8503K wps
[Epoch 44 Batch 300/1540] avg loss 0.00177893, throughput 2.84997K wps
[Epoch 44 Batch 330/1540] avg loss 0.00170507, throughput 2.85309K wps
[Epoch 44 Batch 360/1540] avg loss 0.00211265, throughput 2.84688K wps
[Epoch 44 Batch 390/1540] avg loss 0.00178604, throughput 2.8446K wps
[Epoch 44 Batch 420/1540] avg loss 0.00195421, throughput 2.84102K wps
[Epoch 44 Batch 450/1540] avg loss 0.00210746, throughput 2.8246K wps
[Epoch 44 Batch 480/1540] avg loss 0.00192408, throughput 2.83853K wps
[Epoch 44 Batch 510/1540] avg loss 0.00188882, throughput 2.84535K wps
[Epoch 44 Batch 540/1540] avg loss 0.00168372, throughput 2.84955K wps
[Epoch 44 Batch 570/1540] avg loss 0.00168762, throughput 2.83944K wps
[Epoch 44 Batch 600/1540] avg loss 0.0016447, throughput 2.83529K wps
[Epoch 44 Batch 630/1540] avg loss 0.00171588, throughput 2.8498K wps
[Epoch 44 Batch 660/1540] avg loss 0.00167837, throughput 2.77839K wps
[Epoch 44 Batch 690/1540] avg loss 0.00184825, throughput 2.8144K wps
[Epoch 44 Batch 720/1540] avg loss 0.0016857, throughput 2.78376K wps
[Epoch 44 Batch 750/1540] avg loss 0.0015889, throughput 2.78699K wps
[Epoch 44 Batch 780/1540] avg loss 0.00189482, throughput 2.84702K wps
[Epoch 44 Batch 810/1540] avg loss 0.00161571, throughput 2.84933K wps
[Epoch 44 Batch 840/1540] avg loss 0.00166714, throughput 2.8511K wps
[Epoch 44 Batch 870/1540] avg loss 0.0015378, throughput 2.8141K wps
[Epoch 44 Batch 900/1540] avg loss 0.00171085, throughput 2.84125K wps
[Epoch 44 Batch 930/1540] avg loss 0.0021019, throughput 2.83803K wps
[Epoch 44 Batch 960/1540] avg loss 0.00167721, throughput 2.84068K wps
[Epoch 44 Batch 990/1540] avg loss 0.00183124, throughput 2.84814K wps
[Epoch 44 Batch 1020/1540] avg loss 0.00162114, throughput 2.84847K wps
[Epoch 44 Batch 1050/1540] avg loss 0.00194237, throughput 2.84914K wps
[Epoch 44 Batch 1080/1540] avg loss 0.00169727, throughput 2.85122K wps
[Epoch 44 Batch 1110/1540] avg loss 0.00181678, throughput 2.84191K wps
[Epoch 44 Batch 1140/1540] avg loss 0.00184101, throughput 2.7825K wps
[Epoch 44 Batch 1170/1540] avg loss 0.00169487, throughput 2.77714K wps
[Epoch 44 Batch 1200/1540] avg loss 0.00198578, throughput 2.84747K wps
[Epoch 44 Batch 1230/1540] avg loss 0.00205794, throughput 2.84603K wps
[Epoch 44 Batch 1260/1540] avg loss 0.0016529, throughput 2.84769K wps
[Epoch 44 Batch 1290/1540] avg loss 0.00194817, throughput 2.83995K wps
[Epoch 44 Batch 1320/1540] avg loss 0.00142799, throughput 2.84486K wps
[Epoch 44 Batch 1350/1540] avg loss 0.00180898, throughput 2.84222K wps
[Epoch 44 Batch 1380/1540] avg loss 0.00173636, throughput 2.8411K wps
[Epoch 44 Batch 1410/1540] avg loss 0.00171886, throughput 2.80198K wps
[Epoch 44 Batch 1440/1540] avg loss 0.00177793, throughput 2.84164K wps
[Epoch 44 Batch 1470/1540] avg loss 0.00190187, throughput 2.84213K wps
[Epoch 44 Batch 1500/1540] avg loss 0.00174644, throughput 2.84687K wps
[Epoch 44 Batch 1530/1540] avg loss 0.00191253, throughput 2.84609K wps
Begin Testing...
[Epoch 44] train avg loss 0.00178173, dev acc 0.8268, dev avg loss 0.589262, throughput 2.83003K wps
[Epoch 45 Batch 30/1540] avg loss 0.00157983, throughput 2.81731K wps
[Epoch 45 Batch 60/1540] avg loss 0.0015631, throughput 2.82254K wps
[Epoch 45 Batch 90/1540] avg loss 0.00197619, throughput 2.81471K wps
[Epoch 45 Batch 120/1540] avg loss 0.00179646, throughput 2.85049K wps
[Epoch 45 Batch 150/1540] avg loss 0.00190015, throughput 2.84728K wps
[Epoch 45 Batch 180/1540] avg loss 0.00156019, throughput 2.85269K wps
[Epoch 45 Batch 210/1540] avg loss 0.00176334, throughput 2.82789K wps
[Epoch 45 Batch 240/1540] avg loss 0.00147542, throughput 2.81903K wps
[Epoch 45 Batch 270/1540] avg loss 0.00158742, throughput 2.82912K wps
[Epoch 45 Batch 300/1540] avg loss 0.00166777, throughput 2.79652K wps
[Epoch 45 Batch 330/1540] avg loss 0.00176382, throughput 2.76446K wps
[Epoch 45 Batch 360/1540] avg loss 0.00156813, throughput 2.76805K wps
[Epoch 45 Batch 390/1540] avg loss 0.00187551, throughput 2.85122K wps
[Epoch 45 Batch 420/1540] avg loss 0.00157507, throughput 2.78081K wps
[Epoch 45 Batch 450/1540] avg loss 0.0014888, throughput 2.76807K wps
[Epoch 45 Batch 480/1540] avg loss 0.00176818, throughput 2.80703K wps
[Epoch 45 Batch 510/1540] avg loss 0.00194032, throughput 2.84417K wps
[Epoch 45 Batch 540/1540] avg loss 0.00170864, throughput 2.80319K wps
[Epoch 45 Batch 570/1540] avg loss 0.00153496, throughput 2.82839K wps
[Epoch 45 Batch 600/1540] avg loss 0.00151534, throughput 2.79199K wps
[Epoch 45 Batch 630/1540] avg loss 0.00213163, throughput 2.76921K wps
[Epoch 45 Batch 660/1540] avg loss 0.00155437, throughput 2.84754K wps
[Epoch 45 Batch 690/1540] avg loss 0.00170628, throughput 2.84177K wps
[Epoch 45 Batch 720/1540] avg loss 0.00184801, throughput 2.75988K wps
[Epoch 45 Batch 750/1540] avg loss 0.00191022, throughput 2.77943K wps
[Epoch 45 Batch 780/1540] avg loss 0.00179934, throughput 2.84408K wps
[Epoch 45 Batch 810/1540] avg loss 0.00192074, throughput 2.80029K wps
[Epoch 45 Batch 840/1540] avg loss 0.00202899, throughput 2.81805K wps
[Epoch 45 Batch 870/1540] avg loss 0.00206718, throughput 2.80605K wps
[Epoch 45 Batch 900/1540] avg loss 0.00141428, throughput 2.7814K wps
[Epoch 45 Batch 930/1540] avg loss 0.00172749, throughput 2.83657K wps
[Epoch 45 Batch 960/1540] avg loss 0.00186978, throughput 2.79016K wps
[Epoch 45 Batch 990/1540] avg loss 0.00199565, throughput 2.82556K wps
[Epoch 45 Batch 1020/1540] avg loss 0.00185484, throughput 2.85015K wps
[Epoch 45 Batch 1050/1540] avg loss 0.00172467, throughput 2.79674K wps
[Epoch 45 Batch 1080/1540] avg loss 0.00180564, throughput 2.7612K wps
[Epoch 45 Batch 1110/1540] avg loss 0.00202215, throughput 2.76625K wps
[Epoch 45 Batch 1140/1540] avg loss 0.00200665, throughput 2.75719K wps
[Epoch 45 Batch 1170/1540] avg loss 0.00187281, throughput 2.81654K wps
[Epoch 45 Batch 1200/1540] avg loss 0.00186937, throughput 2.8483K wps
[Epoch 45 Batch 1230/1540] avg loss 0.0016886, throughput 2.79517K wps
[Epoch 45 Batch 1260/1540] avg loss 0.00215222, throughput 2.80228K wps
[Epoch 45 Batch 1290/1540] avg loss 0.00177747, throughput 2.84478K wps
[Epoch 45 Batch 1320/1540] avg loss 0.00184888, throughput 2.84581K wps
[Epoch 45 Batch 1350/1540] avg loss 0.00196658, throughput 2.82055K wps
[Epoch 45 Batch 1380/1540] avg loss 0.00170618, throughput 2.85052K wps
[Epoch 45 Batch 1410/1540] avg loss 0.00191049, throughput 2.84645K wps
[Epoch 45 Batch 1440/1540] avg loss 0.00178791, throughput 2.85006K wps
[Epoch 45 Batch 1470/1540] avg loss 0.00165304, throughput 2.84406K wps
[Epoch 45 Batch 1500/1540] avg loss 0.00174282, throughput 2.84422K wps
[Epoch 45 Batch 1530/1540] avg loss 0.00175429, throughput 2.75839K wps
Begin Testing...
[Epoch 45] train avg loss 0.00178184, dev acc 0.8291, dev avg loss 0.586842, throughput 2.81285K wps
[Epoch 46 Batch 30/1540] avg loss 0.00147409, throughput 2.89702K wps
[Epoch 46 Batch 60/1540] avg loss 0.00173521, throughput 2.81805K wps
[Epoch 46 Batch 90/1540] avg loss 0.00156336, throughput 2.8242K wps
[Epoch 46 Batch 120/1540] avg loss 0.00160282, throughput 2.76519K wps
[Epoch 46 Batch 150/1540] avg loss 0.00170582, throughput 2.7707K wps
[Epoch 46 Batch 180/1540] avg loss 0.00161456, throughput 2.84358K wps
[Epoch 46 Batch 210/1540] avg loss 0.00152925, throughput 2.84895K wps
[Epoch 46 Batch 240/1540] avg loss 0.00194698, throughput 2.84013K wps
[Epoch 46 Batch 270/1540] avg loss 0.00167527, throughput 2.84273K wps
[Epoch 46 Batch 300/1540] avg loss 0.00159382, throughput 2.84732K wps
[Epoch 46 Batch 330/1540] avg loss 0.00192328, throughput 2.84095K wps
[Epoch 46 Batch 360/1540] avg loss 0.00179198, throughput 2.82462K wps
[Epoch 46 Batch 390/1540] avg loss 0.00163346, throughput 2.84975K wps
[Epoch 46 Batch 420/1540] avg loss 0.00179513, throughput 2.84746K wps
[Epoch 46 Batch 450/1540] avg loss 0.00153566, throughput 2.84322K wps
[Epoch 46 Batch 480/1540] avg loss 0.00189614, throughput 2.85025K wps
[Epoch 46 Batch 510/1540] avg loss 0.00189772, throughput 2.83617K wps
[Epoch 46 Batch 540/1540] avg loss 0.00186253, throughput 2.84638K wps
[Epoch 46 Batch 570/1540] avg loss 0.00182471, throughput 2.82721K wps
[Epoch 46 Batch 600/1540] avg loss 0.00161076, throughput 2.84496K wps
[Epoch 46 Batch 630/1540] avg loss 0.00196858, throughput 2.84122K wps
[Epoch 46 Batch 660/1540] avg loss 0.0015905, throughput 2.75926K wps
[Epoch 46 Batch 690/1540] avg loss 0.0017268, throughput 2.80736K wps
[Epoch 46 Batch 720/1540] avg loss 0.0017916, throughput 2.80709K wps
[Epoch 46 Batch 750/1540] avg loss 0.00150229, throughput 2.80113K wps
[Epoch 46 Batch 780/1540] avg loss 0.00184197, throughput 2.76267K wps
[Epoch 46 Batch 810/1540] avg loss 0.00185984, throughput 2.7633K wps
[Epoch 46 Batch 840/1540] avg loss 0.00165313, throughput 2.76656K wps
[Epoch 46 Batch 870/1540] avg loss 0.00130515, throughput 2.84677K wps
[Epoch 46 Batch 900/1540] avg loss 0.00157637, throughput 2.85337K wps
[Epoch 46 Batch 930/1540] avg loss 0.00154394, throughput 2.84894K wps
[Epoch 46 Batch 960/1540] avg loss 0.00179946, throughput 2.83033K wps
[Epoch 46 Batch 990/1540] avg loss 0.00168785, throughput 2.75766K wps
[Epoch 46 Batch 1020/1540] avg loss 0.00219196, throughput 2.79303K wps
[Epoch 46 Batch 1050/1540] avg loss 0.0018073, throughput 2.82128K wps
[Epoch 46 Batch 1080/1540] avg loss 0.00183545, throughput 2.82896K wps
[Epoch 46 Batch 1110/1540] avg loss 0.00147237, throughput 2.80337K wps
[Epoch 46 Batch 1140/1540] avg loss 0.00162573, throughput 2.83142K wps
[Epoch 46 Batch 1170/1540] avg loss 0.00198807, throughput 2.84859K wps
[Epoch 46 Batch 1200/1540] avg loss 0.00188853, throughput 2.82854K wps
[Epoch 46 Batch 1230/1540] avg loss 0.00179687, throughput 2.76089K wps
[Epoch 46 Batch 1260/1540] avg loss 0.00176508, throughput 2.8257K wps
[Epoch 46 Batch 1290/1540] avg loss 0.00188526, throughput 2.84927K wps
[Epoch 46 Batch 1320/1540] avg loss 0.00201933, throughput 2.84806K wps
[Epoch 46 Batch 1350/1540] avg loss 0.00161903, throughput 2.82864K wps
[Epoch 46 Batch 1380/1540] avg loss 0.0014332, throughput 2.84685K wps
[Epoch 46 Batch 1410/1540] avg loss 0.00178997, throughput 2.76884K wps
[Epoch 46 Batch 1440/1540] avg loss 0.00174591, throughput 2.8457K wps
[Epoch 46 Batch 1470/1540] avg loss 0.00211309, throughput 2.81955K wps
[Epoch 46 Batch 1500/1540] avg loss 0.0016802, throughput 2.80348K wps
[Epoch 46 Batch 1530/1540] avg loss 0.00216291, throughput 2.7501K wps
Begin Testing...
[Epoch 46] train avg loss 0.00174422, dev acc 0.8360, dev avg loss 0.58958, throughput 2.81993K wps
[Epoch 47 Batch 30/1540] avg loss 0.00163691, throughput 2.87142K wps
[Epoch 47 Batch 60/1540] avg loss 0.00139114, throughput 2.84938K wps
[Epoch 47 Batch 90/1540] avg loss 0.00186557, throughput 2.8486K wps
[Epoch 47 Batch 120/1540] avg loss 0.00161256, throughput 2.8036K wps
[Epoch 47 Batch 150/1540] avg loss 0.0015005, throughput 2.84755K wps
[Epoch 47 Batch 180/1540] avg loss 0.0017337, throughput 2.85197K wps
[Epoch 47 Batch 210/1540] avg loss 0.00180075, throughput 2.85467K wps
[Epoch 47 Batch 240/1540] avg loss 0.00191397, throughput 2.83878K wps
[Epoch 47 Batch 270/1540] avg loss 0.00183209, throughput 2.77549K wps
[Epoch 47 Batch 300/1540] avg loss 0.00168127, throughput 2.84673K wps
[Epoch 47 Batch 330/1540] avg loss 0.00183929, throughput 2.79055K wps
[Epoch 47 Batch 360/1540] avg loss 0.00154469, throughput 2.79068K wps
[Epoch 47 Batch 390/1540] avg loss 0.00188554, throughput 2.83198K wps
[Epoch 47 Batch 420/1540] avg loss 0.00171006, throughput 2.85455K wps
[Epoch 47 Batch 450/1540] avg loss 0.00165018, throughput 2.8446K wps
[Epoch 47 Batch 480/1540] avg loss 0.00182606, throughput 2.79942K wps
[Epoch 47 Batch 510/1540] avg loss 0.00174697, throughput 2.85399K wps
[Epoch 47 Batch 540/1540] avg loss 0.00183028, throughput 2.84792K wps
[Epoch 47 Batch 570/1540] avg loss 0.0014215, throughput 2.84328K wps
[Epoch 47 Batch 600/1540] avg loss 0.00188789, throughput 2.84686K wps
[Epoch 47 Batch 630/1540] avg loss 0.00183682, throughput 2.84843K wps
[Epoch 47 Batch 660/1540] avg loss 0.0020193, throughput 2.84694K wps
[Epoch 47 Batch 690/1540] avg loss 0.00165595, throughput 2.85225K wps
[Epoch 47 Batch 720/1540] avg loss 0.0019367, throughput 2.83171K wps
[Epoch 47 Batch 750/1540] avg loss 0.00173768, throughput 2.76113K wps
[Epoch 47 Batch 780/1540] avg loss 0.00178397, throughput 2.74608K wps
[Epoch 47 Batch 810/1540] avg loss 0.00145643, throughput 2.77092K wps
[Epoch 47 Batch 840/1540] avg loss 0.00153157, throughput 2.83662K wps
[Epoch 47 Batch 870/1540] avg loss 0.00163334, throughput 2.8474K wps
[Epoch 47 Batch 900/1540] avg loss 0.00198789, throughput 2.842K wps
[Epoch 47 Batch 930/1540] avg loss 0.00164693, throughput 2.84156K wps
[Epoch 47 Batch 960/1540] avg loss 0.00152432, throughput 2.84805K wps
[Epoch 47 Batch 990/1540] avg loss 0.00198459, throughput 2.84672K wps
[Epoch 47 Batch 1020/1540] avg loss 0.00154518, throughput 2.8282K wps
[Epoch 47 Batch 1050/1540] avg loss 0.00201117, throughput 2.76173K wps
[Epoch 47 Batch 1080/1540] avg loss 0.00141067, throughput 2.81315K wps
[Epoch 47 Batch 1110/1540] avg loss 0.00151271, throughput 2.83064K wps
[Epoch 47 Batch 1140/1540] avg loss 0.00224281, throughput 2.84524K wps
[Epoch 47 Batch 1170/1540] avg loss 0.0016911, throughput 2.78957K wps
[Epoch 47 Batch 1200/1540] avg loss 0.00161763, throughput 2.77492K wps
[Epoch 47 Batch 1230/1540] avg loss 0.00184925, throughput 2.8004K wps
[Epoch 47 Batch 1260/1540] avg loss 0.00184444, throughput 2.75854K wps
[Epoch 47 Batch 1290/1540] avg loss 0.00195287, throughput 2.75992K wps
[Epoch 47 Batch 1320/1540] avg loss 0.00157338, throughput 2.85135K wps
[Epoch 47 Batch 1350/1540] avg loss 0.0017635, throughput 2.84434K wps
[Epoch 47 Batch 1380/1540] avg loss 0.0017674, throughput 2.8265K wps
[Epoch 47 Batch 1410/1540] avg loss 0.00164877, throughput 2.7818K wps
[Epoch 47 Batch 1440/1540] avg loss 0.00168551, throughput 2.77144K wps
[Epoch 47 Batch 1470/1540] avg loss 0.00180154, throughput 2.85122K wps
[Epoch 47 Batch 1500/1540] avg loss 0.00168792, throughput 2.85403K wps
[Epoch 47 Batch 1530/1540] avg loss 0.0017146, throughput 2.82469K wps
Begin Testing...
[Epoch 47] train avg loss 0.0017387, dev acc 0.8280, dev avg loss 0.617909, throughput 2.82229K wps
[Epoch 48 Batch 30/1540] avg loss 0.00157421, throughput 2.86742K wps
[Epoch 48 Batch 60/1540] avg loss 0.00141297, throughput 2.76247K wps
[Epoch 48 Batch 90/1540] avg loss 0.00157519, throughput 2.76514K wps
[Epoch 48 Batch 120/1540] avg loss 0.00172086, throughput 2.76275K wps
[Epoch 48 Batch 150/1540] avg loss 0.00156541, throughput 2.7597K wps
[Epoch 48 Batch 180/1540] avg loss 0.00139209, throughput 2.7836K wps
[Epoch 48 Batch 210/1540] avg loss 0.00173064, throughput 2.84952K wps
[Epoch 48 Batch 240/1540] avg loss 0.00150403, throughput 2.80943K wps
[Epoch 48 Batch 270/1540] avg loss 0.00141145, throughput 2.8459K wps
[Epoch 48 Batch 300/1540] avg loss 0.00157545, throughput 2.8466K wps
[Epoch 48 Batch 330/1540] avg loss 0.00174973, throughput 2.85138K wps
[Epoch 48 Batch 360/1540] avg loss 0.00167794, throughput 2.84136K wps
[Epoch 48 Batch 390/1540] avg loss 0.00169905, throughput 2.84238K wps
[Epoch 48 Batch 420/1540] avg loss 0.00139207, throughput 2.84841K wps
[Epoch 48 Batch 450/1540] avg loss 0.00176407, throughput 2.77023K wps
[Epoch 48 Batch 480/1540] avg loss 0.00145657, throughput 2.83137K wps
[Epoch 48 Batch 510/1540] avg loss 0.00189333, throughput 2.84906K wps
[Epoch 48 Batch 540/1540] avg loss 0.00168138, throughput 2.85484K wps
[Epoch 48 Batch 570/1540] avg loss 0.00159801, throughput 2.78501K wps
[Epoch 48 Batch 600/1540] avg loss 0.00170602, throughput 2.80664K wps
[Epoch 48 Batch 630/1540] avg loss 0.00161116, throughput 2.81189K wps
[Epoch 48 Batch 660/1540] avg loss 0.00159013, throughput 2.75665K wps
[Epoch 48 Batch 690/1540] avg loss 0.00152025, throughput 2.79303K wps
[Epoch 48 Batch 720/1540] avg loss 0.00166793, throughput 2.82639K wps
[Epoch 48 Batch 750/1540] avg loss 0.00157596, throughput 2.83278K wps
[Epoch 48 Batch 780/1540] avg loss 0.00149309, throughput 2.84094K wps
[Epoch 48 Batch 810/1540] avg loss 0.00143458, throughput 2.84426K wps
[Epoch 48 Batch 840/1540] avg loss 0.00173279, throughput 2.8113K wps
[Epoch 48 Batch 870/1540] avg loss 0.0016783, throughput 2.80515K wps
[Epoch 48 Batch 900/1540] avg loss 0.0017051, throughput 2.83491K wps
[Epoch 48 Batch 930/1540] avg loss 0.00172491, throughput 2.84025K wps
[Epoch 48 Batch 960/1540] avg loss 0.00203778, throughput 2.76208K wps
[Epoch 48 Batch 990/1540] avg loss 0.00170076, throughput 2.80447K wps
[Epoch 48 Batch 1020/1540] avg loss 0.00172106, throughput 2.84544K wps
[Epoch 48 Batch 1050/1540] avg loss 0.00158013, throughput 2.8504K wps
[Epoch 48 Batch 1080/1540] avg loss 0.00169774, throughput 2.84828K wps
[Epoch 48 Batch 1110/1540] avg loss 0.0017865, throughput 2.84991K wps
[Epoch 48 Batch 1140/1540] avg loss 0.00182178, throughput 2.84784K wps
[Epoch 48 Batch 1170/1540] avg loss 0.0017341, throughput 2.83592K wps
[Epoch 48 Batch 1200/1540] avg loss 0.00159297, throughput 2.83862K wps
[Epoch 48 Batch 1230/1540] avg loss 0.00179804, throughput 2.7598K wps
[Epoch 48 Batch 1260/1540] avg loss 0.00173032, throughput 2.85121K wps
[Epoch 48 Batch 1290/1540] avg loss 0.00158572, throughput 2.84434K wps
[Epoch 48 Batch 1320/1540] avg loss 0.00175016, throughput 2.8139K wps
[Epoch 48 Batch 1350/1540] avg loss 0.00177366, throughput 2.8258K wps
[Epoch 48 Batch 1380/1540] avg loss 0.00213275, throughput 2.81421K wps
[Epoch 48 Batch 1410/1540] avg loss 0.0016954, throughput 2.84858K wps
[Epoch 48 Batch 1440/1540] avg loss 0.00170168, throughput 2.76971K wps
[Epoch 48 Batch 1470/1540] avg loss 0.00171106, throughput 2.7664K wps
[Epoch 48 Batch 1500/1540] avg loss 0.0018728, throughput 2.76918K wps
[Epoch 48 Batch 1530/1540] avg loss 0.00154093, throughput 2.84176K wps
Begin Testing...
[Epoch 48] train avg loss 0.00166113, dev acc 0.8291, dev avg loss 0.611878, throughput 2.81781K wps
[Epoch 49 Batch 30/1540] avg loss 0.00177039, throughput 2.87476K wps
[Epoch 49 Batch 60/1540] avg loss 0.00131665, throughput 2.76018K wps
[Epoch 49 Batch 90/1540] avg loss 0.00145712, throughput 2.83126K wps
[Epoch 49 Batch 120/1540] avg loss 0.00154301, throughput 2.84697K wps
[Epoch 49 Batch 150/1540] avg loss 0.00158216, throughput 2.84684K wps
[Epoch 49 Batch 180/1540] avg loss 0.001735, throughput 2.84676K wps
[Epoch 49 Batch 210/1540] avg loss 0.00140238, throughput 2.84725K wps
[Epoch 49 Batch 240/1540] avg loss 0.00174175, throughput 2.84289K wps
[Epoch 49 Batch 270/1540] avg loss 0.00160455, throughput 2.8449K wps
[Epoch 49 Batch 300/1540] avg loss 0.00177554, throughput 2.7982K wps
[Epoch 49 Batch 330/1540] avg loss 0.00153732, throughput 2.76357K wps
[Epoch 49 Batch 360/1540] avg loss 0.0018079, throughput 2.75971K wps
[Epoch 49 Batch 390/1540] avg loss 0.00183832, throughput 2.83841K wps
[Epoch 49 Batch 420/1540] avg loss 0.00173571, throughput 2.81668K wps
[Epoch 49 Batch 450/1540] avg loss 0.00180331, throughput 2.75598K wps
[Epoch 49 Batch 480/1540] avg loss 0.00172405, throughput 2.75765K wps
[Epoch 49 Batch 510/1540] avg loss 0.00132498, throughput 2.81589K wps
[Epoch 49 Batch 540/1540] avg loss 0.0016889, throughput 2.84197K wps
[Epoch 49 Batch 570/1540] avg loss 0.00156039, throughput 2.84928K wps
[Epoch 49 Batch 600/1540] avg loss 0.00177264, throughput 2.84266K wps
[Epoch 49 Batch 630/1540] avg loss 0.00178169, throughput 2.84321K wps
[Epoch 49 Batch 660/1540] avg loss 0.00159155, throughput 2.84975K wps
[Epoch 49 Batch 690/1540] avg loss 0.00153601, throughput 2.84753K wps
[Epoch 49 Batch 720/1540] avg loss 0.0018778, throughput 2.84834K wps
[Epoch 49 Batch 750/1540] avg loss 0.00167684, throughput 2.84524K wps
[Epoch 49 Batch 780/1540] avg loss 0.00173923, throughput 2.84628K wps
[Epoch 49 Batch 810/1540] avg loss 0.00152999, throughput 2.83971K wps
[Epoch 49 Batch 840/1540] avg loss 0.00164438, throughput 2.83214K wps
[Epoch 49 Batch 870/1540] avg loss 0.00158827, throughput 2.80105K wps
[Epoch 49 Batch 900/1540] avg loss 0.00158016, throughput 2.76415K wps
[Epoch 49 Batch 930/1540] avg loss 0.00178833, throughput 2.83918K wps
[Epoch 49 Batch 960/1540] avg loss 0.00158941, throughput 2.80797K wps
[Epoch 49 Batch 990/1540] avg loss 0.00157309, throughput 2.828K wps
[Epoch 49 Batch 1020/1540] avg loss 0.00146714, throughput 2.77178K wps
[Epoch 49 Batch 1050/1540] avg loss 0.00153891, throughput 2.78043K wps
[Epoch 49 Batch 1080/1540] avg loss 0.00164426, throughput 2.82439K wps
[Epoch 49 Batch 1110/1540] avg loss 0.00171224, throughput 2.84306K wps
[Epoch 49 Batch 1140/1540] avg loss 0.00161167, throughput 2.81808K wps
[Epoch 49 Batch 1170/1540] avg loss 0.00197308, throughput 2.83905K wps
[Epoch 49 Batch 1200/1540] avg loss 0.00167355, throughput 2.84645K wps
[Epoch 49 Batch 1230/1540] avg loss 0.00172236, throughput 2.84273K wps
[Epoch 49 Batch 1260/1540] avg loss 0.00164472, throughput 2.81636K wps
[Epoch 49 Batch 1290/1540] avg loss 0.00178513, throughput 2.793K wps
[Epoch 49 Batch 1320/1540] avg loss 0.00218026, throughput 2.8263K wps
[Epoch 49 Batch 1350/1540] avg loss 0.00187527, throughput 2.75523K wps
[Epoch 49 Batch 1380/1540] avg loss 0.00152661, throughput 2.82078K wps
[Epoch 49 Batch 1410/1540] avg loss 0.00188437, throughput 2.83144K wps
[Epoch 49 Batch 1440/1540] avg loss 0.00161765, throughput 2.82221K wps
[Epoch 49 Batch 1470/1540] avg loss 0.00153014, throughput 2.84762K wps
[Epoch 49 Batch 1500/1540] avg loss 0.00171545, throughput 2.85234K wps
[Epoch 49 Batch 1530/1540] avg loss 0.00174662, throughput 2.83122K wps
Begin Testing...
[Epoch 49] train avg loss 0.00166701, dev acc 0.8245, dev avg loss 0.601545, throughput 2.82161K wps
[Epoch 50 Batch 30/1540] avg loss 0.00179161, throughput 2.82271K wps
[Epoch 50 Batch 60/1540] avg loss 0.00167236, throughput 2.82104K wps
[Epoch 50 Batch 90/1540] avg loss 0.00171548, throughput 2.84721K wps
[Epoch 50 Batch 120/1540] avg loss 0.0012232, throughput 2.85006K wps
[Epoch 50 Batch 150/1540] avg loss 0.00147763, throughput 2.84747K wps
[Epoch 50 Batch 180/1540] avg loss 0.00147464, throughput 2.79282K wps
[Epoch 50 Batch 210/1540] avg loss 0.00152249, throughput 2.842K wps
[Epoch 50 Batch 240/1540] avg loss 0.0016922, throughput 2.8109K wps
[Epoch 50 Batch 270/1540] avg loss 0.00185697, throughput 2.76162K wps
[Epoch 50 Batch 300/1540] avg loss 0.00169811, throughput 2.74133K wps
[Epoch 50 Batch 330/1540] avg loss 0.00169237, throughput 2.80934K wps
[Epoch 50 Batch 360/1540] avg loss 0.00176869, throughput 2.76372K wps
[Epoch 50 Batch 390/1540] avg loss 0.00146812, throughput 2.83554K wps
[Epoch 50 Batch 420/1540] avg loss 0.00179776, throughput 2.80714K wps
[Epoch 50 Batch 450/1540] avg loss 0.00150848, throughput 2.84179K wps
[Epoch 50 Batch 480/1540] avg loss 0.0016397, throughput 2.84575K wps
[Epoch 50 Batch 510/1540] avg loss 0.00179705, throughput 2.82695K wps
[Epoch 50 Batch 540/1540] avg loss 0.0015882, throughput 2.84166K wps
[Epoch 50 Batch 570/1540] avg loss 0.00168666, throughput 2.79771K wps
[Epoch 50 Batch 600/1540] avg loss 0.00178196, throughput 2.84886K wps
[Epoch 50 Batch 630/1540] avg loss 0.00183583, throughput 2.8396K wps
[Epoch 50 Batch 660/1540] avg loss 0.00151806, throughput 2.83853K wps
[Epoch 50 Batch 690/1540] avg loss 0.00151659, throughput 2.84572K wps
[Epoch 50 Batch 720/1540] avg loss 0.00140624, throughput 2.80003K wps
[Epoch 50 Batch 750/1540] avg loss 0.00164983, throughput 2.83757K wps
[Epoch 50 Batch 780/1540] avg loss 0.00135744, throughput 2.84124K wps
[Epoch 50 Batch 810/1540] avg loss 0.00176937, throughput 2.81277K wps
[Epoch 50 Batch 840/1540] avg loss 0.00135072, throughput 2.83128K wps
[Epoch 50 Batch 870/1540] avg loss 0.0016362, throughput 2.75675K wps
[Epoch 50 Batch 900/1540] avg loss 0.0016967, throughput 2.792K wps
[Epoch 50 Batch 930/1540] avg loss 0.0017411, throughput 2.76084K wps
[Epoch 50 Batch 960/1540] avg loss 0.00165788, throughput 2.76253K wps
[Epoch 50 Batch 990/1540] avg loss 0.00180782, throughput 2.81123K wps
[Epoch 50 Batch 1020/1540] avg loss 0.00164487, throughput 2.84517K wps
[Epoch 50 Batch 1050/1540] avg loss 0.00171162, throughput 2.85045K wps
[Epoch 50 Batch 1080/1540] avg loss 0.00149693, throughput 2.8445K wps
[Epoch 50 Batch 1110/1540] avg loss 0.00175627, throughput 2.84655K wps
[Epoch 50 Batch 1140/1540] avg loss 0.00184495, throughput 2.8499K wps
[Epoch 50 Batch 1170/1540] avg loss 0.0018682, throughput 2.85277K wps
[Epoch 50 Batch 1200/1540] avg loss 0.00155138, throughput 2.85273K wps
[Epoch 50 Batch 1230/1540] avg loss 0.00168087, throughput 2.8198K wps
[Epoch 50 Batch 1260/1540] avg loss 0.00169881, throughput 2.81776K wps
[Epoch 50 Batch 1290/1540] avg loss 0.00141347, throughput 2.84759K wps
[Epoch 50 Batch 1320/1540] avg loss 0.00150157, throughput 2.82296K wps
[Epoch 50 Batch 1350/1540] avg loss 0.00183672, throughput 2.76272K wps
[Epoch 50 Batch 1380/1540] avg loss 0.00170026, throughput 2.77389K wps
[Epoch 50 Batch 1410/1540] avg loss 0.00196499, throughput 2.80561K wps
[Epoch 50 Batch 1440/1540] avg loss 0.00151807, throughput 2.81264K wps
[Epoch 50 Batch 1470/1540] avg loss 0.00183877, throughput 2.84792K wps
[Epoch 50 Batch 1500/1540] avg loss 0.00174583, throughput 2.85125K wps
[Epoch 50 Batch 1530/1540] avg loss 0.00197142, throughput 2.75961K wps
Begin Testing...
[Epoch 50] train avg loss 0.00166154, dev acc 0.8280, dev avg loss 0.63177, throughput 2.81792K wps
[Epoch 51 Batch 30/1540] avg loss 0.00151125, throughput 2.85339K wps
[Epoch 51 Batch 60/1540] avg loss 0.00119668, throughput 2.85114K wps
[Epoch 51 Batch 90/1540] avg loss 0.00146068, throughput 2.83525K wps
[Epoch 51 Batch 120/1540] avg loss 0.00172259, throughput 2.79502K wps
[Epoch 51 Batch 150/1540] avg loss 0.00146142, throughput 2.78126K wps
[Epoch 51 Batch 180/1540] avg loss 0.00149249, throughput 2.84286K wps
[Epoch 51 Batch 210/1540] avg loss 0.00145464, throughput 2.79417K wps
[Epoch 51 Batch 240/1540] avg loss 0.00173541, throughput 2.84847K wps
[Epoch 51 Batch 270/1540] avg loss 0.00140182, throughput 2.84978K wps
[Epoch 51 Batch 300/1540] avg loss 0.00188427, throughput 2.84941K wps
[Epoch 51 Batch 330/1540] avg loss 0.00137971, throughput 2.76048K wps
[Epoch 51 Batch 360/1540] avg loss 0.00157328, throughput 2.7848K wps
[Epoch 51 Batch 390/1540] avg loss 0.00154389, throughput 2.82645K wps
[Epoch 51 Batch 420/1540] avg loss 0.00157013, throughput 2.77082K wps
[Epoch 51 Batch 450/1540] avg loss 0.00154786, throughput 2.79969K wps
[Epoch 51 Batch 480/1540] avg loss 0.00134483, throughput 2.83085K wps
[Epoch 51 Batch 510/1540] avg loss 0.00151099, throughput 2.84726K wps
[Epoch 51 Batch 540/1540] avg loss 0.00180141, throughput 2.80083K wps
[Epoch 51 Batch 570/1540] avg loss 0.00156936, throughput 2.78703K wps
[Epoch 51 Batch 600/1540] avg loss 0.00156464, throughput 2.83906K wps
[Epoch 51 Batch 630/1540] avg loss 0.00174208, throughput 2.84998K wps
[Epoch 51 Batch 660/1540] avg loss 0.00154295, throughput 2.83992K wps
[Epoch 51 Batch 690/1540] avg loss 0.00163549, throughput 2.84607K wps
[Epoch 51 Batch 720/1540] avg loss 0.00171235, throughput 2.84212K wps
[Epoch 51 Batch 750/1540] avg loss 0.00141223, throughput 2.8091K wps
[Epoch 51 Batch 780/1540] avg loss 0.00165101, throughput 2.83477K wps
[Epoch 51 Batch 810/1540] avg loss 0.00178944, throughput 2.84135K wps
[Epoch 51 Batch 840/1540] avg loss 0.00165502, throughput 2.79216K wps
[Epoch 51 Batch 870/1540] avg loss 0.00153707, throughput 2.80606K wps
[Epoch 51 Batch 900/1540] avg loss 0.00163699, throughput 2.79552K wps
[Epoch 51 Batch 930/1540] avg loss 0.00174346, throughput 2.85193K wps
[Epoch 51 Batch 960/1540] avg loss 0.00138808, throughput 2.84466K wps
[Epoch 51 Batch 990/1540] avg loss 0.00169055, throughput 2.85007K wps
[Epoch 51 Batch 1020/1540] avg loss 0.00169654, throughput 2.77776K wps
[Epoch 51 Batch 1050/1540] avg loss 0.00180496, throughput 2.83262K wps
[Epoch 51 Batch 1080/1540] avg loss 0.00163316, throughput 2.78235K wps
[Epoch 51 Batch 1110/1540] avg loss 0.00203204, throughput 2.83665K wps
[Epoch 51 Batch 1140/1540] avg loss 0.00156277, throughput 2.84695K wps
[Epoch 51 Batch 1170/1540] avg loss 0.00143962, throughput 2.78526K wps
[Epoch 51 Batch 1200/1540] avg loss 0.00156667, throughput 2.84595K wps
[Epoch 51 Batch 1230/1540] avg loss 0.00156675, throughput 2.81381K wps
[Epoch 51 Batch 1260/1540] avg loss 0.00183506, throughput 2.83762K wps
[Epoch 51 Batch 1290/1540] avg loss 0.00183059, throughput 2.84959K wps
[Epoch 51 Batch 1320/1540] avg loss 0.00178442, throughput 2.84632K wps
[Epoch 51 Batch 1350/1540] avg loss 0.00193653, throughput 2.84773K wps
[Epoch 51 Batch 1380/1540] avg loss 0.00145175, throughput 2.84196K wps
[Epoch 51 Batch 1410/1540] avg loss 0.00151019, throughput 2.84598K wps
[Epoch 51 Batch 1440/1540] avg loss 0.00180521, throughput 2.84905K wps
[Epoch 51 Batch 1470/1540] avg loss 0.0015918, throughput 2.84505K wps
[Epoch 51 Batch 1500/1540] avg loss 0.00155481, throughput 2.81386K wps
[Epoch 51 Batch 1530/1540] avg loss 0.0018042, throughput 2.84366K wps
Begin Testing...
[Epoch 51] train avg loss 0.00161205, dev acc 0.8303, dev avg loss 0.608659, throughput 2.82524K wps
[Epoch 52 Batch 30/1540] avg loss 0.00142586, throughput 2.86494K wps
[Epoch 52 Batch 60/1540] avg loss 0.00140589, throughput 2.83907K wps
[Epoch 52 Batch 90/1540] avg loss 0.00125522, throughput 2.84013K wps
[Epoch 52 Batch 120/1540] avg loss 0.00159256, throughput 2.84989K wps
[Epoch 52 Batch 150/1540] avg loss 0.00143898, throughput 2.84659K wps
[Epoch 52 Batch 180/1540] avg loss 0.00126827, throughput 2.854K wps
[Epoch 52 Batch 210/1540] avg loss 0.00163269, throughput 2.84341K wps
[Epoch 52 Batch 240/1540] avg loss 0.00164308, throughput 2.8455K wps
[Epoch 52 Batch 270/1540] avg loss 0.00168993, throughput 2.83197K wps
[Epoch 52 Batch 300/1540] avg loss 0.00151651, throughput 2.82318K wps
[Epoch 52 Batch 330/1540] avg loss 0.00151722, throughput 2.82377K wps
[Epoch 52 Batch 360/1540] avg loss 0.0017623, throughput 2.82488K wps
[Epoch 52 Batch 390/1540] avg loss 0.00151198, throughput 2.84115K wps
[Epoch 52 Batch 420/1540] avg loss 0.00151576, throughput 2.83981K wps
[Epoch 52 Batch 450/1540] avg loss 0.00183898, throughput 2.84586K wps
[Epoch 52 Batch 480/1540] avg loss 0.00170336, throughput 2.74957K wps
[Epoch 52 Batch 510/1540] avg loss 0.00172757, throughput 2.78163K wps
[Epoch 52 Batch 540/1540] avg loss 0.00140964, throughput 2.81998K wps
[Epoch 52 Batch 570/1540] avg loss 0.0016109, throughput 2.85306K wps
[Epoch 52 Batch 600/1540] avg loss 0.00169798, throughput 2.85055K wps
[Epoch 52 Batch 630/1540] avg loss 0.00166671, throughput 2.84625K wps
[Epoch 52 Batch 660/1540] avg loss 0.00172846, throughput 2.84117K wps
[Epoch 52 Batch 690/1540] avg loss 0.00127965, throughput 2.81731K wps
[Epoch 52 Batch 720/1540] avg loss 0.00175578, throughput 2.83886K wps
[Epoch 52 Batch 750/1540] avg loss 0.00177302, throughput 2.84291K wps
[Epoch 52 Batch 780/1540] avg loss 0.00141934, throughput 2.85233K wps
[Epoch 52 Batch 810/1540] avg loss 0.00181108, throughput 2.85322K wps
[Epoch 52 Batch 840/1540] avg loss 0.00192643, throughput 2.84202K wps
[Epoch 52 Batch 870/1540] avg loss 0.00165904, throughput 2.84405K wps
[Epoch 52 Batch 900/1540] avg loss 0.00154323, throughput 2.83943K wps
[Epoch 52 Batch 930/1540] avg loss 0.00141434, throughput 2.81647K wps
[Epoch 52 Batch 960/1540] avg loss 0.00160876, throughput 2.83178K wps
[Epoch 52 Batch 990/1540] avg loss 0.00154446, throughput 2.76548K wps
[Epoch 52 Batch 1020/1540] avg loss 0.00144492, throughput 2.81161K wps
[Epoch 52 Batch 1050/1540] avg loss 0.00150117, throughput 2.79861K wps
[Epoch 52 Batch 1080/1540] avg loss 0.00177814, throughput 2.82567K wps
[Epoch 52 Batch 1110/1540] avg loss 0.00163021, throughput 2.76446K wps
[Epoch 52 Batch 1140/1540] avg loss 0.00155893, throughput 2.77013K wps
[Epoch 52 Batch 1170/1540] avg loss 0.00172707, throughput 2.83988K wps
[Epoch 52 Batch 1200/1540] avg loss 0.0014556, throughput 2.84573K wps
[Epoch 52 Batch 1230/1540] avg loss 0.00167865, throughput 2.84573K wps
[Epoch 52 Batch 1260/1540] avg loss 0.00176365, throughput 2.8513K wps
[Epoch 52 Batch 1290/1540] avg loss 0.00168615, throughput 2.80232K wps
[Epoch 52 Batch 1320/1540] avg loss 0.00143145, throughput 2.79305K wps
[Epoch 52 Batch 1350/1540] avg loss 0.00146468, throughput 2.79889K wps
[Epoch 52 Batch 1380/1540] avg loss 0.00169367, throughput 2.85086K wps
[Epoch 52 Batch 1410/1540] avg loss 0.00159537, throughput 2.84624K wps
[Epoch 52 Batch 1440/1540] avg loss 0.00200999, throughput 2.84829K wps
[Epoch 52 Batch 1470/1540] avg loss 0.00169718, throughput 2.78699K wps
[Epoch 52 Batch 1500/1540] avg loss 0.0017861, throughput 2.85322K wps
[Epoch 52 Batch 1530/1540] avg loss 0.00167309, throughput 2.77687K wps
Begin Testing...
[Epoch 52] train avg loss 0.0016068, dev acc 0.8234, dev avg loss 0.62321, throughput 2.82712K wps
[Epoch 53 Batch 30/1540] avg loss 0.00168376, throughput 2.8786K wps
[Epoch 53 Batch 60/1540] avg loss 0.00149638, throughput 2.8381K wps
[Epoch 53 Batch 90/1540] avg loss 0.00133826, throughput 2.76899K wps
[Epoch 53 Batch 120/1540] avg loss 0.00124936, throughput 2.80589K wps
[Epoch 53 Batch 150/1540] avg loss 0.0014443, throughput 2.7863K wps
[Epoch 53 Batch 180/1540] avg loss 0.00159999, throughput 2.83944K wps
[Epoch 53 Batch 210/1540] avg loss 0.00150983, throughput 2.84718K wps
[Epoch 53 Batch 240/1540] avg loss 0.00169461, throughput 2.84574K wps
[Epoch 53 Batch 270/1540] avg loss 0.00147448, throughput 2.84345K wps
[Epoch 53 Batch 300/1540] avg loss 0.00170127, throughput 2.83737K wps
[Epoch 53 Batch 330/1540] avg loss 0.00168436, throughput 2.83109K wps
[Epoch 53 Batch 360/1540] avg loss 0.00163155, throughput 2.84221K wps
[Epoch 53 Batch 390/1540] avg loss 0.00149553, throughput 2.83846K wps
[Epoch 53 Batch 420/1540] avg loss 0.00132699, throughput 2.82597K wps
[Epoch 53 Batch 450/1540] avg loss 0.00138258, throughput 2.79991K wps
[Epoch 53 Batch 480/1540] avg loss 0.00185806, throughput 2.83817K wps
[Epoch 53 Batch 510/1540] avg loss 0.00152254, throughput 2.84123K wps
[Epoch 53 Batch 540/1540] avg loss 0.00141726, throughput 2.84K wps
[Epoch 53 Batch 570/1540] avg loss 0.00173816, throughput 2.84541K wps
[Epoch 53 Batch 600/1540] avg loss 0.00142296, throughput 2.81107K wps
[Epoch 53 Batch 630/1540] avg loss 0.00151006, throughput 2.78736K wps
[Epoch 53 Batch 660/1540] avg loss 0.00183238, throughput 2.80086K wps
[Epoch 53 Batch 690/1540] avg loss 0.00148359, throughput 2.84258K wps
[Epoch 53 Batch 720/1540] avg loss 0.001902, throughput 2.84302K wps
[Epoch 53 Batch 750/1540] avg loss 0.00158828, throughput 2.79331K wps
[Epoch 53 Batch 780/1540] avg loss 0.00135562, throughput 2.82127K wps
[Epoch 53 Batch 810/1540] avg loss 0.00155477, throughput 2.84593K wps
[Epoch 53 Batch 840/1540] avg loss 0.00150866, throughput 2.78666K wps
[Epoch 53 Batch 870/1540] avg loss 0.00130755, throughput 2.7598K wps
[Epoch 53 Batch 900/1540] avg loss 0.00171474, throughput 2.7854K wps
[Epoch 53 Batch 930/1540] avg loss 0.00150248, throughput 2.84583K wps
[Epoch 53 Batch 960/1540] avg loss 0.00147497, throughput 2.80937K wps
[Epoch 53 Batch 990/1540] avg loss 0.00159656, throughput 2.76767K wps
[Epoch 53 Batch 1020/1540] avg loss 0.00166414, throughput 2.75575K wps
[Epoch 53 Batch 1050/1540] avg loss 0.00173379, throughput 2.84732K wps
[Epoch 53 Batch 1080/1540] avg loss 0.00133014, throughput 2.80534K wps
[Epoch 53 Batch 1110/1540] avg loss 0.00159082, throughput 2.80562K wps
[Epoch 53 Batch 1140/1540] avg loss 0.00172253, throughput 2.84641K wps
[Epoch 53 Batch 1170/1540] avg loss 0.00156602, throughput 2.81122K wps
[Epoch 53 Batch 1200/1540] avg loss 0.00140825, throughput 2.84951K wps
[Epoch 53 Batch 1230/1540] avg loss 0.00166859, throughput 2.79038K wps
[Epoch 53 Batch 1260/1540] avg loss 0.00160276, throughput 2.78587K wps
[Epoch 53 Batch 1290/1540] avg loss 0.00186573, throughput 2.84927K wps
[Epoch 53 Batch 1320/1540] avg loss 0.00163318, throughput 2.8282K wps
[Epoch 53 Batch 1350/1540] avg loss 0.00162768, throughput 2.82966K wps
[Epoch 53 Batch 1380/1540] avg loss 0.00162749, throughput 2.83523K wps
[Epoch 53 Batch 1410/1540] avg loss 0.0017656, throughput 2.84831K wps
[Epoch 53 Batch 1440/1540] avg loss 0.00184393, throughput 2.76948K wps
[Epoch 53 Batch 1470/1540] avg loss 0.00135394, throughput 2.76864K wps
[Epoch 53 Batch 1500/1540] avg loss 0.00165903, throughput 2.81867K wps
[Epoch 53 Batch 1530/1540] avg loss 0.00161039, throughput 2.84467K wps
Begin Testing...
[Epoch 53] train avg loss 0.00157481, dev acc 0.8268, dev avg loss 0.613074, throughput 2.81913K wps
[Epoch 54 Batch 30/1540] avg loss 0.00178946, throughput 2.81625K wps
[Epoch 54 Batch 60/1540] avg loss 0.0014415, throughput 2.84777K wps
[Epoch 54 Batch 90/1540] avg loss 0.00138778, throughput 2.83282K wps
[Epoch 54 Batch 120/1540] avg loss 0.00144341, throughput 2.82598K wps
[Epoch 54 Batch 150/1540] avg loss 0.00146798, throughput 2.82361K wps
[Epoch 54 Batch 180/1540] avg loss 0.00129578, throughput 2.78763K wps
[Epoch 54 Batch 210/1540] avg loss 0.00152047, throughput 2.85229K wps
[Epoch 54 Batch 240/1540] avg loss 0.0015321, throughput 2.81367K wps
[Epoch 54 Batch 270/1540] avg loss 0.00116636, throughput 2.78047K wps
[Epoch 54 Batch 300/1540] avg loss 0.0014489, throughput 2.79254K wps
[Epoch 54 Batch 330/1540] avg loss 0.001484, throughput 2.82814K wps
[Epoch 54 Batch 360/1540] avg loss 0.00169392, throughput 2.84768K wps
[Epoch 54 Batch 390/1540] avg loss 0.00137311, throughput 2.84736K wps
[Epoch 54 Batch 420/1540] avg loss 0.00167045, throughput 2.85341K wps
[Epoch 54 Batch 450/1540] avg loss 0.00131751, throughput 2.82049K wps
[Epoch 54 Batch 480/1540] avg loss 0.00169674, throughput 2.82486K wps
[Epoch 54 Batch 510/1540] avg loss 0.00171225, throughput 2.77561K wps
[Epoch 54 Batch 540/1540] avg loss 0.00139223, throughput 2.81831K wps
[Epoch 54 Batch 570/1540] avg loss 0.00152077, throughput 2.79419K wps
[Epoch 54 Batch 600/1540] avg loss 0.00146333, throughput 2.81979K wps
[Epoch 54 Batch 630/1540] avg loss 0.00149765, throughput 2.85286K wps
[Epoch 54 Batch 660/1540] avg loss 0.00158359, throughput 2.8378K wps
[Epoch 54 Batch 690/1540] avg loss 0.00160761, throughput 2.80851K wps
[Epoch 54 Batch 720/1540] avg loss 0.00181784, throughput 2.85214K wps
[Epoch 54 Batch 750/1540] avg loss 0.00162168, throughput 2.85104K wps
[Epoch 54 Batch 780/1540] avg loss 0.00134077, throughput 2.84644K wps
[Epoch 54 Batch 810/1540] avg loss 0.00171043, throughput 2.84762K wps
[Epoch 54 Batch 840/1540] avg loss 0.00154186, throughput 2.84314K wps
[Epoch 54 Batch 870/1540] avg loss 0.00165858, throughput 2.84974K wps
[Epoch 54 Batch 900/1540] avg loss 0.00160691, throughput 2.7815K wps
[Epoch 54 Batch 930/1540] avg loss 0.001821, throughput 2.80309K wps
[Epoch 54 Batch 960/1540] avg loss 0.0017196, throughput 2.81729K wps
[Epoch 54 Batch 990/1540] avg loss 0.00165751, throughput 2.83679K wps
[Epoch 54 Batch 1020/1540] avg loss 0.00137689, throughput 2.7644K wps
[Epoch 54 Batch 1050/1540] avg loss 0.0017213, throughput 2.7658K wps
[Epoch 54 Batch 1080/1540] avg loss 0.00135313, throughput 2.76627K wps
[Epoch 54 Batch 1110/1540] avg loss 0.00160011, throughput 2.83572K wps
[Epoch 54 Batch 1140/1540] avg loss 0.00177548, throughput 2.85057K wps
[Epoch 54 Batch 1170/1540] avg loss 0.00162101, throughput 2.79283K wps
[Epoch 54 Batch 1200/1540] avg loss 0.00137545, throughput 2.84597K wps
[Epoch 54 Batch 1230/1540] avg loss 0.00148884, throughput 2.82992K wps
[Epoch 54 Batch 1260/1540] avg loss 0.00169516, throughput 2.7858K wps
[Epoch 54 Batch 1290/1540] avg loss 0.0015847, throughput 2.84717K wps
[Epoch 54 Batch 1320/1540] avg loss 0.00157971, throughput 2.76835K wps
[Epoch 54 Batch 1350/1540] avg loss 0.00166787, throughput 2.83423K wps
[Epoch 54 Batch 1380/1540] avg loss 0.00165, throughput 2.82296K wps
[Epoch 54 Batch 1410/1540] avg loss 0.00149737, throughput 2.75848K wps
[Epoch 54 Batch 1440/1540] avg loss 0.00143234, throughput 2.76238K wps
[Epoch 54 Batch 1470/1540] avg loss 0.00199275, throughput 2.83393K wps
[Epoch 54 Batch 1500/1540] avg loss 0.00157502, throughput 2.79965K wps
[Epoch 54 Batch 1530/1540] avg loss 0.00165244, throughput 2.75157K wps
Begin Testing...
[Epoch 54] train avg loss 0.00156316, dev acc 0.8268, dev avg loss 0.629573, throughput 2.81591K wps
[Epoch 55 Batch 30/1540] avg loss 0.00133662, throughput 2.83178K wps
[Epoch 55 Batch 60/1540] avg loss 0.00151888, throughput 2.8491K wps
[Epoch 55 Batch 90/1540] avg loss 0.00151651, throughput 2.85062K wps
[Epoch 55 Batch 120/1540] avg loss 0.00137905, throughput 2.82662K wps
[Epoch 55 Batch 150/1540] avg loss 0.00149852, throughput 2.84274K wps
[Epoch 55 Batch 180/1540] avg loss 0.00147379, throughput 2.84932K wps
[Epoch 55 Batch 210/1540] avg loss 0.0013382, throughput 2.84779K wps
[Epoch 55 Batch 240/1540] avg loss 0.00128212, throughput 2.84315K wps
[Epoch 55 Batch 270/1540] avg loss 0.00139871, throughput 2.84384K wps
[Epoch 55 Batch 300/1540] avg loss 0.00147507, throughput 2.83445K wps
[Epoch 55 Batch 330/1540] avg loss 0.00155571, throughput 2.8383K wps
[Epoch 55 Batch 360/1540] avg loss 0.00145163, throughput 2.8346K wps
[Epoch 55 Batch 390/1540] avg loss 0.00159744, throughput 2.80757K wps
[Epoch 55 Batch 420/1540] avg loss 0.00147355, throughput 2.75466K wps
[Epoch 55 Batch 450/1540] avg loss 0.00154238, throughput 2.80994K wps
[Epoch 55 Batch 480/1540] avg loss 0.00164533, throughput 2.81285K wps
[Epoch 55 Batch 510/1540] avg loss 0.0015781, throughput 2.84233K wps
[Epoch 55 Batch 540/1540] avg loss 0.00149444, throughput 2.85037K wps
[Epoch 55 Batch 570/1540] avg loss 0.00164203, throughput 2.85048K wps
[Epoch 55 Batch 600/1540] avg loss 0.00164791, throughput 2.84086K wps
[Epoch 55 Batch 630/1540] avg loss 0.00151126, throughput 2.80855K wps
[Epoch 55 Batch 660/1540] avg loss 0.00155201, throughput 2.84817K wps
[Epoch 55 Batch 690/1540] avg loss 0.00158611, throughput 2.85047K wps
[Epoch 55 Batch 720/1540] avg loss 0.0015446, throughput 2.83322K wps
[Epoch 55 Batch 750/1540] avg loss 0.00174323, throughput 2.79181K wps
[Epoch 55 Batch 780/1540] avg loss 0.00173553, throughput 2.83516K wps
[Epoch 55 Batch 810/1540] avg loss 0.0016255, throughput 2.84604K wps
[Epoch 55 Batch 840/1540] avg loss 0.00157427, throughput 2.83599K wps
[Epoch 55 Batch 870/1540] avg loss 0.00148228, throughput 2.80922K wps
[Epoch 55 Batch 900/1540] avg loss 0.00141311, throughput 2.80419K wps
[Epoch 55 Batch 930/1540] avg loss 0.00147897, throughput 2.79229K wps
[Epoch 55 Batch 960/1540] avg loss 0.00180527, throughput 2.78255K wps
[Epoch 55 Batch 990/1540] avg loss 0.00190032, throughput 2.82973K wps
[Epoch 55 Batch 1020/1540] avg loss 0.00161346, throughput 2.48098K wps
[Epoch 55 Batch 1050/1540] avg loss 0.00166528, throughput 2.78429K wps
[Epoch 55 Batch 1080/1540] avg loss 0.00144015, throughput 2.80527K wps
[Epoch 55 Batch 1110/1540] avg loss 0.00142505, throughput 2.75611K wps
[Epoch 55 Batch 1140/1540] avg loss 0.00138914, throughput 2.78635K wps
[Epoch 55 Batch 1170/1540] avg loss 0.00162126, throughput 2.80619K wps
[Epoch 55 Batch 1200/1540] avg loss 0.001489, throughput 2.84707K wps
[Epoch 55 Batch 1230/1540] avg loss 0.00165128, throughput 2.84905K wps
[Epoch 55 Batch 1260/1540] avg loss 0.00145809, throughput 2.84697K wps
[Epoch 55 Batch 1290/1540] avg loss 0.00177979, throughput 2.84618K wps
[Epoch 55 Batch 1320/1540] avg loss 0.00164275, throughput 2.77363K wps
[Epoch 55 Batch 1350/1540] avg loss 0.00170635, throughput 2.85105K wps
[Epoch 55 Batch 1380/1540] avg loss 0.00164928, throughput 2.79447K wps
[Epoch 55 Batch 1410/1540] avg loss 0.00171165, throughput 2.76143K wps
[Epoch 55 Batch 1440/1540] avg loss 0.00135586, throughput 2.83489K wps
[Epoch 55 Batch 1470/1540] avg loss 0.00150241, throughput 2.82812K wps
[Epoch 55 Batch 1500/1540] avg loss 0.00175656, throughput 2.84455K wps
[Epoch 55 Batch 1530/1540] avg loss 0.00155915, throughput 2.8486K wps
Begin Testing...
[Epoch 55] train avg loss 0.00155369, dev acc 0.8303, dev avg loss 0.63823, throughput 2.81619K wps
[Epoch 56 Batch 30/1540] avg loss 0.0015331, throughput 2.85753K wps
[Epoch 56 Batch 60/1540] avg loss 0.00118413, throughput 2.82124K wps
[Epoch 56 Batch 90/1540] avg loss 0.00151501, throughput 2.79249K wps
[Epoch 56 Batch 120/1540] avg loss 0.00147778, throughput 2.76015K wps
[Epoch 56 Batch 150/1540] avg loss 0.00119185, throughput 2.84451K wps
[Epoch 56 Batch 180/1540] avg loss 0.00154134, throughput 2.84635K wps
[Epoch 56 Batch 210/1540] avg loss 0.00146633, throughput 2.77495K wps
[Epoch 56 Batch 240/1540] avg loss 0.0015513, throughput 2.80422K wps
[Epoch 56 Batch 270/1540] avg loss 0.00143753, throughput 2.81172K wps
[Epoch 56 Batch 300/1540] avg loss 0.00123355, throughput 2.84852K wps
[Epoch 56 Batch 330/1540] avg loss 0.00168556, throughput 2.84608K wps
[Epoch 56 Batch 360/1540] avg loss 0.00156647, throughput 2.8499K wps
[Epoch 56 Batch 390/1540] avg loss 0.00142615, throughput 2.78002K wps
[Epoch 56 Batch 420/1540] avg loss 0.00150195, throughput 2.84245K wps
[Epoch 56 Batch 450/1540] avg loss 0.00131937, throughput 2.83202K wps
[Epoch 56 Batch 480/1540] avg loss 0.00125678, throughput 2.84669K wps
[Epoch 56 Batch 510/1540] avg loss 0.00158983, throughput 2.81876K wps
[Epoch 56 Batch 540/1540] avg loss 0.00145506, throughput 2.8168K wps
[Epoch 56 Batch 570/1540] avg loss 0.00158264, throughput 2.79597K wps
[Epoch 56 Batch 600/1540] avg loss 0.00173043, throughput 2.79564K wps
[Epoch 56 Batch 630/1540] avg loss 0.00162394, throughput 2.84553K wps
[Epoch 56 Batch 660/1540] avg loss 0.00141555, throughput 2.8462K wps
[Epoch 56 Batch 690/1540] avg loss 0.00141315, throughput 2.84586K wps
[Epoch 56 Batch 720/1540] avg loss 0.00145466, throughput 2.84837K wps
[Epoch 56 Batch 750/1540] avg loss 0.00124388, throughput 2.83837K wps
[Epoch 56 Batch 780/1540] avg loss 0.00170409, throughput 2.75808K wps
[Epoch 56 Batch 810/1540] avg loss 0.00133358, throughput 2.80213K wps
[Epoch 56 Batch 840/1540] avg loss 0.00149186, throughput 2.81831K wps
[Epoch 56 Batch 870/1540] avg loss 0.00136365, throughput 2.84465K wps
[Epoch 56 Batch 900/1540] avg loss 0.00149302, throughput 2.85195K wps
[Epoch 56 Batch 930/1540] avg loss 0.00145843, throughput 2.81355K wps
[Epoch 56 Batch 960/1540] avg loss 0.00145392, throughput 2.83109K wps
[Epoch 56 Batch 990/1540] avg loss 0.00168563, throughput 2.84285K wps
[Epoch 56 Batch 1020/1540] avg loss 0.0015349, throughput 2.78014K wps
[Epoch 56 Batch 1050/1540] avg loss 0.00171125, throughput 2.79896K wps
[Epoch 56 Batch 1080/1540] avg loss 0.00176983, throughput 2.83635K wps
[Epoch 56 Batch 1110/1540] avg loss 0.00148982, throughput 2.82366K wps
[Epoch 56 Batch 1140/1540] avg loss 0.00165265, throughput 2.85053K wps
[Epoch 56 Batch 1170/1540] avg loss 0.00168073, throughput 2.82706K wps
[Epoch 56 Batch 1200/1540] avg loss 0.00136367, throughput 2.75548K wps
[Epoch 56 Batch 1230/1540] avg loss 0.00195877, throughput 2.83608K wps
[Epoch 56 Batch 1260/1540] avg loss 0.00151219, throughput 2.8461K wps
[Epoch 56 Batch 1290/1540] avg loss 0.00141333, throughput 2.84757K wps
[Epoch 56 Batch 1320/1540] avg loss 0.00152341, throughput 2.8457K wps
[Epoch 56 Batch 1350/1540] avg loss 0.00153618, throughput 2.85183K wps
[Epoch 56 Batch 1380/1540] avg loss 0.00178014, throughput 2.85046K wps
[Epoch 56 Batch 1410/1540] avg loss 0.00158831, throughput 2.84017K wps
[Epoch 56 Batch 1440/1540] avg loss 0.00184094, throughput 2.83312K wps
[Epoch 56 Batch 1470/1540] avg loss 0.00156626, throughput 2.78183K wps
[Epoch 56 Batch 1500/1540] avg loss 0.00152188, throughput 2.76015K wps
[Epoch 56 Batch 1530/1540] avg loss 0.0015198, throughput 2.7625K wps
Begin Testing...
[Epoch 56] train avg loss 0.00151771, dev acc 0.8314, dev avg loss 0.633949, throughput 2.82091K wps
[Epoch 57 Batch 30/1540] avg loss 0.00131127, throughput 2.88266K wps
[Epoch 57 Batch 60/1540] avg loss 0.00150713, throughput 2.83075K wps
[Epoch 57 Batch 90/1540] avg loss 0.001537, throughput 2.79445K wps
[Epoch 57 Batch 120/1540] avg loss 0.00150302, throughput 2.85037K wps
[Epoch 57 Batch 150/1540] avg loss 0.00126267, throughput 2.84966K wps
[Epoch 57 Batch 180/1540] avg loss 0.00137075, throughput 2.84247K wps
[Epoch 57 Batch 210/1540] avg loss 0.00140543, throughput 2.83222K wps
[Epoch 57 Batch 240/1540] avg loss 0.00145209, throughput 2.81123K wps
[Epoch 57 Batch 270/1540] avg loss 0.0015316, throughput 2.80802K wps
[Epoch 57 Batch 300/1540] avg loss 0.00164144, throughput 2.82285K wps
[Epoch 57 Batch 330/1540] avg loss 0.00119377, throughput 2.83495K wps
[Epoch 57 Batch 360/1540] avg loss 0.00141041, throughput 2.77064K wps
[Epoch 57 Batch 390/1540] avg loss 0.00150508, throughput 2.83841K wps
[Epoch 57 Batch 420/1540] avg loss 0.00146869, throughput 2.78223K wps
[Epoch 57 Batch 450/1540] avg loss 0.00128478, throughput 2.81729K wps
[Epoch 57 Batch 480/1540] avg loss 0.00148726, throughput 2.79297K wps
[Epoch 57 Batch 510/1540] avg loss 0.00137567, throughput 2.76186K wps
[Epoch 57 Batch 540/1540] avg loss 0.00161441, throughput 2.76373K wps
[Epoch 57 Batch 570/1540] avg loss 0.00130073, throughput 2.76335K wps
[Epoch 57 Batch 600/1540] avg loss 0.00135898, throughput 2.82777K wps
[Epoch 57 Batch 630/1540] avg loss 0.00165538, throughput 2.82612K wps
[Epoch 57 Batch 660/1540] avg loss 0.00122295, throughput 2.74957K wps
[Epoch 57 Batch 690/1540] avg loss 0.00145649, throughput 2.78472K wps
[Epoch 57 Batch 720/1540] avg loss 0.00155079, throughput 2.84298K wps
[Epoch 57 Batch 750/1540] avg loss 0.00130448, throughput 2.84983K wps
[Epoch 57 Batch 780/1540] avg loss 0.00154914, throughput 2.8428K wps
[Epoch 57 Batch 810/1540] avg loss 0.00141971, throughput 2.84485K wps
[Epoch 57 Batch 840/1540] avg loss 0.00176175, throughput 2.80212K wps
[Epoch 57 Batch 870/1540] avg loss 0.00142603, throughput 2.77244K wps
[Epoch 57 Batch 900/1540] avg loss 0.00168632, throughput 2.82185K wps
[Epoch 57 Batch 930/1540] avg loss 0.00173562, throughput 2.84899K wps
[Epoch 57 Batch 960/1540] avg loss 0.00156402, throughput 2.85024K wps
[Epoch 57 Batch 990/1540] avg loss 0.00157854, throughput 2.85253K wps
[Epoch 57 Batch 1020/1540] avg loss 0.00167976, throughput 2.85412K wps
[Epoch 57 Batch 1050/1540] avg loss 0.00142776, throughput 2.84521K wps
[Epoch 57 Batch 1080/1540] avg loss 0.0016381, throughput 2.84522K wps
[Epoch 57 Batch 1110/1540] avg loss 0.00163753, throughput 2.84955K wps
[Epoch 57 Batch 1140/1540] avg loss 0.0013446, throughput 2.84531K wps
[Epoch 57 Batch 1170/1540] avg loss 0.00168202, throughput 2.85091K wps
[Epoch 57 Batch 1200/1540] avg loss 0.00152493, throughput 2.84743K wps
[Epoch 57 Batch 1230/1540] avg loss 0.00156918, throughput 2.84151K wps
[Epoch 57 Batch 1260/1540] avg loss 0.0016723, throughput 2.84024K wps
[Epoch 57 Batch 1290/1540] avg loss 0.00164287, throughput 2.78287K wps
[Epoch 57 Batch 1320/1540] avg loss 0.00167939, throughput 2.814K wps
[Epoch 57 Batch 1350/1540] avg loss 0.00148443, throughput 2.83881K wps
[Epoch 57 Batch 1380/1540] avg loss 0.00158905, throughput 2.84682K wps
[Epoch 57 Batch 1410/1540] avg loss 0.00185043, throughput 2.83497K wps
[Epoch 57 Batch 1440/1540] avg loss 0.00143833, throughput 2.79092K wps
[Epoch 57 Batch 1470/1540] avg loss 0.00147752, throughput 2.81276K wps
[Epoch 57 Batch 1500/1540] avg loss 0.0018646, throughput 2.85137K wps
[Epoch 57 Batch 1530/1540] avg loss 0.00159179, throughput 2.84617K wps
Begin Testing...
[Epoch 57] train avg loss 0.00151246, dev acc 0.8314, dev avg loss 0.640572, throughput 2.82346K wps
[Epoch 58 Batch 30/1540] avg loss 0.00138712, throughput 2.82191K wps
[Epoch 58 Batch 60/1540] avg loss 0.00135837, throughput 2.76111K wps
[Epoch 58 Batch 90/1540] avg loss 0.00145648, throughput 2.82713K wps
[Epoch 58 Batch 120/1540] avg loss 0.0014006, throughput 2.82972K wps
[Epoch 58 Batch 150/1540] avg loss 0.00139887, throughput 2.76145K wps
[Epoch 58 Batch 180/1540] avg loss 0.00131214, throughput 2.77852K wps
[Epoch 58 Batch 210/1540] avg loss 0.00143023, throughput 2.85106K wps
[Epoch 58 Batch 240/1540] avg loss 0.00146579, throughput 2.84944K wps
[Epoch 58 Batch 270/1540] avg loss 0.00142265, throughput 2.85113K wps
[Epoch 58 Batch 300/1540] avg loss 0.00114333, throughput 2.84779K wps
[Epoch 58 Batch 330/1540] avg loss 0.00148896, throughput 2.79686K wps
[Epoch 58 Batch 360/1540] avg loss 0.00152627, throughput 2.79829K wps
[Epoch 58 Batch 390/1540] avg loss 0.00133097, throughput 2.85293K wps
[Epoch 58 Batch 420/1540] avg loss 0.00160774, throughput 2.80743K wps
[Epoch 58 Batch 450/1540] avg loss 0.00142148, throughput 2.84302K wps
[Epoch 58 Batch 480/1540] avg loss 0.00141421, throughput 2.84286K wps
[Epoch 58 Batch 510/1540] avg loss 0.00168753, throughput 2.84176K wps
[Epoch 58 Batch 540/1540] avg loss 0.00160939, throughput 2.84566K wps
[Epoch 58 Batch 570/1540] avg loss 0.00133213, throughput 2.84867K wps
[Epoch 58 Batch 600/1540] avg loss 0.00157643, throughput 2.84074K wps
[Epoch 58 Batch 630/1540] avg loss 0.00135991, throughput 2.84372K wps
[Epoch 58 Batch 660/1540] avg loss 0.00149075, throughput 2.78156K wps
[Epoch 58 Batch 690/1540] avg loss 0.00132577, throughput 2.81602K wps
[Epoch 58 Batch 720/1540] avg loss 0.00164531, throughput 2.82656K wps
[Epoch 58 Batch 750/1540] avg loss 0.0015118, throughput 2.81818K wps
[Epoch 58 Batch 780/1540] avg loss 0.00126599, throughput 2.85545K wps
[Epoch 58 Batch 810/1540] avg loss 0.00137705, throughput 2.85074K wps
[Epoch 58 Batch 840/1540] avg loss 0.00127737, throughput 2.80719K wps
[Epoch 58 Batch 870/1540] avg loss 0.0013948, throughput 2.85347K wps
[Epoch 58 Batch 900/1540] avg loss 0.00132347, throughput 2.84732K wps
[Epoch 58 Batch 930/1540] avg loss 0.00132708, throughput 2.82047K wps
[Epoch 58 Batch 960/1540] avg loss 0.00157732, throughput 2.81038K wps
[Epoch 58 Batch 990/1540] avg loss 0.00165797, throughput 2.83232K wps
[Epoch 58 Batch 1020/1540] avg loss 0.00132239, throughput 2.78408K wps
[Epoch 58 Batch 1050/1540] avg loss 0.00142599, throughput 2.84629K wps
[Epoch 58 Batch 1080/1540] avg loss 0.0015613, throughput 2.8493K wps
[Epoch 58 Batch 1110/1540] avg loss 0.00163818, throughput 2.84766K wps
[Epoch 58 Batch 1140/1540] avg loss 0.00140377, throughput 2.85423K wps
[Epoch 58 Batch 1170/1540] avg loss 0.00171028, throughput 2.7745K wps
[Epoch 58 Batch 1200/1540] avg loss 0.00144547, throughput 2.82782K wps
[Epoch 58 Batch 1230/1540] avg loss 0.0014528, throughput 2.84027K wps
[Epoch 58 Batch 1260/1540] avg loss 0.00137822, throughput 2.7749K wps
[Epoch 58 Batch 1290/1540] avg loss 0.00146436, throughput 2.76773K wps
[Epoch 58 Batch 1320/1540] avg loss 0.00144912, throughput 2.84249K wps
[Epoch 58 Batch 1350/1540] avg loss 0.0015769, throughput 2.84684K wps
[Epoch 58 Batch 1380/1540] avg loss 0.00159537, throughput 2.8427K wps
[Epoch 58 Batch 1410/1540] avg loss 0.00169515, throughput 2.77942K wps
[Epoch 58 Batch 1440/1540] avg loss 0.00137708, throughput 2.76114K wps
[Epoch 58 Batch 1470/1540] avg loss 0.00133448, throughput 2.78693K wps
[Epoch 58 Batch 1500/1540] avg loss 0.00175897, throughput 2.79932K wps
[Epoch 58 Batch 1530/1540] avg loss 0.00149376, throughput 2.81968K wps
Begin Testing...
[Epoch 58] train avg loss 0.00145919, dev acc 0.8245, dev avg loss 0.654079, throughput 2.82094K wps
[Epoch 59 Batch 30/1540] avg loss 0.00139987, throughput 2.84397K wps
[Epoch 59 Batch 60/1540] avg loss 0.00130357, throughput 2.78358K wps
[Epoch 59 Batch 90/1540] avg loss 0.00113068, throughput 2.84125K wps
[Epoch 59 Batch 120/1540] avg loss 0.00123197, throughput 2.84734K wps
[Epoch 59 Batch 150/1540] avg loss 0.00137386, throughput 2.84499K wps
[Epoch 59 Batch 180/1540] avg loss 0.00128331, throughput 2.84681K wps
[Epoch 59 Batch 210/1540] avg loss 0.00157104, throughput 2.77153K wps
[Epoch 59 Batch 240/1540] avg loss 0.0014308, throughput 2.75958K wps
[Epoch 59 Batch 270/1540] avg loss 0.00133157, throughput 2.84517K wps
[Epoch 59 Batch 300/1540] avg loss 0.00143883, throughput 2.8435K wps
[Epoch 59 Batch 330/1540] avg loss 0.00148573, throughput 2.84628K wps
[Epoch 59 Batch 360/1540] avg loss 0.00152791, throughput 2.8427K wps
[Epoch 59 Batch 390/1540] avg loss 0.00139824, throughput 2.8433K wps
[Epoch 59 Batch 420/1540] avg loss 0.00155347, throughput 2.84446K wps
[Epoch 59 Batch 450/1540] avg loss 0.00138601, throughput 2.83643K wps
[Epoch 59 Batch 480/1540] avg loss 0.00126286, throughput 2.84852K wps
[Epoch 59 Batch 510/1540] avg loss 0.00156163, throughput 2.85063K wps
[Epoch 59 Batch 540/1540] avg loss 0.00145297, throughput 2.84547K wps
[Epoch 59 Batch 570/1540] avg loss 0.00143702, throughput 2.83747K wps
[Epoch 59 Batch 600/1540] avg loss 0.00143435, throughput 2.792K wps
[Epoch 59 Batch 630/1540] avg loss 0.00163598, throughput 2.82443K wps
[Epoch 59 Batch 660/1540] avg loss 0.00135749, throughput 2.84555K wps
[Epoch 59 Batch 690/1540] avg loss 0.00146952, throughput 2.84691K wps
[Epoch 59 Batch 720/1540] avg loss 0.00149412, throughput 2.84548K wps
[Epoch 59 Batch 750/1540] avg loss 0.00147079, throughput 2.8173K wps
[Epoch 59 Batch 780/1540] avg loss 0.0014942, throughput 2.81314K wps
[Epoch 59 Batch 810/1540] avg loss 0.0013903, throughput 2.85141K wps
[Epoch 59 Batch 840/1540] avg loss 0.00135859, throughput 2.82072K wps
[Epoch 59 Batch 870/1540] avg loss 0.00171363, throughput 2.77505K wps
[Epoch 59 Batch 900/1540] avg loss 0.00149877, throughput 2.83869K wps
[Epoch 59 Batch 930/1540] avg loss 0.00173806, throughput 2.80165K wps
[Epoch 59 Batch 960/1540] avg loss 0.00144088, throughput 2.75734K wps
[Epoch 59 Batch 990/1540] avg loss 0.00153407, throughput 2.76622K wps
[Epoch 59 Batch 1020/1540] avg loss 0.0016303, throughput 2.76745K wps
[Epoch 59 Batch 1050/1540] avg loss 0.00138672, throughput 2.78622K wps
[Epoch 59 Batch 1080/1540] avg loss 0.00157688, throughput 2.75365K wps
[Epoch 59 Batch 1110/1540] avg loss 0.00169469, throughput 2.83257K wps
[Epoch 59 Batch 1140/1540] avg loss 0.00151352, throughput 2.81719K wps
[Epoch 59 Batch 1170/1540] avg loss 0.00156639, throughput 2.77908K wps
[Epoch 59 Batch 1200/1540] avg loss 0.00127393, throughput 2.77162K wps
[Epoch 59 Batch 1230/1540] avg loss 0.00158113, throughput 2.84346K wps
[Epoch 59 Batch 1260/1540] avg loss 0.00156968, throughput 2.82398K wps
[Epoch 59 Batch 1290/1540] avg loss 0.00153688, throughput 2.82508K wps
[Epoch 59 Batch 1320/1540] avg loss 0.00185722, throughput 2.84346K wps
[Epoch 59 Batch 1350/1540] avg loss 0.00149464, throughput 2.79316K wps
[Epoch 59 Batch 1380/1540] avg loss 0.00168499, throughput 2.84645K wps
[Epoch 59 Batch 1410/1540] avg loss 0.00160863, throughput 2.76592K wps
[Epoch 59 Batch 1440/1540] avg loss 0.00124217, throughput 2.84925K wps
[Epoch 59 Batch 1470/1540] avg loss 0.00121381, throughput 2.85201K wps
[Epoch 59 Batch 1500/1540] avg loss 0.00159445, throughput 2.84762K wps
[Epoch 59 Batch 1530/1540] avg loss 0.00132675, throughput 2.81978K wps
Begin Testing...
[Epoch 59] train avg loss 0.00146896, dev acc 0.8349, dev avg loss 0.643498, throughput 2.82014K wps
[Epoch 60 Batch 30/1540] avg loss 0.00117369, throughput 2.81393K wps
[Epoch 60 Batch 60/1540] avg loss 0.0013757, throughput 2.76689K wps
[Epoch 60 Batch 90/1540] avg loss 0.00133022, throughput 2.79739K wps
[Epoch 60 Batch 120/1540] avg loss 0.00111949, throughput 2.85K wps
[Epoch 60 Batch 150/1540] avg loss 0.00146514, throughput 2.85416K wps
[Epoch 60 Batch 180/1540] avg loss 0.00133565, throughput 2.84607K wps
[Epoch 60 Batch 210/1540] avg loss 0.0012855, throughput 2.76251K wps
[Epoch 60 Batch 240/1540] avg loss 0.00112362, throughput 2.82932K wps
[Epoch 60 Batch 270/1540] avg loss 0.00147625, throughput 2.85016K wps
[Epoch 60 Batch 300/1540] avg loss 0.00123575, throughput 2.84891K wps
[Epoch 60 Batch 330/1540] avg loss 0.0011794, throughput 2.84917K wps
[Epoch 60 Batch 360/1540] avg loss 0.00119352, throughput 2.85233K wps
[Epoch 60 Batch 390/1540] avg loss 0.00159533, throughput 2.84902K wps
[Epoch 60 Batch 420/1540] avg loss 0.00138114, throughput 2.77759K wps
[Epoch 60 Batch 450/1540] avg loss 0.00128205, throughput 2.77449K wps
[Epoch 60 Batch 480/1540] avg loss 0.00169929, throughput 2.85081K wps
[Epoch 60 Batch 510/1540] avg loss 0.00135043, throughput 2.8447K wps
[Epoch 60 Batch 540/1540] avg loss 0.00176204, throughput 2.84263K wps
[Epoch 60 Batch 570/1540] avg loss 0.0015567, throughput 2.7925K wps
[Epoch 60 Batch 600/1540] avg loss 0.00130566, throughput 2.82993K wps
[Epoch 60 Batch 630/1540] avg loss 0.00134265, throughput 2.83387K wps
[Epoch 60 Batch 660/1540] avg loss 0.00158841, throughput 2.84235K wps
[Epoch 60 Batch 690/1540] avg loss 0.00137706, throughput 2.83701K wps
[Epoch 60 Batch 720/1540] avg loss 0.00157484, throughput 2.84308K wps
[Epoch 60 Batch 750/1540] avg loss 0.00134522, throughput 2.80415K wps
[Epoch 60 Batch 780/1540] avg loss 0.00146616, throughput 2.76855K wps
[Epoch 60 Batch 810/1540] avg loss 0.00112373, throughput 2.79586K wps
[Epoch 60 Batch 840/1540] avg loss 0.00165787, throughput 2.78755K wps
[Epoch 60 Batch 870/1540] avg loss 0.00163492, throughput 2.80522K wps
[Epoch 60 Batch 900/1540] avg loss 0.00137516, throughput 2.77978K wps
[Epoch 60 Batch 930/1540] avg loss 0.00138358, throughput 2.84302K wps
[Epoch 60 Batch 960/1540] avg loss 0.00161132, throughput 2.77964K wps
[Epoch 60 Batch 990/1540] avg loss 0.00134832, throughput 2.84252K wps
[Epoch 60 Batch 1020/1540] avg loss 0.00125976, throughput 2.85176K wps
[Epoch 60 Batch 1050/1540] avg loss 0.00156183, throughput 2.82649K wps
[Epoch 60 Batch 1080/1540] avg loss 0.00145084, throughput 2.7527K wps
[Epoch 60 Batch 1110/1540] avg loss 0.00161776, throughput 2.83056K wps
[Epoch 60 Batch 1140/1540] avg loss 0.0016163, throughput 2.82557K wps
[Epoch 60 Batch 1170/1540] avg loss 0.00158298, throughput 2.80118K wps
[Epoch 60 Batch 1200/1540] avg loss 0.00192436, throughput 2.84807K wps
[Epoch 60 Batch 1230/1540] avg loss 0.00161119, throughput 2.84974K wps
[Epoch 60 Batch 1260/1540] avg loss 0.00164994, throughput 2.82043K wps
[Epoch 60 Batch 1290/1540] avg loss 0.00157602, throughput 2.84483K wps
[Epoch 60 Batch 1320/1540] avg loss 0.00151437, throughput 2.84651K wps
[Epoch 60 Batch 1350/1540] avg loss 0.00151765, throughput 2.8443K wps
[Epoch 60 Batch 1380/1540] avg loss 0.00132822, throughput 2.75566K wps
[Epoch 60 Batch 1410/1540] avg loss 0.00173758, throughput 2.84666K wps
[Epoch 60 Batch 1440/1540] avg loss 0.0017401, throughput 2.84221K wps
[Epoch 60 Batch 1470/1540] avg loss 0.00153061, throughput 2.82743K wps
[Epoch 60 Batch 1500/1540] avg loss 0.00145844, throughput 2.7726K wps
[Epoch 60 Batch 1530/1540] avg loss 0.00153394, throughput 2.81729K wps
Begin Testing...
[Epoch 60] train avg loss 0.00145552, dev acc 0.8257, dev avg loss 0.65365, throughput 2.81988K wps
[Epoch 61 Batch 30/1540] avg loss 0.00130464, throughput 2.86116K wps
[Epoch 61 Batch 60/1540] avg loss 0.00118549, throughput 2.82812K wps
[Epoch 61 Batch 90/1540] avg loss 0.00125437, throughput 2.75602K wps
[Epoch 61 Batch 120/1540] avg loss 0.00138617, throughput 2.76118K wps
[Epoch 61 Batch 150/1540] avg loss 0.0012557, throughput 2.75915K wps
[Epoch 61 Batch 180/1540] avg loss 0.0013272, throughput 2.78964K wps
[Epoch 61 Batch 210/1540] avg loss 0.00135655, throughput 2.81062K wps
[Epoch 61 Batch 240/1540] avg loss 0.00157159, throughput 2.8503K wps
[Epoch 61 Batch 270/1540] avg loss 0.00139137, throughput 2.76778K wps
[Epoch 61 Batch 300/1540] avg loss 0.0015625, throughput 2.79267K wps
[Epoch 61 Batch 330/1540] avg loss 0.00139526, throughput 2.75651K wps
[Epoch 61 Batch 360/1540] avg loss 0.0014595, throughput 2.82026K wps
[Epoch 61 Batch 390/1540] avg loss 0.00143465, throughput 2.79275K wps
[Epoch 61 Batch 420/1540] avg loss 0.00121324, throughput 2.79349K wps
[Epoch 61 Batch 450/1540] avg loss 0.00140501, throughput 2.79178K wps
[Epoch 61 Batch 480/1540] avg loss 0.00116383, throughput 2.82504K wps
[Epoch 61 Batch 510/1540] avg loss 0.0014182, throughput 2.81637K wps
[Epoch 61 Batch 540/1540] avg loss 0.00138755, throughput 2.84832K wps
[Epoch 61 Batch 570/1540] avg loss 0.00166936, throughput 2.78007K wps
[Epoch 61 Batch 600/1540] avg loss 0.00116663, throughput 2.79365K wps
[Epoch 61 Batch 630/1540] avg loss 0.00128315, throughput 2.78993K wps
[Epoch 61 Batch 660/1540] avg loss 0.00154346, throughput 2.82299K wps
[Epoch 61 Batch 690/1540] avg loss 0.00164631, throughput 2.77385K wps
[Epoch 61 Batch 720/1540] avg loss 0.00167953, throughput 2.84032K wps
[Epoch 61 Batch 750/1540] avg loss 0.00138514, throughput 2.85307K wps
[Epoch 61 Batch 780/1540] avg loss 0.00139452, throughput 2.81979K wps
[Epoch 61 Batch 810/1540] avg loss 0.00129579, throughput 2.79552K wps
[Epoch 61 Batch 840/1540] avg loss 0.00161038, throughput 2.82114K wps
[Epoch 61 Batch 870/1540] avg loss 0.00138978, throughput 2.79722K wps
[Epoch 61 Batch 900/1540] avg loss 0.00139567, throughput 2.75355K wps
[Epoch 61 Batch 930/1540] avg loss 0.00100603, throughput 2.76196K wps
[Epoch 61 Batch 960/1540] avg loss 0.00136696, throughput 2.80666K wps
[Epoch 61 Batch 990/1540] avg loss 0.00122766, throughput 2.79695K wps
[Epoch 61 Batch 1020/1540] avg loss 0.00173051, throughput 2.82462K wps
[Epoch 61 Batch 1050/1540] avg loss 0.00145737, throughput 2.84722K wps
[Epoch 61 Batch 1080/1540] avg loss 0.00146945, throughput 2.82087K wps
[Epoch 61 Batch 1110/1540] avg loss 0.0013825, throughput 2.84181K wps
[Epoch 61 Batch 1140/1540] avg loss 0.00154693, throughput 2.81766K wps
[Epoch 61 Batch 1170/1540] avg loss 0.00136376, throughput 2.80304K wps
[Epoch 61 Batch 1200/1540] avg loss 0.00152953, throughput 2.81321K wps
[Epoch 61 Batch 1230/1540] avg loss 0.0016965, throughput 2.83656K wps
[Epoch 61 Batch 1260/1540] avg loss 0.00189669, throughput 2.78896K wps
[Epoch 61 Batch 1290/1540] avg loss 0.00144306, throughput 2.76311K wps
[Epoch 61 Batch 1320/1540] avg loss 0.00177247, throughput 2.81284K wps
[Epoch 61 Batch 1350/1540] avg loss 0.00162822, throughput 2.84034K wps
[Epoch 61 Batch 1380/1540] avg loss 0.00139093, throughput 2.81193K wps
[Epoch 61 Batch 1410/1540] avg loss 0.00154936, throughput 2.82117K wps
[Epoch 61 Batch 1440/1540] avg loss 0.00122182, throughput 2.81648K wps
[Epoch 61 Batch 1470/1540] avg loss 0.00136803, throughput 2.838K wps
[Epoch 61 Batch 1500/1540] avg loss 0.00135475, throughput 2.84485K wps
[Epoch 61 Batch 1530/1540] avg loss 0.00146876, throughput 2.84841K wps
Begin Testing...
[Epoch 61] train avg loss 0.00142649, dev acc 0.8234, dev avg loss 0.649793, throughput 2.80815K wps
[Epoch 62 Batch 30/1540] avg loss 0.00138637, throughput 2.87196K wps
[Epoch 62 Batch 60/1540] avg loss 0.00130343, throughput 2.84959K wps
[Epoch 62 Batch 90/1540] avg loss 0.00122063, throughput 2.85283K wps
[Epoch 62 Batch 120/1540] avg loss 0.00144796, throughput 2.84669K wps
[Epoch 62 Batch 150/1540] avg loss 0.00135203, throughput 2.83926K wps
[Epoch 62 Batch 180/1540] avg loss 0.00148897, throughput 2.82079K wps
[Epoch 62 Batch 210/1540] avg loss 0.00132415, throughput 2.84488K wps
[Epoch 62 Batch 240/1540] avg loss 0.00138481, throughput 2.84736K wps
[Epoch 62 Batch 270/1540] avg loss 0.00129471, throughput 2.84694K wps
[Epoch 62 Batch 300/1540] avg loss 0.00147089, throughput 2.79575K wps
[Epoch 62 Batch 330/1540] avg loss 0.00134709, throughput 2.8467K wps
[Epoch 62 Batch 360/1540] avg loss 0.00114738, throughput 2.85076K wps
[Epoch 62 Batch 390/1540] avg loss 0.00144368, throughput 2.7961K wps
[Epoch 62 Batch 420/1540] avg loss 0.00132481, throughput 2.81613K wps
[Epoch 62 Batch 450/1540] avg loss 0.00129132, throughput 2.82823K wps
[Epoch 62 Batch 480/1540] avg loss 0.00152403, throughput 2.78062K wps
[Epoch 62 Batch 510/1540] avg loss 0.00164586, throughput 2.81273K wps
[Epoch 62 Batch 540/1540] avg loss 0.00120421, throughput 2.80177K wps
[Epoch 62 Batch 570/1540] avg loss 0.00104443, throughput 2.77491K wps
[Epoch 62 Batch 600/1540] avg loss 0.00143804, throughput 2.82734K wps
[Epoch 62 Batch 630/1540] avg loss 0.00139947, throughput 2.82376K wps
[Epoch 62 Batch 660/1540] avg loss 0.00152207, throughput 2.78464K wps
[Epoch 62 Batch 690/1540] avg loss 0.00146401, throughput 2.8277K wps
[Epoch 62 Batch 720/1540] avg loss 0.00133404, throughput 2.8088K wps
[Epoch 62 Batch 750/1540] avg loss 0.00151244, throughput 2.7697K wps
[Epoch 62 Batch 780/1540] avg loss 0.00161694, throughput 2.78407K wps
[Epoch 62 Batch 810/1540] avg loss 0.00136599, throughput 2.78886K wps
[Epoch 62 Batch 840/1540] avg loss 0.00135625, throughput 2.83094K wps
[Epoch 62 Batch 870/1540] avg loss 0.00147714, throughput 2.79684K wps
[Epoch 62 Batch 900/1540] avg loss 0.00154096, throughput 2.80296K wps
[Epoch 62 Batch 930/1540] avg loss 0.00126495, throughput 2.74256K wps
[Epoch 62 Batch 960/1540] avg loss 0.00151919, throughput 2.81575K wps
[Epoch 62 Batch 990/1540] avg loss 0.00156871, throughput 2.82651K wps
[Epoch 62 Batch 1020/1540] avg loss 0.0013574, throughput 2.81381K wps
[Epoch 62 Batch 1050/1540] avg loss 0.00145772, throughput 2.84425K wps
[Epoch 62 Batch 1080/1540] avg loss 0.00151033, throughput 2.77726K wps
[Epoch 62 Batch 1110/1540] avg loss 0.00178498, throughput 2.83725K wps
[Epoch 62 Batch 1140/1540] avg loss 0.00157154, throughput 2.76393K wps
[Epoch 62 Batch 1170/1540] avg loss 0.00117728, throughput 2.82456K wps
[Epoch 62 Batch 1200/1540] avg loss 0.00150129, throughput 2.84738K wps
[Epoch 62 Batch 1230/1540] avg loss 0.0012973, throughput 2.79508K wps
[Epoch 62 Batch 1260/1540] avg loss 0.00170154, throughput 2.81676K wps
[Epoch 62 Batch 1290/1540] avg loss 0.00161974, throughput 2.84797K wps
[Epoch 62 Batch 1320/1540] avg loss 0.00158753, throughput 2.83716K wps
[Epoch 62 Batch 1350/1540] avg loss 0.00143369, throughput 2.84158K wps
[Epoch 62 Batch 1380/1540] avg loss 0.00135791, throughput 2.76113K wps
[Epoch 62 Batch 1410/1540] avg loss 0.00138546, throughput 2.77488K wps
[Epoch 62 Batch 1440/1540] avg loss 0.00136421, throughput 2.82712K wps
[Epoch 62 Batch 1470/1540] avg loss 0.00132265, throughput 2.80118K wps
[Epoch 62 Batch 1500/1540] avg loss 0.00162622, throughput 2.75114K wps
[Epoch 62 Batch 1530/1540] avg loss 0.00124333, throughput 2.75702K wps
Begin Testing...
[Epoch 62] train avg loss 0.00142061, dev acc 0.8303, dev avg loss 0.65112, throughput 2.81255K wps
[Epoch 63 Batch 30/1540] avg loss 0.00132806, throughput 2.81791K wps
[Epoch 63 Batch 60/1540] avg loss 0.00131403, throughput 2.81186K wps
[Epoch 63 Batch 90/1540] avg loss 0.00117651, throughput 2.83921K wps
[Epoch 63 Batch 120/1540] avg loss 0.00105522, throughput 2.84222K wps
[Epoch 63 Batch 150/1540] avg loss 0.00127495, throughput 2.81207K wps
[Epoch 63 Batch 180/1540] avg loss 0.00128982, throughput 2.81773K wps
[Epoch 63 Batch 210/1540] avg loss 0.0013639, throughput 2.84688K wps
[Epoch 63 Batch 240/1540] avg loss 0.00149786, throughput 2.8375K wps
[Epoch 63 Batch 270/1540] avg loss 0.00113049, throughput 2.84882K wps
[Epoch 63 Batch 300/1540] avg loss 0.00110781, throughput 2.85057K wps
[Epoch 63 Batch 330/1540] avg loss 0.00113252, throughput 2.79278K wps
[Epoch 63 Batch 360/1540] avg loss 0.00158116, throughput 2.84763K wps
[Epoch 63 Batch 390/1540] avg loss 0.00152096, throughput 2.82461K wps
[Epoch 63 Batch 420/1540] avg loss 0.00140044, throughput 2.75979K wps
[Epoch 63 Batch 450/1540] avg loss 0.00118338, throughput 2.84621K wps
[Epoch 63 Batch 480/1540] avg loss 0.00127823, throughput 2.8465K wps
[Epoch 63 Batch 510/1540] avg loss 0.00153134, throughput 2.84476K wps
[Epoch 63 Batch 540/1540] avg loss 0.00123096, throughput 2.75862K wps
[Epoch 63 Batch 570/1540] avg loss 0.00122172, throughput 2.75495K wps
[Epoch 63 Batch 600/1540] avg loss 0.00130657, throughput 2.77243K wps
[Epoch 63 Batch 630/1540] avg loss 0.00147657, throughput 2.81226K wps
[Epoch 63 Batch 660/1540] avg loss 0.00150962, throughput 2.84909K wps
[Epoch 63 Batch 690/1540] avg loss 0.00141608, throughput 2.84704K wps
[Epoch 63 Batch 720/1540] avg loss 0.00127577, throughput 2.78799K wps
[Epoch 63 Batch 750/1540] avg loss 0.00133038, throughput 2.78665K wps
[Epoch 63 Batch 780/1540] avg loss 0.00131399, throughput 2.79055K wps
[Epoch 63 Batch 810/1540] avg loss 0.00144311, throughput 2.84972K wps
[Epoch 63 Batch 840/1540] avg loss 0.00134906, throughput 2.76396K wps
[Epoch 63 Batch 870/1540] avg loss 0.00150545, throughput 2.82298K wps
[Epoch 63 Batch 900/1540] avg loss 0.00122393, throughput 2.8537K wps
[Epoch 63 Batch 930/1540] avg loss 0.00152499, throughput 2.83278K wps
[Epoch 63 Batch 960/1540] avg loss 0.00132052, throughput 2.8405K wps
[Epoch 63 Batch 990/1540] avg loss 0.00134065, throughput 2.84413K wps
[Epoch 63 Batch 1020/1540] avg loss 0.00160744, throughput 2.8426K wps
[Epoch 63 Batch 1050/1540] avg loss 0.00147255, throughput 2.85058K wps
[Epoch 63 Batch 1080/1540] avg loss 0.00121137, throughput 2.85014K wps
[Epoch 63 Batch 1110/1540] avg loss 0.00124743, throughput 2.77184K wps
[Epoch 63 Batch 1140/1540] avg loss 0.00135232, throughput 2.82614K wps
[Epoch 63 Batch 1170/1540] avg loss 0.00114236, throughput 2.84461K wps
[Epoch 63 Batch 1200/1540] avg loss 0.0017607, throughput 2.8429K wps
[Epoch 63 Batch 1230/1540] avg loss 0.00147259, throughput 2.77631K wps
[Epoch 63 Batch 1260/1540] avg loss 0.00153311, throughput 2.848K wps
[Epoch 63 Batch 1290/1540] avg loss 0.00146093, throughput 2.80916K wps
[Epoch 63 Batch 1320/1540] avg loss 0.00161033, throughput 2.81916K wps
[Epoch 63 Batch 1350/1540] avg loss 0.00184946, throughput 2.76335K wps
[Epoch 63 Batch 1380/1540] avg loss 0.00137802, throughput 2.76038K wps
[Epoch 63 Batch 1410/1540] avg loss 0.00169654, throughput 2.82882K wps
[Epoch 63 Batch 1440/1540] avg loss 0.00113288, throughput 2.84547K wps
[Epoch 63 Batch 1470/1540] avg loss 0.00130561, throughput 2.82107K wps
[Epoch 63 Batch 1500/1540] avg loss 0.00163725, throughput 2.77552K wps
[Epoch 63 Batch 1530/1540] avg loss 0.00149861, throughput 2.81208K wps
Begin Testing...
[Epoch 63] train avg loss 0.00137903, dev acc 0.8245, dev avg loss 0.686618, throughput 2.81788K wps
[Epoch 64 Batch 30/1540] avg loss 0.00115622, throughput 2.82561K wps
[Epoch 64 Batch 60/1540] avg loss 0.00105243, throughput 2.76566K wps
[Epoch 64 Batch 90/1540] avg loss 0.00123324, throughput 2.78009K wps
[Epoch 64 Batch 120/1540] avg loss 0.00151776, throughput 2.85386K wps
[Epoch 64 Batch 150/1540] avg loss 0.00129068, throughput 2.84621K wps
[Epoch 64 Batch 180/1540] avg loss 0.00132794, throughput 2.8513K wps
[Epoch 64 Batch 210/1540] avg loss 0.00135883, throughput 2.83867K wps
[Epoch 64 Batch 240/1540] avg loss 0.00129849, throughput 2.825K wps
[Epoch 64 Batch 270/1540] avg loss 0.00154468, throughput 2.81033K wps
[Epoch 64 Batch 300/1540] avg loss 0.0012186, throughput 2.78643K wps
[Epoch 64 Batch 330/1540] avg loss 0.0012284, throughput 2.81323K wps
[Epoch 64 Batch 360/1540] avg loss 0.0013638, throughput 2.81472K wps
[Epoch 64 Batch 390/1540] avg loss 0.00135947, throughput 2.83331K wps
[Epoch 64 Batch 420/1540] avg loss 0.00167563, throughput 2.7771K wps
[Epoch 64 Batch 450/1540] avg loss 0.00118884, throughput 2.84738K wps
[Epoch 64 Batch 480/1540] avg loss 0.00133961, throughput 2.77489K wps
[Epoch 64 Batch 510/1540] avg loss 0.00152414, throughput 2.77266K wps
[Epoch 64 Batch 540/1540] avg loss 0.00165878, throughput 2.76002K wps
[Epoch 64 Batch 570/1540] avg loss 0.00120172, throughput 2.75797K wps
[Epoch 64 Batch 600/1540] avg loss 0.00152181, throughput 2.81752K wps
[Epoch 64 Batch 630/1540] avg loss 0.00127372, throughput 2.84691K wps
[Epoch 64 Batch 660/1540] avg loss 0.00141417, throughput 2.844K wps
[Epoch 64 Batch 690/1540] avg loss 0.00121989, throughput 2.85026K wps
[Epoch 64 Batch 720/1540] avg loss 0.00141509, throughput 2.846K wps
[Epoch 64 Batch 750/1540] avg loss 0.00138204, throughput 2.80501K wps
[Epoch 64 Batch 780/1540] avg loss 0.00138077, throughput 2.8441K wps
[Epoch 64 Batch 810/1540] avg loss 0.00132936, throughput 2.84699K wps
[Epoch 64 Batch 840/1540] avg loss 0.00167398, throughput 2.81519K wps
[Epoch 64 Batch 870/1540] avg loss 0.00145166, throughput 2.81217K wps
[Epoch 64 Batch 900/1540] avg loss 0.00144638, throughput 2.7715K wps
[Epoch 64 Batch 930/1540] avg loss 0.00149528, throughput 2.82965K wps
[Epoch 64 Batch 960/1540] avg loss 0.00132827, throughput 2.76056K wps
[Epoch 64 Batch 990/1540] avg loss 0.00158138, throughput 2.8189K wps
[Epoch 64 Batch 1020/1540] avg loss 0.0013813, throughput 2.80474K wps
[Epoch 64 Batch 1050/1540] avg loss 0.00147229, throughput 2.84657K wps
[Epoch 64 Batch 1080/1540] avg loss 0.00154025, throughput 2.78526K wps
[Epoch 64 Batch 1110/1540] avg loss 0.001353, throughput 2.84204K wps
[Epoch 64 Batch 1140/1540] avg loss 0.00146589, throughput 2.84752K wps
[Epoch 64 Batch 1170/1540] avg loss 0.00146422, throughput 2.8104K wps
[Epoch 64 Batch 1200/1540] avg loss 0.00145479, throughput 2.80471K wps
[Epoch 64 Batch 1230/1540] avg loss 0.0013854, throughput 2.8029K wps
[Epoch 64 Batch 1260/1540] avg loss 0.00153213, throughput 2.7607K wps
[Epoch 64 Batch 1290/1540] avg loss 0.00155651, throughput 2.78565K wps
[Epoch 64 Batch 1320/1540] avg loss 0.00124212, throughput 2.79234K wps
[Epoch 64 Batch 1350/1540] avg loss 0.00125476, throughput 2.82292K wps
[Epoch 64 Batch 1380/1540] avg loss 0.00141286, throughput 2.78618K wps
[Epoch 64 Batch 1410/1540] avg loss 0.00121107, throughput 2.82455K wps
[Epoch 64 Batch 1440/1540] avg loss 0.00158293, throughput 2.81081K wps
[Epoch 64 Batch 1470/1540] avg loss 0.00154424, throughput 2.81969K wps
[Epoch 64 Batch 1500/1540] avg loss 0.00140736, throughput 2.81333K wps
[Epoch 64 Batch 1530/1540] avg loss 0.00130366, throughput 2.8388K wps
Begin Testing...
[Epoch 64] train avg loss 0.00138983, dev acc 0.8280, dev avg loss 0.669036, throughput 2.8124K wps
[Epoch 65 Batch 30/1540] avg loss 0.00127652, throughput 2.86401K wps
[Epoch 65 Batch 60/1540] avg loss 0.00116569, throughput 2.76298K wps
[Epoch 65 Batch 90/1540] avg loss 0.0011066, throughput 2.83905K wps
[Epoch 65 Batch 120/1540] avg loss 0.00148646, throughput 2.82419K wps
[Epoch 65 Batch 150/1540] avg loss 0.00151174, throughput 2.83855K wps
[Epoch 65 Batch 180/1540] avg loss 0.00107414, throughput 2.79472K wps
[Epoch 65 Batch 210/1540] avg loss 0.00129382, throughput 2.80534K wps
[Epoch 65 Batch 240/1540] avg loss 0.00130285, throughput 2.80838K wps
[Epoch 65 Batch 270/1540] avg loss 0.00133183, throughput 2.84599K wps
[Epoch 65 Batch 300/1540] avg loss 0.00123916, throughput 2.75712K wps
[Epoch 65 Batch 330/1540] avg loss 0.00149177, throughput 2.7629K wps
[Epoch 65 Batch 360/1540] avg loss 0.00144148, throughput 2.76253K wps
[Epoch 65 Batch 390/1540] avg loss 0.00114066, throughput 2.78116K wps
[Epoch 65 Batch 420/1540] avg loss 0.00108564, throughput 2.79308K wps
[Epoch 65 Batch 450/1540] avg loss 0.00138203, throughput 2.84269K wps
[Epoch 65 Batch 480/1540] avg loss 0.00108595, throughput 2.75275K wps
[Epoch 65 Batch 510/1540] avg loss 0.00159464, throughput 2.80309K wps
[Epoch 65 Batch 540/1540] avg loss 0.00136934, throughput 2.84858K wps
[Epoch 65 Batch 570/1540] avg loss 0.00154095, throughput 2.83867K wps
[Epoch 65 Batch 600/1540] avg loss 0.00165256, throughput 2.8081K wps
[Epoch 65 Batch 630/1540] avg loss 0.00129357, throughput 2.84599K wps
[Epoch 65 Batch 660/1540] avg loss 0.00159489, throughput 2.854K wps
[Epoch 65 Batch 690/1540] avg loss 0.00136334, throughput 2.84419K wps
[Epoch 65 Batch 720/1540] avg loss 0.00142836, throughput 2.8515K wps
[Epoch 65 Batch 750/1540] avg loss 0.00113471, throughput 2.84187K wps
[Epoch 65 Batch 780/1540] avg loss 0.00116026, throughput 2.78996K wps
[Epoch 65 Batch 810/1540] avg loss 0.00145876, throughput 2.84277K wps
[Epoch 65 Batch 840/1540] avg loss 0.00129385, throughput 2.8458K wps
[Epoch 65 Batch 870/1540] avg loss 0.00140733, throughput 2.77792K wps
[Epoch 65 Batch 900/1540] avg loss 0.00157643, throughput 2.8356K wps
[Epoch 65 Batch 930/1540] avg loss 0.00151332, throughput 2.84796K wps
[Epoch 65 Batch 960/1540] avg loss 0.00135382, throughput 2.79085K wps
[Epoch 65 Batch 990/1540] avg loss 0.00120381, throughput 2.83131K wps
[Epoch 65 Batch 1020/1540] avg loss 0.001228, throughput 2.80909K wps
[Epoch 65 Batch 1050/1540] avg loss 0.00111697, throughput 2.82827K wps
[Epoch 65 Batch 1080/1540] avg loss 0.00165816, throughput 2.83086K wps
[Epoch 65 Batch 1110/1540] avg loss 0.00138381, throughput 2.85376K wps
[Epoch 65 Batch 1140/1540] avg loss 0.00128691, throughput 2.85236K wps
[Epoch 65 Batch 1170/1540] avg loss 0.00130891, throughput 2.81565K wps
[Epoch 65 Batch 1200/1540] avg loss 0.00149743, throughput 2.84127K wps
[Epoch 65 Batch 1230/1540] avg loss 0.00136877, throughput 2.76456K wps
[Epoch 65 Batch 1260/1540] avg loss 0.00139328, throughput 2.76125K wps
[Epoch 65 Batch 1290/1540] avg loss 0.00149344, throughput 2.76101K wps
[Epoch 65 Batch 1320/1540] avg loss 0.00162544, throughput 2.81533K wps
[Epoch 65 Batch 1350/1540] avg loss 0.00146025, throughput 2.81851K wps
[Epoch 65 Batch 1380/1540] avg loss 0.00154175, throughput 2.80358K wps
[Epoch 65 Batch 1410/1540] avg loss 0.0014461, throughput 2.77229K wps
[Epoch 65 Batch 1440/1540] avg loss 0.00144795, throughput 2.78701K wps
[Epoch 65 Batch 1470/1540] avg loss 0.00128535, throughput 2.78793K wps
[Epoch 65 Batch 1500/1540] avg loss 0.00133795, throughput 2.78703K wps
[Epoch 65 Batch 1530/1540] avg loss 0.00125848, throughput 2.76145K wps
Begin Testing...
[Epoch 65] train avg loss 0.00136794, dev acc 0.8211, dev avg loss 0.705391, throughput 2.81075K wps
[Epoch 66 Batch 30/1540] avg loss 0.00150412, throughput 2.86498K wps
[Epoch 66 Batch 60/1540] avg loss 0.00131868, throughput 2.84786K wps
[Epoch 66 Batch 90/1540] avg loss 0.00125671, throughput 2.83741K wps
[Epoch 66 Batch 120/1540] avg loss 0.00129246, throughput 2.84745K wps
[Epoch 66 Batch 150/1540] avg loss 0.00128716, throughput 2.83481K wps
[Epoch 66 Batch 180/1540] avg loss 0.00121351, throughput 2.7751K wps
[Epoch 66 Batch 210/1540] avg loss 0.00132036, throughput 2.77518K wps
[Epoch 66 Batch 240/1540] avg loss 0.00126849, throughput 2.84491K wps
[Epoch 66 Batch 270/1540] avg loss 0.00113543, throughput 2.81656K wps
[Epoch 66 Batch 300/1540] avg loss 0.00138584, throughput 2.85049K wps
[Epoch 66 Batch 330/1540] avg loss 0.00116019, throughput 2.8483K wps
[Epoch 66 Batch 360/1540] avg loss 0.00105862, throughput 2.85088K wps
[Epoch 66 Batch 390/1540] avg loss 0.00126008, throughput 2.79146K wps
[Epoch 66 Batch 420/1540] avg loss 0.00122383, throughput 2.81192K wps
[Epoch 66 Batch 450/1540] avg loss 0.00145524, throughput 2.7989K wps
[Epoch 66 Batch 480/1540] avg loss 0.00117845, throughput 2.84592K wps
[Epoch 66 Batch 510/1540] avg loss 0.00157409, throughput 2.79544K wps
[Epoch 66 Batch 540/1540] avg loss 0.00131486, throughput 2.84771K wps
[Epoch 66 Batch 570/1540] avg loss 0.00152742, throughput 2.81371K wps
[Epoch 66 Batch 600/1540] avg loss 0.00116423, throughput 2.8535K wps
[Epoch 66 Batch 630/1540] avg loss 0.00135736, throughput 2.8096K wps
[Epoch 66 Batch 660/1540] avg loss 0.00119504, throughput 2.80716K wps
[Epoch 66 Batch 690/1540] avg loss 0.00138877, throughput 2.77616K wps
[Epoch 66 Batch 720/1540] avg loss 0.00134821, throughput 2.77284K wps
[Epoch 66 Batch 750/1540] avg loss 0.00157287, throughput 2.82738K wps
[Epoch 66 Batch 780/1540] avg loss 0.00142511, throughput 2.8502K wps
[Epoch 66 Batch 810/1540] avg loss 0.0015438, throughput 2.85199K wps
[Epoch 66 Batch 840/1540] avg loss 0.00138519, throughput 2.84895K wps
[Epoch 66 Batch 870/1540] avg loss 0.00144996, throughput 2.84502K wps
[Epoch 66 Batch 900/1540] avg loss 0.00136524, throughput 2.84695K wps
[Epoch 66 Batch 930/1540] avg loss 0.0014131, throughput 2.84501K wps
[Epoch 66 Batch 960/1540] avg loss 0.00131232, throughput 2.82615K wps
[Epoch 66 Batch 990/1540] avg loss 0.00132075, throughput 2.81396K wps
[Epoch 66 Batch 1020/1540] avg loss 0.00126323, throughput 2.8222K wps
[Epoch 66 Batch 1050/1540] avg loss 0.00147327, throughput 2.83765K wps
[Epoch 66 Batch 1080/1540] avg loss 0.00121619, throughput 2.83948K wps
[Epoch 66 Batch 1110/1540] avg loss 0.00144299, throughput 2.78932K wps
[Epoch 66 Batch 1140/1540] avg loss 0.0012649, throughput 2.80261K wps
[Epoch 66 Batch 1170/1540] avg loss 0.00147657, throughput 2.82515K wps
[Epoch 66 Batch 1200/1540] avg loss 0.00120296, throughput 2.83325K wps
[Epoch 66 Batch 1230/1540] avg loss 0.00111114, throughput 2.84667K wps
[Epoch 66 Batch 1260/1540] avg loss 0.00125123, throughput 2.84358K wps
[Epoch 66 Batch 1290/1540] avg loss 0.00125622, throughput 2.77449K wps
[Epoch 66 Batch 1320/1540] avg loss 0.00166964, throughput 2.82357K wps
[Epoch 66 Batch 1350/1540] avg loss 0.00123337, throughput 2.85031K wps
[Epoch 66 Batch 1380/1540] avg loss 0.00157972, throughput 2.84748K wps
[Epoch 66 Batch 1410/1540] avg loss 0.00146594, throughput 2.84688K wps
[Epoch 66 Batch 1440/1540] avg loss 0.00147775, throughput 2.79935K wps
[Epoch 66 Batch 1470/1540] avg loss 0.00135579, throughput 2.75663K wps
[Epoch 66 Batch 1500/1540] avg loss 0.001328, throughput 2.80063K wps
[Epoch 66 Batch 1530/1540] avg loss 0.00152138, throughput 2.84577K wps
Begin Testing...
[Epoch 66] train avg loss 0.0013464, dev acc 0.8291, dev avg loss 0.682711, throughput 2.82454K wps
[Epoch 67 Batch 30/1540] avg loss 0.00120698, throughput 2.82144K wps
[Epoch 67 Batch 60/1540] avg loss 0.00130126, throughput 2.76436K wps
[Epoch 67 Batch 90/1540] avg loss 0.00114489, throughput 2.76552K wps
[Epoch 67 Batch 120/1540] avg loss 0.00137436, throughput 2.78232K wps
[Epoch 67 Batch 150/1540] avg loss 0.00134152, throughput 2.81782K wps
[Epoch 67 Batch 180/1540] avg loss 0.00122005, throughput 2.76566K wps
[Epoch 67 Batch 210/1540] avg loss 0.0011834, throughput 2.83675K wps
[Epoch 67 Batch 240/1540] avg loss 0.0013587, throughput 2.82312K wps
[Epoch 67 Batch 270/1540] avg loss 0.0010773, throughput 2.84232K wps
[Epoch 67 Batch 300/1540] avg loss 0.00143286, throughput 2.81202K wps
[Epoch 67 Batch 330/1540] avg loss 0.00147995, throughput 2.80969K wps
[Epoch 67 Batch 360/1540] avg loss 0.00139869, throughput 2.76123K wps
[Epoch 67 Batch 390/1540] avg loss 0.00141834, throughput 2.82171K wps
[Epoch 67 Batch 420/1540] avg loss 0.00144737, throughput 2.83692K wps
[Epoch 67 Batch 450/1540] avg loss 0.00114299, throughput 2.76064K wps
[Epoch 67 Batch 480/1540] avg loss 0.00121729, throughput 2.80678K wps
[Epoch 67 Batch 510/1540] avg loss 0.00148459, throughput 2.84239K wps
[Epoch 67 Batch 540/1540] avg loss 0.00152348, throughput 2.84129K wps
[Epoch 67 Batch 570/1540] avg loss 0.00140434, throughput 2.85082K wps
[Epoch 67 Batch 600/1540] avg loss 0.000977125, throughput 2.82548K wps
[Epoch 67 Batch 630/1540] avg loss 0.00141587, throughput 2.81226K wps
[Epoch 67 Batch 660/1540] avg loss 0.00126063, throughput 2.84264K wps
[Epoch 67 Batch 690/1540] avg loss 0.00122325, throughput 2.79494K wps
[Epoch 67 Batch 720/1540] avg loss 0.00121937, throughput 2.84884K wps
[Epoch 67 Batch 750/1540] avg loss 0.001519, throughput 2.83791K wps
[Epoch 67 Batch 780/1540] avg loss 0.00133899, throughput 2.83244K wps
[Epoch 67 Batch 810/1540] avg loss 0.00107341, throughput 2.76279K wps
[Epoch 67 Batch 840/1540] avg loss 0.00141044, throughput 2.81324K wps
[Epoch 67 Batch 870/1540] avg loss 0.00125366, throughput 2.84795K wps
[Epoch 67 Batch 900/1540] avg loss 0.0015807, throughput 2.76052K wps
[Epoch 67 Batch 930/1540] avg loss 0.00118607, throughput 2.83217K wps
[Epoch 67 Batch 960/1540] avg loss 0.0013074, throughput 2.81048K wps
[Epoch 67 Batch 990/1540] avg loss 0.00128828, throughput 2.82857K wps
[Epoch 67 Batch 1020/1540] avg loss 0.00141741, throughput 2.78476K wps
[Epoch 67 Batch 1050/1540] avg loss 0.00129297, throughput 2.83828K wps
[Epoch 67 Batch 1080/1540] avg loss 0.00163555, throughput 2.78505K wps
[Epoch 67 Batch 1110/1540] avg loss 0.00111748, throughput 2.79368K wps
[Epoch 67 Batch 1140/1540] avg loss 0.0013924, throughput 2.76725K wps
[Epoch 67 Batch 1170/1540] avg loss 0.00120656, throughput 2.78636K wps
[Epoch 67 Batch 1200/1540] avg loss 0.00137349, throughput 2.79242K wps
[Epoch 67 Batch 1230/1540] avg loss 0.00132469, throughput 2.76665K wps
[Epoch 67 Batch 1260/1540] avg loss 0.00131465, throughput 2.80437K wps
[Epoch 67 Batch 1290/1540] avg loss 0.0013637, throughput 2.78808K wps
[Epoch 67 Batch 1320/1540] avg loss 0.00161416, throughput 2.79619K wps
[Epoch 67 Batch 1350/1540] avg loss 0.00140257, throughput 2.8325K wps
[Epoch 67 Batch 1380/1540] avg loss 0.00157167, throughput 2.84462K wps
[Epoch 67 Batch 1410/1540] avg loss 0.00145622, throughput 2.79844K wps
[Epoch 67 Batch 1440/1540] avg loss 0.00148192, throughput 2.76972K wps
[Epoch 67 Batch 1470/1540] avg loss 0.00116538, throughput 2.77937K wps
[Epoch 67 Batch 1500/1540] avg loss 0.00135735, throughput 2.82055K wps
[Epoch 67 Batch 1530/1540] avg loss 0.00136479, throughput 2.80443K wps
Begin Testing...
[Epoch 67] train avg loss 0.00133463, dev acc 0.8314, dev avg loss 0.674788, throughput 2.8067K wps
[Epoch 68 Batch 30/1540] avg loss 0.00129498, throughput 2.89774K wps
[Epoch 68 Batch 60/1540] avg loss 0.00136276, throughput 2.79938K wps
[Epoch 68 Batch 90/1540] avg loss 0.00132136, throughput 2.77772K wps
[Epoch 68 Batch 120/1540] avg loss 0.00135133, throughput 2.76678K wps
[Epoch 68 Batch 150/1540] avg loss 0.00124798, throughput 2.77168K wps
[Epoch 68 Batch 180/1540] avg loss 0.00102541, throughput 2.77098K wps
[Epoch 68 Batch 210/1540] avg loss 0.00156891, throughput 2.83193K wps
[Epoch 68 Batch 240/1540] avg loss 0.00108165, throughput 2.78669K wps
[Epoch 68 Batch 270/1540] avg loss 0.00128393, throughput 2.82944K wps
[Epoch 68 Batch 300/1540] avg loss 0.00126241, throughput 2.84996K wps
[Epoch 68 Batch 330/1540] avg loss 0.00151046, throughput 2.83367K wps
[Epoch 68 Batch 360/1540] avg loss 0.00116995, throughput 2.82007K wps
[Epoch 68 Batch 390/1540] avg loss 0.00119408, throughput 2.84788K wps
[Epoch 68 Batch 420/1540] avg loss 0.00144172, throughput 2.85271K wps
[Epoch 68 Batch 450/1540] avg loss 0.00124029, throughput 2.83796K wps
[Epoch 68 Batch 480/1540] avg loss 0.00140299, throughput 2.8048K wps
[Epoch 68 Batch 510/1540] avg loss 0.0014106, throughput 2.84478K wps
[Epoch 68 Batch 540/1540] avg loss 0.00141573, throughput 2.84457K wps
[Epoch 68 Batch 570/1540] avg loss 0.00128158, throughput 2.83736K wps
[Epoch 68 Batch 600/1540] avg loss 0.00126699, throughput 2.78203K wps
[Epoch 68 Batch 630/1540] avg loss 0.00139803, throughput 2.84419K wps
[Epoch 68 Batch 660/1540] avg loss 0.00142002, throughput 2.83536K wps
[Epoch 68 Batch 690/1540] avg loss 0.00119698, throughput 2.846K wps
[Epoch 68 Batch 720/1540] avg loss 0.00159482, throughput 2.84633K wps
[Epoch 68 Batch 750/1540] avg loss 0.00121168, throughput 2.83984K wps
[Epoch 68 Batch 780/1540] avg loss 0.00136571, throughput 2.80614K wps
[Epoch 68 Batch 810/1540] avg loss 0.00106494, throughput 2.77086K wps
[Epoch 68 Batch 840/1540] avg loss 0.00109981, throughput 2.83778K wps
[Epoch 68 Batch 870/1540] avg loss 0.00141351, throughput 2.82495K wps
[Epoch 68 Batch 900/1540] avg loss 0.00101937, throughput 2.75478K wps
[Epoch 68 Batch 930/1540] avg loss 0.00120231, throughput 2.84793K wps
[Epoch 68 Batch 960/1540] avg loss 0.00125676, throughput 2.8387K wps
[Epoch 68 Batch 990/1540] avg loss 0.00134496, throughput 2.81911K wps
[Epoch 68 Batch 1020/1540] avg loss 0.00153205, throughput 2.76482K wps
[Epoch 68 Batch 1050/1540] avg loss 0.00146754, throughput 2.81323K wps
[Epoch 68 Batch 1080/1540] avg loss 0.00169226, throughput 2.77661K wps
[Epoch 68 Batch 1110/1540] avg loss 0.00140695, throughput 2.85383K wps
[Epoch 68 Batch 1140/1540] avg loss 0.00154801, throughput 2.85033K wps
[Epoch 68 Batch 1170/1540] avg loss 0.00143303, throughput 2.84193K wps
[Epoch 68 Batch 1200/1540] avg loss 0.00153015, throughput 2.85193K wps
[Epoch 68 Batch 1230/1540] avg loss 0.00168733, throughput 2.78934K wps
[Epoch 68 Batch 1260/1540] avg loss 0.00111186, throughput 2.79641K wps
[Epoch 68 Batch 1290/1540] avg loss 0.0014436, throughput 2.83861K wps
[Epoch 68 Batch 1320/1540] avg loss 0.00134465, throughput 2.77376K wps
[Epoch 68 Batch 1350/1540] avg loss 0.00139514, throughput 2.84287K wps
[Epoch 68 Batch 1380/1540] avg loss 0.00158305, throughput 2.84596K wps
[Epoch 68 Batch 1410/1540] avg loss 0.00110751, throughput 2.85298K wps
[Epoch 68 Batch 1440/1540] avg loss 0.00144148, throughput 2.84896K wps
[Epoch 68 Batch 1470/1540] avg loss 0.00108277, throughput 2.84962K wps
[Epoch 68 Batch 1500/1540] avg loss 0.00105352, throughput 2.85186K wps
[Epoch 68 Batch 1530/1540] avg loss 0.00183451, throughput 2.84555K wps
Begin Testing...
[Epoch 68] train avg loss 0.00134615, dev acc 0.8245, dev avg loss 0.689698, throughput 2.82311K wps
[Epoch 69 Batch 30/1540] avg loss 0.00130448, throughput 2.87834K wps
[Epoch 69 Batch 60/1540] avg loss 0.00119484, throughput 2.8075K wps
[Epoch 69 Batch 90/1540] avg loss 0.00128733, throughput 2.78733K wps
[Epoch 69 Batch 120/1540] avg loss 0.00131005, throughput 2.84188K wps
[Epoch 69 Batch 150/1540] avg loss 0.00100847, throughput 2.84177K wps
[Epoch 69 Batch 180/1540] avg loss 0.00137982, throughput 2.82661K wps
[Epoch 69 Batch 210/1540] avg loss 0.00131059, throughput 2.82188K wps
[Epoch 69 Batch 240/1540] avg loss 0.00136738, throughput 2.74872K wps
[Epoch 69 Batch 270/1540] avg loss 0.00140612, throughput 2.82471K wps
[Epoch 69 Batch 300/1540] avg loss 0.00141307, throughput 2.75258K wps
[Epoch 69 Batch 330/1540] avg loss 0.00132978, throughput 2.84215K wps
[Epoch 69 Batch 360/1540] avg loss 0.0011167, throughput 2.82389K wps
[Epoch 69 Batch 390/1540] avg loss 0.00128458, throughput 2.84981K wps
[Epoch 69 Batch 420/1540] avg loss 0.00119993, throughput 2.84109K wps
[Epoch 69 Batch 450/1540] avg loss 0.00104779, throughput 2.78254K wps
[Epoch 69 Batch 480/1540] avg loss 0.00116852, throughput 2.84417K wps
[Epoch 69 Batch 510/1540] avg loss 0.00121064, throughput 2.82967K wps
[Epoch 69 Batch 540/1540] avg loss 0.00143022, throughput 2.83966K wps
[Epoch 69 Batch 570/1540] avg loss 0.00114458, throughput 2.84075K wps
[Epoch 69 Batch 600/1540] avg loss 0.00146794, throughput 2.82414K wps
[Epoch 69 Batch 630/1540] avg loss 0.00136524, throughput 2.80481K wps
[Epoch 69 Batch 660/1540] avg loss 0.00132793, throughput 2.77304K wps
[Epoch 69 Batch 690/1540] avg loss 0.00137458, throughput 2.84047K wps
[Epoch 69 Batch 720/1540] avg loss 0.00141862, throughput 2.82427K wps
[Epoch 69 Batch 750/1540] avg loss 0.00140696, throughput 2.80169K wps
[Epoch 69 Batch 780/1540] avg loss 0.00145173, throughput 2.84776K wps
[Epoch 69 Batch 810/1540] avg loss 0.00111278, throughput 2.84908K wps
[Epoch 69 Batch 840/1540] avg loss 0.00132507, throughput 2.8412K wps
[Epoch 69 Batch 870/1540] avg loss 0.00119608, throughput 2.83858K wps
[Epoch 69 Batch 900/1540] avg loss 0.00117145, throughput 2.83128K wps
[Epoch 69 Batch 930/1540] avg loss 0.00149767, throughput 2.76098K wps
[Epoch 69 Batch 960/1540] avg loss 0.00112392, throughput 2.82679K wps
[Epoch 69 Batch 990/1540] avg loss 0.00158163, throughput 2.80262K wps
[Epoch 69 Batch 1020/1540] avg loss 0.00139967, throughput 2.77172K wps
[Epoch 69 Batch 1050/1540] avg loss 0.00112329, throughput 2.77702K wps
[Epoch 69 Batch 1080/1540] avg loss 0.00171255, throughput 2.76512K wps
[Epoch 69 Batch 1110/1540] avg loss 0.00149302, throughput 2.77768K wps
[Epoch 69 Batch 1140/1540] avg loss 0.00117121, throughput 2.77667K wps
[Epoch 69 Batch 1170/1540] avg loss 0.00146089, throughput 2.76286K wps
[Epoch 69 Batch 1200/1540] avg loss 0.00114565, throughput 2.78464K wps
[Epoch 69 Batch 1230/1540] avg loss 0.00129055, throughput 2.77862K wps
[Epoch 69 Batch 1260/1540] avg loss 0.0014519, throughput 2.82723K wps
[Epoch 69 Batch 1290/1540] avg loss 0.00130148, throughput 2.84499K wps
[Epoch 69 Batch 1320/1540] a