Namespace(batch_size=50, data_name='SST-2', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='multichannel')
Use gpu0
maximum length (in tokens): 53
Done! Tokenizing Time=0.81s, #Sentences=76961
Done! Tokenizing Time=0.03s, #Sentences=1821
Done! Tokenizing Time=0.01s, #Sentences=872
SentimentNet(
(embedding): Embedding(17244 -> 300, float32)
(embedding_extend): Embedding(17244 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(600 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(600 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(600 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/1540] avg loss 0.0141127, throughput 0.425862K wps
[Epoch 0 Batch 60/1540] avg loss 0.0138031, throughput 2.20071K wps
[Epoch 0 Batch 90/1540] avg loss 0.0137354, throughput 2.19767K wps
[Epoch 0 Batch 120/1540] avg loss 0.0136271, throughput 2.19604K wps
[Epoch 0 Batch 150/1540] avg loss 0.0137419, throughput 2.19886K wps
[Epoch 0 Batch 180/1540] avg loss 0.0134521, throughput 2.18779K wps
[Epoch 0 Batch 210/1540] avg loss 0.013471, throughput 2.18638K wps
[Epoch 0 Batch 240/1540] avg loss 0.0133766, throughput 2.16142K wps
[Epoch 0 Batch 270/1540] avg loss 0.0133872, throughput 2.15449K wps
[Epoch 0 Batch 300/1540] avg loss 0.0132732, throughput 2.18961K wps
[Epoch 0 Batch 330/1540] avg loss 0.0132894, throughput 2.19754K wps
[Epoch 0 Batch 360/1540] avg loss 0.0131825, throughput 2.19816K wps
[Epoch 0 Batch 390/1540] avg loss 0.0131346, throughput 2.18683K wps
[Epoch 0 Batch 420/1540] avg loss 0.012907, throughput 2.19652K wps
[Epoch 0 Batch 450/1540] avg loss 0.0128577, throughput 2.19836K wps
[Epoch 0 Batch 480/1540] avg loss 0.012843, throughput 2.18776K wps
[Epoch 0 Batch 510/1540] avg loss 0.0129099, throughput 2.18006K wps
[Epoch 0 Batch 540/1540] avg loss 0.012679, throughput 2.18801K wps
[Epoch 0 Batch 570/1540] avg loss 0.0126695, throughput 2.19497K wps
[Epoch 0 Batch 600/1540] avg loss 0.0124861, throughput 2.19404K wps
[Epoch 0 Batch 630/1540] avg loss 0.0123887, throughput 2.18209K wps
[Epoch 0 Batch 660/1540] avg loss 0.0124439, throughput 2.18133K wps
[Epoch 0 Batch 690/1540] avg loss 0.0123472, throughput 2.19644K wps
[Epoch 0 Batch 720/1540] avg loss 0.0121617, throughput 2.18709K wps
[Epoch 0 Batch 750/1540] avg loss 0.0121799, throughput 2.18395K wps
[Epoch 0 Batch 780/1540] avg loss 0.0120835, throughput 2.18431K wps
[Epoch 0 Batch 810/1540] avg loss 0.0117778, throughput 2.18043K wps
[Epoch 0 Batch 840/1540] avg loss 0.0119752, throughput 2.19584K wps
[Epoch 0 Batch 870/1540] avg loss 0.0117876, throughput 2.19076K wps
[Epoch 0 Batch 900/1540] avg loss 0.0118508, throughput 2.18142K wps
[Epoch 0 Batch 930/1540] avg loss 0.0117864, throughput 2.19364K wps
[Epoch 0 Batch 960/1540] avg loss 0.0115676, throughput 2.19747K wps
[Epoch 0 Batch 990/1540] avg loss 0.0116665, throughput 2.19268K wps
[Epoch 0 Batch 1020/1540] avg loss 0.0115654, throughput 2.18493K wps
[Epoch 0 Batch 1050/1540] avg loss 0.0113274, throughput 2.18433K wps
[Epoch 0 Batch 1080/1540] avg loss 0.0110519, throughput 2.19067K wps
[Epoch 0 Batch 1110/1540] avg loss 0.0111113, throughput 2.19563K wps
[Epoch 0 Batch 1140/1540] avg loss 0.0109176, throughput 2.18321K wps
[Epoch 0 Batch 1170/1540] avg loss 0.0109058, throughput 2.1965K wps
[Epoch 0 Batch 1200/1540] avg loss 0.010832, throughput 2.16882K wps
[Epoch 0 Batch 1230/1540] avg loss 0.0109426, throughput 2.17349K wps
[Epoch 0 Batch 1260/1540] avg loss 0.010711, throughput 2.17065K wps
[Epoch 0 Batch 1290/1540] avg loss 0.010519, throughput 2.19649K wps
[Epoch 0 Batch 1320/1540] avg loss 0.0107161, throughput 2.1883K wps
[Epoch 0 Batch 1350/1540] avg loss 0.0103918, throughput 2.18836K wps
[Epoch 0 Batch 1380/1540] avg loss 0.0103479, throughput 2.18694K wps
[Epoch 0 Batch 1410/1540] avg loss 0.0103015, throughput 2.18629K wps
[Epoch 0 Batch 1440/1540] avg loss 0.0102233, throughput 2.18563K wps
[Epoch 0 Batch 1470/1540] avg loss 0.0100963, throughput 2.18386K wps
[Epoch 0 Batch 1500/1540] avg loss 0.00993193, throughput 2.1862K wps
[Epoch 0 Batch 1530/1540] avg loss 0.0100748, throughput 2.19584K wps
Begin Testing...
[Epoch 0] train avg loss 0.0120125, dev acc 0.8016, dev avg loss 0.510245, throughput 1.92808K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 1 Batch 30/1540] avg loss 0.00981012, throughput 2.21779K wps
[Epoch 1 Batch 60/1540] avg loss 0.00967721, throughput 2.18671K wps
[Epoch 1 Batch 90/1540] avg loss 0.00948418, throughput 2.19195K wps
[Epoch 1 Batch 120/1540] avg loss 0.00952659, throughput 2.18104K wps
[Epoch 1 Batch 150/1540] avg loss 0.00965484, throughput 2.17993K wps
[Epoch 1 Batch 180/1540] avg loss 0.00951897, throughput 2.17954K wps
[Epoch 1 Batch 210/1540] avg loss 0.0095241, throughput 2.19124K wps
[Epoch 1 Batch 240/1540] avg loss 0.00915195, throughput 2.17401K wps
[Epoch 1 Batch 270/1540] avg loss 0.00910785, throughput 2.19546K wps
[Epoch 1 Batch 300/1540] avg loss 0.00912654, throughput 2.19836K wps
[Epoch 1 Batch 330/1540] avg loss 0.00908633, throughput 2.20299K wps
[Epoch 1 Batch 360/1540] avg loss 0.0090771, throughput 2.1824K wps
[Epoch 1 Batch 390/1540] avg loss 0.00914618, throughput 2.18955K wps
[Epoch 1 Batch 420/1540] avg loss 0.0087976, throughput 2.19401K wps
[Epoch 1 Batch 450/1540] avg loss 0.00890227, throughput 2.19137K wps
[Epoch 1 Batch 480/1540] avg loss 0.00865517, throughput 2.1692K wps
[Epoch 1 Batch 510/1540] avg loss 0.00888238, throughput 2.19419K wps
[Epoch 1 Batch 540/1540] avg loss 0.00879076, throughput 2.20123K wps
[Epoch 1 Batch 570/1540] avg loss 0.00871804, throughput 2.19436K wps
[Epoch 1 Batch 600/1540] avg loss 0.00863765, throughput 2.19654K wps
[Epoch 1 Batch 630/1540] avg loss 0.00873745, throughput 2.19696K wps
[Epoch 1 Batch 660/1540] avg loss 0.00853835, throughput 2.19234K wps
[Epoch 1 Batch 690/1540] avg loss 0.00856994, throughput 2.19844K wps
[Epoch 1 Batch 720/1540] avg loss 0.00870699, throughput 2.19537K wps
[Epoch 1 Batch 750/1540] avg loss 0.00874612, throughput 2.18792K wps
[Epoch 1 Batch 780/1540] avg loss 0.00836838, throughput 2.17798K wps
[Epoch 1 Batch 810/1540] avg loss 0.00835201, throughput 2.19009K wps
[Epoch 1 Batch 840/1540] avg loss 0.00873645, throughput 2.17811K wps
[Epoch 1 Batch 870/1540] avg loss 0.00840758, throughput 2.1963K wps
[Epoch 1 Batch 900/1540] avg loss 0.00849318, throughput 2.18949K wps
[Epoch 1 Batch 930/1540] avg loss 0.00889435, throughput 2.19391K wps
[Epoch 1 Batch 960/1540] avg loss 0.00832338, throughput 2.18562K wps
[Epoch 1 Batch 990/1540] avg loss 0.00840144, throughput 2.18804K wps
[Epoch 1 Batch 1020/1540] avg loss 0.00814559, throughput 2.19307K wps
[Epoch 1 Batch 1050/1540] avg loss 0.0079384, throughput 2.16757K wps
[Epoch 1 Batch 1080/1540] avg loss 0.0083828, throughput 2.18564K wps
[Epoch 1 Batch 1110/1540] avg loss 0.00795755, throughput 2.19067K wps
[Epoch 1 Batch 1140/1540] avg loss 0.0081238, throughput 2.19481K wps
[Epoch 1 Batch 1170/1540] avg loss 0.00808205, throughput 2.18962K wps
[Epoch 1 Batch 1200/1540] avg loss 0.0079414, throughput 2.1931K wps
[Epoch 1 Batch 1230/1540] avg loss 0.00800679, throughput 2.19907K wps
[Epoch 1 Batch 1260/1540] avg loss 0.00782828, throughput 2.18298K wps
[Epoch 1 Batch 1290/1540] avg loss 0.00766588, throughput 2.18952K wps
[Epoch 1 Batch 1320/1540] avg loss 0.00806144, throughput 2.19636K wps
[Epoch 1 Batch 1350/1540] avg loss 0.00823747, throughput 2.17935K wps
[Epoch 1 Batch 1380/1540] avg loss 0.00764574, throughput 2.18155K wps
[Epoch 1 Batch 1410/1540] avg loss 0.00762133, throughput 2.19088K wps
[Epoch 1 Batch 1440/1540] avg loss 0.00835194, throughput 2.18769K wps
[Epoch 1 Batch 1470/1540] avg loss 0.00763809, throughput 2.19906K wps
[Epoch 1 Batch 1500/1540] avg loss 0.00766001, throughput 2.18619K wps
[Epoch 1 Batch 1530/1540] avg loss 0.00786258, throughput 2.19238K wps
Begin Testing...
[Epoch 1] train avg loss 0.00859051, dev acc 0.8200, dev avg loss 0.41924, throughput 2.1899K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 2 Batch 30/1540] avg loss 0.00765313, throughput 2.24207K wps
[Epoch 2 Batch 60/1540] avg loss 0.00780072, throughput 2.18559K wps
[Epoch 2 Batch 90/1540] avg loss 0.00769987, throughput 2.19205K wps
[Epoch 2 Batch 120/1540] avg loss 0.00797949, throughput 2.18508K wps
[Epoch 2 Batch 150/1540] avg loss 0.00765616, throughput 2.17171K wps
[Epoch 2 Batch 180/1540] avg loss 0.00757388, throughput 2.17423K wps
[Epoch 2 Batch 210/1540] avg loss 0.00746295, throughput 2.189K wps
[Epoch 2 Batch 240/1540] avg loss 0.00755539, throughput 2.15733K wps
[Epoch 2 Batch 270/1540] avg loss 0.00780251, throughput 2.17345K wps
[Epoch 2 Batch 300/1540] avg loss 0.00806053, throughput 2.18311K wps
[Epoch 2 Batch 330/1540] avg loss 0.00747278, throughput 2.18582K wps
[Epoch 2 Batch 360/1540] avg loss 0.00753034, throughput 2.18785K wps
[Epoch 2 Batch 390/1540] avg loss 0.00763519, throughput 2.1899K wps
[Epoch 2 Batch 420/1540] avg loss 0.00728369, throughput 2.19059K wps
[Epoch 2 Batch 450/1540] avg loss 0.00708135, throughput 2.16754K wps
[Epoch 2 Batch 480/1540] avg loss 0.00726773, throughput 2.1848K wps
[Epoch 2 Batch 510/1540] avg loss 0.00717979, throughput 2.19203K wps
[Epoch 2 Batch 540/1540] avg loss 0.00746523, throughput 2.18533K wps
[Epoch 2 Batch 570/1540] avg loss 0.00745223, throughput 2.1838K wps
[Epoch 2 Batch 600/1540] avg loss 0.00727895, throughput 2.16961K wps
[Epoch 2 Batch 630/1540] avg loss 0.00773317, throughput 2.1919K wps
[Epoch 2 Batch 660/1540] avg loss 0.00737264, throughput 2.18266K wps
[Epoch 2 Batch 690/1540] avg loss 0.00694443, throughput 2.18187K wps
[Epoch 2 Batch 720/1540] avg loss 0.00759591, throughput 2.19277K wps
[Epoch 2 Batch 750/1540] avg loss 0.00742209, throughput 2.17429K wps
[Epoch 2 Batch 780/1540] avg loss 0.00713577, throughput 2.17897K wps
[Epoch 2 Batch 810/1540] avg loss 0.00746203, throughput 2.18836K wps
[Epoch 2 Batch 840/1540] avg loss 0.00741007, throughput 2.19545K wps
[Epoch 2 Batch 870/1540] avg loss 0.00689533, throughput 2.19191K wps
[Epoch 2 Batch 900/1540] avg loss 0.00761104, throughput 2.16871K wps
[Epoch 2 Batch 930/1540] avg loss 0.00734173, throughput 2.15352K wps
[Epoch 2 Batch 960/1540] avg loss 0.00722129, throughput 2.17858K wps
[Epoch 2 Batch 990/1540] avg loss 0.00747408, throughput 2.17546K wps
[Epoch 2 Batch 1020/1540] avg loss 0.0074088, throughput 2.16652K wps
[Epoch 2 Batch 1050/1540] avg loss 0.00756083, throughput 2.1924K wps
[Epoch 2 Batch 1080/1540] avg loss 0.00702031, throughput 2.18893K wps
[Epoch 2 Batch 1110/1540] avg loss 0.00714915, throughput 2.18609K wps
[Epoch 2 Batch 1140/1540] avg loss 0.00705831, throughput 2.18553K wps
[Epoch 2 Batch 1170/1540] avg loss 0.00694349, throughput 2.18754K wps
[Epoch 2 Batch 1200/1540] avg loss 0.00734301, throughput 2.19119K wps
[Epoch 2 Batch 1230/1540] avg loss 0.00694893, throughput 2.18002K wps
[Epoch 2 Batch 1260/1540] avg loss 0.00712878, throughput 2.19184K wps
[Epoch 2 Batch 1290/1540] avg loss 0.00724544, throughput 2.18753K wps
[Epoch 2 Batch 1320/1540] avg loss 0.00736845, throughput 2.18757K wps
[Epoch 2 Batch 1350/1540] avg loss 0.00708683, throughput 2.16674K wps
[Epoch 2 Batch 1380/1540] avg loss 0.00730155, throughput 2.18195K wps
[Epoch 2 Batch 1410/1540] avg loss 0.00665908, throughput 2.17735K wps
[Epoch 2 Batch 1440/1540] avg loss 0.00718993, throughput 2.18159K wps
[Epoch 2 Batch 1470/1540] avg loss 0.00679871, throughput 2.16232K wps
[Epoch 2 Batch 1500/1540] avg loss 0.00650102, throughput 2.17948K wps
[Epoch 2 Batch 1530/1540] avg loss 0.00695368, throughput 2.18966K wps
Begin Testing...
[Epoch 2] train avg loss 0.00734031, dev acc 0.8200, dev avg loss 0.39756, throughput 2.18298K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 3 Batch 30/1540] avg loss 0.00625909, throughput 2.22288K wps
[Epoch 3 Batch 60/1540] avg loss 0.00643339, throughput 2.19166K wps
[Epoch 3 Batch 90/1540] avg loss 0.00684464, throughput 2.19357K wps
[Epoch 3 Batch 120/1540] avg loss 0.00688752, throughput 2.19358K wps
[Epoch 3 Batch 150/1540] avg loss 0.00654861, throughput 2.18242K wps
[Epoch 3 Batch 180/1540] avg loss 0.00689108, throughput 2.18235K wps
[Epoch 3 Batch 210/1540] avg loss 0.00627219, throughput 2.17839K wps
[Epoch 3 Batch 240/1540] avg loss 0.00760698, throughput 2.18433K wps
[Epoch 3 Batch 270/1540] avg loss 0.00684796, throughput 2.19042K wps
[Epoch 3 Batch 300/1540] avg loss 0.00654331, throughput 2.1929K wps
[Epoch 3 Batch 330/1540] avg loss 0.00693111, throughput 2.16811K wps
[Epoch 3 Batch 360/1540] avg loss 0.00657177, throughput 2.16796K wps
[Epoch 3 Batch 390/1540] avg loss 0.00725869, throughput 2.19003K wps
[Epoch 3 Batch 420/1540] avg loss 0.00719763, throughput 2.19044K wps
[Epoch 3 Batch 450/1540] avg loss 0.00707039, throughput 2.19122K wps
[Epoch 3 Batch 480/1540] avg loss 0.0067903, throughput 2.18932K wps
[Epoch 3 Batch 510/1540] avg loss 0.00701761, throughput 2.18094K wps
[Epoch 3 Batch 540/1540] avg loss 0.00689255, throughput 2.18854K wps
[Epoch 3 Batch 570/1540] avg loss 0.00648468, throughput 2.15104K wps
[Epoch 3 Batch 600/1540] avg loss 0.00684725, throughput 2.17149K wps
[Epoch 3 Batch 630/1540] avg loss 0.00676565, throughput 2.18788K wps
[Epoch 3 Batch 660/1540] avg loss 0.00690395, throughput 2.18955K wps
[Epoch 3 Batch 690/1540] avg loss 0.0069088, throughput 2.18428K wps
[Epoch 3 Batch 720/1540] avg loss 0.00681447, throughput 2.16519K wps
[Epoch 3 Batch 750/1540] avg loss 0.00623585, throughput 2.18974K wps
[Epoch 3 Batch 780/1540] avg loss 0.0066702, throughput 2.18467K wps
[Epoch 3 Batch 810/1540] avg loss 0.00642812, throughput 2.19031K wps
[Epoch 3 Batch 840/1540] avg loss 0.006665, throughput 2.17995K wps
[Epoch 3 Batch 870/1540] avg loss 0.00657606, throughput 2.1876K wps
[Epoch 3 Batch 900/1540] avg loss 0.00685672, throughput 2.19341K wps
[Epoch 3 Batch 930/1540] avg loss 0.00631167, throughput 2.19115K wps
[Epoch 3 Batch 960/1540] avg loss 0.00649842, throughput 2.18663K wps
[Epoch 3 Batch 990/1540] avg loss 0.00658228, throughput 2.17591K wps
[Epoch 3 Batch 1020/1540] avg loss 0.00678823, throughput 2.18421K wps
[Epoch 3 Batch 1050/1540] avg loss 0.00629843, throughput 2.19003K wps
[Epoch 3 Batch 1080/1540] avg loss 0.0066577, throughput 2.15888K wps
[Epoch 3 Batch 1110/1540] avg loss 0.00690787, throughput 2.1926K wps
[Epoch 3 Batch 1140/1540] avg loss 0.0064153, throughput 2.18898K wps
[Epoch 3 Batch 1170/1540] avg loss 0.00623393, throughput 2.16927K wps
[Epoch 3 Batch 1200/1540] avg loss 0.00639911, throughput 2.14199K wps
[Epoch 3 Batch 1230/1540] avg loss 0.00649021, throughput 2.17761K wps
[Epoch 3 Batch 1260/1540] avg loss 0.00659273, throughput 2.19078K wps
[Epoch 3 Batch 1290/1540] avg loss 0.00675549, throughput 2.1885K wps
[Epoch 3 Batch 1320/1540] avg loss 0.00679249, throughput 2.14925K wps
[Epoch 3 Batch 1350/1540] avg loss 0.00675375, throughput 2.19239K wps
[Epoch 3 Batch 1380/1540] avg loss 0.00662192, throughput 2.17328K wps
[Epoch 3 Batch 1410/1540] avg loss 0.00674051, throughput 2.19025K wps
[Epoch 3 Batch 1440/1540] avg loss 0.00632266, throughput 2.18472K wps
[Epoch 3 Batch 1470/1540] avg loss 0.00640008, throughput 2.18262K wps
[Epoch 3 Batch 1500/1540] avg loss 0.00647266, throughput 2.15683K wps
[Epoch 3 Batch 1530/1540] avg loss 0.00671316, throughput 2.18858K wps
Begin Testing...
[Epoch 3] train avg loss 0.00668237, dev acc 0.8280, dev avg loss 0.393289, throughput 2.18247K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 4 Batch 30/1540] avg loss 0.00686973, throughput 2.22737K wps
[Epoch 4 Batch 60/1540] avg loss 0.0065109, throughput 2.19125K wps
[Epoch 4 Batch 90/1540] avg loss 0.00679182, throughput 2.19026K wps
[Epoch 4 Batch 120/1540] avg loss 0.00633387, throughput 2.19026K wps
[Epoch 4 Batch 150/1540] avg loss 0.00640282, throughput 2.19331K wps
[Epoch 4 Batch 180/1540] avg loss 0.00604922, throughput 2.18482K wps
[Epoch 4 Batch 210/1540] avg loss 0.00630005, throughput 2.16137K wps
[Epoch 4 Batch 240/1540] avg loss 0.00635855, throughput 2.17795K wps
[Epoch 4 Batch 270/1540] avg loss 0.006119, throughput 2.19082K wps
[Epoch 4 Batch 300/1540] avg loss 0.00601484, throughput 2.17614K wps
[Epoch 4 Batch 330/1540] avg loss 0.0062196, throughput 2.18067K wps
[Epoch 4 Batch 360/1540] avg loss 0.00656012, throughput 2.18868K wps
[Epoch 4 Batch 390/1540] avg loss 0.00616505, throughput 2.19247K wps
[Epoch 4 Batch 420/1540] avg loss 0.00624489, throughput 2.18767K wps
[Epoch 4 Batch 450/1540] avg loss 0.00648398, throughput 2.16908K wps
[Epoch 4 Batch 480/1540] avg loss 0.00603878, throughput 2.1861K wps
[Epoch 4 Batch 510/1540] avg loss 0.00615059, throughput 2.18769K wps
[Epoch 4 Batch 540/1540] avg loss 0.00582084, throughput 2.17346K wps
[Epoch 4 Batch 570/1540] avg loss 0.00680851, throughput 2.17126K wps
[Epoch 4 Batch 600/1540] avg loss 0.00607108, throughput 2.18193K wps
[Epoch 4 Batch 630/1540] avg loss 0.00614206, throughput 2.19123K wps
[Epoch 4 Batch 660/1540] avg loss 0.0062191, throughput 2.17903K wps
[Epoch 4 Batch 690/1540] avg loss 0.00591864, throughput 2.1436K wps
[Epoch 4 Batch 720/1540] avg loss 0.00637597, throughput 2.18581K wps
[Epoch 4 Batch 750/1540] avg loss 0.00645682, throughput 2.16594K wps
[Epoch 4 Batch 780/1540] avg loss 0.00585897, throughput 2.19048K wps
[Epoch 4 Batch 810/1540] avg loss 0.00628391, throughput 2.1783K wps
[Epoch 4 Batch 840/1540] avg loss 0.00629204, throughput 2.18654K wps
[Epoch 4 Batch 870/1540] avg loss 0.00620559, throughput 2.1901K wps
[Epoch 4 Batch 900/1540] avg loss 0.00588295, throughput 2.19299K wps
[Epoch 4 Batch 930/1540] avg loss 0.00595568, throughput 2.1932K wps
[Epoch 4 Batch 960/1540] avg loss 0.00642588, throughput 2.19238K wps
[Epoch 4 Batch 990/1540] avg loss 0.00640422, throughput 2.19229K wps
[Epoch 4 Batch 1020/1540] avg loss 0.00622265, throughput 2.18811K wps
[Epoch 4 Batch 1050/1540] avg loss 0.00621473, throughput 2.18932K wps
[Epoch 4 Batch 1080/1540] avg loss 0.00591483, throughput 2.1788K wps
[Epoch 4 Batch 1110/1540] avg loss 0.0062081, throughput 2.17123K wps
[Epoch 4 Batch 1140/1540] avg loss 0.00608949, throughput 2.18455K wps
[Epoch 4 Batch 1170/1540] avg loss 0.0061651, throughput 2.18599K wps
[Epoch 4 Batch 1200/1540] avg loss 0.0062469, throughput 2.17444K wps
[Epoch 4 Batch 1230/1540] avg loss 0.00617616, throughput 2.15345K wps
[Epoch 4 Batch 1260/1540] avg loss 0.00604452, throughput 2.18694K wps
[Epoch 4 Batch 1290/1540] avg loss 0.0061907, throughput 2.19073K wps
[Epoch 4 Batch 1320/1540] avg loss 0.00573968, throughput 2.18458K wps
[Epoch 4 Batch 1350/1540] avg loss 0.00608386, throughput 2.19376K wps
[Epoch 4 Batch 1380/1540] avg loss 0.00600289, throughput 2.18895K wps
[Epoch 4 Batch 1410/1540] avg loss 0.00632098, throughput 2.1877K wps
[Epoch 4 Batch 1440/1540] avg loss 0.00569522, throughput 2.1929K wps
[Epoch 4 Batch 1470/1540] avg loss 0.00612351, throughput 2.17642K wps
[Epoch 4 Batch 1500/1540] avg loss 0.00585735, throughput 2.19117K wps
[Epoch 4 Batch 1530/1540] avg loss 0.0060293, throughput 2.18651K wps
Begin Testing...
[Epoch 4] train avg loss 0.00619849, dev acc 0.8257, dev avg loss 0.382176, throughput 2.18417K wps
[Epoch 5 Batch 30/1540] avg loss 0.0057581, throughput 2.22291K wps
[Epoch 5 Batch 60/1540] avg loss 0.00608666, throughput 2.17622K wps
[Epoch 5 Batch 90/1540] avg loss 0.00564682, throughput 2.17268K wps
[Epoch 5 Batch 120/1540] avg loss 0.0059381, throughput 2.175K wps
[Epoch 5 Batch 150/1540] avg loss 0.00573029, throughput 2.19346K wps
[Epoch 5 Batch 180/1540] avg loss 0.00574626, throughput 2.18392K wps
[Epoch 5 Batch 210/1540] avg loss 0.00555524, throughput 2.1867K wps
[Epoch 5 Batch 240/1540] avg loss 0.00611158, throughput 2.19048K wps
[Epoch 5 Batch 270/1540] avg loss 0.00564137, throughput 2.18809K wps
[Epoch 5 Batch 300/1540] avg loss 0.00541585, throughput 2.18043K wps
[Epoch 5 Batch 330/1540] avg loss 0.00599037, throughput 2.18935K wps
[Epoch 5 Batch 360/1540] avg loss 0.00595338, throughput 2.17813K wps
[Epoch 5 Batch 390/1540] avg loss 0.00658597, throughput 2.19002K wps
[Epoch 5 Batch 420/1540] avg loss 0.00588466, throughput 2.19283K wps
[Epoch 5 Batch 450/1540] avg loss 0.00568212, throughput 2.16709K wps
[Epoch 5 Batch 480/1540] avg loss 0.00576093, throughput 2.18799K wps
[Epoch 5 Batch 510/1540] avg loss 0.00574451, throughput 2.18953K wps
[Epoch 5 Batch 540/1540] avg loss 0.00607059, throughput 2.17518K wps
[Epoch 5 Batch 570/1540] avg loss 0.00560679, throughput 2.18415K wps
[Epoch 5 Batch 600/1540] avg loss 0.00628777, throughput 2.18557K wps
[Epoch 5 Batch 630/1540] avg loss 0.00599882, throughput 2.18563K wps
[Epoch 5 Batch 660/1540] avg loss 0.00598413, throughput 2.18899K wps
[Epoch 5 Batch 690/1540] avg loss 0.00567523, throughput 2.19081K wps
[Epoch 5 Batch 720/1540] avg loss 0.00609623, throughput 2.19011K wps
[Epoch 5 Batch 750/1540] avg loss 0.00574571, throughput 2.18828K wps
[Epoch 5 Batch 780/1540] avg loss 0.00599716, throughput 2.1814K wps
[Epoch 5 Batch 810/1540] avg loss 0.00550119, throughput 2.1882K wps
[Epoch 5 Batch 840/1540] avg loss 0.00602774, throughput 2.1881K wps
[Epoch 5 Batch 870/1540] avg loss 0.00595282, throughput 2.18222K wps
[Epoch 5 Batch 900/1540] avg loss 0.00568906, throughput 2.17423K wps
[Epoch 5 Batch 930/1540] avg loss 0.00611099, throughput 2.19334K wps
[Epoch 5 Batch 960/1540] avg loss 0.00572438, throughput 2.1847K wps
[Epoch 5 Batch 990/1540] avg loss 0.00564751, throughput 2.17815K wps
[Epoch 5 Batch 1020/1540] avg loss 0.00515724, throughput 2.17294K wps
[Epoch 5 Batch 1050/1540] avg loss 0.0055974, throughput 2.19325K wps
[Epoch 5 Batch 1080/1540] avg loss 0.00539506, throughput 2.17838K wps
[Epoch 5 Batch 1110/1540] avg loss 0.0058026, throughput 2.19366K wps
[Epoch 5 Batch 1140/1540] avg loss 0.00599375, throughput 2.19059K wps
[Epoch 5 Batch 1170/1540] avg loss 0.00571249, throughput 2.17506K wps
[Epoch 5 Batch 1200/1540] avg loss 0.00535521, throughput 2.18891K wps
[Epoch 5 Batch 1230/1540] avg loss 0.00558469, throughput 2.17012K wps
[Epoch 5 Batch 1260/1540] avg loss 0.00582353, throughput 2.16569K wps
[Epoch 5 Batch 1290/1540] avg loss 0.00563931, throughput 2.18378K wps
[Epoch 5 Batch 1320/1540] avg loss 0.00618556, throughput 2.18556K wps
[Epoch 5 Batch 1350/1540] avg loss 0.00557759, throughput 2.18566K wps
[Epoch 5 Batch 1380/1540] avg loss 0.00558943, throughput 2.18842K wps
[Epoch 5 Batch 1410/1540] avg loss 0.00547408, throughput 2.18899K wps
[Epoch 5 Batch 1440/1540] avg loss 0.00616519, throughput 2.19051K wps
[Epoch 5 Batch 1470/1540] avg loss 0.00572483, throughput 2.19074K wps
[Epoch 5 Batch 1500/1540] avg loss 0.00551094, throughput 2.1901K wps
[Epoch 5 Batch 1530/1540] avg loss 0.00560693, throughput 2.18889K wps
Begin Testing...
[Epoch 5] train avg loss 0.00578997, dev acc 0.8268, dev avg loss 0.381168, throughput 2.18527K wps
[Epoch 6 Batch 30/1540] avg loss 0.00547445, throughput 2.22197K wps
[Epoch 6 Batch 60/1540] avg loss 0.00543892, throughput 2.18704K wps
[Epoch 6 Batch 90/1540] avg loss 0.00575667, throughput 2.18273K wps
[Epoch 6 Batch 120/1540] avg loss 0.00555986, throughput 2.17462K wps
[Epoch 6 Batch 150/1540] avg loss 0.00596152, throughput 2.18696K wps
[Epoch 6 Batch 180/1540] avg loss 0.0051169, throughput 2.18699K wps
[Epoch 6 Batch 210/1540] avg loss 0.00552336, throughput 2.18063K wps
[Epoch 6 Batch 240/1540] avg loss 0.00621476, throughput 2.17766K wps
[Epoch 6 Batch 270/1540] avg loss 0.00558695, throughput 2.19409K wps
[Epoch 6 Batch 300/1540] avg loss 0.00537249, throughput 2.19182K wps
[Epoch 6 Batch 330/1540] avg loss 0.00511943, throughput 2.18524K wps
[Epoch 6 Batch 360/1540] avg loss 0.0056162, throughput 2.18921K wps
[Epoch 6 Batch 390/1540] avg loss 0.00562585, throughput 2.18278K wps
[Epoch 6 Batch 420/1540] avg loss 0.00545609, throughput 2.1797K wps
[Epoch 6 Batch 450/1540] avg loss 0.00565773, throughput 2.1832K wps
[Epoch 6 Batch 480/1540] avg loss 0.00594171, throughput 2.18764K wps
[Epoch 6 Batch 510/1540] avg loss 0.00503165, throughput 2.16963K wps
[Epoch 6 Batch 540/1540] avg loss 0.0053907, throughput 2.19006K wps
[Epoch 6 Batch 570/1540] avg loss 0.00535069, throughput 2.18747K wps
[Epoch 6 Batch 600/1540] avg loss 0.00563281, throughput 2.19135K wps
[Epoch 6 Batch 630/1540] avg loss 0.00545637, throughput 2.18285K wps
[Epoch 6 Batch 660/1540] avg loss 0.00526063, throughput 2.18265K wps
[Epoch 6 Batch 690/1540] avg loss 0.00508871, throughput 2.18617K wps
[Epoch 6 Batch 720/1540] avg loss 0.00557678, throughput 2.16867K wps
[Epoch 6 Batch 750/1540] avg loss 0.00549832, throughput 2.17006K wps
[Epoch 6 Batch 780/1540] avg loss 0.00534404, throughput 2.19406K wps
[Epoch 6 Batch 810/1540] avg loss 0.00553857, throughput 2.18806K wps
[Epoch 6 Batch 840/1540] avg loss 0.00553592, throughput 2.17822K wps
[Epoch 6 Batch 870/1540] avg loss 0.00582576, throughput 2.18256K wps
[Epoch 6 Batch 900/1540] avg loss 0.00534536, throughput 2.16677K wps
[Epoch 6 Batch 930/1540] avg loss 0.00541325, throughput 2.1898K wps
[Epoch 6 Batch 960/1540] avg loss 0.00560583, throughput 2.1906K wps
[Epoch 6 Batch 990/1540] avg loss 0.00530816, throughput 2.1794K wps
[Epoch 6 Batch 1020/1540] avg loss 0.00534103, throughput 2.17782K wps
[Epoch 6 Batch 1050/1540] avg loss 0.0051274, throughput 2.19094K wps
[Epoch 6 Batch 1080/1540] avg loss 0.00572717, throughput 2.18131K wps
[Epoch 6 Batch 1110/1540] avg loss 0.0053671, throughput 2.18732K wps
[Epoch 6 Batch 1140/1540] avg loss 0.00566715, throughput 2.17786K wps
[Epoch 6 Batch 1170/1540] avg loss 0.00543728, throughput 2.18582K wps
[Epoch 6 Batch 1200/1540] avg loss 0.00551025, throughput 2.18123K wps
[Epoch 6 Batch 1230/1540] avg loss 0.00529047, throughput 2.18785K wps
[Epoch 6 Batch 1260/1540] avg loss 0.00535439, throughput 2.18586K wps
[Epoch 6 Batch 1290/1540] avg loss 0.00549463, throughput 2.18903K wps
[Epoch 6 Batch 1320/1540] avg loss 0.00509551, throughput 2.18939K wps
[Epoch 6 Batch 1350/1540] avg loss 0.00571453, throughput 2.18758K wps
[Epoch 6 Batch 1380/1540] avg loss 0.00513685, throughput 2.18865K wps
[Epoch 6 Batch 1410/1540] avg loss 0.00538781, throughput 2.18362K wps
[Epoch 6 Batch 1440/1540] avg loss 0.00537788, throughput 2.14462K wps
[Epoch 6 Batch 1470/1540] avg loss 0.00537866, throughput 2.18346K wps
[Epoch 6 Batch 1500/1540] avg loss 0.0052571, throughput 2.19083K wps
[Epoch 6 Batch 1530/1540] avg loss 0.00549076, throughput 2.17992K wps
Begin Testing...
[Epoch 6] train avg loss 0.00546244, dev acc 0.8268, dev avg loss 0.379096, throughput 2.18396K wps
[Epoch 7 Batch 30/1540] avg loss 0.00514174, throughput 2.23833K wps
[Epoch 7 Batch 60/1540] avg loss 0.00548631, throughput 2.19089K wps
[Epoch 7 Batch 90/1540] avg loss 0.00476701, throughput 2.19236K wps
[Epoch 7 Batch 120/1540] avg loss 0.00489656, throughput 2.18942K wps
[Epoch 7 Batch 150/1540] avg loss 0.00524228, throughput 2.19363K wps
[Epoch 7 Batch 180/1540] avg loss 0.00515995, throughput 2.19019K wps
[Epoch 7 Batch 210/1540] avg loss 0.00515029, throughput 2.17975K wps
[Epoch 7 Batch 240/1540] avg loss 0.00590719, throughput 2.15477K wps
[Epoch 7 Batch 270/1540] avg loss 0.00544098, throughput 2.15478K wps
[Epoch 7 Batch 300/1540] avg loss 0.00516657, throughput 2.18874K wps
[Epoch 7 Batch 330/1540] avg loss 0.00498113, throughput 2.1774K wps
[Epoch 7 Batch 360/1540] avg loss 0.00507701, throughput 2.18002K wps
[Epoch 7 Batch 390/1540] avg loss 0.00498559, throughput 2.1848K wps
[Epoch 7 Batch 420/1540] avg loss 0.00535787, throughput 2.19305K wps
[Epoch 7 Batch 450/1540] avg loss 0.00549037, throughput 2.18189K wps
[Epoch 7 Batch 480/1540] avg loss 0.00488173, throughput 2.1923K wps
[Epoch 7 Batch 510/1540] avg loss 0.00531228, throughput 2.19458K wps
[Epoch 7 Batch 540/1540] avg loss 0.00524713, throughput 2.19332K wps
[Epoch 7 Batch 570/1540] avg loss 0.00525212, throughput 2.18826K wps
[Epoch 7 Batch 600/1540] avg loss 0.00506161, throughput 2.16175K wps
[Epoch 7 Batch 630/1540] avg loss 0.00513756, throughput 2.16588K wps
[Epoch 7 Batch 660/1540] avg loss 0.00543696, throughput 2.18487K wps
[Epoch 7 Batch 690/1540] avg loss 0.00540713, throughput 2.17609K wps
[Epoch 7 Batch 720/1540] avg loss 0.0052375, throughput 2.18661K wps
[Epoch 7 Batch 750/1540] avg loss 0.00530475, throughput 2.18518K wps
[Epoch 7 Batch 780/1540] avg loss 0.00500625, throughput 2.1897K wps
[Epoch 7 Batch 810/1540] avg loss 0.00483653, throughput 2.18886K wps
[Epoch 7 Batch 840/1540] avg loss 0.00536221, throughput 2.18386K wps
[Epoch 7 Batch 870/1540] avg loss 0.00553102, throughput 2.19071K wps
[Epoch 7 Batch 900/1540] avg loss 0.00490935, throughput 2.19198K wps
[Epoch 7 Batch 930/1540] avg loss 0.00454776, throughput 2.16071K wps
[Epoch 7 Batch 960/1540] avg loss 0.00556403, throughput 2.19057K wps
[Epoch 7 Batch 990/1540] avg loss 0.00509371, throughput 2.19095K wps
[Epoch 7 Batch 1020/1540] avg loss 0.0049287, throughput 2.17818K wps
[Epoch 7 Batch 1050/1540] avg loss 0.00482849, throughput 2.18534K wps
[Epoch 7 Batch 1080/1540] avg loss 0.00469698, throughput 2.18663K wps
[Epoch 7 Batch 1110/1540] avg loss 0.00500886, throughput 2.19308K wps
[Epoch 7 Batch 1140/1540] avg loss 0.00544324, throughput 2.18909K wps
[Epoch 7 Batch 1170/1540] avg loss 0.00510876, throughput 2.18858K wps
[Epoch 7 Batch 1200/1540] avg loss 0.00511026, throughput 2.1812K wps
[Epoch 7 Batch 1230/1540] avg loss 0.00504326, throughput 2.19541K wps
[Epoch 7 Batch 1260/1540] avg loss 0.00507618, throughput 2.18632K wps
[Epoch 7 Batch 1290/1540] avg loss 0.00521584, throughput 2.18601K wps
[Epoch 7 Batch 1320/1540] avg loss 0.00530556, throughput 2.19335K wps
[Epoch 7 Batch 1350/1540] avg loss 0.00482158, throughput 2.19429K wps
[Epoch 7 Batch 1380/1540] avg loss 0.00508076, throughput 2.18377K wps
[Epoch 7 Batch 1410/1540] avg loss 0.00479496, throughput 2.19003K wps
[Epoch 7 Batch 1440/1540] avg loss 0.00556662, throughput 2.18261K wps
[Epoch 7 Batch 1470/1540] avg loss 0.00509233, throughput 2.19135K wps
[Epoch 7 Batch 1500/1540] avg loss 0.00503577, throughput 2.17342K wps
[Epoch 7 Batch 1530/1540] avg loss 0.00501703, throughput 2.19447K wps
Begin Testing...
[Epoch 7] train avg loss 0.00514836, dev acc 0.8234, dev avg loss 0.380037, throughput 2.18594K wps
[Epoch 8 Batch 30/1540] avg loss 0.00452625, throughput 2.22621K wps
[Epoch 8 Batch 60/1540] avg loss 0.00489189, throughput 2.18599K wps
[Epoch 8 Batch 90/1540] avg loss 0.00461191, throughput 2.17946K wps
[Epoch 8 Batch 120/1540] avg loss 0.00495884, throughput 2.18959K wps
[Epoch 8 Batch 150/1540] avg loss 0.00483539, throughput 2.18992K wps
[Epoch 8 Batch 180/1540] avg loss 0.00518846, throughput 2.18409K wps
[Epoch 8 Batch 210/1540] avg loss 0.0048673, throughput 2.15902K wps
[Epoch 8 Batch 240/1540] avg loss 0.00494976, throughput 2.19371K wps
[Epoch 8 Batch 270/1540] avg loss 0.00460597, throughput 2.18974K wps
[Epoch 8 Batch 300/1540] avg loss 0.00484513, throughput 2.19052K wps
[Epoch 8 Batch 330/1540] avg loss 0.00471401, throughput 2.18798K wps
[Epoch 8 Batch 360/1540] avg loss 0.00453015, throughput 2.18999K wps
[Epoch 8 Batch 390/1540] avg loss 0.00454287, throughput 2.17413K wps
[Epoch 8 Batch 420/1540] avg loss 0.00453497, throughput 2.16081K wps
[Epoch 8 Batch 450/1540] avg loss 0.00487147, throughput 2.1773K wps
[Epoch 8 Batch 480/1540] avg loss 0.00489274, throughput 2.19421K wps
[Epoch 8 Batch 510/1540] avg loss 0.00486256, throughput 2.18138K wps
[Epoch 8 Batch 540/1540] avg loss 0.00468679, throughput 2.18694K wps
[Epoch 8 Batch 570/1540] avg loss 0.00529781, throughput 2.19026K wps
[Epoch 8 Batch 600/1540] avg loss 0.00516958, throughput 2.1948K wps
[Epoch 8 Batch 630/1540] avg loss 0.00505212, throughput 2.19417K wps
[Epoch 8 Batch 660/1540] avg loss 0.00459479, throughput 2.17707K wps
[Epoch 8 Batch 690/1540] avg loss 0.00527486, throughput 2.18776K wps
[Epoch 8 Batch 720/1540] avg loss 0.0047314, throughput 2.19108K wps
[Epoch 8 Batch 750/1540] avg loss 0.00509074, throughput 2.18944K wps
[Epoch 8 Batch 780/1540] avg loss 0.00486889, throughput 2.18214K wps
[Epoch 8 Batch 810/1540] avg loss 0.00483739, throughput 2.19524K wps
[Epoch 8 Batch 840/1540] avg loss 0.00482667, throughput 2.19318K wps
[Epoch 8 Batch 870/1540] avg loss 0.00488004, throughput 2.16443K wps
[Epoch 8 Batch 900/1540] avg loss 0.00463429, throughput 2.18085K wps
[Epoch 8 Batch 930/1540] avg loss 0.00468125, throughput 2.166K wps
[Epoch 8 Batch 960/1540] avg loss 0.00450353, throughput 2.18749K wps
[Epoch 8 Batch 990/1540] avg loss 0.00506424, throughput 2.19195K wps
[Epoch 8 Batch 1020/1540] avg loss 0.00497122, throughput 2.18707K wps
[Epoch 8 Batch 1050/1540] avg loss 0.00476792, throughput 2.18304K wps
[Epoch 8 Batch 1080/1540] avg loss 0.00443715, throughput 2.18396K wps
[Epoch 8 Batch 1110/1540] avg loss 0.00513939, throughput 2.18587K wps
[Epoch 8 Batch 1140/1540] avg loss 0.00488853, throughput 2.18538K wps
[Epoch 8 Batch 1170/1540] avg loss 0.00501128, throughput 2.18933K wps
[Epoch 8 Batch 1200/1540] avg loss 0.00578997, throughput 2.18039K wps
[Epoch 8 Batch 1230/1540] avg loss 0.00533665, throughput 2.18584K wps
[Epoch 8 Batch 1260/1540] avg loss 0.00485614, throughput 2.17043K wps
[Epoch 8 Batch 1290/1540] avg loss 0.00471365, throughput 2.16236K wps
[Epoch 8 Batch 1320/1540] avg loss 0.00497634, throughput 2.15514K wps
[Epoch 8 Batch 1350/1540] avg loss 0.00462865, throughput 2.19509K wps
[Epoch 8 Batch 1380/1540] avg loss 0.00494115, throughput 2.19287K wps
[Epoch 8 Batch 1410/1540] avg loss 0.0045927, throughput 2.17402K wps
[Epoch 8 Batch 1440/1540] avg loss 0.00458841, throughput 2.18649K wps
[Epoch 8 Batch 1470/1540] avg loss 0.00493182, throughput 2.18238K wps
[Epoch 8 Batch 1500/1540] avg loss 0.00450633, throughput 2.16926K wps
[Epoch 8 Batch 1530/1540] avg loss 0.0046761, throughput 2.19116K wps
Begin Testing...
[Epoch 8] train avg loss 0.00484995, dev acc 0.8245, dev avg loss 0.379135, throughput 2.18412K wps
[Epoch 9 Batch 30/1540] avg loss 0.00483205, throughput 2.226K wps
[Epoch 9 Batch 60/1540] avg loss 0.00470118, throughput 2.1847K wps
[Epoch 9 Batch 90/1540] avg loss 0.00446546, throughput 2.16215K wps
[Epoch 9 Batch 120/1540] avg loss 0.00477733, throughput 2.18221K wps
[Epoch 9 Batch 150/1540] avg loss 0.00505064, throughput 2.19287K wps
[Epoch 9 Batch 180/1540] avg loss 0.00471643, throughput 2.17507K wps
[Epoch 9 Batch 210/1540] avg loss 0.00491572, throughput 2.1917K wps
[Epoch 9 Batch 240/1540] avg loss 0.00456517, throughput 2.18966K wps
[Epoch 9 Batch 270/1540] avg loss 0.00435249, throughput 2.18462K wps
[Epoch 9 Batch 300/1540] avg loss 0.00475061, throughput 2.18703K wps
[Epoch 9 Batch 330/1540] avg loss 0.00436934, throughput 2.1752K wps
[Epoch 9 Batch 360/1540] avg loss 0.00468119, throughput 2.16658K wps
[Epoch 9 Batch 390/1540] avg loss 0.00429742, throughput 2.1655K wps
[Epoch 9 Batch 420/1540] avg loss 0.00449457, throughput 2.1861K wps
[Epoch 9 Batch 450/1540] avg loss 0.004253, throughput 2.18135K wps
[Epoch 9 Batch 480/1540] avg loss 0.0047885, throughput 2.17987K wps
[Epoch 9 Batch 510/1540] avg loss 0.00462768, throughput 2.19303K wps
[Epoch 9 Batch 540/1540] avg loss 0.00431087, throughput 2.18998K wps
[Epoch 9 Batch 570/1540] avg loss 0.00496256, throughput 2.17386K wps
[Epoch 9 Batch 600/1540] avg loss 0.00452238, throughput 2.15961K wps
[Epoch 9 Batch 630/1540] avg loss 0.00395195, throughput 2.14865K wps
[Epoch 9 Batch 660/1540] avg loss 0.00506452, throughput 2.18846K wps
[Epoch 9 Batch 690/1540] avg loss 0.00405242, throughput 2.19188K wps
[Epoch 9 Batch 720/1540] avg loss 0.00470619, throughput 2.17237K wps
[Epoch 9 Batch 750/1540] avg loss 0.00434454, throughput 2.18585K wps
[Epoch 9 Batch 780/1540] avg loss 0.00494692, throughput 2.19274K wps
[Epoch 9 Batch 810/1540] avg loss 0.00468512, throughput 2.18054K wps
[Epoch 9 Batch 840/1540] avg loss 0.00420457, throughput 2.1828K wps
[Epoch 9 Batch 870/1540] avg loss 0.00465349, throughput 2.17424K wps
[Epoch 9 Batch 900/1540] avg loss 0.00458077, throughput 2.17481K wps
[Epoch 9 Batch 930/1540] avg loss 0.0051469, throughput 2.19401K wps
[Epoch 9 Batch 960/1540] avg loss 0.00449443, throughput 2.18937K wps
[Epoch 9 Batch 990/1540] avg loss 0.00449611, throughput 2.18926K wps
[Epoch 9 Batch 1020/1540] avg loss 0.00449254, throughput 2.19054K wps
[Epoch 9 Batch 1050/1540] avg loss 0.00498296, throughput 2.18825K wps
[Epoch 9 Batch 1080/1540] avg loss 0.00458597, throughput 2.15744K wps
[Epoch 9 Batch 1110/1540] avg loss 0.0043775, throughput 2.16833K wps
[Epoch 9 Batch 1140/1540] avg loss 0.00472892, throughput 2.15741K wps
[Epoch 9 Batch 1170/1540] avg loss 0.00475775, throughput 2.19145K wps
[Epoch 9 Batch 1200/1540] avg loss 0.00456249, throughput 2.18502K wps
[Epoch 9 Batch 1230/1540] avg loss 0.00475854, throughput 2.19058K wps
[Epoch 9 Batch 1260/1540] avg loss 0.00474601, throughput 2.19541K wps
[Epoch 9 Batch 1290/1540] avg loss 0.00459218, throughput 2.18908K wps
[Epoch 9 Batch 1320/1540] avg loss 0.0044362, throughput 2.18594K wps
[Epoch 9 Batch 1350/1540] avg loss 0.0044295, throughput 2.18464K wps
[Epoch 9 Batch 1380/1540] avg loss 0.00494138, throughput 2.19259K wps
[Epoch 9 Batch 1410/1540] avg loss 0.00457128, throughput 2.17776K wps
[Epoch 9 Batch 1440/1540] avg loss 0.0043201, throughput 2.18498K wps
[Epoch 9 Batch 1470/1540] avg loss 0.0044673, throughput 2.18518K wps
[Epoch 9 Batch 1500/1540] avg loss 0.00467883, throughput 2.162K wps
[Epoch 9 Batch 1530/1540] avg loss 0.00435506, throughput 2.19285K wps
Begin Testing...
[Epoch 9] train avg loss 0.00460981, dev acc 0.8222, dev avg loss 0.383128, throughput 2.18225K wps
[Epoch 10 Batch 30/1540] avg loss 0.00425644, throughput 2.24274K wps
[Epoch 10 Batch 60/1540] avg loss 0.00436244, throughput 2.18361K wps
[Epoch 10 Batch 90/1540] avg loss 0.00468529, throughput 2.17645K wps
[Epoch 10 Batch 120/1540] avg loss 0.00412172, throughput 2.17858K wps
[Epoch 10 Batch 150/1540] avg loss 0.00420036, throughput 2.18011K wps
[Epoch 10 Batch 180/1540] avg loss 0.00397786, throughput 2.19663K wps
[Epoch 10 Batch 210/1540] avg loss 0.00417534, throughput 2.18996K wps
[Epoch 10 Batch 240/1540] avg loss 0.00487214, throughput 2.17944K wps
[Epoch 10 Batch 270/1540] avg loss 0.0045717, throughput 2.18263K wps
[Epoch 10 Batch 300/1540] avg loss 0.0042247, throughput 2.19086K wps
[Epoch 10 Batch 330/1540] avg loss 0.00405988, throughput 2.18022K wps
[Epoch 10 Batch 360/1540] avg loss 0.00440759, throughput 2.19387K wps
[Epoch 10 Batch 390/1540] avg loss 0.00445743, throughput 2.19543K wps
[Epoch 10 Batch 420/1540] avg loss 0.00413869, throughput 2.15794K wps
[Epoch 10 Batch 450/1540] avg loss 0.00459795, throughput 2.17618K wps
[Epoch 10 Batch 480/1540] avg loss 0.00470132, throughput 2.16811K wps
[Epoch 10 Batch 510/1540] avg loss 0.00391616, throughput 2.19373K wps
[Epoch 10 Batch 540/1540] avg loss 0.00450401, throughput 2.18762K wps
[Epoch 10 Batch 570/1540] avg loss 0.00439884, throughput 2.18234K wps
[Epoch 10 Batch 600/1540] avg loss 0.00474572, throughput 2.16456K wps
[Epoch 10 Batch 630/1540] avg loss 0.00487019, throughput 2.19219K wps
[Epoch 10 Batch 660/1540] avg loss 0.00454852, throughput 2.1776K wps
[Epoch 10 Batch 690/1540] avg loss 0.00404446, throughput 2.17344K wps
[Epoch 10 Batch 720/1540] avg loss 0.00414978, throughput 2.18914K wps
[Epoch 10 Batch 750/1540] avg loss 0.00431906, throughput 2.18905K wps
[Epoch 10 Batch 780/1540] avg loss 0.00461388, throughput 2.19446K wps
[Epoch 10 Batch 810/1540] avg loss 0.00435947, throughput 2.18463K wps
[Epoch 10 Batch 840/1540] avg loss 0.00410748, throughput 2.16423K wps
[Epoch 10 Batch 870/1540] avg loss 0.00482023, throughput 2.16518K wps
[Epoch 10 Batch 900/1540] avg loss 0.00482314, throughput 2.17925K wps
[Epoch 10 Batch 930/1540] avg loss 0.00425523, throughput 2.18258K wps
[Epoch 10 Batch 960/1540] avg loss 0.00454196, throughput 2.16836K wps
[Epoch 10 Batch 990/1540] avg loss 0.00452171, throughput 2.18665K wps
[Epoch 10 Batch 1020/1540] avg loss 0.00441894, throughput 2.1893K wps
[Epoch 10 Batch 1050/1540] avg loss 0.00460524, throughput 2.17456K wps
[Epoch 10 Batch 1080/1540] avg loss 0.00515151, throughput 2.17271K wps
[Epoch 10 Batch 1110/1540] avg loss 0.00450632, throughput 2.19369K wps
[Epoch 10 Batch 1140/1540] avg loss 0.00418497, throughput 2.16606K wps
[Epoch 10 Batch 1170/1540] avg loss 0.00428191, throughput 2.18043K wps
[Epoch 10 Batch 1200/1540] avg loss 0.00417852, throughput 2.16849K wps
[Epoch 10 Batch 1230/1540] avg loss 0.00442869, throughput 2.19025K wps
[Epoch 10 Batch 1260/1540] avg loss 0.00441903, throughput 2.1739K wps
[Epoch 10 Batch 1290/1540] avg loss 0.00477021, throughput 2.16828K wps
[Epoch 10 Batch 1320/1540] avg loss 0.00416839, throughput 2.18326K wps
[Epoch 10 Batch 1350/1540] avg loss 0.00432108, throughput 2.18619K wps
[Epoch 10 Batch 1380/1540] avg loss 0.00426259, throughput 2.18582K wps
[Epoch 10 Batch 1410/1540] avg loss 0.00418355, throughput 2.19017K wps
[Epoch 10 Batch 1440/1540] avg loss 0.00412839, throughput 2.1754K wps
[Epoch 10 Batch 1470/1540] avg loss 0.00416663, throughput 2.18612K wps
[Epoch 10 Batch 1500/1540] avg loss 0.00450557, throughput 2.18339K wps
[Epoch 10 Batch 1530/1540] avg loss 0.00432796, throughput 2.15578K wps
Begin Testing...
[Epoch 10] train avg loss 0.0044015, dev acc 0.8349, dev avg loss 0.385159, throughput 2.18187K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 11 Batch 30/1540] avg loss 0.00456307, throughput 2.22649K wps
[Epoch 11 Batch 60/1540] avg loss 0.00477982, throughput 2.18463K wps
[Epoch 11 Batch 90/1540] avg loss 0.00378208, throughput 2.18595K wps
[Epoch 11 Batch 120/1540] avg loss 0.00418682, throughput 2.1884K wps
[Epoch 11 Batch 150/1540] avg loss 0.00448236, throughput 2.18945K wps
[Epoch 11 Batch 180/1540] avg loss 0.00436383, throughput 2.18667K wps
[Epoch 11 Batch 210/1540] avg loss 0.0042405, throughput 2.1919K wps
[Epoch 11 Batch 240/1540] avg loss 0.00419048, throughput 2.19155K wps
[Epoch 11 Batch 270/1540] avg loss 0.00401388, throughput 2.18603K wps
[Epoch 11 Batch 300/1540] avg loss 0.00406606, throughput 2.18529K wps
[Epoch 11 Batch 330/1540] avg loss 0.00458725, throughput 2.16521K wps
[Epoch 11 Batch 360/1540] avg loss 0.00398209, throughput 2.18329K wps
[Epoch 11 Batch 390/1540] avg loss 0.00420729, throughput 2.19218K wps
[Epoch 11 Batch 420/1540] avg loss 0.00458051, throughput 2.18581K wps
[Epoch 11 Batch 450/1540] avg loss 0.00402361, throughput 2.19062K wps
[Epoch 11 Batch 480/1540] avg loss 0.00414879, throughput 2.19213K wps
[Epoch 11 Batch 510/1540] avg loss 0.00427151, throughput 2.1774K wps
[Epoch 11 Batch 540/1540] avg loss 0.0041979, throughput 2.17257K wps
[Epoch 11 Batch 570/1540] avg loss 0.00352997, throughput 2.18885K wps
[Epoch 11 Batch 600/1540] avg loss 0.00372118, throughput 2.18924K wps
[Epoch 11 Batch 630/1540] avg loss 0.00420126, throughput 2.16738K wps
[Epoch 11 Batch 660/1540] avg loss 0.00432329, throughput 2.14553K wps
[Epoch 11 Batch 690/1540] avg loss 0.00388147, throughput 2.18974K wps
[Epoch 11 Batch 720/1540] avg loss 0.0040849, throughput 2.16514K wps
[Epoch 11 Batch 750/1540] avg loss 0.00411015, throughput 2.1932K wps
[Epoch 11 Batch 780/1540] avg loss 0.00393489, throughput 2.18751K wps
[Epoch 11 Batch 810/1540] avg loss 0.00426376, throughput 2.18997K wps
[Epoch 11 Batch 840/1540] avg loss 0.00393009, throughput 2.19163K wps
[Epoch 11 Batch 870/1540] avg loss 0.00423012, throughput 2.18966K wps
[Epoch 11 Batch 900/1540] avg loss 0.00431127, throughput 2.18489K wps
[Epoch 11 Batch 930/1540] avg loss 0.00423499, throughput 2.1925K wps
[Epoch 11 Batch 960/1540] avg loss 0.00418623, throughput 2.18522K wps
[Epoch 11 Batch 990/1540] avg loss 0.00425246, throughput 2.17679K wps
[Epoch 11 Batch 1020/1540] avg loss 0.00492302, throughput 2.18279K wps
[Epoch 11 Batch 1050/1540] avg loss 0.00400473, throughput 2.19424K wps
[Epoch 11 Batch 1080/1540] avg loss 0.00381165, throughput 2.19178K wps
[Epoch 11 Batch 1110/1540] avg loss 0.00395076, throughput 2.186K wps
[Epoch 11 Batch 1140/1540] avg loss 0.00458219, throughput 2.18314K wps
[Epoch 11 Batch 1170/1540] avg loss 0.00415582, throughput 2.18739K wps
[Epoch 11 Batch 1200/1540] avg loss 0.00430515, throughput 2.18514K wps
[Epoch 11 Batch 1230/1540] avg loss 0.00382502, throughput 2.19116K wps
[Epoch 11 Batch 1260/1540] avg loss 0.0043187, throughput 2.19362K wps
[Epoch 11 Batch 1290/1540] avg loss 0.00410424, throughput 2.19226K wps
[Epoch 11 Batch 1320/1540] avg loss 0.00430942, throughput 2.17749K wps
[Epoch 11 Batch 1350/1540] avg loss 0.00439527, throughput 2.18911K wps
[Epoch 11 Batch 1380/1540] avg loss 0.00418027, throughput 2.19011K wps
[Epoch 11 Batch 1410/1540] avg loss 0.0040999, throughput 2.18783K wps
[Epoch 11 Batch 1440/1540] avg loss 0.0045369, throughput 2.1793K wps
[Epoch 11 Batch 1470/1540] avg loss 0.00405111, throughput 2.1733K wps
[Epoch 11 Batch 1500/1540] avg loss 0.00447017, throughput 2.15772K wps
[Epoch 11 Batch 1530/1540] avg loss 0.00446649, throughput 2.18937K wps
Begin Testing...
[Epoch 11] train avg loss 0.0042084, dev acc 0.8245, dev avg loss 0.387404, throughput 2.18524K wps
[Epoch 12 Batch 30/1540] avg loss 0.00408996, throughput 2.21373K wps
[Epoch 12 Batch 60/1540] avg loss 0.00395434, throughput 2.18306K wps
[Epoch 12 Batch 90/1540] avg loss 0.0045199, throughput 2.18629K wps
[Epoch 12 Batch 120/1540] avg loss 0.00436393, throughput 2.1907K wps
[Epoch 12 Batch 150/1540] avg loss 0.00411464, throughput 2.18375K wps
[Epoch 12 Batch 180/1540] avg loss 0.00398282, throughput 2.16102K wps
[Epoch 12 Batch 210/1540] avg loss 0.00364673, throughput 2.18478K wps
[Epoch 12 Batch 240/1540] avg loss 0.00394516, throughput 2.17927K wps
[Epoch 12 Batch 270/1540] avg loss 0.00399116, throughput 2.18478K wps
[Epoch 12 Batch 300/1540] avg loss 0.00417714, throughput 2.18426K wps
[Epoch 12 Batch 330/1540] avg loss 0.00417858, throughput 2.17482K wps
[Epoch 12 Batch 360/1540] avg loss 0.00384623, throughput 2.17161K wps
[Epoch 12 Batch 390/1540] avg loss 0.00407073, throughput 2.1893K wps
[Epoch 12 Batch 420/1540] avg loss 0.00453208, throughput 2.19337K wps
[Epoch 12 Batch 450/1540] avg loss 0.00417564, throughput 2.18919K wps
[Epoch 12 Batch 480/1540] avg loss 0.00448613, throughput 2.18114K wps
[Epoch 12 Batch 510/1540] avg loss 0.00370475, throughput 2.18301K wps
[Epoch 12 Batch 540/1540] avg loss 0.00368182, throughput 2.18961K wps
[Epoch 12 Batch 570/1540] avg loss 0.00411235, throughput 2.17823K wps
[Epoch 12 Batch 600/1540] avg loss 0.00413945, throughput 2.1881K wps
[Epoch 12 Batch 630/1540] avg loss 0.00419073, throughput 2.18099K wps
[Epoch 12 Batch 660/1540] avg loss 0.00418639, throughput 2.18784K wps
[Epoch 12 Batch 690/1540] avg loss 0.00392782, throughput 2.19296K wps
[Epoch 12 Batch 720/1540] avg loss 0.00428848, throughput 2.18027K wps
[Epoch 12 Batch 750/1540] avg loss 0.00372903, throughput 2.16304K wps
[Epoch 12 Batch 780/1540] avg loss 0.00369482, throughput 2.19013K wps
[Epoch 12 Batch 810/1540] avg loss 0.00415155, throughput 2.18428K wps
[Epoch 12 Batch 840/1540] avg loss 0.00415341, throughput 2.1859K wps
[Epoch 12 Batch 870/1540] avg loss 0.00469948, throughput 2.16866K wps
[Epoch 12 Batch 900/1540] avg loss 0.0040534, throughput 2.18261K wps
[Epoch 12 Batch 930/1540] avg loss 0.00380614, throughput 2.18848K wps
[Epoch 12 Batch 960/1540] avg loss 0.00400721, throughput 2.19234K wps
[Epoch 12 Batch 990/1540] avg loss 0.00433281, throughput 2.18753K wps
[Epoch 12 Batch 1020/1540] avg loss 0.00379637, throughput 2.19191K wps
[Epoch 12 Batch 1050/1540] avg loss 0.00376057, throughput 2.19346K wps
[Epoch 12 Batch 1080/1540] avg loss 0.00381688, throughput 2.19383K wps
[Epoch 12 Batch 1110/1540] avg loss 0.00394807, throughput 2.18353K wps
[Epoch 12 Batch 1140/1540] avg loss 0.00367058, throughput 2.18145K wps
[Epoch 12 Batch 1170/1540] avg loss 0.00416487, throughput 2.1847K wps
[Epoch 12 Batch 1200/1540] avg loss 0.00382054, throughput 2.18414K wps
[Epoch 12 Batch 1230/1540] avg loss 0.00405379, throughput 2.18467K wps
[Epoch 12 Batch 1260/1540] avg loss 0.00406614, throughput 2.19233K wps
[Epoch 12 Batch 1290/1540] avg loss 0.00427728, throughput 2.15634K wps
[Epoch 12 Batch 1320/1540] avg loss 0.0037922, throughput 2.18556K wps
[Epoch 12 Batch 1350/1540] avg loss 0.00402858, throughput 2.18834K wps
[Epoch 12 Batch 1380/1540] avg loss 0.00390966, throughput 2.17503K wps
[Epoch 12 Batch 1410/1540] avg loss 0.00378877, throughput 2.18545K wps
[Epoch 12 Batch 1440/1540] avg loss 0.00370384, throughput 2.18481K wps
[Epoch 12 Batch 1470/1540] avg loss 0.00367863, throughput 2.18826K wps
[Epoch 12 Batch 1500/1540] avg loss 0.00382626, throughput 2.18872K wps
[Epoch 12 Batch 1530/1540] avg loss 0.00396324, throughput 2.18909K wps
Begin Testing...
[Epoch 12] train avg loss 0.00402385, dev acc 0.8257, dev avg loss 0.392449, throughput 2.18438K wps
[Epoch 13 Batch 30/1540] avg loss 0.00387324, throughput 2.21057K wps
[Epoch 13 Batch 60/1540] avg loss 0.00393464, throughput 2.17496K wps
[Epoch 13 Batch 90/1540] avg loss 0.00417758, throughput 2.18768K wps
[Epoch 13 Batch 120/1540] avg loss 0.00376319, throughput 2.18784K wps
[Epoch 13 Batch 150/1540] avg loss 0.00378794, throughput 2.19188K wps
[Epoch 13 Batch 180/1540] avg loss 0.00378257, throughput 2.18588K wps
[Epoch 13 Batch 210/1540] avg loss 0.00382424, throughput 2.19258K wps
[Epoch 13 Batch 240/1540] avg loss 0.00366025, throughput 2.19155K wps
[Epoch 13 Batch 270/1540] avg loss 0.00369993, throughput 2.17239K wps
[Epoch 13 Batch 300/1540] avg loss 0.00383563, throughput 2.19017K wps
[Epoch 13 Batch 330/1540] avg loss 0.00382205, throughput 2.19151K wps
[Epoch 13 Batch 360/1540] avg loss 0.00365648, throughput 2.18374K wps
[Epoch 13 Batch 390/1540] avg loss 0.0035473, throughput 2.18986K wps
[Epoch 13 Batch 420/1540] avg loss 0.00406705, throughput 2.18868K wps
[Epoch 13 Batch 450/1540] avg loss 0.00384871, throughput 2.18093K wps
[Epoch 13 Batch 480/1540] avg loss 0.00387896, throughput 2.18615K wps
[Epoch 13 Batch 510/1540] avg loss 0.00395289, throughput 2.17315K wps
[Epoch 13 Batch 540/1540] avg loss 0.00403085, throughput 2.18029K wps
[Epoch 13 Batch 570/1540] avg loss 0.00375332, throughput 2.17918K wps
[Epoch 13 Batch 600/1540] avg loss 0.00365647, throughput 2.20666K wps
[Epoch 13 Batch 630/1540] avg loss 0.00435089, throughput 2.19118K wps
[Epoch 13 Batch 660/1540] avg loss 0.00398776, throughput 2.19174K wps
[Epoch 13 Batch 690/1540] avg loss 0.0037102, throughput 2.1676K wps
[Epoch 13 Batch 720/1540] avg loss 0.00383566, throughput 2.18192K wps
[Epoch 13 Batch 750/1540] avg loss 0.00379102, throughput 2.15951K wps
[Epoch 13 Batch 780/1540] avg loss 0.00383412, throughput 2.19177K wps
[Epoch 13 Batch 810/1540] avg loss 0.00363865, throughput 2.18895K wps
[Epoch 13 Batch 840/1540] avg loss 0.00357136, throughput 2.19067K wps
[Epoch 13 Batch 870/1540] avg loss 0.00416576, throughput 2.18658K wps
[Epoch 13 Batch 900/1540] avg loss 0.00390047, throughput 2.18931K wps
[Epoch 13 Batch 930/1540] avg loss 0.00397427, throughput 2.18678K wps
[Epoch 13 Batch 960/1540] avg loss 0.00353427, throughput 2.18724K wps
[Epoch 13 Batch 990/1540] avg loss 0.00375934, throughput 2.17532K wps
[Epoch 13 Batch 1020/1540] avg loss 0.00381061, throughput 2.17949K wps
[Epoch 13 Batch 1050/1540] avg loss 0.00370903, throughput 2.15923K wps
[Epoch 13 Batch 1080/1540] avg loss 0.0040359, throughput 2.17093K wps
[Epoch 13 Batch 1110/1540] avg loss 0.00375898, throughput 2.19136K wps
[Epoch 13 Batch 1140/1540] avg loss 0.00401237, throughput 2.18128K wps
[Epoch 13 Batch 1170/1540] avg loss 0.00382698, throughput 2.17385K wps
[Epoch 13 Batch 1200/1540] avg loss 0.00423617, throughput 2.18663K wps
[Epoch 13 Batch 1230/1540] avg loss 0.00383518, throughput 2.18104K wps
[Epoch 13 Batch 1260/1540] avg loss 0.00373835, throughput 2.19283K wps
[Epoch 13 Batch 1290/1540] avg loss 0.0040955, throughput 2.18805K wps
[Epoch 13 Batch 1320/1540] avg loss 0.00379287, throughput 2.18959K wps
[Epoch 13 Batch 1350/1540] avg loss 0.00338031, throughput 2.19101K wps
[Epoch 13 Batch 1380/1540] avg loss 0.00364339, throughput 2.17139K wps
[Epoch 13 Batch 1410/1540] avg loss 0.0039167, throughput 2.18664K wps
[Epoch 13 Batch 1440/1540] avg loss 0.00431711, throughput 2.19112K wps
[Epoch 13 Batch 1470/1540] avg loss 0.00368695, throughput 2.19208K wps
[Epoch 13 Batch 1500/1540] avg loss 0.00376415, throughput 2.1569K wps
[Epoch 13 Batch 1530/1540] avg loss 0.0036754, throughput 2.18852K wps
Begin Testing...
[Epoch 13] train avg loss 0.00384144, dev acc 0.8372, dev avg loss 0.389253, throughput 2.1845K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 14 Batch 30/1540] avg loss 0.00393246, throughput 2.215K wps
[Epoch 14 Batch 60/1540] avg loss 0.00336002, throughput 2.16548K wps
[Epoch 14 Batch 90/1540] avg loss 0.00352435, throughput 2.19104K wps
[Epoch 14 Batch 120/1540] avg loss 0.00396964, throughput 2.18858K wps
[Epoch 14 Batch 150/1540] avg loss 0.00367601, throughput 2.15171K wps
[Epoch 14 Batch 180/1540] avg loss 0.00371794, throughput 2.18918K wps
[Epoch 14 Batch 210/1540] avg loss 0.00341595, throughput 2.19203K wps
[Epoch 14 Batch 240/1540] avg loss 0.00367325, throughput 2.19019K wps
[Epoch 14 Batch 270/1540] avg loss 0.00362489, throughput 2.19373K wps
[Epoch 14 Batch 300/1540] avg loss 0.003776, throughput 2.18073K wps
[Epoch 14 Batch 330/1540] avg loss 0.003522, throughput 2.18668K wps
[Epoch 14 Batch 360/1540] avg loss 0.00368306, throughput 2.19036K wps
[Epoch 14 Batch 390/1540] avg loss 0.00363056, throughput 2.15991K wps
[Epoch 14 Batch 420/1540] avg loss 0.00351018, throughput 2.19233K wps
[Epoch 14 Batch 450/1540] avg loss 0.00405999, throughput 2.18166K wps
[Epoch 14 Batch 480/1540] avg loss 0.00356131, throughput 2.18836K wps
[Epoch 14 Batch 510/1540] avg loss 0.00373917, throughput 2.19004K wps
[Epoch 14 Batch 540/1540] avg loss 0.00384305, throughput 2.14279K wps
[Epoch 14 Batch 570/1540] avg loss 0.00391122, throughput 2.18219K wps
[Epoch 14 Batch 600/1540] avg loss 0.00362156, throughput 2.17779K wps
[Epoch 14 Batch 630/1540] avg loss 0.00376291, throughput 2.19182K wps
[Epoch 14 Batch 660/1540] avg loss 0.00376207, throughput 2.19256K wps
[Epoch 14 Batch 690/1540] avg loss 0.00363427, throughput 2.17979K wps
[Epoch 14 Batch 720/1540] avg loss 0.00362242, throughput 2.19143K wps
[Epoch 14 Batch 750/1540] avg loss 0.003927, throughput 2.19034K wps
[Epoch 14 Batch 780/1540] avg loss 0.00345144, throughput 2.19519K wps
[Epoch 14 Batch 810/1540] avg loss 0.00364732, throughput 2.19364K wps
[Epoch 14 Batch 840/1540] avg loss 0.00335615, throughput 2.19362K wps
[Epoch 14 Batch 870/1540] avg loss 0.00328271, throughput 2.19267K wps
[Epoch 14 Batch 900/1540] avg loss 0.00372105, throughput 2.18835K wps
[Epoch 14 Batch 930/1540] avg loss 0.0036174, throughput 2.19269K wps
[Epoch 14 Batch 960/1540] avg loss 0.00382666, throughput 2.19272K wps
[Epoch 14 Batch 990/1540] avg loss 0.00388628, throughput 2.18919K wps
[Epoch 14 Batch 1020/1540] avg loss 0.00382727, throughput 2.18035K wps
[Epoch 14 Batch 1050/1540] avg loss 0.00312107, throughput 2.19247K wps
[Epoch 14 Batch 1080/1540] avg loss 0.00391106, throughput 2.17767K wps
[Epoch 14 Batch 1110/1540] avg loss 0.00383381, throughput 2.1747K wps
[Epoch 14 Batch 1140/1540] avg loss 0.00426805, throughput 2.19421K wps
[Epoch 14 Batch 1170/1540] avg loss 0.00364117, throughput 2.19062K wps
[Epoch 14 Batch 1200/1540] avg loss 0.00372861, throughput 2.1948K wps
[Epoch 14 Batch 1230/1540] avg loss 0.00363796, throughput 2.18776K wps
[Epoch 14 Batch 1260/1540] avg loss 0.00350738, throughput 2.18768K wps
[Epoch 14 Batch 1290/1540] avg loss 0.0037945, throughput 2.19136K wps
[Epoch 14 Batch 1320/1540] avg loss 0.00356993, throughput 2.18616K wps
[Epoch 14 Batch 1350/1540] avg loss 0.00364509, throughput 2.18619K wps
[Epoch 14 Batch 1380/1540] avg loss 0.00373301, throughput 2.18776K wps
[Epoch 14 Batch 1410/1540] avg loss 0.00364464, throughput 2.18533K wps
[Epoch 14 Batch 1440/1540] avg loss 0.00433114, throughput 2.18967K wps
[Epoch 14 Batch 1470/1540] avg loss 0.00360177, throughput 2.18867K wps
[Epoch 14 Batch 1500/1540] avg loss 0.0034504, throughput 2.17195K wps
[Epoch 14 Batch 1530/1540] avg loss 0.00346928, throughput 2.18567K wps
Begin Testing...
[Epoch 14] train avg loss 0.00368844, dev acc 0.8383, dev avg loss 0.398709, throughput 2.18608K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 15 Batch 30/1540] avg loss 0.00359998, throughput 2.20457K wps
[Epoch 15 Batch 60/1540] avg loss 0.00345644, throughput 2.17953K wps
[Epoch 15 Batch 90/1540] avg loss 0.00352104, throughput 2.19653K wps
[Epoch 15 Batch 120/1540] avg loss 0.00381895, throughput 2.18836K wps
[Epoch 15 Batch 150/1540] avg loss 0.00410945, throughput 2.18001K wps
[Epoch 15 Batch 180/1540] avg loss 0.00347885, throughput 2.18947K wps
[Epoch 15 Batch 210/1540] avg loss 0.00382725, throughput 2.16508K wps
[Epoch 15 Batch 240/1540] avg loss 0.0035563, throughput 2.19318K wps
[Epoch 15 Batch 270/1540] avg loss 0.00344311, throughput 2.18905K wps
[Epoch 15 Batch 300/1540] avg loss 0.00357303, throughput 2.17989K wps
[Epoch 15 Batch 330/1540] avg loss 0.00344161, throughput 2.18009K wps
[Epoch 15 Batch 360/1540] avg loss 0.00361055, throughput 2.18036K wps
[Epoch 15 Batch 390/1540] avg loss 0.00341198, throughput 2.15651K wps
[Epoch 15 Batch 420/1540] avg loss 0.00369841, throughput 2.17886K wps
[Epoch 15 Batch 450/1540] avg loss 0.00371941, throughput 2.17616K wps
[Epoch 15 Batch 480/1540] avg loss 0.00332438, throughput 2.17476K wps
[Epoch 15 Batch 510/1540] avg loss 0.00320496, throughput 2.17503K wps
[Epoch 15 Batch 540/1540] avg loss 0.00301139, throughput 2.18729K wps
[Epoch 15 Batch 570/1540] avg loss 0.00359902, throughput 2.17563K wps
[Epoch 15 Batch 600/1540] avg loss 0.0034343, throughput 2.15583K wps
[Epoch 15 Batch 630/1540] avg loss 0.00318314, throughput 2.16529K wps
[Epoch 15 Batch 660/1540] avg loss 0.00375039, throughput 2.18419K wps
[Epoch 15 Batch 690/1540] avg loss 0.00326921, throughput 2.192K wps
[Epoch 15 Batch 720/1540] avg loss 0.00357303, throughput 2.17353K wps
[Epoch 15 Batch 750/1540] avg loss 0.00319568, throughput 2.17464K wps
[Epoch 15 Batch 780/1540] avg loss 0.00385849, throughput 2.18802K wps
[Epoch 15 Batch 810/1540] avg loss 0.00369068, throughput 2.18385K wps
[Epoch 15 Batch 840/1540] avg loss 0.00343683, throughput 2.18873K wps
[Epoch 15 Batch 870/1540] avg loss 0.00384782, throughput 2.19416K wps
[Epoch 15 Batch 900/1540] avg loss 0.00351875, throughput 2.19553K wps
[Epoch 15 Batch 930/1540] avg loss 0.00364552, throughput 2.19352K wps
[Epoch 15 Batch 960/1540] avg loss 0.00378362, throughput 2.19661K wps
[Epoch 15 Batch 990/1540] avg loss 0.00339371, throughput 2.19194K wps
[Epoch 15 Batch 1020/1540] avg loss 0.00330067, throughput 2.18161K wps
[Epoch 15 Batch 1050/1540] avg loss 0.003549, throughput 2.18288K wps
[Epoch 15 Batch 1080/1540] avg loss 0.0034079, throughput 2.14626K wps
[Epoch 15 Batch 1110/1540] avg loss 0.00349517, throughput 2.17601K wps
[Epoch 15 Batch 1140/1540] avg loss 0.00343068, throughput 2.19651K wps
[Epoch 15 Batch 1170/1540] avg loss 0.00344266, throughput 2.16618K wps
[Epoch 15 Batch 1200/1540] avg loss 0.00299157, throughput 2.17625K wps
[Epoch 15 Batch 1230/1540] avg loss 0.00367258, throughput 2.16912K wps
[Epoch 15 Batch 1260/1540] avg loss 0.00339152, throughput 2.17703K wps
[Epoch 15 Batch 1290/1540] avg loss 0.00376349, throughput 2.19084K wps
[Epoch 15 Batch 1320/1540] avg loss 0.00339921, throughput 2.19711K wps
[Epoch 15 Batch 1350/1540] avg loss 0.00353607, throughput 2.17599K wps
[Epoch 15 Batch 1380/1540] avg loss 0.00363597, throughput 2.19293K wps
[Epoch 15 Batch 1410/1540] avg loss 0.00412178, throughput 2.18478K wps
[Epoch 15 Batch 1440/1540] avg loss 0.00341123, throughput 2.18309K wps
[Epoch 15 Batch 1470/1540] avg loss 0.00343237, throughput 2.19123K wps
[Epoch 15 Batch 1500/1540] avg loss 0.00370386, throughput 2.19582K wps
[Epoch 15 Batch 1530/1540] avg loss 0.0036657, throughput 2.19149K wps
Begin Testing...
[Epoch 15] train avg loss 0.00353711, dev acc 0.8372, dev avg loss 0.398043, throughput 2.18247K wps
[Epoch 16 Batch 30/1540] avg loss 0.00352794, throughput 2.2312K wps
[Epoch 16 Batch 60/1540] avg loss 0.00317905, throughput 2.15809K wps
[Epoch 16 Batch 90/1540] avg loss 0.0034334, throughput 2.17702K wps
[Epoch 16 Batch 120/1540] avg loss 0.00357239, throughput 2.18254K wps
[Epoch 16 Batch 150/1540] avg loss 0.00328906, throughput 2.19466K wps
[Epoch 16 Batch 180/1540] avg loss 0.00348166, throughput 2.19423K wps
[Epoch 16 Batch 210/1540] avg loss 0.00348799, throughput 2.18419K wps
[Epoch 16 Batch 240/1540] avg loss 0.0035739, throughput 2.18546K wps
[Epoch 16 Batch 270/1540] avg loss 0.00323122, throughput 2.18945K wps
[Epoch 16 Batch 300/1540] avg loss 0.00381506, throughput 2.18731K wps
[Epoch 16 Batch 330/1540] avg loss 0.0032098, throughput 2.1921K wps
[Epoch 16 Batch 360/1540] avg loss 0.00318236, throughput 2.19079K wps
[Epoch 16 Batch 390/1540] avg loss 0.00339765, throughput 2.19217K wps
[Epoch 16 Batch 420/1540] avg loss 0.00353705, throughput 2.17589K wps
[Epoch 16 Batch 450/1540] avg loss 0.00361715, throughput 2.19496K wps
[Epoch 16 Batch 480/1540] avg loss 0.00342778, throughput 2.19466K wps
[Epoch 16 Batch 510/1540] avg loss 0.00372262, throughput 2.189K wps
[Epoch 16 Batch 540/1540] avg loss 0.00347214, throughput 2.19331K wps
[Epoch 16 Batch 570/1540] avg loss 0.00321932, throughput 2.18099K wps
[Epoch 16 Batch 600/1540] avg loss 0.00330836, throughput 2.19197K wps
[Epoch 16 Batch 630/1540] avg loss 0.00331602, throughput 2.19386K wps
[Epoch 16 Batch 660/1540] avg loss 0.00353339, throughput 2.19517K wps
[Epoch 16 Batch 690/1540] avg loss 0.00348396, throughput 2.18713K wps
[Epoch 16 Batch 720/1540] avg loss 0.00317941, throughput 2.16392K wps
[Epoch 16 Batch 750/1540] avg loss 0.00346714, throughput 2.18459K wps
[Epoch 16 Batch 780/1540] avg loss 0.00320688, throughput 2.19191K wps
[Epoch 16 Batch 810/1540] avg loss 0.00347097, throughput 2.19256K wps
[Epoch 16 Batch 840/1540] avg loss 0.00329445, throughput 2.16765K wps
[Epoch 16 Batch 870/1540] avg loss 0.00345574, throughput 2.15673K wps
[Epoch 16 Batch 900/1540] avg loss 0.00336671, throughput 2.18512K wps
[Epoch 16 Batch 930/1540] avg loss 0.00378729, throughput 2.17698K wps
[Epoch 16 Batch 960/1540] avg loss 0.00355733, throughput 2.19367K wps
[Epoch 16 Batch 990/1540] avg loss 0.00340932, throughput 2.18628K wps
[Epoch 16 Batch 1020/1540] avg loss 0.00325579, throughput 2.17977K wps
[Epoch 16 Batch 1050/1540] avg loss 0.00329939, throughput 2.19567K wps
[Epoch 16 Batch 1080/1540] avg loss 0.00379717, throughput 2.19432K wps
[Epoch 16 Batch 1110/1540] avg loss 0.00335988, throughput 2.18958K wps
[Epoch 16 Batch 1140/1540] avg loss 0.00334152, throughput 2.19225K wps
[Epoch 16 Batch 1170/1540] avg loss 0.0032309, throughput 2.18525K wps
[Epoch 16 Batch 1200/1540] avg loss 0.00349658, throughput 2.19495K wps
[Epoch 16 Batch 1230/1540] avg loss 0.00344614, throughput 2.16887K wps
[Epoch 16 Batch 1260/1540] avg loss 0.00329321, throughput 2.16156K wps
[Epoch 16 Batch 1290/1540] avg loss 0.00347079, throughput 2.18158K wps
[Epoch 16 Batch 1320/1540] avg loss 0.00377568, throughput 2.16328K wps
[Epoch 16 Batch 1350/1540] avg loss 0.00314219, throughput 2.19121K wps
[Epoch 16 Batch 1380/1540] avg loss 0.00307698, throughput 2.18275K wps
[Epoch 16 Batch 1410/1540] avg loss 0.0034437, throughput 2.18955K wps
[Epoch 16 Batch 1440/1540] avg loss 0.0034422, throughput 2.18744K wps
[Epoch 16 Batch 1470/1540] avg loss 0.00313817, throughput 2.18797K wps
[Epoch 16 Batch 1500/1540] avg loss 0.00351405, throughput 2.19399K wps
[Epoch 16 Batch 1530/1540] avg loss 0.00379828, throughput 2.19354K wps
Begin Testing...
[Epoch 16] train avg loss 0.00342574, dev acc 0.8314, dev avg loss 0.399706, throughput 2.186K wps
[Epoch 17 Batch 30/1540] avg loss 0.00287364, throughput 2.21962K wps
[Epoch 17 Batch 60/1540] avg loss 0.00351505, throughput 2.18791K wps
[Epoch 17 Batch 90/1540] avg loss 0.00320969, throughput 2.18259K wps
[Epoch 17 Batch 120/1540] avg loss 0.00305262, throughput 2.18586K wps
[Epoch 17 Batch 150/1540] avg loss 0.00333938, throughput 2.19466K wps
[Epoch 17 Batch 180/1540] avg loss 0.00334824, throughput 2.18756K wps
[Epoch 17 Batch 210/1540] avg loss 0.00340764, throughput 2.17396K wps
[Epoch 17 Batch 240/1540] avg loss 0.00305954, throughput 2.18197K wps
[Epoch 17 Batch 270/1540] avg loss 0.00277222, throughput 2.1779K wps
[Epoch 17 Batch 300/1540] avg loss 0.00299198, throughput 2.19173K wps
[Epoch 17 Batch 330/1540] avg loss 0.0034564, throughput 2.18971K wps
[Epoch 17 Batch 360/1540] avg loss 0.0029248, throughput 2.1842K wps
[Epoch 17 Batch 390/1540] avg loss 0.00290109, throughput 2.1933K wps
[Epoch 17 Batch 420/1540] avg loss 0.00299547, throughput 2.19068K wps
[Epoch 17 Batch 450/1540] avg loss 0.00328947, throughput 2.17878K wps
[Epoch 17 Batch 480/1540] avg loss 0.00345571, throughput 2.19601K wps
[Epoch 17 Batch 510/1540] avg loss 0.00333137, throughput 2.19343K wps
[Epoch 17 Batch 540/1540] avg loss 0.00288601, throughput 2.1842K wps
[Epoch 17 Batch 570/1540] avg loss 0.00313089, throughput 2.18354K wps
[Epoch 17 Batch 600/1540] avg loss 0.00303374, throughput 2.19389K wps
[Epoch 17 Batch 630/1540] avg loss 0.00323535, throughput 2.18747K wps
[Epoch 17 Batch 660/1540] avg loss 0.00355078, throughput 2.18772K wps
[Epoch 17 Batch 690/1540] avg loss 0.0031852, throughput 2.19686K wps
[Epoch 17 Batch 720/1540] avg loss 0.00321032, throughput 2.15842K wps
[Epoch 17 Batch 750/1540] avg loss 0.00302007, throughput 2.17763K wps
[Epoch 17 Batch 780/1540] avg loss 0.00328848, throughput 2.17689K wps
[Epoch 17 Batch 810/1540] avg loss 0.00360966, throughput 2.18338K wps
[Epoch 17 Batch 840/1540] avg loss 0.00343104, throughput 2.18742K wps
[Epoch 17 Batch 870/1540] avg loss 0.00346591, throughput 2.18731K wps
[Epoch 17 Batch 900/1540] avg loss 0.00347389, throughput 2.18694K wps
[Epoch 17 Batch 930/1540] avg loss 0.00303319, throughput 2.1911K wps
[Epoch 17 Batch 960/1540] avg loss 0.00316245, throughput 2.1878K wps
[Epoch 17 Batch 990/1540] avg loss 0.00347098, throughput 2.19222K wps
[Epoch 17 Batch 1020/1540] avg loss 0.00357904, throughput 2.16663K wps
[Epoch 17 Batch 1050/1540] avg loss 0.00350802, throughput 2.17858K wps
[Epoch 17 Batch 1080/1540] avg loss 0.00307623, throughput 2.17281K wps
[Epoch 17 Batch 1110/1540] avg loss 0.00373848, throughput 2.18323K wps
[Epoch 17 Batch 1140/1540] avg loss 0.00329494, throughput 2.17689K wps
[Epoch 17 Batch 1170/1540] avg loss 0.00318228, throughput 2.186K wps
[Epoch 17 Batch 1200/1540] avg loss 0.00352078, throughput 2.19251K wps
[Epoch 17 Batch 1230/1540] avg loss 0.00364294, throughput 2.17422K wps
[Epoch 17 Batch 1260/1540] avg loss 0.00347588, throughput 2.19K wps
[Epoch 17 Batch 1290/1540] avg loss 0.00298218, throughput 2.19482K wps
[Epoch 17 Batch 1320/1540] avg loss 0.00313944, throughput 2.19586K wps
[Epoch 17 Batch 1350/1540] avg loss 0.00338874, throughput 2.1827K wps
[Epoch 17 Batch 1380/1540] avg loss 0.00309312, throughput 2.16328K wps
[Epoch 17 Batch 1410/1540] avg loss 0.00355499, throughput 2.19165K wps
[Epoch 17 Batch 1440/1540] avg loss 0.00359586, throughput 2.19522K wps
[Epoch 17 Batch 1470/1540] avg loss 0.00373508, throughput 2.19123K wps
[Epoch 17 Batch 1500/1540] avg loss 0.00350993, throughput 2.18894K wps
[Epoch 17 Batch 1530/1540] avg loss 0.00404921, throughput 2.19301K wps
Begin Testing...
[Epoch 17] train avg loss 0.00329967, dev acc 0.8383, dev avg loss 0.406653, throughput 2.18614K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 18 Batch 30/1540] avg loss 0.00307364, throughput 2.22714K wps
[Epoch 18 Batch 60/1540] avg loss 0.00294757, throughput 2.18063K wps
[Epoch 18 Batch 90/1540] avg loss 0.00270548, throughput 2.19398K wps
[Epoch 18 Batch 120/1540] avg loss 0.00315179, throughput 2.19311K wps
[Epoch 18 Batch 150/1540] avg loss 0.00316639, throughput 2.19177K wps
[Epoch 18 Batch 180/1540] avg loss 0.00297282, throughput 2.18786K wps
[Epoch 18 Batch 210/1540] avg loss 0.00280417, throughput 2.1931K wps
[Epoch 18 Batch 240/1540] avg loss 0.00310603, throughput 2.1785K wps
[Epoch 18 Batch 270/1540] avg loss 0.00302065, throughput 2.19429K wps
[Epoch 18 Batch 300/1540] avg loss 0.00299265, throughput 2.16743K wps
[Epoch 18 Batch 330/1540] avg loss 0.00324312, throughput 2.18712K wps
[Epoch 18 Batch 360/1540] avg loss 0.00334077, throughput 2.17135K wps
[Epoch 18 Batch 390/1540] avg loss 0.00336898, throughput 2.17365K wps
[Epoch 18 Batch 420/1540] avg loss 0.00315245, throughput 2.17888K wps
[Epoch 18 Batch 450/1540] avg loss 0.00312398, throughput 2.19161K wps
[Epoch 18 Batch 480/1540] avg loss 0.00319027, throughput 2.1942K wps
[Epoch 18 Batch 510/1540] avg loss 0.00306981, throughput 2.19172K wps
[Epoch 18 Batch 540/1540] avg loss 0.00325074, throughput 2.18971K wps
[Epoch 18 Batch 570/1540] avg loss 0.00294071, throughput 2.16175K wps
[Epoch 18 Batch 600/1540] avg loss 0.00330158, throughput 2.19249K wps
[Epoch 18 Batch 630/1540] avg loss 0.00307024, throughput 2.18687K wps
[Epoch 18 Batch 660/1540] avg loss 0.00328189, throughput 2.15164K wps
[Epoch 18 Batch 690/1540] avg loss 0.00341611, throughput 2.17477K wps
[Epoch 18 Batch 720/1540] avg loss 0.00325358, throughput 2.19138K wps
[Epoch 18 Batch 750/1540] avg loss 0.00280716, throughput 2.17381K wps
[Epoch 18 Batch 780/1540] avg loss 0.00291819, throughput 2.18035K wps
[Epoch 18 Batch 810/1540] avg loss 0.00304467, throughput 2.16957K wps
[Epoch 18 Batch 840/1540] avg loss 0.00351955, throughput 2.19051K wps
[Epoch 18 Batch 870/1540] avg loss 0.00350564, throughput 2.19639K wps
[Epoch 18 Batch 900/1540] avg loss 0.00331564, throughput 2.19303K wps
[Epoch 18 Batch 930/1540] avg loss 0.00320611, throughput 2.18129K wps
[Epoch 18 Batch 960/1540] avg loss 0.00305108, throughput 2.16395K wps
[Epoch 18 Batch 990/1540] avg loss 0.00365992, throughput 2.18759K wps
[Epoch 18 Batch 1020/1540] avg loss 0.00332774, throughput 2.19766K wps
[Epoch 18 Batch 1050/1540] avg loss 0.00346322, throughput 2.18822K wps
[Epoch 18 Batch 1080/1540] avg loss 0.00331018, throughput 2.18574K wps
[Epoch 18 Batch 1110/1540] avg loss 0.00306773, throughput 2.1722K wps
[Epoch 18 Batch 1140/1540] avg loss 0.00333519, throughput 2.19088K wps
[Epoch 18 Batch 1170/1540] avg loss 0.00281714, throughput 2.19091K wps
[Epoch 18 Batch 1200/1540] avg loss 0.00338109, throughput 2.17036K wps
[Epoch 18 Batch 1230/1540] avg loss 0.00324419, throughput 2.17266K wps
[Epoch 18 Batch 1260/1540] avg loss 0.00327745, throughput 2.18713K wps
[Epoch 18 Batch 1290/1540] avg loss 0.00337527, throughput 2.1961K wps
[Epoch 18 Batch 1320/1540] avg loss 0.00285202, throughput 2.19326K wps
[Epoch 18 Batch 1350/1540] avg loss 0.00361605, throughput 2.17126K wps
[Epoch 18 Batch 1380/1540] avg loss 0.00325989, throughput 2.18051K wps
[Epoch 18 Batch 1410/1540] avg loss 0.003193, throughput 2.19465K wps
[Epoch 18 Batch 1440/1540] avg loss 0.00330425, throughput 2.18782K wps
[Epoch 18 Batch 1470/1540] avg loss 0.00329172, throughput 2.17435K wps
[Epoch 18 Batch 1500/1540] avg loss 0.00319818, throughput 2.18489K wps
[Epoch 18 Batch 1530/1540] avg loss 0.00297726, throughput 2.17817K wps
Begin Testing...
[Epoch 18] train avg loss 0.00318265, dev acc 0.8417, dev avg loss 0.413191, throughput 2.18414K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 19 Batch 30/1540] avg loss 0.00328705, throughput 2.22688K wps
[Epoch 19 Batch 60/1540] avg loss 0.00286874, throughput 2.15836K wps
[Epoch 19 Batch 90/1540] avg loss 0.0029069, throughput 2.19003K wps
[Epoch 19 Batch 120/1540] avg loss 0.00285929, throughput 2.18061K wps
[Epoch 19 Batch 150/1540] avg loss 0.00312594, throughput 2.18168K wps
[Epoch 19 Batch 180/1540] avg loss 0.00309778, throughput 2.19246K wps
[Epoch 19 Batch 210/1540] avg loss 0.0029891, throughput 2.18899K wps
[Epoch 19 Batch 240/1540] avg loss 0.00274964, throughput 2.17403K wps
[Epoch 19 Batch 270/1540] avg loss 0.00318064, throughput 2.19181K wps
[Epoch 19 Batch 300/1540] avg loss 0.00285883, throughput 2.18466K wps
[Epoch 19 Batch 330/1540] avg loss 0.00318053, throughput 2.18442K wps
[Epoch 19 Batch 360/1540] avg loss 0.00296837, throughput 2.18564K wps
[Epoch 19 Batch 390/1540] avg loss 0.00311791, throughput 2.18997K wps
[Epoch 19 Batch 420/1540] avg loss 0.00307821, throughput 2.19225K wps
[Epoch 19 Batch 450/1540] avg loss 0.00342093, throughput 2.18528K wps
[Epoch 19 Batch 480/1540] avg loss 0.0029348, throughput 2.18485K wps
[Epoch 19 Batch 510/1540] avg loss 0.00336017, throughput 2.18497K wps
[Epoch 19 Batch 540/1540] avg loss 0.00316109, throughput 2.17202K wps
[Epoch 19 Batch 570/1540] avg loss 0.00301428, throughput 2.18927K wps
[Epoch 19 Batch 600/1540] avg loss 0.00300875, throughput 2.18944K wps
[Epoch 19 Batch 630/1540] avg loss 0.00322104, throughput 2.18819K wps
[Epoch 19 Batch 660/1540] avg loss 0.00292634, throughput 2.19486K wps
[Epoch 19 Batch 690/1540] avg loss 0.00320109, throughput 2.17348K wps
[Epoch 19 Batch 720/1540] avg loss 0.00296679, throughput 2.19646K wps
[Epoch 19 Batch 750/1540] avg loss 0.00310411, throughput 2.19591K wps
[Epoch 19 Batch 780/1540] avg loss 0.00293159, throughput 2.16499K wps
[Epoch 19 Batch 810/1540] avg loss 0.00314177, throughput 2.18469K wps
[Epoch 19 Batch 840/1540] avg loss 0.00322311, throughput 2.16796K wps
[Epoch 19 Batch 870/1540] avg loss 0.00358792, throughput 2.17838K wps
[Epoch 19 Batch 900/1540] avg loss 0.00298161, throughput 2.18565K wps
[Epoch 19 Batch 930/1540] avg loss 0.00309681, throughput 2.19386K wps
[Epoch 19 Batch 960/1540] avg loss 0.00300891, throughput 2.18565K wps
[Epoch 19 Batch 990/1540] avg loss 0.00286606, throughput 2.18995K wps
[Epoch 19 Batch 1020/1540] avg loss 0.00304628, throughput 2.18954K wps
[Epoch 19 Batch 1050/1540] avg loss 0.00277715, throughput 2.19209K wps
[Epoch 19 Batch 1080/1540] avg loss 0.00312181, throughput 2.19669K wps
[Epoch 19 Batch 1110/1540] avg loss 0.00307803, throughput 2.18169K wps
[Epoch 19 Batch 1140/1540] avg loss 0.00324243, throughput 2.17898K wps
[Epoch 19 Batch 1170/1540] avg loss 0.00317781, throughput 2.19054K wps
[Epoch 19 Batch 1200/1540] avg loss 0.00301338, throughput 2.18933K wps
[Epoch 19 Batch 1230/1540] avg loss 0.00290347, throughput 2.18491K wps
[Epoch 19 Batch 1260/1540] avg loss 0.00313359, throughput 2.16101K wps
[Epoch 19 Batch 1290/1540] avg loss 0.00282421, throughput 2.19437K wps
[Epoch 19 Batch 1320/1540] avg loss 0.00316614, throughput 2.17068K wps
[Epoch 19 Batch 1350/1540] avg loss 0.00332197, throughput 2.18071K wps
[Epoch 19 Batch 1380/1540] avg loss 0.00305609, throughput 2.17179K wps
[Epoch 19 Batch 1410/1540] avg loss 0.00286796, throughput 2.19205K wps
[Epoch 19 Batch 1440/1540] avg loss 0.00282449, throughput 2.19409K wps
[Epoch 19 Batch 1470/1540] avg loss 0.00299165, throughput 2.19552K wps
[Epoch 19 Batch 1500/1540] avg loss 0.00305642, throughput 2.17576K wps
[Epoch 19 Batch 1530/1540] avg loss 0.00355172, throughput 2.18508K wps
Begin Testing...
[Epoch 19] train avg loss 0.00307184, dev acc 0.8383, dev avg loss 0.414482, throughput 2.18529K wps
[Epoch 20 Batch 30/1540] avg loss 0.00335633, throughput 2.21326K wps
[Epoch 20 Batch 60/1540] avg loss 0.00285902, throughput 2.17646K wps
[Epoch 20 Batch 90/1540] avg loss 0.00286876, throughput 2.16986K wps
[Epoch 20 Batch 120/1540] avg loss 0.00285461, throughput 2.18778K wps
[Epoch 20 Batch 150/1540] avg loss 0.00284125, throughput 2.1903K wps
[Epoch 20 Batch 180/1540] avg loss 0.00314666, throughput 2.19373K wps
[Epoch 20 Batch 210/1540] avg loss 0.00330566, throughput 2.18879K wps
[Epoch 20 Batch 240/1540] avg loss 0.00288951, throughput 2.19243K wps
[Epoch 20 Batch 270/1540] avg loss 0.00311471, throughput 2.19317K wps
[Epoch 20 Batch 300/1540] avg loss 0.00289416, throughput 2.18552K wps
[Epoch 20 Batch 330/1540] avg loss 0.00314411, throughput 2.1895K wps
[Epoch 20 Batch 360/1540] avg loss 0.00299871, throughput 2.17005K wps
[Epoch 20 Batch 390/1540] avg loss 0.00299413, throughput 2.19152K wps
[Epoch 20 Batch 420/1540] avg loss 0.00305507, throughput 2.1827K wps
[Epoch 20 Batch 450/1540] avg loss 0.0027737, throughput 2.15787K wps
[Epoch 20 Batch 480/1540] avg loss 0.00292446, throughput 2.16796K wps
[Epoch 20 Batch 510/1540] avg loss 0.00289093, throughput 2.19254K wps
[Epoch 20 Batch 540/1540] avg loss 0.00302477, throughput 2.18785K wps
[Epoch 20 Batch 570/1540] avg loss 0.00276817, throughput 2.196K wps
[Epoch 20 Batch 600/1540] avg loss 0.00302515, throughput 2.18808K wps
[Epoch 20 Batch 630/1540] avg loss 0.00329232, throughput 2.18142K wps
[Epoch 20 Batch 660/1540] avg loss 0.00284712, throughput 2.18496K wps
[Epoch 20 Batch 690/1540] avg loss 0.003036, throughput 2.18899K wps
[Epoch 20 Batch 720/1540] avg loss 0.00316483, throughput 2.19068K wps
[Epoch 20 Batch 750/1540] avg loss 0.00340316, throughput 2.19338K wps
[Epoch 20 Batch 780/1540] avg loss 0.00264958, throughput 2.18948K wps
[Epoch 20 Batch 810/1540] avg loss 0.00285623, throughput 2.18939K wps
[Epoch 20 Batch 840/1540] avg loss 0.00294877, throughput 2.15044K wps
[Epoch 20 Batch 870/1540] avg loss 0.00272366, throughput 2.18275K wps
[Epoch 20 Batch 900/1540] avg loss 0.00302545, throughput 2.19534K wps
[Epoch 20 Batch 930/1540] avg loss 0.00271612, throughput 2.19502K wps
[Epoch 20 Batch 960/1540] avg loss 0.00322653, throughput 2.18814K wps
[Epoch 20 Batch 990/1540] avg loss 0.0029418, throughput 2.19106K wps
[Epoch 20 Batch 1020/1540] avg loss 0.00262573, throughput 2.17975K wps
[Epoch 20 Batch 1050/1540] avg loss 0.00289883, throughput 2.17125K wps
[Epoch 20 Batch 1080/1540] avg loss 0.00297449, throughput 2.16787K wps
[Epoch 20 Batch 1110/1540] avg loss 0.00275609, throughput 2.17911K wps
[Epoch 20 Batch 1140/1540] avg loss 0.00309385, throughput 2.1814K wps
[Epoch 20 Batch 1170/1540] avg loss 0.00331986, throughput 2.16303K wps
[Epoch 20 Batch 1200/1540] avg loss 0.00303132, throughput 2.18656K wps
[Epoch 20 Batch 1230/1540] avg loss 0.00300786, throughput 2.19129K wps
[Epoch 20 Batch 1260/1540] avg loss 0.0024922, throughput 2.1881K wps
[Epoch 20 Batch 1290/1540] avg loss 0.00290118, throughput 2.19047K wps
[Epoch 20 Batch 1320/1540] avg loss 0.00321767, throughput 2.18975K wps
[Epoch 20 Batch 1350/1540] avg loss 0.00300401, throughput 2.19208K wps
[Epoch 20 Batch 1380/1540] avg loss 0.00280171, throughput 2.19676K wps
[Epoch 20 Batch 1410/1540] avg loss 0.00350244, throughput 2.18808K wps
[Epoch 20 Batch 1440/1540] avg loss 0.00295601, throughput 2.19051K wps
[Epoch 20 Batch 1470/1540] avg loss 0.00316307, throughput 2.19039K wps
[Epoch 20 Batch 1500/1540] avg loss 0.00309169, throughput 2.18822K wps
[Epoch 20 Batch 1530/1540] avg loss 0.00307539, throughput 2.1838K wps
Begin Testing...
[Epoch 20] train avg loss 0.00298901, dev acc 0.8440, dev avg loss 0.414944, throughput 2.18544K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 21 Batch 30/1540] avg loss 0.00303243, throughput 2.20385K wps
[Epoch 21 Batch 60/1540] avg loss 0.00272466, throughput 2.19049K wps
[Epoch 21 Batch 90/1540] avg loss 0.00283699, throughput 2.19421K wps
[Epoch 21 Batch 120/1540] avg loss 0.00263615, throughput 2.19149K wps
[Epoch 21 Batch 150/1540] avg loss 0.00325635, throughput 2.18738K wps
[Epoch 21 Batch 180/1540] avg loss 0.00302851, throughput 2.19414K wps
[Epoch 21 Batch 210/1540] avg loss 0.0032284, throughput 2.18443K wps
[Epoch 21 Batch 240/1540] avg loss 0.00338377, throughput 2.19348K wps
[Epoch 21 Batch 270/1540] avg loss 0.00283892, throughput 2.1857K wps
[Epoch 21 Batch 300/1540] avg loss 0.00245384, throughput 2.19118K wps
[Epoch 21 Batch 330/1540] avg loss 0.00318711, throughput 2.1675K wps
[Epoch 21 Batch 360/1540] avg loss 0.00274917, throughput 2.17281K wps
[Epoch 21 Batch 390/1540] avg loss 0.00279557, throughput 2.19544K wps
[Epoch 21 Batch 420/1540] avg loss 0.00309809, throughput 2.18828K wps
[Epoch 21 Batch 450/1540] avg loss 0.00279287, throughput 2.19294K wps
[Epoch 21 Batch 480/1540] avg loss 0.00287099, throughput 2.19341K wps
[Epoch 21 Batch 510/1540] avg loss 0.00259143, throughput 2.18521K wps
[Epoch 21 Batch 540/1540] avg loss 0.00318701, throughput 2.18641K wps
[Epoch 21 Batch 570/1540] avg loss 0.00268592, throughput 2.17494K wps
[Epoch 21 Batch 600/1540] avg loss 0.00264552, throughput 2.1934K wps
[Epoch 21 Batch 630/1540] avg loss 0.00295318, throughput 2.18683K wps
[Epoch 21 Batch 660/1540] avg loss 0.00298036, throughput 2.18797K wps
[Epoch 21 Batch 690/1540] avg loss 0.00314075, throughput 2.18719K wps
[Epoch 21 Batch 720/1540] avg loss 0.00282536, throughput 2.19489K wps
[Epoch 21 Batch 750/1540] avg loss 0.00262791, throughput 2.19507K wps
[Epoch 21 Batch 780/1540] avg loss 0.00326248, throughput 2.19335K wps
[Epoch 21 Batch 810/1540] avg loss 0.00287236, throughput 2.16645K wps
[Epoch 21 Batch 840/1540] avg loss 0.00317167, throughput 2.1816K wps
[Epoch 21 Batch 870/1540] avg loss 0.00270179, throughput 2.18403K wps
[Epoch 21 Batch 900/1540] avg loss 0.00306358, throughput 2.17715K wps
[Epoch 21 Batch 930/1540] avg loss 0.00273955, throughput 2.16782K wps
[Epoch 21 Batch 960/1540] avg loss 0.00297662, throughput 2.19495K wps
[Epoch 21 Batch 990/1540] avg loss 0.00337935, throughput 2.18232K wps
[Epoch 21 Batch 1020/1540] avg loss 0.00281001, throughput 2.17806K wps
[Epoch 21 Batch 1050/1540] avg loss 0.00279461, throughput 2.19316K wps
[Epoch 21 Batch 1080/1540] avg loss 0.0028604, throughput 2.19181K wps
[Epoch 21 Batch 1110/1540] avg loss 0.00285211, throughput 2.19263K wps
[Epoch 21 Batch 1140/1540] avg loss 0.00285583, throughput 2.19375K wps
[Epoch 21 Batch 1170/1540] avg loss 0.0032943, throughput 2.19152K wps
[Epoch 21 Batch 1200/1540] avg loss 0.00257398, throughput 2.18805K wps
[Epoch 21 Batch 1230/1540] avg loss 0.00302736, throughput 2.19142K wps
[Epoch 21 Batch 1260/1540] avg loss 0.00262411, throughput 2.18582K wps
[Epoch 21 Batch 1290/1540] avg loss 0.00316314, throughput 2.18704K wps
[Epoch 21 Batch 1320/1540] avg loss 0.00298604, throughput 2.1949K wps
[Epoch 21 Batch 1350/1540] avg loss 0.00287505, throughput 2.18978K wps
[Epoch 21 Batch 1380/1540] avg loss 0.00306686, throughput 2.18964K wps
[Epoch 21 Batch 1410/1540] avg loss 0.0027824, throughput 2.16667K wps
[Epoch 21 Batch 1440/1540] avg loss 0.00268261, throughput 2.19655K wps
[Epoch 21 Batch 1470/1540] avg loss 0.0026706, throughput 2.19344K wps
[Epoch 21 Batch 1500/1540] avg loss 0.00252726, throughput 2.19394K wps
[Epoch 21 Batch 1530/1540] avg loss 0.00305463, throughput 2.18967K wps
Begin Testing...
[Epoch 21] train avg loss 0.00290541, dev acc 0.8417, dev avg loss 0.420522, throughput 2.18779K wps
[Epoch 22 Batch 30/1540] avg loss 0.00290411, throughput 2.23112K wps
[Epoch 22 Batch 60/1540] avg loss 0.00293083, throughput 2.17346K wps
[Epoch 22 Batch 90/1540] avg loss 0.00316709, throughput 2.16757K wps
[Epoch 22 Batch 120/1540] avg loss 0.00263178, throughput 2.19287K wps
[Epoch 22 Batch 150/1540] avg loss 0.0030099, throughput 2.19501K wps
[Epoch 22 Batch 180/1540] avg loss 0.0026527, throughput 2.17822K wps
[Epoch 22 Batch 210/1540] avg loss 0.00264095, throughput 2.17707K wps
[Epoch 22 Batch 240/1540] avg loss 0.00284353, throughput 2.19752K wps
[Epoch 22 Batch 270/1540] avg loss 0.00296295, throughput 2.1886K wps
[Epoch 22 Batch 300/1540] avg loss 0.00281357, throughput 2.17284K wps
[Epoch 22 Batch 330/1540] avg loss 0.0028163, throughput 2.19168K wps
[Epoch 22 Batch 360/1540] avg loss 0.00307427, throughput 2.18733K wps
[Epoch 22 Batch 390/1540] avg loss 0.00250766, throughput 2.18245K wps
[Epoch 22 Batch 420/1540] avg loss 0.00266355, throughput 2.15497K wps
[Epoch 22 Batch 450/1540] avg loss 0.00258868, throughput 2.19206K wps
[Epoch 22 Batch 480/1540] avg loss 0.002764, throughput 2.19033K wps
[Epoch 22 Batch 510/1540] avg loss 0.00289418, throughput 2.15692K wps
[Epoch 22 Batch 540/1540] avg loss 0.00290114, throughput 2.16546K wps
[Epoch 22 Batch 570/1540] avg loss 0.00283057, throughput 2.17313K wps
[Epoch 22 Batch 600/1540] avg loss 0.00307331, throughput 2.15414K wps
[Epoch 22 Batch 630/1540] avg loss 0.00292613, throughput 2.17239K wps
[Epoch 22 Batch 660/1540] avg loss 0.00278693, throughput 2.19052K wps
[Epoch 22 Batch 690/1540] avg loss 0.00291888, throughput 2.17801K wps
[Epoch 22 Batch 720/1540] avg loss 0.00260404, throughput 2.19189K wps
[Epoch 22 Batch 750/1540] avg loss 0.00272248, throughput 2.18389K wps
[Epoch 22 Batch 780/1540] avg loss 0.00321028, throughput 2.17852K wps
[Epoch 22 Batch 810/1540] avg loss 0.00275209, throughput 2.19188K wps
[Epoch 22 Batch 840/1540] avg loss 0.00242053, throughput 2.19378K wps
[Epoch 22 Batch 870/1540] avg loss 0.00306897, throughput 2.16466K wps
[Epoch 22 Batch 900/1540] avg loss 0.0027655, throughput 2.18925K wps
[Epoch 22 Batch 930/1540] avg loss 0.0028567, throughput 2.1958K wps
[Epoch 22 Batch 960/1540] avg loss 0.00273938, throughput 2.18179K wps
[Epoch 22 Batch 990/1540] avg loss 0.0027481, throughput 2.19483K wps
[Epoch 22 Batch 1020/1540] avg loss 0.00263665, throughput 2.19179K wps
[Epoch 22 Batch 1050/1540] avg loss 0.00301087, throughput 2.19337K wps
[Epoch 22 Batch 1080/1540] avg loss 0.00297015, throughput 2.19551K wps
[Epoch 22 Batch 1110/1540] avg loss 0.00284278, throughput 2.19005K wps
[Epoch 22 Batch 1140/1540] avg loss 0.00245652, throughput 2.17135K wps
[Epoch 22 Batch 1170/1540] avg loss 0.00288198, throughput 2.17253K wps
[Epoch 22 Batch 1200/1540] avg loss 0.00294501, throughput 2.17326K wps
[Epoch 22 Batch 1230/1540] avg loss 0.00275923, throughput 2.18902K wps
[Epoch 22 Batch 1260/1540] avg loss 0.00297397, throughput 2.19121K wps
[Epoch 22 Batch 1290/1540] avg loss 0.00277813, throughput 2.19191K wps
[Epoch 22 Batch 1320/1540] avg loss 0.00292098, throughput 2.1814K wps
[Epoch 22 Batch 1350/1540] avg loss 0.00300911, throughput 2.17397K wps
[Epoch 22 Batch 1380/1540] avg loss 0.00268811, throughput 2.18444K wps
[Epoch 22 Batch 1410/1540] avg loss 0.00264298, throughput 2.18127K wps
[Epoch 22 Batch 1440/1540] avg loss 0.00276577, throughput 2.19088K wps
[Epoch 22 Batch 1470/1540] avg loss 0.00266767, throughput 2.18075K wps
[Epoch 22 Batch 1500/1540] avg loss 0.0025658, throughput 2.19031K wps
[Epoch 22 Batch 1530/1540] avg loss 0.00287129, throughput 2.19024K wps
Begin Testing...
[Epoch 22] train avg loss 0.0028143, dev acc 0.8372, dev avg loss 0.424432, throughput 2.18365K wps
[Epoch 23 Batch 30/1540] avg loss 0.0026159, throughput 2.23066K wps
[Epoch 23 Batch 60/1540] avg loss 0.00273136, throughput 2.16506K wps
[Epoch 23 Batch 90/1540] avg loss 0.00279423, throughput 2.19592K wps
[Epoch 23 Batch 120/1540] avg loss 0.00267319, throughput 2.18031K wps
[Epoch 23 Batch 150/1540] avg loss 0.00260921, throughput 2.17584K wps
[Epoch 23 Batch 180/1540] avg loss 0.00278287, throughput 2.19374K wps
[Epoch 23 Batch 210/1540] avg loss 0.00241378, throughput 2.19348K wps
[Epoch 23 Batch 240/1540] avg loss 0.00268429, throughput 2.19007K wps
[Epoch 23 Batch 270/1540] avg loss 0.00245464, throughput 2.1613K wps
[Epoch 23 Batch 300/1540] avg loss 0.00247313, throughput 2.18235K wps
[Epoch 23 Batch 330/1540] avg loss 0.00257374, throughput 2.1851K wps
[Epoch 23 Batch 360/1540] avg loss 0.00299457, throughput 2.18957K wps
[Epoch 23 Batch 390/1540] avg loss 0.00292161, throughput 2.18489K wps
[Epoch 23 Batch 420/1540] avg loss 0.00266267, throughput 2.19446K wps
[Epoch 23 Batch 450/1540] avg loss 0.00238361, throughput 2.17555K wps
[Epoch 23 Batch 480/1540] avg loss 0.00290146, throughput 2.1942K wps
[Epoch 23 Batch 510/1540] avg loss 0.00256659, throughput 2.19579K wps
[Epoch 23 Batch 540/1540] avg loss 0.00271616, throughput 2.19608K wps
[Epoch 23 Batch 570/1540] avg loss 0.00264342, throughput 2.18995K wps
[Epoch 23 Batch 600/1540] avg loss 0.00241531, throughput 2.19067K wps
[Epoch 23 Batch 630/1540] avg loss 0.00273738, throughput 2.18889K wps
[Epoch 23 Batch 660/1540] avg loss 0.00271449, throughput 2.17622K wps
[Epoch 23 Batch 690/1540] avg loss 0.00247302, throughput 2.19379K wps
[Epoch 23 Batch 720/1540] avg loss 0.0030469, throughput 2.19247K wps
[Epoch 23 Batch 750/1540] avg loss 0.003196, throughput 2.19066K wps
[Epoch 23 Batch 780/1540] avg loss 0.00285164, throughput 2.19403K wps
[Epoch 23 Batch 810/1540] avg loss 0.00273159, throughput 2.18859K wps
[Epoch 23 Batch 840/1540] avg loss 0.00332783, throughput 2.19242K wps
[Epoch 23 Batch 870/1540] avg loss 0.00273227, throughput 2.1948K wps
[Epoch 23 Batch 900/1540] avg loss 0.00271231, throughput 2.17889K wps
[Epoch 23 Batch 930/1540] avg loss 0.00234421, throughput 2.18883K wps
[Epoch 23 Batch 960/1540] avg loss 0.0024108, throughput 2.18161K wps
[Epoch 23 Batch 990/1540] avg loss 0.00286963, throughput 2.18978K wps
[Epoch 23 Batch 1020/1540] avg loss 0.00263093, throughput 2.18271K wps
[Epoch 23 Batch 1050/1540] avg loss 0.00304146, throughput 2.15519K wps
[Epoch 23 Batch 1080/1540] avg loss 0.00281673, throughput 2.17222K wps
[Epoch 23 Batch 1110/1540] avg loss 0.00285226, throughput 2.19379K wps
[Epoch 23 Batch 1140/1540] avg loss 0.00270399, throughput 2.19303K wps
[Epoch 23 Batch 1170/1540] avg loss 0.00254157, throughput 2.17294K wps
[Epoch 23 Batch 1200/1540] avg loss 0.00298115, throughput 2.1945K wps
[Epoch 23 Batch 1230/1540] avg loss 0.00300379, throughput 2.18272K wps
[Epoch 23 Batch 1260/1540] avg loss 0.00312089, throughput 2.17162K wps
[Epoch 23 Batch 1290/1540] avg loss 0.00259306, throughput 2.19326K wps
[Epoch 23 Batch 1320/1540] avg loss 0.00265311, throughput 2.1654K wps
[Epoch 23 Batch 1350/1540] avg loss 0.00279782, throughput 2.17947K wps
[Epoch 23 Batch 1380/1540] avg loss 0.00265973, throughput 2.18154K wps
[Epoch 23 Batch 1410/1540] avg loss 0.00268971, throughput 2.1905K wps
[Epoch 23 Batch 1440/1540] avg loss 0.00250631, throughput 2.19306K wps
[Epoch 23 Batch 1470/1540] avg loss 0.00254535, throughput 2.17931K wps
[Epoch 23 Batch 1500/1540] avg loss 0.00282486, throughput 2.17784K wps
[Epoch 23 Batch 1530/1540] avg loss 0.00279106, throughput 2.19361K wps
Begin Testing...
[Epoch 23] train avg loss 0.00272268, dev acc 0.8394, dev avg loss 0.430709, throughput 2.18609K wps
[Epoch 24 Batch 30/1540] avg loss 0.00264644, throughput 2.22669K wps
[Epoch 24 Batch 60/1540] avg loss 0.002629, throughput 2.19423K wps
[Epoch 24 Batch 90/1540] avg loss 0.00243175, throughput 2.19028K wps
[Epoch 24 Batch 120/1540] avg loss 0.00246015, throughput 2.17482K wps
[Epoch 24 Batch 150/1540] avg loss 0.00244295, throughput 2.19697K wps
[Epoch 24 Batch 180/1540] avg loss 0.00232723, throughput 2.18975K wps
[Epoch 24 Batch 210/1540] avg loss 0.00274121, throughput 2.17411K wps
[Epoch 24 Batch 240/1540] avg loss 0.00246766, throughput 2.19419K wps
[Epoch 24 Batch 270/1540] avg loss 0.0025782, throughput 2.1678K wps
[Epoch 24 Batch 300/1540] avg loss 0.00234367, throughput 2.19267K wps
[Epoch 24 Batch 330/1540] avg loss 0.00272263, throughput 2.19586K wps
[Epoch 24 Batch 360/1540] avg loss 0.00307961, throughput 2.16473K wps
[Epoch 24 Batch 390/1540] avg loss 0.00276346, throughput 2.18736K wps
[Epoch 24 Batch 420/1540] avg loss 0.00257251, throughput 2.18432K wps
[Epoch 24 Batch 450/1540] avg loss 0.0028732, throughput 2.1893K wps
[Epoch 24 Batch 480/1540] avg loss 0.00281577, throughput 2.18725K wps
[Epoch 24 Batch 510/1540] avg loss 0.00306411, throughput 2.18045K wps
[Epoch 24 Batch 540/1540] avg loss 0.00285162, throughput 2.19072K wps
[Epoch 24 Batch 570/1540] avg loss 0.0027433, throughput 2.18271K wps
[Epoch 24 Batch 600/1540] avg loss 0.00270446, throughput 2.18131K wps
[Epoch 24 Batch 630/1540] avg loss 0.00281823, throughput 2.19477K wps
[Epoch 24 Batch 660/1540] avg loss 0.00261511, throughput 2.19172K wps
[Epoch 24 Batch 690/1540] avg loss 0.00261995, throughput 2.19043K wps
[Epoch 24 Batch 720/1540] avg loss 0.00264513, throughput 2.1909K wps
[Epoch 24 Batch 750/1540] avg loss 0.00254094, throughput 2.18996K wps
[Epoch 24 Batch 780/1540] avg loss 0.00255819, throughput 2.17372K wps
[Epoch 24 Batch 810/1540] avg loss 0.00247905, throughput 2.18762K wps
[Epoch 24 Batch 840/1540] avg loss 0.0025712, throughput 2.17453K wps
[Epoch 24 Batch 870/1540] avg loss 0.00281586, throughput 2.19587K wps
[Epoch 24 Batch 900/1540] avg loss 0.00270557, throughput 2.19241K wps
[Epoch 24 Batch 930/1540] avg loss 0.00278495, throughput 2.17856K wps
[Epoch 24 Batch 960/1540] avg loss 0.00246772, throughput 2.19355K wps
[Epoch 24 Batch 990/1540] avg loss 0.00289311, throughput 2.17501K wps
[Epoch 24 Batch 1020/1540] avg loss 0.0029022, throughput 2.17076K wps
[Epoch 24 Batch 1050/1540] avg loss 0.00269188, throughput 2.18866K wps
[Epoch 24 Batch 1080/1540] avg loss 0.0025543, throughput 2.19249K wps
[Epoch 24 Batch 1110/1540] avg loss 0.00310292, throughput 2.18821K wps
[Epoch 24 Batch 1140/1540] avg loss 0.00274159, throughput 2.19579K wps
[Epoch 24 Batch 1170/1540] avg loss 0.00254485, throughput 2.17614K wps
[Epoch 24 Batch 1200/1540] avg loss 0.00239246, throughput 2.18662K wps
[Epoch 24 Batch 1230/1540] avg loss 0.00282399, throughput 2.1809K wps
[Epoch 24 Batch 1260/1540] avg loss 0.00268071, throughput 2.1784K wps
[Epoch 24 Batch 1290/1540] avg loss 0.0026531, throughput 2.18876K wps
[Epoch 24 Batch 1320/1540] avg loss 0.00241396, throughput 2.1898K wps
[Epoch 24 Batch 1350/1540] avg loss 0.00251537, throughput 2.19664K wps
[Epoch 24 Batch 1380/1540] avg loss 0.00274172, throughput 2.18904K wps
[Epoch 24 Batch 1410/1540] avg loss 0.00253522, throughput 2.19246K wps
[Epoch 24 Batch 1440/1540] avg loss 0.00263473, throughput 2.19552K wps
[Epoch 24 Batch 1470/1540] avg loss 0.0030791, throughput 2.17791K wps
[Epoch 24 Batch 1500/1540] avg loss 0.00263954, throughput 2.16887K wps
[Epoch 24 Batch 1530/1540] avg loss 0.00254148, throughput 2.18486K wps
Begin Testing...
[Epoch 24] train avg loss 0.00266758, dev acc 0.8417, dev avg loss 0.43599, throughput 2.18667K wps
[Epoch 25 Batch 30/1540] avg loss 0.00282211, throughput 2.22101K wps
[Epoch 25 Batch 60/1540] avg loss 0.00225376, throughput 2.17922K wps
[Epoch 25 Batch 90/1540] avg loss 0.00256848, throughput 2.18981K wps
[Epoch 25 Batch 120/1540] avg loss 0.00258901, throughput 2.18247K wps
[Epoch 25 Batch 150/1540] avg loss 0.00251552, throughput 2.19292K wps
[Epoch 25 Batch 180/1540] avg loss 0.00221535, throughput 2.18385K wps
[Epoch 25 Batch 210/1540] avg loss 0.00240832, throughput 2.18396K wps
[Epoch 25 Batch 240/1540] avg loss 0.00264271, throughput 2.18556K wps
[Epoch 25 Batch 270/1540] avg loss 0.0026117, throughput 2.19269K wps
[Epoch 25 Batch 300/1540] avg loss 0.00273315, throughput 2.17749K wps
[Epoch 25 Batch 330/1540] avg loss 0.0022954, throughput 2.17509K wps
[Epoch 25 Batch 360/1540] avg loss 0.00224127, throughput 2.18922K wps
[Epoch 25 Batch 390/1540] avg loss 0.00243672, throughput 2.19188K wps
[Epoch 25 Batch 420/1540] avg loss 0.00243376, throughput 2.17564K wps
[Epoch 25 Batch 450/1540] avg loss 0.00236963, throughput 2.1464K wps
[Epoch 25 Batch 480/1540] avg loss 0.00265967, throughput 2.17042K wps
[Epoch 25 Batch 510/1540] avg loss 0.00272732, throughput 2.19145K wps
[Epoch 25 Batch 540/1540] avg loss 0.00268163, throughput 2.18918K wps
[Epoch 25 Batch 570/1540] avg loss 0.0027016, throughput 2.18516K wps
[Epoch 25 Batch 600/1540] avg loss 0.0026696, throughput 2.1863K wps
[Epoch 25 Batch 630/1540] avg loss 0.00234247, throughput 2.19357K wps
[Epoch 25 Batch 660/1540] avg loss 0.00260958, throughput 2.18404K wps
[Epoch 25 Batch 690/1540] avg loss 0.00262873, throughput 2.15211K wps
[Epoch 25 Batch 720/1540] avg loss 0.00272172, throughput 2.1775K wps
[Epoch 25 Batch 750/1540] avg loss 0.0028751, throughput 2.19668K wps
[Epoch 25 Batch 780/1540] avg loss 0.00260642, throughput 2.18445K wps
[Epoch 25 Batch 810/1540] avg loss 0.00261209, throughput 2.1736K wps
[Epoch 25 Batch 840/1540] avg loss 0.00236134, throughput 2.18124K wps
[Epoch 25 Batch 870/1540] avg loss 0.00233978, throughput 2.15941K wps
[Epoch 25 Batch 900/1540] avg loss 0.00256855, throughput 2.17266K wps
[Epoch 25 Batch 930/1540] avg loss 0.00276639, throughput 2.16627K wps
[Epoch 25 Batch 960/1540] avg loss 0.00232858, throughput 2.19216K wps
[Epoch 25 Batch 990/1540] avg loss 0.00238588, throughput 2.19407K wps
[Epoch 25 Batch 1020/1540] avg loss 0.00273372, throughput 2.1908K wps
[Epoch 25 Batch 1050/1540] avg loss 0.00263456, throughput 2.19006K wps
[Epoch 25 Batch 1080/1540] avg loss 0.00251515, throughput 2.1957K wps
[Epoch 25 Batch 1110/1540] avg loss 0.0024414, throughput 2.19385K wps
[Epoch 25 Batch 1140/1540] avg loss 0.00258103, throughput 2.1834K wps
[Epoch 25 Batch 1170/1540] avg loss 0.00254118, throughput 2.19567K wps
[Epoch 25 Batch 1200/1540] avg loss 0.00270087, throughput 2.1947K wps
[Epoch 25 Batch 1230/1540] avg loss 0.00273753, throughput 2.19378K wps
[Epoch 25 Batch 1260/1540] avg loss 0.00291742, throughput 2.17988K wps
[Epoch 25 Batch 1290/1540] avg loss 0.00282171, throughput 2.18102K wps
[Epoch 25 Batch 1320/1540] avg loss 0.00282855, throughput 2.18992K wps
[Epoch 25 Batch 1350/1540] avg loss 0.00252391, throughput 2.19705K wps
[Epoch 25 Batch 1380/1540] avg loss 0.00275425, throughput 2.18719K wps
[Epoch 25 Batch 1410/1540] avg loss 0.00258201, throughput 2.19285K wps
[Epoch 25 Batch 1440/1540] avg loss 0.00268496, throughput 2.17484K wps
[Epoch 25 Batch 1470/1540] avg loss 0.0025044, throughput 2.17159K wps
[Epoch 25 Batch 1500/1540] avg loss 0.00286791, throughput 2.18659K wps
[Epoch 25 Batch 1530/1540] avg loss 0.00252341, throughput 2.18671K wps
Begin Testing...
[Epoch 25] train avg loss 0.00258012, dev acc 0.8406, dev avg loss 0.435727, throughput 2.18423K wps
[Epoch 26 Batch 30/1540] avg loss 0.00217549, throughput 2.21624K wps
[Epoch 26 Batch 60/1540] avg loss 0.00265888, throughput 2.192K wps
[Epoch 26 Batch 90/1540] avg loss 0.0023807, throughput 2.19171K wps
[Epoch 26 Batch 120/1540] avg loss 0.00227966, throughput 2.17936K wps
[Epoch 26 Batch 150/1540] avg loss 0.00281971, throughput 2.1926K wps
[Epoch 26 Batch 180/1540] avg loss 0.00244564, throughput 2.17941K wps
[Epoch 26 Batch 210/1540] avg loss 0.00256129, throughput 2.18407K wps
[Epoch 26 Batch 240/1540] avg loss 0.00260182, throughput 2.17395K wps
[Epoch 26 Batch 270/1540] avg loss 0.00235998, throughput 2.17979K wps
[Epoch 26 Batch 300/1540] avg loss 0.00258025, throughput 2.18543K wps
[Epoch 26 Batch 330/1540] avg loss 0.00258412, throughput 2.18921K wps
[Epoch 26 Batch 360/1540] avg loss 0.00234832, throughput 2.19687K wps
[Epoch 26 Batch 390/1540] avg loss 0.00255755, throughput 2.18772K wps
[Epoch 26 Batch 420/1540] avg loss 0.00293754, throughput 2.18762K wps
[Epoch 26 Batch 450/1540] avg loss 0.00239997, throughput 2.19345K wps
[Epoch 26 Batch 480/1540] avg loss 0.00223567, throughput 2.18786K wps
[Epoch 26 Batch 510/1540] avg loss 0.00233599, throughput 2.1768K wps
[Epoch 26 Batch 540/1540] avg loss 0.0024811, throughput 2.16744K wps
[Epoch 26 Batch 570/1540] avg loss 0.00273473, throughput 2.18438K wps
[Epoch 26 Batch 600/1540] avg loss 0.00232703, throughput 2.18959K wps
[Epoch 26 Batch 630/1540] avg loss 0.00230741, throughput 2.18764K wps
[Epoch 26 Batch 660/1540] avg loss 0.00253562, throughput 2.18599K wps
[Epoch 26 Batch 690/1540] avg loss 0.00229628, throughput 2.16147K wps
[Epoch 26 Batch 720/1540] avg loss 0.00286166, throughput 2.17766K wps
[Epoch 26 Batch 750/1540] avg loss 0.00207187, throughput 2.17662K wps
[Epoch 26 Batch 780/1540] avg loss 0.00224643, throughput 2.18015K wps
[Epoch 26 Batch 810/1540] avg loss 0.00253217, throughput 2.19453K wps
[Epoch 26 Batch 840/1540] avg loss 0.00228797, throughput 2.19451K wps
[Epoch 26 Batch 870/1540] avg loss 0.00228972, throughput 2.18159K wps
[Epoch 26 Batch 900/1540] avg loss 0.00257154, throughput 2.1676K wps
[Epoch 26 Batch 930/1540] avg loss 0.00253003, throughput 2.18642K wps
[Epoch 26 Batch 960/1540] avg loss 0.00244635, throughput 2.1974K wps
[Epoch 26 Batch 990/1540] avg loss 0.00246849, throughput 2.18904K wps
[Epoch 26 Batch 1020/1540] avg loss 0.00273877, throughput 2.19159K wps
[Epoch 26 Batch 1050/1540] avg loss 0.00246192, throughput 2.19716K wps
[Epoch 26 Batch 1080/1540] avg loss 0.00240408, throughput 2.19561K wps
[Epoch 26 Batch 1110/1540] avg loss 0.00260566, throughput 2.19411K wps
[Epoch 26 Batch 1140/1540] avg loss 0.00233133, throughput 2.19385K wps
[Epoch 26 Batch 1170/1540] avg loss 0.00287352, throughput 2.19249K wps
[Epoch 26 Batch 1200/1540] avg loss 0.00271461, throughput 2.18975K wps
[Epoch 26 Batch 1230/1540] avg loss 0.00258355, throughput 2.18406K wps
[Epoch 26 Batch 1260/1540] avg loss 0.00277394, throughput 2.18529K wps
[Epoch 26 Batch 1290/1540] avg loss 0.00288269, throughput 2.18475K wps
[Epoch 26 Batch 1320/1540] avg loss 0.0029587, throughput 2.18352K wps
[Epoch 26 Batch 1350/1540] avg loss 0.00255312, throughput 2.19019K wps
[Epoch 26 Batch 1380/1540] avg loss 0.00237318, throughput 2.19171K wps
[Epoch 26 Batch 1410/1540] avg loss 0.00277979, throughput 2.19265K wps
[Epoch 26 Batch 1440/1540] avg loss 0.00250747, throughput 2.1861K wps
[Epoch 26 Batch 1470/1540] avg loss 0.00257211, throughput 2.16135K wps
[Epoch 26 Batch 1500/1540] avg loss 0.00247442, throughput 2.18149K wps
[Epoch 26 Batch 1530/1540] avg loss 0.00244848, throughput 2.1892K wps
Begin Testing...
[Epoch 26] train avg loss 0.00251326, dev acc 0.8406, dev avg loss 0.439425, throughput 2.18639K wps
[Epoch 27 Batch 30/1540] avg loss 0.00256792, throughput 2.21831K wps
[Epoch 27 Batch 60/1540] avg loss 0.00233599, throughput 2.19293K wps
[Epoch 27 Batch 90/1540] avg loss 0.00229648, throughput 2.18072K wps
[Epoch 27 Batch 120/1540] avg loss 0.00233449, throughput 2.17987K wps
[Epoch 27 Batch 150/1540] avg loss 0.00223555, throughput 2.18002K wps
[Epoch 27 Batch 180/1540] avg loss 0.00206426, throughput 2.19197K wps
[Epoch 27 Batch 210/1540] avg loss 0.0024225, throughput 2.19295K wps
[Epoch 27 Batch 240/1540] avg loss 0.00222031, throughput 2.18761K wps
[Epoch 27 Batch 270/1540] avg loss 0.00239653, throughput 2.19228K wps
[Epoch 27 Batch 300/1540] avg loss 0.00228625, throughput 2.19252K wps
[Epoch 27 Batch 330/1540] avg loss 0.0023407, throughput 2.19089K wps
[Epoch 27 Batch 360/1540] avg loss 0.00242389, throughput 2.18538K wps
[Epoch 27 Batch 390/1540] avg loss 0.00266166, throughput 2.1777K wps
[Epoch 27 Batch 420/1540] avg loss 0.00239165, throughput 2.19363K wps
[Epoch 27 Batch 450/1540] avg loss 0.00249704, throughput 2.17145K wps
[Epoch 27 Batch 480/1540] avg loss 0.00258668, throughput 2.1854K wps
[Epoch 27 Batch 510/1540] avg loss 0.00266761, throughput 2.18658K wps
[Epoch 27 Batch 540/1540] avg loss 0.00257275, throughput 2.17432K wps
[Epoch 27 Batch 570/1540] avg loss 0.00215267, throughput 2.17918K wps
[Epoch 27 Batch 600/1540] avg loss 0.00260644, throughput 2.19415K wps
[Epoch 27 Batch 630/1540] avg loss 0.00245312, throughput 2.17513K wps
[Epoch 27 Batch 660/1540] avg loss 0.00216429, throughput 2.19003K wps
[Epoch 27 Batch 690/1540] avg loss 0.00236296, throughput 2.19317K wps
[Epoch 27 Batch 720/1540] avg loss 0.00254282, throughput 2.18257K wps
[Epoch 27 Batch 750/1540] avg loss 0.00261175, throughput 2.18709K wps
[Epoch 27 Batch 780/1540] avg loss 0.00273068, throughput 2.18385K wps
[Epoch 27 Batch 810/1540] avg loss 0.00277909, throughput 2.18699K wps
[Epoch 27 Batch 840/1540] avg loss 0.00234653, throughput 2.18919K wps
[Epoch 27 Batch 870/1540] avg loss 0.0026722, throughput 2.16009K wps
[Epoch 27 Batch 900/1540] avg loss 0.00254892, throughput 2.18959K wps
[Epoch 27 Batch 930/1540] avg loss 0.00245626, throughput 2.19875K wps
[Epoch 27 Batch 960/1540] avg loss 0.00258554, throughput 2.18701K wps
[Epoch 27 Batch 990/1540] avg loss 0.00244065, throughput 2.19421K wps
[Epoch 27 Batch 1020/1540] avg loss 0.00253961, throughput 2.18623K wps
[Epoch 27 Batch 1050/1540] avg loss 0.00210787, throughput 2.16393K wps
[Epoch 27 Batch 1080/1540] avg loss 0.0022154, throughput 2.18092K wps
[Epoch 27 Batch 1110/1540] avg loss 0.00213173, throughput 2.16946K wps
[Epoch 27 Batch 1140/1540] avg loss 0.00281168, throughput 2.19365K wps
[Epoch 27 Batch 1170/1540] avg loss 0.00246655, throughput 2.18947K wps
[Epoch 27 Batch 1200/1540] avg loss 0.00228032, throughput 2.18109K wps
[Epoch 27 Batch 1230/1540] avg loss 0.00231405, throughput 2.17242K wps
[Epoch 27 Batch 1260/1540] avg loss 0.00271069, throughput 2.18342K wps
[Epoch 27 Batch 1290/1540] avg loss 0.00229017, throughput 2.1899K wps
[Epoch 27 Batch 1320/1540] avg loss 0.00224948, throughput 2.1927K wps
[Epoch 27 Batch 1350/1540] avg loss 0.00285166, throughput 2.197K wps
[Epoch 27 Batch 1380/1540] avg loss 0.00259772, throughput 2.19504K wps
[Epoch 27 Batch 1410/1540] avg loss 0.00258188, throughput 2.18978K wps
[Epoch 27 Batch 1440/1540] avg loss 0.00275524, throughput 2.19238K wps
[Epoch 27 Batch 1470/1540] avg loss 0.00247198, throughput 2.18728K wps
[Epoch 27 Batch 1500/1540] avg loss 0.00253305, throughput 2.17072K wps
[Epoch 27 Batch 1530/1540] avg loss 0.00223233, throughput 2.18707K wps
Begin Testing...
[Epoch 27] train avg loss 0.00245395, dev acc 0.8475, dev avg loss 0.447676, throughput 2.18605K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.18 s
[Epoch 28 Batch 30/1540] avg loss 0.00242994, throughput 2.20042K wps
[Epoch 28 Batch 60/1540] avg loss 0.00248539, throughput 2.17396K wps
[Epoch 28 Batch 90/1540] avg loss 0.00245239, throughput 2.1908K wps
[Epoch 28 Batch 120/1540] avg loss 0.00223004, throughput 2.18942K wps
[Epoch 28 Batch 150/1540] avg loss 0.00220783, throughput 2.19056K wps
[Epoch 28 Batch 180/1540] avg loss 0.00223458, throughput 2.18901K wps
[Epoch 28 Batch 210/1540] avg loss 0.00242097, throughput 2.19456K wps
[Epoch 28 Batch 240/1540] avg loss 0.00252178, throughput 2.19134K wps
[Epoch 28 Batch 270/1540] avg loss 0.00195657, throughput 2.18417K wps
[Epoch 28 Batch 300/1540] avg loss 0.00244446, throughput 2.18901K wps
[Epoch 28 Batch 330/1540] avg loss 0.00196334, throughput 2.19237K wps
[Epoch 28 Batch 360/1540] avg loss 0.00200078, throughput 2.18751K wps
[Epoch 28 Batch 390/1540] avg loss 0.00282693, throughput 2.19058K wps
[Epoch 28 Batch 420/1540] avg loss 0.00245686, throughput 2.18719K wps
[Epoch 28 Batch 450/1540] avg loss 0.00232146, throughput 2.18871K wps
[Epoch 28 Batch 480/1540] avg loss 0.00241461, throughput 2.18636K wps
[Epoch 28 Batch 510/1540] avg loss 0.00249616, throughput 2.16158K wps
[Epoch 28 Batch 540/1540] avg loss 0.00241935, throughput 2.19198K wps
[Epoch 28 Batch 570/1540] avg loss 0.00265697, throughput 2.19076K wps
[Epoch 28 Batch 600/1540] avg loss 0.00265441, throughput 2.18462K wps
[Epoch 28 Batch 630/1540] avg loss 0.00227715, throughput 2.17801K wps
[Epoch 28 Batch 660/1540] avg loss 0.0024315, throughput 2.16508K wps
[Epoch 28 Batch 690/1540] avg loss 0.00262298, throughput 2.19168K wps
[Epoch 28 Batch 720/1540] avg loss 0.00249901, throughput 2.18484K wps
[Epoch 28 Batch 750/1540] avg loss 0.00258027, throughput 2.18109K wps
[Epoch 28 Batch 780/1540] avg loss 0.00238241, throughput 2.16699K wps
[Epoch 28 Batch 810/1540] avg loss 0.00231679, throughput 2.17987K wps
[Epoch 28 Batch 840/1540] avg loss 0.00237779, throughput 2.19228K wps
[Epoch 28 Batch 870/1540] avg loss 0.00221076, throughput 2.18882K wps
[Epoch 28 Batch 900/1540] avg loss 0.00211326, throughput 2.17985K wps
[Epoch 28 Batch 930/1540] avg loss 0.00251172, throughput 2.16719K wps
[Epoch 28 Batch 960/1540] avg loss 0.00206567, throughput 2.18492K wps
[Epoch 28 Batch 990/1540] avg loss 0.00227619, throughput 2.19199K wps
[Epoch 28 Batch 1020/1540] avg loss 0.00248257, throughput 2.19148K wps
[Epoch 28 Batch 1050/1540] avg loss 0.00235875, throughput 2.19138K wps
[Epoch 28 Batch 1080/1540] avg loss 0.00230909, throughput 2.19111K wps
[Epoch 28 Batch 1110/1540] avg loss 0.00274351, throughput 2.19172K wps
[Epoch 28 Batch 1140/1540] avg loss 0.00254212, throughput 2.18965K wps
[Epoch 28 Batch 1170/1540] avg loss 0.0023557, throughput 2.16931K wps
[Epoch 28 Batch 1200/1540] avg loss 0.00209898, throughput 2.1938K wps
[Epoch 28 Batch 1230/1540] avg loss 0.00249053, throughput 2.19465K wps
[Epoch 28 Batch 1260/1540] avg loss 0.0024278, throughput 2.16366K wps
[Epoch 28 Batch 1290/1540] avg loss 0.00216284, throughput 2.1867K wps
[Epoch 28 Batch 1320/1540] avg loss 0.0024102, throughput 2.1878K wps
[Epoch 28 Batch 1350/1540] avg loss 0.00274053, throughput 2.18648K wps
[Epoch 28 Batch 1380/1540] avg loss 0.0023892, throughput 2.1757K wps
[Epoch 28 Batch 1410/1540] avg loss 0.00243401, throughput 2.1873K wps
[Epoch 28 Batch 1440/1540] avg loss 0.00292383, throughput 2.1862K wps
[Epoch 28 Batch 1470/1540] avg loss 0.00231201, throughput 2.17899K wps
[Epoch 28 Batch 1500/1540] avg loss 0.00261049, throughput 2.18233K wps
[Epoch 28 Batch 1530/1540] avg loss 0.00214097, throughput 2.16877K wps
Begin Testing...
[Epoch 28] train avg loss 0.00239408, dev acc 0.8417, dev avg loss 0.454321, throughput 2.18489K wps
[Epoch 29 Batch 30/1540] avg loss 0.00230828, throughput 2.21737K wps
[Epoch 29 Batch 60/1540] avg loss 0.00198618, throughput 2.19485K wps
[Epoch 29 Batch 90/1540] avg loss 0.00197457, throughput 2.19513K wps
[Epoch 29 Batch 120/1540] avg loss 0.00240536, throughput 2.18689K wps
[Epoch 29 Batch 150/1540] avg loss 0.00207802, throughput 2.19284K wps
[Epoch 29 Batch 180/1540] avg loss 0.00239905, throughput 2.16957K wps
[Epoch 29 Batch 210/1540] avg loss 0.00248196, throughput 2.1777K wps
[Epoch 29 Batch 240/1540] avg loss 0.002594, throughput 2.19481K wps
[Epoch 29 Batch 270/1540] avg loss 0.00206239, throughput 2.17686K wps
[Epoch 29 Batch 300/1540] avg loss 0.00197319, throughput 2.17448K wps
[Epoch 29 Batch 330/1540] avg loss 0.00222788, throughput 2.16995K wps
[Epoch 29 Batch 360/1540] avg loss 0.00223539, throughput 2.17741K wps
[Epoch 29 Batch 390/1540] avg loss 0.00262577, throughput 2.19326K wps
[Epoch 29 Batch 420/1540] avg loss 0.00220356, throughput 2.19408K wps
[Epoch 29 Batch 450/1540] avg loss 0.00223215, throughput 2.1903K wps
[Epoch 29 Batch 480/1540] avg loss 0.00213231, throughput 2.1802K wps
[Epoch 29 Batch 510/1540] avg loss 0.00230249, throughput 2.19368K wps
[Epoch 29 Batch 540/1540] avg loss 0.0026091, throughput 2.18912K wps
[Epoch 29 Batch 570/1540] avg loss 0.00253581, throughput 2.16955K wps
[Epoch 29 Batch 600/1540] avg loss 0.0021115, throughput 2.17575K wps
[Epoch 29 Batch 630/1540] avg loss 0.0023503, throughput 2.17777K wps
[Epoch 29 Batch 660/1540] avg loss 0.00230035, throughput 2.19258K wps
[Epoch 29 Batch 690/1540] avg loss 0.00274432, throughput 2.18662K wps
[Epoch 29 Batch 720/1540] avg loss 0.00229214, throughput 2.18629K wps
[Epoch 29 Batch 750/1540] avg loss 0.00273234, throughput 2.19356K wps
[Epoch 29 Batch 780/1540] avg loss 0.00237959, throughput 2.1911K wps
[Epoch 29 Batch 810/1540] avg loss 0.00218975, throughput 2.18098K wps
[Epoch 29 Batch 840/1540] avg loss 0.00215105, throughput 2.19377K wps
[Epoch 29 Batch 870/1540] avg loss 0.0024614, throughput 2.19332K wps
[Epoch 29 Batch 900/1540] avg loss 0.00269277, throughput 2.17973K wps
[Epoch 29 Batch 930/1540] avg loss 0.00213313, throughput 2.16271K wps
[Epoch 29 Batch 960/1540] avg loss 0.00242, throughput 2.18735K wps
[Epoch 29 Batch 990/1540] avg loss 0.00240275, throughput 2.19264K wps
[Epoch 29 Batch 1020/1540] avg loss 0.00191879, throughput 2.18438K wps
[Epoch 29 Batch 1050/1540] avg loss 0.00234478, throughput 2.18993K wps
[Epoch 29 Batch 1080/1540] avg loss 0.00241583, throughput 2.18465K wps
[Epoch 29 Batch 1110/1540] avg loss 0.00263201, throughput 2.17801K wps
[Epoch 29 Batch 1140/1540] avg loss 0.0024044, throughput 2.19016K wps
[Epoch 29 Batch 1170/1540] avg loss 0.00231929, throughput 2.18948K wps
[Epoch 29 Batch 1200/1540] avg loss 0.00238659, throughput 2.19285K wps
[Epoch 29 Batch 1230/1540] avg loss 0.00232153, throughput 2.19051K wps
[Epoch 29 Batch 1260/1540] avg loss 0.00219645, throughput 2.18043K wps
[Epoch 29 Batch 1290/1540] avg loss 0.00248477, throughput 2.19242K wps
[Epoch 29 Batch 1320/1540] avg loss 0.00244447, throughput 2.19222K wps
[Epoch 29 Batch 1350/1540] avg loss 0.00233895, throughput 2.19153K wps
[Epoch 29 Batch 1380/1540] avg loss 0.00235884, throughput 2.19241K wps
[Epoch 29 Batch 1410/1540] avg loss 0.00198782, throughput 2.17491K wps
[Epoch 29 Batch 1440/1540] avg loss 0.00232209, throughput 2.18326K wps
[Epoch 29 Batch 1470/1540] avg loss 0.00259867, throughput 2.196K wps
[Epoch 29 Batch 1500/1540] avg loss 0.00248446, throughput 2.18695K wps
[Epoch 29 Batch 1530/1540] avg loss 0.00239532, throughput 2.1642K wps
Begin Testing...
[Epoch 29] train avg loss 0.00233858, dev acc 0.8383, dev avg loss 0.452472, throughput 2.18587K wps
[Epoch 30 Batch 30/1540] avg loss 0.00228389, throughput 2.217K wps
[Epoch 30 Batch 60/1540] avg loss 0.0022, throughput 2.18914K wps
[Epoch 30 Batch 90/1540] avg loss 0.00224685, throughput 2.16762K wps
[Epoch 30 Batch 120/1540] avg loss 0.00238536, throughput 2.17631K wps
[Epoch 30 Batch 150/1540] avg loss 0.00227041, throughput 2.18811K wps
[Epoch 30 Batch 180/1540] avg loss 0.00215042, throughput 2.18739K wps
[Epoch 30 Batch 210/1540] avg loss 0.00234031, throughput 2.16911K wps
[Epoch 30 Batch 240/1540] avg loss 0.00183842, throughput 2.19133K wps
[Epoch 30 Batch 270/1540] avg loss 0.00227346, throughput 2.19305K wps
[Epoch 30 Batch 300/1540] avg loss 0.00202552, throughput 2.18085K wps
[Epoch 30 Batch 330/1540] avg loss 0.00225728, throughput 2.18618K wps
[Epoch 30 Batch 360/1540] avg loss 0.00224356, throughput 2.19315K wps
[Epoch 30 Batch 390/1540] avg loss 0.00223199, throughput 2.18923K wps
[Epoch 30 Batch 420/1540] avg loss 0.00231169, throughput 2.18808K wps
[Epoch 30 Batch 450/1540] avg loss 0.00222745, throughput 2.18837K wps
[Epoch 30 Batch 480/1540] avg loss 0.00244801, throughput 2.1648K wps
[Epoch 30 Batch 510/1540] avg loss 0.0026243, throughput 2.19435K wps
[Epoch 30 Batch 540/1540] avg loss 0.00228935, throughput 2.19416K wps
[Epoch 30 Batch 570/1540] avg loss 0.002293, throughput 2.18337K wps
[Epoch 30 Batch 600/1540] avg loss 0.00208439, throughput 2.17978K wps
[Epoch 30 Batch 630/1540] avg loss 0.00208715, throughput 2.18805K wps
[Epoch 30 Batch 660/1540] avg loss 0.00232738, throughput 2.17763K wps
[Epoch 30 Batch 690/1540] avg loss 0.00214186, throughput 2.18731K wps
[Epoch 30 Batch 720/1540] avg loss 0.00201637, throughput 2.18445K wps
[Epoch 30 Batch 750/1540] avg loss 0.00223926, throughput 2.18828K wps
[Epoch 30 Batch 780/1540] avg loss 0.00224587, throughput 2.15846K wps
[Epoch 30 Batch 810/1540] avg loss 0.00251791, throughput 2.18267K wps
[Epoch 30 Batch 840/1540] avg loss 0.00218824, throughput 2.17962K wps
[Epoch 30 Batch 870/1540] avg loss 0.0023166, throughput 2.1919K wps
[Epoch 30 Batch 900/1540] avg loss 0.00244551, throughput 2.19363K wps
[Epoch 30 Batch 930/1540] avg loss 0.00242898, throughput 2.18977K wps
[Epoch 30 Batch 960/1540] avg loss 0.00220367, throughput 2.18097K wps
[Epoch 30 Batch 990/1540] avg loss 0.00259533, throughput 2.17964K wps
[Epoch 30 Batch 1020/1540] avg loss 0.00238865, throughput 2.19377K wps
[Epoch 30 Batch 1050/1540] avg loss 0.0022934, throughput 2.18521K wps
[Epoch 30 Batch 1080/1540] avg loss 0.00206361, throughput 2.19269K wps
[Epoch 30 Batch 1110/1540] avg loss 0.00227997, throughput 2.19071K wps
[Epoch 30 Batch 1140/1540] avg loss 0.00230864, throughput 2.17746K wps
[Epoch 30 Batch 1170/1540] avg loss 0.00253664, throughput 2.18977K wps
[Epoch 30 Batch 1200/1540] avg loss 0.00251182, throughput 2.19019K wps
[Epoch 30 Batch 1230/1540] avg loss 0.00236485, throughput 2.18395K wps
[Epoch 30 Batch 1260/1540] avg loss 0.00262126, throughput 2.19628K wps
[Epoch 30 Batch 1290/1540] avg loss 0.00214474, throughput 2.19399K wps
[Epoch 30 Batch 1320/1540] avg loss 0.00249502, throughput 2.19425K wps
[Epoch 30 Batch 1350/1540] avg loss 0.00240169, throughput 2.17489K wps
[Epoch 30 Batch 1380/1540] avg loss 0.0021706, throughput 2.18283K wps
[Epoch 30 Batch 1410/1540] avg loss 0.00235126, throughput 2.18879K wps
[Epoch 30 Batch 1440/1540] avg loss 0.00220798, throughput 2.19307K wps
[Epoch 30 Batch 1470/1540] avg loss 0.00243104, throughput 2.19102K wps
[Epoch 30 Batch 1500/1540] avg loss 0.00221846, throughput 2.17014K wps
[Epoch 30 Batch 1530/1540] avg loss 0.00223196, throughput 2.15239K wps
Begin Testing...
[Epoch 30] train avg loss 0.00229104, dev acc 0.8394, dev avg loss 0.458155, throughput 2.18506K wps
[Epoch 31 Batch 30/1540] avg loss 0.00235362, throughput 2.21201K wps
[Epoch 31 Batch 60/1540] avg loss 0.00187611, throughput 2.19375K wps
[Epoch 31 Batch 90/1540] avg loss 0.00204183, throughput 2.17443K wps
[Epoch 31 Batch 120/1540] avg loss 0.0022085, throughput 2.19339K wps
[Epoch 31 Batch 150/1540] avg loss 0.00213535, throughput 2.19378K wps
[Epoch 31 Batch 180/1540] avg loss 0.0020413, throughput 2.18963K wps
[Epoch 31 Batch 210/1540] avg loss 0.00190649, throughput 2.18587K wps
[Epoch 31 Batch 240/1540] avg loss 0.00227104, throughput 2.18741K wps
[Epoch 31 Batch 270/1540] avg loss 0.00189461, throughput 2.18405K wps
[Epoch 31 Batch 300/1540] avg loss 0.00240708, throughput 2.18139K wps
[Epoch 31 Batch 330/1540] avg loss 0.0024071, throughput 2.1891K wps
[Epoch 31 Batch 360/1540] avg loss 0.00228033, throughput 2.19385K wps
[Epoch 31 Batch 390/1540] avg loss 0.0019763, throughput 2.19021K wps
[Epoch 31 Batch 420/1540] avg loss 0.00215584, throughput 2.18293K wps
[Epoch 31 Batch 450/1540] avg loss 0.00221441, throughput 2.17541K wps
[Epoch 31 Batch 480/1540] avg loss 0.0021863, throughput 2.15689K wps
[Epoch 31 Batch 510/1540] avg loss 0.00203883, throughput 2.18585K wps
[Epoch 31 Batch 540/1540] avg loss 0.00231975, throughput 2.19434K wps
[Epoch 31 Batch 570/1540] avg loss 0.00224799, throughput 2.16138K wps
[Epoch 31 Batch 600/1540] avg loss 0.00220295, throughput 2.18908K wps
[Epoch 31 Batch 630/1540] avg loss 0.00261464, throughput 2.19078K wps
[Epoch 31 Batch 660/1540] avg loss 0.00206662, throughput 2.18574K wps
[Epoch 31 Batch 690/1540] avg loss 0.00240112, throughput 2.18356K wps
[Epoch 31 Batch 720/1540] avg loss 0.00252132, throughput 2.18785K wps
[Epoch 31 Batch 750/1540] avg loss 0.00230623, throughput 2.19423K wps
[Epoch 31 Batch 780/1540] avg loss 0.00256575, throughput 2.18931K wps
[Epoch 31 Batch 810/1540] avg loss 0.00224586, throughput 2.19065K wps
[Epoch 31 Batch 840/1540] avg loss 0.002219, throughput 2.18049K wps
[Epoch 31 Batch 870/1540] avg loss 0.00230789, throughput 2.1874K wps
[Epoch 31 Batch 900/1540] avg loss 0.00228311, throughput 2.17731K wps
[Epoch 31 Batch 930/1540] avg loss 0.0023717, throughput 2.1918K wps
[Epoch 31 Batch 960/1540] avg loss 0.00217485, throughput 2.19164K wps
[Epoch 31 Batch 990/1540] avg loss 0.00217189, throughput 2.18556K wps
[Epoch 31 Batch 1020/1540] avg loss 0.00235911, throughput 2.18923K wps
[Epoch 31 Batch 1050/1540] avg loss 0.00211492, throughput 2.19287K wps
[Epoch 31 Batch 1080/1540] avg loss 0.00216979, throughput 2.19154K wps
[Epoch 31 Batch 1110/1540] avg loss 0.00208858, throughput 2.19305K wps
[Epoch 31 Batch 1140/1540] avg loss 0.00265528, throughput 2.18857K wps
[Epoch 31 Batch 1170/1540] avg loss 0.00225825, throughput 2.19727K wps
[Epoch 31 Batch 1200/1540] avg loss 0.00216231, throughput 2.18475K wps
[Epoch 31 Batch 1230/1540] avg loss 0.00261187, throughput 2.17912K wps
[Epoch 31 Batch 1260/1540] avg loss 0.00211878, throughput 2.17473K wps
[Epoch 31 Batch 1290/1540] avg loss 0.0022365, throughput 2.18151K wps
[Epoch 31 Batch 1320/1540] avg loss 0.00218043, throughput 2.18048K wps
[Epoch 31 Batch 1350/1540] avg loss 0.00220308, throughput 2.19007K wps
[Epoch 31 Batch 1380/1540] avg loss 0.00228204, throughput 2.19172K wps
[Epoch 31 Batch 1410/1540] avg loss 0.00253131, throughput 2.1746K wps
[Epoch 31 Batch 1440/1540] avg loss 0.0028414, throughput 2.16838K wps
[Epoch 31 Batch 1470/1540] avg loss 0.00218884, throughput 2.18892K wps
[Epoch 31 Batch 1500/1540] avg loss 0.00259826, throughput 2.18364K wps
[Epoch 31 Batch 1530/1540] avg loss 0.00213113, throughput 2.19002K wps
Begin Testing...
[Epoch 31] train avg loss 0.00225568, dev acc 0.8383, dev avg loss 0.46643, throughput 2.18617K wps
[Epoch 32 Batch 30/1540] avg loss 0.00195923, throughput 2.20712K wps
[Epoch 32 Batch 60/1540] avg loss 0.0018576, throughput 2.14897K wps
[Epoch 32 Batch 90/1540] avg loss 0.00175538, throughput 2.18053K wps
[Epoch 32 Batch 120/1540] avg loss 0.0019904, throughput 2.18126K wps
[Epoch 32 Batch 150/1540] avg loss 0.00192505, throughput 2.19119K wps
[Epoch 32 Batch 180/1540] avg loss 0.00201284, throughput 2.18907K wps
[Epoch 32 Batch 210/1540] avg loss 0.00203204, throughput 2.15969K wps
[Epoch 32 Batch 240/1540] avg loss 0.00236436, throughput 2.18428K wps
[Epoch 32 Batch 270/1540] avg loss 0.0022827, throughput 2.18201K wps
[Epoch 32 Batch 300/1540] avg loss 0.00198514, throughput 2.17421K wps
[Epoch 32 Batch 330/1540] avg loss 0.00215778, throughput 2.19163K wps
[Epoch 32 Batch 360/1540] avg loss 0.00206888, throughput 2.19405K wps
[Epoch 32 Batch 390/1540] avg loss 0.00199814, throughput 2.17658K wps
[Epoch 32 Batch 420/1540] avg loss 0.00216304, throughput 2.19178K wps
[Epoch 32 Batch 450/1540] avg loss 0.00205618, throughput 2.18971K wps
[Epoch 32 Batch 480/1540] avg loss 0.00204084, throughput 2.15856K wps
[Epoch 32 Batch 510/1540] avg loss 0.0021542, throughput 2.17664K wps
[Epoch 32 Batch 540/1540] avg loss 0.00208181, throughput 2.19188K wps
[Epoch 32 Batch 570/1540] avg loss 0.00224762, throughput 2.18511K wps
[Epoch 32 Batch 600/1540] avg loss 0.00222131, throughput 2.18041K wps
[Epoch 32 Batch 630/1540] avg loss 0.00198672, throughput 2.19278K wps
[Epoch 32 Batch 660/1540] avg loss 0.00226686, throughput 2.19162K wps
[Epoch 32 Batch 690/1540] avg loss 0.00233869, throughput 2.18902K wps
[Epoch 32 Batch 720/1540] avg loss 0.00260727, throughput 2.19345K wps
[Epoch 32 Batch 750/1540] avg loss 0.00198215, throughput 2.17K wps
[Epoch 32 Batch 780/1540] avg loss 0.00208754, throughput 2.19396K wps
[Epoch 32 Batch 810/1540] avg loss 0.00232547, throughput 2.17282K wps
[Epoch 32 Batch 840/1540] avg loss 0.00194812, throughput 2.1669K wps
[Epoch 32 Batch 870/1540] avg loss 0.00237823, throughput 2.18926K wps
[Epoch 32 Batch 900/1540] avg loss 0.0020943, throughput 2.19546K wps
[Epoch 32 Batch 930/1540] avg loss 0.00228275, throughput 2.19145K wps
[Epoch 32 Batch 960/1540] avg loss 0.00243942, throughput 2.19358K wps
[Epoch 32 Batch 990/1540] avg loss 0.00218122, throughput 2.18939K wps
[Epoch 32 Batch 1020/1540] avg loss 0.0023522, throughput 2.19046K wps
[Epoch 32 Batch 1050/1540] avg loss 0.00185613, throughput 2.19048K wps
[Epoch 32 Batch 1080/1540] avg loss 0.00211306, throughput 2.1658K wps
[Epoch 32 Batch 1110/1540] avg loss 0.00210733, throughput 2.18177K wps
[Epoch 32 Batch 1140/1540] avg loss 0.0024771, throughput 2.19492K wps
[Epoch 32 Batch 1170/1540] avg loss 0.00222748, throughput 2.18193K wps
[Epoch 32 Batch 1200/1540] avg loss 0.00212335, throughput 2.17116K wps
[Epoch 32 Batch 1230/1540] avg loss 0.00219741, throughput 2.18348K wps
[Epoch 32 Batch 1260/1540] avg loss 0.00263094, throughput 2.16491K wps
[Epoch 32 Batch 1290/1540] avg loss 0.00238117, throughput 2.19176K wps
[Epoch 32 Batch 1320/1540] avg loss 0.00242321, throughput 2.17331K wps
[Epoch 32 Batch 1350/1540] avg loss 0.00242677, throughput 2.19596K wps
[Epoch 32 Batch 1380/1540] avg loss 0.00206971, throughput 2.19259K wps
[Epoch 32 Batch 1410/1540] avg loss 0.002251, throughput 2.19698K wps
[Epoch 32 Batch 1440/1540] avg loss 0.00224783, throughput 2.17671K wps
[Epoch 32 Batch 1470/1540] avg loss 0.00225399, throughput 2.18043K wps
[Epoch 32 Batch 1500/1540] avg loss 0.00236028, throughput 2.19305K wps
[Epoch 32 Batch 1530/1540] avg loss 0.00226331, throughput 2.19199K wps
Begin Testing...
[Epoch 32] train avg loss 0.00217793, dev acc 0.8406, dev avg loss 0.47698, throughput 2.18397K wps
[Epoch 33 Batch 30/1540] avg loss 0.00229854, throughput 2.23782K wps
[Epoch 33 Batch 60/1540] avg loss 0.00213784, throughput 2.19021K wps
[Epoch 33 Batch 90/1540] avg loss 0.00213447, throughput 2.19154K wps
[Epoch 33 Batch 120/1540] avg loss 0.00228248, throughput 2.1901K wps
[Epoch 33 Batch 150/1540] avg loss 0.00189221, throughput 2.19492K wps
[Epoch 33 Batch 180/1540] avg loss 0.00197365, throughput 2.17819K wps
[Epoch 33 Batch 210/1540] avg loss 0.00202325, throughput 2.18545K wps
[Epoch 33 Batch 240/1540] avg loss 0.00218746, throughput 2.18825K wps
[Epoch 33 Batch 270/1540] avg loss 0.0021667, throughput 2.18838K wps
[Epoch 33 Batch 300/1540] avg loss 0.00215024, throughput 2.19245K wps
[Epoch 33 Batch 330/1540] avg loss 0.00202644, throughput 2.19366K wps
[Epoch 33 Batch 360/1540] avg loss 0.00191536, throughput 2.18501K wps
[Epoch 33 Batch 390/1540] avg loss 0.00205217, throughput 2.18237K wps
[Epoch 33 Batch 420/1540] avg loss 0.00218696, throughput 2.19451K wps
[Epoch 33 Batch 450/1540] avg loss 0.00227144, throughput 2.19522K wps
[Epoch 33 Batch 480/1540] avg loss 0.0021845, throughput 2.19561K wps
[Epoch 33 Batch 510/1540] avg loss 0.00190546, throughput 2.19526K wps
[Epoch 33 Batch 540/1540] avg loss 0.00198004, throughput 2.17037K wps
[Epoch 33 Batch 570/1540] avg loss 0.00208273, throughput 2.19399K wps
[Epoch 33 Batch 600/1540] avg loss 0.00214633, throughput 2.1831K wps
[Epoch 33 Batch 630/1540] avg loss 0.00188289, throughput 2.18961K wps
[Epoch 33 Batch 660/1540] avg loss 0.00202681, throughput 2.1859K wps
[Epoch 33 Batch 690/1540] avg loss 0.00248192, throughput 2.17816K wps
[Epoch 33 Batch 720/1540] avg loss 0.00241431, throughput 2.18914K wps
[Epoch 33 Batch 750/1540] avg loss 0.00220467, throughput 2.19376K wps
[Epoch 33 Batch 780/1540] avg loss 0.00198117, throughput 2.16897K wps
[Epoch 33 Batch 810/1540] avg loss 0.00216877, throughput 2.18082K wps
[Epoch 33 Batch 840/1540] avg loss 0.00235018, throughput 2.16278K wps
[Epoch 33 Batch 870/1540] avg loss 0.00217066, throughput 2.17627K wps
[Epoch 33 Batch 900/1540] avg loss 0.00245739, throughput 2.16905K wps
[Epoch 33 Batch 930/1540] avg loss 0.00210782, throughput 2.19262K wps
[Epoch 33 Batch 960/1540] avg loss 0.00195057, throughput 2.18907K wps
[Epoch 33 Batch 990/1540] avg loss 0.00245124, throughput 2.17364K wps
[Epoch 33 Batch 1020/1540] avg loss 0.0018992, throughput 2.16888K wps
[Epoch 33 Batch 1050/1540] avg loss 0.00212031, throughput 2.18905K wps
[Epoch 33 Batch 1080/1540] avg loss 0.00196053, throughput 2.18702K wps
[Epoch 33 Batch 1110/1540] avg loss 0.0021633, throughput 2.1819K wps
[Epoch 33 Batch 1140/1540] avg loss 0.00236062, throughput 2.18279K wps
[Epoch 33 Batch 1170/1540] avg loss 0.00214579, throughput 2.19493K wps
[Epoch 33 Batch 1200/1540] avg loss 0.00222654, throughput 2.19376K wps
[Epoch 33 Batch 1230/1540] avg loss 0.00192281, throughput 2.19032K wps
[Epoch 33 Batch 1260/1540] avg loss 0.00209, throughput 2.18806K wps
[Epoch 33 Batch 1290/1540] avg loss 0.00221615, throughput 2.18772K wps
[Epoch 33 Batch 1320/1540] avg loss 0.00243266, throughput 2.17698K wps
[Epoch 33 Batch 1350/1540] avg loss 0.00240752, throughput 2.19572K wps
[Epoch 33 Batch 1380/1540] avg loss 0.00191283, throughput 2.19123K wps
[Epoch 33 Batch 1410/1540] avg loss 0.00218638, throughput 2.19569K wps
[Epoch 33 Batch 1440/1540] avg loss 0.00222655, throughput 2.17789K wps
[Epoch 33 Batch 1470/1540] avg loss 0.00219612, throughput 2.19456K wps
[Epoch 33 Batch 1500/1540] avg loss 0.00226443, throughput 2.17382K wps
[Epoch 33 Batch 1530/1540] avg loss 0.00263281, throughput 2.197K wps
Begin Testing...
[Epoch 33] train avg loss 0.00215895, dev acc 0.8406, dev avg loss 0.476184, throughput 2.18718K wps
[Epoch 34 Batch 30/1540] avg loss 0.00191984, throughput 2.23845K wps
[Epoch 34 Batch 60/1540] avg loss 0.00200853, throughput 2.19517K wps
[Epoch 34 Batch 90/1540] avg loss 0.00219848, throughput 2.18893K wps
[Epoch 34 Batch 120/1540] avg loss 0.00226993, throughput 2.19375K wps
[Epoch 34 Batch 150/1540] avg loss 0.00201323, throughput 2.17327K wps
[Epoch 34 Batch 180/1540] avg loss 0.00171884, throughput 2.19279K wps
[Epoch 34 Batch 210/1540] avg loss 0.00208054, throughput 2.17995K wps
[Epoch 34 Batch 240/1540] avg loss 0.00186808, throughput 2.18097K wps
[Epoch 34 Batch 270/1540] avg loss 0.00232371, throughput 2.17277K wps
[Epoch 34 Batch 300/1540] avg loss 0.00195195, throughput 2.18951K wps
[Epoch 34 Batch 330/1540] avg loss 0.00177497, throughput 2.17843K wps
[Epoch 34 Batch 360/1540] avg loss 0.00210249, throughput 2.18126K wps
[Epoch 34 Batch 390/1540] avg loss 0.00175272, throughput 2.17283K wps
[Epoch 34 Batch 420/1540] avg loss 0.00181639, throughput 2.18889K wps
[Epoch 34 Batch 450/1540] avg loss 0.00183142, throughput 2.17572K wps
[Epoch 34 Batch 480/1540] avg loss 0.00222884, throughput 2.1934K wps
[Epoch 34 Batch 510/1540] avg loss 0.00237951, throughput 2.188K wps
[Epoch 34 Batch 540/1540] avg loss 0.00222701, throughput 2.18976K wps
[Epoch 34 Batch 570/1540] avg loss 0.00196274, throughput 2.18595K wps
[Epoch 34 Batch 600/1540] avg loss 0.00206024, throughput 2.18166K wps
[Epoch 34 Batch 630/1540] avg loss 0.00207769, throughput 2.15586K wps
[Epoch 34 Batch 660/1540] avg loss 0.00214412, throughput 2.1777K wps
[Epoch 34 Batch 690/1540] avg loss 0.00195798, throughput 2.19492K wps
[Epoch 34 Batch 720/1540] avg loss 0.00220071, throughput 2.18862K wps
[Epoch 34 Batch 750/1540] avg loss 0.00222619, throughput 2.18068K wps
[Epoch 34 Batch 780/1540] avg loss 0.00208819, throughput 2.1935K wps
[Epoch 34 Batch 810/1540] avg loss 0.00248917, throughput 2.18805K wps
[Epoch 34 Batch 840/1540] avg loss 0.00246003, throughput 2.19447K wps
[Epoch 34 Batch 870/1540] avg loss 0.00225534, throughput 2.19297K wps
[Epoch 34 Batch 900/1540] avg loss 0.00191817, throughput 2.19048K wps
[Epoch 34 Batch 930/1540] avg loss 0.00246868, throughput 2.1937K wps
[Epoch 34 Batch 960/1540] avg loss 0.00206704, throughput 2.17733K wps
[Epoch 34 Batch 990/1540] avg loss 0.00218895, throughput 2.19425K wps
[Epoch 34 Batch 1020/1540] avg loss 0.002004, throughput 2.19105K wps
[Epoch 34 Batch 1050/1540] avg loss 0.00216274, throughput 2.18755K wps
[Epoch 34 Batch 1080/1540] avg loss 0.0019289, throughput 2.19114K wps
[Epoch 34 Batch 1110/1540] avg loss 0.00210194, throughput 2.1927K wps
[Epoch 34 Batch 1140/1540] avg loss 0.00213072, throughput 2.18584K wps
[Epoch 34 Batch 1170/1540] avg loss 0.00214214, throughput 2.19131K wps
[Epoch 34 Batch 1200/1540] avg loss 0.00229078, throughput 2.18839K wps
[Epoch 34 Batch 1230/1540] avg loss 0.00222929, throughput 2.16548K wps
[Epoch 34 Batch 1260/1540] avg loss 0.00230816, throughput 2.19122K wps
[Epoch 34 Batch 1290/1540] avg loss 0.00197573, throughput 2.18938K wps
[Epoch 34 Batch 1320/1540] avg loss 0.00250506, throughput 2.19269K wps
[Epoch 34 Batch 1350/1540] avg loss 0.00223283, throughput 2.19132K wps
[Epoch 34 Batch 1380/1540] avg loss 0.00207153, throughput 2.17585K wps
[Epoch 34 Batch 1410/1540] avg loss 0.00217095, throughput 2.18533K wps
[Epoch 34 Batch 1440/1540] avg loss 0.00200018, throughput 2.17817K wps
[Epoch 34 Batch 1470/1540] avg loss 0.0023898, throughput 2.18921K wps
[Epoch 34 Batch 1500/1540] avg loss 0.00219811, throughput 2.18618K wps
[Epoch 34 Batch 1530/1540] avg loss 0.00242223, throughput 2.19197K wps
Begin Testing...
[Epoch 34] train avg loss 0.00212394, dev acc 0.8406, dev avg loss 0.478443, throughput 2.1869K wps
[Epoch 35 Batch 30/1540] avg loss 0.00184251, throughput 2.2159K wps
[Epoch 35 Batch 60/1540] avg loss 0.0019594, throughput 2.18709K wps
[Epoch 35 Batch 90/1540] avg loss 0.0019117, throughput 2.19285K wps
[Epoch 35 Batch 120/1540] avg loss 0.0016239, throughput 2.19265K wps
[Epoch 35 Batch 150/1540] avg loss 0.00203002, throughput 2.18554K wps
[Epoch 35 Batch 180/1540] avg loss 0.00227011, throughput 2.18097K wps
[Epoch 35 Batch 210/1540] avg loss 0.00179606, throughput 2.19469K wps
[Epoch 35 Batch 240/1540] avg loss 0.0019722, throughput 2.18898K wps
[Epoch 35 Batch 270/1540] avg loss 0.00183653, throughput 2.17102K wps
[Epoch 35 Batch 300/1540] avg loss 0.00194947, throughput 2.19182K wps
[Epoch 35 Batch 330/1540] avg loss 0.00210901, throughput 2.19321K wps
[Epoch 35 Batch 360/1540] avg loss 0.00199635, throughput 2.19066K wps
[Epoch 35 Batch 390/1540] avg loss 0.00198602, throughput 2.16717K wps
[Epoch 35 Batch 420/1540] avg loss 0.0017541, throughput 2.18945K wps
[Epoch 35 Batch 450/1540] avg loss 0.00212688, throughput 2.17755K wps
[Epoch 35 Batch 480/1540] avg loss 0.00224336, throughput 2.18151K wps
[Epoch 35 Batch 510/1540] avg loss 0.00183276, throughput 2.1872K wps
[Epoch 35 Batch 540/1540] avg loss 0.00235916, throughput 2.19089K wps
[Epoch 35 Batch 570/1540] avg loss 0.00187815, throughput 2.18028K wps
[Epoch 35 Batch 600/1540] avg loss 0.00209392, throughput 2.18966K wps
[Epoch 35 Batch 630/1540] avg loss 0.00192548, throughput 2.1541K wps
[Epoch 35 Batch 660/1540] avg loss 0.0020196, throughput 2.14932K wps
[Epoch 35 Batch 690/1540] avg loss 0.00200196, throughput 2.17874K wps
[Epoch 35 Batch 720/1540] avg loss 0.0019063, throughput 2.19267K wps
[Epoch 35 Batch 750/1540] avg loss 0.00205574, throughput 2.18281K wps
[Epoch 35 Batch 780/1540] avg loss 0.00180801, throughput 2.18526K wps
[Epoch 35 Batch 810/1540] avg loss 0.00242262, throughput 2.18724K wps
[Epoch 35 Batch 840/1540] avg loss 0.00229961, throughput 2.19097K wps
[Epoch 35 Batch 870/1540] avg loss 0.00188517, throughput 2.19198K wps
[Epoch 35 Batch 900/1540] avg loss 0.0019206, throughput 2.19254K wps
[Epoch 35 Batch 930/1540] avg loss 0.00200574, throughput 2.18848K wps
[Epoch 35 Batch 960/1540] avg loss 0.00239307, throughput 2.19627K wps
[Epoch 35 Batch 990/1540] avg loss 0.00201333, throughput 2.19025K wps
[Epoch 35 Batch 1020/1540] avg loss 0.00219881, throughput 2.18041K wps
[Epoch 35 Batch 1050/1540] avg loss 0.00202108, throughput 2.18503K wps
[Epoch 35 Batch 1080/1540] avg loss 0.00179372, throughput 2.17382K wps
[Epoch 35 Batch 1110/1540] avg loss 0.00202653, throughput 2.18163K wps
[Epoch 35 Batch 1140/1540] avg loss 0.00217896, throughput 2.19243K wps
[Epoch 35 Batch 1170/1540] avg loss 0.00229415, throughput 2.19308K wps
[Epoch 35 Batch 1200/1540] avg loss 0.00203091, throughput 2.1805K wps
[Epoch 35 Batch 1230/1540] avg loss 0.00253105, throughput 2.19219K wps
[Epoch 35 Batch 1260/1540] avg loss 0.00245275, throughput 2.1975K wps
[Epoch 35 Batch 1290/1540] avg loss 0.00201064, throughput 2.16839K wps
[Epoch 35 Batch 1320/1540] avg loss 0.00197151, throughput 2.16038K wps
[Epoch 35 Batch 1350/1540] avg loss 0.002042, throughput 2.18421K wps
[Epoch 35 Batch 1380/1540] avg loss 0.00198066, throughput 2.19444K wps
[Epoch 35 Batch 1410/1540] avg loss 0.00219156, throughput 2.19194K wps
[Epoch 35 Batch 1440/1540] avg loss 0.00203193, throughput 2.18468K wps
[Epoch 35 Batch 1470/1540] avg loss 0.00206788, throughput 2.17836K wps
[Epoch 35 Batch 1500/1540] avg loss 0.00203305, throughput 2.18393K wps
[Epoch 35 Batch 1530/1540] avg loss 0.00220728, throughput 2.18845K wps
Begin Testing...
[Epoch 35] train avg loss 0.00204804, dev acc 0.8452, dev avg loss 0.483881, throughput 2.18503K wps
[Epoch 36 Batch 30/1540] avg loss 0.00183511, throughput 2.21924K wps
[Epoch 36 Batch 60/1540] avg loss 0.00175101, throughput 2.17473K wps
[Epoch 36 Batch 90/1540] avg loss 0.00201594, throughput 2.16178K wps
[Epoch 36 Batch 120/1540] avg loss 0.00192034, throughput 2.18666K wps
[Epoch 36 Batch 150/1540] avg loss 0.00192327, throughput 2.18748K wps
[Epoch 36 Batch 180/1540] avg loss 0.00181279, throughput 2.19137K wps
[Epoch 36 Batch 210/1540] avg loss 0.0017457, throughput 2.163K wps
[Epoch 36 Batch 240/1540] avg loss 0.00174743, throughput 2.17834K wps
[Epoch 36 Batch 270/1540] avg loss 0.00187647, throughput 2.19144K wps
[Epoch 36 Batch 300/1540] avg loss 0.00198538, throughput 2.19572K wps
[Epoch 36 Batch 330/1540] avg loss 0.0018088, throughput 2.15379K wps
[Epoch 36 Batch 360/1540] avg loss 0.00181941, throughput 2.16143K wps
[Epoch 36 Batch 390/1540] avg loss 0.0021698, throughput 2.16397K wps
[Epoch 36 Batch 420/1540] avg loss 0.00190263, throughput 2.19383K wps
[Epoch 36 Batch 450/1540] avg loss 0.00196905, throughput 2.19285K wps
[Epoch 36 Batch 480/1540] avg loss 0.00220195, throughput 2.18453K wps
[Epoch 36 Batch 510/1540] avg loss 0.00192922, throughput 2.18152K wps
[Epoch 36 Batch 540/1540] avg loss 0.00220085, throughput 2.18835K wps
[Epoch 36 Batch 570/1540] avg loss 0.0019264, throughput 2.16715K wps
[Epoch 36 Batch 600/1540] avg loss 0.00223048, throughput 2.17506K wps
[Epoch 36 Batch 630/1540] avg loss 0.00190508, throughput 2.156K wps
[Epoch 36 Batch 660/1540] avg loss 0.00213574, throughput 2.18921K wps
[Epoch 36 Batch 690/1540] avg loss 0.0019629, throughput 2.19204K wps
[Epoch 36 Batch 720/1540] avg loss 0.0021302, throughput 2.18972K wps
[Epoch 36 Batch 750/1540] avg loss 0.00210498, throughput 2.16061K wps
[Epoch 36 Batch 780/1540] avg loss 0.00201288, throughput 2.18592K wps
[Epoch 36 Batch 810/1540] avg loss 0.00207634, throughput 2.18836K wps
[Epoch 36 Batch 840/1540] avg loss 0.00222113, throughput 2.18649K wps
[Epoch 36 Batch 870/1540] avg loss 0.00199431, throughput 2.18644K wps
[Epoch 36 Batch 900/1540] avg loss 0.00187626, throughput 2.18304K wps
[Epoch 36 Batch 930/1540] avg loss 0.00195265, throughput 2.1882K wps
[Epoch 36 Batch 960/1540] avg loss 0.0021028, throughput 2.1932K wps
[Epoch 36 Batch 990/1540] avg loss 0.00202454, throughput 2.19179K wps
[Epoch 36 Batch 1020/1540] avg loss 0.00185831, throughput 2.19405K wps
[Epoch 36 Batch 1050/1540] avg loss 0.00241589, throughput 2.19368K wps
[Epoch 36 Batch 1080/1540] avg loss 0.00233343, throughput 2.19239K wps
[Epoch 36 Batch 1110/1540] avg loss 0.00208142, throughput 2.19102K wps
[Epoch 36 Batch 1140/1540] avg loss 0.001796, throughput 2.19491K wps
[Epoch 36 Batch 1170/1540] avg loss 0.00213985, throughput 2.18712K wps
[Epoch 36 Batch 1200/1540] avg loss 0.00194982, throughput 2.19447K wps
[Epoch 36 Batch 1230/1540] avg loss 0.00199137, throughput 2.19018K wps
[Epoch 36 Batch 1260/1540] avg loss 0.00223916, throughput 2.19321K wps
[Epoch 36 Batch 1290/1540] avg loss 0.00210017, throughput 2.18604K wps
[Epoch 36 Batch 1320/1540] avg loss 0.00201467, throughput 2.1864K wps
[Epoch 36 Batch 1350/1540] avg loss 0.00233233, throughput 2.18431K wps
[Epoch 36 Batch 1380/1540] avg loss 0.00259515, throughput 2.18896K wps
[Epoch 36 Batch 1410/1540] avg loss 0.00208827, throughput 2.19229K wps
[Epoch 36 Batch 1440/1540] avg loss 0.0022616, throughput 2.16026K wps
[Epoch 36 Batch 1470/1540] avg loss 0.00211245, throughput 2.16343K wps
[Epoch 36 Batch 1500/1540] avg loss 0.00201751, throughput 2.18876K wps
[Epoch 36 Batch 1530/1540] avg loss 0.00227791, throughput 2.18277K wps
Begin Testing...
[Epoch 36] train avg loss 0.00203574, dev acc 0.8406, dev avg loss 0.491481, throughput 2.18369K wps
[Epoch 37 Batch 30/1540] avg loss 0.00193322, throughput 2.2353K wps
[Epoch 37 Batch 60/1540] avg loss 0.00175067, throughput 2.19569K wps
[Epoch 37 Batch 90/1540] avg loss 0.00217288, throughput 2.19166K wps
[Epoch 37 Batch 120/1540] avg loss 0.00195354, throughput 2.19021K wps
[Epoch 37 Batch 150/1540] avg loss 0.00200245, throughput 2.18989K wps
[Epoch 37 Batch 180/1540] avg loss 0.00178281, throughput 2.1922K wps
[Epoch 37 Batch 210/1540] avg loss 0.00181862, throughput 2.1803K wps
[Epoch 37 Batch 240/1540] avg loss 0.00171228, throughput 2.17651K wps
[Epoch 37 Batch 270/1540] avg loss 0.00215561, throughput 2.16595K wps
[Epoch 37 Batch 300/1540] avg loss 0.00212337, throughput 2.16705K wps
[Epoch 37 Batch 330/1540] avg loss 0.00185448, throughput 2.17754K wps
[Epoch 37 Batch 360/1540] avg loss 0.00187509, throughput 2.19356K wps
[Epoch 37 Batch 390/1540] avg loss 0.00227444, throughput 2.1639K wps
[Epoch 37 Batch 420/1540] avg loss 0.00212484, throughput 2.18359K wps
[Epoch 37 Batch 450/1540] avg loss 0.00177584, throughput 2.18682K wps
[Epoch 37 Batch 480/1540] avg loss 0.00231857, throughput 2.18241K wps
[Epoch 37 Batch 510/1540] avg loss 0.00196585, throughput 2.17103K wps
[Epoch 37 Batch 540/1540] avg loss 0.00166761, throughput 2.16625K wps
[Epoch 37 Batch 570/1540] avg loss 0.00189714, throughput 2.16816K wps
[Epoch 37 Batch 600/1540] avg loss 0.00213877, throughput 2.19451K wps
[Epoch 37 Batch 630/1540] avg loss 0.001831, throughput 2.18498K wps
[Epoch 37 Batch 660/1540] avg loss 0.00180095, throughput 2.18912K wps
[Epoch 37 Batch 690/1540] avg loss 0.00226343, throughput 2.18482K wps
[Epoch 37 Batch 720/1540] avg loss 0.00169671, throughput 2.18517K wps
[Epoch 37 Batch 750/1540] avg loss 0.00183928, throughput 2.18849K wps
[Epoch 37 Batch 780/1540] avg loss 0.00213104, throughput 2.18974K wps
[Epoch 37 Batch 810/1540] avg loss 0.00191482, throughput 2.1673K wps
[Epoch 37 Batch 840/1540] avg loss 0.00196437, throughput 2.15869K wps
[Epoch 37 Batch 870/1540] avg loss 0.00207755, throughput 2.17171K wps
[Epoch 37 Batch 900/1540] avg loss 0.00196228, throughput 2.18856K wps
[Epoch 37 Batch 930/1540] avg loss 0.00177933, throughput 2.17499K wps
[Epoch 37 Batch 960/1540] avg loss 0.00195402, throughput 2.18721K wps
[Epoch 37 Batch 990/1540] avg loss 0.00165491, throughput 2.19248K wps
[Epoch 37 Batch 1020/1540] avg loss 0.0020478, throughput 2.18946K wps
[Epoch 37 Batch 1050/1540] avg loss 0.00189521, throughput 2.19239K wps
[Epoch 37 Batch 1080/1540] avg loss 0.00196959, throughput 2.18387K wps
[Epoch 37 Batch 1110/1540] avg loss 0.00230647, throughput 2.17816K wps
[Epoch 37 Batch 1140/1540] avg loss 0.00229559, throughput 2.18941K wps
[Epoch 37 Batch 1170/1540] avg loss 0.00192901, throughput 2.18028K wps
[Epoch 37 Batch 1200/1540] avg loss 0.00210088, throughput 2.16094K wps
[Epoch 37 Batch 1230/1540] avg loss 0.00206638, throughput 2.15969K wps
[Epoch 37 Batch 1260/1540] avg loss 0.00201093, throughput 2.1861K wps
[Epoch 37 Batch 1290/1540] avg loss 0.00204087, throughput 2.19024K wps
[Epoch 37 Batch 1320/1540] avg loss 0.00173837, throughput 2.18362K wps
[Epoch 37 Batch 1350/1540] avg loss 0.00190153, throughput 2.17439K wps
[Epoch 37 Batch 1380/1540] avg loss 0.00201736, throughput 2.19071K wps
[Epoch 37 Batch 1410/1540] avg loss 0.00225109, throughput 2.16883K wps
[Epoch 37 Batch 1440/1540] avg loss 0.00232009, throughput 2.17637K wps
[Epoch 37 Batch 1470/1540] avg loss 0.00204595, throughput 2.19292K wps
[Epoch 37 Batch 1500/1540] avg loss 0.0017626, throughput 2.17301K wps
[Epoch 37 Batch 1530/1540] avg loss 0.00203646, throughput 2.19024K wps
Begin Testing...
[Epoch 37] train avg loss 0.00197964, dev acc 0.8303, dev avg loss 0.497021, throughput 2.1823K wps
[Epoch 38 Batch 30/1540] avg loss 0.00181954, throughput 2.24043K wps
[Epoch 38 Batch 60/1540] avg loss 0.00183916, throughput 2.19136K wps
[Epoch 38 Batch 90/1540] avg loss 0.00184111, throughput 2.18759K wps
[Epoch 38 Batch 120/1540] avg loss 0.00178717, throughput 2.1896K wps
[Epoch 38 Batch 150/1540] avg loss 0.00211973, throughput 2.18849K wps
[Epoch 38 Batch 180/1540] avg loss 0.00185679, throughput 2.19371K wps
[Epoch 38 Batch 210/1540] avg loss 0.00174854, throughput 2.1876K wps
[Epoch 38 Batch 240/1540] avg loss 0.00216838, throughput 2.18426K wps
[Epoch 38 Batch 270/1540] avg loss 0.00164351, throughput 2.19368K wps
[Epoch 38 Batch 300/1540] avg loss 0.00171572, throughput 2.19517K wps
[Epoch 38 Batch 330/1540] avg loss 0.00174923, throughput 2.1883K wps
[Epoch 38 Batch 360/1540] avg loss 0.00184105, throughput 2.19377K wps
[Epoch 38 Batch 390/1540] avg loss 0.00163965, throughput 2.18858K wps
[Epoch 38 Batch 420/1540] avg loss 0.00190899, throughput 2.19762K wps
[Epoch 38 Batch 450/1540] avg loss 0.00203935, throughput 2.19665K wps
[Epoch 38 Batch 480/1540] avg loss 0.00235573, throughput 2.19028K wps
[Epoch 38 Batch 510/1540] avg loss 0.00203211, throughput 2.17871K wps
[Epoch 38 Batch 540/1540] avg loss 0.00202983, throughput 2.1925K wps
[Epoch 38 Batch 570/1540] avg loss 0.00171157, throughput 2.18655K wps
[Epoch 38 Batch 600/1540] avg loss 0.0019793, throughput 2.1888K wps
[Epoch 38 Batch 630/1540] avg loss 0.00194635, throughput 2.18611K wps
[Epoch 38 Batch 660/1540] avg loss 0.00210441, throughput 2.1886K wps
[Epoch 38 Batch 690/1540] avg loss 0.00186604, throughput 2.18275K wps
[Epoch 38 Batch 720/1540] avg loss 0.00197307, throughput 2.19173K wps
[Epoch 38 Batch 750/1540] avg loss 0.00181994, throughput 2.19264K wps
[Epoch 38 Batch 780/1540] avg loss 0.00206786, throughput 2.18559K wps
[Epoch 38 Batch 810/1540] avg loss 0.00218007, throughput 2.19003K wps
[Epoch 38 Batch 840/1540] avg loss 0.00204295, throughput 2.19013K wps
[Epoch 38 Batch 870/1540] avg loss 0.00209391, throughput 2.19219K wps
[Epoch 38 Batch 900/1540] avg loss 0.00198297, throughput 2.19419K wps
[Epoch 38 Batch 930/1540] avg loss 0.00183506, throughput 2.18572K wps
[Epoch 38 Batch 960/1540] avg loss 0.00223862, throughput 2.16403K wps
[Epoch 38 Batch 990/1540] avg loss 0.00194937, throughput 2.16383K wps
[Epoch 38 Batch 1020/1540] avg loss 0.0025427, throughput 2.18746K wps
[Epoch 38 Batch 1050/1540] avg loss 0.00259207, throughput 2.1827K wps
[Epoch 38 Batch 1080/1540] avg loss 0.00181189, throughput 2.19183K wps
[Epoch 38 Batch 1110/1540] avg loss 0.00202128, throughput 2.19526K wps
[Epoch 38 Batch 1140/1540] avg loss 0.00173239, throughput 2.19196K wps
[Epoch 38 Batch 1170/1540] avg loss 0.00199013, throughput 2.1926K wps
[Epoch 38 Batch 1200/1540] avg loss 0.00208191, throughput 2.1885K wps
[Epoch 38 Batch 1230/1540] avg loss 0.00221728, throughput 2.19226K wps
[Epoch 38 Batch 1260/1540] avg loss 0.00184685, throughput 2.19185K wps
[Epoch 38 Batch 1290/1540] avg loss 0.00172393, throughput 2.18739K wps
[Epoch 38 Batch 1320/1540] avg loss 0.00184216, throughput 2.19224K wps
[Epoch 38 Batch 1350/1540] avg loss 0.00204226, throughput 2.16496K wps
[Epoch 38 Batch 1380/1540] avg loss 0.00205519, throughput 2.17046K wps
[Epoch 38 Batch 1410/1540] avg loss 0.00195115, throughput 2.16521K wps
[Epoch 38 Batch 1440/1540] avg loss 0.00186726, throughput 2.17017K wps
[Epoch 38 Batch 1470/1540] avg loss 0.00177005, throughput 2.15877K wps
[Epoch 38 Batch 1500/1540] avg loss 0.00230703, throughput 2.18096K wps
[Epoch 38 Batch 1530/1540] avg loss 0.00185065, throughput 2.19019K wps
Begin Testing...
[Epoch 38] train avg loss 0.00196405, dev acc 0.8394, dev avg loss 0.495014, throughput 2.18738K wps
[Epoch 39 Batch 30/1540] avg loss 0.00148645, throughput 2.23939K wps
[Epoch 39 Batch 60/1540] avg loss 0.00187827, throughput 2.19051K wps
[Epoch 39 Batch 90/1540] avg loss 0.00187437, throughput 2.17051K wps
[Epoch 39 Batch 120/1540] avg loss 0.00193341, throughput 2.19273K wps
[Epoch 39 Batch 150/1540] avg loss 0.00169784, throughput 2.19358K wps
[Epoch 39 Batch 180/1540] avg loss 0.00184657, throughput 2.18746K wps
[Epoch 39 Batch 210/1540] avg loss 0.00186759, throughput 2.17491K wps
[Epoch 39 Batch 240/1540] avg loss 0.00149643, throughput 2.19249K wps
[Epoch 39 Batch 270/1540] avg loss 0.00181593, throughput 2.1922K wps
[Epoch 39 Batch 300/1540] avg loss 0.00221413, throughput 2.18651K wps
[Epoch 39 Batch 330/1540] avg loss 0.00190421, throughput 2.1754K wps
[Epoch 39 Batch 360/1540] avg loss 0.00195718, throughput 2.18464K wps
[Epoch 39 Batch 390/1540] avg loss 0.00195407, throughput 2.19262K wps
[Epoch 39 Batch 420/1540] avg loss 0.00159455, throughput 2.19134K wps
[Epoch 39 Batch 450/1540] avg loss 0.00209476, throughput 2.1865K wps
[Epoch 39 Batch 480/1540] avg loss 0.0017759, throughput 2.19309K wps
[Epoch 39 Batch 510/1540] avg loss 0.0021838, throughput 2.19459K wps
[Epoch 39 Batch 540/1540] avg loss 0.00193808, throughput 2.19185K wps
[Epoch 39 Batch 570/1540] avg loss 0.00189438, throughput 2.19365K wps
[Epoch 39 Batch 600/1540] avg loss 0.00192375, throughput 2.192K wps
[Epoch 39 Batch 630/1540] avg loss 0.00180799, throughput 2.19415K wps
[Epoch 39 Batch 660/1540] avg loss 0.00193609, throughput 2.1912K wps
[Epoch 39 Batch 690/1540] avg loss 0.00177261, throughput 2.18626K wps
[Epoch 39 Batch 720/1540] avg loss 0.00169121, throughput 2.19348K wps
[Epoch 39 Batch 750/1540] avg loss 0.00217043, throughput 2.18941K wps
[Epoch 39 Batch 780/1540] avg loss 0.00185857, throughput 2.1721K wps
[Epoch 39 Batch 810/1540] avg loss 0.00188382, throughput 2.17287K wps
[Epoch 39 Batch 840/1540] avg loss 0.001862, throughput 2.19033K wps
[Epoch 39 Batch 870/1540] avg loss 0.00197605, throughput 2.18639K wps
[Epoch 39 Batch 900/1540] avg loss 0.0019158, throughput 2.1907K wps
[Epoch 39 Batch 930/1540] avg loss 0.00176558, throughput 2.18908K wps
[Epoch 39 Batch 960/1540] avg loss 0.00185269, throughput 2.17742K wps
[Epoch 39 Batch 990/1540] avg loss 0.0023851, throughput 2.18589K wps
[Epoch 39 Batch 1020/1540] avg loss 0.0021818, throughput 2.19194K wps
[Epoch 39 Batch 1050/1540] avg loss 0.00179688, throughput 2.18827K wps
[Epoch 39 Batch 1080/1540] avg loss 0.00186492, throughput 2.19276K wps
[Epoch 39 Batch 1110/1540] avg loss 0.00178719, throughput 2.19372K wps
[Epoch 39 Batch 1140/1540] avg loss 0.00182084, throughput 2.17555K wps
[Epoch 39 Batch 1170/1540] avg loss 0.00177979, throughput 2.19347K wps
[Epoch 39 Batch 1200/1540] avg loss 0.00204587, throughput 2.19279K wps
[Epoch 39 Batch 1230/1540] avg loss 0.00208313, throughput 2.19234K wps
[Epoch 39 Batch 1260/1540] avg loss 0.00189742, throughput 2.18957K wps
[Epoch 39 Batch 1290/1540] avg loss 0.001827, throughput 2.18547K wps
[Epoch 39 Batch 1320/1540] avg loss 0.00231262, throughput 2.18781K wps
[Epoch 39 Batch 1350/1540] avg loss 0.00200173, throughput 2.1828K wps
[Epoch 39 Batch 1380/1540] avg loss 0.0021673, throughput 2.19165K wps
[Epoch 39 Batch 1410/1540] avg loss 0.00195078, throughput 2.15987K wps
[Epoch 39 Batch 1440/1540] avg loss 0.00173833, throughput 2.1587K wps
[Epoch 39 Batch 1470/1540] avg loss 0.00215564, throughput 2.18263K wps
[Epoch 39 Batch 1500/1540] avg loss 0.00208941, throughput 2.18057K wps
[Epoch 39 Batch 1530/1540] avg loss 0.00189553, throughput 2.19386K wps
Begin Testing...
[Epoch 39] train avg loss 0.00191247, dev acc 0.8406, dev avg loss 0.504275, throughput 2.18767K wps
[Epoch 40 Batch 30/1540] avg loss 0.00182235, throughput 2.2298K wps
[Epoch 40 Batch 60/1540] avg loss 0.00184257, throughput 2.19186K wps
[Epoch 40 Batch 90/1540] avg loss 0.00210745, throughput 2.18119K wps
[Epoch 40 Batch 120/1540] avg loss 0.00173128, throughput 2.17794K wps
[Epoch 40 Batch 150/1540] avg loss 0.0018136, throughput 2.16909K wps
[Epoch 40 Batch 180/1540] avg loss 0.00201401, throughput 2.18903K wps
[Epoch 40 Batch 210/1540] avg loss 0.00161469, throughput 2.18233K wps
[Epoch 40 Batch 240/1540] avg loss 0.00197709, throughput 2.19296K wps
[Epoch 40 Batch 270/1540] avg loss 0.00165822, throughput 2.18908K wps
[Epoch 40 Batch 300/1540] avg loss 0.00181145, throughput 2.17691K wps
[Epoch 40 Batch 330/1540] avg loss 0.00167568, throughput 2.18972K wps
[Epoch 40 Batch 360/1540] avg loss 0.00197475, throughput 2.19165K wps
[Epoch 40 Batch 390/1540] avg loss 0.00177157, throughput 2.18381K wps
[Epoch 40 Batch 420/1540] avg loss 0.0021082, throughput 2.1852K wps
[Epoch 40 Batch 450/1540] avg loss 0.00184281, throughput 2.18838K wps
[Epoch 40 Batch 480/1540] avg loss 0.00168447, throughput 2.19032K wps
[Epoch 40 Batch 510/1540] avg loss 0.00215542, throughput 2.1895K wps
[Epoch 40 Batch 540/1540] avg loss 0.00172913, throughput 2.17601K wps
[Epoch 40 Batch 570/1540] avg loss 0.00196748, throughput 2.18567K wps
[Epoch 40 Batch 600/1540] avg loss 0.00235119, throughput 2.18858K wps
[Epoch 40 Batch 630/1540] avg loss 0.00206145, throughput 2.16549K wps
[Epoch 40 Batch 660/1540] avg loss 0.00211048, throughput 2.16208K wps
[Epoch 40 Batch 690/1540] avg loss 0.00184553, throughput 2.181K wps
[Epoch 40 Batch 720/1540] avg loss 0.00211865, throughput 2.18062K wps
[Epoch 40 Batch 750/1540] avg loss 0.00159853, throughput 2.18248K wps
[Epoch 40 Batch 780/1540] avg loss 0.00163638, throughput 2.19257K wps
[Epoch 40 Batch 810/1540] avg loss 0.0018753, throughput 2.1837K wps
[Epoch 40 Batch 840/1540] avg loss 0.00170568, throughput 2.19113K wps
[Epoch 40 Batch 870/1540] avg loss 0.00176075, throughput 2.18788K wps
[Epoch 40 Batch 900/1540] avg loss 0.00163557, throughput 2.1903K wps
[Epoch 40 Batch 930/1540] avg loss 0.00184279, throughput 2.18141K wps
[Epoch 40 Batch 960/1540] avg loss 0.00173341, throughput 2.19052K wps
[Epoch 40 Batch 990/1540] avg loss 0.00180717, throughput 2.16704K wps
[Epoch 40 Batch 1020/1540] avg loss 0.00177694, throughput 2.17021K wps
[Epoch 40 Batch 1050/1540] avg loss 0.00178889, throughput 2.18995K wps
[Epoch 40 Batch 1080/1540] avg loss 0.00191246, throughput 2.19372K wps
[Epoch 40 Batch 1110/1540] avg loss 0.00188898, throughput 2.188K wps
[Epoch 40 Batch 1140/1540] avg loss 0.00214952, throughput 2.19162K wps
[Epoch 40 Batch 1170/1540] avg loss 0.00181294, throughput 2.17559K wps
[Epoch 40 Batch 1200/1540] avg loss 0.00220262, throughput 2.15151K wps
[Epoch 40 Batch 1230/1540] avg loss 0.00187127, throughput 2.16422K wps
[Epoch 40 Batch 1260/1540] avg loss 0.00187363, throughput 2.17598K wps
[Epoch 40 Batch 1290/1540] avg loss 0.00198451, throughput 2.16769K wps
[Epoch 40 Batch 1320/1540] avg loss 0.00169074, throughput 2.18984K wps
[Epoch 40 Batch 1350/1540] avg loss 0.00205723, throughput 2.18396K wps
[Epoch 40 Batch 1380/1540] avg loss 0.00198591, throughput 2.19106K wps
[Epoch 40 Batch 1410/1540] avg loss 0.00203835, throughput 2.1876K wps
[Epoch 40 Batch 1440/1540] avg loss 0.00167102, throughput 2.18639K wps
[Epoch 40 Batch 1470/1540] avg loss 0.00176112, throughput 2.19494K wps
[Epoch 40 Batch 1500/1540] avg loss 0.00190166, throughput 2.17637K wps
[Epoch 40 Batch 1530/1540] avg loss 0.0018815, throughput 2.166K wps
Begin Testing...
[Epoch 40] train avg loss 0.00188048, dev acc 0.8383, dev avg loss 0.518173, throughput 2.18328K wps
[Epoch 41 Batch 30/1540] avg loss 0.00154123, throughput 2.22454K wps
[Epoch 41 Batch 60/1540] avg loss 0.00174674, throughput 2.18922K wps
[Epoch 41 Batch 90/1540] avg loss 0.00171899, throughput 2.19272K wps
[Epoch 41 Batch 120/1540] avg loss 0.0018245, throughput 2.19002K wps
[Epoch 41 Batch 150/1540] avg loss 0.00167066, throughput 2.16605K wps
[Epoch 41 Batch 180/1540] avg loss 0.00158267, throughput 2.19313K wps
[Epoch 41 Batch 210/1540] avg loss 0.00196107, throughput 2.18515K wps
[Epoch 41 Batch 240/1540] avg loss 0.00168776, throughput 2.1857K wps
[Epoch 41 Batch 270/1540] avg loss 0.00184745, throughput 2.19341K wps
[Epoch 41 Batch 300/1540] avg loss 0.00187851, throughput 2.18381K wps
[Epoch 41 Batch 330/1540] avg loss 0.00206988, throughput 2.18794K wps
[Epoch 41 Batch 360/1540] avg loss 0.00176922, throughput 2.17738K wps
[Epoch 41 Batch 390/1540] avg loss 0.0018355, throughput 2.18378K wps
[Epoch 41 Batch 420/1540] avg loss 0.00175793, throughput 2.18484K wps
[Epoch 41 Batch 450/1540] avg loss 0.00191678, throughput 2.18918K wps
[Epoch 41 Batch 480/1540] avg loss 0.00161354, throughput 2.18362K wps
[Epoch 41 Batch 510/1540] avg loss 0.00184077, throughput 2.1883K wps
[Epoch 41 Batch 540/1540] avg loss 0.00183639, throughput 2.19036K wps
[Epoch 41 Batch 570/1540] avg loss 0.00180273, throughput 2.17921K wps
[Epoch 41 Batch 600/1540] avg loss 0.00166724, throughput 2.16521K wps
[Epoch 41 Batch 630/1540] avg loss 0.00189416, throughput 2.18947K wps
[Epoch 41 Batch 660/1540] avg loss 0.00192416, throughput 2.18457K wps
[Epoch 41 Batch 690/1540] avg loss 0.00183815, throughput 2.1928K wps
[Epoch 41 Batch 720/1540] avg loss 0.00198334, throughput 2.19037K wps
[Epoch 41 Batch 750/1540] avg loss 0.00198949, throughput 2.1927K wps
[Epoch 41 Batch 780/1540] avg loss 0.00171441, throughput 2.18787K wps
[Epoch 41 Batch 810/1540] avg loss 0.00208846, throughput 2.19226K wps
[Epoch 41 Batch 840/1540] avg loss 0.00177218, throughput 2.18808K wps
[Epoch 41 Batch 870/1540] avg loss 0.00199813, throughput 2.18354K wps
[Epoch 41 Batch 900/1540] avg loss 0.00179853, throughput 2.19155K wps
[Epoch 41 Batch 930/1540] avg loss 0.00181357, throughput 2.19522K wps
[Epoch 41 Batch 960/1540] avg loss 0.00223835, throughput 2.1908K wps
[Epoch 41 Batch 990/1540] avg loss 0.00185197, throughput 2.18822K wps
[Epoch 41 Batch 1020/1540] avg loss 0.00185084, throughput 2.18849K wps
[Epoch 41 Batch 1050/1540] avg loss 0.00214603, throughput 2.18915K wps
[Epoch 41 Batch 1080/1540] avg loss 0.0019522, throughput 2.19129K wps
[Epoch 41 Batch 1110/1540] avg loss 0.00185064, throughput 2.15861K wps
[Epoch 41 Batch 1140/1540] avg loss 0.00203855, throughput 2.1686K wps
[Epoch 41 Batch 1170/1540] avg loss 0.00208153, throughput 2.18437K wps
[Epoch 41 Batch 1200/1540] avg loss 0.00187715, throughput 2.18962K wps
[Epoch 41 Batch 1230/1540] avg loss 0.0020473, throughput 2.19025K wps
[Epoch 41 Batch 1260/1540] avg loss 0.00218646, throughput 2.17514K wps
[Epoch 41 Batch 1290/1540] avg loss 0.00163835, throughput 2.19109K wps
[Epoch 41 Batch 1320/1540] avg loss 0.00195265, throughput 2.17954K wps
[Epoch 41 Batch 1350/1540] avg loss 0.00191344, throughput 2.17872K wps
[Epoch 41 Batch 1380/1540] avg loss 0.00209467, throughput 2.18683K wps
[Epoch 41 Batch 1410/1540] avg loss 0.00203235, throughput 2.19333K wps
[Epoch 41 Batch 1440/1540] avg loss 0.00199995, throughput 2.1914K wps
[Epoch 41 Batch 1470/1540] avg loss 0.00179975, throughput 2.19531K wps
[Epoch 41 Batch 1500/1540] avg loss 0.00160778, throughput 2.18034K wps
[Epoch 41 Batch 1530/1540] avg loss 0.00185535, throughput 2.19004K wps
Begin Testing...
[Epoch 41] train avg loss 0.00187233, dev acc 0.8406, dev avg loss 0.514474, throughput 2.18675K wps
[Epoch 42 Batch 30/1540] avg loss 0.00185709, throughput 2.22285K wps
[Epoch 42 Batch 60/1540] avg loss 0.00162246, throughput 2.1669K wps
[Epoch 42 Batch 90/1540] avg loss 0.00175597, throughput 2.18638K wps
[Epoch 42 Batch 120/1540] avg loss 0.00163844, throughput 2.17382K wps
[Epoch 42 Batch 150/1540] avg loss 0.00181089, throughput 2.18986K wps
[Epoch 42 Batch 180/1540] avg loss 0.00190609, throughput 2.19115K wps
[Epoch 42 Batch 210/1540] avg loss 0.0017887, throughput 2.18881K wps
[Epoch 42 Batch 240/1540] avg loss 0.00179731, throughput 2.18477K wps
[Epoch 42 Batch 270/1540] avg loss 0.00158615, throughput 2.17426K wps
[Epoch 42 Batch 300/1540] avg loss 0.00206252, throughput 2.18708K wps
[Epoch 42 Batch 330/1540] avg loss 0.00172803, throughput 2.19266K wps
[Epoch 42 Batch 360/1540] avg loss 0.00195096, throughput 2.16544K wps
[Epoch 42 Batch 390/1540] avg loss 0.00174689, throughput 2.17534K wps
[Epoch 42 Batch 420/1540] avg loss 0.0018407, throughput 2.18402K wps
[Epoch 42 Batch 450/1540] avg loss 0.00177068, throughput 2.18821K wps
[Epoch 42 Batch 480/1540] avg loss 0.00190326, throughput 2.16043K wps
[Epoch 42 Batch 510/1540] avg loss 0.00179413, throughput 2.16363K wps
[Epoch 42 Batch 540/1540] avg loss 0.00166782, throughput 2.19012K wps
[Epoch 42 Batch 570/1540] avg loss 0.0016084, throughput 2.17973K wps
[Epoch 42 Batch 600/1540] avg loss 0.00180787, throughput 2.18888K wps
[Epoch 42 Batch 630/1540] avg loss 0.00203364, throughput 2.18529K wps
[Epoch 42 Batch 660/1540] avg loss 0.0021439, throughput 2.1723K wps
[Epoch 42 Batch 690/1540] avg loss 0.00174401, throughput 2.18116K wps
[Epoch 42 Batch 720/1540] avg loss 0.00186403, throughput 2.16739K wps
[Epoch 42 Batch 750/1540] avg loss 0.00212036, throughput 2.17791K wps
[Epoch 42 Batch 780/1540] avg loss 0.00181486, throughput 2.19506K wps
[Epoch 42 Batch 810/1540] avg loss 0.00197201, throughput 2.18934K wps
[Epoch 42 Batch 840/1540] avg loss 0.00170229, throughput 2.19172K wps
[Epoch 42 Batch 870/1540] avg loss 0.00172436, throughput 2.17167K wps
[Epoch 42 Batch 900/1540] avg loss 0.00184421, throughput 2.18789K wps
[Epoch 42 Batch 930/1540] avg loss 0.00165278, throughput 2.19074K wps
[Epoch 42 Batch 960/1540] avg loss 0.00171131, throughput 2.19102K wps
[Epoch 42 Batch 990/1540] avg loss 0.00191246, throughput 2.18938K wps
[Epoch 42 Batch 1020/1540] avg loss 0.00198794, throughput 2.19534K wps
[Epoch 42 Batch 1050/1540] avg loss 0.00172689, throughput 2.18233K wps
[Epoch 42 Batch 1080/1540] avg loss 0.00170288, throughput 2.16269K wps
[Epoch 42 Batch 1110/1540] avg loss 0.00198267, throughput 2.183K wps
[Epoch 42 Batch 1140/1540] avg loss 0.00166259, throughput 2.18885K wps
[Epoch 42 Batch 1170/1540] avg loss 0.00175132, throughput 2.18891K wps
[Epoch 42 Batch 1200/1540] avg loss 0.00188122, throughput 2.17347K wps
[Epoch 42 Batch 1230/1540] avg loss 0.00158908, throughput 2.16781K wps
[Epoch 42 Batch 1260/1540] avg loss 0.00190309, throughput 2.18974K wps
[Epoch 42 Batch 1290/1540] avg loss 0.00196522, throughput 2.19265K wps
[Epoch 42 Batch 1320/1540] avg loss 0.00194736, throughput 2.17981K wps
[Epoch 42 Batch 1350/1540] avg loss 0.00172513, throughput 2.17508K wps
[Epoch 42 Batch 1380/1540] avg loss 0.00199398, throughput 2.17214K wps
[Epoch 42 Batch 1410/1540] avg loss 0.00175225, throughput 2.18047K wps
[Epoch 42 Batch 1440/1540] avg loss 0.00185834, throughput 2.1946K wps
[Epoch 42 Batch 1470/1540] avg loss 0.00172583, throughput 2.19399K wps
[Epoch 42 Batch 1500/1540] avg loss 0.00169228, throughput 2.19453K wps
[Epoch 42 Batch 1530/1540] avg loss 0.00183881, throughput 2.18253K wps
Begin Testing...
[Epoch 42] train avg loss 0.00181396, dev acc 0.8429, dev avg loss 0.513382, throughput 2.18328K wps
[Epoch 43 Batch 30/1540] avg loss 0.00151625, throughput 2.22239K wps
[Epoch 43 Batch 60/1540] avg loss 0.00151736, throughput 2.19534K wps
[Epoch 43 Batch 90/1540] avg loss 0.00157074, throughput 2.19413K wps
[Epoch 43 Batch 120/1540] avg loss 0.00169357, throughput 2.18811K wps
[Epoch 43 Batch 150/1540] avg loss 0.00164012, throughput 2.19447K wps
[Epoch 43 Batch 180/1540] avg loss 0.00161098, throughput 2.18935K wps
[Epoch 43 Batch 210/1540] avg loss 0.00179422, throughput 2.18168K wps
[Epoch 43 Batch 240/1540] avg loss 0.00184918, throughput 2.18525K wps
[Epoch 43 Batch 270/1540] avg loss 0.00174032, throughput 2.19052K wps
[Epoch 43 Batch 300/1540] avg loss 0.001648, throughput 2.18928K wps
[Epoch 43 Batch 330/1540] avg loss 0.00177869, throughput 2.19422K wps
[Epoch 43 Batch 360/1540] avg loss 0.00164536, throughput 2.18643K wps
[Epoch 43 Batch 390/1540] avg loss 0.00167927, throughput 2.18813K wps
[Epoch 43 Batch 420/1540] avg loss 0.00175444, throughput 2.18804K wps
[Epoch 43 Batch 450/1540] avg loss 0.00156503, throughput 2.18477K wps
[Epoch 43 Batch 480/1540] avg loss 0.00194038, throughput 2.19275K wps
[Epoch 43 Batch 510/1540] avg loss 0.00145263, throughput 2.19243K wps
[Epoch 43 Batch 540/1540] avg loss 0.00212815, throughput 2.18724K wps
[Epoch 43 Batch 570/1540] avg loss 0.0016045, throughput 2.17323K wps
[Epoch 43 Batch 600/1540] avg loss 0.00172369, throughput 2.16742K wps
[Epoch 43 Batch 630/1540] avg loss 0.00168882, throughput 2.16761K wps
[Epoch 43 Batch 660/1540] avg loss 0.00178075, throughput 2.18856K wps
[Epoch 43 Batch 690/1540] avg loss 0.00162129, throughput 2.18632K wps
[Epoch 43 Batch 720/1540] avg loss 0.00156038, throughput 2.1865K wps
[Epoch 43 Batch 750/1540] avg loss 0.00182638, throughput 2.18201K wps
[Epoch 43 Batch 780/1540] avg loss 0.00197769, throughput 2.19404K wps
[Epoch 43 Batch 810/1540] avg loss 0.00182578, throughput 2.18869K wps
[Epoch 43 Batch 840/1540] avg loss 0.00178725, throughput 2.19333K wps
[Epoch 43 Batch 870/1540] avg loss 0.00164234, throughput 2.19001K wps
[Epoch 43 Batch 900/1540] avg loss 0.00177607, throughput 2.18391K wps
[Epoch 43 Batch 930/1540] avg loss 0.00200736, throughput 2.18939K wps
[Epoch 43 Batch 960/1540] avg loss 0.00176582, throughput 2.18927K wps
[Epoch 43 Batch 990/1540] avg loss 0.00182414, throughput 2.16419K wps
[Epoch 43 Batch 1020/1540] avg loss 0.00188651, throughput 2.18823K wps
[Epoch 43 Batch 1050/1540] avg loss 0.00158736, throughput 2.18374K wps
[Epoch 43 Batch 1080/1540] avg loss 0.00196514, throughput 2.1937K wps
[Epoch 43 Batch 1110/1540] avg loss 0.00207322, throughput 2.18587K wps
[Epoch 43 Batch 1140/1540] avg loss 0.00183107, throughput 2.1656K wps
[Epoch 43 Batch 1170/1540] avg loss 0.0019584, throughput 2.18778K wps
[Epoch 43 Batch 1200/1540] avg loss 0.00211424, throughput 2.19233K wps
[Epoch 43 Batch 1230/1540] avg loss 0.00178475, throughput 2.17222K wps
[Epoch 43 Batch 1260/1540] avg loss 0.00201689, throughput 2.19052K wps
[Epoch 43 Batch 1290/1540] avg loss 0.00185857, throughput 2.18147K wps
[Epoch 43 Batch 1320/1540] avg loss 0.0020765, throughput 2.18329K wps
[Epoch 43 Batch 1350/1540] avg loss 0.00167082, throughput 2.16689K wps
[Epoch 43 Batch 1380/1540] avg loss 0.00195694, throughput 2.18951K wps
[Epoch 43 Batch 1410/1540] avg loss 0.00165936, throughput 2.18736K wps
[Epoch 43 Batch 1440/1540] avg loss 0.001844, throughput 2.18598K wps
[Epoch 43 Batch 1470/1540] avg loss 0.00167452, throughput 2.19117K wps
[Epoch 43 Batch 1500/1540] avg loss 0.00192015, throughput 2.19194K wps
[Epoch 43 Batch 1530/1540] avg loss 0.0019265, throughput 2.18641K wps
Begin Testing...
[Epoch 43] train avg loss 0.00178236, dev acc 0.8303, dev avg loss 0.553109, throughput 2.18637K wps
[Epoch 44 Batch 30/1540] avg loss 0.00189117, throughput 2.22976K wps
[Epoch 44 Batch 60/1540] avg loss 0.00169367, throughput 2.18696K wps
[Epoch 44 Batch 90/1540] avg loss 0.00185893, throughput 2.18869K wps
[Epoch 44 Batch 120/1540] avg loss 0.00167986, throughput 2.19101K wps
[Epoch 44 Batch 150/1540] avg loss 0.00166978, throughput 2.19217K wps
[Epoch 44 Batch 180/1540] avg loss 0.00161201, throughput 2.19135K wps
[Epoch 44 Batch 210/1540] avg loss 0.00157121, throughput 2.1874K wps
[Epoch 44 Batch 240/1540] avg loss 0.00178074, throughput 2.18805K wps
[Epoch 44 Batch 270/1540] avg loss 0.00165894, throughput 2.1898K wps
[Epoch 44 Batch 300/1540] avg loss 0.00167303, throughput 2.18053K wps
[Epoch 44 Batch 330/1540] avg loss 0.00163338, throughput 2.18888K wps
[Epoch 44 Batch 360/1540] avg loss 0.00188011, throughput 2.19061K wps
[Epoch 44 Batch 390/1540] avg loss 0.00192869, throughput 2.1842K wps
[Epoch 44 Batch 420/1540] avg loss 0.00184023, throughput 2.18414K wps
[Epoch 44 Batch 450/1540] avg loss 0.00193319, throughput 2.18715K wps
[Epoch 44 Batch 480/1540] avg loss 0.00171022, throughput 2.18852K wps
[Epoch 44 Batch 510/1540] avg loss 0.00185423, throughput 2.18876K wps
[Epoch 44 Batch 540/1540] avg loss 0.00158693, throughput 2.19206K wps
[Epoch 44 Batch 570/1540] avg loss 0.00173859, throughput 2.18854K wps
[Epoch 44 Batch 600/1540] avg loss 0.00180147, throughput 2.19362K wps
[Epoch 44 Batch 630/1540] avg loss 0.00196435, throughput 2.18825K wps
[Epoch 44 Batch 660/1540] avg loss 0.00156412, throughput 2.18953K wps
[Epoch 44 Batch 690/1540] avg loss 0.0018067, throughput 2.18404K wps
[Epoch 44 Batch 720/1540] avg loss 0.0018928, throughput 2.17571K wps
[Epoch 44 Batch 750/1540] avg loss 0.00176424, throughput 2.18187K wps
[Epoch 44 Batch 780/1540] avg loss 0.00187114, throughput 2.17109K wps
[Epoch 44 Batch 810/1540] avg loss 0.00165547, throughput 2.18991K wps
[Epoch 44 Batch 840/1540] avg loss 0.0018416, throughput 2.19088K wps
[Epoch 44 Batch 870/1540] avg loss 0.00172183, throughput 2.16266K wps
[Epoch 44 Batch 900/1540] avg loss 0.00193168, throughput 2.18581K wps
[Epoch 44 Batch 930/1540] avg loss 0.00203929, throughput 2.1897K wps
[Epoch 44 Batch 960/1540] avg loss 0.00175096, throughput 2.17439K wps
[Epoch 44 Batch 990/1540] avg loss 0.00200011, throughput 2.17735K wps
[Epoch 44 Batch 1020/1540] avg loss 0.00146933, throughput 2.18088K wps
[Epoch 44 Batch 1050/1540] avg loss 0.00190363, throughput 2.19035K wps
[Epoch 44 Batch 1080/1540] avg loss 0.00164951, throughput 2.18669K wps
[Epoch 44 Batch 1110/1540] avg loss 0.00180495, throughput 2.1889K wps
[Epoch 44 Batch 1140/1540] avg loss 0.00157917, throughput 2.19031K wps
[Epoch 44 Batch 1170/1540] avg loss 0.00167149, throughput 2.19105K wps
[Epoch 44 Batch 1200/1540] avg loss 0.00207619, throughput 2.19465K wps
[Epoch 44 Batch 1230/1540] avg loss 0.00195613, throughput 2.18096K wps
[Epoch 44 Batch 1260/1540] avg loss 0.00156316, throughput 2.16374K wps
[Epoch 44 Batch 1290/1540] avg loss 0.00191696, throughput 2.19025K wps
[Epoch 44 Batch 1320/1540] avg loss 0.00146781, throughput 2.19665K wps
[Epoch 44 Batch 1350/1540] avg loss 0.00159671, throughput 2.1936K wps
[Epoch 44 Batch 1380/1540] avg loss 0.00186775, throughput 2.18717K wps
[Epoch 44 Batch 1410/1540] avg loss 0.0018703, throughput 2.19517K wps
[Epoch 44 Batch 1440/1540] avg loss 0.0019503, throughput 2.19543K wps
[Epoch 44 Batch 1470/1540] avg loss 0.00189789, throughput 2.18627K wps
[Epoch 44 Batch 1500/1540] avg loss 0.00171905, throughput 2.18398K wps
[Epoch 44 Batch 1530/1540] avg loss 0.00184507, throughput 2.19313K wps
Begin Testing...
[Epoch 44] train avg loss 0.00177759, dev acc 0.8440, dev avg loss 0.528029, throughput 2.18758K wps
[Epoch 45 Batch 30/1540] avg loss 0.00166525, throughput 2.22768K wps
[Epoch 45 Batch 60/1540] avg loss 0.00171524, throughput 2.18104K wps
[Epoch 45 Batch 90/1540] avg loss 0.00197264, throughput 2.17315K wps
[Epoch 45 Batch 120/1540] avg loss 0.00162984, throughput 2.18381K wps
[Epoch 45 Batch 150/1540] avg loss 0.00177455, throughput 2.19456K wps
[Epoch 45 Batch 180/1540] avg loss 0.00165116, throughput 2.1943K wps
[Epoch 45 Batch 210/1540] avg loss 0.00180047, throughput 2.18645K wps
[Epoch 45 Batch 240/1540] avg loss 0.0016049, throughput 2.19289K wps
[Epoch 45 Batch 270/1540] avg loss 0.00135612, throughput 2.18171K wps
[Epoch 45 Batch 300/1540] avg loss 0.00175825, throughput 2.18803K wps
[Epoch 45 Batch 330/1540] avg loss 0.00163468, throughput 2.19269K wps
[Epoch 45 Batch 360/1540] avg loss 0.00153459, throughput 2.19482K wps
[Epoch 45 Batch 390/1540] avg loss 0.0019309, throughput 2.19092K wps
[Epoch 45 Batch 420/1540] avg loss 0.00147915, throughput 2.19178K wps
[Epoch 45 Batch 450/1540] avg loss 0.00185125, throughput 2.1596K wps
[Epoch 45 Batch 480/1540] avg loss 0.00168526, throughput 2.18601K wps
[Epoch 45 Batch 510/1540] avg loss 0.00153377, throughput 2.19652K wps
[Epoch 45 Batch 540/1540] avg loss 0.00152505, throughput 2.19093K wps
[Epoch 45 Batch 570/1540] avg loss 0.00149619, throughput 2.18926K wps
[Epoch 45 Batch 600/1540] avg loss 0.00177623, throughput 2.19363K wps
[Epoch 45 Batch 630/1540] avg loss 0.00194928, throughput 2.18371K wps
[Epoch 45 Batch 660/1540] avg loss 0.00160308, throughput 2.19523K wps
[Epoch 45 Batch 690/1540] avg loss 0.0015537, throughput 2.18741K wps
[Epoch 45 Batch 720/1540] avg loss 0.00178943, throughput 2.18683K wps
[Epoch 45 Batch 750/1540] avg loss 0.00168061, throughput 2.19295K wps
[Epoch 45 Batch 780/1540] avg loss 0.00173838, throughput 2.16713K wps
[Epoch 45 Batch 810/1540] avg loss 0.00166077, throughput 2.18776K wps
[Epoch 45 Batch 840/1540] avg loss 0.00180915, throughput 2.19723K wps
[Epoch 45 Batch 870/1540] avg loss 0.0018703, throughput 2.18447K wps
[Epoch 45 Batch 900/1540] avg loss 0.00159245, throughput 2.19177K wps
[Epoch 45 Batch 930/1540] avg loss 0.00164852, throughput 2.15561K wps
[Epoch 45 Batch 960/1540] avg loss 0.00189684, throughput 2.17126K wps
[Epoch 45 Batch 990/1540] avg loss 0.00194307, throughput 2.1652K wps
[Epoch 45 Batch 1020/1540] avg loss 0.00190892, throughput 2.19096K wps
[Epoch 45 Batch 1050/1540] avg loss 0.00171937, throughput 2.18388K wps
[Epoch 45 Batch 1080/1540] avg loss 0.00174442, throughput 2.17943K wps
[Epoch 45 Batch 1110/1540] avg loss 0.00189527, throughput 2.17708K wps
[Epoch 45 Batch 1140/1540] avg loss 0.00170444, throughput 2.19177K wps
[Epoch 45 Batch 1170/1540] avg loss 0.00197747, throughput 2.18902K wps
[Epoch 45 Batch 1200/1540] avg loss 0.00186811, throughput 2.18999K wps
[Epoch 45 Batch 1230/1540] avg loss 0.00183105, throughput 2.19075K wps
[Epoch 45 Batch 1260/1540] avg loss 0.00205406, throughput 2.17728K wps
[Epoch 45 Batch 1290/1540] avg loss 0.00164549, throughput 2.19195K wps
[Epoch 45 Batch 1320/1540] avg loss 0.00192172, throughput 2.18444K wps
[Epoch 45 Batch 1350/1540] avg loss 0.00192799, throughput 2.19173K wps
[Epoch 45 Batch 1380/1540] avg loss 0.00166175, throughput 2.19012K wps
[Epoch 45 Batch 1410/1540] avg loss 0.00192208, throughput 2.17828K wps
[Epoch 45 Batch 1440/1540] avg loss 0.00175321, throughput 2.19252K wps
[Epoch 45 Batch 1470/1540] avg loss 0.00153697, throughput 2.19101K wps
[Epoch 45 Batch 1500/1540] avg loss 0.0017522, throughput 2.19551K wps
[Epoch 45 Batch 1530/1540] avg loss 0.00177421, throughput 2.17528K wps
Begin Testing...
[Epoch 45] train avg loss 0.00174064, dev acc 0.8383, dev avg loss 0.53397, throughput 2.18668K wps
[Epoch 46 Batch 30/1540] avg loss 0.00156638, throughput 2.24139K wps
[Epoch 46 Batch 60/1540] avg loss 0.00170015, throughput 2.1884K wps
[Epoch 46 Batch 90/1540] avg loss 0.00156873, throughput 2.19551K wps
[Epoch 46 Batch 120/1540] avg loss 0.0018152, throughput 2.19402K wps
[Epoch 46 Batch 150/1540] avg loss 0.0016414, throughput 2.18915K wps
[Epoch 46 Batch 180/1540] avg loss 0.00150392, throughput 2.19553K wps
[Epoch 46 Batch 210/1540] avg loss 0.00148953, throughput 2.1955K wps
[Epoch 46 Batch 240/1540] avg loss 0.00180058, throughput 2.16468K wps
[Epoch 46 Batch 270/1540] avg loss 0.00158092, throughput 2.1883K wps
[Epoch 46 Batch 300/1540] avg loss 0.00158257, throughput 2.178K wps
[Epoch 46 Batch 330/1540] avg loss 0.00175646, throughput 2.18887K wps
[Epoch 46 Batch 360/1540] avg loss 0.00172198, throughput 2.19351K wps
[Epoch 46 Batch 390/1540] avg loss 0.00169524, throughput 2.18316K wps
[Epoch 46 Batch 420/1540] avg loss 0.00203436, throughput 2.19299K wps
[Epoch 46 Batch 450/1540] avg loss 0.00165232, throughput 2.19115K wps
[Epoch 46 Batch 480/1540] avg loss 0.00174902, throughput 2.19082K wps
[Epoch 46 Batch 510/1540] avg loss 0.00180552, throughput 2.16147K wps
[Epoch 46 Batch 540/1540] avg loss 0.00174319, throughput 2.18388K wps
[Epoch 46 Batch 570/1540] avg loss 0.00174463, throughput 2.19262K wps
[Epoch 46 Batch 600/1540] avg loss 0.00157445, throughput 2.18463K wps
[Epoch 46 Batch 630/1540] avg loss 0.00176166, throughput 2.19071K wps
[Epoch 46 Batch 660/1540] avg loss 0.00153861, throughput 2.19088K wps
[Epoch 46 Batch 690/1540] avg loss 0.00178933, throughput 2.18809K wps
[Epoch 46 Batch 720/1540] avg loss 0.00170353, throughput 2.19142K wps
[Epoch 46 Batch 750/1540] avg loss 0.00172824, throughput 2.18338K wps
[Epoch 46 Batch 780/1540] avg loss 0.00162485, throughput 2.17092K wps
[Epoch 46 Batch 810/1540] avg loss 0.0017356, throughput 2.19268K wps
[Epoch 46 Batch 840/1540] avg loss 0.00193794, throughput 2.18512K wps
[Epoch 46 Batch 870/1540] avg loss 0.0011825, throughput 2.1863K wps
[Epoch 46 Batch 900/1540] avg loss 0.00148075, throughput 2.16791K wps
[Epoch 46 Batch 930/1540] avg loss 0.00161152, throughput 2.18622K wps
[Epoch 46 Batch 960/1540] avg loss 0.00168231, throughput 2.18204K wps
[Epoch 46 Batch 990/1540] avg loss 0.00170967, throughput 2.19365K wps
[Epoch 46 Batch 1020/1540] avg loss 0.00204462, throughput 2.1818K wps
[Epoch 46 Batch 1050/1540] avg loss 0.00188016, throughput 2.19276K wps
[Epoch 46 Batch 1080/1540] avg loss 0.00186136, throughput 2.19043K wps
[Epoch 46 Batch 1110/1540] avg loss 0.00167191, throughput 2.19051K wps
[Epoch 46 Batch 1140/1540] avg loss 0.00172835, throughput 2.19403K wps
[Epoch 46 Batch 1170/1540] avg loss 0.00180366, throughput 2.17041K wps
[Epoch 46 Batch 1200/1540] avg loss 0.00182303, throughput 2.1791K wps
[Epoch 46 Batch 1230/1540] avg loss 0.00195032, throughput 2.18724K wps
[Epoch 46 Batch 1260/1540] avg loss 0.00163941, throughput 2.18495K wps
[Epoch 46 Batch 1290/1540] avg loss 0.00180676, throughput 2.17223K wps
[Epoch 46 Batch 1320/1540] avg loss 0.00196241, throughput 2.18189K wps
[Epoch 46 Batch 1350/1540] avg loss 0.00145904, throughput 2.1696K wps
[Epoch 46 Batch 1380/1540] avg loss 0.00154432, throughput 2.19416K wps
[Epoch 46 Batch 1410/1540] avg loss 0.00174315, throughput 2.19332K wps
[Epoch 46 Batch 1440/1540] avg loss 0.00181521, throughput 2.18963K wps
[Epoch 46 Batch 1470/1540] avg loss 0.00196266, throughput 2.19212K wps
[Epoch 46 Batch 1500/1540] avg loss 0.00181476, throughput 2.18207K wps
[Epoch 46 Batch 1530/1540] avg loss 0.00231422, throughput 2.18551K wps
Begin Testing...
[Epoch 46] train avg loss 0.00172662, dev acc 0.8463, dev avg loss 0.533297, throughput 2.18678K wps
[Epoch 47 Batch 30/1540] avg loss 0.00161934, throughput 2.21309K wps
[Epoch 47 Batch 60/1540] avg loss 0.0014388, throughput 2.18846K wps
[Epoch 47 Batch 90/1540] avg loss 0.00181497, throughput 2.19061K wps
[Epoch 47 Batch 120/1540] avg loss 0.00159063, throughput 2.19003K wps
[Epoch 47 Batch 150/1540] avg loss 0.00168893, throughput 2.19042K wps
[Epoch 47 Batch 180/1540] avg loss 0.00155221, throughput 2.18993K wps
[Epoch 47 Batch 210/1540] avg loss 0.0016808, throughput 2.19215K wps
[Epoch 47 Batch 240/1540] avg loss 0.00163723, throughput 2.19322K wps
[Epoch 47 Batch 270/1540] avg loss 0.00166164, throughput 2.19083K wps
[Epoch 47 Batch 300/1540] avg loss 0.00157439, throughput 2.18332K wps
[Epoch 47 Batch 330/1540] avg loss 0.00185033, throughput 2.18336K wps
[Epoch 47 Batch 360/1540] avg loss 0.00140075, throughput 2.17559K wps
[Epoch 47 Batch 390/1540] avg loss 0.00162783, throughput 2.18114K wps
[Epoch 47 Batch 420/1540] avg loss 0.00165135, throughput 2.16799K wps
[Epoch 47 Batch 450/1540] avg loss 0.00147418, throughput 2.16222K wps
[Epoch 47 Batch 480/1540] avg loss 0.00179699, throughput 2.16798K wps
[Epoch 47 Batch 510/1540] avg loss 0.00192552, throughput 2.18929K wps
[Epoch 47 Batch 540/1540] avg loss 0.00169555, throughput 2.19611K wps
[Epoch 47 Batch 570/1540] avg loss 0.00149324, throughput 2.19394K wps
[Epoch 47 Batch 600/1540] avg loss 0.00162269, throughput 2.1892K wps
[Epoch 47 Batch 630/1540] avg loss 0.00173798, throughput 2.19013K wps
[Epoch 47 Batch 660/1540] avg loss 0.00164935, throughput 2.19197K wps
[Epoch 47 Batch 690/1540] avg loss 0.00166431, throughput 2.18503K wps
[Epoch 47 Batch 720/1540] avg loss 0.00182447, throughput 2.175K wps
[Epoch 47 Batch 750/1540] avg loss 0.00163628, throughput 2.19258K wps
[Epoch 47 Batch 780/1540] avg loss 0.00171747, throughput 2.16025K wps
[Epoch 47 Batch 810/1540] avg loss 0.00157961, throughput 2.17886K wps
[Epoch 47 Batch 840/1540] avg loss 0.00149306, throughput 2.18904K wps
[Epoch 47 Batch 870/1540] avg loss 0.00179935, throughput 2.19615K wps
[Epoch 47 Batch 900/1540] avg loss 0.00188109, throughput 2.18957K wps
[Epoch 47 Batch 930/1540] avg loss 0.00168741, throughput 2.18908K wps
[Epoch 47 Batch 960/1540] avg loss 0.00147825, throughput 2.1835K wps
[Epoch 47 Batch 990/1540] avg loss 0.00203233, throughput 2.17507K wps
[Epoch 47 Batch 1020/1540] avg loss 0.00147645, throughput 2.15687K wps
[Epoch 47 Batch 1050/1540] avg loss 0.00186279, throughput 2.16182K wps
[Epoch 47 Batch 1080/1540] avg loss 0.00153359, throughput 2.17994K wps
[Epoch 47 Batch 1110/1540] avg loss 0.00157937, throughput 2.19194K wps
[Epoch 47 Batch 1140/1540] avg loss 0.0022727, throughput 2.18262K wps
[Epoch 47 Batch 1170/1540] avg loss 0.0016188, throughput 2.19393K wps
[Epoch 47 Batch 1200/1540] avg loss 0.00133816, throughput 2.18746K wps
[Epoch 47 Batch 1230/1540] avg loss 0.00195441, throughput 2.19199K wps
[Epoch 47 Batch 1260/1540] avg loss 0.00181868, throughput 2.17743K wps
[Epoch 47 Batch 1290/1540] avg loss 0.00199618, throughput 2.17048K wps
[Epoch 47 Batch 1320/1540] avg loss 0.00152384, throughput 2.19016K wps
[Epoch 47 Batch 1350/1540] avg loss 0.00183538, throughput 2.19621K wps
[Epoch 47 Batch 1380/1540] avg loss 0.0016677, throughput 2.19412K wps
[Epoch 47 Batch 1410/1540] avg loss 0.00177665, throughput 2.18164K wps
[Epoch 47 Batch 1440/1540] avg loss 0.0017152, throughput 2.19153K wps
[Epoch 47 Batch 1470/1540] avg loss 0.00157604, throughput 2.18499K wps
[Epoch 47 Batch 1500/1540] avg loss 0.00172827, throughput 2.17146K wps
[Epoch 47 Batch 1530/1540] avg loss 0.00160839, throughput 2.18768K wps
Begin Testing...
[Epoch 47] train avg loss 0.00168863, dev acc 0.8303, dev avg loss 0.579637, throughput 2.18475K wps
[Epoch 48 Batch 30/1540] avg loss 0.00162225, throughput 2.21642K wps
[Epoch 48 Batch 60/1540] avg loss 0.00144884, throughput 2.18965K wps
[Epoch 48 Batch 90/1540] avg loss 0.001526, throughput 2.18342K wps
[Epoch 48 Batch 120/1540] avg loss 0.00159803, throughput 2.19388K wps
[Epoch 48 Batch 150/1540] avg loss 0.00158854, throughput 2.18966K wps
[Epoch 48 Batch 180/1540] avg loss 0.0013574, throughput 2.1931K wps
[Epoch 48 Batch 210/1540] avg loss 0.0016651, throughput 2.18584K wps
[Epoch 48 Batch 240/1540] avg loss 0.00148042, throughput 2.18631K wps
[Epoch 48 Batch 270/1540] avg loss 0.00150086, throughput 2.17906K wps
[Epoch 48 Batch 300/1540] avg loss 0.00156495, throughput 2.18299K wps
[Epoch 48 Batch 330/1540] avg loss 0.00156385, throughput 2.18839K wps
[Epoch 48 Batch 360/1540] avg loss 0.00184386, throughput 2.19056K wps
[Epoch 48 Batch 390/1540] avg loss 0.00163378, throughput 2.17763K wps
[Epoch 48 Batch 420/1540] avg loss 0.0014998, throughput 2.16092K wps
[Epoch 48 Batch 450/1540] avg loss 0.00168843, throughput 2.19121K wps
[Epoch 48 Batch 480/1540] avg loss 0.00158482, throughput 2.18622K wps
[Epoch 48 Batch 510/1540] avg loss 0.00190522, throughput 2.18072K wps
[Epoch 48 Batch 540/1540] avg loss 0.0017126, throughput 2.18178K wps
[Epoch 48 Batch 570/1540] avg loss 0.00154599, throughput 2.18287K wps
[Epoch 48 Batch 600/1540] avg loss 0.00181515, throughput 2.18981K wps
[Epoch 48 Batch 630/1540] avg loss 0.00167816, throughput 2.19169K wps
[Epoch 48 Batch 660/1540] avg loss 0.00168661, throughput 2.19218K wps
[Epoch 48 Batch 690/1540] avg loss 0.00153867, throughput 2.18904K wps
[Epoch 48 Batch 720/1540] avg loss 0.00194581, throughput 2.19131K wps
[Epoch 48 Batch 750/1540] avg loss 0.00161071, throughput 2.19417K wps
[Epoch 48 Batch 780/1540] avg loss 0.00161081, throughput 2.19749K wps
[Epoch 48 Batch 810/1540] avg loss 0.00158761, throughput 2.1924K wps
[Epoch 48 Batch 840/1540] avg loss 0.0016738, throughput 2.19076K wps
[Epoch 48 Batch 870/1540] avg loss 0.00178915, throughput 2.18665K wps
[Epoch 48 Batch 900/1540] avg loss 0.00176409, throughput 2.17762K wps
[Epoch 48 Batch 930/1540] avg loss 0.00162163, throughput 2.19184K wps
[Epoch 48 Batch 960/1540] avg loss 0.00199249, throughput 2.17888K wps
[Epoch 48 Batch 990/1540] avg loss 0.0016848, throughput 2.19115K wps
[Epoch 48 Batch 1020/1540] avg loss 0.00175533, throughput 2.18599K wps
[Epoch 48 Batch 1050/1540] avg loss 0.00184927, throughput 2.19131K wps
[Epoch 48 Batch 1080/1540] avg loss 0.00168767, throughput 2.19062K wps
[Epoch 48 Batch 1110/1540] avg loss 0.00177102, throughput 2.16309K wps
[Epoch 48 Batch 1140/1540] avg loss 0.00172621, throughput 2.1866K wps
[Epoch 48 Batch 1170/1540] avg loss 0.00165597, throughput 2.18031K wps
[Epoch 48 Batch 1200/1540] avg loss 0.00151597, throughput 2.17147K wps
[Epoch 48 Batch 1230/1540] avg loss 0.00172112, throughput 2.1725K wps
[Epoch 48 Batch 1260/1540] avg loss 0.00169032, throughput 2.16792K wps
[Epoch 48 Batch 1290/1540] avg loss 0.00149808, throughput 2.18568K wps
[Epoch 48 Batch 1320/1540] avg loss 0.00177054, throughput 2.18303K wps
[Epoch 48 Batch 1350/1540] avg loss 0.00176633, throughput 2.18937K wps
[Epoch 48 Batch 1380/1540] avg loss 0.0019597, throughput 2.18449K wps
[Epoch 48 Batch 1410/1540] avg loss 0.00154945, throughput 2.18412K wps
[Epoch 48 Batch 1440/1540] avg loss 0.00175548, throughput 2.17392K wps
[Epoch 48 Batch 1470/1540] avg loss 0.00165321, throughput 2.19471K wps
[Epoch 48 Batch 1500/1540] avg loss 0.00171333, throughput 2.18582K wps
[Epoch 48 Batch 1530/1540] avg loss 0.00153903, throughput 2.19145K wps
Begin Testing...
[Epoch 48] train avg loss 0.0016647, dev acc 0.8452, dev avg loss 0.538315, throughput 2.18591K wps
[Epoch 49 Batch 30/1540] avg loss 0.00170992, throughput 2.20408K wps
[Epoch 49 Batch 60/1540] avg loss 0.00125487, throughput 2.18119K wps
[Epoch 49 Batch 90/1540] avg loss 0.00153708, throughput 2.18126K wps
[Epoch 49 Batch 120/1540] avg loss 0.00147145, throughput 2.18358K wps
[Epoch 49 Batch 150/1540] avg loss 0.00159291, throughput 2.16268K wps
[Epoch 49 Batch 180/1540] avg loss 0.00162453, throughput 2.17732K wps
[Epoch 49 Batch 210/1540] avg loss 0.00143504, throughput 2.18655K wps
[Epoch 49 Batch 240/1540] avg loss 0.00171241, throughput 2.17922K wps
[Epoch 49 Batch 270/1540] avg loss 0.00156141, throughput 2.17878K wps
[Epoch 49 Batch 300/1540] avg loss 0.00162041, throughput 2.19425K wps
[Epoch 49 Batch 330/1540] avg loss 0.00156136, throughput 2.18927K wps
[Epoch 49 Batch 360/1540] avg loss 0.00162697, throughput 2.18851K wps
[Epoch 49 Batch 390/1540] avg loss 0.00169205, throughput 2.18781K wps
[Epoch 49 Batch 420/1540] avg loss 0.00177254, throughput 2.18694K wps
[Epoch 49 Batch 450/1540] avg loss 0.00155275, throughput 2.19371K wps
[Epoch 49 Batch 480/1540] avg loss 0.00165187, throughput 2.19077K wps
[Epoch 49 Batch 510/1540] avg loss 0.00136308, throughput 2.1791K wps
[Epoch 49 Batch 540/1540] avg loss 0.00153761, throughput 2.1881K wps
[Epoch 49 Batch 570/1540] avg loss 0.00176085, throughput 2.19339K wps
[Epoch 49 Batch 600/1540] avg loss 0.00194695, throughput 2.19338K wps
[Epoch 49 Batch 630/1540] avg loss 0.00188992, throughput 2.18485K wps
[Epoch 49 Batch 660/1540] avg loss 0.00189436, throughput 2.19154K wps
[Epoch 49 Batch 690/1540] avg loss 0.00147399, throughput 2.19402K wps
[Epoch 49 Batch 720/1540] avg loss 0.0019193, throughput 2.1934K wps
[Epoch 49 Batch 750/1540] avg loss 0.00176791, throughput 2.18787K wps
[Epoch 49 Batch 780/1540] avg loss 0.00157161, throughput 2.18382K wps
[Epoch 49 Batch 810/1540] avg loss 0.00166068, throughput 2.19062K wps
[Epoch 49 Batch 840/1540] avg loss 0.0017194, throughput 2.18938K wps
[Epoch 49 Batch 870/1540] avg loss 0.00154467, throughput 2.18537K wps
[Epoch 49 Batch 900/1540] avg loss 0.001593, throughput 2.17617K wps
[Epoch 49 Batch 930/1540] avg loss 0.00165251, throughput 2.19374K wps
[Epoch 49 Batch 960/1540] avg loss 0.00149426, throughput 2.19386K wps
[Epoch 49 Batch 990/1540] avg loss 0.00163042, throughput 2.19304K wps
[Epoch 49 Batch 1020/1540] avg loss 0.00155056, throughput 2.17431K wps
[Epoch 49 Batch 1050/1540] avg loss 0.00135777, throughput 2.17463K wps
[Epoch 49 Batch 1080/1540] avg loss 0.001598, throughput 2.18739K wps
[Epoch 49 Batch 1110/1540] avg loss 0.00167713, throughput 2.18404K wps
[Epoch 49 Batch 1140/1540] avg loss 0.00178368, throughput 2.1704K wps
[Epoch 49 Batch 1170/1540] avg loss 0.00203805, throughput 2.17459K wps
[Epoch 49 Batch 1200/1540] avg loss 0.00163591, throughput 2.1861K wps
[Epoch 49 Batch 1230/1540] avg loss 0.00152283, throughput 2.18913K wps
[Epoch 49 Batch 1260/1540] avg loss 0.00152369, throughput 2.1929K wps
[Epoch 49 Batch 1290/1540] avg loss 0.00173675, throughput 2.19249K wps
[Epoch 49 Batch 1320/1540] avg loss 0.00197788, throughput 2.18276K wps
[Epoch 49 Batch 1350/1540] avg loss 0.00185788, throughput 2.18693K wps
[Epoch 49 Batch 1380/1540] avg loss 0.0015721, throughput 2.19278K wps
[Epoch 49 Batch 1410/1540] avg loss 0.00158614, throughput 2.19101K wps
[Epoch 49 Batch 1440/1540] avg loss 0.00170862, throughput 2.18785K wps
[Epoch 49 Batch 1470/1540] avg loss 0.00171029, throughput 2.17706K wps
[Epoch 49 Batch 1500/1540] avg loss 0.00175973, throughput 2.19542K wps
[Epoch 49 Batch 1530/1540] avg loss 0.00181676, throughput 2.1938K wps
Begin Testing...
[Epoch 49] train avg loss 0.00164987, dev acc 0.8394, dev avg loss 0.538954, throughput 2.18638K wps
[Epoch 50 Batch 30/1540] avg loss 0.00167444, throughput 2.23936K wps
[Epoch 50 Batch 60/1540] avg loss 0.00170649, throughput 2.17653K wps
[Epoch 50 Batch 90/1540] avg loss 0.00173092, throughput 2.18409K wps
[Epoch 50 Batch 120/1540] avg loss 0.00123102, throughput 2.19193K wps
[Epoch 50 Batch 150/1540] avg loss 0.00152452, throughput 2.18051K wps
[Epoch 50 Batch 180/1540] avg loss 0.00162641, throughput 2.17098K wps
[Epoch 50 Batch 210/1540] avg loss 0.00148096, throughput 2.18465K wps
[Epoch 50 Batch 240/1540] avg loss 0.00167435, throughput 2.19199K wps
[Epoch 50 Batch 270/1540] avg loss 0.001621, throughput 2.15563K wps
[Epoch 50 Batch 300/1540] avg loss 0.00167748, throughput 2.19494K wps
[Epoch 50 Batch 330/1540] avg loss 0.0017951, throughput 2.19448K wps
[Epoch 50 Batch 360/1540] avg loss 0.00171536, throughput 2.17793K wps
[Epoch 50 Batch 390/1540] avg loss 0.00165527, throughput 2.19452K wps
[Epoch 50 Batch 420/1540] avg loss 0.0018014, throughput 2.19048K wps
[Epoch 50 Batch 450/1540] avg loss 0.00151715, throughput 2.18706K wps
[Epoch 50 Batch 480/1540] avg loss 0.00160472, throughput 2.1934K wps
[Epoch 50 Batch 510/1540] avg loss 0.00179601, throughput 2.19531K wps
[Epoch 50 Batch 540/1540] avg loss 0.00152311, throughput 2.19046K wps
[Epoch 50 Batch 570/1540] avg loss 0.0017358, throughput 2.1819K wps
[Epoch 50 Batch 600/1540] avg loss 0.00162641, throughput 2.17767K wps
[Epoch 50 Batch 630/1540] avg loss 0.00147034, throughput 2.18081K wps
[Epoch 50 Batch 660/1540] avg loss 0.00148225, throughput 2.17332K wps
[Epoch 50 Batch 690/1540] avg loss 0.00167436, throughput 2.18103K wps
[Epoch 50 Batch 720/1540] avg loss 0.00150363, throughput 2.18402K wps
[Epoch 50 Batch 750/1540] avg loss 0.00163407, throughput 2.19227K wps
[Epoch 50 Batch 780/1540] avg loss 0.00131087, throughput 2.18982K wps
[Epoch 50 Batch 810/1540] avg loss 0.00159923, throughput 2.16629K wps
[Epoch 50 Batch 840/1540] avg loss 0.00116318, throughput 2.18202K wps
[Epoch 50 Batch 870/1540] avg loss 0.00164136, throughput 2.18552K wps
[Epoch 50 Batch 900/1540] avg loss 0.00163892, throughput 2.19164K wps
[Epoch 50 Batch 930/1540] avg loss 0.00159182, throughput 2.19012K wps
[Epoch 50 Batch 960/1540] avg loss 0.00146607, throughput 2.18047K wps
[Epoch 50 Batch 990/1540] avg loss 0.00182891, throughput 2.19124K wps
[Epoch 50 Batch 1020/1540] avg loss 0.00181373, throughput 2.19023K wps
[Epoch 50 Batch 1050/1540] avg loss 0.0017099, throughput 2.19104K wps
[Epoch 50 Batch 1080/1540] avg loss 0.00174116, throughput 2.18194K wps
[Epoch 50 Batch 1110/1540] avg loss 0.00156507, throughput 2.19633K wps
[Epoch 50 Batch 1140/1540] avg loss 0.00154356, throughput 2.18813K wps
[Epoch 50 Batch 1170/1540] avg loss 0.00159834, throughput 2.19056K wps
[Epoch 50 Batch 1200/1540] avg loss 0.00159995, throughput 2.1952K wps
[Epoch 50 Batch 1230/1540] avg loss 0.00182071, throughput 2.19289K wps
[Epoch 50 Batch 1260/1540] avg loss 0.00183466, throughput 2.18625K wps
[Epoch 50 Batch 1290/1540] avg loss 0.00148689, throughput 2.18505K wps
[Epoch 50 Batch 1320/1540] avg loss 0.00168694, throughput 2.15439K wps
[Epoch 50 Batch 1350/1540] avg loss 0.00179012, throughput 2.18531K wps
[Epoch 50 Batch 1380/1540] avg loss 0.001806, throughput 2.18579K wps
[Epoch 50 Batch 1410/1540] avg loss 0.00185326, throughput 2.17936K wps
[Epoch 50 Batch 1440/1540] avg loss 0.00162673, throughput 2.16753K wps
[Epoch 50 Batch 1470/1540] avg loss 0.00191108, throughput 2.15796K wps
[Epoch 50 Batch 1500/1540] avg loss 0.00155304, throughput 2.16719K wps
[Epoch 50 Batch 1530/1540] avg loss 0.00198945, throughput 2.19284K wps
Begin Testing...
[Epoch 50] train avg loss 0.00164479, dev acc 0.8417, dev avg loss 0.553582, throughput 2.18496K wps
[Epoch 51 Batch 30/1540] avg loss 0.00151224, throughput 2.23724K wps
[Epoch 51 Batch 60/1540] avg loss 0.00122491, throughput 2.19134K wps
[Epoch 51 Batch 90/1540] avg loss 0.00144492, throughput 2.19584K wps
[Epoch 51 Batch 120/1540] avg loss 0.00168327, throughput 2.18455K wps
[Epoch 51 Batch 150/1540] avg loss 0.00143098, throughput 2.1869K wps
[Epoch 51 Batch 180/1540] avg loss 0.00152194, throughput 2.19213K wps
[Epoch 51 Batch 210/1540] avg loss 0.00142404, throughput 2.18307K wps
[Epoch 51 Batch 240/1540] avg loss 0.00176406, throughput 2.18755K wps
[Epoch 51 Batch 270/1540] avg loss 0.00136547, throughput 2.18719K wps
[Epoch 51 Batch 300/1540] avg loss 0.00172142, throughput 2.18908K wps
[Epoch 51 Batch 330/1540] avg loss 0.00151046, throughput 2.19096K wps
[Epoch 51 Batch 360/1540] avg loss 0.00165019, throughput 2.19369K wps
[Epoch 51 Batch 390/1540] avg loss 0.00156652, throughput 2.1787K wps
[Epoch 51 Batch 420/1540] avg loss 0.00147225, throughput 2.19034K wps
[Epoch 51 Batch 450/1540] avg loss 0.00146078, throughput 2.18566K wps
[Epoch 51 Batch 480/1540] avg loss 0.00158269, throughput 2.18594K wps
[Epoch 51 Batch 510/1540] avg loss 0.00164307, throughput 2.19121K wps
[Epoch 51 Batch 540/1540] avg loss 0.00173612, throughput 2.17486K wps
[Epoch 51 Batch 570/1540] avg loss 0.00165544, throughput 2.19121K wps
[Epoch 51 Batch 600/1540] avg loss 0.00150861, throughput 2.19076K wps
[Epoch 51 Batch 630/1540] avg loss 0.0016637, throughput 2.16624K wps
[Epoch 51 Batch 660/1540] avg loss 0.00174533, throughput 2.19277K wps
[Epoch 51 Batch 690/1540] avg loss 0.00186028, throughput 2.19276K wps
[Epoch 51 Batch 720/1540] avg loss 0.00162964, throughput 2.19581K wps
[Epoch 51 Batch 750/1540] avg loss 0.0013679, throughput 2.19351K wps
[Epoch 51 Batch 780/1540] avg loss 0.00164375, throughput 2.18098K wps
[Epoch 51 Batch 810/1540] avg loss 0.00186964, throughput 2.18742K wps
[Epoch 51 Batch 840/1540] avg loss 0.00145978, throughput 2.17442K wps
[Epoch 51 Batch 870/1540] avg loss 0.00162345, throughput 2.17679K wps
[Epoch 51 Batch 900/1540] avg loss 0.00164634, throughput 2.18531K wps
[Epoch 51 Batch 930/1540] avg loss 0.00158598, throughput 2.18167K wps
[Epoch 51 Batch 960/1540] avg loss 0.00155436, throughput 2.19543K wps
[Epoch 51 Batch 990/1540] avg loss 0.00188169, throughput 2.19479K wps
[Epoch 51 Batch 1020/1540] avg loss 0.00156253, throughput 2.18664K wps
[Epoch 51 Batch 1050/1540] avg loss 0.00160693, throughput 2.18407K wps
[Epoch 51 Batch 1080/1540] avg loss 0.00161377, throughput 2.19373K wps
[Epoch 51 Batch 1110/1540] avg loss 0.0018379, throughput 2.19436K wps
[Epoch 51 Batch 1140/1540] avg loss 0.00166043, throughput 2.19127K wps
[Epoch 51 Batch 1170/1540] avg loss 0.00146117, throughput 2.17841K wps
[Epoch 51 Batch 1200/1540] avg loss 0.00151359, throughput 2.17798K wps
[Epoch 51 Batch 1230/1540] avg loss 0.00164518, throughput 2.16718K wps
[Epoch 51 Batch 1260/1540] avg loss 0.00173181, throughput 2.17844K wps
[Epoch 51 Batch 1290/1540] avg loss 0.00170794, throughput 2.15022K wps
[Epoch 51 Batch 1320/1540] avg loss 0.00172202, throughput 2.15228K wps
[Epoch 51 Batch 1350/1540] avg loss 0.00174078, throughput 2.15112K wps
[Epoch 51 Batch 1380/1540] avg loss 0.00151391, throughput 2.1948K wps
[Epoch 51 Batch 1410/1540] avg loss 0.00156366, throughput 2.18701K wps
[Epoch 51 Batch 1440/1540] avg loss 0.0015678, throughput 2.17737K wps
[Epoch 51 Batch 1470/1540] avg loss 0.00150397, throughput 2.1846K wps
[Epoch 51 Batch 1500/1540] avg loss 0.00154705, throughput 2.17495K wps
[Epoch 51 Batch 1530/1540] avg loss 0.00168434, throughput 2.17901K wps
Begin Testing...
[Epoch 51] train avg loss 0.00159991, dev acc 0.8406, dev avg loss 0.558439, throughput 2.18494K wps
[Epoch 52 Batch 30/1540] avg loss 0.00163001, throughput 2.23148K wps
[Epoch 52 Batch 60/1540] avg loss 0.00146072, throughput 2.16193K wps
[Epoch 52 Batch 90/1540] avg loss 0.00120321, throughput 2.17446K wps
[Epoch 52 Batch 120/1540] avg loss 0.00154629, throughput 2.18883K wps
[Epoch 52 Batch 150/1540] avg loss 0.00150607, throughput 2.17007K wps
[Epoch 52 Batch 180/1540] avg loss 0.00147378, throughput 2.18828K wps
[Epoch 52 Batch 210/1540] avg loss 0.00154106, throughput 2.19066K wps
[Epoch 52 Batch 240/1540] avg loss 0.00155123, throughput 2.19129K wps
[Epoch 52 Batch 270/1540] avg loss 0.00142744, throughput 2.19365K wps
[Epoch 52 Batch 300/1540] avg loss 0.00146827, throughput 2.1841K wps
[Epoch 52 Batch 330/1540] avg loss 0.00145833, throughput 2.16682K wps
[Epoch 52 Batch 360/1540] avg loss 0.00171586, throughput 2.16154K wps
[Epoch 52 Batch 390/1540] avg loss 0.00165521, throughput 2.15497K wps
[Epoch 52 Batch 420/1540] avg loss 0.00150031, throughput 2.18408K wps
[Epoch 52 Batch 450/1540] avg loss 0.00184945, throughput 2.1773K wps
[Epoch 52 Batch 480/1540] avg loss 0.0017896, throughput 2.18742K wps
[Epoch 52 Batch 510/1540] avg loss 0.00170928, throughput 2.18878K wps
[Epoch 52 Batch 540/1540] avg loss 0.00131437, throughput 2.19335K wps
[Epoch 52 Batch 570/1540] avg loss 0.00157352, throughput 2.16769K wps
[Epoch 52 Batch 600/1540] avg loss 0.00170298, throughput 2.18109K wps
[Epoch 52 Batch 630/1540] avg loss 0.00148082, throughput 2.19547K wps
[Epoch 52 Batch 660/1540] avg loss 0.00177001, throughput 2.18313K wps
[Epoch 52 Batch 690/1540] avg loss 0.0015106, throughput 2.18416K wps
[Epoch 52 Batch 720/1540] avg loss 0.00167837, throughput 2.18121K wps
[Epoch 52 Batch 750/1540] avg loss 0.00174068, throughput 2.19287K wps
[Epoch 52 Batch 780/1540] avg loss 0.00151575, throughput 2.1848K wps
[Epoch 52 Batch 810/1540] avg loss 0.00152028, throughput 2.18705K wps
[Epoch 52 Batch 840/1540] avg loss 0.00174276, throughput 2.16871K wps
[Epoch 52 Batch 870/1540] avg loss 0.00162788, throughput 2.19886K wps
[Epoch 52 Batch 900/1540] avg loss 0.0016148, throughput 2.19357K wps
[Epoch 52 Batch 930/1540] avg loss 0.00122408, throughput 2.17872K wps
[Epoch 52 Batch 960/1540] avg loss 0.00150562, throughput 2.18505K wps
[Epoch 52 Batch 990/1540] avg loss 0.00154933, throughput 2.19351K wps
[Epoch 52 Batch 1020/1540] avg loss 0.00165943, throughput 2.1747K wps
[Epoch 52 Batch 1050/1540] avg loss 0.00149187, throughput 2.18251K wps
[Epoch 52 Batch 1080/1540] avg loss 0.00168387, throughput 2.1952K wps
[Epoch 52 Batch 1110/1540] avg loss 0.00175609, throughput 2.18736K wps
[Epoch 52 Batch 1140/1540] avg loss 0.00159084, throughput 2.16893K wps
[Epoch 52 Batch 1170/1540] avg loss 0.00163178, throughput 2.18166K wps
[Epoch 52 Batch 1200/1540] avg loss 0.00159749, throughput 2.17496K wps
[Epoch 52 Batch 1230/1540] avg loss 0.00186282, throughput 2.19056K wps
[Epoch 52 Batch 1260/1540] avg loss 0.00189963, throughput 2.1935K wps
[Epoch 52 Batch 1290/1540] avg loss 0.0014356, throughput 2.19465K wps
[Epoch 52 Batch 1320/1540] avg loss 0.00152132, throughput 2.17518K wps
[Epoch 52 Batch 1350/1540] avg loss 0.00139613, throughput 2.17115K wps
[Epoch 52 Batch 1380/1540] avg loss 0.00168473, throughput 2.17286K wps
[Epoch 52 Batch 1410/1540] avg loss 0.00160872, throughput 2.19393K wps
[Epoch 52 Batch 1440/1540] avg loss 0.00169795, throughput 2.18812K wps
[Epoch 52 Batch 1470/1540] avg loss 0.00152528, throughput 2.16954K wps
[Epoch 52 Batch 1500/1540] avg loss 0.00170158, throughput 2.18649K wps
[Epoch 52 Batch 1530/1540] avg loss 0.00168369, throughput 2.19558K wps
Begin Testing...
[Epoch 52] train avg loss 0.00158861, dev acc 0.8429, dev avg loss 0.558697, throughput 2.18358K wps
[Epoch 53 Batch 30/1540] avg loss 0.00144916, throughput 2.22628K wps
[Epoch 53 Batch 60/1540] avg loss 0.00142933, throughput 2.1859K wps
[Epoch 53 Batch 90/1540] avg loss 0.00141355, throughput 2.18106K wps
[Epoch 53 Batch 120/1540] avg loss 0.00139082, throughput 2.19379K wps
[Epoch 53 Batch 150/1540] avg loss 0.00157313, throughput 2.19182K wps
[Epoch 53 Batch 180/1540] avg loss 0.00162342, throughput 2.18999K wps
[Epoch 53 Batch 210/1540] avg loss 0.00163765, throughput 2.17205K wps
[Epoch 53 Batch 240/1540] avg loss 0.00159969, throughput 2.19084K wps
[Epoch 53 Batch 270/1540] avg loss 0.00174526, throughput 2.1682K wps
[Epoch 53 Batch 300/1540] avg loss 0.0015754, throughput 2.19064K wps
[Epoch 53 Batch 330/1540] avg loss 0.00157706, throughput 2.19352K wps
[Epoch 53 Batch 360/1540] avg loss 0.00170449, throughput 2.18953K wps
[Epoch 53 Batch 390/1540] avg loss 0.0016608, throughput 2.19277K wps
[Epoch 53 Batch 420/1540] avg loss 0.00146517, throughput 2.19349K wps
[Epoch 53 Batch 450/1540] avg loss 0.00143399, throughput 2.18782K wps
[Epoch 53 Batch 480/1540] avg loss 0.00193121, throughput 2.19311K wps
[Epoch 53 Batch 510/1540] avg loss 0.00161701, throughput 2.18608K wps
[Epoch 53 Batch 540/1540] avg loss 0.00145734, throughput 2.18559K wps
[Epoch 53 Batch 570/1540] avg loss 0.00179042, throughput 2.19235K wps
[Epoch 53 Batch 600/1540] avg loss 0.00148575, throughput 2.18512K wps
[Epoch 53 Batch 630/1540] avg loss 0.00132407, throughput 2.18945K wps
[Epoch 53 Batch 660/1540] avg loss 0.0016527, throughput 2.18939K wps
[Epoch 53 Batch 690/1540] avg loss 0.00154825, throughput 2.19723K wps
[Epoch 53 Batch 720/1540] avg loss 0.00169254, throughput 2.176K wps
[Epoch 53 Batch 750/1540] avg loss 0.00136756, throughput 2.16248K wps
[Epoch 53 Batch 780/1540] avg loss 0.00143053, throughput 2.18521K wps
[Epoch 53 Batch 810/1540] avg loss 0.00144567, throughput 2.18569K wps
[Epoch 53 Batch 840/1540] avg loss 0.00149939, throughput 2.18194K wps
[Epoch 53 Batch 870/1540] avg loss 0.00118114, throughput 2.18707K wps
[Epoch 53 Batch 900/1540] avg loss 0.00159771, throughput 2.19213K wps
[Epoch 53 Batch 930/1540] avg loss 0.00144292, throughput 2.18772K wps
[Epoch 53 Batch 960/1540] avg loss 0.00158238, throughput 2.15563K wps
[Epoch 53 Batch 990/1540] avg loss 0.00186271, throughput 2.17891K wps
[Epoch 53 Batch 1020/1540] avg loss 0.00155612, throughput 2.19159K wps
[Epoch 53 Batch 1050/1540] avg loss 0.00182706, throughput 2.18272K wps
[Epoch 53 Batch 1080/1540] avg loss 0.00126531, throughput 2.18041K wps
[Epoch 53 Batch 1110/1540] avg loss 0.00147134, throughput 2.1891K wps
[Epoch 53 Batch 1140/1540] avg loss 0.00162272, throughput 2.19099K wps
[Epoch 53 Batch 1170/1540] avg loss 0.00146558, throughput 2.19382K wps
[Epoch 53 Batch 1200/1540] avg loss 0.00145897, throughput 2.17837K wps
[Epoch 53 Batch 1230/1540] avg loss 0.00162283, throughput 2.18308K wps
[Epoch 53 Batch 1260/1540] avg loss 0.00170726, throughput 2.1883K wps
[Epoch 53 Batch 1290/1540] avg loss 0.00160797, throughput 2.19056K wps
[Epoch 53 Batch 1320/1540] avg loss 0.00163557, throughput 2.171K wps
[Epoch 53 Batch 1350/1540] avg loss 0.00158045, throughput 2.18021K wps
[Epoch 53 Batch 1380/1540] avg loss 0.00153034, throughput 2.18265K wps
[Epoch 53 Batch 1410/1540] avg loss 0.00178381, throughput 2.18799K wps
[Epoch 53 Batch 1440/1540] avg loss 0.00170705, throughput 2.19504K wps
[Epoch 53 Batch 1470/1540] avg loss 0.0014689, throughput 2.19143K wps
[Epoch 53 Batch 1500/1540] avg loss 0.00173871, throughput 2.1843K wps
[Epoch 53 Batch 1530/1540] avg loss 0.00152757, throughput 2.18145K wps
Begin Testing...
[Epoch 53] train avg loss 0.00156359, dev acc 0.8429, dev avg loss 0.552181, throughput 2.18639K wps
[Epoch 54 Batch 30/1540] avg loss 0.00148154, throughput 2.21016K wps
[Epoch 54 Batch 60/1540] avg loss 0.0014578, throughput 2.19289K wps
[Epoch 54 Batch 90/1540] avg loss 0.00142773, throughput 2.18425K wps
[Epoch 54 Batch 120/1540] avg loss 0.00134154, throughput 2.19024K wps
[Epoch 54 Batch 150/1540] avg loss 0.00164234, throughput 2.19106K wps
[Epoch 54 Batch 180/1540] avg loss 0.00144045, throughput 2.19187K wps
[Epoch 54 Batch 210/1540] avg loss 0.0013647, throughput 2.1898K wps
[Epoch 54 Batch 240/1540] avg loss 0.00141375, throughput 2.19384K wps
[Epoch 54 Batch 270/1540] avg loss 0.0011359, throughput 2.18955K wps
[Epoch 54 Batch 300/1540] avg loss 0.0017322, throughput 2.18096K wps
[Epoch 54 Batch 330/1540] avg loss 0.00141115, throughput 2.17962K wps
[Epoch 54 Batch 360/1540] avg loss 0.00178262, throughput 2.19248K wps
[Epoch 54 Batch 390/1540] avg loss 0.00143474, throughput 2.18287K wps
[Epoch 54 Batch 420/1540] avg loss 0.00151306, throughput 2.19201K wps
[Epoch 54 Batch 450/1540] avg loss 0.00135013, throughput 2.02671K wps
[Epoch 54 Batch 480/1540] avg loss 0.00181301, throughput 2.17445K wps
[Epoch 54 Batch 510/1540] avg loss 0.00156205, throughput 2.14835K wps
[Epoch 54 Batch 540/1540] avg loss 0.0014378, throughput 2.16887K wps
[Epoch 54 Batch 570/1540] avg loss 0.00145102, throughput 2.18801K wps
[Epoch 54 Batch 600/1540] avg loss 0.00146655, throughput 2.17958K wps
[Epoch 54 Batch 630/1540] avg loss 0.00154138, throughput 2.19524K wps
[Epoch 54 Batch 660/1540] avg loss 0.00167703, throughput 2.19037K wps
[Epoch 54 Batch 690/1540] avg loss 0.00160471, throughput 2.1695K wps
[Epoch 54 Batch 720/1540] avg loss 0.00161192, throughput 2.16164K wps
[Epoch 54 Batch 750/1540] avg loss 0.00172308, throughput 2.1876K wps
[Epoch 54 Batch 780/1540] avg loss 0.00129502, throughput 2.18423K wps
[Epoch 54 Batch 810/1540] avg loss 0.00145602, throughput 2.18619K wps
[Epoch 54 Batch 840/1540] avg loss 0.00139895, throughput 2.19215K wps
[Epoch 54 Batch 870/1540] avg loss 0.00148426, throughput 2.19343K wps
[Epoch 54 Batch 900/1540] avg loss 0.0017818, throughput 2.19009K wps
[Epoch 54 Batch 930/1540] avg loss 0.0016762, throughput 2.1888K wps
[Epoch 54 Batch 960/1540] avg loss 0.00183841, throughput 2.18378K wps
[Epoch 54 Batch 990/1540] avg loss 0.0016629, throughput 2.19074K wps
[Epoch 54 Batch 1020/1540] avg loss 0.00144558, throughput 2.19115K wps
[Epoch 54 Batch 1050/1540] avg loss 0.00156814, throughput 2.19187K wps
[Epoch 54 Batch 1080/1540] avg loss 0.0013866, throughput 2.18685K wps
[Epoch 54 Batch 1110/1540] avg loss 0.00165238, throughput 2.1928K wps
[Epoch 54 Batch 1140/1540] avg loss 0.00189977, throughput 2.19302K wps
[Epoch 54 Batch 1170/1540] avg loss 0.00151785, throughput 2.17408K wps
[Epoch 54 Batch 1200/1540] avg loss 0.00117403, throughput 2.18987K wps
[Epoch 54 Batch 1230/1540] avg loss 0.00160371, throughput 2.1886K wps
[Epoch 54 Batch 1260/1540] avg loss 0.00175492, throughput 2.19338K wps
[Epoch 54 Batch 1290/1540] avg loss 0.00175703, throughput 2.19592K wps
[Epoch 54 Batch 1320/1540] avg loss 0.00174704, throughput 2.18436K wps
[Epoch 54 Batch 1350/1540] avg loss 0.00162816, throughput 2.18857K wps
[Epoch 54 Batch 1380/1540] avg loss 0.00142155, throughput 2.19125K wps
[Epoch 54 Batch 1410/1540] avg loss 0.00143715, throughput 2.18454K wps
[Epoch 54 Batch 1440/1540] avg loss 0.00145368, throughput 2.19321K wps
[Epoch 54 Batch 1470/1540] avg loss 0.00188621, throughput 2.19138K wps
[Epoch 54 Batch 1500/1540] avg loss 0.00153319, throughput 2.1605K wps
[Epoch 54 Batch 1530/1540] avg loss 0.00168643, throughput 2.15779K wps
Begin Testing...
[Epoch 54] train avg loss 0.00154906, dev acc 0.8337, dev avg loss 0.563436, throughput 2.18241K wps
[Epoch 55 Batch 30/1540] avg loss 0.0013553, throughput 2.22802K wps
[Epoch 55 Batch 60/1540] avg loss 0.0015036, throughput 2.19266K wps
[Epoch 55 Batch 90/1540] avg loss 0.00147205, throughput 2.1956K wps
[Epoch 55 Batch 120/1540] avg loss 0.00140997, throughput 2.18755K wps
[Epoch 55 Batch 150/1540] avg loss 0.00150449, throughput 2.19035K wps
[Epoch 55 Batch 180/1540] avg loss 0.00151018, throughput 2.18584K wps
[Epoch 55 Batch 210/1540] avg loss 0.00135339, throughput 2.18824K wps
[Epoch 55 Batch 240/1540] avg loss 0.00117947, throughput 2.18732K wps
[Epoch 55 Batch 270/1540] avg loss 0.00132833, throughput 2.1782K wps
[Epoch 55 Batch 300/1540] avg loss 0.00152846, throughput 2.18947K wps
[Epoch 55 Batch 330/1540] avg loss 0.00159639, throughput 2.18542K wps
[Epoch 55 Batch 360/1540] avg loss 0.00126069, throughput 2.1659K wps
[Epoch 55 Batch 390/1540] avg loss 0.00162535, throughput 2.1795K wps
[Epoch 55 Batch 420/1540] avg loss 0.00143431, throughput 2.15517K wps
[Epoch 55 Batch 450/1540] avg loss 0.00164898, throughput 2.18572K wps
[Epoch 55 Batch 480/1540] avg loss 0.0013499, throughput 2.1731K wps
[Epoch 55 Batch 510/1540] avg loss 0.00160366, throughput 2.16971K wps
[Epoch 55 Batch 540/1540] avg loss 0.00135985, throughput 2.18435K wps
[Epoch 55 Batch 570/1540] avg loss 0.00146441, throughput 2.17441K wps
[Epoch 55 Batch 600/1540] avg loss 0.00184317, throughput 2.17755K wps
[Epoch 55 Batch 630/1540] avg loss 0.00161898, throughput 2.19288K wps
[Epoch 55 Batch 660/1540] avg loss 0.00146186, throughput 2.18936K wps
[Epoch 55 Batch 690/1540] avg loss 0.00171233, throughput 2.18716K wps
[Epoch 55 Batch 720/1540] avg loss 0.00153302, throughput 2.19282K wps
[Epoch 55 Batch 750/1540] avg loss 0.00194709, throughput 2.18772K wps
[Epoch 55 Batch 780/1540] avg loss 0.00171481, throughput 2.19303K wps
[Epoch 55 Batch 810/1540] avg loss 0.00149376, throughput 2.19262K wps
[Epoch 55 Batch 840/1540] avg loss 0.00149909, throughput 2.18961K wps
[Epoch 55 Batch 870/1540] avg loss 0.00144348, throughput 2.17153K wps
[Epoch 55 Batch 900/1540] avg loss 0.00142327, throughput 2.16314K wps
[Epoch 55 Batch 930/1540] avg loss 0.00154659, throughput 2.18363K wps
[Epoch 55 Batch 960/1540] avg loss 0.00164802, throughput 2.19326K wps
[Epoch 55 Batch 990/1540] avg loss 0.00187803, throughput 2.19017K wps
[Epoch 55 Batch 1020/1540] avg loss 0.00132359, throughput 2.1831K wps
[Epoch 55 Batch 1050/1540] avg loss 0.00164262, throughput 2.19102K wps
[Epoch 55 Batch 1080/1540] avg loss 0.00140248, throughput 2.17466K wps
[Epoch 55 Batch 1110/1540] avg loss 0.00153716, throughput 2.15493K wps
[Epoch 55 Batch 1140/1540] avg loss 0.00156885, throughput 2.17402K wps
[Epoch 55 Batch 1170/1540] avg loss 0.00157067, throughput 2.17365K wps
[Epoch 55 Batch 1200/1540] avg loss 0.00148789, throughput 2.15728K wps
[Epoch 55 Batch 1230/1540] avg loss 0.00148123, throughput 2.19374K wps
[Epoch 55 Batch 1260/1540] avg loss 0.00142688, throughput 2.19113K wps
[Epoch 55 Batch 1290/1540] avg loss 0.00171065, throughput 2.18846K wps
[Epoch 55 Batch 1320/1540] avg loss 0.00164282, throughput 2.18498K wps
[Epoch 55 Batch 1350/1540] avg loss 0.00162609, throughput 2.17774K wps
[Epoch 55 Batch 1380/1540] avg loss 0.00147073, throughput 2.19029K wps
[Epoch 55 Batch 1410/1540] avg loss 0.00157729, throughput 2.18261K wps
[Epoch 55 Batch 1440/1540] avg loss 0.00160834, throughput 2.15979K wps
[Epoch 55 Batch 1470/1540] avg loss 0.0013966, throughput 2.18909K wps
[Epoch 55 Batch 1500/1540] avg loss 0.0017658, throughput 2.18532K wps
[Epoch 55 Batch 1530/1540] avg loss 0.00160055, throughput 2.18896K wps
Begin Testing...
[Epoch 55] train avg loss 0.00153214, dev acc 0.8337, dev avg loss 0.568284, throughput 2.18314K wps
[Epoch 56 Batch 30/1540] avg loss 0.00139793, throughput 2.22028K wps
[Epoch 56 Batch 60/1540] avg loss 0.00105898, throughput 2.18759K wps
[Epoch 56 Batch 90/1540] avg loss 0.0012984, throughput 2.18844K wps
[Epoch 56 Batch 120/1540] avg loss 0.00153311, throughput 2.18891K wps
[Epoch 56 Batch 150/1540] avg loss 0.00122788, throughput 2.18979K wps
[Epoch 56 Batch 180/1540] avg loss 0.00139594, throughput 2.19377K wps
[Epoch 56 Batch 210/1540] avg loss 0.00157973, throughput 2.18926K wps
[Epoch 56 Batch 240/1540] avg loss 0.0015277, throughput 2.19334K wps
[Epoch 56 Batch 270/1540] avg loss 0.00144357, throughput 2.193K wps
[Epoch 56 Batch 300/1540] avg loss 0.00136467, throughput 2.18746K wps
[Epoch 56 Batch 330/1540] avg loss 0.00151727, throughput 2.18578K wps
[Epoch 56 Batch 360/1540] avg loss 0.00138838, throughput 2.18258K wps
[Epoch 56 Batch 390/1540] avg loss 0.00140199, throughput 2.18361K wps
[Epoch 56 Batch 420/1540] avg loss 0.00179209, throughput 2.18513K wps
[Epoch 56 Batch 450/1540] avg loss 0.00140211, throughput 2.18982K wps
[Epoch 56 Batch 480/1540] avg loss 0.00140201, throughput 2.19069K wps
[Epoch 56 Batch 510/1540] avg loss 0.00137529, throughput 2.18495K wps
[Epoch 56 Batch 540/1540] avg loss 0.00132489, throughput 2.17033K wps
[Epoch 56 Batch 570/1540] avg loss 0.00163672, throughput 2.19039K wps
[Epoch 56 Batch 600/1540] avg loss 0.00162638, throughput 2.19125K wps
[Epoch 56 Batch 630/1540] avg loss 0.0015255, throughput 2.19454K wps
[Epoch 56 Batch 660/1540] avg loss 0.00151181, throughput 2.18887K wps
[Epoch 56 Batch 690/1540] avg loss 0.00148193, throughput 2.18009K wps
[Epoch 56 Batch 720/1540] avg loss 0.00155116, throughput 2.19114K wps
[Epoch 56 Batch 750/1540] avg loss 0.00134272, throughput 2.19169K wps
[Epoch 56 Batch 780/1540] avg loss 0.00179005, throughput 2.1845K wps
[Epoch 56 Batch 810/1540] avg loss 0.0016152, throughput 2.19211K wps
[Epoch 56 Batch 840/1540] avg loss 0.0014797, throughput 2.19031K wps
[Epoch 56 Batch 870/1540] avg loss 0.00139385, throughput 2.17486K wps
[Epoch 56 Batch 900/1540] avg loss 0.00147846, throughput 2.1777K wps
[Epoch 56 Batch 930/1540] avg loss 0.00152683, throughput 2.19266K wps
[Epoch 56 Batch 960/1540] avg loss 0.00132204, throughput 2.18851K wps
[Epoch 56 Batch 990/1540] avg loss 0.00159055, throughput 2.18031K wps
[Epoch 56 Batch 1020/1540] avg loss 0.00176062, throughput 2.16501K wps
[Epoch 56 Batch 1050/1540] avg loss 0.00158268, throughput 2.17879K wps
[Epoch 56 Batch 1080/1540] avg loss 0.00177425, throughput 2.18143K wps
[Epoch 56 Batch 1110/1540] avg loss 0.00155713, throughput 2.16328K wps
[Epoch 56 Batch 1140/1540] avg loss 0.00161486, throughput 2.17248K wps
[Epoch 56 Batch 1170/1540] avg loss 0.0018184, throughput 2.17482K wps
[Epoch 56 Batch 1200/1540] avg loss 0.00138617, throughput 2.17921K wps
[Epoch 56 Batch 1230/1540] avg loss 0.00189816, throughput 2.18329K wps
[Epoch 56 Batch 1260/1540] avg loss 0.00166762, throughput 2.17298K wps
[Epoch 56 Batch 1290/1540] avg loss 0.00137203, throughput 2.18363K wps
[Epoch 56 Batch 1320/1540] avg loss 0.00148656, throughput 2.18887K wps
[Epoch 56 Batch 1350/1540] avg loss 0.00158057, throughput 2.17099K wps
[Epoch 56 Batch 1380/1540] avg loss 0.00169286, throughput 2.16623K wps
[Epoch 56 Batch 1410/1540] avg loss 0.00148538, throughput 2.17986K wps
[Epoch 56 Batch 1440/1540] avg loss 0.0016481, throughput 2.17051K wps
[Epoch 56 Batch 1470/1540] avg loss 0.00147843, throughput 2.18866K wps
[Epoch 56 Batch 1500/1540] avg loss 0.00153345, throughput 2.1805K wps
[Epoch 56 Batch 1530/1540] avg loss 0.00149093, throughput 2.19326K wps
Begin Testing...
[Epoch 56] train avg loss 0.00151223, dev acc 0.8406, dev avg loss 0.565467, throughput 2.18435K wps
[Epoch 57 Batch 30/1540] avg loss 0.00130026, throughput 2.22616K wps
[Epoch 57 Batch 60/1540] avg loss 0.00155741, throughput 2.19108K wps
[Epoch 57 Batch 90/1540] avg loss 0.00159775, throughput 2.19599K wps
[Epoch 57 Batch 120/1540] avg loss 0.00121983, throughput 2.19189K wps
[Epoch 57 Batch 150/1540] avg loss 0.00141059, throughput 2.16983K wps
[Epoch 57 Batch 180/1540] avg loss 0.00126658, throughput 2.16564K wps
[Epoch 57 Batch 210/1540] avg loss 0.00140025, throughput 2.18939K wps
[Epoch 57 Batch 240/1540] avg loss 0.00128384, throughput 2.17478K wps
[Epoch 57 Batch 270/1540] avg loss 0.00131885, throughput 2.18428K wps
[Epoch 57 Batch 300/1540] avg loss 0.00141075, throughput 2.1913K wps
[Epoch 57 Batch 330/1540] avg loss 0.00122654, throughput 2.19127K wps
[Epoch 57 Batch 360/1540] avg loss 0.00162345, throughput 2.18936K wps
[Epoch 57 Batch 390/1540] avg loss 0.00161415, throughput 2.1819K wps
[Epoch 57 Batch 420/1540] avg loss 0.0016824, throughput 2.18201K wps
[Epoch 57 Batch 450/1540] avg loss 0.00143024, throughput 2.19318K wps
[Epoch 57 Batch 480/1540] avg loss 0.00149898, throughput 2.18959K wps
[Epoch 57 Batch 510/1540] avg loss 0.00127926, throughput 2.18501K wps
[Epoch 57 Batch 540/1540] avg loss 0.00173264, throughput 2.17976K wps
[Epoch 57 Batch 570/1540] avg loss 0.00128578, throughput 2.18174K wps
[Epoch 57 Batch 600/1540] avg loss 0.00131833, throughput 2.19276K wps
[Epoch 57 Batch 630/1540] avg loss 0.00162069, throughput 2.18413K wps
[Epoch 57 Batch 660/1540] avg loss 0.00124031, throughput 2.19205K wps
[Epoch 57 Batch 690/1540] avg loss 0.00139367, throughput 2.1756K wps
[Epoch 57 Batch 720/1540] avg loss 0.00171574, throughput 2.16113K wps
[Epoch 57 Batch 750/1540] avg loss 0.00146688, throughput 2.18752K wps
[Epoch 57 Batch 780/1540] avg loss 0.00160555, throughput 2.19411K wps
[Epoch 57 Batch 810/1540] avg loss 0.00141959, throughput 2.18471K wps
[Epoch 57 Batch 840/1540] avg loss 0.00177312, throughput 2.1745K wps
[Epoch 57 Batch 870/1540] avg loss 0.00142279, throughput 2.1857K wps
[Epoch 57 Batch 900/1540] avg loss 0.00151239, throughput 2.18913K wps
[Epoch 57 Batch 930/1540] avg loss 0.00171026, throughput 2.18284K wps
[Epoch 57 Batch 960/1540] avg loss 0.00159878, throughput 2.18883K wps
[Epoch 57 Batch 990/1540] avg loss 0.00160419, throughput 2.18181K wps
[Epoch 57 Batch 1020/1540] avg loss 0.00177331, throughput 2.18213K wps
[Epoch 57 Batch 1050/1540] avg loss 0.00142218, throughput 2.18951K wps
[Epoch 57 Batch 1080/1540] avg loss 0.00151402, throughput 2.18419K wps
[Epoch 57 Batch 1110/1540] avg loss 0.00157139, throughput 2.16957K wps
[Epoch 57 Batch 1140/1540] avg loss 0.00134923, throughput 2.19195K wps
[Epoch 57 Batch 1170/1540] avg loss 0.00153768, throughput 2.18416K wps
[Epoch 57 Batch 1200/1540] avg loss 0.00152691, throughput 2.18419K wps
[Epoch 57 Batch 1230/1540] avg loss 0.00143319, throughput 2.18755K wps
[Epoch 57 Batch 1260/1540] avg loss 0.00151242, throughput 2.19043K wps
[Epoch 57 Batch 1290/1540] avg loss 0.00157327, throughput 2.18495K wps
[Epoch 57 Batch 1320/1540] avg loss 0.00178443, throughput 2.17968K wps
[Epoch 57 Batch 1350/1540] avg loss 0.00142055, throughput 2.18038K wps
[Epoch 57 Batch 1380/1540] avg loss 0.00133161, throughput 2.17429K wps
[Epoch 57 Batch 1410/1540] avg loss 0.00154764, throughput 2.17152K wps
[Epoch 57 Batch 1440/1540] avg loss 0.00140816, throughput 2.18237K wps
[Epoch 57 Batch 1470/1540] avg loss 0.00145791, throughput 2.18966K wps
[Epoch 57 Batch 1500/1540] avg loss 0.00171476, throughput 2.1933K wps
[Epoch 57 Batch 1530/1540] avg loss 0.00151826, throughput 2.19223K wps
Begin Testing...
[Epoch 57] train avg loss 0.00148598, dev acc 0.8360, dev avg loss 0.576955, throughput 2.18516K wps
[Epoch 58 Batch 30/1540] avg loss 0.00125155, throughput 2.22444K wps
[Epoch 58 Batch 60/1540] avg loss 0.00139424, throughput 2.17503K wps
[Epoch 58 Batch 90/1540] avg loss 0.00153993, throughput 2.17039K wps
[Epoch 58 Batch 120/1540] avg loss 0.00144111, throughput 2.17099K wps
[Epoch 58 Batch 150/1540] avg loss 0.00135848, throughput 2.17168K wps
[Epoch 58 Batch 180/1540] avg loss 0.00129084, throughput 2.17251K wps
[Epoch 58 Batch 210/1540] avg loss 0.00138335, throughput 2.16639K wps
[Epoch 58 Batch 240/1540] avg loss 0.00157552, throughput 2.18408K wps
[Epoch 58 Batch 270/1540] avg loss 0.00135819, throughput 2.18886K wps
[Epoch 58 Batch 300/1540] avg loss 0.0012714, throughput 2.174K wps
[Epoch 58 Batch 330/1540] avg loss 0.00141611, throughput 2.19084K wps
[Epoch 58 Batch 360/1540] avg loss 0.00144604, throughput 2.17537K wps
[Epoch 58 Batch 390/1540] avg loss 0.00128735, throughput 2.18185K wps
[Epoch 58 Batch 420/1540] avg loss 0.00170962, throughput 2.18521K wps
[Epoch 58 Batch 450/1540] avg loss 0.00164344, throughput 2.17925K wps
[Epoch 58 Batch 480/1540] avg loss 0.00143272, throughput 2.18425K wps
[Epoch 58 Batch 510/1540] avg loss 0.00166532, throughput 2.19073K wps
[Epoch 58 Batch 540/1540] avg loss 0.00157836, throughput 2.19551K wps
[Epoch 58 Batch 570/1540] avg loss 0.00142422, throughput 2.18795K wps
[Epoch 58 Batch 600/1540] avg loss 0.00150902, throughput 2.18975K wps
[Epoch 58 Batch 630/1540] avg loss 0.00141838, throughput 2.18819K wps
[Epoch 58 Batch 660/1540] avg loss 0.00146178, throughput 2.18397K wps
[Epoch 58 Batch 690/1540] avg loss 0.00124921, throughput 2.18266K wps
[Epoch 58 Batch 720/1540] avg loss 0.00171694, throughput 2.18333K wps
[Epoch 58 Batch 750/1540] avg loss 0.00160556, throughput 2.19191K wps
[Epoch 58 Batch 780/1540] avg loss 0.00134317, throughput 2.17931K wps
[Epoch 58 Batch 810/1540] avg loss 0.00138514, throughput 2.19448K wps
[Epoch 58 Batch 840/1540] avg loss 0.00105776, throughput 2.18391K wps
[Epoch 58 Batch 870/1540] avg loss 0.00137254, throughput 2.1926K wps
[Epoch 58 Batch 900/1540] avg loss 0.00137673, throughput 2.1891K wps
[Epoch 58 Batch 930/1540] avg loss 0.00141663, throughput 2.18899K wps
[Epoch 58 Batch 960/1540] avg loss 0.00145346, throughput 2.16038K wps
[Epoch 58 Batch 990/1540] avg loss 0.00173981, throughput 2.17742K wps
[Epoch 58 Batch 1020/1540] avg loss 0.001525, throughput 2.19011K wps
[Epoch 58 Batch 1050/1540] avg loss 0.00123961, throughput 2.19422K wps
[Epoch 58 Batch 1080/1540] avg loss 0.00160493, throughput 2.19236K wps
[Epoch 58 Batch 1110/1540] avg loss 0.00156316, throughput 2.1852K wps
[Epoch 58 Batch 1140/1540] avg loss 0.00134464, throughput 2.1757K wps
[Epoch 58 Batch 1170/1540] avg loss 0.00155206, throughput 2.16457K wps
[Epoch 58 Batch 1200/1540] avg loss 0.00159208, throughput 2.17463K wps
[Epoch 58 Batch 1230/1540] avg loss 0.00124505, throughput 2.18487K wps
[Epoch 58 Batch 1260/1540] avg loss 0.00160523, throughput 2.18902K wps
[Epoch 58 Batch 1290/1540] avg loss 0.00148673, throughput 2.15784K wps
[Epoch 58 Batch 1320/1540] avg loss 0.00145841, throughput 2.17557K wps
[Epoch 58 Batch 1350/1540] avg loss 0.00156469, throughput 2.19243K wps
[Epoch 58 Batch 1380/1540] avg loss 0.00164439, throughput 2.18848K wps
[Epoch 58 Batch 1410/1540] avg loss 0.00171629, throughput 2.18714K wps
[Epoch 58 Batch 1440/1540] avg loss 0.00148317, throughput 2.19405K wps
[Epoch 58 Batch 1470/1540] avg loss 0.00136794, throughput 2.18849K wps
[Epoch 58 Batch 1500/1540] avg loss 0.00183728, throughput 2.19277K wps
[Epoch 58 Batch 1530/1540] avg loss 0.00142446, throughput 2.18773K wps
Begin Testing...
[Epoch 58] train avg loss 0.00146827, dev acc 0.8337, dev avg loss 0.587409, throughput 2.18362K wps
[Epoch 59 Batch 30/1540] avg loss 0.00137486, throughput 2.2314K wps
[Epoch 59 Batch 60/1540] avg loss 0.00137089, throughput 2.17568K wps
[Epoch 59 Batch 90/1540] avg loss 0.00113543, throughput 2.18501K wps
[Epoch 59 Batch 120/1540] avg loss 0.00120284, throughput 2.1914K wps
[Epoch 59 Batch 150/1540] avg loss 0.00166409, throughput 2.19086K wps
[Epoch 59 Batch 180/1540] avg loss 0.00135768, throughput 2.19324K wps
[Epoch 59 Batch 210/1540] avg loss 0.00142685, throughput 2.18478K wps
[Epoch 59 Batch 240/1540] avg loss 0.00140119, throughput 2.17299K wps
[Epoch 59 Batch 270/1540] avg loss 0.00148236, throughput 2.16749K wps
[Epoch 59 Batch 300/1540] avg loss 0.00124351, throughput 2.1853K wps
[Epoch 59 Batch 330/1540] avg loss 0.00130732, throughput 2.18734K wps
[Epoch 59 Batch 360/1540] avg loss 0.00129833, throughput 2.17022K wps
[Epoch 59 Batch 390/1540] avg loss 0.00136403, throughput 2.19587K wps
[Epoch 59 Batch 420/1540] avg loss 0.00126736, throughput 2.18959K wps
[Epoch 59 Batch 450/1540] avg loss 0.00134263, throughput 2.18349K wps
[Epoch 59 Batch 480/1540] avg loss 0.00147698, throughput 2.19193K wps
[Epoch 59 Batch 510/1540] avg loss 0.00149183, throughput 2.19641K wps
[Epoch 59 Batch 540/1540] avg loss 0.0015814, throughput 2.17204K wps
[Epoch 59 Batch 570/1540] avg loss 0.00121583, throughput 2.18023K wps
[Epoch 59 Batch 600/1540] avg loss 0.00140612, throughput 2.18104K wps
[Epoch 59 Batch 630/1540] avg loss 0.00155401, throughput 2.18271K wps
[Epoch 59 Batch 660/1540] avg loss 0.00137237, throughput 2.18961K wps
[Epoch 59 Batch 690/1540] avg loss 0.00114506, throughput 2.18357K wps
[Epoch 59 Batch 720/1540] avg loss 0.00133438, throughput 2.18334K wps
[Epoch 59 Batch 750/1540] avg loss 0.00131661, throughput 2.18789K wps
[Epoch 59 Batch 780/1540] avg loss 0.00129866, throughput 2.18637K wps
[Epoch 59 Batch 810/1540] avg loss 0.00137054, throughput 2.18067K wps
[Epoch 59 Batch 840/1540] avg loss 0.00149297, throughput 2.17629K wps
[Epoch 59 Batch 870/1540] avg loss 0.00134625, throughput 2.17623K wps
[Epoch 59 Batch 900/1540] avg loss 0.00147396, throughput 2.19606K wps
[Epoch 59 Batch 930/1540] avg loss 0.00147698, throughput 2.1909K wps
[Epoch 59 Batch 960/1540] avg loss 0.00146627, throughput 2.1943K wps
[Epoch 59 Batch 990/1540] avg loss 0.00133301, throughput 2.19231K wps
[Epoch 59 Batch 1020/1540] avg loss 0.00158972, throughput 2.18671K wps
[Epoch 59 Batch 1050/1540] avg loss 0.00146681, throughput 2.18763K wps
[Epoch 59 Batch 1080/1540] avg loss 0.00154504, throughput 2.19349K wps
[Epoch 59 Batch 1110/1540] avg loss 0.00148962, throughput 2.18944K wps
[Epoch 59 Batch 1140/1540] avg loss 0.00141014, throughput 2.17257K wps
[Epoch 59 Batch 1170/1540] avg loss 0.0016094, throughput 2.1795K wps
[Epoch 59 Batch 1200/1540] avg loss 0.00111516, throughput 2.18955K wps
[Epoch 59 Batch 1230/1540] avg loss 0.00163484, throughput 2.19125K wps
[Epoch 59 Batch 1260/1540] avg loss 0.00167022, throughput 2.18905K wps
[Epoch 59 Batch 1290/1540] avg loss 0.00156983, throughput 2.18785K wps
[Epoch 59 Batch 1320/1540] avg loss 0.00159927, throughput 2.18995K wps
[Epoch 59 Batch 1350/1540] avg loss 0.00140337, throughput 2.18908K wps
[Epoch 59 Batch 1380/1540] avg loss 0.00172277, throughput 2.18773K wps
[Epoch 59 Batch 1410/1540] avg loss 0.00141185, throughput 2.19059K wps
[Epoch 59 Batch 1440/1540] avg loss 0.00136977, throughput 2.19011K wps
[Epoch 59 Batch 1470/1540] avg loss 0.00120752, throughput 2.18306K wps
[Epoch 59 Batch 1500/1540] avg loss 0.00151971, throughput 2.16182K wps
[Epoch 59 Batch 1530/1540] avg loss 0.00127344, throughput 2.17975K wps
Begin Testing...
[Epoch 59] train avg loss 0.00141104, dev acc 0.8394, dev avg loss 0.587684, throughput 2.18598K wps
[Epoch 60 Batch 30/1540] avg loss 0.00123252, throughput 2.23707K wps
[Epoch 60 Batch 60/1540] avg loss 0.0014045, throughput 2.18987K wps
[Epoch 60 Batch 90/1540] avg loss 0.00134496, throughput 2.19091K wps
[Epoch 60 Batch 120/1540] avg loss 0.00130565, throughput 2.19044K wps
[Epoch 60 Batch 150/1540] avg loss 0.00141708, throughput 2.19151K wps
[Epoch 60 Batch 180/1540] avg loss 0.00114221, throughput 2.18195K wps
[Epoch 60 Batch 210/1540] avg loss 0.00119243, throughput 2.18834K wps
[Epoch 60 Batch 240/1540] avg loss 0.00130552, throughput 2.18912K wps
[Epoch 60 Batch 270/1540] avg loss 0.00137356, throughput 2.19421K wps
[Epoch 60 Batch 300/1540] avg loss 0.00130389, throughput 2.17277K wps
[Epoch 60 Batch 330/1540] avg loss 0.00148544, throughput 2.19066K wps
[Epoch 60 Batch 360/1540] avg loss 0.00107069, throughput 2.19194K wps
[Epoch 60 Batch 390/1540] avg loss 0.0014153, throughput 2.19161K wps
[Epoch 60 Batch 420/1540] avg loss 0.0015711, throughput 2.19036K wps
[Epoch 60 Batch 450/1540] avg loss 0.00134868, throughput 2.19069K wps
[Epoch 60 Batch 480/1540] avg loss 0.00150295, throughput 2.19211K wps
[Epoch 60 Batch 510/1540] avg loss 0.00126648, throughput 2.18827K wps
[Epoch 60 Batch 540/1540] avg loss 0.00130562, throughput 2.16832K wps
[Epoch 60 Batch 570/1540] avg loss 0.00147206, throughput 2.17477K wps
[Epoch 60 Batch 600/1540] avg loss 0.00130294, throughput 2.19476K wps
[Epoch 60 Batch 630/1540] avg loss 0.00126055, throughput 2.18877K wps
[Epoch 60 Batch 660/1540] avg loss 0.00136313, throughput 2.18596K wps
[Epoch 60 Batch 690/1540] avg loss 0.0014941, throughput 2.19061K wps
[Epoch 60 Batch 720/1540] avg loss 0.00174774, throughput 2.18798K wps
[Epoch 60 Batch 750/1540] avg loss 0.00163699, throughput 2.18625K wps
[Epoch 60 Batch 780/1540] avg loss 0.00123467, throughput 2.18875K wps
[Epoch 60 Batch 810/1540] avg loss 0.00124412, throughput 2.19233K wps
[Epoch 60 Batch 840/1540] avg loss 0.00178268, throughput 2.17435K wps
[Epoch 60 Batch 870/1540] avg loss 0.00132789, throughput 2.19498K wps
[Epoch 60 Batch 900/1540] avg loss 0.00118809, throughput 2.1738K wps
[Epoch 60 Batch 930/1540] avg loss 0.00140715, throughput 2.16615K wps
[Epoch 60 Batch 960/1540] avg loss 0.00151348, throughput 2.19419K wps
[Epoch 60 Batch 990/1540] avg loss 0.0013939, throughput 2.16974K wps
[Epoch 60 Batch 1020/1540] avg loss 0.00126415, throughput 2.17269K wps
[Epoch 60 Batch 1050/1540] avg loss 0.0013665, throughput 2.17501K wps
[Epoch 60 Batch 1080/1540] avg loss 0.00161804, throughput 2.19376K wps
[Epoch 60 Batch 1110/1540] avg loss 0.00150992, throughput 2.18302K wps
[Epoch 60 Batch 1140/1540] avg loss 0.00190108, throughput 2.19241K wps
[Epoch 60 Batch 1170/1540] avg loss 0.00159729, throughput 2.19145K wps
[Epoch 60 Batch 1200/1540] avg loss 0.00163161, throughput 2.18248K wps
[Epoch 60 Batch 1230/1540] avg loss 0.00156838, throughput 2.18733K wps
[Epoch 60 Batch 1260/1540] avg loss 0.00166871, throughput 2.19193K wps
[Epoch 60 Batch 1290/1540] avg loss 0.00133517, throughput 2.1913K wps
[Epoch 60 Batch 1320/1540] avg loss 0.00142479, throughput 2.1917K wps
[Epoch 60 Batch 1350/1540] avg loss 0.00149641, throughput 2.19217K wps
[Epoch 60 Batch 1380/1540] avg loss 0.00133693, throughput 2.19064K wps
[Epoch 60 Batch 1410/1540] avg loss 0.00177346, throughput 2.18881K wps
[Epoch 60 Batch 1440/1540] avg loss 0.00136136, throughput 2.18923K wps
[Epoch 60 Batch 1470/1540] avg loss 0.00146916, throughput 2.19294K wps
[Epoch 60 Batch 1500/1540] avg loss 0.00140521, throughput 2.19046K wps
[Epoch 60 Batch 1530/1540] avg loss 0.00149496, throughput 2.193K wps
Begin Testing...
[Epoch 60] train avg loss 0.0014229, dev acc 0.8372, dev avg loss 0.589206, throughput 2.18795K wps
[Epoch 61 Batch 30/1540] avg loss 0.00126958, throughput 2.23192K wps
[Epoch 61 Batch 60/1540] avg loss 0.00143318, throughput 2.18479K wps
[Epoch 61 Batch 90/1540] avg loss 0.00148407, throughput 2.18452K wps
[Epoch 61 Batch 120/1540] avg loss 0.00130675, throughput 2.19246K wps
[Epoch 61 Batch 150/1540] avg loss 0.0013025, throughput 2.17493K wps
[Epoch 61 Batch 180/1540] avg loss 0.00147738, throughput 2.18288K wps
[Epoch 61 Batch 210/1540] avg loss 0.00145301, throughput 2.19442K wps
[Epoch 61 Batch 240/1540] avg loss 0.00140212, throughput 2.16086K wps
[Epoch 61 Batch 270/1540] avg loss 0.00133939, throughput 2.16172K wps
[Epoch 61 Batch 300/1540] avg loss 0.00143877, throughput 2.17928K wps
[Epoch 61 Batch 330/1540] avg loss 0.00152778, throughput 2.1899K wps
[Epoch 61 Batch 360/1540] avg loss 0.00142471, throughput 2.18595K wps
[Epoch 61 Batch 390/1540] avg loss 0.00138206, throughput 2.19078K wps
[Epoch 61 Batch 420/1540] avg loss 0.0012362, throughput 2.18968K wps
[Epoch 61 Batch 450/1540] avg loss 0.00149238, throughput 2.19172K wps
[Epoch 61 Batch 480/1540] avg loss 0.00121227, throughput 2.18786K wps
[Epoch 61 Batch 510/1540] avg loss 0.00157274, throughput 2.18902K wps
[Epoch 61 Batch 540/1540] avg loss 0.00138135, throughput 2.1881K wps
[Epoch 61 Batch 570/1540] avg loss 0.00153863, throughput 2.17312K wps
[Epoch 61 Batch 600/1540] avg loss 0.00126268, throughput 2.16452K wps
[Epoch 61 Batch 630/1540] avg loss 0.00142767, throughput 2.1862K wps
[Epoch 61 Batch 660/1540] avg loss 0.00140124, throughput 2.18544K wps
[Epoch 61 Batch 690/1540] avg loss 0.0014177, throughput 2.18358K wps
[Epoch 61 Batch 720/1540] avg loss 0.00161394, throughput 2.1943K wps
[Epoch 61 Batch 750/1540] avg loss 0.00158375, throughput 2.18338K wps
[Epoch 61 Batch 780/1540] avg loss 0.00147999, throughput 2.16674K wps
[Epoch 61 Batch 810/1540] avg loss 0.00127862, throughput 2.18365K wps
[Epoch 61 Batch 840/1540] avg loss 0.00135442, throughput 2.17847K wps
[Epoch 61 Batch 870/1540] avg loss 0.0013547, throughput 2.1914K wps
[Epoch 61 Batch 900/1540] avg loss 0.00135732, throughput 2.18716K wps
[Epoch 61 Batch 930/1540] avg loss 0.000974528, throughput 2.16728K wps
[Epoch 61 Batch 960/1540] avg loss 0.00137032, throughput 2.1824K wps
[Epoch 61 Batch 990/1540] avg loss 0.0012547, throughput 2.19013K wps
[Epoch 61 Batch 1020/1540] avg loss 0.00160704, throughput 2.18944K wps
[Epoch 61 Batch 1050/1540] avg loss 0.00150586, throughput 2.16377K wps
[Epoch 61 Batch 1080/1540] avg loss 0.0014616, throughput 2.16812K wps
[Epoch 61 Batch 1110/1540] avg loss 0.0013189, throughput 2.19103K wps
[Epoch 61 Batch 1140/1540] avg loss 0.0014824, throughput 2.18485K wps
[Epoch 61 Batch 1170/1540] avg loss 0.0014353, throughput 2.18979K wps
[Epoch 61 Batch 1200/1540] avg loss 0.00176138, throughput 2.1727K wps
[Epoch 61 Batch 1230/1540] avg loss 0.00156989, throughput 2.1929K wps
[Epoch 61 Batch 1260/1540] avg loss 0.00166058, throughput 2.18665K wps
[Epoch 61 Batch 1290/1540] avg loss 0.00155286, throughput 2.18983K wps
[Epoch 61 Batch 1320/1540] avg loss 0.00165786, throughput 2.18824K wps
[Epoch 61 Batch 1350/1540] avg loss 0.00173861, throughput 2.19138K wps
[Epoch 61 Batch 1380/1540] avg loss 0.00144569, throughput 2.19109K wps
[Epoch 61 Batch 1410/1540] avg loss 0.00159199, throughput 2.18696K wps
[Epoch 61 Batch 1440/1540] avg loss 0.00127907, throughput 2.18561K wps
[Epoch 61 Batch 1470/1540] avg loss 0.00128403, throughput 2.18658K wps
[Epoch 61 Batch 1500/1540] avg loss 0.00145077, throughput 2.16416K wps
[Epoch 61 Batch 1530/1540] avg loss 0.00159154, throughput 2.18024K wps
Begin Testing...
[Epoch 61] train avg loss 0.00143616, dev acc 0.8394, dev avg loss 0.586818, throughput 2.18375K wps
[Epoch 62 Batch 30/1540] avg loss 0.00140694, throughput 2.19651K wps
[Epoch 62 Batch 60/1540] avg loss 0.00113929, throughput 2.18521K wps
[Epoch 62 Batch 90/1540] avg loss 0.00136262, throughput 2.19302K wps
[Epoch 62 Batch 120/1540] avg loss 0.00131844, throughput 2.19072K wps
[Epoch 62 Batch 150/1540] avg loss 0.00146703, throughput 2.18954K wps
[Epoch 62 Batch 180/1540] avg loss 0.00137826, throughput 2.17551K wps
[Epoch 62 Batch 210/1540] avg loss 0.00131, throughput 2.19015K wps
[Epoch 62 Batch 240/1540] avg loss 0.00143528, throughput 2.17976K wps
[Epoch 62 Batch 270/1540] avg loss 0.00123528, throughput 2.19186K wps
[Epoch 62 Batch 300/1540] avg loss 0.00121194, throughput 2.19324K wps
[Epoch 62 Batch 330/1540] avg loss 0.00146072, throughput 2.16597K wps
[Epoch 62 Batch 360/1540] avg loss 0.00125519, throughput 2.18261K wps
[Epoch 62 Batch 390/1540] avg loss 0.00130308, throughput 2.18413K wps
[Epoch 62 Batch 420/1540] avg loss 0.00131881, throughput 2.19535K wps
[Epoch 62 Batch 450/1540] avg loss 0.00150911, throughput 2.19371K wps
[Epoch 62 Batch 480/1540] avg loss 0.00151738, throughput 2.19626K wps
[Epoch 62 Batch 510/1540] avg loss 0.00151146, throughput 2.16357K wps
[Epoch 62 Batch 540/1540] avg loss 0.00103422, throughput 2.18289K wps
[Epoch 62 Batch 570/1540] avg loss 0.00119184, throughput 2.18142K wps
[Epoch 62 Batch 600/1540] avg loss 0.00151646, throughput 2.18232K wps
[Epoch 62 Batch 630/1540] avg loss 0.00141972, throughput 2.19023K wps
[Epoch 62 Batch 660/1540] avg loss 0.0013186, throughput 2.19296K wps
[Epoch 62 Batch 690/1540] avg loss 0.00149414, throughput 2.17533K wps
[Epoch 62 Batch 720/1540] avg loss 0.00117223, throughput 2.16625K wps
[Epoch 62 Batch 750/1540] avg loss 0.0014402, throughput 2.18227K wps
[Epoch 62 Batch 780/1540] avg loss 0.00143769, throughput 2.18461K wps
[Epoch 62 Batch 810/1540] avg loss 0.00136392, throughput 2.18336K wps
[Epoch 62 Batch 840/1540] avg loss 0.0015051, throughput 2.19797K wps
[Epoch 62 Batch 870/1540] avg loss 0.00138545, throughput 2.18813K wps
[Epoch 62 Batch 900/1540] avg loss 0.0013918, throughput 2.18817K wps
[Epoch 62 Batch 930/1540] avg loss 0.0014246, throughput 2.17622K wps
[Epoch 62 Batch 960/1540] avg loss 0.00171542, throughput 2.16447K wps
[Epoch 62 Batch 990/1540] avg loss 0.00139127, throughput 2.18856K wps
[Epoch 62 Batch 1020/1540] avg loss 0.00131706, throughput 2.19118K wps
[Epoch 62 Batch 1050/1540] avg loss 0.00153691, throughput 2.18784K wps
[Epoch 62 Batch 1080/1540] avg loss 0.00126966, throughput 2.18664K wps
[Epoch 62 Batch 1110/1540] avg loss 0.00161965, throughput 2.18669K wps
[Epoch 62 Batch 1140/1540] avg loss 0.00161447, throughput 2.18896K wps
[Epoch 62 Batch 1170/1540] avg loss 0.00142242, throughput 2.18975K wps
[Epoch 62 Batch 1200/1540] avg loss 0.00146849, throughput 2.19094K wps
[Epoch 62 Batch 1230/1540] avg loss 0.00122656, throughput 2.18981K wps
[Epoch 62 Batch 1260/1540] avg loss 0.00155317, throughput 2.15931K wps
[Epoch 62 Batch 1290/1540] avg loss 0.0015522, throughput 2.16945K wps
[Epoch 62 Batch 1320/1540] avg loss 0.00153819, throughput 2.19591K wps
[Epoch 62 Batch 1350/1540] avg loss 0.00140224, throughput 2.18884K wps
[Epoch 62 Batch 1380/1540] avg loss 0.00145892, throughput 2.18247K wps
[Epoch 62 Batch 1410/1540] avg loss 0.00119299, throughput 2.18776K wps
[Epoch 62 Batch 1440/1540] avg loss 0.00136426, throughput 2.18994K wps
[Epoch 62 Batch 1470/1540] avg loss 0.0012445, throughput 2.18944K wps
[Epoch 62 Batch 1500/1540] avg loss 0.00156801, throughput 2.18248K wps
[Epoch 62 Batch 1530/1540] avg loss 0.00127833, throughput 2.18595K wps
Begin Testing...
[Epoch 62] train avg loss 0.0013913, dev acc 0.8383, dev avg loss 0.594435, throughput 2.18509K wps
[Epoch 63 Batch 30/1540] avg loss 0.00142385, throughput 2.23584K wps
[Epoch 63 Batch 60/1540] avg loss 0.00117814, throughput 2.19193K wps
[Epoch 63 Batch 90/1540] avg loss 0.00111413, throughput 2.18856K wps
[Epoch 63 Batch 120/1540] avg loss 0.00115534, throughput 2.18699K wps
[Epoch 63 Batch 150/1540] avg loss 0.00108271, throughput 2.19264K wps
[Epoch 63 Batch 180/1540] avg loss 0.00123856, throughput 2.18933K wps
[Epoch 63 Batch 210/1540] avg loss 0.00134571, throughput 2.16896K wps
[Epoch 63 Batch 240/1540] avg loss 0.00125733, throughput 2.17229K wps
[Epoch 63 Batch 270/1540] avg loss 0.00119602, throughput 2.17824K wps
[Epoch 63 Batch 300/1540] avg loss 0.00128641, throughput 2.18601K wps
[Epoch 63 Batch 330/1540] avg loss 0.00110705, throughput 2.1884K wps
[Epoch 63 Batch 360/1540] avg loss 0.00157485, throughput 2.16611K wps
[Epoch 63 Batch 390/1540] avg loss 0.00150642, throughput 2.18266K wps
[Epoch 63 Batch 420/1540] avg loss 0.00127344, throughput 2.18373K wps
[Epoch 63 Batch 450/1540] avg loss 0.00132404, throughput 2.18151K wps
[Epoch 63 Batch 480/1540] avg loss 0.00137248, throughput 2.1928K wps
[Epoch 63 Batch 510/1540] avg loss 0.00139601, throughput 2.19324K wps
[Epoch 63 Batch 540/1540] avg loss 0.0013687, throughput 2.15407K wps
[Epoch 63 Batch 570/1540] avg loss 0.00105236, throughput 2.16336K wps
[Epoch 63 Batch 600/1540] avg loss 0.0014161, throughput 2.17209K wps
[Epoch 63 Batch 630/1540] avg loss 0.00157722, throughput 2.17836K wps
[Epoch 63 Batch 660/1540] avg loss 0.00137542, throughput 2.18295K wps
[Epoch 63 Batch 690/1540] avg loss 0.00148292, throughput 2.19066K wps
[Epoch 63 Batch 720/1540] avg loss 0.00128632, throughput 2.18991K wps
[Epoch 63 Batch 750/1540] avg loss 0.00140956, throughput 2.19356K wps
[Epoch 63 Batch 780/1540] avg loss 0.00141351, throughput 2.18399K wps
[Epoch 63 Batch 810/1540] avg loss 0.00162259, throughput 2.17094K wps
[Epoch 63 Batch 840/1540] avg loss 0.00134794, throughput 2.19023K wps
[Epoch 63 Batch 870/1540] avg loss 0.00168083, throughput 2.18875K wps
[Epoch 63 Batch 900/1540] avg loss 0.00124662, throughput 2.19008K wps
[Epoch 63 Batch 930/1540] avg loss 0.0016675, throughput 2.18778K wps
[Epoch 63 Batch 960/1540] avg loss 0.00139635, throughput 2.18311K wps
[Epoch 63 Batch 990/1540] avg loss 0.00132941, throughput 2.18924K wps
[Epoch 63 Batch 1020/1540] avg loss 0.00134697, throughput 2.19416K wps
[Epoch 63 Batch 1050/1540] avg loss 0.00133699, throughput 2.19086K wps
[Epoch 63 Batch 1080/1540] avg loss 0.00129958, throughput 2.17705K wps
[Epoch 63 Batch 1110/1540] avg loss 0.0014652, throughput 2.19182K wps
[Epoch 63 Batch 1140/1540] avg loss 0.00147944, throughput 2.18303K wps
[Epoch 63 Batch 1170/1540] avg loss 0.00110005, throughput 2.19145K wps
[Epoch 63 Batch 1200/1540] avg loss 0.00159252, throughput 2.19455K wps
[Epoch 63 Batch 1230/1540] avg loss 0.00158514, throughput 2.16672K wps
[Epoch 63 Batch 1260/1540] avg loss 0.00136017, throughput 2.172K wps
[Epoch 63 Batch 1290/1540] avg loss 0.00141681, throughput 2.16624K wps
[Epoch 63 Batch 1320/1540] avg loss 0.00159405, throughput 2.19321K wps
[Epoch 63 Batch 1350/1540] avg loss 0.00147317, throughput 2.17927K wps
[Epoch 63 Batch 1380/1540] avg loss 0.00142727, throughput 2.17472K wps
[Epoch 63 Batch 1410/1540] avg loss 0.00173329, throughput 2.19508K wps
[Epoch 63 Batch 1440/1540] avg loss 0.00133265, throughput 2.1881K wps
[Epoch 63 Batch 1470/1540] avg loss 0.00136179, throughput 2.18789K wps
[Epoch 63 Batch 1500/1540] avg loss 0.00176636, throughput 2.18506K wps
[Epoch 63 Batch 1530/1540] avg loss 0.00162306, throughput 2.19359K wps
Begin Testing...
[Epoch 63] train avg loss 0.00139127, dev acc 0.8383, dev avg loss 0.599749, throughput 2.18463K wps
[Epoch 64 Batch 30/1540] avg loss 0.00121695, throughput 2.22557K wps
[Epoch 64 Batch 60/1540] avg loss 0.00101169, throughput 2.17977K wps
[Epoch 64 Batch 90/1540] avg loss 0.00128145, throughput 2.18826K wps
[Epoch 64 Batch 120/1540] avg loss 0.00123407, throughput 2.18474K wps
[Epoch 64 Batch 150/1540] avg loss 0.00134396, throughput 2.18969K wps
[Epoch 64 Batch 180/1540] avg loss 0.00132324, throughput 2.19493K wps
[Epoch 64 Batch 210/1540] avg loss 0.00123401, throughput 2.19024K wps
[Epoch 64 Batch 240/1540] avg loss 0.00144466, throughput 2.1902K wps
[Epoch 64 Batch 270/1540] avg loss 0.00148012, throughput 2.17812K wps
[Epoch 64 Batch 300/1540] avg loss 0.00117286, throughput 2.18202K wps
[Epoch 64 Batch 330/1540] avg loss 0.00123268, throughput 2.19614K wps
[Epoch 64 Batch 360/1540] avg loss 0.00125636, throughput 2.19411K wps
[Epoch 64 Batch 390/1540] avg loss 0.00130916, throughput 2.18468K wps
[Epoch 64 Batch 420/1540] avg loss 0.00141844, throughput 2.17697K wps
[Epoch 64 Batch 450/1540] avg loss 0.00129854, throughput 2.18962K wps
[Epoch 64 Batch 480/1540] avg loss 0.00150565, throughput 2.17584K wps
[Epoch 64 Batch 510/1540] avg loss 0.00136081, throughput 2.18576K wps
[Epoch 64 Batch 540/1540] avg loss 0.00185908, throughput 2.19064K wps
[Epoch 64 Batch 570/1540] avg loss 0.0015106, throughput 2.18696K wps
[Epoch 64 Batch 600/1540] avg loss 0.00150543, throughput 2.19028K wps
[Epoch 64 Batch 630/1540] avg loss 0.00113952, throughput 2.1824K wps
[Epoch 64 Batch 660/1540] avg loss 0.00116862, throughput 2.17873K wps
[Epoch 64 Batch 690/1540] avg loss 0.00110663, throughput 2.16041K wps
[Epoch 64 Batch 720/1540] avg loss 0.0013436, throughput 2.16556K wps
[Epoch 64 Batch 750/1540] avg loss 0.00127, throughput 2.18684K wps
[Epoch 64 Batch 780/1540] avg loss 0.00145073, throughput 2.1809K wps
[Epoch 64 Batch 810/1540] avg loss 0.00143029, throughput 2.1734K wps
[Epoch 64 Batch 840/1540] avg loss 0.0015101, throughput 2.17586K wps
[Epoch 64 Batch 870/1540] avg loss 0.00150362, throughput 2.16331K wps
[Epoch 64 Batch 900/1540] avg loss 0.00143852, throughput 2.18911K wps
[Epoch 64 Batch 930/1540] avg loss 0.00146498, throughput 2.17882K wps
[Epoch 64 Batch 960/1540] avg loss 0.00138488, throughput 2.17906K wps
[Epoch 64 Batch 990/1540] avg loss 0.00158749, throughput 2.17337K wps
[Epoch 64 Batch 1020/1540] avg loss 0.00117392, throughput 2.1697K wps
[Epoch 64 Batch 1050/1540] avg loss 0.00143273, throughput 2.19003K wps
[Epoch 64 Batch 1080/1540] avg loss 0.00144507, throughput 2.1825K wps
[Epoch 64 Batch 1110/1540] avg loss 0.00139741, throughput 2.15062K wps
[Epoch 64 Batch 1140/1540] avg loss 0.00153035, throughput 2.16918K wps
[Epoch 64 Batch 1170/1540] avg loss 0.00163288, throughput 2.19045K wps
[Epoch 64 Batch 1200/1540] avg loss 0.00131432, throughput 2.18657K wps
[Epoch 64 Batch 1230/1540] avg loss 0.00110212, throughput 2.18578K wps
[Epoch 64 Batch 1260/1540] avg loss 0.00140702, throughput 2.17695K wps
[Epoch 64 Batch 1290/1540] avg loss 0.0012976, throughput 2.16031K wps
[Epoch 64 Batch 1320/1540] avg loss 0.00116937, throughput 2.16631K wps
[Epoch 64 Batch 1350/1540] avg loss 0.00136222, throughput 2.17974K wps
[Epoch 64 Batch 1380/1540] avg loss 0.00124807, throughput 2.18905K wps
[Epoch 64 Batch 1410/1540] avg loss 0.0013366, throughput 2.18639K wps
[Epoch 64 Batch 1440/1540] avg loss 0.00154343, throughput 2.18626K wps
[Epoch 64 Batch 1470/1540] avg loss 0.00136636, throughput 2.18797K wps
[Epoch 64 Batch 1500/1540] avg loss 0.0012967, throughput 2.15564K wps
[Epoch 64 Batch 1530/1540] avg loss 0.0013074, throughput 2.17156K wps
Begin Testing...
[Epoch 64] train avg loss 0.0013546, dev acc 0.8372, dev avg loss 0.591662, throughput 2.18138K wps
[Epoch 65 Batch 30/1540] avg loss 0.0013604, throughput 2.22487K wps
[Epoch 65 Batch 60/1540] avg loss 0.00118847, throughput 2.19106K wps
[Epoch 65 Batch 90/1540] avg loss 0.00115584, throughput 2.16898K wps
[Epoch 65 Batch 120/1540] avg loss 0.00153406, throughput 2.19052K wps
[Epoch 65 Batch 150/1540] avg loss 0.00128303, throughput 2.15591K wps
[Epoch 65 Batch 180/1540] avg loss 0.00134402, throughput 2.17598K wps
[Epoch 65 Batch 210/1540] avg loss 0.00148104, throughput 2.16889K wps
[Epoch 65 Batch 240/1540] avg loss 0.00135139, throughput 2.1878K wps
[Epoch 65 Batch 270/1540] avg loss 0.00149769, throughput 2.18542K wps
[Epoch 65 Batch 300/1540] avg loss 0.00138721, throughput 2.18984K wps
[Epoch 65 Batch 330/1540] avg loss 0.00130451, throughput 2.17719K wps
[Epoch 65 Batch 360/1540] avg loss 0.00132186, throughput 2.17054K wps
[Epoch 65 Batch 390/1540] avg loss 0.00136371, throughput 2.19123K wps
[Epoch 65 Batch 420/1540] avg loss 0.00120682, throughput 2.18826K wps
[Epoch 65 Batch 450/1540] avg loss 0.00151009, throughput 2.19287K wps
[Epoch 65 Batch 480/1540] avg loss 0.0012471, throughput 2.19082K wps
[Epoch 65 Batch 510/1540] avg loss 0.00152619, throughput 2.18088K wps
[Epoch 65 Batch 540/1540] avg loss 0.001329, throughput 2.17861K wps
[Epoch 65 Batch 570/1540] avg loss 0.00148177, throughput 2.16342K wps
[Epoch 65 Batch 600/1540] avg loss 0.00139052, throughput 2.18749K wps
[Epoch 65 Batch 630/1540] avg loss 0.00119052, throughput 2.18485K wps
[Epoch 65 Batch 660/1540] avg loss 0.0016321, throughput 2.18985K wps
[Epoch 65 Batch 690/1540] avg loss 0.00134506, throughput 2.18301K wps
[Epoch 65 Batch 720/1540] avg loss 0.00146363, throughput 2.191K wps
[Epoch 65 Batch 750/1540] avg loss 0.00116455, throughput 2.17675K wps
[Epoch 65 Batch 780/1540] avg loss 0.00135397, throughput 2.19401K wps
[Epoch 65 Batch 810/1540] avg loss 0.00143844, throughput 2.1753K wps
[Epoch 65 Batch 840/1540] avg loss 0.00121233, throughput 2.18602K wps
[Epoch 65 Batch 870/1540] avg loss 0.00122283, throughput 2.18648K wps
[Epoch 65 Batch 900/1540] avg loss 0.00148994, throughput 2.18956K wps
[Epoch 65 Batch 930/1540] avg loss 0.00141252, throughput 2.19136K wps
[Epoch 65 Batch 960/1540] avg loss 0.00135671, throughput 2.18459K wps
[Epoch 65 Batch 990/1540] avg loss 0.00129098, throughput 2.18749K wps
[Epoch 65 Batch 1020/1540] avg loss 0.00137906, throughput 2.14878K wps
[Epoch 65 Batch 1050/1540] avg loss 0.00113366, throughput 2.19083K wps
[Epoch 65 Batch 1080/1540] avg loss 0.00193209, throughput 2.1767K wps
[Epoch 65 Batch 1110/1540] avg loss 0.00143416, throughput 2.1872K wps
[Epoch 65 Batch 1140/1540] avg loss 0.00127405, throughput 2.18058K wps
[Epoch 65 Batch 1170/1540] avg loss 0.0013255, throughput 2.19029K wps
[Epoch 65 Batch 1200/1540] avg loss 0.00128481, throughput 2.17413K wps
[Epoch 65 Batch 1230/1540] avg loss 0.00128896, throughput 2.17922K wps
[Epoch 65 Batch 1260/1540] avg loss 0.00146056, throughput 2.17871K wps
[Epoch 65 Batch 1290/1540] avg loss 0.00155084, throughput 2.17743K wps
[Epoch 65 Batch 1320/1540] avg loss 0.0016556, throughput 2.18882K wps
[Epoch 65 Batch 1350/1540] avg loss 0.00141052, throughput 2.18252K wps
[Epoch 65 Batch 1380/1540] avg loss 0.0012869, throughput 2.18968K wps
[Epoch 65 Batch 1410/1540] avg loss 0.00133776, throughput 2.1794K wps
[Epoch 65 Batch 1440/1540] avg loss 0.00140201, throughput 2.18188K wps
[Epoch 65 Batch 1470/1540] avg loss 0.00129762, throughput 2.16037K wps
[Epoch 65 Batch 1500/1540] avg loss 0.00139098, throughput 2.1848K wps
[Epoch 65 Batch 1530/1540] avg loss 0.00118841, throughput 2.19002K wps
Begin Testing...
[Epoch 65] train avg loss 0.00137397, dev acc 0.8417, dev avg loss 0.587575, throughput 2.1828K wps
[Epoch 66 Batch 30/1540] avg loss 0.00148364, throughput 2.24102K wps
[Epoch 66 Batch 60/1540] avg loss 0.00136666, throughput 2.19199K wps
[Epoch 66 Batch 90/1540] avg loss 0.00140116, throughput 2.18104K wps
[Epoch 66 Batch 120/1540] avg loss 0.001294, throughput 2.17911K wps
[Epoch 66 Batch 150/1540] avg loss 0.00123445, throughput 2.18628K wps
[Epoch 66 Batch 180/1540] avg loss 0.00123967, throughput 2.1892K wps
[Epoch 66 Batch 210/1540] avg loss 0.00117083, throughput 2.18208K wps
[Epoch 66 Batch 240/1540] avg loss 0.00131796, throughput 2.17305K wps
[Epoch 66 Batch 270/1540] avg loss 0.00141064, throughput 2.18737K wps
[Epoch 66 Batch 300/1540] avg loss 0.00134107, throughput 2.19065K wps
[Epoch 66 Batch 330/1540] avg loss 0.00124225, throughput 2.18829K wps
[Epoch 66 Batch 360/1540] avg loss 0.00125315, throughput 2.15226K wps
[Epoch 66 Batch 390/1540] avg loss 0.00139666, throughput 2.19055K wps
[Epoch 66 Batch 420/1540] avg loss 0.00134007, throughput 2.18964K wps
[Epoch 66 Batch 450/1540] avg loss 0.00139375, throughput 2.18739K wps
[Epoch 66 Batch 480/1540] avg loss 0.00124109, throughput 2.1889K wps
[Epoch 66 Batch 510/1540] avg loss 0.00142252, throughput 2.17975K wps
[Epoch 66 Batch 540/1540] avg loss 0.00142391, throughput 2.19252K wps
[Epoch 66 Batch 570/1540] avg loss 0.00155164, throughput 2.17532K wps
[Epoch 66 Batch 600/1540] avg loss 0.00127357, throughput 2.1674K wps
[Epoch 66 Batch 630/1540] avg loss 0.00148731, throughput 2.18825K wps
[Epoch 66 Batch 660/1540] avg loss 0.00119577, throughput 2.17248K wps
[Epoch 66 Batch 690/1540] avg loss 0.00133751, throughput 2.18497K wps
[Epoch 66 Batch 720/1540] avg loss 0.00135276, throughput 2.19369K wps
[Epoch 66 Batch 750/1540] avg loss 0.00146756, throughput 2.19499K wps
[Epoch 66 Batch 780/1540] avg loss 0.00133674, throughput 2.19348K wps
[Epoch 66 Batch 810/1540] avg loss 0.00152859, throughput 2.18359K wps
[Epoch 66 Batch 840/1540] avg loss 0.00133148, throughput 2.19064K wps
[Epoch 66 Batch 870/1540] avg loss 0.00136277, throughput 2.18946K wps
[Epoch 66 Batch 900/1540] avg loss 0.00150396, throughput 2.19192K wps
[Epoch 66 Batch 930/1540] avg loss 0.00150935, throughput 2.16745K wps
[Epoch 66 Batch 960/1540] avg loss 0.00136289, throughput 2.1977K wps
[Epoch 66 Batch 990/1540] avg loss 0.00123381, throughput 2.18333K wps
[Epoch 66 Batch 1020/1540] avg loss 0.00134021, throughput 2.19253K wps
[Epoch 66 Batch 1050/1540] avg loss 0.00140819, throughput 2.18557K wps
[Epoch 66 Batch 1080/1540] avg loss 0.00114996, throughput 2.17765K wps
[Epoch 66 Batch 1110/1540] avg loss 0.00142236, throughput 2.19133K wps
[Epoch 66 Batch 1140/1540] avg loss 0.00125911, throughput 2.17997K wps
[Epoch 66 Batch 1170/1540] avg loss 0.00139396, throughput 2.19019K wps
[Epoch 66 Batch 1200/1540] avg loss 0.00117122, throughput 2.19223K wps
[Epoch 66 Batch 1230/1540] avg loss 0.00112224, throughput 2.18987K wps
[Epoch 66 Batch 1260/1540] avg loss 0.00106626, throughput 2.18742K wps
[Epoch 66 Batch 1290/1540] avg loss 0.00123527, throughput 2.17544K wps
[Epoch 66 Batch 1320/1540] avg loss 0.00148841, throughput 2.1958K wps
[Epoch 66 Batch 1350/1540] avg loss 0.00114607, throughput 2.18719K wps
[Epoch 66 Batch 1380/1540] avg loss 0.00156407, throughput 2.18664K wps
[Epoch 66 Batch 1410/1540] avg loss 0.00149329, throughput 2.18313K wps
[Epoch 66 Batch 1440/1540] avg loss 0.00124044, throughput 2.18122K wps
[Epoch 66 Batch 1470/1540] avg loss 0.0014804, throughput 2.19135K wps
[Epoch 66 Batch 1500/1540] avg loss 0.00121062, throughput 2.18715K wps
[Epoch 66 Batch 1530/1540] avg loss 0.00153581, throughput 2.15712K wps
Begin Testing...
[Epoch 66] train avg loss 0.00134588, dev acc 0.8372, dev avg loss 0.616442, throughput 2.18585K wps
[Epoch 67 Batch 30/1540] avg loss 0.00127244, throughput 2.21404K wps
[Epoch 67 Batch 60/1540] avg loss 0.00125379, throughput 2.18133K wps
[Epoch 67 Batch 90/1540] avg loss 0.00113688, throughput 2.17744K wps
[Epoch 67 Batch 120/1540] avg loss 0.00137608, throughput 2.16134K wps
[Epoch 67 Batch 150/1540] avg loss 0.00140622, throughput 2.17311K wps
[Epoch 67 Batch 180/1540] avg loss 0.0012614, throughput 2.18969K wps
[Epoch 67 Batch 210/1540] avg loss 0.0011455, throughput 2.17531K wps
[Epoch 67 Batch 240/1540] avg loss 0.00136627, throughput 2.19283K wps
[Epoch 67 Batch 270/1540] avg loss 0.0010481, throughput 2.18369K wps
[Epoch 67 Batch 300/1540] avg loss 0.0014299, throughput 2.18639K wps
[Epoch 67 Batch 330/1540] avg loss 0.00139538, throughput 2.18809K wps
[Epoch 67 Batch 360/1540] avg loss 0.00139044, throughput 2.18487K wps
[Epoch 67 Batch 390/1540] avg loss 0.00133808, throughput 2.1925K wps
[Epoch 67 Batch 420/1540] avg loss 0.00153024, throughput 2.19048K wps
[Epoch 67 Batch 450/1540] avg loss 0.00113998, throughput 2.18767K wps
[Epoch 67 Batch 480/1540] avg loss 0.00130754, throughput 2.19171K wps
[Epoch 67 Batch 510/1540] avg loss 0.00126853, throughput 2.17972K wps
[Epoch 67 Batch 540/1540] avg loss 0.00127547, throughput 2.16737K wps
[Epoch 67 Batch 570/1540] avg loss 0.00131054, throughput 2.18585K wps
[Epoch 67 Batch 600/1540] avg loss 0.00101526, throughput 2.18759K wps
[Epoch 67 Batch 630/1540] avg loss 0.00145737, throughput 2.18855K wps
[Epoch 67 Batch 660/1540] avg loss 0.00118653, throughput 2.1725K wps
[Epoch 67 Batch 690/1540] avg loss 0.00101864, throughput 2.1922K wps
[Epoch 67 Batch 720/1540] avg loss 0.00134283, throughput 2.19124K wps
[Epoch 67 Batch 750/1540] avg loss 0.00150613, throughput 2.19207K wps
[Epoch 67 Batch 780/1540] avg loss 0.00147374, throughput 2.1896K wps
[Epoch 67 Batch 810/1540] avg loss 0.00128811, throughput 2.19201K wps
[Epoch 67 Batch 840/1540] avg loss 0.00129657, throughput 2.19392K wps
[Epoch 67 Batch 870/1540] avg loss 0.00127629, throughput 2.18893K wps
[Epoch 67 Batch 900/1540] avg loss 0.00149054, throughput 2.18698K wps
[Epoch 67 Batch 930/1540] avg loss 0.00118856, throughput 2.19061K wps
[Epoch 67 Batch 960/1540] avg loss 0.00147564, throughput 2.18901K wps
[Epoch 67 Batch 990/1540] avg loss 0.00139409, throughput 2.19068K wps
[Epoch 67 Batch 1020/1540] avg loss 0.00144841, throughput 2.18757K wps
[Epoch 67 Batch 1050/1540] avg loss 0.00149394, throughput 2.17772K wps
[Epoch 67 Batch 1080/1540] avg loss 0.00147482, throughput 2.17596K wps
[Epoch 67 Batch 1110/1540] avg loss 0.00131629, throughput 2.18193K wps
[Epoch 67 Batch 1140/1540] avg loss 0.00157089, throughput 2.18236K wps
[Epoch 67 Batch 1170/1540] avg loss 0.00134571, throughput 2.17486K wps
[Epoch 67 Batch 1200/1540] avg loss 0.00134647, throughput 2.1681K wps
[Epoch 67 Batch 1230/1540] avg loss 0.00126205, throughput 2.18012K wps
[Epoch 67 Batch 1260/1540] avg loss 0.00127899, throughput 2.18203K wps
[Epoch 67 Batch 1290/1540] avg loss 0.0012988, throughput 2.19453K wps
[Epoch 67 Batch 1320/1540] avg loss 0.00142921, throughput 2.16697K wps
[Epoch 67 Batch 1350/1540] avg loss 0.00131668, throughput 2.17395K wps
[Epoch 67 Batch 1380/1540] avg loss 0.00142151, throughput 2.17843K wps
[Epoch 67 Batch 1410/1540] avg loss 0.00148205, throughput 2.1738K wps
[Epoch 67 Batch 1440/1540] avg loss 0.00135646, throughput 2.19238K wps
[Epoch 67 Batch 1470/1540] avg loss 0.00137682, throughput 2.1931K wps
[Epoch 67 Batch 1500/1540] avg loss 0.00140156, throughput 2.18528K wps
[Epoch 67 Batch 1530/1540] avg loss 0.00130387, throughput 2.19245K wps
Begin Testing...
[Epoch 67] train avg loss 0.00133557, dev acc 0.8429, dev avg loss 0.599735, throughput 2.18446K wps
[Epoch 68 Batch 30/1540] avg loss 0.00128239, throughput 2.23446K wps
[Epoch 68 Batch 60/1540] avg loss 0.00136266, throughput 2.1484K wps
[Epoch 68 Batch 90/1540] avg loss 0.00123227, throughput 2.15891K wps
[Epoch 68 Batch 120/1540] avg loss 0.00137985, throughput 2.18132K wps
[Epoch 68 Batch 150/1540] avg loss 0.00132504, throughput 2.1869K wps
[Epoch 68 Batch 180/1540] avg loss 0.00119402, throughput 2.1757K wps
[Epoch 68 Batch 210/1540] avg loss 0.00149831, throughput 2.19045K wps
[Epoch 68 Batch 240/1540] avg loss 0.00110212, throughput 2.18994K wps
[Epoch 68 Batch 270/1540] avg loss 0.00123328, throughput 2.19541K wps
[Epoch 68 Batch 300/1540] avg loss 0.00119746, throughput 2.18796K wps
[Epoch 68 Batch 330/1540] avg loss 0.00150886, throughput 2.16591K wps
[Epoch 68 Batch 360/1540] avg loss 0.00115192, throughput 2.16334K wps
[Epoch 68 Batch 390/1540] avg loss 0.00130106, throughput 2.19117K wps
[Epoch 68 Batch 420/1540] avg loss 0.00129221, throughput 2.18708K wps
[Epoch 68 Batch 450/1540] avg loss 0.00132617, throughput 2.19104K wps
[Epoch 68 Batch 480/1540] avg loss 0.00144662, throughput 2.18751K wps
[Epoch 68 Batch 510/1540] avg loss 0.00137518, throughput 2.17921K wps
[Epoch 68 Batch 540/1540] avg loss 0.0012071, throughput 2.18818K wps
[Epoch 68 Batch 570/1540] avg loss 0.00135761, throughput 2.18411K wps
[Epoch 68 Batch 600/1540] avg loss 0.00124604, throughput 2.18268K wps
[Epoch 68 Batch 630/1540] avg loss 0.00145995, throughput 2.17865K wps
[Epoch 68 Batch 660/1540] avg loss 0.00128059, throughput 2.1804K wps
[Epoch 68 Batch 690/1540] avg loss 0.00165561, throughput 2.19437K wps
[Epoch 68 Batch 720/1540] avg loss 0.00155915, throughput 2.19023K wps
[Epoch 68 Batch 750/1540] avg loss 0.00133737, throughput 2.17013K wps
[Epoch 68 Batch 780/1540] avg loss 0.00125169, throughput 2.1782K wps
[Epoch 68 Batch 810/1540] avg loss 0.00134154, throughput 2.19192K wps
[Epoch 68 Batch 840/1540] avg loss 0.00111539, throughput 2.18547K wps
[Epoch 68 Batch 870/1540] avg loss 0.0012809, throughput 2.17426K wps
[Epoch 68 Batch 900/1540] avg loss 0.00116756, throughput 2.18569K wps
[Epoch 68 Batch 930/1540] avg loss 0.00136846, throughput 2.19402K wps
[Epoch 68 Batch 960/1540] avg loss 0.00120779, throughput 2.18652K wps
[Epoch 68 Batch 990/1540] avg loss 0.00121918, throughput 2.17701K wps
[Epoch 68 Batch 1020/1540] avg loss 0.00136579, throughput 2.18825K wps
[Epoch 68 Batch 1050/1540] avg loss 0.00129254, throughput 2.19362K wps
[Epoch 68 Batch 1080/1540] avg loss 0.00146084, throughput 2.18115K wps
[Epoch 68 Batch 1110/1540] avg loss 0.00139888, throughput 2.18355K wps
[Epoch 68 Batch 1140/1540] avg loss 0.00127216, throughput 2.18365K wps
[Epoch 68 Batch 1170/1540] avg loss 0.00125629, throughput 2.18857K wps
[Epoch 68 Batch 1200/1540] avg loss 0.0013211, throughput 2.16804K wps
[Epoch 68 Batch 1230/1540] avg loss 0.00154655, throughput 2.18135K wps
[Epoch 68 Batch 1260/1540] avg loss 0.00134996, throughput 2.19227K wps
[Epoch 68 Batch 1290/1540] avg loss 0.00158464, throughput 2.1661K wps
[Epoch 68 Batch 1320/1540] avg loss 0.0011957, throughput 2.18849K wps
[Epoch 68 Batch 1350/1540] avg loss 0.001406, throughput 2.16892K wps
[Epoch 68 Batch 1380/1540] avg loss 0.00158681, throughput 2.17785K wps
[Epoch 68 Batch 1410/1540] avg loss 0.00115775, throughput 2.19158K wps
[Epoch 68 Batch 1440/1540] avg loss 0.00125015, throughput 2.1858K wps
[Epoch 68 Batch 1470/1540] avg loss 0.00126393, throughput 2.186K wps
[Epoch 68 Batch 1500/1540] avg loss 0.00131351, throughput 2.19612K wps
[Epoch 68 Batch 1530/1540] avg loss 0.0015233, throughput 2.18766K wps
Begin Testing...
[Epoch 68] train avg loss 0.00133405, dev acc 0.8326, dev avg loss 0.623435, throughput 2.18364K wps
[Epoch 69 Batch 30/1540] avg loss 0.00118234, throughput 2.23211K wps
[Epoch 69 Batch 60/1540] avg loss 0.00124345, throughput 2.19391K wps
[Epoch 69 Batch 90/1540] avg loss 0.00118992, throughput 2.19183K wps
[Epoch 69 Batch 120/1540] avg loss 0.00112155, throughput 2.18946K wps
[Epoch 69 Batch 150/1540] avg loss 0.00107628, throughput 2.18027K wps
[Epoch 69 Batch 180/1540] avg loss 0.00136865, throughput 2.1912K wps
[Epoch 69 Batch 210/1540] avg loss 0.00111884, throughput 2.19212K wps
[Epoch 69 Batch 240/1540] avg loss 0.00128373, throughput 2.18494K wps
[Epoch 69 Batch 270/1540] avg loss 0.0013278, throughput 2.17937K wps
[Epoch 69 Batch 300/1540] avg loss 0.00154967, throughput 2.17798K wps
[Epoch 69 Batch 330/1540] avg loss 0.00152045, throughput 2.1852K wps
[Epoch 69 Batch 360/1540] avg loss 0.000982355, throughput 2.18902K wps
[Epoch 69 Batch 390/1540] avg loss 0.0013315, throughput 2.1925K wps
[Epoch 69 Batch 420/1540] avg loss 0.00119887, throughput 2.19033K wps
[Epoch 69 Batch 450/1540] avg loss 0.00105346, throughput 2.16305K wps
[Epoch 69 Batch 480/1540] avg loss 0.00120493, throughput 2.19813K wps
[Epoch 69 Batch 510/1540] avg loss 0.00118246, throughput 2.19348K wps
[Epoch 69 Batch 540/1540] avg loss 0.00120536, throughput 2.18471K wps
[Epoch 69 Batch 570/1540] avg loss 0.00123932, throughput 2.1864K wps
[Epoch 69 Batch 600/1540] avg loss 0.00155406, throughput 2.15855K wps
[Epoch 69 Batch 630/1540] avg loss 0.00135501, throughput 2.18632K wps
[Epoch 69 Batch 660/1540] avg loss 0.0012199, throughput 2.1927K wps
[Epoch 69 Batch 690/1540] avg loss 0.0012849, throughput 2.1891K wps
[Epoch 69 Batch 720/1540] avg loss 0.0013108, throughput 2.18804K wps
[Epoch 69 Batch 750/1540] avg loss 0.00143835, throughput 2.18822K wps
[Epoch 69 Batch 780/1540] avg loss 0.00139199, throughput 2.17165K wps
[Epoch 69 Batch 810/1540] avg loss 0.00119127, throughput 2.16609K wps
[Epoch 69 Batch 840/1540] avg loss 0.00147565, throughput 2.17273K wps
[Epoch 69 Batch 870/1540] avg loss 0.00128223, throughput 2.18946K wps
[Epoch 69 Batch 900/1540] avg loss 0.00106781, throughput 2.1845K wps
[Epoch 69 Batch 930/1540] avg loss 0.00115973, throughput 2.18909K wps
[Epoch 69 Batch 960/1540] avg loss 0.00112979, throughput 2.17882K wps
[Epoch 69 Batch 990/1540] avg loss 0.00150946, throughput 2.18881K wps
[Epoch 69 Batch 1020/1540] avg loss 0.00130492, throughput 2.15527K wps
[Epoch 69 Batch 1050/1540] avg loss 0.00102491, throughput 2.1595K wps