Namespace(batch_size=50, data_name='SST-2', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 53
Done! Tokenizing Time=0.82s, #Sentences=76961
Done! Tokenizing Time=0.03s, #Sentences=1821
Done! Tokenizing Time=0.01s, #Sentences=872
SentimentNet(
  (embedding): Embedding(17244 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/1540] avg loss 0.0138792, throughput 0.861731K wps
[Epoch 0 Batch 60/1540] avg loss 0.013821, throughput 4.73684K wps
[Epoch 0 Batch 90/1540] avg loss 0.0138026, throughput 4.98436K wps
[Epoch 0 Batch 120/1540] avg loss 0.0137526, throughput 5.22901K wps
[Epoch 0 Batch 150/1540] avg loss 0.0137629, throughput 5.42377K wps
[Epoch 0 Batch 180/1540] avg loss 0.0136742, throughput 4.63539K wps
[Epoch 0 Batch 210/1540] avg loss 0.013577, throughput 4.34876K wps
[Epoch 0 Batch 240/1540] avg loss 0.0136764, throughput 5.10333K wps
[Epoch 0 Batch 270/1540] avg loss 0.0135106, throughput 4.92396K wps
[Epoch 0 Batch 300/1540] avg loss 0.0134419, throughput 4.89391K wps
[Epoch 0 Batch 330/1540] avg loss 0.0134972, throughput 4.5121K wps
[Epoch 0 Batch 360/1540] avg loss 0.0133796, throughput 4.91044K wps
[Epoch 0 Batch 390/1540] avg loss 0.0134954, throughput 5.07773K wps
[Epoch 0 Batch 420/1540] avg loss 0.0132133, throughput 5.23382K wps
[Epoch 0 Batch 450/1540] avg loss 0.0131797, throughput 4.75222K wps
[Epoch 0 Batch 480/1540] avg loss 0.0133219, throughput 5.0942K wps
[Epoch 0 Batch 510/1540] avg loss 0.0132662, throughput 4.72039K wps
[Epoch 0 Batch 540/1540] avg loss 0.0130956, throughput 5.74136K wps
[Epoch 0 Batch 570/1540] avg loss 0.0131162, throughput 5.41752K wps
[Epoch 0 Batch 600/1540] avg loss 0.0129423, throughput 4.96132K wps
[Epoch 0 Batch 630/1540] avg loss 0.0129514, throughput 5.13204K wps
[Epoch 0 Batch 660/1540] avg loss 0.0130364, throughput 5.26315K wps
[Epoch 0 Batch 690/1540] avg loss 0.0129667, throughput 5.50256K wps
[Epoch 0 Batch 720/1540] avg loss 0.0128204, throughput 5.04154K wps
[Epoch 0 Batch 750/1540] avg loss 0.0128998, throughput 5.13232K wps
[Epoch 0 Batch 780/1540] avg loss 0.012946, throughput 5.52538K wps
[Epoch 0 Batch 810/1540] avg loss 0.0126599, throughput 5.32725K wps
[Epoch 0 Batch 840/1540] avg loss 0.0127798, throughput 4.71199K wps
[Epoch 0 Batch 870/1540] avg loss 0.0125999, throughput 5.65844K wps
[Epoch 0 Batch 900/1540] avg loss 0.0127011, throughput 5.33236K wps
[Epoch 0 Batch 930/1540] avg loss 0.0126541, throughput 5.47452K wps
[Epoch 0 Batch 960/1540] avg loss 0.0125421, throughput 4.93981K wps
[Epoch 0 Batch 990/1540] avg loss 0.0126159, throughput 4.99148K wps
[Epoch 0 Batch 1020/1540] avg loss 0.012382, throughput 5.07983K wps
[Epoch 0 Batch 1050/1540] avg loss 0.0123228, throughput 4.66846K wps
[Epoch 0 Batch 1080/1540] avg loss 0.0123909, throughput 4.71254K wps
[Epoch 0 Batch 1110/1540] avg loss 0.0123144, throughput 4.61051K wps
[Epoch 0 Batch 1140/1540] avg loss 0.0121857, throughput 5.24227K wps
[Epoch 0 Batch 1170/1540] avg loss 0.012291, throughput 5.87868K wps
[Epoch 0 Batch 1200/1540] avg loss 0.0122205, throughput 4.79803K wps
[Epoch 0 Batch 1230/1540] avg loss 0.0122922, throughput 5.34365K wps
[Epoch 0 Batch 1260/1540] avg loss 0.0121421, throughput 4.66134K wps
[Epoch 0 Batch 1290/1540] avg loss 0.0119442, throughput 5.38179K wps
[Epoch 0 Batch 1320/1540] avg loss 0.0120882, throughput 5.30799K wps
[Epoch 0 Batch 1350/1540] avg loss 0.0118595, throughput 4.98926K wps
[Epoch 0 Batch 1380/1540] avg loss 0.011844, throughput 4.60015K wps
[Epoch 0 Batch 1410/1540] avg loss 0.0117461, throughput 4.9166K wps
[Epoch 0 Batch 1440/1540] avg loss 0.0118945, throughput 4.9226K wps
[Epoch 0 Batch 1470/1540] avg loss 0.0117034, throughput 4.6803K wps
[Epoch 0 Batch 1500/1540] avg loss 0.0117434, throughput 4.99172K wps
[Epoch 0 Batch 1530/1540] avg loss 0.0115504, throughput 4.38582K wps
Begin Testing...
[Epoch 0] train avg loss 0.0127925, dev acc 0.7534, dev avg loss 0.586055, throughput 4.33071K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 1 Batch 30/1540] avg loss 0.0114031, throughput 5.1085K wps
[Epoch 1 Batch 60/1540] avg loss 0.0114035, throughput 5.14823K wps
[Epoch 1 Batch 90/1540] avg loss 0.0113484, throughput 4.71357K wps
[Epoch 1 Batch 120/1540] avg loss 0.0111048, throughput 4.9222K wps
[Epoch 1 Batch 150/1540] avg loss 0.0112204, throughput 4.70997K wps
[Epoch 1 Batch 180/1540] avg loss 0.0112529, throughput 5.18593K wps
[Epoch 1 Batch 210/1540] avg loss 0.0112402, throughput 4.57408K wps
[Epoch 1 Batch 240/1540] avg loss 0.0110806, throughput 5.4544K wps
[Epoch 1 Batch 270/1540] avg loss 0.0109454, throughput 4.39433K wps
[Epoch 1 Batch 300/1540] avg loss 0.011099, throughput 4.4822K wps
[Epoch 1 Batch 330/1540] avg loss 0.0108771, throughput 4.67019K wps
[Epoch 1 Batch 360/1540] avg loss 0.0108326, throughput 5.0821K wps
[Epoch 1 Batch 390/1540] avg loss 0.0110152, throughput 4.66666K wps
[Epoch 1 Batch 420/1540] avg loss 0.0107847, throughput 4.87537K wps
[Epoch 1 Batch 450/1540] avg loss 0.0108682, throughput 5.09467K wps
[Epoch 1 Batch 480/1540] avg loss 0.0104844, throughput 5.28681K wps
[Epoch 1 Batch 510/1540] avg loss 0.0107524, throughput 4.87359K wps
[Epoch 1 Batch 540/1540] avg loss 0.0104364, throughput 4.93682K wps
[Epoch 1 Batch 570/1540] avg loss 0.0104803, throughput 4.9965K wps
[Epoch 1 Batch 600/1540] avg loss 0.010426, throughput 4.73665K wps
[Epoch 1 Batch 630/1540] avg loss 0.0103928, throughput 4.42175K wps
[Epoch 1 Batch 660/1540] avg loss 0.0101953, throughput 4.75534K wps
[Epoch 1 Batch 690/1540] avg loss 0.0101117, throughput 4.73014K wps
[Epoch 1 Batch 720/1540] avg loss 0.0103745, throughput 4.96265K wps
[Epoch 1 Batch 750/1540] avg loss 0.0104955, throughput 4.62747K wps
[Epoch 1 Batch 780/1540] avg loss 0.0101885, throughput 4.8182K wps
[Epoch 1 Batch 810/1540] avg loss 0.010254, throughput 4.71134K wps
[Epoch 1 Batch 840/1540] avg loss 0.0102335, throughput 5.44157K wps
[Epoch 1 Batch 870/1540] avg loss 0.0101664, throughput 4.74353K wps
[Epoch 1 Batch 900/1540] avg loss 0.0101613, throughput 4.75225K wps
[Epoch 1 Batch 930/1540] avg loss 0.0103314, throughput 4.71415K wps
[Epoch 1 Batch 960/1540] avg loss 0.00970654, throughput 4.97567K wps
[Epoch 1 Batch 990/1540] avg loss 0.00994978, throughput 4.75406K wps
[Epoch 1 Batch 1020/1540] avg loss 0.00978917, throughput 4.38756K wps
[Epoch 1 Batch 1050/1540] avg loss 0.00946364, throughput 4.51277K wps
[Epoch 1 Batch 1080/1540] avg loss 0.00995845, throughput 4.63915K wps
[Epoch 1 Batch 1110/1540] avg loss 0.00963763, throughput 4.61651K wps
[Epoch 1 Batch 1140/1540] avg loss 0.00964271, throughput 4.4878K wps
[Epoch 1 Batch 1170/1540] avg loss 0.00950149, throughput 5.02884K wps
[Epoch 1 Batch 1200/1540] avg loss 0.00934346, throughput 4.81697K wps
[Epoch 1 Batch 1230/1540] avg loss 0.00953289, throughput 4.9947K wps
[Epoch 1 Batch 1260/1540] avg loss 0.00946964, throughput 5.27336K wps
[Epoch 1 Batch 1290/1540] avg loss 0.00918231, throughput 5.47243K wps
[Epoch 1 Batch 1320/1540] avg loss 0.00938752, throughput 5.01027K wps
[Epoch 1 Batch 1350/1540] avg loss 0.00953718, throughput 5.21049K wps
[Epoch 1 Batch 1380/1540] avg loss 0.00902953, throughput 5.2935K wps
[Epoch 1 Batch 1410/1540] avg loss 0.00886848, throughput 4.39697K wps
[Epoch 1 Batch 1440/1540] avg loss 0.00929226, throughput 4.5734K wps
[Epoch 1 Batch 1470/1540] avg loss 0.00928121, throughput 5.37083K wps
[Epoch 1 Batch 1500/1540] avg loss 0.00906143, throughput 4.71693K wps
[Epoch 1 Batch 1530/1540] avg loss 0.00924145, throughput 4.69483K wps
Begin Testing...
[Epoch 1] train avg loss 0.0102175, dev acc 0.7970, dev avg loss 0.478378, throughput 4.84648K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 2 Batch 30/1540] avg loss 0.00888431, throughput 5.62198K wps
[Epoch 2 Batch 60/1540] avg loss 0.00902721, throughput 4.88361K wps
[Epoch 2 Batch 90/1540] avg loss 0.00923355, throughput 4.41705K wps
[Epoch 2 Batch 120/1540] avg loss 0.00927417, throughput 4.48133K wps
[Epoch 2 Batch 150/1540] avg loss 0.00888103, throughput 5.05679K wps
[Epoch 2 Batch 180/1540] avg loss 0.00886782, throughput 4.70063K wps
[Epoch 2 Batch 210/1540] avg loss 0.00879544, throughput 5.1749K wps
[Epoch 2 Batch 240/1540] avg loss 0.00893365, throughput 5.15129K wps
[Epoch 2 Batch 270/1540] avg loss 0.00933671, throughput 4.72691K wps
[Epoch 2 Batch 300/1540] avg loss 0.00899716, throughput 4.8176K wps
[Epoch 2 Batch 330/1540] avg loss 0.00878147, throughput 4.97368K wps
[Epoch 2 Batch 360/1540] avg loss 0.00864625, throughput 5.09334K wps
[Epoch 2 Batch 390/1540] avg loss 0.00897448, throughput 5.13321K wps
[Epoch 2 Batch 420/1540] avg loss 0.00853823, throughput 4.51381K wps
[Epoch 2 Batch 450/1540] avg loss 0.00856942, throughput 4.74414K wps
[Epoch 2 Batch 480/1540] avg loss 0.00836215, throughput 4.90471K wps
[Epoch 2 Batch 510/1540] avg loss 0.00872918, throughput 4.72694K wps
[Epoch 2 Batch 540/1540] avg loss 0.00842333, throughput 4.83242K wps
[Epoch 2 Batch 570/1540] avg loss 0.0087047, throughput 4.49734K wps
[Epoch 2 Batch 600/1540] avg loss 0.00850284, throughput 4.51663K wps
[Epoch 2 Batch 630/1540] avg loss 0.0089826, throughput 4.66353K wps
[Epoch 2 Batch 660/1540] avg loss 0.00877523, throughput 5.65712K wps
[Epoch 2 Batch 690/1540] avg loss 0.00851036, throughput 5.1391K wps
[Epoch 2 Batch 720/1540] avg loss 0.00877433, throughput 5.81341K wps
[Epoch 2 Batch 750/1540] avg loss 0.00859932, throughput 4.88002K wps
[Epoch 2 Batch 780/1540] avg loss 0.0084007, throughput 4.84215K wps
[Epoch 2 Batch 810/1540] avg loss 0.00889708, throughput 4.66098K wps
[Epoch 2 Batch 840/1540] avg loss 0.00850276, throughput 5.43783K wps
[Epoch 2 Batch 870/1540] avg loss 0.00826028, throughput 4.58597K wps
[Epoch 2 Batch 900/1540] avg loss 0.00882848, throughput 4.70031K wps
[Epoch 2 Batch 930/1540] avg loss 0.00847947, throughput 4.84761K wps
[Epoch 2 Batch 960/1540] avg loss 0.00856268, throughput 4.93814K wps
[Epoch 2 Batch 990/1540] avg loss 0.00856858, throughput 4.44586K wps
[Epoch 2 Batch 1020/1540] avg loss 0.00875689, throughput 4.82057K wps
[Epoch 2 Batch 1050/1540] avg loss 0.00849652, throughput 5.07654K wps
[Epoch 2 Batch 1080/1540] avg loss 0.00866355, throughput 4.96695K wps
[Epoch 2 Batch 1110/1540] avg loss 0.00831604, throughput 5.25438K wps
[Epoch 2 Batch 1140/1540] avg loss 0.0081977, throughput 4.43798K wps
[Epoch 2 Batch 1170/1540] avg loss 0.00839652, throughput 5.03998K wps
[Epoch 2 Batch 1200/1540] avg loss 0.00846051, throughput 4.89923K wps
[Epoch 2 Batch 1230/1540] avg loss 0.00824108, throughput 4.87009K wps
[Epoch 2 Batch 1260/1540] avg loss 0.00816204, throughput 5.61718K wps
[Epoch 2 Batch 1290/1540] avg loss 0.00844974, throughput 4.8173K wps
[Epoch 2 Batch 1320/1540] avg loss 0.00838824, throughput 4.78352K wps
[Epoch 2 Batch 1350/1540] avg loss 0.00842136, throughput 4.64928K wps
[Epoch 2 Batch 1380/1540] avg loss 0.0083673, throughput 5.07037K wps
[Epoch 2 Batch 1410/1540] avg loss 0.00790145, throughput 5.38728K wps
[Epoch 2 Batch 1440/1540] avg loss 0.00873317, throughput 4.93124K wps
[Epoch 2 Batch 1470/1540] avg loss 0.00813661, throughput 4.67528K wps
[Epoch 2 Batch 1500/1540] avg loss 0.00779295, throughput 4.80698K wps
[Epoch 2 Batch 1530/1540] avg loss 0.00806087, throughput 4.98184K wps
Begin Testing...
[Epoch 2] train avg loss 0.00860297, dev acc 0.8177, dev avg loss 0.438096, throughput 4.89394K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 3 Batch 30/1540] avg loss 0.00744459, throughput 5.08313K wps
[Epoch 3 Batch 60/1540] avg loss 0.00786937, throughput 4.41259K wps
[Epoch 3 Batch 90/1540] avg loss 0.00789711, throughput 4.69612K wps
[Epoch 3 Batch 120/1540] avg loss 0.00787279, throughput 4.63962K wps
[Epoch 3 Batch 150/1540] avg loss 0.00769252, throughput 5.12405K wps
[Epoch 3 Batch 180/1540] avg loss 0.00805017, throughput 4.64286K wps
[Epoch 3 Batch 210/1540] avg loss 0.00772475, throughput 4.93487K wps
[Epoch 3 Batch 240/1540] avg loss 0.0086068, throughput 4.85473K wps
[Epoch 3 Batch 270/1540] avg loss 0.008176, throughput 4.9978K wps
[Epoch 3 Batch 300/1540] avg loss 0.00767153, throughput 4.70298K wps
[Epoch 3 Batch 330/1540] avg loss 0.00823967, throughput 4.90431K wps
[Epoch 3 Batch 360/1540] avg loss 0.0078491, throughput 4.95599K wps
[Epoch 3 Batch 390/1540] avg loss 0.00807278, throughput 4.64049K wps
[Epoch 3 Batch 420/1540] avg loss 0.00816592, throughput 4.7002K wps
[Epoch 3 Batch 450/1540] avg loss 0.00824119, throughput 4.49109K wps
[Epoch 3 Batch 480/1540] avg loss 0.0080822, throughput 4.81208K wps
[Epoch 3 Batch 510/1540] avg loss 0.00825905, throughput 5.49562K wps
[Epoch 3 Batch 540/1540] avg loss 0.00797653, throughput 4.96501K wps
[Epoch 3 Batch 570/1540] avg loss 0.00774393, throughput 5.54417K wps
[Epoch 3 Batch 600/1540] avg loss 0.00802929, throughput 5.40881K wps
[Epoch 3 Batch 630/1540] avg loss 0.00811996, throughput 4.99925K wps
[Epoch 3 Batch 660/1540] avg loss 0.00801715, throughput 4.97869K wps
[Epoch 3 Batch 690/1540] avg loss 0.00820484, throughput 4.8659K wps
[Epoch 3 Batch 720/1540] avg loss 0.00775219, throughput 4.66206K wps
[Epoch 3 Batch 750/1540] avg loss 0.00755087, throughput 4.67823K wps
[Epoch 3 Batch 780/1540] avg loss 0.00767439, throughput 4.9747K wps
[Epoch 3 Batch 810/1540] avg loss 0.00767995, throughput 4.84495K wps
[Epoch 3 Batch 840/1540] avg loss 0.00789385, throughput 4.96786K wps
[Epoch 3 Batch 870/1540] avg loss 0.00766366, throughput 4.54223K wps
[Epoch 3 Batch 900/1540] avg loss 0.00806589, throughput 4.82206K wps
[Epoch 3 Batch 930/1540] avg loss 0.00755522, throughput 4.56022K wps
[Epoch 3 Batch 960/1540] avg loss 0.00766548, throughput 4.59014K wps
[Epoch 3 Batch 990/1540] avg loss 0.00778531, throughput 5.08757K wps
[Epoch 3 Batch 1020/1540] avg loss 0.00826566, throughput 4.64955K wps
[Epoch 3 Batch 1050/1540] avg loss 0.00772524, throughput 4.9219K wps
[Epoch 3 Batch 1080/1540] avg loss 0.0078102, throughput 5.38497K wps
[Epoch 3 Batch 1110/1540] avg loss 0.00797961, throughput 4.85959K wps
[Epoch 3 Batch 1140/1540] avg loss 0.00763435, throughput 4.47651K wps
[Epoch 3 Batch 1170/1540] avg loss 0.00744389, throughput 4.72362K wps
[Epoch 3 Batch 1200/1540] avg loss 0.00737576, throughput 4.55302K wps
[Epoch 3 Batch 1230/1540] avg loss 0.0077835, throughput 4.90934K wps
[Epoch 3 Batch 1260/1540] avg loss 0.00776613, throughput 5.28942K wps
[Epoch 3 Batch 1290/1540] avg loss 0.00779842, throughput 5.84898K wps
[Epoch 3 Batch 1320/1540] avg loss 0.00802204, throughput 5.31616K wps
[Epoch 3 Batch 1350/1540] avg loss 0.00786847, throughput 4.67508K wps
[Epoch 3 Batch 1380/1540] avg loss 0.00770682, throughput 5.21615K wps
[Epoch 3 Batch 1410/1540] avg loss 0.00821475, throughput 5.55099K wps
[Epoch 3 Batch 1440/1540] avg loss 0.00760072, throughput 4.79231K wps
[Epoch 3 Batch 1470/1540] avg loss 0.00759243, throughput 5.23356K wps
[Epoch 3 Batch 1500/1540] avg loss 0.00788769, throughput 4.9415K wps
[Epoch 3 Batch 1530/1540] avg loss 0.00787783, throughput 5.01259K wps
Begin Testing...
[Epoch 3] train avg loss 0.00787805, dev acc 0.8165, dev avg loss 0.427746, throughput 4.89968K wps
[Epoch 4 Batch 30/1540] avg loss 0.00814007, throughput 5.36702K wps
[Epoch 4 Batch 60/1540] avg loss 0.00755531, throughput 5.44864K wps
[Epoch 4 Batch 90/1540] avg loss 0.00780467, throughput 4.33599K wps
[Epoch 4 Batch 120/1540] avg loss 0.00751751, throughput 5.40858K wps
[Epoch 4 Batch 150/1540] avg loss 0.00777986, throughput 5.05948K wps
[Epoch 4 Batch 180/1540] avg loss 0.00737176, throughput 4.66118K wps
[Epoch 4 Batch 210/1540] avg loss 0.00762507, throughput 4.94613K wps
[Epoch 4 Batch 240/1540] avg loss 0.00764447, throughput 5.37452K wps
[Epoch 4 Batch 270/1540] avg loss 0.00740544, throughput 4.69861K wps
[Epoch 4 Batch 300/1540] avg loss 0.00736625, throughput 5.09444K wps
[Epoch 4 Batch 330/1540] avg loss 0.00741362, throughput 4.74921K wps
[Epoch 4 Batch 360/1540] avg loss 0.00777479, throughput 4.73366K wps
[Epoch 4 Batch 390/1540] avg loss 0.00733948, throughput 4.78337K wps
[Epoch 4 Batch 420/1540] avg loss 0.00747123, throughput 5.36349K wps
[Epoch 4 Batch 450/1540] avg loss 0.00780795, throughput 4.97266K wps
[Epoch 4 Batch 480/1540] avg loss 0.00718875, throughput 4.89524K wps
[Epoch 4 Batch 510/1540] avg loss 0.00757272, throughput 4.64442K wps
[Epoch 4 Batch 540/1540] avg loss 0.00711121, throughput 4.59735K wps
[Epoch 4 Batch 570/1540] avg loss 0.0081434, throughput 4.35618K wps
[Epoch 4 Batch 600/1540] avg loss 0.00736412, throughput 4.63214K wps
[Epoch 4 Batch 630/1540] avg loss 0.00731479, throughput 4.39876K wps
[Epoch 4 Batch 660/1540] avg loss 0.00730865, throughput 4.44787K wps
[Epoch 4 Batch 690/1540] avg loss 0.00725556, throughput 4.49152K wps
[Epoch 4 Batch 720/1540] avg loss 0.00764979, throughput 4.9896K wps
[Epoch 4 Batch 750/1540] avg loss 0.0077179, throughput 4.94327K wps
[Epoch 4 Batch 780/1540] avg loss 0.00712762, throughput 5.40377K wps
[Epoch 4 Batch 810/1540] avg loss 0.00736856, throughput 4.40619K wps
[Epoch 4 Batch 840/1540] avg loss 0.0073472, throughput 5.23013K wps
[Epoch 4 Batch 870/1540] avg loss 0.00750943, throughput 5.41574K wps
[Epoch 4 Batch 900/1540] avg loss 0.00743828, throughput 4.80122K wps
[Epoch 4 Batch 930/1540] avg loss 0.00745924, throughput 4.56404K wps
[Epoch 4 Batch 960/1540] avg loss 0.00786013, throughput 4.60601K wps
[Epoch 4 Batch 990/1540] avg loss 0.00747006, throughput 4.74564K wps
[Epoch 4 Batch 1020/1540] avg loss 0.00769678, throughput 4.67883K wps
[Epoch 4 Batch 1050/1540] avg loss 0.00751729, throughput 5.30568K wps
[Epoch 4 Batch 1080/1540] avg loss 0.00704726, throughput 5.40163K wps
[Epoch 4 Batch 1110/1540] avg loss 0.00763421, throughput 5.09245K wps
[Epoch 4 Batch 1140/1540] avg loss 0.00733938, throughput 4.4624K wps
[Epoch 4 Batch 1170/1540] avg loss 0.00754691, throughput 5.33K wps
[Epoch 4 Batch 1200/1540] avg loss 0.00765444, throughput 4.69658K wps
[Epoch 4 Batch 1230/1540] avg loss 0.0076636, throughput 5.16501K wps
[Epoch 4 Batch 1260/1540] avg loss 0.00715675, throughput 4.8579K wps
[Epoch 4 Batch 1290/1540] avg loss 0.00737649, throughput 5.38247K wps
[Epoch 4 Batch 1320/1540] avg loss 0.00704854, throughput 4.88209K wps
[Epoch 4 Batch 1350/1540] avg loss 0.00734874, throughput 5.21612K wps
[Epoch 4 Batch 1380/1540] avg loss 0.00726952, throughput 4.66998K wps
[Epoch 4 Batch 1410/1540] avg loss 0.00759132, throughput 4.93197K wps
[Epoch 4 Batch 1440/1540] avg loss 0.00739093, throughput 5.15889K wps
[Epoch 4 Batch 1470/1540] avg loss 0.00756782, throughput 4.53689K wps
[Epoch 4 Batch 1500/1540] avg loss 0.00727878, throughput 4.5958K wps
[Epoch 4 Batch 1530/1540] avg loss 0.0071402, throughput 4.6838K wps
Begin Testing...
[Epoch 4] train avg loss 0.00748334, dev acc 0.8142, dev avg loss 0.410531, throughput 4.87473K wps
[Epoch 5 Batch 30/1540] avg loss 0.00698376, throughput 4.88636K wps
[Epoch 5 Batch 60/1540] avg loss 0.00746657, throughput 4.8774K wps
[Epoch 5 Batch 90/1540] avg loss 0.00700226, throughput 5.38579K wps
[Epoch 5 Batch 120/1540] avg loss 0.00726955, throughput 5.23251K wps
[Epoch 5 Batch 150/1540] avg loss 0.00706542, throughput 5.22312K wps
[Epoch 5 Batch 180/1540] avg loss 0.00706871, throughput 4.4629K wps
[Epoch 5 Batch 210/1540] avg loss 0.0070968, throughput 4.90479K wps
[Epoch 5 Batch 240/1540] avg loss 0.00718396, throughput 4.7256K wps
[Epoch 5 Batch 270/1540] avg loss 0.00691182, throughput 5.09887K wps
[Epoch 5 Batch 300/1540] avg loss 0.00693339, throughput 4.39906K wps
[Epoch 5 Batch 330/1540] avg loss 0.00743552, throughput 4.65553K wps
[Epoch 5 Batch 360/1540] avg loss 0.00720827, throughput 4.40717K wps
[Epoch 5 Batch 390/1540] avg loss 0.00776801, throughput 4.5452K wps
[Epoch 5 Batch 420/1540] avg loss 0.00711867, throughput 5.73254K wps
[Epoch 5 Batch 450/1540] avg loss 0.00697242, throughput 4.9487K wps
[Epoch 5 Batch 480/1540] avg loss 0.00729748, throughput 5.14386K wps
[Epoch 5 Batch 510/1540] avg loss 0.00702295, throughput 5.0503K wps
[Epoch 5 Batch 540/1540] avg loss 0.00711366, throughput 4.98723K wps
[Epoch 5 Batch 570/1540] avg loss 0.00688254, throughput 5.5226K wps
[Epoch 5 Batch 600/1540] avg loss 0.00760144, throughput 5.30856K wps
[Epoch 5 Batch 630/1540] avg loss 0.0074083, throughput 5.16266K wps
[Epoch 5 Batch 660/1540] avg loss 0.00748703, throughput 5.22948K wps
[Epoch 5 Batch 690/1540] avg loss 0.00713271, throughput 4.73859K wps
[Epoch 5 Batch 720/1540] avg loss 0.00717553, throughput 5.025K wps
[Epoch 5 Batch 750/1540] avg loss 0.00706379, throughput 4.8469K wps
[Epoch 5 Batch 780/1540] avg loss 0.0072222, throughput 5.04145K wps
[Epoch 5 Batch 810/1540] avg loss 0.00677923, throughput 4.76511K wps
[Epoch 5 Batch 840/1540] avg loss 0.00744712, throughput 5.45393K wps
[Epoch 5 Batch 870/1540] avg loss 0.00717499, throughput 4.5513K wps
[Epoch 5 Batch 900/1540] avg loss 0.00719446, throughput 4.88712K wps
[Epoch 5 Batch 930/1540] avg loss 0.00735827, throughput 5.05632K wps
[Epoch 5 Batch 960/1540] avg loss 0.00729623, throughput 4.52807K wps
[Epoch 5 Batch 990/1540] avg loss 0.00670369, throughput 4.99807K wps
[Epoch 5 Batch 1020/1540] avg loss 0.00660669, throughput 4.97035K wps
[Epoch 5 Batch 1050/1540] avg loss 0.00678355, throughput 4.86849K wps
[Epoch 5 Batch 1080/1540] avg loss 0.0070567, throughput 5.5128K wps
[Epoch 5 Batch 1110/1540] avg loss 0.00696985, throughput 4.94432K wps
[Epoch 5 Batch 1140/1540] avg loss 0.00739558, throughput 5.06408K wps
[Epoch 5 Batch 1170/1540] avg loss 0.00708616, throughput 5.45766K wps
[Epoch 5 Batch 1200/1540] avg loss 0.00649615, throughput 4.76893K wps
[Epoch 5 Batch 1230/1540] avg loss 0.0072315, throughput 4.75171K wps
[Epoch 5 Batch 1260/1540] avg loss 0.00726507, throughput 4.84106K wps
[Epoch 5 Batch 1290/1540] avg loss 0.00743232, throughput 5.00764K wps
[Epoch 5 Batch 1320/1540] avg loss 0.00760743, throughput 5.24893K wps
[Epoch 5 Batch 1350/1540] avg loss 0.00699635, throughput 4.92624K wps
[Epoch 5 Batch 1380/1540] avg loss 0.00699887, throughput 4.52143K wps
[Epoch 5 Batch 1410/1540] avg loss 0.00684277, throughput 4.80524K wps
[Epoch 5 Batch 1440/1540] avg loss 0.00758892, throughput 4.72177K wps
[Epoch 5 Batch 1470/1540] avg loss 0.00704323, throughput 4.45831K wps
[Epoch 5 Batch 1500/1540] avg loss 0.00666414, throughput 5.00981K wps
[Epoch 5 Batch 1530/1540] avg loss 0.00714523, throughput 4.99025K wps
Begin Testing...
[Epoch 5] train avg loss 0.00714215, dev acc 0.8222, dev avg loss 0.404485, throughput 4.93588K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 6 Batch 30/1540] avg loss 0.00704935, throughput 4.64005K wps
[Epoch 6 Batch 60/1540] avg loss 0.00696969, throughput 4.70492K wps
[Epoch 6 Batch 90/1540] avg loss 0.00726569, throughput 5.47991K wps
[Epoch 6 Batch 120/1540] avg loss 0.00693565, throughput 4.89494K wps
[Epoch 6 Batch 150/1540] avg loss 0.00696406, throughput 5.44796K wps
[Epoch 6 Batch 180/1540] avg loss 0.00672263, throughput 5.69107K wps
[Epoch 6 Batch 210/1540] avg loss 0.00672217, throughput 4.93821K wps
[Epoch 6 Batch 240/1540] avg loss 0.00757296, throughput 4.80243K wps
[Epoch 6 Batch 270/1540] avg loss 0.00714515, throughput 4.47209K wps
[Epoch 6 Batch 300/1540] avg loss 0.00688173, throughput 5.02295K wps
[Epoch 6 Batch 330/1540] avg loss 0.00642817, throughput 5.24375K wps
[Epoch 6 Batch 360/1540] avg loss 0.00683405, throughput 4.68767K wps
[Epoch 6 Batch 390/1540] avg loss 0.00708821, throughput 4.82543K wps
[Epoch 6 Batch 420/1540] avg loss 0.00664504, throughput 5.0227K wps
[Epoch 6 Batch 450/1540] avg loss 0.00699367, throughput 4.66527K wps
[Epoch 6 Batch 480/1540] avg loss 0.00739669, throughput 5.17039K wps
[Epoch 6 Batch 510/1540] avg loss 0.00647442, throughput 4.86753K wps
[Epoch 6 Batch 540/1540] avg loss 0.00694817, throughput 5.73181K wps
[Epoch 6 Batch 570/1540] avg loss 0.00663093, throughput 5.61647K wps
[Epoch 6 Batch 600/1540] avg loss 0.00706083, throughput 4.8802K wps
[Epoch 6 Batch 630/1540] avg loss 0.00683201, throughput 5.4186K wps
[Epoch 6 Batch 660/1540] avg loss 0.00686174, throughput 4.71482K wps
[Epoch 6 Batch 690/1540] avg loss 0.00658454, throughput 4.76439K wps
[Epoch 6 Batch 720/1540] avg loss 0.0071687, throughput 4.97743K wps
[Epoch 6 Batch 750/1540] avg loss 0.00685938, throughput 4.73672K wps
[Epoch 6 Batch 780/1540] avg loss 0.00683536, throughput 4.35115K wps
[Epoch 6 Batch 810/1540] avg loss 0.00704994, throughput 4.97396K wps
[Epoch 6 Batch 840/1540] avg loss 0.00703107, throughput 5.03643K wps
[Epoch 6 Batch 870/1540] avg loss 0.0073042, throughput 4.80814K wps
[Epoch 6 Batch 900/1540] avg loss 0.00695072, throughput 4.90489K wps
[Epoch 6 Batch 930/1540] avg loss 0.00665832, throughput 4.77566K wps
[Epoch 6 Batch 960/1540] avg loss 0.00690134, throughput 4.82684K wps
[Epoch 6 Batch 990/1540] avg loss 0.00685261, throughput 4.40739K wps
[Epoch 6 Batch 1020/1540] avg loss 0.00656486, throughput 5.8215K wps
[Epoch 6 Batch 1050/1540] avg loss 0.00636435, throughput 5.02624K wps
[Epoch 6 Batch 1080/1540] avg loss 0.00736872, throughput 5.32616K wps
[Epoch 6 Batch 1110/1540] avg loss 0.00699039, throughput 5.36831K wps
[Epoch 6 Batch 1140/1540] avg loss 0.00705574, throughput 4.94597K wps
[Epoch 6 Batch 1170/1540] avg loss 0.00688179, throughput 5.17595K wps
[Epoch 6 Batch 1200/1540] avg loss 0.00699092, throughput 4.93392K wps
[Epoch 6 Batch 1230/1540] avg loss 0.00677448, throughput 5.25193K wps
[Epoch 6 Batch 1260/1540] avg loss 0.00686501, throughput 4.77526K wps
[Epoch 6 Batch 1290/1540] avg loss 0.00683827, throughput 4.95424K wps
[Epoch 6 Batch 1320/1540] avg loss 0.00664792, throughput 5.19433K wps
[Epoch 6 Batch 1350/1540] avg loss 0.00707721, throughput 4.67957K wps
[Epoch 6 Batch 1380/1540] avg loss 0.00660116, throughput 4.92897K wps
[Epoch 6 Batch 1410/1540] avg loss 0.00696541, throughput 4.42815K wps
[Epoch 6 Batch 1440/1540] avg loss 0.00720966, throughput 5.28898K wps
[Epoch 6 Batch 1470/1540] avg loss 0.00663515, throughput 4.40775K wps
[Epoch 6 Batch 1500/1540] avg loss 0.00639781, throughput 4.79335K wps
[Epoch 6 Batch 1530/1540] avg loss 0.00714464, throughput 4.60439K wps
Begin Testing...
[Epoch 6] train avg loss 0.00689893, dev acc 0.8188, dev avg loss 0.39878, throughput 4.94259K wps
[Epoch 7 Batch 30/1540] avg loss 0.00669546, throughput 5.11895K wps
[Epoch 7 Batch 60/1540] avg loss 0.00705023, throughput 4.59844K wps
[Epoch 7 Batch 90/1540] avg loss 0.00641895, throughput 4.7199K wps
[Epoch 7 Batch 120/1540] avg loss 0.00627009, throughput 4.80793K wps
[Epoch 7 Batch 150/1540] avg loss 0.00682293, throughput 5.09035K wps
[Epoch 7 Batch 180/1540] avg loss 0.00661473, throughput 4.63799K wps
[Epoch 7 Batch 210/1540] avg loss 0.00668423, throughput 5.68439K wps
[Epoch 7 Batch 240/1540] avg loss 0.00739309, throughput 4.78049K wps
[Epoch 7 Batch 270/1540] avg loss 0.00677267, throughput 4.95712K wps
[Epoch 7 Batch 300/1540] avg loss 0.00657718, throughput 5.28631K wps
[Epoch 7 Batch 330/1540] avg loss 0.00639807, throughput 4.7315K wps
[Epoch 7 Batch 360/1540] avg loss 0.0063842, throughput 4.82649K wps
[Epoch 7 Batch 390/1540] avg loss 0.00659191, throughput 5.20272K wps
[Epoch 7 Batch 420/1540] avg loss 0.00686328, throughput 4.77543K wps
[Epoch 7 Batch 450/1540] avg loss 0.00724698, throughput 5.12966K wps
[Epoch 7 Batch 480/1540] avg loss 0.00661914, throughput 4.52076K wps
[Epoch 7 Batch 510/1540] avg loss 0.0068893, throughput 5.14797K wps
[Epoch 7 Batch 540/1540] avg loss 0.00668039, throughput 4.96417K wps
[Epoch 7 Batch 570/1540] avg loss 0.00678658, throughput 5.11937K wps
[Epoch 7 Batch 600/1540] avg loss 0.00665905, throughput 5.02389K wps
[Epoch 7 Batch 630/1540] avg loss 0.0067972, throughput 4.44941K wps
[Epoch 7 Batch 660/1540] avg loss 0.00706695, throughput 4.97246K wps
[Epoch 7 Batch 690/1540] avg loss 0.00695558, throughput 4.92014K wps
[Epoch 7 Batch 720/1540] avg loss 0.00721423, throughput 4.99954K wps
[Epoch 7 Batch 750/1540] avg loss 0.00688935, throughput 4.74802K wps
[Epoch 7 Batch 780/1540] avg loss 0.00670113, throughput 4.84359K wps
[Epoch 7 Batch 810/1540] avg loss 0.00621113, throughput 4.74301K wps
[Epoch 7 Batch 840/1540] avg loss 0.00694913, throughput 4.84962K wps
[Epoch 7 Batch 870/1540] avg loss 0.00699547, throughput 5.15138K wps
[Epoch 7 Batch 900/1540] avg loss 0.00647996, throughput 4.84607K wps
[Epoch 7 Batch 930/1540] avg loss 0.0061732, throughput 5.05863K wps
[Epoch 7 Batch 960/1540] avg loss 0.00679937, throughput 4.69038K wps
[Epoch 7 Batch 990/1540] avg loss 0.0065634, throughput 4.67384K wps
[Epoch 7 Batch 1020/1540] avg loss 0.00642505, throughput 5.13471K wps
[Epoch 7 Batch 1050/1540] avg loss 0.00672234, throughput 4.93525K wps
[Epoch 7 Batch 1080/1540] avg loss 0.00614129, throughput 5.45903K wps
[Epoch 7 Batch 1110/1540] avg loss 0.00625274, throughput 5.01344K wps
[Epoch 7 Batch 1140/1540] avg loss 0.00676808, throughput 5.29162K wps
[Epoch 7 Batch 1170/1540] avg loss 0.00648628, throughput 5.12402K wps
[Epoch 7 Batch 1200/1540] avg loss 0.00678844, throughput 5.00745K wps
[Epoch 7 Batch 1230/1540] avg loss 0.00694411, throughput 4.60836K wps
[Epoch 7 Batch 1260/1540] avg loss 0.00689851, throughput 5.03478K wps
[Epoch 7 Batch 1290/1540] avg loss 0.00668537, throughput 5.18988K wps
[Epoch 7 Batch 1320/1540] avg loss 0.00691833, throughput 4.77831K wps
[Epoch 7 Batch 1350/1540] avg loss 0.00635849, throughput 4.61379K wps
[Epoch 7 Batch 1380/1540] avg loss 0.00668575, throughput 4.90448K wps
[Epoch 7 Batch 1410/1540] avg loss 0.00630767, throughput 4.69739K wps
[Epoch 7 Batch 1440/1540] avg loss 0.00688852, throughput 5.2015K wps
[Epoch 7 Batch 1470/1540] avg loss 0.0064806, throughput 4.37748K wps
[Epoch 7 Batch 1500/1540] avg loss 0.0065121, throughput 5.30206K wps
[Epoch 7 Batch 1530/1540] avg loss 0.00679277, throughput 5.42781K wps
Begin Testing...
[Epoch 7] train avg loss 0.00669008, dev acc 0.8245, dev avg loss 0.395516, throughput 4.93024K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 8 Batch 30/1540] avg loss 0.00618719, throughput 4.43013K wps
[Epoch 8 Batch 60/1540] avg loss 0.00620646, throughput 4.44691K wps
[Epoch 8 Batch 90/1540] avg loss 0.00621256, throughput 5.57861K wps
[Epoch 8 Batch 120/1540] avg loss 0.00674495, throughput 4.91091K wps
[Epoch 8 Batch 150/1540] avg loss 0.00651294, throughput 5.10724K wps
[Epoch 8 Batch 180/1540] avg loss 0.00659201, throughput 4.76095K wps
[Epoch 8 Batch 210/1540] avg loss 0.00621828, throughput 4.86951K wps
[Epoch 8 Batch 240/1540] avg loss 0.0065963, throughput 4.87155K wps
[Epoch 8 Batch 270/1540] avg loss 0.0063315, throughput 4.75811K wps
[Epoch 8 Batch 300/1540] avg loss 0.0065438, throughput 5.01469K wps
[Epoch 8 Batch 330/1540] avg loss 0.00653663, throughput 4.97502K wps
[Epoch 8 Batch 360/1540] avg loss 0.00614317, throughput 4.53591K wps
[Epoch 8 Batch 390/1540] avg loss 0.00606554, throughput 5.03717K wps
[Epoch 8 Batch 420/1540] avg loss 0.00639973, throughput 5.28977K wps
[Epoch 8 Batch 450/1540] avg loss 0.0064332, throughput 5.12238K wps
[Epoch 8 Batch 480/1540] avg loss 0.00667194, throughput 4.98138K wps
[Epoch 8 Batch 510/1540] avg loss 0.00627198, throughput 5.52604K wps
[Epoch 8 Batch 540/1540] avg loss 0.0065515, throughput 5.30895K wps
[Epoch 8 Batch 570/1540] avg loss 0.00673688, throughput 5.63098K wps
[Epoch 8 Batch 600/1540] avg loss 0.00651553, throughput 5.06138K wps
[Epoch 8 Batch 630/1540] avg loss 0.00638527, throughput 4.52609K wps
[Epoch 8 Batch 660/1540] avg loss 0.00638196, throughput 4.73905K wps
[Epoch 8 Batch 690/1540] avg loss 0.00687543, throughput 4.96013K wps
[Epoch 8 Batch 720/1540] avg loss 0.00627399, throughput 4.97058K wps
[Epoch 8 Batch 750/1540] avg loss 0.00664053, throughput 4.9505K wps
[Epoch 8 Batch 780/1540] avg loss 0.006283, throughput 5.0792K wps
[Epoch 8 Batch 810/1540] avg loss 0.00645539, throughput 4.78092K wps
[Epoch 8 Batch 840/1540] avg loss 0.00656648, throughput 4.73762K wps
[Epoch 8 Batch 870/1540] avg loss 0.00664858, throughput 5.08503K wps
[Epoch 8 Batch 900/1540] avg loss 0.00601162, throughput 5.30497K wps
[Epoch 8 Batch 930/1540] avg loss 0.00641602, throughput 5.26501K wps
[Epoch 8 Batch 960/1540] avg loss 0.00609971, throughput 5.08921K wps
[Epoch 8 Batch 990/1540] avg loss 0.00654285, throughput 4.63559K wps
[Epoch 8 Batch 1020/1540] avg loss 0.00700242, throughput 4.94147K wps
[Epoch 8 Batch 1050/1540] avg loss 0.00655382, throughput 4.86788K wps
[Epoch 8 Batch 1080/1540] avg loss 0.00615387, throughput 5.0269K wps
[Epoch 8 Batch 1110/1540] avg loss 0.00645727, throughput 5.18308K wps
[Epoch 8 Batch 1140/1540] avg loss 0.00646993, throughput 4.77045K wps
[Epoch 8 Batch 1170/1540] avg loss 0.00649111, throughput 5.98927K wps
[Epoch 8 Batch 1200/1540] avg loss 0.00733493, throughput 4.82798K wps
[Epoch 8 Batch 1230/1540] avg loss 0.00675547, throughput 4.8619K wps
[Epoch 8 Batch 1260/1540] avg loss 0.00663339, throughput 4.84764K wps
[Epoch 8 Batch 1290/1540] avg loss 0.00636892, throughput 4.50274K wps
[Epoch 8 Batch 1320/1540] avg loss 0.00661846, throughput 4.62669K wps
[Epoch 8 Batch 1350/1540] avg loss 0.0063858, throughput 4.49483K wps
[Epoch 8 Batch 1380/1540] avg loss 0.00652565, throughput 5.4977K wps
[Epoch 8 Batch 1410/1540] avg loss 0.00603569, throughput 4.9724K wps
[Epoch 8 Batch 1440/1540] avg loss 0.00639871, throughput 5.32516K wps
[Epoch 8 Batch 1470/1540] avg loss 0.00664515, throughput 5.17408K wps
[Epoch 8 Batch 1500/1540] avg loss 0.00644675, throughput 4.62168K wps
[Epoch 8 Batch 1530/1540] avg loss 0.00628025, throughput 5.13233K wps
Begin Testing...
[Epoch 8] train avg loss 0.00646534, dev acc 0.8268, dev avg loss 0.390527, throughput 4.95875K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 9 Batch 30/1540] avg loss 0.00629277, throughput 4.85584K wps
[Epoch 9 Batch 60/1540] avg loss 0.00642477, throughput 5.08184K wps
[Epoch 9 Batch 90/1540] avg loss 0.00603803, throughput 5.17993K wps
[Epoch 9 Batch 120/1540] avg loss 0.00644385, throughput 4.8294K wps
[Epoch 9 Batch 150/1540] avg loss 0.00671991, throughput 4.69572K wps
[Epoch 9 Batch 180/1540] avg loss 0.00642774, throughput 4.86739K wps
[Epoch 9 Batch 210/1540] avg loss 0.00668729, throughput 5.06534K wps
[Epoch 9 Batch 240/1540] avg loss 0.00622879, throughput 5.46824K wps
[Epoch 9 Batch 270/1540] avg loss 0.00615117, throughput 4.98604K wps
[Epoch 9 Batch 300/1540] avg loss 0.00646761, throughput 5.14326K wps
[Epoch 9 Batch 330/1540] avg loss 0.00605138, throughput 4.62324K wps
[Epoch 9 Batch 360/1540] avg loss 0.00641753, throughput 4.92388K wps
[Epoch 9 Batch 390/1540] avg loss 0.00601617, throughput 4.58493K wps
[Epoch 9 Batch 420/1540] avg loss 0.00601013, throughput 5.66067K wps
[Epoch 9 Batch 450/1540] avg loss 0.00590578, throughput 5.35177K wps
[Epoch 9 Batch 480/1540] avg loss 0.00630066, throughput 4.89131K wps
[Epoch 9 Batch 510/1540] avg loss 0.0060868, throughput 4.5686K wps
[Epoch 9 Batch 540/1540] avg loss 0.00601163, throughput 4.78311K wps
[Epoch 9 Batch 570/1540] avg loss 0.00653387, throughput 5.89255K wps
[Epoch 9 Batch 600/1540] avg loss 0.00624479, throughput 4.77894K wps
[Epoch 9 Batch 630/1540] avg loss 0.00604674, throughput 4.58391K wps
[Epoch 9 Batch 660/1540] avg loss 0.00681771, throughput 4.86349K wps
[Epoch 9 Batch 690/1540] avg loss 0.00605106, throughput 4.68017K wps
[Epoch 9 Batch 720/1540] avg loss 0.00633203, throughput 4.92995K wps
[Epoch 9 Batch 750/1540] avg loss 0.0062838, throughput 5.25976K wps
[Epoch 9 Batch 780/1540] avg loss 0.00608783, throughput 4.82522K wps
[Epoch 9 Batch 810/1540] avg loss 0.00641975, throughput 5.27671K wps
[Epoch 9 Batch 840/1540] avg loss 0.00595523, throughput 4.92727K wps
[Epoch 9 Batch 870/1540] avg loss 0.00642018, throughput 4.68123K wps
[Epoch 9 Batch 900/1540] avg loss 0.006272, throughput 4.48294K wps
[Epoch 9 Batch 930/1540] avg loss 0.00677624, throughput 4.62427K wps
[Epoch 9 Batch 960/1540] avg loss 0.00607322, throughput 4.621K wps
[Epoch 9 Batch 990/1540] avg loss 0.00634481, throughput 5.146K wps
[Epoch 9 Batch 1020/1540] avg loss 0.00635884, throughput 5.15007K wps
[Epoch 9 Batch 1050/1540] avg loss 0.00663712, throughput 4.86902K wps
[Epoch 9 Batch 1080/1540] avg loss 0.00623741, throughput 4.9204K wps
[Epoch 9 Batch 1110/1540] avg loss 0.00613929, throughput 4.99752K wps
[Epoch 9 Batch 1140/1540] avg loss 0.00638847, throughput 5.17768K wps
[Epoch 9 Batch 1170/1540] avg loss 0.00639421, throughput 4.63448K wps
[Epoch 9 Batch 1200/1540] avg loss 0.00631949, throughput 4.81599K wps
[Epoch 9 Batch 1230/1540] avg loss 0.00619139, throughput 4.5118K wps
[Epoch 9 Batch 1260/1540] avg loss 0.00657995, throughput 4.699K wps
[Epoch 9 Batch 1290/1540] avg loss 0.00593306, throughput 4.59952K wps
[Epoch 9 Batch 1320/1540] avg loss 0.00634505, throughput 5.08618K wps
[Epoch 9 Batch 1350/1540] avg loss 0.0060504, throughput 4.51273K wps
[Epoch 9 Batch 1380/1540] avg loss 0.00643505, throughput 4.61099K wps
[Epoch 9 Batch 1410/1540] avg loss 0.00627511, throughput 4.69747K wps
[Epoch 9 Batch 1440/1540] avg loss 0.00617321, throughput 5.30779K wps
[Epoch 9 Batch 1470/1540] avg loss 0.00646775, throughput 5.12134K wps
[Epoch 9 Batch 1500/1540] avg loss 0.00635952, throughput 4.67871K wps
[Epoch 9 Batch 1530/1540] avg loss 0.00603203, throughput 5.09995K wps
Begin Testing...
[Epoch 9] train avg loss 0.00630189, dev acc 0.8280, dev avg loss 0.389297, throughput 4.89858K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 10 Batch 30/1540] avg loss 0.00593132, throughput 4.60558K wps
[Epoch 10 Batch 60/1540] avg loss 0.00602175, throughput 4.97986K wps
[Epoch 10 Batch 90/1540] avg loss 0.00644272, throughput 4.63513K wps
[Epoch 10 Batch 120/1540] avg loss 0.00586856, throughput 4.45316K wps
[Epoch 10 Batch 150/1540] avg loss 0.00600659, throughput 4.72687K wps
[Epoch 10 Batch 180/1540] avg loss 0.00575493, throughput 4.61924K wps
[Epoch 10 Batch 210/1540] avg loss 0.00585061, throughput 5.21677K wps
[Epoch 10 Batch 240/1540] avg loss 0.00662594, throughput 4.71996K wps
[Epoch 10 Batch 270/1540] avg loss 0.00652157, throughput 5.85439K wps
[Epoch 10 Batch 300/1540] avg loss 0.0060541, throughput 4.7409K wps
[Epoch 10 Batch 330/1540] avg loss 0.00588738, throughput 5.33116K wps
[Epoch 10 Batch 360/1540] avg loss 0.00590786, throughput 5.39197K wps
[Epoch 10 Batch 390/1540] avg loss 0.00614246, throughput 4.61764K wps
[Epoch 10 Batch 420/1540] avg loss 0.00599346, throughput 4.75695K wps
[Epoch 10 Batch 450/1540] avg loss 0.00600313, throughput 5.08738K wps
[Epoch 10 Batch 480/1540] avg loss 0.00649259, throughput 4.74K wps
[Epoch 10 Batch 510/1540] avg loss 0.00593357, throughput 4.70399K wps
[Epoch 10 Batch 540/1540] avg loss 0.00621807, throughput 4.83764K wps
[Epoch 10 Batch 570/1540] avg loss 0.00629482, throughput 4.65743K wps
[Epoch 10 Batch 600/1540] avg loss 0.00651628, throughput 4.79638K wps
[Epoch 10 Batch 630/1540] avg loss 0.00675736, throughput 4.6167K wps
[Epoch 10 Batch 660/1540] avg loss 0.00620789, throughput 4.41092K wps
[Epoch 10 Batch 690/1540] avg loss 0.00561796, throughput 4.8846K wps
[Epoch 10 Batch 720/1540] avg loss 0.00599384, throughput 5.49816K wps
[Epoch 10 Batch 750/1540] avg loss 0.00620521, throughput 5.347K wps
[Epoch 10 Batch 780/1540] avg loss 0.00673467, throughput 5.12839K wps
[Epoch 10 Batch 810/1540] avg loss 0.00588667, throughput 4.70651K wps
[Epoch 10 Batch 840/1540] avg loss 0.00605031, throughput 4.75165K wps
[Epoch 10 Batch 870/1540] avg loss 0.0064442, throughput 4.63034K wps
[Epoch 10 Batch 900/1540] avg loss 0.00636826, throughput 4.83909K wps
[Epoch 10 Batch 930/1540] avg loss 0.00619101, throughput 5.14264K wps
[Epoch 10 Batch 960/1540] avg loss 0.00610976, throughput 4.59095K wps
[Epoch 10 Batch 990/1540] avg loss 0.00631808, throughput 4.87887K wps
[Epoch 10 Batch 1020/1540] avg loss 0.00640393, throughput 4.54537K wps
[Epoch 10 Batch 1050/1540] avg loss 0.00632192, throughput 4.51323K wps
[Epoch 10 Batch 1080/1540] avg loss 0.00705651, throughput 4.65306K wps
[Epoch 10 Batch 1110/1540] avg loss 0.00597271, throughput 5.03687K wps
[Epoch 10 Batch 1140/1540] avg loss 0.00592656, throughput 4.97672K wps
[Epoch 10 Batch 1170/1540] avg loss 0.00598906, throughput 5.35528K wps
[Epoch 10 Batch 1200/1540] avg loss 0.00619691, throughput 4.9448K wps
[Epoch 10 Batch 1230/1540] avg loss 0.00618302, throughput 4.54242K wps
[Epoch 10 Batch 1260/1540] avg loss 0.00621743, throughput 4.60964K wps
[Epoch 10 Batch 1290/1540] avg loss 0.00626494, throughput 5.01495K wps
[Epoch 10 Batch 1320/1540] avg loss 0.00582336, throughput 5.13625K wps
[Epoch 10 Batch 1350/1540] avg loss 0.0058217, throughput 4.89107K wps
[Epoch 10 Batch 1380/1540] avg loss 0.00583907, throughput 5.10178K wps
[Epoch 10 Batch 1410/1540] avg loss 0.00611433, throughput 5.54443K wps
[Epoch 10 Batch 1440/1540] avg loss 0.00633245, throughput 4.8686K wps
[Epoch 10 Batch 1470/1540] avg loss 0.00602884, throughput 4.74788K wps
[Epoch 10 Batch 1500/1540] avg loss 0.00627551, throughput 4.65557K wps
[Epoch 10 Batch 1530/1540] avg loss 0.00593028, throughput 5.16407K wps
Begin Testing...
[Epoch 10] train avg loss 0.00616365, dev acc 0.8257, dev avg loss 0.385045, throughput 4.86889K wps
[Epoch 11 Batch 30/1540] avg loss 0.00647583, throughput 5.3984K wps
[Epoch 11 Batch 60/1540] avg loss 0.00678192, throughput 4.6501K wps
[Epoch 11 Batch 90/1540] avg loss 0.00568507, throughput 5.21337K wps
[Epoch 11 Batch 120/1540] avg loss 0.00602612, throughput 5.32671K wps
[Epoch 11 Batch 150/1540] avg loss 0.00630723, throughput 4.62174K wps
[Epoch 11 Batch 180/1540] avg loss 0.00579428, throughput 5.25829K wps
[Epoch 11 Batch 210/1540] avg loss 0.00619804, throughput 5.42928K wps
[Epoch 11 Batch 240/1540] avg loss 0.00591598, throughput 4.65235K wps
[Epoch 11 Batch 270/1540] avg loss 0.00577443, throughput 4.4969K wps
[Epoch 11 Batch 300/1540] avg loss 0.00597607, throughput 4.6047K wps
[Epoch 11 Batch 330/1540] avg loss 0.0065973, throughput 5.00109K wps
[Epoch 11 Batch 360/1540] avg loss 0.00568143, throughput 4.92454K wps
[Epoch 11 Batch 390/1540] avg loss 0.00598992, throughput 5.07784K wps
[Epoch 11 Batch 420/1540] avg loss 0.00626758, throughput 5.36216K wps
[Epoch 11 Batch 450/1540] avg loss 0.00613, throughput 5.0043K wps
[Epoch 11 Batch 480/1540] avg loss 0.00617572, throughput 4.66022K wps
[Epoch 11 Batch 510/1540] avg loss 0.00596682, throughput 4.92671K wps
[Epoch 11 Batch 540/1540] avg loss 0.00593657, throughput 5.92612K wps
[Epoch 11 Batch 570/1540] avg loss 0.00529007, throughput 4.54222K wps
[Epoch 11 Batch 600/1540] avg loss 0.00527764, throughput 4.61156K wps
[Epoch 11 Batch 630/1540] avg loss 0.00588233, throughput 4.56804K wps
[Epoch 11 Batch 660/1540] avg loss 0.00599488, throughput 5.06995K wps
[Epoch 11 Batch 690/1540] avg loss 0.00557013, throughput 4.81343K wps
[Epoch 11 Batch 720/1540] avg loss 0.00593011, throughput 5.26234K wps
[Epoch 11 Batch 750/1540] avg loss 0.00572813, throughput 4.65919K wps
[Epoch 11 Batch 780/1540] avg loss 0.00553646, throughput 4.52473K wps
[Epoch 11 Batch 810/1540] avg loss 0.00616459, throughput 5.36764K wps
[Epoch 11 Batch 840/1540] avg loss 0.00574496, throughput 5.15123K wps
[Epoch 11 Batch 870/1540] avg loss 0.00611395, throughput 4.71723K wps
[Epoch 11 Batch 900/1540] avg loss 0.00601689, throughput 5.17357K wps
[Epoch 11 Batch 930/1540] avg loss 0.00601963, throughput 5.11982K wps
[Epoch 11 Batch 960/1540] avg loss 0.00603421, throughput 5.45126K wps
[Epoch 11 Batch 990/1540] avg loss 0.00632794, throughput 4.4671K wps
[Epoch 11 Batch 1020/1540] avg loss 0.0061001, throughput 4.89771K wps
[Epoch 11 Batch 1050/1540] avg loss 0.00580262, throughput 5.38868K wps
[Epoch 11 Batch 1080/1540] avg loss 0.00580099, throughput 4.65203K wps
[Epoch 11 Batch 1110/1540] avg loss 0.0057805, throughput 4.67377K wps
[Epoch 11 Batch 1140/1540] avg loss 0.00636554, throughput 4.55083K wps
[Epoch 11 Batch 1170/1540] avg loss 0.00584727, throughput 5.24711K wps
[Epoch 11 Batch 1200/1540] avg loss 0.0061151, throughput 5.40543K wps
[Epoch 11 Batch 1230/1540] avg loss 0.00589704, throughput 5.16466K wps
[Epoch 11 Batch 1260/1540] avg loss 0.00616929, throughput 5.06357K wps
[Epoch 11 Batch 1290/1540] avg loss 0.00568482, throughput 4.68159K wps
[Epoch 11 Batch 1320/1540] avg loss 0.00607028, throughput 5.7864K wps
[Epoch 11 Batch 1350/1540] avg loss 0.00625382, throughput 5.10779K wps
[Epoch 11 Batch 1380/1540] avg loss 0.00592006, throughput 5.17063K wps
[Epoch 11 Batch 1410/1540] avg loss 0.00596668, throughput 5.02836K wps
[Epoch 11 Batch 1440/1540] avg loss 0.00632431, throughput 4.58683K wps
[Epoch 11 Batch 1470/1540] avg loss 0.00571543, throughput 4.75914K wps
[Epoch 11 Batch 1500/1540] avg loss 0.0061747, throughput 4.60195K wps
[Epoch 11 Batch 1530/1540] avg loss 0.00621668, throughput 4.83935K wps
Begin Testing...
[Epoch 11] train avg loss 0.00599554, dev acc 0.8303, dev avg loss 0.380721, throughput 4.95521K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 12 Batch 30/1540] avg loss 0.00587165, throughput 4.76415K wps
[Epoch 12 Batch 60/1540] avg loss 0.00564009, throughput 4.83521K wps
[Epoch 12 Batch 90/1540] avg loss 0.00624825, throughput 5.43684K wps
[Epoch 12 Batch 120/1540] avg loss 0.00623371, throughput 5.45392K wps
[Epoch 12 Batch 150/1540] avg loss 0.00612706, throughput 4.4213K wps
[Epoch 12 Batch 180/1540] avg loss 0.00576999, throughput 5.09025K wps
[Epoch 12 Batch 210/1540] avg loss 0.00561715, throughput 5.00797K wps
[Epoch 12 Batch 240/1540] avg loss 0.00563777, throughput 4.94442K wps
[Epoch 12 Batch 270/1540] avg loss 0.00608494, throughput 5.50988K wps
[Epoch 12 Batch 300/1540] avg loss 0.00607747, throughput 5.71201K wps
[Epoch 12 Batch 330/1540] avg loss 0.00573197, throughput 4.85435K wps
[Epoch 12 Batch 360/1540] avg loss 0.00585258, throughput 4.88242K wps
[Epoch 12 Batch 390/1540] avg loss 0.00549008, throughput 5.06699K wps
[Epoch 12 Batch 420/1540] avg loss 0.00652446, throughput 5.01265K wps
[Epoch 12 Batch 450/1540] avg loss 0.0058879, throughput 5.17289K wps
[Epoch 12 Batch 480/1540] avg loss 0.00646773, throughput 4.60731K wps
[Epoch 12 Batch 510/1540] avg loss 0.00527071, throughput 4.65488K wps
[Epoch 12 Batch 540/1540] avg loss 0.0056131, throughput 5.0021K wps
[Epoch 12 Batch 570/1540] avg loss 0.00623475, throughput 4.73968K wps
[Epoch 12 Batch 600/1540] avg loss 0.00564464, throughput 5.24563K wps
[Epoch 12 Batch 630/1540] avg loss 0.00607324, throughput 4.74203K wps
[Epoch 12 Batch 660/1540] avg loss 0.00618397, throughput 4.81551K wps
[Epoch 12 Batch 690/1540] avg loss 0.00547228, throughput 4.6218K wps
[Epoch 12 Batch 720/1540] avg loss 0.00581143, throughput 4.58631K wps
[Epoch 12 Batch 750/1540] avg loss 0.00577144, throughput 4.58536K wps
[Epoch 12 Batch 780/1540] avg loss 0.00573656, throughput 5.32115K wps
[Epoch 12 Batch 810/1540] avg loss 0.00596498, throughput 4.81703K wps
[Epoch 12 Batch 840/1540] avg loss 0.00601744, throughput 4.74825K wps
[Epoch 12 Batch 870/1540] avg loss 0.00605746, throughput 4.71718K wps
[Epoch 12 Batch 900/1540] avg loss 0.00607583, throughput 5.01574K wps
[Epoch 12 Batch 930/1540] avg loss 0.005747, throughput 5.24014K wps
[Epoch 12 Batch 960/1540] avg loss 0.00591694, throughput 4.64867K wps
[Epoch 12 Batch 990/1540] avg loss 0.00576314, throughput 5.6564K wps
[Epoch 12 Batch 1020/1540] avg loss 0.00549245, throughput 5.30804K wps
[Epoch 12 Batch 1050/1540] avg loss 0.00557669, throughput 5.03667K wps
[Epoch 12 Batch 1080/1540] avg loss 0.00562719, throughput 4.63907K wps
[Epoch 12 Batch 1110/1540] avg loss 0.00582459, throughput 5.5174K wps
[Epoch 12 Batch 1140/1540] avg loss 0.00527699, throughput 4.88099K wps
[Epoch 12 Batch 1170/1540] avg loss 0.00599458, throughput 4.77032K wps
[Epoch 12 Batch 1200/1540] avg loss 0.00566801, throughput 5.92715K wps
[Epoch 12 Batch 1230/1540] avg loss 0.00607112, throughput 4.93554K wps
[Epoch 12 Batch 1260/1540] avg loss 0.00570253, throughput 5.20274K wps
[Epoch 12 Batch 1290/1540] avg loss 0.00617666, throughput 4.69022K wps
[Epoch 12 Batch 1320/1540] avg loss 0.00577748, throughput 4.97291K wps
[Epoch 12 Batch 1350/1540] avg loss 0.00588248, throughput 4.46992K wps
[Epoch 12 Batch 1380/1540] avg loss 0.0055156, throughput 4.90747K wps
[Epoch 12 Batch 1410/1540] avg loss 0.00586411, throughput 5.61045K wps
[Epoch 12 Batch 1440/1540] avg loss 0.00564744, throughput 4.80898K wps
[Epoch 12 Batch 1470/1540] avg loss 0.00537426, throughput 4.43501K wps
[Epoch 12 Batch 1500/1540] avg loss 0.00543516, throughput 4.50512K wps
[Epoch 12 Batch 1530/1540] avg loss 0.00583922, throughput 4.53243K wps
Begin Testing...
[Epoch 12] train avg loss 0.0058327, dev acc 0.8326, dev avg loss 0.378583, throughput 4.93365K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 13 Batch 30/1540] avg loss 0.00573476, throughput 4.76356K wps
[Epoch 13 Batch 60/1540] avg loss 0.00622988, throughput 4.79169K wps
[Epoch 13 Batch 90/1540] avg loss 0.00622492, throughput 4.9005K wps
[Epoch 13 Batch 120/1540] avg loss 0.00556149, throughput 4.85111K wps
[Epoch 13 Batch 150/1540] avg loss 0.00553308, throughput 5.21456K wps
[Epoch 13 Batch 180/1540] avg loss 0.00591293, throughput 4.5006K wps
[Epoch 13 Batch 210/1540] avg loss 0.00570399, throughput 4.78627K wps
[Epoch 13 Batch 240/1540] avg loss 0.00550606, throughput 4.93046K wps
[Epoch 13 Batch 270/1540] avg loss 0.00529951, throughput 4.70372K wps
[Epoch 13 Batch 300/1540] avg loss 0.00569936, throughput 4.86383K wps
[Epoch 13 Batch 330/1540] avg loss 0.0055246, throughput 4.70449K wps
[Epoch 13 Batch 360/1540] avg loss 0.00562409, throughput 5.34486K wps
[Epoch 13 Batch 390/1540] avg loss 0.00543915, throughput 4.8981K wps
[Epoch 13 Batch 420/1540] avg loss 0.00601463, throughput 4.99167K wps
[Epoch 13 Batch 450/1540] avg loss 0.00531138, throughput 4.59693K wps
[Epoch 13 Batch 480/1540] avg loss 0.00569795, throughput 5.42009K wps
[Epoch 13 Batch 510/1540] avg loss 0.00570207, throughput 4.58454K wps
[Epoch 13 Batch 540/1540] avg loss 0.00576285, throughput 5.0146K wps
[Epoch 13 Batch 570/1540] avg loss 0.00546453, throughput 5.19283K wps
[Epoch 13 Batch 600/1540] avg loss 0.00557464, throughput 4.76828K wps
[Epoch 13 Batch 630/1540] avg loss 0.00612499, throughput 4.46944K wps
[Epoch 13 Batch 660/1540] avg loss 0.00580373, throughput 5.34495K wps
[Epoch 13 Batch 690/1540] avg loss 0.00529106, throughput 4.8798K wps
[Epoch 13 Batch 720/1540] avg loss 0.00580095, throughput 4.88828K wps
[Epoch 13 Batch 750/1540] avg loss 0.00576916, throughput 5.02039K wps
[Epoch 13 Batch 780/1540] avg loss 0.00558403, throughput 5.53732K wps
[Epoch 13 Batch 810/1540] avg loss 0.0054347, throughput 4.75321K wps
[Epoch 13 Batch 840/1540] avg loss 0.00553654, throughput 4.69458K wps
[Epoch 13 Batch 870/1540] avg loss 0.00587738, throughput 5.44493K wps
[Epoch 13 Batch 900/1540] avg loss 0.00595381, throughput 4.6639K wps
[Epoch 13 Batch 930/1540] avg loss 0.00576455, throughput 5.08297K wps
[Epoch 13 Batch 960/1540] avg loss 0.00547744, throughput 5.00206K wps
[Epoch 13 Batch 990/1540] avg loss 0.00548522, throughput 5.28186K wps
[Epoch 13 Batch 1020/1540] avg loss 0.00580154, throughput 4.5923K wps
[Epoch 13 Batch 1050/1540] avg loss 0.00548923, throughput 5.17322K wps
[Epoch 13 Batch 1080/1540] avg loss 0.00595975, throughput 4.77496K wps
[Epoch 13 Batch 1110/1540] avg loss 0.00561033, throughput 4.50441K wps
[Epoch 13 Batch 1140/1540] avg loss 0.00591734, throughput 4.94277K wps
[Epoch 13 Batch 1170/1540] avg loss 0.00600845, throughput 4.78241K wps
[Epoch 13 Batch 1200/1540] avg loss 0.00600523, throughput 4.413K wps
[Epoch 13 Batch 1230/1540] avg loss 0.00553442, throughput 4.35886K wps
[Epoch 13 Batch 1260/1540] avg loss 0.00552334, throughput 4.89666K wps
[Epoch 13 Batch 1290/1540] avg loss 0.00576717, throughput 4.7956K wps
[Epoch 13 Batch 1320/1540] avg loss 0.00573107, throughput 5.99514K wps
[Epoch 13 Batch 1350/1540] avg loss 0.00531927, throughput 4.9616K wps
[Epoch 13 Batch 1380/1540] avg loss 0.00528411, throughput 5.3134K wps
[Epoch 13 Batch 1410/1540] avg loss 0.00584158, throughput 5.15118K wps
[Epoch 13 Batch 1440/1540] avg loss 0.00603932, throughput 4.72209K wps
[Epoch 13 Batch 1470/1540] avg loss 0.00531644, throughput 5.36984K wps
[Epoch 13 Batch 1500/1540] avg loss 0.00573372, throughput 4.83584K wps
[Epoch 13 Batch 1530/1540] avg loss 0.00574675, throughput 4.64496K wps
Begin Testing...
[Epoch 13] train avg loss 0.00568667, dev acc 0.8326, dev avg loss 0.377951, throughput 4.90246K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 14 Batch 30/1540] avg loss 0.0057753, throughput 4.94583K wps
[Epoch 14 Batch 60/1540] avg loss 0.00548055, throughput 5.3813K wps
[Epoch 14 Batch 90/1540] avg loss 0.00533923, throughput 4.88088K wps
[Epoch 14 Batch 120/1540] avg loss 0.00572881, throughput 4.50406K wps
[Epoch 14 Batch 150/1540] avg loss 0.00578054, throughput 4.59814K wps
[Epoch 14 Batch 180/1540] avg loss 0.0054922, throughput 4.57655K wps
[Epoch 14 Batch 210/1540] avg loss 0.00559665, throughput 4.44863K wps
[Epoch 14 Batch 240/1540] avg loss 0.00561248, throughput 5.1966K wps
[Epoch 14 Batch 270/1540] avg loss 0.0060724, throughput 4.5479K wps
[Epoch 14 Batch 300/1540] avg loss 0.00578778, throughput 4.67823K wps
[Epoch 14 Batch 330/1540] avg loss 0.00574801, throughput 4.92878K wps
[Epoch 14 Batch 360/1540] avg loss 0.00543975, throughput 5.25362K wps
[Epoch 14 Batch 390/1540] avg loss 0.00543681, throughput 4.70716K wps
[Epoch 14 Batch 420/1540] avg loss 0.00546873, throughput 4.78782K wps
[Epoch 14 Batch 450/1540] avg loss 0.00584092, throughput 4.47198K wps
[Epoch 14 Batch 480/1540] avg loss 0.00519796, throughput 4.75468K wps
[Epoch 14 Batch 510/1540] avg loss 0.00547505, throughput 5.52628K wps
[Epoch 14 Batch 540/1540] avg loss 0.00586897, throughput 4.90044K wps
[Epoch 14 Batch 570/1540] avg loss 0.00573828, throughput 4.83875K wps
[Epoch 14 Batch 600/1540] avg loss 0.00549512, throughput 4.77885K wps
[Epoch 14 Batch 630/1540] avg loss 0.00546624, throughput 5.63048K wps
[Epoch 14 Batch 660/1540] avg loss 0.00578842, throughput 4.72797K wps
[Epoch 14 Batch 690/1540] avg loss 0.00554871, throughput 4.78583K wps
[Epoch 14 Batch 720/1540] avg loss 0.00554355, throughput 5.02397K wps
[Epoch 14 Batch 750/1540] avg loss 0.00579277, throughput 4.83576K wps
[Epoch 14 Batch 780/1540] avg loss 0.00529954, throughput 4.77236K wps
[Epoch 14 Batch 810/1540] avg loss 0.00556114, throughput 5.18526K wps
[Epoch 14 Batch 840/1540] avg loss 0.00504848, throughput 4.81958K wps
[Epoch 14 Batch 870/1540] avg loss 0.00540756, throughput 4.78654K wps
[Epoch 14 Batch 900/1540] avg loss 0.0055167, throughput 4.53367K wps
[Epoch 14 Batch 930/1540] avg loss 0.00535282, throughput 4.92871K wps
[Epoch 14 Batch 960/1540] avg loss 0.00571444, throughput 4.82098K wps
[Epoch 14 Batch 990/1540] avg loss 0.00581974, throughput 5.86149K wps
[Epoch 14 Batch 1020/1540] avg loss 0.00548258, throughput 4.92534K wps
[Epoch 14 Batch 1050/1540] avg loss 0.00528629, throughput 4.79208K wps
[Epoch 14 Batch 1080/1540] avg loss 0.00592048, throughput 4.93734K wps
[Epoch 14 Batch 1110/1540] avg loss 0.00550127, throughput 5.58263K wps
[Epoch 14 Batch 1140/1540] avg loss 0.00601628, throughput 4.62362K wps
[Epoch 14 Batch 1170/1540] avg loss 0.00563847, throughput 4.73272K wps
[Epoch 14 Batch 1200/1540] avg loss 0.005447, throughput 4.91431K wps
[Epoch 14 Batch 1230/1540] avg loss 0.005264, throughput 5.32628K wps
[Epoch 14 Batch 1260/1540] avg loss 0.00525004, throughput 4.82831K wps
[Epoch 14 Batch 1290/1540] avg loss 0.00573335, throughput 5.21508K wps
[Epoch 14 Batch 1320/1540] avg loss 0.00527653, throughput 4.70651K wps
[Epoch 14 Batch 1350/1540] avg loss 0.00556316, throughput 4.89174K wps
[Epoch 14 Batch 1380/1540] avg loss 0.00545725, throughput 4.96501K wps
[Epoch 14 Batch 1410/1540] avg loss 0.00558487, throughput 5.22664K wps
[Epoch 14 Batch 1440/1540] avg loss 0.00621255, throughput 4.46418K wps
[Epoch 14 Batch 1470/1540] avg loss 0.00558561, throughput 4.96031K wps
[Epoch 14 Batch 1500/1540] avg loss 0.00525722, throughput 4.9266K wps
[Epoch 14 Batch 1530/1540] avg loss 0.00526411, throughput 4.57819K wps
Begin Testing...
[Epoch 14] train avg loss 0.0055667, dev acc 0.8211, dev avg loss 0.389846, throughput 4.88011K wps
[Epoch 15 Batch 30/1540] avg loss 0.00579758, throughput 4.39045K wps
[Epoch 15 Batch 60/1540] avg loss 0.00533746, throughput 5.07466K wps
[Epoch 15 Batch 90/1540] avg loss 0.00542172, throughput 4.70548K wps
[Epoch 15 Batch 120/1540] avg loss 0.00577873, throughput 4.94586K wps
[Epoch 15 Batch 150/1540] avg loss 0.00586595, throughput 4.64458K wps
[Epoch 15 Batch 180/1540] avg loss 0.00557018, throughput 4.98061K wps
[Epoch 15 Batch 210/1540] avg loss 0.00538645, throughput 5.29979K wps
[Epoch 15 Batch 240/1540] avg loss 0.00548058, throughput 5.26346K wps
[Epoch 15 Batch 270/1540] avg loss 0.0053897, throughput 4.59085K wps
[Epoch 15 Batch 300/1540] avg loss 0.00546412, throughput 4.44808K wps
[Epoch 15 Batch 330/1540] avg loss 0.00527818, throughput 5.2894K wps
[Epoch 15 Batch 360/1540] avg loss 0.00583379, throughput 4.71135K wps
[Epoch 15 Batch 390/1540] avg loss 0.00498653, throughput 5.06476K wps
[Epoch 15 Batch 420/1540] avg loss 0.00542716, throughput 4.95894K wps
[Epoch 15 Batch 450/1540] avg loss 0.0053863, throughput 4.75087K wps
[Epoch 15 Batch 480/1540] avg loss 0.00527865, throughput 4.57371K wps
[Epoch 15 Batch 510/1540] avg loss 0.00502748, throughput 5.14103K wps
[Epoch 15 Batch 540/1540] avg loss 0.00489124, throughput 4.87012K wps
[Epoch 15 Batch 570/1540] avg loss 0.00571536, throughput 5.55403K wps
[Epoch 15 Batch 600/1540] avg loss 0.00565283, throughput 4.97329K wps
[Epoch 15 Batch 630/1540] avg loss 0.00507812, throughput 4.53012K wps
[Epoch 15 Batch 660/1540] avg loss 0.00588272, throughput 4.40136K wps
[Epoch 15 Batch 690/1540] avg loss 0.00509868, throughput 5.03995K wps
[Epoch 15 Batch 720/1540] avg loss 0.0054396, throughput 4.77105K wps
[Epoch 15 Batch 750/1540] avg loss 0.00497424, throughput 5.13811K wps
[Epoch 15 Batch 780/1540] avg loss 0.00582669, throughput 5.28658K wps
[Epoch 15 Batch 810/1540] avg loss 0.00556079, throughput 5.05015K wps
[Epoch 15 Batch 840/1540] avg loss 0.00576062, throughput 5.70748K wps
[Epoch 15 Batch 870/1540] avg loss 0.00585053, throughput 4.75621K wps
[Epoch 15 Batch 900/1540] avg loss 0.00555819, throughput 5.30285K wps
[Epoch 15 Batch 930/1540] avg loss 0.00570421, throughput 4.799K wps
[Epoch 15 Batch 960/1540] avg loss 0.00539592, throughput 5.07307K wps
[Epoch 15 Batch 990/1540] avg loss 0.00518413, throughput 4.67156K wps
[Epoch 15 Batch 1020/1540] avg loss 0.00531772, throughput 4.57696K wps
[Epoch 15 Batch 1050/1540] avg loss 0.00548492, throughput 4.62904K wps
[Epoch 15 Batch 1080/1540] avg loss 0.00545787, throughput 4.90922K wps
[Epoch 15 Batch 1110/1540] avg loss 0.00557054, throughput 5.37436K wps
[Epoch 15 Batch 1140/1540] avg loss 0.00542181, throughput 5.01515K wps
[Epoch 15 Batch 1170/1540] avg loss 0.00546949, throughput 4.72532K wps
[Epoch 15 Batch 1200/1540] avg loss 0.00493511, throughput 5.4638K wps
[Epoch 15 Batch 1230/1540] avg loss 0.00588846, throughput 4.58359K wps
[Epoch 15 Batch 1260/1540] avg loss 0.00555154, throughput 4.96055K wps
[Epoch 15 Batch 1290/1540] avg loss 0.00543728, throughput 4.55805K wps
[Epoch 15 Batch 1320/1540] avg loss 0.00497116, throughput 4.98402K wps
[Epoch 15 Batch 1350/1540] avg loss 0.0052409, throughput 4.52096K wps
[Epoch 15 Batch 1380/1540] avg loss 0.00532596, throughput 4.81499K wps
[Epoch 15 Batch 1410/1540] avg loss 0.0057008, throughput 4.75512K wps
[Epoch 15 Batch 1440/1540] avg loss 0.00537817, throughput 5.23258K wps
[Epoch 15 Batch 1470/1540] avg loss 0.00518574, throughput 4.84589K wps
[Epoch 15 Batch 1500/1540] avg loss 0.0053976, throughput 5.34287K wps
[Epoch 15 Batch 1530/1540] avg loss 0.00557547, throughput 4.71191K wps
Begin Testing...
[Epoch 15] train avg loss 0.005446, dev acc 0.8303, dev avg loss 0.373455, throughput 4.89347K wps
[Epoch 16 Batch 30/1540] avg loss 0.00531985, throughput 4.78623K wps
[Epoch 16 Batch 60/1540] avg loss 0.00528662, throughput 4.73061K wps
[Epoch 16 Batch 90/1540] avg loss 0.00551592, throughput 4.7588K wps
[Epoch 16 Batch 120/1540] avg loss 0.00538165, throughput 5.11259K wps
[Epoch 16 Batch 150/1540] avg loss 0.00541876, throughput 4.39406K wps
[Epoch 16 Batch 180/1540] avg loss 0.00533864, throughput 4.65551K wps
[Epoch 16 Batch 210/1540] avg loss 0.00531573, throughput 5.4943K wps
[Epoch 16 Batch 240/1540] avg loss 0.00534403, throughput 5.23854K wps
[Epoch 16 Batch 270/1540] avg loss 0.00499229, throughput 4.74403K wps
[Epoch 16 Batch 300/1540] avg loss 0.00549009, throughput 4.45046K wps
[Epoch 16 Batch 330/1540] avg loss 0.00539563, throughput 4.99889K wps
[Epoch 16 Batch 360/1540] avg loss 0.0051511, throughput 4.55233K wps
[Epoch 16 Batch 390/1540] avg loss 0.00564348, throughput 4.71024K wps
[Epoch 16 Batch 420/1540] avg loss 0.00529033, throughput 4.94586K wps
[Epoch 16 Batch 450/1540] avg loss 0.00546329, throughput 5.07686K wps
[Epoch 16 Batch 480/1540] avg loss 0.00572513, throughput 4.53964K wps
[Epoch 16 Batch 510/1540] avg loss 0.00531949, throughput 5.25113K wps
[Epoch 16 Batch 540/1540] avg loss 0.00578396, throughput 4.73808K wps
[Epoch 16 Batch 570/1540] avg loss 0.00522603, throughput 5.18925K wps
[Epoch 16 Batch 600/1540] avg loss 0.00514952, throughput 5.21852K wps
[Epoch 16 Batch 630/1540] avg loss 0.00543788, throughput 5.39496K wps
[Epoch 16 Batch 660/1540] avg loss 0.005494, throughput 4.45584K wps
[Epoch 16 Batch 690/1540] avg loss 0.00493459, throughput 4.74026K wps
[Epoch 16 Batch 720/1540] avg loss 0.00528765, throughput 4.7492K wps
[Epoch 16 Batch 750/1540] avg loss 0.00545321, throughput 4.99529K wps
[Epoch 16 Batch 780/1540] avg loss 0.00495677, throughput 4.49295K wps
[Epoch 16 Batch 810/1540] avg loss 0.0054641, throughput 4.78653K wps
[Epoch 16 Batch 840/1540] avg loss 0.00494545, throughput 5.13241K wps
[Epoch 16 Batch 870/1540] avg loss 0.00556322, throughput 5.00798K wps
[Epoch 16 Batch 900/1540] avg loss 0.00549586, throughput 4.97171K wps
[Epoch 16 Batch 930/1540] avg loss 0.00534172, throughput 5.41353K wps
[Epoch 16 Batch 960/1540] avg loss 0.00534499, throughput 5.14315K wps
[Epoch 16 Batch 990/1540] avg loss 0.0051516, throughput 5.12824K wps
[Epoch 16 Batch 1020/1540] avg loss 0.00506663, throughput 4.52169K wps
[Epoch 16 Batch 1050/1540] avg loss 0.00545723, throughput 4.50982K wps
[Epoch 16 Batch 1080/1540] avg loss 0.00579033, throughput 4.67022K wps
[Epoch 16 Batch 1110/1540] avg loss 0.00532682, throughput 5.53669K wps
[Epoch 16 Batch 1140/1540] avg loss 0.00512576, throughput 5.13141K wps
[Epoch 16 Batch 1170/1540] avg loss 0.00522466, throughput 4.89883K wps
[Epoch 16 Batch 1200/1540] avg loss 0.00565682, throughput 4.65764K wps
[Epoch 16 Batch 1230/1540] avg loss 0.00525567, throughput 5.15611K wps
[Epoch 16 Batch 1260/1540] avg loss 0.00536629, throughput 4.396K wps
[Epoch 16 Batch 1290/1540] avg loss 0.00517906, throughput 4.48257K wps
[Epoch 16 Batch 1320/1540] avg loss 0.00539942, throughput 5.18732K wps
[Epoch 16 Batch 1350/1540] avg loss 0.00506187, throughput 4.67665K wps
[Epoch 16 Batch 1380/1540] avg loss 0.00486141, throughput 4.69787K wps
[Epoch 16 Batch 1410/1540] avg loss 0.00526511, throughput 5.03303K wps
[Epoch 16 Batch 1440/1540] avg loss 0.00532548, throughput 5.62656K wps
[Epoch 16 Batch 1470/1540] avg loss 0.00517858, throughput 4.90388K wps
[Epoch 16 Batch 1500/1540] avg loss 0.00542236, throughput 5.39178K wps
[Epoch 16 Batch 1530/1540] avg loss 0.00521982, throughput 5.14857K wps
Begin Testing...
[Epoch 16] train avg loss 0.00533096, dev acc 0.8314, dev avg loss 0.370652, throughput 4.89522K wps
[Epoch 17 Batch 30/1540] avg loss 0.00498438, throughput 4.51646K wps
[Epoch 17 Batch 60/1540] avg loss 0.00567592, throughput 4.39891K wps
[Epoch 17 Batch 90/1540] avg loss 0.0051444, throughput 4.57975K wps
[Epoch 17 Batch 120/1540] avg loss 0.00510657, throughput 5.10094K wps
[Epoch 17 Batch 150/1540] avg loss 0.0050901, throughput 4.55415K wps
[Epoch 17 Batch 180/1540] avg loss 0.00541995, throughput 5.30641K wps
[Epoch 17 Batch 210/1540] avg loss 0.00552021, throughput 4.73257K wps
[Epoch 17 Batch 240/1540] avg loss 0.00504491, throughput 5.25128K wps
[Epoch 17 Batch 270/1540] avg loss 0.00479669, throughput 4.48225K wps
[Epoch 17 Batch 300/1540] avg loss 0.00511567, throughput 4.76658K wps
[Epoch 17 Batch 330/1540] avg loss 0.00524506, throughput 4.36739K wps
[Epoch 17 Batch 360/1540] avg loss 0.00489831, throughput 4.81819K wps
[Epoch 17 Batch 390/1540] avg loss 0.0048981, throughput 5.29916K wps
[Epoch 17 Batch 420/1540] avg loss 0.00468451, throughput 4.74086K wps
[Epoch 17 Batch 450/1540] avg loss 0.00514962, throughput 4.62723K wps
[Epoch 17 Batch 480/1540] avg loss 0.00537271, throughput 4.49831K wps
[Epoch 17 Batch 510/1540] avg loss 0.00527614, throughput 4.86436K wps
[Epoch 17 Batch 540/1540] avg loss 0.0048206, throughput 4.80821K wps
[Epoch 17 Batch 570/1540] avg loss 0.00507411, throughput 5.36386K wps
[Epoch 17 Batch 600/1540] avg loss 0.00513512, throughput 4.40499K wps
[Epoch 17 Batch 630/1540] avg loss 0.00499291, throughput 5.16795K wps
[Epoch 17 Batch 660/1540] avg loss 0.00509621, throughput 4.45312K wps
[Epoch 17 Batch 690/1540] avg loss 0.00493138, throughput 4.98939K wps
[Epoch 17 Batch 720/1540] avg loss 0.00521385, throughput 4.43352K wps
[Epoch 17 Batch 750/1540] avg loss 0.00502199, throughput 4.77226K wps
[Epoch 17 Batch 780/1540] avg loss 0.00509018, throughput 5.34544K wps
[Epoch 17 Batch 810/1540] avg loss 0.00558298, throughput 4.65583K wps
[Epoch 17 Batch 840/1540] avg loss 0.00494696, throughput 4.88828K wps
[Epoch 17 Batch 870/1540] avg loss 0.00530694, throughput 4.63153K wps
[Epoch 17 Batch 900/1540] avg loss 0.0052681, throughput 4.57637K wps
[Epoch 17 Batch 930/1540] avg loss 0.00495779, throughput 5.34294K wps
[Epoch 17 Batch 960/1540] avg loss 0.00527294, throughput 4.39391K wps
[Epoch 17 Batch 990/1540] avg loss 0.00524252, throughput 5.12197K wps
[Epoch 17 Batch 1020/1540] avg loss 0.00539254, throughput 4.7904K wps
[Epoch 17 Batch 1050/1540] avg loss 0.0050782, throughput 4.85353K wps
[Epoch 17 Batch 1080/1540] avg loss 0.00486182, throughput 5.18963K wps
[Epoch 17 Batch 1110/1540] avg loss 0.00543899, throughput 5.26977K wps
[Epoch 17 Batch 1140/1540] avg loss 0.00555033, throughput 4.67591K wps
[Epoch 17 Batch 1170/1540] avg loss 0.0050107, throughput 6.046K wps
[Epoch 17 Batch 1200/1540] avg loss 0.0052698, throughput 5.54766K wps
[Epoch 17 Batch 1230/1540] avg loss 0.00557312, throughput 4.79511K wps
[Epoch 17 Batch 1260/1540] avg loss 0.00551034, throughput 4.75627K wps
[Epoch 17 Batch 1290/1540] avg loss 0.00514066, throughput 4.87679K wps
[Epoch 17 Batch 1320/1540] avg loss 0.00506759, throughput 5.58276K wps
[Epoch 17 Batch 1350/1540] avg loss 0.00551102, throughput 5.30788K wps
[Epoch 17 Batch 1380/1540] avg loss 0.0053247, throughput 4.85193K wps
[Epoch 17 Batch 1410/1540] avg loss 0.00528328, throughput 5.28314K wps
[Epoch 17 Batch 1440/1540] avg loss 0.0055254, throughput 4.69248K wps
[Epoch 17 Batch 1470/1540] avg loss 0.00564012, throughput 4.73448K wps
[Epoch 17 Batch 1500/1540] avg loss 0.00498705, throughput 4.57088K wps
[Epoch 17 Batch 1530/1540] avg loss 0.00538793, throughput 4.92126K wps
Begin Testing...
[Epoch 17] train avg loss 0.0052002, dev acc 0.8383, dev avg loss 0.368694, throughput 4.85665K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 18 Batch 30/1540] avg loss 0.00490915, throughput 4.49268K wps
[Epoch 18 Batch 60/1540] avg loss 0.00495758, throughput 4.66704K wps
[Epoch 18 Batch 90/1540] avg loss 0.00485858, throughput 5.16117K wps
[Epoch 18 Batch 120/1540] avg loss 0.00515363, throughput 5.4474K wps
[Epoch 18 Batch 150/1540] avg loss 0.00514862, throughput 5.19991K wps
[Epoch 18 Batch 180/1540] avg loss 0.00449118, throughput 5.31729K wps
[Epoch 18 Batch 210/1540] avg loss 0.00480469, throughput 4.8773K wps
[Epoch 18 Batch 240/1540] avg loss 0.00516656, throughput 4.96744K wps
[Epoch 18 Batch 270/1540] avg loss 0.00523444, throughput 5.05681K wps
[Epoch 18 Batch 300/1540] avg loss 0.0051877, throughput 5.05688K wps
[Epoch 18 Batch 330/1540] avg loss 0.00499318, throughput 4.83148K wps
[Epoch 18 Batch 360/1540] avg loss 0.00537508, throughput 4.88055K wps
[Epoch 18 Batch 390/1540] avg loss 0.00501637, throughput 4.93272K wps
[Epoch 18 Batch 420/1540] avg loss 0.00525762, throughput 5.0288K wps
[Epoch 18 Batch 450/1540] avg loss 0.00512145, throughput 5.47883K wps
[Epoch 18 Batch 480/1540] avg loss 0.00495389, throughput 5.11497K wps
[Epoch 18 Batch 510/1540] avg loss 0.00473841, throughput 5.26004K wps
[Epoch 18 Batch 540/1540] avg loss 0.00545763, throughput 4.56303K wps
[Epoch 18 Batch 570/1540] avg loss 0.00517588, throughput 5.1223K wps
[Epoch 18 Batch 600/1540] avg loss 0.00529877, throughput 5.42964K wps
[Epoch 18 Batch 630/1540] avg loss 0.00504774, throughput 4.43243K wps
[Epoch 18 Batch 660/1540] avg loss 0.00513487, throughput 4.96251K wps
[Epoch 18 Batch 690/1540] avg loss 0.00530104, throughput 5.00554K wps
[Epoch 18 Batch 720/1540] avg loss 0.00497841, throughput 4.41076K wps
[Epoch 18 Batch 750/1540] avg loss 0.00468729, throughput 5.15722K wps
[Epoch 18 Batch 780/1540] avg loss 0.00489914, throughput 4.85977K wps
[Epoch 18 Batch 810/1540] avg loss 0.00496731, throughput 4.77725K wps
[Epoch 18 Batch 840/1540] avg loss 0.00522377, throughput 4.36819K wps
[Epoch 18 Batch 870/1540] avg loss 0.00509292, throughput 4.80146K wps
[Epoch 18 Batch 900/1540] avg loss 0.00527722, throughput 4.90041K wps
[Epoch 18 Batch 930/1540] avg loss 0.00530057, throughput 4.72549K wps
[Epoch 18 Batch 960/1540] avg loss 0.00514476, throughput 5.1703K wps
[Epoch 18 Batch 990/1540] avg loss 0.00558566, throughput 4.99255K wps
[Epoch 18 Batch 1020/1540] avg loss 0.00523096, throughput 5.84865K wps
[Epoch 18 Batch 1050/1540] avg loss 0.00528313, throughput 4.89472K wps
[Epoch 18 Batch 1080/1540] avg loss 0.0052642, throughput 4.70822K wps
[Epoch 18 Batch 1110/1540] avg loss 0.00483268, throughput 5.03748K wps
[Epoch 18 Batch 1140/1540] avg loss 0.00507102, throughput 4.48047K wps
[Epoch 18 Batch 1170/1540] avg loss 0.00489381, throughput 4.73995K wps
[Epoch 18 Batch 1200/1540] avg loss 0.00541672, throughput 4.86857K wps
[Epoch 18 Batch 1230/1540] avg loss 0.00506522, throughput 4.56009K wps
[Epoch 18 Batch 1260/1540] avg loss 0.00500604, throughput 4.98757K wps
[Epoch 18 Batch 1290/1540] avg loss 0.00530322, throughput 5.58023K wps
[Epoch 18 Batch 1320/1540] avg loss 0.00436396, throughput 4.7098K wps
[Epoch 18 Batch 1350/1540] avg loss 0.00532244, throughput 4.76583K wps
[Epoch 18 Batch 1380/1540] avg loss 0.00496194, throughput 4.67364K wps
[Epoch 18 Batch 1410/1540] avg loss 0.00485776, throughput 4.87231K wps
[Epoch 18 Batch 1440/1540] avg loss 0.00498983, throughput 5.05729K wps
[Epoch 18 Batch 1470/1540] avg loss 0.00493067, throughput 5.01167K wps
[Epoch 18 Batch 1500/1540] avg loss 0.00498548, throughput 4.54899K wps
[Epoch 18 Batch 1530/1540] avg loss 0.00501977, throughput 4.69676K wps
Begin Testing...
[Epoch 18] train avg loss 0.0050756, dev acc 0.8268, dev avg loss 0.383385, throughput 4.90789K wps
[Epoch 19 Batch 30/1540] avg loss 0.00536108, throughput 4.50748K wps
[Epoch 19 Batch 60/1540] avg loss 0.00488805, throughput 4.8557K wps
[Epoch 19 Batch 90/1540] avg loss 0.00484209, throughput 4.48795K wps
[Epoch 19 Batch 120/1540] avg loss 0.00487854, throughput 4.76419K wps
[Epoch 19 Batch 150/1540] avg loss 0.00471306, throughput 4.83359K wps
[Epoch 19 Batch 180/1540] avg loss 0.0050639, throughput 4.54616K wps
[Epoch 19 Batch 210/1540] avg loss 0.00508824, throughput 5.01278K wps
[Epoch 19 Batch 240/1540] avg loss 0.004984, throughput 4.76928K wps
[Epoch 19 Batch 270/1540] avg loss 0.00545027, throughput 4.53306K wps
[Epoch 19 Batch 300/1540] avg loss 0.00513269, throughput 4.77292K wps
[Epoch 19 Batch 330/1540] avg loss 0.00512768, throughput 4.62291K wps
[Epoch 19 Batch 360/1540] avg loss 0.00503496, throughput 5.00144K wps
[Epoch 19 Batch 390/1540] avg loss 0.00524846, throughput 5.58388K wps
[Epoch 19 Batch 420/1540] avg loss 0.00515388, throughput 5.3415K wps
[Epoch 19 Batch 450/1540] avg loss 0.00567462, throughput 5.58438K wps
[Epoch 19 Batch 480/1540] avg loss 0.00485788, throughput 4.61357K wps
[Epoch 19 Batch 510/1540] avg loss 0.00506879, throughput 4.42846K wps
[Epoch 19 Batch 540/1540] avg loss 0.00533305, throughput 4.65823K wps
[Epoch 19 Batch 570/1540] avg loss 0.00492131, throughput 4.54823K wps
[Epoch 19 Batch 600/1540] avg loss 0.00480468, throughput 4.99377K wps
[Epoch 19 Batch 630/1540] avg loss 0.00508523, throughput 5.26452K wps
[Epoch 19 Batch 660/1540] avg loss 0.00461809, throughput 4.82244K wps
[Epoch 19 Batch 690/1540] avg loss 0.00508747, throughput 4.5404K wps
[Epoch 19 Batch 720/1540] avg loss 0.00490238, throughput 4.53015K wps
[Epoch 19 Batch 750/1540] avg loss 0.0050807, throughput 5.15471K wps
[Epoch 19 Batch 780/1540] avg loss 0.00464109, throughput 5.15682K wps
[Epoch 19 Batch 810/1540] avg loss 0.00513656, throughput 5.08904K wps
[Epoch 19 Batch 840/1540] avg loss 0.00498686, throughput 4.94004K wps
[Epoch 19 Batch 870/1540] avg loss 0.00511484, throughput 4.53698K wps
[Epoch 19 Batch 900/1540] avg loss 0.00499056, throughput 4.51124K wps
[Epoch 19 Batch 930/1540] avg loss 0.00483062, throughput 5.08363K wps
[Epoch 19 Batch 960/1540] avg loss 0.00477218, throughput 4.47092K wps
[Epoch 19 Batch 990/1540] avg loss 0.00486972, throughput 4.8781K wps
[Epoch 19 Batch 1020/1540] avg loss 0.00526894, throughput 4.99409K wps
[Epoch 19 Batch 1050/1540] avg loss 0.00477519, throughput 4.87531K wps
[Epoch 19 Batch 1080/1540] avg loss 0.00515349, throughput 4.92319K wps
[Epoch 19 Batch 1110/1540] avg loss 0.00506039, throughput 4.5935K wps
[Epoch 19 Batch 1140/1540] avg loss 0.0052416, throughput 4.98287K wps
[Epoch 19 Batch 1170/1540] avg loss 0.00498519, throughput 5.1931K wps
[Epoch 19 Batch 1200/1540] avg loss 0.00474873, throughput 5.37139K wps
[Epoch 19 Batch 1230/1540] avg loss 0.00472781, throughput 4.86889K wps
[Epoch 19 Batch 1260/1540] avg loss 0.00494206, throughput 4.90268K wps
[Epoch 19 Batch 1290/1540] avg loss 0.0050761, throughput 4.60191K wps
[Epoch 19 Batch 1320/1540] avg loss 0.00491794, throughput 4.80026K wps
[Epoch 19 Batch 1350/1540] avg loss 0.00500987, throughput 4.76598K wps
[Epoch 19 Batch 1380/1540] avg loss 0.00529012, throughput 4.80547K wps
[Epoch 19 Batch 1410/1540] avg loss 0.00458102, throughput 5.57729K wps
[Epoch 19 Batch 1440/1540] avg loss 0.00512453, throughput 5.03817K wps
[Epoch 19 Batch 1470/1540] avg loss 0.00462125, throughput 4.7605K wps
[Epoch 19 Batch 1500/1540] avg loss 0.00485528, throughput 5.05994K wps
[Epoch 19 Batch 1530/1540] avg loss 0.0053342, throughput 4.70403K wps
Begin Testing...
[Epoch 19] train avg loss 0.00501059, dev acc 0.8372, dev avg loss 0.369326, throughput 4.84601K wps
[Epoch 20 Batch 30/1540] avg loss 0.00510258, throughput 5.22077K wps
[Epoch 20 Batch 60/1540] avg loss 0.00470519, throughput 4.43074K wps
[Epoch 20 Batch 90/1540] avg loss 0.00465604, throughput 4.74525K wps
[Epoch 20 Batch 120/1540] avg loss 0.0047281, throughput 4.99959K wps
[Epoch 20 Batch 150/1540] avg loss 0.00489511, throughput 5.35524K wps
[Epoch 20 Batch 180/1540] avg loss 0.00502365, throughput 4.90638K wps
[Epoch 20 Batch 210/1540] avg loss 0.00502452, throughput 5.25037K wps
[Epoch 20 Batch 240/1540] avg loss 0.00458251, throughput 4.76703K wps
[Epoch 20 Batch 270/1540] avg loss 0.0050354, throughput 4.87761K wps
[Epoch 20 Batch 300/1540] avg loss 0.0048684, throughput 5.42666K wps
[Epoch 20 Batch 330/1540] avg loss 0.0047312, throughput 4.52073K wps
[Epoch 20 Batch 360/1540] avg loss 0.00490212, throughput 4.43516K wps
[Epoch 20 Batch 390/1540] avg loss 0.0046744, throughput 5.18335K wps
[Epoch 20 Batch 420/1540] avg loss 0.0051225, throughput 4.98537K wps
[Epoch 20 Batch 450/1540] avg loss 0.00449538, throughput 4.60864K wps
[Epoch 20 Batch 480/1540] avg loss 0.00468332, throughput 4.66648K wps
[Epoch 20 Batch 510/1540] avg loss 0.00506074, throughput 4.94528K wps
[Epoch 20 Batch 540/1540] avg loss 0.00508112, throughput 4.90295K wps
[Epoch 20 Batch 570/1540] avg loss 0.00490486, throughput 5.14525K wps
[Epoch 20 Batch 600/1540] avg loss 0.00510554, throughput 4.70347K wps
[Epoch 20 Batch 630/1540] avg loss 0.00512273, throughput 4.86136K wps
[Epoch 20 Batch 660/1540] avg loss 0.00480236, throughput 4.75391K wps
[Epoch 20 Batch 690/1540] avg loss 0.00497875, throughput 5.33533K wps
[Epoch 20 Batch 720/1540] avg loss 0.00512643, throughput 5.12794K wps
[Epoch 20 Batch 750/1540] avg loss 0.00498445, throughput 5.16726K wps
[Epoch 20 Batch 780/1540] avg loss 0.00447855, throughput 5.20803K wps
[Epoch 20 Batch 810/1540] avg loss 0.00482566, throughput 4.8163K wps
[Epoch 20 Batch 840/1540] avg loss 0.00505739, throughput 4.96322K wps
[Epoch 20 Batch 870/1540] avg loss 0.00459957, throughput 4.98742K wps
[Epoch 20 Batch 900/1540] avg loss 0.00504281, throughput 4.89764K wps
[Epoch 20 Batch 930/1540] avg loss 0.00473979, throughput 5.11549K wps
[Epoch 20 Batch 960/1540] avg loss 0.00492013, throughput 4.58213K wps
[Epoch 20 Batch 990/1540] avg loss 0.00483477, throughput 5.20785K wps
[Epoch 20 Batch 1020/1540] avg loss 0.00491481, throughput 4.95694K wps
[Epoch 20 Batch 1050/1540] avg loss 0.00493994, throughput 5.40992K wps
[Epoch 20 Batch 1080/1540] avg loss 0.00473118, throughput 4.71075K wps
[Epoch 20 Batch 1110/1540] avg loss 0.00452138, throughput 4.46547K wps
[Epoch 20 Batch 1140/1540] avg loss 0.00514958, throughput 4.58891K wps
[Epoch 20 Batch 1170/1540] avg loss 0.00526821, throughput 4.60329K wps
[Epoch 20 Batch 1200/1540] avg loss 0.00449527, throughput 5.43605K wps
[Epoch 20 Batch 1230/1540] avg loss 0.00506654, throughput 4.67959K wps
[Epoch 20 Batch 1260/1540] avg loss 0.0045367, throughput 5.23573K wps
[Epoch 20 Batch 1290/1540] avg loss 0.00499947, throughput 5.17064K wps
[Epoch 20 Batch 1320/1540] avg loss 0.00491741, throughput 4.76245K wps
[Epoch 20 Batch 1350/1540] avg loss 0.00492791, throughput 5.06056K wps
[Epoch 20 Batch 1380/1540] avg loss 0.0047398, throughput 4.92068K wps
[Epoch 20 Batch 1410/1540] avg loss 0.00523129, throughput 4.89275K wps
[Epoch 20 Batch 1440/1540] avg loss 0.00493614, throughput 4.46121K wps
[Epoch 20 Batch 1470/1540] avg loss 0.00503684, throughput 5.01153K wps
[Epoch 20 Batch 1500/1540] avg loss 0.00500555, throughput 4.80176K wps
[Epoch 20 Batch 1530/1540] avg loss 0.00489171, throughput 4.74457K wps
Begin Testing...
[Epoch 20] train avg loss 0.004887, dev acc 0.8349, dev avg loss 0.368392, throughput 4.90255K wps
[Epoch 21 Batch 30/1540] avg loss 0.00497869, throughput 4.63947K wps
[Epoch 21 Batch 60/1540] avg loss 0.00472045, throughput 4.8455K wps
[Epoch 21 Batch 90/1540] avg loss 0.00481332, throughput 4.4544K wps
[Epoch 21 Batch 120/1540] avg loss 0.00477364, throughput 5.08574K wps
[Epoch 21 Batch 150/1540] avg loss 0.00545415, throughput 4.93819K wps
[Epoch 21 Batch 180/1540] avg loss 0.00448179, throughput 4.99422K wps
[Epoch 21 Batch 210/1540] avg loss 0.00492312, throughput 4.73949K wps
[Epoch 21 Batch 240/1540] avg loss 0.00518047, throughput 4.59188K wps
[Epoch 21 Batch 270/1540] avg loss 0.00470581, throughput 5.3635K wps
[Epoch 21 Batch 300/1540] avg loss 0.00465404, throughput 4.40034K wps
[Epoch 21 Batch 330/1540] avg loss 0.0051441, throughput 4.44876K wps
[Epoch 21 Batch 360/1540] avg loss 0.00474421, throughput 5.34759K wps
[Epoch 21 Batch 390/1540] avg loss 0.00456417, throughput 5.29225K wps
[Epoch 21 Batch 420/1540] avg loss 0.00471542, throughput 4.78641K wps
[Epoch 21 Batch 450/1540] avg loss 0.00497244, throughput 5.01058K wps
[Epoch 21 Batch 480/1540] avg loss 0.00490423, throughput 4.85459K wps
[Epoch 21 Batch 510/1540] avg loss 0.00479514, throughput 4.75253K wps
[Epoch 21 Batch 540/1540] avg loss 0.00522242, throughput 4.85096K wps
[Epoch 21 Batch 570/1540] avg loss 0.00466716, throughput 5.53184K wps
[Epoch 21 Batch 600/1540] avg loss 0.00470066, throughput 5.25088K wps
[Epoch 21 Batch 630/1540] avg loss 0.00463271, throughput 5.48409K wps
[Epoch 21 Batch 660/1540] avg loss 0.00493349, throughput 5.25316K wps
[Epoch 21 Batch 690/1540] avg loss 0.00473074, throughput 4.98668K wps
[Epoch 21 Batch 720/1540] avg loss 0.00469951, throughput 4.71411K wps
[Epoch 21 Batch 750/1540] avg loss 0.0045216, throughput 4.72766K wps
[Epoch 21 Batch 780/1540] avg loss 0.00518101, throughput 4.57772K wps
[Epoch 21 Batch 810/1540] avg loss 0.00478887, throughput 5.33063K wps
[Epoch 21 Batch 840/1540] avg loss 0.00513059, throughput 4.76191K wps
[Epoch 21 Batch 870/1540] avg loss 0.00439273, throughput 4.86586K wps
[Epoch 21 Batch 900/1540] avg loss 0.00475248, throughput 4.44544K wps
[Epoch 21 Batch 930/1540] avg loss 0.00458444, throughput 4.92644K wps
[Epoch 21 Batch 960/1540] avg loss 0.00496909, throughput 5.19051K wps
[Epoch 21 Batch 990/1540] avg loss 0.00532567, throughput 4.59153K wps
[Epoch 21 Batch 1020/1540] avg loss 0.00489752, throughput 5.00589K wps
[Epoch 21 Batch 1050/1540] avg loss 0.00455549, throughput 5.33118K wps
[Epoch 21 Batch 1080/1540] avg loss 0.00490501, throughput 5.40414K wps
[Epoch 21 Batch 1110/1540] avg loss 0.00482301, throughput 5.44916K wps
[Epoch 21 Batch 1140/1540] avg loss 0.00500549, throughput 4.83173K wps
[Epoch 21 Batch 1170/1540] avg loss 0.00485094, throughput 4.96714K wps
[Epoch 21 Batch 1200/1540] avg loss 0.00445604, throughput 4.69206K wps
[Epoch 21 Batch 1230/1540] avg loss 0.00480361, throughput 4.74614K wps
[Epoch 21 Batch 1260/1540] avg loss 0.00469056, throughput 4.59526K wps
[Epoch 21 Batch 1290/1540] avg loss 0.00491659, throughput 4.64688K wps
[Epoch 21 Batch 1320/1540] avg loss 0.00502625, throughput 4.72007K wps
[Epoch 21 Batch 1350/1540] avg loss 0.00481915, throughput 4.45099K wps
[Epoch 21 Batch 1380/1540] avg loss 0.00488157, throughput 5.31743K wps
[Epoch 21 Batch 1410/1540] avg loss 0.00462472, throughput 5.07405K wps
[Epoch 21 Batch 1440/1540] avg loss 0.00452857, throughput 4.76397K wps
[Epoch 21 Batch 1470/1540] avg loss 0.00446306, throughput 5.18565K wps
[Epoch 21 Batch 1500/1540] avg loss 0.00412131, throughput 5.01909K wps
[Epoch 21 Batch 1530/1540] avg loss 0.00458994, throughput 4.70446K wps
Begin Testing...
[Epoch 21] train avg loss 0.0047938, dev acc 0.8314, dev avg loss 0.375263, throughput 4.89711K wps
[Epoch 22 Batch 30/1540] avg loss 0.00485814, throughput 4.38739K wps
[Epoch 22 Batch 60/1540] avg loss 0.00498808, throughput 4.69248K wps
[Epoch 22 Batch 90/1540] avg loss 0.00529982, throughput 5.18781K wps
[Epoch 22 Batch 120/1540] avg loss 0.0049209, throughput 5.0075K wps
[Epoch 22 Batch 150/1540] avg loss 0.00488597, throughput 5.45608K wps
[Epoch 22 Batch 180/1540] avg loss 0.00470327, throughput 5.15013K wps
[Epoch 22 Batch 210/1540] avg loss 0.00472408, throughput 5.95555K wps
[Epoch 22 Batch 240/1540] avg loss 0.00470179, throughput 4.58534K wps
[Epoch 22 Batch 270/1540] avg loss 0.00448152, throughput 4.88526K wps
[Epoch 22 Batch 300/1540] avg loss 0.00432643, throughput 4.84364K wps
[Epoch 22 Batch 330/1540] avg loss 0.00494683, throughput 4.91171K wps
[Epoch 22 Batch 360/1540] avg loss 0.00477023, throughput 4.54171K wps
[Epoch 22 Batch 390/1540] avg loss 0.00427033, throughput 4.94299K wps
[Epoch 22 Batch 420/1540] avg loss 0.00475738, throughput 4.43094K wps
[Epoch 22 Batch 450/1540] avg loss 0.00442819, throughput 4.61209K wps
[Epoch 22 Batch 480/1540] avg loss 0.00469143, throughput 4.74674K wps
[Epoch 22 Batch 510/1540] avg loss 0.00496598, throughput 4.53616K wps
[Epoch 22 Batch 540/1540] avg loss 0.00495511, throughput 4.81973K wps
[Epoch 22 Batch 570/1540] avg loss 0.00463058, throughput 5.00153K wps
[Epoch 22 Batch 600/1540] avg loss 0.00498913, throughput 5.01262K wps
[Epoch 22 Batch 630/1540] avg loss 0.00468566, throughput 5.21847K wps
[Epoch 22 Batch 660/1540] avg loss 0.00489412, throughput 5.19687K wps
[Epoch 22 Batch 690/1540] avg loss 0.00470519, throughput 5.67798K wps
[Epoch 22 Batch 720/1540] avg loss 0.00446504, throughput 5.0451K wps
[Epoch 22 Batch 750/1540] avg loss 0.0045224, throughput 4.95163K wps
[Epoch 22 Batch 780/1540] avg loss 0.00491663, throughput 5.26428K wps
[Epoch 22 Batch 810/1540] avg loss 0.00493597, throughput 5.29113K wps
[Epoch 22 Batch 840/1540] avg loss 0.00429237, throughput 4.69499K wps
[Epoch 22 Batch 870/1540] avg loss 0.00500714, throughput 5.02795K wps
[Epoch 22 Batch 900/1540] avg loss 0.00466783, throughput 4.88192K wps
[Epoch 22 Batch 930/1540] avg loss 0.00449541, throughput 4.75563K wps
[Epoch 22 Batch 960/1540] avg loss 0.00459262, throughput 5.63644K wps
[Epoch 22 Batch 990/1540] avg loss 0.00460037, throughput 5.26849K wps
[Epoch 22 Batch 1020/1540] avg loss 0.00460783, throughput 4.70132K wps
[Epoch 22 Batch 1050/1540] avg loss 0.00506984, throughput 5.21805K wps
[Epoch 22 Batch 1080/1540] avg loss 0.00483489, throughput 4.4235K wps
[Epoch 22 Batch 1110/1540] avg loss 0.00482381, throughput 5.05081K wps
[Epoch 22 Batch 1140/1540] avg loss 0.00461706, throughput 4.94915K wps
[Epoch 22 Batch 1170/1540] avg loss 0.00476422, throughput 4.5136K wps
[Epoch 22 Batch 1200/1540] avg loss 0.00481629, throughput 4.82728K wps
[Epoch 22 Batch 1230/1540] avg loss 0.00463443, throughput 4.94414K wps
[Epoch 22 Batch 1260/1540] avg loss 0.00459173, throughput 5.30348K wps
[Epoch 22 Batch 1290/1540] avg loss 0.00457752, throughput 5.06113K wps
[Epoch 22 Batch 1320/1540] avg loss 0.00461198, throughput 4.90279K wps
[Epoch 22 Batch 1350/1540] avg loss 0.00479133, throughput 4.74614K wps
[Epoch 22 Batch 1380/1540] avg loss 0.00474615, throughput 5.12321K wps
[Epoch 22 Batch 1410/1540] avg loss 0.00440183, throughput 5.01353K wps
[Epoch 22 Batch 1440/1540] avg loss 0.00453105, throughput 4.76945K wps
[Epoch 22 Batch 1470/1540] avg loss 0.00434409, throughput 5.10062K wps
[Epoch 22 Batch 1500/1540] avg loss 0.00433774, throughput 4.59695K wps
[Epoch 22 Batch 1530/1540] avg loss 0.00488106, throughput 4.88833K wps
Begin Testing...
[Epoch 22] train avg loss 0.00470813, dev acc 0.8337, dev avg loss 0.369232, throughput 4.93741K wps
[Epoch 23 Batch 30/1540] avg loss 0.00439856, throughput 4.41266K wps
[Epoch 23 Batch 60/1540] avg loss 0.00474713, throughput 5.01123K wps
[Epoch 23 Batch 90/1540] avg loss 0.00441731, throughput 4.95598K wps
[Epoch 23 Batch 120/1540] avg loss 0.00489205, throughput 5.72791K wps
[Epoch 23 Batch 150/1540] avg loss 0.0047306, throughput 4.8223K wps
[Epoch 23 Batch 180/1540] avg loss 0.00439801, throughput 5.25929K wps
[Epoch 23 Batch 210/1540] avg loss 0.00443939, throughput 4.46947K wps
[Epoch 23 Batch 240/1540] avg loss 0.00465658, throughput 5.41374K wps
[Epoch 23 Batch 270/1540] avg loss 0.00448265, throughput 5.23489K wps
[Epoch 23 Batch 300/1540] avg loss 0.00427863, throughput 5.15145K wps
[Epoch 23 Batch 330/1540] avg loss 0.00463024, throughput 4.75274K wps
[Epoch 23 Batch 360/1540] avg loss 0.00485266, throughput 5.58574K wps
[Epoch 23 Batch 390/1540] avg loss 0.00497345, throughput 4.39794K wps
[Epoch 23 Batch 420/1540] avg loss 0.00447333, throughput 4.62057K wps
[Epoch 23 Batch 450/1540] avg loss 0.004419, throughput 4.73968K wps
[Epoch 23 Batch 480/1540] avg loss 0.00475896, throughput 5.43216K wps
[Epoch 23 Batch 510/1540] avg loss 0.00440409, throughput 5.59061K wps
[Epoch 23 Batch 540/1540] avg loss 0.00463497, throughput 4.92222K wps
[Epoch 23 Batch 570/1540] avg loss 0.00470797, throughput 5.11402K wps
[Epoch 23 Batch 600/1540] avg loss 0.00411716, throughput 5.29034K wps
[Epoch 23 Batch 630/1540] avg loss 0.00487208, throughput 4.79568K wps
[Epoch 23 Batch 660/1540] avg loss 0.00480922, throughput 5.11589K wps
[Epoch 23 Batch 690/1540] avg loss 0.00439299, throughput 5.36429K wps
[Epoch 23 Batch 720/1540] avg loss 0.00487975, throughput 4.78039K wps
[Epoch 23 Batch 750/1540] avg loss 0.00509415, throughput 5.50051K wps
[Epoch 23 Batch 780/1540] avg loss 0.00472754, throughput 4.49776K wps
[Epoch 23 Batch 810/1540] avg loss 0.00445068, throughput 4.77272K wps
[Epoch 23 Batch 840/1540] avg loss 0.00493877, throughput 4.66349K wps
[Epoch 23 Batch 870/1540] avg loss 0.00423407, throughput 5.03144K wps
[Epoch 23 Batch 900/1540] avg loss 0.00462189, throughput 4.56509K wps
[Epoch 23 Batch 930/1540] avg loss 0.00393886, throughput 4.78587K wps
[Epoch 23 Batch 960/1540] avg loss 0.00433914, throughput 4.80819K wps
[Epoch 23 Batch 990/1540] avg loss 0.00489604, throughput 4.75085K wps
[Epoch 23 Batch 1020/1540] avg loss 0.00432292, throughput 4.72699K wps
[Epoch 23 Batch 1050/1540] avg loss 0.00471387, throughput 4.47501K wps
[Epoch 23 Batch 1080/1540] avg loss 0.00480724, throughput 4.79345K wps
[Epoch 23 Batch 1110/1540] avg loss 0.00447119, throughput 4.387K wps
[Epoch 23 Batch 1140/1540] avg loss 0.00480415, throughput 4.71346K wps
[Epoch 23 Batch 1170/1540] avg loss 0.00420857, throughput 4.5486K wps
[Epoch 23 Batch 1200/1540] avg loss 0.00480153, throughput 5.11437K wps
[Epoch 23 Batch 1230/1540] avg loss 0.00546492, throughput 4.69537K wps
[Epoch 23 Batch 1260/1540] avg loss 0.0051845, throughput 4.61188K wps
[Epoch 23 Batch 1290/1540] avg loss 0.00454424, throughput 4.60541K wps
[Epoch 23 Batch 1320/1540] avg loss 0.00462889, throughput 5.32553K wps
[Epoch 23 Batch 1350/1540] avg loss 0.00451573, throughput 4.8849K wps
[Epoch 23 Batch 1380/1540] avg loss 0.00454171, throughput 4.83657K wps
[Epoch 23 Batch 1410/1540] avg loss 0.00472678, throughput 4.8088K wps
[Epoch 23 Batch 1440/1540] avg loss 0.00424292, throughput 5.36032K wps
[Epoch 23 Batch 1470/1540] avg loss 0.00420129, throughput 4.38745K wps
[Epoch 23 Batch 1500/1540] avg loss 0.00468837, throughput 4.64658K wps
[Epoch 23 Batch 1530/1540] avg loss 0.00468086, throughput 4.982K wps
Begin Testing...
[Epoch 23] train avg loss 0.00461033, dev acc 0.8360, dev avg loss 0.370858, throughput 4.87945K wps
[Epoch 24 Batch 30/1540] avg loss 0.00433971, throughput 4.43786K wps
[Epoch 24 Batch 60/1540] avg loss 0.00430431, throughput 5.0156K wps
[Epoch 24 Batch 90/1540] avg loss 0.00447061, throughput 4.97946K wps
[Epoch 24 Batch 120/1540] avg loss 0.00446768, throughput 5.29044K wps
[Epoch 24 Batch 150/1540] avg loss 0.00430419, throughput 5.20422K wps
[Epoch 24 Batch 180/1540] avg loss 0.00446638, throughput 4.42996K wps
[Epoch 24 Batch 210/1540] avg loss 0.00476319, throughput 5.10282K wps
[Epoch 24 Batch 240/1540] avg loss 0.0043419, throughput 4.61664K wps
[Epoch 24 Batch 270/1540] avg loss 0.00449273, throughput 4.68846K wps
[Epoch 24 Batch 300/1540] avg loss 0.00437801, throughput 5.01756K wps
[Epoch 24 Batch 330/1540] avg loss 0.00444513, throughput 4.95993K wps
[Epoch 24 Batch 360/1540] avg loss 0.0043699, throughput 4.97178K wps
[Epoch 24 Batch 390/1540] avg loss 0.00431564, throughput 5.32878K wps
[Epoch 24 Batch 420/1540] avg loss 0.00431574, throughput 4.7553K wps
[Epoch 24 Batch 450/1540] avg loss 0.00468691, throughput 4.56791K wps
[Epoch 24 Batch 480/1540] avg loss 0.00460474, throughput 4.71655K wps
[Epoch 24 Batch 510/1540] avg loss 0.00477615, throughput 4.82012K wps
[Epoch 24 Batch 540/1540] avg loss 0.0045333, throughput 4.74792K wps
[Epoch 24 Batch 570/1540] avg loss 0.00509935, throughput 4.97595K wps
[Epoch 24 Batch 600/1540] avg loss 0.00444425, throughput 4.68244K wps
[Epoch 24 Batch 630/1540] avg loss 0.00460751, throughput 4.32548K wps
[Epoch 24 Batch 660/1540] avg loss 0.00476669, throughput 4.41218K wps
[Epoch 24 Batch 690/1540] avg loss 0.00444325, throughput 4.7418K wps
[Epoch 24 Batch 720/1540] avg loss 0.00435551, throughput 4.41726K wps
[Epoch 24 Batch 750/1540] avg loss 0.00471362, throughput 4.38508K wps
[Epoch 24 Batch 780/1540] avg loss 0.0046279, throughput 5.06093K wps
[Epoch 24 Batch 810/1540] avg loss 0.00453533, throughput 5.14449K wps
[Epoch 24 Batch 840/1540] avg loss 0.00446815, throughput 5.50647K wps
[Epoch 24 Batch 870/1540] avg loss 0.00462322, throughput 4.82451K wps
[Epoch 24 Batch 900/1540] avg loss 0.00457481, throughput 5.03754K wps
[Epoch 24 Batch 930/1540] avg loss 0.00461147, throughput 4.68351K wps
[Epoch 24 Batch 960/1540] avg loss 0.00450456, throughput 4.75191K wps
[Epoch 24 Batch 990/1540] avg loss 0.00454343, throughput 4.67598K wps
[Epoch 24 Batch 1020/1540] avg loss 0.00470615, throughput 4.34056K wps
[Epoch 24 Batch 1050/1540] avg loss 0.00465935, throughput 4.75382K wps
[Epoch 24 Batch 1080/1540] avg loss 0.00441052, throughput 4.97798K wps
[Epoch 24 Batch 1110/1540] avg loss 0.00489743, throughput 4.67336K wps
[Epoch 24 Batch 1140/1540] avg loss 0.00459181, throughput 4.65544K wps
[Epoch 24 Batch 1170/1540] avg loss 0.00424319, throughput 5.1565K wps
[Epoch 24 Batch 1200/1540] avg loss 0.00413042, throughput 4.57486K wps
[Epoch 24 Batch 1230/1540] avg loss 0.00436813, throughput 5.24914K wps
[Epoch 24 Batch 1260/1540] avg loss 0.00435248, throughput 5.22441K wps
[Epoch 24 Batch 1290/1540] avg loss 0.00431539, throughput 4.95364K wps
[Epoch 24 Batch 1320/1540] avg loss 0.0043837, throughput 5.44703K wps
[Epoch 24 Batch 1350/1540] avg loss 0.00450484, throughput 4.76258K wps
[Epoch 24 Batch 1380/1540] avg loss 0.00461219, throughput 5.35624K wps
[Epoch 24 Batch 1410/1540] avg loss 0.00428192, throughput 5.09839K wps
[Epoch 24 Batch 1440/1540] avg loss 0.00426535, throughput 4.59102K wps
[Epoch 24 Batch 1470/1540] avg loss 0.00485487, throughput 5.25471K wps
[Epoch 24 Batch 1500/1540] avg loss 0.00494488, throughput 5.21033K wps
[Epoch 24 Batch 1530/1540] avg loss 0.00438239, throughput 4.74428K wps
Begin Testing...
[Epoch 24] train avg loss 0.00451508, dev acc 0.8372, dev avg loss 0.367916, throughput 4.84762K wps
[Epoch 25 Batch 30/1540] avg loss 0.00464174, throughput 4.81617K wps
[Epoch 25 Batch 60/1540] avg loss 0.00404975, throughput 4.41904K wps
[Epoch 25 Batch 90/1540] avg loss 0.0043496, throughput 4.88139K wps
[Epoch 25 Batch 120/1540] avg loss 0.00460135, throughput 5.02378K wps
[Epoch 25 Batch 150/1540] avg loss 0.00473036, throughput 5.04764K wps
[Epoch 25 Batch 180/1540] avg loss 0.00415456, throughput 4.63442K wps
[Epoch 25 Batch 210/1540] avg loss 0.00409499, throughput 5.01702K wps
[Epoch 25 Batch 240/1540] avg loss 0.00470003, throughput 4.81643K wps
[Epoch 25 Batch 270/1540] avg loss 0.00443806, throughput 5.22337K wps
[Epoch 25 Batch 300/1540] avg loss 0.00408326, throughput 4.63503K wps
[Epoch 25 Batch 330/1540] avg loss 0.00429829, throughput 5.03549K wps
[Epoch 25 Batch 360/1540] avg loss 0.00441905, throughput 4.73695K wps
[Epoch 25 Batch 390/1540] avg loss 0.00428054, throughput 4.76559K wps
[Epoch 25 Batch 420/1540] avg loss 0.0045764, throughput 4.74565K wps
[Epoch 25 Batch 450/1540] avg loss 0.00426998, throughput 4.71152K wps
[Epoch 25 Batch 480/1540] avg loss 0.00436556, throughput 4.79204K wps
[Epoch 25 Batch 510/1540] avg loss 0.00455424, throughput 5.47444K wps
[Epoch 25 Batch 540/1540] avg loss 0.00429234, throughput 5.84816K wps
[Epoch 25 Batch 570/1540] avg loss 0.0045715, throughput 4.9904K wps
[Epoch 25 Batch 600/1540] avg loss 0.00429857, throughput 6.0598K wps
[Epoch 25 Batch 630/1540] avg loss 0.0041198, throughput 4.74473K wps
[Epoch 25 Batch 660/1540] avg loss 0.00454152, throughput 5.09303K wps
[Epoch 25 Batch 690/1540] avg loss 0.0045023, throughput 4.50071K wps
[Epoch 25 Batch 720/1540] avg loss 0.00462578, throughput 5.08979K wps
[Epoch 25 Batch 750/1540] avg loss 0.00480624, throughput 5.23726K wps
[Epoch 25 Batch 780/1540] avg loss 0.00449061, throughput 4.85655K wps
[Epoch 25 Batch 810/1540] avg loss 0.00418675, throughput 4.54449K wps
[Epoch 25 Batch 840/1540] avg loss 0.00434763, throughput 4.51483K wps
[Epoch 25 Batch 870/1540] avg loss 0.004313, throughput 5.11857K wps
[Epoch 25 Batch 900/1540] avg loss 0.00421277, throughput 4.92934K wps
[Epoch 25 Batch 930/1540] avg loss 0.00476544, throughput 5.05988K wps
[Epoch 25 Batch 960/1540] avg loss 0.00441229, throughput 4.71448K wps
[Epoch 25 Batch 990/1540] avg loss 0.00430833, throughput 5.15228K wps
[Epoch 25 Batch 1020/1540] avg loss 0.00470978, throughput 5.00927K wps
[Epoch 25 Batch 1050/1540] avg loss 0.00454753, throughput 4.83254K wps
[Epoch 25 Batch 1080/1540] avg loss 0.00421781, throughput 4.38552K wps
[Epoch 25 Batch 1110/1540] avg loss 0.00438539, throughput 4.56881K wps
[Epoch 25 Batch 1140/1540] avg loss 0.0046571, throughput 5.14904K wps
[Epoch 25 Batch 1170/1540] avg loss 0.00447703, throughput 5.01066K wps
[Epoch 25 Batch 1200/1540] avg loss 0.00457487, throughput 5.24096K wps
[Epoch 25 Batch 1230/1540] avg loss 0.00438729, throughput 5.88397K wps
[Epoch 25 Batch 1260/1540] avg loss 0.00459037, throughput 5.01086K wps
[Epoch 25 Batch 1290/1540] avg loss 0.00457112, throughput 4.96821K wps
[Epoch 25 Batch 1320/1540] avg loss 0.00455715, throughput 4.44787K wps
[Epoch 25 Batch 1350/1540] avg loss 0.0042225, throughput 5.10721K wps
[Epoch 25 Batch 1380/1540] avg loss 0.00434684, throughput 5.02453K wps
[Epoch 25 Batch 1410/1540] avg loss 0.00449556, throughput 5.1311K wps
[Epoch 25 Batch 1440/1540] avg loss 0.00451146, throughput 4.71031K wps
[Epoch 25 Batch 1470/1540] avg loss 0.00455396, throughput 4.71267K wps
[Epoch 25 Batch 1500/1540] avg loss 0.00488172, throughput 4.59414K wps
[Epoch 25 Batch 1530/1540] avg loss 0.00439414, throughput 5.565K wps
Begin Testing...
[Epoch 25] train avg loss 0.00444155, dev acc 0.8406, dev avg loss 0.365583, throughput 4.92501K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 26 Batch 30/1540] avg loss 0.00389918, throughput 5.66232K wps
[Epoch 26 Batch 60/1540] avg loss 0.00461794, throughput 5.377K wps
[Epoch 26 Batch 90/1540] avg loss 0.00418458, throughput 5.14744K wps
[Epoch 26 Batch 120/1540] avg loss 0.00446791, throughput 4.97359K wps
[Epoch 26 Batch 150/1540] avg loss 0.00445478, throughput 5.19285K wps
[Epoch 26 Batch 180/1540] avg loss 0.00429082, throughput 4.84893K wps
[Epoch 26 Batch 210/1540] avg loss 0.00423031, throughput 5.32698K wps
[Epoch 26 Batch 240/1540] avg loss 0.00455258, throughput 4.47346K wps
[Epoch 26 Batch 270/1540] avg loss 0.00465525, throughput 4.40135K wps
[Epoch 26 Batch 300/1540] avg loss 0.00423384, throughput 5.06606K wps
[Epoch 26 Batch 330/1540] avg loss 0.0044177, throughput 4.483K wps
[Epoch 26 Batch 360/1540] avg loss 0.00407916, throughput 4.7174K wps
[Epoch 26 Batch 390/1540] avg loss 0.00483953, throughput 4.73404K wps
[Epoch 26 Batch 420/1540] avg loss 0.00478218, throughput 4.83775K wps
[Epoch 26 Batch 450/1540] avg loss 0.00440652, throughput 4.98466K wps
[Epoch 26 Batch 480/1540] avg loss 0.00380495, throughput 5.37678K wps
[Epoch 26 Batch 510/1540] avg loss 0.00429613, throughput 5.56828K wps
[Epoch 26 Batch 540/1540] avg loss 0.0043835, throughput 4.42584K wps
[Epoch 26 Batch 570/1540] avg loss 0.00427944, throughput 4.97357K wps
[Epoch 26 Batch 600/1540] avg loss 0.00393839, throughput 4.6046K wps
[Epoch 26 Batch 630/1540] avg loss 0.00447377, throughput 4.94466K wps
[Epoch 26 Batch 660/1540] avg loss 0.00442693, throughput 4.98514K wps
[Epoch 26 Batch 690/1540] avg loss 0.00404994, throughput 5.12008K wps
[Epoch 26 Batch 720/1540] avg loss 0.00470498, throughput 4.96975K wps
[Epoch 26 Batch 750/1540] avg loss 0.0038833, throughput 4.39621K wps
[Epoch 26 Batch 780/1540] avg loss 0.00416765, throughput 5.19639K wps
[Epoch 26 Batch 810/1540] avg loss 0.00450433, throughput 5.33312K wps
[Epoch 26 Batch 840/1540] avg loss 0.00443191, throughput 5.25508K wps
[Epoch 26 Batch 870/1540] avg loss 0.00414892, throughput 5.052K wps
[Epoch 26 Batch 900/1540] avg loss 0.00451723, throughput 5.27311K wps
[Epoch 26 Batch 930/1540] avg loss 0.00454857, throughput 5.24506K wps
[Epoch 26 Batch 960/1540] avg loss 0.0041137, throughput 4.62518K wps
[Epoch 26 Batch 990/1540] avg loss 0.00426092, throughput 4.5263K wps
[Epoch 26 Batch 1020/1540] avg loss 0.0046004, throughput 4.98915K wps
[Epoch 26 Batch 1050/1540] avg loss 0.00452018, throughput 4.7284K wps
[Epoch 26 Batch 1080/1540] avg loss 0.00447166, throughput 4.84133K wps
[Epoch 26 Batch 1110/1540] avg loss 0.00424414, throughput 5.24138K wps
[Epoch 26 Batch 1140/1540] avg loss 0.00426473, throughput 4.63555K wps
[Epoch 26 Batch 1170/1540] avg loss 0.00467677, throughput 5.28085K wps
[Epoch 26 Batch 1200/1540] avg loss 0.00440902, throughput 4.66717K wps
[Epoch 26 Batch 1230/1540] avg loss 0.00467384, throughput 4.58447K wps
[Epoch 26 Batch 1260/1540] avg loss 0.00486869, throughput 4.40425K wps
[Epoch 26 Batch 1290/1540] avg loss 0.00455456, throughput 4.8151K wps
[Epoch 26 Batch 1320/1540] avg loss 0.00474655, throughput 4.90868K wps
[Epoch 26 Batch 1350/1540] avg loss 0.00446499, throughput 5.17286K wps
[Epoch 26 Batch 1380/1540] avg loss 0.00423992, throughput 5.07698K wps
[Epoch 26 Batch 1410/1540] avg loss 0.00445731, throughput 5.01437K wps
[Epoch 26 Batch 1440/1540] avg loss 0.00423303, throughput 4.78818K wps
[Epoch 26 Batch 1470/1540] avg loss 0.00444376, throughput 4.56675K wps
[Epoch 26 Batch 1500/1540] avg loss 0.00438819, throughput 4.65534K wps
[Epoch 26 Batch 1530/1540] avg loss 0.00411834, throughput 4.88559K wps
Begin Testing...
[Epoch 26] train avg loss 0.00437785, dev acc 0.8372, dev avg loss 0.366675, throughput 4.90898K wps
[Epoch 27 Batch 30/1540] avg loss 0.00445957, throughput 5.15172K wps
[Epoch 27 Batch 60/1540] avg loss 0.00425053, throughput 5.26539K wps
[Epoch 27 Batch 90/1540] avg loss 0.00427251, throughput 4.62869K wps
[Epoch 27 Batch 120/1540] avg loss 0.0045322, throughput 4.82853K wps
[Epoch 27 Batch 150/1540] avg loss 0.00421267, throughput 5.06972K wps
[Epoch 27 Batch 180/1540] avg loss 0.00390336, throughput 4.73678K wps
[Epoch 27 Batch 210/1540] avg loss 0.0044173, throughput 4.93392K wps
[Epoch 27 Batch 240/1540] avg loss 0.00432737, throughput 4.72695K wps
[Epoch 27 Batch 270/1540] avg loss 0.00407979, throughput 4.96008K wps
[Epoch 27 Batch 300/1540] avg loss 0.00428247, throughput 5.0984K wps
[Epoch 27 Batch 330/1540] avg loss 0.00403147, throughput 4.73163K wps
[Epoch 27 Batch 360/1540] avg loss 0.00404572, throughput 4.4744K wps
[Epoch 27 Batch 390/1540] avg loss 0.00454574, throughput 5.12429K wps
[Epoch 27 Batch 420/1540] avg loss 0.00450965, throughput 4.98129K wps
[Epoch 27 Batch 450/1540] avg loss 0.00420522, throughput 4.76992K wps
[Epoch 27 Batch 480/1540] avg loss 0.0042997, throughput 5.94679K wps
[Epoch 27 Batch 510/1540] avg loss 0.00418953, throughput 4.88124K wps
[Epoch 27 Batch 540/1540] avg loss 0.00420248, throughput 4.8316K wps
[Epoch 27 Batch 570/1540] avg loss 0.0038696, throughput 4.6067K wps
[Epoch 27 Batch 600/1540] avg loss 0.00450806, throughput 5.63362K wps
[Epoch 27 Batch 630/1540] avg loss 0.00423576, throughput 4.55601K wps
[Epoch 27 Batch 660/1540] avg loss 0.00400912, throughput 4.74521K wps
[Epoch 27 Batch 690/1540] avg loss 0.00433519, throughput 4.85929K wps
[Epoch 27 Batch 720/1540] avg loss 0.00452931, throughput 5.27818K wps
[Epoch 27 Batch 750/1540] avg loss 0.0044827, throughput 5.44606K wps
[Epoch 27 Batch 780/1540] avg loss 0.00409522, throughput 5.34912K wps
[Epoch 27 Batch 810/1540] avg loss 0.00469294, throughput 4.64473K wps
[Epoch 27 Batch 840/1540] avg loss 0.00437555, throughput 4.81439K wps
[Epoch 27 Batch 870/1540] avg loss 0.00464173, throughput 4.90204K wps
[Epoch 27 Batch 900/1540] avg loss 0.00399806, throughput 5.22596K wps
[Epoch 27 Batch 930/1540] avg loss 0.00474561, throughput 5.19746K wps
[Epoch 27 Batch 960/1540] avg loss 0.00449893, throughput 5.02294K wps
[Epoch 27 Batch 990/1540] avg loss 0.00458479, throughput 5.24428K wps
[Epoch 27 Batch 1020/1540] avg loss 0.00403284, throughput 4.78447K wps
[Epoch 27 Batch 1050/1540] avg loss 0.00389853, throughput 4.68152K wps
[Epoch 27 Batch 1080/1540] avg loss 0.00393733, throughput 5.67894K wps
[Epoch 27 Batch 1110/1540] avg loss 0.00421953, throughput 4.82133K wps
[Epoch 27 Batch 1140/1540] avg loss 0.00468541, throughput 4.74368K wps
[Epoch 27 Batch 1170/1540] avg loss 0.00427079, throughput 5.1246K wps
[Epoch 27 Batch 1200/1540] avg loss 0.00418652, throughput 4.4861K wps
[Epoch 27 Batch 1230/1540] avg loss 0.00400844, throughput 4.61425K wps
[Epoch 27 Batch 1260/1540] avg loss 0.00411992, throughput 4.58327K wps
[Epoch 27 Batch 1290/1540] avg loss 0.00451989, throughput 4.5942K wps
[Epoch 27 Batch 1320/1540] avg loss 0.0040097, throughput 4.66131K wps
[Epoch 27 Batch 1350/1540] avg loss 0.00473467, throughput 4.96534K wps
[Epoch 27 Batch 1380/1540] avg loss 0.00426369, throughput 4.33093K wps
[Epoch 27 Batch 1410/1540] avg loss 0.00473527, throughput 4.7691K wps
[Epoch 27 Batch 1440/1540] avg loss 0.00422279, throughput 4.85001K wps
[Epoch 27 Batch 1470/1540] avg loss 0.00393242, throughput 4.64551K wps
[Epoch 27 Batch 1500/1540] avg loss 0.00426153, throughput 4.76337K wps
[Epoch 27 Batch 1530/1540] avg loss 0.00411863, throughput 4.69747K wps
Begin Testing...
[Epoch 27] train avg loss 0.00429372, dev acc 0.8406, dev avg loss 0.370956, throughput 4.8871K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.13 s
[Epoch 28 Batch 30/1540] avg loss 0.00434431, throughput 4.78088K wps
[Epoch 28 Batch 60/1540] avg loss 0.00460522, throughput 4.7065K wps
[Epoch 28 Batch 90/1540] avg loss 0.00452851, throughput 5.02598K wps
[Epoch 28 Batch 120/1540] avg loss 0.00424762, throughput 5.14969K wps
[Epoch 28 Batch 150/1540] avg loss 0.00391494, throughput 4.44033K wps
[Epoch 28 Batch 180/1540] avg loss 0.00406607, throughput 4.80357K wps
[Epoch 28 Batch 210/1540] avg loss 0.00399416, throughput 5.11664K wps
[Epoch 28 Batch 240/1540] avg loss 0.00439876, throughput 4.75506K wps
[Epoch 28 Batch 270/1540] avg loss 0.00400942, throughput 4.52058K wps
[Epoch 28 Batch 300/1540] avg loss 0.00415929, throughput 5.12285K wps
[Epoch 28 Batch 330/1540] avg loss 0.00408912, throughput 4.60286K wps
[Epoch 28 Batch 360/1540] avg loss 0.00390662, throughput 5.36537K wps
[Epoch 28 Batch 390/1540] avg loss 0.004663, throughput 5.16356K wps
[Epoch 28 Batch 420/1540] avg loss 0.00438535, throughput 4.93417K wps
[Epoch 28 Batch 450/1540] avg loss 0.00413568, throughput 4.50057K wps
[Epoch 28 Batch 480/1540] avg loss 0.00389405, throughput 5.09254K wps
[Epoch 28 Batch 510/1540] avg loss 0.00452843, throughput 4.49398K wps
[Epoch 28 Batch 540/1540] avg loss 0.0041331, throughput 5.20117K wps
[Epoch 28 Batch 570/1540] avg loss 0.00436076, throughput 4.44886K wps
[Epoch 28 Batch 600/1540] avg loss 0.00444965, throughput 5.18724K wps
[Epoch 28 Batch 630/1540] avg loss 0.00398722, throughput 4.77747K wps
[Epoch 28 Batch 660/1540] avg loss 0.00425992, throughput 4.43553K wps
[Epoch 28 Batch 690/1540] avg loss 0.00451064, throughput 4.86374K wps
[Epoch 28 Batch 720/1540] avg loss 0.00411839, throughput 5.03597K wps
[Epoch 28 Batch 750/1540] avg loss 0.00424174, throughput 4.57704K wps
[Epoch 28 Batch 780/1540] avg loss 0.00414121, throughput 5.25873K wps
[Epoch 28 Batch 810/1540] avg loss 0.00410468, throughput 5.5874K wps
[Epoch 28 Batch 840/1540] avg loss 0.00416115, throughput 4.55392K wps
[Epoch 28 Batch 870/1540] avg loss 0.0042215, throughput 4.7677K wps
[Epoch 28 Batch 900/1540] avg loss 0.00400111, throughput 5.24575K wps
[Epoch 28 Batch 930/1540] avg loss 0.00472509, throughput 5.17504K wps
[Epoch 28 Batch 960/1540] avg loss 0.00391249, throughput 5.24056K wps
[Epoch 28 Batch 990/1540] avg loss 0.00407083, throughput 4.32759K wps
[Epoch 28 Batch 1020/1540] avg loss 0.00436181, throughput 5.10108K wps
[Epoch 28 Batch 1050/1540] avg loss 0.00425921, throughput 5.35788K wps
[Epoch 28 Batch 1080/1540] avg loss 0.00384342, throughput 5.05833K wps
[Epoch 28 Batch 1110/1540] avg loss 0.00452404, throughput 4.62417K wps
[Epoch 28 Batch 1140/1540] avg loss 0.00421661, throughput 4.48986K wps
[Epoch 28 Batch 1170/1540] avg loss 0.00402842, throughput 5.09389K wps
[Epoch 28 Batch 1200/1540] avg loss 0.00386948, throughput 4.44981K wps
[Epoch 28 Batch 1230/1540] avg loss 0.00423602, throughput 4.99562K wps
[Epoch 28 Batch 1260/1540] avg loss 0.00428664, throughput 5.06931K wps
[Epoch 28 Batch 1290/1540] avg loss 0.00400312, throughput 4.64799K wps
[Epoch 28 Batch 1320/1540] avg loss 0.00416396, throughput 5.30606K wps
[Epoch 28 Batch 1350/1540] avg loss 0.00454088, throughput 5.282K wps
[Epoch 28 Batch 1380/1540] avg loss 0.00416962, throughput 5.37415K wps
[Epoch 28 Batch 1410/1540] avg loss 0.00418771, throughput 5.24921K wps
[Epoch 28 Batch 1440/1540] avg loss 0.00448349, throughput 5.20191K wps
[Epoch 28 Batch 1470/1540] avg loss 0.00440752, throughput 5.29247K wps
[Epoch 28 Batch 1500/1540] avg loss 0.00419762, throughput 5.4397K wps
[Epoch 28 Batch 1530/1540] avg loss 0.00430461, throughput 4.80515K wps
Begin Testing...
[Epoch 28] train avg loss 0.00422204, dev acc 0.8429, dev avg loss 0.369501, throughput 4.91764K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 29 Batch 30/1540] avg loss 0.0041389, throughput 4.8286K wps
[Epoch 29 Batch 60/1540] avg loss 0.00384717, throughput 5.02875K wps
[Epoch 29 Batch 90/1540] avg loss 0.0039376, throughput 5.5052K wps
[Epoch 29 Batch 120/1540] avg loss 0.00413063, throughput 4.42082K wps
[Epoch 29 Batch 150/1540] avg loss 0.00372093, throughput 4.64789K wps
[Epoch 29 Batch 180/1540] avg loss 0.00422519, throughput 5.05235K wps
[Epoch 29 Batch 210/1540] avg loss 0.00428975, throughput 4.67333K wps
[Epoch 29 Batch 240/1540] avg loss 0.00434287, throughput 4.76883K wps
[Epoch 29 Batch 270/1540] avg loss 0.00433178, throughput 4.82576K wps
[Epoch 29 Batch 300/1540] avg loss 0.00385408, throughput 5.56543K wps
[Epoch 29 Batch 330/1540] avg loss 0.00420505, throughput 4.9298K wps
[Epoch 29 Batch 360/1540] avg loss 0.00420048, throughput 4.78295K wps
[Epoch 29 Batch 390/1540] avg loss 0.00434066, throughput 5.04673K wps
[Epoch 29 Batch 420/1540] avg loss 0.00417089, throughput 4.77054K wps
[Epoch 29 Batch 450/1540] avg loss 0.00444388, throughput 4.80349K wps
[Epoch 29 Batch 480/1540] avg loss 0.0039934, throughput 5.23138K wps
[Epoch 29 Batch 510/1540] avg loss 0.0043371, throughput 5.3825K wps
[Epoch 29 Batch 540/1540] avg loss 0.00413541, throughput 5.41873K wps
[Epoch 29 Batch 570/1540] avg loss 0.00449433, throughput 4.68442K wps
[Epoch 29 Batch 600/1540] avg loss 0.00431082, throughput 5.61675K wps
[Epoch 29 Batch 630/1540] avg loss 0.00441817, throughput 5.71315K wps
[Epoch 29 Batch 660/1540] avg loss 0.0042906, throughput 4.58066K wps
[Epoch 29 Batch 690/1540] avg loss 0.0040809, throughput 4.39977K wps
[Epoch 29 Batch 720/1540] avg loss 0.00410611, throughput 4.84067K wps
[Epoch 29 Batch 750/1540] avg loss 0.00435056, throughput 4.47234K wps
[Epoch 29 Batch 780/1540] avg loss 0.00452276, throughput 5.30718K wps
[Epoch 29 Batch 810/1540] avg loss 0.00416126, throughput 5.03427K wps
[Epoch 29 Batch 840/1540] avg loss 0.00388409, throughput 4.86612K wps
[Epoch 29 Batch 870/1540] avg loss 0.00430399, throughput 5.66743K wps
[Epoch 29 Batch 900/1540] avg loss 0.00436708, throughput 5.74476K wps
[Epoch 29 Batch 930/1540] avg loss 0.00387519, throughput 4.71653K wps
[Epoch 29 Batch 960/1540] avg loss 0.0041206, throughput 4.44226K wps
[Epoch 29 Batch 990/1540] avg loss 0.00436392, throughput 4.36601K wps
[Epoch 29 Batch 1020/1540] avg loss 0.00376454, throughput 5.65249K wps
[Epoch 29 Batch 1050/1540] avg loss 0.00382968, throughput 5.25448K wps
[Epoch 29 Batch 1080/1540] avg loss 0.00427453, throughput 5.46145K wps
[Epoch 29 Batch 1110/1540] avg loss 0.00479178, throughput 4.78116K wps
[Epoch 29 Batch 1140/1540] avg loss 0.00413552, throughput 4.95089K wps
[Epoch 29 Batch 1170/1540] avg loss 0.00387141, throughput 4.45706K wps
[Epoch 29 Batch 1200/1540] avg loss 0.00407228, throughput 5.38099K wps
[Epoch 29 Batch 1230/1540] avg loss 0.00404863, throughput 5.00768K wps
[Epoch 29 Batch 1260/1540] avg loss 0.00432244, throughput 5.88387K wps
[Epoch 29 Batch 1290/1540] avg loss 0.0043487, throughput 4.50959K wps
[Epoch 29 Batch 1320/1540] avg loss 0.00415311, throughput 4.94121K wps
[Epoch 29 Batch 1350/1540] avg loss 0.00421909, throughput 5.54444K wps
[Epoch 29 Batch 1380/1540] avg loss 0.00417505, throughput 5.35866K wps
[Epoch 29 Batch 1410/1540] avg loss 0.00373919, throughput 4.4154K wps
[Epoch 29 Batch 1440/1540] avg loss 0.00404729, throughput 4.47561K wps
[Epoch 29 Batch 1470/1540] avg loss 0.00432159, throughput 4.93419K wps
[Epoch 29 Batch 1500/1540] avg loss 0.00456535, throughput 4.89816K wps
[Epoch 29 Batch 1530/1540] avg loss 0.00407345, throughput 5.49679K wps
Begin Testing...
[Epoch 29] train avg loss 0.00418112, dev acc 0.8394, dev avg loss 0.365906, throughput 4.97278K wps
[Epoch 30 Batch 30/1540] avg loss 0.00445748, throughput 4.51713K wps
[Epoch 30 Batch 60/1540] avg loss 0.00429158, throughput 4.9029K wps
[Epoch 30 Batch 90/1540] avg loss 0.00420697, throughput 6.04768K wps
[Epoch 30 Batch 120/1540] avg loss 0.0042077, throughput 5.0206K wps
[Epoch 30 Batch 150/1540] avg loss 0.00431095, throughput 4.69976K wps
[Epoch 30 Batch 180/1540] avg loss 0.00419801, throughput 4.94308K wps
[Epoch 30 Batch 210/1540] avg loss 0.00411519, throughput 4.7963K wps
[Epoch 30 Batch 240/1540] avg loss 0.00392375, throughput 5.29074K wps
[Epoch 30 Batch 270/1540] avg loss 0.00389336, throughput 5.06041K wps
[Epoch 30 Batch 300/1540] avg loss 0.00393478, throughput 4.61846K wps
[Epoch 30 Batch 330/1540] avg loss 0.0042279, throughput 4.83716K wps
[Epoch 30 Batch 360/1540] avg loss 0.00408563, throughput 4.60622K wps
[Epoch 30 Batch 390/1540] avg loss 0.00422487, throughput 5.01634K wps
[Epoch 30 Batch 420/1540] avg loss 0.004722, throughput 5.75381K wps
[Epoch 30 Batch 450/1540] avg loss 0.00401208, throughput 5.09832K wps
[Epoch 30 Batch 480/1540] avg loss 0.00418609, throughput 5.50496K wps
[Epoch 30 Batch 510/1540] avg loss 0.00418628, throughput 5.25253K wps
[Epoch 30 Batch 540/1540] avg loss 0.00403016, throughput 4.7742K wps
[Epoch 30 Batch 570/1540] avg loss 0.00405215, throughput 4.49461K wps
[Epoch 30 Batch 600/1540] avg loss 0.00366288, throughput 4.93206K wps
[Epoch 30 Batch 630/1540] avg loss 0.00370212, throughput 4.95437K wps
[Epoch 30 Batch 660/1540] avg loss 0.0040263, throughput 5.41803K wps
[Epoch 30 Batch 690/1540] avg loss 0.00399179, throughput 4.76629K wps
[Epoch 30 Batch 720/1540] avg loss 0.00387652, throughput 4.66769K wps
[Epoch 30 Batch 750/1540] avg loss 0.00394059, throughput 5.07923K wps
[Epoch 30 Batch 780/1540] avg loss 0.00437231, throughput 4.66382K wps
[Epoch 30 Batch 810/1540] avg loss 0.00439958, throughput 5.3773K wps
[Epoch 30 Batch 840/1540] avg loss 0.00398043, throughput 5.29718K wps
[Epoch 30 Batch 870/1540] avg loss 0.00409302, throughput 4.99238K wps
[Epoch 30 Batch 900/1540] avg loss 0.00439633, throughput 5.05614K wps
[Epoch 30 Batch 930/1540] avg loss 0.00401992, throughput 4.57826K wps
[Epoch 30 Batch 960/1540] avg loss 0.00409872, throughput 5.13824K wps
[Epoch 30 Batch 990/1540] avg loss 0.00447847, throughput 5.28287K wps
[Epoch 30 Batch 1020/1540] avg loss 0.00440741, throughput 4.63926K wps
[Epoch 30 Batch 1050/1540] avg loss 0.00399016, throughput 5.34596K wps
[Epoch 30 Batch 1080/1540] avg loss 0.00408482, throughput 5.21045K wps
[Epoch 30 Batch 1110/1540] avg loss 0.00387895, throughput 5.47629K wps
[Epoch 30 Batch 1140/1540] avg loss 0.00415239, throughput 5.02116K wps
[Epoch 30 Batch 1170/1540] avg loss 0.00435371, throughput 5.52693K wps
[Epoch 30 Batch 1200/1540] avg loss 0.00397714, throughput 4.63974K wps
[Epoch 30 Batch 1230/1540] avg loss 0.00390049, throughput 4.65758K wps
[Epoch 30 Batch 1260/1540] avg loss 0.00439477, throughput 4.78734K wps
[Epoch 30 Batch 1290/1540] avg loss 0.00389096, throughput 5.23187K wps
[Epoch 30 Batch 1320/1540] avg loss 0.0042188, throughput 4.64204K wps
[Epoch 30 Batch 1350/1540] avg loss 0.00403368, throughput 5.00272K wps
[Epoch 30 Batch 1380/1540] avg loss 0.00420216, throughput 4.6976K wps
[Epoch 30 Batch 1410/1540] avg loss 0.00418217, throughput 5.28919K wps
[Epoch 30 Batch 1440/1540] avg loss 0.00415878, throughput 5.1969K wps
[Epoch 30 Batch 1470/1540] avg loss 0.00397276, throughput 5.43579K wps
[Epoch 30 Batch 1500/1540] avg loss 0.00404821, throughput 4.41482K wps
[Epoch 30 Batch 1530/1540] avg loss 0.00382418, throughput 5.07438K wps
Begin Testing...
[Epoch 30] train avg loss 0.00411913, dev acc 0.8406, dev avg loss 0.366854, throughput 4.99331K wps
[Epoch 31 Batch 30/1540] avg loss 0.00419063, throughput 5.1264K wps
[Epoch 31 Batch 60/1540] avg loss 0.00364233, throughput 5.09892K wps
[Epoch 31 Batch 90/1540] avg loss 0.00357768, throughput 4.81709K wps
[Epoch 31 Batch 120/1540] avg loss 0.00398667, throughput 5.38048K wps
[Epoch 31 Batch 150/1540] avg loss 0.00398275, throughput 4.80762K wps
[Epoch 31 Batch 180/1540] avg loss 0.00369811, throughput 4.6574K wps
[Epoch 31 Batch 210/1540] avg loss 0.00373753, throughput 5.01733K wps
[Epoch 31 Batch 240/1540] avg loss 0.00402239, throughput 4.91807K wps
[Epoch 31 Batch 270/1540] avg loss 0.00410026, throughput 4.60884K wps
[Epoch 31 Batch 300/1540] avg loss 0.00401763, throughput 4.69737K wps
[Epoch 31 Batch 330/1540] avg loss 0.00398547, throughput 4.60565K wps
[Epoch 31 Batch 360/1540] avg loss 0.0039823, throughput 5.31611K wps
[Epoch 31 Batch 390/1540] avg loss 0.00392085, throughput 5.21192K wps
[Epoch 31 Batch 420/1540] avg loss 0.00398038, throughput 5.3408K wps
[Epoch 31 Batch 450/1540] avg loss 0.00419584, throughput 4.61488K wps
[Epoch 31 Batch 480/1540] avg loss 0.00419692, throughput 5.08147K wps
[Epoch 31 Batch 510/1540] avg loss 0.00401112, throughput 4.55474K wps
[Epoch 31 Batch 540/1540] avg loss 0.00417716, throughput 5.63349K wps
[Epoch 31 Batch 570/1540] avg loss 0.0041529, throughput 4.77938K wps
[Epoch 31 Batch 600/1540] avg loss 0.00407844, throughput 5.08305K wps
[Epoch 31 Batch 630/1540] avg loss 0.00430374, throughput 5.72157K wps
[Epoch 31 Batch 660/1540] avg loss 0.00390304, throughput 4.83977K wps
[Epoch 31 Batch 690/1540] avg loss 0.00431547, throughput 4.77209K wps
[Epoch 31 Batch 720/1540] avg loss 0.00414129, throughput 4.57262K wps
[Epoch 31 Batch 750/1540] avg loss 0.00413951, throughput 5.16348K wps
[Epoch 31 Batch 780/1540] avg loss 0.00418725, throughput 4.71902K wps
[Epoch 31 Batch 810/1540] avg loss 0.00406648, throughput 5.25607K wps
[Epoch 31 Batch 840/1540] avg loss 0.00406159, throughput 5.28247K wps
[Epoch 31 Batch 870/1540] avg loss 0.00399595, throughput 4.66748K wps
[Epoch 31 Batch 900/1540] avg loss 0.00418005, throughput 4.6561K wps
[Epoch 31 Batch 930/1540] avg loss 0.00395438, throughput 4.54506K wps
[Epoch 31 Batch 960/1540] avg loss 0.00381509, throughput 5.21815K wps
[Epoch 31 Batch 990/1540] avg loss 0.00414892, throughput 5.21412K wps
[Epoch 31 Batch 1020/1540] avg loss 0.00417014, throughput 5.29006K wps
[Epoch 31 Batch 1050/1540] avg loss 0.00394452, throughput 4.67042K wps
[Epoch 31 Batch 1080/1540] avg loss 0.00396079, throughput 4.52388K wps
[Epoch 31 Batch 1110/1540] avg loss 0.00383752, throughput 5.01128K wps
[Epoch 31 Batch 1140/1540] avg loss 0.00406322, throughput 4.98211K wps
[Epoch 31 Batch 1170/1540] avg loss 0.00388749, throughput 5.81185K wps
[Epoch 31 Batch 1200/1540] avg loss 0.0040489, throughput 5.27161K wps
[Epoch 31 Batch 1230/1540] avg loss 0.0040874, throughput 5.29212K wps
[Epoch 31 Batch 1260/1540] avg loss 0.00373974, throughput 5.42277K wps
[Epoch 31 Batch 1290/1540] avg loss 0.00425762, throughput 4.47546K wps
[Epoch 31 Batch 1320/1540] avg loss 0.00385115, throughput 5.07312K wps
[Epoch 31 Batch 1350/1540] avg loss 0.00415012, throughput 4.80194K wps
[Epoch 31 Batch 1380/1540] avg loss 0.00418632, throughput 5.00584K wps
[Epoch 31 Batch 1410/1540] avg loss 0.00442429, throughput 5.27463K wps
[Epoch 31 Batch 1440/1540] avg loss 0.00443198, throughput 5.21517K wps
[Epoch 31 Batch 1470/1540] avg loss 0.00370398, throughput 4.89187K wps
[Epoch 31 Batch 1500/1540] avg loss 0.00395815, throughput 4.65986K wps
[Epoch 31 Batch 1530/1540] avg loss 0.00388477, throughput 5.16161K wps
Begin Testing...
[Epoch 31] train avg loss 0.00402773, dev acc 0.8475, dev avg loss 0.367162, throughput 4.97149K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 32 Batch 30/1540] avg loss 0.00376036, throughput 5.07985K wps
[Epoch 32 Batch 60/1540] avg loss 0.00375215, throughput 5.20499K wps
[Epoch 32 Batch 90/1540] avg loss 0.00406815, throughput 5.43382K wps
[Epoch 32 Batch 120/1540] avg loss 0.00406176, throughput 4.92406K wps
[Epoch 32 Batch 150/1540] avg loss 0.00373547, throughput 5.08875K wps
[Epoch 32 Batch 180/1540] avg loss 0.00391993, throughput 4.83464K wps
[Epoch 32 Batch 210/1540] avg loss 0.0039163, throughput 4.5602K wps
[Epoch 32 Batch 240/1540] avg loss 0.00415451, throughput 4.85825K wps
[Epoch 32 Batch 270/1540] avg loss 0.00393445, throughput 5.11206K wps
[Epoch 32 Batch 300/1540] avg loss 0.00392275, throughput 4.7207K wps
[Epoch 32 Batch 330/1540] avg loss 0.00421948, throughput 4.74732K wps
[Epoch 32 Batch 360/1540] avg loss 0.00368443, throughput 4.81092K wps
[Epoch 32 Batch 390/1540] avg loss 0.00373249, throughput 5.23585K wps
[Epoch 32 Batch 420/1540] avg loss 0.00394812, throughput 5.00549K wps
[Epoch 32 Batch 450/1540] avg loss 0.00373825, throughput 4.97014K wps
[Epoch 32 Batch 480/1540] avg loss 0.00377035, throughput 4.79682K wps
[Epoch 32 Batch 510/1540] avg loss 0.00384799, throughput 5.28371K wps
[Epoch 32 Batch 540/1540] avg loss 0.00384507, throughput 4.93802K wps
[Epoch 32 Batch 570/1540] avg loss 0.00417372, throughput 4.74918K wps
[Epoch 32 Batch 600/1540] avg loss 0.00390922, throughput 5.145K wps
[Epoch 32 Batch 630/1540] avg loss 0.00370013, throughput 4.77126K wps
[Epoch 32 Batch 660/1540] avg loss 0.00385048, throughput 5.17389K wps
[Epoch 32 Batch 690/1540] avg loss 0.0042512, throughput 5.01762K wps
[Epoch 32 Batch 720/1540] avg loss 0.00407747, throughput 4.9765K wps
[Epoch 32 Batch 750/1540] avg loss 0.00379411, throughput 4.64395K wps
[Epoch 32 Batch 780/1540] avg loss 0.00378288, throughput 4.69735K wps
[Epoch 32 Batch 810/1540] avg loss 0.00396337, throughput 5.00585K wps
[Epoch 32 Batch 840/1540] avg loss 0.00388554, throughput 4.80891K wps
[Epoch 32 Batch 870/1540] avg loss 0.00442335, throughput 5.68002K wps
[Epoch 32 Batch 900/1540] avg loss 0.00390755, throughput 4.83948K wps
[Epoch 32 Batch 930/1540] avg loss 0.00450142, throughput 5.02589K wps
[Epoch 32 Batch 960/1540] avg loss 0.00380047, throughput 5.6613K wps
[Epoch 32 Batch 990/1540] avg loss 0.0039534, throughput 5.56392K wps
[Epoch 32 Batch 1020/1540] avg loss 0.00430536, throughput 5.51445K wps
[Epoch 32 Batch 1050/1540] avg loss 0.00361468, throughput 4.53201K wps
[Epoch 32 Batch 1080/1540] avg loss 0.00391397, throughput 4.8334K wps
[Epoch 32 Batch 1110/1540] avg loss 0.00366272, throughput 4.66261K wps
[Epoch 32 Batch 1140/1540] avg loss 0.0041101, throughput 4.78563K wps
[Epoch 32 Batch 1170/1540] avg loss 0.00402659, throughput 4.66723K wps
[Epoch 32 Batch 1200/1540] avg loss 0.00425733, throughput 4.74966K wps
[Epoch 32 Batch 1230/1540] avg loss 0.00397581, throughput 4.49223K wps
[Epoch 32 Batch 1260/1540] avg loss 0.00445702, throughput 4.83723K wps
[Epoch 32 Batch 1290/1540] avg loss 0.0042036, throughput 4.56244K wps
[Epoch 32 Batch 1320/1540] avg loss 0.00424412, throughput 5.17697K wps
[Epoch 32 Batch 1350/1540] avg loss 0.00377782, throughput 5.20963K wps
[Epoch 32 Batch 1380/1540] avg loss 0.00374396, throughput 4.85077K wps
[Epoch 32 Batch 1410/1540] avg loss 0.00396169, throughput 5.19901K wps
[Epoch 32 Batch 1440/1540] avg loss 0.00387414, throughput 4.49171K wps
[Epoch 32 Batch 1470/1540] avg loss 0.00373448, throughput 4.82598K wps
[Epoch 32 Batch 1500/1540] avg loss 0.0042927, throughput 5.90441K wps
[Epoch 32 Batch 1530/1540] avg loss 0.00374069, throughput 5.13546K wps
Begin Testing...
[Epoch 32] train avg loss 0.00396161, dev acc 0.8452, dev avg loss 0.369497, throughput 4.9534K wps
[Epoch 33 Batch 30/1540] avg loss 0.0039201, throughput 4.71419K wps
[Epoch 33 Batch 60/1540] avg loss 0.00367733, throughput 5.2707K wps
[Epoch 33 Batch 90/1540] avg loss 0.00425139, throughput 5.32959K wps
[Epoch 33 Batch 120/1540] avg loss 0.00399377, throughput 4.76073K wps
[Epoch 33 Batch 150/1540] avg loss 0.00380435, throughput 4.6653K wps
[Epoch 33 Batch 180/1540] avg loss 0.00382912, throughput 5.01115K wps
[Epoch 33 Batch 210/1540] avg loss 0.00353698, throughput 5.13411K wps
[Epoch 33 Batch 240/1540] avg loss 0.0036655, throughput 5.34K wps
[Epoch 33 Batch 270/1540] avg loss 0.00400589, throughput 5.30228K wps
[Epoch 33 Batch 300/1540] avg loss 0.00419672, throughput 4.92376K wps
[Epoch 33 Batch 330/1540] avg loss 0.00371658, throughput 5.01327K wps
[Epoch 33 Batch 360/1540] avg loss 0.00373864, throughput 5.68418K wps
[Epoch 33 Batch 390/1540] avg loss 0.00379675, throughput 5.41211K wps
[Epoch 33 Batch 420/1540] avg loss 0.00414164, throughput 4.6643K wps
[Epoch 33 Batch 450/1540] avg loss 0.0037732, throughput 5.66112K wps
[Epoch 33 Batch 480/1540] avg loss 0.00402474, throughput 4.62503K wps
[Epoch 33 Batch 510/1540] avg loss 0.00371926, throughput 4.77263K wps
[Epoch 33 Batch 540/1540] avg loss 0.00391948, throughput 4.50853K wps
[Epoch 33 Batch 570/1540] avg loss 0.00398406, throughput 4.45553K wps
[Epoch 33 Batch 600/1540] avg loss 0.00388408, throughput 4.46113K wps
[Epoch 33 Batch 630/1540] avg loss 0.00397342, throughput 5.30911K wps
[Epoch 33 Batch 660/1540] avg loss 0.00396809, throughput 4.59263K wps
[Epoch 33 Batch 690/1540] avg loss 0.00384841, throughput 5.09268K wps
[Epoch 33 Batch 720/1540] avg loss 0.00431819, throughput 4.51909K wps
[Epoch 33 Batch 750/1540] avg loss 0.00408924, throughput 4.95191K wps
[Epoch 33 Batch 780/1540] avg loss 0.00376369, throughput 4.56477K wps
[Epoch 33 Batch 810/1540] avg loss 0.00393598, throughput 4.88182K wps
[Epoch 33 Batch 840/1540] avg loss 0.00377254, throughput 5.2497K wps
[Epoch 33 Batch 870/1540] avg loss 0.00396709, throughput 5.01073K wps
[Epoch 33 Batch 900/1540] avg loss 0.00384749, throughput 5.32K wps
[Epoch 33 Batch 930/1540] avg loss 0.00389103, throughput 4.85478K wps
[Epoch 33 Batch 960/1540] avg loss 0.00380689, throughput 5.16292K wps
[Epoch 33 Batch 990/1540] avg loss 0.00428425, throughput 4.81114K wps
[Epoch 33 Batch 1020/1540] avg loss 0.00385881, throughput 4.72112K wps
[Epoch 33 Batch 1050/1540] avg loss 0.00403825, throughput 4.92087K wps
[Epoch 33 Batch 1080/1540] avg loss 0.00387981, throughput 4.46742K wps
[Epoch 33 Batch 1110/1540] avg loss 0.00420083, throughput 4.62951K wps
[Epoch 33 Batch 1140/1540] avg loss 0.00397325, throughput 5.09858K wps
[Epoch 33 Batch 1170/1540] avg loss 0.00373698, throughput 4.87884K wps
[Epoch 33 Batch 1200/1540] avg loss 0.00363317, throughput 5.11004K wps
[Epoch 33 Batch 1230/1540] avg loss 0.00349299, throughput 4.52666K wps
[Epoch 33 Batch 1260/1540] avg loss 0.00367047, throughput 4.75067K wps
[Epoch 33 Batch 1290/1540] avg loss 0.00392723, throughput 4.86799K wps
[Epoch 33 Batch 1320/1540] avg loss 0.00411438, throughput 4.44656K wps
[Epoch 33 Batch 1350/1540] avg loss 0.00417535, throughput 4.64179K wps
[Epoch 33 Batch 1380/1540] avg loss 0.00409863, throughput 5.38652K wps
[Epoch 33 Batch 1410/1540] avg loss 0.00368031, throughput 4.91701K wps
[Epoch 33 Batch 1440/1540] avg loss 0.00404719, throughput 4.89703K wps
[Epoch 33 Batch 1470/1540] avg loss 0.00405532, throughput 4.54624K wps
[Epoch 33 Batch 1500/1540] avg loss 0.00374862, throughput 4.91322K wps
[Epoch 33 Batch 1530/1540] avg loss 0.00407187, throughput 5.20097K wps
Begin Testing...
[Epoch 33] train avg loss 0.00391772, dev acc 0.8440, dev avg loss 0.370316, throughput 4.89992K wps
[Epoch 34 Batch 30/1540] avg loss 0.00336856, throughput 5.20048K wps
[Epoch 34 Batch 60/1540] avg loss 0.00372024, throughput 5.30349K wps
[Epoch 34 Batch 90/1540] avg loss 0.00391524, throughput 5.47486K wps
[Epoch 34 Batch 120/1540] avg loss 0.00435383, throughput 5.32879K wps
[Epoch 34 Batch 150/1540] avg loss 0.00437403, throughput 4.21199K wps
[Epoch 34 Batch 180/1540] avg loss 0.00315215, throughput 4.76245K wps
[Epoch 34 Batch 210/1540] avg loss 0.00374628, throughput 5.01651K wps
[Epoch 34 Batch 240/1540] avg loss 0.00340561, throughput 4.80042K wps
[Epoch 34 Batch 270/1540] avg loss 0.00425183, throughput 4.57918K wps
[Epoch 34 Batch 300/1540] avg loss 0.0035746, throughput 4.67707K wps
[Epoch 34 Batch 330/1540] avg loss 0.00347205, throughput 4.37268K wps
[Epoch 34 Batch 360/1540] avg loss 0.0036817, throughput 4.8291K wps
[Epoch 34 Batch 390/1540] avg loss 0.00336968, throughput 5.21423K wps
[Epoch 34 Batch 420/1540] avg loss 0.00346278, throughput 4.28158K wps
[Epoch 34 Batch 450/1540] avg loss 0.00374491, throughput 5.11878K wps
[Epoch 34 Batch 480/1540] avg loss 0.00382501, throughput 5.00808K wps
[Epoch 34 Batch 510/1540] avg loss 0.0041587, throughput 5.42368K wps
[Epoch 34 Batch 540/1540] avg loss 0.00401827, throughput 5.17084K wps
[Epoch 34 Batch 570/1540] avg loss 0.00365172, throughput 5.4925K wps
[Epoch 34 Batch 600/1540] avg loss 0.00383961, throughput 4.89351K wps
[Epoch 34 Batch 630/1540] avg loss 0.00382082, throughput 5.21783K wps
[Epoch 34 Batch 660/1540] avg loss 0.00379428, throughput 5.04492K wps
[Epoch 34 Batch 690/1540] avg loss 0.00355843, throughput 5.09662K wps
[Epoch 34 Batch 720/1540] avg loss 0.00400574, throughput 4.73113K wps
[Epoch 34 Batch 750/1540] avg loss 0.00403191, throughput 4.4337K wps
[Epoch 34 Batch 780/1540] avg loss 0.00387476, throughput 5.36794K wps
[Epoch 34 Batch 810/1540] avg loss 0.00443883, throughput 5.39863K wps
[Epoch 34 Batch 840/1540] avg loss 0.00392587, throughput 4.35964K wps
[Epoch 34 Batch 870/1540] avg loss 0.0037608, throughput 4.60561K wps
[Epoch 34 Batch 900/1540] avg loss 0.00394028, throughput 5.26909K wps
[Epoch 34 Batch 930/1540] avg loss 0.00409168, throughput 5.18774K wps
[Epoch 34 Batch 960/1540] avg loss 0.00370505, throughput 4.83854K wps
[Epoch 34 Batch 990/1540] avg loss 0.00404544, throughput 5.26044K wps
[Epoch 34 Batch 1020/1540] avg loss 0.00369021, throughput 5.04572K wps
[Epoch 34 Batch 1050/1540] avg loss 0.00371128, throughput 4.73793K wps
[Epoch 34 Batch 1080/1540] avg loss 0.00395368, throughput 5.7626K wps
[Epoch 34 Batch 1110/1540] avg loss 0.00358344, throughput 4.65213K wps
[Epoch 34 Batch 1140/1540] avg loss 0.00384376, throughput 4.81618K wps
[Epoch 34 Batch 1170/1540] avg loss 0.00403512, throughput 5.08354K wps
[Epoch 34 Batch 1200/1540] avg loss 0.00399693, throughput 5.31453K wps
[Epoch 34 Batch 1230/1540] avg loss 0.00406943, throughput 5.23812K wps
[Epoch 34 Batch 1260/1540] avg loss 0.00391265, throughput 5.01837K wps
[Epoch 34 Batch 1290/1540] avg loss 0.00384853, throughput 4.9072K wps
[Epoch 34 Batch 1320/1540] avg loss 0.00414774, throughput 4.47849K wps
[Epoch 34 Batch 1350/1540] avg loss 0.00390385, throughput 5.1037K wps
[Epoch 34 Batch 1380/1540] avg loss 0.00326333, throughput 5.44701K wps
[Epoch 34 Batch 1410/1540] avg loss 0.00380385, throughput 4.87701K wps
[Epoch 34 Batch 1440/1540] avg loss 0.00354533, throughput 4.55951K wps
[Epoch 34 Batch 1470/1540] avg loss 0.00396648, throughput 5.04108K wps
[Epoch 34 Batch 1500/1540] avg loss 0.00374601, throughput 5.4068K wps
[Epoch 34 Batch 1530/1540] avg loss 0.00417278, throughput 5.13354K wps
Begin Testing...
[Epoch 34] train avg loss 0.003832, dev acc 0.8475, dev avg loss 0.368025, throughput 4.96712K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 35 Batch 30/1540] avg loss 0.00349529, throughput 4.4923K wps
[Epoch 35 Batch 60/1540] avg loss 0.00343669, throughput 4.63134K wps
[Epoch 35 Batch 90/1540] avg loss 0.00377596, throughput 4.89936K wps
[Epoch 35 Batch 120/1540] avg loss 0.00368109, throughput 4.70477K wps
[Epoch 35 Batch 150/1540] avg loss 0.00354375, throughput 4.55164K wps
[Epoch 35 Batch 180/1540] avg loss 0.00413134, throughput 4.9743K wps
[Epoch 35 Batch 210/1540] avg loss 0.00370848, throughput 4.85951K wps
[Epoch 35 Batch 240/1540] avg loss 0.00372763, throughput 5.01012K wps
[Epoch 35 Batch 270/1540] avg loss 0.00365926, throughput 5.47318K wps
[Epoch 35 Batch 300/1540] avg loss 0.00356134, throughput 4.68744K wps
[Epoch 35 Batch 330/1540] avg loss 0.00426857, throughput 4.72024K wps
[Epoch 35 Batch 360/1540] avg loss 0.00377196, throughput 5.07858K wps
[Epoch 35 Batch 390/1540] avg loss 0.00380859, throughput 5.0374K wps
[Epoch 35 Batch 420/1540] avg loss 0.00349535, throughput 4.8511K wps
[Epoch 35 Batch 450/1540] avg loss 0.00402161, throughput 4.59571K wps
[Epoch 35 Batch 480/1540] avg loss 0.00396873, throughput 4.88011K wps
[Epoch 35 Batch 510/1540] avg loss 0.00395017, throughput 4.4321K wps
[Epoch 35 Batch 540/1540] avg loss 0.00420633, throughput 4.63765K wps
[Epoch 35 Batch 570/1540] avg loss 0.00396235, throughput 4.90145K wps
[Epoch 35 Batch 600/1540] avg loss 0.00377418, throughput 4.81944K wps
[Epoch 35 Batch 630/1540] avg loss 0.00359163, throughput 4.46597K wps
[Epoch 35 Batch 660/1540] avg loss 0.003785, throughput 4.75483K wps
[Epoch 35 Batch 690/1540] avg loss 0.00401632, throughput 5.19672K wps
[Epoch 35 Batch 720/1540] avg loss 0.00374835, throughput 4.89922K wps
[Epoch 35 Batch 750/1540] avg loss 0.00384909, throughput 4.81603K wps
[Epoch 35 Batch 780/1540] avg loss 0.00397416, throughput 5.03222K wps
[Epoch 35 Batch 810/1540] avg loss 0.00435708, throughput 4.37158K wps
[Epoch 35 Batch 840/1540] avg loss 0.00385021, throughput 4.98155K wps
[Epoch 35 Batch 870/1540] avg loss 0.00380727, throughput 5.47447K wps
[Epoch 35 Batch 900/1540] avg loss 0.00358874, throughput 5.091K wps
[Epoch 35 Batch 930/1540] avg loss 0.00374446, throughput 5.20761K wps
[Epoch 35 Batch 960/1540] avg loss 0.00384189, throughput 5.3196K wps
[Epoch 35 Batch 990/1540] avg loss 0.00362345, throughput 4.58077K wps
[Epoch 35 Batch 1020/1540] avg loss 0.00355177, throughput 4.76851K wps
[Epoch 35 Batch 1050/1540] avg loss 0.00367429, throughput 5.4386K wps
[Epoch 35 Batch 1080/1540] avg loss 0.00366337, throughput 5.11557K wps
[Epoch 35 Batch 1110/1540] avg loss 0.0036888, throughput 5.22819K wps
[Epoch 35 Batch 1140/1540] avg loss 0.00389405, throughput 5.66204K wps
[Epoch 35 Batch 1170/1540] avg loss 0.00406415, throughput 5.12693K wps
[Epoch 35 Batch 1200/1540] avg loss 0.0038073, throughput 5.0552K wps
[Epoch 35 Batch 1230/1540] avg loss 0.0042173, throughput 4.72704K wps
[Epoch 35 Batch 1260/1540] avg loss 0.00438207, throughput 4.78891K wps
[Epoch 35 Batch 1290/1540] avg loss 0.00378681, throughput 5.21809K wps
[Epoch 35 Batch 1320/1540] avg loss 0.00344459, throughput 5.03779K wps
[Epoch 35 Batch 1350/1540] avg loss 0.00383842, throughput 4.84562K wps
[Epoch 35 Batch 1380/1540] avg loss 0.0037258, throughput 4.60781K wps
[Epoch 35 Batch 1410/1540] avg loss 0.00396453, throughput 4.80543K wps
[Epoch 35 Batch 1440/1540] avg loss 0.00347251, throughput 4.88635K wps
[Epoch 35 Batch 1470/1540] avg loss 0.00367956, throughput 4.99011K wps
[Epoch 35 Batch 1500/1540] avg loss 0.00350766, throughput 4.79661K wps
[Epoch 35 Batch 1530/1540] avg loss 0.00410431, throughput 5.0558K wps
Begin Testing...
[Epoch 35] train avg loss 0.00380928, dev acc 0.8452, dev avg loss 0.376362, throughput 4.89571K wps
[Epoch 36 Batch 30/1540] avg loss 0.00366356, throughput 5.14482K wps
[Epoch 36 Batch 60/1540] avg loss 0.00393136, throughput 4.6924K wps
[Epoch 36 Batch 90/1540] avg loss 0.00404715, throughput 4.64726K wps
[Epoch 36 Batch 120/1540] avg loss 0.00393926, throughput 4.70183K wps
[Epoch 36 Batch 150/1540] avg loss 0.00344259, throughput 4.69028K wps
[Epoch 36 Batch 180/1540] avg loss 0.00338383, throughput 4.92498K wps
[Epoch 36 Batch 210/1540] avg loss 0.00344079, throughput 4.95456K wps
[Epoch 36 Batch 240/1540] avg loss 0.00339803, throughput 5.38546K wps
[Epoch 36 Batch 270/1540] avg loss 0.00347977, throughput 4.91345K wps
[Epoch 36 Batch 300/1540] avg loss 0.00398889, throughput 4.92964K wps
[Epoch 36 Batch 330/1540] avg loss 0.00360495, throughput 5.39208K wps
[Epoch 36 Batch 360/1540] avg loss 0.00338537, throughput 4.80085K wps
[Epoch 36 Batch 390/1540] avg loss 0.00400689, throughput 5.11534K wps
[Epoch 36 Batch 420/1540] avg loss 0.00370365, throughput 4.78982K wps
[Epoch 36 Batch 450/1540] avg loss 0.00376924, throughput 5.50306K wps
[Epoch 36 Batch 480/1540] avg loss 0.00371941, throughput 4.76218K wps
[Epoch 36 Batch 510/1540] avg loss 0.00410777, throughput 4.36646K wps
[Epoch 36 Batch 540/1540] avg loss 0.00383507, throughput 4.48368K wps
[Epoch 36 Batch 570/1540] avg loss 0.0033731, throughput 5.34869K wps
[Epoch 36 Batch 600/1540] avg loss 0.00425531, throughput 4.60928K wps
[Epoch 36 Batch 630/1540] avg loss 0.00325178, throughput 4.43422K wps
[Epoch 36 Batch 660/1540] avg loss 0.00377925, throughput 4.55065K wps
[Epoch 36 Batch 690/1540] avg loss 0.00388877, throughput 5.30635K wps
[Epoch 36 Batch 720/1540] avg loss 0.00372639, throughput 4.9431K wps
[Epoch 36 Batch 750/1540] avg loss 0.00364534, throughput 4.76896K wps
[Epoch 36 Batch 780/1540] avg loss 0.00358866, throughput 4.86301K wps
[Epoch 36 Batch 810/1540] avg loss 0.0038895, throughput 5.41448K wps
[Epoch 36 Batch 840/1540] avg loss 0.00387503, throughput 5.08449K wps
[Epoch 36 Batch 870/1540] avg loss 0.00346613, throughput 5.05384K wps
[Epoch 36 Batch 900/1540] avg loss 0.00385723, throughput 5.38636K wps
[Epoch 36 Batch 930/1540] avg loss 0.00377628, throughput 4.58049K wps
[Epoch 36 Batch 960/1540] avg loss 0.00383567, throughput 5.07492K wps
[Epoch 36 Batch 990/1540] avg loss 0.00380504, throughput 5.2491K wps
[Epoch 36 Batch 1020/1540] avg loss 0.00378183, throughput 5.00758K wps
[Epoch 36 Batch 1050/1540] avg loss 0.00425175, throughput 4.57477K wps
[Epoch 36 Batch 1080/1540] avg loss 0.00396156, throughput 5.05098K wps
[Epoch 36 Batch 1110/1540] avg loss 0.00362671, throughput 5.11378K wps
[Epoch 36 Batch 1140/1540] avg loss 0.00367906, throughput 5.39723K wps
[Epoch 36 Batch 1170/1540] avg loss 0.00419524, throughput 4.76662K wps
[Epoch 36 Batch 1200/1540] avg loss 0.00395269, throughput 4.84035K wps
[Epoch 36 Batch 1230/1540] avg loss 0.00373028, throughput 5.50295K wps
[Epoch 36 Batch 1260/1540] avg loss 0.00377987, throughput 4.62431K wps
[Epoch 36 Batch 1290/1540] avg loss 0.00394861, throughput 4.40279K wps
[Epoch 36 Batch 1320/1540] avg loss 0.00350176, throughput 5.13129K wps
[Epoch 36 Batch 1350/1540] avg loss 0.00418586, throughput 4.62832K wps
[Epoch 36 Batch 1380/1540] avg loss 0.00396506, throughput 4.46681K wps
[Epoch 36 Batch 1410/1540] avg loss 0.00356515, throughput 4.59449K wps
[Epoch 36 Batch 1440/1540] avg loss 0.00405088, throughput 5.02446K wps
[Epoch 36 Batch 1470/1540] avg loss 0.00358263, throughput 4.68392K wps
[Epoch 36 Batch 1500/1540] avg loss 0.00372671, throughput 5.07867K wps
[Epoch 36 Batch 1530/1540] avg loss 0.00389607, throughput 5.06624K wps
Begin Testing...
[Epoch 36] train avg loss 0.00376887, dev acc 0.8475, dev avg loss 0.371433, throughput 4.89495K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 37 Batch 30/1540] avg loss 0.00408444, throughput 5.47417K wps
[Epoch 37 Batch 60/1540] avg loss 0.00305858, throughput 4.79352K wps
[Epoch 37 Batch 90/1540] avg loss 0.00393085, throughput 5.18904K wps
[Epoch 37 Batch 120/1540] avg loss 0.00395005, throughput 5.40948K wps
[Epoch 37 Batch 150/1540] avg loss 0.0041056, throughput 5.147K wps
[Epoch 37 Batch 180/1540] avg loss 0.0033935, throughput 4.90534K wps
[Epoch 37 Batch 210/1540] avg loss 0.00329629, throughput 5.39067K wps
[Epoch 37 Batch 240/1540] avg loss 0.00342483, throughput 4.75997K wps
[Epoch 37 Batch 270/1540] avg loss 0.00386816, throughput 4.54519K wps
[Epoch 37 Batch 300/1540] avg loss 0.00369748, throughput 4.69028K wps
[Epoch 37 Batch 330/1540] avg loss 0.00346749, throughput 4.69064K wps
[Epoch 37 Batch 360/1540] avg loss 0.00370123, throughput 4.41074K wps
[Epoch 37 Batch 390/1540] avg loss 0.00403758, throughput 4.73333K wps
[Epoch 37 Batch 420/1540] avg loss 0.00416444, throughput 5.14404K wps
[Epoch 37 Batch 450/1540] avg loss 0.00378144, throughput 5.54263K wps
[Epoch 37 Batch 480/1540] avg loss 0.00398785, throughput 5.46866K wps
[Epoch 37 Batch 510/1540] avg loss 0.00373263, throughput 4.54553K wps
[Epoch 37 Batch 540/1540] avg loss 0.00368474, throughput 4.45648K wps
[Epoch 37 Batch 570/1540] avg loss 0.00403642, throughput 5.16574K wps
[Epoch 37 Batch 600/1540] avg loss 0.00341123, throughput 4.49316K wps
[Epoch 37 Batch 630/1540] avg loss 0.00354821, throughput 5.21716K wps
[Epoch 37 Batch 660/1540] avg loss 0.00384102, throughput 4.80507K wps
[Epoch 37 Batch 690/1540] avg loss 0.00380397, throughput 4.76278K wps
[Epoch 37 Batch 720/1540] avg loss 0.00330931, throughput 5.07075K wps
[Epoch 37 Batch 750/1540] avg loss 0.0037543, throughput 4.89226K wps
[Epoch 37 Batch 780/1540] avg loss 0.0036407, throughput 5.07228K wps
[Epoch 37 Batch 810/1540] avg loss 0.00369392, throughput 5.14117K wps
[Epoch 37 Batch 840/1540] avg loss 0.00395975, throughput 4.96845K wps
[Epoch 37 Batch 870/1540] avg loss 0.00358462, throughput 5.76579K wps
[Epoch 37 Batch 900/1540] avg loss 0.00381564, throughput 4.96423K wps
[Epoch 37 Batch 930/1540] avg loss 0.00328435, throughput 4.90724K wps
[Epoch 37 Batch 960/1540] avg loss 0.00375034, throughput 5.25723K wps
[Epoch 37 Batch 990/1540] avg loss 0.00358557, throughput 4.69127K wps
[Epoch 37 Batch 1020/1540] avg loss 0.00382533, throughput 4.72961K wps
[Epoch 37 Batch 1050/1540] avg loss 0.00351208, throughput 4.80017K wps
[Epoch 37 Batch 1080/1540] avg loss 0.00345635, throughput 4.62344K wps
[Epoch 37 Batch 1110/1540] avg loss 0.00371101, throughput 4.58562K wps
[Epoch 37 Batch 1140/1540] avg loss 0.00387276, throughput 4.93034K wps
[Epoch 37 Batch 1170/1540] avg loss 0.00371717, throughput 5.48713K wps
[Epoch 37 Batch 1200/1540] avg loss 0.00378256, throughput 4.66359K wps
[Epoch 37 Batch 1230/1540] avg loss 0.00366657, throughput 5.44635K wps
[Epoch 37 Batch 1260/1540] avg loss 0.00347316, throughput 5.2863K wps
[Epoch 37 Batch 1290/1540] avg loss 0.0036046, throughput 5.06079K wps
[Epoch 37 Batch 1320/1540] avg loss 0.00350231, throughput 4.92837K wps
[Epoch 37 Batch 1350/1540] avg loss 0.00377674, throughput 4.84767K wps
[Epoch 37 Batch 1380/1540] avg loss 0.00353122, throughput 5.04911K wps
[Epoch 37 Batch 1410/1540] avg loss 0.00392267, throughput 5.87128K wps
[Epoch 37 Batch 1440/1540] avg loss 0.00358331, throughput 4.85248K wps
[Epoch 37 Batch 1470/1540] avg loss 0.00405326, throughput 4.64102K wps
[Epoch 37 Batch 1500/1540] avg loss 0.00355834, throughput 5.06057K wps
[Epoch 37 Batch 1530/1540] avg loss 0.00395938, throughput 4.92075K wps
Begin Testing...
[Epoch 37] train avg loss 0.00370544, dev acc 0.8326, dev avg loss 0.368982, throughput 4.96743K wps
[Epoch 38 Batch 30/1540] avg loss 0.00377007, throughput 4.73232K wps
[Epoch 38 Batch 60/1540] avg loss 0.00351955, throughput 4.67469K wps
[Epoch 38 Batch 90/1540] avg loss 0.00376833, throughput 4.84529K wps
[Epoch 38 Batch 120/1540] avg loss 0.00400378, throughput 4.81946K wps
[Epoch 38 Batch 150/1540] avg loss 0.00380566, throughput 4.66939K wps
[Epoch 38 Batch 180/1540] avg loss 0.00363397, throughput 4.77705K wps
[Epoch 38 Batch 210/1540] avg loss 0.00332965, throughput 5.36526K wps
[Epoch 38 Batch 240/1540] avg loss 0.00404849, throughput 5.00103K wps
[Epoch 38 Batch 270/1540] avg loss 0.0038361, throughput 4.99981K wps
[Epoch 38 Batch 300/1540] avg loss 0.00358612, throughput 4.88647K wps
[Epoch 38 Batch 330/1540] avg loss 0.00349964, throughput 5.3058K wps
[Epoch 38 Batch 360/1540] avg loss 0.00361926, throughput 4.95706K wps
[Epoch 38 Batch 390/1540] avg loss 0.00343162, throughput 5.48796K wps
[Epoch 38 Batch 420/1540] avg loss 0.00367458, throughput 5.74442K wps
[Epoch 38 Batch 450/1540] avg loss 0.00342021, throughput 5.65851K wps
[Epoch 38 Batch 480/1540] avg loss 0.00381176, throughput 5.06895K wps
[Epoch 38 Batch 510/1540] avg loss 0.00393635, throughput 5.32398K wps
[Epoch 38 Batch 540/1540] avg loss 0.00378749, throughput 4.90046K wps
[Epoch 38 Batch 570/1540] avg loss 0.00341102, throughput 4.63295K wps
[Epoch 38 Batch 600/1540] avg loss 0.00421355, throughput 4.81686K wps
[Epoch 38 Batch 630/1540] avg loss 0.00358014, throughput 4.69705K wps
[Epoch 38 Batch 660/1540] avg loss 0.00394081, throughput 4.71561K wps
[Epoch 38 Batch 690/1540] avg loss 0.00365533, throughput 4.51424K wps
[Epoch 38 Batch 720/1540] avg loss 0.00383393, throughput 5.11903K wps
[Epoch 38 Batch 750/1540] avg loss 0.0035238, throughput 4.54163K wps
[Epoch 38 Batch 780/1540] avg loss 0.00409312, throughput 4.54805K wps
[Epoch 38 Batch 810/1540] avg loss 0.00384412, throughput 5.46612K wps
[Epoch 38 Batch 840/1540] avg loss 0.00354158, throughput 4.67256K wps
[Epoch 38 Batch 870/1540] avg loss 0.00359572, throughput 4.61902K wps
[Epoch 38 Batch 900/1540] avg loss 0.00346728, throughput 5.2176K wps
[Epoch 38 Batch 930/1540] avg loss 0.00347538, throughput 4.75715K wps
[Epoch 38 Batch 960/1540] avg loss 0.00393769, throughput 5.39689K wps
[Epoch 38 Batch 990/1540] avg loss 0.00357002, throughput 5.02684K wps
[Epoch 38 Batch 1020/1540] avg loss 0.00397525, throughput 5.08371K wps
[Epoch 38 Batch 1050/1540] avg loss 0.00416379, throughput 4.63294K wps
[Epoch 38 Batch 1080/1540] avg loss 0.00330528, throughput 4.69806K wps
[Epoch 38 Batch 1110/1540] avg loss 0.00360069, throughput 4.6461K wps
[Epoch 38 Batch 1140/1540] avg loss 0.00357423, throughput 5.04779K wps
[Epoch 38 Batch 1170/1540] avg loss 0.00364879, throughput 4.75153K wps
[Epoch 38 Batch 1200/1540] avg loss 0.00362001, throughput 5.23958K wps
[Epoch 38 Batch 1230/1540] avg loss 0.00361554, throughput 4.91988K wps
[Epoch 38 Batch 1260/1540] avg loss 0.00357027, throughput 4.9878K wps
[Epoch 38 Batch 1290/1540] avg loss 0.0033993, throughput 5.30258K wps
[Epoch 38 Batch 1320/1540] avg loss 0.00330106, throughput 5.61925K wps
[Epoch 38 Batch 1350/1540] avg loss 0.00372698, throughput 5.09738K wps
[Epoch 38 Batch 1380/1540] avg loss 0.00375491, throughput 5.29483K wps
[Epoch 38 Batch 1410/1540] avg loss 0.00369255, throughput 4.78288K wps
[Epoch 38 Batch 1440/1540] avg loss 0.00329151, throughput 4.50917K wps
[Epoch 38 Batch 1470/1540] avg loss 0.00349955, throughput 5.22972K wps
[Epoch 38 Batch 1500/1540] avg loss 0.00379664, throughput 5.58734K wps
[Epoch 38 Batch 1530/1540] avg loss 0.00349994, throughput 5.02372K wps
Begin Testing...
[Epoch 38] train avg loss 0.00366799, dev acc 0.8463, dev avg loss 0.37297, throughput 4.96431K wps
[Epoch 39 Batch 30/1540] avg loss 0.00325526, throughput 4.87025K wps
[Epoch 39 Batch 60/1540] avg loss 0.00370318, throughput 4.963K wps
[Epoch 39 Batch 90/1540] avg loss 0.00357586, throughput 4.63052K wps
[Epoch 39 Batch 120/1540] avg loss 0.00375434, throughput 4.6048K wps
[Epoch 39 Batch 150/1540] avg loss 0.00318928, throughput 4.78209K wps
[Epoch 39 Batch 180/1540] avg loss 0.00334915, throughput 4.96247K wps
[Epoch 39 Batch 210/1540] avg loss 0.00364932, throughput 5.50455K wps
[Epoch 39 Batch 240/1540] avg loss 0.00347892, throughput 5.02976K wps
[Epoch 39 Batch 270/1540] avg loss 0.00357029, throughput 4.91951K wps
[Epoch 39 Batch 300/1540] avg loss 0.00379634, throughput 4.49186K wps
[Epoch 39 Batch 330/1540] avg loss 0.00363108, throughput 4.64254K wps
[Epoch 39 Batch 360/1540] avg loss 0.00367963, throughput 5.1694K wps
[Epoch 39 Batch 390/1540] avg loss 0.00382048, throughput 4.48309K wps
[Epoch 39 Batch 420/1540] avg loss 0.003253, throughput 5.04084K wps
[Epoch 39 Batch 450/1540] avg loss 0.00385439, throughput 5.31234K wps
[Epoch 39 Batch 480/1540] avg loss 0.0034887, throughput 4.55766K wps
[Epoch 39 Batch 510/1540] avg loss 0.00362771, throughput 4.81632K wps
[Epoch 39 Batch 540/1540] avg loss 0.00385991, throughput 4.46688K wps
[Epoch 39 Batch 570/1540] avg loss 0.00357029, throughput 4.73402K wps
[Epoch 39 Batch 600/1540] avg loss 0.00361051, throughput 4.8818K wps
[Epoch 39 Batch 630/1540] avg loss 0.00371495, throughput 4.80879K wps
[Epoch 39 Batch 660/1540] avg loss 0.00355917, throughput 4.69603K wps
[Epoch 39 Batch 690/1540] avg loss 0.00335538, throughput 5.39204K wps
[Epoch 39 Batch 720/1540] avg loss 0.0035479, throughput 5.22562K wps
[Epoch 39 Batch 750/1540] avg loss 0.00416433, throughput 4.76107K wps
[Epoch 39 Batch 780/1540] avg loss 0.00360874, throughput 4.74744K wps
[Epoch 39 Batch 810/1540] avg loss 0.00381817, throughput 4.47304K wps
[Epoch 39 Batch 840/1540] avg loss 0.00353853, throughput 4.6163K wps
[Epoch 39 Batch 870/1540] avg loss 0.0036917, throughput 4.5826K wps
[Epoch 39 Batch 900/1540] avg loss 0.00351697, throughput 4.95057K wps
[Epoch 39 Batch 930/1540] avg loss 0.00347668, throughput 4.66039K wps
[Epoch 39 Batch 960/1540] avg loss 0.00337805, throughput 4.56921K wps
[Epoch 39 Batch 990/1540] avg loss 0.0041883, throughput 5.15861K wps
[Epoch 39 Batch 1020/1540] avg loss 0.00348444, throughput 5.00972K wps
[Epoch 39 Batch 1050/1540] avg loss 0.00359204, throughput 5.18874K wps
[Epoch 39 Batch 1080/1540] avg loss 0.00345708, throughput 4.71765K wps
[Epoch 39 Batch 1110/1540] avg loss 0.00336707, throughput 4.93601K wps
[Epoch 39 Batch 1140/1540] avg loss 0.0036151, throughput 4.84506K wps
[Epoch 39 Batch 1170/1540] avg loss 0.00357845, throughput 4.85591K wps
[Epoch 39 Batch 1200/1540] avg loss 0.00364398, throughput 5.15742K wps
[Epoch 39 Batch 1230/1540] avg loss 0.00412356, throughput 5.37045K wps
[Epoch 39 Batch 1260/1540] avg loss 0.00355543, throughput 4.61483K wps
[Epoch 39 Batch 1290/1540] avg loss 0.00332689, throughput 4.86959K wps
[Epoch 39 Batch 1320/1540] avg loss 0.00403902, throughput 5.325K wps
[Epoch 39 Batch 1350/1540] avg loss 0.00341554, throughput 5.51747K wps
[Epoch 39 Batch 1380/1540] avg loss 0.00355069, throughput 5.62932K wps
[Epoch 39 Batch 1410/1540] avg loss 0.00345419, throughput 4.72768K wps
[Epoch 39 Batch 1440/1540] avg loss 0.00332375, throughput 5.2654K wps
[Epoch 39 Batch 1470/1540] avg loss 0.00388552, throughput 5.23763K wps
[Epoch 39 Batch 1500/1540] avg loss 0.00355266, throughput 5.35854K wps
[Epoch 39 Batch 1530/1540] avg loss 0.00354972, throughput 4.77121K wps
Begin Testing...
[Epoch 39] train avg loss 0.00360007, dev acc 0.8452, dev avg loss 0.36808, throughput 4.89995K wps
[Epoch 40 Batch 30/1540] avg loss 0.00344883, throughput 5.8104K wps
[Epoch 40 Batch 60/1540] avg loss 0.00359948, throughput 5.11749K wps
[Epoch 40 Batch 90/1540] avg loss 0.00387724, throughput 4.90811K wps
[Epoch 40 Batch 120/1540] avg loss 0.00367263, throughput 5.48542K wps
[Epoch 40 Batch 150/1540] avg loss 0.0033387, throughput 5.08816K wps
[Epoch 40 Batch 180/1540] avg loss 0.00369973, throughput 4.99532K wps
[Epoch 40 Batch 210/1540] avg loss 0.00310469, throughput 5.06001K wps
[Epoch 40 Batch 240/1540] avg loss 0.00358635, throughput 4.5157K wps
[Epoch 40 Batch 270/1540] avg loss 0.00334672, throughput 5.08591K wps
[Epoch 40 Batch 300/1540] avg loss 0.00306923, throughput 4.80473K wps
[Epoch 40 Batch 330/1540] avg loss 0.00341896, throughput 4.71773K wps
[Epoch 40 Batch 360/1540] avg loss 0.00369002, throughput 4.43482K wps
[Epoch 40 Batch 390/1540] avg loss 0.0034528, throughput 4.67638K wps
[Epoch 40 Batch 420/1540] avg loss 0.00359874, throughput 4.67638K wps
[Epoch 40 Batch 450/1540] avg loss 0.00357407, throughput 4.78753K wps
[Epoch 40 Batch 480/1540] avg loss 0.00361914, throughput 4.86911K wps
[Epoch 40 Batch 510/1540] avg loss 0.00366946, throughput 4.68976K wps
[Epoch 40 Batch 540/1540] avg loss 0.00343333, throughput 4.92054K wps
[Epoch 40 Batch 570/1540] avg loss 0.00389947, throughput 5.41959K wps
[Epoch 40 Batch 600/1540] avg loss 0.00390793, throughput 4.82761K wps
[Epoch 40 Batch 630/1540] avg loss 0.00384295, throughput 4.91737K wps
[Epoch 40 Batch 660/1540] avg loss 0.00356733, throughput 4.8695K wps
[Epoch 40 Batch 690/1540] avg loss 0.00341145, throughput 5.492K wps
[Epoch 40 Batch 720/1540] avg loss 0.00348862, throughput 4.65231K wps
[Epoch 40 Batch 750/1540] avg loss 0.00332882, throughput 4.68916K wps
[Epoch 40 Batch 780/1540] avg loss 0.0034054, throughput 5.39908K wps
[Epoch 40 Batch 810/1540] avg loss 0.00351913, throughput 5.42289K wps
[Epoch 40 Batch 840/1540] avg loss 0.00345155, throughput 5.43879K wps
[Epoch 40 Batch 870/1540] avg loss 0.00366025, throughput 4.7793K wps
[Epoch 40 Batch 900/1540] avg loss 0.00323151, throughput 4.83688K wps
[Epoch 40 Batch 930/1540] avg loss 0.00333692, throughput 5.04052K wps
[Epoch 40 Batch 960/1540] avg loss 0.00337792, throughput 5.15786K wps
[Epoch 40 Batch 990/1540] avg loss 0.00364491, throughput 6.09666K wps
[Epoch 40 Batch 1020/1540] avg loss 0.003453, throughput 5.25824K wps
[Epoch 40 Batch 1050/1540] avg loss 0.00351291, throughput 5.32762K wps
[Epoch 40 Batch 1080/1540] avg loss 0.00360285, throughput 4.5155K wps
[Epoch 40 Batch 1110/1540] avg loss 0.00368715, throughput 4.43634K wps
[Epoch 40 Batch 1140/1540] avg loss 0.00370134, throughput 4.53188K wps
[Epoch 40 Batch 1170/1540] avg loss 0.00376453, throughput 5.0156K wps
[Epoch 40 Batch 1200/1540] avg loss 0.00389296, throughput 4.70328K wps
[Epoch 40 Batch 1230/1540] avg loss 0.00339211, throughput 4.85602K wps
[Epoch 40 Batch 1260/1540] avg loss 0.00354321, throughput 4.60678K wps
[Epoch 40 Batch 1290/1540] avg loss 0.00375275, throughput 4.58478K wps
[Epoch 40 Batch 1320/1540] avg loss 0.00358445, throughput 4.88897K wps
[Epoch 40 Batch 1350/1540] avg loss 0.0037684, throughput 5.20764K wps
[Epoch 40 Batch 1380/1540] avg loss 0.00339377, throughput 4.95948K wps
[Epoch 40 Batch 1410/1540] avg loss 0.00355741, throughput 5.13474K wps
[Epoch 40 Batch 1440/1540] avg loss 0.00350127, throughput 5.24519K wps
[Epoch 40 Batch 1470/1540] avg loss 0.00338736, throughput 4.39795K wps
[Epoch 40 Batch 1500/1540] avg loss 0.00320699, throughput 5.28813K wps
[Epoch 40 Batch 1530/1540] avg loss 0.00361184, throughput 4.60706K wps
Begin Testing...
[Epoch 40] train avg loss 0.00354314, dev acc 0.8498, dev avg loss 0.37618, throughput 4.94338K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 41 Batch 30/1540] avg loss 0.00317655, throughput 4.36461K wps
[Epoch 41 Batch 60/1540] avg loss 0.00344101, throughput 4.78426K wps
[Epoch 41 Batch 90/1540] avg loss 0.00323889, throughput 5.01235K wps
[Epoch 41 Batch 120/1540] avg loss 0.00343133, throughput 4.91124K wps
[Epoch 41 Batch 150/1540] avg loss 0.0033645, throughput 4.44211K wps
[Epoch 41 Batch 180/1540] avg loss 0.00335172, throughput 4.69661K wps
[Epoch 41 Batch 210/1540] avg loss 0.0035639, throughput 4.62389K wps
[Epoch 41 Batch 240/1540] avg loss 0.00369692, throughput 4.50864K wps
[Epoch 41 Batch 270/1540] avg loss 0.00366997, throughput 5.04416K wps
[Epoch 41 Batch 300/1540] avg loss 0.00374426, throughput 4.88505K wps
[Epoch 41 Batch 330/1540] avg loss 0.0033563, throughput 5.27683K wps
[Epoch 41 Batch 360/1540] avg loss 0.00319168, throughput 4.61903K wps
[Epoch 41 Batch 390/1540] avg loss 0.00331457, throughput 4.504K wps
[Epoch 41 Batch 420/1540] avg loss 0.00330211, throughput 4.95147K wps
[Epoch 41 Batch 450/1540] avg loss 0.00376289, throughput 4.90353K wps
[Epoch 41 Batch 480/1540] avg loss 0.00332965, throughput 5.16653K wps
[Epoch 41 Batch 510/1540] avg loss 0.00324591, throughput 5.19318K wps
[Epoch 41 Batch 540/1540] avg loss 0.00363419, throughput 4.91125K wps
[Epoch 41 Batch 570/1540] avg loss 0.00336084, throughput 4.68071K wps
[Epoch 41 Batch 600/1540] avg loss 0.00350464, throughput 5.1793K wps
[Epoch 41 Batch 630/1540] avg loss 0.00340724, throughput 4.62197K wps
[Epoch 41 Batch 660/1540] avg loss 0.00337373, throughput 5.03208K wps
[Epoch 41 Batch 690/1540] avg loss 0.00339632, throughput 4.71248K wps
[Epoch 41 Batch 720/1540] avg loss 0.00352892, throughput 5.33282K wps
[Epoch 41 Batch 750/1540] avg loss 0.00358943, throughput 5.1698K wps
[Epoch 41 Batch 780/1540] avg loss 0.00358238, throughput 5.04184K wps
[Epoch 41 Batch 810/1540] avg loss 0.00394084, throughput 4.48814K wps
[Epoch 41 Batch 840/1540] avg loss 0.00316012, throughput 4.59122K wps
[Epoch 41 Batch 870/1540] avg loss 0.00366209, throughput 4.64323K wps
[Epoch 41 Batch 900/1540] avg loss 0.0035852, throughput 5.01966K wps
[Epoch 41 Batch 930/1540] avg loss 0.00347742, throughput 4.63786K wps
[Epoch 41 Batch 960/1540] avg loss 0.00381703, throughput 4.77603K wps
[Epoch 41 Batch 990/1540] avg loss 0.00311536, throughput 4.87902K wps
[Epoch 41 Batch 1020/1540] avg loss 0.00373851, throughput 4.91661K wps
[Epoch 41 Batch 1050/1540] avg loss 0.00381416, throughput 4.76702K wps
[Epoch 41 Batch 1080/1540] avg loss 0.00377671, throughput 4.66735K wps
[Epoch 41 Batch 1110/1540] avg loss 0.00347013, throughput 4.80359K wps
[Epoch 41 Batch 1140/1540] avg loss 0.00373288, throughput 5.19201K wps
[Epoch 41 Batch 1170/1540] avg loss 0.00363026, throughput 4.99393K wps
[Epoch 41 Batch 1200/1540] avg loss 0.00349787, throughput 5.03061K wps
[Epoch 41 Batch 1230/1540] avg loss 0.00344777, throughput 4.84943K wps
[Epoch 41 Batch 1260/1540] avg loss 0.003669, throughput 4.72483K wps
[Epoch 41 Batch 1290/1540] avg loss 0.00363177, throughput 4.67602K wps
[Epoch 41 Batch 1320/1540] avg loss 0.00331209, throughput 4.67137K wps
[Epoch 41 Batch 1350/1540] avg loss 0.00347977, throughput 4.39174K wps
[Epoch 41 Batch 1380/1540] avg loss 0.00355061, throughput 4.48692K wps
[Epoch 41 Batch 1410/1540] avg loss 0.00363509, throughput 5.04919K wps
[Epoch 41 Batch 1440/1540] avg loss 0.00349898, throughput 4.86033K wps
[Epoch 41 Batch 1470/1540] avg loss 0.0033357, throughput 4.97123K wps
[Epoch 41 Batch 1500/1540] avg loss 0.00344969, throughput 4.78914K wps
[Epoch 41 Batch 1530/1540] avg loss 0.00366224, throughput 5.67608K wps
Begin Testing...
[Epoch 41] train avg loss 0.00351237, dev acc 0.8337, dev avg loss 0.39413, throughput 4.83619K wps
[Epoch 42 Batch 30/1540] avg loss 0.00347072, throughput 5.38136K wps
[Epoch 42 Batch 60/1540] avg loss 0.00358372, throughput 4.65577K wps
[Epoch 42 Batch 90/1540] avg loss 0.0034355, throughput 5.07695K wps
[Epoch 42 Batch 120/1540] avg loss 0.00323654, throughput 4.59064K wps
[Epoch 42 Batch 150/1540] avg loss 0.00353364, throughput 4.58322K wps
[Epoch 42 Batch 180/1540] avg loss 0.00380937, throughput 5.29578K wps
[Epoch 42 Batch 210/1540] avg loss 0.00374111, throughput 4.7659K wps
[Epoch 42 Batch 240/1540] avg loss 0.0034277, throughput 4.89104K wps
[Epoch 42 Batch 270/1540] avg loss 0.00311753, throughput 4.47718K wps
[Epoch 42 Batch 300/1540] avg loss 0.00386906, throughput 4.6653K wps
[Epoch 42 Batch 330/1540] avg loss 0.00370201, throughput 4.71318K wps
[Epoch 42 Batch 360/1540] avg loss 0.00373056, throughput 5.24751K wps
[Epoch 42 Batch 390/1540] avg loss 0.00326087, throughput 5.26255K wps
[Epoch 42 Batch 420/1540] avg loss 0.00349727, throughput 4.97011K wps
[Epoch 42 Batch 450/1540] avg loss 0.00328517, throughput 5.17087K wps
[Epoch 42 Batch 480/1540] avg loss 0.00349727, throughput 4.75128K wps
[Epoch 42 Batch 510/1540] avg loss 0.00351704, throughput 5.12675K wps
[Epoch 42 Batch 540/1540] avg loss 0.00297968, throughput 4.77326K wps
[Epoch 42 Batch 570/1540] avg loss 0.00313044, throughput 5.1245K wps
[Epoch 42 Batch 600/1540] avg loss 0.00356246, throughput 4.68917K wps
[Epoch 42 Batch 630/1540] avg loss 0.00348115, throughput 5.31421K wps
[Epoch 42 Batch 660/1540] avg loss 0.00387447, throughput 4.65416K wps
[Epoch 42 Batch 690/1540] avg loss 0.00325652, throughput 4.66778K wps
[Epoch 42 Batch 720/1540] avg loss 0.00351583, throughput 4.86357K wps
[Epoch 42 Batch 750/1540] avg loss 0.00364211, throughput 4.91256K wps
[Epoch 42 Batch 780/1540] avg loss 0.00339422, throughput 5.08517K wps
[Epoch 42 Batch 810/1540] avg loss 0.00384461, throughput 4.70255K wps
[Epoch 42 Batch 840/1540] avg loss 0.00326751, throughput 4.63001K wps
[Epoch 42 Batch 870/1540] avg loss 0.0032349, throughput 4.86928K wps
[Epoch 42 Batch 900/1540] avg loss 0.00344125, throughput 4.4663K wps
[Epoch 42 Batch 930/1540] avg loss 0.00311436, throughput 4.69231K wps
[Epoch 42 Batch 960/1540] avg loss 0.00337994, throughput 4.47118K wps
[Epoch 42 Batch 990/1540] avg loss 0.00360812, throughput 4.79541K wps
[Epoch 42 Batch 1020/1540] avg loss 0.00341395, throughput 4.7941K wps
[Epoch 42 Batch 1050/1540] avg loss 0.00351617, throughput 4.77984K wps
[Epoch 42 Batch 1080/1540] avg loss 0.00359198, throughput 4.88318K wps
[Epoch 42 Batch 1110/1540] avg loss 0.00347934, throughput 5.01632K wps
[Epoch 42 Batch 1140/1540] avg loss 0.00336491, throughput 4.9002K wps
[Epoch 42 Batch 1170/1540] avg loss 0.00365991, throughput 4.51911K wps
[Epoch 42 Batch 1200/1540] avg loss 0.00355717, throughput 4.51998K wps
[Epoch 42 Batch 1230/1540] avg loss 0.00310383, throughput 5.37733K wps
[Epoch 42 Batch 1260/1540] avg loss 0.00340012, throughput 4.87565K wps
[Epoch 42 Batch 1290/1540] avg loss 0.00356207, throughput 5.13828K wps
[Epoch 42 Batch 1320/1540] avg loss 0.00337929, throughput 5.19695K wps
[Epoch 42 Batch 1350/1540] avg loss 0.00324042, throughput 4.87085K wps
[Epoch 42 Batch 1380/1540] avg loss 0.00382381, throughput 4.8082K wps
[Epoch 42 Batch 1410/1540] avg loss 0.00307082, throughput 4.94731K wps
[Epoch 42 Batch 1440/1540] avg loss 0.00354353, throughput 4.52994K wps
[Epoch 42 Batch 1470/1540] avg loss 0.00324621, throughput 5.43771K wps
[Epoch 42 Batch 1500/1540] avg loss 0.00307346, throughput 4.6692K wps
[Epoch 42 Batch 1530/1540] avg loss 0.00354857, throughput 5.19074K wps
Begin Testing...
[Epoch 42] train avg loss 0.00345016, dev acc 0.8440, dev avg loss 0.373034, throughput 4.86492K wps
[Epoch 43 Batch 30/1540] avg loss 0.00316488, throughput 5.65808K wps
[Epoch 43 Batch 60/1540] avg loss 0.00344756, throughput 4.97464K wps
[Epoch 43 Batch 90/1540] avg loss 0.00336153, throughput 5.28731K wps
[Epoch 43 Batch 120/1540] avg loss 0.00377538, throughput 5.74649K wps
[Epoch 43 Batch 150/1540] avg loss 0.0034754, throughput 4.92454K wps
[Epoch 43 Batch 180/1540] avg loss 0.00344693, throughput 4.42169K wps
[Epoch 43 Batch 210/1540] avg loss 0.00348764, throughput 4.9016K wps
[Epoch 43 Batch 240/1540] avg loss 0.00343503, throughput 5.96681K wps
[Epoch 43 Batch 270/1540] avg loss 0.00330509, throughput 4.87702K wps
[Epoch 43 Batch 300/1540] avg loss 0.00323542, throughput 4.84938K wps
[Epoch 43 Batch 330/1540] avg loss 0.00333106, throughput 4.8606K wps
[Epoch 43 Batch 360/1540] avg loss 0.00348713, throughput 4.9438K wps
[Epoch 43 Batch 390/1540] avg loss 0.00367435, throughput 5.67091K wps
[Epoch 43 Batch 420/1540] avg loss 0.00344657, throughput 4.55836K wps
[Epoch 43 Batch 450/1540] avg loss 0.00316784, throughput 4.49811K wps
[Epoch 43 Batch 480/1540] avg loss 0.00338478, throughput 4.73035K wps
[Epoch 43 Batch 510/1540] avg loss 0.00319956, throughput 5.49848K wps
[Epoch 43 Batch 540/1540] avg loss 0.00340978, throughput 4.89592K wps
[Epoch 43 Batch 570/1540] avg loss 0.00322025, throughput 5.02046K wps
[Epoch 43 Batch 600/1540] avg loss 0.00332597, throughput 4.65893K wps
[Epoch 43 Batch 630/1540] avg loss 0.00320871, throughput 5.04029K wps
[Epoch 43 Batch 660/1540] avg loss 0.00339986, throughput 4.90302K wps
[Epoch 43 Batch 690/1540] avg loss 0.00338314, throughput 4.7135K wps
[Epoch 43 Batch 720/1540] avg loss 0.00336081, throughput 4.6889K wps
[Epoch 43 Batch 750/1540] avg loss 0.00334356, throughput 5.19826K wps
[Epoch 43 Batch 780/1540] avg loss 0.00326944, throughput 4.7229K wps
[Epoch 43 Batch 810/1540] avg loss 0.00362991, throughput 4.76293K wps
[Epoch 43 Batch 840/1540] avg loss 0.00348681, throughput 4.72581K wps
[Epoch 43 Batch 870/1540] avg loss 0.00297098, throughput 5.60958K wps
[Epoch 43 Batch 900/1540] avg loss 0.00332199, throughput 4.64696K wps
[Epoch 43 Batch 930/1540] avg loss 0.00360521, throughput 5.19006K wps
[Epoch 43 Batch 960/1540] avg loss 0.00357695, throughput 5.23601K wps
[Epoch 43 Batch 990/1540] avg loss 0.00340401, throughput 5.15151K wps
[Epoch 43 Batch 1020/1540] avg loss 0.00362537, throughput 4.83905K wps
[Epoch 43 Batch 1050/1540] avg loss 0.00285646, throughput 4.92527K wps
[Epoch 43 Batch 1080/1540] avg loss 0.00341355, throughput 4.66927K wps
[Epoch 43 Batch 1110/1540] avg loss 0.0035097, throughput 4.85568K wps
[Epoch 43 Batch 1140/1540] avg loss 0.00318962, throughput 4.94736K wps
[Epoch 43 Batch 1170/1540] avg loss 0.00351648, throughput 4.87006K wps
[Epoch 43 Batch 1200/1540] avg loss 0.00341186, throughput 5.46819K wps
[Epoch 43 Batch 1230/1540] avg loss 0.00362977, throughput 4.84479K wps
[Epoch 43 Batch 1260/1540] avg loss 0.00382754, throughput 5.61957K wps
[Epoch 43 Batch 1290/1540] avg loss 0.00373402, throughput 4.80092K wps
[Epoch 43 Batch 1320/1540] avg loss 0.00360707, throughput 4.88511K wps
[Epoch 43 Batch 1350/1540] avg loss 0.00342848, throughput 4.98971K wps
[Epoch 43 Batch 1380/1540] avg loss 0.00371775, throughput 4.8169K wps
[Epoch 43 Batch 1410/1540] avg loss 0.00339889, throughput 5.11602K wps
[Epoch 43 Batch 1440/1540] avg loss 0.00336081, throughput 4.68255K wps
[Epoch 43 Batch 1470/1540] avg loss 0.00344692, throughput 5.35891K wps
[Epoch 43 Batch 1500/1540] avg loss 0.00346914, throughput 4.70448K wps
[Epoch 43 Batch 1530/1540] avg loss 0.00332718, throughput 4.4364K wps
Begin Testing...
[Epoch 43] train avg loss 0.00341575, dev acc 0.8452, dev avg loss 0.384034, throughput 4.96049K wps
[Epoch 44 Batch 30/1540] avg loss 0.0033195, throughput 4.48468K wps
[Epoch 44 Batch 60/1540] avg loss 0.00342355, throughput 5.19575K wps
[Epoch 44 Batch 90/1540] avg loss 0.0034454, throughput 5.28848K wps
[Epoch 44 Batch 120/1540] avg loss 0.00359909, throughput 5.55119K wps
[Epoch 44 Batch 150/1540] avg loss 0.00317505, throughput 4.48958K wps
[Epoch 44 Batch 180/1540] avg loss 0.00323626, throughput 4.93499K wps
[Epoch 44 Batch 210/1540] avg loss 0.00330215, throughput 4.92574K wps
[Epoch 44 Batch 240/1540] avg loss 0.00358407, throughput 4.44616K wps
[Epoch 44 Batch 270/1540] avg loss 0.00341236, throughput 4.51736K wps
[Epoch 44 Batch 300/1540] avg loss 0.00368919, throughput 4.84987K wps
[Epoch 44 Batch 330/1540] avg loss 0.00329826, throughput 5.04841K wps
[Epoch 44 Batch 360/1540] avg loss 0.00368184, throughput 5.23779K wps
[Epoch 44 Batch 390/1540] avg loss 0.00344464, throughput 4.72695K wps
[Epoch 44 Batch 420/1540] avg loss 0.00350577, throughput 5.10164K wps
[Epoch 44 Batch 450/1540] avg loss 0.00364227, throughput 5.08084K wps
[Epoch 44 Batch 480/1540] avg loss 0.00349359, throughput 4.871K wps
[Epoch 44 Batch 510/1540] avg loss 0.00338199, throughput 5.16584K wps
[Epoch 44 Batch 540/1540] avg loss 0.00340267, throughput 5.29359K wps
[Epoch 44 Batch 570/1540] avg loss 0.00325961, throughput 5.19898K wps
[Epoch 44 Batch 600/1540] avg loss 0.00311304, throughput 4.58883K wps
[Epoch 44 Batch 630/1540] avg loss 0.00350859, throughput 5.06534K wps
[Epoch 44 Batch 660/1540] avg loss 0.00322405, throughput 5.20774K wps
[Epoch 44 Batch 690/1540] avg loss 0.00319019, throughput 5.48535K wps
[Epoch 44 Batch 720/1540] avg loss 0.00322034, throughput 4.91571K wps
[Epoch 44 Batch 750/1540] avg loss 0.00334688, throughput 4.40336K wps
[Epoch 44 Batch 780/1540] avg loss 0.00334769, throughput 5.16408K wps
[Epoch 44 Batch 810/1540] avg loss 0.00304194, throughput 5.28182K wps
[Epoch 44 Batch 840/1540] avg loss 0.00323111, throughput 4.9688K wps
[Epoch 44 Batch 870/1540] avg loss 0.00343902, throughput 4.78896K wps
[Epoch 44 Batch 900/1540] avg loss 0.00352539, throughput 4.896K wps
[Epoch 44 Batch 930/1540] avg loss 0.00382522, throughput 5.16854K wps
[Epoch 44 Batch 960/1540] avg loss 0.00344649, throughput 5.02994K wps
[Epoch 44 Batch 990/1540] avg loss 0.00329594, throughput 5.12299K wps
[Epoch 44 Batch 1020/1540] avg loss 0.00316906, throughput 5.29654K wps
[Epoch 44 Batch 1050/1540] avg loss 0.00354312, throughput 5.04406K wps
[Epoch 44 Batch 1080/1540] avg loss 0.00301941, throughput 5.06992K wps
[Epoch 44 Batch 1110/1540] avg loss 0.0032689, throughput 5.74333K wps
[Epoch 44 Batch 1140/1540] avg loss 0.00369249, throughput 5.15451K wps
[Epoch 44 Batch 1170/1540] avg loss 0.00349109, throughput 4.90167K wps
[Epoch 44 Batch 1200/1540] avg loss 0.00376483, throughput 5.04074K wps
[Epoch 44 Batch 1230/1540] avg loss 0.00360817, throughput 5.57437K wps
[Epoch 44 Batch 1260/1540] avg loss 0.00320286, throughput 5.20118K wps
[Epoch 44 Batch 1290/1540] avg loss 0.00332193, throughput 5.14915K wps
[Epoch 44 Batch 1320/1540] avg loss 0.0030785, throughput 4.99576K wps
[Epoch 44 Batch 1350/1540] avg loss 0.00307184, throughput 4.85805K wps
[Epoch 44 Batch 1380/1540] avg loss 0.00305428, throughput 4.83883K wps
[Epoch 44 Batch 1410/1540] avg loss 0.00342654, throughput 5.48529K wps
[Epoch 44 Batch 1440/1540] avg loss 0.00359996, throughput 5.25206K wps
[Epoch 44 Batch 1470/1540] avg loss 0.00358583, throughput 5.43762K wps
[Epoch 44 Batch 1500/1540] avg loss 0.00304005, throughput 4.73466K wps
[Epoch 44 Batch 1530/1540] avg loss 0.00357135, throughput 5.76273K wps
Begin Testing...
[Epoch 44] train avg loss 0.00338427, dev acc 0.8406, dev avg loss 0.367974, throughput 5.03473K wps
[Epoch 45 Batch 30/1540] avg loss 0.00330328, throughput 5.15731K wps
[Epoch 45 Batch 60/1540] avg loss 0.00314844, throughput 5.49341K wps
[Epoch 45 Batch 90/1540] avg loss 0.00347343, throughput 4.95979K wps
[Epoch 45 Batch 120/1540] avg loss 0.00323798, throughput 5.016K wps
[Epoch 45 Batch 150/1540] avg loss 0.00323396, throughput 5.42441K wps
[Epoch 45 Batch 180/1540] avg loss 0.00310881, throughput 4.89059K wps
[Epoch 45 Batch 210/1540] avg loss 0.0030379, throughput 4.63113K wps
[Epoch 45 Batch 240/1540] avg loss 0.0031666, throughput 4.50118K wps
[Epoch 45 Batch 270/1540] avg loss 0.00306597, throughput 5.39491K wps
[Epoch 45 Batch 300/1540] avg loss 0.00352836, throughput 4.98728K wps
[Epoch 45 Batch 330/1540] avg loss 0.00339416, throughput 5.40703K wps
[Epoch 45 Batch 360/1540] avg loss 0.00310538, throughput 5.05019K wps
[Epoch 45 Batch 390/1540] avg loss 0.00357673, throughput 4.55244K wps
[Epoch 45 Batch 420/1540] avg loss 0.00313407, throughput 4.97845K wps
[Epoch 45 Batch 450/1540] avg loss 0.0036521, throughput 5.32629K wps
[Epoch 45 Batch 480/1540] avg loss 0.00350534, throughput 4.68892K wps
[Epoch 45 Batch 510/1540] avg loss 0.00341049, throughput 5.5052K wps
[Epoch 45 Batch 540/1540] avg loss 0.00328735, throughput 5.52925K wps
[Epoch 45 Batch 570/1540] avg loss 0.00315579, throughput 5.74575K wps
[Epoch 45 Batch 600/1540] avg loss 0.0034359, throughput 5.32603K wps
[Epoch 45 Batch 630/1540] avg loss 0.00353396, throughput 4.60984K wps
[Epoch 45 Batch 660/1540] avg loss 0.00320071, throughput 5.02967K wps
[Epoch 45 Batch 690/1540] avg loss 0.00318115, throughput 4.97657K wps
[Epoch 45 Batch 720/1540] avg loss 0.00327009, throughput 5.04784K wps
[Epoch 45 Batch 750/1540] avg loss 0.00317752, throughput 5.06391K wps
[Epoch 45 Batch 780/1540] avg loss 0.00335789, throughput 5.03716K wps
[Epoch 45 Batch 810/1540] avg loss 0.00383197, throughput 4.91126K wps
[Epoch 45 Batch 840/1540] avg loss 0.00344267, throughput 5.28092K wps
[Epoch 45 Batch 870/1540] avg loss 0.00334118, throughput 5.00009K wps
[Epoch 45 Batch 900/1540] avg loss 0.00342655, throughput 5.08907K wps
[Epoch 45 Batch 930/1540] avg loss 0.00336819, throughput 4.50175K wps
[Epoch 45 Batch 960/1540] avg loss 0.00348258, throughput 4.4867K wps
[Epoch 45 Batch 990/1540] avg loss 0.00365719, throughput 4.9825K wps
[Epoch 45 Batch 1020/1540] avg loss 0.00323927, throughput 4.85593K wps
[Epoch 45 Batch 1050/1540] avg loss 0.00307617, throughput 4.96059K wps
[Epoch 45 Batch 1080/1540] avg loss 0.00310297, throughput 4.88758K wps
[Epoch 45 Batch 1110/1540] avg loss 0.00346245, throughput 5.15796K wps
[Epoch 45 Batch 1140/1540] avg loss 0.00311714, throughput 4.73714K wps
[Epoch 45 Batch 1170/1540] avg loss 0.00337458, throughput 4.70397K wps
[Epoch 45 Batch 1200/1540] avg loss 0.00342877, throughput 5.04249K wps
[Epoch 45 Batch 1230/1540] avg loss 0.00342211, throughput 5.12034K wps
[Epoch 45 Batch 1260/1540] avg loss 0.00359256, throughput 5.23016K wps
[Epoch 45 Batch 1290/1540] avg loss 0.00332417, throughput 4.60764K wps
[Epoch 45 Batch 1320/1540] avg loss 0.00362414, throughput 4.9577K wps
[Epoch 45 Batch 1350/1540] avg loss 0.00325114, throughput 4.49185K wps
[Epoch 45 Batch 1380/1540] avg loss 0.00315891, throughput 4.66441K wps
[Epoch 45 Batch 1410/1540] avg loss 0.00301754, throughput 4.83356K wps
[Epoch 45 Batch 1440/1540] avg loss 0.0035943, throughput 4.68996K wps
[Epoch 45 Batch 1470/1540] avg loss 0.00319462, throughput 4.39641K wps
[Epoch 45 Batch 1500/1540] avg loss 0.00324896, throughput 4.82799K wps
[Epoch 45 Batch 1530/1540] avg loss 0.00306588, throughput 4.4327K wps
Begin Testing...
[Epoch 45] train avg loss 0.00332654, dev acc 0.8486, dev avg loss 0.372921, throughput 4.94076K wps
[Epoch 46 Batch 30/1540] avg loss 0.00326246, throughput 4.94014K wps
[Epoch 46 Batch 60/1540] avg loss 0.00349429, throughput 5.81127K wps
[Epoch 46 Batch 90/1540] avg loss 0.00339367, throughput 5.19894K wps
[Epoch 46 Batch 120/1540] avg loss 0.00311419, throughput 4.7279K wps
[Epoch 46 Batch 150/1540] avg loss 0.0034781, throughput 4.80089K wps
[Epoch 46 Batch 180/1540] avg loss 0.00345558, throughput 5.19873K wps
[Epoch 46 Batch 210/1540] avg loss 0.00294227, throughput 4.74883K wps
[Epoch 46 Batch 240/1540] avg loss 0.00361199, throughput 4.92744K wps
[Epoch 46 Batch 270/1540] avg loss 0.00306456, throughput 5.22756K wps
[Epoch 46 Batch 300/1540] avg loss 0.00333648, throughput 5.20953K wps
[Epoch 46 Batch 330/1540] avg loss 0.00344421, throughput 5.15618K wps
[Epoch 46 Batch 360/1540] avg loss 0.00330244, throughput 4.8679K wps
[Epoch 46 Batch 390/1540] avg loss 0.0032228, throughput 4.94861K wps
[Epoch 46 Batch 420/1540] avg loss 0.00357637, throughput 5.28484K wps
[Epoch 46 Batch 450/1540] avg loss 0.00318365, throughput 5.1827K wps
[Epoch 46 Batch 480/1540] avg loss 0.00362569, throughput 4.95938K wps
[Epoch 46 Batch 510/1540] avg loss 0.00340803, throughput 4.67778K wps
[Epoch 46 Batch 540/1540] avg loss 0.00333983, throughput 5.1418K wps
[Epoch 46 Batch 570/1540] avg loss 0.00339803, throughput 5.30839K wps
[Epoch 46 Batch 600/1540] avg loss 0.00332201, throughput 4.53934K wps
[Epoch 46 Batch 630/1540] avg loss 0.0032546, throughput 4.92629K wps
[Epoch 46 Batch 660/1540] avg loss 0.00311295, throughput 5.98564K wps
[Epoch 46 Batch 690/1540] avg loss 0.0033004, throughput 5.01881K wps
[Epoch 46 Batch 720/1540] avg loss 0.00315544, throughput 4.85807K wps
[Epoch 46 Batch 750/1540] avg loss 0.00301927, throughput 5.17032K wps
[Epoch 46 Batch 780/1540] avg loss 0.00328941, throughput 5.39446K wps
[Epoch 46 Batch 810/1540] avg loss 0.00306377, throughput 4.92318K wps
[Epoch 46 Batch 840/1540] avg loss 0.00361106, throughput 4.89509K wps
[Epoch 46 Batch 870/1540] avg loss 0.00277934, throughput 4.46229K wps
[Epoch 46 Batch 900/1540] avg loss 0.00342876, throughput 5.16276K wps
[Epoch 46 Batch 930/1540] avg loss 0.00342276, throughput 4.68042K wps
[Epoch 46 Batch 960/1540] avg loss 0.003444, throughput 4.33233K wps
[Epoch 46 Batch 990/1540] avg loss 0.00350886, throughput 5.18766K wps
[Epoch 46 Batch 1020/1540] avg loss 0.00321689, throughput 5.32326K wps
[Epoch 46 Batch 1050/1540] avg loss 0.00335784, throughput 5.21476K wps
[Epoch 46 Batch 1080/1540] avg loss 0.00334602, throughput 4.67373K wps
[Epoch 46 Batch 1110/1540] avg loss 0.00316358, throughput 5.01289K wps
[Epoch 46 Batch 1140/1540] avg loss 0.00350471, throughput 5.32206K wps
[Epoch 46 Batch 1170/1540] avg loss 0.00326105, throughput 5.74778K wps
[Epoch 46 Batch 1200/1540] avg loss 0.00321279, throughput 4.76695K wps
[Epoch 46 Batch 1230/1540] avg loss 0.00350285, throughput 4.85471K wps
[Epoch 46 Batch 1260/1540] avg loss 0.00326351, throughput 5.54432K wps
[Epoch 46 Batch 1290/1540] avg loss 0.0033106, throughput 4.81756K wps
[Epoch 46 Batch 1320/1540] avg loss 0.00341477, throughput 4.74579K wps
[Epoch 46 Batch 1350/1540] avg loss 0.00300504, throughput 4.70014K wps
[Epoch 46 Batch 1380/1540] avg loss 0.0031877, throughput 5.40488K wps
[Epoch 46 Batch 1410/1540] avg loss 0.00331328, throughput 5.32725K wps
[Epoch 46 Batch 1440/1540] avg loss 0.00338693, throughput 4.57212K wps
[Epoch 46 Batch 1470/1540] avg loss 0.00345229, throughput 4.57912K wps
[Epoch 46 Batch 1500/1540] avg loss 0.00346831, throughput 4.97936K wps
[Epoch 46 Batch 1530/1540] avg loss 0.00359843, throughput 4.9285K wps
Begin Testing...
[Epoch 46] train avg loss 0.00332293, dev acc 0.8486, dev avg loss 0.372904, throughput 5.00069K wps
[Epoch 47 Batch 30/1540] avg loss 0.00349879, throughput 4.4865K wps
[Epoch 47 Batch 60/1540] avg loss 0.00304243, throughput 4.59807K wps
[Epoch 47 Batch 90/1540] avg loss 0.00350324, throughput 4.93242K wps
[Epoch 47 Batch 120/1540] avg loss 0.00342867, throughput 4.79934K wps
[Epoch 47 Batch 150/1540] avg loss 0.00318358, throughput 4.9165K wps
[Epoch 47 Batch 180/1540] avg loss 0.00306229, throughput 4.63833K wps
[Epoch 47 Batch 210/1540] avg loss 0.00329521, throughput 4.68353K wps
[Epoch 47 Batch 240/1540] avg loss 0.00342693, throughput 5.17203K wps
[Epoch 47 Batch 270/1540] avg loss 0.00318023, throughput 5.33528K wps
[Epoch 47 Batch 300/1540] avg loss 0.00350959, throughput 4.90382K wps
[Epoch 47 Batch 330/1540] avg loss 0.00330073, throughput 5.34867K wps
[Epoch 47 Batch 360/1540] avg loss 0.00314262, throughput 4.84263K wps
[Epoch 47 Batch 390/1540] avg loss 0.00329938, throughput 5.02623K wps
[Epoch 47 Batch 420/1540] avg loss 0.00320371, throughput 4.5096K wps
[Epoch 47 Batch 450/1540] avg loss 0.00341202, throughput 4.55155K wps
[Epoch 47 Batch 480/1540] avg loss 0.00343455, throughput 5.20834K wps
[Epoch 47 Batch 510/1540] avg loss 0.00352314, throughput 4.7541K wps
[Epoch 47 Batch 540/1540] avg loss 0.0033375, throughput 5.2003K wps
[Epoch 47 Batch 570/1540] avg loss 0.00279188, throughput 5.26339K wps
[Epoch 47 Batch 600/1540] avg loss 0.00339296, throughput 5.36706K wps
[Epoch 47 Batch 630/1540] avg loss 0.00344694, throughput 5.03695K wps
[Epoch 47 Batch 660/1540] avg loss 0.00325498, throughput 4.80039K wps
[Epoch 47 Batch 690/1540] avg loss 0.00335018, throughput 4.74948K wps
[Epoch 47 Batch 720/1540] avg loss 0.00343148, throughput 5.77083K wps
[Epoch 47 Batch 750/1540] avg loss 0.00316119, throughput 4.83418K wps
[Epoch 47 Batch 780/1540] avg loss 0.00325816, throughput 4.52835K wps
[Epoch 47 Batch 810/1540] avg loss 0.00318011, throughput 5.37128K wps
[Epoch 47 Batch 840/1540] avg loss 0.00269565, throughput 4.73004K wps
[Epoch 47 Batch 870/1540] avg loss 0.0033207, throughput 5.20117K wps
[Epoch 47 Batch 900/1540] avg loss 0.00354964, throughput 4.80845K wps
[Epoch 47 Batch 930/1540] avg loss 0.00320906, throughput 5.02755K wps
[Epoch 47 Batch 960/1540] avg loss 0.00271463, throughput 5.28441K wps
[Epoch 47 Batch 990/1540] avg loss 0.00354288, throughput 4.96217K wps
[Epoch 47 Batch 1020/1540] avg loss 0.0029235, throughput 4.91632K wps
[Epoch 47 Batch 1050/1540] avg loss 0.0032996, throughput 5.10292K wps
[Epoch 47 Batch 1080/1540] avg loss 0.00317642, throughput 4.53618K wps
[Epoch 47 Batch 1110/1540] avg loss 0.00303843, throughput 5.21888K wps
[Epoch 47 Batch 1140/1540] avg loss 0.00366515, throughput 4.85063K wps
[Epoch 47 Batch 1170/1540] avg loss 0.00316109, throughput 5.1402K wps
[Epoch 47 Batch 1200/1540] avg loss 0.00302262, throughput 4.47996K wps
[Epoch 47 Batch 1230/1540] avg loss 0.00328508, throughput 4.73947K wps
[Epoch 47 Batch 1260/1540] avg loss 0.00341344, throughput 5.03331K wps
[Epoch 47 Batch 1290/1540] avg loss 0.00325818, throughput 5.25407K wps
[Epoch 47 Batch 1320/1540] avg loss 0.00316328, throughput 5.35085K wps
[Epoch 47 Batch 1350/1540] avg loss 0.00338375, throughput 5.74015K wps
[Epoch 47 Batch 1380/1540] avg loss 0.00331405, throughput 4.83755K wps
[Epoch 47 Batch 1410/1540] avg loss 0.00320453, throughput 5.05196K wps
[Epoch 47 Batch 1440/1540] avg loss 0.00354603, throughput 5.51397K wps
[Epoch 47 Batch 1470/1540] avg loss 0.00301481, throughput 5.11741K wps
[Epoch 47 Batch 1500/1540] avg loss 0.00330588, throughput 4.87639K wps
[Epoch 47 Batch 1530/1540] avg loss 0.00325976, throughput 4.62613K wps
Begin Testing...
[Epoch 47] train avg loss 0.0032745, dev acc 0.8349, dev avg loss 0.412049, throughput 4.96179K wps
[Epoch 48 Batch 30/1540] avg loss 0.00322021, throughput 5.01561K wps
[Epoch 48 Batch 60/1540] avg loss 0.00300526, throughput 4.58865K wps
[Epoch 48 Batch 90/1540] avg loss 0.00301238, throughput 5.32704K wps
[Epoch 48 Batch 120/1540] avg loss 0.0029335, throughput 5.55594K wps
[Epoch 48 Batch 150/1540] avg loss 0.00358408, throughput 5.00908K wps
[Epoch 48 Batch 180/1540] avg loss 0.00298381, throughput 4.917K wps
[Epoch 48 Batch 210/1540] avg loss 0.00329257, throughput 4.72959K wps
[Epoch 48 Batch 240/1540] avg loss 0.00302998, throughput 5.47513K wps
[Epoch 48 Batch 270/1540] avg loss 0.00292717, throughput 5.23359K wps
[Epoch 48 Batch 300/1540] avg loss 0.00321792, throughput 4.82875K wps
[Epoch 48 Batch 330/1540] avg loss 0.00358877, throughput 4.85601K wps
[Epoch 48 Batch 360/1540] avg loss 0.00346828, throughput 4.52695K wps
[Epoch 48 Batch 390/1540] avg loss 0.0033778, throughput 5.20375K wps
[Epoch 48 Batch 420/1540] avg loss 0.00303637, throughput 4.7837K wps
[Epoch 48 Batch 450/1540] avg loss 0.00347953, throughput 4.73826K wps
[Epoch 48 Batch 480/1540] avg loss 0.00303342, throughput 5.6729K wps
[Epoch 48 Batch 510/1540] avg loss 0.00344642, throughput 5.93656K wps
[Epoch 48 Batch 540/1540] avg loss 0.00321707, throughput 4.93843K wps
[Epoch 48 Batch 570/1540] avg loss 0.00317644, throughput 4.90379K wps
[Epoch 48 Batch 600/1540] avg loss 0.00321864, throughput 4.51684K wps
[Epoch 48 Batch 630/1540] avg loss 0.00343872, throughput 5.0164K wps
[Epoch 48 Batch 660/1540] avg loss 0.00332174, throughput 4.66923K wps
[Epoch 48 Batch 690/1540] avg loss 0.00326058, throughput 4.66026K wps
[Epoch 48 Batch 720/1540] avg loss 0.00354552, throughput 4.85793K wps
[Epoch 48 Batch 750/1540] avg loss 0.00314653, throughput 5.50697K wps
[Epoch 48 Batch 780/1540] avg loss 0.00304474, throughput 4.58499K wps
[Epoch 48 Batch 810/1540] avg loss 0.00341961, throughput 5.39831K wps
[Epoch 48 Batch 840/1540] avg loss 0.0034943, throughput 4.61289K wps
[Epoch 48 Batch 870/1540] avg loss 0.00358391, throughput 5.0244K wps
[Epoch 48 Batch 900/1540] avg loss 0.00309554, throughput 5.84647K wps
[Epoch 48 Batch 930/1540] avg loss 0.00332564, throughput 5.04459K wps
[Epoch 48 Batch 960/1540] avg loss 0.00321158, throughput 4.66227K wps
[Epoch 48 Batch 990/1540] avg loss 0.00317947, throughput 5.25274K wps
[Epoch 48 Batch 1020/1540] avg loss 0.00344628, throughput 4.65704K wps
[Epoch 48 Batch 1050/1540] avg loss 0.00300681, throughput 5.06867K wps
[Epoch 48 Batch 1080/1540] avg loss 0.00324765, throughput 5.32244K wps
[Epoch 48 Batch 1110/1540] avg loss 0.00286205, throughput 5.1928K wps
[Epoch 48 Batch 1140/1540] avg loss 0.00313678, throughput 5.61047K wps
[Epoch 48 Batch 1170/1540] avg loss 0.00345933, throughput 4.61713K wps
[Epoch 48 Batch 1200/1540] avg loss 0.00298888, throughput 4.64157K wps
[Epoch 48 Batch 1230/1540] avg loss 0.00387949, throughput 5.26355K wps
[Epoch 48 Batch 1260/1540] avg loss 0.00309214, throughput 4.71725K wps
[Epoch 48 Batch 1290/1540] avg loss 0.00328947, throughput 4.44378K wps
[Epoch 48 Batch 1320/1540] avg loss 0.00296807, throughput 4.47877K wps
[Epoch 48 Batch 1350/1540] avg loss 0.0035514, throughput 4.70912K wps
[Epoch 48 Batch 1380/1540] avg loss 0.00308725, throughput 4.80757K wps
[Epoch 48 Batch 1410/1540] avg loss 0.00285772, throughput 4.92052K wps
[Epoch 48 Batch 1440/1540] avg loss 0.00334965, throughput 4.5417K wps
[Epoch 48 Batch 1470/1540] avg loss 0.00333563, throughput 5.0228K wps
[Epoch 48 Batch 1500/1540] avg loss 0.0034308, throughput 4.93783K wps
[Epoch 48 Batch 1530/1540] avg loss 0.0031174, throughput 5.23154K wps
Begin Testing...
[Epoch 48] train avg loss 0.00324258, dev acc 0.8475, dev avg loss 0.372754, throughput 4.95365K wps
[Epoch 49 Batch 30/1540] avg loss 0.00322388, throughput 4.74479K wps
[Epoch 49 Batch 60/1540] avg loss 0.00292741, throughput 5.08865K wps
[Epoch 49 Batch 90/1540] avg loss 0.00283867, throughput 4.54735K wps
[Epoch 49 Batch 120/1540] avg loss 0.00313632, throughput 4.88514K wps
[Epoch 49 Batch 150/1540] avg loss 0.00328159, throughput 5.27353K wps
[Epoch 49 Batch 180/1540] avg loss 0.00352336, throughput 5.22829K wps
[Epoch 49 Batch 210/1540] avg loss 0.00299282, throughput 5.25657K wps
[Epoch 49 Batch 240/1540] avg loss 0.0028781, throughput 5.7533K wps
[Epoch 49 Batch 270/1540] avg loss 0.00294514, throughput 5.14446K wps
[Epoch 49 Batch 300/1540] avg loss 0.00330345, throughput 5.18272K wps
[Epoch 49 Batch 330/1540] avg loss 0.00324921, throughput 4.66759K wps
[Epoch 49 Batch 360/1540] avg loss 0.00334101, throughput 5.24757K wps
[Epoch 49 Batch 390/1540] avg loss 0.00324336, throughput 4.42342K wps
[Epoch 49 Batch 420/1540] avg loss 0.00346449, throughput 4.59091K wps
[Epoch 49 Batch 450/1540] avg loss 0.00322636, throughput 5.48689K wps
[Epoch 49 Batch 480/1540] avg loss 0.00320452, throughput 5.10148K wps
[Epoch 49 Batch 510/1540] avg loss 0.00310924, throughput 4.86428K wps
[Epoch 49 Batch 540/1540] avg loss 0.00291541, throughput 4.89088K wps
[Epoch 49 Batch 570/1540] avg loss 0.00336037, throughput 4.72353K wps
[Epoch 49 Batch 600/1540] avg loss 0.00350428, throughput 4.6924K wps
[Epoch 49 Batch 630/1540] avg loss 0.00323253, throughput 4.66787K wps
[Epoch 49 Batch 660/1540] avg loss 0.00296026, throughput 6.21136K wps
[Epoch 49 Batch 690/1540] avg loss 0.00296411, throughput 5.57942K wps
[Epoch 49 Batch 720/1540] avg loss 0.00356837, throughput 4.58185K wps
[Epoch 49 Batch 750/1540] avg loss 0.00361773, throughput 5.30606K wps
[Epoch 49 Batch 780/1540] avg loss 0.00329273, throughput 4.82697K wps
[Epoch 49 Batch 810/1540] avg loss 0.00297161, throughput 4.73996K wps
[Epoch 49 Batch 840/1540] avg loss 0.00303807, throughput 4.57835K wps
[Epoch 49 Batch 870/1540] avg loss 0.00305217, throughput 4.65517K wps
[Epoch 49 Batch 900/1540] avg loss 0.00334816, throughput 5.18424K wps
[Epoch 49 Batch 930/1540] avg loss 0.0032254, throughput 5.19189K wps
[Epoch 49 Batch 960/1540] avg loss 0.00313482, throughput 4.97296K wps
[Epoch 49 Batch 990/1540] avg loss 0.00303356, throughput 4.71071K wps
[Epoch 49 Batch 1020/1540] avg loss 0.0030985, throughput 4.91208K wps
[Epoch 49 Batch 1050/1540] avg loss 0.00319069, throughput 4.71612K wps
[Epoch 49 Batch 1080/1540] avg loss 0.00329337, throughput 5.1279K wps
[Epoch 49 Batch 1110/1540] avg loss 0.00283336, throughput 5.0316K wps
[Epoch 49 Batch 1140/1540] avg loss 0.0033389, throughput 4.65437K wps
[Epoch 49 Batch 1170/1540] avg loss 0.00359259, throughput 5.03947K wps
[Epoch 49 Batch 1200/1540] avg loss 0.00310889, throughput 5.5349K wps
[Epoch 49 Batch 1230/1540] avg loss 0.00326588, throughput 4.67132K wps
[Epoch 49 Batch 1260/1540] avg loss 0.00325925, throughput 5.64655K wps
[Epoch 49 Batch 1290/1540] avg loss 0.00307708, throughput 5.16802K wps
[Epoch 49 Batch 1320/1540] avg loss 0.0035124, throughput 5.15583K wps
[Epoch 49 Batch 1350/1540] avg loss 0.00334959, throughput 4.66861K wps
[Epoch 49 Batch 1380/1540] avg loss 0.00318621, throughput 4.83016K wps
[Epoch 49 Batch 1410/1540] avg loss 0.00319821, throughput 5.73295K wps
[Epoch 49 Batch 1440/1540] avg loss 0.00318938, throughput 4.8052K wps
[Epoch 49 Batch 1470/1540] avg loss 0.00303248, throughput 4.63683K wps
[Epoch 49 Batch 1500/1540] avg loss 0.00315089, throughput 4.49619K wps
[Epoch 49 Batch 1530/1540] avg loss 0.00318133, throughput 4.82399K wps
Begin Testing...
[Epoch 49] train avg loss 0.0031966, dev acc 0.8383, dev avg loss 0.37428, throughput 4.96129K wps
[Epoch 50 Batch 30/1540] avg loss 0.00343037, throughput 5.29136K wps
[Epoch 50 Batch 60/1540] avg loss 0.00311498, throughput 5.40051K wps
[Epoch 50 Batch 90/1540] avg loss 0.00352578, throughput 4.40442K wps
[Epoch 50 Batch 120/1540] avg loss 0.00283003, throughput 4.79623K wps
[Epoch 50 Batch 150/1540] avg loss 0.00300133, throughput 4.53454K wps
[Epoch 50 Batch 180/1540] avg loss 0.00335861, throughput 4.89968K wps
[Epoch 50 Batch 210/1540] avg loss 0.00297704, throughput 4.85646K wps
[Epoch 50 Batch 240/1540] avg loss 0.00366785, throughput 5.03452K wps
[Epoch 50 Batch 270/1540] avg loss 0.00325389, throughput 4.99854K wps
[Epoch 50 Batch 300/1540] avg loss 0.00313783, throughput 4.72662K wps
[Epoch 50 Batch 330/1540] avg loss 0.00288966, throughput 4.67542K wps
[Epoch 50 Batch 360/1540] avg loss 0.00341837, throughput 5.7818K wps
[Epoch 50 Batch 390/1540] avg loss 0.00314406, throughput 5.09332K wps
[Epoch 50 Batch 420/1540] avg loss 0.00342474, throughput 4.95285K wps
[Epoch 50 Batch 450/1540] avg loss 0.00319968, throughput 4.60194K wps
[Epoch 50 Batch 480/1540] avg loss 0.0031214, throughput 4.91137K wps
[Epoch 50 Batch 510/1540] avg loss 0.00303314, throughput 4.67281K wps
[Epoch 50 Batch 540/1540] avg loss 0.00316072, throughput 4.96359K wps
[Epoch 50 Batch 570/1540] avg loss 0.00314547, throughput 4.5415K wps
[Epoch 50 Batch 600/1540] avg loss 0.0032477, throughput 5.00236K wps
[Epoch 50 Batch 630/1540] avg loss 0.00353058, throughput 4.76003K wps
[Epoch 50 Batch 660/1540] avg loss 0.00304787, throughput 4.64231K wps
[Epoch 50 Batch 690/1540] avg loss 0.00330432, throughput 4.42944K wps
[Epoch 50 Batch 720/1540] avg loss 0.0028132, throughput 4.79432K wps
[Epoch 50 Batch 750/1540] avg loss 0.00291804, throughput 5.09617K wps
[Epoch 50 Batch 780/1540] avg loss 0.00285831, throughput 5.07343K wps
[Epoch 50 Batch 810/1540] avg loss 0.00308238, throughput 5.08926K wps
[Epoch 50 Batch 840/1540] avg loss 0.00301531, throughput 4.80152K wps
[Epoch 50 Batch 870/1540] avg loss 0.00313549, throughput 5.31742K wps
[Epoch 50 Batch 900/1540] avg loss 0.00313932, throughput 5.01425K wps
[Epoch 50 Batch 930/1540] avg loss 0.0032199, throughput 4.92076K wps
[Epoch 50 Batch 960/1540] avg loss 0.00312824, throughput 5.83103K wps
[Epoch 50 Batch 990/1540] avg loss 0.00306208, throughput 4.86102K wps
[Epoch 50 Batch 1020/1540] avg loss 0.00326457, throughput 4.88902K wps
[Epoch 50 Batch 1050/1540] avg loss 0.00350759, throughput 4.86861K wps
[Epoch 50 Batch 1080/1540] avg loss 0.00318334, throughput 4.99375K wps
[Epoch 50 Batch 1110/1540] avg loss 0.00299971, throughput 4.941K wps
[Epoch 50 Batch 1140/1540] avg loss 0.00328987, throughput 5.48512K wps
[Epoch 50 Batch 1170/1540] avg loss 0.0031418, throughput 4.75995K wps
[Epoch 50 Batch 1200/1540] avg loss 0.00317505, throughput 4.49062K wps
[Epoch 50 Batch 1230/1540] avg loss 0.00315004, throughput 4.7974K wps
[Epoch 50 Batch 1260/1540] avg loss 0.00333035, throughput 4.83548K wps
[Epoch 50 Batch 1290/1540] avg loss 0.00271636, throughput 5.13395K wps
[Epoch 50 Batch 1320/1540] avg loss 0.00337166, throughput 4.89411K wps
[Epoch 50 Batch 1350/1540] avg loss 0.00307637, throughput 4.73051K wps
[Epoch 50 Batch 1380/1540] avg loss 0.00325927, throughput 4.69655K wps
[Epoch 50 Batch 1410/1540] avg loss 0.00327161, throughput 4.85509K wps
[Epoch 50 Batch 1440/1540] avg loss 0.00289682, throughput 4.73566K wps
[Epoch 50 Batch 1470/1540] avg loss 0.00336468, throughput 4.90636K wps
[Epoch 50 Batch 1500/1540] avg loss 0.00286949, throughput 4.92083K wps
[Epoch 50 Batch 1530/1540] avg loss 0.00335235, throughput 4.94236K wps
Begin Testing...
[Epoch 50] train avg loss 0.00316853, dev acc 0.8486, dev avg loss 0.376854, throughput 4.89574K wps
[Epoch 51 Batch 30/1540] avg loss 0.0031393, throughput 4.8097K wps
[Epoch 51 Batch 60/1540] avg loss 0.00236949, throughput 4.8259K wps
[Epoch 51 Batch 90/1540] avg loss 0.00343391, throughput 5.69478K wps
[Epoch 51 Batch 120/1540] avg loss 0.00281197, throughput 5.15773K wps
[Epoch 51 Batch 150/1540] avg loss 0.00322991, throughput 5.14637K wps
[Epoch 51 Batch 180/1540] avg loss 0.00303136, throughput 5.11611K wps
[Epoch 51 Batch 210/1540] avg loss 0.0032867, throughput 5.09556K wps
[Epoch 51 Batch 240/1540] avg loss 0.0033687, throughput 4.79641K wps
[Epoch 51 Batch 270/1540] avg loss 0.00316041, throughput 4.99579K wps
[Epoch 51 Batch 300/1540] avg loss 0.00337843, throughput 4.86965K wps
[Epoch 51 Batch 330/1540] avg loss 0.00299909, throughput 4.81816K wps
[Epoch 51 Batch 360/1540] avg loss 0.003016, throughput 5.70617K wps
[Epoch 51 Batch 390/1540] avg loss 0.00319708, throughput 4.67567K wps
[Epoch 51 Batch 420/1540] avg loss 0.00314519, throughput 4.47326K wps
[Epoch 51 Batch 450/1540] avg loss 0.00289043, throughput 5.16981K wps
[Epoch 51 Batch 480/1540] avg loss 0.00312888, throughput 5.08234K wps
[Epoch 51 Batch 510/1540] avg loss 0.00314021, throughput 4.58211K wps
[Epoch 51 Batch 540/1540] avg loss 0.00303926, throughput 4.91523K wps
[Epoch 51 Batch 570/1540] avg loss 0.00326423, throughput 5.39318K wps
[Epoch 51 Batch 600/1540] avg loss 0.00307518, throughput 4.98928K wps
[Epoch 51 Batch 630/1540] avg loss 0.00300447, throughput 4.95702K wps
[Epoch 51 Batch 660/1540] avg loss 0.0032701, throughput 4.7141K wps
[Epoch 51 Batch 690/1540] avg loss 0.00319505, throughput 5.28797K wps
[Epoch 51 Batch 720/1540] avg loss 0.00311492, throughput 4.91454K wps
[Epoch 51 Batch 750/1540] avg loss 0.00306026, throughput 5.30364K wps
[Epoch 51 Batch 780/1540] avg loss 0.00284469, throughput 4.61349K wps
[Epoch 51 Batch 810/1540] avg loss 0.00324904, throughput 4.6172K wps
[Epoch 51 Batch 840/1540] avg loss 0.00317315, throughput 4.87497K wps
[Epoch 51 Batch 870/1540] avg loss 0.00312956, throughput 5.00291K wps
[Epoch 51 Batch 900/1540] avg loss 0.00331401, throughput 5.43889K wps
[Epoch 51 Batch 930/1540] avg loss 0.00309833, throughput 4.70504K wps
[Epoch 51 Batch 960/1540] avg loss 0.00329287, throughput 5.06861K wps
[Epoch 51 Batch 990/1540] avg loss 0.00324061, throughput 4.43787K wps
[Epoch 51 Batch 1020/1540] avg loss 0.00323953, throughput 5.45394K wps
[Epoch 51 Batch 1050/1540] avg loss 0.00337106, throughput 4.96788K wps
[Epoch 51 Batch 1080/1540] avg loss 0.00318485, throughput 4.87145K wps
[Epoch 51 Batch 1110/1540] avg loss 0.00351314, throughput 4.77106K wps
[Epoch 51 Batch 1140/1540] avg loss 0.0030453, throughput 5.64399K wps
[Epoch 51 Batch 1170/1540] avg loss 0.00294621, throughput 5.39364K wps
[Epoch 51 Batch 1200/1540] avg loss 0.00285264, throughput 5.05694K wps
[Epoch 51 Batch 1230/1540] avg loss 0.00292145, throughput 4.73511K wps
[Epoch 51 Batch 1260/1540] avg loss 0.00350728, throughput 4.65422K wps
[Epoch 51 Batch 1290/1540] avg loss 0.00311813, throughput 4.68602K wps
[Epoch 51 Batch 1320/1540] avg loss 0.00310824, throughput 4.90032K wps
[Epoch 51 Batch 1350/1540] avg loss 0.00320259, throughput 5.03091K wps
[Epoch 51 Batch 1380/1540] avg loss 0.0032318, throughput 5.05515K wps
[Epoch 51 Batch 1410/1540] avg loss 0.00292868, throughput 5.58278K wps
[Epoch 51 Batch 1440/1540] avg loss 0.00304355, throughput 4.61873K wps
[Epoch 51 Batch 1470/1540] avg loss 0.00282234, throughput 5.44313K wps
[Epoch 51 Batch 1500/1540] avg loss 0.0032531, throughput 4.76417K wps
[Epoch 51 Batch 1530/1540] avg loss 0.00302924, throughput 5.20337K wps
Begin Testing...
[Epoch 51] train avg loss 0.0031254, dev acc 0.8417, dev avg loss 0.375454, throughput 4.97952K wps
[Epoch 52 Batch 30/1540] avg loss 0.00281657, throughput 4.86496K wps
[Epoch 52 Batch 60/1540] avg loss 0.00267031, throughput 5.23308K wps
[Epoch 52 Batch 90/1540] avg loss 0.00279833, throughput 4.76674K wps
[Epoch 52 Batch 120/1540] avg loss 0.00288172, throughput 4.88804K wps
[Epoch 52 Batch 150/1540] avg loss 0.00337039, throughput 4.75486K wps
[Epoch 52 Batch 180/1540] avg loss 0.0029719, throughput 5.23278K wps
[Epoch 52 Batch 210/1540] avg loss 0.00302378, throughput 5.42295K wps
[Epoch 52 Batch 240/1540] avg loss 0.00300178, throughput 4.6362K wps
[Epoch 52 Batch 270/1540] avg loss 0.00318371, throughput 4.82816K wps
[Epoch 52 Batch 300/1540] avg loss 0.00293069, throughput 4.93971K wps
[Epoch 52 Batch 330/1540] avg loss 0.00315273, throughput 4.82035K wps
[Epoch 52 Batch 360/1540] avg loss 0.00354403, throughput 4.82702K wps
[Epoch 52 Batch 390/1540] avg loss 0.00283317, throughput 5.11339K wps
[Epoch 52 Batch 420/1540] avg loss 0.00317098, throughput 5.31857K wps
[Epoch 52 Batch 450/1540] avg loss 0.0036016, throughput 5.80486K wps
[Epoch 52 Batch 480/1540] avg loss 0.00319249, throughput 4.87225K wps
[Epoch 52 Batch 510/1540] avg loss 0.00309486, throughput 4.76213K wps
[Epoch 52 Batch 540/1540] avg loss 0.00289353, throughput 5.28349K wps
[Epoch 52 Batch 570/1540] avg loss 0.00302344, throughput 5.00035K wps
[Epoch 52 Batch 600/1540] avg loss 0.00352627, throughput 4.83639K wps
[Epoch 52 Batch 630/1540] avg loss 0.00301119, throughput 4.71221K wps
[Epoch 52 Batch 660/1540] avg loss 0.00307545, throughput 5.11612K wps
[Epoch 52 Batch 690/1540] avg loss 0.0028156, throughput 4.80185K wps
[Epoch 52 Batch 720/1540] avg loss 0.00308536, throughput 4.44957K wps
[Epoch 52 Batch 750/1540] avg loss 0.0033956, throughput 4.83702K wps
[Epoch 52 Batch 780/1540] avg loss 0.00302631, throughput 5.23208K wps
[Epoch 52 Batch 810/1540] avg loss 0.00313895, throughput 5.00797K wps
[Epoch 52 Batch 840/1540] avg loss 0.00338335, throughput 4.94811K wps
[Epoch 52 Batch 870/1540] avg loss 0.0031895, throughput 4.55785K wps
[Epoch 52 Batch 900/1540] avg loss 0.00313145, throughput 4.99157K wps
[Epoch 52 Batch 930/1540] avg loss 0.00282423, throughput 5.36758K wps
[Epoch 52 Batch 960/1540] avg loss 0.00326585, throughput 4.82643K wps
[Epoch 52 Batch 990/1540] avg loss 0.00306586, throughput 5.68202K wps
[Epoch 52 Batch 1020/1540] avg loss 0.00319355, throughput 4.7989K wps
[Epoch 52 Batch 1050/1540] avg loss 0.00285754, throughput 4.84659K wps
[Epoch 52 Batch 1080/1540] avg loss 0.0029864, throughput 4.67079K wps
[Epoch 52 Batch 1110/1540] avg loss 0.00315672, throughput 5.14464K wps
[Epoch 52 Batch 1140/1540] avg loss 0.00309489, throughput 4.53862K wps
[Epoch 52 Batch 1170/1540] avg loss 0.00282962, throughput 4.60088K wps
[Epoch 52 Batch 1200/1540] avg loss 0.00288624, throughput 4.92972K wps
[Epoch 52 Batch 1230/1540] avg loss 0.00297594, throughput 5.57764K wps
[Epoch 52 Batch 1260/1540] avg loss 0.0033848, throughput 4.89496K wps
[Epoch 52 Batch 1290/1540] avg loss 0.00310554, throughput 5.21153K wps
[Epoch 52 Batch 1320/1540] avg loss 0.00275551, throughput 4.82953K wps
[Epoch 52 Batch 1350/1540] avg loss 0.00300932, throughput 5.0288K wps
[Epoch 52 Batch 1380/1540] avg loss 0.00309668, throughput 4.61941K wps
[Epoch 52 Batch 1410/1540] avg loss 0.00293855, throughput 4.61264K wps
[Epoch 52 Batch 1440/1540] avg loss 0.00354346, throughput 4.93456K wps
[Epoch 52 Batch 1470/1540] avg loss 0.00303904, throughput 5.73698K wps
[Epoch 52 Batch 1500/1540] avg loss 0.00323725, throughput 4.47244K wps
[Epoch 52 Batch 1530/1540] avg loss 0.00295564, throughput 5.0161K wps
Begin Testing...
[Epoch 52] train avg loss 0.00308304, dev acc 0.8394, dev avg loss 0.3745, throughput 4.94736K wps
[Epoch 53 Batch 30/1540] avg loss 0.00319834, throughput 4.52692K wps
[Epoch 53 Batch 60/1540] avg loss 0.00343268, throughput 4.66687K wps
[Epoch 53 Batch 90/1540] avg loss 0.00259544, throughput 5.30743K wps
[Epoch 53 Batch 120/1540] avg loss 0.00273203, throughput 4.7246K wps
[Epoch 53 Batch 150/1540] avg loss 0.00305411, throughput 4.36966K wps
[Epoch 53 Batch 180/1540] avg loss 0.00313747, throughput 5.23816K wps
[Epoch 53 Batch 210/1540] avg loss 0.00337964, throughput 4.49766K wps
[Epoch 53 Batch 240/1540] avg loss 0.00285506, throughput 4.75145K wps
[Epoch 53 Batch 270/1540] avg loss 0.0031254, throughput 4.98896K wps
[Epoch 53 Batch 300/1540] avg loss 0.00350666, throughput 5.01967K wps
[Epoch 53 Batch 330/1540] avg loss 0.00318938, throughput 4.47511K wps
[Epoch 53 Batch 360/1540] avg loss 0.00325284, throughput 5.07033K wps
[Epoch 53 Batch 390/1540] avg loss 0.00342997, throughput 4.99271K wps
[Epoch 53 Batch 420/1540] avg loss 0.0028626, throughput 4.79349K wps
[Epoch 53 Batch 450/1540] avg loss 0.00323404, throughput 4.95285K wps
[Epoch 53 Batch 480/1540] avg loss 0.00341193, throughput 5.2081K wps
[Epoch 53 Batch 510/1540] avg loss 0.00290716, throughput 4.96997K wps
[Epoch 53 Batch 540/1540] avg loss 0.00270805, throughput 5.34894K wps
[Epoch 53 Batch 570/1540] avg loss 0.00334012, throughput 5.27971K wps
[Epoch 53 Batch 600/1540] avg loss 0.00280983, throughput 4.80424K wps
[Epoch 53 Batch 630/1540] avg loss 0.00275019, throughput 4.96494K wps
[Epoch 53 Batch 660/1540] avg loss 0.00323315, throughput 4.55662K wps
[Epoch 53 Batch 690/1540] avg loss 0.00319174, throughput 4.53365K wps
[Epoch 53 Batch 720/1540] avg loss 0.00329978, throughput 4.71231K wps
[Epoch 53 Batch 750/1540] avg loss 0.002987, throughput 4.92587K wps
[Epoch 53 Batch 780/1540] avg loss 0.00258371, throughput 5.05608K wps
[Epoch 53 Batch 810/1540] avg loss 0.00297475, throughput 4.89646K wps
[Epoch 53 Batch 840/1540] avg loss 0.00286133, throughput 5.17663K wps
[Epoch 53 Batch 870/1540] avg loss 0.00286153, throughput 5.04478K wps
[Epoch 53 Batch 900/1540] avg loss 0.00285083, throughput 4.45382K wps
[Epoch 53 Batch 930/1540] avg loss 0.00292006, throughput 4.9165K wps
[Epoch 53 Batch 960/1540] avg loss 0.00281221, throughput 5.74971K wps
[Epoch 53 Batch 990/1540] avg loss 0.00325139, throughput 5.24851K wps
[Epoch 53 Batch 1020/1540] avg loss 0.0035478, throughput 4.78781K wps
[Epoch 53 Batch 1050/1540] avg loss 0.0034786, throughput 5.22594K wps
[Epoch 53 Batch 1080/1540] avg loss 0.00271614, throughput 5.04088K wps
[Epoch 53 Batch 1110/1540] avg loss 0.00282166, throughput 5.37375K wps
[Epoch 53 Batch 1140/1540] avg loss 0.00302264, throughput 4.64445K wps
[Epoch 53 Batch 1170/1540] avg loss 0.00325484, throughput 5.03261K wps
[Epoch 53 Batch 1200/1540] avg loss 0.00290183, throughput 4.83453K wps
[Epoch 53 Batch 1230/1540] avg loss 0.00304889, throughput 5.26515K wps
[Epoch 53 Batch 1260/1540] avg loss 0.00281319, throughput 4.93823K wps
[Epoch 53 Batch 1290/1540] avg loss 0.00332513, throughput 4.47083K wps
[Epoch 53 Batch 1320/1540] avg loss 0.00336699, throughput 4.70577K wps
[Epoch 53 Batch 1350/1540] avg loss 0.00332945, throughput 5.39762K wps
[Epoch 53 Batch 1380/1540] avg loss 0.00317065, throughput 4.98223K wps
[Epoch 53 Batch 1410/1540] avg loss 0.00327464, throughput 4.88582K wps
[Epoch 53 Batch 1440/1540] avg loss 0.00346559, throughput 4.69615K wps
[Epoch 53 Batch 1470/1540] avg loss 0.00322295, throughput 4.80271K wps
[Epoch 53 Batch 1500/1540] avg loss 0.00323572, throughput 4.7948K wps
[Epoch 53 Batch 1530/1540] avg loss 0.00291769, throughput 4.60319K wps
Begin Testing...
[Epoch 53] train avg loss 0.00308782, dev acc 0.8463, dev avg loss 0.375737, throughput 4.8945K wps
[Epoch 54 Batch 30/1540] avg loss 0.00281641, throughput 4.89419K wps
[Epoch 54 Batch 60/1540] avg loss 0.00292284, throughput 5.35475K wps
[Epoch 54 Batch 90/1540] avg loss 0.00299903, throughput 5.52998K wps
[Epoch 54 Batch 120/1540] avg loss 0.00297177, throughput 5.09983K wps
[Epoch 54 Batch 150/1540] avg loss 0.00283059, throughput 5.23281K wps
[Epoch 54 Batch 180/1540] avg loss 0.00266154, throughput 4.65302K wps
[Epoch 54 Batch 210/1540] avg loss 0.00281769, throughput 5.02704K wps
[Epoch 54 Batch 240/1540] avg loss 0.00300717, throughput 4.63559K wps
[Epoch 54 Batch 270/1540] avg loss 0.0027823, throughput 5.33572K wps
[Epoch 54 Batch 300/1540] avg loss 0.00301127, throughput 4.76517K wps
[Epoch 54 Batch 330/1540] avg loss 0.00312091, throughput 4.48241K wps
[Epoch 54 Batch 360/1540] avg loss 0.00312134, throughput 4.81762K wps
[Epoch 54 Batch 390/1540] avg loss 0.00277893, throughput 5.03638K wps
[Epoch 54 Batch 420/1540] avg loss 0.00309781, throughput 5.06402K wps
[Epoch 54 Batch 450/1540] avg loss 0.00313155, throughput 4.97886K wps
[Epoch 54 Batch 480/1540] avg loss 0.00309469, throughput 4.53608K wps
[Epoch 54 Batch 510/1540] avg loss 0.00302981, throughput 4.68032K wps
[Epoch 54 Batch 540/1540] avg loss 0.00313072, throughput 4.63305K wps
[Epoch 54 Batch 570/1540] avg loss 0.00319291, throughput 5.36812K wps
[Epoch 54 Batch 600/1540] avg loss 0.00287809, throughput 5.11158K wps
[Epoch 54 Batch 630/1540] avg loss 0.00292237, throughput 4.78946K wps
[Epoch 54 Batch 660/1540] avg loss 0.00322863, throughput 5.40148K wps
[Epoch 54 Batch 690/1540] avg loss 0.00336841, throughput 4.8261K wps
[Epoch 54 Batch 720/1540] avg loss 0.00321585, throughput 5.27093K wps
[Epoch 54 Batch 750/1540] avg loss 0.00299815, throughput 5.34211K wps
[Epoch 54 Batch 780/1540] avg loss 0.00274303, throughput 5.42595K wps
[Epoch 54 Batch 810/1540] avg loss 0.00289794, throughput 4.74728K wps
[Epoch 54 Batch 840/1540] avg loss 0.00305014, throughput 5.10085K wps
[Epoch 54 Batch 870/1540] avg loss 0.00307427, throughput 5.26136K wps
[Epoch 54 Batch 900/1540] avg loss 0.00303414, throughput 5.1382K wps
[Epoch 54 Batch 930/1540] avg loss 0.0037628, throughput 5.12051K wps
[Epoch 54 Batch 960/1540] avg loss 0.00323158, throughput 5.42567K wps
[Epoch 54 Batch 990/1540] avg loss 0.00319511, throughput 4.90048K wps
[Epoch 54 Batch 1020/1540] avg loss 0.00313059, throughput 5.00758K wps
[Epoch 54 Batch 1050/1540] avg loss 0.00290044, throughput 4.88594K wps
[Epoch 54 Batch 1080/1540] avg loss 0.00279636, throughput 4.92774K wps
[Epoch 54 Batch 1110/1540] avg loss 0.00302566, throughput 5.12K wps
[Epoch 54 Batch 1140/1540] avg loss 0.00336632, throughput 4.91246K wps
[Epoch 54 Batch 1170/1540] avg loss 0.00294297, throughput 5.09054K wps
[Epoch 54 Batch 1200/1540] avg loss 0.00288716, throughput 4.46285K wps
[Epoch 54 Batch 1230/1540] avg loss 0.00275416, throughput 4.86188K wps
[Epoch 54 Batch 1260/1540] avg loss 0.00302082, throughput 5.01614K wps
[Epoch 54 Batch 1290/1540] avg loss 0.00314225, throughput 4.97676K wps
[Epoch 54 Batch 1320/1540] avg loss 0.00286701, throughput 5.07057K wps
[Epoch 54 Batch 1350/1540] avg loss 0.00315076, throughput 4.7127K wps
[Epoch 54 Batch 1380/1540] avg loss 0.00291026, throughput 4.93257K wps
[Epoch 54 Batch 1410/1540] avg loss 0.00276144, throughput 5.1436K wps
[Epoch 54 Batch 1440/1540] avg loss 0.00297673, throughput 5.3511K wps
[Epoch 54 Batch 1470/1540] avg loss 0.00327431, throughput 4.63939K wps
[Epoch 54 Batch 1500/1540] avg loss 0.00311707, throughput 5.01881K wps
[Epoch 54 Batch 1530/1540] avg loss 0.00314786, throughput 5.42463K wps
Begin Testing...
[Epoch 54] train avg loss 0.00302589, dev acc 0.8452, dev avg loss 0.387168, throughput 5.00112K wps
[Epoch 55 Batch 30/1540] avg loss 0.00318822, throughput 4.85508K wps
[Epoch 55 Batch 60/1540] avg loss 0.00304314, throughput 4.54449K wps
[Epoch 55 Batch 90/1540] avg loss 0.00268879, throughput 4.93769K wps
[Epoch 55 Batch 120/1540] avg loss 0.00299755, throughput 4.67961K wps
[Epoch 55 Batch 150/1540] avg loss 0.00315184, throughput 4.6289K wps
[Epoch 55 Batch 180/1540] avg loss 0.00298068, throughput 5.07357K wps
[Epoch 55 Batch 210/1540] avg loss 0.00274759, throughput 4.87277K wps
[Epoch 55 Batch 240/1540] avg loss 0.00264198, throughput 4.74794K wps
[Epoch 55 Batch 270/1540] avg loss 0.00269348, throughput 4.9492K wps
[Epoch 55 Batch 300/1540] avg loss 0.00285237, throughput 5.54099K wps
[Epoch 55 Batch 330/1540] avg loss 0.00326094, throughput 5.40079K wps
[Epoch 55 Batch 360/1540] avg loss 0.00265717, throughput 4.94147K wps
[Epoch 55 Batch 390/1540] avg loss 0.00312934, throughput 4.65255K wps
[Epoch 55 Batch 420/1540] avg loss 0.00299431, throughput 4.6495K wps
[Epoch 55 Batch 450/1540] avg loss 0.0028103, throughput 4.52255K wps
[Epoch 55 Batch 480/1540] avg loss 0.00316949, throughput 5.21432K wps
[Epoch 55 Batch 510/1540] avg loss 0.00283257, throughput 5.36957K wps
[Epoch 55 Batch 540/1540] avg loss 0.00283701, throughput 5.07252K wps
[Epoch 55 Batch 570/1540] avg loss 0.00322001, throughput 5.06936K wps
[Epoch 55 Batch 600/1540] avg loss 0.00308424, throughput 4.77779K wps
[Epoch 55 Batch 630/1540] avg loss 0.0029092, throughput 4.75457K wps
[Epoch 55 Batch 660/1540] avg loss 0.00289728, throughput 4.85628K wps
[Epoch 55 Batch 690/1540] avg loss 0.00314151, throughput 5.63038K wps
[Epoch 55 Batch 720/1540] avg loss 0.00303132, throughput 5.67635K wps
[Epoch 55 Batch 750/1540] avg loss 0.00317409, throughput 5.11187K wps
[Epoch 55 Batch 780/1540] avg loss 0.00309693, throughput 4.95733K wps
[Epoch 55 Batch 810/1540] avg loss 0.0029443, throughput 4.93338K wps
[Epoch 55 Batch 840/1540] avg loss 0.0025223, throughput 4.71647K wps
[Epoch 55 Batch 870/1540] avg loss 0.00274694, throughput 4.84947K wps
[Epoch 55 Batch 900/1540] avg loss 0.00310641, throughput 4.75359K wps
[Epoch 55 Batch 930/1540] avg loss 0.00281622, throughput 4.68924K wps
[Epoch 55 Batch 960/1540] avg loss 0.00331, throughput 4.9013K wps
[Epoch 55 Batch 990/1540] avg loss 0.00357099, throughput 5.3052K wps
[Epoch 55 Batch 1020/1540] avg loss 0.002968, throughput 4.58346K wps
[Epoch 55 Batch 1050/1540] avg loss 0.00301972, throughput 4.48973K wps
[Epoch 55 Batch 1080/1540] avg loss 0.00310256, throughput 4.61374K wps
[Epoch 55 Batch 1110/1540] avg loss 0.00299289, throughput 5.07407K wps
[Epoch 55 Batch 1140/1540] avg loss 0.00272425, throughput 5.15808K wps
[Epoch 55 Batch 1170/1540] avg loss 0.00303225, throughput 5.01611K wps
[Epoch 55 Batch 1200/1540] avg loss 0.00311032, throughput 4.60595K wps
[Epoch 55 Batch 1230/1540] avg loss 0.00315641, throughput 4.98173K wps
[Epoch 55 Batch 1260/1540] avg loss 0.00306205, throughput 5.14463K wps
[Epoch 55 Batch 1290/1540] avg loss 0.00318523, throughput 5.04758K wps
[Epoch 55 Batch 1320/1540] avg loss 0.00276927, throughput 4.72466K wps
[Epoch 55 Batch 1350/1540] avg loss 0.00307338, throughput 5.08361K wps
[Epoch 55 Batch 1380/1540] avg loss 0.00277286, throughput 4.47616K wps
[Epoch 55 Batch 1410/1540] avg loss 0.00286457, throughput 4.96772K wps
[Epoch 55 Batch 1440/1540] avg loss 0.00281363, throughput 4.96412K wps
[Epoch 55 Batch 1470/1540] avg loss 0.00294455, throughput 4.77839K wps
[Epoch 55 Batch 1500/1540] avg loss 0.00314886, throughput 4.62683K wps
[Epoch 55 Batch 1530/1540] avg loss 0.00282455, throughput 5.03229K wps
Begin Testing...
[Epoch 55] train avg loss 0.0029802, dev acc 0.8452, dev avg loss 0.389495, throughput 4.90609K wps
[Epoch 56 Batch 30/1540] avg loss 0.00326962, throughput 4.58504K wps
[Epoch 56 Batch 60/1540] avg loss 0.00273768, throughput 4.56011K wps
[Epoch 56 Batch 90/1540] avg loss 0.00286204, throughput 5.44984K wps
[Epoch 56 Batch 120/1540] avg loss 0.00287645, throughput 5.06292K wps
[Epoch 56 Batch 150/1540] avg loss 0.00301595, throughput 5.05097K wps
[Epoch 56 Batch 180/1540] avg loss 0.0030761, throughput 4.49018K wps
[Epoch 56 Batch 210/1540] avg loss 0.00290909, throughput 4.55319K wps
[Epoch 56 Batch 240/1540] avg loss 0.00295138, throughput 5.10897K wps
[Epoch 56 Batch 270/1540] avg loss 0.00305695, throughput 4.46565K wps
[Epoch 56 Batch 300/1540] avg loss 0.00268556, throughput 5.00494K wps
[Epoch 56 Batch 330/1540] avg loss 0.00309273, throughput 5.27233K wps
[Epoch 56 Batch 360/1540] avg loss 0.00294364, throughput 4.84362K wps
[Epoch 56 Batch 390/1540] avg loss 0.00298442, throughput 4.58805K wps
[Epoch 56 Batch 420/1540] avg loss 0.00296509, throughput 4.91815K wps
[Epoch 56 Batch 450/1540] avg loss 0.00307851, throughput 5.22421K wps
[Epoch 56 Batch 480/1540] avg loss 0.00278098, throughput 5.07535K wps
[Epoch 56 Batch 510/1540] avg loss 0.00306285, throughput 5.13405K wps
[Epoch 56 Batch 540/1540] avg loss 0.0029989, throughput 4.49227K wps
[Epoch 56 Batch 570/1540] avg loss 0.00306842, throughput 5.20088K wps
[Epoch 56 Batch 600/1540] avg loss 0.00272458, throughput 4.97629K wps
[Epoch 56 Batch 630/1540] avg loss 0.00283802, throughput 5.10201K wps
[Epoch 56 Batch 660/1540] avg loss 0.00302823, throughput 4.93905K wps
[Epoch 56 Batch 690/1540] avg loss 0.00282709, throughput 4.88025K wps
[Epoch 56 Batch 720/1540] avg loss 0.00278237, throughput 4.6291K wps
[Epoch 56 Batch 750/1540] avg loss 0.00310288, throughput 4.54069K wps
[Epoch 56 Batch 780/1540] avg loss 0.00338238, throughput 5.26007K wps
[Epoch 56 Batch 810/1540] avg loss 0.0030457, throughput 5.11419K wps
[Epoch 56 Batch 840/1540] avg loss 0.00261481, throughput 5.83538K wps
[Epoch 56 Batch 870/1540] avg loss 0.00308876, throughput 5.01454K wps
[Epoch 56 Batch 900/1540] avg loss 0.002644, throughput 5.17425K wps
[Epoch 56 Batch 930/1540] avg loss 0.00290044, throughput 4.7957K wps
[Epoch 56 Batch 960/1540] avg loss 0.00285801, throughput 4.90292K wps
[Epoch 56 Batch 990/1540] avg loss 0.00314917, throughput 5.03542K wps
[Epoch 56 Batch 1020/1540] avg loss 0.00332755, throughput 4.71194K wps
[Epoch 56 Batch 1050/1540] avg loss 0.00302479, throughput 4.91606K wps
[Epoch 56 Batch 1080/1540] avg loss 0.0031516, throughput 4.68992K wps
[Epoch 56 Batch 1110/1540] avg loss 0.00309251, throughput 4.76126K wps
[Epoch 56 Batch 1140/1540] avg loss 0.00316667, throughput 5.53208K wps
[Epoch 56 Batch 1170/1540] avg loss 0.00329935, throughput 4.72082K wps
[Epoch 56 Batch 1200/1540] avg loss 0.00284826, throughput 4.98593K wps
[Epoch 56 Batch 1230/1540] avg loss 0.00315421, throughput 5.05743K wps
[Epoch 56 Batch 1260/1540] avg loss 0.00303952, throughput 5.34549K wps
[Epoch 56 Batch 1290/1540] avg loss 0.00305304, throughput 5.19076K wps
[Epoch 56 Batch 1320/1540] avg loss 0.00273316, throughput 4.59018K wps
[Epoch 56 Batch 1350/1540] avg loss 0.00316105, throughput 4.87691K wps
[Epoch 56 Batch 1380/1540] avg loss 0.00297961, throughput 4.70934K wps
[Epoch 56 Batch 1410/1540] avg loss 0.00298961, throughput 5.07911K wps
[Epoch 56 Batch 1440/1540] avg loss 0.00310218, throughput 4.74844K wps
[Epoch 56 Batch 1470/1540] avg loss 0.00287688, throughput 4.89421K wps
[Epoch 56 Batch 1500/1540] avg loss 0.00281304, throughput 4.82065K wps
[Epoch 56 Batch 1530/1540] avg loss 0.00282445, throughput 5.0406K wps
Begin Testing...
[Epoch 56] train avg loss 0.00298348, dev acc 0.8417, dev avg loss 0.377178, throughput 4.92829K wps
[Epoch 57 Batch 30/1540] avg loss 0.0027021, throughput 4.86171K wps
[Epoch 57 Batch 60/1540] avg loss 0.00294119, throughput 3.46394K wps
[Epoch 57 Batch 90/1540] avg loss 0.00323161, throughput 4.57928K wps
[Epoch 57 Batch 120/1540] avg loss 0.00297244, throughput 4.95343K wps
[Epoch 57 Batch 150/1540] avg loss 0.00283364, throughput 4.6456K wps
[Epoch 57 Batch 180/1540] avg loss 0.00266027, throughput 4.98391K wps
[Epoch 57 Batch 210/1540] avg loss 0.00285945, throughput 4.77369K wps
[Epoch 57 Batch 240/1540] avg loss 0.00271675, throughput 5.21248K wps
[Epoch 57 Batch 270/1540] avg loss 0.00304913, throughput 4.88767K wps
[Epoch 57 Batch 300/1540] avg loss 0.00291326, throughput 5.2305K wps
[Epoch 57 Batch 330/1540] avg loss 0.00273234, throughput 5.3072K wps
[Epoch 57 Batch 360/1540] avg loss 0.00277527, throughput 4.71426K wps
[Epoch 57 Batch 390/1540] avg loss 0.00323774, throughput 4.7961K wps
[Epoch 57 Batch 420/1540] avg loss 0.00311135, throughput 5.17537K wps
[Epoch 57 Batch 450/1540] avg loss 0.00290459, throughput 4.92904K wps
[Epoch 57 Batch 480/1540] avg loss 0.00292908, throughput 4.65314K wps
[Epoch 57 Batch 510/1540] avg loss 0.00303237, throughput 5.04449K wps
[Epoch 57 Batch 540/1540] avg loss 0.00279077, throughput 5.38575K wps
[Epoch 57 Batch 570/1540] avg loss 0.00299836, throughput 4.73814K wps
[Epoch 57 Batch 600/1540] avg loss 0.00287345, throughput 4.74815K wps
[Epoch 57 Batch 630/1540] avg loss 0.00290068, throughput 4.74316K wps
[Epoch 57 Batch 660/1540] avg loss 0.0026606, throughput 5.18937K wps
[Epoch 57 Batch 690/1540] avg loss 0.00292526, throughput 4.89363K wps
[Epoch 57 Batch 720/1540] avg loss 0.00321549, throughput 4.55478K wps
[Epoch 57 Batch 750/1540] avg loss 0.00279657, throughput 4.38446K wps
[Epoch 57 Batch 780/1540] avg loss 0.00301354, throughput 4.87084K wps
[Epoch 57 Batch 810/1540] avg loss 0.00291092, throughput 4.41144K wps
[Epoch 57 Batch 840/1540] avg loss 0.00338678, throughput 4.94603K wps
[Epoch 57 Batch 870/1540] avg loss 0.00286906, throughput 4.96281K wps
[Epoch 57 Batch 900/1540] avg loss 0.00312863, throughput 4.9491K wps
[Epoch 57 Batch 930/1540] avg loss 0.00278422, throughput 4.91366K wps
[Epoch 57 Batch 960/1540] avg loss 0.00297194, throughput 5.15807K wps
[Epoch 57 Batch 990/1540] avg loss 0.00326273, throughput 5.21422K wps
[Epoch 57 Batch 1020/1540] avg loss 0.00327567, throughput 4.61679K wps
[Epoch 57 Batch 1050/1540] avg loss 0.0029244, throughput 4.84967K wps
[Epoch 57 Batch 1080/1540] avg loss 0.00298177, throughput 4.70153K wps
[Epoch 57 Batch 1110/1540] avg loss 0.00314631, throughput 4.62397K wps
[Epoch 57 Batch 1140/1540] avg loss 0.0024483, throughput 5.18388K wps
[Epoch 57 Batch 1170/1540] avg loss 0.00303253, throughput 4.96631K wps
[Epoch 57 Batch 1200/1540] avg loss 0.00298042, throughput 4.82726K wps
[Epoch 57 Batch 1230/1540] avg loss 0.00296361, throughput 4.78137K wps
[Epoch 57 Batch 1260/1540] avg loss 0.00315532, throughput 5.25715K wps
[Epoch 57 Batch 1290/1540] avg loss 0.00304099, throughput 4.70664K wps
[Epoch 57 Batch 1320/1540] avg loss 0.00318785, throughput 5.13199K wps
[Epoch 57 Batch 1350/1540] avg loss 0.00284064, throughput 4.99736K wps
[Epoch 57 Batch 1380/1540] avg loss 0.00303776, throughput 5.00012K wps
[Epoch 57 Batch 1410/1540] avg loss 0.00306142, throughput 4.721K wps
[Epoch 57 Batch 1440/1540] avg loss 0.00270814, throughput 4.94416K wps
[Epoch 57 Batch 1470/1540] avg loss 0.00272785, throughput 5.21297K wps
[Epoch 57 Batch 1500/1540] avg loss 0.00310004, throughput 4.67659K wps
[Epoch 57 Batch 1530/1540] avg loss 0.00300002, throughput 5.57453K wps
Begin Testing...
[Epoch 57] train avg loss 0.00295374, dev acc 0.8406, dev avg loss 0.37527, throughput 4.85596K wps
[Epoch 58 Batch 30/1540] avg loss 0.00296021, throughput 4.35011K wps
[Epoch 58 Batch 60/1540] avg loss 0.00272681, throughput 4.74274K wps
[Epoch 58 Batch 90/1540] avg loss 0.0029493, throughput 5.05787K wps
[Epoch 58 Batch 120/1540] avg loss 0.00287141, throughput 5.24666K wps
[Epoch 58 Batch 150/1540] avg loss 0.0028191, throughput 5.04199K wps
[Epoch 58 Batch 180/1540] avg loss 0.00288193, throughput 4.60351K wps
[Epoch 58 Batch 210/1540] avg loss 0.00293665, throughput 5.22535K wps
[Epoch 58 Batch 240/1540] avg loss 0.00302103, throughput 5.21244K wps
[Epoch 58 Batch 270/1540] avg loss 0.00305886, throughput 4.80829K wps
[Epoch 58 Batch 300/1540] avg loss 0.00275156, throughput 4.93345K wps
[Epoch 58 Batch 330/1540] avg loss 0.00279561, throughput 5.05782K wps
[Epoch 58 Batch 360/1540] avg loss 0.00297315, throughput 5.09342K wps
[Epoch 58 Batch 390/1540] avg loss 0.00289075, throughput 4.62413K wps
[Epoch 58 Batch 420/1540] avg loss 0.00346124, throughput 5.06529K wps
[Epoch 58 Batch 450/1540] avg loss 0.00314623, throughput 5.08096K wps
[Epoch 58 Batch 480/1540] avg loss 0.00279432, throughput 4.84137K wps
[Epoch 58 Batch 510/1540] avg loss 0.00295639, throughput 4.46977K wps
[Epoch 58 Batch 540/1540] avg loss 0.00289515, throughput 5.23566K wps
[Epoch 58 Batch 570/1540] avg loss 0.00265807, throughput 5.12222K wps
[Epoch 58 Batch 600/1540] avg loss 0.00294852, throughput 4.82118K wps
[Epoch 58 Batch 630/1540] avg loss 0.00267666, throughput 4.56955K wps
[Epoch 58 Batch 660/1540] avg loss 0.00282695, throughput 4.94934K wps
[Epoch 58 Batch 690/1540] avg loss 0.00261478, throughput 5.48652K wps
[Epoch 58 Batch 720/1540] avg loss 0.00322508, throughput 4.90229K wps
[Epoch 58 Batch 750/1540] avg loss 0.00278746, throughput 4.74882K wps
[Epoch 58 Batch 780/1540] avg loss 0.0029929, throughput 5.19517K wps
[Epoch 58 Batch 810/1540] avg loss 0.00282118, throughput 4.83361K wps
[Epoch 58 Batch 840/1540] avg loss 0.00266154, throughput 5.22666K wps
[Epoch 58 Batch 870/1540] avg loss 0.00283, throughput 4.57856K wps
[Epoch 58 Batch 900/1540] avg loss 0.00298651, throughput 4.53217K wps
[Epoch 58 Batch 930/1540] avg loss 0.00274145, throughput 5.15543K wps
[Epoch 58 Batch 960/1540] avg loss 0.00288883, throughput 4.78323K wps
[Epoch 58 Batch 990/1540] avg loss 0.00289242, throughput 5.29561K wps
[Epoch 58 Batch 1020/1540] avg loss 0.00275332, throughput 4.84213K wps
[Epoch 58 Batch 1050/1540] avg loss 0.00291505, throughput 5.09811K wps
[Epoch 58 Batch 1080/1540] avg loss 0.00326984, throughput 5.1433K wps
[Epoch 58 Batch 1110/1540] avg loss 0.00313555, throughput 4.67701K wps
[Epoch 58 Batch 1140/1540] avg loss 0.00269014, throughput 4.96949K wps
[Epoch 58 Batch 1170/1540] avg loss 0.00305488, throughput 4.67458K wps
[Epoch 58 Batch 1200/1540] avg loss 0.00308789, throughput 5.19977K wps
[Epoch 58 Batch 1230/1540] avg loss 0.002701, throughput 4.79231K wps
[Epoch 58 Batch 1260/1540] avg loss 0.00282769, throughput 5.06483K wps
[Epoch 58 Batch 1290/1540] avg loss 0.00299703, throughput 4.98123K wps
[Epoch 58 Batch 1320/1540] avg loss 0.00311953, throughput 4.65112K wps
[Epoch 58 Batch 1350/1540] avg loss 0.00302837, throughput 4.9115K wps
[Epoch 58 Batch 1380/1540] avg loss 0.00299986, throughput 5.55285K wps
[Epoch 58 Batch 1410/1540] avg loss 0.00317296, throughput 4.93679K wps
[Epoch 58 Batch 1440/1540] avg loss 0.00293112, throughput 4.75488K wps
[Epoch 58 Batch 1470/1540] avg loss 0.00305511, throughput 5.15876K wps
[Epoch 58 Batch 1500/1540] avg loss 0.00342613, throughput 4.84766K wps
[Epoch 58 Batch 1530/1540] avg loss 0.00279256, throughput 4.79976K wps
Begin Testing...
[Epoch 58] train avg loss 0.00293195, dev acc 0.8417, dev avg loss 0.378752, throughput 4.92699K wps
[Epoch 59 Batch 30/1540] avg loss 0.00275337, throughput 4.60347K wps
[Epoch 59 Batch 60/1540] avg loss 0.00307158, throughput 4.52513K wps
[Epoch 59 Batch 90/1540] avg loss 0.0026399, throughput 4.86364K wps
[Epoch 59 Batch 120/1540] avg loss 0.00267188, throughput 5.04661K wps
[Epoch 59 Batch 150/1540] avg loss 0.0030251, throughput 4.98958K wps
[Epoch 59 Batch 180/1540] avg loss 0.00275272, throughput 5.18652K wps
[Epoch 59 Batch 210/1540] avg loss 0.00310153, throughput 4.7649K wps
[Epoch 59 Batch 240/1540] avg loss 0.00276338, throughput 4.88774K wps
[Epoch 59 Batch 270/1540] avg loss 0.0030055, throughput 4.84925K wps
[Epoch 59 Batch 300/1540] avg loss 0.00272503, throughput 4.73233K wps
[Epoch 59 Batch 330/1540] avg loss 0.00307508, throughput 4.57786K wps
[Epoch 59 Batch 360/1540] avg loss 0.00305398, throughput 5.00309K wps
[Epoch 59 Batch 390/1540] avg loss 0.00327454, throughput 4.91822K wps
[Epoch 59 Batch 420/1540] avg loss 0.0029527, throughput 4.8955K wps
[Epoch 59 Batch 450/1540] avg loss 0.0026713, throughput 5.20055K wps
[Epoch 59 Batch 480/1540] avg loss 0.00270162, throughput 5.61402K wps
[Epoch 59 Batch 510/1540] avg loss 0.00300233, throughput 5.07572K wps
[Epoch 59 Batch 540/1540] avg loss 0.00281192, throughput 4.99388K wps
[Epoch 59 Batch 570/1540] avg loss 0.00319122, throughput 5.21296K wps
[Epoch 59 Batch 600/1540] avg loss 0.00264751, throughput 4.36325K wps
[Epoch 59 Batch 630/1540] avg loss 0.00326914, throughput 5.32465K wps
[Epoch 59 Batch 660/1540] avg loss 0.00281225, throughput 5.19251K wps
[Epoch 59 Batch 690/1540] avg loss 0.00284798, throughput 5.34774K wps
[Epoch 59 Batch 720/1540] avg loss 0.00286053, throughput 4.87283K wps
[Epoch 59 Batch 750/1540] avg loss 0.00291104, throughput 5.20028K wps
[Epoch 59 Batch 780/1540] avg loss 0.00253602, throughput 5.01126K wps
[Epoch 59 Batch 810/1540] avg loss 0.00297159, throughput 4.89141K wps
[Epoch 59 Batch 840/1540] avg loss 0.00272554, throughput 4.93614K wps
[Epoch 59 Batch 870/1540] avg loss 0.00292254, throughput 4.55207K wps
[Epoch 59 Batch 900/1540] avg loss 0.00303977, throughput 4.5639K wps
[Epoch 59 Batch 930/1540] avg loss 0.00284363, throughput 5.08716K wps
[Epoch 59 Batch 960/1540] avg loss 0.00278099, throughput 5.00755K wps
[Epoch 59 Batch 990/1540] avg loss 0.00298899, throughput 5.91373K wps
[Epoch 59 Batch 1020/1540] avg loss 0.00313904, throughput 5.50988K wps
[Epoch 59 Batch 1050/1540] avg loss 0.00272963, throughput 5.30814K wps
[Epoch 59 Batch 1080/1540] avg loss 0.00315927, throughput 4.76788K wps
[Epoch 59 Batch 1110/1540] avg loss 0.0029814, throughput 4.77821K wps
[Epoch 59 Batch 1140/1540] avg loss 0.00270833, throughput 5.01461K wps
[Epoch 59 Batch 1170/1540] avg loss 0.0029551, throughput 5.76082K wps
[Epoch 59 Batch 1200/1540] avg loss 0.00257976, throughput 4.69226K wps
[Epoch 59 Batch 1230/1540] avg loss 0.0029329, throughput 5.18583K wps
[Epoch 59 Batch 1260/1540] avg loss 0.00325997, throughput 5.11449K wps
[Epoch 59 Batch 1290/1540] avg loss 0.00295275, throughput 4.78246K wps
[Epoch 59 Batch 1320/1540] avg loss 0.00298321, throughput 4.75414K wps
[Epoch 59 Batch 1350/1540] avg loss 0.00291443, throughput 4.64052K wps
[Epoch 59 Batch 1380/1540] avg loss 0.00283003, throughput 4.54135K wps
[Epoch 59 Batch 1410/1540] avg loss 0.00296275, throughput 5.0855K wps
[Epoch 59 Batch 1440/1540] avg loss 0.00256758, throughput 5.07861K wps
[Epoch 59 Batch 1470/1540] avg loss 0.00263487, throughput 4.71765K wps
[Epoch 59 Batch 1500/1540] avg loss 0.00277377, throughput 4.6362K wps
[Epoch 59 Batch 1530/1540] avg loss 0.0026854, throughput 5.28476K wps
Begin Testing...
[Epoch 59] train avg loss 0.00288318, dev acc 0.8463, dev avg loss 0.382178, throughput 4.96119K wps
[Epoch 60 Batch 30/1540] avg loss 0.00264427, throughput 5.14675K wps
[Epoch 60 Batch 60/1540] avg loss 0.00294995, throughput 5.00486K wps
[Epoch 60 Batch 90/1540] avg loss 0.00282478, throughput 5.13658K wps
[Epoch 60 Batch 120/1540] avg loss 0.0029353, throughput 4.8172K wps
[Epoch 60 Batch 150/1540] avg loss 0.00280497, throughput 5.17794K wps
[Epoch 60 Batch 180/1540] avg loss 0.00260089, throughput 4.83151K wps
[Epoch 60 Batch 210/1540] avg loss 0.00260211, throughput 5.24357K wps
[Epoch 60 Batch 240/1540] avg loss 0.00284022, throughput 5.62839K wps
[Epoch 60 Batch 270/1540] avg loss 0.00283915, throughput 5.27562K wps
[Epoch 60 Batch 300/1540] avg loss 0.0027806, throughput 5.37085K wps
[Epoch 60 Batch 330/1540] avg loss 0.00272228, throughput 5.03455K wps
[Epoch 60 Batch 360/1540] avg loss 0.00267141, throughput 4.84835K wps
[Epoch 60 Batch 390/1540] avg loss 0.00292561, throughput 5.43936K wps
[Epoch 60 Batch 420/1540] avg loss 0.00299, throughput 5.0148K wps
[Epoch 60 Batch 450/1540] avg loss 0.00267462, throughput 4.66002K wps
[Epoch 60 Batch 480/1540] avg loss 0.00329851, throughput 5.02106K wps
[Epoch 60 Batch 510/1540] avg loss 0.00284552, throughput 4.74963K wps
[Epoch 60 Batch 540/1540] avg loss 0.00307209, throughput 4.76448K wps
[Epoch 60 Batch 570/1540] avg loss 0.00310874, throughput 5.21349K wps
[Epoch 60 Batch 600/1540] avg loss 0.00263565, throughput 5.14216K wps
[Epoch 60 Batch 630/1540] avg loss 0.00268643, throughput 4.59481K wps
[Epoch 60 Batch 660/1540] avg loss 0.00301322, throughput 4.95938K wps
[Epoch 60 Batch 690/1540] avg loss 0.00275817, throughput 4.66293K wps
[Epoch 60 Batch 720/1540] avg loss 0.00307916, throughput 5.14126K wps
[Epoch 60 Batch 750/1540] avg loss 0.0029621, throughput 4.97566K wps
[Epoch 60 Batch 780/1540] avg loss 0.00268357, throughput 5.03748K wps
[Epoch 60 Batch 810/1540] avg loss 0.00259338, throughput 5.76952K wps
[Epoch 60 Batch 840/1540] avg loss 0.00306578, throughput 4.75676K wps
[Epoch 60 Batch 870/1540] avg loss 0.00282052, throughput 4.97556K wps
[Epoch 60 Batch 900/1540] avg loss 0.00276159, throughput 5.23996K wps
[Epoch 60 Batch 930/1540] avg loss 0.00309675, throughput 4.71048K wps
[Epoch 60 Batch 960/1540] avg loss 0.00306536, throughput 4.63926K wps
[Epoch 60 Batch 990/1540] avg loss 0.00256849, throughput 5.28954K wps
[Epoch 60 Batch 1020/1540] avg loss 0.0026755, throughput 5.05903K wps
[Epoch 60 Batch 1050/1540] avg loss 0.00275858, throughput 5.21168K wps
[Epoch 60 Batch 1080/1540] avg loss 0.00320191, throughput 4.89337K wps
[Epoch 60 Batch 1110/1540] avg loss 0.00290138, throughput 4.87166K wps
[Epoch 60 Batch 1140/1540] avg loss 0.0035331, throughput 4.7528K wps
[Epoch 60 Batch 1170/1540] avg loss 0.00296023, throughput 4.64386K wps
[Epoch 60 Batch 1200/1540] avg loss 0.00289434, throughput 4.93368K wps
[Epoch 60 Batch 1230/1540] avg loss 0.00268928, throughput 5.49339K wps
[Epoch 60 Batch 1260/1540] avg loss 0.00313897, throughput 5.01486K wps
[Epoch 60 Batch 1290/1540] avg loss 0.00277645, throughput 4.73614K wps
[Epoch 60 Batch 1320/1540] avg loss 0.00293173, throughput 5.04095K wps
[Epoch 60 Batch 1350/1540] avg loss 0.00298182, throughput 4.92896K wps
[Epoch 60 Batch 1380/1540] avg loss 0.00280071, throughput 5.11259K wps
[Epoch 60 Batch 1410/1540] avg loss 0.00300082, throughput 4.69972K wps
[Epoch 60 Batch 1440/1540] avg loss 0.00306886, throughput 5.44544K wps
[Epoch 60 Batch 1470/1540] avg loss 0.00287259, throughput 4.84465K wps
[Epoch 60 Batch 1500/1540] avg loss 0.00293651, throughput 4.42257K wps
[Epoch 60 Batch 1530/1540] avg loss 0.0029712, throughput 5.25662K wps
Begin Testing...
[Epoch 60] train avg loss 0.00288104, dev acc 0.8417, dev avg loss 0.37928, throughput 4.99248K wps
[Epoch 61 Batch 30/1540] avg loss 0.0028803, throughput 5.06926K wps
[Epoch 61 Batch 60/1540] avg loss 0.00276651, throughput 5.09966K wps
[Epoch 61 Batch 90/1540] avg loss 0.0026034, throughput 5.17277K wps
[Epoch 61 Batch 120/1540] avg loss 0.0026626, throughput 5.11629K wps
[Epoch 61 Batch 150/1540] avg loss 0.0024582, throughput 5.28948K wps
[Epoch 61 Batch 180/1540] avg loss 0.00309211, throughput 5.58768K wps
[Epoch 61 Batch 210/1540] avg loss 0.00282122, throughput 4.79828K wps
[Epoch 61 Batch 240/1540] avg loss 0.00318062, throughput 5.03656K wps
[Epoch 61 Batch 270/1540] avg loss 0.00283184, throughput 4.80157K wps
[Epoch 61 Batch 300/1540] avg loss 0.00314329, throughput 4.7986K wps
[Epoch 61 Batch 330/1540] avg loss 0.00280291, throughput 4.94713K wps
[Epoch 61 Batch 360/1540] avg loss 0.00280192, throughput 4.63232K wps
[Epoch 61 Batch 390/1540] avg loss 0.00284913, throughput 5.21293K wps
[Epoch 61 Batch 420/1540] avg loss 0.00251001, throughput 4.88011K wps
[Epoch 61 Batch 450/1540] avg loss 0.00285684, throughput 4.8114K wps
[Epoch 61 Batch 480/1540] avg loss 0.00257621, throughput 5.60532K wps
[Epoch 61 Batch 510/1540] avg loss 0.00326903, throughput 5.23049K wps
[Epoch 61 Batch 540/1540] avg loss 0.00290016, throughput 4.70186K wps
[Epoch 61 Batch 570/1540] avg loss 0.00300123, throughput 5.60728K wps
[Epoch 61 Batch 600/1540] avg loss 0.00266439, throughput 5.5428K wps
[Epoch 61 Batch 630/1540] avg loss 0.00277581, throughput 6.07229K wps
[Epoch 61 Batch 660/1540] avg loss 0.00331154, throughput 4.94496K wps
[Epoch 61 Batch 690/1540] avg loss 0.00283367, throughput 4.70594K wps
[Epoch 61 Batch 720/1540] avg loss 0.00296883, throughput 4.54186K wps
[Epoch 61 Batch 750/1540] avg loss 0.00295872, throughput 5.39978K wps
[Epoch 61 Batch 780/1540] avg loss 0.00252812, throughput 4.57527K wps
[Epoch 61 Batch 810/1540] avg loss 0.00254656, throughput 4.71148K wps
[Epoch 61 Batch 840/1540] avg loss 0.00284591, throughput 5.52437K wps
[Epoch 61 Batch 870/1540] avg loss 0.00273094, throughput 4.6288K wps
[Epoch 61 Batch 900/1540] avg loss 0.00288293, throughput 4.54416K wps
[Epoch 61 Batch 930/1540] avg loss 0.00231936, throughput 4.88876K wps
[Epoch 61 Batch 960/1540] avg loss 0.00255863, throughput 4.98992K wps
[Epoch 61 Batch 990/1540] avg loss 0.00282167, throughput 5.11987K wps
[Epoch 61 Batch 1020/1540] avg loss 0.00299492, throughput 4.88963K wps
[Epoch 61 Batch 1050/1540] avg loss 0.00283736, throughput 4.8648K wps
[Epoch 61 Batch 1080/1540] avg loss 0.00282895, throughput 5.54827K wps
[Epoch 61 Batch 1110/1540] avg loss 0.00265642, throughput 5.42935K wps
[Epoch 61 Batch 1140/1540] avg loss 0.00270893, throughput 5.06385K wps
[Epoch 61 Batch 1170/1540] avg loss 0.00279759, throughput 4.63896K wps
[Epoch 61 Batch 1200/1540] avg loss 0.00282509, throughput 4.86172K wps
[Epoch 61 Batch 1230/1540] avg loss 0.00314181, throughput 4.94156K wps
[Epoch 61 Batch 1260/1540] avg loss 0.00305351, throughput 4.7833K wps
[Epoch 61 Batch 1290/1540] avg loss 0.00276191, throughput 5.05693K wps
[Epoch 61 Batch 1320/1540] avg loss 0.00295928, throughput 4.86063K wps
[Epoch 61 Batch 1350/1540] avg loss 0.00299252, throughput 5.09292K wps
[Epoch 61 Batch 1380/1540] avg loss 0.00270296, throughput 4.77099K wps
[Epoch 61 Batch 1410/1540] avg loss 0.00293982, throughput 4.65427K wps
[Epoch 61 Batch 1440/1540] avg loss 0.00268542, throughput 5.46991K wps
[Epoch 61 Batch 1470/1540] avg loss 0.0027005, throughput 4.81088K wps
[Epoch 61 Batch 1500/1540] avg loss 0.00288848, throughput 4.61275K wps
[Epoch 61 Batch 1530/1540] avg loss 0.00297281, throughput 5.04717K wps
Begin Testing...
[Epoch 61] train avg loss 0.00282567, dev acc 0.8429, dev avg loss 0.380545, throughput 5.00063K wps
[Epoch 62 Batch 30/1540] avg loss 0.00316127, throughput 4.64689K wps
[Epoch 62 Batch 60/1540] avg loss 0.00257875, throughput 4.71072K wps
[Epoch 62 Batch 90/1540] avg loss 0.00268716, throughput 4.71634K wps
[Epoch 62 Batch 120/1540] avg loss 0.00273877, throughput 5.07099K wps
[Epoch 62 Batch 150/1540] avg loss 0.00271078, throughput 6.0555K wps
[Epoch 62 Batch 180/1540] avg loss 0.00304285, throughput 5.13084K wps
[Epoch 62 Batch 210/1540] avg loss 0.00270445, throughput 4.90165K wps
[Epoch 62 Batch 240/1540] avg loss 0.00291386, throughput 5.17455K wps
[Epoch 62 Batch 270/1540] avg loss 0.00270696, throughput 4.82127K wps
[Epoch 62 Batch 300/1540] avg loss 0.00253744, throughput 4.68136K wps
[Epoch 62 Batch 330/1540] avg loss 0.00288387, throughput 5.28763K wps
[Epoch 62 Batch 360/1540] avg loss 0.00255967, throughput 4.81826K wps
[Epoch 62 Batch 390/1540] avg loss 0.00292743, throughput 5.1699K wps
[Epoch 62 Batch 420/1540] avg loss 0.00267588, throughput 4.52907K wps
[Epoch 62 Batch 450/1540] avg loss 0.00281051, throughput 5.2791K wps
[Epoch 62 Batch 480/1540] avg loss 0.00303519, throughput 5.13871K wps
[Epoch 62 Batch 510/1540] avg loss 0.00336499, throughput 5.40505K wps
[Epoch 62 Batch 540/1540] avg loss 0.00234002, throughput 5.05541K wps
[Epoch 62 Batch 570/1540] avg loss 0.00238253, throughput 5.31681K wps
[Epoch 62 Batch 600/1540] avg loss 0.00270263, throughput 5.07775K wps
[Epoch 62 Batch 630/1540] avg loss 0.00255898, throughput 4.73719K wps
[Epoch 62 Batch 660/1540] avg loss 0.00302128, throughput 5.44592K wps
[Epoch 62 Batch 690/1540] avg loss 0.0029607, throughput 4.65612K wps
[Epoch 62 Batch 720/1540] avg loss 0.00254407, throughput 4.73091K wps
[Epoch 62 Batch 750/1540] avg loss 0.00311714, throughput 4.62572K wps
[Epoch 62 Batch 780/1540] avg loss 0.00282918, throughput 5.0468K wps
[Epoch 62 Batch 810/1540] avg loss 0.00278738, throughput 4.79484K wps
[Epoch 62 Batch 840/1540] avg loss 0.00288097, throughput 4.81994K wps
[Epoch 62 Batch 870/1540] avg loss 0.00295676, throughput 4.96316K wps
[Epoch 62 Batch 900/1540] avg loss 0.0027532, throughput 4.73293K wps
[Epoch 62 Batch 930/1540] avg loss 0.00266427, throughput 4.71451K wps
[Epoch 62 Batch 960/1540] avg loss 0.00288446, throughput 5.20832K wps
[Epoch 62 Batch 990/1540] avg loss 0.00311663, throughput 5.34089K wps
[Epoch 62 Batch 1020/1540] avg loss 0.00272279, throughput 4.79022K wps
[Epoch 62 Batch 1050/1540] avg loss 0.00288248, throughput 5.25126K wps
[Epoch 62 Batch 1080/1540] avg loss 0.00273569, throughput 4.79302K wps
[Epoch 62 Batch 1110/1540] avg loss 0.00313962, throughput 4.85294K wps
[Epoch 62 Batch 1140/1540] avg loss 0.00301939, throughput 5.20505K wps
[Epoch 62 Batch 1170/1540] avg loss 0.0025284, throughput 4.81645K wps
[Epoch 62 Batch 1200/1540] avg loss 0.00276575, throughput 5.25436K wps
[Epoch 62 Batch 1230/1540] avg loss 0.0026196, throughput 4.89149K wps
[Epoch 62 Batch 1260/1540] avg loss 0.00279959, throughput 5.0552K wps
[Epoch 62 Batch 1290/1540] avg loss 0.00270273, throughput 5.40261K wps
[Epoch 62 Batch 1320/1540] avg loss 0.00287906, throughput 5.07581K wps
[Epoch 62 Batch 1350/1540] avg loss 0.00277056, throughput 4.49908K wps
[Epoch 62 Batch 1380/1540] avg loss 0.00273135, throughput 4.58153K wps
[Epoch 62 Batch 1410/1540] avg loss 0.00270386, throughput 4.75152K wps
[Epoch 62 Batch 1440/1540] avg loss 0.00302438, throughput 5.1575K wps
[Epoch 62 Batch 1470/1540] avg loss 0.00256425, throughput 4.7685K wps
[Epoch 62 Batch 1500/1540] avg loss 0.00310532, throughput 4.9514K wps
[Epoch 62 Batch 1530/1540] avg loss 0.00275686, throughput 4.54715K wps
Begin Testing...
[Epoch 62] train avg loss 0.00280732, dev acc 0.8475, dev avg loss 0.380762, throughput 4.95236K wps
[Epoch 63 Batch 30/1540] avg loss 0.00261607, throughput 5.05952K wps
[Epoch 63 Batch 60/1540] avg loss 0.00257319, throughput 5.10834K wps
[Epoch 63 Batch 90/1540] avg loss 0.00254335, throughput 4.97595K wps
[Epoch 63 Batch 120/1540] avg loss 0.00247688, throughput 5.25491K wps
[Epoch 63 Batch 150/1540] avg loss 0.00232636, throughput 4.87309K wps
[Epoch 63 Batch 180/1540] avg loss 0.00251133, throughput 4.65828K wps
[Epoch 63 Batch 210/1540] avg loss 0.00271148, throughput 4.63568K wps
[Epoch 63 Batch 240/1540] avg loss 0.00322666, throughput 4.83258K wps
[Epoch 63 Batch 270/1540] avg loss 0.00254485, throughput 5.62111K wps
[Epoch 63 Batch 300/1540] avg loss 0.00243479, throughput 4.92241K wps
[Epoch 63 Batch 330/1540] avg loss 0.00254882, throughput 4.57108K wps
[Epoch 63 Batch 360/1540] avg loss 0.00317728, throughput 5.33227K wps
[Epoch 63 Batch 390/1540] avg loss 0.00301267, throughput 4.96903K wps
[Epoch 63 Batch 420/1540] avg loss 0.00263538, throughput 5.44387K wps
[Epoch 63 Batch 450/1540] avg loss 0.00271661, throughput 6.02347K wps
[Epoch 63 Batch 480/1540] avg loss 0.00273631, throughput 5.0479K wps
[Epoch 63 Batch 510/1540] avg loss 0.00287363, throughput 5.19418K wps
[Epoch 63 Batch 540/1540] avg loss 0.00268197, throughput 4.93636K wps
[Epoch 63 Batch 570/1540] avg loss 0.00243357, throughput 4.87358K wps
[Epoch 63 Batch 600/1540] avg loss 0.00288034, throughput 4.89148K wps
[Epoch 63 Batch 630/1540] avg loss 0.00298557, throughput 4.57684K wps
[Epoch 63 Batch 660/1540] avg loss 0.00299552, throughput 5.35817K wps
[Epoch 63 Batch 690/1540] avg loss 0.002679, throughput 5.093K wps
[Epoch 63 Batch 720/1540] avg loss 0.00307454, throughput 4.64055K wps
[Epoch 63 Batch 750/1540] avg loss 0.00284285, throughput 5.74344K wps
[Epoch 63 Batch 780/1540] avg loss 0.00271544, throughput 4.87703K wps
[Epoch 63 Batch 810/1540] avg loss 0.00291585, throughput 5.07065K wps
[Epoch 63 Batch 840/1540] avg loss 0.00323611, throughput 4.81045K wps
[Epoch 63 Batch 870/1540] avg loss 0.00272381, throughput 4.76979K wps
[Epoch 63 Batch 900/1540] avg loss 0.00232075, throughput 4.78928K wps
[Epoch 63 Batch 930/1540] avg loss 0.00303889, throughput 4.47205K wps
[Epoch 63 Batch 960/1540] avg loss 0.00288945, throughput 4.47875K wps
[Epoch 63 Batch 990/1540] avg loss 0.00254968, throughput 4.78564K wps
[Epoch 63 Batch 1020/1540] avg loss 0.00299165, throughput 5.2604K wps
[Epoch 63 Batch 1050/1540] avg loss 0.00297129, throughput 4.73041K wps
[Epoch 63 Batch 1080/1540] avg loss 0.00252394, throughput 4.64142K wps
[Epoch 63 Batch 1110/1540] avg loss 0.00247323, throughput 4.87862K wps
[Epoch 63 Batch 1140/1540] avg loss 0.00284911, throughput 4.82523K wps
[Epoch 63 Batch 1170/1540] avg loss 0.00255035, throughput 5.056K wps
[Epoch 63 Batch 1200/1540] avg loss 0.00299253, throughput 4.93649K wps
[Epoch 63 Batch 1230/1540] avg loss 0.00282383, throughput 4.85084K wps
[Epoch 63 Batch 1260/1540] avg loss 0.00291321, throughput 4.89295K wps
[Epoch 63 Batch 1290/1540] avg loss 0.0027089, throughput 4.84737K wps
[Epoch 63 Batch 1320/1540] avg loss 0.0029266, throughput 4.91474K wps
[Epoch 63 Batch 1350/1540] avg loss 0.00289432, throughput 5.46844K wps
[Epoch 63 Batch 1380/1540] avg loss 0.00296689, throughput 4.88033K wps
[Epoch 63 Batch 1410/1540] avg loss 0.00348974, throughput 5.12546K wps
[Epoch 63 Batch 1440/1540] avg loss 0.0026987, throughput 4.99191K wps
[Epoch 63 Batch 1470/1540] avg loss 0.0026458, throughput 5.29113K wps
[Epoch 63 Batch 1500/1540] avg loss 0.00319837, throughput 4.93799K wps
[Epoch 63 Batch 1530/1540] avg loss 0.00295266, throughput 4.82004K wps
Begin Testing...
[Epoch 63] train avg loss 0.00279392, dev acc 0.8463, dev avg loss 0.388555, throughput 4.96373K wps
[Epoch 64 Batch 30/1540] avg loss 0.00271301, throughput 4.42936K wps
[Epoch 64 Batch 60/1540] avg loss 0.00231537, throughput 4.53424K wps
[Epoch 64 Batch 90/1540] avg loss 0.00262996, throughput 5.73143K wps
[Epoch 64 Batch 120/1540] avg loss 0.00263816, throughput 5.13216K wps
[Epoch 64 Batch 150/1540] avg loss 0.00285325, throughput 5.39424K wps
[Epoch 64 Batch 180/1540] avg loss 0.0028634, throughput 5.01347K wps
[Epoch 64 Batch 210/1540] avg loss 0.00274977, throughput 5.48287K wps
[Epoch 64 Batch 240/1540] avg loss 0.00279254, throughput 5.62416K wps
[Epoch 64 Batch 270/1540] avg loss 0.00275409, throughput 4.8605K wps
[Epoch 64 Batch 300/1540] avg loss 0.00272965, throughput 4.72654K wps
[Epoch 64 Batch 330/1540] avg loss 0.00287147, throughput 5.01921K wps
[Epoch 64 Batch 360/1540] avg loss 0.00279649, throughput 5.07591K wps
[Epoch 64 Batch 390/1540] avg loss 0.00294898, throughput 5.13472K wps
[Epoch 64 Batch 420/1540] avg loss 0.00262033, throughput 5.10084K wps
[Epoch 64 Batch 450/1540] avg loss 0.00252548, throughput 4.7118K wps
[Epoch 64 Batch 480/1540] avg loss 0.00273116, throughput 5.19578K wps
[Epoch 64 Batch 510/1540] avg loss 0.00275555, throughput 4.90628K wps
[Epoch 64 Batch 540/1540] avg loss 0.00352501, throughput 5.19823K wps
[Epoch 64 Batch 570/1540] avg loss 0.0028321, throughput 4.81646K wps
[Epoch 64 Batch 600/1540] avg loss 0.00290715, throughput 4.69279K wps
[Epoch 64 Batch 630/1540] avg loss 0.00307772, throughput 5.18115K wps
[Epoch 64 Batch 660/1540] avg loss 0.00277344, throughput 5.38532K wps
[Epoch 64 Batch 690/1540] avg loss 0.00246863, throughput 5.35776K wps
[Epoch 64 Batch 720/1540] avg loss 0.00290637, throughput 4.96482K wps
[Epoch 64 Batch 750/1540] avg loss 0.00251843, throughput 4.88639K wps
[Epoch 64 Batch 780/1540] avg loss 0.00260146, throughput 4.50829K wps
[Epoch 64 Batch 810/1540] avg loss 0.00292672, throughput 4.46184K wps
[Epoch 64 Batch 840/1540] avg loss 0.00304444, throughput 4.69602K wps
[Epoch 64 Batch 870/1540] avg loss 0.00286947, throughput 5.00155K wps
[Epoch 64 Batch 900/1540] avg loss 0.00272845, throughput 5.13106K wps
[Epoch 64 Batch 930/1540] avg loss 0.00273389, throughput 4.90406K wps
[Epoch 64 Batch 960/1540] avg loss 0.00273065, throughput 5.57915K wps
[Epoch 64 Batch 990/1540] avg loss 0.00312334, throughput 5.16579K wps
[Epoch 64 Batch 1020/1540] avg loss 0.00262415, throughput 4.70417K wps
[Epoch 64 Batch 1050/1540] avg loss 0.00260195, throughput 4.82578K wps
[Epoch 64 Batch 1080/1540] avg loss 0.00297701, throughput 4.90357K wps
[Epoch 64 Batch 1110/1540] avg loss 0.0025822, throughput 5.10157K wps
[Epoch 64 Batch 1140/1540] avg loss 0.00295877, throughput 5.11046K wps
[Epoch 64 Batch 1170/1540] avg loss 0.00314107, throughput 5.26575K wps
[Epoch 64 Batch 1200/1540] avg loss 0.00263689, throughput 4.67161K wps
[Epoch 64 Batch 1230/1540] avg loss 0.0026486, throughput 4.63382K wps
[Epoch 64 Batch 1260/1540] avg loss 0.00273026, throughput 4.88671K wps
[Epoch 64 Batch 1290/1540] avg loss 0.00254339, throughput 4.95196K wps
[Epoch 64 Batch 1320/1540] avg loss 0.00243541, throughput 4.67388K wps
[Epoch 64 Batch 1350/1540] avg loss 0.00245947, throughput 4.93336K wps
[Epoch 64 Batch 1380/1540] avg loss 0.00271889, throughput 5.2259K wps
[Epoch 64 Batch 1410/1540] avg loss 0.00258418, throughput 4.88033K wps
[Epoch 64 Batch 1440/1540] avg loss 0.00323525, throughput 4.94783K wps
[Epoch 64 Batch 1470/1540] avg loss 0.00256228, throughput 4.90575K wps
[Epoch 64 Batch 1500/1540] avg loss 0.0028895, throughput 4.86877K wps
[Epoch 64 Batch 1530/1540] avg loss 0.00282839, throughput 4.79656K wps
Begin Testing...
[Epoch 64] train avg loss 0.0027698, dev acc 0.8440, dev avg loss 0.382202, throughput 4.96838K wps
[Epoch 65 Batch 30/1540] avg loss 0.002647, throughput 4.67489K wps
[Epoch 65 Batch 60/1540] avg loss 0.00252543, throughput 5.37246K wps
[Epoch 65 Batch 90/1540] avg loss 0.00263991, throughput 5.12129K wps
[Epoch 65 Batch 120/1540] avg loss 0.00270777, throughput 5.38779K wps
[Epoch 65 Batch 150/1540] avg loss 0.00268356, throughput 4.58565K wps
[Epoch 65 Batch 180/1540] avg loss 0.0024146, throughput 4.69281K wps
[Epoch 65 Batch 210/1540] avg loss 0.00295063, throughput 5.49012K wps
[Epoch 65 Batch 240/1540] avg loss 0.00301365, throughput 5.52134K wps
[Epoch 65 Batch 270/1540] avg loss 0.0028895, throughput 5.24742K wps
[Epoch 65 Batch 300/1540] avg loss 0.00293009, throughput 4.57911K wps
[Epoch 65 Batch 330/1540] avg loss 0.00271103, throughput 5.08731K wps
[Epoch 65 Batch 360/1540] avg loss 0.00276778, throughput 5.27911K wps
[Epoch 65 Batch 390/1540] avg loss 0.00261878, throughput 5.13102K wps
[Epoch 65 Batch 420/1540] avg loss 0.00243987, throughput 5.5963K wps
[Epoch 65 Batch 450/1540] avg loss 0.00272797, throughput 4.51278K wps
[Epoch 65 Batch 480/1540] avg loss 0.0026292, throughput 4.79278K wps
[Epoch 65 Batch 510/1540] avg loss 0.00286215, throughput 5.13519K wps
[Epoch 65 Batch 540/1540] avg loss 0.00261863, throughput 4.89848K wps
[Epoch 65 Batch 570/1540] avg loss 0.0026192, throughput 4.81114K wps
[Epoch 65 Batch 600/1540] avg loss 0.00274822, throughput 5.91956K wps
[Epoch 65 Batch 630/1540] avg loss 0.0025883, throughput 5.9972K wps
[Epoch 65 Batch 660/1540] avg loss 0.00284683, throughput 4.68252K wps
[Epoch 65 Batch 690/1540] avg loss 0.00276685, throughput 4.80079K wps
[Epoch 65 Batch 720/1540] avg loss 0.00257577, throughput 5.64077K wps
[Epoch 65 Batch 750/1540] avg loss 0.00249101, throughput 4.87424K wps
[Epoch 65 Batch 780/1540] avg loss 0.00256814, throughput 4.67243K wps
[Epoch 65 Batch 810/1540] avg loss 0.00292977, throughput 4.54889K wps
[Epoch 65 Batch 840/1540] avg loss 0.00252122, throughput 4.64872K wps
[Epoch 65 Batch 870/1540] avg loss 0.0026337, throughput 4.79182K wps
[Epoch 65 Batch 900/1540] avg loss 0.00283414, throughput 4.78714K wps
[Epoch 65 Batch 930/1540] avg loss 0.00275662, throughput 5.25164K wps
[Epoch 65 Batch 960/1540] avg loss 0.00235715, throughput 5.44723K wps
[Epoch 65 Batch 990/1540] avg loss 0.00254475, throughput 5.22619K wps
[Epoch 65 Batch 1020/1540] avg loss 0.00265348, throughput 4.76707K wps
[Epoch 65 Batch 1050/1540] avg loss 0.0028648, throughput 4.69983K wps
[Epoch 65 Batch 1080/1540] avg loss 0.00296549, throughput 4.45531K wps
[Epoch 65 Batch 1110/1540] avg loss 0.00252116, throughput 4.91352K wps
[Epoch 65 Batch 1140/1540] avg loss 0.00263067, throughput 4.80157K wps
[Epoch 65 Batch 1170/1540] avg loss 0.00273409, throughput 5.53194K wps
[Epoch 65 Batch 1200/1540] avg loss 0.00303827, throughput 5.16288K wps
[Epoch 65 Batch 1230/1540] avg loss 0.00284328, throughput 4.68141K wps
[Epoch 65 Batch 1260/1540] avg loss 0.00275848, throughput 4.75933K wps
[Epoch 65 Batch 1290/1540] avg loss 0.00244878, throughput 4.94971K wps
[Epoch 65 Batch 1320/1540] avg loss 0.00290243, throughput 4.9645K wps
[Epoch 65 Batch 1350/1540] avg loss 0.00299878, throughput 5.68587K wps
[Epoch 65 Batch 1380/1540] avg loss 0.00291753, throughput 4.78737K wps
[Epoch 65 Batch 1410/1540] avg loss 0.0026932, throughput 5.32918K wps
[Epoch 65 Batch 1440/1540] avg loss 0.00287091, throughput 4.98325K wps
[Epoch 65 Batch 1470/1540] avg loss 0.00293446, throughput 5.43958K wps
[Epoch 65 Batch 1500/1540] avg loss 0.00273597, throughput 5.36234K wps
[Epoch 65 Batch 1530/1540] avg loss 0.00246759, throughput 4.73238K wps
Begin Testing...
[Epoch 65] train avg loss 0.00272102, dev acc 0.8303, dev avg loss 0.414282, throughput 5.01611K wps
[Epoch 66 Batch 30/1540] avg loss 0.0029938, throughput 4.71132K wps
[Epoch 66 Batch 60/1540] avg loss 0.00285085, throughput 5.01706K wps
[Epoch 66 Batch 90/1540] avg loss 0.00255817, throughput 4.71695K wps
[Epoch 66 Batch 120/1540] avg loss 0.00244482, throughput 4.78023K wps
[Epoch 66 Batch 150/1540] avg loss 0.00272587, throughput 4.48239K wps
[Epoch 66 Batch 180/1540] avg loss 0.00247939, throughput 4.83722K wps
[Epoch 66 Batch 210/1540] avg loss 0.00260773, throughput 5.02969K wps
[Epoch 66 Batch 240/1540] avg loss 0.0027764, throughput 5.03143K wps
[Epoch 66 Batch 270/1540] avg loss 0.00260826, throughput 4.81786K wps
[Epoch 66 Batch 300/1540] avg loss 0.00283511, throughput 4.52195K wps
[Epoch 66 Batch 330/1540] avg loss 0.00266382, throughput 5.19016K wps
[Epoch 66 Batch 360/1540] avg loss 0.0023257, throughput 5.18739K wps
[Epoch 66 Batch 390/1540] avg loss 0.00274123, throughput 4.90208K wps
[Epoch 66 Batch 420/1540] avg loss 0.00276192, throughput 4.74697K wps
[Epoch 66 Batch 450/1540] avg loss 0.00270924, throughput 5.15151K wps
[Epoch 66 Batch 480/1540] avg loss 0.00273463, throughput 5.26802K wps
[Epoch 66 Batch 510/1540] avg loss 0.00291441, throughput 4.77604K wps
[Epoch 66 Batch 540/1540] avg loss 0.00265421, throughput 4.51518K wps
[Epoch 66 Batch 570/1540] avg loss 0.00301472, throughput 4.7308K wps
[Epoch 66 Batch 600/1540] avg loss 0.00254048, throughput 5.55259K wps
[Epoch 66 Batch 630/1540] avg loss 0.00306703, throughput 4.72648K wps
[Epoch 66 Batch 660/1540] avg loss 0.00236142, throughput 4.73719K wps
[Epoch 66 Batch 690/1540] avg loss 0.00261178, throughput 5.01141K wps
[Epoch 66 Batch 720/1540] avg loss 0.00301047, throughput 4.99455K wps
[Epoch 66 Batch 750/1540] avg loss 0.0027042, throughput 4.58601K wps
[Epoch 66 Batch 780/1540] avg loss 0.00261739, throughput 4.9891K wps
[Epoch 66 Batch 810/1540] avg loss 0.00283518, throughput 5.5484K wps
[Epoch 66 Batch 840/1540] avg loss 0.00291934, throughput 5.46456K wps
[Epoch 66 Batch 870/1540] avg loss 0.00247782, throughput 4.57269K wps
[Epoch 66 Batch 900/1540] avg loss 0.0029564, throughput 4.98791K wps
[Epoch 66 Batch 930/1540] avg loss 0.00299852, throughput 4.79379K wps
[Epoch 66 Batch 960/1540] avg loss 0.0026271, throughput 4.7527K wps
[Epoch 66 Batch 990/1540] avg loss 0.00277925, throughput 4.82913K wps
[Epoch 66 Batch 1020/1540] avg loss 0.00276607, throughput 4.69009K wps
[Epoch 66 Batch 1050/1540] avg loss 0.00300608, throughput 5.23531K wps
[Epoch 66 Batch 1080/1540] avg loss 0.00269682, throughput 4.49847K wps
[Epoch 66 Batch 1110/1540] avg loss 0.00274722, throughput 4.78337K wps
[Epoch 66 Batch 1140/1540] avg loss 0.0026502, throughput 4.83833K wps
[Epoch 66 Batch 1170/1540] avg loss 0.00290109, throughput 4.47089K wps
[Epoch 66 Batch 1200/1540] avg loss 0.0025768, throughput 4.92429K wps
[Epoch 66 Batch 1230/1540] avg loss 0.00229246, throughput 4.81489K wps
[Epoch 66 Batch 1260/1540] avg loss 0.00260205, throughput 4.97795K wps
[Epoch 66 Batch 1290/1540] avg loss 0.00248962, throughput 4.80955K wps
[Epoch 66 Batch 1320/1540] avg loss 0.00292835, throughput 4.99585K wps
[Epoch 66 Batch 1350/1540] avg loss 0.00273682, throughput 5.2903K wps
[Epoch 66 Batch 1380/1540] avg loss 0.00275198, throughput 5.14675K wps
[Epoch 66 Batch 1410/1540] avg loss 0.00279004, throughput 5.85455K wps
[Epoch 66 Batch 1440/1540] avg loss 0.00281823, throughput 5.11029K wps
[Epoch 66 Batch 1470/1540] avg loss 0.00275033, throughput 5.20362K wps
[Epoch 66 Batch 1500/1540] avg loss 0.00258441, throughput 4.98624K wps
[Epoch 66 Batch 1530/1540] avg loss 0.00272274, throughput 4.96323K wps
Begin Testing...
[Epoch 66] train avg loss 0.00272041, dev acc 0.8406, dev avg loss 0.392686, throughput 4.92092K wps
[Epoch 67 Batch 30/1540] avg loss 0.0027749, throughput 4.83924K wps
[Epoch 67 Batch 60/1540] avg loss 0.0028687, throughput 4.82797K wps
[Epoch 67 Batch 90/1540] avg loss 0.00273732, throughput 5.45674K wps
[Epoch 67 Batch 120/1540] avg loss 0.00262798, throughput 4.63601K wps
[Epoch 67 Batch 150/1540] avg loss 0.00250373, throughput 4.72732K wps
[Epoch 67 Batch 180/1540] avg loss 0.00282972, throughput 5.18985K wps
[Epoch 67 Batch 210/1540] avg loss 0.00249578, throughput 4.91027K wps
[Epoch 67 Batch 240/1540] avg loss 0.00262342, throughput 4.89999K wps
[Epoch 67 Batch 270/1540] avg loss 0.00253305, throughput 4.78122K wps
[Epoch 67 Batch 300/1540] avg loss 0.00313729, throughput 4.80552K wps
[Epoch 67 Batch 330/1540] avg loss 0.00278059, throughput 4.54085K wps
[Epoch 67 Batch 360/1540] avg loss 0.00298088, throughput 4.54158K wps
[Epoch 67 Batch 390/1540] avg loss 0.00244888, throughput 4.66243K wps
[Epoch 67 Batch 420/1540] avg loss 0.00252289, throughput 4.65709K wps
[Epoch 67 Batch 450/1540] avg loss 0.00249638, throughput 5.22542K wps
[Epoch 67 Batch 480/1540] avg loss 0.00271298, throughput 4.92705K wps
[Epoch 67 Batch 510/1540] avg loss 0.00248584, throughput 4.76079K wps
[Epoch 67 Batch 540/1540] avg loss 0.00266885, throughput 5.07522K wps
[Epoch 67 Batch 570/1540] avg loss 0.00267389, throughput 5.56771K wps
[Epoch 67 Batch 600/1540] avg loss 0.00250839, throughput 4.61132K wps
[Epoch 67 Batch 630/1540] avg loss 0.00280677, throughput 4.66272K wps
[Epoch 67 Batch 660/1540] avg loss 0.00258249, throughput 5.03148K wps
[Epoch 67 Batch 690/1540] avg loss 0.00255942, throughput 4.80535K wps
[Epoch 67 Batch 720/1540] avg loss 0.00251091, throughput 4.86404K wps
[Epoch 67 Batch 750/1540] avg loss 0.00273419, throughput 5.98012K wps
[Epoch 67 Batch 780/1540] avg loss 0.00267497, throughput 4.99998K wps
[Epoch 67 Batch 810/1540] avg loss 0.00248163, throughput 4.53904K wps
[Epoch 67 Batch 840/1540] avg loss 0.00261755, throughput 4.69509K wps
[Epoch 67 Batch 870/1540] avg loss 0.00286802, throughput 4.71777K wps
[Epoch 67 Batch 900/1540] avg loss 0.00309034, throughput 4.81919K wps
[Epoch 67 Batch 930/1540] avg loss 0.00261587, throughput 5.01936K wps
[Epoch 67 Batch 960/1540] avg loss 0.00273361, throughput 4.79311K wps
[Epoch 67 Batch 990/1540] avg loss 0.00260242, throughput 4.73906K wps
[Epoch 67 Batch 1020/1540] avg loss 0.00272677, throughput 5.41345K wps
[Epoch 67 Batch 1050/1540] avg loss 0.00276788, throughput 4.96045K wps
[Epoch 67 Batch 1080/1540] avg loss 0.00315854, throughput 4.90179K wps
[Epoch 67 Batch 1110/1540] avg loss 0.00283933, throughput 5.15563K wps
[Epoch 67 Batch 1140/1540] avg loss 0.00261495, throughput 4.73375K wps
[Epoch 67 Batch 1170/1540] avg loss 0.00233308, throughput 4.50861K wps
[Epoch 67 Batch 1200/1540] avg loss 0.00268759, throughput 5.41303K wps
[Epoch 67 Batch 1230/1540] avg loss 0.00288842, throughput 4.67305K wps
[Epoch 67 Batch 1260/1540] avg loss 0.00248955, throughput 4.77834K wps
[Epoch 67 Batch 1290/1540] avg loss 0.0026217, throughput 5.21143K wps
[Epoch 67 Batch 1320/1540] avg loss 0.00285194, throughput 4.79137K wps
[Epoch 67 Batch 1350/1540] avg loss 0.00278225, throughput 4.69969K wps
[Epoch 67 Batch 1380/1540] avg loss 0.00254756, throughput 4.4964K wps
[Epoch 67 Batch 1410/1540] avg loss 0.00247412, throughput 4.63201K wps
[Epoch 67 Batch 1440/1540] avg loss 0.00261427, throughput 4.95813K wps
[Epoch 67 Batch 1470/1540] avg loss 0.0025913, throughput 4.54581K wps
[Epoch 67 Batch 1500/1540] avg loss 0.00290951, throughput 5.03315K wps
[Epoch 67 Batch 1530/1540] avg loss 0.00268765, throughput 4.60705K wps
Begin Testing...
[Epoch 67] train avg loss 0.0026854, dev acc 0.8429, dev avg loss 0.384721, throughput 4.8629K wps
[Epoch 68 Batch 30/1540] avg loss 0.00260942, throughput 4.88671K wps
[Epoch 68 Batch 60/1540] avg loss 0.00255441, throughput 5.0439K wps
[Epoch 68 Batch 90/1540] avg loss 0.00270247, throughput 4.99054K wps
[Epoch 68 Batch 120/1540] avg loss 0.00280375, throughput 5.2237K wps
[Epoch 68 Batch 150/1540] avg loss 0.00249944, throughput 5.01305K wps
[Epoch 68 Batch 180/1540] avg loss 0.00225147, throughput 4.57889K wps
[Epoch 68 Batch 210/1540] avg loss 0.00266491, throughput 4.95837K wps
[Epoch 68 Batch 240/1540] avg loss 0.00236599, throughput 4.78849K wps
[Epoch 68 Batch 270/1540] avg loss 0.00253146, throughput 4.59709K wps
[Epoch 68 Batch 300/1540] avg loss 0.00249046, throughput 4.8223K wps
[Epoch 68 Batch 330/1540] avg loss 0.00280349, throughput 5.02214K wps
[Epoch 68 Batch 360/1540] avg loss 0.00259329, throughput 5.45895K wps
[Epoch 68 Batch 390/1540] avg loss 0.00250095, throughput 4.67411K wps
[Epoch 68 Batch 420/1540] avg loss 0.00292957, throughput 5.62787K wps
[Epoch 68 Batch 450/1540] avg loss 0.00273394, throughput 5.00678K wps
[Epoch 68 Batch 480/1540] avg loss 0.00299813, throughput 4.86777K wps
[Epoch 68 Batch 510/1540] avg loss 0.002717, throughput 4.51303K wps
[Epoch 68 Batch 540/1540] avg loss 0.00249303, throughput 4.95381K wps
[Epoch 68 Batch 570/1540] avg loss 0.00242509, throughput 5.45042K wps
[Epoch 68 Batch 600/1540] avg loss 0.00264412, throughput 4.92499K wps
[Epoch 68 Batch 630/1540] avg loss 0.00283387, throughput 4.88156K wps
[Epoch 68 Batch 660/1540] avg loss 0.00254603, throughput 4.96535K wps
[Epoch 68 Batch 690/1540] avg loss 0.00246204, throughput 5.78717K wps
[Epoch 68 Batch 720/1540] avg loss 0.00268227, throughput 4.95721K wps
[Epoch 68 Batch 750/1540] avg loss 0.00249681, throughput 4.66346K wps
[Epoch 68 Batch 780/1540] avg loss 0.00262484, throughput 4.57502K wps
[Epoch 68 Batch 810/1540] avg loss 0.00268055, throughput 4.63228K wps
[Epoch 68 Batch 840/1540] avg loss 0.00254115, throughput 4.52505K wps
[Epoch 68 Batch 870/1540] avg loss 0.0026973, throughput 4.87506K wps
[Epoch 68 Batch 900/1540] avg loss 0.00282082, throughput 4.86845K wps
[Epoch 68 Batch 930/1540] avg loss 0.00275988, throughput 5.10245K wps
[Epoch 68 Batch 960/1540] avg loss 0.00266245, throughput 4.73622K wps
[Epoch 68 Batch 990/1540] avg loss 0.0025714, throughput 4.88135K wps
[Epoch 68 Batch 1020/1540] avg loss 0.00311815, throughput 4.9578K wps
[Epoch 68 Batch 1050/1540] avg loss 0.00289901, throughput 4.64561K wps
[Epoch 68 Batch 1080/1540] avg loss 0.00294982, throughput 5.07583K wps
[Epoch 68 Batch 1110/1540] avg loss 0.00256427, throughput 5.08297K wps
[Epoch 68 Batch 1140/1540] avg loss 0.00274675, throughput 4.55339K wps
[Epoch 68 Batch 1170/1540] avg loss 0.00247451, throughput 4.73842K wps
[Epoch 68 Batch 1200/1540] avg loss 0.00286923, throughput 5.00722K wps
[Epoch 68 Batch 1230/1540] avg loss 0.00282292, throughput 4.71604K wps
[Epoch 68 Batch 1260/1540] avg loss 0.00236898, throughput 5.19595K wps
[Epoch 68 Batch 1290/1540] avg loss 0.00269378, throughput 5.09389K wps
[Epoch 68 Batch 1320/1540] avg loss 0.00264606, throughput 5.67467K wps
[Epoch 68 Batch 1350/1540] avg loss 0.00270897, throughput 5.51055K wps
[Epoch 68 Batch 1380/1540] avg loss 0.00293285, throughput 4.51686K wps
[Epoch 68 Batch 1410/1540] avg loss 0.00252797, throughput 5.01882K wps
[Epoch 68 Batch 1440/1540] avg loss 0.00260568, throughput 5.04261K wps
[Epoch 68 Batch 1470/1540] avg loss 0.00250552, throughput 5.44865K wps
[Epoch 68 Batch 1500/1540] avg loss 0.00251852, throughput 5.55597K wps
[Epoch 68 Batch 1530/1540] avg loss 0.00296157, throughput 5.47975K wps
Begin Testing...
[Epoch 68] train avg loss 0.00266528, dev acc 0.8417, dev avg loss 0.396764, throughput 4.96941K wps
[Epoch 69 Batch 30/1540] avg loss 0.00231979, throughput 4.61814K wps
[Epoch 69 Batch 60/1540] avg loss 0.00259668, throughput 5.3924K wps
[Epoch 69 Batch 90/1540] avg loss 0.00285451, throughput 5.60069K wps
[Epoch 69 Batch 120/1540] avg loss 0.00255389, throughput 4.77327K wps
[Epoch 69 Batch 150/1540] avg loss 0.0029128, throughput 4.63398K wps
[Epoch 69 Batch 180/1540] avg loss 0.0025971, throughput 4.67272K wps
[Epoch 69 Batch 210/1540] avg loss 0.00233838, throughput 4.95357K wps
[Epoch 69 Batch 240/1540] avg loss 0.00284897, throughput 5.08931K wps
[Epoch 69 Batch 270/1540] avg loss 0.00256011, throughput 4.7421K wps
[Epoch 69 Batch 300/1540] avg loss 0.0030029, throughput 5.15792K wps
[Epoch 69 Batch 330/1540] avg loss 0.00268371, throughput 4.82044K wps
[Epoch 69 Batch 360/1540] avg loss 0.00223317, throughput 4.59495K wps