Namespace(batch_size=50, data_name='MR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 56
Done! Tokenizing Time=0.23s, #Sentences=10662
SentimentNet(
  (embedding): Embedding(18768 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/173] avg loss 0.0139579, throughput 0.775187K wps
[Epoch 0 Batch 60/173] avg loss 0.0139497, throughput 2.82677K wps
[Epoch 0 Batch 90/173] avg loss 0.0139516, throughput 2.79663K wps
[Epoch 0 Batch 120/173] avg loss 0.013959, throughput 2.86521K wps
[Epoch 0 Batch 150/173] avg loss 0.0139825, throughput 2.86089K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139822, dev acc 0.5766, dev avg loss 0.68563, throughput 1.53828K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138453, throughput 2.92649K wps
[Epoch 1 Batch 60/173] avg loss 0.0136522, throughput 2.86374K wps
[Epoch 1 Batch 90/173] avg loss 0.0138347, throughput 2.82286K wps
[Epoch 1 Batch 120/173] avg loss 0.0137075, throughput 2.85821K wps
[Epoch 1 Batch 150/173] avg loss 0.0137196, throughput 2.8683K wps
Begin Testing...
[Epoch 1] train avg loss 0.0137434, dev acc 0.6017, dev avg loss 0.678158, throughput 2.86529K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0136441, throughput 2.93023K wps
[Epoch 2 Batch 60/173] avg loss 0.0136903, throughput 2.86557K wps
[Epoch 2 Batch 90/173] avg loss 0.0134569, throughput 2.87187K wps
[Epoch 2 Batch 120/173] avg loss 0.0136175, throughput 2.88452K wps
[Epoch 2 Batch 150/173] avg loss 0.0134026, throughput 2.79768K wps
Begin Testing...
[Epoch 2] train avg loss 0.0135667, dev acc 0.6851, dev avg loss 0.669489, throughput 2.85919K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0135236, throughput 2.87873K wps
[Epoch 3 Batch 60/173] avg loss 0.0134003, throughput 2.83293K wps
[Epoch 3 Batch 90/173] avg loss 0.013509, throughput 2.86958K wps
[Epoch 3 Batch 120/173] avg loss 0.0133581, throughput 2.85614K wps
[Epoch 3 Batch 150/173] avg loss 0.013296, throughput 2.8254K wps
Begin Testing...
[Epoch 3] train avg loss 0.0134165, dev acc 0.6528, dev avg loss 0.662731, throughput 2.8444K wps
[Epoch 4 Batch 30/173] avg loss 0.0132207, throughput 2.92377K wps
[Epoch 4 Batch 60/173] avg loss 0.0132522, throughput 2.84547K wps
[Epoch 4 Batch 90/173] avg loss 0.0132, throughput 2.82896K wps
[Epoch 4 Batch 120/173] avg loss 0.0132492, throughput 2.82404K wps
[Epoch 4 Batch 150/173] avg loss 0.0130591, throughput 2.82865K wps
Begin Testing...
[Epoch 4] train avg loss 0.0131956, dev acc 0.7143, dev avg loss 0.652531, throughput 2.85207K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0130295, throughput 2.91391K wps
[Epoch 5 Batch 60/173] avg loss 0.0130004, throughput 2.85303K wps
[Epoch 5 Batch 90/173] avg loss 0.0128434, throughput 2.8698K wps
[Epoch 5 Batch 120/173] avg loss 0.0129025, throughput 2.86608K wps
[Epoch 5 Batch 150/173] avg loss 0.0130365, throughput 2.87286K wps
Begin Testing...
[Epoch 5] train avg loss 0.012994, dev acc 0.7132, dev avg loss 0.643113, throughput 2.87004K wps
[Epoch 6 Batch 30/173] avg loss 0.0127041, throughput 2.9359K wps
[Epoch 6 Batch 60/173] avg loss 0.0129012, throughput 2.877K wps
[Epoch 6 Batch 90/173] avg loss 0.0129017, throughput 2.86865K wps
[Epoch 6 Batch 120/173] avg loss 0.0126143, throughput 2.86588K wps
[Epoch 6 Batch 150/173] avg loss 0.0126589, throughput 2.85078K wps
Begin Testing...
[Epoch 6] train avg loss 0.0127685, dev acc 0.7216, dev avg loss 0.632654, throughput 2.8731K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0125939, throughput 2.91413K wps
[Epoch 7 Batch 60/173] avg loss 0.0126749, throughput 2.86696K wps
[Epoch 7 Batch 90/173] avg loss 0.0124972, throughput 2.80098K wps
[Epoch 7 Batch 120/173] avg loss 0.0124409, throughput 2.86283K wps
[Epoch 7 Batch 150/173] avg loss 0.0123432, throughput 2.86357K wps
Begin Testing...
[Epoch 7] train avg loss 0.0125091, dev acc 0.7278, dev avg loss 0.621455, throughput 2.86138K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0123361, throughput 2.89561K wps
[Epoch 8 Batch 60/173] avg loss 0.0124205, throughput 2.86662K wps
[Epoch 8 Batch 90/173] avg loss 0.0121645, throughput 2.87259K wps
[Epoch 8 Batch 120/173] avg loss 0.0122201, throughput 2.83622K wps
[Epoch 8 Batch 150/173] avg loss 0.0122239, throughput 2.86804K wps
Begin Testing...
[Epoch 8] train avg loss 0.0122681, dev acc 0.7320, dev avg loss 0.609518, throughput 2.86804K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.0121213, throughput 2.89481K wps
[Epoch 9 Batch 60/173] avg loss 0.0119846, throughput 2.87649K wps
[Epoch 9 Batch 90/173] avg loss 0.0120368, throughput 2.87318K wps
[Epoch 9 Batch 120/173] avg loss 0.011874, throughput 2.87452K wps
[Epoch 9 Batch 150/173] avg loss 0.0119935, throughput 2.87517K wps
Begin Testing...
[Epoch 9] train avg loss 0.0119745, dev acc 0.7351, dev avg loss 0.596836, throughput 2.87784K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0118856, throughput 2.91016K wps
[Epoch 10 Batch 60/173] avg loss 0.0115843, throughput 2.87787K wps
[Epoch 10 Batch 90/173] avg loss 0.0119769, throughput 2.82393K wps
[Epoch 10 Batch 120/173] avg loss 0.0112688, throughput 2.8512K wps
[Epoch 10 Batch 150/173] avg loss 0.0116675, throughput 2.82774K wps
Begin Testing...
[Epoch 10] train avg loss 0.0117023, dev acc 0.7487, dev avg loss 0.584824, throughput 2.85766K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/173] avg loss 0.0116395, throughput 2.89911K wps
[Epoch 11 Batch 60/173] avg loss 0.0116509, throughput 2.8646K wps
[Epoch 11 Batch 90/173] avg loss 0.0113787, throughput 2.83115K wps
[Epoch 11 Batch 120/173] avg loss 0.0113592, throughput 2.87616K wps
[Epoch 11 Batch 150/173] avg loss 0.0113226, throughput 2.86526K wps
Begin Testing...
[Epoch 11] train avg loss 0.0114648, dev acc 0.7581, dev avg loss 0.571796, throughput 2.86534K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/173] avg loss 0.0111016, throughput 2.89097K wps
[Epoch 12 Batch 60/173] avg loss 0.0112272, throughput 2.85334K wps
[Epoch 12 Batch 90/173] avg loss 0.0111349, throughput 2.86495K wps
[Epoch 12 Batch 120/173] avg loss 0.0111343, throughput 2.87243K wps
[Epoch 12 Batch 150/173] avg loss 0.0107468, throughput 2.86733K wps
Begin Testing...
[Epoch 12] train avg loss 0.0110988, dev acc 0.7675, dev avg loss 0.558547, throughput 2.86364K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0108684, throughput 2.89689K wps
[Epoch 13 Batch 60/173] avg loss 0.0110105, throughput 2.84583K wps
[Epoch 13 Batch 90/173] avg loss 0.0107987, throughput 2.85266K wps
[Epoch 13 Batch 120/173] avg loss 0.0107656, throughput 2.77186K wps
[Epoch 13 Batch 150/173] avg loss 0.0107392, throughput 2.8092K wps
Begin Testing...
[Epoch 13] train avg loss 0.0107782, dev acc 0.7654, dev avg loss 0.546173, throughput 2.83825K wps
[Epoch 14 Batch 30/173] avg loss 0.0106559, throughput 2.94115K wps
[Epoch 14 Batch 60/173] avg loss 0.0106511, throughput 2.87687K wps
[Epoch 14 Batch 90/173] avg loss 0.0106943, throughput 2.86095K wps
[Epoch 14 Batch 120/173] avg loss 0.0102756, throughput 2.83357K wps
[Epoch 14 Batch 150/173] avg loss 0.0103533, throughput 2.83341K wps
Begin Testing...
[Epoch 14] train avg loss 0.0105264, dev acc 0.7810, dev avg loss 0.535192, throughput 2.86809K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.0100749, throughput 2.93018K wps
[Epoch 15 Batch 60/173] avg loss 0.0105166, throughput 2.85738K wps
[Epoch 15 Batch 90/173] avg loss 0.010336, throughput 2.84573K wps
[Epoch 15 Batch 120/173] avg loss 0.0100369, throughput 2.85769K wps
[Epoch 15 Batch 150/173] avg loss 0.0100121, throughput 2.85086K wps
Begin Testing...
[Epoch 15] train avg loss 0.0102335, dev acc 0.7810, dev avg loss 0.523968, throughput 2.85968K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0100882, throughput 2.93904K wps
[Epoch 16 Batch 60/173] avg loss 0.0100107, throughput 2.84402K wps
[Epoch 16 Batch 90/173] avg loss 0.0100586, throughput 2.79681K wps
[Epoch 16 Batch 120/173] avg loss 0.00999834, throughput 2.81068K wps
[Epoch 16 Batch 150/173] avg loss 0.00977912, throughput 2.84504K wps
Begin Testing...
[Epoch 16] train avg loss 0.00993622, dev acc 0.7842, dev avg loss 0.514234, throughput 2.85096K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/173] avg loss 0.00964738, throughput 2.91997K wps
[Epoch 17 Batch 60/173] avg loss 0.00978676, throughput 2.85933K wps
[Epoch 17 Batch 90/173] avg loss 0.00942658, throughput 2.84086K wps
[Epoch 17 Batch 120/173] avg loss 0.00997174, throughput 2.83996K wps
[Epoch 17 Batch 150/173] avg loss 0.00952252, throughput 2.87186K wps
Begin Testing...
[Epoch 17] train avg loss 0.00967697, dev acc 0.7894, dev avg loss 0.505312, throughput 2.86679K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0092929, throughput 2.93619K wps
[Epoch 18 Batch 60/173] avg loss 0.00942955, throughput 2.84165K wps
[Epoch 18 Batch 90/173] avg loss 0.00941714, throughput 2.84339K wps
[Epoch 18 Batch 120/173] avg loss 0.00928235, throughput 2.84424K wps
[Epoch 18 Batch 150/173] avg loss 0.0100256, throughput 2.86348K wps
Begin Testing...
[Epoch 18] train avg loss 0.00947791, dev acc 0.7862, dev avg loss 0.499483, throughput 2.85484K wps
[Epoch 19 Batch 30/173] avg loss 0.00939098, throughput 2.8627K wps
[Epoch 19 Batch 60/173] avg loss 0.00948293, throughput 2.85591K wps
[Epoch 19 Batch 90/173] avg loss 0.00947677, throughput 2.85617K wps
[Epoch 19 Batch 120/173] avg loss 0.00872992, throughput 2.83123K wps
[Epoch 19 Batch 150/173] avg loss 0.00884396, throughput 2.86921K wps
Begin Testing...
[Epoch 19] train avg loss 0.00924161, dev acc 0.7904, dev avg loss 0.491203, throughput 2.8561K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.00888177, throughput 2.89377K wps
[Epoch 20 Batch 60/173] avg loss 0.00904084, throughput 2.83103K wps
[Epoch 20 Batch 90/173] avg loss 0.00898185, throughput 2.83354K wps
[Epoch 20 Batch 120/173] avg loss 0.00888186, throughput 2.8286K wps
[Epoch 20 Batch 150/173] avg loss 0.00899989, throughput 2.84146K wps
Begin Testing...
[Epoch 20] train avg loss 0.00895518, dev acc 0.7925, dev avg loss 0.485469, throughput 2.84924K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/173] avg loss 0.00884987, throughput 2.8971K wps
[Epoch 21 Batch 60/173] avg loss 0.00909113, throughput 2.8478K wps
[Epoch 21 Batch 90/173] avg loss 0.00874782, throughput 2.82916K wps
[Epoch 21 Batch 120/173] avg loss 0.00854453, throughput 2.87075K wps
[Epoch 21 Batch 150/173] avg loss 0.00841091, throughput 2.84773K wps
Begin Testing...
[Epoch 21] train avg loss 0.00877007, dev acc 0.7946, dev avg loss 0.479655, throughput 2.85948K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.00813608, throughput 2.88921K wps
[Epoch 22 Batch 60/173] avg loss 0.00878179, throughput 2.8354K wps
[Epoch 22 Batch 90/173] avg loss 0.00859688, throughput 2.83181K wps
[Epoch 22 Batch 120/173] avg loss 0.00838878, throughput 2.86205K wps
[Epoch 22 Batch 150/173] avg loss 0.00849658, throughput 2.87608K wps
Begin Testing...
[Epoch 22] train avg loss 0.0085324, dev acc 0.7925, dev avg loss 0.474702, throughput 2.86206K wps
[Epoch 23 Batch 30/173] avg loss 0.00813982, throughput 2.93005K wps
[Epoch 23 Batch 60/173] avg loss 0.00859668, throughput 2.8281K wps
[Epoch 23 Batch 90/173] avg loss 0.00832039, throughput 2.87448K wps
[Epoch 23 Batch 120/173] avg loss 0.00817002, throughput 2.86379K wps
[Epoch 23 Batch 150/173] avg loss 0.00816335, throughput 2.84801K wps
Begin Testing...
[Epoch 23] train avg loss 0.00834754, dev acc 0.7904, dev avg loss 0.471536, throughput 2.86903K wps
[Epoch 24 Batch 30/173] avg loss 0.00779305, throughput 2.94059K wps
[Epoch 24 Batch 60/173] avg loss 0.00817371, throughput 2.87525K wps
[Epoch 24 Batch 90/173] avg loss 0.00817022, throughput 2.8596K wps
[Epoch 24 Batch 120/173] avg loss 0.00815438, throughput 2.84022K wps
[Epoch 24 Batch 150/173] avg loss 0.00801406, throughput 2.87349K wps
Begin Testing...
[Epoch 24] train avg loss 0.00808959, dev acc 0.7977, dev avg loss 0.465966, throughput 2.87601K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.00802523, throughput 2.91566K wps
[Epoch 25 Batch 60/173] avg loss 0.0078761, throughput 2.84926K wps
[Epoch 25 Batch 90/173] avg loss 0.00793728, throughput 2.87126K wps
[Epoch 25 Batch 120/173] avg loss 0.00784647, throughput 2.8553K wps
[Epoch 25 Batch 150/173] avg loss 0.00796994, throughput 2.86478K wps
Begin Testing...
[Epoch 25] train avg loss 0.00793361, dev acc 0.8008, dev avg loss 0.462357, throughput 2.87168K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00800508, throughput 2.9266K wps
[Epoch 26 Batch 60/173] avg loss 0.00769446, throughput 2.87059K wps
[Epoch 26 Batch 90/173] avg loss 0.00766823, throughput 2.85864K wps
[Epoch 26 Batch 120/173] avg loss 0.00772206, throughput 2.84201K wps
[Epoch 26 Batch 150/173] avg loss 0.00788491, throughput 2.83918K wps
Begin Testing...
[Epoch 26] train avg loss 0.0077801, dev acc 0.7904, dev avg loss 0.46044, throughput 2.86903K wps
[Epoch 27 Batch 30/173] avg loss 0.0076951, throughput 2.9158K wps
[Epoch 27 Batch 60/173] avg loss 0.00751856, throughput 2.8634K wps
[Epoch 27 Batch 90/173] avg loss 0.00762395, throughput 2.84662K wps
[Epoch 27 Batch 120/173] avg loss 0.00742273, throughput 2.83226K wps
[Epoch 27 Batch 150/173] avg loss 0.00747175, throughput 2.85139K wps
Begin Testing...
[Epoch 27] train avg loss 0.00756798, dev acc 0.7998, dev avg loss 0.456346, throughput 2.86332K wps
[Epoch 28 Batch 30/173] avg loss 0.00737909, throughput 2.92456K wps
[Epoch 28 Batch 60/173] avg loss 0.00721589, throughput 2.86755K wps
[Epoch 28 Batch 90/173] avg loss 0.00729782, throughput 2.83194K wps
[Epoch 28 Batch 120/173] avg loss 0.0076884, throughput 2.85355K wps
[Epoch 28 Batch 150/173] avg loss 0.00709606, throughput 2.86128K wps
Begin Testing...
[Epoch 28] train avg loss 0.00734226, dev acc 0.7977, dev avg loss 0.45341, throughput 2.86678K wps
[Epoch 29 Batch 30/173] avg loss 0.00713731, throughput 2.89336K wps
[Epoch 29 Batch 60/173] avg loss 0.00734807, throughput 2.86392K wps
[Epoch 29 Batch 90/173] avg loss 0.00702345, throughput 2.86637K wps
[Epoch 29 Batch 120/173] avg loss 0.00756101, throughput 2.87362K wps
[Epoch 29 Batch 150/173] avg loss 0.00728527, throughput 2.86855K wps
Begin Testing...
[Epoch 29] train avg loss 0.00724488, dev acc 0.7967, dev avg loss 0.454811, throughput 2.86865K wps
[Epoch 30 Batch 30/173] avg loss 0.00689731, throughput 2.88669K wps
[Epoch 30 Batch 60/173] avg loss 0.00721402, throughput 2.83334K wps
[Epoch 30 Batch 90/173] avg loss 0.00697884, throughput 2.86386K wps
[Epoch 30 Batch 120/173] avg loss 0.00707913, throughput 2.85509K wps
[Epoch 30 Batch 150/173] avg loss 0.0069871, throughput 2.86642K wps
Begin Testing...
[Epoch 30] train avg loss 0.00704097, dev acc 0.7987, dev avg loss 0.452, throughput 2.86205K wps
[Epoch 31 Batch 30/173] avg loss 0.00662127, throughput 2.90521K wps
[Epoch 31 Batch 60/173] avg loss 0.00687984, throughput 2.85475K wps
[Epoch 31 Batch 90/173] avg loss 0.00643069, throughput 2.86057K wps
[Epoch 31 Batch 120/173] avg loss 0.00709305, throughput 2.86279K wps
[Epoch 31 Batch 150/173] avg loss 0.00700779, throughput 2.87163K wps
Begin Testing...
[Epoch 31] train avg loss 0.00681777, dev acc 0.8008, dev avg loss 0.44824, throughput 2.87K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.00651923, throughput 2.92671K wps
[Epoch 32 Batch 60/173] avg loss 0.00684465, throughput 2.85029K wps
[Epoch 32 Batch 90/173] avg loss 0.00646138, throughput 2.85209K wps
[Epoch 32 Batch 120/173] avg loss 0.00674017, throughput 2.79987K wps
[Epoch 32 Batch 150/173] avg loss 0.00658441, throughput 2.84975K wps
Begin Testing...
[Epoch 32] train avg loss 0.00666735, dev acc 0.7998, dev avg loss 0.446059, throughput 2.85326K wps
[Epoch 33 Batch 30/173] avg loss 0.00658084, throughput 2.8942K wps
[Epoch 33 Batch 60/173] avg loss 0.00635894, throughput 2.8614K wps
[Epoch 33 Batch 90/173] avg loss 0.00644148, throughput 2.85377K wps
[Epoch 33 Batch 120/173] avg loss 0.00668427, throughput 2.84291K wps
[Epoch 33 Batch 150/173] avg loss 0.00672162, throughput 2.85362K wps
Begin Testing...
[Epoch 33] train avg loss 0.00657439, dev acc 0.8040, dev avg loss 0.445867, throughput 2.86195K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.00648728, throughput 2.8914K wps
[Epoch 34 Batch 60/173] avg loss 0.00604177, throughput 2.84067K wps
[Epoch 34 Batch 90/173] avg loss 0.0066979, throughput 2.85708K wps
[Epoch 34 Batch 120/173] avg loss 0.00611664, throughput 2.83164K wps
[Epoch 34 Batch 150/173] avg loss 0.00620072, throughput 2.86377K wps
Begin Testing...
[Epoch 34] train avg loss 0.00637984, dev acc 0.7956, dev avg loss 0.446416, throughput 2.85319K wps
[Epoch 35 Batch 30/173] avg loss 0.00651416, throughput 2.94005K wps
[Epoch 35 Batch 60/173] avg loss 0.00625306, throughput 2.86508K wps
[Epoch 35 Batch 90/173] avg loss 0.00577588, throughput 2.8593K wps
[Epoch 35 Batch 120/173] avg loss 0.00585338, throughput 2.82597K wps
[Epoch 35 Batch 150/173] avg loss 0.00605659, throughput 2.87173K wps
Begin Testing...
[Epoch 35] train avg loss 0.00614833, dev acc 0.7977, dev avg loss 0.441489, throughput 2.87201K wps
[Epoch 36 Batch 30/173] avg loss 0.00630508, throughput 2.92391K wps
[Epoch 36 Batch 60/173] avg loss 0.00599736, throughput 2.86483K wps
[Epoch 36 Batch 90/173] avg loss 0.0061263, throughput 2.83081K wps
[Epoch 36 Batch 120/173] avg loss 0.00611446, throughput 2.85221K wps
[Epoch 36 Batch 150/173] avg loss 0.0058216, throughput 2.87237K wps
Begin Testing...
[Epoch 36] train avg loss 0.00606353, dev acc 0.7883, dev avg loss 0.443594, throughput 2.86302K wps
[Epoch 37 Batch 30/173] avg loss 0.00543244, throughput 2.88622K wps
[Epoch 37 Batch 60/173] avg loss 0.00615511, throughput 2.87366K wps
[Epoch 37 Batch 90/173] avg loss 0.00573907, throughput 2.81242K wps
[Epoch 37 Batch 120/173] avg loss 0.00608355, throughput 2.86578K wps
[Epoch 37 Batch 150/173] avg loss 0.00584534, throughput 2.87443K wps
Begin Testing...
[Epoch 37] train avg loss 0.00588122, dev acc 0.7967, dev avg loss 0.439807, throughput 2.86086K wps
[Epoch 38 Batch 30/173] avg loss 0.00566849, throughput 2.89683K wps
[Epoch 38 Batch 60/173] avg loss 0.00581882, throughput 2.82211K wps
[Epoch 38 Batch 90/173] avg loss 0.00567395, throughput 2.81444K wps
[Epoch 38 Batch 120/173] avg loss 0.00591446, throughput 2.87146K wps
[Epoch 38 Batch 150/173] avg loss 0.00566934, throughput 2.86938K wps
Begin Testing...
[Epoch 38] train avg loss 0.00577887, dev acc 0.7987, dev avg loss 0.439525, throughput 2.85655K wps
[Epoch 39 Batch 30/173] avg loss 0.00569794, throughput 2.92006K wps
[Epoch 39 Batch 60/173] avg loss 0.00552663, throughput 2.87597K wps
[Epoch 39 Batch 90/173] avg loss 0.00555533, throughput 2.86331K wps
[Epoch 39 Batch 120/173] avg loss 0.00558547, throughput 2.84576K wps
[Epoch 39 Batch 150/173] avg loss 0.00536572, throughput 2.86919K wps
Begin Testing...
[Epoch 39] train avg loss 0.00559184, dev acc 0.7998, dev avg loss 0.437946, throughput 2.87146K wps
[Epoch 40 Batch 30/173] avg loss 0.00522547, throughput 2.87504K wps
[Epoch 40 Batch 60/173] avg loss 0.00573621, throughput 2.84237K wps
[Epoch 40 Batch 90/173] avg loss 0.00560357, throughput 2.80738K wps
[Epoch 40 Batch 120/173] avg loss 0.00535402, throughput 2.87K wps
[Epoch 40 Batch 150/173] avg loss 0.0056179, throughput 2.86402K wps
Begin Testing...
[Epoch 40] train avg loss 0.00551809, dev acc 0.7998, dev avg loss 0.437586, throughput 2.85229K wps
[Epoch 41 Batch 30/173] avg loss 0.00526998, throughput 2.90836K wps
[Epoch 41 Batch 60/173] avg loss 0.00534353, throughput 2.85462K wps
[Epoch 41 Batch 90/173] avg loss 0.00502323, throughput 2.82126K wps
[Epoch 41 Batch 120/173] avg loss 0.00546484, throughput 2.87161K wps
[Epoch 41 Batch 150/173] avg loss 0.00510226, throughput 2.8737K wps
Begin Testing...
[Epoch 41] train avg loss 0.00527863, dev acc 0.7904, dev avg loss 0.438399, throughput 2.86727K wps
[Epoch 42 Batch 30/173] avg loss 0.00515829, throughput 2.87158K wps
[Epoch 42 Batch 60/173] avg loss 0.00502215, throughput 2.86641K wps
[Epoch 42 Batch 90/173] avg loss 0.00509818, throughput 2.87319K wps
[Epoch 42 Batch 120/173] avg loss 0.00514846, throughput 2.84282K wps
[Epoch 42 Batch 150/173] avg loss 0.00523404, throughput 2.82876K wps
Begin Testing...
[Epoch 42] train avg loss 0.00512112, dev acc 0.7998, dev avg loss 0.436834, throughput 2.84883K wps
[Epoch 43 Batch 30/173] avg loss 0.00505229, throughput 2.88048K wps
[Epoch 43 Batch 60/173] avg loss 0.00501642, throughput 2.85171K wps
[Epoch 43 Batch 90/173] avg loss 0.00504847, throughput 2.87928K wps
[Epoch 43 Batch 120/173] avg loss 0.00527654, throughput 2.86364K wps
[Epoch 43 Batch 150/173] avg loss 0.00483905, throughput 2.85193K wps
Begin Testing...
[Epoch 43] train avg loss 0.0050543, dev acc 0.8029, dev avg loss 0.437482, throughput 2.85759K wps
[Epoch 44 Batch 30/173] avg loss 0.00483055, throughput 2.92495K wps
[Epoch 44 Batch 60/173] avg loss 0.00487134, throughput 2.83532K wps
[Epoch 44 Batch 90/173] avg loss 0.00474775, throughput 2.86394K wps
[Epoch 44 Batch 120/173] avg loss 0.00474959, throughput 2.87428K wps
[Epoch 44 Batch 150/173] avg loss 0.00489092, throughput 2.87429K wps
Begin Testing...
[Epoch 44] train avg loss 0.00485472, dev acc 0.7956, dev avg loss 0.441347, throughput 2.87348K wps
[Epoch 45 Batch 30/173] avg loss 0.00467103, throughput 2.91768K wps
[Epoch 45 Batch 60/173] avg loss 0.00491471, throughput 2.86128K wps
[Epoch 45 Batch 90/173] avg loss 0.00475908, throughput 2.86224K wps
[Epoch 45 Batch 120/173] avg loss 0.00473145, throughput 2.83214K wps
[Epoch 45 Batch 150/173] avg loss 0.00481158, throughput 2.86614K wps
Begin Testing...
[Epoch 45] train avg loss 0.00479002, dev acc 0.7977, dev avg loss 0.438254, throughput 2.86815K wps
[Epoch 46 Batch 30/173] avg loss 0.00452115, throughput 2.87175K wps
[Epoch 46 Batch 60/173] avg loss 0.00470007, throughput 2.8554K wps
[Epoch 46 Batch 90/173] avg loss 0.00462208, throughput 2.86369K wps
[Epoch 46 Batch 120/173] avg loss 0.00429355, throughput 2.86255K wps
[Epoch 46 Batch 150/173] avg loss 0.00448795, throughput 2.85412K wps
Begin Testing...
[Epoch 46] train avg loss 0.00458037, dev acc 0.7987, dev avg loss 0.43777, throughput 2.85628K wps
[Epoch 47 Batch 30/173] avg loss 0.0042938, throughput 2.92122K wps
[Epoch 47 Batch 60/173] avg loss 0.00438461, throughput 2.86764K wps
[Epoch 47 Batch 90/173] avg loss 0.00472868, throughput 2.85129K wps
[Epoch 47 Batch 120/173] avg loss 0.00438256, throughput 2.87117K wps
[Epoch 47 Batch 150/173] avg loss 0.00465709, throughput 2.85232K wps
Begin Testing...
[Epoch 47] train avg loss 0.00450578, dev acc 0.7935, dev avg loss 0.437675, throughput 2.86993K wps
[Epoch 48 Batch 30/173] avg loss 0.00456714, throughput 2.9294K wps
[Epoch 48 Batch 60/173] avg loss 0.00429984, throughput 2.87175K wps
[Epoch 48 Batch 90/173] avg loss 0.00432246, throughput 2.86139K wps
[Epoch 48 Batch 120/173] avg loss 0.00439133, throughput 2.85604K wps
[Epoch 48 Batch 150/173] avg loss 0.00437879, throughput 2.86466K wps
Begin Testing...
[Epoch 48] train avg loss 0.00436266, dev acc 0.7967, dev avg loss 0.440912, throughput 2.86707K wps
[Epoch 49 Batch 30/173] avg loss 0.00449852, throughput 2.92963K wps
[Epoch 49 Batch 60/173] avg loss 0.00440522, throughput 2.85525K wps
[Epoch 49 Batch 90/173] avg loss 0.00391713, throughput 2.85764K wps
[Epoch 49 Batch 120/173] avg loss 0.00411806, throughput 2.8812K wps
[Epoch 49 Batch 150/173] avg loss 0.00443942, throughput 2.87162K wps
Begin Testing...
[Epoch 49] train avg loss 0.00429917, dev acc 0.7998, dev avg loss 0.440395, throughput 2.87678K wps
[Epoch 50 Batch 30/173] avg loss 0.00398392, throughput 2.91702K wps
[Epoch 50 Batch 60/173] avg loss 0.0041386, throughput 2.82729K wps
[Epoch 50 Batch 90/173] avg loss 0.00415032, throughput 2.81928K wps
[Epoch 50 Batch 120/173] avg loss 0.00434237, throughput 2.83752K wps
[Epoch 50 Batch 150/173] avg loss 0.00419761, throughput 2.85956K wps
Begin Testing...
[Epoch 50] train avg loss 0.00414787, dev acc 0.7967, dev avg loss 0.442114, throughput 2.85083K wps
[Epoch 51 Batch 30/173] avg loss 0.00391652, throughput 2.93539K wps
[Epoch 51 Batch 60/173] avg loss 0.00394674, throughput 2.86397K wps
[Epoch 51 Batch 90/173] avg loss 0.00411792, throughput 2.80883K wps
[Epoch 51 Batch 120/173] avg loss 0.00391717, throughput 2.86912K wps
[Epoch 51 Batch 150/173] avg loss 0.00422231, throughput 2.86807K wps
Begin Testing...
[Epoch 51] train avg loss 0.00405509, dev acc 0.7977, dev avg loss 0.439148, throughput 2.86413K wps
[Epoch 52 Batch 30/173] avg loss 0.00418431, throughput 2.88674K wps
[Epoch 52 Batch 60/173] avg loss 0.00400698, throughput 2.82887K wps
[Epoch 52 Batch 90/173] avg loss 0.00368817, throughput 2.87088K wps
[Epoch 52 Batch 120/173] avg loss 0.00393081, throughput 2.86643K wps
[Epoch 52 Batch 150/173] avg loss 0.00398974, throughput 2.86911K wps
Begin Testing...
[Epoch 52] train avg loss 0.00394933, dev acc 0.7956, dev avg loss 0.439128, throughput 2.86365K wps
[Epoch 53 Batch 30/173] avg loss 0.00391949, throughput 2.9389K wps
[Epoch 53 Batch 60/173] avg loss 0.00383747, throughput 2.86711K wps
[Epoch 53 Batch 90/173] avg loss 0.00392347, throughput 2.83474K wps
[Epoch 53 Batch 120/173] avg loss 0.0037524, throughput 2.81641K wps
[Epoch 53 Batch 150/173] avg loss 0.00367158, throughput 2.8669K wps
Begin Testing...
[Epoch 53] train avg loss 0.0038741, dev acc 0.7914, dev avg loss 0.439999, throughput 2.86207K wps
[Epoch 54 Batch 30/173] avg loss 0.00363038, throughput 2.90108K wps
[Epoch 54 Batch 60/173] avg loss 0.0037692, throughput 2.82455K wps
[Epoch 54 Batch 90/173] avg loss 0.0038343, throughput 2.82149K wps
[Epoch 54 Batch 120/173] avg loss 0.00359127, throughput 2.79871K wps
[Epoch 54 Batch 150/173] avg loss 0.00347558, throughput 2.86364K wps
Begin Testing...
[Epoch 54] train avg loss 0.00365988, dev acc 0.8008, dev avg loss 0.442988, throughput 2.84378K wps
[Epoch 55 Batch 30/173] avg loss 0.00356731, throughput 2.91159K wps
[Epoch 55 Batch 60/173] avg loss 0.00356107, throughput 2.86864K wps
[Epoch 55 Batch 90/173] avg loss 0.00397273, throughput 2.86443K wps
[Epoch 55 Batch 120/173] avg loss 0.00363367, throughput 2.86504K wps
[Epoch 55 Batch 150/173] avg loss 0.00358662, throughput 2.86649K wps
Begin Testing...
[Epoch 55] train avg loss 0.00365731, dev acc 0.8019, dev avg loss 0.443671, throughput 2.87336K wps
[Epoch 56 Batch 30/173] avg loss 0.00367405, throughput 2.90767K wps
[Epoch 56 Batch 60/173] avg loss 0.0034177, throughput 2.8659K wps
[Epoch 56 Batch 90/173] avg loss 0.00365295, throughput 2.84866K wps
[Epoch 56 Batch 120/173] avg loss 0.00324299, throughput 2.83557K wps
[Epoch 56 Batch 150/173] avg loss 0.00347492, throughput 2.83834K wps
Begin Testing...
[Epoch 56] train avg loss 0.0035189, dev acc 0.7935, dev avg loss 0.443297, throughput 2.85802K wps
[Epoch 57 Batch 30/173] avg loss 0.00339669, throughput 2.94507K wps
[Epoch 57 Batch 60/173] avg loss 0.00327915, throughput 2.87661K wps
[Epoch 57 Batch 90/173] avg loss 0.00328999, throughput 2.86416K wps
[Epoch 57 Batch 120/173] avg loss 0.00344333, throughput 2.87042K wps
[Epoch 57 Batch 150/173] avg loss 0.00343901, throughput 2.87474K wps
Begin Testing...
[Epoch 57] train avg loss 0.00338778, dev acc 0.8040, dev avg loss 0.447623, throughput 2.88274K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/173] avg loss 0.00325422, throughput 2.9234K wps
[Epoch 58 Batch 60/173] avg loss 0.00324497, throughput 2.84351K wps
[Epoch 58 Batch 90/173] avg loss 0.00337107, throughput 2.83626K wps
[Epoch 58 Batch 120/173] avg loss 0.00330752, throughput 2.82496K wps
[Epoch 58 Batch 150/173] avg loss 0.00354129, throughput 2.85414K wps
Begin Testing...
[Epoch 58] train avg loss 0.00334229, dev acc 0.7987, dev avg loss 0.44459, throughput 2.85794K wps
[Epoch 59 Batch 30/173] avg loss 0.00327004, throughput 2.91497K wps
[Epoch 59 Batch 60/173] avg loss 0.00318557, throughput 2.85362K wps
[Epoch 59 Batch 90/173] avg loss 0.00318478, throughput 2.85456K wps
[Epoch 59 Batch 120/173] avg loss 0.00318252, throughput 2.86151K wps
[Epoch 59 Batch 150/173] avg loss 0.00311162, throughput 2.86518K wps
Begin Testing...
[Epoch 59] train avg loss 0.00322075, dev acc 0.7894, dev avg loss 0.445479, throughput 2.86693K wps
[Epoch 60 Batch 30/173] avg loss 0.00318319, throughput 2.92339K wps
[Epoch 60 Batch 60/173] avg loss 0.00321265, throughput 2.868K wps
[Epoch 60 Batch 90/173] avg loss 0.00305945, throughput 2.86872K wps
[Epoch 60 Batch 120/173] avg loss 0.00326188, throughput 2.86777K wps
[Epoch 60 Batch 150/173] avg loss 0.00297453, throughput 2.86512K wps
Begin Testing...
[Epoch 60] train avg loss 0.00314196, dev acc 0.7862, dev avg loss 0.44878, throughput 2.87695K wps
[Epoch 61 Batch 30/173] avg loss 0.00321478, throughput 2.93253K wps
[Epoch 61 Batch 60/173] avg loss 0.00321225, throughput 2.86736K wps
[Epoch 61 Batch 90/173] avg loss 0.00312554, throughput 2.86596K wps
[Epoch 61 Batch 120/173] avg loss 0.00292399, throughput 2.84024K wps
[Epoch 61 Batch 150/173] avg loss 0.00317514, throughput 2.86633K wps
Begin Testing...
[Epoch 61] train avg loss 0.00310567, dev acc 0.7935, dev avg loss 0.451473, throughput 2.86065K wps
[Epoch 62 Batch 30/173] avg loss 0.00301433, throughput 2.93199K wps
[Epoch 62 Batch 60/173] avg loss 0.00309146, throughput 2.83512K wps
[Epoch 62 Batch 90/173] avg loss 0.00289071, throughput 2.84078K wps
[Epoch 62 Batch 120/173] avg loss 0.00285767, throughput 2.86172K wps
[Epoch 62 Batch 150/173] avg loss 0.00321076, throughput 2.87283K wps
Begin Testing...
[Epoch 62] train avg loss 0.00302245, dev acc 0.7873, dev avg loss 0.450576, throughput 2.86844K wps
[Epoch 63 Batch 30/173] avg loss 0.00287151, throughput 2.93581K wps
[Epoch 63 Batch 60/173] avg loss 0.00289089, throughput 2.8605K wps
[Epoch 63 Batch 90/173] avg loss 0.00286494, throughput 2.845K wps
[Epoch 63 Batch 120/173] avg loss 0.00290285, throughput 2.82794K wps
[Epoch 63 Batch 150/173] avg loss 0.00295265, throughput 2.86869K wps
Begin Testing...
[Epoch 63] train avg loss 0.00289795, dev acc 0.7925, dev avg loss 0.453572, throughput 2.8643K wps
[Epoch 64 Batch 30/173] avg loss 0.00256945, throughput 2.89938K wps
[Epoch 64 Batch 60/173] avg loss 0.00262645, throughput 2.80641K wps
[Epoch 64 Batch 90/173] avg loss 0.00284007, throughput 2.80128K wps
[Epoch 64 Batch 120/173] avg loss 0.00282935, throughput 2.82888K wps
[Epoch 64 Batch 150/173] avg loss 0.00288833, throughput 2.84113K wps
Begin Testing...
[Epoch 64] train avg loss 0.00274948, dev acc 0.7904, dev avg loss 0.454964, throughput 2.83354K wps
[Epoch 65 Batch 30/173] avg loss 0.00289196, throughput 2.93597K wps
[Epoch 65 Batch 60/173] avg loss 0.00269144, throughput 2.87084K wps
[Epoch 65 Batch 90/173] avg loss 0.00276344, throughput 2.83675K wps
[Epoch 65 Batch 120/173] avg loss 0.00274274, throughput 2.86506K wps
[Epoch 65 Batch 150/173] avg loss 0.00273271, throughput 2.86716K wps
Begin Testing...
[Epoch 65] train avg loss 0.00274242, dev acc 0.7967, dev avg loss 0.460867, throughput 2.87255K wps
[Epoch 66 Batch 30/173] avg loss 0.00259328, throughput 2.92217K wps
[Epoch 66 Batch 60/173] avg loss 0.00269522, throughput 2.85111K wps
[Epoch 66 Batch 90/173] avg loss 0.00279817, throughput 2.85551K wps
[Epoch 66 Batch 120/173] avg loss 0.00247608, throughput 2.83687K wps
[Epoch 66 Batch 150/173] avg loss 0.00269338, throughput 2.83317K wps
Begin Testing...
[Epoch 66] train avg loss 0.00266933, dev acc 0.7977, dev avg loss 0.461881, throughput 2.86059K wps
[Epoch 67 Batch 30/173] avg loss 0.00253077, throughput 2.90532K wps
[Epoch 67 Batch 60/173] avg loss 0.00240458, throughput 2.84287K wps
[Epoch 67 Batch 90/173] avg loss 0.00255149, throughput 2.85748K wps
[Epoch 67 Batch 120/173] avg loss 0.00237614, throughput 2.8701K wps
[Epoch 67 Batch 150/173] avg loss 0.00263928, throughput 2.86581K wps
Begin Testing...
[Epoch 67] train avg loss 0.00251239, dev acc 0.7977, dev avg loss 0.459155, throughput 2.86453K wps
[Epoch 68 Batch 30/173] avg loss 0.00243584, throughput 2.86115K wps
[Epoch 68 Batch 60/173] avg loss 0.00228915, throughput 2.86056K wps
[Epoch 68 Batch 90/173] avg loss 0.00242577, throughput 2.86405K wps
[Epoch 68 Batch 120/173] avg loss 0.00265608, throughput 2.85206K wps
[Epoch 68 Batch 150/173] avg loss 0.00239599, throughput 2.83376K wps
Begin Testing...
[Epoch 68] train avg loss 0.00248001, dev acc 0.7914, dev avg loss 0.460282, throughput 2.85486K wps
[Epoch 69 Batch 30/173] avg loss 0.00239053, throughput 2.93489K wps
[Epoch 69 Batch 60/173] avg loss 0.00234478, throughput 2.86515K wps
[Epoch 69 Batch 90/173] avg loss 0.00237848, throughput 2.8638K wps
[Epoch 69 Batch 120/173] avg loss 0.00241873, throughput 2.85481K wps
[Epoch 69 Batch 150/173] avg loss 0.00250595, throughput 2.79588K wps
Begin Testing...
[Epoch 69] train avg loss 0.00242791, dev acc 0.7914, dev avg loss 0.463434, throughput 2.8622K wps
[Epoch 70 Batch 30/173] avg loss 0.00245523, throughput 2.88217K wps
[Epoch 70 Batch 60/173] avg loss 0.00221807, throughput 2.86782K wps
[Epoch 70 Batch 90/173] avg loss 0.00231383, throughput 2.86763K wps
[Epoch 70 Batch 120/173] avg loss 0.00240003, throughput 2.85901K wps
[Epoch 70 Batch 150/173] avg loss 0.00239474, throughput 2.86277K wps
Begin Testing...
[Epoch 70] train avg loss 0.00237526, dev acc 0.7894, dev avg loss 0.465338, throughput 2.86697K wps
[Epoch 71 Batch 30/173] avg loss 0.00217239, throughput 2.89487K wps
[Epoch 71 Batch 60/173] avg loss 0.00217057, throughput 2.8668K wps
[Epoch 71 Batch 90/173] avg loss 0.00241876, throughput 2.84475K wps
[Epoch 71 Batch 120/173] avg loss 0.00209705, throughput 2.86251K wps
[Epoch 71 Batch 150/173] avg loss 0.002203, throughput 2.85658K wps
Begin Testing...
[Epoch 71] train avg loss 0.00224831, dev acc 0.7914, dev avg loss 0.466582, throughput 2.86569K wps
[Epoch 72 Batch 30/173] avg loss 0.00229321, throughput 2.9083K wps
[Epoch 72 Batch 60/173] avg loss 0.00215812, throughput 2.86196K wps
[Epoch 72 Batch 90/173] avg loss 0.00232131, throughput 2.84287K wps
[Epoch 72 Batch 120/173] avg loss 0.00224221, throughput 2.86024K wps
[Epoch 72 Batch 150/173] avg loss 0.00218495, throughput 2.8537K wps
Begin Testing...
[Epoch 72] train avg loss 0.00225403, dev acc 0.7967, dev avg loss 0.468635, throughput 2.86467K wps
[Epoch 73 Batch 30/173] avg loss 0.00211, throughput 2.93204K wps
[Epoch 73 Batch 60/173] avg loss 0.00205986, throughput 2.86514K wps
[Epoch 73 Batch 90/173] avg loss 0.00227329, throughput 2.86491K wps
[Epoch 73 Batch 120/173] avg loss 0.00216135, throughput 2.86288K wps
[Epoch 73 Batch 150/173] avg loss 0.00211742, throughput 2.81426K wps
Begin Testing...
[Epoch 73] train avg loss 0.00215639, dev acc 0.7894, dev avg loss 0.477555, throughput 2.86448K wps
[Epoch 74 Batch 30/173] avg loss 0.00228283, throughput 2.92571K wps
[Epoch 74 Batch 60/173] avg loss 0.00206936, throughput 2.86406K wps
[Epoch 74 Batch 90/173] avg loss 0.0021127, throughput 2.84406K wps
[Epoch 74 Batch 120/173] avg loss 0.00237897, throughput 2.85956K wps
[Epoch 74 Batch 150/173] avg loss 0.00226171, throughput 2.85495K wps
Begin Testing...
[Epoch 74] train avg loss 0.00220065, dev acc 0.7956, dev avg loss 0.474442, throughput 2.86872K wps
[Epoch 75 Batch 30/173] avg loss 0.002033, throughput 2.86858K wps
[Epoch 75 Batch 60/173] avg loss 0.00210687, throughput 2.86549K wps
[Epoch 75 Batch 90/173] avg loss 0.0019982, throughput 2.86807K wps
[Epoch 75 Batch 120/173] avg loss 0.00204931, throughput 2.87275K wps
[Epoch 75 Batch 150/173] avg loss 0.00204827, throughput 2.86423K wps
Begin Testing...
[Epoch 75] train avg loss 0.00205326, dev acc 0.7925, dev avg loss 0.474861, throughput 2.8654K wps
[Epoch 76 Batch 30/173] avg loss 0.00194841, throughput 2.8927K wps
[Epoch 76 Batch 60/173] avg loss 0.00206247, throughput 2.81688K wps
[Epoch 76 Batch 90/173] avg loss 0.00207354, throughput 2.84801K wps
[Epoch 76 Batch 120/173] avg loss 0.00202714, throughput 2.84827K wps
[Epoch 76 Batch 150/173] avg loss 0.00210519, throughput 2.84185K wps
Begin Testing...
[Epoch 76] train avg loss 0.00206397, dev acc 0.7956, dev avg loss 0.473761, throughput 2.84355K wps
[Epoch 77 Batch 30/173] avg loss 0.00197407, throughput 2.85893K wps
[Epoch 77 Batch 60/173] avg loss 0.00184213, throughput 2.85247K wps
[Epoch 77 Batch 90/173] avg loss 0.00187515, throughput 2.85139K wps
[Epoch 77 Batch 120/173] avg loss 0.00199144, throughput 2.83932K wps
[Epoch 77 Batch 150/173] avg loss 0.00203055, throughput 2.83676K wps
Begin Testing...
[Epoch 77] train avg loss 0.00195469, dev acc 0.7925, dev avg loss 0.484963, throughput 2.84599K wps
[Epoch 78 Batch 30/173] avg loss 0.00166752, throughput 2.9317K wps
[Epoch 78 Batch 60/173] avg loss 0.00187353, throughput 2.86293K wps
[Epoch 78 Batch 90/173] avg loss 0.00195052, throughput 2.84405K wps
[Epoch 78 Batch 120/173] avg loss 0.00192387, throughput 2.83118K wps
[Epoch 78 Batch 150/173] avg loss 0.00210014, throughput 2.86692K wps
Begin Testing...
[Epoch 78] train avg loss 0.00191417, dev acc 0.7977, dev avg loss 0.481241, throughput 2.86657K wps
[Epoch 79 Batch 30/173] avg loss 0.0018424, throughput 2.91891K wps
[Epoch 79 Batch 60/173] avg loss 0.00172469, throughput 2.86538K wps
[Epoch 79 Batch 90/173] avg loss 0.00172761, throughput 2.84085K wps
[Epoch 79 Batch 120/173] avg loss 0.00190167, throughput 2.83023K wps
[Epoch 79 Batch 150/173] avg loss 0.00187206, throughput 2.86986K wps
Begin Testing...
[Epoch 79] train avg loss 0.00181598, dev acc 0.7935, dev avg loss 0.480201, throughput 2.8588K wps
[Epoch 80 Batch 30/173] avg loss 0.00180135, throughput 2.91876K wps
[Epoch 80 Batch 60/173] avg loss 0.00183458, throughput 2.87383K wps
[Epoch 80 Batch 90/173] avg loss 0.00179171, throughput 2.85773K wps
[Epoch 80 Batch 120/173] avg loss 0.00170495, throughput 2.87018K wps
[Epoch 80 Batch 150/173] avg loss 0.00170364, throughput 2.82627K wps
Begin Testing...
[Epoch 80] train avg loss 0.00177548, dev acc 0.7862, dev avg loss 0.482672, throughput 2.86877K wps
[Epoch 81 Batch 30/173] avg loss 0.00170917, throughput 2.90106K wps
[Epoch 81 Batch 60/173] avg loss 0.00169909, throughput 2.85924K wps
[Epoch 81 Batch 90/173] avg loss 0.00167632, throughput 2.872K wps
[Epoch 81 Batch 120/173] avg loss 0.00165447, throughput 2.869K wps
[Epoch 81 Batch 150/173] avg loss 0.00162543, throughput 2.86588K wps
Begin Testing...
[Epoch 81] train avg loss 0.00169018, dev acc 0.7894, dev avg loss 0.485156, throughput 2.87137K wps
[Epoch 82 Batch 30/173] avg loss 0.00164206, throughput 2.93126K wps
[Epoch 82 Batch 60/173] avg loss 0.00176895, throughput 2.87087K wps
[Epoch 82 Batch 90/173] avg loss 0.00174238, throughput 2.86286K wps
[Epoch 82 Batch 120/173] avg loss 0.00159217, throughput 2.86142K wps
[Epoch 82 Batch 150/173] avg loss 0.00165631, throughput 2.86504K wps
Begin Testing...
[Epoch 82] train avg loss 0.00170111, dev acc 0.8008, dev avg loss 0.487261, throughput 2.87078K wps
[Epoch 83 Batch 30/173] avg loss 0.00171629, throughput 2.89956K wps
[Epoch 83 Batch 60/173] avg loss 0.0016097, throughput 2.82297K wps
[Epoch 83 Batch 90/173] avg loss 0.00154036, throughput 2.86861K wps
[Epoch 83 Batch 120/173] avg loss 0.00166848, throughput 2.86519K wps
[Epoch 83 Batch 150/173] avg loss 0.00170628, throughput 2.84552K wps
Begin Testing...
[Epoch 83] train avg loss 0.0016623, dev acc 0.7967, dev avg loss 0.486631, throughput 2.85681K wps
[Epoch 84 Batch 30/173] avg loss 0.00160874, throughput 2.90539K wps
[Epoch 84 Batch 60/173] avg loss 0.00155549, throughput 2.81534K wps
[Epoch 84 Batch 90/173] avg loss 0.00157373, throughput 2.79627K wps
[Epoch 84 Batch 120/173] avg loss 0.00171868, throughput 2.85303K wps
[Epoch 84 Batch 150/173] avg loss 0.00160627, throughput 2.87541K wps
Begin Testing...
[Epoch 84] train avg loss 0.00160628, dev acc 0.8008, dev avg loss 0.489682, throughput 2.85101K wps
[Epoch 85 Batch 30/173] avg loss 0.00164805, throughput 2.91185K wps
[Epoch 85 Batch 60/173] avg loss 0.00158403, throughput 2.8333K wps
[Epoch 85 Batch 90/173] avg loss 0.00163713, throughput 2.84387K wps
[Epoch 85 Batch 120/173] avg loss 0.00165083, throughput 2.86846K wps
[Epoch 85 Batch 150/173] avg loss 0.00167075, throughput 2.87877K wps
Begin Testing...
[Epoch 85] train avg loss 0.00162217, dev acc 0.7946, dev avg loss 0.492284, throughput 2.86836K wps
[Epoch 86 Batch 30/173] avg loss 0.00164316, throughput 2.85315K wps
[Epoch 86 Batch 60/173] avg loss 0.00145787, throughput 2.84999K wps
[Epoch 86 Batch 90/173] avg loss 0.00144302, throughput 2.84017K wps
[Epoch 86 Batch 120/173] avg loss 0.00145052, throughput 2.87189K wps
[Epoch 86 Batch 150/173] avg loss 0.00159772, throughput 2.86732K wps
Begin Testing...
[Epoch 86] train avg loss 0.00154105, dev acc 0.7956, dev avg loss 0.495396, throughput 2.85792K wps
[Epoch 87 Batch 30/173] avg loss 0.00151327, throughput 2.8743K wps
[Epoch 87 Batch 60/173] avg loss 0.00150729, throughput 2.86041K wps
[Epoch 87 Batch 90/173] avg loss 0.00150301, throughput 2.86992K wps
[Epoch 87 Batch 120/173] avg loss 0.00146499, throughput 2.86416K wps
[Epoch 87 Batch 150/173] avg loss 0.00143007, throughput 2.87317K wps
Begin Testing...
[Epoch 87] train avg loss 0.00149145, dev acc 0.7883, dev avg loss 0.496358, throughput 2.8689K wps
[Epoch 88 Batch 30/173] avg loss 0.00158804, throughput 2.88392K wps
[Epoch 88 Batch 60/173] avg loss 0.00144862, throughput 2.81385K wps
[Epoch 88 Batch 90/173] avg loss 0.00153418, throughput 2.84546K wps
[Epoch 88 Batch 120/173] avg loss 0.00135173, throughput 2.83395K wps
[Epoch 88 Batch 150/173] avg loss 0.00141841, throughput 2.85307K wps
Begin Testing...
[Epoch 88] train avg loss 0.0014728, dev acc 0.7946, dev avg loss 0.499803, throughput 2.84864K wps
[Epoch 89 Batch 30/173] avg loss 0.00151287, throughput 2.92556K wps
[Epoch 89 Batch 60/173] avg loss 0.00147515, throughput 2.865K wps
[Epoch 89 Batch 90/173] avg loss 0.00134936, throughput 2.87081K wps
[Epoch 89 Batch 120/173] avg loss 0.00149824, throughput 2.87061K wps
[Epoch 89 Batch 150/173] avg loss 0.00139184, throughput 2.84012K wps
Begin Testing...
[Epoch 89] train avg loss 0.00144349, dev acc 0.7956, dev avg loss 0.501164, throughput 2.87302K wps
[Epoch 90 Batch 30/173] avg loss 0.00154208, throughput 2.92728K wps
[Epoch 90 Batch 60/173] avg loss 0.00125546, throughput 2.85592K wps
[Epoch 90 Batch 90/173] avg loss 0.00141972, throughput 2.85148K wps
[Epoch 90 Batch 120/173] avg loss 0.00137608, throughput 2.84577K wps
[Epoch 90 Batch 150/173] avg loss 0.0013945, throughput 2.86704K wps
Begin Testing...
[Epoch 90] train avg loss 0.00140931, dev acc 0.7946, dev avg loss 0.502956, throughput 2.86879K wps
[Epoch 91 Batch 30/173] avg loss 0.00137924, throughput 2.87058K wps
[Epoch 91 Batch 60/173] avg loss 0.00141988, throughput 2.82772K wps
[Epoch 91 Batch 90/173] avg loss 0.0013713, throughput 2.84335K wps
[Epoch 91 Batch 120/173] avg loss 0.00149201, throughput 2.8731K wps
[Epoch 91 Batch 150/173] avg loss 0.00136775, throughput 2.87275K wps
Begin Testing...
[Epoch 91] train avg loss 0.00141189, dev acc 0.7956, dev avg loss 0.504993, throughput 2.85921K wps
[Epoch 92 Batch 30/173] avg loss 0.00113728, throughput 2.87957K wps
[Epoch 92 Batch 60/173] avg loss 0.00123222, throughput 2.86592K wps
[Epoch 92 Batch 90/173] avg loss 0.00140736, throughput 2.8353K wps
[Epoch 92 Batch 120/173] avg loss 0.00131005, throughput 2.84135K wps
[Epoch 92 Batch 150/173] avg loss 0.00138435, throughput 2.82097K wps
Begin Testing...
[Epoch 92] train avg loss 0.001302, dev acc 0.7967, dev avg loss 0.507532, throughput 2.8484K wps
[Epoch 93 Batch 30/173] avg loss 0.00137052, throughput 2.93416K wps
[Epoch 93 Batch 60/173] avg loss 0.0012582, throughput 2.8602K wps
[Epoch 93 Batch 90/173] avg loss 0.00125985, throughput 2.86349K wps
[Epoch 93 Batch 120/173] avg loss 0.00137215, throughput 2.83965K wps
[Epoch 93 Batch 150/173] avg loss 0.00130927, throughput 2.87202K wps
Begin Testing...
[Epoch 93] train avg loss 0.00133154, dev acc 0.7852, dev avg loss 0.519636, throughput 2.87363K wps
[Epoch 94 Batch 30/173] avg loss 0.00132415, throughput 2.87547K wps
[Epoch 94 Batch 60/173] avg loss 0.00129058, throughput 2.82694K wps
[Epoch 94 Batch 90/173] avg loss 0.00123894, throughput 2.85095K wps
[Epoch 94 Batch 120/173] avg loss 0.00133257, throughput 2.86527K wps
[Epoch 94 Batch 150/173] avg loss 0.00120872, throughput 2.86917K wps
Begin Testing...
[Epoch 94] train avg loss 0.00128696, dev acc 0.7925, dev avg loss 0.510137, throughput 2.8594K wps
[Epoch 95 Batch 30/173] avg loss 0.00133577, throughput 2.9383K wps
[Epoch 95 Batch 60/173] avg loss 0.00125741, throughput 2.85469K wps
[Epoch 95 Batch 90/173] avg loss 0.00126191, throughput 2.86369K wps
[Epoch 95 Batch 120/173] avg loss 0.00125388, throughput 2.85416K wps
[Epoch 95 Batch 150/173] avg loss 0.00113447, throughput 2.8319K wps
Begin Testing...
[Epoch 95] train avg loss 0.00123303, dev acc 0.7925, dev avg loss 0.513324, throughput 2.85925K wps
[Epoch 96 Batch 30/173] avg loss 0.00122559, throughput 2.88658K wps
[Epoch 96 Batch 60/173] avg loss 0.00116746, throughput 2.84861K wps
[Epoch 96 Batch 90/173] avg loss 0.00121565, throughput 2.8573K wps
[Epoch 96 Batch 120/173] avg loss 0.00120635, throughput 2.83688K wps
[Epoch 96 Batch 150/173] avg loss 0.0012172, throughput 2.81422K wps
Begin Testing...
[Epoch 96] train avg loss 0.0012083, dev acc 0.7946, dev avg loss 0.517567, throughput 2.84892K wps
[Epoch 97 Batch 30/173] avg loss 0.00123245, throughput 2.8946K wps
[Epoch 97 Batch 60/173] avg loss 0.00109934, throughput 2.87772K wps
[Epoch 97 Batch 90/173] avg loss 0.00113115, throughput 2.8705K wps
[Epoch 97 Batch 120/173] avg loss 0.00113878, throughput 2.86264K wps
[Epoch 97 Batch 150/173] avg loss 0.00127668, throughput 2.84061K wps
Begin Testing...
[Epoch 97] train avg loss 0.00118213, dev acc 0.7914, dev avg loss 0.516835, throughput 2.86481K wps
[Epoch 98 Batch 30/173] avg loss 0.00120017, throughput 2.91015K wps
[Epoch 98 Batch 60/173] avg loss 0.00111559, throughput 2.86416K wps
[Epoch 98 Batch 90/173] avg loss 0.0010929, throughput 2.86713K wps
[Epoch 98 Batch 120/173] avg loss 0.00114884, throughput 2.8565K wps
[Epoch 98 Batch 150/173] avg loss 0.00108198, throughput 2.87271K wps
Begin Testing...
[Epoch 98] train avg loss 0.001133, dev acc 0.7987, dev avg loss 0.521065, throughput 2.8719K wps
[Epoch 99 Batch 30/173] avg loss 0.00105227, throughput 2.92741K wps
[Epoch 99 Batch 60/173] avg loss 0.00119149, throughput 2.86292K wps
[Epoch 99 Batch 90/173] avg loss 0.00113504, throughput 2.86538K wps
[Epoch 99 Batch 120/173] avg loss 0.00111216, throughput 2.86396K wps
[Epoch 99 Batch 150/173] avg loss 0.0012806, throughput 2.8669K wps
Begin Testing...
[Epoch 99] train avg loss 0.00113327, dev acc 0.7935, dev avg loss 0.521646, throughput 2.87693K wps
[Epoch 100 Batch 30/173] avg loss 0.00113393, throughput 2.91585K wps
[Epoch 100 Batch 60/173] avg loss 0.00108847, throughput 2.8727K wps
[Epoch 100 Batch 90/173] avg loss 0.00112166, throughput 2.8685K wps
[Epoch 100 Batch 120/173] avg loss 0.00105857, throughput 2.87026K wps
[Epoch 100 Batch 150/173] avg loss 0.00108658, throughput 2.79767K wps
Begin Testing...
[Epoch 100] train avg loss 0.00111327, dev acc 0.7925, dev avg loss 0.523001, throughput 2.85575K wps
[Epoch 101 Batch 30/173] avg loss 0.00107095, throughput 2.864K wps
[Epoch 101 Batch 60/173] avg loss 0.00102847, throughput 2.83861K wps
[Epoch 101 Batch 90/173] avg loss 0.0011039, throughput 2.8249K wps
[Epoch 101 Batch 120/173] avg loss 0.00116802, throughput 2.84091K wps
[Epoch 101 Batch 150/173] avg loss 0.00106479, throughput 2.82135K wps
Begin Testing...
[Epoch 101] train avg loss 0.00108565, dev acc 0.7873, dev avg loss 0.527351, throughput 2.84193K wps
[Epoch 102 Batch 30/173] avg loss 0.000951534, throughput 2.92627K wps
[Epoch 102 Batch 60/173] avg loss 0.000996127, throughput 2.84333K wps
[Epoch 102 Batch 90/173] avg loss 0.00114444, throughput 2.83225K wps
[Epoch 102 Batch 120/173] avg loss 0.00109993, throughput 2.82295K wps
[Epoch 102 Batch 150/173] avg loss 0.00106781, throughput 2.85784K wps
Begin Testing...
[Epoch 102] train avg loss 0.0010644, dev acc 0.7883, dev avg loss 0.525733, throughput 2.8576K wps
[Epoch 103 Batch 30/173] avg loss 0.00114203, throughput 2.90752K wps
[Epoch 103 Batch 60/173] avg loss 0.00101935, throughput 2.82925K wps
[Epoch 103 Batch 90/173] avg loss 0.00101404, throughput 2.86097K wps
[Epoch 103 Batch 120/173] avg loss 0.000999125, throughput 2.8497K wps
[Epoch 103 Batch 150/173] avg loss 0.00103864, throughput 2.87049K wps
Begin Testing...
[Epoch 103] train avg loss 0.00103604, dev acc 0.7883, dev avg loss 0.526527, throughput 2.85968K wps
[Epoch 104 Batch 30/173] avg loss 0.00094706, throughput 2.92267K wps
[Epoch 104 Batch 60/173] avg loss 0.00098587, throughput 2.8589K wps
[Epoch 104 Batch 90/173] avg loss 0.000974193, throughput 2.85802K wps
[Epoch 104 Batch 120/173] avg loss 0.00101247, throughput 2.86546K wps
[Epoch 104 Batch 150/173] avg loss 0.00109716, throughput 2.83462K wps
Begin Testing...
[Epoch 104] train avg loss 0.00101461, dev acc 0.7883, dev avg loss 0.530189, throughput 2.8678K wps
[Epoch 105 Batch 30/173] avg loss 0.000988535, throughput 2.93337K wps
[Epoch 105 Batch 60/173] avg loss 0.000916878, throughput 2.86026K wps
[Epoch 105 Batch 90/173] avg loss 0.00104603, throughput 2.84322K wps
[Epoch 105 Batch 120/173] avg loss 0.00104426, throughput 2.86781K wps
[Epoch 105 Batch 150/173] avg loss 0.0010794, throughput 2.86581K wps
Begin Testing...
[Epoch 105] train avg loss 0.00100839, dev acc 0.7894, dev avg loss 0.532642, throughput 2.87087K wps
[Epoch 106 Batch 30/173] avg loss 0.00100032, throughput 2.90878K wps
[Epoch 106 Batch 60/173] avg loss 0.00101957, throughput 2.78131K wps
[Epoch 106 Batch 90/173] avg loss 0.000896017, throughput 2.83016K wps
[Epoch 106 Batch 120/173] avg loss 0.000976152, throughput 2.83948K wps
[Epoch 106 Batch 150/173] avg loss 0.0010315, throughput 2.84377K wps
Begin Testing...
[Epoch 106] train avg loss 0.000992961, dev acc 0.7894, dev avg loss 0.537173, throughput 2.84316K wps
[Epoch 107 Batch 30/173] avg loss 0.00100769, throughput 2.9006K wps
[Epoch 107 Batch 60/173] avg loss 0.00102039, throughput 2.85979K wps
[Epoch 107 Batch 90/173] avg loss 0.000942577, throughput 2.82872K wps
[Epoch 107 Batch 120/173] avg loss 0.00088686, throughput 2.81438K wps
[Epoch 107 Batch 150/173] avg loss 0.00112001, throughput 2.8622K wps
Begin Testing...
[Epoch 107] train avg loss 0.000983833, dev acc 0.7904, dev avg loss 0.537209, throughput 2.84571K wps
[Epoch 108 Batch 30/173] avg loss 0.000977354, throughput 2.87562K wps
[Epoch 108 Batch 60/173] avg loss 0.00101361, throughput 2.84579K wps
[Epoch 108 Batch 90/173] avg loss 0.000864497, throughput 2.87128K wps
[Epoch 108 Batch 120/173] avg loss 0.000887982, throughput 2.87597K wps
[Epoch 108 Batch 150/173] avg loss 0.000951385, throughput 2.86873K wps
Begin Testing...
[Epoch 108] train avg loss 0.000929919, dev acc 0.7873, dev avg loss 0.535083, throughput 2.86757K wps
[Epoch 109 Batch 30/173] avg loss 0.000889298, throughput 2.85473K wps
[Epoch 109 Batch 60/173] avg loss 0.000803797, throughput 2.85209K wps
[Epoch 109 Batch 90/173] avg loss 0.000904864, throughput 2.84858K wps
[Epoch 109 Batch 120/173] avg loss 0.000862415, throughput 2.87376K wps
[Epoch 109 Batch 150/173] avg loss 0.000974908, throughput 2.86325K wps
Begin Testing...
[Epoch 109] train avg loss 0.000890744, dev acc 0.7894, dev avg loss 0.538273, throughput 2.85684K wps
[Epoch 110 Batch 30/173] avg loss 0.000904034, throughput 2.89672K wps
[Epoch 110 Batch 60/173] avg loss 0.000854084, throughput 2.86928K wps
[Epoch 110 Batch 90/173] avg loss 0.00108146, throughput 2.86607K wps
[Epoch 110 Batch 120/173] avg loss 0.000921811, throughput 2.84464K wps
[Epoch 110 Batch 150/173] avg loss 0.000856221, throughput 2.85108K wps
Begin Testing...
[Epoch 110] train avg loss 0.000920884, dev acc 0.7883, dev avg loss 0.540523, throughput 2.86448K wps
[Epoch 111 Batch 30/173] avg loss 0.000874105, throughput 2.86586K wps
[Epoch 111 Batch 60/173] avg loss 0.000801927, throughput 2.84736K wps
[Epoch 111 Batch 90/173] avg loss 0.000850573, throughput 2.8269K wps
[Epoch 111 Batch 120/173] avg loss 0.000857512, throughput 2.79259K wps
[Epoch 111 Batch 150/173] avg loss 0.000901201, throughput 2.86924K wps
Begin Testing...
[Epoch 111] train avg loss 0.000852228, dev acc 0.7842, dev avg loss 0.541123, throughput 2.84341K wps
[Epoch 112 Batch 30/173] avg loss 0.000776937, throughput 2.93198K wps
[Epoch 112 Batch 60/173] avg loss 0.000884375, throughput 2.86335K wps
[Epoch 112 Batch 90/173] avg loss 0.000898526, throughput 2.85983K wps
[Epoch 112 Batch 120/173] avg loss 0.000870538, throughput 2.86441K wps
[Epoch 112 Batch 150/173] avg loss 0.000954588, throughput 2.85953K wps
Begin Testing...
[Epoch 112] train avg loss 0.000872535, dev acc 0.7894, dev avg loss 0.543482, throughput 2.87023K wps
[Epoch 113 Batch 30/173] avg loss 0.000844729, throughput 2.91238K wps
[Epoch 113 Batch 60/173] avg loss 0.000817836, throughput 2.86393K wps
[Epoch 113 Batch 90/173] avg loss 0.000865402, throughput 2.8511K wps
[Epoch 113 Batch 120/173] avg loss 0.000867745, throughput 2.87428K wps
[Epoch 113 Batch 150/173] avg loss 0.000904201, throughput 2.83588K wps
Begin Testing...
[Epoch 113] train avg loss 0.000850158, dev acc 0.7883, dev avg loss 0.545575, throughput 2.86105K wps
[Epoch 114 Batch 30/173] avg loss 0.00085758, throughput 2.94175K wps
[Epoch 114 Batch 60/173] avg loss 0.000885513, throughput 2.86314K wps
[Epoch 114 Batch 90/173] avg loss 0.000811557, throughput 2.87508K wps
[Epoch 114 Batch 120/173] avg loss 0.000847903, throughput 2.85719K wps
[Epoch 114 Batch 150/173] avg loss 0.000806889, throughput 2.78601K wps
Begin Testing...
[Epoch 114] train avg loss 0.000845354, dev acc 0.7842, dev avg loss 0.548979, throughput 2.86143K wps
[Epoch 115 Batch 30/173] avg loss 0.000888454, throughput 2.89624K wps
[Epoch 115 Batch 60/173] avg loss 0.000743608, throughput 2.82958K wps
[Epoch 115 Batch 90/173] avg loss 0.000801368, throughput 2.8421K wps
[Epoch 115 Batch 120/173] avg loss 0.000778463, throughput 2.84373K wps
[Epoch 115 Batch 150/173] avg loss 0.000836928, throughput 2.84854K wps
Begin Testing...
[Epoch 115] train avg loss 0.000806987, dev acc 0.7842, dev avg loss 0.556744, throughput 2.84836K wps
[Epoch 116 Batch 30/173] avg loss 0.000730279, throughput 2.93169K wps
[Epoch 116 Batch 60/173] avg loss 0.000784707, throughput 2.85923K wps
[Epoch 116 Batch 90/173] avg loss 0.000832854, throughput 2.86501K wps
[Epoch 116 Batch 120/173] avg loss 0.000799152, throughput 2.8606K wps
[Epoch 116 Batch 150/173] avg loss 0.000807797, throughput 2.86769K wps
Begin Testing...
[Epoch 116] train avg loss 0.000793101, dev acc 0.7873, dev avg loss 0.553467, throughput 2.8744K wps
[Epoch 117 Batch 30/173] avg loss 0.000842393, throughput 2.87914K wps
[Epoch 117 Batch 60/173] avg loss 0.000786812, throughput 2.86159K wps
[Epoch 117 Batch 90/173] avg loss 0.000846374, throughput 2.79984K wps
[Epoch 117 Batch 120/173] avg loss 0.000799679, throughput 2.87205K wps
[Epoch 117 Batch 150/173] avg loss 0.000818622, throughput 2.86297K wps
Begin Testing...
[Epoch 117] train avg loss 0.000815719, dev acc 0.7862, dev avg loss 0.554884, throughput 2.85599K wps
[Epoch 118 Batch 30/173] avg loss 0.000713931, throughput 2.91276K wps
[Epoch 118 Batch 60/173] avg loss 0.000692539, throughput 2.82047K wps
[Epoch 118 Batch 90/173] avg loss 0.000709356, throughput 2.86541K wps
[Epoch 118 Batch 120/173] avg loss 0.000795119, throughput 2.80227K wps
[Epoch 118 Batch 150/173] avg loss 0.000823088, throughput 2.80979K wps
Begin Testing...
[Epoch 118] train avg loss 0.00074752, dev acc 0.7894, dev avg loss 0.553074, throughput 2.84206K wps
[Epoch 119 Batch 30/173] avg loss 0.000758625, throughput 2.89424K wps
[Epoch 119 Batch 60/173] avg loss 0.000677313, throughput 2.86858K wps
[Epoch 119 Batch 90/173] avg loss 0.000806543, throughput 2.8673K wps
[Epoch 119 Batch 120/173] avg loss 0.000734773, throughput 2.82896K wps
[Epoch 119 Batch 150/173] avg loss 0.000840653, throughput 2.85224K wps
Begin Testing...
[Epoch 119] train avg loss 0.000762932, dev acc 0.7914, dev avg loss 0.55731, throughput 2.86066K wps
[Epoch 120 Batch 30/173] avg loss 0.000685377, throughput 2.89369K wps
[Epoch 120 Batch 60/173] avg loss 0.000695518, throughput 2.84006K wps
[Epoch 120 Batch 90/173] avg loss 0.00070302, throughput 2.8624K wps
[Epoch 120 Batch 120/173] avg loss 0.00076555, throughput 2.86543K wps
[Epoch 120 Batch 150/173] avg loss 0.000833454, throughput 2.86008K wps
Begin Testing...
[Epoch 120] train avg loss 0.000734399, dev acc 0.7894, dev avg loss 0.558828, throughput 2.86231K wps
[Epoch 121 Batch 30/173] avg loss 0.0007947, throughput 2.9044K wps
[Epoch 121 Batch 60/173] avg loss 0.000819044, throughput 2.86407K wps
[Epoch 121 Batch 90/173] avg loss 0.000740714, throughput 2.86762K wps
[Epoch 121 Batch 120/173] avg loss 0.000624312, throughput 2.85864K wps
[Epoch 121 Batch 150/173] avg loss 0.000753101, throughput 2.83761K wps
Begin Testing...
[Epoch 121] train avg loss 0.000748041, dev acc 0.7894, dev avg loss 0.559162, throughput 2.86267K wps
[Epoch 122 Batch 30/173] avg loss 0.000611867, throughput 2.88378K wps
[Epoch 122 Batch 60/173] avg loss 0.000788026, throughput 2.86671K wps
[Epoch 122 Batch 90/173] avg loss 0.000779725, throughput 2.82704K wps
[Epoch 122 Batch 120/173] avg loss 0.0007189, throughput 2.86904K wps
[Epoch 122 Batch 150/173] avg loss 0.000758582, throughput 2.83839K wps
Begin Testing...
[Epoch 122] train avg loss 0.000739101, dev acc 0.7883, dev avg loss 0.566599, throughput 2.85691K wps
[Epoch 123 Batch 30/173] avg loss 0.000666242, throughput 2.89848K wps
[Epoch 123 Batch 60/173] avg loss 0.000607554, throughput 2.81781K wps
[Epoch 123 Batch 90/173] avg loss 0.000788023, throughput 2.80371K wps
[Epoch 123 Batch 120/173] avg loss 0.000640784, throughput 2.83532K wps
[Epoch 123 Batch 150/173] avg loss 0.000773552, throughput 2.86288K wps
Begin Testing...
[Epoch 123] train avg loss 0.000703267, dev acc 0.7904, dev avg loss 0.562868, throughput 2.84632K wps
[Epoch 124 Batch 30/173] avg loss 0.000699647, throughput 2.92799K wps
[Epoch 124 Batch 60/173] avg loss 0.000690343, throughput 2.86604K wps
[Epoch 124 Batch 90/173] avg loss 0.000762761, throughput 2.8643K wps
[Epoch 124 Batch 120/173] avg loss 0.000686946, throughput 2.86612K wps
[Epoch 124 Batch 150/173] avg loss 0.000680188, throughput 2.82565K wps
Begin Testing...
[Epoch 124] train avg loss 0.000694417, dev acc 0.7842, dev avg loss 0.567529, throughput 2.87073K wps
[Epoch 125 Batch 30/173] avg loss 0.000722685, throughput 2.91489K wps
[Epoch 125 Batch 60/173] avg loss 0.000653109, throughput 2.83623K wps
[Epoch 125 Batch 90/173] avg loss 0.00073974, throughput 2.85409K wps
[Epoch 125 Batch 120/173] avg loss 0.000658117, throughput 2.86074K wps
[Epoch 125 Batch 150/173] avg loss 0.000675775, throughput 2.85127K wps
Begin Testing...
[Epoch 125] train avg loss 0.000701267, dev acc 0.7883, dev avg loss 0.567085, throughput 2.85492K wps
[Epoch 126 Batch 30/173] avg loss 0.000645456, throughput 2.88561K wps
[Epoch 126 Batch 60/173] avg loss 0.00066951, throughput 2.8565K wps
[Epoch 126 Batch 90/173] avg loss 0.00067153, throughput 2.86254K wps
[Epoch 126 Batch 120/173] avg loss 0.000605011, throughput 2.85662K wps
[Epoch 126 Batch 150/173] avg loss 0.000777527, throughput 2.85111K wps
Begin Testing...
[Epoch 126] train avg loss 0.000698817, dev acc 0.7883, dev avg loss 0.566879, throughput 2.86153K wps
[Epoch 127 Batch 30/173] avg loss 0.000596724, throughput 2.92911K wps
[Epoch 127 Batch 60/173] avg loss 0.000741461, throughput 2.87142K wps
[Epoch 127 Batch 90/173] avg loss 0.000627628, throughput 2.8505K wps
[Epoch 127 Batch 120/173] avg loss 0.000668574, throughput 2.86672K wps
[Epoch 127 Batch 150/173] avg loss 0.000638993, throughput 2.86084K wps
Begin Testing...
[Epoch 127] train avg loss 0.000648041, dev acc 0.7904, dev avg loss 0.571345, throughput 2.87306K wps
[Epoch 128 Batch 30/173] avg loss 0.000703682, throughput 2.86707K wps
[Epoch 128 Batch 60/173] avg loss 0.00064492, throughput 2.87895K wps
[Epoch 128 Batch 90/173] avg loss 0.000646412, throughput 2.87107K wps
[Epoch 128 Batch 120/173] avg loss 0.000608468, throughput 2.86894K wps
[Epoch 128 Batch 150/173] avg loss 0.00073051, throughput 2.86825K wps
Begin Testing...
[Epoch 128] train avg loss 0.000676082, dev acc 0.7873, dev avg loss 0.577205, throughput 2.8694K wps
[Epoch 129 Batch 30/173] avg loss 0.000752943, throughput 2.89392K wps
[Epoch 129 Batch 60/173] avg loss 0.000653715, throughput 2.82177K wps
[Epoch 129 Batch 90/173] avg loss 0.000604421, throughput 2.85077K wps
[Epoch 129 Batch 120/173] avg loss 0.000594807, throughput 2.85767K wps
[Epoch 129 Batch 150/173] avg loss 0.000616509, throughput 2.84371K wps
Begin Testing...
[Epoch 129] train avg loss 0.000638085, dev acc 0.7883, dev avg loss 0.574116, throughput 2.84686K wps
[Epoch 130 Batch 30/173] avg loss 0.000682377, throughput 2.88485K wps
[Epoch 130 Batch 60/173] avg loss 0.000618801, throughput 2.85658K wps
[Epoch 130 Batch 90/173] avg loss 0.000620531, throughput 2.85624K wps
[Epoch 130 Batch 120/173] avg loss 0.000600236, throughput 2.84654K wps
[Epoch 130 Batch 150/173] avg loss 0.000610556, throughput 2.87359K wps
Begin Testing...
[Epoch 130] train avg loss 0.000626809, dev acc 0.7894, dev avg loss 0.580488, throughput 2.86224K wps
[Epoch 131 Batch 30/173] avg loss 0.000593628, throughput 2.91712K wps
[Epoch 131 Batch 60/173] avg loss 0.000620691, throughput 2.84168K wps
[Epoch 131 Batch 90/173] avg loss 0.000584938, throughput 2.863K wps
[Epoch 131 Batch 120/173] avg loss 0.000625925, throughput 2.86803K wps
[Epoch 131 Batch 150/173] avg loss 0.000666104, throughput 2.85832K wps
Begin Testing...
[Epoch 131] train avg loss 0.000625686, dev acc 0.7883, dev avg loss 0.581776, throughput 2.8694K wps
[Epoch 132 Batch 30/173] avg loss 0.00060341, throughput 2.87329K wps
[Epoch 132 Batch 60/173] avg loss 0.00060146, throughput 2.86647K wps
[Epoch 132 Batch 90/173] avg loss 0.000712924, throughput 2.86498K wps
[Epoch 132 Batch 120/173] avg loss 0.000534879, throughput 2.85142K wps
[Epoch 132 Batch 150/173] avg loss 0.000554733, throughput 2.84325K wps
Begin Testing...
[Epoch 132] train avg loss 0.000600029, dev acc 0.7873, dev avg loss 0.584552, throughput 2.85952K wps
[Epoch 133 Batch 30/173] avg loss 0.000576123, throughput 2.9378K wps
[Epoch 133 Batch 60/173] avg loss 0.000691667, throughput 2.86252K wps
[Epoch 133 Batch 90/173] avg loss 0.000740466, throughput 2.84353K wps
[Epoch 133 Batch 120/173] avg loss 0.000583488, throughput 2.86637K wps
[Epoch 133 Batch 150/173] avg loss 0.000568431, throughput 2.85996K wps
Begin Testing...
[Epoch 133] train avg loss 0.000634427, dev acc 0.7862, dev avg loss 0.581289, throughput 2.87067K wps
[Epoch 134 Batch 30/173] avg loss 0.000522191, throughput 2.9148K wps
[Epoch 134 Batch 60/173] avg loss 0.0006167, throughput 2.85626K wps
[Epoch 134 Batch 90/173] avg loss 0.000589738, throughput 2.86421K wps
[Epoch 134 Batch 120/173] avg loss 0.000597005, throughput 2.80624K wps
[Epoch 134 Batch 150/173] avg loss 0.00060141, throughput 2.8535K wps
Begin Testing...
[Epoch 134] train avg loss 0.000575723, dev acc 0.7914, dev avg loss 0.580955, throughput 2.85959K wps
[Epoch 135 Batch 30/173] avg loss 0.000562919, throughput 2.89988K wps
[Epoch 135 Batch 60/173] avg loss 0.000620614, throughput 2.80556K wps
[Epoch 135 Batch 90/173] avg loss 0.000613555, throughput 2.81587K wps
[Epoch 135 Batch 120/173] avg loss 0.000548077, throughput 2.86253K wps
[Epoch 135 Batch 150/173] avg loss 0.000631604, throughput 2.86363K wps
Begin Testing...
[Epoch 135] train avg loss 0.000606382, dev acc 0.7831, dev avg loss 0.585593, throughput 2.84874K wps
[Epoch 136 Batch 30/173] avg loss 0.000589028, throughput 2.90042K wps
[Epoch 136 Batch 60/173] avg loss 0.00057513, throughput 2.81341K wps
[Epoch 136 Batch 90/173] avg loss 0.000484383, throughput 2.8659K wps
[Epoch 136 Batch 120/173] avg loss 0.00058896, throughput 2.85206K wps
[Epoch 136 Batch 150/173] avg loss 0.000620375, throughput 2.84269K wps
Begin Testing...
[Epoch 136] train avg loss 0.00057522, dev acc 0.7842, dev avg loss 0.58839, throughput 2.85493K wps
[Epoch 137 Batch 30/173] avg loss 0.000618435, throughput 2.88572K wps
[Epoch 137 Batch 60/173] avg loss 0.000583747, throughput 2.86888K wps
[Epoch 137 Batch 90/173] avg loss 0.000518365, throughput 2.81729K wps
[Epoch 137 Batch 120/173] avg loss 0.000556939, throughput 2.80229K wps
[Epoch 137 Batch 150/173] avg loss 0.000615608, throughput 2.86641K wps
Begin Testing...
[Epoch 137] train avg loss 0.000571769, dev acc 0.7862, dev avg loss 0.5872, throughput 2.85086K wps
[Epoch 138 Batch 30/173] avg loss 0.000581338, throughput 2.93253K wps
[Epoch 138 Batch 60/173] avg loss 0.000565126, throughput 2.85091K wps
[Epoch 138 Batch 90/173] avg loss 0.000502521, throughput 2.85643K wps
[Epoch 138 Batch 120/173] avg loss 0.000586328, throughput 2.83198K wps
[Epoch 138 Batch 150/173] avg loss 0.000541032, throughput 2.87213K wps
Begin Testing...
[Epoch 138] train avg loss 0.000555906, dev acc 0.7873, dev avg loss 0.592046, throughput 2.86846K wps
[Epoch 139 Batch 30/173] avg loss 0.000547331, throughput 2.85225K wps
[Epoch 139 Batch 60/173] avg loss 0.000587881, throughput 2.85923K wps
[Epoch 139 Batch 90/173] avg loss 0.000535033, throughput 2.87111K wps
[Epoch 139 Batch 120/173] avg loss 0.000523643, throughput 2.87504K wps
[Epoch 139 Batch 150/173] avg loss 0.000511616, throughput 2.87245K wps
Begin Testing...
[Epoch 139] train avg loss 0.000549441, dev acc 0.7821, dev avg loss 0.593434, throughput 2.86694K wps
[Epoch 140 Batch 30/173] avg loss 0.000480556, throughput 2.90409K wps
[Epoch 140 Batch 60/173] avg loss 0.000527325, throughput 2.86156K wps
[Epoch 140 Batch 90/173] avg loss 0.000544636, throughput 2.87364K wps
[Epoch 140 Batch 120/173] avg loss 0.000589309, throughput 2.8557K wps
[Epoch 140 Batch 150/173] avg loss 0.000511025, throughput 2.83514K wps
Begin Testing...
[Epoch 140] train avg loss 0.000544897, dev acc 0.7831, dev avg loss 0.597931, throughput 2.86235K wps
[Epoch 141 Batch 30/173] avg loss 0.000530451, throughput 2.8881K wps
[Epoch 141 Batch 60/173] avg loss 0.000516867, throughput 2.87838K wps
[Epoch 141 Batch 90/173] avg loss 0.000540825, throughput 2.85597K wps
[Epoch 141 Batch 120/173] avg loss 0.000495038, throughput 2.8625K wps
[Epoch 141 Batch 150/173] avg loss 0.000535422, throughput 2.84786K wps
Begin Testing...
[Epoch 141] train avg loss 0.000519227, dev acc 0.7852, dev avg loss 0.596718, throughput 2.86335K wps
[Epoch 142 Batch 30/173] avg loss 0.000544097, throughput 2.91081K wps
[Epoch 142 Batch 60/173] avg loss 0.000480201, throughput 2.86924K wps
[Epoch 142 Batch 90/173] avg loss 0.000521929, throughput 2.82761K wps
[Epoch 142 Batch 120/173] avg loss 0.000501347, throughput 2.83044K wps
[Epoch 142 Batch 150/173] avg loss 0.000545809, throughput 2.87548K wps
Begin Testing...
[Epoch 142] train avg loss 0.000525574, dev acc 0.7842, dev avg loss 0.598651, throughput 2.86406K wps
[Epoch 143 Batch 30/173] avg loss 0.000524422, throughput 2.89519K wps
[Epoch 143 Batch 60/173] avg loss 0.000516038, throughput 2.87053K wps
[Epoch 143 Batch 90/173] avg loss 0.00054383, throughput 2.83636K wps
[Epoch 143 Batch 120/173] avg loss 0.000455938, throughput 2.86822K wps
[Epoch 143 Batch 150/173] avg loss 0.00061961, throughput 2.87336K wps
Begin Testing...
[Epoch 143] train avg loss 0.000530691, dev acc 0.7852, dev avg loss 0.598433, throughput 2.86936K wps
[Epoch 144 Batch 30/173] avg loss 0.000612047, throughput 2.92781K wps
[Epoch 144 Batch 60/173] avg loss 0.00057453, throughput 2.86286K wps
[Epoch 144 Batch 90/173] avg loss 0.000545852, throughput 2.85719K wps
[Epoch 144 Batch 120/173] avg loss 0.000506112, throughput 2.84179K wps
[Epoch 144 Batch 150/173] avg loss 0.000526208, throughput 2.83064K wps
Begin Testing...
[Epoch 144] train avg loss 0.000544316, dev acc 0.7862, dev avg loss 0.597743, throughput 2.86088K wps
[Epoch 145 Batch 30/173] avg loss 0.00052625, throughput 2.848K wps
[Epoch 145 Batch 60/173] avg loss 0.000436184, throughput 2.80015K wps
[Epoch 145 Batch 90/173] avg loss 0.000497123, throughput 2.83083K wps
[Epoch 145 Batch 120/173] avg loss 0.000470555, throughput 2.86097K wps
[Epoch 145 Batch 150/173] avg loss 0.00055001, throughput 2.87125K wps
Begin Testing...
[Epoch 145] train avg loss 0.000496515, dev acc 0.7862, dev avg loss 0.605492, throughput 2.84565K wps
[Epoch 146 Batch 30/173] avg loss 0.000495231, throughput 2.9186K wps
[Epoch 146 Batch 60/173] avg loss 0.00044561, throughput 2.82987K wps
[Epoch 146 Batch 90/173] avg loss 0.00050598, throughput 2.80695K wps
[Epoch 146 Batch 120/173] avg loss 0.000464138, throughput 2.81996K wps
[Epoch 146 Batch 150/173] avg loss 0.000495778, throughput 2.80794K wps
Begin Testing...
[Epoch 146] train avg loss 0.000484848, dev acc 0.7842, dev avg loss 0.607023, throughput 2.83874K wps
[Epoch 147 Batch 30/173] avg loss 0.000488258, throughput 2.93138K wps
[Epoch 147 Batch 60/173] avg loss 0.000503512, throughput 2.8525K wps
[Epoch 147 Batch 90/173] avg loss 0.000509825, throughput 2.85778K wps
[Epoch 147 Batch 120/173] avg loss 0.00048018, throughput 2.85767K wps
[Epoch 147 Batch 150/173] avg loss 0.000481595, throughput 2.87298K wps
Begin Testing...
[Epoch 147] train avg loss 0.000498277, dev acc 0.7862, dev avg loss 0.599949, throughput 2.8747K wps
[Epoch 148 Batch 30/173] avg loss 0.000529261, throughput 2.9362K wps
[Epoch 148 Batch 60/173] avg loss 0.000538964, throughput 2.85398K wps
[Epoch 148 Batch 90/173] avg loss 0.000389955, throughput 2.87139K wps
[Epoch 148 Batch 120/173] avg loss 0.000516279, throughput 2.83213K wps
[Epoch 148 Batch 150/173] avg loss 0.000467796, throughput 2.86562K wps
Begin Testing...
[Epoch 148] train avg loss 0.000483365, dev acc 0.7831, dev avg loss 0.605916, throughput 2.86901K wps
[Epoch 149 Batch 30/173] avg loss 0.000508606, throughput 2.88863K wps
[Epoch 149 Batch 60/173] avg loss 0.000484902, throughput 2.87288K wps
[Epoch 149 Batch 90/173] avg loss 0.000443456, throughput 2.86421K wps
[Epoch 149 Batch 120/173] avg loss 0.000471092, throughput 2.8471K wps
[Epoch 149 Batch 150/173] avg loss 0.000492399, throughput 2.82833K wps
Begin Testing...
[Epoch 149] train avg loss 0.000479148, dev acc 0.7862, dev avg loss 0.606718, throughput 2.85822K wps
[Epoch 150 Batch 30/173] avg loss 0.00044877, throughput 2.93583K wps
[Epoch 150 Batch 60/173] avg loss 0.00049695, throughput 2.84139K wps
[Epoch 150 Batch 90/173] avg loss 0.000482404, throughput 2.85488K wps
[Epoch 150 Batch 120/173] avg loss 0.000374875, throughput 2.86658K wps
[Epoch 150 Batch 150/173] avg loss 0.000498547, throughput 2.83185K wps
Begin Testing...
[Epoch 150] train avg loss 0.000465807, dev acc 0.7842, dev avg loss 0.606638, throughput 2.86612K wps
[Epoch 151 Batch 30/173] avg loss 0.000453489, throughput 2.93463K wps
[Epoch 151 Batch 60/173] avg loss 0.000484131, throughput 2.86368K wps
[Epoch 151 Batch 90/173] avg loss 0.000449863, throughput 2.8713K wps
[Epoch 151 Batch 120/173] avg loss 0.000501743, throughput 2.86658K wps
[Epoch 151 Batch 150/173] avg loss 0.000433364, throughput 2.84809K wps
Begin Testing...
[Epoch 151] train avg loss 0.000464144, dev acc 0.7873, dev avg loss 0.607889, throughput 2.87428K wps
[Epoch 152 Batch 30/173] avg loss 0.000403125, throughput 2.90409K wps
[Epoch 152 Batch 60/173] avg loss 0.000472942, throughput 2.83286K wps
[Epoch 152 Batch 90/173] avg loss 0.000521479, throughput 2.83872K wps
[Epoch 152 Batch 120/173] avg loss 0.000441486, throughput 2.83978K wps
[Epoch 152 Batch 150/173] avg loss 0.000460794, throughput 2.85757K wps
Begin Testing...
[Epoch 152] train avg loss 0.000464212, dev acc 0.7810, dev avg loss 0.617591, throughput 2.84883K wps
[Epoch 153 Batch 30/173] avg loss 0.000501107, throughput 2.87914K wps
[Epoch 153 Batch 60/173] avg loss 0.000451507, throughput 2.8617K wps
[Epoch 153 Batch 90/173] avg loss 0.000435249, throughput 2.85213K wps
[Epoch 153 Batch 120/173] avg loss 0.000468242, throughput 2.82695K wps
[Epoch 153 Batch 150/173] avg loss 0.000457009, throughput 2.8518K wps
Begin Testing...
[Epoch 153] train avg loss 0.000461288, dev acc 0.7831, dev avg loss 0.610568, throughput 2.85336K wps
[Epoch 154 Batch 30/173] avg loss 0.000445475, throughput 2.91784K wps
[Epoch 154 Batch 60/173] avg loss 0.000446607, throughput 2.86188K wps
[Epoch 154 Batch 90/173] avg loss 0.000465266, throughput 2.84234K wps
[Epoch 154 Batch 120/173] avg loss 0.000400074, throughput 2.83127K wps
[Epoch 154 Batch 150/173] avg loss 0.000473932, throughput 2.84317K wps
Begin Testing...
[Epoch 154] train avg loss 0.000443895, dev acc 0.7852, dev avg loss 0.618122, throughput 2.86047K wps
[Epoch 155 Batch 30/173] avg loss 0.00042468, throughput 2.89389K wps
[Epoch 155 Batch 60/173] avg loss 0.000407903, throughput 2.8674K wps
[Epoch 155 Batch 90/173] avg loss 0.000421715, throughput 2.82294K wps
[Epoch 155 Batch 120/173] avg loss 0.000432407, throughput 2.78569K wps
[Epoch 155 Batch 150/173] avg loss 0.0004663, throughput 2.87113K wps
Begin Testing...
[Epoch 155] train avg loss 0.000428664, dev acc 0.7842, dev avg loss 0.61699, throughput 2.84778K wps
[Epoch 156 Batch 30/173] avg loss 0.000487237, throughput 2.92453K wps
[Epoch 156 Batch 60/173] avg loss 0.000393822, throughput 2.84737K wps
[Epoch 156 Batch 90/173] avg loss 0.000367688, throughput 2.86762K wps
[Epoch 156 Batch 120/173] avg loss 0.000442786, throughput 2.8584K wps
[Epoch 156 Batch 150/173] avg loss 0.000427794, throughput 2.85449K wps
Begin Testing...
[Epoch 156] train avg loss 0.000418662, dev acc 0.7842, dev avg loss 0.615666, throughput 2.86741K wps
[Epoch 157 Batch 30/173] avg loss 0.000489022, throughput 2.92239K wps
[Epoch 157 Batch 60/173] avg loss 0.000361962, throughput 2.8369K wps
[Epoch 157 Batch 90/173] avg loss 0.000386125, throughput 2.82993K wps
[Epoch 157 Batch 120/173] avg loss 0.000437656, throughput 2.87204K wps
[Epoch 157 Batch 150/173] avg loss 0.000447016, throughput 2.86693K wps
Begin Testing...
[Epoch 157] train avg loss 0.000423601, dev acc 0.7779, dev avg loss 0.619952, throughput 2.86523K wps
[Epoch 158 Batch 30/173] avg loss 0.000368308, throughput 2.92895K wps
[Epoch 158 Batch 60/173] avg loss 0.000391919, throughput 2.86297K wps
[Epoch 158 Batch 90/173] avg loss 0.000420243, throughput 2.86514K wps
[Epoch 158 Batch 120/173] avg loss 0.000452044, throughput 2.8592K wps
[Epoch 158 Batch 150/173] avg loss 0.000403831, throughput 2.8688K wps
Begin Testing...
[Epoch 158] train avg loss 0.00041161, dev acc 0.7810, dev avg loss 0.622217, throughput 2.87039K wps
[Epoch 159 Batch 30/173] avg loss 0.000413393, throughput 2.91911K wps
[Epoch 159 Batch 60/173] avg loss 0.000431145, throughput 2.85875K wps
[Epoch 159 Batch 90/173] avg loss 0.000432695, throughput 2.84753K wps
[Epoch 159 Batch 120/173] avg loss 0.000348521, throughput 2.85505K wps
[Epoch 159 Batch 150/173] avg loss 0.000468004, throughput 2.86708K wps
Begin Testing...
[Epoch 159] train avg loss 0.000421614, dev acc 0.7831, dev avg loss 0.622049, throughput 2.87058K wps
[Epoch 160 Batch 30/173] avg loss 0.000375843, throughput 2.90768K wps
[Epoch 160 Batch 60/173] avg loss 0.000406541, throughput 2.83185K wps
[Epoch 160 Batch 90/173] avg loss 0.000400624, throughput 2.84277K wps
[Epoch 160 Batch 120/173] avg loss 0.000410934, throughput 2.86252K wps
[Epoch 160 Batch 150/173] avg loss 0.00039824, throughput 2.8406K wps
Begin Testing...
[Epoch 160] train avg loss 0.000403383, dev acc 0.7852, dev avg loss 0.620135, throughput 2.85378K wps
[Epoch 161 Batch 30/173] avg loss 0.000433599, throughput 2.90037K wps
[Epoch 161 Batch 60/173] avg loss 0.000405775, throughput 2.85895K wps
[Epoch 161 Batch 90/173] avg loss 0.000403699, throughput 2.83359K wps
[Epoch 161 Batch 120/173] avg loss 0.000415802, throughput 2.85215K wps
[Epoch 161 Batch 150/173] avg loss 0.000387323, throughput 2.82512K wps
Begin Testing...
[Epoch 161] train avg loss 0.000418197, dev acc 0.7862, dev avg loss 0.624249, throughput 2.85512K wps
[Epoch 162 Batch 30/173] avg loss 0.000485924, throughput 2.90857K wps
[Epoch 162 Batch 60/173] avg loss 0.000403681, throughput 2.87479K wps
[Epoch 162 Batch 90/173] avg loss 0.000372661, throughput 2.865K wps
[Epoch 162 Batch 120/173] avg loss 0.000354502, throughput 2.83671K wps
[Epoch 162 Batch 150/173] avg loss 0.000418805, throughput 2.86202K wps
Begin Testing...
[Epoch 162] train avg loss 0.000410964, dev acc 0.7852, dev avg loss 0.623792, throughput 2.86914K wps
[Epoch 163 Batch 30/173] avg loss 0.000396494, throughput 2.93705K wps
[Epoch 163 Batch 60/173] avg loss 0.000427403, throughput 2.83098K wps
[Epoch 163 Batch 90/173] avg loss 0.000417851, throughput 2.85837K wps
[Epoch 163 Batch 120/173] avg loss 0.000411434, throughput 2.86876K wps
[Epoch 163 Batch 150/173] avg loss 0.000385708, throughput 2.86411K wps
Begin Testing...
[Epoch 163] train avg loss 0.000401922, dev acc 0.7883, dev avg loss 0.621967, throughput 2.87047K wps
[Epoch 164 Batch 30/173] avg loss 0.000419446, throughput 2.88373K wps
[Epoch 164 Batch 60/173] avg loss 0.000446875, throughput 2.82919K wps
[Epoch 164 Batch 90/173] avg loss 0.000398492, throughput 2.82938K wps
[Epoch 164 Batch 120/173] avg loss 0.000401758, throughput 2.86607K wps
[Epoch 164 Batch 150/173] avg loss 0.000391692, throughput 2.85702K wps
Begin Testing...
[Epoch 164] train avg loss 0.000404994, dev acc 0.7862, dev avg loss 0.623651, throughput 2.85499K wps
[Epoch 165 Batch 30/173] avg loss 0.000397555, throughput 2.91469K wps
[Epoch 165 Batch 60/173] avg loss 0.000389963, throughput 2.85467K wps
[Epoch 165 Batch 90/173] avg loss 0.000379484, throughput 2.84114K wps
[Epoch 165 Batch 120/173] avg loss 0.000379686, throughput 2.84969K wps
[Epoch 165 Batch 150/173] avg loss 0.000419586, throughput 2.854K wps
Begin Testing...
[Epoch 165] train avg loss 0.000394029, dev acc 0.7862, dev avg loss 0.626869, throughput 2.86047K wps
[Epoch 166 Batch 30/173] avg loss 0.000340074, throughput 2.90884K wps
[Epoch 166 Batch 60/173] avg loss 0.000334226, throughput 2.82651K wps
[Epoch 166 Batch 90/173] avg loss 0.000394512, throughput 2.84758K wps
[Epoch 166 Batch 120/173] avg loss 0.00041348, throughput 2.86043K wps
[Epoch 166 Batch 150/173] avg loss 0.000475429, throughput 2.86496K wps
Begin Testing...
[Epoch 166] train avg loss 0.000392815, dev acc 0.7873, dev avg loss 0.627558, throughput 2.8546K wps
[Epoch 167 Batch 30/173] avg loss 0.000359572, throughput 2.92275K wps
[Epoch 167 Batch 60/173] avg loss 0.000406606, throughput 2.80062K wps
[Epoch 167 Batch 90/173] avg loss 0.000440937, throughput 2.86218K wps
[Epoch 167 Batch 120/173] avg loss 0.000381073, throughput 2.85217K wps
[Epoch 167 Batch 150/173] avg loss 0.000354021, throughput 2.8259K wps
Begin Testing...
[Epoch 167] train avg loss 0.000393433, dev acc 0.7842, dev avg loss 0.629742, throughput 2.85272K wps
[Epoch 168 Batch 30/173] avg loss 0.00037669, throughput 2.91809K wps
[Epoch 168 Batch 60/173] avg loss 0.000367021, throughput 2.8692K wps
[Epoch 168 Batch 90/173] avg loss 0.000374884, throughput 2.80766K wps
[Epoch 168 Batch 120/173] avg loss 0.000401825, throughput 2.85008K wps
[Epoch 168 Batch 150/173] avg loss 0.000402041, throughput 2.87256K wps
Begin Testing...
[Epoch 168] train avg loss 0.000383886, dev acc 0.7852, dev avg loss 0.62797, throughput 2.86251K wps
[Epoch 169 Batch 30/173] avg loss 0.000396427, throughput 2.87175K wps
[Epoch 169 Batch 60/173] avg loss 0.000421934, throughput 2.85722K wps
[Epoch 169 Batch 90/173] avg loss 0.000368073, throughput 2.84303K wps
[Epoch 169 Batch 120/173] avg loss 0.000379802, throughput 2.85762K wps
[Epoch 169 Batch 150/173] avg loss 0.000381328, throughput 2.85018K wps
Begin Testing...
[Epoch 169] train avg loss 0.000387008, dev acc 0.7873, dev avg loss 0.631282, throughput 2.85739K wps
[Epoch 170 Batch 30/173] avg loss 0.000379575, throughput 2.88353K wps
[Epoch 170 Batch 60/173] avg loss 0.000324883, throughput 2.84605K wps
[Epoch 170 Batch 90/173] avg loss 0.000416712, throughput 2.85384K wps
[Epoch 170 Batch 120/173] avg loss 0.000345016, throughput 2.8373K wps
[Epoch 170 Batch 150/173] avg loss 0.000374519, throughput 2.84806K wps
Begin Testing...
[Epoch 170] train avg loss 0.000365782, dev acc 0.7810, dev avg loss 0.639314, throughput 2.85606K wps
[Epoch 171 Batch 30/173] avg loss 0.000403998, throughput 2.92514K wps
[Epoch 171 Batch 60/173] avg loss 0.000428855, throughput 2.86575K wps
[Epoch 171 Batch 90/173] avg loss 0.000387055, throughput 2.86117K wps
[Epoch 171 Batch 120/173] avg loss 0.000351612, throughput 2.86308K wps
[Epoch 171 Batch 150/173] avg loss 0.000401956, throughput 2.8603K wps
Begin Testing...
[Epoch 171] train avg loss 0.000395305, dev acc 0.7862, dev avg loss 0.63512, throughput 2.87235K wps
[Epoch 172 Batch 30/173] avg loss 0.000352786, throughput 2.93147K wps
[Epoch 172 Batch 60/173] avg loss 0.000372118, throughput 2.86669K wps
[Epoch 172 Batch 90/173] avg loss 0.000328525, throughput 2.86652K wps
[Epoch 172 Batch 120/173] avg loss 0.000334523, throughput 2.86376K wps
[Epoch 172 Batch 150/173] avg loss 0.000397142, throughput 2.85514K wps
Begin Testing...
[Epoch 172] train avg loss 0.000352358, dev acc 0.7831, dev avg loss 0.637431, throughput 2.87126K wps
[Epoch 173 Batch 30/173] avg loss 0.000384349, throughput 2.90083K wps
[Epoch 173 Batch 60/173] avg loss 0.000381728, throughput 2.82599K wps
[Epoch 173 Batch 90/173] avg loss 0.000399281, throughput 2.8409K wps
[Epoch 173 Batch 120/173] avg loss 0.000456213, throughput 2.82177K wps
[Epoch 173 Batch 150/173] avg loss 0.000306709, throughput 2.85264K wps
Begin Testing...
[Epoch 173] train avg loss 0.000374301, dev acc 0.7821, dev avg loss 0.639582, throughput 2.84985K wps
[Epoch 174 Batch 30/173] avg loss 0.000319312, throughput 2.93577K wps
[Epoch 174 Batch 60/173] avg loss 0.000350041, throughput 2.84158K wps
[Epoch 174 Batch 90/173] avg loss 0.000385172, throughput 2.87056K wps
[Epoch 174 Batch 120/173] avg loss 0.000343787, throughput 2.83019K wps
[Epoch 174 Batch 150/173] avg loss 0.000341097, throughput 2.82847K wps
Begin Testing...
[Epoch 174] train avg loss 0.000358929, dev acc 0.7883, dev avg loss 0.635889, throughput 2.8594K wps
[Epoch 175 Batch 30/173] avg loss 0.000391662, throughput 2.87193K wps
[Epoch 175 Batch 60/173] avg loss 0.000334535, throughput 2.86624K wps
[Epoch 175 Batch 90/173] avg loss 0.000369582, throughput 2.83009K wps
[Epoch 175 Batch 120/173] avg loss 0.000406424, throughput 2.85162K wps
[Epoch 175 Batch 150/173] avg loss 0.000325553, throughput 2.84069K wps
Begin Testing...
[Epoch 175] train avg loss 0.000354657, dev acc 0.7852, dev avg loss 0.639623, throughput 2.8521K wps
[Epoch 176 Batch 30/173] avg loss 0.000398468, throughput 2.92676K wps
[Epoch 176 Batch 60/173] avg loss 0.000369051, throughput 2.85203K wps
[Epoch 176 Batch 90/173] avg loss 0.000376045, throughput 2.84691K wps
[Epoch 176 Batch 120/173] avg loss 0.000341836, throughput 2.84716K wps
[Epoch 176 Batch 150/173] avg loss 0.000291703, throughput 2.84994K wps
Begin Testing...
[Epoch 176] train avg loss 0.000346542, dev acc 0.7831, dev avg loss 0.64395, throughput 2.86163K wps
[Epoch 177 Batch 30/173] avg loss 0.000382068, throughput 2.93537K wps
[Epoch 177 Batch 60/173] avg loss 0.000383685, throughput 2.84973K wps
[Epoch 177 Batch 90/173] avg loss 0.000288965, throughput 2.83599K wps
[Epoch 177 Batch 120/173] avg loss 0.000307124, throughput 2.79185K wps
[Epoch 177 Batch 150/173] avg loss 0.000366816, throughput 2.84186K wps
Begin Testing...
[Epoch 177] train avg loss 0.000341137, dev acc 0.7842, dev avg loss 0.646296, throughput 2.84584K wps
[Epoch 178 Batch 30/173] avg loss 0.000337634, throughput 2.92163K wps
[Epoch 178 Batch 60/173] avg loss 0.000338621, throughput 2.84397K wps
[Epoch 178 Batch 90/173] avg loss 0.000327818, throughput 2.84298K wps
[Epoch 178 Batch 120/173] avg loss 0.000349389, throughput 2.86475K wps
[Epoch 178 Batch 150/173] avg loss 0.000343012, throughput 2.8634K wps
Begin Testing...
[Epoch 178] train avg loss 0.000339205, dev acc 0.7831, dev avg loss 0.645317, throughput 2.86586K wps
[Epoch 179 Batch 30/173] avg loss 0.00032542, throughput 2.89366K wps
[Epoch 179 Batch 60/173] avg loss 0.000376476, throughput 2.83648K wps
[Epoch 179 Batch 90/173] avg loss 0.000379678, throughput 2.85156K wps
[Epoch 179 Batch 120/173] avg loss 0.000309147, throughput 2.84284K wps
[Epoch 179 Batch 150/173] avg loss 0.000334106, throughput 2.8169K wps
Begin Testing...
[Epoch 179] train avg loss 0.000345424, dev acc 0.7810, dev avg loss 0.650379, throughput 2.85181K wps
[Epoch 180 Batch 30/173] avg loss 0.000342365, throughput 2.88535K wps
[Epoch 180 Batch 60/173] avg loss 0.000325236, throughput 2.82113K wps
[Epoch 180 Batch 90/173] avg loss 0.000320191, throughput 2.85304K wps
[Epoch 180 Batch 120/173] avg loss 0.000345469, throughput 2.86977K wps
[Epoch 180 Batch 150/173] avg loss 0.000384861, throughput 2.87106K wps
Begin Testing...
[Epoch 180] train avg loss 0.000340638, dev acc 0.7883, dev avg loss 0.643572, throughput 2.86167K wps
[Epoch 181 Batch 30/173] avg loss 0.000328188, throughput 2.90544K wps
[Epoch 181 Batch 60/173] avg loss 0.000320136, throughput 2.80168K wps
[Epoch 181 Batch 90/173] avg loss 0.000297587, throughput 2.82155K wps
[Epoch 181 Batch 120/173] avg loss 0.000286909, throughput 2.86764K wps
[Epoch 181 Batch 150/173] avg loss 0.00036964, throughput 2.87078K wps
Begin Testing...
[Epoch 181] train avg loss 0.00031944, dev acc 0.7894, dev avg loss 0.645273, throughput 2.85035K wps
[Epoch 182 Batch 30/173] avg loss 0.00038241, throughput 2.92877K wps
[Epoch 182 Batch 60/173] avg loss 0.000334133, throughput 2.81219K wps
[Epoch 182 Batch 90/173] avg loss 0.000302442, throughput 2.85585K wps
[Epoch 182 Batch 120/173] avg loss 0.000313233, throughput 2.86332K wps
[Epoch 182 Batch 150/173] avg loss 0.00030715, throughput 2.84062K wps
Begin Testing...
[Epoch 182] train avg loss 0.000329238, dev acc 0.7883, dev avg loss 0.645065, throughput 2.86036K wps
[Epoch 183 Batch 30/173] avg loss 0.000303067, throughput 2.86223K wps
[Epoch 183 Batch 60/173] avg loss 0.000339685, throughput 2.82264K wps
[Epoch 183 Batch 90/173] avg loss 0.000267997, throughput 2.86802K wps
[Epoch 183 Batch 120/173] avg loss 0.000294349, throughput 2.84717K wps
[Epoch 183 Batch 150/173] avg loss 0.000304883, throughput 2.80379K wps
Begin Testing...
[Epoch 183] train avg loss 0.000306977, dev acc 0.7873, dev avg loss 0.646245, throughput 2.8438K wps
[Epoch 184 Batch 30/173] avg loss 0.000314637, throughput 2.92109K wps
[Epoch 184 Batch 60/173] avg loss 0.00029772, throughput 2.86211K wps
[Epoch 184 Batch 90/173] avg loss 0.000322023, throughput 2.86562K wps
[Epoch 184 Batch 120/173] avg loss 0.000344799, throughput 2.86051K wps
[Epoch 184 Batch 150/173] avg loss 0.000322956, throughput 2.84161K wps
Begin Testing...
[Epoch 184] train avg loss 0.00031729, dev acc 0.7852, dev avg loss 0.650956, throughput 2.86798K wps
[Epoch 185 Batch 30/173] avg loss 0.000289646, throughput 2.90881K wps
[Epoch 185 Batch 60/173] avg loss 0.00027823, throughput 2.86748K wps
[Epoch 185 Batch 90/173] avg loss 0.00034618, throughput 2.86519K wps
[Epoch 185 Batch 120/173] avg loss 0.000280243, throughput 2.82829K wps
[Epoch 185 Batch 150/173] avg loss 0.000328808, throughput 2.86786K wps
Begin Testing...
[Epoch 185] train avg loss 0.000307281, dev acc 0.7862, dev avg loss 0.650292, throughput 2.86707K wps
[Epoch 186 Batch 30/173] avg loss 0.000306438, throughput 2.93569K wps
[Epoch 186 Batch 60/173] avg loss 0.000278284, throughput 2.85249K wps
[Epoch 186 Batch 90/173] avg loss 0.000292517, throughput 2.84061K wps
[Epoch 186 Batch 120/173] avg loss 0.00032218, throughput 2.86707K wps
[Epoch 186 Batch 150/173] avg loss 0.000339472, throughput 2.87237K wps
Begin Testing...
[Epoch 186] train avg loss 0.000310647, dev acc 0.7883, dev avg loss 0.65087, throughput 2.87418K wps
[Epoch 187 Batch 30/173] avg loss 0.000308383, throughput 2.89709K wps
[Epoch 187 Batch 60/173] avg loss 0.000287729, throughput 2.84948K wps
[Epoch 187 Batch 90/173] avg loss 0.000302269, throughput 2.84812K wps
[Epoch 187 Batch 120/173] avg loss 0.000291121, throughput 2.84768K wps
[Epoch 187 Batch 150/173] avg loss 0.000360098, throughput 2.84079K wps
Begin Testing...
[Epoch 187] train avg loss 0.00030615, dev acc 0.7873, dev avg loss 0.653143, throughput 2.85448K wps
[Epoch 188 Batch 30/173] avg loss 0.000317296, throughput 2.92542K wps
[Epoch 188 Batch 60/173] avg loss 0.00028025, throughput 2.86495K wps
[Epoch 188 Batch 90/173] avg loss 0.00031117, throughput 2.86027K wps
[Epoch 188 Batch 120/173] avg loss 0.000280006, throughput 2.84884K wps
[Epoch 188 Batch 150/173] avg loss 0.000334481, throughput 2.82584K wps
Begin Testing...
[Epoch 188] train avg loss 0.000305035, dev acc 0.7852, dev avg loss 0.655709, throughput 2.86191K wps
[Epoch 189 Batch 30/173] avg loss 0.000341148, throughput 2.90635K wps
[Epoch 189 Batch 60/173] avg loss 0.000273013, throughput 2.84993K wps
[Epoch 189 Batch 90/173] avg loss 0.000339357, throughput 2.86163K wps
[Epoch 189 Batch 120/173] avg loss 0.000302176, throughput 2.84795K wps
[Epoch 189 Batch 150/173] avg loss 0.000304354, throughput 2.86417K wps
Begin Testing...
[Epoch 189] train avg loss 0.000309187, dev acc 0.7852, dev avg loss 0.657796, throughput 2.86447K wps
[Epoch 190 Batch 30/173] avg loss 0.000279294, throughput 2.92776K wps
[Epoch 190 Batch 60/173] avg loss 0.000307117, throughput 2.84957K wps
[Epoch 190 Batch 90/173] avg loss 0.000285229, throughput 2.87123K wps
[Epoch 190 Batch 120/173] avg loss 0.000343704, throughput 2.83303K wps
[Epoch 190 Batch 150/173] avg loss 0.000374937, throughput 2.87431K wps
Begin Testing...
[Epoch 190] train avg loss 0.000316679, dev acc 0.7883, dev avg loss 0.652824, throughput 2.87069K wps
[Epoch 191 Batch 30/173] avg loss 0.000287529, throughput 2.93002K wps
[Epoch 191 Batch 60/173] avg loss 0.000295086, throughput 2.8672K wps
[Epoch 191 Batch 90/173] avg loss 0.000270606, throughput 2.85086K wps
[Epoch 191 Batch 120/173] avg loss 0.000296873, throughput 2.84506K wps
[Epoch 191 Batch 150/173] avg loss 0.00030273, throughput 2.86513K wps
Begin Testing...
[Epoch 191] train avg loss 0.000289438, dev acc 0.7894, dev avg loss 0.655944, throughput 2.86947K wps
[Epoch 192 Batch 30/173] avg loss 0.000296204, throughput 2.89212K wps
[Epoch 192 Batch 60/173] avg loss 0.00029718, throughput 2.85558K wps
[Epoch 192 Batch 90/173] avg loss 0.000301181, throughput 2.84044K wps
[Epoch 192 Batch 120/173] avg loss 0.000309779, throughput 2.87254K wps
[Epoch 192 Batch 150/173] avg loss 0.000294131, throughput 2.87034K wps
Begin Testing...
[Epoch 192] train avg loss 0.00029621, dev acc 0.7894, dev avg loss 0.657546, throughput 2.8669K wps
[Epoch 193 Batch 30/173] avg loss 0.000296724, throughput 2.89311K wps
[Epoch 193 Batch 60/173] avg loss 0.000314395, throughput 2.81428K wps
[Epoch 193 Batch 90/173] avg loss 0.000265989, throughput 2.82713K wps
[Epoch 193 Batch 120/173] avg loss 0.000302154, throughput 2.82045K wps
[Epoch 193 Batch 150/173] avg loss 0.000290713, throughput 2.8505K wps
Begin Testing...
[Epoch 193] train avg loss 0.000295648, dev acc 0.7883, dev avg loss 0.659201, throughput 2.84209K wps
[Epoch 194 Batch 30/173] avg loss 0.000241448, throughput 2.89433K wps
[Epoch 194 Batch 60/173] avg loss 0.000265296, throughput 2.85754K wps
[Epoch 194 Batch 90/173] avg loss 0.000306573, throughput 2.6715K wps
[Epoch 194 Batch 120/173] avg loss 0.000299401, throughput 2.83946K wps
[Epoch 194 Batch 150/173] avg loss 0.000338811, throughput 2.85479K wps
Begin Testing...
[Epoch 194] train avg loss 0.0002946, dev acc 0.7852, dev avg loss 0.664381, throughput 2.82451K wps
[Epoch 195 Batch 30/173] avg loss 0.000298672, throughput 2.88731K wps
[Epoch 195 Batch 60/173] avg loss 0.00025242, throughput 2.80172K wps
[Epoch 195 Batch 90/173] avg loss 0.000302822, throughput 2.84805K wps
[Epoch 195 Batch 120/173] avg loss 0.000309925, throughput 2.85747K wps
[Epoch 195 Batch 150/173] avg loss 0.000296518, throughput 2.86264K wps
Begin Testing...
[Epoch 195] train avg loss 0.000289287, dev acc 0.7862, dev avg loss 0.663519, throughput 2.85299K wps
[Epoch 196 Batch 30/173] avg loss 0.000286787, throughput 2.86812K wps
[Epoch 196 Batch 60/173] avg loss 0.000272863, throughput 2.83787K wps
[Epoch 196 Batch 90/173] avg loss 0.000313841, throughput 2.81133K wps
[Epoch 196 Batch 120/173] avg loss 0.000290919, throughput 2.84204K wps
[Epoch 196 Batch 150/173] avg loss 0.000304776, throughput 2.8768K wps
Begin Testing...
[Epoch 196] train avg loss 0.000291487, dev acc 0.7873, dev avg loss 0.664419, throughput 2.84823K wps
[Epoch 197 Batch 30/173] avg loss 0.000280119, throughput 2.91126K wps
[Epoch 197 Batch 60/173] avg loss 0.000310955, throughput 2.83524K wps
[Epoch 197 Batch 90/173] avg loss 0.000272869, throughput 2.87009K wps
[Epoch 197 Batch 120/173] avg loss 0.000290172, throughput 2.86475K wps
[Epoch 197 Batch 150/173] avg loss 0.000289505, throughput 2.85713K wps
Begin Testing...
[Epoch 197] train avg loss 0.000289769, dev acc 0.7842, dev avg loss 0.670371, throughput 2.86776K wps
[Epoch 198 Batch 30/173] avg loss 0.000296486, throughput 2.91121K wps
[Epoch 198 Batch 60/173] avg loss 0.000305907, throughput 2.82737K wps
[Epoch 198 Batch 90/173] avg loss 0.000281486, throughput 2.84845K wps
[Epoch 198 Batch 120/173] avg loss 0.000303389, throughput 2.84707K wps
[Epoch 198 Batch 150/173] avg loss 0.000238262, throughput 2.79797K wps
Begin Testing...
[Epoch 198] train avg loss 0.00028813, dev acc 0.7810, dev avg loss 0.670797, throughput 2.84626K wps
[Epoch 199 Batch 30/173] avg loss 0.000231246, throughput 2.92073K wps
[Epoch 199 Batch 60/173] avg loss 0.000323365, throughput 2.86744K wps
[Epoch 199 Batch 90/173] avg loss 0.000315559, throughput 2.85754K wps
[Epoch 199 Batch 120/173] avg loss 0.000292025, throughput 2.84629K wps
[Epoch 199 Batch 150/173] avg loss 0.000289151, throughput 2.85968K wps
Begin Testing...
[Epoch 199] train avg loss 0.000289742, dev acc 0.7842, dev avg loss 0.670702, throughput 2.86846K wps
Test loss 0.459057, test acc 0.7983
Total time cost 705.05s
[Epoch 0 Batch 30/173] avg loss 0.0140847, throughput 2.49248K wps
[Epoch 0 Batch 60/173] avg loss 0.0139572, throughput 2.86798K wps
[Epoch 0 Batch 90/173] avg loss 0.013991, throughput 2.87461K wps
[Epoch 0 Batch 120/173] avg loss 0.0139334, throughput 2.86823K wps
[Epoch 0 Batch 150/173] avg loss 0.0138198, throughput 2.81236K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139777, dev acc 0.5985, dev avg loss 0.684126, throughput 2.78284K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138313, throughput 2.86354K wps
[Epoch 1 Batch 60/173] avg loss 0.0137581, throughput 2.84908K wps
[Epoch 1 Batch 90/173] avg loss 0.0138352, throughput 2.88332K wps
[Epoch 1 Batch 120/173] avg loss 0.0136767, throughput 2.87685K wps
[Epoch 1 Batch 150/173] avg loss 0.0137124, throughput 2.83408K wps
Begin Testing...
[Epoch 1] train avg loss 0.013776, dev acc 0.6663, dev avg loss 0.677071, throughput 2.85792K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0136724, throughput 2.94197K wps
[Epoch 2 Batch 60/173] avg loss 0.0136192, throughput 2.87592K wps
[Epoch 2 Batch 90/173] avg loss 0.0136052, throughput 2.81939K wps
[Epoch 2 Batch 120/173] avg loss 0.0135573, throughput 2.85205K wps
[Epoch 2 Batch 150/173] avg loss 0.0135711, throughput 2.86581K wps
Begin Testing...
[Epoch 2] train avg loss 0.0136322, dev acc 0.6830, dev avg loss 0.669083, throughput 2.87155K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0134634, throughput 2.94149K wps
[Epoch 3 Batch 60/173] avg loss 0.0134601, throughput 2.86373K wps
[Epoch 3 Batch 90/173] avg loss 0.0134536, throughput 2.87139K wps
[Epoch 3 Batch 120/173] avg loss 0.0134386, throughput 2.85138K wps
[Epoch 3 Batch 150/173] avg loss 0.0134068, throughput 2.8748K wps
Begin Testing...
[Epoch 3] train avg loss 0.0134534, dev acc 0.7018, dev avg loss 0.662137, throughput 2.8775K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0133476, throughput 2.90283K wps
[Epoch 4 Batch 60/173] avg loss 0.0133047, throughput 2.82909K wps
[Epoch 4 Batch 90/173] avg loss 0.0132369, throughput 2.84061K wps
[Epoch 4 Batch 120/173] avg loss 0.0132073, throughput 2.83194K wps
[Epoch 4 Batch 150/173] avg loss 0.0131595, throughput 2.84361K wps
Begin Testing...
[Epoch 4] train avg loss 0.0132736, dev acc 0.7112, dev avg loss 0.653752, throughput 2.85001K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0131351, throughput 2.93057K wps
[Epoch 5 Batch 60/173] avg loss 0.0130543, throughput 2.85865K wps
[Epoch 5 Batch 90/173] avg loss 0.013176, throughput 2.82576K wps
[Epoch 5 Batch 120/173] avg loss 0.0129575, throughput 2.76488K wps
[Epoch 5 Batch 150/173] avg loss 0.0130409, throughput 2.70924K wps
Begin Testing...
[Epoch 5] train avg loss 0.013085, dev acc 0.7185, dev avg loss 0.645623, throughput 2.80013K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0128542, throughput 2.90394K wps
[Epoch 6 Batch 60/173] avg loss 0.012805, throughput 2.86029K wps
[Epoch 6 Batch 90/173] avg loss 0.0128725, throughput 2.85817K wps
[Epoch 6 Batch 120/173] avg loss 0.0129579, throughput 2.85718K wps
[Epoch 6 Batch 150/173] avg loss 0.0126989, throughput 2.87516K wps
Begin Testing...
[Epoch 6] train avg loss 0.0128528, dev acc 0.7122, dev avg loss 0.634115, throughput 2.86905K wps
[Epoch 7 Batch 30/173] avg loss 0.0128057, throughput 2.89885K wps
[Epoch 7 Batch 60/173] avg loss 0.0127182, throughput 2.87402K wps
[Epoch 7 Batch 90/173] avg loss 0.0126083, throughput 2.85296K wps
[Epoch 7 Batch 120/173] avg loss 0.0126626, throughput 2.85465K wps
[Epoch 7 Batch 150/173] avg loss 0.012523, throughput 2.86705K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126653, dev acc 0.7174, dev avg loss 0.623358, throughput 2.86207K wps
[Epoch 8 Batch 30/173] avg loss 0.012393, throughput 2.88353K wps
[Epoch 8 Batch 60/173] avg loss 0.0125691, throughput 2.85344K wps
[Epoch 8 Batch 90/173] avg loss 0.0124276, throughput 2.78248K wps
[Epoch 8 Batch 120/173] avg loss 0.012364, throughput 2.85529K wps
[Epoch 8 Batch 150/173] avg loss 0.0124661, throughput 2.85467K wps
Begin Testing...
[Epoch 8] train avg loss 0.0124316, dev acc 0.7310, dev avg loss 0.611602, throughput 2.84803K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.0122438, throughput 2.93332K wps
[Epoch 9 Batch 60/173] avg loss 0.0122699, throughput 2.81733K wps
[Epoch 9 Batch 90/173] avg loss 0.0122149, throughput 2.80518K wps
[Epoch 9 Batch 120/173] avg loss 0.0122011, throughput 2.85579K wps
[Epoch 9 Batch 150/173] avg loss 0.0120252, throughput 2.87172K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121944, dev acc 0.7310, dev avg loss 0.599281, throughput 2.85506K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0119889, throughput 2.93561K wps
[Epoch 10 Batch 60/173] avg loss 0.0119903, throughput 2.85939K wps
[Epoch 10 Batch 90/173] avg loss 0.0119215, throughput 2.85397K wps
[Epoch 10 Batch 120/173] avg loss 0.0118987, throughput 2.79299K wps
[Epoch 10 Batch 150/173] avg loss 0.0119501, throughput 2.79998K wps
Begin Testing...
[Epoch 10] train avg loss 0.0119235, dev acc 0.7351, dev avg loss 0.588995, throughput 2.84999K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/173] avg loss 0.0118182, throughput 2.92879K wps
[Epoch 11 Batch 60/173] avg loss 0.0117468, throughput 2.86945K wps
[Epoch 11 Batch 90/173] avg loss 0.0116548, throughput 2.84972K wps
[Epoch 11 Batch 120/173] avg loss 0.0115932, throughput 2.84402K wps
[Epoch 11 Batch 150/173] avg loss 0.0116465, throughput 2.86918K wps
Begin Testing...
[Epoch 11] train avg loss 0.0116583, dev acc 0.7289, dev avg loss 0.573458, throughput 2.87094K wps
[Epoch 12 Batch 30/173] avg loss 0.0114114, throughput 2.90234K wps
[Epoch 12 Batch 60/173] avg loss 0.0113851, throughput 2.86343K wps
[Epoch 12 Batch 90/173] avg loss 0.0113981, throughput 2.87336K wps
[Epoch 12 Batch 120/173] avg loss 0.0112872, throughput 2.86556K wps
[Epoch 12 Batch 150/173] avg loss 0.0111282, throughput 2.86364K wps
Begin Testing...
[Epoch 12] train avg loss 0.0113524, dev acc 0.7518, dev avg loss 0.560735, throughput 2.86962K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0113273, throughput 2.92901K wps
[Epoch 13 Batch 60/173] avg loss 0.0109319, throughput 2.86146K wps
[Epoch 13 Batch 90/173] avg loss 0.010859, throughput 2.87822K wps
[Epoch 13 Batch 120/173] avg loss 0.0108611, throughput 2.88143K wps
[Epoch 13 Batch 150/173] avg loss 0.0109225, throughput 2.84121K wps
Begin Testing...
[Epoch 13] train avg loss 0.0109875, dev acc 0.7487, dev avg loss 0.546262, throughput 2.87556K wps
[Epoch 14 Batch 30/173] avg loss 0.0109676, throughput 2.86913K wps
[Epoch 14 Batch 60/173] avg loss 0.0107177, throughput 2.85475K wps
[Epoch 14 Batch 90/173] avg loss 0.0107725, throughput 2.83864K wps
[Epoch 14 Batch 120/173] avg loss 0.0108453, throughput 2.81411K wps
[Epoch 14 Batch 150/173] avg loss 0.0107681, throughput 2.8458K wps
Begin Testing...
[Epoch 14] train avg loss 0.0107807, dev acc 0.7623, dev avg loss 0.533445, throughput 2.84559K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.0107342, throughput 2.87291K wps
[Epoch 15 Batch 60/173] avg loss 0.0105301, throughput 2.8533K wps
[Epoch 15 Batch 90/173] avg loss 0.0104964, throughput 2.85998K wps
[Epoch 15 Batch 120/173] avg loss 0.0104152, throughput 2.86428K wps
[Epoch 15 Batch 150/173] avg loss 0.0103298, throughput 2.87132K wps
Begin Testing...
[Epoch 15] train avg loss 0.0105068, dev acc 0.7696, dev avg loss 0.524094, throughput 2.86257K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0106053, throughput 2.92679K wps
[Epoch 16 Batch 60/173] avg loss 0.0104144, throughput 2.8134K wps
[Epoch 16 Batch 90/173] avg loss 0.0104771, throughput 2.81051K wps
[Epoch 16 Batch 120/173] avg loss 0.00998511, throughput 2.8636K wps
[Epoch 16 Batch 150/173] avg loss 0.00995218, throughput 2.85646K wps
Begin Testing...
[Epoch 16] train avg loss 0.0102839, dev acc 0.7789, dev avg loss 0.511205, throughput 2.84601K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/173] avg loss 0.0101166, throughput 2.92383K wps
[Epoch 17 Batch 60/173] avg loss 0.00993415, throughput 2.87993K wps
[Epoch 17 Batch 90/173] avg loss 0.0100379, throughput 2.86148K wps
[Epoch 17 Batch 120/173] avg loss 0.00991746, throughput 2.87041K wps
[Epoch 17 Batch 150/173] avg loss 0.0100588, throughput 2.84844K wps
Begin Testing...
[Epoch 17] train avg loss 0.0100091, dev acc 0.7748, dev avg loss 0.501992, throughput 2.86772K wps
[Epoch 18 Batch 30/173] avg loss 0.0100158, throughput 2.87848K wps
[Epoch 18 Batch 60/173] avg loss 0.00933833, throughput 2.82282K wps
[Epoch 18 Batch 90/173] avg loss 0.00997397, throughput 2.86539K wps
[Epoch 18 Batch 120/173] avg loss 0.00964822, throughput 2.82567K wps
[Epoch 18 Batch 150/173] avg loss 0.00965025, throughput 2.86479K wps
Begin Testing...
[Epoch 18] train avg loss 0.00971937, dev acc 0.7873, dev avg loss 0.491219, throughput 2.85089K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.0093219, throughput 2.88673K wps
[Epoch 19 Batch 60/173] avg loss 0.00939992, throughput 2.84173K wps
[Epoch 19 Batch 90/173] avg loss 0.00950082, throughput 2.80131K wps
[Epoch 19 Batch 120/173] avg loss 0.00932653, throughput 2.79955K wps
[Epoch 19 Batch 150/173] avg loss 0.00964239, throughput 2.84165K wps
Begin Testing...
[Epoch 19] train avg loss 0.00947441, dev acc 0.7810, dev avg loss 0.484684, throughput 2.83822K wps
[Epoch 20 Batch 30/173] avg loss 0.00924335, throughput 2.94027K wps
[Epoch 20 Batch 60/173] avg loss 0.00901265, throughput 2.86836K wps
[Epoch 20 Batch 90/173] avg loss 0.00913958, throughput 2.80408K wps
[Epoch 20 Batch 120/173] avg loss 0.00921006, throughput 2.83352K wps
[Epoch 20 Batch 150/173] avg loss 0.00923801, throughput 2.81129K wps
Begin Testing...
[Epoch 20] train avg loss 0.00921948, dev acc 0.7842, dev avg loss 0.48113, throughput 2.84865K wps
[Epoch 21 Batch 30/173] avg loss 0.00949304, throughput 2.89498K wps
[Epoch 21 Batch 60/173] avg loss 0.00893886, throughput 2.8749K wps
[Epoch 21 Batch 90/173] avg loss 0.00885624, throughput 2.85738K wps
[Epoch 21 Batch 120/173] avg loss 0.00922596, throughput 2.86059K wps
[Epoch 21 Batch 150/173] avg loss 0.00877113, throughput 2.83418K wps
Begin Testing...
[Epoch 21] train avg loss 0.00904745, dev acc 0.7914, dev avg loss 0.469375, throughput 2.8647K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.00895054, throughput 2.88817K wps
[Epoch 22 Batch 60/173] avg loss 0.00895046, throughput 2.87332K wps
[Epoch 22 Batch 90/173] avg loss 0.00845912, throughput 2.87501K wps
[Epoch 22 Batch 120/173] avg loss 0.0088985, throughput 2.86349K wps
[Epoch 22 Batch 150/173] avg loss 0.00869383, throughput 2.85274K wps
Begin Testing...
[Epoch 22] train avg loss 0.0087897, dev acc 0.7914, dev avg loss 0.461917, throughput 2.865K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/173] avg loss 0.00880351, throughput 2.91152K wps
[Epoch 23 Batch 60/173] avg loss 0.0084555, throughput 2.87505K wps
[Epoch 23 Batch 90/173] avg loss 0.00832395, throughput 2.86957K wps
[Epoch 23 Batch 120/173] avg loss 0.00863748, throughput 2.86075K wps
[Epoch 23 Batch 150/173] avg loss 0.00836286, throughput 2.86979K wps
Begin Testing...
[Epoch 23] train avg loss 0.00852725, dev acc 0.7925, dev avg loss 0.454571, throughput 2.87765K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.00844457, throughput 2.9391K wps
[Epoch 24 Batch 60/173] avg loss 0.00863449, throughput 2.87481K wps
[Epoch 24 Batch 90/173] avg loss 0.00831235, throughput 2.8194K wps
[Epoch 24 Batch 120/173] avg loss 0.00827341, throughput 2.80149K wps
[Epoch 24 Batch 150/173] avg loss 0.00822607, throughput 2.82592K wps
Begin Testing...
[Epoch 24] train avg loss 0.00839791, dev acc 0.8040, dev avg loss 0.448566, throughput 2.85252K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.00808247, throughput 2.85495K wps
[Epoch 25 Batch 60/173] avg loss 0.00824519, throughput 2.84725K wps
[Epoch 25 Batch 90/173] avg loss 0.0082166, throughput 2.8767K wps
[Epoch 25 Batch 120/173] avg loss 0.0080804, throughput 2.87733K wps
[Epoch 25 Batch 150/173] avg loss 0.00831519, throughput 2.83119K wps
Begin Testing...
[Epoch 25] train avg loss 0.00823149, dev acc 0.8040, dev avg loss 0.444424, throughput 2.8573K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00769994, throughput 2.93683K wps
[Epoch 26 Batch 60/173] avg loss 0.00776037, throughput 2.86349K wps
[Epoch 26 Batch 90/173] avg loss 0.0081638, throughput 2.84681K wps
[Epoch 26 Batch 120/173] avg loss 0.00821199, throughput 2.83188K wps
[Epoch 26 Batch 150/173] avg loss 0.00799796, throughput 2.86166K wps
Begin Testing...
[Epoch 26] train avg loss 0.00801279, dev acc 0.8113, dev avg loss 0.439087, throughput 2.86865K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00816922, throughput 2.93786K wps
[Epoch 27 Batch 60/173] avg loss 0.00804062, throughput 2.8629K wps
[Epoch 27 Batch 90/173] avg loss 0.00773312, throughput 2.83896K wps
[Epoch 27 Batch 120/173] avg loss 0.00770957, throughput 2.81563K wps
[Epoch 27 Batch 150/173] avg loss 0.00764176, throughput 2.88009K wps
Begin Testing...
[Epoch 27] train avg loss 0.00786039, dev acc 0.8040, dev avg loss 0.435949, throughput 2.86879K wps
[Epoch 28 Batch 30/173] avg loss 0.00784368, throughput 2.90715K wps
[Epoch 28 Batch 60/173] avg loss 0.00781691, throughput 2.84624K wps
[Epoch 28 Batch 90/173] avg loss 0.00738057, throughput 2.86593K wps
[Epoch 28 Batch 120/173] avg loss 0.00765471, throughput 2.87039K wps
[Epoch 28 Batch 150/173] avg loss 0.00783048, throughput 2.87125K wps
Begin Testing...
[Epoch 28] train avg loss 0.00769399, dev acc 0.8092, dev avg loss 0.431193, throughput 2.87249K wps
[Epoch 29 Batch 30/173] avg loss 0.00762849, throughput 2.89999K wps
[Epoch 29 Batch 60/173] avg loss 0.00753411, throughput 2.86642K wps
[Epoch 29 Batch 90/173] avg loss 0.00746096, throughput 2.83403K wps
[Epoch 29 Batch 120/173] avg loss 0.00722797, throughput 2.81584K wps
[Epoch 29 Batch 150/173] avg loss 0.0078268, throughput 2.86707K wps
Begin Testing...
[Epoch 29] train avg loss 0.00753954, dev acc 0.8144, dev avg loss 0.427083, throughput 2.85852K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.00707328, throughput 2.92053K wps
[Epoch 30 Batch 60/173] avg loss 0.00766981, throughput 2.88439K wps
[Epoch 30 Batch 90/173] avg loss 0.00714419, throughput 2.84575K wps
[Epoch 30 Batch 120/173] avg loss 0.00730012, throughput 2.80342K wps
[Epoch 30 Batch 150/173] avg loss 0.00701633, throughput 2.81606K wps
Begin Testing...
[Epoch 30] train avg loss 0.00729432, dev acc 0.8154, dev avg loss 0.423899, throughput 2.85442K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/173] avg loss 0.00729706, throughput 2.92959K wps
[Epoch 31 Batch 60/173] avg loss 0.00707505, throughput 2.84851K wps
[Epoch 31 Batch 90/173] avg loss 0.0071513, throughput 2.84786K wps
[Epoch 31 Batch 120/173] avg loss 0.00683026, throughput 2.85446K wps
[Epoch 31 Batch 150/173] avg loss 0.00705751, throughput 2.87937K wps
Begin Testing...
[Epoch 31] train avg loss 0.00711568, dev acc 0.8165, dev avg loss 0.420455, throughput 2.87015K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.00686867, throughput 2.92664K wps
[Epoch 32 Batch 60/173] avg loss 0.00721462, throughput 2.84726K wps
[Epoch 32 Batch 90/173] avg loss 0.00692042, throughput 2.867K wps
[Epoch 32 Batch 120/173] avg loss 0.00729203, throughput 2.85071K wps
[Epoch 32 Batch 150/173] avg loss 0.00670557, throughput 2.87884K wps
Begin Testing...
[Epoch 32] train avg loss 0.00702035, dev acc 0.8175, dev avg loss 0.416825, throughput 2.87231K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/173] avg loss 0.00683694, throughput 2.92651K wps
[Epoch 33 Batch 60/173] avg loss 0.00681391, throughput 2.87196K wps
[Epoch 33 Batch 90/173] avg loss 0.00686842, throughput 2.84881K wps
[Epoch 33 Batch 120/173] avg loss 0.00661721, throughput 2.83048K wps
[Epoch 33 Batch 150/173] avg loss 0.00719597, throughput 2.87098K wps
Begin Testing...
[Epoch 33] train avg loss 0.0068615, dev acc 0.8206, dev avg loss 0.413656, throughput 2.87085K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.00656275, throughput 2.92583K wps
[Epoch 34 Batch 60/173] avg loss 0.00670423, throughput 2.85774K wps
[Epoch 34 Batch 90/173] avg loss 0.00659682, throughput 2.86199K wps
[Epoch 34 Batch 120/173] avg loss 0.00666958, throughput 2.87646K wps
[Epoch 34 Batch 150/173] avg loss 0.00685215, throughput 2.85275K wps
Begin Testing...
[Epoch 34] train avg loss 0.00667084, dev acc 0.8227, dev avg loss 0.411576, throughput 2.8668K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/173] avg loss 0.00637837, throughput 2.9155K wps
[Epoch 35 Batch 60/173] avg loss 0.00638793, throughput 2.86912K wps
[Epoch 35 Batch 90/173] avg loss 0.0065372, throughput 2.86915K wps
[Epoch 35 Batch 120/173] avg loss 0.00668274, throughput 2.86527K wps
[Epoch 35 Batch 150/173] avg loss 0.00650314, throughput 2.84315K wps
Begin Testing...
[Epoch 35] train avg loss 0.00646738, dev acc 0.8217, dev avg loss 0.408673, throughput 2.86827K wps
[Epoch 36 Batch 30/173] avg loss 0.0061988, throughput 2.93508K wps
[Epoch 36 Batch 60/173] avg loss 0.00626575, throughput 2.88104K wps
[Epoch 36 Batch 90/173] avg loss 0.00605625, throughput 2.88396K wps
[Epoch 36 Batch 120/173] avg loss 0.00656961, throughput 2.84976K wps
[Epoch 36 Batch 150/173] avg loss 0.00630203, throughput 2.85422K wps
Begin Testing...
[Epoch 36] train avg loss 0.00626228, dev acc 0.8248, dev avg loss 0.406515, throughput 2.88065K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/173] avg loss 0.00591896, throughput 2.8826K wps
[Epoch 37 Batch 60/173] avg loss 0.00640889, throughput 2.80011K wps
[Epoch 37 Batch 90/173] avg loss 0.00611483, throughput 2.87034K wps
[Epoch 37 Batch 120/173] avg loss 0.00601455, throughput 2.86704K wps
[Epoch 37 Batch 150/173] avg loss 0.00616099, throughput 2.85786K wps
Begin Testing...
[Epoch 37] train avg loss 0.00616333, dev acc 0.8227, dev avg loss 0.405587, throughput 2.8544K wps
[Epoch 38 Batch 30/173] avg loss 0.00615502, throughput 2.93151K wps
[Epoch 38 Batch 60/173] avg loss 0.00615451, throughput 2.87216K wps
[Epoch 38 Batch 90/173] avg loss 0.00621702, throughput 2.87169K wps
[Epoch 38 Batch 120/173] avg loss 0.00584656, throughput 2.83219K wps
[Epoch 38 Batch 150/173] avg loss 0.00593199, throughput 2.86653K wps
Begin Testing...
[Epoch 38] train avg loss 0.00609259, dev acc 0.8269, dev avg loss 0.402797, throughput 2.8711K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.00612521, throughput 2.94548K wps
[Epoch 39 Batch 60/173] avg loss 0.00587649, throughput 2.87025K wps
[Epoch 39 Batch 90/173] avg loss 0.00585434, throughput 2.84695K wps
[Epoch 39 Batch 120/173] avg loss 0.00568069, throughput 2.84707K wps
[Epoch 39 Batch 150/173] avg loss 0.00586847, throughput 2.84576K wps
Begin Testing...
[Epoch 39] train avg loss 0.00588694, dev acc 0.8227, dev avg loss 0.403086, throughput 2.86903K wps
[Epoch 40 Batch 30/173] avg loss 0.00595669, throughput 2.90465K wps
[Epoch 40 Batch 60/173] avg loss 0.00578688, throughput 2.84351K wps
[Epoch 40 Batch 90/173] avg loss 0.00555918, throughput 2.85492K wps
[Epoch 40 Batch 120/173] avg loss 0.00588171, throughput 2.8513K wps
[Epoch 40 Batch 150/173] avg loss 0.0057814, throughput 2.84121K wps
Begin Testing...
[Epoch 40] train avg loss 0.00583291, dev acc 0.8227, dev avg loss 0.4011, throughput 2.8614K wps
[Epoch 41 Batch 30/173] avg loss 0.00572821, throughput 2.90473K wps
[Epoch 41 Batch 60/173] avg loss 0.00547408, throughput 2.87784K wps
[Epoch 41 Batch 90/173] avg loss 0.00556616, throughput 2.85993K wps
[Epoch 41 Batch 120/173] avg loss 0.00526646, throughput 2.87578K wps
[Epoch 41 Batch 150/173] avg loss 0.00583346, throughput 2.84823K wps
Begin Testing...
[Epoch 41] train avg loss 0.0055535, dev acc 0.8227, dev avg loss 0.400026, throughput 2.87251K wps
[Epoch 42 Batch 30/173] avg loss 0.00552465, throughput 2.91047K wps
[Epoch 42 Batch 60/173] avg loss 0.00543939, throughput 2.86594K wps
[Epoch 42 Batch 90/173] avg loss 0.00545637, throughput 2.84222K wps
[Epoch 42 Batch 120/173] avg loss 0.00527716, throughput 2.85464K wps
[Epoch 42 Batch 150/173] avg loss 0.0057226, throughput 2.8226K wps
Begin Testing...
[Epoch 42] train avg loss 0.0054983, dev acc 0.8290, dev avg loss 0.395998, throughput 2.86033K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/173] avg loss 0.00530803, throughput 2.93741K wps
[Epoch 43 Batch 60/173] avg loss 0.00527158, throughput 2.86702K wps
[Epoch 43 Batch 90/173] avg loss 0.00530565, throughput 2.83774K wps
[Epoch 43 Batch 120/173] avg loss 0.00534754, throughput 2.82204K wps
[Epoch 43 Batch 150/173] avg loss 0.00545988, throughput 2.84926K wps
Begin Testing...
[Epoch 43] train avg loss 0.00534518, dev acc 0.8279, dev avg loss 0.395407, throughput 2.85266K wps
[Epoch 44 Batch 30/173] avg loss 0.00524197, throughput 2.93114K wps
[Epoch 44 Batch 60/173] avg loss 0.004814, throughput 2.84141K wps
[Epoch 44 Batch 90/173] avg loss 0.00513633, throughput 2.87299K wps
[Epoch 44 Batch 120/173] avg loss 0.00524174, throughput 2.84828K wps
[Epoch 44 Batch 150/173] avg loss 0.00508993, throughput 2.87368K wps
Begin Testing...
[Epoch 44] train avg loss 0.00511101, dev acc 0.8290, dev avg loss 0.394101, throughput 2.86781K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/173] avg loss 0.00496815, throughput 2.92893K wps
[Epoch 45 Batch 60/173] avg loss 0.00497221, throughput 2.8379K wps
[Epoch 45 Batch 90/173] avg loss 0.0051254, throughput 2.85642K wps
[Epoch 45 Batch 120/173] avg loss 0.00505434, throughput 2.85677K wps
[Epoch 45 Batch 150/173] avg loss 0.00502426, throughput 2.79776K wps
Begin Testing...
[Epoch 45] train avg loss 0.0050658, dev acc 0.8269, dev avg loss 0.393965, throughput 2.85773K wps
[Epoch 46 Batch 30/173] avg loss 0.00451187, throughput 2.91299K wps
[Epoch 46 Batch 60/173] avg loss 0.00479579, throughput 2.83236K wps
[Epoch 46 Batch 90/173] avg loss 0.00491231, throughput 2.86678K wps
[Epoch 46 Batch 120/173] avg loss 0.0046696, throughput 2.84801K wps
[Epoch 46 Batch 150/173] avg loss 0.00497521, throughput 2.8734K wps
Begin Testing...
[Epoch 46] train avg loss 0.00486775, dev acc 0.8248, dev avg loss 0.391608, throughput 2.85506K wps
[Epoch 47 Batch 30/173] avg loss 0.00466413, throughput 2.90538K wps
[Epoch 47 Batch 60/173] avg loss 0.00480451, throughput 2.83077K wps
[Epoch 47 Batch 90/173] avg loss 0.00452267, throughput 2.85213K wps
[Epoch 47 Batch 120/173] avg loss 0.00491196, throughput 2.87109K wps
[Epoch 47 Batch 150/173] avg loss 0.00476051, throughput 2.83524K wps
Begin Testing...
[Epoch 47] train avg loss 0.00477132, dev acc 0.8279, dev avg loss 0.391183, throughput 2.86069K wps
[Epoch 48 Batch 30/173] avg loss 0.00418109, throughput 2.91682K wps
[Epoch 48 Batch 60/173] avg loss 0.00487603, throughput 2.87152K wps
[Epoch 48 Batch 90/173] avg loss 0.00452066, throughput 2.88429K wps
[Epoch 48 Batch 120/173] avg loss 0.00467279, throughput 2.84224K wps
[Epoch 48 Batch 150/173] avg loss 0.0046928, throughput 2.87732K wps
Begin Testing...
[Epoch 48] train avg loss 0.00460367, dev acc 0.8269, dev avg loss 0.391776, throughput 2.87768K wps
[Epoch 49 Batch 30/173] avg loss 0.00459675, throughput 2.93792K wps
[Epoch 49 Batch 60/173] avg loss 0.00460386, throughput 2.85605K wps
[Epoch 49 Batch 90/173] avg loss 0.0043029, throughput 2.83591K wps
[Epoch 49 Batch 120/173] avg loss 0.00463868, throughput 2.85173K wps
[Epoch 49 Batch 150/173] avg loss 0.00437461, throughput 2.87411K wps
Begin Testing...
[Epoch 49] train avg loss 0.00450728, dev acc 0.8259, dev avg loss 0.390519, throughput 2.87163K wps
[Epoch 50 Batch 30/173] avg loss 0.00424851, throughput 2.86653K wps
[Epoch 50 Batch 60/173] avg loss 0.00443666, throughput 2.81424K wps
[Epoch 50 Batch 90/173] avg loss 0.00462549, throughput 2.83506K wps
[Epoch 50 Batch 120/173] avg loss 0.00444527, throughput 2.87052K wps
[Epoch 50 Batch 150/173] avg loss 0.00441719, throughput 2.8788K wps
Begin Testing...
[Epoch 50] train avg loss 0.00445522, dev acc 0.8342, dev avg loss 0.390482, throughput 2.8546K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/173] avg loss 0.00397682, throughput 2.89797K wps
[Epoch 51 Batch 60/173] avg loss 0.00413468, throughput 2.87552K wps
[Epoch 51 Batch 90/173] avg loss 0.00446815, throughput 2.84579K wps
[Epoch 51 Batch 120/173] avg loss 0.00435455, throughput 2.86504K wps
[Epoch 51 Batch 150/173] avg loss 0.00428029, throughput 2.85457K wps
Begin Testing...
[Epoch 51] train avg loss 0.00423039, dev acc 0.8279, dev avg loss 0.38888, throughput 2.85919K wps
[Epoch 52 Batch 30/173] avg loss 0.00416055, throughput 2.92777K wps
[Epoch 52 Batch 60/173] avg loss 0.00382108, throughput 2.87236K wps
[Epoch 52 Batch 90/173] avg loss 0.00427656, throughput 2.85511K wps
[Epoch 52 Batch 120/173] avg loss 0.00435408, throughput 2.83254K wps
[Epoch 52 Batch 150/173] avg loss 0.003837, throughput 2.80679K wps
Begin Testing...
[Epoch 52] train avg loss 0.00416176, dev acc 0.8290, dev avg loss 0.389627, throughput 2.85517K wps
[Epoch 53 Batch 30/173] avg loss 0.00419201, throughput 2.86344K wps
[Epoch 53 Batch 60/173] avg loss 0.00371846, throughput 2.82287K wps
[Epoch 53 Batch 90/173] avg loss 0.00420115, throughput 2.81305K wps
[Epoch 53 Batch 120/173] avg loss 0.00439771, throughput 2.85164K wps
[Epoch 53 Batch 150/173] avg loss 0.00393365, throughput 2.84857K wps
Begin Testing...
[Epoch 53] train avg loss 0.00410033, dev acc 0.8269, dev avg loss 0.391288, throughput 2.84242K wps
[Epoch 54 Batch 30/173] avg loss 0.00402495, throughput 2.92816K wps
[Epoch 54 Batch 60/173] avg loss 0.00394518, throughput 2.79278K wps
[Epoch 54 Batch 90/173] avg loss 0.00391936, throughput 2.8405K wps
[Epoch 54 Batch 120/173] avg loss 0.00393948, throughput 2.87447K wps
[Epoch 54 Batch 150/173] avg loss 0.00393108, throughput 2.88212K wps
Begin Testing...
[Epoch 54] train avg loss 0.0039581, dev acc 0.8352, dev avg loss 0.389858, throughput 2.86295K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00405343, throughput 2.9349K wps
[Epoch 55 Batch 60/173] avg loss 0.00376124, throughput 2.83464K wps
[Epoch 55 Batch 90/173] avg loss 0.0039682, throughput 2.84612K wps
[Epoch 55 Batch 120/173] avg loss 0.00385517, throughput 2.78883K wps
[Epoch 55 Batch 150/173] avg loss 0.00360312, throughput 2.86666K wps
Begin Testing...
[Epoch 55] train avg loss 0.00386873, dev acc 0.8321, dev avg loss 0.388238, throughput 2.84585K wps
[Epoch 56 Batch 30/173] avg loss 0.0037872, throughput 2.90986K wps
[Epoch 56 Batch 60/173] avg loss 0.00377031, throughput 2.87627K wps
[Epoch 56 Batch 90/173] avg loss 0.00381695, throughput 2.87735K wps
[Epoch 56 Batch 120/173] avg loss 0.00385723, throughput 2.86273K wps
[Epoch 56 Batch 150/173] avg loss 0.003463, throughput 2.8713K wps
Begin Testing...
[Epoch 56] train avg loss 0.00376419, dev acc 0.8311, dev avg loss 0.389732, throughput 2.87732K wps
[Epoch 57 Batch 30/173] avg loss 0.00344649, throughput 2.94587K wps
[Epoch 57 Batch 60/173] avg loss 0.00356234, throughput 2.86534K wps
[Epoch 57 Batch 90/173] avg loss 0.00378962, throughput 2.85554K wps
[Epoch 57 Batch 120/173] avg loss 0.00395546, throughput 2.8649K wps
[Epoch 57 Batch 150/173] avg loss 0.00361128, throughput 2.87106K wps
Begin Testing...
[Epoch 57] train avg loss 0.00366979, dev acc 0.8321, dev avg loss 0.389497, throughput 2.87033K wps
[Epoch 58 Batch 30/173] avg loss 0.00333184, throughput 2.94904K wps
[Epoch 58 Batch 60/173] avg loss 0.0033215, throughput 2.87979K wps
[Epoch 58 Batch 90/173] avg loss 0.00382113, throughput 2.88347K wps
[Epoch 58 Batch 120/173] avg loss 0.00387027, throughput 2.88085K wps
[Epoch 58 Batch 150/173] avg loss 0.00346545, throughput 2.86147K wps
Begin Testing...
[Epoch 58] train avg loss 0.00355264, dev acc 0.8342, dev avg loss 0.389643, throughput 2.88115K wps
[Epoch 59 Batch 30/173] avg loss 0.00347751, throughput 2.86932K wps
[Epoch 59 Batch 60/173] avg loss 0.00336435, throughput 2.86848K wps
[Epoch 59 Batch 90/173] avg loss 0.00320685, throughput 2.84805K wps
[Epoch 59 Batch 120/173] avg loss 0.00341435, throughput 2.86768K wps
[Epoch 59 Batch 150/173] avg loss 0.003484, throughput 2.86792K wps
Begin Testing...
[Epoch 59] train avg loss 0.00340484, dev acc 0.8321, dev avg loss 0.392738, throughput 2.85416K wps
[Epoch 60 Batch 30/173] avg loss 0.00335431, throughput 2.90825K wps
[Epoch 60 Batch 60/173] avg loss 0.00347238, throughput 2.87277K wps
[Epoch 60 Batch 90/173] avg loss 0.00329562, throughput 2.8597K wps
[Epoch 60 Batch 120/173] avg loss 0.00352417, throughput 2.85894K wps
[Epoch 60 Batch 150/173] avg loss 0.00328246, throughput 2.84701K wps
Begin Testing...
[Epoch 60] train avg loss 0.00337835, dev acc 0.8342, dev avg loss 0.390023, throughput 2.86811K wps
[Epoch 61 Batch 30/173] avg loss 0.00325644, throughput 2.9453K wps
[Epoch 61 Batch 60/173] avg loss 0.00331675, throughput 2.87294K wps
[Epoch 61 Batch 90/173] avg loss 0.00308598, throughput 2.86678K wps
[Epoch 61 Batch 120/173] avg loss 0.00325212, throughput 2.8566K wps
[Epoch 61 Batch 150/173] avg loss 0.00343652, throughput 2.84317K wps
Begin Testing...
[Epoch 61] train avg loss 0.00326702, dev acc 0.8279, dev avg loss 0.391362, throughput 2.87321K wps
[Epoch 62 Batch 30/173] avg loss 0.00316878, throughput 2.86952K wps
[Epoch 62 Batch 60/173] avg loss 0.00332171, throughput 2.84551K wps
[Epoch 62 Batch 90/173] avg loss 0.00319979, throughput 2.85834K wps
[Epoch 62 Batch 120/173] avg loss 0.00302105, throughput 2.87022K wps
[Epoch 62 Batch 150/173] avg loss 0.00315789, throughput 2.82268K wps
Begin Testing...
[Epoch 62] train avg loss 0.00317791, dev acc 0.8373, dev avg loss 0.390088, throughput 2.85568K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/173] avg loss 0.00327255, throughput 2.94071K wps
[Epoch 63 Batch 60/173] avg loss 0.00301305, throughput 2.85471K wps
[Epoch 63 Batch 90/173] avg loss 0.003044, throughput 2.86835K wps
[Epoch 63 Batch 120/173] avg loss 0.00325471, throughput 2.85288K wps
[Epoch 63 Batch 150/173] avg loss 0.00312602, throughput 2.8768K wps
Begin Testing...
[Epoch 63] train avg loss 0.00313505, dev acc 0.8373, dev avg loss 0.391207, throughput 2.87463K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/173] avg loss 0.00294702, throughput 2.88639K wps
[Epoch 64 Batch 60/173] avg loss 0.00318591, throughput 2.86795K wps
[Epoch 64 Batch 90/173] avg loss 0.00301184, throughput 2.83409K wps
[Epoch 64 Batch 120/173] avg loss 0.00295035, throughput 2.82281K wps
[Epoch 64 Batch 150/173] avg loss 0.00278614, throughput 2.86989K wps
Begin Testing...
[Epoch 64] train avg loss 0.00300574, dev acc 0.8352, dev avg loss 0.392377, throughput 2.85497K wps
[Epoch 65 Batch 30/173] avg loss 0.00283749, throughput 2.8944K wps
[Epoch 65 Batch 60/173] avg loss 0.00275184, throughput 2.84194K wps
[Epoch 65 Batch 90/173] avg loss 0.00287264, throughput 2.87047K wps
[Epoch 65 Batch 120/173] avg loss 0.00288938, throughput 2.8649K wps
[Epoch 65 Batch 150/173] avg loss 0.00289637, throughput 2.83859K wps
Begin Testing...
[Epoch 65] train avg loss 0.00290776, dev acc 0.8290, dev avg loss 0.393214, throughput 2.86112K wps
[Epoch 66 Batch 30/173] avg loss 0.00296695, throughput 2.9196K wps
[Epoch 66 Batch 60/173] avg loss 0.00271939, throughput 2.87074K wps
[Epoch 66 Batch 90/173] avg loss 0.00294696, throughput 2.86624K wps
[Epoch 66 Batch 120/173] avg loss 0.00281355, throughput 2.86733K wps
[Epoch 66 Batch 150/173] avg loss 0.00282576, throughput 2.87092K wps
Begin Testing...
[Epoch 66] train avg loss 0.00282613, dev acc 0.8300, dev avg loss 0.394011, throughput 2.87077K wps
[Epoch 67 Batch 30/173] avg loss 0.00277128, throughput 2.87199K wps
[Epoch 67 Batch 60/173] avg loss 0.0029871, throughput 2.85969K wps
[Epoch 67 Batch 90/173] avg loss 0.00260626, throughput 2.83833K wps
[Epoch 67 Batch 120/173] avg loss 0.00272382, throughput 2.79973K wps
[Epoch 67 Batch 150/173] avg loss 0.00273182, throughput 2.85963K wps
Begin Testing...
[Epoch 67] train avg loss 0.00278421, dev acc 0.8300, dev avg loss 0.397395, throughput 2.8475K wps
[Epoch 68 Batch 30/173] avg loss 0.00254071, throughput 2.88732K wps
[Epoch 68 Batch 60/173] avg loss 0.00259797, throughput 2.88095K wps
[Epoch 68 Batch 90/173] avg loss 0.00290869, throughput 2.8562K wps
[Epoch 68 Batch 120/173] avg loss 0.00252081, throughput 2.86766K wps
[Epoch 68 Batch 150/173] avg loss 0.00259015, throughput 2.85251K wps
Begin Testing...
[Epoch 68] train avg loss 0.00265293, dev acc 0.8290, dev avg loss 0.397528, throughput 2.86123K wps
[Epoch 69 Batch 30/173] avg loss 0.00252086, throughput 2.90593K wps
[Epoch 69 Batch 60/173] avg loss 0.00269804, throughput 2.87806K wps
[Epoch 69 Batch 90/173] avg loss 0.00260596, throughput 2.84715K wps
[Epoch 69 Batch 120/173] avg loss 0.00267661, throughput 2.85796K wps
[Epoch 69 Batch 150/173] avg loss 0.00251143, throughput 2.8392K wps
Begin Testing...
[Epoch 69] train avg loss 0.00260394, dev acc 0.8311, dev avg loss 0.397755, throughput 2.86194K wps
[Epoch 70 Batch 30/173] avg loss 0.0025116, throughput 2.94407K wps
[Epoch 70 Batch 60/173] avg loss 0.00254023, throughput 2.88163K wps
[Epoch 70 Batch 90/173] avg loss 0.0025112, throughput 2.84961K wps
[Epoch 70 Batch 120/173] avg loss 0.00247576, throughput 2.87674K wps
[Epoch 70 Batch 150/173] avg loss 0.00244306, throughput 2.86132K wps
Begin Testing...
[Epoch 70] train avg loss 0.00252276, dev acc 0.8290, dev avg loss 0.398698, throughput 2.87945K wps
[Epoch 71 Batch 30/173] avg loss 0.00232315, throughput 2.89644K wps
[Epoch 71 Batch 60/173] avg loss 0.00244128, throughput 2.87754K wps
[Epoch 71 Batch 90/173] avg loss 0.00235694, throughput 2.84273K wps
[Epoch 71 Batch 120/173] avg loss 0.00253409, throughput 2.88294K wps
[Epoch 71 Batch 150/173] avg loss 0.00248011, throughput 2.87101K wps
Begin Testing...
[Epoch 71] train avg loss 0.00242244, dev acc 0.8227, dev avg loss 0.403925, throughput 2.87219K wps
[Epoch 72 Batch 30/173] avg loss 0.00230715, throughput 2.92734K wps
[Epoch 72 Batch 60/173] avg loss 0.00242317, throughput 2.84435K wps
[Epoch 72 Batch 90/173] avg loss 0.00242506, throughput 2.87341K wps
[Epoch 72 Batch 120/173] avg loss 0.00235601, throughput 2.86272K wps
[Epoch 72 Batch 150/173] avg loss 0.00242116, throughput 2.85126K wps
Begin Testing...
[Epoch 72] train avg loss 0.00237839, dev acc 0.8311, dev avg loss 0.401002, throughput 2.86542K wps
[Epoch 73 Batch 30/173] avg loss 0.00211893, throughput 2.86779K wps
[Epoch 73 Batch 60/173] avg loss 0.00229356, throughput 2.86823K wps
[Epoch 73 Batch 90/173] avg loss 0.00226822, throughput 2.8473K wps
[Epoch 73 Batch 120/173] avg loss 0.00233477, throughput 2.86288K wps
[Epoch 73 Batch 150/173] avg loss 0.00248233, throughput 2.85892K wps
Begin Testing...
[Epoch 73] train avg loss 0.00229597, dev acc 0.8248, dev avg loss 0.403355, throughput 2.85946K wps
[Epoch 74 Batch 30/173] avg loss 0.00216431, throughput 2.92846K wps
[Epoch 74 Batch 60/173] avg loss 0.00230044, throughput 2.86154K wps
[Epoch 74 Batch 90/173] avg loss 0.00235388, throughput 2.81057K wps
[Epoch 74 Batch 120/173] avg loss 0.00217353, throughput 2.80944K wps
[Epoch 74 Batch 150/173] avg loss 0.00229223, throughput 2.87315K wps
Begin Testing...
[Epoch 74] train avg loss 0.00226789, dev acc 0.8321, dev avg loss 0.402297, throughput 2.8573K wps
[Epoch 75 Batch 30/173] avg loss 0.00219259, throughput 2.92517K wps
[Epoch 75 Batch 60/173] avg loss 0.00222742, throughput 2.87247K wps
[Epoch 75 Batch 90/173] avg loss 0.00209813, throughput 2.87215K wps
[Epoch 75 Batch 120/173] avg loss 0.00211569, throughput 2.86958K wps
[Epoch 75 Batch 150/173] avg loss 0.00231248, throughput 2.8646K wps
Begin Testing...
[Epoch 75] train avg loss 0.00218897, dev acc 0.8248, dev avg loss 0.406015, throughput 2.88083K wps
[Epoch 76 Batch 30/173] avg loss 0.00202288, throughput 2.91401K wps
[Epoch 76 Batch 60/173] avg loss 0.00215028, throughput 2.85245K wps
[Epoch 76 Batch 90/173] avg loss 0.00216097, throughput 2.87051K wps
[Epoch 76 Batch 120/173] avg loss 0.0021128, throughput 2.85985K wps
[Epoch 76 Batch 150/173] avg loss 0.00214566, throughput 2.87084K wps
Begin Testing...
[Epoch 76] train avg loss 0.00212118, dev acc 0.8227, dev avg loss 0.410284, throughput 2.87255K wps
[Epoch 77 Batch 30/173] avg loss 0.00215056, throughput 2.92062K wps
[Epoch 77 Batch 60/173] avg loss 0.00219253, throughput 2.87325K wps
[Epoch 77 Batch 90/173] avg loss 0.00202057, throughput 2.85478K wps
[Epoch 77 Batch 120/173] avg loss 0.00193593, throughput 2.8309K wps
[Epoch 77 Batch 150/173] avg loss 0.00201022, throughput 2.87459K wps
Begin Testing...
[Epoch 77] train avg loss 0.0020683, dev acc 0.8300, dev avg loss 0.406943, throughput 2.86963K wps
[Epoch 78 Batch 30/173] avg loss 0.00193417, throughput 2.93531K wps
[Epoch 78 Batch 60/173] avg loss 0.00214237, throughput 2.87252K wps
[Epoch 78 Batch 90/173] avg loss 0.00204715, throughput 2.85618K wps
[Epoch 78 Batch 120/173] avg loss 0.00204835, throughput 2.84897K wps
[Epoch 78 Batch 150/173] avg loss 0.00186018, throughput 2.85298K wps
Begin Testing...
[Epoch 78] train avg loss 0.00202008, dev acc 0.8175, dev avg loss 0.412134, throughput 2.87292K wps
[Epoch 79 Batch 30/173] avg loss 0.00198847, throughput 2.92907K wps
[Epoch 79 Batch 60/173] avg loss 0.00185242, throughput 2.86927K wps
[Epoch 79 Batch 90/173] avg loss 0.00208896, throughput 2.85429K wps
[Epoch 79 Batch 120/173] avg loss 0.00192152, throughput 2.85571K wps
[Epoch 79 Batch 150/173] avg loss 0.00197991, throughput 2.86507K wps
Begin Testing...
[Epoch 79] train avg loss 0.00197023, dev acc 0.8279, dev avg loss 0.407857, throughput 2.87259K wps
[Epoch 80 Batch 30/173] avg loss 0.00191862, throughput 2.90005K wps
[Epoch 80 Batch 60/173] avg loss 0.00187749, throughput 2.84853K wps
[Epoch 80 Batch 90/173] avg loss 0.00201698, throughput 2.869K wps
[Epoch 80 Batch 120/173] avg loss 0.00183742, throughput 2.86539K wps
[Epoch 80 Batch 150/173] avg loss 0.00192519, throughput 2.8469K wps
Begin Testing...
[Epoch 80] train avg loss 0.00192196, dev acc 0.8300, dev avg loss 0.410143, throughput 2.86011K wps
[Epoch 81 Batch 30/173] avg loss 0.00188172, throughput 2.92943K wps
[Epoch 81 Batch 60/173] avg loss 0.00186192, throughput 2.78271K wps
[Epoch 81 Batch 90/173] avg loss 0.00187715, throughput 2.85378K wps
[Epoch 81 Batch 120/173] avg loss 0.00197557, throughput 2.84129K wps
[Epoch 81 Batch 150/173] avg loss 0.00181289, throughput 2.8598K wps
Begin Testing...
[Epoch 81] train avg loss 0.00188064, dev acc 0.8269, dev avg loss 0.412176, throughput 2.84398K wps
[Epoch 82 Batch 30/173] avg loss 0.00186853, throughput 2.91513K wps
[Epoch 82 Batch 60/173] avg loss 0.00191873, throughput 2.83281K wps
[Epoch 82 Batch 90/173] avg loss 0.00173364, throughput 2.86327K wps
[Epoch 82 Batch 120/173] avg loss 0.0018284, throughput 2.88362K wps
[Epoch 82 Batch 150/173] avg loss 0.00193647, throughput 2.8823K wps
Begin Testing...
[Epoch 82] train avg loss 0.00186804, dev acc 0.8269, dev avg loss 0.413381, throughput 2.8755K wps
[Epoch 83 Batch 30/173] avg loss 0.00178785, throughput 2.93449K wps
[Epoch 83 Batch 60/173] avg loss 0.00176506, throughput 2.8453K wps
[Epoch 83 Batch 90/173] avg loss 0.00182344, throughput 2.88083K wps
[Epoch 83 Batch 120/173] avg loss 0.00177762, throughput 2.86682K wps
[Epoch 83 Batch 150/173] avg loss 0.00188461, throughput 2.87722K wps
Begin Testing...
[Epoch 83] train avg loss 0.00182217, dev acc 0.8279, dev avg loss 0.412499, throughput 2.87601K wps
[Epoch 84 Batch 30/173] avg loss 0.00162387, throughput 2.93742K wps
[Epoch 84 Batch 60/173] avg loss 0.00165933, throughput 2.87818K wps
[Epoch 84 Batch 90/173] avg loss 0.00172247, throughput 2.8737K wps
[Epoch 84 Batch 120/173] avg loss 0.00177518, throughput 2.87891K wps
[Epoch 84 Batch 150/173] avg loss 0.00173284, throughput 2.87763K wps
Begin Testing...
[Epoch 84] train avg loss 0.00170518, dev acc 0.8279, dev avg loss 0.415179, throughput 2.88768K wps
[Epoch 85 Batch 30/173] avg loss 0.00171625, throughput 2.90649K wps
[Epoch 85 Batch 60/173] avg loss 0.00176243, throughput 2.88143K wps
[Epoch 85 Batch 90/173] avg loss 0.00164578, throughput 2.88019K wps
[Epoch 85 Batch 120/173] avg loss 0.00169971, throughput 2.87419K wps
[Epoch 85 Batch 150/173] avg loss 0.00170215, throughput 2.85212K wps
Begin Testing...
[Epoch 85] train avg loss 0.0017249, dev acc 0.8269, dev avg loss 0.417364, throughput 2.87565K wps
[Epoch 86 Batch 30/173] avg loss 0.00161827, throughput 2.88342K wps
[Epoch 86 Batch 60/173] avg loss 0.00166323, throughput 2.87591K wps
[Epoch 86 Batch 90/173] avg loss 0.00163681, throughput 2.85506K wps
[Epoch 86 Batch 120/173] avg loss 0.00156475, throughput 2.85295K wps
[Epoch 86 Batch 150/173] avg loss 0.00162884, throughput 2.86211K wps
Begin Testing...
[Epoch 86] train avg loss 0.0016315, dev acc 0.8217, dev avg loss 0.423317, throughput 2.86253K wps
[Epoch 87 Batch 30/173] avg loss 0.00157366, throughput 2.88915K wps
[Epoch 87 Batch 60/173] avg loss 0.00167143, throughput 2.84976K wps
[Epoch 87 Batch 90/173] avg loss 0.00167804, throughput 2.84232K wps
[Epoch 87 Batch 120/173] avg loss 0.00167408, throughput 2.86637K wps
[Epoch 87 Batch 150/173] avg loss 0.00156123, throughput 2.87319K wps
Begin Testing...
[Epoch 87] train avg loss 0.00164989, dev acc 0.8269, dev avg loss 0.418569, throughput 2.86602K wps
[Epoch 88 Batch 30/173] avg loss 0.00153707, throughput 2.93434K wps
[Epoch 88 Batch 60/173] avg loss 0.00144343, throughput 2.83792K wps
[Epoch 88 Batch 90/173] avg loss 0.00145665, throughput 2.86168K wps
[Epoch 88 Batch 120/173] avg loss 0.00149456, throughput 2.82958K wps
[Epoch 88 Batch 150/173] avg loss 0.00163732, throughput 2.84853K wps
Begin Testing...
[Epoch 88] train avg loss 0.001537, dev acc 0.8238, dev avg loss 0.420907, throughput 2.86269K wps
[Epoch 89 Batch 30/173] avg loss 0.00145334, throughput 2.87697K wps
[Epoch 89 Batch 60/173] avg loss 0.00149363, throughput 2.84531K wps
[Epoch 89 Batch 90/173] avg loss 0.00155609, throughput 2.82489K wps
[Epoch 89 Batch 120/173] avg loss 0.00161065, throughput 2.87176K wps
[Epoch 89 Batch 150/173] avg loss 0.00151278, throughput 2.80781K wps
Begin Testing...
[Epoch 89] train avg loss 0.00152285, dev acc 0.8259, dev avg loss 0.422612, throughput 2.84522K wps
[Epoch 90 Batch 30/173] avg loss 0.0015403, throughput 2.86313K wps
[Epoch 90 Batch 60/173] avg loss 0.00153568, throughput 2.82102K wps
[Epoch 90 Batch 90/173] avg loss 0.00138473, throughput 2.86052K wps
[Epoch 90 Batch 120/173] avg loss 0.00144208, throughput 2.85229K wps
[Epoch 90 Batch 150/173] avg loss 0.00142584, throughput 2.88226K wps
Begin Testing...
[Epoch 90] train avg loss 0.00147144, dev acc 0.8259, dev avg loss 0.422687, throughput 2.85861K wps
[Epoch 91 Batch 30/173] avg loss 0.00156544, throughput 2.9029K wps
[Epoch 91 Batch 60/173] avg loss 0.00150549, throughput 2.87657K wps
[Epoch 91 Batch 90/173] avg loss 0.00151605, throughput 2.88428K wps
[Epoch 91 Batch 120/173] avg loss 0.00147323, throughput 2.84834K wps
[Epoch 91 Batch 150/173] avg loss 0.00149341, throughput 2.87131K wps
Begin Testing...
[Epoch 91] train avg loss 0.00150348, dev acc 0.8217, dev avg loss 0.431502, throughput 2.87415K wps
[Epoch 92 Batch 30/173] avg loss 0.00135916, throughput 2.94296K wps
[Epoch 92 Batch 60/173] avg loss 0.00148369, throughput 2.86277K wps
[Epoch 92 Batch 90/173] avg loss 0.00144989, throughput 2.86328K wps
[Epoch 92 Batch 120/173] avg loss 0.00141676, throughput 2.85625K wps
[Epoch 92 Batch 150/173] avg loss 0.00128867, throughput 2.85544K wps
Begin Testing...
[Epoch 92] train avg loss 0.00141684, dev acc 0.8259, dev avg loss 0.427092, throughput 2.87669K wps
[Epoch 93 Batch 30/173] avg loss 0.00141753, throughput 2.91398K wps
[Epoch 93 Batch 60/173] avg loss 0.00144602, throughput 2.86406K wps
[Epoch 93 Batch 90/173] avg loss 0.00142614, throughput 2.83133K wps
[Epoch 93 Batch 120/173] avg loss 0.00132468, throughput 2.85448K wps
[Epoch 93 Batch 150/173] avg loss 0.00139253, throughput 2.86964K wps
Begin Testing...
[Epoch 93] train avg loss 0.00138453, dev acc 0.8259, dev avg loss 0.428211, throughput 2.86591K wps
[Epoch 94 Batch 30/173] avg loss 0.00134696, throughput 2.91671K wps
[Epoch 94 Batch 60/173] avg loss 0.00143127, throughput 2.83528K wps
[Epoch 94 Batch 90/173] avg loss 0.00128509, throughput 2.82188K wps
[Epoch 94 Batch 120/173] avg loss 0.00134443, throughput 2.87283K wps
[Epoch 94 Batch 150/173] avg loss 0.00139055, throughput 2.88121K wps
Begin Testing...
[Epoch 94] train avg loss 0.00137034, dev acc 0.8259, dev avg loss 0.430041, throughput 2.86679K wps
[Epoch 95 Batch 30/173] avg loss 0.00141619, throughput 2.91234K wps
[Epoch 95 Batch 60/173] avg loss 0.00129629, throughput 2.8589K wps
[Epoch 95 Batch 90/173] avg loss 0.00128133, throughput 2.83669K wps
[Epoch 95 Batch 120/173] avg loss 0.00126931, throughput 2.8509K wps
[Epoch 95 Batch 150/173] avg loss 0.00133317, throughput 2.87936K wps
Begin Testing...
[Epoch 95] train avg loss 0.0013102, dev acc 0.8290, dev avg loss 0.431696, throughput 2.86707K wps
[Epoch 96 Batch 30/173] avg loss 0.00132988, throughput 2.92317K wps
[Epoch 96 Batch 60/173] avg loss 0.0011918, throughput 2.87888K wps
[Epoch 96 Batch 90/173] avg loss 0.00144076, throughput 2.88145K wps
[Epoch 96 Batch 120/173] avg loss 0.00131569, throughput 2.8801K wps
[Epoch 96 Batch 150/173] avg loss 0.00134981, throughput 2.87249K wps
Begin Testing...
[Epoch 96] train avg loss 0.00133005, dev acc 0.8217, dev avg loss 0.431994, throughput 2.88521K wps
[Epoch 97 Batch 30/173] avg loss 0.00115971, throughput 2.9185K wps
[Epoch 97 Batch 60/173] avg loss 0.00130985, throughput 2.87661K wps
[Epoch 97 Batch 90/173] avg loss 0.00120623, throughput 2.83907K wps
[Epoch 97 Batch 120/173] avg loss 0.00127726, throughput 2.85273K wps
[Epoch 97 Batch 150/173] avg loss 0.00120441, throughput 2.84684K wps
Begin Testing...
[Epoch 97] train avg loss 0.00124685, dev acc 0.8279, dev avg loss 0.439176, throughput 2.86822K wps
[Epoch 98 Batch 30/173] avg loss 0.00120372, throughput 2.94869K wps
[Epoch 98 Batch 60/173] avg loss 0.00127718, throughput 2.83833K wps
[Epoch 98 Batch 90/173] avg loss 0.00128194, throughput 2.85828K wps
[Epoch 98 Batch 120/173] avg loss 0.00128089, throughput 2.84206K wps
[Epoch 98 Batch 150/173] avg loss 0.00123452, throughput 2.83916K wps
Begin Testing...
[Epoch 98] train avg loss 0.00125473, dev acc 0.8206, dev avg loss 0.435313, throughput 2.86072K wps
[Epoch 99 Batch 30/173] avg loss 0.00114583, throughput 2.95067K wps
[Epoch 99 Batch 60/173] avg loss 0.00129993, throughput 2.85437K wps
[Epoch 99 Batch 90/173] avg loss 0.00115896, throughput 2.84914K wps
[Epoch 99 Batch 120/173] avg loss 0.00119346, throughput 2.86675K wps
[Epoch 99 Batch 150/173] avg loss 0.00129821, throughput 2.8451K wps
Begin Testing...
[Epoch 99] train avg loss 0.00120898, dev acc 0.8238, dev avg loss 0.43719, throughput 2.87214K wps
[Epoch 100 Batch 30/173] avg loss 0.00127488, throughput 2.9185K wps
[Epoch 100 Batch 60/173] avg loss 0.00112415, throughput 2.83321K wps
[Epoch 100 Batch 90/173] avg loss 0.00115984, throughput 2.83444K wps
[Epoch 100 Batch 120/173] avg loss 0.0011976, throughput 2.86993K wps
[Epoch 100 Batch 150/173] avg loss 0.00111214, throughput 2.8692K wps
Begin Testing...
[Epoch 100] train avg loss 0.00117901, dev acc 0.8259, dev avg loss 0.438545, throughput 2.8559K wps
[Epoch 101 Batch 30/173] avg loss 0.0011827, throughput 2.88116K wps
[Epoch 101 Batch 60/173] avg loss 0.0012134, throughput 2.87251K wps
[Epoch 101 Batch 90/173] avg loss 0.0011267, throughput 2.87726K wps
[Epoch 101 Batch 120/173] avg loss 0.00122346, throughput 2.86552K wps
[Epoch 101 Batch 150/173] avg loss 0.00112907, throughput 2.8423K wps
Begin Testing...
[Epoch 101] train avg loss 0.00117024, dev acc 0.8217, dev avg loss 0.440057, throughput 2.86135K wps
[Epoch 102 Batch 30/173] avg loss 0.00114375, throughput 2.88517K wps
[Epoch 102 Batch 60/173] avg loss 0.0011247, throughput 2.88364K wps
[Epoch 102 Batch 90/173] avg loss 0.0011941, throughput 2.87703K wps
[Epoch 102 Batch 120/173] avg loss 0.00116214, throughput 2.8516K wps
[Epoch 102 Batch 150/173] avg loss 0.00105629, throughput 2.84763K wps
Begin Testing...
[Epoch 102] train avg loss 0.00114094, dev acc 0.8248, dev avg loss 0.441598, throughput 2.86734K wps
[Epoch 103 Batch 30/173] avg loss 0.00118749, throughput 2.89389K wps
[Epoch 103 Batch 60/173] avg loss 0.00111624, throughput 2.86226K wps
[Epoch 103 Batch 90/173] avg loss 0.00098844, throughput 2.86833K wps
[Epoch 103 Batch 120/173] avg loss 0.00118384, throughput 2.88494K wps
[Epoch 103 Batch 150/173] avg loss 0.00107828, throughput 2.8816K wps
Begin Testing...
[Epoch 103] train avg loss 0.00110753, dev acc 0.8206, dev avg loss 0.446806, throughput 2.87549K wps
[Epoch 104 Batch 30/173] avg loss 0.0010243, throughput 2.92292K wps
[Epoch 104 Batch 60/173] avg loss 0.00104562, throughput 2.8593K wps
[Epoch 104 Batch 90/173] avg loss 0.00116468, throughput 2.86103K wps
[Epoch 104 Batch 120/173] avg loss 0.00102012, throughput 2.86512K wps
[Epoch 104 Batch 150/173] avg loss 0.00100796, throughput 2.8541K wps
Begin Testing...
[Epoch 104] train avg loss 0.00105333, dev acc 0.8238, dev avg loss 0.443739, throughput 2.87263K wps
[Epoch 105 Batch 30/173] avg loss 0.00103504, throughput 2.92647K wps
[Epoch 105 Batch 60/173] avg loss 0.00107369, throughput 2.87691K wps
[Epoch 105 Batch 90/173] avg loss 0.00116718, throughput 2.86582K wps
[Epoch 105 Batch 120/173] avg loss 0.00112396, throughput 2.86505K wps
[Epoch 105 Batch 150/173] avg loss 0.00110827, throughput 2.84721K wps
Begin Testing...
[Epoch 105] train avg loss 0.00108685, dev acc 0.8186, dev avg loss 0.446268, throughput 2.87711K wps
[Epoch 106 Batch 30/173] avg loss 0.000995731, throughput 2.88234K wps
[Epoch 106 Batch 60/173] avg loss 0.00100883, throughput 2.87287K wps
[Epoch 106 Batch 90/173] avg loss 0.00107952, throughput 2.8335K wps
[Epoch 106 Batch 120/173] avg loss 0.000989376, throughput 2.85735K wps
[Epoch 106 Batch 150/173] avg loss 0.00103572, throughput 2.86386K wps
Begin Testing...
[Epoch 106] train avg loss 0.00104789, dev acc 0.8206, dev avg loss 0.446639, throughput 2.86253K wps
[Epoch 107 Batch 30/173] avg loss 0.000941613, throughput 2.92134K wps
[Epoch 107 Batch 60/173] avg loss 0.00104968, throughput 2.87382K wps
[Epoch 107 Batch 90/173] avg loss 0.000974941, throughput 2.85999K wps
[Epoch 107 Batch 120/173] avg loss 0.00101757, throughput 2.81498K wps
[Epoch 107 Batch 150/173] avg loss 0.00102905, throughput 2.84546K wps
Begin Testing...
[Epoch 107] train avg loss 0.00101091, dev acc 0.8206, dev avg loss 0.446579, throughput 2.86361K wps
[Epoch 108 Batch 30/173] avg loss 0.000969665, throughput 2.91648K wps
[Epoch 108 Batch 60/173] avg loss 0.000987223, throughput 2.86641K wps
[Epoch 108 Batch 90/173] avg loss 0.00105639, throughput 2.86282K wps
[Epoch 108 Batch 120/173] avg loss 0.00102183, throughput 2.86054K wps
[Epoch 108 Batch 150/173] avg loss 0.0010245, throughput 2.87926K wps
Begin Testing...
[Epoch 108] train avg loss 0.0010118, dev acc 0.8206, dev avg loss 0.449518, throughput 2.87429K wps
[Epoch 109 Batch 30/173] avg loss 0.000977883, throughput 2.89866K wps
[Epoch 109 Batch 60/173] avg loss 0.000943661, throughput 2.8727K wps
[Epoch 109 Batch 90/173] avg loss 0.000939749, throughput 2.86303K wps
[Epoch 109 Batch 120/173] avg loss 0.00102722, throughput 2.87867K wps
[Epoch 109 Batch 150/173] avg loss 0.00096953, throughput 2.87097K wps
Begin Testing...
[Epoch 109] train avg loss 0.000975191, dev acc 0.8217, dev avg loss 0.450395, throughput 2.87637K wps
[Epoch 110 Batch 30/173] avg loss 0.00102877, throughput 2.94628K wps
[Epoch 110 Batch 60/173] avg loss 0.000982412, throughput 2.86426K wps
[Epoch 110 Batch 90/173] avg loss 0.000942946, throughput 2.87543K wps
[Epoch 110 Batch 120/173] avg loss 0.00100572, throughput 2.84506K wps
[Epoch 110 Batch 150/173] avg loss 0.000906522, throughput 2.8547K wps
Begin Testing...
[Epoch 110] train avg loss 0.000977148, dev acc 0.8165, dev avg loss 0.451279, throughput 2.87587K wps
[Epoch 111 Batch 30/173] avg loss 0.00101114, throughput 2.91747K wps
[Epoch 111 Batch 60/173] avg loss 0.000954455, throughput 2.85399K wps
[Epoch 111 Batch 90/173] avg loss 0.000964427, throughput 2.78589K wps
[Epoch 111 Batch 120/173] avg loss 0.00098116, throughput 2.86172K wps
[Epoch 111 Batch 150/173] avg loss 0.00099118, throughput 2.87442K wps
Begin Testing...
[Epoch 111] train avg loss 0.000986832, dev acc 0.8227, dev avg loss 0.452369, throughput 2.85507K wps
[Epoch 112 Batch 30/173] avg loss 0.000913832, throughput 2.94753K wps
[Epoch 112 Batch 60/173] avg loss 0.000895699, throughput 2.88451K wps
[Epoch 112 Batch 90/173] avg loss 0.00110997, throughput 2.85276K wps
[Epoch 112 Batch 120/173] avg loss 0.000901604, throughput 2.85526K wps
[Epoch 112 Batch 150/173] avg loss 0.000940382, throughput 2.82019K wps
Begin Testing...
[Epoch 112] train avg loss 0.000943132, dev acc 0.8175, dev avg loss 0.454001, throughput 2.86549K wps
[Epoch 113 Batch 30/173] avg loss 0.000992224, throughput 2.91437K wps
[Epoch 113 Batch 60/173] avg loss 0.000956516, throughput 2.86816K wps
[Epoch 113 Batch 90/173] avg loss 0.000864855, throughput 2.88263K wps
[Epoch 113 Batch 120/173] avg loss 0.000959303, throughput 2.87409K wps
[Epoch 113 Batch 150/173] avg loss 0.000827309, throughput 2.87694K wps
Begin Testing...
[Epoch 113] train avg loss 0.000913015, dev acc 0.8186, dev avg loss 0.454731, throughput 2.88344K wps
[Epoch 114 Batch 30/173] avg loss 0.000876014, throughput 2.93047K wps
[Epoch 114 Batch 60/173] avg loss 0.000873973, throughput 2.8177K wps
[Epoch 114 Batch 90/173] avg loss 0.000845552, throughput 2.84625K wps
[Epoch 114 Batch 120/173] avg loss 0.000910083, throughput 2.83647K wps
[Epoch 114 Batch 150/173] avg loss 0.000872923, throughput 2.8689K wps
Begin Testing...
[Epoch 114] train avg loss 0.000893698, dev acc 0.8165, dev avg loss 0.456828, throughput 2.86198K wps
[Epoch 115 Batch 30/173] avg loss 0.000888782, throughput 2.95339K wps
[Epoch 115 Batch 60/173] avg loss 0.000892487, throughput 2.86328K wps
[Epoch 115 Batch 90/173] avg loss 0.000855591, throughput 2.88371K wps
[Epoch 115 Batch 120/173] avg loss 0.000960172, throughput 2.85638K wps
[Epoch 115 Batch 150/173] avg loss 0.000750728, throughput 2.86035K wps
Begin Testing...
[Epoch 115] train avg loss 0.000880293, dev acc 0.8186, dev avg loss 0.459261, throughput 2.8804K wps
[Epoch 116 Batch 30/173] avg loss 0.000845103, throughput 2.92128K wps
[Epoch 116 Batch 60/173] avg loss 0.000829892, throughput 2.84263K wps
[Epoch 116 Batch 90/173] avg loss 0.000890587, throughput 2.85859K wps
[Epoch 116 Batch 120/173] avg loss 0.000892204, throughput 2.88839K wps
[Epoch 116 Batch 150/173] avg loss 0.000813662, throughput 2.86973K wps
Begin Testing...
[Epoch 116] train avg loss 0.000861157, dev acc 0.8227, dev avg loss 0.458549, throughput 2.8759K wps
[Epoch 117 Batch 30/173] avg loss 0.000918471, throughput 2.91012K wps
[Epoch 117 Batch 60/173] avg loss 0.000802796, throughput 2.87451K wps
[Epoch 117 Batch 90/173] avg loss 0.000807859, throughput 2.87314K wps
[Epoch 117 Batch 120/173] avg loss 0.000773445, throughput 2.86423K wps
[Epoch 117 Batch 150/173] avg loss 0.000861819, throughput 2.86701K wps
Begin Testing...
[Epoch 117] train avg loss 0.000832155, dev acc 0.8238, dev avg loss 0.459722, throughput 2.87504K wps
[Epoch 118 Batch 30/173] avg loss 0.000783964, throughput 2.93322K wps
[Epoch 118 Batch 60/173] avg loss 0.000863161, throughput 2.8532K wps
[Epoch 118 Batch 90/173] avg loss 0.00091396, throughput 2.87153K wps
[Epoch 118 Batch 120/173] avg loss 0.000822201, throughput 2.81207K wps
[Epoch 118 Batch 150/173] avg loss 0.000785598, throughput 2.86437K wps
Begin Testing...
[Epoch 118] train avg loss 0.000841327, dev acc 0.8196, dev avg loss 0.460314, throughput 2.86719K wps
[Epoch 119 Batch 30/173] avg loss 0.000818595, throughput 2.94048K wps
[Epoch 119 Batch 60/173] avg loss 0.000728463, throughput 2.87876K wps
[Epoch 119 Batch 90/173] avg loss 0.000840962, throughput 2.86771K wps
[Epoch 119 Batch 120/173] avg loss 0.000857316, throughput 2.84933K wps
[Epoch 119 Batch 150/173] avg loss 0.000870109, throughput 2.85121K wps
Begin Testing...
[Epoch 119] train avg loss 0.00083045, dev acc 0.8186, dev avg loss 0.463075, throughput 2.86717K wps
[Epoch 120 Batch 30/173] avg loss 0.00084389, throughput 2.91907K wps
[Epoch 120 Batch 60/173] avg loss 0.000722773, throughput 2.81905K wps
[Epoch 120 Batch 90/173] avg loss 0.000784358, throughput 2.87834K wps
[Epoch 120 Batch 120/173] avg loss 0.000849526, throughput 2.86044K wps
[Epoch 120 Batch 150/173] avg loss 0.000853433, throughput 2.86915K wps
Begin Testing...
[Epoch 120] train avg loss 0.000812328, dev acc 0.8165, dev avg loss 0.463759, throughput 2.86714K wps
[Epoch 121 Batch 30/173] avg loss 0.000827776, throughput 2.94152K wps
[Epoch 121 Batch 60/173] avg loss 0.000727357, throughput 2.87622K wps
[Epoch 121 Batch 90/173] avg loss 0.000781831, throughput 2.80785K wps
[Epoch 121 Batch 120/173] avg loss 0.000775696, throughput 2.85189K wps
[Epoch 121 Batch 150/173] avg loss 0.000762507, throughput 2.87325K wps
Begin Testing...
[Epoch 121] train avg loss 0.000767458, dev acc 0.8154, dev avg loss 0.469335, throughput 2.87022K wps
[Epoch 122 Batch 30/173] avg loss 0.000730857, throughput 2.94362K wps
[Epoch 122 Batch 60/173] avg loss 0.000844213, throughput 2.86473K wps
[Epoch 122 Batch 90/173] avg loss 0.000726556, throughput 2.80644K wps
[Epoch 122 Batch 120/173] avg loss 0.000785755, throughput 2.85307K wps
[Epoch 122 Batch 150/173] avg loss 0.000760257, throughput 2.86747K wps
Begin Testing...
[Epoch 122] train avg loss 0.000772872, dev acc 0.8165, dev avg loss 0.465441, throughput 2.86311K wps
[Epoch 123 Batch 30/173] avg loss 0.000708913, throughput 2.86575K wps
[Epoch 123 Batch 60/173] avg loss 0.000749257, throughput 2.87612K wps
[Epoch 123 Batch 90/173] avg loss 0.000760096, throughput 2.87636K wps
[Epoch 123 Batch 120/173] avg loss 0.000839408, throughput 2.86427K wps
[Epoch 123 Batch 150/173] avg loss 0.000781345, throughput 2.85985K wps
Begin Testing...
[Epoch 123] train avg loss 0.000775937, dev acc 0.8259, dev avg loss 0.466363, throughput 2.86235K wps
[Epoch 124 Batch 30/173] avg loss 0.000714582, throughput 2.93325K wps
[Epoch 124 Batch 60/173] avg loss 0.000672821, throughput 2.87856K wps
[Epoch 124 Batch 90/173] avg loss 0.00081318, throughput 2.86196K wps
[Epoch 124 Batch 120/173] avg loss 0.000678438, throughput 2.85613K wps
[Epoch 124 Batch 150/173] avg loss 0.000788975, throughput 2.8747K wps
Begin Testing...
[Epoch 124] train avg loss 0.000739063, dev acc 0.8165, dev avg loss 0.470399, throughput 2.87354K wps
[Epoch 125 Batch 30/173] avg loss 0.000698384, throughput 2.91319K wps
[Epoch 125 Batch 60/173] avg loss 0.000704798, throughput 2.86098K wps
[Epoch 125 Batch 90/173] avg loss 0.000777572, throughput 2.83348K wps
[Epoch 125 Batch 120/173] avg loss 0.000741265, throughput 2.8529K wps
[Epoch 125 Batch 150/173] avg loss 0.000674456, throughput 2.84177K wps
Begin Testing...
[Epoch 125] train avg loss 0.000718284, dev acc 0.8196, dev avg loss 0.471127, throughput 2.86018K wps
[Epoch 126 Batch 30/173] avg loss 0.000692733, throughput 2.89768K wps
[Epoch 126 Batch 60/173] avg loss 0.000618359, throughput 2.85787K wps
[Epoch 126 Batch 90/173] avg loss 0.000749345, throughput 2.87756K wps
[Epoch 126 Batch 120/173] avg loss 0.000731112, throughput 2.87953K wps
[Epoch 126 Batch 150/173] avg loss 0.000763459, throughput 2.87599K wps
Begin Testing...
[Epoch 126] train avg loss 0.000717511, dev acc 0.8238, dev avg loss 0.473191, throughput 2.87517K wps
[Epoch 127 Batch 30/173] avg loss 0.000721539, throughput 2.94548K wps
[Epoch 127 Batch 60/173] avg loss 0.00074971, throughput 2.88003K wps
[Epoch 127 Batch 90/173] avg loss 0.000699646, throughput 2.83919K wps
[Epoch 127 Batch 120/173] avg loss 0.00079645, throughput 2.86524K wps
[Epoch 127 Batch 150/173] avg loss 0.000711328, throughput 2.8828K wps
Begin Testing...
[Epoch 127] train avg loss 0.000735, dev acc 0.8175, dev avg loss 0.475083, throughput 2.8729K wps
[Epoch 128 Batch 30/173] avg loss 0.000722109, throughput 2.93314K wps
[Epoch 128 Batch 60/173] avg loss 0.000640523, throughput 2.87444K wps
[Epoch 128 Batch 90/173] avg loss 0.000785654, throughput 2.81144K wps
[Epoch 128 Batch 120/173] avg loss 0.000722573, throughput 2.81734K wps
[Epoch 128 Batch 150/173] avg loss 0.000687223, throughput 2.81806K wps
Begin Testing...
[Epoch 128] train avg loss 0.000713624, dev acc 0.8206, dev avg loss 0.474481, throughput 2.84916K wps
[Epoch 129 Batch 30/173] avg loss 0.000708939, throughput 2.94957K wps
[Epoch 129 Batch 60/173] avg loss 0.000705787, throughput 2.88379K wps
[Epoch 129 Batch 90/173] avg loss 0.000714033, throughput 2.82138K wps
[Epoch 129 Batch 120/173] avg loss 0.000704911, throughput 2.84225K wps
[Epoch 129 Batch 150/173] avg loss 0.000619262, throughput 2.86844K wps
Begin Testing...
[Epoch 129] train avg loss 0.000697961, dev acc 0.8144, dev avg loss 0.475953, throughput 2.86203K wps
[Epoch 130 Batch 30/173] avg loss 0.000659047, throughput 2.91049K wps
[Epoch 130 Batch 60/173] avg loss 0.000734338, throughput 2.85575K wps
[Epoch 130 Batch 90/173] avg loss 0.000643772, throughput 2.84148K wps
[Epoch 130 Batch 120/173] avg loss 0.000641363, throughput 2.83659K wps
[Epoch 130 Batch 150/173] avg loss 0.000743451, throughput 2.82037K wps
Begin Testing...
[Epoch 130] train avg loss 0.000686279, dev acc 0.8165, dev avg loss 0.479955, throughput 2.85379K wps
[Epoch 131 Batch 30/173] avg loss 0.000655005, throughput 2.91366K wps
[Epoch 131 Batch 60/173] avg loss 0.000710446, throughput 2.85461K wps
[Epoch 131 Batch 90/173] avg loss 0.000739932, throughput 2.85588K wps
[Epoch 131 Batch 120/173] avg loss 0.000667926, throughput 2.85172K wps
[Epoch 131 Batch 150/173] avg loss 0.000764375, throughput 2.83361K wps
Begin Testing...
[Epoch 131] train avg loss 0.000701599, dev acc 0.8206, dev avg loss 0.479387, throughput 2.86416K wps
[Epoch 132 Batch 30/173] avg loss 0.000631631, throughput 2.91789K wps
[Epoch 132 Batch 60/173] avg loss 0.000629804, throughput 2.8838K wps
[Epoch 132 Batch 90/173] avg loss 0.000667282, throughput 2.88471K wps
[Epoch 132 Batch 120/173] avg loss 0.000663132, throughput 2.8653K wps
[Epoch 132 Batch 150/173] avg loss 0.000610968, throughput 2.84305K wps
Begin Testing...
[Epoch 132] train avg loss 0.000641901, dev acc 0.8154, dev avg loss 0.483144, throughput 2.87841K wps
[Epoch 133 Batch 30/173] avg loss 0.000705389, throughput 2.93176K wps
[Epoch 133 Batch 60/173] avg loss 0.000648515, throughput 2.87116K wps
[Epoch 133 Batch 90/173] avg loss 0.000645024, throughput 2.85553K wps
[Epoch 133 Batch 120/173] avg loss 0.000670341, throughput 2.86702K wps
[Epoch 133 Batch 150/173] avg loss 0.000620252, throughput 2.8486K wps
Begin Testing...
[Epoch 133] train avg loss 0.000664028, dev acc 0.8175, dev avg loss 0.480668, throughput 2.86515K wps
[Epoch 134 Batch 30/173] avg loss 0.0006265, throughput 2.9427K wps
[Epoch 134 Batch 60/173] avg loss 0.000678145, throughput 2.87815K wps
[Epoch 134 Batch 90/173] avg loss 0.00065638, throughput 2.87984K wps
[Epoch 134 Batch 120/173] avg loss 0.000654868, throughput 2.86977K wps
[Epoch 134 Batch 150/173] avg loss 0.000605215, throughput 2.87672K wps
Begin Testing...
[Epoch 134] train avg loss 0.000634988, dev acc 0.8165, dev avg loss 0.482644, throughput 2.88615K wps
[Epoch 135 Batch 30/173] avg loss 0.000635421, throughput 2.8686K wps
[Epoch 135 Batch 60/173] avg loss 0.000567463, throughput 2.85642K wps
[Epoch 135 Batch 90/173] avg loss 0.000582386, throughput 2.80425K wps
[Epoch 135 Batch 120/173] avg loss 0.000514526, throughput 2.82015K wps
[Epoch 135 Batch 150/173] avg loss 0.000576537, throughput 2.84156K wps
Begin Testing...
[Epoch 135] train avg loss 0.000595771, dev acc 0.8144, dev avg loss 0.484744, throughput 2.84325K wps
[Epoch 136 Batch 30/173] avg loss 0.000523631, throughput 2.92399K wps
[Epoch 136 Batch 60/173] avg loss 0.000599958, throughput 2.87993K wps
[Epoch 136 Batch 90/173] avg loss 0.000673753, throughput 2.83987K wps
[Epoch 136 Batch 120/173] avg loss 0.000622403, throughput 2.80322K wps
[Epoch 136 Batch 150/173] avg loss 0.000626796, throughput 2.82559K wps
Begin Testing...
[Epoch 136] train avg loss 0.000609412, dev acc 0.8186, dev avg loss 0.486374, throughput 2.85169K wps
[Epoch 137 Batch 30/173] avg loss 0.00062016, throughput 2.8767K wps
[Epoch 137 Batch 60/173] avg loss 0.000609053, throughput 2.82579K wps
[Epoch 137 Batch 90/173] avg loss 0.000661324, throughput 2.84782K wps
[Epoch 137 Batch 120/173] avg loss 0.000568436, throughput 2.87427K wps
[Epoch 137 Batch 150/173] avg loss 0.000647507, throughput 2.85959K wps
Begin Testing...
[Epoch 137] train avg loss 0.000612437, dev acc 0.8248, dev avg loss 0.487372, throughput 2.85876K wps
[Epoch 138 Batch 30/173] avg loss 0.000499718, throughput 2.87275K wps
[Epoch 138 Batch 60/173] avg loss 0.000586052, throughput 2.86491K wps
[Epoch 138 Batch 90/173] avg loss 0.000583222, throughput 2.85035K wps
[Epoch 138 Batch 120/173] avg loss 0.000631447, throughput 2.83768K wps
[Epoch 138 Batch 150/173] avg loss 0.000575972, throughput 2.87903K wps
Begin Testing...
[Epoch 138] train avg loss 0.000592537, dev acc 0.8165, dev avg loss 0.487891, throughput 2.86195K wps
[Epoch 139 Batch 30/173] avg loss 0.000572267, throughput 2.92481K wps
[Epoch 139 Batch 60/173] avg loss 0.000505032, throughput 2.87649K wps
[Epoch 139 Batch 90/173] avg loss 0.000553154, throughput 2.86161K wps
[Epoch 139 Batch 120/173] avg loss 0.000601868, throughput 2.85679K wps
[Epoch 139 Batch 150/173] avg loss 0.000660134, throughput 2.85928K wps
Begin Testing...
[Epoch 139] train avg loss 0.000586851, dev acc 0.8154, dev avg loss 0.489294, throughput 2.86839K wps
[Epoch 140 Batch 30/173] avg loss 0.000577662, throughput 2.92891K wps
[Epoch 140 Batch 60/173] avg loss 0.000596141, throughput 2.85815K wps
[Epoch 140 Batch 90/173] avg loss 0.000536873, throughput 2.87387K wps
[Epoch 140 Batch 120/173] avg loss 0.000554973, throughput 2.87397K wps
[Epoch 140 Batch 150/173] avg loss 0.00051813, throughput 2.82367K wps
Begin Testing...
[Epoch 140] train avg loss 0.00056653, dev acc 0.8133, dev avg loss 0.496028, throughput 2.86461K wps
[Epoch 141 Batch 30/173] avg loss 0.000572519, throughput 2.90838K wps
[Epoch 141 Batch 60/173] avg loss 0.000566583, throughput 2.88574K wps
[Epoch 141 Batch 90/173] avg loss 0.000614717, throughput 2.87858K wps
[Epoch 141 Batch 120/173] avg loss 0.000557678, throughput 2.87546K wps
[Epoch 141 Batch 150/173] avg loss 0.000577632, throughput 2.85896K wps
Begin Testing...
[Epoch 141] train avg loss 0.000575786, dev acc 0.8175, dev avg loss 0.492766, throughput 2.88174K wps
[Epoch 142 Batch 30/173] avg loss 0.000544903, throughput 2.91003K wps
[Epoch 142 Batch 60/173] avg loss 0.000585708, throughput 2.87531K wps
[Epoch 142 Batch 90/173] avg loss 0.000563884, throughput 2.8729K wps
[Epoch 142 Batch 120/173] avg loss 0.000533187, throughput 2.84871K wps
[Epoch 142 Batch 150/173] avg loss 0.000545633, throughput 2.88163K wps
Begin Testing...
[Epoch 142] train avg loss 0.000565472, dev acc 0.8123, dev avg loss 0.495764, throughput 2.87624K wps
[Epoch 143 Batch 30/173] avg loss 0.000524886, throughput 2.94889K wps
[Epoch 143 Batch 60/173] avg loss 0.00052913, throughput 2.84313K wps
[Epoch 143 Batch 90/173] avg loss 0.000613173, throughput 2.85678K wps
[Epoch 143 Batch 120/173] avg loss 0.000515078, throughput 2.84504K wps
[Epoch 143 Batch 150/173] avg loss 0.000543135, throughput 2.88482K wps
Begin Testing...
[Epoch 143] train avg loss 0.000546012, dev acc 0.8144, dev avg loss 0.497553, throughput 2.87601K wps
[Epoch 144 Batch 30/173] avg loss 0.00051928, throughput 2.93822K wps
[Epoch 144 Batch 60/173] avg loss 0.000536074, throughput 2.88127K wps
[Epoch 144 Batch 90/173] avg loss 0.000502918, throughput 2.86967K wps
[Epoch 144 Batch 120/173] avg loss 0.000544321, throughput 2.8637K wps
[Epoch 144 Batch 150/173] avg loss 0.000566535, throughput 2.86504K wps
Begin Testing...
[Epoch 144] train avg loss 0.000537277, dev acc 0.8154, dev avg loss 0.498026, throughput 2.88309K wps
[Epoch 145 Batch 30/173] avg loss 0.000561068, throughput 2.91195K wps
[Epoch 145 Batch 60/173] avg loss 0.000497445, throughput 2.85733K wps
[Epoch 145 Batch 90/173] avg loss 0.000568941, throughput 2.85154K wps
[Epoch 145 Batch 120/173] avg loss 0.000587934, throughput 2.87139K wps
[Epoch 145 Batch 150/173] avg loss 0.000531299, throughput 2.86049K wps
Begin Testing...
[Epoch 145] train avg loss 0.000548525, dev acc 0.8144, dev avg loss 0.499088, throughput 2.86524K wps
[Epoch 146 Batch 30/173] avg loss 0.00048188, throughput 2.89811K wps
[Epoch 146 Batch 60/173] avg loss 0.000536111, throughput 2.81417K wps
[Epoch 146 Batch 90/173] avg loss 0.000505815, throughput 2.87038K wps
[Epoch 146 Batch 120/173] avg loss 0.000515999, throughput 2.87018K wps
[Epoch 146 Batch 150/173] avg loss 0.000458621, throughput 2.84568K wps
Begin Testing...
[Epoch 146] train avg loss 0.000504608, dev acc 0.8165, dev avg loss 0.498406, throughput 2.86179K wps
[Epoch 147 Batch 30/173] avg loss 0.000479758, throughput 2.87481K wps
[Epoch 147 Batch 60/173] avg loss 0.000456948, throughput 2.84727K wps
[Epoch 147 Batch 90/173] avg loss 0.000553382, throughput 2.81066K wps
[Epoch 147 Batch 120/173] avg loss 0.000604259, throughput 2.7938K wps
[Epoch 147 Batch 150/173] avg loss 0.000599365, throughput 2.8636K wps
Begin Testing...
[Epoch 147] train avg loss 0.000528959, dev acc 0.8217, dev avg loss 0.497451, throughput 2.8399K wps
[Epoch 148 Batch 30/173] avg loss 0.000511122, throughput 2.87412K wps
[Epoch 148 Batch 60/173] avg loss 0.000426361, throughput 2.8378K wps
[Epoch 148 Batch 90/173] avg loss 0.000512135, throughput 2.86986K wps
[Epoch 148 Batch 120/173] avg loss 0.000505166, throughput 2.86505K wps
[Epoch 148 Batch 150/173] avg loss 0.000520858, throughput 2.87323K wps
Begin Testing...
[Epoch 148] train avg loss 0.000493676, dev acc 0.8227, dev avg loss 0.497755, throughput 2.8654K wps
[Epoch 149 Batch 30/173] avg loss 0.00050991, throughput 2.93249K wps
[Epoch 149 Batch 60/173] avg loss 0.000451613, throughput 2.86776K wps
[Epoch 149 Batch 90/173] avg loss 0.000512756, throughput 2.86782K wps
[Epoch 149 Batch 120/173] avg loss 0.000527724, throughput 2.87722K wps
[Epoch 149 Batch 150/173] avg loss 0.000531186, throughput 2.87794K wps
Begin Testing...
[Epoch 149] train avg loss 0.000513801, dev acc 0.8154, dev avg loss 0.499462, throughput 2.88297K wps
[Epoch 150 Batch 30/173] avg loss 0.000556981, throughput 2.92596K wps
[Epoch 150 Batch 60/173] avg loss 0.000452566, throughput 2.8508K wps
[Epoch 150 Batch 90/173] avg loss 0.000546861, throughput 2.85339K wps
[Epoch 150 Batch 120/173] avg loss 0.000493739, throughput 2.81747K wps
[Epoch 150 Batch 150/173] avg loss 0.000491491, throughput 2.84468K wps
Begin Testing...
[Epoch 150] train avg loss 0.000501954, dev acc 0.8217, dev avg loss 0.501226, throughput 2.8581K wps
[Epoch 151 Batch 30/173] avg loss 0.000478372, throughput 2.92913K wps
[Epoch 151 Batch 60/173] avg loss 0.00050603, throughput 2.86842K wps
[Epoch 151 Batch 90/173] avg loss 0.00049061, throughput 2.83377K wps
[Epoch 151 Batch 120/173] avg loss 0.000458161, throughput 2.86034K wps
[Epoch 151 Batch 150/173] avg loss 0.000483403, throughput 2.85984K wps
Begin Testing...
[Epoch 151] train avg loss 0.000480432, dev acc 0.8133, dev avg loss 0.503468, throughput 2.86882K wps
[Epoch 152 Batch 30/173] avg loss 0.000549062, throughput 2.88663K wps
[Epoch 152 Batch 60/173] avg loss 0.00043652, throughput 2.87058K wps
[Epoch 152 Batch 90/173] avg loss 0.000534328, throughput 2.82347K wps
[Epoch 152 Batch 120/173] avg loss 0.000480038, throughput 2.85821K wps
[Epoch 152 Batch 150/173] avg loss 0.000474446, throughput 2.87559K wps
Begin Testing...
[Epoch 152] train avg loss 0.000499438, dev acc 0.8175, dev avg loss 0.503494, throughput 2.86506K wps
[Epoch 153 Batch 30/173] avg loss 0.000534185, throughput 2.87968K wps
[Epoch 153 Batch 60/173] avg loss 0.000410093, throughput 2.87902K wps
[Epoch 153 Batch 90/173] avg loss 0.000451285, throughput 2.87356K wps
[Epoch 153 Batch 120/173] avg loss 0.000472713, throughput 2.87162K wps
[Epoch 153 Batch 150/173] avg loss 0.000468743, throughput 2.87592K wps
Begin Testing...
[Epoch 153] train avg loss 0.00047295, dev acc 0.8186, dev avg loss 0.505442, throughput 2.87599K wps
[Epoch 154 Batch 30/173] avg loss 0.000447918, throughput 2.94583K wps
[Epoch 154 Batch 60/173] avg loss 0.000459267, throughput 2.83222K wps
[Epoch 154 Batch 90/173] avg loss 0.000519425, throughput 2.85262K wps
[Epoch 154 Batch 120/173] avg loss 0.00048163, throughput 2.82303K wps
[Epoch 154 Batch 150/173] avg loss 0.00052879, throughput 2.8748K wps
Begin Testing...
[Epoch 154] train avg loss 0.00048404, dev acc 0.8133, dev avg loss 0.510439, throughput 2.85697K wps
[Epoch 155 Batch 30/173] avg loss 0.000448628, throughput 2.89551K wps
[Epoch 155 Batch 60/173] avg loss 0.000432321, throughput 2.80357K wps
[Epoch 155 Batch 90/173] avg loss 0.000463483, throughput 2.83926K wps
[Epoch 155 Batch 120/173] avg loss 0.000485243, throughput 2.82938K wps
[Epoch 155 Batch 150/173] avg loss 0.000443171, throughput 2.8756K wps
Begin Testing...
[Epoch 155] train avg loss 0.000447551, dev acc 0.8123, dev avg loss 0.508286, throughput 2.85159K wps
[Epoch 156 Batch 30/173] avg loss 0.0004329, throughput 2.93066K wps
[Epoch 156 Batch 60/173] avg loss 0.00043476, throughput 2.81295K wps
[Epoch 156 Batch 90/173] avg loss 0.000495163, throughput 2.86816K wps
[Epoch 156 Batch 120/173] avg loss 0.000494305, throughput 2.87834K wps
[Epoch 156 Batch 150/173] avg loss 0.000468997, throughput 2.84233K wps
Begin Testing...
[Epoch 156] train avg loss 0.00047138, dev acc 0.8154, dev avg loss 0.509837, throughput 2.85657K wps
[Epoch 157 Batch 30/173] avg loss 0.000445459, throughput 2.88461K wps
[Epoch 157 Batch 60/173] avg loss 0.000472909, throughput 2.81717K wps
[Epoch 157 Batch 90/173] avg loss 0.00040338, throughput 2.82221K wps
[Epoch 157 Batch 120/173] avg loss 0.00047541, throughput 2.88418K wps
[Epoch 157 Batch 150/173] avg loss 0.000396407, throughput 2.86847K wps
Begin Testing...
[Epoch 157] train avg loss 0.000435845, dev acc 0.8123, dev avg loss 0.512959, throughput 2.85744K wps
[Epoch 158 Batch 30/173] avg loss 0.000462551, throughput 2.93981K wps
[Epoch 158 Batch 60/173] avg loss 0.000471672, throughput 2.87155K wps
[Epoch 158 Batch 90/173] avg loss 0.000392691, throughput 2.86738K wps
[Epoch 158 Batch 120/173] avg loss 0.00043278, throughput 2.8798K wps
[Epoch 158 Batch 150/173] avg loss 0.000486366, throughput 2.83839K wps
Begin Testing...
[Epoch 158] train avg loss 0.000451669, dev acc 0.8217, dev avg loss 0.512248, throughput 2.86983K wps
[Epoch 159 Batch 30/173] avg loss 0.000464521, throughput 2.88277K wps
[Epoch 159 Batch 60/173] avg loss 0.000478484, throughput 2.87218K wps
[Epoch 159 Batch 90/173] avg loss 0.000405535, throughput 2.84751K wps
[Epoch 159 Batch 120/173] avg loss 0.000406543, throughput 2.85988K wps
[Epoch 159 Batch 150/173] avg loss 0.000415958, throughput 2.84952K wps
Begin Testing...
[Epoch 159] train avg loss 0.000438261, dev acc 0.8154, dev avg loss 0.512457, throughput 2.85823K wps
[Epoch 160 Batch 30/173] avg loss 0.000452249, throughput 2.88112K wps
[Epoch 160 Batch 60/173] avg loss 0.000446926, throughput 2.86276K wps
[Epoch 160 Batch 90/173] avg loss 0.000474033, throughput 2.86761K wps
[Epoch 160 Batch 120/173] avg loss 0.000458771, throughput 2.84301K wps
[Epoch 160 Batch 150/173] avg loss 0.000451263, throughput 2.87621K wps
Begin Testing...
[Epoch 160] train avg loss 0.000456283, dev acc 0.8154, dev avg loss 0.512798, throughput 2.86133K wps
[Epoch 161 Batch 30/173] avg loss 0.000398487, throughput 2.94694K wps
[Epoch 161 Batch 60/173] avg loss 0.000426661, throughput 2.86895K wps
[Epoch 161 Batch 90/173] avg loss 0.000447183, throughput 2.8769K wps
[Epoch 161 Batch 120/173] avg loss 0.000425874, throughput 2.81753K wps
[Epoch 161 Batch 150/173] avg loss 0.000456722, throughput 2.81374K wps
Begin Testing...
[Epoch 161] train avg loss 0.000434431, dev acc 0.8206, dev avg loss 0.513777, throughput 2.86144K wps
[Epoch 162 Batch 30/173] avg loss 0.00041342, throughput 2.92539K wps
[Epoch 162 Batch 60/173] avg loss 0.000361449, throughput 2.84288K wps
[Epoch 162 Batch 90/173] avg loss 0.000395458, throughput 2.86176K wps
[Epoch 162 Batch 120/173] avg loss 0.00049751, throughput 2.85593K wps
[Epoch 162 Batch 150/173] avg loss 0.000410961, throughput 2.87816K wps
Begin Testing...
[Epoch 162] train avg loss 0.000422183, dev acc 0.8165, dev avg loss 0.516247, throughput 2.87171K wps
[Epoch 163 Batch 30/173] avg loss 0.000469062, throughput 2.93563K wps
[Epoch 163 Batch 60/173] avg loss 0.000422543, throughput 2.85368K wps
[Epoch 163 Batch 90/173] avg loss 0.000435657, throughput 2.8459K wps
[Epoch 163 Batch 120/173] avg loss 0.000418645, throughput 2.82518K wps
[Epoch 163 Batch 150/173] avg loss 0.000390769, throughput 2.8726K wps
Begin Testing...
[Epoch 163] train avg loss 0.000428061, dev acc 0.8144, dev avg loss 0.516457, throughput 2.86798K wps
[Epoch 164 Batch 30/173] avg loss 0.000467741, throughput 2.94019K wps
[Epoch 164 Batch 60/173] avg loss 0.000417703, throughput 2.85579K wps
[Epoch 164 Batch 90/173] avg loss 0.00042075, throughput 2.87563K wps
[Epoch 164 Batch 120/173] avg loss 0.000448452, throughput 2.8523K wps
[Epoch 164 Batch 150/173] avg loss 0.000382106, throughput 2.85121K wps
Begin Testing...
[Epoch 164] train avg loss 0.000423952, dev acc 0.8133, dev avg loss 0.518373, throughput 2.86493K wps
[Epoch 165 Batch 30/173] avg loss 0.000450873, throughput 2.9135K wps
[Epoch 165 Batch 60/173] avg loss 0.000387386, throughput 2.86077K wps
[Epoch 165 Batch 90/173] avg loss 0.000417742, throughput 2.83674K wps
[Epoch 165 Batch 120/173] avg loss 0.000457715, throughput 2.87613K wps
[Epoch 165 Batch 150/173] avg loss 0.000435451, throughput 2.87986K wps
Begin Testing...
[Epoch 165] train avg loss 0.000435474, dev acc 0.8144, dev avg loss 0.519619, throughput 2.86979K wps
[Epoch 166 Batch 30/173] avg loss 0.000438644, throughput 2.93534K wps
[Epoch 166 Batch 60/173] avg loss 0.000387154, throughput 2.87952K wps
[Epoch 166 Batch 90/173] avg loss 0.000410836, throughput 2.87211K wps
[Epoch 166 Batch 120/173] avg loss 0.000403881, throughput 2.84801K wps
[Epoch 166 Batch 150/173] avg loss 0.000412327, throughput 2.87922K wps
Begin Testing...
[Epoch 166] train avg loss 0.000411051, dev acc 0.8154, dev avg loss 0.518181, throughput 2.87723K wps
[Epoch 167 Batch 30/173] avg loss 0.000399616, throughput 2.90947K wps
[Epoch 167 Batch 60/173] avg loss 0.000397295, throughput 2.87163K wps
[Epoch 167 Batch 90/173] avg loss 0.000444857, throughput 2.88001K wps
[Epoch 167 Batch 120/173] avg loss 0.000358348, throughput 2.86415K wps
[Epoch 167 Batch 150/173] avg loss 0.000398259, throughput 2.81227K wps
Begin Testing...
[Epoch 167] train avg loss 0.000400088, dev acc 0.8165, dev avg loss 0.521167, throughput 2.86772K wps
[Epoch 168 Batch 30/173] avg loss 0.000401254, throughput 2.92187K wps
[Epoch 168 Batch 60/173] avg loss 0.000353708, throughput 2.86635K wps
[Epoch 168 Batch 90/173] avg loss 0.000357584, throughput 2.86533K wps
[Epoch 168 Batch 120/173] avg loss 0.000418565, throughput 2.82805K wps
[Epoch 168 Batch 150/173] avg loss 0.000392969, throughput 2.8655K wps
Begin Testing...
[Epoch 168] train avg loss 0.000393794, dev acc 0.8165, dev avg loss 0.52292, throughput 2.86949K wps
[Epoch 169 Batch 30/173] avg loss 0.000393331, throughput 2.90087K wps
[Epoch 169 Batch 60/173] avg loss 0.000352602, throughput 2.86756K wps
[Epoch 169 Batch 90/173] avg loss 0.000375332, throughput 2.87841K wps
[Epoch 169 Batch 120/173] avg loss 0.000428132, throughput 2.84233K wps
[Epoch 169 Batch 150/173] avg loss 0.000428261, throughput 2.87564K wps
Begin Testing...
[Epoch 169] train avg loss 0.000389977, dev acc 0.8175, dev avg loss 0.524795, throughput 2.87403K wps
[Epoch 170 Batch 30/173] avg loss 0.000381143, throughput 2.90561K wps
[Epoch 170 Batch 60/173] avg loss 0.000361121, throughput 2.84746K wps
[Epoch 170 Batch 90/173] avg loss 0.000348911, throughput 2.8547K wps
[Epoch 170 Batch 120/173] avg loss 0.000401922, throughput 2.87652K wps
[Epoch 170 Batch 150/173] avg loss 0.000408851, throughput 2.86867K wps
Begin Testing...
[Epoch 170] train avg loss 0.000383952, dev acc 0.8144, dev avg loss 0.527117, throughput 2.87128K wps
[Epoch 171 Batch 30/173] avg loss 0.000411144, throughput 2.92148K wps
[Epoch 171 Batch 60/173] avg loss 0.000346148, throughput 2.87464K wps
[Epoch 171 Batch 90/173] avg loss 0.000334819, throughput 2.82098K wps
[Epoch 171 Batch 120/173] avg loss 0.000323307, throughput 2.82923K wps
[Epoch 171 Batch 150/173] avg loss 0.000369973, throughput 2.86186K wps
Begin Testing...
[Epoch 171] train avg loss 0.000358508, dev acc 0.8133, dev avg loss 0.526126, throughput 2.86279K wps
[Epoch 172 Batch 30/173] avg loss 0.000445854, throughput 2.93041K wps
[Epoch 172 Batch 60/173] avg loss 0.000409031, throughput 2.87424K wps
[Epoch 172 Batch 90/173] avg loss 0.000370248, throughput 2.88049K wps
[Epoch 172 Batch 120/173] avg loss 0.000342262, throughput 2.87015K wps
[Epoch 172 Batch 150/173] avg loss 0.000349933, throughput 2.87433K wps
Begin Testing...
[Epoch 172] train avg loss 0.000379821, dev acc 0.8144, dev avg loss 0.528411, throughput 2.88452K wps
[Epoch 173 Batch 30/173] avg loss 0.000345506, throughput 2.9177K wps
[Epoch 173 Batch 60/173] avg loss 0.000332497, throughput 2.83309K wps
[Epoch 173 Batch 90/173] avg loss 0.000366754, throughput 2.81696K wps
[Epoch 173 Batch 120/173] avg loss 0.000393497, throughput 2.8777K wps
[Epoch 173 Batch 150/173] avg loss 0.000481014, throughput 2.87068K wps
Begin Testing...
[Epoch 173] train avg loss 0.000386291, dev acc 0.8133, dev avg loss 0.52911, throughput 2.86541K wps
[Epoch 174 Batch 30/173] avg loss 0.000374672, throughput 2.93794K wps
[Epoch 174 Batch 60/173] avg loss 0.000349609, throughput 2.81535K wps
[Epoch 174 Batch 90/173] avg loss 0.000373476, throughput 2.83078K wps
[Epoch 174 Batch 120/173] avg loss 0.00039918, throughput 2.85984K wps
[Epoch 174 Batch 150/173] avg loss 0.000414702, throughput 2.85861K wps
Begin Testing...
[Epoch 174] train avg loss 0.000377403, dev acc 0.8154, dev avg loss 0.531524, throughput 2.86321K wps
[Epoch 175 Batch 30/173] avg loss 0.000342332, throughput 2.92984K wps
[Epoch 175 Batch 60/173] avg loss 0.000352631, throughput 2.86938K wps
[Epoch 175 Batch 90/173] avg loss 0.000370726, throughput 2.88011K wps
[Epoch 175 Batch 120/173] avg loss 0.000385514, throughput 2.86553K wps
[Epoch 175 Batch 150/173] avg loss 0.000371746, throughput 2.87534K wps
Begin Testing...
[Epoch 175] train avg loss 0.000363213, dev acc 0.8123, dev avg loss 0.535168, throughput 2.88334K wps
[Epoch 176 Batch 30/173] avg loss 0.000360437, throughput 2.90494K wps
[Epoch 176 Batch 60/173] avg loss 0.000378346, throughput 2.84742K wps
[Epoch 176 Batch 90/173] avg loss 0.000373512, throughput 2.85728K wps
[Epoch 176 Batch 120/173] avg loss 0.000395119, throughput 2.85631K wps
[Epoch 176 Batch 150/173] avg loss 0.000401045, throughput 2.80969K wps
Begin Testing...
[Epoch 176] train avg loss 0.000375848, dev acc 0.8196, dev avg loss 0.532514, throughput 2.8557K wps
[Epoch 177 Batch 30/173] avg loss 0.000346271, throughput 2.8862K wps
[Epoch 177 Batch 60/173] avg loss 0.000365308, throughput 2.86207K wps
[Epoch 177 Batch 90/173] avg loss 0.000337867, throughput 2.82163K wps
[Epoch 177 Batch 120/173] avg loss 0.000369486, throughput 2.86329K wps
[Epoch 177 Batch 150/173] avg loss 0.000349567, throughput 2.85773K wps
Begin Testing...
[Epoch 177] train avg loss 0.000357741, dev acc 0.8133, dev avg loss 0.533756, throughput 2.8554K wps
[Epoch 178 Batch 30/173] avg loss 0.000360226, throughput 2.92602K wps
[Epoch 178 Batch 60/173] avg loss 0.000319949, throughput 2.8053K wps
[Epoch 178 Batch 90/173] avg loss 0.000306575, throughput 2.86831K wps
[Epoch 178 Batch 120/173] avg loss 0.000337023, throughput 2.87985K wps
[Epoch 178 Batch 150/173] avg loss 0.000362158, throughput 2.87878K wps
Begin Testing...
[Epoch 178] train avg loss 0.000350923, dev acc 0.8133, dev avg loss 0.53593, throughput 2.8719K wps
[Epoch 179 Batch 30/173] avg loss 0.000367185, throughput 2.92333K wps
[Epoch 179 Batch 60/173] avg loss 0.000339825, throughput 2.8807K wps
[Epoch 179 Batch 90/173] avg loss 0.000374897, throughput 2.86427K wps
[Epoch 179 Batch 120/173] avg loss 0.000374184, throughput 2.86394K wps
[Epoch 179 Batch 150/173] avg loss 0.000396754, throughput 2.87366K wps
Begin Testing...
[Epoch 179] train avg loss 0.000370763, dev acc 0.8144, dev avg loss 0.535998, throughput 2.88069K wps
[Epoch 180 Batch 30/173] avg loss 0.000333361, throughput 2.88203K wps
[Epoch 180 Batch 60/173] avg loss 0.000314629, throughput 2.82428K wps
[Epoch 180 Batch 90/173] avg loss 0.00039154, throughput 2.80847K wps
[Epoch 180 Batch 120/173] avg loss 0.000337926, throughput 2.81669K wps
[Epoch 180 Batch 150/173] avg loss 0.000354141, throughput 2.85886K wps
Begin Testing...
[Epoch 180] train avg loss 0.000348454, dev acc 0.8165, dev avg loss 0.537009, throughput 2.84276K wps
[Epoch 181 Batch 30/173] avg loss 0.000337242, throughput 2.87508K wps
[Epoch 181 Batch 60/173] avg loss 0.000390838, throughput 2.84183K wps
[Epoch 181 Batch 90/173] avg loss 0.000348069, throughput 2.81854K wps
[Epoch 181 Batch 120/173] avg loss 0.000364375, throughput 2.83425K wps
[Epoch 181 Batch 150/173] avg loss 0.000347608, throughput 2.84513K wps
Begin Testing...
[Epoch 181] train avg loss 0.000355286, dev acc 0.8144, dev avg loss 0.538383, throughput 2.84711K wps
[Epoch 182 Batch 30/173] avg loss 0.00033379, throughput 2.92657K wps
[Epoch 182 Batch 60/173] avg loss 0.000338806, throughput 2.88108K wps
[Epoch 182 Batch 90/173] avg loss 0.00029725, throughput 2.86507K wps
[Epoch 182 Batch 120/173] avg loss 0.000325911, throughput 2.8291K wps
[Epoch 182 Batch 150/173] avg loss 0.000343588, throughput 2.81599K wps
Begin Testing...
[Epoch 182] train avg loss 0.000331246, dev acc 0.8133, dev avg loss 0.542845, throughput 2.86536K wps
[Epoch 183 Batch 30/173] avg loss 0.000347233, throughput 2.93675K wps
[Epoch 183 Batch 60/173] avg loss 0.0003113, throughput 2.85425K wps
[Epoch 183 Batch 90/173] avg loss 0.000313947, throughput 2.80555K wps
[Epoch 183 Batch 120/173] avg loss 0.000353678, throughput 2.80507K wps
[Epoch 183 Batch 150/173] avg loss 0.000355585, throughput 2.8778K wps
Begin Testing...
[Epoch 183] train avg loss 0.000341191, dev acc 0.8144, dev avg loss 0.539825, throughput 2.85756K wps
[Epoch 184 Batch 30/173] avg loss 0.000292533, throughput 2.92871K wps
[Epoch 184 Batch 60/173] avg loss 0.000324185, throughput 2.80593K wps
[Epoch 184 Batch 90/173] avg loss 0.000323246, throughput 2.86095K wps
[Epoch 184 Batch 120/173] avg loss 0.000335316, throughput 2.87525K wps
[Epoch 184 Batch 150/173] avg loss 0.000364705, throughput 2.85756K wps
Begin Testing...
[Epoch 184] train avg loss 0.000327232, dev acc 0.8133, dev avg loss 0.541044, throughput 2.864K wps
[Epoch 185 Batch 30/173] avg loss 0.000318221, throughput 2.89148K wps
[Epoch 185 Batch 60/173] avg loss 0.000385916, throughput 2.83157K wps
[Epoch 185 Batch 90/173] avg loss 0.000344781, throughput 2.79692K wps
[Epoch 185 Batch 120/173] avg loss 0.000339926, throughput 2.83386K wps
[Epoch 185 Batch 150/173] avg loss 0.000322971, throughput 2.87668K wps
Begin Testing...
[Epoch 185] train avg loss 0.000344164, dev acc 0.8133, dev avg loss 0.541551, throughput 2.84729K wps
[Epoch 186 Batch 30/173] avg loss 0.000300747, throughput 2.94273K wps
[Epoch 186 Batch 60/173] avg loss 0.000329857, throughput 2.84109K wps
[Epoch 186 Batch 90/173] avg loss 0.000306892, throughput 2.8466K wps
[Epoch 186 Batch 120/173] avg loss 0.000360547, throughput 2.83556K wps
[Epoch 186 Batch 150/173] avg loss 0.000335821, throughput 2.85809K wps
Begin Testing...
[Epoch 186] train avg loss 0.000324026, dev acc 0.8123, dev avg loss 0.545286, throughput 2.86468K wps
[Epoch 187 Batch 30/173] avg loss 0.00030914, throughput 2.9353K wps
[Epoch 187 Batch 60/173] avg loss 0.000315778, throughput 2.86386K wps
[Epoch 187 Batch 90/173] avg loss 0.000308116, throughput 2.83668K wps
[Epoch 187 Batch 120/173] avg loss 0.000359371, throughput 2.83694K wps
[Epoch 187 Batch 150/173] avg loss 0.000311032, throughput 2.85801K wps
Begin Testing...
[Epoch 187] train avg loss 0.000321147, dev acc 0.8206, dev avg loss 0.544391, throughput 2.86328K wps
[Epoch 188 Batch 30/173] avg loss 0.000317541, throughput 2.93997K wps
[Epoch 188 Batch 60/173] avg loss 0.000399703, throughput 2.8856K wps
[Epoch 188 Batch 90/173] avg loss 0.000323345, throughput 2.82792K wps
[Epoch 188 Batch 120/173] avg loss 0.000346567, throughput 2.82848K wps
[Epoch 188 Batch 150/173] avg loss 0.000339748, throughput 2.87917K wps
Begin Testing...
[Epoch 188] train avg loss 0.000348154, dev acc 0.8238, dev avg loss 0.546551, throughput 2.87349K wps
[Epoch 189 Batch 30/173] avg loss 0.000333459, throughput 2.94793K wps
[Epoch 189 Batch 60/173] avg loss 0.000291648, throughput 2.85317K wps
[Epoch 189 Batch 90/173] avg loss 0.00034129, throughput 2.86572K wps
[Epoch 189 Batch 120/173] avg loss 0.000346427, throughput 2.8791K wps
[Epoch 189 Batch 150/173] avg loss 0.000302774, throughput 2.82064K wps
Begin Testing...
[Epoch 189] train avg loss 0.00032198, dev acc 0.8133, dev avg loss 0.544532, throughput 2.86383K wps
[Epoch 190 Batch 30/173] avg loss 0.000341964, throughput 2.94232K wps
[Epoch 190 Batch 60/173] avg loss 0.000303311, throughput 2.87459K wps
[Epoch 190 Batch 90/173] avg loss 0.000326289, throughput 2.83255K wps
[Epoch 190 Batch 120/173] avg loss 0.000285945, throughput 2.83967K wps
[Epoch 190 Batch 150/173] avg loss 0.000330748, throughput 2.83325K wps
Begin Testing...
[Epoch 190] train avg loss 0.000323133, dev acc 0.8133, dev avg loss 0.546563, throughput 2.86387K wps
[Epoch 191 Batch 30/173] avg loss 0.000289033, throughput 2.87265K wps
[Epoch 191 Batch 60/173] avg loss 0.000333591, throughput 2.81394K wps
[Epoch 191 Batch 90/173] avg loss 0.000331517, throughput 2.82819K wps
[Epoch 191 Batch 120/173] avg loss 0.000290565, throughput 2.86214K wps
[Epoch 191 Batch 150/173] avg loss 0.000368119, throughput 2.87823K wps
Begin Testing...
[Epoch 191] train avg loss 0.00032354, dev acc 0.8238, dev avg loss 0.548522, throughput 2.85496K wps
[Epoch 192 Batch 30/173] avg loss 0.000335223, throughput 2.92165K wps
[Epoch 192 Batch 60/173] avg loss 0.000305069, throughput 2.84251K wps
[Epoch 192 Batch 90/173] avg loss 0.000307413, throughput 2.85794K wps
[Epoch 192 Batch 120/173] avg loss 0.000308593, throughput 2.87316K wps
[Epoch 192 Batch 150/173] avg loss 0.000330111, throughput 2.87373K wps
Begin Testing...
[Epoch 192] train avg loss 0.000322904, dev acc 0.8133, dev avg loss 0.550344, throughput 2.872K wps
[Epoch 193 Batch 30/173] avg loss 0.000308836, throughput 2.93585K wps
[Epoch 193 Batch 60/173] avg loss 0.000309375, throughput 2.87389K wps
[Epoch 193 Batch 90/173] avg loss 0.000296391, throughput 2.84091K wps
[Epoch 193 Batch 120/173] avg loss 0.000353055, throughput 2.87505K wps
[Epoch 193 Batch 150/173] avg loss 0.0003693, throughput 2.82458K wps
Begin Testing...
[Epoch 193] train avg loss 0.00032678, dev acc 0.8154, dev avg loss 0.548111, throughput 2.86328K wps
[Epoch 194 Batch 30/173] avg loss 0.000290375, throughput 2.92256K wps
[Epoch 194 Batch 60/173] avg loss 0.000285641, throughput 2.84046K wps
[Epoch 194 Batch 90/173] avg loss 0.000320104, throughput 2.8027K wps
[Epoch 194 Batch 120/173] avg loss 0.000300135, throughput 2.85796K wps
[Epoch 194 Batch 150/173] avg loss 0.000311332, throughput 2.87355K wps
Begin Testing...
[Epoch 194] train avg loss 0.000306594, dev acc 0.8133, dev avg loss 0.549271, throughput 2.86165K wps
[Epoch 195 Batch 30/173] avg loss 0.000269934, throughput 2.86953K wps
[Epoch 195 Batch 60/173] avg loss 0.000285961, throughput 2.87457K wps
[Epoch 195 Batch 90/173] avg loss 0.000319094, throughput 2.83158K wps
[Epoch 195 Batch 120/173] avg loss 0.000304119, throughput 2.85406K wps
[Epoch 195 Batch 150/173] avg loss 0.000296863, throughput 2.86049K wps
Begin Testing...
[Epoch 195] train avg loss 0.000291035, dev acc 0.8123, dev avg loss 0.5511, throughput 2.8576K wps
[Epoch 196 Batch 30/173] avg loss 0.000327403, throughput 2.93939K wps
[Epoch 196 Batch 60/173] avg loss 0.000317476, throughput 2.86483K wps
[Epoch 196 Batch 90/173] avg loss 0.000294847, throughput 2.88546K wps
[Epoch 196 Batch 120/173] avg loss 0.000323524, throughput 2.8483K wps
[Epoch 196 Batch 150/173] avg loss 0.000267339, throughput 2.84432K wps
Begin Testing...
[Epoch 196] train avg loss 0.000313247, dev acc 0.8113, dev avg loss 0.554138, throughput 2.87106K wps
[Epoch 197 Batch 30/173] avg loss 0.000275383, throughput 2.93472K wps
[Epoch 197 Batch 60/173] avg loss 0.000348104, throughput 2.8397K wps
[Epoch 197 Batch 90/173] avg loss 0.000249777, throughput 2.81565K wps
[Epoch 197 Batch 120/173] avg loss 0.000354153, throughput 2.86089K wps
[Epoch 197 Batch 150/173] avg loss 0.000333436, throughput 2.86327K wps
Begin Testing...
[Epoch 197] train avg loss 0.000307079, dev acc 0.8144, dev avg loss 0.55185, throughput 2.86145K wps
[Epoch 198 Batch 30/173] avg loss 0.000232883, throughput 2.90326K wps
[Epoch 198 Batch 60/173] avg loss 0.000284285, throughput 2.83928K wps
[Epoch 198 Batch 90/173] avg loss 0.000271507, throughput 2.86818K wps
[Epoch 198 Batch 120/173] avg loss 0.000331221, throughput 2.87038K wps
[Epoch 198 Batch 150/173] avg loss 0.000320429, throughput 2.88085K wps
Begin Testing...
[Epoch 198] train avg loss 0.000282185, dev acc 0.8123, dev avg loss 0.555547, throughput 2.87342K wps
[Epoch 199 Batch 30/173] avg loss 0.000263686, throughput 2.93648K wps
[Epoch 199 Batch 60/173] avg loss 0.000256518, throughput 2.86123K wps
[Epoch 199 Batch 90/173] avg loss 0.000296086, throughput 2.85586K wps
[Epoch 199 Batch 120/173] avg loss 0.000274071, throughput 2.82646K wps
[Epoch 199 Batch 150/173] avg loss 0.000275375, throughput 2.86888K wps
Begin Testing...
[Epoch 199] train avg loss 0.00026952, dev acc 0.8133, dev avg loss 0.555294, throughput 2.87055K wps
Test loss 0.440608, test acc 0.8002
Total time cost 698.05s
[Epoch 0 Batch 30/173] avg loss 0.0139885, throughput 2.45311K wps
[Epoch 0 Batch 60/173] avg loss 0.0140288, throughput 2.85265K wps
[Epoch 0 Batch 90/173] avg loss 0.013975, throughput 2.8754K wps
[Epoch 0 Batch 120/173] avg loss 0.0138342, throughput 2.86548K wps
[Epoch 0 Batch 150/173] avg loss 0.0139011, throughput 2.87822K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139536, dev acc 0.6100, dev avg loss 0.683774, throughput 2.78544K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138191, throughput 2.9363K wps
[Epoch 1 Batch 60/173] avg loss 0.013752, throughput 2.8767K wps
[Epoch 1 Batch 90/173] avg loss 0.0137515, throughput 2.87301K wps
[Epoch 1 Batch 120/173] avg loss 0.0138076, throughput 2.86613K wps
[Epoch 1 Batch 150/173] avg loss 0.0135841, throughput 2.84055K wps
Begin Testing...
[Epoch 1] train avg loss 0.0137588, dev acc 0.6548, dev avg loss 0.677479, throughput 2.87005K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0135807, throughput 2.9087K wps
[Epoch 2 Batch 60/173] avg loss 0.0135775, throughput 2.86476K wps
[Epoch 2 Batch 90/173] avg loss 0.0135737, throughput 2.87959K wps
[Epoch 2 Batch 120/173] avg loss 0.0136534, throughput 2.80576K wps
[Epoch 2 Batch 150/173] avg loss 0.0134897, throughput 2.86769K wps
Begin Testing...
[Epoch 2] train avg loss 0.0135881, dev acc 0.6851, dev avg loss 0.670157, throughput 2.86723K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0134526, throughput 2.94604K wps
[Epoch 3 Batch 60/173] avg loss 0.0135152, throughput 2.87792K wps
[Epoch 3 Batch 90/173] avg loss 0.013377, throughput 2.87964K wps
[Epoch 3 Batch 120/173] avg loss 0.0134085, throughput 2.81923K wps
[Epoch 3 Batch 150/173] avg loss 0.0134416, throughput 2.87737K wps
Begin Testing...
[Epoch 3] train avg loss 0.0134528, dev acc 0.6893, dev avg loss 0.663581, throughput 2.87743K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0133008, throughput 2.90956K wps
[Epoch 4 Batch 60/173] avg loss 0.0132023, throughput 2.84753K wps
[Epoch 4 Batch 90/173] avg loss 0.0133223, throughput 2.87579K wps
[Epoch 4 Batch 120/173] avg loss 0.0131547, throughput 2.85108K wps
[Epoch 4 Batch 150/173] avg loss 0.0131872, throughput 2.88058K wps
Begin Testing...
[Epoch 4] train avg loss 0.013259, dev acc 0.7018, dev avg loss 0.655159, throughput 2.87066K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0130711, throughput 2.943K wps
[Epoch 5 Batch 60/173] avg loss 0.0130871, throughput 2.86952K wps
[Epoch 5 Batch 90/173] avg loss 0.0130127, throughput 2.87866K wps
[Epoch 5 Batch 120/173] avg loss 0.0130958, throughput 2.87624K wps
[Epoch 5 Batch 150/173] avg loss 0.0130964, throughput 2.88179K wps
Begin Testing...
[Epoch 5] train avg loss 0.013074, dev acc 0.7174, dev avg loss 0.646116, throughput 2.88851K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0128958, throughput 2.92464K wps
[Epoch 6 Batch 60/173] avg loss 0.0128169, throughput 2.85463K wps
[Epoch 6 Batch 90/173] avg loss 0.0128748, throughput 2.85842K wps
[Epoch 6 Batch 120/173] avg loss 0.0128619, throughput 2.86385K wps
[Epoch 6 Batch 150/173] avg loss 0.0129392, throughput 2.86672K wps
Begin Testing...
[Epoch 6] train avg loss 0.0128644, dev acc 0.7153, dev avg loss 0.636593, throughput 2.87142K wps
[Epoch 7 Batch 30/173] avg loss 0.0127788, throughput 2.92762K wps
[Epoch 7 Batch 60/173] avg loss 0.0126927, throughput 2.87235K wps
[Epoch 7 Batch 90/173] avg loss 0.0126812, throughput 2.87448K wps
[Epoch 7 Batch 120/173] avg loss 0.0125406, throughput 2.81063K wps
[Epoch 7 Batch 150/173] avg loss 0.0125015, throughput 2.85143K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126395, dev acc 0.7372, dev avg loss 0.625981, throughput 2.86874K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0124985, throughput 2.94754K wps
[Epoch 8 Batch 60/173] avg loss 0.0124337, throughput 2.88326K wps
[Epoch 8 Batch 90/173] avg loss 0.0124383, throughput 2.87563K wps
[Epoch 8 Batch 120/173] avg loss 0.0124128, throughput 2.84949K wps
[Epoch 8 Batch 150/173] avg loss 0.0122014, throughput 2.86926K wps
Begin Testing...
[Epoch 8] train avg loss 0.0123903, dev acc 0.6966, dev avg loss 0.618531, throughput 2.88411K wps
[Epoch 9 Batch 30/173] avg loss 0.012144, throughput 2.91649K wps
[Epoch 9 Batch 60/173] avg loss 0.0120883, throughput 2.86367K wps
[Epoch 9 Batch 90/173] avg loss 0.0121959, throughput 2.86503K wps
[Epoch 9 Batch 120/173] avg loss 0.0119532, throughput 2.8651K wps
[Epoch 9 Batch 150/173] avg loss 0.0120715, throughput 2.87982K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121112, dev acc 0.7299, dev avg loss 0.603448, throughput 2.8774K wps
[Epoch 10 Batch 30/173] avg loss 0.0119295, throughput 2.88511K wps
[Epoch 10 Batch 60/173] avg loss 0.0120619, throughput 2.87708K wps
[Epoch 10 Batch 90/173] avg loss 0.0118864, throughput 2.8709K wps
[Epoch 10 Batch 120/173] avg loss 0.0118947, throughput 2.86253K wps
[Epoch 10 Batch 150/173] avg loss 0.0118387, throughput 2.87769K wps
Begin Testing...
[Epoch 10] train avg loss 0.0118913, dev acc 0.7237, dev avg loss 0.593044, throughput 2.87492K wps
[Epoch 11 Batch 30/173] avg loss 0.011481, throughput 2.92368K wps
[Epoch 11 Batch 60/173] avg loss 0.0118786, throughput 2.86421K wps
[Epoch 11 Batch 90/173] avg loss 0.0114498, throughput 2.8832K wps
[Epoch 11 Batch 120/173] avg loss 0.0116547, throughput 2.88335K wps
[Epoch 11 Batch 150/173] avg loss 0.0116745, throughput 2.86938K wps
Begin Testing...
[Epoch 11] train avg loss 0.0116158, dev acc 0.7445, dev avg loss 0.579682, throughput 2.88283K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/173] avg loss 0.0112985, throughput 2.93818K wps
[Epoch 12 Batch 60/173] avg loss 0.0112001, throughput 2.87939K wps
[Epoch 12 Batch 90/173] avg loss 0.0115326, throughput 2.85881K wps
[Epoch 12 Batch 120/173] avg loss 0.0114849, throughput 2.88282K wps
[Epoch 12 Batch 150/173] avg loss 0.0109816, throughput 2.88244K wps
Begin Testing...
[Epoch 12] train avg loss 0.0112968, dev acc 0.7487, dev avg loss 0.567312, throughput 2.88741K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0110542, throughput 2.85965K wps
[Epoch 13 Batch 60/173] avg loss 0.0111114, throughput 2.81642K wps
[Epoch 13 Batch 90/173] avg loss 0.0110162, throughput 2.8745K wps
[Epoch 13 Batch 120/173] avg loss 0.0109905, throughput 2.86694K wps
[Epoch 13 Batch 150/173] avg loss 0.010828, throughput 2.86909K wps
Begin Testing...
[Epoch 13] train avg loss 0.0110137, dev acc 0.7508, dev avg loss 0.555969, throughput 2.84961K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.0105928, throughput 2.94057K wps
[Epoch 14 Batch 60/173] avg loss 0.0106576, throughput 2.86738K wps
[Epoch 14 Batch 90/173] avg loss 0.0106207, throughput 2.86959K wps
[Epoch 14 Batch 120/173] avg loss 0.0106273, throughput 2.86929K wps
[Epoch 14 Batch 150/173] avg loss 0.0107176, throughput 2.85972K wps
Begin Testing...
[Epoch 14] train avg loss 0.0106568, dev acc 0.7445, dev avg loss 0.546345, throughput 2.87574K wps
[Epoch 15 Batch 30/173] avg loss 0.0104171, throughput 2.90672K wps
[Epoch 15 Batch 60/173] avg loss 0.0104914, throughput 2.85219K wps
[Epoch 15 Batch 90/173] avg loss 0.0104458, throughput 2.88106K wps
[Epoch 15 Batch 120/173] avg loss 0.0104793, throughput 2.88389K wps
[Epoch 15 Batch 150/173] avg loss 0.0103223, throughput 2.85816K wps
Begin Testing...
[Epoch 15] train avg loss 0.0104409, dev acc 0.7602, dev avg loss 0.533402, throughput 2.86856K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0102742, throughput 2.89195K wps
[Epoch 16 Batch 60/173] avg loss 0.0103414, throughput 2.88719K wps
[Epoch 16 Batch 90/173] avg loss 0.00987811, throughput 2.8742K wps
[Epoch 16 Batch 120/173] avg loss 0.0101421, throughput 2.85224K wps
[Epoch 16 Batch 150/173] avg loss 0.010111, throughput 2.84766K wps
Begin Testing...
[Epoch 16] train avg loss 0.010131, dev acc 0.7685, dev avg loss 0.522593, throughput 2.87028K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/173] avg loss 0.00983152, throughput 2.91797K wps
[Epoch 17 Batch 60/173] avg loss 0.00980189, throughput 2.88006K wps
[Epoch 17 Batch 90/173] avg loss 0.0102239, throughput 2.8726K wps
[Epoch 17 Batch 120/173] avg loss 0.00980059, throughput 2.88715K wps
[Epoch 17 Batch 150/173] avg loss 0.00966106, throughput 2.84569K wps
Begin Testing...
[Epoch 17] train avg loss 0.00988494, dev acc 0.7737, dev avg loss 0.51274, throughput 2.88088K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.00966367, throughput 2.89032K wps
[Epoch 18 Batch 60/173] avg loss 0.00966053, throughput 2.85689K wps
[Epoch 18 Batch 90/173] avg loss 0.00989386, throughput 2.82585K wps
[Epoch 18 Batch 120/173] avg loss 0.00982418, throughput 2.79872K wps
[Epoch 18 Batch 150/173] avg loss 0.00958254, throughput 2.87781K wps
Begin Testing...
[Epoch 18] train avg loss 0.00969279, dev acc 0.7779, dev avg loss 0.50418, throughput 2.84577K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.00939319, throughput 2.9333K wps
[Epoch 19 Batch 60/173] avg loss 0.00923051, throughput 2.87001K wps
[Epoch 19 Batch 90/173] avg loss 0.00946095, throughput 2.87707K wps
[Epoch 19 Batch 120/173] avg loss 0.00954097, throughput 2.86715K wps
[Epoch 19 Batch 150/173] avg loss 0.00952411, throughput 2.87227K wps
Begin Testing...
[Epoch 19] train avg loss 0.00941746, dev acc 0.7810, dev avg loss 0.495465, throughput 2.8825K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.00920667, throughput 2.8808K wps
[Epoch 20 Batch 60/173] avg loss 0.0093187, throughput 2.88534K wps
[Epoch 20 Batch 90/173] avg loss 0.00914373, throughput 2.88658K wps
[Epoch 20 Batch 120/173] avg loss 0.00936041, throughput 2.85562K wps
[Epoch 20 Batch 150/173] avg loss 0.0088263, throughput 2.81975K wps
Begin Testing...
[Epoch 20] train avg loss 0.00916826, dev acc 0.7748, dev avg loss 0.489614, throughput 2.86744K wps
[Epoch 21 Batch 30/173] avg loss 0.00920282, throughput 2.92197K wps
[Epoch 21 Batch 60/173] avg loss 0.00873718, throughput 2.85843K wps
[Epoch 21 Batch 90/173] avg loss 0.00878544, throughput 2.79761K wps
[Epoch 21 Batch 120/173] avg loss 0.00873418, throughput 2.86243K wps
[Epoch 21 Batch 150/173] avg loss 0.00906858, throughput 2.87561K wps
Begin Testing...
[Epoch 21] train avg loss 0.00893598, dev acc 0.7810, dev avg loss 0.482136, throughput 2.86383K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.0088568, throughput 2.87896K wps
[Epoch 22 Batch 60/173] avg loss 0.00890958, throughput 2.85485K wps
[Epoch 22 Batch 90/173] avg loss 0.00863791, throughput 2.86815K wps
[Epoch 22 Batch 120/173] avg loss 0.00837537, throughput 2.84679K wps
[Epoch 22 Batch 150/173] avg loss 0.00857147, throughput 2.87113K wps
Begin Testing...
[Epoch 22] train avg loss 0.00874989, dev acc 0.7769, dev avg loss 0.476999, throughput 2.86581K wps
[Epoch 23 Batch 30/173] avg loss 0.00855375, throughput 2.93918K wps
[Epoch 23 Batch 60/173] avg loss 0.0084894, throughput 2.86559K wps
[Epoch 23 Batch 90/173] avg loss 0.00853754, throughput 2.87066K wps
[Epoch 23 Batch 120/173] avg loss 0.00823443, throughput 2.87202K wps
[Epoch 23 Batch 150/173] avg loss 0.00849789, throughput 2.87064K wps
Begin Testing...
[Epoch 23] train avg loss 0.00852757, dev acc 0.7810, dev avg loss 0.470792, throughput 2.87894K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.00833636, throughput 2.91953K wps
[Epoch 24 Batch 60/173] avg loss 0.00801893, throughput 2.8599K wps
[Epoch 24 Batch 90/173] avg loss 0.00844696, throughput 2.87688K wps
[Epoch 24 Batch 120/173] avg loss 0.00849889, throughput 2.88273K wps
[Epoch 24 Batch 150/173] avg loss 0.00822889, throughput 2.88701K wps
Begin Testing...
[Epoch 24] train avg loss 0.00831472, dev acc 0.7779, dev avg loss 0.466671, throughput 2.88367K wps
[Epoch 25 Batch 30/173] avg loss 0.00831569, throughput 2.91274K wps
[Epoch 25 Batch 60/173] avg loss 0.00821586, throughput 2.87573K wps
[Epoch 25 Batch 90/173] avg loss 0.00812445, throughput 2.82948K wps
[Epoch 25 Batch 120/173] avg loss 0.00817102, throughput 2.81438K wps
[Epoch 25 Batch 150/173] avg loss 0.00803518, throughput 2.86404K wps
Begin Testing...
[Epoch 25] train avg loss 0.00818423, dev acc 0.7831, dev avg loss 0.464035, throughput 2.86139K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00773624, throughput 2.9398K wps
[Epoch 26 Batch 60/173] avg loss 0.00806583, throughput 2.86241K wps
[Epoch 26 Batch 90/173] avg loss 0.00802357, throughput 2.84689K wps
[Epoch 26 Batch 120/173] avg loss 0.00775676, throughput 2.86768K wps
[Epoch 26 Batch 150/173] avg loss 0.00783096, throughput 2.81333K wps
Begin Testing...
[Epoch 26] train avg loss 0.00790329, dev acc 0.7862, dev avg loss 0.457278, throughput 2.86496K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00787539, throughput 2.94625K wps
[Epoch 27 Batch 60/173] avg loss 0.00769244, throughput 2.88299K wps
[Epoch 27 Batch 90/173] avg loss 0.00809658, throughput 2.8359K wps
[Epoch 27 Batch 120/173] avg loss 0.00781445, throughput 2.86399K wps
[Epoch 27 Batch 150/173] avg loss 0.00748941, throughput 2.86696K wps
Begin Testing...
[Epoch 27] train avg loss 0.00780079, dev acc 0.7831, dev avg loss 0.454828, throughput 2.87762K wps
[Epoch 28 Batch 30/173] avg loss 0.00743666, throughput 2.84919K wps
[Epoch 28 Batch 60/173] avg loss 0.0072502, throughput 2.84535K wps
[Epoch 28 Batch 90/173] avg loss 0.00778758, throughput 2.88447K wps
[Epoch 28 Batch 120/173] avg loss 0.00762344, throughput 2.8839K wps
[Epoch 28 Batch 150/173] avg loss 0.00757724, throughput 2.8763K wps
Begin Testing...
[Epoch 28] train avg loss 0.00758496, dev acc 0.7873, dev avg loss 0.448824, throughput 2.86643K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.00717805, throughput 2.87039K wps
[Epoch 29 Batch 60/173] avg loss 0.00764375, throughput 2.86108K wps
[Epoch 29 Batch 90/173] avg loss 0.00728434, throughput 2.8387K wps
[Epoch 29 Batch 120/173] avg loss 0.00745826, throughput 2.87155K wps
[Epoch 29 Batch 150/173] avg loss 0.00746326, throughput 2.86296K wps
Begin Testing...
[Epoch 29] train avg loss 0.00744748, dev acc 0.7894, dev avg loss 0.447407, throughput 2.85261K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.00734647, throughput 2.92002K wps
[Epoch 30 Batch 60/173] avg loss 0.00712797, throughput 2.86584K wps
[Epoch 30 Batch 90/173] avg loss 0.00737447, throughput 2.85454K wps
[Epoch 30 Batch 120/173] avg loss 0.0072575, throughput 2.86785K wps
[Epoch 30 Batch 150/173] avg loss 0.00683675, throughput 2.83877K wps
Begin Testing...
[Epoch 30] train avg loss 0.00722087, dev acc 0.7883, dev avg loss 0.445677, throughput 2.87097K wps
[Epoch 31 Batch 30/173] avg loss 0.00697096, throughput 2.90265K wps
[Epoch 31 Batch 60/173] avg loss 0.00715423, throughput 2.84137K wps
[Epoch 31 Batch 90/173] avg loss 0.00701588, throughput 2.85415K wps
[Epoch 31 Batch 120/173] avg loss 0.00735627, throughput 2.79495K wps
[Epoch 31 Batch 150/173] avg loss 0.00654029, throughput 2.81862K wps
Begin Testing...
[Epoch 31] train avg loss 0.00704392, dev acc 0.7852, dev avg loss 0.441521, throughput 2.83565K wps
[Epoch 32 Batch 30/173] avg loss 0.00672162, throughput 2.8765K wps
[Epoch 32 Batch 60/173] avg loss 0.006808, throughput 2.86813K wps
[Epoch 32 Batch 90/173] avg loss 0.00681345, throughput 2.87227K wps
[Epoch 32 Batch 120/173] avg loss 0.00687577, throughput 2.87755K wps
[Epoch 32 Batch 150/173] avg loss 0.00691685, throughput 2.86832K wps
Begin Testing...
[Epoch 32] train avg loss 0.00683696, dev acc 0.7883, dev avg loss 0.439172, throughput 2.85954K wps
[Epoch 33 Batch 30/173] avg loss 0.00678608, throughput 2.91641K wps
[Epoch 33 Batch 60/173] avg loss 0.0068573, throughput 2.78646K wps
[Epoch 33 Batch 90/173] avg loss 0.00703853, throughput 2.85069K wps
[Epoch 33 Batch 120/173] avg loss 0.00662453, throughput 2.61424K wps
[Epoch 33 Batch 150/173] avg loss 0.00661907, throughput 2.85953K wps
Begin Testing...
[Epoch 33] train avg loss 0.00678812, dev acc 0.7883, dev avg loss 0.436711, throughput 2.81012K wps
[Epoch 34 Batch 30/173] avg loss 0.00648539, throughput 2.87643K wps
[Epoch 34 Batch 60/173] avg loss 0.00662161, throughput 2.81607K wps
[Epoch 34 Batch 90/173] avg loss 0.00660272, throughput 2.86755K wps
[Epoch 34 Batch 120/173] avg loss 0.0063389, throughput 2.88045K wps
[Epoch 34 Batch 150/173] avg loss 0.00629518, throughput 2.84707K wps
Begin Testing...
[Epoch 34] train avg loss 0.00648797, dev acc 0.7894, dev avg loss 0.434084, throughput 2.85903K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/173] avg loss 0.00658309, throughput 2.92603K wps
[Epoch 35 Batch 60/173] avg loss 0.00634125, throughput 2.83176K wps
[Epoch 35 Batch 90/173] avg loss 0.00668955, throughput 2.86351K wps
[Epoch 35 Batch 120/173] avg loss 0.00612692, throughput 2.82267K wps
[Epoch 35 Batch 150/173] avg loss 0.00659616, throughput 2.86612K wps
Begin Testing...
[Epoch 35] train avg loss 0.00642125, dev acc 0.7873, dev avg loss 0.432364, throughput 2.85697K wps
[Epoch 36 Batch 30/173] avg loss 0.00636654, throughput 2.88383K wps
[Epoch 36 Batch 60/173] avg loss 0.00629686, throughput 2.85104K wps
[Epoch 36 Batch 90/173] avg loss 0.00624742, throughput 2.86122K wps
[Epoch 36 Batch 120/173] avg loss 0.00599998, throughput 2.86511K wps
[Epoch 36 Batch 150/173] avg loss 0.00596546, throughput 2.87214K wps
Begin Testing...
[Epoch 36] train avg loss 0.0062387, dev acc 0.7862, dev avg loss 0.434059, throughput 2.86784K wps
[Epoch 37 Batch 30/173] avg loss 0.00601609, throughput 2.93125K wps
[Epoch 37 Batch 60/173] avg loss 0.00597531, throughput 2.87207K wps
[Epoch 37 Batch 90/173] avg loss 0.00581511, throughput 2.86553K wps
[Epoch 37 Batch 120/173] avg loss 0.006329, throughput 2.88117K wps
[Epoch 37 Batch 150/173] avg loss 0.00627114, throughput 2.83932K wps
Begin Testing...
[Epoch 37] train avg loss 0.00610164, dev acc 0.7852, dev avg loss 0.432767, throughput 2.87629K wps
[Epoch 38 Batch 30/173] avg loss 0.00620213, throughput 2.90813K wps
[Epoch 38 Batch 60/173] avg loss 0.00596406, throughput 2.87768K wps
[Epoch 38 Batch 90/173] avg loss 0.00574627, throughput 2.87384K wps
[Epoch 38 Batch 120/173] avg loss 0.00566363, throughput 2.82652K wps
[Epoch 38 Batch 150/173] avg loss 0.00592805, throughput 2.83231K wps
Begin Testing...
[Epoch 38] train avg loss 0.00593098, dev acc 0.7914, dev avg loss 0.427984, throughput 2.86425K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.00581754, throughput 2.93837K wps
[Epoch 39 Batch 60/173] avg loss 0.00563027, throughput 2.87684K wps
[Epoch 39 Batch 90/173] avg loss 0.00605582, throughput 2.84927K wps
[Epoch 39 Batch 120/173] avg loss 0.00592691, throughput 2.81051K wps
[Epoch 39 Batch 150/173] avg loss 0.0057187, throughput 2.82365K wps
Begin Testing...
[Epoch 39] train avg loss 0.00580875, dev acc 0.7904, dev avg loss 0.427086, throughput 2.8605K wps
[Epoch 40 Batch 30/173] avg loss 0.00567448, throughput 2.87408K wps
[Epoch 40 Batch 60/173] avg loss 0.00565089, throughput 2.83174K wps
[Epoch 40 Batch 90/173] avg loss 0.00581039, throughput 2.84513K wps
[Epoch 40 Batch 120/173] avg loss 0.00542995, throughput 2.86316K wps
[Epoch 40 Batch 150/173] avg loss 0.00567227, throughput 2.87503K wps
Begin Testing...
[Epoch 40] train avg loss 0.00566166, dev acc 0.7977, dev avg loss 0.425461, throughput 2.85903K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/173] avg loss 0.00543224, throughput 2.86453K wps
[Epoch 41 Batch 60/173] avg loss 0.00584546, throughput 2.85637K wps
[Epoch 41 Batch 90/173] avg loss 0.00522856, throughput 2.87588K wps
[Epoch 41 Batch 120/173] avg loss 0.00577702, throughput 2.87152K wps
[Epoch 41 Batch 150/173] avg loss 0.0052003, throughput 2.87657K wps
Begin Testing...
[Epoch 41] train avg loss 0.0055122, dev acc 0.7977, dev avg loss 0.424931, throughput 2.87148K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.00511017, throughput 2.88134K wps
[Epoch 42 Batch 60/173] avg loss 0.00568026, throughput 2.8724K wps
[Epoch 42 Batch 90/173] avg loss 0.00532734, throughput 2.87484K wps
[Epoch 42 Batch 120/173] avg loss 0.00526616, throughput 2.87343K wps
[Epoch 42 Batch 150/173] avg loss 0.00539943, throughput 2.85551K wps
Begin Testing...
[Epoch 42] train avg loss 0.00536795, dev acc 0.7925, dev avg loss 0.43004, throughput 2.86199K wps
[Epoch 43 Batch 30/173] avg loss 0.00504374, throughput 2.84721K wps
[Epoch 43 Batch 60/173] avg loss 0.00507518, throughput 2.85988K wps
[Epoch 43 Batch 90/173] avg loss 0.00524795, throughput 2.83582K wps
[Epoch 43 Batch 120/173] avg loss 0.00534785, throughput 2.86755K wps
[Epoch 43 Batch 150/173] avg loss 0.00520477, throughput 2.84556K wps
Begin Testing...
[Epoch 43] train avg loss 0.00517655, dev acc 0.7956, dev avg loss 0.424761, throughput 2.85243K wps
[Epoch 44 Batch 30/173] avg loss 0.00503883, throughput 2.94721K wps
[Epoch 44 Batch 60/173] avg loss 0.00505794, throughput 2.86419K wps
[Epoch 44 Batch 90/173] avg loss 0.00516958, throughput 2.87317K wps
[Epoch 44 Batch 120/173] avg loss 0.00483347, throughput 2.87707K wps
[Epoch 44 Batch 150/173] avg loss 0.00532607, throughput 2.87613K wps
Begin Testing...
[Epoch 44] train avg loss 0.00510103, dev acc 0.7956, dev avg loss 0.423244, throughput 2.8856K wps
[Epoch 45 Batch 30/173] avg loss 0.00484685, throughput 2.91538K wps
[Epoch 45 Batch 60/173] avg loss 0.00498223, throughput 2.84194K wps
[Epoch 45 Batch 90/173] avg loss 0.00488071, throughput 2.86101K wps
[Epoch 45 Batch 120/173] avg loss 0.00500497, throughput 2.85372K wps
[Epoch 45 Batch 150/173] avg loss 0.00501481, throughput 2.86256K wps
Begin Testing...
[Epoch 45] train avg loss 0.00494353, dev acc 0.7935, dev avg loss 0.425679, throughput 2.86638K wps
[Epoch 46 Batch 30/173] avg loss 0.00485489, throughput 2.84564K wps
[Epoch 46 Batch 60/173] avg loss 0.00480421, throughput 2.8727K wps
[Epoch 46 Batch 90/173] avg loss 0.00472304, throughput 2.87791K wps
[Epoch 46 Batch 120/173] avg loss 0.00492031, throughput 2.88192K wps
[Epoch 46 Batch 150/173] avg loss 0.00479443, throughput 2.87838K wps
Begin Testing...
[Epoch 46] train avg loss 0.00485657, dev acc 0.8019, dev avg loss 0.427048, throughput 2.86756K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/173] avg loss 0.00485593, throughput 2.88135K wps
[Epoch 47 Batch 60/173] avg loss 0.00468075, throughput 2.86136K wps
[Epoch 47 Batch 90/173] avg loss 0.00453512, throughput 2.86214K wps
[Epoch 47 Batch 120/173] avg loss 0.00464053, throughput 2.84734K wps
[Epoch 47 Batch 150/173] avg loss 0.00464376, throughput 2.87506K wps
Begin Testing...
[Epoch 47] train avg loss 0.00469221, dev acc 0.7987, dev avg loss 0.422925, throughput 2.86708K wps
[Epoch 48 Batch 30/173] avg loss 0.0046092, throughput 2.91031K wps
[Epoch 48 Batch 60/173] avg loss 0.00472233, throughput 2.84974K wps
[Epoch 48 Batch 90/173] avg loss 0.00439847, throughput 2.87486K wps
[Epoch 48 Batch 120/173] avg loss 0.00441294, throughput 2.84562K wps
[Epoch 48 Batch 150/173] avg loss 0.0045084, throughput 2.84784K wps
Begin Testing...
[Epoch 48] train avg loss 0.00455555, dev acc 0.7987, dev avg loss 0.424385, throughput 2.8572K wps
[Epoch 49 Batch 30/173] avg loss 0.00438322, throughput 2.93228K wps
[Epoch 49 Batch 60/173] avg loss 0.00451799, throughput 2.88402K wps
[Epoch 49 Batch 90/173] avg loss 0.00434648, throughput 2.87637K wps
[Epoch 49 Batch 120/173] avg loss 0.00448751, throughput 2.87071K wps
[Epoch 49 Batch 150/173] avg loss 0.00449274, throughput 2.87219K wps
Begin Testing...
[Epoch 49] train avg loss 0.00447766, dev acc 0.7977, dev avg loss 0.42928, throughput 2.88391K wps
[Epoch 50 Batch 30/173] avg loss 0.00451578, throughput 2.91284K wps
[Epoch 50 Batch 60/173] avg loss 0.00434725, throughput 2.84914K wps
[Epoch 50 Batch 90/173] avg loss 0.00410067, throughput 2.87113K wps
[Epoch 50 Batch 120/173] avg loss 0.0041187, throughput 2.86134K wps
[Epoch 50 Batch 150/173] avg loss 0.00445821, throughput 2.86372K wps
Begin Testing...
[Epoch 50] train avg loss 0.00429055, dev acc 0.7967, dev avg loss 0.426109, throughput 2.86742K wps
[Epoch 51 Batch 30/173] avg loss 0.00452351, throughput 2.94524K wps
[Epoch 51 Batch 60/173] avg loss 0.00421054, throughput 2.86029K wps
[Epoch 51 Batch 90/173] avg loss 0.00421532, throughput 2.80635K wps
[Epoch 51 Batch 120/173] avg loss 0.0042505, throughput 2.87939K wps
[Epoch 51 Batch 150/173] avg loss 0.00423541, throughput 2.87674K wps
Begin Testing...
[Epoch 51] train avg loss 0.00424718, dev acc 0.7998, dev avg loss 0.425741, throughput 2.87418K wps
[Epoch 52 Batch 30/173] avg loss 0.00374821, throughput 2.89849K wps
[Epoch 52 Batch 60/173] avg loss 0.00410433, throughput 2.85226K wps
[Epoch 52 Batch 90/173] avg loss 0.0041571, throughput 2.85447K wps
[Epoch 52 Batch 120/173] avg loss 0.00388154, throughput 2.86994K wps
[Epoch 52 Batch 150/173] avg loss 0.00399963, throughput 2.82957K wps
Begin Testing...
[Epoch 52] train avg loss 0.00407492, dev acc 0.8040, dev avg loss 0.424542, throughput 2.85973K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/173] avg loss 0.00408827, throughput 2.86281K wps
[Epoch 53 Batch 60/173] avg loss 0.0040861, throughput 2.84826K wps
[Epoch 53 Batch 90/173] avg loss 0.00397411, throughput 2.86232K wps
[Epoch 53 Batch 120/173] avg loss 0.00406623, throughput 2.86911K wps
[Epoch 53 Batch 150/173] avg loss 0.00402767, throughput 2.87457K wps
Begin Testing...
[Epoch 53] train avg loss 0.00405695, dev acc 0.8008, dev avg loss 0.425347, throughput 2.86506K wps
[Epoch 54 Batch 30/173] avg loss 0.00403621, throughput 2.90871K wps
[Epoch 54 Batch 60/173] avg loss 0.00395195, throughput 2.86372K wps
[Epoch 54 Batch 90/173] avg loss 0.00399521, throughput 2.84517K wps
[Epoch 54 Batch 120/173] avg loss 0.00367529, throughput 2.87313K wps
[Epoch 54 Batch 150/173] avg loss 0.00377226, throughput 2.87303K wps
Begin Testing...
[Epoch 54] train avg loss 0.00389272, dev acc 0.7956, dev avg loss 0.42854, throughput 2.87261K wps
[Epoch 55 Batch 30/173] avg loss 0.00386058, throughput 2.92713K wps
[Epoch 55 Batch 60/173] avg loss 0.00381172, throughput 2.84844K wps
[Epoch 55 Batch 90/173] avg loss 0.00378837, throughput 2.80065K wps
[Epoch 55 Batch 120/173] avg loss 0.00351455, throughput 2.83583K wps
[Epoch 55 Batch 150/173] avg loss 0.00377289, throughput 2.79801K wps
Begin Testing...
[Epoch 55] train avg loss 0.00375138, dev acc 0.8029, dev avg loss 0.427247, throughput 2.84465K wps
[Epoch 56 Batch 30/173] avg loss 0.0036681, throughput 2.91874K wps
[Epoch 56 Batch 60/173] avg loss 0.00400825, throughput 2.8713K wps
[Epoch 56 Batch 90/173] avg loss 0.00363597, throughput 2.87222K wps
[Epoch 56 Batch 120/173] avg loss 0.00368407, throughput 2.86342K wps
[Epoch 56 Batch 150/173] avg loss 0.00373728, throughput 2.8752K wps
Begin Testing...
[Epoch 56] train avg loss 0.00369948, dev acc 0.7987, dev avg loss 0.428976, throughput 2.87732K wps
[Epoch 57 Batch 30/173] avg loss 0.0034547, throughput 2.94576K wps
[Epoch 57 Batch 60/173] avg loss 0.00354885, throughput 2.86571K wps
[Epoch 57 Batch 90/173] avg loss 0.0034169, throughput 2.86096K wps
[Epoch 57 Batch 120/173] avg loss 0.00361398, throughput 2.87412K wps
[Epoch 57 Batch 150/173] avg loss 0.00362631, throughput 2.88217K wps
Begin Testing...
[Epoch 57] train avg loss 0.00355069, dev acc 0.7935, dev avg loss 0.437265, throughput 2.88501K wps
[Epoch 58 Batch 30/173] avg loss 0.00341539, throughput 2.95032K wps
[Epoch 58 Batch 60/173] avg loss 0.003437, throughput 2.88073K wps
[Epoch 58 Batch 90/173] avg loss 0.00356269, throughput 2.87115K wps
[Epoch 58 Batch 120/173] avg loss 0.0031639, throughput 2.86349K wps
[Epoch 58 Batch 150/173] avg loss 0.00338328, throughput 2.85174K wps
Begin Testing...
[Epoch 58] train avg loss 0.00341377, dev acc 0.7987, dev avg loss 0.431006, throughput 2.87748K wps
[Epoch 59 Batch 30/173] avg loss 0.00324804, throughput 2.91526K wps
[Epoch 59 Batch 60/173] avg loss 0.00312341, throughput 2.87676K wps
[Epoch 59 Batch 90/173] avg loss 0.00335376, throughput 2.87552K wps
[Epoch 59 Batch 120/173] avg loss 0.00350031, throughput 2.8782K wps
[Epoch 59 Batch 150/173] avg loss 0.00354407, throughput 2.87102K wps
Begin Testing...
[Epoch 59] train avg loss 0.00336415, dev acc 0.7987, dev avg loss 0.433143, throughput 2.87694K wps
[Epoch 60 Batch 30/173] avg loss 0.00322393, throughput 2.90534K wps
[Epoch 60 Batch 60/173] avg loss 0.00320139, throughput 2.85833K wps
[Epoch 60 Batch 90/173] avg loss 0.00322348, throughput 2.81417K wps
[Epoch 60 Batch 120/173] avg loss 0.00324359, throughput 2.86363K wps
[Epoch 60 Batch 150/173] avg loss 0.00334219, throughput 2.86304K wps
Begin Testing...
[Epoch 60] train avg loss 0.00324766, dev acc 0.7946, dev avg loss 0.436621, throughput 2.86075K wps
[Epoch 61 Batch 30/173] avg loss 0.00314116, throughput 2.93205K wps
[Epoch 61 Batch 60/173] avg loss 0.00312458, throughput 2.84803K wps
[Epoch 61 Batch 90/173] avg loss 0.00301609, throughput 2.81766K wps
[Epoch 61 Batch 120/173] avg loss 0.00318435, throughput 2.80383K wps
[Epoch 61 Batch 150/173] avg loss 0.00312221, throughput 2.82768K wps
Begin Testing...
[Epoch 61] train avg loss 0.00314844, dev acc 0.8008, dev avg loss 0.43525, throughput 2.84736K wps
[Epoch 62 Batch 30/173] avg loss 0.00302036, throughput 2.94822K wps
[Epoch 62 Batch 60/173] avg loss 0.00308784, throughput 2.82584K wps
[Epoch 62 Batch 90/173] avg loss 0.0030919, throughput 2.84837K wps
[Epoch 62 Batch 120/173] avg loss 0.00306216, throughput 2.82352K wps
[Epoch 62 Batch 150/173] avg loss 0.00294568, throughput 2.85404K wps
Begin Testing...
[Epoch 62] train avg loss 0.00306091, dev acc 0.8050, dev avg loss 0.435452, throughput 2.85086K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/173] avg loss 0.00301405, throughput 2.94062K wps
[Epoch 63 Batch 60/173] avg loss 0.00295278, throughput 2.87357K wps
[Epoch 63 Batch 90/173] avg loss 0.00299557, throughput 2.85342K wps
[Epoch 63 Batch 120/173] avg loss 0.00299167, throughput 2.8185K wps
[Epoch 63 Batch 150/173] avg loss 0.00290036, throughput 2.8729K wps
Begin Testing...
[Epoch 63] train avg loss 0.00298332, dev acc 0.7935, dev avg loss 0.441757, throughput 2.87104K wps
[Epoch 64 Batch 30/173] avg loss 0.0027659, throughput 2.88767K wps
[Epoch 64 Batch 60/173] avg loss 0.00288974, throughput 2.87739K wps
[Epoch 64 Batch 90/173] avg loss 0.00282006, throughput 2.84541K wps
[Epoch 64 Batch 120/173] avg loss 0.00289973, throughput 2.88302K wps
[Epoch 64 Batch 150/173] avg loss 0.00290647, throughput 2.85891K wps
Begin Testing...
[Epoch 64] train avg loss 0.00288436, dev acc 0.8029, dev avg loss 0.437115, throughput 2.87179K wps
[Epoch 65 Batch 30/173] avg loss 0.00285267, throughput 2.94013K wps
[Epoch 65 Batch 60/173] avg loss 0.00297726, throughput 2.8302K wps
[Epoch 65 Batch 90/173] avg loss 0.0026333, throughput 2.87338K wps
[Epoch 65 Batch 120/173] avg loss 0.00288181, throughput 2.86861K wps
[Epoch 65 Batch 150/173] avg loss 0.00277317, throughput 2.82389K wps
Begin Testing...
[Epoch 65] train avg loss 0.00280357, dev acc 0.8019, dev avg loss 0.439666, throughput 2.865K wps
[Epoch 66 Batch 30/173] avg loss 0.00272088, throughput 2.93939K wps
[Epoch 66 Batch 60/173] avg loss 0.00263682, throughput 2.87472K wps
[Epoch 66 Batch 90/173] avg loss 0.00263044, throughput 2.87509K wps
[Epoch 66 Batch 120/173] avg loss 0.00284773, throughput 2.87156K wps
[Epoch 66 Batch 150/173] avg loss 0.00275948, throughput 2.87957K wps
Begin Testing...
[Epoch 66] train avg loss 0.00268609, dev acc 0.8019, dev avg loss 0.440214, throughput 2.88614K wps
[Epoch 67 Batch 30/173] avg loss 0.00277075, throughput 2.92485K wps
[Epoch 67 Batch 60/173] avg loss 0.00256196, throughput 2.85904K wps
[Epoch 67 Batch 90/173] avg loss 0.00256471, throughput 2.87692K wps
[Epoch 67 Batch 120/173] avg loss 0.00275588, throughput 2.87673K wps
[Epoch 67 Batch 150/173] avg loss 0.00270608, throughput 2.81352K wps
Begin Testing...
[Epoch 67] train avg loss 0.00267989, dev acc 0.7956, dev avg loss 0.446478, throughput 2.86368K wps
[Epoch 68 Batch 30/173] avg loss 0.00260551, throughput 2.92271K wps
[Epoch 68 Batch 60/173] avg loss 0.00248466, throughput 2.82199K wps
[Epoch 68 Batch 90/173] avg loss 0.0026697, throughput 2.86652K wps
[Epoch 68 Batch 120/173] avg loss 0.00262663, throughput 2.86793K wps
[Epoch 68 Batch 150/173] avg loss 0.00262904, throughput 2.87389K wps
Begin Testing...
[Epoch 68] train avg loss 0.00261508, dev acc 0.8008, dev avg loss 0.445105, throughput 2.87049K wps
[Epoch 69 Batch 30/173] avg loss 0.00258208, throughput 2.89435K wps
[Epoch 69 Batch 60/173] avg loss 0.00264913, throughput 2.8531K wps
[Epoch 69 Batch 90/173] avg loss 0.00227248, throughput 2.86978K wps
[Epoch 69 Batch 120/173] avg loss 0.00267887, throughput 2.84079K wps
[Epoch 69 Batch 150/173] avg loss 0.00252497, throughput 2.87766K wps
Begin Testing...
[Epoch 69] train avg loss 0.00253655, dev acc 0.7956, dev avg loss 0.448587, throughput 2.86849K wps
[Epoch 70 Batch 30/173] avg loss 0.00239063, throughput 2.90737K wps
[Epoch 70 Batch 60/173] avg loss 0.00227985, throughput 2.84818K wps
[Epoch 70 Batch 90/173] avg loss 0.00243475, throughput 2.86513K wps
[Epoch 70 Batch 120/173] avg loss 0.00262212, throughput 2.84284K wps
[Epoch 70 Batch 150/173] avg loss 0.00254871, throughput 2.88205K wps
Begin Testing...
[Epoch 70] train avg loss 0.00246208, dev acc 0.7998, dev avg loss 0.450379, throughput 2.86784K wps
[Epoch 71 Batch 30/173] avg loss 0.00243138, throughput 2.92796K wps
[Epoch 71 Batch 60/173] avg loss 0.00238725, throughput 2.84227K wps
[Epoch 71 Batch 90/173] avg loss 0.00221918, throughput 2.85996K wps
[Epoch 71 Batch 120/173] avg loss 0.00255662, throughput 2.87172K wps
[Epoch 71 Batch 150/173] avg loss 0.00238386, throughput 2.87998K wps
Begin Testing...
[Epoch 71] train avg loss 0.00237592, dev acc 0.8029, dev avg loss 0.449632, throughput 2.87735K wps
[Epoch 72 Batch 30/173] avg loss 0.00238629, throughput 2.8484K wps
[Epoch 72 Batch 60/173] avg loss 0.002145, throughput 2.86185K wps
[Epoch 72 Batch 90/173] avg loss 0.00241656, throughput 2.86464K wps
[Epoch 72 Batch 120/173] avg loss 0.00239149, throughput 2.82949K wps
[Epoch 72 Batch 150/173] avg loss 0.00228266, throughput 2.87029K wps
Begin Testing...
[Epoch 72] train avg loss 0.0023258, dev acc 0.8029, dev avg loss 0.451622, throughput 2.85684K wps
[Epoch 73 Batch 30/173] avg loss 0.00208866, throughput 2.93426K wps
[Epoch 73 Batch 60/173] avg loss 0.002308, throughput 2.88114K wps
[Epoch 73 Batch 90/173] avg loss 0.00235347, throughput 2.8855K wps
[Epoch 73 Batch 120/173] avg loss 0.00241546, throughput 2.85504K wps
[Epoch 73 Batch 150/173] avg loss 0.0021723, throughput 2.85313K wps
Begin Testing...
[Epoch 73] train avg loss 0.00225581, dev acc 0.7956, dev avg loss 0.459778, throughput 2.88088K wps
[Epoch 74 Batch 30/173] avg loss 0.0020928, throughput 2.94427K wps
[Epoch 74 Batch 60/173] avg loss 0.00221554, throughput 2.88002K wps
[Epoch 74 Batch 90/173] avg loss 0.00214552, throughput 2.82182K wps
[Epoch 74 Batch 120/173] avg loss 0.00198201, throughput 2.80695K wps
[Epoch 74 Batch 150/173] avg loss 0.00228963, throughput 2.80429K wps
Begin Testing...
[Epoch 74] train avg loss 0.00217659, dev acc 0.7977, dev avg loss 0.456204, throughput 2.85169K wps
[Epoch 75 Batch 30/173] avg loss 0.002098, throughput 2.90767K wps
[Epoch 75 Batch 60/173] avg loss 0.00216774, throughput 2.87445K wps
[Epoch 75 Batch 90/173] avg loss 0.00211419, throughput 2.87106K wps
[Epoch 75 Batch 120/173] avg loss 0.00208331, throughput 2.87751K wps
[Epoch 75 Batch 150/173] avg loss 0.00187874, throughput 2.85049K wps
Begin Testing...
[Epoch 75] train avg loss 0.00206683, dev acc 0.8008, dev avg loss 0.459026, throughput 2.87662K wps
[Epoch 76 Batch 30/173] avg loss 0.0020729, throughput 2.85506K wps
[Epoch 76 Batch 60/173] avg loss 0.00207153, throughput 2.83207K wps
[Epoch 76 Batch 90/173] avg loss 0.00191254, throughput 2.87801K wps
[Epoch 76 Batch 120/173] avg loss 0.00208666, throughput 2.88059K wps
[Epoch 76 Batch 150/173] avg loss 0.0021598, throughput 2.87611K wps
Begin Testing...
[Epoch 76] train avg loss 0.00208827, dev acc 0.7977, dev avg loss 0.458764, throughput 2.86669K wps
[Epoch 77 Batch 30/173] avg loss 0.0020721, throughput 2.93112K wps
[Epoch 77 Batch 60/173] avg loss 0.00204274, throughput 2.85046K wps
[Epoch 77 Batch 90/173] avg loss 0.00208669, throughput 2.85323K wps
[Epoch 77 Batch 120/173] avg loss 0.00202505, throughput 2.86616K wps
[Epoch 77 Batch 150/173] avg loss 0.00183419, throughput 2.82804K wps
Begin Testing...
[Epoch 77] train avg loss 0.00203015, dev acc 0.7967, dev avg loss 0.461584, throughput 2.86642K wps
[Epoch 78 Batch 30/173] avg loss 0.00186973, throughput 2.84746K wps
[Epoch 78 Batch 60/173] avg loss 0.00198105, throughput 2.85762K wps
[Epoch 78 Batch 90/173] avg loss 0.00193146, throughput 2.87042K wps
[Epoch 78 Batch 120/173] avg loss 0.00195555, throughput 2.87344K wps
[Epoch 78 Batch 150/173] avg loss 0.0019841, throughput 2.86626K wps
Begin Testing...
[Epoch 78] train avg loss 0.00194898, dev acc 0.8019, dev avg loss 0.4613, throughput 2.86196K wps
[Epoch 79 Batch 30/173] avg loss 0.00196469, throughput 2.89508K wps
[Epoch 79 Batch 60/173] avg loss 0.00192105, throughput 2.82616K wps
[Epoch 79 Batch 90/173] avg loss 0.00186571, throughput 2.81061K wps
[Epoch 79 Batch 120/173] avg loss 0.0021649, throughput 2.84197K wps
[Epoch 79 Batch 150/173] avg loss 0.00175275, throughput 2.83748K wps
Begin Testing...
[Epoch 79] train avg loss 0.00191618, dev acc 0.7967, dev avg loss 0.466517, throughput 2.84619K wps
[Epoch 80 Batch 30/173] avg loss 0.00160957, throughput 2.92827K wps
[Epoch 80 Batch 60/173] avg loss 0.00194707, throughput 2.86175K wps
[Epoch 80 Batch 90/173] avg loss 0.00198231, throughput 2.83288K wps
[Epoch 80 Batch 120/173] avg loss 0.00184201, throughput 2.85326K wps
[Epoch 80 Batch 150/173] avg loss 0.00184245, throughput 2.86642K wps
Begin Testing...
[Epoch 80] train avg loss 0.00185679, dev acc 0.8040, dev avg loss 0.466611, throughput 2.86953K wps
[Epoch 81 Batch 30/173] avg loss 0.00165184, throughput 2.92718K wps
[Epoch 81 Batch 60/173] avg loss 0.00177106, throughput 2.86655K wps
[Epoch 81 Batch 90/173] avg loss 0.00189179, throughput 2.87307K wps
[Epoch 81 Batch 120/173] avg loss 0.00171199, throughput 2.87785K wps
[Epoch 81 Batch 150/173] avg loss 0.0019385, throughput 2.82771K wps
Begin Testing...
[Epoch 81] train avg loss 0.00180054, dev acc 0.8040, dev avg loss 0.468565, throughput 2.87058K wps
[Epoch 82 Batch 30/173] avg loss 0.00172846, throughput 2.94119K wps
[Epoch 82 Batch 60/173] avg loss 0.00177479, throughput 2.87898K wps
[Epoch 82 Batch 90/173] avg loss 0.00183067, throughput 2.87545K wps
[Epoch 82 Batch 120/173] avg loss 0.00177018, throughput 2.87425K wps
[Epoch 82 Batch 150/173] avg loss 0.00198845, throughput 2.87737K wps
Begin Testing...
[Epoch 82] train avg loss 0.0018062, dev acc 0.8029, dev avg loss 0.470905, throughput 2.88743K wps
[Epoch 83 Batch 30/173] avg loss 0.00162224, throughput 2.87182K wps
[Epoch 83 Batch 60/173] avg loss 0.0017382, throughput 2.83825K wps
[Epoch 83 Batch 90/173] avg loss 0.00166217, throughput 2.88054K wps
[Epoch 83 Batch 120/173] avg loss 0.00174622, throughput 2.8232K wps
[Epoch 83 Batch 150/173] avg loss 0.00166452, throughput 2.80343K wps
Begin Testing...
[Epoch 83] train avg loss 0.00170913, dev acc 0.7967, dev avg loss 0.475043, throughput 2.84292K wps
[Epoch 84 Batch 30/173] avg loss 0.00163837, throughput 2.91887K wps
[Epoch 84 Batch 60/173] avg loss 0.00177377, throughput 2.8634K wps
[Epoch 84 Batch 90/173] avg loss 0.00156823, throughput 2.86579K wps
[Epoch 84 Batch 120/173] avg loss 0.00173469, throughput 2.8084K wps
[Epoch 84 Batch 150/173] avg loss 0.00180672, throughput 2.84342K wps
Begin Testing...
[Epoch 84] train avg loss 0.00171126, dev acc 0.8008, dev avg loss 0.474134, throughput 2.85992K wps
[Epoch 85 Batch 30/173] avg loss 0.00155534, throughput 2.91608K wps
[Epoch 85 Batch 60/173] avg loss 0.0015887, throughput 2.80828K wps
[Epoch 85 Batch 90/173] avg loss 0.00158783, throughput 2.79162K wps
[Epoch 85 Batch 120/173] avg loss 0.00168157, throughput 2.85064K wps
[Epoch 85 Batch 150/173] avg loss 0.00172923, throughput 2.86391K wps
Begin Testing...
[Epoch 85] train avg loss 0.00162785, dev acc 0.7925, dev avg loss 0.480481, throughput 2.84919K wps
[Epoch 86 Batch 30/173] avg loss 0.00159642, throughput 2.89704K wps
[Epoch 86 Batch 60/173] avg loss 0.00164348, throughput 2.87635K wps
[Epoch 86 Batch 90/173] avg loss 0.00168536, throughput 2.87715K wps
[Epoch 86 Batch 120/173] avg loss 0.00161368, throughput 2.81043K wps
[Epoch 86 Batch 150/173] avg loss 0.0015426, throughput 2.83588K wps
Begin Testing...
[Epoch 86] train avg loss 0.00161982, dev acc 0.7977, dev avg loss 0.481663, throughput 2.86034K wps
[Epoch 87 Batch 30/173] avg loss 0.00145836, throughput 2.93176K wps
[Epoch 87 Batch 60/173] avg loss 0.0015903, throughput 2.87245K wps
[Epoch 87 Batch 90/173] avg loss 0.00165183, throughput 2.87983K wps
[Epoch 87 Batch 120/173] avg loss 0.00151969, throughput 2.87208K wps
[Epoch 87 Batch 150/173] avg loss 0.00157356, throughput 2.86833K wps
Begin Testing...
[Epoch 87] train avg loss 0.00157564, dev acc 0.7967, dev avg loss 0.483379, throughput 2.88326K wps
[Epoch 88 Batch 30/173] avg loss 0.00147944, throughput 2.92384K wps
[Epoch 88 Batch 60/173] avg loss 0.00160999, throughput 2.86951K wps
[Epoch 88 Batch 90/173] avg loss 0.00155875, throughput 2.88224K wps
[Epoch 88 Batch 120/173] avg loss 0.00148443, throughput 2.86834K wps
[Epoch 88 Batch 150/173] avg loss 0.00174701, throughput 2.88395K wps
Begin Testing...
[Epoch 88] train avg loss 0.00156022, dev acc 0.8008, dev avg loss 0.485642, throughput 2.88498K wps
[Epoch 89 Batch 30/173] avg loss 0.00151297, throughput 2.92627K wps
[Epoch 89 Batch 60/173] avg loss 0.00147004, throughput 2.87652K wps
[Epoch 89 Batch 90/173] avg loss 0.00152504, throughput 2.87059K wps
[Epoch 89 Batch 120/173] avg loss 0.00135341, throughput 2.87699K wps
[Epoch 89 Batch 150/173] avg loss 0.00139164, throughput 2.8744K wps
Begin Testing...
[Epoch 89] train avg loss 0.00146923, dev acc 0.7987, dev avg loss 0.48836, throughput 2.88317K wps
[Epoch 90 Batch 30/173] avg loss 0.00138646, throughput 2.93609K wps
[Epoch 90 Batch 60/173] avg loss 0.00145081, throughput 2.86371K wps
[Epoch 90 Batch 90/173] avg loss 0.00148765, throughput 2.82319K wps
[Epoch 90 Batch 120/173] avg loss 0.00142053, throughput 2.86938K wps
[Epoch 90 Batch 150/173] avg loss 0.00148724, throughput 2.81388K wps
Begin Testing...
[Epoch 90] train avg loss 0.001465, dev acc 0.7935, dev avg loss 0.496853, throughput 2.86355K wps
[Epoch 91 Batch 30/173] avg loss 0.00136022, throughput 2.91669K wps
[Epoch 91 Batch 60/173] avg loss 0.00146115, throughput 2.87945K wps
[Epoch 91 Batch 90/173] avg loss 0.0013908, throughput 2.86773K wps
[Epoch 91 Batch 120/173] avg loss 0.00133998, throughput 2.86697K wps
[Epoch 91 Batch 150/173] avg loss 0.00141945, throughput 2.86296K wps
Begin Testing...
[Epoch 91] train avg loss 0.00141292, dev acc 0.7946, dev avg loss 0.493458, throughput 2.87805K wps
[Epoch 92 Batch 30/173] avg loss 0.00139401, throughput 2.9195K wps
[Epoch 92 Batch 60/173] avg loss 0.00140347, throughput 2.86567K wps
[Epoch 92 Batch 90/173] avg loss 0.00133871, throughput 2.843K wps
[Epoch 92 Batch 120/173] avg loss 0.00132303, throughput 2.86576K wps
[Epoch 92 Batch 150/173] avg loss 0.00139192, throughput 2.86388K wps
Begin Testing...
[Epoch 92] train avg loss 0.00135691, dev acc 0.7998, dev avg loss 0.495584, throughput 2.87013K wps
[Epoch 93 Batch 30/173] avg loss 0.00138237, throughput 2.91244K wps
[Epoch 93 Batch 60/173] avg loss 0.00141844, throughput 2.88112K wps
[Epoch 93 Batch 90/173] avg loss 0.00131575, throughput 2.87922K wps
[Epoch 93 Batch 120/173] avg loss 0.00125099, throughput 2.80512K wps
[Epoch 93 Batch 150/173] avg loss 0.00132455, throughput 2.86677K wps
Begin Testing...
[Epoch 93] train avg loss 0.00136561, dev acc 0.7977, dev avg loss 0.494601, throughput 2.86737K wps
[Epoch 94 Batch 30/173] avg loss 0.00133252, throughput 2.88327K wps
[Epoch 94 Batch 60/173] avg loss 0.00130698, throughput 2.81557K wps
[Epoch 94 Batch 90/173] avg loss 0.00126576, throughput 2.85502K wps
[Epoch 94 Batch 120/173] avg loss 0.00146788, throughput 2.87877K wps
[Epoch 94 Batch 150/173] avg loss 0.00127735, throughput 2.86308K wps
Begin Testing...
[Epoch 94] train avg loss 0.00132408, dev acc 0.7967, dev avg loss 0.496892, throughput 2.85239K wps
[Epoch 95 Batch 30/173] avg loss 0.00118532, throughput 2.91709K wps
[Epoch 95 Batch 60/173] avg loss 0.00118911, throughput 2.87465K wps
[Epoch 95 Batch 90/173] avg loss 0.00130002, throughput 2.8263K wps
[Epoch 95 Batch 120/173] avg loss 0.00137896, throughput 2.87139K wps
[Epoch 95 Batch 150/173] avg loss 0.00135279, throughput 2.8621K wps
Begin Testing...
[Epoch 95] train avg loss 0.00129021, dev acc 0.7987, dev avg loss 0.496171, throughput 2.87187K wps
[Epoch 96 Batch 30/173] avg loss 0.0011647, throughput 2.93031K wps
[Epoch 96 Batch 60/173] avg loss 0.00125286, throughput 2.86219K wps
[Epoch 96 Batch 90/173] avg loss 0.00131451, throughput 2.80725K wps
[Epoch 96 Batch 120/173] avg loss 0.00139448, throughput 2.83094K wps
[Epoch 96 Batch 150/173] avg loss 0.00128379, throughput 2.8008K wps
Begin Testing...
[Epoch 96] train avg loss 0.00127371, dev acc 0.8008, dev avg loss 0.498206, throughput 2.84611K wps
[Epoch 97 Batch 30/173] avg loss 0.00116637, throughput 2.908K wps
[Epoch 97 Batch 60/173] avg loss 0.00126775, throughput 2.85389K wps
[Epoch 97 Batch 90/173] avg loss 0.00123946, throughput 2.8549K wps
[Epoch 97 Batch 120/173] avg loss 0.00134858, throughput 2.82184K wps
[Epoch 97 Batch 150/173] avg loss 0.00115689, throughput 2.87105K wps
Begin Testing...
[Epoch 97] train avg loss 0.00123184, dev acc 0.7987, dev avg loss 0.503119, throughput 2.86258K wps
[Epoch 98 Batch 30/173] avg loss 0.00109022, throughput 2.90115K wps
[Epoch 98 Batch 60/173] avg loss 0.00125819, throughput 2.87216K wps
[Epoch 98 Batch 90/173] avg loss 0.00119353, throughput 2.87354K wps
[Epoch 98 Batch 120/173] avg loss 0.00126321, throughput 2.80608K wps
[Epoch 98 Batch 150/173] avg loss 0.00119793, throughput 2.82473K wps
Begin Testing...
[Epoch 98] train avg loss 0.00120224, dev acc 0.7956, dev avg loss 0.506447, throughput 2.85851K wps
[Epoch 99 Batch 30/173] avg loss 0.00114286, throughput 2.92973K wps
[Epoch 99 Batch 60/173] avg loss 0.00112771, throughput 2.85874K wps
[Epoch 99 Batch 90/173] avg loss 0.00103984, throughput 2.87399K wps
[Epoch 99 Batch 120/173] avg loss 0.00115114, throughput 2.863K wps
[Epoch 99 Batch 150/173] avg loss 0.00122998, throughput 2.83892K wps
Begin Testing...
[Epoch 99] train avg loss 0.00114291, dev acc 0.7998, dev avg loss 0.506626, throughput 2.86408K wps
[Epoch 100 Batch 30/173] avg loss 0.0010489, throughput 2.91938K wps
[Epoch 100 Batch 60/173] avg loss 0.00109393, throughput 2.87487K wps
[Epoch 100 Batch 90/173] avg loss 0.00114275, throughput 2.84819K wps
[Epoch 100 Batch 120/173] avg loss 0.00110486, throughput 2.86138K wps
[Epoch 100 Batch 150/173] avg loss 0.00113952, throughput 2.88335K wps
Begin Testing...
[Epoch 100] train avg loss 0.00113748, dev acc 0.7967, dev avg loss 0.513572, throughput 2.87826K wps
[Epoch 101 Batch 30/173] avg loss 0.00108601, throughput 2.88397K wps
[Epoch 101 Batch 60/173] avg loss 0.0011936, throughput 2.84356K wps
[Epoch 101 Batch 90/173] avg loss 0.00113893, throughput 2.81699K wps
[Epoch 101 Batch 120/173] avg loss 0.00110401, throughput 2.84694K wps
[Epoch 101 Batch 150/173] avg loss 0.00115178, throughput 2.8655K wps
Begin Testing...
[Epoch 101] train avg loss 0.00112981, dev acc 0.7935, dev avg loss 0.518666, throughput 2.85399K wps
[Epoch 102 Batch 30/173] avg loss 0.00118046, throughput 2.94788K wps
[Epoch 102 Batch 60/173] avg loss 0.0010786, throughput 2.87298K wps
[Epoch 102 Batch 90/173] avg loss 0.00112429, throughput 2.86007K wps
[Epoch 102 Batch 120/173] avg loss 0.00112059, throughput 2.86935K wps
[Epoch 102 Batch 150/173] avg loss 0.00110422, throughput 2.86153K wps
Begin Testing...
[Epoch 102] train avg loss 0.00110776, dev acc 0.7914, dev avg loss 0.520289, throughput 2.87446K wps
[Epoch 103 Batch 30/173] avg loss 0.00106487, throughput 2.90784K wps
[Epoch 103 Batch 60/173] avg loss 0.00107137, throughput 2.88291K wps
[Epoch 103 Batch 90/173] avg loss 0.00106519, throughput 2.86764K wps
[Epoch 103 Batch 120/173] avg loss 0.00113607, throughput 2.88096K wps
[Epoch 103 Batch 150/173] avg loss 0.00108463, throughput 2.87036K wps
Begin Testing...
[Epoch 103] train avg loss 0.00109095, dev acc 0.7925, dev avg loss 0.518213, throughput 2.87578K wps
[Epoch 104 Batch 30/173] avg loss 0.0010789, throughput 2.87426K wps
[Epoch 104 Batch 60/173] avg loss 0.00101518, throughput 2.81081K wps
[Epoch 104 Batch 90/173] avg loss 0.001177, throughput 2.87619K wps
[Epoch 104 Batch 120/173] avg loss 0.00107541, throughput 2.87875K wps
[Epoch 104 Batch 150/173] avg loss 0.000969642, throughput 2.85987K wps
Begin Testing...
[Epoch 104] train avg loss 0.00105328, dev acc 0.7956, dev avg loss 0.517371, throughput 2.8526K wps
[Epoch 105 Batch 30/173] avg loss 0.000927272, throughput 2.90813K wps
[Epoch 105 Batch 60/173] avg loss 0.00104543, throughput 2.87498K wps
[Epoch 105 Batch 90/173] avg loss 0.000989088, throughput 2.82388K wps
[Epoch 105 Batch 120/173] avg loss 0.00106695, throughput 2.84584K wps
[Epoch 105 Batch 150/173] avg loss 0.000938606, throughput 2.83863K wps
Begin Testing...
[Epoch 105] train avg loss 0.000991577, dev acc 0.8040, dev avg loss 0.521176, throughput 2.86053K wps
[Epoch 106 Batch 30/173] avg loss 0.00102472, throughput 2.93211K wps
[Epoch 106 Batch 60/173] avg loss 0.00101361, throughput 2.87967K wps
[Epoch 106 Batch 90/173] avg loss 0.00103217, throughput 2.86223K wps
[Epoch 106 Batch 120/173] avg loss 0.000932622, throughput 2.79282K wps
[Epoch 106 Batch 150/173] avg loss 0.000994413, throughput 2.87174K wps
Begin Testing...
[Epoch 106] train avg loss 0.00101078, dev acc 0.7956, dev avg loss 0.524255, throughput 2.86165K wps
[Epoch 107 Batch 30/173] avg loss 0.00107841, throughput 2.89643K wps
[Epoch 107 Batch 60/173] avg loss 0.000954888, throughput 2.83629K wps
[Epoch 107 Batch 90/173] avg loss 0.00100851, throughput 2.86785K wps
[Epoch 107 Batch 120/173] avg loss 0.00104688, throughput 2.87175K wps
[Epoch 107 Batch 150/173] avg loss 0.0010771, throughput 2.87201K wps
Begin Testing...
[Epoch 107] train avg loss 0.00102688, dev acc 0.7935, dev avg loss 0.529511, throughput 2.86499K wps
[Epoch 108 Batch 30/173] avg loss 0.00095366, throughput 2.94189K wps
[Epoch 108 Batch 60/173] avg loss 0.000948086, throughput 2.82028K wps
[Epoch 108 Batch 90/173] avg loss 0.000923757, throughput 2.83806K wps
[Epoch 108 Batch 120/173] avg loss 0.000961376, throughput 2.88351K wps
[Epoch 108 Batch 150/173] avg loss 0.000999991, throughput 2.8786K wps
Begin Testing...
[Epoch 108] train avg loss 0.000968281, dev acc 0.7987, dev avg loss 0.525569, throughput 2.86851K wps
[Epoch 109 Batch 30/173] avg loss 0.000943056, throughput 2.92084K wps
[Epoch 109 Batch 60/173] avg loss 0.000882888, throughput 2.83395K wps
[Epoch 109 Batch 90/173] avg loss 0.000955955, throughput 2.85935K wps
[Epoch 109 Batch 120/173] avg loss 0.000966396, throughput 2.84134K wps
[Epoch 109 Batch 150/173] avg loss 0.00083313, throughput 2.80912K wps
Begin Testing...
[Epoch 109] train avg loss 0.000927407, dev acc 0.7946, dev avg loss 0.5298, throughput 2.84816K wps
[Epoch 110 Batch 30/173] avg loss 0.00109191, throughput 2.93678K wps
[Epoch 110 Batch 60/173] avg loss 0.000914194, throughput 2.80395K wps
[Epoch 110 Batch 90/173] avg loss 0.000985326, throughput 2.87312K wps
[Epoch 110 Batch 120/173] avg loss 0.000927203, throughput 2.85443K wps
[Epoch 110 Batch 150/173] avg loss 0.000909871, throughput 2.86711K wps
Begin Testing...
[Epoch 110] train avg loss 0.000951586, dev acc 0.8019, dev avg loss 0.528314, throughput 2.86599K wps
[Epoch 111 Batch 30/173] avg loss 0.000937868, throughput 2.91403K wps
[Epoch 111 Batch 60/173] avg loss 0.00103136, throughput 2.8652K wps
[Epoch 111 Batch 90/173] avg loss 0.000884975, throughput 2.8543K wps
[Epoch 111 Batch 120/173] avg loss 0.00101107, throughput 2.8762K wps
[Epoch 111 Batch 150/173] avg loss 0.000912404, throughput 2.88095K wps
Begin Testing...
[Epoch 111] train avg loss 0.000970891, dev acc 0.8019, dev avg loss 0.528571, throughput 2.87749K wps
[Epoch 112 Batch 30/173] avg loss 0.000915615, throughput 2.87038K wps
[Epoch 112 Batch 60/173] avg loss 0.000913336, throughput 2.84057K wps
[Epoch 112 Batch 90/173] avg loss 0.000963051, throughput 2.81574K wps
[Epoch 112 Batch 120/173] avg loss 0.000961746, throughput 2.87423K wps
[Epoch 112 Batch 150/173] avg loss 0.000920737, throughput 2.86692K wps
Begin Testing...
[Epoch 112] train avg loss 0.000932758, dev acc 0.7987, dev avg loss 0.530889, throughput 2.85177K wps
[Epoch 113 Batch 30/173] avg loss 0.000838182, throughput 2.912K wps
[Epoch 113 Batch 60/173] avg loss 0.000820746, throughput 2.85575K wps
[Epoch 113 Batch 90/173] avg loss 0.000841148, throughput 2.86124K wps
[Epoch 113 Batch 120/173] avg loss 0.000968879, throughput 2.86029K wps
[Epoch 113 Batch 150/173] avg loss 0.000910294, throughput 2.82568K wps
Begin Testing...
[Epoch 113] train avg loss 0.000888199, dev acc 0.7967, dev avg loss 0.537814, throughput 2.85328K wps
[Epoch 114 Batch 30/173] avg loss 0.000841376, throughput 2.93735K wps
[Epoch 114 Batch 60/173] avg loss 0.00087541, throughput 2.8251K wps
[Epoch 114 Batch 90/173] avg loss 0.000796865, throughput 2.86224K wps
[Epoch 114 Batch 120/173] avg loss 0.00084857, throughput 2.86304K wps
[Epoch 114 Batch 150/173] avg loss 0.000944839, throughput 2.87648K wps
Begin Testing...
[Epoch 114] train avg loss 0.000852088, dev acc 0.7967, dev avg loss 0.537468, throughput 2.87322K wps
[Epoch 115 Batch 30/173] avg loss 0.000731506, throughput 2.89771K wps
[Epoch 115 Batch 60/173] avg loss 0.00102009, throughput 2.85973K wps
[Epoch 115 Batch 90/173] avg loss 0.000887505, throughput 2.86064K wps
[Epoch 115 Batch 120/173] avg loss 0.000884197, throughput 2.86642K wps
[Epoch 115 Batch 150/173] avg loss 0.000855448, throughput 2.8278K wps
Begin Testing...
[Epoch 115] train avg loss 0.000865832, dev acc 0.7998, dev avg loss 0.539083, throughput 2.8645K wps
[Epoch 116 Batch 30/173] avg loss 0.000802364, throughput 2.93851K wps
[Epoch 116 Batch 60/173] avg loss 0.000843752, throughput 2.8648K wps
[Epoch 116 Batch 90/173] avg loss 0.000786806, throughput 2.85434K wps
[Epoch 116 Batch 120/173] avg loss 0.000858148, throughput 2.85618K wps
[Epoch 116 Batch 150/173] avg loss 0.000786109, throughput 2.83867K wps
Begin Testing...
[Epoch 116] train avg loss 0.000818118, dev acc 0.7925, dev avg loss 0.545719, throughput 2.86989K wps
[Epoch 117 Batch 30/173] avg loss 0.000778833, throughput 2.87872K wps
[Epoch 117 Batch 60/173] avg loss 0.000718111, throughput 2.87835K wps
[Epoch 117 Batch 90/173] avg loss 0.000777344, throughput 2.87733K wps
[Epoch 117 Batch 120/173] avg loss 0.000769517, throughput 2.87549K wps
[Epoch 117 Batch 150/173] avg loss 0.00088813, throughput 2.86619K wps
Begin Testing...
[Epoch 117] train avg loss 0.000801979, dev acc 0.7946, dev avg loss 0.548189, throughput 2.87558K wps
[Epoch 118 Batch 30/173] avg loss 0.00079289, throughput 2.94035K wps
[Epoch 118 Batch 60/173] avg loss 0.000844414, throughput 2.88031K wps
[Epoch 118 Batch 90/173] avg loss 0.000854436, throughput 2.88213K wps
[Epoch 118 Batch 120/173] avg loss 0.000755674, throughput 2.83604K wps
[Epoch 118 Batch 150/173] avg loss 0.000799857, throughput 2.84984K wps
Begin Testing...
[Epoch 118] train avg loss 0.000799049, dev acc 0.7935, dev avg loss 0.545143, throughput 2.87744K wps
[Epoch 119 Batch 30/173] avg loss 0.000832184, throughput 2.85748K wps
[Epoch 119 Batch 60/173] avg loss 0.000717119, throughput 2.82444K wps