Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/Subj/all-9e7bd1da.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/subj/all-9e7bd1da.zip...
maximum length (in tokens): 120
Done! Tokenizing Time=0.23s, #Sentences=10000
SentimentNet(
(embedding): Embedding(21326 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/162] avg loss 0.0138909, throughput 0.522161K wps
[Epoch 0 Batch 60/162] avg loss 0.0138759, throughput 3.95404K wps
[Epoch 0 Batch 90/162] avg loss 0.013896, throughput 3.97351K wps
[Epoch 0 Batch 120/162] avg loss 0.0138294, throughput 3.96696K wps
[Epoch 0 Batch 150/162] avg loss 0.0138731, throughput 3.97796K wps
Begin Testing...
[Epoch 0] train avg loss 0.013871, dev acc 0.5744, dev avg loss 0.690571, throughput 1.78584K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0137667, throughput 4.06541K wps
[Epoch 1 Batch 60/162] avg loss 0.0137923, throughput 3.97257K wps
[Epoch 1 Batch 90/162] avg loss 0.013768, throughput 3.97724K wps
[Epoch 1 Batch 120/162] avg loss 0.0137653, throughput 3.94822K wps
[Epoch 1 Batch 150/162] avg loss 0.013732, throughput 3.96598K wps
Begin Testing...
[Epoch 1] train avg loss 0.0137544, dev acc 0.4844, dev avg loss 0.690246, throughput 3.98377K wps
[Epoch 2 Batch 30/162] avg loss 0.0137515, throughput 4.06505K wps
[Epoch 2 Batch 60/162] avg loss 0.0136824, throughput 3.97214K wps
[Epoch 2 Batch 90/162] avg loss 0.0136426, throughput 3.96751K wps
[Epoch 2 Batch 120/162] avg loss 0.0136057, throughput 3.97263K wps
[Epoch 2 Batch 150/162] avg loss 0.0135923, throughput 3.97816K wps
Begin Testing...
[Epoch 2] train avg loss 0.0136548, dev acc 0.6522, dev avg loss 0.680789, throughput 3.98968K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0135493, throughput 4.05837K wps
[Epoch 3 Batch 60/162] avg loss 0.0135641, throughput 3.96222K wps
[Epoch 3 Batch 90/162] avg loss 0.0135126, throughput 3.95023K wps
[Epoch 3 Batch 120/162] avg loss 0.0135067, throughput 3.96634K wps
[Epoch 3 Batch 150/162] avg loss 0.0134486, throughput 3.94222K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135101, dev acc 0.6311, dev avg loss 0.675445, throughput 3.97409K wps
[Epoch 4 Batch 30/162] avg loss 0.0133907, throughput 4.05892K wps
[Epoch 4 Batch 60/162] avg loss 0.0133574, throughput 3.96841K wps
[Epoch 4 Batch 90/162] avg loss 0.0133609, throughput 3.97717K wps
[Epoch 4 Batch 120/162] avg loss 0.0133418, throughput 3.97265K wps
[Epoch 4 Batch 150/162] avg loss 0.0132602, throughput 3.96352K wps
Begin Testing...
[Epoch 4] train avg loss 0.0133251, dev acc 0.6644, dev avg loss 0.663578, throughput 3.9865K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.013195, throughput 4.05772K wps
[Epoch 5 Batch 60/162] avg loss 0.0131378, throughput 3.96483K wps
[Epoch 5 Batch 90/162] avg loss 0.0130453, throughput 3.97448K wps
[Epoch 5 Batch 120/162] avg loss 0.0129839, throughput 3.9688K wps
[Epoch 5 Batch 150/162] avg loss 0.0129752, throughput 3.97084K wps
Begin Testing...
[Epoch 5] train avg loss 0.0130802, dev acc 0.6833, dev avg loss 0.649721, throughput 3.9859K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.0129642, throughput 4.07653K wps
[Epoch 6 Batch 60/162] avg loss 0.0128236, throughput 3.97179K wps
[Epoch 6 Batch 90/162] avg loss 0.0127098, throughput 3.96691K wps
[Epoch 6 Batch 120/162] avg loss 0.012743, throughput 3.96828K wps
[Epoch 6 Batch 150/162] avg loss 0.0126907, throughput 3.97585K wps
Begin Testing...
[Epoch 6] train avg loss 0.012757, dev acc 0.6856, dev avg loss 0.632746, throughput 3.98983K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.0125518, throughput 4.05011K wps
[Epoch 7 Batch 60/162] avg loss 0.0125852, throughput 3.96994K wps
[Epoch 7 Batch 90/162] avg loss 0.0122358, throughput 3.96831K wps
[Epoch 7 Batch 120/162] avg loss 0.0122987, throughput 3.97053K wps
[Epoch 7 Batch 150/162] avg loss 0.0122168, throughput 3.97003K wps
Begin Testing...
[Epoch 7] train avg loss 0.0123582, dev acc 0.7244, dev avg loss 0.612668, throughput 3.98438K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.0121942, throughput 4.0626K wps
[Epoch 8 Batch 60/162] avg loss 0.0119414, throughput 3.97095K wps
[Epoch 8 Batch 90/162] avg loss 0.0118887, throughput 3.972K wps
[Epoch 8 Batch 120/162] avg loss 0.0119377, throughput 3.97809K wps
[Epoch 8 Batch 150/162] avg loss 0.0119273, throughput 3.9468K wps
Begin Testing...
[Epoch 8] train avg loss 0.011938, dev acc 0.7344, dev avg loss 0.593457, throughput 3.98362K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.0116675, throughput 4.06706K wps
[Epoch 9 Batch 60/162] avg loss 0.0115375, throughput 3.97731K wps
[Epoch 9 Batch 90/162] avg loss 0.0117331, throughput 3.96468K wps
[Epoch 9 Batch 120/162] avg loss 0.0116222, throughput 3.97325K wps
[Epoch 9 Batch 150/162] avg loss 0.011342, throughput 3.97897K wps
Begin Testing...
[Epoch 9] train avg loss 0.0115439, dev acc 0.7422, dev avg loss 0.575438, throughput 3.98907K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.0110751, throughput 4.0414K wps
[Epoch 10 Batch 60/162] avg loss 0.0109325, throughput 3.96808K wps
[Epoch 10 Batch 90/162] avg loss 0.0111027, throughput 3.96847K wps
[Epoch 10 Batch 120/162] avg loss 0.0113998, throughput 3.96523K wps
[Epoch 10 Batch 150/162] avg loss 0.0111177, throughput 3.97314K wps
Begin Testing...
[Epoch 10] train avg loss 0.0111306, dev acc 0.7511, dev avg loss 0.558347, throughput 3.98155K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.011001, throughput 4.06671K wps
[Epoch 11 Batch 60/162] avg loss 0.0105225, throughput 3.97508K wps
[Epoch 11 Batch 90/162] avg loss 0.0109302, throughput 3.9754K wps
[Epoch 11 Batch 120/162] avg loss 0.0107842, throughput 3.96022K wps
[Epoch 11 Batch 150/162] avg loss 0.0105979, throughput 3.96907K wps
Begin Testing...
[Epoch 11] train avg loss 0.0107371, dev acc 0.7511, dev avg loss 0.54202, throughput 3.98756K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.0108484, throughput 4.06674K wps
[Epoch 12 Batch 60/162] avg loss 0.0101431, throughput 3.97561K wps
[Epoch 12 Batch 90/162] avg loss 0.0103274, throughput 3.96107K wps
[Epoch 12 Batch 120/162] avg loss 0.0100346, throughput 3.97744K wps
[Epoch 12 Batch 150/162] avg loss 0.0104318, throughput 3.97399K wps
Begin Testing...
[Epoch 12] train avg loss 0.010341, dev acc 0.7633, dev avg loss 0.524403, throughput 3.98851K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00999672, throughput 4.05816K wps
[Epoch 13 Batch 60/162] avg loss 0.0101492, throughput 3.95391K wps
[Epoch 13 Batch 90/162] avg loss 0.0103126, throughput 3.96247K wps
[Epoch 13 Batch 120/162] avg loss 0.00996865, throughput 3.97086K wps
[Epoch 13 Batch 150/162] avg loss 0.00992058, throughput 3.95739K wps
Begin Testing...
[Epoch 13] train avg loss 0.0100571, dev acc 0.7689, dev avg loss 0.509317, throughput 3.97867K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00937267, throughput 4.07659K wps
[Epoch 14 Batch 60/162] avg loss 0.0095352, throughput 3.9672K wps
[Epoch 14 Batch 90/162] avg loss 0.0100816, throughput 3.96998K wps
[Epoch 14 Batch 120/162] avg loss 0.00977529, throughput 3.97545K wps
[Epoch 14 Batch 150/162] avg loss 0.00977304, throughput 3.96766K wps
Begin Testing...
[Epoch 14] train avg loss 0.00971274, dev acc 0.7878, dev avg loss 0.496533, throughput 3.98828K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00956919, throughput 4.05761K wps
[Epoch 15 Batch 60/162] avg loss 0.00938794, throughput 3.96122K wps
[Epoch 15 Batch 90/162] avg loss 0.00950259, throughput 3.96496K wps
[Epoch 15 Batch 120/162] avg loss 0.00945746, throughput 3.97076K wps
[Epoch 15 Batch 150/162] avg loss 0.00943777, throughput 3.95735K wps
Begin Testing...
[Epoch 15] train avg loss 0.00945511, dev acc 0.7878, dev avg loss 0.486257, throughput 3.98075K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00914223, throughput 4.06437K wps
[Epoch 16 Batch 60/162] avg loss 0.0093277, throughput 3.96523K wps
[Epoch 16 Batch 90/162] avg loss 0.00933557, throughput 3.96294K wps
[Epoch 16 Batch 120/162] avg loss 0.00910399, throughput 3.95028K wps
[Epoch 16 Batch 150/162] avg loss 0.00920284, throughput 3.95792K wps
Begin Testing...
[Epoch 16] train avg loss 0.00921963, dev acc 0.7978, dev avg loss 0.476496, throughput 3.97865K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00888479, throughput 4.05814K wps
[Epoch 17 Batch 60/162] avg loss 0.00918899, throughput 3.97586K wps
[Epoch 17 Batch 90/162] avg loss 0.00933154, throughput 3.97473K wps
[Epoch 17 Batch 120/162] avg loss 0.00873363, throughput 3.97221K wps
[Epoch 17 Batch 150/162] avg loss 0.00920762, throughput 3.97023K wps
Begin Testing...
[Epoch 17] train avg loss 0.00905921, dev acc 0.7956, dev avg loss 0.469395, throughput 3.98863K wps
[Epoch 18 Batch 30/162] avg loss 0.00894841, throughput 4.06798K wps
[Epoch 18 Batch 60/162] avg loss 0.00889926, throughput 3.96696K wps
[Epoch 18 Batch 90/162] avg loss 0.00914978, throughput 3.97355K wps
[Epoch 18 Batch 120/162] avg loss 0.0088809, throughput 3.97295K wps
[Epoch 18 Batch 150/162] avg loss 0.00846183, throughput 3.96979K wps
Begin Testing...
[Epoch 18] train avg loss 0.00886752, dev acc 0.8000, dev avg loss 0.462003, throughput 3.98801K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.0087903, throughput 4.06539K wps
[Epoch 19 Batch 60/162] avg loss 0.00850021, throughput 3.97195K wps
[Epoch 19 Batch 90/162] avg loss 0.00905541, throughput 3.97166K wps
[Epoch 19 Batch 120/162] avg loss 0.00884593, throughput 3.96207K wps
[Epoch 19 Batch 150/162] avg loss 0.00865202, throughput 3.97452K wps
Begin Testing...
[Epoch 19] train avg loss 0.00872859, dev acc 0.8011, dev avg loss 0.455951, throughput 3.98633K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.0084258, throughput 4.07267K wps
[Epoch 20 Batch 60/162] avg loss 0.00875196, throughput 3.97345K wps
[Epoch 20 Batch 90/162] avg loss 0.00861142, throughput 3.96477K wps
[Epoch 20 Batch 120/162] avg loss 0.00859526, throughput 3.97046K wps
[Epoch 20 Batch 150/162] avg loss 0.00851271, throughput 3.97667K wps
Begin Testing...
[Epoch 20] train avg loss 0.00856629, dev acc 0.8011, dev avg loss 0.450819, throughput 3.9887K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00860789, throughput 4.0654K wps
[Epoch 21 Batch 60/162] avg loss 0.00865558, throughput 3.97025K wps
[Epoch 21 Batch 90/162] avg loss 0.00854993, throughput 3.95038K wps
[Epoch 21 Batch 120/162] avg loss 0.00827262, throughput 3.97061K wps
[Epoch 21 Batch 150/162] avg loss 0.00806594, throughput 3.97203K wps
Begin Testing...
[Epoch 21] train avg loss 0.00841016, dev acc 0.8033, dev avg loss 0.44448, throughput 3.98397K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00840679, throughput 4.04536K wps
[Epoch 22 Batch 60/162] avg loss 0.00834537, throughput 3.97352K wps
[Epoch 22 Batch 90/162] avg loss 0.00821922, throughput 3.97564K wps
[Epoch 22 Batch 120/162] avg loss 0.00852201, throughput 3.97746K wps
[Epoch 22 Batch 150/162] avg loss 0.00832282, throughput 3.97415K wps
Begin Testing...
[Epoch 22] train avg loss 0.00833736, dev acc 0.8100, dev avg loss 0.438733, throughput 3.98795K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00795419, throughput 4.06282K wps
[Epoch 23 Batch 60/162] avg loss 0.00819213, throughput 3.97573K wps
[Epoch 23 Batch 90/162] avg loss 0.00842817, throughput 3.98014K wps
[Epoch 23 Batch 120/162] avg loss 0.00837379, throughput 3.96695K wps
[Epoch 23 Batch 150/162] avg loss 0.00817938, throughput 3.96668K wps
Begin Testing...
[Epoch 23] train avg loss 0.00820708, dev acc 0.8067, dev avg loss 0.435151, throughput 3.98833K wps
[Epoch 24 Batch 30/162] avg loss 0.00824372, throughput 4.05471K wps
[Epoch 24 Batch 60/162] avg loss 0.00807126, throughput 3.97041K wps
[Epoch 24 Batch 90/162] avg loss 0.0079676, throughput 3.97174K wps
[Epoch 24 Batch 120/162] avg loss 0.00813586, throughput 3.96093K wps
[Epoch 24 Batch 150/162] avg loss 0.00800666, throughput 3.96419K wps
Begin Testing...
[Epoch 24] train avg loss 0.00805893, dev acc 0.8078, dev avg loss 0.429897, throughput 3.98333K wps
[Epoch 25 Batch 30/162] avg loss 0.00791133, throughput 4.06643K wps
[Epoch 25 Batch 60/162] avg loss 0.00803068, throughput 3.97053K wps
[Epoch 25 Batch 90/162] avg loss 0.00768511, throughput 3.97391K wps
[Epoch 25 Batch 120/162] avg loss 0.00816471, throughput 3.96427K wps
[Epoch 25 Batch 150/162] avg loss 0.00767453, throughput 3.97957K wps
Begin Testing...
[Epoch 25] train avg loss 0.00792995, dev acc 0.8200, dev avg loss 0.423888, throughput 3.98967K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/162] avg loss 0.00790163, throughput 4.07133K wps
[Epoch 26 Batch 60/162] avg loss 0.00736342, throughput 3.94886K wps
[Epoch 26 Batch 90/162] avg loss 0.00761649, throughput 3.96791K wps
[Epoch 26 Batch 120/162] avg loss 0.00804764, throughput 3.95407K wps
[Epoch 26 Batch 150/162] avg loss 0.00780643, throughput 3.96475K wps
Begin Testing...
[Epoch 26] train avg loss 0.00774582, dev acc 0.8211, dev avg loss 0.41998, throughput 3.97952K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00769579, throughput 4.04322K wps
[Epoch 27 Batch 60/162] avg loss 0.00743643, throughput 3.96075K wps
[Epoch 27 Batch 90/162] avg loss 0.00762307, throughput 3.97868K wps
[Epoch 27 Batch 120/162] avg loss 0.00751194, throughput 3.97286K wps
[Epoch 27 Batch 150/162] avg loss 0.00806997, throughput 3.97101K wps
Begin Testing...
[Epoch 27] train avg loss 0.00764998, dev acc 0.8233, dev avg loss 0.414827, throughput 3.9838K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00767238, throughput 4.06397K wps
[Epoch 28 Batch 60/162] avg loss 0.00767189, throughput 3.95799K wps
[Epoch 28 Batch 90/162] avg loss 0.00714391, throughput 3.96886K wps
[Epoch 28 Batch 120/162] avg loss 0.00741243, throughput 3.96243K wps
[Epoch 28 Batch 150/162] avg loss 0.00750565, throughput 3.96315K wps
Begin Testing...
[Epoch 28] train avg loss 0.00751795, dev acc 0.8200, dev avg loss 0.410256, throughput 3.98068K wps
[Epoch 29 Batch 30/162] avg loss 0.0075198, throughput 4.0625K wps
[Epoch 29 Batch 60/162] avg loss 0.00737441, throughput 3.9817K wps
[Epoch 29 Batch 90/162] avg loss 0.00732478, throughput 3.97825K wps
[Epoch 29 Batch 120/162] avg loss 0.00738615, throughput 3.97184K wps
[Epoch 29 Batch 150/162] avg loss 0.0073042, throughput 3.95776K wps
Begin Testing...
[Epoch 29] train avg loss 0.00737288, dev acc 0.8278, dev avg loss 0.405825, throughput 3.98795K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00731333, throughput 4.042K wps
[Epoch 30 Batch 60/162] avg loss 0.00695265, throughput 3.97054K wps
[Epoch 30 Batch 90/162] avg loss 0.00754772, throughput 3.96292K wps
[Epoch 30 Batch 120/162] avg loss 0.00724415, throughput 3.96812K wps
[Epoch 30 Batch 150/162] avg loss 0.0071445, throughput 3.96956K wps
Begin Testing...
[Epoch 30] train avg loss 0.0072583, dev acc 0.8289, dev avg loss 0.401957, throughput 3.98164K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00680786, throughput 4.06043K wps
[Epoch 31 Batch 60/162] avg loss 0.00724817, throughput 3.96374K wps
[Epoch 31 Batch 90/162] avg loss 0.00716392, throughput 3.96329K wps
[Epoch 31 Batch 120/162] avg loss 0.00717849, throughput 3.96581K wps
[Epoch 31 Batch 150/162] avg loss 0.00724011, throughput 3.97415K wps
Begin Testing...
[Epoch 31] train avg loss 0.00712147, dev acc 0.8333, dev avg loss 0.397201, throughput 3.98168K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00702221, throughput 4.06228K wps
[Epoch 32 Batch 60/162] avg loss 0.00707208, throughput 3.96233K wps
[Epoch 32 Batch 90/162] avg loss 0.00739662, throughput 3.97098K wps
[Epoch 32 Batch 120/162] avg loss 0.00687023, throughput 3.96685K wps
[Epoch 32 Batch 150/162] avg loss 0.00657964, throughput 3.96728K wps
Begin Testing...
[Epoch 32] train avg loss 0.00697632, dev acc 0.8367, dev avg loss 0.393298, throughput 3.98475K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/162] avg loss 0.00718997, throughput 4.0634K wps
[Epoch 33 Batch 60/162] avg loss 0.00659966, throughput 3.97658K wps
[Epoch 33 Batch 90/162] avg loss 0.00698496, throughput 3.9668K wps
[Epoch 33 Batch 120/162] avg loss 0.00701382, throughput 3.9607K wps
[Epoch 33 Batch 150/162] avg loss 0.00701008, throughput 3.9669K wps
Begin Testing...
[Epoch 33] train avg loss 0.0069215, dev acc 0.8367, dev avg loss 0.390172, throughput 3.98393K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00673737, throughput 4.05691K wps
[Epoch 34 Batch 60/162] avg loss 0.00648004, throughput 3.9681K wps
[Epoch 34 Batch 90/162] avg loss 0.00670328, throughput 3.96845K wps
[Epoch 34 Batch 120/162] avg loss 0.00646793, throughput 3.96954K wps
[Epoch 34 Batch 150/162] avg loss 0.0070943, throughput 3.95243K wps
Begin Testing...
[Epoch 34] train avg loss 0.00673344, dev acc 0.8378, dev avg loss 0.385301, throughput 3.98198K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00697648, throughput 4.07135K wps
[Epoch 35 Batch 60/162] avg loss 0.00645784, throughput 3.95262K wps
[Epoch 35 Batch 90/162] avg loss 0.00672117, throughput 3.96544K wps
[Epoch 35 Batch 120/162] avg loss 0.00635604, throughput 3.97K wps
[Epoch 35 Batch 150/162] avg loss 0.00610974, throughput 3.96353K wps
Begin Testing...
[Epoch 35] train avg loss 0.00657022, dev acc 0.8378, dev avg loss 0.381008, throughput 3.98307K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/162] avg loss 0.00649237, throughput 4.04849K wps
[Epoch 36 Batch 60/162] avg loss 0.00660976, throughput 3.96546K wps
[Epoch 36 Batch 90/162] avg loss 0.00641931, throughput 3.9607K wps
[Epoch 36 Batch 120/162] avg loss 0.00611852, throughput 3.96349K wps
[Epoch 36 Batch 150/162] avg loss 0.00684483, throughput 3.94538K wps
Begin Testing...
[Epoch 36] train avg loss 0.00647452, dev acc 0.8467, dev avg loss 0.377315, throughput 3.97524K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00615249, throughput 4.06811K wps
[Epoch 37 Batch 60/162] avg loss 0.00673175, throughput 3.9706K wps
[Epoch 37 Batch 90/162] avg loss 0.00633271, throughput 3.96386K wps
[Epoch 37 Batch 120/162] avg loss 0.00648667, throughput 3.96824K wps
[Epoch 37 Batch 150/162] avg loss 0.00652717, throughput 3.9622K wps
Begin Testing...
[Epoch 37] train avg loss 0.00642578, dev acc 0.8433, dev avg loss 0.372994, throughput 3.98484K wps
[Epoch 38 Batch 30/162] avg loss 0.00610234, throughput 4.05796K wps
[Epoch 38 Batch 60/162] avg loss 0.00628078, throughput 3.96896K wps
[Epoch 38 Batch 90/162] avg loss 0.00616965, throughput 3.9676K wps
[Epoch 38 Batch 120/162] avg loss 0.00615838, throughput 3.96344K wps
[Epoch 38 Batch 150/162] avg loss 0.00595615, throughput 3.9678K wps
Begin Testing...
[Epoch 38] train avg loss 0.00617376, dev acc 0.8478, dev avg loss 0.369198, throughput 3.98308K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00597498, throughput 4.0706K wps
[Epoch 39 Batch 60/162] avg loss 0.00628119, throughput 3.94666K wps
[Epoch 39 Batch 90/162] avg loss 0.00565145, throughput 3.96554K wps
[Epoch 39 Batch 120/162] avg loss 0.0061913, throughput 3.96835K wps
[Epoch 39 Batch 150/162] avg loss 0.00606241, throughput 3.96003K wps
Begin Testing...
[Epoch 39] train avg loss 0.00608363, dev acc 0.8367, dev avg loss 0.365956, throughput 3.97969K wps
[Epoch 40 Batch 30/162] avg loss 0.00586725, throughput 4.0694K wps
[Epoch 40 Batch 60/162] avg loss 0.00563755, throughput 3.96814K wps
[Epoch 40 Batch 90/162] avg loss 0.0061527, throughput 3.97032K wps
[Epoch 40 Batch 120/162] avg loss 0.0060656, throughput 3.97032K wps
[Epoch 40 Batch 150/162] avg loss 0.00578942, throughput 3.97323K wps
Begin Testing...
[Epoch 40] train avg loss 0.00592227, dev acc 0.8422, dev avg loss 0.362109, throughput 3.98889K wps
[Epoch 41 Batch 30/162] avg loss 0.0060176, throughput 4.07286K wps
[Epoch 41 Batch 60/162] avg loss 0.00535184, throughput 3.97492K wps
[Epoch 41 Batch 90/162] avg loss 0.00583556, throughput 3.97363K wps
[Epoch 41 Batch 120/162] avg loss 0.00583533, throughput 3.96315K wps
[Epoch 41 Batch 150/162] avg loss 0.00594154, throughput 3.97664K wps
Begin Testing...
[Epoch 41] train avg loss 0.00581147, dev acc 0.8411, dev avg loss 0.358671, throughput 3.99019K wps
[Epoch 42 Batch 30/162] avg loss 0.00550544, throughput 4.0594K wps
[Epoch 42 Batch 60/162] avg loss 0.0056228, throughput 3.97913K wps
[Epoch 42 Batch 90/162] avg loss 0.00578923, throughput 3.96475K wps
[Epoch 42 Batch 120/162] avg loss 0.00535894, throughput 3.96471K wps
[Epoch 42 Batch 150/162] avg loss 0.00603089, throughput 3.97173K wps
Begin Testing...
[Epoch 42] train avg loss 0.00568169, dev acc 0.8422, dev avg loss 0.354389, throughput 3.98528K wps
[Epoch 43 Batch 30/162] avg loss 0.00572102, throughput 4.05442K wps
[Epoch 43 Batch 60/162] avg loss 0.00540078, throughput 3.96606K wps
[Epoch 43 Batch 90/162] avg loss 0.00550182, throughput 3.96705K wps
[Epoch 43 Batch 120/162] avg loss 0.0058685, throughput 3.96483K wps
[Epoch 43 Batch 150/162] avg loss 0.00532339, throughput 3.95824K wps
Begin Testing...
[Epoch 43] train avg loss 0.0055515, dev acc 0.8500, dev avg loss 0.350924, throughput 3.98113K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/162] avg loss 0.00534321, throughput 4.06635K wps
[Epoch 44 Batch 60/162] avg loss 0.00545266, throughput 3.96948K wps
[Epoch 44 Batch 90/162] avg loss 0.00564917, throughput 3.97947K wps
[Epoch 44 Batch 120/162] avg loss 0.00531471, throughput 3.97239K wps
[Epoch 44 Batch 150/162] avg loss 0.00542916, throughput 3.95624K wps
Begin Testing...
[Epoch 44] train avg loss 0.00542204, dev acc 0.8489, dev avg loss 0.347826, throughput 3.98742K wps
[Epoch 45 Batch 30/162] avg loss 0.00529396, throughput 4.06946K wps
[Epoch 45 Batch 60/162] avg loss 0.00523643, throughput 3.97152K wps
[Epoch 45 Batch 90/162] avg loss 0.00509459, throughput 3.96268K wps
[Epoch 45 Batch 120/162] avg loss 0.00556256, throughput 3.97156K wps
[Epoch 45 Batch 150/162] avg loss 0.00547023, throughput 3.9804K wps
Begin Testing...
[Epoch 45] train avg loss 0.00531801, dev acc 0.8511, dev avg loss 0.343979, throughput 3.98927K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00522556, throughput 4.06393K wps
[Epoch 46 Batch 60/162] avg loss 0.00521342, throughput 3.9731K wps
[Epoch 46 Batch 90/162] avg loss 0.00507416, throughput 3.96766K wps
[Epoch 46 Batch 120/162] avg loss 0.00535367, throughput 3.96115K wps
[Epoch 46 Batch 150/162] avg loss 0.00510678, throughput 3.97382K wps
Begin Testing...
[Epoch 46] train avg loss 0.00519704, dev acc 0.8533, dev avg loss 0.341761, throughput 3.98629K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/162] avg loss 0.00521976, throughput 4.04928K wps
[Epoch 47 Batch 60/162] avg loss 0.00520787, throughput 3.97311K wps
[Epoch 47 Batch 90/162] avg loss 0.00487019, throughput 3.96684K wps
[Epoch 47 Batch 120/162] avg loss 0.00499032, throughput 3.96985K wps
[Epoch 47 Batch 150/162] avg loss 0.00492421, throughput 3.97335K wps
Begin Testing...
[Epoch 47] train avg loss 0.00501436, dev acc 0.8556, dev avg loss 0.339246, throughput 3.98261K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00511391, throughput 4.04078K wps
[Epoch 48 Batch 60/162] avg loss 0.0050445, throughput 3.97277K wps
[Epoch 48 Batch 90/162] avg loss 0.00477559, throughput 3.95829K wps
[Epoch 48 Batch 120/162] avg loss 0.00489723, throughput 3.97532K wps
[Epoch 48 Batch 150/162] avg loss 0.00494146, throughput 3.97538K wps
Begin Testing...
[Epoch 48] train avg loss 0.00491791, dev acc 0.8544, dev avg loss 0.335349, throughput 3.98253K wps
[Epoch 49 Batch 30/162] avg loss 0.00478406, throughput 4.06336K wps
[Epoch 49 Batch 60/162] avg loss 0.00458083, throughput 3.97345K wps
[Epoch 49 Batch 90/162] avg loss 0.0048917, throughput 3.9674K wps
[Epoch 49 Batch 120/162] avg loss 0.00472051, throughput 3.96077K wps
[Epoch 49 Batch 150/162] avg loss 0.00481293, throughput 3.9614K wps
Begin Testing...
[Epoch 49] train avg loss 0.00476609, dev acc 0.8556, dev avg loss 0.332897, throughput 3.98377K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/162] avg loss 0.00460776, throughput 4.07861K wps
[Epoch 50 Batch 60/162] avg loss 0.00470063, throughput 3.96037K wps
[Epoch 50 Batch 90/162] avg loss 0.00474181, throughput 3.97561K wps
[Epoch 50 Batch 120/162] avg loss 0.00483422, throughput 3.96743K wps
[Epoch 50 Batch 150/162] avg loss 0.00430643, throughput 3.97739K wps
Begin Testing...
[Epoch 50] train avg loss 0.00463352, dev acc 0.8556, dev avg loss 0.329892, throughput 3.99019K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/162] avg loss 0.00430766, throughput 4.05542K wps
[Epoch 51 Batch 60/162] avg loss 0.0046241, throughput 3.95934K wps
[Epoch 51 Batch 90/162] avg loss 0.0043115, throughput 3.96416K wps
[Epoch 51 Batch 120/162] avg loss 0.00457498, throughput 3.95916K wps
[Epoch 51 Batch 150/162] avg loss 0.00459356, throughput 3.96198K wps
Begin Testing...
[Epoch 51] train avg loss 0.00448569, dev acc 0.8556, dev avg loss 0.327956, throughput 3.97653K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.0042089, throughput 4.04299K wps
[Epoch 52 Batch 60/162] avg loss 0.00425824, throughput 3.96831K wps
[Epoch 52 Batch 90/162] avg loss 0.00478255, throughput 3.96492K wps
[Epoch 52 Batch 120/162] avg loss 0.00423447, throughput 3.97523K wps
[Epoch 52 Batch 150/162] avg loss 0.0044277, throughput 3.97688K wps
Begin Testing...
[Epoch 52] train avg loss 0.00439155, dev acc 0.8578, dev avg loss 0.325598, throughput 3.98426K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00431932, throughput 4.05742K wps
[Epoch 53 Batch 60/162] avg loss 0.00465514, throughput 3.97344K wps
[Epoch 53 Batch 90/162] avg loss 0.00400558, throughput 3.97003K wps
[Epoch 53 Batch 120/162] avg loss 0.00446695, throughput 3.95905K wps
[Epoch 53 Batch 150/162] avg loss 0.00436321, throughput 3.97621K wps
Begin Testing...
[Epoch 53] train avg loss 0.00433587, dev acc 0.8578, dev avg loss 0.32419, throughput 3.98569K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.0039928, throughput 4.06244K wps
[Epoch 54 Batch 60/162] avg loss 0.00424235, throughput 3.97109K wps
[Epoch 54 Batch 90/162] avg loss 0.00431372, throughput 3.9681K wps
[Epoch 54 Batch 120/162] avg loss 0.00417427, throughput 3.96939K wps
[Epoch 54 Batch 150/162] avg loss 0.00429937, throughput 3.95986K wps
Begin Testing...
[Epoch 54] train avg loss 0.00419622, dev acc 0.8622, dev avg loss 0.324468, throughput 3.98466K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/162] avg loss 0.00422532, throughput 4.06137K wps
[Epoch 55 Batch 60/162] avg loss 0.00423172, throughput 3.95803K wps
[Epoch 55 Batch 90/162] avg loss 0.0039639, throughput 3.97017K wps
[Epoch 55 Batch 120/162] avg loss 0.00415518, throughput 3.96186K wps
[Epoch 55 Batch 150/162] avg loss 0.00408249, throughput 3.96193K wps
Begin Testing...
[Epoch 55] train avg loss 0.00412487, dev acc 0.8589, dev avg loss 0.320348, throughput 3.98177K wps
[Epoch 56 Batch 30/162] avg loss 0.00371595, throughput 4.06735K wps
[Epoch 56 Batch 60/162] avg loss 0.00419847, throughput 3.97306K wps
[Epoch 56 Batch 90/162] avg loss 0.00384727, throughput 3.94487K wps
[Epoch 56 Batch 120/162] avg loss 0.00408264, throughput 3.94468K wps
[Epoch 56 Batch 150/162] avg loss 0.00382674, throughput 3.97505K wps
Begin Testing...
[Epoch 56] train avg loss 0.00394977, dev acc 0.8633, dev avg loss 0.318825, throughput 3.97977K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/162] avg loss 0.00418595, throughput 4.0596K wps
[Epoch 57 Batch 60/162] avg loss 0.00375859, throughput 3.96538K wps
[Epoch 57 Batch 90/162] avg loss 0.00381477, throughput 3.97162K wps
[Epoch 57 Batch 120/162] avg loss 0.00398867, throughput 3.97055K wps
[Epoch 57 Batch 150/162] avg loss 0.00365126, throughput 3.9762K wps
Begin Testing...
[Epoch 57] train avg loss 0.00389868, dev acc 0.8600, dev avg loss 0.316616, throughput 3.98725K wps
[Epoch 58 Batch 30/162] avg loss 0.0038224, throughput 4.05817K wps
[Epoch 58 Batch 60/162] avg loss 0.0037811, throughput 3.97106K wps
[Epoch 58 Batch 90/162] avg loss 0.00387142, throughput 3.9636K wps
[Epoch 58 Batch 120/162] avg loss 0.00372186, throughput 3.95022K wps
[Epoch 58 Batch 150/162] avg loss 0.00387833, throughput 3.95877K wps
Begin Testing...
[Epoch 58] train avg loss 0.00379362, dev acc 0.8611, dev avg loss 0.314801, throughput 3.97857K wps
[Epoch 59 Batch 30/162] avg loss 0.00366713, throughput 4.05464K wps
[Epoch 59 Batch 60/162] avg loss 0.00358728, throughput 3.97278K wps
[Epoch 59 Batch 90/162] avg loss 0.00326086, throughput 3.96764K wps
[Epoch 59 Batch 120/162] avg loss 0.00382304, throughput 3.96727K wps
[Epoch 59 Batch 150/162] avg loss 0.00378243, throughput 3.96107K wps
Begin Testing...
[Epoch 59] train avg loss 0.00364403, dev acc 0.8611, dev avg loss 0.313916, throughput 3.98299K wps
[Epoch 60 Batch 30/162] avg loss 0.00365332, throughput 4.04471K wps
[Epoch 60 Batch 60/162] avg loss 0.00372772, throughput 3.96756K wps
[Epoch 60 Batch 90/162] avg loss 0.00353552, throughput 3.94998K wps
[Epoch 60 Batch 120/162] avg loss 0.00349168, throughput 3.96154K wps
[Epoch 60 Batch 150/162] avg loss 0.00339088, throughput 3.96763K wps
Begin Testing...
[Epoch 60] train avg loss 0.00354292, dev acc 0.8611, dev avg loss 0.314935, throughput 3.97771K wps
[Epoch 61 Batch 30/162] avg loss 0.00330816, throughput 4.06428K wps
[Epoch 61 Batch 60/162] avg loss 0.00373306, throughput 3.96867K wps
[Epoch 61 Batch 90/162] avg loss 0.00357366, throughput 3.96005K wps
[Epoch 61 Batch 120/162] avg loss 0.00338325, throughput 3.97809K wps
[Epoch 61 Batch 150/162] avg loss 0.0035105, throughput 3.96743K wps
Begin Testing...
[Epoch 61] train avg loss 0.00349129, dev acc 0.8600, dev avg loss 0.31143, throughput 3.98446K wps
[Epoch 62 Batch 30/162] avg loss 0.00346457, throughput 4.05169K wps
[Epoch 62 Batch 60/162] avg loss 0.00327158, throughput 3.9579K wps
[Epoch 62 Batch 90/162] avg loss 0.00323108, throughput 3.95519K wps
[Epoch 62 Batch 120/162] avg loss 0.00332312, throughput 3.95711K wps
[Epoch 62 Batch 150/162] avg loss 0.00336363, throughput 3.95913K wps
Begin Testing...
[Epoch 62] train avg loss 0.00334394, dev acc 0.8656, dev avg loss 0.309308, throughput 3.97582K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/162] avg loss 0.00319156, throughput 4.04827K wps
[Epoch 63 Batch 60/162] avg loss 0.00339977, throughput 3.96311K wps
[Epoch 63 Batch 90/162] avg loss 0.00332761, throughput 3.95293K wps
[Epoch 63 Batch 120/162] avg loss 0.00336902, throughput 3.95598K wps
[Epoch 63 Batch 150/162] avg loss 0.00311024, throughput 3.95496K wps
Begin Testing...
[Epoch 63] train avg loss 0.0032913, dev acc 0.8644, dev avg loss 0.30816, throughput 3.97314K wps
[Epoch 64 Batch 30/162] avg loss 0.00310411, throughput 4.04367K wps
[Epoch 64 Batch 60/162] avg loss 0.00303918, throughput 3.9546K wps
[Epoch 64 Batch 90/162] avg loss 0.00324126, throughput 3.95609K wps
[Epoch 64 Batch 120/162] avg loss 0.00325021, throughput 3.95253K wps
[Epoch 64 Batch 150/162] avg loss 0.00323414, throughput 3.93407K wps
Begin Testing...
[Epoch 64] train avg loss 0.00317662, dev acc 0.8633, dev avg loss 0.308213, throughput 3.96655K wps
[Epoch 65 Batch 30/162] avg loss 0.00302689, throughput 4.05185K wps
[Epoch 65 Batch 60/162] avg loss 0.00298141, throughput 3.95438K wps
[Epoch 65 Batch 90/162] avg loss 0.0030143, throughput 3.94319K wps
[Epoch 65 Batch 120/162] avg loss 0.00355815, throughput 3.96367K wps
[Epoch 65 Batch 150/162] avg loss 0.00312986, throughput 3.95914K wps
Begin Testing...
[Epoch 65] train avg loss 0.00312117, dev acc 0.8667, dev avg loss 0.306406, throughput 3.97306K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/162] avg loss 0.00290164, throughput 4.05872K wps
[Epoch 66 Batch 60/162] avg loss 0.00317297, throughput 3.9685K wps
[Epoch 66 Batch 90/162] avg loss 0.00296834, throughput 3.95225K wps
[Epoch 66 Batch 120/162] avg loss 0.0032478, throughput 3.96868K wps
[Epoch 66 Batch 150/162] avg loss 0.00291982, throughput 3.94712K wps
Begin Testing...
[Epoch 66] train avg loss 0.00302944, dev acc 0.8656, dev avg loss 0.304164, throughput 3.97685K wps
[Epoch 67 Batch 30/162] avg loss 0.00270109, throughput 4.05594K wps
[Epoch 67 Batch 60/162] avg loss 0.00318366, throughput 3.95324K wps
[Epoch 67 Batch 90/162] avg loss 0.00295839, throughput 3.9645K wps
[Epoch 67 Batch 120/162] avg loss 0.00276191, throughput 3.96K wps
[Epoch 67 Batch 150/162] avg loss 0.00301592, throughput 3.94741K wps
Begin Testing...
[Epoch 67] train avg loss 0.00292886, dev acc 0.8667, dev avg loss 0.305035, throughput 3.97367K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/162] avg loss 0.00270374, throughput 4.04786K wps
[Epoch 68 Batch 60/162] avg loss 0.00285659, throughput 3.95944K wps
[Epoch 68 Batch 90/162] avg loss 0.00289354, throughput 3.97272K wps
[Epoch 68 Batch 120/162] avg loss 0.00262115, throughput 3.95978K wps
[Epoch 68 Batch 150/162] avg loss 0.00295972, throughput 3.95395K wps
Begin Testing...
[Epoch 68] train avg loss 0.0028453, dev acc 0.8656, dev avg loss 0.305324, throughput 3.97455K wps
[Epoch 69 Batch 30/162] avg loss 0.00284187, throughput 4.05514K wps
[Epoch 69 Batch 60/162] avg loss 0.00291146, throughput 3.96594K wps
[Epoch 69 Batch 90/162] avg loss 0.00282034, throughput 3.96556K wps
[Epoch 69 Batch 120/162] avg loss 0.00243586, throughput 3.96552K wps
[Epoch 69 Batch 150/162] avg loss 0.00270197, throughput 3.95036K wps
Begin Testing...
[Epoch 69] train avg loss 0.00272757, dev acc 0.8667, dev avg loss 0.303513, throughput 3.97804K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/162] avg loss 0.00250425, throughput 4.04239K wps
[Epoch 70 Batch 60/162] avg loss 0.00263352, throughput 3.95152K wps
[Epoch 70 Batch 90/162] avg loss 0.00266163, throughput 3.97294K wps
[Epoch 70 Batch 120/162] avg loss 0.00252238, throughput 3.96724K wps
[Epoch 70 Batch 150/162] avg loss 0.00269867, throughput 3.96846K wps
Begin Testing...
[Epoch 70] train avg loss 0.00262479, dev acc 0.8700, dev avg loss 0.302612, throughput 3.97797K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/162] avg loss 0.00269192, throughput 4.03472K wps
[Epoch 71 Batch 60/162] avg loss 0.00245442, throughput 3.9568K wps
[Epoch 71 Batch 90/162] avg loss 0.00257424, throughput 3.9644K wps
[Epoch 71 Batch 120/162] avg loss 0.00254478, throughput 3.97159K wps
[Epoch 71 Batch 150/162] avg loss 0.00255301, throughput 3.97389K wps
Begin Testing...
[Epoch 71] train avg loss 0.00256303, dev acc 0.8667, dev avg loss 0.302217, throughput 3.97692K wps
[Epoch 72 Batch 30/162] avg loss 0.00254647, throughput 4.04547K wps
[Epoch 72 Batch 60/162] avg loss 0.00257139, throughput 3.94661K wps
[Epoch 72 Batch 90/162] avg loss 0.0023558, throughput 3.95819K wps
[Epoch 72 Batch 120/162] avg loss 0.00247331, throughput 3.9699K wps
[Epoch 72 Batch 150/162] avg loss 0.00249343, throughput 3.96837K wps
Begin Testing...
[Epoch 72] train avg loss 0.00248661, dev acc 0.8656, dev avg loss 0.302478, throughput 3.97578K wps
[Epoch 73 Batch 30/162] avg loss 0.00247048, throughput 4.0365K wps
[Epoch 73 Batch 60/162] avg loss 0.00251785, throughput 3.95938K wps
[Epoch 73 Batch 90/162] avg loss 0.00247988, throughput 3.96648K wps
[Epoch 73 Batch 120/162] avg loss 0.00242916, throughput 3.94413K wps
[Epoch 73 Batch 150/162] avg loss 0.00238196, throughput 3.957K wps
Begin Testing...
[Epoch 73] train avg loss 0.00244028, dev acc 0.8700, dev avg loss 0.301266, throughput 3.97084K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/162] avg loss 0.00232279, throughput 4.05297K wps
[Epoch 74 Batch 60/162] avg loss 0.00236134, throughput 3.95573K wps
[Epoch 74 Batch 90/162] avg loss 0.00235109, throughput 3.96802K wps
[Epoch 74 Batch 120/162] avg loss 0.00232235, throughput 3.96483K wps
[Epoch 74 Batch 150/162] avg loss 0.00222927, throughput 3.95562K wps
Begin Testing...
[Epoch 74] train avg loss 0.00233743, dev acc 0.8667, dev avg loss 0.30256, throughput 3.97588K wps
[Epoch 75 Batch 30/162] avg loss 0.0022848, throughput 4.04757K wps
[Epoch 75 Batch 60/162] avg loss 0.00228229, throughput 3.9498K wps
[Epoch 75 Batch 90/162] avg loss 0.00231089, throughput 3.95457K wps
[Epoch 75 Batch 120/162] avg loss 0.0022867, throughput 3.95554K wps
[Epoch 75 Batch 150/162] avg loss 0.00249211, throughput 3.94924K wps
Begin Testing...
[Epoch 75] train avg loss 0.0023502, dev acc 0.8656, dev avg loss 0.300572, throughput 3.97034K wps
[Epoch 76 Batch 30/162] avg loss 0.00219476, throughput 4.05289K wps
[Epoch 76 Batch 60/162] avg loss 0.00230824, throughput 3.95784K wps
[Epoch 76 Batch 90/162] avg loss 0.00239677, throughput 3.94841K wps
[Epoch 76 Batch 120/162] avg loss 0.0023319, throughput 3.95744K wps
[Epoch 76 Batch 150/162] avg loss 0.0021391, throughput 3.95902K wps
Begin Testing...
[Epoch 76] train avg loss 0.00223681, dev acc 0.8756, dev avg loss 0.301188, throughput 3.97273K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/162] avg loss 0.00221117, throughput 4.0402K wps
[Epoch 77 Batch 60/162] avg loss 0.00233189, throughput 3.95921K wps
[Epoch 77 Batch 90/162] avg loss 0.00223188, throughput 3.96704K wps
[Epoch 77 Batch 120/162] avg loss 0.00217761, throughput 3.97126K wps
[Epoch 77 Batch 150/162] avg loss 0.00208938, throughput 3.9552K wps
Begin Testing...
[Epoch 77] train avg loss 0.00217849, dev acc 0.8711, dev avg loss 0.299258, throughput 3.97648K wps
[Epoch 78 Batch 30/162] avg loss 0.00218436, throughput 4.04009K wps
[Epoch 78 Batch 60/162] avg loss 0.00227856, throughput 3.96523K wps
[Epoch 78 Batch 90/162] avg loss 0.00197403, throughput 3.96726K wps
[Epoch 78 Batch 120/162] avg loss 0.00202569, throughput 3.94223K wps
[Epoch 78 Batch 150/162] avg loss 0.00198787, throughput 3.95074K wps
Begin Testing...
[Epoch 78] train avg loss 0.00207803, dev acc 0.8700, dev avg loss 0.299817, throughput 3.97161K wps
[Epoch 79 Batch 30/162] avg loss 0.00210488, throughput 4.05981K wps
[Epoch 79 Batch 60/162] avg loss 0.00200016, throughput 3.96103K wps
[Epoch 79 Batch 90/162] avg loss 0.00193657, throughput 3.95335K wps
[Epoch 79 Batch 120/162] avg loss 0.00215487, throughput 3.94958K wps
[Epoch 79 Batch 150/162] avg loss 0.00186951, throughput 3.96244K wps
Begin Testing...
[Epoch 79] train avg loss 0.0020329, dev acc 0.8722, dev avg loss 0.300775, throughput 3.97618K wps
[Epoch 80 Batch 30/162] avg loss 0.00188333, throughput 4.04405K wps
[Epoch 80 Batch 60/162] avg loss 0.00184201, throughput 3.96024K wps
[Epoch 80 Batch 90/162] avg loss 0.00213193, throughput 3.96753K wps
[Epoch 80 Batch 120/162] avg loss 0.00208481, throughput 3.96345K wps
[Epoch 80 Batch 150/162] avg loss 0.00204496, throughput 3.96837K wps
Begin Testing...
[Epoch 80] train avg loss 0.00200178, dev acc 0.8689, dev avg loss 0.300853, throughput 3.97973K wps
[Epoch 81 Batch 30/162] avg loss 0.00174028, throughput 4.05905K wps
[Epoch 81 Batch 60/162] avg loss 0.00189213, throughput 3.96192K wps
[Epoch 81 Batch 90/162] avg loss 0.0019274, throughput 3.95832K wps
[Epoch 81 Batch 120/162] avg loss 0.00181306, throughput 3.95815K wps
[Epoch 81 Batch 150/162] avg loss 0.00191931, throughput 3.94096K wps
Begin Testing...
[Epoch 81] train avg loss 0.00187031, dev acc 0.8667, dev avg loss 0.301186, throughput 3.97456K wps
[Epoch 82 Batch 30/162] avg loss 0.001797, throughput 4.05082K wps
[Epoch 82 Batch 60/162] avg loss 0.0018737, throughput 3.96794K wps
[Epoch 82 Batch 90/162] avg loss 0.00182941, throughput 3.9663K wps
[Epoch 82 Batch 120/162] avg loss 0.00198825, throughput 3.95699K wps
[Epoch 82 Batch 150/162] avg loss 0.00194198, throughput 3.96565K wps
Begin Testing...
[Epoch 82] train avg loss 0.00188424, dev acc 0.8711, dev avg loss 0.301041, throughput 3.97886K wps
[Epoch 83 Batch 30/162] avg loss 0.00190285, throughput 4.03128K wps
[Epoch 83 Batch 60/162] avg loss 0.00184906, throughput 3.93013K wps
[Epoch 83 Batch 90/162] avg loss 0.00184631, throughput 3.96634K wps
[Epoch 83 Batch 120/162] avg loss 0.00177328, throughput 3.95694K wps
[Epoch 83 Batch 150/162] avg loss 0.00192047, throughput 3.93801K wps
Begin Testing...
[Epoch 83] train avg loss 0.00184007, dev acc 0.8678, dev avg loss 0.301408, throughput 3.96384K wps
[Epoch 84 Batch 30/162] avg loss 0.00186147, throughput 4.04058K wps
[Epoch 84 Batch 60/162] avg loss 0.00170271, throughput 3.95698K wps
[Epoch 84 Batch 90/162] avg loss 0.0016994, throughput 3.96353K wps
[Epoch 84 Batch 120/162] avg loss 0.00193276, throughput 3.95574K wps
[Epoch 84 Batch 150/162] avg loss 0.00160804, throughput 3.95285K wps
Begin Testing...
[Epoch 84] train avg loss 0.00177187, dev acc 0.8700, dev avg loss 0.302549, throughput 3.97238K wps
[Epoch 85 Batch 30/162] avg loss 0.00169745, throughput 4.0627K wps
[Epoch 85 Batch 60/162] avg loss 0.00190902, throughput 3.96086K wps
[Epoch 85 Batch 90/162] avg loss 0.00166032, throughput 3.94927K wps
[Epoch 85 Batch 120/162] avg loss 0.00158009, throughput 3.95217K wps
[Epoch 85 Batch 150/162] avg loss 0.00170872, throughput 3.95684K wps
Begin Testing...
[Epoch 85] train avg loss 0.0017196, dev acc 0.8678, dev avg loss 0.30379, throughput 3.97456K wps
[Epoch 86 Batch 30/162] avg loss 0.00147425, throughput 4.04546K wps
[Epoch 86 Batch 60/162] avg loss 0.00165328, throughput 3.95194K wps
[Epoch 86 Batch 90/162] avg loss 0.00181543, throughput 3.96008K wps
[Epoch 86 Batch 120/162] avg loss 0.00180534, throughput 3.96157K wps
[Epoch 86 Batch 150/162] avg loss 0.00158589, throughput 3.94763K wps
Begin Testing...
[Epoch 86] train avg loss 0.00165957, dev acc 0.8700, dev avg loss 0.303543, throughput 3.96984K wps
[Epoch 87 Batch 30/162] avg loss 0.001679, throughput 4.05078K wps
[Epoch 87 Batch 60/162] avg loss 0.00167825, throughput 3.94699K wps
[Epoch 87 Batch 90/162] avg loss 0.00138357, throughput 3.95888K wps
[Epoch 87 Batch 120/162] avg loss 0.00149391, throughput 3.94517K wps
[Epoch 87 Batch 150/162] avg loss 0.00161578, throughput 3.9415K wps
Begin Testing...
[Epoch 87] train avg loss 0.00157154, dev acc 0.8678, dev avg loss 0.304916, throughput 3.96674K wps
[Epoch 88 Batch 30/162] avg loss 0.00160466, throughput 4.05908K wps
[Epoch 88 Batch 60/162] avg loss 0.00161739, throughput 3.96511K wps
[Epoch 88 Batch 90/162] avg loss 0.00145599, throughput 3.96732K wps
[Epoch 88 Batch 120/162] avg loss 0.00142828, throughput 3.96311K wps
[Epoch 88 Batch 150/162] avg loss 0.00164337, throughput 3.94949K wps
Begin Testing...
[Epoch 88] train avg loss 0.00156228, dev acc 0.8689, dev avg loss 0.305885, throughput 3.97723K wps
[Epoch 89 Batch 30/162] avg loss 0.00154134, throughput 4.06394K wps
[Epoch 89 Batch 60/162] avg loss 0.00153533, throughput 3.95737K wps
[Epoch 89 Batch 90/162] avg loss 0.00164337, throughput 3.95578K wps
[Epoch 89 Batch 120/162] avg loss 0.00150011, throughput 3.9664K wps
[Epoch 89 Batch 150/162] avg loss 0.0014193, throughput 3.96984K wps
Begin Testing...
[Epoch 89] train avg loss 0.00151243, dev acc 0.8700, dev avg loss 0.304036, throughput 3.97871K wps
[Epoch 90 Batch 30/162] avg loss 0.00146866, throughput 4.05516K wps
[Epoch 90 Batch 60/162] avg loss 0.00145186, throughput 3.96692K wps
[Epoch 90 Batch 90/162] avg loss 0.0015021, throughput 3.96028K wps
[Epoch 90 Batch 120/162] avg loss 0.00148024, throughput 3.95598K wps
[Epoch 90 Batch 150/162] avg loss 0.00155666, throughput 3.95111K wps
Begin Testing...
[Epoch 90] train avg loss 0.00150002, dev acc 0.8700, dev avg loss 0.303331, throughput 3.97689K wps
[Epoch 91 Batch 30/162] avg loss 0.00146061, throughput 4.05199K wps
[Epoch 91 Batch 60/162] avg loss 0.00135912, throughput 3.95553K wps
[Epoch 91 Batch 90/162] avg loss 0.00144416, throughput 3.96165K wps
[Epoch 91 Batch 120/162] avg loss 0.00143526, throughput 3.95626K wps
[Epoch 91 Batch 150/162] avg loss 0.00162824, throughput 3.95234K wps
Begin Testing...
[Epoch 91] train avg loss 0.00147451, dev acc 0.8700, dev avg loss 0.304853, throughput 3.97448K wps
[Epoch 92 Batch 30/162] avg loss 0.00145623, throughput 4.05773K wps
[Epoch 92 Batch 60/162] avg loss 0.00135228, throughput 3.94534K wps
[Epoch 92 Batch 90/162] avg loss 0.00137694, throughput 3.96016K wps
[Epoch 92 Batch 120/162] avg loss 0.00144796, throughput 3.96237K wps
[Epoch 92 Batch 150/162] avg loss 0.00135676, throughput 3.94292K wps
Begin Testing...
[Epoch 92] train avg loss 0.00139959, dev acc 0.8722, dev avg loss 0.304989, throughput 3.97239K wps
[Epoch 93 Batch 30/162] avg loss 0.0013107, throughput 4.05997K wps
[Epoch 93 Batch 60/162] avg loss 0.0014151, throughput 3.94149K wps
[Epoch 93 Batch 90/162] avg loss 0.00130473, throughput 3.96965K wps
[Epoch 93 Batch 120/162] avg loss 0.00126175, throughput 3.95525K wps
[Epoch 93 Batch 150/162] avg loss 0.00127807, throughput 3.94426K wps
Begin Testing...
[Epoch 93] train avg loss 0.00130324, dev acc 0.8678, dev avg loss 0.306949, throughput 3.97175K wps
[Epoch 94 Batch 30/162] avg loss 0.00126864, throughput 4.05494K wps
[Epoch 94 Batch 60/162] avg loss 0.00126087, throughput 3.94902K wps
[Epoch 94 Batch 90/162] avg loss 0.00125838, throughput 3.96K wps
[Epoch 94 Batch 120/162] avg loss 0.00124916, throughput 3.95402K wps
[Epoch 94 Batch 150/162] avg loss 0.00126078, throughput 3.95517K wps
Begin Testing...
[Epoch 94] train avg loss 0.00126586, dev acc 0.8689, dev avg loss 0.309402, throughput 3.97175K wps
[Epoch 95 Batch 30/162] avg loss 0.00140411, throughput 4.0631K wps
[Epoch 95 Batch 60/162] avg loss 0.00129217, throughput 3.96129K wps
[Epoch 95 Batch 90/162] avg loss 0.00113225, throughput 3.96445K wps
[Epoch 95 Batch 120/162] avg loss 0.0011999, throughput 3.93964K wps
[Epoch 95 Batch 150/162] avg loss 0.00134177, throughput 3.96104K wps
Begin Testing...
[Epoch 95] train avg loss 0.00126083, dev acc 0.8689, dev avg loss 0.308259, throughput 3.97574K wps
[Epoch 96 Batch 30/162] avg loss 0.00121261, throughput 4.05829K wps
[Epoch 96 Batch 60/162] avg loss 0.00127411, throughput 3.96609K wps
[Epoch 96 Batch 90/162] avg loss 0.00122093, throughput 3.95395K wps
[Epoch 96 Batch 120/162] avg loss 0.00129181, throughput 3.94034K wps
[Epoch 96 Batch 150/162] avg loss 0.00106537, throughput 3.95734K wps
Begin Testing...
[Epoch 96] train avg loss 0.00121632, dev acc 0.8700, dev avg loss 0.310204, throughput 3.97293K wps
[Epoch 97 Batch 30/162] avg loss 0.00111517, throughput 4.05107K wps
[Epoch 97 Batch 60/162] avg loss 0.00119671, throughput 3.95963K wps
[Epoch 97 Batch 90/162] avg loss 0.0012721, throughput 3.95156K wps
[Epoch 97 Batch 120/162] avg loss 0.00130497, throughput 3.95681K wps
[Epoch 97 Batch 150/162] avg loss 0.00112453, throughput 3.96259K wps
Begin Testing...
[Epoch 97] train avg loss 0.00121534, dev acc 0.8700, dev avg loss 0.310329, throughput 3.97462K wps
[Epoch 98 Batch 30/162] avg loss 0.00114138, throughput 4.06175K wps
[Epoch 98 Batch 60/162] avg loss 0.00106557, throughput 3.93746K wps
[Epoch 98 Batch 90/162] avg loss 0.0011552, throughput 3.95438K wps
[Epoch 98 Batch 120/162] avg loss 0.0012665, throughput 3.94042K wps
[Epoch 98 Batch 150/162] avg loss 0.00118254, throughput 3.95273K wps
Begin Testing...
[Epoch 98] train avg loss 0.00115761, dev acc 0.8711, dev avg loss 0.311822, throughput 3.96654K wps
[Epoch 99 Batch 30/162] avg loss 0.00113544, throughput 4.04377K wps
[Epoch 99 Batch 60/162] avg loss 0.00116247, throughput 3.95024K wps
[Epoch 99 Batch 90/162] avg loss 0.00110412, throughput 3.95341K wps
[Epoch 99 Batch 120/162] avg loss 0.00114584, throughput 3.9394K wps
[Epoch 99 Batch 150/162] avg loss 0.00110388, throughput 3.95466K wps
Begin Testing...
[Epoch 99] train avg loss 0.00112337, dev acc 0.8700, dev avg loss 0.311413, throughput 3.96679K wps
[Epoch 100 Batch 30/162] avg loss 0.000985832, throughput 4.03855K wps
[Epoch 100 Batch 60/162] avg loss 0.000999138, throughput 3.94192K wps
[Epoch 100 Batch 90/162] avg loss 0.00116142, throughput 3.95545K wps
[Epoch 100 Batch 120/162] avg loss 0.00112321, throughput 3.95239K wps
[Epoch 100 Batch 150/162] avg loss 0.00117358, throughput 3.96159K wps
Begin Testing...
[Epoch 100] train avg loss 0.001089, dev acc 0.8722, dev avg loss 0.312663, throughput 3.96877K wps
[Epoch 101 Batch 30/162] avg loss 0.00102728, throughput 4.05791K wps
[Epoch 101 Batch 60/162] avg loss 0.00108264, throughput 3.96304K wps
[Epoch 101 Batch 90/162] avg loss 0.00105729, throughput 3.95758K wps
[Epoch 101 Batch 120/162] avg loss 0.00111388, throughput 3.95694K wps
[Epoch 101 Batch 150/162] avg loss 0.00105416, throughput 3.93091K wps
Begin Testing...
[Epoch 101] train avg loss 0.00106023, dev acc 0.8689, dev avg loss 0.31267, throughput 3.97073K wps
[Epoch 102 Batch 30/162] avg loss 0.000956886, throughput 4.06719K wps
[Epoch 102 Batch 60/162] avg loss 0.00100633, throughput 3.95833K wps
[Epoch 102 Batch 90/162] avg loss 0.00105659, throughput 3.93811K wps
[Epoch 102 Batch 120/162] avg loss 0.00109832, throughput 3.96242K wps
[Epoch 102 Batch 150/162] avg loss 0.00112987, throughput 3.94742K wps
Begin Testing...
[Epoch 102] train avg loss 0.00104171, dev acc 0.8744, dev avg loss 0.313028, throughput 3.97209K wps
[Epoch 103 Batch 30/162] avg loss 0.00100845, throughput 4.04801K wps
[Epoch 103 Batch 60/162] avg loss 0.00090072, throughput 3.96848K wps
[Epoch 103 Batch 90/162] avg loss 0.000963857, throughput 3.95819K wps
[Epoch 103 Batch 120/162] avg loss 0.00101881, throughput 3.94952K wps
[Epoch 103 Batch 150/162] avg loss 0.000973115, throughput 3.95554K wps
Begin Testing...
[Epoch 103] train avg loss 0.000973402, dev acc 0.8667, dev avg loss 0.316, throughput 3.97457K wps
[Epoch 104 Batch 30/162] avg loss 0.000917541, throughput 4.05567K wps
[Epoch 104 Batch 60/162] avg loss 0.000970986, throughput 3.95158K wps
[Epoch 104 Batch 90/162] avg loss 0.0010907, throughput 3.9494K wps
[Epoch 104 Batch 120/162] avg loss 0.000858411, throughput 3.94848K wps
[Epoch 104 Batch 150/162] avg loss 0.000958698, throughput 3.95385K wps
Begin Testing...
[Epoch 104] train avg loss 0.00095761, dev acc 0.8667, dev avg loss 0.317377, throughput 3.96907K wps
[Epoch 105 Batch 30/162] avg loss 0.000936288, throughput 4.01844K wps
[Epoch 105 Batch 60/162] avg loss 0.000924604, throughput 3.9508K wps
[Epoch 105 Batch 90/162] avg loss 0.000911683, throughput 3.96496K wps
[Epoch 105 Batch 120/162] avg loss 0.000933944, throughput 3.95694K wps
[Epoch 105 Batch 150/162] avg loss 0.000954037, throughput 3.9615K wps
Begin Testing...
[Epoch 105] train avg loss 0.000931577, dev acc 0.8744, dev avg loss 0.316151, throughput 3.97022K wps
[Epoch 106 Batch 30/162] avg loss 0.00091595, throughput 4.05397K wps
[Epoch 106 Batch 60/162] avg loss 0.000882288, throughput 3.93856K wps
[Epoch 106 Batch 90/162] avg loss 0.000975554, throughput 3.95315K wps
[Epoch 106 Batch 120/162] avg loss 0.000923872, throughput 3.94145K wps
[Epoch 106 Batch 150/162] avg loss 0.00090584, throughput 3.95234K wps
Begin Testing...
[Epoch 106] train avg loss 0.000920473, dev acc 0.8711, dev avg loss 0.318615, throughput 3.96689K wps
[Epoch 107 Batch 30/162] avg loss 0.000968648, throughput 4.06344K wps
[Epoch 107 Batch 60/162] avg loss 0.000873121, throughput 3.95276K wps
[Epoch 107 Batch 90/162] avg loss 0.000870741, throughput 3.95247K wps
[Epoch 107 Batch 120/162] avg loss 0.000919019, throughput 3.9502K wps
[Epoch 107 Batch 150/162] avg loss 0.000911482, throughput 3.94766K wps
Begin Testing...
[Epoch 107] train avg loss 0.000915068, dev acc 0.8689, dev avg loss 0.318438, throughput 3.97038K wps
[Epoch 108 Batch 30/162] avg loss 0.000928886, throughput 4.03905K wps
[Epoch 108 Batch 60/162] avg loss 0.000852494, throughput 3.95108K wps
[Epoch 108 Batch 90/162] avg loss 0.000892008, throughput 3.95458K wps
[Epoch 108 Batch 120/162] avg loss 0.000845439, throughput 3.94578K wps
[Epoch 108 Batch 150/162] avg loss 0.00083358, throughput 3.94388K wps
Begin Testing...
[Epoch 108] train avg loss 0.000863958, dev acc 0.8678, dev avg loss 0.320159, throughput 3.96551K wps
[Epoch 109 Batch 30/162] avg loss 0.000855992, throughput 4.04774K wps
[Epoch 109 Batch 60/162] avg loss 0.000780208, throughput 3.96299K wps
[Epoch 109 Batch 90/162] avg loss 0.000751818, throughput 3.96161K wps
[Epoch 109 Batch 120/162] avg loss 0.000921723, throughput 3.96273K wps
[Epoch 109 Batch 150/162] avg loss 0.000918694, throughput 3.95448K wps
Begin Testing...
[Epoch 109] train avg loss 0.000857691, dev acc 0.8711, dev avg loss 0.319678, throughput 3.97419K wps
[Epoch 110 Batch 30/162] avg loss 0.000907782, throughput 4.03858K wps
[Epoch 110 Batch 60/162] avg loss 0.000799454, throughput 3.95375K wps
[Epoch 110 Batch 90/162] avg loss 0.000853345, throughput 3.96869K wps
[Epoch 110 Batch 120/162] avg loss 0.000824266, throughput 3.94698K wps
[Epoch 110 Batch 150/162] avg loss 0.000925665, throughput 3.94248K wps
Begin Testing...
[Epoch 110] train avg loss 0.000862273, dev acc 0.8733, dev avg loss 0.321133, throughput 3.96801K wps
[Epoch 111 Batch 30/162] avg loss 0.000860355, throughput 4.02977K wps
[Epoch 111 Batch 60/162] avg loss 0.000724976, throughput 3.95991K wps
[Epoch 111 Batch 90/162] avg loss 0.000725119, throughput 3.93958K wps
[Epoch 111 Batch 120/162] avg loss 0.000842122, throughput 3.94473K wps
[Epoch 111 Batch 150/162] avg loss 0.000828535, throughput 3.94267K wps
Begin Testing...
[Epoch 111] train avg loss 0.000797388, dev acc 0.8711, dev avg loss 0.322749, throughput 3.96094K wps
[Epoch 112 Batch 30/162] avg loss 0.000733537, throughput 4.04759K wps
[Epoch 112 Batch 60/162] avg loss 0.00087894, throughput 3.95294K wps
[Epoch 112 Batch 90/162] avg loss 0.000851285, throughput 3.96166K wps
[Epoch 112 Batch 120/162] avg loss 0.000734073, throughput 3.95378K wps
[Epoch 112 Batch 150/162] avg loss 0.000751507, throughput 3.95106K wps
Begin Testing...
[Epoch 112] train avg loss 0.000780636, dev acc 0.8689, dev avg loss 0.324148, throughput 3.97163K wps
[Epoch 113 Batch 30/162] avg loss 0.000760562, throughput 4.05209K wps
[Epoch 113 Batch 60/162] avg loss 0.000827499, throughput 3.95502K wps
[Epoch 113 Batch 90/162] avg loss 0.000880966, throughput 3.95159K wps
[Epoch 113 Batch 120/162] avg loss 0.000768138, throughput 3.94091K wps
[Epoch 113 Batch 150/162] avg loss 0.000808642, throughput 3.96049K wps
Begin Testing...
[Epoch 113] train avg loss 0.000802858, dev acc 0.8689, dev avg loss 0.323003, throughput 3.96967K wps
[Epoch 114 Batch 30/162] avg loss 0.000705708, throughput 4.04235K wps
[Epoch 114 Batch 60/162] avg loss 0.000733735, throughput 3.9608K wps
[Epoch 114 Batch 90/162] avg loss 0.000806634, throughput 3.96748K wps
[Epoch 114 Batch 120/162] avg loss 0.000846899, throughput 3.95358K wps
[Epoch 114 Batch 150/162] avg loss 0.000645113, throughput 3.96315K wps
Begin Testing...
[Epoch 114] train avg loss 0.000750896, dev acc 0.8733, dev avg loss 0.324248, throughput 3.97592K wps
[Epoch 115 Batch 30/162] avg loss 0.00074217, throughput 4.04174K wps
[Epoch 115 Batch 60/162] avg loss 0.00077987, throughput 3.94976K wps
[Epoch 115 Batch 90/162] avg loss 0.000733686, throughput 3.96732K wps
[Epoch 115 Batch 120/162] avg loss 0.000669849, throughput 3.95757K wps
[Epoch 115 Batch 150/162] avg loss 0.000775208, throughput 3.95732K wps
Begin Testing...
[Epoch 115] train avg loss 0.000738996, dev acc 0.8711, dev avg loss 0.32478, throughput 3.97341K wps
[Epoch 116 Batch 30/162] avg loss 0.000705902, throughput 4.0503K wps
[Epoch 116 Batch 60/162] avg loss 0.000716519, throughput 3.95249K wps
[Epoch 116 Batch 90/162] avg loss 0.000701379, throughput 3.96289K wps
[Epoch 116 Batch 120/162] avg loss 0.000774994, throughput 3.96373K wps
[Epoch 116 Batch 150/162] avg loss 0.000661596, throughput 3.95593K wps
Begin Testing...
[Epoch 116] train avg loss 0.000706162, dev acc 0.8722, dev avg loss 0.327459, throughput 3.97319K wps
[Epoch 117 Batch 30/162] avg loss 0.000788306, throughput 4.02682K wps
[Epoch 117 Batch 60/162] avg loss 0.000654506, throughput 3.95585K wps
[Epoch 117 Batch 90/162] avg loss 0.000608761, throughput 3.95734K wps
[Epoch 117 Batch 120/162] avg loss 0.000665841, throughput 3.95709K wps
[Epoch 117 Batch 150/162] avg loss 0.000761919, throughput 3.96688K wps
Begin Testing...
[Epoch 117] train avg loss 0.000697771, dev acc 0.8744, dev avg loss 0.328092, throughput 3.97033K wps
[Epoch 118 Batch 30/162] avg loss 0.000640404, throughput 4.05828K wps
[Epoch 118 Batch 60/162] avg loss 0.000624179, throughput 3.96824K wps
[Epoch 118 Batch 90/162] avg loss 0.000690407, throughput 3.9553K wps
[Epoch 118 Batch 120/162] avg loss 0.00068841, throughput 3.96328K wps
[Epoch 118 Batch 150/162] avg loss 0.000630051, throughput 3.95584K wps
Begin Testing...
[Epoch 118] train avg loss 0.000657645, dev acc 0.8700, dev avg loss 0.329443, throughput 3.97947K wps
[Epoch 119 Batch 30/162] avg loss 0.000679195, throughput 4.02823K wps
[Epoch 119 Batch 60/162] avg loss 0.000716458, throughput 3.94828K wps
[Epoch 119 Batch 90/162] avg loss 0.000646815, throughput 3.96534K wps
[Epoch 119 Batch 120/162] avg loss 0.000656319, throughput 3.95759K wps
[Epoch 119 Batch 150/162] avg loss 0.000709737, throughput 3.95602K wps
Begin Testing...
[Epoch 119] train avg loss 0.000675678, dev acc 0.8733, dev avg loss 0.330359, throughput 3.97033K wps
[Epoch 120 Batch 30/162] avg loss 0.000620581, throughput 4.04685K wps
[Epoch 120 Batch 60/162] avg loss 0.000643347, throughput 3.94067K wps
[Epoch 120 Batch 90/162] avg loss 0.000586165, throughput 3.93841K wps
[Epoch 120 Batch 120/162] avg loss 0.000602841, throughput 3.95203K wps
[Epoch 120 Batch 150/162] avg loss 0.00062025, throughput 3.96488K wps
Begin Testing...
[Epoch 120] train avg loss 0.000622809, dev acc 0.8733, dev avg loss 0.331216, throughput 3.96716K wps
[Epoch 121 Batch 30/162] avg loss 0.00063742, throughput 4.05237K wps
[Epoch 121 Batch 60/162] avg loss 0.000656654, throughput 3.9377K wps
[Epoch 121 Batch 90/162] avg loss 0.000628281, throughput 3.95875K wps
[Epoch 121 Batch 120/162] avg loss 0.000560999, throughput 3.9457K wps
[Epoch 121 Batch 150/162] avg loss 0.000633017, throughput 3.94604K wps
Begin Testing...
[Epoch 121] train avg loss 0.000631138, dev acc 0.8733, dev avg loss 0.332738, throughput 3.9663K wps
[Epoch 122 Batch 30/162] avg loss 0.000581026, throughput 4.06396K wps
[Epoch 122 Batch 60/162] avg loss 0.000681836, throughput 3.96456K wps
[Epoch 122 Batch 90/162] avg loss 0.000594043, throughput 3.95377K wps
[Epoch 122 Batch 120/162] avg loss 0.000588444, throughput 3.94695K wps
[Epoch 122 Batch 150/162] avg loss 0.000642279, throughput 3.95756K wps
Begin Testing...
[Epoch 122] train avg loss 0.000613052, dev acc 0.8711, dev avg loss 0.33236, throughput 3.97506K wps
[Epoch 123 Batch 30/162] avg loss 0.000576388, throughput 4.03283K wps
[Epoch 123 Batch 60/162] avg loss 0.000580047, throughput 3.93388K wps
[Epoch 123 Batch 90/162] avg loss 0.000594658, throughput 3.95403K wps
[Epoch 123 Batch 120/162] avg loss 0.000555258, throughput 3.94887K wps
[Epoch 123 Batch 150/162] avg loss 0.00067031, throughput 3.96256K wps
Begin Testing...
[Epoch 123] train avg loss 0.000595226, dev acc 0.8689, dev avg loss 0.333026, throughput 3.96573K wps
[Epoch 124 Batch 30/162] avg loss 0.000637954, throughput 4.06406K wps
[Epoch 124 Batch 60/162] avg loss 0.000593675, throughput 3.96399K wps
[Epoch 124 Batch 90/162] avg loss 0.000597745, throughput 3.95876K wps
[Epoch 124 Batch 120/162] avg loss 0.000581186, throughput 3.96299K wps
[Epoch 124 Batch 150/162] avg loss 0.000593609, throughput 3.9616K wps
Begin Testing...
[Epoch 124] train avg loss 0.000594145, dev acc 0.8744, dev avg loss 0.334748, throughput 3.98021K wps
[Epoch 125 Batch 30/162] avg loss 0.000598503, throughput 4.04807K wps
[Epoch 125 Batch 60/162] avg loss 0.000580293, throughput 3.95894K wps
[Epoch 125 Batch 90/162] avg loss 0.000594149, throughput 3.95522K wps
[Epoch 125 Batch 120/162] avg loss 0.000586566, throughput 3.96366K wps
[Epoch 125 Batch 150/162] avg loss 0.000563089, throughput 3.96049K wps
Begin Testing...
[Epoch 125] train avg loss 0.000592719, dev acc 0.8711, dev avg loss 0.335416, throughput 3.97476K wps
[Epoch 126 Batch 30/162] avg loss 0.000579092, throughput 4.05983K wps
[Epoch 126 Batch 60/162] avg loss 0.000631974, throughput 3.96441K wps
[Epoch 126 Batch 90/162] avg loss 0.000550542, throughput 3.9625K wps
[Epoch 126 Batch 120/162] avg loss 0.000609321, throughput 3.96479K wps
[Epoch 126 Batch 150/162] avg loss 0.000579816, throughput 3.95974K wps
Begin Testing...
[Epoch 126] train avg loss 0.000596914, dev acc 0.8744, dev avg loss 0.336188, throughput 3.98078K wps
[Epoch 127 Batch 30/162] avg loss 0.000684008, throughput 4.03536K wps
[Epoch 127 Batch 60/162] avg loss 0.000526554, throughput 3.95041K wps
[Epoch 127 Batch 90/162] avg loss 0.000555666, throughput 3.94853K wps
[Epoch 127 Batch 120/162] avg loss 0.000515797, throughput 3.96163K wps
[Epoch 127 Batch 150/162] avg loss 0.00056324, throughput 3.96769K wps
Begin Testing...
[Epoch 127] train avg loss 0.000560363, dev acc 0.8756, dev avg loss 0.336826, throughput 3.97178K wps
Observed Improvement.
Begin Testing...
[Epoch 128 Batch 30/162] avg loss 0.000590017, throughput 4.05017K wps
[Epoch 128 Batch 60/162] avg loss 0.000577444, throughput 3.95688K wps
[Epoch 128 Batch 90/162] avg loss 0.000556642, throughput 3.98111K wps
[Epoch 128 Batch 120/162] avg loss 0.000450467, throughput 3.96099K wps
[Epoch 128 Batch 150/162] avg loss 0.000498118, throughput 3.9598K wps
Begin Testing...
[Epoch 128] train avg loss 0.000533267, dev acc 0.8722, dev avg loss 0.338178, throughput 3.97949K wps
[Epoch 129 Batch 30/162] avg loss 0.000535857, throughput 4.0531K wps
[Epoch 129 Batch 60/162] avg loss 0.000517204, throughput 3.96153K wps
[Epoch 129 Batch 90/162] avg loss 0.000571695, throughput 3.97163K wps
[Epoch 129 Batch 120/162] avg loss 0.000533359, throughput 3.95781K wps
[Epoch 129 Batch 150/162] avg loss 0.000530628, throughput 3.96472K wps
Begin Testing...
[Epoch 129] train avg loss 0.000542628, dev acc 0.8689, dev avg loss 0.339271, throughput 3.97978K wps
[Epoch 130 Batch 30/162] avg loss 0.000533652, throughput 4.05816K wps
[Epoch 130 Batch 60/162] avg loss 0.000518492, throughput 3.95964K wps
[Epoch 130 Batch 90/162] avg loss 0.000552886, throughput 3.94683K wps
[Epoch 130 Batch 120/162] avg loss 0.000470923, throughput 3.96573K wps
[Epoch 130 Batch 150/162] avg loss 0.000557084, throughput 3.97441K wps
Begin Testing...
[Epoch 130] train avg loss 0.00052918, dev acc 0.8722, dev avg loss 0.3405, throughput 3.9785K wps
[Epoch 131 Batch 30/162] avg loss 0.000564771, throughput 4.05984K wps
[Epoch 131 Batch 60/162] avg loss 0.000525661, throughput 3.96163K wps
[Epoch 131 Batch 90/162] avg loss 0.000508259, throughput 3.94929K wps
[Epoch 131 Batch 120/162] avg loss 0.000520279, throughput 3.95444K wps
[Epoch 131 Batch 150/162] avg loss 0.000573608, throughput 3.94916K wps
Begin Testing...
[Epoch 131] train avg loss 0.000538112, dev acc 0.8722, dev avg loss 0.340357, throughput 3.97267K wps
[Epoch 132 Batch 30/162] avg loss 0.000528341, throughput 4.06201K wps
[Epoch 132 Batch 60/162] avg loss 0.00044103, throughput 3.96538K wps
[Epoch 132 Batch 90/162] avg loss 0.000523024, throughput 3.97061K wps
[Epoch 132 Batch 120/162] avg loss 0.000478552, throughput 3.95753K wps
[Epoch 132 Batch 150/162] avg loss 0.000604844, throughput 3.96785K wps
Begin Testing...
[Epoch 132] train avg loss 0.000515399, dev acc 0.8722, dev avg loss 0.341469, throughput 3.9822K wps
[Epoch 133 Batch 30/162] avg loss 0.000509505, throughput 4.05947K wps
[Epoch 133 Batch 60/162] avg loss 0.000517309, throughput 3.97039K wps
[Epoch 133 Batch 90/162] avg loss 0.000451002, throughput 3.94708K wps
[Epoch 133 Batch 120/162] avg loss 0.000503515, throughput 3.96669K wps
[Epoch 133 Batch 150/162] avg loss 0.000502468, throughput 3.95809K wps
Begin Testing...
[Epoch 133] train avg loss 0.000487707, dev acc 0.8711, dev avg loss 0.344033, throughput 3.97856K wps
[Epoch 134 Batch 30/162] avg loss 0.000492308, throughput 4.05082K wps
[Epoch 134 Batch 60/162] avg loss 0.00043941, throughput 3.94728K wps
[Epoch 134 Batch 90/162] avg loss 0.000485044, throughput 3.95848K wps
[Epoch 134 Batch 120/162] avg loss 0.000612839, throughput 3.97318K wps
[Epoch 134 Batch 150/162] avg loss 0.000431918, throughput 3.97414K wps
Begin Testing...
[Epoch 134] train avg loss 0.000486773, dev acc 0.8744, dev avg loss 0.344888, throughput 3.97798K wps
[Epoch 135 Batch 30/162] avg loss 0.000481371, throughput 4.04651K wps
[Epoch 135 Batch 60/162] avg loss 0.000465166, throughput 3.94963K wps
[Epoch 135 Batch 90/162] avg loss 0.000471873, throughput 3.93467K wps
[Epoch 135 Batch 120/162] avg loss 0.000431727, throughput 3.95798K wps
[Epoch 135 Batch 150/162] avg loss 0.000472871, throughput 3.96466K wps
Begin Testing...
[Epoch 135] train avg loss 0.000468958, dev acc 0.8744, dev avg loss 0.345443, throughput 3.96977K wps
[Epoch 136 Batch 30/162] avg loss 0.000440267, throughput 4.05588K wps
[Epoch 136 Batch 60/162] avg loss 0.000458683, throughput 3.96517K wps
[Epoch 136 Batch 90/162] avg loss 0.000409823, throughput 3.9678K wps
[Epoch 136 Batch 120/162] avg loss 0.000467439, throughput 3.96178K wps
[Epoch 136 Batch 150/162] avg loss 0.000493976, throughput 3.9614K wps
Begin Testing...
[Epoch 136] train avg loss 0.000462104, dev acc 0.8778, dev avg loss 0.343777, throughput 3.98065K wps
Observed Improvement.
Begin Testing...
[Epoch 137 Batch 30/162] avg loss 0.000439894, throughput 4.06163K wps
[Epoch 137 Batch 60/162] avg loss 0.000485679, throughput 3.95414K wps
[Epoch 137 Batch 90/162] avg loss 0.000437067, throughput 3.97194K wps
[Epoch 137 Batch 120/162] avg loss 0.000480773, throughput 3.95823K wps
[Epoch 137 Batch 150/162] avg loss 0.000498504, throughput 3.96695K wps
Begin Testing...
[Epoch 137] train avg loss 0.000463032, dev acc 0.8689, dev avg loss 0.346629, throughput 3.98004K wps
[Epoch 138 Batch 30/162] avg loss 0.000516112, throughput 4.06055K wps
[Epoch 138 Batch 60/162] avg loss 0.000447294, throughput 3.96839K wps
[Epoch 138 Batch 90/162] avg loss 0.00046956, throughput 3.96214K wps
[Epoch 138 Batch 120/162] avg loss 0.00045606, throughput 3.96545K wps
[Epoch 138 Batch 150/162] avg loss 0.000427011, throughput 3.97257K wps
Begin Testing...
[Epoch 138] train avg loss 0.000458727, dev acc 0.8811, dev avg loss 0.347055, throughput 3.94735K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/162] avg loss 0.000410396, throughput 4.06201K wps
[Epoch 139 Batch 60/162] avg loss 0.000443291, throughput 3.97234K wps
[Epoch 139 Batch 90/162] avg loss 0.000426597, throughput 3.95029K wps
[Epoch 139 Batch 120/162] avg loss 0.000389795, throughput 3.96937K wps
[Epoch 139 Batch 150/162] avg loss 0.00046126, throughput 3.97032K wps
Begin Testing...
[Epoch 139] train avg loss 0.000422986, dev acc 0.8744, dev avg loss 0.346594, throughput 3.98213K wps
[Epoch 140 Batch 30/162] avg loss 0.000488082, throughput 4.06888K wps
[Epoch 140 Batch 60/162] avg loss 0.000412843, throughput 3.94774K wps
[Epoch 140 Batch 90/162] avg loss 0.0003762, throughput 3.96507K wps
[Epoch 140 Batch 120/162] avg loss 0.000444114, throughput 3.93048K wps
[Epoch 140 Batch 150/162] avg loss 0.000364205, throughput 3.93703K wps
Begin Testing...
[Epoch 140] train avg loss 0.000419264, dev acc 0.8767, dev avg loss 0.347907, throughput 3.96756K wps
[Epoch 141 Batch 30/162] avg loss 0.000441834, throughput 4.04627K wps
[Epoch 141 Batch 60/162] avg loss 0.000423552, throughput 3.94856K wps
[Epoch 141 Batch 90/162] avg loss 0.000431284, throughput 3.95085K wps
[Epoch 141 Batch 120/162] avg loss 0.000405939, throughput 3.95668K wps
[Epoch 141 Batch 150/162] avg loss 0.000452048, throughput 3.95923K wps
Begin Testing...
[Epoch 141] train avg loss 0.000426039, dev acc 0.8722, dev avg loss 0.350509, throughput 3.9699K wps
[Epoch 142 Batch 30/162] avg loss 0.000372451, throughput 4.05839K wps
[Epoch 142 Batch 60/162] avg loss 0.000437155, throughput 3.96307K wps
[Epoch 142 Batch 90/162] avg loss 0.000404643, throughput 3.94914K wps
[Epoch 142 Batch 120/162] avg loss 0.00038652, throughput 3.9588K wps
[Epoch 142 Batch 150/162] avg loss 0.000394296, throughput 3.9594K wps
Begin Testing...
[Epoch 142] train avg loss 0.000394318, dev acc 0.8756, dev avg loss 0.349279, throughput 3.97548K wps
[Epoch 143 Batch 30/162] avg loss 0.000394702, throughput 4.04119K wps
[Epoch 143 Batch 60/162] avg loss 0.000458443, throughput 3.96324K wps
[Epoch 143 Batch 90/162] avg loss 0.000416099, throughput 3.96322K wps
[Epoch 143 Batch 120/162] avg loss 0.00043793, throughput 3.96419K wps
[Epoch 143 Batch 150/162] avg loss 0.000358315, throughput 3.94294K wps
Begin Testing...
[Epoch 143] train avg loss 0.000406795, dev acc 0.8744, dev avg loss 0.34866, throughput 3.97099K wps
[Epoch 144 Batch 30/162] avg loss 0.00042192, throughput 4.05113K wps
[Epoch 144 Batch 60/162] avg loss 0.000405457, throughput 3.96671K wps
[Epoch 144 Batch 90/162] avg loss 0.00037333, throughput 3.95481K wps
[Epoch 144 Batch 120/162] avg loss 0.000353351, throughput 3.96038K wps
[Epoch 144 Batch 150/162] avg loss 0.000381136, throughput 3.94324K wps
Begin Testing...
[Epoch 144] train avg loss 0.000390633, dev acc 0.8700, dev avg loss 0.352807, throughput 3.97422K wps
[Epoch 145 Batch 30/162] avg loss 0.000384161, throughput 4.0517K wps
[Epoch 145 Batch 60/162] avg loss 0.000394856, throughput 3.96195K wps
[Epoch 145 Batch 90/162] avg loss 0.000361635, throughput 3.97146K wps
[Epoch 145 Batch 120/162] avg loss 0.000457297, throughput 3.95754K wps
[Epoch 145 Batch 150/162] avg loss 0.000353282, throughput 3.96535K wps
Begin Testing...
[Epoch 145] train avg loss 0.000386223, dev acc 0.8778, dev avg loss 0.351417, throughput 3.98003K wps
[Epoch 146 Batch 30/162] avg loss 0.000349572, throughput 4.05252K wps
[Epoch 146 Batch 60/162] avg loss 0.000355267, throughput 3.95019K wps
[Epoch 146 Batch 90/162] avg loss 0.000380316, throughput 3.95608K wps
[Epoch 146 Batch 120/162] avg loss 0.000363877, throughput 3.96066K wps
[Epoch 146 Batch 150/162] avg loss 0.000364624, throughput 3.96232K wps
Begin Testing...
[Epoch 146] train avg loss 0.000367716, dev acc 0.8722, dev avg loss 0.35248, throughput 3.9741K wps
[Epoch 147 Batch 30/162] avg loss 0.000345885, throughput 4.0415K wps
[Epoch 147 Batch 60/162] avg loss 0.00039319, throughput 3.95065K wps
[Epoch 147 Batch 90/162] avg loss 0.000366816, throughput 3.96547K wps
[Epoch 147 Batch 120/162] avg loss 0.000354852, throughput 3.95331K wps
[Epoch 147 Batch 150/162] avg loss 0.000348888, throughput 3.95866K wps
Begin Testing...
[Epoch 147] train avg loss 0.000365055, dev acc 0.8733, dev avg loss 0.353398, throughput 3.97149K wps
[Epoch 148 Batch 30/162] avg loss 0.000350535, throughput 4.05809K wps
[Epoch 148 Batch 60/162] avg loss 0.000396544, throughput 3.9649K wps
[Epoch 148 Batch 90/162] avg loss 0.000374311, throughput 3.95656K wps
[Epoch 148 Batch 120/162] avg loss 0.000313889, throughput 3.95773K wps
[Epoch 148 Batch 150/162] avg loss 0.000400996, throughput 3.96473K wps
Begin Testing...
[Epoch 148] train avg loss 0.000362087, dev acc 0.8778, dev avg loss 0.353675, throughput 3.97846K wps
[Epoch 149 Batch 30/162] avg loss 0.000372928, throughput 4.05747K wps
[Epoch 149 Batch 60/162] avg loss 0.000415365, throughput 3.95926K wps
[Epoch 149 Batch 90/162] avg loss 0.000377937, throughput 3.96073K wps
[Epoch 149 Batch 120/162] avg loss 0.000345187, throughput 3.95932K wps
[Epoch 149 Batch 150/162] avg loss 0.000321543, throughput 3.96548K wps
Begin Testing...
[Epoch 149] train avg loss 0.000368905, dev acc 0.8800, dev avg loss 0.353776, throughput 3.97955K wps
[Epoch 150 Batch 30/162] avg loss 0.000359066, throughput 4.057K wps
[Epoch 150 Batch 60/162] avg loss 0.000367981, throughput 3.95046K wps
[Epoch 150 Batch 90/162] avg loss 0.000374034, throughput 3.96841K wps
[Epoch 150 Batch 120/162] avg loss 0.000340449, throughput 3.94365K wps
[Epoch 150 Batch 150/162] avg loss 0.000371178, throughput 3.95052K wps
Begin Testing...
[Epoch 150] train avg loss 0.000361052, dev acc 0.8822, dev avg loss 0.35676, throughput 3.97146K wps
Observed Improvement.
Begin Testing...
[Epoch 151 Batch 30/162] avg loss 0.000374767, throughput 4.05718K wps
[Epoch 151 Batch 60/162] avg loss 0.000348284, throughput 3.96405K wps
[Epoch 151 Batch 90/162] avg loss 0.000332676, throughput 3.96924K wps
[Epoch 151 Batch 120/162] avg loss 0.000360408, throughput 3.96105K wps
[Epoch 151 Batch 150/162] avg loss 0.000351951, throughput 3.96456K wps
Begin Testing...
[Epoch 151] train avg loss 0.000357006, dev acc 0.8767, dev avg loss 0.358076, throughput 3.98211K wps
[Epoch 152 Batch 30/162] avg loss 0.000337296, throughput 4.06341K wps
[Epoch 152 Batch 60/162] avg loss 0.000413937, throughput 3.96258K wps
[Epoch 152 Batch 90/162] avg loss 0.000388187, throughput 3.9566K wps
[Epoch 152 Batch 120/162] avg loss 0.00034261, throughput 3.96866K wps
[Epoch 152 Batch 150/162] avg loss 0.000315709, throughput 3.94656K wps
Begin Testing...
[Epoch 152] train avg loss 0.000360362, dev acc 0.8744, dev avg loss 0.357071, throughput 3.97693K wps
[Epoch 153 Batch 30/162] avg loss 0.000330775, throughput 4.05451K wps
[Epoch 153 Batch 60/162] avg loss 0.000320381, throughput 3.95449K wps
[Epoch 153 Batch 90/162] avg loss 0.000357384, throughput 3.95594K wps
[Epoch 153 Batch 120/162] avg loss 0.000301508, throughput 3.96093K wps
[Epoch 153 Batch 150/162] avg loss 0.000328437, throughput 3.9728K wps
Begin Testing...
[Epoch 153] train avg loss 0.000323192, dev acc 0.8789, dev avg loss 0.360389, throughput 3.97728K wps
[Epoch 154 Batch 30/162] avg loss 0.000310739, throughput 4.0547K wps
[Epoch 154 Batch 60/162] avg loss 0.000280058, throughput 3.95825K wps
[Epoch 154 Batch 90/162] avg loss 0.000279237, throughput 3.9527K wps
[Epoch 154 Batch 120/162] avg loss 0.000302837, throughput 3.95989K wps
[Epoch 154 Batch 150/162] avg loss 0.000383532, throughput 3.96483K wps
Begin Testing...
[Epoch 154] train avg loss 0.000311838, dev acc 0.8789, dev avg loss 0.359517, throughput 3.97537K wps
[Epoch 155 Batch 30/162] avg loss 0.00031177, throughput 4.06137K wps
[Epoch 155 Batch 60/162] avg loss 0.000327021, throughput 3.96306K wps
[Epoch 155 Batch 90/162] avg loss 0.000332528, throughput 3.96371K wps
[Epoch 155 Batch 120/162] avg loss 0.000312505, throughput 3.95957K wps
[Epoch 155 Batch 150/162] avg loss 0.000372134, throughput 3.94602K wps
Begin Testing...
[Epoch 155] train avg loss 0.000333155, dev acc 0.8767, dev avg loss 0.359988, throughput 3.97739K wps
[Epoch 156 Batch 30/162] avg loss 0.000325126, throughput 4.05116K wps
[Epoch 156 Batch 60/162] avg loss 0.000302407, throughput 3.95507K wps
[Epoch 156 Batch 90/162] avg loss 0.000338013, throughput 3.95748K wps
[Epoch 156 Batch 120/162] avg loss 0.000308931, throughput 3.96378K wps
[Epoch 156 Batch 150/162] avg loss 0.000347835, throughput 3.94814K wps
Begin Testing...
[Epoch 156] train avg loss 0.000322947, dev acc 0.8722, dev avg loss 0.363325, throughput 3.97336K wps
[Epoch 157 Batch 30/162] avg loss 0.000306878, throughput 4.04948K wps
[Epoch 157 Batch 60/162] avg loss 0.000319214, throughput 3.9493K wps
[Epoch 157 Batch 90/162] avg loss 0.000299385, throughput 3.93594K wps
[Epoch 157 Batch 120/162] avg loss 0.000357437, throughput 3.9718K wps
[Epoch 157 Batch 150/162] avg loss 0.000302647, throughput 3.97508K wps
Begin Testing...
[Epoch 157] train avg loss 0.000319623, dev acc 0.8778, dev avg loss 0.36068, throughput 3.97401K wps
[Epoch 158 Batch 30/162] avg loss 0.000344389, throughput 4.05121K wps
[Epoch 158 Batch 60/162] avg loss 0.000279364, throughput 3.96254K wps
[Epoch 158 Batch 90/162] avg loss 0.000326065, throughput 3.96747K wps
[Epoch 158 Batch 120/162] avg loss 0.000306646, throughput 3.95767K wps
[Epoch 158 Batch 150/162] avg loss 0.000280712, throughput 3.95202K wps
Begin Testing...
[Epoch 158] train avg loss 0.000306827, dev acc 0.8789, dev avg loss 0.36184, throughput 3.97697K wps
[Epoch 159 Batch 30/162] avg loss 0.000292956, throughput 4.04522K wps
[Epoch 159 Batch 60/162] avg loss 0.000335599, throughput 3.94912K wps
[Epoch 159 Batch 90/162] avg loss 0.000305514, throughput 3.93638K wps
[Epoch 159 Batch 120/162] avg loss 0.000290446, throughput 3.95031K wps
[Epoch 159 Batch 150/162] avg loss 0.00035511, throughput 3.94168K wps
Begin Testing...
[Epoch 159] train avg loss 0.000314509, dev acc 0.8744, dev avg loss 0.363879, throughput 3.96396K wps
[Epoch 160 Batch 30/162] avg loss 0.000261176, throughput 4.07106K wps
[Epoch 160 Batch 60/162] avg loss 0.000294956, throughput 3.96244K wps
[Epoch 160 Batch 90/162] avg loss 0.00029106, throughput 3.95548K wps
[Epoch 160 Batch 120/162] avg loss 0.000347018, throughput 3.96368K wps
[Epoch 160 Batch 150/162] avg loss 0.00031581, throughput 3.95636K wps
Begin Testing...
[Epoch 160] train avg loss 0.000306363, dev acc 0.8756, dev avg loss 0.36175, throughput 3.97997K wps
[Epoch 161 Batch 30/162] avg loss 0.000309389, throughput 4.06598K wps
[Epoch 161 Batch 60/162] avg loss 0.000240746, throughput 3.95701K wps
[Epoch 161 Batch 90/162] avg loss 0.000296852, throughput 3.95973K wps
[Epoch 161 Batch 120/162] avg loss 0.000307134, throughput 3.96192K wps
[Epoch 161 Batch 150/162] avg loss 0.00033765, throughput 3.95906K wps
Begin Testing...
[Epoch 161] train avg loss 0.000296557, dev acc 0.8800, dev avg loss 0.362803, throughput 3.97845K wps
[Epoch 162 Batch 30/162] avg loss 0.000296729, throughput 4.06175K wps
[Epoch 162 Batch 60/162] avg loss 0.000281825, throughput 3.95628K wps
[Epoch 162 Batch 90/162] avg loss 0.000286819, throughput 3.94895K wps
[Epoch 162 Batch 120/162] avg loss 0.000332894, throughput 3.96262K wps
[Epoch 162 Batch 150/162] avg loss 0.000305065, throughput 3.9626K wps
Begin Testing...
[Epoch 162] train avg loss 0.000305928, dev acc 0.8767, dev avg loss 0.363389, throughput 3.97798K wps
[Epoch 163 Batch 30/162] avg loss 0.000298493, throughput 4.03907K wps
[Epoch 163 Batch 60/162] avg loss 0.00032192, throughput 3.96162K wps
[Epoch 163 Batch 90/162] avg loss 0.000298162, throughput 3.94829K wps
[Epoch 163 Batch 120/162] avg loss 0.000288992, throughput 3.96523K wps
[Epoch 163 Batch 150/162] avg loss 0.000268097, throughput 3.9669K wps
Begin Testing...
[Epoch 163] train avg loss 0.000294323, dev acc 0.8733, dev avg loss 0.366378, throughput 3.97528K wps
[Epoch 164 Batch 30/162] avg loss 0.000270921, throughput 4.05839K wps
[Epoch 164 Batch 60/162] avg loss 0.00027663, throughput 3.93459K wps
[Epoch 164 Batch 90/162] avg loss 0.000291825, throughput 3.95986K wps
[Epoch 164 Batch 120/162] avg loss 0.000249636, throughput 3.95291K wps
[Epoch 164 Batch 150/162] avg loss 0.000286409, throughput 3.94615K wps
Begin Testing...
[Epoch 164] train avg loss 0.000272962, dev acc 0.8778, dev avg loss 0.366315, throughput 3.97052K wps
[Epoch 165 Batch 30/162] avg loss 0.000247146, throughput 4.04754K wps
[Epoch 165 Batch 60/162] avg loss 0.000274462, throughput 3.951K wps
[Epoch 165 Batch 90/162] avg loss 0.000297963, throughput 3.9412K wps
[Epoch 165 Batch 120/162] avg loss 0.000305075, throughput 3.9541K wps
[Epoch 165 Batch 150/162] avg loss 0.000308556, throughput 3.96888K wps
Begin Testing...
[Epoch 165] train avg loss 0.000283301, dev acc 0.8789, dev avg loss 0.365432, throughput 3.96996K wps
[Epoch 166 Batch 30/162] avg loss 0.000263791, throughput 4.04865K wps
[Epoch 166 Batch 60/162] avg loss 0.000247552, throughput 3.95244K wps
[Epoch 166 Batch 90/162] avg loss 0.000267646, throughput 3.94382K wps
[Epoch 166 Batch 120/162] avg loss 0.000303291, throughput 3.96369K wps
[Epoch 166 Batch 150/162] avg loss 0.000325617, throughput 3.95463K wps
Begin Testing...
[Epoch 166] train avg loss 0.000279905, dev acc 0.8800, dev avg loss 0.365795, throughput 3.97147K wps
[Epoch 167 Batch 30/162] avg loss 0.000274607, throughput 4.0517K wps
[Epoch 167 Batch 60/162] avg loss 0.000247084, throughput 3.96263K wps
[Epoch 167 Batch 90/162] avg loss 0.000255548, throughput 3.96355K wps
[Epoch 167 Batch 120/162] avg loss 0.000264418, throughput 3.94725K wps
[Epoch 167 Batch 150/162] avg loss 0.000261637, throughput 3.95622K wps
Begin Testing...
[Epoch 167] train avg loss 0.000262031, dev acc 0.8767, dev avg loss 0.368288, throughput 3.9756K wps
[Epoch 168 Batch 30/162] avg loss 0.000268076, throughput 4.05349K wps
[Epoch 168 Batch 60/162] avg loss 0.000268803, throughput 3.94996K wps
[Epoch 168 Batch 90/162] avg loss 0.000248839, throughput 3.96773K wps
[Epoch 168 Batch 120/162] avg loss 0.000274471, throughput 3.96028K wps
[Epoch 168 Batch 150/162] avg loss 0.000260818, throughput 3.9626K wps
Begin Testing...
[Epoch 168] train avg loss 0.000262073, dev acc 0.8756, dev avg loss 0.369071, throughput 3.97653K wps
[Epoch 169 Batch 30/162] avg loss 0.000237903, throughput 4.04268K wps
[Epoch 169 Batch 60/162] avg loss 0.000255424, throughput 3.94746K wps
[Epoch 169 Batch 90/162] avg loss 0.000262465, throughput 3.931K wps
[Epoch 169 Batch 120/162] avg loss 0.000255819, throughput 3.95354K wps
[Epoch 169 Batch 150/162] avg loss 0.000242763, throughput 3.94058K wps
Begin Testing...
[Epoch 169] train avg loss 0.000257853, dev acc 0.8744, dev avg loss 0.369835, throughput 3.96181K wps
[Epoch 170 Batch 30/162] avg loss 0.000237994, throughput 4.04025K wps
[Epoch 170 Batch 60/162] avg loss 0.000295478, throughput 3.96644K wps
[Epoch 170 Batch 90/162] avg loss 0.000253256, throughput 3.9643K wps
[Epoch 170 Batch 120/162] avg loss 0.000253208, throughput 3.9711K wps
[Epoch 170 Batch 150/162] avg loss 0.000293776, throughput 3.95448K wps
Begin Testing...
[Epoch 170] train avg loss 0.000262216, dev acc 0.8767, dev avg loss 0.369365, throughput 3.97622K wps
[Epoch 171 Batch 30/162] avg loss 0.000262973, throughput 4.04803K wps
[Epoch 171 Batch 60/162] avg loss 0.000245166, throughput 3.96516K wps
[Epoch 171 Batch 90/162] avg loss 0.00025151, throughput 3.95438K wps
[Epoch 171 Batch 120/162] avg loss 0.000227704, throughput 3.95919K wps
[Epoch 171 Batch 150/162] avg loss 0.00026897, throughput 3.94449K wps
Begin Testing...
[Epoch 171] train avg loss 0.000255071, dev acc 0.8744, dev avg loss 0.370418, throughput 3.97277K wps
[Epoch 172 Batch 30/162] avg loss 0.000273966, throughput 4.0617K wps
[Epoch 172 Batch 60/162] avg loss 0.000209355, throughput 3.95402K wps
[Epoch 172 Batch 90/162] avg loss 0.000241665, throughput 3.95283K wps
[Epoch 172 Batch 120/162] avg loss 0.00025483, throughput 3.9509K wps
[Epoch 172 Batch 150/162] avg loss 0.000218247, throughput 3.96302K wps
Begin Testing...
[Epoch 172] train avg loss 0.000242808, dev acc 0.8744, dev avg loss 0.372799, throughput 3.97539K wps
[Epoch 173 Batch 30/162] avg loss 0.000241103, throughput 4.05104K wps
[Epoch 173 Batch 60/162] avg loss 0.000237669, throughput 3.96076K wps
[Epoch 173 Batch 90/162] avg loss 0.00027657, throughput 3.95504K wps
[Epoch 173 Batch 120/162] avg loss 0.000260724, throughput 3.95366K wps
[Epoch 173 Batch 150/162] avg loss 0.000277628, throughput 3.96672K wps
Begin Testing...
[Epoch 173] train avg loss 0.000259067, dev acc 0.8756, dev avg loss 0.369135, throughput 3.97487K wps
[Epoch 174 Batch 30/162] avg loss 0.000252888, throughput 4.06714K wps
[Epoch 174 Batch 60/162] avg loss 0.000253606, throughput 3.9618K wps
[Epoch 174 Batch 90/162] avg loss 0.000247277, throughput 3.96517K wps
[Epoch 174 Batch 120/162] avg loss 0.000259701, throughput 3.9635K wps
[Epoch 174 Batch 150/162] avg loss 0.000249054, throughput 3.96042K wps
Begin Testing...
[Epoch 174] train avg loss 0.000253802, dev acc 0.8744, dev avg loss 0.370706, throughput 3.98111K wps
[Epoch 175 Batch 30/162] avg loss 0.000269444, throughput 4.05701K wps
[Epoch 175 Batch 60/162] avg loss 0.000223875, throughput 3.96781K wps
[Epoch 175 Batch 90/162] avg loss 0.000217924, throughput 3.94648K wps
[Epoch 175 Batch 120/162] avg loss 0.000233415, throughput 3.96994K wps
[Epoch 175 Batch 150/162] avg loss 0.000275408, throughput 3.95882K wps
Begin Testing...
[Epoch 175] train avg loss 0.000241494, dev acc 0.8767, dev avg loss 0.371899, throughput 3.97783K wps
[Epoch 176 Batch 30/162] avg loss 0.000299991, throughput 4.03367K wps
[Epoch 176 Batch 60/162] avg loss 0.000233969, throughput 3.95583K wps
[Epoch 176 Batch 90/162] avg loss 0.000230582, throughput 3.95271K wps
[Epoch 176 Batch 120/162] avg loss 0.000222982, throughput 3.94591K wps
[Epoch 176 Batch 150/162] avg loss 0.000213328, throughput 3.96452K wps
Begin Testing...
[Epoch 176] train avg loss 0.00023985, dev acc 0.8756, dev avg loss 0.37279, throughput 3.96804K wps
[Epoch 177 Batch 30/162] avg loss 0.00023531, throughput 4.05201K wps
[Epoch 177 Batch 60/162] avg loss 0.000215174, throughput 3.95793K wps
[Epoch 177 Batch 90/162] avg loss 0.000271424, throughput 3.96628K wps
[Epoch 177 Batch 120/162] avg loss 0.000214845, throughput 3.97344K wps
[Epoch 177 Batch 150/162] avg loss 0.000247782, throughput 3.94654K wps
Begin Testing...
[Epoch 177] train avg loss 0.00023455, dev acc 0.8789, dev avg loss 0.372822, throughput 3.97815K wps
[Epoch 178 Batch 30/162] avg loss 0.000194549, throughput 4.05755K wps
[Epoch 178 Batch 60/162] avg loss 0.000204889, throughput 3.9563K wps
[Epoch 178 Batch 90/162] avg loss 0.000252112, throughput 3.95242K wps
[Epoch 178 Batch 120/162] avg loss 0.000209705, throughput 3.967K wps
[Epoch 178 Batch 150/162] avg loss 0.000243404, throughput 3.95553K wps
Begin Testing...
[Epoch 178] train avg loss 0.000224528, dev acc 0.8756, dev avg loss 0.373541, throughput 3.97626K wps
[Epoch 179 Batch 30/162] avg loss 0.00020438, throughput 4.04436K wps
[Epoch 179 Batch 60/162] avg loss 0.000220593, throughput 3.94903K wps
[Epoch 179 Batch 90/162] avg loss 0.000245087, throughput 3.95677K wps
[Epoch 179 Batch 120/162] avg loss 0.000210764, throughput 3.96961K wps
[Epoch 179 Batch 150/162] avg loss 0.00020863, throughput 3.96687K wps
Begin Testing...
[Epoch 179] train avg loss 0.000216929, dev acc 0.8733, dev avg loss 0.374645, throughput 3.97598K wps
[Epoch 180 Batch 30/162] avg loss 0.00021777, throughput 4.04811K wps
[Epoch 180 Batch 60/162] avg loss 0.000224815, throughput 3.95889K wps
[Epoch 180 Batch 90/162] avg loss 0.000225689, throughput 3.94811K wps
[Epoch 180 Batch 120/162] avg loss 0.000226896, throughput 3.94154K wps
[Epoch 180 Batch 150/162] avg loss 0.000221833, throughput 3.945K wps
Begin Testing...
[Epoch 180] train avg loss 0.000223146, dev acc 0.8744, dev avg loss 0.37787, throughput 3.96529K wps
[Epoch 181 Batch 30/162] avg loss 0.000201337, throughput 4.0374K wps
[Epoch 181 Batch 60/162] avg loss 0.000236027, throughput 3.95623K wps
[Epoch 181 Batch 90/162] avg loss 0.000238609, throughput 3.94354K wps
[Epoch 181 Batch 120/162] avg loss 0.000219418, throughput 3.9576K wps
[Epoch 181 Batch 150/162] avg loss 0.00019284, throughput 3.95469K wps
Begin Testing...
[Epoch 181] train avg loss 0.000217044, dev acc 0.8778, dev avg loss 0.378017, throughput 3.9678K wps
[Epoch 182 Batch 30/162] avg loss 0.000218038, throughput 4.05302K wps
[Epoch 182 Batch 60/162] avg loss 0.000246357, throughput 3.94114K wps
[Epoch 182 Batch 90/162] avg loss 0.000244649, throughput 3.96248K wps
[Epoch 182 Batch 120/162] avg loss 0.000194925, throughput 3.93937K wps
[Epoch 182 Batch 150/162] avg loss 0.000215682, throughput 3.94485K wps
Begin Testing...
[Epoch 182] train avg loss 0.000226949, dev acc 0.8778, dev avg loss 0.379224, throughput 3.96648K wps
[Epoch 183 Batch 30/162] avg loss 0.000204317, throughput 4.04939K wps
[Epoch 183 Batch 60/162] avg loss 0.000217531, throughput 3.94371K wps
[Epoch 183 Batch 90/162] avg loss 0.000198275, throughput 3.93255K wps
[Epoch 183 Batch 120/162] avg loss 0.000243345, throughput 3.97022K wps
[Epoch 183 Batch 150/162] avg loss 0.00020177, throughput 3.95552K wps
Begin Testing...
[Epoch 183] train avg loss 0.000217573, dev acc 0.8778, dev avg loss 0.377811, throughput 3.96851K wps
[Epoch 184 Batch 30/162] avg loss 0.000237384, throughput 4.06999K wps
[Epoch 184 Batch 60/162] avg loss 0.000238974, throughput 3.93695K wps
[Epoch 184 Batch 90/162] avg loss 0.000204911, throughput 3.95733K wps
[Epoch 184 Batch 120/162] avg loss 0.000268385, throughput 3.95869K wps
[Epoch 184 Batch 150/162] avg loss 0.000212797, throughput 3.96662K wps
Begin Testing...
[Epoch 184] train avg loss 0.000232062, dev acc 0.8778, dev avg loss 0.378919, throughput 3.97694K wps
[Epoch 185 Batch 30/162] avg loss 0.000226097, throughput 4.04322K wps
[Epoch 185 Batch 60/162] avg loss 0.000205455, throughput 3.9586K wps
[Epoch 185 Batch 90/162] avg loss 0.0001903, throughput 3.96272K wps
[Epoch 185 Batch 120/162] avg loss 0.000215253, throughput 3.97381K wps
[Epoch 185 Batch 150/162] avg loss 0.00020334, throughput 3.95947K wps
Begin Testing...
[Epoch 185] train avg loss 0.000208948, dev acc 0.8767, dev avg loss 0.379278, throughput 3.97764K wps
[Epoch 186 Batch 30/162] avg loss 0.000205749, throughput 4.05752K wps
[Epoch 186 Batch 60/162] avg loss 0.000212374, throughput 3.95348K wps
[Epoch 186 Batch 90/162] avg loss 0.000208822, throughput 3.96431K wps
[Epoch 186 Batch 120/162] avg loss 0.000199347, throughput 3.95118K wps
[Epoch 186 Batch 150/162] avg loss 0.00023054, throughput 3.95311K wps
Begin Testing...
[Epoch 186] train avg loss 0.000207314, dev acc 0.8733, dev avg loss 0.381769, throughput 3.97455K wps
[Epoch 187 Batch 30/162] avg loss 0.000225651, throughput 4.05391K wps
[Epoch 187 Batch 60/162] avg loss 0.000194454, throughput 3.96407K wps
[Epoch 187 Batch 90/162] avg loss 0.000194161, throughput 3.96726K wps
[Epoch 187 Batch 120/162] avg loss 0.000187485, throughput 3.94192K wps
[Epoch 187 Batch 150/162] avg loss 0.000212483, throughput 3.96751K wps
Begin Testing...
[Epoch 187] train avg loss 0.00020539, dev acc 0.8744, dev avg loss 0.38198, throughput 3.97629K wps
[Epoch 188 Batch 30/162] avg loss 0.000203794, throughput 4.06858K wps
[Epoch 188 Batch 60/162] avg loss 0.000214883, throughput 3.96995K wps
[Epoch 188 Batch 90/162] avg loss 0.000190945, throughput 3.95312K wps
[Epoch 188 Batch 120/162] avg loss 0.000213635, throughput 3.96612K wps
[Epoch 188 Batch 150/162] avg loss 0.000187521, throughput 3.96437K wps
Begin Testing...
[Epoch 188] train avg loss 0.000199257, dev acc 0.8756, dev avg loss 0.381101, throughput 3.98189K wps
[Epoch 189 Batch 30/162] avg loss 0.000205645, throughput 4.05259K wps
[Epoch 189 Batch 60/162] avg loss 0.000186044, throughput 3.95955K wps
[Epoch 189 Batch 90/162] avg loss 0.000207948, throughput 3.96628K wps
[Epoch 189 Batch 120/162] avg loss 0.000181697, throughput 3.96493K wps
[Epoch 189 Batch 150/162] avg loss 0.000199706, throughput 3.96064K wps
Begin Testing...
[Epoch 189] train avg loss 0.000199659, dev acc 0.8756, dev avg loss 0.38027, throughput 3.97903K wps
[Epoch 190 Batch 30/162] avg loss 0.000201648, throughput 4.0496K wps
[Epoch 190 Batch 60/162] avg loss 0.000196055, throughput 3.96302K wps
[Epoch 190 Batch 90/162] avg loss 0.000214433, throughput 3.96757K wps
[Epoch 190 Batch 120/162] avg loss 0.000184389, throughput 3.96011K wps
[Epoch 190 Batch 150/162] avg loss 0.000198067, throughput 3.96934K wps
Begin Testing...
[Epoch 190] train avg loss 0.000200911, dev acc 0.8789, dev avg loss 0.382973, throughput 3.97943K wps
[Epoch 191 Batch 30/162] avg loss 0.000198361, throughput 4.07113K wps
[Epoch 191 Batch 60/162] avg loss 0.000201012, throughput 3.95667K wps
[Epoch 191 Batch 90/162] avg loss 0.000196722, throughput 3.96708K wps
[Epoch 191 Batch 120/162] avg loss 0.000176202, throughput 3.97001K wps
[Epoch 191 Batch 150/162] avg loss 0.00017448, throughput 3.95354K wps
Begin Testing...
[Epoch 191] train avg loss 0.000186487, dev acc 0.8789, dev avg loss 0.383094, throughput 3.98248K wps
[Epoch 192 Batch 30/162] avg loss 0.000163081, throughput 4.05737K wps
[Epoch 192 Batch 60/162] avg loss 0.000200649, throughput 3.9628K wps
[Epoch 192 Batch 90/162] avg loss 0.000181792, throughput 3.95344K wps
[Epoch 192 Batch 120/162] avg loss 0.000154786, throughput 3.95526K wps
[Epoch 192 Batch 150/162] avg loss 0.000237618, throughput 3.95284K wps
Begin Testing...
[Epoch 192] train avg loss 0.00019107, dev acc 0.8733, dev avg loss 0.38356, throughput 3.97392K wps
[Epoch 193 Batch 30/162] avg loss 0.000184091, throughput 4.06072K wps
[Epoch 193 Batch 60/162] avg loss 0.000209705, throughput 3.96156K wps
[Epoch 193 Batch 90/162] avg loss 0.000194394, throughput 3.95414K wps
[Epoch 193 Batch 120/162] avg loss 0.000197468, throughput 3.95195K wps
[Epoch 193 Batch 150/162] avg loss 0.000161204, throughput 3.97156K wps
Begin Testing...
[Epoch 193] train avg loss 0.000188962, dev acc 0.8767, dev avg loss 0.383471, throughput 3.9783K wps
[Epoch 194 Batch 30/162] avg loss 0.000189633, throughput 4.05553K wps
[Epoch 194 Batch 60/162] avg loss 0.000182631, throughput 3.96079K wps
[Epoch 194 Batch 90/162] avg loss 0.000187828, throughput 3.95903K wps
[Epoch 194 Batch 120/162] avg loss 0.000195135, throughput 3.95355K wps
[Epoch 194 Batch 150/162] avg loss 0.000190007, throughput 3.96024K wps
Begin Testing...
[Epoch 194] train avg loss 0.000188841, dev acc 0.8789, dev avg loss 0.384737, throughput 3.97674K wps
[Epoch 195 Batch 30/162] avg loss 0.000211407, throughput 4.05877K wps
[Epoch 195 Batch 60/162] avg loss 0.00021982, throughput 3.95667K wps
[Epoch 195 Batch 90/162] avg loss 0.000180899, throughput 3.97322K wps
[Epoch 195 Batch 120/162] avg loss 0.00024409, throughput 3.96546K wps
[Epoch 195 Batch 150/162] avg loss 0.000182545, throughput 3.96621K wps
Begin Testing...
[Epoch 195] train avg loss 0.000205809, dev acc 0.8778, dev avg loss 0.385376, throughput 3.98205K wps
[Epoch 196 Batch 30/162] avg loss 0.000150477, throughput 4.0292K wps
[Epoch 196 Batch 60/162] avg loss 0.000206484, throughput 3.95712K wps
[Epoch 196 Batch 90/162] avg loss 0.00017303, throughput 3.95844K wps
[Epoch 196 Batch 120/162] avg loss 0.000207582, throughput 3.95278K wps
[Epoch 196 Batch 150/162] avg loss 0.000214284, throughput 3.96275K wps
Begin Testing...
[Epoch 196] train avg loss 0.000190793, dev acc 0.8778, dev avg loss 0.386962, throughput 3.97065K wps
[Epoch 197 Batch 30/162] avg loss 0.000181637, throughput 4.05711K wps
[Epoch 197 Batch 60/162] avg loss 0.00022129, throughput 3.95701K wps
[Epoch 197 Batch 90/162] avg loss 0.000190373, throughput 3.94634K wps
[Epoch 197 Batch 120/162] avg loss 0.000190854, throughput 3.96064K wps
[Epoch 197 Batch 150/162] avg loss 0.000188692, throughput 3.96206K wps
Begin Testing...
[Epoch 197] train avg loss 0.000190726, dev acc 0.8800, dev avg loss 0.386989, throughput 3.9755K wps
[Epoch 198 Batch 30/162] avg loss 0.000168107, throughput 4.05216K wps
[Epoch 198 Batch 60/162] avg loss 0.000175255, throughput 3.97344K wps
[Epoch 198 Batch 90/162] avg loss 0.000163046, throughput 3.95736K wps
[Epoch 198 Batch 120/162] avg loss 0.000186337, throughput 3.96006K wps
[Epoch 198 Batch 150/162] avg loss 0.000179829, throughput 3.96416K wps
Begin Testing...
[Epoch 198] train avg loss 0.00017437, dev acc 0.8800, dev avg loss 0.386972, throughput 3.9794K wps
[Epoch 199 Batch 30/162] avg loss 0.000169923, throughput 4.06168K wps
[Epoch 199 Batch 60/162] avg loss 0.00020854, throughput 3.96563K wps
[Epoch 199 Batch 90/162] avg loss 0.00016532, throughput 3.96184K wps
[Epoch 199 Batch 120/162] avg loss 0.000166511, throughput 3.96865K wps
[Epoch 199 Batch 150/162] avg loss 0.000170915, throughput 3.95265K wps
Begin Testing...
[Epoch 199] train avg loss 0.000178227, dev acc 0.8789, dev avg loss 0.387254, throughput 3.98121K wps
Test loss 0.313204, test acc 0.8890
Total time cost 1020.10s
[Epoch 0 Batch 30/162] avg loss 0.0138686, throughput 3.60417K wps
[Epoch 0 Batch 60/162] avg loss 0.0138897, throughput 3.94766K wps
[Epoch 0 Batch 90/162] avg loss 0.0138669, throughput 3.9682K wps
[Epoch 0 Batch 120/162] avg loss 0.0138456, throughput 3.9725K wps
[Epoch 0 Batch 150/162] avg loss 0.0138184, throughput 3.96788K wps
Begin Testing...
[Epoch 0] train avg loss 0.013854, dev acc 0.4878, dev avg loss 0.69059, throughput 3.88924K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0137659, throughput 4.05631K wps
[Epoch 1 Batch 60/162] avg loss 0.0137696, throughput 3.96761K wps
[Epoch 1 Batch 90/162] avg loss 0.0137936, throughput 3.96277K wps
[Epoch 1 Batch 120/162] avg loss 0.0137515, throughput 3.95207K wps
[Epoch 1 Batch 150/162] avg loss 0.0137168, throughput 3.95303K wps
Begin Testing...
[Epoch 1] train avg loss 0.013758, dev acc 0.5733, dev avg loss 0.68505, throughput 3.97586K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0136831, throughput 4.04608K wps
[Epoch 2 Batch 60/162] avg loss 0.0136757, throughput 3.96709K wps
[Epoch 2 Batch 90/162] avg loss 0.0137101, throughput 3.96571K wps
[Epoch 2 Batch 120/162] avg loss 0.0136197, throughput 3.96414K wps
[Epoch 2 Batch 150/162] avg loss 0.0136119, throughput 3.96236K wps
Begin Testing...
[Epoch 2] train avg loss 0.013653, dev acc 0.7089, dev avg loss 0.679688, throughput 3.9799K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0135525, throughput 4.0585K wps
[Epoch 3 Batch 60/162] avg loss 0.0135528, throughput 3.97059K wps
[Epoch 3 Batch 90/162] avg loss 0.0134722, throughput 3.97083K wps
[Epoch 3 Batch 120/162] avg loss 0.0135319, throughput 3.96903K wps
[Epoch 3 Batch 150/162] avg loss 0.0134343, throughput 3.95992K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135003, dev acc 0.6167, dev avg loss 0.671491, throughput 3.98428K wps
[Epoch 4 Batch 30/162] avg loss 0.0134088, throughput 4.04787K wps
[Epoch 4 Batch 60/162] avg loss 0.0132978, throughput 3.94317K wps
[Epoch 4 Batch 90/162] avg loss 0.0132903, throughput 3.9603K wps
[Epoch 4 Batch 120/162] avg loss 0.0132351, throughput 3.95299K wps
[Epoch 4 Batch 150/162] avg loss 0.0131912, throughput 3.95399K wps
Begin Testing...
[Epoch 4] train avg loss 0.0132835, dev acc 0.7044, dev avg loss 0.659937, throughput 3.96982K wps
[Epoch 5 Batch 30/162] avg loss 0.0130906, throughput 4.04911K wps
[Epoch 5 Batch 60/162] avg loss 0.0130511, throughput 3.95491K wps
[Epoch 5 Batch 90/162] avg loss 0.0130499, throughput 3.95886K wps
[Epoch 5 Batch 120/162] avg loss 0.0129121, throughput 3.96433K wps
[Epoch 5 Batch 150/162] avg loss 0.01295, throughput 3.9659K wps
Begin Testing...
[Epoch 5] train avg loss 0.0130052, dev acc 0.7222, dev avg loss 0.644098, throughput 3.97723K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.0126625, throughput 4.04225K wps
[Epoch 6 Batch 60/162] avg loss 0.0126994, throughput 3.9599K wps
[Epoch 6 Batch 90/162] avg loss 0.0126761, throughput 3.96429K wps
[Epoch 6 Batch 120/162] avg loss 0.0126284, throughput 3.97013K wps
[Epoch 6 Batch 150/162] avg loss 0.0125055, throughput 3.95162K wps
Begin Testing...
[Epoch 6] train avg loss 0.0126342, dev acc 0.7189, dev avg loss 0.624823, throughput 3.97616K wps
[Epoch 7 Batch 30/162] avg loss 0.0122555, throughput 4.0607K wps
[Epoch 7 Batch 60/162] avg loss 0.0124217, throughput 3.96934K wps
[Epoch 7 Batch 90/162] avg loss 0.0122595, throughput 3.97064K wps
[Epoch 7 Batch 120/162] avg loss 0.0120566, throughput 3.97505K wps
[Epoch 7 Batch 150/162] avg loss 0.0123154, throughput 3.96882K wps
Begin Testing...
[Epoch 7] train avg loss 0.0122404, dev acc 0.7178, dev avg loss 0.60576, throughput 3.98523K wps
[Epoch 8 Batch 30/162] avg loss 0.0119575, throughput 4.02441K wps
[Epoch 8 Batch 60/162] avg loss 0.0118662, throughput 3.94865K wps
[Epoch 8 Batch 90/162] avg loss 0.0120469, throughput 3.9552K wps
[Epoch 8 Batch 120/162] avg loss 0.0117551, throughput 3.96395K wps
[Epoch 8 Batch 150/162] avg loss 0.0117209, throughput 3.97496K wps
Begin Testing...
[Epoch 8] train avg loss 0.011851, dev acc 0.7256, dev avg loss 0.586963, throughput 3.97222K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.0115089, throughput 4.06866K wps
[Epoch 9 Batch 60/162] avg loss 0.0116564, throughput 3.96933K wps
[Epoch 9 Batch 90/162] avg loss 0.0113795, throughput 3.97265K wps
[Epoch 9 Batch 120/162] avg loss 0.0115505, throughput 3.96241K wps
[Epoch 9 Batch 150/162] avg loss 0.0112186, throughput 3.96168K wps
Begin Testing...
[Epoch 9] train avg loss 0.0114353, dev acc 0.7378, dev avg loss 0.569296, throughput 3.98496K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.0114977, throughput 4.06308K wps
[Epoch 10 Batch 60/162] avg loss 0.0110704, throughput 3.96841K wps
[Epoch 10 Batch 90/162] avg loss 0.0111808, throughput 3.96485K wps
[Epoch 10 Batch 120/162] avg loss 0.0109895, throughput 3.95326K wps
[Epoch 10 Batch 150/162] avg loss 0.0108477, throughput 3.96313K wps
Begin Testing...
[Epoch 10] train avg loss 0.0110951, dev acc 0.7456, dev avg loss 0.552339, throughput 3.98089K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.0105496, throughput 4.06012K wps
[Epoch 11 Batch 60/162] avg loss 0.0107877, throughput 3.94587K wps
[Epoch 11 Batch 90/162] avg loss 0.0108121, throughput 3.95457K wps
[Epoch 11 Batch 120/162] avg loss 0.0110854, throughput 3.96183K wps
[Epoch 11 Batch 150/162] avg loss 0.0106887, throughput 3.95987K wps
Begin Testing...
[Epoch 11] train avg loss 0.0107774, dev acc 0.7511, dev avg loss 0.536774, throughput 3.97429K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.0105938, throughput 4.04748K wps
[Epoch 12 Batch 60/162] avg loss 0.0107872, throughput 3.95915K wps
[Epoch 12 Batch 90/162] avg loss 0.0103655, throughput 3.95581K wps
[Epoch 12 Batch 120/162] avg loss 0.0102248, throughput 3.9485K wps
[Epoch 12 Batch 150/162] avg loss 0.0104138, throughput 3.9488K wps
Begin Testing...
[Epoch 12] train avg loss 0.0105129, dev acc 0.7589, dev avg loss 0.523809, throughput 3.9715K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.0101804, throughput 4.05302K wps
[Epoch 13 Batch 60/162] avg loss 0.0101653, throughput 3.95379K wps
[Epoch 13 Batch 90/162] avg loss 0.0103792, throughput 3.9501K wps
[Epoch 13 Batch 120/162] avg loss 0.0104334, throughput 3.95058K wps
[Epoch 13 Batch 150/162] avg loss 0.00990823, throughput 3.96505K wps
Begin Testing...
[Epoch 13] train avg loss 0.0101871, dev acc 0.7522, dev avg loss 0.511114, throughput 3.97397K wps
[Epoch 14 Batch 30/162] avg loss 0.00984842, throughput 4.05892K wps
[Epoch 14 Batch 60/162] avg loss 0.0100623, throughput 3.94591K wps
[Epoch 14 Batch 90/162] avg loss 0.0101471, throughput 3.95739K wps
[Epoch 14 Batch 120/162] avg loss 0.00946767, throughput 3.95897K wps
[Epoch 14 Batch 150/162] avg loss 0.0102081, throughput 3.97535K wps
Begin Testing...
[Epoch 14] train avg loss 0.00997625, dev acc 0.7611, dev avg loss 0.500769, throughput 3.97651K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00957569, throughput 4.03295K wps
[Epoch 15 Batch 60/162] avg loss 0.00921135, throughput 3.96886K wps
[Epoch 15 Batch 90/162] avg loss 0.0101604, throughput 3.95503K wps
[Epoch 15 Batch 120/162] avg loss 0.00985325, throughput 3.95991K wps
[Epoch 15 Batch 150/162] avg loss 0.00986022, throughput 3.95177K wps
Begin Testing...
[Epoch 15] train avg loss 0.00974769, dev acc 0.7644, dev avg loss 0.492029, throughput 3.97278K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00979623, throughput 4.05934K wps
[Epoch 16 Batch 60/162] avg loss 0.00955047, throughput 3.94757K wps
[Epoch 16 Batch 90/162] avg loss 0.00949138, throughput 3.95887K wps
[Epoch 16 Batch 120/162] avg loss 0.00911435, throughput 3.97312K wps
[Epoch 16 Batch 150/162] avg loss 0.0098678, throughput 3.9571K wps
Begin Testing...
[Epoch 16] train avg loss 0.00951962, dev acc 0.7667, dev avg loss 0.480854, throughput 3.97697K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00949428, throughput 4.05178K wps
[Epoch 17 Batch 60/162] avg loss 0.00929101, throughput 3.96244K wps
[Epoch 17 Batch 90/162] avg loss 0.00944483, throughput 3.97136K wps
[Epoch 17 Batch 120/162] avg loss 0.0095785, throughput 3.9533K wps
[Epoch 17 Batch 150/162] avg loss 0.0093068, throughput 3.96704K wps
Begin Testing...
[Epoch 17] train avg loss 0.00939381, dev acc 0.7822, dev avg loss 0.473097, throughput 3.979K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00915599, throughput 4.0508K wps
[Epoch 18 Batch 60/162] avg loss 0.00922526, throughput 3.9648K wps
[Epoch 18 Batch 90/162] avg loss 0.00935713, throughput 3.93529K wps
[Epoch 18 Batch 120/162] avg loss 0.00901648, throughput 3.94613K wps
[Epoch 18 Batch 150/162] avg loss 0.00906545, throughput 3.95193K wps
Begin Testing...
[Epoch 18] train avg loss 0.00916681, dev acc 0.7689, dev avg loss 0.4624, throughput 3.96905K wps
[Epoch 19 Batch 30/162] avg loss 0.00928909, throughput 4.06191K wps
[Epoch 19 Batch 60/162] avg loss 0.00899193, throughput 3.9572K wps
[Epoch 19 Batch 90/162] avg loss 0.00921203, throughput 3.96051K wps
[Epoch 19 Batch 120/162] avg loss 0.0086372, throughput 3.95401K wps
[Epoch 19 Batch 150/162] avg loss 0.00865279, throughput 3.97317K wps
Begin Testing...
[Epoch 19] train avg loss 0.00895023, dev acc 0.7800, dev avg loss 0.452108, throughput 3.98054K wps
[Epoch 20 Batch 30/162] avg loss 0.00888644, throughput 4.05089K wps
[Epoch 20 Batch 60/162] avg loss 0.00856031, throughput 3.96051K wps
[Epoch 20 Batch 90/162] avg loss 0.00865008, throughput 3.94419K wps
[Epoch 20 Batch 120/162] avg loss 0.00911031, throughput 3.96316K wps
[Epoch 20 Batch 150/162] avg loss 0.00846405, throughput 3.96443K wps
Begin Testing...
[Epoch 20] train avg loss 0.00872819, dev acc 0.7822, dev avg loss 0.443503, throughput 3.97615K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00860742, throughput 4.05137K wps
[Epoch 21 Batch 60/162] avg loss 0.00886311, throughput 3.95911K wps
[Epoch 21 Batch 90/162] avg loss 0.00830552, throughput 3.96034K wps
[Epoch 21 Batch 120/162] avg loss 0.00854156, throughput 3.96417K wps
[Epoch 21 Batch 150/162] avg loss 0.00863862, throughput 3.95399K wps
Begin Testing...
[Epoch 21] train avg loss 0.00859401, dev acc 0.7956, dev avg loss 0.434459, throughput 3.97579K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00811854, throughput 4.05704K wps
[Epoch 22 Batch 60/162] avg loss 0.00852483, throughput 3.9507K wps
[Epoch 22 Batch 90/162] avg loss 0.00852325, throughput 3.97065K wps
[Epoch 22 Batch 120/162] avg loss 0.00855876, throughput 3.96192K wps
[Epoch 22 Batch 150/162] avg loss 0.0081424, throughput 3.95477K wps
Begin Testing...
[Epoch 22] train avg loss 0.00837064, dev acc 0.8111, dev avg loss 0.427291, throughput 3.97511K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00800348, throughput 4.06262K wps
[Epoch 23 Batch 60/162] avg loss 0.00832957, throughput 3.96141K wps
[Epoch 23 Batch 90/162] avg loss 0.00805148, throughput 3.96528K wps
[Epoch 23 Batch 120/162] avg loss 0.00850204, throughput 3.96569K wps
[Epoch 23 Batch 150/162] avg loss 0.00799578, throughput 3.96816K wps
Begin Testing...
[Epoch 23] train avg loss 0.00823327, dev acc 0.8067, dev avg loss 0.419069, throughput 3.98179K wps
[Epoch 24 Batch 30/162] avg loss 0.00821616, throughput 4.07185K wps
[Epoch 24 Batch 60/162] avg loss 0.00780387, throughput 3.97003K wps
[Epoch 24 Batch 90/162] avg loss 0.00838949, throughput 3.97263K wps
[Epoch 24 Batch 120/162] avg loss 0.00821546, throughput 3.95944K wps
[Epoch 24 Batch 150/162] avg loss 0.00812477, throughput 3.96357K wps
Begin Testing...
[Epoch 24] train avg loss 0.00809459, dev acc 0.8133, dev avg loss 0.412609, throughput 3.98514K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00769747, throughput 4.04201K wps
[Epoch 25 Batch 60/162] avg loss 0.0080983, throughput 3.96231K wps
[Epoch 25 Batch 90/162] avg loss 0.00807124, throughput 3.96207K wps
[Epoch 25 Batch 120/162] avg loss 0.0076643, throughput 3.96091K wps
[Epoch 25 Batch 150/162] avg loss 0.00803686, throughput 3.9522K wps
Begin Testing...
[Epoch 25] train avg loss 0.00794396, dev acc 0.8211, dev avg loss 0.406265, throughput 3.97576K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/162] avg loss 0.00809231, throughput 4.0427K wps
[Epoch 26 Batch 60/162] avg loss 0.00749594, throughput 3.9583K wps
[Epoch 26 Batch 90/162] avg loss 0.00747432, throughput 3.96751K wps
[Epoch 26 Batch 120/162] avg loss 0.00775767, throughput 3.96385K wps
[Epoch 26 Batch 150/162] avg loss 0.00785164, throughput 3.95871K wps
Begin Testing...
[Epoch 26] train avg loss 0.00774964, dev acc 0.8233, dev avg loss 0.399946, throughput 3.97679K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00758991, throughput 4.06012K wps
[Epoch 27 Batch 60/162] avg loss 0.00767566, throughput 3.958K wps
[Epoch 27 Batch 90/162] avg loss 0.00748987, throughput 3.96849K wps
[Epoch 27 Batch 120/162] avg loss 0.00787471, throughput 3.95588K wps
[Epoch 27 Batch 150/162] avg loss 0.00764808, throughput 3.95944K wps
Begin Testing...
[Epoch 27] train avg loss 0.00764958, dev acc 0.8256, dev avg loss 0.395887, throughput 3.97867K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00746945, throughput 4.06184K wps
[Epoch 28 Batch 60/162] avg loss 0.00750937, throughput 3.9664K wps
[Epoch 28 Batch 90/162] avg loss 0.00791185, throughput 3.9539K wps
[Epoch 28 Batch 120/162] avg loss 0.00728748, throughput 3.96356K wps
[Epoch 28 Batch 150/162] avg loss 0.00743693, throughput 3.94602K wps
Begin Testing...
[Epoch 28] train avg loss 0.00749477, dev acc 0.8222, dev avg loss 0.389738, throughput 3.97782K wps
[Epoch 29 Batch 30/162] avg loss 0.00759543, throughput 4.0595K wps
[Epoch 29 Batch 60/162] avg loss 0.00750284, throughput 3.95666K wps
[Epoch 29 Batch 90/162] avg loss 0.00734799, throughput 3.96247K wps
[Epoch 29 Batch 120/162] avg loss 0.00724071, throughput 3.96189K wps
[Epoch 29 Batch 150/162] avg loss 0.00737917, throughput 3.94639K wps
Begin Testing...
[Epoch 29] train avg loss 0.00740278, dev acc 0.8244, dev avg loss 0.384508, throughput 3.97617K wps
[Epoch 30 Batch 30/162] avg loss 0.00734882, throughput 4.02824K wps
[Epoch 30 Batch 60/162] avg loss 0.00739803, throughput 3.94509K wps
[Epoch 30 Batch 90/162] avg loss 0.00718143, throughput 3.96665K wps
[Epoch 30 Batch 120/162] avg loss 0.00709616, throughput 3.95956K wps
[Epoch 30 Batch 150/162] avg loss 0.00735525, throughput 3.9639K wps
Begin Testing...
[Epoch 30] train avg loss 0.00730603, dev acc 0.8278, dev avg loss 0.381858, throughput 3.97233K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00694959, throughput 4.06218K wps
[Epoch 31 Batch 60/162] avg loss 0.00710748, throughput 3.94828K wps
[Epoch 31 Batch 90/162] avg loss 0.00748593, throughput 3.95433K wps
[Epoch 31 Batch 120/162] avg loss 0.00710171, throughput 3.9593K wps
[Epoch 31 Batch 150/162] avg loss 0.0072897, throughput 3.95194K wps
Begin Testing...
[Epoch 31] train avg loss 0.0071515, dev acc 0.8278, dev avg loss 0.37561, throughput 3.97268K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00721069, throughput 4.06143K wps
[Epoch 32 Batch 60/162] avg loss 0.00670749, throughput 3.95234K wps
[Epoch 32 Batch 90/162] avg loss 0.00733727, throughput 3.95408K wps
[Epoch 32 Batch 120/162] avg loss 0.00666019, throughput 3.97076K wps
[Epoch 32 Batch 150/162] avg loss 0.00721989, throughput 3.97196K wps
Begin Testing...
[Epoch 32] train avg loss 0.0070453, dev acc 0.8267, dev avg loss 0.372313, throughput 3.97961K wps
[Epoch 33 Batch 30/162] avg loss 0.00674355, throughput 4.06098K wps
[Epoch 33 Batch 60/162] avg loss 0.00699474, throughput 3.96738K wps
[Epoch 33 Batch 90/162] avg loss 0.00726649, throughput 3.95752K wps
[Epoch 33 Batch 120/162] avg loss 0.0073009, throughput 3.96641K wps
[Epoch 33 Batch 150/162] avg loss 0.00662718, throughput 3.96494K wps
Begin Testing...
[Epoch 33] train avg loss 0.00693116, dev acc 0.8300, dev avg loss 0.368931, throughput 3.98195K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00698523, throughput 4.0694K wps
[Epoch 34 Batch 60/162] avg loss 0.00633666, throughput 3.96348K wps
[Epoch 34 Batch 90/162] avg loss 0.00641988, throughput 3.94362K wps
[Epoch 34 Batch 120/162] avg loss 0.00700941, throughput 3.94306K wps
[Epoch 34 Batch 150/162] avg loss 0.00717257, throughput 3.97451K wps
Begin Testing...
[Epoch 34] train avg loss 0.00677402, dev acc 0.8311, dev avg loss 0.362207, throughput 3.9773K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00695395, throughput 4.02913K wps
[Epoch 35 Batch 60/162] avg loss 0.00643559, throughput 3.96154K wps
[Epoch 35 Batch 90/162] avg loss 0.00653795, throughput 3.96843K wps
[Epoch 35 Batch 120/162] avg loss 0.0068556, throughput 3.96366K wps
[Epoch 35 Batch 150/162] avg loss 0.0065848, throughput 3.94897K wps
Begin Testing...
[Epoch 35] train avg loss 0.00667611, dev acc 0.8267, dev avg loss 0.362938, throughput 3.97193K wps
[Epoch 36 Batch 30/162] avg loss 0.00673646, throughput 4.05635K wps
[Epoch 36 Batch 60/162] avg loss 0.00668007, throughput 3.9488K wps
[Epoch 36 Batch 90/162] avg loss 0.0067747, throughput 3.96583K wps
[Epoch 36 Batch 120/162] avg loss 0.00673597, throughput 3.96228K wps
[Epoch 36 Batch 150/162] avg loss 0.00600343, throughput 3.95986K wps
Begin Testing...
[Epoch 36] train avg loss 0.00656113, dev acc 0.8344, dev avg loss 0.353288, throughput 3.97761K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00672527, throughput 4.07098K wps
[Epoch 37 Batch 60/162] avg loss 0.00645756, throughput 3.96608K wps
[Epoch 37 Batch 90/162] avg loss 0.00594413, throughput 3.96773K wps
[Epoch 37 Batch 120/162] avg loss 0.00635743, throughput 3.95713K wps
[Epoch 37 Batch 150/162] avg loss 0.00658676, throughput 3.95855K wps
Begin Testing...
[Epoch 37] train avg loss 0.00641551, dev acc 0.8378, dev avg loss 0.349236, throughput 3.9815K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/162] avg loss 0.0060979, throughput 4.06554K wps
[Epoch 38 Batch 60/162] avg loss 0.00650236, throughput 3.96416K wps
[Epoch 38 Batch 90/162] avg loss 0.00625515, throughput 3.95373K wps
[Epoch 38 Batch 120/162] avg loss 0.00620633, throughput 3.9583K wps
[Epoch 38 Batch 150/162] avg loss 0.00631937, throughput 3.95855K wps
Begin Testing...
[Epoch 38] train avg loss 0.00625353, dev acc 0.8411, dev avg loss 0.345169, throughput 3.97474K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00593514, throughput 4.05166K wps
[Epoch 39 Batch 60/162] avg loss 0.00612452, throughput 3.96034K wps
[Epoch 39 Batch 90/162] avg loss 0.00564712, throughput 3.96974K wps
[Epoch 39 Batch 120/162] avg loss 0.00616506, throughput 3.96543K wps
[Epoch 39 Batch 150/162] avg loss 0.00653509, throughput 3.94612K wps
Begin Testing...
[Epoch 39] train avg loss 0.0060825, dev acc 0.8444, dev avg loss 0.340096, throughput 3.97685K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/162] avg loss 0.00604361, throughput 4.04589K wps
[Epoch 40 Batch 60/162] avg loss 0.00583772, throughput 3.95456K wps
[Epoch 40 Batch 90/162] avg loss 0.00648946, throughput 3.94853K wps
[Epoch 40 Batch 120/162] avg loss 0.00600103, throughput 3.96448K wps
[Epoch 40 Batch 150/162] avg loss 0.00557157, throughput 3.97199K wps
Begin Testing...
[Epoch 40] train avg loss 0.00598513, dev acc 0.8478, dev avg loss 0.335529, throughput 3.97612K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/162] avg loss 0.00589931, throughput 4.03955K wps
[Epoch 41 Batch 60/162] avg loss 0.0060257, throughput 3.95525K wps
[Epoch 41 Batch 90/162] avg loss 0.00606682, throughput 3.95468K wps
[Epoch 41 Batch 120/162] avg loss 0.00597432, throughput 3.95742K wps
[Epoch 41 Batch 150/162] avg loss 0.00563749, throughput 3.94893K wps
Begin Testing...
[Epoch 41] train avg loss 0.00591423, dev acc 0.8467, dev avg loss 0.331854, throughput 3.96997K wps
[Epoch 42 Batch 30/162] avg loss 0.00531664, throughput 4.05669K wps
[Epoch 42 Batch 60/162] avg loss 0.00628344, throughput 3.96203K wps
[Epoch 42 Batch 90/162] avg loss 0.00586368, throughput 3.96984K wps
[Epoch 42 Batch 120/162] avg loss 0.00537888, throughput 3.9675K wps
[Epoch 42 Batch 150/162] avg loss 0.00564224, throughput 3.95572K wps
Begin Testing...
[Epoch 42] train avg loss 0.00570045, dev acc 0.8522, dev avg loss 0.326308, throughput 3.98127K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/162] avg loss 0.00581977, throughput 4.05741K wps
[Epoch 43 Batch 60/162] avg loss 0.00566829, throughput 3.97047K wps
[Epoch 43 Batch 90/162] avg loss 0.00581055, throughput 3.9707K wps
[Epoch 43 Batch 120/162] avg loss 0.00528552, throughput 3.9532K wps
[Epoch 43 Batch 150/162] avg loss 0.00543017, throughput 3.96191K wps
Begin Testing...
[Epoch 43] train avg loss 0.00560621, dev acc 0.8567, dev avg loss 0.323043, throughput 3.97973K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/162] avg loss 0.00541494, throughput 4.06392K wps
[Epoch 44 Batch 60/162] avg loss 0.00521353, throughput 3.9399K wps
[Epoch 44 Batch 90/162] avg loss 0.00579466, throughput 3.95484K wps
[Epoch 44 Batch 120/162] avg loss 0.00559355, throughput 3.95885K wps
[Epoch 44 Batch 150/162] avg loss 0.00524511, throughput 3.96442K wps
Begin Testing...
[Epoch 44] train avg loss 0.00546727, dev acc 0.8567, dev avg loss 0.319104, throughput 3.97493K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/162] avg loss 0.00539623, throughput 4.044K wps
[Epoch 45 Batch 60/162] avg loss 0.00502918, throughput 3.95943K wps
[Epoch 45 Batch 90/162] avg loss 0.00535971, throughput 3.94811K wps
[Epoch 45 Batch 120/162] avg loss 0.0056316, throughput 3.97213K wps
[Epoch 45 Batch 150/162] avg loss 0.00537354, throughput 3.96538K wps
Begin Testing...
[Epoch 45] train avg loss 0.00533965, dev acc 0.8633, dev avg loss 0.315156, throughput 3.97556K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00551493, throughput 4.02232K wps
[Epoch 46 Batch 60/162] avg loss 0.00552, throughput 3.9332K wps
[Epoch 46 Batch 90/162] avg loss 0.00479708, throughput 3.93961K wps
[Epoch 46 Batch 120/162] avg loss 0.00523372, throughput 3.93546K wps
[Epoch 46 Batch 150/162] avg loss 0.00500175, throughput 3.92902K wps
Begin Testing...
[Epoch 46] train avg loss 0.00521871, dev acc 0.8656, dev avg loss 0.311103, throughput 3.95029K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/162] avg loss 0.00483453, throughput 4.04021K wps
[Epoch 47 Batch 60/162] avg loss 0.00538057, throughput 3.96452K wps
[Epoch 47 Batch 90/162] avg loss 0.00500341, throughput 3.95856K wps
[Epoch 47 Batch 120/162] avg loss 0.00511282, throughput 3.96175K wps
[Epoch 47 Batch 150/162] avg loss 0.00503279, throughput 3.96187K wps
Begin Testing...
[Epoch 47] train avg loss 0.00507153, dev acc 0.8700, dev avg loss 0.306954, throughput 3.97633K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00505175, throughput 4.05379K wps
[Epoch 48 Batch 60/162] avg loss 0.00511451, throughput 3.97151K wps
[Epoch 48 Batch 90/162] avg loss 0.00508248, throughput 3.96024K wps
[Epoch 48 Batch 120/162] avg loss 0.00505141, throughput 3.9606K wps
[Epoch 48 Batch 150/162] avg loss 0.00517025, throughput 3.95546K wps
Begin Testing...
[Epoch 48] train avg loss 0.005028, dev acc 0.8700, dev avg loss 0.303189, throughput 3.97978K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00489374, throughput 4.05051K wps
[Epoch 49 Batch 60/162] avg loss 0.00500571, throughput 3.96601K wps
[Epoch 49 Batch 90/162] avg loss 0.00474993, throughput 3.96746K wps
[Epoch 49 Batch 120/162] avg loss 0.00466721, throughput 3.96804K wps
[Epoch 49 Batch 150/162] avg loss 0.00524323, throughput 3.9792K wps
Begin Testing...
[Epoch 49] train avg loss 0.00488972, dev acc 0.8756, dev avg loss 0.29893, throughput 3.98366K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/162] avg loss 0.00492021, throughput 4.06117K wps
[Epoch 50 Batch 60/162] avg loss 0.00470688, throughput 3.96666K wps
[Epoch 50 Batch 90/162] avg loss 0.00479288, throughput 3.95733K wps
[Epoch 50 Batch 120/162] avg loss 0.004688, throughput 3.95885K wps
[Epoch 50 Batch 150/162] avg loss 0.00479082, throughput 3.95743K wps
Begin Testing...
[Epoch 50] train avg loss 0.00476003, dev acc 0.8767, dev avg loss 0.296075, throughput 3.97805K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/162] avg loss 0.00429948, throughput 4.05193K wps
[Epoch 51 Batch 60/162] avg loss 0.00442784, throughput 3.95006K wps
[Epoch 51 Batch 90/162] avg loss 0.00473109, throughput 3.96424K wps
[Epoch 51 Batch 120/162] avg loss 0.00484363, throughput 3.96831K wps
[Epoch 51 Batch 150/162] avg loss 0.0045258, throughput 3.9534K wps
Begin Testing...
[Epoch 51] train avg loss 0.00455829, dev acc 0.8789, dev avg loss 0.292893, throughput 3.97722K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.004679, throughput 4.06881K wps
[Epoch 52 Batch 60/162] avg loss 0.00461984, throughput 3.97159K wps
[Epoch 52 Batch 90/162] avg loss 0.00474435, throughput 3.96702K wps
[Epoch 52 Batch 120/162] avg loss 0.00430153, throughput 3.96615K wps
[Epoch 52 Batch 150/162] avg loss 0.00440303, throughput 3.97559K wps
Begin Testing...
[Epoch 52] train avg loss 0.00454581, dev acc 0.8800, dev avg loss 0.289549, throughput 3.98749K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00455456, throughput 4.05406K wps
[Epoch 53 Batch 60/162] avg loss 0.00413786, throughput 3.96624K wps
[Epoch 53 Batch 90/162] avg loss 0.00484614, throughput 3.96985K wps
[Epoch 53 Batch 120/162] avg loss 0.00404392, throughput 3.95896K wps
[Epoch 53 Batch 150/162] avg loss 0.00438544, throughput 3.95269K wps
Begin Testing...
[Epoch 53] train avg loss 0.00438505, dev acc 0.8767, dev avg loss 0.286348, throughput 3.97938K wps
[Epoch 54 Batch 30/162] avg loss 0.00401935, throughput 4.06174K wps
[Epoch 54 Batch 60/162] avg loss 0.00426639, throughput 3.97375K wps
[Epoch 54 Batch 90/162] avg loss 0.00425201, throughput 3.97721K wps
[Epoch 54 Batch 120/162] avg loss 0.00440051, throughput 3.94802K wps
[Epoch 54 Batch 150/162] avg loss 0.0043452, throughput 3.97319K wps
Begin Testing...
[Epoch 54] train avg loss 0.00423321, dev acc 0.8844, dev avg loss 0.282579, throughput 3.98459K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/162] avg loss 0.00429018, throughput 4.06185K wps
[Epoch 55 Batch 60/162] avg loss 0.0039918, throughput 3.96458K wps
[Epoch 55 Batch 90/162] avg loss 0.00398275, throughput 3.9557K wps
[Epoch 55 Batch 120/162] avg loss 0.00417512, throughput 3.96925K wps
[Epoch 55 Batch 150/162] avg loss 0.0041492, throughput 3.97086K wps
Begin Testing...
[Epoch 55] train avg loss 0.00417935, dev acc 0.8822, dev avg loss 0.280639, throughput 3.98334K wps
[Epoch 56 Batch 30/162] avg loss 0.00399613, throughput 4.05836K wps
[Epoch 56 Batch 60/162] avg loss 0.00412333, throughput 3.95207K wps
[Epoch 56 Batch 90/162] avg loss 0.00400806, throughput 3.96867K wps
[Epoch 56 Batch 120/162] avg loss 0.00414336, throughput 3.96177K wps
[Epoch 56 Batch 150/162] avg loss 0.00373862, throughput 3.95182K wps
Begin Testing...
[Epoch 56] train avg loss 0.00401851, dev acc 0.8811, dev avg loss 0.27636, throughput 3.97694K wps
[Epoch 57 Batch 30/162] avg loss 0.00382102, throughput 4.07157K wps
[Epoch 57 Batch 60/162] avg loss 0.00381513, throughput 3.96798K wps
[Epoch 57 Batch 90/162] avg loss 0.00406262, throughput 3.96787K wps
[Epoch 57 Batch 120/162] avg loss 0.00412506, throughput 3.96099K wps
[Epoch 57 Batch 150/162] avg loss 0.00376672, throughput 3.96293K wps
Begin Testing...
[Epoch 57] train avg loss 0.00389594, dev acc 0.8833, dev avg loss 0.273328, throughput 3.98231K wps
[Epoch 58 Batch 30/162] avg loss 0.00391451, throughput 4.04464K wps
[Epoch 58 Batch 60/162] avg loss 0.00381057, throughput 3.96762K wps
[Epoch 58 Batch 90/162] avg loss 0.00379468, throughput 3.93618K wps
[Epoch 58 Batch 120/162] avg loss 0.00362694, throughput 3.96168K wps
[Epoch 58 Batch 150/162] avg loss 0.00381733, throughput 3.95428K wps
Begin Testing...
[Epoch 58] train avg loss 0.00380486, dev acc 0.8800, dev avg loss 0.270231, throughput 3.971K wps
[Epoch 59 Batch 30/162] avg loss 0.00361579, throughput 4.06706K wps
[Epoch 59 Batch 60/162] avg loss 0.00368767, throughput 3.94689K wps
[Epoch 59 Batch 90/162] avg loss 0.00383628, throughput 3.96055K wps
[Epoch 59 Batch 120/162] avg loss 0.00368289, throughput 3.9456K wps
[Epoch 59 Batch 150/162] avg loss 0.00356781, throughput 3.96894K wps
Begin Testing...
[Epoch 59] train avg loss 0.00366797, dev acc 0.8811, dev avg loss 0.269986, throughput 3.9774K wps
[Epoch 60 Batch 30/162] avg loss 0.00376551, throughput 4.0634K wps
[Epoch 60 Batch 60/162] avg loss 0.00357728, throughput 3.96624K wps
[Epoch 60 Batch 90/162] avg loss 0.0036628, throughput 3.94954K wps
[Epoch 60 Batch 120/162] avg loss 0.00342631, throughput 3.96091K wps
[Epoch 60 Batch 150/162] avg loss 0.00361498, throughput 3.95555K wps
Begin Testing...
[Epoch 60] train avg loss 0.0036193, dev acc 0.8844, dev avg loss 0.26449, throughput 3.9777K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/162] avg loss 0.00335269, throughput 4.05926K wps
[Epoch 61 Batch 60/162] avg loss 0.00344974, throughput 3.94919K wps
[Epoch 61 Batch 90/162] avg loss 0.00361609, throughput 3.95327K wps
[Epoch 61 Batch 120/162] avg loss 0.0038088, throughput 3.98029K wps
[Epoch 61 Batch 150/162] avg loss 0.00325453, throughput 3.96656K wps
Begin Testing...
[Epoch 61] train avg loss 0.00349162, dev acc 0.8856, dev avg loss 0.263042, throughput 3.97838K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/162] avg loss 0.00349216, throughput 4.03026K wps
[Epoch 62 Batch 60/162] avg loss 0.00327224, throughput 3.94651K wps
[Epoch 62 Batch 90/162] avg loss 0.00360522, throughput 3.96571K wps
[Epoch 62 Batch 120/162] avg loss 0.0034311, throughput 3.96737K wps
[Epoch 62 Batch 150/162] avg loss 0.00364588, throughput 3.95209K wps
Begin Testing...
[Epoch 62] train avg loss 0.00345753, dev acc 0.8900, dev avg loss 0.261503, throughput 3.96985K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/162] avg loss 0.00323236, throughput 4.06986K wps
[Epoch 63 Batch 60/162] avg loss 0.00322278, throughput 3.95334K wps
[Epoch 63 Batch 90/162] avg loss 0.0033689, throughput 3.93917K wps
[Epoch 63 Batch 120/162] avg loss 0.00327136, throughput 3.96195K wps
[Epoch 63 Batch 150/162] avg loss 0.00326057, throughput 3.95885K wps
Begin Testing...
[Epoch 63] train avg loss 0.00326237, dev acc 0.8889, dev avg loss 0.259178, throughput 3.976K wps
[Epoch 64 Batch 30/162] avg loss 0.00322634, throughput 4.07253K wps
[Epoch 64 Batch 60/162] avg loss 0.00326409, throughput 3.96469K wps
[Epoch 64 Batch 90/162] avg loss 0.00353154, throughput 3.96303K wps
[Epoch 64 Batch 120/162] avg loss 0.00317855, throughput 3.95474K wps
[Epoch 64 Batch 150/162] avg loss 0.00317387, throughput 3.96813K wps
Begin Testing...
[Epoch 64] train avg loss 0.00325087, dev acc 0.8856, dev avg loss 0.255836, throughput 3.98123K wps
[Epoch 65 Batch 30/162] avg loss 0.00310449, throughput 4.05658K wps
[Epoch 65 Batch 60/162] avg loss 0.00298674, throughput 3.95429K wps
[Epoch 65 Batch 90/162] avg loss 0.00297285, throughput 3.96268K wps
[Epoch 65 Batch 120/162] avg loss 0.00331723, throughput 3.96005K wps
[Epoch 65 Batch 150/162] avg loss 0.0030716, throughput 3.96304K wps
Begin Testing...
[Epoch 65] train avg loss 0.00308213, dev acc 0.8889, dev avg loss 0.253893, throughput 3.979K wps
[Epoch 66 Batch 30/162] avg loss 0.00290651, throughput 4.06837K wps
[Epoch 66 Batch 60/162] avg loss 0.00252065, throughput 3.96314K wps
[Epoch 66 Batch 90/162] avg loss 0.00353621, throughput 3.97289K wps
[Epoch 66 Batch 120/162] avg loss 0.00305625, throughput 3.95887K wps
[Epoch 66 Batch 150/162] avg loss 0.00303284, throughput 3.96212K wps
Begin Testing...
[Epoch 66] train avg loss 0.00298695, dev acc 0.8922, dev avg loss 0.252236, throughput 3.98188K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/162] avg loss 0.00294257, throughput 4.04961K wps
[Epoch 67 Batch 60/162] avg loss 0.00304788, throughput 3.94885K wps
[Epoch 67 Batch 90/162] avg loss 0.00285441, throughput 3.96127K wps
[Epoch 67 Batch 120/162] avg loss 0.00296895, throughput 3.97067K wps
[Epoch 67 Batch 150/162] avg loss 0.00300585, throughput 3.97208K wps
Begin Testing...
[Epoch 67] train avg loss 0.00297021, dev acc 0.8933, dev avg loss 0.250565, throughput 3.97914K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/162] avg loss 0.00300014, throughput 4.06097K wps
[Epoch 68 Batch 60/162] avg loss 0.00300896, throughput 3.96637K wps
[Epoch 68 Batch 90/162] avg loss 0.00262226, throughput 3.95296K wps
[Epoch 68 Batch 120/162] avg loss 0.00279833, throughput 3.97086K wps
[Epoch 68 Batch 150/162] avg loss 0.00272304, throughput 3.96714K wps
Begin Testing...
[Epoch 68] train avg loss 0.00282178, dev acc 0.8922, dev avg loss 0.251045, throughput 3.98221K wps
[Epoch 69 Batch 30/162] avg loss 0.00277807, throughput 4.06849K wps
[Epoch 69 Batch 60/162] avg loss 0.00285583, throughput 3.97504K wps
[Epoch 69 Batch 90/162] avg loss 0.00284129, throughput 3.96847K wps
[Epoch 69 Batch 120/162] avg loss 0.00287538, throughput 3.9754K wps
[Epoch 69 Batch 150/162] avg loss 0.00270674, throughput 3.9614K wps
Begin Testing...
[Epoch 69] train avg loss 0.00280636, dev acc 0.8933, dev avg loss 0.249045, throughput 3.98733K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/162] avg loss 0.00271525, throughput 4.05091K wps
[Epoch 70 Batch 60/162] avg loss 0.00276461, throughput 3.94097K wps
[Epoch 70 Batch 90/162] avg loss 0.00261404, throughput 3.96054K wps
[Epoch 70 Batch 120/162] avg loss 0.00254371, throughput 3.96197K wps
[Epoch 70 Batch 150/162] avg loss 0.0025887, throughput 3.96481K wps
Begin Testing...
[Epoch 70] train avg loss 0.00264403, dev acc 0.8933, dev avg loss 0.245256, throughput 3.9755K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/162] avg loss 0.00253933, throughput 4.05544K wps
[Epoch 71 Batch 60/162] avg loss 0.00269069, throughput 3.96724K wps
[Epoch 71 Batch 90/162] avg loss 0.00268657, throughput 3.94922K wps
[Epoch 71 Batch 120/162] avg loss 0.00236744, throughput 3.95861K wps
[Epoch 71 Batch 150/162] avg loss 0.00265281, throughput 3.96791K wps
Begin Testing...
[Epoch 71] train avg loss 0.00257228, dev acc 0.8944, dev avg loss 0.244825, throughput 3.97617K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/162] avg loss 0.0024361, throughput 4.05798K wps
[Epoch 72 Batch 60/162] avg loss 0.00248205, throughput 3.96369K wps
[Epoch 72 Batch 90/162] avg loss 0.00268138, throughput 3.95893K wps
[Epoch 72 Batch 120/162] avg loss 0.00234486, throughput 3.95216K wps
[Epoch 72 Batch 150/162] avg loss 0.0025993, throughput 3.96919K wps
Begin Testing...
[Epoch 72] train avg loss 0.00250625, dev acc 0.8956, dev avg loss 0.243742, throughput 3.97913K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/162] avg loss 0.00251631, throughput 4.02628K wps
[Epoch 73 Batch 60/162] avg loss 0.00227127, throughput 3.9664K wps
[Epoch 73 Batch 90/162] avg loss 0.00250701, throughput 3.96804K wps
[Epoch 73 Batch 120/162] avg loss 0.00265593, throughput 3.94513K wps
[Epoch 73 Batch 150/162] avg loss 0.00238291, throughput 3.96952K wps
Begin Testing...
[Epoch 73] train avg loss 0.00245592, dev acc 0.8989, dev avg loss 0.244805, throughput 3.97477K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/162] avg loss 0.0023455, throughput 4.06204K wps
[Epoch 74 Batch 60/162] avg loss 0.00244211, throughput 3.9569K wps
[Epoch 74 Batch 90/162] avg loss 0.00229628, throughput 3.97561K wps
[Epoch 74 Batch 120/162] avg loss 0.00229774, throughput 3.95154K wps
[Epoch 74 Batch 150/162] avg loss 0.00256526, throughput 3.95575K wps
Begin Testing...
[Epoch 74] train avg loss 0.00239434, dev acc 0.8967, dev avg loss 0.241755, throughput 3.9781K wps
[Epoch 75 Batch 30/162] avg loss 0.00242506, throughput 4.06289K wps
[Epoch 75 Batch 60/162] avg loss 0.00222214, throughput 3.9646K wps
[Epoch 75 Batch 90/162] avg loss 0.00228528, throughput 3.95875K wps
[Epoch 75 Batch 120/162] avg loss 0.00230387, throughput 3.96636K wps
[Epoch 75 Batch 150/162] avg loss 0.0021914, throughput 3.95509K wps
Begin Testing...
[Epoch 75] train avg loss 0.00230354, dev acc 0.8956, dev avg loss 0.240732, throughput 3.98021K wps
[Epoch 76 Batch 30/162] avg loss 0.00244977, throughput 4.04436K wps
[Epoch 76 Batch 60/162] avg loss 0.00222534, throughput 3.95579K wps
[Epoch 76 Batch 90/162] avg loss 0.0021683, throughput 3.96162K wps
[Epoch 76 Batch 120/162] avg loss 0.00208596, throughput 3.97113K wps
[Epoch 76 Batch 150/162] avg loss 0.00233554, throughput 3.97022K wps
Begin Testing...
[Epoch 76] train avg loss 0.00226071, dev acc 0.9000, dev avg loss 0.240361, throughput 3.97872K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/162] avg loss 0.00235833, throughput 4.04451K wps
[Epoch 77 Batch 60/162] avg loss 0.00215022, throughput 3.95864K wps
[Epoch 77 Batch 90/162] avg loss 0.00223557, throughput 3.96468K wps
[Epoch 77 Batch 120/162] avg loss 0.00201282, throughput 3.96672K wps
[Epoch 77 Batch 150/162] avg loss 0.00200236, throughput 3.9746K wps
Begin Testing...
[Epoch 77] train avg loss 0.00217124, dev acc 0.8989, dev avg loss 0.240651, throughput 3.97869K wps
[Epoch 78 Batch 30/162] avg loss 0.00210733, throughput 4.03738K wps
[Epoch 78 Batch 60/162] avg loss 0.00203033, throughput 3.97368K wps
[Epoch 78 Batch 90/162] avg loss 0.00212875, throughput 3.96772K wps
[Epoch 78 Batch 120/162] avg loss 0.00195987, throughput 3.94755K wps
[Epoch 78 Batch 150/162] avg loss 0.00194777, throughput 3.96723K wps
Begin Testing...
[Epoch 78] train avg loss 0.00203619, dev acc 0.9011, dev avg loss 0.24139, throughput 3.97725K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/162] avg loss 0.00215608, throughput 4.06517K wps
[Epoch 79 Batch 60/162] avg loss 0.00200392, throughput 3.96051K wps
[Epoch 79 Batch 90/162] avg loss 0.00194284, throughput 3.96099K wps
[Epoch 79 Batch 120/162] avg loss 0.00192171, throughput 3.96086K wps
[Epoch 79 Batch 150/162] avg loss 0.00187912, throughput 3.97876K wps
Begin Testing...
[Epoch 79] train avg loss 0.00196884, dev acc 0.9033, dev avg loss 0.241328, throughput 3.98163K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/162] avg loss 0.00209023, throughput 4.05796K wps
[Epoch 80 Batch 60/162] avg loss 0.00172572, throughput 3.96314K wps
[Epoch 80 Batch 90/162] avg loss 0.00205868, throughput 3.96428K wps
[Epoch 80 Batch 120/162] avg loss 0.00211927, throughput 3.96822K wps
[Epoch 80 Batch 150/162] avg loss 0.00173837, throughput 3.96674K wps
Begin Testing...
[Epoch 80] train avg loss 0.00194921, dev acc 0.8989, dev avg loss 0.239486, throughput 3.98315K wps
[Epoch 81 Batch 30/162] avg loss 0.00211888, throughput 4.07019K wps
[Epoch 81 Batch 60/162] avg loss 0.00177163, throughput 3.95227K wps
[Epoch 81 Batch 90/162] avg loss 0.0020371, throughput 3.96265K wps
[Epoch 81 Batch 120/162] avg loss 0.00198307, throughput 3.96925K wps
[Epoch 81 Batch 150/162] avg loss 0.00175582, throughput 3.96741K wps
Begin Testing...
[Epoch 81] train avg loss 0.00195633, dev acc 0.9011, dev avg loss 0.241331, throughput 3.98192K wps
[Epoch 82 Batch 30/162] avg loss 0.00186989, throughput 4.05504K wps
[Epoch 82 Batch 60/162] avg loss 0.00190732, throughput 3.96594K wps
[Epoch 82 Batch 90/162] avg loss 0.00204596, throughput 3.96546K wps
[Epoch 82 Batch 120/162] avg loss 0.00174881, throughput 3.95747K wps
[Epoch 82 Batch 150/162] avg loss 0.00183708, throughput 3.9602K wps
Begin Testing...
[Epoch 82] train avg loss 0.00187073, dev acc 0.9000, dev avg loss 0.238067, throughput 3.97942K wps
[Epoch 83 Batch 30/162] avg loss 0.00171226, throughput 4.04326K wps
[Epoch 83 Batch 60/162] avg loss 0.00164452, throughput 3.9683K wps
[Epoch 83 Batch 90/162] avg loss 0.00199694, throughput 3.96563K wps
[Epoch 83 Batch 120/162] avg loss 0.00201016, throughput 3.95327K wps
[Epoch 83 Batch 150/162] avg loss 0.00182503, throughput 3.93469K wps
Begin Testing...
[Epoch 83] train avg loss 0.00183701, dev acc 0.8989, dev avg loss 0.239399, throughput 3.97106K wps
[Epoch 84 Batch 30/162] avg loss 0.00166996, throughput 4.05683K wps
[Epoch 84 Batch 60/162] avg loss 0.00174818, throughput 3.96695K wps
[Epoch 84 Batch 90/162] avg loss 0.00165225, throughput 3.95285K wps
[Epoch 84 Batch 120/162] avg loss 0.00172488, throughput 3.95525K wps
[Epoch 84 Batch 150/162] avg loss 0.00165093, throughput 3.94382K wps
Begin Testing...
[Epoch 84] train avg loss 0.00170078, dev acc 0.9000, dev avg loss 0.239728, throughput 3.97462K wps
[Epoch 85 Batch 30/162] avg loss 0.00168601, throughput 4.05349K wps
[Epoch 85 Batch 60/162] avg loss 0.00159504, throughput 3.94293K wps
[Epoch 85 Batch 90/162] avg loss 0.00170272, throughput 3.95102K wps
[Epoch 85 Batch 120/162] avg loss 0.0016044, throughput 3.96461K wps
[Epoch 85 Batch 150/162] avg loss 0.00182079, throughput 3.95376K wps
Begin Testing...
[Epoch 85] train avg loss 0.0016724, dev acc 0.8989, dev avg loss 0.239934, throughput 3.9702K wps
[Epoch 86 Batch 30/162] avg loss 0.001518, throughput 4.06084K wps
[Epoch 86 Batch 60/162] avg loss 0.00178481, throughput 3.95997K wps
[Epoch 86 Batch 90/162] avg loss 0.00164182, throughput 3.95049K wps
[Epoch 86 Batch 120/162] avg loss 0.00152076, throughput 3.9751K wps
[Epoch 86 Batch 150/162] avg loss 0.00156044, throughput 3.95552K wps
Begin Testing...
[Epoch 86] train avg loss 0.00161438, dev acc 0.8978, dev avg loss 0.239597, throughput 3.97712K wps
[Epoch 87 Batch 30/162] avg loss 0.00145027, throughput 4.065K wps
[Epoch 87 Batch 60/162] avg loss 0.00175027, throughput 3.9439K wps
[Epoch 87 Batch 90/162] avg loss 0.00156064, throughput 3.9448K wps
[Epoch 87 Batch 120/162] avg loss 0.00160775, throughput 3.9635K wps
[Epoch 87 Batch 150/162] avg loss 0.00155216, throughput 3.96536K wps
Begin Testing...
[Epoch 87] train avg loss 0.00158392, dev acc 0.8978, dev avg loss 0.240893, throughput 3.97523K wps
[Epoch 88 Batch 30/162] avg loss 0.0016033, throughput 4.05738K wps
[Epoch 88 Batch 60/162] avg loss 0.00167878, throughput 3.96346K wps
[Epoch 88 Batch 90/162] avg loss 0.0014008, throughput 3.95642K wps
[Epoch 88 Batch 120/162] avg loss 0.00160333, throughput 3.9673K wps
[Epoch 88 Batch 150/162] avg loss 0.00161325, throughput 3.96861K wps
Begin Testing...
[Epoch 88] train avg loss 0.00157028, dev acc 0.8989, dev avg loss 0.238147, throughput 3.97738K wps
[Epoch 89 Batch 30/162] avg loss 0.00169086, throughput 4.05123K wps
[Epoch 89 Batch 60/162] avg loss 0.00133566, throughput 3.94428K wps
[Epoch 89 Batch 90/162] avg loss 0.00153993, throughput 3.96258K wps
[Epoch 89 Batch 120/162] avg loss 0.00154911, throughput 3.96983K wps
[Epoch 89 Batch 150/162] avg loss 0.00158629, throughput 3.96127K wps
Begin Testing...
[Epoch 89] train avg loss 0.00154873, dev acc 0.8989, dev avg loss 0.239097, throughput 3.97698K wps
[Epoch 90 Batch 30/162] avg loss 0.00146761, throughput 4.05827K wps
[Epoch 90 Batch 60/162] avg loss 0.00148392, throughput 3.94654K wps
[Epoch 90 Batch 90/162] avg loss 0.00152199, throughput 3.95957K wps
[Epoch 90 Batch 120/162] avg loss 0.00135562, throughput 3.95849K wps
[Epoch 90 Batch 150/162] avg loss 0.00133083, throughput 3.958K wps
Begin Testing...
[Epoch 90] train avg loss 0.0014426, dev acc 0.8978, dev avg loss 0.238468, throughput 3.97525K wps
[Epoch 91 Batch 30/162] avg loss 0.00133155, throughput 4.05727K wps
[Epoch 91 Batch 60/162] avg loss 0.00132776, throughput 3.9744K wps
[Epoch 91 Batch 90/162] avg loss 0.00140035, throughput 3.95938K wps
[Epoch 91 Batch 120/162] avg loss 0.00152908, throughput 3.96588K wps
[Epoch 91 Batch 150/162] avg loss 0.00148522, throughput 3.97505K wps
Begin Testing...
[Epoch 91] train avg loss 0.00140125, dev acc 0.8978, dev avg loss 0.239955, throughput 3.98327K wps
[Epoch 92 Batch 30/162] avg loss 0.00154604, throughput 4.04377K wps
[Epoch 92 Batch 60/162] avg loss 0.00114665, throughput 3.95651K wps
[Epoch 92 Batch 90/162] avg loss 0.00136424, throughput 3.96586K wps
[Epoch 92 Batch 120/162] avg loss 0.00129132, throughput 3.95641K wps
[Epoch 92 Batch 150/162] avg loss 0.00142167, throughput 3.96718K wps
Begin Testing...
[Epoch 92] train avg loss 0.00136577, dev acc 0.8956, dev avg loss 0.239937, throughput 3.97649K wps
[Epoch 93 Batch 30/162] avg loss 0.00139935, throughput 4.04246K wps
[Epoch 93 Batch 60/162] avg loss 0.0013111, throughput 3.95187K wps
[Epoch 93 Batch 90/162] avg loss 0.00131296, throughput 3.95584K wps
[Epoch 93 Batch 120/162] avg loss 0.00116895, throughput 3.96084K wps
[Epoch 93 Batch 150/162] avg loss 0.00129297, throughput 3.95217K wps
Begin Testing...
[Epoch 93] train avg loss 0.00131044, dev acc 0.8956, dev avg loss 0.241189, throughput 3.97057K wps
[Epoch 94 Batch 30/162] avg loss 0.00125496, throughput 4.06625K wps
[Epoch 94 Batch 60/162] avg loss 0.00119867, throughput 3.96925K wps
[Epoch 94 Batch 90/162] avg loss 0.00133707, throughput 3.95542K wps
[Epoch 94 Batch 120/162] avg loss 0.00111569, throughput 3.96436K wps
[Epoch 94 Batch 150/162] avg loss 0.00130494, throughput 3.97068K wps
Begin Testing...
[Epoch 94] train avg loss 0.0012645, dev acc 0.8967, dev avg loss 0.241228, throughput 3.98333K wps
[Epoch 95 Batch 30/162] avg loss 0.0011475, throughput 4.07267K wps
[Epoch 95 Batch 60/162] avg loss 0.00126641, throughput 3.95725K wps
[Epoch 95 Batch 90/162] avg loss 0.00119294, throughput 3.95691K wps
[Epoch 95 Batch 120/162] avg loss 0.00142312, throughput 3.97245K wps
[Epoch 95 Batch 150/162] avg loss 0.00118461, throughput 3.9715K wps
Begin Testing...
[Epoch 95] train avg loss 0.00124105, dev acc 0.8967, dev avg loss 0.24198, throughput 3.9838K wps
[Epoch 96 Batch 30/162] avg loss 0.00131186, throughput 4.06996K wps
[Epoch 96 Batch 60/162] avg loss 0.00103235, throughput 3.96605K wps
[Epoch 96 Batch 90/162] avg loss 0.00123159, throughput 3.96681K wps
[Epoch 96 Batch 120/162] avg loss 0.00122476, throughput 3.96247K wps
[Epoch 96 Batch 150/162] avg loss 0.00114792, throughput 3.97953K wps
Begin Testing...
[Epoch 96] train avg loss 0.0012123, dev acc 0.8956, dev avg loss 0.241236, throughput 3.98778K wps
[Epoch 97 Batch 30/162] avg loss 0.00115085, throughput 4.06758K wps
[Epoch 97 Batch 60/162] avg loss 0.00111026, throughput 3.96895K wps
[Epoch 97 Batch 90/162] avg loss 0.00115325, throughput 3.97393K wps
[Epoch 97 Batch 120/162] avg loss 0.00120569, throughput 3.96583K wps
[Epoch 97 Batch 150/162] avg loss 0.00108215, throughput 3.97509K wps
Begin Testing...
[Epoch 97] train avg loss 0.00115237, dev acc 0.8989, dev avg loss 0.242093, throughput 3.98875K wps
[Epoch 98 Batch 30/162] avg loss 0.0012531, throughput 4.06963K wps
[Epoch 98 Batch 60/162] avg loss 0.00115388, throughput 3.94954K wps
[Epoch 98 Batch 90/162] avg loss 0.00105686, throughput 3.96759K wps
[Epoch 98 Batch 120/162] avg loss 0.00120488, throughput 3.95709K wps
[Epoch 98 Batch 150/162] avg loss 0.00118372, throughput 3.96781K wps
Begin Testing...
[Epoch 98] train avg loss 0.00118039, dev acc 0.9000, dev avg loss 0.242999, throughput 3.98105K wps
[Epoch 99 Batch 30/162] avg loss 0.00109133, throughput 4.04268K wps
[Epoch 99 Batch 60/162] avg loss 0.00101396, throughput 3.94738K wps
[Epoch 99 Batch 90/162] avg loss 0.00111775, throughput 3.95559K wps
[Epoch 99 Batch 120/162] avg loss 0.00105711, throughput 3.97358K wps
[Epoch 99 Batch 150/162] avg loss 0.00116118, throughput 3.95699K wps
Begin Testing...
[Epoch 99] train avg loss 0.00110146, dev acc 0.8967, dev avg loss 0.245265, throughput 3.97342K wps
[Epoch 100 Batch 30/162] avg loss 0.00107504, throughput 4.04922K wps
[Epoch 100 Batch 60/162] avg loss 0.000943282, throughput 3.96059K wps
[Epoch 100 Batch 90/162] avg loss 0.00102858, throughput 3.96441K wps
[Epoch 100 Batch 120/162] avg loss 0.00107484, throughput 3.9469K wps
[Epoch 100 Batch 150/162] avg loss 0.00115652, throughput 3.96045K wps
Begin Testing...
[Epoch 100] train avg loss 0.00106105, dev acc 0.9044, dev avg loss 0.248524, throughput 3.9755K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/162] avg loss 0.000943511, throughput 4.06044K wps
[Epoch 101 Batch 60/162] avg loss 0.00110198, throughput 3.97191K wps
[Epoch 101 Batch 90/162] avg loss 0.000965096, throughput 3.95608K wps
[Epoch 101 Batch 120/162] avg loss 0.00109847, throughput 3.94225K wps
[Epoch 101 Batch 150/162] avg loss 0.00103827, throughput 3.96424K wps
Begin Testing...
[Epoch 101] train avg loss 0.00102412, dev acc 0.8956, dev avg loss 0.245734, throughput 3.9784K wps
[Epoch 102 Batch 30/162] avg loss 0.000889875, throughput 4.06598K wps
[Epoch 102 Batch 60/162] avg loss 0.00110784, throughput 3.96938K wps
[Epoch 102 Batch 90/162] avg loss 0.00111209, throughput 3.96323K wps
[Epoch 102 Batch 120/162] avg loss 0.0010292, throughput 3.97061K wps
[Epoch 102 Batch 150/162] avg loss 0.000914474, throughput 3.95475K wps
Begin Testing...
[Epoch 102] train avg loss 0.00101207, dev acc 0.8956, dev avg loss 0.245136, throughput 3.98268K wps
[Epoch 103 Batch 30/162] avg loss 0.00100605, throughput 4.04121K wps
[Epoch 103 Batch 60/162] avg loss 0.00103279, throughput 3.95573K wps
[Epoch 103 Batch 90/162] avg loss 0.00110248, throughput 3.97128K wps
[Epoch 103 Batch 120/162] avg loss 0.00101588, throughput 3.96509K wps
[Epoch 103 Batch 150/162] avg loss 0.000924667, throughput 3.9593K wps
Begin Testing...
[Epoch 103] train avg loss 0.000993822, dev acc 0.9000, dev avg loss 0.245273, throughput 3.97685K wps
[Epoch 104 Batch 30/162] avg loss 0.000930212, throughput 4.06496K wps
[Epoch 104 Batch 60/162] avg loss 0.000975993, throughput 3.97981K wps
[Epoch 104 Batch 90/162] avg loss 0.00091366, throughput 3.9667K wps
[Epoch 104 Batch 120/162] avg loss 0.00105616, throughput 3.9599K wps
[Epoch 104 Batch 150/162] avg loss 0.00102331, throughput 3.97843K wps
Begin Testing...
[Epoch 104] train avg loss 0.000976941, dev acc 0.8956, dev avg loss 0.244884, throughput 3.98699K wps
[Epoch 105 Batch 30/162] avg loss 0.000951423, throughput 4.05438K wps
[Epoch 105 Batch 60/162] avg loss 0.000899866, throughput 3.9578K wps
[Epoch 105 Batch 90/162] avg loss 0.000890108, throughput 3.94492K wps
[Epoch 105 Batch 120/162] avg loss 0.000976226, throughput 3.96K wps
[Epoch 105 Batch 150/162] avg loss 0.00104204, throughput 3.97359K wps
Begin Testing...
[Epoch 105] train avg loss 0.000954696, dev acc 0.8967, dev avg loss 0.245687, throughput 3.97733K wps
[Epoch 106 Batch 30/162] avg loss 0.000926991, throughput 4.05952K wps
[Epoch 106 Batch 60/162] avg loss 0.000902835, throughput 3.95706K wps
[Epoch 106 Batch 90/162] avg loss 0.000939963, throughput 3.962K wps
[Epoch 106 Batch 120/162] avg loss 0.00086456, throughput 3.94928K wps
[Epoch 106 Batch 150/162] avg loss 0.000894882, throughput 3.96499K wps
Begin Testing...
[Epoch 106] train avg loss 0.00090057, dev acc 0.8956, dev avg loss 0.245998, throughput 3.9768K wps
[Epoch 107 Batch 30/162] avg loss 0.000803415, throughput 4.05935K wps
[Epoch 107 Batch 60/162] avg loss 0.00083208, throughput 3.94786K wps
[Epoch 107 Batch 90/162] avg loss 0.000888453, throughput 3.95765K wps
[Epoch 107 Batch 120/162] avg loss 0.00093531, throughput 3.97219K wps
[Epoch 107 Batch 150/162] avg loss 0.000762583, throughput 3.97412K wps
Begin Testing...
[Epoch 107] train avg loss 0.000858362, dev acc 0.8978, dev avg loss 0.247313, throughput 3.98106K wps
[Epoch 108 Batch 30/162] avg loss 0.000910319, throughput 4.06619K wps
[Epoch 108 Batch 60/162] avg loss 0.000884165, throughput 3.96707K wps
[Epoch 108 Batch 90/162] avg loss 0.000930021, throughput 3.95829K wps
[Epoch 108 Batch 120/162] avg loss 0.000894289, throughput 3.95765K wps
[Epoch 108 Batch 150/162] avg loss 0.000845178, throughput 3.9483K wps
Begin Testing...
[Epoch 108] train avg loss 0.000890356, dev acc 0.8978, dev avg loss 0.248501, throughput 3.97673K wps
[Epoch 109 Batch 30/162] avg loss 0.000841816, throughput 4.05377K wps
[Epoch 109 Batch 60/162] avg loss 0.000833182, throughput 3.9628K wps
[Epoch 109 Batch 90/162] avg loss 0.000961313, throughput 3.96421K wps
[Epoch 109 Batch 120/162] avg loss 0.000946962, throughput 3.96638K wps
[Epoch 109 Batch 150/162] avg loss 0.000831215, throughput 3.97144K wps
Begin Testing...
[Epoch 109] train avg loss 0.000875513, dev acc 0.8967, dev avg loss 0.247223, throughput 3.98243K wps
[Epoch 110 Batch 30/162] avg loss 0.000831883, throughput 4.03978K wps
[Epoch 110 Batch 60/162] avg loss 0.000783402, throughput 3.95704K wps
[Epoch 110 Batch 90/162] avg loss 0.000857973, throughput 3.96088K wps
[Epoch 110 Batch 120/162] avg loss 0.000776701, throughput 3.95214K wps
[Epoch 110 Batch 150/162] avg loss 0.000831313, throughput 3.95548K wps
Begin Testing...
[Epoch 110] train avg loss 0.00082019, dev acc 0.8978, dev avg loss 0.24951, throughput 3.97277K wps
[Epoch 111 Batch 30/162] avg loss 0.000804183, throughput 4.06091K wps
[Epoch 111 Batch 60/162] avg loss 0.000781075, throughput 3.95206K wps
[Epoch 111 Batch 90/162] avg loss 0.000681437, throughput 3.96827K wps
[Epoch 111 Batch 120/162] avg loss 0.000825613, throughput 3.96838K wps
[Epoch 111 Batch 150/162] avg loss 0.000979108, throughput 3.96021K wps
Begin Testing...
[Epoch 111] train avg loss 0.000820107, dev acc 0.8967, dev avg loss 0.252263, throughput 3.98097K wps
[Epoch 112 Batch 30/162] avg loss 0.000719986, throughput 4.07855K wps
[Epoch 112 Batch 60/162] avg loss 0.000835301, throughput 3.96985K wps
[Epoch 112 Batch 90/162] avg loss 0.000809133, throughput 3.96437K wps
[Epoch 112 Batch 120/162] avg loss 0.000808466, throughput 3.9747K wps
[Epoch 112 Batch 150/162] avg loss 0.000692065, throughput 3.96838K wps
Begin Testing...
[Epoch 112] train avg loss 0.000769508, dev acc 0.8989, dev avg loss 0.25049, throughput 3.98896K wps
[Epoch 113 Batch 30/162] avg loss 0.000852566, throughput 4.05946K wps
[Epoch 113 Batch 60/162] avg loss 0.000713773, throughput 3.97179K wps
[Epoch 113 Batch 90/162] avg loss 0.000707412, throughput 3.96998K wps
[Epoch 113 Batch 120/162] avg loss 0.000673329, throughput 3.96076K wps
[Epoch 113 Batch 150/162] avg loss 0.000812328, throughput 3.97079K wps
Begin Testing...
[Epoch 113] train avg loss 0.000749065, dev acc 0.8978, dev avg loss 0.252632, throughput 3.98572K wps
[Epoch 114 Batch 30/162] avg loss 0.000668251, throughput 4.03569K wps
[Epoch 114 Batch 60/162] avg loss 0.000719066, throughput 3.95973K wps
[Epoch 114 Batch 90/162] avg loss 0.000770817, throughput 3.9621K wps
[Epoch 114 Batch 120/162] avg loss 0.000750088, throughput 3.95508K wps
[Epoch 114 Batch 150/162] avg loss 0.000923265, throughput 3.9646K wps
Begin Testing...
[Epoch 114] train avg loss 0.000756746, dev acc 0.8956, dev avg loss 0.252296, throughput 3.97528K wps
[Epoch 115 Batch 30/162] avg loss 0.000713875, throughput 4.05896K wps
[Epoch 115 Batch 60/162] avg loss 0.00068118, throughput 3.9522K wps
[Epoch 115 Batch 90/162] avg loss 0.00086092, throughput 3.96702K wps
[Epoch 115 Batch 120/162] avg loss 0.000725498, throughput 3.96376K wps
[Epoch 115 Batch 150/162] avg loss 0.000710573, throughput 3.97055K wps
Begin Testing...
[Epoch 115] train avg loss 0.000731012, dev acc 0.8989, dev avg loss 0.252655, throughput 3.98115K wps
[Epoch 116 Batch 30/162] avg loss 0.000717067, throughput 4.06031K wps
[Epoch 116 Batch 60/162] avg loss 0.000635242, throughput 3.9488K wps
[Epoch 116 Batch 90/162] avg loss 0.00073288, throughput 3.97832K wps
[Epoch 116 Batch 120/162] avg loss 0.000687232, throughput 3.9694K wps
[Epoch 116 Batch 150/162] avg loss 0.00065538, throughput 3.96649K wps
Begin Testing...
[Epoch 116] train avg loss 0.000684248, dev acc 0.8956, dev avg loss 0.254198, throughput 3.98075K wps
[Epoch 117 Batch 30/162] avg loss 0.00063391, throughput 4.06971K wps
[Epoch 117 Batch 60/162] avg loss 0.000682647, throughput 3.95732K wps
[Epoch 117 Batch 90/162] avg loss 0.000688209, throughput 3.95666K wps
[Epoch 117 Batch 120/162] avg loss 0.000655646, throughput 3.96604K wps
[Epoch 117 Batch 150/162] avg loss 0.000718038, throughput 3.97919K wps
Begin Testing...
[Epoch 117] train avg loss 0.000671532, dev acc 0.8956, dev avg loss 0.252121, throughput 3.98467K wps
[Epoch 118 Batch 30/162] avg loss 0.000665647, throughput 4.06192K wps
[Epoch 118 Batch 60/162] avg loss 0.000676545, throughput 3.97289K wps
[Epoch 118 Batch 90/162] avg loss 0.000633692, throughput 3.95873K wps
[Epoch 118 Batch 120/162] avg loss 0.000620294, throughput 3.96207K wps
[Epoch 118 Batch 150/162] avg loss 0.00065941, throughput 3.97135K wps
Begin Testing...
[Epoch 118] train avg loss 0.00065222, dev acc 0.8978, dev avg loss 0.254072, throughput 3.98197K wps
[Epoch 119 Batch 30/162] avg loss 0.000633247, throughput 4.06641K wps
[Epoch 119 Batch 60/162] avg loss 0.000685571, throughput 3.9602K wps
[Epoch 119 Batch 90/162] avg loss 0.000681268, throughput 3.97445K wps
[Epoch 119 Batch 120/162] avg loss 0.000648439, throughput 3.95831K wps
[Epoch 119 Batch 150/162] avg loss 0.000680758, throughput 3.96009K wps
Begin Testing...
[Epoch 119] train avg loss 0.000662531, dev acc 0.8967, dev avg loss 0.255835, throughput 3.98235K wps
[Epoch 120 Batch 30/162] avg loss 0.000632685, throughput 4.06655K wps
[Epoch 120 Batch 60/162] avg loss 0.000728685, throughput 3.94839K wps
[Epoch 120 Batch 90/162] avg loss 0.000617156, throughput 3.97718K wps
[Epoch 120 Batch 120/162] avg loss 0.000654249, throughput 3.97544K wps
[Epoch 120 Batch 150/162] avg loss 0.000629029, throughput 3.96376K wps
Begin Testing...
[Epoch 120] train avg loss 0.000647743, dev acc 0.8956, dev avg loss 0.254573, throughput 3.98414K wps
[Epoch 121 Batch 30/162] avg loss 0.000663116, throughput 4.0558K wps
[Epoch 121 Batch 60/162] avg loss 0.000528905, throughput 3.95766K wps
[Epoch 121 Batch 90/162] avg loss 0.000641359, throughput 3.96156K wps
[Epoch 121 Batch 120/162] avg loss 0.000646815, throughput 3.97405K wps
[Epoch 121 Batch 150/162] avg loss 0.000584232, throughput 3.96105K wps
Begin Testing...
[Epoch 121] train avg loss 0.000617876, dev acc 0.8978, dev avg loss 0.257717, throughput 3.98013K wps
[Epoch 122 Batch 30/162] avg loss 0.000571483, throughput 4.06767K wps
[Epoch 122 Batch 60/162] avg loss 0.000675399, throughput 3.96562K wps
[Epoch 122 Batch 90/162] avg loss 0.000663318, throughput 3.96728K wps
[Epoch 122 Batch 120/162] avg loss 0.000566027, throughput 3.95438K wps
[Epoch 122 Batch 150/162] avg loss 0.000712876, throughput 3.96364K wps
Begin Testing...
[Epoch 122] train avg loss 0.00062761, dev acc 0.8978, dev avg loss 0.25608, throughput 3.9818K wps
[Epoch 123 Batch 30/162] avg loss 0.000604489, throughput 4.0526K wps
[Epoch 123 Batch 60/162] avg loss 0.000646186, throughput 3.96717K wps
[Epoch 123 Batch 90/162] avg loss 0.000569686, throughput 3.96154K wps
[Epoch 123 Batch 120/162] avg loss 0.000567558, throughput 3.96125K wps
[Epoch 123 Batch 150/162] avg loss 0.000552118, throughput 3.9798K wps
Begin Testing...
[Epoch 123] train avg loss 0.000584997, dev acc 0.8956, dev avg loss 0.256448, throughput 3.98249K wps
[Epoch 124 Batch 30/162] avg loss 0.000627028, throughput 4.05554K wps
[Epoch 124 Batch 60/162] avg loss 0.000641558, throughput 3.96011K wps
[Epoch 124 Batch 90/162] avg loss 0.0005078, throughput 3.97203K wps
[Epoch 124 Batch 120/162] avg loss 0.000537339, throughput 3.95715K wps
[Epoch 124 Batch 150/162] avg loss 0.000619223, throughput 3.94861K wps
Begin Testing...
[Epoch 124] train avg loss 0.000581833, dev acc 0.8978, dev avg loss 0.259145, throughput 3.97727K wps
[Epoch 125 Batch 30/162] avg loss 0.000547609, throughput 4.04722K wps
[Epoch 125 Batch 60/162] avg loss 0.000548101, throughput 3.94631K wps
[Epoch 125 Batch 90/162] avg loss 0.000594597, throughput 3.96028K wps
[Epoch 125 Batch 120/162] avg loss 0.000630136, throughput 3.97206K wps
[Epoch 125 Batch 150/162] avg loss 0.00053708, throughput 3.97082K wps
Begin Testing...
[Epoch 125] train avg loss 0.000563737, dev acc 0.8956, dev avg loss 0.258453, throughput 3.97778K wps
[Epoch 126 Batch 30/162] avg loss 0.000594024, throughput 4.05928K wps
[Epoch 126 Batch 60/162] avg loss 0.000506824, throughput 3.96236K wps
[Epoch 126 Batch 90/162] avg loss 0.000578271, throughput 3.96961K wps
[Epoch 126 Batch 120/162] avg loss 0.000597347, throughput 3.97177K wps
[Epoch 126 Batch 150/162] avg loss 0.000568022, throughput 3.95545K wps
Begin Testing...
[Epoch 126] train avg loss 0.000571447, dev acc 0.8967, dev avg loss 0.258832, throughput 3.98166K wps
[Epoch 127 Batch 30/162] avg loss 0.00051339, throughput 4.0587K wps
[Epoch 127 Batch 60/162] avg loss 0.000556379, throughput 3.95654K wps
[Epoch 127 Batch 90/162] avg loss 0.000598162, throughput 3.96448K wps
[Epoch 127 Batch 120/162] avg loss 0.000553316, throughput 3.94299K wps
[Epoch 127 Batch 150/162] avg loss 0.000621371, throughput 3.95422K wps
Begin Testing...
[Epoch 127] train avg loss 0.00056144, dev acc 0.8944, dev avg loss 0.258212, throughput 3.97303K wps
[Epoch 128 Batch 30/162] avg loss 0.000641037, throughput 4.06058K wps
[Epoch 128 Batch 60/162] avg loss 0.00045457, throughput 3.95345K wps
[Epoch 128 Batch 90/162] avg loss 0.000425318, throughput 3.9598K wps
[Epoch 128 Batch 120/162] avg loss 0.000594218, throughput 3.97241K wps
[Epoch 128 Batch 150/162] avg loss 0.000560563, throughput 3.95537K wps
Begin Testing...
[Epoch 128] train avg loss 0.00053689, dev acc 0.8978, dev avg loss 0.260877, throughput 3.97954K wps
[Epoch 129 Batch 30/162] avg loss 0.000512431, throughput 4.06692K wps
[Epoch 129 Batch 60/162] avg loss 0.000525189, throughput 3.9561K wps
[Epoch 129 Batch 90/162] avg loss 0.000523227, throughput 3.97054K wps
[Epoch 129 Batch 120/162] avg loss 0.000531152, throughput 3.96605K wps
[Epoch 129 Batch 150/162] avg loss 0.000548773, throughput 3.97257K wps
Begin Testing...
[Epoch 129] train avg loss 0.000526755, dev acc 0.9000, dev avg loss 0.262216, throughput 3.98561K wps
[Epoch 130 Batch 30/162] avg loss 0.000512992, throughput 4.06416K wps
[Epoch 130 Batch 60/162] avg loss 0.000495642, throughput 3.95732K wps
[Epoch 130 Batch 90/162] avg loss 0.000519911, throughput 3.95495K wps
[Epoch 130 Batch 120/162] avg loss 0.000484654, throughput 3.95917K wps
[Epoch 130 Batch 150/162] avg loss 0.000492878, throughput 3.97201K wps
Begin Testing...
[Epoch 130] train avg loss 0.000503757, dev acc 0.8956, dev avg loss 0.262126, throughput 3.97886K wps
[Epoch 131 Batch 30/162] avg loss 0.000473289, throughput 4.0583K wps
[Epoch 131 Batch 60/162] avg loss 0.000502768, throughput 3.97313K wps
[Epoch 131 Batch 90/162] avg loss 0.000523169, throughput 3.97542K wps
[Epoch 131 Batch 120/162] avg loss 0.000491004, throughput 3.96286K wps
[Epoch 131 Batch 150/162] avg loss 0.000460247, throughput 3.96907K wps
Begin Testing...
[Epoch 131] train avg loss 0.000489153, dev acc 0.8989, dev avg loss 0.26284, throughput 3.98633K wps
[Epoch 132 Batch 30/162] avg loss 0.000513328, throughput 4.05925K wps
[Epoch 132 Batch 60/162] avg loss 0.000419858, throughput 3.96462K wps
[Epoch 132 Batch 90/162] avg loss 0.000497213, throughput 3.9623K wps
[Epoch 132 Batch 120/162] avg loss 0.000485253, throughput 3.95862K wps
[Epoch 132 Batch 150/162] avg loss 0.000513628, throughput 3.96966K wps
Begin Testing...
[Epoch 132] train avg loss 0.000486497, dev acc 0.9022, dev avg loss 0.264335, throughput 3.98267K wps
[Epoch 133 Batch 30/162] avg loss 0.000469916, throughput 4.05269K wps
[Epoch 133 Batch 60/162] avg loss 0.000406434, throughput 3.96756K wps
[Epoch 133 Batch 90/162] avg loss 0.000437805, throughput 3.95604K wps
[Epoch 133 Batch 120/162] avg loss 0.000460494, throughput 3.95441K wps
[Epoch 133 Batch 150/162] avg loss 0.000507908, throughput 3.96556K wps
Begin Testing...
[Epoch 133] train avg loss 0.000456683, dev acc 0.8967, dev avg loss 0.263162, throughput 3.97796K wps
[Epoch 134 Batch 30/162] avg loss 0.000504523, throughput 4.05423K wps
[Epoch 134 Batch 60/162] avg loss 0.000450976, throughput 3.96688K wps
[Epoch 134 Batch 90/162] avg loss 0.000454018, throughput 3.9521K wps
[Epoch 134 Batch 120/162] avg loss 0.000461211, throughput 3.96017K wps
[Epoch 134 Batch 150/162] avg loss 0.000409863, throughput 3.94955K wps
Begin Testing...
[Epoch 134] train avg loss 0.00045573, dev acc 0.9000, dev avg loss 0.267173, throughput 3.97641K wps
[Epoch 135 Batch 30/162] avg loss 0.000445409, throughput 4.06152K wps
[Epoch 135 Batch 60/162] avg loss 0.000492525, throughput 3.96234K wps
[Epoch 135 Batch 90/162] avg loss 0.000443544, throughput 3.95903K wps
[Epoch 135 Batch 120/162] avg loss 0.000491173, throughput 3.96762K wps
[Epoch 135 Batch 150/162] avg loss 0.000460817, throughput 3.96665K wps
Begin Testing...
[Epoch 135] train avg loss 0.000464545, dev acc 0.8989, dev avg loss 0.266041, throughput 3.98044K wps
[Epoch 136 Batch 30/162] avg loss 0.000464394, throughput 4.06034K wps
[Epoch 136 Batch 60/162] avg loss 0.000454644, throughput 3.96377K wps
[Epoch 136 Batch 90/162] avg loss 0.00047059, throughput 3.96995K wps
[Epoch 136 Batch 120/162] avg loss 0.000412518, throughput 3.95909K wps
[Epoch 136 Batch 150/162] avg loss 0.000422375, throughput 3.96326K wps
Begin Testing...
[Epoch 136] train avg loss 0.000449433, dev acc 0.9000, dev avg loss 0.265786, throughput 3.9816K wps
[Epoch 137 Batch 30/162] avg loss 0.000414045, throughput 4.05939K wps
[Epoch 137 Batch 60/162] avg loss 0.000419513, throughput 3.96669K wps
[Epoch 137 Batch 90/162] avg loss 0.000453706, throughput 3.97652K wps
[Epoch 137 Batch 120/162] avg loss 0.000369582, throughput 3.96888K wps
[Epoch 137 Batch 150/162] avg loss 0.000485378, throughput 3.96462K wps
Begin Testing...
[Epoch 137] train avg loss 0.000431673, dev acc 0.8933, dev avg loss 0.265819, throughput 3.98396K wps
[Epoch 138 Batch 30/162] avg loss 0.000438793, throughput 4.0683K wps
[Epoch 138 Batch 60/162] avg loss 0.00044864, throughput 3.95917K wps
[Epoch 138 Batch 90/162] avg loss 0.000390744, throughput 3.9788K wps
[Epoch 138 Batch 120/162] avg loss 0.000464867, throughput 3.97064K wps
[Epoch 138 Batch 150/162] avg loss 0.000381109, throughput 3.95713K wps
Begin Testing...
[Epoch 138] train avg loss 0.000423083, dev acc 0.8944, dev avg loss 0.266983, throughput 3.98525K wps
[Epoch 139 Batch 30/162] avg loss 0.00040827, throughput 4.04545K wps
[Epoch 139 Batch 60/162] avg loss 0.000396237, throughput 3.96878K wps
[Epoch 139 Batch 90/162] avg loss 0.000406822, throughput 3.97079K wps
[Epoch 139 Batch 120/162] avg loss 0.000486218, throughput 3.96354K wps
[Epoch 139 Batch 150/162] avg loss 0.000428558, throughput 3.95885K wps
Begin Testing...
[Epoch 139] train avg loss 0.00042223, dev acc 0.8967, dev avg loss 0.267247, throughput 3.98129K wps
[Epoch 140 Batch 30/162] avg loss 0.000399354, throughput 4.06479K wps
[Epoch 140 Batch 60/162] avg loss 0.000406218, throughput 3.97724K wps
[Epoch 140 Batch 90/162] avg loss 0.000405882, throughput 3.96895K wps
[Epoch 140 Batch 120/162] avg loss 0.000419345, throughput 3.95278K wps
[Epoch 140 Batch 150/162] avg loss 0.000401336, throughput 3.94964K wps
Begin Testing...
[Epoch 140] train avg loss 0.00041477, dev acc 0.8956, dev avg loss 0.268792, throughput 3.98175K wps
[Epoch 141 Batch 30/162] avg loss 0.000442725, throughput 4.04225K wps
[Epoch 141 Batch 60/162] avg loss 0.000405275, throughput 3.95299K wps
[Epoch 141 Batch 90/162] avg loss 0.000421854, throughput 3.96674K wps
[Epoch 141 Batch 120/162] avg loss 0.000370185, throughput 3.96248K wps
[Epoch 141 Batch 150/162] avg loss 0.000415256, throughput 3.97614K wps
Begin Testing...
[Epoch 141] train avg loss 0.000404232, dev acc 0.9000, dev avg loss 0.269768, throughput 3.97866K wps
[Epoch 142 Batch 30/162] avg loss 0.000398413, throughput 4.07129K wps
[Epoch 142 Batch 60/162] avg loss 0.000390004, throughput 3.96801K wps
[Epoch 142 Batch 90/162] avg loss 0.000409493, throughput 3.96641K wps
[Epoch 142 Batch 120/162] avg loss 0.000381378, throughput 3.96731K wps
[Epoch 142 Batch 150/162] avg loss 0.000403053, throughput 3.95612K wps
Begin Testing...
[Epoch 142] train avg loss 0.000400348, dev acc 0.8967, dev avg loss 0.2697, throughput 3.98411K wps
[Epoch 143 Batch 30/162] avg loss 0.00038887, throughput 4.05133K wps
[Epoch 143 Batch 60/162] avg loss 0.000364251, throughput 3.95493K wps
[Epoch 143 Batch 90/162] avg loss 0.000391205, throughput 3.95162K wps
[Epoch 143 Batch 120/162] avg loss 0.000475072, throughput 3.95341K wps
[Epoch 143 Batch 150/162] avg loss 0.000402343, throughput 3.95092K wps
Begin Testing...
[Epoch 143] train avg loss 0.000402685, dev acc 0.8967, dev avg loss 0.269463, throughput 3.9689K wps
[Epoch 144 Batch 30/162] avg loss 0.000341929, throughput 4.06584K wps
[Epoch 144 Batch 60/162] avg loss 0.000375543, throughput 3.97043K wps
[Epoch 144 Batch 90/162] avg loss 0.000408633, throughput 3.96254K wps
[Epoch 144 Batch 120/162] avg loss 0.000375055, throughput 3.97242K wps
[Epoch 144 Batch 150/162] avg loss 0.000380236, throughput 3.96886K wps
Begin Testing...
[Epoch 144] train avg loss 0.000376159, dev acc 0.8967, dev avg loss 0.270658, throughput 3.98704K wps
[Epoch 145 Batch 30/162] avg loss 0.000383681, throughput 4.0703K wps
[Epoch 145 Batch 60/162] avg loss 0.000401399, throughput 3.9747K wps
[Epoch 145 Batch 90/162] avg loss 0.000391487, throughput 3.95834K wps
[Epoch 145 Batch 120/162] avg loss 0.000333586, throughput 3.9614K wps
[Epoch 145 Batch 150/162] avg loss 0.000369512, throughput 3.9711K wps
Begin Testing...
[Epoch 145] train avg loss 0.000378976, dev acc 0.8944, dev avg loss 0.272299, throughput 3.98444K wps
[Epoch 146 Batch 30/162] avg loss 0.000404226, throughput 4.06304K wps
[Epoch 146 Batch 60/162] avg loss 0.000400629, throughput 3.97297K wps
[Epoch 146 Batch 90/162] avg loss 0.000331055, throughput 3.97053K wps
[Epoch 146 Batch 120/162] avg loss 0.000337389, throughput 3.96659K wps
[Epoch 146 Batch 150/162] avg loss 0.000327058, throughput 3.96926K wps
Begin Testing...
[Epoch 146] train avg loss 0.000360274, dev acc 0.8978, dev avg loss 0.272607, throughput 3.98532K wps
[Epoch 147 Batch 30/162] avg loss 0.000312457, throughput 4.05325K wps
[Epoch 147 Batch 60/162] avg loss 0.000372308, throughput 3.94937K wps
[Epoch 147 Batch 90/162] avg loss 0.000449159, throughput 3.9605K wps
[Epoch 147 Batch 120/162] avg loss 0.000393867, throughput 3.96876K wps
[Epoch 147 Batch 150/162] avg loss 0.000401046, throughput 3.94959K wps
Begin Testing...
[Epoch 147] train avg loss 0.000386993, dev acc 0.9000, dev avg loss 0.273541, throughput 3.97443K wps
[Epoch 148 Batch 30/162] avg loss 0.000342459, throughput 4.04564K wps
[Epoch 148 Batch 60/162] avg loss 0.000364587, throughput 3.96441K wps
[Epoch 148 Batch 90/162] avg loss 0.000380355, throughput 3.96495K wps
[Epoch 148 Batch 120/162] avg loss 0.000357455, throughput 3.97271K wps
[Epoch 148 Batch 150/162] avg loss 0.000363338, throughput 3.95076K wps
Begin Testing...
[Epoch 148] train avg loss 0.00036355, dev acc 0.8989, dev avg loss 0.273906, throughput 3.97734K wps
[Epoch 149 Batch 30/162] avg loss 0.000399581, throughput 4.02619K wps
[Epoch 149 Batch 60/162] avg loss 0.000287234, throughput 3.95246K wps
[Epoch 149 Batch 90/162] avg loss 0.000346599, throughput 3.9426K wps
[Epoch 149 Batch 120/162] avg loss 0.00040594, throughput 3.95608K wps
[Epoch 149 Batch 150/162] avg loss 0.000363939, throughput 3.96929K wps
Begin Testing...
[Epoch 149] train avg loss 0.000359301, dev acc 0.8989, dev avg loss 0.275419, throughput 3.96954K wps
[Epoch 150 Batch 30/162] avg loss 0.000406814, throughput 4.05889K wps
[Epoch 150 Batch 60/162] avg loss 0.000331873, throughput 3.96289K wps
[Epoch 150 Batch 90/162] avg loss 0.000336332, throughput 3.96579K wps
[Epoch 150 Batch 120/162] avg loss 0.000343555, throughput 3.93493K wps
[Epoch 150 Batch 150/162] avg loss 0.000378225, throughput 3.96652K wps
Begin Testing...
[Epoch 150] train avg loss 0.000359674, dev acc 0.9000, dev avg loss 0.274517, throughput 3.97719K wps
[Epoch 151 Batch 30/162] avg loss 0.000353029, throughput 4.06048K wps
[Epoch 151 Batch 60/162] avg loss 0.000340439, throughput 3.96644K wps
[Epoch 151 Batch 90/162] avg loss 0.000348096, throughput 3.96205K wps
[Epoch 151 Batch 120/162] avg loss 0.000314347, throughput 3.96546K wps
[Epoch 151 Batch 150/162] avg loss 0.00032665, throughput 3.95267K wps
Begin Testing...
[Epoch 151] train avg loss 0.000337244, dev acc 0.8978, dev avg loss 0.274733, throughput 3.97979K wps
[Epoch 152 Batch 30/162] avg loss 0.000375185, throughput 4.06707K wps
[Epoch 152 Batch 60/162] avg loss 0.000302868, throughput 3.96455K wps
[Epoch 152 Batch 90/162] avg loss 0.000339142, throughput 3.95603K wps
[Epoch 152 Batch 120/162] avg loss 0.000340597, throughput 3.9505K wps
[Epoch 152 Batch 150/162] avg loss 0.000331859, throughput 3.97447K wps
Begin Testing...
[Epoch 152] train avg loss 0.000338502, dev acc 0.9011, dev avg loss 0.277272, throughput 3.98057K wps
[Epoch 153 Batch 30/162] avg loss 0.000284087, throughput 4.0602K wps
[Epoch 153 Batch 60/162] avg loss 0.000336014, throughput 3.95898K wps
[Epoch 153 Batch 90/162] avg loss 0.000329774, throughput 3.96864K wps
[Epoch 153 Batch 120/162] avg loss 0.000334695, throughput 3.96648K wps
[Epoch 153 Batch 150/162] avg loss 0.000286432, throughput 3.96818K wps
Begin Testing...
[Epoch 153] train avg loss 0.00032133, dev acc 0.8989, dev avg loss 0.278033, throughput 3.98156K wps
[Epoch 154 Batch 30/162] avg loss 0.000301467, throughput 4.05463K wps
[Epoch 154 Batch 60/162] avg loss 0.000324747, throughput 3.95707K wps
[Epoch 154 Batch 90/162] avg loss 0.00030857, throughput 3.97418K wps
[Epoch 154 Batch 120/162] avg loss 0.000334182, throughput 3.96909K wps
[Epoch 154 Batch 150/162] avg loss 0.00030305, throughput 3.96855K wps
Begin Testing...
[Epoch 154] train avg loss 0.000312899, dev acc 0.8978, dev avg loss 0.278021, throughput 3.98349K wps
[Epoch 155 Batch 30/162] avg loss 0.000287567, throughput 4.05102K wps
[Epoch 155 Batch 60/162] avg loss 0.000325455, throughput 3.95953K wps
[Epoch 155 Batch 90/162] avg loss 0.000286938, throughput 3.98226K wps
[Epoch 155 Batch 120/162] avg loss 0.000319448, throughput 3.97366K wps
[Epoch 155 Batch 150/162] avg loss 0.000327716, throughput 3.97659K wps
Begin Testing...
[Epoch 155] train avg loss 0.000310208, dev acc 0.8978, dev avg loss 0.278686, throughput 3.98595K wps
[Epoch 156 Batch 30/162] avg loss 0.000347534, throughput 4.06505K wps
[Epoch 156 Batch 60/162] avg loss 0.000341393, throughput 3.96602K wps
[Epoch 156 Batch 90/162] avg loss 0.000287925, throughput 3.94482K wps
[Epoch 156 Batch 120/162] avg loss 0.00035033, throughput 3.96923K wps
[Epoch 156 Batch 150/162] avg loss 0.000311205, throughput 3.93954K wps
Begin Testing...
[Epoch 156] train avg loss 0.000328112, dev acc 0.8956, dev avg loss 0.279155, throughput 3.97473K wps
[Epoch 157 Batch 30/162] avg loss 0.000299438, throughput 4.05689K wps
[Epoch 157 Batch 60/162] avg loss 0.000308857, throughput 3.96947K wps
[Epoch 157 Batch 90/162] avg loss 0.00032397, throughput 3.95312K wps
[Epoch 157 Batch 120/162] avg loss 0.000318938, throughput 3.95071K wps
[Epoch 157 Batch 150/162] avg loss 0.000285785, throughput 3.96494K wps
Begin Testing...
[Epoch 157] train avg loss 0.000307591, dev acc 0.8956, dev avg loss 0.279274, throughput 3.97836K wps
[Epoch 158 Batch 30/162] avg loss 0.000294523, throughput 4.05034K wps
[Epoch 158 Batch 60/162] avg loss 0.000317819, throughput 3.94977K wps
[Epoch 158 Batch 90/162] avg loss 0.000321567, throughput 3.97136K wps
[Epoch 158 Batch 120/162] avg loss 0.000297945, throughput 3.97191K wps
[Epoch 158 Batch 150/162] avg loss 0.00028634, throughput 3.95522K wps
Begin Testing...
[Epoch 158] train avg loss 0.000304978, dev acc 0.9011, dev avg loss 0.280575, throughput 3.97663K wps
[Epoch 159 Batch 30/162] avg loss 0.000283568, throughput 4.06775K wps
[Epoch 159 Batch 60/162] avg loss 0.000303383, throughput 3.96074K wps
[Epoch 159 Batch 90/162] avg loss 0.000268271, throughput 3.96305K wps
[Epoch 159 Batch 120/162] avg loss 0.000350629, throughput 3.96444K wps
[Epoch 159 Batch 150/162] avg loss 0.000257978, throughput 3.95089K wps
Begin Testing...
[Epoch 159] train avg loss 0.000295215, dev acc 0.8978, dev avg loss 0.282139, throughput 3.97959K wps
[Epoch 160 Batch 30/162] avg loss 0.000309787, throughput 4.05848K wps
[Epoch 160 Batch 60/162] avg loss 0.0002951, throughput 3.96781K wps
[Epoch 160 Batch 90/162] avg loss 0.000316916, throughput 3.97396K wps
[Epoch 160 Batch 120/162] avg loss 0.000299802, throughput 3.96594K wps
[Epoch 160 Batch 150/162] avg loss 0.000287207, throughput 3.96367K wps
Begin Testing...
[Epoch 160] train avg loss 0.000299241, dev acc 0.8967, dev avg loss 0.280367, throughput 3.98467K wps
[Epoch 161 Batch 30/162] avg loss 0.000352416, throughput 4.04906K wps
[Epoch 161 Batch 60/162] avg loss 0.000264132, throughput 3.96151K wps
[Epoch 161 Batch 90/162] avg loss 0.000285667, throughput 3.96682K wps
[Epoch 161 Batch 120/162] avg loss 0.000294935, throughput 3.96577K wps
[Epoch 161 Batch 150/162] avg loss 0.000251309, throughput 3.96987K wps
Begin Testing...
[Epoch 161] train avg loss 0.000294834, dev acc 0.8967, dev avg loss 0.281612, throughput 3.98081K wps
[Epoch 162 Batch 30/162] avg loss 0.000272575, throughput 4.06492K wps
[Epoch 162 Batch 60/162] avg loss 0.000289583, throughput 3.96618K wps
[Epoch 162 Batch 90/162] avg loss 0.000267492, throughput 3.95772K wps
[Epoch 162 Batch 120/162] avg loss 0.000267037, throughput 3.94489K wps
[Epoch 162 Batch 150/162] avg loss 0.000328753, throughput 3.96257K wps
Begin Testing...
[Epoch 162] train avg loss 0.000285411, dev acc 0.8978, dev avg loss 0.282662, throughput 3.97815K wps
[Epoch 163 Batch 30/162] avg loss 0.000301709, throughput 4.06383K wps
[Epoch 163 Batch 60/162] avg loss 0.000328333, throughput 3.97123K wps
[Epoch 163 Batch 90/162] avg loss 0.000265279, throughput 3.96598K wps
[Epoch 163 Batch 120/162] avg loss 0.000270546, throughput 3.96923K wps
[Epoch 163 Batch 150/162] avg loss 0.000267931, throughput 3.97361K wps
Begin Testing...
[Epoch 163] train avg loss 0.000283902, dev acc 0.8967, dev avg loss 0.283625, throughput 3.98619K wps
[Epoch 164 Batch 30/162] avg loss 0.000235846, throughput 4.05747K wps
[Epoch 164 Batch 60/162] avg loss 0.000286529, throughput 3.97163K wps
[Epoch 164 Batch 90/162] avg loss 0.000267055, throughput 3.96389K wps
[Epoch 164 Batch 120/162] avg loss 0.000244464, throughput 3.96817K wps
[Epoch 164 Batch 150/162] avg loss 0.000275492, throughput 3.96797K wps
Begin Testing...
[Epoch 164] train avg loss 0.000260754, dev acc 0.9000, dev avg loss 0.284462, throughput 3.98214K wps
[Epoch 165 Batch 30/162] avg loss 0.00026955, throughput 4.04821K wps
[Epoch 165 Batch 60/162] avg loss 0.000276315, throughput 3.95849K wps
[Epoch 165 Batch 90/162] avg loss 0.000266492, throughput 3.96381K wps
[Epoch 165 Batch 120/162] avg loss 0.000273013, throughput 3.96965K wps
[Epoch 165 Batch 150/162] avg loss 0.000272899, throughput 3.95611K wps
Begin Testing...
[Epoch 165] train avg loss 0.000275602, dev acc 0.8989, dev avg loss 0.284123, throughput 3.97684K wps
[Epoch 166 Batch 30/162] avg loss 0.000257128, throughput 4.06559K wps
[Epoch 166 Batch 60/162] avg loss 0.000269589, throughput 3.95514K wps
[Epoch 166 Batch 90/162] avg loss 0.000324847, throughput 3.94643K wps
[Epoch 166 Batch 120/162] avg loss 0.000254986, throughput 3.96558K wps
[Epoch 166 Batch 150/162] avg loss 0.000269906, throughput 3.962K wps
Begin Testing...
[Epoch 166] train avg loss 0.000274061, dev acc 0.9000, dev avg loss 0.284983, throughput 3.97799K wps
[Epoch 167 Batch 30/162] avg loss 0.000271932, throughput 4.05738K wps
[Epoch 167 Batch 60/162] avg loss 0.000261148, throughput 3.9645K wps
[Epoch 167 Batch 90/162] avg loss 0.000246419, throughput 3.96472K wps
[Epoch 167 Batch 120/162] avg loss 0.000296387, throughput 3.95191K wps
[Epoch 167 Batch 150/162] avg loss 0.000294746, throughput 3.96143K wps
Begin Testing...
[Epoch 167] train avg loss 0.000269254, dev acc 0.9011, dev avg loss 0.285937, throughput 3.97779K wps
[Epoch 168 Batch 30/162] avg loss 0.000245081, throughput 4.05172K wps
[Epoch 168 Batch 60/162] avg loss 0.000236503, throughput 3.96646K wps
[Epoch 168 Batch 90/162] avg loss 0.000241456, throughput 3.93839K wps
[Epoch 168 Batch 120/162] avg loss 0.000233577, throughput 3.96056K wps
[Epoch 168 Batch 150/162] avg loss 0.000246323, throughput 3.96988K wps
Begin Testing...
[Epoch 168] train avg loss 0.000241903, dev acc 0.8956, dev avg loss 0.284843, throughput 3.97514K wps
[Epoch 169 Batch 30/162] avg loss 0.000235272, throughput 4.0562K wps
[Epoch 169 Batch 60/162] avg loss 0.000231477, throughput 3.97436K wps
[Epoch 169 Batch 90/162] avg loss 0.000264823, throughput 3.96093K wps
[Epoch 169 Batch 120/162] avg loss 0.000295784, throughput 3.95632K wps
[Epoch 169 Batch 150/162] avg loss 0.000214183, throughput 3.95769K wps
Begin Testing...
[Epoch 169] train avg loss 0.000255669, dev acc 0.8967, dev avg loss 0.287677, throughput 3.97809K wps
[Epoch 170 Batch 30/162] avg loss 0.00022656, throughput 4.05541K wps
[Epoch 170 Batch 60/162] avg loss 0.000209702, throughput 3.95458K wps
[Epoch 170 Batch 90/162] avg loss 0.000210065, throughput 3.96028K wps
[Epoch 170 Batch 120/162] avg loss 0.000245545, throughput 3.96456K wps
[Epoch 170 Batch 150/162] avg loss 0.000259445, throughput 3.97343K wps
Begin Testing...
[Epoch 170] train avg loss 0.000228557, dev acc 0.9000, dev avg loss 0.289287, throughput 3.9804K wps
[Epoch 171 Batch 30/162] avg loss 0.000281146, throughput 4.05369K wps
[Epoch 171 Batch 60/162] avg loss 0.000261314, throughput 3.95656K wps
[Epoch 171 Batch 90/162] avg loss 0.000243478, throughput 3.96004K wps
[Epoch 171 Batch 120/162] avg loss 0.000230502, throughput 3.95202K wps
[Epoch 171 Batch 150/162] avg loss 0.000225196, throughput 3.95294K wps
Begin Testing...
[Epoch 171] train avg loss 0.000250208, dev acc 0.8989, dev avg loss 0.288053, throughput 3.97382K wps
[Epoch 172 Batch 30/162] avg loss 0.000231182, throughput 4.04647K wps
[Epoch 172 Batch 60/162] avg loss 0.000223763, throughput 3.95843K wps
[Epoch 172 Batch 90/162] avg loss 0.000233552, throughput 3.95862K wps
[Epoch 172 Batch 120/162] avg loss 0.000260559, throughput 3.94783K wps
[Epoch 172 Batch 150/162] avg loss 0.00027105, throughput 3.94724K wps
Begin Testing...
[Epoch 172] train avg loss 0.000241447, dev acc 0.8978, dev avg loss 0.288143, throughput 3.97003K wps
[Epoch 173 Batch 30/162] avg loss 0.000279006, throughput 4.04203K wps
[Epoch 173 Batch 60/162] avg loss 0.000231639, throughput 3.96852K wps
[Epoch 173 Batch 90/162] avg loss 0.000222931, throughput 3.96674K wps
[Epoch 173 Batch 120/162] avg loss 0.000224178, throughput 3.95429K wps
[Epoch 173 Batch 150/162] avg loss 0.000212335, throughput 3.9741K wps
Begin Testing...
[Epoch 173] train avg loss 0.000234925, dev acc 0.8989, dev avg loss 0.288995, throughput 3.97959K wps
[Epoch 174 Batch 30/162] avg loss 0.000226046, throughput 4.06105K wps
[Epoch 174 Batch 60/162] avg loss 0.00021404, throughput 3.94813K wps
[Epoch 174 Batch 90/162] avg loss 0.000278722, throughput 3.97337K wps
[Epoch 174 Batch 120/162] avg loss 0.000248319, throughput 3.95233K wps
[Epoch 174 Batch 150/162] avg loss 0.000238073, throughput 3.97253K wps
Begin Testing...
[Epoch 174] train avg loss 0.000236557, dev acc 0.8978, dev avg loss 0.287875, throughput 3.98052K wps
[Epoch 175 Batch 30/162] avg loss 0.000240373, throughput 4.06532K wps
[Epoch 175 Batch 60/162] avg loss 0.000215842, throughput 3.94942K wps
[Epoch 175 Batch 90/162] avg loss 0.000243883, throughput 3.9616K wps
[Epoch 175 Batch 120/162] avg loss 0.000243747, throughput 3.97438K wps
[Epoch 175 Batch 150/162] avg loss 0.000254398, throughput 3.96069K wps
Begin Testing...
[Epoch 175] train avg loss 0.000238599, dev acc 0.9000, dev avg loss 0.287663, throughput 3.98055K wps
[Epoch 176 Batch 30/162] avg loss 0.000258021, throughput 4.07449K wps
[Epoch 176 Batch 60/162] avg loss 0.000235621, throughput 3.96164K wps
[Epoch 176 Batch 90/162] avg loss 0.00023686, throughput 3.97384K wps
[Epoch 176 Batch 120/162] avg loss 0.000262838, throughput 3.96126K wps
[Epoch 176 Batch 150/162] avg loss 0.00021941, throughput 3.95191K wps
Begin Testing...
[Epoch 176] train avg loss 0.000241109, dev acc 0.8978, dev avg loss 0.288153, throughput 3.98169K wps
[Epoch 177 Batch 30/162] avg loss 0.000186747, throughput 4.05754K wps
[Epoch 177 Batch 60/162] avg loss 0.000210895, throughput 3.98096K wps
[Epoch 177 Batch 90/162] avg loss 0.000218655, throughput 3.97731K wps
[Epoch 177 Batch 120/162] avg loss 0.000222623, throughput 3.96023K wps
[Epoch 177 Batch 150/162] avg loss 0.00024932, throughput 3.9747K wps
Begin Testing...
[Epoch 177] train avg loss 0.000218432, dev acc 0.9000, dev avg loss 0.289776, throughput 3.98743K wps
[Epoch 178 Batch 30/162] avg loss 0.000214785, throughput 4.04925K wps
[Epoch 178 Batch 60/162] avg loss 0.000187737, throughput 3.95233K wps
[Epoch 178 Batch 90/162] avg loss 0.000185576, throughput 3.94962K wps
[Epoch 178 Batch 120/162] avg loss 0.000223164, throughput 3.96083K wps
[Epoch 178 Batch 150/162] avg loss 0.000237002, throughput 3.9636K wps
Begin Testing...
[Epoch 178] train avg loss 0.000210393, dev acc 0.9000, dev avg loss 0.291152, throughput 3.97211K wps
[Epoch 179 Batch 30/162] avg loss 0.000203535, throughput 4.04733K wps
[Epoch 179 Batch 60/162] avg loss 0.000242268, throughput 3.96545K wps
[Epoch 179 Batch 90/162] avg loss 0.000236581, throughput 3.9619K wps
[Epoch 179 Batch 120/162] avg loss 0.000220843, throughput 3.96482K wps
[Epoch 179 Batch 150/162] avg loss 0.000226125, throughput 3.95729K wps
Begin Testing...
[Epoch 179] train avg loss 0.000225855, dev acc 0.9000, dev avg loss 0.292268, throughput 3.97836K wps
[Epoch 180 Batch 30/162] avg loss 0.000234402, throughput 4.04229K wps
[Epoch 180 Batch 60/162] avg loss 0.00025807, throughput 3.94596K wps
[Epoch 180 Batch 90/162] avg loss 0.000219527, throughput 3.9374K wps
[Epoch 180 Batch 120/162] avg loss 0.000223886, throughput 3.96224K wps
[Epoch 180 Batch 150/162] avg loss 0.000196126, throughput 3.96773K wps
Begin Testing...
[Epoch 180] train avg loss 0.000229634, dev acc 0.8989, dev avg loss 0.291229, throughput 3.97148K wps
[Epoch 181 Batch 30/162] avg loss 0.000209163, throughput 4.07634K wps
[Epoch 181 Batch 60/162] avg loss 0.000239298, throughput 3.97041K wps
[Epoch 181 Batch 90/162] avg loss 0.00021597, throughput 3.94854K wps
[Epoch 181 Batch 120/162] avg loss 0.000215697, throughput 3.96422K wps
[Epoch 181 Batch 150/162] avg loss 0.000203109, throughput 3.96152K wps
Begin Testing...
[Epoch 181] train avg loss 0.000216583, dev acc 0.9000, dev avg loss 0.291674, throughput 3.98256K wps
[Epoch 182 Batch 30/162] avg loss 0.000207649, throughput 4.06644K wps
[Epoch 182 Batch 60/162] avg loss 0.000194826, throughput 3.94866K wps
[Epoch 182 Batch 90/162] avg loss 0.000217407, throughput 3.95783K wps
[Epoch 182 Batch 120/162] avg loss 0.000244012, throughput 3.96662K wps
[Epoch 182 Batch 150/162] avg loss 0.000204868, throughput 3.96196K wps
Begin Testing...
[Epoch 182] train avg loss 0.000211794, dev acc 0.9000, dev avg loss 0.293758, throughput 3.97992K wps
[Epoch 183 Batch 30/162] avg loss 0.000218384, throughput 4.04774K wps
[Epoch 183 Batch 60/162] avg loss 0.000201314, throughput 3.95586K wps
[Epoch 183 Batch 90/162] avg loss 0.000230479, throughput 3.95463K wps
[Epoch 183 Batch 120/162] avg loss 0.000214194, throughput 3.96539K wps
[Epoch 183 Batch 150/162] avg loss 0.000204679, throughput 3.96835K wps
Begin Testing...
[Epoch 183] train avg loss 0.000209866, dev acc 0.9011, dev avg loss 0.292457, throughput 3.97804K wps
[Epoch 184 Batch 30/162] avg loss 0.000229975, throughput 4.05031K wps
[Epoch 184 Batch 60/162] avg loss 0.00023866, throughput 3.97302K wps
[Epoch 184 Batch 90/162] avg loss 0.000197327, throughput 3.96263K wps
[Epoch 184 Batch 120/162] avg loss 0.000171384, throughput 3.95933K wps
[Epoch 184 Batch 150/162] avg loss 0.000212041, throughput 3.95716K wps
Begin Testing...
[Epoch 184] train avg loss 0.000208559, dev acc 0.9022, dev avg loss 0.294908, throughput 3.98002K wps
[Epoch 185 Batch 30/162] avg loss 0.000172154, throughput 4.05453K wps
[Epoch 185 Batch 60/162] avg loss 0.000171395, throughput 3.94739K wps
[Epoch 185 Batch 90/162] avg loss 0.000255702, throughput 3.96965K wps
[Epoch 185 Batch 120/162] avg loss 0.000207976, throughput 3.96521K wps
[Epoch 185 Batch 150/162] avg loss 0.000216257, throughput 3.80354K wps
Begin Testing...
[Epoch 185] train avg loss 0.000205658, dev acc 0.8978, dev avg loss 0.293852, throughput 3.94773K wps
[Epoch 186 Batch 30/162] avg loss 0.000183255, throughput 4.0638K wps
[Epoch 186 Batch 60/162] avg loss 0.000221339, throughput 3.96701K wps
[Epoch 186 Batch 90/162] avg loss 0.000226228, throughput 3.9685K wps
[Epoch 186 Batch 120/162] avg loss 0.000194229, throughput 3.97825K wps
[Epoch 186 Batch 150/162] avg loss 0.000253131, throughput 3.96253K wps
Begin Testing...
[Epoch 186] train avg loss 0.000217097, dev acc 0.9000, dev avg loss 0.294594, throughput 3.98481K wps
[Epoch 187 Batch 30/162] avg loss 0.000200741, throughput 4.05969K wps
[Epoch 187 Batch 60/162] avg loss 0.000198898, throughput 3.96805K wps
[Epoch 187 Batch 90/162] avg loss 0.000195608, throughput 3.9688K wps
[Epoch 187 Batch 120/162] avg loss 0.000217629, throughput 3.96663K wps
[Epoch 187 Batch 150/162] avg loss 0.000166092, throughput 3.96324K wps
Begin Testing...
[Epoch 187] train avg loss 0.000195807, dev acc 0.9000, dev avg loss 0.296087, throughput 3.98193K wps
[Epoch 188 Batch 30/162] avg loss 0.000194049, throughput 4.05354K wps
[Epoch 188 Batch 60/162] avg loss 0.000208791, throughput 3.9641K wps
[Epoch 188 Batch 90/162] avg loss 0.000182114, throughput 3.94375K wps
[Epoch 188 Batch 120/162] avg loss 0.000190613, throughput 3.96618K wps
[Epoch 188 Batch 150/162] avg loss 0.000193331, throughput 3.97254K wps
Begin Testing...
[Epoch 188] train avg loss 0.000192139, dev acc 0.9000, dev avg loss 0.299242, throughput 3.97905K wps
[Epoch 189 Batch 30/162] avg loss 0.000176676, throughput 4.05723K wps
[Epoch 189 Batch 60/162] avg loss 0.000176662, throughput 3.97139K wps
[Epoch 189 Batch 90/162] avg loss 0.000194415, throughput 3.97662K wps
[Epoch 189 Batch 120/162] avg loss 0.000184889, throughput 3.97461K wps
[Epoch 189 Batch 150/162] avg loss 0.000180777, throughput 3.95129K wps
Begin Testing...
[Epoch 189] train avg loss 0.000183892, dev acc 0.9022, dev avg loss 0.297882, throughput 3.98301K wps
[Epoch 190 Batch 30/162] avg loss 0.000182387, throughput 4.04853K wps
[Epoch 190 Batch 60/162] avg loss 0.000184178, throughput 3.95643K wps
[Epoch 190 Batch 90/162] avg loss 0.000171766, throughput 3.96764K wps
[Epoch 190 Batch 120/162] avg loss 0.000179421, throughput 3.97148K wps
[Epoch 190 Batch 150/162] avg loss 0.000194764, throughput 3.97169K wps
Begin Testing...
[Epoch 190] train avg loss 0.000184662, dev acc 0.9000, dev avg loss 0.299239, throughput 3.98172K wps
[Epoch 191 Batch 30/162] avg loss 0.000183953, throughput 4.05901K wps
[Epoch 191 Batch 60/162] avg loss 0.00017084, throughput 3.96282K wps
[Epoch 191 Batch 90/162] avg loss 0.000228897, throughput 3.96423K wps
[Epoch 191 Batch 120/162] avg loss 0.000178828, throughput 3.96579K wps
[Epoch 191 Batch 150/162] avg loss 0.000172405, throughput 3.95779K wps
Begin Testing...
[Epoch 191] train avg loss 0.000184344, dev acc 0.8978, dev avg loss 0.29981, throughput 3.97993K wps
[Epoch 192 Batch 30/162] avg loss 0.000221333, throughput 4.05472K wps
[Epoch 192 Batch 60/162] avg loss 0.000195306, throughput 3.95907K wps
[Epoch 192 Batch 90/162] avg loss 0.000196048, throughput 3.96587K wps
[Epoch 192 Batch 120/162] avg loss 0.000193249, throughput 3.97364K wps
[Epoch 192 Batch 150/162] avg loss 0.000174033, throughput 3.97155K wps
Begin Testing...
[Epoch 192] train avg loss 0.000202486, dev acc 0.9022, dev avg loss 0.301925, throughput 3.98389K wps
[Epoch 193 Batch 30/162] avg loss 0.000152009, throughput 4.03605K wps
[Epoch 193 Batch 60/162] avg loss 0.000189739, throughput 3.95644K wps
[Epoch 193 Batch 90/162] avg loss 0.000180925, throughput 3.96006K wps
[Epoch 193 Batch 120/162] avg loss 0.000189543, throughput 3.95247K wps
[Epoch 193 Batch 150/162] avg loss 0.000192281, throughput 3.95993K wps
Begin Testing...
[Epoch 193] train avg loss 0.00018308, dev acc 0.8978, dev avg loss 0.298257, throughput 3.9737K wps
[Epoch 194 Batch 30/162] avg loss 0.000199493, throughput 4.05203K wps
[Epoch 194 Batch 60/162] avg loss 0.000183031, throughput 3.97081K wps
[Epoch 194 Batch 90/162] avg loss 0.00017951, throughput 3.96406K wps
[Epoch 194 Batch 120/162] avg loss 0.000227861, throughput 3.96972K wps
[Epoch 194 Batch 150/162] avg loss 0.000155716, throughput 3.95894K wps
Begin Testing...
[Epoch 194] train avg loss 0.000185462, dev acc 0.8989, dev avg loss 0.300721, throughput 3.98199K wps
[Epoch 195 Batch 30/162] avg loss 0.000175069, throughput 4.07106K wps
[Epoch 195 Batch 60/162] avg loss 0.000206559, throughput 3.96402K wps
[Epoch 195 Batch 90/162] avg loss 0.000171746, throughput 3.96152K wps
[Epoch 195 Batch 120/162] avg loss 0.000166036, throughput 3.9605K wps
[Epoch 195 Batch 150/162] avg loss 0.000156456, throughput 3.95805K wps
Begin Testing...
[Epoch 195] train avg loss 0.000179964, dev acc 0.9022, dev avg loss 0.300147, throughput 3.98174K wps
[Epoch 196 Batch 30/162] avg loss 0.000181052, throughput 4.05398K wps
[Epoch 196 Batch 60/162] avg loss 0.00017292, throughput 3.96763K wps
[Epoch 196 Batch 90/162] avg loss 0.0001803, throughput 3.95027K wps
[Epoch 196 Batch 120/162] avg loss 0.000162383, throughput 3.95837K wps
[Epoch 196 Batch 150/162] avg loss 0.000158061, throughput 3.97586K wps
Begin Testing...
[Epoch 196] train avg loss 0.000169684, dev acc 0.9033, dev avg loss 0.300715, throughput 3.98016K wps
[Epoch 197 Batch 30/162] avg loss 0.000158477, throughput 4.05481K wps
[Epoch 197 Batch 60/162] avg loss 0.000164712, throughput 3.95703K wps
[Epoch 197 Batch 90/162] avg loss 0.000181403, throughput 3.96626K wps
[Epoch 197 Batch 120/162] avg loss 0.000180879, throughput 3.96285K wps
[Epoch 197 Batch 150/162] avg loss 0.000200915, throughput 3.96117K wps
Begin Testing...
[Epoch 197] train avg loss 0.000174566, dev acc 0.9011, dev avg loss 0.300463, throughput 3.97858K wps
[Epoch 198 Batch 30/162] avg loss 0.000169337, throughput 4.05705K wps
[Epoch 198 Batch 60/162] avg loss 0.000166676, throughput 3.95837K wps
[Epoch 198 Batch 90/162] avg loss 0.000159157, throughput 3.96003K wps
[Epoch 198 Batch 120/162] avg loss 0.000206411, throughput 3.95713K wps
[Epoch 198 Batch 150/162] avg loss 0.000204798, throughput 3.96767K wps
Begin Testing...
[Epoch 198] train avg loss 0.000179598, dev acc 0.8978, dev avg loss 0.301171, throughput 3.97914K wps
[Epoch 199 Batch 30/162] avg loss 0.000166412, throughput 4.06778K wps
[Epoch 199 Batch 60/162] avg loss 0.000152203, throughput 3.9793K wps
[Epoch 199 Batch 90/162] avg loss 0.000179558, throughput 3.95718K wps
[Epoch 199 Batch 120/162] avg loss 0.000146998, throughput 3.96592K wps
[Epoch 199 Batch 150/162] avg loss 0.00016462, throughput 3.95537K wps
Begin Testing...
[Epoch 199] train avg loss 0.000161939, dev acc 0.9000, dev avg loss 0.30148, throughput 3.98167K wps
Test loss 0.281244, test acc 0.8870
Total time cost 1011.77s
[Epoch 0 Batch 30/162] avg loss 0.0139186, throughput 3.61853K wps
[Epoch 0 Batch 60/162] avg loss 0.0138543, throughput 3.95332K wps
[Epoch 0 Batch 90/162] avg loss 0.0138217, throughput 3.96511K wps
[Epoch 0 Batch 120/162] avg loss 0.0138326, throughput 3.96729K wps
[Epoch 0 Batch 150/162] avg loss 0.0138413, throughput 3.97767K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138515, dev acc 0.5667, dev avg loss 0.689759, throughput 3.89694K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0137652, throughput 4.06163K wps
[Epoch 1 Batch 60/162] avg loss 0.0137796, throughput 3.97429K wps
[Epoch 1 Batch 90/162] avg loss 0.0137755, throughput 3.96392K wps
[Epoch 1 Batch 120/162] avg loss 0.0137095, throughput 3.96207K wps
[Epoch 1 Batch 150/162] avg loss 0.0137509, throughput 3.97831K wps
Begin Testing...
[Epoch 1] train avg loss 0.013753, dev acc 0.6911, dev avg loss 0.686247, throughput 3.98731K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0136895, throughput 4.04684K wps
[Epoch 2 Batch 60/162] avg loss 0.0136677, throughput 3.96056K wps
[Epoch 2 Batch 90/162] avg loss 0.0136629, throughput 3.97596K wps
[Epoch 2 Batch 120/162] avg loss 0.0136417, throughput 3.95918K wps
[Epoch 2 Batch 150/162] avg loss 0.0136176, throughput 3.97255K wps
Begin Testing...
[Epoch 2] train avg loss 0.0136514, dev acc 0.6344, dev avg loss 0.681016, throughput 3.98164K wps
[Epoch 3 Batch 30/162] avg loss 0.0135907, throughput 4.07308K wps
[Epoch 3 Batch 60/162] avg loss 0.0135686, throughput 3.9508K wps
[Epoch 3 Batch 90/162] avg loss 0.0135287, throughput 3.9734K wps
[Epoch 3 Batch 120/162] avg loss 0.0135283, throughput 3.95436K wps
[Epoch 3 Batch 150/162] avg loss 0.0135095, throughput 3.94787K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135353, dev acc 0.6767, dev avg loss 0.67574, throughput 3.97541K wps
[Epoch 4 Batch 30/162] avg loss 0.0134399, throughput 4.05258K wps
[Epoch 4 Batch 60/162] avg loss 0.0134087, throughput 3.96726K wps
[Epoch 4 Batch 90/162] avg loss 0.0133656, throughput 3.96916K wps
[Epoch 4 Batch 120/162] avg loss 0.0133508, throughput 3.97104K wps
[Epoch 4 Batch 150/162] avg loss 0.013267, throughput 3.9669K wps
Begin Testing...
[Epoch 4] train avg loss 0.0133603, dev acc 0.6811, dev avg loss 0.666934, throughput 3.98372K wps
[Epoch 5 Batch 30/162] avg loss 0.0132794, throughput 4.0553K wps
[Epoch 5 Batch 60/162] avg loss 0.0131896, throughput 3.96565K wps
[Epoch 5 Batch 90/162] avg loss 0.0131805, throughput 3.97124K wps
[Epoch 5 Batch 120/162] avg loss 0.0130697, throughput 3.95767K wps
[Epoch 5 Batch 150/162] avg loss 0.0130641, throughput 3.96793K wps
Begin Testing...
[Epoch 5] train avg loss 0.0131542, dev acc 0.6744, dev avg loss 0.656544, throughput 3.98059K wps
[Epoch 6 Batch 30/162] avg loss 0.0129645, throughput 4.06342K wps
[Epoch 6 Batch 60/162] avg loss 0.013074, throughput 3.95305K wps
[Epoch 6 Batch 90/162] avg loss 0.0128551, throughput 3.9573K wps
[Epoch 6 Batch 120/162] avg loss 0.0128132, throughput 3.94755K wps
[Epoch 6 Batch 150/162] avg loss 0.0128198, throughput 3.96235K wps
Begin Testing...
[Epoch 6] train avg loss 0.0128903, dev acc 0.6944, dev avg loss 0.644414, throughput 3.97174K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.0127202, throughput 4.04178K wps
[Epoch 7 Batch 60/162] avg loss 0.0126651, throughput 3.94675K wps
[Epoch 7 Batch 90/162] avg loss 0.0125529, throughput 3.96245K wps
[Epoch 7 Batch 120/162] avg loss 0.0125821, throughput 3.96671K wps
[Epoch 7 Batch 150/162] avg loss 0.0125163, throughput 3.96904K wps
Begin Testing...
[Epoch 7] train avg loss 0.0125954, dev acc 0.6967, dev avg loss 0.628662, throughput 3.97533K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.0124187, throughput 4.04004K wps
[Epoch 8 Batch 60/162] avg loss 0.0123746, throughput 3.93936K wps
[Epoch 8 Batch 90/162] avg loss 0.0122095, throughput 3.96252K wps
[Epoch 8 Batch 120/162] avg loss 0.0121616, throughput 3.96942K wps
[Epoch 8 Batch 150/162] avg loss 0.0120772, throughput 3.95614K wps
Begin Testing...
[Epoch 8] train avg loss 0.0122343, dev acc 0.6967, dev avg loss 0.610792, throughput 3.97262K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.0121247, throughput 4.06465K wps
[Epoch 9 Batch 60/162] avg loss 0.0119794, throughput 3.95834K wps
[Epoch 9 Batch 90/162] avg loss 0.0118618, throughput 3.96143K wps
[Epoch 9 Batch 120/162] avg loss 0.011697, throughput 3.9676K wps
[Epoch 9 Batch 150/162] avg loss 0.0116807, throughput 3.96262K wps
Begin Testing...
[Epoch 9] train avg loss 0.0118535, dev acc 0.7144, dev avg loss 0.592115, throughput 3.98081K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.0116117, throughput 4.0691K wps
[Epoch 10 Batch 60/162] avg loss 0.0116207, throughput 3.96963K wps
[Epoch 10 Batch 90/162] avg loss 0.0113785, throughput 3.95242K wps
[Epoch 10 Batch 120/162] avg loss 0.0112602, throughput 3.9581K wps
[Epoch 10 Batch 150/162] avg loss 0.0113888, throughput 3.96708K wps
Begin Testing...
[Epoch 10] train avg loss 0.011447, dev acc 0.7289, dev avg loss 0.573475, throughput 3.98094K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.011056, throughput 4.06387K wps
[Epoch 11 Batch 60/162] avg loss 0.0110788, throughput 3.96236K wps
[Epoch 11 Batch 90/162] avg loss 0.0111718, throughput 3.93109K wps
[Epoch 11 Batch 120/162] avg loss 0.011298, throughput 3.95253K wps
[Epoch 11 Batch 150/162] avg loss 0.0108268, throughput 3.96782K wps
Begin Testing...
[Epoch 11] train avg loss 0.0110809, dev acc 0.7433, dev avg loss 0.555143, throughput 3.97242K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.0108666, throughput 4.04952K wps
[Epoch 12 Batch 60/162] avg loss 0.0107875, throughput 3.97027K wps
[Epoch 12 Batch 90/162] avg loss 0.0108085, throughput 3.96001K wps
[Epoch 12 Batch 120/162] avg loss 0.010608, throughput 3.95859K wps
[Epoch 12 Batch 150/162] avg loss 0.0103854, throughput 3.95012K wps
Begin Testing...
[Epoch 12] train avg loss 0.0106804, dev acc 0.7556, dev avg loss 0.537087, throughput 3.97681K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.0104294, throughput 4.06472K wps
[Epoch 13 Batch 60/162] avg loss 0.0107732, throughput 3.94646K wps
[Epoch 13 Batch 90/162] avg loss 0.0100968, throughput 3.95745K wps
[Epoch 13 Batch 120/162] avg loss 0.0105833, throughput 3.9629K wps
[Epoch 13 Batch 150/162] avg loss 0.00988996, throughput 3.96397K wps
Begin Testing...
[Epoch 13] train avg loss 0.0103377, dev acc 0.7700, dev avg loss 0.519454, throughput 3.97511K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.0100832, throughput 4.04457K wps
[Epoch 14 Batch 60/162] avg loss 0.0101441, throughput 3.95969K wps
[Epoch 14 Batch 90/162] avg loss 0.00999382, throughput 3.95508K wps
[Epoch 14 Batch 120/162] avg loss 0.00960651, throughput 3.96401K wps
[Epoch 14 Batch 150/162] avg loss 0.0100396, throughput 3.95633K wps
Begin Testing...
[Epoch 14] train avg loss 0.00995789, dev acc 0.7733, dev avg loss 0.503052, throughput 3.97137K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00980825, throughput 4.03833K wps
[Epoch 15 Batch 60/162] avg loss 0.00977802, throughput 3.93526K wps
[Epoch 15 Batch 90/162] avg loss 0.00962245, throughput 3.9356K wps
[Epoch 15 Batch 120/162] avg loss 0.00943122, throughput 3.92673K wps
[Epoch 15 Batch 150/162] avg loss 0.00954587, throughput 3.9367K wps
Begin Testing...
[Epoch 15] train avg loss 0.00963721, dev acc 0.7789, dev avg loss 0.488834, throughput 3.95332K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00968896, throughput 4.04917K wps
[Epoch 16 Batch 60/162] avg loss 0.00939697, throughput 3.9703K wps
[Epoch 16 Batch 90/162] avg loss 0.00959436, throughput 3.95733K wps
[Epoch 16 Batch 120/162] avg loss 0.00927148, throughput 3.95875K wps
[Epoch 16 Batch 150/162] avg loss 0.00907132, throughput 3.94937K wps
Begin Testing...
[Epoch 16] train avg loss 0.00939633, dev acc 0.7811, dev avg loss 0.475578, throughput 3.97611K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00878026, throughput 4.06313K wps
[Epoch 17 Batch 60/162] avg loss 0.00963091, throughput 3.96766K wps
[Epoch 17 Batch 90/162] avg loss 0.00925339, throughput 3.96524K wps
[Epoch 17 Batch 120/162] avg loss 0.00936468, throughput 3.97513K wps
[Epoch 17 Batch 150/162] avg loss 0.00877781, throughput 3.96435K wps
Begin Testing...
[Epoch 17] train avg loss 0.00915593, dev acc 0.7867, dev avg loss 0.465488, throughput 3.98404K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00930204, throughput 4.05517K wps
[Epoch 18 Batch 60/162] avg loss 0.00891702, throughput 3.95624K wps
[Epoch 18 Batch 90/162] avg loss 0.00871292, throughput 3.95156K wps
[Epoch 18 Batch 120/162] avg loss 0.00896524, throughput 3.95003K wps
[Epoch 18 Batch 150/162] avg loss 0.00853627, throughput 3.95304K wps
Begin Testing...
[Epoch 18] train avg loss 0.00891693, dev acc 0.7922, dev avg loss 0.455025, throughput 3.97225K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.0090976, throughput 4.05562K wps
[Epoch 19 Batch 60/162] avg loss 0.00854929, throughput 3.96463K wps
[Epoch 19 Batch 90/162] avg loss 0.0083399, throughput 3.9621K wps
[Epoch 19 Batch 120/162] avg loss 0.00903384, throughput 3.96362K wps
[Epoch 19 Batch 150/162] avg loss 0.00863034, throughput 3.95098K wps
Begin Testing...
[Epoch 19] train avg loss 0.00872656, dev acc 0.7922, dev avg loss 0.448304, throughput 3.97666K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00852291, throughput 4.06548K wps
[Epoch 20 Batch 60/162] avg loss 0.00848978, throughput 3.97423K wps
[Epoch 20 Batch 90/162] avg loss 0.00839327, throughput 3.96001K wps
[Epoch 20 Batch 120/162] avg loss 0.00844032, throughput 3.9624K wps
[Epoch 20 Batch 150/162] avg loss 0.00854502, throughput 3.9557K wps
Begin Testing...
[Epoch 20] train avg loss 0.0085283, dev acc 0.8000, dev avg loss 0.438536, throughput 3.98199K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00866311, throughput 4.06332K wps
[Epoch 21 Batch 60/162] avg loss 0.00826124, throughput 3.95117K wps
[Epoch 21 Batch 90/162] avg loss 0.0084702, throughput 3.96531K wps
[Epoch 21 Batch 120/162] avg loss 0.00893669, throughput 3.96028K wps
[Epoch 21 Batch 150/162] avg loss 0.00809938, throughput 3.96543K wps
Begin Testing...
[Epoch 21] train avg loss 0.00848484, dev acc 0.8011, dev avg loss 0.43295, throughput 3.97792K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00813795, throughput 4.05779K wps
[Epoch 22 Batch 60/162] avg loss 0.00832134, throughput 3.96364K wps
[Epoch 22 Batch 90/162] avg loss 0.00811698, throughput 3.96768K wps
[Epoch 22 Batch 120/162] avg loss 0.00844433, throughput 3.96848K wps
[Epoch 22 Batch 150/162] avg loss 0.0084828, throughput 3.9658K wps
Begin Testing...
[Epoch 22] train avg loss 0.00827088, dev acc 0.8056, dev avg loss 0.426252, throughput 3.98247K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00799586, throughput 4.05629K wps
[Epoch 23 Batch 60/162] avg loss 0.00834652, throughput 3.95683K wps
[Epoch 23 Batch 90/162] avg loss 0.00810674, throughput 3.97357K wps
[Epoch 23 Batch 120/162] avg loss 0.00814798, throughput 3.97379K wps
[Epoch 23 Batch 150/162] avg loss 0.00800411, throughput 3.9475K wps
Begin Testing...
[Epoch 23] train avg loss 0.00812864, dev acc 0.8056, dev avg loss 0.420571, throughput 3.97857K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/162] avg loss 0.00834679, throughput 4.05575K wps
[Epoch 24 Batch 60/162] avg loss 0.00798169, throughput 3.96182K wps
[Epoch 24 Batch 90/162] avg loss 0.00799153, throughput 3.96656K wps
[Epoch 24 Batch 120/162] avg loss 0.00771147, throughput 3.95591K wps
[Epoch 24 Batch 150/162] avg loss 0.00818285, throughput 3.95283K wps
Begin Testing...
[Epoch 24] train avg loss 0.0080262, dev acc 0.8078, dev avg loss 0.41572, throughput 3.97573K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00793604, throughput 4.06151K wps
[Epoch 25 Batch 60/162] avg loss 0.00797628, throughput 3.9559K wps
[Epoch 25 Batch 90/162] avg loss 0.00786818, throughput 3.94913K wps
[Epoch 25 Batch 120/162] avg loss 0.00797915, throughput 3.95921K wps
[Epoch 25 Batch 150/162] avg loss 0.00765396, throughput 3.96759K wps
Begin Testing...
[Epoch 25] train avg loss 0.00784327, dev acc 0.8133, dev avg loss 0.412003, throughput 3.97722K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/162] avg loss 0.00792041, throughput 4.05833K wps
[Epoch 26 Batch 60/162] avg loss 0.00786177, throughput 3.96036K wps
[Epoch 26 Batch 90/162] avg loss 0.00769927, throughput 3.95817K wps
[Epoch 26 Batch 120/162] avg loss 0.007757, throughput 3.95317K wps
[Epoch 26 Batch 150/162] avg loss 0.00751316, throughput 3.96113K wps
Begin Testing...
[Epoch 26] train avg loss 0.00775524, dev acc 0.8100, dev avg loss 0.406927, throughput 3.97733K wps
[Epoch 27 Batch 30/162] avg loss 0.0078265, throughput 4.05958K wps
[Epoch 27 Batch 60/162] avg loss 0.00709236, throughput 3.97965K wps
[Epoch 27 Batch 90/162] avg loss 0.0077651, throughput 3.96729K wps
[Epoch 27 Batch 120/162] avg loss 0.00771493, throughput 3.94926K wps
[Epoch 27 Batch 150/162] avg loss 0.0076089, throughput 3.96288K wps
Begin Testing...
[Epoch 27] train avg loss 0.00757329, dev acc 0.8200, dev avg loss 0.402764, throughput 3.98235K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00802194, throughput 4.0535K wps
[Epoch 28 Batch 60/162] avg loss 0.00717826, throughput 3.94482K wps
[Epoch 28 Batch 90/162] avg loss 0.00742138, throughput 3.95467K wps
[Epoch 28 Batch 120/162] avg loss 0.0072789, throughput 3.95931K wps
[Epoch 28 Batch 150/162] avg loss 0.00752838, throughput 3.95274K wps
Begin Testing...
[Epoch 28] train avg loss 0.00748165, dev acc 0.8189, dev avg loss 0.396646, throughput 3.97282K wps
[Epoch 29 Batch 30/162] avg loss 0.00737801, throughput 4.05876K wps
[Epoch 29 Batch 60/162] avg loss 0.00702807, throughput 3.96919K wps
[Epoch 29 Batch 90/162] avg loss 0.007255, throughput 3.96655K wps
[Epoch 29 Batch 120/162] avg loss 0.00755945, throughput 3.97072K wps
[Epoch 29 Batch 150/162] avg loss 0.00716215, throughput 3.93846K wps
Begin Testing...
[Epoch 29] train avg loss 0.00730065, dev acc 0.8200, dev avg loss 0.392283, throughput 3.97817K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00737739, throughput 4.05318K wps
[Epoch 30 Batch 60/162] avg loss 0.00727277, throughput 3.96931K wps
[Epoch 30 Batch 90/162] avg loss 0.00753839, throughput 3.96522K wps
[Epoch 30 Batch 120/162] avg loss 0.0069584, throughput 3.96382K wps
[Epoch 30 Batch 150/162] avg loss 0.00720548, throughput 3.95102K wps
Begin Testing...
[Epoch 30] train avg loss 0.00719987, dev acc 0.8222, dev avg loss 0.387851, throughput 3.97882K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00717025, throughput 4.06307K wps
[Epoch 31 Batch 60/162] avg loss 0.00684842, throughput 3.96275K wps
[Epoch 31 Batch 90/162] avg loss 0.00686918, throughput 3.95403K wps
[Epoch 31 Batch 120/162] avg loss 0.00722036, throughput 3.96181K wps
[Epoch 31 Batch 150/162] avg loss 0.00711033, throughput 3.94763K wps
Begin Testing...
[Epoch 31] train avg loss 0.00702088, dev acc 0.8267, dev avg loss 0.38365, throughput 3.97622K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00698619, throughput 4.06056K wps
[Epoch 32 Batch 60/162] avg loss 0.00727675, throughput 3.96308K wps
[Epoch 32 Batch 90/162] avg loss 0.00680455, throughput 3.96322K wps
[Epoch 32 Batch 120/162] avg loss 0.00659875, throughput 3.97256K wps
[Epoch 32 Batch 150/162] avg loss 0.00686626, throughput 3.97685K wps
Begin Testing...
[Epoch 32] train avg loss 0.00688111, dev acc 0.8256, dev avg loss 0.379486, throughput 3.98559K wps
[Epoch 33 Batch 30/162] avg loss 0.00673901, throughput 4.04098K wps
[Epoch 33 Batch 60/162] avg loss 0.00671808, throughput 3.96813K wps
[Epoch 33 Batch 90/162] avg loss 0.00685027, throughput 3.96472K wps
[Epoch 33 Batch 120/162] avg loss 0.00661827, throughput 3.9563K wps
[Epoch 33 Batch 150/162] avg loss 0.00713594, throughput 3.95462K wps
Begin Testing...
[Epoch 33] train avg loss 0.00683234, dev acc 0.8267, dev avg loss 0.37754, throughput 3.97601K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00680215, throughput 4.06108K wps
[Epoch 34 Batch 60/162] avg loss 0.00693536, throughput 3.98248K wps
[Epoch 34 Batch 90/162] avg loss 0.00680345, throughput 3.96645K wps
[Epoch 34 Batch 120/162] avg loss 0.00641749, throughput 3.95884K wps
[Epoch 34 Batch 150/162] avg loss 0.00624609, throughput 3.9599K wps
Begin Testing...
[Epoch 34] train avg loss 0.00664532, dev acc 0.8333, dev avg loss 0.37204, throughput 3.98324K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00630539, throughput 4.04333K wps
[Epoch 35 Batch 60/162] avg loss 0.00653013, throughput 3.9673K wps
[Epoch 35 Batch 90/162] avg loss 0.00700816, throughput 3.95568K wps
[Epoch 35 Batch 120/162] avg loss 0.00645346, throughput 3.94875K wps
[Epoch 35 Batch 150/162] avg loss 0.00673755, throughput 3.96481K wps
Begin Testing...
[Epoch 35] train avg loss 0.00660914, dev acc 0.8344, dev avg loss 0.367575, throughput 3.97398K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/162] avg loss 0.00631932, throughput 4.05675K wps
[Epoch 36 Batch 60/162] avg loss 0.00638291, throughput 3.96412K wps
[Epoch 36 Batch 90/162] avg loss 0.00653317, throughput 3.94813K wps
[Epoch 36 Batch 120/162] avg loss 0.00668866, throughput 3.94789K wps
[Epoch 36 Batch 150/162] avg loss 0.00623191, throughput 3.96568K wps
Begin Testing...
[Epoch 36] train avg loss 0.00639559, dev acc 0.8311, dev avg loss 0.363243, throughput 3.97471K wps
[Epoch 37 Batch 30/162] avg loss 0.00649472, throughput 4.04164K wps
[Epoch 37 Batch 60/162] avg loss 0.00653782, throughput 3.95091K wps
[Epoch 37 Batch 90/162] avg loss 0.00633798, throughput 3.96432K wps
[Epoch 37 Batch 120/162] avg loss 0.00631994, throughput 3.96523K wps
[Epoch 37 Batch 150/162] avg loss 0.00613692, throughput 3.96434K wps
Begin Testing...
[Epoch 37] train avg loss 0.00630788, dev acc 0.8411, dev avg loss 0.359514, throughput 3.97542K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/162] avg loss 0.00553655, throughput 4.05379K wps
[Epoch 38 Batch 60/162] avg loss 0.00640269, throughput 3.97506K wps
[Epoch 38 Batch 90/162] avg loss 0.00616288, throughput 3.9551K wps
[Epoch 38 Batch 120/162] avg loss 0.00628466, throughput 3.95313K wps
[Epoch 38 Batch 150/162] avg loss 0.0061243, throughput 3.96741K wps
Begin Testing...
[Epoch 38] train avg loss 0.00614495, dev acc 0.8444, dev avg loss 0.356156, throughput 3.97937K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00640052, throughput 4.05321K wps
[Epoch 39 Batch 60/162] avg loss 0.00627402, throughput 3.97232K wps
[Epoch 39 Batch 90/162] avg loss 0.0060299, throughput 3.95616K wps
[Epoch 39 Batch 120/162] avg loss 0.00574171, throughput 3.96431K wps
[Epoch 39 Batch 150/162] avg loss 0.00593621, throughput 3.95309K wps
Begin Testing...
[Epoch 39] train avg loss 0.00605259, dev acc 0.8444, dev avg loss 0.353854, throughput 3.97665K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/162] avg loss 0.00606731, throughput 4.0593K wps
[Epoch 40 Batch 60/162] avg loss 0.00586184, throughput 3.94976K wps
[Epoch 40 Batch 90/162] avg loss 0.00551285, throughput 3.95869K wps
[Epoch 40 Batch 120/162] avg loss 0.00591421, throughput 3.96391K wps
[Epoch 40 Batch 150/162] avg loss 0.00610253, throughput 3.96666K wps
Begin Testing...
[Epoch 40] train avg loss 0.00588957, dev acc 0.8478, dev avg loss 0.348484, throughput 3.97781K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/162] avg loss 0.00582836, throughput 4.05746K wps
[Epoch 41 Batch 60/162] avg loss 0.00562484, throughput 3.96431K wps
[Epoch 41 Batch 90/162] avg loss 0.00572647, throughput 3.95572K wps
[Epoch 41 Batch 120/162] avg loss 0.00599174, throughput 3.97048K wps
[Epoch 41 Batch 150/162] avg loss 0.00588973, throughput 3.97169K wps
Begin Testing...
[Epoch 41] train avg loss 0.00578351, dev acc 0.8467, dev avg loss 0.344811, throughput 3.98217K wps
[Epoch 42 Batch 30/162] avg loss 0.00597888, throughput 4.04356K wps
[Epoch 42 Batch 60/162] avg loss 0.00552133, throughput 3.95517K wps
[Epoch 42 Batch 90/162] avg loss 0.00525359, throughput 3.95487K wps
[Epoch 42 Batch 120/162] avg loss 0.00575483, throughput 3.97025K wps
[Epoch 42 Batch 150/162] avg loss 0.00564955, throughput 3.96804K wps
Begin Testing...
[Epoch 42] train avg loss 0.00564594, dev acc 0.8478, dev avg loss 0.343096, throughput 3.97762K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/162] avg loss 0.00563372, throughput 4.04255K wps
[Epoch 43 Batch 60/162] avg loss 0.00585565, throughput 3.9681K wps
[Epoch 43 Batch 90/162] avg loss 0.00531114, throughput 3.9616K wps
[Epoch 43 Batch 120/162] avg loss 0.00527545, throughput 3.94346K wps
[Epoch 43 Batch 150/162] avg loss 0.0053546, throughput 3.96482K wps
Begin Testing...
[Epoch 43] train avg loss 0.00548071, dev acc 0.8533, dev avg loss 0.339121, throughput 3.9762K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/162] avg loss 0.00506364, throughput 4.05854K wps
[Epoch 44 Batch 60/162] avg loss 0.00536341, throughput 3.9694K wps
[Epoch 44 Batch 90/162] avg loss 0.00565054, throughput 3.96597K wps
[Epoch 44 Batch 120/162] avg loss 0.0056536, throughput 3.96794K wps
[Epoch 44 Batch 150/162] avg loss 0.00506548, throughput 3.96322K wps
Begin Testing...
[Epoch 44] train avg loss 0.00540027, dev acc 0.8544, dev avg loss 0.333174, throughput 3.98371K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/162] avg loss 0.00568089, throughput 4.04579K wps
[Epoch 45 Batch 60/162] avg loss 0.00531891, throughput 3.97263K wps
[Epoch 45 Batch 90/162] avg loss 0.00499022, throughput 3.97885K wps
[Epoch 45 Batch 120/162] avg loss 0.00508608, throughput 3.97204K wps
[Epoch 45 Batch 150/162] avg loss 0.00521932, throughput 3.95266K wps
Begin Testing...
[Epoch 45] train avg loss 0.00520724, dev acc 0.8622, dev avg loss 0.330611, throughput 3.98237K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00499877, throughput 4.06714K wps
[Epoch 46 Batch 60/162] avg loss 0.00520107, throughput 3.96036K wps
[Epoch 46 Batch 90/162] avg loss 0.00528619, throughput 3.96245K wps
[Epoch 46 Batch 120/162] avg loss 0.00493711, throughput 3.9649K wps
[Epoch 46 Batch 150/162] avg loss 0.00519146, throughput 3.95331K wps
Begin Testing...
[Epoch 46] train avg loss 0.00512975, dev acc 0.8611, dev avg loss 0.326064, throughput 3.97896K wps
[Epoch 47 Batch 30/162] avg loss 0.00467008, throughput 4.07012K wps
[Epoch 47 Batch 60/162] avg loss 0.00502967, throughput 3.9671K wps
[Epoch 47 Batch 90/162] avg loss 0.00507757, throughput 3.94208K wps
[Epoch 47 Batch 120/162] avg loss 0.0051884, throughput 3.96852K wps
[Epoch 47 Batch 150/162] avg loss 0.00497026, throughput 3.97382K wps
Begin Testing...
[Epoch 47] train avg loss 0.00499445, dev acc 0.8611, dev avg loss 0.323757, throughput 3.98215K wps
[Epoch 48 Batch 30/162] avg loss 0.00477627, throughput 4.06331K wps
[Epoch 48 Batch 60/162] avg loss 0.0051286, throughput 3.96069K wps
[Epoch 48 Batch 90/162] avg loss 0.00456977, throughput 3.96053K wps
[Epoch 48 Batch 120/162] avg loss 0.00468258, throughput 3.9501K wps
[Epoch 48 Batch 150/162] avg loss 0.00501984, throughput 3.95688K wps
Begin Testing...
[Epoch 48] train avg loss 0.00485527, dev acc 0.8633, dev avg loss 0.319973, throughput 3.97704K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00455121, throughput 4.06311K wps
[Epoch 49 Batch 60/162] avg loss 0.00486467, throughput 3.96286K wps
[Epoch 49 Batch 90/162] avg loss 0.00472579, throughput 3.96047K wps
[Epoch 49 Batch 120/162] avg loss 0.00499569, throughput 3.96344K wps
[Epoch 49 Batch 150/162] avg loss 0.0050474, throughput 3.96125K wps
Begin Testing...
[Epoch 49] train avg loss 0.00481264, dev acc 0.8667, dev avg loss 0.317701, throughput 3.98038K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/162] avg loss 0.00462248, throughput 4.06062K wps
[Epoch 50 Batch 60/162] avg loss 0.00468758, throughput 3.94721K wps
[Epoch 50 Batch 90/162] avg loss 0.00475145, throughput 3.97419K wps
[Epoch 50 Batch 120/162] avg loss 0.00468624, throughput 3.94349K wps
[Epoch 50 Batch 150/162] avg loss 0.00449352, throughput 3.96484K wps
Begin Testing...
[Epoch 50] train avg loss 0.00462798, dev acc 0.8633, dev avg loss 0.315667, throughput 3.97545K wps
[Epoch 51 Batch 30/162] avg loss 0.00467351, throughput 4.05367K wps
[Epoch 51 Batch 60/162] avg loss 0.00459607, throughput 3.96665K wps
[Epoch 51 Batch 90/162] avg loss 0.00465219, throughput 3.95191K wps
[Epoch 51 Batch 120/162] avg loss 0.00425794, throughput 3.96944K wps
[Epoch 51 Batch 150/162] avg loss 0.00412042, throughput 3.96126K wps
Begin Testing...
[Epoch 51] train avg loss 0.00448415, dev acc 0.8644, dev avg loss 0.312176, throughput 3.97915K wps
[Epoch 52 Batch 30/162] avg loss 0.00419271, throughput 4.06139K wps
[Epoch 52 Batch 60/162] avg loss 0.00428834, throughput 3.96425K wps
[Epoch 52 Batch 90/162] avg loss 0.00445145, throughput 3.95957K wps
[Epoch 52 Batch 120/162] avg loss 0.00439379, throughput 3.96109K wps
[Epoch 52 Batch 150/162] avg loss 0.00442242, throughput 3.9563K wps
Begin Testing...
[Epoch 52] train avg loss 0.00435309, dev acc 0.8622, dev avg loss 0.308901, throughput 3.97848K wps
[Epoch 53 Batch 30/162] avg loss 0.00397612, throughput 4.04652K wps
[Epoch 53 Batch 60/162] avg loss 0.00442046, throughput 3.95879K wps
[Epoch 53 Batch 90/162] avg loss 0.00460905, throughput 3.95182K wps
[Epoch 53 Batch 120/162] avg loss 0.00414497, throughput 3.95949K wps
[Epoch 53 Batch 150/162] avg loss 0.00433326, throughput 3.95376K wps
Begin Testing...
[Epoch 53] train avg loss 0.00431061, dev acc 0.8711, dev avg loss 0.306237, throughput 3.97203K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00453107, throughput 4.05711K wps
[Epoch 54 Batch 60/162] avg loss 0.0038721, throughput 3.95697K wps
[Epoch 54 Batch 90/162] avg loss 0.00413791, throughput 3.95999K wps
[Epoch 54 Batch 120/162] avg loss 0.00455637, throughput 3.96395K wps
[Epoch 54 Batch 150/162] avg loss 0.00411772, throughput 3.95304K wps
Begin Testing...
[Epoch 54] train avg loss 0.00420326, dev acc 0.8633, dev avg loss 0.307234, throughput 3.97585K wps
[Epoch 55 Batch 30/162] avg loss 0.00386683, throughput 4.03983K wps
[Epoch 55 Batch 60/162] avg loss 0.00439057, throughput 3.93637K wps
[Epoch 55 Batch 90/162] avg loss 0.00405723, throughput 3.95393K wps
[Epoch 55 Batch 120/162] avg loss 0.00410667, throughput 3.95641K wps
[Epoch 55 Batch 150/162] avg loss 0.00396027, throughput 3.95314K wps
Begin Testing...
[Epoch 55] train avg loss 0.00406144, dev acc 0.8711, dev avg loss 0.302036, throughput 3.96607K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/162] avg loss 0.00385318, throughput 4.07727K wps
[Epoch 56 Batch 60/162] avg loss 0.00395309, throughput 3.97172K wps
[Epoch 56 Batch 90/162] avg loss 0.00390394, throughput 3.95704K wps
[Epoch 56 Batch 120/162] avg loss 0.00397103, throughput 3.96448K wps
[Epoch 56 Batch 150/162] avg loss 0.00412434, throughput 3.95812K wps
Begin Testing...
[Epoch 56] train avg loss 0.00396686, dev acc 0.8711, dev avg loss 0.298178, throughput 3.98441K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/162] avg loss 0.00387903, throughput 4.06112K wps
[Epoch 57 Batch 60/162] avg loss 0.00367846, throughput 3.96433K wps
[Epoch 57 Batch 90/162] avg loss 0.00383183, throughput 3.97186K wps
[Epoch 57 Batch 120/162] avg loss 0.00380822, throughput 3.95186K wps
[Epoch 57 Batch 150/162] avg loss 0.00414166, throughput 3.96067K wps
Begin Testing...
[Epoch 57] train avg loss 0.00384734, dev acc 0.8767, dev avg loss 0.295926, throughput 3.98059K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/162] avg loss 0.00365785, throughput 4.05722K wps
[Epoch 58 Batch 60/162] avg loss 0.00385439, throughput 3.9697K wps
[Epoch 58 Batch 90/162] avg loss 0.00384175, throughput 3.96052K wps
[Epoch 58 Batch 120/162] avg loss 0.00363426, throughput 3.96439K wps
[Epoch 58 Batch 150/162] avg loss 0.00363362, throughput 3.9683K wps
Begin Testing...
[Epoch 58] train avg loss 0.00372078, dev acc 0.8689, dev avg loss 0.293595, throughput 3.98232K wps
[Epoch 59 Batch 30/162] avg loss 0.00362947, throughput 4.06627K wps
[Epoch 59 Batch 60/162] avg loss 0.00344786, throughput 3.97817K wps
[Epoch 59 Batch 90/162] avg loss 0.00353805, throughput 3.96326K wps
[Epoch 59 Batch 120/162] avg loss 0.00388866, throughput 3.9505K wps
[Epoch 59 Batch 150/162] avg loss 0.00364157, throughput 3.9727K wps
Begin Testing...
[Epoch 59] train avg loss 0.00363957, dev acc 0.8722, dev avg loss 0.29268, throughput 3.98428K wps
[Epoch 60 Batch 30/162] avg loss 0.00348743, throughput 4.06567K wps
[Epoch 60 Batch 60/162] avg loss 0.0034599, throughput 3.96542K wps
[Epoch 60 Batch 90/162] avg loss 0.00340809, throughput 3.96622K wps
[Epoch 60 Batch 120/162] avg loss 0.00342855, throughput 3.95413K wps
[Epoch 60 Batch 150/162] avg loss 0.00363465, throughput 3.9599K wps
Begin Testing...
[Epoch 60] train avg loss 0.00348469, dev acc 0.8689, dev avg loss 0.291562, throughput 3.98054K wps
[Epoch 61 Batch 30/162] avg loss 0.00331352, throughput 4.06896K wps
[Epoch 61 Batch 60/162] avg loss 0.00367612, throughput 3.96701K wps
[Epoch 61 Batch 90/162] avg loss 0.00342193, throughput 3.9537K wps
[Epoch 61 Batch 120/162] avg loss 0.00342075, throughput 3.96444K wps
[Epoch 61 Batch 150/162] avg loss 0.00350505, throughput 3.96582K wps
Begin Testing...
[Epoch 61] train avg loss 0.00343445, dev acc 0.8700, dev avg loss 0.290521, throughput 3.98221K wps
[Epoch 62 Batch 30/162] avg loss 0.003563, throughput 4.0602K wps
[Epoch 62 Batch 60/162] avg loss 0.00320036, throughput 3.96374K wps
[Epoch 62 Batch 90/162] avg loss 0.00330318, throughput 3.97039K wps
[Epoch 62 Batch 120/162] avg loss 0.00313785, throughput 3.98119K wps
[Epoch 62 Batch 150/162] avg loss 0.00324138, throughput 3.97253K wps
Begin Testing...
[Epoch 62] train avg loss 0.00330424, dev acc 0.8722, dev avg loss 0.287978, throughput 3.98563K wps
[Epoch 63 Batch 30/162] avg loss 0.00303365, throughput 4.05678K wps
[Epoch 63 Batch 60/162] avg loss 0.00345216, throughput 3.96203K wps
[Epoch 63 Batch 90/162] avg loss 0.00295819, throughput 3.9655K wps
[Epoch 63 Batch 120/162] avg loss 0.00328528, throughput 3.96284K wps
[Epoch 63 Batch 150/162] avg loss 0.00323579, throughput 3.96942K wps
Begin Testing...
[Epoch 63] train avg loss 0.00319934, dev acc 0.8744, dev avg loss 0.286327, throughput 3.98215K wps
[Epoch 64 Batch 30/162] avg loss 0.00324467, throughput 4.06992K wps
[Epoch 64 Batch 60/162] avg loss 0.00333519, throughput 3.96783K wps
[Epoch 64 Batch 90/162] avg loss 0.00321916, throughput 3.97706K wps
[Epoch 64 Batch 120/162] avg loss 0.00298914, throughput 3.9748K wps
[Epoch 64 Batch 150/162] avg loss 0.00293019, throughput 3.95715K wps
Begin Testing...
[Epoch 64] train avg loss 0.0031374, dev acc 0.8722, dev avg loss 0.285131, throughput 3.98785K wps
[Epoch 65 Batch 30/162] avg loss 0.003106, throughput 4.07293K wps
[Epoch 65 Batch 60/162] avg loss 0.00306334, throughput 3.95571K wps
[Epoch 65 Batch 90/162] avg loss 0.00298489, throughput 3.95993K wps
[Epoch 65 Batch 120/162] avg loss 0.00295505, throughput 3.97367K wps
[Epoch 65 Batch 150/162] avg loss 0.00295377, throughput 3.96127K wps
Begin Testing...
[Epoch 65] train avg loss 0.00303792, dev acc 0.8744, dev avg loss 0.284255, throughput 3.98319K wps
[Epoch 66 Batch 30/162] avg loss 0.00277504, throughput 4.05928K wps
[Epoch 66 Batch 60/162] avg loss 0.00286052, throughput 3.96737K wps
[Epoch 66 Batch 90/162] avg loss 0.00299081, throughput 3.96189K wps
[Epoch 66 Batch 120/162] avg loss 0.00309648, throughput 3.96812K wps
[Epoch 66 Batch 150/162] avg loss 0.00270712, throughput 3.96469K wps
Begin Testing...
[Epoch 66] train avg loss 0.0029042, dev acc 0.8733, dev avg loss 0.283867, throughput 3.97942K wps
[Epoch 67 Batch 30/162] avg loss 0.00293572, throughput 4.04504K wps
[Epoch 67 Batch 60/162] avg loss 0.00256908, throughput 3.94174K wps
[Epoch 67 Batch 90/162] avg loss 0.00297008, throughput 3.97319K wps
[Epoch 67 Batch 120/162] avg loss 0.00291749, throughput 3.96984K wps
[Epoch 67 Batch 150/162] avg loss 0.00287619, throughput 3.95303K wps
Begin Testing...
[Epoch 67] train avg loss 0.00284449, dev acc 0.8744, dev avg loss 0.281916, throughput 3.97331K wps
[Epoch 68 Batch 30/162] avg loss 0.00256161, throughput 4.06861K wps
[Epoch 68 Batch 60/162] avg loss 0.0030321, throughput 3.95473K wps
[Epoch 68 Batch 90/162] avg loss 0.00263634, throughput 3.96401K wps
[Epoch 68 Batch 120/162] avg loss 0.00295167, throughput 3.96835K wps
[Epoch 68 Batch 150/162] avg loss 0.00268792, throughput 3.96234K wps
Begin Testing...
[Epoch 68] train avg loss 0.00279116, dev acc 0.8778, dev avg loss 0.280807, throughput 3.98234K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/162] avg loss 0.00265644, throughput 4.05181K wps
[Epoch 69 Batch 60/162] avg loss 0.00286651, throughput 3.96042K wps
[Epoch 69 Batch 90/162] avg loss 0.00255007, throughput 3.97014K wps
[Epoch 69 Batch 120/162] avg loss 0.00276033, throughput 3.96934K wps
[Epoch 69 Batch 150/162] avg loss 0.00277663, throughput 3.95227K wps
Begin Testing...
[Epoch 69] train avg loss 0.00271956, dev acc 0.8789, dev avg loss 0.280841, throughput 3.97746K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/162] avg loss 0.00243572, throughput 4.0566K wps
[Epoch 70 Batch 60/162] avg loss 0.00271245, throughput 3.9518K wps
[Epoch 70 Batch 90/162] avg loss 0.00269544, throughput 3.96346K wps
[Epoch 70 Batch 120/162] avg loss 0.00258655, throughput 3.9709K wps
[Epoch 70 Batch 150/162] avg loss 0.00256441, throughput 3.94896K wps
Begin Testing...
[Epoch 70] train avg loss 0.00258538, dev acc 0.8778, dev avg loss 0.279661, throughput 3.97665K wps
[Epoch 71 Batch 30/162] avg loss 0.00258123, throughput 4.06915K wps
[Epoch 71 Batch 60/162] avg loss 0.0023649, throughput 3.98095K wps
[Epoch 71 Batch 90/162] avg loss 0.00273192, throughput 3.96429K wps
[Epoch 71 Batch 120/162] avg loss 0.00254445, throughput 3.95K wps
[Epoch 71 Batch 150/162] avg loss 0.00260922, throughput 3.97604K wps
Begin Testing...
[Epoch 71] train avg loss 0.00257503, dev acc 0.8800, dev avg loss 0.279458, throughput 3.98425K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/162] avg loss 0.00274277, throughput 4.05598K wps
[Epoch 72 Batch 60/162] avg loss 0.00244865, throughput 3.97283K wps
[Epoch 72 Batch 90/162] avg loss 0.00254976, throughput 3.95892K wps
[Epoch 72 Batch 120/162] avg loss 0.00225288, throughput 3.96722K wps
[Epoch 72 Batch 150/162] avg loss 0.00239094, throughput 3.95962K wps
Begin Testing...
[Epoch 72] train avg loss 0.00246489, dev acc 0.8811, dev avg loss 0.279256, throughput 3.98236K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/162] avg loss 0.00249929, throughput 4.05768K wps
[Epoch 73 Batch 60/162] avg loss 0.00223435, throughput 3.96358K wps
[Epoch 73 Batch 90/162] avg loss 0.00236863, throughput 3.97439K wps
[Epoch 73 Batch 120/162] avg loss 0.00235516, throughput 3.9548K wps
[Epoch 73 Batch 150/162] avg loss 0.00227806, throughput 3.9619K wps
Begin Testing...
[Epoch 73] train avg loss 0.00235539, dev acc 0.8811, dev avg loss 0.278475, throughput 3.98105K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/162] avg loss 0.00221229, throughput 4.06359K wps
[Epoch 74 Batch 60/162] avg loss 0.00232312, throughput 3.96093K wps
[Epoch 74 Batch 90/162] avg loss 0.00230437, throughput 3.96364K wps
[Epoch 74 Batch 120/162] avg loss 0.00255931, throughput 3.96972K wps
[Epoch 74 Batch 150/162] avg loss 0.00216411, throughput 3.95141K wps
Begin Testing...
[Epoch 74] train avg loss 0.00230811, dev acc 0.8833, dev avg loss 0.277499, throughput 3.97984K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/162] avg loss 0.00218582, throughput 4.07006K wps
[Epoch 75 Batch 60/162] avg loss 0.00222351, throughput 3.95301K wps
[Epoch 75 Batch 90/162] avg loss 0.00208275, throughput 3.9691K wps
[Epoch 75 Batch 120/162] avg loss 0.00235391, throughput 3.96723K wps
[Epoch 75 Batch 150/162] avg loss 0.00221885, throughput 3.96946K wps
Begin Testing...
[Epoch 75] train avg loss 0.00222566, dev acc 0.8811, dev avg loss 0.276782, throughput 3.98348K wps
[Epoch 76 Batch 30/162] avg loss 0.0022073, throughput 4.07144K wps
[Epoch 76 Batch 60/162] avg loss 0.00219461, throughput 3.9682K wps
[Epoch 76 Batch 90/162] avg loss 0.00201511, throughput 3.95671K wps
[Epoch 76 Batch 120/162] avg loss 0.00213857, throughput 3.96621K wps
[Epoch 76 Batch 150/162] avg loss 0.00212156, throughput 3.94728K wps
Begin Testing...
[Epoch 76] train avg loss 0.00213739, dev acc 0.8867, dev avg loss 0.275826, throughput 3.97897K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/162] avg loss 0.00212755, throughput 4.05636K wps
[Epoch 77 Batch 60/162] avg loss 0.00204646, throughput 3.96641K wps
[Epoch 77 Batch 90/162] avg loss 0.00204413, throughput 3.96097K wps
[Epoch 77 Batch 120/162] avg loss 0.00197693, throughput 3.96419K wps
[Epoch 77 Batch 150/162] avg loss 0.00229002, throughput 3.96446K wps
Begin Testing...
[Epoch 77] train avg loss 0.0021096, dev acc 0.8856, dev avg loss 0.276202, throughput 3.98139K wps
[Epoch 78 Batch 30/162] avg loss 0.00189193, throughput 4.05799K wps
[Epoch 78 Batch 60/162] avg loss 0.00183008, throughput 3.9694K wps
[Epoch 78 Batch 90/162] avg loss 0.00212254, throughput 3.9721K wps
[Epoch 78 Batch 120/162] avg loss 0.00216896, throughput 3.95811K wps
[Epoch 78 Batch 150/162] avg loss 0.00218401, throughput 3.95338K wps
Begin Testing...
[Epoch 78] train avg loss 0.00202332, dev acc 0.8844, dev avg loss 0.277058, throughput 3.9797K wps
[Epoch 79 Batch 30/162] avg loss 0.00207825, throughput 4.06333K wps
[Epoch 79 Batch 60/162] avg loss 0.00199607, throughput 3.96709K wps
[Epoch 79 Batch 90/162] avg loss 0.00189292, throughput 3.97068K wps
[Epoch 79 Batch 120/162] avg loss 0.00208363, throughput 3.96996K wps
[Epoch 79 Batch 150/162] avg loss 0.00196671, throughput 3.94695K wps
Begin Testing...
[Epoch 79] train avg loss 0.00198659, dev acc 0.8844, dev avg loss 0.276608, throughput 3.98112K wps
[Epoch 80 Batch 30/162] avg loss 0.00186464, throughput 4.05038K wps
[Epoch 80 Batch 60/162] avg loss 0.0019827, throughput 3.95759K wps
[Epoch 80 Batch 90/162] avg loss 0.00184194, throughput 3.94753K wps
[Epoch 80 Batch 120/162] avg loss 0.00200872, throughput 3.96861K wps
[Epoch 80 Batch 150/162] avg loss 0.00193202, throughput 3.971K wps
Begin Testing...
[Epoch 80] train avg loss 0.00191807, dev acc 0.8856, dev avg loss 0.276028, throughput 3.97754K wps
[Epoch 81 Batch 30/162] avg loss 0.00193098, throughput 4.06412K wps
[Epoch 81 Batch 60/162] avg loss 0.00196655, throughput 3.96938K wps
[Epoch 81 Batch 90/162] avg loss 0.00175825, throughput 3.9507K wps
[Epoch 81 Batch 120/162] avg loss 0.00188826, throughput 3.97491K wps
[Epoch 81 Batch 150/162] avg loss 0.0017458, throughput 3.97122K wps
Begin Testing...
[Epoch 81] train avg loss 0.00187903, dev acc 0.8867, dev avg loss 0.278, throughput 3.98347K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/162] avg loss 0.00168745, throughput 4.05044K wps
[Epoch 82 Batch 60/162] avg loss 0.0018435, throughput 3.96933K wps
[Epoch 82 Batch 90/162] avg loss 0.00188328, throughput 3.96352K wps
[Epoch 82 Batch 120/162] avg loss 0.00169494, throughput 3.96058K wps
[Epoch 82 Batch 150/162] avg loss 0.00185751, throughput 3.9621K wps
Begin Testing...
[Epoch 82] train avg loss 0.001785, dev acc 0.8878, dev avg loss 0.277434, throughput 3.97929K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/162] avg loss 0.00183383, throughput 4.07369K wps
[Epoch 83 Batch 60/162] avg loss 0.00169341, throughput 3.94845K wps
[Epoch 83 Batch 90/162] avg loss 0.0018105, throughput 3.95789K wps
[Epoch 83 Batch 120/162] avg loss 0.00165875, throughput 3.94739K wps
[Epoch 83 Batch 150/162] avg loss 0.00158974, throughput 3.95636K wps
Begin Testing...
[Epoch 83] train avg loss 0.00173517, dev acc 0.8856, dev avg loss 0.277487, throughput 3.9741K wps
[Epoch 84 Batch 30/162] avg loss 0.0017397, throughput 4.06007K wps
[Epoch 84 Batch 60/162] avg loss 0.00162001, throughput 3.95738K wps
[Epoch 84 Batch 90/162] avg loss 0.00169562, throughput 3.96503K wps
[Epoch 84 Batch 120/162] avg loss 0.00173763, throughput 3.98325K wps
[Epoch 84 Batch 150/162] avg loss 0.00156323, throughput 3.96493K wps
Begin Testing...
[Epoch 84] train avg loss 0.00167572, dev acc 0.8878, dev avg loss 0.276267, throughput 3.98299K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/162] avg loss 0.00169736, throughput 4.05553K wps
[Epoch 85 Batch 60/162] avg loss 0.00158302, throughput 3.96006K wps
[Epoch 85 Batch 90/162] avg loss 0.0015763, throughput 3.95399K wps
[Epoch 85 Batch 120/162] avg loss 0.00153849, throughput 3.96429K wps
[Epoch 85 Batch 150/162] avg loss 0.00167361, throughput 3.97588K wps
Begin Testing...
[Epoch 85] train avg loss 0.00160738, dev acc 0.8889, dev avg loss 0.282253, throughput 3.98069K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/162] avg loss 0.00185684, throughput 4.05317K wps
[Epoch 86 Batch 60/162] avg loss 0.00159762, throughput 3.96076K wps
[Epoch 86 Batch 90/162] avg loss 0.00151522, throughput 3.9729K wps
[Epoch 86 Batch 120/162] avg loss 0.00176114, throughput 3.95057K wps
[Epoch 86 Batch 150/162] avg loss 0.00159205, throughput 3.96753K wps
Begin Testing...
[Epoch 86] train avg loss 0.00166574, dev acc 0.8889, dev avg loss 0.276223, throughput 3.97918K wps
Observed Improvement.
Begin Testing...
[Epoch 87 Batch 30/162] avg loss 0.00150201, throughput 4.05136K wps
[Epoch 87 Batch 60/162] avg loss 0.00162019, throughput 3.95984K wps
[Epoch 87 Batch 90/162] avg loss 0.00151396, throughput 3.96583K wps
[Epoch 87 Batch 120/162] avg loss 0.00152053, throughput 3.97547K wps
[Epoch 87 Batch 150/162] avg loss 0.00152081, throughput 3.97133K wps
Begin Testing...
[Epoch 87] train avg loss 0.00154227, dev acc 0.8856, dev avg loss 0.277779, throughput 3.98368K wps
[Epoch 88 Batch 30/162] avg loss 0.00145485, throughput 4.04604K wps
[Epoch 88 Batch 60/162] avg loss 0.00146418, throughput 3.9493K wps
[Epoch 88 Batch 90/162] avg loss 0.00147823, throughput 3.969K wps
[Epoch 88 Batch 120/162] avg loss 0.00145457, throughput 3.96791K wps
[Epoch 88 Batch 150/162] avg loss 0.00163335, throughput 3.95109K wps
Begin Testing...
[Epoch 88] train avg loss 0.0014907, dev acc 0.8822, dev avg loss 0.278668, throughput 3.97401K wps
[Epoch 89 Batch 30/162] avg loss 0.00140739, throughput 4.05734K wps
[Epoch 89 Batch 60/162] avg loss 0.00136238, throughput 3.96353K wps
[Epoch 89 Batch 90/162] avg loss 0.0014661, throughput 3.97702K wps
[Epoch 89 Batch 120/162] avg loss 0.00140645, throughput 3.96534K wps
[Epoch 89 Batch 150/162] avg loss 0.00153167, throughput 3.96452K wps
Begin Testing...
[Epoch 89] train avg loss 0.00142395, dev acc 0.8856, dev avg loss 0.277179, throughput 3.98226K wps
[Epoch 90 Batch 30/162] avg loss 0.00147064, throughput 4.06912K wps
[Epoch 90 Batch 60/162] avg loss 0.00135798, throughput 3.95456K wps
[Epoch 90 Batch 90/162] avg loss 0.00148213, throughput 3.96146K wps
[Epoch 90 Batch 120/162] avg loss 0.00137676, throughput 3.96946K wps
[Epoch 90 Batch 150/162] avg loss 0.00133689, throughput 3.96675K wps
Begin Testing...
[Epoch 90] train avg loss 0.00139619, dev acc 0.8878, dev avg loss 0.277886, throughput 3.98214K wps
[Epoch 91 Batch 30/162] avg loss 0.00151717, throughput 4.05437K wps
[Epoch 91 Batch 60/162] avg loss 0.00125098, throughput 3.97006K wps
[Epoch 91 Batch 90/162] avg loss 0.00149165, throughput 3.9712K wps
[Epoch 91 Batch 120/162] avg loss 0.00123086, throughput 3.9686K wps
[Epoch 91 Batch 150/162] avg loss 0.00142589, throughput 3.96477K wps
Begin Testing...
[Epoch 91] train avg loss 0.0013779, dev acc 0.8889, dev avg loss 0.279811, throughput 3.98274K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/162] avg loss 0.00132738, throughput 4.05661K wps
[Epoch 92 Batch 60/162] avg loss 0.00125461, throughput 3.96563K wps
[Epoch 92 Batch 90/162] avg loss 0.00120217, throughput 3.97368K wps
[Epoch 92 Batch 120/162] avg loss 0.00127685, throughput 3.97121K wps
[Epoch 92 Batch 150/162] avg loss 0.00133389, throughput 3.97564K wps
Begin Testing...
[Epoch 92] train avg loss 0.00128543, dev acc 0.8911, dev avg loss 0.278913, throughput 3.98687K wps
Observed Improvement.
Begin Testing...
[Epoch 93 Batch 30/162] avg loss 0.00124891, throughput 4.05959K wps
[Epoch 93 Batch 60/162] avg loss 0.00131169, throughput 3.93759K wps
[Epoch 93 Batch 90/162] avg loss 0.00134415, throughput 3.97491K wps
[Epoch 93 Batch 120/162] avg loss 0.00138468, throughput 3.94399K wps
[Epoch 93 Batch 150/162] avg loss 0.00128106, throughput 3.95871K wps
Begin Testing...
[Epoch 93] train avg loss 0.00129468, dev acc 0.8933, dev avg loss 0.279746, throughput 3.97413K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/162] avg loss 0.00119929, throughput 4.05284K wps
[Epoch 94 Batch 60/162] avg loss 0.00139645, throughput 3.967K wps
[Epoch 94 Batch 90/162] avg loss 0.00121864, throughput 3.95993K wps
[Epoch 94 Batch 120/162] avg loss 0.00121974, throughput 3.97567K wps
[Epoch 94 Batch 150/162] avg loss 0.00130694, throughput 3.95719K wps
Begin Testing...
[Epoch 94] train avg loss 0.0012496, dev acc 0.8900, dev avg loss 0.280474, throughput 3.98073K wps
[Epoch 95 Batch 30/162] avg loss 0.00120923, throughput 4.05204K wps
[Epoch 95 Batch 60/162] avg loss 0.00109935, throughput 3.96705K wps
[Epoch 95 Batch 90/162] avg loss 0.00118926, throughput 3.95684K wps
[Epoch 95 Batch 120/162] avg loss 0.0012031, throughput 3.95204K wps
[Epoch 95 Batch 150/162] avg loss 0.0012445, throughput 3.96537K wps
Begin Testing...
[Epoch 95] train avg loss 0.00119136, dev acc 0.8867, dev avg loss 0.28199, throughput 3.97666K wps
[Epoch 96 Batch 30/162] avg loss 0.00121069, throughput 4.06331K wps
[Epoch 96 Batch 60/162] avg loss 0.00124861, throughput 3.9747K wps
[Epoch 96 Batch 90/162] avg loss 0.00112082, throughput 3.95364K wps
[Epoch 96 Batch 120/162] avg loss 0.00126328, throughput 3.95363K wps
[Epoch 96 Batch 150/162] avg loss 0.00114431, throughput 3.9582K wps
Begin Testing...
[Epoch 96] train avg loss 0.0012029, dev acc 0.8922, dev avg loss 0.285579, throughput 3.97875K wps
[Epoch 97 Batch 30/162] avg loss 0.00117556, throughput 4.05208K wps
[Epoch 97 Batch 60/162] avg loss 0.00121426, throughput 3.95718K wps
[Epoch 97 Batch 90/162] avg loss 0.00108049, throughput 3.95084K wps
[Epoch 97 Batch 120/162] avg loss 0.00108367, throughput 3.96432K wps
[Epoch 97 Batch 150/162] avg loss 0.0010378, throughput 3.97632K wps
Begin Testing...
[Epoch 97] train avg loss 0.00111023, dev acc 0.8900, dev avg loss 0.284127, throughput 3.97751K wps
[Epoch 98 Batch 30/162] avg loss 0.00116968, throughput 4.06243K wps
[Epoch 98 Batch 60/162] avg loss 0.00105453, throughput 3.95644K wps
[Epoch 98 Batch 90/162] avg loss 0.00115999, throughput 3.9658K wps
[Epoch 98 Batch 120/162] avg loss 0.00121315, throughput 3.96487K wps
[Epoch 98 Batch 150/162] avg loss 0.00117694, throughput 3.96639K wps
Begin Testing...
[Epoch 98] train avg loss 0.00115621, dev acc 0.8867, dev avg loss 0.283113, throughput 3.97956K wps
[Epoch 99 Batch 30/162] avg loss 0.0011751, throughput 4.056K wps
[Epoch 99 Batch 60/162] avg loss 0.00107727, throughput 3.9558K wps
[Epoch 99 Batch 90/162] avg loss 0.000962734, throughput 3.9756K wps
[Epoch 99 Batch 120/162] avg loss 0.00109363, throughput 3.96844K wps
[Epoch 99 Batch 150/162] avg loss 0.00111112, throughput 3.95515K wps
Begin Testing...
[Epoch 99] train avg loss 0.00108874, dev acc 0.8878, dev avg loss 0.284323, throughput 3.9786K wps
[Epoch 100 Batch 30/162] avg loss 0.00101711, throughput 4.07296K wps
[Epoch 100 Batch 60/162] avg loss 0.00103892, throughput 3.96653K wps
[Epoch 100 Batch 90/162] avg loss 0.00104789, throughput 3.95721K wps
[Epoch 100 Batch 120/162] avg loss 0.00104483, throughput 3.95746K wps
[Epoch 100 Batch 150/162] avg loss 0.00106114, throughput 3.95035K wps
Begin Testing...
[Epoch 100] train avg loss 0.00105031, dev acc 0.8878, dev avg loss 0.2843, throughput 3.97854K wps
[Epoch 101 Batch 30/162] avg loss 0.00100486, throughput 4.0652K wps
[Epoch 101 Batch 60/162] avg loss 0.00105288, throughput 3.97662K wps
[Epoch 101 Batch 90/162] avg loss 0.000866287, throughput 3.95263K wps
[Epoch 101 Batch 120/162] avg loss 0.00108675, throughput 3.96776K wps
[Epoch 101 Batch 150/162] avg loss 0.00112619, throughput 3.97458K wps
Begin Testing...
[Epoch 101] train avg loss 0.0010204, dev acc 0.8922, dev avg loss 0.285933, throughput 3.98473K wps
[Epoch 102 Batch 30/162] avg loss 0.00105533, throughput 4.06663K wps
[Epoch 102 Batch 60/162] avg loss 0.000979634, throughput 3.94757K wps
[Epoch 102 Batch 90/162] avg loss 0.000973675, throughput 3.95757K wps
[Epoch 102 Batch 120/162] avg loss 0.000949513, throughput 3.96234K wps
[Epoch 102 Batch 150/162] avg loss 0.00091299, throughput 3.9701K wps
Begin Testing...
[Epoch 102] train avg loss 0.000986851, dev acc 0.8889, dev avg loss 0.28531, throughput 3.98025K wps
[Epoch 103 Batch 30/162] avg loss 0.000932907, throughput 4.06186K wps
[Epoch 103 Batch 60/162] avg loss 0.00096236, throughput 3.96673K wps
[Epoch 103 Batch 90/162] avg loss 0.000988141, throughput 3.96713K wps
[Epoch 103 Batch 120/162] avg loss 0.000957335, throughput 3.98111K wps
[Epoch 103 Batch 150/162] avg loss 0.00104864, throughput 3.97315K wps
Begin Testing...
[Epoch 103] train avg loss 0.000974886, dev acc 0.8900, dev avg loss 0.286644, throughput 3.98724K wps
[Epoch 104 Batch 30/162] avg loss 0.000920027, throughput 4.07147K wps
[Epoch 104 Batch 60/162] avg loss 0.000939371, throughput 3.96934K wps
[Epoch 104 Batch 90/162] avg loss 0.000821575, throughput 3.95219K wps
[Epoch 104 Batch 120/162] avg loss 0.000862076, throughput 3.95844K wps
[Epoch 104 Batch 150/162] avg loss 0.000971694, throughput 3.96853K wps
Begin Testing...
[Epoch 104] train avg loss 0.00091193, dev acc 0.8944, dev avg loss 0.287019, throughput 3.98203K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/162] avg loss 0.000853932, throughput 4.06295K wps
[Epoch 105 Batch 60/162] avg loss 0.000939955, throughput 3.97489K wps
[Epoch 105 Batch 90/162] avg loss 0.000889016, throughput 3.96513K wps
[Epoch 105 Batch 120/162] avg loss 0.000871407, throughput 3.96463K wps
[Epoch 105 Batch 150/162] avg loss 0.000885994, throughput 3.96463K wps
Begin Testing...
[Epoch 105] train avg loss 0.000879158, dev acc 0.8889, dev avg loss 0.288667, throughput 3.98471K wps
[Epoch 106 Batch 30/162] avg loss 0.000903196, throughput 4.05964K wps
[Epoch 106 Batch 60/162] avg loss 0.000823965, throughput 3.97325K wps
[Epoch 106 Batch 90/162] avg loss 0.000898679, throughput 3.95803K wps
[Epoch 106 Batch 120/162] avg loss 0.000905379, throughput 3.96022K wps
[Epoch 106 Batch 150/162] avg loss 0.00092186, throughput 3.96128K wps
Begin Testing...
[Epoch 106] train avg loss 0.000883663, dev acc 0.8922, dev avg loss 0.290629, throughput 3.98016K wps
[Epoch 107 Batch 30/162] avg loss 0.000899421, throughput 4.06638K wps
[Epoch 107 Batch 60/162] avg loss 0.000886439, throughput 3.96155K wps
[Epoch 107 Batch 90/162] avg loss 0.000841622, throughput 3.96731K wps
[Epoch 107 Batch 120/162] avg loss 0.000814309, throughput 3.97485K wps
[Epoch 107 Batch 150/162] avg loss 0.000854642, throughput 3.9732K wps
Begin Testing...
[Epoch 107] train avg loss 0.000862293, dev acc 0.8911, dev avg loss 0.291131, throughput 3.98733K wps
[Epoch 108 Batch 30/162] avg loss 0.000774306, throughput 4.05728K wps
[Epoch 108 Batch 60/162] avg loss 0.000785533, throughput 3.95412K wps
[Epoch 108 Batch 90/162] avg loss 0.000695734, throughput 3.97537K wps
[Epoch 108 Batch 120/162] avg loss 0.00082875, throughput 3.96457K wps
[Epoch 108 Batch 150/162] avg loss 0.000854681, throughput 3.96634K wps
Begin Testing...
[Epoch 108] train avg loss 0.000786579, dev acc 0.8889, dev avg loss 0.293178, throughput 3.9821K wps
[Epoch 109 Batch 30/162] avg loss 0.000899412, throughput 4.05578K wps
[Epoch 109 Batch 60/162] avg loss 0.000824624, throughput 3.95349K wps
[Epoch 109 Batch 90/162] avg loss 0.000759936, throughput 3.98205K wps
[Epoch 109 Batch 120/162] avg loss 0.000745172, throughput 3.96801K wps
[Epoch 109 Batch 150/162] avg loss 0.000815385, throughput 3.95159K wps
Begin Testing...
[Epoch 109] train avg loss 0.000809188, dev acc 0.8900, dev avg loss 0.294541, throughput 3.98076K wps
[Epoch 110 Batch 30/162] avg loss 0.00071791, throughput 4.05175K wps
[Epoch 110 Batch 60/162] avg loss 0.000700857, throughput 3.96124K wps
[Epoch 110 Batch 90/162] avg loss 0.000816311, throughput 3.96299K wps
[Epoch 110 Batch 120/162] avg loss 0.000718337, throughput 3.97949K wps
[Epoch 110 Batch 150/162] avg loss 0.000713068, throughput 3.96693K wps
Begin Testing...
[Epoch 110] train avg loss 0.000741805, dev acc 0.8900, dev avg loss 0.295439, throughput 3.98379K wps
[Epoch 111 Batch 30/162] avg loss 0.000695274, throughput 4.06167K wps
[Epoch 111 Batch 60/162] avg loss 0.000747528, throughput 3.9735K wps
[Epoch 111 Batch 90/162] avg loss 0.000783863, throughput 3.95047K wps
[Epoch 111 Batch 120/162] avg loss 0.000694707, throughput 3.96051K wps
[Epoch 111 Batch 150/162] avg loss 0.000780933, throughput 3.95324K wps
Begin Testing...
[Epoch 111] train avg loss 0.000754531, dev acc 0.8911, dev avg loss 0.293328, throughput 3.97879K wps
[Epoch 112 Batch 30/162] avg loss 0.000761718, throughput 4.04225K wps
[Epoch 112 Batch 60/162] avg loss 0.000623368, throughput 3.96474K wps
[Epoch 112 Batch 90/162] avg loss 0.000791513, throughput 3.96911K wps
[Epoch 112 Batch 120/162] avg loss 0.000685995, throughput 3.97361K wps
[Epoch 112 Batch 150/162] avg loss 0.00071464, throughput 3.96984K wps
Begin Testing...
[Epoch 112] train avg loss 0.0007166, dev acc 0.8944, dev avg loss 0.293507, throughput 3.98299K wps
Observed Improvement.
Begin Testing...
[Epoch 113 Batch 30/162] avg loss 0.000684292, throughput 4.06943K wps
[Epoch 113 Batch 60/162] avg loss 0.000681867, throughput 3.965K wps
[Epoch 113 Batch 90/162] avg loss 0.000703026, throughput 3.97245K wps
[Epoch 113 Batch 120/162] avg loss 0.000694955, throughput 3.96853K wps
[Epoch 113 Batch 150/162] avg loss 0.000759109, throughput 3.9573K wps
Begin Testing...
[Epoch 113] train avg loss 0.000711828, dev acc 0.8933, dev avg loss 0.294252, throughput 3.98509K wps
[Epoch 114 Batch 30/162] avg loss 0.000717616, throughput 4.07576K wps
[Epoch 114 Batch 60/162] avg loss 0.000746247, throughput 3.96748K wps
[Epoch 114 Batch 90/162] avg loss 0.000708401, throughput 3.95209K wps
[Epoch 114 Batch 120/162] avg loss 0.000607646, throughput 3.97435K wps
[Epoch 114 Batch 150/162] avg loss 0.000658109, throughput 3.96666K wps
Begin Testing...
[Epoch 114] train avg loss 0.000686728, dev acc 0.8944, dev avg loss 0.294712, throughput 3.98539K wps
Observed Improvement.
Begin Testing...
[Epoch 115 Batch 30/162] avg loss 0.000649943, throughput 4.05859K wps
[Epoch 115 Batch 60/162] avg loss 0.000731261, throughput 3.97514K wps
[Epoch 115 Batch 90/162] avg loss 0.000714533, throughput 3.98K wps
[Epoch 115 Batch 120/162] avg loss 0.000731159, throughput 3.95488K wps
[Epoch 115 Batch 150/162] avg loss 0.000705424, throughput 3.97247K wps
Begin Testing...
[Epoch 115] train avg loss 0.000704269, dev acc 0.8944, dev avg loss 0.298004, throughput 3.98719K wps
Observed Improvement.
Begin Testing...
[Epoch 116 Batch 30/162] avg loss 0.000672868, throughput 4.05667K wps
[Epoch 116 Batch 60/162] avg loss 0.000694833, throughput 3.97142K wps
[Epoch 116 Batch 90/162] avg loss 0.000662029, throughput 3.96849K wps
[Epoch 116 Batch 120/162] avg loss 0.000672467, throughput 3.96532K wps
[Epoch 116 Batch 150/162] avg loss 0.000680665, throughput 3.95852K wps
Begin Testing...
[Epoch 116] train avg loss 0.000676276, dev acc 0.8933, dev avg loss 0.298261, throughput 3.98316K wps
[Epoch 117 Batch 30/162] avg loss 0.000684255, throughput 4.06182K wps
[Epoch 117 Batch 60/162] avg loss 0.000649015, throughput 3.95321K wps
[Epoch 117 Batch 90/162] avg loss 0.000604767, throughput 3.97387K wps
[Epoch 117 Batch 120/162] avg loss 0.000623633, throughput 3.94543K wps
[Epoch 117 Batch 150/162] avg loss 0.00057021, throughput 3.96337K wps
Begin Testing...
[Epoch 117] train avg loss 0.000634121, dev acc 0.8900, dev avg loss 0.299551, throughput 3.9792K wps
[Epoch 118 Batch 30/162] avg loss 0.000634701, throughput 4.0673K wps
[Epoch 118 Batch 60/162] avg loss 0.00061544, throughput 3.97677K wps
[Epoch 118 Batch 90/162] avg loss 0.00070156, throughput 3.9662K wps
[Epoch 118 Batch 120/162] avg loss 0.000651922, throughput 3.96479K wps
[Epoch 118 Batch 150/162] avg loss 0.000675033, throughput 3.95812K wps
Begin Testing...
[Epoch 118] train avg loss 0.000650746, dev acc 0.8944, dev avg loss 0.299145, throughput 3.98202K wps
Observed Improvement.
Begin Testing...
[Epoch 119 Batch 30/162] avg loss 0.00058176, throughput 4.05688K wps
[Epoch 119 Batch 60/162] avg loss 0.000651864, throughput 3.96248K wps
[Epoch 119 Batch 90/162] avg loss 0.000617659, throughput 3.96742K wps
[Epoch 119 Batch 120/162] avg loss 0.000679709, throughput 3.96951K wps
[Epoch 119 Batch 150/162] avg loss 0.000654648, throughput 3.96172K wps
Begin Testing...
[Epoch 119] train avg loss 0.00063176, dev acc 0.8922, dev avg loss 0.299605, throughput 3.98193K wps
[Epoch 120 Batch 30/162] avg loss 0.000556577, throughput 4.05542K wps
[Epoch 120 Batch 60/162] avg loss 0.000598865, throughput 3.97523K wps
[Epoch 120 Batch 90/162] avg loss 0.00061982, throughput 3.95512K wps
[Epoch 120 Batch 120/162] avg loss 0.000640394, throughput 3.96473K wps
[Epoch 120 Batch 150/162] avg loss 0.000638888, throughput 3.95779K wps
Begin Testing...
[Epoch 120] train avg loss 0.000610547, dev acc 0.8911, dev avg loss 0.300215, throughput 3.98119K wps
[Epoch 121 Batch 30/162] avg loss 0.000588921, throughput 4.06078K wps
[Epoch 121 Batch 60/162] avg loss 0.000559958, throughput 3.98268K wps
[Epoch 121 Batch 90/162] avg loss 0.000582907, throughput 3.97026K wps
[Epoch 121 Batch 120/162] avg loss 0.000639695, throughput 3.96401K wps
[Epoch 121 Batch 150/162] avg loss 0.000580245, throughput 3.96495K wps
Begin Testing...
[Epoch 121] train avg loss 0.000603651, dev acc 0.8922, dev avg loss 0.301156, throughput 3.9867K wps
[Epoch 122 Batch 30/162] avg loss 0.000570561, throughput 4.07605K wps
[Epoch 122 Batch 60/162] avg loss 0.000553821, throughput 3.96908K wps
[Epoch 122 Batch 90/162] avg loss 0.000605603, throughput 3.97012K wps
[Epoch 122 Batch 120/162] avg loss 0.000587359, throughput 3.97926K wps
[Epoch 122 Batch 150/162] avg loss 0.000530589, throughput 3.97366K wps
Begin Testing...
[Epoch 122] train avg loss 0.000570102, dev acc 0.8878, dev avg loss 0.303719, throughput 3.99176K wps
[Epoch 123 Batch 30/162] avg loss 0.000528637, throughput 4.067K wps
[Epoch 123 Batch 60/162] avg loss 0.000574327, throughput 3