Namespace(batch_size=50, data_name='MPQA', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='multichannel')
Use gpu0
maximum length (in tokens): 36
Done! Tokenizing Time=0.05s, #Sentences=10606
SentimentNet(
  (embedding): Embedding(6250 -> 300, float32)
  (embedding_extend): Embedding(6250 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/172] avg loss 0.0126655, throughput 0.314633K wps
[Epoch 0 Batch 60/172] avg loss 0.012152, throughput 2.14515K wps
[Epoch 0 Batch 90/172] avg loss 0.0124092, throughput 2.10079K wps
[Epoch 0 Batch 120/172] avg loss 0.0121792, throughput 2.14025K wps
[Epoch 0 Batch 150/172] avg loss 0.0125256, throughput 2.15032K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123922, dev acc 0.7013, dev avg loss 0.588332, throughput 0.73508K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0119811, throughput 2.05609K wps
[Epoch 1 Batch 60/172] avg loss 0.0121349, throughput 2.1472K wps
[Epoch 1 Batch 90/172] avg loss 0.0119277, throughput 2.12971K wps
[Epoch 1 Batch 120/172] avg loss 0.0116403, throughput 2.10803K wps
[Epoch 1 Batch 150/172] avg loss 0.0116108, throughput 2.10161K wps
Begin Testing...
[Epoch 1] train avg loss 0.0118752, dev acc 0.7013, dev avg loss 0.564495, throughput 2.11143K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0114707, throughput 2.17002K wps
[Epoch 2 Batch 60/172] avg loss 0.0114825, throughput 2.12375K wps
[Epoch 2 Batch 90/172] avg loss 0.0111411, throughput 2.07774K wps
[Epoch 2 Batch 120/172] avg loss 0.0111995, throughput 2.11417K wps
[Epoch 2 Batch 150/172] avg loss 0.0111085, throughput 2.13935K wps
Begin Testing...
[Epoch 2] train avg loss 0.0112713, dev acc 0.7107, dev avg loss 0.536462, throughput 2.1201K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0105832, throughput 2.18328K wps
[Epoch 3 Batch 60/172] avg loss 0.0109819, throughput 2.13132K wps
[Epoch 3 Batch 90/172] avg loss 0.0103823, throughput 2.10431K wps
[Epoch 3 Batch 120/172] avg loss 0.0102565, throughput 2.11865K wps
[Epoch 3 Batch 150/172] avg loss 0.0105755, throughput 2.13583K wps
Begin Testing...
[Epoch 3] train avg loss 0.0105123, dev acc 0.7704, dev avg loss 0.502312, throughput 2.13003K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0100097, throughput 2.17076K wps
[Epoch 4 Batch 60/172] avg loss 0.0099625, throughput 2.08296K wps
[Epoch 4 Batch 90/172] avg loss 0.00967144, throughput 2.10357K wps
[Epoch 4 Batch 120/172] avg loss 0.00944186, throughput 2.13712K wps
[Epoch 4 Batch 150/172] avg loss 0.00963987, throughput 2.09062K wps
Begin Testing...
[Epoch 4] train avg loss 0.00971531, dev acc 0.7945, dev avg loss 0.464684, throughput 2.11527K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.00897961, throughput 2.15773K wps
[Epoch 5 Batch 60/172] avg loss 0.00915384, throughput 2.09848K wps
[Epoch 5 Batch 90/172] avg loss 0.00876488, throughput 2.12296K wps
[Epoch 5 Batch 120/172] avg loss 0.00900478, throughput 2.10323K wps
[Epoch 5 Batch 150/172] avg loss 0.00876584, throughput 2.14334K wps
Begin Testing...
[Epoch 5] train avg loss 0.00894122, dev acc 0.8281, dev avg loss 0.428755, throughput 2.11942K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.00848663, throughput 2.1823K wps
[Epoch 6 Batch 60/172] avg loss 0.00822595, throughput 2.15073K wps
[Epoch 6 Batch 90/172] avg loss 0.00841743, throughput 2.13024K wps
[Epoch 6 Batch 120/172] avg loss 0.0081487, throughput 2.13071K wps
[Epoch 6 Batch 150/172] avg loss 0.00807519, throughput 2.08817K wps
Begin Testing...
[Epoch 6] train avg loss 0.00821743, dev acc 0.8417, dev avg loss 0.398036, throughput 2.13113K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00779896, throughput 2.18571K wps
[Epoch 7 Batch 60/172] avg loss 0.00788874, throughput 2.14971K wps
[Epoch 7 Batch 90/172] avg loss 0.0078715, throughput 2.13227K wps
[Epoch 7 Batch 120/172] avg loss 0.00769915, throughput 2.14289K wps
[Epoch 7 Batch 150/172] avg loss 0.00698229, throughput 2.10072K wps
Begin Testing...
[Epoch 7] train avg loss 0.00763195, dev acc 0.8470, dev avg loss 0.374271, throughput 2.13455K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00734412, throughput 2.19111K wps
[Epoch 8 Batch 60/172] avg loss 0.00714136, throughput 2.12634K wps
[Epoch 8 Batch 90/172] avg loss 0.00720233, throughput 2.14758K wps
[Epoch 8 Batch 120/172] avg loss 0.00719234, throughput 2.1224K wps
[Epoch 8 Batch 150/172] avg loss 0.00702099, throughput 2.10215K wps
Begin Testing...
[Epoch 8] train avg loss 0.00719018, dev acc 0.8564, dev avg loss 0.355893, throughput 2.13523K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00681207, throughput 2.17562K wps
[Epoch 9 Batch 60/172] avg loss 0.00677591, throughput 2.11598K wps
[Epoch 9 Batch 90/172] avg loss 0.00673441, throughput 2.10143K wps
[Epoch 9 Batch 120/172] avg loss 0.00685978, throughput 2.13478K wps
[Epoch 9 Batch 150/172] avg loss 0.00688196, throughput 2.12079K wps
Begin Testing...
[Epoch 9] train avg loss 0.00681969, dev acc 0.8700, dev avg loss 0.341472, throughput 2.1297K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00684449, throughput 2.19173K wps
[Epoch 10 Batch 60/172] avg loss 0.00651928, throughput 2.15069K wps
[Epoch 10 Batch 90/172] avg loss 0.00611258, throughput 2.1021K wps
[Epoch 10 Batch 120/172] avg loss 0.00654822, throughput 2.10843K wps
[Epoch 10 Batch 150/172] avg loss 0.00643713, throughput 2.10618K wps
Begin Testing...
[Epoch 10] train avg loss 0.00652948, dev acc 0.8742, dev avg loss 0.33094, throughput 2.13414K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00654922, throughput 2.1655K wps
[Epoch 11 Batch 60/172] avg loss 0.00608692, throughput 2.09004K wps
[Epoch 11 Batch 90/172] avg loss 0.0060823, throughput 2.10492K wps
[Epoch 11 Batch 120/172] avg loss 0.00631626, throughput 2.11453K wps
[Epoch 11 Batch 150/172] avg loss 0.00625615, throughput 2.13297K wps
Begin Testing...
[Epoch 11] train avg loss 0.00624146, dev acc 0.8763, dev avg loss 0.323076, throughput 2.11765K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00588047, throughput 2.18295K wps
[Epoch 12 Batch 60/172] avg loss 0.00602356, throughput 2.13842K wps
[Epoch 12 Batch 90/172] avg loss 0.00609657, throughput 2.12487K wps
[Epoch 12 Batch 120/172] avg loss 0.00591915, throughput 2.08203K wps
[Epoch 12 Batch 150/172] avg loss 0.00620869, throughput 2.12691K wps
Begin Testing...
[Epoch 12] train avg loss 0.00604589, dev acc 0.8721, dev avg loss 0.318833, throughput 2.12748K wps
[Epoch 13 Batch 30/172] avg loss 0.00557063, throughput 2.12984K wps
[Epoch 13 Batch 60/172] avg loss 0.00601351, throughput 2.0892K wps
[Epoch 13 Batch 90/172] avg loss 0.00589459, throughput 2.07921K wps
[Epoch 13 Batch 120/172] avg loss 0.00599623, throughput 2.09347K wps
[Epoch 13 Batch 150/172] avg loss 0.00583628, throughput 2.11052K wps
Begin Testing...
[Epoch 13] train avg loss 0.00588172, dev acc 0.8795, dev avg loss 0.311447, throughput 2.10505K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00568619, throughput 2.14781K wps
[Epoch 14 Batch 60/172] avg loss 0.00596304, throughput 2.08942K wps
[Epoch 14 Batch 90/172] avg loss 0.00580064, throughput 2.12913K wps
[Epoch 14 Batch 120/172] avg loss 0.00538032, throughput 2.11529K wps
[Epoch 14 Batch 150/172] avg loss 0.0056416, throughput 2.10573K wps
Begin Testing...
[Epoch 14] train avg loss 0.0057086, dev acc 0.8826, dev avg loss 0.307167, throughput 2.11942K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00562328, throughput 2.16922K wps
[Epoch 15 Batch 60/172] avg loss 0.00569408, throughput 2.12158K wps
[Epoch 15 Batch 90/172] avg loss 0.00542712, throughput 2.08987K wps
[Epoch 15 Batch 120/172] avg loss 0.00586187, throughput 2.1049K wps
[Epoch 15 Batch 150/172] avg loss 0.00523263, throughput 2.12038K wps
Begin Testing...
[Epoch 15] train avg loss 0.00558066, dev acc 0.8826, dev avg loss 0.302837, throughput 2.12289K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00593012, throughput 2.18096K wps
[Epoch 16 Batch 60/172] avg loss 0.00558195, throughput 2.11291K wps
[Epoch 16 Batch 90/172] avg loss 0.0052399, throughput 2.08751K wps
[Epoch 16 Batch 120/172] avg loss 0.00515386, throughput 2.12027K wps
[Epoch 16 Batch 150/172] avg loss 0.00519707, throughput 2.10589K wps
Begin Testing...
[Epoch 16] train avg loss 0.00543385, dev acc 0.8857, dev avg loss 0.300741, throughput 2.11907K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.0057114, throughput 2.16668K wps
[Epoch 17 Batch 60/172] avg loss 0.00516197, throughput 2.1122K wps
[Epoch 17 Batch 90/172] avg loss 0.00506562, throughput 2.07668K wps
[Epoch 17 Batch 120/172] avg loss 0.00532022, throughput 2.13086K wps
[Epoch 17 Batch 150/172] avg loss 0.00570673, throughput 2.11883K wps
Begin Testing...
[Epoch 17] train avg loss 0.00533908, dev acc 0.8816, dev avg loss 0.297425, throughput 2.12146K wps
[Epoch 18 Batch 30/172] avg loss 0.0054754, throughput 2.16699K wps
[Epoch 18 Batch 60/172] avg loss 0.00511056, throughput 2.12649K wps
[Epoch 18 Batch 90/172] avg loss 0.00507575, throughput 2.14129K wps
[Epoch 18 Batch 120/172] avg loss 0.0051312, throughput 2.10531K wps
[Epoch 18 Batch 150/172] avg loss 0.0050901, throughput 2.09919K wps
Begin Testing...
[Epoch 18] train avg loss 0.0052224, dev acc 0.8857, dev avg loss 0.295526, throughput 2.13003K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00511274, throughput 2.19384K wps
[Epoch 19 Batch 60/172] avg loss 0.00485296, throughput 2.11175K wps
[Epoch 19 Batch 90/172] avg loss 0.00516709, throughput 2.10842K wps
[Epoch 19 Batch 120/172] avg loss 0.00530158, throughput 2.11344K wps
[Epoch 19 Batch 150/172] avg loss 0.00513915, throughput 2.12012K wps
Begin Testing...
[Epoch 19] train avg loss 0.0051255, dev acc 0.8836, dev avg loss 0.292903, throughput 2.12956K wps
[Epoch 20 Batch 30/172] avg loss 0.00490954, throughput 2.16673K wps
[Epoch 20 Batch 60/172] avg loss 0.00501123, throughput 2.09574K wps
[Epoch 20 Batch 90/172] avg loss 0.00526264, throughput 2.12873K wps
[Epoch 20 Batch 120/172] avg loss 0.00508868, throughput 2.12491K wps
[Epoch 20 Batch 150/172] avg loss 0.00487522, throughput 2.13082K wps
Begin Testing...
[Epoch 20] train avg loss 0.00497373, dev acc 0.8836, dev avg loss 0.290658, throughput 2.13173K wps
[Epoch 21 Batch 30/172] avg loss 0.00457537, throughput 2.13608K wps
[Epoch 21 Batch 60/172] avg loss 0.00513057, throughput 2.09783K wps
[Epoch 21 Batch 90/172] avg loss 0.00515661, throughput 2.13322K wps
[Epoch 21 Batch 120/172] avg loss 0.00466637, throughput 2.08408K wps
[Epoch 21 Batch 150/172] avg loss 0.0050419, throughput 2.12604K wps
Begin Testing...
[Epoch 21] train avg loss 0.0049075, dev acc 0.8836, dev avg loss 0.288987, throughput 2.11807K wps
[Epoch 22 Batch 30/172] avg loss 0.00435953, throughput 2.19325K wps
[Epoch 22 Batch 60/172] avg loss 0.00491777, throughput 2.12908K wps
[Epoch 22 Batch 90/172] avg loss 0.00492193, throughput 2.10273K wps
[Epoch 22 Batch 120/172] avg loss 0.00496495, throughput 2.10944K wps
[Epoch 22 Batch 150/172] avg loss 0.00489234, throughput 2.08376K wps
Begin Testing...
[Epoch 22] train avg loss 0.00480015, dev acc 0.8868, dev avg loss 0.287349, throughput 2.12234K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.00454398, throughput 2.18422K wps
[Epoch 23 Batch 60/172] avg loss 0.00474352, throughput 2.10818K wps
[Epoch 23 Batch 90/172] avg loss 0.00485393, throughput 2.10363K wps
[Epoch 23 Batch 120/172] avg loss 0.00474293, throughput 2.09806K wps
[Epoch 23 Batch 150/172] avg loss 0.00442372, throughput 2.08815K wps
Begin Testing...
[Epoch 23] train avg loss 0.0046974, dev acc 0.8868, dev avg loss 0.286601, throughput 2.11289K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00446104, throughput 2.15198K wps
[Epoch 24 Batch 60/172] avg loss 0.00469163, throughput 2.11464K wps
[Epoch 24 Batch 90/172] avg loss 0.00469697, throughput 2.09488K wps
[Epoch 24 Batch 120/172] avg loss 0.00439997, throughput 2.1144K wps
[Epoch 24 Batch 150/172] avg loss 0.00464765, throughput 2.07631K wps
Begin Testing...
[Epoch 24] train avg loss 0.00459015, dev acc 0.8899, dev avg loss 0.284891, throughput 2.11133K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.00449057, throughput 2.14892K wps
[Epoch 25 Batch 60/172] avg loss 0.0045545, throughput 2.13999K wps
[Epoch 25 Batch 90/172] avg loss 0.00454761, throughput 2.11092K wps
[Epoch 25 Batch 120/172] avg loss 0.00457514, throughput 2.08303K wps
[Epoch 25 Batch 150/172] avg loss 0.00486248, throughput 2.08399K wps
Begin Testing...
[Epoch 25] train avg loss 0.00453455, dev acc 0.8931, dev avg loss 0.284216, throughput 2.11266K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/172] avg loss 0.00433417, throughput 2.19168K wps
[Epoch 26 Batch 60/172] avg loss 0.00435556, throughput 2.13171K wps
[Epoch 26 Batch 90/172] avg loss 0.00439233, throughput 2.13505K wps
[Epoch 26 Batch 120/172] avg loss 0.00441866, throughput 2.11303K wps
[Epoch 26 Batch 150/172] avg loss 0.00453538, throughput 2.10893K wps
Begin Testing...
[Epoch 26] train avg loss 0.00443324, dev acc 0.8941, dev avg loss 0.283005, throughput 2.13389K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.00430827, throughput 2.16512K wps
[Epoch 27 Batch 60/172] avg loss 0.00471227, throughput 2.12843K wps
[Epoch 27 Batch 90/172] avg loss 0.00428144, throughput 2.09366K wps
[Epoch 27 Batch 120/172] avg loss 0.00421939, throughput 2.09419K wps
[Epoch 27 Batch 150/172] avg loss 0.004513, throughput 2.08232K wps
Begin Testing...
[Epoch 27] train avg loss 0.00433096, dev acc 0.8910, dev avg loss 0.282534, throughput 2.10979K wps
[Epoch 28 Batch 30/172] avg loss 0.0044391, throughput 2.14855K wps
[Epoch 28 Batch 60/172] avg loss 0.0040799, throughput 2.08384K wps
[Epoch 28 Batch 90/172] avg loss 0.00422295, throughput 2.10393K wps
[Epoch 28 Batch 120/172] avg loss 0.0041934, throughput 2.08014K wps
[Epoch 28 Batch 150/172] avg loss 0.0041089, throughput 2.09347K wps
Begin Testing...
[Epoch 28] train avg loss 0.00426262, dev acc 0.8931, dev avg loss 0.281227, throughput 2.10122K wps
[Epoch 29 Batch 30/172] avg loss 0.00417192, throughput 2.17665K wps
[Epoch 29 Batch 60/172] avg loss 0.00384114, throughput 2.13695K wps
[Epoch 29 Batch 90/172] avg loss 0.00457117, throughput 2.13757K wps
[Epoch 29 Batch 120/172] avg loss 0.00396131, throughput 2.12847K wps
[Epoch 29 Batch 150/172] avg loss 0.00407933, throughput 2.12761K wps
Begin Testing...
[Epoch 29] train avg loss 0.00412474, dev acc 0.8973, dev avg loss 0.281366, throughput 2.14147K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/172] avg loss 0.00483663, throughput 2.18147K wps
[Epoch 30 Batch 60/172] avg loss 0.00368596, throughput 2.09995K wps
[Epoch 30 Batch 90/172] avg loss 0.0038131, throughput 2.09561K wps
[Epoch 30 Batch 120/172] avg loss 0.00403131, throughput 2.13687K wps
[Epoch 30 Batch 150/172] avg loss 0.00443157, throughput 2.13928K wps
Begin Testing...
[Epoch 30] train avg loss 0.00410793, dev acc 0.8941, dev avg loss 0.279714, throughput 2.12863K wps
[Epoch 31 Batch 30/172] avg loss 0.00415476, throughput 2.15087K wps
[Epoch 31 Batch 60/172] avg loss 0.00363188, throughput 2.13846K wps
[Epoch 31 Batch 90/172] avg loss 0.00410447, throughput 2.13481K wps
[Epoch 31 Batch 120/172] avg loss 0.00363519, throughput 2.1413K wps
[Epoch 31 Batch 150/172] avg loss 0.00394327, throughput 2.09462K wps
Begin Testing...
[Epoch 31] train avg loss 0.00397553, dev acc 0.8931, dev avg loss 0.279499, throughput 2.12643K wps
[Epoch 32 Batch 30/172] avg loss 0.00355086, throughput 2.15277K wps
[Epoch 32 Batch 60/172] avg loss 0.00398542, throughput 2.13192K wps
[Epoch 32 Batch 90/172] avg loss 0.00422257, throughput 2.13315K wps
[Epoch 32 Batch 120/172] avg loss 0.00404714, throughput 2.13812K wps
[Epoch 32 Batch 150/172] avg loss 0.00403189, throughput 2.14446K wps
Begin Testing...
[Epoch 32] train avg loss 0.0039119, dev acc 0.8952, dev avg loss 0.280908, throughput 2.13902K wps
[Epoch 33 Batch 30/172] avg loss 0.00366026, throughput 2.13736K wps
[Epoch 33 Batch 60/172] avg loss 0.00357518, throughput 2.11274K wps
[Epoch 33 Batch 90/172] avg loss 0.0038556, throughput 2.11936K wps
[Epoch 33 Batch 120/172] avg loss 0.00360337, throughput 2.13554K wps
[Epoch 33 Batch 150/172] avg loss 0.00433087, throughput 2.14261K wps
Begin Testing...
[Epoch 33] train avg loss 0.00384143, dev acc 0.8920, dev avg loss 0.279302, throughput 2.13079K wps
[Epoch 34 Batch 30/172] avg loss 0.00375364, throughput 2.16814K wps
[Epoch 34 Batch 60/172] avg loss 0.00400194, throughput 2.13154K wps
[Epoch 34 Batch 90/172] avg loss 0.00378662, throughput 2.13112K wps
[Epoch 34 Batch 120/172] avg loss 0.00362648, throughput 2.0875K wps
[Epoch 34 Batch 150/172] avg loss 0.0036973, throughput 2.08458K wps
Begin Testing...
[Epoch 34] train avg loss 0.00373857, dev acc 0.8962, dev avg loss 0.281009, throughput 2.12299K wps
[Epoch 35 Batch 30/172] avg loss 0.00360403, throughput 2.12521K wps
[Epoch 35 Batch 60/172] avg loss 0.00342386, throughput 2.06869K wps
[Epoch 35 Batch 90/172] avg loss 0.00380601, throughput 2.07852K wps
[Epoch 35 Batch 120/172] avg loss 0.00407511, throughput 2.12698K wps
[Epoch 35 Batch 150/172] avg loss 0.00348106, throughput 2.10205K wps
Begin Testing...
[Epoch 35] train avg loss 0.00370476, dev acc 0.8983, dev avg loss 0.278515, throughput 2.09486K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/172] avg loss 0.00340767, throughput 2.16008K wps
[Epoch 36 Batch 60/172] avg loss 0.00343906, throughput 2.09508K wps
[Epoch 36 Batch 90/172] avg loss 0.00347856, throughput 2.07125K wps
[Epoch 36 Batch 120/172] avg loss 0.00377428, throughput 2.07123K wps
[Epoch 36 Batch 150/172] avg loss 0.00365946, throughput 2.13585K wps
Begin Testing...
[Epoch 36] train avg loss 0.00360113, dev acc 0.8941, dev avg loss 0.278568, throughput 2.10461K wps
[Epoch 37 Batch 30/172] avg loss 0.00357502, throughput 2.15765K wps
[Epoch 37 Batch 60/172] avg loss 0.00341553, throughput 2.08499K wps
[Epoch 37 Batch 90/172] avg loss 0.00334801, throughput 2.08607K wps
[Epoch 37 Batch 120/172] avg loss 0.00355727, throughput 2.13589K wps
[Epoch 37 Batch 150/172] avg loss 0.0035326, throughput 2.12006K wps
Begin Testing...
[Epoch 37] train avg loss 0.00352384, dev acc 0.8973, dev avg loss 0.279238, throughput 2.11719K wps
[Epoch 38 Batch 30/172] avg loss 0.00366782, throughput 2.12256K wps
[Epoch 38 Batch 60/172] avg loss 0.00340707, throughput 2.10619K wps
[Epoch 38 Batch 90/172] avg loss 0.00349595, throughput 2.10199K wps
[Epoch 38 Batch 120/172] avg loss 0.00324987, throughput 2.07971K wps
[Epoch 38 Batch 150/172] avg loss 0.0036918, throughput 2.09633K wps
Begin Testing...
[Epoch 38] train avg loss 0.00348461, dev acc 0.8962, dev avg loss 0.279281, throughput 2.10095K wps
[Epoch 39 Batch 30/172] avg loss 0.00342332, throughput 2.11571K wps
[Epoch 39 Batch 60/172] avg loss 0.00327843, throughput 2.10178K wps
[Epoch 39 Batch 90/172] avg loss 0.00340686, throughput 2.11297K wps
[Epoch 39 Batch 120/172] avg loss 0.00347173, throughput 2.09169K wps
[Epoch 39 Batch 150/172] avg loss 0.00341565, throughput 2.09982K wps
Begin Testing...
[Epoch 39] train avg loss 0.00338878, dev acc 0.8952, dev avg loss 0.281169, throughput 2.10084K wps
[Epoch 40 Batch 30/172] avg loss 0.00367124, throughput 2.12605K wps
[Epoch 40 Batch 60/172] avg loss 0.00297049, throughput 2.12978K wps
[Epoch 40 Batch 90/172] avg loss 0.00337936, throughput 2.13383K wps
[Epoch 40 Batch 120/172] avg loss 0.00340157, throughput 2.08021K wps
[Epoch 40 Batch 150/172] avg loss 0.00308343, throughput 2.09638K wps
Begin Testing...
[Epoch 40] train avg loss 0.0033087, dev acc 0.8952, dev avg loss 0.280818, throughput 2.11502K wps
[Epoch 41 Batch 30/172] avg loss 0.00351824, throughput 2.13766K wps
[Epoch 41 Batch 60/172] avg loss 0.00299317, throughput 2.08291K wps
[Epoch 41 Batch 90/172] avg loss 0.00326196, throughput 2.08715K wps
[Epoch 41 Batch 120/172] avg loss 0.00317012, throughput 2.11844K wps
[Epoch 41 Batch 150/172] avg loss 0.00382234, throughput 2.09799K wps
Begin Testing...
[Epoch 41] train avg loss 0.00328531, dev acc 0.8941, dev avg loss 0.280908, throughput 2.10775K wps
[Epoch 42 Batch 30/172] avg loss 0.00261215, throughput 2.12717K wps
[Epoch 42 Batch 60/172] avg loss 0.00340674, throughput 2.11659K wps
[Epoch 42 Batch 90/172] avg loss 0.00346324, throughput 2.0779K wps
[Epoch 42 Batch 120/172] avg loss 0.00330889, throughput 2.09164K wps
[Epoch 42 Batch 150/172] avg loss 0.00323102, throughput 2.07561K wps
Begin Testing...
[Epoch 42] train avg loss 0.00324223, dev acc 0.8941, dev avg loss 0.281751, throughput 2.09525K wps
[Epoch 43 Batch 30/172] avg loss 0.00293275, throughput 2.15586K wps
[Epoch 43 Batch 60/172] avg loss 0.0029954, throughput 2.09321K wps
[Epoch 43 Batch 90/172] avg loss 0.00312966, throughput 2.12549K wps
[Epoch 43 Batch 120/172] avg loss 0.00289781, throughput 2.12964K wps
[Epoch 43 Batch 150/172] avg loss 0.00318207, throughput 2.11444K wps
Begin Testing...
[Epoch 43] train avg loss 0.00308361, dev acc 0.8962, dev avg loss 0.282952, throughput 2.12067K wps
[Epoch 44 Batch 30/172] avg loss 0.00305855, throughput 2.13882K wps
[Epoch 44 Batch 60/172] avg loss 0.00314027, throughput 2.10953K wps
[Epoch 44 Batch 90/172] avg loss 0.00308429, throughput 2.13195K wps
[Epoch 44 Batch 120/172] avg loss 0.00304618, throughput 2.10528K wps
[Epoch 44 Batch 150/172] avg loss 0.00294822, throughput 2.12408K wps
Begin Testing...
[Epoch 44] train avg loss 0.00305539, dev acc 0.8983, dev avg loss 0.28278, throughput 2.1176K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00299302, throughput 2.13691K wps
[Epoch 45 Batch 60/172] avg loss 0.00243811, throughput 2.07247K wps
[Epoch 45 Batch 90/172] avg loss 0.00334633, throughput 2.07418K wps
[Epoch 45 Batch 120/172] avg loss 0.00300421, throughput 2.065K wps
[Epoch 45 Batch 150/172] avg loss 0.00289225, throughput 2.08982K wps
Begin Testing...
[Epoch 45] train avg loss 0.00298587, dev acc 0.8962, dev avg loss 0.286444, throughput 2.09382K wps
[Epoch 46 Batch 30/172] avg loss 0.00283645, throughput 2.1591K wps
[Epoch 46 Batch 60/172] avg loss 0.00291182, throughput 2.08512K wps
[Epoch 46 Batch 90/172] avg loss 0.00296331, throughput 2.07591K wps
[Epoch 46 Batch 120/172] avg loss 0.00302862, throughput 2.08406K wps
[Epoch 46 Batch 150/172] avg loss 0.00292487, throughput 2.0722K wps
Begin Testing...
[Epoch 46] train avg loss 0.00289775, dev acc 0.8962, dev avg loss 0.285728, throughput 2.09342K wps
[Epoch 47 Batch 30/172] avg loss 0.00260301, throughput 2.13383K wps
[Epoch 47 Batch 60/172] avg loss 0.00298212, throughput 2.10568K wps
[Epoch 47 Batch 90/172] avg loss 0.00274576, throughput 2.14195K wps
[Epoch 47 Batch 120/172] avg loss 0.00300473, throughput 2.13679K wps
[Epoch 47 Batch 150/172] avg loss 0.00289813, throughput 2.13904K wps
Begin Testing...
[Epoch 47] train avg loss 0.00284046, dev acc 0.8962, dev avg loss 0.286181, throughput 2.12599K wps
[Epoch 48 Batch 30/172] avg loss 0.00282465, throughput 2.18996K wps
[Epoch 48 Batch 60/172] avg loss 0.00258821, throughput 2.10182K wps
[Epoch 48 Batch 90/172] avg loss 0.00306577, throughput 2.1017K wps
[Epoch 48 Batch 120/172] avg loss 0.00275873, throughput 2.12594K wps
[Epoch 48 Batch 150/172] avg loss 0.00286895, throughput 2.12831K wps
Begin Testing...
[Epoch 48] train avg loss 0.00277332, dev acc 0.8973, dev avg loss 0.287222, throughput 2.12761K wps
[Epoch 49 Batch 30/172] avg loss 0.00277457, throughput 2.15906K wps
[Epoch 49 Batch 60/172] avg loss 0.002862, throughput 2.14162K wps
[Epoch 49 Batch 90/172] avg loss 0.0026098, throughput 2.14079K wps
[Epoch 49 Batch 120/172] avg loss 0.00256156, throughput 2.11123K wps
[Epoch 49 Batch 150/172] avg loss 0.00284327, throughput 2.10192K wps
Begin Testing...
[Epoch 49] train avg loss 0.00274052, dev acc 0.8983, dev avg loss 0.28803, throughput 2.12501K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/172] avg loss 0.00264304, throughput 2.18456K wps
[Epoch 50 Batch 60/172] avg loss 0.00256961, throughput 2.09117K wps
[Epoch 50 Batch 90/172] avg loss 0.00281519, throughput 2.08187K wps
[Epoch 50 Batch 120/172] avg loss 0.00283223, throughput 2.08552K wps
[Epoch 50 Batch 150/172] avg loss 0.00261161, throughput 2.11677K wps
Begin Testing...
[Epoch 50] train avg loss 0.00269565, dev acc 0.8920, dev avg loss 0.288674, throughput 2.11485K wps
[Epoch 51 Batch 30/172] avg loss 0.00248415, throughput 2.16872K wps
[Epoch 51 Batch 60/172] avg loss 0.00256449, throughput 2.08976K wps
[Epoch 51 Batch 90/172] avg loss 0.0027636, throughput 2.10955K wps
[Epoch 51 Batch 120/172] avg loss 0.0024742, throughput 2.08468K wps
[Epoch 51 Batch 150/172] avg loss 0.00275473, throughput 2.10123K wps
Begin Testing...
[Epoch 51] train avg loss 0.00267945, dev acc 0.8962, dev avg loss 0.289219, throughput 2.11264K wps
[Epoch 52 Batch 30/172] avg loss 0.00234614, throughput 2.16082K wps
[Epoch 52 Batch 60/172] avg loss 0.00259792, throughput 2.08639K wps
[Epoch 52 Batch 90/172] avg loss 0.00267784, throughput 2.07925K wps
[Epoch 52 Batch 120/172] avg loss 0.0025534, throughput 2.11904K wps
[Epoch 52 Batch 150/172] avg loss 0.00287836, throughput 2.1143K wps
Begin Testing...
[Epoch 52] train avg loss 0.00259329, dev acc 0.8983, dev avg loss 0.290602, throughput 2.10654K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/172] avg loss 0.00246675, throughput 2.16498K wps
[Epoch 53 Batch 60/172] avg loss 0.00269203, throughput 2.11634K wps
[Epoch 53 Batch 90/172] avg loss 0.00225953, throughput 2.11728K wps
[Epoch 53 Batch 120/172] avg loss 0.00246183, throughput 2.12206K wps
[Epoch 53 Batch 150/172] avg loss 0.00260828, throughput 2.11014K wps
Begin Testing...
[Epoch 53] train avg loss 0.00249512, dev acc 0.8952, dev avg loss 0.294168, throughput 2.12832K wps
[Epoch 54 Batch 30/172] avg loss 0.00276375, throughput 2.15708K wps
[Epoch 54 Batch 60/172] avg loss 0.00249727, throughput 2.13168K wps
[Epoch 54 Batch 90/172] avg loss 0.00265008, throughput 2.12211K wps
[Epoch 54 Batch 120/172] avg loss 0.00235392, throughput 2.07685K wps
[Epoch 54 Batch 150/172] avg loss 0.00246748, throughput 2.10617K wps
Begin Testing...
[Epoch 54] train avg loss 0.00251423, dev acc 0.8952, dev avg loss 0.292883, throughput 2.11555K wps
[Epoch 55 Batch 30/172] avg loss 0.00189529, throughput 2.13031K wps
[Epoch 55 Batch 60/172] avg loss 0.00264074, throughput 2.09526K wps
[Epoch 55 Batch 90/172] avg loss 0.00251646, throughput 2.11868K wps
[Epoch 55 Batch 120/172] avg loss 0.00242882, throughput 2.12373K wps
[Epoch 55 Batch 150/172] avg loss 0.00237905, throughput 2.12004K wps
Begin Testing...
[Epoch 55] train avg loss 0.00241203, dev acc 0.8931, dev avg loss 0.297003, throughput 2.1155K wps
[Epoch 56 Batch 30/172] avg loss 0.00253112, throughput 2.11403K wps
[Epoch 56 Batch 60/172] avg loss 0.00246425, throughput 2.08059K wps
[Epoch 56 Batch 90/172] avg loss 0.00219654, throughput 2.08316K wps
[Epoch 56 Batch 120/172] avg loss 0.0025728, throughput 2.10951K wps
[Epoch 56 Batch 150/172] avg loss 0.00228473, throughput 2.13982K wps
Begin Testing...
[Epoch 56] train avg loss 0.00238562, dev acc 0.8973, dev avg loss 0.297291, throughput 2.10596K wps
[Epoch 57 Batch 30/172] avg loss 0.00232784, throughput 2.15042K wps
[Epoch 57 Batch 60/172] avg loss 0.00236124, throughput 2.1205K wps
[Epoch 57 Batch 90/172] avg loss 0.00232184, throughput 2.08529K wps
[Epoch 57 Batch 120/172] avg loss 0.00215046, throughput 2.12441K wps
[Epoch 57 Batch 150/172] avg loss 0.00218838, throughput 2.10848K wps
Begin Testing...
[Epoch 57] train avg loss 0.00229829, dev acc 0.8952, dev avg loss 0.298389, throughput 2.11776K wps
[Epoch 58 Batch 30/172] avg loss 0.00221191, throughput 2.11082K wps
[Epoch 58 Batch 60/172] avg loss 0.00231465, throughput 2.06502K wps
[Epoch 58 Batch 90/172] avg loss 0.00215476, throughput 2.07824K wps
[Epoch 58 Batch 120/172] avg loss 0.00219845, throughput 2.07971K wps
[Epoch 58 Batch 150/172] avg loss 0.00226083, throughput 2.08325K wps
Begin Testing...
[Epoch 58] train avg loss 0.00226908, dev acc 0.8962, dev avg loss 0.300918, throughput 2.08662K wps
[Epoch 59 Batch 30/172] avg loss 0.0022603, throughput 2.17887K wps
[Epoch 59 Batch 60/172] avg loss 0.00203183, throughput 2.13317K wps
[Epoch 59 Batch 90/172] avg loss 0.00235378, throughput 2.13315K wps
[Epoch 59 Batch 120/172] avg loss 0.00223137, throughput 2.13022K wps
[Epoch 59 Batch 150/172] avg loss 0.00241207, throughput 2.10588K wps
Begin Testing...
[Epoch 59] train avg loss 0.00225671, dev acc 0.8941, dev avg loss 0.301418, throughput 2.12908K wps
[Epoch 60 Batch 30/172] avg loss 0.00225689, throughput 2.12278K wps
[Epoch 60 Batch 60/172] avg loss 0.00215623, throughput 2.11477K wps
[Epoch 60 Batch 90/172] avg loss 0.00211417, throughput 2.10631K wps
[Epoch 60 Batch 120/172] avg loss 0.00205415, throughput 2.07599K wps
[Epoch 60 Batch 150/172] avg loss 0.00239126, throughput 2.10234K wps
Begin Testing...
[Epoch 60] train avg loss 0.00217685, dev acc 0.8952, dev avg loss 0.305282, throughput 2.10431K wps
[Epoch 61 Batch 30/172] avg loss 0.00226505, throughput 2.11836K wps
[Epoch 61 Batch 60/172] avg loss 0.0018974, throughput 2.09344K wps
[Epoch 61 Batch 90/172] avg loss 0.00218467, throughput 2.14094K wps
[Epoch 61 Batch 120/172] avg loss 0.00222768, throughput 2.12127K wps
[Epoch 61 Batch 150/172] avg loss 0.00238446, throughput 2.13614K wps
Begin Testing...
[Epoch 61] train avg loss 0.0021671, dev acc 0.8952, dev avg loss 0.309853, throughput 2.12318K wps
[Epoch 62 Batch 30/172] avg loss 0.00220022, throughput 2.13473K wps
[Epoch 62 Batch 60/172] avg loss 0.00198386, throughput 2.08065K wps
[Epoch 62 Batch 90/172] avg loss 0.0024527, throughput 2.10007K wps
[Epoch 62 Batch 120/172] avg loss 0.00205185, throughput 2.11303K wps
[Epoch 62 Batch 150/172] avg loss 0.00218915, throughput 2.0746K wps
Begin Testing...
[Epoch 62] train avg loss 0.00215986, dev acc 0.8931, dev avg loss 0.306647, throughput 2.09792K wps
[Epoch 63 Batch 30/172] avg loss 0.00204855, throughput 2.12109K wps
[Epoch 63 Batch 60/172] avg loss 0.00238076, throughput 2.07746K wps
[Epoch 63 Batch 90/172] avg loss 0.0017848, throughput 2.10257K wps
[Epoch 63 Batch 120/172] avg loss 0.00228392, throughput 2.12094K wps
[Epoch 63 Batch 150/172] avg loss 0.00214334, throughput 2.08673K wps
Begin Testing...
[Epoch 63] train avg loss 0.00209927, dev acc 0.8920, dev avg loss 0.308408, throughput 2.10447K wps
[Epoch 64 Batch 30/172] avg loss 0.0020923, throughput 2.12858K wps
[Epoch 64 Batch 60/172] avg loss 0.00204741, throughput 2.13243K wps
[Epoch 64 Batch 90/172] avg loss 0.00206204, throughput 2.14281K wps
[Epoch 64 Batch 120/172] avg loss 0.00186517, throughput 2.11768K wps
[Epoch 64 Batch 150/172] avg loss 0.00218209, throughput 2.09362K wps
Begin Testing...
[Epoch 64] train avg loss 0.00201784, dev acc 0.8931, dev avg loss 0.309405, throughput 2.12109K wps
[Epoch 65 Batch 30/172] avg loss 0.00172004, throughput 2.16826K wps
[Epoch 65 Batch 60/172] avg loss 0.00230178, throughput 2.12978K wps
[Epoch 65 Batch 90/172] avg loss 0.00217755, throughput 2.13206K wps
[Epoch 65 Batch 120/172] avg loss 0.00193044, throughput 2.09977K wps
[Epoch 65 Batch 150/172] avg loss 0.00200404, throughput 2.11886K wps
Begin Testing...
[Epoch 65] train avg loss 0.00203588, dev acc 0.8899, dev avg loss 0.312577, throughput 2.12664K wps
[Epoch 66 Batch 30/172] avg loss 0.00197513, throughput 2.16707K wps
[Epoch 66 Batch 60/172] avg loss 0.00190518, throughput 2.11374K wps
[Epoch 66 Batch 90/172] avg loss 0.00184592, throughput 2.13244K wps
[Epoch 66 Batch 120/172] avg loss 0.00232614, throughput 2.12679K wps
[Epoch 66 Batch 150/172] avg loss 0.00189793, throughput 2.11628K wps
Begin Testing...
[Epoch 66] train avg loss 0.0020121, dev acc 0.8910, dev avg loss 0.313974, throughput 2.1318K wps
[Epoch 67 Batch 30/172] avg loss 0.00170402, throughput 2.12693K wps
[Epoch 67 Batch 60/172] avg loss 0.00201406, throughput 2.08183K wps
[Epoch 67 Batch 90/172] avg loss 0.0019381, throughput 2.07978K wps
[Epoch 67 Batch 120/172] avg loss 0.00218114, throughput 2.11286K wps
[Epoch 67 Batch 150/172] avg loss 0.0018202, throughput 2.11227K wps
Begin Testing...
[Epoch 67] train avg loss 0.00193167, dev acc 0.8878, dev avg loss 0.312982, throughput 2.10678K wps
[Epoch 68 Batch 30/172] avg loss 0.00187671, throughput 2.1476K wps
[Epoch 68 Batch 60/172] avg loss 0.00170848, throughput 2.08752K wps
[Epoch 68 Batch 90/172] avg loss 0.00195582, throughput 2.06502K wps
[Epoch 68 Batch 120/172] avg loss 0.00204559, throughput 2.07469K wps
[Epoch 68 Batch 150/172] avg loss 0.00183821, throughput 2.0551K wps
Begin Testing...
[Epoch 68] train avg loss 0.00187523, dev acc 0.8868, dev avg loss 0.317394, throughput 2.08824K wps
[Epoch 69 Batch 30/172] avg loss 0.00181876, throughput 2.15096K wps
[Epoch 69 Batch 60/172] avg loss 0.00204055, throughput 2.10984K wps
[Epoch 69 Batch 90/172] avg loss 0.00188383, throughput 2.09082K wps
[Epoch 69 Batch 120/172] avg loss 0.00165548, throughput 2.11965K wps
[Epoch 69 Batch 150/172] avg loss 0.00181547, throughput 2.09464K wps
Begin Testing...
[Epoch 69] train avg loss 0.00184855, dev acc 0.8889, dev avg loss 0.321765, throughput 2.10854K wps
[Epoch 70 Batch 30/172] avg loss 0.00188031, throughput 2.12619K wps
[Epoch 70 Batch 60/172] avg loss 0.00188724, throughput 2.08917K wps
[Epoch 70 Batch 90/172] avg loss 0.00178648, throughput 2.08195K wps
[Epoch 70 Batch 120/172] avg loss 0.00181529, throughput 2.07753K wps
[Epoch 70 Batch 150/172] avg loss 0.00186309, throughput 2.08422K wps
Begin Testing...
[Epoch 70] train avg loss 0.00185967, dev acc 0.8889, dev avg loss 0.321542, throughput 2.09729K wps
[Epoch 71 Batch 30/172] avg loss 0.00169088, throughput 2.15849K wps
[Epoch 71 Batch 60/172] avg loss 0.00162652, throughput 2.07591K wps
[Epoch 71 Batch 90/172] avg loss 0.0017199, throughput 2.08599K wps
[Epoch 71 Batch 120/172] avg loss 0.00201683, throughput 2.07396K wps
[Epoch 71 Batch 150/172] avg loss 0.00184906, throughput 2.07136K wps
Begin Testing...
[Epoch 71] train avg loss 0.001847, dev acc 0.8868, dev avg loss 0.320872, throughput 2.09379K wps
[Epoch 72 Batch 30/172] avg loss 0.00168021, throughput 2.10912K wps
[Epoch 72 Batch 60/172] avg loss 0.0015995, throughput 2.07388K wps
[Epoch 72 Batch 90/172] avg loss 0.00192745, throughput 2.05765K wps
[Epoch 72 Batch 120/172] avg loss 0.00185728, throughput 2.06828K wps
[Epoch 72 Batch 150/172] avg loss 0.00185503, throughput 2.08197K wps
Begin Testing...
[Epoch 72] train avg loss 0.00175995, dev acc 0.8868, dev avg loss 0.325166, throughput 2.0786K wps
[Epoch 73 Batch 30/172] avg loss 0.00159045, throughput 2.13874K wps
[Epoch 73 Batch 60/172] avg loss 0.00182015, throughput 2.07478K wps
[Epoch 73 Batch 90/172] avg loss 0.00164198, throughput 2.11249K wps
[Epoch 73 Batch 120/172] avg loss 0.00178141, throughput 2.08306K wps
[Epoch 73 Batch 150/172] avg loss 0.00185606, throughput 2.13761K wps
Begin Testing...
[Epoch 73] train avg loss 0.0017711, dev acc 0.8878, dev avg loss 0.324159, throughput 2.11242K wps
[Epoch 74 Batch 30/172] avg loss 0.00157464, throughput 2.12757K wps
[Epoch 74 Batch 60/172] avg loss 0.0016901, throughput 2.07731K wps
[Epoch 74 Batch 90/172] avg loss 0.0016458, throughput 2.08851K wps
[Epoch 74 Batch 120/172] avg loss 0.00169373, throughput 2.11405K wps
[Epoch 74 Batch 150/172] avg loss 0.00168742, throughput 2.07304K wps
Begin Testing...
[Epoch 74] train avg loss 0.00167213, dev acc 0.8889, dev avg loss 0.334488, throughput 2.10029K wps
[Epoch 75 Batch 30/172] avg loss 0.00159106, throughput 2.17988K wps
[Epoch 75 Batch 60/172] avg loss 0.00167324, throughput 2.13111K wps
[Epoch 75 Batch 90/172] avg loss 0.00179118, throughput 2.07575K wps
[Epoch 75 Batch 120/172] avg loss 0.00169765, throughput 2.07417K wps
[Epoch 75 Batch 150/172] avg loss 0.0015412, throughput 2.0916K wps
Begin Testing...
[Epoch 75] train avg loss 0.00169648, dev acc 0.8878, dev avg loss 0.331386, throughput 2.11376K wps
[Epoch 76 Batch 30/172] avg loss 0.00146972, throughput 2.18917K wps
[Epoch 76 Batch 60/172] avg loss 0.00155826, throughput 2.09747K wps
[Epoch 76 Batch 90/172] avg loss 0.00204919, throughput 2.11889K wps
[Epoch 76 Batch 120/172] avg loss 0.00166459, throughput 2.1261K wps
[Epoch 76 Batch 150/172] avg loss 0.00175363, throughput 2.12814K wps
Begin Testing...
[Epoch 76] train avg loss 0.00168514, dev acc 0.8868, dev avg loss 0.333138, throughput 2.12972K wps
[Epoch 77 Batch 30/172] avg loss 0.0015806, throughput 2.11974K wps
[Epoch 77 Batch 60/172] avg loss 0.001756, throughput 2.08254K wps
[Epoch 77 Batch 90/172] avg loss 0.00165291, throughput 2.08117K wps
[Epoch 77 Batch 120/172] avg loss 0.00139771, throughput 2.07968K wps
[Epoch 77 Batch 150/172] avg loss 0.00184383, throughput 2.10687K wps
Begin Testing...
[Epoch 77] train avg loss 0.00166323, dev acc 0.8836, dev avg loss 0.33173, throughput 2.09516K wps
[Epoch 78 Batch 30/172] avg loss 0.00146894, throughput 2.15133K wps
[Epoch 78 Batch 60/172] avg loss 0.0019385, throughput 2.12306K wps
[Epoch 78 Batch 90/172] avg loss 0.0013466, throughput 2.07295K wps
[Epoch 78 Batch 120/172] avg loss 0.00176998, throughput 2.07972K wps
[Epoch 78 Batch 150/172] avg loss 0.00184779, throughput 2.09516K wps
Begin Testing...
[Epoch 78] train avg loss 0.00165818, dev acc 0.8868, dev avg loss 0.341194, throughput 2.10674K wps
[Epoch 79 Batch 30/172] avg loss 0.00159884, throughput 2.13908K wps
[Epoch 79 Batch 60/172] avg loss 0.00161603, throughput 2.0684K wps
[Epoch 79 Batch 90/172] avg loss 0.00148378, throughput 2.11868K wps
[Epoch 79 Batch 120/172] avg loss 0.00164849, throughput 2.11989K wps
[Epoch 79 Batch 150/172] avg loss 0.00161798, throughput 2.12866K wps
Begin Testing...
[Epoch 79] train avg loss 0.0016108, dev acc 0.8847, dev avg loss 0.336242, throughput 2.11411K wps
[Epoch 80 Batch 30/172] avg loss 0.00143635, throughput 2.14102K wps
[Epoch 80 Batch 60/172] avg loss 0.00169595, throughput 2.13348K wps
[Epoch 80 Batch 90/172] avg loss 0.00152683, throughput 2.13017K wps
[Epoch 80 Batch 120/172] avg loss 0.00164757, throughput 2.13481K wps
[Epoch 80 Batch 150/172] avg loss 0.00149859, throughput 2.13552K wps
Begin Testing...
[Epoch 80] train avg loss 0.00155801, dev acc 0.8805, dev avg loss 0.335287, throughput 2.13446K wps
[Epoch 81 Batch 30/172] avg loss 0.00136639, throughput 2.17387K wps
[Epoch 81 Batch 60/172] avg loss 0.00185327, throughput 2.11155K wps
[Epoch 81 Batch 90/172] avg loss 0.0014491, throughput 2.12356K wps
[Epoch 81 Batch 120/172] avg loss 0.0015729, throughput 2.12351K wps
[Epoch 81 Batch 150/172] avg loss 0.00177191, throughput 2.13803K wps
Begin Testing...
[Epoch 81] train avg loss 0.00159322, dev acc 0.8836, dev avg loss 0.341354, throughput 2.13344K wps
[Epoch 82 Batch 30/172] avg loss 0.00171811, throughput 2.15702K wps
[Epoch 82 Batch 60/172] avg loss 0.00155645, throughput 2.1056K wps
[Epoch 82 Batch 90/172] avg loss 0.00141172, throughput 2.07055K wps
[Epoch 82 Batch 120/172] avg loss 0.0014255, throughput 2.08198K wps
[Epoch 82 Batch 150/172] avg loss 0.00161993, throughput 2.12161K wps
Begin Testing...
[Epoch 82] train avg loss 0.00152267, dev acc 0.8805, dev avg loss 0.341379, throughput 2.10839K wps
[Epoch 83 Batch 30/172] avg loss 0.00154968, throughput 2.1313K wps
[Epoch 83 Batch 60/172] avg loss 0.00156591, throughput 2.10523K wps
[Epoch 83 Batch 90/172] avg loss 0.00133625, throughput 2.13649K wps
[Epoch 83 Batch 120/172] avg loss 0.00157735, throughput 2.10874K wps
[Epoch 83 Batch 150/172] avg loss 0.001675, throughput 2.12112K wps
Begin Testing...
[Epoch 83] train avg loss 0.00151839, dev acc 0.8826, dev avg loss 0.34345, throughput 2.11707K wps
[Epoch 84 Batch 30/172] avg loss 0.00168074, throughput 2.14008K wps
[Epoch 84 Batch 60/172] avg loss 0.0013355, throughput 2.09864K wps
[Epoch 84 Batch 90/172] avg loss 0.00152017, throughput 2.08833K wps
[Epoch 84 Batch 120/172] avg loss 0.00133932, throughput 2.10563K wps
[Epoch 84 Batch 150/172] avg loss 0.0015236, throughput 2.10955K wps
Begin Testing...
[Epoch 84] train avg loss 0.00149571, dev acc 0.8836, dev avg loss 0.347979, throughput 2.10992K wps
[Epoch 85 Batch 30/172] avg loss 0.00144911, throughput 2.15566K wps
[Epoch 85 Batch 60/172] avg loss 0.00131902, throughput 2.07674K wps
[Epoch 85 Batch 90/172] avg loss 0.00145889, throughput 2.11591K wps
[Epoch 85 Batch 120/172] avg loss 0.00180869, throughput 2.11568K wps
[Epoch 85 Batch 150/172] avg loss 0.00145545, throughput 2.09535K wps
Begin Testing...
[Epoch 85] train avg loss 0.00146148, dev acc 0.8805, dev avg loss 0.345972, throughput 2.11451K wps
[Epoch 86 Batch 30/172] avg loss 0.00130344, throughput 2.16781K wps
[Epoch 86 Batch 60/172] avg loss 0.0015071, throughput 2.13211K wps
[Epoch 86 Batch 90/172] avg loss 0.0011678, throughput 2.07253K wps
[Epoch 86 Batch 120/172] avg loss 0.0016943, throughput 2.09835K wps
[Epoch 86 Batch 150/172] avg loss 0.00141474, throughput 2.07976K wps
Begin Testing...
[Epoch 86] train avg loss 0.00143763, dev acc 0.8816, dev avg loss 0.346974, throughput 2.11119K wps
[Epoch 87 Batch 30/172] avg loss 0.00143727, throughput 2.17712K wps
[Epoch 87 Batch 60/172] avg loss 0.00140405, throughput 2.12778K wps
[Epoch 87 Batch 90/172] avg loss 0.001485, throughput 2.12602K wps
[Epoch 87 Batch 120/172] avg loss 0.00127118, throughput 2.07553K wps
[Epoch 87 Batch 150/172] avg loss 0.00147811, throughput 2.09242K wps
Begin Testing...
[Epoch 87] train avg loss 0.00142696, dev acc 0.8868, dev avg loss 0.356355, throughput 2.1145K wps
[Epoch 88 Batch 30/172] avg loss 0.00143863, throughput 2.13929K wps
[Epoch 88 Batch 60/172] avg loss 0.00140303, throughput 2.10629K wps
[Epoch 88 Batch 90/172] avg loss 0.0017304, throughput 2.12527K wps
[Epoch 88 Batch 120/172] avg loss 0.00135684, throughput 2.13123K wps
[Epoch 88 Batch 150/172] avg loss 0.0013523, throughput 2.12714K wps
Begin Testing...
[Epoch 88] train avg loss 0.0014676, dev acc 0.8805, dev avg loss 0.351033, throughput 2.12012K wps
[Epoch 89 Batch 30/172] avg loss 0.00143537, throughput 2.12932K wps
[Epoch 89 Batch 60/172] avg loss 0.00127075, throughput 2.06488K wps
[Epoch 89 Batch 90/172] avg loss 0.00170851, throughput 2.12784K wps
[Epoch 89 Batch 120/172] avg loss 0.00133461, throughput 2.11147K wps
[Epoch 89 Batch 150/172] avg loss 0.00142221, throughput 2.10886K wps
Begin Testing...
[Epoch 89] train avg loss 0.00140807, dev acc 0.8784, dev avg loss 0.351497, throughput 2.11079K wps
[Epoch 90 Batch 30/172] avg loss 0.00136172, throughput 2.16397K wps
[Epoch 90 Batch 60/172] avg loss 0.00134547, throughput 2.06792K wps
[Epoch 90 Batch 90/172] avg loss 0.00139913, throughput 2.12131K wps
[Epoch 90 Batch 120/172] avg loss 0.00151863, throughput 2.13999K wps
[Epoch 90 Batch 150/172] avg loss 0.00134178, throughput 2.14026K wps
Begin Testing...
[Epoch 90] train avg loss 0.00139131, dev acc 0.8784, dev avg loss 0.352647, throughput 2.1223K wps
[Epoch 91 Batch 30/172] avg loss 0.00138014, throughput 2.14878K wps
[Epoch 91 Batch 60/172] avg loss 0.00143091, throughput 2.10133K wps
[Epoch 91 Batch 90/172] avg loss 0.0011911, throughput 2.08187K wps
[Epoch 91 Batch 120/172] avg loss 0.00161297, throughput 2.13067K wps
[Epoch 91 Batch 150/172] avg loss 0.00127189, throughput 2.11577K wps
Begin Testing...
[Epoch 91] train avg loss 0.00138728, dev acc 0.8836, dev avg loss 0.358024, throughput 2.11423K wps
[Epoch 92 Batch 30/172] avg loss 0.00141337, throughput 2.17408K wps
[Epoch 92 Batch 60/172] avg loss 0.00129602, throughput 2.10551K wps
[Epoch 92 Batch 90/172] avg loss 0.00134989, throughput 2.07558K wps
[Epoch 92 Batch 120/172] avg loss 0.0014132, throughput 2.11185K wps
[Epoch 92 Batch 150/172] avg loss 0.00129919, throughput 2.09092K wps
Begin Testing...
[Epoch 92] train avg loss 0.00136754, dev acc 0.8816, dev avg loss 0.354149, throughput 2.11221K wps
[Epoch 93 Batch 30/172] avg loss 0.00136772, throughput 2.15158K wps
[Epoch 93 Batch 60/172] avg loss 0.00123932, throughput 2.12225K wps
[Epoch 93 Batch 90/172] avg loss 0.00123095, throughput 2.10638K wps
[Epoch 93 Batch 120/172] avg loss 0.0012903, throughput 2.12669K wps
[Epoch 93 Batch 150/172] avg loss 0.00127953, throughput 2.11066K wps
Begin Testing...
[Epoch 93] train avg loss 0.0013186, dev acc 0.8826, dev avg loss 0.362901, throughput 2.1249K wps
[Epoch 94 Batch 30/172] avg loss 0.00113333, throughput 2.16665K wps
[Epoch 94 Batch 60/172] avg loss 0.00130793, throughput 2.11437K wps
[Epoch 94 Batch 90/172] avg loss 0.00149194, throughput 2.07827K wps
[Epoch 94 Batch 120/172] avg loss 0.00134964, throughput 2.0953K wps
[Epoch 94 Batch 150/172] avg loss 0.00133875, throughput 2.1155K wps
Begin Testing...
[Epoch 94] train avg loss 0.00132458, dev acc 0.8816, dev avg loss 0.358166, throughput 2.11003K wps
[Epoch 95 Batch 30/172] avg loss 0.00138936, throughput 2.17769K wps
[Epoch 95 Batch 60/172] avg loss 0.00136634, throughput 2.12738K wps
[Epoch 95 Batch 90/172] avg loss 0.00131196, throughput 2.12116K wps
[Epoch 95 Batch 120/172] avg loss 0.00120875, throughput 2.10708K wps
[Epoch 95 Batch 150/172] avg loss 0.00153737, throughput 2.10894K wps
Begin Testing...
[Epoch 95] train avg loss 0.00134363, dev acc 0.8836, dev avg loss 0.364583, throughput 2.12691K wps
[Epoch 96 Batch 30/172] avg loss 0.00119568, throughput 2.12482K wps
[Epoch 96 Batch 60/172] avg loss 0.00119523, throughput 2.09995K wps
[Epoch 96 Batch 90/172] avg loss 0.00124606, throughput 2.06907K wps
[Epoch 96 Batch 120/172] avg loss 0.0015083, throughput 2.13427K wps
[Epoch 96 Batch 150/172] avg loss 0.00129724, throughput 2.13675K wps
Begin Testing...
[Epoch 96] train avg loss 0.00127994, dev acc 0.8857, dev avg loss 0.372558, throughput 2.1155K wps
[Epoch 97 Batch 30/172] avg loss 0.00121768, throughput 2.1305K wps
[Epoch 97 Batch 60/172] avg loss 0.00131524, throughput 2.08088K wps
[Epoch 97 Batch 90/172] avg loss 0.00148479, throughput 2.08265K wps
[Epoch 97 Batch 120/172] avg loss 0.00115987, throughput 2.09963K wps
[Epoch 97 Batch 150/172] avg loss 0.00134824, throughput 2.11603K wps
Begin Testing...
[Epoch 97] train avg loss 0.0013214, dev acc 0.8836, dev avg loss 0.367144, throughput 2.09958K wps
[Epoch 98 Batch 30/172] avg loss 0.00113855, throughput 2.16804K wps
[Epoch 98 Batch 60/172] avg loss 0.00133916, throughput 2.11386K wps
[Epoch 98 Batch 90/172] avg loss 0.00123119, throughput 2.08662K wps
[Epoch 98 Batch 120/172] avg loss 0.00124544, throughput 2.08914K wps
[Epoch 98 Batch 150/172] avg loss 0.00116877, throughput 2.09749K wps
Begin Testing...
[Epoch 98] train avg loss 0.00124652, dev acc 0.8878, dev avg loss 0.376628, throughput 2.11026K wps
[Epoch 99 Batch 30/172] avg loss 0.00130783, throughput 2.17169K wps
[Epoch 99 Batch 60/172] avg loss 0.00135404, throughput 2.0994K wps
[Epoch 99 Batch 90/172] avg loss 0.00135436, throughput 2.11138K wps
[Epoch 99 Batch 120/172] avg loss 0.00109161, throughput 2.10636K wps
[Epoch 99 Batch 150/172] avg loss 0.00116731, throughput 2.09732K wps
Begin Testing...
[Epoch 99] train avg loss 0.00125288, dev acc 0.8795, dev avg loss 0.370413, throughput 2.11883K wps
[Epoch 100 Batch 30/172] avg loss 0.00136197, throughput 2.14274K wps
[Epoch 100 Batch 60/172] avg loss 0.00119943, throughput 2.091K wps
[Epoch 100 Batch 90/172] avg loss 0.00116803, throughput 2.08715K wps
[Epoch 100 Batch 120/172] avg loss 0.00103863, throughput 2.12813K wps
[Epoch 100 Batch 150/172] avg loss 0.00116729, throughput 2.12344K wps
Begin Testing...
[Epoch 100] train avg loss 0.00121296, dev acc 0.8836, dev avg loss 0.377772, throughput 2.11291K wps
[Epoch 101 Batch 30/172] avg loss 0.00135222, throughput 2.16728K wps
[Epoch 101 Batch 60/172] avg loss 0.00119699, throughput 2.13978K wps
[Epoch 101 Batch 90/172] avg loss 0.00106595, throughput 2.1356K wps
[Epoch 101 Batch 120/172] avg loss 0.00121874, throughput 2.11753K wps
[Epoch 101 Batch 150/172] avg loss 0.00148189, throughput 2.11723K wps
Begin Testing...
[Epoch 101] train avg loss 0.00125542, dev acc 0.8847, dev avg loss 0.380005, throughput 2.13351K wps
[Epoch 102 Batch 30/172] avg loss 0.00129931, throughput 2.15262K wps
[Epoch 102 Batch 60/172] avg loss 0.00123223, throughput 2.0722K wps
[Epoch 102 Batch 90/172] avg loss 0.00118572, throughput 2.12165K wps
[Epoch 102 Batch 120/172] avg loss 0.00125107, throughput 2.13767K wps
[Epoch 102 Batch 150/172] avg loss 0.00121392, throughput 2.12373K wps
Begin Testing...
[Epoch 102] train avg loss 0.00122458, dev acc 0.8805, dev avg loss 0.375981, throughput 2.11647K wps
[Epoch 103 Batch 30/172] avg loss 0.00108206, throughput 2.14717K wps
[Epoch 103 Batch 60/172] avg loss 0.00145146, throughput 2.08277K wps
[Epoch 103 Batch 90/172] avg loss 0.00109521, throughput 2.07348K wps
[Epoch 103 Batch 120/172] avg loss 0.0012889, throughput 2.0875K wps
[Epoch 103 Batch 150/172] avg loss 0.00130317, throughput 2.11036K wps
Begin Testing...
[Epoch 103] train avg loss 0.00123522, dev acc 0.8826, dev avg loss 0.377744, throughput 2.10491K wps
[Epoch 104 Batch 30/172] avg loss 0.00111754, throughput 2.16757K wps
[Epoch 104 Batch 60/172] avg loss 0.00114604, throughput 2.12977K wps
[Epoch 104 Batch 90/172] avg loss 0.00146711, throughput 2.08466K wps
[Epoch 104 Batch 120/172] avg loss 0.00126538, throughput 2.10313K wps
[Epoch 104 Batch 150/172] avg loss 0.00116142, throughput 2.11113K wps
Begin Testing...
[Epoch 104] train avg loss 0.00120776, dev acc 0.8784, dev avg loss 0.376971, throughput 2.11929K wps
[Epoch 105 Batch 30/172] avg loss 0.00103363, throughput 2.13135K wps
[Epoch 105 Batch 60/172] avg loss 0.00121643, throughput 2.06723K wps
[Epoch 105 Batch 90/172] avg loss 0.00128198, throughput 2.10148K wps
[Epoch 105 Batch 120/172] avg loss 0.00134665, throughput 2.10913K wps
[Epoch 105 Batch 150/172] avg loss 0.00104727, throughput 2.11947K wps
Begin Testing...
[Epoch 105] train avg loss 0.00120486, dev acc 0.8836, dev avg loss 0.385693, throughput 2.1095K wps
[Epoch 106 Batch 30/172] avg loss 0.00104036, throughput 2.14513K wps
[Epoch 106 Batch 60/172] avg loss 0.00107495, throughput 2.10627K wps
[Epoch 106 Batch 90/172] avg loss 0.0010523, throughput 2.07525K wps
[Epoch 106 Batch 120/172] avg loss 0.00128087, throughput 2.08652K wps
[Epoch 106 Batch 150/172] avg loss 0.00130804, throughput 2.07767K wps
Begin Testing...
[Epoch 106] train avg loss 0.00116552, dev acc 0.8805, dev avg loss 0.382519, throughput 2.09455K wps
[Epoch 107 Batch 30/172] avg loss 0.00113691, throughput 2.13599K wps
[Epoch 107 Batch 60/172] avg loss 0.000975227, throughput 2.09401K wps
[Epoch 107 Batch 90/172] avg loss 0.00114376, throughput 2.08254K wps
[Epoch 107 Batch 120/172] avg loss 0.00125697, throughput 2.07593K wps
[Epoch 107 Batch 150/172] avg loss 0.00116234, throughput 2.12594K wps
Begin Testing...
[Epoch 107] train avg loss 0.00117474, dev acc 0.8774, dev avg loss 0.381376, throughput 2.10607K wps
[Epoch 108 Batch 30/172] avg loss 0.00117251, throughput 2.12842K wps
[Epoch 108 Batch 60/172] avg loss 0.00117207, throughput 2.0714K wps
[Epoch 108 Batch 90/172] avg loss 0.00118846, throughput 2.14353K wps
[Epoch 108 Batch 120/172] avg loss 0.00105695, throughput 2.12659K wps
[Epoch 108 Batch 150/172] avg loss 0.00127915, throughput 2.13801K wps
Begin Testing...
[Epoch 108] train avg loss 0.00118214, dev acc 0.8795, dev avg loss 0.384451, throughput 2.11875K wps
[Epoch 109 Batch 30/172] avg loss 0.00111611, throughput 2.1651K wps
[Epoch 109 Batch 60/172] avg loss 0.00125684, throughput 2.10496K wps
[Epoch 109 Batch 90/172] avg loss 0.00119276, throughput 2.13204K wps
[Epoch 109 Batch 120/172] avg loss 0.00138295, throughput 2.13792K wps
[Epoch 109 Batch 150/172] avg loss 0.000924943, throughput 2.11308K wps
Begin Testing...
[Epoch 109] train avg loss 0.00118914, dev acc 0.8795, dev avg loss 0.385189, throughput 2.12459K wps
[Epoch 110 Batch 30/172] avg loss 0.00104929, throughput 2.17571K wps
[Epoch 110 Batch 60/172] avg loss 0.00106527, throughput 2.10597K wps
[Epoch 110 Batch 90/172] avg loss 0.00109421, throughput 2.06454K wps
[Epoch 110 Batch 120/172] avg loss 0.00116498, throughput 2.07623K wps
[Epoch 110 Batch 150/172] avg loss 0.00107361, throughput 2.06961K wps
Begin Testing...
[Epoch 110] train avg loss 0.00109703, dev acc 0.8826, dev avg loss 0.393887, throughput 2.09571K wps
[Epoch 111 Batch 30/172] avg loss 0.000991749, throughput 2.14644K wps
[Epoch 111 Batch 60/172] avg loss 0.00119745, throughput 2.12983K wps
[Epoch 111 Batch 90/172] avg loss 0.00100789, throughput 2.13998K wps
[Epoch 111 Batch 120/172] avg loss 0.00122076, throughput 2.07381K wps
[Epoch 111 Batch 150/172] avg loss 0.00105855, throughput 2.08945K wps
Begin Testing...
[Epoch 111] train avg loss 0.00109967, dev acc 0.8784, dev avg loss 0.391366, throughput 2.1185K wps
[Epoch 112 Batch 30/172] avg loss 0.00112708, throughput 2.16958K wps
[Epoch 112 Batch 60/172] avg loss 0.00113762, throughput 2.12977K wps
[Epoch 112 Batch 90/172] avg loss 0.00106015, throughput 2.13803K wps
[Epoch 112 Batch 120/172] avg loss 0.00100856, throughput 2.13216K wps
[Epoch 112 Batch 150/172] avg loss 0.00101968, throughput 2.11995K wps
Begin Testing...
[Epoch 112] train avg loss 0.0011289, dev acc 0.8774, dev avg loss 0.390385, throughput 2.13012K wps
[Epoch 113 Batch 30/172] avg loss 0.000937368, throughput 2.12769K wps
[Epoch 113 Batch 60/172] avg loss 0.00102531, throughput 2.08361K wps
[Epoch 113 Batch 90/172] avg loss 0.00126994, throughput 2.07763K wps
[Epoch 113 Batch 120/172] avg loss 0.0011085, throughput 2.07842K wps
[Epoch 113 Batch 150/172] avg loss 0.00143171, throughput 2.08026K wps
Begin Testing...
[Epoch 113] train avg loss 0.00112087, dev acc 0.8795, dev avg loss 0.396273, throughput 2.09044K wps
[Epoch 114 Batch 30/172] avg loss 0.00116975, throughput 2.17482K wps
[Epoch 114 Batch 60/172] avg loss 0.00105434, throughput 2.12894K wps
[Epoch 114 Batch 90/172] avg loss 0.00103739, throughput 2.12621K wps
[Epoch 114 Batch 120/172] avg loss 0.00119664, throughput 2.10915K wps
[Epoch 114 Batch 150/172] avg loss 0.00121978, throughput 2.11168K wps
Begin Testing...
[Epoch 114] train avg loss 0.00113625, dev acc 0.8784, dev avg loss 0.391682, throughput 2.12229K wps
[Epoch 115 Batch 30/172] avg loss 0.00109607, throughput 2.17371K wps
[Epoch 115 Batch 60/172] avg loss 0.00112172, throughput 2.11006K wps
[Epoch 115 Batch 90/172] avg loss 0.00104299, throughput 2.12002K wps
[Epoch 115 Batch 120/172] avg loss 0.00111346, throughput 2.1105K wps
[Epoch 115 Batch 150/172] avg loss 0.000968642, throughput 2.07658K wps
Begin Testing...
[Epoch 115] train avg loss 0.00110698, dev acc 0.8753, dev avg loss 0.394319, throughput 2.11616K wps
[Epoch 116 Batch 30/172] avg loss 0.00113301, throughput 2.18044K wps
[Epoch 116 Batch 60/172] avg loss 0.00118064, throughput 2.13578K wps
[Epoch 116 Batch 90/172] avg loss 0.000960186, throughput 2.13421K wps
[Epoch 116 Batch 120/172] avg loss 0.00104672, throughput 2.13301K wps
[Epoch 116 Batch 150/172] avg loss 0.00122859, throughput 2.07656K wps
Begin Testing...
[Epoch 116] train avg loss 0.00110955, dev acc 0.8721, dev avg loss 0.394912, throughput 2.12586K wps
[Epoch 117 Batch 30/172] avg loss 0.00103911, throughput 2.18938K wps
[Epoch 117 Batch 60/172] avg loss 0.00116494, throughput 2.10286K wps
[Epoch 117 Batch 90/172] avg loss 0.000999107, throughput 2.08404K wps
[Epoch 117 Batch 120/172] avg loss 0.00107029, throughput 2.09111K wps
[Epoch 117 Batch 150/172] avg loss 0.000935688, throughput 2.13136K wps
Begin Testing...
[Epoch 117] train avg loss 0.00105059, dev acc 0.8836, dev avg loss 0.405358, throughput 2.12022K wps
[Epoch 118 Batch 30/172] avg loss 0.00101476, throughput 2.15871K wps
[Epoch 118 Batch 60/172] avg loss 0.0011652, throughput 2.09618K wps
[Epoch 118 Batch 90/172] avg loss 0.00101128, throughput 2.1248K wps
[Epoch 118 Batch 120/172] avg loss 0.0011119, throughput 2.07372K wps
[Epoch 118 Batch 150/172] avg loss 0.00094646, throughput 2.08238K wps
Begin Testing...
[Epoch 118] train avg loss 0.00108608, dev acc 0.8805, dev avg loss 0.399411, throughput 2.10229K wps
[Epoch 119 Batch 30/172] avg loss 0.000981789, throughput 2.12021K wps
[Epoch 119 Batch 60/172] avg loss 0.000958616, throughput 2.10965K wps
[Epoch 119 Batch 90/172] avg loss 0.00109063, throughput 2.10736K wps
[Epoch 119 Batch 120/172] avg loss 0.00119417, throughput 2.06957K wps
[Epoch 119 Batch 150/172] avg loss 0.00109964, throughput 2.13352K wps
Begin Testing...
[Epoch 119] train avg loss 0.00105431, dev acc 0.8816, dev avg loss 0.404154, throughput 2.10422K wps
[Epoch 120 Batch 30/172] avg loss 0.00104376, throughput 2.1446K wps
[Epoch 120 Batch 60/172] avg loss 0.00104838, throughput 2.0888K wps
[Epoch 120 Batch 90/172] avg loss 0.00106563, throughput 2.12551K wps
[Epoch 120 Batch 120/172] avg loss 0.000788616, throughput 2.13197K wps
[Epoch 120 Batch 150/172] avg loss 0.00110879, throughput 2.12419K wps
Begin Testing...
[Epoch 120] train avg loss 0.00102008, dev acc 0.8763, dev avg loss 0.402008, throughput 2.12076K wps
[Epoch 121 Batch 30/172] avg loss 0.00101564, throughput 2.12856K wps
[Epoch 121 Batch 60/172] avg loss 0.000962902, throughput 2.13341K wps
[Epoch 121 Batch 90/172] avg loss 0.00108956, throughput 2.13307K wps
[Epoch 121 Batch 120/172] avg loss 0.00110156, throughput 2.08945K wps
[Epoch 121 Batch 150/172] avg loss 0.00100991, throughput 2.12161K wps
Begin Testing...
[Epoch 121] train avg loss 0.00104148, dev acc 0.8784, dev avg loss 0.406245, throughput 2.11975K wps
[Epoch 122 Batch 30/172] avg loss 0.00120116, throughput 2.18757K wps
[Epoch 122 Batch 60/172] avg loss 0.000958175, throughput 2.08986K wps
[Epoch 122 Batch 90/172] avg loss 0.00104114, throughput 2.13088K wps
[Epoch 122 Batch 120/172] avg loss 0.000924366, throughput 2.10973K wps
[Epoch 122 Batch 150/172] avg loss 0.00109742, throughput 2.1367K wps
Begin Testing...
[Epoch 122] train avg loss 0.00103975, dev acc 0.8763, dev avg loss 0.402732, throughput 2.12852K wps
[Epoch 123 Batch 30/172] avg loss 0.00120139, throughput 2.1749K wps
[Epoch 123 Batch 60/172] avg loss 0.000975359, throughput 2.10757K wps
[Epoch 123 Batch 90/172] avg loss 0.00115208, throughput 2.12804K wps
[Epoch 123 Batch 120/172] avg loss 0.00112134, throughput 2.10559K wps
[Epoch 123 Batch 150/172] avg loss 0.000903328, throughput 2.11437K wps
Begin Testing...
[Epoch 123] train avg loss 0.00104278, dev acc 0.8774, dev avg loss 0.406284, throughput 2.12652K wps
[Epoch 124 Batch 30/172] avg loss 0.000839073, throughput 2.17002K wps
[Epoch 124 Batch 60/172] avg loss 0.000944215, throughput 2.10152K wps
[Epoch 124 Batch 90/172] avg loss 0.000884393, throughput 2.13547K wps
[Epoch 124 Batch 120/172] avg loss 0.0012076, throughput 2.13916K wps
[Epoch 124 Batch 150/172] avg loss 0.00120592, throughput 2.14301K wps
Begin Testing...
[Epoch 124] train avg loss 0.00102517, dev acc 0.8742, dev avg loss 0.403406, throughput 2.13486K wps
[Epoch 125 Batch 30/172] avg loss 0.000932899, throughput 2.17304K wps
[Epoch 125 Batch 60/172] avg loss 0.000945498, throughput 2.12395K wps
[Epoch 125 Batch 90/172] avg loss 0.00124952, throughput 2.08655K wps
[Epoch 125 Batch 120/172] avg loss 0.000947912, throughput 2.08643K wps
[Epoch 125 Batch 150/172] avg loss 0.00107643, throughput 2.08821K wps
Begin Testing...
[Epoch 125] train avg loss 0.00104018, dev acc 0.8774, dev avg loss 0.40895, throughput 2.11184K wps
[Epoch 126 Batch 30/172] avg loss 0.000982407, throughput 2.18615K wps
[Epoch 126 Batch 60/172] avg loss 0.000900028, throughput 2.1013K wps
[Epoch 126 Batch 90/172] avg loss 0.00110286, throughput 2.12924K wps
[Epoch 126 Batch 120/172] avg loss 0.00087126, throughput 2.08371K wps
[Epoch 126 Batch 150/172] avg loss 0.00111865, throughput 2.10091K wps
Begin Testing...
[Epoch 126] train avg loss 0.00102316, dev acc 0.8784, dev avg loss 0.412964, throughput 2.11426K wps
[Epoch 127 Batch 30/172] avg loss 0.000965334, throughput 2.11983K wps
[Epoch 127 Batch 60/172] avg loss 0.000881828, throughput 2.08372K wps
[Epoch 127 Batch 90/172] avg loss 0.00117771, throughput 2.07545K wps
[Epoch 127 Batch 120/172] avg loss 0.00105642, throughput 2.10776K wps
[Epoch 127 Batch 150/172] avg loss 0.00110759, throughput 2.11043K wps
Begin Testing...
[Epoch 127] train avg loss 0.00101181, dev acc 0.8732, dev avg loss 0.41148, throughput 2.09582K wps
[Epoch 128 Batch 30/172] avg loss 0.000829902, throughput 2.16098K wps
[Epoch 128 Batch 60/172] avg loss 0.000993179, throughput 2.1026K wps
[Epoch 128 Batch 90/172] avg loss 0.000917254, throughput 2.08742K wps
[Epoch 128 Batch 120/172] avg loss 0.00114612, throughput 2.09568K wps
[Epoch 128 Batch 150/172] avg loss 0.000924564, throughput 2.13416K wps
Begin Testing...
[Epoch 128] train avg loss 0.000986214, dev acc 0.8784, dev avg loss 0.415237, throughput 2.1171K wps
[Epoch 129 Batch 30/172] avg loss 0.0010276, throughput 2.17912K wps
[Epoch 129 Batch 60/172] avg loss 0.000904063, throughput 2.1414K wps
[Epoch 129 Batch 90/172] avg loss 0.000745108, throughput 2.13589K wps
[Epoch 129 Batch 120/172] avg loss 0.00121556, throughput 2.13645K wps
[Epoch 129 Batch 150/172] avg loss 0.000918097, throughput 2.14084K wps
Begin Testing...
[Epoch 129] train avg loss 0.000984007, dev acc 0.8742, dev avg loss 0.412101, throughput 2.14544K wps
[Epoch 130 Batch 30/172] avg loss 0.000872549, throughput 2.18121K wps
[Epoch 130 Batch 60/172] avg loss 0.000913759, throughput 2.12399K wps
[Epoch 130 Batch 90/172] avg loss 0.000860566, throughput 2.10593K wps
[Epoch 130 Batch 120/172] avg loss 0.00103801, throughput 2.12621K wps
[Epoch 130 Batch 150/172] avg loss 0.00112371, throughput 2.06952K wps
Begin Testing...
[Epoch 130] train avg loss 0.000955939, dev acc 0.8795, dev avg loss 0.425025, throughput 2.1151K wps
[Epoch 131 Batch 30/172] avg loss 0.000910385, throughput 2.13609K wps
[Epoch 131 Batch 60/172] avg loss 0.00101297, throughput 2.11975K wps
[Epoch 131 Batch 90/172] avg loss 0.000941683, throughput 2.1168K wps
[Epoch 131 Batch 120/172] avg loss 0.00101315, throughput 2.13819K wps
[Epoch 131 Batch 150/172] avg loss 0.00108207, throughput 2.10278K wps
Begin Testing...
[Epoch 131] train avg loss 0.000968606, dev acc 0.8795, dev avg loss 0.420676, throughput 2.11933K wps
[Epoch 132 Batch 30/172] avg loss 0.00102111, throughput 2.15593K wps
[Epoch 132 Batch 60/172] avg loss 0.000803779, throughput 2.12599K wps
[Epoch 132 Batch 90/172] avg loss 0.00104524, throughput 2.1361K wps
[Epoch 132 Batch 120/172] avg loss 0.000940011, throughput 2.12837K wps
[Epoch 132 Batch 150/172] avg loss 0.00105998, throughput 2.10488K wps
Begin Testing...
[Epoch 132] train avg loss 0.000994782, dev acc 0.8763, dev avg loss 0.41821, throughput 2.12389K wps
[Epoch 133 Batch 30/172] avg loss 0.000929871, throughput 2.1674K wps
[Epoch 133 Batch 60/172] avg loss 0.000895809, throughput 2.14066K wps
[Epoch 133 Batch 90/172] avg loss 0.000966929, throughput 2.12161K wps
[Epoch 133 Batch 120/172] avg loss 0.000942523, throughput 2.11534K wps
[Epoch 133 Batch 150/172] avg loss 0.0010415, throughput 2.10577K wps
Begin Testing...
[Epoch 133] train avg loss 0.00096399, dev acc 0.8742, dev avg loss 0.417301, throughput 2.1311K wps
[Epoch 134 Batch 30/172] avg loss 0.00116052, throughput 2.14036K wps
[Epoch 134 Batch 60/172] avg loss 0.0010141, throughput 2.08715K wps
[Epoch 134 Batch 90/172] avg loss 0.000981384, throughput 2.08882K wps
[Epoch 134 Batch 120/172] avg loss 0.000984648, throughput 2.09586K wps
[Epoch 134 Batch 150/172] avg loss 0.000814685, throughput 2.09583K wps
Begin Testing...
[Epoch 134] train avg loss 0.000955708, dev acc 0.8721, dev avg loss 0.419455, throughput 2.10692K wps
[Epoch 135 Batch 30/172] avg loss 0.000878778, throughput 2.18541K wps
[Epoch 135 Batch 60/172] avg loss 0.000954432, throughput 2.12313K wps
[Epoch 135 Batch 90/172] avg loss 0.000931727, throughput 2.10257K wps
[Epoch 135 Batch 120/172] avg loss 0.000964053, throughput 2.09042K wps
[Epoch 135 Batch 150/172] avg loss 0.00106563, throughput 2.14133K wps
Begin Testing...
[Epoch 135] train avg loss 0.000955521, dev acc 0.8826, dev avg loss 0.428753, throughput 2.12908K wps
[Epoch 136 Batch 30/172] avg loss 0.000922381, throughput 2.17198K wps
[Epoch 136 Batch 60/172] avg loss 0.000858215, throughput 2.13919K wps
[Epoch 136 Batch 90/172] avg loss 0.000979956, throughput 2.13281K wps
[Epoch 136 Batch 120/172] avg loss 0.000951781, throughput 2.11042K wps
[Epoch 136 Batch 150/172] avg loss 0.00099331, throughput 2.10013K wps
Begin Testing...
[Epoch 136] train avg loss 0.00092541, dev acc 0.8742, dev avg loss 0.420605, throughput 2.12781K wps
[Epoch 137 Batch 30/172] avg loss 0.000890693, throughput 2.15921K wps
[Epoch 137 Batch 60/172] avg loss 0.000925812, throughput 2.09678K wps
[Epoch 137 Batch 90/172] avg loss 0.000928938, throughput 2.07867K wps
[Epoch 137 Batch 120/172] avg loss 0.00102596, throughput 2.11973K wps
[Epoch 137 Batch 150/172] avg loss 0.000944174, throughput 2.08233K wps
Begin Testing...
[Epoch 137] train avg loss 0.000930588, dev acc 0.8711, dev avg loss 0.421583, throughput 2.10536K wps
[Epoch 138 Batch 30/172] avg loss 0.00085796, throughput 2.16928K wps
[Epoch 138 Batch 60/172] avg loss 0.000988281, throughput 2.12427K wps
[Epoch 138 Batch 90/172] avg loss 0.000917557, throughput 2.12719K wps
[Epoch 138 Batch 120/172] avg loss 0.00102954, throughput 2.10409K wps
[Epoch 138 Batch 150/172] avg loss 0.00101043, throughput 2.11038K wps
Begin Testing...
[Epoch 138] train avg loss 0.000971582, dev acc 0.8784, dev avg loss 0.432378, throughput 2.12882K wps
[Epoch 139 Batch 30/172] avg loss 0.000926657, throughput 2.12965K wps
[Epoch 139 Batch 60/172] avg loss 0.000875049, throughput 2.09295K wps
[Epoch 139 Batch 90/172] avg loss 0.000923549, throughput 2.12006K wps
[Epoch 139 Batch 120/172] avg loss 0.000935154, throughput 2.07844K wps
[Epoch 139 Batch 150/172] avg loss 0.00102589, throughput 2.08948K wps
Begin Testing...
[Epoch 139] train avg loss 0.000947706, dev acc 0.8711, dev avg loss 0.423035, throughput 2.09838K wps
[Epoch 140 Batch 30/172] avg loss 0.000978719, throughput 2.15179K wps
[Epoch 140 Batch 60/172] avg loss 0.00110923, throughput 2.07895K wps
[Epoch 140 Batch 90/172] avg loss 0.000845404, throughput 2.08661K wps
[Epoch 140 Batch 120/172] avg loss 0.00096685, throughput 2.09811K wps
[Epoch 140 Batch 150/172] avg loss 0.000935015, throughput 2.13618K wps
Begin Testing...
[Epoch 140] train avg loss 0.000957169, dev acc 0.8753, dev avg loss 0.426271, throughput 2.11162K wps
[Epoch 141 Batch 30/172] avg loss 0.000946935, throughput 2.13039K wps
[Epoch 141 Batch 60/172] avg loss 0.00078103, throughput 2.07316K wps
[Epoch 141 Batch 90/172] avg loss 0.00106133, throughput 2.08442K wps
[Epoch 141 Batch 120/172] avg loss 0.000895668, throughput 2.10471K wps
[Epoch 141 Batch 150/172] avg loss 0.000897041, throughput 2.11384K wps
Begin Testing...
[Epoch 141] train avg loss 0.000940964, dev acc 0.8784, dev avg loss 0.430239, throughput 2.10616K wps
[Epoch 142 Batch 30/172] avg loss 0.000868368, throughput 2.15244K wps
[Epoch 142 Batch 60/172] avg loss 0.000876698, throughput 2.10922K wps
[Epoch 142 Batch 90/172] avg loss 0.0010506, throughput 2.13303K wps
[Epoch 142 Batch 120/172] avg loss 0.000804, throughput 2.13662K wps
[Epoch 142 Batch 150/172] avg loss 0.000958018, throughput 2.10656K wps
Begin Testing...
[Epoch 142] train avg loss 0.00090648, dev acc 0.8784, dev avg loss 0.438792, throughput 2.12194K wps
[Epoch 143 Batch 30/172] avg loss 0.000885823, throughput 2.17196K wps
[Epoch 143 Batch 60/172] avg loss 0.000842215, throughput 2.1467K wps
[Epoch 143 Batch 90/172] avg loss 0.000852639, throughput 2.10633K wps
[Epoch 143 Batch 120/172] avg loss 0.000975409, throughput 2.13081K wps
[Epoch 143 Batch 150/172] avg loss 0.000888441, throughput 2.0895K wps
Begin Testing...
[Epoch 143] train avg loss 0.000912459, dev acc 0.8732, dev avg loss 0.42824, throughput 2.12335K wps
[Epoch 144 Batch 30/172] avg loss 0.00100849, throughput 2.14129K wps
[Epoch 144 Batch 60/172] avg loss 0.000921131, throughput 2.12414K wps
[Epoch 144 Batch 90/172] avg loss 0.000818593, throughput 2.13076K wps
[Epoch 144 Batch 120/172] avg loss 0.00100442, throughput 2.1199K wps
[Epoch 144 Batch 150/172] avg loss 0.000886223, throughput 2.13331K wps
Begin Testing...
[Epoch 144] train avg loss 0.000908872, dev acc 0.8753, dev avg loss 0.432158, throughput 2.12947K wps
[Epoch 145 Batch 30/172] avg loss 0.000938457, throughput 2.16668K wps
[Epoch 145 Batch 60/172] avg loss 0.000884729, throughput 2.09188K wps
[Epoch 145 Batch 90/172] avg loss 0.0009068, throughput 2.12981K wps
[Epoch 145 Batch 120/172] avg loss 0.000925731, throughput 2.11636K wps
[Epoch 145 Batch 150/172] avg loss 0.000981138, throughput 2.11627K wps
Begin Testing...
[Epoch 145] train avg loss 0.000917182, dev acc 0.8763, dev avg loss 0.438286, throughput 2.12149K wps
[Epoch 146 Batch 30/172] avg loss 0.000790447, throughput 2.18791K wps
[Epoch 146 Batch 60/172] avg loss 0.000985605, throughput 2.1021K wps
[Epoch 146 Batch 90/172] avg loss 0.000923605, throughput 2.13394K wps
[Epoch 146 Batch 120/172] avg loss 0.00098841, throughput 2.09538K wps
[Epoch 146 Batch 150/172] avg loss 0.000904156, throughput 2.12502K wps
Begin Testing...
[Epoch 146] train avg loss 0.000903491, dev acc 0.8763, dev avg loss 0.435657, throughput 2.1273K wps
[Epoch 147 Batch 30/172] avg loss 0.000821636, throughput 2.13499K wps
[Epoch 147 Batch 60/172] avg loss 0.000992116, throughput 2.1141K wps
[Epoch 147 Batch 90/172] avg loss 0.00096232, throughput 2.09716K wps
[Epoch 147 Batch 120/172] avg loss 0.0006808, throughput 2.08648K wps
[Epoch 147 Batch 150/172] avg loss 0.000898205, throughput 2.08845K wps
Begin Testing...
[Epoch 147] train avg loss 0.000887812, dev acc 0.8700, dev avg loss 0.432006, throughput 2.10243K wps
[Epoch 148 Batch 30/172] avg loss 0.000909355, throughput 2.12881K wps
[Epoch 148 Batch 60/172] avg loss 0.000817288, throughput 2.10676K wps
[Epoch 148 Batch 90/172] avg loss 0.000859639, throughput 2.11698K wps
[Epoch 148 Batch 120/172] avg loss 0.001033, throughput 2.13506K wps
[Epoch 148 Batch 150/172] avg loss 0.000942205, throughput 2.14265K wps
Begin Testing...
[Epoch 148] train avg loss 0.000884614, dev acc 0.8721, dev avg loss 0.436683, throughput 2.12774K wps
[Epoch 149 Batch 30/172] avg loss 0.000866843, throughput 2.13513K wps
[Epoch 149 Batch 60/172] avg loss 0.000858407, throughput 2.10455K wps
[Epoch 149 Batch 90/172] avg loss 0.000771428, throughput 2.13541K wps
[Epoch 149 Batch 120/172] avg loss 0.000973819, throughput 2.12734K wps
[Epoch 149 Batch 150/172] avg loss 0.000953815, throughput 2.13906K wps
Begin Testing...
[Epoch 149] train avg loss 0.000897014, dev acc 0.8732, dev avg loss 0.43316, throughput 2.12873K wps
[Epoch 150 Batch 30/172] avg loss 0.000822848, throughput 2.16593K wps
[Epoch 150 Batch 60/172] avg loss 0.00107144, throughput 2.14181K wps
[Epoch 150 Batch 90/172] avg loss 0.000840576, throughput 2.14823K wps
[Epoch 150 Batch 120/172] avg loss 0.000933259, throughput 2.12327K wps
[Epoch 150 Batch 150/172] avg loss 0.000740252, throughput 2.12715K wps
Begin Testing...
[Epoch 150] train avg loss 0.000876651, dev acc 0.8742, dev avg loss 0.436568, throughput 2.13833K wps
[Epoch 151 Batch 30/172] avg loss 0.00103915, throughput 2.16746K wps
[Epoch 151 Batch 60/172] avg loss 0.000792719, throughput 2.11859K wps
[Epoch 151 Batch 90/172] avg loss 0.000633522, throughput 2.14524K wps
[Epoch 151 Batch 120/172] avg loss 0.00085938, throughput 2.13994K wps
[Epoch 151 Batch 150/172] avg loss 0.000903195, throughput 2.13954K wps
Begin Testing...
[Epoch 151] train avg loss 0.000876784, dev acc 0.8711, dev avg loss 0.437879, throughput 2.14081K wps
[Epoch 152 Batch 30/172] avg loss 0.000803898, throughput 2.14661K wps
[Epoch 152 Batch 60/172] avg loss 0.000864791, throughput 2.11814K wps
[Epoch 152 Batch 90/172] avg loss 0.000730223, throughput 2.11011K wps
[Epoch 152 Batch 120/172] avg loss 0.000697547, throughput 2.08871K wps
[Epoch 152 Batch 150/172] avg loss 0.00098022, throughput 2.08011K wps
Begin Testing...
[Epoch 152] train avg loss 0.000831095, dev acc 0.8774, dev avg loss 0.447108, throughput 2.10288K wps
[Epoch 153 Batch 30/172] avg loss 0.000937258, throughput 2.16799K wps
[Epoch 153 Batch 60/172] avg loss 0.000822482, throughput 2.11295K wps
[Epoch 153 Batch 90/172] avg loss 0.000828869, throughput 2.12218K wps
[Epoch 153 Batch 120/172] avg loss 0.000970624, throughput 2.1178K wps
[Epoch 153 Batch 150/172] avg loss 0.000990784, throughput 2.10391K wps
Begin Testing...
[Epoch 153] train avg loss 0.00087167, dev acc 0.8816, dev avg loss 0.454947, throughput 2.11881K wps
[Epoch 154 Batch 30/172] avg loss 0.000758737, throughput 2.13811K wps
[Epoch 154 Batch 60/172] avg loss 0.000766522, throughput 2.08907K wps
[Epoch 154 Batch 90/172] avg loss 0.000952524, throughput 2.1029K wps
[Epoch 154 Batch 120/172] avg loss 0.000806593, throughput 2.10205K wps
[Epoch 154 Batch 150/172] avg loss 0.000902469, throughput 2.08377K wps
Begin Testing...
[Epoch 154] train avg loss 0.000859554, dev acc 0.8763, dev avg loss 0.449877, throughput 2.09993K wps
[Epoch 155 Batch 30/172] avg loss 0.000905821, throughput 2.14643K wps
[Epoch 155 Batch 60/172] avg loss 0.000917429, throughput 2.07858K wps
[Epoch 155 Batch 90/172] avg loss 0.00096279, throughput 2.08437K wps
[Epoch 155 Batch 120/172] avg loss 0.000724812, throughput 2.11087K wps
[Epoch 155 Batch 150/172] avg loss 0.000935092, throughput 2.12287K wps
Begin Testing...
[Epoch 155] train avg loss 0.000896819, dev acc 0.8763, dev avg loss 0.444507, throughput 2.11176K wps
[Epoch 156 Batch 30/172] avg loss 0.000755532, throughput 2.12445K wps
[Epoch 156 Batch 60/172] avg loss 0.000895702, throughput 2.07686K wps
[Epoch 156 Batch 90/172] avg loss 0.00093876, throughput 2.11356K wps
[Epoch 156 Batch 120/172] avg loss 0.000931657, throughput 2.13748K wps
[Epoch 156 Batch 150/172] avg loss 0.000877833, throughput 2.13846K wps
Begin Testing...
[Epoch 156] train avg loss 0.000872403, dev acc 0.8742, dev avg loss 0.444459, throughput 2.11257K wps
[Epoch 157 Batch 30/172] avg loss 0.000736697, throughput 2.15635K wps
[Epoch 157 Batch 60/172] avg loss 0.000932675, throughput 2.08362K wps
[Epoch 157 Batch 90/172] avg loss 0.000833718, throughput 2.08104K wps
[Epoch 157 Batch 120/172] avg loss 0.000802433, throughput 2.08691K wps
[Epoch 157 Batch 150/172] avg loss 0.000828162, throughput 2.11089K wps
Begin Testing...
[Epoch 157] train avg loss 0.000825044, dev acc 0.8784, dev avg loss 0.449884, throughput 2.10588K wps
[Epoch 158 Batch 30/172] avg loss 0.000823787, throughput 2.16738K wps
[Epoch 158 Batch 60/172] avg loss 0.000761071, throughput 2.08462K wps
[Epoch 158 Batch 90/172] avg loss 0.000930889, throughput 2.08944K wps
[Epoch 158 Batch 120/172] avg loss 0.000887197, throughput 2.13039K wps
[Epoch 158 Batch 150/172] avg loss 0.000831012, throughput 2.13715K wps
Begin Testing...
[Epoch 158] train avg loss 0.000839754, dev acc 0.8763, dev avg loss 0.455046, throughput 2.11937K wps
[Epoch 159 Batch 30/172] avg loss 0.000791262, throughput 2.17828K wps
[Epoch 159 Batch 60/172] avg loss 0.000747834, throughput 2.09447K wps
[Epoch 159 Batch 90/172] avg loss 0.000777809, throughput 2.12614K wps
[Epoch 159 Batch 120/172] avg loss 0.00102116, throughput 2.12212K wps
[Epoch 159 Batch 150/172] avg loss 0.000729422, throughput 2.08537K wps
Begin Testing...
[Epoch 159] train avg loss 0.000822321, dev acc 0.8774, dev avg loss 0.458874, throughput 2.11608K wps
[Epoch 160 Batch 30/172] avg loss 0.000657192, throughput 2.12715K wps
[Epoch 160 Batch 60/172] avg loss 0.000811216, throughput 2.10544K wps
[Epoch 160 Batch 90/172] avg loss 0.000847087, throughput 2.09453K wps
[Epoch 160 Batch 120/172] avg loss 0.0010568, throughput 2.07601K wps
[Epoch 160 Batch 150/172] avg loss 0.000900688, throughput 2.07717K wps
Begin Testing...
[Epoch 160] train avg loss 0.0008605, dev acc 0.8742, dev avg loss 0.450241, throughput 2.09752K wps
[Epoch 161 Batch 30/172] avg loss 0.000760489, throughput 2.13325K wps
[Epoch 161 Batch 60/172] avg loss 0.000773692, throughput 2.07927K wps
[Epoch 161 Batch 90/172] avg loss 0.000911256, throughput 2.12833K wps
[Epoch 161 Batch 120/172] avg loss 0.000878109, throughput 2.11404K wps
[Epoch 161 Batch 150/172] avg loss 0.000934215, throughput 2.13468K wps
Begin Testing...
[Epoch 161] train avg loss 0.000835644, dev acc 0.8763, dev avg loss 0.45654, throughput 2.11571K wps
[Epoch 162 Batch 30/172] avg loss 0.00092099, throughput 2.18249K wps
[Epoch 162 Batch 60/172] avg loss 0.000793697, throughput 2.12875K wps
[Epoch 162 Batch 90/172] avg loss 0.000953214, throughput 2.08556K wps
[Epoch 162 Batch 120/172] avg loss 0.000779425, throughput 2.0789K wps
[Epoch 162 Batch 150/172] avg loss 0.00079897, throughput 2.07612K wps
Begin Testing...
[Epoch 162] train avg loss 0.000854703, dev acc 0.8753, dev avg loss 0.4531, throughput 2.10556K wps
[Epoch 163 Batch 30/172] avg loss 0.000711143, throughput 2.13226K wps
[Epoch 163 Batch 60/172] avg loss 0.000803171, throughput 2.07546K wps
[Epoch 163 Batch 90/172] avg loss 0.000738949, throughput 2.09213K wps
[Epoch 163 Batch 120/172] avg loss 0.000902912, throughput 2.14803K wps
[Epoch 163 Batch 150/172] avg loss 0.00106591, throughput 2.1428K wps
Begin Testing...
[Epoch 163] train avg loss 0.000837165, dev acc 0.8732, dev avg loss 0.448245, throughput 2.12142K wps
[Epoch 164 Batch 30/172] avg loss 0.000772469, throughput 2.15803K wps
[Epoch 164 Batch 60/172] avg loss 0.000874296, throughput 2.14647K wps
[Epoch 164 Batch 90/172] avg loss 0.000756845, throughput 2.13757K wps
[Epoch 164 Batch 120/172] avg loss 0.000933816, throughput 2.11648K wps
[Epoch 164 Batch 150/172] avg loss 0.000853556, throughput 2.09231K wps
Begin Testing...
[Epoch 164] train avg loss 0.000845551, dev acc 0.8721, dev avg loss 0.452558, throughput 2.12435K wps
[Epoch 165 Batch 30/172] avg loss 0.000748464, throughput 2.14509K wps
[Epoch 165 Batch 60/172] avg loss 0.000917491, throughput 2.07853K wps
[Epoch 165 Batch 90/172] avg loss 0.000732852, throughput 2.13215K wps
[Epoch 165 Batch 120/172] avg loss 0.000714695, throughput 2.12857K wps
[Epoch 165 Batch 150/172] avg loss 0.00090066, throughput 2.08672K wps
Begin Testing...
[Epoch 165] train avg loss 0.000836413, dev acc 0.8721, dev avg loss 0.4568, throughput 2.11634K wps
[Epoch 166 Batch 30/172] avg loss 0.000770367, throughput 2.13902K wps
[Epoch 166 Batch 60/172] avg loss 0.00096964, throughput 2.07155K wps
[Epoch 166 Batch 90/172] avg loss 0.000781666, throughput 2.08281K wps
[Epoch 166 Batch 120/172] avg loss 0.00073442, throughput 2.14295K wps
[Epoch 166 Batch 150/172] avg loss 0.000895184, throughput 2.12601K wps
Begin Testing...
[Epoch 166] train avg loss 0.0008218, dev acc 0.8763, dev avg loss 0.459955, throughput 2.1115K wps
[Epoch 167 Batch 30/172] avg loss 0.000697005, throughput 2.15105K wps
[Epoch 167 Batch 60/172] avg loss 0.000880488, throughput 2.1374K wps
[Epoch 167 Batch 90/172] avg loss 0.000806377, throughput 2.12983K wps
[Epoch 167 Batch 120/172] avg loss 0.000665852, throughput 2.13935K wps
[Epoch 167 Batch 150/172] avg loss 0.000892823, throughput 2.10151K wps
Begin Testing...
[Epoch 167] train avg loss 0.000799874, dev acc 0.8763, dev avg loss 0.461719, throughput 2.13015K wps
[Epoch 168 Batch 30/172] avg loss 0.000717567, throughput 2.15429K wps
[Epoch 168 Batch 60/172] avg loss 0.000762001, throughput 2.10961K wps
[Epoch 168 Batch 90/172] avg loss 0.000848004, throughput 2.08129K wps
[Epoch 168 Batch 120/172] avg loss 0.000891983, throughput 2.07576K wps
[Epoch 168 Batch 150/172] avg loss 0.000910478, throughput 2.11465K wps
Begin Testing...
[Epoch 168] train avg loss 0.000808088, dev acc 0.8763, dev avg loss 0.462033, throughput 2.10943K wps
[Epoch 169 Batch 30/172] avg loss 0.000659392, throughput 2.15261K wps
[Epoch 169 Batch 60/172] avg loss 0.000981528, throughput 2.09633K wps
[Epoch 169 Batch 90/172] avg loss 0.000888195, throughput 2.09685K wps
[Epoch 169 Batch 120/172] avg loss 0.000730898, throughput 2.08054K wps
[Epoch 169 Batch 150/172] avg loss 0.00100105, throughput 2.0787K wps
Begin Testing...
[Epoch 169] train avg loss 0.000836783, dev acc 0.8711, dev avg loss 0.458035, throughput 2.09868K wps
[Epoch 170 Batch 30/172] avg loss 0.000743238, throughput 2.13204K wps
[Epoch 170 Batch 60/172] avg loss 0.000740295, throughput 2.08687K wps
[Epoch 170 Batch 90/172] avg loss 0.000852764, throughput 2.09397K wps
[Epoch 170 Batch 120/172] avg loss 0.000871122, throughput 2.13916K wps
[Epoch 170 Batch 150/172] avg loss 0.000820557, throughput 2.08532K wps
Begin Testing...
[Epoch 170] train avg loss 0.000813664, dev acc 0.8763, dev avg loss 0.463704, throughput 2.10436K wps
[Epoch 171 Batch 30/172] avg loss 0.000696968, throughput 2.14559K wps
[Epoch 171 Batch 60/172] avg loss 0.000750229, throughput 2.08056K wps
[Epoch 171 Batch 90/172] avg loss 0.000788384, throughput 2.07542K wps
[Epoch 171 Batch 120/172] avg loss 0.000939249, throughput 2.06636K wps
[Epoch 171 Batch 150/172] avg loss 0.000907647, throughput 2.09173K wps
Begin Testing...
[Epoch 171] train avg loss 0.000797638, dev acc 0.8732, dev avg loss 0.461305, throughput 2.09599K wps
[Epoch 172 Batch 30/172] avg loss 0.000909921, throughput 2.17835K wps
[Epoch 172 Batch 60/172] avg loss 0.000913522, throughput 2.09293K wps
[Epoch 172 Batch 90/172] avg loss 0.000784123, throughput 2.07599K wps
[Epoch 172 Batch 120/172] avg loss 0.000648909, throughput 2.09082K wps
[Epoch 172 Batch 150/172] avg loss 0.000661211, throughput 2.12339K wps
Begin Testing...
[Epoch 172] train avg loss 0.000809419, dev acc 0.8826, dev avg loss 0.47574, throughput 2.11491K wps
[Epoch 173 Batch 30/172] avg loss 0.000740693, throughput 2.12506K wps
[Epoch 173 Batch 60/172] avg loss 0.000666586, throughput 2.07914K wps
[Epoch 173 Batch 90/172] avg loss 0.000768405, throughput 2.08972K wps
[Epoch 173 Batch 120/172] avg loss 0.000764569, throughput 2.07752K wps
[Epoch 173 Batch 150/172] avg loss 0.000964847, throughput 2.10844K wps
Begin Testing...
[Epoch 173] train avg loss 0.000795, dev acc 0.8784, dev avg loss 0.475364, throughput 2.10076K wps
[Epoch 174 Batch 30/172] avg loss 0.000853093, throughput 2.18482K wps
[Epoch 174 Batch 60/172] avg loss 0.000738594, throughput 2.10566K wps
[Epoch 174 Batch 90/172] avg loss 0.000713593, throughput 2.08472K wps
[Epoch 174 Batch 120/172] avg loss 0.000697542, throughput 2.11423K wps
[Epoch 174 Batch 150/172] avg loss 0.000899245, throughput 2.13913K wps
Begin Testing...
[Epoch 174] train avg loss 0.000766932, dev acc 0.8721, dev avg loss 0.462132, throughput 2.12563K wps
[Epoch 175 Batch 30/172] avg loss 0.000633635, throughput 2.16745K wps
[Epoch 175 Batch 60/172] avg loss 0.000840873, throughput 2.07034K wps
[Epoch 175 Batch 90/172] avg loss 0.000839928, throughput 2.07532K wps
[Epoch 175 Batch 120/172] avg loss 0.000827567, throughput 2.0863K wps
[Epoch 175 Batch 150/172] avg loss 0.000896988, throughput 2.07901K wps
Begin Testing...
[Epoch 175] train avg loss 0.000805197, dev acc 0.8774, dev avg loss 0.473755, throughput 2.10062K wps
[Epoch 176 Batch 30/172] avg loss 0.000791458, throughput 2.1876K wps
[Epoch 176 Batch 60/172] avg loss 0.000758394, throughput 2.08925K wps
[Epoch 176 Batch 90/172] avg loss 0.000807197, throughput 2.08277K wps
[Epoch 176 Batch 120/172] avg loss 0.000948776, throughput 2.08978K wps
[Epoch 176 Batch 150/172] avg loss 0.000625224, throughput 2.0801K wps
Begin Testing...
[Epoch 176] train avg loss 0.000778195, dev acc 0.8721, dev avg loss 0.46757, throughput 2.102K wps
[Epoch 177 Batch 30/172] avg loss 0.000634392, throughput 2.18785K wps
[Epoch 177 Batch 60/172] avg loss 0.000701964, throughput 2.11557K wps
[Epoch 177 Batch 90/172] avg loss 0.000862167, throughput 2.12537K wps
[Epoch 177 Batch 120/172] avg loss 0.00091522, throughput 2.14275K wps
[Epoch 177 Batch 150/172] avg loss 0.000870106, throughput 2.08827K wps
Begin Testing...
[Epoch 177] train avg loss 0.000787437, dev acc 0.8774, dev avg loss 0.469841, throughput 2.131K wps
[Epoch 178 Batch 30/172] avg loss 0.000750634, throughput 2.14202K wps
[Epoch 178 Batch 60/172] avg loss 0.000783882, throughput 2.10841K wps
[Epoch 178 Batch 90/172] avg loss 0.000847756, throughput 2.12967K wps
[Epoch 178 Batch 120/172] avg loss 0.000807763, throughput 2.09324K wps
[Epoch 178 Batch 150/172] avg loss 0.00076912, throughput 2.07678K wps
Begin Testing...
[Epoch 178] train avg loss 0.00078281, dev acc 0.8721, dev avg loss 0.473374, throughput 2.10715K wps
[Epoch 179 Batch 30/172] avg loss 0.000827232, throughput 2.18221K wps
[Epoch 179 Batch 60/172] avg loss 0.0008095, throughput 2.14793K wps
[Epoch 179 Batch 90/172] avg loss 0.000883471, throughput 2.14143K wps
[Epoch 179 Batch 120/172] avg loss 0.000725358, throughput 2.07064K wps
[Epoch 179 Batch 150/172] avg loss 0.000901322, throughput 2.083K wps
Begin Testing...
[Epoch 179] train avg loss 0.000791039, dev acc 0.8700, dev avg loss 0.471241, throughput 2.12005K wps
[Epoch 180 Batch 30/172] avg loss 0.00069395, throughput 2.12525K wps
[Epoch 180 Batch 60/172] avg loss 0.000819435, throughput 2.12756K wps
[Epoch 180 Batch 90/172] avg loss 0.000819557, throughput 2.13803K wps
[Epoch 180 Batch 120/172] avg loss 0.000816762, throughput 2.08057K wps
[Epoch 180 Batch 150/172] avg loss 0.000745737, throughput 2.09568K wps
Begin Testing...
[Epoch 180] train avg loss 0.000763617, dev acc 0.8690, dev avg loss 0.472664, throughput 2.10859K wps
[Epoch 181 Batch 30/172] avg loss 0.000765383, throughput 2.15869K wps
[Epoch 181 Batch 60/172] avg loss 0.000654415, throughput 2.08976K wps
[Epoch 181 Batch 90/172] avg loss 0.000793319, throughput 2.09032K wps
[Epoch 181 Batch 120/172] avg loss 0.000824636, throughput 2.06612K wps
[Epoch 181 Batch 150/172] avg loss 0.000726642, throughput 2.1074K wps
Begin Testing...
[Epoch 181] train avg loss 0.000767647, dev acc 0.8774, dev avg loss 0.478105, throughput 2.10211K wps
[Epoch 182 Batch 30/172] avg loss 0.00078651, throughput 2.15448K wps
[Epoch 182 Batch 60/172] avg loss 0.000746677, throughput 2.09852K wps
[Epoch 182 Batch 90/172] avg loss 0.00068937, throughput 2.10612K wps
[Epoch 182 Batch 120/172] avg loss 0.000773925, throughput 2.06084K wps
[Epoch 182 Batch 150/172] avg loss 0.000753011, throughput 2.07045K wps
Begin Testing...
[Epoch 182] train avg loss 0.000766078, dev acc 0.8690, dev avg loss 0.470962, throughput 2.09801K wps
[Epoch 183 Batch 30/172] avg loss 0.000746414, throughput 2.19105K wps
[Epoch 183 Batch 60/172] avg loss 0.000593784, throughput 2.09187K wps
[Epoch 183 Batch 90/172] avg loss 0.000800605, throughput 2.11415K wps
[Epoch 183 Batch 120/172] avg loss 0.000839988, throughput 2.10671K wps
[Epoch 183 Batch 150/172] avg loss 0.000754097, throughput 2.08534K wps
Begin Testing...
[Epoch 183] train avg loss 0.000789064, dev acc 0.8690, dev avg loss 0.469183, throughput 2.11326K wps
[Epoch 184 Batch 30/172] avg loss 0.000733814, throughput 2.15366K wps
[Epoch 184 Batch 60/172] avg loss 0.000871159, throughput 2.11426K wps
[Epoch 184 Batch 90/172] avg loss 0.0006863, throughput 2.13699K wps
[Epoch 184 Batch 120/172] avg loss 0.000733023, throughput 2.13889K wps
[Epoch 184 Batch 150/172] avg loss 0.000761987, throughput 2.10357K wps
Begin Testing...
[Epoch 184] train avg loss 0.000766791, dev acc 0.8711, dev avg loss 0.471608, throughput 2.1305K wps
[Epoch 185 Batch 30/172] avg loss 0.000823281, throughput 2.15839K wps
[Epoch 185 Batch 60/172] avg loss 0.000647518, throughput 2.12038K wps
[Epoch 185 Batch 90/172] avg loss 0.000744284, throughput 2.1126K wps
[Epoch 185 Batch 120/172] avg loss 0.000684404, throughput 2.09205K wps
[Epoch 185 Batch 150/172] avg loss 0.000804409, throughput 2.10215K wps
Begin Testing...
[Epoch 185] train avg loss 0.000758632, dev acc 0.8732, dev avg loss 0.477875, throughput 2.12024K wps
[Epoch 186 Batch 30/172] avg loss 0.000874337, throughput 2.18401K wps
[Epoch 186 Batch 60/172] avg loss 0.000873454, throughput 2.10899K wps
[Epoch 186 Batch 90/172] avg loss 0.000738837, throughput 2.09494K wps
[Epoch 186 Batch 120/172] avg loss 0.000730162, throughput 2.10306K wps
[Epoch 186 Batch 150/172] avg loss 0.000663466, throughput 2.09417K wps
Begin Testing...
[Epoch 186] train avg loss 0.000762156, dev acc 0.8847, dev avg loss 0.493958, throughput 2.113K wps
[Epoch 187 Batch 30/172] avg loss 0.000642243, throughput 2.1731K wps
[Epoch 187 Batch 60/172] avg loss 0.000816126, throughput 2.10174K wps
[Epoch 187 Batch 90/172] avg loss 0.000819585, throughput 2.07344K wps
[Epoch 187 Batch 120/172] avg loss 0.000677496, throughput 2.07665K wps
[Epoch 187 Batch 150/172] avg loss 0.000641962, throughput 2.08931K wps
Begin Testing...
[Epoch 187] train avg loss 0.000716895, dev acc 0.8732, dev avg loss 0.473018, throughput 2.10448K wps
[Epoch 188 Batch 30/172] avg loss 0.000710983, throughput 2.14758K wps
[Epoch 188 Batch 60/172] avg loss 0.000721332, throughput 2.07407K wps
[Epoch 188 Batch 90/172] avg loss 0.000745612, throughput 2.07422K wps
[Epoch 188 Batch 120/172] avg loss 0.000788089, throughput 2.14256K wps
[Epoch 188 Batch 150/172] avg loss 0.00065193, throughput 2.09619K wps
Begin Testing...
[Epoch 188] train avg loss 0.000727216, dev acc 0.8784, dev avg loss 0.488283, throughput 2.1046K wps
[Epoch 189 Batch 30/172] avg loss 0.000697306, throughput 2.15312K wps
[Epoch 189 Batch 60/172] avg loss 0.000737116, throughput 2.13826K wps
[Epoch 189 Batch 90/172] avg loss 0.000667169, throughput 2.12655K wps
[Epoch 189 Batch 120/172] avg loss 0.000826968, throughput 2.13394K wps
[Epoch 189 Batch 150/172] avg loss 0.000995811, throughput 2.08969K wps
Begin Testing...
[Epoch 189] train avg loss 0.000772416, dev acc 0.8732, dev avg loss 0.479292, throughput 2.12563K wps
[Epoch 190 Batch 30/172] avg loss 0.000728169, throughput 2.13874K wps
[Epoch 190 Batch 60/172] avg loss 0.000622768, throughput 2.14181K wps
[Epoch 190 Batch 90/172] avg loss 0.000672008, throughput 2.10189K wps
[Epoch 190 Batch 120/172] avg loss 0.00074369, throughput 2.10206K wps
[Epoch 190 Batch 150/172] avg loss 0.000838297, throughput 2.13677K wps
Begin Testing...
[Epoch 190] train avg loss 0.000746438, dev acc 0.8711, dev avg loss 0.478389, throughput 2.12557K wps
[Epoch 191 Batch 30/172] avg loss 0.000597376, throughput 2.16429K wps
[Epoch 191 Batch 60/172] avg loss 0.000698271, throughput 2.1282K wps
[Epoch 191 Batch 90/172] avg loss 0.000608367, throughput 2.14196K wps
[Epoch 191 Batch 120/172] avg loss 0.000803739, throughput 2.11191K wps
[Epoch 191 Batch 150/172] avg loss 0.000849503, throughput 2.14184K wps
Begin Testing...
[Epoch 191] train avg loss 0.000733477, dev acc 0.8784, dev avg loss 0.48739, throughput 2.13819K wps
[Epoch 192 Batch 30/172] avg loss 0.000645987, throughput 2.15824K wps
[Epoch 192 Batch 60/172] avg loss 0.000785349, throughput 2.08044K wps
[Epoch 192 Batch 90/172] avg loss 0.000562399, throughput 2.10483K wps
[Epoch 192 Batch 120/172] avg loss 0.00080159, throughput 2.09227K wps
[Epoch 192 Batch 150/172] avg loss 0.000637369, throughput 2.09259K wps
Begin Testing...
[Epoch 192] train avg loss 0.000711344, dev acc 0.8774, dev avg loss 0.490041, throughput 2.10999K wps
[Epoch 193 Batch 30/172] avg loss 0.000780814, throughput 2.18852K wps
[Epoch 193 Batch 60/172] avg loss 0.000671746, throughput 2.08397K wps
[Epoch 193 Batch 90/172] avg loss 0.000831882, throughput 2.0837K wps
[Epoch 193 Batch 120/172] avg loss 0.000650776, throughput 2.08677K wps
[Epoch 193 Batch 150/172] avg loss 0.000814032, throughput 2.13768K wps
Begin Testing...
[Epoch 193] train avg loss 0.000731144, dev acc 0.8795, dev avg loss 0.492928, throughput 2.1156K wps
[Epoch 194 Batch 30/172] avg loss 0.000635121, throughput 2.12021K wps
[Epoch 194 Batch 60/172] avg loss 0.0008721, throughput 2.1142K wps
[Epoch 194 Batch 90/172] avg loss 0.000663379, throughput 2.13914K wps
[Epoch 194 Batch 120/172] avg loss 0.000640906, throughput 2.14148K wps
[Epoch 194 Batch 150/172] avg loss 0.00073256, throughput 2.14664K wps
Begin Testing...
[Epoch 194] train avg loss 0.00073014, dev acc 0.8795, dev avg loss 0.491836, throughput 2.13237K wps
[Epoch 195 Batch 30/172] avg loss 0.000789196, throughput 2.13227K wps
[Epoch 195 Batch 60/172] avg loss 0.000638794, throughput 2.09604K wps
[Epoch 195 Batch 90/172] avg loss 0.000758205, throughput 2.10163K wps
[Epoch 195 Batch 120/172] avg loss 0.000625695, throughput 2.09351K wps
[Epoch 195 Batch 150/172] avg loss 0.00065706, throughput 2.08056K wps
Begin Testing...
[Epoch 195] train avg loss 0.000711875, dev acc 0.8742, dev avg loss 0.48755, throughput 2.09768K wps
[Epoch 196 Batch 30/172] avg loss 0.000661204, throughput 2.15639K wps
[Epoch 196 Batch 60/172] avg loss 0.000864016, throughput 2.13855K wps
[Epoch 196 Batch 90/172] avg loss 0.000691691, throughput 2.13222K wps
[Epoch 196 Batch 120/172] avg loss 0.000733598, throughput 2.14833K wps
[Epoch 196 Batch 150/172] avg loss 0.000598535, throughput 2.13055K wps
Begin Testing...
[Epoch 196] train avg loss 0.000711048, dev acc 0.8711, dev avg loss 0.48612, throughput 2.1417K wps
[Epoch 197 Batch 30/172] avg loss 0.000537967, throughput 2.17584K wps
[Epoch 197 Batch 60/172] avg loss 0.000711847, throughput 2.1426K wps
[Epoch 197 Batch 90/172] avg loss 0.000889088, throughput 2.14276K wps
[Epoch 197 Batch 120/172] avg loss 0.000792676, throughput 2.11273K wps
[Epoch 197 Batch 150/172] avg loss 0.000707445, throughput 2.08147K wps
Begin Testing...
[Epoch 197] train avg loss 0.000706879, dev acc 0.8700, dev avg loss 0.488225, throughput 2.12642K wps
[Epoch 198 Batch 30/172] avg loss 0.000804379, throughput 2.17611K wps
[Epoch 198 Batch 60/172] avg loss 0.00085825, throughput 2.14091K wps
[Epoch 198 Batch 90/172] avg loss 0.000517583, throughput 2.14423K wps
[Epoch 198 Batch 120/172] avg loss 0.000730211, throughput 2.10266K wps
[Epoch 198 Batch 150/172] avg loss 0.000797765, throughput 2.08804K wps
Begin Testing...
[Epoch 198] train avg loss 0.000732061, dev acc 0.8711, dev avg loss 0.490363, throughput 2.124K wps
[Epoch 199 Batch 30/172] avg loss 0.000929436, throughput 2.17695K wps
[Epoch 199 Batch 60/172] avg loss 0.000655374, throughput 2.12109K wps
[Epoch 199 Batch 90/172] avg loss 0.000604322, throughput 2.11488K wps
[Epoch 199 Batch 120/172] avg loss 0.000737799, throughput 2.10251K wps
[Epoch 199 Batch 150/172] avg loss 0.000728083, throughput 2.10518K wps
Begin Testing...
[Epoch 199] train avg loss 0.00073982, dev acc 0.8742, dev avg loss 0.489839, throughput 2.12112K wps
Test loss 0.254742, test acc 0.8962
Total time cost 622.87s
[Epoch 0 Batch 30/172] avg loss 0.0130187, throughput 1.86761K wps
[Epoch 0 Batch 60/172] avg loss 0.0125502, throughput 2.08139K wps
[Epoch 0 Batch 90/172] avg loss 0.0122762, throughput 2.10293K wps
[Epoch 0 Batch 120/172] avg loss 0.0119577, throughput 2.14081K wps
[Epoch 0 Batch 150/172] avg loss 0.0118344, throughput 2.14617K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123318, dev acc 0.7044, dev avg loss 0.579781, throughput 2.07389K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0117283, throughput 2.18732K wps
[Epoch 1 Batch 60/172] avg loss 0.0118073, throughput 2.09398K wps
[Epoch 1 Batch 90/172] avg loss 0.0116526, throughput 2.10384K wps
[Epoch 1 Batch 120/172] avg loss 0.0115149, throughput 2.14987K wps
[Epoch 1 Batch 150/172] avg loss 0.0117798, throughput 2.14246K wps
Begin Testing...
[Epoch 1] train avg loss 0.0117142, dev acc 0.7044, dev avg loss 0.557724, throughput 2.13506K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0113558, throughput 2.16893K wps
[Epoch 2 Batch 60/172] avg loss 0.0112787, throughput 2.09405K wps
[Epoch 2 Batch 90/172] avg loss 0.0109429, throughput 2.09048K wps
[Epoch 2 Batch 120/172] avg loss 0.0110744, throughput 2.1516K wps
[Epoch 2 Batch 150/172] avg loss 0.0108375, throughput 2.15424K wps
Begin Testing...
[Epoch 2] train avg loss 0.0111021, dev acc 0.7254, dev avg loss 0.523795, throughput 2.13334K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0109483, throughput 2.15716K wps
[Epoch 3 Batch 60/172] avg loss 0.0103484, throughput 2.1156K wps
[Epoch 3 Batch 90/172] avg loss 0.0103689, throughput 2.08113K wps
[Epoch 3 Batch 120/172] avg loss 0.0101156, throughput 2.08826K wps
[Epoch 3 Batch 150/172] avg loss 0.0098777, throughput 2.12742K wps
Begin Testing...
[Epoch 3] train avg loss 0.0103222, dev acc 0.7589, dev avg loss 0.484641, throughput 2.11815K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0100677, throughput 2.1526K wps
[Epoch 4 Batch 60/172] avg loss 0.00955831, throughput 2.091K wps
[Epoch 4 Batch 90/172] avg loss 0.00931945, throughput 2.1545K wps
[Epoch 4 Batch 120/172] avg loss 0.00942328, throughput 2.13992K wps
[Epoch 4 Batch 150/172] avg loss 0.00935092, throughput 2.08348K wps
Begin Testing...
[Epoch 4] train avg loss 0.00947031, dev acc 0.8103, dev avg loss 0.44572, throughput 2.12026K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.00909783, throughput 2.14081K wps
[Epoch 5 Batch 60/172] avg loss 0.00899985, throughput 2.13063K wps
[Epoch 5 Batch 90/172] avg loss 0.00886093, throughput 2.1079K wps
[Epoch 5 Batch 120/172] avg loss 0.00854896, throughput 2.0962K wps
[Epoch 5 Batch 150/172] avg loss 0.00827546, throughput 2.13763K wps
Begin Testing...
[Epoch 5] train avg loss 0.00870082, dev acc 0.8491, dev avg loss 0.410319, throughput 2.12218K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.00813637, throughput 2.19031K wps
[Epoch 6 Batch 60/172] avg loss 0.00833676, throughput 2.14125K wps
[Epoch 6 Batch 90/172] avg loss 0.00813817, throughput 2.1514K wps
[Epoch 6 Batch 120/172] avg loss 0.007815, throughput 2.09315K wps
[Epoch 6 Batch 150/172] avg loss 0.00792816, throughput 2.08713K wps
Begin Testing...
[Epoch 6] train avg loss 0.00799832, dev acc 0.8585, dev avg loss 0.380563, throughput 2.12536K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00749914, throughput 2.13798K wps
[Epoch 7 Batch 60/172] avg loss 0.00760827, throughput 2.10398K wps
[Epoch 7 Batch 90/172] avg loss 0.00758292, throughput 2.10788K wps
[Epoch 7 Batch 120/172] avg loss 0.00730489, throughput 2.10131K wps
[Epoch 7 Batch 150/172] avg loss 0.00724531, throughput 2.06824K wps
Begin Testing...
[Epoch 7] train avg loss 0.00741177, dev acc 0.8669, dev avg loss 0.360551, throughput 2.10578K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00703204, throughput 2.16148K wps
[Epoch 8 Batch 60/172] avg loss 0.00735499, throughput 2.08041K wps
[Epoch 8 Batch 90/172] avg loss 0.00689023, throughput 2.10474K wps
[Epoch 8 Batch 120/172] avg loss 0.00670118, throughput 2.09308K wps
[Epoch 8 Batch 150/172] avg loss 0.0067212, throughput 2.1008K wps
Begin Testing...
[Epoch 8] train avg loss 0.00693052, dev acc 0.8679, dev avg loss 0.344564, throughput 2.1044K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.0068004, throughput 2.17326K wps
[Epoch 9 Batch 60/172] avg loss 0.00657555, throughput 2.15087K wps
[Epoch 9 Batch 90/172] avg loss 0.00686983, throughput 2.15354K wps
[Epoch 9 Batch 120/172] avg loss 0.00633009, throughput 2.14815K wps
[Epoch 9 Batch 150/172] avg loss 0.00632875, throughput 2.09556K wps
Begin Testing...
[Epoch 9] train avg loss 0.0065906, dev acc 0.8721, dev avg loss 0.329024, throughput 2.14246K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00636039, throughput 2.15719K wps
[Epoch 10 Batch 60/172] avg loss 0.0056433, throughput 2.1412K wps
[Epoch 10 Batch 90/172] avg loss 0.00663009, throughput 2.13357K wps
[Epoch 10 Batch 120/172] avg loss 0.00638625, throughput 2.14843K wps
[Epoch 10 Batch 150/172] avg loss 0.00650946, throughput 2.14843K wps
Begin Testing...
[Epoch 10] train avg loss 0.00636446, dev acc 0.8711, dev avg loss 0.321459, throughput 2.14161K wps
[Epoch 11 Batch 30/172] avg loss 0.00620346, throughput 2.13758K wps
[Epoch 11 Batch 60/172] avg loss 0.00608449, throughput 2.13869K wps
[Epoch 11 Batch 90/172] avg loss 0.00591849, throughput 2.14713K wps
[Epoch 11 Batch 120/172] avg loss 0.00616115, throughput 2.10045K wps
[Epoch 11 Batch 150/172] avg loss 0.00610602, throughput 2.11932K wps
Begin Testing...
[Epoch 11] train avg loss 0.00610859, dev acc 0.8732, dev avg loss 0.31344, throughput 2.12296K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00625322, throughput 2.19121K wps
[Epoch 12 Batch 60/172] avg loss 0.00636274, throughput 2.141K wps
[Epoch 12 Batch 90/172] avg loss 0.00553183, throughput 2.12171K wps
[Epoch 12 Batch 120/172] avg loss 0.0058653, throughput 2.13138K wps
[Epoch 12 Batch 150/172] avg loss 0.00536344, throughput 2.14473K wps
Begin Testing...
[Epoch 12] train avg loss 0.00589106, dev acc 0.8763, dev avg loss 0.307228, throughput 2.14591K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00598983, throughput 2.12449K wps
[Epoch 13 Batch 60/172] avg loss 0.00603802, throughput 2.12712K wps
[Epoch 13 Batch 90/172] avg loss 0.0059006, throughput 2.14696K wps
[Epoch 13 Batch 120/172] avg loss 0.00551551, throughput 2.15265K wps
[Epoch 13 Batch 150/172] avg loss 0.00572455, throughput 2.10508K wps
Begin Testing...
[Epoch 13] train avg loss 0.00576024, dev acc 0.8795, dev avg loss 0.302538, throughput 2.12532K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00544738, throughput 2.17434K wps
[Epoch 14 Batch 60/172] avg loss 0.00549978, throughput 2.15499K wps
[Epoch 14 Batch 90/172] avg loss 0.00599145, throughput 2.14489K wps
[Epoch 14 Batch 120/172] avg loss 0.00555924, throughput 2.1079K wps
[Epoch 14 Batch 150/172] avg loss 0.00594619, throughput 2.15094K wps
Begin Testing...
[Epoch 14] train avg loss 0.005599, dev acc 0.8805, dev avg loss 0.298756, throughput 2.14655K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00542325, throughput 2.13867K wps
[Epoch 15 Batch 60/172] avg loss 0.00534179, throughput 2.09976K wps
[Epoch 15 Batch 90/172] avg loss 0.00517598, throughput 2.15228K wps
[Epoch 15 Batch 120/172] avg loss 0.00560748, throughput 2.14151K wps
[Epoch 15 Batch 150/172] avg loss 0.00539529, throughput 2.1489K wps
Begin Testing...
[Epoch 15] train avg loss 0.00544393, dev acc 0.8826, dev avg loss 0.295353, throughput 2.13599K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00531226, throughput 2.13841K wps
[Epoch 16 Batch 60/172] avg loss 0.00534393, throughput 2.09501K wps
[Epoch 16 Batch 90/172] avg loss 0.00538523, throughput 2.10148K wps
[Epoch 16 Batch 120/172] avg loss 0.00512575, throughput 2.14125K wps
[Epoch 16 Batch 150/172] avg loss 0.00494463, throughput 2.1483K wps
Begin Testing...
[Epoch 16] train avg loss 0.005273, dev acc 0.8826, dev avg loss 0.292581, throughput 2.12524K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.0054931, throughput 2.15343K wps
[Epoch 17 Batch 60/172] avg loss 0.00463153, throughput 2.08968K wps
[Epoch 17 Batch 90/172] avg loss 0.00503542, throughput 2.13906K wps
[Epoch 17 Batch 120/172] avg loss 0.00561408, throughput 2.15013K wps
[Epoch 17 Batch 150/172] avg loss 0.00503371, throughput 2.13757K wps
Begin Testing...
[Epoch 17] train avg loss 0.00516604, dev acc 0.8816, dev avg loss 0.290239, throughput 2.13552K wps
[Epoch 18 Batch 30/172] avg loss 0.0049645, throughput 2.13576K wps
[Epoch 18 Batch 60/172] avg loss 0.00491396, throughput 2.15294K wps
[Epoch 18 Batch 90/172] avg loss 0.00490828, throughput 2.11615K wps
[Epoch 18 Batch 120/172] avg loss 0.00531319, throughput 2.13945K wps
[Epoch 18 Batch 150/172] avg loss 0.00490571, throughput 2.12987K wps
Begin Testing...
[Epoch 18] train avg loss 0.00505383, dev acc 0.8868, dev avg loss 0.288452, throughput 2.1364K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00509044, throughput 2.15002K wps
[Epoch 19 Batch 60/172] avg loss 0.00528848, throughput 2.14001K wps
[Epoch 19 Batch 90/172] avg loss 0.00481879, throughput 2.08319K wps
[Epoch 19 Batch 120/172] avg loss 0.0048641, throughput 2.08971K wps
[Epoch 19 Batch 150/172] avg loss 0.00489437, throughput 2.13893K wps
Begin Testing...
[Epoch 19] train avg loss 0.00495195, dev acc 0.8868, dev avg loss 0.28673, throughput 2.11784K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00552383, throughput 2.18344K wps
[Epoch 20 Batch 60/172] avg loss 0.0049493, throughput 2.1444K wps
[Epoch 20 Batch 90/172] avg loss 0.0045667, throughput 2.13917K wps
[Epoch 20 Batch 120/172] avg loss 0.00450282, throughput 2.14699K wps
[Epoch 20 Batch 150/172] avg loss 0.00455337, throughput 2.13632K wps
Begin Testing...
[Epoch 20] train avg loss 0.00485917, dev acc 0.8857, dev avg loss 0.285725, throughput 2.14443K wps
[Epoch 21 Batch 30/172] avg loss 0.00482265, throughput 2.1635K wps
[Epoch 21 Batch 60/172] avg loss 0.00472242, throughput 2.11656K wps
[Epoch 21 Batch 90/172] avg loss 0.00466871, throughput 2.09642K wps
[Epoch 21 Batch 120/172] avg loss 0.00472584, throughput 2.09122K wps
[Epoch 21 Batch 150/172] avg loss 0.00429384, throughput 2.07671K wps
Begin Testing...
[Epoch 21] train avg loss 0.00472111, dev acc 0.8878, dev avg loss 0.284324, throughput 2.10591K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00441317, throughput 2.14325K wps
[Epoch 22 Batch 60/172] avg loss 0.00460547, throughput 2.1185K wps
[Epoch 22 Batch 90/172] avg loss 0.00478907, throughput 2.08737K wps
[Epoch 22 Batch 120/172] avg loss 0.00456175, throughput 2.14011K wps
[Epoch 22 Batch 150/172] avg loss 0.00490581, throughput 2.15281K wps
Begin Testing...
[Epoch 22] train avg loss 0.00467257, dev acc 0.8836, dev avg loss 0.283291, throughput 2.1308K wps
[Epoch 23 Batch 30/172] avg loss 0.00441923, throughput 2.13122K wps
[Epoch 23 Batch 60/172] avg loss 0.00471353, throughput 2.13654K wps
[Epoch 23 Batch 90/172] avg loss 0.00473168, throughput 2.14547K wps
[Epoch 23 Batch 120/172] avg loss 0.0044292, throughput 2.14944K wps
[Epoch 23 Batch 150/172] avg loss 0.00446003, throughput 2.14849K wps
Begin Testing...
[Epoch 23] train avg loss 0.00456122, dev acc 0.8878, dev avg loss 0.283358, throughput 2.14236K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00467222, throughput 2.14464K wps
[Epoch 24 Batch 60/172] avg loss 0.0041959, throughput 2.09331K wps
[Epoch 24 Batch 90/172] avg loss 0.0044124, throughput 2.10002K wps
[Epoch 24 Batch 120/172] avg loss 0.00440171, throughput 2.08585K wps
[Epoch 24 Batch 150/172] avg loss 0.00472594, throughput 2.10384K wps
Begin Testing...
[Epoch 24] train avg loss 0.00446947, dev acc 0.8868, dev avg loss 0.282051, throughput 2.11079K wps
[Epoch 25 Batch 30/172] avg loss 0.00418988, throughput 2.18117K wps
[Epoch 25 Batch 60/172] avg loss 0.00467737, throughput 2.14328K wps
[Epoch 25 Batch 90/172] avg loss 0.00416571, throughput 2.15022K wps
[Epoch 25 Batch 120/172] avg loss 0.00448345, throughput 2.10725K wps
[Epoch 25 Batch 150/172] avg loss 0.00423433, throughput 2.08577K wps
Begin Testing...
[Epoch 25] train avg loss 0.0043433, dev acc 0.8868, dev avg loss 0.281886, throughput 2.12809K wps
[Epoch 26 Batch 30/172] avg loss 0.00399257, throughput 2.1538K wps
[Epoch 26 Batch 60/172] avg loss 0.00413228, throughput 2.08333K wps
[Epoch 26 Batch 90/172] avg loss 0.00479992, throughput 2.08614K wps
[Epoch 26 Batch 120/172] avg loss 0.00430579, throughput 2.09745K wps
[Epoch 26 Batch 150/172] avg loss 0.00439344, throughput 2.15142K wps
Begin Testing...
[Epoch 26] train avg loss 0.00428301, dev acc 0.8889, dev avg loss 0.280637, throughput 2.11592K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.00379508, throughput 2.18456K wps
[Epoch 27 Batch 60/172] avg loss 0.00378307, throughput 2.1093K wps
[Epoch 27 Batch 90/172] avg loss 0.00446441, throughput 2.08976K wps
[Epoch 27 Batch 120/172] avg loss 0.00451192, throughput 2.09263K wps
[Epoch 27 Batch 150/172] avg loss 0.00442827, throughput 2.10746K wps
Begin Testing...
[Epoch 27] train avg loss 0.00418836, dev acc 0.8868, dev avg loss 0.280282, throughput 2.11537K wps
[Epoch 28 Batch 30/172] avg loss 0.00392724, throughput 2.19258K wps
[Epoch 28 Batch 60/172] avg loss 0.00400272, throughput 2.14352K wps
[Epoch 28 Batch 90/172] avg loss 0.00435495, throughput 2.11409K wps
[Epoch 28 Batch 120/172] avg loss 0.0041367, throughput 2.08254K wps
[Epoch 28 Batch 150/172] avg loss 0.00397004, throughput 2.14839K wps
Begin Testing...
[Epoch 28] train avg loss 0.00412923, dev acc 0.8836, dev avg loss 0.280006, throughput 2.1381K wps
[Epoch 29 Batch 30/172] avg loss 0.00421693, throughput 2.19919K wps
[Epoch 29 Batch 60/172] avg loss 0.0038676, throughput 2.15263K wps
[Epoch 29 Batch 90/172] avg loss 0.00428414, throughput 2.13499K wps
[Epoch 29 Batch 120/172] avg loss 0.00404092, throughput 2.08958K wps
[Epoch 29 Batch 150/172] avg loss 0.00358083, throughput 2.14257K wps
Begin Testing...
[Epoch 29] train avg loss 0.00401664, dev acc 0.8857, dev avg loss 0.280926, throughput 2.14438K wps
[Epoch 30 Batch 30/172] avg loss 0.00389227, throughput 2.18129K wps
[Epoch 30 Batch 60/172] avg loss 0.00375786, throughput 2.10578K wps
[Epoch 30 Batch 90/172] avg loss 0.00388128, throughput 2.13177K wps
[Epoch 30 Batch 120/172] avg loss 0.00442859, throughput 2.13712K wps
[Epoch 30 Batch 150/172] avg loss 0.00362452, throughput 2.14064K wps
Begin Testing...
[Epoch 30] train avg loss 0.00391299, dev acc 0.8857, dev avg loss 0.280079, throughput 2.13916K wps
[Epoch 31 Batch 30/172] avg loss 0.00391445, throughput 2.17638K wps
[Epoch 31 Batch 60/172] avg loss 0.00381152, throughput 2.1055K wps
[Epoch 31 Batch 90/172] avg loss 0.00381127, throughput 2.09543K wps
[Epoch 31 Batch 120/172] avg loss 0.00396446, throughput 2.13504K wps
[Epoch 31 Batch 150/172] avg loss 0.00381635, throughput 2.13069K wps
Begin Testing...
[Epoch 31] train avg loss 0.00388813, dev acc 0.8868, dev avg loss 0.28063, throughput 2.13057K wps
[Epoch 32 Batch 30/172] avg loss 0.0039041, throughput 2.14423K wps
[Epoch 32 Batch 60/172] avg loss 0.00374188, throughput 2.1443K wps
[Epoch 32 Batch 90/172] avg loss 0.00338015, throughput 2.12702K wps
[Epoch 32 Batch 120/172] avg loss 0.00397545, throughput 2.0729K wps
[Epoch 32 Batch 150/172] avg loss 0.00397772, throughput 2.07992K wps
Begin Testing...
[Epoch 32] train avg loss 0.00380456, dev acc 0.8868, dev avg loss 0.280202, throughput 2.10955K wps
[Epoch 33 Batch 30/172] avg loss 0.00396255, throughput 2.10749K wps
[Epoch 33 Batch 60/172] avg loss 0.00363704, throughput 2.08966K wps
[Epoch 33 Batch 90/172] avg loss 0.00374084, throughput 2.10002K wps
[Epoch 33 Batch 120/172] avg loss 0.00358014, throughput 2.14131K wps
[Epoch 33 Batch 150/172] avg loss 0.00364383, throughput 2.14727K wps
Begin Testing...
[Epoch 33] train avg loss 0.00372367, dev acc 0.8878, dev avg loss 0.280788, throughput 2.12015K wps
[Epoch 34 Batch 30/172] avg loss 0.00342923, throughput 2.10893K wps
[Epoch 34 Batch 60/172] avg loss 0.00339666, throughput 2.09961K wps
[Epoch 34 Batch 90/172] avg loss 0.00387205, throughput 2.14251K wps
[Epoch 34 Batch 120/172] avg loss 0.00342253, throughput 2.13998K wps
[Epoch 34 Batch 150/172] avg loss 0.00388156, throughput 2.149K wps
Begin Testing...
[Epoch 34] train avg loss 0.00360028, dev acc 0.8878, dev avg loss 0.281524, throughput 2.13016K wps
[Epoch 35 Batch 30/172] avg loss 0.00335986, throughput 2.12681K wps
[Epoch 35 Batch 60/172] avg loss 0.00356274, throughput 2.13268K wps
[Epoch 35 Batch 90/172] avg loss 0.00326689, throughput 2.12683K wps
[Epoch 35 Batch 120/172] avg loss 0.00357597, throughput 2.08709K wps
[Epoch 35 Batch 150/172] avg loss 0.00389832, throughput 2.09448K wps
Begin Testing...
[Epoch 35] train avg loss 0.00354724, dev acc 0.8878, dev avg loss 0.281793, throughput 2.11738K wps
[Epoch 36 Batch 30/172] avg loss 0.00345655, throughput 2.15207K wps
[Epoch 36 Batch 60/172] avg loss 0.00338364, throughput 2.07662K wps
[Epoch 36 Batch 90/172] avg loss 0.00336912, throughput 2.1225K wps
[Epoch 36 Batch 120/172] avg loss 0.00331491, throughput 2.10594K wps
[Epoch 36 Batch 150/172] avg loss 0.0035169, throughput 2.14457K wps
Begin Testing...
[Epoch 36] train avg loss 0.00343733, dev acc 0.8889, dev avg loss 0.282236, throughput 2.12081K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00332692, throughput 2.15669K wps
[Epoch 37 Batch 60/172] avg loss 0.00350158, throughput 2.12682K wps
[Epoch 37 Batch 90/172] avg loss 0.00330146, throughput 2.09821K wps
[Epoch 37 Batch 120/172] avg loss 0.00357772, throughput 2.14706K wps
[Epoch 37 Batch 150/172] avg loss 0.00311475, throughput 2.14305K wps
Begin Testing...
[Epoch 37] train avg loss 0.00335986, dev acc 0.8889, dev avg loss 0.282662, throughput 2.12824K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/172] avg loss 0.00346488, throughput 2.13835K wps
[Epoch 38 Batch 60/172] avg loss 0.0035615, throughput 2.08258K wps
[Epoch 38 Batch 90/172] avg loss 0.00299576, throughput 2.11431K wps
[Epoch 38 Batch 120/172] avg loss 0.00330997, throughput 2.076K wps
[Epoch 38 Batch 150/172] avg loss 0.00327691, throughput 2.10694K wps
Begin Testing...
[Epoch 38] train avg loss 0.0032976, dev acc 0.8836, dev avg loss 0.286145, throughput 2.10814K wps
[Epoch 39 Batch 30/172] avg loss 0.00379904, throughput 2.16247K wps
[Epoch 39 Batch 60/172] avg loss 0.00328289, throughput 2.09186K wps
[Epoch 39 Batch 90/172] avg loss 0.00345905, throughput 2.01583K wps
[Epoch 39 Batch 120/172] avg loss 0.00309442, throughput 2.09269K wps
[Epoch 39 Batch 150/172] avg loss 0.00294599, throughput 2.14779K wps
Begin Testing...
[Epoch 39] train avg loss 0.00326988, dev acc 0.8910, dev avg loss 0.284156, throughput 2.10573K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/172] avg loss 0.00311499, throughput 2.15767K wps
[Epoch 40 Batch 60/172] avg loss 0.00315679, throughput 2.08586K wps
[Epoch 40 Batch 90/172] avg loss 0.00327291, throughput 2.10079K wps
[Epoch 40 Batch 120/172] avg loss 0.00329866, throughput 2.13523K wps
[Epoch 40 Batch 150/172] avg loss 0.00344389, throughput 2.12822K wps
Begin Testing...
[Epoch 40] train avg loss 0.00321626, dev acc 0.8910, dev avg loss 0.286209, throughput 2.12483K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/172] avg loss 0.00325997, throughput 2.14458K wps
[Epoch 41 Batch 60/172] avg loss 0.00293169, throughput 2.08982K wps
[Epoch 41 Batch 90/172] avg loss 0.00325758, throughput 2.10811K wps
[Epoch 41 Batch 120/172] avg loss 0.00299738, throughput 2.10625K wps
[Epoch 41 Batch 150/172] avg loss 0.0030961, throughput 2.07643K wps
Begin Testing...
[Epoch 41] train avg loss 0.00312989, dev acc 0.8910, dev avg loss 0.28673, throughput 2.1044K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00302987, throughput 2.16996K wps
[Epoch 42 Batch 60/172] avg loss 0.00338617, throughput 2.13302K wps
[Epoch 42 Batch 90/172] avg loss 0.00281867, throughput 2.14916K wps
[Epoch 42 Batch 120/172] avg loss 0.00294446, throughput 2.10221K wps
[Epoch 42 Batch 150/172] avg loss 0.00321678, throughput 2.09294K wps
Begin Testing...
[Epoch 42] train avg loss 0.0030871, dev acc 0.8910, dev avg loss 0.288484, throughput 2.12273K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/172] avg loss 0.00307799, throughput 2.13489K wps
[Epoch 43 Batch 60/172] avg loss 0.00287233, throughput 2.13776K wps
[Epoch 43 Batch 90/172] avg loss 0.00309004, throughput 2.14164K wps
[Epoch 43 Batch 120/172] avg loss 0.003007, throughput 2.09907K wps
[Epoch 43 Batch 150/172] avg loss 0.00292298, throughput 2.14334K wps
Begin Testing...
[Epoch 43] train avg loss 0.00299758, dev acc 0.8899, dev avg loss 0.2911, throughput 2.13274K wps
[Epoch 44 Batch 30/172] avg loss 0.0030913, throughput 2.16762K wps
[Epoch 44 Batch 60/172] avg loss 0.00312008, throughput 2.15251K wps
[Epoch 44 Batch 90/172] avg loss 0.00276232, throughput 2.14746K wps
[Epoch 44 Batch 120/172] avg loss 0.0029594, throughput 2.15139K wps
[Epoch 44 Batch 150/172] avg loss 0.00301783, throughput 2.14822K wps
Begin Testing...
[Epoch 44] train avg loss 0.00294026, dev acc 0.8910, dev avg loss 0.28925, throughput 2.15361K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00272778, throughput 2.18688K wps
[Epoch 45 Batch 60/172] avg loss 0.00300964, throughput 2.14258K wps
[Epoch 45 Batch 90/172] avg loss 0.002702, throughput 2.14071K wps
[Epoch 45 Batch 120/172] avg loss 0.00304039, throughput 2.1348K wps
[Epoch 45 Batch 150/172] avg loss 0.00282467, throughput 2.13709K wps
Begin Testing...
[Epoch 45] train avg loss 0.00285991, dev acc 0.8920, dev avg loss 0.290696, throughput 2.14486K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00249467, throughput 2.14306K wps
[Epoch 46 Batch 60/172] avg loss 0.00277046, throughput 2.12714K wps
[Epoch 46 Batch 90/172] avg loss 0.0026432, throughput 2.1403K wps
[Epoch 46 Batch 120/172] avg loss 0.0032759, throughput 2.14988K wps
[Epoch 46 Batch 150/172] avg loss 0.00289594, throughput 2.14364K wps
Begin Testing...
[Epoch 46] train avg loss 0.00280204, dev acc 0.8910, dev avg loss 0.291407, throughput 2.1361K wps
[Epoch 47 Batch 30/172] avg loss 0.00275591, throughput 2.17494K wps
[Epoch 47 Batch 60/172] avg loss 0.00263489, throughput 2.09696K wps
[Epoch 47 Batch 90/172] avg loss 0.00264282, throughput 2.14561K wps
[Epoch 47 Batch 120/172] avg loss 0.00277915, throughput 2.10684K wps
[Epoch 47 Batch 150/172] avg loss 0.00284297, throughput 2.09747K wps
Begin Testing...
[Epoch 47] train avg loss 0.00274118, dev acc 0.8910, dev avg loss 0.292281, throughput 2.12667K wps
[Epoch 48 Batch 30/172] avg loss 0.00289424, throughput 2.12538K wps
[Epoch 48 Batch 60/172] avg loss 0.00241959, throughput 2.10079K wps
[Epoch 48 Batch 90/172] avg loss 0.00274275, throughput 2.14808K wps
[Epoch 48 Batch 120/172] avg loss 0.00265082, throughput 2.15223K wps
[Epoch 48 Batch 150/172] avg loss 0.00247999, throughput 2.08693K wps
Begin Testing...
[Epoch 48] train avg loss 0.00262642, dev acc 0.8899, dev avg loss 0.29508, throughput 2.11825K wps
[Epoch 49 Batch 30/172] avg loss 0.00255096, throughput 2.17881K wps
[Epoch 49 Batch 60/172] avg loss 0.00271224, throughput 2.15079K wps
[Epoch 49 Batch 90/172] avg loss 0.0024187, throughput 2.13965K wps
[Epoch 49 Batch 120/172] avg loss 0.0027839, throughput 2.10968K wps
[Epoch 49 Batch 150/172] avg loss 0.00251814, throughput 2.09348K wps
Begin Testing...
[Epoch 49] train avg loss 0.0026003, dev acc 0.8920, dev avg loss 0.296188, throughput 2.13473K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/172] avg loss 0.00232767, throughput 2.18918K wps
[Epoch 50 Batch 60/172] avg loss 0.00260105, throughput 2.14766K wps
[Epoch 50 Batch 90/172] avg loss 0.00275322, throughput 2.13903K wps
[Epoch 50 Batch 120/172] avg loss 0.00244604, throughput 2.12341K wps
[Epoch 50 Batch 150/172] avg loss 0.0025889, throughput 2.11224K wps
Begin Testing...
[Epoch 50] train avg loss 0.00252266, dev acc 0.8878, dev avg loss 0.299392, throughput 2.14254K wps
[Epoch 51 Batch 30/172] avg loss 0.00258137, throughput 2.15408K wps
[Epoch 51 Batch 60/172] avg loss 0.00229052, throughput 2.14678K wps
[Epoch 51 Batch 90/172] avg loss 0.00261254, throughput 2.12511K wps
[Epoch 51 Batch 120/172] avg loss 0.00253681, throughput 2.08469K wps
[Epoch 51 Batch 150/172] avg loss 0.00257739, throughput 2.12691K wps
Begin Testing...
[Epoch 51] train avg loss 0.00251802, dev acc 0.8910, dev avg loss 0.29996, throughput 2.13006K wps
[Epoch 52 Batch 30/172] avg loss 0.00252844, throughput 2.14421K wps
[Epoch 52 Batch 60/172] avg loss 0.00244344, throughput 2.10788K wps
[Epoch 52 Batch 90/172] avg loss 0.00228833, throughput 2.0787K wps
[Epoch 52 Batch 120/172] avg loss 0.00269904, throughput 2.13776K wps
[Epoch 52 Batch 150/172] avg loss 0.00241925, throughput 2.14021K wps
Begin Testing...
[Epoch 52] train avg loss 0.00247348, dev acc 0.8920, dev avg loss 0.30167, throughput 2.12409K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/172] avg loss 0.00234045, throughput 2.17247K wps
[Epoch 53 Batch 60/172] avg loss 0.00236943, throughput 2.15007K wps
[Epoch 53 Batch 90/172] avg loss 0.00229139, throughput 2.14906K wps
[Epoch 53 Batch 120/172] avg loss 0.00231939, throughput 2.14642K wps
[Epoch 53 Batch 150/172] avg loss 0.00246873, throughput 2.15639K wps
Begin Testing...
[Epoch 53] train avg loss 0.00239065, dev acc 0.8910, dev avg loss 0.303804, throughput 2.15386K wps
[Epoch 54 Batch 30/172] avg loss 0.00246235, throughput 2.18812K wps
[Epoch 54 Batch 60/172] avg loss 0.00235697, throughput 2.13011K wps
[Epoch 54 Batch 90/172] avg loss 0.00211906, throughput 2.10815K wps
[Epoch 54 Batch 120/172] avg loss 0.00221866, throughput 2.08585K wps
[Epoch 54 Batch 150/172] avg loss 0.0024181, throughput 2.0903K wps
Begin Testing...
[Epoch 54] train avg loss 0.00235414, dev acc 0.8920, dev avg loss 0.30527, throughput 2.11429K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00223925, throughput 2.16524K wps
[Epoch 55 Batch 60/172] avg loss 0.00243131, throughput 2.14187K wps
[Epoch 55 Batch 90/172] avg loss 0.00218003, throughput 2.12293K wps
[Epoch 55 Batch 120/172] avg loss 0.00235552, throughput 2.08362K wps
[Epoch 55 Batch 150/172] avg loss 0.00231107, throughput 2.08543K wps
Begin Testing...
[Epoch 55] train avg loss 0.00231792, dev acc 0.8878, dev avg loss 0.308219, throughput 2.1148K wps
[Epoch 56 Batch 30/172] avg loss 0.00205116, throughput 2.13404K wps
[Epoch 56 Batch 60/172] avg loss 0.0021027, throughput 2.13519K wps
[Epoch 56 Batch 90/172] avg loss 0.00242923, throughput 2.14588K wps
[Epoch 56 Batch 120/172] avg loss 0.0023605, throughput 2.15272K wps
[Epoch 56 Batch 150/172] avg loss 0.00225792, throughput 2.14218K wps
Begin Testing...
[Epoch 56] train avg loss 0.0022333, dev acc 0.8899, dev avg loss 0.309032, throughput 2.14224K wps
[Epoch 57 Batch 30/172] avg loss 0.00224139, throughput 2.14988K wps
[Epoch 57 Batch 60/172] avg loss 0.00236516, throughput 2.10762K wps
[Epoch 57 Batch 90/172] avg loss 0.00225349, throughput 2.08929K wps
[Epoch 57 Batch 120/172] avg loss 0.0019012, throughput 2.10705K wps
[Epoch 57 Batch 150/172] avg loss 0.00213058, throughput 2.12008K wps
Begin Testing...
[Epoch 57] train avg loss 0.00223041, dev acc 0.8899, dev avg loss 0.311608, throughput 2.11248K wps
[Epoch 58 Batch 30/172] avg loss 0.00209476, throughput 2.15691K wps
[Epoch 58 Batch 60/172] avg loss 0.00195958, throughput 2.13643K wps
[Epoch 58 Batch 90/172] avg loss 0.00216609, throughput 2.08926K wps
[Epoch 58 Batch 120/172] avg loss 0.0023871, throughput 2.1118K wps
[Epoch 58 Batch 150/172] avg loss 0.00222056, throughput 2.11465K wps
Begin Testing...
[Epoch 58] train avg loss 0.00219546, dev acc 0.8910, dev avg loss 0.312207, throughput 2.1206K wps
[Epoch 59 Batch 30/172] avg loss 0.00192112, throughput 2.12495K wps
[Epoch 59 Batch 60/172] avg loss 0.00196435, throughput 2.09138K wps
[Epoch 59 Batch 90/172] avg loss 0.00236179, throughput 2.07788K wps
[Epoch 59 Batch 120/172] avg loss 0.00211109, throughput 2.09057K wps
[Epoch 59 Batch 150/172] avg loss 0.00207989, throughput 2.1449K wps
Begin Testing...
[Epoch 59] train avg loss 0.00212558, dev acc 0.8889, dev avg loss 0.313485, throughput 2.1109K wps
[Epoch 60 Batch 30/172] avg loss 0.00203757, throughput 2.1433K wps
[Epoch 60 Batch 60/172] avg loss 0.00191617, throughput 2.09287K wps
[Epoch 60 Batch 90/172] avg loss 0.00197522, throughput 2.08947K wps
[Epoch 60 Batch 120/172] avg loss 0.00205278, throughput 2.0917K wps
[Epoch 60 Batch 150/172] avg loss 0.00204814, throughput 2.08188K wps
Begin Testing...
[Epoch 60] train avg loss 0.00208758, dev acc 0.8868, dev avg loss 0.316369, throughput 2.10044K wps
[Epoch 61 Batch 30/172] avg loss 0.00184605, throughput 2.15453K wps
[Epoch 61 Batch 60/172] avg loss 0.00204209, throughput 2.07111K wps
[Epoch 61 Batch 90/172] avg loss 0.00183075, throughput 2.09594K wps
[Epoch 61 Batch 120/172] avg loss 0.00191755, throughput 2.0883K wps
[Epoch 61 Batch 150/172] avg loss 0.00222142, throughput 2.10834K wps
Begin Testing...
[Epoch 61] train avg loss 0.00195448, dev acc 0.8847, dev avg loss 0.31985, throughput 2.10054K wps
[Epoch 62 Batch 30/172] avg loss 0.00205312, throughput 2.16139K wps
[Epoch 62 Batch 60/172] avg loss 0.00175239, throughput 2.09925K wps
[Epoch 62 Batch 90/172] avg loss 0.00212598, throughput 2.15658K wps
[Epoch 62 Batch 120/172] avg loss 0.00233901, throughput 2.15082K wps
[Epoch 62 Batch 150/172] avg loss 0.00182679, throughput 2.08375K wps
Begin Testing...
[Epoch 62] train avg loss 0.0020487, dev acc 0.8889, dev avg loss 0.31958, throughput 2.12964K wps
[Epoch 63 Batch 30/172] avg loss 0.00172168, throughput 2.13127K wps
[Epoch 63 Batch 60/172] avg loss 0.00200728, throughput 2.13099K wps
[Epoch 63 Batch 90/172] avg loss 0.00193709, throughput 2.15098K wps
[Epoch 63 Batch 120/172] avg loss 0.00224024, throughput 2.1492K wps
[Epoch 63 Batch 150/172] avg loss 0.00226682, throughput 2.14378K wps
Begin Testing...
[Epoch 63] train avg loss 0.00202626, dev acc 0.8889, dev avg loss 0.321088, throughput 2.14096K wps
[Epoch 64 Batch 30/172] avg loss 0.00180558, throughput 2.19579K wps
[Epoch 64 Batch 60/172] avg loss 0.0018812, throughput 2.14528K wps
[Epoch 64 Batch 90/172] avg loss 0.00220527, throughput 2.08882K wps
[Epoch 64 Batch 120/172] avg loss 0.00189011, throughput 2.08177K wps
[Epoch 64 Batch 150/172] avg loss 0.0019152, throughput 2.12004K wps
Begin Testing...
[Epoch 64] train avg loss 0.00196278, dev acc 0.8857, dev avg loss 0.324333, throughput 2.12831K wps
[Epoch 65 Batch 30/172] avg loss 0.00193276, throughput 2.14696K wps
[Epoch 65 Batch 60/172] avg loss 0.00186563, throughput 2.1105K wps
[Epoch 65 Batch 90/172] avg loss 0.00185739, throughput 2.09017K wps
[Epoch 65 Batch 120/172] avg loss 0.00173773, throughput 2.07034K wps
[Epoch 65 Batch 150/172] avg loss 0.00195627, throughput 2.10698K wps
Begin Testing...
[Epoch 65] train avg loss 0.00190922, dev acc 0.8868, dev avg loss 0.32617, throughput 2.11067K wps
[Epoch 66 Batch 30/172] avg loss 0.00184996, throughput 2.14541K wps
[Epoch 66 Batch 60/172] avg loss 0.00181949, throughput 2.09686K wps
[Epoch 66 Batch 90/172] avg loss 0.00169817, throughput 2.13542K wps
[Epoch 66 Batch 120/172] avg loss 0.00210632, throughput 2.11942K wps
[Epoch 66 Batch 150/172] avg loss 0.00164711, throughput 2.06525K wps
Begin Testing...
[Epoch 66] train avg loss 0.00186681, dev acc 0.8878, dev avg loss 0.327826, throughput 2.1066K wps
[Epoch 67 Batch 30/172] avg loss 0.0020283, throughput 2.17989K wps
[Epoch 67 Batch 60/172] avg loss 0.00164774, throughput 2.12146K wps
[Epoch 67 Batch 90/172] avg loss 0.00194825, throughput 2.08358K wps
[Epoch 67 Batch 120/172] avg loss 0.00217209, throughput 2.06812K wps
[Epoch 67 Batch 150/172] avg loss 0.00160314, throughput 2.0764K wps
Begin Testing...
[Epoch 67] train avg loss 0.0018831, dev acc 0.8857, dev avg loss 0.329608, throughput 2.1047K wps
[Epoch 68 Batch 30/172] avg loss 0.00205843, throughput 2.14056K wps
[Epoch 68 Batch 60/172] avg loss 0.00177118, throughput 2.11842K wps
[Epoch 68 Batch 90/172] avg loss 0.00168936, throughput 2.08312K wps
[Epoch 68 Batch 120/172] avg loss 0.00185181, throughput 2.13904K wps
[Epoch 68 Batch 150/172] avg loss 0.00171421, throughput 2.08657K wps
Begin Testing...
[Epoch 68] train avg loss 0.00181058, dev acc 0.8868, dev avg loss 0.331505, throughput 2.11173K wps
[Epoch 69 Batch 30/172] avg loss 0.00186893, throughput 2.12901K wps
[Epoch 69 Batch 60/172] avg loss 0.00180478, throughput 2.07979K wps
[Epoch 69 Batch 90/172] avg loss 0.00190819, throughput 2.12252K wps
[Epoch 69 Batch 120/172] avg loss 0.00176447, throughput 2.14343K wps
[Epoch 69 Batch 150/172] avg loss 0.00175148, throughput 2.12061K wps
Begin Testing...
[Epoch 69] train avg loss 0.00182383, dev acc 0.8857, dev avg loss 0.333784, throughput 2.12221K wps
[Epoch 70 Batch 30/172] avg loss 0.00180884, throughput 2.14885K wps
[Epoch 70 Batch 60/172] avg loss 0.00201705, throughput 2.12138K wps
[Epoch 70 Batch 90/172] avg loss 0.00159996, throughput 2.14571K wps
[Epoch 70 Batch 120/172] avg loss 0.00179289, throughput 2.15061K wps
[Epoch 70 Batch 150/172] avg loss 0.00182079, throughput 2.14773K wps
Begin Testing...
[Epoch 70] train avg loss 0.00180453, dev acc 0.8868, dev avg loss 0.336292, throughput 2.14259K wps
[Epoch 71 Batch 30/172] avg loss 0.00156987, throughput 2.12326K wps
[Epoch 71 Batch 60/172] avg loss 0.00173053, throughput 2.06385K wps
[Epoch 71 Batch 90/172] avg loss 0.00199224, throughput 2.08118K wps
[Epoch 71 Batch 120/172] avg loss 0.00196271, throughput 2.12718K wps
[Epoch 71 Batch 150/172] avg loss 0.00166496, throughput 2.14916K wps
Begin Testing...
[Epoch 71] train avg loss 0.00176747, dev acc 0.8847, dev avg loss 0.337013, throughput 2.11196K wps
[Epoch 72 Batch 30/172] avg loss 0.00177613, throughput 2.12249K wps
[Epoch 72 Batch 60/172] avg loss 0.00196975, throughput 2.10809K wps
[Epoch 72 Batch 90/172] avg loss 0.00161873, throughput 2.14514K wps
[Epoch 72 Batch 120/172] avg loss 0.00171895, throughput 2.14276K wps
[Epoch 72 Batch 150/172] avg loss 0.00156927, throughput 2.1413K wps
Begin Testing...
[Epoch 72] train avg loss 0.0017106, dev acc 0.8868, dev avg loss 0.339624, throughput 2.13415K wps
[Epoch 73 Batch 30/172] avg loss 0.00148561, throughput 2.1297K wps
[Epoch 73 Batch 60/172] avg loss 0.00176341, throughput 2.12141K wps
[Epoch 73 Batch 90/172] avg loss 0.00178427, throughput 2.15382K wps
[Epoch 73 Batch 120/172] avg loss 0.00180655, throughput 2.14055K wps
[Epoch 73 Batch 150/172] avg loss 0.00158175, throughput 2.10224K wps
Begin Testing...
[Epoch 73] train avg loss 0.00169107, dev acc 0.8847, dev avg loss 0.341552, throughput 2.1253K wps
[Epoch 74 Batch 30/172] avg loss 0.00165598, throughput 2.14621K wps
[Epoch 74 Batch 60/172] avg loss 0.00186475, throughput 2.07244K wps
[Epoch 74 Batch 90/172] avg loss 0.0017022, throughput 2.08689K wps
[Epoch 74 Batch 120/172] avg loss 0.00164419, throughput 2.08167K wps
[Epoch 74 Batch 150/172] avg loss 0.00158472, throughput 2.09794K wps
Begin Testing...
[Epoch 74] train avg loss 0.00165347, dev acc 0.8868, dev avg loss 0.343801, throughput 2.09355K wps
[Epoch 75 Batch 30/172] avg loss 0.00160832, throughput 2.15276K wps
[Epoch 75 Batch 60/172] avg loss 0.00154612, throughput 2.14201K wps
[Epoch 75 Batch 90/172] avg loss 0.0016891, throughput 2.14324K wps
[Epoch 75 Batch 120/172] avg loss 0.0014561, throughput 2.11812K wps
[Epoch 75 Batch 150/172] avg loss 0.00178584, throughput 2.0809K wps
Begin Testing...
[Epoch 75] train avg loss 0.00162418, dev acc 0.8868, dev avg loss 0.345782, throughput 2.12342K wps
[Epoch 76 Batch 30/172] avg loss 0.00176043, throughput 2.18055K wps
[Epoch 76 Batch 60/172] avg loss 0.00150024, throughput 2.09042K wps
[Epoch 76 Batch 90/172] avg loss 0.00156763, throughput 2.11083K wps
[Epoch 76 Batch 120/172] avg loss 0.00164153, throughput 2.14824K wps
[Epoch 76 Batch 150/172] avg loss 0.0015244, throughput 2.12123K wps
Begin Testing...
[Epoch 76] train avg loss 0.001599, dev acc 0.8878, dev avg loss 0.347391, throughput 2.12642K wps
[Epoch 77 Batch 30/172] avg loss 0.00175147, throughput 2.14217K wps
[Epoch 77 Batch 60/172] avg loss 0.00151064, throughput 2.09127K wps
[Epoch 77 Batch 90/172] avg loss 0.00160563, throughput 2.14247K wps
[Epoch 77 Batch 120/172] avg loss 0.00160452, throughput 2.0862K wps
[Epoch 77 Batch 150/172] avg loss 0.00154889, throughput 2.14717K wps
Begin Testing...
[Epoch 77] train avg loss 0.00160298, dev acc 0.8868, dev avg loss 0.348872, throughput 2.12514K wps
[Epoch 78 Batch 30/172] avg loss 0.00158865, throughput 2.16031K wps
[Epoch 78 Batch 60/172] avg loss 0.00172312, throughput 2.1118K wps
[Epoch 78 Batch 90/172] avg loss 0.00124676, throughput 2.10978K wps
[Epoch 78 Batch 120/172] avg loss 0.00141087, throughput 2.13707K wps
[Epoch 78 Batch 150/172] avg loss 0.0020187, throughput 2.10887K wps
Begin Testing...
[Epoch 78] train avg loss 0.00157648, dev acc 0.8889, dev avg loss 0.350936, throughput 2.12677K wps
[Epoch 79 Batch 30/172] avg loss 0.00181336, throughput 2.19096K wps
[Epoch 79 Batch 60/172] avg loss 0.00153374, throughput 2.15305K wps
[Epoch 79 Batch 90/172] avg loss 0.00147408, throughput 2.1301K wps
[Epoch 79 Batch 120/172] avg loss 0.00144776, throughput 2.09835K wps
[Epoch 79 Batch 150/172] avg loss 0.00147417, throughput 2.13701K wps
Begin Testing...
[Epoch 79] train avg loss 0.00155557, dev acc 0.8857, dev avg loss 0.352998, throughput 2.14165K wps
[Epoch 80 Batch 30/172] avg loss 0.00149196, throughput 2.14146K wps
[Epoch 80 Batch 60/172] avg loss 0.00185682, throughput 2.14604K wps
[Epoch 80 Batch 90/172] avg loss 0.00147853, throughput 2.1073K wps
[Epoch 80 Batch 120/172] avg loss 0.0013298, throughput 2.09864K wps
[Epoch 80 Batch 150/172] avg loss 0.00144046, throughput 2.13523K wps
Begin Testing...
[Epoch 80] train avg loss 0.00154975, dev acc 0.8857, dev avg loss 0.35572, throughput 2.12285K wps
[Epoch 81 Batch 30/172] avg loss 0.00146513, throughput 2.14254K wps
[Epoch 81 Batch 60/172] avg loss 0.0013142, throughput 2.08617K wps
[Epoch 81 Batch 90/172] avg loss 0.0015773, throughput 2.09669K wps
[Epoch 81 Batch 120/172] avg loss 0.00159331, throughput 2.08918K wps
[Epoch 81 Batch 150/172] avg loss 0.00142499, throughput 2.1066K wps
Begin Testing...
[Epoch 81] train avg loss 0.00146794, dev acc 0.8784, dev avg loss 0.358955, throughput 2.10688K wps
[Epoch 82 Batch 30/172] avg loss 0.00172038, throughput 2.18396K wps
[Epoch 82 Batch 60/172] avg loss 0.00126344, throughput 2.08302K wps
[Epoch 82 Batch 90/172] avg loss 0.0013173, throughput 2.09511K wps
[Epoch 82 Batch 120/172] avg loss 0.00145434, throughput 2.14843K wps
[Epoch 82 Batch 150/172] avg loss 0.0015932, throughput 2.15356K wps
Begin Testing...
[Epoch 82] train avg loss 0.00148377, dev acc 0.8795, dev avg loss 0.360511, throughput 2.13475K wps
[Epoch 83 Batch 30/172] avg loss 0.00154976, throughput 2.16996K wps
[Epoch 83 Batch 60/172] avg loss 0.00129066, throughput 2.14665K wps
[Epoch 83 Batch 90/172] avg loss 0.00146826, throughput 2.14847K wps
[Epoch 83 Batch 120/172] avg loss 0.00133078, throughput 2.12247K wps
[Epoch 83 Batch 150/172] avg loss 0.00157748, throughput 2.12903K wps
Begin Testing...
[Epoch 83] train avg loss 0.00146869, dev acc 0.8857, dev avg loss 0.361171, throughput 2.14385K wps
[Epoch 84 Batch 30/172] avg loss 0.00152912, throughput 2.13957K wps
[Epoch 84 Batch 60/172] avg loss 0.00147116, throughput 2.146K wps
[Epoch 84 Batch 90/172] avg loss 0.00139582, throughput 2.15831K wps
[Epoch 84 Batch 120/172] avg loss 0.00112027, throughput 2.12453K wps
[Epoch 84 Batch 150/172] avg loss 0.00135902, throughput 2.15369K wps
Begin Testing...
[Epoch 84] train avg loss 0.00140844, dev acc 0.8826, dev avg loss 0.363024, throughput 2.14302K wps
[Epoch 85 Batch 30/172] avg loss 0.00134503, throughput 2.15625K wps
[Epoch 85 Batch 60/172] avg loss 0.00152187, throughput 2.08355K wps
[Epoch 85 Batch 90/172] avg loss 0.00135959, throughput 2.09188K wps
[Epoch 85 Batch 120/172] avg loss 0.00160023, throughput 2.1343K wps
[Epoch 85 Batch 150/172] avg loss 0.0012652, throughput 2.14901K wps
Begin Testing...
[Epoch 85] train avg loss 0.00146048, dev acc 0.8878, dev avg loss 0.363779, throughput 2.12364K wps
[Epoch 86 Batch 30/172] avg loss 0.00142639, throughput 2.18583K wps
[Epoch 86 Batch 60/172] avg loss 0.00125874, throughput 2.12957K wps
[Epoch 86 Batch 90/172] avg loss 0.00138045, throughput 2.11574K wps
[Epoch 86 Batch 120/172] avg loss 0.00138754, throughput 2.08033K wps
[Epoch 86 Batch 150/172] avg loss 0.00147262, throughput 2.0913K wps
Begin Testing...
[Epoch 86] train avg loss 0.00144652, dev acc 0.8836, dev avg loss 0.365866, throughput 2.12455K wps
[Epoch 87 Batch 30/172] avg loss 0.00124217, throughput 2.16248K wps
[Epoch 87 Batch 60/172] avg loss 0.00167754, throughput 2.14334K wps
[Epoch 87 Batch 90/172] avg loss 0.0014129, throughput 2.14771K wps
[Epoch 87 Batch 120/172] avg loss 0.00125183, throughput 2.09791K wps
[Epoch 87 Batch 150/172] avg loss 0.00132138, throughput 2.11609K wps
Begin Testing...
[Epoch 87] train avg loss 0.00138489, dev acc 0.8868, dev avg loss 0.367658, throughput 2.13573K wps
[Epoch 88 Batch 30/172] avg loss 0.00142489, throughput 2.18219K wps
[Epoch 88 Batch 60/172] avg loss 0.00136108, throughput 2.12185K wps
[Epoch 88 Batch 90/172] avg loss 0.00129588, throughput 2.14765K wps
[Epoch 88 Batch 120/172] avg loss 0.00134786, throughput 2.09024K wps
[Epoch 88 Batch 150/172] avg loss 0.00137919, throughput 2.12267K wps
Begin Testing...
[Epoch 88] train avg loss 0.00139094, dev acc 0.8847, dev avg loss 0.370808, throughput 2.13294K wps
[Epoch 89 Batch 30/172] avg loss 0.00142525, throughput 2.19147K wps
[Epoch 89 Batch 60/172] avg loss 0.00130273, throughput 2.13405K wps
[Epoch 89 Batch 90/172] avg loss 0.00141233, throughput 2.1337K wps
[Epoch 89 Batch 120/172] avg loss 0.00149531, throughput 2.0972K wps
[Epoch 89 Batch 150/172] avg loss 0.00134752, throughput 2.14432K wps
Begin Testing...
[Epoch 89] train avg loss 0.00140288, dev acc 0.8847, dev avg loss 0.371035, throughput 2.13272K wps
[Epoch 90 Batch 30/172] avg loss 0.00143672, throughput 2.13527K wps
[Epoch 90 Batch 60/172] avg loss 0.00154875, throughput 2.09307K wps
[Epoch 90 Batch 90/172] avg loss 0.00118774, throughput 2.099K wps
[Epoch 90 Batch 120/172] avg loss 0.00146276, throughput 2.14063K wps
[Epoch 90 Batch 150/172] avg loss 0.00125278, throughput 2.08416K wps
Begin Testing...
[Epoch 90] train avg loss 0.00134891, dev acc 0.8857, dev avg loss 0.375261, throughput 2.11069K wps
[Epoch 91 Batch 30/172] avg loss 0.00135463, throughput 2.16926K wps
[Epoch 91 Batch 60/172] avg loss 0.00121797, throughput 2.15447K wps
[Epoch 91 Batch 90/172] avg loss 0.0013011, throughput 2.13573K wps
[Epoch 91 Batch 120/172] avg loss 0.00142774, throughput 2.10759K wps
[Epoch 91 Batch 150/172] avg loss 0.00146164, throughput 2.10254K wps
Begin Testing...
[Epoch 91] train avg loss 0.00136004, dev acc 0.8878, dev avg loss 0.376606, throughput 2.13476K wps
[Epoch 92 Batch 30/172] avg loss 0.00124128, throughput 2.14084K wps
[Epoch 92 Batch 60/172] avg loss 0.00121912, throughput 2.08847K wps
[Epoch 92 Batch 90/172] avg loss 0.00111903, throughput 2.09499K wps
[Epoch 92 Batch 120/172] avg loss 0.00153955, throughput 2.11107K wps
[Epoch 92 Batch 150/172] avg loss 0.00151949, throughput 2.09456K wps
Begin Testing...
[Epoch 92] train avg loss 0.001312, dev acc 0.8847, dev avg loss 0.376668, throughput 2.10325K wps
[Epoch 93 Batch 30/172] avg loss 0.00125132, throughput 2.20474K wps
[Epoch 93 Batch 60/172] avg loss 0.00144846, throughput 2.11127K wps
[Epoch 93 Batch 90/172] avg loss 0.00125093, throughput 2.08961K wps
[Epoch 93 Batch 120/172] avg loss 0.00135575, throughput 2.07677K wps
[Epoch 93 Batch 150/172] avg loss 0.00123037, throughput 2.08344K wps
Begin Testing...
[Epoch 93] train avg loss 0.00131177, dev acc 0.8857, dev avg loss 0.378797, throughput 2.10986K wps
[Epoch 94 Batch 30/172] avg loss 0.00123415, throughput 2.17946K wps
[Epoch 94 Batch 60/172] avg loss 0.00144405, throughput 2.12976K wps
[Epoch 94 Batch 90/172] avg loss 0.00142861, throughput 2.12802K wps
[Epoch 94 Batch 120/172] avg loss 0.00131094, throughput 2.11011K wps
[Epoch 94 Batch 150/172] avg loss 0.00122734, throughput 2.1041K wps
Begin Testing...
[Epoch 94] train avg loss 0.00134503, dev acc 0.8826, dev avg loss 0.381365, throughput 2.1255K wps
[Epoch 95 Batch 30/172] avg loss 0.00110244, throughput 2.13632K wps
[Epoch 95 Batch 60/172] avg loss 0.00133856, throughput 2.08079K wps
[Epoch 95 Batch 90/172] avg loss 0.00137287, throughput 2.09106K wps
[Epoch 95 Batch 120/172] avg loss 0.00144148, throughput 2.13864K wps
[Epoch 95 Batch 150/172] avg loss 0.00114739, throughput 2.14545K wps
Begin Testing...
[Epoch 95] train avg loss 0.00130314, dev acc 0.8826, dev avg loss 0.384459, throughput 2.11826K wps
[Epoch 96 Batch 30/172] avg loss 0.00128955, throughput 2.14036K wps
[Epoch 96 Batch 60/172] avg loss 0.0011851, throughput 2.12265K wps
[Epoch 96 Batch 90/172] avg loss 0.00126372, throughput 2.12483K wps
[Epoch 96 Batch 120/172] avg loss 0.00138982, throughput 2.1165K wps
[Epoch 96 Batch 150/172] avg loss 0.00123901, throughput 2.0914K wps
Begin Testing...
[Epoch 96] train avg loss 0.0012771, dev acc 0.8868, dev avg loss 0.385643, throughput 2.12274K wps
[Epoch 97 Batch 30/172] avg loss 0.00128936, throughput 2.20893K wps
[Epoch 97 Batch 60/172] avg loss 0.00135646, throughput 2.15214K wps
[Epoch 97 Batch 90/172] avg loss 0.0012103, throughput 2.13086K wps
[Epoch 97 Batch 120/172] avg loss 0.001438, throughput 2.09614K wps
[Epoch 97 Batch 150/172] avg loss 0.00111337, throughput 2.10215K wps
Begin Testing...
[Epoch 97] train avg loss 0.00129326, dev acc 0.8826, dev avg loss 0.386878, throughput 2.12981K wps
[Epoch 98 Batch 30/172] avg loss 0.00112447, throughput 2.13078K wps
[Epoch 98 Batch 60/172] avg loss 0.00116867, throughput 2.11912K wps
[Epoch 98 Batch 90/172] avg loss 0.00110308, throughput 2.14006K wps
[Epoch 98 Batch 120/172] avg loss 0.00145743, throughput 2.1051K wps
[Epoch 98 Batch 150/172] avg loss 0.00135448, throughput 2.13897K wps
Begin Testing...
[Epoch 98] train avg loss 0.00124632, dev acc 0.8857, dev avg loss 0.390197, throughput 2.12338K wps
[Epoch 99 Batch 30/172] avg loss 0.00119729, throughput 2.13387K wps
[Epoch 99 Batch 60/172] avg loss 0.00109454, throughput 2.08906K wps
[Epoch 99 Batch 90/172] avg loss 0.00109129, throughput 2.13837K wps
[Epoch 99 Batch 120/172] avg loss 0.00133825, throughput 2.14555K wps
[Epoch 99 Batch 150/172] avg loss 0.0013001, throughput 2.08939K wps
Begin Testing...
[Epoch 99] train avg loss 0.00123863, dev acc 0.8836, dev avg loss 0.390098, throughput 2.11621K wps
[Epoch 100 Batch 30/172] avg loss 0.00111401, throughput 2.185K wps
[Epoch 100 Batch 60/172] avg loss 0.00130295, throughput 2.11665K wps
[Epoch 100 Batch 90/172] avg loss 0.00128995, throughput 2.09854K wps
[Epoch 100 Batch 120/172] avg loss 0.00119288, throughput 2.09847K wps
[Epoch 100 Batch 150/172] avg loss 0.00150826, throughput 2.14365K wps
Begin Testing...
[Epoch 100] train avg loss 0.0012701, dev acc 0.8868, dev avg loss 0.391073, throughput 2.12833K wps
[Epoch 101 Batch 30/172] avg loss 0.00141492, throughput 2.19369K wps
[Epoch 101 Batch 60/172] avg loss 0.00135101, throughput 2.13181K wps
[Epoch 101 Batch 90/172] avg loss 0.00100548, throughput 2.13528K wps
[Epoch 101 Batch 120/172] avg loss 0.00109931, throughput 2.15314K wps
[Epoch 101 Batch 150/172] avg loss 0.00128776, throughput 2.15297K wps
Begin Testing...
[Epoch 101] train avg loss 0.00123509, dev acc 0.8857, dev avg loss 0.394971, throughput 2.15336K wps
[Epoch 102 Batch 30/172] avg loss 0.00136943, throughput 2.16741K wps
[Epoch 102 Batch 60/172] avg loss 0.00106584, throughput 2.11905K wps
[Epoch 102 Batch 90/172] avg loss 0.00138739, throughput 2.08265K wps
[Epoch 102 Batch 120/172] avg loss 0.00122914, throughput 2.10249K wps
[Epoch 102 Batch 150/172] avg loss 0.00119999, throughput 2.10848K wps
Begin Testing...
[Epoch 102] train avg loss 0.00125503, dev acc 0.8878, dev avg loss 0.392792, throughput 2.11987K wps
[Epoch 103 Batch 30/172] avg loss 0.00112683, throughput 2.13484K wps
[Epoch 103 Batch 60/172] avg loss 0.00119231, throughput 2.13355K wps
[Epoch 103 Batch 90/172] avg loss 0.00132843, throughput 2.0978K wps
[Epoch 103 Batch 120/172] avg loss 0.00102732, throughput 2.12523K wps
[Epoch 103 Batch 150/172] avg loss 0.00137387, throughput 2.14757K wps
Begin Testing...
[Epoch 103] train avg loss 0.00121533, dev acc 0.8868, dev avg loss 0.394725, throughput 2.13K wps
[Epoch 104 Batch 30/172] avg loss 0.00115296, throughput 2.17414K wps
[Epoch 104 Batch 60/172] avg loss 0.000998835, throughput 2.14942K wps
[Epoch 104 Batch 90/172] avg loss 0.00120175, throughput 2.15406K wps
[Epoch 104 Batch 120/172] avg loss 0.00134388, throughput 2.15462K wps
[Epoch 104 Batch 150/172] avg loss 0.0012206, throughput 2.14278K wps
Begin Testing...
[Epoch 104] train avg loss 0.00124474, dev acc 0.8857, dev avg loss 0.395903, throughput 2.15152K wps
[Epoch 105 Batch 30/172] avg loss 0.0011989, throughput 2.15554K wps
[Epoch 105 Batch 60/172] avg loss 0.0013036, throughput 2.12869K wps
[Epoch 105 Batch 90/172] avg loss 0.00103581, throughput 2.09887K wps
[Epoch 105 Batch 120/172] avg loss 0.00134334, throughput 2.11043K wps
[Epoch 105 Batch 150/172] avg loss 0.00111065, throughput 2.1079K wps
Begin Testing...
[Epoch 105] train avg loss 0.00121875, dev acc 0.8816, dev avg loss 0.395758, throughput 2.1242K wps
[Epoch 106 Batch 30/172] avg loss 0.000937304, throughput 2.18482K wps
[Epoch 106 Batch 60/172] avg loss 0.00107351, throughput 2.14371K wps
[Epoch 106 Batch 90/172] avg loss 0.00113842, throughput 2.13419K wps
[Epoch 106 Batch 120/172] avg loss 0.00136964, throughput 2.12331K wps
[Epoch 106 Batch 150/172] avg loss 0.00124486, throughput 2.1155K wps
Begin Testing...
[Epoch 106] train avg loss 0.00116279, dev acc 0.8857, dev avg loss 0.400265, throughput 2.14069K wps
[Epoch 107 Batch 30/172] avg loss 0.00107178, throughput 2.15194K wps
[Epoch 107 Batch 60/172] avg loss 0.00118871, throughput 2.13373K wps
[Epoch 107 Batch 90/172] avg loss 0.0013523, throughput 2.14963K wps
[Epoch 107 Batch 120/172] avg loss 0.00105816, throughput 2.14925K wps
[Epoch 107 Batch 150/172] avg loss 0.00121491, throughput 2.1074K wps
Begin Testing...
[Epoch 107] train avg loss 0.00115084, dev acc 0.8868, dev avg loss 0.400654, throughput 2.13232K wps
[Epoch 108 Batch 30/172] avg loss 0.00117902, throughput 2.18342K wps
[Epoch 108 Batch 60/172] avg loss 0.00118316, throughput 2.13947K wps
[Epoch 108 Batch 90/172] avg loss 0.00114529, throughput 2.08853K wps
[Epoch 108 Batch 120/172] avg loss 0.00103578, throughput 2.09635K wps
[Epoch 108 Batch 150/172] avg loss 0.00129044, throughput 2.14015K wps
Begin Testing...
[Epoch 108] train avg loss 0.00116112, dev acc 0.8868, dev avg loss 0.403514, throughput 2.1313K wps
[Epoch 109 Batch 30/172] avg loss 0.000919716, throughput 2.18333K wps
[Epoch 109 Batch 60/172] avg loss 0.00128095, throughput 2.09251K wps
[Epoch 109 Batch 90/172] avg loss 0.00109331, throughput 2.08391K wps
[Epoch 109 Batch 120/172] avg loss 0.00110891, throughput 2.09238K wps
[Epoch 109 Batch 150/172] avg loss 0.00113042, throughput 2.08252K wps
Begin Testing...
[Epoch 109] train avg loss 0.00116935, dev acc 0.8836, dev avg loss 0.402408, throughput 2.10364K wps
[Epoch 110 Batch 30/172] avg loss 0.00110862, throughput 2.13487K wps
[Epoch 110 Batch 60/172] avg loss 0.00108114, throughput 2.12877K wps
[Epoch 110 Batch 90/172] avg loss 0.0012531, throughput 2.12619K wps
[Epoch 110 Batch 120/172] avg loss 0.00107023, throughput 2.0873K wps
[Epoch 110 Batch 150/172] avg loss 0.0011766, throughput 2.08709K wps
Begin Testing...
[Epoch 110] train avg loss 0.00116816, dev acc 0.8857, dev avg loss 0.403395, throughput 2.10885K wps
[Epoch 111 Batch 30/172] avg loss 0.0012327, throughput 2.13964K wps
[Epoch 111 Batch 60/172] avg loss 0.00103227, throughput 2.09725K wps
[Epoch 111 Batch 90/172] avg loss 0.00132634, throughput 2.09359K wps
[Epoch 111 Batch 120/172] avg loss 0.000992968, throughput 2.14301K wps
[Epoch 111 Batch 150/172] avg loss 0.00119438, throughput 2.10679K wps
Begin Testing...
[Epoch 111] train avg loss 0.00115463, dev acc 0.8847, dev avg loss 0.408502, throughput 2.12025K wps
[Epoch 112 Batch 30/172] avg loss 0.00122625, throughput 2.15253K wps
[Epoch 112 Batch 60/172] avg loss 0.00102568, throughput 2.12988K wps
[Epoch 112 Batch 90/172] avg loss 0.00125159, throughput 2.13068K wps
[Epoch 112 Batch 120/172] avg loss 0.00106449, throughput 2.14243K wps
[Epoch 112 Batch 150/172] avg loss 0.00101404, throughput 2.14589K wps
Begin Testing...
[Epoch 112] train avg loss 0.00112288, dev acc 0.8857, dev avg loss 0.406192, throughput 2.14164K wps
[Epoch 113 Batch 30/172] avg loss 0.00108615, throughput 2.12357K wps
[Epoch 113 Batch 60/172] avg loss 0.00102611, throughput 2.11146K wps
[Epoch 113 Batch 90/172] avg loss 0.00117492, throughput 2.15539K wps
[Epoch 113 Batch 120/172] avg loss 0.00100508, throughput 2.15075K wps
[Epoch 113 Batch 150/172] avg loss 0.0012806, throughput 2.11414K wps
Begin Testing...
[Epoch 113] train avg loss 0.00110876, dev acc 0.8836, dev avg loss 0.408904, throughput 2.12868K wps
[Epoch 114 Batch 30/172] avg loss 0.00104656, throughput 2.12718K wps
[Epoch 114 Batch 60/172] avg loss 0.00136103, throughput 2.09591K wps
[Epoch 114 Batch 90/172] avg loss 0.00122471, throughput 2.08344K wps
[Epoch 114 Batch 120/172] avg loss 0.00109028, throughput 2.09027K wps
[Epoch 114 Batch 150/172] avg loss 0.000965296, throughput 2.08398K wps
Begin Testing...
[Epoch 114] train avg loss 0.00112304, dev acc 0.8857, dev avg loss 0.40873, throughput 2.10173K wps
[Epoch 115 Batch 30/172] avg loss 0.000781548, throughput 2.18476K wps
[Epoch 115 Batch 60/172] avg loss 0.00126601, throughput 2.08472K wps
[Epoch 115 Batch 90/172] avg loss 0.00100086, throughput 2.11424K wps
[Epoch 115 Batch 120/172] avg loss 0.0010996, throughput 2.14979K wps
[Epoch 115 Batch 150/172] avg loss 0.0011645, throughput 2.12407K wps
Begin Testing...
[Epoch 115] train avg loss 0.00108756, dev acc 0.8857, dev avg loss 0.411466, throughput 2.12927K wps
[Epoch 116 Batch 30/172] avg loss 0.00103455, throughput 2.18789K wps
[Epoch 116 Batch 60/172] avg loss 0.000999846, throughput 2.15072K wps
[Epoch 116 Batch 90/172] avg loss 0.0010411, throughput 2.09472K wps
[Epoch 116 Batch 120/172] avg loss 0.00110771, throughput 2.1056K wps
[Epoch 116 Batch 150/172] avg loss 0.00117199, throughput 2.11759K wps
Begin Testing...
[Epoch 116] train avg loss 0.00108717, dev acc 0.8857, dev avg loss 0.412275, throughput 2.12873K wps
[Epoch 117 Batch 30/172] avg loss 0.000966045, throughput 2.12704K wps
[Epoch 117 Batch 60/172] avg loss 0.00107475, throughput 2.10811K wps
[Epoch 117 Batch 90/172] avg loss 0.00114368, throughput 2.13358K wps
[Epoch 117 Batch 120/172] avg loss 0.000905195, throughput 2.14883K wps
[Epoch 117 Batch 150/172] avg loss 0.00121935, throughput 2.14739K wps
Begin Testing...
[Epoch 117] train avg loss 0.00108056, dev acc 0.8868, dev avg loss 0.41423, throughput 2.13336K wps
[Epoch 118 Batch 30/172] avg loss 0.00081136, throughput 2.12627K wps
[Epoch 118 Batch 60/172] avg loss 0.00124076, throughput 2.08021K wps
[Epoch 118 Batch 90/172] avg loss 0.0010994, throughput 2.09133K wps
[Epoch 118 Batch 120/172] avg loss 0.00107049, throughput 2.08693K wps
[Epoch 118 Batch 150/172] avg loss 0.00112772, throughput 2.11048K wps
Begin Testing...
[Epoch 118] train avg loss 0.00106938, dev acc 0.8847, dev avg loss 0.41619, throughput 2.10389K wps
[Epoch 119 Batch 30/172] avg loss 0.00106609, throughput 2.17449K wps
[Epoch 119 Batch 60/172] avg loss 0.00106649, throughput 2.14485K wps
[Epoch 119 Batch 90/172] avg loss 0.00115129, throughput 2.13642K wps
[Epoch 119 Batch 120/172] avg loss 0.000929365, throughput 2.08865K wps
[Epoch 119 Batch 150/172] avg loss 0.000969384, throughput 2.08106K wps
Begin Testing...
[Epoch 119] train avg loss 0.00104529, dev acc 0.8868, dev avg loss 0.417243, throughput 2.11873K wps
[Epoch 120 Batch 30/172] avg loss 0.00107594, throughput 2.13105K wps
[Epoch 120 Batch 60/172] avg loss 0.000971309, throughput 2.09819K wps
[Epoch 120 Batch 90/172] avg loss 0.00101416, throughput 2.13558K wps
[Epoch 120 Batch 120/172] avg loss 0.0011763, throughput 2.146K wps
[Epoch 120 Batch 150/172] avg loss 0.000948179, throughput 2.15236K wps
Begin Testing...
[Epoch 120] train avg loss 0.00103457, dev acc 0.8857, dev avg loss 0.421407, throughput 2.13514K wps
[Epoch 121 Batch 30/172] avg loss 0.000995514, throughput 2.15488K wps
[Epoch 121 Batch 60/172] avg loss 0.00102489, throughput 2.08742K wps
[Epoch 121 Batch 90/172] avg loss 0.0010548, throughput 2.10702K wps
[Epoch 121 Batch 120/172] avg loss 0.00108153, throughput 2.07602K wps
[Epoch 121 Batch 150/172] avg loss 0.00125427, throughput 2.09695K wps
Begin Testing...
[Epoch 121] train avg loss 0.00110695, dev acc 0.8847, dev avg loss 0.421444, throughput 2.10793K wps
[Epoch 122 Batch 30/172] avg loss 0.000847821, throughput 2.14316K wps
[Epoch 122 Batch 60/172] avg loss 0.00104236, throughput 2.10627K wps
[Epoch 122 Batch 90/172] avg loss 0.00122728, throughput 2.084K wps
[Epoch 122 Batch 120/172] avg loss 0.00103708, throughput 2.07638K wps
[Epoch 122 Batch 150/172] avg loss 0.00108369, throughput 2.08398K wps
Begin Testing...
[Epoch 122] train avg loss 0.00102864, dev acc 0.8847, dev avg loss 0.42267, throughput 2.09734K wps
[Epoch 123 Batch 30/172] avg loss 0.00107857, throughput 2.16328K wps
[Epoch 123 Batch 60/172] avg loss 0.000853589, throughput 2.15164K wps
[Epoch 123 Batch 90/172] avg loss 0.00102881, throughput 2.09974K wps
[Epoch 123 Batch 120/172] avg loss 0.000940248, throughput 2.13635K wps
[Epoch 123 Batch 150/172] avg loss 0.00121176, throughput 2.14587K wps
Begin Testing...
[Epoch 123] train avg loss 0.0010373, dev acc 0.8889, dev avg loss 0.422402, throughput 2.13886K wps
[Epoch 124 Batch 30/172] avg loss 0.00101007, throughput 2.1684K wps
[Epoch 124 Batch 60/172] avg loss 0.00100544, throughput 2.14971K wps
[Epoch 124 Batch 90/172] avg loss 0.000857982, throughput 2.08221K wps
[Epoch 124 Batch 120/172] avg loss 0.00100497, throughput 2.08311K wps
[Epoch 124 Batch 150/172] avg loss 0.0011021, throughput 2.08322K wps
Begin Testing...
[Epoch 124] train avg loss 0.00103207, dev acc 0.8795, dev avg loss 0.423891, throughput 2.11243K wps
[Epoch 125 Batch 30/172] avg loss 0.000953279, throughput 2.16971K wps
[Epoch 125 Batch 60/172] avg loss 0.000921022, throughput 2.15333K wps
[Epoch 125 Batch 90/172] avg loss 0.00105696, throughput 2.10531K wps
[Epoch 125 Batch 120/172] avg loss 0.000894508, throughput 2.09042K wps
[Epoch 125 Batch 150/172] avg loss 0.00124967, throughput 2.08983K wps
Begin Testing...
[Epoch 125] train avg loss 0.00106227, dev acc 0.8826, dev avg loss 0.422364, throughput 2.12181K wps
[Epoch 126 Batch 30/172] avg loss 0.0010095, throughput 2.17738K wps
[Epoch 126 Batch 60/172] avg loss 0.00104153, throughput 2.102K wps
[Epoch 126 Batch 90/172] avg loss 0.00114848, throughput 2.12633K wps
[Epoch 126 Batch 120/172] avg loss 0.00104668, throughput 2.12926K wps
[Epoch 126 Batch 150/172] avg loss 0.000937919, throughput 2.10697K wps
Begin Testing...
[Epoch 126] train avg loss 0.00101272, dev acc 0.8847, dev avg loss 0.427797, throughput 2.12663K wps
[Epoch 127 Batch 30/172] avg loss 0.00111862, throughput 2.19331K wps
[Epoch 127 Batch 60/172] avg loss 0.00109874, throughput 2.13009K wps
[Epoch 127 Batch 90/172] avg loss 0.00101703, throughput 2.13805K wps
[Epoch 127 Batch 120/172] avg loss 0.00106184, throughput 2.11126K wps
[Epoch 127 Batch 150/172] avg loss 0.000807428, throughput 2.08568K wps
Begin Testing...
[Epoch 127] train avg loss 0.00101989, dev acc 0.8857, dev avg loss 0.428012, throughput 2.12433K wps
[Epoch 128 Batch 30/172] avg loss 0.000816588, throughput 2.18996K wps
[Epoch 128 Batch 60/172] avg loss 0.00101122, throughput 2.10606K wps
[Epoch 128 Batch 90/172] avg loss 0.00100142, throughput 2.11946K wps
[Epoch 128 Batch 120/172] avg loss 0.00105815, throughput 2.14892K wps
[Epoch 128 Batch 150/172] avg loss 0.000978486, throughput 2.14675K wps
Begin Testing...
[Epoch 128] train avg loss 0.00097147, dev acc 0.8826, dev avg loss 0.43052, throughput 2.14306K wps
[Epoch 129 Batch 30/172] avg loss 0.00100264, throughput 2.13474K wps
[Epoch 129 Batch 60/172] avg loss 0.00098304, throughput 2.08376K wps
[Epoch 129 Batch 90/172] avg loss 0.000830941, throughput 2.14764K wps
[Epoch 129 Batch 120/172] avg loss 0.00100669, throughput 2.09344K wps
[Epoch 129 Batch 150/172] avg loss 0.00101131, throughput 2.15085K wps
Begin Testing...
[Epoch 129] train avg loss 0.000982213, dev acc 0.8826, dev avg loss 0.438369, throughput 2.12504K wps
[Epoch 130 Batch 30/172] avg loss 0.000920842, throughput 2.14596K wps
[Epoch 130 Batch 60/172] avg loss 0.00110422, throughput 2.13798K wps
[Epoch 130 Batch 90/172] avg loss 0.000887987, throughput 2.14861K wps
[Epoch 130 Batch 120/172] avg loss 0.00123778, throughput 2.14799K wps
[Epoch 130 Batch 150/172] avg loss 0.00101545, throughput 2.14218K wps
Begin Testing...
[Epoch 130] train avg loss 0.00101608, dev acc 0.8847, dev avg loss 0.431143, throughput 2.14377K wps
[Epoch 131 Batch 30/172] avg loss 0.00111034, throughput 2.15092K wps
[Epoch 131 Batch 60/172] avg loss 0.000933305, throughput 2.13937K wps
[Epoch 131 Batch 90/172] avg loss 0.00102395, throughput 2.08936K wps
[Epoch 131 Batch 120/172] avg loss 0.00106516, throughput 2.144K wps
[Epoch 131 Batch 150/172] avg loss 0.00110859, throughput 2.14846K wps
Begin Testing...
[Epoch 131] train avg loss 0.00103901, dev acc 0.8857, dev avg loss 0.430986, throughput 2.1369K wps
[Epoch 132 Batch 30/172] avg loss 0.00107427, throughput 2.15242K wps
[Epoch 132 Batch 60/172] avg loss 0.000955703, throughput 2.08413K wps
[Epoch 132 Batch 90/172] avg loss 0.00101557, throughput 2.11371K wps
[Epoch 132 Batch 120/172] avg loss 0.000883783, throughput 2.11826K wps
[Epoch 132 Batch 150/172] avg loss 0.000814965, throughput 2.07689K wps
Begin Testing...
[Epoch 132] train avg loss 0.000935874, dev acc 0.8847, dev avg loss 0.436054, throughput 2.11192K wps
[Epoch 133 Batch 30/172] avg loss 0.000817696, throughput 2.19678K wps
[Epoch 133 Batch 60/172] avg loss 0.0011286, throughput 2.14134K wps
[Epoch 133 Batch 90/172] avg loss 0.00076009, throughput 2.14503K wps
[Epoch 133 Batch 120/172] avg loss 0.00100502, throughput 2.07758K wps
[Epoch 133 Batch 150/172] avg loss 0.00103459, throughput 2.0795K wps
Begin Testing...
[Epoch 133] train avg loss 0.000957543, dev acc 0.8857, dev avg loss 0.436546, throughput 2.12562K wps
[Epoch 134 Batch 30/172] avg loss 0.000878794, throughput 2.18512K wps
[Epoch 134 Batch 60/172] avg loss 0.00090171, throughput 2.0959K wps
[Epoch 134 Batch 90/172] avg loss 0.000945374, throughput 2.10213K wps
[Epoch 134 Batch 120/172] avg loss 0.000969416, throughput 2.12617K wps
[Epoch 134 Batch 150/172] avg loss 0.00123839, throughput 2.07567K wps
Begin Testing...
[Epoch 134] train avg loss 0.000965021, dev acc 0.8857, dev avg loss 0.437875, throughput 2.11272K wps
[Epoch 135 Batch 30/172] avg loss 0.000966882, throughput 2.15543K wps
[Epoch 135 Batch 60/172] avg loss 0.0008891, throughput 2.08833K wps
[Epoch 135 Batch 90/172] avg loss 0.000804678, throughput 2.083K wps
[Epoch 135 Batch 120/172] avg loss 0.000911352, throughput 2.0964K wps
[Epoch 135 Batch 150/172] avg loss 0.00120427, throughput 2.08462K wps
Begin Testing...
[Epoch 135] train avg loss 0.000959087, dev acc 0.8857, dev avg loss 0.440477, throughput 2.10022K wps
[Epoch 136 Batch 30/172] avg loss 0.00096488, throughput 2.16835K wps
[Epoch 136 Batch 60/172] avg loss 0.00106367, throughput 2.14636K wps
[Epoch 136 Batch 90/172] avg loss 0.00075751, throughput 2.11861K wps
[Epoch 136 Batch 120/172] avg loss 0.00103268, throughput 2.14936K wps
[Epoch 136 Batch 150/172] avg loss 0.00107325, throughput 2.14562K wps
Begin Testing...
[Epoch 136] train avg loss 0.000973903, dev acc 0.8868, dev avg loss 0.44212, throughput 2.13877K wps
[Epoch 137 Batch 30/172] avg loss 0.000868432, throughput 2.16746K wps
[Epoch 137 Batch 60/172] avg loss 0.000845438, throughput 2.09318K wps
[Epoch 137 Batch 90/172] avg loss 0.000971921, throughput 2.08918K wps
[Epoch 137 Batch 120/172] avg loss 0.00107547, throughput 2.1115K wps
[Epoch 137 Batch 150/172] avg loss 0.000794671, throughput 2.14802K wps
Begin Testing...
[Epoch 137] train avg loss 0.000932681, dev acc 0.8826, dev avg loss 0.440941, throughput 2.11498K wps
[Epoch 138 Batch 30/172] avg loss 0.000856932, throughput 2.14486K wps
[Epoch 138 Batch 60/172] avg loss 0.000920121, throughput 2.07676K wps
[Epoch 138 Batch 90/172] avg loss 0.000977274, throughput 2.09002K wps
[Epoch 138 Batch 120/172] avg loss 0.00112908, throughput 2.0881K wps
[Epoch 138 Batch 150/172] avg loss 0.00108144, throughput 2.07653K wps
Begin Testing...
[Epoch 138] train avg loss 0.000985985, dev acc 0.8868, dev avg loss 0.440468, throughput 2.0925K wps
[Epoch 139 Batch 30/172] avg loss 0.000976643, throughput 2.17798K wps
[Epoch 139 Batch 60/172] avg loss 0.000928223, throughput 2.14487K wps
[Epoch 139 Batch 90/172] avg loss 0.000800604, throughput 2.14298K wps
[Epoch 139 Batch 120/172] avg loss 0.000868504, throughput 2.10433K wps
[Epoch 139 Batch 150/172] avg loss 0.00109144, throughput 2.10122K wps
Begin Testing...
[Epoch 139] train avg loss 0.000969166, dev acc 0.8826, dev avg loss 0.44816, throughput 2.13459K wps
[Epoch 140 Batch 30/172] avg loss 0.00074332, throughput 2.1282K wps
[Epoch 140 Batch 60/172] avg loss 0.00109116, throughput 2.10529K wps
[Epoch 140 Batch 90/172] avg loss 0.00085105, throughput 2.10244K wps
[Epoch 140 Batch 120/172] avg loss 0.000807858, throughput 2.09331K wps
[Epoch 140 Batch 150/172] avg loss 0.00111909, throughput 2.14771K wps
Begin Testing...
[Epoch 140] train avg loss 0.00092701, dev acc 0.8878, dev avg loss 0.444547, throughput 2.11717K wps
[Epoch 141 Batch 30/172] avg loss 0.000890415, throughput 2.11471K wps
[Epoch 141 Batch 60/172] avg loss 0.000953877, throughput 2.14428K wps
[Epoch 141 Batch 90/172] avg loss 0.000901454, throughput 2.1447K wps
[Epoch 141 Batch 120/172] avg loss 0.00111768, throughput 2.14653K wps
[Epoch 141 Batch 150/172] avg loss 0.000939289, throughput 2.14661K wps
Begin Testing...
[Epoch 141] train avg loss 0.000950441, dev acc 0.8889, dev avg loss 0.446214, throughput 2.13886K wps
[Epoch 142 Batch 30/172] avg loss 0.000924448, throughput 2.13131K wps
[Epoch 142 Batch 60/172] avg loss 0.00083402, throughput 2.12524K wps
[Epoch 142 Batch 90/172] avg loss 0.000896508, throughput 2.15284K wps
[Epoch 142 Batch 120/172] avg loss 0.00101483, throughput 2.11642K wps
[Epoch 142 Batch 150/172] avg loss 0.000660473, throughput 2.10644K wps
Begin Testing...
[Epoch 142] train avg loss 0.000930944, dev acc 0.8868, dev avg loss 0.446456, throughput 2.12714K wps
[Epoch 143 Batch 30/172] avg loss 0.00111258, throughput 2.18486K wps
[Epoch 143 Batch 60/172] avg loss 0.000895866, throughput 2.10296K wps
[Epoch 143 Batch 90/172] avg loss 0.000889805, throughput 2.12265K wps
[Epoch 143 Batch 120/172] avg loss 0.000976909, throughput 2.14632K wps
[Epoch 143 Batch 150/172] avg loss 0.000944978, throughput 2.0965K wps
Begin Testing...
[Epoch 143] train avg loss 0.000948919, dev acc 0.8868, dev avg loss 0.446463, throughput 2.13138K wps
[Epoch 144 Batch 30/172] avg loss 0.000873216, throughput 2.17477K wps
[Epoch 144 Batch 60/172] avg loss 0.00082726, throughput 2.10259K wps
[Epoch 144 Batch 90/172] avg loss 0.00090239, throughput 2.092K wps
[Epoch 144 Batch 120/172] avg loss 0.000875789, throughput 2.10541K wps
[Epoch 144 Batch 150/172] avg loss 0.00103753, throughput 2.12362K wps
Begin Testing...
[Epoch 144] train avg loss 0.000913016, dev acc 0.8868, dev avg loss 0.451221, throughput 2.12362K wps
[Epoch 145 Batch 30/172] avg loss 0.000696625, throughput 2.13735K wps
[Epoch 145 Batch 60/172] avg loss 0.00093165, throughput 2.09722K wps
[Epoch 145 Batch 90/172] avg loss 0.0008306, throughput 2.11139K wps
[Epoch 145 Batch 120/172] avg loss 0.000834451, throughput 2.14579K wps
[Epoch 145 Batch 150/172] avg loss 0.00105449, throughput 2.14174K wps
Begin Testing...
[Epoch 145] train avg loss 0.000883999, dev acc 0.8868, dev avg loss 0.451591, throughput 2.12922K wps
[Epoch 146 Batch 30/172] avg loss 0.000854326, throughput 2.16041K wps
[Epoch 146 Batch 60/172] avg loss 0.000748282, throughput 2.13787K wps
[Epoch 146 Batch 90/172] avg loss 0.000907457, throughput 2.07389K wps
[Epoch 146 Batch 120/172] avg loss 0.00101793, throughput 2.11816K wps
[Epoch 146 Batch 150/172] avg loss 0.00102171, throughput 2.15072K wps
Begin Testing...
[Epoch 146] train avg loss 0.000902594, dev acc 0.8878, dev avg loss 0.451316, throughput 2.12926K wps
[Epoch 147 Batch 30/172] avg loss 0.000883141, throughput 2.17227K wps
[Epoch 147 Batch 60/172] avg loss 0.000854019, throughput 2.11956K wps
[Epoch 147 Batch 90/172] avg loss 0.000803711, throughput 2.11225K wps
[Epoch 147 Batch 120/172] avg loss 0.00083537, throughput 2.1362K wps
[Epoch 147 Batch 150/172] avg loss 0.00108236, throughput 2.10502K wps
Begin Testing...
[Epoch 147] train avg loss 0.000897018, dev acc 0.8878, dev avg loss 0.452717, throughput 2.12594K wps
[Epoch 148 Batch 30/172] avg loss 0.000695738, throughput 2.15422K wps
[Epoch 148 Batch 60/172] avg loss 0.000887391, throughput 2.1063K wps
[Epoch 148 Batch 90/172] avg loss 0.00104914, throughput 2.11729K wps
[Epoch 148 Batch 120/172] avg loss 0.000855998, throughput 2.12655K wps
[Epoch 148 Batch 150/172] avg loss 0.000968805, throughput 2.08507K wps
Begin Testing...
[Epoch 148] train avg loss 0.000888166, dev acc 0.8878, dev avg loss 0.454919, throughput 2.1143K wps
[Epoch 149 Batch 30/172] avg loss 0.000890158, throughput 2.13054K wps
[Epoch 149 Batch 60/172] avg loss 0.000871878, throughput 2.0754K wps
[Epoch 149 Batch 90/172] avg loss 0.000986633, throughput 2.12043K wps
[Epoch 149 Batch 120/172] avg loss 0.000922202, throughput 2.13817K wps
[Epoch 149 Batch 150/172] avg loss 0.000916002, throughput 2.11529K wps
Begin Testing...
[Epoch 149] train avg loss 0.000883877, dev acc 0.8878, dev avg loss 0.455423, throughput 2.11772K wps
[Epoch 150 Batch 30/172] avg loss 0.000901034, throughput 2.18247K wps
[Epoch 150 Batch 60/172] avg loss 0.000837849, throughput 2.14648K wps
[Epoch 150 Batch 90/172] avg loss 0.00102603, throughput 2.13367K wps
[Epoch 150 Batch 120/172] avg loss 0.000785894, throughput 2.08237K wps
[Epoch 150 Batch 150/172] avg loss 0.000742913, throughput 2.0887K wps
Begin Testing...
[Epoch 150] train avg loss 0.000882332, dev acc 0.8868, dev avg loss 0.454617, throughput 2.123K wps
[Epoch 151 Batch 30/172] avg loss 0.000911895, throughput 2.18576K wps
[Epoch 151 Batch 60/172] avg loss 0.000758394, throughput 2.13853K wps
[Epoch 151 Batch 90/172] avg loss 0.00094536, throughput 2.11863K wps
[Epoch 151 Batch 120/172] avg loss 0.000780229, throughput 2.12523K wps
[Epoch 151 Batch 150/172] avg loss 0.000926837, throughput 2.13654K wps
Begin Testing...
[Epoch 151] train avg loss 0.000860019, dev acc 0.8795, dev avg loss 0.458747, throughput 2.14232K wps
[Epoch 152 Batch 30/172] avg loss 0.000849023, throughput 2.18973K wps
[Epoch 152 Batch 60/172] avg loss 0.0010086, throughput 2.13131K wps
[Epoch 152 Batch 90/172] avg loss 0.000908668, throughput 2.14877K wps
[Epoch 152 Batch 120/172] avg loss 0.000885528, throughput 2.12504K wps
[Epoch 152 Batch 150/172] avg loss 0.000970799, throughput 2.09297K wps
Begin Testing...
[Epoch 152] train avg loss 0.000936235, dev acc 0.8836, dev avg loss 0.457666, throughput 2.13428K wps
[Epoch 153 Batch 30/172] avg loss 0.000774296, throughput 2.16122K wps
[Epoch 153 Batch 60/172] avg loss 0.000853658, throughput 2.08997K wps
[Epoch 153 Batch 90/172] avg loss 0.000829678, throughput 2.12013K wps
[Epoch 153 Batch 120/172] avg loss 0.000832109, throughput 2.08221K wps
[Epoch 153 Batch 150/172] avg loss 0.00107947, throughput 2.08355K wps
Begin Testing...
[Epoch 153] train avg loss 0.000879033, dev acc 0.8868, dev avg loss 0.459833, throughput 2.1085K wps
[Epoch 154 Batch 30/172] avg loss 0.000793542, throughput 2.14734K wps
[Epoch 154 Batch 60/172] avg loss 0.000830385, throughput 2.117K wps
[Epoch 154 Batch 90/172] avg loss 0.000854665, throughput 2.13941K wps
[Epoch 154 Batch 120/172] avg loss 0.000979538, throughput 2.09931K wps
[Epoch 154 Batch 150/172] avg loss 0.000801945, throughput 2.09096K wps
Begin Testing...
[Epoch 154] train avg loss 0.000883408, dev acc 0.8847, dev avg loss 0.463428, throughput 2.11539K wps
[Epoch 155 Batch 30/172] avg loss 0.000886472, throughput 2.15K wps
[Epoch 155 Batch 60/172] avg loss 0.000788426, throughput 2.08145K wps
[Epoch 155 Batch 90/172] avg loss 0.000976344, throughput 2.07376K wps
[Epoch 155 Batch 120/172] avg loss 0.000899852, throughput 2.07932K wps
[Epoch 155 Batch 150/172] avg loss 0.000770002, throughput 2.11627K wps
Begin Testing...
[Epoch 155] train avg loss 0.000891364, dev acc 0.8847, dev avg loss 0.463884, throughput 2.10344K wps
[Epoch 156 Batch 30/172] avg loss 0.000788081, throughput 2.19276K wps
[Epoch 156 Batch 60/172] avg loss 0.000966015, throughput 2.10301K wps
[Epoch 156 Batch 90/172] avg loss 0.000884065, throughput 2.1091K wps
[Epoch 156 Batch 120/172] avg loss 0.000789468, throughput 2.12485K wps
[Epoch 156 Batch 150/172] avg loss 0.000930139, throughput 2.14706K wps
Begin Testing...
[Epoch 156] train avg loss 0.000864217, dev acc 0.8868, dev avg loss 0.464533, throughput 2.12947K wps
[Epoch 157 Batch 30/172] avg loss 0.000809286, throughput 2.16508K wps
[Epoch 157 Batch 60/172] avg loss 0.000784031, throughput 2.14295K wps
[Epoch 157 Batch 90/172] avg loss 0.00102331, throughput 2.1393K wps
[Epoch 157 Batch 120/172] avg loss 0.000751777, throughput 2.10755K wps
[Epoch 157 Batch 150/172] avg loss 0.000899206, throughput 2.10084K wps
Begin Testing...
[Epoch 157] train avg loss 0.00087036, dev acc 0.8868, dev avg loss 0.462322, throughput 2.12907K wps
[Epoch 158 Batch 30/172] avg loss 0.000805508, throughput 2.13865K wps
[Epoch 158 Batch 60/172] avg loss 0.000816399, throughput 2.09989K wps
[Epoch 158 Batch 90/172] avg loss 0.000988857, throughput 2.13001K wps
[Epoch 158 Batch 120/172] avg loss 0.00071876, throughput 2.12013K wps
[Epoch 158 Batch 150/172] avg loss 0.00087667, throughput 2.07873K wps
Begin Testing...
[Epoch 158] train avg loss 0.000840987, dev acc 0.8857, dev avg loss 0.463849, throughput 2.1119K wps
[Epoch 159 Batch 30/172] avg loss 0.000839972, throughput 2.15843K wps
[Epoch 159 Batch 60/172] avg loss 0.000982893, throughput 2.13437K wps
[Epoch 159 Batch 90/172] avg loss 0.000855724, throughput 2.15846K wps
[Epoch 159 Batch 120/172] avg loss 0.00101702, throughput 2.1438K wps
[Epoch 159 Batch 150/172] avg loss 0.000758951, throughput 2.15676K wps
Begin Testing...
[Epoch 159] train avg loss 0.000871773, dev acc 0.8889, dev avg loss 0.464663, throughput 2.15041K wps
[Epoch 160 Batch 30/172] avg loss 0.000818809, throughput 2.13867K wps
[Epoch 160 Batch 60/172] avg loss 0.000924207, throughput 2.1093K wps
[Epoch 160 Batch 90/172] avg loss 0.000808733, throughput 2.08793K wps
[Epoch 160 Batch 120/172] avg loss 0.00110039, throughput 2.12283K wps
[Epoch 160 Batch 150/172] avg loss 0.000670081, throughput 2.13925K wps
Begin Testing...
[Epoch 160] train avg loss 0.000870524, dev acc 0.8857, dev avg loss 0.466012, throughput 2.12129K wps
[Epoch 161 Batch 30/172] avg loss 0.000792398, throughput 2.18161K wps
[Epoch 161 Batch 60/172] avg loss 0.000835691, throughput 2.14797K wps
[Epoch 161 Batch 90/172] avg loss 0.000888818, throughput 2.15694K wps
[Epoch 161 Batch 120/172] avg loss 0.000799927, throughput 2.15328K wps
[Epoch 161 Batch 150/172] avg loss 0.000934769, throughput 2.13826K wps
Begin Testing...
[Epoch 161] train avg loss 0.000832943, dev acc 0.8857, dev avg loss 0.468061, throughput 2.15395K wps
[Epoch 162 Batch 30/172] avg loss 0.000720834, throughput 2.16104K wps
[Epoch 162 Batch 60/172] avg loss 0.000885879, throughput 2.13331K wps
[Epoch 162 Batch 90/172] avg loss 0.00091201, throughput 2.09599K wps
[Epoch 162 Batch 120/172] avg loss 0.000946483, throughput 2.12602K wps
[Epoch 162 Batch 150/172] avg loss 0.000777355, throughput 2.1367K wps
Begin Testing...
[Epoch 162] train avg loss 0.000834856, dev acc 0.8826, dev avg loss 0.471271, throughput 2.13149K wps
[Epoch 163 Batch 30/172] avg loss 0.000860247, throughput 2.13056K wps
[Epoch 163 Batch 60/172] avg loss 0.00088059, throughput 2.08665K wps
[Epoch 163 Batch 90/172] avg loss 0.00077309, throughput 2.097K wps
[Epoch 163 Batch 120/172] avg loss 0.000716609, throughput 2.1515K wps
[Epoch 163 Batch 150/172] avg loss 0.000934022, throughput 2.13939K wps
Begin Testing...
[Epoch 163] train avg loss 0.000836393, dev acc 0.8847, dev avg loss 0.470417, throughput 2.11731K wps
[Epoch 164 Batch 30/172] avg loss 0.000620804, throughput 2.13277K wps
[Epoch 164 Batch 60/172] avg loss 0.000936973, throughput 2.13778K wps
[Epoch 164 Batch 90/172] avg loss 0.000957175, throughput 2.14627K wps
[Epoch 164 Batch 120/172] avg loss 0.00100337, throughput 2.08447K wps
[Epoch 164 Batch 150/172] avg loss 0.000765277, throughput 2.08623K wps
Begin Testing...
[Epoch 164] train avg loss 0.000864212, dev acc 0.8857, dev avg loss 0.473363, throughput 2.12153K wps
[Epoch 165 Batch 30/172] avg loss 0.00076663, throughput 2.18841K wps
[Epoch 165 Batch 60/172] avg loss 0.000717613, throughput 2.14445K wps
[Epoch 165 Batch 90/172] avg loss 0.000898289, throughput 2.08782K wps
[Epoch 165 Batch 120/172] avg loss 0.000959916, throughput 2.08628K wps
[Epoch 165 Batch 150/172] avg loss 0.000804029, throughput 2.11254K wps
Begin Testing...
[Epoch 165] train avg loss 0.000858231, dev acc 0.8857, dev avg loss 0.472383, throughput 2.12444K wps
[Epoch 166 Batch 30/172] avg loss 0.000830919, throughput 2.15375K wps
[Epoch 166 Batch 60/172] avg loss 0.000757503, throughput 2.09526K wps
[Epoch 166 Batch 90/172] avg loss 0.000813151, throughput 2.1277K wps
[Epoch 166 Batch 120/172] avg loss 0.000685334, throughput 2.09266K wps
[Epoch 166 Batch 150/172] avg loss 0.000914578, throughput 2.10881K wps
Begin Testing...
[Epoch 166] train avg loss 0.000807367, dev acc 0.8847, dev avg loss 0.476041, throughput 2.11824K wps
[Epoch 167 Batch 30/172] avg loss 0.000772524, throughput 2.18407K wps
[Epoch 167 Batch 60/172] avg loss 0.00087217, throughput 2.13518K wps
[Epoch 167 Batch 90/172] avg loss 0.000685704, throughput 2.14436K wps
[Epoch 167 Batch 120/172] avg loss 0.000894772, throughput 2.14624K wps
[Epoch 167 Batch 150/172] avg loss 0.000774608, throughput 2.09318K wps
Begin Testing...
[Epoch 167] train avg loss 0.000817515, dev acc 0.8816, dev avg loss 0.473657, throughput 2.13694K wps
[Epoch 168 Batch 30/172] avg loss 0.000839441, throughput 2.18302K wps
[Epoch 168 Batch 60/172] avg loss 0.000643549, throughput 2.09645K wps
[Epoch 168 Batch 90/172] avg loss 0.000841221, throughput 2.12833K wps
[Epoch 168 Batch 120/172] avg loss 0.000999221, throughput 2.13165K wps
[Epoch 168 Batch 150/172] avg loss 0.000764349, throughput 2.14807K wps
Begin Testing...
[Epoch 168] train avg loss 0.000849951, dev acc 0.8847, dev avg loss 0.475418, throughput 2.13714K wps
[Epoch 169 Batch 30/172] avg loss 0.000943995, throughput 2.16113K wps
[Epoch 169 Batch 60/172] avg loss 0.000861548, throughput 2.10014K wps
[Epoch 169 Batch 90/172] avg loss 0.000771202, throughput 2.07334K wps
[Epoch 169 Batch 120/172] avg loss 0.000972944, throughput 2.08871K wps
[Epoch 169 Batch 150/172] avg loss 0.000805823, throughput 2.14586K wps
Begin Testing...
[Epoch 169] train avg loss 0.000856239, dev acc 0.8857, dev avg loss 0.474904, throughput 2.11024K wps
[Epoch 170 Batch 30/172] avg loss 0.000737374, throughput 2.13777K wps
[Epoch 170 Batch 60/172] avg loss 0.000778578, throughput 2.1441K wps
[Epoch 170 Batch 90/172] avg loss 0.000678287, throughput 2.09016K wps
[Epoch 170 Batch 120/172] avg loss 0.000925944, throughput 2.13162K wps
[Epoch 170 Batch 150/172] avg loss 0.000846608, throughput 2.14079K wps
Begin Testing...
[Epoch 170] train avg loss 0.000821445, dev acc 0.8868, dev avg loss 0.475129, throughput 2.13076K wps
[Epoch 171 Batch 30/172] avg loss 0.000788362, throughput 2.18735K wps
[Epoch 171 Batch 60/172] avg loss 0.000719766, throughput 2.14427K wps
[Epoch 171 Batch 90/172] avg loss 0.000754939, throughput 2.09663K wps
[Epoch 171 Batch 120/172] avg loss 0.000924435, throughput 2.09023K wps
[Epoch 171 Batch 150/172] avg loss 0.00072273, throughput 2.10112K wps
Begin Testing...
[Epoch 171] train avg loss 0.000794503, dev acc 0.8857, dev avg loss 0.477317, throughput 2.12702K wps
[Epoch 172 Batch 30/172] avg loss 0.000684737, throughput 2.1442K wps
[Epoch 172 Batch 60/172] avg loss 0.000865114, throughput 2.06661K wps
[Epoch 172 Batch 90/172] avg loss 0.000908257, throughput 2.0925K wps
[Epoch 172 Batch 120/172] avg loss 0.000709621, throughput 2.09471K wps
[Epoch 172 Batch 150/172] avg loss 0.000868609, throughput 2.11385K wps
Begin Testing...
[Epoch 172] train avg loss 0.000797396, dev acc 0.8847, dev avg loss 0.481929, throughput 2.10234K wps
[Epoch 173 Batch 30/172] avg loss 0.000732665, throughput 2.12009K wps
[Epoch 173 Batch 60/172] avg loss 0.000795687, throughput 2.11221K wps
[Epoch 173 Batch 90/172] avg loss 0.000888605, throughput 2.14088K wps
[Epoch 173 Batch 120/172] avg loss 0.000678116, throughput 2.14615K wps
[Epoch 173 Batch 150/172] avg loss 0.000876833, throughput 2.11236K wps
Begin Testing...
[Epoch 173] train avg loss 0.000821952, dev acc 0.8857, dev avg loss 0.483014, throughput 2.12004K wps
[Epoch 174 Batch 30/172] avg loss 0.000798626, throughput 2.18563K wps
[Epoch 174 Batch 60/172] avg loss 0.000825686, throughput 2.14678K wps
[Epoch 174 Batch 90/172] avg loss 0.000809129, throughput 2.15029K wps
[Epoch 174 Batch 120/172] avg loss 0.000812591, throughput 2.12566K wps
[Epoch 174 Batch 150/172] avg loss 0.000947624, throughput 2.08342K wps
Begin Testing...
[Epoch 174] train avg loss 0.000827499, dev acc 0.8868, dev avg loss 0.481187, throughput 2.13374K wps
[Epoch 175 Batch 30/172] avg loss 0.000862418, throughput 2.13165K wps
[Epoch 175 Batch 60/172] avg loss 0.000723338, throughput 2.12207K wps
[Epoch 175 Batch 90/172] avg loss 0.000746881, throughput 2.09435K wps
[Epoch 175 Batch 120/172] avg loss 0.000883959, throughput 2.13666K wps
[Epoch 175 Batch 150/172] avg loss 0.000849365, throughput 2.13881K wps
Begin Testing...
[Epoch 175] train avg loss 0.000806175, dev acc 0.8847, dev avg loss 0.48268, throughput 2.12761K wps
[Epoch 176 Batch 30/172] avg loss 0.000855474, throughput 2.18629K wps
[Epoch 176 Batch 60/172] avg loss 0.000785583, throughput 2.14348K wps
[Epoch 176 Batch 90/172] avg loss 0.000724466, throughput 2.10729K wps
[Epoch 176 Batch 120/172] avg loss 0.000809782, throughput 2.08675K wps
[Epoch 176 Batch 150/172] avg loss 0.000859416, throughput 2.10975K wps
Begin Testing...
[Epoch 176] train avg loss 0.000794079, dev acc 0.8826, dev avg loss 0.483239, throughput 2.12251K wps
[Epoch 177 Batch 30/172] avg loss 0.000901391, throughput 2.13272K wps
[Epoch 177 Batch 60/172] avg loss 0.000730143, throughput 2.09294K wps
[Epoch 177 Batch 90/172] avg loss 0.000754478, throughput 2.09966K wps
[Epoch 177 Batch 120/172] avg loss 0.00091811, throughput 2.09088K wps
[Epoch 177 Batch 150/172] avg loss 0.000725844, throughput 2.09073K wps
Begin Testing...
[Epoch 177] train avg loss 0.000810581, dev acc 0.8847, dev avg loss 0.486413, throughput 2.10414K wps
[Epoch 178 Batch 30/172] avg loss 0.000855867, throughput 2.12567K wps
[Epoch 178 Batch 60/172] avg loss 0.000784427, throughput 2.06622K wps
[Epoch 178 Batch 90/172] avg loss 0.000993362, throughput 2.06996K wps
[Epoch 178 Batch 120/172] avg loss 0.000680245, throughput 2.07843K wps
[Epoch 178 Batch 150/172] avg loss 0.000555181, throughput 2.12064K wps
Begin Testing...
[Epoch 178] train avg loss 0.000792734, dev acc 0.8816, dev avg loss 0.485355, throughput 2.09625K wps
[Epoch 179 Batch 30/172] avg loss 0.000752369, throughput 2.17035K wps
[Epoch 179 Batch 60/172] avg loss 0.000925505, throughput 2.11705K wps
[Epoch 179 Batch 90/172] avg loss 0.000806078, throughput 2.1369K wps
[Epoch 179 Batch 120/172] avg loss 0.000743764, throughput 2.11356K wps
[Epoch 179 Batch 150/172] avg loss 0.000944259, throughput 2.09148K wps
Begin Testing...
[Epoch 179] train avg loss 0.000824746, dev acc 0.8836, dev avg loss 0.493316, throughput 2.12667K wps
[Epoch 180 Batch 30/172] avg loss 0.000718257, throughput 2.19759K wps
[Epoch 180 Batch 60/172] avg loss 0.000705164, throughput 2.094K wps
[Epoch 180 Batch 90/172] avg loss 0.000773761, throughput 2.08435K wps
[Epoch 180 Batch 120/172] avg loss 0.00086268, throughput 2.11942K wps
[Epoch 180 Batch 150/172] avg loss 0.000728873, throughput 2.08732K wps
Begin Testing...
[Epoch 180] train avg loss 0.000782436, dev acc 0.8795, dev avg loss 0.488255, throughput 2.11177K wps
[Epoch 181 Batch 30/172] avg loss 0.0006843, throughput 2.19787K wps
[Epoch 181 Batch 60/172] avg loss 0.000920345, throughput 2.14737K wps
[Epoch 181 Batch 90/172] avg loss 0.000697428, throughput 2.11256K wps
[Epoch 181 Batch 120/172] avg loss 0.000874394, throughput 2.09667K wps
[Epoch 181 Batch 150/172] avg loss 0.000877566, throughput 2.12524K wps
Begin Testing...
[Epoch 181] train avg loss 0.000818885, dev acc 0.8836, dev avg loss 0.491584, throughput 2.1372K wps
[Epoch 182 Batch 30/172] avg loss 0.00083716, throughput 2.16226K wps
[Epoch 182 Batch 60/172] avg loss 0.000938508, throughput 2.09386K wps
[Epoch 182 Batch 90/172] avg loss 0.000715589, throughput 2.13885K wps
[Epoch 182 Batch 120/172] avg loss 0.000691341, throughput 2.12613K wps
[Epoch 182 Batch 150/172] avg loss 0.000828298, throughput 2.12974K wps
Begin Testing...
[Epoch 182] train avg loss 0.000782969, dev acc 0.8816, dev avg loss 0.490164, throughput 2.13253K wps
[Epoch 183 Batch 30/172] avg loss 0.000677253, throughput 2.13917K wps
[Epoch 183 Batch 60/172] avg loss 0.000821261, throughput 2.10607K wps
[Epoch 183 Batch 90/172] avg loss 0.000821875, throughput 2.08062K wps
[Epoch 183 Batch 120/172] avg loss 0.000844435, throughput 2.10051K wps
[Epoch 183 Batch 150/172] avg loss 0.000739503, throughput 2.15611K wps
Begin Testing...
[Epoch 183] train avg loss 0.000802658, dev acc 0.8816, dev avg loss 0.489075, throughput 2.1208K wps
[Epoch 184 Batch 30/172] avg loss 0.00094816, throughput 2.13636K wps
[Epoch 184 Batch 60/172] avg loss 0.000724637, throughput 2.13286K wps
[Epoch 184 Batch 90/172] avg loss 0.000740773, throughput 2.08848K wps
[Epoch 184 Batch 120/172] avg loss 0.000814349, throughput 2.14134K wps
[Epoch 184 Batch 150/172] avg loss 0.000710942, throughput 2.10321K wps
Begin Testing...
[Epoch 184] train avg loss 0.000794735, dev acc 0.8805, dev avg loss 0.489875, throughput 2.11598K wps
[Epoch 185 Batch 30/172] avg loss 0.000697661, throughput 2.16497K wps
[Epoch 185 Batch 60/172] avg loss 0.000681346, throughput 2.15454K wps
[Epoch 185 Batch 90/172] avg loss 0.000762883, throughput 2.09677K wps
[Epoch 185 Batch 120/172] avg loss 0.000751531, throughput 2.10356K wps
[Epoch 185 Batch 150/172] avg loss 0.000939374, throughput 2.1271K wps
Begin Testing...
[Epoch 185] train avg loss 0.000782615, dev acc 0.8836, dev avg loss 0.492845, throughput 2.13094K wps
[Epoch 186 Batch 30/172] avg loss 0.000679408, throughput 2.15024K wps
[Epoch 186 Batch 60/172] avg loss 0.000662241, throughput 2.11239K wps
[Epoch 186 Batch 90/172] avg loss 0.000937642, throughput 2.11615K wps
[Epoch 186 Batch 120/172] avg loss 0.000708049, throughput 2.12928K wps
[Epoch 186 Batch 150/172] avg loss 0.000941387, throughput 2.12902K wps
Begin Testing...
[Epoch 186] train avg loss 0.000773244, dev acc 0.8868, dev avg loss 0.496621, throughput 2.12682K wps
[Epoch 187 Batch 30/172] avg loss 0.000778257, throughput 2.17087K wps
[Epoch 187 Batch 60/172] avg loss 0.00074904, throughput 2.10016K wps
[Epoch 187 Batch 90/172] avg loss 0.000734626, throughput 2.11059K wps
[Epoch 187 Batch 120/172] avg loss 0.000775664, throughput 2.09983K wps
[Epoch 187 Batch 150/172] avg loss 0.000954609, throughput 2.11027K wps
Begin Testing...
[Epoch 187] train avg loss 0.000799522, dev acc 0.8836, dev avg loss 0.493763, throughput 2.12087K wps
[Epoch 188 Batch 30/172] avg loss 0.000880262, throughput 2.1364K wps
[Epoch 188 Batch 60/172] avg loss 0.000721896, throughput 2.08128K wps
[Epoch 188 Batch 90/172] avg loss 0.000694721, throughput 2.08306K wps
[Epoch 188 Batch 120/172] avg loss 0.000740218, throughput 2.09439K wps
[Epoch 188 Batch 150/172] avg loss 0.000609349, throughput 2.10865K wps
Begin Testing...
[Epoch 188] train avg loss 0.000770426, dev acc 0.8836, dev avg loss 0.494902, throughput 2.105K wps
[Epoch 189 Batch 30/172] avg loss 0.000692016, throughput 2.17691K wps
[Epoch 189 Batch 60/172] avg loss 0.000773982, throughput 2.1003K wps
[Epoch 189 Batch 90/172] avg loss 0.000668071, throughput 2.09222K wps
[Epoch 189 Batch 120/172] avg loss 0.000756447, throughput 2.07806K wps
[Epoch 189 Batch 150/172] avg loss 0.00085508, throughput 2.14534K wps
Begin Testing...
[Epoch 189] train avg loss 0.000750557, dev acc 0.8836, dev avg loss 0.500822, throughput 2.11806K wps
[Epoch 190 Batch 30/172] avg loss 0.000652933, throughput 2.14036K wps
[Epoch 190 Batch 60/172] avg loss 0.000801575, throughput 2.12452K wps
[Epoch 190 Batch 90/172] avg loss 0.000828333, throughput 2.07953K wps
[Epoch 190 Batch 120/172] avg loss 0.000748909, throughput 2.06818K wps
[Epoch 190 Batch 150/172] avg loss 0.000574935, throughput 2.10618K wps
Begin Testing...
[Epoch 190] train avg loss 0.000741373, dev acc 0.8826, dev avg loss 0.497037, throughput 2.10723K wps
[Epoch 191 Batch 30/172] avg loss 0.000655183, throughput 2.13935K wps
[Epoch 191 Batch 60/172] avg loss 0.000670424, throughput 2.11456K wps
[Epoch 191 Batch 90/172] avg loss 0.000890809, throughput 2.12056K wps
[Epoch 191 Batch 120/172] avg loss 0.000706084, throughput 2.08286K wps
[Epoch 191 Batch 150/172] avg loss 0.000804842, throughput 2.13248K wps
Begin Testing...
[Epoch 191] train avg loss 0.000767298, dev acc 0.8836, dev avg loss 0.501669, throughput 2.12028K wps
[Epoch 192 Batch 30/172] avg loss 0.000680686, throughput 2.1282K wps
[Epoch 192 Batch 60/172] avg loss 0.000564688, throughput 2.07466K wps
[Epoch 192 Batch 90/172] avg loss 0.00086582, throughput 2.1383K wps
[Epoch 192 Batch 120/172] avg loss 0.000933745, throughput 2.14178K wps
[Epoch 192 Batch 150/172] avg loss 0.000869877, throughput 2.11945K wps
Begin Testing...
[Epoch 192] train avg loss 0.000775203, dev acc 0.8826, dev avg loss 0.497847, throughput 2.11676K wps
[Epoch 193 Batch 30/172] avg loss 0.000692745, throughput 2.18693K wps
[Epoch 193 Batch 60/172] avg loss 0.000865095, throughput 2.12344K wps
[Epoch 193 Batch 90/172] avg loss 0.000884842, throughput 2.13236K wps
[Epoch 193 Batch 120/172] avg loss 0.000735462, throughput 2.11778K wps
[Epoch 193 Batch 150/172] avg loss 0.000741433, throughput 2.149K wps
Begin Testing...
[Epoch 193] train avg loss 0.000771501, dev acc 0.8826, dev avg loss 0.502472, throughput 2.14316K wps
[Epoch 194 Batch 30/172] avg loss 0.000801648, throughput 2.18211K wps
[Epoch 194 Batch 60/172] avg loss 0.000752591, throughput 2.14021K wps
[Epoch 194 Batch 90/172] avg loss 0.000578913, throughput 2.1413K wps
[Epoch 194 Batch 120/172] avg loss 0.000839738, throughput 2.11823K wps
[Epoch 194 Batch 150/172] avg loss 0.000767091, throughput 2.10759K wps
Begin Testing...
[Epoch 194] train avg loss 0.000759826, dev acc 0.8826, dev avg loss 0.499353, throughput 2.13453K wps
[Epoch 195 Batch 30/172] avg loss 0.00106687, throughput 2.17317K wps
[Epoch 195 Batch 60/172] avg loss 0.000674206, throughput 2.14964K wps
[Epoch 195 Batch 90/172] avg loss 0.00062566, throughput 2.1093K wps
[Epoch 195 Batch 120/172] avg loss 0.000815748, throughput 2.08543K wps
[Epoch 195 Batch 150/172] avg loss 0.000774753, throughput 2.14489K wps
Begin Testing...
[Epoch 195] train avg loss 0.000771598, dev acc 0.8836, dev avg loss 0.504247, throughput 2.12534K wps
[Epoch 196 Batch 30/172] avg loss 0.000735618, throughput 2.16235K wps
[Epoch 196 Batch 60/172] avg loss 0.000611289, throughput 2.15415K wps
[Epoch 196 Batch 90/172] avg loss 0.000694824, throughput 2.14297K wps
[Epoch 196 Batch 120/172] avg loss 0.000828852, throughput 2.14152K wps
[Epoch 196 Batch 150/172] avg loss 0.000750492, throughput 2.101K wps
Begin Testing...
[Epoch 196] train avg loss 0.000744761, dev acc 0.8826, dev avg loss 0.501426, throughput 2.13735K wps
[Epoch 197 Batch 30/172] avg loss 0.000745481, throughput 2.17483K wps
[Epoch 197 Batch 60/172] avg loss 0.000722901, throughput 2.12156K wps
[Epoch 197 Batch 90/172] avg loss 0.000987577, throughput 2.06596K wps
[Epoch 197 Batch 120/172] avg loss 0.00056315, throughput 2.1217K wps
[Epoch 197 Batch 150/172] avg loss 0.000820619, throughput 2.09028K wps
Begin Testing...
[Epoch 197] train avg loss 0.000755066, dev acc 0.8857, dev avg loss 0.501708, throughput 2.11529K wps
[Epoch 198 Batch 30/172] avg loss 0.00059088, throughput 2.16504K wps
[Epoch 198 Batch 60/172] avg loss 0.000750928, throughput 2.10283K wps
[Epoch 198 Batch 90/172] avg loss 0.000622471, throughput 2.12383K wps
[Epoch 198 Batch 120/172] avg loss 0.000809561, throughput 2.08665K wps
[Epoch 198 Batch 150/172] avg loss 0.000801007, throughput 2.10523K wps
Begin Testing...
[Epoch 198] train avg loss 0.000740869, dev acc 0.8857, dev avg loss 0.505035, throughput 2.11297K wps
[Epoch 199 Batch 30/172] avg loss 0.000763995, throughput 2.17185K wps
[Epoch 199 Batch 60/172] avg loss 0.000834356, throughput 2.13643K wps
[Epoch 199 Batch 90/172] avg loss 0.000735338, throughput 2.13345K wps
[Epoch 199 Batch 120/172] avg loss 0.000672704, throughput 2.10339K wps
[Epoch 199 Batch 150/172] avg loss 0.000668457, throughput 2.08846K wps
Begin Testing...
[Epoch 199] train avg loss 0.000748669, dev acc 0.8805, dev avg loss 0.504021, throughput 2.12121K wps
Test loss 0.317749, test acc 0.8868
Total time cost 609.58s
[Epoch 0 Batch 30/172] avg loss 0.012641, throughput 1.87613K wps
[Epoch 0 Batch 60/172] avg loss 0.0122796, throughput 2.08154K wps
[Epoch 0 Batch 90/172] avg loss 0.0123718, throughput 2.10193K wps
[Epoch 0 Batch 120/172] avg loss 0.0122029, throughput 2.11585K wps
[Epoch 0 Batch 150/172] avg loss 0.0121345, throughput 2.11523K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123071, dev acc 0.6771, dev avg loss 0.609449, throughput 2.06028K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0118126, throughput 2.1668K wps
[Epoch 1 Batch 60/172] avg loss 0.0118526, throughput 2.11339K wps
[Epoch 1 Batch 90/172] avg loss 0.0120555, throughput 2.12096K wps
[Epoch 1 Batch 120/172] avg loss 0.0118151, throughput 2.12764K wps
[Epoch 1 Batch 150/172] avg loss 0.0116045, throughput 2.09675K wps
Begin Testing...
[Epoch 1] train avg loss 0.0117898, dev acc 0.6771, dev avg loss 0.585707, throughput 2.1232K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0114733, throughput 2.15874K wps
[Epoch 2 Batch 60/172] avg loss 0.0113648, throughput 2.1095K wps
[Epoch 2 Batch 90/172] avg loss 0.0113738, throughput 2.1185K wps
[Epoch 2 Batch 120/172] avg loss 0.0112251, throughput 2.0975K wps
[Epoch 2 Batch 150/172] avg loss 0.0110844, throughput 2.09941K wps
Begin Testing...
[Epoch 2] train avg loss 0.0112406, dev acc 0.6803, dev avg loss 0.556171, throughput 2.11274K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0109663, throughput 2.19111K wps
[Epoch 3 Batch 60/172] avg loss 0.0105826, throughput 2.09376K wps
[Epoch 3 Batch 90/172] avg loss 0.0106011, throughput 2.13707K wps
[Epoch 3 Batch 120/172] avg loss 0.0106037, throughput 2.10052K wps
[Epoch 3 Batch 150/172] avg loss 0.0101778, throughput 2.07806K wps
Begin Testing...
[Epoch 3] train avg loss 0.0105541, dev acc 0.7453, dev avg loss 0.51729, throughput 2.11805K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.00998821, throughput 2.15157K wps
[Epoch 4 Batch 60/172] avg loss 0.010305, throughput 2.11775K wps
[Epoch 4 Batch 90/172] avg loss 0.00994458, throughput 2.08154K wps
[Epoch 4 Batch 120/172] avg loss 0.00941889, throughput 2.13532K wps
[Epoch 4 Batch 150/172] avg loss 0.00948884, throughput 2.10177K wps
Begin Testing...
[Epoch 4] train avg loss 0.00976781, dev acc 0.7809, dev avg loss 0.475427, throughput 2.1141K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.00920268, throughput 2.19058K wps
[Epoch 5 Batch 60/172] avg loss 0.00894881, throughput 2.13928K wps
[Epoch 5 Batch 90/172] avg loss 0.00925103, throughput 2.11863K wps
[Epoch 5 Batch 120/172] avg loss 0.00887529, throughput 2.12659K wps
[Epoch 5 Batch 150/172] avg loss 0.00888937, throughput 2.12324K wps
Begin Testing...
[Epoch 5] train avg loss 0.00901744, dev acc 0.8208, dev avg loss 0.434699, throughput 2.13949K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.00831387, throughput 2.17086K wps
[Epoch 6 Batch 60/172] avg loss 0.00839981, throughput 2.12979K wps
[Epoch 6 Batch 90/172] avg loss 0.0081565, throughput 2.11597K wps
[Epoch 6 Batch 120/172] avg loss 0.00843646, throughput 2.11823K wps
[Epoch 6 Batch 150/172] avg loss 0.00808748, throughput 2.12585K wps
Begin Testing...
[Epoch 6] train avg loss 0.00827387, dev acc 0.8543, dev avg loss 0.399422, throughput 2.13289K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00787796, throughput 2.15412K wps
[Epoch 7 Batch 60/172] avg loss 0.00771366, throughput 2.10157K wps
[Epoch 7 Batch 90/172] avg loss 0.00755418, throughput 2.13518K wps
[Epoch 7 Batch 120/172] avg loss 0.0077746, throughput 2.12227K wps
[Epoch 7 Batch 150/172] avg loss 0.0074419, throughput 2.11513K wps
Begin Testing...
[Epoch 7] train avg loss 0.00765766, dev acc 0.8470, dev avg loss 0.371355, throughput 2.1251K wps
[Epoch 8 Batch 30/172] avg loss 0.00697968, throughput 2.1487K wps
[Epoch 8 Batch 60/172] avg loss 0.00706461, throughput 2.1046K wps
[Epoch 8 Batch 90/172] avg loss 0.0074191, throughput 2.08564K wps
[Epoch 8 Batch 120/172] avg loss 0.00712555, throughput 2.11487K wps
[Epoch 8 Batch 150/172] avg loss 0.00723839, throughput 2.13215K wps
Begin Testing...
[Epoch 8] train avg loss 0.00715545, dev acc 0.8679, dev avg loss 0.348926, throughput 2.1174K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00708153, throughput 2.14986K wps
[Epoch 9 Batch 60/172] avg loss 0.00693757, throughput 2.09826K wps
[Epoch 9 Batch 90/172] avg loss 0.00643273, throughput 2.09511K wps
[Epoch 9 Batch 120/172] avg loss 0.00677278, throughput 2.1353K wps
[Epoch 9 Batch 150/172] avg loss 0.00678234, throughput 2.11989K wps
Begin Testing...
[Epoch 9] train avg loss 0.00674595, dev acc 0.8669, dev avg loss 0.334666, throughput 2.11985K wps
[Epoch 10 Batch 30/172] avg loss 0.00605225, throughput 2.18659K wps
[Epoch 10 Batch 60/172] avg loss 0.00701292, throughput 2.10248K wps
[Epoch 10 Batch 90/172] avg loss 0.00642057, throughput 2.13007K wps
[Epoch 10 Batch 120/172] avg loss 0.00671323, throughput 2.1364K wps
[Epoch 10 Batch 150/172] avg loss 0.0062105, throughput 2.13538K wps
Begin Testing...
[Epoch 10] train avg loss 0.00646019, dev acc 0.8784, dev avg loss 0.323104, throughput 2.13758K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00597619, throughput 2.13333K wps
[Epoch 11 Batch 60/172] avg loss 0.00620051, throughput 2.10302K wps
[Epoch 11 Batch 90/172] avg loss 0.00636024, throughput 2.10465K wps
[Epoch 11 Batch 120/172] avg loss 0.00610372, throughput 2.08717K wps
[Epoch 11 Batch 150/172] avg loss 0.00632616, throughput 2.10691K wps
Begin Testing...
[Epoch 11] train avg loss 0.00622437, dev acc 0.8857, dev avg loss 0.310498, throughput 2.10467K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00574915, throughput 2.16491K wps
[Epoch 12 Batch 60/172] avg loss 0.00595715, throughput 2.10507K wps
[Epoch 12 Batch 90/172] avg loss 0.00635667, throughput 2.09604K wps
[Epoch 12 Batch 120/172] avg loss 0.00611426, throughput 2.09662K wps
[Epoch 12 Batch 150/172] avg loss 0.00594602, throughput 2.13678K wps
Begin Testing...
[Epoch 12] train avg loss 0.00601125, dev acc 0.8910, dev avg loss 0.302743, throughput 2.11872K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00571734, throughput 2.15459K wps
[Epoch 13 Batch 60/172] avg loss 0.00617177, throughput 2.11668K wps
[Epoch 13 Batch 90/172] avg loss 0.00596652, throughput 2.12077K wps
[Epoch 13 Batch 120/172] avg loss 0.00605221, throughput 2.10668K wps
[Epoch 13 Batch 150/172] avg loss 0.00563019, throughput 2.10957K wps
Begin Testing...
[Epoch 13] train avg loss 0.00589063, dev acc 0.8878, dev avg loss 0.297133, throughput 2.12004K wps
[Epoch 14 Batch 30/172] avg loss 0.00595014, throughput 2.17505K wps
[Epoch 14 Batch 60/172] avg loss 0.00564224, throughput 2.10062K wps
[Epoch 14 Batch 90/172] avg loss 0.00569081, throughput 2.09569K wps
[Epoch 14 Batch 120/172] avg loss 0.00599054, throughput 2.08944K wps
[Epoch 14 Batch 150/172] avg loss 0.00510092, throughput 2.08814K wps
Begin Testing...
[Epoch 14] train avg loss 0.00571301, dev acc 0.8910, dev avg loss 0.293514, throughput 2.10872K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00563618, throughput 2.15396K wps
[Epoch 15 Batch 60/172] avg loss 0.00549048, throughput 2.14934K wps
[Epoch 15 Batch 90/172] avg loss 0.00511532, throughput 2.1245K wps
[Epoch 15 Batch 120/172] avg loss 0.00571221, throughput 2.11126K wps
[Epoch 15 Batch 150/172] avg loss 0.00571882, throughput 2.11222K wps
Begin Testing...
[Epoch 15] train avg loss 0.00550027, dev acc 0.8920, dev avg loss 0.28756, throughput 2.12654K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00503929, throughput 2.15171K wps
[Epoch 16 Batch 60/172] avg loss 0.00582735, throughput 2.12732K wps
[Epoch 16 Batch 90/172] avg loss 0.00512177, throughput 2.12678K wps
[Epoch 16 Batch 120/172] avg loss 0.00548815, throughput 2.11573K wps
[Epoch 16 Batch 150/172] avg loss 0.00565926, throughput 2.12207K wps
Begin Testing...
[Epoch 16] train avg loss 0.00539951, dev acc 0.8941, dev avg loss 0.284691, throughput 2.12552K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.00538955, throughput 2.15526K wps
[Epoch 17 Batch 60/172] avg loss 0.00470251, throughput 2.10388K wps
[Epoch 17 Batch 90/172] avg loss 0.00516489, throughput 2.11246K wps
[Epoch 17 Batch 120/172] avg loss 0.00545709, throughput 2.10917K wps
[Epoch 17 Batch 150/172] avg loss 0.00563957, throughput 2.1215K wps
Begin Testing...
[Epoch 17] train avg loss 0.00527713, dev acc 0.8941, dev avg loss 0.280919, throughput 2.12245K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00544028, throughput 2.17096K wps
[Epoch 18 Batch 60/172] avg loss 0.00498341, throughput 2.11258K wps
[Epoch 18 Batch 90/172] avg loss 0.00509889, throughput 2.11923K wps
[Epoch 18 Batch 120/172] avg loss 0.0047508, throughput 2.13621K wps
[Epoch 18 Batch 150/172] avg loss 0.00538334, throughput 2.11531K wps
Begin Testing...
[Epoch 18] train avg loss 0.00515852, dev acc 0.8983, dev avg loss 0.278127, throughput 2.12912K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00502593, throughput 2.1428K wps
[Epoch 19 Batch 60/172] avg loss 0.00461395, throughput 2.11603K wps
[Epoch 19 Batch 90/172] avg loss 0.00512661, throughput 2.1001K wps
[Epoch 19 Batch 120/172] avg loss 0.00517893, throughput 2.08413K wps
[Epoch 19 Batch 150/172] avg loss 0.0050097, throughput 2.09217K wps
Begin Testing...
[Epoch 19] train avg loss 0.00501679, dev acc 0.9015, dev avg loss 0.276259, throughput 2.1038K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00543267, throughput 2.17201K wps
[Epoch 20 Batch 60/172] avg loss 0.00467075, throughput 2.11523K wps
[Epoch 20 Batch 90/172] avg loss 0.00484205, throughput 2.13676K wps
[Epoch 20 Batch 120/172] avg loss 0.00482399, throughput 2.11848K wps
[Epoch 20 Batch 150/172] avg loss 0.00551724, throughput 2.12554K wps
Begin Testing...
[Epoch 20] train avg loss 0.00505207, dev acc 0.8994, dev avg loss 0.274754, throughput 2.12756K wps
[Epoch 21 Batch 30/172] avg loss 0.00438316, throughput 2.1412K wps
[Epoch 21 Batch 60/172] avg loss 0.0049782, throughput 2.08214K wps
[Epoch 21 Batch 90/172] avg loss 0.00553083, throughput 2.12914K wps
[Epoch 21 Batch 120/172] avg loss 0.0048247, throughput 2.13007K wps
[Epoch 21 Batch 150/172] avg loss 0.0041993, throughput 2.12554K wps
Begin Testing...
[Epoch 21] train avg loss 0.00484223, dev acc 0.9036, dev avg loss 0.273146, throughput 2.1202K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00494415, throughput 2.14234K wps
[Epoch 22 Batch 60/172] avg loss 0.0048068, throughput 2.1038K wps
[Epoch 22 Batch 90/172] avg loss 0.00503011, throughput 2.11468K wps
[Epoch 22 Batch 120/172] avg loss 0.00452347, throughput 2.13377K wps
[Epoch 22 Batch 150/172] avg loss 0.00455577, throughput 2.11382K wps
Begin Testing...
[Epoch 22] train avg loss 0.00473942, dev acc 0.9025, dev avg loss 0.271159, throughput 2.12085K wps
[Epoch 23 Batch 30/172] avg loss 0.00458462, throughput 2.17085K wps
[Epoch 23 Batch 60/172] avg loss 0.00477036, throughput 2.12525K wps
[Epoch 23 Batch 90/172] avg loss 0.00488519, throughput 2.10174K wps
[Epoch 23 Batch 120/172] avg loss 0.00483703, throughput 2.11429K wps
[Epoch 23 Batch 150/172] avg loss 0.00472726, throughput 2.13413K wps
Begin Testing...
[Epoch 23] train avg loss 0.00470139, dev acc 0.9036, dev avg loss 0.270665, throughput 2.13055K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00424974, throughput 2.17112K wps
[Epoch 24 Batch 60/172] avg loss 0.00461629, throughput 2.10838K wps
[Epoch 24 Batch 90/172] avg loss 0.00475986, throughput 2.12416K wps
[Epoch 24 Batch 120/172] avg loss 0.0048147, throughput 2.12465K wps
[Epoch 24 Batch 150/172] avg loss 0.00467112, throughput 2.09384K wps
Begin Testing...
[Epoch 24] train avg loss 0.00458235, dev acc 0.9088, dev avg loss 0.268658, throughput 2.12217K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.00478827, throughput 2.15285K wps
[Epoch 25 Batch 60/172] avg loss 0.00444845, throughput 2.12909K wps
[Epoch 25 Batch 90/172] avg loss 0.0043775, throughput 2.12429K wps
[Epoch 25 Batch 120/172] avg loss 0.00452896, throughput 2.08685K wps
[Epoch 25 Batch 150/172] avg loss 0.00461565, throughput 2.09101K wps
Begin Testing...
[Epoch 25] train avg loss 0.0045519, dev acc 0.9067, dev avg loss 0.268116, throughput 2.11375K wps
[Epoch 26 Batch 30/172] avg loss 0.00488352, throughput 2.14077K wps
[Epoch 26 Batch 60/172] avg loss 0.00435337, throughput 2.11829K wps
[Epoch 26 Batch 90/172] avg loss 0.00425373, throughput 2.12499K wps
[Epoch 26 Batch 120/172] avg loss 0.00470406, throughput 2.08468K wps
[Epoch 26 Batch 150/172] avg loss 0.00447878, throughput 2.11214K wps
Begin Testing...
[Epoch 26] train avg loss 0.00441718, dev acc 0.9067, dev avg loss 0.266725, throughput 2.11612K wps
[Epoch 27 Batch 30/172] avg loss 0.00464615, throughput 2.1583K wps
[Epoch 27 Batch 60/172] avg loss 0.0039855, throughput 2.09318K wps
[Epoch 27 Batch 90/172] avg loss 0.00371613, throughput 2.09909K wps
[Epoch 27 Batch 120/172] avg loss 0.00423904, throughput 2.10216K wps
[Epoch 27 Batch 150/172] avg loss 0.00461246, throughput 2.10825K wps
Begin Testing...
[Epoch 27] train avg loss 0.00427526, dev acc 0.9130, dev avg loss 0.265345, throughput 2.10965K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.00454425, throughput 2.17144K wps
[Epoch 28 Batch 60/172] avg loss 0.00420724, throughput 2.11506K wps
[Epoch 28 Batch 90/172] avg loss 0.00409546, throughput 2.09598K wps
[Epoch 28 Batch 120/172] avg loss 0.00428103, throughput 2.10751K wps
[Epoch 28 Batch 150/172] avg loss 0.00412856, throughput 2.12626K wps
Begin Testing...
[Epoch 28] train avg loss 0.00423756, dev acc 0.9099, dev avg loss 0.264336, throughput 2.11979K wps
[Epoch 29 Batch 30/172] avg loss 0.00409547, throughput 2.14483K wps
[Epoch 29 Batch 60/172] avg loss 0.00386299, throughput 2.11974K wps
[Epoch 29 Batch 90/172] avg loss 0.00439579, throughput 2.12689K wps
[Epoch 29 Batch 120/172] avg loss 0.00404789, throughput 2.09596K wps
[Epoch 29 Batch 150/172] avg loss 0.00440529, throughput 2.09899K wps
Begin Testing...
[Epoch 29] train avg loss 0.00412676, dev acc 0.9119, dev avg loss 0.263943, throughput 2.11409K wps
[Epoch 30 Batch 30/172] avg loss 0.00379159, throughput 2.16198K wps
[Epoch 30 Batch 60/172] avg loss 0.00411843, throughput 2.1144K wps
[Epoch 30 Batch 90/172] avg loss 0.00369342, throughput 2.12413K wps
[Epoch 30 Batch 120/172] avg loss 0.00443106, throughput 2.13738K wps
[Epoch 30 Batch 150/172] avg loss 0.00417676, throughput 2.1422K wps
Begin Testing...
[Epoch 30] train avg loss 0.00406701, dev acc 0.9046, dev avg loss 0.26635, throughput 2.13546K wps
[Epoch 31 Batch 30/172] avg loss 0.0038248, throughput 2.16175K wps
[Epoch 31 Batch 60/172] avg loss 0.00406282, throughput 2.10728K wps
[Epoch 31 Batch 90/172] avg loss 0.00354817, throughput 2.13338K wps
[Epoch 31 Batch 120/172] avg loss 0.00398561, throughput 2.10785K wps
[Epoch 31 Batch 150/172] avg loss 0.00411448, throughput 2.10771K wps
Begin Testing...
[Epoch 31] train avg loss 0.00401227, dev acc 0.9099, dev avg loss 0.263228, throughput 2.12433K wps
[Epoch 32 Batch 30/172] avg loss 0.00393469, throughput 2.17402K wps
[Epoch 32 Batch 60/172] avg loss 0.00410455, throughput 2.10425K wps
[Epoch 32 Batch 90/172] avg loss 0.0038277, throughput 2.0931K wps
[Epoch 32 Batch 120/172] avg loss 0.00379881, throughput 2.10397K wps
[Epoch 32 Batch 150/172] avg loss 0.0038508, throughput 2.10766K wps
Begin Testing...
[Epoch 32] train avg loss 0.00388439, dev acc 0.9067, dev avg loss 0.263644, throughput 2.11716K wps
[Epoch 33 Batch 30/172] avg loss 0.00374903, throughput 2.11697K wps
[Epoch 33 Batch 60/172] avg loss 0.00401149, throughput 2.10132K wps
[Epoch 33 Batch 90/172] avg loss 0.00428294, throughput 2.12547K wps
[Epoch 33 Batch 120/172] avg loss 0.00329659, throughput 2.1083K wps
[Epoch 33 Batch 150/172] avg loss 0.00386101, throughput 2.09088K wps
Begin Testing...
[Epoch 33] train avg loss 0.00385965, dev acc 0.9099, dev avg loss 0.262252, throughput 2.10761K wps
[Epoch 34 Batch 30/172] avg loss 0.00330153, throughput 2.14624K wps
[Epoch 34 Batch 60/172] avg loss 0.00381298, throughput 2.121K wps
[Epoch 34 Batch 90/172] avg loss 0.00453745, throughput 2.09999K wps
[Epoch 34 Batch 120/172] avg loss 0.00361379, throughput 2.08386K wps
[Epoch 34 Batch 150/172] avg loss 0.00367851, throughput 2.09864K wps
Begin Testing...
[Epoch 34] train avg loss 0.00375895, dev acc 0.9099, dev avg loss 0.262421, throughput 2.11116K wps
[Epoch 35 Batch 30/172] avg loss 0.00356556, throughput 2.17176K wps
[Epoch 35 Batch 60/172] avg loss 0.0037136, throughput 2.0919K wps
[Epoch 35 Batch 90/172] avg loss 0.00366841, throughput 2.11331K wps
[Epoch 35 Batch 120/172] avg loss 0.00377795, throughput 2.13016K wps
[Epoch 35 Batch 150/172] avg loss 0.00374875, throughput 2.12464K wps
Begin Testing...
[Epoch 35] train avg loss 0.00366047, dev acc 0.9099, dev avg loss 0.262396, throughput 2.11869K wps
[Epoch 36 Batch 30/172] avg loss 0.0038753, throughput 2.15953K wps
[Epoch 36 Batch 60/172] avg loss 0.00350221, throughput 2.10469K wps
[Epoch 36 Batch 90/172] avg loss 0.00352682, throughput 2.13112K wps
[Epoch 36 Batch 120/172] avg loss 0.00344763, throughput 2.09588K wps
[Epoch 36 Batch 150/172] avg loss 0.00344473, throughput 2.10123K wps
Begin Testing...
[Epoch 36] train avg loss 0.00360336, dev acc 0.9067, dev avg loss 0.261369, throughput 2.11499K wps
[Epoch 37 Batch 30/172] avg loss 0.00347522, throughput 2.14983K wps
[Epoch 37 Batch 60/172] avg loss 0.00343183, throughput 2.12314K wps
[Epoch 37 Batch 90/172] avg loss 0.00398006, throughput 2.13456K wps
[Epoch 37 Batch 120/172] avg loss 0.00356964, throughput 2.11834K wps
[Epoch 37 Batch 150/172] avg loss 0.0035613, throughput 2.09054K wps
Begin Testing...
[Epoch 37] train avg loss 0.00356013, dev acc 0.9078, dev avg loss 0.261258, throughput 2.11864K wps
[Epoch 38 Batch 30/172] avg loss 0.00349933, throughput 2.16631K wps
[Epoch 38 Batch 60/172] avg loss 0.00375157, throughput 2.11943K wps
[Epoch 38 Batch 90/172] avg loss 0.00355159, throughput 2.11681K wps
[Epoch 38 Batch 120/172] avg loss 0.0035229, throughput 2.12824K wps
[Epoch 38 Batch 150/172] avg loss 0.00346315, throughput 2.1137K wps
Begin Testing...
[Epoch 38] train avg loss 0.0035192, dev acc 0.9078, dev avg loss 0.261416, throughput 2.1303K wps
[Epoch 39 Batch 30/172] avg loss 0.00362279, throughput 2.14531K wps
[Epoch 39 Batch 60/172] avg loss 0.00305195, throughput 2.13664K wps
[Epoch 39 Batch 90/172] avg loss 0.00356698, throughput 2.1277K wps
[Epoch 39 Batch 120/172] avg loss 0.00332708, throughput 2.10538K wps
[Epoch 39 Batch 150/172] avg loss 0.0035947, throughput 2.13766K wps
Begin Testing...
[Epoch 39] train avg loss 0.00342673, dev acc 0.9067, dev avg loss 0.261302, throughput 2.12987K wps
[Epoch 40 Batch 30/172] avg loss 0.0032455, throughput 2.1907K wps
[Epoch 40 Batch 60/172] avg loss 0.0030978, throughput 2.12251K wps
[Epoch 40 Batch 90/172] avg loss 0.00353529, throughput 2.13552K wps
[Epoch 40 Batch 120/172] avg loss 0.00341207, throughput 2.13247K wps
[Epoch 40 Batch 150/172] avg loss 0.00368747, throughput 2.09359K wps
Begin Testing...
[Epoch 40] train avg loss 0.00336838, dev acc 0.9057, dev avg loss 0.261498, throughput 2.13362K wps
[Epoch 41 Batch 30/172] avg loss 0.00319086, throughput 2.14787K wps
[Epoch 41 Batch 60/172] avg loss 0.00308799, throughput 2.1287K wps
[Epoch 41 Batch 90/172] avg loss 0.00352612, throughput 2.12164K wps
[Epoch 41 Batch 120/172] avg loss 0.0038263, throughput 2.13792K wps
[Epoch 41 Batch 150/172] avg loss 0.00300749, throughput 2.14625K wps
Begin Testing...
[Epoch 41] train avg loss 0.00330692, dev acc 0.9057, dev avg loss 0.261651, throughput 2.13265K wps
[Epoch 42 Batch 30/172] avg loss 0.0030138, throughput 2.15461K wps
[Epoch 42 Batch 60/172] avg loss 0.00316121, throughput 2.09725K wps
[Epoch 42 Batch 90/172] avg loss 0.00343197, throughput 2.12803K wps
[Epoch 42 Batch 120/172] avg loss 0.00344768, throughput 2.09794K wps
[Epoch 42 Batch 150/172] avg loss 0.00295424, throughput 2.10655K wps
Begin Testing...
[Epoch 42] train avg loss 0.00322571, dev acc 0.9057, dev avg loss 0.262412, throughput 2.11491K wps
[Epoch 43 Batch 30/172] avg loss 0.00304809, throughput 2.13016K wps
[Epoch 43 Batch 60/172] avg loss 0.00311855, throughput 2.11161K wps
[Epoch 43 Batch 90/172] avg loss 0.00343176, throughput 2.10337K wps
[Epoch 43 Batch 120/172] avg loss 0.00317898, throughput 2.13257K wps
[Epoch 43 Batch 150/172] avg loss 0.00314376, throughput 2.11041K wps
Begin Testing...
[Epoch 43] train avg loss 0.00319652, dev acc 0.9046, dev avg loss 0.262454, throughput 2.11712K wps
[Epoch 44 Batch 30/172] avg loss 0.00304641, throughput 2.16108K wps
[Epoch 44 Batch 60/172] avg loss 0.00314858, throughput 2.10299K wps
[Epoch 44 Batch 90/172] avg loss 0.00315609, throughput 2.10004K wps
[Epoch 44 Batch 120/172] avg loss 0.00301793, throughput 2.14415K wps
[Epoch 44 Batch 150/172] avg loss 0.0028464, throughput 2.13199K wps
Begin Testing...
[Epoch 44] train avg loss 0.00309667, dev acc 0.9057, dev avg loss 0.26309, throughput 2.12919K wps
[Epoch 45 Batch 30/172] avg loss 0.00309994, throughput 2.12083K wps
[Epoch 45 Batch 60/172] avg loss 0.00300029, throughput 2.14389K wps
[Epoch 45 Batch 90/172] avg loss 0.00293029, throughput 2.09531K wps
[Epoch 45 Batch 120/172] avg loss 0.00304969, throughput 2.0887K wps
[Epoch 45 Batch 150/172] avg loss 0.00318531, throughput 2.09544K wps
Begin Testing...
[Epoch 45] train avg loss 0.00304812, dev acc 0.9057, dev avg loss 0.263865, throughput 2.10728K wps
[Epoch 46 Batch 30/172] avg loss 0.0026742, throughput 2.18061K wps
[Epoch 46 Batch 60/172] avg loss 0.00297278, throughput 2.10993K wps
[Epoch 46 Batch 90/172] avg loss 0.00300881, throughput 2.14106K wps
[Epoch 46 Batch 120/172] avg loss 0.00306669, throughput 2.13726K wps
[Epoch 46 Batch 150/172] avg loss 0.00291582, throughput 2.10933K wps
Begin Testing...
[Epoch 46] train avg loss 0.00293348, dev acc 0.9057, dev avg loss 0.26342, throughput 2.13062K wps
[Epoch 47 Batch 30/172] avg loss 0.00285794, throughput 2.13018K wps
[Epoch 47 Batch 60/172] avg loss 0.00281212, throughput 2.14407K wps
[Epoch 47 Batch 90/172] avg loss 0.00311127, throughput 2.11965K wps
[Epoch 47 Batch 120/172] avg loss 0.00305859, throughput 2.14422K wps
[Epoch 47 Batch 150/172] avg loss 0.0029597, throughput 2.12537K wps
Begin Testing...
[Epoch 47] train avg loss 0.00294283, dev acc 0.9067, dev avg loss 0.264894, throughput 2.12994K wps
[Epoch 48 Batch 30/172] avg loss 0.00291655, throughput 2.18458K wps
[Epoch 48 Batch 60/172] avg loss 0.0028993, throughput 2.10023K wps
[Epoch 48 Batch 90/172] avg loss 0.00272134, throughput 2.09169K wps
[Epoch 48 Batch 120/172] avg loss 0.00314012, throughput 2.11864K wps
[Epoch 48 Batch 150/172] avg loss 0.00255489, throughput 2.09953K wps
Begin Testing...
[Epoch 48] train avg loss 0.00288125, dev acc 0.9067, dev avg loss 0.265366, throughput 2.11747K wps
[Epoch 49 Batch 30/172] avg loss 0.00278127, throughput 2.17657K wps
[Epoch 49 Batch 60/172] avg loss 0.0023819, throughput 2.10435K wps
[Epoch 49 Batch 90/172] avg loss 0.00287194, throughput 2.09995K wps
[Epoch 49 Batch 120/172] avg loss 0.00250685, throughput 2.11289K wps
[Epoch 49 Batch 150/172] avg loss 0.00311262, throughput 2.11606K wps
Begin Testing...
[Epoch 49] train avg loss 0.00274919, dev acc 0.9057, dev avg loss 0.267017, throughput 2.11885K wps
[Epoch 50 Batch 30/172] avg loss 0.00288111, throughput 2.15754K wps
[Epoch 50 Batch 60/172] avg loss 0.00269501, throughput 2.13321K wps
[Epoch 50 Batch 90/172] avg loss 0.00247541, throughput 2.10082K wps
[Epoch 50 Batch 120/172] avg loss 0.00270259, throughput 2.09078K wps
[Epoch 50 Batch 150/172] avg loss 0.00281258, throughput 2.11693K wps
Begin Testing...
[Epoch 50] train avg loss 0.00270384, dev acc 0.9067, dev avg loss 0.267114, throughput 2.11769K wps
[Epoch 51 Batch 30/172] avg loss 0.00260987, throughput 2.16333K wps
[Epoch 51 Batch 60/172] avg loss 0.00238389, throughput 2.10645K wps
[Epoch 51 Batch 90/172] avg loss 0.00261083, throughput 2.14181K wps
[Epoch 51 Batch 120/172] avg loss 0.00306128, throughput 2.13834K wps
[Epoch 51 Batch 150/172] avg loss 0.00273006, throughput 2.12128K wps
Begin Testing...
[Epoch 51] train avg loss 0.00266625, dev acc 0.9067, dev avg loss 0.267775, throughput 2.13192K wps
[Epoch 52 Batch 30/172] avg loss 0.00258161, throughput 2.17096K wps
[Epoch 52 Batch 60/172] avg loss 0.00235601, throughput 2.12084K wps
[Epoch 52 Batch 90/172] avg loss 0.00284121, throughput 2.09215K wps
[Epoch 52 Batch 120/172] avg loss 0.00277607, throughput 2.09878K wps
[Epoch 52 Batch 150/172] avg loss 0.00226757, throughput 2.11697K wps
Begin Testing...
[Epoch 52] train avg loss 0.00262042, dev acc 0.9067, dev avg loss 0.268138, throughput 2.11942K wps
[Epoch 53 Batch 30/172] avg loss 0.00262587, throughput 2.15704K wps
[Epoch 53 Batch 60/172] avg loss 0.00236494, throughput 2.09604K wps
[Epoch 53 Batch 90/172] avg loss 0.00234919, throughput 2.08676K wps
[Epoch 53 Batch 120/172] avg loss 0.00261761, throughput 2.1176K wps
[Epoch 53 Batch 150/172] avg loss 0.00263781, throughput 2.1283K wps
Begin Testing...
[Epoch 53] train avg loss 0.00256974, dev acc 0.9057, dev avg loss 0.269127, throughput 2.11388K wps
[Epoch 54 Batch 30/172] avg loss 0.00251529, throughput 2.16812K wps
[Epoch 54 Batch 60/172] avg loss 0.00217001, throughput 2.08475K wps
[Epoch 54 Batch 90/172] avg loss 0.00275804, throughput 2.10315K wps
[Epoch 54 Batch 120/172] avg loss 0.00268039, throughput 2.10565K wps
[Epoch 54 Batch 150/172] avg loss 0.00238774, throughput 2.11672K wps
Begin Testing...
[Epoch 54] train avg loss 0.0024911, dev acc 0.9078, dev avg loss 0.27106, throughput 2.11613K wps
[Epoch 55 Batch 30/172] avg loss 0.00251363, throughput 2.15093K wps
[Epoch 55 Batch 60/172] avg loss 0.00239491, throughput 2.0921K wps
[Epoch 55 Batch 90/172] avg loss 0.00235948, throughput 2.10411K wps
[Epoch 55 Batch 120/172] avg loss 0.00247577, throughput 2.1031K wps
[Epoch 55 Batch 150/172] avg loss 0.00283094, throughput 2.09252K wps
Begin Testing...
[Epoch 55] train avg loss 0.00245978, dev acc 0.9036, dev avg loss 0.271322, throughput 2.10559K wps
[Epoch 56 Batch 30/172] avg loss 0.00234675, throughput 2.1307K wps
[Epoch 56 Batch 60/172] avg loss 0.00241246, throughput 2.07929K wps
[Epoch 56 Batch 90/172] avg loss 0.00235382, throughput 2.09598K wps
[Epoch 56 Batch 120/172] avg loss 0.00257707, throughput 2.1117K wps
[Epoch 56 Batch 150/172] avg loss 0.00234622, throughput 2.10054K wps
Begin Testing...
[Epoch 56] train avg loss 0.00242274, dev acc 0.9046, dev avg loss 0.273579, throughput 2.10052K wps
[Epoch 57 Batch 30/172] avg loss 0.00252548, throughput 2.15889K wps
[Epoch 57 Batch 60/172] avg loss 0.00196964, throughput 2.12103K wps
[Epoch 57 Batch 90/172] avg loss 0.00295769, throughput 2.10729K wps
[Epoch 57 Batch 120/172] avg loss 0.00212713, throughput 2.1245K wps
[Epoch 57 Batch 150/172] avg loss 0.00228817, throughput 2.1099K wps
Begin Testing...
[Epoch 57] train avg loss 0.00232751, dev acc 0.9046, dev avg loss 0.27446, throughput 2.12075K wps
[Epoch 58 Batch 30/172] avg loss 0.00236479, throughput 2.18546K wps
[Epoch 58 Batch 60/172] avg loss 0.00232314, throughput 2.09462K wps
[Epoch 58 Batch 90/172] avg loss 0.00235781, throughput 2.12396K wps
[Epoch 58 Batch 120/172] avg loss 0.00246067, throughput 2.10514K wps
[Epoch 58 Batch 150/172] avg loss 0.00232065, throughput 2.10555K wps
Begin Testing...
[Epoch 58] train avg loss 0.00232852, dev acc 0.9046, dev avg loss 0.275135, throughput 2.11596K wps
[Epoch 59 Batch 30/172] avg loss 0.00241251, throughput 2.17663K wps
[Epoch 59 Batch 60/172] avg loss 0.00215564, throughput 2.10703K wps
[Epoch 59 Batch 90/172] avg loss 0.00218307, throughput 2.08864K wps
[Epoch 59 Batch 120/172] avg loss 0.00206206, throughput 2.10637K wps
[Epoch 59 Batch 150/172] avg loss 0.00212298, throughput 2.11808K wps
Begin Testing...
[Epoch 59] train avg loss 0.00221621, dev acc 0.9046, dev avg loss 0.276193, throughput 2.11929K wps
[Epoch 60 Batch 30/172] avg loss 0.00222549, throughput 2.1424K wps
[Epoch 60 Batch 60/172] avg loss 0.00198943, throughput 2.0977K wps
[Epoch 60 Batch 90/172] avg loss 0.00213832, throughput 2.12186K wps
[Epoch 60 Batch 120/172] avg loss 0.00243832, throughput 2.12447K wps
[Epoch 60 Batch 150/172] avg loss 0.00235746, throughput 2.1129K wps
Begin Testing...
[Epoch 60] train avg loss 0.00225005, dev acc 0.9036, dev avg loss 0.277166, throughput 2.1182K wps
[Epoch 61 Batch 30/172] avg loss 0.00182534, throughput 2.15478K wps
[Epoch 61 Batch 60/172] avg loss 0.0024344, throughput 2.13693K wps
[Epoch 61 Batch 90/172] avg loss 0.00209811, throughput 2.13484K wps
[Epoch 61 Batch 120/172] avg loss 0.00226691, throughput 2.11241K wps
[Epoch 61 Batch 150/172] avg loss 0.00235062, throughput 2.08764K wps
Begin Testing...
[Epoch 61] train avg loss 0.00219222, dev acc 0.9036, dev avg loss 0.279216, throughput 2.12127K wps
[Epoch 62 Batch 30/172] avg loss 0.00205361, throughput 2.13976K wps
[Epoch 62 Batch 60/172] avg loss 0.00222177, throughput 2.08035K wps
[Epoch 62 Batch 90/172] avg loss 0.00220583, throughput 2.09083K wps
[Epoch 62 Batch 120/172] avg loss 0.00215675, throughput 2.1131K wps
[Epoch 62 Batch 150/172] avg loss 0.00216611, throughput 2.09092K wps
Begin Testing...
[Epoch 62] train avg loss 0.00217706, dev acc 0.9025, dev avg loss 0.28045, throughput 2.10243K wps
[Epoch 63 Batch 30/172] avg loss 0.00208682, throughput 2.14561K wps
[Epoch 63 Batch 60/172] avg loss 0.00210206, throughput 2.09966K wps
[Epoch 63 Batch 90/172] avg loss 0.0020305, throughput 2.1097K wps
[Epoch 63 Batch 120/172] avg loss 0.00215035, throughput 2.12366K wps
[Epoch 63 Batch 150/172] avg loss 0.00231686, throughput 2.12217K wps
Begin Testing...
[Epoch 63] train avg loss 0.00211466, dev acc 0.9025, dev avg loss 0.281889, throughput 2.12081K wps
[Epoch 64 Batch 30/172] avg loss 0.00191736, throughput 2.1515K wps
[Epoch 64 Batch 60/172] avg loss 0.00229367, throughput 2.12501K wps
[Epoch 64 Batch 90/172] avg loss 0.00180455, throughput 2.13724K wps
[Epoch 64 Batch 120/172] avg loss 0.00201153, throughput 2.12764K wps
[Epoch 64 Batch 150/172] avg loss 0.00212962, throughput 2.10557K wps
Begin Testing...
[Epoch 64] train avg loss 0.00204558, dev acc 0.9025, dev avg loss 0.283262, throughput 2.12906K wps
[Epoch 65 Batch 30/172] avg loss 0.00194104, throughput 2.15211K wps
[Epoch 65 Batch 60/172] avg loss 0.00200262, throughput 2.09653K wps
[Epoch 65 Batch 90/172] avg loss 0.00196832, throughput 2.10977K wps
[Epoch 65 Batch 120/172] avg loss 0.00209673, throughput 2.07855K wps
[Epoch 65 Batch 150/172] avg loss 0.00221939, throughput 2.06774K wps
Begin Testing...
[Epoch 65] train avg loss 0.00204372, dev acc 0.9046, dev avg loss 0.286771, throughput 2.10068K wps
[Epoch 66 Batch 30/172] avg loss 0.00195533, throughput 2.17497K wps
[Epoch 66 Batch 60/172] avg loss 0.00201924, throughput 2.14028K wps
[Epoch 66 Batch 90/172] avg loss 0.00191958, throughput 2.12022K wps
[Epoch 66 Batch 120/172] avg loss 0.00212539, throughput 2.11409K wps
[Epoch 66 Batch 150/172] avg loss 0.00193759, throughput 2.1047K wps
Begin Testing...
[Epoch 66] train avg loss 0.00199792, dev acc 0.9046, dev avg loss 0.287279, throughput 2.12493K wps
[Epoch 67 Batch 30/172] avg loss 0.00215556, throughput 2.14327K wps
[Epoch 67 Batch 60/172] avg loss 0.00182375, throughput 2.0928K wps
[Epoch 67 Batch 90/172] avg loss 0.00188518, throughput 2.10268K wps
[Epoch 67 Batch 120/172] avg loss 0.00173921, throughput 2.12301K wps
[Epoch 67 Batch 150/172] avg loss 0.00199744, throughput 2.08929K wps
Begin Testing...
[Epoch 67] train avg loss 0.00196261, dev acc 0.9057, dev avg loss 0.289385, throughput 2.10716K wps
[Epoch 68 Batch 30/172] avg loss 0.0017649, throughput 2.15777K wps
[Epoch 68 Batch 60/172] avg loss 0.00190916, throughput 2.10305K wps
[Epoch 68 Batch 90/172] avg loss 0.00193357, throughput 2.1196K wps
[Epoch 68 Batch 120/172] avg loss 0.00199456, throughput 2.14419K wps
[Epoch 68 Batch 150/172] avg loss 0.00201504, throughput 2.1017K wps
Begin Testing...
[Epoch 68] train avg loss 0.00193101, dev acc 0.9025, dev avg loss 0.288123, throughput 2.11935K wps
[Epoch 69 Batch 30/172] avg loss 0.00181143, throughput 2.15429K wps
[Epoch 69 Batch 60/172] avg loss 0.00214247, throughput 2.1122K wps
[Epoch 69 Batch 90/172] avg loss 0.00168359, throughput 2.11247K wps
[Epoch 69 Batch 120/172] avg loss 0.00196425, throughput 2.11552K wps
[Epoch 69 Batch 150/172] avg loss 0.00189437, throughput 2.10783K wps
Begin Testing...
[Epoch 69] train avg loss 0.0019197, dev acc 0.9036, dev avg loss 0.288849, throughput 2.11775K wps
[Epoch 70 Batch 30/172] avg loss 0.00171761, throughput 2.16525K wps
[Epoch 70 Batch 60/172] avg loss 0.00189143, throughput 2.08854K wps
[Epoch 70 Batch 90/172] avg loss 0.00175926, throughput 2.11483K wps
[Epoch 70 Batch 120/172] avg loss 0.0016854, throughput 2.101K wps
[Epoch 70 Batch 150/172] avg loss 0.00196462, throughput 2.09468K wps
Begin Testing...
[Epoch 70] train avg loss 0.0018434, dev acc 0.9015, dev avg loss 0.289487, throughput 2.11093K wps
[Epoch 71 Batch 30/172] avg loss 0.00191196, throughput 2.14348K wps
[Epoch 71 Batch 60/172] avg loss 0.00164043, throughput 2.13463K wps
[Epoch 71 Batch 90/172] avg loss 0.00178116, throughput 2.12769K wps
[Epoch 71 Batch 120/172] avg loss 0.00198933, throughput 2.12546K wps
[Epoch 71 Batch 150/172] avg loss 0.00209557, throughput 2.14055K wps
Begin Testing...
[Epoch 71] train avg loss 0.00186844, dev acc 0.9025, dev avg loss 0.290245, throughput 2.13522K wps
[Epoch 72 Batch 30/172] avg loss 0.0020423, throughput 2.14689K wps
[Epoch 72 Batch 60/172] avg loss 0.00178244, throughput 2.09218K wps
[Epoch 72 Batch 90/172] avg loss 0.00195807, throughput 2.13996K wps
[Epoch 72 Batch 120/172] avg loss 0.00185794, throughput 2.12683K wps
[Epoch 72 Batch 150/172] avg loss 0.00176532, throughput 2.10669K wps
Begin Testing...
[Epoch 72] train avg loss 0.00184572, dev acc 0.9046, dev avg loss 0.29347, throughput 2.1211K wps
[Epoch 73 Batch 30/172] avg loss 0.00197281, throughput 2.15314K wps
[Epoch 73 Batch 60/172] avg loss 0.0016352, throughput 2.13044K wps
[Epoch 73 Batch 90/172] avg loss 0.00197984, throughput 2.12721K wps
[Epoch 73 Batch 120/172] avg loss 0.00158484, throughput 2.13265K wps
[Epoch 73 Batch 150/172] avg loss 0.00170282, throughput 2.1K wps
Begin Testing...
[Epoch 73] train avg loss 0.00180154, dev acc 0.9004, dev avg loss 0.295286, throughput 2.12544K wps
[Epoch 74 Batch 30/172] avg loss 0.00177416, throughput 2.18074K wps
[Epoch 74 Batch 60/172] avg loss 0.00157075, throughput 2.10701K wps
[Epoch 74 Batch 90/172] avg loss 0.00191179, throughput 2.09658K wps
[Epoch 74 Batch 120/172] avg loss 0.00187866, throughput 2.12247K wps
[Epoch 74 Batch 150/172] avg loss 0.00173115, throughput 2.14144K wps
Begin Testing...
[Epoch 74] train avg loss 0.0017914, dev acc 0.9015, dev avg loss 0.297592, throughput 2.1262K wps
[Epoch 75 Batch 30/172] avg loss 0.00176767, throughput 2.13784K wps
[Epoch 75 Batch 60/172] avg loss 0.00159804, throughput 2.12168K wps
[Epoch 75 Batch 90/172] avg loss 0.00188059, throughput 2.14124K wps
[Epoch 75 Batch 120/172] avg loss 0.00150351, throughput 2.10208K wps
[Epoch 75 Batch 150/172] avg loss 0.00176079, throughput 2.09908K wps
Begin Testing...
[Epoch 75] train avg loss 0.00171151, dev acc 0.8983, dev avg loss 0.301053, throughput 2.12008K wps
[Epoch 76 Batch 30/172] avg loss 0.00162064, throughput 2.18115K wps
[Epoch 76 Batch 60/172] avg loss 0.001666, throughput 2.1303K wps
[Epoch 76 Batch 90/172] avg loss 0.00174006, throughput 2.13459K wps
[Epoch 76 Batch 120/172] avg loss 0.00171202, throughput 2.13048K wps
[Epoch 76 Batch 150/172] avg loss 0.00192875, throughput 2.11075K wps
Begin Testing...
[Epoch 76] train avg loss 0.0017296, dev acc 0.9004, dev avg loss 0.298896, throughput 2.13139K wps
[Epoch 77 Batch 30/172] avg loss 0.00173053, throughput 2.15588K wps
[Epoch 77 Batch 60/172] avg loss 0.0016912, throughput 2.10901K wps
[Epoch 77 Batch 90/172] avg loss 0.00172081, throughput 2.08334K wps
[Epoch 77 Batch 120/172] avg loss 0.00177406, throughput 2.09695K wps
[Epoch 77 Batch 150/172] avg loss 0.00177571, throughput 2.12319K wps
Begin Testing...
[Epoch 77] train avg loss 0.00169077, dev acc 0.9004, dev avg loss 0.300467, throughput 2.11114K wps
[Epoch 78 Batch 30/172] avg loss 0.00173761, throughput 2.13807K wps
[Epoch 78 Batch 60/172] avg loss 0.0014696, throughput 2.10782K wps
[Epoch 78 Batch 90/172] avg loss 0.00183415, throughput 2.09904K wps
[Epoch 78 Batch 120/172] avg loss 0.00163949, throughput 2.0944K wps
[Epoch 78 Batch 150/172] avg loss 0.00183736, throughput 2.10051K wps
Begin Testing...
[Epoch 78] train avg loss 0.00166171, dev acc 0.8973, dev avg loss 0.3039, throughput 2.11224K wps
[Epoch 79 Batch 30/172] avg loss 0.00147199, throughput 2.15652K wps
[Epoch 79 Batch 60/172] avg loss 0.00172965, throughput 2.10797K wps
[Epoch 79 Batch 90/172] avg loss 0.00162565, throughput 2.14519K wps
[Epoch 79 Batch 120/172] avg loss 0.00168858, throughput 2.12068K wps
[Epoch 79 Batch 150/172] avg loss 0.00169242, throughput 2.12831K wps
Begin Testing...
[Epoch 79] train avg loss 0.00164643, dev acc 0.9025, dev avg loss 0.304565, throughput 2.13344K wps
[Epoch 80 Batch 30/172] avg loss 0.00172931, throughput 2.17996K wps
[Epoch 80 Batch 60/172] avg loss 0.00147214, throughput 2.1172K wps
[Epoch 80 Batch 90/172] avg loss 0.00164698, throughput 2.1408K wps
[Epoch 80 Batch 120/172] avg loss 0.00156187, throughput 2.1104K wps
[Epoch 80 Batch 150/172] avg loss 0.00196594, throughput 2.12766K wps
Begin Testing...
[Epoch 80] train avg loss 0.0016473, dev acc 0.9004, dev avg loss 0.304077, throughput 2.13244K wps
[Epoch 81 Batch 30/172] avg loss 0.00175436, throughput 2.17765K wps
[Epoch 81 Batch 60/172] avg loss 0.00144169, throughput 2.10077K wps
[Epoch 81 Batch 90/172] avg loss 0.0016749, throughput 2.09825K wps
[Epoch 81 Batch 120/172] avg loss 0.00152019, throughput 2.12959K wps
[Epoch 81 Batch 150/172] avg loss 0.00173912, throughput 2.131K wps
Begin Testing...
[Epoch 81] train avg loss 0.00163829, dev acc 0.9004, dev avg loss 0.306746, throughput 2.12953K wps
[Epoch 82 Batch 30/172] avg loss 0.00162237, throughput 2.15375K wps
[Epoch 82 Batch 60/172] avg loss 0.00149183, throughput 2.11095K wps
[Epoch 82 Batch 90/172] avg loss 0.00148211, throughput 2.14173K wps
[Epoch 82 Batch 120/172] avg loss 0.00175699, throughput 2.15068K wps
[Epoch 82 Batch 150/172] avg loss 0.00145009, throughput 2.14283K wps
Begin Testing...
[Epoch 82] train avg loss 0.00157308, dev acc 0.8994, dev avg loss 0.309178, throughput 2.14047K wps
[Epoch 83 Batch 30/172] avg loss 0.00128732, throughput 2.17858K wps
[Epoch 83 Batch 60/172] avg loss 0.00164906, throughput 2.11589K wps
[Epoch 83 Batch 90/172] avg loss 0.00152955, throughput 2.08379K wps
[Epoch 83 Batch 120/172] avg loss 0.00154858, throughput 2.1351K wps
[Epoch 83 Batch 150/172] avg loss 0.00160882, throughput 2.14869K wps
Begin Testing...
[Epoch 83] train avg loss 0.00155327, dev acc 0.9015, dev avg loss 0.308741, throughput 2.13335K wps
[Epoch 84 Batch 30/172] avg loss 0.00159156, throughput 2.18445K wps
[Epoch 84 Batch 60/172] avg loss 0.0015813, throughput 2.13707K wps
[Epoch 84 Batch 90/172] avg loss 0.00150753, throughput 2.12178K wps
[Epoch 84 Batch 120/172] avg loss 0.00175144, throughput 2.13759K wps
[Epoch 84 Batch 150/172] avg loss 0.00150016, throughput 2.10763K wps
Begin Testing...
[Epoch 84] train avg loss 0.00160559, dev acc 0.8994, dev avg loss 0.30997, throughput 2.1346K wps
[Epoch 85 Batch 30/172] avg loss 0.00145194, throughput 2.16796K wps
[Epoch 85 Batch 60/172] avg loss 0.00131755, throughput 2.11064K wps
[Epoch 85 Batch 90/172] avg loss 0.00160989, throughput 2.14014K wps
[Epoch 85 Batch 120/172] avg loss 0.001521, throughput 2.14262K wps
[Epoch 85 Batch 150/172] avg loss 0.00156051, throughput 2.13174K wps
Begin Testing...
[Epoch 85] train avg loss 0.0015159, dev acc 0.9004, dev avg loss 0.312761, throughput 2.13449K wps
[Epoch 86 Batch 30/172] avg loss 0.00154112, throughput 2.05951K wps
[Epoch 86 Batch 60/172] avg loss 0.0012555, throughput 2.07847K wps
[Epoch 86 Batch 90/172] avg loss 0.00136384, throughput 2.13638K wps
[Epoch 86 Batch 120/172] avg loss 0.00161137, throughput 2.13699K wps
[Epoch 86 Batch 150/172] avg loss 0.00183105, throughput 2.13704K wps
Begin Testing...
[Epoch 86] train avg loss 0.00151499, dev acc 0.8994, dev avg loss 0.314037, throughput 2.10691K wps
[Epoch 87 Batch 30/172] avg loss 0.00149596, throughput 2.15886K wps
[Epoch 87 Batch 60/172] avg loss 0.0017023, throughput 2.11858K wps
[Epoch 87 Batch 90/172] avg loss 0.00139481, throughput 2.11744K wps
[Epoch 87 Batch 120/172] avg loss 0.00155313, throughput 2.11599K wps
[Epoch 87 Batch 150/172] avg loss 0.00135652, throughput 2.13699K wps
Begin Testing...
[Epoch 87] train avg loss 0.00151429, dev acc 0.8983, dev avg loss 0.31467, throughput 2.13116K wps
[Epoch 88 Batch 30/172] avg loss 0.00162712, throughput 2.18652K wps
[Epoch 88 Batch 60/172] avg loss 0.00131204, throughput 2.11411K wps
[Epoch 88 Batch 90/172] avg loss 0.00138459, throughput 2.11541K wps
[Epoch 88 Batch 120/172] avg loss 0.00147714, throughput 2.13583K wps
[Epoch 88 Batch 150/172] avg loss 0.00160263, throughput 2.13546K wps
Begin Testing...
[Epoch 88] train avg loss 0.00147648, dev acc 0.8994, dev avg loss 0.31619, throughput 2.13671K wps
[Epoch 89 Batch 30/172] avg loss 0.00123099, throughput 2.17619K wps
[Epoch 89 Batch 60/172] avg loss 0.00151039, throughput 2.12431K wps
[Epoch 89 Batch 90/172] avg loss 0.00157763, throughput 2.10302K wps
[Epoch 89 Batch 120/172] avg loss 0.0015948, throughput 2.10084K wps
[Epoch 89 Batch 150/172] avg loss 0.00143878, throughput 2.10984K wps
Begin Testing...
[Epoch 89] train avg loss 0.00145793, dev acc 0.8994, dev avg loss 0.316727, throughput 2.12444K wps
[Epoch 90 Batch 30/172] avg loss 0.00115839, throughput 2.15803K wps
[Epoch 90 Batch 60/172] avg loss 0.00143089, throughput 2.0974K wps
[Epoch 90 Batch 90/172] avg loss 0.00143163, throughput 2.10302K wps
[Epoch 90 Batch 120/172] avg loss 0.00142866, throughput 2.12067K wps
[Epoch 90 Batch 150/172] avg loss 0.00168707, throughput 2.09611K wps
Begin Testing...
[Epoch 90] train avg loss 0.0014566, dev acc 0.8983, dev avg loss 0.318812, throughput 2.11205K wps
[Epoch 91 Batch 30/172] avg loss 0.00136315, throughput 2.17521K wps
[Epoch 91 Batch 60/172] avg loss 0.00138483, throughput 2.10223K wps
[Epoch 91 Batch 90/172] avg loss 0.00139699, throughput 2.08498K wps
[Epoch 91 Batch 120/172] avg loss 0.00148407, throughput 2.13799K wps
[Epoch 91 Batch 150/172] avg loss 0.00140791, throughput 2.13167K wps
Begin Testing...
[Epoch 91] train avg loss 0.00138663, dev acc 0.9046, dev avg loss 0.322203, throughput 2.12642K wps
[Epoch 92 Batch 30/172] avg loss 0.00176838, throughput 2.14338K wps
[Epoch 92 Batch 60/172] avg loss 0.00124531, throughput 2.14038K wps
[Epoch 92 Batch 90/172] avg loss 0.00136681, throughput 2.13369K wps
[Epoch 92 Batch 120/172] avg loss 0.00127772, throughput 2.13979K wps
[Epoch 92 Batch 150/172] avg loss 0.00132189, throughput 2.11979K wps
Begin Testing...
[Epoch 92] train avg loss 0.00142538, dev acc 0.8973, dev avg loss 0.321955, throughput 2.13175K wps
[Epoch 93 Batch 30/172] avg loss 0.00165021, throughput 2.1326K wps
[Epoch 93 Batch 60/172] avg loss 0.00130681, throughput 2.12993K wps
[Epoch 93 Batch 90/172] avg loss 0.00111572, throughput 2.11845K wps
[Epoch 93 Batch 120/172] avg loss 0.00139794, throughput 2.09727K wps
[Epoch 93 Batch 150/172] avg loss 0.00150435, throughput 2.10936K wps
Begin Testing...
[Epoch 93] train avg loss 0.00139117, dev acc 0.8983, dev avg loss 0.323533, throughput 2.11608K wps
[Epoch 94 Batch 30/172] avg loss 0.00136281, throughput 2.16148K wps
[Epoch 94 Batch 60/172] avg loss 0.00135508, throughput 2.14166K wps
[Epoch 94 Batch 90/172] avg loss 0.00122771, throughput 2.13968K wps
[Epoch 94 Batch 120/172] avg loss 0.00130487, throughput 2.10168K wps
[Epoch 94 Batch 150/172] avg loss 0.001432, throughput 2.09839K wps
Begin Testing...
[Epoch 94] train avg loss 0.00135338, dev acc 0.9004, dev avg loss 0.324712, throughput 2.12359K wps
[Epoch 95 Batch 30/172] avg loss 0.00121612, throughput 2.19112K wps
[Epoch 95 Batch 60/172] avg loss 0.00137532, throughput 2.12808K wps
[Epoch 95 Batch 90/172] avg loss 0.00138279, throughput 2.11458K wps
[Epoch 95 Batch 120/172] avg loss 0.00162839, throughput 2.11233K wps
[Epoch 95 Batch 150/172] avg loss 0.00151353, throughput 2.12051K wps
Begin Testing...
[Epoch 95] train avg loss 0.00141395, dev acc 0.8994, dev avg loss 0.325712, throughput 2.13261K wps
[Epoch 96 Batch 30/172] avg loss 0.00141835, throughput 2.15308K wps
[Epoch 96 Batch 60/172] avg loss 0.00112595, throughput 2.12322K wps
[Epoch 96 Batch 90/172] avg loss 0.00136685, throughput 2.11185K wps
[Epoch 96 Batch 120/172] avg loss 0.00138347, throughput 2.13978K wps
[Epoch 96 Batch 150/172] avg loss 0.00117132, throughput 2.09623K wps
Begin Testing...
[Epoch 96] train avg loss 0.00135026, dev acc 0.8952, dev avg loss 0.327754, throughput 2.12277K wps
[Epoch 97 Batch 30/172] avg loss 0.00131734, throughput 2.1594K wps
[Epoch 97 Batch 60/172] avg loss 0.00133617, throughput 2.14146K wps
[Epoch 97 Batch 90/172] avg loss 0.00125842, throughput 2.10536K wps
[Epoch 97 Batch 120/172] avg loss 0.0015162, throughput 2.1457K wps
[Epoch 97 Batch 150/172] avg loss 0.00130867, throughput 2.12423K wps
Begin Testing...
[Epoch 97] train avg loss 0.00133277, dev acc 0.9057, dev avg loss 0.332782, throughput 2.13339K wps
[Epoch 98 Batch 30/172] avg loss 0.00113293, throughput 2.15109K wps
[Epoch 98 Batch 60/172] avg loss 0.00139826, throughput 2.12183K wps
[Epoch 98 Batch 90/172] avg loss 0.00129841, throughput 2.12322K wps
[Epoch 98 Batch 120/172] avg loss 0.00130713, throughput 2.10923K wps
[Epoch 98 Batch 150/172] avg loss 0.00145367, throughput 2.13878K wps
Begin Testing...
[Epoch 98] train avg loss 0.00134019, dev acc 0.9046, dev avg loss 0.332337, throughput 2.1293K wps
[Epoch 99 Batch 30/172] avg loss 0.00115919, throughput 2.15541K wps
[Epoch 99 Batch 60/172] avg loss 0.00137275, throughput 2.10183K wps
[Epoch 99 Batch 90/172] avg loss 0.0013648, throughput 2.10716K wps
[Epoch 99 Batch 120/172] avg loss 0.0013799, throughput 2.13978K wps
[Epoch 99 Batch 150/172] avg loss 0.00141016, throughput 2.13433K wps
Begin Testing...
[Epoch 99] train avg loss 0.00131809, dev acc 0.9015, dev avg loss 0.332132, throughput 2.12874K wps
[Epoch 100 Batch 30/172] avg loss 0.00122381, throughput 2.13317K wps
[Epoch 100 Batch 60/172] avg loss 0.00132797, throughput 2.10254K wps
[Epoch 100 Batch 90/172] avg loss 0.00133495, throughput 2.0823K wps
[Epoch 100 Batch 120/172] avg loss 0.00126465, throughput 2.11496K wps
[Epoch 100 Batch 150/172] avg loss 0.00137417, throughput 2.12544K wps
Begin Testing...
[Epoch 100] train avg loss 0.00130426, dev acc 0.9004, dev avg loss 0.332138, throughput 2.11164K wps
[Epoch 101 Batch 30/172] avg loss 0.00110149, throughput 2.13677K wps
[Epoch 101 Batch 60/172] avg loss 0.00125821, throughput 2.08797K wps
[Epoch 101 Batch 90/172] avg loss 0.00129212, throughput 2.08961K wps
[Epoch 101 Batch 120/172] avg loss 0.00127168, throughput 2.09529K wps
[Epoch 101 Batch 150/172] avg loss 0.00126644, throughput 2.09591K wps
Begin Testing...
[Epoch 101] train avg loss 0.00125251, dev acc 0.8962, dev avg loss 0.334139, throughput 2.10193K wps
[Epoch 102 Batch 30/172] avg loss 0.00094333, throughput 2.16715K wps
[Epoch 102 Batch 60/172] avg loss 0.00128518, throughput 2.13982K wps
[Epoch 102 Batch 90/172] avg loss 0.00135276, throughput 2.11008K wps
[Epoch 102 Batch 120/172] avg loss 0.00132686, throughput 2.10243K wps
[Epoch 102 Batch 150/172] avg loss 0.00145104, throughput 2.1055K wps
Begin Testing...
[Epoch 102] train avg loss 0.00128207, dev acc 0.8973, dev avg loss 0.334299, throughput 2.12227K wps
[Epoch 103 Batch 30/172] avg loss 0.00123833, throughput 2.15305K wps
[Epoch 103 Batch 60/172] avg loss 0.00102414, throughput 2.10878K wps
[Epoch 103 Batch 90/172] avg loss 0.0012219, throughput 2.10099K wps
[Epoch 103 Batch 120/172] avg loss 0.0011166, throughput 2.11579K wps
[Epoch 103 Batch 150/172] avg loss 0.0012062, throughput 2.1299K wps
Begin Testing...
[Epoch 103] train avg loss 0.00125108, dev acc 0.8983, dev avg loss 0.335219, throughput 2.12402K wps
[Epoch 104 Batch 30/172] avg loss 0.00127193, throughput 2.13948K wps
[Epoch 104 Batch 60/172] avg loss 0.00111761, throughput 2.09815K wps
[Epoch 104 Batch 90/172] avg loss 0.00137692, throughput 2.10816K wps
[Epoch 104 Batch 120/172] avg loss 0.00129557, throughput 2.09838K wps
[Epoch 104 Batch 150/172] avg loss 0.00134033, throughput 2.11999K wps
Begin Testing...
[Epoch 104] train avg loss 0.00127921, dev acc 0.9046, dev avg loss 0.343609, throughput 2.11142K wps
[Epoch 105 Batch 30/172] avg loss 0.00114224, throughput 2.18394K wps
[Epoch 105 Batch 60/172] avg loss 0.00131442, throughput 2.1119K wps
[Epoch 105 Batch 90/172] avg loss 0.00114947, throughput 2.10669K wps
[Epoch 105 Batch 120/172] avg loss 0.00146014, throughput 2.14063K wps
[Epoch 105 Batch 150/172] avg loss 0.00119818, throughput 2.12523K wps
Begin Testing...
[Epoch 105] train avg loss 0.001247, dev acc 0.9004, dev avg loss 0.336301, throughput 2.12894K wps
[Epoch 106 Batch 30/172] avg loss 0.000980906, throughput 2.17186K wps
[Epoch 106 Batch 60/172] avg loss 0.00115583, throughput 2.11678K wps
[Epoch 106 Batch 90/172] avg loss 0.00123219, throughput 2.12665K wps
[Epoch 106 Batch 120/172] avg loss 0.00114112, throughput 2.11025K wps
[Epoch 106 Batch 150/172] avg loss 0.00133451, throughput 2.12298K wps
Begin Testing...
[Epoch 106] train avg loss 0.00121205, dev acc 0.8994, dev avg loss 0.33955, throughput 2.12745K wps
[Epoch 107 Batch 30/172] avg loss 0.00127458, throughput 2.16306K wps
[Epoch 107 Batch 60/172] avg loss 0.00124066, throughput 2.12378K wps
[Epoch 107 Batch 90/172] avg loss 0.00107874, throughput 2.12004K wps
[Epoch 107 Batch 120/172] avg loss 0.00130832, throughput 2.14234K wps
[Epoch 107 Batch 150/172] avg loss 0.00127923, throughput 2.13446K wps
Begin Testing...
[Epoch 107] train avg loss 0.00124789, dev acc 0.8973, dev avg loss 0.340126, throughput 2.13508K wps
[Epoch 108 Batch 30/172] avg loss 0.00109183, throughput 2.1353K wps
[Epoch 108 Batch 60/172] avg loss 0.00120551, throughput 2.11347K wps
[Epoch 108 Batch 90/172] avg loss 0.00127565, throughput 2.08903K wps
[Epoch 108 Batch 120/172] avg loss 0.00132162, throughput 2.09509K wps
[Epoch 108 Batch 150/172] avg loss 0.00112204, throughput 2.1046K wps
Begin Testing...
[Epoch 108] train avg loss 0.0012057, dev acc 0.9046, dev avg loss 0.345445, throughput 2.10694K wps
[Epoch 109 Batch 30/172] avg loss 0.00122407, throughput 2.17617K wps
[Epoch 109 Batch 60/172] avg loss 0.00117484, throughput 2.14068K wps
[Epoch 109 Batch 90/172] avg loss 0.00115835, throughput 2.12007K wps
[Epoch 109 Batch 120/172] avg loss 0.00104091, throughput 2.12621K wps
[Epoch 109 Batch 150/172] avg loss 0.0011168, throughput 2.14283K wps
Begin Testing...
[Epoch 109] train avg loss 0.00118204, dev acc 0.9015, dev avg loss 0.343309, throughput 2.14094K wps
[Epoch 110 Batch 30/172] avg loss 0.00126502, throughput 2.16848K wps
[Epoch 110 Batch 60/172] avg loss 0.00101492, throughput 2.09497K wps
[Epoch 110 Batch 90/172] avg loss 0.00130724, throughput 2.09619K wps
[Epoch 110 Batch 120/172] avg loss 0.00106368, throughput 2.13791K wps
[Epoch 110 Batch 150/172] avg loss 0.00138239, throughput 2.13191K wps
Begin Testing...
[Epoch 110] train avg loss 0.00119896, dev acc 0.8962, dev avg loss 0.343635, throughput 2.12323K wps
[Epoch 111 Batch 30/172] avg loss 0.00115988, throughput 2.16979K wps
[Epoch 111 Batch 60/172] avg loss 0.00106869, throughput 2.09481K wps
[Epoch 111 Batch 90/172] avg loss 0.00134007, throughput 2.10719K wps
[Epoch 111 Batch 120/172] avg loss 0.00119414, throughput 2.15089K wps
[Epoch 111 Batch 150/172] avg loss 0.00121541, throughput 2.1394K wps
Begin Testing...
[Epoch 111] train avg loss 0.00118169, dev acc 0.9004, dev avg loss 0.345527, throughput 2.13073K wps
[Epoch 112 Batch 30/172] avg loss 0.00111774, throughput 2.13592K wps
[Epoch 112 Batch 60/172] avg loss 0.00123516, throughput 2.10054K wps
[Epoch 112 Batch 90/172] avg loss 0.00101016, throughput 2.12934K wps
[Epoch 112 Batch 120/172] avg loss 0.00123832, throughput 2.13664K wps
[Epoch 112 Batch 150/172] avg loss 0.0013051, throughput 2.08916K wps
Begin Testing...
[Epoch 112] train avg loss 0.0011737, dev acc 0.8994, dev avg loss 0.346974, throughput 2.11751K wps
[Epoch 113 Batch 30/172] avg loss 0.00135829, throughput 2.15368K wps
[Epoch 113 Batch 60/172] avg loss 0.00111275, throughput 2.10239K wps
[Epoch 113 Batch 90/172] avg loss 0.00116472, throughput 2.1043K wps
[Epoch 113 Batch 120/172] avg loss 0.00115508, throughput 2.10093K wps
[Epoch 113 Batch 150/172] avg loss 0.0012408, throughput 2.09134K wps
Begin Testing...
[Epoch 113] train avg loss 0.00118007, dev acc 0.9025, dev avg loss 0.348779, throughput 2.10873K wps
[Epoch 114 Batch 30/172] avg loss 0.00122344, throughput 2.15431K wps
[Epoch 114 Batch 60/172] avg loss 0.000962014, throughput 2.10007K wps
[Epoch 114 Batch 90/172] avg loss 0.00113141, throughput 2.12005K wps
[Epoch 114 Batch 120/172] avg loss 0.00124928, throughput 2.14003K wps
[Epoch 114 Batch 150/172] avg loss 0.00118087, throughput 2.13963K wps
Begin Testing...
[Epoch 114] train avg loss 0.00113403, dev acc 0.9015, dev avg loss 0.350082, throughput 2.12917K wps
[Epoch 115 Batch 30/172] avg loss 0.00096977, throughput 2.14908K wps
[Epoch 115 Batch 60/172] avg loss 0.00120372, throughput 2.11447K wps
[Epoch 115 Batch 90/172] avg loss 0.00136719, throughput 2.12606K wps
[Epoch 115 Batch 120/172] avg loss 0.00119856, throughput 2.13087K wps
[Epoch 115 Batch 150/172] avg loss 0.0010377, throughput 2.13386K wps
Begin Testing...
[Epoch 115] train avg loss 0.00117686, dev acc 0.8994, dev avg loss 0.348059, throughput 2.12587K wps
[Epoch 116 Batch 30/172] avg loss 0.0010015, throughput 2.16916K wps
[Epoch 116 Batch 60/172] avg loss 0.00121804, throughput 2.13956K wps
[Epoch 116 Batch 90/172] avg loss 0.00126512, throughput 2.13806K wps
[Epoch 116 Batch 120/172] avg loss 0.00108855, throughput 2.10051K wps
[Epoch 116 Batch 150/172] avg loss 0.00106196, throughput 2.14978K wps
Begin Testing...
[Epoch 116] train avg loss 0.00112433, dev acc 0.9025, dev avg loss 0.352942, throughput 2.13931K wps
[Epoch 117 Batch 30/172] avg loss 0.00117158, throughput 2.1341K wps
[Epoch 117 Batch 60/172] avg loss 0.000924907, throughput 2.11689K wps
[Epoch 117 Batch 90/172] avg loss 0.00116352, throughput 2.10394K wps
[Epoch 117 Batch 120/172] avg loss 0.0011988, throughput 2.11166K wps
[Epoch 117 Batch 150/172] avg loss 0.00113821, throughput 2.13703K wps
Begin Testing...
[Epoch 117] train avg loss 0.00112353, dev acc 0.9025, dev avg loss 0.352389, throughput 2.11805K wps
[Epoch 118 Batch 30/172] avg loss 0.00119549, throughput 2.1442K wps
[Epoch 118 Batch 60/172] avg loss 0.00107465, throughput 2.09528K wps
[Epoch 118 Batch 90/172] avg loss 0.00112216, throughput 2.09831K wps
[Epoch 118 Batch 120/172] avg loss 0.0011391, throughput 2.09575K wps
[Epoch 118 Batch 150/172] avg loss 0.000986631, throughput 2.1269K wps
Begin Testing...
[Epoch 118] train avg loss 0.00112348, dev acc 0.8983, dev avg loss 0.352674, throughput 2.11599K wps
[Epoch 119 Batch 30/172] avg loss 0.00108445, throughput 2.18737K wps
[Epoch 119 Batch 60/172] avg loss 0.000986267, throughput 2.11434K wps
[Epoch 119 Batch 90/172] avg loss 0.00114272, throughput 2.10553K wps
[Epoch 119 Batch 120/172] avg loss 0.00103587, throughput 2.13649K wps
[Epoch 119 Batch 150/172] avg loss 0.00124523, throughput 2.12076K wps
Begin Testing...
[Epoch 119] train avg loss 0.00110411, dev acc 0.8973, dev avg loss 0.355897, throughput 2.13138K wps
[Epoch 120 Batch 30/172] avg loss 0.00104035, throughput 2.14293K wps
[Epoch 120 Batch 60/172] avg loss 0.00104148, throughput 2.11881K wps
[Epoch 120 Batch 90/172] avg loss 0.00105646, throughput 2.09542K wps
[Epoch 120 Batch 120/172] avg loss 0.0012474, throughput 2.1139K wps
[Epoch 120 Batch 150/172] avg loss 0.00114495, throughput 2.10813K wps
Begin Testing...
[Epoch 120] train avg loss 0.00108889, dev acc 0.9015, dev avg loss 0.363839, throughput 2.11756K wps
[Epoch 121 Batch 30/172] avg loss 0.00105876, throughput 2.17148K wps
[Epoch 121 Batch 60/172] avg loss 0.00107663, throughput 2.09752K wps
[Epoch 121 Batch 90/172] avg loss 0.00100566, throughput 2.10085K wps
[Epoch 121 Batch 120/172] avg loss 0.00100047, throughput 2.10491K wps
[Epoch 121 Batch 150/172] avg loss 0.001272, throughput 2.11635K wps
Begin Testing...
[Epoch 121] train avg loss 0.00108147, dev acc 0.8994, dev avg loss 0.356996, throughput 2.11666K wps
[Epoch 122 Batch 30/172] avg loss 0.000992369, throughput 2.14509K wps
[Epoch 122 Batch 60/172] avg loss 0.00104409, throughput 2.14393K wps
[Epoch 122 Batch 90/172] avg loss 0.000932538, throughput 2.09571K wps
[Epoch 122 Batch 120/172] avg loss 0.00128697, throughput 2.08405K wps
[Epoch 122 Batch 150/172] avg loss 0.0011444, throughput 2.14351K wps
Begin Testing...
[Epoch 122] train avg loss 0.0010819, dev acc 0.9015, dev avg loss 0.358241, throughput 2.12465K wps
[Epoch 123 Batch 30/172] avg loss 0.00110766, throughput 2.16855K wps
[Epoch 123 Batch 60/172] avg loss 0.00103147, throughput 2.13527K wps
[Epoch 123 Batch 90/172] avg loss 0.00121152, throughput 2.14104K wps
[Epoch 123 Batch 120/172] avg loss 0.0011138, throughput 2.11325K wps
[Epoch 123 Batch 150/172] avg loss 0.000954601, throughput 2.1279K wps
Begin Testing...
[Epoch 123] train avg loss 0.00109493, dev acc 0.8994, dev avg loss 0.357482, throughput 2.13525K wps
[Epoch 124 Batch 30/172] avg loss 0.00105327, throughput 2.16721K wps
[Epoch 124 Batch 60/172] avg loss 0.00121179, throughput 2.12564K wps
[Epoch 124 Batch 90/172] avg loss 0.000956424, throughput 2.13624K wps
[Epoch 124 Batch 120/172] avg loss 0.000955184, throughput 2.12477K wps
[Epoch 124 Batch 150/172] avg loss 0.00113267, throughput 2.10901K wps
Begin Testing...
[Epoch 124] train avg loss 0.0010606, dev acc 0.8994, dev avg loss 0.362871, throughput 2.13273K wps
[Epoch 125 Batch 30/172] avg loss 0.00104387, throughput 2.14643K wps
[Epoch 125 Batch 60/172] avg loss 0.00114588, throughput 2.11926K wps
[Epoch 125 Batch 90/172] avg loss 0.00107831, throughput 2.13234K wps
[Epoch 125 Batch 120/172] avg loss 0.000838398, throughput 2.10506K wps
[Epoch 125 Batch 150/172] avg loss 0.0010757, throughput 2.10323K wps
Begin Testing...
[Epoch 125] train avg loss 0.00104921, dev acc 0.9004, dev avg loss 0.36377, throughput 2.12091K wps
[Epoch 126 Batch 30/172] avg loss 0.00105019, throughput 2.17425K wps
[Epoch 126 Batch 60/172] avg loss 0.000941896, throughput 2.12624K wps
[Epoch 126 Batch 90/172] avg loss 0.001098, throughput 2.1151K wps
[Epoch 126 Batch 120/172] avg loss 0.00130182, throughput 2.08821K wps
[Epoch 126 Batch 150/172] avg loss 0.00106095, throughput 2.0961K wps
Begin Testing...
[Epoch 126] train avg loss 0.00108322, dev acc 0.9015, dev avg loss 0.361831, throughput 2.12187K wps
[Epoch 127 Batch 30/172] avg loss 0.000890582, throughput 2.13523K wps
[Epoch 127 Batch 60/172] avg loss 0.00104037, throughput 2.13968K wps
[Epoch 127 Batch 90/172] avg loss 0.000978377, throughput 2.13472K wps
[Epoch 127 Batch 120/172] avg loss 0.000935178, throughput 2.10775K wps
[Epoch 127 Batch 150/172] avg loss 0.00118944, throughput 2.13298K wps
Begin Testing...
[Epoch 127] train avg loss 0.00104204, dev acc 0.8994, dev avg loss 0.362422, throughput 2.12543K wps
[Epoch 128 Batch 30/172] avg loss 0.000906259, throughput 2.13867K wps
[Epoch 128 Batch 60/172] avg loss 0.00109825, throughput 2.08299K wps
[Epoch 128 Batch 90/172] avg loss 0.00100442, throughput 2.09613K wps
[Epoch 128 Batch 120/172] avg loss 0.000879296, throughput 2.0959K wps
[Epoch 128 Batch 150/172] avg loss 0.000971446, throughput 2.09261K wps
Begin Testing...
[Epoch 128] train avg loss 0.000994848, dev acc 0.9004, dev avg loss 0.369579, throughput 2.10239K wps
[Epoch 129 Batch 30/172] avg loss 0.000942425, throughput 2.14423K wps
[Epoch 129 Batch 60/172] avg loss 0.00110247, throughput 2.12162K wps
[Epoch 129 Batch 90/172] avg loss 0.00110553, throughput 2.1156K wps
[Epoch 129 Batch 120/172] avg loss 0.00111614, throughput 2.10519K wps
[Epoch 129 Batch 150/172] avg loss 0.00110195, throughput 2.11262K wps
Begin Testing...
[Epoch 129] train avg loss 0.00106681, dev acc 0.9004, dev avg loss 0.368361, throughput 2.1231K wps
[Epoch 130 Batch 30/172] avg loss 0.00101051, throughput 2.17503K wps
[Epoch 130 Batch 60/172] avg loss 0.000980934, throughput 2.13097K wps
[Epoch 130 Batch 90/172] avg loss 0.000811984, throughput 2.11463K wps
[Epoch 130 Batch 120/172] avg loss 0.00108669, throughput 2.11857K wps
[Epoch 130 Batch 150/172] avg loss 0.000944335, throughput 2.09695K wps
Begin Testing...
[Epoch 130] train avg loss 0.000978455, dev acc 0.9015, dev avg loss 0.369926, throughput 2.12382K wps
[Epoch 131 Batch 30/172] avg loss 0.000871249, throughput 2.16076K wps
[Epoch 131 Batch 60/172] avg loss 0.000886271, throughput 2.10863K wps
[Epoch 131 Batch 90/172] avg loss 0.00102876, throughput 2.11113K wps
[Epoch 131 Batch 120/172] avg loss 0.000964629, throughput 2.09577K wps
[Epoch 131 Batch 150/172] avg loss 0.00114158, throughput 2.12359K wps
Begin Testing...
[Epoch 131] train avg loss 0.00102188, dev acc 0.9004, dev avg loss 0.371862, throughput 2.11707K wps
[Epoch 132 Batch 30/172] avg loss 0.00100512, throughput 2.18093K wps
[Epoch 132 Batch 60/172] avg loss 0.00103995, throughput 2.14063K wps
[Epoch 132 Batch 90/172] avg loss 0.00104416, throughput 2.11663K wps
[Epoch 132 Batch 120/172] avg loss 0.000931832, throughput 2.14522K wps
[Epoch 132 Batch 150/172] avg loss 0.00100858, throughput 2.10257K wps
Begin Testing...
[Epoch 132] train avg loss 0.00101278, dev acc 0.8994, dev avg loss 0.372827, throughput 2.13338K wps
[Epoch 133 Batch 30/172] avg loss 0.00114451, throughput 2.16922K wps
[Epoch 133 Batch 60/172] avg loss 0.000985704, throughput 2.117K wps
[Epoch 133 Batch 90/172] avg loss 0.000968998, throughput 2.09851K wps
[Epoch 133 Batch 120/172] avg loss 0.00105681, throughput 2.1331K wps
[Epoch 133 Batch 150/172] avg loss 0.000974645, throughput 2.1199K wps
Begin Testing...
[Epoch 133] train avg loss 0.0010396, dev acc 0.9015, dev avg loss 0.372473, throughput 2.12866K wps
[Epoch 134 Batch 30/172] avg loss 0.00116487, throughput 2.1663K wps
[Epoch 134 Batch 60/172] avg loss 0.000952117, throughput 2.14763K wps
[Epoch 134 Batch 90/172] avg loss 0.00102388, throughput 2.12061K wps
[Epoch 134 Batch 120/172] avg loss 0.000980427, throughput 2.12735K wps
[Epoch 134 Batch 150/172] avg loss 0.00112066, throughput 2.11877K wps
Begin Testing...
[Epoch 134] train avg loss 0.00102517, dev acc 0.8941, dev avg loss 0.374801, throughput 2.13202K wps
[Epoch 135 Batch 30/172] avg loss 0.000880687, throughput 2.17396K wps
[Epoch 135 Batch 60/172] avg loss 0.000987157, throughput 2.11558K wps
[Epoch 135 Batch 90/172] avg loss 0.00097723, throughput 2.1177K wps
[Epoch 135 Batch 120/172] avg loss 0.00100496, throughput 2.13609K wps
[Epoch 135 Batch 150/172] avg loss 0.00109321, throughput 2.10592K wps
Begin Testing...
[Epoch 135] train avg loss 0.00100774, dev acc 0.9015, dev avg loss 0.373369, throughput 2.12482K wps
[Epoch 136 Batch 30/172] avg loss 0.000945278, throughput 2.14667K wps
[Epoch 136 Batch 60/172] avg loss 0.000939777, throughput 2.09801K wps
[Epoch 136 Batch 90/172] avg loss 0.00103271, throughput 2.11749K wps
[Epoch 136 Batch 120/172] avg loss 0.0011317, throughput 2.11903K wps
[Epoch 136 Batch 150/172] avg loss 0.000889075, throughput 2.13317K wps
Begin Testing...
[Epoch 136] train avg loss 0.00102885, dev acc 0.9015, dev avg loss 0.373335, throughput 2.11877K wps
[Epoch 137 Batch 30/172] avg loss 0.00095308, throughput 2.17662K wps
[Epoch 137 Batch 60/172] avg loss 0.000979755, throughput 2.10962K wps
[Epoch 137 Batch 90/172] avg loss 0.000986225, throughput 2.10955K wps
[Epoch 137 Batch 120/172] avg loss 0.0010137, throughput 2.08015K wps
[Epoch 137 Batch 150/172] avg loss 0.000889351, throughput 2.13803K wps
Begin Testing...
[Epoch 137] train avg loss 0.000984837, dev acc 0.9015, dev avg loss 0.376134, throughput 2.12483K wps
[Epoch 138 Batch 30/172] avg loss 0.000821062, throughput 2.15417K wps
[Epoch 138 Batch 60/172] avg loss 0.000963495, throughput 2.14477K wps
[Epoch 138 Batch 90/172] avg loss 0.00103866, throughput 2.14015K wps
[Epoch 138 Batch 120/172] avg loss 0.00121677, throughput 2.13967K wps
[Epoch 138 Batch 150/172] avg loss 0.00100369, throughput 2.09115K wps
Begin Testing...
[Epoch 138] train avg loss 0.000999142, dev acc 0.9004, dev avg loss 0.374803, throughput 2.13178K wps
[Epoch 139 Batch 30/172] avg loss 0.000937017, throughput 2.16232K wps
[Epoch 139 Batch 60/172] avg loss 0.000877921, throughput 2.09599K wps
[Epoch 139 Batch 90/172] avg loss 0.00103811, throughput 2.10383K wps
[Epoch 139 Batch 120/172] avg loss 0.00115301, throughput 2.12432K wps
[Epoch 139 Batch 150/172] avg loss 0.000932771, throughput 2.10002K wps
Begin Testing...
[Epoch 139] train avg loss 0.00099296, dev acc 0.9025, dev avg loss 0.375764, throughput 2.11432K wps
[Epoch 140 Batch 30/172] avg loss 0.000902931, throughput 2.16527K wps
[Epoch 140 Batch 60/172] avg loss 0.00100231, throughput 2.12062K wps
[Epoch 140 Batch 90/172] avg loss 0.000836076, throughput 2.10107K wps
[Epoch 140 Batch 120/172] avg loss 0.00109541, throughput 2.10849K wps
[Epoch 140 Batch 150/172] avg loss 0.000885246, throughput 2.10156K wps
Begin Testing...
[Epoch 140] train avg loss 0.000943145, dev acc 0.9004, dev avg loss 0.382495, throughput 2.11821K wps
[Epoch 141 Batch 30/172] avg loss 0.000801713, throughput 2.15589K wps
[Epoch 141 Batch 60/172] avg loss 0.000947752, throughput 2.10886K wps
[Epoch 141 Batch 90/172] avg loss 0.00103187, throughput 2.10207K wps
[Epoch 141 Batch 120/172] avg loss 0.00107656, throughput 2.11187K wps
[Epoch 141 Batch 150/172] avg loss 0.00086866, throughput 2.10603K wps
Begin Testing...
[Epoch 141] train avg loss 0.000950854, dev acc 0.9015, dev avg loss 0.38122, throughput 2.11292K wps
[Epoch 142 Batch 30/172] avg loss 0.000724547, throughput 2.17584K wps
[Epoch 142 Batch 60/172] avg loss 0.00105907, throughput 2.13327K wps
[Epoch 142 Batch 90/172] avg loss 0.00104115, throughput 2.14266K wps
[Epoch 142 Batch 120/172] avg loss 0.000842946, throughput 2.13487K wps
[Epoch 142 Batch 150/172] avg loss 0.00102288, throughput 2.12285K wps
Begin Testing...
[Epoch 142] train avg loss 0.000961826, dev acc 0.9025, dev avg loss 0.377233, throughput 2.13803K wps
[Epoch 143 Batch 30/172] avg loss 0.00087402, throughput 2.1831K wps
[Epoch 143 Batch 60/172] avg loss 0.000998017, throughput 2.13677K wps
[Epoch 143 Batch 90/172] avg loss 0.000860978, throughput 2.11008K wps
[Epoch 143 Batch 120/172] avg loss 0.000874291, throughput 2.13239K wps
[Epoch 143 Batch 150/172] avg loss 0.00106197, throughput 2.12311K wps
Begin Testing...
[Epoch 143] train avg loss 0.00095535, dev acc 0.9004, dev avg loss 0.37905, throughput 2.13146K wps
[Epoch 144 Batch 30/172] avg loss 0.00100251, throughput 2.14328K wps
[Epoch 144 Batch 60/172] avg loss 0.000941845, throughput 2.13146K wps
[Epoch 144 Batch 90/172] avg loss 0.000903054, throughput 2.13807K wps
[Epoch 144 Batch 120/172] avg loss 0.00106039, throughput 2.11712K wps