Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 120
Done! Tokenizing Time=0.23s, #Sentences=10000
SentimentNet(
(embedding): Embedding(21326 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/162] avg loss 0.0139879, throughput 0.519377K wps
[Epoch 0 Batch 60/162] avg loss 0.0138099, throughput 3.98166K wps
[Epoch 0 Batch 90/162] avg loss 0.0137182, throughput 3.97018K wps
[Epoch 0 Batch 120/162] avg loss 0.0135766, throughput 3.9753K wps
[Epoch 0 Batch 150/162] avg loss 0.0135231, throughput 3.97493K wps
Begin Testing...
[Epoch 0] train avg loss 0.0137009, dev acc 0.7822, dev avg loss 0.659232, throughput 1.78095K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.013246, throughput 4.07917K wps
[Epoch 1 Batch 60/162] avg loss 0.0130588, throughput 3.97273K wps
[Epoch 1 Batch 90/162] avg loss 0.0130003, throughput 3.97465K wps
[Epoch 1 Batch 120/162] avg loss 0.0128364, throughput 3.9825K wps
[Epoch 1 Batch 150/162] avg loss 0.0128245, throughput 3.96699K wps
Begin Testing...
[Epoch 1] train avg loss 0.0129484, dev acc 0.6811, dev avg loss 0.63181, throughput 3.99281K wps
[Epoch 2 Batch 30/162] avg loss 0.0126069, throughput 4.08349K wps
[Epoch 2 Batch 60/162] avg loss 0.0123297, throughput 3.96845K wps
[Epoch 2 Batch 90/162] avg loss 0.0121475, throughput 3.97263K wps
[Epoch 2 Batch 120/162] avg loss 0.0119399, throughput 3.97957K wps
[Epoch 2 Batch 150/162] avg loss 0.0118899, throughput 3.98129K wps
Begin Testing...
[Epoch 2] train avg loss 0.0121597, dev acc 0.8522, dev avg loss 0.582953, throughput 3.99515K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0115598, throughput 4.07451K wps
[Epoch 3 Batch 60/162] avg loss 0.0115004, throughput 3.98132K wps
[Epoch 3 Batch 90/162] avg loss 0.0112389, throughput 3.95348K wps
[Epoch 3 Batch 120/162] avg loss 0.011155, throughput 3.97367K wps
[Epoch 3 Batch 150/162] avg loss 0.0108201, throughput 3.97955K wps
Begin Testing...
[Epoch 3] train avg loss 0.0112085, dev acc 0.8511, dev avg loss 0.535335, throughput 3.98979K wps
[Epoch 4 Batch 30/162] avg loss 0.0105195, throughput 4.0632K wps
[Epoch 4 Batch 60/162] avg loss 0.0102728, throughput 3.9772K wps
[Epoch 4 Batch 90/162] avg loss 0.0102123, throughput 3.97699K wps
[Epoch 4 Batch 120/162] avg loss 0.0101576, throughput 3.97444K wps
[Epoch 4 Batch 150/162] avg loss 0.00997446, throughput 3.97722K wps
Begin Testing...
[Epoch 4] train avg loss 0.0101669, dev acc 0.8600, dev avg loss 0.483972, throughput 3.99186K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.00953274, throughput 4.06544K wps
[Epoch 5 Batch 60/162] avg loss 0.0094537, throughput 3.97895K wps
[Epoch 5 Batch 90/162] avg loss 0.00902704, throughput 3.97548K wps
[Epoch 5 Batch 120/162] avg loss 0.008906, throughput 3.96953K wps
[Epoch 5 Batch 150/162] avg loss 0.00909941, throughput 3.97183K wps
Begin Testing...
[Epoch 5] train avg loss 0.00918455, dev acc 0.8589, dev avg loss 0.4403, throughput 3.99057K wps
[Epoch 6 Batch 30/162] avg loss 0.00873539, throughput 4.08246K wps
[Epoch 6 Batch 60/162] avg loss 0.0086313, throughput 3.97768K wps
[Epoch 6 Batch 90/162] avg loss 0.00816172, throughput 3.97862K wps
[Epoch 6 Batch 120/162] avg loss 0.00819463, throughput 3.97422K wps
[Epoch 6 Batch 150/162] avg loss 0.00832301, throughput 3.97177K wps
Begin Testing...
[Epoch 6] train avg loss 0.0083557, dev acc 0.8611, dev avg loss 0.407235, throughput 3.99453K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00814709, throughput 4.07097K wps
[Epoch 7 Batch 60/162] avg loss 0.00780234, throughput 3.96587K wps
[Epoch 7 Batch 90/162] avg loss 0.00759229, throughput 3.97204K wps
[Epoch 7 Batch 120/162] avg loss 0.00737723, throughput 3.96509K wps
[Epoch 7 Batch 150/162] avg loss 0.00771399, throughput 3.9764K wps
Begin Testing...
[Epoch 7] train avg loss 0.00773691, dev acc 0.8644, dev avg loss 0.381616, throughput 3.98957K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00723595, throughput 4.06998K wps
[Epoch 8 Batch 60/162] avg loss 0.00725869, throughput 3.96593K wps
[Epoch 8 Batch 90/162] avg loss 0.0071133, throughput 3.95597K wps
[Epoch 8 Batch 120/162] avg loss 0.00735365, throughput 3.97422K wps
[Epoch 8 Batch 150/162] avg loss 0.00734535, throughput 3.97734K wps
Begin Testing...
[Epoch 8] train avg loss 0.00720592, dev acc 0.8689, dev avg loss 0.361835, throughput 3.98619K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00691038, throughput 4.0661K wps
[Epoch 9 Batch 60/162] avg loss 0.00692505, throughput 3.96243K wps
[Epoch 9 Batch 90/162] avg loss 0.00699079, throughput 3.97226K wps
[Epoch 9 Batch 120/162] avg loss 0.00674475, throughput 3.96973K wps
[Epoch 9 Batch 150/162] avg loss 0.00653471, throughput 3.96093K wps
Begin Testing...
[Epoch 9] train avg loss 0.00679987, dev acc 0.8722, dev avg loss 0.345652, throughput 3.98464K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00632612, throughput 4.05365K wps
[Epoch 10 Batch 60/162] avg loss 0.00638748, throughput 3.97582K wps
[Epoch 10 Batch 90/162] avg loss 0.00660218, throughput 3.97474K wps
[Epoch 10 Batch 120/162] avg loss 0.00663467, throughput 3.96499K wps
[Epoch 10 Batch 150/162] avg loss 0.00626559, throughput 3.96569K wps
Begin Testing...
[Epoch 10] train avg loss 0.00641997, dev acc 0.8744, dev avg loss 0.333207, throughput 3.98609K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00607749, throughput 4.07062K wps
[Epoch 11 Batch 60/162] avg loss 0.00618429, throughput 3.97356K wps
[Epoch 11 Batch 90/162] avg loss 0.00613927, throughput 3.97682K wps
[Epoch 11 Batch 120/162] avg loss 0.00623978, throughput 3.97735K wps
[Epoch 11 Batch 150/162] avg loss 0.00597617, throughput 3.96796K wps
Begin Testing...
[Epoch 11] train avg loss 0.00612582, dev acc 0.8822, dev avg loss 0.321675, throughput 3.9918K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00604226, throughput 4.06639K wps
[Epoch 12 Batch 60/162] avg loss 0.0057043, throughput 3.97535K wps
[Epoch 12 Batch 90/162] avg loss 0.00544116, throughput 3.97226K wps
[Epoch 12 Batch 120/162] avg loss 0.00540986, throughput 3.97848K wps
[Epoch 12 Batch 150/162] avg loss 0.00599234, throughput 3.97725K wps
Begin Testing...
[Epoch 12] train avg loss 0.00571011, dev acc 0.8789, dev avg loss 0.310014, throughput 3.99123K wps
[Epoch 13 Batch 30/162] avg loss 0.00576601, throughput 4.06224K wps
[Epoch 13 Batch 60/162] avg loss 0.00578896, throughput 3.96122K wps
[Epoch 13 Batch 90/162] avg loss 0.00548611, throughput 3.96217K wps
[Epoch 13 Batch 120/162] avg loss 0.0058088, throughput 3.96749K wps
[Epoch 13 Batch 150/162] avg loss 0.00523159, throughput 3.96311K wps
Begin Testing...
[Epoch 13] train avg loss 0.00563596, dev acc 0.8833, dev avg loss 0.301952, throughput 3.98163K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00538563, throughput 4.07145K wps
[Epoch 14 Batch 60/162] avg loss 0.00544673, throughput 3.97292K wps
[Epoch 14 Batch 90/162] avg loss 0.00562219, throughput 3.96589K wps
[Epoch 14 Batch 120/162] avg loss 0.00515494, throughput 3.95976K wps
[Epoch 14 Batch 150/162] avg loss 0.00522139, throughput 3.97044K wps
Begin Testing...
[Epoch 14] train avg loss 0.00536074, dev acc 0.8878, dev avg loss 0.294112, throughput 3.98688K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00531505, throughput 4.05664K wps
[Epoch 15 Batch 60/162] avg loss 0.00506085, throughput 3.97691K wps
[Epoch 15 Batch 90/162] avg loss 0.00529926, throughput 3.96944K wps
[Epoch 15 Batch 120/162] avg loss 0.00505175, throughput 3.95438K wps
[Epoch 15 Batch 150/162] avg loss 0.00515411, throughput 3.9766K wps
Begin Testing...
[Epoch 15] train avg loss 0.00517113, dev acc 0.8933, dev avg loss 0.287163, throughput 3.98606K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00483317, throughput 4.06833K wps
[Epoch 16 Batch 60/162] avg loss 0.00491167, throughput 3.97502K wps
[Epoch 16 Batch 90/162] avg loss 0.00529088, throughput 3.97305K wps
[Epoch 16 Batch 120/162] avg loss 0.00508647, throughput 3.97361K wps
[Epoch 16 Batch 150/162] avg loss 0.00507969, throughput 3.97152K wps
Begin Testing...
[Epoch 16] train avg loss 0.00501231, dev acc 0.8956, dev avg loss 0.280996, throughput 3.99098K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00469394, throughput 4.0663K wps
[Epoch 17 Batch 60/162] avg loss 0.0049662, throughput 3.96562K wps
[Epoch 17 Batch 90/162] avg loss 0.0052955, throughput 3.96688K wps
[Epoch 17 Batch 120/162] avg loss 0.00449594, throughput 3.97068K wps
[Epoch 17 Batch 150/162] avg loss 0.00500828, throughput 3.97062K wps
Begin Testing...
[Epoch 17] train avg loss 0.00486945, dev acc 0.8911, dev avg loss 0.276754, throughput 3.98541K wps
[Epoch 18 Batch 30/162] avg loss 0.00454502, throughput 4.0688K wps
[Epoch 18 Batch 60/162] avg loss 0.00481818, throughput 3.97564K wps
[Epoch 18 Batch 90/162] avg loss 0.00467161, throughput 3.96292K wps
[Epoch 18 Batch 120/162] avg loss 0.00498316, throughput 3.96468K wps
[Epoch 18 Batch 150/162] avg loss 0.00431144, throughput 3.96273K wps
Begin Testing...
[Epoch 18] train avg loss 0.00467807, dev acc 0.8989, dev avg loss 0.271471, throughput 3.98579K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00461014, throughput 4.0667K wps
[Epoch 19 Batch 60/162] avg loss 0.00434638, throughput 3.97061K wps
[Epoch 19 Batch 90/162] avg loss 0.00444797, throughput 3.96479K wps
[Epoch 19 Batch 120/162] avg loss 0.00466971, throughput 3.97365K wps
[Epoch 19 Batch 150/162] avg loss 0.00456509, throughput 3.97485K wps
Begin Testing...
[Epoch 19] train avg loss 0.00451968, dev acc 0.8967, dev avg loss 0.267733, throughput 3.98828K wps
[Epoch 20 Batch 30/162] avg loss 0.00419915, throughput 4.05551K wps
[Epoch 20 Batch 60/162] avg loss 0.00425549, throughput 3.97766K wps
[Epoch 20 Batch 90/162] avg loss 0.00435887, throughput 3.97758K wps
[Epoch 20 Batch 120/162] avg loss 0.00466149, throughput 3.96702K wps
[Epoch 20 Batch 150/162] avg loss 0.00474927, throughput 3.96483K wps
Begin Testing...
[Epoch 20] train avg loss 0.00440688, dev acc 0.9022, dev avg loss 0.263603, throughput 3.9862K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00444114, throughput 4.06101K wps
[Epoch 21 Batch 60/162] avg loss 0.004244, throughput 3.9787K wps
[Epoch 21 Batch 90/162] avg loss 0.00412739, throughput 3.97281K wps
[Epoch 21 Batch 120/162] avg loss 0.00407548, throughput 3.96468K wps
[Epoch 21 Batch 150/162] avg loss 0.00403231, throughput 3.96315K wps
Begin Testing...
[Epoch 21] train avg loss 0.00418798, dev acc 0.9033, dev avg loss 0.260274, throughput 3.98609K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00416164, throughput 4.07697K wps
[Epoch 22 Batch 60/162] avg loss 0.0041261, throughput 3.96636K wps
[Epoch 22 Batch 90/162] avg loss 0.00397903, throughput 3.96902K wps
[Epoch 22 Batch 120/162] avg loss 0.0039476, throughput 3.97033K wps
[Epoch 22 Batch 150/162] avg loss 0.00424281, throughput 3.97317K wps
Begin Testing...
[Epoch 22] train avg loss 0.00406673, dev acc 0.9022, dev avg loss 0.25698, throughput 3.98936K wps
[Epoch 23 Batch 30/162] avg loss 0.004018, throughput 4.07176K wps
[Epoch 23 Batch 60/162] avg loss 0.00396769, throughput 3.97592K wps
[Epoch 23 Batch 90/162] avg loss 0.003845, throughput 3.959K wps
[Epoch 23 Batch 120/162] avg loss 0.00407677, throughput 3.97068K wps
[Epoch 23 Batch 150/162] avg loss 0.00395317, throughput 3.96953K wps
Begin Testing...
[Epoch 23] train avg loss 0.0039596, dev acc 0.9078, dev avg loss 0.254368, throughput 3.98668K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/162] avg loss 0.00394869, throughput 4.05609K wps
[Epoch 24 Batch 60/162] avg loss 0.0037732, throughput 3.97349K wps
[Epoch 24 Batch 90/162] avg loss 0.00348149, throughput 3.97099K wps
[Epoch 24 Batch 120/162] avg loss 0.00396513, throughput 3.96504K wps
[Epoch 24 Batch 150/162] avg loss 0.00376483, throughput 3.96816K wps
Begin Testing...
[Epoch 24] train avg loss 0.00379852, dev acc 0.9078, dev avg loss 0.251106, throughput 3.98482K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00365086, throughput 4.04945K wps
[Epoch 25 Batch 60/162] avg loss 0.00402872, throughput 3.97484K wps
[Epoch 25 Batch 90/162] avg loss 0.00356266, throughput 3.96808K wps
[Epoch 25 Batch 120/162] avg loss 0.0039514, throughput 3.96222K wps
[Epoch 25 Batch 150/162] avg loss 0.00348811, throughput 3.96373K wps
Begin Testing...
[Epoch 25] train avg loss 0.00373991, dev acc 0.9067, dev avg loss 0.24902, throughput 3.98187K wps
[Epoch 26 Batch 30/162] avg loss 0.00376546, throughput 4.07032K wps
[Epoch 26 Batch 60/162] avg loss 0.00329796, throughput 3.96897K wps
[Epoch 26 Batch 90/162] avg loss 0.00349612, throughput 3.97339K wps
[Epoch 26 Batch 120/162] avg loss 0.00397364, throughput 3.97274K wps
[Epoch 26 Batch 150/162] avg loss 0.00356078, throughput 3.95904K wps
Begin Testing...
[Epoch 26] train avg loss 0.00362833, dev acc 0.9078, dev avg loss 0.246747, throughput 3.98766K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.0034729, throughput 4.05923K wps
[Epoch 27 Batch 60/162] avg loss 0.00341182, throughput 3.96304K wps
[Epoch 27 Batch 90/162] avg loss 0.00354008, throughput 3.9719K wps
[Epoch 27 Batch 120/162] avg loss 0.00350076, throughput 3.97872K wps
[Epoch 27 Batch 150/162] avg loss 0.00373748, throughput 3.97303K wps
Begin Testing...
[Epoch 27] train avg loss 0.0035444, dev acc 0.9044, dev avg loss 0.245151, throughput 3.98763K wps
[Epoch 28 Batch 30/162] avg loss 0.00313142, throughput 4.05932K wps
[Epoch 28 Batch 60/162] avg loss 0.0035752, throughput 3.97592K wps
[Epoch 28 Batch 90/162] avg loss 0.00322457, throughput 3.96441K wps
[Epoch 28 Batch 120/162] avg loss 0.00362403, throughput 3.95803K wps
[Epoch 28 Batch 150/162] avg loss 0.00344127, throughput 3.96315K wps
Begin Testing...
[Epoch 28] train avg loss 0.00342346, dev acc 0.9078, dev avg loss 0.243253, throughput 3.98138K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00355049, throughput 4.03085K wps
[Epoch 29 Batch 60/162] avg loss 0.00332167, throughput 3.97011K wps
[Epoch 29 Batch 90/162] avg loss 0.00305709, throughput 3.96528K wps
[Epoch 29 Batch 120/162] avg loss 0.00347103, throughput 3.97124K wps
[Epoch 29 Batch 150/162] avg loss 0.00298818, throughput 3.97549K wps
Begin Testing...
[Epoch 29] train avg loss 0.00325414, dev acc 0.9056, dev avg loss 0.24182, throughput 3.98226K wps
[Epoch 30 Batch 30/162] avg loss 0.00343033, throughput 4.05068K wps
[Epoch 30 Batch 60/162] avg loss 0.00295845, throughput 3.97017K wps
[Epoch 30 Batch 90/162] avg loss 0.00310296, throughput 3.97264K wps
[Epoch 30 Batch 120/162] avg loss 0.0033611, throughput 3.96622K wps
[Epoch 30 Batch 150/162] avg loss 0.00307984, throughput 3.96394K wps
Begin Testing...
[Epoch 30] train avg loss 0.00320582, dev acc 0.9056, dev avg loss 0.240329, throughput 3.98303K wps
[Epoch 31 Batch 30/162] avg loss 0.00310734, throughput 4.06854K wps
[Epoch 31 Batch 60/162] avg loss 0.00326421, throughput 3.96897K wps
[Epoch 31 Batch 90/162] avg loss 0.00296236, throughput 3.97051K wps
[Epoch 31 Batch 120/162] avg loss 0.00322509, throughput 3.97248K wps
[Epoch 31 Batch 150/162] avg loss 0.00307624, throughput 3.96477K wps
Begin Testing...
[Epoch 31] train avg loss 0.00312734, dev acc 0.9089, dev avg loss 0.239629, throughput 3.98686K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00295941, throughput 4.05412K wps
[Epoch 32 Batch 60/162] avg loss 0.00313761, throughput 3.94542K wps
[Epoch 32 Batch 90/162] avg loss 0.00312186, throughput 3.96777K wps
[Epoch 32 Batch 120/162] avg loss 0.00307972, throughput 3.9673K wps
[Epoch 32 Batch 150/162] avg loss 0.00278379, throughput 3.95752K wps
Begin Testing...
[Epoch 32] train avg loss 0.00301501, dev acc 0.9089, dev avg loss 0.238619, throughput 3.97642K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/162] avg loss 0.00294193, throughput 4.06138K wps
[Epoch 33 Batch 60/162] avg loss 0.00278707, throughput 3.96599K wps
[Epoch 33 Batch 90/162] avg loss 0.00329691, throughput 3.95758K wps
[Epoch 33 Batch 120/162] avg loss 0.00304052, throughput 3.97074K wps
[Epoch 33 Batch 150/162] avg loss 0.00300477, throughput 3.9605K wps
Begin Testing...
[Epoch 33] train avg loss 0.00297042, dev acc 0.9067, dev avg loss 0.239021, throughput 3.98126K wps
[Epoch 34 Batch 30/162] avg loss 0.0027751, throughput 4.06293K wps
[Epoch 34 Batch 60/162] avg loss 0.00278909, throughput 3.96742K wps
[Epoch 34 Batch 90/162] avg loss 0.00288921, throughput 3.97119K wps
[Epoch 34 Batch 120/162] avg loss 0.0025414, throughput 3.97344K wps
[Epoch 34 Batch 150/162] avg loss 0.00301764, throughput 3.9683K wps
Begin Testing...
[Epoch 34] train avg loss 0.00282594, dev acc 0.9089, dev avg loss 0.235885, throughput 3.98644K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00286913, throughput 4.04128K wps
[Epoch 35 Batch 60/162] avg loss 0.00265847, throughput 3.96678K wps
[Epoch 35 Batch 90/162] avg loss 0.00272095, throughput 3.96336K wps
[Epoch 35 Batch 120/162] avg loss 0.00286942, throughput 3.95778K wps
[Epoch 35 Batch 150/162] avg loss 0.00245785, throughput 3.9669K wps
Begin Testing...
[Epoch 35] train avg loss 0.00272263, dev acc 0.9089, dev avg loss 0.233997, throughput 3.97749K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/162] avg loss 0.00265271, throughput 4.05707K wps
[Epoch 36 Batch 60/162] avg loss 0.0026822, throughput 3.97404K wps
[Epoch 36 Batch 90/162] avg loss 0.00274611, throughput 3.97228K wps
[Epoch 36 Batch 120/162] avg loss 0.00239452, throughput 3.95624K wps
[Epoch 36 Batch 150/162] avg loss 0.00293286, throughput 3.96531K wps
Begin Testing...
[Epoch 36] train avg loss 0.00268521, dev acc 0.9100, dev avg loss 0.235208, throughput 3.98159K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00234825, throughput 4.05849K wps
[Epoch 37 Batch 60/162] avg loss 0.00270947, throughput 3.96049K wps
[Epoch 37 Batch 90/162] avg loss 0.00249041, throughput 3.97037K wps
[Epoch 37 Batch 120/162] avg loss 0.00296365, throughput 3.9691K wps
[Epoch 37 Batch 150/162] avg loss 0.00280566, throughput 3.96061K wps
Begin Testing...
[Epoch 37] train avg loss 0.00265537, dev acc 0.9100, dev avg loss 0.233215, throughput 3.98305K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/162] avg loss 0.00233653, throughput 4.05956K wps
[Epoch 38 Batch 60/162] avg loss 0.0026218, throughput 3.95385K wps
[Epoch 38 Batch 90/162] avg loss 0.00247674, throughput 3.96743K wps
[Epoch 38 Batch 120/162] avg loss 0.00238393, throughput 3.96764K wps
[Epoch 38 Batch 150/162] avg loss 0.00277486, throughput 3.96155K wps
Begin Testing...
[Epoch 38] train avg loss 0.00254695, dev acc 0.9100, dev avg loss 0.232775, throughput 3.97939K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00247948, throughput 4.05411K wps
[Epoch 39 Batch 60/162] avg loss 0.0025299, throughput 3.9676K wps
[Epoch 39 Batch 90/162] avg loss 0.00232298, throughput 3.97119K wps
[Epoch 39 Batch 120/162] avg loss 0.00250433, throughput 3.96755K wps
[Epoch 39 Batch 150/162] avg loss 0.00245705, throughput 3.96836K wps
Begin Testing...
[Epoch 39] train avg loss 0.00248834, dev acc 0.9111, dev avg loss 0.232121, throughput 3.98358K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/162] avg loss 0.00245987, throughput 4.06321K wps
[Epoch 40 Batch 60/162] avg loss 0.00243721, throughput 3.96429K wps
[Epoch 40 Batch 90/162] avg loss 0.00247857, throughput 3.95166K wps
[Epoch 40 Batch 120/162] avg loss 0.00226678, throughput 3.96606K wps
[Epoch 40 Batch 150/162] avg loss 0.00235358, throughput 3.97349K wps
Begin Testing...
[Epoch 40] train avg loss 0.00241301, dev acc 0.9100, dev avg loss 0.231343, throughput 3.98256K wps
[Epoch 41 Batch 30/162] avg loss 0.00246, throughput 4.0611K wps
[Epoch 41 Batch 60/162] avg loss 0.00213952, throughput 3.97146K wps
[Epoch 41 Batch 90/162] avg loss 0.00219419, throughput 3.96541K wps
[Epoch 41 Batch 120/162] avg loss 0.00247413, throughput 3.95876K wps
[Epoch 41 Batch 150/162] avg loss 0.00229071, throughput 3.96203K wps
Begin Testing...
[Epoch 41] train avg loss 0.00230453, dev acc 0.9111, dev avg loss 0.23041, throughput 3.98071K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/162] avg loss 0.00222935, throughput 4.03426K wps
[Epoch 42 Batch 60/162] avg loss 0.0022896, throughput 3.96682K wps
[Epoch 42 Batch 90/162] avg loss 0.00207808, throughput 3.96679K wps
[Epoch 42 Batch 120/162] avg loss 0.00221819, throughput 3.96754K wps
[Epoch 42 Batch 150/162] avg loss 0.00257954, throughput 3.96747K wps
Begin Testing...
[Epoch 42] train avg loss 0.00229665, dev acc 0.9111, dev avg loss 0.230339, throughput 3.97833K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/162] avg loss 0.0023038, throughput 4.06107K wps
[Epoch 43 Batch 60/162] avg loss 0.00208862, throughput 3.95438K wps
[Epoch 43 Batch 90/162] avg loss 0.00215563, throughput 3.95954K wps
[Epoch 43 Batch 120/162] avg loss 0.00228136, throughput 3.95858K wps
[Epoch 43 Batch 150/162] avg loss 0.00215245, throughput 3.9628K wps
Begin Testing...
[Epoch 43] train avg loss 0.00220212, dev acc 0.9122, dev avg loss 0.229938, throughput 3.97838K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/162] avg loss 0.00224699, throughput 4.05697K wps
[Epoch 44 Batch 60/162] avg loss 0.00216238, throughput 3.96498K wps
[Epoch 44 Batch 90/162] avg loss 0.00220064, throughput 3.96K wps
[Epoch 44 Batch 120/162] avg loss 0.00193978, throughput 3.95211K wps
[Epoch 44 Batch 150/162] avg loss 0.00234478, throughput 3.96689K wps
Begin Testing...
[Epoch 44] train avg loss 0.00215186, dev acc 0.9122, dev avg loss 0.229613, throughput 3.97687K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/162] avg loss 0.0021087, throughput 4.06381K wps
[Epoch 45 Batch 60/162] avg loss 0.00221062, throughput 3.96338K wps
[Epoch 45 Batch 90/162] avg loss 0.00218265, throughput 3.94146K wps
[Epoch 45 Batch 120/162] avg loss 0.00227687, throughput 3.96546K wps
[Epoch 45 Batch 150/162] avg loss 0.00192279, throughput 3.96271K wps
Begin Testing...
[Epoch 45] train avg loss 0.00213888, dev acc 0.9133, dev avg loss 0.228947, throughput 3.97798K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00197484, throughput 4.05469K wps
[Epoch 46 Batch 60/162] avg loss 0.00191701, throughput 3.95883K wps
[Epoch 46 Batch 90/162] avg loss 0.0019993, throughput 3.95296K wps
[Epoch 46 Batch 120/162] avg loss 0.00213129, throughput 3.95475K wps
[Epoch 46 Batch 150/162] avg loss 0.00210426, throughput 3.96714K wps
Begin Testing...
[Epoch 46] train avg loss 0.00200961, dev acc 0.9133, dev avg loss 0.230792, throughput 3.97655K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/162] avg loss 0.00191421, throughput 4.0578K wps
[Epoch 47 Batch 60/162] avg loss 0.00201675, throughput 3.97168K wps
[Epoch 47 Batch 90/162] avg loss 0.00186469, throughput 3.97459K wps
[Epoch 47 Batch 120/162] avg loss 0.00190284, throughput 3.96984K wps
[Epoch 47 Batch 150/162] avg loss 0.00191976, throughput 3.97016K wps
Begin Testing...
[Epoch 47] train avg loss 0.00191655, dev acc 0.9144, dev avg loss 0.231193, throughput 3.98733K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00192513, throughput 4.05118K wps
[Epoch 48 Batch 60/162] avg loss 0.0019239, throughput 3.96004K wps
[Epoch 48 Batch 90/162] avg loss 0.00185332, throughput 3.96365K wps
[Epoch 48 Batch 120/162] avg loss 0.00208791, throughput 3.95308K wps
[Epoch 48 Batch 150/162] avg loss 0.00201858, throughput 3.95276K wps
Begin Testing...
[Epoch 48] train avg loss 0.00191417, dev acc 0.9144, dev avg loss 0.230332, throughput 3.97472K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00174436, throughput 4.06308K wps
[Epoch 49 Batch 60/162] avg loss 0.00193618, throughput 3.96823K wps
[Epoch 49 Batch 90/162] avg loss 0.00179038, throughput 3.96275K wps
[Epoch 49 Batch 120/162] avg loss 0.00169459, throughput 3.97251K wps
[Epoch 49 Batch 150/162] avg loss 0.00198473, throughput 3.95479K wps
Begin Testing...
[Epoch 49] train avg loss 0.0018365, dev acc 0.9111, dev avg loss 0.229241, throughput 3.97965K wps
[Epoch 50 Batch 30/162] avg loss 0.00174641, throughput 4.05937K wps
[Epoch 50 Batch 60/162] avg loss 0.00191506, throughput 3.96872K wps
[Epoch 50 Batch 90/162] avg loss 0.00176437, throughput 3.94152K wps
[Epoch 50 Batch 120/162] avg loss 0.00203453, throughput 3.96644K wps
[Epoch 50 Batch 150/162] avg loss 0.00161034, throughput 3.96736K wps
Begin Testing...
[Epoch 50] train avg loss 0.00180475, dev acc 0.9144, dev avg loss 0.22981, throughput 3.98019K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/162] avg loss 0.00163472, throughput 4.05821K wps
[Epoch 51 Batch 60/162] avg loss 0.00175232, throughput 3.96856K wps
[Epoch 51 Batch 90/162] avg loss 0.00173731, throughput 3.96309K wps
[Epoch 51 Batch 120/162] avg loss 0.00172742, throughput 3.96691K wps
[Epoch 51 Batch 150/162] avg loss 0.00168388, throughput 3.96394K wps
Begin Testing...
[Epoch 51] train avg loss 0.00172394, dev acc 0.9167, dev avg loss 0.230276, throughput 3.98272K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.00166937, throughput 4.0396K wps
[Epoch 52 Batch 60/162] avg loss 0.0016676, throughput 3.96812K wps
[Epoch 52 Batch 90/162] avg loss 0.00187429, throughput 3.97562K wps
[Epoch 52 Batch 120/162] avg loss 0.00164254, throughput 3.97381K wps
[Epoch 52 Batch 150/162] avg loss 0.00161472, throughput 3.96657K wps
Begin Testing...
[Epoch 52] train avg loss 0.00168859, dev acc 0.9167, dev avg loss 0.230893, throughput 3.98379K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00180918, throughput 4.07019K wps
[Epoch 53 Batch 60/162] avg loss 0.00179066, throughput 3.9611K wps
[Epoch 53 Batch 90/162] avg loss 0.00145422, throughput 3.96804K wps
[Epoch 53 Batch 120/162] avg loss 0.00168996, throughput 3.95209K wps
[Epoch 53 Batch 150/162] avg loss 0.00162315, throughput 3.95609K wps
Begin Testing...
[Epoch 53] train avg loss 0.00165935, dev acc 0.9167, dev avg loss 0.230886, throughput 3.97975K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00146386, throughput 4.05337K wps
[Epoch 54 Batch 60/162] avg loss 0.0016187, throughput 3.96467K wps
[Epoch 54 Batch 90/162] avg loss 0.00172345, throughput 3.95783K wps
[Epoch 54 Batch 120/162] avg loss 0.00149342, throughput 3.96249K wps
[Epoch 54 Batch 150/162] avg loss 0.0016881, throughput 3.945K wps
Begin Testing...
[Epoch 54] train avg loss 0.00159792, dev acc 0.9167, dev avg loss 0.230775, throughput 3.97569K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/162] avg loss 0.00159678, throughput 4.05082K wps
[Epoch 55 Batch 60/162] avg loss 0.00162838, throughput 3.96181K wps
[Epoch 55 Batch 90/162] avg loss 0.00159293, throughput 3.95864K wps
[Epoch 55 Batch 120/162] avg loss 0.00155473, throughput 3.9685K wps
[Epoch 55 Batch 150/162] avg loss 0.001436, throughput 3.96023K wps
Begin Testing...
[Epoch 55] train avg loss 0.00155758, dev acc 0.9133, dev avg loss 0.229861, throughput 3.97702K wps
[Epoch 56 Batch 30/162] avg loss 0.00149004, throughput 4.04371K wps
[Epoch 56 Batch 60/162] avg loss 0.00158831, throughput 3.95596K wps
[Epoch 56 Batch 90/162] avg loss 0.00149615, throughput 3.96178K wps
[Epoch 56 Batch 120/162] avg loss 0.00161684, throughput 3.9696K wps
[Epoch 56 Batch 150/162] avg loss 0.00132162, throughput 3.96558K wps
Begin Testing...
[Epoch 56] train avg loss 0.00150253, dev acc 0.9167, dev avg loss 0.230587, throughput 3.97755K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/162] avg loss 0.00152431, throughput 4.04799K wps
[Epoch 57 Batch 60/162] avg loss 0.0013935, throughput 3.97245K wps
[Epoch 57 Batch 90/162] avg loss 0.0014226, throughput 3.96638K wps
[Epoch 57 Batch 120/162] avg loss 0.0014398, throughput 3.96324K wps
[Epoch 57 Batch 150/162] avg loss 0.00135715, throughput 3.96601K wps
Begin Testing...
[Epoch 57] train avg loss 0.00144325, dev acc 0.9156, dev avg loss 0.231632, throughput 3.98156K wps
[Epoch 58 Batch 30/162] avg loss 0.00142341, throughput 4.05469K wps
[Epoch 58 Batch 60/162] avg loss 0.00150184, throughput 3.96741K wps
[Epoch 58 Batch 90/162] avg loss 0.00148447, throughput 3.96544K wps
[Epoch 58 Batch 120/162] avg loss 0.00158965, throughput 3.95919K wps
[Epoch 58 Batch 150/162] avg loss 0.00152541, throughput 3.96011K wps
Begin Testing...
[Epoch 58] train avg loss 0.00149393, dev acc 0.9167, dev avg loss 0.231202, throughput 3.98036K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/162] avg loss 0.00142499, throughput 4.05691K wps
[Epoch 59 Batch 60/162] avg loss 0.00132224, throughput 3.96366K wps
[Epoch 59 Batch 90/162] avg loss 0.00121261, throughput 3.96949K wps
[Epoch 59 Batch 120/162] avg loss 0.00148879, throughput 3.96739K wps
[Epoch 59 Batch 150/162] avg loss 0.00141519, throughput 3.96085K wps
Begin Testing...
[Epoch 59] train avg loss 0.00137667, dev acc 0.9144, dev avg loss 0.230846, throughput 3.98075K wps
[Epoch 60 Batch 30/162] avg loss 0.00140343, throughput 4.05298K wps
[Epoch 60 Batch 60/162] avg loss 0.00137273, throughput 3.95143K wps
[Epoch 60 Batch 90/162] avg loss 0.0012823, throughput 3.96329K wps
[Epoch 60 Batch 120/162] avg loss 0.00133991, throughput 3.96347K wps
[Epoch 60 Batch 150/162] avg loss 0.0013217, throughput 3.96399K wps
Begin Testing...
[Epoch 60] train avg loss 0.00132999, dev acc 0.9111, dev avg loss 0.231911, throughput 3.97775K wps
[Epoch 61 Batch 30/162] avg loss 0.00120299, throughput 4.05928K wps
[Epoch 61 Batch 60/162] avg loss 0.00135187, throughput 3.9726K wps
[Epoch 61 Batch 90/162] avg loss 0.00133581, throughput 3.96274K wps
[Epoch 61 Batch 120/162] avg loss 0.00141431, throughput 3.96192K wps
[Epoch 61 Batch 150/162] avg loss 0.00129439, throughput 3.96689K wps
Begin Testing...
[Epoch 61] train avg loss 0.00132136, dev acc 0.9133, dev avg loss 0.231247, throughput 3.98212K wps
[Epoch 62 Batch 30/162] avg loss 0.00129741, throughput 4.046K wps
[Epoch 62 Batch 60/162] avg loss 0.00128087, throughput 3.9645K wps
[Epoch 62 Batch 90/162] avg loss 0.0011835, throughput 3.96433K wps
[Epoch 62 Batch 120/162] avg loss 0.00125559, throughput 3.97184K wps
[Epoch 62 Batch 150/162] avg loss 0.00126705, throughput 3.96774K wps
Begin Testing...
[Epoch 62] train avg loss 0.00125963, dev acc 0.9133, dev avg loss 0.231152, throughput 3.98181K wps
[Epoch 63 Batch 30/162] avg loss 0.00122178, throughput 4.06829K wps
[Epoch 63 Batch 60/162] avg loss 0.00131648, throughput 3.96095K wps
[Epoch 63 Batch 90/162] avg loss 0.00132712, throughput 3.96531K wps
[Epoch 63 Batch 120/162] avg loss 0.00129757, throughput 3.96525K wps
[Epoch 63 Batch 150/162] avg loss 0.00116402, throughput 3.96874K wps
Begin Testing...
[Epoch 63] train avg loss 0.00125885, dev acc 0.9122, dev avg loss 0.232345, throughput 3.98424K wps
[Epoch 64 Batch 30/162] avg loss 0.00126984, throughput 4.06409K wps
[Epoch 64 Batch 60/162] avg loss 0.00126558, throughput 3.96497K wps
[Epoch 64 Batch 90/162] avg loss 0.00115282, throughput 3.97626K wps
[Epoch 64 Batch 120/162] avg loss 0.00125914, throughput 3.96849K wps
[Epoch 64 Batch 150/162] avg loss 0.00115599, throughput 3.96312K wps
Begin Testing...
[Epoch 64] train avg loss 0.00121812, dev acc 0.9122, dev avg loss 0.232852, throughput 3.98384K wps
[Epoch 65 Batch 30/162] avg loss 0.00100025, throughput 4.06119K wps
[Epoch 65 Batch 60/162] avg loss 0.00119303, throughput 3.95531K wps
[Epoch 65 Batch 90/162] avg loss 0.00119809, throughput 3.96396K wps
[Epoch 65 Batch 120/162] avg loss 0.00143245, throughput 3.96404K wps
[Epoch 65 Batch 150/162] avg loss 0.0010787, throughput 3.96805K wps
Begin Testing...
[Epoch 65] train avg loss 0.00118451, dev acc 0.9144, dev avg loss 0.233135, throughput 3.98138K wps
[Epoch 66 Batch 30/162] avg loss 0.00113927, throughput 4.06682K wps
[Epoch 66 Batch 60/162] avg loss 0.00116894, throughput 3.97083K wps
[Epoch 66 Batch 90/162] avg loss 0.00110436, throughput 3.97171K wps
[Epoch 66 Batch 120/162] avg loss 0.0012798, throughput 3.95587K wps
[Epoch 66 Batch 150/162] avg loss 0.00109501, throughput 3.96784K wps
Begin Testing...
[Epoch 66] train avg loss 0.00114976, dev acc 0.9122, dev avg loss 0.232169, throughput 3.98554K wps
[Epoch 67 Batch 30/162] avg loss 0.000969039, throughput 4.05312K wps
[Epoch 67 Batch 60/162] avg loss 0.00118791, throughput 3.95461K wps
[Epoch 67 Batch 90/162] avg loss 0.00113092, throughput 3.95838K wps
[Epoch 67 Batch 120/162] avg loss 0.000998633, throughput 3.9597K wps
[Epoch 67 Batch 150/162] avg loss 0.00108239, throughput 3.97458K wps
Begin Testing...
[Epoch 67] train avg loss 0.00107331, dev acc 0.9156, dev avg loss 0.234023, throughput 3.97806K wps
[Epoch 68 Batch 30/162] avg loss 0.00100761, throughput 4.04945K wps
[Epoch 68 Batch 60/162] avg loss 0.00102904, throughput 3.95819K wps
[Epoch 68 Batch 90/162] avg loss 0.00112826, throughput 3.96869K wps
[Epoch 68 Batch 120/162] avg loss 0.00100643, throughput 3.96606K wps
[Epoch 68 Batch 150/162] avg loss 0.00116067, throughput 3.95141K wps
Begin Testing...
[Epoch 68] train avg loss 0.00107482, dev acc 0.9100, dev avg loss 0.233785, throughput 3.97767K wps
[Epoch 69 Batch 30/162] avg loss 0.00107769, throughput 4.06762K wps
[Epoch 69 Batch 60/162] avg loss 0.00109189, throughput 3.96631K wps
[Epoch 69 Batch 90/162] avg loss 0.00106873, throughput 3.96978K wps
[Epoch 69 Batch 120/162] avg loss 0.000890604, throughput 3.9717K wps
[Epoch 69 Batch 150/162] avg loss 0.00114679, throughput 3.96689K wps
Begin Testing...
[Epoch 69] train avg loss 0.0010405, dev acc 0.9156, dev avg loss 0.235371, throughput 3.98275K wps
[Epoch 70 Batch 30/162] avg loss 0.00100659, throughput 4.05966K wps
[Epoch 70 Batch 60/162] avg loss 0.000940508, throughput 3.9739K wps
[Epoch 70 Batch 90/162] avg loss 0.00108778, throughput 3.96669K wps
[Epoch 70 Batch 120/162] avg loss 0.000870688, throughput 3.9631K wps
[Epoch 70 Batch 150/162] avg loss 0.00110208, throughput 3.97065K wps
Begin Testing...
[Epoch 70] train avg loss 0.000993991, dev acc 0.9100, dev avg loss 0.234981, throughput 3.98427K wps
[Epoch 71 Batch 30/162] avg loss 0.00096306, throughput 4.0639K wps
[Epoch 71 Batch 60/162] avg loss 0.000985338, throughput 3.96914K wps
[Epoch 71 Batch 90/162] avg loss 0.0010515, throughput 3.96837K wps
[Epoch 71 Batch 120/162] avg loss 0.00101298, throughput 3.953K wps
[Epoch 71 Batch 150/162] avg loss 0.000944053, throughput 3.9712K wps
Begin Testing...
[Epoch 71] train avg loss 0.000986142, dev acc 0.9144, dev avg loss 0.235697, throughput 3.98316K wps
[Epoch 72 Batch 30/162] avg loss 0.000933136, throughput 4.05506K wps
[Epoch 72 Batch 60/162] avg loss 0.000949904, throughput 3.96234K wps
[Epoch 72 Batch 90/162] avg loss 0.000952761, throughput 3.96781K wps
[Epoch 72 Batch 120/162] avg loss 0.000964171, throughput 3.97279K wps
[Epoch 72 Batch 150/162] avg loss 0.000933337, throughput 3.96046K wps
Begin Testing...
[Epoch 72] train avg loss 0.000951582, dev acc 0.9144, dev avg loss 0.236643, throughput 3.98168K wps
[Epoch 73 Batch 30/162] avg loss 0.000947182, throughput 4.07578K wps
[Epoch 73 Batch 60/162] avg loss 0.00103892, throughput 3.9548K wps
[Epoch 73 Batch 90/162] avg loss 0.000886194, throughput 3.97612K wps
[Epoch 73 Batch 120/162] avg loss 0.000975921, throughput 3.97182K wps
[Epoch 73 Batch 150/162] avg loss 0.000962613, throughput 3.96153K wps
Begin Testing...
[Epoch 73] train avg loss 0.000958151, dev acc 0.9144, dev avg loss 0.236343, throughput 3.98631K wps
[Epoch 74 Batch 30/162] avg loss 0.000863718, throughput 4.06239K wps
[Epoch 74 Batch 60/162] avg loss 0.000950569, throughput 3.9651K wps
[Epoch 74 Batch 90/162] avg loss 0.000914319, throughput 3.95579K wps
[Epoch 74 Batch 120/162] avg loss 0.00097873, throughput 3.97083K wps
[Epoch 74 Batch 150/162] avg loss 0.000859917, throughput 3.95957K wps
Begin Testing...
[Epoch 74] train avg loss 0.00092389, dev acc 0.9111, dev avg loss 0.237746, throughput 3.98143K wps
[Epoch 75 Batch 30/162] avg loss 0.000877121, throughput 4.05481K wps
[Epoch 75 Batch 60/162] avg loss 0.0009001, throughput 3.97067K wps
[Epoch 75 Batch 90/162] avg loss 0.000880976, throughput 3.96263K wps
[Epoch 75 Batch 120/162] avg loss 0.000861375, throughput 3.96077K wps
[Epoch 75 Batch 150/162] avg loss 0.000944932, throughput 3.96453K wps
Begin Testing...
[Epoch 75] train avg loss 0.000918381, dev acc 0.9111, dev avg loss 0.237592, throughput 3.9817K wps
[Epoch 76 Batch 30/162] avg loss 0.000849661, throughput 4.06402K wps
[Epoch 76 Batch 60/162] avg loss 0.000985202, throughput 3.95966K wps
[Epoch 76 Batch 90/162] avg loss 0.00087262, throughput 3.97153K wps
[Epoch 76 Batch 120/162] avg loss 0.00087544, throughput 3.96272K wps
[Epoch 76 Batch 150/162] avg loss 0.000822298, throughput 3.96974K wps
Begin Testing...
[Epoch 76] train avg loss 0.000861162, dev acc 0.9144, dev avg loss 0.238842, throughput 3.98357K wps
[Epoch 77 Batch 30/162] avg loss 0.000820592, throughput 4.03558K wps
[Epoch 77 Batch 60/162] avg loss 0.000921069, throughput 3.96419K wps
[Epoch 77 Batch 90/162] avg loss 0.000898361, throughput 3.97276K wps
[Epoch 77 Batch 120/162] avg loss 0.000891198, throughput 3.95509K wps
[Epoch 77 Batch 150/162] avg loss 0.000842367, throughput 3.96692K wps
Begin Testing...
[Epoch 77] train avg loss 0.000858729, dev acc 0.9156, dev avg loss 0.239267, throughput 3.97855K wps
[Epoch 78 Batch 30/162] avg loss 0.000907846, throughput 4.06882K wps
[Epoch 78 Batch 60/162] avg loss 0.000881144, throughput 3.96522K wps
[Epoch 78 Batch 90/162] avg loss 0.000808364, throughput 3.96267K wps
[Epoch 78 Batch 120/162] avg loss 0.000845366, throughput 3.96966K wps
[Epoch 78 Batch 150/162] avg loss 0.000751048, throughput 3.95521K wps
Begin Testing...
[Epoch 78] train avg loss 0.00082847, dev acc 0.9122, dev avg loss 0.238663, throughput 3.98172K wps
[Epoch 79 Batch 30/162] avg loss 0.000804181, throughput 4.06317K wps
[Epoch 79 Batch 60/162] avg loss 0.000817924, throughput 3.9704K wps
[Epoch 79 Batch 90/162] avg loss 0.000766131, throughput 3.97131K wps
[Epoch 79 Batch 120/162] avg loss 0.000808714, throughput 3.96379K wps
[Epoch 79 Batch 150/162] avg loss 0.000823432, throughput 3.95518K wps
Begin Testing...
[Epoch 79] train avg loss 0.000797532, dev acc 0.9133, dev avg loss 0.23967, throughput 3.98209K wps
[Epoch 80 Batch 30/162] avg loss 0.000769842, throughput 4.05061K wps
[Epoch 80 Batch 60/162] avg loss 0.000732789, throughput 3.96952K wps
[Epoch 80 Batch 90/162] avg loss 0.000807921, throughput 3.94135K wps
[Epoch 80 Batch 120/162] avg loss 0.000877649, throughput 3.96572K wps
[Epoch 80 Batch 150/162] avg loss 0.000814779, throughput 3.96861K wps
Begin Testing...
[Epoch 80] train avg loss 0.000798518, dev acc 0.9122, dev avg loss 0.239607, throughput 3.97786K wps
[Epoch 81 Batch 30/162] avg loss 0.000652043, throughput 4.06616K wps
[Epoch 81 Batch 60/162] avg loss 0.00076986, throughput 3.9736K wps
[Epoch 81 Batch 90/162] avg loss 0.0008167, throughput 3.9693K wps
[Epoch 81 Batch 120/162] avg loss 0.000702045, throughput 3.96322K wps
[Epoch 81 Batch 150/162] avg loss 0.000704943, throughput 3.95751K wps
Begin Testing...
[Epoch 81] train avg loss 0.000740322, dev acc 0.9111, dev avg loss 0.240929, throughput 3.98478K wps
[Epoch 82 Batch 30/162] avg loss 0.000676822, throughput 4.06892K wps
[Epoch 82 Batch 60/162] avg loss 0.000750608, throughput 3.95979K wps
[Epoch 82 Batch 90/162] avg loss 0.000720188, throughput 3.96576K wps
[Epoch 82 Batch 120/162] avg loss 0.000898016, throughput 3.97081K wps
[Epoch 82 Batch 150/162] avg loss 0.000783586, throughput 3.96227K wps
Begin Testing...
[Epoch 82] train avg loss 0.000765062, dev acc 0.9122, dev avg loss 0.241058, throughput 3.98256K wps
[Epoch 83 Batch 30/162] avg loss 0.00085595, throughput 4.05793K wps
[Epoch 83 Batch 60/162] avg loss 0.000784193, throughput 3.97022K wps
[Epoch 83 Batch 90/162] avg loss 0.000760521, throughput 3.96096K wps
[Epoch 83 Batch 120/162] avg loss 0.000702645, throughput 3.95971K wps
[Epoch 83 Batch 150/162] avg loss 0.000833852, throughput 3.96469K wps
Begin Testing...
[Epoch 83] train avg loss 0.000779324, dev acc 0.9122, dev avg loss 0.241553, throughput 3.98024K wps
[Epoch 84 Batch 30/162] avg loss 0.000794403, throughput 4.06044K wps
[Epoch 84 Batch 60/162] avg loss 0.000701403, throughput 3.97153K wps
[Epoch 84 Batch 90/162] avg loss 0.000783801, throughput 3.96794K wps
[Epoch 84 Batch 120/162] avg loss 0.000750373, throughput 3.96438K wps
[Epoch 84 Batch 150/162] avg loss 0.000646514, throughput 3.96187K wps
Begin Testing...
[Epoch 84] train avg loss 0.000737525, dev acc 0.9156, dev avg loss 0.243147, throughput 3.98345K wps
[Epoch 85 Batch 30/162] avg loss 0.000650751, throughput 4.05781K wps
[Epoch 85 Batch 60/162] avg loss 0.000751839, throughput 3.96267K wps
[Epoch 85 Batch 90/162] avg loss 0.000731457, throughput 3.96858K wps
[Epoch 85 Batch 120/162] avg loss 0.000633095, throughput 3.95052K wps
[Epoch 85 Batch 150/162] avg loss 0.0007506, throughput 3.97073K wps
Begin Testing...
[Epoch 85] train avg loss 0.000710674, dev acc 0.9122, dev avg loss 0.242922, throughput 3.97894K wps
[Epoch 86 Batch 30/162] avg loss 0.000605388, throughput 4.061K wps
[Epoch 86 Batch 60/162] avg loss 0.000665756, throughput 3.96949K wps
[Epoch 86 Batch 90/162] avg loss 0.00072418, throughput 3.97158K wps
[Epoch 86 Batch 120/162] avg loss 0.000791967, throughput 3.9589K wps
[Epoch 86 Batch 150/162] avg loss 0.000637345, throughput 3.95921K wps
Begin Testing...
[Epoch 86] train avg loss 0.000683577, dev acc 0.9144, dev avg loss 0.243549, throughput 3.98245K wps
[Epoch 87 Batch 30/162] avg loss 0.000719631, throughput 4.05846K wps
[Epoch 87 Batch 60/162] avg loss 0.0006837, throughput 3.96291K wps
[Epoch 87 Batch 90/162] avg loss 0.000591231, throughput 3.96549K wps
[Epoch 87 Batch 120/162] avg loss 0.000694745, throughput 3.96757K wps
[Epoch 87 Batch 150/162] avg loss 0.000632826, throughput 3.97438K wps
Begin Testing...
[Epoch 87] train avg loss 0.000664334, dev acc 0.9144, dev avg loss 0.24435, throughput 3.9826K wps
[Epoch 88 Batch 30/162] avg loss 0.000580401, throughput 4.0688K wps
[Epoch 88 Batch 60/162] avg loss 0.000773082, throughput 3.96265K wps
[Epoch 88 Batch 90/162] avg loss 0.000613769, throughput 3.95743K wps
[Epoch 88 Batch 120/162] avg loss 0.000589752, throughput 3.96271K wps
[Epoch 88 Batch 150/162] avg loss 0.000690384, throughput 3.95929K wps
Begin Testing...
[Epoch 88] train avg loss 0.000650593, dev acc 0.9178, dev avg loss 0.246586, throughput 3.9791K wps
Observed Improvement.
Begin Testing...
[Epoch 89 Batch 30/162] avg loss 0.00067557, throughput 4.05857K wps
[Epoch 89 Batch 60/162] avg loss 0.000657151, throughput 3.96445K wps
[Epoch 89 Batch 90/162] avg loss 0.000756645, throughput 3.963K wps
[Epoch 89 Batch 120/162] avg loss 0.000677116, throughput 3.96686K wps
[Epoch 89 Batch 150/162] avg loss 0.000561914, throughput 3.96455K wps
Begin Testing...
[Epoch 89] train avg loss 0.000660888, dev acc 0.9167, dev avg loss 0.24702, throughput 3.98107K wps
[Epoch 90 Batch 30/162] avg loss 0.000646694, throughput 4.04861K wps
[Epoch 90 Batch 60/162] avg loss 0.000583368, throughput 3.95811K wps
[Epoch 90 Batch 90/162] avg loss 0.000668853, throughput 3.968K wps
[Epoch 90 Batch 120/162] avg loss 0.000626149, throughput 3.96539K wps
[Epoch 90 Batch 150/162] avg loss 0.000738196, throughput 3.97181K wps
Begin Testing...
[Epoch 90] train avg loss 0.000656508, dev acc 0.9156, dev avg loss 0.246905, throughput 3.97795K wps
[Epoch 91 Batch 30/162] avg loss 0.000636451, throughput 4.06777K wps
[Epoch 91 Batch 60/162] avg loss 0.000636747, throughput 3.96486K wps
[Epoch 91 Batch 90/162] avg loss 0.000638804, throughput 3.97321K wps
[Epoch 91 Batch 120/162] avg loss 0.000590813, throughput 3.96786K wps
[Epoch 91 Batch 150/162] avg loss 0.000693733, throughput 3.95355K wps
Begin Testing...
[Epoch 91] train avg loss 0.000639576, dev acc 0.9144, dev avg loss 0.245852, throughput 3.98277K wps
[Epoch 92 Batch 30/162] avg loss 0.000685675, throughput 4.06493K wps
[Epoch 92 Batch 60/162] avg loss 0.000588914, throughput 3.96858K wps
[Epoch 92 Batch 90/162] avg loss 0.000631639, throughput 3.96351K wps
[Epoch 92 Batch 120/162] avg loss 0.000632326, throughput 3.97292K wps
[Epoch 92 Batch 150/162] avg loss 0.000576907, throughput 3.95537K wps
Begin Testing...
[Epoch 92] train avg loss 0.000621437, dev acc 0.9111, dev avg loss 0.246043, throughput 3.9842K wps
[Epoch 93 Batch 30/162] avg loss 0.000532915, throughput 4.06176K wps
[Epoch 93 Batch 60/162] avg loss 0.000614651, throughput 3.96839K wps
[Epoch 93 Batch 90/162] avg loss 0.000589065, throughput 3.9608K wps
[Epoch 93 Batch 120/162] avg loss 0.000606105, throughput 3.95518K wps
[Epoch 93 Batch 150/162] avg loss 0.000545098, throughput 3.96349K wps
Begin Testing...
[Epoch 93] train avg loss 0.000579269, dev acc 0.9167, dev avg loss 0.24844, throughput 3.9803K wps
[Epoch 94 Batch 30/162] avg loss 0.000567919, throughput 4.05899K wps
[Epoch 94 Batch 60/162] avg loss 0.000581752, throughput 3.96375K wps
[Epoch 94 Batch 90/162] avg loss 0.000562138, throughput 3.96997K wps
[Epoch 94 Batch 120/162] avg loss 0.0005877, throughput 3.97418K wps
[Epoch 94 Batch 150/162] avg loss 0.000557252, throughput 3.96634K wps
Begin Testing...
[Epoch 94] train avg loss 0.00057633, dev acc 0.9122, dev avg loss 0.248155, throughput 3.98526K wps
[Epoch 95 Batch 30/162] avg loss 0.000664602, throughput 4.05813K wps
[Epoch 95 Batch 60/162] avg loss 0.000575433, throughput 3.94902K wps
[Epoch 95 Batch 90/162] avg loss 0.00050354, throughput 3.96908K wps
[Epoch 95 Batch 120/162] avg loss 0.000593696, throughput 3.95978K wps
[Epoch 95 Batch 150/162] avg loss 0.000566903, throughput 3.95535K wps
Begin Testing...
[Epoch 95] train avg loss 0.000572548, dev acc 0.9122, dev avg loss 0.248979, throughput 3.97518K wps
[Epoch 96 Batch 30/162] avg loss 0.000498624, throughput 4.06083K wps
[Epoch 96 Batch 60/162] avg loss 0.000583961, throughput 3.96274K wps
[Epoch 96 Batch 90/162] avg loss 0.000536387, throughput 3.96672K wps
[Epoch 96 Batch 120/162] avg loss 0.000558587, throughput 3.96948K wps
[Epoch 96 Batch 150/162] avg loss 0.000419301, throughput 3.96998K wps
Begin Testing...
[Epoch 96] train avg loss 0.000522334, dev acc 0.9133, dev avg loss 0.250219, throughput 3.98339K wps
[Epoch 97 Batch 30/162] avg loss 0.000472716, throughput 4.06287K wps
[Epoch 97 Batch 60/162] avg loss 0.000494482, throughput 3.96346K wps
[Epoch 97 Batch 90/162] avg loss 0.000549062, throughput 3.95628K wps
[Epoch 97 Batch 120/162] avg loss 0.000588137, throughput 3.96784K wps
[Epoch 97 Batch 150/162] avg loss 0.000568689, throughput 3.97498K wps
Begin Testing...
[Epoch 97] train avg loss 0.000542349, dev acc 0.9156, dev avg loss 0.250859, throughput 3.98406K wps
[Epoch 98 Batch 30/162] avg loss 0.000507956, throughput 4.0655K wps
[Epoch 98 Batch 60/162] avg loss 0.00044286, throughput 3.9688K wps
[Epoch 98 Batch 90/162] avg loss 0.00048622, throughput 3.97125K wps
[Epoch 98 Batch 120/162] avg loss 0.000594586, throughput 3.96209K wps
[Epoch 98 Batch 150/162] avg loss 0.000635366, throughput 3.97301K wps
Begin Testing...
[Epoch 98] train avg loss 0.000539627, dev acc 0.9167, dev avg loss 0.251059, throughput 3.98639K wps
[Epoch 99 Batch 30/162] avg loss 0.000500772, throughput 4.05328K wps
[Epoch 99 Batch 60/162] avg loss 0.000584489, throughput 3.97161K wps
[Epoch 99 Batch 90/162] avg loss 0.000559541, throughput 3.97195K wps
[Epoch 99 Batch 120/162] avg loss 0.000576192, throughput 3.97998K wps
[Epoch 99 Batch 150/162] avg loss 0.000490128, throughput 3.97159K wps
Begin Testing...
[Epoch 99] train avg loss 0.000542088, dev acc 0.9133, dev avg loss 0.251461, throughput 3.98705K wps
[Epoch 100 Batch 30/162] avg loss 0.000483258, throughput 4.04855K wps
[Epoch 100 Batch 60/162] avg loss 0.000461498, throughput 3.9584K wps
[Epoch 100 Batch 90/162] avg loss 0.00050775, throughput 3.97045K wps
[Epoch 100 Batch 120/162] avg loss 0.000472378, throughput 3.97131K wps
[Epoch 100 Batch 150/162] avg loss 0.000510642, throughput 3.9631K wps
Begin Testing...
[Epoch 100] train avg loss 0.000486739, dev acc 0.9144, dev avg loss 0.251521, throughput 3.98093K wps
[Epoch 101 Batch 30/162] avg loss 0.000470763, throughput 4.06789K wps
[Epoch 101 Batch 60/162] avg loss 0.000485713, throughput 3.96476K wps
[Epoch 101 Batch 90/162] avg loss 0.000465775, throughput 3.96681K wps
[Epoch 101 Batch 120/162] avg loss 0.000559803, throughput 3.96984K wps
[Epoch 101 Batch 150/162] avg loss 0.000449126, throughput 3.97024K wps
Begin Testing...
[Epoch 101] train avg loss 0.000488623, dev acc 0.9133, dev avg loss 0.253221, throughput 3.98542K wps
[Epoch 102 Batch 30/162] avg loss 0.000450826, throughput 4.05772K wps
[Epoch 102 Batch 60/162] avg loss 0.000448846, throughput 3.97201K wps
[Epoch 102 Batch 90/162] avg loss 0.000465441, throughput 3.95452K wps
[Epoch 102 Batch 120/162] avg loss 0.000546051, throughput 3.95287K wps
[Epoch 102 Batch 150/162] avg loss 0.000532562, throughput 3.97356K wps
Begin Testing...
[Epoch 102] train avg loss 0.000487437, dev acc 0.9133, dev avg loss 0.253655, throughput 3.98067K wps
[Epoch 103 Batch 30/162] avg loss 0.000499567, throughput 4.06155K wps
[Epoch 103 Batch 60/162] avg loss 0.000451187, throughput 3.96683K wps
[Epoch 103 Batch 90/162] avg loss 0.000501458, throughput 3.9591K wps
[Epoch 103 Batch 120/162] avg loss 0.000475928, throughput 3.94172K wps
[Epoch 103 Batch 150/162] avg loss 0.000495081, throughput 3.96835K wps
Begin Testing...
[Epoch 103] train avg loss 0.000485348, dev acc 0.9133, dev avg loss 0.253641, throughput 3.9783K wps
[Epoch 104 Batch 30/162] avg loss 0.000425101, throughput 4.04982K wps
[Epoch 104 Batch 60/162] avg loss 0.000444245, throughput 3.95924K wps
[Epoch 104 Batch 90/162] avg loss 0.000549579, throughput 3.95464K wps
[Epoch 104 Batch 120/162] avg loss 0.000413554, throughput 3.96231K wps
[Epoch 104 Batch 150/162] avg loss 0.000434184, throughput 3.96393K wps
Begin Testing...
[Epoch 104] train avg loss 0.0004613, dev acc 0.9133, dev avg loss 0.252977, throughput 3.97647K wps
[Epoch 105 Batch 30/162] avg loss 0.000433986, throughput 4.06176K wps
[Epoch 105 Batch 60/162] avg loss 0.000455581, throughput 3.96896K wps
[Epoch 105 Batch 90/162] avg loss 0.000475077, throughput 3.96529K wps
[Epoch 105 Batch 120/162] avg loss 0.000479543, throughput 3.96927K wps
[Epoch 105 Batch 150/162] avg loss 0.000455606, throughput 3.96241K wps
Begin Testing...
[Epoch 105] train avg loss 0.000464262, dev acc 0.9122, dev avg loss 0.253211, throughput 3.98304K wps
[Epoch 106 Batch 30/162] avg loss 0.000444579, throughput 4.05929K wps
[Epoch 106 Batch 60/162] avg loss 0.000481358, throughput 3.9679K wps
[Epoch 106 Batch 90/162] avg loss 0.000447097, throughput 3.95356K wps
[Epoch 106 Batch 120/162] avg loss 0.000459458, throughput 3.96578K wps
[Epoch 106 Batch 150/162] avg loss 0.000420281, throughput 3.95354K wps
Begin Testing...
[Epoch 106] train avg loss 0.00044541, dev acc 0.9122, dev avg loss 0.254516, throughput 3.97777K wps
[Epoch 107 Batch 30/162] avg loss 0.000474066, throughput 4.05547K wps
[Epoch 107 Batch 60/162] avg loss 0.000412324, throughput 3.9689K wps
[Epoch 107 Batch 90/162] avg loss 0.000502602, throughput 3.96764K wps
[Epoch 107 Batch 120/162] avg loss 0.000439181, throughput 3.95027K wps
[Epoch 107 Batch 150/162] avg loss 0.000480413, throughput 3.9554K wps
Begin Testing...
[Epoch 107] train avg loss 0.000463613, dev acc 0.9133, dev avg loss 0.254643, throughput 3.9785K wps
[Epoch 108 Batch 30/162] avg loss 0.000510813, throughput 4.06451K wps
[Epoch 108 Batch 60/162] avg loss 0.000468288, throughput 3.96649K wps
[Epoch 108 Batch 90/162] avg loss 0.000439214, throughput 3.96398K wps
[Epoch 108 Batch 120/162] avg loss 0.000429135, throughput 3.9699K wps
[Epoch 108 Batch 150/162] avg loss 0.000419999, throughput 3.95897K wps
Begin Testing...
[Epoch 108] train avg loss 0.000447243, dev acc 0.9122, dev avg loss 0.256291, throughput 3.98353K wps
[Epoch 109 Batch 30/162] avg loss 0.00043256, throughput 4.06086K wps
[Epoch 109 Batch 60/162] avg loss 0.000433197, throughput 3.96145K wps
[Epoch 109 Batch 90/162] avg loss 0.000372932, throughput 3.9685K wps
[Epoch 109 Batch 120/162] avg loss 0.000463338, throughput 3.97057K wps
[Epoch 109 Batch 150/162] avg loss 0.000479696, throughput 3.96707K wps
Begin Testing...
[Epoch 109] train avg loss 0.000435861, dev acc 0.9111, dev avg loss 0.257219, throughput 3.98338K wps
[Epoch 110 Batch 30/162] avg loss 0.000446479, throughput 4.05539K wps
[Epoch 110 Batch 60/162] avg loss 0.000386943, throughput 3.97011K wps
[Epoch 110 Batch 90/162] avg loss 0.000424319, throughput 3.96501K wps
[Epoch 110 Batch 120/162] avg loss 0.00041812, throughput 3.9718K wps
[Epoch 110 Batch 150/162] avg loss 0.000503781, throughput 3.96947K wps
Begin Testing...
[Epoch 110] train avg loss 0.000432989, dev acc 0.9133, dev avg loss 0.256665, throughput 3.98378K wps
[Epoch 111 Batch 30/162] avg loss 0.00045277, throughput 4.05063K wps
[Epoch 111 Batch 60/162] avg loss 0.000376434, throughput 3.96607K wps
[Epoch 111 Batch 90/162] avg loss 0.00041113, throughput 3.96675K wps
[Epoch 111 Batch 120/162] avg loss 0.000436915, throughput 3.96435K wps
[Epoch 111 Batch 150/162] avg loss 0.00042131, throughput 3.96261K wps
Begin Testing...
[Epoch 111] train avg loss 0.000418943, dev acc 0.9144, dev avg loss 0.256919, throughput 3.98091K wps
[Epoch 112 Batch 30/162] avg loss 0.000389353, throughput 4.04814K wps
[Epoch 112 Batch 60/162] avg loss 0.000412173, throughput 3.97555K wps
[Epoch 112 Batch 90/162] avg loss 0.000413384, throughput 3.96277K wps
[Epoch 112 Batch 120/162] avg loss 0.00036889, throughput 3.9605K wps
[Epoch 112 Batch 150/162] avg loss 0.000395393, throughput 3.96951K wps
Begin Testing...
[Epoch 112] train avg loss 0.000393147, dev acc 0.9133, dev avg loss 0.259093, throughput 3.982K wps
[Epoch 113 Batch 30/162] avg loss 0.000417101, throughput 4.05277K wps
[Epoch 113 Batch 60/162] avg loss 0.00044901, throughput 3.9663K wps
[Epoch 113 Batch 90/162] avg loss 0.00039534, throughput 3.95784K wps
[Epoch 113 Batch 120/162] avg loss 0.000389739, throughput 3.9675K wps
[Epoch 113 Batch 150/162] avg loss 0.000382974, throughput 3.95495K wps
Begin Testing...
[Epoch 113] train avg loss 0.00040839, dev acc 0.9111, dev avg loss 0.259236, throughput 3.97881K wps
[Epoch 114 Batch 30/162] avg loss 0.000387229, throughput 4.04859K wps
[Epoch 114 Batch 60/162] avg loss 0.000402662, throughput 3.95312K wps
[Epoch 114 Batch 90/162] avg loss 0.000380052, throughput 3.96779K wps
[Epoch 114 Batch 120/162] avg loss 0.000416563, throughput 3.97291K wps
[Epoch 114 Batch 150/162] avg loss 0.00031988, throughput 3.96232K wps
Begin Testing...
[Epoch 114] train avg loss 0.000382684, dev acc 0.9122, dev avg loss 0.259937, throughput 3.97963K wps
[Epoch 115 Batch 30/162] avg loss 0.00038549, throughput 4.06136K wps
[Epoch 115 Batch 60/162] avg loss 0.00038134, throughput 3.96988K wps
[Epoch 115 Batch 90/162] avg loss 0.000376282, throughput 3.9544K wps
[Epoch 115 Batch 120/162] avg loss 0.000363526, throughput 3.96889K wps
[Epoch 115 Batch 150/162] avg loss 0.000363941, throughput 3.94642K wps
Begin Testing...
[Epoch 115] train avg loss 0.000374212, dev acc 0.9111, dev avg loss 0.259946, throughput 3.97678K wps
[Epoch 116 Batch 30/162] avg loss 0.000352484, throughput 4.04114K wps
[Epoch 116 Batch 60/162] avg loss 0.000366891, throughput 3.96114K wps
[Epoch 116 Batch 90/162] avg loss 0.00038137, throughput 3.96783K wps
[Epoch 116 Batch 120/162] avg loss 0.000378107, throughput 3.95977K wps
[Epoch 116 Batch 150/162] avg loss 0.000368179, throughput 3.97079K wps
Begin Testing...
[Epoch 116] train avg loss 0.00037169, dev acc 0.9122, dev avg loss 0.26138, throughput 3.97938K wps
[Epoch 117 Batch 30/162] avg loss 0.000385975, throughput 4.0607K wps
[Epoch 117 Batch 60/162] avg loss 0.000403294, throughput 3.94857K wps
[Epoch 117 Batch 90/162] avg loss 0.000326666, throughput 3.96638K wps
[Epoch 117 Batch 120/162] avg loss 0.000369738, throughput 3.9477K wps
[Epoch 117 Batch 150/162] avg loss 0.000334531, throughput 3.9577K wps
Begin Testing...
[Epoch 117] train avg loss 0.000363937, dev acc 0.9122, dev avg loss 0.261507, throughput 3.97435K wps
[Epoch 118 Batch 30/162] avg loss 0.000381012, throughput 4.06515K wps
[Epoch 118 Batch 60/162] avg loss 0.000318945, throughput 3.96509K wps
[Epoch 118 Batch 90/162] avg loss 0.000340018, throughput 3.96342K wps
[Epoch 118 Batch 120/162] avg loss 0.000405588, throughput 3.96658K wps
[Epoch 118 Batch 150/162] avg loss 0.000328387, throughput 3.95929K wps
Begin Testing...
[Epoch 118] train avg loss 0.000356519, dev acc 0.9133, dev avg loss 0.262362, throughput 3.98229K wps
[Epoch 119 Batch 30/162] avg loss 0.000362482, throughput 4.06574K wps
[Epoch 119 Batch 60/162] avg loss 0.00037369, throughput 3.96269K wps
[Epoch 119 Batch 90/162] avg loss 0.000369032, throughput 3.96103K wps
[Epoch 119 Batch 120/162] avg loss 0.000364197, throughput 3.9708K wps
[Epoch 119 Batch 150/162] avg loss 0.000379769, throughput 3.96217K wps
Begin Testing...
[Epoch 119] train avg loss 0.000364454, dev acc 0.9122, dev avg loss 0.263077, throughput 3.98313K wps
[Epoch 120 Batch 30/162] avg loss 0.000328392, throughput 4.07056K wps
[Epoch 120 Batch 60/162] avg loss 0.000370645, throughput 3.96568K wps
[Epoch 120 Batch 90/162] avg loss 0.000333917, throughput 3.95771K wps
[Epoch 120 Batch 120/162] avg loss 0.000318121, throughput 3.9654K wps
[Epoch 120 Batch 150/162] avg loss 0.000362558, throughput 3.96913K wps
Begin Testing...
[Epoch 120] train avg loss 0.000348262, dev acc 0.9122, dev avg loss 0.265498, throughput 3.98444K wps
[Epoch 121 Batch 30/162] avg loss 0.000357585, throughput 4.05386K wps
[Epoch 121 Batch 60/162] avg loss 0.000324921, throughput 3.95662K wps
[Epoch 121 Batch 90/162] avg loss 0.000381819, throughput 3.96299K wps
[Epoch 121 Batch 120/162] avg loss 0.000284665, throughput 3.95751K wps
[Epoch 121 Batch 150/162] avg loss 0.00037344, throughput 3.96757K wps
Begin Testing...
[Epoch 121] train avg loss 0.000348672, dev acc 0.9122, dev avg loss 0.264827, throughput 3.97821K wps
[Epoch 122 Batch 30/162] avg loss 0.00033577, throughput 4.07013K wps
[Epoch 122 Batch 60/162] avg loss 0.000359363, throughput 3.96139K wps
[Epoch 122 Batch 90/162] avg loss 0.000329665, throughput 3.96791K wps
[Epoch 122 Batch 120/162] avg loss 0.000322668, throughput 3.95916K wps
[Epoch 122 Batch 150/162] avg loss 0.000375997, throughput 3.96118K wps
Begin Testing...
[Epoch 122] train avg loss 0.000345101, dev acc 0.9100, dev avg loss 0.266098, throughput 3.98273K wps
[Epoch 123 Batch 30/162] avg loss 0.000306323, throughput 4.05523K wps
[Epoch 123 Batch 60/162] avg loss 0.000300852, throughput 3.9699K wps
[Epoch 123 Batch 90/162] avg loss 0.000313096, throughput 3.95814K wps
[Epoch 123 Batch 120/162] avg loss 0.000303608, throughput 3.96631K wps
[Epoch 123 Batch 150/162] avg loss 0.000374994, throughput 3.96308K wps
Begin Testing...
[Epoch 123] train avg loss 0.000325167, dev acc 0.9133, dev avg loss 0.264354, throughput 3.97995K wps
[Epoch 124 Batch 30/162] avg loss 0.000361561, throughput 4.0497K wps
[Epoch 124 Batch 60/162] avg loss 0.000307436, throughput 3.96723K wps
[Epoch 124 Batch 90/162] avg loss 0.000342864, throughput 3.95676K wps
[Epoch 124 Batch 120/162] avg loss 0.000317267, throughput 3.96633K wps
[Epoch 124 Batch 150/162] avg loss 0.000307642, throughput 3.97017K wps
Begin Testing...
[Epoch 124] train avg loss 0.000326052, dev acc 0.9111, dev avg loss 0.266004, throughput 3.98037K wps
[Epoch 125 Batch 30/162] avg loss 0.000365285, throughput 4.06199K wps
[Epoch 125 Batch 60/162] avg loss 0.000282544, throughput 3.97112K wps
[Epoch 125 Batch 90/162] avg loss 0.000341931, throughput 3.96623K wps
[Epoch 125 Batch 120/162] avg loss 0.000333302, throughput 3.95931K wps
[Epoch 125 Batch 150/162] avg loss 0.000292541, throughput 3.97231K wps
Begin Testing...
[Epoch 125] train avg loss 0.000326403, dev acc 0.9122, dev avg loss 0.265951, throughput 3.98459K wps
[Epoch 126 Batch 30/162] avg loss 0.000324803, throughput 4.04335K wps
[Epoch 126 Batch 60/162] avg loss 0.000361001, throughput 3.96303K wps
[Epoch 126 Batch 90/162] avg loss 0.000305927, throughput 3.97502K wps
[Epoch 126 Batch 120/162] avg loss 0.000373847, throughput 3.96683K wps
[Epoch 126 Batch 150/162] avg loss 0.000303023, throughput 3.97178K wps
Begin Testing...
[Epoch 126] train avg loss 0.000333597, dev acc 0.9122, dev avg loss 0.266683, throughput 3.98281K wps
[Epoch 127 Batch 30/162] avg loss 0.000368286, throughput 4.06156K wps
[Epoch 127 Batch 60/162] avg loss 0.000319377, throughput 3.95457K wps
[Epoch 127 Batch 90/162] avg loss 0.000322974, throughput 3.96203K wps
[Epoch 127 Batch 120/162] avg loss 0.000269196, throughput 3.95616K wps
[Epoch 127 Batch 150/162] avg loss 0.000320043, throughput 3.92926K wps
Begin Testing...
[Epoch 127] train avg loss 0.000315914, dev acc 0.9111, dev avg loss 0.267576, throughput 3.97029K wps
[Epoch 128 Batch 30/162] avg loss 0.000363224, throughput 4.06748K wps
[Epoch 128 Batch 60/162] avg loss 0.000315146, throughput 3.96326K wps
[Epoch 128 Batch 90/162] avg loss 0.000340435, throughput 3.96401K wps
[Epoch 128 Batch 120/162] avg loss 0.000255268, throughput 3.97046K wps
[Epoch 128 Batch 150/162] avg loss 0.000305058, throughput 3.96708K wps
Begin Testing...
[Epoch 128] train avg loss 0.000317293, dev acc 0.9144, dev avg loss 0.267248, throughput 3.98488K wps
[Epoch 129 Batch 30/162] avg loss 0.00030438, throughput 4.03581K wps
[Epoch 129 Batch 60/162] avg loss 0.000309175, throughput 3.97288K wps
[Epoch 129 Batch 90/162] avg loss 0.000327608, throughput 3.94544K wps
[Epoch 129 Batch 120/162] avg loss 0.000280849, throughput 3.95146K wps
[Epoch 129 Batch 150/162] avg loss 0.000301714, throughput 3.96871K wps
Begin Testing...
[Epoch 129] train avg loss 0.0003028, dev acc 0.9133, dev avg loss 0.268358, throughput 3.973K wps
[Epoch 130 Batch 30/162] avg loss 0.000337147, throughput 4.06301K wps
[Epoch 130 Batch 60/162] avg loss 0.000301945, throughput 3.95845K wps
[Epoch 130 Batch 90/162] avg loss 0.000300835, throughput 3.96252K wps
[Epoch 130 Batch 120/162] avg loss 0.000294902, throughput 3.95719K wps
[Epoch 130 Batch 150/162] avg loss 0.000341812, throughput 3.96157K wps
Begin Testing...
[Epoch 130] train avg loss 0.000315022, dev acc 0.9122, dev avg loss 0.268959, throughput 3.97786K wps
[Epoch 131 Batch 30/162] avg loss 0.000329414, throughput 4.04117K wps
[Epoch 131 Batch 60/162] avg loss 0.000331108, throughput 3.95194K wps
[Epoch 131 Batch 90/162] avg loss 0.000273144, throughput 3.95689K wps
[Epoch 131 Batch 120/162] avg loss 0.000307832, throughput 3.9715K wps
[Epoch 131 Batch 150/162] avg loss 0.000333742, throughput 3.9623K wps
Begin Testing...
[Epoch 131] train avg loss 0.000310385, dev acc 0.9133, dev avg loss 0.26872, throughput 3.97517K wps
[Epoch 132 Batch 30/162] avg loss 0.000299046, throughput 4.06584K wps
[Epoch 132 Batch 60/162] avg loss 0.000254618, throughput 3.96903K wps
[Epoch 132 Batch 90/162] avg loss 0.000359089, throughput 3.95034K wps
[Epoch 132 Batch 120/162] avg loss 0.000279692, throughput 3.96659K wps
[Epoch 132 Batch 150/162] avg loss 0.000337808, throughput 3.95712K wps
Begin Testing...
[Epoch 132] train avg loss 0.000305409, dev acc 0.9133, dev avg loss 0.26911, throughput 3.97994K wps
[Epoch 133 Batch 30/162] avg loss 0.000313207, throughput 4.06545K wps
[Epoch 133 Batch 60/162] avg loss 0.000310486, throughput 3.96731K wps
[Epoch 133 Batch 90/162] avg loss 0.000265902, throughput 3.9609K wps
[Epoch 133 Batch 120/162] avg loss 0.000282379, throughput 3.96921K wps
[Epoch 133 Batch 150/162] avg loss 0.000287808, throughput 3.9699K wps
Begin Testing...
[Epoch 133] train avg loss 0.000288044, dev acc 0.9122, dev avg loss 0.270709, throughput 3.9844K wps
[Epoch 134 Batch 30/162] avg loss 0.000295862, throughput 4.05354K wps
[Epoch 134 Batch 60/162] avg loss 0.00028877, throughput 3.96477K wps
[Epoch 134 Batch 90/162] avg loss 0.000301311, throughput 3.94804K wps
[Epoch 134 Batch 120/162] avg loss 0.000316824, throughput 3.95225K wps
[Epoch 134 Batch 150/162] avg loss 0.000237258, throughput 3.96243K wps
Begin Testing...
[Epoch 134] train avg loss 0.000284847, dev acc 0.9133, dev avg loss 0.270339, throughput 3.97516K wps
[Epoch 135 Batch 30/162] avg loss 0.000297028, throughput 4.06108K wps
[Epoch 135 Batch 60/162] avg loss 0.000275326, throughput 3.97266K wps
[Epoch 135 Batch 90/162] avg loss 0.000298014, throughput 3.96194K wps
[Epoch 135 Batch 120/162] avg loss 0.000300991, throughput 3.95688K wps
[Epoch 135 Batch 150/162] avg loss 0.000284016, throughput 3.95655K wps
Begin Testing...
[Epoch 135] train avg loss 0.000294389, dev acc 0.9122, dev avg loss 0.27061, throughput 3.98005K wps
[Epoch 136 Batch 30/162] avg loss 0.000287832, throughput 4.05341K wps
[Epoch 136 Batch 60/162] avg loss 0.000323974, throughput 3.96224K wps
[Epoch 136 Batch 90/162] avg loss 0.000263291, throughput 3.95973K wps
[Epoch 136 Batch 120/162] avg loss 0.000310277, throughput 3.96216K wps
[Epoch 136 Batch 150/162] avg loss 0.000265919, throughput 3.96829K wps
Begin Testing...
[Epoch 136] train avg loss 0.000290206, dev acc 0.9133, dev avg loss 0.271035, throughput 3.97936K wps
[Epoch 137 Batch 30/162] avg loss 0.000284416, throughput 4.06048K wps
[Epoch 137 Batch 60/162] avg loss 0.000282725, throughput 3.97507K wps
[Epoch 137 Batch 90/162] avg loss 0.000293705, throughput 3.95272K wps
[Epoch 137 Batch 120/162] avg loss 0.000266466, throughput 3.96161K wps
[Epoch 137 Batch 150/162] avg loss 0.000282994, throughput 3.96436K wps
Begin Testing...
[Epoch 137] train avg loss 0.000277176, dev acc 0.9133, dev avg loss 0.271376, throughput 3.98034K wps
[Epoch 138 Batch 30/162] avg loss 0.000293843, throughput 4.05459K wps
[Epoch 138 Batch 60/162] avg loss 0.000293531, throughput 3.97148K wps
[Epoch 138 Batch 90/162] avg loss 0.000296454, throughput 3.96595K wps
[Epoch 138 Batch 120/162] avg loss 0.00028898, throughput 3.96019K wps
[Epoch 138 Batch 150/162] avg loss 0.00024089, throughput 3.96783K wps
Begin Testing...
[Epoch 138] train avg loss 0.000274939, dev acc 0.9111, dev avg loss 0.272815, throughput 3.98306K wps
[Epoch 139 Batch 30/162] avg loss 0.00023371, throughput 4.05609K wps
[Epoch 139 Batch 60/162] avg loss 0.000236791, throughput 3.96553K wps
[Epoch 139 Batch 90/162] avg loss 0.000266803, throughput 3.97444K wps
[Epoch 139 Batch 120/162] avg loss 0.000239621, throughput 3.94567K wps
[Epoch 139 Batch 150/162] avg loss 0.000259673, throughput 3.96548K wps
Begin Testing...
[Epoch 139] train avg loss 0.000246758, dev acc 0.9122, dev avg loss 0.27271, throughput 3.9797K wps
[Epoch 140 Batch 30/162] avg loss 0.000287095, throughput 4.06159K wps
[Epoch 140 Batch 60/162] avg loss 0.000250348, throughput 3.96988K wps
[Epoch 140 Batch 90/162] avg loss 0.000263204, throughput 3.96593K wps
[Epoch 140 Batch 120/162] avg loss 0.000301965, throughput 3.96866K wps
[Epoch 140 Batch 150/162] avg loss 0.000258381, throughput 3.96154K wps
Begin Testing...
[Epoch 140] train avg loss 0.000274946, dev acc 0.9122, dev avg loss 0.27327, throughput 3.98416K wps
[Epoch 141 Batch 30/162] avg loss 0.000285535, throughput 4.06525K wps
[Epoch 141 Batch 60/162] avg loss 0.00025051, throughput 3.96358K wps
[Epoch 141 Batch 90/162] avg loss 0.000242051, throughput 3.96312K wps
[Epoch 141 Batch 120/162] avg loss 0.000288645, throughput 3.95921K wps
[Epoch 141 Batch 150/162] avg loss 0.000296275, throughput 3.96733K wps
Begin Testing...
[Epoch 141] train avg loss 0.00026881, dev acc 0.9133, dev avg loss 0.272865, throughput 3.98193K wps
[Epoch 142 Batch 30/162] avg loss 0.000196163, throughput 4.05596K wps
[Epoch 142 Batch 60/162] avg loss 0.000297248, throughput 3.9626K wps
[Epoch 142 Batch 90/162] avg loss 0.000291254, throughput 3.95495K wps
[Epoch 142 Batch 120/162] avg loss 0.00025122, throughput 3.96457K wps
[Epoch 142 Batch 150/162] avg loss 0.000260211, throughput 3.96732K wps
Begin Testing...
[Epoch 142] train avg loss 0.000257416, dev acc 0.9133, dev avg loss 0.273512, throughput 3.97954K wps
[Epoch 143 Batch 30/162] avg loss 0.000293267, throughput 4.06026K wps
[Epoch 143 Batch 60/162] avg loss 0.00026501, throughput 3.97162K wps
[Epoch 143 Batch 90/162] avg loss 0.000260032, throughput 3.9703K wps
[Epoch 143 Batch 120/162] avg loss 0.000261584, throughput 3.96586K wps
[Epoch 143 Batch 150/162] avg loss 0.000217127, throughput 3.96009K wps
Begin Testing...
[Epoch 143] train avg loss 0.00025575, dev acc 0.9122, dev avg loss 0.274008, throughput 3.98341K wps
[Epoch 144 Batch 30/162] avg loss 0.000249927, throughput 4.06339K wps
[Epoch 144 Batch 60/162] avg loss 0.000259265, throughput 3.95173K wps
[Epoch 144 Batch 90/162] avg loss 0.000227782, throughput 3.97258K wps
[Epoch 144 Batch 120/162] avg loss 0.000214078, throughput 3.96994K wps
[Epoch 144 Batch 150/162] avg loss 0.000237312, throughput 3.96654K wps
Begin Testing...
[Epoch 144] train avg loss 0.000242394, dev acc 0.9133, dev avg loss 0.275014, throughput 3.9827K wps
[Epoch 145 Batch 30/162] avg loss 0.000249906, throughput 4.06251K wps
[Epoch 145 Batch 60/162] avg loss 0.000279477, throughput 3.96547K wps
[Epoch 145 Batch 90/162] avg loss 0.000252899, throughput 3.95239K wps
[Epoch 145 Batch 120/162] avg loss 0.000332792, throughput 3.9667K wps
[Epoch 145 Batch 150/162] avg loss 0.00023521, throughput 3.96983K wps
Begin Testing...
[Epoch 145] train avg loss 0.000265953, dev acc 0.9111, dev avg loss 0.276755, throughput 3.98055K wps
[Epoch 146 Batch 30/162] avg loss 0.000246396, throughput 4.0669K wps
[Epoch 146 Batch 60/162] avg loss 0.000203708, throughput 3.96973K wps
[Epoch 146 Batch 90/162] avg loss 0.000230752, throughput 3.96751K wps
[Epoch 146 Batch 120/162] avg loss 0.000249806, throughput 3.96834K wps
[Epoch 146 Batch 150/162] avg loss 0.00025443, throughput 3.96633K wps
Begin Testing...
[Epoch 146] train avg loss 0.000234555, dev acc 0.9111, dev avg loss 0.277116, throughput 3.98576K wps
[Epoch 147 Batch 30/162] avg loss 0.00024503, throughput 4.06175K wps
[Epoch 147 Batch 60/162] avg loss 0.000225169, throughput 3.97005K wps
[Epoch 147 Batch 90/162] avg loss 0.000253285, throughput 3.96243K wps
[Epoch 147 Batch 120/162] avg loss 0.000216628, throughput 3.9554K wps
[Epoch 147 Batch 150/162] avg loss 0.000209311, throughput 3.96677K wps
Begin Testing...
[Epoch 147] train avg loss 0.000229373, dev acc 0.9122, dev avg loss 0.276821, throughput 3.98189K wps
[Epoch 148 Batch 30/162] avg loss 0.000230921, throughput 4.04534K wps
[Epoch 148 Batch 60/162] avg loss 0.000217041, throughput 3.93065K wps
[Epoch 148 Batch 90/162] avg loss 0.000245064, throughput 3.97018K wps
[Epoch 148 Batch 120/162] avg loss 0.000217085, throughput 3.96305K wps
[Epoch 148 Batch 150/162] avg loss 0.00025498, throughput 3.97002K wps
Begin Testing...
[Epoch 148] train avg loss 0.000229655, dev acc 0.9122, dev avg loss 0.277136, throughput 3.97483K wps
[Epoch 149 Batch 30/162] avg loss 0.000209703, throughput 4.06495K wps
[Epoch 149 Batch 60/162] avg loss 0.000288082, throughput 3.94876K wps
[Epoch 149 Batch 90/162] avg loss 0.000252237, throughput 3.96633K wps
[Epoch 149 Batch 120/162] avg loss 0.000235585, throughput 3.96974K wps
[Epoch 149 Batch 150/162] avg loss 0.000203326, throughput 3.96054K wps
Begin Testing...
[Epoch 149] train avg loss 0.000240254, dev acc 0.9100, dev avg loss 0.279214, throughput 3.97975K wps
[Epoch 150 Batch 30/162] avg loss 0.000208632, throughput 4.0614K wps
[Epoch 150 Batch 60/162] avg loss 0.000224094, throughput 3.97557K wps
[Epoch 150 Batch 90/162] avg loss 0.000243438, throughput 3.96759K wps
[Epoch 150 Batch 120/162] avg loss 0.000222237, throughput 3.96106K wps
[Epoch 150 Batch 150/162] avg loss 0.000230725, throughput 3.96077K wps
Begin Testing...
[Epoch 150] train avg loss 0.000230821, dev acc 0.9111, dev avg loss 0.28173, throughput 3.98128K wps
[Epoch 151 Batch 30/162] avg loss 0.000285208, throughput 4.06799K wps
[Epoch 151 Batch 60/162] avg loss 0.000183801, throughput 3.96464K wps
[Epoch 151 Batch 90/162] avg loss 0.00023042, throughput 3.92914K wps
[Epoch 151 Batch 120/162] avg loss 0.000242803, throughput 3.96334K wps
[Epoch 151 Batch 150/162] avg loss 0.000214334, throughput 3.96976K wps
Begin Testing...
[Epoch 151] train avg loss 0.00023332, dev acc 0.9111, dev avg loss 0.278296, throughput 3.9774K wps
[Epoch 152 Batch 30/162] avg loss 0.000245077, throughput 4.05971K wps
[Epoch 152 Batch 60/162] avg loss 0.000255482, throughput 3.9698K wps
[Epoch 152 Batch 90/162] avg loss 0.000225098, throughput 3.9715K wps
[Epoch 152 Batch 120/162] avg loss 0.000234712, throughput 3.9633K wps
[Epoch 152 Batch 150/162] avg loss 0.000231387, throughput 3.9629K wps
Begin Testing...
[Epoch 152] train avg loss 0.000238108, dev acc 0.9122, dev avg loss 0.278315, throughput 3.98382K wps
[Epoch 153 Batch 30/162] avg loss 0.000241282, throughput 4.04564K wps
[Epoch 153 Batch 60/162] avg loss 0.000222533, throughput 3.95545K wps
[Epoch 153 Batch 90/162] avg loss 0.00023099, throughput 3.96474K wps
[Epoch 153 Batch 120/162] avg loss 0.000214246, throughput 3.96798K wps
[Epoch 153 Batch 150/162] avg loss 0.000210934, throughput 3.96926K wps
Begin Testing...
[Epoch 153] train avg loss 0.000222158, dev acc 0.9111, dev avg loss 0.2797, throughput 3.97913K wps
[Epoch 154 Batch 30/162] avg loss 0.000252195, throughput 4.05927K wps
[Epoch 154 Batch 60/162] avg loss 0.00019601, throughput 3.95632K wps
[Epoch 154 Batch 90/162] avg loss 0.00019816, throughput 3.96773K wps
[Epoch 154 Batch 120/162] avg loss 0.00020243, throughput 3.96261K wps
[Epoch 154 Batch 150/162] avg loss 0.000227606, throughput 3.94976K wps
Begin Testing...
[Epoch 154] train avg loss 0.000214146, dev acc 0.9111, dev avg loss 0.280867, throughput 3.97725K wps
[Epoch 155 Batch 30/162] avg loss 0.000220064, throughput 4.05541K wps
[Epoch 155 Batch 60/162] avg loss 0.000206332, throughput 3.96794K wps
[Epoch 155 Batch 90/162] avg loss 0.000231004, throughput 3.96109K wps
[Epoch 155 Batch 120/162] avg loss 0.000213746, throughput 3.96122K wps
[Epoch 155 Batch 150/162] avg loss 0.000255822, throughput 3.96529K wps
Begin Testing...
[Epoch 155] train avg loss 0.000222591, dev acc 0.9100, dev avg loss 0.279343, throughput 3.98142K wps
[Epoch 156 Batch 30/162] avg loss 0.000238299, throughput 4.0593K wps
[Epoch 156 Batch 60/162] avg loss 0.000205229, throughput 3.95536K wps
[Epoch 156 Batch 90/162] avg loss 0.00019035, throughput 3.9638K wps
[Epoch 156 Batch 120/162] avg loss 0.00019702, throughput 3.96243K wps
[Epoch 156 Batch 150/162] avg loss 0.000234411, throughput 3.96227K wps
Begin Testing...
[Epoch 156] train avg loss 0.00021079, dev acc 0.9100, dev avg loss 0.280599, throughput 3.97887K wps
[Epoch 157 Batch 30/162] avg loss 0.000194965, throughput 4.06654K wps
[Epoch 157 Batch 60/162] avg loss 0.000229683, throughput 3.96447K wps
[Epoch 157 Batch 90/162] avg loss 0.000207894, throughput 3.96205K wps
[Epoch 157 Batch 120/162] avg loss 0.000222481, throughput 3.95822K wps
[Epoch 157 Batch 150/162] avg loss 0.000223684, throughput 3.95765K wps
Begin Testing...
[Epoch 157] train avg loss 0.00021742, dev acc 0.9111, dev avg loss 0.28125, throughput 3.9808K wps
[Epoch 158 Batch 30/162] avg loss 0.000225645, throughput 4.0636K wps
[Epoch 158 Batch 60/162] avg loss 0.000209046, throughput 3.95964K wps
[Epoch 158 Batch 90/162] avg loss 0.000232933, throughput 3.95939K wps
[Epoch 158 Batch 120/162] avg loss 0.000220676, throughput 3.96253K wps
[Epoch 158 Batch 150/162] avg loss 0.000203529, throughput 3.96817K wps
Begin Testing...
[Epoch 158] train avg loss 0.00021343, dev acc 0.9111, dev avg loss 0.28183, throughput 3.98046K wps
[Epoch 159 Batch 30/162] avg loss 0.00020617, throughput 4.06053K wps
[Epoch 159 Batch 60/162] avg loss 0.000237315, throughput 3.9679K wps
[Epoch 159 Batch 90/162] avg loss 0.000197427, throughput 3.95843K wps
[Epoch 159 Batch 120/162] avg loss 0.000181232, throughput 3.96986K wps
[Epoch 159 Batch 150/162] avg loss 0.000200353, throughput 3.96659K wps
Begin Testing...
[Epoch 159] train avg loss 0.000205385, dev acc 0.9122, dev avg loss 0.281691, throughput 3.98225K wps
[Epoch 160 Batch 30/162] avg loss 0.000179891, throughput 4.06113K wps
[Epoch 160 Batch 60/162] avg loss 0.000182207, throughput 3.96434K wps
[Epoch 160 Batch 90/162] avg loss 0.000195388, throughput 3.96351K wps
[Epoch 160 Batch 120/162] avg loss 0.000225769, throughput 3.96587K wps
[Epoch 160 Batch 150/162] avg loss 0.000210858, throughput 3.97162K wps
Begin Testing...
[Epoch 160] train avg loss 0.000203031, dev acc 0.9122, dev avg loss 0.282181, throughput 3.98366K wps
[Epoch 161 Batch 30/162] avg loss 0.000217912, throughput 4.0497K wps
[Epoch 161 Batch 60/162] avg loss 0.000197265, throughput 3.96889K wps
[Epoch 161 Batch 90/162] avg loss 0.000200799, throughput 3.96021K wps
[Epoch 161 Batch 120/162] avg loss 0.000212203, throughput 3.96535K wps
[Epoch 161 Batch 150/162] avg loss 0.000216474, throughput 3.96975K wps
Begin Testing...
[Epoch 161] train avg loss 0.000212495, dev acc 0.9122, dev avg loss 0.282766, throughput 3.9816K wps
[Epoch 162 Batch 30/162] avg loss 0.000177154, throughput 4.05963K wps
[Epoch 162 Batch 60/162] avg loss 0.000208011, throughput 3.97102K wps
[Epoch 162 Batch 90/162] avg loss 0.000198618, throughput 3.96642K wps
[Epoch 162 Batch 120/162] avg loss 0.000249618, throughput 3.96458K wps
[Epoch 162 Batch 150/162] avg loss 0.000181598, throughput 3.96292K wps
Begin Testing...
[Epoch 162] train avg loss 0.000204718, dev acc 0.9122, dev avg loss 0.284415, throughput 3.98306K wps
[Epoch 163 Batch 30/162] avg loss 0.000218274, throughput 4.0462K wps
[Epoch 163 Batch 60/162] avg loss 0.000184344, throughput 3.94715K wps
[Epoch 163 Batch 90/162] avg loss 0.000197826, throughput 3.96736K wps
[Epoch 163 Batch 120/162] avg loss 0.000191189, throughput 3.96258K wps
[Epoch 163 Batch 150/162] avg loss 0.000180939, throughput 3.9621K wps
Begin Testing...
[Epoch 163] train avg loss 0.000194576, dev acc 0.9111, dev avg loss 0.283214, throughput 3.97615K wps
[Epoch 164 Batch 30/162] avg loss 0.000179553, throughput 4.04833K wps
[Epoch 164 Batch 60/162] avg loss 0.000202025, throughput 3.97455K wps
[Epoch 164 Batch 90/162] avg loss 0.000207713, throughput 3.95753K wps
[Epoch 164 Batch 120/162] avg loss 0.000180121, throughput 3.95584K wps
[Epoch 164 Batch 150/162] avg loss 0.000174722, throughput 3.96749K wps
Begin Testing...
[Epoch 164] train avg loss 0.00018864, dev acc 0.9100, dev avg loss 0.284027, throughput 3.97933K wps
[Epoch 165 Batch 30/162] avg loss 0.000189448, throughput 4.06501K wps
[Epoch 165 Batch 60/162] avg loss 0.000182415, throughput 3.96548K wps
[Epoch 165 Batch 90/162] avg loss 0.000238556, throughput 3.96611K wps
[Epoch 165 Batch 120/162] avg loss 0.000197077, throughput 3.96159K wps
[Epoch 165 Batch 150/162] avg loss 0.0002319, throughput 3.97029K wps
Begin Testing...
[Epoch 165] train avg loss 0.000204037, dev acc 0.9111, dev avg loss 0.285041, throughput 3.98317K wps
[Epoch 166 Batch 30/162] avg loss 0.000196634, throughput 4.05567K wps
[Epoch 166 Batch 60/162] avg loss 0.000225514, throughput 3.9652K wps
[Epoch 166 Batch 90/162] avg loss 0.000219118, throughput 3.96729K wps
[Epoch 166 Batch 120/162] avg loss 0.000190914, throughput 3.96135K wps
[Epoch 166 Batch 150/162] avg loss 0.000218467, throughput 3.96658K wps
Begin Testing...
[Epoch 166] train avg loss 0.000208286, dev acc 0.9111, dev avg loss 0.284103, throughput 3.98217K wps
[Epoch 167 Batch 30/162] avg loss 0.000209437, throughput 4.06723K wps
[Epoch 167 Batch 60/162] avg loss 0.000164783, throughput 3.9679K wps
[Epoch 167 Batch 90/162] avg loss 0.000185919, throughput 3.9662K wps
[Epoch 167 Batch 120/162] avg loss 0.00017631, throughput 3.96027K wps
[Epoch 167 Batch 150/162] avg loss 0.000217162, throughput 3.95985K wps
Begin Testing...
[Epoch 167] train avg loss 0.00019094, dev acc 0.9111, dev avg loss 0.284963, throughput 3.98175K wps
[Epoch 168 Batch 30/162] avg loss 0.000195164, throughput 4.06893K wps
[Epoch 168 Batch 60/162] avg loss 0.000165451, throughput 3.96478K wps
[Epoch 168 Batch 90/162] avg loss 0.000183776, throughput 3.9664K wps
[Epoch 168 Batch 120/162] avg loss 0.000185753, throughput 3.96803K wps
[Epoch 168 Batch 150/162] avg loss 0.000197329, throughput 3.95514K wps
Begin Testing...
[Epoch 168] train avg loss 0.0001867, dev acc 0.9111, dev avg loss 0.287403, throughput 3.98182K wps
[Epoch 169 Batch 30/162] avg loss 0.00017335, throughput 4.05533K wps
[Epoch 169 Batch 60/162] avg loss 0.000189476, throughput 3.96393K wps
[Epoch 169 Batch 90/162] avg loss 0.00020599, throughput 3.97127K wps
[Epoch 169 Batch 120/162] avg loss 0.000196279, throughput 3.96292K wps
[Epoch 169 Batch 150/162] avg loss 0.000169499, throughput 3.9602K wps
Begin Testing...
[Epoch 169] train avg loss 0.00018952, dev acc 0.9111, dev avg loss 0.28705, throughput 3.98089K wps
[Epoch 170 Batch 30/162] avg loss 0.000160955, throughput 4.04253K wps
[Epoch 170 Batch 60/162] avg loss 0.000172161, throughput 3.9695K wps
[Epoch 170 Batch 90/162] avg loss 0.000208335, throughput 3.96955K wps
[Epoch 170 Batch 120/162] avg loss 0.000174425, throughput 3.96224K wps
[Epoch 170 Batch 150/162] avg loss 0.000201569, throughput 3.95588K wps
Begin Testing...
[Epoch 170] train avg loss 0.000181874, dev acc 0.9122, dev avg loss 0.286807, throughput 3.97745K wps
[Epoch 171 Batch 30/162] avg loss 0.000183457, throughput 4.06667K wps
[Epoch 171 Batch 60/162] avg loss 0.000207235, throughput 3.95035K wps
[Epoch 171 Batch 90/162] avg loss 0.00016232, throughput 3.96986K wps
[Epoch 171 Batch 120/162] avg loss 0.000163162, throughput 3.9675K wps
[Epoch 171 Batch 150/162] avg loss 0.000177221, throughput 3.96153K wps
Begin Testing...
[Epoch 171] train avg loss 0.000181522, dev acc 0.9122, dev avg loss 0.287826, throughput 3.98131K wps
[Epoch 172 Batch 30/162] avg loss 0.000207004, throughput 4.05023K wps
[Epoch 172 Batch 60/162] avg loss 0.000158188, throughput 3.96114K wps
[Epoch 172 Batch 90/162] avg loss 0.00018184, throughput 3.97196K wps
[Epoch 172 Batch 120/162] avg loss 0.000151456, throughput 3.97293K wps
[Epoch 172 Batch 150/162] avg loss 0.000159483, throughput 3.96786K wps
Begin Testing...
[Epoch 172] train avg loss 0.000173057, dev acc 0.9100, dev avg loss 0.286652, throughput 3.98232K wps
[Epoch 173 Batch 30/162] avg loss 0.00017947, throughput 4.05203K wps
[Epoch 173 Batch 60/162] avg loss 0.00017459, throughput 3.96511K wps
[Epoch 173 Batch 90/162] avg loss 0.000212713, throughput 3.96156K wps
[Epoch 173 Batch 120/162] avg loss 0.000163979, throughput 3.96083K wps
[Epoch 173 Batch 150/162] avg loss 0.0001597, throughput 3.96874K wps
Begin Testing...
[Epoch 173] train avg loss 0.000178317, dev acc 0.9100, dev avg loss 0.287719, throughput 3.9799K wps
[Epoch 174 Batch 30/162] avg loss 0.000165608, throughput 4.05007K wps
[Epoch 174 Batch 60/162] avg loss 0.000182333, throughput 3.9666K wps
[Epoch 174 Batch 90/162] avg loss 0.000153898, throughput 3.9741K wps
[Epoch 174 Batch 120/162] avg loss 0.000179472, throughput 3.9614K wps
[Epoch 174 Batch 150/162] avg loss 0.000180824, throughput 3.9685K wps
Begin Testing...
[Epoch 174] train avg loss 0.000173116, dev acc 0.9111, dev avg loss 0.287268, throughput 3.98252K wps
[Epoch 175 Batch 30/162] avg loss 0.000196313, throughput 4.04766K wps
[Epoch 175 Batch 60/162] avg loss 0.000164437, throughput 3.96457K wps
[Epoch 175 Batch 90/162] avg loss 0.000151833, throughput 3.93871K wps
[Epoch 175 Batch 120/162] avg loss 0.000161586, throughput 3.95458K wps
[Epoch 175 Batch 150/162] avg loss 0.000188135, throughput 3.96065K wps
Begin Testing...
[Epoch 175] train avg loss 0.000170201, dev acc 0.9111, dev avg loss 0.287834, throughput 3.97121K wps
[Epoch 176 Batch 30/162] avg loss 0.000216053, throughput 4.06891K wps
[Epoch 176 Batch 60/162] avg loss 0.000153729, throughput 3.95931K wps
[Epoch 176 Batch 90/162] avg loss 0.000158809, throughput 3.96513K wps
[Epoch 176 Batch 120/162] avg loss 0.000140777, throughput 3.95951K wps
[Epoch 176 Batch 150/162] avg loss 0.000147583, throughput 3.94912K wps
Begin Testing...
[Epoch 176] train avg loss 0.00016411, dev acc 0.9111, dev avg loss 0.288219, throughput 3.97647K wps
[Epoch 177 Batch 30/162] avg loss 0.000188761, throughput 4.04625K wps
[Epoch 177 Batch 60/162] avg loss 0.000162412, throughput 3.96337K wps
[Epoch 177 Batch 90/162] avg loss 0.000201792, throughput 3.97199K wps
[Epoch 177 Batch 120/162] avg loss 0.000144498, throughput 3.96327K wps
[Epoch 177 Batch 150/162] avg loss 0.000144358, throughput 3.95989K wps
Begin Testing...
[Epoch 177] train avg loss 0.000165788, dev acc 0.9111, dev avg loss 0.290721, throughput 3.97963K wps
[Epoch 178 Batch 30/162] avg loss 0.000148289, throughput 4.05191K wps
[Epoch 178 Batch 60/162] avg loss 0.000158388, throughput 3.95864K wps
[Epoch 178 Batch 90/162] avg loss 0.000189077, throughput 3.96157K wps
[Epoch 178 Batch 120/162] avg loss 0.000161768, throughput 3.96419K wps
[Epoch 178 Batch 150/162] avg loss 0.000162716, throughput 3.96035K wps
Begin Testing...
[Epoch 178] train avg loss 0.000166294, dev acc 0.9111, dev avg loss 0.290534, throughput 3.97782K wps
[Epoch 179 Batch 30/162] avg loss 0.000155635, throughput 4.05088K wps
[Epoch 179 Batch 60/162] avg loss 0.000162064, throughput 3.96738K wps
[Epoch 179 Batch 90/162] avg loss 0.0001438, throughput 3.96467K wps
[Epoch 179 Batch 120/162] avg loss 0.000134596, throughput 3.96615K wps
[Epoch 179 Batch 150/162] avg loss 0.000144415, throughput 3.95393K wps
Begin Testing...
[Epoch 179] train avg loss 0.000149102, dev acc 0.9111, dev avg loss 0.291008, throughput 3.979K wps
[Epoch 180 Batch 30/162] avg loss 0.00015598, throughput 4.06303K wps
[Epoch 180 Batch 60/162] avg loss 0.00014637, throughput 3.96248K wps
[Epoch 180 Batch 90/162] avg loss 0.000189876, throughput 3.9675K wps
[Epoch 180 Batch 120/162] avg loss 0.000155194, throughput 3.9559K wps
[Epoch 180 Batch 150/162] avg loss 0.000179916, throughput 3.96207K wps
Begin Testing...
[Epoch 180] train avg loss 0.000166144, dev acc 0.9122, dev avg loss 0.290711, throughput 3.98011K wps
[Epoch 181 Batch 30/162] avg loss 0.000153292, throughput 4.06346K wps
[Epoch 181 Batch 60/162] avg loss 0.000179751, throughput 3.972K wps
[Epoch 181 Batch 90/162] avg loss 0.000167074, throughput 3.96054K wps
[Epoch 181 Batch 120/162] avg loss 0.000164351, throughput 3.97059K wps
[Epoch 181 Batch 150/162] avg loss 0.000146089, throughput 3.96386K wps
Begin Testing...
[Epoch 181] train avg loss 0.000159746, dev acc 0.9100, dev avg loss 0.292046, throughput 3.98358K wps
[Epoch 182 Batch 30/162] avg loss 0.000170958, throughput 4.06042K wps
[Epoch 182 Batch 60/162] avg loss 0.000175511, throughput 3.9663K wps
[Epoch 182 Batch 90/162] avg loss 0.000171037, throughput 3.96139K wps
[Epoch 182 Batch 120/162] avg loss 0.000165009, throughput 3.96659K wps
[Epoch 182 Batch 150/162] avg loss 0.000153451, throughput 3.9638K wps
Begin Testing...
[Epoch 182] train avg loss 0.000164514, dev acc 0.9111, dev avg loss 0.291345, throughput 3.98172K wps
[Epoch 183 Batch 30/162] avg loss 0.000149763, throughput 4.06009K wps
[Epoch 183 Batch 60/162] avg loss 0.000140553, throughput 3.97364K wps
[Epoch 183 Batch 90/162] avg loss 0.000159256, throughput 3.97059K wps
[Epoch 183 Batch 120/162] avg loss 0.000176276, throughput 3.96361K wps
[Epoch 183 Batch 150/162] avg loss 0.000161188, throughput 3.97062K wps
Begin Testing...
[Epoch 183] train avg loss 0.000160153, dev acc 0.9111, dev avg loss 0.292061, throughput 3.98564K wps
[Epoch 184 Batch 30/162] avg loss 0.000152425, throughput 4.05795K wps
[Epoch 184 Batch 60/162] avg loss 0.00018964, throughput 3.95757K wps
[Epoch 184 Batch 90/162] avg loss 0.000150301, throughput 3.96937K wps
[Epoch 184 Batch 120/162] avg loss 0.000188041, throughput 3.96457K wps
[Epoch 184 Batch 150/162] avg loss 0.000139026, throughput 3.95387K wps
Begin Testing...
[Epoch 184] train avg loss 0.000165477, dev acc 0.9111, dev avg loss 0.293337, throughput 3.97878K wps
[Epoch 185 Batch 30/162] avg loss 0.000171088, throughput 4.06389K wps
[Epoch 185 Batch 60/162] avg loss 0.000178324, throughput 3.95072K wps
[Epoch 185 Batch 90/162] avg loss 0.000140588, throughput 3.95758K wps
[Epoch 185 Batch 120/162] avg loss 0.000163402, throughput 3.96234K wps
[Epoch 185 Batch 150/162] avg loss 0.000158087, throughput 3.97185K wps
Begin Testing...
[Epoch 185] train avg loss 0.00016281, dev acc 0.9100, dev avg loss 0.293438, throughput 3.97983K wps
[Epoch 186 Batch 30/162] avg loss 0.000147973, throughput 4.05083K wps
[Epoch 186 Batch 60/162] avg loss 0.000156419, throughput 3.96929K wps
[Epoch 186 Batch 90/162] avg loss 0.000179855, throughput 3.95476K wps
[Epoch 186 Batch 120/162] avg loss 0.000133935, throughput 3.97015K wps
[Epoch 186 Batch 150/162] avg loss 0.000176698, throughput 3.95621K wps
Begin Testing...
[Epoch 186] train avg loss 0.000157999, dev acc 0.9111, dev avg loss 0.293686, throughput 3.97796K wps
[Epoch 187 Batch 30/162] avg loss 0.00013571, throughput 4.04737K wps
[Epoch 187 Batch 60/162] avg loss 0.000142934, throughput 3.94981K wps
[Epoch 187 Batch 90/162] avg loss 0.000145526, throughput 3.96104K wps
[Epoch 187 Batch 120/162] avg loss 0.000143498, throughput 3.95778K wps
[Epoch 187 Batch 150/162] avg loss 0.000160011, throughput 3.96439K wps
Begin Testing...
[Epoch 187] train avg loss 0.000146148, dev acc 0.9100, dev avg loss 0.293828, throughput 3.97507K wps
[Epoch 188 Batch 30/162] avg loss 0.000178493, throughput 4.05665K wps
[Epoch 188 Batch 60/162] avg loss 0.000156897, throughput 3.96017K wps
[Epoch 188 Batch 90/162] avg loss 0.000156878, throughput 3.96034K wps
[Epoch 188 Batch 120/162] avg loss 0.000165616, throughput 3.96549K wps
[Epoch 188 Batch 150/162] avg loss 0.000154209, throughput 3.96386K wps
Begin Testing...
[Epoch 188] train avg loss 0.000159622, dev acc 0.9111, dev avg loss 0.293854, throughput 3.97969K wps
[Epoch 189 Batch 30/162] avg loss 0.000159059, throughput 4.07015K wps
[Epoch 189 Batch 60/162] avg loss 0.000149779, throughput 3.96902K wps
[Epoch 189 Batch 90/162] avg loss 0.00014292, throughput 3.95652K wps
[Epoch 189 Batch 120/162] avg loss 0.000134397, throughput 3.97285K wps
[Epoch 189 Batch 150/162] avg loss 0.000155541, throughput 3.9628K wps
Begin Testing...
[Epoch 189] train avg loss 0.000151208, dev acc 0.9111, dev avg loss 0.293918, throughput 3.9842K wps
[Epoch 190 Batch 30/162] avg loss 0.000151971, throughput 4.0477K wps
[Epoch 190 Batch 60/162] avg loss 0.000168004, throughput 3.96593K wps
[Epoch 190 Batch 90/162] avg loss 0.000135056, throughput 3.9652K wps
[Epoch 190 Batch 120/162] avg loss 0.000157806, throughput 3.96737K wps
[Epoch 190 Batch 150/162] avg loss 0.000125346, throughput 3.95672K wps
Begin Testing...
[Epoch 190] train avg loss 0.000150182, dev acc 0.9111, dev avg loss 0.293937, throughput 3.97908K wps
[Epoch 191 Batch 30/162] avg loss 0.00012811, throughput 4.05978K wps
[Epoch 191 Batch 60/162] avg loss 0.000129762, throughput 3.96624K wps
[Epoch 191 Batch 90/162] avg loss 0.000151346, throughput 3.9593K wps
[Epoch 191 Batch 120/162] avg loss 0.000134146, throughput 3.95633K wps
[Epoch 191 Batch 150/162] avg loss 0.000139206, throughput 3.9707K wps
Begin Testing...
[Epoch 191] train avg loss 0.000135344, dev acc 0.9100, dev avg loss 0.295159, throughput 3.98019K wps
[Epoch 192 Batch 30/162] avg loss 0.000141885, throughput 4.04716K wps
[Epoch 192 Batch 60/162] avg loss 0.000135111, throughput 3.96872K wps
[Epoch 192 Batch 90/162] avg loss 0.000151, throughput 3.96967K wps
[Epoch 192 Batch 120/162] avg loss 0.000131379, throughput 3.96903K wps
[Epoch 192 Batch 150/162] avg loss 0.000153848, throughput 3.96689K wps
Begin Testing...
[Epoch 192] train avg loss 0.000143656, dev acc 0.9111, dev avg loss 0.294748, throughput 3.9828K wps
[Epoch 193 Batch 30/162] avg loss 0.00015018, throughput 4.05095K wps
[Epoch 193 Batch 60/162] avg loss 0.000152008, throughput 3.96168K wps
[Epoch 193 Batch 90/162] avg loss 0.000145599, throughput 3.96986K wps
[Epoch 193 Batch 120/162] avg loss 0.000153926, throughput 3.96025K wps
[Epoch 193 Batch 150/162] avg loss 0.000142273, throughput 3.95927K wps
Begin Testing...
[Epoch 193] train avg loss 0.000148035, dev acc 0.9100, dev avg loss 0.295987, throughput 3.97943K wps
[Epoch 194 Batch 30/162] avg loss 0.000174025, throughput 4.05838K wps
[Epoch 194 Batch 60/162] avg loss 0.000134294, throughput 3.96137K wps
[Epoch 194 Batch 90/162] avg loss 0.000126534, throughput 3.96263K wps
[Epoch 194 Batch 120/162] avg loss 0.000131564, throughput 3.95324K wps
[Epoch 194 Batch 150/162] avg loss 0.00013914, throughput 3.96167K wps
Begin Testing...
[Epoch 194] train avg loss 0.000140794, dev acc 0.9100, dev avg loss 0.295892, throughput 3.97723K wps
[Epoch 195 Batch 30/162] avg loss 0.000156225, throughput 4.06321K wps
[Epoch 195 Batch 60/162] avg loss 0.000141584, throughput 3.96917K wps
[Epoch 195 Batch 90/162] avg loss 0.000122573, throughput 3.95354K wps
[Epoch 195 Batch 120/162] avg loss 0.000150675, throughput 3.96886K wps
[Epoch 195 Batch 150/162] avg loss 0.000143625, throughput 3.97192K wps
Begin Testing...
[Epoch 195] train avg loss 0.000145759, dev acc 0.9111, dev avg loss 0.295452, throughput 3.98389K wps
[Epoch 196 Batch 30/162] avg loss 0.000113027, throughput 4.06782K wps
[Epoch 196 Batch 60/162] avg loss 0.000202955, throughput 3.9664K wps
[Epoch 196 Batch 90/162] avg loss 0.000144135, throughput 3.95653K wps
[Epoch 196 Batch 120/162] avg loss 0.000151751, throughput 3.94051K wps
[Epoch 196 Batch 150/162] avg loss 0.000149353, throughput 3.95902K wps
Begin Testing...
[Epoch 196] train avg loss 0.000152365, dev acc 0.9100, dev avg loss 0.296364, throughput 3.97612K wps
[Epoch 197 Batch 30/162] avg loss 0.000158936, throughput 4.05407K wps
[Epoch 197 Batch 60/162] avg loss 0.000163788, throughput 3.95716K wps
[Epoch 197 Batch 90/162] avg loss 0.000168452, throughput 3.95924K wps
[Epoch 197 Batch 120/162] avg loss 0.000130512, throughput 3.95202K wps
[Epoch 197 Batch 150/162] avg loss 0.000166015, throughput 3.95995K wps
Begin Testing...
[Epoch 197] train avg loss 0.000152694, dev acc 0.9111, dev avg loss 0.296942, throughput 3.97558K wps
[Epoch 198 Batch 30/162] avg loss 0.000107527, throughput 4.05971K wps
[Epoch 198 Batch 60/162] avg loss 0.00011949, throughput 3.96435K wps
[Epoch 198 Batch 90/162] avg loss 0.000126499, throughput 3.95346K wps
[Epoch 198 Batch 120/162] avg loss 0.000130169, throughput 3.97216K wps
[Epoch 198 Batch 150/162] avg loss 0.000131534, throughput 3.96506K wps
Begin Testing...
[Epoch 198] train avg loss 0.00012417, dev acc 0.9111, dev avg loss 0.297007, throughput 3.98136K wps
[Epoch 199 Batch 30/162] avg loss 0.000144296, throughput 4.05752K wps
[Epoch 199 Batch 60/162] avg loss 0.000195428, throughput 3.95938K wps
[Epoch 199 Batch 90/162] avg loss 0.000146246, throughput 3.96406K wps
[Epoch 199 Batch 120/162] avg loss 0.000137788, throughput 3.96034K wps
[Epoch 199 Batch 150/162] avg loss 0.000108479, throughput 3.97325K wps
Begin Testing...
[Epoch 199] train avg loss 0.00014425, dev acc 0.9100, dev avg loss 0.297705, throughput 3.98042K wps
Test loss 0.240331, test acc 0.8960
Total time cost 1017.79s
[Epoch 0 Batch 30/162] avg loss 0.0140659, throughput 3.56165K wps
[Epoch 0 Batch 60/162] avg loss 0.0140114, throughput 3.97088K wps
[Epoch 0 Batch 90/162] avg loss 0.0137461, throughput 3.95574K wps
[Epoch 0 Batch 120/162] avg loss 0.01363, throughput 3.96686K wps
[Epoch 0 Batch 150/162] avg loss 0.0135188, throughput 3.95525K wps
Begin Testing...
[Epoch 0] train avg loss 0.013769, dev acc 0.6022, dev avg loss 0.666958, throughput 3.8817K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0132854, throughput 4.05025K wps
[Epoch 1 Batch 60/162] avg loss 0.0130778, throughput 3.96218K wps
[Epoch 1 Batch 90/162] avg loss 0.0130727, throughput 3.97172K wps
[Epoch 1 Batch 120/162] avg loss 0.0129263, throughput 3.95766K wps
[Epoch 1 Batch 150/162] avg loss 0.0127354, throughput 3.94919K wps
Begin Testing...
[Epoch 1] train avg loss 0.0129969, dev acc 0.8467, dev avg loss 0.626784, throughput 3.97701K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0124305, throughput 4.05678K wps
[Epoch 2 Batch 60/162] avg loss 0.0123844, throughput 3.97275K wps
[Epoch 2 Batch 90/162] avg loss 0.0124228, throughput 3.9718K wps
[Epoch 2 Batch 120/162] avg loss 0.0119817, throughput 3.97055K wps
[Epoch 2 Batch 150/162] avg loss 0.0118411, throughput 3.97156K wps
Begin Testing...
[Epoch 2] train avg loss 0.0121836, dev acc 0.8722, dev avg loss 0.582733, throughput 3.98625K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0116066, throughput 4.05614K wps
[Epoch 3 Batch 60/162] avg loss 0.0114874, throughput 3.9564K wps
[Epoch 3 Batch 90/162] avg loss 0.0111099, throughput 3.95719K wps
[Epoch 3 Batch 120/162] avg loss 0.0111989, throughput 3.96913K wps
[Epoch 3 Batch 150/162] avg loss 0.0109883, throughput 3.95628K wps
Begin Testing...
[Epoch 3] train avg loss 0.0112162, dev acc 0.8411, dev avg loss 0.531809, throughput 3.97798K wps
[Epoch 4 Batch 30/162] avg loss 0.0105699, throughput 4.05297K wps
[Epoch 4 Batch 60/162] avg loss 0.0102184, throughput 3.95563K wps
[Epoch 4 Batch 90/162] avg loss 0.0102269, throughput 3.96325K wps
[Epoch 4 Batch 120/162] avg loss 0.0100088, throughput 3.96343K wps
[Epoch 4 Batch 150/162] avg loss 0.00986022, throughput 3.95982K wps
Begin Testing...
[Epoch 4] train avg loss 0.0101319, dev acc 0.8733, dev avg loss 0.478087, throughput 3.9768K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.00958812, throughput 4.03987K wps
[Epoch 5 Batch 60/162] avg loss 0.00928843, throughput 3.95427K wps
[Epoch 5 Batch 90/162] avg loss 0.00932186, throughput 3.96278K wps
[Epoch 5 Batch 120/162] avg loss 0.00913098, throughput 3.9689K wps
[Epoch 5 Batch 150/162] avg loss 0.00914124, throughput 3.96703K wps
Begin Testing...
[Epoch 5] train avg loss 0.00926276, dev acc 0.8744, dev avg loss 0.433989, throughput 3.97768K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00857036, throughput 4.06464K wps
[Epoch 6 Batch 60/162] avg loss 0.00848618, throughput 3.97371K wps
[Epoch 6 Batch 90/162] avg loss 0.00854181, throughput 3.96072K wps
[Epoch 6 Batch 120/162] avg loss 0.00831443, throughput 3.95835K wps
[Epoch 6 Batch 150/162] avg loss 0.00836902, throughput 3.97299K wps
Begin Testing...
[Epoch 6] train avg loss 0.00844492, dev acc 0.8822, dev avg loss 0.397483, throughput 3.98493K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00781988, throughput 4.04031K wps
[Epoch 7 Batch 60/162] avg loss 0.00824924, throughput 3.96813K wps
[Epoch 7 Batch 90/162] avg loss 0.00781878, throughput 3.96171K wps
[Epoch 7 Batch 120/162] avg loss 0.00765126, throughput 3.96512K wps
[Epoch 7 Batch 150/162] avg loss 0.00784619, throughput 3.97522K wps
Begin Testing...
[Epoch 7] train avg loss 0.00782607, dev acc 0.8822, dev avg loss 0.369957, throughput 3.98139K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00747435, throughput 4.05719K wps
[Epoch 8 Batch 60/162] avg loss 0.00720253, throughput 3.9596K wps
[Epoch 8 Batch 90/162] avg loss 0.00753049, throughput 3.96137K wps
[Epoch 8 Batch 120/162] avg loss 0.00713017, throughput 3.95074K wps
[Epoch 8 Batch 150/162] avg loss 0.00721763, throughput 3.96841K wps
Begin Testing...
[Epoch 8] train avg loss 0.00729884, dev acc 0.8822, dev avg loss 0.346983, throughput 3.97783K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00700649, throughput 4.06408K wps
[Epoch 9 Batch 60/162] avg loss 0.00672683, throughput 3.97541K wps
[Epoch 9 Batch 90/162] avg loss 0.00694227, throughput 3.97451K wps
[Epoch 9 Batch 120/162] avg loss 0.00700002, throughput 3.96496K wps
[Epoch 9 Batch 150/162] avg loss 0.00669881, throughput 3.96029K wps
Begin Testing...
[Epoch 9] train avg loss 0.00685504, dev acc 0.8856, dev avg loss 0.32836, throughput 3.98483K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00664443, throughput 4.0474K wps
[Epoch 10 Batch 60/162] avg loss 0.00687904, throughput 3.96523K wps
[Epoch 10 Batch 90/162] avg loss 0.00625326, throughput 3.95244K wps
[Epoch 10 Batch 120/162] avg loss 0.00643132, throughput 3.97191K wps
[Epoch 10 Batch 150/162] avg loss 0.00641931, throughput 3.96289K wps
Begin Testing...
[Epoch 10] train avg loss 0.0065377, dev acc 0.8956, dev avg loss 0.313155, throughput 3.97858K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00615605, throughput 4.06384K wps
[Epoch 11 Batch 60/162] avg loss 0.00634718, throughput 3.96643K wps
[Epoch 11 Batch 90/162] avg loss 0.00610828, throughput 3.9559K wps
[Epoch 11 Batch 120/162] avg loss 0.0063342, throughput 3.97069K wps
[Epoch 11 Batch 150/162] avg loss 0.00596544, throughput 3.96293K wps
Begin Testing...
[Epoch 11] train avg loss 0.00621044, dev acc 0.9022, dev avg loss 0.300656, throughput 3.98202K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00611745, throughput 4.0449K wps
[Epoch 12 Batch 60/162] avg loss 0.00584205, throughput 3.96053K wps
[Epoch 12 Batch 90/162] avg loss 0.00574453, throughput 3.96825K wps
[Epoch 12 Batch 120/162] avg loss 0.00595551, throughput 3.96999K wps
[Epoch 12 Batch 150/162] avg loss 0.00602022, throughput 3.96255K wps
Begin Testing...
[Epoch 12] train avg loss 0.00594721, dev acc 0.8978, dev avg loss 0.289898, throughput 3.98018K wps
[Epoch 13 Batch 30/162] avg loss 0.00562922, throughput 4.05274K wps
[Epoch 13 Batch 60/162] avg loss 0.00564232, throughput 3.97003K wps
[Epoch 13 Batch 90/162] avg loss 0.00566119, throughput 3.96332K wps
[Epoch 13 Batch 120/162] avg loss 0.0056914, throughput 3.96199K wps
[Epoch 13 Batch 150/162] avg loss 0.00548844, throughput 3.9712K wps
Begin Testing...
[Epoch 13] train avg loss 0.00561677, dev acc 0.9044, dev avg loss 0.280072, throughput 3.98305K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00521273, throughput 4.05569K wps
[Epoch 14 Batch 60/162] avg loss 0.00565833, throughput 3.96651K wps
[Epoch 14 Batch 90/162] avg loss 0.00544198, throughput 3.96964K wps
[Epoch 14 Batch 120/162] avg loss 0.00511647, throughput 3.96918K wps
[Epoch 14 Batch 150/162] avg loss 0.00595719, throughput 3.9573K wps
Begin Testing...
[Epoch 14] train avg loss 0.0054636, dev acc 0.9044, dev avg loss 0.272703, throughput 3.98262K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00526326, throughput 4.069K wps
[Epoch 15 Batch 60/162] avg loss 0.00513931, throughput 3.96063K wps
[Epoch 15 Batch 90/162] avg loss 0.00554776, throughput 3.96182K wps
[Epoch 15 Batch 120/162] avg loss 0.0055023, throughput 3.96339K wps
[Epoch 15 Batch 150/162] avg loss 0.00514857, throughput 3.97929K wps
Begin Testing...
[Epoch 15] train avg loss 0.00528192, dev acc 0.9011, dev avg loss 0.266943, throughput 3.98444K wps
[Epoch 16 Batch 30/162] avg loss 0.00510088, throughput 4.06338K wps
[Epoch 16 Batch 60/162] avg loss 0.00521829, throughput 3.97282K wps
[Epoch 16 Batch 90/162] avg loss 0.00495317, throughput 3.96537K wps
[Epoch 16 Batch 120/162] avg loss 0.00475383, throughput 3.96882K wps
[Epoch 16 Batch 150/162] avg loss 0.00543149, throughput 3.97045K wps
Begin Testing...
[Epoch 16] train avg loss 0.0050496, dev acc 0.9056, dev avg loss 0.259307, throughput 3.98651K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.0049401, throughput 4.06134K wps
[Epoch 17 Batch 60/162] avg loss 0.00503888, throughput 3.96707K wps
[Epoch 17 Batch 90/162] avg loss 0.00503809, throughput 3.96644K wps
[Epoch 17 Batch 120/162] avg loss 0.00497179, throughput 3.97518K wps
[Epoch 17 Batch 150/162] avg loss 0.00487976, throughput 3.97574K wps
Begin Testing...
[Epoch 17] train avg loss 0.00494732, dev acc 0.9056, dev avg loss 0.253955, throughput 3.98778K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00490965, throughput 4.0628K wps
[Epoch 18 Batch 60/162] avg loss 0.00477998, throughput 3.9647K wps
[Epoch 18 Batch 90/162] avg loss 0.00473044, throughput 3.9582K wps
[Epoch 18 Batch 120/162] avg loss 0.00475049, throughput 3.95551K wps
[Epoch 18 Batch 150/162] avg loss 0.00450954, throughput 3.97666K wps
Begin Testing...
[Epoch 18] train avg loss 0.00473668, dev acc 0.9089, dev avg loss 0.249113, throughput 3.9813K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00485815, throughput 4.0615K wps
[Epoch 19 Batch 60/162] avg loss 0.00449809, throughput 3.98109K wps
[Epoch 19 Batch 90/162] avg loss 0.00482699, throughput 3.97108K wps
[Epoch 19 Batch 120/162] avg loss 0.00443956, throughput 3.97529K wps
[Epoch 19 Batch 150/162] avg loss 0.00454933, throughput 3.97149K wps
Begin Testing...
[Epoch 19] train avg loss 0.00463928, dev acc 0.9122, dev avg loss 0.244022, throughput 3.98911K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00471057, throughput 4.07123K wps
[Epoch 20 Batch 60/162] avg loss 0.0042631, throughput 3.96616K wps
[Epoch 20 Batch 90/162] avg loss 0.00446353, throughput 3.96361K wps
[Epoch 20 Batch 120/162] avg loss 0.00471943, throughput 3.97129K wps
[Epoch 20 Batch 150/162] avg loss 0.00421817, throughput 3.96913K wps
Begin Testing...
[Epoch 20] train avg loss 0.0044542, dev acc 0.9100, dev avg loss 0.239799, throughput 3.98709K wps
[Epoch 21 Batch 30/162] avg loss 0.00430883, throughput 4.06167K wps
[Epoch 21 Batch 60/162] avg loss 0.00441322, throughput 3.95957K wps
[Epoch 21 Batch 90/162] avg loss 0.00399769, throughput 3.964K wps
[Epoch 21 Batch 120/162] avg loss 0.00438504, throughput 3.97172K wps
[Epoch 21 Batch 150/162] avg loss 0.00429755, throughput 3.97733K wps
Begin Testing...
[Epoch 21] train avg loss 0.00430307, dev acc 0.9133, dev avg loss 0.235803, throughput 3.98465K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.00405923, throughput 4.0651K wps
[Epoch 22 Batch 60/162] avg loss 0.00399686, throughput 3.96089K wps
[Epoch 22 Batch 90/162] avg loss 0.00406616, throughput 3.97252K wps
[Epoch 22 Batch 120/162] avg loss 0.00452916, throughput 3.96815K wps
[Epoch 22 Batch 150/162] avg loss 0.00410566, throughput 3.94958K wps
Begin Testing...
[Epoch 22] train avg loss 0.00416751, dev acc 0.9178, dev avg loss 0.233082, throughput 3.98131K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00397356, throughput 4.04612K wps
[Epoch 23 Batch 60/162] avg loss 0.00419543, throughput 3.96908K wps
[Epoch 23 Batch 90/162] avg loss 0.00397911, throughput 3.96078K wps
[Epoch 23 Batch 120/162] avg loss 0.00417121, throughput 3.95149K wps
[Epoch 23 Batch 150/162] avg loss 0.00371416, throughput 3.96604K wps
Begin Testing...
[Epoch 23] train avg loss 0.0040199, dev acc 0.9144, dev avg loss 0.22948, throughput 3.9767K wps
[Epoch 24 Batch 30/162] avg loss 0.00392908, throughput 4.05775K wps
[Epoch 24 Batch 60/162] avg loss 0.00394934, throughput 3.97146K wps
[Epoch 24 Batch 90/162] avg loss 0.00405697, throughput 3.96458K wps
[Epoch 24 Batch 120/162] avg loss 0.00409845, throughput 3.9692K wps
[Epoch 24 Batch 150/162] avg loss 0.00371821, throughput 3.96329K wps
Begin Testing...
[Epoch 24] train avg loss 0.00390821, dev acc 0.9133, dev avg loss 0.226053, throughput 3.98201K wps
[Epoch 25 Batch 30/162] avg loss 0.00371249, throughput 4.05824K wps
[Epoch 25 Batch 60/162] avg loss 0.00405853, throughput 3.96558K wps
[Epoch 25 Batch 90/162] avg loss 0.00364943, throughput 3.96223K wps
[Epoch 25 Batch 120/162] avg loss 0.0036912, throughput 3.97156K wps
[Epoch 25 Batch 150/162] avg loss 0.00379592, throughput 3.96386K wps
Begin Testing...
[Epoch 25] train avg loss 0.00383213, dev acc 0.9133, dev avg loss 0.223755, throughput 3.98122K wps
[Epoch 26 Batch 30/162] avg loss 0.00403076, throughput 4.06044K wps
[Epoch 26 Batch 60/162] avg loss 0.00336618, throughput 3.96492K wps
[Epoch 26 Batch 90/162] avg loss 0.00364452, throughput 3.96167K wps
[Epoch 26 Batch 120/162] avg loss 0.00363685, throughput 3.96612K wps
[Epoch 26 Batch 150/162] avg loss 0.00375346, throughput 3.96678K wps
Begin Testing...
[Epoch 26] train avg loss 0.0036875, dev acc 0.9178, dev avg loss 0.220365, throughput 3.98202K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.0035763, throughput 4.05569K wps
[Epoch 27 Batch 60/162] avg loss 0.00360712, throughput 3.96742K wps
[Epoch 27 Batch 90/162] avg loss 0.00347687, throughput 3.97227K wps
[Epoch 27 Batch 120/162] avg loss 0.0038459, throughput 3.97433K wps
[Epoch 27 Batch 150/162] avg loss 0.00373362, throughput 3.97105K wps
Begin Testing...
[Epoch 27] train avg loss 0.00363491, dev acc 0.9200, dev avg loss 0.218286, throughput 3.98536K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00359824, throughput 4.06622K wps
[Epoch 28 Batch 60/162] avg loss 0.00359381, throughput 3.95009K wps
[Epoch 28 Batch 90/162] avg loss 0.00344732, throughput 3.96792K wps
[Epoch 28 Batch 120/162] avg loss 0.00343747, throughput 3.96717K wps
[Epoch 28 Batch 150/162] avg loss 0.00354643, throughput 3.96753K wps
Begin Testing...
[Epoch 28] train avg loss 0.00348587, dev acc 0.9189, dev avg loss 0.21594, throughput 3.98279K wps
[Epoch 29 Batch 30/162] avg loss 0.00368124, throughput 4.07115K wps
[Epoch 29 Batch 60/162] avg loss 0.00353381, throughput 3.98326K wps
[Epoch 29 Batch 90/162] avg loss 0.00319205, throughput 3.96633K wps
[Epoch 29 Batch 120/162] avg loss 0.00333365, throughput 3.96628K wps
[Epoch 29 Batch 150/162] avg loss 0.00324791, throughput 3.96516K wps
Begin Testing...
[Epoch 29] train avg loss 0.00335764, dev acc 0.9189, dev avg loss 0.213341, throughput 3.98738K wps
[Epoch 30 Batch 30/162] avg loss 0.00337642, throughput 4.06873K wps
[Epoch 30 Batch 60/162] avg loss 0.00300446, throughput 3.97501K wps
[Epoch 30 Batch 90/162] avg loss 0.00300898, throughput 3.95932K wps
[Epoch 30 Batch 120/162] avg loss 0.00367877, throughput 3.96852K wps
[Epoch 30 Batch 150/162] avg loss 0.0034593, throughput 3.9711K wps
Begin Testing...
[Epoch 30] train avg loss 0.0033387, dev acc 0.9200, dev avg loss 0.212767, throughput 3.98679K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00310923, throughput 4.07261K wps
[Epoch 31 Batch 60/162] avg loss 0.00313643, throughput 3.9789K wps
[Epoch 31 Batch 90/162] avg loss 0.00371719, throughput 3.96314K wps
[Epoch 31 Batch 120/162] avg loss 0.0030506, throughput 3.96042K wps
[Epoch 31 Batch 150/162] avg loss 0.0030959, throughput 3.97571K wps
Begin Testing...
[Epoch 31] train avg loss 0.0032233, dev acc 0.9211, dev avg loss 0.210519, throughput 3.98763K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00315501, throughput 4.0572K wps
[Epoch 32 Batch 60/162] avg loss 0.00304621, throughput 3.97215K wps
[Epoch 32 Batch 90/162] avg loss 0.00309089, throughput 3.96556K wps
[Epoch 32 Batch 120/162] avg loss 0.00315041, throughput 3.96107K wps
[Epoch 32 Batch 150/162] avg loss 0.00333279, throughput 3.97004K wps
Begin Testing...
[Epoch 32] train avg loss 0.00317618, dev acc 0.9233, dev avg loss 0.208625, throughput 3.98403K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/162] avg loss 0.00286383, throughput 4.0524K wps
[Epoch 33 Batch 60/162] avg loss 0.00315094, throughput 3.96335K wps
[Epoch 33 Batch 90/162] avg loss 0.00314843, throughput 3.9697K wps
[Epoch 33 Batch 120/162] avg loss 0.00331862, throughput 3.96676K wps
[Epoch 33 Batch 150/162] avg loss 0.00289814, throughput 3.97372K wps
Begin Testing...
[Epoch 33] train avg loss 0.00305328, dev acc 0.9244, dev avg loss 0.20622, throughput 3.98388K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00281169, throughput 4.0669K wps
[Epoch 34 Batch 60/162] avg loss 0.00281776, throughput 3.97698K wps
[Epoch 34 Batch 90/162] avg loss 0.00284858, throughput 3.96146K wps
[Epoch 34 Batch 120/162] avg loss 0.00317671, throughput 3.95892K wps
[Epoch 34 Batch 150/162] avg loss 0.00319149, throughput 3.95949K wps
Begin Testing...
[Epoch 34] train avg loss 0.00295295, dev acc 0.9200, dev avg loss 0.205863, throughput 3.98245K wps
[Epoch 35 Batch 30/162] avg loss 0.0028754, throughput 4.07368K wps
[Epoch 35 Batch 60/162] avg loss 0.00294843, throughput 3.96646K wps
[Epoch 35 Batch 90/162] avg loss 0.00282449, throughput 3.96501K wps
[Epoch 35 Batch 120/162] avg loss 0.00289983, throughput 3.96713K wps
[Epoch 35 Batch 150/162] avg loss 0.00282631, throughput 3.95764K wps
Begin Testing...
[Epoch 35] train avg loss 0.00289606, dev acc 0.9256, dev avg loss 0.204976, throughput 3.98442K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/162] avg loss 0.00311436, throughput 4.07286K wps
[Epoch 36 Batch 60/162] avg loss 0.00268752, throughput 3.96155K wps
[Epoch 36 Batch 90/162] avg loss 0.00280585, throughput 3.96316K wps
[Epoch 36 Batch 120/162] avg loss 0.00289591, throughput 3.97165K wps
[Epoch 36 Batch 150/162] avg loss 0.00241624, throughput 3.96607K wps
Begin Testing...
[Epoch 36] train avg loss 0.00276905, dev acc 0.9267, dev avg loss 0.201407, throughput 3.98542K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00313288, throughput 4.06453K wps
[Epoch 37 Batch 60/162] avg loss 0.00280448, throughput 3.97506K wps
[Epoch 37 Batch 90/162] avg loss 0.00244972, throughput 3.96619K wps
[Epoch 37 Batch 120/162] avg loss 0.00257705, throughput 3.96379K wps
[Epoch 37 Batch 150/162] avg loss 0.00261489, throughput 3.97252K wps
Begin Testing...
[Epoch 37] train avg loss 0.00272515, dev acc 0.9289, dev avg loss 0.200792, throughput 3.98517K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/162] avg loss 0.00241607, throughput 4.05971K wps
[Epoch 38 Batch 60/162] avg loss 0.00264984, throughput 3.97081K wps
[Epoch 38 Batch 90/162] avg loss 0.00271549, throughput 3.96878K wps
[Epoch 38 Batch 120/162] avg loss 0.00261898, throughput 3.95667K wps
[Epoch 38 Batch 150/162] avg loss 0.00282227, throughput 3.97081K wps
Begin Testing...
[Epoch 38] train avg loss 0.00262895, dev acc 0.9267, dev avg loss 0.199084, throughput 3.98402K wps
[Epoch 39 Batch 30/162] avg loss 0.00247104, throughput 4.07485K wps
[Epoch 39 Batch 60/162] avg loss 0.00248359, throughput 3.95857K wps
[Epoch 39 Batch 90/162] avg loss 0.00238276, throughput 3.96056K wps
[Epoch 39 Batch 120/162] avg loss 0.00251283, throughput 3.97245K wps
[Epoch 39 Batch 150/162] avg loss 0.00267697, throughput 3.96251K wps
Begin Testing...
[Epoch 39] train avg loss 0.00249386, dev acc 0.9267, dev avg loss 0.197403, throughput 3.98446K wps
[Epoch 40 Batch 30/162] avg loss 0.00244572, throughput 4.06903K wps
[Epoch 40 Batch 60/162] avg loss 0.00223807, throughput 3.96452K wps
[Epoch 40 Batch 90/162] avg loss 0.00282412, throughput 3.97777K wps
[Epoch 40 Batch 120/162] avg loss 0.00255206, throughput 3.9652K wps
[Epoch 40 Batch 150/162] avg loss 0.0022664, throughput 3.96824K wps
Begin Testing...
[Epoch 40] train avg loss 0.0024976, dev acc 0.9289, dev avg loss 0.196585, throughput 3.98742K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/162] avg loss 0.00229451, throughput 4.05246K wps
[Epoch 41 Batch 60/162] avg loss 0.00224535, throughput 3.97247K wps
[Epoch 41 Batch 90/162] avg loss 0.00249075, throughput 3.9653K wps
[Epoch 41 Batch 120/162] avg loss 0.00247556, throughput 3.97227K wps
[Epoch 41 Batch 150/162] avg loss 0.00236546, throughput 3.97443K wps
Begin Testing...
[Epoch 41] train avg loss 0.00239325, dev acc 0.9300, dev avg loss 0.196166, throughput 3.98534K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/162] avg loss 0.00215734, throughput 4.06011K wps
[Epoch 42 Batch 60/162] avg loss 0.00255716, throughput 3.9688K wps
[Epoch 42 Batch 90/162] avg loss 0.0023641, throughput 3.97324K wps
[Epoch 42 Batch 120/162] avg loss 0.00227251, throughput 3.97575K wps
[Epoch 42 Batch 150/162] avg loss 0.00234554, throughput 3.97053K wps
Begin Testing...
[Epoch 42] train avg loss 0.00233066, dev acc 0.9311, dev avg loss 0.193223, throughput 3.98769K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/162] avg loss 0.00234069, throughput 4.05387K wps
[Epoch 43 Batch 60/162] avg loss 0.00214056, throughput 3.96707K wps
[Epoch 43 Batch 90/162] avg loss 0.00228334, throughput 3.9537K wps
[Epoch 43 Batch 120/162] avg loss 0.00223321, throughput 3.96589K wps
[Epoch 43 Batch 150/162] avg loss 0.0022179, throughput 3.96633K wps
Begin Testing...
[Epoch 43] train avg loss 0.0022434, dev acc 0.9300, dev avg loss 0.192601, throughput 3.98025K wps
[Epoch 44 Batch 30/162] avg loss 0.00219713, throughput 4.06252K wps
[Epoch 44 Batch 60/162] avg loss 0.00216406, throughput 3.95679K wps
[Epoch 44 Batch 90/162] avg loss 0.00225355, throughput 3.97991K wps
[Epoch 44 Batch 120/162] avg loss 0.0023573, throughput 3.96915K wps
[Epoch 44 Batch 150/162] avg loss 0.00207768, throughput 3.95864K wps
Begin Testing...
[Epoch 44] train avg loss 0.00222891, dev acc 0.9333, dev avg loss 0.192082, throughput 3.98354K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/162] avg loss 0.00219366, throughput 4.06805K wps
[Epoch 45 Batch 60/162] avg loss 0.00206465, throughput 3.96324K wps
[Epoch 45 Batch 90/162] avg loss 0.00212545, throughput 3.96235K wps
[Epoch 45 Batch 120/162] avg loss 0.00219552, throughput 3.96683K wps
[Epoch 45 Batch 150/162] avg loss 0.00218915, throughput 3.96667K wps
Begin Testing...
[Epoch 45] train avg loss 0.00213419, dev acc 0.9322, dev avg loss 0.191338, throughput 3.98235K wps
[Epoch 46 Batch 30/162] avg loss 0.00223346, throughput 4.06291K wps
[Epoch 46 Batch 60/162] avg loss 0.00237827, throughput 3.96694K wps
[Epoch 46 Batch 90/162] avg loss 0.00187821, throughput 3.94801K wps
[Epoch 46 Batch 120/162] avg loss 0.00201111, throughput 3.95722K wps
[Epoch 46 Batch 150/162] avg loss 0.00188923, throughput 3.97469K wps
Begin Testing...
[Epoch 46] train avg loss 0.00206788, dev acc 0.9300, dev avg loss 0.190966, throughput 3.98026K wps
[Epoch 47 Batch 30/162] avg loss 0.00183952, throughput 4.06151K wps
[Epoch 47 Batch 60/162] avg loss 0.00223898, throughput 3.96379K wps
[Epoch 47 Batch 90/162] avg loss 0.00184192, throughput 3.97005K wps
[Epoch 47 Batch 120/162] avg loss 0.00211605, throughput 3.97598K wps
[Epoch 47 Batch 150/162] avg loss 0.00214269, throughput 3.96379K wps
Begin Testing...
[Epoch 47] train avg loss 0.00204474, dev acc 0.9333, dev avg loss 0.189776, throughput 3.98443K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00199568, throughput 4.06163K wps
[Epoch 48 Batch 60/162] avg loss 0.00207912, throughput 3.96633K wps
[Epoch 48 Batch 90/162] avg loss 0.00199139, throughput 3.96954K wps
[Epoch 48 Batch 120/162] avg loss 0.00203919, throughput 3.96128K wps
[Epoch 48 Batch 150/162] avg loss 0.00188106, throughput 3.95782K wps
Begin Testing...
[Epoch 48] train avg loss 0.00196862, dev acc 0.9356, dev avg loss 0.189069, throughput 3.98085K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00197834, throughput 4.0638K wps
[Epoch 49 Batch 60/162] avg loss 0.00204374, throughput 3.97127K wps
[Epoch 49 Batch 90/162] avg loss 0.0018384, throughput 3.96927K wps
[Epoch 49 Batch 120/162] avg loss 0.00183843, throughput 3.97038K wps
[Epoch 49 Batch 150/162] avg loss 0.0021837, throughput 3.95958K wps
Begin Testing...
[Epoch 49] train avg loss 0.00195778, dev acc 0.9344, dev avg loss 0.188049, throughput 3.98553K wps
[Epoch 50 Batch 30/162] avg loss 0.00189705, throughput 4.04914K wps
[Epoch 50 Batch 60/162] avg loss 0.00176368, throughput 3.9378K wps
[Epoch 50 Batch 90/162] avg loss 0.00191888, throughput 3.95397K wps
[Epoch 50 Batch 120/162] avg loss 0.00184898, throughput 3.95713K wps
[Epoch 50 Batch 150/162] avg loss 0.00191147, throughput 3.95836K wps
Begin Testing...
[Epoch 50] train avg loss 0.00187622, dev acc 0.9322, dev avg loss 0.18773, throughput 3.97088K wps
[Epoch 51 Batch 30/162] avg loss 0.00165701, throughput 4.07183K wps
[Epoch 51 Batch 60/162] avg loss 0.00175115, throughput 3.9621K wps
[Epoch 51 Batch 90/162] avg loss 0.00188294, throughput 3.96844K wps
[Epoch 51 Batch 120/162] avg loss 0.00184207, throughput 3.95585K wps
[Epoch 51 Batch 150/162] avg loss 0.00161046, throughput 3.97363K wps
Begin Testing...
[Epoch 51] train avg loss 0.00175026, dev acc 0.9333, dev avg loss 0.18664, throughput 3.98421K wps
[Epoch 52 Batch 30/162] avg loss 0.00187093, throughput 4.06114K wps
[Epoch 52 Batch 60/162] avg loss 0.00166173, throughput 3.95545K wps
[Epoch 52 Batch 90/162] avg loss 0.00182187, throughput 3.96417K wps
[Epoch 52 Batch 120/162] avg loss 0.001717, throughput 3.95969K wps
[Epoch 52 Batch 150/162] avg loss 0.00178018, throughput 3.95857K wps
Begin Testing...
[Epoch 52] train avg loss 0.00177941, dev acc 0.9322, dev avg loss 0.187166, throughput 3.97904K wps
[Epoch 53 Batch 30/162] avg loss 0.00171268, throughput 4.06879K wps
[Epoch 53 Batch 60/162] avg loss 0.0015478, throughput 3.94698K wps
[Epoch 53 Batch 90/162] avg loss 0.00197913, throughput 3.96278K wps
[Epoch 53 Batch 120/162] avg loss 0.00165021, throughput 3.9626K wps
[Epoch 53 Batch 150/162] avg loss 0.00164766, throughput 3.95683K wps
Begin Testing...
[Epoch 53] train avg loss 0.00169617, dev acc 0.9344, dev avg loss 0.185953, throughput 3.97731K wps
[Epoch 54 Batch 30/162] avg loss 0.00155994, throughput 4.05178K wps
[Epoch 54 Batch 60/162] avg loss 0.00157067, throughput 3.95929K wps
[Epoch 54 Batch 90/162] avg loss 0.00170145, throughput 3.97445K wps
[Epoch 54 Batch 120/162] avg loss 0.00159378, throughput 3.9766K wps
[Epoch 54 Batch 150/162] avg loss 0.00168873, throughput 3.96445K wps
Begin Testing...
[Epoch 54] train avg loss 0.00161914, dev acc 0.9333, dev avg loss 0.185567, throughput 3.98205K wps
[Epoch 55 Batch 30/162] avg loss 0.00163801, throughput 4.06214K wps
[Epoch 55 Batch 60/162] avg loss 0.00166547, throughput 3.97444K wps
[Epoch 55 Batch 90/162] avg loss 0.00137442, throughput 3.9633K wps
[Epoch 55 Batch 120/162] avg loss 0.00161071, throughput 3.97036K wps
[Epoch 55 Batch 150/162] avg loss 0.00176996, throughput 3.96966K wps
Begin Testing...
[Epoch 55] train avg loss 0.00163969, dev acc 0.9333, dev avg loss 0.185107, throughput 3.98622K wps
[Epoch 56 Batch 30/162] avg loss 0.00150565, throughput 4.05989K wps
[Epoch 56 Batch 60/162] avg loss 0.00175202, throughput 3.9533K wps
[Epoch 56 Batch 90/162] avg loss 0.00152954, throughput 3.95894K wps
[Epoch 56 Batch 120/162] avg loss 0.00161667, throughput 3.96204K wps
[Epoch 56 Batch 150/162] avg loss 0.00143081, throughput 3.96042K wps
Begin Testing...
[Epoch 56] train avg loss 0.00157825, dev acc 0.9322, dev avg loss 0.185667, throughput 3.97718K wps
[Epoch 57 Batch 30/162] avg loss 0.00158962, throughput 4.06468K wps
[Epoch 57 Batch 60/162] avg loss 0.00151052, throughput 3.97144K wps
[Epoch 57 Batch 90/162] avg loss 0.00159636, throughput 3.96748K wps
[Epoch 57 Batch 120/162] avg loss 0.00160143, throughput 3.96859K wps
[Epoch 57 Batch 150/162] avg loss 0.0014624, throughput 3.97571K wps
Begin Testing...
[Epoch 57] train avg loss 0.001544, dev acc 0.9300, dev avg loss 0.184782, throughput 3.98833K wps
[Epoch 58 Batch 30/162] avg loss 0.00163007, throughput 4.05671K wps
[Epoch 58 Batch 60/162] avg loss 0.00149643, throughput 3.94917K wps
[Epoch 58 Batch 90/162] avg loss 0.00167213, throughput 3.96026K wps
[Epoch 58 Batch 120/162] avg loss 0.00147176, throughput 3.96237K wps
[Epoch 58 Batch 150/162] avg loss 0.00139018, throughput 3.95211K wps
Begin Testing...
[Epoch 58] train avg loss 0.00151595, dev acc 0.9333, dev avg loss 0.185286, throughput 3.97403K wps
[Epoch 59 Batch 30/162] avg loss 0.00153114, throughput 4.05636K wps
[Epoch 59 Batch 60/162] avg loss 0.00146667, throughput 3.95741K wps
[Epoch 59 Batch 90/162] avg loss 0.00152531, throughput 3.97567K wps
[Epoch 59 Batch 120/162] avg loss 0.00141434, throughput 3.97002K wps
[Epoch 59 Batch 150/162] avg loss 0.00143584, throughput 3.97117K wps
Begin Testing...
[Epoch 59] train avg loss 0.00146993, dev acc 0.9300, dev avg loss 0.18419, throughput 3.98335K wps
[Epoch 60 Batch 30/162] avg loss 0.00147786, throughput 4.06114K wps
[Epoch 60 Batch 60/162] avg loss 0.00136003, throughput 3.96591K wps
[Epoch 60 Batch 90/162] avg loss 0.00137455, throughput 3.96478K wps
[Epoch 60 Batch 120/162] avg loss 0.00128146, throughput 3.95891K wps
[Epoch 60 Batch 150/162] avg loss 0.00142549, throughput 3.95835K wps
Begin Testing...
[Epoch 60] train avg loss 0.00139268, dev acc 0.9322, dev avg loss 0.183401, throughput 3.97999K wps
[Epoch 61 Batch 30/162] avg loss 0.00114325, throughput 4.06465K wps
[Epoch 61 Batch 60/162] avg loss 0.00143522, throughput 3.96059K wps
[Epoch 61 Batch 90/162] avg loss 0.00132176, throughput 3.97107K wps
[Epoch 61 Batch 120/162] avg loss 0.00142666, throughput 3.95326K wps
[Epoch 61 Batch 150/162] avg loss 0.00133628, throughput 3.96691K wps
Begin Testing...
[Epoch 61] train avg loss 0.00134613, dev acc 0.9300, dev avg loss 0.18302, throughput 3.98211K wps
[Epoch 62 Batch 30/162] avg loss 0.00128066, throughput 4.06563K wps
[Epoch 62 Batch 60/162] avg loss 0.00126296, throughput 3.94918K wps
[Epoch 62 Batch 90/162] avg loss 0.00137347, throughput 3.97483K wps
[Epoch 62 Batch 120/162] avg loss 0.0013254, throughput 3.96987K wps
[Epoch 62 Batch 150/162] avg loss 0.0014125, throughput 3.97082K wps
Begin Testing...
[Epoch 62] train avg loss 0.0013359, dev acc 0.9322, dev avg loss 0.182607, throughput 3.98497K wps
[Epoch 63 Batch 30/162] avg loss 0.00126118, throughput 4.05532K wps
[Epoch 63 Batch 60/162] avg loss 0.0012367, throughput 3.97141K wps
[Epoch 63 Batch 90/162] avg loss 0.00131012, throughput 3.96847K wps
[Epoch 63 Batch 120/162] avg loss 0.0014187, throughput 3.96602K wps
[Epoch 63 Batch 150/162] avg loss 0.001178, throughput 3.95258K wps
Begin Testing...
[Epoch 63] train avg loss 0.00128347, dev acc 0.9333, dev avg loss 0.182233, throughput 3.97922K wps
[Epoch 64 Batch 30/162] avg loss 0.00121492, throughput 4.05817K wps
[Epoch 64 Batch 60/162] avg loss 0.00127358, throughput 3.97795K wps
[Epoch 64 Batch 90/162] avg loss 0.00134598, throughput 3.97077K wps
[Epoch 64 Batch 120/162] avg loss 0.00129645, throughput 3.97311K wps
[Epoch 64 Batch 150/162] avg loss 0.00116791, throughput 3.9678K wps
Begin Testing...
[Epoch 64] train avg loss 0.00126049, dev acc 0.9300, dev avg loss 0.182413, throughput 3.98718K wps
[Epoch 65 Batch 30/162] avg loss 0.00116596, throughput 4.05366K wps
[Epoch 65 Batch 60/162] avg loss 0.0010951, throughput 3.96446K wps
[Epoch 65 Batch 90/162] avg loss 0.0010993, throughput 3.97742K wps
[Epoch 65 Batch 120/162] avg loss 0.0013662, throughput 3.96217K wps
[Epoch 65 Batch 150/162] avg loss 0.00121499, throughput 3.97248K wps
Begin Testing...
[Epoch 65] train avg loss 0.00119306, dev acc 0.9333, dev avg loss 0.181606, throughput 3.98373K wps
[Epoch 66 Batch 30/162] avg loss 0.00107089, throughput 4.0672K wps
[Epoch 66 Batch 60/162] avg loss 0.00108264, throughput 3.95402K wps
[Epoch 66 Batch 90/162] avg loss 0.00125645, throughput 3.9766K wps
[Epoch 66 Batch 120/162] avg loss 0.00121217, throughput 3.97106K wps
[Epoch 66 Batch 150/162] avg loss 0.00120473, throughput 3.96256K wps
Begin Testing...
[Epoch 66] train avg loss 0.00115406, dev acc 0.9322, dev avg loss 0.181483, throughput 3.98457K wps
[Epoch 67 Batch 30/162] avg loss 0.00105255, throughput 4.04689K wps
[Epoch 67 Batch 60/162] avg loss 0.00109786, throughput 3.96321K wps
[Epoch 67 Batch 90/162] avg loss 0.00105501, throughput 3.97363K wps
[Epoch 67 Batch 120/162] avg loss 0.00130024, throughput 3.97379K wps
[Epoch 67 Batch 150/162] avg loss 0.00124917, throughput 3.97035K wps
Begin Testing...
[Epoch 67] train avg loss 0.00114705, dev acc 0.9322, dev avg loss 0.181565, throughput 3.98307K wps
[Epoch 68 Batch 30/162] avg loss 0.0011171, throughput 4.0719K wps
[Epoch 68 Batch 60/162] avg loss 0.0013408, throughput 3.97874K wps
[Epoch 68 Batch 90/162] avg loss 0.00103572, throughput 3.95524K wps
[Epoch 68 Batch 120/162] avg loss 0.00116043, throughput 3.96852K wps
[Epoch 68 Batch 150/162] avg loss 0.00102584, throughput 3.96431K wps
Begin Testing...
[Epoch 68] train avg loss 0.00113008, dev acc 0.9333, dev avg loss 0.181722, throughput 3.98487K wps
[Epoch 69 Batch 30/162] avg loss 0.0010372, throughput 4.05164K wps
[Epoch 69 Batch 60/162] avg loss 0.00108375, throughput 3.97733K wps
[Epoch 69 Batch 90/162] avg loss 0.00107614, throughput 3.97214K wps
[Epoch 69 Batch 120/162] avg loss 0.00123578, throughput 3.96351K wps
[Epoch 69 Batch 150/162] avg loss 0.00105286, throughput 3.95602K wps
Begin Testing...
[Epoch 69] train avg loss 0.00110215, dev acc 0.9333, dev avg loss 0.181982, throughput 3.98256K wps
[Epoch 70 Batch 30/162] avg loss 0.00100313, throughput 4.04903K wps
[Epoch 70 Batch 60/162] avg loss 0.00107237, throughput 3.9722K wps
[Epoch 70 Batch 90/162] avg loss 0.00108269, throughput 3.97309K wps
[Epoch 70 Batch 120/162] avg loss 0.00110743, throughput 3.96182K wps
[Epoch 70 Batch 150/162] avg loss 0.000990837, throughput 3.95823K wps
Begin Testing...
[Epoch 70] train avg loss 0.00104917, dev acc 0.9333, dev avg loss 0.181803, throughput 3.98039K wps
[Epoch 71 Batch 30/162] avg loss 0.00100187, throughput 4.06356K wps
[Epoch 71 Batch 60/162] avg loss 0.00108663, throughput 3.98109K wps
[Epoch 71 Batch 90/162] avg loss 0.00110448, throughput 3.97525K wps
[Epoch 71 Batch 120/162] avg loss 0.000942606, throughput 3.96681K wps
[Epoch 71 Batch 150/162] avg loss 0.00114013, throughput 3.95557K wps
Begin Testing...
[Epoch 71] train avg loss 0.00105442, dev acc 0.9333, dev avg loss 0.181669, throughput 3.98688K wps
[Epoch 72 Batch 30/162] avg loss 0.00089877, throughput 4.05168K wps
[Epoch 72 Batch 60/162] avg loss 0.000898593, throughput 3.96897K wps
[Epoch 72 Batch 90/162] avg loss 0.00100363, throughput 3.97525K wps
[Epoch 72 Batch 120/162] avg loss 0.000947579, throughput 3.95681K wps
[Epoch 72 Batch 150/162] avg loss 0.00110796, throughput 3.97515K wps
Begin Testing...
[Epoch 72] train avg loss 0.00097214, dev acc 0.9322, dev avg loss 0.182187, throughput 3.98464K wps
[Epoch 73 Batch 30/162] avg loss 0.00105181, throughput 4.0788K wps
[Epoch 73 Batch 60/162] avg loss 0.000929677, throughput 3.96915K wps
[Epoch 73 Batch 90/162] avg loss 0.000913868, throughput 3.95671K wps
[Epoch 73 Batch 120/162] avg loss 0.00108525, throughput 3.97772K wps
[Epoch 73 Batch 150/162] avg loss 0.000974791, throughput 3.9672K wps
Begin Testing...
[Epoch 73] train avg loss 0.000988432, dev acc 0.9344, dev avg loss 0.181612, throughput 3.98595K wps
[Epoch 74 Batch 30/162] avg loss 0.000921319, throughput 4.05645K wps
[Epoch 74 Batch 60/162] avg loss 0.000971371, throughput 3.96821K wps
[Epoch 74 Batch 90/162] avg loss 0.000925387, throughput 3.96174K wps
[Epoch 74 Batch 120/162] avg loss 0.000988946, throughput 3.97456K wps
[Epoch 74 Batch 150/162] avg loss 0.00101659, throughput 3.97539K wps
Begin Testing...
[Epoch 74] train avg loss 0.00096508, dev acc 0.9333, dev avg loss 0.181677, throughput 3.98572K wps
[Epoch 75 Batch 30/162] avg loss 0.00101313, throughput 4.0694K wps
[Epoch 75 Batch 60/162] avg loss 0.000911673, throughput 3.9583K wps
[Epoch 75 Batch 90/162] avg loss 0.000838424, throughput 3.97367K wps
[Epoch 75 Batch 120/162] avg loss 0.000985786, throughput 3.95526K wps
[Epoch 75 Batch 150/162] avg loss 0.000911752, throughput 3.965K wps
Begin Testing...
[Epoch 75] train avg loss 0.000947807, dev acc 0.9344, dev avg loss 0.181637, throughput 3.98202K wps
[Epoch 76 Batch 30/162] avg loss 0.00100895, throughput 4.05787K wps
[Epoch 76 Batch 60/162] avg loss 0.00092185, throughput 3.97171K wps
[Epoch 76 Batch 90/162] avg loss 0.000797864, throughput 3.97649K wps
[Epoch 76 Batch 120/162] avg loss 0.000912172, throughput 3.96458K wps
[Epoch 76 Batch 150/162] avg loss 0.00102301, throughput 3.96406K wps
Begin Testing...
[Epoch 76] train avg loss 0.000931856, dev acc 0.9322, dev avg loss 0.181817, throughput 3.98534K wps
[Epoch 77 Batch 30/162] avg loss 0.00102869, throughput 4.07009K wps
[Epoch 77 Batch 60/162] avg loss 0.000879338, throughput 3.97358K wps
[Epoch 77 Batch 90/162] avg loss 0.000870625, throughput 3.95382K wps
[Epoch 77 Batch 120/162] avg loss 0.000916444, throughput 3.95614K wps
[Epoch 77 Batch 150/162] avg loss 0.000775711, throughput 3.96245K wps
Begin Testing...
[Epoch 77] train avg loss 0.000900306, dev acc 0.9333, dev avg loss 0.181692, throughput 3.98044K wps
[Epoch 78 Batch 30/162] avg loss 0.000762933, throughput 4.06695K wps
[Epoch 78 Batch 60/162] avg loss 0.00088633, throughput 3.96765K wps
[Epoch 78 Batch 90/162] avg loss 0.000930026, throughput 3.97201K wps
[Epoch 78 Batch 120/162] avg loss 0.000836505, throughput 3.96983K wps
[Epoch 78 Batch 150/162] avg loss 0.000731625, throughput 3.96922K wps
Begin Testing...
[Epoch 78] train avg loss 0.000828926, dev acc 0.9333, dev avg loss 0.18208, throughput 3.98734K wps
[Epoch 79 Batch 30/162] avg loss 0.000892013, throughput 4.05082K wps
[Epoch 79 Batch 60/162] avg loss 0.000855711, throughput 3.96186K wps
[Epoch 79 Batch 90/162] avg loss 0.000829043, throughput 3.97222K wps
[Epoch 79 Batch 120/162] avg loss 0.000854607, throughput 3.9688K wps
[Epoch 79 Batch 150/162] avg loss 0.000783181, throughput 3.9634K wps
Begin Testing...
[Epoch 79] train avg loss 0.000834929, dev acc 0.9344, dev avg loss 0.181966, throughput 3.98216K wps
[Epoch 80 Batch 30/162] avg loss 0.000835757, throughput 4.06149K wps
[Epoch 80 Batch 60/162] avg loss 0.000656415, throughput 3.96509K wps
[Epoch 80 Batch 90/162] avg loss 0.000975717, throughput 3.95149K wps
[Epoch 80 Batch 120/162] avg loss 0.00089973, throughput 3.97012K wps
[Epoch 80 Batch 150/162] avg loss 0.000678417, throughput 3.96124K wps
Begin Testing...
[Epoch 80] train avg loss 0.000805269, dev acc 0.9333, dev avg loss 0.182112, throughput 3.98009K wps
[Epoch 81 Batch 30/162] avg loss 0.000955182, throughput 4.0667K wps
[Epoch 81 Batch 60/162] avg loss 0.000769738, throughput 3.94643K wps
[Epoch 81 Batch 90/162] avg loss 0.000926774, throughput 3.97841K wps
[Epoch 81 Batch 120/162] avg loss 0.000829716, throughput 3.97657K wps
[Epoch 81 Batch 150/162] avg loss 0.000730869, throughput 3.97054K wps
Begin Testing...
[Epoch 81] train avg loss 0.000847675, dev acc 0.9344, dev avg loss 0.182362, throughput 3.98379K wps
[Epoch 82 Batch 30/162] avg loss 0.00071372, throughput 4.04732K wps
[Epoch 82 Batch 60/162] avg loss 0.00087429, throughput 3.95981K wps
[Epoch 82 Batch 90/162] avg loss 0.000910639, throughput 3.94398K wps
[Epoch 82 Batch 120/162] avg loss 0.000729443, throughput 3.95589K wps
[Epoch 82 Batch 150/162] avg loss 0.000807934, throughput 3.96441K wps
Begin Testing...
[Epoch 82] train avg loss 0.000802053, dev acc 0.9322, dev avg loss 0.182418, throughput 3.97407K wps
[Epoch 83 Batch 30/162] avg loss 0.000772763, throughput 4.06656K wps
[Epoch 83 Batch 60/162] avg loss 0.000692768, throughput 3.97712K wps
[Epoch 83 Batch 90/162] avg loss 0.000864838, throughput 3.96919K wps
[Epoch 83 Batch 120/162] avg loss 0.000793957, throughput 3.964K wps
[Epoch 83 Batch 150/162] avg loss 0.000838062, throughput 3.96494K wps
Begin Testing...
[Epoch 83] train avg loss 0.000796813, dev acc 0.9322, dev avg loss 0.182375, throughput 3.98718K wps
[Epoch 84 Batch 30/162] avg loss 0.000623201, throughput 4.05223K wps
[Epoch 84 Batch 60/162] avg loss 0.000755581, throughput 3.96508K wps
[Epoch 84 Batch 90/162] avg loss 0.000769831, throughput 3.96089K wps
[Epoch 84 Batch 120/162] avg loss 0.00075579, throughput 3.95218K wps
[Epoch 84 Batch 150/162] avg loss 0.000695814, throughput 3.95663K wps
Begin Testing...
[Epoch 84] train avg loss 0.000724311, dev acc 0.9322, dev avg loss 0.182578, throughput 3.97515K wps
[Epoch 85 Batch 30/162] avg loss 0.000741099, throughput 4.05161K wps
[Epoch 85 Batch 60/162] avg loss 0.000689993, throughput 3.96212K wps
[Epoch 85 Batch 90/162] avg loss 0.000739923, throughput 3.96539K wps
[Epoch 85 Batch 120/162] avg loss 0.000729157, throughput 3.96844K wps
[Epoch 85 Batch 150/162] avg loss 0.000793822, throughput 3.95762K wps
Begin Testing...
[Epoch 85] train avg loss 0.00073653, dev acc 0.9333, dev avg loss 0.182782, throughput 3.97934K wps
[Epoch 86 Batch 30/162] avg loss 0.000671761, throughput 4.06645K wps
[Epoch 86 Batch 60/162] avg loss 0.000752397, throughput 3.96539K wps
[Epoch 86 Batch 90/162] avg loss 0.000735468, throughput 3.96476K wps
[Epoch 86 Batch 120/162] avg loss 0.000666766, throughput 3.95305K wps
[Epoch 86 Batch 150/162] avg loss 0.000685003, throughput 3.97358K wps
Begin Testing...
[Epoch 86] train avg loss 0.000711371, dev acc 0.9333, dev avg loss 0.182738, throughput 3.9819K wps
[Epoch 87 Batch 30/162] avg loss 0.000677164, throughput 4.05192K wps
[Epoch 87 Batch 60/162] avg loss 0.000817833, throughput 3.96596K wps
[Epoch 87 Batch 90/162] avg loss 0.000681931, throughput 3.94217K wps
[Epoch 87 Batch 120/162] avg loss 0.00068824, throughput 3.96209K wps
[Epoch 87 Batch 150/162] avg loss 0.000670266, throughput 3.97017K wps
Begin Testing...
[Epoch 87] train avg loss 0.000712718, dev acc 0.9300, dev avg loss 0.183322, throughput 3.97738K wps
[Epoch 88 Batch 30/162] avg loss 0.000735912, throughput 4.06253K wps
[Epoch 88 Batch 60/162] avg loss 0.000709585, throughput 3.96561K wps
[Epoch 88 Batch 90/162] avg loss 0.000616767, throughput 3.96099K wps
[Epoch 88 Batch 120/162] avg loss 0.000697438, throughput 3.96397K wps
[Epoch 88 Batch 150/162] avg loss 0.000726841, throughput 3.96134K wps
Begin Testing...
[Epoch 88] train avg loss 0.000695657, dev acc 0.9311, dev avg loss 0.183804, throughput 3.98213K wps
[Epoch 89 Batch 30/162] avg loss 0.000674158, throughput 4.04504K wps
[Epoch 89 Batch 60/162] avg loss 0.000660933, throughput 3.96947K wps
[Epoch 89 Batch 90/162] avg loss 0.000731445, throughput 3.97318K wps
[Epoch 89 Batch 120/162] avg loss 0.000809107, throughput 3.9686K wps
[Epoch 89 Batch 150/162] avg loss 0.000701009, throughput 3.97418K wps
Begin Testing...
[Epoch 89] train avg loss 0.000714472, dev acc 0.9322, dev avg loss 0.183141, throughput 3.98386K wps
[Epoch 90 Batch 30/162] avg loss 0.000590739, throughput 4.06371K wps
[Epoch 90 Batch 60/162] avg loss 0.00073448, throughput 3.96248K wps
[Epoch 90 Batch 90/162] avg loss 0.000727161, throughput 3.95874K wps
[Epoch 90 Batch 120/162] avg loss 0.000617019, throughput 3.95564K wps
[Epoch 90 Batch 150/162] avg loss 0.000638497, throughput 3.95297K wps
Begin Testing...
[Epoch 90] train avg loss 0.000654206, dev acc 0.9344, dev avg loss 0.182731, throughput 3.97541K wps
[Epoch 91 Batch 30/162] avg loss 0.000575349, throughput 4.07451K wps
[Epoch 91 Batch 60/162] avg loss 0.000620117, throughput 3.96295K wps
[Epoch 91 Batch 90/162] avg loss 0.000623099, throughput 3.97222K wps
[Epoch 91 Batch 120/162] avg loss 0.000705142, throughput 3.96716K wps
[Epoch 91 Batch 150/162] avg loss 0.00072501, throughput 3.97099K wps
Begin Testing...
[Epoch 91] train avg loss 0.00064149, dev acc 0.9311, dev avg loss 0.183325, throughput 3.98695K wps
[Epoch 92 Batch 30/162] avg loss 0.000698119, throughput 4.05881K wps
[Epoch 92 Batch 60/162] avg loss 0.000525582, throughput 3.9764K wps
[Epoch 92 Batch 90/162] avg loss 0.000630533, throughput 3.96093K wps
[Epoch 92 Batch 120/162] avg loss 0.000573324, throughput 3.94817K wps
[Epoch 92 Batch 150/162] avg loss 0.000668067, throughput 3.97273K wps
Begin Testing...
[Epoch 92] train avg loss 0.000621524, dev acc 0.9344, dev avg loss 0.18325, throughput 3.98272K wps
[Epoch 93 Batch 30/162] avg loss 0.000682937, throughput 4.07363K wps
[Epoch 93 Batch 60/162] avg loss 0.000575994, throughput 3.96528K wps
[Epoch 93 Batch 90/162] avg loss 0.000664011, throughput 3.97523K wps
[Epoch 93 Batch 120/162] avg loss 0.00055517, throughput 3.96786K wps
[Epoch 93 Batch 150/162] avg loss 0.000569829, throughput 3.95968K wps
Begin Testing...
[Epoch 93] train avg loss 0.000613127, dev acc 0.9333, dev avg loss 0.183987, throughput 3.98664K wps
[Epoch 94 Batch 30/162] avg loss 0.000546899, throughput 4.06014K wps
[Epoch 94 Batch 60/162] avg loss 0.000553227, throughput 3.94758K wps
[Epoch 94 Batch 90/162] avg loss 0.000636203, throughput 3.9673K wps
[Epoch 94 Batch 120/162] avg loss 0.000572637, throughput 3.94265K wps
[Epoch 94 Batch 150/162] avg loss 0.000645913, throughput 3.9663K wps
Begin Testing...
[Epoch 94] train avg loss 0.000591359, dev acc 0.9344, dev avg loss 0.18437, throughput 3.9743K wps
[Epoch 95 Batch 30/162] avg loss 0.000580111, throughput 4.06978K wps
[Epoch 95 Batch 60/162] avg loss 0.000598514, throughput 3.97608K wps
[Epoch 95 Batch 90/162] avg loss 0.000580093, throughput 3.94951K wps
[Epoch 95 Batch 120/162] avg loss 0.000607476, throughput 3.97105K wps
[Epoch 95 Batch 150/162] avg loss 0.000548746, throughput 3.96193K wps
Begin Testing...
[Epoch 95] train avg loss 0.000583384, dev acc 0.9322, dev avg loss 0.184616, throughput 3.98316K wps
[Epoch 96 Batch 30/162] avg loss 0.000586459, throughput 4.06523K wps
[Epoch 96 Batch 60/162] avg loss 0.000473286, throughput 3.96288K wps
[Epoch 96 Batch 90/162] avg loss 0.000552684, throughput 3.94944K wps
[Epoch 96 Batch 120/162] avg loss 0.000585313, throughput 3.96924K wps
[Epoch 96 Batch 150/162] avg loss 0.000554352, throughput 3.96518K wps
Begin Testing...
[Epoch 96] train avg loss 0.00055777, dev acc 0.9322, dev avg loss 0.184601, throughput 3.98142K wps
[Epoch 97 Batch 30/162] avg loss 0.000547031, throughput 4.06003K wps
[Epoch 97 Batch 60/162] avg loss 0.000548583, throughput 3.97211K wps
[Epoch 97 Batch 90/162] avg loss 0.000516089, throughput 3.96816K wps
[Epoch 97 Batch 120/162] avg loss 0.000651979, throughput 3.95921K wps
[Epoch 97 Batch 150/162] avg loss 0.000538305, throughput 3.96428K wps
Begin Testing...
[Epoch 97] train avg loss 0.000563162, dev acc 0.9311, dev avg loss 0.185014, throughput 3.98317K wps
[Epoch 98 Batch 30/162] avg loss 0.000550857, throughput 4.0549K wps
[Epoch 98 Batch 60/162] avg loss 0.000542591, throughput 3.97054K wps
[Epoch 98 Batch 90/162] avg loss 0.000548194, throughput 3.96669K wps
[Epoch 98 Batch 120/162] avg loss 0.000581041, throughput 3.96781K wps
[Epoch 98 Batch 150/162] avg loss 0.000578692, throughput 3.96215K wps
Begin Testing...
[Epoch 98] train avg loss 0.00055828, dev acc 0.9322, dev avg loss 0.185282, throughput 3.98326K wps
[Epoch 99 Batch 30/162] avg loss 0.000532764, throughput 4.06297K wps
[Epoch 99 Batch 60/162] avg loss 0.000526751, throughput 3.97798K wps
[Epoch 99 Batch 90/162] avg loss 0.000553887, throughput 3.96151K wps
[Epoch 99 Batch 120/162] avg loss 0.000530505, throughput 3.96893K wps
[Epoch 99 Batch 150/162] avg loss 0.000527314, throughput 3.96336K wps
Begin Testing...
[Epoch 99] train avg loss 0.000536309, dev acc 0.9311, dev avg loss 0.185482, throughput 3.98609K wps
[Epoch 100 Batch 30/162] avg loss 0.000497654, throughput 4.07581K wps
[Epoch 100 Batch 60/162] avg loss 0.000474397, throughput 3.96979K wps
[Epoch 100 Batch 90/162] avg loss 0.000470625, throughput 3.95254K wps
[Epoch 100 Batch 120/162] avg loss 0.000528593, throughput 3.97316K wps
[Epoch 100 Batch 150/162] avg loss 0.00062933, throughput 3.97524K wps
Begin Testing...
[Epoch 100] train avg loss 0.000526325, dev acc 0.9300, dev avg loss 0.188014, throughput 3.98774K wps
[Epoch 101 Batch 30/162] avg loss 0.000478585, throughput 4.06316K wps
[Epoch 101 Batch 60/162] avg loss 0.00057653, throughput 3.97095K wps
[Epoch 101 Batch 90/162] avg loss 0.000494524, throughput 3.96375K wps
[Epoch 101 Batch 120/162] avg loss 0.000558737, throughput 3.97019K wps
[Epoch 101 Batch 150/162] avg loss 0.000532355, throughput 3.96357K wps
Begin Testing...
[Epoch 101] train avg loss 0.000527604, dev acc 0.9311, dev avg loss 0.18648, throughput 3.98425K wps
[Epoch 102 Batch 30/162] avg loss 0.000478834, throughput 4.06519K wps
[Epoch 102 Batch 60/162] avg loss 0.000504574, throughput 3.95077K wps
[Epoch 102 Batch 90/162] avg loss 0.00056215, throughput 3.97335K wps
[Epoch 102 Batch 120/162] avg loss 0.000562314, throughput 3.97037K wps
[Epoch 102 Batch 150/162] avg loss 0.000468983, throughput 3.96632K wps
Begin Testing...
[Epoch 102] train avg loss 0.000519837, dev acc 0.9300, dev avg loss 0.186595, throughput 3.98368K wps
[Epoch 103 Batch 30/162] avg loss 0.000509318, throughput 4.07662K wps
[Epoch 103 Batch 60/162] avg loss 0.000525161, throughput 3.96128K wps
[Epoch 103 Batch 90/162] avg loss 0.00052738, throughput 3.96284K wps
[Epoch 103 Batch 120/162] avg loss 0.000504557, throughput 3.97198K wps
[Epoch 103 Batch 150/162] avg loss 0.000462493, throughput 3.97721K wps
Begin Testing...
[Epoch 103] train avg loss 0.000499655, dev acc 0.9289, dev avg loss 0.186943, throughput 3.98604K wps
[Epoch 104 Batch 30/162] avg loss 0.000452756, throughput 4.06118K wps
[Epoch 104 Batch 60/162] avg loss 0.000507455, throughput 3.97279K wps
[Epoch 104 Batch 90/162] avg loss 0.000452075, throughput 3.95436K wps
[Epoch 104 Batch 120/162] avg loss 0.000467171, throughput 3.96021K wps
[Epoch 104 Batch 150/162] avg loss 0.000566069, throughput 3.97312K wps
Begin Testing...
[Epoch 104] train avg loss 0.000491951, dev acc 0.9322, dev avg loss 0.186609, throughput 3.98298K wps
[Epoch 105 Batch 30/162] avg loss 0.000475935, throughput 4.05587K wps
[Epoch 105 Batch 60/162] avg loss 0.000526947, throughput 3.96132K wps
[Epoch 105 Batch 90/162] avg loss 0.000500829, throughput 3.96963K wps
[Epoch 105 Batch 120/162] avg loss 0.000471352, throughput 3.95735K wps
[Epoch 105 Batch 150/162] avg loss 0.000501416, throughput 3.96252K wps
Begin Testing...
[Epoch 105] train avg loss 0.00050031, dev acc 0.9322, dev avg loss 0.186677, throughput 3.97884K wps
[Epoch 106 Batch 30/162] avg loss 0.000418731, throughput 4.04728K wps
[Epoch 106 Batch 60/162] avg loss 0.000479082, throughput 3.94481K wps
[Epoch 106 Batch 90/162] avg loss 0.0004989, throughput 3.97208K wps
[Epoch 106 Batch 120/162] avg loss 0.000433463, throughput 3.97598K wps
[Epoch 106 Batch 150/162] avg loss 0.000535189, throughput 3.97008K wps
Begin Testing...
[Epoch 106] train avg loss 0.000467624, dev acc 0.9344, dev avg loss 0.186382, throughput 3.98098K wps
[Epoch 107 Batch 30/162] avg loss 0.000430874, throughput 4.07406K wps
[Epoch 107 Batch 60/162] avg loss 0.000459403, throughput 3.97192K wps
[Epoch 107 Batch 90/162] avg loss 0.000441251, throughput 3.97133K wps
[Epoch 107 Batch 120/162] avg loss 0.000509154, throughput 3.97006K wps
[Epoch 107 Batch 150/162] avg loss 0.00043173, throughput 3.95957K wps
Begin Testing...
[Epoch 107] train avg loss 0.000460006, dev acc 0.9344, dev avg loss 0.187268, throughput 3.98553K wps
[Epoch 108 Batch 30/162] avg loss 0.000536803, throughput 4.07321K wps
[Epoch 108 Batch 60/162] avg loss 0.00048061, throughput 3.97661K wps
[Epoch 108 Batch 90/162] avg loss 0.000482314, throughput 3.9677K wps
[Epoch 108 Batch 120/162] avg loss 0.000485177, throughput 3.97347K wps
[Epoch 108 Batch 150/162] avg loss 0.000471475, throughput 3.97323K wps
Begin Testing...
[Epoch 108] train avg loss 0.00048509, dev acc 0.9300, dev avg loss 0.188022, throughput 3.99027K wps
[Epoch 109 Batch 30/162] avg loss 0.000398543, throughput 4.06194K wps
[Epoch 109 Batch 60/162] avg loss 0.000440951, throughput 3.96849K wps
[Epoch 109 Batch 90/162] avg loss 0.000464583, throughput 3.97587K wps
[Epoch 109 Batch 120/162] avg loss 0.000515382, throughput 3.96477K wps
[Epoch 109 Batch 150/162] avg loss 0.000439724, throughput 3.97087K wps
Begin Testing...
[Epoch 109] train avg loss 0.000453557, dev acc 0.9300, dev avg loss 0.187893, throughput 3.98627K wps
[Epoch 110 Batch 30/162] avg loss 0.00041932, throughput 4.07756K wps
[Epoch 110 Batch 60/162] avg loss 0.000382612, throughput 3.96643K wps
[Epoch 110 Batch 90/162] avg loss 0.000475249, throughput 3.97024K wps
[Epoch 110 Batch 120/162] avg loss 0.000450101, throughput 3.96472K wps
[Epoch 110 Batch 150/162] avg loss 0.000473378, throughput 3.96986K wps
Begin Testing...
[Epoch 110] train avg loss 0.000447144, dev acc 0.9300, dev avg loss 0.188315, throughput 3.9881K wps
[Epoch 111 Batch 30/162] avg loss 0.00042723, throughput 4.06361K wps
[Epoch 111 Batch 60/162] avg loss 0.000456219, throughput 3.9709K wps
[Epoch 111 Batch 90/162] avg loss 0.000393657, throughput 3.97373K wps
[Epoch 111 Batch 120/162] avg loss 0.00042225, throughput 3.97925K wps
[Epoch 111 Batch 150/162] avg loss 0.000466726, throughput 3.97883K wps
Begin Testing...
[Epoch 111] train avg loss 0.00043336, dev acc 0.9300, dev avg loss 0.18883, throughput 3.99199K wps
[Epoch 112 Batch 30/162] avg loss 0.000389151, throughput 4.05896K wps
[Epoch 112 Batch 60/162] avg loss 0.000494894, throughput 3.97163K wps
[Epoch 112 Batch 90/162] avg loss 0.000403753, throughput 3.96481K wps
[Epoch 112 Batch 120/162] avg loss 0.000451529, throughput 3.97389K wps
[Epoch 112 Batch 150/162] avg loss 0.000416782, throughput 3.97056K wps
Begin Testing...
[Epoch 112] train avg loss 0.00043018, dev acc 0.9311, dev avg loss 0.189285, throughput 3.98604K wps
[Epoch 113 Batch 30/162] avg loss 0.000413921, throughput 4.06057K wps
[Epoch 113 Batch 60/162] avg loss 0.00039652, throughput 3.97852K wps
[Epoch 113 Batch 90/162] avg loss 0.000408989, throughput 3.9807K wps
[Epoch 113 Batch 120/162] avg loss 0.000393399, throughput 3.96796K wps
[Epoch 113 Batch 150/162] avg loss 0.000419314, throughput 3.97206K wps
Begin Testing...
[Epoch 113] train avg loss 0.000403812, dev acc 0.9300, dev avg loss 0.189972, throughput 3.98996K wps
[Epoch 114 Batch 30/162] avg loss 0.000375357, throughput 4.06857K wps
[Epoch 114 Batch 60/162] avg loss 0.000441412, throughput 3.98049K wps
[Epoch 114 Batch 90/162] avg loss 0.000440071, throughput 3.96716K wps
[Epoch 114 Batch 120/162] avg loss 0.000352179, throughput 3.96826K wps
[Epoch 114 Batch 150/162] avg loss 0.000502261, throughput 3.97595K wps
Begin Testing...
[Epoch 114] train avg loss 0.000416151, dev acc 0.9289, dev avg loss 0.188506, throughput 3.99022K wps
[Epoch 115 Batch 30/162] avg loss 0.000368376, throughput 4.06408K wps
[Epoch 115 Batch 60/162] avg loss 0.0003735, throughput 3.97442K wps
[Epoch 115 Batch 90/162] avg loss 0.000459978, throughput 3.97102K wps
[Epoch 115 Batch 120/162] avg loss 0.000440839, throughput 3.97629K wps
[Epoch 115 Batch 150/162] avg loss 0.000379531, throughput 3.9625K wps
Begin Testing...
[Epoch 115] train avg loss 0.000408039, dev acc 0.9311, dev avg loss 0.188638, throughput 3.98741K wps
[Epoch 116 Batch 30/162] avg loss 0.000426842, throughput 4.07003K wps
[Epoch 116 Batch 60/162] avg loss 0.000343613, throughput 3.96816K wps
[Epoch 116 Batch 90/162] avg loss 0.000412731, throughput 3.97521K wps
[Epoch 116 Batch 120/162] avg loss 0.000383596, throughput 3.9757K wps
[Epoch 116 Batch 150/162] avg loss 0.00034935, throughput 3.97926K wps
Begin Testing...
[Epoch 116] train avg loss 0.000389023, dev acc 0.9322, dev avg loss 0.189867, throughput 3.99051K wps
[Epoch 117 Batch 30/162] avg loss 0.000358394, throughput 4.06194K wps
[Epoch 117 Batch 60/162] avg loss 0.000398445, throughput 3.97421K wps
[Epoch 117 Batch 90/162] avg loss 0.000389218, throughput 3.96384K wps
[Epoch 117 Batch 120/162] avg loss 0.000397945, throughput 3.96905K wps
[Epoch 117 Batch 150/162] avg loss 0.000450715, throughput 3.97056K wps
Begin Testing...
[Epoch 117] train avg loss 0.000391187, dev acc 0.9322, dev avg loss 0.189585, throughput 3.98485K wps
[Epoch 118 Batch 30/162] avg loss 0.000397012, throughput 4.05201K wps
[Epoch 118 Batch 60/162] avg loss 0.000371925, throughput 3.96039K wps
[Epoch 118 Batch 90/162] avg loss 0.000346151, throughput 3.9663K wps
[Epoch 118 Batch 120/162] avg loss 0.000356386, throughput 3.97227K wps
[Epoch 118 Batch 150/162] avg loss 0.00038562, throughput 3.97031K wps
Begin Testing...
[Epoch 118] train avg loss 0.000372695, dev acc 0.9289, dev avg loss 0.19022, throughput 3.9821K wps
[Epoch 119 Batch 30/162] avg loss 0.000331779, throughput 4.06685K wps
[Epoch 119 Batch 60/162] avg loss 0.000373591, throughput 3.9725K wps
[Epoch 119 Batch 90/162] avg loss 0.000337733, throughput 3.97212K wps
[Epoch 119 Batch 120/162] avg loss 0.000377943, throughput 3.96169K wps
[Epoch 119 Batch 150/162] avg loss 0.000399757, throughput 3.9709K wps
Begin Testing...
[Epoch 119] train avg loss 0.000362327, dev acc 0.9289, dev avg loss 0.190576, throughput 3.98769K wps
[Epoch 120 Batch 30/162] avg loss 0.000350948, throughput 4.07326K wps
[Epoch 120 Batch 60/162] avg loss 0.00043306, throughput 3.97618K wps
[Epoch 120 Batch 90/162] avg loss 0.00034861, throughput 3.9773K wps
[Epoch 120 Batch 120/162] avg loss 0.00037003, throughput 3.96881K wps
[Epoch 120 Batch 150/162] avg loss 0.000383052, throughput 3.96347K wps
Begin Testing...
[Epoch 120] train avg loss 0.000377519, dev acc 0.9311, dev avg loss 0.190102, throughput 3.98846K wps
[Epoch 121 Batch 30/162] avg loss 0.000315809, throughput 4.07927K wps
[Epoch 121 Batch 60/162] avg loss 0.000318624, throughput 3.96842K wps
[Epoch 121 Batch 90/162] avg loss 0.000364773, throughput 3.95794K wps
[Epoch 121 Batch 120/162] avg loss 0.000361587, throughput 3.96917K wps
[Epoch 121 Batch 150/162] avg loss 0.000352017, throughput 3.96812K wps
Begin Testing...
[Epoch 121] train avg loss 0.000347454, dev acc 0.9300, dev avg loss 0.190659, throughput 3.98668K wps
[Epoch 122 Batch 30/162] avg loss 0.000337408, throughput 4.05843K wps
[Epoch 122 Batch 60/162] avg loss 0.000411336, throughput 3.98108K wps
[Epoch 122 Batch 90/162] avg loss 0.000392273, throughput 3.96967K wps
[Epoch 122 Batch 120/162] avg loss 0.0003429, throughput 3.97091K wps
[Epoch 122 Batch 150/162] avg loss 0.000372471, throughput 3.96812K wps
Begin Testing...
[Epoch 122] train avg loss 0.000364443, dev acc 0.9289, dev avg loss 0.190439, throughput 3.988K wps
[Epoch 123 Batch 30/162] avg loss 0.000353848, throughput 4.06292K wps
[Epoch 123 Batch 60/162] avg loss 0.000343424, throughput 3.97042K wps
[Epoch 123 Batch 90/162] avg loss 0.000355563, throughput 3.9484K wps
[Epoch 123 Batch 120/162] avg loss 0.000363786, throughput 3.96407K wps
[Epoch 123 Batch 150/162] avg loss 0.000318986, throughput 3.97026K wps
Begin Testing...
[Epoch 123] train avg loss 0.000347003, dev acc 0.9289, dev avg loss 0.19172, throughput 3.98042K wps
[Epoch 124 Batch 30/162] avg loss 0.000349059, throughput 4.07533K wps
[Epoch 124 Batch 60/162] avg loss 0.00029822, throughput 3.96634K wps
[Epoch 124 Batch 90/162] avg loss 0.000329579, throughput 3.97634K wps
[Epoch 124 Batch 120/162] avg loss 0.000285373, throughput 3.95331K wps
[Epoch 124 Batch 150/162] avg loss 0.000395492, throughput 3.96795K wps
Begin Testing...
[Epoch 124] train avg loss 0.000329309, dev acc 0.9300, dev avg loss 0.191939, throughput 3.98672K wps
[Epoch 125 Batch 30/162] avg loss 0.000272659, throughput 4.06741K wps
[Epoch 125 Batch 60/162] avg loss 0.000315634, throughput 3.96508K wps
[Epoch 125 Batch 90/162] avg loss 0.000346663, throughput 3.97526K wps
[Epoch 125 Batch 120/162] avg loss 0.00037123, throughput 3.97453K wps
[Epoch 125 Batch 150/162] avg loss 0.000331453, throughput 3.97048K wps
Begin Testing...
[Epoch 125] train avg loss 0.000320202, dev acc 0.9300, dev avg loss 0.191729, throughput 3.98807K wps
[Epoch 126 Batch 30/162] avg loss 0.00032342, throughput 4.07425K wps
[Epoch 126 Batch 60/162] avg loss 0.000307765, throughput 3.96898K wps
[Epoch 126 Batch 90/162] avg loss 0.000347104, throughput 3.9673K wps
[Epoch 126 Batch 120/162] avg loss 0.000362418, throughput 3.9743K wps
[Epoch 126 Batch 150/162] avg loss 0.000305546, throughput 3.95883K wps
Begin Testing...
[Epoch 126] train avg loss 0.000327991, dev acc 0.9322, dev avg loss 0.191729, throughput 3.9873K wps
[Epoch 127 Batch 30/162] avg loss 0.000323554, throughput 4.06535K wps
[Epoch 127 Batch 60/162] avg loss 0.000343447, throughput 3.96073K wps
[Epoch 127 Batch 90/162] avg loss 0.000321067, throughput 3.97042K wps
[Epoch 127 Batch 120/162] avg loss 0.000339478, throughput 3.96003K wps
[Epoch 127 Batch 150/162] avg loss 0.000350222, throughput 3.97047K wps
Begin Testing...
[Epoch 127] train avg loss 0.000331066, dev acc 0.9300, dev avg loss 0.192462, throughput 3.98302K wps
[Epoch 128 Batch 30/162] avg loss 0.00038926, throughput 4.06345K wps
[Epoch 128 Batch 60/162] avg loss 0.000328692, throughput 3.97079K wps
[Epoch 128 Batch 90/162] avg loss 0.000272327, throughput 3.97185K wps
[Epoch 128 Batch 120/162] avg loss 0.000367802, throughput 3.97413K wps
[Epoch 128 Batch 150/162] avg loss 0.0003391, throughput 3.97444K wps
Begin Testing...
[Epoch 128] train avg loss 0.000338657, dev acc 0.9300, dev avg loss 0.192846, throughput 3.98925K wps
[Epoch 129 Batch 30/162] avg loss 0.000316129, throughput 4.07164K wps
[Epoch 129 Batch 60/162] avg loss 0.000369588, throughput 3.96334K wps
[Epoch 129 Batch 90/162] avg loss 0.000315087, throughput 3.96932K wps
[Epoch 129 Batch 120/162] avg loss 0.00031067, throughput 3.97151K wps
[Epoch 129 Batch 150/162] avg loss 0.000330878, throughput 3.94953K wps
Begin Testing...
[Epoch 129] train avg loss 0.000327641, dev acc 0.9311, dev avg loss 0.194439, throughput 3.98314K wps
[Epoch 130 Batch 30/162] avg loss 0.000291608, throughput 4.05494K wps
[Epoch 130 Batch 60/162] avg loss 0.000282713, throughput 3.97316K wps
[Epoch 130 Batch 90/162] avg loss 0.000324841, throughput 3.97064K wps
[Epoch 130 Batch 120/162] avg loss 0.000341928, throughput 3.96639K wps
[Epoch 130 Batch 150/162] avg loss 0.000298231, throughput 3.97024K wps
Begin Testing...
[Epoch 130] train avg loss 0.000309658, dev acc 0.9322, dev avg loss 0.192915, throughput 3.98509K wps
[Epoch 131 Batch 30/162] avg loss 0.000327657, throughput 4.07386K wps
[Epoch 131 Batch 60/162] avg loss 0.000325943, throughput 3.97503K wps
[Epoch 131 Batch 90/162] avg loss 0.000294477, throughput 3.95974K wps
[Epoch 131 Batch 120/162] avg loss 0.000302218, throughput 3.96697K wps
[Epoch 131 Batch 150/162] avg loss 0.000280107, throughput 3.97176K wps
Begin Testing...
[Epoch 131] train avg loss 0.000306158, dev acc 0.9289, dev avg loss 0.194059, throughput 3.98643K wps
[Epoch 132 Batch 30/162] avg loss 0.000346179, throughput 4.07058K wps
[Epoch 132 Batch 60/162] avg loss 0.000293118, throughput 3.97404K wps
[Epoch 132 Batch 90/162] avg loss 0.0003251, throughput 3.9524K wps
[Epoch 132 Batch 120/162] avg loss 0.000323055, throughput 3.96204K wps
[Epoch 132 Batch 150/162] avg loss 0.000279878, throughput 3.97439K wps
Begin Testing...
[Epoch 132] train avg loss 0.000307796, dev acc 0.9289, dev avg loss 0.19452, throughput 3.98435K wps
[Epoch 133 Batch 30/162] avg loss 0.000270959, throughput 4.06516K wps
[Epoch 133 Batch 60/162] avg loss 0.00027884, throughput 3.95863K wps
[Epoch 133 Batch 90/162] avg loss 0.000260569, throughput 3.96481K wps
[Epoch 133 Batch 120/162] avg loss 0.00027703, throughput 3.96218K wps
[Epoch 133 Batch 150/162] avg loss 0.000300929, throughput 3.98028K wps
Begin Testing...
[Epoch 133] train avg loss 0.00028397, dev acc 0.9311, dev avg loss 0.194709, throughput 3.98472K wps
[Epoch 134 Batch 30/162] avg loss 0.000327055, throughput 4.0515K wps
[Epoch 134 Batch 60/162] avg loss 0.000285361, throughput 3.95826K wps
[Epoch 134 Batch 90/162] avg loss 0.000296003, throughput 3.96538K wps
[Epoch 134 Batch 120/162] avg loss 0.000302728, throughput 3.96908K wps
[Epoch 134 Batch 150/162] avg loss 0.00030939, throughput 3.96644K wps
Begin Testing...
[Epoch 134] train avg loss 0.000301826, dev acc 0.9289, dev avg loss 0.195009, throughput 3.98131K wps
[Epoch 135 Batch 30/162] avg loss 0.000270455, throughput 4.0709K wps
[Epoch 135 Batch 60/162] avg loss 0.000279996, throughput 3.96764K wps
[Epoch 135 Batch 90/162] avg loss 0.000349147, throughput 3.97478K wps
[Epoch 135 Batch 120/162] avg loss 0.000344075, throughput 3.97087K wps
[Epoch 135 Batch 150/162] avg loss 0.000314686, throughput 3.9727K wps
Begin Testing...
[Epoch 135] train avg loss 0.000309781, dev acc 0.9300, dev avg loss 0.195419, throughput 3.98896K wps
[Epoch 136 Batch 30/162] avg loss 0.000306112, throughput 4.06042K wps
[Epoch 136 Batch 60/162] avg loss 0.000313163, throughput 3.97362K wps
[Epoch 136 Batch 90/162] avg loss 0.000301365, throughput 3.97934K wps
[Epoch 136 Batch 120/162] avg loss 0.000268573, throughput 3.95834K wps
[Epoch 136 Batch 150/162] avg loss 0.000288469, throughput 3.97723K wps
Begin Testing...
[Epoch 136] train avg loss 0.000297559, dev acc 0.9300, dev avg loss 0.194747, throughput 3.98874K wps
[Epoch 137 Batch 30/162] avg loss 0.000277647, throughput 4.07025K wps
[Epoch 137 Batch 60/162] avg loss 0.000282985, throughput 3.96608K wps
[Epoch 137 Batch 90/162] avg loss 0.000288337, throughput 3.96304K wps
[Epoch 137 Batch 120/162] avg loss 0.00024549, throughput 3.97135K wps
[Epoch 137 Batch 150/162] avg loss 0.000239611, throughput 3.96424K wps
Begin Testing...
[Epoch 137] train avg loss 0.000274329, dev acc 0.9300, dev avg loss 0.194976, throughput 3.98493K wps
[Epoch 138 Batch 30/162] avg loss 0.000259757, throughput 4.05597K wps
[Epoch 138 Batch 60/162] avg loss 0.000284674, throughput 3.97237K wps
[Epoch 138 Batch 90/162] avg loss 0.000292895, throughput 3.97319K wps
[Epoch 138 Batch 120/162] avg loss 0.000280315, throughput 3.96484K wps
[Epoch 138 Batch 150/162] avg loss 0.00022064, throughput 3.959K wps
Begin Testing...
[Epoch 138] train avg loss 0.00026464, dev acc 0.9300, dev avg loss 0.195455, throughput 3.98332K wps
[Epoch 139 Batch 30/162] avg loss 0.000287964, throughput 4.06965K wps
[Epoch 139 Batch 60/162] avg loss 0.000304948, throughput 3.97501K wps
[Epoch 139 Batch 90/162] avg loss 0.000263776, throughput 3.94972K wps
[Epoch 139 Batch 120/162] avg loss 0.000292848, throughput 3.97011K wps
[Epoch 139 Batch 150/162] avg loss 0.000301468, throughput 3.97189K wps
Begin Testing...
[Epoch 139] train avg loss 0.00028958, dev acc 0.9300, dev avg loss 0.194863, throughput 3.9853K wps
[Epoch 140 Batch 30/162] avg loss 0.000308193, throughput 4.05449K wps
[Epoch 140 Batch 60/162] avg loss 0.000277071, throughput 3.96771K wps
[Epoch 140 Batch 90/162] avg loss 0.000280234, throughput 3.9727K wps
[Epoch 140 Batch 120/162] avg loss 0.000260824, throughput 3.9758K wps
[Epoch 140 Batch 150/162] avg loss 0.000250871, throughput 3.9604K wps
Begin Testing...
[Epoch 140] train avg loss 0.000276956, dev acc 0.9300, dev avg loss 0.195626, throughput 3.98437K wps
[Epoch 141 Batch 30/162] avg loss 0.000286144, throughput 4.07659K wps
[Epoch 141 Batch 60/162] avg loss 0.000282276, throughput 3.96462K wps
[Epoch 141 Batch 90/162] avg loss 0.000292868, throughput 3.97134K wps
[Epoch 141 Batch 120/162] avg loss 0.000262451, throughput 3.97273K wps
[Epoch 141 Batch 150/162] avg loss 0.00026415, throughput 3.95692K wps
Begin Testing...
[Epoch 141] train avg loss 0.000273784, dev acc 0.9278, dev avg loss 0.197519, throughput 3.9862K wps
[Epoch 142 Batch 30/162] avg loss 0.000247657, throughput 4.06779K wps
[Epoch 142 Batch 60/162] avg loss 0.000278036, throughput 3.96903K wps
[Epoch 142 Batch 90/162] avg loss 0.000251733, throughput 3.96347K wps
[Epoch 142 Batch 120/162] avg loss 0.000248336, throughput 3.97354K wps
[Epoch 142 Batch 150/162] avg loss 0.000293798, throughput 3.96945K wps
Begin Testing...
[Epoch 142] train avg loss 0.000262314, dev acc 0.9300, dev avg loss 0.196672, throughput 3.98596K wps
[Epoch 143 Batch 30/162] avg loss 0.000266301, throughput 4.07392K wps
[Epoch 143 Batch 60/162] avg loss 0.000260168, throughput 3.96753K wps
[Epoch 143 Batch 90/162] avg loss 0.00025007, throughput 3.96675K wps
[Epoch 143 Batch 120/162] avg loss 0.000298581, throughput 3.97865K wps
[Epoch 143 Batch 150/162] avg loss 0.000231088, throughput 3.96901K wps
Begin Testing...
[Epoch 143] train avg loss 0.000259635, dev acc 0.9300, dev avg loss 0.195993, throughput 3.98929K wps
[Epoch 144 Batch 30/162] avg loss 0.000246795, throughput 4.05725K wps
[Epoch 144 Batch 60/162] avg loss 0.000266371, throughput 3.97582K wps
[Epoch 144 Batch 90/162] avg loss 0.000245736, throughput 3.9723K wps
[Epoch 144 Batch 120/162] avg loss 0.000233433, throughput 3.95956K wps
[Epoch 144 Batch 150/162] avg loss 0.000269773, throughput 3.96494K wps
Begin Testing...
[Epoch 144] train avg loss 0.000250171, dev acc 0.9289, dev avg loss 0.198433, throughput 3.98425K wps
[Epoch 145 Batch 30/162] avg loss 0.000279199, throughput 4.05482K wps
[Epoch 145 Batch 60/162] avg loss 0.00026128, throughput 3.9682K wps
[Epoch 145 Batch 90/162] avg loss 0.000253248, throughput 3.97506K wps
[Epoch 145 Batch 120/162] avg loss 0.000218842, throughput 3.97689K wps
[Epoch 145 Batch 150/162] avg loss 0.000236218, throughput 3.97303K wps
Begin Testing...
[Epoch 145] train avg loss 0.000249304, dev acc 0.9300, dev avg loss 0.197511, throughput 3.98789K wps
[Epoch 146 Batch 30/162] avg loss 0.000233697, throughput 4.06342K wps
[Epoch 146 Batch 60/162] avg loss 0.000280163, throughput 3.96882K wps
[Epoch 146 Batch 90/162] avg loss 0.000240262, throughput 3.9693K wps
[Epoch 146 Batch 120/162] avg loss 0.000230783, throughput 3.96736K wps
[Epoch 146 Batch 150/162] avg loss 0.000222711, throughput 3.95543K wps
Begin Testing...
[Epoch 146] train avg loss 0.000240464, dev acc 0.9289, dev avg loss 0.197371, throughput 3.98412K wps
[Epoch 147 Batch 30/162] avg loss 0.000206399, throughput 4.07956K wps
[Epoch 147 Batch 60/162] avg loss 0.000239679, throughput 3.97647K wps
[Epoch 147 Batch 90/162] avg loss 0.000282885, throughput 3.97232K wps
[Epoch 147 Batch 120/162] avg loss 0.000269183, throughput 3.97652K wps
[Epoch 147 Batch 150/162] avg loss 0.000299618, throughput 3.97186K wps
Begin Testing...
[Epoch 147] train avg loss 0.000259322, dev acc 0.9300, dev avg loss 0.198268, throughput 3.99173K wps
[Epoch 148 Batch 30/162] avg loss 0.000241964, throughput 4.06206K wps
[Epoch 148 Batch 60/162] avg loss 0.000257706, throughput 3.97283K wps
[Epoch 148 Batch 90/162] avg loss 0.000243928, throughput 3.96251K wps
[Epoch 148 Batch 120/162] avg loss 0.000260426, throughput 3.97441K wps
[Epoch 148 Batch 150/162] avg loss 0.000230795, throughput 3.97457K wps
Begin Testing...
[Epoch 148] train avg loss 0.000248261, dev acc 0.9289, dev avg loss 0.197491, throughput 3.98683K wps
[Epoch 149 Batch 30/162] avg loss 0.000283592, throughput 4.07765K wps
[Epoch 149 Batch 60/162] avg loss 0.000219049, throughput 3.97718K wps
[Epoch 149 Batch 90/162] avg loss 0.000233882, throughput 3.97603K wps
[Epoch 149 Batch 120/162] avg loss 0.000279695, throughput 3.96624K wps
[Epoch 149 Batch 150/162] avg loss 0.000244888, throughput 3.96597K wps
Begin Testing...
[Epoch 149] train avg loss 0.000251015, dev acc 0.9300, dev avg loss 0.20038, throughput 3.98978K wps
[Epoch 150 Batch 30/162] avg loss 0.000236784, throughput 4.04994K wps
[Epoch 150 Batch 60/162] avg loss 0.000225917, throughput 3.9541K wps
[Epoch 150 Batch 90/162] avg loss 0.000258072, throughput 3.97K wps
[Epoch 150 Batch 120/162] avg loss 0.000249415, throughput 3.9791K wps
[Epoch 150 Batch 150/162] avg loss 0.000262914, throughput 3.97815K wps
Begin Testing...
[Epoch 150] train avg loss 0.000247074, dev acc 0.9289, dev avg loss 0.198831, throughput 3.98533K wps
[Epoch 151 Batch 30/162] avg loss 0.000258705, throughput 4.06563K wps
[Epoch 151 Batch 60/162] avg loss 0.000245449, throughput 3.96085K wps
[Epoch 151 Batch 90/162] avg loss 0.000242871, throughput 3.96755K wps
[Epoch 151 Batch 120/162] avg loss 0.000222104, throughput 3.9714K wps
[Epoch 151 Batch 150/162] avg loss 0.000243519, throughput 3.96002K wps
Begin Testing...
[Epoch 151] train avg loss 0.000243644, dev acc 0.9278, dev avg loss 0.198214, throughput 3.98317K wps
[Epoch 152 Batch 30/162] avg loss 0.000271017, throughput 4.06317K wps
[Epoch 152 Batch 60/162] avg loss 0.000233091, throughput 3.97038K wps
[Epoch 152 Batch 90/162] avg loss 0.000199984, throughput 3.96335K wps
[Epoch 152 Batch 120/162] avg loss 0.000239002, throughput 3.97337K wps
[Epoch 152 Batch 150/162] avg loss 0.000184986, throughput 3.96589K wps
Begin Testing...
[Epoch 152] train avg loss 0.0002264, dev acc 0.9300, dev avg loss 0.19963, throughput 3.98441K wps
[Epoch 153 Batch 30/162] avg loss 0.000199461, throughput 4.05027K wps
[Epoch 153 Batch 60/162] avg loss 0.000220192, throughput 3.97217K wps
[Epoch 153 Batch 90/162] avg loss 0.000239478, throughput 3.9708K wps
[Epoch 153 Batch 120/162] avg loss 0.000226857, throughput 3.93975K wps
[Epoch 153 Batch 150/162] avg loss 0.000225999, throughput 3.95267K wps
Begin Testing...
[Epoch 153] train avg loss 0.000229683, dev acc 0.9278, dev avg loss 0.198667, throughput 3.97648K wps
[Epoch 154 Batch 30/162] avg loss 0.000202907, throughput 4.04198K wps
[Epoch 154 Batch 60/162] avg loss 0.000228693, throughput 3.96424K wps
[Epoch 154 Batch 90/162] avg loss 0.000216248, throughput 3.96243K wps
[Epoch 154 Batch 120/162] avg loss 0.00021522, throughput 3.9626K wps
[Epoch 154 Batch 150/162] avg loss 0.000235845, throughput 3.9599K wps
Begin Testing...
[Epoch 154] train avg loss 0.000222135, dev acc 0.9289, dev avg loss 0.199549, throughput 3.97777K wps
[Epoch 155 Batch 30/162] avg loss 0.000238876, throughput 4.06722K wps
[Epoch 155 Batch 60/162] avg loss 0.000222846, throughput 3.96332K wps
[Epoch 155 Batch 90/162] avg loss 0.000203217, throughput 3.97031K wps
[Epoch 155 Batch 120/162] avg loss 0.000208474, throughput 3.97309K wps
[Epoch 155 Batch 150/162] avg loss 0.000247211, throughput 3.97006K wps
Begin Testing...
[Epoch 155] train avg loss 0.000221643, dev acc 0.9311, dev avg loss 0.200398, throughput 3.98612K wps
[Epoch 156 Batch 30/162] avg loss 0.000251412, throughput 4.06865K wps
[Epoch 156 Batch 60/162] avg loss 0.00023305, throughput 3.97233K wps
[Epoch 156 Batch 90/162] avg loss 0.00020061, throughput 3.95733K wps
[Epoch 156 Batch 120/162] avg loss 0.000234662, throughput 3.9619K wps
[Epoch 156 Batch 150/162] avg loss 0.000230774, throughput 3.972K wps
Begin Testing...
[Epoch 156] train avg loss 0.000226351, dev acc 0.9300, dev avg loss 0.19998, throughput 3.98421K wps
[Epoch 157 Batch 30/162] avg loss 0.000216564, throughput 4.0586K wps
[Epoch 157 Batch 60/162] avg loss 0.000213238, throughput 3.96756K wps
[Epoch 157 Batch 90/162] avg loss 0.000241654, throughput 3.97321K wps
[Epoch 157 Batch 120/162] avg loss 0.000225332, throughput 3.96891K wps
[Epoch 157 Batch 150/162] avg loss 0.000191351, throughput 3.95991K wps
Begin Testing...
[Epoch 157] train avg loss 0.000218631, dev acc 0.9300, dev avg loss 0.200334, throughput 3.9838K wps
[Epoch 158 Batch 30/162] avg loss 0.000206608, throughput 4.05805K wps
[Epoch 158 Batch 60/162] avg loss 0.000292149, throughput 3.95532K wps
[Epoch 158 Batch 90/162] avg loss 0.000230339, throughput 3.96543K wps
[Epoch 158 Batch 120/162] avg loss 0.000197049, throughput 3.97174K wps
[Epoch 158 Batch 150/162] avg loss 0.000214106, throughput 3.96556K wps
Begin Testing...
[Epoch 158] train avg loss 0.000226373, dev acc 0.9311, dev avg loss 0.201173, throughput 3.98196K wps
[Epoch 159 Batch 30/162] avg loss 0.000214395, throughput 4.07156K wps
[Epoch 159 Batch 60/162] avg loss 0.000191208, throughput 3.96437K wps
[Epoch 159 Batch 90/162] avg loss 0.000175752, throughput 3.95584K wps
[Epoch 159 Batch 120/162] avg loss 0.000239432, throughput 3.97529K wps
[Epoch 159 Batch 150/162] avg loss 0.000200412, throughput 3.95112K wps
Begin Testing...
[Epoch 159] train avg loss 0.000205322, dev acc 0.9300, dev avg loss 0.201873, throughput 3.98277K wps
[Epoch 160 Batch 30/162] avg loss 0.000193531, throughput 4.07326K wps
[Epoch 160 Batch 60/162] avg loss 0.000282263, throughput 3.97504K wps
[Epoch 160 Batch 90/162] avg loss 0.000210554, throughput 3.97122K wps
[Epoch 160 Batch 120/162] avg loss 0.000228037, throughput 3.95136K wps
[Epoch 160 Batch 150/162] avg loss 0.00021716, throughput 3.94377K wps
Begin Testing...
[Epoch 160] train avg loss 0.000227976, dev acc 0.9300, dev avg loss 0.201471, throughput 3.97932K wps
[Epoch 161 Batch 30/162] avg loss 0.000269736, throughput 4.03551K wps
[Epoch 161 Batch 60/162] avg loss 0.000198388, throughput 3.95108K wps
[Epoch 161 Batch 90/162] avg loss 0.000218768, throughput 3.94332K wps
[Epoch 161 Batch 120/162] avg loss 0.000212363, throughput 3.94373K wps
[Epoch 161 Batch 150/162] avg loss 0.000185097, throughput 3.94643K wps
Begin Testing...
[Epoch 161] train avg loss 0.000216675, dev acc 0.9300, dev avg loss 0.201094, throughput 3.96254K wps
[Epoch 162 Batch 30/162] avg loss 0.00019404, throughput 4.06721K wps
[Epoch 162 Batch 60/162] avg loss 0.000234926, throughput 3.97634K wps
[Epoch 162 Batch 90/162] avg loss 0.000241839, throughput 3.97557K wps
[Epoch 162 Batch 120/162] avg loss 0.00019744, throughput 3.97567K wps
[Epoch 162 Batch 150/162] avg loss 0.000219982, throughput 3.96754K wps
Begin Testing...
[Epoch 162] train avg loss 0.000216131, dev acc 0.9289, dev avg loss 0.201228, throughput 3.99039K wps
[Epoch 163 Batch 30/162] avg loss 0.000213614, throughput 4.07403K wps
[Epoch 163 Batch 60/162] avg loss 0.000214636, throughput 3.95861K wps
[Epoch 163 Batch 90/162] avg loss 0.000174256, throughput 3.97201K wps
[Epoch 163 Batch 120/162] avg loss 0.000208346, throughput 3.95916K wps
[Epoch 163 Batch 150/162] avg loss 0.00021542, throughput 3.95494K wps
Begin Testing...
[Epoch 163] train avg loss 0.000206084, dev acc 0.9322, dev avg loss 0.200969, throughput 3.98183K wps
[Epoch 164 Batch 30/162] avg loss 0.000169261, throughput 4.06707K wps
[Epoch 164 Batch 60/162] avg loss 0.000178513, throughput 3.98021K wps
[Epoch 164 Batch 90/162] avg loss 0.000196393, throughput 3.96728K wps
[Epoch 164 Batch 120/162] avg loss 0.000180846, throughput 3.97447K wps
[Epoch 164 Batch 150/162] avg loss 0.000197908, throughput 3.96378K wps
Begin Testing...
[Epoch 164] train avg loss 0.000185584, dev acc 0.9300, dev avg loss 0.201164, throughput 3.98836K wps
[Epoch 165 Batch 30/162] avg loss 0.000182941, throughput 4.06505K wps
[Epoch 165 Batch 60/162] avg loss 0.000192462, throughput 3.9784K wps
[Epoch 165 Batch 90/162] avg loss 0.000181887, throughput 3.96748K wps
[Epoch 165 Batch 120/162] avg loss 0.000193256, throughput 3.96936K wps
[Epoch 165 Batch 150/162] avg loss 0.000219667, throughput 3.96606K wps
Begin Testing...
[Epoch 165] train avg loss 0.000197557, dev acc 0.9300, dev avg loss 0.202783, throughput 3.98662K wps
[Epoch 166 Batch 30/162] avg loss 0.000198109, throughput 4.07185K wps
[Epoch 166 Batch 60/162] avg loss 0.000202663, throughput 3.97098K wps
[Epoch 166 Batch 90/162] avg loss 0.000209048, throughput 3.96656K wps
[Epoch 166 Batch 120/162] avg loss 0.000195014, throughput 3.95811K wps
[Epoch 166 Batch 150/162] avg loss 0.000178736, throughput 3.95741K wps
Begin Testing...
[Epoch 166] train avg loss 0.000200355, dev acc 0.9300, dev avg loss 0.201717, throughput 3.98367K wps
[Epoch 167 Batch 30/162] avg loss 0.000211076, throughput 4.05813K wps
[Epoch 167 Batch 60/162] avg loss 0.000183225, throughput 3.94931K wps
[Epoch 167 Batch 90/162] avg loss 0.000168368, throughput 3.96163K wps
[Epoch 167 Batch 120/162] avg loss 0.000197881, throughput 3.97863K wps
[Epoch 167 Batch 150/162] avg loss 0.000214372, throughput 3.97492K wps
Begin Testing...
[Epoch 167] train avg loss 0.000190007, dev acc 0.9300, dev avg loss 0.202827, throughput 3.98317K wps
[Epoch 168 Batch 30/162] avg loss 0.000183417, throughput 4.05706K wps
[Epoch 168 Batch 60/162] avg loss 0.000207018, throughput 3.96941K wps
[Epoch 168 Batch 90/162] avg loss 0.000198624, throughput 3.96119K wps
[Epoch 168 Batch 120/162] avg loss 0.000174611, throughput 3.97224K wps
[Epoch 168 Batch 150/162] avg loss 0.000175121, throughput 3.96872K wps
Begin Testing...
[Epoch 168] train avg loss 0.000190135, dev acc 0.9300, dev avg loss 0.202659, throughput 3.98341K wps
[Epoch 169 Batch 30/162] avg loss 0.000173284, throughput 4.06078K wps
[Epoch 169 Batch 60/162] avg loss 0.000181151, throughput 3.96876K wps
[Epoch 169 Batch 90/162] avg loss 0.000200825, throughput 3.97369K wps
[Epoch 169 Batch 120/162] avg loss 0.000234196, throughput 3.97523K wps
[Epoch 169 Batch 150/162] avg loss 0.000171254, throughput 3.9688K wps
Begin Testing...
[Epoch 169] train avg loss 0.000200189, dev acc 0.9300, dev avg loss 0.202381, throughput 3.98766K wps
[Epoch 170 Batch 30/162] avg loss 0.000165932, throughput 4.04576K wps
[Epoch 170 Batch 60/162] avg loss 0.000166551, throughput 3.9763K wps
[Epoch 170 Batch 90/162] avg loss 0.000174685, throughput 3.95921K wps
[Epoch 170 Batch 120/162] avg loss 0.000193739, throughput 3.96802K wps
[Epoch 170 Batch 150/162] avg loss 0.00020564, throughput 3.97051K wps
Begin Testing...
[Epoch 170] train avg loss 0.000178715, dev acc 0.9311, dev avg loss 0.204552, throughput 3.98267K wps
[Epoch 171 Batch 30/162] avg loss 0.000214695, throughput 4.07105K wps
[Epoch 171 Batch 60/162] avg loss 0.000248541, throughput 3.9628K wps
[Epoch 171 Batch 90/162] avg loss 0.00017261, throughput 3.96721K wps
[Epoch 171 Batch 120/162] avg loss 0.000185075, throughput 3.95723K wps
[Epoch 171 Batch 150/162] avg loss 0.00016964, throughput 3.95757K wps
Begin Testing...
[Epoch 171] train avg loss 0.000196275, dev acc 0.9300, dev avg loss 0.20314, throughput 3.98142K wps
[Epoch 172 Batch 30/162] avg loss 0.000163807, throughput 4.06148K wps
[Epoch 172 Batch 60/162] avg loss 0.000182114, throughput 3.96247K wps
[Epoch 172 Batch 90/162] avg loss 0.000168088, throughput 3.96282K wps
[Epoch 172 Batch 120/162] avg loss 0.000235818, throughput 3.97451K wps
[Epoch 172 Batch 150/162] avg loss 0.000216457, throughput 3.96903K wps
Begin Testing...
[Epoch 172] train avg loss 0.000191902, dev acc 0.9311, dev avg loss 0.20382, throughput 3.98542K wps
[Epoch 173 Batch 30/162] avg loss 0.000214018, throughput 4.0775K wps
[Epoch 173 Batch 60/162] avg loss 0.000153466, throughput 3.97821K wps
[Epoch 173 Batch 90/162] avg loss 0.000164615, throughput 3.96535K wps
[Epoch 173 Batch 120/162] avg loss 0.000187045, throughput 3.97724K wps
[Epoch 173 Batch 150/162] avg loss 0.000185421, throughput 3.96966K wps
Begin Testing...
[Epoch 173] train avg loss 0.000179973, dev acc 0.9300, dev avg loss 0.203826, throughput 3.99076K wps
[Epoch 174 Batch 30/162] avg loss 0.000157489, throughput 4.05886K wps
[Epoch 174 Batch 60/162] avg loss 0.000174407, throughput 3.96591K wps
[Epoch 174 Batch 90/162] avg loss 0.000183835, throughput 3.97552K wps
[Epoch 174 Batch 120/162] avg loss 0.000186842, throughput 3.96871K wps
[Epoch 174 Batch 150/162] avg loss 0.000194731, throughput 3.97102K wps
Begin Testing...
[Epoch 174] train avg loss 0.000176992, dev acc 0.9300, dev avg loss 0.2046, throughput 3.9864K wps
[Epoch 175 Batch 30/162] avg loss 0.000170432, throughput 4.05675K wps
[Epoch 175 Batch 60/162] avg loss 0.000156657, throughput 3.96711K wps
[Epoch 175 Batch 90/162] avg loss 0.000201561, throughput 3.97034K wps
[Epoch 175 Batch 120/162] avg loss 0.000195048, throughput 3.96022K wps
[Epoch 175 Batch 150/162] avg loss 0.000204568, throughput 3.9601K wps
Begin Testing...
[Epoch 175] train avg loss 0.000184061, dev acc 0.9300, dev avg loss 0.204687, throughput 3.98247K wps
[Epoch 176 Batch 30/162] avg loss 0.00017846, throughput 4.07878K wps
[Epoch 176 Batch 60/162] avg loss 0.000189737, throughput 3.97833K wps
[Epoch 176 Batch 90/162] avg loss 0.000184523, throughput 3.97073K wps
[Epoch 176 Batch 120/162] avg loss 0.000199984, throughput 3.96089K wps
[Epoch 176 Batch 150/162] avg loss 0.000153088, throughput 3.95435K wps
Begin Testing...
[Epoch 176] train avg loss 0.000178863, dev acc 0.9289, dev avg loss 0.204126, throughput 3.9865K wps
[Epoch 177 Batch 30/162] avg loss 0.000180407, throughput 4.07103K wps
[Epoch 177 Batch 60/162] avg loss 0.00018509, throughput 3.96358K wps
[Epoch 177 Batch 90/162] avg loss 0.00016376, throughput 3.9639K wps
[Epoch 177 Batch 120/162] avg loss 0.000181298, throughput 3.9685K wps
[Epoch 177 Batch 150/162] avg loss 0.000198739, throughput 3.96661K wps
Begin Testing...
[Epoch 177] train avg loss 0.000180132, dev acc 0.9300, dev avg loss 0.204788, throughput 3.98418K wps
[Epoch 178 Batch 30/162] avg loss 0.000196793, throughput 4.06952K wps
[Epoch 178 Batch 60/162] avg loss 0.000162983, throughput 3.9653K wps
[Epoch 178 Batch 90/162] avg loss 0.000146372, throughput 3.9603K wps
[Epoch 178 Batch 120/162] avg loss 0.000160453, throughput 3.96106K wps
[Epoch 178 Batch 150/162] avg loss 0.00017148, throughput 3.97007K wps
Begin Testing...
[Epoch 178] train avg loss 0.000169155, dev acc 0.9289, dev avg loss 0.205235, throughput 3.98204K wps
[Epoch 179 Batch 30/162] avg loss 0.000166135, throughput 4.06707K wps
[Epoch 179 Batch 60/162] avg loss 0.000188229, throughput 3.9749K wps
[Epoch 179 Batch 90/162] avg loss 0.000183643, throughput 3.96945K wps
[Epoch 179 Batch 120/162] avg loss 0.000158629, throughput 3.97256K wps
[Epoch 179 Batch 150/162] avg loss 0.000190222, throughput 3.97701K wps
Begin Testing...
[Epoch 179] train avg loss 0.000174708, dev acc 0.9289, dev avg loss 0.206425, throughput 3.9893K wps
[Epoch 180 Batch 30/162] avg loss 0.00020937, throughput 4.0682K wps
[Epoch 180 Batch 60/162] avg loss 0.000169791, throughput 3.97274K wps
[Epoch 180 Batch 90/162] avg loss 0.000177452, throughput 3.97044K wps
[Epoch 180 Batch 120/162] avg loss 0.000162021, throughput 3.96722K wps
[Epoch 180 Batch 150/162] avg loss 0.000142062, throughput 3.96066K wps
Begin Testing...
[Epoch 180] train avg loss 0.000173828, dev acc 0.9300, dev avg loss 0.205991, throughput 3.98594K wps
[Epoch 181 Batch 30/162] avg loss 0.000144108, throughput 4.05455K wps
[Epoch 181 Batch 60/162] avg loss 0.000202774, throughput 3.97513K wps
[Epoch 181 Batch 90/162] avg loss 0.000185681, throughput 3.981K wps
[Epoch 181 Batch 120/162] avg loss 0.000166849, throughput 3.97157K wps
[Epoch 181 Batch 150/162] avg loss 0.000173906, throughput 3.96939K wps
Begin Testing...
[Epoch 181] train avg loss 0.000173302, dev acc 0.9322, dev avg loss 0.207155, throughput 3.98757K wps
[Epoch 182 Batch 30/162] avg loss 0.00017475, throughput 4.07471K wps
[Epoch 182 Batch 60/162] avg loss 0.000139059, throughput 3.97178K wps
[Epoch 182 Batch 90/162] avg loss 0.000147992, throughput 3.96515K wps
[Epoch 182 Batch 120/162] avg loss 0.0001924, throughput 3.97391K wps
[Epoch 182 Batch 150/162] avg loss 0.000151752, throughput 3.97233K wps
Begin Testing...
[Epoch 182] train avg loss 0.000159358, dev acc 0.9289, dev avg loss 0.206864, throughput 3.98962K wps
[Epoch 183 Batch 30/162] avg loss 0.000175996, throughput 4.0781K wps
[Epoch 183 Batch 60/162] avg loss 0.000142196, throughput 3.96668K wps
[Epoch 183 Batch 90/162] avg loss 0.000192235, throughput 3.96804K wps
[Epoch 183 Batch 120/162] avg loss 0.000176948, throughput 3.96606K wps
[Epoch 183 Batch 150/162] avg loss 0.000158535, throughput 3.96501K wps
Begin Testing...
[Epoch 183] train avg loss 0.000166059, dev acc 0.9300, dev avg loss 0.206815, throughput 3.98736K wps
[Epoch 184 Batch 30/162] avg loss 0.000189652, throughput 4.05246K wps
[Epoch 184 Batch 60/162] avg loss 0.000180589, throughput 3.96648K wps
[Epoch 184 Batch 90/162] avg loss 0.000154863, throughput 3.97461K wps
[Epoch 184 Batch 120/162] avg loss 0.000148682, throughput 3.97425K wps
[Epoch 184 Batch 150/162] avg loss 0.00016411, throughput 3.98285K wps
Begin Testing...
[Epoch 184] train avg loss 0.000169143, dev acc 0.9300, dev avg loss 0.207732, throughput 3.98874K wps
[Epoch 185 Batch 30/162] avg loss 0.00014004, throughput 4.06508K wps
[Epoch 185 Batch 60/162] avg loss 0.000167508, throughput 3.95681K wps
[Epoch 185 Batch 90/162] avg loss 0.000182366, throughput 3.9721K wps
[Epoch 185 Batch 120/162] avg loss 0.000168918, throughput 3.95868K wps
[Epoch 185 Batch 150/162] avg loss 0.000167151, throughput 3.97018K wps
Begin Testing...
[Epoch 185] train avg loss 0.000165948, dev acc 0.9289, dev avg loss 0.206856, throughput 3.98347K wps
[Epoch 186 Batch 30/162] avg loss 0.000141127, throughput 4.06929K wps
[Epoch 186 Batch 60/162] avg loss 0.000180766, throughput 3.97854K wps
[Epoch 186 Batch 90/162] avg loss 0.000186227, throughput 3.96954K wps
[Epoch 186 Batch 120/162] avg loss 0.000146051, throughput 3.97742K wps
[Epoch 186 Batch 150/162] avg loss 0.000191707, throughput 3.97981K wps
Begin Testing...
[Epoch 186] train avg loss 0.000166662, dev acc 0.9300, dev avg loss 0.208479, throughput 3.99195K wps
[Epoch 187 Batch 30/162] avg loss 0.000184485, throughput 4.06957K wps
[Epoch 187 Batch 60/162] avg loss 0.000163493, throughput 3.97615K wps
[Epoch 187 Batch 90/162] avg loss 0.000147022, throughput 3.97261K wps
[Epoch 187 Batch 120/162] avg loss 0.00015237, throughput 3.97382K wps
[Epoch 187 Batch 150/162] avg loss 0.000135778, throughput 3.97638K wps
Begin Testing...
[Epoch 187] train avg loss 0.000157523, dev acc 0.9300, dev avg loss 0.208211, throughput 3.99109K wps
[Epoch 188 Batch 30/162] avg loss 0.000166534, throughput 4.06833K wps
[Epoch 188 Batch 60/162] avg loss 0.000159508, throughput 3.97477K wps
[Epoch 188 Batch 90/162] avg loss 0.000151959, throughput 3.96736K wps
[Epoch 188 Batch 120/162] avg loss 0.00015344, throughput 3.96124K wps
[Epoch 188 Batch 150/162] avg loss 0.00017153, throughput 3.96877K wps
Begin Testing...
[Epoch 188] train avg loss 0.000162194, dev acc 0.9322, dev avg loss 0.209164, throughput 3.98624K wps
[Epoch 189 Batch 30/162] avg loss 0.00016703, throughput 4.05977K wps
[Epoch 189 Batch 60/162] avg loss 0.000148101, throughput 3.96983K wps
[Epoch 189 Batch 90/162] avg loss 0.000148629, throughput 3.96564K wps
[Epoch 189 Batch 120/162] avg loss 0.000144849, throughput 3.96226K wps
[Epoch 189 Batch 150/162] avg loss 0.000159223, throughput 3.96891K wps
Begin Testing...
[Epoch 189] train avg loss 0.000153801, dev acc 0.9289, dev avg loss 0.208932, throughput 3.98313K wps
[Epoch 190 Batch 30/162] avg loss 0.00014247, throughput 4.07056K wps
[Epoch 190 Batch 60/162] avg loss 0.000144332, throughput 3.96126K wps
[Epoch 190 Batch 90/162] avg loss 0.000134467, throughput 3.96129K wps
[Epoch 190 Batch 120/162] avg loss 0.00014922, throughput 3.9786K wps
[Epoch 190 Batch 150/162] avg loss 0.000153782, throughput 3.96324K wps
Begin Testing...
[Epoch 190] train avg loss 0.000148388, dev acc 0.9311, dev avg loss 0.208378, throughput 3.98482K wps
[Epoch 191 Batch 30/162] avg loss 0.000142583, throughput 4.06851K wps
[Epoch 191 Batch 60/162] avg loss 0.000140511, throughput 3.97939K wps
[Epoch 191 Batch 90/162] avg loss 0.000179801, throughput 3.97426K wps
[Epoch 191 Batch 120/162] avg loss 0.000171173, throughput 3.97222K wps
[Epoch 191 Batch 150/162] avg loss 0.00014004, throughput 3.97817K wps
Begin Testing...
[Epoch 191] train avg loss 0.000154472, dev acc 0.9300, dev avg loss 0.209952, throughput 3.99236K wps
[Epoch 192 Batch 30/162] avg loss 0.000198609, throughput 4.05835K wps
[Epoch 192 Batch 60/162] avg loss 0.000146237, throughput 3.97931K wps
[Epoch 192 Batch 90/162] avg loss 0.00014614, throughput 3.9681K wps
[Epoch 192 Batch 120/162] avg loss 0.000154991, throughput 3.96358K wps
[Epoch 192 Batch 150/162] avg loss 0.000137625, throughput 3.97265K wps
Begin Testing...
[Epoch 192] train avg loss 0.000154371, dev acc 0.9278, dev avg loss 0.210363, throughput 3.98614K wps
[Epoch 193 Batch 30/162] avg loss 0.000148035, throughput 4.06732K wps
[Epoch 193 Batch 60/162] avg loss 0.000138001, throughput 3.97353K wps
[Epoch 193 Batch 90/162] avg loss 0.000162336, throughput 3.9768K wps
[Epoch 193 Batch 120/162] avg loss 0.000136655, throughput 3.96529K wps
[Epoch 193 Batch 150/162] avg loss 0.000172792, throughput 3.95943K wps
Begin Testing...
[Epoch 193] train avg loss 0.000153247, dev acc 0.9289, dev avg loss 0.210559, throughput 3.98697K wps
[Epoch 194 Batch 30/162] avg loss 0.000180209, throughput 4.06146K wps
[Epoch 194 Batch 60/162] avg loss 0.000143852, throughput 3.96972K wps
[Epoch 194 Batch 90/162] avg loss 0.000150755, throughput 3.96783K wps
[Epoch 194 Batch 120/162] avg loss 0.000144365, throughput 3.96783K wps
[Epoch 194 Batch 150/162] avg loss 0.000144174, throughput 3.97418K wps
Begin Testing...
[Epoch 194] train avg loss 0.000149619, dev acc 0.9289, dev avg loss 0.209576, throughput 3.98756K wps
[Epoch 195 Batch 30/162] avg loss 0.000145439, throughput 4.07956K wps
[Epoch 195 Batch 60/162] avg loss 0.00015229, throughput 3.97644K wps
[Epoch 195 Batch 90/162] avg loss 0.00014674, throughput 3.96777K wps
[Epoch 195 Batch 120/162] avg loss 0.000162851, throughput 3.97788K wps
[Epoch 195 Batch 150/162] avg loss 0.00014652, throughput 3.96448K wps
Begin Testing...
[Epoch 195] train avg loss 0.000151345, dev acc 0.9289, dev avg loss 0.209717, throughput 3.99034K wps
[Epoch 196 Batch 30/162] avg loss 0.000190228, throughput 4.05998K wps
[Epoch 196 Batch 60/162] avg loss 0.000129673, throughput 3.97678K wps
[Epoch 196 Batch 90/162] avg loss 0.000138456, throughput 3.97494K wps
[Epoch 196 Batch 120/162] avg loss 0.000121943, throughput 3.97768K wps
[Epoch 196 Batch 150/162] avg loss 0.000130345, throughput 3.91897K wps
Begin Testing...
[Epoch 196] train avg loss 0.000138928, dev acc 0.9289, dev avg loss 0.211029, throughput 3.98065K wps
[Epoch 197 Batch 30/162] avg loss 0.000158323, throughput 4.06627K wps
[Epoch 197 Batch 60/162] avg loss 0.000124863, throughput 3.97989K wps
[Epoch 197 Batch 90/162] avg loss 0.000166673, throughput 3.95623K wps
[Epoch 197 Batch 120/162] avg loss 0.00015083, throughput 3.96041K wps
[Epoch 197 Batch 150/162] avg loss 0.000139297, throughput 3.97775K wps
Begin Testing...
[Epoch 197] train avg loss 0.000146007, dev acc 0.9300, dev avg loss 0.210565, throughput 3.98698K wps
[Epoch 198 Batch 30/162] avg loss 0.000154041, throughput 4.06631K wps
[Epoch 198 Batch 60/162] avg loss 0.000158004, throughput 3.97568K wps
[Epoch 198 Batch 90/162] avg loss 0.00013034, throughput 3.97004K wps
[Epoch 198 Batch 120/162] avg loss 0.000202945, throughput 3.96909K wps
[Epoch 198 Batch 150/162] avg loss 0.000150192, throughput 3.96886K wps
Begin Testing...
[Epoch 198] train avg loss 0.000156778, dev acc 0.9300, dev avg loss 0.211247, throughput 3.98868K wps
[Epoch 199 Batch 30/162] avg loss 0.000116961, throughput 4.07603K wps
[Epoch 199 Batch 60/162] avg loss 0.00012392, throughput 3.95432K wps
[Epoch 199 Batch 90/162] avg loss 0.000149806, throughput 3.95235K wps
[Epoch 199 Batch 120/162] avg loss 0.000121298, throughput 3.96889K wps
[Epoch 199 Batch 150/162] avg loss 0.000123197, throughput 3.97132K wps
Begin Testing...
[Epoch 199] train avg loss 0.000126032, dev acc 0.9300, dev avg loss 0.21178, throughput 3.98324K wps
Test loss 0.215089, test acc 0.9140
Total time cost 1007.82s
[Epoch 0 Batch 30/162] avg loss 0.0140249, throughput 3.60408K wps
[Epoch 0 Batch 60/162] avg loss 0.0137485, throughput 3.9749K wps
[Epoch 0 Batch 90/162] avg loss 0.0136365, throughput 3.96217K wps
[Epoch 0 Batch 120/162] avg loss 0.013535, throughput 3.97625K wps
[Epoch 0 Batch 150/162] avg loss 0.0134531, throughput 3.96731K wps
Begin Testing...
[Epoch 0] train avg loss 0.0136546, dev acc 0.7044, dev avg loss 0.660171, throughput 3.89299K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0131903, throughput 4.06654K wps
[Epoch 1 Batch 60/162] avg loss 0.0131604, throughput 3.9837K wps
[Epoch 1 Batch 90/162] avg loss 0.0129982, throughput 3.97008K wps
[Epoch 1 Batch 120/162] avg loss 0.0126835, throughput 3.97408K wps
[Epoch 1 Batch 150/162] avg loss 0.012682, throughput 3.96699K wps
Begin Testing...
[Epoch 1] train avg loss 0.0129182, dev acc 0.8256, dev avg loss 0.626386, throughput 3.98839K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0124164, throughput 4.05758K wps
[Epoch 2 Batch 60/162] avg loss 0.012381, throughput 3.97863K wps
[Epoch 2 Batch 90/162] avg loss 0.0121154, throughput 3.97303K wps
[Epoch 2 Batch 120/162] avg loss 0.0119777, throughput 3.963K wps
[Epoch 2 Batch 150/162] avg loss 0.0118354, throughput 3.9644K wps
Begin Testing...
[Epoch 2] train avg loss 0.0121217, dev acc 0.8289, dev avg loss 0.585028, throughput 3.98595K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0116759, throughput 4.06133K wps
[Epoch 3 Batch 60/162] avg loss 0.0115454, throughput 3.96941K wps
[Epoch 3 Batch 90/162] avg loss 0.0111206, throughput 3.97913K wps
[Epoch 3 Batch 120/162] avg loss 0.0111356, throughput 3.95443K wps
[Epoch 3 Batch 150/162] avg loss 0.0108987, throughput 3.96101K wps
Begin Testing...
[Epoch 3] train avg loss 0.01125, dev acc 0.8333, dev avg loss 0.537188, throughput 3.98259K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.0107024, throughput 4.05984K wps
[Epoch 4 Batch 60/162] avg loss 0.0104154, throughput 3.96365K wps
[Epoch 4 Batch 90/162] avg loss 0.0102116, throughput 3.97292K wps
[Epoch 4 Batch 120/162] avg loss 0.0100933, throughput 3.96632K wps
[Epoch 4 Batch 150/162] avg loss 0.0097798, throughput 3.96929K wps
Begin Testing...
[Epoch 4] train avg loss 0.010198, dev acc 0.8433, dev avg loss 0.488869, throughput 3.98474K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.0097114, throughput 4.07614K wps
[Epoch 5 Batch 60/162] avg loss 0.00967168, throughput 3.97071K wps
[Epoch 5 Batch 90/162] avg loss 0.00943678, throughput 3.96361K wps
[Epoch 5 Batch 120/162] avg loss 0.00905716, throughput 3.96494K wps
[Epoch 5 Batch 150/162] avg loss 0.00884709, throughput 3.9613K wps
Begin Testing...
[Epoch 5] train avg loss 0.00934104, dev acc 0.8433, dev avg loss 0.447519, throughput 3.98416K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00869126, throughput 4.06162K wps
[Epoch 6 Batch 60/162] avg loss 0.00880745, throughput 3.97612K wps
[Epoch 6 Batch 90/162] avg loss 0.00849957, throughput 3.9715K wps
[Epoch 6 Batch 120/162] avg loss 0.00835585, throughput 3.97291K wps
[Epoch 6 Batch 150/162] avg loss 0.00839971, throughput 3.97844K wps
Begin Testing...
[Epoch 6] train avg loss 0.00854543, dev acc 0.8533, dev avg loss 0.412746, throughput 3.98906K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00819598, throughput 4.07139K wps
[Epoch 7 Batch 60/162] avg loss 0.00798402, throughput 3.96747K wps
[Epoch 7 Batch 90/162] avg loss 0.00794479, throughput 3.96218K wps
[Epoch 7 Batch 120/162] avg loss 0.00763679, throughput 3.95205K wps
[Epoch 7 Batch 150/162] avg loss 0.00768968, throughput 3.96352K wps
Begin Testing...
[Epoch 7] train avg loss 0.00789315, dev acc 0.8522, dev avg loss 0.383515, throughput 3.98199K wps
[Epoch 8 Batch 30/162] avg loss 0.00763414, throughput 4.06525K wps
[Epoch 8 Batch 60/162] avg loss 0.00769668, throughput 3.97582K wps
[Epoch 8 Batch 90/162] avg loss 0.00715745, throughput 3.97382K wps
[Epoch 8 Batch 120/162] avg loss 0.00706474, throughput 3.95617K wps
[Epoch 8 Batch 150/162] avg loss 0.00717075, throughput 3.97072K wps
Begin Testing...
[Epoch 8] train avg loss 0.00736104, dev acc 0.8600, dev avg loss 0.360761, throughput 3.98698K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00724243, throughput 4.05685K wps
[Epoch 9 Batch 60/162] avg loss 0.0071179, throughput 3.97537K wps
[Epoch 9 Batch 90/162] avg loss 0.00696301, throughput 3.97331K wps
[Epoch 9 Batch 120/162] avg loss 0.00670712, throughput 3.97203K wps
[Epoch 9 Batch 150/162] avg loss 0.00667993, throughput 3.96692K wps
Begin Testing...
[Epoch 9] train avg loss 0.00691978, dev acc 0.8744, dev avg loss 0.340743, throughput 3.98796K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00672479, throughput 4.06339K wps
[Epoch 10 Batch 60/162] avg loss 0.0069896, throughput 3.95935K wps
[Epoch 10 Batch 90/162] avg loss 0.00643654, throughput 3.96877K wps
[Epoch 10 Batch 120/162] avg loss 0.00613468, throughput 3.97485K wps
[Epoch 10 Batch 150/162] avg loss 0.00642247, throughput 3.97082K wps
Begin Testing...
[Epoch 10] train avg loss 0.00652831, dev acc 0.8867, dev avg loss 0.325339, throughput 3.98635K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00610104, throughput 4.06186K wps
[Epoch 11 Batch 60/162] avg loss 0.00621451, throughput 3.97251K wps
[Epoch 11 Batch 90/162] avg loss 0.00636908, throughput 3.96069K wps
[Epoch 11 Batch 120/162] avg loss 0.00631752, throughput 3.97901K wps
[Epoch 11 Batch 150/162] avg loss 0.00617874, throughput 3.974K wps
Begin Testing...
[Epoch 11] train avg loss 0.00623372, dev acc 0.8900, dev avg loss 0.312612, throughput 3.98652K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00605344, throughput 4.06027K wps
[Epoch 12 Batch 60/162] avg loss 0.00603203, throughput 3.95415K wps
[Epoch 12 Batch 90/162] avg loss 0.00601672, throughput 3.9545K wps
[Epoch 12 Batch 120/162] avg loss 0.00599494, throughput 3.97459K wps
[Epoch 12 Batch 150/162] avg loss 0.00560076, throughput 3.96835K wps
Begin Testing...
[Epoch 12] train avg loss 0.00593555, dev acc 0.8956, dev avg loss 0.301152, throughput 3.98171K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00581681, throughput 4.0627K wps
[Epoch 13 Batch 60/162] avg loss 0.00594083, throughput 3.96904K wps
[Epoch 13 Batch 90/162] avg loss 0.00544442, throughput 3.97345K wps
[Epoch 13 Batch 120/162] avg loss 0.00587881, throughput 3.94073K wps
[Epoch 13 Batch 150/162] avg loss 0.00561636, throughput 3.96395K wps
Begin Testing...
[Epoch 13] train avg loss 0.00575068, dev acc 0.9011, dev avg loss 0.292137, throughput 3.97806K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00529779, throughput 4.05135K wps
[Epoch 14 Batch 60/162] avg loss 0.00552734, throughput 3.96971K wps
[Epoch 14 Batch 90/162] avg loss 0.00552285, throughput 3.96924K wps
[Epoch 14 Batch 120/162] avg loss 0.00521222, throughput 3.95707K wps
[Epoch 14 Batch 150/162] avg loss 0.00565713, throughput 3.96639K wps
Begin Testing...
[Epoch 14] train avg loss 0.00546311, dev acc 0.9056, dev avg loss 0.283472, throughput 3.98138K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00522977, throughput 4.06707K wps
[Epoch 15 Batch 60/162] avg loss 0.00542489, throughput 3.96638K wps
[Epoch 15 Batch 90/162] avg loss 0.00519574, throughput 3.97294K wps
[Epoch 15 Batch 120/162] avg loss 0.00522274, throughput 3.96688K wps
[Epoch 15 Batch 150/162] avg loss 0.00518845, throughput 3.97648K wps
Begin Testing...
[Epoch 15] train avg loss 0.00525349, dev acc 0.9011, dev avg loss 0.278106, throughput 3.98767K wps
[Epoch 16 Batch 30/162] avg loss 0.00521386, throughput 4.07738K wps
[Epoch 16 Batch 60/162] avg loss 0.00512446, throughput 3.96846K wps
[Epoch 16 Batch 90/162] avg loss 0.00516041, throughput 3.96475K wps
[Epoch 16 Batch 120/162] avg loss 0.00502911, throughput 3.97138K wps
[Epoch 16 Batch 150/162] avg loss 0.00520078, throughput 3.94329K wps
Begin Testing...
[Epoch 16] train avg loss 0.00512327, dev acc 0.9089, dev avg loss 0.269394, throughput 3.98185K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00441527, throughput 4.06819K wps
[Epoch 17 Batch 60/162] avg loss 0.00526448, throughput 3.96198K wps
[Epoch 17 Batch 90/162] avg loss 0.00482063, throughput 3.95108K wps
[Epoch 17 Batch 120/162] avg loss 0.00500441, throughput 3.96213K wps
[Epoch 17 Batch 150/162] avg loss 0.00509356, throughput 3.96695K wps
Begin Testing...
[Epoch 17] train avg loss 0.00492227, dev acc 0.9067, dev avg loss 0.263196, throughput 3.97994K wps
[Epoch 18 Batch 30/162] avg loss 0.00529888, throughput 4.06849K wps
[Epoch 18 Batch 60/162] avg loss 0.0046081, throughput 3.96269K wps
[Epoch 18 Batch 90/162] avg loss 0.00467963, throughput 3.97429K wps
[Epoch 18 Batch 120/162] avg loss 0.00490713, throughput 3.96182K wps
[Epoch 18 Batch 150/162] avg loss 0.00405941, throughput 3.97783K wps
Begin Testing...
[Epoch 18] train avg loss 0.00474485, dev acc 0.9089, dev avg loss 0.257613, throughput 3.98677K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00504619, throughput 4.05726K wps
[Epoch 19 Batch 60/162] avg loss 0.00427632, throughput 3.97879K wps
[Epoch 19 Batch 90/162] avg loss 0.00436147, throughput 3.97773K wps
[Epoch 19 Batch 120/162] avg loss 0.00463731, throughput 3.97131K wps
[Epoch 19 Batch 150/162] avg loss 0.00453661, throughput 3.96796K wps
Begin Testing...
[Epoch 19] train avg loss 0.00457433, dev acc 0.9133, dev avg loss 0.25427, throughput 3.98928K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00442457, throughput 4.06173K wps
[Epoch 20 Batch 60/162] avg loss 0.00425824, throughput 3.96091K wps
[Epoch 20 Batch 90/162] avg loss 0.00450895, throughput 3.96632K wps
[Epoch 20 Batch 120/162] avg loss 0.00435195, throughput 3.96647K wps
[Epoch 20 Batch 150/162] avg loss 0.00454603, throughput 3.96685K wps
Begin Testing...
[Epoch 20] train avg loss 0.00442521, dev acc 0.9122, dev avg loss 0.248357, throughput 3.98313K wps
[Epoch 21 Batch 30/162] avg loss 0.00440473, throughput 4.07049K wps
[Epoch 21 Batch 60/162] avg loss 0.00442193, throughput 3.97646K wps
[Epoch 21 Batch 90/162] avg loss 0.00411496, throughput 3.97862K wps
[Epoch 21 Batch 120/162] avg loss 0.00479913, throughput 3.98051K wps
[Epoch 21 Batch 150/162] avg loss 0.0041243, throughput 3.97447K wps
Begin Testing...
[Epoch 21] train avg loss 0.00438836, dev acc 0.9122, dev avg loss 0.244153, throughput 3.99144K wps
[Epoch 22 Batch 30/162] avg loss 0.00421052, throughput 4.06608K wps
[Epoch 22 Batch 60/162] avg loss 0.00424827, throughput 3.9683K wps
[Epoch 22 Batch 90/162] avg loss 0.00385229, throughput 3.95346K wps
[Epoch 22 Batch 120/162] avg loss 0.00433157, throughput 3.97581K wps
[Epoch 22 Batch 150/162] avg loss 0.00391805, throughput 3.96302K wps
Begin Testing...
[Epoch 22] train avg loss 0.00413372, dev acc 0.9144, dev avg loss 0.240963, throughput 3.98415K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00382232, throughput 4.07338K wps
[Epoch 23 Batch 60/162] avg loss 0.00410271, throughput 3.96974K wps
[Epoch 23 Batch 90/162] avg loss 0.00421264, throughput 3.97109K wps
[Epoch 23 Batch 120/162] avg loss 0.00414562, throughput 3.96016K wps
[Epoch 23 Batch 150/162] avg loss 0.00408402, throughput 3.97314K wps
Begin Testing...
[Epoch 23] train avg loss 0.00408716, dev acc 0.9133, dev avg loss 0.237411, throughput 3.98745K wps
[Epoch 24 Batch 30/162] avg loss 0.00411863, throughput 4.0545K wps
[Epoch 24 Batch 60/162] avg loss 0.00411153, throughput 3.95807K wps
[Epoch 24 Batch 90/162] avg loss 0.0038129, throughput 3.94748K wps
[Epoch 24 Batch 120/162] avg loss 0.00384414, throughput 3.96901K wps
[Epoch 24 Batch 150/162] avg loss 0.00427676, throughput 3.96727K wps
Begin Testing...
[Epoch 24] train avg loss 0.00398786, dev acc 0.9111, dev avg loss 0.232984, throughput 3.97799K wps
[Epoch 25 Batch 30/162] avg loss 0.0039367, throughput 4.06772K wps
[Epoch 25 Batch 60/162] avg loss 0.00389655, throughput 3.96448K wps
[Epoch 25 Batch 90/162] avg loss 0.00373825, throughput 3.96483K wps
[Epoch 25 Batch 120/162] avg loss 0.00403941, throughput 3.96848K wps
[Epoch 25 Batch 150/162] avg loss 0.0036502, throughput 3.96056K wps
Begin Testing...
[Epoch 25] train avg loss 0.00382259, dev acc 0.9156, dev avg loss 0.231524, throughput 3.98354K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/162] avg loss 0.00365982, throughput 4.06204K wps
[Epoch 26 Batch 60/162] avg loss 0.00392468, throughput 3.97271K wps
[Epoch 26 Batch 90/162] avg loss 0.00372041, throughput 3.9715K wps
[Epoch 26 Batch 120/162] avg loss 0.00389769, throughput 3.95701K wps
[Epoch 26 Batch 150/162] avg loss 0.00367122, throughput 3.96704K wps
Begin Testing...
[Epoch 26] train avg loss 0.00376607, dev acc 0.9167, dev avg loss 0.228777, throughput 3.98328K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00379609, throughput 4.05879K wps
[Epoch 27 Batch 60/162] avg loss 0.00337889, throughput 3.96195K wps
[Epoch 27 Batch 90/162] avg loss 0.0035571, throughput 3.95906K wps
[Epoch 27 Batch 120/162] avg loss 0.00373622, throughput 3.97314K wps
[Epoch 27 Batch 150/162] avg loss 0.00365363, throughput 3.96452K wps
Begin Testing...
[Epoch 27] train avg loss 0.00360071, dev acc 0.9189, dev avg loss 0.225059, throughput 3.98129K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00379976, throughput 4.07632K wps
[Epoch 28 Batch 60/162] avg loss 0.00327066, throughput 3.97115K wps
[Epoch 28 Batch 90/162] avg loss 0.00352348, throughput 3.97975K wps
[Epoch 28 Batch 120/162] avg loss 0.00331078, throughput 3.96881K wps
[Epoch 28 Batch 150/162] avg loss 0.00350833, throughput 3.98202K wps
Begin Testing...
[Epoch 28] train avg loss 0.0035032, dev acc 0.9189, dev avg loss 0.22231, throughput 3.9918K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.0034805, throughput 4.03787K wps
[Epoch 29 Batch 60/162] avg loss 0.00321043, throughput 3.9654K wps
[Epoch 29 Batch 90/162] avg loss 0.00333861, throughput 3.96458K wps
[Epoch 29 Batch 120/162] avg loss 0.00373789, throughput 3.96398K wps
[Epoch 29 Batch 150/162] avg loss 0.00343129, throughput 3.97558K wps
Begin Testing...
[Epoch 29] train avg loss 0.00343898, dev acc 0.9222, dev avg loss 0.220668, throughput 3.98127K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.0031419, throughput 4.05676K wps
[Epoch 30 Batch 60/162] avg loss 0.00366432, throughput 3.95759K wps
[Epoch 30 Batch 90/162] avg loss 0.00329948, throughput 3.97543K wps
[Epoch 30 Batch 120/162] avg loss 0.00327292, throughput 3.95806K wps
[Epoch 30 Batch 150/162] avg loss 0.00348849, throughput 3.96357K wps
Begin Testing...
[Epoch 30] train avg loss 0.00335716, dev acc 0.9222, dev avg loss 0.218558, throughput 3.98118K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00333345, throughput 4.05636K wps
[Epoch 31 Batch 60/162] avg loss 0.00319917, throughput 3.97793K wps
[Epoch 31 Batch 90/162] avg loss 0.00305437, throughput 3.9656K wps
[Epoch 31 Batch 120/162] avg loss 0.0032363, throughput 3.9778K wps
[Epoch 31 Batch 150/162] avg loss 0.00349324, throughput 3.94931K wps
Begin Testing...
[Epoch 31] train avg loss 0.00322668, dev acc 0.9233, dev avg loss 0.216514, throughput 3.9826K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/162] avg loss 0.00316751, throughput 4.0588K wps
[Epoch 32 Batch 60/162] avg loss 0.00322934, throughput 3.96755K wps
[Epoch 32 Batch 90/162] avg loss 0.00291864, throughput 3.95211K wps
[Epoch 32 Batch 120/162] avg loss 0.00293513, throughput 3.97886K wps
[Epoch 32 Batch 150/162] avg loss 0.00307583, throughput 3.97286K wps
Begin Testing...
[Epoch 32] train avg loss 0.0030621, dev acc 0.9233, dev avg loss 0.214988, throughput 3.98323K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/162] avg loss 0.00301751, throughput 4.07118K wps
[Epoch 33 Batch 60/162] avg loss 0.00291219, throughput 3.97445K wps
[Epoch 33 Batch 90/162] avg loss 0.00299808, throughput 3.95439K wps
[Epoch 33 Batch 120/162] avg loss 0.00312426, throughput 3.97562K wps
[Epoch 33 Batch 150/162] avg loss 0.0030376, throughput 3.96803K wps
Begin Testing...
[Epoch 33] train avg loss 0.00303841, dev acc 0.9278, dev avg loss 0.212975, throughput 3.98725K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00301256, throughput 4.05763K wps
[Epoch 34 Batch 60/162] avg loss 0.00313602, throughput 3.97287K wps
[Epoch 34 Batch 90/162] avg loss 0.00300949, throughput 3.97022K wps
[Epoch 34 Batch 120/162] avg loss 0.00275392, throughput 3.98179K wps
[Epoch 34 Batch 150/162] avg loss 0.00284822, throughput 3.9835K wps
Begin Testing...
[Epoch 34] train avg loss 0.00294333, dev acc 0.9267, dev avg loss 0.210953, throughput 3.99189K wps
[Epoch 35 Batch 30/162] avg loss 0.00265601, throughput 4.06068K wps
[Epoch 35 Batch 60/162] avg loss 0.00282107, throughput 3.96763K wps
[Epoch 35 Batch 90/162] avg loss 0.00299488, throughput 3.96936K wps
[Epoch 35 Batch 120/162] avg loss 0.00299788, throughput 3.96529K wps
[Epoch 35 Batch 150/162] avg loss 0.00288945, throughput 3.9727K wps
Begin Testing...
[Epoch 35] train avg loss 0.00287573, dev acc 0.9267, dev avg loss 0.209608, throughput 3.98653K wps
[Epoch 36 Batch 30/162] avg loss 0.00277131, throughput 4.06424K wps
[Epoch 36 Batch 60/162] avg loss 0.00268626, throughput 3.96929K wps
[Epoch 36 Batch 90/162] avg loss 0.00287123, throughput 3.97889K wps
[Epoch 36 Batch 120/162] avg loss 0.0030095, throughput 3.97365K wps
[Epoch 36 Batch 150/162] avg loss 0.00267386, throughput 3.96343K wps
Begin Testing...
[Epoch 36] train avg loss 0.00280967, dev acc 0.9222, dev avg loss 0.209345, throughput 3.98813K wps
[Epoch 37 Batch 30/162] avg loss 0.002928, throughput 4.05809K wps
[Epoch 37 Batch 60/162] avg loss 0.00282291, throughput 3.96425K wps
[Epoch 37 Batch 90/162] avg loss 0.00247712, throughput 3.97427K wps
[Epoch 37 Batch 120/162] avg loss 0.00269813, throughput 3.97999K wps
[Epoch 37 Batch 150/162] avg loss 0.00257957, throughput 3.96544K wps
Begin Testing...
[Epoch 37] train avg loss 0.00266391, dev acc 0.9222, dev avg loss 0.209369, throughput 3.9862K wps
[Epoch 38 Batch 30/162] avg loss 0.00224004, throughput 4.07832K wps
[Epoch 38 Batch 60/162] avg loss 0.00258983, throughput 3.97605K wps
[Epoch 38 Batch 90/162] avg loss 0.00255822, throughput 3.96825K wps
[Epoch 38 Batch 120/162] avg loss 0.00272175, throughput 3.95908K wps
[Epoch 38 Batch 150/162] avg loss 0.00279583, throughput 3.97533K wps
Begin Testing...
[Epoch 38] train avg loss 0.00260886, dev acc 0.9278, dev avg loss 0.206274, throughput 3.98908K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00263334, throughput 4.03946K wps
[Epoch 39 Batch 60/162] avg loss 0.00269149, throughput 3.96479K wps
[Epoch 39 Batch 90/162] avg loss 0.00243838, throughput 3.9796K wps
[Epoch 39 Batch 120/162] avg loss 0.00260845, throughput 3.97537K wps
[Epoch 39 Batch 150/162] avg loss 0.0024475, throughput 3.97472K wps
Begin Testing...
[Epoch 39] train avg loss 0.00254099, dev acc 0.9211, dev avg loss 0.206408, throughput 3.98567K wps
[Epoch 40 Batch 30/162] avg loss 0.0025737, throughput 4.06459K wps
[Epoch 40 Batch 60/162] avg loss 0.00226124, throughput 3.96128K wps
[Epoch 40 Batch 90/162] avg loss 0.00226042, throughput 3.96968K wps
[Epoch 40 Batch 120/162] avg loss 0.00251296, throughput 3.95823K wps
[Epoch 40 Batch 150/162] avg loss 0.00266362, throughput 3.95625K wps
Begin Testing...
[Epoch 40] train avg loss 0.00247296, dev acc 0.9222, dev avg loss 0.204209, throughput 3.97985K wps
[Epoch 41 Batch 30/162] avg loss 0.00231571, throughput 4.06582K wps
[Epoch 41 Batch 60/162] avg loss 0.00245834, throughput 3.97642K wps
[Epoch 41 Batch 90/162] avg loss 0.00246494, throughput 3.97399K wps
[Epoch 41 Batch 120/162] avg loss 0.00252986, throughput 3.96369K wps
[Epoch 41 Batch 150/162] avg loss 0.00225771, throughput 3.96209K wps
Begin Testing...
[Epoch 41] train avg loss 0.00240227, dev acc 0.9211, dev avg loss 0.203481, throughput 3.98582K wps
[Epoch 42 Batch 30/162] avg loss 0.00228987, throughput 4.06765K wps
[Epoch 42 Batch 60/162] avg loss 0.00216351, throughput 3.961K wps
[Epoch 42 Batch 90/162] avg loss 0.00210932, throughput 3.96481K wps
[Epoch 42 Batch 120/162] avg loss 0.00234785, throughput 3.96448K wps
[Epoch 42 Batch 150/162] avg loss 0.00229262, throughput 3.97629K wps
Begin Testing...
[Epoch 42] train avg loss 0.00225054, dev acc 0.9222, dev avg loss 0.206903, throughput 3.98668K wps
[Epoch 43 Batch 30/162] avg loss 0.00219339, throughput 4.06805K wps
[Epoch 43 Batch 60/162] avg loss 0.00237826, throughput 3.96896K wps
[Epoch 43 Batch 90/162] avg loss 0.00212568, throughput 3.97756K wps
[Epoch 43 Batch 120/162] avg loss 0.00236167, throughput 3.95445K wps
[Epoch 43 Batch 150/162] avg loss 0.00216748, throughput 3.97379K wps
Begin Testing...
[Epoch 43] train avg loss 0.00224704, dev acc 0.9222, dev avg loss 0.203338, throughput 3.98692K wps
[Epoch 44 Batch 30/162] avg loss 0.00191938, throughput 4.07038K wps
[Epoch 44 Batch 60/162] avg loss 0.00216253, throughput 3.97286K wps
[Epoch 44 Batch 90/162] avg loss 0.00236348, throughput 3.97394K wps
[Epoch 44 Batch 120/162] avg loss 0.00257855, throughput 3.97954K wps
[Epoch 44 Batch 150/162] avg loss 0.00187053, throughput 3.98072K wps
Begin Testing...
[Epoch 44] train avg loss 0.00220626, dev acc 0.9222, dev avg loss 0.201217, throughput 3.9935K wps
[Epoch 45 Batch 30/162] avg loss 0.00237135, throughput 4.07961K wps
[Epoch 45 Batch 60/162] avg loss 0.00226576, throughput 3.96355K wps
[Epoch 45 Batch 90/162] avg loss 0.00205386, throughput 3.97904K wps
[Epoch 45 Batch 120/162] avg loss 0.00199288, throughput 3.97075K wps
[Epoch 45 Batch 150/162] avg loss 0.0022605, throughput 3.96052K wps
Begin Testing...
[Epoch 45] train avg loss 0.00214698, dev acc 0.9278, dev avg loss 0.199695, throughput 3.98907K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00188916, throughput 4.07353K wps
[Epoch 46 Batch 60/162] avg loss 0.00210889, throughput 3.97442K wps
[Epoch 46 Batch 90/162] avg loss 0.00213099, throughput 3.97875K wps
[Epoch 46 Batch 120/162] avg loss 0.00219747, throughput 3.97075K wps
[Epoch 46 Batch 150/162] avg loss 0.00212056, throughput 3.96949K wps
Begin Testing...
[Epoch 46] train avg loss 0.00209399, dev acc 0.9244, dev avg loss 0.19968, throughput 3.99081K wps
[Epoch 47 Batch 30/162] avg loss 0.00182045, throughput 4.07079K wps
[Epoch 47 Batch 60/162] avg loss 0.00209653, throughput 3.97538K wps
[Epoch 47 Batch 90/162] avg loss 0.00207235, throughput 3.95662K wps
[Epoch 47 Batch 120/162] avg loss 0.00224813, throughput 3.97732K wps
[Epoch 47 Batch 150/162] avg loss 0.00198243, throughput 3.97826K wps
Begin Testing...
[Epoch 47] train avg loss 0.00203443, dev acc 0.9256, dev avg loss 0.198946, throughput 3.99012K wps
[Epoch 48 Batch 30/162] avg loss 0.00177836, throughput 4.05959K wps
[Epoch 48 Batch 60/162] avg loss 0.00189354, throughput 3.9701K wps
[Epoch 48 Batch 90/162] avg loss 0.00202949, throughput 3.97275K wps
[Epoch 48 Batch 120/162] avg loss 0.00193725, throughput 3.9582K wps
[Epoch 48 Batch 150/162] avg loss 0.00210234, throughput 3.97246K wps
Begin Testing...
[Epoch 48] train avg loss 0.00196887, dev acc 0.9256, dev avg loss 0.199631, throughput 3.98442K wps
[Epoch 49 Batch 30/162] avg loss 0.00179832, throughput 4.06311K wps
[Epoch 49 Batch 60/162] avg loss 0.00198075, throughput 3.96913K wps
[Epoch 49 Batch 90/162] avg loss 0.00196167, throughput 3.97614K wps
[Epoch 49 Batch 120/162] avg loss 0.00208647, throughput 3.96516K wps
[Epoch 49 Batch 150/162] avg loss 0.00202397, throughput 3.9842K wps
Begin Testing...
[Epoch 49] train avg loss 0.00195415, dev acc 0.9244, dev avg loss 0.198052, throughput 3.99076K wps
[Epoch 50 Batch 30/162] avg loss 0.00192823, throughput 4.07439K wps
[Epoch 50 Batch 60/162] avg loss 0.00184184, throughput 3.96546K wps
[Epoch 50 Batch 90/162] avg loss 0.00191144, throughput 3.97925K wps
[Epoch 50 Batch 120/162] avg loss 0.00194282, throughput 3.95863K wps
[Epoch 50 Batch 150/162] avg loss 0.00177208, throughput 3.96349K wps
Begin Testing...
[Epoch 50] train avg loss 0.00187643, dev acc 0.9256, dev avg loss 0.19769, throughput 3.98704K wps
[Epoch 51 Batch 30/162] avg loss 0.00189407, throughput 4.07466K wps
[Epoch 51 Batch 60/162] avg loss 0.00202782, throughput 3.97555K wps
[Epoch 51 Batch 90/162] avg loss 0.00186664, throughput 3.98129K wps
[Epoch 51 Batch 120/162] avg loss 0.00178902, throughput 3.96913K wps
[Epoch 51 Batch 150/162] avg loss 0.00153724, throughput 3.98069K wps
Begin Testing...
[Epoch 51] train avg loss 0.00182959, dev acc 0.9278, dev avg loss 0.196837, throughput 3.99404K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.00172966, throughput 4.06337K wps
[Epoch 52 Batch 60/162] avg loss 0.00157768, throughput 3.96194K wps
[Epoch 52 Batch 90/162] avg loss 0.00171, throughput 3.96328K wps
[Epoch 52 Batch 120/162] avg loss 0.00176919, throughput 3.96735K wps
[Epoch 52 Batch 150/162] avg loss 0.00174608, throughput 3.9744K wps
Begin Testing...
[Epoch 52] train avg loss 0.00172601, dev acc 0.9278, dev avg loss 0.19631, throughput 3.98417K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.0014246, throughput 4.07507K wps
[Epoch 53 Batch 60/162] avg loss 0.0016873, throughput 3.97109K wps
[Epoch 53 Batch 90/162] avg loss 0.00186413, throughput 3.96585K wps
[Epoch 53 Batch 120/162] avg loss 0.00162794, throughput 3.96382K wps
[Epoch 53 Batch 150/162] avg loss 0.00175757, throughput 3.96964K wps
Begin Testing...
[Epoch 53] train avg loss 0.00168468, dev acc 0.9278, dev avg loss 0.195975, throughput 3.98788K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00185997, throughput 4.05072K wps
[Epoch 54 Batch 60/162] avg loss 0.00161953, throughput 3.97703K wps
[Epoch 54 Batch 90/162] avg loss 0.00158543, throughput 3.97818K wps
[Epoch 54 Batch 120/162] avg loss 0.00177497, throughput 3.9585K wps
[Epoch 54 Batch 150/162] avg loss 0.0014762, throughput 3.97309K wps
Begin Testing...
[Epoch 54] train avg loss 0.00166901, dev acc 0.9256, dev avg loss 0.198069, throughput 3.98588K wps
[Epoch 55 Batch 30/162] avg loss 0.00156841, throughput 4.0638K wps
[Epoch 55 Batch 60/162] avg loss 0.00175288, throughput 3.96347K wps
[Epoch 55 Batch 90/162] avg loss 0.00151782, throughput 3.96307K wps
[Epoch 55 Batch 120/162] avg loss 0.00161308, throughput 3.9685K wps
[Epoch 55 Batch 150/162] avg loss 0.00154536, throughput 3.96478K wps
Begin Testing...
[Epoch 55] train avg loss 0.00160752, dev acc 0.9289, dev avg loss 0.195289, throughput 3.98374K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/162] avg loss 0.0015456, throughput 4.05932K wps
[Epoch 56 Batch 60/162] avg loss 0.00165165, throughput 3.97605K wps
[Epoch 56 Batch 90/162] avg loss 0.00148327, throughput 3.97375K wps
[Epoch 56 Batch 120/162] avg loss 0.001701, throughput 3.97477K wps
[Epoch 56 Batch 150/162] avg loss 0.00157101, throughput 3.97692K wps
Begin Testing...
[Epoch 56] train avg loss 0.00158956, dev acc 0.9244, dev avg loss 0.195662, throughput 3.98835K wps
[Epoch 57 Batch 30/162] avg loss 0.00155194, throughput 4.05832K wps
[Epoch 57 Batch 60/162] avg loss 0.00152014, throughput 3.97732K wps
[Epoch 57 Batch 90/162] avg loss 0.00138064, throughput 3.96855K wps
[Epoch 57 Batch 120/162] avg loss 0.0015946, throughput 3.97967K wps
[Epoch 57 Batch 150/162] avg loss 0.00160899, throughput 3.96492K wps
Begin Testing...
[Epoch 57] train avg loss 0.00151164, dev acc 0.9289, dev avg loss 0.195358, throughput 3.98852K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/162] avg loss 0.00141487, throughput 4.06993K wps
[Epoch 58 Batch 60/162] avg loss 0.00146411, throughput 3.96066K wps
[Epoch 58 Batch 90/162] avg loss 0.00161622, throughput 3.97797K wps
[Epoch 58 Batch 120/162] avg loss 0.00149598, throughput 3.96824K wps
[Epoch 58 Batch 150/162] avg loss 0.00136371, throughput 3.98042K wps
Begin Testing...
[Epoch 58] train avg loss 0.00147319, dev acc 0.9278, dev avg loss 0.19505, throughput 3.98973K wps
[Epoch 59 Batch 30/162] avg loss 0.00143764, throughput 4.04843K wps
[Epoch 59 Batch 60/162] avg loss 0.00131216, throughput 3.96691K wps
[Epoch 59 Batch 90/162] avg loss 0.00137986, throughput 3.96925K wps
[Epoch 59 Batch 120/162] avg loss 0.00154006, throughput 3.96975K wps
[Epoch 59 Batch 150/162] avg loss 0.00154355, throughput 3.96916K wps
Begin Testing...
[Epoch 59] train avg loss 0.0014425, dev acc 0.9278, dev avg loss 0.194601, throughput 3.98381K wps
[Epoch 60 Batch 30/162] avg loss 0.00144591, throughput 4.06119K wps
[Epoch 60 Batch 60/162] avg loss 0.00131681, throughput 3.95138K wps
[Epoch 60 Batch 90/162] avg loss 0.00140052, throughput 3.97299K wps
[Epoch 60 Batch 120/162] avg loss 0.00126221, throughput 3.97533K wps
[Epoch 60 Batch 150/162] avg loss 0.00150643, throughput 3.94771K wps
Begin Testing...
[Epoch 60] train avg loss 0.00138458, dev acc 0.9322, dev avg loss 0.194669, throughput 3.98088K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/162] avg loss 0.00135871, throughput 4.07136K wps
[Epoch 61 Batch 60/162] avg loss 0.0013817, throughput 3.97352K wps
[Epoch 61 Batch 90/162] avg loss 0.00135154, throughput 3.9647K wps
[Epoch 61 Batch 120/162] avg loss 0.00141352, throughput 3.97216K wps
[Epoch 61 Batch 150/162] avg loss 0.00139433, throughput 3.97709K wps
Begin Testing...
[Epoch 61] train avg loss 0.00137068, dev acc 0.9278, dev avg loss 0.195256, throughput 3.98851K wps
[Epoch 62 Batch 30/162] avg loss 0.00144547, throughput 4.07158K wps
[Epoch 62 Batch 60/162] avg loss 0.00122426, throughput 3.98108K wps
[Epoch 62 Batch 90/162] avg loss 0.00134065, throughput 3.95767K wps
[Epoch 62 Batch 120/162] avg loss 0.00109397, throughput 3.97327K wps
[Epoch 62 Batch 150/162] avg loss 0.00135528, throughput 3.97329K wps
Begin Testing...
[Epoch 62] train avg loss 0.00128843, dev acc 0.9289, dev avg loss 0.194414, throughput 3.99005K wps
[Epoch 63 Batch 30/162] avg loss 0.00125202, throughput 4.07352K wps
[Epoch 63 Batch 60/162] avg loss 0.00139274, throughput 3.96558K wps
[Epoch 63 Batch 90/162] avg loss 0.00118731, throughput 3.97429K wps
[Epoch 63 Batch 120/162] avg loss 0.00133777, throughput 3.96204K wps
[Epoch 63 Batch 150/162] avg loss 0.00122035, throughput 3.97751K wps
Begin Testing...
[Epoch 63] train avg loss 0.00128744, dev acc 0.9289, dev avg loss 0.194443, throughput 3.98927K wps
[Epoch 64 Batch 30/162] avg loss 0.00125612, throughput 4.06915K wps
[Epoch 64 Batch 60/162] avg loss 0.00129519, throughput 3.96186K wps
[Epoch 64 Batch 90/162] avg loss 0.00126974, throughput 3.96114K wps
[Epoch 64 Batch 120/162] avg loss 0.00117144, throughput 3.96289K wps
[Epoch 64 Batch 150/162] avg loss 0.00115197, throughput 3.95437K wps
Begin Testing...
[Epoch 64] train avg loss 0.00124166, dev acc 0.9300, dev avg loss 0.194494, throughput 3.97879K wps
[Epoch 65 Batch 30/162] avg loss 0.00115705, throughput 4.05168K wps
[Epoch 65 Batch 60/162] avg loss 0.0013099, throughput 3.95616K wps
[Epoch 65 Batch 90/162] avg loss 0.00119286, throughput 3.96791K wps
[Epoch 65 Batch 120/162] avg loss 0.00125094, throughput 3.97872K wps
[Epoch 65 Batch 150/162] avg loss 0.00112935, throughput 3.95065K wps
Begin Testing...
[Epoch 65] train avg loss 0.00122119, dev acc 0.9300, dev avg loss 0.194053, throughput 3.97957K wps
[Epoch 66 Batch 30/162] avg loss 0.000993081, throughput 4.07261K wps
[Epoch 66 Batch 60/162] avg loss 0.00113797, throughput 3.97289K wps
[Epoch 66 Batch 90/162] avg loss 0.00114939, throughput 3.97254K wps
[Epoch 66 Batch 120/162] avg loss 0.00124553, throughput 3.96109K wps
[Epoch 66 Batch 150/162] avg loss 0.00114982, throughput 3.97429K wps
Begin Testing...
[Epoch 66] train avg loss 0.00114571, dev acc 0.9300, dev avg loss 0.194417, throughput 3.9889K wps
[Epoch 67 Batch 30/162] avg loss 0.00112671, throughput 4.05188K wps
[Epoch 67 Batch 60/162] avg loss 0.00101472, throughput 3.97913K wps
[Epoch 67 Batch 90/162] avg loss 0.00119761, throughput 3.97477K wps
[Epoch 67 Batch 120/162] avg loss 0.00111163, throughput 3.96518K wps
[Epoch 67 Batch 150/162] avg loss 0.00113358, throughput 3.97508K wps
Begin Testing...
[Epoch 67] train avg loss 0.00112229, dev acc 0.9278, dev avg loss 0.194713, throughput 3.98759K wps
[Epoch 68 Batch 30/162] avg loss 0.00102696, throughput 4.0552K wps
[Epoch 68 Batch 60/162] avg loss 0.00122908, throughput 3.96967K wps
[Epoch 68 Batch 90/162] avg loss 0.00112258, throughput 3.95639K wps
[Epoch 68 Batch 120/162] avg loss 0.00111094, throughput 3.95983K wps
[Epoch 68 Batch 150/162] avg loss 0.00115894, throughput 3.97335K wps
Begin Testing...
[Epoch 68] train avg loss 0.00113618, dev acc 0.9300, dev avg loss 0.194665, throughput 3.98128K wps
[Epoch 69 Batch 30/162] avg loss 0.00108668, throughput 4.05415K wps
[Epoch 69 Batch 60/162] avg loss 0.00113899, throughput 3.96978K wps
[Epoch 69 Batch 90/162] avg loss 0.000975535, throughput 3.96897K wps
[Epoch 69 Batch 120/162] avg loss 0.00116673, throughput 3.97084K wps
[Epoch 69 Batch 150/162] avg loss 0.00106264, throughput 3.95636K wps
Begin Testing...
[Epoch 69] train avg loss 0.00108383, dev acc 0.9267, dev avg loss 0.195172, throughput 3.98314K wps
[Epoch 70 Batch 30/162] avg loss 0.00107958, throughput 4.07587K wps
[Epoch 70 Batch 60/162] avg loss 0.00101452, throughput 3.96644K wps
[Epoch 70 Batch 90/162] avg loss 0.00114226, throughput 3.96997K wps
[Epoch 70 Batch 120/162] avg loss 0.000976259, throughput 3.9734K wps
[Epoch 70 Batch 150/162] avg loss 0.00101817, throughput 3.965K wps
Begin Testing...
[Epoch 70] train avg loss 0.00104014, dev acc 0.9300, dev avg loss 0.194751, throughput 3.98821K wps
[Epoch 71 Batch 30/162] avg loss 0.0010623, throughput 4.05667K wps
[Epoch 71 Batch 60/162] avg loss 0.000894561, throughput 3.9677K wps
[Epoch 71 Batch 90/162] avg loss 0.00106219, throughput 3.97756K wps
[Epoch 71 Batch 120/162] avg loss 0.000971939, throughput 3.97194K wps
[Epoch 71 Batch 150/162] avg loss 0.00105928, throughput 3.97493K wps
Begin Testing...
[Epoch 71] train avg loss 0.00102196, dev acc 0.9289, dev avg loss 0.195239, throughput 3.98789K wps
[Epoch 72 Batch 30/162] avg loss 0.00114319, throughput 4.06139K wps
[Epoch 72 Batch 60/162] avg loss 0.000972605, throughput 3.97917K wps
[Epoch 72 Batch 90/162] avg loss 0.000899914, throughput 3.97315K wps
[Epoch 72 Batch 120/162] avg loss 0.000911599, throughput 3.95764K wps
[Epoch 72 Batch 150/162] avg loss 0.00103065, throughput 3.95696K wps
Begin Testing...
[Epoch 72] train avg loss 0.000993318, dev acc 0.9311, dev avg loss 0.195276, throughput 3.98424K wps
[Epoch 73 Batch 30/162] avg loss 0.0010247, throughput 4.05742K wps
[Epoch 73 Batch 60/162] avg loss 0.000950399, throughput 3.96495K wps
[Epoch 73 Batch 90/162] avg loss 0.000927606, throughput 3.97392K wps
[Epoch 73 Batch 120/162] avg loss 0.00103877, throughput 3.96823K wps
[Epoch 73 Batch 150/162] avg loss 0.000930623, throughput 3.96454K wps
Begin Testing...
[Epoch 73] train avg loss 0.000975581, dev acc 0.9256, dev avg loss 0.196049, throughput 3.98471K wps
[Epoch 74 Batch 30/162] avg loss 0.000923414, throughput 4.05479K wps
[Epoch 74 Batch 60/162] avg loss 0.000845659, throughput 3.95109K wps
[Epoch 74 Batch 90/162] avg loss 0.000892244, throughput 3.96551K wps
[Epoch 74 Batch 120/162] avg loss 0.00100333, throughput 3.9768K wps
[Epoch 74 Batch 150/162] avg loss 0.000879989, throughput 3.97748K wps
Begin Testing...
[Epoch 74] train avg loss 0.000914449, dev acc 0.9322, dev avg loss 0.195194, throughput 3.98432K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/162] avg loss 0.00088972, throughput 4.06668K wps
[Epoch 75 Batch 60/162] avg loss 0.000879364, throughput 3.97895K wps
[Epoch 75 Batch 90/162] avg loss 0.000925392, throughput 3.9523K wps
[Epoch 75 Batch 120/162] avg loss 0.000940388, throughput 3.97022K wps
[Epoch 75 Batch 150/162] avg loss 0.000939531, throughput 3.96028K wps
Begin Testing...
[Epoch 75] train avg loss 0.000914795, dev acc 0.9311, dev avg loss 0.195021, throughput 3.98332K wps
[Epoch 76 Batch 30/162] avg loss 0.00107576, throughput 4.06566K wps
[Epoch 76 Batch 60/162] avg loss 0.000896801, throughput 3.97596K wps
[Epoch 76 Batch 90/162] avg loss 0.000762681, throughput 3.97951K wps
[Epoch 76 Batch 120/162] avg loss 0.00094164, throughput 3.97205K wps
[Epoch 76 Batch 150/162] avg loss 0.000791399, throughput 3.968K wps
Begin Testing...
[Epoch 76] train avg loss 0.000899698, dev acc 0.9311, dev avg loss 0.195815, throughput 3.99016K wps
[Epoch 77 Batch 30/162] avg loss 0.000862532, throughput 4.05362K wps
[Epoch 77 Batch 60/162] avg loss 0.000852622, throughput 3.97252K wps
[Epoch 77 Batch 90/162] avg loss 0.000876854, throughput 3.96998K wps
[Epoch 77 Batch 120/162] avg loss 0.00088819, throughput 3.96324K wps
[Epoch 77 Batch 150/162] avg loss 0.000962029, throughput 3.97027K wps
Begin Testing...
[Epoch 77] train avg loss 0.00089055, dev acc 0.9322, dev avg loss 0.195659, throughput 3.98341K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/162] avg loss 0.000801808, throughput 4.07123K wps
[Epoch 78 Batch 60/162] avg loss 0.000732726, throughput 3.97181K wps
[Epoch 78 Batch 90/162] avg loss 0.000928814, throughput 3.96281K wps
[Epoch 78 Batch 120/162] avg loss 0.000896114, throughput 3.96693K wps
[Epoch 78 Batch 150/162] avg loss 0.000850755, throughput 3.96873K wps
Begin Testing...
[Epoch 78] train avg loss 0.000834256, dev acc 0.9311, dev avg loss 0.196083, throughput 3.98705K wps
[Epoch 79 Batch 30/162] avg loss 0.000895604, throughput 4.06358K wps
[Epoch 79 Batch 60/162] avg loss 0.000883411, throughput 3.94125K wps
[Epoch 79 Batch 90/162] avg loss 0.000784099, throughput 3.93421K wps
[Epoch 79 Batch 120/162] avg loss 0.000913711, throughput 3.94379K wps
[Epoch 79 Batch 150/162] avg loss 0.000763101, throughput 3.95122K wps
Begin Testing...
[Epoch 79] train avg loss 0.000840198, dev acc 0.9322, dev avg loss 0.196294, throughput 3.96383K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/162] avg loss 0.000821571, throughput 4.0527K wps
[Epoch 80 Batch 60/162] avg loss 0.00086122, throughput 3.94882K wps
[Epoch 80 Batch 90/162] avg loss 0.000820334, throughput 3.96539K wps
[Epoch 80 Batch 120/162] avg loss 0.000772869, throughput 3.97222K wps
[Epoch 80 Batch 150/162] avg loss 0.000772733, throughput 3.97969K wps
Begin Testing...
[Epoch 80] train avg loss 0.000803797, dev acc 0.9289, dev avg loss 0.197227, throughput 3.98235K wps
[Epoch 81 Batch 30/162] avg loss 0.000828161, throughput 4.0622K wps
[Epoch 81 Batch 60/162] avg loss 0.000835509, throughput 3.96861K wps
[Epoch 81 Batch 90/162] avg loss 0.000792999, throughput 3.97439K wps
[Epoch 81 Batch 120/162] avg loss 0.00082976, throughput 3.97924K wps
[Epoch 81 Batch 150/162] avg loss 0.000774974, throughput 3.97599K wps
Begin Testing...
[Epoch 81] train avg loss 0.000812196, dev acc 0.9322, dev avg loss 0.196908, throughput 3.98923K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/162] avg loss 0.000750078, throughput 4.05329K wps
[Epoch 82 Batch 60/162] avg loss 0.000829557, throughput 3.97196K wps
[Epoch 82 Batch 90/162] avg loss 0.000768323, throughput 3.97706K wps
[Epoch 82 Batch 120/162] avg loss 0.000771428, throughput 3.96075K wps
[Epoch 82 Batch 150/162] avg loss 0.000724492, throughput 3.97526K wps
Begin Testing...
[Epoch 82] train avg loss 0.000761585, dev acc 0.9311, dev avg loss 0.197222, throughput 3.98615K wps
[Epoch 83 Batch 30/162] avg loss 0.00082209, throughput 4.05915K wps
[Epoch 83 Batch 60/162] avg loss 0.000758666, throughput 3.94741K wps
[Epoch 83 Batch 90/162] avg loss 0.000787666, throughput 3.97259K wps
[Epoch 83 Batch 120/162] avg loss 0.000730884, throughput 3.97057K wps
[Epoch 83 Batch 150/162] avg loss 0.000687382, throughput 3.95537K wps
Begin Testing...
[Epoch 83] train avg loss 0.000762458, dev acc 0.9344, dev avg loss 0.198298, throughput 3.98078K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/162] avg loss 0.000802707, throughput 4.0742K wps
[Epoch 84 Batch 60/162] avg loss 0.000728098, throughput 3.95489K wps
[Epoch 84 Batch 90/162] avg loss 0.00075313, throughput 3.95523K wps
[Epoch 84 Batch 120/162] avg loss 0.000815124, throughput 3.96974K wps
[Epoch 84 Batch 150/162] avg loss 0.000675278, throughput 3.98185K wps
Begin Testing...
[Epoch 84] train avg loss 0.000758907, dev acc 0.9311, dev avg loss 0.197306, throughput 3.98512K wps
[Epoch 85 Batch 30/162] avg loss 0.000746556, throughput 4.06314K wps
[Epoch 85 Batch 60/162] avg loss 0.00070243, throughput 3.96552K wps
[Epoch 85 Batch 90/162] avg loss 0.000657847, throughput 3.95226K wps
[Epoch 85 Batch 120/162] avg loss 0.000696918, throughput 3.96639K wps
[Epoch 85 Batch 150/162] avg loss 0.000769058, throughput 3.96621K wps
Begin Testing...
[Epoch 85] train avg loss 0.000714256, dev acc 0.9344, dev avg loss 0.198347, throughput 3.98023K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/162] avg loss 0.000822011, throughput 4.05577K wps
[Epoch 86 Batch 60/162] avg loss 0.000705529, throughput 3.97529K wps
[Epoch 86 Batch 90/162] avg loss 0.000650549, throughput 3.98052K wps
[Epoch 86 Batch 120/162] avg loss 0.000758922, throughput 3.97356K wps
[Epoch 86 Batch 150/162] avg loss 0.000689104, throughput 3.97772K wps
Begin Testing...
[Epoch 86] train avg loss 0.000719348, dev acc 0.9289, dev avg loss 0.198435, throughput 3.99039K wps
[Epoch 87 Batch 30/162] avg loss 0.000714539, throughput 4.06239K wps
[Epoch 87 Batch 60/162] avg loss 0.000712877, throughput 3.95751K wps
[Epoch 87 Batch 90/162] avg loss 0.000736654, throughput 3.9727K wps
[Epoch 87 Batch 120/162] avg loss 0.000599053, throughput 3.95207K wps
[Epoch 87 Batch 150/162] avg loss 0.000657248, throughput 3.95663K wps
Begin Testing...
[Epoch 87] train avg loss 0.000678551, dev acc 0.9322, dev avg loss 0.199065, throughput 3.97927K wps
[Epoch 88 Batch 30/162] avg loss 0.000692631, throughput 4.06034K wps
[Epoch 88 Batch 60/162] avg loss 0.000695741, throughput 3.97433K wps
[Epoch 88 Batch 90/162] avg loss 0.00060926, throughput 3.97142K wps
[Epoch 88 Batch 120/162] avg loss 0.000675683, throughput 3.95542K wps
[Epoch 88 Batch 150/162] avg loss 0.000714039, throughput 3.96271K wps
Begin Testing...
[Epoch 88] train avg loss 0.000675253, dev acc 0.9322, dev avg loss 0.199251, throughput 3.98278K wps
[Epoch 89 Batch 30/162] avg loss 0.000632914, throughput 4.06815K wps
[Epoch 89 Batch 60/162] avg loss 0.000608961, throughput 3.96436K wps
[Epoch 89 Batch 90/162] avg loss 0.000714926, throughput 3.96825K wps
[Epoch 89 Batch 120/162] avg loss 0.000655593, throughput 3.97924K wps
[Epoch 89 Batch 150/162] avg loss 0.00069089, throughput 3.9752K wps
Begin Testing...
[Epoch 89] train avg loss 0.000653005, dev acc 0.9311, dev avg loss 0.200192, throughput 3.98872K wps
[Epoch 90 Batch 30/162] avg loss 0.000649872, throughput 4.07609K wps
[Epoch 90 Batch 60/162] avg loss 0.000597467, throughput 3.97482K wps
[Epoch 90 Batch 90/162] avg loss 0.000641274, throughput 3.96771K wps
[Epoch 90 Batch 120/162] avg loss 0.000640755, throughput 3.97259K wps
[Epoch 90 Batch 150/162] avg loss 0.000641544, throughput 3.97663K wps
Begin Testing...
[Epoch 90] train avg loss 0.000632927, dev acc 0.9311, dev avg loss 0.199714, throughput 3.99228K wps
[Epoch 91 Batch 30/162] avg loss 0.000687435, throughput 4.06839K wps
[Epoch 91 Batch 60/162] avg loss 0.000575473, throughput 3.96425K wps
[Epoch 91 Batch 90/162] avg loss 0.000687051, throughput 3.97115K wps
[Epoch 91 Batch 120/162] avg loss 0.000564251, throughput 3.96532K wps
[Epoch 91 Batch 150/162] avg loss 0.000648289, throughput 3.97641K wps
Begin Testing...
[Epoch 91] train avg loss 0.000628273, dev acc 0.9300, dev avg loss 0.200585, throughput 3.98641K wps
[Epoch 92 Batch 30/162] avg loss 0.000640905, throughput 4.05682K wps
[Epoch 92 Batch 60/162] avg loss 0.000640836, throughput 3.95214K wps
[Epoch 92 Batch 90/162] avg loss 0.000587659, throughput 3.96408K wps
[Epoch 92 Batch 120/162] avg loss 0.000590504, throughput 3.9721K wps
[Epoch 92 Batch 150/162] avg loss 0.000569606, throughput 3.95868K wps
Begin Testing...
[Epoch 92] train avg loss 0.000608489, dev acc 0.9300, dev avg loss 0.200633, throughput 3.97887K wps
[Epoch 93 Batch 30/162] avg loss 0.000609061, throughput 4.07484K wps
[Epoch 93 Batch 60/162] avg loss 0.000612655, throughput 3.9666K wps
[Epoch 93 Batch 90/162] avg loss 0.000626828, throughput 3.97382K wps
[Epoch 93 Batch 120/162] avg loss 0.000625376, throughput 3.97992K wps
[Epoch 93 Batch 150/162] avg loss 0.000564252, throughput 3.98157K wps
Begin Testing...
[Epoch 93] train avg loss 0.000603553, dev acc 0.9322, dev avg loss 0.201545, throughput 3.99171K wps
[Epoch 94 Batch 30/162] avg loss 0.000623252, throughput 4.07788K wps
[Epoch 94 Batch 60/162] avg loss 0.000665101, throughput 3.98251K wps
[Epoch 94 Batch 90/162] avg loss 0.000567364, throughput 3.96759K wps
[Epoch 94 Batch 120/162] avg loss 0.000520082, throughput 3.95976K wps
[Epoch 94 Batch 150/162] avg loss 0.000641273, throughput 3.97622K wps
Begin Testing...
[Epoch 94] train avg loss 0.000590441, dev acc 0.9322, dev avg loss 0.201754, throughput 3.9894K wps
[Epoch 95 Batch 30/162] avg loss 0.000621504, throughput 4.07355K wps
[Epoch 95 Batch 60/162] avg loss 0.00051638, throughput 3.96852K wps
[Epoch 95 Batch 90/162] avg loss 0.000564625, throughput 3.96822K wps
[Epoch 95 Batch 120/162] avg loss 0.00057881, throughput 3.95215K wps
[Epoch 95 Batch 150/162] avg loss 0.000528066, throughput 3.96621K wps
Begin Testing...
[Epoch 95] train avg loss 0.000557837, dev acc 0.9344, dev avg loss 0.202052, throughput 3.98462K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/162] avg loss 0.00063236, throughput 4.053K wps
[Epoch 96 Batch 60/162] avg loss 0.000596782, throughput 3.96646K wps
[Epoch 96 Batch 90/162] avg loss 0.000545281, throughput 3.96648K wps
[Epoch 96 Batch 120/162] avg loss 0.000608436, throughput 3.96547K wps
[Epoch 96 Batch 150/162] avg loss 0.000539667, throughput 3.97409K wps
Begin Testing...
[Epoch 96] train avg loss 0.000584865, dev acc 0.9289, dev avg loss 0.202769, throughput 3.98367K wps
[Epoch 97 Batch 30/162] avg loss 0.00056157, throughput 4.04401K wps
[Epoch 97 Batch 60/162] avg loss 0.000535047, throughput 3.96224K wps
[Epoch 97 Batch 90/162] avg loss 0.000491336, throughput 3.95878K wps
[Epoch 97 Batch 120/162] avg loss 0.000533752, throughput 3.96079K wps
[Epoch 97 Batch 150/162] avg loss 0.000533236, throughput 3.96735K wps
Begin Testing...
[Epoch 97] train avg loss 0.000530824, dev acc 0.9356, dev avg loss 0.203686, throughput 3.9777K wps
Observed Improvement.
Begin Testing...
[Epoch 98 Batch 30/162] avg loss 0.000594737, throughput 4.06367K wps
[Epoch 98 Batch 60/162] avg loss 0.000516852, throughput 3.97653K wps
[Epoch 98 Batch 90/162] avg loss 0.000577442, throughput 3.97569K wps
[Epoch 98 Batch 120/162] avg loss 0.000591976, throughput 3.98159K wps
[Epoch 98 Batch 150/162] avg loss 0.000509601, throughput 3.97592K wps
Begin Testing...
[Epoch 98] train avg loss 0.000561676, dev acc 0.9356, dev avg loss 0.203676, throughput 3.99337K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/162] avg loss 0.000541768, throughput 4.06548K wps
[Epoch 99 Batch 60/162] avg loss 0.000573792, throughput 3.96887K wps
[Epoch 99 Batch 90/162] avg loss 0.000443161, throughput 3.96291K wps
[Epoch 99 Batch 120/162] avg loss 0.000533739, throughput 3.97525K wps
[Epoch 99 Batch 150/162] avg loss 0.000490522, throughput 3.97055K wps
Begin Testing...
[Epoch 99] train avg loss 0.000516057, dev acc 0.9311, dev avg loss 0.203343, throughput 3.98546K wps
[Epoch 100 Batch 30/162] avg loss 0.000468421, throughput 4.06524K wps
[Epoch 100 Batch 60/162] avg loss 0.000504126, throughput 3.9565K wps
[Epoch 100 Batch 90/162] avg loss 0.000513309, throughput 3.96698K wps
[Epoch 100 Batch 120/162] avg loss 0.000536252, throughput 3.956K wps
[Epoch 100 Batch 150/162] avg loss 0.000530497, throughput 3.97458K wps
Begin Testing...
[Epoch 100] train avg loss 0.000518236, dev acc 0.9322, dev avg loss 0.203837, throughput 3.98307K wps
[Epoch 101 Batch 30/162] avg loss 0.000548444, throughput 4.06195K wps
[Epoch 101 Batch 60/162] avg loss 0.000523962, throughput 3.9731K wps
[Epoch 101 Batch 90/162] avg loss 0.000488764, throughput 3.96787K wps
[Epoch 101 Batch 120/162] avg loss 0.000506302, throughput 3.96266K wps
[Epoch 101 Batch 150/162] avg loss 0.000517755, throughput 3.97834K wps
Begin Testing...
[Epoch 101] train avg loss 0.000515082, dev acc 0.9311, dev avg loss 0.203801, throughput 3.98746K wps
[Epoch 102 Batch 30/162] avg loss 0.000502274, throughput 4.07013K wps
[Epoch 102 Batch 60/162] avg loss 0.000543446, throughput 3.98142K wps
[Epoch 102 Batch 90/162] avg loss 0.000454773, throughput 3.96454K wps
[Epoch 102 Batch 120/162] avg loss 0.000498364, throughput 3.96355K wps
[Epoch 102 Batch 150/162] avg loss 0.00045692, throughput 3.97795K wps
Begin Testing...
[Epoch 102] train avg loss 0.000496298, dev acc 0.9333, dev avg loss 0.204446, throughput 3.98871K wps
[Epoch 103 Batch 30/162] avg loss 0.000470346, throughput 4.07247K wps
[Epoch 103 Batch 60/162] avg loss 0.000449099, throughput 3.97413K wps
[Epoch 103 Batch 90/162] avg loss 0.000466097, throughput 3.97718K wps
[Epoch 103 Batch 120/162] avg loss 0.000447582, throughput 3.9739K wps
[Epoch 103 Batch 150/162] avg loss 0.00053753, throughput 3.98123K wps
Begin Testing...
[Epoch 103] train avg loss 0.000474728, dev acc 0.9344, dev avg loss 0.204703, throughput 3.99394K wps
[Epoch 104 Batch 30/162] avg loss 0.00046895, throughput 4.04474K wps
[Epoch 104 Batch 60/162] avg loss 0.000479916, throughput 3.97357K wps
[Epoch 104 Batch 90/162] avg loss 0.00040712, throughput 3.97057K wps
[Epoch 104 Batch 120/162] avg loss 0.000453326, throughput 3.96659K wps
[Epoch 104 Batch 150/162] avg loss 0.000491085, throughput 3.97982K wps
Begin Testing...
[Epoch 104] train avg loss 0.000459678, dev acc 0.9344, dev avg loss 0.204732, throughput 3.9862K wps
[Epoch 105 Batch 30/162] avg loss 0.000445021, throughput 4.07239K wps
[Epoch 105 Batch 60/162] avg loss 0.000475224, throughput 3.97651K wps
[Epoch 105 Batch 90/162] avg loss 0.00043264, throughput 3.97582K wps
[Epoch 105 Batch 120/162] avg loss 0.000436481, throughput 3.97342K wps
[Epoch 105 Batch 150/162] avg loss 0.000471009, throughput 3.96066K wps
Begin Testing...
[Epoch 105] train avg loss 0.00044998, dev acc 0.9322, dev avg loss 0.205772, throughput 3.99079K wps
[Epoch 106 Batch 30/162] avg loss 0.000504877, throughput 4.04745K wps
[Epoch 106 Batch 60/162] avg loss 0.000437597, throughput 3.95618K wps
[Epoch 106 Batch 90/162] avg loss 0.000435634, throughput 3.9734K wps
[Epoch 106 Batch 120/162] avg loss 0.000476551, throughput 3.96501K wps
[Epoch 106 Batch 150/162] avg loss 0.000426683, throughput 3.97906K wps
Begin Testing...
[Epoch 106] train avg loss 0.00045114, dev acc 0.9322, dev avg loss 0.205817, throughput 3.98219K wps
[Epoch 107 Batch 30/162] avg loss 0.000493102, throughput 4.06996K wps
[Epoch 107 Batch 60/162] avg loss 0.000480903, throughput 3.98179K wps
[Epoch 107 Batch 90/162] avg loss 0.000501863, throughput 3.96008K wps
[Epoch 107 Batch 120/162] avg loss 0.000454089, throughput 3.97148K wps
[Epoch 107 Batch 150/162] avg loss 0.000445808, throughput 3.97555K wps
Begin Testing...
[Epoch 107] train avg loss 0.000476131, dev acc 0.9322, dev avg loss 0.206144, throughput 3.98896K wps
[Epoch 108 Batch 30/162] avg loss 0.000389012, throughput 4.05777K wps
[Epoch 108 Batch 60/162] avg loss 0.000398145, throughput 3.97589K wps
[Epoch 108 Batch 90/162] avg loss 0.000393355, throughput 3.96231K wps
[Epoch 108 Batch 120/162] avg loss 0.00043725, throughput 3.95805K wps
[Epoch 108 Batch 150/162] avg loss 0.000457183, throughput 3.96889K wps
Begin Testing...
[Epoch 108] train avg loss 0.000416409, dev acc 0.9356, dev avg loss 0.20671, throughput 3.98292K wps
Observed Improvement.
Begin Testing...
[Epoch 109 Batch 30/162] avg loss 0.000484516, throughput 4.0416K wps
[Epoch 109 Batch 60/162] avg loss 0.000465337, throughput 3.97687K wps
[Epoch 109 Batch 90/162] avg loss 0.000432818, throughput 3.96741K wps
[Epoch 109 Batch 120/162] avg loss 0.000386052, throughput 3.96697K wps
[Epoch 109 Batch 150/162] avg loss 0.000420271, throughput 3.9762K wps
Begin Testing...
[Epoch 109] train avg loss 0.000437884, dev acc 0.9356, dev avg loss 0.207875, throughput 3.9849K wps
Observed Improvem