Subj_static.log
Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 120
Done! Tokenizing Time=0.24s, #Sentences=10000
SentimentNet(
  (embedding): Embedding(21326 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/162] avg loss 0.0139927, throughput 0.563051K wps
[Epoch 0 Batch 60/162] avg loss 0.0138302, throughput 9.52509K wps
[Epoch 0 Batch 90/162] avg loss 0.0137475, throughput 9.45732K wps
[Epoch 0 Batch 120/162] avg loss 0.0136195, throughput 9.51905K wps
[Epoch 0 Batch 150/162] avg loss 0.0135821, throughput 9.46978K wps
Begin Testing...
[Epoch 0] train avg loss 0.0137352, dev acc 0.7733, dev avg loss 0.662958, throughput 2.41023K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0133297, throughput 9.67788K wps
[Epoch 1 Batch 60/162] avg loss 0.0131657, throughput 9.37649K wps
[Epoch 1 Batch 90/162] avg loss 0.0131255, throughput 9.49583K wps
[Epoch 1 Batch 120/162] avg loss 0.0129709, throughput 9.35736K wps
[Epoch 1 Batch 150/162] avg loss 0.0129759, throughput 9.46346K wps
Begin Testing...
[Epoch 1] train avg loss 0.013072, dev acc 0.6622, dev avg loss 0.63981, throughput 9.48049K wps
[Epoch 2 Batch 30/162] avg loss 0.0128046, throughput 9.62158K wps
[Epoch 2 Batch 60/162] avg loss 0.0125604, throughput 9.39331K wps
[Epoch 2 Batch 90/162] avg loss 0.0124079, throughput 9.55848K wps
[Epoch 2 Batch 120/162] avg loss 0.0122335, throughput 9.54988K wps
[Epoch 2 Batch 150/162] avg loss 0.0122206, throughput 9.68006K wps
Begin Testing...
[Epoch 2] train avg loss 0.0124286, dev acc 0.8478, dev avg loss 0.600085, throughput 9.55656K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0119629, throughput 9.74152K wps
[Epoch 3 Batch 60/162] avg loss 0.0119284, throughput 9.48386K wps
[Epoch 3 Batch 90/162] avg loss 0.0117038, throughput 9.53708K wps
[Epoch 3 Batch 120/162] avg loss 0.0116698, throughput 9.333K wps
[Epoch 3 Batch 150/162] avg loss 0.0113905, throughput 9.67526K wps
Begin Testing...
[Epoch 3] train avg loss 0.011693, dev acc 0.8567, dev avg loss 0.564077, throughput 9.52981K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.0111505, throughput 9.79962K wps
[Epoch 4 Batch 60/162] avg loss 0.0109733, throughput 9.69988K wps
[Epoch 4 Batch 90/162] avg loss 0.0109122, throughput 9.54538K wps
[Epoch 4 Batch 120/162] avg loss 0.01089, throughput 9.43827K wps
[Epoch 4 Batch 150/162] avg loss 0.0107541, throughput 9.62589K wps
Begin Testing...
[Epoch 4] train avg loss 0.0108868, dev acc 0.8678, dev avg loss 0.522418, throughput 9.6274K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.0104188, throughput 9.71097K wps
[Epoch 5 Batch 60/162] avg loss 0.0102718, throughput 9.58992K wps
[Epoch 5 Batch 90/162] avg loss 0.00994886, throughput 9.49034K wps
[Epoch 5 Batch 120/162] avg loss 0.00980264, throughput 9.46891K wps
[Epoch 5 Batch 150/162] avg loss 0.00998378, throughput 9.70823K wps
Begin Testing...
[Epoch 5] train avg loss 0.0100641, dev acc 0.8722, dev avg loss 0.482372, throughput 9.60097K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00960078, throughput 9.66465K wps
[Epoch 6 Batch 60/162] avg loss 0.00951817, throughput 9.49316K wps
[Epoch 6 Batch 90/162] avg loss 0.00915172, throughput 9.61593K wps
[Epoch 6 Batch 120/162] avg loss 0.00909617, throughput 9.456K wps
[Epoch 6 Batch 150/162] avg loss 0.00920177, throughput 9.35415K wps
Begin Testing...
[Epoch 6] train avg loss 0.00926776, dev acc 0.8689, dev avg loss 0.447398, throughput 9.5075K wps
[Epoch 7 Batch 30/162] avg loss 0.00898816, throughput 9.71807K wps
[Epoch 7 Batch 60/162] avg loss 0.00859521, throughput 9.54446K wps
[Epoch 7 Batch 90/162] avg loss 0.00854635, throughput 9.38923K wps
[Epoch 7 Batch 120/162] avg loss 0.0082965, throughput 9.61768K wps
[Epoch 7 Batch 150/162] avg loss 0.00855405, throughput 9.64452K wps
Begin Testing...
[Epoch 7] train avg loss 0.00859653, dev acc 0.8722, dev avg loss 0.417487, throughput 9.58623K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00807569, throughput 9.63364K wps
[Epoch 8 Batch 60/162] avg loss 0.0081229, throughput 9.54556K wps
[Epoch 8 Batch 90/162] avg loss 0.00795082, throughput 9.47762K wps
[Epoch 8 Batch 120/162] avg loss 0.00809858, throughput 9.34018K wps
[Epoch 8 Batch 150/162] avg loss 0.00807176, throughput 9.64029K wps
Begin Testing...
[Epoch 8] train avg loss 0.00802045, dev acc 0.8700, dev avg loss 0.394164, throughput 9.51211K wps
[Epoch 9 Batch 30/162] avg loss 0.00765048, throughput 9.68748K wps
[Epoch 9 Batch 60/162] avg loss 0.00766964, throughput 9.60219K wps
[Epoch 9 Batch 90/162] avg loss 0.00778438, throughput 9.46143K wps
[Epoch 9 Batch 120/162] avg loss 0.00752026, throughput 9.51964K wps
[Epoch 9 Batch 150/162] avg loss 0.0073391, throughput 9.57476K wps
Begin Testing...
[Epoch 9] train avg loss 0.0075777, dev acc 0.8733, dev avg loss 0.376142, throughput 9.548K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00717406, throughput 9.68558K wps
[Epoch 10 Batch 60/162] avg loss 0.00721908, throughput 9.46155K wps
[Epoch 10 Batch 90/162] avg loss 0.00730357, throughput 9.58996K wps
[Epoch 10 Batch 120/162] avg loss 0.00733003, throughput 9.44458K wps
[Epoch 10 Batch 150/162] avg loss 0.00701399, throughput 9.40509K wps
Begin Testing...
[Epoch 10] train avg loss 0.00718916, dev acc 0.8756, dev avg loss 0.363214, throughput 9.51757K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00678275, throughput 9.54591K wps
[Epoch 11 Batch 60/162] avg loss 0.00700366, throughput 9.4893K wps
[Epoch 11 Batch 90/162] avg loss 0.00689843, throughput 9.4057K wps
[Epoch 11 Batch 120/162] avg loss 0.0070885, throughput 9.4621K wps
[Epoch 11 Batch 150/162] avg loss 0.0067608, throughput 9.6145K wps
Begin Testing...
[Epoch 11] train avg loss 0.00691345, dev acc 0.8778, dev avg loss 0.352661, throughput 9.48861K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00683382, throughput 9.78665K wps
[Epoch 12 Batch 60/162] avg loss 0.00662973, throughput 9.41849K wps
[Epoch 12 Batch 90/162] avg loss 0.0062095, throughput 9.52567K wps
[Epoch 12 Batch 120/162] avg loss 0.00622745, throughput 9.63293K wps
[Epoch 12 Batch 150/162] avg loss 0.00668349, throughput 9.51729K wps
Begin Testing...
[Epoch 12] train avg loss 0.00650781, dev acc 0.8811, dev avg loss 0.341088, throughput 9.57071K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.0066457, throughput 9.55024K wps
[Epoch 13 Batch 60/162] avg loss 0.0066435, throughput 9.52019K wps
[Epoch 13 Batch 90/162] avg loss 0.00622139, throughput 9.42255K wps
[Epoch 13 Batch 120/162] avg loss 0.00649197, throughput 9.44282K wps
[Epoch 13 Batch 150/162] avg loss 0.00611179, throughput 9.57489K wps
Begin Testing...
[Epoch 13] train avg loss 0.00644013, dev acc 0.8856, dev avg loss 0.333625, throughput 9.49058K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00632254, throughput 9.62778K wps
[Epoch 14 Batch 60/162] avg loss 0.00630582, throughput 9.36483K wps
[Epoch 14 Batch 90/162] avg loss 0.00641791, throughput 9.35305K wps
[Epoch 14 Batch 120/162] avg loss 0.00604265, throughput 9.41917K wps
[Epoch 14 Batch 150/162] avg loss 0.00612689, throughput 9.57848K wps
Begin Testing...
[Epoch 14] train avg loss 0.00624127, dev acc 0.8867, dev avg loss 0.326977, throughput 9.44655K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00631406, throughput 9.60864K wps
[Epoch 15 Batch 60/162] avg loss 0.00592309, throughput 9.57579K wps
[Epoch 15 Batch 90/162] avg loss 0.00617731, throughput 9.50213K wps
[Epoch 15 Batch 120/162] avg loss 0.00596338, throughput 9.37959K wps
[Epoch 15 Batch 150/162] avg loss 0.00599038, throughput 9.4467K wps
Begin Testing...
[Epoch 15] train avg loss 0.00605979, dev acc 0.8889, dev avg loss 0.320968, throughput 9.48432K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00569059, throughput 9.56589K wps
[Epoch 16 Batch 60/162] avg loss 0.00582947, throughput 9.40569K wps
[Epoch 16 Batch 90/162] avg loss 0.0062145, throughput 9.26223K wps
[Epoch 16 Batch 120/162] avg loss 0.00600952, throughput 9.43629K wps
[Epoch 16 Batch 150/162] avg loss 0.00606738, throughput 9.49933K wps
Begin Testing...
[Epoch 16] train avg loss 0.00593747, dev acc 0.8867, dev avg loss 0.315545, throughput 9.42478K wps
[Epoch 17 Batch 30/162] avg loss 0.00557341, throughput 9.58483K wps
[Epoch 17 Batch 60/162] avg loss 0.00592358, throughput 9.39695K wps
[Epoch 17 Batch 90/162] avg loss 0.00622066, throughput 9.32288K wps
[Epoch 17 Batch 120/162] avg loss 0.00554141, throughput 9.35937K wps
[Epoch 17 Batch 150/162] avg loss 0.00607266, throughput 9.64441K wps
Begin Testing...
[Epoch 17] train avg loss 0.00583486, dev acc 0.8889, dev avg loss 0.311482, throughput 9.46071K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00559593, throughput 9.73618K wps
[Epoch 18 Batch 60/162] avg loss 0.00565354, throughput 9.46467K wps
[Epoch 18 Batch 90/162] avg loss 0.00555657, throughput 9.26849K wps
[Epoch 18 Batch 120/162] avg loss 0.00615792, throughput 9.34718K wps
[Epoch 18 Batch 150/162] avg loss 0.00531747, throughput 9.45421K wps
Begin Testing...
[Epoch 18] train avg loss 0.0056676, dev acc 0.8889, dev avg loss 0.306577, throughput 9.44157K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00553821, throughput 9.64454K wps
[Epoch 19 Batch 60/162] avg loss 0.00530412, throughput 9.39186K wps
[Epoch 19 Batch 90/162] avg loss 0.00555156, throughput 9.39406K wps
[Epoch 19 Batch 120/162] avg loss 0.00581841, throughput 9.43238K wps
[Epoch 19 Batch 150/162] avg loss 0.00549919, throughput 9.34169K wps
Begin Testing...
[Epoch 19] train avg loss 0.00553728, dev acc 0.8911, dev avg loss 0.302825, throughput 9.43016K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00521793, throughput 9.78528K wps
[Epoch 20 Batch 60/162] avg loss 0.00523199, throughput 9.59378K wps
[Epoch 20 Batch 90/162] avg loss 0.00544018, throughput 9.55586K wps
[Epoch 20 Batch 120/162] avg loss 0.00572085, throughput 9.62187K wps
[Epoch 20 Batch 150/162] avg loss 0.00584005, throughput 9.55164K wps
Begin Testing...
[Epoch 20] train avg loss 0.00545546, dev acc 0.8922, dev avg loss 0.298785, throughput 9.61922K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00563956, throughput 9.55574K wps
[Epoch 21 Batch 60/162] avg loss 0.00519013, throughput 9.38313K wps
[Epoch 21 Batch 90/162] avg loss 0.0051913, throughput 9.47522K wps
[Epoch 21 Batch 120/162] avg loss 0.00510079, throughput 9.50785K wps
[Epoch 21 Batch 150/162] avg loss 0.00502946, throughput 9.27265K wps
Begin Testing...
[Epoch 21] train avg loss 0.00524858, dev acc 0.8956, dev avg loss 0.295013, throughput 9.43116K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.0052942, throughput 9.69388K wps
[Epoch 22 Batch 60/162] avg loss 0.00522257, throughput 9.48309K wps
[Epoch 22 Batch 90/162] avg loss 0.00504285, throughput 9.44366K wps
[Epoch 22 Batch 120/162] avg loss 0.00495531, throughput 9.39026K wps
[Epoch 22 Batch 150/162] avg loss 0.00539518, throughput 9.52251K wps
Begin Testing...
[Epoch 22] train avg loss 0.00516369, dev acc 0.8967, dev avg loss 0.291609, throughput 9.51514K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00514018, throughput 9.64495K wps
[Epoch 23 Batch 60/162] avg loss 0.00501121, throughput 9.4804K wps
[Epoch 23 Batch 90/162] avg loss 0.00491868, throughput 9.47652K wps
[Epoch 23 Batch 120/162] avg loss 0.00524275, throughput 9.41957K wps
[Epoch 23 Batch 150/162] avg loss 0.0052641, throughput 9.45964K wps
Begin Testing...
[Epoch 23] train avg loss 0.00508836, dev acc 0.8922, dev avg loss 0.289848, throughput 9.48341K wps
[Epoch 24 Batch 30/162] avg loss 0.00510846, throughput 9.47826K wps
[Epoch 24 Batch 60/162] avg loss 0.00495213, throughput 9.58132K wps
[Epoch 24 Batch 90/162] avg loss 0.00466404, throughput 9.34193K wps
[Epoch 24 Batch 120/162] avg loss 0.00530204, throughput 9.60358K wps
[Epoch 24 Batch 150/162] avg loss 0.0047967, throughput 9.47435K wps
Begin Testing...
[Epoch 24] train avg loss 0.00494935, dev acc 0.8956, dev avg loss 0.285739, throughput 9.50178K wps
[Epoch 25 Batch 30/162] avg loss 0.00484253, throughput 9.66595K wps
[Epoch 25 Batch 60/162] avg loss 0.00514943, throughput 9.43817K wps
[Epoch 25 Batch 90/162] avg loss 0.00455663, throughput 9.34473K wps
[Epoch 25 Batch 120/162] avg loss 0.00507725, throughput 9.44696K wps
[Epoch 25 Batch 150/162] avg loss 0.00469302, throughput 9.48272K wps
Begin Testing...
[Epoch 25] train avg loss 0.00487417, dev acc 0.8956, dev avg loss 0.283306, throughput 9.47791K wps
[Epoch 26 Batch 30/162] avg loss 0.00503558, throughput 9.49072K wps
[Epoch 26 Batch 60/162] avg loss 0.00440408, throughput 9.44584K wps
[Epoch 26 Batch 90/162] avg loss 0.00464679, throughput 9.52314K wps
[Epoch 26 Batch 120/162] avg loss 0.00515276, throughput 9.33639K wps
[Epoch 26 Batch 150/162] avg loss 0.00482135, throughput 9.45824K wps
Begin Testing...
[Epoch 26] train avg loss 0.00482186, dev acc 0.8978, dev avg loss 0.280569, throughput 9.45502K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.0045356, throughput 9.64995K wps
[Epoch 27 Batch 60/162] avg loss 0.00455278, throughput 9.36593K wps
[Epoch 27 Batch 90/162] avg loss 0.00482514, throughput 9.35764K wps
[Epoch 27 Batch 120/162] avg loss 0.00492322, throughput 9.25941K wps
[Epoch 27 Batch 150/162] avg loss 0.00494774, throughput 9.46653K wps
Begin Testing...
[Epoch 27] train avg loss 0.00475806, dev acc 0.8967, dev avg loss 0.2784, throughput 9.40425K wps
[Epoch 28 Batch 30/162] avg loss 0.00434146, throughput 9.67811K wps
[Epoch 28 Batch 60/162] avg loss 0.00488754, throughput 9.60404K wps
[Epoch 28 Batch 90/162] avg loss 0.00423977, throughput 9.34693K wps
[Epoch 28 Batch 120/162] avg loss 0.00503577, throughput 9.59636K wps
[Epoch 28 Batch 150/162] avg loss 0.00460634, throughput 9.33307K wps
Begin Testing...
[Epoch 28] train avg loss 0.00464542, dev acc 0.8978, dev avg loss 0.275934, throughput 9.50684K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00482399, throughput 9.56692K wps
[Epoch 29 Batch 60/162] avg loss 0.00454589, throughput 9.25683K wps
[Epoch 29 Batch 90/162] avg loss 0.00435749, throughput 9.57799K wps
[Epoch 29 Batch 120/162] avg loss 0.00478881, throughput 9.4965K wps
[Epoch 29 Batch 150/162] avg loss 0.00409712, throughput 9.51701K wps
Begin Testing...
[Epoch 29] train avg loss 0.00448683, dev acc 0.8978, dev avg loss 0.273418, throughput 9.4819K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00478067, throughput 9.80287K wps
[Epoch 30 Batch 60/162] avg loss 0.00419504, throughput 9.3498K wps
[Epoch 30 Batch 90/162] avg loss 0.00431353, throughput 9.42033K wps
[Epoch 30 Batch 120/162] avg loss 0.00470617, throughput 9.29221K wps
[Epoch 30 Batch 150/162] avg loss 0.00430479, throughput 9.39394K wps
Begin Testing...
[Epoch 30] train avg loss 0.00446317, dev acc 0.9000, dev avg loss 0.272378, throughput 9.45538K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00446604, throughput 9.61072K wps
[Epoch 31 Batch 60/162] avg loss 0.00452568, throughput 9.51161K wps
[Epoch 31 Batch 90/162] avg loss 0.00415589, throughput 9.45126K wps
[Epoch 31 Batch 120/162] avg loss 0.00450758, throughput 9.564K wps
[Epoch 31 Batch 150/162] avg loss 0.00432665, throughput 9.42179K wps
Begin Testing...
[Epoch 31] train avg loss 0.00440475, dev acc 0.8967, dev avg loss 0.269901, throughput 9.51569K wps
[Epoch 32 Batch 30/162] avg loss 0.00433878, throughput 9.51472K wps
[Epoch 32 Batch 60/162] avg loss 0.00443535, throughput 9.47296K wps
[Epoch 32 Batch 90/162] avg loss 0.00449091, throughput 9.45848K wps
[Epoch 32 Batch 120/162] avg loss 0.00428892, throughput 9.28994K wps
[Epoch 32 Batch 150/162] avg loss 0.00416887, throughput 9.31816K wps
Begin Testing...
[Epoch 32] train avg loss 0.00433306, dev acc 0.8989, dev avg loss 0.268951, throughput 9.4196K wps
[Epoch 33 Batch 30/162] avg loss 0.00421211, throughput 9.58092K wps
[Epoch 33 Batch 60/162] avg loss 0.00408083, throughput 9.63121K wps
[Epoch 33 Batch 90/162] avg loss 0.00463901, throughput 9.34245K wps
[Epoch 33 Batch 120/162] avg loss 0.0044146, throughput 9.38966K wps
[Epoch 33 Batch 150/162] avg loss 0.00431222, throughput 9.31115K wps
Begin Testing...
[Epoch 33] train avg loss 0.00428337, dev acc 0.8989, dev avg loss 0.268036, throughput 9.46413K wps
[Epoch 34 Batch 30/162] avg loss 0.00408815, throughput 9.47671K wps
[Epoch 34 Batch 60/162] avg loss 0.00419916, throughput 9.39808K wps
[Epoch 34 Batch 90/162] avg loss 0.00407953, throughput 9.35001K wps
[Epoch 34 Batch 120/162] avg loss 0.0037462, throughput 9.5343K wps
[Epoch 34 Batch 150/162] avg loss 0.00441781, throughput 9.38551K wps
Begin Testing...
[Epoch 34] train avg loss 0.00414092, dev acc 0.8989, dev avg loss 0.264133, throughput 9.41777K wps
[Epoch 35 Batch 30/162] avg loss 0.00430011, throughput 9.60308K wps
[Epoch 35 Batch 60/162] avg loss 0.0040314, throughput 9.43085K wps
[Epoch 35 Batch 90/162] avg loss 0.00414802, throughput 9.59724K wps
[Epoch 35 Batch 120/162] avg loss 0.00419202, throughput 9.41857K wps
[Epoch 35 Batch 150/162] avg loss 0.00377472, throughput 9.55167K wps
Begin Testing...
[Epoch 35] train avg loss 0.00408505, dev acc 0.8978, dev avg loss 0.262154, throughput 9.52085K wps
[Epoch 36 Batch 30/162] avg loss 0.00398365, throughput 9.4616K wps
[Epoch 36 Batch 60/162] avg loss 0.00406087, throughput 9.57624K wps
[Epoch 36 Batch 90/162] avg loss 0.00419579, throughput 9.36475K wps
[Epoch 36 Batch 120/162] avg loss 0.00370225, throughput 9.47195K wps
[Epoch 36 Batch 150/162] avg loss 0.00422434, throughput 9.56823K wps
Begin Testing...
[Epoch 36] train avg loss 0.00403643, dev acc 0.9022, dev avg loss 0.26199, throughput 9.48986K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00369248, throughput 9.70319K wps
[Epoch 37 Batch 60/162] avg loss 0.00399778, throughput 9.23667K wps
[Epoch 37 Batch 90/162] avg loss 0.00388887, throughput 9.34555K wps
[Epoch 37 Batch 120/162] avg loss 0.0043826, throughput 9.49703K wps
[Epoch 37 Batch 150/162] avg loss 0.00427731, throughput 9.60465K wps
Begin Testing...
[Epoch 37] train avg loss 0.00402125, dev acc 0.9000, dev avg loss 0.259542, throughput 9.4719K wps
[Epoch 38 Batch 30/162] avg loss 0.00380891, throughput 9.49633K wps
[Epoch 38 Batch 60/162] avg loss 0.0040139, throughput 9.49164K wps
[Epoch 38 Batch 90/162] avg loss 0.00386191, throughput 9.35476K wps
[Epoch 38 Batch 120/162] avg loss 0.00370861, throughput 9.45103K wps
[Epoch 38 Batch 150/162] avg loss 0.0042393, throughput 9.26727K wps
Begin Testing...
[Epoch 38] train avg loss 0.0039613, dev acc 0.9011, dev avg loss 0.258285, throughput 9.40156K wps
[Epoch 39 Batch 30/162] avg loss 0.00401124, throughput 9.54127K wps
[Epoch 39 Batch 60/162] avg loss 0.00390607, throughput 9.31581K wps
[Epoch 39 Batch 90/162] avg loss 0.00373968, throughput 9.27547K wps
[Epoch 39 Batch 120/162] avg loss 0.00396615, throughput 9.41871K wps
[Epoch 39 Batch 150/162] avg loss 0.00379264, throughput 9.39576K wps
Begin Testing...
[Epoch 39] train avg loss 0.00390702, dev acc 0.8989, dev avg loss 0.256657, throughput 9.39427K wps
[Epoch 40 Batch 30/162] avg loss 0.00393162, throughput 9.67434K wps
[Epoch 40 Batch 60/162] avg loss 0.0037819, throughput 9.57699K wps
[Epoch 40 Batch 90/162] avg loss 0.00377915, throughput 9.46876K wps
[Epoch 40 Batch 120/162] avg loss 0.00373343, throughput 9.45007K wps
[Epoch 40 Batch 150/162] avg loss 0.00380252, throughput 9.26224K wps
Begin Testing...
[Epoch 40] train avg loss 0.00381492, dev acc 0.8989, dev avg loss 0.255187, throughput 9.4706K wps
[Epoch 41 Batch 30/162] avg loss 0.00399802, throughput 9.49814K wps
[Epoch 41 Batch 60/162] avg loss 0.00345699, throughput 9.29504K wps
[Epoch 41 Batch 90/162] avg loss 0.00360227, throughput 9.45748K wps
[Epoch 41 Batch 120/162] avg loss 0.00398935, throughput 9.41777K wps
[Epoch 41 Batch 150/162] avg loss 0.00378132, throughput 9.27861K wps
Begin Testing...
[Epoch 41] train avg loss 0.00374704, dev acc 0.8989, dev avg loss 0.25391, throughput 9.38096K wps
[Epoch 42 Batch 30/162] avg loss 0.00357563, throughput 9.54487K wps
[Epoch 42 Batch 60/162] avg loss 0.00361566, throughput 9.62968K wps
[Epoch 42 Batch 90/162] avg loss 0.00354063, throughput 9.49293K wps
[Epoch 42 Batch 120/162] avg loss 0.00361762, throughput 9.44148K wps
[Epoch 42 Batch 150/162] avg loss 0.00418969, throughput 9.36544K wps
Begin Testing...
[Epoch 42] train avg loss 0.00373444, dev acc 0.9000, dev avg loss 0.25275, throughput 9.50581K wps
[Epoch 43 Batch 30/162] avg loss 0.00370573, throughput 9.5242K wps
[Epoch 43 Batch 60/162] avg loss 0.00373772, throughput 9.43488K wps
[Epoch 43 Batch 90/162] avg loss 0.00357915, throughput 9.30858K wps
[Epoch 43 Batch 120/162] avg loss 0.00372048, throughput 9.3571K wps
[Epoch 43 Batch 150/162] avg loss 0.00356147, throughput 9.36751K wps
Begin Testing...
[Epoch 43] train avg loss 0.00365763, dev acc 0.9011, dev avg loss 0.251234, throughput 9.39331K wps
[Epoch 44 Batch 30/162] avg loss 0.00378978, throughput 9.61609K wps
[Epoch 44 Batch 60/162] avg loss 0.00358534, throughput 9.39665K wps
[Epoch 44 Batch 90/162] avg loss 0.00380747, throughput 9.63629K wps
[Epoch 44 Batch 120/162] avg loss 0.00341006, throughput 9.35279K wps
[Epoch 44 Batch 150/162] avg loss 0.00383862, throughput 9.451K wps
Begin Testing...
[Epoch 44] train avg loss 0.00362973, dev acc 0.9011, dev avg loss 0.250194, throughput 9.48177K wps
[Epoch 45 Batch 30/162] avg loss 0.00359368, throughput 9.65537K wps
[Epoch 45 Batch 60/162] avg loss 0.00378111, throughput 9.51237K wps
[Epoch 45 Batch 90/162] avg loss 0.00363177, throughput 9.39398K wps
[Epoch 45 Batch 120/162] avg loss 0.00371031, throughput 9.26322K wps
[Epoch 45 Batch 150/162] avg loss 0.00336906, throughput 9.49175K wps
Begin Testing...
[Epoch 45] train avg loss 0.00359808, dev acc 0.9011, dev avg loss 0.249553, throughput 9.46465K wps
[Epoch 46 Batch 30/162] avg loss 0.00344127, throughput 9.72143K wps
[Epoch 46 Batch 60/162] avg loss 0.00328195, throughput 9.34182K wps
[Epoch 46 Batch 90/162] avg loss 0.00362182, throughput 9.33831K wps
[Epoch 46 Batch 120/162] avg loss 0.00373991, throughput 9.35089K wps
[Epoch 46 Batch 150/162] avg loss 0.00354795, throughput 9.44608K wps
Begin Testing...
[Epoch 46] train avg loss 0.00350159, dev acc 0.9056, dev avg loss 0.249423, throughput 9.43889K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/162] avg loss 0.00342204, throughput 9.64541K wps
[Epoch 47 Batch 60/162] avg loss 0.00347048, throughput 9.31912K wps
[Epoch 47 Batch 90/162] avg loss 0.00336565, throughput 9.37817K wps
[Epoch 47 Batch 120/162] avg loss 0.00349036, throughput 9.57605K wps
[Epoch 47 Batch 150/162] avg loss 0.00336753, throughput 9.44081K wps
Begin Testing...
[Epoch 47] train avg loss 0.00339945, dev acc 0.9067, dev avg loss 0.248338, throughput 9.44925K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00350012, throughput 9.44065K wps
[Epoch 48 Batch 60/162] avg loss 0.00330553, throughput 9.36003K wps
[Epoch 48 Batch 90/162] avg loss 0.00328175, throughput 9.31135K wps
[Epoch 48 Batch 120/162] avg loss 0.00379125, throughput 9.6338K wps
[Epoch 48 Batch 150/162] avg loss 0.00353947, throughput 9.38186K wps
Begin Testing...
[Epoch 48] train avg loss 0.00342202, dev acc 0.9067, dev avg loss 0.247824, throughput 9.43725K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00321639, throughput 9.54198K wps
[Epoch 49 Batch 60/162] avg loss 0.00353585, throughput 9.35214K wps
[Epoch 49 Batch 90/162] avg loss 0.00326472, throughput 9.29211K wps
[Epoch 49 Batch 120/162] avg loss 0.00307165, throughput 9.37366K wps
[Epoch 49 Batch 150/162] avg loss 0.00346831, throughput 9.27105K wps
Begin Testing...
[Epoch 49] train avg loss 0.00331722, dev acc 0.9022, dev avg loss 0.245726, throughput 9.35504K wps
[Epoch 50 Batch 30/162] avg loss 0.00318394, throughput 9.53669K wps
[Epoch 50 Batch 60/162] avg loss 0.00343808, throughput 9.49831K wps
[Epoch 50 Batch 90/162] avg loss 0.00327675, throughput 9.54207K wps
[Epoch 50 Batch 120/162] avg loss 0.00381823, throughput 9.49602K wps
[Epoch 50 Batch 150/162] avg loss 0.00297331, throughput 9.33471K wps
Begin Testing...
[Epoch 50] train avg loss 0.00333092, dev acc 0.9022, dev avg loss 0.244905, throughput 9.47841K wps
[Epoch 51 Batch 30/162] avg loss 0.00321521, throughput 9.52019K wps
[Epoch 51 Batch 60/162] avg loss 0.0032855, throughput 9.41971K wps
[Epoch 51 Batch 90/162] avg loss 0.00326702, throughput 9.27154K wps
[Epoch 51 Batch 120/162] avg loss 0.00322066, throughput 9.26474K wps
[Epoch 51 Batch 150/162] avg loss 0.00306519, throughput 9.47363K wps
Begin Testing...
[Epoch 51] train avg loss 0.00322283, dev acc 0.9022, dev avg loss 0.244081, throughput 9.39416K wps
[Epoch 52 Batch 30/162] avg loss 0.00318501, throughput 9.45487K wps
[Epoch 52 Batch 60/162] avg loss 0.00311068, throughput 9.43871K wps
[Epoch 52 Batch 90/162] avg loss 0.00349898, throughput 9.52401K wps
[Epoch 52 Batch 120/162] avg loss 0.00307708, throughput 9.49713K wps
[Epoch 52 Batch 150/162] avg loss 0.00310367, throughput 9.58282K wps
Begin Testing...
[Epoch 52] train avg loss 0.00319324, dev acc 0.9078, dev avg loss 0.244634, throughput 9.4922K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00330412, throughput 9.5568K wps
[Epoch 53 Batch 60/162] avg loss 0.00340498, throughput 9.38188K wps
[Epoch 53 Batch 90/162] avg loss 0.002726, throughput 9.33004K wps
[Epoch 53 Batch 120/162] avg loss 0.00325923, throughput 9.30984K wps
[Epoch 53 Batch 150/162] avg loss 0.00311163, throughput 9.50999K wps
Begin Testing...
[Epoch 53] train avg loss 0.00313863, dev acc 0.9078, dev avg loss 0.24387, throughput 9.40989K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00280522, throughput 9.5136K wps
[Epoch 54 Batch 60/162] avg loss 0.00312256, throughput 9.34957K wps
[Epoch 54 Batch 90/162] avg loss 0.0032516, throughput 9.49484K wps
[Epoch 54 Batch 120/162] avg loss 0.00296608, throughput 9.6302K wps
[Epoch 54 Batch 150/162] avg loss 0.00336393, throughput 9.57051K wps
Begin Testing...
[Epoch 54] train avg loss 0.00309793, dev acc 0.9067, dev avg loss 0.24246, throughput 9.5052K wps
[Epoch 55 Batch 30/162] avg loss 0.00311743, throughput 9.41864K wps
[Epoch 55 Batch 60/162] avg loss 0.00315787, throughput 9.32091K wps
[Epoch 55 Batch 90/162] avg loss 0.00304349, throughput 9.47142K wps
[Epoch 55 Batch 120/162] avg loss 0.00312005, throughput 9.26735K wps
[Epoch 55 Batch 150/162] avg loss 0.00288105, throughput 9.34897K wps
Begin Testing...
[Epoch 55] train avg loss 0.00304497, dev acc 0.9033, dev avg loss 0.240875, throughput 9.3566K wps
[Epoch 56 Batch 30/162] avg loss 0.00289679, throughput 9.51227K wps
[Epoch 56 Batch 60/162] avg loss 0.00314012, throughput 9.31222K wps
[Epoch 56 Batch 90/162] avg loss 0.00300704, throughput 9.40977K wps
[Epoch 56 Batch 120/162] avg loss 0.00313969, throughput 9.2384K wps
[Epoch 56 Batch 150/162] avg loss 0.00277625, throughput 9.54468K wps
Begin Testing...
[Epoch 56] train avg loss 0.00299008, dev acc 0.9067, dev avg loss 0.240352, throughput 9.39314K wps
[Epoch 57 Batch 30/162] avg loss 0.00299225, throughput 9.46794K wps
[Epoch 57 Batch 60/162] avg loss 0.00281025, throughput 9.35416K wps
[Epoch 57 Batch 90/162] avg loss 0.00297896, throughput 9.40174K wps
[Epoch 57 Batch 120/162] avg loss 0.00293832, throughput 9.25101K wps
[Epoch 57 Batch 150/162] avg loss 0.00276375, throughput 9.43863K wps
Begin Testing...
[Epoch 57] train avg loss 0.0029357, dev acc 0.9078, dev avg loss 0.240183, throughput 9.39874K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/162] avg loss 0.00280653, throughput 9.64921K wps
[Epoch 58 Batch 60/162] avg loss 0.00301661, throughput 9.3229K wps
[Epoch 58 Batch 90/162] avg loss 0.00294062, throughput 9.38517K wps
[Epoch 58 Batch 120/162] avg loss 0.00316029, throughput 9.40372K wps
[Epoch 58 Batch 150/162] avg loss 0.00312705, throughput 9.37382K wps
Begin Testing...
[Epoch 58] train avg loss 0.00300378, dev acc 0.9100, dev avg loss 0.240917, throughput 9.41193K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/162] avg loss 0.00293071, throughput 9.59655K wps
[Epoch 59 Batch 60/162] avg loss 0.00278885, throughput 9.3282K wps
[Epoch 59 Batch 90/162] avg loss 0.00260123, throughput 9.34104K wps
[Epoch 59 Batch 120/162] avg loss 0.00304476, throughput 9.48451K wps
[Epoch 59 Batch 150/162] avg loss 0.00294758, throughput 9.38572K wps
Begin Testing...
[Epoch 59] train avg loss 0.00288044, dev acc 0.9044, dev avg loss 0.238944, throughput 9.42144K wps
[Epoch 60 Batch 30/162] avg loss 0.00300888, throughput 9.568K wps
[Epoch 60 Batch 60/162] avg loss 0.00291366, throughput 9.44866K wps
[Epoch 60 Batch 90/162] avg loss 0.00267981, throughput 9.24282K wps
[Epoch 60 Batch 120/162] avg loss 0.00292821, throughput 9.48347K wps
[Epoch 60 Batch 150/162] avg loss 0.00276907, throughput 9.45153K wps
Begin Testing...
[Epoch 60] train avg loss 0.00283028, dev acc 0.9056, dev avg loss 0.239464, throughput 9.42948K wps
[Epoch 61 Batch 30/162] avg loss 0.00260855, throughput 9.67308K wps
[Epoch 61 Batch 60/162] avg loss 0.00278626, throughput 9.50827K wps
[Epoch 61 Batch 90/162] avg loss 0.00290986, throughput 9.53516K wps
[Epoch 61 Batch 120/162] avg loss 0.00299438, throughput 9.45582K wps
[Epoch 61 Batch 150/162] avg loss 0.00274816, throughput 9.48674K wps
Begin Testing...
[Epoch 61] train avg loss 0.00281143, dev acc 0.9056, dev avg loss 0.237615, throughput 9.5077K wps
[Epoch 62 Batch 30/162] avg loss 0.00288795, throughput 9.4864K wps
[Epoch 62 Batch 60/162] avg loss 0.00276271, throughput 9.33604K wps
[Epoch 62 Batch 90/162] avg loss 0.00266529, throughput 9.41457K wps
[Epoch 62 Batch 120/162] avg loss 0.00267132, throughput 9.4805K wps
[Epoch 62 Batch 150/162] avg loss 0.00278233, throughput 9.42662K wps
Begin Testing...
[Epoch 62] train avg loss 0.002759, dev acc 0.9056, dev avg loss 0.23689, throughput 9.42278K wps
[Epoch 63 Batch 30/162] avg loss 0.00265385, throughput 9.74434K wps
[Epoch 63 Batch 60/162] avg loss 0.00286414, throughput 9.40379K wps
[Epoch 63 Batch 90/162] avg loss 0.0028863, throughput 9.39386K wps
[Epoch 63 Batch 120/162] avg loss 0.00272298, throughput 9.50042K wps
[Epoch 63 Batch 150/162] avg loss 0.00257107, throughput 9.36452K wps
Begin Testing...
[Epoch 63] train avg loss 0.00273231, dev acc 0.9078, dev avg loss 0.23665, throughput 9.49075K wps
[Epoch 64 Batch 30/162] avg loss 0.00287369, throughput 9.70174K wps
[Epoch 64 Batch 60/162] avg loss 0.00268595, throughput 9.32407K wps
[Epoch 64 Batch 90/162] avg loss 0.00249368, throughput 9.40545K wps
[Epoch 64 Batch 120/162] avg loss 0.00279411, throughput 9.55831K wps
[Epoch 64 Batch 150/162] avg loss 0.00262947, throughput 9.50336K wps
Begin Testing...
[Epoch 64] train avg loss 0.00268018, dev acc 0.9067, dev avg loss 0.236, throughput 9.49381K wps
[Epoch 65 Batch 30/162] avg loss 0.00239457, throughput 9.43358K wps
[Epoch 65 Batch 60/162] avg loss 0.00267318, throughput 9.31637K wps
[Epoch 65 Batch 90/162] avg loss 0.00275123, throughput 9.47293K wps
[Epoch 65 Batch 120/162] avg loss 0.00294261, throughput 9.32146K wps
[Epoch 65 Batch 150/162] avg loss 0.00245131, throughput 9.52185K wps
Begin Testing...
[Epoch 65] train avg loss 0.00264365, dev acc 0.9089, dev avg loss 0.235717, throughput 9.40323K wps
[Epoch 66 Batch 30/162] avg loss 0.00268589, throughput 9.51885K wps
[Epoch 66 Batch 60/162] avg loss 0.00272469, throughput 9.6092K wps
[Epoch 66 Batch 90/162] avg loss 0.00248978, throughput 9.60937K wps
[Epoch 66 Batch 120/162] avg loss 0.00285202, throughput 9.5301K wps
[Epoch 66 Batch 150/162] avg loss 0.00249202, throughput 9.27504K wps
Begin Testing...
[Epoch 66] train avg loss 0.00264761, dev acc 0.9078, dev avg loss 0.235078, throughput 9.4942K wps
[Epoch 67 Batch 30/162] avg loss 0.00227765, throughput 9.60786K wps
[Epoch 67 Batch 60/162] avg loss 0.00280586, throughput 9.40882K wps
[Epoch 67 Batch 90/162] avg loss 0.00261633, throughput 9.45542K wps
[Epoch 67 Batch 120/162] avg loss 0.00236493, throughput 9.3603K wps
[Epoch 67 Batch 150/162] avg loss 0.00251714, throughput 9.48641K wps
Begin Testing...
[Epoch 67] train avg loss 0.0025287, dev acc 0.9078, dev avg loss 0.235218, throughput 9.47176K wps
[Epoch 68 Batch 30/162] avg loss 0.00247929, throughput 9.50213K wps
[Epoch 68 Batch 60/162] avg loss 0.00237766, throughput 9.40113K wps
[Epoch 68 Batch 90/162] avg loss 0.00258673, throughput 9.31912K wps
[Epoch 68 Batch 120/162] avg loss 0.00243504, throughput 9.55616K wps
[Epoch 68 Batch 150/162] avg loss 0.00259164, throughput 9.27384K wps
Begin Testing...
[Epoch 68] train avg loss 0.00252898, dev acc 0.9078, dev avg loss 0.235461, throughput 9.42179K wps
[Epoch 69 Batch 30/162] avg loss 0.00265411, throughput 9.56154K wps
[Epoch 69 Batch 60/162] avg loss 0.00241417, throughput 9.462K wps
[Epoch 69 Batch 90/162] avg loss 0.00254495, throughput 9.43332K wps
[Epoch 69 Batch 120/162] avg loss 0.0021794, throughput 9.3068K wps
[Epoch 69 Batch 150/162] avg loss 0.00265025, throughput 9.39508K wps
Begin Testing...
[Epoch 69] train avg loss 0.00246926, dev acc 0.9078, dev avg loss 0.235717, throughput 9.42618K wps
[Epoch 70 Batch 30/162] avg loss 0.00257705, throughput 9.51677K wps
[Epoch 70 Batch 60/162] avg loss 0.00226928, throughput 9.36987K wps
[Epoch 70 Batch 90/162] avg loss 0.00249652, throughput 9.38456K wps
[Epoch 70 Batch 120/162] avg loss 0.00211324, throughput 9.50327K wps
[Epoch 70 Batch 150/162] avg loss 0.00271965, throughput 9.46819K wps
Begin Testing...
[Epoch 70] train avg loss 0.00242111, dev acc 0.9089, dev avg loss 0.233756, throughput 9.43801K wps
[Epoch 71 Batch 30/162] avg loss 0.00235651, throughput 9.51452K wps
[Epoch 71 Batch 60/162] avg loss 0.00242878, throughput 9.47533K wps
[Epoch 71 Batch 90/162] avg loss 0.00259031, throughput 9.4441K wps
[Epoch 71 Batch 120/162] avg loss 0.00239291, throughput 9.36571K wps
[Epoch 71 Batch 150/162] avg loss 0.00234223, throughput 9.32055K wps
Begin Testing...
[Epoch 71] train avg loss 0.00241264, dev acc 0.9089, dev avg loss 0.233406, throughput 9.4133K wps
[Epoch 72 Batch 30/162] avg loss 0.00242557, throughput 9.51617K wps
[Epoch 72 Batch 60/162] avg loss 0.00242044, throughput 9.2519K wps
[Epoch 72 Batch 90/162] avg loss 0.00240604, throughput 9.48976K wps
[Epoch 72 Batch 120/162] avg loss 0.00222254, throughput 9.34074K wps
[Epoch 72 Batch 150/162] avg loss 0.00241634, throughput 9.24993K wps
Begin Testing...
[Epoch 72] train avg loss 0.00238551, dev acc 0.9078, dev avg loss 0.232928, throughput 9.37433K wps
[Epoch 73 Batch 30/162] avg loss 0.00225104, throughput 9.69523K wps
[Epoch 73 Batch 60/162] avg loss 0.00251166, throughput 9.39546K wps
[Epoch 73 Batch 90/162] avg loss 0.00224245, throughput 9.55207K wps
[Epoch 73 Batch 120/162] avg loss 0.00242628, throughput 9.45621K wps
[Epoch 73 Batch 150/162] avg loss 0.00239749, throughput 9.45189K wps
Begin Testing...
[Epoch 73] train avg loss 0.00236795, dev acc 0.9089, dev avg loss 0.232871, throughput 9.4951K wps
[Epoch 74 Batch 30/162] avg loss 0.00213699, throughput 9.43431K wps
[Epoch 74 Batch 60/162] avg loss 0.00242813, throughput 9.41929K wps
[Epoch 74 Batch 90/162] avg loss 0.00227872, throughput 9.30512K wps
[Epoch 74 Batch 120/162] avg loss 0.00230027, throughput 9.32665K wps
[Epoch 74 Batch 150/162] avg loss 0.00216548, throughput 9.6628K wps
Begin Testing...
[Epoch 74] train avg loss 0.00230945, dev acc 0.9100, dev avg loss 0.233672, throughput 9.44452K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/162] avg loss 0.00229099, throughput 9.66157K wps
[Epoch 75 Batch 60/162] avg loss 0.00231805, throughput 9.19063K wps
[Epoch 75 Batch 90/162] avg loss 0.00238184, throughput 9.45356K wps
[Epoch 75 Batch 120/162] avg loss 0.00212968, throughput 9.50277K wps
[Epoch 75 Batch 150/162] avg loss 0.00231665, throughput 9.42112K wps
Begin Testing...
[Epoch 75] train avg loss 0.00234109, dev acc 0.9067, dev avg loss 0.232223, throughput 9.43704K wps
[Epoch 76 Batch 30/162] avg loss 0.00228757, throughput 9.60872K wps
[Epoch 76 Batch 60/162] avg loss 0.00238633, throughput 9.20488K wps
[Epoch 76 Batch 90/162] avg loss 0.00226496, throughput 9.28966K wps
[Epoch 76 Batch 120/162] avg loss 0.00231158, throughput 9.40825K wps
[Epoch 76 Batch 150/162] avg loss 0.0021501, throughput 9.45735K wps
Begin Testing...
[Epoch 76] train avg loss 0.00224102, dev acc 0.9067, dev avg loss 0.232594, throughput 9.37968K wps
[Epoch 77 Batch 30/162] avg loss 0.00203727, throughput 9.46889K wps
[Epoch 77 Batch 60/162] avg loss 0.00236498, throughput 9.41467K wps
[Epoch 77 Batch 90/162] avg loss 0.00234807, throughput 9.31376K wps
[Epoch 77 Batch 120/162] avg loss 0.00238359, throughput 9.66378K wps
[Epoch 77 Batch 150/162] avg loss 0.00216933, throughput 9.54635K wps
Begin Testing...
[Epoch 77] train avg loss 0.00222111, dev acc 0.9078, dev avg loss 0.232175, throughput 9.46046K wps
[Epoch 78 Batch 30/162] avg loss 0.00237146, throughput 9.48115K wps
[Epoch 78 Batch 60/162] avg loss 0.00226466, throughput 9.40173K wps
[Epoch 78 Batch 90/162] avg loss 0.00214546, throughput 9.49443K wps
[Epoch 78 Batch 120/162] avg loss 0.00216933, throughput 9.43839K wps
[Epoch 78 Batch 150/162] avg loss 0.00218924, throughput 9.4294K wps
Begin Testing...
[Epoch 78] train avg loss 0.0022003, dev acc 0.9078, dev avg loss 0.231781, throughput 9.44886K wps
[Epoch 79 Batch 30/162] avg loss 0.00214779, throughput 9.63114K wps
[Epoch 79 Batch 60/162] avg loss 0.00208279, throughput 9.5244K wps
[Epoch 79 Batch 90/162] avg loss 0.00217244, throughput 9.30055K wps
[Epoch 79 Batch 120/162] avg loss 0.00213707, throughput 9.37776K wps
[Epoch 79 Batch 150/162] avg loss 0.00215754, throughput 9.31297K wps
Begin Testing...
[Epoch 79] train avg loss 0.00213948, dev acc 0.9078, dev avg loss 0.231746, throughput 9.43039K wps
[Epoch 80 Batch 30/162] avg loss 0.00212842, throughput 9.63843K wps
[Epoch 80 Batch 60/162] avg loss 0.00195101, throughput 9.35867K wps
[Epoch 80 Batch 90/162] avg loss 0.00209711, throughput 9.50732K wps
[Epoch 80 Batch 120/162] avg loss 0.00241555, throughput 9.50327K wps
[Epoch 80 Batch 150/162] avg loss 0.00220332, throughput 9.46487K wps
Begin Testing...
[Epoch 80] train avg loss 0.00215957, dev acc 0.9089, dev avg loss 0.231133, throughput 9.49374K wps
[Epoch 81 Batch 30/162] avg loss 0.0018672, throughput 9.54823K wps
[Epoch 81 Batch 60/162] avg loss 0.00219145, throughput 9.32757K wps
[Epoch 81 Batch 90/162] avg loss 0.00221833, throughput 9.41622K wps
[Epoch 81 Batch 120/162] avg loss 0.00197305, throughput 9.29481K wps
[Epoch 81 Batch 150/162] avg loss 0.00201746, throughput 9.45812K wps
Begin Testing...
[Epoch 81] train avg loss 0.00207677, dev acc 0.9100, dev avg loss 0.230953, throughput 9.37993K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/162] avg loss 0.00188539, throughput 9.65986K wps
[Epoch 82 Batch 60/162] avg loss 0.00204478, throughput 9.46403K wps
[Epoch 82 Batch 90/162] avg loss 0.00201831, throughput 9.2557K wps
[Epoch 82 Batch 120/162] avg loss 0.00241526, throughput 9.48381K wps
[Epoch 82 Batch 150/162] avg loss 0.0020828, throughput 9.29595K wps
Begin Testing...
[Epoch 82] train avg loss 0.00207981, dev acc 0.9078, dev avg loss 0.230809, throughput 9.44472K wps
[Epoch 83 Batch 30/162] avg loss 0.00223152, throughput 9.45692K wps
[Epoch 83 Batch 60/162] avg loss 0.00222273, throughput 9.41858K wps
[Epoch 83 Batch 90/162] avg loss 0.00200768, throughput 9.38583K wps
[Epoch 83 Batch 120/162] avg loss 0.00199637, throughput 9.35457K wps
[Epoch 83 Batch 150/162] avg loss 0.00221681, throughput 9.31573K wps
Begin Testing...
[Epoch 83] train avg loss 0.00212103, dev acc 0.9100, dev avg loss 0.230458, throughput 9.40186K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/162] avg loss 0.00215412, throughput 9.72015K wps
[Epoch 84 Batch 60/162] avg loss 0.00193827, throughput 9.26361K wps
[Epoch 84 Batch 90/162] avg loss 0.0021938, throughput 9.34604K wps
[Epoch 84 Batch 120/162] avg loss 0.00206668, throughput 9.28541K wps
[Epoch 84 Batch 150/162] avg loss 0.00189408, throughput 9.25179K wps
Begin Testing...
[Epoch 84] train avg loss 0.00204826, dev acc 0.9089, dev avg loss 0.230426, throughput 9.36956K wps
[Epoch 85 Batch 30/162] avg loss 0.00185962, throughput 9.52811K wps
[Epoch 85 Batch 60/162] avg loss 0.0020107, throughput 9.42722K wps
[Epoch 85 Batch 90/162] avg loss 0.00203444, throughput 9.41173K wps
[Epoch 85 Batch 120/162] avg loss 0.00194089, throughput 9.52288K wps
[Epoch 85 Batch 150/162] avg loss 0.00204689, throughput 9.42042K wps
Begin Testing...
[Epoch 85] train avg loss 0.00200129, dev acc 0.9067, dev avg loss 0.230463, throughput 9.47302K wps
[Epoch 86 Batch 30/162] avg loss 0.00177675, throughput 9.49244K wps
[Epoch 86 Batch 60/162] avg loss 0.00202632, throughput 9.66008K wps
[Epoch 86 Batch 90/162] avg loss 0.00204808, throughput 9.38044K wps
[Epoch 86 Batch 120/162] avg loss 0.00215758, throughput 9.52754K wps
[Epoch 86 Batch 150/162] avg loss 0.00180453, throughput 9.42865K wps
Begin Testing...
[Epoch 86] train avg loss 0.00196139, dev acc 0.9078, dev avg loss 0.231065, throughput 9.48334K wps
[Epoch 87 Batch 30/162] avg loss 0.0019585, throughput 9.55588K wps
[Epoch 87 Batch 60/162] avg loss 0.00199308, throughput 9.46191K wps
[Epoch 87 Batch 90/162] avg loss 0.00180912, throughput 9.33256K wps
[Epoch 87 Batch 120/162] avg loss 0.00202829, throughput 9.53227K wps
[Epoch 87 Batch 150/162] avg loss 0.00181426, throughput 9.34087K wps
Begin Testing...
[Epoch 87] train avg loss 0.00192785, dev acc 0.9067, dev avg loss 0.230277, throughput 9.44801K wps
[Epoch 88 Batch 30/162] avg loss 0.00176082, throughput 9.42677K wps
[Epoch 88 Batch 60/162] avg loss 0.0020671, throughput 9.46952K wps
[Epoch 88 Batch 90/162] avg loss 0.0018393, throughput 9.4635K wps
[Epoch 88 Batch 120/162] avg loss 0.00189293, throughput 9.41574K wps
[Epoch 88 Batch 150/162] avg loss 0.00200162, throughput 9.41319K wps
Begin Testing...
[Epoch 88] train avg loss 0.00192076, dev acc 0.9067, dev avg loss 0.229971, throughput 9.44874K wps
[Epoch 89 Batch 30/162] avg loss 0.00196706, throughput 9.57919K wps
[Epoch 89 Batch 60/162] avg loss 0.00193885, throughput 9.35798K wps
[Epoch 89 Batch 90/162] avg loss 0.00202345, throughput 9.34419K wps
[Epoch 89 Batch 120/162] avg loss 0.00190312, throughput 9.41812K wps
[Epoch 89 Batch 150/162] avg loss 0.00173288, throughput 9.36746K wps
Begin Testing...
[Epoch 89] train avg loss 0.00191078, dev acc 0.9089, dev avg loss 0.231062, throughput 9.43163K wps
[Epoch 90 Batch 30/162] avg loss 0.001871, throughput 9.48289K wps
[Epoch 90 Batch 60/162] avg loss 0.00174085, throughput 9.46199K wps
[Epoch 90 Batch 90/162] avg loss 0.00205183, throughput 9.26592K wps
[Epoch 90 Batch 120/162] avg loss 0.0018345, throughput 9.47404K wps
[Epoch 90 Batch 150/162] avg loss 0.00207165, throughput 9.42786K wps
Begin Testing...
[Epoch 90] train avg loss 0.00190886, dev acc 0.9089, dev avg loss 0.229688, throughput 9.42216K wps
[Epoch 91 Batch 30/162] avg loss 0.00185121, throughput 9.64327K wps
[Epoch 91 Batch 60/162] avg loss 0.00188709, throughput 9.33688K wps
[Epoch 91 Batch 90/162] avg loss 0.00186984, throughput 9.43082K wps
[Epoch 91 Batch 120/162] avg loss 0.00178008, throughput 9.49693K wps
[Epoch 91 Batch 150/162] avg loss 0.00195783, throughput 9.25683K wps
Begin Testing...
[Epoch 91] train avg loss 0.00187586, dev acc 0.9100, dev avg loss 0.229561, throughput 9.43077K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/162] avg loss 0.00191878, throughput 9.58135K wps
[Epoch 92 Batch 60/162] avg loss 0.00185139, throughput 9.45728K wps
[Epoch 92 Batch 90/162] avg loss 0.00199974, throughput 9.49063K wps
[Epoch 92 Batch 120/162] avg loss 0.00185051, throughput 9.25664K wps
[Epoch 92 Batch 150/162] avg loss 0.00176807, throughput 9.41778K wps
Begin Testing...
[Epoch 92] train avg loss 0.00186376, dev acc 0.9089, dev avg loss 0.229451, throughput 9.45148K wps
[Epoch 93 Batch 30/162] avg loss 0.00169152, throughput 9.58804K wps
[Epoch 93 Batch 60/162] avg loss 0.00184662, throughput 9.37001K wps
[Epoch 93 Batch 90/162] avg loss 0.00181727, throughput 9.30156K wps
[Epoch 93 Batch 120/162] avg loss 0.00197526, throughput 9.57869K wps
[Epoch 93 Batch 150/162] avg loss 0.00155701, throughput 9.44153K wps
Begin Testing...
[Epoch 93] train avg loss 0.00177294, dev acc 0.9067, dev avg loss 0.229503, throughput 9.44153K wps
[Epoch 94 Batch 30/162] avg loss 0.0017501, throughput 9.54151K wps
[Epoch 94 Batch 60/162] avg loss 0.00189529, throughput 9.4493K wps
[Epoch 94 Batch 90/162] avg loss 0.00171395, throughput 9.34279K wps
[Epoch 94 Batch 120/162] avg loss 0.00168188, throughput 9.3006K wps
[Epoch 94 Batch 150/162] avg loss 0.00176516, throughput 9.26229K wps
Begin Testing...
[Epoch 94] train avg loss 0.00177128, dev acc 0.9089, dev avg loss 0.229222, throughput 9.3895K wps
[Epoch 95 Batch 30/162] avg loss 0.00192507, throughput 9.48517K wps
[Epoch 95 Batch 60/162] avg loss 0.00171763, throughput 9.40488K wps
[Epoch 95 Batch 90/162] avg loss 0.00158727, throughput 9.49213K wps
[Epoch 95 Batch 120/162] avg loss 0.00190603, throughput 9.37772K wps
[Epoch 95 Batch 150/162] avg loss 0.00168699, throughput 9.34463K wps
Begin Testing...
[Epoch 95] train avg loss 0.00175758, dev acc 0.9056, dev avg loss 0.229732, throughput 9.41329K wps
[Epoch 96 Batch 30/162] avg loss 0.00164219, throughput 9.64598K wps
[Epoch 96 Batch 60/162] avg loss 0.00180407, throughput 9.26244K wps
[Epoch 96 Batch 90/162] avg loss 0.00168376, throughput 9.47289K wps
[Epoch 96 Batch 120/162] avg loss 0.00174691, throughput 9.56701K wps
[Epoch 96 Batch 150/162] avg loss 0.00146591, throughput 9.55153K wps
Begin Testing...
[Epoch 96] train avg loss 0.00166984, dev acc 0.9056, dev avg loss 0.230337, throughput 9.4896K wps
[Epoch 97 Batch 30/162] avg loss 0.00164687, throughput 9.45882K wps
[Epoch 97 Batch 60/162] avg loss 0.00149098, throughput 9.34822K wps
[Epoch 97 Batch 90/162] avg loss 0.00176893, throughput 9.42753K wps
[Epoch 97 Batch 120/162] avg loss 0.00181979, throughput 9.41794K wps
[Epoch 97 Batch 150/162] avg loss 0.00175131, throughput 9.41042K wps
Begin Testing...
[Epoch 97] train avg loss 0.00171622, dev acc 0.9056, dev avg loss 0.22993, throughput 9.42032K wps
[Epoch 98 Batch 30/162] avg loss 0.00152995, throughput 9.69674K wps
[Epoch 98 Batch 60/162] avg loss 0.00145983, throughput 9.30966K wps
[Epoch 98 Batch 90/162] avg loss 0.0015705, throughput 9.51451K wps
[Epoch 98 Batch 120/162] avg loss 0.00182273, throughput 9.4183K wps
[Epoch 98 Batch 150/162] avg loss 0.00193128, throughput 9.33212K wps
Begin Testing...
[Epoch 98] train avg loss 0.00168408, dev acc 0.9067, dev avg loss 0.230773, throughput 9.46114K wps
[Epoch 99 Batch 30/162] avg loss 0.00166608, throughput 9.61557K wps
[Epoch 99 Batch 60/162] avg loss 0.00174995, throughput 9.18671K wps
[Epoch 99 Batch 90/162] avg loss 0.0017058, throughput 9.48183K wps
[Epoch 99 Batch 120/162] avg loss 0.00178574, throughput 9.26609K wps
[Epoch 99 Batch 150/162] avg loss 0.00156201, throughput 9.28227K wps
Begin Testing...
[Epoch 99] train avg loss 0.0016845, dev acc 0.9044, dev avg loss 0.229811, throughput 9.35877K wps
[Epoch 100 Batch 30/162] avg loss 0.00161489, throughput 9.53247K wps
[Epoch 100 Batch 60/162] avg loss 0.00151833, throughput 9.30704K wps
[Epoch 100 Batch 90/162] avg loss 0.00163977, throughput 9.63101K wps
[Epoch 100 Batch 120/162] avg loss 0.00157494, throughput 9.45964K wps
[Epoch 100 Batch 150/162] avg loss 0.00160811, throughput 9.34586K wps
Begin Testing...
[Epoch 100] train avg loss 0.00158965, dev acc 0.9067, dev avg loss 0.229617, throughput 9.4506K wps
[Epoch 101 Batch 30/162] avg loss 0.00155059, throughput 9.58724K wps
[Epoch 101 Batch 60/162] avg loss 0.00159646, throughput 9.43018K wps
[Epoch 101 Batch 90/162] avg loss 0.00153429, throughput 9.39148K wps
[Epoch 101 Batch 120/162] avg loss 0.0017968, throughput 9.41788K wps
[Epoch 101 Batch 150/162] avg loss 0.00141439, throughput 9.53901K wps
Begin Testing...
[Epoch 101] train avg loss 0.00158273, dev acc 0.9056, dev avg loss 0.229857, throughput 9.48433K wps
[Epoch 102 Batch 30/162] avg loss 0.00145153, throughput 9.51218K wps
[Epoch 102 Batch 60/162] avg loss 0.00146349, throughput 9.39322K wps
[Epoch 102 Batch 90/162] avg loss 0.00155615, throughput 9.55787K wps
[Epoch 102 Batch 120/162] avg loss 0.00171872, throughput 9.48334K wps
[Epoch 102 Batch 150/162] avg loss 0.00179406, throughput 9.36459K wps
Begin Testing...
[Epoch 102] train avg loss 0.00158761, dev acc 0.9056, dev avg loss 0.229883, throughput 9.4716K wps
[Epoch 103 Batch 30/162] avg loss 0.0016011, throughput 9.51903K wps
[Epoch 103 Batch 60/162] avg loss 0.00142531, throughput 9.58996K wps
[Epoch 103 Batch 90/162] avg loss 0.00175576, throughput 9.39976K wps
[Epoch 103 Batch 120/162] avg loss 0.00163905, throughput 9.37047K wps
[Epoch 103 Batch 150/162] avg loss 0.00150459, throughput 9.43596K wps
Begin Testing...
[Epoch 103] train avg loss 0.00157957, dev acc 0.9089, dev avg loss 0.229327, throughput 9.47277K wps
[Epoch 104 Batch 30/162] avg loss 0.00153513, throughput 9.33868K wps
[Epoch 104 Batch 60/162] avg loss 0.00149964, throughput 9.5506K wps
[Epoch 104 Batch 90/162] avg loss 0.00160712, throughput 9.2441K wps
[Epoch 104 Batch 120/162] avg loss 0.00144635, throughput 9.53K wps
[Epoch 104 Batch 150/162] avg loss 0.00153667, throughput 9.47824K wps
Begin Testing...
[Epoch 104] train avg loss 0.00155162, dev acc 0.9078, dev avg loss 0.229468, throughput 9.40776K wps
[Epoch 105 Batch 30/162] avg loss 0.00141183, throughput 9.51289K wps
[Epoch 105 Batch 60/162] avg loss 0.00151852, throughput 9.49043K wps
[Epoch 105 Batch 90/162] avg loss 0.00152356, throughput 9.39677K wps
[Epoch 105 Batch 120/162] avg loss 0.0015766, throughput 9.27471K wps
[Epoch 105 Batch 150/162] avg loss 0.00143873, throughput 9.32436K wps
Begin Testing...
[Epoch 105] train avg loss 0.00149833, dev acc 0.9067, dev avg loss 0.229239, throughput 9.40824K wps
[Epoch 106 Batch 30/162] avg loss 0.00154192, throughput 9.75021K wps
[Epoch 106 Batch 60/162] avg loss 0.0015088, throughput 9.44832K wps
[Epoch 106 Batch 90/162] avg loss 0.00148075, throughput 9.30569K wps
[Epoch 106 Batch 120/162] avg loss 0.00147681, throughput 9.3575K wps
[Epoch 106 Batch 150/162] avg loss 0.00143831, throughput 9.3803K wps
Begin Testing...
[Epoch 106] train avg loss 0.00147954, dev acc 0.9078, dev avg loss 0.229283, throughput 9.42957K wps
[Epoch 107 Batch 30/162] avg loss 0.00161458, throughput 9.68302K wps
[Epoch 107 Batch 60/162] avg loss 0.00136929, throughput 9.38921K wps
[Epoch 107 Batch 90/162] avg loss 0.00169635, throughput 9.27688K wps
[Epoch 107 Batch 120/162] avg loss 0.00150827, throughput 9.4801K wps
[Epoch 107 Batch 150/162] avg loss 0.00157872, throughput 9.39013K wps
Begin Testing...
[Epoch 107] train avg loss 0.00155469, dev acc 0.9078, dev avg loss 0.229541, throughput 9.42395K wps
[Epoch 108 Batch 30/162] avg loss 0.00155633, throughput 9.56858K wps
[Epoch 108 Batch 60/162] avg loss 0.00148016, throughput 9.56082K wps
[Epoch 108 Batch 90/162] avg loss 0.0015239, throughput 9.29913K wps
[Epoch 108 Batch 120/162] avg loss 0.00149799, throughput 9.50152K wps
[Epoch 108 Batch 150/162] avg loss 0.00138366, throughput 9.34164K wps
Begin Testing...
[Epoch 108] train avg loss 0.00148478, dev acc 0.9067, dev avg loss 0.23007, throughput 9.44638K wps
[Epoch 109 Batch 30/162] avg loss 0.00148439, throughput 9.60358K wps
[Epoch 109 Batch 60/162] avg loss 0.00142033, throughput 9.33026K wps
[Epoch 109 Batch 90/162] avg loss 0.00131167, throughput 9.40558K wps
[Epoch 109 Batch 120/162] avg loss 0.00148522, throughput 9.38532K wps
[Epoch 109 Batch 150/162] avg loss 0.0016369, throughput 9.36925K wps
Begin Testing...
[Epoch 109] train avg loss 0.00146949, dev acc 0.9044, dev avg loss 0.230278, throughput 9.41857K wps
[Epoch 110 Batch 30/162] avg loss 0.00143108, throughput 9.57213K wps
[Epoch 110 Batch 60/162] avg loss 0.00141821, throughput 9.59217K wps
[Epoch 110 Batch 90/162] avg loss 0.00134611, throughput 9.18034K wps
[Epoch 110 Batch 120/162] avg loss 0.00144927, throughput 9.48332K wps
[Epoch 110 Batch 150/162] avg loss 0.00164894, throughput 9.54565K wps
Begin Testing...
[Epoch 110] train avg loss 0.00145218, dev acc 0.9044, dev avg loss 0.229918, throughput 9.46688K wps
[Epoch 111 Batch 30/162] avg loss 0.00155226, throughput 9.71186K wps
[Epoch 111 Batch 60/162] avg loss 0.00130029, throughput 9.2966K wps
[Epoch 111 Batch 90/162] avg loss 0.00141153, throughput 9.27986K wps
[Epoch 111 Batch 120/162] avg loss 0.00144165, throughput 9.43018K wps
[Epoch 111 Batch 150/162] avg loss 0.00145312, throughput 9.34458K wps
Begin Testing...
[Epoch 111] train avg loss 0.00143556, dev acc 0.9067, dev avg loss 0.230067, throughput 9.42379K wps
[Epoch 112 Batch 30/162] avg loss 0.00139717, throughput 9.45331K wps
[Epoch 112 Batch 60/162] avg loss 0.00138277, throughput 9.41432K wps
[Epoch 112 Batch 90/162] avg loss 0.00147046, throughput 9.40112K wps
[Epoch 112 Batch 120/162] avg loss 0.0013802, throughput 9.26478K wps
[Epoch 112 Batch 150/162] avg loss 0.00127427, throughput 9.24456K wps
Begin Testing...
[Epoch 112] train avg loss 0.00136954, dev acc 0.9067, dev avg loss 0.229929, throughput 9.34168K wps
[Epoch 113 Batch 30/162] avg loss 0.00143926, throughput 9.31985K wps
[Epoch 113 Batch 60/162] avg loss 0.00141941, throughput 9.49312K wps
[Epoch 113 Batch 90/162] avg loss 0.00138592, throughput 9.25285K wps
[Epoch 113 Batch 120/162] avg loss 0.0014533, throughput 9.49174K wps
[Epoch 113 Batch 150/162] avg loss 0.00131121, throughput 9.22303K wps
Begin Testing...
[Epoch 113] train avg loss 0.00139108, dev acc 0.9056, dev avg loss 0.230654, throughput 9.36915K wps
[Epoch 114 Batch 30/162] avg loss 0.00132429, throughput 9.55196K wps
[Epoch 114 Batch 60/162] avg loss 0.00138075, throughput 9.25876K wps
[Epoch 114 Batch 90/162] avg loss 0.00130877, throughput 9.51793K wps
[Epoch 114 Batch 120/162] avg loss 0.00149331, throughput 9.36408K wps
[Epoch 114 Batch 150/162] avg loss 0.00127485, throughput 9.24873K wps
Begin Testing...
[Epoch 114] train avg loss 0.00134654, dev acc 0.9089, dev avg loss 0.229764, throughput 9.39565K wps
[Epoch 115 Batch 30/162] avg loss 0.00143874, throughput 9.50552K wps
[Epoch 115 Batch 60/162] avg loss 0.0012415, throughput 9.275K wps
[Epoch 115 Batch 90/162] avg loss 0.00140917, throughput 9.57682K wps
[Epoch 115 Batch 120/162] avg loss 0.00133977, throughput 9.35983K wps
[Epoch 115 Batch 150/162] avg loss 0.00131923, throughput 9.24978K wps
Begin Testing...
[Epoch 115] train avg loss 0.0013441, dev acc 0.9078, dev avg loss 0.229782, throughput 9.3838K wps
[Epoch 116 Batch 30/162] avg loss 0.00130383, throughput 9.59202K wps
[Epoch 116 Batch 60/162] avg loss 0.00136091, throughput 9.53288K wps
[Epoch 116 Batch 90/162] avg loss 0.00137945, throughput 9.48212K wps
[Epoch 116 Batch 120/162] avg loss 0.00136721, throughput 9.42407K wps
[Epoch 116 Batch 150/162] avg loss 0.0013071, throughput 9.43917K wps
Begin Testing...
[Epoch 116] train avg loss 0.00134737, dev acc 0.9089, dev avg loss 0.229896, throughput 9.50402K wps
[Epoch 117 Batch 30/162] avg loss 0.00134371, throughput 9.66092K wps
[Epoch 117 Batch 60/162] avg loss 0.00131858, throughput 9.41767K wps
[Epoch 117 Batch 90/162] avg loss 0.001278, throughput 9.58842K wps
[Epoch 117 Batch 120/162] avg loss 0.00138105, throughput 9.36733K wps
[Epoch 117 Batch 150/162] avg loss 0.00132221, throughput 9.27931K wps
Begin Testing...
[Epoch 117] train avg loss 0.00132176, dev acc 0.9089, dev avg loss 0.230519, throughput 9.46554K wps
[Epoch 118 Batch 30/162] avg loss 0.00149979, throughput 9.60908K wps
[Epoch 118 Batch 60/162] avg loss 0.00116583, throughput 9.47154K wps
[Epoch 118 Batch 90/162] avg loss 0.00124398, throughput 9.3079K wps
[Epoch 118 Batch 120/162] avg loss 0.00132941, throughput 9.37475K wps
[Epoch 118 Batch 150/162] avg loss 0.00113277, throughput 9.42613K wps
Begin Testing...
[Epoch 118] train avg loss 0.00127486, dev acc 0.9089, dev avg loss 0.230407, throughput 9.43966K wps
[Epoch 119 Batch 30/162] avg loss 0.00133807, throughput 9.57203K wps
[Epoch 119 Batch 60/162] avg loss 0.00126535, throughput 9.41099K wps
[Epoch 119 Batch 90/162] avg loss 0.00132665, throughput 9.52672K wps
[Epoch 119 Batch 120/162] avg loss 0.00127105, throughput 9.32167K wps
[Epoch 119 Batch 150/162] avg loss 0.00130381, throughput 9.5156K wps
Begin Testing...
[Epoch 119] train avg loss 0.00129623, dev acc 0.9078, dev avg loss 0.230656, throughput 9.45224K wps
[Epoch 120 Batch 30/162] avg loss 0.00119283, throughput 9.74089K wps
[Epoch 120 Batch 60/162] avg loss 0.00130535, throughput 9.2668K wps
[Epoch 120 Batch 90/162] avg loss 0.00129203, throughput 9.41908K wps
[Epoch 120 Batch 120/162] avg loss 0.00122921, throughput 9.2502K wps
[Epoch 120 Batch 150/162] avg loss 0.00121516, throughput 9.36331K wps
Begin Testing...
[Epoch 120] train avg loss 0.00124534, dev acc 0.9067, dev avg loss 0.232324, throughput 9.39858K wps
[Epoch 121 Batch 30/162] avg loss 0.00130413, throughput 9.51718K wps
[Epoch 121 Batch 60/162] avg loss 0.00123464, throughput 9.31816K wps
[Epoch 121 Batch 90/162] avg loss 0.00131969, throughput 9.32898K wps
[Epoch 121 Batch 120/162] avg loss 0.00112522, throughput 9.52111K wps
[Epoch 121 Batch 150/162] avg loss 0.00135551, throughput 9.27221K wps
Begin Testing...
[Epoch 121] train avg loss 0.00126869, dev acc 0.9089, dev avg loss 0.232824, throughput 9.38386K wps
[Epoch 122 Batch 30/162] avg loss 0.00121979, throughput 9.61797K wps
[Epoch 122 Batch 60/162] avg loss 0.00128912, throughput 9.48748K wps
[Epoch 122 Batch 90/162] avg loss 0.00114383, throughput 9.51794K wps
[Epoch 122 Batch 120/162] avg loss 0.00123551, throughput 9.40637K wps
[Epoch 122 Batch 150/162] avg loss 0.0013794, throughput 9.27151K wps
Begin Testing...
[Epoch 122] train avg loss 0.00125345, dev acc 0.9089, dev avg loss 0.231178, throughput 9.47191K wps
[Epoch 123 Batch 30/162] avg loss 0.00123153, throughput 9.44986K wps
[Epoch 123 Batch 60/162] avg loss 0.00117243, throughput 9.37411K wps
[Epoch 123 Batch 90/162] avg loss 0.00115469, throughput 9.23474K wps
[Epoch 123 Batch 120/162] avg loss 0.00120026, throughput 9.60462K wps
[Epoch 123 Batch 150/162] avg loss 0.00130324, throughput 9.31313K wps
Begin Testing...
[Epoch 123] train avg loss 0.00120856, dev acc 0.9100, dev avg loss 0.230575, throughput 9.38487K wps
Observed Improvement.
Begin Testing...
[Epoch 124 Batch 30/162] avg loss 0.00131587, throughput 9.45153K wps
[Epoch 124 Batch 60/162] avg loss 0.00107548, throughput 9.32783K wps
[Epoch 124 Batch 90/162] avg loss 0.00116448, throughput 9.38583K wps
[Epoch 124 Batch 120/162] avg loss 0.00123663, throughput 9.40002K wps
[Epoch 124 Batch 150/162] avg loss 0.00116129, throughput 9.54536K wps
Begin Testing...
[Epoch 124] train avg loss 0.00119748, dev acc 0.9078, dev avg loss 0.231368, throughput 9.4279K wps
[Epoch 125 Batch 30/162] avg loss 0.00122607, throughput 9.60759K wps
[Epoch 125 Batch 60/162] avg loss 0.000989195, throughput 9.34235K wps
[Epoch 125 Batch 90/162] avg loss 0.00120134, throughput 9.42609K wps
[Epoch 125 Batch 120/162] avg loss 0.00116429, throughput 9.30946K wps
[Epoch 125 Batch 150/162] avg loss 0.00120988, throughput 9.45256K wps
Begin Testing...
[Epoch 125] train avg loss 0.00117088, dev acc 0.9067, dev avg loss 0.232322, throughput 9.43699K wps
[Epoch 126 Batch 30/162] avg loss 0.00125266, throughput 9.48973K wps
[Epoch 126 Batch 60/162] avg loss 0.00116606, throughput 9.36464K wps
[Epoch 126 Batch 90/162] avg loss 0.00106504, throughput 9.47364K wps
[Epoch 126 Batch 120/162] avg loss 0.00133055, throughput 9.3321K wps
[Epoch 126 Batch 150/162] avg loss 0.00117007, throughput 9.45828K wps
Begin Testing...
[Epoch 126] train avg loss 0.00119145, dev acc 0.9089, dev avg loss 0.230981, throughput 9.41892K wps
[Epoch 127 Batch 30/162] avg loss 0.00125357, throughput 9.55533K wps
[Epoch 127 Batch 60/162] avg loss 0.00118331, throughput 9.53164K wps
[Epoch 127 Batch 90/162] avg loss 0.00122439, throughput 9.4177K wps
[Epoch 127 Batch 120/162] avg loss 0.0011428, throughput 9.35033K wps
[Epoch 127 Batch 150/162] avg loss 0.00119592, throughput 9.29073K wps
Begin Testing...
[Epoch 127] train avg loss 0.00118048, dev acc 0.9089, dev avg loss 0.233368, throughput 9.43154K wps
[Epoch 128 Batch 30/162] avg loss 0.00127954, throughput 9.52522K wps
[Epoch 128 Batch 60/162] avg loss 0.00116916, throughput 9.47814K wps