SST-2_rand.log
10664 lines (10664 loc) · 742 KB
Namespace(batch_size=50, data_name='SST-2', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/sst-2/train-61f1f238.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/train-61f1f238.zip...
Downloading data/sst-2/test-a39c1db6.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/test-a39c1db6.zip...
Downloading data/sst-2/dev-65511587.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/dev-65511587.zip...
maximum length (in tokens): 53
Done! Tokenizing Time=0.83s, #Sentences=76961
Done! Tokenizing Time=0.03s, #Sentences=1821
Done! Tokenizing Time=0.01s, #Sentences=872
SentimentNet(
  (embedding): Embedding(17244 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/1540] avg loss 0.0138004, throughput 0.748774K wps
[Epoch 0 Batch 60/1540] avg loss 0.0138109, throughput 2.84564K wps
[Epoch 0 Batch 90/1540] avg loss 0.0137973, throughput 2.86792K wps
[Epoch 0 Batch 120/1540] avg loss 0.0138202, throughput 2.84788K wps
[Epoch 0 Batch 150/1540] avg loss 0.013823, throughput 2.84192K wps
[Epoch 0 Batch 180/1540] avg loss 0.0137685, throughput 2.87025K wps
[Epoch 0 Batch 210/1540] avg loss 0.0137934, throughput 2.85502K wps
[Epoch 0 Batch 240/1540] avg loss 0.0138037, throughput 2.86384K wps
[Epoch 0 Batch 270/1540] avg loss 0.0137626, throughput 2.83265K wps
[Epoch 0 Batch 300/1540] avg loss 0.0137211, throughput 2.85611K wps
[Epoch 0 Batch 330/1540] avg loss 0.0138154, throughput 2.84877K wps
[Epoch 0 Batch 360/1540] avg loss 0.0137426, throughput 2.80955K wps
[Epoch 0 Batch 390/1540] avg loss 0.0138279, throughput 2.85582K wps
[Epoch 0 Batch 420/1540] avg loss 0.0136362, throughput 2.82855K wps
[Epoch 0 Batch 450/1540] avg loss 0.0136607, throughput 2.84329K wps
[Epoch 0 Batch 480/1540] avg loss 0.0138069, throughput 2.812K wps
[Epoch 0 Batch 510/1540] avg loss 0.0137447, throughput 2.83391K wps
[Epoch 0 Batch 540/1540] avg loss 0.0136492, throughput 2.8742K wps
[Epoch 0 Batch 570/1540] avg loss 0.0137062, throughput 2.87247K wps
[Epoch 0 Batch 600/1540] avg loss 0.0135451, throughput 2.878K wps
[Epoch 0 Batch 630/1540] avg loss 0.0136163, throughput 2.85093K wps
[Epoch 0 Batch 660/1540] avg loss 0.0137568, throughput 2.86394K wps
[Epoch 0 Batch 690/1540] avg loss 0.0137163, throughput 2.84508K wps
[Epoch 0 Batch 720/1540] avg loss 0.013598, throughput 2.79075K wps
[Epoch 0 Batch 750/1540] avg loss 0.0137718, throughput 2.78031K wps
[Epoch 0 Batch 780/1540] avg loss 0.0137142, throughput 2.86873K wps
[Epoch 0 Batch 810/1540] avg loss 0.0135996, throughput 2.86383K wps
[Epoch 0 Batch 840/1540] avg loss 0.013755, throughput 2.77059K wps
[Epoch 0 Batch 870/1540] avg loss 0.0137318, throughput 2.8167K wps
[Epoch 0 Batch 900/1540] avg loss 0.0136668, throughput 2.84457K wps
[Epoch 0 Batch 930/1540] avg loss 0.0137071, throughput 2.80267K wps
[Epoch 0 Batch 960/1540] avg loss 0.0137056, throughput 2.87838K wps
[Epoch 0 Batch 990/1540] avg loss 0.0137057, throughput 2.84067K wps
[Epoch 0 Batch 1020/1540] avg loss 0.0136072, throughput 2.82948K wps
[Epoch 0 Batch 1050/1540] avg loss 0.0136353, throughput 2.87821K wps
[Epoch 0 Batch 1080/1540] avg loss 0.013673, throughput 2.8781K wps
[Epoch 0 Batch 1110/1540] avg loss 0.0136834, throughput 2.87317K wps
[Epoch 0 Batch 1140/1540] avg loss 0.0136265, throughput 2.83966K wps
[Epoch 0 Batch 1170/1540] avg loss 0.0137202, throughput 2.83108K wps
[Epoch 0 Batch 1200/1540] avg loss 0.0136948, throughput 2.85214K wps
[Epoch 0 Batch 1230/1540] avg loss 0.0137064, throughput 2.8669K wps
[Epoch 0 Batch 1260/1540] avg loss 0.0137498, throughput 2.80595K wps
[Epoch 0 Batch 1290/1540] avg loss 0.0137196, throughput 2.77533K wps
[Epoch 0 Batch 1320/1540] avg loss 0.0136522, throughput 2.8534K wps
[Epoch 0 Batch 1350/1540] avg loss 0.013645, throughput 2.865K wps
[Epoch 0 Batch 1380/1540] avg loss 0.0136207, throughput 2.85378K wps
[Epoch 0 Batch 1410/1540] avg loss 0.0136265, throughput 2.84326K wps
[Epoch 0 Batch 1440/1540] avg loss 0.0136989, throughput 2.79218K wps
[Epoch 0 Batch 1470/1540] avg loss 0.013661, throughput 2.8306K wps
[Epoch 0 Batch 1500/1540] avg loss 0.013638, throughput 2.87707K wps
[Epoch 0 Batch 1530/1540] avg loss 0.0136524, throughput 2.87419K wps
Begin Testing...
[Epoch 0] train avg loss 0.013714, dev acc 0.5596, dev avg loss 0.687023, throughput 2.60711K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 1 Batch 30/1540] avg loss 0.0135832, throughput 2.86227K wps
[Epoch 1 Batch 60/1540] avg loss 0.0135932, throughput 2.82791K wps
[Epoch 1 Batch 90/1540] avg loss 0.0136373, throughput 2.87172K wps
[Epoch 1 Batch 120/1540] avg loss 0.0134875, throughput 2.86972K wps
[Epoch 1 Batch 150/1540] avg loss 0.0136505, throughput 2.85811K wps
[Epoch 1 Batch 180/1540] avg loss 0.0136201, throughput 2.82788K wps
[Epoch 1 Batch 210/1540] avg loss 0.013589, throughput 2.85694K wps
[Epoch 1 Batch 240/1540] avg loss 0.0135679, throughput 2.87736K wps
[Epoch 1 Batch 270/1540] avg loss 0.0135552, throughput 2.87208K wps
[Epoch 1 Batch 300/1540] avg loss 0.0136459, throughput 2.84159K wps
[Epoch 1 Batch 330/1540] avg loss 0.0136596, throughput 2.78248K wps
[Epoch 1 Batch 360/1540] avg loss 0.0135834, throughput 2.85935K wps
[Epoch 1 Batch 390/1540] avg loss 0.0136186, throughput 2.84167K wps
[Epoch 1 Batch 420/1540] avg loss 0.0136105, throughput 2.80499K wps
[Epoch 1 Batch 450/1540] avg loss 0.0137079, throughput 2.8343K wps
[Epoch 1 Batch 480/1540] avg loss 0.0134815, throughput 2.80809K wps
[Epoch 1 Batch 510/1540] avg loss 0.0136119, throughput 2.88064K wps
[Epoch 1 Batch 540/1540] avg loss 0.0134643, throughput 2.87233K wps
[Epoch 1 Batch 570/1540] avg loss 0.0136057, throughput 2.80412K wps
[Epoch 1 Batch 600/1540] avg loss 0.0135286, throughput 2.8745K wps
[Epoch 1 Batch 630/1540] avg loss 0.0134499, throughput 2.83625K wps
[Epoch 1 Batch 660/1540] avg loss 0.0135644, throughput 2.83596K wps
[Epoch 1 Batch 690/1540] avg loss 0.0135062, throughput 2.86191K wps
[Epoch 1 Batch 720/1540] avg loss 0.0135646, throughput 2.87746K wps
[Epoch 1 Batch 750/1540] avg loss 0.0136507, throughput 2.87634K wps
[Epoch 1 Batch 780/1540] avg loss 0.0135311, throughput 2.82194K wps
[Epoch 1 Batch 810/1540] avg loss 0.0136298, throughput 2.84957K wps
[Epoch 1 Batch 840/1540] avg loss 0.0135529, throughput 2.832K wps
[Epoch 1 Batch 870/1540] avg loss 0.0135618, throughput 2.87618K wps
[Epoch 1 Batch 900/1540] avg loss 0.013478, throughput 2.87753K wps
[Epoch 1 Batch 930/1540] avg loss 0.0136277, throughput 2.83478K wps
[Epoch 1 Batch 960/1540] avg loss 0.0134312, throughput 2.84745K wps
[Epoch 1 Batch 990/1540] avg loss 0.0135353, throughput 2.86332K wps
[Epoch 1 Batch 1020/1540] avg loss 0.0134932, throughput 2.82394K wps
[Epoch 1 Batch 1050/1540] avg loss 0.0134916, throughput 2.86456K wps
[Epoch 1 Batch 1080/1540] avg loss 0.0135267, throughput 2.88255K wps
[Epoch 1 Batch 1110/1540] avg loss 0.0135142, throughput 2.84973K wps
[Epoch 1 Batch 1140/1540] avg loss 0.0134453, throughput 2.78757K wps
[Epoch 1 Batch 1170/1540] avg loss 0.0134157, throughput 2.7842K wps
[Epoch 1 Batch 1200/1540] avg loss 0.0134685, throughput 2.83834K wps
[Epoch 1 Batch 1230/1540] avg loss 0.0136388, throughput 2.8008K wps
[Epoch 1 Batch 1260/1540] avg loss 0.0134214, throughput 2.85152K wps
[Epoch 1 Batch 1290/1540] avg loss 0.0133604, throughput 2.86685K wps
[Epoch 1 Batch 1320/1540] avg loss 0.0134374, throughput 2.86778K wps
[Epoch 1 Batch 1350/1540] avg loss 0.0135148, throughput 2.86068K wps
[Epoch 1 Batch 1380/1540] avg loss 0.0134876, throughput 2.86039K wps
[Epoch 1 Batch 1410/1540] avg loss 0.0134357, throughput 2.86624K wps
[Epoch 1 Batch 1440/1540] avg loss 0.013369, throughput 2.85966K wps
[Epoch 1 Batch 1470/1540] avg loss 0.0134107, throughput 2.85568K wps
[Epoch 1 Batch 1500/1540] avg loss 0.0134085, throughput 2.84484K wps
[Epoch 1 Batch 1530/1540] avg loss 0.0134981, throughput 2.87142K wps
Begin Testing...
[Epoch 1] train avg loss 0.0135392, dev acc 0.6307, dev avg loss 0.672574, throughput 2.8479K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 2 Batch 30/1540] avg loss 0.0133759, throughput 2.87948K wps
[Epoch 2 Batch 60/1540] avg loss 0.0132897, throughput 2.87358K wps
[Epoch 2 Batch 90/1540] avg loss 0.0134477, throughput 2.87964K wps
[Epoch 2 Batch 120/1540] avg loss 0.0133849, throughput 2.88407K wps
[Epoch 2 Batch 150/1540] avg loss 0.0134694, throughput 2.85355K wps
[Epoch 2 Batch 180/1540] avg loss 0.0134142, throughput 2.87801K wps
[Epoch 2 Batch 210/1540] avg loss 0.0133096, throughput 2.88083K wps
[Epoch 2 Batch 240/1540] avg loss 0.0134146, throughput 2.85674K wps
[Epoch 2 Batch 270/1540] avg loss 0.0134303, throughput 2.82348K wps
[Epoch 2 Batch 300/1540] avg loss 0.0133716, throughput 2.84946K wps
[Epoch 2 Batch 330/1540] avg loss 0.0133088, throughput 2.85484K wps
[Epoch 2 Batch 360/1540] avg loss 0.0133007, throughput 2.8751K wps
[Epoch 2 Batch 390/1540] avg loss 0.0133131, throughput 2.87699K wps
[Epoch 2 Batch 420/1540] avg loss 0.0133858, throughput 2.87544K wps
[Epoch 2 Batch 450/1540] avg loss 0.0133583, throughput 2.83946K wps
[Epoch 2 Batch 480/1540] avg loss 0.0132419, throughput 2.84357K wps
[Epoch 2 Batch 510/1540] avg loss 0.0133307, throughput 2.87255K wps
[Epoch 2 Batch 540/1540] avg loss 0.0131001, throughput 2.86574K wps
[Epoch 2 Batch 570/1540] avg loss 0.0133911, throughput 2.86791K wps
[Epoch 2 Batch 600/1540] avg loss 0.0133887, throughput 2.8555K wps
[Epoch 2 Batch 630/1540] avg loss 0.0133636, throughput 2.85588K wps
[Epoch 2 Batch 660/1540] avg loss 0.0133111, throughput 2.86026K wps
[Epoch 2 Batch 690/1540] avg loss 0.0132738, throughput 2.86662K wps
[Epoch 2 Batch 720/1540] avg loss 0.0131999, throughput 2.85669K wps
[Epoch 2 Batch 750/1540] avg loss 0.0132786, throughput 2.85791K wps
[Epoch 2 Batch 780/1540] avg loss 0.0132843, throughput 2.83769K wps
[Epoch 2 Batch 810/1540] avg loss 0.0133804, throughput 2.84937K wps
[Epoch 2 Batch 840/1540] avg loss 0.013238, throughput 2.83274K wps
[Epoch 2 Batch 870/1540] avg loss 0.0134047, throughput 2.83185K wps
[Epoch 2 Batch 900/1540] avg loss 0.0132575, throughput 2.86519K wps
[Epoch 2 Batch 930/1540] avg loss 0.0131964, throughput 2.81784K wps
[Epoch 2 Batch 960/1540] avg loss 0.0132731, throughput 2.85009K wps
[Epoch 2 Batch 990/1540] avg loss 0.0132442, throughput 2.84527K wps
[Epoch 2 Batch 1020/1540] avg loss 0.0132967, throughput 2.87948K wps
[Epoch 2 Batch 1050/1540] avg loss 0.0132044, throughput 2.85214K wps
[Epoch 2 Batch 1080/1540] avg loss 0.0131556, throughput 2.87514K wps
[Epoch 2 Batch 1110/1540] avg loss 0.0131895, throughput 2.86857K wps
[Epoch 2 Batch 1140/1540] avg loss 0.0132053, throughput 2.86702K wps
[Epoch 2 Batch 1170/1540] avg loss 0.0130771, throughput 2.85977K wps
[Epoch 2 Batch 1200/1540] avg loss 0.0131146, throughput 2.83268K wps
[Epoch 2 Batch 1230/1540] avg loss 0.0132125, throughput 2.87218K wps
[Epoch 2 Batch 1260/1540] avg loss 0.0130669, throughput 2.87333K wps
[Epoch 2 Batch 1290/1540] avg loss 0.013026, throughput 2.87589K wps
[Epoch 2 Batch 1320/1540] avg loss 0.012992, throughput 2.87523K wps
[Epoch 2 Batch 1350/1540] avg loss 0.0130996, throughput 2.87466K wps
[Epoch 2 Batch 1380/1540] avg loss 0.0131247, throughput 2.82573K wps
[Epoch 2 Batch 1410/1540] avg loss 0.0128969, throughput 2.85584K wps
[Epoch 2 Batch 1440/1540] avg loss 0.0130748, throughput 2.86088K wps
[Epoch 2 Batch 1470/1540] avg loss 0.0130485, throughput 2.87395K wps
[Epoch 2 Batch 1500/1540] avg loss 0.013037, throughput 2.86989K wps
[Epoch 2 Batch 1530/1540] avg loss 0.0130386, throughput 2.85385K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132523, dev acc 0.6800, dev avg loss 0.645965, throughput 2.85944K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 3 Batch 30/1540] avg loss 0.0129923, throughput 2.92239K wps
[Epoch 3 Batch 60/1540] avg loss 0.0130136, throughput 2.8446K wps
[Epoch 3 Batch 90/1540] avg loss 0.0130823, throughput 2.87191K wps
[Epoch 3 Batch 120/1540] avg loss 0.0131139, throughput 2.84983K wps
[Epoch 3 Batch 150/1540] avg loss 0.0128815, throughput 2.86898K wps
[Epoch 3 Batch 180/1540] avg loss 0.0129892, throughput 2.87856K wps
[Epoch 3 Batch 210/1540] avg loss 0.0129723, throughput 2.84054K wps
[Epoch 3 Batch 240/1540] avg loss 0.0130261, throughput 2.87275K wps
[Epoch 3 Batch 270/1540] avg loss 0.0130874, throughput 2.87694K wps
[Epoch 3 Batch 300/1540] avg loss 0.0129992, throughput 2.84739K wps
[Epoch 3 Batch 330/1540] avg loss 0.0127983, throughput 2.82931K wps
[Epoch 3 Batch 360/1540] avg loss 0.0128736, throughput 2.81006K wps
[Epoch 3 Batch 390/1540] avg loss 0.0126295, throughput 2.84396K wps
[Epoch 3 Batch 420/1540] avg loss 0.0129446, throughput 2.86718K wps
[Epoch 3 Batch 450/1540] avg loss 0.0129793, throughput 2.85866K wps
[Epoch 3 Batch 480/1540] avg loss 0.0128536, throughput 2.86042K wps
[Epoch 3 Batch 510/1540] avg loss 0.0128249, throughput 2.84524K wps
[Epoch 3 Batch 540/1540] avg loss 0.0128234, throughput 2.87389K wps
[Epoch 3 Batch 570/1540] avg loss 0.0125954, throughput 2.86267K wps
[Epoch 3 Batch 600/1540] avg loss 0.0128308, throughput 2.83533K wps
[Epoch 3 Batch 630/1540] avg loss 0.0129443, throughput 2.88006K wps
[Epoch 3 Batch 660/1540] avg loss 0.0127793, throughput 2.87001K wps
[Epoch 3 Batch 690/1540] avg loss 0.012889, throughput 2.87715K wps
[Epoch 3 Batch 720/1540] avg loss 0.0126644, throughput 2.85958K wps
[Epoch 3 Batch 750/1540] avg loss 0.0127483, throughput 2.84097K wps
[Epoch 3 Batch 780/1540] avg loss 0.0126084, throughput 2.83583K wps
[Epoch 3 Batch 810/1540] avg loss 0.0126001, throughput 2.86778K wps
[Epoch 3 Batch 840/1540] avg loss 0.0126695, throughput 2.85772K wps
[Epoch 3 Batch 870/1540] avg loss 0.0125709, throughput 2.87201K wps
[Epoch 3 Batch 900/1540] avg loss 0.0126231, throughput 2.82978K wps
[Epoch 3 Batch 930/1540] avg loss 0.0126425, throughput 2.85174K wps
[Epoch 3 Batch 960/1540] avg loss 0.0126107, throughput 2.80688K wps
[Epoch 3 Batch 990/1540] avg loss 0.0125201, throughput 2.85561K wps
[Epoch 3 Batch 1020/1540] avg loss 0.0127235, throughput 2.86877K wps
[Epoch 3 Batch 1050/1540] avg loss 0.0125715, throughput 2.84838K wps
[Epoch 3 Batch 1080/1540] avg loss 0.0124887, throughput 2.86394K wps
[Epoch 3 Batch 1110/1540] avg loss 0.0127491, throughput 2.83163K wps
[Epoch 3 Batch 1140/1540] avg loss 0.0124604, throughput 2.8722K wps
[Epoch 3 Batch 1170/1540] avg loss 0.0124426, throughput 2.83278K wps
[Epoch 3 Batch 1200/1540] avg loss 0.0125199, throughput 2.78996K wps
[Epoch 3 Batch 1230/1540] avg loss 0.0125532, throughput 2.84109K wps
[Epoch 3 Batch 1260/1540] avg loss 0.0124685, throughput 2.83328K wps
[Epoch 3 Batch 1290/1540] avg loss 0.0124559, throughput 2.86705K wps
[Epoch 3 Batch 1320/1540] avg loss 0.0125179, throughput 2.8554K wps
[Epoch 3 Batch 1350/1540] avg loss 0.0122735, throughput 2.87615K wps
[Epoch 3 Batch 1380/1540] avg loss 0.0125621, throughput 2.87624K wps
[Epoch 3 Batch 1410/1540] avg loss 0.0127375, throughput 2.87155K wps
[Epoch 3 Batch 1440/1540] avg loss 0.0122948, throughput 2.86514K wps
[Epoch 3 Batch 1470/1540] avg loss 0.0125175, throughput 2.83488K wps
[Epoch 3 Batch 1500/1540] avg loss 0.0123382, throughput 2.86703K wps
[Epoch 3 Batch 1530/1540] avg loss 0.012455, throughput 2.86083K wps
Begin Testing...
[Epoch 3] train avg loss 0.012718, dev acc 0.6755, dev avg loss 0.60974, throughput 2.85526K wps
[Epoch 4 Batch 30/1540] avg loss 0.0125572, throughput 2.92357K wps
[Epoch 4 Batch 60/1540] avg loss 0.0122396, throughput 2.86992K wps
[Epoch 4 Batch 90/1540] avg loss 0.0123901, throughput 2.85577K wps
[Epoch 4 Batch 120/1540] avg loss 0.012153, throughput 2.87528K wps
[Epoch 4 Batch 150/1540] avg loss 0.0123142, throughput 2.86424K wps
[Epoch 4 Batch 180/1540] avg loss 0.0122087, throughput 2.83155K wps
[Epoch 4 Batch 210/1540] avg loss 0.0120872, throughput 2.87105K wps
[Epoch 4 Batch 240/1540] avg loss 0.0121988, throughput 2.87384K wps
[Epoch 4 Batch 270/1540] avg loss 0.0121247, throughput 2.83189K wps
[Epoch 4 Batch 300/1540] avg loss 0.012065, throughput 2.8535K wps
[Epoch 4 Batch 330/1540] avg loss 0.0120954, throughput 2.82428K wps
[Epoch 4 Batch 360/1540] avg loss 0.0123357, throughput 2.84309K wps
[Epoch 4 Batch 390/1540] avg loss 0.0121202, throughput 2.84239K wps
[Epoch 4 Batch 420/1540] avg loss 0.0122495, throughput 2.87225K wps
[Epoch 4 Batch 450/1540] avg loss 0.0121583, throughput 2.85514K wps
[Epoch 4 Batch 480/1540] avg loss 0.0119734, throughput 2.85955K wps
[Epoch 4 Batch 510/1540] avg loss 0.0120053, throughput 2.86879K wps
[Epoch 4 Batch 540/1540] avg loss 0.0120942, throughput 2.85683K wps
[Epoch 4 Batch 570/1540] avg loss 0.0122327, throughput 2.85976K wps
[Epoch 4 Batch 600/1540] avg loss 0.0118178, throughput 2.84185K wps
[Epoch 4 Batch 630/1540] avg loss 0.0119157, throughput 2.79804K wps
[Epoch 4 Batch 660/1540] avg loss 0.0118483, throughput 2.85401K wps
[Epoch 4 Batch 690/1540] avg loss 0.0117789, throughput 2.86022K wps
[Epoch 4 Batch 720/1540] avg loss 0.011994, throughput 2.84566K wps
[Epoch 4 Batch 750/1540] avg loss 0.0117548, throughput 2.77875K wps
[Epoch 4 Batch 780/1540] avg loss 0.0116677, throughput 2.84742K wps
[Epoch 4 Batch 810/1540] avg loss 0.0116204, throughput 2.87215K wps
[Epoch 4 Batch 840/1540] avg loss 0.0119183, throughput 2.88186K wps
[Epoch 4 Batch 870/1540] avg loss 0.0118464, throughput 2.87139K wps
[Epoch 4 Batch 900/1540] avg loss 0.0118322, throughput 2.81555K wps
[Epoch 4 Batch 930/1540] avg loss 0.011724, throughput 2.86834K wps
[Epoch 4 Batch 960/1540] avg loss 0.0121262, throughput 2.87378K wps
[Epoch 4 Batch 990/1540] avg loss 0.0119363, throughput 2.86169K wps
[Epoch 4 Batch 1020/1540] avg loss 0.0118567, throughput 2.87693K wps
[Epoch 4 Batch 1050/1540] avg loss 0.0117349, throughput 2.80259K wps
[Epoch 4 Batch 1080/1540] avg loss 0.0113797, throughput 2.85144K wps
[Epoch 4 Batch 1110/1540] avg loss 0.011804, throughput 2.8186K wps
[Epoch 4 Batch 1140/1540] avg loss 0.0115873, throughput 2.86864K wps
[Epoch 4 Batch 1170/1540] avg loss 0.0117139, throughput 2.85465K wps
[Epoch 4 Batch 1200/1540] avg loss 0.0116297, throughput 2.86959K wps
[Epoch 4 Batch 1230/1540] avg loss 0.0116782, throughput 2.85461K wps
[Epoch 4 Batch 1260/1540] avg loss 0.0113475, throughput 2.846K wps
[Epoch 4 Batch 1290/1540] avg loss 0.0114436, throughput 2.864K wps
[Epoch 4 Batch 1320/1540] avg loss 0.011606, throughput 2.87069K wps
[Epoch 4 Batch 1350/1540] avg loss 0.0113964, throughput 2.85623K wps
[Epoch 4 Batch 1380/1540] avg loss 0.0113831, throughput 2.84508K wps
[Epoch 4 Batch 1410/1540] avg loss 0.0115043, throughput 2.87117K wps
[Epoch 4 Batch 1440/1540] avg loss 0.0114194, throughput 2.86567K wps
[Epoch 4 Batch 1470/1540] avg loss 0.0115309, throughput 2.82584K wps
[Epoch 4 Batch 1500/1540] avg loss 0.011224, throughput 2.78014K wps
[Epoch 4 Batch 1530/1540] avg loss 0.0115118, throughput 2.81791K wps
Begin Testing...
[Epoch 4] train avg loss 0.0118697, dev acc 0.7328, dev avg loss 0.563926, throughput 2.85161K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 5 Batch 30/1540] avg loss 0.0112126, throughput 2.89362K wps
[Epoch 5 Batch 60/1540] avg loss 0.0112174, throughput 2.84711K wps
[Epoch 5 Batch 90/1540] avg loss 0.0111179, throughput 2.77902K wps
[Epoch 5 Batch 120/1540] avg loss 0.0111249, throughput 2.81623K wps
[Epoch 5 Batch 150/1540] avg loss 0.010953, throughput 2.86461K wps
[Epoch 5 Batch 180/1540] avg loss 0.0110536, throughput 2.83347K wps
[Epoch 5 Batch 210/1540] avg loss 0.0108626, throughput 2.85891K wps
[Epoch 5 Batch 240/1540] avg loss 0.0108945, throughput 2.82328K wps
[Epoch 5 Batch 270/1540] avg loss 0.0109154, throughput 2.83112K wps
[Epoch 5 Batch 300/1540] avg loss 0.0109706, throughput 2.81987K wps
[Epoch 5 Batch 330/1540] avg loss 0.0110977, throughput 2.87863K wps
[Epoch 5 Batch 360/1540] avg loss 0.0109776, throughput 2.87388K wps
[Epoch 5 Batch 390/1540] avg loss 0.0111206, throughput 2.86252K wps
[Epoch 5 Batch 420/1540] avg loss 0.0108639, throughput 2.80473K wps
[Epoch 5 Batch 450/1540] avg loss 0.0111617, throughput 2.83524K wps
[Epoch 5 Batch 480/1540] avg loss 0.0108822, throughput 2.87329K wps
[Epoch 5 Batch 510/1540] avg loss 0.0109191, throughput 2.86906K wps
[Epoch 5 Batch 540/1540] avg loss 0.0107287, throughput 2.86481K wps
[Epoch 5 Batch 570/1540] avg loss 0.0106061, throughput 2.8623K wps
[Epoch 5 Batch 600/1540] avg loss 0.0109577, throughput 2.85949K wps
[Epoch 5 Batch 630/1540] avg loss 0.0108638, throughput 2.86017K wps
[Epoch 5 Batch 660/1540] avg loss 0.0108706, throughput 2.87993K wps
[Epoch 5 Batch 690/1540] avg loss 0.0108266, throughput 2.80922K wps
[Epoch 5 Batch 720/1540] avg loss 0.0107844, throughput 2.87708K wps
[Epoch 5 Batch 750/1540] avg loss 0.010623, throughput 2.88049K wps
[Epoch 5 Batch 780/1540] avg loss 0.010808, throughput 2.87552K wps
[Epoch 5 Batch 810/1540] avg loss 0.0105911, throughput 2.87466K wps
[Epoch 5 Batch 840/1540] avg loss 0.0104744, throughput 2.8703K wps
[Epoch 5 Batch 870/1540] avg loss 0.0105409, throughput 2.87683K wps
[Epoch 5 Batch 900/1540] avg loss 0.0104029, throughput 2.82022K wps
[Epoch 5 Batch 930/1540] avg loss 0.0107438, throughput 2.87093K wps
[Epoch 5 Batch 960/1540] avg loss 0.0104448, throughput 2.85399K wps
[Epoch 5 Batch 990/1540] avg loss 0.0106657, throughput 2.80573K wps
[Epoch 5 Batch 1020/1540] avg loss 0.0101842, throughput 2.81487K wps
[Epoch 5 Batch 1050/1540] avg loss 0.0105361, throughput 2.83669K wps
[Epoch 5 Batch 1080/1540] avg loss 0.0102072, throughput 2.85008K wps
[Epoch 5 Batch 1110/1540] avg loss 0.010488, throughput 2.8502K wps
[Epoch 5 Batch 1140/1540] avg loss 0.0103501, throughput 2.79995K wps
[Epoch 5 Batch 1170/1540] avg loss 0.0101048, throughput 2.79695K wps
[Epoch 5 Batch 1200/1540] avg loss 0.0100308, throughput 2.80336K wps
[Epoch 5 Batch 1230/1540] avg loss 0.0103184, throughput 2.83569K wps
[Epoch 5 Batch 1260/1540] avg loss 0.0103771, throughput 2.85997K wps
[Epoch 5 Batch 1290/1540] avg loss 0.0105557, throughput 2.84893K wps
[Epoch 5 Batch 1320/1540] avg loss 0.0102983, throughput 2.83776K wps
[Epoch 5 Batch 1350/1540] avg loss 0.0100994, throughput 2.85406K wps
[Epoch 5 Batch 1380/1540] avg loss 0.0100077, throughput 2.86905K wps
[Epoch 5 Batch 1410/1540] avg loss 0.00984836, throughput 2.8778K wps
[Epoch 5 Batch 1440/1540] avg loss 0.0102557, throughput 2.84244K wps
[Epoch 5 Batch 1470/1540] avg loss 0.00991223, throughput 2.79344K wps
[Epoch 5 Batch 1500/1540] avg loss 0.00954781, throughput 2.77435K wps
[Epoch 5 Batch 1530/1540] avg loss 0.0100746, throughput 2.86985K wps
Begin Testing...
[Epoch 5] train avg loss 0.0106185, dev acc 0.7787, dev avg loss 0.519779, throughput 2.84537K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 6 Batch 30/1540] avg loss 0.0098245, throughput 2.88896K wps
[Epoch 6 Batch 60/1540] avg loss 0.00962132, throughput 2.84569K wps
[Epoch 6 Batch 90/1540] avg loss 0.0100202, throughput 2.84797K wps
[Epoch 6 Batch 120/1540] avg loss 0.00959598, throughput 2.85914K wps
[Epoch 6 Batch 150/1540] avg loss 0.00988194, throughput 2.88108K wps
[Epoch 6 Batch 180/1540] avg loss 0.0096392, throughput 2.85567K wps
[Epoch 6 Batch 210/1540] avg loss 0.00951429, throughput 2.81937K wps
[Epoch 6 Batch 240/1540] avg loss 0.00972758, throughput 2.84235K wps
[Epoch 6 Batch 270/1540] avg loss 0.00959005, throughput 2.81022K wps
[Epoch 6 Batch 300/1540] avg loss 0.00984259, throughput 2.85913K wps
[Epoch 6 Batch 330/1540] avg loss 0.0094906, throughput 2.87195K wps
[Epoch 6 Batch 360/1540] avg loss 0.00961345, throughput 2.83716K wps
[Epoch 6 Batch 390/1540] avg loss 0.0096149, throughput 2.84824K wps
[Epoch 6 Batch 420/1540] avg loss 0.00944561, throughput 2.80174K wps
[Epoch 6 Batch 450/1540] avg loss 0.00941356, throughput 2.78631K wps
[Epoch 6 Batch 480/1540] avg loss 0.00941492, throughput 2.79546K wps
[Epoch 6 Batch 510/1540] avg loss 0.00898626, throughput 2.81181K wps
[Epoch 6 Batch 540/1540] avg loss 0.00908801, throughput 2.86957K wps
[Epoch 6 Batch 570/1540] avg loss 0.00921519, throughput 2.85765K wps
[Epoch 6 Batch 600/1540] avg loss 0.00905014, throughput 2.85542K wps
[Epoch 6 Batch 630/1540] avg loss 0.00946734, throughput 2.86637K wps
[Epoch 6 Batch 660/1540] avg loss 0.00939207, throughput 2.8701K wps
[Epoch 6 Batch 690/1540] avg loss 0.00924371, throughput 2.84635K wps
[Epoch 6 Batch 720/1540] avg loss 0.00938426, throughput 2.86549K wps
[Epoch 6 Batch 750/1540] avg loss 0.00933438, throughput 2.87202K wps
[Epoch 6 Batch 780/1540] avg loss 0.00915644, throughput 2.84552K wps
[Epoch 6 Batch 810/1540] avg loss 0.00931434, throughput 2.87413K wps
[Epoch 6 Batch 840/1540] avg loss 0.00927732, throughput 2.87069K wps
[Epoch 6 Batch 870/1540] avg loss 0.00948765, throughput 2.83603K wps
[Epoch 6 Batch 900/1540] avg loss 0.00919847, throughput 2.81271K wps
[Epoch 6 Batch 930/1540] avg loss 0.00885069, throughput 2.78791K wps
[Epoch 6 Batch 960/1540] avg loss 0.00918094, throughput 2.78323K wps
[Epoch 6 Batch 990/1540] avg loss 0.00878389, throughput 2.76527K wps
[Epoch 6 Batch 1020/1540] avg loss 0.00903069, throughput 2.79435K wps
[Epoch 6 Batch 1050/1540] avg loss 0.00894816, throughput 2.81653K wps
[Epoch 6 Batch 1080/1540] avg loss 0.00915588, throughput 2.76909K wps
[Epoch 6 Batch 1110/1540] avg loss 0.0089336, throughput 2.80962K wps
[Epoch 6 Batch 1140/1540] avg loss 0.00915533, throughput 2.8671K wps
[Epoch 6 Batch 1170/1540] avg loss 0.00866029, throughput 2.85805K wps
[Epoch 6 Batch 1200/1540] avg loss 0.00894738, throughput 2.85913K wps
[Epoch 6 Batch 1230/1540] avg loss 0.00877337, throughput 2.86589K wps
[Epoch 6 Batch 1260/1540] avg loss 0.00891224, throughput 2.80636K wps
[Epoch 6 Batch 1290/1540] avg loss 0.00906667, throughput 2.83283K wps
[Epoch 6 Batch 1320/1540] avg loss 0.00862772, throughput 2.85331K wps
[Epoch 6 Batch 1350/1540] avg loss 0.00891718, throughput 2.81811K wps
[Epoch 6 Batch 1380/1540] avg loss 0.00897235, throughput 2.86313K wps
[Epoch 6 Batch 1410/1540] avg loss 0.00842482, throughput 2.86556K wps
[Epoch 6 Batch 1440/1540] avg loss 0.0087864, throughput 2.86064K wps
[Epoch 6 Batch 1470/1540] avg loss 0.00869268, throughput 2.86509K wps
[Epoch 6 Batch 1500/1540] avg loss 0.00844809, throughput 2.83551K wps
[Epoch 6 Batch 1530/1540] avg loss 0.00866327, throughput 2.80195K wps
Begin Testing...
[Epoch 6] train avg loss 0.00920906, dev acc 0.7970, dev avg loss 0.460756, throughput 2.83874K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 7 Batch 30/1540] avg loss 0.00825991, throughput 2.89203K wps
[Epoch 7 Batch 60/1540] avg loss 0.00881551, throughput 2.825K wps
[Epoch 7 Batch 90/1540] avg loss 0.00833516, throughput 2.85913K wps
[Epoch 7 Batch 120/1540] avg loss 0.00806622, throughput 2.85202K wps
[Epoch 7 Batch 150/1540] avg loss 0.00843859, throughput 2.83694K wps
[Epoch 7 Batch 180/1540] avg loss 0.00834409, throughput 2.84418K wps
[Epoch 7 Batch 210/1540] avg loss 0.00839169, throughput 2.86917K wps
[Epoch 7 Batch 240/1540] avg loss 0.00828628, throughput 2.85639K wps
[Epoch 7 Batch 270/1540] avg loss 0.00829326, throughput 2.84642K wps
[Epoch 7 Batch 300/1540] avg loss 0.00826126, throughput 2.86427K wps
[Epoch 7 Batch 330/1540] avg loss 0.00788044, throughput 2.85554K wps
[Epoch 7 Batch 360/1540] avg loss 0.00816125, throughput 2.8748K wps
[Epoch 7 Batch 390/1540] avg loss 0.00798977, throughput 2.87324K wps
[Epoch 7 Batch 420/1540] avg loss 0.00834245, throughput 2.8599K wps
[Epoch 7 Batch 450/1540] avg loss 0.0081372, throughput 2.87742K wps
[Epoch 7 Batch 480/1540] avg loss 0.00785917, throughput 2.87735K wps
[Epoch 7 Batch 510/1540] avg loss 0.00822373, throughput 2.87238K wps
[Epoch 7 Batch 540/1540] avg loss 0.00817121, throughput 2.80606K wps
[Epoch 7 Batch 570/1540] avg loss 0.00813353, throughput 2.84701K wps
[Epoch 7 Batch 600/1540] avg loss 0.00797193, throughput 2.87978K wps
[Epoch 7 Batch 630/1540] avg loss 0.00791665, throughput 2.87629K wps
[Epoch 7 Batch 660/1540] avg loss 0.0081805, throughput 2.83898K wps
[Epoch 7 Batch 690/1540] avg loss 0.00804365, throughput 2.80083K wps
[Epoch 7 Batch 720/1540] avg loss 0.00820712, throughput 2.78857K wps
[Epoch 7 Batch 750/1540] avg loss 0.00820856, throughput 2.8715K wps
[Epoch 7 Batch 780/1540] avg loss 0.00800739, throughput 2.87654K wps
[Epoch 7 Batch 810/1540] avg loss 0.00797157, throughput 2.87081K wps
[Epoch 7 Batch 840/1540] avg loss 0.0077511, throughput 2.87122K wps
[Epoch 7 Batch 870/1540] avg loss 0.00811865, throughput 2.87788K wps
[Epoch 7 Batch 900/1540] avg loss 0.00765177, throughput 2.84477K wps
[Epoch 7 Batch 930/1540] avg loss 0.00757489, throughput 2.80149K wps
[Epoch 7 Batch 960/1540] avg loss 0.00796375, throughput 2.83747K wps
[Epoch 7 Batch 990/1540] avg loss 0.00761744, throughput 2.86349K wps
[Epoch 7 Batch 1020/1540] avg loss 0.00782431, throughput 2.87655K wps
[Epoch 7 Batch 1050/1540] avg loss 0.00752091, throughput 2.8687K wps
[Epoch 7 Batch 1080/1540] avg loss 0.00769699, throughput 2.85456K wps
[Epoch 7 Batch 1110/1540] avg loss 0.00732822, throughput 2.87136K wps
[Epoch 7 Batch 1140/1540] avg loss 0.00779556, throughput 2.84986K wps
[Epoch 7 Batch 1170/1540] avg loss 0.00795337, throughput 2.84892K wps
[Epoch 7 Batch 1200/1540] avg loss 0.00762326, throughput 2.87471K wps
[Epoch 7 Batch 1230/1540] avg loss 0.00788853, throughput 2.85078K wps
[Epoch 7 Batch 1260/1540] avg loss 0.0078197, throughput 2.83619K wps
[Epoch 7 Batch 1290/1540] avg loss 0.00777646, throughput 2.78379K wps
[Epoch 7 Batch 1320/1540] avg loss 0.00789422, throughput 2.84311K wps
[Epoch 7 Batch 1350/1540] avg loss 0.00731866, throughput 2.86707K wps
[Epoch 7 Batch 1380/1540] avg loss 0.00752436, throughput 2.87333K wps
[Epoch 7 Batch 1410/1540] avg loss 0.00738245, throughput 2.87458K wps
[Epoch 7 Batch 1440/1540] avg loss 0.00764089, throughput 2.84285K wps
[Epoch 7 Batch 1470/1540] avg loss 0.00760671, throughput 2.86642K wps
[Epoch 7 Batch 1500/1540] avg loss 0.00744158, throughput 2.86612K wps
[Epoch 7 Batch 1530/1540] avg loss 0.00759741, throughput 2.87982K wps
Begin Testing...
[Epoch 7] train avg loss 0.00794728, dev acc 0.8005, dev avg loss 0.437719, throughput 2.85496K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 8 Batch 30/1540] avg loss 0.00700426, throughput 2.89692K wps
[Epoch 8 Batch 60/1540] avg loss 0.0071169, throughput 2.87137K wps
[Epoch 8 Batch 90/1540] avg loss 0.00701109, throughput 2.85659K wps
[Epoch 8 Batch 120/1540] avg loss 0.00726235, throughput 2.86725K wps
[Epoch 8 Batch 150/1540] avg loss 0.00723402, throughput 2.83711K wps
[Epoch 8 Batch 180/1540] avg loss 0.00723113, throughput 2.87776K wps
[Epoch 8 Batch 210/1540] avg loss 0.00725831, throughput 2.87068K wps
[Epoch 8 Batch 240/1540] avg loss 0.00715983, throughput 2.86004K wps
[Epoch 8 Batch 270/1540] avg loss 0.00693161, throughput 2.86467K wps
[Epoch 8 Batch 300/1540] avg loss 0.00693836, throughput 2.87248K wps
[Epoch 8 Batch 330/1540] avg loss 0.0071374, throughput 2.86431K wps
[Epoch 8 Batch 360/1540] avg loss 0.0073231, throughput 2.84465K wps
[Epoch 8 Batch 390/1540] avg loss 0.00678718, throughput 2.87325K wps
[Epoch 8 Batch 420/1540] avg loss 0.0070796, throughput 2.8738K wps
[Epoch 8 Batch 450/1540] avg loss 0.00725956, throughput 2.84717K wps
[Epoch 8 Batch 480/1540] avg loss 0.00720291, throughput 2.86924K wps
[Epoch 8 Batch 510/1540] avg loss 0.00717188, throughput 2.87498K wps
[Epoch 8 Batch 540/1540] avg loss 0.00691361, throughput 2.85918K wps
[Epoch 8 Batch 570/1540] avg loss 0.00734546, throughput 2.79205K wps
[Epoch 8 Batch 600/1540] avg loss 0.00704686, throughput 2.78737K wps
[Epoch 8 Batch 630/1540] avg loss 0.0071282, throughput 2.84455K wps
[Epoch 8 Batch 660/1540] avg loss 0.00688982, throughput 2.84754K wps
[Epoch 8 Batch 690/1540] avg loss 0.00730296, throughput 2.87633K wps
[Epoch 8 Batch 720/1540] avg loss 0.00687909, throughput 2.83198K wps
[Epoch 8 Batch 750/1540] avg loss 0.00730986, throughput 2.78896K wps
[Epoch 8 Batch 780/1540] avg loss 0.0070227, throughput 2.81495K wps
[Epoch 8 Batch 810/1540] avg loss 0.00707117, throughput 2.85642K wps
[Epoch 8 Batch 840/1540] avg loss 0.00674418, throughput 2.87021K wps
[Epoch 8 Batch 870/1540] avg loss 0.00707607, throughput 2.87595K wps
[Epoch 8 Batch 900/1540] avg loss 0.00675896, throughput 2.86891K wps
[Epoch 8 Batch 930/1540] avg loss 0.00669084, throughput 2.87824K wps
[Epoch 8 Batch 960/1540] avg loss 0.00637316, throughput 2.86867K wps
[Epoch 8 Batch 990/1540] avg loss 0.00726551, throughput 2.86205K wps
[Epoch 8 Batch 1020/1540] avg loss 0.00734874, throughput 2.78227K wps
[Epoch 8 Batch 1050/1540] avg loss 0.00646123, throughput 2.85748K wps
[Epoch 8 Batch 1080/1540] avg loss 0.0065004, throughput 2.87453K wps
[Epoch 8 Batch 1110/1540] avg loss 0.00704716, throughput 2.87549K wps
[Epoch 8 Batch 1140/1540] avg loss 0.00704922, throughput 2.87203K wps
[Epoch 8 Batch 1170/1540] avg loss 0.00695946, throughput 2.86819K wps
[Epoch 8 Batch 1200/1540] avg loss 0.00747787, throughput 2.85317K wps
[Epoch 8 Batch 1230/1540] avg loss 0.00705022, throughput 2.78019K wps
[Epoch 8 Batch 1260/1540] avg loss 0.00652246, throughput 2.81089K wps
[Epoch 8 Batch 1290/1540] avg loss 0.00654109, throughput 2.80301K wps
[Epoch 8 Batch 1320/1540] avg loss 0.00685872, throughput 2.88059K wps
[Epoch 8 Batch 1350/1540] avg loss 0.0066386, throughput 2.85119K wps
[Epoch 8 Batch 1380/1540] avg loss 0.00672991, throughput 2.87041K wps
[Epoch 8 Batch 1410/1540] avg loss 0.00642001, throughput 2.85193K wps
[Epoch 8 Batch 1440/1540] avg loss 0.00635343, throughput 2.8692K wps
[Epoch 8 Batch 1470/1540] avg loss 0.00715519, throughput 2.87862K wps
[Epoch 8 Batch 1500/1540] avg loss 0.00679608, throughput 2.86928K wps
[Epoch 8 Batch 1530/1540] avg loss 0.00666721, throughput 2.85659K wps
Begin Testing...
[Epoch 8] train avg loss 0.0069764, dev acc 0.8154, dev avg loss 0.419131, throughput 2.85368K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 9 Batch 30/1540] avg loss 0.00619578, throughput 2.88586K wps
[Epoch 9 Batch 60/1540] avg loss 0.00605075, throughput 2.86157K wps
[Epoch 9 Batch 90/1540] avg loss 0.00617871, throughput 2.86141K wps
[Epoch 9 Batch 120/1540] avg loss 0.00646453, throughput 2.8703K wps
[Epoch 9 Batch 150/1540] avg loss 0.00673407, throughput 2.87575K wps
[Epoch 9 Batch 180/1540] avg loss 0.00656423, throughput 2.83105K wps
[Epoch 9 Batch 210/1540] avg loss 0.00645156, throughput 2.87681K wps
[Epoch 9 Batch 240/1540] avg loss 0.00635163, throughput 2.85517K wps
[Epoch 9 Batch 270/1540] avg loss 0.00630716, throughput 2.84929K wps
[Epoch 9 Batch 300/1540] avg loss 0.00643651, throughput 2.84773K wps
[Epoch 9 Batch 330/1540] avg loss 0.00613593, throughput 2.86889K wps
[Epoch 9 Batch 360/1540] avg loss 0.00643492, throughput 2.87121K wps
[Epoch 9 Batch 390/1540] avg loss 0.00636353, throughput 2.80756K wps
[Epoch 9 Batch 420/1540] avg loss 0.0062237, throughput 2.84315K wps
[Epoch 9 Batch 450/1540] avg loss 0.00640715, throughput 2.873K wps
[Epoch 9 Batch 480/1540] avg loss 0.00652341, throughput 2.84392K wps
[Epoch 9 Batch 510/1540] avg loss 0.00642114, throughput 2.86765K wps
[Epoch 9 Batch 540/1540] avg loss 0.00618197, throughput 2.86862K wps
[Epoch 9 Batch 570/1540] avg loss 0.00655068, throughput 2.85865K wps
[Epoch 9 Batch 600/1540] avg loss 0.00626943, throughput 2.84815K wps
[Epoch 9 Batch 630/1540] avg loss 0.00605462, throughput 2.86374K wps
[Epoch 9 Batch 660/1540] avg loss 0.00659492, throughput 2.86841K wps
[Epoch 9 Batch 690/1540] avg loss 0.00614132, throughput 2.85061K wps
[Epoch 9 Batch 720/1540] avg loss 0.00642699, throughput 2.87782K wps
[Epoch 9 Batch 750/1540] avg loss 0.00595195, throughput 2.86046K wps
[Epoch 9 Batch 780/1540] avg loss 0.00627769, throughput 2.82313K wps
[Epoch 9 Batch 810/1540] avg loss 0.00628799, throughput 2.87666K wps
[Epoch 9 Batch 840/1540] avg loss 0.00583507, throughput 2.85026K wps
[Epoch 9 Batch 870/1540] avg loss 0.00636069, throughput 2.8064K wps
[Epoch 9 Batch 900/1540] avg loss 0.00613266, throughput 2.8593K wps
[Epoch 9 Batch 930/1540] avg loss 0.0065748, throughput 2.83976K wps
[Epoch 9 Batch 960/1540] avg loss 0.00600959, throughput 2.86833K wps
[Epoch 9 Batch 990/1540] avg loss 0.00654631, throughput 2.86886K wps
[Epoch 9 Batch 1020/1540] avg loss 0.00608594, throughput 2.85835K wps
[Epoch 9 Batch 1050/1540] avg loss 0.00638485, throughput 2.82088K wps
[Epoch 9 Batch 1080/1540] avg loss 0.00587679, throughput 2.84836K wps
[Epoch 9 Batch 1110/1540] avg loss 0.00592697, throughput 2.84355K wps
[Epoch 9 Batch 1140/1540] avg loss 0.00603676, throughput 2.861K wps
[Epoch 9 Batch 1170/1540] avg loss 0.00620328, throughput 2.80242K wps
[Epoch 9 Batch 1200/1540] avg loss 0.00596717, throughput 2.84354K wps
[Epoch 9 Batch 1230/1540] avg loss 0.00649963, throughput 2.85948K wps
[Epoch 9 Batch 1260/1540] avg loss 0.00661678, throughput 2.85709K wps
[Epoch 9 Batch 1290/1540] avg loss 0.00619676, throughput 2.85552K wps
[Epoch 9 Batch 1320/1540] avg loss 0.00608795, throughput 2.8648K wps
[Epoch 9 Batch 1350/1540] avg loss 0.00603408, throughput 2.86484K wps
[Epoch 9 Batch 1380/1540] avg loss 0.00601827, throughput 2.86878K wps
[Epoch 9 Batch 1410/1540] avg loss 0.00602206, throughput 2.87319K wps
[Epoch 9 Batch 1440/1540] avg loss 0.00625086, throughput 2.87469K wps
[Epoch 9 Batch 1470/1540] avg loss 0.00580507, throughput 2.85548K wps
[Epoch 9 Batch 1500/1540] avg loss 0.00612506, throughput 2.87051K wps
[Epoch 9 Batch 1530/1540] avg loss 0.00595437, throughput 2.82238K wps
Begin Testing...
[Epoch 9] train avg loss 0.00625529, dev acc 0.8085, dev avg loss 0.417866, throughput 2.8552K wps
[Epoch 10 Batch 30/1540] avg loss 0.00560278, throughput 2.86309K wps
[Epoch 10 Batch 60/1540] avg loss 0.00576539, throughput 2.7933K wps
[Epoch 10 Batch 90/1540] avg loss 0.00591446, throughput 2.80802K wps
[Epoch 10 Batch 120/1540] avg loss 0.00560873, throughput 2.81841K wps
[Epoch 10 Batch 150/1540] avg loss 0.00552855, throughput 2.83278K wps
[Epoch 10 Batch 180/1540] avg loss 0.00532143, throughput 2.85648K wps
[Epoch 10 Batch 210/1540] avg loss 0.00561921, throughput 2.85415K wps
[Epoch 10 Batch 240/1540] avg loss 0.00606827, throughput 2.87784K wps
[Epoch 10 Batch 270/1540] avg loss 0.00598494, throughput 2.824K wps
[Epoch 10 Batch 300/1540] avg loss 0.00540564, throughput 2.83927K wps
[Epoch 10 Batch 330/1540] avg loss 0.00539115, throughput 2.8692K wps
[Epoch 10 Batch 360/1540] avg loss 0.00580298, throughput 2.88198K wps
[Epoch 10 Batch 390/1540] avg loss 0.00586173, throughput 2.83944K wps
[Epoch 10 Batch 420/1540] avg loss 0.00576411, throughput 2.85458K wps
[Epoch 10 Batch 450/1540] avg loss 0.00568409, throughput 2.81329K wps
[Epoch 10 Batch 480/1540] avg loss 0.0059675, throughput 2.86029K wps
[Epoch 10 Batch 510/1540] avg loss 0.00548594, throughput 2.86642K wps
[Epoch 10 Batch 540/1540] avg loss 0.00594244, throughput 2.79066K wps
[Epoch 10 Batch 570/1540] avg loss 0.00568997, throughput 2.80611K wps
[Epoch 10 Batch 600/1540] avg loss 0.00594787, throughput 2.85608K wps
[Epoch 10 Batch 630/1540] avg loss 0.00653243, throughput 2.86725K wps
[Epoch 10 Batch 660/1540] avg loss 0.00568763, throughput 2.86858K wps
[Epoch 10 Batch 690/1540] avg loss 0.00546361, throughput 2.87145K wps
[Epoch 10 Batch 720/1540] avg loss 0.0057613, throughput 2.8699K wps
[Epoch 10 Batch 750/1540] avg loss 0.0055321, throughput 2.86078K wps
[Epoch 10 Batch 780/1540] avg loss 0.00585637, throughput 2.84849K wps
[Epoch 10 Batch 810/1540] avg loss 0.00540297, throughput 2.87501K wps
[Epoch 10 Batch 840/1540] avg loss 0.00549143, throughput 2.87859K wps
[Epoch 10 Batch 870/1540] avg loss 0.00587356, throughput 2.85276K wps
[Epoch 10 Batch 900/1540] avg loss 0.00607192, throughput 2.83686K wps
[Epoch 10 Batch 930/1540] avg loss 0.00589352, throughput 2.86395K wps
[Epoch 10 Batch 960/1540] avg loss 0.00573896, throughput 2.87819K wps
[Epoch 10 Batch 990/1540] avg loss 0.0059876, throughput 2.87876K wps
[Epoch 10 Batch 1020/1540] avg loss 0.00566614, throughput 2.8719K wps
[Epoch 10 Batch 1050/1540] avg loss 0.00580399, throughput 2.85761K wps
[Epoch 10 Batch 1080/1540] avg loss 0.00627715, throughput 2.85497K wps
[Epoch 10 Batch 1110/1540] avg loss 0.00526307, throughput 2.8603K wps
[Epoch 10 Batch 1140/1540] avg loss 0.00545778, throughput 2.8599K wps
[Epoch 10 Batch 1170/1540] avg loss 0.00545641, throughput 2.87173K wps
[Epoch 10 Batch 1200/1540] avg loss 0.00546712, throughput 2.87661K wps
[Epoch 10 Batch 1230/1540] avg loss 0.00576475, throughput 2.84398K wps
[Epoch 10 Batch 1260/1540] avg loss 0.00554865, throughput 2.84947K wps
[Epoch 10 Batch 1290/1540] avg loss 0.00613281, throughput 2.86158K wps
[Epoch 10 Batch 1320/1540] avg loss 0.00559378, throughput 2.85915K wps
[Epoch 10 Batch 1350/1540] avg loss 0.00519544, throughput 2.8594K wps
[Epoch 10 Batch 1380/1540] avg loss 0.00518721, throughput 2.85886K wps
[Epoch 10 Batch 1410/1540] avg loss 0.00532817, throughput 2.86864K wps
[Epoch 10 Batch 1440/1540] avg loss 0.00539702, throughput 2.87141K wps
[Epoch 10 Batch 1470/1540] avg loss 0.00548314, throughput 2.86375K wps
[Epoch 10 Batch 1500/1540] avg loss 0.00576672, throughput 2.85622K wps
[Epoch 10 Batch 1530/1540] avg loss 0.00564908, throughput 2.85837K wps
Begin Testing...
[Epoch 10] train avg loss 0.00568935, dev acc 0.8268, dev avg loss 0.417476, throughput 2.85396K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 11 Batch 30/1540] avg loss 0.00529954, throughput 2.92189K wps
[Epoch 11 Batch 60/1540] avg loss 0.00544322, throughput 2.86505K wps
[Epoch 11 Batch 90/1540] avg loss 0.00502778, throughput 2.8135K wps
[Epoch 11 Batch 120/1540] avg loss 0.0053774, throughput 2.86688K wps
[Epoch 11 Batch 150/1540] avg loss 0.0054993, throughput 2.85092K wps
[Epoch 11 Batch 180/1540] avg loss 0.00511095, throughput 2.87244K wps
[Epoch 11 Batch 210/1540] avg loss 0.00577389, throughput 2.82656K wps
[Epoch 11 Batch 240/1540] avg loss 0.00503056, throughput 2.87752K wps
[Epoch 11 Batch 270/1540] avg loss 0.00527163, throughput 2.87603K wps
[Epoch 11 Batch 300/1540] avg loss 0.00486225, throughput 2.86389K wps
[Epoch 11 Batch 330/1540] avg loss 0.0053856, throughput 2.83982K wps
[Epoch 11 Batch 360/1540] avg loss 0.00505071, throughput 2.87728K wps
[Epoch 11 Batch 390/1540] avg loss 0.00544922, throughput 2.87923K wps
[Epoch 11 Batch 420/1540] avg loss 0.00586109, throughput 2.85331K wps
[Epoch 11 Batch 450/1540] avg loss 0.00548735, throughput 2.84435K wps
[Epoch 11 Batch 480/1540] avg loss 0.00539436, throughput 2.81618K wps
[Epoch 11 Batch 510/1540] avg loss 0.00549712, throughput 2.80748K wps
[Epoch 11 Batch 540/1540] avg loss 0.00525779, throughput 2.82319K wps
[Epoch 11 Batch 570/1540] avg loss 0.0049041, throughput 2.80543K wps
[Epoch 11 Batch 600/1540] avg loss 0.00467523, throughput 2.8282K wps
[Epoch 11 Batch 630/1540] avg loss 0.00520027, throughput 2.86484K wps
[Epoch 11 Batch 660/1540] avg loss 0.00535244, throughput 2.87777K wps
[Epoch 11 Batch 690/1540] avg loss 0.0050197, throughput 2.84967K wps
[Epoch 11 Batch 720/1540] avg loss 0.00491594, throughput 2.87223K wps
[Epoch 11 Batch 750/1540] avg loss 0.00531045, throughput 2.82622K wps
[Epoch 11 Batch 780/1540] avg loss 0.00504664, throughput 2.8625K wps
[Epoch 11 Batch 810/1540] avg loss 0.00499072, throughput 2.84293K wps
[Epoch 11 Batch 840/1540] avg loss 0.00484892, throughput 2.84938K wps
[Epoch 11 Batch 870/1540] avg loss 0.00493278, throughput 2.86382K wps
[Epoch 11 Batch 900/1540] avg loss 0.00548666, throughput 2.86781K wps
[Epoch 11 Batch 930/1540] avg loss 0.0053415, throughput 2.86029K wps
[Epoch 11 Batch 960/1540] avg loss 0.00524888, throughput 2.81201K wps
[Epoch 11 Batch 990/1540] avg loss 0.00536742, throughput 2.85772K wps
[Epoch 11 Batch 1020/1540] avg loss 0.00525738, throughput 2.82844K wps
[Epoch 11 Batch 1050/1540] avg loss 0.00481887, throughput 2.87807K wps
[Epoch 11 Batch 1080/1540] avg loss 0.00501868, throughput 2.80777K wps
[Epoch 11 Batch 1110/1540] avg loss 0.00503706, throughput 2.85932K wps
[Epoch 11 Batch 1140/1540] avg loss 0.00531201, throughput 2.86339K wps
[Epoch 11 Batch 1170/1540] avg loss 0.00537489, throughput 2.83131K wps
[Epoch 11 Batch 1200/1540] avg loss 0.00524029, throughput 2.788K wps
[Epoch 11 Batch 1230/1540] avg loss 0.00489996, throughput 2.82875K wps
[Epoch 11 Batch 1260/1540] avg loss 0.00506119, throughput 2.80727K wps
[Epoch 11 Batch 1290/1540] avg loss 0.00501068, throughput 2.80065K wps
[Epoch 11 Batch 1320/1540] avg loss 0.00530482, throughput 2.87433K wps
[Epoch 11 Batch 1350/1540] avg loss 0.00560463, throughput 2.85454K wps
[Epoch 11 Batch 1380/1540] avg loss 0.0053076, throughput 2.85734K wps
[Epoch 11 Batch 1410/1540] avg loss 0.00528334, throughput 2.86835K wps
[Epoch 11 Batch 1440/1540] avg loss 0.00551849, throughput 2.87039K wps
[Epoch 11 Batch 1470/1540] avg loss 0.0052075, throughput 2.86872K wps
[Epoch 11 Batch 1500/1540] avg loss 0.00519176, throughput 2.86183K wps
[Epoch 11 Batch 1530/1540] avg loss 0.00525325, throughput 2.86227K wps
Begin Testing...
[Epoch 11] train avg loss 0.00522733, dev acc 0.8154, dev avg loss 0.41487, throughput 2.84941K wps
[Epoch 12 Batch 30/1540] avg loss 0.00459408, throughput 2.85194K wps
[Epoch 12 Batch 60/1540] avg loss 0.00491037, throughput 2.86956K wps
[Epoch 12 Batch 90/1540] avg loss 0.00524657, throughput 2.8686K wps
[Epoch 12 Batch 120/1540] avg loss 0.00539493, throughput 2.85217K wps
[Epoch 12 Batch 150/1540] avg loss 0.00507731, throughput 2.8672K wps
[Epoch 12 Batch 180/1540] avg loss 0.00448498, throughput 2.86041K wps
[Epoch 12 Batch 210/1540] avg loss 0.00459914, throughput 2.85766K wps
[Epoch 12 Batch 240/1540] avg loss 0.00471152, throughput 2.7843K wps
[Epoch 12 Batch 270/1540] avg loss 0.00501468, throughput 2.79973K wps
[Epoch 12 Batch 300/1540] avg loss 0.00484297, throughput 2.87177K wps
[Epoch 12 Batch 330/1540] avg loss 0.00501018, throughput 2.79047K wps
[Epoch 12 Batch 360/1540] avg loss 0.00471138, throughput 2.87457K wps
[Epoch 12 Batch 390/1540] avg loss 0.0049432, throughput 2.87765K wps
[Epoch 12 Batch 420/1540] avg loss 0.00490452, throughput 2.87606K wps
[Epoch 12 Batch 450/1540] avg loss 0.00516997, throughput 2.84712K wps
[Epoch 12 Batch 480/1540] avg loss 0.00509852, throughput 2.84603K wps
[Epoch 12 Batch 510/1540] avg loss 0.00436913, throughput 2.82328K wps
[Epoch 12 Batch 540/1540] avg loss 0.00468779, throughput 2.8765K wps
[Epoch 12 Batch 570/1540] avg loss 0.00502387, throughput 2.83461K wps
[Epoch 12 Batch 600/1540] avg loss 0.0047895, throughput 2.79215K wps
[Epoch 12 Batch 630/1540] avg loss 0.00513222, throughput 2.85132K wps
[Epoch 12 Batch 660/1540] avg loss 0.00509643, throughput 2.84477K wps
[Epoch 12 Batch 690/1540] avg loss 0.00473696, throughput 2.8722K wps
[Epoch 12 Batch 720/1540] avg loss 0.00505605, throughput 2.81686K wps
[Epoch 12 Batch 750/1540] avg loss 0.00457419, throughput 2.87451K wps
[Epoch 12 Batch 780/1540] avg loss 0.0046172, throughput 2.85142K wps
[Epoch 12 Batch 810/1540] avg loss 0.00483807, throughput 2.8754K wps
[Epoch 12 Batch 840/1540] avg loss 0.00514774, throughput 2.82651K wps
[Epoch 12 Batch 870/1540] avg loss 0.00531144, throughput 2.85358K wps
[Epoch 12 Batch 900/1540] avg loss 0.00478178, throughput 2.83389K wps
[Epoch 12 Batch 930/1540] avg loss 0.00471039, throughput 2.87719K wps
[Epoch 12 Batch 960/1540] avg loss 0.00496347, throughput 2.83385K wps
[Epoch 12 Batch 990/1540] avg loss 0.0050525, throughput 2.79573K wps
[Epoch 12 Batch 1020/1540] avg loss 0.00468869, throughput 2.82976K wps
[Epoch 12 Batch 1050/1540] avg loss 0.00454686, throughput 2.87201K wps
[Epoch 12 Batch 1080/1540] avg loss 0.00469205, throughput 2.87248K wps
[Epoch 12 Batch 1110/1540] avg loss 0.00497689, throughput 2.87541K wps
[Epoch 12 Batch 1140/1540] avg loss 0.00445265, throughput 2.86966K wps
[Epoch 12 Batch 1170/1540] avg loss 0.00505778, throughput 2.8492K wps
[Epoch 12 Batch 1200/1540] avg loss 0.00482485, throughput 2.87481K wps
[Epoch 12 Batch 1230/1540] avg loss 0.00504717, throughput 2.81775K wps
[Epoch 12 Batch 1260/1540] avg loss 0.00473321, throughput 2.82531K wps
[Epoch 12 Batch 1290/1540] avg loss 0.00507508, throughput 2.87582K wps
[Epoch 12 Batch 1320/1540] avg loss 0.00442842, throughput 2.87338K wps
[Epoch 12 Batch 1350/1540] avg loss 0.00489596, throughput 2.86631K wps
[Epoch 12 Batch 1380/1540] avg loss 0.0044795, throughput 2.88059K wps
[Epoch 12 Batch 1410/1540] avg loss 0.00467905, throughput 2.85294K wps
[Epoch 12 Batch 1440/1540] avg loss 0.00440097, throughput 2.7771K wps
[Epoch 12 Batch 1470/1540] avg loss 0.00455162, throughput 2.81136K wps
[Epoch 12 Batch 1500/1540] avg loss 0.00475475, throughput 2.78996K wps
[Epoch 12 Batch 1530/1540] avg loss 0.00497422, throughput 2.84052K wps
Begin Testing...
[Epoch 12] train avg loss 0.00483987, dev acc 0.8200, dev avg loss 0.43082, throughput 2.84601K wps
[Epoch 13 Batch 30/1540] avg loss 0.00446236, throughput 2.83165K wps
[Epoch 13 Batch 60/1540] avg loss 0.00474609, throughput 2.7882K wps
[Epoch 13 Batch 90/1540] avg loss 0.00481771, throughput 2.78453K wps
[Epoch 13 Batch 120/1540] avg loss 0.00455366, throughput 2.82144K wps
[Epoch 13 Batch 150/1540] avg loss 0.00468424, throughput 2.82126K wps
[Epoch 13 Batch 180/1540] avg loss 0.00467242, throughput 2.82396K wps
[Epoch 13 Batch 210/1540] avg loss 0.00463527, throughput 2.81668K wps
[Epoch 13 Batch 240/1540] avg loss 0.00451349, throughput 2.8703K wps
[Epoch 13 Batch 270/1540] avg loss 0.00432513, throughput 2.82377K wps
[Epoch 13 Batch 300/1540] avg loss 0.00436745, throughput 2.84135K wps
[Epoch 13 Batch 330/1540] avg loss 0.00443646, throughput 2.86605K wps
[Epoch 13 Batch 360/1540] avg loss 0.0045516, throughput 2.87585K wps
[Epoch 13 Batch 390/1540] avg loss 0.00417121, throughput 2.83353K wps
[Epoch 13 Batch 420/1540] avg loss 0.00457967, throughput 2.83476K wps
[Epoch 13 Batch 450/1540] avg loss 0.00449326, throughput 2.86706K wps
[Epoch 13 Batch 480/1540] avg loss 0.00426371, throughput 2.86733K wps
[Epoch 13 Batch 510/1540] avg loss 0.00439008, throughput 2.88155K wps
[Epoch 13 Batch 540/1540] avg loss 0.00452512, throughput 2.82744K wps
[Epoch 13 Batch 570/1540] avg loss 0.00425974, throughput 2.87104K wps
[Epoch 13 Batch 600/1540] avg loss 0.00416599, throughput 2.87146K wps
[Epoch 13 Batch 630/1540] avg loss 0.00469689, throughput 2.87303K wps
[Epoch 13 Batch 660/1540] avg loss 0.00448656, throughput 2.87424K wps
[Epoch 13 Batch 690/1540] avg loss 0.00439577, throughput 2.8576K wps
[Epoch 13 Batch 720/1540] avg loss 0.00462097, throughput 2.8352K wps
[Epoch 13 Batch 750/1540] avg loss 0.00449958, throughput 2.86557K wps
[Epoch 13 Batch 780/1540] avg loss 0.00453113, throughput 2.79336K wps
[Epoch 13 Batch 810/1540] avg loss 0.00434875, throughput 2.84386K wps
[Epoch 13 Batch 840/1540] avg loss 0.00457832, throughput 2.87667K wps
[Epoch 13 Batch 870/1540] avg loss 0.00500019, throughput 2.85573K wps
[Epoch 13 Batch 900/1540] avg loss 0.00473433, throughput 2.85732K wps
[Epoch 13 Batch 930/1540] avg loss 0.00463568, throughput 2.87828K wps
[Epoch 13 Batch 960/1540] avg loss 0.00428502, throughput 2.85308K wps
[Epoch 13 Batch 990/1540] avg loss 0.0043615, throughput 2.82854K wps
[Epoch 13 Batch 1020/1540] avg loss 0.00450924, throughput 2.87002K wps
[Epoch 13 Batch 1050/1540] avg loss 0.00436647, throughput 2.8819K wps
[Epoch 13 Batch 1080/1540] avg loss 0.00451411, throughput 2.87603K wps
[Epoch 13 Batch 1110/1540] avg loss 0.00448811, throughput 2.84644K wps
[Epoch 13 Batch 1140/1540] avg loss 0.00484055, throughput 2.87198K wps
[Epoch 13 Batch 1170/1540] avg loss 0.0047167, throughput 2.87767K wps
[Epoch 13 Batch 1200/1540] avg loss 0.00501602, throughput 2.88106K wps
[Epoch 13 Batch 1230/1540] avg loss 0.00452643, throughput 2.87166K wps
[Epoch 13 Batch 1260/1540] avg loss 0.00454048, throughput 2.87546K wps
[Epoch 13 Batch 1290/1540] avg loss 0.00456253, throughput 2.88161K wps
[Epoch 13 Batch 1320/1540] avg loss 0.00476569, throughput 2.8565K wps
[Epoch 13 Batch 1350/1540] avg loss 0.00418505, throughput 2.85399K wps
[Epoch 13 Batch 1380/1540] avg loss 0.00445918, throughput 2.87694K wps
[Epoch 13 Batch 1410/1540] avg loss 0.00435116, throughput 2.87921K wps
[Epoch 13 Batch 1440/1540] avg loss 0.00479225, throughput 2.86933K wps
[Epoch 13 Batch 1470/1540] avg loss 0.00425588, throughput 2.8674K wps
[Epoch 13 Batch 1500/1540] avg loss 0.00433084, throughput 2.87503K wps
[Epoch 13 Batch 1530/1540] avg loss 0.00457583, throughput 2.87123K wps
Begin Testing...
[Epoch 13] train avg loss 0.00452305, dev acc 0.8131, dev avg loss 0.419574, throughput 2.85466K wps
[Epoch 14 Batch 30/1540] avg loss 0.00461691, throughput 2.90346K wps
[Epoch 14 Batch 60/1540] avg loss 0.00409082, throughput 2.86606K wps
[Epoch 14 Batch 90/1540] avg loss 0.00411367, throughput 2.87014K wps
[Epoch 14 Batch 120/1540] avg loss 0.00441572, throughput 2.84007K wps
[Epoch 14 Batch 150/1540] avg loss 0.00415351, throughput 2.85366K wps
[Epoch 14 Batch 180/1540] avg loss 0.004476, throughput 2.84757K wps
[Epoch 14 Batch 210/1540] avg loss 0.00440516, throughput 2.87513K wps
[Epoch 14 Batch 240/1540] avg loss 0.00403609, throughput 2.85843K wps
[Epoch 14 Batch 270/1540] avg loss 0.00436929, throughput 2.84186K wps
[Epoch 14 Batch 300/1540] avg loss 0.00429113, throughput 2.87756K wps
[Epoch 14 Batch 330/1540] avg loss 0.00443148, throughput 2.85188K wps
[Epoch 14 Batch 360/1540] avg loss 0.0043789, throughput 2.8415K wps
[Epoch 14 Batch 390/1540] avg loss 0.00430021, throughput 2.8736K wps
[Epoch 14 Batch 420/1540] avg loss 0.00399232, throughput 2.87152K wps
[Epoch 14 Batch 450/1540] avg loss 0.00451988, throughput 2.82441K wps
[Epoch 14 Batch 480/1540] avg loss 0.00399028, throughput 2.82411K wps
[Epoch 14 Batch 510/1540] avg loss 0.00442665, throughput 2.86096K wps
[Epoch 14 Batch 540/1540] avg loss 0.00434507, throughput 2.87596K wps
[Epoch 14 Batch 570/1540] avg loss 0.00463962, throughput 2.87929K wps
[Epoch 14 Batch 600/1540] avg loss 0.00411575, throughput 2.88231K wps
[Epoch 14 Batch 630/1540] avg loss 0.00417867, throughput 2.87889K wps
[Epoch 14 Batch 660/1540] avg loss 0.0045338, throughput 2.8693K wps
[Epoch 14 Batch 690/1540] avg loss 0.00399325, throughput 2.86181K wps
[Epoch 14 Batch 720/1540] avg loss 0.00423137, throughput 2.87301K wps
[Epoch 14 Batch 750/1540] avg loss 0.00413027, throughput 2.86344K wps
[Epoch 14 Batch 780/1540] avg loss 0.00404059, throughput 2.87574K wps
[Epoch 14 Batch 810/1540] avg loss 0.00409784, throughput 2.87673K wps
[Epoch 14 Batch 840/1540] avg loss 0.00402593, throughput 2.87102K wps
[Epoch 14 Batch 870/1540] avg loss 0.00402846, throughput 2.87835K wps
[Epoch 14 Batch 900/1540] avg loss 0.00413051, throughput 2.86132K wps
[Epoch 14 Batch 930/1540] avg loss 0.00425607, throughput 2.8608K wps
[Epoch 14 Batch 960/1540] avg loss 0.00422695, throughput 2.84248K wps
[Epoch 14 Batch 990/1540] avg loss 0.00441385, throughput 2.84117K wps
[Epoch 14 Batch 1020/1540] avg loss 0.00415427, throughput 2.87268K wps
[Epoch 14 Batch 1050/1540] avg loss 0.00376706, throughput 2.82777K wps
[Epoch 14 Batch 1080/1540] avg loss 0.00476164, throughput 2.87051K wps
[Epoch 14 Batch 1110/1540] avg loss 0.00441269, throughput 2.8621K wps
[Epoch 14 Batch 1140/1540] avg loss 0.00449428, throughput 2.80276K wps
[Epoch 14 Batch 1170/1540] avg loss 0.00418358, throughput 2.84116K wps
[Epoch 14 Batch 1200/1540] avg loss 0.00422666, throughput 2.87902K wps
[Epoch 14 Batch 1230/1540] avg loss 0.0043409, throughput 2.86527K wps
[Epoch 14 Batch 1260/1540] avg loss 0.00404391, throughput 2.82409K wps
[Epoch 14 Batch 1290/1540] avg loss 0.00451377, throughput 2.86621K wps
[Epoch 14 Batch 1320/1540] avg loss 0.0042835, throughput 2.83833K wps
[Epoch 14 Batch 1350/1540] avg loss 0.00434493, throughput 2.80139K wps
[Epoch 14 Batch 1380/1540] avg loss 0.00438983, throughput 2.8662K wps
[Epoch 14 Batch 1410/1540] avg loss 0.00423225, throughput 2.85795K wps
[Epoch 14 Batch 1440/1540] avg loss 0.00471089, throughput 2.81824K wps
[Epoch 14 Batch 1470/1540] avg loss 0.0042575, throughput 2.84146K wps
[Epoch 14 Batch 1500/1540] avg loss 0.00381145, throughput 2.87356K wps
[Epoch 14 Batch 1530/1540] avg loss 0.00411584, throughput 2.85227K wps
Begin Testing...
[Epoch 14] train avg loss 0.00426876, dev acc 0.8119, dev avg loss 0.431456, throughput 2.85742K wps
[Epoch 15 Batch 30/1540] avg loss 0.00407376, throughput 2.91374K wps
[Epoch 15 Batch 60/1540] avg loss 0.00401679, throughput 2.86123K wps
[Epoch 15 Batch 90/1540] avg loss 0.00392947, throughput 2.83942K wps
[Epoch 15 Batch 120/1540] avg loss 0.00422627, throughput 2.81369K wps
[Epoch 15 Batch 150/1540] avg loss 0.00453781, throughput 2.79111K wps
[Epoch 15 Batch 180/1540] avg loss 0.00383111, throughput 2.79639K wps
[Epoch 15 Batch 210/1540] avg loss 0.00426815, throughput 2.81802K wps
[Epoch 15 Batch 240/1540] avg loss 0.00367191, throughput 2.86546K wps
[Epoch 15 Batch 270/1540] avg loss 0.00390078, throughput 2.87507K wps
[Epoch 15 Batch 300/1540] avg loss 0.00391309, throughput 2.83758K wps
[Epoch 15 Batch 330/1540] avg loss 0.00372025, throughput 2.85992K wps
[Epoch 15 Batch 360/1540] avg loss 0.00388494, throughput 2.78417K wps
[Epoch 15 Batch 390/1540] avg loss 0.00373275, throughput 2.81732K wps
[Epoch 15 Batch 420/1540] avg loss 0.00396817, throughput 2.82379K wps
[Epoch 15 Batch 450/1540] avg loss 0.00436736, throughput 2.84744K wps
[Epoch 15 Batch 480/1540] avg loss 0.00373357, throughput 2.86786K wps
[Epoch 15 Batch 510/1540] avg loss 0.00368596, throughput 2.85527K wps
[Epoch 15 Batch 540/1540] avg loss 0.00371191, throughput 2.87065K wps
[Epoch 15 Batch 570/1540] avg loss 0.00422839, throughput 2.86757K wps
[Epoch 15 Batch 600/1540] avg loss 0.00409697, throughput 2.86764K wps
[Epoch 15 Batch 630/1540] avg loss 0.00381312, throughput 2.82558K wps
[Epoch 15 Batch 660/1540] avg loss 0.00431842, throughput 2.86827K wps
[Epoch 15 Batch 690/1540] avg loss 0.0037886, throughput 2.84364K wps
[Epoch 15 Batch 720/1540] avg loss 0.00409405, throughput 2.7893K wps
[Epoch 15 Batch 750/1540] avg loss 0.00368649, throughput 2.84533K wps
[Epoch 15 Batch 780/1540] avg loss 0.00435725, throughput 2.88253K wps
[Epoch 15 Batch 810/1540] avg loss 0.0039976, throughput 2.86638K wps
[Epoch 15 Batch 840/1540] avg loss 0.00391333, throughput 2.83815K wps
[Epoch 15 Batch 870/1540] avg loss 0.00444651, throughput 2.87223K wps
[Epoch 15 Batch 900/1540] avg loss 0.00426778, throughput 2.87275K wps
[Epoch 15 Batch 930/1540] avg loss 0.00473772, throughput 2.87182K wps
[Epoch 15 Batch 960/1540] avg loss 0.00413148, throughput 2.86173K wps
[Epoch 15 Batch 990/1540] avg loss 0.00388711, throughput 2.87103K wps
[Epoch 15 Batch 1020/1540] avg loss 0.00363337, throughput 2.8719K wps
[Epoch 15 Batch 1050/1540] avg loss 0.00394424, throughput 2.86909K wps
[Epoch 15 Batch 1080/1540] avg loss 0.00408659, throughput 2.86277K wps
[Epoch 15 Batch 1110/1540] avg loss 0.0036926, throughput 2.86438K wps
[Epoch 15 Batch 1140/1540] avg loss 0.00416388, throughput 2.85054K wps
[Epoch 15 Batch 1170/1540] avg loss 0.00405194, throughput 2.85722K wps
[Epoch 15 Batch 1200/1540] avg loss 0.00350064, throughput 2.87805K wps
[Epoch 15 Batch 1230/1540] avg loss 0.00421358, throughput 2.86051K wps
[Epoch 15 Batch 1260/1540] avg loss 0.00361744, throughput 2.87947K wps
[Epoch 15 Batch 1290/1540] avg loss 0.00420992, throughput 2.88045K wps
[Epoch 15 Batch 1320/1540] avg loss 0.00381196, throughput 2.85958K wps
[Epoch 15 Batch 1350/1540] avg loss 0.00409585, throughput 2.8782K wps
[Epoch 15 Batch 1380/1540] avg loss 0.00405585, throughput 2.87984K wps
[Epoch 15 Batch 1410/1540] avg loss 0.00439197, throughput 2.82591K wps
[Epoch 15 Batch 1440/1540] avg loss 0.00391647, throughput 2.82319K wps
[Epoch 15 Batch 1470/1540] avg loss 0.00429827, throughput 2.87291K wps
[Epoch 15 Batch 1500/1540] avg loss 0.00439724, throughput 2.88058K wps
[Epoch 15 Batch 1530/1540] avg loss 0.00413388, throughput 2.86962K wps
Begin Testing...
[Epoch 15] train avg loss 0.00402739, dev acc 0.8177, dev avg loss 0.427729, throughput 2.85364K wps
[Epoch 16 Batch 30/1540] avg loss 0.00364828, throughput 2.91324K wps
[Epoch 16 Batch 60/1540] avg loss 0.00346847, throughput 2.85303K wps
[Epoch 16 Batch 90/1540] avg loss 0.00373049, throughput 2.87388K wps
[Epoch 16 Batch 120/1540] avg loss 0.00393828, throughput 2.87238K wps
[Epoch 16 Batch 150/1540] avg loss 0.0035373, throughput 2.87733K wps
[Epoch 16 Batch 180/1540] avg loss 0.00401865, throughput 2.83449K wps
[Epoch 16 Batch 210/1540] avg loss 0.00394978, throughput 2.78224K wps
[Epoch 16 Batch 240/1540] avg loss 0.00388588, throughput 2.85971K wps
[Epoch 16 Batch 270/1540] avg loss 0.00357529, throughput 2.85854K wps
[Epoch 16 Batch 300/1540] avg loss 0.00413567, throughput 2.87997K wps
[Epoch 16 Batch 330/1540] avg loss 0.00358932, throughput 2.86505K wps
[Epoch 16 Batch 360/1540] avg loss 0.00350209, throughput 2.83764K wps
[Epoch 16 Batch 390/1540] avg loss 0.00366215, throughput 2.87282K wps
[Epoch 16 Batch 420/1540] avg loss 0.00393011, throughput 2.8481K wps
[Epoch 16 Batch 450/1540] avg loss 0.00378705, throughput 2.87037K wps
[Epoch 16 Batch 480/1540] avg loss 0.00404772, throughput 2.86988K wps
[Epoch 16 Batch 510/1540] avg loss 0.00393417, throughput 2.87401K wps
[Epoch 16 Batch 540/1540] avg loss 0.00400132, throughput 2.82079K wps
[Epoch 16 Batch 570/1540] avg loss 0.00382737, throughput 2.84271K wps
[Epoch 16 Batch 600/1540] avg loss 0.00365693, throughput 2.86856K wps
[Epoch 16 Batch 630/1540] avg loss 0.00367696, throughput 2.87466K wps
[Epoch 16 Batch 660/1540] avg loss 0.00396527, throughput 2.85173K wps
[Epoch 16 Batch 690/1540] avg loss 0.00398598, throughput 2.83263K wps
[Epoch 16 Batch 720/1540] avg loss 0.00357823, throughput 2.8294K wps
[Epoch 16 Batch 750/1540] avg loss 0.00361394, throughput 2.86624K wps
[Epoch 16 Batch 780/1540] avg loss 0.00360104, throughput 2.8646K wps
[Epoch 16 Batch 810/1540] avg loss 0.00364887, throughput 2.83687K wps
[Epoch 16 Batch 840/1540] avg loss 0.00380038, throughput 2.83233K wps
[Epoch 16 Batch 870/1540] avg loss 0.00367782, throughput 2.83244K wps
[Epoch 16 Batch 900/1540] avg loss 0.00383111, throughput 2.85035K wps
[Epoch 16 Batch 930/1540] avg loss 0.00394563, throughput 2.87646K wps
[Epoch 16 Batch 960/1540] avg loss 0.00400893, throughput 2.83558K wps
[Epoch 16 Batch 990/1540] avg loss 0.00376215, throughput 2.79682K wps
[Epoch 16 Batch 1020/1540] avg loss 0.00371723, throughput 2.79242K wps
[Epoch 16 Batch 1050/1540] avg loss 0.00377845, throughput 2.87491K wps
[Epoch 16 Batch 1080/1540] avg loss 0.0040392, throughput 2.84615K wps
[Epoch 16 Batch 1110/1540] avg loss 0.00410519, throughput 2.79365K wps
[Epoch 16 Batch 1140/1540] avg loss 0.00410984, throughput 2.84937K wps
[Epoch 16 Batch 1170/1540] avg loss 0.00360057, throughput 2.88165K wps
[Epoch 16 Batch 1200/1540] avg loss 0.00419759, throughput 2.87876K wps
[Epoch 16 Batch 1230/1540] avg loss 0.0038205, throughput 2.85643K wps
[Epoch 16 Batch 1260/1540] avg loss 0.00364247, throughput 2.87129K wps
[Epoch 16 Batch 1290/1540] avg loss 0.00401488, throughput 2.87944K wps
[Epoch 16 Batch 1320/1540] avg loss 0.00436021, throughput 2.84787K wps
[Epoch 16 Batch 1350/1540] avg loss 0.00362354, throughput 2.87069K wps
[Epoch 16 Batch 1380/1540] avg loss 0.0036643, throughput 2.86974K wps
[Epoch 16 Batch 1410/1540] avg loss 0.00433668, throughput 2.87245K wps
[Epoch 16 Batch 1440/1540] avg loss 0.00382339, throughput 2.87811K wps
[Epoch 16 Batch 1470/1540] avg loss 0.0038438, throughput 2.86514K wps
[Epoch 16 Batch 1500/1540] avg loss 0.0040651, throughput 2.8789K wps
[Epoch 16 Batch 1530/1540] avg loss 0.00382273, throughput 2.86611K wps
Begin Testing...
[Epoch 16] train avg loss 0.00383502, dev acc 0.8234, dev avg loss 0.435514, throughput 2.85508K wps
[Epoch 17 Batch 30/1540] avg loss 0.00319087, throughput 2.92501K wps
[Epoch 17 Batch 60/1540] avg loss 0.00332997, throughput 2.79723K wps
[Epoch 17 Batch 90/1540] avg loss 0.00377095, throughput 2.86783K wps
[Epoch 17 Batch 120/1540] avg loss 0.00342876, throughput 2.87935K wps
[Epoch 17 Batch 150/1540] avg loss 0.00361603, throughput 2.86129K wps
[Epoch 17 Batch 180/1540] avg loss 0.00398759, throughput 2.81541K wps
[Epoch 17 Batch 210/1540] avg loss 0.00365921, throughput 2.86253K wps
[Epoch 17 Batch 240/1540] avg loss 0.00333436, throughput 2.8684K wps
[Epoch 17 Batch 270/1540] avg loss 0.00319246, throughput 2.85613K wps
[Epoch 17 Batch 300/1540] avg loss 0.00355645, throughput 2.86481K wps
[Epoch 17 Batch 330/1540] avg loss 0.00378092, throughput 2.8106K wps
[Epoch 17 Batch 360/1540] avg loss 0.00339321, throughput 2.86744K wps
[Epoch 17 Batch 390/1540] avg loss 0.00355143, throughput 2.87489K wps
[Epoch 17 Batch 420/1540] avg loss 0.00338337, throughput 2.84091K wps
[Epoch 17 Batch 450/1540] avg loss 0.0034537, throughput 2.87476K wps
[Epoch 17 Batch 480/1540] avg loss 0.00350938, throughput 2.87454K wps
[Epoch 17 Batch 510/1540] avg loss 0.00372047, throughput 2.87112K wps
[Epoch 17 Batch 540/1540] avg loss 0.00341108, throughput 2.8482K wps
[Epoch 17 Batch 570/1540] avg loss 0.0034723, throughput 2.8272K wps
[Epoch 17 Batch 600/1540] avg loss 0.00364292, throughput 2.82127K wps
[Epoch 17 Batch 630/1540] avg loss 0.00350363, throughput 2.86217K wps
[Epoch 17 Batch 660/1540] avg loss 0.00402372, throughput 2.84594K wps
[Epoch 17 Batch 690/1540] avg loss 0.00364954, throughput 2.83362K wps
[Epoch 17 Batch 720/1540] avg loss 0.00374546, throughput 2.79015K wps
[Epoch 17 Batch 750/1540] avg loss 0.00352202, throughput 2.83113K wps
[Epoch 17 Batch 780/1540] avg loss 0.00363937, throughput 2.8698K wps
[Epoch 17 Batch 810/1540] avg loss 0.00400565, throughput 2.87314K wps
[Epoch 17 Batch 840/1540] avg loss 0.00355017, throughput 2.87075K wps
[Epoch 17 Batch 870/1540] avg loss 0.00372163, throughput 2.82721K wps
[Epoch 17 Batch 900/1540] avg loss 0.00393952, throughput 2.79891K wps
[Epoch 17 Batch 930/1540] avg loss 0.00345285, throughput 2.87991K wps
[Epoch 17 Batch 960/1540] avg loss 0.00373159, throughput 2.8686K wps
[Epoch 17 Batch 990/1540] avg loss 0.00373004, throughput 2.86531K wps
[Epoch 17 Batch 1020/1540] avg loss 0.00389974, throughput 2.84328K wps
[Epoch 17 Batch 1050/1540] avg loss 0.00369152, throughput 2.84203K wps
[Epoch 17 Batch 1080/1540] avg loss 0.00339621, throughput 2.86918K wps
[Epoch 17 Batch 1110/1540] avg loss 0.00371069, throughput 2.83962K wps