CR_non-static.log
9001 lines (9001 loc) · 517 KB
Namespace(batch_size=50, data_name='CR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 105
Done! Tokenizing Time=0.06s, #Sentences=3775
SentimentNet(
(embedding): Embedding(5343 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/62] avg loss 0.013445, throughput 0.471727K wps
[Epoch 0 Batch 60/62] avg loss 0.0131968, throughput 4.84626K wps
Begin Testing...
[Epoch 0] train avg loss 0.0135073, dev acc 0.6372, dev avg loss 0.655936, throughput 0.558127K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130377, throughput 4.91513K wps
[Epoch 1 Batch 60/62] avg loss 0.0130932, throughput 4.81334K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131987, dev acc 0.6372, dev avg loss 0.652218, throughput 4.87065K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0129942, throughput 4.95888K wps
[Epoch 2 Batch 60/62] avg loss 0.0129855, throughput 4.83108K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131511, dev acc 0.6372, dev avg loss 0.647381, throughput 4.90005K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0128415, throughput 4.91883K wps
[Epoch 3 Batch 60/62] avg loss 0.0128294, throughput 4.84425K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130041, dev acc 0.6372, dev avg loss 0.64255, throughput 4.88807K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0126786, throughput 4.96171K wps
[Epoch 4 Batch 60/62] avg loss 0.0127723, throughput 4.83611K wps
Begin Testing...
[Epoch 4] train avg loss 0.0128726, dev acc 0.6372, dev avg loss 0.638311, throughput 4.90359K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126234, throughput 4.92492K wps
[Epoch 5 Batch 60/62] avg loss 0.0124364, throughput 4.82995K wps
Begin Testing...
[Epoch 5] train avg loss 0.012706, dev acc 0.6372, dev avg loss 0.634129, throughput 4.87534K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0124799, throughput 4.79373K wps
[Epoch 6 Batch 60/62] avg loss 0.0123978, throughput 4.68229K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125991, dev acc 0.6372, dev avg loss 0.629978, throughput 4.74414K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0121903, throughput 4.37484K wps
[Epoch 7 Batch 60/62] avg loss 0.0123959, throughput 4.62811K wps
Begin Testing...
[Epoch 7] train avg loss 0.0124428, dev acc 0.6372, dev avg loss 0.624923, throughput 4.51607K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0122846, throughput 4.96264K wps
[Epoch 8 Batch 60/62] avg loss 0.0121282, throughput 4.79527K wps
Begin Testing...
[Epoch 8] train avg loss 0.0123774, dev acc 0.6372, dev avg loss 0.619865, throughput 4.88361K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0121771, throughput 4.95395K wps
[Epoch 9 Batch 60/62] avg loss 0.0119131, throughput 4.84895K wps
Begin Testing...
[Epoch 9] train avg loss 0.0121916, dev acc 0.6372, dev avg loss 0.614844, throughput 4.90712K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0120003, throughput 4.93726K wps
[Epoch 10 Batch 60/62] avg loss 0.0120468, throughput 4.82593K wps
Begin Testing...
[Epoch 10] train avg loss 0.0121914, dev acc 0.6490, dev avg loss 0.60958, throughput 4.88997K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0118796, throughput 4.94148K wps
[Epoch 11 Batch 60/62] avg loss 0.0117989, throughput 4.80922K wps
Begin Testing...
[Epoch 11] train avg loss 0.0119515, dev acc 0.6372, dev avg loss 0.604868, throughput 4.88105K wps
[Epoch 12 Batch 30/62] avg loss 0.0117181, throughput 4.9344K wps
[Epoch 12 Batch 60/62] avg loss 0.0116583, throughput 4.83048K wps
Begin Testing...
[Epoch 12] train avg loss 0.0118604, dev acc 0.6490, dev avg loss 0.598476, throughput 4.88592K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0115948, throughput 4.93865K wps
[Epoch 13 Batch 60/62] avg loss 0.0113715, throughput 4.84105K wps
Begin Testing...
[Epoch 13] train avg loss 0.0115705, dev acc 0.6490, dev avg loss 0.593178, throughput 4.89779K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0112728, throughput 4.95158K wps
[Epoch 14 Batch 60/62] avg loss 0.0111997, throughput 4.8281K wps
Begin Testing...
[Epoch 14] train avg loss 0.0113931, dev acc 0.6549, dev avg loss 0.585561, throughput 4.89568K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0113973, throughput 4.9172K wps
[Epoch 15 Batch 60/62] avg loss 0.0110716, throughput 4.85717K wps
Begin Testing...
[Epoch 15] train avg loss 0.0114255, dev acc 0.6962, dev avg loss 0.579586, throughput 4.89334K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0108623, throughput 4.96316K wps
[Epoch 16 Batch 60/62] avg loss 0.0113374, throughput 4.86042K wps
Begin Testing...
[Epoch 16] train avg loss 0.0112716, dev acc 0.7139, dev avg loss 0.573322, throughput 4.91661K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0108886, throughput 4.94509K wps
[Epoch 17 Batch 60/62] avg loss 0.0108646, throughput 4.83962K wps
Begin Testing...
[Epoch 17] train avg loss 0.0110222, dev acc 0.6991, dev avg loss 0.565283, throughput 4.89782K wps
[Epoch 18 Batch 30/62] avg loss 0.010731, throughput 4.96271K wps
[Epoch 18 Batch 60/62] avg loss 0.0106331, throughput 4.83561K wps
Begin Testing...
[Epoch 18] train avg loss 0.0108652, dev acc 0.7139, dev avg loss 0.561672, throughput 4.90353K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0105682, throughput 4.94056K wps
[Epoch 19 Batch 60/62] avg loss 0.0103771, throughput 4.82665K wps
Begin Testing...
[Epoch 19] train avg loss 0.0106028, dev acc 0.7168, dev avg loss 0.550879, throughput 4.88809K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0102601, throughput 4.91329K wps
[Epoch 20 Batch 60/62] avg loss 0.0103047, throughput 4.82117K wps
Begin Testing...
[Epoch 20] train avg loss 0.0104176, dev acc 0.7227, dev avg loss 0.543837, throughput 4.87274K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0100937, throughput 4.93931K wps
[Epoch 21 Batch 60/62] avg loss 0.0100922, throughput 4.84118K wps
Begin Testing...
[Epoch 21] train avg loss 0.0102422, dev acc 0.7345, dev avg loss 0.537139, throughput 4.8972K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00992095, throughput 4.94586K wps
[Epoch 22 Batch 60/62] avg loss 0.00988475, throughput 4.82724K wps
Begin Testing...
[Epoch 22] train avg loss 0.0100151, dev acc 0.7198, dev avg loss 0.530422, throughput 4.89234K wps
[Epoch 23 Batch 30/62] avg loss 0.00997493, throughput 4.95337K wps
[Epoch 23 Batch 60/62] avg loss 0.00965959, throughput 4.83017K wps
Begin Testing...
[Epoch 23] train avg loss 0.00998559, dev acc 0.7227, dev avg loss 0.524708, throughput 4.89973K wps
[Epoch 24 Batch 30/62] avg loss 0.00972962, throughput 4.94553K wps
[Epoch 24 Batch 60/62] avg loss 0.00943127, throughput 4.83315K wps
Begin Testing...
[Epoch 24] train avg loss 0.00968126, dev acc 0.7434, dev avg loss 0.517007, throughput 4.89504K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00953306, throughput 4.93025K wps
[Epoch 25 Batch 60/62] avg loss 0.00934566, throughput 4.80385K wps
Begin Testing...
[Epoch 25] train avg loss 0.00956199, dev acc 0.7286, dev avg loss 0.510836, throughput 4.87335K wps
[Epoch 26 Batch 30/62] avg loss 0.00907633, throughput 4.9455K wps
[Epoch 26 Batch 60/62] avg loss 0.00952389, throughput 4.84178K wps
Begin Testing...
[Epoch 26] train avg loss 0.00938982, dev acc 0.7345, dev avg loss 0.505575, throughput 4.89909K wps
[Epoch 27 Batch 30/62] avg loss 0.00928468, throughput 4.94761K wps
[Epoch 27 Batch 60/62] avg loss 0.00899859, throughput 4.8399K wps
Begin Testing...
[Epoch 27] train avg loss 0.00923328, dev acc 0.7434, dev avg loss 0.499525, throughput 4.89968K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00892942, throughput 4.94443K wps
[Epoch 28 Batch 60/62] avg loss 0.00905691, throughput 4.84321K wps
Begin Testing...
[Epoch 28] train avg loss 0.00906026, dev acc 0.7404, dev avg loss 0.494729, throughput 4.90138K wps
[Epoch 29 Batch 30/62] avg loss 0.00882908, throughput 4.9525K wps
[Epoch 29 Batch 60/62] avg loss 0.0088607, throughput 4.85042K wps
Begin Testing...
[Epoch 29] train avg loss 0.00898893, dev acc 0.7434, dev avg loss 0.490111, throughput 4.90763K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00874423, throughput 4.98039K wps
[Epoch 30 Batch 60/62] avg loss 0.00839061, throughput 4.86983K wps
Begin Testing...
[Epoch 30] train avg loss 0.00865087, dev acc 0.7434, dev avg loss 0.485349, throughput 4.93074K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00830643, throughput 4.97675K wps
[Epoch 31 Batch 60/62] avg loss 0.00873809, throughput 4.87517K wps
Begin Testing...
[Epoch 31] train avg loss 0.0086694, dev acc 0.7375, dev avg loss 0.482564, throughput 4.93307K wps
[Epoch 32 Batch 30/62] avg loss 0.00835041, throughput 4.96316K wps
[Epoch 32 Batch 60/62] avg loss 0.0083072, throughput 4.82533K wps
Begin Testing...
[Epoch 32] train avg loss 0.00840796, dev acc 0.7404, dev avg loss 0.477304, throughput 4.89862K wps
[Epoch 33 Batch 30/62] avg loss 0.0081551, throughput 4.92545K wps
[Epoch 33 Batch 60/62] avg loss 0.00829042, throughput 4.83263K wps
Begin Testing...
[Epoch 33] train avg loss 0.00833375, dev acc 0.7670, dev avg loss 0.472417, throughput 4.88726K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00809826, throughput 4.92582K wps
[Epoch 34 Batch 60/62] avg loss 0.00823976, throughput 4.82162K wps
Begin Testing...
[Epoch 34] train avg loss 0.00821343, dev acc 0.7375, dev avg loss 0.471272, throughput 4.88108K wps
[Epoch 35 Batch 30/62] avg loss 0.00789119, throughput 4.93705K wps
[Epoch 35 Batch 60/62] avg loss 0.00797774, throughput 4.82862K wps
Begin Testing...
[Epoch 35] train avg loss 0.00812172, dev acc 0.7581, dev avg loss 0.465611, throughput 4.88903K wps
[Epoch 36 Batch 30/62] avg loss 0.00769126, throughput 4.93575K wps
[Epoch 36 Batch 60/62] avg loss 0.00796227, throughput 4.82045K wps
Begin Testing...
[Epoch 36] train avg loss 0.00790419, dev acc 0.7434, dev avg loss 0.465442, throughput 4.88425K wps
[Epoch 37 Batch 30/62] avg loss 0.00782121, throughput 4.93549K wps
[Epoch 37 Batch 60/62] avg loss 0.00761437, throughput 4.81188K wps
Begin Testing...
[Epoch 37] train avg loss 0.00779516, dev acc 0.7434, dev avg loss 0.46251, throughput 4.87863K wps
[Epoch 38 Batch 30/62] avg loss 0.00744394, throughput 4.92002K wps
[Epoch 38 Batch 60/62] avg loss 0.00772991, throughput 4.80913K wps
Begin Testing...
[Epoch 38] train avg loss 0.00768352, dev acc 0.7788, dev avg loss 0.459294, throughput 4.87127K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00737345, throughput 4.97096K wps
[Epoch 39 Batch 60/62] avg loss 0.007457, throughput 4.86727K wps
Begin Testing...
[Epoch 39] train avg loss 0.00755137, dev acc 0.7758, dev avg loss 0.453665, throughput 4.92608K wps
[Epoch 40 Batch 30/62] avg loss 0.00737379, throughput 4.9837K wps
[Epoch 40 Batch 60/62] avg loss 0.00726226, throughput 4.87156K wps
Begin Testing...
[Epoch 40] train avg loss 0.00738417, dev acc 0.7906, dev avg loss 0.451054, throughput 4.93344K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00715502, throughput 4.98009K wps
[Epoch 41 Batch 60/62] avg loss 0.00745742, throughput 4.861K wps
Begin Testing...
[Epoch 41] train avg loss 0.00738155, dev acc 0.7611, dev avg loss 0.452702, throughput 4.92783K wps
[Epoch 42 Batch 30/62] avg loss 0.00702534, throughput 4.98742K wps
[Epoch 42 Batch 60/62] avg loss 0.00747432, throughput 4.8624K wps
Begin Testing...
[Epoch 42] train avg loss 0.0073331, dev acc 0.7847, dev avg loss 0.446736, throughput 4.92998K wps
[Epoch 43 Batch 30/62] avg loss 0.00692755, throughput 4.98862K wps
[Epoch 43 Batch 60/62] avg loss 0.00695682, throughput 4.88248K wps
Begin Testing...
[Epoch 43] train avg loss 0.00700193, dev acc 0.7699, dev avg loss 0.445013, throughput 4.94176K wps
[Epoch 44 Batch 30/62] avg loss 0.00670062, throughput 4.9822K wps
[Epoch 44 Batch 60/62] avg loss 0.00712595, throughput 4.88549K wps
Begin Testing...
[Epoch 44] train avg loss 0.00699366, dev acc 0.7817, dev avg loss 0.44334, throughput 4.94104K wps
[Epoch 45 Batch 30/62] avg loss 0.00686528, throughput 4.97675K wps
[Epoch 45 Batch 60/62] avg loss 0.00677299, throughput 4.83285K wps
Begin Testing...
[Epoch 45] train avg loss 0.0068584, dev acc 0.7729, dev avg loss 0.440649, throughput 4.90872K wps
[Epoch 46 Batch 30/62] avg loss 0.00640212, throughput 4.95008K wps
[Epoch 46 Batch 60/62] avg loss 0.00668036, throughput 4.81475K wps
Begin Testing...
[Epoch 46] train avg loss 0.0066247, dev acc 0.7876, dev avg loss 0.438343, throughput 4.88771K wps
[Epoch 47 Batch 30/62] avg loss 0.00660311, throughput 4.93405K wps
[Epoch 47 Batch 60/62] avg loss 0.00646501, throughput 4.81428K wps
Begin Testing...
[Epoch 47] train avg loss 0.00659378, dev acc 0.7788, dev avg loss 0.436129, throughput 4.8808K wps
[Epoch 48 Batch 30/62] avg loss 0.00639406, throughput 4.91493K wps
[Epoch 48 Batch 60/62] avg loss 0.00637038, throughput 4.80639K wps
Begin Testing...
[Epoch 48] train avg loss 0.00645187, dev acc 0.7758, dev avg loss 0.436345, throughput 4.86633K wps
[Epoch 49 Batch 30/62] avg loss 0.0065051, throughput 4.8828K wps
[Epoch 49 Batch 60/62] avg loss 0.00611506, throughput 4.8076K wps
Begin Testing...
[Epoch 49] train avg loss 0.0064, dev acc 0.7906, dev avg loss 0.431317, throughput 4.85318K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00648552, throughput 4.92371K wps
[Epoch 50 Batch 60/62] avg loss 0.00620265, throughput 4.8164K wps
Begin Testing...
[Epoch 50] train avg loss 0.00638058, dev acc 0.7788, dev avg loss 0.435056, throughput 4.87537K wps
[Epoch 51 Batch 30/62] avg loss 0.00615561, throughput 4.9197K wps
[Epoch 51 Batch 60/62] avg loss 0.00605422, throughput 4.81603K wps
Begin Testing...
[Epoch 51] train avg loss 0.00617264, dev acc 0.7906, dev avg loss 0.42798, throughput 4.87347K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00608989, throughput 4.92474K wps
[Epoch 52 Batch 60/62] avg loss 0.00612328, throughput 4.81134K wps
Begin Testing...
[Epoch 52] train avg loss 0.0061573, dev acc 0.7906, dev avg loss 0.426614, throughput 4.87399K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00613385, throughput 4.91569K wps
[Epoch 53 Batch 60/62] avg loss 0.00584593, throughput 4.82171K wps
Begin Testing...
[Epoch 53] train avg loss 0.00609652, dev acc 0.7817, dev avg loss 0.426982, throughput 4.87503K wps
[Epoch 54 Batch 30/62] avg loss 0.00585307, throughput 4.935K wps
[Epoch 54 Batch 60/62] avg loss 0.00584116, throughput 4.82292K wps
Begin Testing...
[Epoch 54] train avg loss 0.00590052, dev acc 0.7847, dev avg loss 0.426442, throughput 4.88429K wps
[Epoch 55 Batch 30/62] avg loss 0.00565125, throughput 4.93897K wps
[Epoch 55 Batch 60/62] avg loss 0.00573727, throughput 4.8622K wps
Begin Testing...
[Epoch 55] train avg loss 0.00583393, dev acc 0.7935, dev avg loss 0.424057, throughput 4.90746K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00569146, throughput 4.93934K wps
[Epoch 56 Batch 60/62] avg loss 0.00553693, throughput 4.84011K wps
Begin Testing...
[Epoch 56] train avg loss 0.00570647, dev acc 0.7876, dev avg loss 0.423241, throughput 4.89687K wps
[Epoch 57 Batch 30/62] avg loss 0.00545584, throughput 4.97103K wps
[Epoch 57 Batch 60/62] avg loss 0.00585332, throughput 4.88425K wps
Begin Testing...
[Epoch 57] train avg loss 0.00571544, dev acc 0.7847, dev avg loss 0.420496, throughput 4.93435K wps
[Epoch 58 Batch 30/62] avg loss 0.00543689, throughput 4.98044K wps
[Epoch 58 Batch 60/62] avg loss 0.0054878, throughput 4.88719K wps
Begin Testing...
[Epoch 58] train avg loss 0.00549895, dev acc 0.7935, dev avg loss 0.417901, throughput 4.93885K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00531911, throughput 5.0013K wps
[Epoch 59 Batch 60/62] avg loss 0.00544832, throughput 4.8692K wps
Begin Testing...
[Epoch 59] train avg loss 0.00544736, dev acc 0.7935, dev avg loss 0.416824, throughput 4.94126K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00504661, throughput 4.98266K wps
[Epoch 60 Batch 60/62] avg loss 0.00539697, throughput 4.89258K wps
Begin Testing...
[Epoch 60] train avg loss 0.00529213, dev acc 0.7906, dev avg loss 0.416071, throughput 4.94371K wps
[Epoch 61 Batch 30/62] avg loss 0.00530804, throughput 4.98295K wps
[Epoch 61 Batch 60/62] avg loss 0.00508263, throughput 4.83853K wps
Begin Testing...
[Epoch 61] train avg loss 0.00522678, dev acc 0.7994, dev avg loss 0.414187, throughput 4.91503K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00504957, throughput 4.93522K wps
[Epoch 62 Batch 60/62] avg loss 0.00510101, throughput 4.80079K wps
Begin Testing...
[Epoch 62] train avg loss 0.00520337, dev acc 0.7935, dev avg loss 0.414048, throughput 4.87308K wps
[Epoch 63 Batch 30/62] avg loss 0.00488011, throughput 4.93389K wps
[Epoch 63 Batch 60/62] avg loss 0.0052202, throughput 4.80206K wps
Begin Testing...
[Epoch 63] train avg loss 0.00515046, dev acc 0.7906, dev avg loss 0.412618, throughput 4.87438K wps
[Epoch 64 Batch 30/62] avg loss 0.00483845, throughput 4.92132K wps
[Epoch 64 Batch 60/62] avg loss 0.00502515, throughput 4.81619K wps
Begin Testing...
[Epoch 64] train avg loss 0.00495068, dev acc 0.7935, dev avg loss 0.412102, throughput 4.87568K wps
[Epoch 65 Batch 30/62] avg loss 0.00481287, throughput 4.9322K wps
[Epoch 65 Batch 60/62] avg loss 0.00490264, throughput 4.82885K wps
Begin Testing...
[Epoch 65] train avg loss 0.00496641, dev acc 0.8053, dev avg loss 0.418704, throughput 4.88616K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00467902, throughput 4.9391K wps
[Epoch 66 Batch 60/62] avg loss 0.00470211, throughput 4.82495K wps
Begin Testing...
[Epoch 66] train avg loss 0.00474762, dev acc 0.7965, dev avg loss 0.411497, throughput 4.88862K wps
[Epoch 67 Batch 30/62] avg loss 0.0047562, throughput 4.92613K wps
[Epoch 67 Batch 60/62] avg loss 0.00462049, throughput 4.8139K wps
Begin Testing...
[Epoch 67] train avg loss 0.00471963, dev acc 0.7994, dev avg loss 0.410209, throughput 4.87687K wps
[Epoch 68 Batch 30/62] avg loss 0.00445426, throughput 4.93308K wps
[Epoch 68 Batch 60/62] avg loss 0.0046298, throughput 4.84473K wps
Begin Testing...
[Epoch 68] train avg loss 0.00459832, dev acc 0.7935, dev avg loss 0.408507, throughput 4.89701K wps
[Epoch 69 Batch 30/62] avg loss 0.00464487, throughput 4.99189K wps
[Epoch 69 Batch 60/62] avg loss 0.00429537, throughput 4.86371K wps
Begin Testing...
[Epoch 69] train avg loss 0.00452614, dev acc 0.7994, dev avg loss 0.407721, throughput 4.93426K wps
[Epoch 70 Batch 30/62] avg loss 0.00426489, throughput 4.97545K wps
[Epoch 70 Batch 60/62] avg loss 0.00444868, throughput 4.82434K wps
Begin Testing...
[Epoch 70] train avg loss 0.00437552, dev acc 0.8024, dev avg loss 0.408463, throughput 4.90606K wps
[Epoch 71 Batch 30/62] avg loss 0.00425345, throughput 4.95247K wps
[Epoch 71 Batch 60/62] avg loss 0.00432024, throughput 4.8293K wps
Begin Testing...
[Epoch 71] train avg loss 0.00436143, dev acc 0.7965, dev avg loss 0.406426, throughput 4.89527K wps
[Epoch 72 Batch 30/62] avg loss 0.00420103, throughput 4.94874K wps
[Epoch 72 Batch 60/62] avg loss 0.00427249, throughput 4.87087K wps
Begin Testing...
[Epoch 72] train avg loss 0.00431095, dev acc 0.7994, dev avg loss 0.41164, throughput 4.91802K wps
[Epoch 73 Batch 30/62] avg loss 0.00420223, throughput 4.97549K wps
[Epoch 73 Batch 60/62] avg loss 0.00418222, throughput 4.86391K wps
Begin Testing...
[Epoch 73] train avg loss 0.00425525, dev acc 0.7965, dev avg loss 0.405059, throughput 4.926K wps
[Epoch 74 Batch 30/62] avg loss 0.00413097, throughput 4.96214K wps
[Epoch 74 Batch 60/62] avg loss 0.00404736, throughput 4.8315K wps
Begin Testing...
[Epoch 74] train avg loss 0.0041412, dev acc 0.7965, dev avg loss 0.404888, throughput 4.90268K wps
[Epoch 75 Batch 30/62] avg loss 0.00389628, throughput 4.9392K wps
[Epoch 75 Batch 60/62] avg loss 0.0043514, throughput 4.84072K wps
Begin Testing...
[Epoch 75] train avg loss 0.00413485, dev acc 0.7994, dev avg loss 0.404551, throughput 4.89702K wps
[Epoch 76 Batch 30/62] avg loss 0.0039169, throughput 4.9499K wps
[Epoch 76 Batch 60/62] avg loss 0.00400583, throughput 4.80024K wps
Begin Testing...
[Epoch 76] train avg loss 0.00399436, dev acc 0.7994, dev avg loss 0.403852, throughput 4.881K wps
[Epoch 77 Batch 30/62] avg loss 0.00402453, throughput 4.9366K wps
[Epoch 77 Batch 60/62] avg loss 0.00374487, throughput 4.82652K wps
Begin Testing...
[Epoch 77] train avg loss 0.00396336, dev acc 0.8053, dev avg loss 0.402713, throughput 4.88698K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/62] avg loss 0.00372348, throughput 4.92524K wps
[Epoch 78 Batch 60/62] avg loss 0.0039152, throughput 4.81877K wps
Begin Testing...
[Epoch 78] train avg loss 0.00383342, dev acc 0.8024, dev avg loss 0.403372, throughput 4.87807K wps
[Epoch 79 Batch 30/62] avg loss 0.00360691, throughput 4.9172K wps
[Epoch 79 Batch 60/62] avg loss 0.00380993, throughput 4.81862K wps
Begin Testing...
[Epoch 79] train avg loss 0.00377186, dev acc 0.8024, dev avg loss 0.401945, throughput 4.87375K wps
[Epoch 80 Batch 30/62] avg loss 0.00363161, throughput 4.91291K wps
[Epoch 80 Batch 60/62] avg loss 0.00372225, throughput 4.83944K wps
Begin Testing...
[Epoch 80] train avg loss 0.00373056, dev acc 0.7965, dev avg loss 0.402562, throughput 4.88255K wps
[Epoch 81 Batch 30/62] avg loss 0.00358091, throughput 4.93753K wps
[Epoch 81 Batch 60/62] avg loss 0.00379092, throughput 4.85232K wps
Begin Testing...
[Epoch 81] train avg loss 0.0037086, dev acc 0.8112, dev avg loss 0.407002, throughput 4.90013K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00341959, throughput 4.93654K wps
[Epoch 82 Batch 60/62] avg loss 0.00362189, throughput 4.86557K wps
Begin Testing...
[Epoch 82] train avg loss 0.00354048, dev acc 0.8112, dev avg loss 0.402683, throughput 4.90737K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.00362322, throughput 4.93393K wps
[Epoch 83 Batch 60/62] avg loss 0.00351045, throughput 4.83923K wps
Begin Testing...
[Epoch 83] train avg loss 0.00359019, dev acc 0.8053, dev avg loss 0.400379, throughput 4.89349K wps
[Epoch 84 Batch 30/62] avg loss 0.00348498, throughput 4.94791K wps
[Epoch 84 Batch 60/62] avg loss 0.00347922, throughput 4.82548K wps
Begin Testing...
[Epoch 84] train avg loss 0.00347214, dev acc 0.8083, dev avg loss 0.403754, throughput 4.89166K wps
[Epoch 85 Batch 30/62] avg loss 0.00357113, throughput 4.93768K wps
[Epoch 85 Batch 60/62] avg loss 0.00326591, throughput 4.82296K wps
Begin Testing...
[Epoch 85] train avg loss 0.00349396, dev acc 0.8171, dev avg loss 0.40066, throughput 4.88639K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/62] avg loss 0.0032253, throughput 4.95436K wps
[Epoch 86 Batch 60/62] avg loss 0.00351155, throughput 4.88762K wps
Begin Testing...
[Epoch 86] train avg loss 0.00343281, dev acc 0.8083, dev avg loss 0.399794, throughput 4.92818K wps
[Epoch 87 Batch 30/62] avg loss 0.00320541, throughput 4.97434K wps
[Epoch 87 Batch 60/62] avg loss 0.00344333, throughput 4.8698K wps
Begin Testing...
[Epoch 87] train avg loss 0.00341011, dev acc 0.7994, dev avg loss 0.399896, throughput 4.9276K wps
[Epoch 88 Batch 30/62] avg loss 0.0033472, throughput 5.00638K wps
[Epoch 88 Batch 60/62] avg loss 0.00313657, throughput 4.87398K wps
Begin Testing...
[Epoch 88] train avg loss 0.0032611, dev acc 0.8024, dev avg loss 0.401921, throughput 4.94487K wps
[Epoch 89 Batch 30/62] avg loss 0.00304958, throughput 4.95111K wps
[Epoch 89 Batch 60/62] avg loss 0.00313891, throughput 4.83242K wps
Begin Testing...
[Epoch 89] train avg loss 0.003154, dev acc 0.7994, dev avg loss 0.39916, throughput 4.89843K wps
[Epoch 90 Batch 30/62] avg loss 0.00319146, throughput 4.93611K wps
[Epoch 90 Batch 60/62] avg loss 0.0030831, throughput 4.82163K wps
Begin Testing...
[Epoch 90] train avg loss 0.00316246, dev acc 0.8053, dev avg loss 0.398907, throughput 4.88463K wps
[Epoch 91 Batch 30/62] avg loss 0.00284124, throughput 4.92377K wps
[Epoch 91 Batch 60/62] avg loss 0.00317324, throughput 4.80262K wps
Begin Testing...
[Epoch 91] train avg loss 0.00308219, dev acc 0.8171, dev avg loss 0.403204, throughput 4.86933K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/62] avg loss 0.00288564, throughput 4.87798K wps
[Epoch 92 Batch 60/62] avg loss 0.00306204, throughput 4.81344K wps
Begin Testing...
[Epoch 92] train avg loss 0.00300929, dev acc 0.8024, dev avg loss 0.39859, throughput 4.85389K wps
[Epoch 93 Batch 30/62] avg loss 0.00304639, throughput 4.89393K wps
[Epoch 93 Batch 60/62] avg loss 0.00284405, throughput 4.79452K wps
Begin Testing...
[Epoch 93] train avg loss 0.00296548, dev acc 0.8053, dev avg loss 0.400506, throughput 4.85095K wps
[Epoch 94 Batch 30/62] avg loss 0.00284129, throughput 4.90905K wps
[Epoch 94 Batch 60/62] avg loss 0.00303574, throughput 4.80948K wps
Begin Testing...
[Epoch 94] train avg loss 0.00297128, dev acc 0.8083, dev avg loss 0.40273, throughput 4.86478K wps
[Epoch 95 Batch 30/62] avg loss 0.00286424, throughput 4.9441K wps
[Epoch 95 Batch 60/62] avg loss 0.00281804, throughput 4.8151K wps
Begin Testing...
[Epoch 95] train avg loss 0.00286033, dev acc 0.8053, dev avg loss 0.399327, throughput 4.88453K wps
[Epoch 96 Batch 30/62] avg loss 0.002682, throughput 4.93968K wps
[Epoch 96 Batch 60/62] avg loss 0.00280621, throughput 4.82469K wps
Begin Testing...
[Epoch 96] train avg loss 0.00277098, dev acc 0.8053, dev avg loss 0.399787, throughput 4.88719K wps
[Epoch 97 Batch 30/62] avg loss 0.00276861, throughput 4.93674K wps
[Epoch 97 Batch 60/62] avg loss 0.00273016, throughput 4.83938K wps
Begin Testing...
[Epoch 97] train avg loss 0.00276745, dev acc 0.8053, dev avg loss 0.399933, throughput 4.89485K wps
[Epoch 98 Batch 30/62] avg loss 0.00266061, throughput 4.94148K wps
[Epoch 98 Batch 60/62] avg loss 0.00264135, throughput 4.85929K wps
Begin Testing...
[Epoch 98] train avg loss 0.00269753, dev acc 0.8112, dev avg loss 0.398668, throughput 4.90815K wps
[Epoch 99 Batch 30/62] avg loss 0.00261924, throughput 4.9367K wps
[Epoch 99 Batch 60/62] avg loss 0.00257474, throughput 4.82314K wps
Begin Testing...
[Epoch 99] train avg loss 0.00265475, dev acc 0.8112, dev avg loss 0.400058, throughput 4.88547K wps
[Epoch 100 Batch 30/62] avg loss 0.00260291, throughput 4.91384K wps
[Epoch 100 Batch 60/62] avg loss 0.00257685, throughput 4.85946K wps
Begin Testing...
[Epoch 100] train avg loss 0.00264481, dev acc 0.8230, dev avg loss 0.401613, throughput 4.89451K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00273689, throughput 4.99194K wps
[Epoch 101 Batch 60/62] avg loss 0.00247731, throughput 4.86389K wps
Begin Testing...
[Epoch 101] train avg loss 0.00264166, dev acc 0.8142, dev avg loss 0.399851, throughput 4.93307K wps
[Epoch 102 Batch 30/62] avg loss 0.00246808, throughput 4.94302K wps
[Epoch 102 Batch 60/62] avg loss 0.00255604, throughput 4.83942K wps
Begin Testing...
[Epoch 102] train avg loss 0.00254966, dev acc 0.8112, dev avg loss 0.402226, throughput 4.89653K wps
[Epoch 103 Batch 30/62] avg loss 0.00244403, throughput 4.94818K wps
[Epoch 103 Batch 60/62] avg loss 0.002531, throughput 4.81661K wps
Begin Testing...
[Epoch 103] train avg loss 0.00248259, dev acc 0.8171, dev avg loss 0.401456, throughput 4.88857K wps
[Epoch 104 Batch 30/62] avg loss 0.00245059, throughput 4.91035K wps
[Epoch 104 Batch 60/62] avg loss 0.00245823, throughput 4.80514K wps
Begin Testing...
[Epoch 104] train avg loss 0.00250326, dev acc 0.8083, dev avg loss 0.400125, throughput 4.86438K wps
[Epoch 105 Batch 30/62] avg loss 0.00245116, throughput 4.91432K wps
[Epoch 105 Batch 60/62] avg loss 0.00236679, throughput 4.797K wps
Begin Testing...
[Epoch 105] train avg loss 0.00245955, dev acc 0.8083, dev avg loss 0.401209, throughput 4.86226K wps
[Epoch 106 Batch 30/62] avg loss 0.00245967, throughput 4.92792K wps
[Epoch 106 Batch 60/62] avg loss 0.00241958, throughput 4.83744K wps
Begin Testing...
[Epoch 106] train avg loss 0.00245092, dev acc 0.8053, dev avg loss 0.400825, throughput 4.88899K wps
[Epoch 107 Batch 30/62] avg loss 0.00232097, throughput 4.96835K wps
[Epoch 107 Batch 60/62] avg loss 0.00231263, throughput 4.86626K wps
Begin Testing...
[Epoch 107] train avg loss 0.00237091, dev acc 0.8142, dev avg loss 0.401292, throughput 4.92277K wps
[Epoch 108 Batch 30/62] avg loss 0.00218261, throughput 4.98262K wps
[Epoch 108 Batch 60/62] avg loss 0.00245994, throughput 4.86717K wps
Begin Testing...
[Epoch 108] train avg loss 0.00233806, dev acc 0.8112, dev avg loss 0.40157, throughput 4.93093K wps
[Epoch 109 Batch 30/62] avg loss 0.00214847, throughput 4.98316K wps
[Epoch 109 Batch 60/62] avg loss 0.00232461, throughput 4.85921K wps
Begin Testing...
[Epoch 109] train avg loss 0.00225533, dev acc 0.8171, dev avg loss 0.403396, throughput 4.92652K wps
[Epoch 110 Batch 30/62] avg loss 0.00214925, throughput 4.94703K wps
[Epoch 110 Batch 60/62] avg loss 0.00225157, throughput 4.83554K wps
Begin Testing...
[Epoch 110] train avg loss 0.00225002, dev acc 0.8201, dev avg loss 0.405584, throughput 4.89719K wps
[Epoch 111 Batch 30/62] avg loss 0.00210567, throughput 4.95789K wps
[Epoch 111 Batch 60/62] avg loss 0.00213322, throughput 4.84157K wps
Begin Testing...
[Epoch 111] train avg loss 0.00217526, dev acc 0.8142, dev avg loss 0.402819, throughput 4.90561K wps
[Epoch 112 Batch 30/62] avg loss 0.00215706, throughput 4.95956K wps
[Epoch 112 Batch 60/62] avg loss 0.0022813, throughput 4.85516K wps
Begin Testing...
[Epoch 112] train avg loss 0.00223246, dev acc 0.8112, dev avg loss 0.403293, throughput 4.91278K wps
[Epoch 113 Batch 30/62] avg loss 0.00204791, throughput 4.9515K wps
[Epoch 113 Batch 60/62] avg loss 0.00214596, throughput 4.81977K wps
Begin Testing...
[Epoch 113] train avg loss 0.00213818, dev acc 0.8083, dev avg loss 0.404342, throughput 4.89177K wps
[Epoch 114 Batch 30/62] avg loss 0.00203295, throughput 4.93008K wps
[Epoch 114 Batch 60/62] avg loss 0.0019987, throughput 4.8103K wps
Begin Testing...
[Epoch 114] train avg loss 0.00205839, dev acc 0.8171, dev avg loss 0.40833, throughput 4.87685K wps
[Epoch 115 Batch 30/62] avg loss 0.0021643, throughput 4.89277K wps
[Epoch 115 Batch 60/62] avg loss 0.00200021, throughput 4.84323K wps
Begin Testing...
[Epoch 115] train avg loss 0.00210126, dev acc 0.8142, dev avg loss 0.403812, throughput 4.87394K wps
[Epoch 116 Batch 30/62] avg loss 0.00201431, throughput 4.93547K wps
[Epoch 116 Batch 60/62] avg loss 0.00202221, throughput 4.80507K wps
Begin Testing...
[Epoch 116] train avg loss 0.0020591, dev acc 0.8142, dev avg loss 0.406041, throughput 4.87748K wps
[Epoch 117 Batch 30/62] avg loss 0.00196542, throughput 4.94477K wps
[Epoch 117 Batch 60/62] avg loss 0.00199098, throughput 4.82016K wps
Begin Testing...
[Epoch 117] train avg loss 0.00200254, dev acc 0.8171, dev avg loss 0.40592, throughput 4.88772K wps
[Epoch 118 Batch 30/62] avg loss 0.00199991, throughput 4.92471K wps
[Epoch 118 Batch 60/62] avg loss 0.0019675, throughput 4.81119K wps
Begin Testing...
[Epoch 118] train avg loss 0.00199394, dev acc 0.8171, dev avg loss 0.408684, throughput 4.87451K wps
[Epoch 119 Batch 30/62] avg loss 0.00203604, throughput 4.92393K wps
[Epoch 119 Batch 60/62] avg loss 0.00185689, throughput 4.81257K wps
Begin Testing...
[Epoch 119] train avg loss 0.00199259, dev acc 0.8201, dev avg loss 0.40845, throughput 4.87452K wps
[Epoch 120 Batch 30/62] avg loss 0.00186694, throughput 4.90486K wps
[Epoch 120 Batch 60/62] avg loss 0.00189741, throughput 4.79164K wps
Begin Testing...
[Epoch 120] train avg loss 0.00191043, dev acc 0.8201, dev avg loss 0.408325, throughput 4.85079K wps
[Epoch 121 Batch 30/62] avg loss 0.00177509, throughput 4.91359K wps
[Epoch 121 Batch 60/62] avg loss 0.00187273, throughput 4.81126K wps
Begin Testing...
[Epoch 121] train avg loss 0.00185189, dev acc 0.8230, dev avg loss 0.409625, throughput 4.86995K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/62] avg loss 0.00183536, throughput 4.92576K wps
[Epoch 122 Batch 60/62] avg loss 0.00186271, throughput 4.80947K wps
Begin Testing...
[Epoch 122] train avg loss 0.00185319, dev acc 0.8112, dev avg loss 0.407253, throughput 4.87392K wps
[Epoch 123 Batch 30/62] avg loss 0.0018143, throughput 4.93396K wps
[Epoch 123 Batch 60/62] avg loss 0.00171863, throughput 4.83571K wps
Begin Testing...
[Epoch 123] train avg loss 0.00176556, dev acc 0.8142, dev avg loss 0.408033, throughput 4.89179K wps
[Epoch 124 Batch 30/62] avg loss 0.00176602, throughput 4.9401K wps
[Epoch 124 Batch 60/62] avg loss 0.0018552, throughput 4.85136K wps
Begin Testing...
[Epoch 124] train avg loss 0.00186109, dev acc 0.8201, dev avg loss 0.407848, throughput 4.90262K wps
[Epoch 125 Batch 30/62] avg loss 0.00177371, throughput 4.95628K wps
[Epoch 125 Batch 60/62] avg loss 0.00170149, throughput 4.83568K wps
Begin Testing...
[Epoch 125] train avg loss 0.00175845, dev acc 0.8142, dev avg loss 0.406372, throughput 4.90279K wps
[Epoch 126 Batch 30/62] avg loss 0.00182704, throughput 4.93073K wps
[Epoch 126 Batch 60/62] avg loss 0.00165369, throughput 4.82077K wps
Begin Testing...
[Epoch 126] train avg loss 0.00176805, dev acc 0.8171, dev avg loss 0.407174, throughput 4.88238K wps
[Epoch 127 Batch 30/62] avg loss 0.00169655, throughput 4.92989K wps
[Epoch 127 Batch 60/62] avg loss 0.00171651, throughput 4.81877K wps
Begin Testing...
[Epoch 127] train avg loss 0.00172212, dev acc 0.8201, dev avg loss 0.407556, throughput 4.8809K wps
[Epoch 128 Batch 30/62] avg loss 0.00166197, throughput 4.92073K wps
[Epoch 128 Batch 60/62] avg loss 0.0016612, throughput 4.8245K wps
Begin Testing...
[Epoch 128] train avg loss 0.00172645, dev acc 0.8260, dev avg loss 0.407096, throughput 4.87847K wps
Observed Improvement.
Begin Testing...
[Epoch 129 Batch 30/62] avg loss 0.00165866, throughput 4.93985K wps
[Epoch 129 Batch 60/62] avg loss 0.00173905, throughput 4.82769K wps
Begin Testing...
[Epoch 129] train avg loss 0.00171076, dev acc 0.8260, dev avg loss 0.408482, throughput 4.88985K wps
Observed Improvement.
Begin Testing...
[Epoch 130 Batch 30/62] avg loss 0.00167902, throughput 4.94411K wps
[Epoch 130 Batch 60/62] avg loss 0.00166501, throughput 4.83577K wps
Begin Testing...
[Epoch 130] train avg loss 0.00170428, dev acc 0.8201, dev avg loss 0.409378, throughput 4.89377K wps
[Epoch 131 Batch 30/62] avg loss 0.00163923, throughput 4.93794K wps
[Epoch 131 Batch 60/62] avg loss 0.00158189, throughput 4.83468K wps
Begin Testing...
[Epoch 131] train avg loss 0.00162725, dev acc 0.8171, dev avg loss 0.412034, throughput 4.89201K wps
[Epoch 132 Batch 30/62] avg loss 0.0016463, throughput 4.93032K wps
[Epoch 132 Batch 60/62] avg loss 0.00155324, throughput 4.81834K wps
Begin Testing...
[Epoch 132] train avg loss 0.00164797, dev acc 0.8260, dev avg loss 0.419009, throughput 4.88081K wps
Observed Improvement.
Begin Testing...
[Epoch 133 Batch 30/62] avg loss 0.00145594, throughput 4.92639K wps
[Epoch 133 Batch 60/62] avg loss 0.00164364, throughput 4.82716K wps
Begin Testing...
[Epoch 133] train avg loss 0.00157608, dev acc 0.8260, dev avg loss 0.410458, throughput 4.88286K wps
Observed Improvement.
Begin Testing...
[Epoch 134 Batch 30/62] avg loss 0.00154391, throughput 4.92676K wps
[Epoch 134 Batch 60/62] avg loss 0.00147854, throughput 4.82157K wps
Begin Testing...
[Epoch 134] train avg loss 0.0015245, dev acc 0.8319, dev avg loss 0.414659, throughput 4.87915K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/62] avg loss 0.00151295, throughput 4.93987K wps
[Epoch 135 Batch 60/62] avg loss 0.00161193, throughput 4.82986K wps
Begin Testing...
[Epoch 135] train avg loss 0.00156672, dev acc 0.8260, dev avg loss 0.413498, throughput 4.88935K wps
[Epoch 136 Batch 30/62] avg loss 0.00155415, throughput 4.92467K wps
[Epoch 136 Batch 60/62] avg loss 0.00148812, throughput 4.83304K wps
Begin Testing...
[Epoch 136] train avg loss 0.00158763, dev acc 0.8260, dev avg loss 0.41416, throughput 4.88481K wps
[Epoch 137 Batch 30/62] avg loss 0.00144511, throughput 4.93279K wps
[Epoch 137 Batch 60/62] avg loss 0.00160545, throughput 4.83169K wps
Begin Testing...
[Epoch 137] train avg loss 0.00153991, dev acc 0.8230, dev avg loss 0.411326, throughput 4.8898K wps
[Epoch 138 Batch 30/62] avg loss 0.00138395, throughput 4.94413K wps
[Epoch 138 Batch 60/62] avg loss 0.00150656, throughput 4.83674K wps
Begin Testing...
[Epoch 138] train avg loss 0.00145115, dev acc 0.8230, dev avg loss 0.414153, throughput 4.89614K wps
[Epoch 139 Batch 30/62] avg loss 0.00137001, throughput 4.94798K wps
[Epoch 139 Batch 60/62] avg loss 0.00151116, throughput 4.8171K wps
Begin Testing...
[Epoch 139] train avg loss 0.00144808, dev acc 0.8201, dev avg loss 0.414638, throughput 4.88796K wps
[Epoch 140 Batch 30/62] avg loss 0.00136126, throughput 4.92479K wps
[Epoch 140 Batch 60/62] avg loss 0.00135073, throughput 4.82898K wps
Begin Testing...
[Epoch 140] train avg loss 0.00136242, dev acc 0.8230, dev avg loss 0.415429, throughput 4.8841K wps
[Epoch 141 Batch 30/62] avg loss 0.00132463, throughput 4.95255K wps
[Epoch 141 Batch 60/62] avg loss 0.00147527, throughput 4.83351K wps
Begin Testing...
[Epoch 141] train avg loss 0.00140349, dev acc 0.8201, dev avg loss 0.416175, throughput 4.89887K wps
[Epoch 142 Batch 30/62] avg loss 0.00146153, throughput 4.93673K wps
[Epoch 142 Batch 60/62] avg loss 0.00135837, throughput 4.80788K wps
Begin Testing...
[Epoch 142] train avg loss 0.00141772, dev acc 0.8201, dev avg loss 0.416126, throughput 4.87686K wps
[Epoch 143 Batch 30/62] avg loss 0.00132431, throughput 4.92471K wps
[Epoch 143 Batch 60/62] avg loss 0.00139188, throughput 4.8195K wps
Begin Testing...
[Epoch 143] train avg loss 0.00138705, dev acc 0.8230, dev avg loss 0.417326, throughput 4.87972K wps
[Epoch 144 Batch 30/62] avg loss 0.00135, throughput 4.95746K wps
[Epoch 144 Batch 60/62] avg loss 0.00127342, throughput 4.81419K wps
Begin Testing...
[Epoch 144] train avg loss 0.00130893, dev acc 0.8230, dev avg loss 0.416488, throughput 4.89125K wps
[Epoch 145 Batch 30/62] avg loss 0.00128519, throughput 4.92439K wps
[Epoch 145 Batch 60/62] avg loss 0.00133467, throughput 4.81471K wps
Begin Testing...
[Epoch 145] train avg loss 0.00133226, dev acc 0.8260, dev avg loss 0.417505, throughput 4.87507K wps
[Epoch 146 Batch 30/62] avg loss 0.00140574, throughput 4.9434K wps
[Epoch 146 Batch 60/62] avg loss 0.00138723, throughput 4.81833K wps
Begin Testing...
[Epoch 146] train avg loss 0.00140059, dev acc 0.8230, dev avg loss 0.417894, throughput 4.8862K wps
[Epoch 147 Batch 30/62] avg loss 0.00127563, throughput 4.97521K wps
[Epoch 147 Batch 60/62] avg loss 0.00130018, throughput 4.87532K wps
Begin Testing...
[Epoch 147] train avg loss 0.00131154, dev acc 0.8230, dev avg loss 0.418934, throughput 4.931K wps
[Epoch 148 Batch 30/62] avg loss 0.00130709, throughput 4.96927K wps
[Epoch 148 Batch 60/62] avg loss 0.00121163, throughput 4.8477K wps
Begin Testing...
[Epoch 148] train avg loss 0.00126477, dev acc 0.8230, dev avg loss 0.420909, throughput 4.91552K wps
[Epoch 149 Batch 30/62] avg loss 0.00124646, throughput 4.97422K wps
[Epoch 149 Batch 60/62] avg loss 0.00129258, throughput 4.83177K wps
Begin Testing...
[Epoch 149] train avg loss 0.00126932, dev acc 0.8260, dev avg loss 0.42212, throughput 4.90945K wps
[Epoch 150 Batch 30/62] avg loss 0.00117627, throughput 4.96937K wps
[Epoch 150 Batch 60/62] avg loss 0.00129253, throughput 4.8438K wps
Begin Testing...
[Epoch 150] train avg loss 0.00123909, dev acc 0.8260, dev avg loss 0.421011, throughput 4.91169K wps
[Epoch 151 Batch 30/62] avg loss 0.0011603, throughput 4.90742K wps
[Epoch 151 Batch 60/62] avg loss 0.00125494, throughput 4.81415K wps
Begin Testing...
[Epoch 151] train avg loss 0.00122865, dev acc 0.8348, dev avg loss 0.424816, throughput 4.86772K wps
Observed Improvement.
Begin Testing...
[Epoch 152 Batch 30/62] avg loss 0.00126421, throughput 4.94207K wps
[Epoch 152 Batch 60/62] avg loss 0.00113896, throughput 4.84206K wps
Begin Testing...
[Epoch 152] train avg loss 0.00120811, dev acc 0.8289, dev avg loss 0.42111, throughput 4.89985K wps
[Epoch 153 Batch 30/62] avg loss 0.00122426, throughput 4.96044K wps
[Epoch 153 Batch 60/62] avg loss 0.00124705, throughput 4.83794K wps
Begin Testing...
[Epoch 153] train avg loss 0.00125165, dev acc 0.8319, dev avg loss 0.423785, throughput 4.90446K wps
[Epoch 154 Batch 30/62] avg loss 0.00108365, throughput 4.93601K wps
[Epoch 154 Batch 60/62] avg loss 0.00120043, throughput 4.79192K wps
Begin Testing...
[Epoch 154] train avg loss 0.0011538, dev acc 0.8260, dev avg loss 0.423137, throughput 4.86877K wps
[Epoch 155 Batch 30/62] avg loss 0.00113152, throughput 4.91817K wps
[Epoch 155 Batch 60/62] avg loss 0.00118403, throughput 4.83032K wps
Begin Testing...
[Epoch 155] train avg loss 0.00115801, dev acc 0.8289, dev avg loss 0.422134, throughput 4.88112K wps
[Epoch 156 Batch 30/62] avg loss 0.00116643, throughput 4.92678K wps
[Epoch 156 Batch 60/62] avg loss 0.0011346, throughput 4.83118K wps
Begin Testing...
[Epoch 156] train avg loss 0.00115634, dev acc 0.8348, dev avg loss 0.422629, throughput 4.88511K wps
Observed Improvement.
Begin Testing...
[Epoch 157 Batch 30/62] avg loss 0.00109197, throughput 4.94288K wps
[Epoch 157 Batch 60/62] avg loss 0.00119816, throughput 4.79925K wps
Begin Testing...
[Epoch 157] train avg loss 0.00118505, dev acc 0.8260, dev avg loss 0.424904, throughput 4.8753K wps
[Epoch 158 Batch 30/62] avg loss 0.00108017, throughput 4.93171K wps
[Epoch 158 Batch 60/62] avg loss 0.00113251, throughput 4.8262K wps
Begin Testing...
[Epoch 158] train avg loss 0.00111975, dev acc 0.8319, dev avg loss 0.423126, throughput 4.88376K wps
[Epoch 159 Batch 30/62] avg loss 0.00107761, throughput 4.93185K wps
[Epoch 159 Batch 60/62] avg loss 0.00110841, throughput 4.83795K wps
Begin Testing...
[Epoch 159] train avg loss 0.00112279, dev acc 0.8260, dev avg loss 0.42316, throughput 4.89004K wps
[Epoch 160 Batch 30/62] avg loss 0.00105955, throughput 4.92531K wps
[Epoch 160 Batch 60/62] avg loss 0.00114383, throughput 4.86112K wps
Begin Testing...
[Epoch 160] train avg loss 0.00113028, dev acc 0.8230, dev avg loss 0.42516, throughput 4.90021K wps
[Epoch 161 Batch 30/62] avg loss 0.00106864, throughput 4.9659K wps
[Epoch 161 Batch 60/62] avg loss 0.00113653, throughput 4.85823K wps
Begin Testing...
[Epoch 161] train avg loss 0.00112438, dev acc 0.8260, dev avg loss 0.426109, throughput 4.91829K wps
[Epoch 162 Batch 30/62] avg loss 0.0011719, throughput 4.95204K wps
[Epoch 162 Batch 60/62] avg loss 0.00103751, throughput 4.83855K wps
Begin Testing...
[Epoch 162] train avg loss 0.00111564, dev acc 0.8230, dev avg loss 0.426758, throughput 4.90169K wps
[Epoch 163 Batch 30/62] avg loss 0.00112327, throughput 4.93995K wps
[Epoch 163 Batch 60/62] avg loss 0.00110703, throughput 4.84378K wps
Begin Testing...
[Epoch 163] train avg loss 0.00113555, dev acc 0.8348, dev avg loss 0.428327, throughput 4.8972K wps
Observed Improvement.
Begin Testing...
[Epoch 164 Batch 30/62] avg loss 0.00102646, throughput 4.91372K wps
[Epoch 164 Batch 60/62] avg loss 0.00105261, throughput 4.83505K wps
Begin Testing...
[Epoch 164] train avg loss 0.00104664, dev acc 0.8289, dev avg loss 0.429286, throughput 4.8798K wps
[Epoch 165 Batch 30/62] avg loss 0.000993723, throughput 4.93602K wps
[Epoch 165 Batch 60/62] avg loss 0.000969147, throughput 4.82377K wps
Begin Testing...
[Epoch 165] train avg loss 0.00104037, dev acc 0.8260, dev avg loss 0.433639, throughput 4.88556K wps
[Epoch 166 Batch 30/62] avg loss 0.00103125, throughput 4.93263K wps
[Epoch 166 Batch 60/62] avg loss 0.00101036, throughput 4.82917K wps
Begin Testing...
[Epoch 166] train avg loss 0.00102789, dev acc 0.8260, dev avg loss 0.43008, throughput 4.88701K wps
[Epoch 167 Batch 30/62] avg loss 0.00104284, throughput 4.92774K wps
[Epoch 167 Batch 60/62] avg loss 0.000958115, throughput 4.80478K wps
Begin Testing...
[Epoch 167] train avg loss 0.00101473, dev acc 0.8260, dev avg loss 0.431452, throughput 4.87371K wps
[Epoch 168 Batch 30/62] avg loss 0.000993763, throughput 4.9001K wps
[Epoch 168 Batch 60/62] avg loss 0.000949397, throughput 4.81756K wps
Begin Testing...
[Epoch 168] train avg loss 0.00100365, dev acc 0.8230, dev avg loss 0.430207, throughput 4.86588K wps
[Epoch 169 Batch 30/62] avg loss 0.00104061, throughput 4.9306K wps
[Epoch 169 Batch 60/62] avg loss 0.000938038, throughput 4.80034K wps
Begin Testing...
[Epoch 169] train avg loss 0.000988033, dev acc 0.8230, dev avg loss 0.431484, throughput 4.87179K wps
[Epoch 170 Batch 30/62] avg loss 0.000991867, throughput 4.91738K wps
[Epoch 170 Batch 60/62] avg loss 0.000932809, throughput 4.81949K wps
Begin Testing...
[Epoch 170] train avg loss 0.000974802, dev acc 0.8230, dev avg loss 0.432317, throughput 4.87494K wps
[Epoch 171 Batch 30/62] avg loss 0.000977734, throughput 4.92994K wps
[Epoch 171 Batch 60/62] avg loss 0.000941724, throughput 4.8017K wps
Begin Testing...
[Epoch 171] train avg loss 0.000980667, dev acc 0.8201, dev avg loss 0.434028, throughput 4.87207K wps
[Epoch 172 Batch 30/62] avg loss 0.000865354, throughput 4.93807K wps
[Epoch 172 Batch 60/62] avg loss 0.000983529, throughput 4.81584K wps
Begin Testing...
[Epoch 172] train avg loss 0.000930014, dev acc 0.8260, dev avg loss 0.434352, throughput 4.88195K wps
[Epoch 173 Batch 30/62] avg loss 0.00100322, throughput 4.95884K wps
[Epoch 173 Batch 60/62] avg loss 0.00098865, throughput 4.82106K wps
Begin Testing...
[Epoch 173] train avg loss 0.00101373, dev acc 0.8378, dev avg loss 0.438718, throughput 4.89485K wps
Observed Improvement.
Begin Testing...
[Epoch 174 Batch 30/62] avg loss 0.000978976, throughput 4.92989K wps
[Epoch 174 Batch 60/62] avg loss 0.000979541, throughput 4.81615K wps
Begin Testing...
[Epoch 174] train avg loss 0.000979813, dev acc 0.8230, dev avg loss 0.434806, throughput 4.88077K wps
[Epoch 175 Batch 30/62] avg loss 0.000905998, throughput 4.93048K wps
[Epoch 175 Batch 60/62] avg loss 0.000907003, throughput 4.80691K wps
Begin Testing...
[Epoch 175] train avg loss 0.000925599, dev acc 0.8230, dev avg loss 0.435277, throughput 4.87437K wps
[Epoch 176 Batch 30/62] avg loss 0.000948889, throughput 4.92552K wps
[Epoch 176 Batch 60/62] avg loss 0.000908992, throughput 4.82254K wps
Begin Testing...
[Epoch 176] train avg loss 0.000943701, dev acc 0.8260, dev avg loss 0.435107, throughput 4.87888K wps
[Epoch 177 Batch 30/62] avg loss 0.000878182, throughput 4.92796K wps
[Epoch 177 Batch 60/62] avg loss 0.000975693, throughput 4.83092K wps
Begin Testing...
[Epoch 177] train avg loss 0.000944721, dev acc 0.8201, dev avg loss 0.434872, throughput 4.8867K wps
[Epoch 178 Batch 30/62] avg loss 0.000889917, throughput 4.95787K wps
[Epoch 178 Batch 60/62] avg loss 0.000839382, throughput 4.84944K wps
Begin Testing...
[Epoch 178] train avg loss 0.000872623, dev acc 0.8230, dev avg loss 0.436346, throughput 4.90952K wps
[Epoch 179 Batch 30/62] avg loss 0.000889195, throughput 4.95148K wps
[Epoch 179 Batch 60/62] avg loss 0.000927502, throughput 4.82884K wps
Begin Testing...
[Epoch 179] train avg loss 0.000929151, dev acc 0.8201, dev avg loss 0.437887, throughput 4.89594K wps
[Epoch 180 Batch 30/62] avg loss 0.000973668, throughput 4.94167K wps
[Epoch 180 Batch 60/62] avg loss 0.000831781, throughput 4.82141K wps
Begin Testing...
[Epoch 180] train avg loss 0.000921627, dev acc 0.8230, dev avg loss 0.437316, throughput 4.88672K wps
[Epoch 181 Batch 30/62] avg loss 0.000825963, throughput 4.91906K wps
[Epoch 181 Batch 60/62] avg loss 0.000939353, throughput 4.82326K wps
Begin Testing...
[Epoch 181] train avg loss 0.000891966, dev acc 0.8230, dev avg loss 0.437434, throughput 4.87682K wps
[Epoch 182 Batch 30/62] avg loss 0.000867811, throughput 4.92518K wps
[Epoch 182 Batch 60/62] avg loss 0.000823859, throughput 4.80652K wps
Begin Testing...
[Epoch 182] train avg loss 0.000852982, dev acc 0.8289, dev avg loss 0.438244, throughput 4.87042K wps
[Epoch 183 Batch 30/62] avg loss 0.000801359, throughput 4.93218K wps
[Epoch 183 Batch 60/62] avg loss 0.000818466, throughput 4.83988K wps
Begin Testing...
[Epoch 183] train avg loss 0.000809552, dev acc 0.8289, dev avg loss 0.438928, throughput 4.89271K wps
[Epoch 184 Batch 30/62] avg loss 0.000841416, throughput 4.9406K wps
[Epoch 184 Batch 60/62] avg loss 0.000849439, throughput 4.84106K wps
Begin Testing...
[Epoch 184] train avg loss 0.000843917, dev acc 0.8289, dev avg loss 0.438152, throughput 4.89756K wps
[Epoch 185 Batch 30/62] avg loss 0.000835334, throughput 4.93186K wps
[Epoch 185 Batch 60/62] avg loss 0.000772945, throughput 4.82953K wps
Begin Testing...
[Epoch 185] train avg loss 0.000809165, dev acc 0.8260, dev avg loss 0.439601, throughput 4.88774K wps
[Epoch 186 Batch 30/62] avg loss 0.000811842, throughput 4.91802K wps
[Epoch 186 Batch 60/62] avg loss 0.000820948, throughput 4.82284K wps
Begin Testing...
[Epoch 186] train avg loss 0.000822724, dev acc 0.8319, dev avg loss 0.442136, throughput 4.87677K wps
[Epoch 187 Batch 30/62] avg loss 0.000821102, throughput 4.90497K wps
[Epoch 187 Batch 60/62] avg loss 0.000865831, throughput 4.79484K wps
Begin Testing...
[Epoch 187] train avg loss 0.000849243, dev acc 0.8230, dev avg loss 0.441778, throughput 4.8553K wps
[Epoch 188 Batch 30/62] avg loss 0.000836062, throughput 4.93783K wps
[Epoch 188 Batch 60/62] avg loss 0.000810722, throughput 4.81504K wps
Begin Testing...
[Epoch 188] train avg loss 0.000853639, dev acc 0.8289, dev avg loss 0.44292, throughput 4.88158K wps
[Epoch 189 Batch 30/62] avg loss 0.000733967, throughput 4.93635K wps
[Epoch 189 Batch 60/62] avg loss 0.000819734, throughput 4.82994K wps
Begin Testing...
[Epoch 189] train avg loss 0.000802793, dev acc 0.8289, dev avg loss 0.442318, throughput 4.88986K wps
[Epoch 190 Batch 30/62] avg loss 0.000796659, throughput 4.9462K wps
[Epoch 190 Batch 60/62] avg loss 0.000839895, throughput 4.8178K wps
Begin Testing...
[Epoch 190] train avg loss 0.000832549, dev acc 0.8260, dev avg loss 0.441422, throughput 4.88707K wps
[Epoch 191 Batch 30/62] avg loss 0.000791413, throughput 4.93073K wps
[Epoch 191 Batch 60/62] avg loss 0.000744238, throughput 4.80565K wps
Begin Testing...
[Epoch 191] train avg loss 0.000766732, dev acc 0.8289, dev avg loss 0.442979, throughput 4.87533K wps
[Epoch 192 Batch 30/62] avg loss 0.000746033, throughput 4.94018K wps
[Epoch 192 Batch 60/62] avg loss 0.00069881, throughput 4.82498K wps
Begin Testing...
[Epoch 192] train avg loss 0.000728788, dev acc 0.8260, dev avg loss 0.442207, throughput 4.88761K wps
[Epoch 193 Batch 30/62] avg loss 0.000729343, throughput 4.91415K wps
[Epoch 193 Batch 60/62] avg loss 0.000776538, throughput 4.8173K wps
Begin Testing...
[Epoch 193] train avg loss 0.000764997, dev acc 0.8319, dev avg loss 0.442535, throughput 4.87276K wps
[Epoch 194 Batch 30/62] avg loss 0.000720262, throughput 4.92807K wps
[Epoch 194 Batch 60/62] avg loss 0.000708799, throughput 4.81154K wps
Begin Testing...
[Epoch 194] train avg loss 0.000713206, dev acc 0.8319, dev avg loss 0.442688, throughput 4.87595K wps
[Epoch 195 Batch 30/62] avg loss 0.00079499, throughput 4.91599K wps
[Epoch 195 Batch 60/62] avg loss 0.000792924, throughput 4.77961K wps
Begin Testing...
[Epoch 195] train avg loss 0.00079928, dev acc 0.8230, dev avg loss 0.442847, throughput 4.85429K wps
[Epoch 196 Batch 30/62] avg loss 0.000694746, throughput 4.93557K wps
[Epoch 196 Batch 60/62] avg loss 0.000759288, throughput 4.81568K wps
Begin Testing...
[Epoch 196] train avg loss 0.00075999, dev acc 0.8260, dev avg loss 0.445062, throughput 4.88092K wps
[Epoch 197 Batch 30/62] avg loss 0.000780717, throughput 4.92897K wps
[Epoch 197 Batch 60/62] avg loss 0.000719829, throughput 4.84817K wps
Begin Testing...
[Epoch 197] train avg loss 0.000762046, dev acc 0.8230, dev avg loss 0.445097, throughput 4.89337K wps
[Epoch 198 Batch 30/62] avg loss 0.000726032, throughput 4.95112K wps
[Epoch 198 Batch 60/62] avg loss 0.000775801, throughput 4.82837K wps
Begin Testing...
[Epoch 198] train avg loss 0.000764433, dev acc 0.8348, dev avg loss 0.447533, throughput 4.89544K wps
[Epoch 199 Batch 30/62] avg loss 0.000710051, throughput 4.94191K wps
[Epoch 199 Batch 60/62] avg loss 0.000746546, throughput 4.80315K wps
Begin Testing...
[Epoch 199] train avg loss 0.000731783, dev acc 0.8289, dev avg loss 0.445931, throughput 4.87994K wps
Test loss 0.329627, test acc 0.8621
Total time cost 301.12s
[Epoch 0 Batch 30/62] avg loss 0.0134099, throughput 4.68396K wps
[Epoch 0 Batch 60/62] avg loss 0.0130446, throughput 4.82706K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133665, dev acc 0.6519, dev avg loss 0.646416, throughput 4.76497K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0132915, throughput 4.92584K wps
[Epoch 1 Batch 60/62] avg loss 0.012984, throughput 4.82071K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132989, dev acc 0.6519, dev avg loss 0.639813, throughput 4.87886K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0129344, throughput 4.94846K wps
[Epoch 2 Batch 60/62] avg loss 0.0129028, throughput 4.83946K wps
Begin Testing...
[Epoch 2] train avg loss 0.0130489, dev acc 0.6519, dev avg loss 0.634614, throughput 4.90011K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0128663, throughput 4.98224K wps
[Epoch 3 Batch 60/62] avg loss 0.0127377, throughput 4.8779K wps
Begin Testing...
[Epoch 3] train avg loss 0.0129831, dev acc 0.6519, dev avg loss 0.628229, throughput 4.93663K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128713, throughput 4.97545K wps
[Epoch 4 Batch 60/62] avg loss 0.0125316, throughput 4.85148K wps
Begin Testing...
[Epoch 4] train avg loss 0.0129052, dev acc 0.6519, dev avg loss 0.624134, throughput 4.91802K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0125555, throughput 4.94017K wps
[Epoch 5 Batch 60/62] avg loss 0.0124649, throughput 4.81369K wps
Begin Testing...
[Epoch 5] train avg loss 0.0126641, dev acc 0.6519, dev avg loss 0.61818, throughput 4.88206K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0123767, throughput 4.93163K wps
[Epoch 6 Batch 60/62] avg loss 0.0123722, throughput 4.84408K wps
Begin Testing...
[Epoch 6] train avg loss 0.0125669, dev acc 0.6519, dev avg loss 0.613508, throughput 4.89525K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0124189, throughput 4.95376K wps
[Epoch 7 Batch 60/62] avg loss 0.0121335, throughput 4.81237K wps
Begin Testing...
[Epoch 7] train avg loss 0.0124299, dev acc 0.6519, dev avg loss 0.608955, throughput 4.88975K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0120111, throughput 4.93913K wps
[Epoch 8 Batch 60/62] avg loss 0.0122622, throughput 4.83707K wps
Begin Testing...
[Epoch 8] train avg loss 0.0122635, dev acc 0.6519, dev avg loss 0.602553, throughput 4.8927K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0119937, throughput 4.93191K wps
[Epoch 9 Batch 60/62] avg loss 0.0119565, throughput 4.81971K wps
Begin Testing...
[Epoch 9] train avg loss 0.012096, dev acc 0.6519, dev avg loss 0.59704, throughput 4.88144K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0120836, throughput 4.94584K wps
[Epoch 10 Batch 60/62] avg loss 0.0117503, throughput 4.81265K wps
Begin Testing...
[Epoch 10] train avg loss 0.0120119, dev acc 0.6519, dev avg loss 0.592398, throughput 4.88413K wps
Observed Improvement.
Begin Testing...