Namespace(alpha=2, batch_size=80, beta=1, bptt=70, clip=0.25, dropout=0.2, dropout_e=0.05, dropout_h=0.1, dropout_i=0.3, emsize=200, epochs=750, eval_only=False, gpu='0', log_interval=200, lr=30, lr_update_factor=0.1, lr_update_interval=30, model='lstm', nhid=600, nlayers=3, ntasgd=True, optimizer='sgd', save='awd_lstm_lm_600_wikitext-2', test_mode=False, tied=True, wd=1.2e-06, weight_dropout=0.2)
Use AWDRNN
AWDRNN(
(embedding): HybridSequential(
(0): Embedding(33278 -> 200, float32)
(1): Dropout(p = 0.3, axes=(0,))
)
(encoder): HybridSequential(
(0): LSTM(200 -> 600, TNC)
(1): LSTM(600 -> 600, TNC)
(2): LSTM(600 -> 200, TNC)
)
(decoder): HybridSequential(
(0): Dense(200 -> 33278, linear)
)
)
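Each epoch summary in the log that follows reports validation loss together with perplexity. The perplexity appears to be exp(loss), matching only to about 0.5% because the loss is printed to two decimals. In this log, a test line also seems to follow only those epochs where validation perplexity reaches a new minimum; for example, epochs 12 and 15 have none. The sketch below checks both observations on a few lines copied verbatim from this log. The regex is inferred from the line format seen here, and `parse_validation` and `epochs_with_new_best` are hypothetical helpers, not part of the training script.

```python
import math
import re

# Inferred from summary lines such as
# "[Epoch 0] time cost 50.10s, valid loss 6.38, valid ppl 592.45, lr 30.00"
VALID_RE = re.compile(
    r"\[Epoch (\d+)\] time cost [\d.]+s, valid loss ([\d.]+), valid ppl ([\d.]+)"
)

def parse_validation(lines):
    """Return (epoch, valid_loss, valid_ppl) tuples found in the given log lines."""
    records = []
    for line in lines:
        m = VALID_RE.search(line)
        if m:
            records.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return records

def epochs_with_new_best(records):
    """Epochs whose validation perplexity is lower than every earlier epoch's."""
    best = math.inf
    improved = []
    for epoch, _, ppl in records:
        if ppl < best:
            best = ppl
            improved.append(epoch)
    return improved

# Lines copied from epochs 11-13 of this log.
log = """\
[Epoch 11] time cost 49.90s, valid loss 4.92, valid ppl 136.69, lr 30.00
[Epoch 11] test loss 4.85, test ppl 127.58
[Epoch 12] time cost 50.42s, valid loss 4.92, valid ppl 137.00, lr 30.00
[Epoch 13] time cost 50.31s, valid loss 4.85, valid ppl 128.23, lr 30.00
[Epoch 13] test loss 4.79, test ppl 120.21
""".splitlines()

records = parse_validation(log)
for epoch, loss, ppl in records:
    # Two-decimal rounding of the loss bounds the mismatch at about e**0.005, i.e. ~0.5%.
    assert abs(math.exp(loss) - ppl) / ppl < 0.006
print(epochs_with_new_best(records))  # [11, 13]
```

Epochs 11 and 13 are exactly the ones followed by a test line in the sample, which is consistent with test evaluation being tied to a new best validation score.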
[Epoch 0 Batch 200/372] current loss 7.74, ppl 2288.02, throughput 650.82 samples/s, lr 28.71
[Epoch 0] throughput 44198.16 samples/s
[Epoch 0] time cost 50.10s, valid loss 6.38, valid ppl 592.45, lr 30.00
[Epoch 0] test loss 6.31, test ppl 549.84
[Epoch 1 Batch 200/372] current loss 6.64, ppl 766.72, throughput 633.33 samples/s, lr 30.00
[Epoch 1] throughput 43426.25 samples/s
[Epoch 1] time cost 51.17s, valid loss 5.99, valid ppl 400.90, lr 30.00
[Epoch 1] test loss 5.92, test ppl 372.19
[Epoch 2 Batch 200/372] current loss 6.31, ppl 550.95, throughput 654.39 samples/s, lr 32.57
[Epoch 2] throughput 44243.79 samples/s
[Epoch 2] time cost 50.16s, valid loss 5.74, valid ppl 310.69, lr 30.00
[Epoch 2] test loss 5.67, test ppl 288.70
[Epoch 3 Batch 200/372] current loss 6.10, ppl 444.18, throughput 650.56 samples/s, lr 29.14
[Epoch 3] throughput 44074.94 samples/s
[Epoch 3] time cost 50.32s, valid loss 5.62, valid ppl 276.33, lr 30.00
[Epoch 3] test loss 5.54, test ppl 255.81
[Epoch 4 Batch 200/372] current loss 5.92, ppl 370.67, throughput 649.30 samples/s, lr 29.14
[Epoch 4] throughput 44324.22 samples/s
[Epoch 4] time cost 50.06s, valid loss 5.40, valid ppl 220.83, lr 30.00
[Epoch 4] test loss 5.32, test ppl 204.42
[Epoch 5 Batch 200/372] current loss 5.77, ppl 321.68, throughput 647.33 samples/s, lr 32.57
[Epoch 5] throughput 44292.99 samples/s
[Epoch 5] time cost 50.03s, valid loss 5.29, valid ppl 199.28, lr 30.00
[Epoch 5] test loss 5.22, test ppl 184.12
[Epoch 6 Batch 200/372] current loss 5.67, ppl 289.51, throughput 639.94 samples/s, lr 31.29
[Epoch 6] throughput 44443.05 samples/s
[Epoch 6] time cost 49.97s, valid loss 5.25, valid ppl 190.39, lr 30.00
[Epoch 6] test loss 5.17, test ppl 176.70
[Epoch 7 Batch 200/372] current loss 5.57, ppl 263.28, throughput 644.22 samples/s, lr 13.71
[Epoch 7] throughput 44507.61 samples/s
[Epoch 7] time cost 49.86s, valid loss 5.13, valid ppl 168.18, lr 30.00
[Epoch 7] test loss 5.05, test ppl 156.27
[Epoch 8 Batch 200/372] current loss 5.50, ppl 244.18, throughput 662.58 samples/s, lr 30.43
[Epoch 8] throughput 44688.86 samples/s
[Epoch 8] time cost 49.63s, valid loss 5.04, valid ppl 155.16, lr 30.00
[Epoch 8] test loss 4.97, test ppl 144.62
[Epoch 9 Batch 200/372] current loss 5.42, ppl 225.88, throughput 683.11 samples/s, lr 31.71
[Epoch 9] throughput 44739.28 samples/s
[Epoch 9] time cost 49.61s, valid loss 4.99, valid ppl 146.83, lr 30.00
[Epoch 9] test loss 4.92, test ppl 136.54
[Epoch 10 Batch 200/372] current loss 5.37, ppl 214.31, throughput 637.87 samples/s, lr 33.43
[Epoch 10] throughput 44001.53 samples/s
[Epoch 10] time cost 50.46s, valid loss 4.96, valid ppl 142.89, lr 30.00
[Epoch 10] test loss 4.89, test ppl 132.76
[Epoch 11 Batch 200/372] current loss 5.32, ppl 204.49, throughput 648.52 samples/s, lr 30.43
[Epoch 11] throughput 44460.91 samples/s
[Epoch 11] time cost 49.90s, valid loss 4.92, valid ppl 136.69, lr 30.00
[Epoch 11] test loss 4.85, test ppl 127.58
[Epoch 12 Batch 200/372] current loss 5.26, ppl 191.94, throughput 640.00 samples/s, lr 28.29
[Epoch 12] throughput 43973.53 samples/s
[Epoch 12] time cost 50.42s, valid loss 4.92, valid ppl 137.00, lr 30.00
[Epoch 13 Batch 200/372] current loss 5.20, ppl 181.90, throughput 643.26 samples/s, lr 28.71
[Epoch 13] throughput 44082.73 samples/s
[Epoch 13] time cost 50.31s, valid loss 4.85, valid ppl 128.23, lr 30.00
[Epoch 13] test loss 4.79, test ppl 120.21
[Epoch 14 Batch 200/372] current loss 5.18, ppl 177.20, throughput 666.17 samples/s, lr 30.86
[Epoch 14] throughput 44608.82 samples/s
[Epoch 14] time cost 49.69s, valid loss 4.85, valid ppl 127.85, lr 30.00
[Epoch 14] test loss 4.79, test ppl 119.81
[Epoch 15 Batch 200/372] current loss 5.13, ppl 169.37, throughput 650.57 samples/s, lr 29.57
[Epoch 15] throughput 43556.71 samples/s
[Epoch 15] time cost 50.83s, valid loss 4.86, valid ppl 128.43, lr 30.00
[Epoch 16 Batch 200/372] current loss 5.10, ppl 164.02, throughput 643.99 samples/s, lr 32.14
[Epoch 16] throughput 44424.84 samples/s
[Epoch 16] time cost 49.97s, valid loss 4.78, valid ppl 119.50, lr 30.00
[Epoch 16] test loss 4.72, test ppl 111.94
[Epoch 17 Batch 200/372] current loss 5.06, ppl 157.63, throughput 647.20 samples/s, lr 27.00
[Epoch 17] throughput 44375.80 samples/s
[Epoch 17] time cost 50.09s, valid loss 4.76, valid ppl 116.58, lr 30.00
[Epoch 17] test loss 4.69, test ppl 109.19
[Epoch 18 Batch 200/372] current loss 5.03, ppl 153.11, throughput 634.29 samples/s, lr 33.86
[Epoch 18] throughput 44334.62 samples/s
[Epoch 18] time cost 50.03s, valid loss 4.79, valid ppl 120.30, lr 30.00
[Epoch 19 Batch 200/372] current loss 5.00, ppl 148.68, throughput 650.36 samples/s, lr 29.14
[Epoch 19] throughput 44259.08 samples/s
[Epoch 19] time cost 50.32s, valid loss 4.72, valid ppl 111.80, lr 30.00
[Epoch 19] test loss 4.65, test ppl 104.44
[Epoch 20 Batch 200/372] current loss 4.97, ppl 144.18, throughput 648.58 samples/s, lr 29.57
[Epoch 20] throughput 44451.41 samples/s
[Epoch 20] time cost 49.88s, valid loss 4.70, valid ppl 109.74, lr 30.00
[Epoch 20] test loss 4.63, test ppl 102.37
[Epoch 21 Batch 200/372] current loss 4.94, ppl 139.94, throughput 634.14 samples/s, lr 29.14
[Epoch 21] throughput 42820.37 samples/s
[Epoch 21] time cost 51.69s, valid loss 4.70, valid ppl 110.18, lr 30.00
[Epoch 22 Batch 200/372] current loss 4.93, ppl 138.47, throughput 645.90 samples/s, lr 14.57
[Epoch 22] throughput 44392.20 samples/s
[Epoch 22] time cost 49.97s, valid loss 4.70, valid ppl 110.15, lr 30.00
[Epoch 23 Batch 200/372] current loss 4.91, ppl 135.20, throughput 642.94 samples/s, lr 30.00
[Epoch 23] throughput 44340.26 samples/s
[Epoch 23] time cost 50.10s, valid loss 4.68, valid ppl 107.29, lr 30.00
[Epoch 23] test loss 4.61, test ppl 100.69
[Epoch 24 Batch 200/372] current loss 4.87, ppl 130.95, throughput 648.49 samples/s, lr 19.29
[Epoch 24] throughput 44230.44 samples/s
[Epoch 24] time cost 50.17s, valid loss 4.67, valid ppl 106.22, lr 30.00
[Epoch 24] test loss 4.61, test ppl 100.12
[Epoch 25 Batch 200/372] current loss 4.86, ppl 128.55, throughput 656.92 samples/s, lr 33.00
[Epoch 25] throughput 44743.07 samples/s
[Epoch 25] time cost 49.58s, valid loss 4.65, valid ppl 104.16, lr 30.00
[Epoch 25] test loss 4.58, test ppl 97.89
[Epoch 26 Batch 200/372] current loss 4.85, ppl 127.62, throughput 645.60 samples/s, lr 31.29
[Epoch 26] throughput 44472.71 samples/s
[Epoch 26] time cost 49.87s, valid loss 4.66, valid ppl 106.16, lr 30.00
[Epoch 27 Batch 200/372] current loss 4.83, ppl 124.75, throughput 648.05 samples/s, lr 25.71
[Epoch 27] throughput 43892.91 samples/s
[Epoch 27] time cost 50.48s, valid loss 4.64, valid ppl 103.36, lr 30.00
[Epoch 27] test loss 4.58, test ppl 97.12
[Epoch 28 Batch 200/372] current loss 4.81, ppl 122.32, throughput 648.08 samples/s, lr 27.86
[Epoch 28] throughput 44300.62 samples/s
[Epoch 28] time cost 50.12s, valid loss 4.62, valid ppl 101.75, lr 30.00
[Epoch 28] test loss 4.56, test ppl 95.81
[Epoch 29 Batch 200/372] current loss 4.79, ppl 120.48, throughput 659.91 samples/s, lr 30.00
[Epoch 29] throughput 44449.51 samples/s
[Epoch 29] time cost 49.92s, valid loss 4.61, valid ppl 100.83, lr 30.00
[Epoch 29] test loss 4.56, test ppl 95.17
[Epoch 30 Batch 200/372] current loss 4.78, ppl 118.84, throughput 643.60 samples/s, lr 32.14
[Epoch 30] throughput 43515.09 samples/s
[Epoch 30] time cost 50.93s, valid loss 4.63, valid ppl 102.50, lr 30.00
[Epoch 31 Batch 200/372] current loss 4.75, ppl 115.05, throughput 645.49 samples/s, lr 33.86
[Epoch 31] throughput 43325.90 samples/s
[Epoch 31] time cost 51.19s, valid loss 4.62, valid ppl 101.06, lr 30.00
[Epoch 32 Batch 200/372] current loss 4.75, ppl 115.10, throughput 642.59 samples/s, lr 31.71
[Epoch 32] throughput 43937.59 samples/s
[Epoch 32] time cost 50.45s, valid loss 4.61, valid ppl 100.98, lr 30.00
[Epoch 33 Batch 200/372] current loss 4.73, ppl 113.15, throughput 652.65 samples/s, lr 29.57
[Epoch 33] throughput 44141.01 samples/s
[Epoch 33] time cost 50.28s, valid loss 4.61, valid ppl 100.94, lr 30.00
[Epoch 34 Batch 200/372] current loss 4.70, ppl 110.37, throughput 668.18 samples/s, lr 29.14
[Epoch 34] throughput 44402.29 samples/s
[Epoch 34] time cost 50.04s, valid loss 4.59, valid ppl 98.34, lr 30.00
[Epoch 34] test loss 4.53, test ppl 92.54
[Epoch 35 Batch 200/372] current loss 4.71, ppl 111.27, throughput 645.86 samples/s, lr 27.43
[Epoch 35] throughput 44061.75 samples/s
[Epoch 35] time cost 50.41s, valid loss 4.58, valid ppl 97.54, lr 30.00
[Epoch 35] test loss 4.52, test ppl 91.87
[Epoch 36 Batch 200/372] current loss 4.69, ppl 109.24, throughput 666.47 samples/s, lr 27.43
[Epoch 36] throughput 44988.59 samples/s
[Epoch 36] time cost 49.32s, valid loss 4.57, valid ppl 96.82, lr 30.00
[Epoch 36] test loss 4.51, test ppl 90.90
[Epoch 37 Batch 200/372] current loss 4.67, ppl 107.07, throughput 655.55 samples/s, lr 27.00
[Epoch 37] throughput 44336.11 samples/s
[Epoch 37] time cost 50.00s, valid loss 4.59, valid ppl 98.36, lr 30.00
[Epoch 38 Batch 200/372] current loss 4.66, ppl 105.74, throughput 652.47 samples/s, lr 27.43
[Epoch 38] throughput 43731.19 samples/s
[Epoch 38] time cost 50.63s, valid loss 4.57, valid ppl 96.84, lr 30.00
[Epoch 39 Batch 200/372] current loss 4.66, ppl 105.34, throughput 645.74 samples/s, lr 29.14
[Epoch 39] throughput 43622.05 samples/s
[Epoch 39] time cost 50.81s, valid loss 4.57, valid ppl 96.27, lr 30.00
[Epoch 39] test loss 4.50, test ppl 90.25
[Epoch 40 Batch 200/372] current loss 4.64, ppl 103.12, throughput 636.13 samples/s, lr 14.57
[Epoch 40] throughput 44324.39 samples/s
[Epoch 40] time cost 50.00s, valid loss 4.58, valid ppl 97.21, lr 30.00
[Epoch 41 Batch 200/372] current loss 4.63, ppl 102.42, throughput 634.71 samples/s, lr 27.00
[Epoch 41] throughput 44022.04 samples/s
[Epoch 41] time cost 50.34s, valid loss 4.57, valid ppl 96.62, lr 30.00
[Epoch 42 Batch 200/372] current loss 4.62, ppl 101.32, throughput 649.06 samples/s, lr 28.71
[Epoch 42] throughput 44744.91 samples/s
[Epoch 42] time cost 49.65s, valid loss 4.55, valid ppl 94.17, lr 30.00
[Epoch 42] test loss 4.48, test ppl 88.57
[Epoch 43 Batch 200/372] current loss 4.61, ppl 100.66, throughput 640.14 samples/s, lr 33.00
[Epoch 43] throughput 43091.62 samples/s
[Epoch 43] time cost 51.40s, valid loss 4.55, valid ppl 94.82, lr 30.00
[Epoch 44 Batch 200/372] current loss 4.61, ppl 100.60, throughput 657.50 samples/s, lr 30.43
[Epoch 44] throughput 44312.00 samples/s
[Epoch 44] time cost 50.11s, valid loss 4.57, valid ppl 96.38, lr 30.00
[Epoch 45 Batch 200/372] current loss 4.60, ppl 99.42, throughput 631.02 samples/s, lr 17.57
[Epoch 45] throughput 43348.68 samples/s
[Epoch 45] time cost 51.09s, valid loss 4.57, valid ppl 96.28, lr 30.00
Switching to NTASGD and avg_trigger is : 17112
[Epoch 46 Batch 200/372] current loss 4.59, ppl 98.58, throughput 638.27 samples/s, lr 28.29
[Epoch 46] throughput 41875.81 samples/s
[Epoch 46] time cost 52.83s, valid loss 4.50, valid ppl 89.82, lr 30.00
[Epoch 46] test loss 4.44, test ppl 84.88
[Epoch 47 Batch 200/372] current loss 4.58, ppl 97.39, throughput 638.05 samples/s, lr 28.71
[Epoch 47] throughput 43327.74 samples/s
[Epoch 47] time cost 51.06s, valid loss 4.49, valid ppl 89.44, lr 30.00
[Epoch 47] test loss 4.44, test ppl 84.53
[Epoch 48 Batch 200/372] current loss 4.57, ppl 96.10, throughput 630.81 samples/s, lr 30.43
[Epoch 48] throughput 43206.65 samples/s
[Epoch 48] time cost 51.26s, valid loss 4.49, valid ppl 89.16, lr 30.00
[Epoch 48] test loss 4.43, test ppl 84.27
[Epoch 49 Batch 200/372] current loss 4.57, ppl 96.24, throughput 605.90 samples/s, lr 31.29
[Epoch 49] throughput 41919.85 samples/s
[Epoch 49] time cost 52.76s, valid loss 4.49, valid ppl 88.89, lr 30.00
[Epoch 49] test loss 4.43, test ppl 84.04
[Epoch 50 Batch 200/372] current loss 4.56, ppl 95.94, throughput 621.56 samples/s, lr 30.86
[Epoch 50] throughput 42276.36 samples/s
[Epoch 50] time cost 52.24s, valid loss 4.49, valid ppl 88.68, lr 30.00
[Epoch 50] test loss 4.43, test ppl 83.84
[Epoch 51 Batch 200/372] current loss 4.54, ppl 93.65, throughput 622.65 samples/s, lr 15.43
[Epoch 51] throughput 42635.10 samples/s
[Epoch 51] time cost 51.92s, valid loss 4.48, valid ppl 88.52, lr 30.00
[Epoch 51] test loss 4.43, test ppl 83.69
[Epoch 52 Batch 200/372] current loss 4.54, ppl 93.84, throughput 637.35 samples/s, lr 26.57
[Epoch 52] throughput 42508.08 samples/s
[Epoch 52] time cost 52.06s, valid loss 4.48, valid ppl 88.35, lr 30.00
[Epoch 52] test loss 4.43, test ppl 83.55
[Epoch 53 Batch 200/372] current loss 4.54, ppl 93.42, throughput 624.95 samples/s, lr 27.00
[Epoch 53] throughput 42464.86 samples/s
[Epoch 53] time cost 52.14s, valid loss 4.48, valid ppl 88.20, lr 30.00
[Epoch 53] test loss 4.42, test ppl 83.42
[Epoch 54 Batch 200/372] current loss 4.51, ppl 91.21, throughput 627.37 samples/s, lr 28.29
[Epoch 54] throughput 42577.20 samples/s
[Epoch 54] time cost 51.92s, valid loss 4.48, valid ppl 88.06, lr 30.00
[Epoch 54] test loss 4.42, test ppl 83.29
[Epoch 55 Batch 200/372] current loss 4.51, ppl 90.79, throughput 627.75 samples/s, lr 31.29
[Epoch 55] throughput 41886.00 samples/s
[Epoch 55] time cost 52.70s, valid loss 4.48, valid ppl 87.93, lr 30.00
[Epoch 55] test loss 4.42, test ppl 83.16
[Epoch 56 Batch 200/372] current loss 4.52, ppl 91.47, throughput 618.82 samples/s, lr 27.43
[Epoch 56] throughput 42316.70 samples/s
[Epoch 56] time cost 52.21s, valid loss 4.48, valid ppl 87.80, lr 30.00
[Epoch 56] test loss 4.42, test ppl 83.05
[Epoch 57 Batch 200/372] current loss 4.50, ppl 90.41, throughput 611.52 samples/s, lr 31.71
[Epoch 57] throughput 42165.09 samples/s
[Epoch 57] time cost 52.48s, valid loss 4.47, valid ppl 87.67, lr 30.00
[Epoch 57] test loss 4.42, test ppl 82.94
[Epoch 58 Batch 200/372] current loss 4.49, ppl 89.36, throughput 615.84 samples/s, lr 29.14
[Epoch 58] throughput 42772.39 samples/s
[Epoch 58] time cost 51.74s, valid loss 4.47, valid ppl 87.54, lr 30.00
[Epoch 58] test loss 4.42, test ppl 82.84
[Epoch 59 Batch 200/372] current loss 4.49, ppl 88.85, throughput 607.43 samples/s, lr 26.57
[Epoch 59] throughput 41395.13 samples/s
[Epoch 59] time cost 53.44s, valid loss 4.47, valid ppl 87.43, lr 30.00
[Epoch 59] test loss 4.42, test ppl 82.73
[Epoch 60 Batch 200/372] current loss 4.47, ppl 87.78, throughput 630.80 samples/s, lr 31.71
[Epoch 60] throughput 42870.16 samples/s
[Epoch 60] time cost 51.66s, valid loss 4.47, valid ppl 87.31, lr 30.00
[Epoch 60] test loss 4.41, test ppl 82.63
[Epoch 61 Batch 200/372] current loss 4.48, ppl 87.82, throughput 623.81 samples/s, lr 30.86
[Epoch 61] throughput 41849.03 samples/s
[Epoch 61] time cost 52.74s, valid loss 4.47, valid ppl 87.20, lr 30.00
[Epoch 61] test loss 4.41, test ppl 82.52
[Epoch 62 Batch 200/372] current loss 4.48, ppl 88.04, throughput 615.71 samples/s, lr 31.71
[Epoch 62] throughput 42296.39 samples/s
[Epoch 62] time cost 52.26s, valid loss 4.47, valid ppl 87.09, lr 30.00
[Epoch 62] test loss 4.41, test ppl 82.43
[Epoch 63 Batch 200/372] current loss 4.46, ppl 86.53, throughput 621.55 samples/s, lr 29.57
[Epoch 63] throughput 42202.85 samples/s
[Epoch 63] time cost 52.41s, valid loss 4.47, valid ppl 86.99, lr 30.00
[Epoch 63] test loss 4.41, test ppl 82.33
[Epoch 64 Batch 200/372] current loss 4.48, ppl 88.01, throughput 612.73 samples/s, lr 28.29
[Epoch 64] throughput 42039.58 samples/s
[Epoch 64] time cost 52.55s, valid loss 4.46, valid ppl 86.89, lr 30.00
[Epoch 64] test loss 4.41, test ppl 82.24
[Epoch 65 Batch 200/372] current loss 4.45, ppl 85.89, throughput 620.45 samples/s, lr 31.71
[Epoch 65] throughput 42300.77 samples/s
[Epoch 65] time cost 52.27s, valid loss 4.46, valid ppl 86.79, lr 30.00
[Epoch 65] test loss 4.41, test ppl 82.15
[Epoch 66 Batch 200/372] current loss 4.44, ppl 84.92, throughput 632.49 samples/s, lr 30.00
[Epoch 66] throughput 42374.63 samples/s
[Epoch 66] time cost 52.14s, valid loss 4.46, valid ppl 86.70, lr 30.00
[Epoch 66] test loss 4.41, test ppl 82.06
[Epoch 67 Batch 200/372] current loss 4.45, ppl 85.74, throughput 598.66 samples/s, lr 30.43
[Epoch 67] throughput 41922.16 samples/s
[Epoch 67] time cost 52.68s, valid loss 4.46, valid ppl 86.60, lr 30.00
[Epoch 67] test loss 4.41, test ppl 81.97
[Epoch 68 Batch 200/372] current loss 4.44, ppl 84.62, throughput 620.85 samples/s, lr 29.14
[Epoch 68] throughput 42776.30 samples/s
[Epoch 68] time cost 51.67s, valid loss 4.46, valid ppl 86.50, lr 30.00
[Epoch 68] test loss 4.41, test ppl 81.89
[Epoch 69 Batch 200/372] current loss 4.45, ppl 85.35, throughput 649.11 samples/s, lr 29.57
[Epoch 69] throughput 42931.04 samples/s
[Epoch 69] time cost 51.73s, valid loss 4.46, valid ppl 86.42, lr 30.00
[Epoch 69] test loss 4.40, test ppl 81.80
[Epoch 70 Batch 200/372] current loss 4.44, ppl 85.04, throughput 635.48 samples/s, lr 29.57
[Epoch 70] throughput 42318.56 samples/s
[Epoch 70] time cost 52.19s, valid loss 4.46, valid ppl 86.33, lr 30.00
[Epoch 70] test loss 4.40, test ppl 81.72
[Epoch 71 Batch 200/372] current loss 4.43, ppl 84.20, throughput 636.80 samples/s, lr 32.14
[Epoch 71] throughput 42835.40 samples/s
[Epoch 71] time cost 51.75s, valid loss 4.46, valid ppl 86.25, lr 30.00
[Epoch 71] test loss 4.40, test ppl 81.65
[Epoch 72 Batch 200/372] current loss 4.42, ppl 83.11, throughput 644.80 samples/s, lr 30.00
[Epoch 72] throughput 43489.42 samples/s
[Epoch 72] time cost 50.99s, valid loss 4.46, valid ppl 86.17, lr 30.00
[Epoch 72] test loss 4.40, test ppl 81.57
[Epoch 73 Batch 200/372] current loss 4.44, ppl 84.66, throughput 594.11 samples/s, lr 31.29
[Epoch 73] throughput 41731.42 samples/s
[Epoch 73] time cost 52.88s, valid loss 4.46, valid ppl 86.09, lr 30.00
[Epoch 73] test loss 4.40, test ppl 81.49
[Epoch 74 Batch 200/372] current loss 4.42, ppl 83.09, throughput 608.43 samples/s, lr 29.57
[Epoch 74] throughput 42080.81 samples/s
[Epoch 74] time cost 52.56s, valid loss 4.45, valid ppl 86.01, lr 30.00
[Epoch 74] test loss 4.40, test ppl 81.42
[Epoch 75 Batch 200/372] current loss 4.41, ppl 82.15, throughput 628.36 samples/s, lr 16.29
[Epoch 75] throughput 42419.68 samples/s
[Epoch 75] time cost 52.19s, valid loss 4.45, valid ppl 85.93, lr 30.00
[Epoch 75] test loss 4.40, test ppl 81.34
[Epoch 76 Batch 200/372] current loss 4.42, ppl 82.80, throughput 611.88 samples/s, lr 27.86
[Epoch 76] throughput 42369.68 samples/s
[Epoch 76] time cost 52.20s, valid loss 4.45, valid ppl 85.85, lr 30.00
[Epoch 76] test loss 4.40, test ppl 81.27
[Epoch 77 Batch 200/372] current loss 4.40, ppl 81.82, throughput 608.03 samples/s, lr 26.14
[Epoch 77] throughput 42248.01 samples/s
[Epoch 77] time cost 52.31s, valid loss 4.45, valid ppl 85.78, lr 30.00
[Epoch 77] test loss 4.40, test ppl 81.21
[Epoch 78 Batch 200/372] current loss 4.38, ppl 80.20, throughput 644.05 samples/s, lr 26.14
[Epoch 78] throughput 43045.25 samples/s
[Epoch 78] time cost 51.36s, valid loss 4.45, valid ppl 85.71, lr 30.00
[Epoch 78] test loss 4.40, test ppl 81.14
[Epoch 79 Batch 200/372] current loss 4.39, ppl 80.93, throughput 620.29 samples/s, lr 24.86
[Epoch 79] throughput 42148.68 samples/s
[Epoch 79] time cost 52.41s, valid loss 4.45, valid ppl 85.64, lr 30.00
[Epoch 79] test loss 4.40, test ppl 81.08
[Epoch 80 Batch 200/372] current loss 4.39, ppl 80.49, throughput 627.29 samples/s, lr 26.57
[Epoch 80] throughput 42012.09 samples/s
[Epoch 80] time cost 52.57s, valid loss 4.45, valid ppl 85.57, lr 30.00
[Epoch 80] test loss 4.39, test ppl 81.02
[Epoch 81 Batch 200/372] current loss 4.39, ppl 80.33, throughput 619.81 samples/s, lr 31.71
[Epoch 81] throughput 42319.90 samples/s
[Epoch 81] time cost 52.19s, valid loss 4.45, valid ppl 85.50, lr 30.00
[Epoch 81] test loss 4.39, test ppl 80.96
[Epoch 82 Batch 200/372] current loss 4.40, ppl 81.08, throughput 633.14 samples/s, lr 27.86
[Epoch 82] throughput 42723.98 samples/s
[Epoch 82] time cost 51.75s, valid loss 4.45, valid ppl 85.44, lr 30.00
[Epoch 82] test loss 4.39, test ppl 80.90
[Epoch 83 Batch 200/372] current loss 4.38, ppl 79.69, throughput 623.08 samples/s, lr 30.00
[Epoch 83] throughput 42739.28 samples/s
[Epoch 83] time cost 51.74s, valid loss 4.45, valid ppl 85.37, lr 30.00
[Epoch 83] test loss 4.39, test ppl 80.84
[Epoch 84 Batch 200/372] current loss 4.37, ppl 79.37, throughput 623.88 samples/s, lr 30.43
[Epoch 84] throughput 42153.19 samples/s
[Epoch 84] time cost 52.51s, valid loss 4.45, valid ppl 85.31, lr 30.00
[Epoch 84] test loss 4.39, test ppl 80.79
[Epoch 85 Batch 200/372] current loss 4.39, ppl 80.27, throughput 626.80 samples/s, lr 30.86
[Epoch 85] throughput 42386.08 samples/s
[Epoch 85] time cost 52.21s, valid loss 4.45, valid ppl 85.24, lr 30.00
[Epoch 85] test loss 4.39, test ppl 80.73
[Epoch 86 Batch 200/372] current loss 4.36, ppl 77.97, throughput 595.65 samples/s, lr 30.00
[Epoch 86] throughput 41601.81 samples/s
[Epoch 86] time cost 53.13s, valid loss 4.44, valid ppl 85.18, lr 30.00
[Epoch 86] test loss 4.39, test ppl 80.68
[Epoch 87 Batch 200/372] current loss 4.37, ppl 79.16, throughput 603.91 samples/s, lr 28.71
[Epoch 87] throughput 42503.69 samples/s
[Epoch 87] time cost 52.14s, valid loss 4.44, valid ppl 85.12, lr 30.00
[Epoch 87] test loss 4.39, test ppl 80.63
[Epoch 88 Batch 200/372] current loss 4.36, ppl 78.37, throughput 627.36 samples/s, lr 27.43
[Epoch 88] throughput 42298.30 samples/s
[Epoch 88] time cost 52.22s, valid loss 4.44, valid ppl 85.06, lr 30.00
[Epoch 88] test loss 4.39, test ppl 80.57
[Epoch 89 Batch 200/372] current loss 4.38, ppl 79.57, throughput 626.20 samples/s, lr 30.43
[Epoch 89] throughput 42983.71 samples/s
[Epoch 89] time cost 51.55s, valid loss 4.44, valid ppl 85.00, lr 30.00
[Epoch 89] test loss 4.39, test ppl 80.52
[Epoch 90 Batch 200/372] current loss 4.37, ppl 78.70, throughput 622.36 samples/s, lr 29.57
[Epoch 90] throughput 42838.53 samples/s
[Epoch 90] time cost 51.64s, valid loss 4.44, valid ppl 84.94, lr 30.00
[Epoch 90] test loss 4.39, test ppl 80.47
[Epoch 91 Batch 200/372] current loss 4.35, ppl 77.60, throughput 640.73 samples/s, lr 31.29
[Epoch 91] throughput 43112.23 samples/s
[Epoch 91] time cost 51.31s, valid loss 4.44, valid ppl 84.88, lr 30.00
[Epoch 91] test loss 4.39, test ppl 80.42
[Epoch 92 Batch 200/372] current loss 4.36, ppl 78.13, throughput 627.94 samples/s, lr 28.71
[Epoch 92] throughput 42647.41 samples/s
[Epoch 92] time cost 51.83s, valid loss 4.44, valid ppl 84.83, lr 30.00
[Epoch 92] test loss 4.39, test ppl 80.37
[Epoch 93 Batch 200/372] current loss 4.36, ppl 77.92, throughput 637.66 samples/s, lr 33.43
[Epoch 93] throughput 42212.90 samples/s
[Epoch 93] time cost 52.29s, valid loss 4.44, valid ppl 84.78, lr 30.00
[Epoch 93] test loss 4.39, test ppl 80.32
[Epoch 94 Batch 200/372] current loss 4.34, ppl 76.89, throughput 622.54 samples/s, lr 28.29
[Epoch 94] throughput 42398.41 samples/s
[Epoch 94] time cost 52.09s, valid loss 4.44, valid ppl 84.72, lr 30.00
[Epoch 94] test loss 4.39, test ppl 80.28
[Epoch 95 Batch 200/372] current loss 4.35, ppl 77.14, throughput 611.84 samples/s, lr 30.86
[Epoch 95] throughput 42214.58 samples/s
[Epoch 95] time cost 52.44s, valid loss 4.44, valid ppl 84.67, lr 30.00
[Epoch 95] test loss 4.38, test ppl 80.23
[Epoch 96 Batch 200/372] current loss 4.34, ppl 76.49, throughput 617.29 samples/s, lr 31.71
[Epoch 96] throughput 42103.51 samples/s
[Epoch 96] time cost 52.45s, valid loss 4.44, valid ppl 84.62, lr 30.00
[Epoch 96] test loss 4.38, test ppl 80.18
[Epoch 97 Batch 200/372] current loss 4.35, ppl 77.38, throughput 622.09 samples/s, lr 26.57
[Epoch 97] throughput 42785.34 samples/s
[Epoch 97] time cost 51.78s, valid loss 4.44, valid ppl 84.57, lr 30.00
[Epoch 97] test loss 4.38, test ppl 80.14
[Epoch 98 Batch 200/372] current loss 4.33, ppl 76.18, throughput 598.95 samples/s, lr 31.29
[Epoch 98] throughput 42206.15 samples/s
[Epoch 98] time cost 52.38s, valid loss 4.44, valid ppl 84.52, lr 30.00
[Epoch 98] test loss 4.38, test ppl 80.10
[Epoch 99 Batch 200/372] current loss 4.35, ppl 77.42, throughput 632.05 samples/s, lr 29.57
[Epoch 99] throughput 42833.37 samples/s
[Epoch 99] time cost 51.69s, valid loss 4.44, valid ppl 84.47, lr 30.00
[Epoch 99] test loss 4.38, test ppl 80.05
[Epoch 100 Batch 200/372] current loss 4.34, ppl 76.61, throughput 624.72 samples/s, lr 26.14
[Epoch 100] throughput 43187.75 samples/s
[Epoch 100] time cost 51.29s, valid loss 4.44, valid ppl 84.42, lr 30.00
[Epoch 100] test loss 4.38, test ppl 80.01
[Epoch 101 Batch 200/372] current loss 4.33, ppl 75.89, throughput 630.16 samples/s, lr 31.29
[Epoch 101] throughput 42434.37 samples/s
[Epoch 101] time cost 52.06s, valid loss 4.44, valid ppl 84.37, lr 30.00
[Epoch 101] test loss 4.38, test ppl 79.97
[Epoch 102 Batch 200/372] current loss 4.32, ppl 75.51, throughput 634.73 samples/s, lr 30.86
[Epoch 102] throughput 42820.79 samples/s
[Epoch 102] time cost 51.61s, valid loss 4.43, valid ppl 84.32, lr 30.00
[Epoch 102] test loss 4.38, test ppl 79.93
[Epoch 103 Batch 200/372] current loss 4.32, ppl 75.36, throughput 629.37 samples/s, lr 31.71
[Epoch 103] throughput 42395.48 samples/s
[Epoch 103] time cost 52.13s, valid loss 4.43, valid ppl 84.27, lr 30.00
[Epoch 103] test loss 4.38, test ppl 79.88
[Epoch 104 Batch 200/372] current loss 4.33, ppl 75.84, throughput 611.30 samples/s, lr 30.43
[Epoch 104] throughput 41930.82 samples/s
[Epoch 104] time cost 52.70s, valid loss 4.43, valid ppl 84.23, lr 30.00
[Epoch 104] test loss 4.38, test ppl 79.84
[Epoch 105 Batch 200/372] current loss 4.31, ppl 74.72, throughput 614.01 samples/s, lr 29.57
[Epoch 105] throughput 42483.24 samples/s
[Epoch 105] time cost 52.09s, valid loss 4.43, valid ppl 84.19, lr 30.00
[Epoch 105] test loss 4.38, test ppl 79.81
[Epoch 106 Batch 200/372] current loss 4.32, ppl 75.00, throughput 640.20 samples/s, lr 27.86
[Epoch 106] throughput 42805.04 samples/s
[Epoch 106] time cost 51.64s, valid loss 4.43, valid ppl 84.14, lr 30.00
[Epoch 106] test loss 4.38, test ppl 79.77
[Epoch 107 Batch 200/372] current loss 4.31, ppl 74.70, throughput 619.72 samples/s, lr 30.86
[Epoch 107] throughput 42513.68 samples/s
[Epoch 107] time cost 52.11s, valid loss 4.43, valid ppl 84.10, lr 30.00
[Epoch 107] test loss 4.38, test ppl 79.73
[Epoch 108 Batch 200/372] current loss 4.31, ppl 74.74, throughput 621.66 samples/s, lr 29.14
[Epoch 108] throughput 42665.14 samples/s
[Epoch 108] time cost 51.92s, valid loss 4.43, valid ppl 84.06, lr 30.00
[Epoch 108] test loss 4.38, test ppl 79.69
[Epoch 109 Batch 200/372] current loss 4.32, ppl 74.92, throughput 617.69 samples/s, lr 32.14
[Epoch 109] throughput 42151.91 samples/s
[Epoch 109] time cost 52.47s, valid loss 4.43, valid ppl 84.02, lr 30.00
[Epoch 109] test loss 4.38, test ppl 79.66
[Epoch 110 Batch 200/372] current loss 4.29, ppl 73.04, throughput 632.90 samples/s, lr 29.57
[Epoch 110] throughput 42212.33 samples/s
[Epoch 110] time cost 52.37s, valid loss 4.43, valid ppl 83.98, lr 30.00
[Epoch 110] test loss 4.38, test ppl 79.62
[Epoch 111 Batch 200/372] current loss 4.30, ppl 73.96, throughput 629.84 samples/s, lr 32.57
[Epoch 111] throughput 42814.44 samples/s
[Epoch 111] time cost 51.74s, valid loss 4.43, valid ppl 83.95, lr 30.00
[Epoch 111] test loss 4.38, test ppl 79.58
[Epoch 112 Batch 200/372] current loss 4.30, ppl 73.99, throughput 644.73 samples/s, lr 30.00
[Epoch 112] throughput 43096.30 samples/s
[Epoch 112] time cost 51.34s, valid loss 4.43, valid ppl 83.91, lr 30.00
[Epoch 112] test loss 4.38, test ppl 79.55
[Epoch 113 Batch 200/372] current loss 4.30, ppl 73.85, throughput 611.62 samples/s, lr 28.71
[Epoch 113] throughput 42306.62 samples/s
[Epoch 113] time cost 52.31s, valid loss 4.43, valid ppl 83.87, lr 30.00
[Epoch 113] test loss 4.38, test ppl 79.52
[Epoch 114 Batch 200/372] current loss 4.29, ppl 72.89, throughput 630.09 samples/s, lr 33.00
[Epoch 114] throughput 43033.33 samples/s
[Epoch 114] time cost 51.46s, valid loss 4.43, valid ppl 83.84, lr 30.00
[Epoch 114] test loss 4.38, test ppl 79.49
[Epoch 115 Batch 200/372] current loss 4.29, ppl 72.80, throughput 610.95 samples/s, lr 29.14
[Epoch 115] throughput 41864.82 samples/s
[Epoch 115] time cost 52.80s, valid loss 4.43, valid ppl 83.80, lr 30.00
[Epoch 115] test loss 4.38, test ppl 79.45
[Epoch 116 Batch 200/372] current loss 4.28, ppl 72.41, throughput 619.12 samples/s, lr 32.57
[Epoch 116] throughput 43114.96 samples/s
[Epoch 116] time cost 51.30s, valid loss 4.43, valid ppl 83.76, lr 30.00
[Epoch 116] test loss 4.37, test ppl 79.42
[Epoch 117 Batch 200/372] current loss 4.29, ppl 73.24, throughput 610.93 samples/s, lr 29.57
[Epoch 117] throughput 42872.57 samples/s
[Epoch 117] time cost 51.57s, valid loss 4.43, valid ppl 83.73, lr 30.00
[Epoch 117] test loss 4.37, test ppl 79.39
[Epoch 118 Batch 200/372] current loss 4.29, ppl 73.00, throughput 637.12 samples/s, lr 31.29
[Epoch 118] throughput 42507.10 samples/s
[Epoch 118] time cost 52.15s, valid loss 4.43, valid ppl 83.69, lr 30.00
[Epoch 118] test loss 4.37, test ppl 79.36
[Epoch 119 Batch 200/372] current loss 4.27, ppl 71.81, throughput 615.58 samples/s, lr 30.43
[Epoch 119] throughput 42060.26 samples/s
[Epoch 119] time cost 52.53s, valid loss 4.43, valid ppl 83.66, lr 30.00
[Epoch 119] test loss 4.37, test ppl 79.33
[Epoch 120 Batch 200/372] current loss 4.28, ppl 72.09, throughput 622.62 samples/s, lr 29.14
[Epoch 120] throughput 42323.88 samples/s
[Epoch 120] time cost 52.25s, valid loss 4.43, valid ppl 83.63, lr 30.00
[Epoch 120] test loss 4.37, test ppl 79.29
[Epoch 121 Batch 200/372] current loss 4.28, ppl 72.20, throughput 636.11 samples/s, lr 28.71
[Epoch 121] throughput 42903.70 samples/s
[Epoch 121] time cost 51.54s, valid loss 4.43, valid ppl 83.60, lr 30.00
[Epoch 121] test loss 4.37, test ppl 79.26
[Epoch 122 Batch 200/372] current loss 4.27, ppl 71.46, throughput 619.79 samples/s, lr 27.43
[Epoch 122] throughput 42450.96 samples/s
[Epoch 122] time cost 52.15s, valid loss 4.43, valid ppl 83.56, lr 30.00
[Epoch 122] test loss 4.37, test ppl 79.23
[Epoch 123 Batch 200/372] current loss 4.28, ppl 72.31, throughput 622.97 samples/s, lr 31.29
[Epoch 123] throughput 42299.64 samples/s
[Epoch 123] time cost 52.30s, valid loss 4.43, valid ppl 83.53, lr 30.00
[Epoch 123] test loss 4.37, test ppl 79.20
[Epoch 124 Batch 200/372] current loss 4.27, ppl 71.71, throughput 604.45 samples/s, lr 30.43
[Epoch 124] throughput 41669.58 samples/s
[Epoch 124] time cost 53.01s, valid loss 4.42, valid ppl 83.50, lr 30.00
[Epoch 124] test loss 4.37, test ppl 79.17
[Epoch 125 Batch 200/372] current loss 4.29, ppl 72.97, throughput 624.41 samples/s, lr 27.86
[Epoch 125] throughput 41211.76 samples/s
[Epoch 125] time cost 53.65s, valid loss 4.42, valid ppl 83.47, lr 30.00
[Epoch 125] test loss 4.37, test ppl 79.14
[Epoch 126 Batch 200/372] current loss 4.27, ppl 71.57, throughput 622.67 samples/s, lr 31.29
[Epoch 126] throughput 42384.30 samples/s
[Epoch 126] time cost 52.17s, valid loss 4.42, valid ppl 83.44, lr 30.00
[Epoch 126] test loss 4.37, test ppl 79.12
[Epoch 127 Batch 200/372] current loss 4.27, ppl 71.71, throughput 622.63 samples/s, lr 28.71
[Epoch 127] throughput 42546.11 samples/s
[Epoch 127] time cost 51.95s, valid loss 4.42, valid ppl 83.41, lr 30.00
[Epoch 127] test loss 4.37, test ppl 79.09
[Epoch 128 Batch 200/372] current loss 4.27, ppl 71.85, throughput 625.10 samples/s, lr 28.71
[Epoch 128] throughput 42250.49 samples/s
[Epoch 128] time cost 52.27s, valid loss 4.42, valid ppl 83.38, lr 30.00
[Epoch 128] test loss 4.37, test ppl 79.06
[Epoch 129 Batch 200/372] current loss 4.27, ppl 71.29, throughput 628.77 samples/s, lr 31.71
[Epoch 129] throughput 42724.51 samples/s
[Epoch 129] time cost 51.77s, valid loss 4.42, valid ppl 83.35, lr 30.00
[Epoch 129] test loss 4.37, test ppl 79.03
[Epoch 130 Batch 200/372] current loss 4.26, ppl 70.52, throughput 617.96 samples/s, lr 28.29
[Epoch 130] throughput 41837.55 samples/s
[Epoch 130] time cost 52.78s, valid loss 4.42, valid ppl 83.32, lr 30.00
[Epoch 130] test loss 4.37, test ppl 79.00
[Epoch 131 Batch 200/372] current loss 4.27, ppl 71.54, throughput 625.81 samples/s, lr 33.86
[Epoch 131] throughput 42176.83 samples/s
[Epoch 131] time cost 52.43s, valid loss 4.42, valid ppl 83.29, lr 30.00
[Epoch 131] test loss 4.37, test ppl 78.98
[Epoch 132 Batch 200/372] current loss 4.25, ppl 70.22, throughput 619.91 samples/s, lr 28.71
[Epoch 132] throughput 41883.92 samples/s
[Epoch 132] time cost 52.71s, valid loss 4.42, valid ppl 83.26, lr 30.00
[Epoch 132] test loss 4.37, test ppl 78.95
[Epoch 133 Batch 200/372] current loss 4.26, ppl 71.06, throughput 605.59 samples/s, lr 26.14
[Epoch 133] throughput 41553.04 samples/s
[Epoch 133] time cost 53.10s, valid loss 4.42, valid ppl 83.24, lr 30.00
[Epoch 133] test loss 4.37, test ppl 78.92
[Epoch 134 Batch 200/372] current loss 4.26, ppl 70.61, throughput 621.59 samples/s, lr 30.43
[Epoch 134] throughput 42943.95 samples/s
[Epoch 134] time cost 51.61s, valid loss 4.42, valid ppl 83.21, lr 30.00
[Epoch 134] test loss 4.37, test ppl 78.90
[Epoch 135 Batch 200/372] current loss 4.26, ppl 70.74, throughput 615.73 samples/s, lr 14.57
[Epoch 135] throughput 41980.57 samples/s
[Epoch 135] time cost 52.67s, valid loss 4.42, valid ppl 83.18, lr 30.00
[Epoch 135] test loss 4.37, test ppl 78.87
[Epoch 136 Batch 200/372] current loss 4.25, ppl 70.25, throughput 630.72 samples/s, lr 14.14
[Epoch 136] throughput 42559.18 samples/s
[Epoch 136] time cost 51.92s, valid loss 4.42, valid ppl 83.15, lr 30.00
[Epoch 136] test loss 4.37, test ppl 78.84
[Epoch 137 Batch 200/372] current loss 4.25, ppl 70.38, throughput 588.01 samples/s, lr 30.86
[Epoch 137] throughput 41450.64 samples/s
[Epoch 137] time cost 53.23s, valid loss 4.42, valid ppl 83.13, lr 30.00
[Epoch 137] test loss 4.37, test ppl 78.82
[Epoch 138 Batch 200/372] current loss 4.25, ppl 70.11, throughput 601.15 samples/s, lr 30.00
[Epoch 138] throughput 42605.72 samples/s
[Epoch 138] time cost 51.87s, valid loss 4.42, valid ppl 83.10, lr 30.00
[Epoch 138] test loss 4.37, test ppl 78.80
[Epoch 139 Batch 200/372] current loss 4.26, ppl 70.54, throughput 605.35 samples/s, lr 26.14
[Epoch 139] throughput 41854.20 samples/s
[Epoch 139] time cost 52.76s, valid loss 4.42, valid ppl 83.08, lr 30.00
[Epoch 139] test loss 4.37, test ppl 78.77
[Epoch 140 Batch 200/372] current loss 4.25, ppl 69.79, throughput 598.49 samples/s, lr 30.00
[Epoch 140] throughput 42340.23 samples/s
[Epoch 140] time cost 52.16s, valid loss 4.42, valid ppl 83.05, lr 30.00
[Epoch 140] test loss 4.37, test ppl 78.75
[Epoch 141 Batch 200/372] current loss 4.25, ppl 69.78, throughput 613.50 samples/s, lr 28.29
[Epoch 141] throughput 42020.03 samples/s
[Epoch 141] time cost 52.59s, valid loss 4.42, valid ppl 83.02, lr 30.00
[Epoch 141] test loss 4.37, test ppl 78.72
[Epoch 142 Batch 200/372] current loss 4.25, ppl 69.83, throughput 610.89 samples/s, lr 32.57
[Epoch 142] throughput 41820.27 samples/s
[Epoch 142] time cost 52.89s, valid loss 4.42, valid ppl 83.00, lr 30.00
[Epoch 142] test loss 4.37, test ppl 78.70
[Epoch 143 Batch 200/372] current loss 4.25, ppl 69.85, throughput 623.50 samples/s, lr 31.71
[Epoch 143] throughput 42529.90 samples/s
[Epoch 143] time cost 51.98s, valid loss 4.42, valid ppl 82.97, lr 30.00
[Epoch 143] test loss 4.37, test ppl 78.68
[Epoch 144 Batch 200/372] current loss 4.25, ppl 70.12, throughput 628.69 samples/s, lr 31.29
[Epoch 144] throughput 42440.15 samples/s
[Epoch 144] time cost 52.10s, valid loss 4.42, valid ppl 82.95, lr 30.00
[Epoch 144] test loss 4.37, test ppl 78.65
[Epoch 145 Batch 200/372] current loss 4.25, ppl 69.88, throughput 632.59 samples/s, lr 29.14
[Epoch 145] throughput 42652.77 samples/s
[Epoch 145] time cost 51.88s, valid loss 4.42, valid ppl 82.92, lr 30.00
[Epoch 145] test loss 4.36, test ppl 78.63
[Epoch 146 Batch 200/372] current loss 4.23, ppl 69.06, throughput 617.29 samples/s, lr 29.57
[Epoch 146] throughput 42592.84 samples/s
[Epoch 146] time cost 51.88s, valid loss 4.42, valid ppl 82.90, lr 30.00
[Epoch 146] test loss 4.36, test ppl 78.61
[Epoch 147 Batch 200/372] current loss 4.25, ppl 69.84, throughput 614.56 samples/s, lr 30.86
[Epoch 147] throughput 41912.71 samples/s
[Epoch 147] time cost 52.70s, valid loss 4.42, valid ppl 82.87, lr 30.00
[Epoch 147] test loss 4.36, test ppl 78.59
[Epoch 148 Batch 200/372] current loss 4.25, ppl 70.03, throughput 629.48 samples/s, lr 30.43
[Epoch 148] throughput 42416.71 samples/s
[Epoch 148] time cost 52.10s, valid loss 4.42, valid ppl 82.85, lr 30.00
[Epoch 148] test loss 4.36, test ppl 78.56
[Epoch 149 Batch 200/372] current loss 4.24, ppl 69.35, throughput 631.98 samples/s, lr 27.43
[Epoch 149] throughput 42542.01 samples/s
[Epoch 149] time cost 52.00s, valid loss 4.42, valid ppl 82.82, lr 30.00
[Epoch 149] test loss 4.36, test ppl 78.54
[Epoch 150 Batch 200/372] current loss 4.23, ppl 68.42, throughput 634.41 samples/s, lr 32.57
[Epoch 150] throughput 42046.93 samples/s
[Epoch 150] time cost 52.58s, valid loss 4.42, valid ppl 82.80, lr 30.00
[Epoch 150] test loss 4.36, test ppl 78.52
[Epoch 151 Batch 200/372] current loss 4.23, ppl 68.38, throughput 636.14 samples/s, lr 27.86
[Epoch 151] throughput 42939.51 samples/s
[Epoch 151] time cost 51.46s, valid loss 4.42, valid ppl 82.77, lr 30.00
[Epoch 151] test loss 4.36, test ppl 78.50
[Epoch 152 Batch 200/372] current loss 4.23, ppl 68.83, throughput 628.97 samples/s, lr 28.71
[Epoch 152] throughput 43333.71 samples/s
[Epoch 152] time cost 51.06s, valid loss 4.42, valid ppl 82.75, lr 30.00
[Epoch 152] test loss 4.36, test ppl 78.48
[Epoch 153 Batch 200/372] current loss 4.23, ppl 68.82, throughput 605.18 samples/s, lr 32.14
[Epoch 153] throughput 42178.94 samples/s
[Epoch 153] time cost 52.42s, valid loss 4.42, valid ppl 82.73, lr 30.00
[Epoch 153] test loss 4.36, test ppl 78.45
[Epoch 154 Batch 200/372] current loss 4.23, ppl 69.05, throughput 631.22 samples/s, lr 28.71
[Epoch 154] throughput 43374.35 samples/s
[Epoch 154] time cost 50.99s, valid loss 4.42, valid ppl 82.71, lr 30.00
[Epoch 154] test loss 4.36, test ppl 78.43
[Epoch 155 Batch 200/372] current loss 4.22, ppl 67.94, throughput 633.71 samples/s, lr 33.43
[Epoch 155] throughput 42709.06 samples/s
[Epoch 155] time cost 51.78s, valid loss 4.42, valid ppl 82.68, lr 30.00
[Epoch 155] test loss 4.36, test ppl 78.41
[Epoch 156 Batch 200/372] current loss 4.22, ppl 67.77, throughput 623.61 samples/s, lr 28.71
[Epoch 156] throughput 43191.29 samples/s
[Epoch 156] time cost 51.21s, valid loss 4.41, valid ppl 82.66, lr 30.00
[Epoch 156] test loss 4.36, test ppl 78.39
[Epoch 157 Batch 200/372] current loss 4.23, ppl 68.77, throughput 625.52 samples/s, lr 27.86
[Epoch 157] throughput 42713.32 samples/s
[Epoch 157] time cost 51.90s, valid loss 4.41, valid ppl 82.64, lr 30.00
[Epoch 157] test loss 4.36, test ppl 78.37
[Epoch 158 Batch 200/372] current loss 4.23, ppl 68.51, throughput 620.36 samples/s, lr 30.43
[Epoch 158] throughput 42457.97 samples/s
[Epoch 158] time cost 52.03s, valid loss 4.41, valid ppl 82.61, lr 30.00
[Epoch 158] test loss 4.36, test ppl 78.35
[Epoch 159 Batch 200/372] current loss 4.21, ppl 67.68, throughput 653.23 samples/s, lr 30.00
[Epoch 159] throughput 43662.53 samples/s
[Epoch 159] time cost 50.74s, valid loss 4.41, valid ppl 82.59, lr 30.00
[Epoch 159] test loss 4.36, test ppl 78.33
[Epoch 160 Batch 200/372] current loss 4.23, ppl 68.58, throughput 623.29 samples/s, lr 31.29
[Epoch 160] throughput 42330.18 samples/s
[Epoch 160] time cost 52.27s, valid loss 4.41, valid ppl 82.57, lr 30.00
[Epoch 160] test loss 4.36, test ppl 78.31
[Epoch 161 Batch 200/372] current loss 4.22, ppl 67.98, throughput 610.20 samples/s, lr 31.29
[Epoch 161] throughput 41748.78 samples/s
[Epoch 161] time cost 52.84s, valid loss 4.41, valid ppl 82.55, lr 30.00
[Epoch 161] test loss 4.36, test ppl 78.30
[Epoch 162 Batch 200/372] current loss 4.22, ppl 67.89, throughput 643.48 samples/s, lr 27.43
[Epoch 162] throughput 42560.14 samples/s
[Epoch 162] time cost 51.91s, valid loss 4.41, valid ppl 82.53, lr 30.00
[Epoch 162] test loss 4.36, test ppl 78.28
[Epoch 163 Batch 200/372] current loss 4.23, ppl 68.62, throughput 635.56 samples/s, lr 33.43
[Epoch 163] throughput 42859.09 samples/s
[Epoch 163] time cost 51.58s, valid loss 4.41, valid ppl 82.51, lr 30.00
[Epoch 163] test loss 4.36, test ppl 78.26
[Epoch 164 Batch 200/372] current loss 4.23, ppl 68.76, throughput 611.25 samples/s, lr 28.71
[Epoch 164] throughput 41671.31 samples/s
[Epoch 164] time cost 53.00s, valid loss 4.41, valid ppl 82.49, lr 30.00
[Epoch 164] test loss 4.36, test ppl 78.24
[Epoch 165 Batch 200/372] current loss 4.22, ppl 68.25, throughput 592.58 samples/s, lr 32.14
[Epoch 165] throughput 41590.25 samples/s
[Epoch 165] time cost 53.15s, valid loss 4.41, valid ppl 82.47, lr 30.00
[Epoch 165] test loss 4.36, test ppl 78.22
[Epoch 166 Batch 200/372] current loss 4.21, ppl 67.40, throughput 620.67 samples/s, lr 13.71
[Epoch 166] throughput 41997.39 samples/s
[Epoch 166] time cost 52.67s, valid loss 4.41, valid ppl 82.45, lr 30.00
[Epoch 166] test loss 4.36, test ppl 78.21
[Epoch 167 Batch 200/372] current loss 4.21, ppl 67.35, throughput 617.06 samples/s, lr 24.43
[Epoch 167] throughput 42855.18 samples/s
[Epoch 167] time cost 51.60s, valid loss 4.41, valid ppl 82.43, lr 30.00
[Epoch 167] test loss 4.36, test ppl 78.19
[Epoch 168 Batch 200/372] current loss 4.21, ppl 67.03, throughput 632.68 samples/s, lr 33.43
[Epoch 168] throughput 42519.23 samples/s
[Epoch 168] time cost 52.06s, valid loss 4.41, valid ppl 82.41, lr 30.00
[Epoch 168] test loss 4.36, test ppl 78.17
[Epoch 169 Batch 200/372] current loss 4.22, ppl 67.83, throughput 601.58 samples/s, lr 25.71
[Epoch 169] throughput 41822.07 samples/s
[Epoch 169] time cost 52.84s, valid loss 4.41, valid ppl 82.39, lr 30.00
[Epoch 169] test loss 4.36, test ppl 78.16
[Epoch 170 Batch 200/372] current loss 4.21, ppl 67.32, throughput 634.45 samples/s, lr 28.71
[Epoch 170] throughput 43138.90 samples/s
[Epoch 170] time cost 51.32s, valid loss 4.41, valid ppl 82.37, lr 30.00
[Epoch 170] test loss 4.36, test ppl 78.14
[Epoch 171 Batch 200/372] current loss 4.21, ppl 67.20, throughput 618.31 samples/s, lr 26.57
[Epoch 171] throughput 42127.97 samples/s
[Epoch 171] time cost 52.51s, valid loss 4.41, valid ppl 82.35, lr 30.00
[Epoch 171] test loss 4.36, test ppl 78.12
[Epoch 172 Batch 200/372] current loss 4.21, ppl 67.17, throughput 648.29 samples/s, lr 27.00
[Epoch 172] throughput 42732.12 samples/s
[Epoch 172] time cost 51.74s, valid loss 4.41, valid ppl 82.33, lr 30.00
[Epoch 172] test loss 4.36, test ppl 78.11
[Epoch 173 Batch 200/372] current loss 4.21, ppl 67.26, throughput 640.31 samples/s, lr 29.57
[Epoch 173] throughput 42851.79 samples/s
[Epoch 173] time cost 51.56s, valid loss 4.41, valid ppl 82.32, lr 30.00
[Epoch 173] test loss 4.36, test ppl 78.09
[Epoch 174 Batch 200/372] current loss 4.20, ppl 67.00, throughput 641.56 samples/s, lr 28.71
[Epoch 174] throughput 43475.20 samples/s
[Epoch 174] time cost 51.02s, valid loss 4.41, valid ppl 82.30, lr 30.00
[Epoch 174] test loss 4.36, test ppl 78.08
[Epoch 175 Batch 200/372] current loss 4.21, ppl 67.62, throughput 625.55 samples/s, lr 29.57
[Epoch 175] throughput 43135.78 samples/s
[Epoch 175] time cost 51.29s, valid loss 4.41, valid ppl 82.28, lr 30.00
[Epoch 175] test loss 4.36, test ppl 78.06
[Epoch 176 Batch 200/372] current loss 4.22, ppl 68.05, throughput 617.58 samples/s, lr 31.71
[Epoch 176] throughput 43073.56 samples/s
[Epoch 176] time cost 51.39s, valid loss 4.41, valid ppl 82.26, lr 30.00
[Epoch 176] test loss 4.36, test ppl 78.05
[Epoch 177 Batch 200/372] current loss 4.21, ppl 67.64, throughput 621.43 samples/s, lr 30.86
[Epoch 177] throughput 42589.02 samples/s
[Epoch 177] time cost 51.91s, valid loss 4.41, valid ppl 82.25, lr 30.00
[Epoch 177] test loss 4.36, test ppl 78.03
[Epoch 178 Batch 200/372] current loss 4.20, ppl 66.91, throughput 633.88 samples/s, lr 30.00
[Epoch 178] throughput 42996.02 samples/s
[Epoch 178] time cost 51.53s, valid loss 4.41, valid ppl 82.23, lr 30.00
[Epoch 178] test loss 4.36, test ppl 78.02
[Epoch 179 Batch 200/372] current loss 4.21, ppl 67.36, throughput 632.30 samples/s, lr 30.43
[Epoch 179] throughput 42064.46 samples/s
[Epoch 179] time cost 52.54s, valid loss 4.41, valid ppl 82.21, lr 30.00
[Epoch 179] test loss 4.36, test ppl 78.00
[Epoch 180 Batch 200/372] current loss 4.20, ppl 66.42, throughput 617.41 samples/s, lr 31.71
[Epoch 180] throughput 41547.86 samples/s
[Epoch 180] time cost 53.13s, valid loss 4.41, valid ppl 82.19, lr 30.00
[Epoch 180] test loss 4.36, test ppl 77.98
[Epoch 181 Batch 200/372] current loss 4.21, ppl 67.17, throughput 630.85 samples/s, lr 30.00
[Epoch 181] throughput 42467.55 samples/s
[Epoch 181] time cost 52.16s, valid loss 4.41, valid ppl 82.17, lr 30.00
[Epoch 181] test loss 4.36, test ppl 77.97
[Epoch 182 Batch 200/372] current loss 4.19, ppl 66.31, throughput 640.38 samples/s, lr 30.86
[Epoch 182] throughput 43110.55 samples/s
[Epoch 182] time cost 51.40s, valid loss 4.41, valid ppl 82.16, lr 30.00
[Epoch 182] test loss 4.36, test ppl 77.95
[Epoch 183 Batch 200/372] current loss 4.20, ppl 67.00, throughput 648.02 samples/s, lr 27.86
[Epoch 183] throughput 43473.77 samples/s
[Epoch 183] time cost 51.12s, valid loss 4.41, valid ppl 82.14, lr 30.00
[Epoch 183] test loss 4.36, test ppl 77.94
[Epoch 184 Batch 200/372] current loss 4.21, ppl 67.39, throughput 623.74 samples/s, lr 31.71
[Epoch 184] throughput 42398.42 samples/s
[Epoch 184] time cost 52.12s, valid loss 4.41, valid ppl 82.12, lr 30.00
[Epoch 184] test loss 4.36, test ppl 77.92
[Epoch 185 Batch 200/372] current loss 4.19, ppl 66.33, throughput 614.59 samples/s, lr 31.71
[Epoch 185] throughput 42479.56 samples/s
[Epoch 185] time cost 52.09s, valid loss 4.41, valid ppl 82.10, lr 30.00
[Epoch 185] test loss 4.36, test ppl 77.91
[Epoch 186 Batch 200/372] current loss 4.19, ppl 65.87, throughput 626.51 samples/s, lr 27.43
[Epoch 186] throughput 42185.07 samples/s
[Epoch 186] time cost 52.39s, valid loss 4.41, valid ppl 82.09, lr 30.00
[Epoch 186] test loss 4.36, test ppl 77.89
[Epoch 187 Batch 200/372] current loss 4.19, ppl 66.13, throughput 650.30 samples/s, lr 32.14
[Epoch 187] throughput 42707.27 samples/s
[Epoch 187] time cost 51.76s, valid loss 4.41, valid ppl 82.07, lr 30.00
[Epoch 187] test loss 4.36, test ppl 77.88
[Epoch 188 Batch 200/372] current loss 4.19, ppl 66.35, throughput 621.48 samples/s, lr 29.57
[Epoch 188] throughput 42228.29 samples/s
[Epoch 188] time cost 52.43s, valid loss 4.41, valid ppl 82.05, lr 30.00
[Epoch 188] test loss 4.35, test ppl 77.86
[Epoch 189 Batch 200/372] current loss 4.20, ppl 66.60, throughput 629.83 samples/s, lr 27.00
[Epoch 189] throughput 42924.44 samples/s
[Epoch 189] time cost 51.55s, valid loss 4.41, valid ppl 82.04, lr 30.00
[Epoch 189] test loss 4.35, test ppl 77.85
[Epoch 190 Batch 200/372] current loss 4.20, ppl 66.41, throughput 629.17 samples/s, lr 28.29
[Epoch 190] throughput 43228.13 samples/s
[Epoch 190] time cost 51.18s, valid loss 4.41, valid ppl 82.02, lr 30.00
[Epoch 190] test loss 4.35, test ppl 77.83
[Epoch 191 Batch 200/372] current loss 4.19, ppl 66.32, throughput 618.72 samples/s, lr 27.86
[Epoch 191] throughput 42574.32 samples/s
[Epoch 191] time cost 51.89s, valid loss 4.41, valid ppl 82.00, lr 30.00
[Epoch 191] test loss 4.35, test ppl 77.82
[Epoch 192 Batch 200/372] current loss 4.18, ppl 65.57, throughput 612.29 samples/s, lr 30.86
[Epoch 192] throughput 42625.47 samples/s
[Epoch 192] time cost 51.85s, valid loss 4.41, valid ppl 81.99, lr 30.00
[Epoch 192] test loss 4.35, test ppl 77.81
[Epoch 193 Batch 200/372] current loss 4.20, ppl 66.52, throughput 635.19 samples/s, lr 32.14
[Epoch 193] throughput 42896.65 samples/s
[Epoch 193] time cost 51.66s, valid loss 4.41, valid ppl 81.97, lr 30.00
[Epoch 193] test loss 4.35, test ppl 77.80
[Epoch 194 Batch 200/372] current loss 4.18, ppl 65.36, throughput 637.32 samples/s, lr 27.86
[Epoch 194] throughput 42842.24 samples/s
[Epoch 194] time cost 51.59s, valid loss 4.41, valid ppl 81.96, lr 30.00
[Epoch 194] test loss 4.35, test ppl 77.78
[Epoch 195 Batch 200/372] current loss 4.19, ppl 66.05, throughput 618.51 samples/s, lr 28.71
[Epoch 195] throughput 42506.89 samples/s
[Epoch 195] time cost 52.05s, valid loss 4.41, valid ppl 81.94, lr 30.00
[Epoch 195] test loss 4.35, test ppl 77.77
[Epoch 196 Batch 200/372] current loss 4.19, ppl 65.94, throughput 622.84 samples/s, lr 30.00
[Epoch 196] throughput 42926.21 samples/s
[Epoch 196] time cost 51.47s, valid loss 4.41, valid ppl 81.92, lr 30.00
[Epoch 196] test loss 4.35, test ppl 77.76
[Epoch 197 Batch 200/372] current loss 4.19, ppl 66.09, throughput 625.54 samples/s, lr 31.71
[Epoch 197] throughput 42424.96 samples/s
[Epoch 197] time cost 52.08s, valid loss 4.41, valid ppl 81.91, lr 30.00
[Epoch 197] test loss 4.35, test ppl 77.74
[Epoch 198 Batch 200/372] current loss 4.19, ppl 65.88, throughput 634.99 samples/s, lr 28.29
[Epoch 198] throughput 42476.89 samples/s
[Epoch 198] time cost 52.01s, valid loss 4.41, valid ppl 81.89, lr 30.00
[Epoch 198] test loss 4.35, test ppl 77.73
[Epoch 199 Batch 200/372] current loss 4.17, ppl 64.70, throughput 647.75 samples/s, lr 30.43
[Epoch 199] throughput 43741.82 samples/s
[Epoch 199] time cost 50.70s, valid loss 4.41, valid ppl 81.88, lr 30.00
[Epoch 199] test loss 4.35, test ppl 77.71
[Epoch 200 Batch 200/372] current loss 4.17, ppl 65.00, throughput 613.53 samples/s, lr 25.71
[Epoch 200] throughput 42846.96 samples/s
[Epoch 200] time cost 51.59s, valid loss 4.41, valid ppl 81.86, lr 30.00
[Epoch 200] test loss 4.35, test ppl 77.70
[Epoch 201 Batch 200/372] current loss 4.19, ppl 66.18, throughput 620.43 samples/s, lr 29.57
[Epoch 201] throughput 42857.98 samples/s
[Epoch 201] time cost 51.59s, valid loss 4.40, valid ppl 81.85, lr 30.00
[Epoch 201] test loss 4.35, test ppl 77.69
[Epoch 202 Batch 200/372] current loss 4.17, ppl 64.96, throughput 624.46 samples/s, lr 26.14
[Epoch 202] throughput 42243.08 samples/s
[Epoch 202] time cost 52.28s, valid loss 4.40, valid ppl 81.83, lr 30.00
[Epoch 202] test loss 4.35, test ppl 77.67
[Epoch 203 Batch 200/372] current loss 4.18, ppl 65.23, throughput 605.76 samples/s, lr 31.29
[Epoch 203] throughput 41992.04 samples/s
[Epoch 203] time cost 52.64s, valid loss 4.40, valid ppl 81.82, lr 30.00
[Epoch 203] test loss 4.35, test ppl 77.66
[Epoch 204 Batch 200/372] current loss 4.17, ppl 64.97, throughput 617.51 samples/s, lr 25.29
[Epoch 204] throughput 42216.45 samples/s
[Epoch 204] time cost 52.32s, valid loss 4.40, valid ppl 81.80, lr 30.00
[Epoch 204] test loss 4.35, test ppl 77.65
[Epoch 205 Batch 200/372] current loss 4.18, ppl 65.20, throughput 638.78 samples/s, lr 30.00
[Epoch 205] throughput 43120.25 samples/s
[Epoch 205] time cost 51.31s, valid loss 4.40, valid ppl 81.79, lr 30.00
[Epoch 205] test loss 4.35, test ppl 77.64
[Epoch 206 Batch 200/372] current loss 4.17, ppl 65.01, throughput 642.96 samples/s, lr 30.43
[Epoch 206] throughput 42623.66 samples/s
[Epoch 206] time cost 51.83s, valid loss 4.40, valid ppl 81.78, lr 30.00
[Epoch 206] test loss 4.35, test ppl 77.62
[Epoch 207 Batch 200/372] current loss 4.17, ppl 64.75, throughput 614.45 samples/s, lr 27.43
[Epoch 207] throughput 42009.19 samples/s
[Epoch 207] time cost 52.68s, valid loss 4.40, valid ppl 81.76, lr 30.00
[Epoch 207] test loss 4.35, test ppl 77.61
[Epoch 208 Batch 200/372] current loss 4.17, ppl 64.50, throughput 622.53 samples/s, lr 30.43
[Epoch 208] throughput 42720.86 samples/s
[Epoch 208] time cost 51.75s, valid loss 4.40, valid ppl 81.75, lr 30.00
[Epoch 208] test loss 4.35, test ppl 77.60
[Epoch 209 Batch 200/372] current loss 4.17, ppl 64.93, throughput 633.74 samples/s, lr 32.57
[Epoch 209] throughput 42903.10 samples/s
[Epoch 209] time cost 51.60s, valid loss 4.40, valid ppl 81.74, lr 30.00
[Epoch 209] test loss 4.35, test ppl 77.59
[Epoch 210 Batch 200/372] current loss 4.17, ppl 65.01, throughput 631.82 samples/s, lr 32.14
[Epoch 210] throughput 42825.94 samples/s
[Epoch 210] time cost 51.59s, valid loss 4.40, valid ppl 81.72, lr 30.00
[Epoch 210] test loss 4.35, test ppl 77.58
[Epoch 211 Batch 200/372] current loss 4.18, ppl 65.29, throughput 618.15 samples/s, lr 31.29
[Epoch 211] throughput 42336.41 samples/s
[Epoch 211] time cost 52.17s, valid loss 4.40, valid ppl 81.71, lr 30.00
[Epoch 211] test loss 4.35, test ppl 77.57
[Epoch 212 Batch 200/372] current loss 4.17, ppl 64.98, throughput 617.05 samples/s, lr 31.29
[Epoch 212] throughput 42262.93 samples/s
[Epoch 212] time cost 52.36s, valid loss 4.40, valid ppl 81.70, lr 30.00
[Epoch 212] test loss 4.35, test ppl 77.55
[Epoch 213 Batch 200/372] current loss 4.18, ppl 65.20, throughput 633.80 samples/s, lr 31.29
[Epoch 213] throughput 43160.18 samples/s
[Epoch 213] time cost 51.23s, valid loss 4.40, valid ppl 81.68, lr 30.00
[Epoch 213] test loss 4.35, test ppl 77.54
[Epoch 214 Batch 200/372] current loss 4.17, ppl 64.60, throughput 625.34 samples/s, lr 31.29
[Epoch 214] throughput 42545.10 samples/s
[Epoch 214] time cost 51.97s, valid loss 4.40, valid ppl 81.67, lr 30.00
[Epoch 214] test loss 4.35, test ppl 77.53
[Epoch 215 Batch 200/372] current loss 4.17, ppl 64.79, throughput 631.90 samples/s, lr 28.29
[Epoch 215] throughput 42327.73 samples/s
[Epoch 215] time cost 52.30s, valid loss 4.40, valid ppl 81.66, lr 30.00
[Epoch 215] test loss 4.35, test ppl 77.52
[Epoch 216 Batch 200/372] current loss 4.17, ppl 64.99, throughput 617.20 samples/s, lr 29.57
[Epoch 216] throughput 43108.41 samples/s
[Epoch 216] time cost 51.30s, valid loss 4.40, valid ppl 81.64, lr 30.00
[Epoch 216] test loss 4.35, test ppl 77.51
[Epoch 217 Batch 200/372] current loss 4.18, ppl 65.60, throughput 623.32 samples/s, lr 28.71
[Epoch 217] throughput 42449.05 samples/s
[Epoch 217] time cost 52.04s, valid loss 4.40, valid ppl 81.63, lr 30.00
[Epoch 217] test loss 4.35, test ppl 77.50
[Epoch 218 Batch 200/372] current loss 4.18, ppl 65.28, throughput 625.64 samples/s, lr 13.71
[Epoch 218] throughput 42138.78 samples/s
[Epoch 218] time cost 52.41s, valid loss 4.40, valid ppl 81.62, lr 30.00
[Epoch 218] test loss 4.35, test ppl 77.49
[Epoch 219 Batch 200/372] current loss 4.17, ppl 64.92, throughput 637.63 samples/s, lr 33.43
[Epoch 219] throughput 43709.68 samples/s
[Epoch 219] time cost 50.65s, valid loss 4.40, valid ppl 81.61, lr 30.00
[Epoch 219] test loss 4.35, test ppl 77.48
[Epoch 220 Batch 200/372] current loss 4.15, ppl 63.46, throughput 618.69 samples/s, lr 30.43
[Epoch 220] throughput 42351.54 samples/s
[Epoch 220] time cost 52.16s, valid loss 4.40, valid ppl 81.60, lr 30.00
[Epoch 220] test loss 4.35, test ppl 77.46
[Epoch 221 Batch 200/372] current loss 4.16, ppl 64.27, throughput 619.92 samples/s, lr 30.43
[Epoch 221] throughput 42206.73 samples/s
[Epoch 221] time cost 52.29s, valid loss 4.40, valid ppl 81.58, lr 30.00
[Epoch 221] test loss 4.35, test ppl 77.45
[Epoch 222 Batch 200/372] current loss 4.16, ppl 64.14, throughput 630.51 samples/s, lr 29.57
[Epoch 222] throughput 43170.78 samples/s
[Epoch 222] time cost 51.32s, valid loss 4.40, valid ppl 81.57, lr 30.00
[Epoch 222] test loss 4.35, test ppl 77.44
[Epoch 223 Batch 200/372] current loss 4.16, ppl 64.37, throughput 613.96 samples/s, lr 27.00
[Epoch 223] throughput 42329.50 samples/s
[Epoch 223] time cost 52.22s, valid loss 4.40, valid ppl 81.56, lr 30.00
[Epoch 223] test loss 4.35, test ppl 77.43
[Epoch 224 Batch 200/372] current loss 4.17, ppl 64.89, throughput 619.33 samples/s, lr 27.00
[Epoch 224] throughput 41738.27 samples/s
[Epoch 224] time cost 52.91s, valid loss 4.40, valid ppl 81.55, lr 30.00
[Epoch 224] test loss 4.35, test ppl 77.42
[Epoch 225 Batch 200/372] current loss 4.17, ppl 64.47, throughput 613.90 samples/s, lr 32.14
[Epoch 225] throughput 42659.72 samples/s
[Epoch 225] time cost 51.81s, valid loss 4.40, valid ppl 81.53, lr 30.00
[Epoch 225] test loss 4.35, test ppl 77.41
[Epoch 226 Batch 200/372] current loss 4.17, ppl 64.52, throughput 620.24 samples/s, lr 28.71
[Epoch 226] throughput 42971.31 samples/s
[Epoch 226] time cost 51.46s, valid loss 4.40, valid ppl 81.52, lr 30.00
[Epoch 226] test loss 4.35, test ppl 77.40
[Epoch 227 Batch 200/372] current loss 4.16, ppl 64.21, throughput 631.05 samples/s, lr 30.43
[Epoch 227] throughput 43646.46 samples/s
[Epoch 227] time cost 50.82s, valid loss 4.40, valid ppl 81.51, lr 30.00
[Epoch 227] test loss 4.35, test ppl 77.39
[Epoch 228 Batch 200/372] current loss 4.15, ppl 63.52, throughput 637.43 samples/s, lr 29.57
[Epoch 228] throughput 42328.47 samples/s
[Epoch 228] time cost 52.22s, valid loss 4.40, valid ppl 81.50, lr 30.00
[Epoch 228] test loss 4.35, test ppl 77.38
[Epoch 229 Batch 200/372] current loss 4.16, ppl 63.96, throughput 631.62 samples/s, lr 30.86
[Epoch 229] throughput 43363.68 samples/s
[Epoch 229] time cost 51.09s, valid loss 4.40, valid ppl 81.49, lr 30.00
[Epoch 229] test loss 4.35, test ppl 77.37
[Epoch 230 Batch 200/372] current loss 4.16, ppl 64.00, throughput 615.64 samples/s, lr 29.57
[Epoch 230] throughput 41865.88 samples/s
[Epoch 230] time cost 52.80s, valid loss 4.40, valid ppl 81.48, lr 30.00
[Epoch 230] test loss 4.35, test ppl 77.36
[Epoch 231 Batch 200/372] current loss 4.16, ppl 64.22, throughput 618.36 samples/s, lr 29.14
[Epoch 231] throughput 42264.36 samples/s
[Epoch 231] time cost 52.34s, valid loss 4.40, valid ppl 81.47, lr 30.00
[Epoch 231] test loss 4.35, test ppl 77.35
[Epoch 232 Batch 200/372] current loss 4.15, ppl 63.55, throughput 620.35 samples/s, lr 30.86
[Epoch 232] throughput 42673.61 samples/s
[Epoch 232] time cost 51.82s, valid loss 4.40, valid ppl 81.46, lr 30.00
[Epoch 232] test loss 4.35, test ppl 77.34
[Epoch 233 Batch 200/372] current loss 4.16, ppl 63.94, throughput 622.33 samples/s, lr 28.71
[Epoch 233] throughput 42474.03 samples/s
[Epoch 233] time cost 52.07s, valid loss 4.40, valid ppl 81.44, lr 30.00
[Epoch 233] test loss 4.35, test ppl 77.33
[Epoch 234 Batch 200/372] current loss 4.16, ppl 64.20, throughput 621.41 samples/s, lr 25.71
[Epoch 234] throughput 42391.73 samples/s
[Epoch 234] time cost 52.15s, valid loss 4.40, valid ppl 81.43, lr 30.00
[Epoch 234] test loss 4.35, test ppl 77.32
[Epoch 235 Batch 200/372] current loss 4.16, ppl 63.99, throughput 609.20 samples/s, lr 28.71
[Epoch 235] throughput 41883.75 samples/s
[Epoch 235] time cost 52.76s, valid loss 4.40, valid ppl 81.42, lr 30.00
[Epoch 235] test loss 4.35, test ppl 77.31
[Epoch 236 Batch 200/372] current loss 4.16, ppl 64.24, throughput 616.41 samples/s, lr 27.43
[Epoch 236] throughput 41806.95 samples/s
[Epoch 236] time cost 52.94s, valid loss 4.40, valid ppl 81.41, lr 30.00
[Epoch 236] test loss 4.35, test ppl 77.30
[Epoch 237 Batch 200/372] current loss 4.15, ppl 63.75, throughput 620.63 samples/s, lr 29.14
[Epoch 237] throughput 42268.02 samples/s
[Epoch 237] time cost 52.24s, valid loss 4.40, valid ppl 81.40, lr 30.00
[Epoch 237] test loss 4.35, test ppl 77.29
[Epoch 238 Batch 200/372] current loss 4.14, ppl 63.06, throughput 620.54 samples/s, lr 28.29
[Epoch 238] throughput 42727.50 samples/s
[Epoch 238] time cost 51.96s, valid loss 4.40, valid ppl 81.39, lr 30.00
[Epoch 238] test loss 4.35, test ppl 77.28
[Epoch 239 Batch 200/372] current loss 4.14, ppl 63.11, throughput 649.07 samples/s, lr 30.00
[Epoch 239] throughput 43527.63 samples/s
[Epoch 239] time cost 50.93s, valid loss 4.40, valid ppl 81.38, lr 30.00
[Epoch 239] test loss 4.35, test ppl 77.27
[Epoch 240 Batch 200/372] current loss 4.14, ppl 62.52, throughput 610.45 samples/s, lr 26.57
[Epoch 240] throughput 42727.84 samples/s
[Epoch 240] time cost 51.72s, valid loss 4.40, valid ppl 81.37, lr 30.00
[Epoch 240] test loss 4.35, test ppl 77.26
[Epoch 241 Batch 200/372] current loss 4.15, ppl 63.41, throughput 619.85 samples/s, lr 30.43
[Epoch 241] throughput 42262.89 samples/s
[Epoch 241] time cost 52.26s, valid loss 4.40, valid ppl 81.36, lr 30.00
[Epoch 241] test loss 4.35, test ppl 77.26
[Epoch 242 Batch 200/372] current loss 4.15, ppl 63.35, throughput 626.85 samples/s, lr 30.43
[Epoch 242] throughput 42524.85 samples/s
[Epoch 242] time cost 52.06s, valid loss 4.40, valid ppl 81.35, lr 30.00
[Epoch 242] test loss 4.35, test ppl 77.25
[Epoch 243 Batch 200/372] current loss 4.15, ppl 63.36, throughput 630.04 samples/s, lr 30.86
[Epoch 243] throughput 43061.82 samples/s
[Epoch 243] time cost 51.38s, valid loss 4.40, valid ppl 81.34, lr 30.00
[Epoch 243] test loss 4.35, test ppl 77.24
[Epoch 244 Batch 200/372] current loss 4.14, ppl 62.83, throughput 619.09 samples/s, lr 30.43
[Epoch 244] throughput 42184.81 samples/s
[Epoch 244] time cost 52.39s, valid loss 4.40, valid ppl 81.33, lr 30.00
[Epoch 244] test loss 4.35, test ppl 77.23
[Epoch 245 Batch 200/372] current loss 4.14, ppl 62.94, throughput 616.38 samples/s, lr 25.71
[Epoch 245] throughput 43038.20 samples/s
[Epoch 245] time cost 51.39s, valid loss 4.40, valid ppl 81.32, lr 30.00
[Epoch 245] test loss 4.35, test ppl 77.22
[Epoch 246 Batch 200/372] current loss 4.15, ppl 63.58, throughput 637.73 samples/s, lr 27.86
[Epoch 246] throughput 42256.89 samples/s
[Epoch 246] time cost 52.38s, valid loss 4.40, valid ppl 81.31, lr 30.00
[Epoch 246] test loss 4.35, test ppl 77.21
[Epoch 247 Batch 200/372] current loss 4.14, ppl 63.06, throughput 635.90 samples/s, lr 24.43
[Epoch 247] throughput 43409.57 samples/s
[Epoch 247] time cost 51.09s, valid loss 4.40, valid ppl 81.29, lr 30.00
[Epoch 247] test loss 4.35, test ppl 77.20
[Epoch 248 Batch 200/372] current loss 4.15, ppl 63.65, throughput 628.21 samples/s, lr 26.14
[Epoch 248] throughput 42922.58 samples/s
[Epoch 248] time cost 51.50s, valid loss 4.40, valid ppl 81.28, lr 30.00
[Epoch 248] test loss 4.35, test ppl 77.19
[Epoch 249 Batch 200/372] current loss 4.13, ppl 62.36, throughput 627.19 samples/s, lr 31.29
[Epoch 249] throughput 42362.21 samples/s
[Epoch 249] time cost 52.20s, valid loss 4.40, valid ppl 81.27, lr 30.00
[Epoch 249] test loss 4.35, test ppl 77.18