Use AWDRNN
[Epoch 0 Batch 200/372] loss 8.04, ppl 3112.62, throughput 153.27 samples/s, lr 31.71
[Epoch 0] throughput 8419.28 samples/s
[Epoch 0] time cost 298.99s, valid loss 6.60, valid ppl 732.92
test loss 6.53, test ppl 683.43
[Epoch 1 Batch 200/372] loss 6.73, ppl 837.89, throughput 85.25 samples/s, lr 30.43
[Epoch 1] throughput 5613.80 samples/s
[Epoch 1] time cost 412.58s, valid loss 6.04, valid ppl 418.85
test loss 5.96, test ppl 388.21
[Epoch 2 Batch 200/372] loss 6.39, ppl 596.13, throughput 59.23 samples/s, lr 30.00
[Epoch 2] throughput 4302.44 samples/s
[Epoch 2] time cost 526.18s, valid loss 5.83, valid ppl 339.24
test loss 5.75, test ppl 314.06
[Epoch 3 Batch 200/372] loss 6.17, ppl 475.86, throughput 61.51 samples/s, lr 27.43
[Epoch 3] throughput 4421.76 samples/s
[Epoch 3] time cost 512.67s, valid loss 5.68, valid ppl 292.97
test loss 5.61, test ppl 271.84
[Epoch 4 Batch 200/372] loss 6.00, ppl 402.93, throughput 77.74 samples/s, lr 31.29
[Epoch 4] throughput 4767.67 samples/s
[Epoch 4] time cost 484.53s, valid loss 5.50, valid ppl 243.73
test loss 5.42, test ppl 224.87
[Epoch 5 Batch 200/372] loss 5.86, ppl 349.62, throughput 86.82 samples/s, lr 26.14
[Epoch 5] throughput 4825.81 samples/s
[Epoch 5] time cost 476.47s, valid loss 5.40, valid ppl 221.21
test loss 5.32, test ppl 204.58
[Epoch 6 Batch 200/372] loss 5.74, ppl 311.02, throughput 73.88 samples/s, lr 29.57
[Epoch 6] throughput 4739.17 samples/s
[Epoch 6] time cost 492.25s, valid loss 5.22, valid ppl 185.74
test loss 5.15, test ppl 172.54
[Epoch 7 Batch 200/372] loss 5.64, ppl 281.28, throughput 81.85 samples/s, lr 32.57
[Epoch 7] throughput 5433.96 samples/s
[Epoch 7] time cost 425.26s, valid loss 5.17, valid ppl 175.25
test loss 5.09, test ppl 163.05
[Epoch 8 Batch 200/372] loss 5.56, ppl 260.18, throughput 64.80 samples/s, lr 31.29
[Epoch 8] throughput 4797.27 samples/s
[Epoch 8] time cost 476.19s, valid loss 5.10, valid ppl 163.21
test loss 5.02, test ppl 151.71
[Epoch 9 Batch 200/372] loss 5.48, ppl 240.37, throughput 70.05 samples/s, lr 31.29
[Epoch 9] throughput 5344.21 samples/s
[Epoch 9] time cost 431.60s, valid loss 5.04, valid ppl 154.77
test loss 4.97, test ppl 143.94
[Epoch 10 Batch 200/372] loss 5.42, ppl 225.51, throughput 66.21 samples/s, lr 33.00
[Epoch 10] throughput 5143.70 samples/s
[Epoch 10] time cost 446.68s, valid loss 5.00, valid ppl 148.01
test loss 4.93, test ppl 138.28
[Epoch 11 Batch 200/372] loss 5.35, ppl 211.61, throughput 67.46 samples/s, lr 28.71
[Epoch 11] throughput 4808.33 samples/s
[Epoch 11] time cost 475.43s, valid loss 4.95, valid ppl 141.16
test loss 4.88, test ppl 131.30
[Epoch 12 Batch 200/372] loss 5.30, ppl 200.43, throughput 63.22 samples/s, lr 33.43
[Epoch 12] throughput 5056.17 samples/s
[Epoch 12] time cost 453.93s, valid loss 4.91, valid ppl 135.94
test loss 4.84, test ppl 126.85
[Epoch 13 Batch 200/372] loss 5.26, ppl 191.56, throughput 81.42 samples/s, lr 29.14
[Epoch 13] throughput 5788.55 samples/s
[Epoch 13] time cost 401.47s, valid loss 4.86, valid ppl 128.65
test loss 4.79, test ppl 119.78
[Epoch 14 Batch 200/372] loss 5.20, ppl 181.61, throughput 71.99 samples/s, lr 33.00
[Epoch 14] throughput 5515.01 samples/s
[Epoch 14] time cost 419.01s, valid loss 4.82, valid ppl 123.64
test loss 4.75, test ppl 115.04
[Epoch 15 Batch 200/372] loss 5.16, ppl 174.99, throughput 70.30 samples/s, lr 30.00
[Epoch 15] throughput 5265.00 samples/s
[Epoch 15] time cost 437.26s, valid loss 4.81, valid ppl 122.64
test loss 4.74, test ppl 114.18
[Epoch 16 Batch 200/372] loss 5.13, ppl 168.45, throughput 72.20 samples/s, lr 27.43
[Epoch 16] throughput 5409.06 samples/s
[Epoch 16] time cost 427.61s, valid loss 4.78, valid ppl 118.66
test loss 4.71, test ppl 110.74
[Epoch 17 Batch 200/372] loss 5.09, ppl 161.84, throughput 89.77 samples/s, lr 30.86
[Epoch 17] throughput 5673.45 samples/s
[Epoch 17] time cost 412.51s, valid loss 4.74, valid ppl 114.64
test loss 4.67, test ppl 106.71
[Epoch 18 Batch 200/372] loss 5.05, ppl 156.53, throughput 75.14 samples/s, lr 30.00
[Epoch 18] throughput 5241.95 samples/s
[Epoch 18] time cost 448.43s, valid loss 4.71, valid ppl 111.27
test loss 4.65, test ppl 104.15
[Epoch 19 Batch 200/372] loss 5.02, ppl 151.86, throughput 89.22 samples/s, lr 27.86
[Epoch 19] throughput 5988.14 samples/s
[Epoch 19] time cost 392.94s, valid loss 4.70, valid ppl 110.29
test loss 4.63, test ppl 102.68
[Epoch 20 Batch 200/372] loss 5.00, ppl 148.15, throughput 79.62 samples/s, lr 31.29
[Epoch 20] throughput 5841.11 samples/s
[Epoch 20] time cost 405.78s, valid loss 4.70, valid ppl 110.22
test loss 4.63, test ppl 102.97
[Epoch 21 Batch 200/372] loss 4.97, ppl 143.91, throughput 77.89 samples/s, lr 30.43
[Epoch 21] throughput 5211.63 samples/s
[Epoch 21] time cost 447.32s, valid loss 4.67, valid ppl 106.31
test loss 4.60, test ppl 99.30
[Epoch 22 Batch 200/372] loss 4.94, ppl 140.13, throughput 95.16 samples/s, lr 27.86
[Epoch 22] throughput 5961.35 samples/s
[Epoch 22] time cost 397.29s, valid loss 4.65, valid ppl 105.04
test loss 4.58, test ppl 97.96
[Epoch 23 Batch 200/372] loss 4.92, ppl 136.90, throughput 92.48 samples/s, lr 32.57
[Epoch 23] throughput 5736.43 samples/s
[Epoch 23] time cost 404.65s, valid loss 4.64, valid ppl 103.14
test loss 4.57, test ppl 96.50
[Epoch 24 Batch 200/372] loss 4.89, ppl 132.87, throughput 75.87 samples/s, lr 31.29
[Epoch 24] throughput 5161.64 samples/s
[Epoch 24] time cost 445.62s, valid loss 4.63, valid ppl 102.41
test loss 4.56, test ppl 96.06
[Epoch 25 Batch 200/372] loss 4.87, ppl 130.51, throughput 72.86 samples/s, lr 31.71
[Epoch 25] throughput 5360.39 samples/s
[Epoch 25] time cost 430.37s, valid loss 4.63, valid ppl 102.24
test loss 4.56, test ppl 95.82
[Epoch 26 Batch 200/372] loss 4.85, ppl 127.93, throughput 62.83 samples/s, lr 29.14
[Epoch 26] throughput 4864.95 samples/s
[Epoch 26] time cost 470.24s, valid loss 4.60, valid ppl 99.81
test loss 4.54, test ppl 93.63
[Epoch 27 Batch 200/372] loss 4.83, ppl 125.79, throughput 81.17 samples/s, lr 24.43
[Epoch 27] throughput 5831.03 samples/s
[Epoch 27] time cost 399.08s, valid loss 4.59, valid ppl 98.19
test loss 4.52, test ppl 92.13
[Epoch 28 Batch 200/372] loss 4.81, ppl 122.52, throughput 78.78 samples/s, lr 30.00
[Epoch 28] throughput 5620.19 samples/s
[Epoch 28] time cost 412.70s, valid loss 4.57, valid ppl 96.40
test loss 4.50, test ppl 90.29
[Epoch 29 Batch 200/372] loss 4.79, ppl 120.44, throughput 92.35 samples/s, lr 31.29
[Epoch 29] throughput 5646.46 samples/s
[Epoch 29] time cost 410.41s, valid loss 4.57, valid ppl 96.55
[Epoch 30 Batch 200/372] loss 4.77, ppl 118.04, throughput 79.95 samples/s, lr 29.57
[Epoch 30] throughput 5658.86 samples/s
[Epoch 30] time cost 411.86s, valid loss 4.56, valid ppl 95.37
test loss 4.50, test ppl 89.63
[Epoch 31 Batch 200/372] loss 4.75, ppl 116.07, throughput 76.83 samples/s, lr 30.00
[Epoch 31] throughput 5614.30 samples/s
[Epoch 31] time cost 422.68s, valid loss 4.55, valid ppl 94.58
test loss 4.48, test ppl 88.58
[Epoch 32 Batch 200/372] loss 4.74, ppl 114.67, throughput 91.46 samples/s, lr 16.29
[Epoch 32] throughput 5622.08 samples/s
[Epoch 32] time cost 415.38s, valid loss 4.55, valid ppl 94.47
test loss 4.49, test ppl 88.74
[Epoch 33 Batch 200/372] loss 4.72, ppl 112.68, throughput 83.98 samples/s, lr 33.86
[Epoch 33] throughput 5329.00 samples/s
[Epoch 33] time cost 432.42s, valid loss 4.54, valid ppl 94.08
test loss 4.48, test ppl 88.43
[Epoch 34 Batch 200/372] loss 4.71, ppl 110.78, throughput 83.68 samples/s, lr 28.71
[Epoch 34] throughput 5558.50 samples/s
[Epoch 34] time cost 416.56s, valid loss 4.53, valid ppl 92.31
test loss 4.47, test ppl 86.98
[Epoch 35 Batch 200/372] loss 4.69, ppl 109.27, throughput 78.03 samples/s, lr 29.57
[Epoch 35] throughput 5792.32 samples/s
[Epoch 35] time cost 401.24s, valid loss 4.52, valid ppl 91.45
test loss 4.45, test ppl 85.65
[Epoch 36 Batch 200/372] loss 4.68, ppl 108.01, throughput 89.02 samples/s, lr 31.29
[Epoch 36] throughput 5665.38 samples/s
[Epoch 36] time cost 413.99s, valid loss 4.52, valid ppl 91.52
[Epoch 37 Batch 200/372] loss 4.67, ppl 106.62, throughput 87.94 samples/s, lr 16.71
[Epoch 37] throughput 5577.88 samples/s
[Epoch 37] time cost 415.35s, valid loss 4.52, valid ppl 91.52
[Epoch 38 Batch 200/372] loss 4.66, ppl 105.35, throughput 79.05 samples/s, lr 28.71
[Epoch 38] throughput 5532.16 samples/s
[Epoch 38] time cost 418.46s, valid loss 4.51, valid ppl 91.06
test loss 4.45, test ppl 85.69
[Epoch 39 Batch 200/372] loss 4.64, ppl 103.62, throughput 78.71 samples/s, lr 29.57
[Epoch 39] throughput 5622.57 samples/s
[Epoch 39] time cost 411.91s, valid loss 4.49, valid ppl 89.28
test loss 4.43, test ppl 83.91
[Epoch 40 Batch 200/372] loss 4.63, ppl 102.51, throughput 81.60 samples/s, lr 33.00
[Epoch 40] throughput 5769.69 samples/s
[Epoch 40] time cost 412.29s, valid loss 4.51, valid ppl 90.87
[Epoch 41 Batch 200/372] loss 4.62, ppl 101.06, throughput 86.87 samples/s, lr 29.57
[Epoch 41] throughput 5574.90 samples/s
[Epoch 41] time cost 418.06s, valid loss 4.48, valid ppl 88.40
test loss 4.42, test ppl 83.06
[Epoch 42 Batch 200/372] loss 4.61, ppl 100.41, throughput 91.43 samples/s, lr 28.71
[Epoch 42] throughput 5360.11 samples/s
[Epoch 42] time cost 439.17s, valid loss 4.50, valid ppl 90.01
[Epoch 43 Batch 200/372] loss 4.60, ppl 99.60, throughput 79.32 samples/s, lr 32.14
[Epoch 43] throughput 5725.24 samples/s
[Epoch 43] time cost 405.76s, valid loss 4.48, valid ppl 87.94
test loss 4.41, test ppl 82.64
[Epoch 44 Batch 200/372] loss 4.59, ppl 98.39, throughput 95.72 samples/s, lr 14.14
[Epoch 44] throughput 6551.73 samples/s
[Epoch 44] time cost 359.48s, valid loss 4.47, valid ppl 87.54
test loss 4.41, test ppl 82.31
[Epoch 45 Batch 200/372] loss 4.58, ppl 97.16, throughput 76.81 samples/s, lr 15.43
[Epoch 45] throughput 5657.17 samples/s
[Epoch 45] time cost 409.67s, valid loss 4.47, valid ppl 87.68
[Epoch 46 Batch 200/372] loss 4.57, ppl 96.59, throughput 80.00 samples/s, lr 27.00
[Epoch 46] throughput 5892.77 samples/s
[Epoch 46] time cost 395.17s, valid loss 4.47, valid ppl 87.49
test loss 4.41, test ppl 82.08
[Epoch 47 Batch 200/372] loss 4.56, ppl 95.42, throughput 84.17 samples/s, lr 30.00
[Epoch 47] throughput 6052.09 samples/s
[Epoch 47] time cost 385.70s, valid loss 4.48, valid ppl 88.46
[Epoch 48 Batch 200/372] loss 4.55, ppl 94.17, throughput 76.91 samples/s, lr 15.00
[Epoch 48] throughput 5616.49 samples/s
[Epoch 48] time cost 413.95s, valid loss 4.45, valid ppl 85.82
test loss 4.39, test ppl 80.66
[Epoch 49 Batch 200/372] loss 4.54, ppl 93.46, throughput 95.74 samples/s, lr 30.43
[Epoch 49] throughput 6467.52 samples/s
[Epoch 49] time cost 367.83s, valid loss 4.46, valid ppl 86.15
[Epoch 50 Batch 200/372] loss 4.53, ppl 92.87, throughput 89.88 samples/s, lr 30.00
[Epoch 50] throughput 6234.43 samples/s
[Epoch 50] time cost 381.59s, valid loss 4.45, valid ppl 85.90
[Epoch 51 Batch 200/372] loss 4.51, ppl 91.37, throughput 88.83 samples/s, lr 27.43
[Epoch 51] throughput 6108.12 samples/s
[Epoch 51] time cost 389.83s, valid loss 4.46, valid ppl 86.53
[Epoch 52 Batch 200/372] loss 4.51, ppl 90.94, throughput 90.79 samples/s, lr 28.29
[Epoch 52] throughput 6193.85 samples/s
[Epoch 52] time cost 380.84s, valid loss 4.44, valid ppl 84.49
test loss 4.38, test ppl 79.50
[Epoch 53 Batch 200/372] loss 4.50, ppl 89.95, throughput 90.13 samples/s, lr 29.14
[Epoch 53] throughput 5612.41 samples/s
[Epoch 53] time cost 413.10s, valid loss 4.44, valid ppl 84.38
test loss 4.37, test ppl 79.26
[Epoch 54 Batch 200/372] loss 4.50, ppl 89.71, throughput 93.45 samples/s, lr 16.71
[Epoch 54] throughput 5741.49 samples/s
[Epoch 54] time cost 404.57s, valid loss 4.45, valid ppl 85.26
[Epoch 55 Batch 200/372] loss 4.49, ppl 89.04, throughput 94.19 samples/s, lr 30.43
[Epoch 55] throughput 5629.13 samples/s
[Epoch 55] time cost 417.36s, valid loss 4.45, valid ppl 85.62
[Epoch 56 Batch 200/372] loss 4.48, ppl 88.60, throughput 72.98 samples/s, lr 30.00
[Epoch 56] throughput 5586.15 samples/s
[Epoch 56] time cost 414.66s, valid loss 4.45, valid ppl 85.30
[Epoch 57 Batch 200/372] loss 4.48, ppl 87.84, throughput 61.51 samples/s, lr 33.43
[Epoch 57] throughput 4462.06 samples/s
[Epoch 57] time cost 509.27s, valid loss 4.44, valid ppl 84.78
[Epoch 58 Batch 200/372] loss 4.47, ppl 87.00, throughput 90.50 samples/s, lr 30.00
[Epoch 58] throughput 5187.15 samples/s
[Epoch 58] time cost 443.64s, valid loss 4.43, valid ppl 84.08
test loss 4.37, test ppl 79.01
[Epoch 59 Batch 200/372] loss 4.46, ppl 86.74, throughput 92.15 samples/s, lr 27.86
[Epoch 59] throughput 5703.33 samples/s
[Epoch 59] time cost 407.28s, valid loss 4.43, valid ppl 83.57
test loss 4.37, test ppl 78.68
[Epoch 60 Batch 200/372] loss 4.45, ppl 85.65, throughput 77.47 samples/s, lr 33.43
[Epoch 60] throughput 5800.89 samples/s
[Epoch 60] time cost 401.28s, valid loss 4.44, valid ppl 84.76
[Epoch 61 Batch 200/372] loss 4.45, ppl 85.41, throughput 74.78 samples/s, lr 30.43
[Epoch 61] throughput 4975.44 samples/s
[Epoch 61] time cost 460.83s, valid loss 4.42, valid ppl 82.86
test loss 4.36, test ppl 77.93
[Epoch 62 Batch 200/372] loss 4.44, ppl 84.72, throughput 86.99 samples/s, lr 30.00
[Epoch 62] throughput 5794.57 samples/s
[Epoch 62] time cost 400.93s, valid loss 4.43, valid ppl 84.17
[Epoch 63 Batch 200/372] loss 4.43, ppl 84.24, throughput 87.69 samples/s, lr 28.71
[Epoch 63] throughput 5704.03 samples/s
[Epoch 63] time cost 406.97s, valid loss 4.41, valid ppl 82.35
test loss 4.35, test ppl 77.55
[Epoch 64 Batch 200/372] loss 4.43, ppl 83.87, throughput 77.79 samples/s, lr 32.14
[Epoch 64] throughput 5763.02 samples/s
[Epoch 64] time cost 403.43s, valid loss 4.42, valid ppl 83.31
[Epoch 65 Batch 200/372] loss 4.42, ppl 83.11, throughput 82.29 samples/s, lr 32.57
[Epoch 65] throughput 6030.77 samples/s
[Epoch 65] time cost 386.85s, valid loss 4.41, valid ppl 82.20
test loss 4.35, test ppl 77.55
[Epoch 66 Batch 200/372] loss 4.41, ppl 82.29, throughput 82.64 samples/s, lr 27.86
[Epoch 66] throughput 5845.20 samples/s
[Epoch 66] time cost 397.85s, valid loss 4.41, valid ppl 81.97
test loss 4.35, test ppl 77.43
[Epoch 67 Batch 200/372] loss 4.41, ppl 82.25, throughput 81.46 samples/s, lr 13.71
[Epoch 67] throughput 5796.99 samples/s
[Epoch 67] time cost 403.67s, valid loss 4.42, valid ppl 83.22
[Epoch 68 Batch 200/372] loss 4.40, ppl 81.63, throughput 81.68 samples/s, lr 32.14
[Epoch 68] throughput 5648.02 samples/s
[Epoch 68] time cost 418.98s, valid loss 4.40, valid ppl 81.27
test loss 4.34, test ppl 76.89
[Epoch 69 Batch 200/372] loss 4.40, ppl 81.37, throughput 91.03 samples/s, lr 27.86
[Epoch 69] throughput 5681.12 samples/s
[Epoch 69] time cost 408.24s, valid loss 4.41, valid ppl 82.37
[Epoch 70 Batch 200/372] loss 4.39, ppl 81.01, throughput 88.56 samples/s, lr 27.43
[Epoch 70] throughput 5514.88 samples/s
[Epoch 70] time cost 419.17s, valid loss 4.40, valid ppl 81.54
[Epoch 71 Batch 200/372] loss 4.39, ppl 80.34, throughput 91.92 samples/s, lr 30.43
[Epoch 71] throughput 5346.24 samples/s
[Epoch 71] time cost 431.36s, valid loss 4.40, valid ppl 81.15
test loss 4.34, test ppl 76.75
[Epoch 72 Batch 200/372] loss 4.38, ppl 80.08, throughput 78.51 samples/s, lr 27.86
[Epoch 72] throughput 5203.89 samples/s
[Epoch 72] time cost 455.09s, valid loss 4.40, valid ppl 81.83
[Epoch 73 Batch 200/372] loss 4.38, ppl 79.96, throughput 71.85 samples/s, lr 29.57
[Epoch 73] throughput 4999.45 samples/s
[Epoch 73] time cost 458.59s, valid loss 4.40, valid ppl 81.42
[Epoch 74 Batch 200/372] loss 4.37, ppl 78.97, throughput 67.30 samples/s, lr 30.00
[Epoch 74] throughput 5051.99 samples/s
[Epoch 74] time cost 454.11s, valid loss 4.42, valid ppl 82.79
[Epoch 75 Batch 200/372] loss 4.37, ppl 79.07, throughput 79.53 samples/s, lr 29.14
[Epoch 75] throughput 5311.61 samples/s
[Epoch 75] time cost 433.78s, valid loss 4.39, valid ppl 80.57
test loss 4.33, test ppl 76.22
[Epoch 76 Batch 200/372] loss 4.37, ppl 78.76, throughput 69.62 samples/s, lr 27.00
[Epoch 76] throughput 4949.10 samples/s
[Epoch 76] time cost 462.58s, valid loss 4.42, valid ppl 83.07
[Epoch 77 Batch 200/372] loss 4.36, ppl 77.89, throughput 87.12 samples/s, lr 29.57
[Epoch 77] throughput 5550.72 samples/s
[Epoch 77] time cost 417.03s, valid loss 4.40, valid ppl 81.39
[Epoch 78 Batch 200/372] loss 4.36, ppl 77.90, throughput 83.19 samples/s, lr 29.57
[Epoch 78] throughput 5602.86 samples/s
[Epoch 78] time cost 413.35s, valid loss 4.40, valid ppl 81.61
[Epoch 79 Batch 200/372] loss 4.35, ppl 77.47, throughput 89.16 samples/s, lr 29.57
[Epoch 79] throughput 5787.58 samples/s
[Epoch 79] time cost 401.56s, valid loss 4.39, valid ppl 80.60
[Epoch 80 Batch 200/372] loss 4.34, ppl 76.78, throughput 98.89 samples/s, lr 27.86
[Epoch 80] throughput 5988.18 samples/s
[Epoch 80] time cost 389.25s, valid loss 4.39, valid ppl 80.73
[Epoch 81 Batch 200/372] loss 4.34, ppl 77.08, throughput 89.52 samples/s, lr 32.57
[Epoch 81] throughput 5696.97 samples/s
[Epoch 81] time cost 407.19s, valid loss 4.39, valid ppl 80.36
test loss 4.33, test ppl 75.85
[Epoch 82 Batch 200/372] loss 4.34, ppl 76.54, throughput 83.03 samples/s, lr 29.14
[Epoch 82] throughput 5888.09 samples/s
[Epoch 82] time cost 395.61s, valid loss 4.40, valid ppl 81.62
[Epoch 83 Batch 200/372] loss 4.33, ppl 76.13, throughput 80.76 samples/s, lr 27.00
[Epoch 83] throughput 5915.38 samples/s
[Epoch 83] time cost 394.02s, valid loss 4.37, valid ppl 79.35
test loss 4.32, test ppl 74.92
[Epoch 84 Batch 200/372] loss 4.33, ppl 75.74, throughput 80.89 samples/s, lr 31.29
[Epoch 84] throughput 5891.77 samples/s
[Epoch 84] time cost 395.01s, valid loss 4.39, valid ppl 80.50
[Epoch 85 Batch 200/372] loss 4.33, ppl 75.67, throughput 96.45 samples/s, lr 13.29
[Epoch 85] throughput 5941.48 samples/s
[Epoch 85] time cost 392.45s, valid loss 4.39, valid ppl 80.94
[Epoch 86 Batch 200/372] loss 4.32, ppl 75.25, throughput 89.05 samples/s, lr 31.29
[Epoch 86] throughput 5743.64 samples/s
[Epoch 86] time cost 404.17s, valid loss 4.38, valid ppl 80.18
[Epoch 87 Batch 200/372] loss 4.32, ppl 75.07, throughput 93.17 samples/s, lr 30.86
[Epoch 87] throughput 5781.79 samples/s
[Epoch 87] time cost 401.58s, valid loss 4.37, valid ppl 79.06
test loss 4.32, test ppl 74.92
[Epoch 88 Batch 200/372] loss 4.31, ppl 74.79, throughput 76.10 samples/s, lr 33.43
[Epoch 88] throughput 5528.55 samples/s
[Epoch 88] time cost 418.76s, valid loss 4.37, valid ppl 79.37
[Epoch 89 Batch 200/372] loss 4.31, ppl 74.25, throughput 79.94 samples/s, lr 27.43
[Epoch 89] throughput 5716.12 samples/s
[Epoch 89] time cost 406.92s, valid loss 4.37, valid ppl 79.20
[Epoch 90 Batch 200/372] loss 4.30, ppl 74.02, throughput 79.60 samples/s, lr 30.43
[Epoch 90] throughput 5699.10 samples/s
[Epoch 90] time cost 407.07s, valid loss 4.39, valid ppl 80.47
[Epoch 91 Batch 200/372] loss 4.31, ppl 74.13, throughput 77.26 samples/s, lr 29.57
[Epoch 91] throughput 5663.00 samples/s
[Epoch 91] time cost 409.54s, valid loss 4.37, valid ppl 79.31
[Epoch 92 Batch 200/372] loss 4.30, ppl 73.95, throughput 79.18 samples/s, lr 30.00
[Epoch 92] throughput 5688.22 samples/s
[Epoch 92] time cost 407.91s, valid loss 4.37, valid ppl 79.29
[Epoch 93 Batch 200/372] loss 4.30, ppl 73.74, throughput 80.28 samples/s, lr 26.57
[Epoch 93] throughput 5821.12 samples/s
[Epoch 93] time cost 399.49s, valid loss 4.39, valid ppl 80.41
[Epoch 94 Batch 200/372] loss 4.29, ppl 73.02, throughput 80.60 samples/s, lr 29.57
[Epoch 94] throughput 5905.77 samples/s
[Epoch 94] time cost 394.46s, valid loss 4.37, valid ppl 78.73
test loss 4.31, test ppl 74.46
[Epoch 95 Batch 200/372] loss 4.29, ppl 73.05, throughput 75.44 samples/s, lr 28.71
[Epoch 95] throughput 5629.56 samples/s
[Epoch 95] time cost 411.79s, valid loss 4.37, valid ppl 79.04
[Epoch 96 Batch 200/372] loss 4.29, ppl 72.64, throughput 82.48 samples/s, lr 26.14
[Epoch 96] throughput 5949.22 samples/s
[Epoch 96] time cost 391.64s, valid loss 4.36, valid ppl 78.55
test loss 4.31, test ppl 74.54
[Epoch 97 Batch 200/372] loss 4.28, ppl 72.59, throughput 86.57 samples/s, lr 27.43
[Epoch 97] throughput 6116.07 samples/s
[Epoch 97] time cost 382.04s, valid loss 4.37, valid ppl 78.92
[Epoch 98 Batch 200/372] loss 4.27, ppl 71.70, throughput 76.87 samples/s, lr 33.00
[Epoch 98] throughput 5633.48 samples/s
[Epoch 98] time cost 412.27s, valid loss 4.37, valid ppl 79.15
[Epoch 99 Batch 200/372] loss 4.28, ppl 72.00, throughput 79.71 samples/s, lr 30.86
[Epoch 99] throughput 5693.13 samples/s
[Epoch 99] time cost 411.92s, valid loss 4.37, valid ppl 79.30
[Epoch 100 Batch 200/372] loss 4.28, ppl 71.94, throughput 82.17 samples/s, lr 32.57
[Epoch 100] throughput 5750.52 samples/s
[Epoch 100] time cost 409.74s, valid loss 4.37, valid ppl 78.72
[Epoch 101 Batch 200/372] loss 4.27, ppl 71.60, throughput 91.28 samples/s, lr 31.71
[Epoch 101] throughput 6225.84 samples/s
[Epoch 101] time cost 381.86s, valid loss 4.38, valid ppl 79.58
[Epoch 102 Batch 200/372] loss 4.27, ppl 71.25, throughput 89.27 samples/s, lr 33.43
[Epoch 102] throughput 6030.18 samples/s
[Epoch 102] time cost 396.07s, valid loss 4.37, valid ppl 79.07
[Epoch 103 Batch 200/372] loss 4.26, ppl 71.11, throughput 89.21 samples/s, lr 27.00
[Epoch 103] throughput 6149.38 samples/s
[Epoch 103] time cost 387.71s, valid loss 4.38, valid ppl 79.83
[Epoch 104 Batch 200/372] loss 4.26, ppl 70.72, throughput 93.07 samples/s, lr 31.29
[Epoch 104] throughput 6177.50 samples/s
[Epoch 104] time cost 388.00s, valid loss 4.37, valid ppl 79.08
[Epoch 105 Batch 200/372] loss 4.26, ppl 70.80, throughput 93.83 samples/s, lr 28.71
[Epoch 105] throughput 5959.67 samples/s
[Epoch 105] time cost 394.12s, valid loss 4.37, valid ppl 78.96
[Epoch 106 Batch 200/372] loss 4.25, ppl 70.23, throughput 92.91 samples/s, lr 32.57
[Epoch 106] throughput 5745.21 samples/s
[Epoch 106] time cost 404.36s, valid loss 4.37, valid ppl 78.96
[Epoch 107 Batch 200/372] loss 4.25, ppl 70.20, throughput 96.27 samples/s, lr 30.00
[Epoch 107] throughput 6311.56 samples/s
[Epoch 107] time cost 377.87s, valid loss 4.37, valid ppl 78.66
[Epoch 108 Batch 200/372] loss 4.25, ppl 70.31, throughput 93.22 samples/s, lr 28.71
[Epoch 108] throughput 6198.75 samples/s
[Epoch 108] time cost 387.79s, valid loss 4.36, valid ppl 78.50
test loss 4.31, test ppl 74.46
[Epoch 109 Batch 200/372] loss 4.25, ppl 70.12, throughput 91.55 samples/s, lr 28.71
[Epoch 109] throughput 5737.88 samples/s
[Epoch 109] time cost 404.98s, valid loss 4.36, valid ppl 78.15
test loss 4.31, test ppl 74.46
[Epoch 110 Batch 200/372] loss 4.24, ppl 69.53, throughput 89.30 samples/s, lr 30.00
[Epoch 110] throughput 5904.05 samples/s
[Epoch 110] time cost 394.39s, valid loss 4.36, valid ppl 77.97
test loss 4.31, test ppl 74.17
[Epoch 111 Batch 200/372] loss 4.25, ppl 69.81, throughput 82.86 samples/s, lr 30.43
[Epoch 111] throughput 6049.88 samples/s
[Epoch 111] time cost 385.73s, valid loss 4.35, valid ppl 77.17
test loss 4.29, test ppl 73.27
[Epoch 112 Batch 200/372] loss 4.24, ppl 69.10, throughput 80.32 samples/s, lr 31.29
[Epoch 112] throughput 5812.69 samples/s
[Epoch 112] time cost 399.90s, valid loss 4.35, valid ppl 77.81
[Epoch 113 Batch 200/372] loss 4.23, ppl 69.01, throughput 80.09 samples/s, lr 28.29
[Epoch 113] throughput 5806.72 samples/s
[Epoch 113] time cost 400.23s, valid loss 4.35, valid ppl 77.75
[Epoch 114 Batch 200/372] loss 4.24, ppl 69.09, throughput 79.48 samples/s, lr 27.86
[Epoch 114] throughput 5752.34 samples/s
[Epoch 114] time cost 403.96s, valid loss 4.36, valid ppl 78.40
[Epoch 115 Batch 200/372] loss 4.23, ppl 68.94, throughput 86.84 samples/s, lr 31.29
[Epoch 115] throughput 6169.92 samples/s
[Epoch 115] time cost 379.29s, valid loss 4.35, valid ppl 77.67
[Epoch 116 Batch 200/372] loss 4.23, ppl 68.67, throughput 84.91 samples/s, lr 26.14
[Epoch 116] throughput 5732.43 samples/s
[Epoch 116] time cost 405.37s, valid loss 4.36, valid ppl 78.53
[Epoch 117 Batch 200/372] loss 4.23, ppl 68.70, throughput 83.87 samples/s, lr 31.71
[Epoch 117] throughput 5688.65 samples/s
[Epoch 117] time cost 408.18s, valid loss 4.35, valid ppl 77.70
[Epoch 118 Batch 200/372] loss 4.22, ppl 68.33, throughput 85.66 samples/s, lr 33.00
[Epoch 118] throughput 5747.43 samples/s
[Epoch 118] time cost 404.04s, valid loss 4.34, valid ppl 77.06
test loss 4.29, test ppl 73.08
[Epoch 119 Batch 200/372] loss 4.22, ppl 67.92, throughput 77.61 samples/s, lr 24.86
[Epoch 119] throughput 5622.65 samples/s
[Epoch 119] time cost 412.32s, valid loss 4.34, valid ppl 77.03
test loss 4.29, test ppl 72.90
[Epoch 120 Batch 200/372] loss 4.22, ppl 67.94, throughput 89.38 samples/s, lr 30.43
[Epoch 120] throughput 6104.96 samples/s
[Epoch 120] time cost 392.11s, valid loss 4.36, valid ppl 77.96
[Epoch 121 Batch 200/372] loss 4.22, ppl 67.96, throughput 93.90 samples/s, lr 28.71
[Epoch 121] throughput 6308.22 samples/s
[Epoch 121] time cost 381.74s, valid loss 4.36, valid ppl 78.16
[Epoch 122 Batch 200/372] loss 4.22, ppl 67.83, throughput 91.57 samples/s, lr 29.57
[Epoch 122] throughput 6180.94 samples/s
[Epoch 122] time cost 384.82s, valid loss 4.34, valid ppl 76.83
test loss 4.29, test ppl 72.92
[Epoch 123 Batch 200/372] loss 4.22, ppl 67.72, throughput 90.41 samples/s, lr 30.86
[Epoch 123] throughput 5624.98 samples/s
[Epoch 123] time cost 412.33s, valid loss 4.37, valid ppl 78.95
[Epoch 124 Batch 200/372] loss 4.21, ppl 67.36, throughput 91.88 samples/s, lr 31.71
[Epoch 124] throughput 5703.81 samples/s
[Epoch 124] time cost 407.19s, valid loss 4.36, valid ppl 78.16
[Epoch 125 Batch 200/372] loss 4.21, ppl 67.16, throughput 92.75 samples/s, lr 30.43
[Epoch 125] throughput 5634.45 samples/s
[Epoch 125] time cost 411.02s, valid loss 4.35, valid ppl 77.55
[Epoch 126 Batch 200/372] loss 4.21, ppl 67.21, throughput 79.95 samples/s, lr 32.14
[Epoch 126] throughput 5406.29 samples/s
[Epoch 126] time cost 427.13s, valid loss 4.36, valid ppl 78.08
[Epoch 127 Batch 200/372] loss 4.20, ppl 66.62, throughput 76.70 samples/s, lr 30.86
[Epoch 127] throughput 5397.91 samples/s
[Epoch 127] time cost 427.32s, valid loss 4.35, valid ppl 77.60
[Epoch 128 Batch 200/372] loss 4.20, ppl 66.93, throughput 76.16 samples/s, lr 30.43
[Epoch 128] throughput 5619.32 samples/s
[Epoch 128] time cost 419.56s, valid loss 4.34, valid ppl 76.67
test loss 4.29, test ppl 73.20
[Epoch 129 Batch 200/372] loss 4.20, ppl 66.50, throughput 92.24 samples/s, lr 28.71
[Epoch 129] throughput 5854.98 samples/s
[Epoch 129] time cost 397.70s, valid loss 4.34, valid ppl 76.95
[Epoch 130 Batch 200/372] loss 4.20, ppl 66.53, throughput 93.34 samples/s, lr 30.86
[Epoch 130] throughput 5811.65 samples/s
[Epoch 130] time cost 399.86s, valid loss 4.35, valid ppl 77.67
[Epoch 131 Batch 200/372] loss 4.19, ppl 65.94, throughput 91.39 samples/s, lr 30.00
[Epoch 131] throughput 5670.40 samples/s
[Epoch 131] time cost 408.85s, valid loss 4.36, valid ppl 78.08
[Epoch 132 Batch 200/372] loss 4.19, ppl 66.18, throughput 92.54 samples/s, lr 32.57
[Epoch 132] throughput 5715.72 samples/s
[Epoch 132] time cost 406.07s, valid loss 4.34, valid ppl 76.82
[Epoch 133 Batch 200/372] loss 4.19, ppl 66.04, throughput 93.35 samples/s, lr 29.14
[Epoch 133] throughput 5780.56 samples/s
[Epoch 133] time cost 402.61s, valid loss 4.35, valid ppl 77.12
[Epoch 134 Batch 200/372] loss 4.19, ppl 66.29, throughput 96.44 samples/s, lr 30.43
[Epoch 134] throughput 5828.52 samples/s
[Epoch 134] time cost 399.32s, valid loss 4.35, valid ppl 77.77
[Epoch 135 Batch 200/372] loss 4.19, ppl 66.06, throughput 90.92 samples/s, lr 34.71
[Epoch 135] throughput 5634.48 samples/s
[Epoch 135] time cost 411.25s, valid loss 4.35, valid ppl 77.36
[Epoch 136 Batch 200/372] loss 4.19, ppl 65.72, throughput 89.03 samples/s, lr 31.29
[Epoch 136] throughput 5569.54 samples/s
[Epoch 136] time cost 415.65s, valid loss 4.35, valid ppl 77.31
[Epoch 137 Batch 200/372] loss 4.19, ppl 66.27, throughput 75.98 samples/s, lr 30.43
[Epoch 137] throughput 5553.57 samples/s
[Epoch 137] time cost 416.92s, valid loss 4.34, valid ppl 76.36
test loss 4.29, test ppl 72.97
[Epoch 138 Batch 200/372] loss 4.18, ppl 65.19, throughput 77.33 samples/s, lr 30.00
[Epoch 138] throughput 5566.40 samples/s
[Epoch 138] time cost 418.87s, valid loss 4.36, valid ppl 77.91
[Epoch 139 Batch 200/372] loss 4.18, ppl 65.48, throughput 84.65 samples/s, lr 27.86
[Epoch 139] throughput 5910.42 samples/s
[Epoch 139] time cost 394.65s, valid loss 4.33, valid ppl 75.88
test loss 4.28, test ppl 72.24
[Epoch 140 Batch 200/372] loss 4.18, ppl 65.09, throughput 87.90 samples/s, lr 30.43
[Epoch 140] throughput 5502.95 samples/s
[Epoch 140] time cost 420.39s, valid loss 4.34, valid ppl 76.91
[Epoch 141 Batch 200/372] loss 4.18, ppl 65.37, throughput 79.68 samples/s, lr 31.71
[Epoch 141] throughput 5569.16 samples/s
[Epoch 141] time cost 415.92s, valid loss 4.34, valid ppl 76.86
[Epoch 142 Batch 200/372] loss 4.18, ppl 65.17, throughput 80.89 samples/s, lr 29.57
[Epoch 142] throughput 5744.30 samples/s
[Epoch 142] time cost 403.94s, valid loss 4.35, valid ppl 77.44
[Epoch 143 Batch 200/372] loss 4.18, ppl 65.20, throughput 76.40 samples/s, lr 29.14
[Epoch 143] throughput 5484.64 samples/s
[Epoch 143] time cost 421.81s, valid loss 4.35, valid ppl 77.41
[Epoch 144 Batch 200/372] loss 4.17, ppl 64.76, throughput 80.01 samples/s, lr 31.29
[Epoch 144] throughput 5885.99 samples/s
[Epoch 144] time cost 395.38s, valid loss 4.36, valid ppl 77.97
[Epoch 145 Batch 200/372] loss 4.17, ppl 65.01, throughput 85.40 samples/s, lr 28.29
[Epoch 145] throughput 5869.57 samples/s
[Epoch 145] time cost 396.84s, valid loss 4.33, valid ppl 76.07
[Epoch 146 Batch 200/372] loss 4.17, ppl 64.93, throughput 92.90 samples/s, lr 30.00
[Epoch 146] throughput 5787.77 samples/s
[Epoch 146] time cost 401.80s, valid loss 4.37, valid ppl 79.17
[Epoch 147 Batch 200/372] loss 4.17, ppl 64.64, throughput 88.33 samples/s, lr 15.43
[Epoch 147] throughput 5709.65 samples/s
[Epoch 147] time cost 406.48s, valid loss 4.33, valid ppl 76.28
[Epoch 148 Batch 200/372] loss 4.16, ppl 64.39, throughput 91.27 samples/s, lr 29.14
[Epoch 148] throughput 5800.30 samples/s
[Epoch 148] time cost 400.79s, valid loss 4.34, valid ppl 76.86
[Epoch 149 Batch 200/372] loss 4.16, ppl 64.38, throughput 94.72 samples/s, lr 27.86
[Epoch 149] throughput 5849.68 samples/s
[Epoch 149] time cost 397.85s, valid loss 4.34, valid ppl 76.63
[Epoch 150 Batch 200/372] loss 4.17, ppl 64.61, throughput 90.85 samples/s, lr 25.71
[Epoch 150] throughput 5595.11 samples/s
[Epoch 150] time cost 414.24s, valid loss 4.34, valid ppl 76.94
[Epoch 151 Batch 200/372] loss 4.16, ppl 64.16, throughput 85.57 samples/s, lr 15.86
[Epoch 151] throughput 5640.44 samples/s
[Epoch 151] time cost 410.77s, valid loss 4.34, valid ppl 76.54
[Epoch 152 Batch 200/372] loss 4.16, ppl 63.95, throughput 79.96 samples/s, lr 33.43
[Epoch 152] throughput 5657.59 samples/s
[Epoch 152] time cost 409.86s, valid loss 4.34, valid ppl 76.48
[Epoch 153 Batch 200/372] loss 4.16, ppl 63.91, throughput 77.60 samples/s, lr 30.00
[Epoch 153] throughput 5635.91 samples/s
[Epoch 153] time cost 411.08s, valid loss 4.33, valid ppl 76.09
[Epoch 154 Batch 200/372] loss 4.15, ppl 63.25, throughput 76.36 samples/s, lr 27.86
[Epoch 154] throughput 5627.37 samples/s
[Epoch 154] time cost 411.85s, valid loss 4.33, valid ppl 76.30
[Epoch 155 Batch 200/372] loss 4.16, ppl 64.05, throughput 76.65 samples/s, lr 33.43
[Epoch 155] throughput 5618.82 samples/s
[Epoch 155] time cost 412.64s, valid loss 4.33, valid ppl 76.31
[Epoch 156 Batch 200/372] loss 4.15, ppl 63.49, throughput 80.79 samples/s, lr 30.43
[Epoch 156] throughput 5669.51 samples/s
[Epoch 156] time cost 411.45s, valid loss 4.33, valid ppl 75.73
test loss 4.28, test ppl 72.28
[Epoch 157 Batch 200/372] loss 4.16, ppl 63.85, throughput 86.95 samples/s, lr 29.14
[Epoch 157] throughput 5511.12 samples/s
[Epoch 157] time cost 419.76s, valid loss 4.34, valid ppl 76.99
[Epoch 158 Batch 200/372] loss 4.15, ppl 63.52, throughput 87.06 samples/s, lr 27.86
[Epoch 158] throughput 5570.80 samples/s
[Epoch 158] time cost 415.71s, valid loss 4.32, valid ppl 75.34
test loss 4.28, test ppl 71.93
[Epoch 159 Batch 200/372] loss 4.15, ppl 63.27, throughput 77.87 samples/s, lr 36.00
[Epoch 159] throughput 5583.38 samples/s
[Epoch 159] time cost 414.95s, valid loss 4.33, valid ppl 75.67
[Epoch 160 Batch 200/372] loss 4.15, ppl 63.47, throughput 74.34 samples/s, lr 33.00
[Epoch 160] throughput 5376.27 samples/s
[Epoch 160] time cost 434.12s, valid loss 4.33, valid ppl 75.57
[Epoch 161 Batch 200/372] loss 4.15, ppl 63.23, throughput 79.80 samples/s, lr 29.14
[Epoch 161] throughput 5482.44 samples/s
[Epoch 161] time cost 426.71s, valid loss 4.33, valid ppl 75.85
[Epoch 162 Batch 200/372] loss 4.14, ppl 63.11, throughput 87.88 samples/s, lr 28.71
[Epoch 162] throughput 5849.91 samples/s
[Epoch 162] time cost 401.22s, valid loss 4.32, valid ppl 75.48
[Epoch 163 Batch 200/372] loss 4.14, ppl 62.81, throughput 89.02 samples/s, lr 30.43
[Epoch 163] throughput 5620.54 samples/s
[Epoch 163] time cost 412.40s, valid loss 4.34, valid ppl 76.57
[Epoch 164 Batch 200/372] loss 4.14, ppl 63.00, throughput 90.11 samples/s, lr 34.29
[Epoch 164] throughput 6015.97 samples/s
[Epoch 164] time cost 391.63s, valid loss 4.33, valid ppl 75.71
[Epoch 165 Batch 200/372] loss 4.14, ppl 62.54, throughput 97.85 samples/s, lr 33.86
[Epoch 165] throughput 6429.26 samples/s
[Epoch 165] time cost 376.03s, valid loss 4.33, valid ppl 75.75
[Epoch 166 Batch 200/372] loss 4.14, ppl 62.83, throughput 91.23 samples/s, lr 27.00
[Epoch 166] throughput 5836.18 samples/s
[Epoch 166] time cost 403.13s, valid loss 4.34, valid ppl 76.58
[Epoch 167 Batch 200/372] loss 4.14, ppl 62.76, throughput 90.74 samples/s, lr 27.43
[Epoch 167] throughput 5568.20 samples/s
[Epoch 167] time cost 415.48s, valid loss 4.34, valid ppl 76.33
[Epoch 168 Batch 200/372] loss 4.13, ppl 62.48, throughput 88.69 samples/s, lr 31.29
[Epoch 168] throughput 5658.11 samples/s
[Epoch 168] time cost 409.94s, valid loss 4.35, valid ppl 77.27
[Epoch 169 Batch 200/372] loss 4.14, ppl 62.84, throughput 92.82 samples/s, lr 28.71
[Epoch 169] throughput 5765.86 samples/s
[Epoch 169] time cost 403.78s, valid loss 4.33, valid ppl 76.10
[Epoch 170 Batch 200/372] loss 4.13, ppl 62.46, throughput 88.44 samples/s, lr 30.86
[Epoch 170] throughput 5470.49 samples/s
[Epoch 170] time cost 422.62s, valid loss 4.32, valid ppl 75.11
test loss 4.27, test ppl 71.56
[Epoch 171 Batch 200/372] loss 4.13, ppl 62.26, throughput 74.50 samples/s, lr 27.43
[Epoch 171] throughput 5502.17 samples/s
[Epoch 171] time cost 419.95s, valid loss 4.32, valid ppl 75.48
[Epoch 172 Batch 200/372] loss 4.14, ppl 62.56, throughput 77.02 samples/s, lr 25.29
[Epoch 172] throughput 5760.68 samples/s
[Epoch 172] time cost 403.15s, valid loss 4.36, valid ppl 78.09
[Epoch 173 Batch 200/372] loss 4.12, ppl 61.78, throughput 78.69 samples/s, lr 31.71
[Epoch 173] throughput 5646.10 samples/s
[Epoch 173] time cost 410.30s, valid loss 4.33, valid ppl 75.99
[Epoch 174 Batch 200/372] loss 4.13, ppl 62.12, throughput 77.48 samples/s, lr 32.14
[Epoch 174] throughput 5681.67 samples/s
[Epoch 174] time cost 408.50s, valid loss 4.33, valid ppl 75.78
[Epoch 175 Batch 200/372] loss 4.12, ppl 61.82, throughput 86.37 samples/s, lr 31.29
[Epoch 175] throughput 6024.27 samples/s
[Epoch 175] time cost 387.41s, valid loss 4.35, valid ppl 77.30
[Epoch 176 Batch 200/372] loss 4.12, ppl 61.75, throughput 76.66 samples/s, lr 28.29
[Epoch 176] throughput 5517.99 samples/s
[Epoch 176] time cost 419.95s, valid loss 4.32, valid ppl 75.43
[Epoch 177 Batch 200/372] loss 4.12, ppl 61.62, throughput 78.46 samples/s, lr 30.86
[Epoch 177] throughput 5561.10 samples/s
[Epoch 177] time cost 424.70s, valid loss 4.32, valid ppl 75.54
[Epoch 178 Batch 200/372] loss 4.12, ppl 61.85, throughput 87.46 samples/s, lr 27.43
[Epoch 178] throughput 5651.47 samples/s
[Epoch 178] time cost 412.56s, valid loss 4.32, valid ppl 75.24
[Epoch 179 Batch 200/372] loss 4.12, ppl 61.69, throughput 93.92 samples/s, lr 27.86
[Epoch 179] throughput 5695.84 samples/s
[Epoch 179] time cost 407.54s, valid loss 4.34, valid ppl 76.45
[Epoch 180 Batch 200/372] loss 4.13, ppl 61.89, throughput 88.78 samples/s, lr 31.71
[Epoch 180] throughput 5579.53 samples/s
[Epoch 180] time cost 415.00s, valid loss 4.33, valid ppl 76.08
[Epoch 181 Batch 200/372] loss 4.12, ppl 61.62, throughput 89.97 samples/s, lr 29.57
[Epoch 181] throughput 5782.42 samples/s
[Epoch 181] time cost 401.95s, valid loss 4.33, valid ppl 76.04
[Epoch 182 Batch 200/372] loss 4.12, ppl 61.44, throughput 96.09 samples/s, lr 30.43
[Epoch 182] throughput 5857.82 samples/s
[Epoch 182] time cost 397.38s, valid loss 4.33, valid ppl 76.16
[Epoch 183 Batch 200/372] loss 4.12, ppl 61.39, throughput 76.68 samples/s, lr 29.57
[Epoch 183] throughput 5435.31 samples/s
[Epoch 183] time cost 425.26s, valid loss 4.34, valid ppl 76.58
[Epoch 184 Batch 200/372] loss 4.12, ppl 61.52, throughput 80.00 samples/s, lr 28.29
[Epoch 184] throughput 5711.12 samples/s
[Epoch 184] time cost 406.31s, valid loss 4.33, valid ppl 75.83
[Epoch 185 Batch 200/372] loss 4.11, ppl 61.12, throughput 77.94 samples/s, lr 26.14
[Epoch 185] throughput 5598.94 samples/s
[Epoch 185] time cost 414.39s, valid loss 4.33, valid ppl 75.77
[Epoch 186 Batch 200/372] loss 4.12, ppl 61.28, throughput 76.73 samples/s, lr 31.29
[Epoch 186] throughput 5528.49 samples/s
[Epoch 186] time cost 422.91s, valid loss 4.33, valid ppl 76.30
[Epoch 187 Batch 200/372] loss 4.11, ppl 60.87, throughput 81.36 samples/s, lr 27.00
[Epoch 187] throughput 5811.46 samples/s
[Epoch 187] time cost 405.56s, valid loss 4.33, valid ppl 75.95
[Epoch 188 Batch 200/372] loss 4.11, ppl 61.16, throughput 82.18 samples/s, lr 28.71
[Epoch 188] throughput 5374.02 samples/s
[Epoch 188] time cost 429.85s, valid loss 4.32, valid ppl 74.99
test loss 4.27, test ppl 71.22
[Epoch 189 Batch 200/372] loss 4.11, ppl 61.12, throughput 80.85 samples/s, lr 30.43
[Epoch 189] throughput 5446.04 samples/s
[Epoch 189] time cost 424.10s, valid loss 4.33, valid ppl 76.03
[Epoch 190 Batch 200/372] loss 4.11, ppl 60.96, throughput 77.85 samples/s, lr 15.86
[Epoch 190] throughput 5471.69 samples/s
[Epoch 190] time cost 422.05s, valid loss 4.33, valid ppl 76.15
[Epoch 191 Batch 200/372] loss 4.11, ppl 60.96, throughput 77.80 samples/s, lr 30.00
[Epoch 191] throughput 5582.31 samples/s
[Epoch 191] time cost 415.15s, valid loss 4.32, valid ppl 75.38
[Epoch 192 Batch 200/372] loss 4.11, ppl 60.90, throughput 77.04 samples/s, lr 14.14
[Epoch 192] throughput 5505.95 samples/s
[Epoch 192] time cost 425.51s, valid loss 4.33, valid ppl 75.99
[Epoch 193 Batch 200/372] loss 4.10, ppl 60.61, throughput 85.14 samples/s, lr 30.86
[Epoch 193] throughput 5887.66 samples/s
[Epoch 193] time cost 404.21s, valid loss 4.32, valid ppl 75.31
[Epoch 194 Batch 200/372] loss 4.11, ppl 60.94, throughput 92.44 samples/s, lr 33.43
[Epoch 194] throughput 6149.69 samples/s
[Epoch 194] time cost 389.72s, valid loss 4.32, valid ppl 75.26
[Epoch 195 Batch 200/372] loss 4.11, ppl 60.65, throughput 90.47 samples/s, lr 31.71
[Epoch 195] throughput 5754.30 samples/s
[Epoch 195] time cost 407.11s, valid loss 4.32, valid ppl 75.21
[Epoch 196 Batch 200/372] loss 4.10, ppl 60.63, throughput 90.76 samples/s, lr 32.14
[Epoch 196] throughput 5707.21 samples/s
[Epoch 196] time cost 408.15s, valid loss 4.32, valid ppl 75.30
[Epoch 197 Batch 200/372] loss 4.10, ppl 60.56, throughput 88.79 samples/s, lr 32.14
[Epoch 197] throughput 5825.69 samples/s
[Epoch 197] time cost 401.83s, valid loss 4.33, valid ppl 75.69
[Epoch 198 Batch 200/372] loss 4.10, ppl 60.36, throughput 96.06 samples/s, lr 28.71
[Epoch 198] throughput 5997.77 samples/s
[Epoch 198] time cost 392.54s, valid loss 4.33, valid ppl 75.63
[Epoch 199 Batch 200/372] loss 4.10, ppl 60.59, throughput 90.70 samples/s, lr 27.86
[Epoch 199] throughput 5510.08 samples/s
[Epoch 199] time cost 419.64s, valid loss 4.33, valid ppl 75.82
[Epoch 200 Batch 200/372] loss 4.10, ppl 60.24, throughput 90.36 samples/s, lr 30.86
[Epoch 200] throughput 5585.67 samples/s
[Epoch 200] time cost 414.33s, valid loss 4.33, valid ppl 75.67
[Epoch 201 Batch 200/372] loss 4.10, ppl 60.34, throughput 77.10 samples/s, lr 30.00
[Epoch 201] throughput 5511.77 samples/s
[Epoch 201] time cost 419.90s, valid loss 4.31, valid ppl 74.77
test loss 4.27, test ppl 71.25
[Epoch 202 Batch 200/372] loss 4.10, ppl 60.47, throughput 77.99 samples/s, lr 30.86
[Epoch 202] throughput 5764.48 samples/s
[Epoch 202] time cost 402.75s, valid loss 4.33, valid ppl 75.91
[Epoch 203 Batch 200/372] loss 4.10, ppl 60.23, throughput 78.29 samples/s, lr 28.29
[Epoch 203] throughput 5778.89 samples/s
[Epoch 203] time cost 402.03s, valid loss 4.32, valid ppl 74.89
[Epoch 204 Batch 200/372] loss 4.10, ppl 60.26, throughput 77.21 samples/s, lr 32.14
[Epoch 204] throughput 5836.39 samples/s
[Epoch 204] time cost 398.93s, valid loss 4.33, valid ppl 75.75
[Epoch 205 Batch 200/372] loss 4.10, ppl 60.12, throughput 79.30 samples/s, lr 31.29
[Epoch 205] throughput 5788.83 samples/s
[Epoch 205] time cost 401.52s, valid loss 4.32, valid ppl 75.44
[Epoch 206 Batch 200/372] loss 4.10, ppl 60.37, throughput 80.07 samples/s, lr 31.71
[Epoch 206] throughput 5746.75 samples/s
[Epoch 206] time cost 404.14s, valid loss 4.32, valid ppl 75.14
[Epoch 207 Batch 200/372] loss 4.09, ppl 59.89, throughput 82.78 samples/s, lr 27.86
[Epoch 207] throughput 5910.33 samples/s
[Epoch 207] time cost 394.28s, valid loss 4.33, valid ppl 75.99
[Epoch 208 Batch 200/372] loss 4.09, ppl 59.81, throughput 80.30 samples/s, lr 29.57
[Epoch 208] throughput 5802.46 samples/s
[Epoch 208] time cost 400.65s, valid loss 4.32, valid ppl 75.20
[Epoch 209 Batch 200/372] loss 4.09, ppl 59.53, throughput 84.59 samples/s, lr 27.43
[Epoch 209] throughput 5755.41 samples/s
[Epoch 209] time cost 404.24s, valid loss 4.32, valid ppl 74.99
[Epoch 210 Batch 200/372] loss 4.10, ppl 60.11, throughput 81.69 samples/s, lr 27.43
[Epoch 210] throughput 5682.75 samples/s
[Epoch 210] time cost 408.46s, valid loss 4.31, valid ppl 74.26
test loss 4.26, test ppl 70.80
[Epoch 211 Batch 200/372] loss 4.09, ppl 59.80, throughput 78.09 samples/s, lr 23.57
[Epoch 211] throughput 5786.55 samples/s
[Epoch 211] time cost 401.84s, valid loss 4.32, valid ppl 74.95
[Epoch 212 Batch 200/372] loss 4.09, ppl 59.80, throughput 81.43 samples/s, lr 30.86
[Epoch 212] throughput 5846.06 samples/s
[Epoch 212] time cost 397.81s, valid loss 4.34, valid ppl 76.60
[Epoch 213 Batch 200/372] loss 4.09, ppl 59.46, throughput 80.30 samples/s, lr 13.71
[Epoch 213] throughput 5828.77 samples/s
[Epoch 213] time cost 398.76s, valid loss 4.33, valid ppl 75.99
[Epoch 214 Batch 200/372] loss 4.09, ppl 59.80, throughput 80.88 samples/s, lr 29.57
[Epoch 214] throughput 5639.27 samples/s
[Epoch 214] time cost 411.00s, valid loss 4.31, valid ppl 74.38
[Epoch 215 Batch 200/372] loss 4.09, ppl 59.84, throughput 76.52 samples/s, lr 31.29
[Epoch 215] throughput 5504.78 samples/s
[Epoch 215] time cost 422.31s, valid loss 4.31, valid ppl 74.54
[Epoch 216 Batch 200/372] loss 4.08, ppl 59.33, throughput 80.32 samples/s, lr 27.00
[Epoch 216] throughput 5682.48 samples/s
[Epoch 216] time cost 413.06s, valid loss 4.32, valid ppl 74.97
[Epoch 217 Batch 200/372] loss 4.08, ppl 59.18, throughput 84.01 samples/s, lr 29.57
[Epoch 217] throughput 5897.39 samples/s
[Epoch 217] time cost 402.56s, valid loss 4.31, valid ppl 74.29
[Epoch 218 Batch 200/372] loss 4.08, ppl 59.15, throughput 91.03 samples/s, lr 27.00
[Epoch 218] throughput 6210.86 samples/s
[Epoch 218] time cost 385.37s, valid loss 4.32, valid ppl 75.46
[Epoch 219 Batch 200/372] loss 4.08, ppl 59.05, throughput 86.14 samples/s, lr 32.14
[Epoch 219] throughput 5495.34 samples/s
[Epoch 219] time cost 421.08s, valid loss 4.31, valid ppl 74.78
[Epoch 220 Batch 200/372] loss 4.08, ppl 58.92, throughput 84.58 samples/s, lr 32.14
[Epoch 220] throughput 5469.48 samples/s
[Epoch 220] time cost 422.48s, valid loss 4.33, valid ppl 75.66
[Epoch 221 Batch 200/372] loss 4.08, ppl 58.98, throughput 84.45 samples/s, lr 30.00
[Epoch 221] throughput 5708.92 samples/s
[Epoch 221] time cost 406.54s, valid loss 4.33, valid ppl 75.96
[Epoch 222 Batch 200/372] loss 4.08, ppl 59.11, throughput 82.69 samples/s, lr 31.71
[Epoch 222] throughput 5676.46 samples/s
[Epoch 222] time cost 408.66s, valid loss 4.34, valid ppl 76.88
[Epoch 223 Batch 200/372] loss 4.08, ppl 58.87, throughput 84.27 samples/s, lr 29.57
[Epoch 223] throughput 5675.44 samples/s
[Epoch 223] time cost 408.42s, valid loss 4.33, valid ppl 75.82
[Epoch 224 Batch 200/372] loss 4.08, ppl 59.13, throughput 75.59 samples/s, lr 30.43
[Epoch 224] throughput 5585.46 samples/s
[Epoch 224] time cost 414.61s, valid loss 4.33, valid ppl 75.61
[Epoch 225 Batch 200/372] loss 4.08, ppl 58.89, throughput 79.47 samples/s, lr 29.57
[Epoch 225] throughput 5746.36 samples/s
[Epoch 225] time cost 403.98s, valid loss 4.31, valid ppl 74.44
[Epoch 226 Batch 200/372] loss 4.08, ppl 59.02, throughput 78.42 samples/s, lr 25.71
[Epoch 226] throughput 5676.18 samples/s
[Epoch 226] time cost 408.78s, valid loss 4.33, valid ppl 75.61
[Epoch 227 Batch 200/372] loss 4.08, ppl 59.02, throughput 83.83 samples/s, lr 13.29
[Epoch 227] throughput 5940.95 samples/s
[Epoch 227] time cost 392.60s, valid loss 4.33, valid ppl 76.26
[Epoch 228 Batch 200/372] loss 4.08, ppl 59.20, throughput 80.43 samples/s, lr 29.14
[Epoch 228] throughput 5886.08 samples/s
[Epoch 228] time cost 395.35s, valid loss 4.32, valid ppl 75.19
[Epoch 229 Batch 200/372] loss 4.07, ppl 58.61, throughput 92.22 samples/s, lr 29.14
[Epoch 229] throughput 5818.95 samples/s
[Epoch 229] time cost 399.77s, valid loss 4.31, valid ppl 74.60
[Epoch 230 Batch 200/372] loss 4.07, ppl 58.70, throughput 76.73 samples/s, lr 29.14
[Epoch 230] throughput 5552.43 samples/s
[Epoch 230] time cost 416.57s, valid loss 4.31, valid ppl 74.78
[Epoch 231 Batch 200/372] loss 4.07, ppl 58.40, throughput 77.94 samples/s, lr 30.86
[Epoch 231] throughput 5650.92 samples/s
[Epoch 231] time cost 410.13s, valid loss 4.31, valid ppl 74.79
[Epoch 232 Batch 200/372] loss 4.07, ppl 58.51, throughput 78.62 samples/s, lr 9.00
[Epoch 232] throughput 5759.07 samples/s
[Epoch 232] time cost 403.30s, valid loss 4.31, valid ppl 74.72
[Epoch 233 Batch 200/372] loss 4.07, ppl 58.52, throughput 80.59 samples/s, lr 29.14
[Epoch 233] throughput 5697.98 samples/s
[Epoch 233] time cost 406.98s, valid loss 4.31, valid ppl 74.65
[Epoch 234 Batch 200/372] loss 4.07, ppl 58.43, throughput 74.13 samples/s, lr 30.00
[Epoch 234] throughput 5622.95 samples/s
[Epoch 234] time cost 412.50s, valid loss 4.31, valid ppl 74.56
[Epoch 235 Batch 200/372] loss 4.07, ppl 58.52, throughput 79.86 samples/s, lr 29.57
[Epoch 235] throughput 5674.21 samples/s
[Epoch 235] time cost 409.10s, valid loss 4.32, valid ppl 75.28
[Epoch 236 Batch 200/372] loss 4.07, ppl 58.47, throughput 80.91 samples/s, lr 30.00
[Epoch 236] throughput 5827.90 samples/s
[Epoch 236] time cost 398.96s, valid loss 4.31, valid ppl 74.14
test loss 4.26, test ppl 70.80
[Epoch 237 Batch 200/372] loss 4.07, ppl 58.57, throughput 96.33 samples/s, lr 31.71
[Epoch 237] throughput 6602.91 samples/s
[Epoch 237] time cost 357.31s, valid loss 4.32, valid ppl 75.14
[Epoch 238 Batch 200/372] loss 4.06, ppl 58.23, throughput 81.80 samples/s, lr 25.71
[Epoch 238] throughput 5879.69 samples/s
[Epoch 238] time cost 396.37s, valid loss 4.33, valid ppl 75.70
[Epoch 239 Batch 200/372] loss 4.06, ppl 58.16, throughput 78.11 samples/s, lr 29.57
[Epoch 239] throughput 5637.04 samples/s
[Epoch 239] time cost 411.47s, valid loss 4.34, valid ppl 76.64
[Epoch 240 Batch 200/372] loss 4.06, ppl 58.15, throughput 77.96 samples/s, lr 28.29
[Epoch 240] throughput 5574.72 samples/s
[Epoch 240] time cost 418.20s, valid loss 4.32, valid ppl 74.93
[Epoch 241 Batch 200/372] loss 4.06, ppl 58.16, throughput 84.26 samples/s, lr 31.71
[Epoch 241] throughput 5809.73 samples/s
[Epoch 241] time cost 403.62s, valid loss 4.31, valid ppl 74.50
[Epoch 242 Batch 200/372] loss 4.06, ppl 58.10, throughput 82.33 samples/s, lr 34.29
[Epoch 242] throughput 5787.25 samples/s
[Epoch 242] time cost 411.88s, valid loss 4.31, valid ppl 74.53
[Epoch 243 Batch 200/372] loss 4.07, ppl 58.34, throughput 85.83 samples/s, lr 32.57
[Epoch 243] throughput 5589.53 samples/s
[Epoch 243] time cost 416.90s, valid loss 4.31, valid ppl 74.66
[Epoch 244 Batch 200/372] loss 4.07, ppl 58.54, throughput 93.52 samples/s, lr 34.71
[Epoch 244] throughput 6203.16 samples/s
[Epoch 244] time cost 385.04s, valid loss 4.33, valid ppl 76.16
[Epoch 245 Batch 200/372] loss 4.07, ppl 58.35, throughput 92.92 samples/s, lr 16.29
[Epoch 245] throughput 5845.85 samples/s
[Epoch 245] time cost 399.29s, valid loss 4.31, valid ppl 74.43
[Epoch 246 Batch 200/372] loss 4.06, ppl 58.24, throughput 92.01 samples/s, lr 26.14
[Epoch 246] throughput 5622.59 samples/s
[Epoch 246] time cost 412.24s, valid loss 4.33, valid ppl 76.17
[Epoch 247 Batch 200/372] loss 4.06, ppl 57.85, throughput 163.14 samples/s, lr 26.14
[Epoch 247] throughput 11291.97 samples/s
[Epoch 247] time cost 225.54s, valid loss 4.31, valid ppl 74.41
[Epoch 248 Batch 200/372] loss 4.06, ppl 58.20, throughput 162.87 samples/s, lr 28.71
[Epoch 248] throughput 11050.18 samples/s
[Epoch 248] time cost 230.05s, valid loss 4.30, valid ppl 73.84
test loss 4.26, test ppl 70.50
[Epoch 249 Batch 200/372] loss 4.05, ppl 57.64, throughput 166.17 samples/s, lr 35.57
[Epoch 249] throughput 11221.29 samples/s
[Epoch 249] time cost 226.91s, valid loss 4.31, valid ppl 74.80
[Epoch 250 Batch 200/372] loss 4.05, ppl 57.62, throughput 160.67 samples/s, lr 30.43
[Epoch 250] throughput 10875.83 samples/s
[Epoch 250] time cost 233.12s, valid loss 4.31, valid ppl 74.80
[Epoch 251 Batch 200/372] loss 4.05, ppl 57.64, throughput 158.48 samples/s, lr 28.29
[Epoch 251] throughput 10663.23 samples/s
[Epoch 251] time cost 237.18s, valid loss 4.32, valid ppl 75.40
[Epoch 252 Batch 200/372] loss 4.06, ppl 57.88, throughput 160.83 samples/s, lr 31.29
[Epoch 252] throughput 10900.55 samples/s
[Epoch 252] time cost 232.71s, valid loss 4.32, valid ppl 75.08
[Epoch 253 Batch 200/372] loss 4.05, ppl 57.68, throughput 165.35 samples/s, lr 30.86
[Epoch 253] throughput 11355.07 samples/s
[Epoch 253] time cost 224.79s, valid loss 4.31, valid ppl 74.48
[Epoch 254 Batch 200/372] loss 4.05, ppl 57.67, throughput 162.05 samples/s, lr 30.86
[Epoch 254] throughput 10761.88 samples/s
[Epoch 254] time cost 234.60s, valid loss 4.31, valid ppl 74.42
[Epoch 255 Batch 200/372] loss 4.05, ppl 57.55, throughput 155.69 samples/s, lr 29.14
[Epoch 255] throughput 10615.18 samples/s
[Epoch 255] time cost 237.63s, valid loss 4.31, valid ppl 74.48
[Epoch 256 Batch 200/372] loss 4.05, ppl 57.55, throughput 160.46 samples/s, lr 28.29
[Epoch 256] throughput 11124.18 samples/s
[Epoch 256] time cost 229.06s, valid loss 4.33, valid ppl 75.89
[Epoch 257 Batch 200/372] loss 4.05, ppl 57.48, throughput 161.26 samples/s, lr 29.57
[Epoch 257] throughput 10912.61 samples/s
[Epoch 257] time cost 232.25s, valid loss 4.32, valid ppl 75.04
[Epoch 258 Batch 200/372] loss 4.05, ppl 57.62, throughput 160.41 samples/s, lr 29.14
[Epoch 258] throughput 11026.64 samples/s
[Epoch 258] time cost 230.19s, valid loss 4.31, valid ppl 74.37
[Epoch 259 Batch 200/372] loss 4.05, ppl 57.43, throughput 174.16 samples/s, lr 30.86
[Epoch 259] throughput 11663.37 samples/s
[Epoch 259] time cost 220.10s, valid loss 4.32, valid ppl 75.22
[Epoch 260 Batch 200/372] loss 4.05, ppl 57.66, throughput 167.82 samples/s, lr 27.86
[Epoch 260] throughput 11298.28 samples/s
[Epoch 260] time cost 226.20s, valid loss 4.32, valid ppl 75.44
[Epoch 261 Batch 200/372] loss 4.05, ppl 57.65, throughput 161.26 samples/s, lr 28.71
[Epoch 261] throughput 11316.53 samples/s
[Epoch 261] time cost 225.83s, valid loss 4.32, valid ppl 75.28
[Epoch 262 Batch 200/372] loss 4.05, ppl 57.35, throughput 161.75 samples/s, lr 30.00
[Epoch 262] throughput 10956.85 samples/s
[Epoch 262] time cost 231.36s, valid loss 4.31, valid ppl 74.66
[Epoch 263 Batch 200/372] loss 4.05, ppl 57.45, throughput 162.32 samples/s, lr 31.71
[Epoch 263] throughput 11026.72 samples/s
[Epoch 263] time cost 230.43s, valid loss 4.32, valid ppl 75.01
[Epoch 264 Batch 200/372] loss 4.04, ppl 57.10, throughput 160.35 samples/s, lr 27.43
[Epoch 264] throughput 10902.61 samples/s
[Epoch 264] time cost 232.45s, valid loss 4.31, valid ppl 74.77
[Epoch 265 Batch 200/372] loss 4.04, ppl 57.08, throughput 156.16 samples/s, lr 30.43
[Epoch 265] throughput 10729.75 samples/s
[Epoch 265] time cost 235.62s, valid loss 4.31, valid ppl 74.48
[Epoch 266 Batch 200/372] loss 4.04, ppl 56.84, throughput 162.03 samples/s, lr 30.00
[Epoch 266] throughput 10744.56 samples/s
[Epoch 266] time cost 235.63s, valid loss 4.30, valid ppl 73.57
test loss 4.25, test ppl 70.38
[Epoch 267 Batch 200/372] loss 4.04, ppl 56.93, throughput 160.45 samples/s, lr 30.86
[Epoch 267] throughput 11000.47 samples/s
[Epoch 267] time cost 230.66s, valid loss 4.30, valid ppl 73.47
test loss 4.25, test ppl 70.21
[Epoch 268 Batch 200/372] loss 4.05, ppl 57.19, throughput 162.61 samples/s, lr 30.43
[Epoch 268] throughput 10785.37 samples/s
[Epoch 268] time cost 234.71s, valid loss 4.31, valid ppl 74.35
[Epoch 269 Batch 200/372] loss 4.04, ppl 56.94, throughput 157.74 samples/s, lr 27.00
[Epoch 269] throughput 10615.60 samples/s
[Epoch 269] time cost 237.84s, valid loss 4.30, valid ppl 73.44
test loss 4.25, test ppl 70.25
[Epoch 270 Batch 200/372] loss 4.04, ppl 56.87, throughput 154.08 samples/s, lr 31.29
[Epoch 270] throughput 10823.20 samples/s
[Epoch 270] time cost 234.05s, valid loss 4.31, valid ppl 74.51
[Epoch 271 Batch 200/372] loss 4.04, ppl 57.00, throughput 154.19 samples/s, lr 26.57
[Epoch 271] throughput 10658.95 samples/s
[Epoch 271] time cost 236.83s, valid loss 4.33, valid ppl 75.61
[Epoch 272 Batch 200/372] loss 4.04, ppl 56.90, throughput 157.63 samples/s, lr 32.57
[Epoch 272] throughput 10983.69 samples/s
[Epoch 272] time cost 230.98s, valid loss 4.31, valid ppl 74.75
[Epoch 273 Batch 200/372] loss 4.04, ppl 57.10, throughput 159.17 samples/s, lr 30.86
[Epoch 273] throughput 11100.08 samples/s
[Epoch 273] time cost 228.89s, valid loss 4.31, valid ppl 74.65
[Epoch 274 Batch 200/372] loss 4.04, ppl 56.93, throughput 168.62 samples/s, lr 30.43
[Epoch 274] throughput 11301.89 samples/s
[Epoch 274] time cost 225.41s, valid loss 4.32, valid ppl 74.89
[Epoch 275 Batch 200/372] loss 4.04, ppl 57.08, throughput 162.02 samples/s, lr 30.43
[Epoch 275] throughput 10992.83 samples/s
[Epoch 275] time cost 230.71s, valid loss 4.30, valid ppl 73.64
[Epoch 276 Batch 200/372] loss 4.04, ppl 56.83, throughput 160.77 samples/s, lr 34.71
[Epoch 276] throughput 10981.84 samples/s
[Epoch 276] time cost 231.11s, valid loss 4.32, valid ppl 75.32
[Epoch 277 Batch 200/372] loss 4.04, ppl 56.70, throughput 156.84 samples/s, lr 33.00
[Epoch 277] throughput 10653.05 samples/s
[Epoch 277] time cost 237.35s, valid loss 4.32, valid ppl 75.31
[Epoch 278 Batch 200/372] loss 4.04, ppl 56.76, throughput 159.93 samples/s, lr 29.57
[Epoch 278] throughput 10967.50 samples/s
[Epoch 278] time cost 231.68s, valid loss 4.31, valid ppl 74.22
[Epoch 279 Batch 200/372] loss 4.04, ppl 56.85, throughput 169.65 samples/s, lr 29.57
[Epoch 279] throughput 11490.58 samples/s
[Epoch 279] time cost 222.56s, valid loss 4.30, valid ppl 73.61
[Epoch 280 Batch 200/372] loss 4.04, ppl 56.80, throughput 160.79 samples/s, lr 31.29
[Epoch 280] throughput 10903.62 samples/s
[Epoch 280] time cost 232.47s, valid loss 4.31, valid ppl 74.50
[Epoch 281 Batch 200/372] loss 4.04, ppl 56.73, throughput 164.47 samples/s, lr 28.71
[Epoch 281] throughput 11133.72 samples/s
[Epoch 281] time cost 228.21s, valid loss 4.31, valid ppl 74.53
[Epoch 282 Batch 200/372] loss 4.04, ppl 56.65, throughput 160.90 samples/s, lr 32.14
[Epoch 282] throughput 10926.61 samples/s
[Epoch 282] time cost 232.31s, valid loss 4.30, valid ppl 73.66
[Epoch 283 Batch 200/372] loss 4.03, ppl 56.41, throughput 161.02 samples/s, lr 32.57
[Epoch 283] throughput 11053.78 samples/s
[Epoch 283] time cost 229.61s, valid loss 4.31, valid ppl 74.76
[Epoch 284 Batch 200/372] loss 4.04, ppl 56.66, throughput 160.70 samples/s, lr 26.57
[Epoch 284] throughput 10799.42 samples/s
[Epoch 284] time cost 234.26s, valid loss 4.30, valid ppl 73.79
[Epoch 285 Batch 200/372] loss 4.03, ppl 56.36, throughput 158.85 samples/s, lr 27.86
[Epoch 285] throughput 10671.52 samples/s
[Epoch 285] time cost 236.81s, valid loss 4.30, valid ppl 74.03
[Epoch 286 Batch 200/372] loss 4.03, ppl 56.39, throughput 168.66 samples/s, lr 27.43
[Epoch 286] throughput 11145.95 samples/s
[Epoch 286] time cost 228.47s, valid loss 4.31, valid ppl 74.15
[Epoch 287 Batch 200/372] loss 4.03, ppl 56.46, throughput 159.79 samples/s, lr 29.14
[Epoch 287] throughput 10892.39 samples/s
[Epoch 287] time cost 232.77s, valid loss 4.31, valid ppl 74.47
[Epoch 288 Batch 200/372] loss 4.03, ppl 56.15, throughput 159.14 samples/s, lr 32.14
[Epoch 288] throughput 10764.06 samples/s
[Epoch 288] time cost 235.17s, valid loss 4.30, valid ppl 74.03
[Epoch 289 Batch 200/372] loss 4.04, ppl 56.65, throughput 155.70 samples/s, lr 27.86
[Epoch 289] throughput 10695.01 samples/s
[Epoch 289] time cost 236.04s, valid loss 4.31, valid ppl 74.19
[Epoch 290 Batch 200/372] loss 4.04, ppl 56.77, throughput 164.03 samples/s, lr 28.71
[Epoch 290] throughput 11095.71 samples/s
[Epoch 290] time cost 229.08s, valid loss 4.32, valid ppl 74.96
[Epoch 291 Batch 200/372] loss 4.03, ppl 56.38, throughput 153.30 samples/s, lr 28.29
[Epoch 291] throughput 10746.08 samples/s
[Epoch 291] time cost 235.46s, valid loss 4.32, valid ppl 75.33
[Epoch 292 Batch 200/372] loss 4.03, ppl 56.33, throughput 157.97 samples/s, lr 18.86
[Epoch 292] throughput 10823.73 samples/s
[Epoch 292] time cost 233.82s, valid loss 4.32, valid ppl 75.15
[Epoch 293 Batch 200/372] loss 4.04, ppl 56.59, throughput 158.72 samples/s, lr 32.14
[Epoch 293] throughput 10554.99 samples/s
[Epoch 293] time cost 238.84s, valid loss 4.31, valid ppl 74.23
[Epoch 294 Batch 200/372] loss 4.03, ppl 56.28, throughput 161.85 samples/s, lr 27.00
[Epoch 294] throughput 11100.17 samples/s
[Epoch 294] time cost 228.83s, valid loss 4.32, valid ppl 74.90
[Epoch 295 Batch 200/372] loss 4.03, ppl 56.11, throughput 162.83 samples/s, lr 14.57
[Epoch 295] throughput 11077.34 samples/s
[Epoch 295] time cost 229.46s, valid loss 4.30, valid ppl 73.95
[Epoch 296 Batch 200/372] loss 4.03, ppl 56.17, throughput 165.22 samples/s, lr 17.57
[Epoch 296] throughput 11017.07 samples/s
[Epoch 296] time cost 230.50s, valid loss 4.31, valid ppl 74.42
[Epoch 297 Batch 200/372] loss 4.03, ppl 56.18, throughput 153.31 samples/s, lr 32.14
[Epoch 297] throughput 10757.80 samples/s
[Epoch 297] time cost 234.92s, valid loss 4.30, valid ppl 73.97
[Epoch 298 Batch 200/372] loss 4.03, ppl 56.23, throughput 160.48 samples/s, lr 28.71
[Epoch 298] throughput 10981.50 samples/s
[Epoch 298] time cost 231.06s, valid loss 4.31, valid ppl 74.35
[Epoch 299 Batch 200/372] loss 4.03, ppl 56.11, throughput 157.45 samples/s, lr 29.57
[Epoch 299] throughput 10762.19 samples/s
[Epoch 299] time cost 235.04s, valid loss 4.32, valid ppl 75.56
Learning rate after interval update 3.000000
[Epoch 300 Batch 200/372] loss 4.02, ppl 55.76, throughput 155.51 samples/s, lr 3.00
[Epoch 300] throughput 10460.07 samples/s
[Epoch 300] time cost 240.46s, valid loss 4.27, valid ppl 71.61
test loss 4.23, test ppl 68.40
[Epoch 301 Batch 200/372] loss 3.99, ppl 53.87, throughput 163.15 samples/s, lr 3.17
[Epoch 301] throughput 11364.52 samples/s
[Epoch 301] time cost 225.26s, valid loss 4.27, valid ppl 71.36
test loss 4.22, test ppl 68.14
[Epoch 302 Batch 200/372] loss 3.97, ppl 53.10, throughput 156.95 samples/s, lr 3.39
[Epoch 302] throughput 10684.24 samples/s
[Epoch 302] time cost 236.32s, valid loss 4.27, valid ppl 71.27
test loss 4.22, test ppl 68.06
[Epoch 303 Batch 200/372] loss 3.97, ppl 52.93, throughput 159.82 samples/s, lr 3.00
[Epoch 303] throughput 10768.60 samples/s
[Epoch 303] time cost 234.72s, valid loss 4.26, valid ppl 71.12
test loss 4.22, test ppl 67.96
[Epoch 304 Batch 200/372] loss 3.96, ppl 52.33, throughput 152.75 samples/s, lr 3.34
[Epoch 304] throughput 10761.66 samples/s
[Epoch 304] time cost 234.86s, valid loss 4.26, valid ppl 70.93
test loss 4.22, test ppl 67.81
[Epoch 305 Batch 200/372] loss 3.95, ppl 52.05, throughput 162.45 samples/s, lr 3.17
[Epoch 305] throughput 11183.83 samples/s
[Epoch 305] time cost 227.57s, valid loss 4.26, valid ppl 70.91
test loss 4.22, test ppl 67.72
[Epoch 306 Batch 200/372] loss 3.94, ppl 51.43, throughput 162.42 samples/s, lr 3.04
[Epoch 306] throughput 11000.55 samples/s
[Epoch 306] time cost 231.09s, valid loss 4.26, valid ppl 70.88
test loss 4.22, test ppl 67.77
[Epoch 307 Batch 200/372] loss 3.94, ppl 51.53, throughput 166.70 samples/s, lr 3.00
[Epoch 307] throughput 11037.51 samples/s
[Epoch 307] time cost 230.54s, valid loss 4.26, valid ppl 70.83
test loss 4.22, test ppl 67.75
[Epoch 308 Batch 200/372] loss 3.94, ppl 51.24, throughput 160.77 samples/s, lr 3.17
[Epoch 308] throughput 11090.72 samples/s
[Epoch 308] time cost 229.03s, valid loss 4.26, valid ppl 70.89
[Epoch 309 Batch 200/372] loss 3.93, ppl 50.91, throughput 158.05 samples/s, lr 3.00
[Epoch 309] throughput 10937.88 samples/s
[Epoch 309] time cost 231.65s, valid loss 4.26, valid ppl 70.56
test loss 4.21, test ppl 67.51
[Epoch 310 Batch 200/372] loss 3.93, ppl 50.80, throughput 157.41 samples/s, lr 3.04
[Epoch 310] throughput 10947.45 samples/s
[Epoch 310] time cost 231.48s, valid loss 4.26, valid ppl 70.64
[Epoch 311 Batch 200/372] loss 3.92, ppl 50.36, throughput 170.40 samples/s, lr 3.17
[Epoch 311] throughput 11187.98 samples/s
[Epoch 311] time cost 227.44s, valid loss 4.26, valid ppl 70.62
[Epoch 312 Batch 200/372] loss 3.92, ppl 50.21, throughput 157.84 samples/s, lr 3.17
[Epoch 312] throughput 10722.09 samples/s
[Epoch 312] time cost 235.96s, valid loss 4.26, valid ppl 70.63
[Epoch 313 Batch 200/372] loss 3.91, ppl 50.04, throughput 158.52 samples/s, lr 3.17
[Epoch 313] throughput 10968.41 samples/s
[Epoch 313] time cost 231.57s, valid loss 4.26, valid ppl 70.70
[Epoch 314 Batch 200/372] loss 3.92, ppl 50.16, throughput 160.05 samples/s, lr 2.87
[Epoch 314] throughput 10994.55 samples/s
[Epoch 314] time cost 230.90s, valid loss 4.26, valid ppl 70.56
[Epoch 315 Batch 200/372] loss 3.91, ppl 49.96, throughput 165.84 samples/s, lr 2.91
[Epoch 315] throughput 11155.06 samples/s
[Epoch 315] time cost 228.15s, valid loss 4.26, valid ppl 70.59
[Epoch 316 Batch 200/372] loss 3.91, ppl 49.74, throughput 162.44 samples/s, lr 3.09
[Epoch 316] throughput 11056.05 samples/s
[Epoch 316] time cost 229.74s, valid loss 4.26, valid ppl 70.66
[Epoch 317 Batch 200/372] loss 3.91, ppl 49.78, throughput 153.44 samples/s, lr 2.83
[Epoch 317] throughput 10454.70 samples/s
[Epoch 317] time cost 241.06s, valid loss 4.26, valid ppl 70.55
test loss 4.21, test ppl 67.51
[Epoch 318 Batch 200/372] loss 3.90, ppl 49.49, throughput 162.61 samples/s, lr 2.79
[Epoch 318] throughput 10883.02 samples/s
[Epoch 318] time cost 232.90s, valid loss 4.25, valid ppl 70.37
test loss 4.21, test ppl 67.40
[Epoch 319 Batch 200/372] loss 3.90, ppl 49.45, throughput 162.31 samples/s, lr 2.74
[Epoch 319] throughput 10977.65 samples/s
[Epoch 319] time cost 231.29s, valid loss 4.25, valid ppl 70.35
test loss 4.21, test ppl 67.34
[Epoch 320 Batch 200/372] loss 3.90, ppl 49.19, throughput 162.65 samples/s, lr 2.96
[Epoch 320] throughput 10970.78 samples/s
[Epoch 320] time cost 231.16s, valid loss 4.25, valid ppl 70.34
test loss 4.21, test ppl 67.32
[Epoch 321 Batch 200/372] loss 3.89, ppl 49.08, throughput 158.01 samples/s, lr 2.79
[Epoch 321] throughput 11014.50 samples/s
[Epoch 321] time cost 230.31s, valid loss 4.25, valid ppl 70.45
[Epoch 322 Batch 200/372] loss 3.89, ppl 49.12, throughput 165.00 samples/s, lr 3.13
[Epoch 322] throughput 11154.91 samples/s
[Epoch 322] time cost 228.27s, valid loss 4.26, valid ppl 70.46
[Epoch 323 Batch 200/372] loss 3.89, ppl 48.91, throughput 168.01 samples/s, lr 3.34
[Epoch 323] throughput 11198.35 samples/s
[Epoch 323] time cost 227.43s, valid loss 4.25, valid ppl 70.43
[Epoch 324 Batch 200/372] loss 3.89, ppl 48.83, throughput 169.63 samples/s, lr 3.13
[Epoch 324] throughput 11438.37 samples/s
[Epoch 324] time cost 223.65s, valid loss 4.25, valid ppl 70.41
[Epoch 325 Batch 200/372] loss 3.88, ppl 48.63, throughput 161.43 samples/s, lr 2.74
[Epoch 325] throughput 11010.29 samples/s
[Epoch 325] time cost 230.79s, valid loss 4.25, valid ppl 70.39
[Epoch 326 Batch 200/372] loss 3.89, ppl 48.69, throughput 160.74 samples/s, lr 2.66
[Epoch 326] throughput 10900.69 samples/s
[Epoch 326] time cost 232.97s, valid loss 4.25, valid ppl 70.43
[Epoch 327 Batch 200/372] loss 3.88, ppl 48.43, throughput 175.04 samples/s, lr 2.66
[Epoch 327] throughput 11497.59 samples/s
[Epoch 327] time cost 222.57s, valid loss 4.25, valid ppl 70.32
test loss 4.21, test ppl 67.35
[Epoch 328 Batch 200/372] loss 3.88, ppl 48.51, throughput 167.99 samples/s, lr 2.96
[Epoch 328] throughput 10937.60 samples/s
[Epoch 328] time cost 231.74s, valid loss 4.25, valid ppl 70.23
test loss 4.21, test ppl 67.29
[Epoch 329 Batch 200/372] loss 3.89, ppl 48.69, throughput 162.17 samples/s, lr 3.04
[Epoch 329] throughput 10937.00 samples/s
[Epoch 329] time cost 232.04s, valid loss 4.25, valid ppl 70.43
[Epoch 330 Batch 200/372] loss 3.88, ppl 48.23, throughput 159.66 samples/s, lr 3.09
[Epoch 330] throughput 10592.62 samples/s
[Epoch 330] time cost 238.28s, valid loss 4.25, valid ppl 70.22
test loss 4.21, test ppl 67.27
[Epoch 331 Batch 200/372] loss 3.87, ppl 48.07, throughput 156.42 samples/s, lr 2.91
[Epoch 331] throughput 10905.82 samples/s
[Epoch 331] time cost 232.51s, valid loss 4.25, valid ppl 70.41
[Epoch 332 Batch 200/372] loss 3.88, ppl 48.25, throughput 170.68 samples/s, lr 3.21
[Epoch 332] throughput 11200.51 samples/s
[Epoch 332] time cost 227.52s, valid loss 4.25, valid ppl 70.37
[Epoch 333 Batch 200/372] loss 3.87, ppl 48.10, throughput 161.00 samples/s, lr 3.04
[Epoch 333] throughput 11206.12 samples/s
[Epoch 333] time cost 227.20s, valid loss 4.25, valid ppl 70.26
[Epoch 334 Batch 200/372] loss 3.87, ppl 48.02, throughput 169.77 samples/s, lr 2.57
[Epoch 334] throughput 11355.02 samples/s
[Epoch 334] time cost 224.72s, valid loss 4.25, valid ppl 70.30
[Epoch 335 Batch 200/372] loss 3.87, ppl 48.04, throughput 166.64 samples/s, lr 3.17
[Epoch 335] throughput 10978.06 samples/s
[Epoch 335] time cost 230.76s, valid loss 4.25, valid ppl 70.22
test loss 4.21, test ppl 67.22
[Epoch 336 Batch 200/372] loss 3.87, ppl 47.84, throughput 162.84 samples/s, lr 3.04
[Epoch 336] throughput 10942.12 samples/s
[Epoch 336] time cost 231.68s, valid loss 4.25, valid ppl 70.37
[Epoch 337 Batch 200/372] loss 3.87, ppl 47.96, throughput 158.68 samples/s, lr 3.09
[Epoch 337] throughput 10597.87 samples/s
[Epoch 337] time cost 237.95s, valid loss 4.25, valid ppl 70.37
[Epoch 338 Batch 200/372] loss 3.87, ppl 47.83, throughput 161.55 samples/s, lr 2.87
[Epoch 338] throughput 11068.05 samples/s
[Epoch 338] time cost 229.75s, valid loss 4.25, valid ppl 70.31
[Epoch 339 Batch 200/372] loss 3.86, ppl 47.70, throughput 161.17 samples/s, lr 3.26
[Epoch 339] throughput 11298.50 samples/s
[Epoch 339] time cost 225.77s, valid loss 4.25, valid ppl 70.28
[Epoch 340 Batch 200/372] loss 3.86, ppl 47.50, throughput 163.66 samples/s, lr 3.30
[Epoch 340] throughput 10946.42 samples/s
[Epoch 340] time cost 232.06s, valid loss 4.25, valid ppl 70.36
[Epoch 341 Batch 200/372] loss 3.86, ppl 47.47, throughput 162.45 samples/s, lr 2.79
[Epoch 341] throughput 11266.53 samples/s
[Epoch 341] time cost 226.16s, valid loss 4.25, valid ppl 70.29
[Epoch 342 Batch 200/372] loss 3.86, ppl 47.52, throughput 162.80 samples/s, lr 2.83
[Epoch 342] throughput 11228.66 samples/s
[Epoch 342] time cost 227.18s, valid loss 4.25, valid ppl 70.34
[Epoch 343 Batch 200/372] loss 3.86, ppl 47.48, throughput 163.08 samples/s, lr 3.21
[Epoch 343] throughput 11394.54 samples/s
[Epoch 343] time cost 224.65s, valid loss 4.25, valid ppl 70.31
[Epoch 344 Batch 200/372] loss 3.87, ppl 47.72, throughput 164.35 samples/s, lr 2.96
[Epoch 344] throughput 11152.97 samples/s
[Epoch 344] time cost 228.60s, valid loss 4.25, valid ppl 70.38
[Epoch 345 Batch 200/372] loss 3.86, ppl 47.39, throughput 160.88 samples/s, lr 3.30
[Epoch 345] throughput 11160.92 samples/s
[Epoch 345] time cost 227.97s, valid loss 4.25, valid ppl 70.34
[Epoch 346 Batch 200/372] loss 3.85, ppl 47.22, throughput 154.72 samples/s, lr 2.79
[Epoch 346] throughput 10906.10 samples/s
[Epoch 346] time cost 232.30s, valid loss 4.25, valid ppl 70.31
[Epoch 347 Batch 200/372] loss 3.85, ppl 47.07, throughput 162.57 samples/s, lr 2.83
[Epoch 347] throughput 10968.46 samples/s
[Epoch 347] time cost 231.28s, valid loss 4.25, valid ppl 70.24
[Epoch 348 Batch 200/372] loss 3.85, ppl 47.12, throughput 153.80 samples/s, lr 3.21
[Epoch 348] throughput 10273.93 samples/s
[Epoch 348] time cost 243.92s, valid loss 4.25, valid ppl 70.39
[Epoch 349 Batch 200/372] loss 3.86, ppl 47.26, throughput 165.24 samples/s, lr 2.96
[Epoch 349] throughput 11201.17 samples/s
[Epoch 349] time cost 227.40s, valid loss 4.25, valid ppl 70.35
[Epoch 350 Batch 200/372] loss 3.85, ppl 47.04, throughput 165.84 samples/s, lr 2.91
[Epoch 350] throughput 11265.65 samples/s
[Epoch 350] time cost 226.29s, valid loss 4.25, valid ppl 70.36
[Epoch 351 Batch 200/372] loss 3.85, ppl 47.17, throughput 158.53 samples/s, lr 3.34
[Epoch 351] throughput 11064.35 samples/s
[Epoch 351] time cost 229.37s, valid loss 4.25, valid ppl 70.29
[Epoch 352 Batch 200/372] loss 3.85, ppl 46.91, throughput 171.13 samples/s, lr 2.70
[Epoch 352] throughput 11353.65 samples/s
[Epoch 352] time cost 224.53s, valid loss 4.25, valid ppl 70.36
[Epoch 353 Batch 200/372] loss 3.85, ppl 46.90, throughput 152.39 samples/s, lr 3.17
[Epoch 353] throughput 10606.05 samples/s
[Epoch 353] time cost 237.70s, valid loss 4.25, valid ppl 70.34
[Epoch 354 Batch 200/372] loss 3.85, ppl 46.94, throughput 162.01 samples/s, lr 3.00
[Epoch 354] throughput 10644.03 samples/s
[Epoch 354] time cost 237.04s, valid loss 4.25, valid ppl 70.28
[Epoch 355 Batch 200/372] loss 3.85, ppl 47.12, throughput 160.86 samples/s, lr 3.09
[Epoch 355] throughput 10706.94 samples/s
[Epoch 355] time cost 236.23s, valid loss 4.25, valid ppl 70.25
[Epoch 356 Batch 200/372] loss 3.85, ppl 47.02, throughput 160.51 samples/s, lr 2.70
[Epoch 356] throughput 10737.57 samples/s
[Epoch 356] time cost 235.43s, valid loss 4.25, valid ppl 70.19
test loss 4.21, test ppl 67.27
[Epoch 357 Batch 200/372] loss 3.85, ppl 46.76, throughput 159.09 samples/s, lr 3.13
[Epoch 357] throughput 10952.77 samples/s
[Epoch 357] time cost 231.43s, valid loss 4.25, valid ppl 70.28
[Epoch 358 Batch 200/372] loss 3.85, ppl 46.79, throughput 155.57 samples/s, lr 2.96
[Epoch 358] throughput 10857.84 samples/s
[Epoch 358] time cost 233.34s, valid loss 4.25, valid ppl 70.40
[Epoch 359 Batch 200/372] loss 3.85, ppl 46.81, throughput 159.80 samples/s, lr 2.70
[Epoch 359] throughput 10949.75 samples/s
[Epoch 359] time cost 231.42s, valid loss 4.25, valid ppl 70.21
[Epoch 360 Batch 200/372] loss 3.85, ppl 46.88, throughput 159.29 samples/s, lr 2.70
[Epoch 360] throughput 10579.40 samples/s
[Epoch 360] time cost 238.39s, valid loss 4.25, valid ppl 70.23
[Epoch 361 Batch 200/372] loss 3.84, ppl 46.44, throughput 163.29 samples/s, lr 2.83
[Epoch 361] throughput 10825.06 samples/s
[Epoch 361] time cost 233.98s, valid loss 4.25, valid ppl 70.29
[Epoch 362 Batch 200/372] loss 3.84, ppl 46.51, throughput 164.37 samples/s, lr 2.79
[Epoch 362] throughput 10900.48 samples/s
[Epoch 362] time cost 232.32s, valid loss 4.25, valid ppl 70.22
[Epoch 363 Batch 200/372] loss 3.85, ppl 46.81, throughput 157.80 samples/s, lr 2.74
[Epoch 363] throughput 10857.76 samples/s
[Epoch 363] time cost 233.44s, valid loss 4.25, valid ppl 70.33
[Epoch 364 Batch 200/372] loss 3.84, ppl 46.33, throughput 156.39 samples/s, lr 2.53
[Epoch 364] throughput 10737.07 samples/s
[Epoch 364] time cost 235.45s, valid loss 4.25, valid ppl 70.38
[Epoch 365 Batch 200/372] loss 3.84, ppl 46.32, throughput 156.30 samples/s, lr 3.26
[Epoch 365] throughput 10927.81 samples/s
[Epoch 365] time cost 232.35s, valid loss 4.25, valid ppl 70.37
[Epoch 366 Batch 200/372] loss 3.83, ppl 46.23, throughput 161.66 samples/s, lr 2.83
[Epoch 366] throughput 11042.67 samples/s
[Epoch 366] time cost 230.27s, valid loss 4.26, valid ppl 70.47
[Epoch 367 Batch 200/372] loss 3.84, ppl 46.42, throughput 155.89 samples/s, lr 3.26
[Epoch 367] throughput 10738.85 samples/s
[Epoch 367] time cost 235.28s, valid loss 4.25, valid ppl 70.38
[Epoch 368 Batch 200/372] loss 3.83, ppl 46.21, throughput 163.49 samples/s, lr 2.57
[Epoch 368] throughput 11327.71 samples/s
[Epoch 368] time cost 224.97s, valid loss 4.25, valid ppl 70.29
[Epoch 369 Batch 200/372] loss 3.83, ppl 46.23, throughput 160.49 samples/s, lr 3.04
[Epoch 369] throughput 11029.09 samples/s
[Epoch 369] time cost 230.58s, valid loss 4.25, valid ppl 70.24
[Epoch 370 Batch 200/372] loss 3.83, ppl 46.03, throughput 164.65 samples/s, lr 2.96
[Epoch 370] throughput 11205.82 samples/s
[Epoch 370] time cost 227.40s, valid loss 4.25, valid ppl 70.23
[Epoch 371 Batch 200/372] loss 3.84, ppl 46.43, throughput 166.95 samples/s, lr 3.39
[Epoch 371] throughput 11333.35 samples/s
[Epoch 371] time cost 225.58s, valid loss 4.25, valid ppl 70.43
[Epoch 372 Batch 200/372] loss 3.83, ppl 46.28, throughput 154.34 samples/s, lr 3.17
[Epoch 372] throughput 10893.03 samples/s
[Epoch 372] time cost 232.57s, valid loss 4.25, valid ppl 70.37
[Epoch 373 Batch 200/372] loss 3.84, ppl 46.30, throughput 156.70 samples/s, lr 2.70
[Epoch 373] throughput 10657.75 samples/s
[Epoch 373] time cost 236.82s, valid loss 4.25, valid ppl 70.29
[Epoch 374 Batch 200/372] loss 3.83, ppl 46.16, throughput 161.05 samples/s, lr 3.21
[Epoch 374] throughput 10976.49 samples/s
[Epoch 374] time cost 231.07s, valid loss 4.25, valid ppl 70.30
[Epoch 375 Batch 200/372] loss 3.83, ppl 46.25, throughput 162.44 samples/s, lr 2.96
[Epoch 375] throughput 11130.73 samples/s
[Epoch 375] time cost 228.99s, valid loss 4.25, valid ppl 70.32
[Epoch 376 Batch 200/372] loss 3.83, ppl 46.22, throughput 162.07 samples/s, lr 2.87
[Epoch 376] throughput 11152.28 samples/s
[Epoch 376] time cost 228.38s, valid loss 4.25, valid ppl 70.42
[Epoch 377 Batch 200/372] loss 3.83, ppl 46.00, throughput 165.26 samples/s, lr 3.00
[Epoch 377] throughput 11196.79 samples/s
[Epoch 377] time cost 227.58s, valid loss 4.25, valid ppl 70.25
[Epoch 378 Batch 200/372] loss 3.83, ppl 46.04, throughput 173.70 samples/s, lr 2.96
[Epoch 378] throughput 11591.30 samples/s
[Epoch 378] time cost 221.08s, valid loss 4.25, valid ppl 70.41
[Epoch 379 Batch 200/372] loss 3.83, ppl 45.94, throughput 167.89 samples/s, lr 3.26
[Epoch 379] throughput 11374.20 samples/s
[Epoch 379] time cost 224.40s, valid loss 4.25, valid ppl 70.34
[Epoch 380 Batch 200/372] loss 3.83, ppl 45.84, throughput 162.31 samples/s, lr 3.21
[Epoch 380] throughput 11334.14 samples/s
[Epoch 380] time cost 225.30s, valid loss 4.25, valid ppl 70.32
[Epoch 381 Batch 200/372] loss 3.83, ppl 46.03, throughput 154.46 samples/s, lr 2.70
[Epoch 381] throughput 10830.20 samples/s
[Epoch 381] time cost 233.56s, valid loss 4.25, valid ppl 70.29
[Epoch 382 Batch 200/372] loss 3.83, ppl 46.01, throughput 163.23 samples/s, lr 3.30
[Epoch 382] throughput 10868.13 samples/s
[Epoch 382] time cost 233.26s, valid loss 4.26, valid ppl 70.56
[Epoch 383 Batch 200/372] loss 3.82, ppl 45.64, throughput 159.17 samples/s, lr 3.17
[Epoch 383] throughput 11155.55 samples/s
[Epoch 383] time cost 228.03s, valid loss 4.25, valid ppl 70.41
[Epoch 384 Batch 200/372] loss 3.82, ppl 45.74, throughput 166.76 samples/s, lr 3.04
[Epoch 384] throughput 11513.60 samples/s
[Epoch 384] time cost 222.70s, valid loss 4.25, valid ppl 70.32
[Epoch 385 Batch 200/372] loss 3.82, ppl 45.68, throughput 165.01 samples/s, lr 3.21
[Epoch 385] throughput 11375.82 samples/s
[Epoch 385] time cost 224.91s, valid loss 4.25, valid ppl 70.33
[Epoch 386 Batch 200/372] loss 3.82, ppl 45.81, throughput 165.49 samples/s, lr 3.30
[Epoch 386] throughput 11082.20 samples/s
[Epoch 386] time cost 229.73s, valid loss 4.25, valid ppl 70.45
Learning rate after interval update 0.300000
[Epoch 387 Batch 200/372] loss 3.84, ppl 46.66, throughput 159.54 samples/s, lr 0.34
[Epoch 387] throughput 10743.42 samples/s
[Epoch 387] time cost 235.24s, valid loss 4.25, valid ppl 69.76
test loss 4.20, test ppl 66.85
[Epoch 388 Batch 200/372] loss 3.83, ppl 46.24, throughput 153.11 samples/s, lr 0.32
[Epoch 388] throughput 10615.73 samples/s
[Epoch 388] time cost 237.83s, valid loss 4.24, valid ppl 69.72
test loss 4.20, test ppl 66.81
[Epoch 389 Batch 200/372] loss 3.84, ppl 46.30, throughput 160.74 samples/s, lr 0.25
[Epoch 389] throughput 10908.33 samples/s
[Epoch 389] time cost 232.38s, valid loss 4.24, valid ppl 69.70
test loss 4.20, test ppl 66.82
[Epoch 390 Batch 200/372] loss 3.83, ppl 45.96, throughput 159.92 samples/s, lr 0.27
[Epoch 390] throughput 10861.62 samples/s
[Epoch 390] time cost 233.33s, valid loss 4.24, valid ppl 69.71
[Epoch 391 Batch 200/372] loss 3.83, ppl 46.05, throughput 155.93 samples/s, lr 0.31
[Epoch 391] throughput 10968.52 samples/s
[Epoch 391] time cost 231.56s, valid loss 4.24, valid ppl 69.65
test loss 4.20, test ppl 66.76
[Epoch 392 Batch 200/372] loss 3.83, ppl 46.05, throughput 170.94 samples/s, lr 0.13
[Epoch 392] throughput 11197.85 samples/s
[Epoch 392] time cost 227.55s, valid loss 4.24, valid ppl 69.68
[Epoch 393 Batch 200/372] loss 3.83, ppl 46.26, throughput 164.10 samples/s, lr 0.27
[Epoch 393] throughput 11093.48 samples/s
[Epoch 393] time cost 229.66s, valid loss 4.24, valid ppl 69.67
[Epoch 394 Batch 200/372] loss 3.83, ppl 45.93, throughput 166.17 samples/s, lr 0.15
[Epoch 394] throughput 11059.55 samples/s
[Epoch 394] time cost 230.07s, valid loss 4.24, valid ppl 69.65
test loss 4.20, test ppl 66.73
[Epoch 395 Batch 200/372] loss 3.83, ppl 46.00, throughput 162.61 samples/s, lr 0.30
[Epoch 395] throughput 10937.66 samples/s
[Epoch 395] time cost 232.34s, valid loss 4.24, valid ppl 69.65
[Epoch 396 Batch 200/372] loss 3.83, ppl 46.02, throughput 158.58 samples/s, lr 0.33
[Epoch 396] throughput 11096.14 samples/s
[Epoch 396] time cost 229.16s, valid loss 4.24, valid ppl 69.66
[Epoch 397 Batch 200/372] loss 3.82, ppl 45.80, throughput 160.26 samples/s, lr 0.29
[Epoch 397] throughput 10840.74 samples/s
[Epoch 397] time cost 233.53s, valid loss 4.24, valid ppl 69.66
[Epoch 398 Batch 200/372] loss 3.82, ppl 45.78, throughput 167.39 samples/s, lr 0.27
[Epoch 398] throughput 11065.40 samples/s
[Epoch 398] time cost 229.98s, valid loss 4.24, valid ppl 69.69
[Epoch 399 Batch 200/372] loss 3.82, ppl 45.81, throughput 163.56 samples/s, lr 0.27
[Epoch 399] throughput 10967.01 samples/s
[Epoch 399] time cost 231.69s, valid loss 4.24, valid ppl 69.70
[Epoch 400 Batch 200/372] loss 3.83, ppl 45.88, throughput 163.64 samples/s, lr 0.32
[Epoch 400] throughput 11142.16 samples/s
[Epoch 400] time cost 228.08s, valid loss 4.24, valid ppl 69.69
[Epoch 401 Batch 200/372] loss 3.82, ppl 45.58, throughput 156.75 samples/s, lr 0.30
[Epoch 401] throughput 10870.31 samples/s
[Epoch 401] time cost 233.35s, valid loss 4.24, valid ppl 69.68
[Epoch 402 Batch 200/372] loss 3.82, ppl 45.55, throughput 156.13 samples/s, lr 0.31
[Epoch 402] throughput 10793.27 samples/s
[Epoch 402] time cost 234.25s, valid loss 4.24, valid ppl 69.68
[Epoch 403 Batch 200/372] loss 3.82, ppl 45.81, throughput 152.79 samples/s, lr 0.33
[Epoch 403] throughput 10635.67 samples/s
[Epoch 403] time cost 237.56s, valid loss 4.24, valid ppl 69.66
[Epoch 404 Batch 200/372] loss 3.82, ppl 45.82, throughput 157.62 samples/s, lr 0.16
[Epoch 404] throughput 10764.61 samples/s
[Epoch 404] time cost 234.76s, valid loss 4.24, valid ppl 69.68
[Epoch 405 Batch 200/372] loss 3.83, ppl 45.97, throughput 151.43 samples/s, lr 0.27
[Epoch 405] throughput 10587.95 samples/s
[Epoch 405] time cost 238.14s, valid loss 4.24, valid ppl 69.64
test loss 4.20, test ppl 66.72
[Epoch 406 Batch 200/372] loss 3.82, ppl 45.59, throughput 157.22 samples/s, lr 0.31
[Epoch 406] throughput 10764.38 samples/s
[Epoch 406] time cost 234.81s, valid loss 4.24, valid ppl 69.68
[Epoch 407 Batch 200/372] loss 3.82, ppl 45.73, throughput 162.42 samples/s, lr 0.30
[Epoch 407] throughput 11001.55 samples/s
[Epoch 407] time cost 230.80s, valid loss 4.24, valid ppl 69.69
[Epoch 408 Batch 200/372] loss 3.83, ppl 45.95, throughput 157.40 samples/s, lr 0.28
[Epoch 408] throughput 10918.07 samples/s
[Epoch 408] time cost 232.24s, valid loss 4.24, valid ppl 69.63
test loss 4.20, test ppl 66.70
[Epoch 409 Batch 200/372] loss 3.83, ppl 45.84, throughput 162.02 samples/s, lr 0.28
[Epoch 409] throughput 11141.75 samples/s
[Epoch 409] time cost 228.75s, valid loss 4.24, valid ppl 69.62
test loss 4.20, test ppl 66.68
[Epoch 410 Batch 200/372] loss 3.82, ppl 45.47, throughput 164.04 samples/s, lr 0.25
[Epoch 410] throughput 11041.25 samples/s
[Epoch 410] time cost 230.03s, valid loss 4.24, valid ppl 69.65
[Epoch 411 Batch 200/372] loss 3.82, ppl 45.75, throughput 155.82 samples/s, lr 0.29
[Epoch 411] throughput 10823.97 samples/s
[Epoch 411] time cost 234.31s, valid loss 4.24, valid ppl 69.66
[Epoch 412 Batch 200/372] loss 3.82, ppl 45.48, throughput 151.95 samples/s, lr 0.15
[Epoch 412] throughput 10656.82 samples/s
[Epoch 412] time cost 236.63s, valid loss 4.24, valid ppl 69.62
[Epoch 413 Batch 200/372] loss 3.82, ppl 45.54, throughput 159.09 samples/s, lr 0.28
[Epoch 413] throughput 10881.40 samples/s
[Epoch 413] time cost 232.58s, valid loss 4.24, valid ppl 69.64
[Epoch 414 Batch 200/372] loss 3.82, ppl 45.60, throughput 156.73 samples/s, lr 0.30
[Epoch 414] throughput 10679.70 samples/s
[Epoch 414] time cost 236.49s, valid loss 4.24, valid ppl 69.63
[Epoch 415 Batch 200/372] loss 3.82, ppl 45.70, throughput 155.35 samples/s, lr 0.27
[Epoch 415] throughput 10704.31 samples/s
[Epoch 415] time cost 236.10s, valid loss 4.24, valid ppl 69.62
test loss 4.20, test ppl 66.70
[Epoch 416 Batch 200/372] loss 3.82, ppl 45.82, throughput 158.12 samples/s, lr 0.29
[Epoch 416] throughput 10873.87 samples/s
[Epoch 416] time cost 232.94s, valid loss 4.24, valid ppl 69.65
[Epoch 417 Batch 200/372] loss 3.83, ppl 45.87, throughput 161.81 samples/s, lr 0.31
[Epoch 417] throughput 10846.46 samples/s
[Epoch 417] time cost 233.48s, valid loss 4.24, valid ppl 69.66
[Epoch 418 Batch 200/372] loss 3.82, ppl 45.72, throughput 161.12 samples/s, lr 0.33
[Epoch 418] throughput 10788.06 samples/s
[Epoch 418] time cost 234.44s, valid loss 4.24, valid ppl 69.67
[Epoch 419 Batch 200/372] loss 3.82, ppl 45.64, throughput 161.85 samples/s, lr 0.31
[Epoch 419] throughput 11076.88 samples/s
[Epoch 419] time cost 229.46s, valid loss 4.24, valid ppl 69.67
[Epoch 420 Batch 200/372] loss 3.82, ppl 45.49, throughput 161.22 samples/s, lr 0.29
[Epoch 420] throughput 10965.39 samples/s
[Epoch 420] time cost 231.34s, valid loss 4.24, valid ppl 69.65
[Epoch 421 Batch 200/372] loss 3.82, ppl 45.41, throughput 156.50 samples/s, lr 0.26
[Epoch 421] throughput 10626.24 samples/s
[Epoch 421] time cost 237.84s, valid loss 4.24, valid ppl 69.66
[Epoch 422 Batch 200/372] loss 3.82, ppl 45.81, throughput 164.67 samples/s, lr 0.28
[Epoch 422] throughput 10964.21 samples/s
[Epoch 422] time cost 231.74s, valid loss 4.24, valid ppl 69.65
[Epoch 423 Batch 200/372] loss 3.82, ppl 45.49, throughput 160.46 samples/s, lr 0.31
[Epoch 423] throughput 10924.48 samples/s
[Epoch 423] time cost 232.03s, valid loss 4.24, valid ppl 69.64
[Epoch 424 Batch 200/372] loss 3.82, ppl 45.73, throughput 162.38 samples/s, lr 0.29
[Epoch 424] throughput 11029.93 samples/s
[Epoch 424] time cost 230.40s, valid loss 4.24, valid ppl 69.66
[Epoch 425 Batch 200/372] loss 3.82, ppl 45.43, throughput 166.86 samples/s, lr 0.29
[Epoch 425] throughput 11224.18 samples/s
[Epoch 425] time cost 227.00s, valid loss 4.24, valid ppl 69.64
[Epoch 426 Batch 200/372] loss 3.81, ppl 45.37, throughput 163.99 samples/s, lr 0.31
[Epoch 426] throughput 11481.59 samples/s
[Epoch 426] time cost 222.88s, valid loss 4.24, valid ppl 69.64
[Epoch 427 Batch 200/372] loss 3.82, ppl 45.61, throughput 160.06 samples/s, lr 0.30
[Epoch 427] throughput 11048.62 samples/s
[Epoch 427] time cost 230.25s, valid loss 4.24, valid ppl 69.65
[Epoch 428 Batch 200/372] loss 3.82, ppl 45.45, throughput 169.31 samples/s, lr 0.31
[Epoch 428] throughput 11619.78 samples/s
[Epoch 428] time cost 220.75s, valid loss 4.24, valid ppl 69.64
[Epoch 429 Batch 200/372] loss 3.81, ppl 45.30, throughput 162.20 samples/s, lr 0.30
[Epoch 429] throughput 11131.25 samples/s
[Epoch 429] time cost 228.52s, valid loss 4.24, valid ppl 69.63
[Epoch 430 Batch 200/372] loss 3.81, ppl 45.33, throughput 165.25 samples/s, lr 0.16
[Epoch 430] throughput 11100.13 samples/s
[Epoch 430] time cost 229.12s, valid loss 4.24, valid ppl 69.64
[Epoch 431 Batch 200/372] loss 3.82, ppl 45.47, throughput 162.25 samples/s, lr 0.30
[Epoch 431] throughput 11092.84 samples/s
[Epoch 431] time cost 229.18s, valid loss 4.24, valid ppl 69.67
[Epoch 432 Batch 200/372] loss 3.82, ppl 45.50, throughput 153.81 samples/s, lr 0.27
[Epoch 432] throughput 10503.23 samples/s
[Epoch 432] time cost 240.24s, valid loss 4.24, valid ppl 69.66
[Epoch 433 Batch 200/372] loss 3.81, ppl 45.37, throughput 168.26 samples/s, lr 0.28
[Epoch 433] throughput 11071.45 samples/s
[Epoch 433] time cost 229.71s, valid loss 4.24, valid ppl 69.65
[Epoch 434 Batch 200/372] loss 3.81, ppl 45.35, throughput 158.55 samples/s, lr 0.29
[Epoch 434] throughput 10976.47 samples/s
[Epoch 434] time cost 231.43s, valid loss 4.24, valid ppl 69.63
[Epoch 435 Batch 200/372] loss 3.82, ppl 45.62, throughput 159.16 samples/s, lr 0.28
[Epoch 435] throughput 11018.28 samples/s
[Epoch 435] time cost 230.71s, valid loss 4.24, valid ppl 69.61
test loss 4.20, test ppl 66.70
[Epoch 436 Batch 200/372] loss 3.82, ppl 45.50, throughput 160.95 samples/s, lr 0.33
[Epoch 436] throughput 11037.11 samples/s
[Epoch 436] time cost 230.09s, valid loss 4.24, valid ppl 69.61
[Epoch 437 Batch 200/372] loss 3.81, ppl 45.22, throughput 158.87 samples/s, lr 0.28
[Epoch 437] throughput 10746.66 samples/s
[Epoch 437] time cost 235.27s, valid loss 4.24, valid ppl 69.64
[Epoch 438 Batch 200/372] loss 3.82, ppl 45.44, throughput 164.03 samples/s, lr 0.32
[Epoch 438] throughput 11096.78 samples/s
[Epoch 438] time cost 229.05s, valid loss 4.24, valid ppl 69.65
[Epoch 439 Batch 200/372] loss 3.81, ppl 45.24, throughput 160.18 samples/s, lr 0.30
[Epoch 439] throughput 10933.03 samples/s
[Epoch 439] time cost 232.11s, valid loss 4.24, valid ppl 69.65
[Epoch 440 Batch 200/372] loss 3.81, ppl 45.34, throughput 159.50 samples/s, lr 0.30
[Epoch 440] throughput 11115.65 samples/s
[Epoch 440] time cost 228.79s, valid loss 4.24, valid ppl 69.64
[Epoch 441 Batch 200/372] loss 3.82, ppl 45.41, throughput 170.29 samples/s, lr 0.27
[Epoch 441] throughput 11495.62 samples/s
[Epoch 441] time cost 222.92s, valid loss 4.24, valid ppl 69.63
[Epoch 442 Batch 200/372] loss 3.81, ppl 45.09, throughput 155.94 samples/s, lr 0.28
[Epoch 442] throughput 10528.72 samples/s
[Epoch 442] time cost 239.45s, valid loss 4.24, valid ppl 69.64
[Epoch 443 Batch 200/372] loss 3.82, ppl 45.44, throughput 157.88 samples/s, lr 0.33
[Epoch 443] throughput 10751.32 samples/s
[Epoch 443] time cost 235.19s, valid loss 4.24, valid ppl 69.62
[Epoch 444 Batch 200/372] loss 3.82, ppl 45.44, throughput 150.09 samples/s, lr 0.29
[Epoch 444] throughput 10471.02 samples/s
[Epoch 444] time cost 240.82s, valid loss 4.24, valid ppl 69.68
[Epoch 445 Batch 200/372] loss 3.81, ppl 45.11, throughput 157.57 samples/s, lr 0.35
[Epoch 445] throughput 10603.62 samples/s
[Epoch 445] time cost 237.72s, valid loss 4.24, valid ppl 69.69
[Epoch 446 Batch 200/372] loss 3.82, ppl 45.39, throughput 153.52 samples/s, lr 0.33
[Epoch 446] throughput 10340.95 samples/s
[Epoch 446] time cost 242.90s, valid loss 4.24, valid ppl 69.65
[Epoch 447 Batch 200/372] loss 3.81, ppl 45.11, throughput 163.01 samples/s, lr 0.31
[Epoch 447] throughput 10791.06 samples/s
[Epoch 447] time cost 234.56s, valid loss 4.24, valid ppl 69.66
[Epoch 448 Batch 200/372] loss 3.81, ppl 45.28, throughput 164.87 samples/s, lr 0.28
[Epoch 448] throughput 11136.97 samples/s
[Epoch 448] time cost 228.44s, valid loss 4.24, valid ppl 69.67
[Epoch 449 Batch 200/372] loss 3.82, ppl 45.49, throughput 156.45 samples/s, lr 0.30
[Epoch 449] throughput 10678.21 samples/s
[Epoch 449] time cost 236.67s, valid loss 4.24, valid ppl 69.68
[Epoch 450 Batch 200/372] loss 3.81, ppl 45.27, throughput 155.40 samples/s, lr 0.29
[Epoch 450] throughput 10907.21 samples/s
[Epoch 450] time cost 232.44s, valid loss 4.24, valid ppl 69.67
[Epoch 451 Batch 200/372] loss 3.81, ppl 45.25, throughput 159.19 samples/s, lr 0.33
[Epoch 451] throughput 10950.65 samples/s
[Epoch 451] time cost 231.54s, valid loss 4.24, valid ppl 69.66
[Epoch 452 Batch 200/372] loss 3.81, ppl 45.18, throughput 156.75 samples/s, lr 0.30
[Epoch 452] throughput 10609.48 samples/s
[Epoch 452] time cost 237.82s, valid loss 4.24, valid ppl 69.63
[Epoch 453 Batch 200/372] loss 3.81, ppl 45.34, throughput 154.51 samples/s, lr 0.30
[Epoch 453] throughput 10500.97 samples/s
[Epoch 453] time cost 239.69s, valid loss 4.24, valid ppl 69.65
[Epoch 454 Batch 200/372] loss 3.81, ppl 45.31, throughput 161.33 samples/s, lr 0.30
[Epoch 454] throughput 11035.46 samples/s
[Epoch 454] time cost 229.99s, valid loss 4.24, valid ppl 69.68
[Epoch 455 Batch 200/372] loss 3.82, ppl 45.45, throughput 169.73 samples/s, lr 0.32
[Epoch 455] throughput 11251.56 samples/s
[Epoch 455] time cost 226.66s, valid loss 4.24, valid ppl 69.65
[Epoch 456 Batch 200/372] loss 3.81, ppl 45.34, throughput 154.30 samples/s, lr 0.13
[Epoch 456] throughput 10668.31 samples/s
[Epoch 456] time cost 236.67s, valid loss 4.24, valid ppl 69.66
[Epoch 457 Batch 200/372] loss 3.81, ppl 45.37, throughput 154.05 samples/s, lr 0.30
[Epoch 457] throughput 10641.55 samples/s
[Epoch 457] time cost 237.06s, valid loss 4.24, valid ppl 69.66
[Epoch 458 Batch 200/372] loss 3.82, ppl 45.38, throughput 160.08 samples/s, lr 0.31
[Epoch 458] throughput 10867.12 samples/s
[Epoch 458] time cost 233.08s, valid loss 4.24, valid ppl 69.66
[Epoch 459 Batch 200/372] loss 3.81, ppl 45.30, throughput 158.48 samples/s, lr 0.26
[Epoch 459] throughput 10897.19 samples/s
[Epoch 459] time cost 232.52s, valid loss 4.24, valid ppl 69.65
[Epoch 460 Batch 200/372] loss 3.81, ppl 45.19, throughput 159.89 samples/s, lr 0.28
[Epoch 460] throughput 10734.45 samples/s
[Epoch 460] time cost 235.49s, valid loss 4.24, valid ppl 69.65
[Epoch 461 Batch 200/372] loss 3.81, ppl 45.27, throughput 161.88 samples/s, lr 0.31
[Epoch 461] throughput 10947.36 samples/s
[Epoch 461] time cost 231.65s, valid loss 4.24, valid ppl 69.64
[Epoch 462 Batch 200/372] loss 3.81, ppl 45.11, throughput 160.91 samples/s, lr 0.17
[Epoch 462] throughput 10994.02 samples/s
[Epoch 462] time cost 230.80s, valid loss 4.24, valid ppl 69.62
[Epoch 463 Batch 200/372] loss 3.82, ppl 45.48, throughput 163.73 samples/s, lr 0.27
[Epoch 463] throughput 11409.96 samples/s
[Epoch 463] time cost 224.17s, valid loss 4.24, valid ppl 69.65
[Epoch 464 Batch 200/372] loss 3.81, ppl 45.14, throughput 164.46 samples/s, lr 0.31
[Epoch 464] throughput 10782.75 samples/s
[Epoch 464] time cost 234.57s, valid loss 4.24, valid ppl 69.70
[Epoch 465 Batch 200/372] loss 3.82, ppl 45.43, throughput 154.75 samples/s, lr 0.30
[Epoch 465] throughput 10660.48 samples/s
[Epoch 465] time cost 236.61s, valid loss 4.24, valid ppl 69.70
Learning rate after interval update 0.030000
[Epoch 466 Batch 200/372] loss 3.82, ppl 45.72, throughput 162.74 samples/s, lr 0.01
[Epoch 466] throughput 10879.05 samples/s
[Epoch 466] time cost 233.31s, valid loss 4.24, valid ppl 69.65
[Epoch 467 Batch 200/372] loss 3.82, ppl 45.80, throughput 158.57 samples/s, lr 0.03
[Epoch 467] throughput 10740.68 samples/s
[Epoch 467] time cost 235.22s, valid loss 4.24, valid ppl 69.63
[Epoch 468 Batch 200/372] loss 3.82, ppl 45.46, throughput 162.03 samples/s, lr 0.03
[Epoch 468] throughput 10861.71 samples/s
[Epoch 468] time cost 233.59s, valid loss 4.24, valid ppl 69.61
test loss 4.20, test ppl 66.65
[Epoch 469 Batch 200/372] loss 3.82, ppl 45.81, throughput 152.90 samples/s, lr 0.03
[Epoch 469] throughput 10392.09 samples/s
[Epoch 469] time cost 241.55s, valid loss 4.24, valid ppl 69.59
test loss 4.20, test ppl 66.64
[Epoch 470 Batch 200/372] loss 3.82, ppl 45.81, throughput 164.66 samples/s, lr 0.03
[Epoch 470] throughput 11156.89 samples/s
[Epoch 470] time cost 227.74s, valid loss 4.24, valid ppl 69.58
test loss 4.20, test ppl 66.63
[Epoch 471 Batch 200/372] loss 3.82, ppl 45.50, throughput 163.27 samples/s, lr 0.03
[Epoch 471] throughput 11302.34 samples/s
[Epoch 471] time cost 225.67s, valid loss 4.24, valid ppl 69.57
test loss 4.20, test ppl 66.62
[Epoch 472 Batch 200/372] loss 3.82, ppl 45.72, throughput 158.70 samples/s, lr 0.03
[Epoch 472] throughput 10936.82 samples/s
[Epoch 472] time cost 231.76s, valid loss 4.24, valid ppl 69.56
test loss 4.20, test ppl 66.61
[Epoch 473 Batch 200/372] loss 3.82, ppl 45.66, throughput 154.66 samples/s, lr 0.03
[Epoch 473] throughput 10470.60 samples/s
[Epoch 473] time cost 240.89s, valid loss 4.24, valid ppl 69.56
test loss 4.20, test ppl 66.61
[Epoch 474 Batch 200/372] loss 3.82, ppl 45.55, throughput 162.75 samples/s, lr 0.03
[Epoch 474] throughput 10962.70 samples/s
[Epoch 474] time cost 231.38s, valid loss 4.24, valid ppl 69.56
[Epoch 475 Batch 200/372] loss 3.82, ppl 45.63, throughput 159.81 samples/s, lr 0.03
[Epoch 475] throughput 10905.04 samples/s
[Epoch 475] time cost 232.60s, valid loss 4.24, valid ppl 69.57
[Epoch 476 Batch 200/372] loss 3.83, ppl 45.96, throughput 164.38 samples/s, lr 0.03
[Epoch 476] throughput 10953.32 samples/s
[Epoch 476] time cost 231.49s, valid loss 4.24, valid ppl 69.57
[Epoch 477 Batch 200/372] loss 3.82, ppl 45.53, throughput 159.85 samples/s, lr 0.03
[Epoch 477] throughput 10998.40 samples/s
[Epoch 477] time cost 230.50s, valid loss 4.24, valid ppl 69.58
[Epoch 478 Batch 200/372] loss 3.82, ppl 45.42, throughput 159.65 samples/s, lr 0.03
[Epoch 478] throughput 11077.51 samples/s
[Epoch 478] time cost 229.67s, valid loss 4.24, valid ppl 69.58
[Epoch 479 Batch 200/372] loss 3.81, ppl 45.38, throughput 158.23 samples/s, lr 0.01
[Epoch 479] throughput 11013.09 samples/s
[Epoch 479] time cost 230.83s, valid loss 4.24, valid ppl 69.58
[Epoch 480 Batch 200/372] loss 3.82, ppl 45.43, throughput 160.94 samples/s, lr 0.03
[Epoch 480] throughput 11021.22 samples/s
[Epoch 480] time cost 230.40s, valid loss 4.24, valid ppl 69.57
[Epoch 481 Batch 200/372] loss 3.82, ppl 45.40, throughput 157.68 samples/s, lr 0.03
[Epoch 481] throughput 10746.13 samples/s
[Epoch 481] time cost 235.29s, valid loss 4.24, valid ppl 69.57
[Epoch 482 Batch 200/372] loss 3.82, ppl 45.39, throughput 163.36 samples/s, lr 0.03
[Epoch 482] throughput 11068.46 samples/s
[Epoch 482] time cost 229.73s, valid loss 4.24, valid ppl 69.57
[Epoch 483 Batch 200/372] loss 3.81, ppl 45.27, throughput 170.53 samples/s, lr 0.03
[Epoch 483] throughput 11316.23 samples/s
[Epoch 483] time cost 225.53s, valid loss 4.24, valid ppl 69.57
[Epoch 484 Batch 200/372] loss 3.82, ppl 45.40, throughput 164.23 samples/s, lr 0.03
[Epoch 484] throughput 11210.18 samples/s
[Epoch 484] time cost 226.82s, valid loss 4.24, valid ppl 69.57
[Epoch 485 Batch 200/372] loss 3.82, ppl 45.48, throughput 165.22 samples/s, lr 0.03
[Epoch 485] throughput 11214.88 samples/s
[Epoch 485] time cost 227.27s, valid loss 4.24, valid ppl 69.57
[Epoch 486 Batch 200/372] loss 3.82, ppl 45.45, throughput 160.19 samples/s, lr 0.03
[Epoch 486] throughput 10943.95 samples/s
[Epoch 486] time cost 231.68s, valid loss 4.24, valid ppl 69.57
[Epoch 487 Batch 200/372] loss 3.82, ppl 45.41, throughput 174.08 samples/s, lr 0.03
[Epoch 487] throughput 11158.38 samples/s
[Epoch 487] time cost 227.76s, valid loss 4.24, valid ppl 69.56
test loss 4.20, test ppl 66.61
[Epoch 488 Batch 200/372] loss 3.82, ppl 45.40, throughput 155.48 samples/s, lr 0.03
[Epoch 488] throughput 10884.80 samples/s
[Epoch 488] time cost 233.23s, valid loss 4.24, valid ppl 69.56
test loss 4.20, test ppl 66.61
[Epoch 489 Batch 200/372] loss 3.82, ppl 45.51, throughput 157.48 samples/s, lr 0.03
[Epoch 489] throughput 10754.87 samples/s
[Epoch 489] time cost 235.44s, valid loss 4.24, valid ppl 69.56
[Epoch 490 Batch 200/372] loss 3.82, ppl 45.40, throughput 157.83 samples/s, lr 0.03
[Epoch 490] throughput 10812.83 samples/s
[Epoch 490] time cost 234.43s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.61
[Epoch 491 Batch 200/372] loss 3.81, ppl 45.32, throughput 161.83 samples/s, lr 0.03
[Epoch 491] throughput 10973.80 samples/s
[Epoch 491] time cost 231.49s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.61
[Epoch 492 Batch 200/372] loss 3.81, ppl 45.27, throughput 158.36 samples/s, lr 0.03
[Epoch 492] throughput 11226.89 samples/s
[Epoch 492] time cost 227.31s, valid loss 4.24, valid ppl 69.55
[Epoch 493 Batch 200/372] loss 3.81, ppl 45.31, throughput 164.87 samples/s, lr 0.03
[Epoch 493] throughput 11115.40 samples/s
[Epoch 493] time cost 228.97s, valid loss 4.24, valid ppl 69.56
[Epoch 494 Batch 200/372] loss 3.82, ppl 45.63, throughput 163.70 samples/s, lr 0.03
[Epoch 494] throughput 11002.69 samples/s
[Epoch 494] time cost 230.92s, valid loss 4.24, valid ppl 69.55
[Epoch 495 Batch 200/372] loss 3.81, ppl 45.33, throughput 162.00 samples/s, lr 0.03
[Epoch 495] throughput 10797.39 samples/s
[Epoch 495] time cost 234.75s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.61
[Epoch 496 Batch 200/372] loss 3.82, ppl 45.43, throughput 166.17 samples/s, lr 0.03
[Epoch 496] throughput 11087.68 samples/s
[Epoch 496] time cost 229.09s, valid loss 4.24, valid ppl 69.56
[Epoch 497 Batch 200/372] loss 3.81, ppl 45.27, throughput 162.09 samples/s, lr 0.03
[Epoch 497] throughput 11140.76 samples/s
[Epoch 497] time cost 228.53s, valid loss 4.24, valid ppl 69.55
[Epoch 498 Batch 200/372] loss 3.82, ppl 45.43, throughput 161.23 samples/s, lr 0.03
[Epoch 498] throughput 10898.62 samples/s
[Epoch 498] time cost 232.64s, valid loss 4.24, valid ppl 69.56
[Epoch 499 Batch 200/372] loss 3.82, ppl 45.41, throughput 162.65 samples/s, lr 0.03
[Epoch 499] throughput 10991.36 samples/s
[Epoch 499] time cost 230.68s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.61
[Epoch 500 Batch 200/372] loss 3.82, ppl 45.63, throughput 164.59 samples/s, lr 0.03
[Epoch 500] throughput 11255.58 samples/s
[Epoch 500] time cost 226.57s, valid loss 4.24, valid ppl 69.56
[Epoch 501 Batch 200/372] loss 3.82, ppl 45.38, throughput 157.35 samples/s, lr 0.03
[Epoch 501] throughput 10704.18 samples/s
[Epoch 501] time cost 235.79s, valid loss 4.24, valid ppl 69.55
[Epoch 502 Batch 200/372] loss 3.82, ppl 45.40, throughput 151.26 samples/s, lr 0.03
[Epoch 502] throughput 10384.18 samples/s
[Epoch 502] time cost 241.92s, valid loss 4.24, valid ppl 69.55
[Epoch 503 Batch 200/372] loss 3.81, ppl 45.09, throughput 155.81 samples/s, lr 0.02
[Epoch 503] throughput 10500.22 samples/s
[Epoch 503] time cost 239.91s, valid loss 4.24, valid ppl 69.56
[Epoch 504 Batch 200/372] loss 3.81, ppl 45.28, throughput 156.89 samples/s, lr 0.03
[Epoch 504] throughput 10737.01 samples/s
[Epoch 504] time cost 235.36s, valid loss 4.24, valid ppl 69.56
[Epoch 505 Batch 200/372] loss 3.82, ppl 45.45, throughput 159.87 samples/s, lr 0.03
[Epoch 505] throughput 10687.31 samples/s
[Epoch 505] time cost 236.67s, valid loss 4.24, valid ppl 69.55
[Epoch 506 Batch 200/372] loss 3.82, ppl 45.47, throughput 153.13 samples/s, lr 0.03
[Epoch 506] throughput 10676.12 samples/s
[Epoch 506] time cost 236.70s, valid loss 4.24, valid ppl 69.55
[Epoch 507 Batch 200/372] loss 3.82, ppl 45.50, throughput 153.75 samples/s, lr 0.03
[Epoch 507] throughput 10621.59 samples/s
[Epoch 507] time cost 237.74s, valid loss 4.24, valid ppl 69.55
[Epoch 508 Batch 200/372] loss 3.81, ppl 45.27, throughput 155.46 samples/s, lr 0.03
[Epoch 508] throughput 10891.36 samples/s
[Epoch 508] time cost 232.75s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.62
[Epoch 509 Batch 200/372] loss 3.82, ppl 45.47, throughput 171.03 samples/s, lr 0.03
[Epoch 509] throughput 11233.49 samples/s
[Epoch 509] time cost 227.19s, valid loss 4.24, valid ppl 69.55
test loss 4.20, test ppl 66.61
[Epoch 510 Batch 200/372] loss 3.82, ppl 45.43, throughput 157.78 samples/s, lr 0.03
[Epoch 510] throughput 10955.54 samples/s
[Epoch 510] time cost 231.52s, valid loss 4.24, valid ppl 69.55
[Epoch 511 Batch 200/372] loss 3.82, ppl 45.46, throughput 171.41 samples/s, lr 0.03
[Epoch 511] throughput 11301.71 samples/s
[Epoch 511] time cost 225.92s, valid loss 4.24, valid ppl 69.54
test loss 4.20, test ppl 66.61
[Epoch 512 Batch 200/372] loss 3.82, ppl 45.46, throughput 162.79 samples/s, lr 0.03
[Epoch 512] throughput 10927.48 samples/s
[Epoch 512] time cost 232.18s, valid loss 4.24, valid ppl 69.55
[Epoch 513 Batch 200/372] loss 3.82, ppl 45.51, throughput 160.88 samples/s, lr 0.03
[Epoch 513] throughput 10782.72 samples/s
[Epoch 513] time cost 234.26s, valid loss 4.24, valid ppl 69.55
[Epoch 514 Batch 200/372] loss 3.82, ppl 45.41, throughput 154.18 samples/s, lr 0.03
[Epoch 514] throughput 10692.78 samples/s
[Epoch 514] time cost 235.90s, valid loss 4.24, valid ppl 69.55
[Epoch 515 Batch 200/372] loss 3.81, ppl 45.27, throughput 159.91 samples/s, lr 0.03
[Epoch 515] throughput 10878.23 samples/s
[Epoch 515] time cost 232.77s, valid loss 4.24, valid ppl 69.54
[Epoch 516 Batch 200/372] loss 3.81, ppl 45.31, throughput 157.51 samples/s, lr 0.03
[Epoch 516] throughput 10793.79 samples/s
[Epoch 516] time cost 234.89s, valid loss 4.24, valid ppl 69.54
test loss 4.20, test ppl 66.61
[Epoch 517 Batch 200/372] loss 3.81, ppl 45.30, throughput 155.02 samples/s, lr 0.03
[Epoch 517] throughput 10484.68 samples/s
[Epoch 517] time cost 240.25s, valid loss 4.24, valid ppl 69.55
[Epoch 518 Batch 200/372] loss 3.82, ppl 45.53, throughput 155.63 samples/s, lr 0.03
[Epoch 518] throughput 10372.29 samples/s
[Epoch 518] time cost 242.39s, valid loss 4.24, valid ppl 69.55
[Epoch 519 Batch 200/372] loss 3.82, ppl 45.49, throughput 155.22 samples/s, lr 0.03
[Epoch 519] throughput 10614.27 samples/s
[Epoch 519] time cost 237.67s, valid loss 4.24, valid ppl 69.55
[Epoch 520 Batch 200/372] loss 3.81, ppl 45.33, throughput 163.52 samples/s, lr 0.03
[Epoch 520] throughput 11098.96 samples/s
[Epoch 520] time cost 228.95s, valid loss 4.24, valid ppl 69.55
[Epoch 521 Batch 200/372] loss 3.81, ppl 45.31, throughput 159.86 samples/s, lr 0.03
[Epoch 521] throughput 10629.18 samples/s
[Epoch 521] time cost 237.71s, valid loss 4.24, valid ppl 69.55
[Epoch 522 Batch 200/372] loss 3.81, ppl 45.18, throughput 160.51 samples/s, lr 0.03
[Epoch 522] throughput 10836.31 samples/s
[Epoch 522] time cost 233.57s, valid loss 4.24, valid ppl 69.54
[Epoch 523 Batch 200/372] loss 3.81, ppl 45.16, throughput 164.86 samples/s, lr 0.03
[Epoch 523] throughput 11098.76 samples/s
[Epoch 523] time cost 229.17s, valid loss 4.24, valid ppl 69.54
[Epoch 524 Batch 200/372] loss 3.82, ppl 45.38, throughput 157.73 samples/s, lr 0.03
[Epoch 524] throughput 10824.00 samples/s
[Epoch 524] time cost 233.72s, valid loss 4.24, valid ppl 69.54
test loss 4.20, test ppl 66.60
[Epoch 525 Batch 200/372] loss 3.81, ppl 45.27, throughput 154.83 samples/s, lr 0.03
[Epoch 525] throughput 10560.86 samples/s
[Epoch 525] time cost 239.02s, valid loss 4.24, valid ppl 69.54
[Epoch 526 Batch 200/372] loss 3.81, ppl 45.35, throughput 160.44 samples/s, lr 0.03
[Epoch 526] throughput 10701.75 samples/s
[Epoch 526] time cost 235.63s, valid loss 4.24, valid ppl 69.54
[Epoch 527 Batch 200/372] loss 3.81, ppl 45.37, throughput 158.79 samples/s, lr 0.03
[Epoch 527] throughput 10880.61 samples/s
[Epoch 527] time cost 232.89s, valid loss 4.24, valid ppl 69.54
[Epoch 528 Batch 200/372] loss 3.81, ppl 45.37, throughput 165.26 samples/s, lr 0.03
[Epoch 528] throughput 11119.06 samples/s
[Epoch 528] time cost 228.74s, valid loss 4.24, valid ppl 69.55
[Epoch 529 Batch 200/372] loss 3.82, ppl 45.54, throughput 157.72 samples/s, lr 0.03
[Epoch 529] throughput 10592.37 samples/s
[Epoch 529] time cost 237.93s, valid loss 4.24, valid ppl 69.54
[Epoch 530 Batch 200/372] loss 3.81, ppl 45.30, throughput 155.69 samples/s, lr 0.03
[Epoch 530] throughput 10881.87 samples/s
[Epoch 530] time cost 232.68s, valid loss 4.24, valid ppl 69.54
[Epoch 531 Batch 200/372] loss 3.82, ppl 45.39, throughput 164.29 samples/s, lr 0.03
[Epoch 531] throughput 11303.56 samples/s
[Epoch 531] time cost 225.45s, valid loss 4.24, valid ppl 69.54
[Epoch 532 Batch 200/372] loss 3.82, ppl 45.74, throughput 166.60 samples/s, lr 0.03
[Epoch 532] throughput 11545.24 samples/s
[Epoch 532] time cost 221.85s, valid loss 4.24, valid ppl 69.54
[Epoch 533 Batch 200/372] loss 3.82, ppl 45.51, throughput 163.41 samples/s, lr 0.03
[Epoch 533] throughput 11188.25 samples/s
[Epoch 533] time cost 227.92s, valid loss 4.24, valid ppl 69.54
[Epoch 534 Batch 200/372] loss 3.82, ppl 45.48, throughput 165.47 samples/s, lr 0.03
[Epoch 534] throughput 11411.45 samples/s
[Epoch 534] time cost 223.81s, valid loss 4.24, valid ppl 69.55
[Epoch 535 Batch 200/372] loss 3.82, ppl 45.46, throughput 169.97 samples/s, lr 0.03
[Epoch 535] throughput 11624.55 samples/s
[Epoch 535] time cost 220.69s, valid loss 4.24, valid ppl 69.56
[Epoch 536 Batch 200/372] loss 3.81, ppl 45.11, throughput 164.67 samples/s, lr 0.03
[Epoch 536] throughput 11248.39 samples/s
[Epoch 536] time cost 226.63s, valid loss 4.24, valid ppl 69.55
[Epoch 537 Batch 200/372] loss 3.82, ppl 45.46, throughput 161.37 samples/s, lr 0.03
[Epoch 537] throughput 10871.96 samples/s
[Epoch 537] time cost 233.06s, valid loss 4.24, valid ppl 69.55
[Epoch 538 Batch 200/372] loss 3.81, ppl 45.22, throughput 163.30 samples/s, lr 0.03
[Epoch 538] throughput 11045.10 samples/s
[Epoch 538] time cost 230.17s, valid loss 4.24, valid ppl 69.55
[Epoch 539 Batch 200/372] loss 3.81, ppl 45.34, throughput 163.98 samples/s, lr 0.03
[Epoch 539] throughput 11150.92 samples/s
[Epoch 539] time cost 228.52s, valid loss 4.24, valid ppl 69.55
[Epoch 540 Batch 200/372] loss 3.82, ppl 45.42, throughput 162.54 samples/s, lr 0.03
[Epoch 540] throughput 11080.77 samples/s
[Epoch 540] time cost 229.30s, valid loss 4.24, valid ppl 69.54
[Epoch 541 Batch 200/372] loss 3.81, ppl 45.19, throughput 159.09 samples/s, lr 0.03
[Epoch 541] throughput 10745.29 samples/s
[Epoch 541] time cost 235.37s, valid loss 4.24, valid ppl 69.55
[Epoch 542 Batch 200/372] loss 3.82, ppl 45.51, throughput 155.68 samples/s, lr 0.03
[Epoch 542] throughput 10942.45 samples/s
[Epoch 542] time cost 231.44s, valid loss 4.24, valid ppl 69.55
[Epoch 543 Batch 200/372] loss 3.82, ppl 45.54, throughput 163.63 samples/s, lr 0.03
[Epoch 543] throughput 10990.74 samples/s
[Epoch 543] time cost 231.37s, valid loss 4.24, valid ppl 69.54
[Epoch 544 Batch 200/372] loss 3.82, ppl 45.62, throughput 156.89 samples/s, lr 0.03
[Epoch 544] throughput 10808.76 samples/s
[Epoch 544] time cost 233.91s, valid loss 4.24, valid ppl 69.54
[Epoch 545 Batch 200/372] loss 3.82, ppl 45.38, throughput 165.60 samples/s, lr 0.03
[Epoch 545] throughput 11256.99 samples/s
[Epoch 545] time cost 226.53s, valid loss 4.24, valid ppl 69.54
[Epoch 546 Batch 200/372] loss 3.82, ppl 45.52, throughput 161.73 samples/s, lr 0.03
[Epoch 546] throughput 11181.17 samples/s
[Epoch 546] time cost 227.78s, valid loss 4.24, valid ppl 69.53
test loss 4.20, test ppl 66.59
[Epoch 547 Batch 200/372] loss 3.82, ppl 45.53, throughput 159.91 samples/s, lr 0.03
[Epoch 547] throughput 10894.63 samples/s
[Epoch 547] time cost 232.56s, valid loss 4.24, valid ppl 69.53
test loss 4.20, test ppl 66.59
[Epoch 548 Batch 200/372] loss 3.81, ppl 45.19, throughput 161.65 samples/s, lr 0.03
[Epoch 548] throughput 10846.98 samples/s
[Epoch 548] time cost 233.85s, valid loss 4.24, valid ppl 69.53
[Epoch 549 Batch 200/372] loss 3.81, ppl 45.23, throughput 155.16 samples/s, lr 0.03
[Epoch 549] throughput 10850.00 samples/s
[Epoch 549] time cost 233.43s, valid loss 4.24, valid ppl 69.54
[Epoch 550 Batch 200/372] loss 3.81, ppl 45.34, throughput 160.21 samples/s, lr 0.03
[Epoch 550] throughput 10995.04 samples/s
[Epoch 550] time cost 230.65s, valid loss 4.24, valid ppl 69.53
[Epoch 551 Batch 200/372] loss 3.81, ppl 45.29, throughput 162.71 samples/s, lr 0.03
[Epoch 551] throughput 11219.21 samples/s
[Epoch 551] time cost 227.32s, valid loss 4.24, valid ppl 69.54
[Epoch 552 Batch 200/372] loss 3.81, ppl 45.26, throughput 162.79 samples/s, lr 0.03
[Epoch 552] throughput 11161.30 samples/s
[Epoch 552] time cost 227.72s, valid loss 4.24, valid ppl 69.54
[Epoch 553 Batch 200/372] loss 3.81, ppl 45.32, throughput 162.63 samples/s, lr 0.03
[Epoch 553] throughput 10936.07 samples/s
[Epoch 553] time cost 231.81s, valid loss 4.24, valid ppl 69.54
[Epoch 554 Batch 200/372] loss 3.82, ppl 45.45, throughput 168.32 samples/s, lr 0.03
[Epoch 554] throughput 11166.26 samples/s
[Epoch 554] time cost 228.05s, valid loss 4.24, valid ppl 69.54
[Epoch 555 Batch 200/372] loss 3.82, ppl 45.49, throughput 156.73 samples/s, lr 0.03
[Epoch 555] throughput 10924.03 samples/s
[Epoch 555] time cost 232.10s, valid loss 4.24, valid ppl 69.54
[Epoch 556 Batch 200/372] loss 3.82, ppl 45.54, throughput 159.87 samples/s, lr 0.03
[Epoch 556] throughput 10985.03 samples/s
[Epoch 556] time cost 230.89s, valid loss 4.24, valid ppl 69.54
[Epoch 557 Batch 200/372] loss 3.81, ppl 45.30, throughput 164.99 samples/s, lr 0.03
[Epoch 557] throughput 11336.70 samples/s
[Epoch 557] time cost 225.64s, valid loss 4.24, valid ppl 69.54
[Epoch 558 Batch 200/372] loss 3.81, ppl 45.35, throughput 166.65 samples/s, lr 0.03
[Epoch 558] throughput 11167.28 samples/s
[Epoch 558] time cost 228.24s, valid loss 4.24, valid ppl 69.54
[Epoch 559 Batch 200/372] loss 3.82, ppl 45.47, throughput 160.73 samples/s, lr 0.03
[Epoch 559] throughput 10744.01 samples/s
[Epoch 559] time cost 235.22s, valid loss 4.24, valid ppl 69.54
[Epoch 560 Batch 200/372] loss 3.81, ppl 45.29, throughput 158.56 samples/s, lr 0.03
[Epoch 560] throughput 11096.23 samples/s
[Epoch 560] time cost 229.10s, valid loss 4.24, valid ppl 69.54
[Epoch 561 Batch 200/372] loss 3.81, ppl 45.13, throughput 161.21 samples/s, lr 0.03
[Epoch 561] throughput 10793.41 samples/s
[Epoch 561] time cost 234.23s, valid loss 4.24, valid ppl 69.54
[Epoch 562 Batch 200/372] loss 3.82, ppl 45.38, throughput 162.89 samples/s, lr 0.03
[Epoch 562] throughput 10967.25 samples/s
[Epoch 562] time cost 231.42s, valid loss 4.24, valid ppl 69.54
[Epoch 563 Batch 200/372] loss 3.81, ppl 45.29, throughput 170.53 samples/s, lr 0.03
[Epoch 563] throughput 11464.53 samples/s
[Epoch 563] time cost 222.91s, valid loss 4.24, valid ppl 69.54
[Epoch 564 Batch 200/372] loss 3.82, ppl 45.41, throughput 161.46 samples/s, lr 0.03
[Epoch 564] throughput 11117.82 samples/s
[Epoch 564] time cost 228.87s, valid loss 4.24, valid ppl 69.53
[Epoch 565 Batch 200/372] loss 3.82, ppl 45.49, throughput 160.98 samples/s, lr 0.03
[Epoch 565] throughput 10944.57 samples/s
[Epoch 565] time cost 232.01s, valid loss 4.24, valid ppl 69.53
[Epoch 566 Batch 200/372] loss 3.81, ppl 45.09, throughput 171.83 samples/s, lr 0.03
[Epoch 566] throughput 11499.38 samples/s
[Epoch 566] time cost 222.28s, valid loss 4.24, valid ppl 69.54
[Epoch 567 Batch 200/372] loss 3.82, ppl 45.39, throughput 164.03 samples/s, lr 0.03
[Epoch 567] throughput 11174.26 samples/s
[Epoch 567] time cost 227.52s, valid loss 4.24, valid ppl 69.54
[Epoch 568 Batch 200/372] loss 3.81, ppl 45.38, throughput 162.23 samples/s, lr 0.03
[Epoch 568] throughput 10949.79 samples/s
[Epoch 568] time cost 231.57s, valid loss 4.24, valid ppl 69.54
[Epoch 569 Batch 200/372] loss 3.82, ppl 45.38, throughput 154.88 samples/s, lr 0.03
[Epoch 569] throughput 10943.64 samples/s
[Epoch 569] time cost 232.03s, valid loss 4.24, valid ppl 69.54
[Epoch 570 Batch 200/372] loss 3.82, ppl 45.44, throughput 159.27 samples/s, lr 0.03
[Epoch 570] throughput 11061.42 samples/s
[Epoch 570] time cost 230.27s, valid loss 4.24, valid ppl 69.54
[Epoch 571 Batch 200/372] loss 3.81, ppl 45.35, throughput 162.23 samples/s, lr 0.03
[Epoch 571] throughput 11000.35 samples/s
[Epoch 571] time cost 230.91s, valid loss 4.24, valid ppl 69.54
[Epoch 572 Batch 200/372] loss 3.81, ppl 45.21, throughput 155.85 samples/s, lr 0.03
[Epoch 572] throughput 10895.50 samples/s
[Epoch 572] time cost 232.91s, valid loss 4.24, valid ppl 69.54
[Epoch 573 Batch 200/372] loss 3.82, ppl 45.54, throughput 158.61 samples/s, lr 0.03
[Epoch 573] throughput 10721.60 samples/s
[Epoch 573] time cost 235.57s, valid loss 4.24, valid ppl 69.55
[Epoch 574 Batch 200/372] loss 3.81, ppl 45.19, throughput 162.07 samples/s, lr 0.03
[Epoch 574] throughput 11153.61 samples/s
[Epoch 574] time cost 228.51s, valid loss 4.24, valid ppl 69.55
[Epoch 575 Batch 200/372] loss 3.81, ppl 45.30, throughput 160.16 samples/s, lr 0.03
[Epoch 575] throughput 11010.04 samples/s
[Epoch 575] time cost 230.18s, valid loss 4.24, valid ppl 69.54
[Epoch 576 Batch 200/372] loss 3.82, ppl 45.49, throughput 157.44 samples/s, lr 0.03
[Epoch 576] throughput 10778.58 samples/s
[Epoch 576] time cost 235.01s, valid loss 4.24, valid ppl 69.54
[Epoch 577 Batch 200/372] loss 3.82, ppl 45.41, throughput 165.34 samples/s, lr 0.03
[Epoch 577] throughput 11031.68 samples/s
[Epoch 577] time cost 229.95s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.003000
[Epoch 578 Batch 200/372] loss 3.81, ppl 45.35, throughput 160.83 samples/s, lr 0.00
[Epoch 578] throughput 10837.85 samples/s
[Epoch 578] time cost 234.18s, valid loss 4.24, valid ppl 69.54
[Epoch 579 Batch 200/372] loss 3.81, ppl 45.33, throughput 160.46 samples/s, lr 0.00
[Epoch 579] throughput 11036.40 samples/s
[Epoch 579] time cost 229.97s, valid loss 4.24, valid ppl 69.54
[Epoch 580 Batch 200/372] loss 3.81, ppl 45.31, throughput 163.88 samples/s, lr 0.00
[Epoch 580] throughput 10899.42 samples/s
[Epoch 580] time cost 232.48s, valid loss 4.24, valid ppl 69.54
[Epoch 581 Batch 200/372] loss 3.81, ppl 45.24, throughput 164.60 samples/s, lr 0.00
[Epoch 581] throughput 11218.54 samples/s
[Epoch 581] time cost 227.53s, valid loss 4.24, valid ppl 69.54
[Epoch 582 Batch 200/372] loss 3.81, ppl 45.32, throughput 164.24 samples/s, lr 0.00
[Epoch 582] throughput 11058.06 samples/s
[Epoch 582] time cost 229.54s, valid loss 4.24, valid ppl 69.54
[Epoch 583 Batch 200/372] loss 3.81, ppl 45.33, throughput 157.42 samples/s, lr 0.00
[Epoch 583] throughput 10751.91 samples/s
[Epoch 583] time cost 235.05s, valid loss 4.24, valid ppl 69.54
[Epoch 584 Batch 200/372] loss 3.81, ppl 45.18, throughput 164.12 samples/s, lr 0.00
[Epoch 584] throughput 11196.04 samples/s
[Epoch 584] time cost 227.30s, valid loss 4.24, valid ppl 69.54
[Epoch 585 Batch 200/372] loss 3.81, ppl 45.36, throughput 156.29 samples/s, lr 0.00
[Epoch 585] throughput 10839.57 samples/s
[Epoch 585] time cost 233.88s, valid loss 4.24, valid ppl 69.54
[Epoch 586 Batch 200/372] loss 3.82, ppl 45.38, throughput 161.87 samples/s, lr 0.00
[Epoch 586] throughput 10887.77 samples/s
[Epoch 586] time cost 232.43s, valid loss 4.24, valid ppl 69.54
[Epoch 587 Batch 200/372] loss 3.81, ppl 45.30, throughput 163.39 samples/s, lr 0.00
[Epoch 587] throughput 11245.22 samples/s
[Epoch 587] time cost 227.19s, valid loss 4.24, valid ppl 69.54
[Epoch 588 Batch 200/372] loss 3.81, ppl 45.27, throughput 159.42 samples/s, lr 0.00
[Epoch 588] throughput 10806.27 samples/s
[Epoch 588] time cost 234.15s, valid loss 4.24, valid ppl 69.54
[Epoch 589 Batch 200/372] loss 3.82, ppl 45.67, throughput 161.44 samples/s, lr 0.00
[Epoch 589] throughput 10992.51 samples/s
[Epoch 589] time cost 230.92s, valid loss 4.24, valid ppl 69.54
[Epoch 590 Batch 200/372] loss 3.81, ppl 45.34, throughput 159.20 samples/s, lr 0.00
[Epoch 590] throughput 10783.06 samples/s
[Epoch 590] time cost 234.73s, valid loss 4.24, valid ppl 69.54
[Epoch 591 Batch 200/372] loss 3.81, ppl 45.24, throughput 152.99 samples/s, lr 0.00
[Epoch 591] throughput 10528.12 samples/s
[Epoch 591] time cost 239.70s, valid loss 4.24, valid ppl 69.54
[Epoch 592 Batch 200/372] loss 3.82, ppl 45.38, throughput 163.98 samples/s, lr 0.00
[Epoch 592] throughput 11347.14 samples/s
[Epoch 592] time cost 224.77s, valid loss 4.24, valid ppl 69.54
[Epoch 593 Batch 200/372] loss 3.81, ppl 45.30, throughput 166.80 samples/s, lr 0.00
[Epoch 593] throughput 11370.22 samples/s
[Epoch 593] time cost 224.43s, valid loss 4.24, valid ppl 69.54
[Epoch 594 Batch 200/372] loss 3.82, ppl 45.58, throughput 160.80 samples/s, lr 0.00
[Epoch 594] throughput 10592.53 samples/s
[Epoch 594] time cost 238.07s, valid loss 4.24, valid ppl 69.54
[Epoch 595 Batch 200/372] loss 3.82, ppl 45.48, throughput 165.51 samples/s, lr 0.00
[Epoch 595] throughput 11169.24 samples/s
[Epoch 595] time cost 227.73s, valid loss 4.24, valid ppl 69.54
[Epoch 596 Batch 200/372] loss 3.81, ppl 45.23, throughput 155.64 samples/s, lr 0.00
[Epoch 596] throughput 10845.99 samples/s
[Epoch 596] time cost 234.22s, valid loss 4.24, valid ppl 69.54
[Epoch 597 Batch 200/372] loss 3.82, ppl 45.38, throughput 159.83 samples/s, lr 0.00
[Epoch 597] throughput 10773.55 samples/s
[Epoch 597] time cost 234.71s, valid loss 4.24, valid ppl 69.54
[Epoch 598 Batch 200/372] loss 3.81, ppl 45.20, throughput 161.40 samples/s, lr 0.00
[Epoch 598] throughput 10812.91 samples/s
[Epoch 598] time cost 234.28s, valid loss 4.24, valid ppl 69.54
[Epoch 599 Batch 200/372] loss 3.81, ppl 45.03, throughput 153.37 samples/s, lr 0.00
[Epoch 599] throughput 10610.65 samples/s
[Epoch 599] time cost 237.96s, valid loss 4.24, valid ppl 69.54
[Epoch 600 Batch 200/372] loss 3.81, ppl 45.36, throughput 164.53 samples/s, lr 0.00
[Epoch 600] throughput 11308.10 samples/s
[Epoch 600] time cost 225.75s, valid loss 4.24, valid ppl 69.54
[Epoch 601 Batch 200/372] loss 3.82, ppl 45.47, throughput 167.00 samples/s, lr 0.00
[Epoch 601] throughput 11056.95 samples/s
[Epoch 601] time cost 230.11s, valid loss 4.24, valid ppl 69.54
[Epoch 602 Batch 200/372] loss 3.82, ppl 45.40, throughput 161.04 samples/s, lr 0.00
[Epoch 602] throughput 11111.93 samples/s
[Epoch 602] time cost 228.56s, valid loss 4.24, valid ppl 69.54
[Epoch 603 Batch 200/372] loss 3.81, ppl 45.09, throughput 160.53 samples/s, lr 0.00
[Epoch 603] throughput 11145.66 samples/s
[Epoch 603] time cost 228.40s, valid loss 4.24, valid ppl 69.54
[Epoch 604 Batch 200/372] loss 3.81, ppl 45.32, throughput 162.92 samples/s, lr 0.00
[Epoch 604] throughput 11420.07 samples/s
[Epoch 604] time cost 223.78s, valid loss 4.24, valid ppl 69.54
[Epoch 605 Batch 200/372] loss 3.81, ppl 45.27, throughput 164.32 samples/s, lr 0.00
[Epoch 605] throughput 10928.00 samples/s
[Epoch 605] time cost 231.72s, valid loss 4.24, valid ppl 69.54
[Epoch 606 Batch 200/372] loss 3.81, ppl 45.16, throughput 163.76 samples/s, lr 0.00
[Epoch 606] throughput 11011.99 samples/s
[Epoch 606] time cost 230.51s, valid loss 4.24, valid ppl 69.54
[Epoch 607 Batch 200/372] loss 3.81, ppl 45.19, throughput 160.86 samples/s, lr 0.00
[Epoch 607] throughput 10961.97 samples/s
[Epoch 607] time cost 231.43s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.000300
[Epoch 608 Batch 200/372] loss 3.81, ppl 45.19, throughput 158.87 samples/s, lr 0.00
[Epoch 608] throughput 10985.82 samples/s
[Epoch 608] time cost 230.85s, valid loss 4.24, valid ppl 69.54
[Epoch 609 Batch 200/372] loss 3.81, ppl 45.29, throughput 170.43 samples/s, lr 0.00
[Epoch 609] throughput 11522.92 samples/s
[Epoch 609] time cost 222.41s, valid loss 4.24, valid ppl 69.54
[Epoch 610 Batch 200/372] loss 3.81, ppl 45.34, throughput 162.75 samples/s, lr 0.00
[Epoch 610] throughput 10838.39 samples/s
[Epoch 610] time cost 233.69s, valid loss 4.24, valid ppl 69.54
[Epoch 611 Batch 200/372] loss 3.81, ppl 45.29, throughput 158.80 samples/s, lr 0.00
[Epoch 611] throughput 10992.07 samples/s
[Epoch 611] time cost 230.87s, valid loss 4.24, valid ppl 69.54
[Epoch 612 Batch 200/372] loss 3.82, ppl 45.57, throughput 162.89 samples/s, lr 0.00
[Epoch 612] throughput 10941.36 samples/s
[Epoch 612] time cost 232.16s, valid loss 4.24, valid ppl 69.54
[Epoch 613 Batch 200/372] loss 3.82, ppl 45.52, throughput 162.56 samples/s, lr 0.00
[Epoch 613] throughput 10855.44 samples/s
[Epoch 613] time cost 233.23s, valid loss 4.24, valid ppl 69.54
[Epoch 614 Batch 200/372] loss 3.82, ppl 45.39, throughput 161.46 samples/s, lr 0.00
[Epoch 614] throughput 10997.71 samples/s
[Epoch 614] time cost 230.80s, valid loss 4.24, valid ppl 69.54
[Epoch 615 Batch 200/372] loss 3.81, ppl 45.32, throughput 163.01 samples/s, lr 0.00
[Epoch 615] throughput 11184.47 samples/s
[Epoch 615] time cost 227.64s, valid loss 4.24, valid ppl 69.54
[Epoch 616 Batch 200/372] loss 3.81, ppl 45.26, throughput 167.10 samples/s, lr 0.00
[Epoch 616] throughput 11258.63 samples/s
[Epoch 616] time cost 226.55s, valid loss 4.24, valid ppl 69.54
[Epoch 617 Batch 200/372] loss 3.82, ppl 45.45, throughput 163.25 samples/s, lr 0.00
[Epoch 617] throughput 10984.04 samples/s
[Epoch 617] time cost 230.49s, valid loss 4.24, valid ppl 69.54
[Epoch 618 Batch 200/372] loss 3.81, ppl 45.30, throughput 159.78 samples/s, lr 0.00
[Epoch 618] throughput 10930.82 samples/s
[Epoch 618] time cost 231.92s, valid loss 4.24, valid ppl 69.54
[Epoch 619 Batch 200/372] loss 3.81, ppl 45.33, throughput 161.18 samples/s, lr 0.00
[Epoch 619] throughput 10710.94 samples/s
[Epoch 619] time cost 235.82s, valid loss 4.24, valid ppl 69.54
[Epoch 620 Batch 200/372] loss 3.81, ppl 45.29, throughput 154.12 samples/s, lr 0.00
[Epoch 620] throughput 10527.01 samples/s
[Epoch 620] time cost 239.19s, valid loss 4.24, valid ppl 69.54
[Epoch 621 Batch 200/372] loss 3.81, ppl 45.29, throughput 156.26 samples/s, lr 0.00
[Epoch 621] throughput 10554.85 samples/s
[Epoch 621] time cost 238.74s, valid loss 4.24, valid ppl 69.54
[Epoch 622 Batch 200/372] loss 3.82, ppl 45.40, throughput 158.75 samples/s, lr 0.00
[Epoch 622] throughput 10627.33 samples/s
[Epoch 622] time cost 237.47s, valid loss 4.24, valid ppl 69.54
[Epoch 623 Batch 200/372] loss 3.81, ppl 45.27, throughput 161.10 samples/s, lr 0.00
[Epoch 623] throughput 10880.20 samples/s
[Epoch 623] time cost 233.00s, valid loss 4.24, valid ppl 69.54
[Epoch 624 Batch 200/372] loss 3.81, ppl 45.20, throughput 163.51 samples/s, lr 0.00
[Epoch 624] throughput 11052.83 samples/s
[Epoch 624] time cost 230.23s, valid loss 4.24, valid ppl 69.54
[Epoch 625 Batch 200/372] loss 3.81, ppl 45.16, throughput 167.10 samples/s, lr 0.00
[Epoch 625] throughput 11321.60 samples/s
[Epoch 625] time cost 225.64s, valid loss 4.24, valid ppl 69.54
[Epoch 626 Batch 200/372] loss 3.81, ppl 45.19, throughput 170.63 samples/s, lr 0.00
[Epoch 626] throughput 11243.09 samples/s
[Epoch 626] time cost 226.58s, valid loss 4.24, valid ppl 69.54
[Epoch 627 Batch 200/372] loss 3.82, ppl 45.55, throughput 163.19 samples/s, lr 0.00
[Epoch 627] throughput 11015.18 samples/s
[Epoch 627] time cost 230.83s, valid loss 4.24, valid ppl 69.54
[Epoch 628 Batch 200/372] loss 3.82, ppl 45.49, throughput 162.71 samples/s, lr 0.00
[Epoch 628] throughput 10904.18 samples/s
[Epoch 628] time cost 232.55s, valid loss 4.24, valid ppl 69.54
[Epoch 629 Batch 200/372] loss 3.81, ppl 45.30, throughput 166.23 samples/s, lr 0.00
[Epoch 629] throughput 11177.26 samples/s
[Epoch 629] time cost 227.49s, valid loss 4.24, valid ppl 69.54
[Epoch 630 Batch 200/372] loss 3.82, ppl 45.63, throughput 160.19 samples/s, lr 0.00
[Epoch 630] throughput 10996.05 samples/s
[Epoch 630] time cost 230.91s, valid loss 4.24, valid ppl 69.54
[Epoch 631 Batch 200/372] loss 3.81, ppl 45.26, throughput 160.99 samples/s, lr 0.00
[Epoch 631] throughput 10981.37 samples/s
[Epoch 631] time cost 230.83s, valid loss 4.24, valid ppl 69.54
[Epoch 632 Batch 200/372] loss 3.81, ppl 45.24, throughput 160.95 samples/s, lr 0.00
[Epoch 632] throughput 10926.46 samples/s
[Epoch 632] time cost 232.15s, valid loss 4.24, valid ppl 69.54
[Epoch 633 Batch 200/372] loss 3.82, ppl 45.39, throughput 165.34 samples/s, lr 0.00
[Epoch 633] throughput 10925.30 samples/s
[Epoch 633] time cost 231.99s, valid loss 4.24, valid ppl 69.54
[Epoch 634 Batch 200/372] loss 3.82, ppl 45.49, throughput 156.23 samples/s, lr 0.00
[Epoch 634] throughput 10721.66 samples/s
[Epoch 634] time cost 235.80s, valid loss 4.24, valid ppl 69.54
[Epoch 635 Batch 200/372] loss 3.81, ppl 45.28, throughput 158.87 samples/s, lr 0.00
[Epoch 635] throughput 10788.18 samples/s
[Epoch 635] time cost 234.43s, valid loss 4.24, valid ppl 69.54
[Epoch 636 Batch 200/372] loss 3.82, ppl 45.41, throughput 160.06 samples/s, lr 0.00
[Epoch 636] throughput 10926.92 samples/s
[Epoch 636] time cost 232.08s, valid loss 4.24, valid ppl 69.54
[Epoch 637 Batch 200/372] loss 3.82, ppl 45.48, throughput 161.25 samples/s, lr 0.00
[Epoch 637] throughput 10843.95 samples/s
[Epoch 637] time cost 233.54s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.000030
[Epoch 638 Batch 200/372] loss 3.82, ppl 45.65, throughput 165.52 samples/s, lr 0.00
[Epoch 638] throughput 10912.40 samples/s
[Epoch 638] time cost 232.23s, valid loss 4.24, valid ppl 69.54
[Epoch 639 Batch 200/372] loss 3.82, ppl 45.49, throughput 164.60 samples/s, lr 0.00
[Epoch 639] throughput 11159.45 samples/s
[Epoch 639] time cost 228.06s, valid loss 4.24, valid ppl 69.54
[Epoch 640 Batch 200/372] loss 3.82, ppl 45.43, throughput 162.95 samples/s, lr 0.00
[Epoch 640] throughput 11358.92 samples/s
[Epoch 640] time cost 224.59s, valid loss 4.24, valid ppl 69.54
[Epoch 641 Batch 200/372] loss 3.82, ppl 45.48, throughput 161.39 samples/s, lr 0.00
[Epoch 641] throughput 10862.12 samples/s
[Epoch 641] time cost 233.06s, valid loss 4.24, valid ppl 69.54
[Epoch 642 Batch 200/372] loss 3.81, ppl 45.31, throughput 162.79 samples/s, lr 0.00
[Epoch 642] throughput 11061.33 samples/s
[Epoch 642] time cost 229.66s, valid loss 4.24, valid ppl 69.54
[Epoch 643 Batch 200/372] loss 3.81, ppl 45.24, throughput 154.62 samples/s, lr 0.00
[Epoch 643] throughput 10763.97 samples/s
[Epoch 643] time cost 235.04s, valid loss 4.24, valid ppl 69.54
[Epoch 644 Batch 200/372] loss 3.81, ppl 45.21, throughput 158.07 samples/s, lr 0.00
[Epoch 644] throughput 10712.27 samples/s
[Epoch 644] time cost 236.26s, valid loss 4.24, valid ppl 69.54
[Epoch 645 Batch 200/372] loss 3.82, ppl 45.45, throughput 156.64 samples/s, lr 0.00
[Epoch 645] throughput 10632.56 samples/s
[Epoch 645] time cost 237.11s, valid loss 4.24, valid ppl 69.54
[Epoch 646 Batch 200/372] loss 3.81, ppl 45.26, throughput 158.78 samples/s, lr 0.00
[Epoch 646] throughput 10505.01 samples/s
[Epoch 646] time cost 239.58s, valid loss 4.24, valid ppl 69.54
[Epoch 647 Batch 200/372] loss 3.81, ppl 45.09, throughput 158.41 samples/s, lr 0.00
[Epoch 647] throughput 10835.42 samples/s
[Epoch 647] time cost 233.52s, valid loss 4.24, valid ppl 69.54
[Epoch 648 Batch 200/372] loss 3.81, ppl 45.32, throughput 161.04 samples/s, lr 0.00
[Epoch 648] throughput 10938.15 samples/s
[Epoch 648] time cost 231.65s, valid loss 4.24, valid ppl 69.54
[Epoch 649 Batch 200/372] loss 3.81, ppl 45.23, throughput 155.68 samples/s, lr 0.00
[Epoch 649] throughput 10544.03 samples/s
[Epoch 649] time cost 239.16s, valid loss 4.24, valid ppl 69.54
[Epoch 650 Batch 200/372] loss 3.81, ppl 45.30, throughput 160.20 samples/s, lr 0.00
[Epoch 650] throughput 11124.87 samples/s
[Epoch 650] time cost 228.95s, valid loss 4.24, valid ppl 69.54
[Epoch 651 Batch 200/372] loss 3.81, ppl 45.31, throughput 161.81 samples/s, lr 0.00
[Epoch 651] throughput 10933.31 samples/s
[Epoch 651] time cost 232.11s, valid loss 4.24, valid ppl 69.54
[Epoch 652 Batch 200/372] loss 3.82, ppl 45.50, throughput 156.33 samples/s, lr 0.00
[Epoch 652] throughput 10585.88 samples/s
[Epoch 652] time cost 238.12s, valid loss 4.24, valid ppl 69.54
[Epoch 653 Batch 200/372] loss 3.82, ppl 45.49, throughput 158.95 samples/s, lr 0.00
[Epoch 653] throughput 10847.48 samples/s
[Epoch 653] time cost 233.36s, valid loss 4.24, valid ppl 69.54
[Epoch 654 Batch 200/372] loss 3.81, ppl 45.37, throughput 153.52 samples/s, lr 0.00
[Epoch 654] throughput 10740.53 samples/s
[Epoch 654] time cost 235.26s, valid loss 4.24, valid ppl 69.54
[Epoch 655 Batch 200/372] loss 3.82, ppl 45.38, throughput 158.49 samples/s, lr 0.00
[Epoch 655] throughput 10842.21 samples/s
[Epoch 655] time cost 233.74s, valid loss 4.24, valid ppl 69.54
[Epoch 656 Batch 200/372] loss 3.81, ppl 45.06, throughput 156.17 samples/s, lr 0.00
[Epoch 656] throughput 10745.49 samples/s
[Epoch 656] time cost 235.31s, valid loss 4.24, valid ppl 69.54
[Epoch 657 Batch 200/372] loss 3.82, ppl 45.42, throughput 158.93 samples/s, lr 0.00
[Epoch 657] throughput 10853.16 samples/s
[Epoch 657] time cost 233.74s, valid loss 4.24, valid ppl 69.54
[Epoch 658 Batch 200/372] loss 3.82, ppl 45.41, throughput 167.93 samples/s, lr 0.00
[Epoch 658] throughput 11307.70 samples/s
[Epoch 658] time cost 225.91s, valid loss 4.24, valid ppl 69.54
[Epoch 659 Batch 200/372] loss 3.82, ppl 45.40, throughput 165.82 samples/s, lr 0.00
[Epoch 659] throughput 10892.26 samples/s
[Epoch 659] time cost 232.67s, valid loss 4.24, valid ppl 69.54
[Epoch 660 Batch 200/372] loss 3.81, ppl 45.22, throughput 157.49 samples/s, lr 0.00
[Epoch 660] throughput 11132.99 samples/s
[Epoch 660] time cost 229.12s, valid loss 4.24, valid ppl 69.54
[Epoch 661 Batch 200/372] loss 3.81, ppl 45.33, throughput 161.01 samples/s, lr 0.00
[Epoch 661] throughput 10982.36 samples/s
[Epoch 661] time cost 230.48s, valid loss 4.24, valid ppl 69.54
[Epoch 662 Batch 200/372] loss 3.81, ppl 45.33, throughput 164.38 samples/s, lr 0.00
[Epoch 662] throughput 11424.29 samples/s
[Epoch 662] time cost 223.81s, valid loss 4.24, valid ppl 69.54
[Epoch 663 Batch 200/372] loss 3.81, ppl 45.27, throughput 159.55 samples/s, lr 0.00
[Epoch 663] throughput 10948.09 samples/s
[Epoch 663] time cost 232.17s, valid loss 4.24, valid ppl 69.54
[Epoch 664 Batch 200/372] loss 3.81, ppl 45.34, throughput 155.41 samples/s, lr 0.00
[Epoch 664] throughput 10715.58 samples/s
[Epoch 664] time cost 235.73s, valid loss 4.24, valid ppl 69.54
[Epoch 665 Batch 200/372] loss 3.81, ppl 45.26, throughput 158.48 samples/s, lr 0.00
[Epoch 665] throughput 10950.85 samples/s
[Epoch 665] time cost 231.34s, valid loss 4.24, valid ppl 69.54
[Epoch 666 Batch 200/372] loss 3.82, ppl 45.45, throughput 160.75 samples/s, lr 0.00
[Epoch 666] throughput 11060.87 samples/s
[Epoch 666] time cost 229.77s, valid loss 4.24, valid ppl 69.54
[Epoch 667 Batch 200/372] loss 3.82, ppl 45.51, throughput 161.83 samples/s, lr 0.00
[Epoch 667] throughput 10988.94 samples/s
[Epoch 667] time cost 230.88s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.000003
[Epoch 668 Batch 200/372] loss 3.81, ppl 45.19, throughput 162.13 samples/s, lr 0.00
[Epoch 668] throughput 11080.83 samples/s
[Epoch 668] time cost 229.71s, valid loss 4.24, valid ppl 69.54
[Epoch 669 Batch 200/372] loss 3.82, ppl 45.49, throughput 159.27 samples/s, lr 0.00
[Epoch 669] throughput 11211.44 samples/s
[Epoch 669] time cost 227.26s, valid loss 4.24, valid ppl 69.54
[Epoch 670 Batch 200/372] loss 3.81, ppl 45.34, throughput 164.77 samples/s, lr 0.00
[Epoch 670] throughput 11307.08 samples/s
[Epoch 670] time cost 225.72s, valid loss 4.24, valid ppl 69.54
[Epoch 671 Batch 200/372] loss 3.82, ppl 45.40, throughput 161.17 samples/s, lr 0.00
[Epoch 671] throughput 11181.81 samples/s
[Epoch 671] time cost 227.71s, valid loss 4.24, valid ppl 69.54
[Epoch 672 Batch 200/372] loss 3.81, ppl 45.33, throughput 161.87 samples/s, lr 0.00
[Epoch 672] throughput 10976.62 samples/s
[Epoch 672] time cost 231.03s, valid loss 4.24, valid ppl 69.54
[Epoch 673 Batch 200/372] loss 3.82, ppl 45.44, throughput 160.51 samples/s, lr 0.00
[Epoch 673] throughput 11024.63 samples/s
[Epoch 673] time cost 230.37s, valid loss 4.24, valid ppl 69.54
[Epoch 674 Batch 200/372] loss 3.82, ppl 45.45, throughput 163.65 samples/s, lr 0.00
[Epoch 674] throughput 11136.98 samples/s
[Epoch 674] time cost 228.01s, valid loss 4.24, valid ppl 69.54
[Epoch 675 Batch 200/372] loss 3.82, ppl 45.41, throughput 160.76 samples/s, lr 0.00
[Epoch 675] throughput 11203.62 samples/s
[Epoch 675] time cost 227.64s, valid loss 4.24, valid ppl 69.54
[Epoch 676 Batch 200/372] loss 3.82, ppl 45.38, throughput 161.34 samples/s, lr 0.00
[Epoch 676] throughput 11065.11 samples/s
[Epoch 676] time cost 229.36s, valid loss 4.24, valid ppl 69.54
[Epoch 677 Batch 200/372] loss 3.81, ppl 45.28, throughput 165.63 samples/s, lr 0.00
[Epoch 677] throughput 11104.09 samples/s
[Epoch 677] time cost 228.72s, valid loss 4.24, valid ppl 69.54
[Epoch 678 Batch 200/372] loss 3.82, ppl 45.47, throughput 163.07 samples/s, lr 0.00
[Epoch 678] throughput 10891.54 samples/s
[Epoch 678] time cost 232.86s, valid loss 4.24, valid ppl 69.54
[Epoch 679 Batch 200/372] loss 3.82, ppl 45.40, throughput 159.30 samples/s, lr 0.00
[Epoch 679] throughput 10957.26 samples/s
[Epoch 679] time cost 231.86s, valid loss 4.24, valid ppl 69.54
[Epoch 680 Batch 200/372] loss 3.82, ppl 45.42, throughput 150.19 samples/s, lr 0.00
[Epoch 680] throughput 10691.16 samples/s
[Epoch 680] time cost 236.10s, valid loss 4.24, valid ppl 69.54
[Epoch 681 Batch 200/372] loss 3.82, ppl 45.59, throughput 153.93 samples/s, lr 0.00
[Epoch 681] throughput 10606.74 samples/s
[Epoch 681] time cost 237.69s, valid loss 4.24, valid ppl 69.54
[Epoch 682 Batch 200/372] loss 3.81, ppl 45.28, throughput 167.92 samples/s, lr 0.00
[Epoch 682] throughput 11348.96 samples/s
[Epoch 682] time cost 224.76s, valid loss 4.24, valid ppl 69.54
[Epoch 683 Batch 200/372] loss 3.81, ppl 45.35, throughput 161.79 samples/s, lr 0.00
[Epoch 683] throughput 10832.54 samples/s
[Epoch 683] time cost 233.43s, valid loss 4.24, valid ppl 69.54
[Epoch 684 Batch 200/372] loss 3.81, ppl 45.24, throughput 157.19 samples/s, lr 0.00
[Epoch 684] throughput 10664.33 samples/s
[Epoch 684] time cost 236.59s, valid loss 4.24, valid ppl 69.54
[Epoch 685 Batch 200/372] loss 3.81, ppl 45.34, throughput 160.82 samples/s, lr 0.00
[Epoch 685] throughput 11111.03 samples/s
[Epoch 685] time cost 228.77s, valid loss 4.24, valid ppl 69.54
[Epoch 686 Batch 200/372] loss 3.81, ppl 45.14, throughput 161.84 samples/s, lr 0.00
[Epoch 686] throughput 11180.81 samples/s
[Epoch 686] time cost 228.14s, valid loss 4.24, valid ppl 69.54
[Epoch 687 Batch 200/372] loss 3.82, ppl 45.43, throughput 160.27 samples/s, lr 0.00
[Epoch 687] throughput 10805.27 samples/s
[Epoch 687] time cost 234.18s, valid loss 4.24, valid ppl 69.54
[Epoch 688 Batch 200/372] loss 3.82, ppl 45.53, throughput 159.82 samples/s, lr 0.00
[Epoch 688] throughput 11242.43 samples/s
[Epoch 688] time cost 226.70s, valid loss 4.24, valid ppl 69.54
[Epoch 689 Batch 200/372] loss 3.82, ppl 45.42, throughput 156.76 samples/s, lr 0.00
[Epoch 689] throughput 10963.80 samples/s
[Epoch 689] time cost 231.34s, valid loss 4.24, valid ppl 69.54
[Epoch 690 Batch 200/372] loss 3.81, ppl 45.24, throughput 157.28 samples/s, lr 0.00
[Epoch 690] throughput 10580.87 samples/s
[Epoch 690] time cost 238.34s, valid loss 4.24, valid ppl 69.54
[Epoch 691 Batch 200/372] loss 3.82, ppl 45.61, throughput 160.88 samples/s, lr 0.00
[Epoch 691] throughput 10880.56 samples/s
[Epoch 691] time cost 232.50s, valid loss 4.24, valid ppl 69.54
[Epoch 692 Batch 200/372] loss 3.82, ppl 45.40, throughput 154.49 samples/s, lr 0.00
[Epoch 692] throughput 10985.72 samples/s
[Epoch 692] time cost 230.98s, valid loss 4.24, valid ppl 69.54
[Epoch 693 Batch 200/372] loss 3.82, ppl 45.48, throughput 156.11 samples/s, lr 0.00
[Epoch 693] throughput 10934.78 samples/s
[Epoch 693] time cost 232.01s, valid loss 4.24, valid ppl 69.54
[Epoch 694 Batch 200/372] loss 3.81, ppl 45.28, throughput 163.49 samples/s, lr 0.00
[Epoch 694] throughput 11290.55 samples/s
[Epoch 694] time cost 226.27s, valid loss 4.24, valid ppl 69.54
[Epoch 695 Batch 200/372] loss 3.82, ppl 45.46, throughput 162.27 samples/s, lr 0.00
[Epoch 695] throughput 11022.32 samples/s
[Epoch 695] time cost 230.56s, valid loss 4.24, valid ppl 69.54
[Epoch 696 Batch 200/372] loss 3.82, ppl 45.48, throughput 160.55 samples/s, lr 0.00
[Epoch 696] throughput 11023.66 samples/s
[Epoch 696] time cost 230.60s, valid loss 4.24, valid ppl 69.54
[Epoch 697 Batch 200/372] loss 3.81, ppl 45.15, throughput 162.52 samples/s, lr 0.00
[Epoch 697] throughput 10693.22 samples/s
[Epoch 697] time cost 236.10s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.000000
[Epoch 698 Batch 200/372] loss 3.81, ppl 45.23, throughput 159.72 samples/s, lr 0.00
[Epoch 698] throughput 10786.26 samples/s
[Epoch 698] time cost 234.81s, valid loss 4.24, valid ppl 69.54
[Epoch 699 Batch 200/372] loss 3.82, ppl 45.44, throughput 165.46 samples/s, lr 0.00
[Epoch 699] throughput 10949.17 samples/s
[Epoch 699] time cost 231.78s, valid loss 4.24, valid ppl 69.54
[Epoch 700 Batch 200/372] loss 3.81, ppl 45.26, throughput 158.93 samples/s, lr 0.00
[Epoch 700] throughput 10441.60 samples/s
[Epoch 700] time cost 241.50s, valid loss 4.24, valid ppl 69.54
[Epoch 701 Batch 200/372] loss 3.81, ppl 45.03, throughput 156.34 samples/s, lr 0.00
[Epoch 701] throughput 11098.86 samples/s
[Epoch 701] time cost 229.56s, valid loss 4.24, valid ppl 69.54
[Epoch 702 Batch 200/372] loss 3.81, ppl 45.34, throughput 162.10 samples/s, lr 0.00
[Epoch 702] throughput 10949.12 samples/s
[Epoch 702] time cost 231.82s, valid loss 4.24, valid ppl 69.54
[Epoch 703 Batch 200/372] loss 3.82, ppl 45.57, throughput 157.66 samples/s, lr 0.00
[Epoch 703] throughput 10968.87 samples/s
[Epoch 703] time cost 231.88s, valid loss 4.24, valid ppl 69.54
[Epoch 704 Batch 200/372] loss 3.82, ppl 45.58, throughput 164.74 samples/s, lr 0.00
[Epoch 704] throughput 11032.69 samples/s
[Epoch 704] time cost 230.32s, valid loss 4.24, valid ppl 69.54
[Epoch 705 Batch 200/372] loss 3.82, ppl 45.40, throughput 162.86 samples/s, lr 0.00
[Epoch 705] throughput 11067.17 samples/s
[Epoch 705] time cost 230.10s, valid loss 4.24, valid ppl 69.54
[Epoch 706 Batch 200/372] loss 3.81, ppl 45.37, throughput 162.30 samples/s, lr 0.00
[Epoch 706] throughput 10991.63 samples/s
[Epoch 706] time cost 231.25s, valid loss 4.24, valid ppl 69.54
[Epoch 707 Batch 200/372] loss 3.81, ppl 45.35, throughput 162.38 samples/s, lr 0.00
[Epoch 707] throughput 11173.48 samples/s
[Epoch 707] time cost 227.90s, valid loss 4.24, valid ppl 69.54
[Epoch 708 Batch 200/372] loss 3.81, ppl 45.21, throughput 161.82 samples/s, lr 0.00
[Epoch 708] throughput 10894.72 samples/s
[Epoch 708] time cost 232.62s, valid loss 4.24, valid ppl 69.54
[Epoch 709 Batch 200/372] loss 3.82, ppl 45.62, throughput 165.82 samples/s, lr 0.00
[Epoch 709] throughput 11219.64 samples/s
[Epoch 709] time cost 227.38s, valid loss 4.24, valid ppl 69.54
[Epoch 710 Batch 200/372] loss 3.82, ppl 45.40, throughput 158.05 samples/s, lr 0.00
[Epoch 710] throughput 10740.92 samples/s
[Epoch 710] time cost 235.70s, valid loss 4.24, valid ppl 69.54
[Epoch 711 Batch 200/372] loss 3.81, ppl 45.36, throughput 159.27 samples/s, lr 0.00
[Epoch 711] throughput 10866.53 samples/s
[Epoch 711] time cost 233.16s, valid loss 4.24, valid ppl 69.54
[Epoch 712 Batch 200/372] loss 3.81, ppl 45.22, throughput 163.42 samples/s, lr 0.00
[Epoch 712] throughput 10945.50 samples/s
[Epoch 712] time cost 231.76s, valid loss 4.24, valid ppl 69.54
[Epoch 713 Batch 200/372] loss 3.81, ppl 45.32, throughput 161.25 samples/s, lr 0.00
[Epoch 713] throughput 10794.14 samples/s
[Epoch 713] time cost 234.38s, valid loss 4.24, valid ppl 69.54
[Epoch 714 Batch 200/372] loss 3.81, ppl 45.35, throughput 165.57 samples/s, lr 0.00
[Epoch 714] throughput 11046.60 samples/s
[Epoch 714] time cost 229.98s, valid loss 4.24, valid ppl 69.54
[Epoch 715 Batch 200/372] loss 3.81, ppl 45.27, throughput 159.17 samples/s, lr 0.00
[Epoch 715] throughput 11120.03 samples/s
[Epoch 715] time cost 228.15s, valid loss 4.24, valid ppl 69.54
[Epoch 716 Batch 200/372] loss 3.82, ppl 45.62, throughput 161.53 samples/s, lr 0.00
[Epoch 716] throughput 10803.25 samples/s
[Epoch 716] time cost 234.34s, valid loss 4.24, valid ppl 69.54
[Epoch 717 Batch 200/372] loss 3.82, ppl 45.41, throughput 162.17 samples/s, lr 0.00
[Epoch 717] throughput 11029.39 samples/s
[Epoch 717] time cost 230.37s, valid loss 4.24, valid ppl 69.54
[Epoch 718 Batch 200/372] loss 3.82, ppl 45.41, throughput 161.78 samples/s, lr 0.00
[Epoch 718] throughput 11056.10 samples/s
[Epoch 718] time cost 230.33s, valid loss 4.24, valid ppl 69.54
[Epoch 719 Batch 200/372] loss 3.82, ppl 45.44, throughput 162.90 samples/s, lr 0.00
[Epoch 719] throughput 11588.67 samples/s
[Epoch 719] time cost 221.05s, valid loss 4.24, valid ppl 69.54
[Epoch 720 Batch 200/372] loss 3.81, ppl 45.35, throughput 162.41 samples/s, lr 0.00
[Epoch 720] throughput 11210.51 samples/s
[Epoch 720] time cost 227.08s, valid loss 4.24, valid ppl 69.54
[Epoch 721 Batch 200/372] loss 3.81, ppl 45.33, throughput 159.11 samples/s, lr 0.00
[Epoch 721] throughput 11260.27 samples/s
[Epoch 721] time cost 226.24s, valid loss 4.24, valid ppl 69.54
[Epoch 722 Batch 200/372] loss 3.82, ppl 45.69, throughput 156.86 samples/s, lr 0.00
[Epoch 722] throughput 10769.65 samples/s
[Epoch 722] time cost 234.70s, valid loss 4.24, valid ppl 69.54
[Epoch 723 Batch 200/372] loss 3.81, ppl 45.36, throughput 162.67 samples/s, lr 0.00
[Epoch 723] throughput 10776.15 samples/s
[Epoch 723] time cost 234.87s, valid loss 4.24, valid ppl 69.54
[Epoch 724 Batch 200/372] loss 3.81, ppl 45.20, throughput 163.84 samples/s, lr 0.00
[Epoch 724] throughput 11169.05 samples/s
[Epoch 724] time cost 227.70s, valid loss 4.24, valid ppl 69.54
[Epoch 725 Batch 200/372] loss 3.82, ppl 45.43, throughput 168.63 samples/s, lr 0.00
[Epoch 725] throughput 11127.46 samples/s
[Epoch 725] time cost 228.94s, valid loss 4.24, valid ppl 69.54
[Epoch 726 Batch 200/372] loss 3.82, ppl 45.41, throughput 157.97 samples/s, lr 0.00
[Epoch 726] throughput 10648.66 samples/s
[Epoch 726] time cost 237.11s, valid loss 4.24, valid ppl 69.54
[Epoch 727 Batch 200/372] loss 3.82, ppl 45.45, throughput 157.04 samples/s, lr 0.00
[Epoch 727] throughput 10597.94 samples/s
[Epoch 727] time cost 237.91s, valid loss 4.24, valid ppl 69.54
Learning rate after interval update 0.000000
[Epoch 728 Batch 200/372] loss 3.82, ppl 45.44, throughput 156.60 samples/s, lr 0.00
[Epoch 728] throughput 10632.51 samples/s
[Epoch 728] time cost 237.41s, valid loss 4.24, valid ppl 69.54
[Epoch 729 Batch 200/372] loss 3.81, ppl 45.30, throughput 151.71 samples/s, lr 0.00
[Epoch 729] throughput 10481.83 samples/s
[Epoch 729] time cost 240.61s, valid loss 4.24, valid ppl 69.54
[Epoch 730 Batch 200/372] loss 3.82, ppl 45.39, throughput 158.74 samples/s, lr 0.00
[Epoch 730] throughput 10750.89 samples/s
[Epoch 730] time cost 235.08s, valid loss 4.24, valid ppl 69.54
[Epoch 731 Batch 200/372] loss 3.81, ppl 45.29, throughput 159.81 samples/s, lr 0.00
[Epoch 731] throughput 10755.71 samples/s
[Epoch 731] time cost 234.79s, valid loss 4.24, valid ppl 69.54
[Epoch 732 Batch 200/372] loss 3.82, ppl 45.41, throughput 153.84 samples/s, lr 0.00
[Epoch 732] throughput 10420.98 samples/s
[Epoch 732] time cost 241.26s, valid loss 4.24, valid ppl 69.54
[Epoch 733 Batch 200/372] loss 3.81, ppl 45.36, throughput 158.00 samples/s, lr 0.00
[Epoch 733] throughput 10996.68 samples/s
[Epoch 733] time cost 230.69s, valid loss 4.24, valid ppl 69.54
[Epoch 734 Batch 200/372] loss 3.81, ppl 45.34, throughput 160.98 samples/s, lr 0.00
[Epoch 734] throughput 10955.26 samples/s
[Epoch 734] time cost 231.58s, valid loss 4.24, valid ppl 69.54
[Epoch 735 Batch 200/372] loss 3.82, ppl 45.38, throughput 159.51 samples/s, lr 0.00
[Epoch 735] throughput 10716.44 samples/s
[Epoch 735] time cost 235.61s, valid loss 4.24, valid ppl 69.54
[Epoch 736 Batch 200/372] loss 3.82, ppl 45.41, throughput 158.35 samples/s, lr 0.00
[Epoch 736] throughput 10782.22 samples/s
[Epoch 736] time cost 234.50s, valid loss 4.24, valid ppl 69.54
[Epoch 737 Batch 200/372] loss 3.81, ppl 45.31, throughput 162.68 samples/s, lr 0.00
[Epoch 737] throughput 10880.44 samples/s
[Epoch 737] time cost 232.77s, valid loss 4.24, valid ppl 69.54
[Epoch 738 Batch 200/372] loss 3.81, ppl 45.28, throughput 156.46 samples/s, lr 0.00
[Epoch 738] throughput 10680.20 samples/s
[Epoch 738] time cost 236.25s, valid loss 4.24, valid ppl 69.54
[Epoch 739 Batch 200/372] loss 3.81, ppl 45.21, throughput 156.12 samples/s, lr 0.00
[Epoch 739] throughput 11100.00 samples/s
[Epoch 739] time cost 229.34s, valid loss 4.24, valid ppl 69.54
[Epoch 740 Batch 200/372] loss 3.81, ppl 45.36, throughput 166.96 samples/s, lr 0.00
[Epoch 740] throughput 10971.52 samples/s
[Epoch 740] time cost 231.21s, valid loss 4.24, valid ppl 69.54
[Epoch 741 Batch 200/372] loss 3.81, ppl 45.16, throughput 157.99 samples/s, lr 0.00
[Epoch 741] throughput 10856.63 samples/s
[Epoch 741] time cost 232.70s, valid loss 4.24, valid ppl 69.54
[Epoch 742 Batch 200/372] loss 3.82, ppl 45.43, throughput 165.11 samples/s, lr 0.00
[Epoch 742] throughput 11173.60 samples/s
[Epoch 742] time cost 228.14s, valid loss 4.24, valid ppl 69.54
[Epoch 743 Batch 200/372] loss 3.81, ppl 45.17, throughput 162.79 samples/s, lr 0.00
[Epoch 743] throughput 11224.96 samples/s
[Epoch 743] time cost 226.98s, valid loss 4.24, valid ppl 69.54
[Epoch 744 Batch 200/372] loss 3.81, ppl 45.15, throughput 163.10 samples/s, lr 0.00
[Epoch 744] throughput 10852.12 samples/s
[Epoch 744] time cost 233.56s, valid loss 4.24, valid ppl 69.54
[Epoch 745 Batch 200/372] loss 3.81, ppl 45.21, throughput 161.73 samples/s, lr 0.00
[Epoch 745] throughput 10929.88 samples/s
[Epoch 745] time cost 231.90s, valid loss 4.24, valid ppl 69.54
[Epoch 746 Batch 200/372] loss 3.82, ppl 45.69, throughput 161.43 samples/s, lr 0.00
[Epoch 746] throughput 10844.52 samples/s
[Epoch 746] time cost 233.87s, valid loss 4.24, valid ppl 69.54
[Epoch 747 Batch 200/372] loss 3.82, ppl 45.40, throughput 157.04 samples/s, lr 0.00
[Epoch 747] throughput 10877.51 samples/s
[Epoch 747] time cost 233.07s, valid loss 4.24, valid ppl 69.54
[Epoch 748 Batch 200/372] loss 3.81, ppl 45.32, throughput 163.48 samples/s, lr 0.00
[Epoch 748] throughput 10990.50 samples/s
[Epoch 748] time cost 231.02s, valid loss 4.24, valid ppl 69.54
[Epoch 749 Batch 200/372] loss 3.82, ppl 45.44, throughput 160.96 samples/s, lr 0.00
[Epoch 749] throughput 11025.73 samples/s
[Epoch 749] time cost 230.81s, valid loss 4.24, valid ppl 69.54
Total training throughput 6987.74 samples/s
Best validation loss 4.23, val ppl 68.71
Best test loss 4.18, test ppl 65.62
Total time cost 224254.95s