Use AWDRNN
[Epoch 0 Batch 200/372] loss 7.74, ppl 2308.73, throughput 314.27 samples/s, lr 28.71
[Epoch 0] throughput 21812.05 samples/s
[Epoch 0] time cost 136.78s, valid loss 6.39, valid ppl 593.11
test loss 6.31, test ppl 552.01
[Epoch 1 Batch 200/372] loss 6.67, ppl 787.95, throughput 201.80 samples/s, lr 28.71
[Epoch 1] throughput 16735.23 samples/s
[Epoch 1] time cost 165.61s, valid loss 6.06, valid ppl 426.34
test loss 5.99, test ppl 399.37
[Epoch 2 Batch 200/372] loss 6.30, ppl 541.98, throughput 317.35 samples/s, lr 27.00
[Epoch 2] throughput 21691.66 samples/s
[Epoch 2] time cost 137.50s, valid loss 5.74, valid ppl 310.75
test loss 5.66, test ppl 288.55
[Epoch 3 Batch 200/372] loss 6.05, ppl 425.63, throughput 215.54 samples/s, lr 30.43
[Epoch 3] throughput 15223.53 samples/s
[Epoch 3] time cost 178.97s, valid loss 5.59, valid ppl 266.72
test loss 5.51, test ppl 246.45
[Epoch 4 Batch 200/372] loss 5.85, ppl 348.88, throughput 356.42 samples/s, lr 29.14
[Epoch 4] throughput 24490.41 samples/s
[Epoch 4] time cost 126.46s, valid loss 5.43, valid ppl 229.10
test loss 5.36, test ppl 212.40
[Epoch 5 Batch 200/372] loss 5.71, ppl 301.35, throughput 353.42 samples/s, lr 33.00
[Epoch 5] throughput 23882.20 samples/s
[Epoch 5] time cost 128.66s, valid loss 5.30, valid ppl 200.72
test loss 5.23, test ppl 185.97
[Epoch 6 Batch 200/372] loss 5.58, ppl 265.81, throughput 227.66 samples/s, lr 25.71
[Epoch 6] throughput 15637.85 samples/s
[Epoch 6] time cost 174.88s, valid loss 5.27, valid ppl 193.50
test loss 5.19, test ppl 179.14
[Epoch 7 Batch 200/372] loss 5.49, ppl 242.42, throughput 363.18 samples/s, lr 30.86
[Epoch 7] throughput 24772.99 samples/s
[Epoch 7] time cost 126.22s, valid loss 5.13, valid ppl 168.52
test loss 5.06, test ppl 156.97
[Epoch 8 Batch 200/372] loss 5.40, ppl 220.98, throughput 352.50 samples/s, lr 33.86
[Epoch 8] throughput 23861.03 samples/s
[Epoch 8] time cost 129.07s, valid loss 5.12, valid ppl 166.75
test loss 5.05, test ppl 155.46
[Epoch 9 Batch 200/372] loss 5.33, ppl 206.37, throughput 184.66 samples/s, lr 29.14
[Epoch 9] throughput 15525.06 samples/s
[Epoch 9] time cost 175.47s, valid loss 4.99, valid ppl 147.03
test loss 4.92, test ppl 136.87
[Epoch 10 Batch 200/372] loss 5.26, ppl 193.15, throughput 360.80 samples/s, lr 27.86
[Epoch 10] throughput 24328.99 samples/s
[Epoch 10] time cost 126.96s, valid loss 4.96, valid ppl 143.00
test loss 4.89, test ppl 133.05
[Epoch 11 Batch 200/372] loss 5.20, ppl 181.81, throughput 361.45 samples/s, lr 32.14
[Epoch 11] throughput 24386.21 samples/s
[Epoch 11] time cost 127.12s, valid loss 4.91, valid ppl 135.12
test loss 4.84, test ppl 126.28
[Epoch 12 Batch 200/372] loss 5.15, ppl 172.67, throughput 210.69 samples/s, lr 26.57
[Epoch 12] throughput 18078.17 samples/s
[Epoch 12] time cost 156.74s, valid loss 4.92, valid ppl 137.48
[Epoch 13 Batch 200/372] loss 5.10, ppl 164.12, throughput 354.20 samples/s, lr 28.29
[Epoch 13] throughput 23980.18 samples/s
[Epoch 13] time cost 128.90s, valid loss 4.88, valid ppl 131.12
test loss 4.81, test ppl 122.98
[Epoch 14 Batch 200/372] loss 5.06, ppl 157.58, throughput 351.50 samples/s, lr 31.71
[Epoch 14] throughput 24450.49 samples/s
[Epoch 14] time cost 126.88s, valid loss 4.86, valid ppl 128.76
test loss 4.79, test ppl 120.47
[Epoch 15 Batch 200/372] loss 5.01, ppl 149.60, throughput 187.93 samples/s, lr 32.14
[Epoch 15] throughput 16645.97 samples/s
[Epoch 15] time cost 166.67s, valid loss 4.82, valid ppl 124.38
test loss 4.76, test ppl 116.55
[Epoch 16 Batch 200/372] loss 4.98, ppl 145.24, throughput 363.18 samples/s, lr 30.86
[Epoch 16] throughput 24775.03 samples/s
[Epoch 16] time cost 125.18s, valid loss 4.77, valid ppl 117.72
test loss 4.70, test ppl 110.32
[Epoch 17 Batch 200/372] loss 4.94, ppl 139.39, throughput 354.75 samples/s, lr 29.57
[Epoch 17] throughput 23907.22 samples/s
[Epoch 17] time cost 128.06s, valid loss 4.75, valid ppl 115.29
test loss 4.68, test ppl 107.82
[Epoch 18 Batch 200/372] loss 4.90, ppl 134.34, throughput 220.18 samples/s, lr 28.29
[Epoch 18] throughput 18349.57 samples/s
[Epoch 18] time cost 155.16s, valid loss 4.73, valid ppl 113.66
test loss 4.67, test ppl 106.30
[Epoch 19 Batch 200/372] loss 4.87, ppl 130.09, throughput 362.99 samples/s, lr 31.71
[Epoch 19] throughput 24569.11 samples/s
[Epoch 19] time cost 126.71s, valid loss 4.72, valid ppl 111.99
test loss 4.65, test ppl 104.91
[Epoch 20 Batch 200/372] loss 4.84, ppl 126.47, throughput 359.79 samples/s, lr 25.71
[Epoch 20] throughput 17628.52 samples/s
[Epoch 20] time cost 160.87s, valid loss 4.71, valid ppl 111.18
test loss 4.65, test ppl 104.52
[Epoch 21 Batch 200/372] loss 4.82, ppl 123.69, throughput 347.41 samples/s, lr 18.00
[Epoch 21] throughput 23584.06 samples/s
[Epoch 21] time cost 130.46s, valid loss 4.69, valid ppl 108.34
test loss 4.62, test ppl 101.62
[Epoch 22 Batch 200/372] loss 4.79, ppl 120.25, throughput 354.54 samples/s, lr 27.86
[Epoch 22] throughput 24101.15 samples/s
[Epoch 22] time cost 127.93s, valid loss 4.68, valid ppl 107.32
test loss 4.61, test ppl 100.18
[Epoch 23 Batch 200/372] loss 4.77, ppl 117.43, throughput 246.56 samples/s, lr 31.71
[Epoch 23] throughput 15592.37 samples/s
[Epoch 23] time cost 175.05s, valid loss 4.68, valid ppl 108.16
[Epoch 24 Batch 200/372] loss 4.74, ppl 114.35, throughput 359.19 samples/s, lr 30.86
[Epoch 24] throughput 24556.05 samples/s
[Epoch 24] time cost 126.08s, valid loss 4.65, valid ppl 104.52
test loss 4.58, test ppl 97.86
[Epoch 25 Batch 200/372] loss 4.72, ppl 112.41, throughput 368.87 samples/s, lr 32.57
[Epoch 25] throughput 24616.92 samples/s
[Epoch 25] time cost 126.05s, valid loss 4.67, valid ppl 106.60
[Epoch 26 Batch 200/372] loss 4.70, ppl 109.75, throughput 275.77 samples/s, lr 31.71
[Epoch 26] throughput 15559.96 samples/s
[Epoch 26] time cost 175.77s, valid loss 4.63, valid ppl 102.42
test loss 4.57, test ppl 96.17
[Epoch 27 Batch 200/372] loss 4.68, ppl 107.45, throughput 354.88 samples/s, lr 33.86
[Epoch 27] throughput 24378.12 samples/s
[Epoch 27] time cost 126.79s, valid loss 4.63, valid ppl 102.79
[Epoch 28 Batch 200/372] loss 4.66, ppl 105.39, throughput 351.41 samples/s, lr 33.00
[Epoch 28] throughput 24331.75 samples/s
[Epoch 28] time cost 127.46s, valid loss 4.63, valid ppl 102.44
[Epoch 29 Batch 200/372] loss 4.64, ppl 103.71, throughput 360.80 samples/s, lr 29.14
[Epoch 29] throughput 15861.13 samples/s
[Epoch 29] time cost 173.27s, valid loss 4.65, valid ppl 104.94
[Epoch 30 Batch 200/372] loss 4.62, ppl 101.17, throughput 369.38 samples/s, lr 31.29
[Epoch 30] throughput 24888.10 samples/s
[Epoch 30] time cost 124.55s, valid loss 4.61, valid ppl 100.30
test loss 4.55, test ppl 94.26
[Epoch 31 Batch 200/372] loss 4.61, ppl 100.58, throughput 363.89 samples/s, lr 30.86
[Epoch 31] throughput 24736.95 samples/s
[Epoch 31] time cost 125.12s, valid loss 4.60, valid ppl 99.85
test loss 4.54, test ppl 93.94
[Epoch 32 Batch 200/372] loss 4.59, ppl 98.44, throughput 355.93 samples/s, lr 27.43
[Epoch 32] throughput 16923.24 samples/s
[Epoch 32] time cost 164.96s, valid loss 4.60, valid ppl 99.78
test loss 4.54, test ppl 93.88
[Epoch 33 Batch 200/372] loss 4.57, ppl 96.80, throughput 354.66 samples/s, lr 29.57
[Epoch 33] throughput 23953.67 samples/s
[Epoch 33] time cost 128.88s, valid loss 4.59, valid ppl 98.31
test loss 4.53, test ppl 92.69
[Epoch 34 Batch 200/372] loss 4.57, ppl 96.29, throughput 353.98 samples/s, lr 29.14
[Epoch 34] throughput 24388.61 samples/s
[Epoch 34] time cost 127.45s, valid loss 4.60, valid ppl 99.80
[Epoch 35 Batch 200/372] loss 4.55, ppl 94.47, throughput 353.64 samples/s, lr 28.29
[Epoch 35] throughput 15619.84 samples/s
[Epoch 35] time cost 174.99s, valid loss 4.60, valid ppl 99.90
[Epoch 36 Batch 200/372] loss 4.54, ppl 93.32, throughput 350.33 samples/s, lr 28.71
[Epoch 36] throughput 24032.20 samples/s
[Epoch 36] time cost 128.81s, valid loss 4.61, valid ppl 100.44
[Epoch 37 Batch 200/372] loss 4.53, ppl 92.42, throughput 359.33 samples/s, lr 29.57
[Epoch 37] throughput 24409.68 samples/s
[Epoch 37] time cost 127.29s, valid loss 4.60, valid ppl 99.30
[Epoch 38 Batch 200/372] loss 4.50, ppl 90.31, throughput 353.43 samples/s, lr 28.29
[Epoch 38] throughput 19096.60 samples/s
[Epoch 38] time cost 151.63s, valid loss 4.58, valid ppl 97.09
test loss 4.51, test ppl 91.35
[Epoch 39 Batch 200/372] loss 4.50, ppl 90.14, throughput 355.96 samples/s, lr 30.86
[Epoch 39] throughput 23735.17 samples/s
[Epoch 39] time cost 130.29s, valid loss 4.57, valid ppl 96.41
test loss 4.51, test ppl 90.74
[Epoch 40 Batch 200/372] loss 4.49, ppl 88.73, throughput 348.87 samples/s, lr 30.43
[Epoch 40] throughput 24292.64 samples/s
[Epoch 40] time cost 127.43s, valid loss 4.57, valid ppl 96.97
[Epoch 41 Batch 200/372] loss 4.47, ppl 87.02, throughput 357.52 samples/s, lr 30.86
[Epoch 41] throughput 15894.23 samples/s
[Epoch 41] time cost 172.66s, valid loss 4.57, valid ppl 96.67
[Epoch 42 Batch 200/372] loss 4.45, ppl 85.75, throughput 351.97 samples/s, lr 29.57
[Epoch 42] throughput 23830.11 samples/s
[Epoch 42] time cost 128.76s, valid loss 4.56, valid ppl 95.60
test loss 4.50, test ppl 90.19
[Epoch 43 Batch 200/372] loss 4.45, ppl 85.62, throughput 362.58 samples/s, lr 32.57
[Epoch 43] throughput 24109.90 samples/s
[Epoch 43] time cost 127.97s, valid loss 4.57, valid ppl 96.57
[Epoch 44 Batch 200/372] loss 4.44, ppl 85.12, throughput 352.53 samples/s, lr 30.43
[Epoch 44] throughput 16980.48 samples/s
[Epoch 44] time cost 164.31s, valid loss 4.58, valid ppl 97.24
[Epoch 45 Batch 200/372] loss 4.43, ppl 83.58, throughput 357.26 samples/s, lr 30.86
[Epoch 45] throughput 24386.10 samples/s
[Epoch 45] time cost 127.46s, valid loss 4.55, valid ppl 94.53
test loss 4.49, test ppl 88.71
[Epoch 46 Batch 200/372] loss 4.42, ppl 83.14, throughput 354.41 samples/s, lr 30.00
[Epoch 46] throughput 23909.73 samples/s
[Epoch 46] time cost 129.65s, valid loss 4.57, valid ppl 96.20
[Epoch 47 Batch 200/372] loss 4.40, ppl 81.70, throughput 359.16 samples/s, lr 29.57
[Epoch 47] throughput 19149.31 samples/s
[Epoch 47] time cost 150.93s, valid loss 4.56, valid ppl 95.18
[Epoch 48 Batch 200/372] loss 4.39, ppl 80.70, throughput 306.85 samples/s, lr 32.14
[Epoch 48] throughput 22258.54 samples/s
[Epoch 48] time cost 134.80s, valid loss 4.56, valid ppl 95.85
[Epoch 49 Batch 200/372] loss 4.39, ppl 80.39, throughput 347.67 samples/s, lr 33.00
[Epoch 49] throughput 24079.87 samples/s
[Epoch 49] time cost 128.15s, valid loss 4.55, valid ppl 94.24
test loss 4.49, test ppl 88.83
[Epoch 50 Batch 200/372] loss 4.38, ppl 79.91, throughput 374.16 samples/s, lr 28.71
[Epoch 50] throughput 21507.73 samples/s
[Epoch 50] time cost 139.44s, valid loss 4.58, valid ppl 97.35
[Epoch 51 Batch 200/372] loss 4.38, ppl 79.96, throughput 262.73 samples/s, lr 30.00
[Epoch 51] throughput 20752.18 samples/s
[Epoch 51] time cost 141.66s, valid loss 4.54, valid ppl 93.87
test loss 4.48, test ppl 88.24
[Epoch 52 Batch 200/372] loss 4.37, ppl 78.99, throughput 351.18 samples/s, lr 27.86
[Epoch 52] throughput 24180.53 samples/s
[Epoch 52] time cost 127.88s, valid loss 4.55, valid ppl 94.94
[Epoch 53 Batch 200/372] loss 4.37, ppl 78.68, throughput 353.41 samples/s, lr 30.43
[Epoch 53] throughput 23973.34 samples/s
[Epoch 53] time cost 128.95s, valid loss 4.54, valid ppl 93.59
test loss 4.48, test ppl 88.35
[Epoch 54 Batch 200/372] loss 4.35, ppl 77.19, throughput 254.02 samples/s, lr 30.00
[Epoch 54] throughput 19888.39 samples/s
[Epoch 54] time cost 146.08s, valid loss 4.54, valid ppl 94.09
[Epoch 55 Batch 200/372] loss 4.35, ppl 77.69, throughput 360.36 samples/s, lr 27.43
[Epoch 55] throughput 24450.84 samples/s
[Epoch 55] time cost 126.66s, valid loss 4.56, valid ppl 95.65
[Epoch 56 Batch 200/372] loss 4.35, ppl 77.21, throughput 351.57 samples/s, lr 27.86
[Epoch 56] throughput 23498.22 samples/s
[Epoch 56] time cost 130.23s, valid loss 4.54, valid ppl 93.78
[Epoch 57 Batch 200/372] loss 4.33, ppl 76.22, throughput 191.94 samples/s, lr 33.00
[Epoch 57] throughput 16735.56 samples/s
[Epoch 57] time cost 166.32s, valid loss 4.54, valid ppl 93.56
test loss 4.48, test ppl 88.18
[Epoch 58 Batch 200/372] loss 4.34, ppl 76.49, throughput 362.73 samples/s, lr 31.29
[Epoch 58] throughput 24250.97 samples/s
[Epoch 58] time cost 127.40s, valid loss 4.54, valid ppl 93.57
[Epoch 59 Batch 200/372] loss 4.32, ppl 75.35, throughput 353.10 samples/s, lr 27.86
[Epoch 59] throughput 23540.26 samples/s
[Epoch 59] time cost 130.62s, valid loss 4.54, valid ppl 93.91
[Epoch 60 Batch 200/372] loss 4.31, ppl 74.48, throughput 217.26 samples/s, lr 34.29
[Epoch 60] throughput 18253.63 samples/s
[Epoch 60] time cost 155.11s, valid loss 4.54, valid ppl 93.53
test loss 4.48, test ppl 87.93
[Epoch 61 Batch 200/372] loss 4.31, ppl 74.36, throughput 357.37 samples/s, lr 30.43
[Epoch 61] throughput 24656.69 samples/s
[Epoch 61] time cost 126.62s, valid loss 4.53, valid ppl 92.77
test loss 4.47, test ppl 87.78
[Epoch 62 Batch 200/372] loss 4.30, ppl 73.93, throughput 355.97 samples/s, lr 26.57
[Epoch 62] throughput 21314.70 samples/s
[Epoch 62] time cost 140.97s, valid loss 4.53, valid ppl 92.80
[Epoch 63 Batch 200/372] loss 4.29, ppl 73.31, throughput 253.49 samples/s, lr 28.29
[Epoch 63] throughput 19899.23 samples/s
[Epoch 63] time cost 146.21s, valid loss 4.54, valid ppl 93.57
[Epoch 64 Batch 200/372] loss 4.29, ppl 73.26, throughput 394.80 samples/s, lr 30.86
[Epoch 64] throughput 26713.94 samples/s
[Epoch 64] time cost 119.58s, valid loss 4.52, valid ppl 92.00
test loss 4.46, test ppl 86.48
[Epoch 65 Batch 200/372] loss 4.28, ppl 72.20, throughput 392.87 samples/s, lr 27.43
[Epoch 65] throughput 24267.03 samples/s
[Epoch 65] time cost 128.28s, valid loss 4.53, valid ppl 92.78
[Epoch 66 Batch 200/372] loss 4.28, ppl 72.02, throughput 252.78 samples/s, lr 31.29
[Epoch 66] throughput 20675.87 samples/s
[Epoch 66] time cost 142.55s, valid loss 4.55, valid ppl 94.56
[Epoch 67 Batch 200/372] loss 4.28, ppl 72.09, throughput 359.25 samples/s, lr 30.86
[Epoch 67] throughput 24260.58 samples/s
[Epoch 67] time cost 127.35s, valid loss 4.56, valid ppl 95.83
[Epoch 68 Batch 200/372] loss 4.26, ppl 71.10, throughput 353.18 samples/s, lr 27.43
[Epoch 68] throughput 24189.24 samples/s
[Epoch 68] time cost 128.82s, valid loss 4.54, valid ppl 94.03
[Epoch 69 Batch 200/372] loss 4.26, ppl 71.11, throughput 174.00 samples/s, lr 34.29
[Epoch 69] throughput 15706.34 samples/s
[Epoch 69] time cost 174.03s, valid loss 4.52, valid ppl 91.80
test loss 4.46, test ppl 86.68
[Epoch 70 Batch 200/372] loss 4.26, ppl 70.64, throughput 361.86 samples/s, lr 28.71
[Epoch 70] throughput 25013.21 samples/s
[Epoch 70] time cost 125.22s, valid loss 4.54, valid ppl 93.51
[Epoch 71 Batch 200/372] loss 4.25, ppl 70.05, throughput 372.33 samples/s, lr 27.43
[Epoch 71] throughput 25018.43 samples/s
[Epoch 71] time cost 124.98s, valid loss 4.53, valid ppl 93.21
[Epoch 72 Batch 200/372] loss 4.25, ppl 69.87, throughput 227.66 samples/s, lr 29.57
[Epoch 72] throughput 15807.32 samples/s
[Epoch 72] time cost 173.97s, valid loss 4.52, valid ppl 91.44
test loss 4.45, test ppl 86.00
[Epoch 73 Batch 200/372] loss 4.25, ppl 70.09, throughput 362.66 samples/s, lr 29.57
[Epoch 73] throughput 25085.93 samples/s
[Epoch 73] time cost 124.38s, valid loss 4.53, valid ppl 92.34
[Epoch 74 Batch 200/372] loss 4.24, ppl 69.43, throughput 366.11 samples/s, lr 31.71
[Epoch 74] throughput 24605.20 samples/s
[Epoch 74] time cost 126.09s, valid loss 4.52, valid ppl 92.03
[Epoch 75 Batch 200/372] loss 4.24, ppl 69.14, throughput 246.13 samples/s, lr 32.14
[Epoch 75] throughput 15604.34 samples/s
[Epoch 75] time cost 175.05s, valid loss 4.51, valid ppl 91.03
test loss 4.45, test ppl 86.04
[Epoch 76 Batch 200/372] loss 4.23, ppl 68.64, throughput 357.76 samples/s, lr 31.29
[Epoch 76] throughput 24615.70 samples/s
[Epoch 76] time cost 126.30s, valid loss 4.51, valid ppl 90.95
test loss 4.46, test ppl 86.13
[Epoch 77 Batch 200/372] loss 4.22, ppl 68.26, throughput 363.33 samples/s, lr 28.29
[Epoch 77] throughput 24136.04 samples/s
[Epoch 77] time cost 127.89s, valid loss 4.52, valid ppl 91.51
[Epoch 78 Batch 200/372] loss 4.22, ppl 67.73, throughput 219.14 samples/s, lr 29.57
[Epoch 78] throughput 15651.66 samples/s
[Epoch 78] time cost 174.81s, valid loss 4.53, valid ppl 92.42
[Epoch 79 Batch 200/372] loss 4.22, ppl 67.73, throughput 360.73 samples/s, lr 30.00
[Epoch 79] throughput 24171.40 samples/s
[Epoch 79] time cost 127.15s, valid loss 4.51, valid ppl 91.01
[Epoch 80 Batch 200/372] loss 4.21, ppl 67.48, throughput 351.15 samples/s, lr 31.29
[Epoch 80] throughput 24129.69 samples/s
[Epoch 80] time cost 127.30s, valid loss 4.52, valid ppl 91.61
[Epoch 81 Batch 200/372] loss 4.20, ppl 66.96, throughput 313.93 samples/s, lr 26.14
[Epoch 81] throughput 15413.65 samples/s
[Epoch 81] time cost 177.18s, valid loss 4.53, valid ppl 92.81
[Epoch 82 Batch 200/372] loss 4.20, ppl 66.83, throughput 350.49 samples/s, lr 31.29
[Epoch 82] throughput 23959.00 samples/s
[Epoch 82] time cost 129.20s, valid loss 4.54, valid ppl 93.36
[Epoch 83 Batch 200/372] loss 4.21, ppl 67.28, throughput 358.05 samples/s, lr 28.71
[Epoch 83] throughput 23863.90 samples/s
[Epoch 83] time cost 129.18s, valid loss 4.51, valid ppl 91.34
[Epoch 84 Batch 200/372] loss 4.20, ppl 66.61, throughput 356.50 samples/s, lr 30.86
[Epoch 84] throughput 18363.52 samples/s
[Epoch 84] time cost 155.57s, valid loss 4.52, valid ppl 91.55
[Epoch 85 Batch 200/372] loss 4.20, ppl 66.42, throughput 355.99 samples/s, lr 26.14
[Epoch 85] throughput 23938.77 samples/s
[Epoch 85] time cost 129.13s, valid loss 4.54, valid ppl 93.23
[Epoch 86 Batch 200/372] loss 4.19, ppl 65.74, throughput 359.52 samples/s, lr 26.57
[Epoch 86] throughput 24263.24 samples/s
[Epoch 86] time cost 127.12s, valid loss 4.51, valid ppl 90.82
test loss 4.45, test ppl 86.02
[Epoch 87 Batch 200/372] loss 4.19, ppl 65.90, throughput 346.85 samples/s, lr 29.14
[Epoch 87] throughput 19064.29 samples/s
[Epoch 87] time cost 150.94s, valid loss 4.55, valid ppl 94.21
[Epoch 88 Batch 200/372] loss 4.18, ppl 65.46, throughput 350.46 samples/s, lr 31.71
[Epoch 88] throughput 24312.78 samples/s
[Epoch 88] time cost 127.55s, valid loss 4.54, valid ppl 94.09
[Epoch 89 Batch 200/372] loss 4.18, ppl 65.28, throughput 352.89 samples/s, lr 30.86
[Epoch 89] throughput 24215.04 samples/s
[Epoch 89] time cost 127.79s, valid loss 4.55, valid ppl 94.67
[Epoch 90 Batch 200/372] loss 4.18, ppl 65.44, throughput 357.70 samples/s, lr 31.29
[Epoch 90] throughput 20017.41 samples/s
[Epoch 90] time cost 147.38s, valid loss 4.52, valid ppl 92.04
[Epoch 91 Batch 200/372] loss 4.17, ppl 64.85, throughput 283.47 samples/s, lr 24.00
[Epoch 91] throughput 21420.48 samples/s
[Epoch 91] time cost 138.69s, valid loss 4.51, valid ppl 90.69
test loss 4.45, test ppl 85.72
[Epoch 92 Batch 200/372] loss 4.17, ppl 64.74, throughput 357.34 samples/s, lr 28.71
[Epoch 92] throughput 24458.29 samples/s
[Epoch 92] time cost 126.73s, valid loss 4.52, valid ppl 92.10
[Epoch 93 Batch 200/372] loss 4.17, ppl 64.43, throughput 358.21 samples/s, lr 31.29
[Epoch 93] throughput 23271.88 samples/s
[Epoch 93] time cost 131.32s, valid loss 4.50, valid ppl 90.35
test loss 4.45, test ppl 85.49
[Epoch 94 Batch 200/372] loss 4.17, ppl 64.48, throughput 313.79 samples/s, lr 13.29
[Epoch 94] throughput 22773.71 samples/s
[Epoch 94] time cost 133.01s, valid loss 4.53, valid ppl 93.00
[Epoch 95 Batch 200/372] loss 4.15, ppl 63.68, throughput 358.46 samples/s, lr 29.14
[Epoch 95] throughput 24362.25 samples/s
[Epoch 95] time cost 126.96s, valid loss 4.50, valid ppl 90.23
test loss 4.45, test ppl 85.53
[Epoch 96 Batch 200/372] loss 4.15, ppl 63.64, throughput 350.60 samples/s, lr 30.86
[Epoch 96] throughput 20747.04 samples/s
[Epoch 96] time cost 143.81s, valid loss 4.50, valid ppl 89.75
test loss 4.44, test ppl 84.95
[Epoch 97 Batch 200/372] loss 4.16, ppl 63.82, throughput 361.18 samples/s, lr 29.14
[Epoch 97] throughput 24316.63 samples/s
[Epoch 97] time cost 127.30s, valid loss 4.53, valid ppl 93.17
[Epoch 98 Batch 200/372] loss 4.15, ppl 63.75, throughput 352.48 samples/s, lr 28.29
[Epoch 98] throughput 24290.11 samples/s
[Epoch 98] time cost 127.05s, valid loss 4.53, valid ppl 92.91
[Epoch 99 Batch 200/372] loss 4.15, ppl 63.44, throughput 365.48 samples/s, lr 32.14
[Epoch 99] throughput 24995.11 samples/s
[Epoch 99] time cost 124.69s, valid loss 4.55, valid ppl 94.47
[Epoch 100 Batch 200/372] loss 4.16, ppl 63.99, throughput 173.55 samples/s, lr 29.57
[Epoch 100] throughput 15629.16 samples/s
[Epoch 100] time cost 174.66s, valid loss 4.52, valid ppl 91.60
[Epoch 101 Batch 200/372] loss 4.15, ppl 63.30, throughput 351.88 samples/s, lr 28.29
[Epoch 101] throughput 24155.85 samples/s
[Epoch 101] time cost 127.58s, valid loss 4.51, valid ppl 91.37
[Epoch 102 Batch 200/372] loss 4.14, ppl 62.59, throughput 356.93 samples/s, lr 32.14
[Epoch 102] throughput 23996.21 samples/s
[Epoch 102] time cost 128.49s, valid loss 4.53, valid ppl 92.31
[Epoch 103 Batch 200/372] loss 4.14, ppl 63.01, throughput 279.09 samples/s, lr 29.57
[Epoch 103] throughput 15745.16 samples/s
[Epoch 103] time cost 174.00s, valid loss 4.54, valid ppl 93.27
[Epoch 104 Batch 200/372] loss 4.14, ppl 62.76, throughput 374.87 samples/s, lr 28.71
[Epoch 104] throughput 25667.79 samples/s
[Epoch 104] time cost 122.76s, valid loss 4.54, valid ppl 93.29
[Epoch 105 Batch 200/372] loss 4.14, ppl 62.98, throughput 374.61 samples/s, lr 17.14
[Epoch 105] throughput 25418.81 samples/s
[Epoch 105] time cost 123.95s, valid loss 4.54, valid ppl 93.59
[Epoch 106 Batch 200/372] loss 4.13, ppl 62.41, throughput 372.69 samples/s, lr 28.71
[Epoch 106] throughput 18344.58 samples/s
[Epoch 106] time cost 156.26s, valid loss 4.52, valid ppl 91.70
[Epoch 107 Batch 200/372] loss 4.13, ppl 62.41, throughput 358.13 samples/s, lr 27.43
[Epoch 107] throughput 24255.34 samples/s
[Epoch 107] time cost 127.40s, valid loss 4.53, valid ppl 92.66
[Epoch 108 Batch 200/372] loss 4.13, ppl 62.30, throughput 354.49 samples/s, lr 28.71
[Epoch 108] throughput 24098.75 samples/s
[Epoch 108] time cost 127.94s, valid loss 4.51, valid ppl 90.49
[Epoch 109 Batch 200/372] loss 4.12, ppl 61.55, throughput 342.50 samples/s, lr 32.14
[Epoch 109] throughput 23731.62 samples/s
[Epoch 109] time cost 130.18s, valid loss 4.52, valid ppl 91.69
[Epoch 110 Batch 200/372] loss 4.12, ppl 61.69, throughput 266.21 samples/s, lr 27.43
[Epoch 110] throughput 20581.48 samples/s
[Epoch 110] time cost 143.36s, valid loss 4.50, valid ppl 90.19
[Epoch 111 Batch 200/372] loss 4.12, ppl 61.34, throughput 355.57 samples/s, lr 29.57
[Epoch 111] throughput 24558.18 samples/s
[Epoch 111] time cost 126.39s, valid loss 4.52, valid ppl 91.91
[Epoch 112 Batch 200/372] loss 4.11, ppl 61.06, throughput 374.05 samples/s, lr 31.71
[Epoch 112] throughput 25994.75 samples/s
[Epoch 112] time cost 121.66s, valid loss 4.54, valid ppl 93.23
[Epoch 113 Batch 200/372] loss 4.12, ppl 61.40, throughput 237.27 samples/s, lr 29.14
[Epoch 113] throughput 19656.97 samples/s
[Epoch 113] time cost 147.71s, valid loss 4.51, valid ppl 90.49
[Epoch 114 Batch 200/372] loss 4.11, ppl 61.00, throughput 371.38 samples/s, lr 32.14
[Epoch 114] throughput 25406.52 samples/s
[Epoch 114] time cost 124.08s, valid loss 4.51, valid ppl 90.51
[Epoch 115 Batch 200/372] loss 4.11, ppl 60.97, throughput 351.06 samples/s, lr 24.86
[Epoch 115] throughput 23981.58 samples/s
[Epoch 115] time cost 128.57s, valid loss 4.51, valid ppl 90.66
[Epoch 116 Batch 200/372] loss 4.10, ppl 60.46, throughput 256.24 samples/s, lr 27.86
[Epoch 116] throughput 15614.81 samples/s
[Epoch 116] time cost 175.63s, valid loss 4.52, valid ppl 92.17
[Epoch 117 Batch 200/372] loss 4.10, ppl 60.37, throughput 358.43 samples/s, lr 31.71
[Epoch 117] throughput 23957.10 samples/s
[Epoch 117] time cost 128.95s, valid loss 4.50, valid ppl 90.26
[Epoch 118 Batch 200/372] loss 4.10, ppl 60.12, throughput 350.85 samples/s, lr 28.29
[Epoch 118] throughput 23822.04 samples/s
[Epoch 118] time cost 129.19s, valid loss 4.52, valid ppl 91.53
[Epoch 119 Batch 200/372] loss 4.10, ppl 60.07, throughput 356.06 samples/s, lr 29.14
[Epoch 119] throughput 16643.42 samples/s
[Epoch 119] time cost 167.41s, valid loss 4.50, valid ppl 89.88
[Epoch 120 Batch 200/372] loss 4.10, ppl 60.25, throughput 354.92 samples/s, lr 27.43
[Epoch 120] throughput 23912.31 samples/s
[Epoch 120] time cost 128.58s, valid loss 4.53, valid ppl 93.10
[Epoch 121 Batch 200/372] loss 4.10, ppl 60.08, throughput 354.36 samples/s, lr 27.86
[Epoch 121] throughput 23797.78 samples/s
[Epoch 121] time cost 128.95s, valid loss 4.52, valid ppl 91.77
[Epoch 122 Batch 200/372] loss 4.10, ppl 60.27, throughput 361.54 samples/s, lr 28.71
[Epoch 122] throughput 22760.68 samples/s
[Epoch 122] time cost 135.52s, valid loss 4.50, valid ppl 89.83
[Epoch 123 Batch 200/372] loss 4.09, ppl 59.58, throughput 359.38 samples/s, lr 27.00
[Epoch 123] throughput 24249.69 samples/s
[Epoch 123] time cost 127.18s, valid loss 4.51, valid ppl 90.83
[Epoch 124 Batch 200/372] loss 4.08, ppl 59.39, throughput 350.54 samples/s, lr 27.00
[Epoch 124] throughput 24248.01 samples/s
[Epoch 124] time cost 126.73s, valid loss 4.50, valid ppl 90.28
[Epoch 125 Batch 200/372] loss 4.08, ppl 59.21, throughput 361.92 samples/s, lr 32.14
[Epoch 125] throughput 24788.18 samples/s
[Epoch 125] time cost 126.18s, valid loss 4.53, valid ppl 92.79
[Epoch 126 Batch 200/372] loss 4.08, ppl 58.95, throughput 180.07 samples/s, lr 29.57
[Epoch 126] throughput 16069.40 samples/s
[Epoch 126] time cost 171.05s, valid loss 4.55, valid ppl 94.18
Learning rate after interval update 3.000000
[Epoch 127 Batch 200/372] loss 4.07, ppl 58.60, throughput 359.54 samples/s, lr 2.83
[Epoch 127] throughput 24489.85 samples/s
[Epoch 127] time cost 126.55s, valid loss 4.46, valid ppl 86.68
test loss 4.41, test ppl 82.24
[Epoch 128 Batch 200/372] loss 4.04, ppl 56.90, throughput 362.49 samples/s, lr 3.17
[Epoch 128] throughput 24160.23 samples/s
[Epoch 128] time cost 128.08s, valid loss 4.46, valid ppl 86.61
test loss 4.41, test ppl 82.20
[Epoch 129 Batch 200/372] loss 4.01, ppl 55.42, throughput 359.92 samples/s, lr 2.70
[Epoch 129] throughput 24466.26 samples/s
[Epoch 129] time cost 127.02s, valid loss 4.46, valid ppl 86.28
test loss 4.41, test ppl 81.91
[Epoch 130 Batch 200/372] loss 4.01, ppl 55.24, throughput 359.28 samples/s, lr 3.21
[Epoch 130] throughput 24756.02 samples/s
[Epoch 130] time cost 126.53s, valid loss 4.46, valid ppl 86.29
[Epoch 131 Batch 200/372] loss 3.98, ppl 53.74, throughput 350.94 samples/s, lr 2.79
[Epoch 131] throughput 18014.97 samples/s
[Epoch 131] time cost 156.37s, valid loss 4.46, valid ppl 86.36
[Epoch 132 Batch 200/372] loss 3.99, ppl 54.04, throughput 357.61 samples/s, lr 3.00
[Epoch 132] throughput 24569.09 samples/s
[Epoch 132] time cost 125.88s, valid loss 4.46, valid ppl 86.36
[Epoch 133 Batch 200/372] loss 3.99, ppl 54.00, throughput 355.15 samples/s, lr 2.79
[Epoch 133] throughput 24426.18 samples/s
[Epoch 133] time cost 127.24s, valid loss 4.46, valid ppl 86.31
[Epoch 134 Batch 200/372] loss 3.98, ppl 53.40, throughput 354.18 samples/s, lr 1.71
[Epoch 134] throughput 18059.48 samples/s
[Epoch 134] time cost 157.50s, valid loss 4.46, valid ppl 86.29
[Epoch 135 Batch 200/372] loss 3.98, ppl 53.45, throughput 339.68 samples/s, lr 3.43
[Epoch 135] throughput 23483.48 samples/s
[Epoch 135] time cost 133.33s, valid loss 4.46, valid ppl 86.10
test loss 4.40, test ppl 81.82
[Epoch 136 Batch 200/372] loss 3.97, ppl 53.09, throughput 354.24 samples/s, lr 2.83
[Epoch 136] throughput 24105.10 samples/s
[Epoch 136] time cost 128.46s, valid loss 4.46, valid ppl 86.38
[Epoch 137 Batch 200/372] loss 3.98, ppl 53.35, throughput 354.54 samples/s, lr 2.87
[Epoch 137] throughput 19380.38 samples/s
[Epoch 137] time cost 149.07s, valid loss 4.46, valid ppl 86.17
[Epoch 138 Batch 200/372] loss 3.96, ppl 52.67, throughput 361.04 samples/s, lr 3.21
[Epoch 138] throughput 24421.53 samples/s
[Epoch 138] time cost 126.58s, valid loss 4.46, valid ppl 86.25
[Epoch 139 Batch 200/372] loss 3.96, ppl 52.66, throughput 364.58 samples/s, lr 1.59
[Epoch 139] throughput 24638.25 samples/s
[Epoch 139] time cost 126.40s, valid loss 4.46, valid ppl 86.25
[Epoch 140 Batch 200/372] loss 3.96, ppl 52.62, throughput 349.51 samples/s, lr 3.00
[Epoch 140] throughput 16384.59 samples/s
[Epoch 140] time cost 169.20s, valid loss 4.46, valid ppl 86.48
[Epoch 141 Batch 200/372] loss 3.96, ppl 52.24, throughput 349.81 samples/s, lr 2.96
[Epoch 141] throughput 23782.96 samples/s
[Epoch 141] time cost 129.00s, valid loss 4.46, valid ppl 86.46
[Epoch 142 Batch 200/372] loss 3.96, ppl 52.28, throughput 359.69 samples/s, lr 3.17
[Epoch 142] throughput 23924.44 samples/s
[Epoch 142] time cost 129.05s, valid loss 4.46, valid ppl 86.28
[Epoch 143 Batch 200/372] loss 3.95, ppl 51.98, throughput 371.65 samples/s, lr 2.87
[Epoch 143] throughput 19731.62 samples/s
[Epoch 143] time cost 146.72s, valid loss 4.46, valid ppl 86.21
[Epoch 144 Batch 200/372] loss 3.95, ppl 51.91, throughput 358.39 samples/s, lr 3.26
[Epoch 144] throughput 23947.15 samples/s
[Epoch 144] time cost 128.52s, valid loss 4.46, valid ppl 86.44
[Epoch 145 Batch 200/372] loss 3.95, ppl 51.92, throughput 349.41 samples/s, lr 2.91
[Epoch 145] throughput 23885.79 samples/s
[Epoch 145] time cost 128.70s, valid loss 4.46, valid ppl 86.30
[Epoch 146 Batch 200/372] loss 3.94, ppl 51.53, throughput 358.80 samples/s, lr 2.83
[Epoch 146] throughput 19628.67 samples/s
[Epoch 146] time cost 148.91s, valid loss 4.46, valid ppl 86.47
[Epoch 147 Batch 200/372] loss 3.94, ppl 51.25, throughput 298.55 samples/s, lr 1.46
[Epoch 147] throughput 22099.92 samples/s
[Epoch 147] time cost 135.46s, valid loss 4.46, valid ppl 86.33
[Epoch 148 Batch 200/372] loss 3.95, ppl 51.72, throughput 389.50 samples/s, lr 2.87
[Epoch 148] throughput 26404.46 samples/s
[Epoch 148] time cost 120.27s, valid loss 4.46, valid ppl 86.39
[Epoch 149 Batch 200/372] loss 3.93, ppl 51.15, throughput 358.17 samples/s, lr 3.17
[Epoch 149] throughput 23887.95 samples/s
[Epoch 149] time cost 129.78s, valid loss 4.46, valid ppl 86.64
[Epoch 150 Batch 200/372] loss 3.94, ppl 51.31, throughput 355.55 samples/s, lr 2.79
[Epoch 150] throughput 24382.97 samples/s
[Epoch 150] time cost 127.54s, valid loss 4.46, valid ppl 86.36
[Epoch 151 Batch 200/372] loss 3.94, ppl 51.49, throughput 363.38 samples/s, lr 3.13
[Epoch 151] throughput 24548.38 samples/s
[Epoch 151] time cost 126.09s, valid loss 4.46, valid ppl 86.32
[Epoch 152 Batch 200/372] loss 3.94, ppl 51.20, throughput 353.22 samples/s, lr 2.79
[Epoch 152] throughput 22896.05 samples/s
[Epoch 152] time cost 133.38s, valid loss 4.46, valid ppl 86.52
[Epoch 153 Batch 200/372] loss 3.93, ppl 50.87, throughput 354.28 samples/s, lr 3.17
[Epoch 153] throughput 24040.67 samples/s
[Epoch 153] time cost 128.67s, valid loss 4.46, valid ppl 86.57
[Epoch 154 Batch 200/372] loss 3.93, ppl 50.90, throughput 354.97 samples/s, lr 3.34
[Epoch 154] throughput 24474.05 samples/s
[Epoch 154] time cost 126.63s, valid loss 4.46, valid ppl 86.50
[Epoch 155 Batch 200/372] loss 3.93, ppl 50.73, throughput 366.20 samples/s, lr 3.17
[Epoch 155] throughput 22338.39 samples/s
[Epoch 155] time cost 135.77s, valid loss 4.46, valid ppl 86.57
[Epoch 156 Batch 200/372] loss 3.93, ppl 50.85, throughput 351.45 samples/s, lr 3.00
[Epoch 156] throughput 24245.72 samples/s
[Epoch 156] time cost 126.98s, valid loss 4.46, valid ppl 86.66
[Epoch 157 Batch 200/372] loss 3.92, ppl 50.65, throughput 358.65 samples/s, lr 3.09
[Epoch 157] throughput 24387.34 samples/s
[Epoch 157] time cost 127.85s, valid loss 4.46, valid ppl 86.71
[Epoch 158 Batch 200/372] loss 3.93, ppl 50.67, throughput 351.62 samples/s, lr 2.83
[Epoch 158] throughput 20334.45 samples/s
[Epoch 158] time cost 143.26s, valid loss 4.46, valid ppl 86.69
[Epoch 159 Batch 200/372] loss 3.93, ppl 50.75, throughput 274.20 samples/s, lr 3.21
[Epoch 159] throughput 21161.92 samples/s
[Epoch 159] time cost 139.97s, valid loss 4.46, valid ppl 86.68
[Epoch 160 Batch 200/372] loss 3.93, ppl 50.75, throughput 357.76 samples/s, lr 2.83
[Epoch 160] throughput 24267.59 samples/s
[Epoch 160] time cost 127.73s, valid loss 4.46, valid ppl 86.87
[Epoch 161 Batch 200/372] loss 3.93, ppl 51.02, throughput 359.42 samples/s, lr 3.13
[Epoch 161] throughput 24328.67 samples/s
[Epoch 161] time cost 127.84s, valid loss 4.46, valid ppl 86.88
[Epoch 162 Batch 200/372] loss 3.93, ppl 50.69, throughput 169.40 samples/s, lr 3.34
[Epoch 162] throughput 15425.15 samples/s
[Epoch 162] time cost 177.04s, valid loss 4.46, valid ppl 86.70
[Epoch 163 Batch 200/372] loss 3.92, ppl 50.39, throughput 355.23 samples/s, lr 3.34
[Epoch 163] throughput 24232.34 samples/s
[Epoch 163] time cost 127.52s, valid loss 4.46, valid ppl 86.82
[Epoch 164 Batch 200/372] loss 3.92, ppl 50.41, throughput 357.81 samples/s, lr 3.13
[Epoch 164] throughput 23824.02 samples/s
[Epoch 164] time cost 129.44s, valid loss 4.46, valid ppl 86.61
[Epoch 165 Batch 200/372] loss 3.92, ppl 50.15, throughput 236.99 samples/s, lr 3.09
[Epoch 165] throughput 18950.05 samples/s
[Epoch 165] time cost 151.48s, valid loss 4.46, valid ppl 86.74
Learning rate after interval update 0.300000
[Epoch 166 Batch 200/372] loss 3.94, ppl 51.56, throughput 357.16 samples/s, lr 0.32
[Epoch 166] throughput 24307.13 samples/s
[Epoch 166] time cost 127.24s, valid loss 4.45, valid ppl 85.59
test loss 4.40, test ppl 81.46
[Epoch 167 Batch 200/372] loss 3.94, ppl 51.22, throughput 351.91 samples/s, lr 0.28
[Epoch 167] throughput 24450.88 samples/s
[Epoch 167] time cost 127.64s, valid loss 4.45, valid ppl 85.52
test loss 4.40, test ppl 81.40
[Epoch 168 Batch 200/372] loss 3.93, ppl 50.96, throughput 366.64 samples/s, lr 0.27
[Epoch 168] throughput 24387.05 samples/s
[Epoch 168] time cost 127.19s, valid loss 4.45, valid ppl 85.46
test loss 4.40, test ppl 81.34
[Epoch 169 Batch 200/372] loss 3.92, ppl 50.48, throughput 357.37 samples/s, lr 0.27
[Epoch 169] throughput 24173.97 samples/s
[Epoch 169] time cost 127.34s, valid loss 4.45, valid ppl 85.46
[Epoch 170 Batch 200/372] loss 3.92, ppl 50.54, throughput 361.24 samples/s, lr 0.30
[Epoch 170] throughput 19227.66 samples/s
[Epoch 170] time cost 149.76s, valid loss 4.45, valid ppl 85.44
test loss 4.40, test ppl 81.32
[Epoch 171 Batch 200/372] loss 3.93, ppl 50.73, throughput 365.96 samples/s, lr 0.27
[Epoch 171] throughput 24890.51 samples/s
[Epoch 171] time cost 125.77s, valid loss 4.45, valid ppl 85.43
test loss 4.40, test ppl 81.32
[Epoch 172 Batch 200/372] loss 3.93, ppl 50.93, throughput 361.82 samples/s, lr 0.31
[Epoch 172] throughput 24438.88 samples/s
[Epoch 172] time cost 126.58s, valid loss 4.45, valid ppl 85.41
test loss 4.40, test ppl 81.30
[Epoch 173 Batch 200/372] loss 3.92, ppl 50.45, throughput 234.03 samples/s, lr 0.30
[Epoch 173] throughput 18904.94 samples/s
[Epoch 173] time cost 152.31s, valid loss 4.45, valid ppl 85.50
[Epoch 174 Batch 200/372] loss 3.92, ppl 50.48, throughput 353.08 samples/s, lr 0.29
[Epoch 174] throughput 23935.67 samples/s
[Epoch 174] time cost 128.48s, valid loss 4.45, valid ppl 85.46
[Epoch 175 Batch 200/372] loss 3.92, ppl 50.35, throughput 355.57 samples/s, lr 0.30
[Epoch 175] throughput 23742.93 samples/s
[Epoch 175] time cost 129.41s, valid loss 4.45, valid ppl 85.41
test loss 4.40, test ppl 81.30
[Epoch 176 Batch 200/372] loss 3.92, ppl 50.18, throughput 172.75 samples/s, lr 0.33
[Epoch 176] throughput 15510.22 samples/s
[Epoch 176] time cost 176.40s, valid loss 4.45, valid ppl 85.47
[Epoch 177 Batch 200/372] loss 3.92, ppl 50.26, throughput 362.45 samples/s, lr 0.29
[Epoch 177] throughput 24411.25 samples/s
[Epoch 177] time cost 126.34s, valid loss 4.45, valid ppl 85.43
[Epoch 178 Batch 200/372] loss 3.92, ppl 50.46, throughput 359.08 samples/s, lr 0.31
[Epoch 178] throughput 24597.28 samples/s
[Epoch 178] time cost 125.98s, valid loss 4.45, valid ppl 85.45
[Epoch 179 Batch 200/372] loss 3.92, ppl 50.45, throughput 226.86 samples/s, lr 0.30
[Epoch 179] throughput 15530.95 samples/s
[Epoch 179] time cost 175.71s, valid loss 4.45, valid ppl 85.47
[Epoch 180 Batch 200/372] loss 3.92, ppl 50.37, throughput 349.76 samples/s, lr 0.32
[Epoch 180] throughput 24176.36 samples/s
[Epoch 180] time cost 127.83s, valid loss 4.45, valid ppl 85.44
[Epoch 181 Batch 200/372] loss 3.92, ppl 50.34, throughput 375.13 samples/s, lr 0.33
[Epoch 181] throughput 24757.22 samples/s
[Epoch 181] time cost 125.50s, valid loss 4.45, valid ppl 85.47
[Epoch 182 Batch 200/372] loss 3.92, ppl 50.18, throughput 364.83 samples/s, lr 0.30
[Epoch 182] throughput 19327.35 samples/s
[Epoch 182] time cost 150.45s, valid loss 4.45, valid ppl 85.43
[Epoch 183 Batch 200/372] loss 3.92, ppl 50.48, throughput 361.77 samples/s, lr 0.29
[Epoch 183] throughput 24290.66 samples/s
[Epoch 183] time cost 126.99s, valid loss 4.45, valid ppl 85.40
test loss 4.40, test ppl 81.28
[Epoch 184 Batch 200/372] loss 3.92, ppl 50.41, throughput 363.79 samples/s, lr 0.26
[Epoch 184] throughput 24580.74 samples/s
[Epoch 184] time cost 126.09s, valid loss 4.45, valid ppl 85.39
test loss 4.40, test ppl 81.29
[Epoch 185 Batch 200/372] loss 3.92, ppl 50.40, throughput 175.99 samples/s, lr 0.30
[Epoch 185] throughput 15513.98 samples/s
[Epoch 185] time cost 176.13s, valid loss 4.45, valid ppl 85.46
[Epoch 186 Batch 200/372] loss 3.92, ppl 50.45, throughput 358.22 samples/s, lr 0.30
[Epoch 186] throughput 24105.95 samples/s
[Epoch 186] time cost 127.88s, valid loss 4.45, valid ppl 85.42
[Epoch 187 Batch 200/372] loss 3.92, ppl 50.56, throughput 360.40 samples/s, lr 0.28
[Epoch 187] throughput 24284.61 samples/s
[Epoch 187] time cost 127.29s, valid loss 4.45, valid ppl 85.44
[Epoch 188 Batch 200/372] loss 3.92, ppl 50.24, throughput 239.38 samples/s, lr 0.29
[Epoch 188] throughput 19559.06 samples/s
[Epoch 188] time cost 148.18s, valid loss 4.45, valid ppl 85.45
[Epoch 189 Batch 200/372] loss 3.93, ppl 50.78, throughput 360.22 samples/s, lr 0.25
[Epoch 189] throughput 24417.60 samples/s
[Epoch 189] time cost 126.41s, valid loss 4.45, valid ppl 85.45
[Epoch 190 Batch 200/372] loss 3.91, ppl 50.09, throughput 353.16 samples/s, lr 0.30
[Epoch 190] throughput 24303.31 samples/s
[Epoch 190] time cost 127.33s, valid loss 4.45, valid ppl 85.44
[Epoch 191 Batch 200/372] loss 3.92, ppl 50.34, throughput 253.20 samples/s, lr 0.30
[Epoch 191] throughput 15347.30 samples/s
[Epoch 191] time cost 177.87s, valid loss 4.45, valid ppl 85.44
[Epoch 192 Batch 200/372] loss 3.92, ppl 50.28, throughput 358.18 samples/s, lr 0.29
[Epoch 192] throughput 24467.01 samples/s
[Epoch 192] time cost 126.52s, valid loss 4.45, valid ppl 85.44
[Epoch 193 Batch 200/372] loss 3.92, ppl 50.44, throughput 352.61 samples/s, lr 0.14
[Epoch 193] throughput 23872.79 samples/s
[Epoch 193] time cost 128.35s, valid loss 4.45, valid ppl 85.45
[Epoch 194 Batch 200/372] loss 3.92, ppl 50.17, throughput 375.19 samples/s, lr 0.33
[Epoch 194] throughput 19871.99 samples/s
[Epoch 194] time cost 146.25s, valid loss 4.45, valid ppl 85.47
[Epoch 195 Batch 200/372] loss 3.91, ppl 50.11, throughput 356.12 samples/s, lr 0.25
[Epoch 195] throughput 24493.86 samples/s
[Epoch 195] time cost 126.71s, valid loss 4.45, valid ppl 85.47
[Epoch 196 Batch 200/372] loss 3.91, ppl 49.91, throughput 366.23 samples/s, lr 0.29
[Epoch 196] throughput 24656.06 samples/s
[Epoch 196] time cost 126.37s, valid loss 4.45, valid ppl 85.47
[Epoch 197 Batch 200/372] loss 3.92, ppl 50.55, throughput 280.11 samples/s, lr 0.30
[Epoch 197] throughput 15540.63 samples/s
[Epoch 197] time cost 175.68s, valid loss 4.45, valid ppl 85.45
[Epoch 198 Batch 200/372] loss 3.92, ppl 50.32, throughput 359.02 samples/s, lr 0.29
[Epoch 198] throughput 24774.65 samples/s
[Epoch 198] time cost 125.62s, valid loss 4.45, valid ppl 85.47
[Epoch 199 Batch 200/372] loss 3.91, ppl 50.00, throughput 358.11 samples/s, lr 0.31
[Epoch 199] throughput 24270.59 samples/s
[Epoch 199] time cost 127.88s, valid loss 4.45, valid ppl 85.46
[Epoch 200 Batch 200/372] loss 3.91, ppl 50.11, throughput 353.43 samples/s, lr 0.29
[Epoch 200] throughput 15740.87 samples/s
[Epoch 200] time cost 174.83s, valid loss 4.45, valid ppl 85.46
[Epoch 201 Batch 200/372] loss 3.91, ppl 50.04, throughput 367.36 samples/s, lr 0.34
[Epoch 201] throughput 25194.23 samples/s
[Epoch 201] time cost 124.23s, valid loss 4.45, valid ppl 85.49
[Epoch 202 Batch 200/372] loss 3.91, ppl 50.13, throughput 352.23 samples/s, lr 0.30
[Epoch 202] throughput 24275.25 samples/s
[Epoch 202] time cost 127.12s, valid loss 4.45, valid ppl 85.50
[Epoch 203 Batch 200/372] loss 3.92, ppl 50.35, throughput 358.92 samples/s, lr 0.25
[Epoch 203] throughput 20152.50 samples/s
[Epoch 203] time cost 145.28s, valid loss 4.45, valid ppl 85.46
[Epoch 204 Batch 200/372] loss 3.92, ppl 50.19, throughput 368.36 samples/s, lr 0.17
[Epoch 204] throughput 24944.26 samples/s
[Epoch 204] time cost 124.96s, valid loss 4.45, valid ppl 85.42
[Epoch 205 Batch 200/372] loss 3.91, ppl 49.91, throughput 358.47 samples/s, lr 0.33
[Epoch 205] throughput 24507.94 samples/s
[Epoch 205] time cost 126.30s, valid loss 4.45, valid ppl 85.43
[Epoch 206 Batch 200/372] loss 3.91, ppl 49.75, throughput 349.05 samples/s, lr 0.29
[Epoch 206] throughput 23993.13 samples/s
[Epoch 206] time cost 129.72s, valid loss 4.45, valid ppl 85.46
[Epoch 207 Batch 200/372] loss 3.92, ppl 50.21, throughput 211.00 samples/s, lr 0.29
[Epoch 207] throughput 17724.04 samples/s
[Epoch 207] time cost 159.25s, valid loss 4.45, valid ppl 85.44
[Epoch 208 Batch 200/372] loss 3.92, ppl 50.19, throughput 357.65 samples/s, lr 0.30
[Epoch 208] throughput 24225.08 samples/s
[Epoch 208] time cost 127.88s, valid loss 4.45, valid ppl 85.45
[Epoch 209 Batch 200/372] loss 3.91, ppl 49.93, throughput 354.23 samples/s, lr 0.30
[Epoch 209] throughput 24378.62 samples/s
[Epoch 209] time cost 127.78s, valid loss 4.45, valid ppl 85.53
[Epoch 210 Batch 200/372] loss 3.91, ppl 49.66, throughput 237.22 samples/s, lr 0.31
[Epoch 210] throughput 19154.10 samples/s
[Epoch 210] time cost 150.27s, valid loss 4.45, valid ppl 85.46
[Epoch 211 Batch 200/372] loss 3.91, ppl 50.07, throughput 360.36 samples/s, lr 0.29
[Epoch 211] throughput 24203.18 samples/s
[Epoch 211] time cost 128.07s, valid loss 4.45, valid ppl 85.48
[Epoch 212 Batch 200/372] loss 3.91, ppl 50.01, throughput 361.46 samples/s, lr 0.28
[Epoch 212] throughput 24742.99 samples/s
[Epoch 212] time cost 125.19s, valid loss 4.45, valid ppl 85.45
[Epoch 213 Batch 200/372] loss 3.92, ppl 50.17, throughput 235.74 samples/s, lr 0.28
[Epoch 213] throughput 19196.80 samples/s
[Epoch 213] time cost 151.14s, valid loss 4.45, valid ppl 85.49
[Epoch 214 Batch 200/372] loss 3.91, ppl 49.66, throughput 354.53 samples/s, lr 0.30
[Epoch 214] throughput 23773.42 samples/s
[Epoch 214] time cost 129.76s, valid loss 4.45, valid ppl 85.49
Learning rate after interval update 0.030000
[Epoch 215 Batch 200/372] loss 3.92, ppl 50.37, throughput 366.36 samples/s, lr 0.03
[Epoch 215] throughput 24464.04 samples/s
[Epoch 215] time cost 127.07s, valid loss 4.45, valid ppl 85.39
test loss 4.40, test ppl 81.29
[Epoch 216 Batch 200/372] loss 3.92, ppl 50.22, throughput 171.34 samples/s, lr 0.03
[Epoch 216] throughput 15567.54 samples/s
[Epoch 216] time cost 176.03s, valid loss 4.45, valid ppl 85.35
test loss 4.40, test ppl 81.25
[Epoch 217 Batch 200/372] loss 3.92, ppl 50.33, throughput 355.34 samples/s, lr 0.03
[Epoch 217] throughput 24325.90 samples/s
[Epoch 217] time cost 127.69s, valid loss 4.45, valid ppl 85.33
test loss 4.40, test ppl 81.23
[Epoch 218 Batch 200/372] loss 3.91, ppl 50.00, throughput 363.83 samples/s, lr 0.03
[Epoch 218] throughput 24519.00 samples/s
[Epoch 218] time cost 126.73s, valid loss 4.45, valid ppl 85.32
test loss 4.40, test ppl 81.23
[Epoch 219 Batch 200/372] loss 3.91, ppl 49.79, throughput 351.84 samples/s, lr 0.03
[Epoch 219] throughput 24228.41 samples/s
[Epoch 219] time cost 127.88s, valid loss 4.45, valid ppl 85.32
test loss 4.40, test ppl 81.22
[Epoch 220 Batch 200/372] loss 3.91, ppl 49.94, throughput 362.93 samples/s, lr 0.03
[Epoch 220] throughput 24404.23 samples/s
[Epoch 220] time cost 126.92s, valid loss 4.45, valid ppl 85.30
test loss 4.40, test ppl 81.21
[Epoch 221 Batch 200/372] loss 3.91, ppl 49.95, throughput 311.62 samples/s, lr 0.02
[Epoch 221] throughput 19329.23 samples/s
[Epoch 221] time cost 149.42s, valid loss 4.45, valid ppl 85.30
test loss 4.40, test ppl 81.21
[Epoch 222 Batch 200/372] loss 3.92, ppl 50.28, throughput 358.73 samples/s, lr 0.03
[Epoch 222] throughput 24245.10 samples/s
[Epoch 222] time cost 127.10s, valid loss 4.45, valid ppl 85.30
[Epoch 223 Batch 200/372] loss 3.92, ppl 50.39, throughput 354.26 samples/s, lr 0.03
[Epoch 223] throughput 24405.51 samples/s
[Epoch 223] time cost 126.75s, valid loss 4.45, valid ppl 85.30
[Epoch 224 Batch 200/372] loss 3.91, ppl 49.87, throughput 347.78 samples/s, lr 0.03
[Epoch 224] throughput 19114.77 samples/s
[Epoch 224] time cost 151.10s, valid loss 4.45, valid ppl 85.31
[Epoch 225 Batch 200/372] loss 3.91, ppl 50.12, throughput 355.77 samples/s, lr 0.03
[Epoch 225] throughput 24144.39 samples/s
[Epoch 225] time cost 127.87s, valid loss 4.45, valid ppl 85.30
[Epoch 226 Batch 200/372] loss 3.91, ppl 50.05, throughput 351.15 samples/s, lr 0.03
[Epoch 226] throughput 24352.04 samples/s
[Epoch 226] time cost 126.84s, valid loss 4.45, valid ppl 85.30
test loss 4.40, test ppl 81.21
[Epoch 227 Batch 200/372] loss 3.92, ppl 50.34, throughput 229.78 samples/s, lr 0.03
[Epoch 227] throughput 15669.12 samples/s
[Epoch 227] time cost 175.09s, valid loss 4.45, valid ppl 85.29
test loss 4.40, test ppl 81.20
[Epoch 228 Batch 200/372] loss 3.92, ppl 50.18, throughput 356.51 samples/s, lr 0.03
[Epoch 228] throughput 24376.46 samples/s
[Epoch 228] time cost 127.00s, valid loss 4.45, valid ppl 85.29
test loss 4.40, test ppl 81.20
[Epoch 229 Batch 200/372] loss 3.91, ppl 49.97, throughput 364.43 samples/s, lr 0.03
[Epoch 229] throughput 24825.63 samples/s
[Epoch 229] time cost 125.50s, valid loss 4.45, valid ppl 85.29
[Epoch 230 Batch 200/372] loss 3.92, ppl 50.43, throughput 235.72 samples/s, lr 0.03
[Epoch 230] throughput 19112.37 samples/s
[Epoch 230] time cost 151.04s, valid loss 4.45, valid ppl 85.30
[Epoch 231 Batch 200/372] loss 3.91, ppl 49.93, throughput 358.08 samples/s, lr 0.03
[Epoch 231] throughput 23995.98 samples/s
[Epoch 231] time cost 128.08s, valid loss 4.45, valid ppl 85.29
[Epoch 232 Batch 200/372] loss 3.91, ppl 49.86, throughput 354.26 samples/s, lr 0.03
[Epoch 232] throughput 24295.99 samples/s
[Epoch 232] time cost 127.51s, valid loss 4.45, valid ppl 85.29
[Epoch 233 Batch 200/372] loss 3.92, ppl 50.23, throughput 235.90 samples/s, lr 0.03
[Epoch 233] throughput 19266.48 samples/s
[Epoch 233] time cost 149.50s, valid loss 4.45, valid ppl 85.29
[Epoch 234 Batch 200/372] loss 3.91, ppl 49.99, throughput 348.32 samples/s, lr 0.03
[Epoch 234] throughput 23849.24 samples/s
[Epoch 234] time cost 128.85s, valid loss 4.45, valid ppl 85.29
[Epoch 235 Batch 200/372] loss 3.91, ppl 49.87, throughput 363.11 samples/s, lr 0.03
[Epoch 235] throughput 24030.76 samples/s
[Epoch 235] time cost 128.44s, valid loss 4.45, valid ppl 85.29
[Epoch 236 Batch 200/372] loss 3.92, ppl 50.22, throughput 246.06 samples/s, lr 0.03
[Epoch 236] throughput 19009.79 samples/s
[Epoch 236] time cost 151.63s, valid loss 4.45, valid ppl 85.29
[Epoch 237 Batch 200/372] loss 3.92, ppl 50.29, throughput 354.38 samples/s, lr 0.03
[Epoch 237] throughput 24006.52 samples/s
[Epoch 237] time cost 127.89s, valid loss 4.45, valid ppl 85.29
[Epoch 238 Batch 200/372] loss 3.91, ppl 50.06, throughput 362.81 samples/s, lr 0.03
[Epoch 238] throughput 24505.49 samples/s
[Epoch 238] time cost 126.93s, valid loss 4.45, valid ppl 85.30
[Epoch 239 Batch 200/372] loss 3.92, ppl 50.40, throughput 234.24 samples/s, lr 0.03
[Epoch 239] throughput 19123.92 samples/s
[Epoch 239] time cost 150.22s, valid loss 4.45, valid ppl 85.30
[Epoch 240 Batch 200/372] loss 3.91, ppl 49.75, throughput 366.05 samples/s, lr 0.03
[Epoch 240] throughput 24653.15 samples/s
[Epoch 240] time cost 125.89s, valid loss 4.45, valid ppl 85.30
[Epoch 241 Batch 200/372] loss 3.91, ppl 49.75, throughput 355.80 samples/s, lr 0.03
[Epoch 241] throughput 24490.95 samples/s
[Epoch 241] time cost 126.51s, valid loss 4.45, valid ppl 85.30
[Epoch 242 Batch 200/372] loss 3.91, ppl 49.89, throughput 241.19 samples/s, lr 0.03
[Epoch 242] throughput 19426.52 samples/s
[Epoch 242] time cost 149.14s, valid loss 4.45, valid ppl 85.30
[Epoch 243 Batch 200/372] loss 3.91, ppl 49.99, throughput 361.93 samples/s, lr 0.03
[Epoch 243] throughput 24410.01 samples/s
[Epoch 243] time cost 127.35s, valid loss 4.45, valid ppl 85.30
[Epoch 244 Batch 200/372] loss 3.91, ppl 49.92, throughput 356.38 samples/s, lr 0.03
[Epoch 244] throughput 24242.70 samples/s
[Epoch 244] time cost 127.32s, valid loss 4.45, valid ppl 85.31
[Epoch 245 Batch 200/372] loss 3.92, ppl 50.16, throughput 187.09 samples/s, lr 0.03
[Epoch 245] throughput 15355.60 samples/s
[Epoch 245] time cost 177.73s, valid loss 4.45, valid ppl 85.30
[Epoch 246 Batch 200/372] loss 3.91, ppl 49.78, throughput 354.91 samples/s, lr 0.03
[Epoch 246] throughput 24253.44 samples/s
[Epoch 246] time cost 127.31s, valid loss 4.45, valid ppl 85.31
[Epoch 247 Batch 200/372] loss 3.91, ppl 49.91, throughput 355.55 samples/s, lr 0.03
[Epoch 247] throughput 24557.09 samples/s
[Epoch 247] time cost 125.82s, valid loss 4.45, valid ppl 85.31
[Epoch 248 Batch 200/372] loss 3.91, ppl 50.02, throughput 234.49 samples/s, lr 0.03
[Epoch 248] throughput 19168.38 samples/s
[Epoch 248] time cost 150.33s, valid loss 4.45, valid ppl 85.31
[Epoch 249 Batch 200/372] loss 3.92, ppl 50.26, throughput 351.35 samples/s, lr 0.03
[Epoch 249] throughput 24344.54 samples/s
[Epoch 249] time cost 126.88s, valid loss 4.45, valid ppl 85.31
[Epoch 250 Batch 200/372] loss 3.91, ppl 49.84, throughput 358.22 samples/s, lr 0.03
[Epoch 250] throughput 24655.24 samples/s
[Epoch 250] time cost 125.78s, valid loss 4.45, valid ppl 85.32
[Epoch 251 Batch 200/372] loss 3.90, ppl 49.41, throughput 199.34 samples/s, lr 0.03
[Epoch 251] throughput 15769.63 samples/s
[Epoch 251] time cost 173.68s, valid loss 4.45, valid ppl 85.32
[Epoch 252 Batch 200/372] loss 3.92, ppl 50.19, throughput 359.15 samples/s, lr 0.03
[Epoch 252] throughput 24277.31 samples/s
[Epoch 252] time cost 127.73s, valid loss 4.45, valid ppl 85.31
[Epoch 253 Batch 200/372] loss 3.91, ppl 50.03, throughput 345.55 samples/s, lr 0.03
[Epoch 253] throughput 24204.16 samples/s
[Epoch 253] time cost 127.24s, valid loss 4.45, valid ppl 85.32
[Epoch 254 Batch 200/372] loss 3.92, ppl 50.32, throughput 242.89 samples/s, lr 0.03
[Epoch 254] throughput 19146.47 samples/s
[Epoch 254] time cost 150.85s, valid loss 4.45, valid ppl 85.32
[Epoch 255 Batch 200/372] loss 3.91, ppl 49.96, throughput 355.63 samples/s, lr 0.03
[Epoch 255] throughput 24213.92 samples/s
[Epoch 255] time cost 128.07s, valid loss 4.45, valid ppl 85.32
[Epoch 256 Batch 200/372] loss 3.91, ppl 50.02, throughput 376.74 samples/s, lr 0.03
[Epoch 256] throughput 25013.65 samples/s
[Epoch 256] time cost 124.08s, valid loss 4.45, valid ppl 85.32
[Epoch 257 Batch 200/372] loss 3.91, ppl 50.03, throughput 236.42 samples/s, lr 0.03
[Epoch 257] throughput 19121.20 samples/s
[Epoch 257] time cost 150.82s, valid loss 4.45, valid ppl 85.31
[Epoch 258 Batch 200/372] loss 3.90, ppl 49.54, throughput 356.39 samples/s, lr 0.03
[Epoch 258] throughput 24522.92 samples/s
[Epoch 258] time cost 126.31s, valid loss 4.45, valid ppl 85.32
Learning rate after interval update 0.003000
[Epoch 259 Batch 200/372] loss 3.92, ppl 50.22, throughput 350.06 samples/s, lr 0.00
[Epoch 259] throughput 24129.99 samples/s
[Epoch 259] time cost 128.65s, valid loss 4.45, valid ppl 85.32
[Epoch 260 Batch 200/372] loss 3.91, ppl 49.95, throughput 237.31 samples/s, lr 0.00
[Epoch 260] throughput 19194.26 samples/s
[Epoch 260] time cost 150.45s, valid loss 4.45, valid ppl 85.32
[Epoch 261 Batch 200/372] loss 3.91, ppl 49.99, throughput 352.67 samples/s, lr 0.00
[Epoch 261] throughput 23932.89 samples/s
[Epoch 261] time cost 128.88s, valid loss 4.45, valid ppl 85.32
[Epoch 262 Batch 200/372] loss 3.92, ppl 50.29, throughput 351.31 samples/s, lr 0.00
[Epoch 262] throughput 24302.05 samples/s
[Epoch 262] time cost 126.67s, valid loss 4.45, valid ppl 85.31
[Epoch 263 Batch 200/372] loss 3.92, ppl 50.34, throughput 181.92 samples/s, lr 0.00
[Epoch 263] throughput 15497.56 samples/s
[Epoch 263] time cost 176.05s, valid loss 4.45, valid ppl 85.31
[Epoch 264 Batch 200/372] loss 3.91, ppl 49.82, throughput 360.73 samples/s, lr 0.00
[Epoch 264] throughput 24326.79 samples/s
[Epoch 264] time cost 126.97s, valid loss 4.45, valid ppl 85.31
[Epoch 265 Batch 200/372] loss 3.91, ppl 49.80, throughput 360.20 samples/s, lr 0.00
[Epoch 265] throughput 24403.80 samples/s
[Epoch 265] time cost 126.25s, valid loss 4.45, valid ppl 85.31
[Epoch 266 Batch 200/372] loss 3.91, ppl 49.81, throughput 362.11 samples/s, lr 0.00
[Epoch 266] throughput 19183.41 samples/s
[Epoch 266] time cost 149.98s, valid loss 4.45, valid ppl 85.31
[Epoch 267 Batch 200/372] loss 3.91, ppl 49.82, throughput 352.72 samples/s, lr 0.00
[Epoch 267] throughput 24405.98 samples/s
[Epoch 267] time cost 127.32s, valid loss 4.45, valid ppl 85.31
[Epoch 268 Batch 200/372] loss 3.91, ppl 50.00, throughput 354.73 samples/s, lr 0.00
[Epoch 268] throughput 23990.45 samples/s
[Epoch 268] time cost 129.54s, valid loss 4.45, valid ppl 85.31
[Epoch 269 Batch 200/372] loss 3.91, ppl 50.10, throughput 366.04 samples/s, lr 0.00
[Epoch 269] throughput 19208.53 samples/s
[Epoch 269] time cost 149.87s, valid loss 4.45, valid ppl 85.31
[Epoch 270 Batch 200/372] loss 3.91, ppl 50.02, throughput 360.55 samples/s, lr 0.00
[Epoch 270] throughput 24242.95 samples/s
[Epoch 270] time cost 127.90s, valid loss 4.45, valid ppl 85.31
[Epoch 271 Batch 200/372] loss 3.91, ppl 49.87, throughput 357.34 samples/s, lr 0.00
[Epoch 271] throughput 24433.16 samples/s
[Epoch 271] time cost 126.66s, valid loss 4.45, valid ppl 85.31
[Epoch 272 Batch 200/372] loss 3.91, ppl 49.98, throughput 351.87 samples/s, lr 0.00
[Epoch 272] throughput 19113.59 samples/s
[Epoch 272] time cost 150.98s, valid loss 4.45, valid ppl 85.31
[Epoch 273 Batch 200/372] loss 3.91, ppl 50.09, throughput 351.38 samples/s, lr 0.00
[Epoch 273] throughput 24529.60 samples/s
[Epoch 273] time cost 127.49s, valid loss 4.45, valid ppl 85.31
[Epoch 274 Batch 200/372] loss 3.92, ppl 50.21, throughput 359.21 samples/s, lr 0.00
[Epoch 274] throughput 24171.03 samples/s
[Epoch 274] time cost 127.81s, valid loss 4.45, valid ppl 85.31
[Epoch 275 Batch 200/372] loss 3.92, ppl 50.35, throughput 362.64 samples/s, lr 0.00
[Epoch 275] throughput 19284.59 samples/s
[Epoch 275] time cost 150.09s, valid loss 4.45, valid ppl 85.31
[Epoch 276 Batch 200/372] loss 3.92, ppl 50.22, throughput 357.81 samples/s, lr 0.00
[Epoch 276] throughput 24360.96 samples/s
[Epoch 276] time cost 128.03s, valid loss 4.45, valid ppl 85.30
[Epoch 277 Batch 200/372] loss 3.92, ppl 50.22, throughput 354.23 samples/s, lr 0.00
[Epoch 277] throughput 24374.09 samples/s
[Epoch 277] time cost 127.07s, valid loss 4.45, valid ppl 85.30
[Epoch 278 Batch 200/372] loss 3.92, ppl 50.24, throughput 354.18 samples/s, lr 0.00
[Epoch 278] throughput 19313.51 samples/s
[Epoch 278] time cost 148.78s, valid loss 4.45, valid ppl 85.30
[Epoch 279 Batch 200/372] loss 3.91, ppl 50.12, throughput 355.69 samples/s, lr 0.00
[Epoch 279] throughput 23880.78 samples/s
[Epoch 279] time cost 129.85s, valid loss 4.45, valid ppl 85.30
[Epoch 280 Batch 200/372] loss 3.92, ppl 50.41, throughput 361.43 samples/s, lr 0.00
[Epoch 280] throughput 24261.73 samples/s
[Epoch 280] time cost 127.28s, valid loss 4.45, valid ppl 85.30
[Epoch 281 Batch 200/372] loss 3.92, ppl 50.37, throughput 352.67 samples/s, lr 0.00
[Epoch 281] throughput 19031.46 samples/s
[Epoch 281] time cost 151.99s, valid loss 4.45, valid ppl 85.30
[Epoch 282 Batch 200/372] loss 3.91, ppl 49.90, throughput 365.89 samples/s, lr 0.00
[Epoch 282] throughput 24944.27 samples/s
[Epoch 282] time cost 125.09s, valid loss 4.45, valid ppl 85.30
[Epoch 283 Batch 200/372] loss 3.92, ppl 50.37, throughput 355.47 samples/s, lr 0.00
[Epoch 283] throughput 24119.70 samples/s
[Epoch 283] time cost 128.40s, valid loss 4.45, valid ppl 85.30
[Epoch 284 Batch 200/372] loss 3.92, ppl 50.20, throughput 302.33 samples/s, lr 0.00
[Epoch 284] throughput 15400.57 samples/s
[Epoch 284] time cost 176.71s, valid loss 4.45, valid ppl 85.30
[Epoch 285 Batch 200/372] loss 3.91, ppl 49.90, throughput 354.81 samples/s, lr 0.00
[Epoch 285] throughput 24116.58 samples/s
[Epoch 285] time cost 128.34s, valid loss 4.45, valid ppl 85.30
[Epoch 286 Batch 200/372] loss 3.92, ppl 50.29, throughput 361.66 samples/s, lr 0.00
[Epoch 286] throughput 24460.15 samples/s
[Epoch 286] time cost 126.88s, valid loss 4.45, valid ppl 85.30
[Epoch 287 Batch 200/372] loss 3.91, ppl 50.05, throughput 363.29 samples/s, lr 0.00
[Epoch 287] throughput 19139.30 samples/s
[Epoch 287] time cost 150.33s, valid loss 4.45, valid ppl 85.30
[Epoch 288 Batch 200/372] loss 3.91, ppl 49.88, throughput 362.36 samples/s, lr 0.00
[Epoch 288] throughput 24586.43 samples/s
[Epoch 288] time cost 125.60s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000300
[Epoch 289 Batch 200/372] loss 3.91, ppl 50.01, throughput 364.97 samples/s, lr 0.00
[Epoch 289] throughput 24663.31 samples/s
[Epoch 289] time cost 126.03s, valid loss 4.45, valid ppl 85.30
[Epoch 290 Batch 200/372] loss 3.91, ppl 49.83, throughput 354.53 samples/s, lr 0.00
[Epoch 290] throughput 16665.41 samples/s
[Epoch 290] time cost 168.24s, valid loss 4.45, valid ppl 85.30
[Epoch 291 Batch 200/372] loss 3.91, ppl 49.92, throughput 353.79 samples/s, lr 0.00
[Epoch 291] throughput 23903.64 samples/s
[Epoch 291] time cost 129.60s, valid loss 4.45, valid ppl 85.30
[Epoch 292 Batch 200/372] loss 3.92, ppl 50.20, throughput 360.15 samples/s, lr 0.00
[Epoch 292] throughput 24290.97 samples/s
[Epoch 292] time cost 127.00s, valid loss 4.45, valid ppl 85.30
[Epoch 293 Batch 200/372] loss 3.91, ppl 50.13, throughput 348.95 samples/s, lr 0.00
[Epoch 293] throughput 19062.87 samples/s
[Epoch 293] time cost 150.75s, valid loss 4.45, valid ppl 85.30
[Epoch 294 Batch 200/372] loss 3.92, ppl 50.28, throughput 355.03 samples/s, lr 0.00
[Epoch 294] throughput 24071.92 samples/s
[Epoch 294] time cost 128.18s, valid loss 4.45, valid ppl 85.30
[Epoch 295 Batch 200/372] loss 3.91, ppl 49.72, throughput 353.12 samples/s, lr 0.00
[Epoch 295] throughput 24172.44 samples/s
[Epoch 295] time cost 127.73s, valid loss 4.45, valid ppl 85.30
[Epoch 296 Batch 200/372] loss 3.91, ppl 50.01, throughput 357.08 samples/s, lr 0.00
[Epoch 296] throughput 19427.14 samples/s
[Epoch 296] time cost 149.58s, valid loss 4.45, valid ppl 85.30
[Epoch 297 Batch 200/372] loss 3.91, ppl 50.14, throughput 360.13 samples/s, lr 0.00
[Epoch 297] throughput 24449.05 samples/s
[Epoch 297] time cost 126.59s, valid loss 4.45, valid ppl 85.30
[Epoch 298 Batch 200/372] loss 3.92, ppl 50.54, throughput 350.65 samples/s, lr 0.00
[Epoch 298] throughput 23826.94 samples/s
[Epoch 298] time cost 129.04s, valid loss 4.45, valid ppl 85.30
[Epoch 299 Batch 200/372] loss 3.91, ppl 49.96, throughput 358.45 samples/s, lr 0.00
[Epoch 299] throughput 20728.28 samples/s
[Epoch 299] time cost 142.31s, valid loss 4.45, valid ppl 85.30
[Epoch 300 Batch 200/372] loss 3.91, ppl 50.06, throughput 355.80 samples/s, lr 0.00
[Epoch 300] throughput 24271.13 samples/s
[Epoch 300] time cost 128.18s, valid loss 4.45, valid ppl 85.30
[Epoch 301 Batch 200/372] loss 3.91, ppl 49.90, throughput 360.43 samples/s, lr 0.00
[Epoch 301] throughput 24610.32 samples/s
[Epoch 301] time cost 127.05s, valid loss 4.45, valid ppl 85.30
[Epoch 302 Batch 200/372] loss 3.91, ppl 50.03, throughput 357.05 samples/s, lr 0.00
[Epoch 302] throughput 23060.97 samples/s
[Epoch 302] time cost 132.43s, valid loss 4.45, valid ppl 85.30
[Epoch 303 Batch 200/372] loss 3.92, ppl 50.23, throughput 360.97 samples/s, lr 0.00
[Epoch 303] throughput 24036.93 samples/s
[Epoch 303] time cost 128.78s, valid loss 4.45, valid ppl 85.30
[Epoch 304 Batch 200/372] loss 3.91, ppl 49.87, throughput 367.32 samples/s, lr 0.00
[Epoch 304] throughput 24953.90 samples/s
[Epoch 304] time cost 125.62s, valid loss 4.45, valid ppl 85.30
[Epoch 305 Batch 200/372] loss 3.90, ppl 49.55, throughput 369.54 samples/s, lr 0.00
[Epoch 305] throughput 23187.13 samples/s
[Epoch 305] time cost 133.42s, valid loss 4.45, valid ppl 85.30
[Epoch 306 Batch 200/372] loss 3.92, ppl 50.23, throughput 349.88 samples/s, lr 0.00
[Epoch 306] throughput 24231.45 samples/s
[Epoch 306] time cost 127.41s, valid loss 4.45, valid ppl 85.30
[Epoch 307 Batch 200/372] loss 3.91, ppl 50.10, throughput 359.79 samples/s, lr 0.00
[Epoch 307] throughput 24527.88 samples/s
[Epoch 307] time cost 126.57s, valid loss 4.45, valid ppl 85.30
[Epoch 308 Batch 200/372] loss 3.91, ppl 49.85, throughput 363.78 samples/s, lr 0.00
[Epoch 308] throughput 22804.68 samples/s
[Epoch 308] time cost 134.06s, valid loss 4.45, valid ppl 85.30
[Epoch 309 Batch 200/372] loss 3.91, ppl 50.05, throughput 355.28 samples/s, lr 0.00
[Epoch 309] throughput 24228.43 samples/s
[Epoch 309] time cost 127.76s, valid loss 4.45, valid ppl 85.30
[Epoch 310 Batch 200/372] loss 3.91, ppl 50.11, throughput 354.11 samples/s, lr 0.00
[Epoch 310] throughput 24167.73 samples/s
[Epoch 310] time cost 127.68s, valid loss 4.45, valid ppl 85.30
[Epoch 311 Batch 200/372] loss 3.92, ppl 50.20, throughput 354.91 samples/s, lr 0.00
[Epoch 311] throughput 24483.57 samples/s
[Epoch 311] time cost 127.20s, valid loss 4.45, valid ppl 85.30
[Epoch 312 Batch 200/372] loss 3.92, ppl 50.16, throughput 339.87 samples/s, lr 0.00
[Epoch 312] throughput 23852.04 samples/s
[Epoch 312] time cost 128.72s, valid loss 4.45, valid ppl 85.30
[Epoch 313 Batch 200/372] loss 3.91, ppl 49.86, throughput 356.26 samples/s, lr 0.00
[Epoch 313] throughput 24766.64 samples/s
[Epoch 313] time cost 125.63s, valid loss 4.45, valid ppl 85.30
[Epoch 314 Batch 200/372] loss 3.91, ppl 49.80, throughput 353.88 samples/s, lr 0.00
[Epoch 314] throughput 24310.37 samples/s
[Epoch 314] time cost 128.51s, valid loss 4.45, valid ppl 85.30
[Epoch 315 Batch 200/372] loss 3.91, ppl 49.76, throughput 333.19 samples/s, lr 0.00
[Epoch 315] throughput 23339.45 samples/s
[Epoch 315] time cost 130.10s, valid loss 4.45, valid ppl 85.30
[Epoch 316 Batch 200/372] loss 3.91, ppl 50.00, throughput 395.30 samples/s, lr 0.00
[Epoch 316] throughput 26806.57 samples/s
[Epoch 316] time cost 119.02s, valid loss 4.45, valid ppl 85.30
[Epoch 317 Batch 200/372] loss 3.91, ppl 50.04, throughput 389.30 samples/s, lr 0.00
[Epoch 317] throughput 26743.87 samples/s
[Epoch 317] time cost 120.93s, valid loss 4.45, valid ppl 85.30
[Epoch 318 Batch 200/372] loss 3.92, ppl 50.26, throughput 256.38 samples/s, lr 0.00
[Epoch 318] throughput 20051.21 samples/s
[Epoch 318] time cost 146.04s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000030
[Epoch 319 Batch 200/372] loss 3.92, ppl 50.17, throughput 363.98 samples/s, lr 0.00
[Epoch 319] throughput 24603.48 samples/s
[Epoch 319] time cost 127.02s, valid loss 4.45, valid ppl 85.30
[Epoch 320 Batch 200/372] loss 3.91, ppl 50.11, throughput 364.75 samples/s, lr 0.00
[Epoch 320] throughput 25077.98 samples/s
[Epoch 320] time cost 124.66s, valid loss 4.45, valid ppl 85.30
[Epoch 321 Batch 200/372] loss 3.91, ppl 50.00, throughput 241.92 samples/s, lr 0.00
[Epoch 321] throughput 19432.43 samples/s
[Epoch 321] time cost 148.87s, valid loss 4.45, valid ppl 85.30
[Epoch 322 Batch 200/372] loss 3.91, ppl 49.79, throughput 359.29 samples/s, lr 0.00
[Epoch 322] throughput 24427.52 samples/s
[Epoch 322] time cost 126.68s, valid loss 4.45, valid ppl 85.30
[Epoch 323 Batch 200/372] loss 3.92, ppl 50.16, throughput 360.21 samples/s, lr 0.00
[Epoch 323] throughput 24374.28 samples/s
[Epoch 323] time cost 127.61s, valid loss 4.45, valid ppl 85.30
[Epoch 324 Batch 200/372] loss 3.91, ppl 49.87, throughput 298.31 samples/s, lr 0.00
[Epoch 324] throughput 22288.95 samples/s
[Epoch 324] time cost 134.95s, valid loss 4.45, valid ppl 85.30
[Epoch 325 Batch 200/372] loss 3.91, ppl 49.88, throughput 353.12 samples/s, lr 0.00
[Epoch 325] throughput 24087.91 samples/s
[Epoch 325] time cost 128.47s, valid loss 4.45, valid ppl 85.30
[Epoch 326 Batch 200/372] loss 3.92, ppl 50.33, throughput 357.39 samples/s, lr 0.00
[Epoch 326] throughput 24083.01 samples/s
[Epoch 326] time cost 129.25s, valid loss 4.45, valid ppl 85.30
[Epoch 327 Batch 200/372] loss 3.91, ppl 50.04, throughput 213.41 samples/s, lr 0.00
[Epoch 327] throughput 18040.80 samples/s
[Epoch 327] time cost 157.85s, valid loss 4.45, valid ppl 85.30
[Epoch 328 Batch 200/372] loss 3.92, ppl 50.21, throughput 354.69 samples/s, lr 0.00
[Epoch 328] throughput 24850.32 samples/s
[Epoch 328] time cost 126.33s, valid loss 4.45, valid ppl 85.30
[Epoch 329 Batch 200/372] loss 3.91, ppl 50.01, throughput 358.68 samples/s, lr 0.00
[Epoch 329] throughput 24676.37 samples/s
[Epoch 329] time cost 125.77s, valid loss 4.45, valid ppl 85.30
[Epoch 330 Batch 200/372] loss 3.91, ppl 50.12, throughput 186.94 samples/s, lr 0.00
[Epoch 330] throughput 15730.61 samples/s
[Epoch 330] time cost 173.66s, valid loss 4.45, valid ppl 85.30
[Epoch 331 Batch 200/372] loss 3.92, ppl 50.15, throughput 366.97 samples/s, lr 0.00
[Epoch 331] throughput 25016.47 samples/s
[Epoch 331] time cost 124.64s, valid loss 4.45, valid ppl 85.30
[Epoch 332 Batch 200/372] loss 3.91, ppl 49.93, throughput 347.38 samples/s, lr 0.00
[Epoch 332] throughput 23793.27 samples/s
[Epoch 332] time cost 128.96s, valid loss 4.45, valid ppl 85.30
[Epoch 333 Batch 200/372] loss 3.91, ppl 49.67, throughput 234.32 samples/s, lr 0.00
[Epoch 333] throughput 15623.63 samples/s
[Epoch 333] time cost 175.12s, valid loss 4.45, valid ppl 85.30
[Epoch 334 Batch 200/372] loss 3.91, ppl 49.96, throughput 366.82 samples/s, lr 0.00
[Epoch 334] throughput 24899.13 samples/s
[Epoch 334] time cost 125.35s, valid loss 4.45, valid ppl 85.30
[Epoch 335 Batch 200/372] loss 3.91, ppl 50.13, throughput 369.31 samples/s, lr 0.00
[Epoch 335] throughput 24937.98 samples/s
[Epoch 335] time cost 124.85s, valid loss 4.45, valid ppl 85.30
[Epoch 336 Batch 200/372] loss 3.91, ppl 49.95, throughput 268.63 samples/s, lr 0.00
[Epoch 336] throughput 15765.94 samples/s
[Epoch 336] time cost 173.19s, valid loss 4.45, valid ppl 85.30
[Epoch 337 Batch 200/372] loss 3.91, ppl 50.12, throughput 361.38 samples/s, lr 0.00
[Epoch 337] throughput 24510.06 samples/s
[Epoch 337] time cost 126.49s, valid loss 4.45, valid ppl 85.30
[Epoch 338 Batch 200/372] loss 3.91, ppl 49.74, throughput 357.01 samples/s, lr 0.00
[Epoch 338] throughput 24273.33 samples/s
[Epoch 338] time cost 127.54s, valid loss 4.45, valid ppl 85.30
[Epoch 339 Batch 200/372] loss 3.92, ppl 50.27, throughput 359.03 samples/s, lr 0.00
[Epoch 339] throughput 19196.44 samples/s
[Epoch 339] time cost 149.83s, valid loss 4.45, valid ppl 85.30
[Epoch 340 Batch 200/372] loss 3.92, ppl 50.38, throughput 355.75 samples/s, lr 0.00
[Epoch 340] throughput 24408.44 samples/s
[Epoch 340] time cost 127.33s, valid loss 4.45, valid ppl 85.30
[Epoch 341 Batch 200/372] loss 3.91, ppl 49.86, throughput 370.21 samples/s, lr 0.00
[Epoch 341] throughput 25150.43 samples/s
[Epoch 341] time cost 124.89s, valid loss 4.45, valid ppl 85.30
[Epoch 342 Batch 200/372] loss 3.91, ppl 50.01, throughput 367.60 samples/s, lr 0.00
[Epoch 342] throughput 19642.87 samples/s
[Epoch 342] time cost 147.72s, valid loss 4.45, valid ppl 85.30
[Epoch 343 Batch 200/372] loss 3.91, ppl 50.03, throughput 365.93 samples/s, lr 0.00
[Epoch 343] throughput 25018.28 samples/s
[Epoch 343] time cost 124.89s, valid loss 4.45, valid ppl 85.30
[Epoch 344 Batch 200/372] loss 3.91, ppl 49.97, throughput 369.25 samples/s, lr 0.00
[Epoch 344] throughput 24921.88 samples/s
[Epoch 344] time cost 124.86s, valid loss 4.45, valid ppl 85.30
[Epoch 345 Batch 200/372] loss 3.91, ppl 50.02, throughput 352.77 samples/s, lr 0.00
[Epoch 345] throughput 19223.19 samples/s
[Epoch 345] time cost 149.44s, valid loss 4.45, valid ppl 85.30
[Epoch 346 Batch 200/372] loss 3.91, ppl 49.84, throughput 360.37 samples/s, lr 0.00
[Epoch 346] throughput 24317.39 samples/s
[Epoch 346] time cost 127.08s, valid loss 4.45, valid ppl 85.30
[Epoch 347 Batch 200/372] loss 3.91, ppl 50.10, throughput 362.43 samples/s, lr 0.00
[Epoch 347] throughput 24331.25 samples/s
[Epoch 347] time cost 127.09s, valid loss 4.45, valid ppl 85.30
[Epoch 348 Batch 200/372] loss 3.91, ppl 50.05, throughput 247.66 samples/s, lr 0.00
[Epoch 348] throughput 19122.14 samples/s
[Epoch 348] time cost 150.26s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000003
[Epoch 349 Batch 200/372] loss 3.91, ppl 49.97, throughput 369.62 samples/s, lr 0.00
[Epoch 349] throughput 24488.03 samples/s
[Epoch 349] time cost 126.42s, valid loss 4.45, valid ppl 85.30
[Epoch 350 Batch 200/372] loss 3.91, ppl 50.06, throughput 362.75 samples/s, lr 0.00
[Epoch 350] throughput 24101.63 samples/s
[Epoch 350] time cost 127.63s, valid loss 4.45, valid ppl 85.30
[Epoch 351 Batch 200/372] loss 3.91, ppl 49.82, throughput 251.20 samples/s, lr 0.00
[Epoch 351] throughput 19059.06 samples/s
[Epoch 351] time cost 151.35s, valid loss 4.45, valid ppl 85.30
[Epoch 352 Batch 200/372] loss 3.91, ppl 49.73, throughput 359.90 samples/s, lr 0.00
[Epoch 352] throughput 24134.87 samples/s
[Epoch 352] time cost 127.58s, valid loss 4.45, valid ppl 85.30
[Epoch 353 Batch 200/372] loss 3.91, ppl 49.85, throughput 358.01 samples/s, lr 0.00
[Epoch 353] throughput 24471.36 samples/s
[Epoch 353] time cost 126.74s, valid loss 4.45, valid ppl 85.30
[Epoch 354 Batch 200/372] loss 3.91, ppl 49.93, throughput 271.39 samples/s, lr 0.00
[Epoch 354] throughput 19025.39 samples/s
[Epoch 354] time cost 151.02s, valid loss 4.45, valid ppl 85.30
[Epoch 355 Batch 200/372] loss 3.91, ppl 49.90, throughput 359.77 samples/s, lr 0.00
[Epoch 355] throughput 24340.77 samples/s
[Epoch 355] time cost 127.25s, valid loss 4.45, valid ppl 85.30
[Epoch 356 Batch 200/372] loss 3.91, ppl 50.01, throughput 360.05 samples/s, lr 0.00
[Epoch 356] throughput 24564.60 samples/s
[Epoch 356] time cost 126.99s, valid loss 4.45, valid ppl 85.30
[Epoch 357 Batch 200/372] loss 3.91, ppl 50.09, throughput 273.31 samples/s, lr 0.00
[Epoch 357] throughput 15692.39 samples/s
[Epoch 357] time cost 174.93s, valid loss 4.45, valid ppl 85.30
[Epoch 358 Batch 200/372] loss 3.91, ppl 49.76, throughput 367.65 samples/s, lr 0.00
[Epoch 358] throughput 24796.55 samples/s
[Epoch 358] time cost 126.07s, valid loss 4.45, valid ppl 85.30
[Epoch 359 Batch 200/372] loss 3.92, ppl 50.18, throughput 354.45 samples/s, lr 0.00
[Epoch 359] throughput 23899.56 samples/s
[Epoch 359] time cost 128.50s, valid loss 4.45, valid ppl 85.30
[Epoch 360 Batch 200/372] loss 3.92, ppl 50.20, throughput 359.67 samples/s, lr 0.00
[Epoch 360] throughput 18016.56 samples/s
[Epoch 360] time cost 157.60s, valid loss 4.45, valid ppl 85.30
[Epoch 361 Batch 200/372] loss 3.91, ppl 49.87, throughput 352.79 samples/s, lr 0.00
[Epoch 361] throughput 23834.78 samples/s
[Epoch 361] time cost 129.22s, valid loss 4.45, valid ppl 85.30
[Epoch 362 Batch 200/372] loss 3.91, ppl 50.13, throughput 347.29 samples/s, lr 0.00
[Epoch 362] throughput 24251.22 samples/s
[Epoch 362] time cost 127.10s, valid loss 4.45, valid ppl 85.30
[Epoch 363 Batch 200/372] loss 3.91, ppl 50.03, throughput 355.92 samples/s, lr 0.00
[Epoch 363] throughput 23815.33 samples/s
[Epoch 363] time cost 130.06s, valid loss 4.45, valid ppl 85.30
[Epoch 364 Batch 200/372] loss 3.91, ppl 50.08, throughput 327.23 samples/s, lr 0.00
[Epoch 364] throughput 23330.53 samples/s
[Epoch 364] time cost 131.24s, valid loss 4.45, valid ppl 85.30
[Epoch 365 Batch 200/372] loss 3.91, ppl 49.98, throughput 359.34 samples/s, lr 0.00
[Epoch 365] throughput 24407.25 samples/s
[Epoch 365] time cost 127.07s, valid loss 4.45, valid ppl 85.30
[Epoch 366 Batch 200/372] loss 3.91, ppl 49.96, throughput 359.32 samples/s, lr 0.00
[Epoch 366] throughput 22972.50 samples/s
[Epoch 366] time cost 134.29s, valid loss 4.45, valid ppl 85.30
[Epoch 367 Batch 200/372] loss 3.92, ppl 50.21, throughput 357.03 samples/s, lr 0.00
[Epoch 367] throughput 24232.52 samples/s
[Epoch 367] time cost 127.93s, valid loss 4.45, valid ppl 85.30
[Epoch 368 Batch 200/372] loss 3.92, ppl 50.19, throughput 353.66 samples/s, lr 0.00
[Epoch 368] throughput 24018.44 samples/s
[Epoch 368] time cost 128.42s, valid loss 4.45, valid ppl 85.30
[Epoch 369 Batch 200/372] loss 3.92, ppl 50.27, throughput 363.26 samples/s, lr 0.00
[Epoch 369] throughput 22260.76 samples/s
[Epoch 369] time cost 136.62s, valid loss 4.45, valid ppl 85.30
[Epoch 370 Batch 200/372] loss 3.91, ppl 50.09, throughput 242.61 samples/s, lr 0.00
[Epoch 370] throughput 19449.27 samples/s
[Epoch 370] time cost 148.59s, valid loss 4.45, valid ppl 85.30
[Epoch 371 Batch 200/372] loss 3.91, ppl 50.07, throughput 359.03 samples/s, lr 0.00
[Epoch 371] throughput 24368.34 samples/s
[Epoch 371] time cost 126.92s, valid loss 4.45, valid ppl 85.30
[Epoch 372 Batch 200/372] loss 3.91, ppl 50.09, throughput 359.81 samples/s, lr 0.00
[Epoch 372] throughput 24292.01 samples/s
[Epoch 372] time cost 127.43s, valid loss 4.45, valid ppl 85.30
[Epoch 373 Batch 200/372] loss 3.91, ppl 50.09, throughput 234.56 samples/s, lr 0.00
[Epoch 373] throughput 19183.03 samples/s
[Epoch 373] time cost 150.35s, valid loss 4.45, valid ppl 85.30
[Epoch 374 Batch 200/372] loss 3.92, ppl 50.33, throughput 363.33 samples/s, lr 0.00
[Epoch 374] throughput 24353.18 samples/s
[Epoch 374] time cost 127.03s, valid loss 4.45, valid ppl 85.30
[Epoch 375 Batch 200/372] loss 3.92, ppl 50.25, throughput 358.22 samples/s, lr 0.00
[Epoch 375] throughput 24246.01 samples/s
[Epoch 375] time cost 127.55s, valid loss 4.45, valid ppl 85.30
[Epoch 376 Batch 200/372] loss 3.91, ppl 49.92, throughput 237.30 samples/s, lr 0.00
[Epoch 376] throughput 19193.65 samples/s
[Epoch 376] time cost 150.19s, valid loss 4.45, valid ppl 85.30
[Epoch 377 Batch 200/372] loss 3.91, ppl 49.95, throughput 372.70 samples/s, lr 0.00
[Epoch 377] throughput 25259.92 samples/s
[Epoch 377] time cost 124.39s, valid loss 4.45, valid ppl 85.30
[Epoch 378 Batch 200/372] loss 3.91, ppl 50.11, throughput 363.40 samples/s, lr 0.00
[Epoch 378] throughput 24556.58 samples/s
[Epoch 378] time cost 126.86s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 379 Batch 200/372] loss 3.92, ppl 50.17, throughput 241.77 samples/s, lr 0.00
[Epoch 379] throughput 19405.06 samples/s
[Epoch 379] time cost 149.13s, valid loss 4.45, valid ppl 85.30
[Epoch 380 Batch 200/372] loss 3.92, ppl 50.18, throughput 358.28 samples/s, lr 0.00
[Epoch 380] throughput 24547.05 samples/s
[Epoch 380] time cost 126.30s, valid loss 4.45, valid ppl 85.30
[Epoch 381 Batch 200/372] loss 3.92, ppl 50.20, throughput 360.50 samples/s, lr 0.00
[Epoch 381] throughput 24346.79 samples/s
[Epoch 381] time cost 126.46s, valid loss 4.45, valid ppl 85.30
[Epoch 382 Batch 200/372] loss 3.91, ppl 49.95, throughput 338.60 samples/s, lr 0.00
[Epoch 382] throughput 19018.17 samples/s
[Epoch 382] time cost 151.97s, valid loss 4.45, valid ppl 85.30
[Epoch 383 Batch 200/372] loss 3.91, ppl 50.04, throughput 354.85 samples/s, lr 0.00
[Epoch 383] throughput 24447.80 samples/s
[Epoch 383] time cost 126.29s, valid loss 4.45, valid ppl 85.30
[Epoch 384 Batch 200/372] loss 3.91, ppl 49.82, throughput 356.64 samples/s, lr 0.00
[Epoch 384] throughput 24523.53 samples/s
[Epoch 384] time cost 126.84s, valid loss 4.45, valid ppl 85.30
[Epoch 385 Batch 200/372] loss 3.91, ppl 50.10, throughput 355.02 samples/s, lr 0.00
[Epoch 385] throughput 18996.47 samples/s
[Epoch 385] time cost 151.21s, valid loss 4.45, valid ppl 85.30
[Epoch 386 Batch 200/372] loss 3.92, ppl 50.17, throughput 357.65 samples/s, lr 0.00
[Epoch 386] throughput 24081.49 samples/s
[Epoch 386] time cost 128.01s, valid loss 4.45, valid ppl 85.30
[Epoch 387 Batch 200/372] loss 3.92, ppl 50.21, throughput 368.85 samples/s, lr 0.00
[Epoch 387] throughput 25546.96 samples/s
[Epoch 387] time cost 123.00s, valid loss 4.45, valid ppl 85.30
[Epoch 388 Batch 200/372] loss 3.92, ppl 50.16, throughput 374.26 samples/s, lr 0.00
[Epoch 388] throughput 22454.57 samples/s
[Epoch 388] time cost 135.38s, valid loss 4.45, valid ppl 85.30
[Epoch 389 Batch 200/372] loss 3.92, ppl 50.34, throughput 246.20 samples/s, lr 0.00
[Epoch 389] throughput 19450.43 samples/s
[Epoch 389] time cost 148.28s, valid loss 4.45, valid ppl 85.30
[Epoch 390 Batch 200/372] loss 3.92, ppl 50.16, throughput 371.89 samples/s, lr 0.00
[Epoch 390] throughput 25628.83 samples/s
[Epoch 390] time cost 122.98s, valid loss 4.45, valid ppl 85.30
[Epoch 391 Batch 200/372] loss 3.91, ppl 49.83, throughput 389.80 samples/s, lr 0.00
[Epoch 391] throughput 26366.68 samples/s
[Epoch 391] time cost 120.97s, valid loss 4.45, valid ppl 85.30
[Epoch 392 Batch 200/372] loss 3.91, ppl 50.14, throughput 245.20 samples/s, lr 0.00
[Epoch 392] throughput 20259.41 samples/s
[Epoch 392] time cost 143.90s, valid loss 4.45, valid ppl 85.30
[Epoch 393 Batch 200/372] loss 3.91, ppl 49.82, throughput 358.51 samples/s, lr 0.00
[Epoch 393] throughput 24359.08 samples/s
[Epoch 393] time cost 126.97s, valid loss 4.45, valid ppl 85.30
[Epoch 394 Batch 200/372] loss 3.91, ppl 49.88, throughput 359.11 samples/s, lr 0.00
[Epoch 394] throughput 24319.17 samples/s
[Epoch 394] time cost 127.73s, valid loss 4.45, valid ppl 85.30
[Epoch 395 Batch 200/372] loss 3.91, ppl 49.85, throughput 238.27 samples/s, lr 0.00
[Epoch 395] throughput 19390.26 samples/s
[Epoch 395] time cost 149.08s, valid loss 4.45, valid ppl 85.30
[Epoch 396 Batch 200/372] loss 3.91, ppl 49.85, throughput 359.21 samples/s, lr 0.00
[Epoch 396] throughput 24318.51 samples/s
[Epoch 396] time cost 127.72s, valid loss 4.45, valid ppl 85.30
[Epoch 397 Batch 200/372] loss 3.91, ppl 50.11, throughput 372.26 samples/s, lr 0.00
[Epoch 397] throughput 24711.54 samples/s
[Epoch 397] time cost 126.48s, valid loss 4.45, valid ppl 85.30
[Epoch 398 Batch 200/372] loss 3.91, ppl 49.78, throughput 233.51 samples/s, lr 0.00
[Epoch 398] throughput 19081.50 samples/s
[Epoch 398] time cost 150.24s, valid loss 4.45, valid ppl 85.30
[Epoch 399 Batch 200/372] loss 3.91, ppl 49.91, throughput 356.15 samples/s, lr 0.00
[Epoch 399] throughput 24199.19 samples/s
[Epoch 399] time cost 127.88s, valid loss 4.45, valid ppl 85.30
[Epoch 400 Batch 200/372] loss 3.91, ppl 49.77, throughput 360.38 samples/s, lr 0.00
[Epoch 400] throughput 24860.10 samples/s
[Epoch 400] time cost 125.80s, valid loss 4.45, valid ppl 85.30
[Epoch 401 Batch 200/372] loss 3.91, ppl 50.02, throughput 241.55 samples/s, lr 0.00
[Epoch 401] throughput 19381.79 samples/s
[Epoch 401] time cost 149.90s, valid loss 4.45, valid ppl 85.30
[Epoch 402 Batch 200/372] loss 3.91, ppl 50.09, throughput 358.99 samples/s, lr 0.00
[Epoch 402] throughput 24169.02 samples/s
[Epoch 402] time cost 127.65s, valid loss 4.45, valid ppl 85.30
[Epoch 403 Batch 200/372] loss 3.91, ppl 49.88, throughput 357.30 samples/s, lr 0.00
[Epoch 403] throughput 24337.89 samples/s
[Epoch 403] time cost 127.12s, valid loss 4.45, valid ppl 85.30
[Epoch 404 Batch 200/372] loss 3.92, ppl 50.41, throughput 238.50 samples/s, lr 0.00
[Epoch 404] throughput 19199.41 samples/s
[Epoch 404] time cost 149.95s, valid loss 4.45, valid ppl 85.30
[Epoch 405 Batch 200/372] loss 3.91, ppl 50.03, throughput 356.27 samples/s, lr 0.00
[Epoch 405] throughput 24089.21 samples/s
[Epoch 405] time cost 127.57s, valid loss 4.45, valid ppl 85.30
[Epoch 406 Batch 200/372] loss 3.92, ppl 50.33, throughput 348.00 samples/s, lr 0.00
[Epoch 406] throughput 24396.35 samples/s
[Epoch 406] time cost 127.04s, valid loss 4.45, valid ppl 85.30
[Epoch 407 Batch 200/372] loss 3.91, ppl 50.14, throughput 244.61 samples/s, lr 0.00
[Epoch 407] throughput 20196.77 samples/s
[Epoch 407] time cost 144.90s, valid loss 4.45, valid ppl 85.30
[Epoch 408 Batch 200/372] loss 3.91, ppl 49.89, throughput 359.75 samples/s, lr 0.00
[Epoch 408] throughput 24322.25 samples/s
[Epoch 408] time cost 127.45s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 409 Batch 200/372] loss 3.92, ppl 50.21, throughput 362.78 samples/s, lr 0.00
[Epoch 409] throughput 23891.52 samples/s
[Epoch 409] time cost 129.34s, valid loss 4.45, valid ppl 85.30
[Epoch 410 Batch 200/372] loss 3.92, ppl 50.21, throughput 242.35 samples/s, lr 0.00
[Epoch 410] throughput 19488.76 samples/s
[Epoch 410] time cost 148.05s, valid loss 4.45, valid ppl 85.30
[Epoch 411 Batch 200/372] loss 3.92, ppl 50.21, throughput 351.69 samples/s, lr 0.00
[Epoch 411] throughput 24251.09 samples/s
[Epoch 411] time cost 127.07s, valid loss 4.45, valid ppl 85.30
[Epoch 412 Batch 200/372] loss 3.91, ppl 49.96, throughput 362.38 samples/s, lr 0.00
[Epoch 412] throughput 24627.21 samples/s
[Epoch 412] time cost 125.90s, valid loss 4.45, valid ppl 85.30
[Epoch 413 Batch 200/372] loss 3.91, ppl 49.76, throughput 243.86 samples/s, lr 0.00
[Epoch 413] throughput 19261.10 samples/s
[Epoch 413] time cost 149.92s, valid loss 4.45, valid ppl 85.30
[Epoch 414 Batch 200/372] loss 3.92, ppl 50.19, throughput 363.04 samples/s, lr 0.00
[Epoch 414] throughput 24361.82 samples/s
[Epoch 414] time cost 127.68s, valid loss 4.45, valid ppl 85.30
[Epoch 415 Batch 200/372] loss 3.91, ppl 49.78, throughput 356.47 samples/s, lr 0.00
[Epoch 415] throughput 24947.24 samples/s
[Epoch 415] time cost 125.39s, valid loss 4.45, valid ppl 85.30
[Epoch 416 Batch 200/372] loss 3.91, ppl 50.10, throughput 336.49 samples/s, lr 0.00
[Epoch 416] throughput 15582.89 samples/s
[Epoch 416] time cost 175.73s, valid loss 4.45, valid ppl 85.30
[Epoch 417 Batch 200/372] loss 3.92, ppl 50.21, throughput 355.73 samples/s, lr 0.00
[Epoch 417] throughput 24439.26 samples/s
[Epoch 417] time cost 126.59s, valid loss 4.45, valid ppl 85.30
[Epoch 418 Batch 200/372] loss 3.92, ppl 50.51, throughput 354.08 samples/s, lr 0.00
[Epoch 418] throughput 23880.32 samples/s
[Epoch 418] time cost 129.02s, valid loss 4.45, valid ppl 85.30
[Epoch 419 Batch 200/372] loss 3.91, ppl 49.71, throughput 350.49 samples/s, lr 0.00
[Epoch 419] throughput 20017.98 samples/s
[Epoch 419] time cost 146.42s, valid loss 4.45, valid ppl 85.30
[Epoch 420 Batch 200/372] loss 3.91, ppl 49.69, throughput 365.85 samples/s, lr 0.00
[Epoch 420] throughput 24616.04 samples/s
[Epoch 420] time cost 126.17s, valid loss 4.45, valid ppl 85.30
[Epoch 421 Batch 200/372] loss 3.91, ppl 49.94, throughput 350.87 samples/s, lr 0.00
[Epoch 421] throughput 24352.15 samples/s
[Epoch 421] time cost 127.12s, valid loss 4.45, valid ppl 85.30
[Epoch 422 Batch 200/372] loss 3.91, ppl 50.10, throughput 355.31 samples/s, lr 0.00
[Epoch 422] throughput 19111.29 samples/s
[Epoch 422] time cost 151.21s, valid loss 4.45, valid ppl 85.30
[Epoch 423 Batch 200/372] loss 3.91, ppl 49.81, throughput 297.17 samples/s, lr 0.00
[Epoch 423] throughput 22408.02 samples/s
[Epoch 423] time cost 134.48s, valid loss 4.45, valid ppl 85.30
[Epoch 424 Batch 200/372] loss 3.91, ppl 50.12, throughput 386.16 samples/s, lr 0.00
[Epoch 424] throughput 26291.22 samples/s
[Epoch 424] time cost 120.68s, valid loss 4.45, valid ppl 85.30
[Epoch 425 Batch 200/372] loss 3.91, ppl 49.96, throughput 375.84 samples/s, lr 0.00
[Epoch 425] throughput 25579.89 samples/s
[Epoch 425] time cost 123.01s, valid loss 4.45, valid ppl 85.30
[Epoch 426 Batch 200/372] loss 3.92, ppl 50.33, throughput 244.57 samples/s, lr 0.00
[Epoch 426] throughput 20113.53 samples/s
[Epoch 426] time cost 145.11s, valid loss 4.45, valid ppl 85.30
[Epoch 427 Batch 200/372] loss 3.91, ppl 50.03, throughput 356.99 samples/s, lr 0.00
[Epoch 427] throughput 24317.63 samples/s
[Epoch 427] time cost 127.46s, valid loss 4.45, valid ppl 85.30
[Epoch 428 Batch 200/372] loss 3.91, ppl 49.67, throughput 351.63 samples/s, lr 0.00
[Epoch 428] throughput 24163.00 samples/s
[Epoch 428] time cost 127.67s, valid loss 4.45, valid ppl 85.30
[Epoch 429 Batch 200/372] loss 3.92, ppl 50.26, throughput 268.53 samples/s, lr 0.00
[Epoch 429] throughput 19142.99 samples/s
[Epoch 429] time cost 150.27s, valid loss 4.45, valid ppl 85.30
[Epoch 430 Batch 200/372] loss 3.92, ppl 50.17, throughput 359.00 samples/s, lr 0.00
[Epoch 430] throughput 24377.69 samples/s
[Epoch 430] time cost 127.43s, valid loss 4.45, valid ppl 85.30
[Epoch 431 Batch 200/372] loss 3.91, ppl 49.99, throughput 363.01 samples/s, lr 0.00
[Epoch 431] throughput 24211.32 samples/s
[Epoch 431] time cost 127.49s, valid loss 4.45, valid ppl 85.30
[Epoch 432 Batch 200/372] loss 3.91, ppl 50.14, throughput 267.37 samples/s, lr 0.00
[Epoch 432] throughput 19436.59 samples/s
[Epoch 432] time cost 148.71s, valid loss 4.45, valid ppl 85.30
[Epoch 433 Batch 200/372] loss 3.91, ppl 49.95, throughput 349.68 samples/s, lr 0.00
[Epoch 433] throughput 24207.65 samples/s
[Epoch 433] time cost 128.00s, valid loss 4.45, valid ppl 85.30
[Epoch 434 Batch 200/372] loss 3.91, ppl 49.90, throughput 351.11 samples/s, lr 0.00
[Epoch 434] throughput 24302.95 samples/s
[Epoch 434] time cost 127.37s, valid loss 4.45, valid ppl 85.30
[Epoch 435 Batch 200/372] loss 3.91, ppl 50.04, throughput 352.35 samples/s, lr 0.00
[Epoch 435] throughput 19269.26 samples/s
[Epoch 435] time cost 150.42s, valid loss 4.45, valid ppl 85.30
[Epoch 436 Batch 200/372] loss 3.91, ppl 49.97, throughput 362.52 samples/s, lr 0.00
[Epoch 436] throughput 24246.09 samples/s
[Epoch 436] time cost 127.84s, valid loss 4.45, valid ppl 85.30
[Epoch 437 Batch 200/372] loss 3.91, ppl 49.97, throughput 363.41 samples/s, lr 0.00
[Epoch 437] throughput 24500.76 samples/s
[Epoch 437] time cost 126.37s, valid loss 4.45, valid ppl 85.30
[Epoch 438 Batch 200/372] loss 3.91, ppl 50.08, throughput 320.97 samples/s, lr 0.00
[Epoch 438] throughput 19152.47 samples/s
[Epoch 438] time cost 150.35s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 439 Batch 200/372] loss 3.91, ppl 50.13, throughput 358.11 samples/s, lr 0.00
[Epoch 439] throughput 24402.17 samples/s
[Epoch 439] time cost 128.02s, valid loss 4.45, valid ppl 85.30
[Epoch 440 Batch 200/372] loss 3.90, ppl 49.46, throughput 353.86 samples/s, lr 0.00
[Epoch 440] throughput 24234.85 samples/s
[Epoch 440] time cost 128.15s, valid loss 4.45, valid ppl 85.30
[Epoch 441 Batch 200/372] loss 3.92, ppl 50.16, throughput 247.27 samples/s, lr 0.00
[Epoch 441] throughput 18979.96 samples/s
[Epoch 441] time cost 151.31s, valid loss 4.45, valid ppl 85.30
[Epoch 442 Batch 200/372] loss 3.92, ppl 50.34, throughput 348.76 samples/s, lr 0.00
[Epoch 442] throughput 24387.04 samples/s
[Epoch 442] time cost 126.78s, valid loss 4.45, valid ppl 85.30
[Epoch 443 Batch 200/372] loss 3.91, ppl 49.90, throughput 355.27 samples/s, lr 0.00
[Epoch 443] throughput 24672.14 samples/s
[Epoch 443] time cost 126.53s, valid loss 4.45, valid ppl 85.30
[Epoch 444 Batch 200/372] loss 3.91, ppl 50.01, throughput 248.29 samples/s, lr 0.00
[Epoch 444] throughput 19412.71 samples/s
[Epoch 444] time cost 149.36s, valid loss 4.45, valid ppl 85.30
[Epoch 445 Batch 200/372] loss 3.92, ppl 50.43, throughput 358.73 samples/s, lr 0.00
[Epoch 445] throughput 24367.70 samples/s
[Epoch 445] time cost 128.03s, valid loss 4.45, valid ppl 85.30
[Epoch 446 Batch 200/372] loss 3.91, ppl 50.02, throughput 358.51 samples/s, lr 0.00
[Epoch 446] throughput 24412.66 samples/s
[Epoch 446] time cost 127.36s, valid loss 4.45, valid ppl 85.30
[Epoch 447 Batch 200/372] loss 3.91, ppl 49.93, throughput 245.64 samples/s, lr 0.00
[Epoch 447] throughput 19186.68 samples/s
[Epoch 447] time cost 150.45s, valid loss 4.45, valid ppl 85.30
[Epoch 448 Batch 200/372] loss 3.92, ppl 50.16, throughput 358.02 samples/s, lr 0.00
[Epoch 448] throughput 24163.37 samples/s
[Epoch 448] time cost 128.61s, valid loss 4.45, valid ppl 85.30
[Epoch 449 Batch 200/372] loss 3.91, ppl 50.03, throughput 361.64 samples/s, lr 0.00
[Epoch 449] throughput 24587.99 samples/s
[Epoch 449] time cost 126.49s, valid loss 4.45, valid ppl 85.30
[Epoch 450 Batch 200/372] loss 3.91, ppl 49.89, throughput 237.22 samples/s, lr 0.00
[Epoch 450] throughput 19230.23 samples/s
[Epoch 450] time cost 150.91s, valid loss 4.45, valid ppl 85.30
[Epoch 451 Batch 200/372] loss 3.91, ppl 49.79, throughput 361.16 samples/s, lr 0.00
[Epoch 451] throughput 24256.26 samples/s
[Epoch 451] time cost 127.82s, valid loss 4.45, valid ppl 85.30
[Epoch 452 Batch 200/372] loss 3.92, ppl 50.35, throughput 365.13 samples/s, lr 0.00
[Epoch 452] throughput 24755.04 samples/s
[Epoch 452] time cost 125.40s, valid loss 4.45, valid ppl 85.30
[Epoch 453 Batch 200/372] loss 3.91, ppl 50.14, throughput 234.83 samples/s, lr 0.00
[Epoch 453] throughput 19104.37 samples/s
[Epoch 453] time cost 150.54s, valid loss 4.45, valid ppl 85.30
[Epoch 454 Batch 200/372] loss 3.91, ppl 49.78, throughput 350.63 samples/s, lr 0.00
[Epoch 454] throughput 23914.40 samples/s
[Epoch 454] time cost 128.52s, valid loss 4.45, valid ppl 85.30
[Epoch 455 Batch 200/372] loss 3.92, ppl 50.51, throughput 344.28 samples/s, lr 0.00
[Epoch 455] throughput 24130.65 samples/s
[Epoch 455] time cost 127.63s, valid loss 4.45, valid ppl 85.30
[Epoch 456 Batch 200/372] loss 3.91, ppl 50.03, throughput 240.75 samples/s, lr 0.00
[Epoch 456] throughput 19129.61 samples/s
[Epoch 456] time cost 150.40s, valid loss 4.45, valid ppl 85.30
[Epoch 457 Batch 200/372] loss 3.91, ppl 49.87, throughput 362.84 samples/s, lr 0.00
[Epoch 457] throughput 24018.00 samples/s
[Epoch 457] time cost 128.05s, valid loss 4.45, valid ppl 85.30
[Epoch 458 Batch 200/372] loss 3.91, ppl 50.00, throughput 362.15 samples/s, lr 0.00
[Epoch 458] throughput 24374.37 samples/s
[Epoch 458] time cost 126.88s, valid loss 4.45, valid ppl 85.30
[Epoch 459 Batch 200/372] loss 3.90, ppl 49.64, throughput 246.54 samples/s, lr 0.00
[Epoch 459] throughput 18858.63 samples/s
[Epoch 459] time cost 151.79s, valid loss 4.45, valid ppl 85.30
[Epoch 460 Batch 200/372] loss 3.91, ppl 49.81, throughput 359.64 samples/s, lr 0.00
[Epoch 460] throughput 24257.62 samples/s
[Epoch 460] time cost 127.63s, valid loss 4.45, valid ppl 85.30
[Epoch 461 Batch 200/372] loss 3.91, ppl 49.93, throughput 359.44 samples/s, lr 0.00
[Epoch 461] throughput 24009.24 samples/s
[Epoch 461] time cost 128.26s, valid loss 4.45, valid ppl 85.30
[Epoch 462 Batch 200/372] loss 3.91, ppl 49.88, throughput 309.43 samples/s, lr 0.00
[Epoch 462] throughput 20185.34 samples/s
[Epoch 462] time cost 144.43s, valid loss 4.45, valid ppl 85.30
[Epoch 463 Batch 200/372] loss 3.91, ppl 49.96, throughput 382.42 samples/s, lr 0.00
[Epoch 463] throughput 25590.42 samples/s
[Epoch 463] time cost 122.95s, valid loss 4.45, valid ppl 85.30
[Epoch 464 Batch 200/372] loss 3.91, ppl 49.97, throughput 384.23 samples/s, lr 0.00
[Epoch 464] throughput 26013.33 samples/s
[Epoch 464] time cost 121.59s, valid loss 4.45, valid ppl 85.30
[Epoch 465 Batch 200/372] loss 3.92, ppl 50.30, throughput 363.35 samples/s, lr 0.00
[Epoch 465] throughput 19477.90 samples/s
[Epoch 465] time cost 148.21s, valid loss 4.45, valid ppl 85.30
[Epoch 466 Batch 200/372] loss 3.92, ppl 50.23, throughput 357.01 samples/s, lr 0.00
[Epoch 466] throughput 24085.90 samples/s
[Epoch 466] time cost 127.84s, valid loss 4.45, valid ppl 85.30
[Epoch 467 Batch 200/372] loss 3.91, ppl 49.79, throughput 356.21 samples/s, lr 0.00
[Epoch 467] throughput 24173.81 samples/s
[Epoch 467] time cost 127.74s, valid loss 4.45, valid ppl 85.30
[Epoch 468 Batch 200/372] loss 3.92, ppl 50.21, throughput 342.10 samples/s, lr 0.00
[Epoch 468] throughput 19155.89 samples/s
[Epoch 468] time cost 150.56s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 469 Batch 200/372] loss 3.92, ppl 50.16, throughput 356.19 samples/s, lr 0.00
[Epoch 469] throughput 23848.42 samples/s
[Epoch 469] time cost 128.56s, valid loss 4.45, valid ppl 85.30
[Epoch 470 Batch 200/372] loss 3.91, ppl 49.69, throughput 356.61 samples/s, lr 0.00
[Epoch 470] throughput 24162.15 samples/s
[Epoch 470] time cost 127.16s, valid loss 4.45, valid ppl 85.30
[Epoch 471 Batch 200/372] loss 3.91, ppl 50.02, throughput 350.74 samples/s, lr 0.00
[Epoch 471] throughput 19265.48 samples/s
[Epoch 471] time cost 149.74s, valid loss 4.45, valid ppl 85.30
[Epoch 472 Batch 200/372] loss 3.92, ppl 50.26, throughput 357.53 samples/s, lr 0.00
[Epoch 472] throughput 24403.79 samples/s
[Epoch 472] time cost 126.38s, valid loss 4.45, valid ppl 85.30
[Epoch 473 Batch 200/372] loss 3.92, ppl 50.25, throughput 355.25 samples/s, lr 0.00
[Epoch 473] throughput 23834.07 samples/s
[Epoch 473] time cost 128.49s, valid loss 4.45, valid ppl 85.30
[Epoch 474 Batch 200/372] loss 3.91, ppl 49.87, throughput 351.35 samples/s, lr 0.00
[Epoch 474] throughput 20633.96 samples/s
[Epoch 474] time cost 142.29s, valid loss 4.45, valid ppl 85.30
[Epoch 475 Batch 200/372] loss 3.90, ppl 49.41, throughput 268.58 samples/s, lr 0.00
[Epoch 475] throughput 20841.38 samples/s
[Epoch 475] time cost 141.57s, valid loss 4.45, valid ppl 85.30
[Epoch 476 Batch 200/372] loss 3.92, ppl 50.20, throughput 363.53 samples/s, lr 0.00
[Epoch 476] throughput 24559.40 samples/s
[Epoch 476] time cost 126.86s, valid loss 4.45, valid ppl 85.30
[Epoch 477 Batch 200/372] loss 3.92, ppl 50.17, throughput 358.91 samples/s, lr 0.00
[Epoch 477] throughput 24010.65 samples/s
[Epoch 477] time cost 128.35s, valid loss 4.45, valid ppl 85.30
[Epoch 478 Batch 200/372] loss 3.91, ppl 49.73, throughput 236.48 samples/s, lr 0.00
[Epoch 478] throughput 19249.62 samples/s
[Epoch 478] time cost 149.61s, valid loss 4.45, valid ppl 85.30
[Epoch 479 Batch 200/372] loss 3.92, ppl 50.15, throughput 361.45 samples/s, lr 0.00
[Epoch 479] throughput 24351.42 samples/s
[Epoch 479] time cost 127.08s, valid loss 4.45, valid ppl 85.30
[Epoch 480 Batch 200/372] loss 3.91, ppl 50.04, throughput 352.69 samples/s, lr 0.00
[Epoch 480] throughput 24141.73 samples/s
[Epoch 480] time cost 128.17s, valid loss 4.45, valid ppl 85.30
[Epoch 481 Batch 200/372] loss 3.91, ppl 50.13, throughput 212.46 samples/s, lr 0.00
[Epoch 481] throughput 15661.26 samples/s
[Epoch 481] time cost 174.82s, valid loss 4.45, valid ppl 85.30
[Epoch 482 Batch 200/372] loss 3.91, ppl 50.05, throughput 362.52 samples/s, lr 0.00
[Epoch 482] throughput 24704.47 samples/s
[Epoch 482] time cost 125.58s, valid loss 4.45, valid ppl 85.30
[Epoch 483 Batch 200/372] loss 3.92, ppl 50.30, throughput 358.41 samples/s, lr 0.00
[Epoch 483] throughput 24107.81 samples/s
[Epoch 483] time cost 128.31s, valid loss 4.45, valid ppl 85.30
[Epoch 484 Batch 200/372] loss 3.90, ppl 49.65, throughput 356.58 samples/s, lr 0.00
[Epoch 484] throughput 19165.89 samples/s
[Epoch 484] time cost 150.49s, valid loss 4.45, valid ppl 85.30
[Epoch 485 Batch 200/372] loss 3.91, ppl 50.00, throughput 358.79 samples/s, lr 0.00
[Epoch 485] throughput 23915.49 samples/s
[Epoch 485] time cost 128.58s, valid loss 4.45, valid ppl 85.30
[Epoch 486 Batch 200/372] loss 3.92, ppl 50.24, throughput 356.93 samples/s, lr 0.00
[Epoch 486] throughput 24309.81 samples/s
[Epoch 486] time cost 127.49s, valid loss 4.45, valid ppl 85.30
[Epoch 487 Batch 200/372] loss 3.91, ppl 49.89, throughput 364.46 samples/s, lr 0.00
[Epoch 487] throughput 22079.83 samples/s
[Epoch 487] time cost 136.93s, valid loss 4.45, valid ppl 85.30
[Epoch 488 Batch 200/372] loss 3.92, ppl 50.43, throughput 356.15 samples/s, lr 0.00
[Epoch 488] throughput 23695.79 samples/s
[Epoch 488] time cost 129.80s, valid loss 4.45, valid ppl 85.30
[Epoch 489 Batch 200/372] loss 3.92, ppl 50.36, throughput 351.11 samples/s, lr 0.00
[Epoch 489] throughput 23955.61 samples/s
[Epoch 489] time cost 128.54s, valid loss 4.45, valid ppl 85.30
[Epoch 490 Batch 200/372] loss 3.91, ppl 50.02, throughput 353.22 samples/s, lr 0.00
[Epoch 490] throughput 24295.64 samples/s
[Epoch 490] time cost 128.30s, valid loss 4.45, valid ppl 85.30
[Epoch 491 Batch 200/372] loss 3.92, ppl 50.41, throughput 272.23 samples/s, lr 0.00
[Epoch 491] throughput 20917.67 samples/s
[Epoch 491] time cost 141.56s, valid loss 4.45, valid ppl 85.30
[Epoch 492 Batch 200/372] loss 3.91, ppl 49.75, throughput 351.95 samples/s, lr 0.00
[Epoch 492] throughput 24230.68 samples/s
[Epoch 492] time cost 127.22s, valid loss 4.45, valid ppl 85.30
[Epoch 493 Batch 200/372] loss 3.91, ppl 50.09, throughput 353.95 samples/s, lr 0.00
[Epoch 493] throughput 23941.67 samples/s
[Epoch 493] time cost 128.87s, valid loss 4.45, valid ppl 85.30
[Epoch 494 Batch 200/372] loss 3.92, ppl 50.18, throughput 267.85 samples/s, lr 0.00
[Epoch 494] throughput 20663.56 samples/s
[Epoch 494] time cost 142.27s, valid loss 4.45, valid ppl 85.30
[Epoch 495 Batch 200/372] loss 3.90, ppl 49.60, throughput 352.71 samples/s, lr 0.00
[Epoch 495] throughput 24028.37 samples/s
[Epoch 495] time cost 127.88s, valid loss 4.45, valid ppl 85.30
[Epoch 496 Batch 200/372] loss 3.91, ppl 50.04, throughput 360.83 samples/s, lr 0.00
[Epoch 496] throughput 24407.54 samples/s
[Epoch 496] time cost 127.25s, valid loss 4.45, valid ppl 85.30
[Epoch 497 Batch 200/372] loss 3.90, ppl 49.64, throughput 235.75 samples/s, lr 0.00
[Epoch 497] throughput 19122.12 samples/s
[Epoch 497] time cost 150.49s, valid loss 4.45, valid ppl 85.30
[Epoch 498 Batch 200/372] loss 3.91, ppl 49.82, throughput 353.55 samples/s, lr 0.00
[Epoch 498] throughput 24331.31 samples/s
[Epoch 498] time cost 126.93s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 499 Batch 200/372] loss 3.91, ppl 49.75, throughput 353.27 samples/s, lr 0.00
[Epoch 499] throughput 24509.08 samples/s
[Epoch 499] time cost 127.47s, valid loss 4.45, valid ppl 85.30
[Epoch 500 Batch 200/372] loss 3.92, ppl 50.16, throughput 271.99 samples/s, lr 0.00
[Epoch 500] throughput 20880.92 samples/s
[Epoch 500] time cost 142.25s, valid loss 4.45, valid ppl 85.30
[Epoch 501 Batch 200/372] loss 3.91, ppl 50.13, throughput 367.09 samples/s, lr 0.00
[Epoch 501] throughput 25646.92 samples/s
[Epoch 501] time cost 123.23s, valid loss 4.45, valid ppl 85.30
[Epoch 502 Batch 200/372] loss 3.92, ppl 50.17, throughput 366.98 samples/s, lr 0.00
[Epoch 502] throughput 24513.10 samples/s
[Epoch 502] time cost 128.61s, valid loss 4.45, valid ppl 85.30
[Epoch 503 Batch 200/372] loss 3.91, ppl 49.98, throughput 316.98 samples/s, lr 0.00
[Epoch 503] throughput 22629.27 samples/s
[Epoch 503] time cost 134.56s, valid loss 4.45, valid ppl 85.30
[Epoch 504 Batch 200/372] loss 3.91, ppl 50.11, throughput 351.64 samples/s, lr 0.00
[Epoch 504] throughput 24053.75 samples/s
[Epoch 504] time cost 128.51s, valid loss 4.45, valid ppl 85.30
[Epoch 505 Batch 200/372] loss 3.91, ppl 50.13, throughput 351.43 samples/s, lr 0.00
[Epoch 505] throughput 24232.03 samples/s
[Epoch 505] time cost 128.28s, valid loss 4.45, valid ppl 85.30
[Epoch 506 Batch 200/372] loss 3.91, ppl 49.99, throughput 279.02 samples/s, lr 0.00
[Epoch 506] throughput 21422.35 samples/s
[Epoch 506] time cost 138.59s, valid loss 4.45, valid ppl 85.30
[Epoch 507 Batch 200/372] loss 3.91, ppl 49.99, throughput 355.19 samples/s, lr 0.00
[Epoch 507] throughput 23953.51 samples/s
[Epoch 507] time cost 128.20s, valid loss 4.45, valid ppl 85.30
[Epoch 508 Batch 200/372] loss 3.92, ppl 50.17, throughput 350.68 samples/s, lr 0.00
[Epoch 508] throughput 23965.77 samples/s
[Epoch 508] time cost 128.95s, valid loss 4.45, valid ppl 85.30
[Epoch 509 Batch 200/372] loss 3.93, ppl 50.67, throughput 237.20 samples/s, lr 0.00
[Epoch 509] throughput 19204.30 samples/s
[Epoch 509] time cost 150.52s, valid loss 4.45, valid ppl 85.30
[Epoch 510 Batch 200/372] loss 3.92, ppl 50.18, throughput 350.44 samples/s, lr 0.00
[Epoch 510] throughput 23952.96 samples/s
[Epoch 510] time cost 129.14s, valid loss 4.45, valid ppl 85.30
[Epoch 511 Batch 200/372] loss 3.92, ppl 50.19, throughput 352.76 samples/s, lr 0.00
[Epoch 511] throughput 23960.81 samples/s
[Epoch 511] time cost 129.44s, valid loss 4.45, valid ppl 85.30
[Epoch 512 Batch 200/372] loss 3.91, ppl 49.73, throughput 237.24 samples/s, lr 0.00
[Epoch 512] throughput 19219.31 samples/s
[Epoch 512] time cost 150.66s, valid loss 4.45, valid ppl 85.30
[Epoch 513 Batch 200/372] loss 3.91, ppl 50.03, throughput 356.62 samples/s, lr 0.00
[Epoch 513] throughput 24200.96 samples/s
[Epoch 513] time cost 127.51s, valid loss 4.45, valid ppl 85.30
[Epoch 514 Batch 200/372] loss 3.92, ppl 50.38, throughput 354.12 samples/s, lr 0.00
[Epoch 514] throughput 24059.57 samples/s
[Epoch 514] time cost 127.64s, valid loss 4.45, valid ppl 85.30
[Epoch 515 Batch 200/372] loss 3.91, ppl 50.09, throughput 237.00 samples/s, lr 0.00
[Epoch 515] throughput 19250.41 samples/s
[Epoch 515] time cost 150.78s, valid loss 4.45, valid ppl 85.30
[Epoch 516 Batch 200/372] loss 3.91, ppl 49.80, throughput 349.60 samples/s, lr 0.00
[Epoch 516] throughput 23762.76 samples/s
[Epoch 516] time cost 129.42s, valid loss 4.45, valid ppl 85.30
[Epoch 517 Batch 200/372] loss 3.91, ppl 49.89, throughput 347.84 samples/s, lr 0.00
[Epoch 517] throughput 23953.72 samples/s
[Epoch 517] time cost 129.02s, valid loss 4.45, valid ppl 85.30
[Epoch 518 Batch 200/372] loss 3.92, ppl 50.16, throughput 192.21 samples/s, lr 0.00
[Epoch 518] throughput 15467.83 samples/s
[Epoch 518] time cost 176.33s, valid loss 4.45, valid ppl 85.30
[Epoch 519 Batch 200/372] loss 3.92, ppl 50.23, throughput 346.53 samples/s, lr 0.00
[Epoch 519] throughput 23747.43 samples/s
[Epoch 519] time cost 129.09s, valid loss 4.45, valid ppl 85.30
[Epoch 520 Batch 200/372] loss 3.91, ppl 50.02, throughput 386.22 samples/s, lr 0.00
[Epoch 520] throughput 26328.66 samples/s
[Epoch 520] time cost 121.15s, valid loss 4.45, valid ppl 85.30
[Epoch 521 Batch 200/372] loss 3.91, ppl 49.96, throughput 349.75 samples/s, lr 0.00
[Epoch 521] throughput 18977.07 samples/s
[Epoch 521] time cost 151.67s, valid loss 4.45, valid ppl 85.30
[Epoch 522 Batch 200/372] loss 3.91, ppl 49.96, throughput 350.81 samples/s, lr 0.00
[Epoch 522] throughput 24626.79 samples/s
[Epoch 522] time cost 126.95s, valid loss 4.45, valid ppl 85.30
[Epoch 523 Batch 200/372] loss 3.92, ppl 50.16, throughput 379.31 samples/s, lr 0.00
[Epoch 523] throughput 25226.61 samples/s
[Epoch 523] time cost 124.03s, valid loss 4.45, valid ppl 85.30
[Epoch 524 Batch 200/372] loss 3.92, ppl 50.30, throughput 360.90 samples/s, lr 0.00
[Epoch 524] throughput 19097.18 samples/s
[Epoch 524] time cost 150.45s, valid loss 4.45, valid ppl 85.30
[Epoch 525 Batch 200/372] loss 3.91, ppl 50.05, throughput 357.40 samples/s, lr 0.00
[Epoch 525] throughput 24281.54 samples/s
[Epoch 525] time cost 126.73s, valid loss 4.45, valid ppl 85.30
[Epoch 526 Batch 200/372] loss 3.91, ppl 49.98, throughput 355.24 samples/s, lr 0.00
[Epoch 526] throughput 24161.32 samples/s
[Epoch 526] time cost 128.20s, valid loss 4.45, valid ppl 85.30
[Epoch 527 Batch 200/372] loss 3.92, ppl 50.49, throughput 356.27 samples/s, lr 0.00
[Epoch 527] throughput 19180.24 samples/s
[Epoch 527] time cost 150.15s, valid loss 4.45, valid ppl 85.30
[Epoch 528 Batch 200/372] loss 3.91, ppl 50.01, throughput 361.54 samples/s, lr 0.00
[Epoch 528] throughput 24159.85 samples/s
[Epoch 528] time cost 127.50s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 529 Batch 200/372] loss 3.91, ppl 49.66, throughput 354.99 samples/s, lr 0.00
[Epoch 529] throughput 24433.75 samples/s
[Epoch 529] time cost 127.67s, valid loss 4.45, valid ppl 85.30
[Epoch 530 Batch 200/372] loss 3.92, ppl 50.30, throughput 357.24 samples/s, lr 0.00
[Epoch 530] throughput 19425.35 samples/s
[Epoch 530] time cost 148.33s, valid loss 4.45, valid ppl 85.30
[Epoch 531 Batch 200/372] loss 3.92, ppl 50.26, throughput 394.03 samples/s, lr 0.00
[Epoch 531] throughput 26691.33 samples/s
[Epoch 531] time cost 120.02s, valid loss 4.45, valid ppl 85.30
[Epoch 532 Batch 200/372] loss 3.91, ppl 49.92, throughput 363.65 samples/s, lr 0.00
[Epoch 532] throughput 24302.71 samples/s
[Epoch 532] time cost 127.59s, valid loss 4.45, valid ppl 85.30
[Epoch 533 Batch 200/372] loss 3.91, ppl 49.80, throughput 353.22 samples/s, lr 0.00
[Epoch 533] throughput 19059.91 samples/s
[Epoch 533] time cost 150.82s, valid loss 4.45, valid ppl 85.30
[Epoch 534 Batch 200/372] loss 3.91, ppl 49.79, throughput 364.30 samples/s, lr 0.00
[Epoch 534] throughput 24330.12 samples/s
[Epoch 534] time cost 126.76s, valid loss 4.45, valid ppl 85.30
[Epoch 535 Batch 200/372] loss 3.92, ppl 50.29, throughput 354.70 samples/s, lr 0.00
[Epoch 535] throughput 24587.79 samples/s
[Epoch 535] time cost 126.03s, valid loss 4.45, valid ppl 85.30
[Epoch 536 Batch 200/372] loss 3.92, ppl 50.16, throughput 361.48 samples/s, lr 0.00
[Epoch 536] throughput 19614.77 samples/s
[Epoch 536] time cost 147.86s, valid loss 4.45, valid ppl 85.30
[Epoch 537 Batch 200/372] loss 3.91, ppl 49.92, throughput 365.82 samples/s, lr 0.00
[Epoch 537] throughput 24437.69 samples/s
[Epoch 537] time cost 126.47s, valid loss 4.45, valid ppl 85.30
[Epoch 538 Batch 200/372] loss 3.92, ppl 50.42, throughput 358.98 samples/s, lr 0.00
[Epoch 538] throughput 24349.83 samples/s
[Epoch 538] time cost 127.31s, valid loss 4.45, valid ppl 85.30
[Epoch 539 Batch 200/372] loss 3.91, ppl 49.82, throughput 350.08 samples/s, lr 0.00
[Epoch 539] throughput 23826.56 samples/s
[Epoch 539] time cost 128.87s, valid loss 4.45, valid ppl 85.30
[Epoch 540 Batch 200/372] loss 3.91, ppl 49.80, throughput 332.47 samples/s, lr 0.00
[Epoch 540] throughput 23409.94 samples/s
[Epoch 540] time cost 130.58s, valid loss 4.45, valid ppl 85.30
[Epoch 541 Batch 200/372] loss 3.92, ppl 50.57, throughput 365.98 samples/s, lr 0.00
[Epoch 541] throughput 24425.27 samples/s
[Epoch 541] time cost 125.90s, valid loss 4.45, valid ppl 85.30
[Epoch 542 Batch 200/372] loss 3.92, ppl 50.15, throughput 357.19 samples/s, lr 0.00
[Epoch 542] throughput 24238.46 samples/s
[Epoch 542] time cost 127.28s, valid loss 4.45, valid ppl 85.30
[Epoch 543 Batch 200/372] loss 3.91, ppl 50.02, throughput 249.51 samples/s, lr 0.00
[Epoch 543] throughput 20340.50 samples/s
[Epoch 543] time cost 144.88s, valid loss 4.45, valid ppl 85.30
[Epoch 544 Batch 200/372] loss 3.91, ppl 50.07, throughput 357.30 samples/s, lr 0.00
[Epoch 544] throughput 24456.09 samples/s
[Epoch 544] time cost 126.64s, valid loss 4.45, valid ppl 85.30
[Epoch 545 Batch 200/372] loss 3.91, ppl 50.11, throughput 349.87 samples/s, lr 0.00
[Epoch 545] throughput 24075.76 samples/s
[Epoch 545] time cost 127.53s, valid loss 4.45, valid ppl 85.30
[Epoch 546 Batch 200/372] loss 3.91, ppl 49.83, throughput 235.30 samples/s, lr 0.00
[Epoch 546] throughput 19355.05 samples/s
[Epoch 546] time cost 149.07s, valid loss 4.45, valid ppl 85.30
[Epoch 547 Batch 200/372] loss 3.92, ppl 50.28, throughput 365.73 samples/s, lr 0.00
[Epoch 547] throughput 24787.34 samples/s
[Epoch 547] time cost 125.64s, valid loss 4.45, valid ppl 85.30
[Epoch 548 Batch 200/372] loss 3.91, ppl 49.91, throughput 356.98 samples/s, lr 0.00
[Epoch 548] throughput 24306.96 samples/s
[Epoch 548] time cost 126.75s, valid loss 4.45, valid ppl 85.30
[Epoch 549 Batch 200/372] loss 3.92, ppl 50.16, throughput 236.71 samples/s, lr 0.00
[Epoch 549] throughput 19018.16 samples/s
[Epoch 549] time cost 151.63s, valid loss 4.45, valid ppl 85.30
[Epoch 550 Batch 200/372] loss 3.91, ppl 50.14, throughput 365.85 samples/s, lr 0.00
[Epoch 550] throughput 24650.86 samples/s
[Epoch 550] time cost 125.92s, valid loss 4.45, valid ppl 85.30
[Epoch 551 Batch 200/372] loss 3.91, ppl 49.89, throughput 374.90 samples/s, lr 0.00
[Epoch 551] throughput 25950.19 samples/s
[Epoch 551] time cost 122.22s, valid loss 4.45, valid ppl 85.30
[Epoch 552 Batch 200/372] loss 3.92, ppl 50.19, throughput 242.65 samples/s, lr 0.00
[Epoch 552] throughput 20028.78 samples/s
[Epoch 552] time cost 146.12s, valid loss 4.45, valid ppl 85.30
[Epoch 553 Batch 200/372] loss 3.92, ppl 50.17, throughput 386.70 samples/s, lr 0.00
[Epoch 553] throughput 26256.36 samples/s
[Epoch 553] time cost 121.30s, valid loss 4.45, valid ppl 85.30
[Epoch 554 Batch 200/372] loss 3.91, ppl 49.99, throughput 388.38 samples/s, lr 0.00
[Epoch 554] throughput 26129.08 samples/s
[Epoch 554] time cost 121.76s, valid loss 4.45, valid ppl 85.30
[Epoch 555 Batch 200/372] loss 3.92, ppl 50.32, throughput 244.28 samples/s, lr 0.00
[Epoch 555] throughput 19523.64 samples/s
[Epoch 555] time cost 147.95s, valid loss 4.45, valid ppl 85.30
[Epoch 556 Batch 200/372] loss 3.91, ppl 50.07, throughput 371.75 samples/s, lr 0.00
[Epoch 556] throughput 25266.30 samples/s
[Epoch 556] time cost 124.45s, valid loss 4.45, valid ppl 85.30
[Epoch 557 Batch 200/372] loss 3.91, ppl 49.93, throughput 361.23 samples/s, lr 0.00
[Epoch 557] throughput 24333.95 samples/s
[Epoch 557] time cost 127.48s, valid loss 4.45, valid ppl 85.30
[Epoch 558 Batch 200/372] loss 3.92, ppl 50.19, throughput 362.60 samples/s, lr 0.00
[Epoch 558] throughput 19315.07 samples/s
[Epoch 558] time cost 150.23s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 559 Batch 200/372] loss 3.91, ppl 49.99, throughput 352.87 samples/s, lr 0.00
[Epoch 559] throughput 24233.01 samples/s
[Epoch 559] time cost 127.18s, valid loss 4.45, valid ppl 85.30
[Epoch 560 Batch 200/372] loss 3.92, ppl 50.35, throughput 357.89 samples/s, lr 0.00
[Epoch 560] throughput 24168.98 samples/s
[Epoch 560] time cost 127.56s, valid loss 4.45, valid ppl 85.30
[Epoch 561 Batch 200/372] loss 3.91, ppl 49.94, throughput 359.92 samples/s, lr 0.00
[Epoch 561] throughput 19260.37 samples/s
[Epoch 561] time cost 149.53s, valid loss 4.45, valid ppl 85.30
[Epoch 562 Batch 200/372] loss 3.90, ppl 49.61, throughput 359.91 samples/s, lr 0.00
[Epoch 562] throughput 24351.65 samples/s
[Epoch 562] time cost 127.19s, valid loss 4.45, valid ppl 85.30
[Epoch 563 Batch 200/372] loss 3.92, ppl 50.19, throughput 353.76 samples/s, lr 0.00
[Epoch 563] throughput 23927.78 samples/s
[Epoch 563] time cost 128.35s, valid loss 4.45, valid ppl 85.30
[Epoch 564 Batch 200/372] loss 3.91, ppl 50.12, throughput 356.48 samples/s, lr 0.00
[Epoch 564] throughput 19137.09 samples/s
[Epoch 564] time cost 150.20s, valid loss 4.45, valid ppl 85.30
[Epoch 565 Batch 200/372] loss 3.91, ppl 50.06, throughput 354.19 samples/s, lr 0.00
[Epoch 565] throughput 24097.01 samples/s
[Epoch 565] time cost 127.38s, valid loss 4.45, valid ppl 85.30
[Epoch 566 Batch 200/372] loss 3.91, ppl 50.13, throughput 353.53 samples/s, lr 0.00
[Epoch 566] throughput 24025.27 samples/s
[Epoch 566] time cost 128.63s, valid loss 4.45, valid ppl 85.30
[Epoch 567 Batch 200/372] loss 3.92, ppl 50.38, throughput 350.81 samples/s, lr 0.00
[Epoch 567] throughput 20585.55 samples/s
[Epoch 567] time cost 143.25s, valid loss 4.45, valid ppl 85.30
[Epoch 568 Batch 200/372] loss 3.91, ppl 50.00, throughput 356.89 samples/s, lr 0.00
[Epoch 568] throughput 24055.13 samples/s
[Epoch 568] time cost 127.64s, valid loss 4.45, valid ppl 85.30
[Epoch 569 Batch 200/372] loss 3.91, ppl 50.07, throughput 346.53 samples/s, lr 0.00
[Epoch 569] throughput 24233.98 samples/s
[Epoch 569] time cost 127.90s, valid loss 4.45, valid ppl 85.30
[Epoch 570 Batch 200/372] loss 3.91, ppl 49.74, throughput 365.54 samples/s, lr 0.00
[Epoch 570] throughput 22495.07 samples/s
[Epoch 570] time cost 134.24s, valid loss 4.45, valid ppl 85.30
[Epoch 571 Batch 200/372] loss 3.90, ppl 49.60, throughput 355.07 samples/s, lr 0.00
[Epoch 571] throughput 24553.40 samples/s
[Epoch 571] time cost 126.70s, valid loss 4.45, valid ppl 85.30
[Epoch 572 Batch 200/372] loss 3.91, ppl 50.03, throughput 358.20 samples/s, lr 0.00
[Epoch 572] throughput 24151.35 samples/s
[Epoch 572] time cost 128.12s, valid loss 4.45, valid ppl 85.30
[Epoch 573 Batch 200/372] loss 3.91, ppl 49.66, throughput 360.38 samples/s, lr 0.00
[Epoch 573] throughput 24147.38 samples/s
[Epoch 573] time cost 127.79s, valid loss 4.45, valid ppl 85.30
[Epoch 574 Batch 200/372] loss 3.91, ppl 50.06, throughput 172.21 samples/s, lr 0.00
[Epoch 574] throughput 15638.72 samples/s
[Epoch 574] time cost 174.85s, valid loss 4.45, valid ppl 85.30
[Epoch 575 Batch 200/372] loss 3.92, ppl 50.19, throughput 354.70 samples/s, lr 0.00
[Epoch 575] throughput 23977.70 samples/s
[Epoch 575] time cost 128.41s, valid loss 4.45, valid ppl 85.30
[Epoch 576 Batch 200/372] loss 3.92, ppl 50.31, throughput 350.93 samples/s, lr 0.00
[Epoch 576] throughput 24199.34 samples/s
[Epoch 576] time cost 127.66s, valid loss 4.45, valid ppl 85.30
[Epoch 577 Batch 200/372] loss 3.91, ppl 49.76, throughput 292.94 samples/s, lr 0.00
[Epoch 577] throughput 18968.20 samples/s
[Epoch 577] time cost 151.35s, valid loss 4.45, valid ppl 85.30
[Epoch 578 Batch 200/372] loss 3.91, ppl 49.67, throughput 360.85 samples/s, lr 0.00
[Epoch 578] throughput 24726.12 samples/s
[Epoch 578] time cost 125.69s, valid loss 4.45, valid ppl 85.30
[Epoch 579 Batch 200/372] loss 3.91, ppl 49.95, throughput 355.37 samples/s, lr 0.00
[Epoch 579] throughput 24032.24 samples/s
[Epoch 579] time cost 128.06s, valid loss 4.45, valid ppl 85.30
[Epoch 580 Batch 200/372] loss 3.91, ppl 49.88, throughput 360.70 samples/s, lr 0.00
[Epoch 580] throughput 19331.71 samples/s
[Epoch 580] time cost 149.02s, valid loss 4.45, valid ppl 85.30
[Epoch 581 Batch 200/372] loss 3.91, ppl 49.71, throughput 356.73 samples/s, lr 0.00
[Epoch 581] throughput 23960.27 samples/s
[Epoch 581] time cost 128.84s, valid loss 4.45, valid ppl 85.30
[Epoch 582 Batch 200/372] loss 3.91, ppl 49.88, throughput 351.34 samples/s, lr 0.00
[Epoch 582] throughput 24255.25 samples/s
[Epoch 582] time cost 127.97s, valid loss 4.45, valid ppl 85.30
[Epoch 583 Batch 200/372] loss 3.91, ppl 50.13, throughput 368.08 samples/s, lr 0.00
[Epoch 583] throughput 19374.71 samples/s
[Epoch 583] time cost 148.67s, valid loss 4.45, valid ppl 85.30
[Epoch 584 Batch 200/372] loss 3.92, ppl 50.23, throughput 351.76 samples/s, lr 0.00
[Epoch 584] throughput 24196.55 samples/s
[Epoch 584] time cost 128.69s, valid loss 4.45, valid ppl 85.30
[Epoch 585 Batch 200/372] loss 3.92, ppl 50.22, throughput 360.67 samples/s, lr 0.00
[Epoch 585] throughput 24246.73 samples/s
[Epoch 585] time cost 127.23s, valid loss 4.45, valid ppl 85.30
[Epoch 586 Batch 200/372] loss 3.91, ppl 50.10, throughput 346.11 samples/s, lr 0.00
[Epoch 586] throughput 21300.78 samples/s
[Epoch 586] time cost 140.13s, valid loss 4.45, valid ppl 85.30
[Epoch 587 Batch 200/372] loss 3.91, ppl 49.83, throughput 349.03 samples/s, lr 0.00
[Epoch 587] throughput 24120.86 samples/s
[Epoch 587] time cost 128.79s, valid loss 4.45, valid ppl 85.30
[Epoch 588 Batch 200/372] loss 3.91, ppl 49.85, throughput 358.04 samples/s, lr 0.00
[Epoch 588] throughput 24435.83 samples/s
[Epoch 588] time cost 126.54s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 589 Batch 200/372] loss 3.92, ppl 50.31, throughput 357.77 samples/s, lr 0.00
[Epoch 589] throughput 23812.32 samples/s
[Epoch 589] time cost 130.10s, valid loss 4.45, valid ppl 85.30
[Epoch 590 Batch 200/372] loss 3.92, ppl 50.37, throughput 335.38 samples/s, lr 0.00
[Epoch 590] throughput 23444.24 samples/s
[Epoch 590] time cost 130.05s, valid loss 4.45, valid ppl 85.30
[Epoch 591 Batch 200/372] loss 3.91, ppl 49.95, throughput 349.99 samples/s, lr 0.00
[Epoch 591] throughput 23943.97 samples/s
[Epoch 591] time cost 128.39s, valid loss 4.45, valid ppl 85.30
[Epoch 592 Batch 200/372] loss 3.91, ppl 49.91, throughput 360.47 samples/s, lr 0.00
[Epoch 592] throughput 24051.66 samples/s
[Epoch 592] time cost 129.57s, valid loss 4.45, valid ppl 85.30
[Epoch 593 Batch 200/372] loss 3.91, ppl 49.98, throughput 328.94 samples/s, lr 0.00
[Epoch 593] throughput 23395.12 samples/s
[Epoch 593] time cost 130.47s, valid loss 4.45, valid ppl 85.30
[Epoch 594 Batch 200/372] loss 3.91, ppl 50.08, throughput 348.07 samples/s, lr 0.00
[Epoch 594] throughput 24077.38 samples/s
[Epoch 594] time cost 128.48s, valid loss 4.45, valid ppl 85.30
[Epoch 595 Batch 200/372] loss 3.92, ppl 50.28, throughput 349.78 samples/s, lr 0.00
[Epoch 595] throughput 23938.06 samples/s
[Epoch 595] time cost 128.69s, valid loss 4.45, valid ppl 85.30
[Epoch 596 Batch 200/372] loss 3.91, ppl 49.84, throughput 270.63 samples/s, lr 0.00
[Epoch 596] throughput 20420.58 samples/s
[Epoch 596] time cost 143.96s, valid loss 4.45, valid ppl 85.30
[Epoch 597 Batch 200/372] loss 3.92, ppl 50.26, throughput 362.68 samples/s, lr 0.00
[Epoch 597] throughput 24374.25 samples/s
[Epoch 597] time cost 126.54s, valid loss 4.45, valid ppl 85.30
[Epoch 598 Batch 200/372] loss 3.91, ppl 50.12, throughput 359.27 samples/s, lr 0.00
[Epoch 598] throughput 24268.10 samples/s
[Epoch 598] time cost 127.81s, valid loss 4.45, valid ppl 85.30
[Epoch 599 Batch 200/372] loss 3.91, ppl 50.00, throughput 244.79 samples/s, lr 0.00
[Epoch 599] throughput 19687.45 samples/s
[Epoch 599] time cost 147.73s, valid loss 4.45, valid ppl 85.30
[Epoch 600 Batch 200/372] loss 3.91, ppl 50.04, throughput 346.35 samples/s, lr 0.00
[Epoch 600] throughput 23537.14 samples/s
[Epoch 600] time cost 129.59s, valid loss 4.45, valid ppl 85.30
[Epoch 601 Batch 200/372] loss 3.91, ppl 50.14, throughput 348.49 samples/s, lr 0.00
[Epoch 601] throughput 23771.76 samples/s
[Epoch 601] time cost 129.28s, valid loss 4.45, valid ppl 85.30
[Epoch 602 Batch 200/372] loss 3.92, ppl 50.19, throughput 259.44 samples/s, lr 0.00
[Epoch 602] throughput 20731.59 samples/s
[Epoch 602] time cost 141.94s, valid loss 4.45, valid ppl 85.30
[Epoch 603 Batch 200/372] loss 3.91, ppl 50.02, throughput 378.59 samples/s, lr 0.00
[Epoch 603] throughput 25284.26 samples/s
[Epoch 603] time cost 123.73s, valid loss 4.45, valid ppl 85.30
[Epoch 604 Batch 200/372] loss 3.92, ppl 50.21, throughput 355.02 samples/s, lr 0.00
[Epoch 604] throughput 24784.66 samples/s
[Epoch 604] time cost 125.75s, valid loss 4.45, valid ppl 85.30
[Epoch 605 Batch 200/372] loss 3.92, ppl 50.18, throughput 271.36 samples/s, lr 0.00
[Epoch 605] throughput 21907.97 samples/s
[Epoch 605] time cost 137.02s, valid loss 4.45, valid ppl 85.30
[Epoch 606 Batch 200/372] loss 3.91, ppl 49.88, throughput 360.40 samples/s, lr 0.00
[Epoch 606] throughput 24020.13 samples/s
[Epoch 606] time cost 128.10s, valid loss 4.45, valid ppl 85.30
[Epoch 607 Batch 200/372] loss 3.92, ppl 50.18, throughput 349.75 samples/s, lr 0.00
[Epoch 607] throughput 24116.74 samples/s
[Epoch 607] time cost 130.76s, valid loss 4.45, valid ppl 85.30
[Epoch 608 Batch 200/372] loss 3.91, ppl 49.93, throughput 234.30 samples/s, lr 0.00
[Epoch 608] throughput 18974.58 samples/s
[Epoch 608] time cost 152.32s, valid loss 4.45, valid ppl 85.30
[Epoch 609 Batch 200/372] loss 3.91, ppl 50.13, throughput 356.56 samples/s, lr 0.00
[Epoch 609] throughput 24264.52 samples/s
[Epoch 609] time cost 127.16s, valid loss 4.45, valid ppl 85.30
[Epoch 610 Batch 200/372] loss 3.91, ppl 49.96, throughput 377.32 samples/s, lr 0.00
[Epoch 610] throughput 25514.25 samples/s
[Epoch 610] time cost 122.91s, valid loss 4.45, valid ppl 85.30
[Epoch 611 Batch 200/372] loss 3.92, ppl 50.50, throughput 233.46 samples/s, lr 0.00
[Epoch 611] throughput 19134.51 samples/s
[Epoch 611] time cost 150.48s, valid loss 4.45, valid ppl 85.30
[Epoch 612 Batch 200/372] loss 3.92, ppl 50.19, throughput 354.26 samples/s, lr 0.00
[Epoch 612] throughput 24014.24 samples/s
[Epoch 612] time cost 129.28s, valid loss 4.45, valid ppl 85.30
[Epoch 613 Batch 200/372] loss 3.91, ppl 49.80, throughput 347.79 samples/s, lr 0.00
[Epoch 613] throughput 24032.87 samples/s
[Epoch 613] time cost 129.19s, valid loss 4.45, valid ppl 85.30
[Epoch 614 Batch 200/372] loss 3.91, ppl 50.08, throughput 286.25 samples/s, lr 0.00
[Epoch 614] throughput 15494.62 samples/s
[Epoch 614] time cost 177.04s, valid loss 4.45, valid ppl 85.30
[Epoch 615 Batch 200/372] loss 3.91, ppl 49.89, throughput 354.81 samples/s, lr 0.00
[Epoch 615] throughput 24277.56 samples/s
[Epoch 615] time cost 127.46s, valid loss 4.45, valid ppl 85.30
[Epoch 616 Batch 200/372] loss 3.91, ppl 49.88, throughput 358.31 samples/s, lr 0.00
[Epoch 616] throughput 24259.05 samples/s
[Epoch 616] time cost 127.12s, valid loss 4.45, valid ppl 85.30
[Epoch 617 Batch 200/372] loss 3.91, ppl 50.04, throughput 349.34 samples/s, lr 0.00
[Epoch 617] throughput 19278.35 samples/s
[Epoch 617] time cost 149.75s, valid loss 4.45, valid ppl 85.30
[Epoch 618 Batch 200/372] loss 3.91, ppl 49.92, throughput 349.76 samples/s, lr 0.00
[Epoch 618] throughput 23861.00 samples/s
[Epoch 618] time cost 128.77s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 619 Batch 200/372] loss 3.92, ppl 50.45, throughput 352.36 samples/s, lr 0.00
[Epoch 619] throughput 24188.80 samples/s
[Epoch 619] time cost 127.35s, valid loss 4.45, valid ppl 85.30
[Epoch 620 Batch 200/372] loss 3.91, ppl 50.14, throughput 362.71 samples/s, lr 0.00
[Epoch 620] throughput 19591.94 samples/s
[Epoch 620] time cost 147.63s, valid loss 4.45, valid ppl 85.30
[Epoch 621 Batch 200/372] loss 3.91, ppl 50.05, throughput 360.13 samples/s, lr 0.00
[Epoch 621] throughput 24174.68 samples/s
[Epoch 621] time cost 127.70s, valid loss 4.45, valid ppl 85.30
[Epoch 622 Batch 200/372] loss 3.92, ppl 50.37, throughput 347.84 samples/s, lr 0.00
[Epoch 622] throughput 24308.32 samples/s
[Epoch 622] time cost 127.53s, valid loss 4.45, valid ppl 85.30
[Epoch 623 Batch 200/372] loss 3.91, ppl 50.03, throughput 356.47 samples/s, lr 0.00
[Epoch 623] throughput 19261.00 samples/s
[Epoch 623] time cost 150.59s, valid loss 4.45, valid ppl 85.30
[Epoch 624 Batch 200/372] loss 3.91, ppl 50.14, throughput 359.16 samples/s, lr 0.00
[Epoch 624] throughput 24122.23 samples/s
[Epoch 624] time cost 127.78s, valid loss 4.45, valid ppl 85.30
[Epoch 625 Batch 200/372] loss 3.92, ppl 50.15, throughput 381.57 samples/s, lr 0.00
[Epoch 625] throughput 26453.75 samples/s
[Epoch 625] time cost 120.80s, valid loss 4.45, valid ppl 85.30
[Epoch 626 Batch 200/372] loss 3.92, ppl 50.25, throughput 360.76 samples/s, lr 0.00
[Epoch 626] throughput 19040.07 samples/s
[Epoch 626] time cost 150.89s, valid loss 4.45, valid ppl 85.30
[Epoch 627 Batch 200/372] loss 3.92, ppl 50.23, throughput 355.97 samples/s, lr 0.00
[Epoch 627] throughput 24484.17 samples/s
[Epoch 627] time cost 126.31s, valid loss 4.45, valid ppl 85.30
[Epoch 628 Batch 200/372] loss 3.92, ppl 50.15, throughput 355.53 samples/s, lr 0.00
[Epoch 628] throughput 24390.69 samples/s
[Epoch 628] time cost 127.36s, valid loss 4.45, valid ppl 85.30
[Epoch 629 Batch 200/372] loss 3.91, ppl 49.94, throughput 349.06 samples/s, lr 0.00
[Epoch 629] throughput 19252.02 samples/s
[Epoch 629] time cost 150.02s, valid loss 4.45, valid ppl 85.30
[Epoch 630 Batch 200/372] loss 3.92, ppl 50.35, throughput 374.33 samples/s, lr 0.00
[Epoch 630] throughput 25046.92 samples/s
[Epoch 630] time cost 124.69s, valid loss 4.45, valid ppl 85.30
[Epoch 631 Batch 200/372] loss 3.91, ppl 50.11, throughput 360.60 samples/s, lr 0.00
[Epoch 631] throughput 24316.05 samples/s
[Epoch 631] time cost 127.33s, valid loss 4.45, valid ppl 85.30
[Epoch 632 Batch 200/372] loss 3.91, ppl 50.09, throughput 306.57 samples/s, lr 0.00
[Epoch 632] throughput 18881.81 samples/s
[Epoch 632] time cost 152.11s, valid loss 4.45, valid ppl 85.30
[Epoch 633 Batch 200/372] loss 3.92, ppl 50.26, throughput 358.26 samples/s, lr 0.00
[Epoch 633] throughput 24419.45 samples/s
[Epoch 633] time cost 126.83s, valid loss 4.45, valid ppl 85.30
[Epoch 634 Batch 200/372] loss 3.91, ppl 50.00, throughput 350.65 samples/s, lr 0.00
[Epoch 634] throughput 24130.94 samples/s
[Epoch 634] time cost 127.49s, valid loss 4.45, valid ppl 85.30
[Epoch 635 Batch 200/372] loss 3.90, ppl 49.62, throughput 291.56 samples/s, lr 0.00
[Epoch 635] throughput 19010.90 samples/s
[Epoch 635] time cost 151.00s, valid loss 4.45, valid ppl 85.30
[Epoch 636 Batch 200/372] loss 3.91, ppl 49.99, throughput 359.04 samples/s, lr 0.00
[Epoch 636] throughput 24200.25 samples/s
[Epoch 636] time cost 127.56s, valid loss 4.45, valid ppl 85.30
[Epoch 637 Batch 200/372] loss 3.91, ppl 49.91, throughput 362.88 samples/s, lr 0.00
[Epoch 637] throughput 24017.49 samples/s
[Epoch 637] time cost 128.47s, valid loss 4.45, valid ppl 85.30
[Epoch 638 Batch 200/372] loss 3.91, ppl 49.72, throughput 264.12 samples/s, lr 0.00
[Epoch 638] throughput 19114.72 samples/s
[Epoch 638] time cost 150.53s, valid loss 4.45, valid ppl 85.30
[Epoch 639 Batch 200/372] loss 3.92, ppl 50.27, throughput 351.87 samples/s, lr 0.00
[Epoch 639] throughput 23941.00 samples/s
[Epoch 639] time cost 128.10s, valid loss 4.45, valid ppl 85.30
[Epoch 640 Batch 200/372] loss 3.91, ppl 50.03, throughput 349.37 samples/s, lr 0.00
[Epoch 640] throughput 24344.79 samples/s
[Epoch 640] time cost 127.11s, valid loss 4.45, valid ppl 85.30
[Epoch 641 Batch 200/372] loss 3.91, ppl 50.13, throughput 315.25 samples/s, lr 0.00
[Epoch 641] throughput 15673.80 samples/s
[Epoch 641] time cost 175.00s, valid loss 4.45, valid ppl 85.30
[Epoch 642 Batch 200/372] loss 3.91, ppl 49.89, throughput 355.69 samples/s, lr 0.00
[Epoch 642] throughput 24503.09 samples/s
[Epoch 642] time cost 126.25s, valid loss 4.45, valid ppl 85.30
[Epoch 643 Batch 200/372] loss 3.92, ppl 50.34, throughput 352.41 samples/s, lr 0.00
[Epoch 643] throughput 24033.48 samples/s
[Epoch 643] time cost 128.13s, valid loss 4.45, valid ppl 85.30
[Epoch 644 Batch 200/372] loss 3.91, ppl 49.92, throughput 366.01 samples/s, lr 0.00
[Epoch 644] throughput 19296.30 samples/s
[Epoch 644] time cost 149.37s, valid loss 4.45, valid ppl 85.30
[Epoch 645 Batch 200/372] loss 3.91, ppl 50.02, throughput 356.85 samples/s, lr 0.00
[Epoch 645] throughput 24498.67 samples/s
[Epoch 645] time cost 126.41s, valid loss 4.45, valid ppl 85.30
[Epoch 646 Batch 200/372] loss 3.91, ppl 49.95, throughput 354.77 samples/s, lr 0.00
[Epoch 646] throughput 24463.38 samples/s
[Epoch 646] time cost 127.05s, valid loss 4.45, valid ppl 85.30
[Epoch 647 Batch 200/372] loss 3.91, ppl 49.95, throughput 358.62 samples/s, lr 0.00
[Epoch 647] throughput 19092.49 samples/s
[Epoch 647] time cost 150.92s, valid loss 4.45, valid ppl 85.30
[Epoch 648 Batch 200/372] loss 3.91, ppl 50.01, throughput 348.80 samples/s, lr 0.00
[Epoch 648] throughput 24121.69 samples/s
[Epoch 648] time cost 128.47s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 649 Batch 200/372] loss 3.91, ppl 49.86, throughput 373.40 samples/s, lr 0.00
[Epoch 649] throughput 25017.41 samples/s
[Epoch 649] time cost 125.33s, valid loss 4.45, valid ppl 85.30
[Epoch 650 Batch 200/372] loss 3.91, ppl 49.92, throughput 369.38 samples/s, lr 0.00
[Epoch 650] throughput 19798.89 samples/s
[Epoch 650] time cost 146.44s, valid loss 4.45, valid ppl 85.30
[Epoch 651 Batch 200/372] loss 3.91, ppl 49.92, throughput 361.85 samples/s, lr 0.00
[Epoch 651] throughput 25064.60 samples/s
[Epoch 651] time cost 124.42s, valid loss 4.45, valid ppl 85.30
[Epoch 652 Batch 200/372] loss 3.91, ppl 50.07, throughput 360.04 samples/s, lr 0.00
[Epoch 652] throughput 23979.04 samples/s
[Epoch 652] time cost 128.40s, valid loss 4.45, valid ppl 85.30
[Epoch 653 Batch 200/372] loss 3.91, ppl 49.73, throughput 355.70 samples/s, lr 0.00
[Epoch 653] throughput 19093.51 samples/s
[Epoch 653] time cost 151.04s, valid loss 4.45, valid ppl 85.30
[Epoch 654 Batch 200/372] loss 3.91, ppl 49.93, throughput 360.24 samples/s, lr 0.00
[Epoch 654] throughput 23921.65 samples/s
[Epoch 654] time cost 128.98s, valid loss 4.45, valid ppl 85.30
[Epoch 655 Batch 200/372] loss 3.91, ppl 49.97, throughput 356.31 samples/s, lr 0.00
[Epoch 655] throughput 24276.60 samples/s
[Epoch 655] time cost 127.32s, valid loss 4.45, valid ppl 85.30
[Epoch 656 Batch 200/372] loss 3.91, ppl 49.79, throughput 354.78 samples/s, lr 0.00
[Epoch 656] throughput 19483.02 samples/s
[Epoch 656] time cost 148.19s, valid loss 4.45, valid ppl 85.30
[Epoch 657 Batch 200/372] loss 3.91, ppl 49.97, throughput 355.19 samples/s, lr 0.00
[Epoch 657] throughput 24451.90 samples/s
[Epoch 657] time cost 127.20s, valid loss 4.45, valid ppl 85.30
[Epoch 658 Batch 200/372] loss 3.91, ppl 50.01, throughput 357.18 samples/s, lr 0.00
[Epoch 658] throughput 24397.87 samples/s
[Epoch 658] time cost 127.89s, valid loss 4.45, valid ppl 85.30
[Epoch 659 Batch 200/372] loss 3.92, ppl 50.45, throughput 354.12 samples/s, lr 0.00
[Epoch 659] throughput 21045.88 samples/s
[Epoch 659] time cost 141.01s, valid loss 4.45, valid ppl 85.30
[Epoch 660 Batch 200/372] loss 3.91, ppl 49.95, throughput 359.51 samples/s, lr 0.00
[Epoch 660] throughput 24515.08 samples/s
[Epoch 660] time cost 126.12s, valid loss 4.45, valid ppl 85.30
[Epoch 661 Batch 200/372] loss 3.91, ppl 49.95, throughput 370.19 samples/s, lr 0.00
[Epoch 661] throughput 24929.43 samples/s
[Epoch 661] time cost 125.45s, valid loss 4.45, valid ppl 85.30
[Epoch 662 Batch 200/372] loss 3.92, ppl 50.35, throughput 363.90 samples/s, lr 0.00
[Epoch 662] throughput 23544.35 samples/s
[Epoch 662] time cost 130.87s, valid loss 4.45, valid ppl 85.30
[Epoch 663 Batch 200/372] loss 3.92, ppl 50.20, throughput 355.76 samples/s, lr 0.00
[Epoch 663] throughput 24346.73 samples/s
[Epoch 663] time cost 127.13s, valid loss 4.45, valid ppl 85.30
[Epoch 664 Batch 200/372] loss 3.92, ppl 50.23, throughput 356.08 samples/s, lr 0.00
[Epoch 664] throughput 24031.31 samples/s
[Epoch 664] time cost 128.47s, valid loss 4.45, valid ppl 85.30
[Epoch 665 Batch 200/372] loss 3.91, ppl 50.01, throughput 346.95 samples/s, lr 0.00
[Epoch 665] throughput 22673.50 samples/s
[Epoch 665] time cost 134.53s, valid loss 4.45, valid ppl 85.30
[Epoch 666 Batch 200/372] loss 3.92, ppl 50.25, throughput 366.09 samples/s, lr 0.00
[Epoch 666] throughput 25032.66 samples/s
[Epoch 666] time cost 124.86s, valid loss 4.45, valid ppl 85.30
[Epoch 667 Batch 200/372] loss 3.92, ppl 50.19, throughput 383.63 samples/s, lr 0.00
[Epoch 667] throughput 26172.31 samples/s
[Epoch 667] time cost 121.06s, valid loss 4.45, valid ppl 85.30
[Epoch 668 Batch 200/372] loss 3.91, ppl 49.90, throughput 385.15 samples/s, lr 0.00
[Epoch 668] throughput 26459.83 samples/s
[Epoch 668] time cost 119.92s, valid loss 4.45, valid ppl 85.30
[Epoch 669 Batch 200/372] loss 3.91, ppl 49.85, throughput 241.32 samples/s, lr 0.00
[Epoch 669] throughput 19409.15 samples/s
[Epoch 669] time cost 149.81s, valid loss 4.45, valid ppl 85.30
[Epoch 670 Batch 200/372] loss 3.91, ppl 50.00, throughput 355.43 samples/s, lr 0.00
[Epoch 670] throughput 23955.21 samples/s
[Epoch 670] time cost 128.32s, valid loss 4.45, valid ppl 85.30
[Epoch 671 Batch 200/372] loss 3.91, ppl 49.90, throughput 354.39 samples/s, lr 0.00
[Epoch 671] throughput 24175.30 samples/s
[Epoch 671] time cost 127.76s, valid loss 4.45, valid ppl 85.30
[Epoch 672 Batch 200/372] loss 3.91, ppl 49.83, throughput 242.10 samples/s, lr 0.00
[Epoch 672] throughput 19332.63 samples/s
[Epoch 672] time cost 149.28s, valid loss 4.45, valid ppl 85.30
[Epoch 673 Batch 200/372] loss 3.91, ppl 50.10, throughput 369.23 samples/s, lr 0.00
[Epoch 673] throughput 24763.99 samples/s
[Epoch 673] time cost 126.26s, valid loss 4.45, valid ppl 85.30
[Epoch 674 Batch 200/372] loss 3.91, ppl 49.88, throughput 357.54 samples/s, lr 0.00
[Epoch 674] throughput 24107.54 samples/s
[Epoch 674] time cost 127.62s, valid loss 4.45, valid ppl 85.30
[Epoch 675 Batch 200/372] loss 3.91, ppl 50.08, throughput 243.89 samples/s, lr 0.00
[Epoch 675] throughput 19154.86 samples/s
[Epoch 675] time cost 149.96s, valid loss 4.45, valid ppl 85.30
[Epoch 676 Batch 200/372] loss 3.91, ppl 50.02, throughput 349.35 samples/s, lr 0.00
[Epoch 676] throughput 24351.91 samples/s
[Epoch 676] time cost 127.10s, valid loss 4.45, valid ppl 85.30
[Epoch 677 Batch 200/372] loss 3.91, ppl 50.06, throughput 349.13 samples/s, lr 0.00
[Epoch 677] throughput 24009.53 samples/s
[Epoch 677] time cost 128.05s, valid loss 4.45, valid ppl 85.30
[Epoch 678 Batch 200/372] loss 3.91, ppl 50.14, throughput 275.60 samples/s, lr 0.00
[Epoch 678] throughput 19227.76 samples/s
[Epoch 678] time cost 149.73s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 679 Batch 200/372] loss 3.92, ppl 50.22, throughput 352.87 samples/s, lr 0.00
[Epoch 679] throughput 24618.14 samples/s
[Epoch 679] time cost 126.63s, valid loss 4.45, valid ppl 85.30
[Epoch 680 Batch 200/372] loss 3.92, ppl 50.34, throughput 356.11 samples/s, lr 0.00
[Epoch 680] throughput 23995.03 samples/s
[Epoch 680] time cost 128.89s, valid loss 4.45, valid ppl 85.30
[Epoch 681 Batch 200/372] loss 3.91, ppl 49.82, throughput 282.44 samples/s, lr 0.00
[Epoch 681] throughput 19032.67 samples/s
[Epoch 681] time cost 151.04s, valid loss 4.45, valid ppl 85.30
[Epoch 682 Batch 200/372] loss 3.91, ppl 49.96, throughput 352.46 samples/s, lr 0.00
[Epoch 682] throughput 24339.87 samples/s
[Epoch 682] time cost 128.06s, valid loss 4.45, valid ppl 85.30
[Epoch 683 Batch 200/372] loss 3.92, ppl 50.18, throughput 360.48 samples/s, lr 0.00
[Epoch 683] throughput 24234.66 samples/s
[Epoch 683] time cost 127.14s, valid loss 4.45, valid ppl 85.30
[Epoch 684 Batch 200/372] loss 3.91, ppl 50.00, throughput 364.10 samples/s, lr 0.00
[Epoch 684] throughput 19420.36 samples/s
[Epoch 684] time cost 149.06s, valid loss 4.45, valid ppl 85.30
[Epoch 685 Batch 200/372] loss 3.92, ppl 50.16, throughput 372.95 samples/s, lr 0.00
[Epoch 685] throughput 25063.49 samples/s
[Epoch 685] time cost 124.30s, valid loss 4.45, valid ppl 85.30
[Epoch 686 Batch 200/372] loss 3.92, ppl 50.37, throughput 374.44 samples/s, lr 0.00
[Epoch 686] throughput 25117.27 samples/s
[Epoch 686] time cost 124.20s, valid loss 4.45, valid ppl 85.30
[Epoch 687 Batch 200/372] loss 3.91, ppl 49.93, throughput 372.11 samples/s, lr 0.00
[Epoch 687] throughput 19888.28 samples/s
[Epoch 687] time cost 146.19s, valid loss 4.45, valid ppl 85.30
[Epoch 688 Batch 200/372] loss 3.91, ppl 49.98, throughput 353.47 samples/s, lr 0.00
[Epoch 688] throughput 24140.79 samples/s
[Epoch 688] time cost 127.52s, valid loss 4.45, valid ppl 85.30
[Epoch 689 Batch 200/372] loss 3.92, ppl 50.32, throughput 363.13 samples/s, lr 0.00
[Epoch 689] throughput 24191.39 samples/s
[Epoch 689] time cost 127.63s, valid loss 4.45, valid ppl 85.30
[Epoch 690 Batch 200/372] loss 3.91, ppl 49.68, throughput 347.04 samples/s, lr 0.00
[Epoch 690] throughput 19102.81 samples/s
[Epoch 690] time cost 150.75s, valid loss 4.45, valid ppl 85.30
[Epoch 691 Batch 200/372] loss 3.91, ppl 50.03, throughput 359.15 samples/s, lr 0.00
[Epoch 691] throughput 24265.59 samples/s
[Epoch 691] time cost 127.26s, valid loss 4.45, valid ppl 85.30
[Epoch 692 Batch 200/372] loss 3.91, ppl 49.86, throughput 360.19 samples/s, lr 0.00
[Epoch 692] throughput 24522.84 samples/s
[Epoch 692] time cost 126.49s, valid loss 4.45, valid ppl 85.30
[Epoch 693 Batch 200/372] loss 3.91, ppl 49.95, throughput 364.31 samples/s, lr 0.00
[Epoch 693] throughput 19194.54 samples/s
[Epoch 693] time cost 150.77s, valid loss 4.45, valid ppl 85.30
[Epoch 694 Batch 200/372] loss 3.91, ppl 50.10, throughput 356.01 samples/s, lr 0.00
[Epoch 694] throughput 24088.74 samples/s
[Epoch 694] time cost 128.62s, valid loss 4.45, valid ppl 85.30
[Epoch 695 Batch 200/372] loss 3.91, ppl 50.09, throughput 353.45 samples/s, lr 0.00
[Epoch 695] throughput 24311.58 samples/s
[Epoch 695] time cost 127.20s, valid loss 4.45, valid ppl 85.30
[Epoch 696 Batch 200/372] loss 3.91, ppl 49.94, throughput 295.56 samples/s, lr 0.00
[Epoch 696] throughput 19715.73 samples/s
[Epoch 696] time cost 146.97s, valid loss 4.45, valid ppl 85.30
[Epoch 697 Batch 200/372] loss 3.91, ppl 50.05, throughput 362.97 samples/s, lr 0.00
[Epoch 697] throughput 24890.17 samples/s
[Epoch 697] time cost 125.40s, valid loss 4.45, valid ppl 85.30
[Epoch 698 Batch 200/372] loss 3.92, ppl 50.31, throughput 352.42 samples/s, lr 0.00
[Epoch 698] throughput 23921.87 samples/s
[Epoch 698] time cost 129.14s, valid loss 4.45, valid ppl 85.30
[Epoch 699 Batch 200/372] loss 3.92, ppl 50.19, throughput 274.12 samples/s, lr 0.00
[Epoch 699] throughput 19396.24 samples/s
[Epoch 699] time cost 148.67s, valid loss 4.45, valid ppl 85.30
[Epoch 700 Batch 200/372] loss 3.90, ppl 49.55, throughput 366.81 samples/s, lr 0.00
[Epoch 700] throughput 24583.83 samples/s
[Epoch 700] time cost 126.76s, valid loss 4.45, valid ppl 85.30
[Epoch 701 Batch 200/372] loss 3.91, ppl 49.97, throughput 353.92 samples/s, lr 0.00
[Epoch 701] throughput 24364.24 samples/s
[Epoch 701] time cost 127.52s, valid loss 4.45, valid ppl 85.30
[Epoch 702 Batch 200/372] loss 3.91, ppl 49.80, throughput 350.05 samples/s, lr 0.00
[Epoch 702] throughput 19199.46 samples/s
[Epoch 702] time cost 150.26s, valid loss 4.45, valid ppl 85.30
[Epoch 703 Batch 200/372] loss 3.91, ppl 49.86, throughput 358.93 samples/s, lr 0.00
[Epoch 703] throughput 24506.74 samples/s
[Epoch 703] time cost 126.58s, valid loss 4.45, valid ppl 85.30
[Epoch 704 Batch 200/372] loss 3.92, ppl 50.16, throughput 350.15 samples/s, lr 0.00
[Epoch 704] throughput 24052.54 samples/s
[Epoch 704] time cost 127.70s, valid loss 4.45, valid ppl 85.30
[Epoch 705 Batch 200/372] loss 3.92, ppl 50.27, throughput 365.25 samples/s, lr 0.00
[Epoch 705] throughput 19435.32 samples/s
[Epoch 705] time cost 148.73s, valid loss 4.45, valid ppl 85.30
[Epoch 706 Batch 200/372] loss 3.91, ppl 50.08, throughput 353.29 samples/s, lr 0.00
[Epoch 706] throughput 24107.21 samples/s
[Epoch 706] time cost 127.79s, valid loss 4.45, valid ppl 85.30
[Epoch 707 Batch 200/372] loss 3.91, ppl 50.14, throughput 351.78 samples/s, lr 0.00
[Epoch 707] throughput 24742.26 samples/s
[Epoch 707] time cost 125.84s, valid loss 4.45, valid ppl 85.30
[Epoch 708 Batch 200/372] loss 3.92, ppl 50.16, throughput 352.59 samples/s, lr 0.00
[Epoch 708] throughput 19127.70 samples/s
[Epoch 708] time cost 150.08s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 709 Batch 200/372] loss 3.91, ppl 50.12, throughput 353.02 samples/s, lr 0.00
[Epoch 709] throughput 24286.67 samples/s
[Epoch 709] time cost 127.79s, valid loss 4.45, valid ppl 85.30
[Epoch 710 Batch 200/372] loss 3.91, ppl 50.12, throughput 356.78 samples/s, lr 0.00
[Epoch 710] throughput 24002.79 samples/s
[Epoch 710] time cost 128.15s, valid loss 4.45, valid ppl 85.30
[Epoch 711 Batch 200/372] loss 3.92, ppl 50.25, throughput 351.54 samples/s, lr 0.00
[Epoch 711] throughput 18980.67 samples/s
[Epoch 711] time cost 151.87s, valid loss 4.45, valid ppl 85.30
[Epoch 712 Batch 200/372] loss 3.91, ppl 50.07, throughput 358.55 samples/s, lr 0.00
[Epoch 712] throughput 24307.72 samples/s
[Epoch 712] time cost 128.19s, valid loss 4.45, valid ppl 85.30
[Epoch 713 Batch 200/372] loss 3.91, ppl 50.08, throughput 354.40 samples/s, lr 0.00
[Epoch 713] throughput 24381.40 samples/s
[Epoch 713] time cost 127.02s, valid loss 4.45, valid ppl 85.30
[Epoch 714 Batch 200/372] loss 3.91, ppl 49.67, throughput 363.60 samples/s, lr 0.00
[Epoch 714] throughput 19411.12 samples/s
[Epoch 714] time cost 148.89s, valid loss 4.45, valid ppl 85.30
[Epoch 715 Batch 200/372] loss 3.91, ppl 49.82, throughput 354.57 samples/s, lr 0.00
[Epoch 715] throughput 24320.05 samples/s
[Epoch 715] time cost 126.71s, valid loss 4.45, valid ppl 85.30
[Epoch 716 Batch 200/372] loss 3.92, ppl 50.41, throughput 358.60 samples/s, lr 0.00
[Epoch 716] throughput 24145.68 samples/s
[Epoch 716] time cost 127.54s, valid loss 4.45, valid ppl 85.30
[Epoch 717 Batch 200/372] loss 3.91, ppl 49.68, throughput 353.97 samples/s, lr 0.00
[Epoch 717] throughput 19129.76 samples/s
[Epoch 717] time cost 150.29s, valid loss 4.45, valid ppl 85.30
[Epoch 718 Batch 200/372] loss 3.92, ppl 50.19, throughput 373.56 samples/s, lr 0.00
[Epoch 718] throughput 24827.64 samples/s
[Epoch 718] time cost 125.21s, valid loss 4.45, valid ppl 85.30
[Epoch 719 Batch 200/372] loss 3.91, ppl 49.97, throughput 354.47 samples/s, lr 0.00
[Epoch 719] throughput 24398.66 samples/s
[Epoch 719] time cost 126.60s, valid loss 4.45, valid ppl 85.30
[Epoch 720 Batch 200/372] loss 3.91, ppl 49.89, throughput 356.24 samples/s, lr 0.00
[Epoch 720] throughput 18494.41 samples/s
[Epoch 720] time cost 155.34s, valid loss 4.45, valid ppl 85.30
[Epoch 721 Batch 200/372] loss 3.91, ppl 50.10, throughput 335.77 samples/s, lr 0.00
[Epoch 721] throughput 23490.56 samples/s
[Epoch 721] time cost 129.87s, valid loss 4.45, valid ppl 85.30
[Epoch 722 Batch 200/372] loss 3.92, ppl 50.23, throughput 360.13 samples/s, lr 0.00
[Epoch 722] throughput 24294.41 samples/s
[Epoch 722] time cost 127.25s, valid loss 4.45, valid ppl 85.30
[Epoch 723 Batch 200/372] loss 3.91, ppl 49.98, throughput 350.76 samples/s, lr 0.00
[Epoch 723] throughput 19022.21 samples/s
[Epoch 723] time cost 151.99s, valid loss 4.45, valid ppl 85.30
[Epoch 724 Batch 200/372] loss 3.91, ppl 49.80, throughput 358.19 samples/s, lr 0.00
[Epoch 724] throughput 23931.81 samples/s
[Epoch 724] time cost 128.60s, valid loss 4.45, valid ppl 85.30
[Epoch 725 Batch 200/372] loss 3.91, ppl 49.77, throughput 366.49 samples/s, lr 0.00
[Epoch 725] throughput 24838.19 samples/s
[Epoch 725] time cost 125.02s, valid loss 4.45, valid ppl 85.30
[Epoch 726 Batch 200/372] loss 3.91, ppl 50.00, throughput 366.66 samples/s, lr 0.00
[Epoch 726] throughput 19761.29 samples/s
[Epoch 726] time cost 147.22s, valid loss 4.45, valid ppl 85.30
[Epoch 727 Batch 200/372] loss 3.92, ppl 50.56, throughput 359.74 samples/s, lr 0.00
[Epoch 727] throughput 24133.05 samples/s
[Epoch 727] time cost 128.31s, valid loss 4.45, valid ppl 85.30
[Epoch 728 Batch 200/372] loss 3.92, ppl 50.23, throughput 357.07 samples/s, lr 0.00
[Epoch 728] throughput 24010.24 samples/s
[Epoch 728] time cost 128.30s, valid loss 4.45, valid ppl 85.30
[Epoch 729 Batch 200/372] loss 3.91, ppl 50.10, throughput 355.45 samples/s, lr 0.00
[Epoch 729] throughput 19188.33 samples/s
[Epoch 729] time cost 151.12s, valid loss 4.45, valid ppl 85.30
[Epoch 730 Batch 200/372] loss 3.91, ppl 49.93, throughput 352.18 samples/s, lr 0.00
[Epoch 730] throughput 24131.84 samples/s
[Epoch 730] time cost 128.79s, valid loss 4.45, valid ppl 85.30
[Epoch 731 Batch 200/372] loss 3.91, ppl 49.95, throughput 352.97 samples/s, lr 0.00
[Epoch 731] throughput 24236.73 samples/s
[Epoch 731] time cost 127.03s, valid loss 4.45, valid ppl 85.30
[Epoch 732 Batch 200/372] loss 3.92, ppl 50.22, throughput 353.43 samples/s, lr 0.00
[Epoch 732] throughput 20321.30 samples/s
[Epoch 732] time cost 144.48s, valid loss 4.45, valid ppl 85.30
[Epoch 733 Batch 200/372] loss 3.91, ppl 49.89, throughput 358.82 samples/s, lr 0.00
[Epoch 733] throughput 24128.60 samples/s
[Epoch 733] time cost 127.51s, valid loss 4.45, valid ppl 85.30
[Epoch 734 Batch 200/372] loss 3.92, ppl 50.16, throughput 350.53 samples/s, lr 0.00
[Epoch 734] throughput 24030.53 samples/s
[Epoch 734] time cost 128.17s, valid loss 4.45, valid ppl 85.30
[Epoch 735 Batch 200/372] loss 3.91, ppl 49.84, throughput 352.39 samples/s, lr 0.00
[Epoch 735] throughput 20556.38 samples/s
[Epoch 735] time cost 143.03s, valid loss 4.45, valid ppl 85.30
[Epoch 736 Batch 200/372] loss 3.92, ppl 50.31, throughput 359.36 samples/s, lr 0.00
[Epoch 736] throughput 24185.37 samples/s
[Epoch 736] time cost 127.53s, valid loss 4.45, valid ppl 85.30
[Epoch 737 Batch 200/372] loss 3.91, ppl 49.89, throughput 365.86 samples/s, lr 0.00
[Epoch 737] throughput 24196.99 samples/s
[Epoch 737] time cost 126.96s, valid loss 4.45, valid ppl 85.30
[Epoch 738 Batch 200/372] loss 3.91, ppl 49.98, throughput 354.85 samples/s, lr 0.00
[Epoch 738] throughput 22702.67 samples/s
[Epoch 738] time cost 134.25s, valid loss 4.45, valid ppl 85.30
Learning rate after interval update 0.000000
[Epoch 739 Batch 200/372] loss 3.91, ppl 49.89, throughput 346.55 samples/s, lr 0.00
[Epoch 739] throughput 23797.89 samples/s
[Epoch 739] time cost 129.28s, valid loss 4.45, valid ppl 85.30
[Epoch 740 Batch 200/372] loss 3.91, ppl 49.77, throughput 366.59 samples/s, lr 0.00
[Epoch 740] throughput 24252.20 samples/s
[Epoch 740] time cost 127.44s, valid loss 4.45, valid ppl 85.30
[Epoch 741 Batch 200/372] loss 3.91, ppl 49.88, throughput 352.96 samples/s, lr 0.00
[Epoch 741] throughput 24102.86 samples/s
[Epoch 741] time cost 128.03s, valid loss 4.45, valid ppl 85.30
[Epoch 742 Batch 200/372] loss 3.91, ppl 49.88, throughput 255.27 samples/s, lr 0.00
[Epoch 742] throughput 20091.74 samples/s
[Epoch 742] time cost 145.24s, valid loss 4.45, valid ppl 85.30
[Epoch 743 Batch 200/372] loss 3.92, ppl 50.30, throughput 352.06 samples/s, lr 0.00
[Epoch 743] throughput 24118.86 samples/s
[Epoch 743] time cost 128.60s, valid loss 4.45, valid ppl 85.30
[Epoch 744 Batch 200/372] loss 3.92, ppl 50.17, throughput 359.79 samples/s, lr 0.00
[Epoch 744] throughput 24624.03 samples/s
[Epoch 744] time cost 127.24s, valid loss 4.45, valid ppl 85.30
[Epoch 745 Batch 200/372] loss 3.91, ppl 49.90, throughput 323.90 samples/s, lr 0.00
[Epoch 745] throughput 24092.49 samples/s
[Epoch 745] time cost 127.89s, valid loss 4.45, valid ppl 85.30
[Epoch 746 Batch 200/372] loss 3.91, ppl 49.84, throughput 383.68 samples/s, lr 0.00
[Epoch 746] throughput 25772.71 samples/s
[Epoch 746] time cost 121.89s, valid loss 4.45, valid ppl 85.30
[Epoch 747 Batch 200/372] loss 3.90, ppl 49.56, throughput 360.89 samples/s, lr 0.00
[Epoch 747] throughput 24168.05 samples/s
[Epoch 747] time cost 128.08s, valid loss 4.45, valid ppl 85.30
[Epoch 748 Batch 200/372] loss 3.91, ppl 49.78, throughput 244.04 samples/s, lr 0.00
[Epoch 748] throughput 19807.64 samples/s
[Epoch 748] time cost 146.97s, valid loss 4.45, valid ppl 85.30
[Epoch 749 Batch 200/372] loss 3.93, ppl 50.70, throughput 356.85 samples/s, lr 0.00
[Epoch 749] throughput 24523.09 samples/s
[Epoch 749] time cost 126.27s, valid loss 4.45, valid ppl 85.30
Total training throughput 14921.84 samples/s
Best validation loss 4.44, val ppl 84.89
Best test loss 4.39, test ppl 80.67
Total time cost 105062.79s