Namespace(batch_size=128, bptt=20, clip=10.0, dropout=0.1, emsize=512, epochs=42, eps=1, eval_only=False, from_epoch=None, gpus='4,5,6,7', k=8192, log_interval=1000, lr=0.2, nhid=2048, nlayers=1, nproj=512, save='/dev/shm/eos_128.params', seed=1, test_mode=False)
BigRNN(
(encoder): HybridSequentialRNNCell(
(0): LSTMPCell(None -> 8192 -> 512)
(1): DropoutCell(rate=0.1, axes=())
)
(decoder): SparseISLogits(512 -> 793471, num_sampled = 8192, remove_accidental_hits = True)
(embedding): Sequential(
(0): SparseEmbedding(793471 -> 512, float32)
(1): Dropout(p = 0.1, axes=())
)
)
[05:03:57] src/storage/storage.cc:129: Using GPUPooledRoundedStorageManager.
[05:03:59] src/storage/storage.cc:129: Using GPUPooledRoundedStorageManager.
[05:04:00] src/storage/storage.cc:129: Using GPUPooledRoundedStorageManager.
[05:04:02] src/storage/storage.cc:129: Using GPUPooledRoundedStorageManager.
[Epoch 0 Batch 1000] loss 6.64, ppl 762.56, throughput 4903.04 samples/s
[Epoch 0 Batch 2000] loss 5.27, ppl 194.94, throughput 5523.85 samples/s
[Epoch 0 Batch 3000] loss 4.93, ppl 138.99, throughput 5495.11 samples/s
[Epoch 0 Batch 4000] loss 4.74, ppl 113.91, throughput 5339.02 samples/s
[Epoch 0 Batch 5000] loss 4.60, ppl 99.21, throughput 5561.92 samples/s
[Epoch 0 Batch 6000] loss 4.50, ppl 89.67, throughput 5454.31 samples/s
[Epoch 0 Batch 7000] loss 4.42, ppl 82.77, throughput 5532.44 samples/s
[Epoch 0 Batch 8000] loss 4.35, ppl 77.49, throughput 5253.34 samples/s
[Epoch 0 Batch 9000] loss 4.29, ppl 73.25, throughput 5463.93 samples/s
[Epoch 0 Batch 10000] loss 4.25, ppl 70.05, throughput 5485.40 samples/s
[Epoch 0 Batch 11000] loss 4.21, ppl 67.32, throughput 5554.08 samples/s
[Epoch 0 Batch 12000] loss 4.17, ppl 64.92, throughput 5322.89 samples/s
[Epoch 0 Batch 13000] loss 4.14, ppl 62.75, throughput 5469.81 samples/s
[Epoch 0 Batch 14000] loss 4.11, ppl 60.97, throughput 5477.65 samples/s
[Epoch 0 Batch 15000] loss 4.09, ppl 59.54, throughput 5281.06 samples/s
[Epoch 0 Batch 16000] loss 4.06, ppl 58.12, throughput 5490.56 samples/s
[Epoch 0 Batch 17000] loss 4.04, ppl 56.86, throughput 5482.82 samples/s
[Epoch 0 Batch 18000] loss 4.02, ppl 55.87, throughput 5490.20 samples/s
[Epoch 0 Batch 19000] loss 4.01, ppl 54.94, throughput 5329.22 samples/s
[Epoch 0 Batch 20000] loss 3.99, ppl 53.88, throughput 5556.10 samples/s
[Epoch 0 Batch 21000] loss 3.98, ppl 53.32, throughput 5533.29 samples/s
[Epoch 0 Batch 22000] loss 3.96, ppl 52.36, throughput 5495.44 samples/s
[Epoch 0 Batch 23000] loss 3.95, ppl 51.80, throughput 5326.83 samples/s
[Epoch 0 Batch 24000] loss 3.93, ppl 51.01, throughput 5481.15 samples/s
[Epoch 0 Batch 25000] loss 3.92, ppl 50.38, throughput 5511.87 samples/s
[Epoch 0 Batch 26000] loss 3.91, ppl 49.88, throughput 5476.13 samples/s
[Epoch 0 Batch 27000] loss 3.90, ppl 49.28, throughput 5300.33 samples/s
[Epoch 0 Batch 28000] loss 3.89, ppl 48.80, throughput 5490.62 samples/s
[Epoch 0 Batch 29000] loss 3.88, ppl 48.38, throughput 5454.01 samples/s
[Epoch 0 Batch 30000] loss 3.87, ppl 47.87, throughput 5349.89 samples/s
[Epoch 0 Batch 31000] loss 3.86, ppl 47.53, throughput 5492.97 samples/s
[Epoch 0 Batch 32000] loss 3.85, ppl 47.08, throughput 5488.59 samples/s
[Epoch 0 Batch 33000] loss 3.84, ppl 46.73, throughput 5504.19 samples/s
[Epoch 0 Batch 34000] loss 3.83, ppl 46.22, throughput 5329.99 samples/s
[Epoch 0 Batch 35000] loss 3.83, ppl 45.92, throughput 5511.69 samples/s
[Epoch 0 Batch 36000] loss 3.82, ppl 45.56, throughput 5466.39 samples/s
[Epoch 0 Batch 37000] loss 3.81, ppl 45.35, throughput 5495.84 samples/s
[Epoch 0 Batch 38000] loss 3.80, ppl 44.92, throughput 5367.80 samples/s
[Epoch 0 Batch 39000] loss 3.80, ppl 44.60, throughput 5518.76 samples/s
[Epoch 0 Batch 40000] loss 3.79, ppl 44.35, throughput 5502.05 samples/s
[Epoch 0 Batch 41000] loss 3.78, ppl 44.02, throughput 5317.20 samples/s
[Epoch 0 Batch 42000] loss 3.78, ppl 43.83, throughput 5512.92 samples/s
[Epoch 0 Batch 43000] loss 3.77, ppl 43.56, throughput 5465.99 samples/s
[Epoch 0 Batch 44000] loss 3.77, ppl 43.37, throughput 5535.85 samples/s
[Epoch 0 Batch 45000] loss 3.77, ppl 43.17, throughput 5394.16 samples/s
[Epoch 0 Batch 46000] loss 3.76, ppl 42.88, throughput 5481.49 samples/s
[Epoch 0 Batch 47000] loss 3.75, ppl 42.66, throughput 5553.71 samples/s
[Epoch 0 Batch 48000] loss 3.75, ppl 42.39, throughput 5428.99 samples/s
[Epoch 0 Batch 49000] loss 3.74, ppl 42.26, throughput 5335.44 samples/s
[Epoch 0 Batch 50000] loss 3.74, ppl 42.21, throughput 5499.88 samples/s
[Epoch 0 Batch 51000] loss 3.73, ppl 41.88, throughput 5516.58 samples/s
[Epoch 0 Batch 52000] loss 3.73, ppl 41.79, throughput 5138.53 samples/s
[Epoch 0 Batch 53000] loss 3.73, ppl 41.67, throughput 4061.70 samples/s
[Epoch 0 Batch 54000] loss 3.72, ppl 41.37, throughput 4308.69 samples/s
[Epoch 0 Batch 55000] loss 3.72, ppl 41.29, throughput 4334.37 samples/s
[Epoch 0 Batch 56000] loss 3.72, ppl 41.14, throughput 4348.48 samples/s
[Epoch 0 Batch 57000] loss 3.71, ppl 40.98, throughput 4789.44 samples/s
[Epoch 0 Batch 58000] loss 3.71, ppl 40.82, throughput 5366.84 samples/s
[Epoch 0 Batch 59000] loss 3.70, ppl 40.56, throughput 5477.70 samples/s
[Epoch 0 Batch 60000] loss 3.70, ppl 40.57, throughput 5348.57 samples/s
[Epoch 0 Batch 61000] loss 3.70, ppl 40.30, throughput 5496.70 samples/s
[Epoch 0 Batch 62000] loss 3.69, ppl 40.20, throughput 5468.31 samples/s
[Epoch 0 Batch 63000] loss 3.69, ppl 40.15, throughput 5505.61 samples/s
[Epoch 0 Batch 64000] loss 3.69, ppl 39.91, throughput 5384.36 samples/s
[Epoch 0 Batch 65000] loss 3.68, ppl 39.77, throughput 5477.98 samples/s
[Epoch 0 Batch 66000] loss 3.68, ppl 39.79, throughput 5509.40 samples/s
[Epoch 0 Batch 67000] loss 3.68, ppl 39.49, throughput 5339.33 samples/s
[Epoch 0 Batch 68000] loss 3.68, ppl 39.46, throughput 5479.17 samples/s
[Epoch 0 Batch 69000] loss 3.67, ppl 39.35, throughput 5488.64 samples/s
[Epoch 0 Batch 70000] loss 3.67, ppl 39.32, throughput 5480.76 samples/s
[Epoch 0 Batch 71000] loss 3.67, ppl 39.21, throughput 5291.28 samples/s
[Epoch 0 Batch 72000] loss 3.67, ppl 39.07, throughput 5486.19 samples/s
[Epoch 0 Batch 73000] loss 3.66, ppl 38.88, throughput 5514.52 samples/s
[Epoch 0 Batch 74000] loss 3.66, ppl 38.95, throughput 5421.78 samples/s
[Epoch 0 Batch 75000] loss 3.66, ppl 38.74, throughput 5260.19 samples/s
[Epoch 0 Batch 76000] loss 3.66, ppl 38.67, throughput 5511.49 samples/s
[Epoch 0 Batch 77000] loss 3.65, ppl 38.61, throughput 5507.37 samples/s
[Epoch 0 Batch 78000] loss 3.65, ppl 38.52, throughput 5445.71 samples/s
Epoch 0 took 7466.25 seconds.
[Epoch 1 Batch 1000] loss 3.62, ppl 37.41, throughput 5408.23 samples/s
[Epoch 1 Batch 2000] loss 3.59, ppl 36.11, throughput 5514.72 samples/s
[Epoch 1 Batch 3000] loss 3.60, ppl 36.70, throughput 5512.45 samples/s
[Epoch 1 Batch 4000] loss 3.63, ppl 37.67, throughput 5455.10 samples/s
[Epoch 1 Batch 5000] loss 3.62, ppl 37.24, throughput 4881.78 samples/s
[Epoch 1 Batch 6000] loss 3.62, ppl 37.17, throughput 4665.66 samples/s
[Epoch 1 Batch 7000] loss 3.63, ppl 37.65, throughput 4743.40 samples/s
[Epoch 1 Batch 8000] loss 3.61, ppl 36.80, throughput 4449.83 samples/s
[Epoch 1 Batch 9000] loss 3.62, ppl 37.33, throughput 4608.19 samples/s
[Epoch 1 Batch 10000] loss 3.60, ppl 36.76, throughput 4750.90 samples/s
[Epoch 1 Batch 11000] loss 3.62, ppl 37.24, throughput 5749.37 samples/s
[Epoch 1 Batch 12000] loss 3.61, ppl 37.06, throughput 5504.49 samples/s
[Epoch 1 Batch 13000] loss 3.61, ppl 37.06, throughput 5698.37 samples/s
[Epoch 1 Batch 14000] loss 3.60, ppl 36.51, throughput 5704.38 samples/s
[Epoch 1 Batch 15000] loss 3.61, ppl 36.82, throughput 5499.98 samples/s
[Epoch 1 Batch 16000] loss 3.60, ppl 36.62, throughput 5676.33 samples/s
[Epoch 1 Batch 17000] loss 3.61, ppl 36.92, throughput 5641.28 samples/s
[Epoch 1 Batch 18000] loss 3.61, ppl 36.98, throughput 5673.32 samples/s
[Epoch 1 Batch 19000] loss 3.60, ppl 36.66, throughput 5537.11 samples/s
[Epoch 1 Batch 20000] loss 3.60, ppl 36.73, throughput 5650.75 samples/s
[Epoch 1 Batch 21000] loss 3.60, ppl 36.62, throughput 5749.04 samples/s
[Epoch 1 Batch 22000] loss 3.60, ppl 36.58, throughput 5632.77 samples/s
[Epoch 1 Batch 23000] loss 3.60, ppl 36.52, throughput 5314.66 samples/s
[Epoch 1 Batch 24000] loss 3.59, ppl 36.40, throughput 4617.63 samples/s
[Epoch 1 Batch 25000] loss 3.58, ppl 35.90, throughput 4524.49 samples/s
[Epoch 1 Batch 26000] loss 3.60, ppl 36.44, throughput 4942.76 samples/s
[Epoch 1 Batch 27000] loss 3.58, ppl 35.88, throughput 5199.30 samples/s
[Epoch 1 Batch 28000] loss 3.58, ppl 35.84, throughput 5317.60 samples/s
[Epoch 1 Batch 29000] loss 3.57, ppl 35.67, throughput 5367.94 samples/s
[Epoch 1 Batch 30000] loss 3.58, ppl 35.80, throughput 5570.83 samples/s
[Epoch 1 Batch 31000] loss 3.58, ppl 35.99, throughput 5676.62 samples/s
[Epoch 1 Batch 32000] loss 3.58, ppl 35.94, throughput 5688.23 samples/s
[Epoch 1 Batch 33000] loss 3.59, ppl 36.16, throughput 5698.69 samples/s
[Epoch 1 Batch 34000] loss 3.59, ppl 36.10, throughput 5512.86 samples/s
[Epoch 1 Batch 35000] loss 3.58, ppl 35.89, throughput 5731.14 samples/s
[Epoch 1 Batch 36000] loss 3.57, ppl 35.56, throughput 5713.60 samples/s
[Epoch 1 Batch 37000] loss 3.57, ppl 35.40, throughput 5727.71 samples/s
[Epoch 1 Batch 38000] loss 3.58, ppl 35.77, throughput 5528.63 samples/s
[Epoch 1 Batch 39000] loss 3.57, ppl 35.41, throughput 5751.44 samples/s
[Epoch 1 Batch 40000] loss 3.57, ppl 35.34, throughput 5688.37 samples/s
[Epoch 1 Batch 41000] loss 3.57, ppl 35.34, throughput 5550.87 samples/s
[Epoch 1 Batch 42000] loss 3.57, ppl 35.49, throughput 5711.58 samples/s
[Epoch 1 Batch 43000] loss 3.57, ppl 35.54, throughput 5620.25 samples/s
[Epoch 1 Batch 44000] loss 3.56, ppl 35.22, throughput 5766.83 samples/s
[Epoch 1 Batch 45000] loss 3.56, ppl 35.03, throughput 5563.90 samples/s
[Epoch 1 Batch 46000] loss 3.56, ppl 35.05, throughput 5613.50 samples/s
[Epoch 1 Batch 47000] loss 3.57, ppl 35.34, throughput 5729.62 samples/s
[Epoch 1 Batch 48000] loss 3.56, ppl 35.26, throughput 5762.30 samples/s
[Epoch 1 Batch 49000] loss 3.56, ppl 35.10, throughput 5479.04 samples/s
[Epoch 1 Batch 50000] loss 3.56, ppl 35.19, throughput 5749.08 samples/s
[Epoch 1 Batch 51000] loss 3.56, ppl 35.12, throughput 5716.22 samples/s
[Epoch 1 Batch 52000] loss 3.55, ppl 34.96, throughput 5697.26 samples/s
[Epoch 1 Batch 53000] loss 3.56, ppl 35.12, throughput 5504.68 samples/s
[Epoch 1 Batch 54000] loss 3.55, ppl 34.86, throughput 5675.81 samples/s
[Epoch 1 Batch 55000] loss 3.55, ppl 34.79, throughput 5738.12 samples/s
[Epoch 1 Batch 56000] loss 3.55, ppl 34.76, throughput 5540.05 samples/s
[Epoch 1 Batch 57000] loss 3.55, ppl 34.78, throughput 5785.17 samples/s
[Epoch 1 Batch 58000] loss 3.55, ppl 34.78, throughput 5690.33 samples/s
[Epoch 1 Batch 59000] loss 3.55, ppl 34.73, throughput 5709.33 samples/s
[Epoch 1 Batch 60000] loss 3.55, ppl 34.90, throughput 5518.98 samples/s
[Epoch 1 Batch 61000] loss 3.55, ppl 34.92, throughput 5689.50 samples/s
[Epoch 1 Batch 62000] loss 3.54, ppl 34.58, throughput 5648.56 samples/s
[Epoch 1 Batch 63000] loss 3.54, ppl 34.54, throughput 5724.62 samples/s
[Epoch 1 Batch 64000] loss 3.55, ppl 34.78, throughput 5515.24 samples/s
[Epoch 1 Batch 65000] loss 3.54, ppl 34.39, throughput 5671.14 samples/s
[Epoch 1 Batch 66000] loss 3.55, ppl 34.66, throughput 5679.73 samples/s
[Epoch 1 Batch 67000] loss 3.54, ppl 34.58, throughput 5565.66 samples/s
[Epoch 1 Batch 68000] loss 3.54, ppl 34.32, throughput 5676.57 samples/s
[Epoch 1 Batch 69000] loss 3.53, ppl 34.23, throughput 5742.23 samples/s
[Epoch 1 Batch 70000] loss 3.54, ppl 34.44, throughput 5685.74 samples/s
[Epoch 1 Batch 71000] loss 3.54, ppl 34.56, throughput 5561.66 samples/s
[Epoch 1 Batch 72000] loss 3.53, ppl 34.24, throughput 5699.39 samples/s
[Epoch 1 Batch 73000] loss 3.54, ppl 34.48, throughput 5657.86 samples/s
[Epoch 1 Batch 74000] loss 3.54, ppl 34.31, throughput 5720.75 samples/s
[Epoch 1 Batch 75000] loss 3.54, ppl 34.51, throughput 5507.97 samples/s
[Epoch 1 Batch 76000] loss 3.54, ppl 34.45, throughput 5070.78 samples/s
[Epoch 1 Batch 77000] loss 3.54, ppl 34.48, throughput 4687.29 samples/s
[Epoch 1 Batch 78000] loss 3.54, ppl 34.41, throughput 4614.11 samples/s
Epoch 1 took 7323.82 seconds.
[Epoch 2 Batch 1000] loss 3.50, ppl 33.20, throughput 4549.21 samples/s
[Epoch 2 Batch 2000] loss 3.51, ppl 33.51, throughput 4624.28 samples/s
[Epoch 2 Batch 3000] loss 3.50, ppl 33.05, throughput 4314.30 samples/s
[Epoch 2 Batch 4000] loss 3.50, ppl 33.12, throughput 4493.33 samples/s
[Epoch 2 Batch 5000] loss 3.51, ppl 33.52, throughput 4622.40 samples/s
[Epoch 2 Batch 6000] loss 3.51, ppl 33.59, throughput 4583.71 samples/s
[Epoch 2 Batch 7000] loss 3.51, ppl 33.56, throughput 4539.79 samples/s
[Epoch 2 Batch 8000] loss 3.51, ppl 33.52, throughput 4489.65 samples/s
[Epoch 2 Batch 9000] loss 3.50, ppl 33.00, throughput 5703.05 samples/s
[Epoch 2 Batch 10000] loss 3.49, ppl 32.90, throughput 5682.15 samples/s
[Epoch 2 Batch 11000] loss 3.50, ppl 32.98, throughput 5751.12 samples/s
[Epoch 2 Batch 12000] loss 3.50, ppl 33.26, throughput 5515.54 samples/s
[Epoch 2 Batch 13000] loss 3.50, ppl 33.10, throughput 5752.29 samples/s
[Epoch 2 Batch 14000] loss 3.50, ppl 33.08, throughput 5747.62 samples/s
[Epoch 2 Batch 15000] loss 3.50, ppl 33.16, throughput 5531.13 samples/s
[Epoch 2 Batch 16000] loss 3.51, ppl 33.54, throughput 5765.22 samples/s
[Epoch 2 Batch 17000] loss 3.51, ppl 33.47, throughput 5686.56 samples/s
[Epoch 2 Batch 18000] loss 3.51, ppl 33.40, throughput 5684.91 samples/s
[Epoch 2 Batch 19000] loss 3.50, ppl 32.98, throughput 5542.61 samples/s
[Epoch 2 Batch 20000] loss 3.50, ppl 33.10, throughput 5697.24 samples/s
[Epoch 2 Batch 21000] loss 3.50, ppl 32.97, throughput 5723.34 samples/s
[Epoch 2 Batch 22000] loss 3.50, ppl 32.98, throughput 5708.90 samples/s
[Epoch 2 Batch 23000] loss 3.50, ppl 33.21, throughput 5522.53 samples/s
[Epoch 2 Batch 24000] loss 3.50, ppl 33.26, throughput 5688.12 samples/s
[Epoch 2 Batch 25000] loss 3.50, ppl 33.01, throughput 5760.20 samples/s
[Epoch 2 Batch 26000] loss 3.51, ppl 33.31, throughput 5716.50 samples/s
[Epoch 2 Batch 27000] loss 3.50, ppl 33.07, throughput 5546.14 samples/s
[Epoch 2 Batch 28000] loss 3.49, ppl 32.79, throughput 5712.66 samples/s
[Epoch 2 Batch 29000] loss 3.49, ppl 32.89, throughput 5732.01 samples/s
[Epoch 2 Batch 30000] loss 3.49, ppl 32.85, throughput 5557.24 samples/s
[Epoch 2 Batch 31000] loss 3.50, ppl 33.05, throughput 5694.53 samples/s
[Epoch 2 Batch 32000] loss 3.49, ppl 32.79, throughput 5723.85 samples/s
[Epoch 2 Batch 33000] loss 3.49, ppl 32.94, throughput 5736.64 samples/s
[Epoch 2 Batch 34000] loss 3.49, ppl 32.89, throughput 5544.96 samples/s
[Epoch 2 Batch 35000] loss 3.49, ppl 32.63, throughput 5708.52 samples/s
[Epoch 2 Batch 36000] loss 3.49, ppl 32.82, throughput 5728.72 samples/s
[Epoch 2 Batch 37000] loss 3.49, ppl 32.93, throughput 5710.44 samples/s
[Epoch 2 Batch 38000] loss 3.50, ppl 32.95, throughput 5554.55 samples/s
[Epoch 2 Batch 39000] loss 3.49, ppl 32.81, throughput 5691.69 samples/s
[Epoch 2 Batch 40000] loss 3.49, ppl 32.70, throughput 5745.12 samples/s
[Epoch 2 Batch 41000] loss 3.48, ppl 32.60, throughput 5595.15 samples/s
[Epoch 2 Batch 42000] loss 3.49, ppl 32.75, throughput 5730.79 samples/s
[Epoch 2 Batch 43000] loss 3.49, ppl 32.75, throughput 5778.02 samples/s
[Epoch 2 Batch 44000] loss 3.49, ppl 32.85, throughput 5702.47 samples/s
[Epoch 2 Batch 45000] loss 3.49, ppl 32.79, throughput 5593.93 samples/s
[Epoch 2 Batch 46000] loss 3.49, ppl 32.80, throughput 5743.23 samples/s
[Epoch 2 Batch 47000] loss 3.49, ppl 32.69, throughput 5722.82 samples/s
[Epoch 2 Batch 48000] loss 3.49, ppl 32.80, throughput 5716.75 samples/s
[Epoch 2 Batch 49000] loss 3.49, ppl 32.72, throughput 5246.39 samples/s
[Epoch 2 Batch 50000] loss 3.48, ppl 32.58, throughput 5424.76 samples/s
[Epoch 2 Batch 51000] loss 3.48, ppl 32.50, throughput 5343.63 samples/s
[Epoch 2 Batch 52000] loss 3.49, ppl 32.67, throughput 5382.63 samples/s
[Epoch 2 Batch 53000] loss 3.49, ppl 32.71, throughput 5186.45 samples/s
[Epoch 2 Batch 54000] loss 3.48, ppl 32.62, throughput 5399.52 samples/s
[Epoch 2 Batch 55000] loss 3.48, ppl 32.48, throughput 5519.21 samples/s
[Epoch 2 Batch 56000] loss 3.49, ppl 32.67, throughput 5528.34 samples/s
[Epoch 2 Batch 57000] loss 3.48, ppl 32.60, throughput 5718.79 samples/s
[Epoch 2 Batch 58000] loss 3.48, ppl 32.45, throughput 5701.91 samples/s
[Epoch 2 Batch 59000] loss 3.48, ppl 32.40, throughput 5798.39 samples/s
[Epoch 2 Batch 60000] loss 3.48, ppl 32.55, throughput 5555.42 samples/s
[Epoch 2 Batch 61000] loss 3.48, ppl 32.38, throughput 5723.85 samples/s
[Epoch 2 Batch 62000] loss 3.48, ppl 32.32, throughput 5693.46 samples/s
[Epoch 2 Batch 63000] loss 3.48, ppl 32.49, throughput 5750.82 samples/s
[Epoch 2 Batch 64000] loss 3.48, ppl 32.49, throughput 5583.56 samples/s
[Epoch 2 Batch 65000] loss 3.48, ppl 32.50, throughput 5763.88 samples/s
[Epoch 2 Batch 66000] loss 3.48, ppl 32.34, throughput 5733.98 samples/s
[Epoch 2 Batch 67000] loss 3.48, ppl 32.41, throughput 5579.35 samples/s
[Epoch 2 Batch 68000] loss 3.48, ppl 32.42, throughput 5720.82 samples/s
[Epoch 2 Batch 69000] loss 3.48, ppl 32.37, throughput 5759.61 samples/s
[Epoch 2 Batch 70000] loss 3.47, ppl 32.21, throughput 5744.27 samples/s
[Epoch 2 Batch 71000] loss 3.48, ppl 32.44, throughput 5581.50 samples/s
[Epoch 2 Batch 72000] loss 3.48, ppl 32.36, throughput 5747.30 samples/s
[Epoch 2 Batch 73000] loss 3.48, ppl 32.40, throughput 5753.69 samples/s
[Epoch 2 Batch 74000] loss 3.47, ppl 32.16, throughput 5693.86 samples/s
[Epoch 2 Batch 75000] loss 3.47, ppl 32.23, throughput 5565.73 samples/s
[Epoch 2 Batch 76000] loss 3.48, ppl 32.38, throughput 5754.42 samples/s
[Epoch 2 Batch 77000] loss 3.48, ppl 32.34, throughput 5644.55 samples/s
[Epoch 2 Batch 78000] loss 3.48, ppl 32.49, throughput 5706.93 samples/s
Epoch 2 took 7255.29 seconds.
[Epoch 3 Batch 1000] loss 3.44, ppl 31.16, throughput 5594.64 samples/s
[Epoch 3 Batch 2000] loss 3.42, ppl 30.72, throughput 5705.36 samples/s
[Epoch 3 Batch 3000] loss 3.45, ppl 31.63, throughput 5065.17 samples/s
[Epoch 3 Batch 4000] loss 3.43, ppl 30.84, throughput 5043.67 samples/s
[Epoch 3 Batch 5000] loss 3.46, ppl 31.66, throughput 5391.01 samples/s
[Epoch 3 Batch 6000] loss 3.46, ppl 31.73, throughput 5356.24 samples/s
[Epoch 3 Batch 7000] loss 3.43, ppl 31.01, throughput 5291.40 samples/s
[Epoch 3 Batch 8000] loss 3.45, ppl 31.44, throughput 5211.53 samples/s
[Epoch 3 Batch 9000] loss 3.45, ppl 31.54, throughput 5083.51 samples/s
[Epoch 3 Batch 10000] loss 3.45, ppl 31.50, throughput 5772.09 samples/s
[Epoch 3 Batch 11000] loss 3.45, ppl 31.55, throughput 5722.88 samples/s
[Epoch 3 Batch 12000] loss 3.45, ppl 31.63, throughput 5559.29 samples/s
[Epoch 3 Batch 13000] loss 3.45, ppl 31.64, throughput 5700.08 samples/s
[Epoch 3 Batch 14000] loss 3.45, ppl 31.54, throughput 5778.63 samples/s
[Epoch 3 Batch 15000] loss 3.45, ppl 31.34, throughput 5583.65 samples/s
[Epoch 3 Batch 16000] loss 3.45, ppl 31.56, throughput 5708.34 samples/s
[Epoch 3 Batch 17000] loss 3.45, ppl 31.51, throughput 5727.63 samples/s
[Epoch 3 Batch 18000] loss 3.45, ppl 31.37, throughput 5750.55 samples/s
[Epoch 3 Batch 19000] loss 3.45, ppl 31.46, throughput 5582.62 samples/s
[Epoch 3 Batch 20000] loss 3.45, ppl 31.44, throughput 5763.27 samples/s
[Epoch 3 Batch 21000] loss 3.45, ppl 31.60, throughput 5770.23 samples/s
[Epoch 3 Batch 22000] loss 3.45, ppl 31.42, throughput 5728.21 samples/s
[Epoch 3 Batch 23000] loss 3.44, ppl 31.24, throughput 5338.98 samples/s
[Epoch 3 Batch 24000] loss 3.44, ppl 31.23, throughput 5298.76 samples/s
[Epoch 3 Batch 25000] loss 3.45, ppl 31.48, throughput 5292.95 samples/s
[Epoch 3 Batch 26000] loss 3.44, ppl 31.27, throughput 5285.87 samples/s
[Epoch 3 Batch 27000] loss 3.45, ppl 31.51, throughput 4572.62 samples/s
[Epoch 3 Batch 28000] loss 3.45, ppl 31.40, throughput 4696.57 samples/s
[Epoch 3 Batch 29000] loss 3.45, ppl 31.37, throughput 5121.59 samples/s
[Epoch 3 Batch 30000] loss 3.45, ppl 31.50, throughput 5583.91 samples/s
[Epoch 3 Batch 31000] loss 3.45, ppl 31.53, throughput 5757.16 samples/s
[Epoch 3 Batch 32000] loss 3.45, ppl 31.56, throughput 5700.93 samples/s
[Epoch 3 Batch 33000] loss 3.44, ppl 31.32, throughput 5794.59 samples/s
[Epoch 3 Batch 34000] loss 3.45, ppl 31.62, throughput 5553.74 samples/s
[Epoch 3 Batch 35000] loss 3.45, ppl 31.38, throughput 5717.36 samples/s
[Epoch 3 Batch 36000] loss 3.44, ppl 31.20, throughput 5771.18 samples/s
[Epoch 3 Batch 37000] loss 3.44, ppl 31.08, throughput 5724.51 samples/s
[Epoch 3 Batch 38000] loss 3.45, ppl 31.56, throughput 5576.85 samples/s
[Epoch 3 Batch 39000] loss 3.45, ppl 31.38, throughput 5749.54 samples/s
[Epoch 3 Batch 40000] loss 3.44, ppl 31.18, throughput 5722.96 samples/s
[Epoch 3 Batch 41000] loss 3.44, ppl 31.33, throughput 5581.49 samples/s
[Epoch 3 Batch 42000] loss 3.44, ppl 31.34, throughput 5714.17 samples/s
[Epoch 3 Batch 43000] loss 3.44, ppl 31.21, throughput 5788.39 samples/s
[Epoch 3 Batch 44000] loss 3.44, ppl 31.28, throughput 5781.06 samples/s
[Epoch 3 Batch 45000] loss 3.44, ppl 31.34, throughput 5564.96 samples/s
[Epoch 3 Batch 46000] loss 3.44, ppl 31.22, throughput 5676.52 samples/s
[Epoch 3 Batch 47000] loss 3.44, ppl 31.28, throughput 5737.72 samples/s
[Epoch 3 Batch 48000] loss 3.45, ppl 31.35, throughput 5734.03 samples/s
[Epoch 3 Batch 49000] loss 3.45, ppl 31.39, throughput 5587.03 samples/s
[Epoch 3 Batch 50000] loss 3.45, ppl 31.43, throughput 5792.42 samples/s
[Epoch 3 Batch 51000] loss 3.44, ppl 31.32, throughput 5757.19 samples/s
[Epoch 3 Batch 52000] loss 3.45, ppl 31.37, throughput 5739.78 samples/s
[Epoch 3 Batch 53000] loss 3.45, ppl 31.41, throughput 5565.32 samples/s
[Epoch 3 Batch 54000] loss 3.44, ppl 31.22, throughput 5712.26 samples/s
[Epoch 3 Batch 55000] loss 3.44, ppl 31.18, throughput 5750.01 samples/s
[Epoch 3 Batch 56000] loss 3.44, ppl 31.30, throughput 5598.20 samples/s
[Epoch 3 Batch 57000] loss 3.44, ppl 31.21, throughput 5703.15 samples/s
[Epoch 3 Batch 58000] loss 3.44, ppl 31.28, throughput 5722.60 samples/s
[Epoch 3 Batch 59000] loss 3.44, ppl 31.22, throughput 5776.23 samples/s
[Epoch 3 Batch 60000] loss 3.45, ppl 31.40, throughput 5577.22 samples/s
[Epoch 3 Batch 61000] loss 3.44, ppl 31.27, throughput 5713.09 samples/s
[Epoch 3 Batch 62000] loss 3.44, ppl 31.32, throughput 5764.48 samples/s
[Epoch 3 Batch 63000] loss 3.44, ppl 31.06, throughput 5758.25 samples/s
[Epoch 3 Batch 64000] loss 3.44, ppl 31.10, throughput 5567.92 samples/s
[Epoch 3 Batch 65000] loss 3.44, ppl 31.25, throughput 5730.20 samples/s
[Epoch 3 Batch 66000] loss 3.43, ppl 31.02, throughput 5704.98 samples/s
[Epoch 3 Batch 67000] loss 3.44, ppl 31.04, throughput 5527.19 samples/s
[Epoch 3 Batch 68000] loss 3.44, ppl 31.18, throughput 5700.79 samples/s
[Epoch 3 Batch 69000] loss 3.44, ppl 31.20, throughput 5751.03 samples/s
[Epoch 3 Batch 70000] loss 3.44, ppl 31.28, throughput 5714.79 samples/s
[Epoch 3 Batch 71000] loss 3.44, ppl 31.25, throughput 5603.86 samples/s
[Epoch 3 Batch 72000] loss 3.43, ppl 31.01, throughput 5725.73 samples/s
[Epoch 3 Batch 73000] loss 3.43, ppl 31.03, throughput 5741.14 samples/s
[Epoch 3 Batch 74000] loss 3.44, ppl 31.13, throughput 5719.27 samples/s
[Epoch 3 Batch 75000] loss 3.44, ppl 31.25, throughput 5533.83 samples/s
[Epoch 3 Batch 76000] loss 3.44, ppl 31.03, throughput 4829.66 samples/s
[Epoch 3 Batch 77000] loss 3.44, ppl 31.09, throughput 4606.71 samples/s
[Epoch 3 Batch 78000] loss 3.44, ppl 31.19, throughput 4747.18 samples/s
Epoch 3 took 7213.15 seconds.
[Epoch 4 Batch 1000] loss 3.41, ppl 30.21, throughput 5293.15 samples/s
[Epoch 4 Batch 2000] loss 3.42, ppl 30.51, throughput 5359.11 samples/s
[Epoch 4 Batch 3000] loss 3.40, ppl 29.91, throughput 4397.25 samples/s
[Epoch 4 Batch 4000] loss 3.42, ppl 30.47, throughput 4414.70 samples/s
[Epoch 4 Batch 5000] loss 3.39, ppl 29.74, throughput 4691.99 samples/s
[Epoch 4 Batch 6000] loss 3.41, ppl 30.25, throughput 4607.48 samples/s
[Epoch 4 Batch 7000] loss 3.41, ppl 30.36, throughput 4613.96 samples/s
[Epoch 4 Batch 8000] loss 3.41, ppl 30.22, throughput 4442.88 samples/s
[Epoch 4 Batch 9000] loss 3.41, ppl 30.39, throughput 5564.91 samples/s
[Epoch 4 Batch 10000] loss 3.41, ppl 30.37, throughput 5717.99 samples/s
[Epoch 4 Batch 11000] loss 3.41, ppl 30.40, throughput 5724.80 samples/s
[Epoch 4 Batch 12000] loss 3.42, ppl 30.44, throughput 5548.13 samples/s
[Epoch 4 Batch 13000] loss 3.41, ppl 30.21, throughput 5718.02 samples/s
[Epoch 4 Batch 14000] loss 3.41, ppl 30.14, throughput 5723.91 samples/s
[Epoch 4 Batch 15000] loss 3.42, ppl 30.44, throughput 5584.57 samples/s
[Epoch 4 Batch 16000] loss 3.42, ppl 30.43, throughput 5652.32 samples/s
[Epoch 4 Batch 17000] loss 3.42, ppl 30.43, throughput 5716.82 samples/s
[Epoch 4 Batch 18000] loss 3.42, ppl 30.45, throughput 5746.42 samples/s
[Epoch 4 Batch 19000] loss 3.41, ppl 30.39, throughput 5573.86 samples/s
[Epoch 4 Batch 20000] loss 3.41, ppl 30.26, throughput 5665.63 samples/s
[Epoch 4 Batch 21000] loss 3.42, ppl 30.53, throughput 5736.29 samples/s
[Epoch 4 Batch 22000] loss 3.41, ppl 30.38, throughput 5773.41 samples/s
[Epoch 4 Batch 23000] loss 3.41, ppl 30.41, throughput 5538.42 samples/s
[Epoch 4 Batch 24000] loss 3.41, ppl 30.28, throughput 5769.77 samples/s
[Epoch 4 Batch 25000] loss 3.41, ppl 30.31, throughput 5665.80 samples/s
[Epoch 4 Batch 26000] loss 3.42, ppl 30.53, throughput 5721.05 samples/s
[Epoch 4 Batch 27000] loss 3.41, ppl 30.15, throughput 5546.07 samples/s
[Epoch 4 Batch 28000] loss 3.42, ppl 30.43, throughput 5742.34 samples/s
[Epoch 4 Batch 29000] loss 3.42, ppl 30.57, throughput 5754.56 samples/s
[Epoch 4 Batch 30000] loss 3.42, ppl 30.60, throughput 5531.90 samples/s
[Epoch 4 Batch 31000] loss 3.41, ppl 30.39, throughput 5770.56 samples/s
[Epoch 4 Batch 32000] loss 3.42, ppl 30.51, throughput 5718.89 samples/s
[Epoch 4 Batch 33000] loss 3.41, ppl 30.38, throughput 5760.60 samples/s
[Epoch 4 Batch 34000] loss 3.41, ppl 30.36, throughput 5540.76 samples/s
[Epoch 4 Batch 35000] loss 3.42, ppl 30.48, throughput 5665.84 samples/s
[Epoch 4 Batch 36000] loss 3.41, ppl 30.40, throughput 5714.46 samples/s
[Epoch 4 Batch 37000] loss 3.41, ppl 30.34, throughput 5699.19 samples/s
[Epoch 4 Batch 38000] loss 3.41, ppl 30.29, throughput 5541.29 samples/s
[Epoch 4 Batch 39000] loss 3.41, ppl 30.36, throughput 5759.82 samples/s
[Epoch 4 Batch 40000] loss 3.42, ppl 30.45, throughput 5750.27 samples/s
[Epoch 4 Batch 41000] loss 3.41, ppl 30.25, throughput 5574.14 samples/s
[Epoch 4 Batch 42000] loss 3.42, ppl 30.45, throughput 5753.26 samples/s
[Epoch 4 Batch 43000] loss 3.41, ppl 30.36, throughput 5764.83 samples/s
[Epoch 4 Batch 44000] loss 3.41, ppl 30.40, throughput 5738.20 samples/s
[Epoch 4 Batch 45000] loss 3.41, ppl 30.26, throughput 5598.87 samples/s
[Epoch 4 Batch 46000] loss 3.41, ppl 30.35, throughput 5716.75 samples/s
[Epoch 4 Batch 47000] loss 3.42, ppl 30.43, throughput 5731.03 samples/s
[Epoch 4 Batch 48000] loss 3.41, ppl 30.24, throughput 5725.00 samples/s
[Epoch 4 Batch 49000] loss 3.41, ppl 30.33, throughput 5355.28 samples/s
[Epoch 4 Batch 50000] loss 3.41, ppl 30.35, throughput 5331.57 samples/s
[Epoch 4 Batch 51000] loss 3.41, ppl 30.23, throughput 5328.32 samples/s
[Epoch 4 Batch 52000] loss 3.41, ppl 30.36, throughput 5372.60 samples/s
[Epoch 4 Batch 53000] loss 3.42, ppl 30.43, throughput 5111.16 samples/s
[Epoch 4 Batch 54000] loss 3.41, ppl 30.28, throughput 5344.31 samples/s
[Epoch 4 Batch 55000] loss 3.41, ppl 30.33, throughput 5244.19 samples/s
[Epoch 4 Batch 56000] loss 3.41, ppl 30.33, throughput 5552.74 samples/s
[Epoch 4 Batch 57000] loss 3.42, ppl 30.47, throughput 5728.00 samples/s
[Epoch 4 Batch 58000] loss 3.41, ppl 30.34, throughput 5734.19 samples/s
[Epoch 4 Batch 59000] loss 3.41, ppl 30.36, throughput 5719.94 samples/s
[Epoch 4 Batch 60000] loss 3.41, ppl 30.34, throughput 5581.99 samples/s
[Epoch 4 Batch 61000] loss 3.41, ppl 30.37, throughput 5720.55 samples/s
[Epoch 4 Batch 62000] loss 3.41, ppl 30.34, throughput 5691.47 samples/s
[Epoch 4 Batch 63000] loss 3.41, ppl 30.27, throughput 5751.24 samples/s
[Epoch 4 Batch 64000] loss 3.41, ppl 30.28, throughput 5570.59 samples/s
[Epoch 4 Batch 65000] loss 3.41, ppl 30.38, throughput 5727.51 samples/s
[Epoch 4 Batch 66000] loss 3.41, ppl 30.32, throughput 5701.19 samples/s
[Epoch 4 Batch 67000] loss 3.41, ppl 30.25, throughput 5505.51 samples/s
[Epoch 4 Batch 68000] loss 3.41, ppl 30.30, throughput 5764.85 samples/s
[Epoch 4 Batch 69000] loss 3.41, ppl 30.31, throughput 5759.25 samples/s
[Epoch 4 Batch 70000] loss 3.41, ppl 30.19, throughput 5764.71 samples/s
[Epoch 4 Batch 71000] loss 3.41, ppl 30.24, throughput 5574.65 samples/s
[Epoch 4 Batch 72000] loss 3.41, ppl 30.34, throughput 5739.35 samples/s
[Epoch 4 Batch 73000] loss 3.41, ppl 30.38, throughput 5760.88 samples/s
[Epoch 4 Batch 74000] loss 3.41, ppl 30.38, throughput 5692.21 samples/s
[Epoch 4 Batch 75000] loss 3.41, ppl 30.24, throughput 5529.50 samples/s
[Epoch 4 Batch 76000] loss 3.41, ppl 30.28, throughput 5793.64 samples/s
[Epoch 4 Batch 77000] loss 3.41, ppl 30.27, throughput 5770.64 samples/s
[Epoch 4 Batch 78000] loss 3.41, ppl 30.25, throughput 5786.38 samples/s
Epoch 4 took 7226.43 seconds.
[Epoch 5 Batch 1000] loss 3.39, ppl 29.55, throughput 5595.07 samples/s
[Epoch 5 Batch 2000] loss 3.39, ppl 29.56, throughput 5767.16 samples/s
[Epoch 5 Batch 3000] loss 3.39, ppl 29.67, throughput 5116.79 samples/s
[Epoch 5 Batch 4000] loss 3.37, ppl 29.12, throughput 5073.26 samples/s
[Epoch 5 Batch 5000] loss 3.39, ppl 29.68, throughput 5365.46 samples/s
[Epoch 5 Batch 6000] loss 3.39, ppl 29.56, throughput 4825.95 samples/s
[Epoch 5 Batch 7000] loss 3.39, ppl 29.64, throughput 4593.48 samples/s
[Epoch 5 Batch 8000] loss 3.38, ppl 29.27, throughput 4568.82 samples/s
[Epoch 5 Batch 9000] loss 3.38, ppl 29.52, throughput 5115.61 samples/s
[Epoch 5 Batch 10000] loss 3.39, ppl 29.74, throughput 5684.46 samples/s
[Epoch 5 Batch 11000] loss 3.39, ppl 29.72, throughput 5735.72 samples/s
[Epoch 5 Batch 12000] loss 3.38, ppl 29.32, throughput 5571.34 samples/s
[Epoch 5 Batch 13000] loss 3.38, ppl 29.45, throughput 5720.51 samples/s
[Epoch 5 Batch 14000] loss 3.39, ppl 29.56, throughput 5753.09 samples/s
[Epoch 5 Batch 15000] loss 3.38, ppl 29.47, throughput 5569.39 samples/s
[Epoch 5 Batch 16000] loss 3.39, ppl 29.77, throughput 5761.76 samples/s
[Epoch 5 Batch 17000] loss 3.40, ppl 29.83, throughput 5665.95 samples/s
[Epoch 5 Batch 18000] loss 3.39, ppl 29.80, throughput 5714.89 samples/s
[Epoch 5 Batch 19000] loss 3.39, ppl 29.75, throughput 5586.87 samples/s
[Epoch 5 Batch 20000] loss 3.39, ppl 29.63, throughput 5703.43 samples/s
[Epoch 5 Batch 21000] loss 3.39, ppl 29.77, throughput 5758.68 samples/s
[Epoch 5 Batch 22000] loss 3.40, ppl 29.85, throughput 5716.88 samples/s
[Epoch 5 Batch 23000] loss 3.39, ppl 29.52, throughput 5230.21 samples/s
[Epoch 5 Batch 24000] loss 3.39, ppl 29.67, throughput 5318.53 samples/s
[Epoch 5 Batch 25000] loss 3.39, ppl 29.58, throughput 5265.36 samples/s
[Epoch 5 Batch 26000] loss 3.39, ppl 29.67, throughput 5321.43 samples/s
[Epoch 5 Batch 27000] loss 3.39, ppl 29.63, throughput 5181.07 samples/s
[Epoch 5 Batch 28000] loss 3.39, ppl 29.62, throughput 4717.78 samples/s
[Epoch 5 Batch 29000] loss 3.39, ppl 29.70, throughput 5161.01 samples/s
[Epoch 5 Batch 30000] loss 3.39, ppl 29.60, throughput 5578.79 samples/s
[Epoch 5 Batch 31000] loss 3.39, ppl 29.69, throughput 5757.80 samples/s
[Epoch 5 Batch 32000] loss 3.39, ppl 29.67, throughput 5679.44 samples/s
[Epoch 5 Batch 33000] loss 3.39, ppl 29.69, throughput 5693.79 samples/s
[Epoch 5 Batch 34000] loss 3.40, ppl 29.84, throughput 5590.31 samples/s
[Epoch 5 Batch 35000] loss 3.39, ppl 29.61, throughput 5729.26 samples/s
[Epoch 5 Batch 36000] loss 3.39, ppl 29.63, throughput 5744.08 samples/s
[Epoch 5 Batch 37000] loss 3.39, ppl 29.76, throughput 5744.44 samples/s
[Epoch 5 Batch 38000] loss 3.39, ppl 29.54, throughput 5586.13 samples/s
[Epoch 5 Batch 39000] loss 3.39, ppl 29.70, throughput 5766.45 samples/s
[Epoch 5 Batch 40000] loss 3.39, ppl 29.68, throughput 5775.29 samples/s
[Epoch 5 Batch 41000] loss 3.39, ppl 29.58, throughput 5604.15 samples/s
[Epoch 5 Batch 42000] loss 3.39, ppl 29.72, throughput 5729.96 samples/s
[Epoch 5 Batch 43000] loss 3.38, ppl 29.51, throughput 5725.68 samples/s
[Epoch 5 Batch 44000] loss 3.39, ppl 29.67, throughput 5761.87 samples/s
[Epoch 5 Batch 45000] loss 3.40, ppl 29.82, throughput 5556.04 samples/s
[Epoch 5 Batch 46000] loss 3.39, ppl 29.71, throughput 5804.03 samples/s
[Epoch 5 Batch 47000] loss 3.39, ppl 29.73, throughput 5830.26 samples/s
[Epoch 5 Batch 48000] loss 3.39, ppl 29.62, throughput 5722.01 samples/s
[Epoch 5 Batch 49000] loss 3.39, ppl 29.58, throughput 5589.96 samples/s
[Epoch 5 Batch 50000] loss 3.39, ppl 29.61, throughput 5754.62 samples/s
[Epoch 5 Batch 51000] loss 3.39, ppl 29.70, throughput 5680.84 samples/s
[Epoch 5 Batch 52000] loss 3.39, ppl 29.69, throughput 5731.83 samples/s
[Epoch 5 Batch 53000] loss 3.39, ppl 29.54, throughput 5607.45 samples/s
[Epoch 5 Batch 54000] loss 3.39, ppl 29.55, throughput 5672.25 samples/s
[Epoch 5 Batch 55000] loss 3.39, ppl 29.57, throughput 5743.36 samples/s
[Epoch 5 Batch 56000] loss 3.39, ppl 29.63, throughput 5546.06 samples/s
[Epoch 5 Batch 57000] loss 3.39, ppl 29.80, throughput 5729.50 samples/s
[Epoch 5 Batch 58000] loss 3.39, ppl 29.68, throughput 5768.81 samples/s
[Epoch 5 Batch 59000] loss 3.39, ppl 29.71, throughput 5684.99 samples/s
[Epoch 5 Batch 60000] loss 3.39, ppl 29.63, throughput 5501.80 samples/s
[Epoch 5 Batch 61000] loss 3.39, ppl 29.63, throughput 5761.29 samples/s
[Epoch 5 Batch 62000] loss 3.39, ppl 29.65, throughput 5749.26 samples/s
[Epoch 5 Batch 63000] loss 3.39, ppl 29.60, throughput 5735.58 samples/s
[Epoch 5 Batch 64000] loss 3.39, ppl 29.59, throughput 5584.52 samples/s
[Epoch 5 Batch 65000] loss 3.39, ppl 29.56, throughput 5687.72 samples/s
[Epoch 5 Batch 66000] loss 3.39, ppl 29.64, throughput 5734.79 samples/s
[Epoch 5 Batch 67000] loss 3.39, ppl 29.56, throughput 5559.85 samples/s
[Epoch 5 Batch 68000] loss 3.39, ppl 29.61, throughput 5740.68 samples/s
[Epoch 5 Batch 69000] loss 3.39, ppl 29.66, throughput 5207.57 samples/s
[Epoch 5 Batch 70000] loss 3.39, ppl 29.59, throughput 4608.09 samples/s
[Epoch 5 Batch 71000] loss 3.39, ppl 29.64, throughput 4478.58 samples/s
[Epoch 5 Batch 72000] loss 3.39, ppl 29.59, throughput 4565.61 samples/s
[Epoch 5 Batch 73000] loss 3.39, ppl 29.69, throughput 4697.02 samples/s
[Epoch 5 Batch 74000] loss 3.39, ppl 29.62, throughput 4654.05 samples/s
[Epoch 5 Batch 75000] loss 3.39, ppl 29.65, throughput 5405.83 samples/s
[Epoch 5 Batch 76000] loss 3.39, ppl 29.60, throughput 5746.53 samples/s
[Epoch 5 Batch 77000] loss 3.39, ppl 29.60, throughput 5734.62 samples/s
[Epoch 5 Batch 78000] loss 3.39, ppl 29.53, throughput 5738.57 samples/s
Epoch 5 took 7300.56 seconds.
[Epoch 6 Batch 1000] loss 3.37, ppl 29.01, throughput 5646.88 samples/s
[Epoch 6 Batch 2000] loss 3.37, ppl 28.96, throughput 5033.93 samples/s
[Epoch 6 Batch 3000] loss 3.37, ppl 28.99, throughput 4850.01 samples/s
[Epoch 6 Batch 4000] loss 3.37, ppl 29.10, throughput 5117.01 samples/s
[Epoch 6 Batch 5000] loss 3.37, ppl 29.07, throughput 4932.26 samples/s
[Epoch 6 Batch 6000] loss 3.37, ppl 29.09, throughput 4607.14 samples/s
[Epoch 6 Batch 7000] loss 3.35, ppl 28.47, throughput 4698.40 samples/s
[Epoch 6 Batch 8000] loss 3.37, ppl 28.95, throughput 5318.54 samples/s
[Epoch 6 Batch 9000] loss 3.37, ppl 29.22, throughput 5691.35 samples/s
[Epoch 6 Batch 10000] loss 3.37, ppl 28.94, throughput 5715.21 samples/s
[Epoch 6 Batch 11000] loss 3.36, ppl 28.93, throughput 5716.87 samples/s
[Epoch 6 Batch 12000] loss 3.37, ppl 28.96, throughput 5509.81 samples/s
[Epoch 6 Batch 13000] loss 3.36, ppl 28.80, throughput 5643.47 samples/s
[Epoch 6 Batch 14000] loss 3.37, ppl 29.10, throughput 5749.88 samples/s
[Epoch 6 Batch 15000] loss 3.37, ppl 29.18, throughput 5612.87 samples/s
[Epoch 6 Batch 16000] loss 3.37, ppl 29.10, throughput 5756.58 samples/s
[Epoch 6 Batch 17000] loss 3.36, ppl 28.84, throughput 5705.81 samples/s
[Epoch 6 Batch 18000] loss 3.36, ppl 28.79, throughput 5759.42 samples/s
[Epoch 6 Batch 19000] loss 3.37, ppl 29.14, throughput 5582.69 samples/s
[Epoch 6 Batch 20000] loss 3.36, ppl 28.88, throughput 5693.28 samples/s
[Epoch 6 Batch 21000] loss 3.37, ppl 29.12, throughput 5690.46 samples/s
[Epoch 6 Batch 22000] loss 3.37, ppl 29.22, throughput 5752.52 samples/s
[Epoch 6 Batch 23000] loss 3.37, ppl 29.00, throughput 5569.67 samples/s
[Epoch 6 Batch 24000] loss 3.37, ppl 29.07, throughput 5753.09 samples/s
[Epoch 6 Batch 25000] loss 3.38, ppl 29.25, throughput 5773.85 samples/s
[Epoch 6 Batch 26000] loss 3.37, ppl 29.00, throughput 5696.77 samples/s
[Epoch 6 Batch 27000] loss 3.37, ppl 28.96, throughput 5582.98 samples/s
[Epoch 6 Batch 28000] loss 3.37, ppl 28.98, throughput 5780.90 samples/s
[Epoch 6 Batch 29000] loss 3.37, ppl 29.04, throughput 5678.47 samples/s
[Epoch 6 Batch 30000] loss 3.37, ppl 29.16, throughput 5612.75 samples/s
[Epoch 6 Batch 31000] loss 3.37, ppl 29.08, throughput 5736.54 samples/s
[Epoch 6 Batch 32000] loss 3.37, ppl 29.18, throughput 5663.08 samples/s
[Epoch 6 Batch 33000] loss 3.37, ppl 29.11, throughput 5722.64 samples/s
[Epoch 6 Batch 34000] loss 3.38, ppl 29.24, throughput 5547.67 samples/s
[Epoch 6 Batch 35000] loss 3.37, ppl 29.06, throughput 5781.72 samples/s
[Epoch 6 Batch 36000] loss 3.37, ppl 28.97, throughput 5699.38 samples/s
[Epoch 6 Batch 37000] loss 3.37, ppl 29.07, throughput 5755.01 samples/s
[Epoch 6 Batch 38000] loss 3.37, ppl 28.96, throughput 5536.73 samples/s
[Epoch 6 Batch 39000] loss 3.37, ppl 29.05, throughput 5774.89 samples/s
[Epoch 6 Batch 40000] loss 3.37, ppl 29.13, throughput 5736.25 samples/s
[Epoch 6 Batch 41000] loss 3.37, ppl 29.21, throughput 5604.72 samples/s
[Epoch 6 Batch 42000] loss 3.37, ppl 29.19, throughput 4939.63 samples/s
[Epoch 6 Batch 43000] loss 3.37, ppl 29.14, throughput 4672.98 samples/s
[Epoch 6 Batch 44000] loss 3.37, ppl 29.17, throughput 4605.59 samples/s
[Epoch 6 Batch 45000] loss 3.37, ppl 29.15, throughput 4606.44 samples/s
[Epoch 6 Batch 46000] loss 3.37, ppl 29.09, throughput 4631.76 samples/s
[Epoch 6 Batch 47000] loss 3.37, ppl 29.15, throughput 4782.96 samples/s
[Epoch 6 Batch 48000] loss 3.37, ppl 29.06, throughput 5738.36 samples/s
[Epoch 6 Batch 49000] loss 3.37, ppl 29.20, throughput 5579.52 samples/s
[Epoch 6 Batch 50000] loss 3.37, ppl 29.06, throughput 5735.82 samples/s
[Epoch 6 Batch 51000] loss 3.37, ppl 29.17, throughput 5788.02 samples/s
[Epoch 6 Batch 52000] loss 3.38, ppl 29.28, throughput 5753.40 samples/s
[Epoch 6 Batch 53000] loss 3.37, ppl 29.10, throughput 5582.03 samples/s
[Epoch 6 Batch 54000] loss 3.37, ppl 29.09, throughput 5788.65 samples/s
[Epoch 6 Batch 55000] loss 3.37, ppl 29.03, throughput 5759.96 samples/s
[Epoch 6 Batch 56000] loss 3.37, ppl 29.14, throughput 5596.64 samples/s
[Epoch 6 Batch 57000] loss 3.37, ppl 28.99, throughput 5746.44 samples/s
[Epoch 6 Batch 58000] loss 3.37, ppl 29.01, throughput 5731.45 samples/s
[Epoch 6 Batch 59000] loss 3.37, ppl 29.11, throughput 5678.82 samples/s
[Epoch 6 Batch 60000] loss 3.37, ppl 29.09, throughput 5572.62 samples/s
[Epoch 6 Batch 61000] loss 3.37, ppl 29.10, throughput 5768.37 samples/s
[Epoch 6 Batch 62000] loss 3.37, ppl 29.11, throughput 5741.41 samples/s
[Epoch 6 Batch 63000] loss 3.37, ppl 28.99, throughput 5689.83 samples/s
[Epoch 6 Batch 64000] loss 3.37, ppl 29.08, throughput 5595.84 samples/s
[Epoch 6 Batch 65000] loss 3.37, ppl 29.11, throughput 5720.34 samples/s
[Epoch 6 Batch 66000] loss 3.37, ppl 29.17, throughput 5749.33 samples/s
[Epoch 6 Batch 67000] loss 3.37, ppl 29.12, throughput 5615.39 samples/s
[Epoch 6 Batch 68000] loss 3.37, ppl 29.04, throughput 5729.84 samples/s
[Epoch 6 Batch 69000] loss 3.38, ppl 29.24, throughput 5656.49 samples/s
[Epoch 6 Batch 70000] loss 3.37, ppl 29.19, throughput 5754.15 samples/s
[Epoch 6 Batch 71000] loss 3.37, ppl 29.04, throughput 5581.93 samples/s
[Epoch 6 Batch 72000] loss 3.37, ppl 29.05, throughput 5759.24 samples/s
[Epoch 6 Batch 73000] loss 3.37, ppl 29.09, throughput 5703.61 samples/s
[Epoch 6 Batch 74000] loss 3.37, ppl 29.06, throughput 5713.25 samples/s
[Epoch 6 Batch 75000] loss 3.37, ppl 29.15, throughput 5589.67 samples/s
[Epoch 6 Batch 76000] loss 3.37, ppl 29.14, throughput 5759.14 samples/s
[Epoch 6 Batch 77000] loss 3.37, ppl 29.09, throughput 5730.98 samples/s
[Epoch 6 Batch 78000] loss 3.37, ppl 29.06, throughput 5727.91 samples/s
Epoch 6 took 7233.37 seconds.
[Epoch 7 Batch 1000] loss 3.35, ppl 28.50, throughput 5331.70 samples/s
[Epoch 7 Batch 2000] loss 3.34, ppl 28.19, throughput 5204.49 samples/s
[Epoch 7 Batch 3000] loss 3.35, ppl 28.41, throughput 4712.44 samples/s
[Epoch 7 Batch 4000] loss 3.35, ppl 28.59, throughput 4477.35 samples/s
[Epoch 7 Batch 5000] loss 3.35, ppl 28.42, throughput 4908.55 samples/s
[Epoch 7 Batch 6000] loss 3.35, ppl 28.47, throughput 5310.47 samples/s
[Epoch 7 Batch 7000] loss 3.35, ppl 28.56, throughput 5383.86 samples/s
[Epoch 7 Batch 8000] loss 3.35, ppl 28.48, throughput 5534.44 samples/s
[Epoch 7 Batch 9000] loss 3.35, ppl 28.53, throughput 5728.91 samples/s
[Epoch 7 Batch 10000] loss 3.35, ppl 28.53, throughput 5786.02 samples/s
[Epoch 7 Batch 11000] loss 3.35, ppl 28.42, throughput 5733.98 samples/s
[Epoch 7 Batch 12000] loss 3.35, ppl 28.50, throughput 5554.94 samples/s
[Epoch 7 Batch 13000] loss 3.35, ppl 28.54, throughput 5687.74 samples/s
[Epoch 7 Batch 14000] loss 3.35, ppl 28.47, throughput 5712.57 samples/s
[Epoch 7 Batch 15000] loss 3.34, ppl 28.33, throughput 4787.27 samples/s
[Epoch 7 Batch 16000] loss 3.35, ppl 28.62, throughput 4650.63 samples/s
[Epoch 7 Batch 17000] loss 3.35, ppl 28.45, throughput 4594.04 samples/s
[Epoch 7 Batch 18000] loss 3.35, ppl 28.40, throughput 5030.93 samples/s
[Epoch 7 Batch 19000] loss 3.35, ppl 28.54, throughput 5154.63 samples/s
[Epoch 7 Batch 20000] loss 3.35, ppl 28.44, throughput 4937.71 samples/s
[Epoch 7 Batch 21000] loss 3.35, ppl 28.52, throughput 5623.78 samples/s
[Epoch 7 Batch 22000] loss 3.36, ppl 28.69, throughput 5765.45 samples/s
[Epoch 7 Batch 23000] loss 3.35, ppl 28.55, throughput 5516.91 samples/s
[Epoch 7 Batch 24000] loss 3.35, ppl 28.54, throughput 5733.29 samples/s
[Epoch 7 Batch 25000] loss 3.36, ppl 28.65, throughput 5700.20 samples/s
[Epoch 7 Batch 26000] loss 3.34, ppl 28.35, throughput 5573.60 samples/s
[Epoch 7 Batch 27000] loss 3.36, ppl 28.65, throughput 5733.96 samples/s
[Epoch 7 Batch 28000] loss 3.36, ppl 28.68, throughput 5723.05 samples/s
[Epoch 7 Batch 29000] loss 3.36, ppl 28.67, throughput 5696.58 samples/s
[Epoch 7 Batch 30000] loss 3.36, ppl 28.66, throughput 5569.73 samples/s
[Epoch 7 Batch 31000] loss 3.35, ppl 28.53, throughput 5750.91 samples/s
[Epoch 7 Batch 32000] loss 3.35, ppl 28.54, throughput 5739.69 samples/s
[Epoch 7 Batch 33000] loss 3.36, ppl 28.76, throughput 5771.15 samples/s
[Epoch 7 Batch 34000] loss 3.36, ppl 28.72, throughput 5596.02 samples/s
[Epoch 7 Batch 35000] loss 3.35, ppl 28.50, throughput 5723.83 samples/s
[Epoch 7 Batch 36000] loss 3.35, ppl 28.48, throughput 5730.68 samples/s
[Epoch 7 Batch 37000] loss 3.35, ppl 28.58, throughput 5729.56 samples/s
[Epoch 7 Batch 38000] loss 3.35, ppl 28.63, throughput 5593.78 samples/s
[Epoch 7 Batch 39000] loss 3.36, ppl 28.69, throughput 5707.52 samples/s
[Epoch 7 Batch 40000] loss 3.36, ppl 28.65, throughput 5754.02 samples/s
[Epoch 7 Batch 41000] loss 3.36, ppl 28.77, throughput 5617.89 samples/s
[Epoch 7 Batch 42000] loss 3.36, ppl 28.75, throughput 5731.44 samples/s
[Epoch 7 Batch 43000] loss 3.36, ppl 28.68, throughput 5716.10 samples/s
[Epoch 7 Batch 44000] loss 3.36, ppl 28.69, throughput 5735.47 samples/s
[Epoch 7 Batch 45000] loss 3.36, ppl 28.68, throughput 5564.34 samples/s
[Epoch 7 Batch 46000] loss 3.35, ppl 28.62, throughput 5726.52 samples/s
[Epoch 7 Batch 47000] loss 3.35, ppl 28.60, throughput 5761.74 samples/s
[Epoch 7 Batch 48000] loss 3.35, ppl 28.54, throughput 5732.13 samples/s
[Epoch 7 Batch 49000] loss 3.36, ppl 28.67, throughput 5545.38 samples/s
[Epoch 7 Batch 50000] loss 3.36, ppl 28.72, throughput 5759.50 samples/s
[Epoch 7 Batch 51000] loss 3.36, ppl 28.74, throughput 5752.23 samples/s
[Epoch 7 Batch 52000] loss 3.35, ppl 28.62, throughput 5738.38 samples/s
[Epoch 7 Batch 53000] loss 3.36, ppl 28.70, throughput 5604.33 samples/s
[Epoch 7 Batch 54000] loss 3.36, ppl 28.76, throughput 5672.56 samples/s
[Epoch 7 Batch 55000] loss 3.35, ppl 28.64, throughput 5785.76 samples/s
[Epoch 7 Batch 56000] loss 3.36, ppl 28.75, throughput 5547.99 samples/s
[Epoch 7 Batch 57000] loss 3.35, ppl 28.57, throughput 5745.85 samples/s
[Epoch 7 Batch 58000] loss 3.36, ppl 28.83, throughput 5731.78 samples/s
[Epoch 7 Batch 59000] loss 3.36, ppl 28.70, throughput 5739.30 samples/s
[Epoch 7 Batch 60000] loss 3.36, ppl 28.82, throughput 5594.37 samples/s
[Epoch 7 Batch 61000] loss 3.35, ppl 28.62, throughput 5752.74 samples/s
[Epoch 7 Batch 62000] loss 3.36, ppl 28.71, throughput 5748.97 samples/s
[Epoch 7 Batch 63000] loss 3.36, ppl 28.74, throughput 5716.88 samples/s
[Epoch 7 Batch 64000] loss 3.36, ppl 28.74, throughput 5546.35 samples/s
[Epoch 7 Batch 65000] loss 3.35, ppl 28.55, throughput 5630.64 samples/s
[Epoch 7 Batch 66000] loss 3.35, ppl 28.54, throughput 5757.74 samples/s
[Epoch 7 Batch 67000] loss 3.36, ppl 28.69, throughput 5286.09 samples/s
[Epoch 7 Batch 68000] loss 3.36, ppl 28.71, throughput 4754.02 samples/s
[Epoch 7 Batch 69000] loss 3.35, ppl 28.60, throughput 5354.72 samples/s
[Epoch 7 Batch 70000] loss 3.36, ppl 28.66, throughput 5325.49 samples/s
[Epoch 7 Batch 71000] loss 3.35, ppl 28.63, throughput 5097.73 samples/s
[Epoch 7 Batch 72000] loss 3.36, ppl 28.66, throughput 5068.07 samples/s
[Epoch 7 Batch 73000] loss 3.36, ppl 28.69, throughput 5367.63 samples/s
[Epoch 7 Batch 74000] loss 3.36, ppl 28.76, throughput 5727.09 samples/s
[Epoch 7 Batch 75000] loss 3.36, ppl 28.77, throughput 5621.63 samples/s
[Epoch 7 Batch 76000] loss 3.36, ppl 28.79, throughput 5786.99 samples/s
[Epoch 7 Batch 77000] loss 3.35, ppl 28.64, throughput 5717.78 samples/s
[Epoch 7 Batch 78000] loss 3.35, ppl 28.65, throughput 5761.57 samples/s
Epoch 7 took 7261.97 seconds.
[Epoch 8 Batch 1000] loss 3.33, ppl 28.06, throughput 4626.74 samples/s
[Epoch 8 Batch 2000] loss 3.33, ppl 28.08, throughput 4865.51 samples/s
[Epoch 8 Batch 3000] loss 3.32, ppl 27.72, throughput 5318.06 samples/s
[Epoch 8 Batch 4000] loss 3.33, ppl 27.93, throughput 5164.32 samples/s
[Epoch 8 Batch 5000] loss 3.34, ppl 28.17, throughput 4765.89 samples/s
[Epoch 8 Batch 6000] loss 3.33, ppl 27.98, throughput 4743.06 samples/s
[Epoch 8 Batch 7000] loss 3.34, ppl 28.17, throughput 5736.41 samples/s
[Epoch 8 Batch 8000] loss 3.34, ppl 28.11, throughput 5607.19 samples/s
[Epoch 8 Batch 9000] loss 3.33, ppl 28.04, throughput 5710.33 samples/s
[Epoch 8 Batch 10000] loss 3.34, ppl 28.23, throughput 5736.11 samples/s
[Epoch 8 Batch 11000] loss 3.34, ppl 28.17, throughput 5749.58 samples/s
[Epoch 8 Batch 12000] loss 3.34, ppl 28.09, throughput 5498.03 samples/s
[Epoch 8 Batch 13000] loss 3.34, ppl 28.08, throughput 5723.16 samples/s
[Epoch 8 Batch 14000] loss 3.33, ppl 28.03, throughput 5719.19 samples/s
[Epoch 8 Batch 15000] loss 3.33, ppl 27.92, throughput 5602.52 samples/s
[Epoch 8 Batch 16000] loss 3.34, ppl 28.25, throughput 5729.38 samples/s
[Epoch 8 Batch 17000] loss 3.34, ppl 28.14, throughput 5770.23 samples/s
[Epoch 8 Batch 18000] loss 3.34, ppl 28.19, throughput 5713.13 samples/s
[Epoch 8 Batch 19000] loss 3.34, ppl 28.34, throughput 5575.76 samples/s
[Epoch 8 Batch 20000] loss 3.34, ppl 28.10, throughput 5755.51 samples/s
[Epoch 8 Batch 21000] loss 3.34, ppl 28.13, throughput 5789.39 samples/s
[Epoch 8 Batch 22000] loss 3.33, ppl 28.07, throughput 5726.59 samples/s
[Epoch 8 Batch 23000] loss 3.34, ppl 28.19, throughput 5560.16 samples/s
[Epoch 8 Batch 24000] loss 3.34, ppl 28.16, throughput 5747.21 samples/s
[Epoch 8 Batch 25000] loss 3.34, ppl 28.12, throughput 5737.89 samples/s
[Epoch 8 Batch 26000] loss 3.34, ppl 28.14, throughput 5745.73 samples/s
[Epoch 8 Batch 27000] loss 3.34, ppl 28.24, throughput 5603.33 samples/s
[Epoch 8 Batch 28000] loss 3.34, ppl 28.13, throughput 5734.35 samples/s
[Epoch 8 Batch 29000] loss 3.34, ppl 28.28, throughput 5767.62 samples/s
[Epoch 8 Batch 30000] loss 3.34, ppl 28.16, throughput 5529.96 samples/s
[Epoch 8 Batch 31000] loss 3.33, ppl 28.04, throughput 5748.72 samples/s
[Epoch 8 Batch 32000] loss 3.34, ppl 28.16, throughput 5713.88 samples/s
[Epoch 8 Batch 33000] loss 3.34, ppl 28.26, throughput 5685.78 samples/s
[Epoch 8 Batch 34000] loss 3.34, ppl 28.22, throughput 5562.89 samples/s
[Epoch 8 Batch 35000] loss 3.34, ppl 28.22, throughput 5729.77 samples/s
[Epoch 8 Batch 36000] loss 3.35, ppl 28.39, throughput 5678.73 samples/s
[Epoch 8 Batch 37000] loss 3.34, ppl 28.17, throughput 5710.85 samples/s
[Epoch 8 Batch 38000] loss 3.34, ppl 28.13, throughput 5591.02 samples/s
[Epoch 8 Batch 39000] loss 3.34, ppl 28.26, throughput 5769.41 samples/s
[Epoch 8 Batch 40000] loss 3.34, ppl 28.30, throughput 5759.34 samples/s
[Epoch 8 Batch 41000] loss 3.34, ppl 28.35, throughput 5307.05 samples/s
[Epoch 8 Batch 42000] loss 3.34, ppl 28.28, throughput 4976.91 samples/s
[Epoch 8 Batch 43000] loss 3.34, ppl 28.24, throughput 4638.15 samples/s
[Epoch 8 Batch 44000] loss 3.34, ppl 28.32, throughput 4626.68 samples/s
[Epoch 8 Batch 45000] loss 3.34, ppl 28.27, throughput 4806.67 samples/s
[Epoch 8 Batch 46000] loss 3.34, ppl 28.22, throughput 5352.05 samples/s
[Epoch 8 Batch 47000] loss 3.34, ppl 28.23, throughput 5582.54 samples/s
[Epoch 8 Batch 48000] loss 3.34, ppl 28.34, throughput 5724.01 samples/s
[Epoch 8 Batch 49000] loss 3.34, ppl 28.29, throughput 5593.48 samples/s
[Epoch 8 Batch 50000] loss 3.34, ppl 28.26, throughput 5676.80 samples/s
[Epoch 8 Batch 51000] loss 3.34, ppl 28.31, throughput 5723.61 samples/s
[Epoch 8 Batch 52000] loss 3.35, ppl 28.41, throughput 5737.86 samples/s
[Epoch 8 Batch 53000] loss 3.34, ppl 28.25, throughput 5559.81 samples/s
[Epoch 8 Batch 54000] loss 3.35, ppl 28.39, throughput 5781.08 samples/s
[Epoch 8 Batch 55000] loss 3.35, ppl 28.42, throughput 5718.23 samples/s
[Epoch 8 Batch 56000] loss 3.34, ppl 28.29, throughput 5584.25 samples/s
[Epoch 8 Batch 57000] loss 3.34, ppl 28.32, throughput 5684.21 samples/s
[Epoch 8 Batch 58000] loss 3.34, ppl 28.24, throughput 5780.40 samples/s
[Epoch 8 Batch 59000] loss 3.35, ppl 28.40, throughput 5745.67 samples/s
[Epoch 8 Batch 60000] loss 3.34, ppl 28.33, throughput 5553.22 samples/s
[Epoch 8 Batch 61000] loss 3.34, ppl 28.28, throughput 5770.54 samples/s
[Epoch 8 Batch 62000] loss 3.34, ppl 28.25, throughput 5730.56 samples/s
[Epoch 8 Batch 63000] loss 3.34, ppl 28.30, throughput 5712.82 samples/s
[Epoch 8 Batch 64000] loss 3.34, ppl 28.30, throughput 5552.39 samples/s
[Epoch 8 Batch 65000] loss 3.35, ppl 28.36, throughput 5696.30 samples/s
[Epoch 8 Batch 66000] loss 3.34, ppl 28.28, throughput 5768.32 samples/s
[Epoch 8 Batch 67000] loss 3.35, ppl 28.45, throughput 5566.78 samples/s
[Epoch 8 Batch 68000] loss 3.34, ppl 28.28, throughput 5734.26 samples/s
[Epoch 8 Batch 69000] loss 3.34, ppl 28.29, throughput 5772.52 samples/s
[Epoch 8 Batch 70000] loss 3.34, ppl 28.26, throughput 5743.45 samples/s
[Epoch 8 Batch 71000] loss 3.34, ppl 28.32, throughput 5611.65 samples/s
[Epoch 8 Batch 72000] loss 3.35, ppl 28.41, throughput 5721.41 samples/s
[Epoch 8 Batch 73000] loss 3.35, ppl 28.40, throughput 5715.51 samples/s
[Epoch 8 Batch 74000] loss 3.34, ppl 28.33, throughput 5691.80 samples/s
[Epoch 8 Batch 75000] loss 3.35, ppl 28.37, throughput 5533.00 samples/s
[Epoch 8 Batch 76000] loss 3.34, ppl 28.27, throughput 5768.24 samples/s
[Epoch 8 Batch 77000] loss 3.34, ppl 28.32, throughput 5727.08 samples/s
[Epoch 8 Batch 78000] loss 3.34, ppl 28.32, throughput 5736.31 samples/s
Epoch 8 took 7192.33 seconds.
[Epoch 9 Batch 1000] loss 3.32, ppl 27.74, throughput 4647.34 samples/s
[Epoch 9 Batch 2000] loss 3.32, ppl 27.64, throughput 4644.25 samples/s
[Epoch 9 Batch 3000] loss 3.32, ppl 27.57, throughput 4937.49 samples/s
[Epoch 9 Batch 4000] loss 3.32, ppl 27.75, throughput 5152.43 samples/s
[Epoch 9 Batch 5000] loss 3.32, ppl 27.75, throughput 5328.70 samples/s
[Epoch 9 Batch 6000] loss 3.32, ppl 27.67, throughput 5309.59 samples/s
[Epoch 9 Batch 7000] loss 3.33, ppl 27.81, throughput 5751.66 samples/s
[Epoch 9 Batch 8000] loss 3.32, ppl 27.79, throughput 5599.92 samples/s
[Epoch 9 Batch 9000] loss 3.32, ppl 27.64, throughput 5797.57 samples/s
[Epoch 9 Batch 10000] loss 3.33, ppl 27.93, throughput 5729.58 samples/s
[Epoch 9 Batch 11000] loss 3.32, ppl 27.77, throughput 5778.31 samples/s
[Epoch 9 Batch 12000] loss 3.33, ppl 27.88, throughput 5617.64 samples/s
[Epoch 9 Batch 13000] loss 3.32, ppl 27.54, throughput 5728.14 samples/s
[Epoch 9 Batch 14000] loss 3.32, ppl 27.63, throughput 5658.10 samples/s
[Epoch 9 Batch 15000] loss 3.32, ppl 27.72, throughput 5184.89 samples/s
[Epoch 9 Batch 16000] loss 3.33, ppl 27.85, throughput 5311.78 samples/s
[Epoch 9 Batch 17000] loss 3.32, ppl 27.78, throughput 5307.85 samples/s
[Epoch 9 Batch 18000] loss 3.32, ppl 27.73, throughput 5350.08 samples/s
[Epoch 9 Batch 19000] loss 3.32, ppl 27.80, throughput 4999.80 samples/s
[Epoch 9 Batch 20000] loss 3.32, ppl 27.59, throughput 4729.55 samples/s
[Epoch 9 Batch 21000] loss 3.32, ppl 27.75, throughput 5724.06 samples/s
[Epoch 9 Batch 22000] loss 3.33, ppl 27.92, throughput 5760.64 samples/s
[Epoch 9 Batch 23000] loss 3.33, ppl 27.91, throughput 5543.45 samples/s
[Epoch 9 Batch 24000] loss 3.32, ppl 27.74, throughput 5783.28 samples/s
[Epoch 9 Batch 25000] loss 3.32, ppl 27.80, throughput 5698.40 samples/s
[Epoch 9 Batch 26000] loss 3.33, ppl 27.98, throughput 5766.96 samples/s
[Epoch 9 Batch 27000] loss 3.33, ppl 27.91, throughput 5610.47 samples/s
[Epoch 9 Batch 28000] loss 3.33, ppl 27.90, throughput 5765.71 samples/s
[Epoch 9 Batch 29000] loss 3.33, ppl 27.93, throughput 5729.95 samples/s
[Epoch 9 Batch 30000] loss 3.33, ppl 27.95, throughput 5565.60 samples/s
[Epoch 9 Batch 31000] loss 3.33, ppl 27.92, throughput 5811.81 samples/s
[Epoch 9 Batch 32000] loss 3.33, ppl 27.90, throughput 5765.09 samples/s
[Epoch 9 Batch 33000] loss 3.33, ppl 27.96, throughput 5762.13 samples/s
[Epoch 9 Batch 34000] loss 3.33, ppl 27.91, throughput 5569.73 samples/s
[Epoch 9 Batch 35000] loss 3.33, ppl 27.94, throughput 5765.76 samples/s
[Epoch 9 Batch 36000] loss 3.33, ppl 27.86, throughput 5709.97 samples/s
[Epoch 9 Batch 37000] loss 3.32, ppl 27.75, throughput 5748.57 samples/s
[Epoch 9 Batch 38000] loss 3.32, ppl 27.79, throughput 5555.83 samples/s
[Epoch 9 Batch 39000] loss 3.33, ppl 27.91, throughput 5760.14 samples/s
[Epoch 9 Batch 40000] loss 3.33, ppl 27.90, throughput 5778.09 samples/s
[Epoch 9 Batch 41000] loss 3.33, ppl 27.85, throughput 5570.51 samples/s
[Epoch 9 Batch 42000] loss 3.33, ppl 27.93, throughput 5758.40 samples/s
[Epoch 9 Batch 43000] loss 3.33, ppl 27.94, throughput 5748.83 samples/s
[Epoch 9 Batch 44000] loss 3.33, ppl 27.97, throughput 5743.08 samples/s
[Epoch 9 Batch 45000] loss 3.33, ppl 28.01, throughput 5585.68 samples/s
[Epoch 9 Batch 46000] loss 3.33, ppl 27.93, throughput 5754.68 samples/s
[Epoch 9 Batch 47000] loss 3.33, ppl 28.00, throughput 5709.79 samples/s
[Epoch 9 Batch 48000] loss 3.33, ppl 27.92, throughput 5722.63 samples/s
[Epoch 9 Batch 49000] loss 3.33, ppl 27.99, throughput 5566.20 samples/s
[Epoch 9 Batch 50000] loss 3.33, ppl 27.93, throughput 5719.64 samples/s
[Epoch 9 Batch 51000] loss 3.33, ppl 27.91, throughput 5707.32 samples/s
[Epoch 9 Batch 52000] loss 3.33, ppl 27.99, throughput 5749.22 samples/s
[Epoch 9 Batch 53000] loss 3.33, ppl 27.91, throughput 5617.35 samples/s
[Epoch 9 Batch 54000] loss 3.33, ppl 27.94, throughput 5722.77 samples/s
[Epoch 9 Batch 55000] loss 3.33, ppl 27.96, throughput 5699.43 samples/s
[Epoch 9 Batch 56000] loss 3.33, ppl 28.03, throughput 5586.17 samples/s
[Epoch 9 Batch 57000] loss 3.33, ppl 28.06, throughput 5716.63 samples/s
[Epoch 9 Batch 58000] loss 3.33, ppl 27.98, throughput 5719.90 samples/s
[Epoch 9 Batch 59000] loss 3.33, ppl 27.89, throughput 5744.05 samples/s
[Epoch 9 Batch 60000] loss 3.33, ppl 28.06, throughput 5568.85 samples/s
[Epoch 9 Batch 61000] loss 3.33, ppl 27.95, throughput 5795.33 samples/s
[Epoch 9 Batch 62000] loss 3.33, ppl 27.99, throughput 5731.03 samples/s
[Epoch 9 Batch 63000] loss 3.33, ppl 27.99, throughput 5726.33 samples/s
[Epoch 9 Batch 64000] loss 3.33, ppl 27.98, throughput 5575.30 samples/s
[Epoch 9 Batch 65000] loss 3.33, ppl 27.92, throughput 5733.14 samples/s
[Epoch 9 Batch 66000] loss 3.33, ppl 28.04, throughput 5766.54 samples/s
[Epoch 9 Batch 67000] loss 3.33, ppl 28.02, throughput 5319.45 samples/s
[Epoch 9 Batch 68000] loss 3.33, ppl 28.03, throughput 5053.11 samples/s
[Epoch 9 Batch 69000] loss 3.33, ppl 28.05, throughput 5386.97 samples/s
[Epoch 9 Batch 70000] loss 3.33, ppl 27.98, throughput 5345.94 samples/s
[Epoch 9 Batch 71000] loss 3.33, ppl 28.01, throughput 5167.70 samples/s
[Epoch 9 Batch 72000] loss 3.34, ppl 28.14, throughput 5305.30 samples/s
[Epoch 9 Batch 73000] loss 3.33, ppl 28.00, throughput 5360.93 samples/s
[Epoch 9 Batch 74000] loss 3.33, ppl 28.04, throughput 5673.26 samples/s
[Epoch 9 Batch 75000] loss 3.34, ppl 28.14, throughput 5591.30 samples/s
[Epoch 9 Batch 76000] loss 3.33, ppl 27.97, throughput 5798.78 samples/s
[Epoch 9 Batch 77000] loss 3.33, ppl 27.96, throughput 5780.06 samples/s
[Epoch 9 Batch 78000] loss 3.33, ppl 28.00, throughput 5755.41 samples/s
Epoch 9 took 7195.49 seconds.
[Epoch 10 Batch 1000] loss 3.31, ppl 27.30, throughput 4585.54 samples/s
[Epoch 10 Batch 2000] loss 3.30, ppl 27.18, throughput 4665.90 samples/s
[Epoch 10 Batch 3000] loss 3.30, ppl 27.24, throughput 4684.73 samples/s
[Epoch 10 Batch 4000] loss 3.29, ppl 26.80, throughput 4513.37 samples/s
[Epoch 10 Batch 5000] loss 3.31, ppl 27.43, throughput 4688.96 samples/s
[Epoch 10 Batch 6000] loss 3.31, ppl 27.49, throughput 5170.69 samples/s
[Epoch 10 Batch 7000] loss 3.31, ppl 27.25, throughput 5742.72 samples/s
[Epoch 10 Batch 8000] loss 3.31, ppl 27.29, throughput 5593.52 samples/s
[Epoch 10 Batch 9000] loss 3.32, ppl 27.56, throughput 5762.95 samples/s
[Epoch 10 Batch 10000] loss 3.31, ppl 27.42, throughput 5733.34 samples/s
[Epoch 10 Batch 11000] loss 3.32, ppl 27.55, throughput 5750.97 samples/s
[Epoch 10 Batch 12000] loss 3.32, ppl 27.53, throughput 5599.23 samples/s
[Epoch 10 Batch 13000] loss 3.31, ppl 27.39, throughput 5694.45 samples/s
[Epoch 10 Batch 14000] loss 3.31, ppl 27.31, throughput 5714.71 samples/s
[Epoch 10 Batch 15000] loss 3.32, ppl 27.56, throughput 5614.35 samples/s
[Epoch 10 Batch 16000] loss 3.32, ppl 27.62, throughput 5736.52 samples/s
[Epoch 10 Batch 17000] loss 3.32, ppl 27.62, throughput 5737.45 samples/s
[Epoch 10 Batch 18000] loss 3.32, ppl 27.65, throughput 5741.14 samples/s
[Epoch 10 Batch 19000] loss 3.32, ppl 27.61, throughput 5593.40 samples/s
[Epoch 10 Batch 20000] loss 3.32, ppl 27.55, throughput 5742.59 samples/s
[Epoch 10 Batch 21000] loss 3.31, ppl 27.48, throughput 5806.35 samples/s
[Epoch 10 Batch 22000] loss 3.32, ppl 27.58, throughput 5665.74 samples/s
[Epoch 10 Batch 23000] loss 3.32, ppl 27.70, throughput 5618.28 samples/s
[Epoch 10 Batch 24000] loss 3.32, ppl 27.58, throughput 5783.57 samples/s
[Epoch 10 Batch 25000] loss 3.31, ppl 27.50, throughput 5783.71 samples/s
[Epoch 10 Batch 26000] loss 3.32, ppl 27.60, throughput 5743.81 samples/s
[Epoch 10 Batch 27000] loss 3.31, ppl 27.45, throughput 5632.17 samples/s
[Epoch 10 Batch 28000] loss 3.31, ppl 27.48, throughput 5708.20 samples/s
[Epoch 10 Batch 29000] loss 3.32, ppl 27.65, throughput 5723.31 samples/s
[Epoch 10 Batch 30000] loss 3.31, ppl 27.52, throughput 5561.96 samples/s
[Epoch 10 Batch 31000] loss 3.32, ppl 27.62, throughput 5778.12 samples/s
[Epoch 10 Batch 32000] loss 3.32, ppl 27.53, throughput 5729.16 samples/s
[Epoch 10 Batch 33000] loss 3.32, ppl 27.63, throughput 5739.38 samples/s
[Epoch 10 Batch 34000] loss 3.32, ppl 27.68, throughput 5533.89 samples/s
[Epoch 10 Batch 35000] loss 3.32, ppl 27.67, throughput 5655.29 samples/s
[Epoch 10 Batch 36000] loss 3.32, ppl 27.59, throughput 5767.38 samples/s
[Epoch 10 Batch 37000] loss 3.32, ppl 27.72, throughput 5767.26 samples/s
[Epoch 10 Batch 38000] loss 3.32, ppl 27.56, throughput 5585.14 samples/s
[Epoch 10 Batch 39000] loss 3.32, ppl 27.67, throughput 5721.42 samples/s
[Epoch 10 Batch 40000] loss 3.32, ppl 27.62, throughput 5735.35 samples/s
[Epoch 10 Batch 41000] loss 3.32, ppl 27.66, throughput 5066.70 samples/s
[Epoch 10 Batch 42000] loss 3.32, ppl 27.62, throughput 4592.93 samples/s
[Epoch 10 Batch 43000] loss 3.32, ppl 27.74, throughput 4627.14 samples/s
[Epoch 10 Batch 44000] loss 3.32, ppl 27.69, throughput 4602.07 samples/s
[Epoch 10 Batch 45000] loss 3.32, ppl 27.75, throughput 4531.17 samples/s
[Epoch 10 Batch 46000] loss 3.32, ppl 27.57, throughput 4684.66 samples/s
[Epoch 10 Batch 47000] loss 3.32, ppl 27.57, throughput 5726.43 samples/s
[Epoch 10 Batch 48000] loss 3.32, ppl 27.61, throughput 5752.30 samples/s
[Epoch 10 Batch 49000] loss 3.32, ppl 27.70, throughput 5559.27 samples/s
[Epoch 10 Batch 50000] loss 3.32, ppl 27.74, throughput 5704.39 samples/s
[Epoch 10 Batch 51000] loss 3.32, ppl 27.74, throughput 5783.53 samples/s
[Epoch 10 Batch 52000] loss 3.33, ppl 27.84, throughput 5711.76 samples/s
[Epoch 10 Batch 53000] loss 3.32, ppl 27.65, throughput 5597.92 samples/s
[Epoch 10 Batch 54000] loss 3.32, ppl 27.78, throughput 5744.89 samples/s
[Epoch 10 Batch 55000] loss 3.32, ppl 27.74, throughput 5769.10 samples/s
[Epoch 10 Batch 56000] loss 3.32, ppl 27.70, throughput 5577.24 samples/s
[Epoch 10 Batch 57000] loss 3.32, ppl 27.71, throughput 5733.19 samples/s
[Epoch 10 Batch 58000] loss 3.32, ppl 27.69, throughput 5734.00 samples/s
[Epoch 10 Batch 59000] loss 3.32, ppl 27.62, throughput 5730.15 samples/s
[Epoch 10 Batch 60000] loss 3.32, ppl 27.63, throughput 5581.10 samples/s
[Epoch 10 Batch 61000] loss 3.32, ppl 27.73, throughput 5724.20 samples/s
[Epoch 10 Batch 62000] loss 3.32, ppl 27.69, throughput 5770.14 samples/s
[Epoch 10 Batch 63000] loss 3.32, ppl 27.78, throughput 5737.52 samples/s
[Epoch 10 Batch 64000] loss 3.32, ppl 27.77, throughput 5598.42 samples/s
[Epoch 10 Batch 65000] loss 3.32, ppl 27.77, throughput 5763.68 samples/s
[Epoch 10 Batch 66000] loss 3.32, ppl 27.72, throughput 5737.44 samples/s
[Epoch 10 Batch 67000] loss 3.32, ppl 27.69, throughput 5574.45 samples/s
[Epoch 10 Batch 68000] loss 3.32, ppl 27.69, throughput 5721.34 samples/s
[Epoch 10 Batch 69000] loss 3.32, ppl 27.69, throughput 5782.93 samples/s
[Epoch 10 Batch 70000] loss 3.32, ppl 27.72, throughput 5755.45 samples/s
[Epoch 10 Batch 71000] loss 3.32, ppl 27.63, throughput 5579.56 samples/s
[Epoch 10 Batch 72000] loss 3.32, ppl 27.71, throughput 5747.28 samples/s
[Epoch 10 Batch 73000] loss 3.33, ppl 27.80, throughput 5761.01 samples/s
[Epoch 10 Batch 74000] loss 3.32, ppl 27.60, throughput 5738.75 samples/s
[Epoch 10 Batch 75000] loss 3.33, ppl 27.80, throughput 5674.03 samples/s
[Epoch 10 Batch 76000] loss 3.32, ppl 27.76, throughput 5730.83 samples/s
[Epoch 10 Batch 77000] loss 3.32, ppl 27.70, throughput 5756.66 samples/s
[Epoch 10 Batch 78000] loss 3.32, ppl 27.77, throughput 5713.15 samples/s
Epoch 10 took 7238.54 seconds.
[Epoch 11 Batch 1000] loss 3.30, ppl 27.01, throughput 5631.70 samples/s
[Epoch 11 Batch 2000] loss 3.30, ppl 27.15, throughput 5677.35 samples/s
[Epoch 11 Batch 3000] loss 3.30, ppl 27.20, throughput 5742.46 samples/s
[Epoch 11 Batch 4000] loss 3.29, ppl 26.84, throughput 5554.18 samples/s
[Epoch 11 Batch 5000] loss 3.31, ppl 27.29, throughput 5730.71 samples/s
[Epoch 11 Batch 6000] loss 3.30, ppl 27.16, throughput 5676.68 samples/s
[Epoch 11 Batch 7000] loss 3.30, ppl 27.06, throughput 4645.86 samples/s
[Epoch 11 Batch 8000] loss 3.30, ppl 27.21, throughput 4493.88 samples/s
[Epoch 11 Batch 9000] loss 3.29, ppl 26.93, throughput 4645.74 samples/s
[Epoch 11 Batch 10000] loss 3.30, ppl 27.18, throughput 4518.74 samples/s
[Epoch 11 Batch 11000] loss 3.31, ppl 27.31, throughput 4547.31 samples/s
[Epoch 11 Batch 12000] loss 3.31, ppl 27.33, throughput 5224.80 samples/s
[Epoch 11 Batch 13000] loss 3.30, ppl 27.23, throughput 5571.66 samples/s
[Epoch 11 Batch 14000] loss 3.30, ppl 27.05, throughput 4654.58 samples/s
[Epoch 11 Batch 15000] loss 3.31, ppl 27.31, throughput 4483.01 samples/s
[Epoch 11 Batch 16000] loss 3.31, ppl 27.32, throughput 5163.94 samples/s
[Epoch 11 Batch 17000] loss 3.31, ppl 27.28, throughput 4690.84 samples/s
[Epoch 11 Batch 18000] loss 3.31, ppl 27.30, throughput 4598.26 samples/s
[Epoch 11 Batch 19000] loss 3.31, ppl 27.36, throughput 4888.83 samples/s
[Epoch 11 Batch 20000] loss 3.31, ppl 27.39, throughput 5698.12 samples/s
[Epoch 11 Batch 21000] loss 3.31, ppl 27.36, throughput 5722.74 samples/s
[Epoch 11 Batch 22000] loss 3.31, ppl 27.37, throughput 5713.87 samples/s
[Epoch 11 Batch 23000] loss 3.30, ppl 27.20, throughput 5669.93 samples/s
[Epoch 11 Batch 24000] loss 3.31, ppl 27.40, throughput 5782.51 samples/s
[Epoch 11 Batch 25000] loss 3.31, ppl 27.29, throughput 5749.79 samples/s
[Epoch 11 Batch 26000] loss 3.31, ppl 27.27, throughput 5822.87 samples/s
[Epoch 11 Batch 27000] loss 3.31, ppl 27.29, throughput 5621.08 samples/s
[Epoch 11 Batch 28000] loss 3.31, ppl 27.32, throughput 5733.82 samples/s
[Epoch 11 Batch 29000] loss 3.31, ppl 27.40, throughput 5727.72 samples/s
[Epoch 11 Batch 30000] loss 3.31, ppl 27.38, throughput 5605.47 samples/s
[Epoch 11 Batch 31000] loss 3.30, ppl 27.25, throughput 5734.39 samples/s
[Epoch 11 Batch 32000] loss 3.31, ppl 27.41, throughput 5767.86 samples/s
[Epoch 11 Batch 33000] loss 3.31, ppl 27.36, throughput 5727.60 samples/s
[Epoch 11 Batch 34000] loss 3.31, ppl 27.34, throughput 5514.52 samples/s
[Epoch 11 Batch 35000] loss 3.31, ppl 27.41, throughput 5761.61 samples/s
[Epoch 11 Batch 36000] loss 3.31, ppl 27.37, throughput 5751.86 samples/s
[Epoch 11 Batch 37000] loss 3.31, ppl 27.29, throughput 5771.57 samples/s
[Epoch 11 Batch 38000] loss 3.31, ppl 27.40, throughput 5581.13 samples/s
[Epoch 11 Batch 39000] loss 3.31, ppl 27.33, throughput 5680.01 samples/s
[Epoch 11 Batch 40000] loss 3.31, ppl 27.31, throughput 5753.59 samples/s
[Epoch 11 Batch 41000] loss 3.31, ppl 27.35, throughput 5622.31 samples/s
[Epoch 11 Batch 42000] loss 3.31, ppl 27.43, throughput 5726.39 samples/s
[Epoch 11 Batch 43000] loss 3.31, ppl 27.41, throughput 5725.05 samples/s
[Epoch 11 Batch 44000] loss 3.31, ppl 27.34, throughput 5760.85 samples/s
[Epoch 11 Batch 45000] loss 3.31, ppl 27.41, throughput 5568.86 samples/s
[Epoch 11 Batch 46000] loss 3.31, ppl 27.36, throughput 5703.33 samples/s
[Epoch 11 Batch 47000] loss 3.31, ppl 27.32, throughput 5692.83 samples/s
[Epoch 11 Batch 48000] loss 3.31, ppl 27.35, throughput 5739.69 samples/s
[Epoch 11 Batch 49000] loss 3.31, ppl 27.47, throughput 5556.53 samples/s
[Epoch 11 Batch 50000] loss 3.31, ppl 27.29, throughput 5765.16 samples/s
[Epoch 11 Batch 51000] loss 3.31, ppl 27.37, throughput 5719.33 samples/s
[Epoch 11 Batch 52000] loss 3.31, ppl 27.41, throughput 5696.55 samples/s
[Epoch 11 Batch 53000] loss 3.31, ppl 27.43, throughput 5582.67 samples/s
[Epoch 11 Batch 54000] loss 3.31, ppl 27.40, throughput 5773.69 samples/s
[Epoch 11 Batch 55000] loss 3.31, ppl 27.39, throughput 5792.20 samples/s
[Epoch 11 Batch 56000] loss 3.31, ppl 27.49, throughput 5564.06 samples/s
[Epoch 11 Batch 57000] loss 3.31, ppl 27.46, throughput 5725.12 samples/s
[Epoch 11 Batch 58000] loss 3.31, ppl 27.49, throughput 5773.14 samples/s
[Epoch 11 Batch 59000] loss 3.31, ppl 27.41, throughput 5693.79 samples/s
[Epoch 11 Batch 60000] loss 3.31, ppl 27.47, throughput 5566.17 samples/s
[Epoch 11 Batch 61000] loss 3.31, ppl 27.36, throughput 5794.31 samples/s
[Epoch 11 Batch 62000] loss 3.31, ppl 27.45, throughput 5699.53 samples/s
[Epoch 11 Batch 63000] loss 3.31, ppl 27.45, throughput 5717.48 samples/s
[Epoch 11 Batch 64000] loss 3.32, ppl 27.59, throughput 5539.36 samples/s
[Epoch 11 Batch 65000] loss 3.32, ppl 27.54, throughput 5688.92 samples/s
[Epoch 11 Batch 66000] loss 3.31, ppl 27.45, throughput 4905.88 samples/s
[Epoch 11 Batch 67000] loss 3.31, ppl 27.42, throughput 4502.06 samples/s
[Epoch 11 Batch 68000] loss 3.31, ppl 27.49, throughput 4599.34 samples/s
[Epoch 11 Batch 69000] loss 3.31, ppl 27.45, throughput 4563.62 samples/s
[Epoch 11 Batch 70000] loss 3.31, ppl 27.52, throughput 4641.74 samples/s
[Epoch 11 Batch 71000] loss 3.32, ppl 27.52, throughput 5059.03 samples/s
[Epoch 11 Batch 72000] loss 3.31, ppl 27.48, throughput 5707.31 samples/s
[Epoch 11 Batch 73000] loss 3.32, ppl 27.52, throughput 5747.39 samples/s
[Epoch 11 Batch 74000] loss 3.32, ppl 27.53, throughput 5794.15 samples/s
[Epoch 11 Batch 75000] loss 3.32, ppl 27.54, throughput 5556.92 samples/s
[Epoch 11 Batch 76000] loss 3.32, ppl 27.53, throughput 5745.13 samples/s
[Epoch 11 Batch 77000] loss 3.31, ppl 27.37, throughput 5786.51 samples/s
[Epoch 11 Batch 78000] loss 3.31, ppl 27.43, throughput 5761.38 samples/s
Epoch 11 took 7357.18 seconds.
[Epoch 12 Batch 1000] loss 3.30, ppl 27.03, throughput 5596.05 samples/s
[Epoch 12 Batch 2000] loss 3.30, ppl 27.07, throughput 5699.81 samples/s
[Epoch 12 Batch 3000] loss 3.27, ppl 26.36, throughput 5737.13 samples/s
[Epoch 12 Batch 4000] loss 3.29, ppl 26.72, throughput 5505.08 samples/s
[Epoch 12 Batch 5000] loss 3.29, ppl 26.72, throughput 5307.32 samples/s
[Epoch 12 Batch 6000] loss 3.30, ppl 26.98, throughput 5329.64 samples/s
[Epoch 12 Batch 7000] loss 3.30, ppl 27.08, throughput 5157.94 samples/s
[Epoch 12 Batch 8000] loss 3.30, ppl 27.02, throughput 4550.31 samples/s
[Epoch 12 Batch 9000] loss 3.29, ppl 26.81, throughput 4737.45 samples/s
[Epoch 12 Batch 10000] loss 3.29, ppl 26.84, throughput 4702.17 samples/s
[Epoch 12 Batch 11000] loss 3.30, ppl 27.06, throughput 5689.93 samples/s
[Epoch 12 Batch 12000] loss 3.30, ppl 27.01, throughput 5584.23 samples/s
[Epoch 12 Batch 13000] loss 3.29, ppl 26.92, throughput 5778.82 samples/s
[Epoch 12 Batch 14000] loss 3.30, ppl 27.08, throughput 5718.96 samples/s
[Epoch 12 Batch 15000] loss 3.30, ppl 26.99, throughput 5597.12 samples/s
[Epoch 12 Batch 16000] loss 3.30, ppl 27.07, throughput 5687.09 samples/s
[Epoch 12 Batch 17000] loss 3.29, ppl 26.88, throughput 5737.93 samples/s
[Epoch 12 Batch 18000] loss 3.30, ppl 27.13, throughput 5778.39 samples/s
[Epoch 12 Batch 19000] loss 3.30, ppl 27.14, throughput 5590.98 samples/s
[Epoch 12 Batch 20000] loss 3.30, ppl 27.11, throughput 5675.96 samples/s
[Epoch 12 Batch 21000] loss 3.30, ppl 27.10, throughput 5759.64 samples/s
[Epoch 12 Batch 22000] loss 3.30, ppl 27.11, throughput 5757.87 samples/s
[Epoch 12 Batch 23000] loss 3.30, ppl 27.17, throughput 5574.90 samples/s
[Epoch 12 Batch 24000] loss 3.30, ppl 27.05, throughput 5738.33 samples/s
[Epoch 12 Batch 25000] loss 3.30, ppl 27.05, throughput 5774.35 samples/s
[Epoch 12 Batch 26000] loss 3.30, ppl 27.05, throughput 5726.98 samples/s
[Epoch 12 Batch 27000] loss 3.30, ppl 27.01, throughput 5611.62 samples/s
[Epoch 12 Batch 28000] loss 3.30, ppl 27.00, throughput 5720.54 samples/s
[Epoch 12 Batch 29000] loss 3.30, ppl 27.10, throughput 5719.24 samples/s
[Epoch 12 Batch 30000] loss 3.30, ppl 27.05, throughput 5565.90 samples/s
[Epoch 12 Batch 31000] loss 3.29, ppl 26.93, throughput 5750.44 samples/s
[Epoch 12 Batch 32000] loss 3.30, ppl 27.19, throughput 5752.85 samples/s
[Epoch 12 Batch 33000] loss 3.30, ppl 27.07, throughput 5732.04 samples/s
[Epoch 12 Batch 34000] loss 3.30, ppl 27.07, throughput 5639.67 samples/s
[Epoch 12 Batch 35000] loss 3.30, ppl 27.11, throughput 5728.17 samples/s
[Epoch 12 Batch 36000] loss 3.30, ppl 27.05, throughput 5751.60 samples/s
[Epoch 12 Batch 37000] loss 3.30, ppl 27.13, throughput 5747.08 samples/s
[Epoch 12 Batch 38000] loss 3.30, ppl 27.20, throughput 5562.87 samples/s
[Epoch 12 Batch 39000] loss 3.30, ppl 27.14, throughput 5023.21 samples/s
[Epoch 12 Batch 40000] loss 3.31, ppl 27.25, throughput 4744.54 samples/s
[Epoch 12 Batch 41000] loss 3.30, ppl 27.16, throughput 4937.46 samples/s
[Epoch 12 Batch 42000] loss 3.30, ppl 27.22, throughput 4651.88 samples/s
[Epoch 12 Batch 43000] loss 3.30, ppl 27.14, throughput 4641.69 samples/s
[Epoch 12 Batch 44000] loss 3.30, ppl 27.19, throughput 5067.58 samples/s
[Epoch 12 Batch 45000] loss 3.30, ppl 27.17, throughput 5446.13 samples/s
[Epoch 12 Batch 46000] loss 3.30, ppl 27.09, throughput 5782.34 samples/s
[Epoch 12 Batch 47000] loss 3.30, ppl 27.19, throughput 5755.85 samples/s
[Epoch 12 Batch 48000] loss 3.30, ppl 27.14, throughput 5760.49 samples/s
[Epoch 12 Batch 49000] loss 3.30, ppl 27.15, throughput 5551.47 samples/s
[Epoch 12 Batch 50000] loss 3.31, ppl 27.28, throughput 5701.49 samples/s
[Epoch 12 Batch 51000] loss 3.31, ppl 27.28, throughput 5712.28 samples/s
[Epoch 12 Batch 52000] loss 3.30, ppl 27.10, throughput 5752.61 samples/s
[Epoch 12 Batch 53000] loss 3.30, ppl 27.19, throughput 5613.04 samples/s
[Epoch 12 Batch 54000] loss 3.31, ppl 27.27, throughput 5757.05 samples/s
[Epoch 12 Batch 55000] loss 3.30, ppl 27.22, throughput 5744.52 samples/s
[Epoch 12 Batch 56000] loss 3.30, ppl 27.24, throughput 5579.99 samples/s
[Epoch 12 Batch 57000] loss 3.30, ppl 27.15, throughput 5765.64 samples/s
[Epoch 12 Batch 58000] loss 3.30, ppl 27.24, throughput 5703.41 samples/s
[Epoch 12 Batch 59000] loss 3.30, ppl 27.24, throughput 5741.17 samples/s
[Epoch 12 Batch 60000] loss 3.30, ppl 27.21, throughput 5539.47 samples/s
[Epoch 12 Batch 61000] loss 3.31, ppl 27.31, throughput 5760.23 samples/s
[Epoch 12 Batch 62000] loss 3.30, ppl 27.24, throughput 5719.68 samples/s
[Epoch 12 Batch 63000] loss 3.30, ppl 27.23, throughput 5758.99 samples/s
[Epoch 12 Batch 64000] loss 3.31, ppl 27.25, throughput 5546.63 samples/s
[Epoch 12 Batch 65000] loss 3.30, ppl 27.22, throughput 5728.34 samples/s
[Epoch 12 Batch 66000] loss 3.30, ppl 27.20, throughput 5669.68 samples/s
[Epoch 12 Batch 67000] loss 3.31, ppl 27.33, throughput 5537.67 samples/s
[Epoch 12 Batch 68000] loss 3.30, ppl 27.16, throughput 5753.61 samples/s
[Epoch 12 Batch 69000] loss 3.30, ppl 27.24, throughput 5727.57 samples/s
[Epoch 12 Batch 70000] loss 3.31, ppl 27.27, throughput 5755.14 samples/s
[Epoch 12 Batch 71000] loss 3.31, ppl 27.29, throughput 5575.28 samples/s
[Epoch 12 Batch 72000] loss 3.31, ppl 27.28, throughput 5722.52 samples/s
[Epoch 12 Batch 73000] loss 3.30, ppl 27.22, throughput 5732.96 samples/s
[Epoch 12 Batch 74000] loss 3.31, ppl 27.27, throughput 5745.27 samples/s
[Epoch 12 Batch 75000] loss 3.31, ppl 27.29, throughput 5595.87 samples/s
[Epoch 12 Batch 76000] loss 3.30, ppl 27.23, throughput 5693.73 samples/s
[Epoch 12 Batch 77000] loss 3.30, ppl 27.24, throughput 5708.64 samples/s
[Epoch 12 Batch 78000] loss 3.30, ppl 27.23, throughput 5723.88 samples/s
Epoch 12 took 7202.25 seconds.
[Epoch 13 Batch 1000] loss 3.27, ppl 26.27, throughput 5606.56 samples/s
[Epoch 13 Batch 2000] loss 3.27, ppl 26.37, throughput 5686.62 samples/s
[Epoch 13 Batch 3000] loss 3.29, ppl 26.81, throughput 5697.04 samples/s
[Epoch 13 Batch 4000] loss 3.29, ppl 26.86, throughput 5366.71 samples/s
[Epoch 13 Batch 5000] loss 3.28, ppl 26.65, throughput 4587.71 samples/s
[Epoch 13 Batch 6000] loss 3.28, ppl 26.48, throughput 4563.73 samples/s
[Epoch 13 Batch 7000] loss 3.28, ppl 26.57, throughput 4612.37 samples/s
[Epoch 13 Batch 8000] loss 3.29, ppl 26.84, throughput 4467.70 samples/s
[Epoch 13 Batch 9000] loss 3.28, ppl 26.70, throughput 4639.37 samples/s
[Epoch 13 Batch 10000] loss 3.28, ppl 26.71, throughput 5231.19 samples/s
[Epoch 13 Batch 11000] loss 3.29, ppl 26.73, throughput 5757.22 samples/s
[Epoch 13 Batch 12000] loss 3.27, ppl 26.36, throughput 5213.28 samples/s
[Epoch 13 Batch 13000] loss 3.28, ppl 26.52, throughput 4668.84 samples/s
[Epoch 13 Batch 14000] loss 3.29, ppl 26.82, throughput 4835.14 samples/s
[Epoch 13 Batch 15000] loss 3.29, ppl 26.89, throughput 4538.39 samples/s
[Epoch 13 Batch 16000] loss 3.30, ppl 26.99, throughput 4612.47 samples/s
[Epoch 13 Batch 17000] loss 3.29, ppl 26.89, throughput 4798.27 samples/s
[Epoch 13 Batch 18000] loss 3.29, ppl 26.81, throughput 5718.40 samples/s
[Epoch 13 Batch 19000] loss 3.29, ppl 26.85, throughput 5575.03 samples/s
[Epoch 13 Batch 20000] loss 3.28, ppl 26.65, throughput 5762.65 samples/s
[Epoch 13 Batch 21000] loss 3.29, ppl 26.71, throughput 5664.67 samples/s
[Epoch 13 Batch 22000] loss 3.28, ppl 26.65, throughput 5785.68 samples/s
[Epoch 13 Batch 23000] loss 3.29, ppl 26.87, throughput 5559.21 samples/s
[Epoch 13 Batch 24000] loss 3.29, ppl 26.87, throughput 5751.02 samples/s
[Epoch 13 Batch 25000] loss 3.29, ppl 26.85, throughput 5800.51 samples/s
[Epoch 13 Batch 26000] loss 3.30, ppl 26.99, throughput 5762.07 samples/s
[Epoch 13 Batch 27000] loss 3.29, ppl 26.94, throughput 5572.81 samples/s
[Epoch 13 Batch 28000] loss 3.29, ppl 26.83, throughput 5781.55 samples/s
[Epoch 13 Batch 29000] loss 3.29, ppl 26.81, throughput 5701.82 samples/s
[Epoch 13 Batch 30000] loss 3.29, ppl 26.95, throughput 5570.97 samples/s
[Epoch 13 Batch 31000] loss 3.29, ppl 26.93, throughput 5703.59 samples/s
[Epoch 13 Batch 32000] loss 3.29, ppl 26.93, throughput 5727.47 samples/s
[Epoch 13 Batch 33000] loss 3.29, ppl 26.86, throughput 5761.78 samples/s
[Epoch 13 Batch 34000] loss 3.30, ppl 27.03, throughput 5529.53 samples/s
[Epoch 13 Batch 35000] loss 3.30, ppl 27.04, throughput 5718.21 samples/s
[Epoch 13 Batch 36000] loss 3.29, ppl 26.93, throughput 5720.55 samples/s
[Epoch 13 Batch 37000] loss 3.29, ppl 26.91, throughput 5710.15 samples/s
[Epoch 13 Batch 38000] loss 3.30, ppl 27.03, throughput 5541.41 samples/s
[Epoch 13 Batch 39000] loss 3.29, ppl 26.90, throughput 5714.58 samples/s
[Epoch 13 Batch 40000] loss 3.30, ppl 26.99, throughput 5745.19 samples/s
[Epoch 13 Batch 41000] loss 3.29, ppl 26.91, throughput 5527.27 samples/s
[Epoch 13 Batch 42000] loss 3.29, ppl 26.92, throughput 5735.26 samples/s
[Epoch 13 Batch 43000] loss 3.30, ppl 27.03, throughput 5734.59 samples/s
[Epoch 13 Batch 44000] loss 3.30, ppl 27.01, throughput 5721.63 samples/s
[Epoch 13 Batch 45000] loss 3.30, ppl 27.03, throughput 5511.32 samples/s
[Epoch 13 Batch 46000] loss 3.30, ppl 27.00, throughput 5778.81 samples/s
[Epoch 13 Batch 47000] loss 3.29, ppl 26.93, throughput 5691.12 samples/s
[Epoch 13 Batch 48000] loss 3.29, ppl 26.92, throughput 5707.42 samples/s
[Epoch 13 Batch 49000] loss 3.29, ppl 26.93, throughput 5553.31 samples/s
[Epoch 13 Batch 50000] loss 3.29, ppl 26.88, throughput 5712.15 samples/s
[Epoch 13 Batch 51000] loss 3.30, ppl 27.00, throughput 5685.83 samples/s
[Epoch 13 Batch 52000] loss 3.30, ppl 27.08, throughput 5756.61 samples/s
[Epoch 13 Batch 53000] loss 3.30, ppl 27.13, throughput 5604.01 samples/s
[Epoch 13 Batch 54000] loss 3.29, ppl 26.96, throughput 5734.81 samples/s
[Epoch 13 Batch 55000] loss 3.29, ppl 26.95, throughput 5739.16 samples/s
[Epoch 13 Batch 56000] loss 3.30, ppl 26.99, throughput 5557.28 samples/s
[Epoch 13 Batch 57000] loss 3.30, ppl 27.08, throughput 5504.62 samples/s
[Epoch 13 Batch 58000] loss 3.30, ppl 27.05, throughput 4587.04 samples/s
[Epoch 13 Batch 59000] loss 3.30, ppl 27.04, throughput 4573.54 samples/s
[Epoch 13 Batch 60000] loss 3.30, ppl 27.12, throughput 4487.64 samples/s
[Epoch 13 Batch 61000] loss 3.29, ppl 26.97, throughput 4683.46 samples/s
[Epoch 13 Batch 62000] loss 3.30, ppl 27.01, throughput 4925.89 samples/s
[Epoch 13 Batch 63000] loss 3.30, ppl 27.09, throughput 5443.75 samples/s
[Epoch 13 Batch 64000] loss 3.29, ppl 26.97, throughput 5545.60 samples/s
[Epoch 13 Batch 65000] loss 3.30, ppl 27.04, throughput 5749.27 samples/s
[Epoch 13 Batch 66000] loss 3.30, ppl 27.09, throughput 5746.18 samples/s
[Epoch 13 Batch 67000] loss 3.30, ppl 27.11, throughput 5529.50 samples/s
[Epoch 13 Batch 68000] loss 3.30, ppl 27.09, throughput 5792.92 samples/s
[Epoch 13 Batch 69000] loss 3.30, ppl 27.04, throughput 5756.83 samples/s
[Epoch 13 Batch 70000] loss 3.30, ppl 27.08, throughput 5723.28 samples/s
[Epoch 13 Batch 71000] loss 3.30, ppl 27.02, throughput 5589.43 samples/s
[Epoch 13 Batch 72000] loss 3.30, ppl 27.10, throughput 5734.90 samples/s
[Epoch 13 Batch 73000] loss 3.30, ppl 27.10, throughput 5761.58 samples/s
[Epoch 13 Batch 74000] loss 3.30, ppl 27.11, throughput 5712.72 samples/s
[Epoch 13 Batch 75000] loss 3.30, ppl 27.10, throughput 5602.31 samples/s
[Epoch 13 Batch 76000] loss 3.30, ppl 27.03, throughput 5740.47 samples/s
[Epoch 13 Batch 77000] loss 3.30, ppl 27.11, throughput 5731.95 samples/s
[Epoch 13 Batch 78000] loss 3.30, ppl 27.09, throughput 5751.84 samples/s
Epoch 13 took 7361.06 seconds.
[Epoch 14 Batch 1000] loss 3.26, ppl 26.17, throughput 5579.74 samples/s
[Epoch 14 Batch 2000] loss 3.27, ppl 26.38, throughput 5648.08 samples/s
[Epoch 14 Batch 3000] loss 3.26, ppl 26.02, throughput 5029.30 samples/s
[Epoch 14 Batch 4000] loss 3.27, ppl 26.37, throughput 5202.09 samples/s
[Epoch 14 Batch 5000] loss 3.27, ppl 26.40, throughput 5311.66 samples/s
[Epoch 14 Batch 6000] loss 3.27, ppl 26.27, throughput 5325.23 samples/s
[Epoch 14 Batch 7000] loss 3.28, ppl 26.51, throughput 5302.52 samples/s
[Epoch 14 Batch 8000] loss 3.28, ppl 26.50, throughput 5139.53 samples/s
[Epoch 14 Batch 9000] loss 3.28, ppl 26.70, throughput 5624.58 samples/s
[Epoch 14 Batch 10000] loss 3.28, ppl 26.52, throughput 5751.26 samples/s
[Epoch 14 Batch 11000] loss 3.28, ppl 26.58, throughput 5729.41 samples/s
[Epoch 14 Batch 12000] loss 3.28, ppl 26.66, throughput 5552.90 samples/s
[Epoch 14 Batch 13000] loss 3.28, ppl 26.62, throughput 5759.06 samples/s
[Epoch 14 Batch 14000] loss 3.28, ppl 26.69, throughput 5716.89 samples/s
[Epoch 14 Batch 15000] loss 3.28, ppl 26.51, throughput 5571.35 samples/s
[Epoch 14 Batch 16000] loss 3.28, ppl 26.56, throughput 5701.98 samples/s
[Epoch 14 Batch 17000] loss 3.28, ppl 26.66, throughput 5738.74 samples/s
[Epoch 14 Batch 18000] loss 3.28, ppl 26.66, throughput 5746.72 samples/s
[Epoch 14 Batch 19000] loss 3.28, ppl 26.54, throughput 5594.64 samples/s
[Epoch 14 Batch 20000] loss 3.28, ppl 26.65, throughput 5796.00 samples/s
[Epoch 14 Batch 21000] loss 3.28, ppl 26.66, throughput 5745.77 samples/s
[Epoch 14 Batch 22000] loss 3.28, ppl 26.58, throughput 5670.89 samples/s
[Epoch 14 Batch 23000] loss 3.29, ppl 26.77, throughput 5593.71 samples/s
[Epoch 14 Batch 24000] loss 3.29, ppl 26.77, throughput 5683.68 samples/s
[Epoch 14 Batch 25000] loss 3.29, ppl 26.71, throughput 5717.53 samples/s
[Epoch 14 Batch 26000] loss 3.29, ppl 26.81, throughput 5609.63 samples/s
[Epoch 14 Batch 27000] loss 3.28, ppl 26.65, throughput 5740.22 samples/s
[Epoch 14 Batch 28000] loss 3.29, ppl 26.72, throughput 5701.95 samples/s
[Epoch 14 Batch 29000] loss 3.28, ppl 26.68, throughput 5778.50 samples/s
[Epoch 14 Batch 30000] loss 3.29, ppl 26.75, throughput 5608.94 samples/s
[Epoch 14 Batch 31000] loss 3.29, ppl 26.71, throughput 5387.35 samples/s
[Epoch 14 Batch 32000] loss 3.28, ppl 26.71, throughput 5391.18 samples/s
[Epoch 14 Batch 33000] loss 3.28, ppl 26.67, throughput 5299.43 samples/s
[Epoch 14 Batch 34000] loss 3.28, ppl 26.66, throughput 5178.58 samples/s
[Epoch 14 Batch 35000] loss 3.29, ppl 26.75, throughput 5318.57 samples/s
[Epoch 14 Batch 36000] loss 3.29, ppl 26.71, throughput 5305.99 samples/s
[Epoch 14 Batch 37000] loss 3.29, ppl 26.77, throughput 5295.48 samples/s
[Epoch 14 Batch 38000] loss 3.29, ppl 26.78, throughput 5604.32 samples/s
[Epoch 14 Batch 39000] loss 3.29, ppl 26.80, throughput 5748.51 samples/s
[Epoch 14 Batch 40000] loss 3.29, ppl 26.80, throughput 5749.14 samples/s
[Epoch 14 Batch 41000] loss 3.29, ppl 26.74, throughput 5571.13 samples/s
[Epoch 14 Batch 42000] loss 3.29, ppl 26.75, throughput 5760.89 samples/s
[Epoch 14 Batch 43000] loss 3.29, ppl 26.71, throughput 5793.50 samples/s
[Epoch 14 Batch 44000] loss 3.28, ppl 26.70, throughput 5794.06 samples/s
[Epoch 14 Batch 45000] loss 3.29, ppl 26.89, throughput 5623.84 samples/s
[Epoch 14 Batch 46000] loss 3.29, ppl 26.77, throughput 5763.78 samples/s
[Epoch 14 Batch 47000] loss 3.29, ppl 26.73, throughput 5815.20 samples/s
[Epoch 14 Batch 48000] loss 3.29, ppl 26.87, throughput 5722.88 samples/s
[Epoch 14 Batch 49000] loss 3.29, ppl 26.92, throughput 5607.52 samples/s
[Epoch 14 Batch 50000] loss 3.29, ppl 26.86, throughput 5777.29 samples/s
[Epoch 14 Batch 51000] loss 3.29, ppl 26.80, throughput 5728.71 samples/s
[Epoch 14 Batch 52000] loss 3.29, ppl 26.77, throughput 5787.68 samples/s
[Epoch 14 Batch 53000] loss 3.29, ppl 26.79, throughput 5573.68 samples/s
[Epoch 14 Batch 54000] loss 3.29, ppl 26.82, throughput 5711.43 samples/s
[Epoch 14 Batch 55000] loss 3.29, ppl 26.82, throughput 5761.14 samples/s
[Epoch 14 Batch 56000] loss 3.29, ppl 26.88, throughput 5602.50 samples/s
[Epoch 14 Batch 57000] loss 3.29, ppl 26.83, throughput 5760.28 samples/s
[Epoch 14 Batch 58000] loss 3.29, ppl 26.83, throughput 5740.84 samples/s
[Epoch 14 Batch 59000] loss 3.29, ppl 26.85, throughput 5719.06 samples/s
[Epoch 14 Batch 60000] loss 3.29, ppl 26.87, throughput 5586.28 samples/s
[Epoch 14 Batch 61000] loss 3.29, ppl 26.88, throughput 5804.84 samples/s
[Epoch 14 Batch 62000] loss 3.29, ppl 26.79, throughput 5697.27 samples/s
[Epoch 14 Batch 63000] loss 3.29, ppl 26.94, throughput 5782.19 samples/s
[Epoch 14 Batch 64000] loss 3.29, ppl 26.84, throughput 5598.24 samples/s
[Epoch 14 Batch 65000] loss 3.29, ppl 26.79, throughput 5726.69 samples/s
[Epoch 14 Batch 66000] loss 3.29, ppl 26.87, throughput 5756.19 samples/s
[Epoch 14 Batch 67000] loss 3.29, ppl 26.90, throughput 5606.40 samples/s
[Epoch 14 Batch 68000] loss 3.29, ppl 26.89, throughput 5721.65 samples/s
[Epoch 14 Batch 69000] loss 3.29, ppl 26.84, throughput 5728.39 samples/s
[Epoch 14 Batch 70000] loss 3.29, ppl 26.87, throughput 5705.25 samples/s
[Epoch 14 Batch 71000] loss 3.29, ppl 26.91, throughput 5569.23 samples/s
[Epoch 14 Batch 72000] loss 3.29, ppl 26.86, throughput 5745.25 samples/s
[Epoch 14 Batch 73000] loss 3.29, ppl 26.86, throughput 5741.00 samples/s
[Epoch 14 Batch 74000] loss 3.29, ppl 26.96, throughput 5821.20 samples/s
[Epoch 14 Batch 75000] loss 3.29, ppl 26.83, throughput 5629.85 samples/s
[Epoch 14 Batch 76000] loss 3.29, ppl 26.87, throughput 5771.49 samples/s
[Epoch 14 Batch 77000] loss 3.29, ppl 26.84, throughput 5688.62 samples/s
[Epoch 14 Batch 78000] loss 3.29, ppl 26.94, throughput 5692.20 samples/s
Epoch 14 took 7106.98 seconds.
[Epoch 15 Batch 1000] loss 3.27, ppl 26.35, throughput 5587.90 samples/s
[Epoch 15 Batch 2000] loss 3.27, ppl 26.40, throughput 5646.81 samples/s
[Epoch 15 Batch 3000] loss 3.27, ppl 26.26, throughput 5662.05 samples/s
[Epoch 15 Batch 4000] loss 3.26, ppl 26.14, throughput 4568.40 samples/s
[Epoch 15 Batch 5000] loss 3.26, ppl 26.14, throughput 4948.38 samples/s
[Epoch 15 Batch 6000] loss 3.27, ppl 26.43, throughput 4212.10 samples/s
[Epoch 15 Batch 7000] loss 3.28, ppl 26.47, throughput 4178.58 samples/s
[Epoch 15 Batch 8000] loss 3.27, ppl 26.37, throughput 4073.08 samples/s
[Epoch 15 Batch 9000] loss 3.27, ppl 26.19, throughput 4230.07 samples/s
[Epoch 15 Batch 10000] loss 3.27, ppl 26.34, throughput 4127.82 samples/s
[Epoch 15 Batch 11000] loss 3.28, ppl 26.47, throughput 4495.32 samples/s
[Epoch 15 Batch 12000] loss 3.27, ppl 26.38, throughput 4696.24 samples/s
[Epoch 15 Batch 13000] loss 3.28, ppl 26.59, throughput 5691.04 samples/s
[Epoch 15 Batch 14000] loss 3.28, ppl 26.45, throughput 5770.07 samples/s
[Epoch 15 Batch 15000] loss 3.28, ppl 26.49, throughput 5617.78 samples/s
[Epoch 15 Batch 16000] loss 3.28, ppl 26.54, throughput 5736.58 samples/s
[Epoch 15 Batch 17000] loss 3.28, ppl 26.53, throughput 5748.56 samples/s
[Epoch 15 Batch 18000] loss 3.27, ppl 26.38, throughput 5726.84 samples/s
[Epoch 15 Batch 19000] loss 3.28, ppl 26.53, throughput 5549.73 samples/s
[Epoch 15 Batch 20000] loss 3.27, ppl 26.32, throughput 5758.29 samples/s
[Epoch 15 Batch 21000] loss 3.27, ppl 26.43, throughput 5754.58 samples/s
[Epoch 15 Batch 22000] loss 3.28, ppl 26.45, throughput 5757.42 samples/s
[Epoch 15 Batch 23000] loss 3.28, ppl 26.44, throughput 5603.10 samples/s
[Epoch 15 Batch 24000] loss 3.27, ppl 26.40, throughput 5739.37 samples/s
[Epoch 15 Batch 25000] loss 3.28, ppl 26.53, throughput 5736.63 samples/s
[Epoch 15 Batch 26000] loss 3.28, ppl 26.55, throughput 5579.34 samples/s
[Epoch 15 Batch 27000] loss 3.28, ppl 26.62, throughput 5756.75 samples/s
[Epoch 15 Batch 28000] loss 3.28, ppl 26.45, throughput 5705.18 samples/s
[Epoch 15 Batch 29000] loss 3.28, ppl 26.48, throughput 5676.32 samples/s
[Epoch 15 Batch 30000] loss 3.28, ppl 26.53, throughput 5611.25 samples/s
[Epoch 15 Batch 31000] loss 3.28, ppl 26.55, throughput 5779.03 samples/s
[Epoch 15 Batch 32000] loss 3.28, ppl 26.63, throughput 5712.93 samples/s
[Epoch 15 Batch 33000] loss 3.28, ppl 26.58, throughput 5762.70 samples/s
[Epoch 15 Batch 34000] loss 3.28, ppl 26.51, throughput 5554.08 samples/s
[Epoch 15 Batch 35000] loss 3.28, ppl 26.52, throughput 5735.75 samples/s
[Epoch 15 Batch 36000] loss 3.28, ppl 26.50, throughput 5732.83 samples/s
[Epoch 15 Batch 37000] loss 3.28, ppl 26.56, throughput 5691.95 samples/s
[Epoch 15 Batch 38000] loss 3.28, ppl 26.60, throughput 5565.25 samples/s
[Epoch 15 Batch 39000] loss 3.28, ppl 26.70, throughput 5714.85 samples/s
[Epoch 15 Batch 40000] loss 3.28, ppl 26.51, throughput 5708.44 samples/s
[Epoch 15 Batch 41000] loss 3.29, ppl 26.71, throughput 5579.59 samples/s
[Epoch 15 Batch 42000] loss 3.28, ppl 26.63, throughput 5742.06 samples/s
[Epoch 15 Batch 43000] loss 3.28, ppl 26.65, throughput 5756.76 samples/s
[Epoch 15 Batch 44000] loss 3.28, ppl 26.64, throughput 5723.91 samples/s
[Epoch 15 Batch 45000] loss 3.28, ppl 26.65, throughput 5572.84 samples/s
[Epoch 15 Batch 46000] loss 3.28, ppl 26.62, throughput 5757.23 samples/s
[Epoch 15 Batch 47000] loss 3.28, ppl 26.58, throughput 5775.76 samples/s
[Epoch 15 Batch 48000] loss 3.28, ppl 26.62, throughput 5738.47 samples/s
[Epoch 15 Batch 49000] loss 3.28, ppl 26.66, throughput 5582.52 samples/s
[Epoch 15 Batch 50000] loss 3.28, ppl 26.66, throughput 5784.26 samples/s
[Epoch 15 Batch 51000] loss 3.28, ppl 26.69, throughput 5722.12 samples/s
[Epoch 15 Batch 52000] loss 3.28, ppl 26.64, throughput 5764.42 samples/s
[Epoch 15 Batch 53000] loss 3.28, ppl 26.60, throughput 5589.32 samples/s
[Epoch 15 Batch 54000] loss 3.28, ppl 26.68, throughput 5721.68 samples/s
[Epoch 15 Batch 55000] loss 3.28, ppl 26.58, throughput 5721.99 samples/s
[Epoch 15 Batch 56000] loss 3.28, ppl 26.68, throughput 5616.61 samples/s
[Epoch 15 Batch 57000] loss 3.28, ppl 26.67, throughput 5729.65 samples/s
[Epoch 15 Batch 58000] loss 3.28, ppl 26.70, throughput 5764.21 samples/s
[Epoch 15 Batch 59000] loss 3.28, ppl 26.64, throughput 5636.54 samples/s
[Epoch 15 Batch 60000] loss 3.28, ppl 26.65, throughput 5234.53 samples/s
[Epoch 15 Batch 61000] loss 3.28, ppl 26.67, throughput 5298.92 samples/s
[Epoch 15 Batch 62000] loss 3.28, ppl 26.58, throughput 5320.29 samples/s
[Epoch 15 Batch 63000] loss 3.28, ppl 26.64, throughput 5365.66 samples/s
[Epoch 15 Batch 64000] loss 3.29, ppl 26.76, throughput 4701.38 samples/s
[Epoch 15 Batch 65000] loss 3.28, ppl 26.69, throughput 4652.24 samples/s
[Epoch 15 Batch 66000] loss 3.28, ppl 26.68, throughput 5710.53 samples/s
[Epoch 15 Batch 67000] loss 3.28, ppl 26.61, throughput 5627.64 samples/s
[Epoch 15 Batch 68000] loss 3.28, ppl 26.65, throughput 5730.45 samples/s
[Epoch 15 Batch 69000] loss 3.28, ppl 26.69, throughput 5753.40 samples/s
[Epoch 15 Batch 70000] loss 3.28, ppl 26.64, throughput 5698.85 samples/s
[Epoch 15 Batch 71000] loss 3.28, ppl 26.63, throughput 5570.11 samples/s
[Epoch 15 Batch 72000] loss 3.29, ppl 26.73, throughput 5770.36 samples/s
[Epoch 15 Batch 73000] loss 3.28, ppl 26.69, throughput 5762.91 samples/s
[Epoch 15 Batch 74000] loss 3.28, ppl 26.68, throughput 5776.88 samples/s
[Epoch 15 Batch 75000] loss 3.28, ppl 26.65, throughput 5600.53 samples/s
[Epoch 15 Batch 76000] loss 3.29, ppl 26.71, throughput 5743.66 samples/s
[Epoch 15 Batch 77000] loss 3.29, ppl 26.72, throughput 5765.68 samples/s
[Epoch 15 Batch 78000] loss 3.29, ppl 26.77, throughput 5806.58 samples/s
Epoch 15 took 7324.97 seconds.
[Epoch 16 Batch 1000] loss 3.26, ppl 26.17, throughput 5658.04 samples/s
[Epoch 16 Batch 2000] loss 3.26, ppl 26.15, throughput 5662.54 samples/s
[Epoch 16 Batch 3000] loss 3.26, ppl 26.03, throughput 5742.89 samples/s
[Epoch 16 Batch 4000] loss 3.26, ppl 26.00, throughput 5575.34 samples/s
[Epoch 16 Batch 5000] loss 3.27, ppl 26.31, throughput 5378.21 samples/s
[Epoch 16 Batch 6000] loss 3.27, ppl 26.41, throughput 5365.16 samples/s
[Epoch 16 Batch 7000] loss 3.27, ppl 26.19, throughput 5258.84 samples/s
[Epoch 16 Batch 8000] loss 3.26, ppl 26.03, throughput 4483.55 samples/s
[Epoch 16 Batch 9000] loss 3.26, ppl 26.14, throughput 4894.85 samples/s
[Epoch 16 Batch 10000] loss 3.26, ppl 26.07, throughput 5355.06 samples/s
[Epoch 16 Batch 11000] loss 3.26, ppl 26.10, throughput 5615.17 samples/s
[Epoch 16 Batch 12000] loss 3.26, ppl 26.03, throughput 5596.74 samples/s
[Epoch 16 Batch 13000] loss 3.26, ppl 26.11, throughput 5820.97 samples/s
[Epoch 16 Batch 14000] loss 3.27, ppl 26.36, throughput 5752.64 samples/s
[Epoch 16 Batch 15000] loss 3.27, ppl 26.31, throughput 5531.12 samples/s
[Epoch 16 Batch 16000] loss 3.27, ppl 26.37, throughput 5772.70 samples/s
[Epoch 16 Batch 17000] loss 3.27, ppl 26.36, throughput 5704.68 samples/s
[Epoch 16 Batch 18000] loss 3.27, ppl 26.37, throughput 5726.72 samples/s
[Epoch 16 Batch 19000] loss 3.27, ppl 26.38, throughput 5592.67 samples/s
[Epoch 16 Batch 20000] loss 3.28, ppl 26.45, throughput 5734.59 samples/s
[Epoch 16 Batch 21000] loss 3.27, ppl 26.27, throughput 5696.33 samples/s
[Epoch 16 Batch 22000] loss 3.27, ppl 26.31, throughput 5748.33 samples/s
[Epoch 16 Batch 23000] loss 3.27, ppl 26.30, throughput 5507.83 samples/s
[Epoch 16 Batch 24000] loss 3.28, ppl 26.47, throughput 5751.41 samples/s
[Epoch 16 Batch 25000] loss 3.28, ppl 26.47, throughput 5801.35 samples/s
[Epoch 16 Batch 26000] loss 3.27, ppl 26.41, throughput 5779.87 samples/s
[Epoch 16 Batch 27000] loss 3.28, ppl 26.45, throughput 5593.93 samples/s
[Epoch 16 Batch 28000] loss 3.28, ppl 26.45, throughput 5722.10 samples/s
[Epoch 16 Batch 29000] loss 3.27, ppl 26.34, throughput 5720.29 samples/s
[Epoch 16 Batch 30000] loss 3.27, ppl 26.21, throughput 5602.20 samples/s
[Epoch 16 Batch 31000] loss 3.27, ppl 26.24, throughput 5675.18 samples/s
[Epoch 16 Batch 32000] loss 3.27, ppl 26.32, throughput 5756.73 samples/s
[Epoch 16 Batch 33000] loss 3.27, ppl 26.30, throughput 5274.11 samples/s
[Epoch 16 Batch 34000] loss 3.27, ppl 26.41, throughput 4474.22 samples/s
[Epoch 16 Batch 35000] loss 3.28, ppl 26.50, throughput 4565.47 samples/s
[Epoch 16 Batch 36000] loss 3.27, ppl 26.39, throughput 4626.14 samples/s
[Epoch 16 Batch 37000] loss 3.27, ppl 26.42, throughput 4649.19 samples/s
[Epoch 16 Batch 38000] loss 3.27, ppl 26.36, throughput 4577.15 samples/s
[Epoch 16 Batch 39000] loss 3.28, ppl 26.55, throughput 5473.84 samples/s
[Epoch 16 Batch 40000] loss 3.27, ppl 26.39, throughput 5710.52 samples/s
[Epoch 16 Batch 41000] loss 3.27, ppl 26.36, throughput 5565.55 samples/s
[Epoch 16 Batch 42000] loss 3.28, ppl 26.53, throughput 5767.01 samples/s
[Epoch 16 Batch 43000] loss 3.28, ppl 26.47, throughput 5723.94 samples/s
[Epoch 16 Batch 44000] loss 3.28, ppl 26.48, throughput 5715.90 samples/s
[Epoch 16 Batch 45000] loss 3.28, ppl 26.49, throughput 5546.07 samples/s
[Epoch 16 Batch 46000] loss 3.28, ppl 26.48, throughput 5765.69 samples/s
[Epoch 16 Batch 47000] loss 3.28, ppl 26.50, throughput 5754.89 samples/s
[Epoch 16 Batch 48000] loss 3.28, ppl 26.47, throughput 5735.17 samples/s
[Epoch 16 Batch 49000] loss 3.28, ppl 26.57, throughput 5524.65 samples/s
[Epoch 16 Batch 50000] loss 3.28, ppl 26.47, throughput 5794.50 samples/s
[Epoch 16 Batch 51000] loss 3.27, ppl 26.42, throughput 5785.98 samples/s
[Epoch 16 Batch 52000] loss 3.27, ppl 26.37, throughput 5709.25 samples/s
[Epoch 16 Batch 53000] loss 3.28, ppl 26.47, throughput 5535.75 samples/s
[Epoch 16 Batch 54000] loss 3.28, ppl 26.55, throughput 5746.49 samples/s
[Epoch 16 Batch 55000] loss 3.28, ppl 26.50, throughput 5696.93 samples/s
[Epoch 16 Batch 56000] loss 3.28, ppl 26.49, throughput 5571.52 samples/s
[Epoch 16 Batch 57000] loss 3.28, ppl 26.50, throughput 5735.76 samples/s
[Epoch 16 Batch 58000] loss 3.27, ppl 26.41, throughput 5689.16 samples/s
[Epoch 16 Batch 59000] loss 3.28, ppl 26.49, throughput 5772.06 samples/s
[Epoch 16 Batch 60000] loss 3.28, ppl 26.51, throughput 5596.93 samples/s
[Epoch 16 Batch 61000] loss 3.27, ppl 26.44, throughput 5794.69 samples/s
[Epoch 16 Batch 62000] loss 3.28, ppl 26.55, throughput 5735.14 samples/s
[Epoch 16 Batch 63000] loss 3.28, ppl 26.46, throughput 5686.40 samples/s
[Epoch 16 Batch 64000] loss 3.28, ppl 26.58, throughput 5540.15 samples/s
[Epoch 16 Batch 65000] loss 3.28, ppl 26.55, throughput 5744.22 samples/s
[Epoch 16 Batch 66000] loss 3.28, ppl 26.52, throughput 5749.06 samples/s
[Epoch 16 Batch 67000] loss 3.28, ppl 26.61, throughput 5527.63 samples/s
[Epoch 16 Batch 68000] loss 3.28, ppl 26.50, throughput 5757.62 samples/s
[Epoch 16 Batch 69000] loss 3.27, ppl 26.40, throughput 5699.99 samples/s
[Epoch 16 Batch 70000] loss 3.28, ppl 26.49, throughput 5739.28 samples/s
[Epoch 16 Batch 71000] loss 3.28, ppl 26.64, throughput 5590.44 samples/s
[Epoch 16 Batch 72000] loss 3.28, ppl 26.57, throughput 5759.36 samples/s
[Epoch 16 Batch 73000] loss 3.28, ppl 26.47, throughput 5757.68 samples/s
[Epoch 16 Batch 74000] loss 3.28, ppl 26.57, throughput 5740.28 samples/s
[Epoch 16 Batch 75000] loss 3.28, ppl 26.54, throughput 5566.69 samples/s
[Epoch 16 Batch 76000] loss 3.28, ppl 26.58, throughput 5767.73 samples/s
[Epoch 16 Batch 77000] loss 3.28, ppl 26.53, throughput 5730.29 samples/s
[Epoch 16 Batch 78000] loss 3.28, ppl 26.68, throughput 5728.83 samples/s
Epoch 16 took 7207.53 seconds.
[Epoch 17 Batch 1000] loss 3.26, ppl 26.05, throughput 5582.14 samples/s
[Epoch 17 Batch 2000] loss 3.26, ppl 25.96, throughput 5725.41 samples/s
[Epoch 17 Batch 3000] loss 3.25, ppl 25.81, throughput 5706.70 samples/s
[Epoch 17 Batch 4000] loss 3.25, ppl 25.88, throughput 5516.26 samples/s
[Epoch 17 Batch 5000] loss 3.26, ppl 26.05, throughput 5295.49 samples/s
[Epoch 17 Batch 6000] loss 3.26, ppl 26.08, throughput 5316.10 samples/s
[Epoch 17 Batch 7000] loss 3.25, ppl 25.68, throughput 4448.50 samples/s
[Epoch 17 Batch 8000] loss 3.25, ppl 25.77, throughput 4017.40 samples/s
[Epoch 17 Batch 9000] loss 3.26, ppl 26.02, throughput 4250.34 samples/s
[Epoch 17 Batch 10000] loss 3.26, ppl 25.97, throughput 4284.24 samples/s
[Epoch 17 Batch 11000] loss 3.27, ppl 26.19, throughput 4192.84 samples/s
[Epoch 17 Batch 12000] loss 3.26, ppl 26.07, throughput 4342.14 samples/s
[Epoch 17 Batch 13000] loss 3.26, ppl 26.04, throughput 4642.01 samples/s
[Epoch 17 Batch 14000] loss 3.27, ppl 26.33, throughput 5153.34 samples/s
[Epoch 17 Batch 15000] loss 3.26, ppl 26.17, throughput 5560.28 samples/s
[Epoch 17 Batch 16000] loss 3.27, ppl 26.27, throughput 5719.82 samples/s
[Epoch 17 Batch 17000] loss 3.27, ppl 26.18, throughput 5728.98 samples/s
[Epoch 17 Batch 18000] loss 3.27, ppl 26.22, throughput 5712.12 samples/s
[Epoch 17 Batch 19000] loss 3.27, ppl 26.19, throughput 5546.78 samples/s
[Epoch 17 Batch 20000] loss 3.26, ppl 26.01, throughput 5759.37 samples/s
[Epoch 17 Batch 21000] loss 3.27, ppl 26.19, throughput 5718.57 samples/s
[Epoch 17 Batch 22000] loss 3.27, ppl 26.27, throughput 5717.10 samples/s
[Epoch 17 Batch 23000] loss 3.26, ppl 26.18, throughput 5557.61 samples/s
[Epoch 17 Batch 24000] loss 3.27, ppl 26.19, throughput 5703.29 samples/s
[Epoch 17 Batch 25000] loss 3.26, ppl 26.08, throughput 5735.81 samples/s
[Epoch 17 Batch 26000] loss 3.27, ppl 26.19, throughput 5583.37 samples/s
[Epoch 17 Batch 27000] loss 3.27, ppl 26.34, throughput 5749.76 samples/s
[Epoch 17 Batch 28000] loss 3.27, ppl 26.19, throughput 5721.72 samples/s
[Epoch 17 Batch 29000] loss 3.27, ppl 26.26, throughput 5739.86 samples/s
[Epoch 17 Batch 30000] loss 3.27, ppl 26.21, throughput 5619.89 samples/s
[Epoch 17 Batch 31000] loss 3.26, ppl 26.16, throughput 5776.74 samples/s
[Epoch 17 Batch 32000] loss 3.27, ppl 26.21, throughput 5716.40 samples/s
[Epoch 17 Batch 33000] loss 3.27, ppl 26.29, throughput 5713.73 samples/s
[Epoch 17 Batch 34000] loss 3.27, ppl 26.18, throughput 5580.07 samples/s
[Epoch 17 Batch 35000] loss 3.27, ppl 26.20, throughput 5769.53 samples/s
[Epoch 17 Batch 36000] loss 3.27, ppl 26.30, throughput 5727.30 samples/s
[Epoch 17 Batch 37000] loss 3.27, ppl 26.24, throughput 5682.56 samples/s
[Epoch 17 Batch 38000] loss 3.27, ppl 26.27, throughput 5539.91 samples/s
[Epoch 17 Batch 39000] loss 3.27, ppl 26.30, throughput 5716.82 samples/s
[Epoch 17 Batch 40000] loss 3.27, ppl 26.38, throughput 5704.40 samples/s
[Epoch 17 Batch 41000] loss 3.27, ppl 26.37, throughput 5567.30 samples/s
[Epoch 17 Batch 42000] loss 3.27, ppl 26.25, throughput 5774.05 samples/s
[Epoch 17 Batch 43000] loss 3.27, ppl 26.29, throughput 5730.02 samples/s
[Epoch 17 Batch 44000] loss 3.27, ppl 26.28, throughput 5677.95 samples/s
[Epoch 17 Batch 45000] loss 3.27, ppl 26.34, throughput 5553.61 samples/s
[Epoch 17 Batch 46000] loss 3.27, ppl 26.30, throughput 5793.92 samples/s
[Epoch 17 Batch 47000] loss 3.27, ppl 26.33, throughput 5712.91 samples/s
[Epoch 17 Batch 48000] loss 3.27, ppl 26.40, throughput 5729.10 samples/s
[Epoch 17 Batch 49000] loss 3.27, ppl 26.39, throughput 5588.88 samples/s
[Epoch 17 Batch 50000] loss 3.27, ppl 26.28, throughput 5731.59 samples/s
[Epoch 17 Batch 51000] loss 3.27, ppl 26.41, throughput 5761.99 samples/s
[Epoch 17 Batch 52000] loss 3.27, ppl 26.39, throughput 5795.86 samples/s
[Epoch 17 Batch 53000] loss 3.27, ppl 26.36, throughput 5530.35 samples/s
[Epoch 17 Batch 54000] loss 3.27, ppl 26.32, throughput 5120.57 samples/s
[Epoch 17 Batch 55000] loss 3.27, ppl 26.35, throughput 4591.12 samples/s
[Epoch 17 Batch 56000] loss 3.27, ppl 26.43, throughput 4428.14 samples/s
[Epoch 17 Batch 57000] loss 3.27, ppl 26.35, throughput 4589.06 samples/s
[Epoch 17 Batch 58000] loss 3.27, ppl 26.23, throughput 4609.82 samples/s
[Epoch 17 Batch 59000] loss 3.27, ppl 26.31, throughput 4628.06 samples/s
[Epoch 17 Batch 60000] loss 3.27, ppl 26.37, throughput 5362.45 samples/s
[Epoch 17 Batch 61000] loss 3.27, ppl 26.37, throughput 5698.49 samples/s
[Epoch 17 Batch 62000] loss 3.27, ppl 26.38, throughput 5779.05 samples/s
[Epoch 17 Batch 63000] loss 3.27, ppl 26.40, throughput 5744.88 samples/s
[Epoch 17 Batch 64000] loss 3.27, ppl 26.40, throughput 5564.54 samples/s
[Epoch 17 Batch 65000] loss 3.27, ppl 26.44, throughput 5745.68 samples/s
[Epoch 17 Batch 66000] loss 3.27, ppl 26.35, throughput 5707.26 samples/s
[Epoch 17 Batch 67000] loss 3.27, ppl 26.36, throughput 5578.72 samples/s
[Epoch 17 Batch 68000] loss 3.27, ppl 26.39, throughput 5713.07 samples/s
[Epoch 17 Batch 69000] loss 3.27, ppl 26.41, throughput 5749.39 samples/s
[Epoch 17 Batch 70000] loss 3.27, ppl 26.43, throughput 5767.92 samples/s
[Epoch 17 Batch 71000] loss 3.28, ppl 26.51, throughput 5554.00 samples/s
[Epoch 17 Batch 72000] loss 3.28, ppl 26.48, throughput 5713.34 samples/s
[Epoch 17 Batch 73000] loss 3.28, ppl 26.47, throughput 5784.05 samples/s
[Epoch 17 Batch 74000] loss 3.27, ppl 26.36, throughput 5705.79 samples/s
[Epoch 17 Batch 75000] loss 3.27, ppl 26.36, throughput 5548.71 samples/s
[Epoch 17 Batch 76000] loss 3.28, ppl 26.45, throughput 5753.64 samples/s
[Epoch 17 Batch 77000] loss 3.27, ppl 26.44, throughput 5734.26 samples/s
[Epoch 17 Batch 78000] loss 3.27, ppl 26.40, throughput 5714.05 samples/s
Epoch 17 took 7378.03 seconds.
[Epoch 18 Batch 1000] loss 3.25, ppl 25.77, throughput 5568.87 samples/s
[Epoch 18 Batch 2000] loss 3.25, ppl 25.74, throughput 5727.70 samples/s
[Epoch 18 Batch 3000] loss 3.25, ppl 25.76, throughput 5696.23 samples/s
[Epoch 18 Batch 4000] loss 3.25, ppl 25.68, throughput 5588.62 samples/s
[Epoch 18 Batch 5000] loss 3.25, ppl 25.83, throughput 5528.60 samples/s
[Epoch 18 Batch 6000] loss 3.26, ppl 26.12, throughput 5336.03 samples/s
[Epoch 18 Batch 7000] loss 3.26, ppl 25.95, throughput 5298.26 samples/s
[Epoch 18 Batch 8000] loss 3.26, ppl 26.13, throughput 5156.87 samples/s
[Epoch 18 Batch 9000] loss 3.26, ppl 25.95, throughput 5367.48 samples/s
[Epoch 18 Batch 10000] loss 3.25, ppl 25.90, throughput 5109.98 samples/s
[Epoch 18 Batch 11000] loss 3.26, ppl 25.93, throughput 4984.59 samples/s
[Epoch 18 Batch 12000] loss 3.26, ppl 25.94, throughput 5606.01 samples/s
[Epoch 18 Batch 13000] loss 3.25, ppl 25.91, throughput 5720.05 samples/s
[Epoch 18 Batch 14000] loss 3.26, ppl 25.99, throughput 5795.04 samples/s
[Epoch 18 Batch 15000] loss 3.26, ppl 26.04, throughput 5533.01 samples/s
[Epoch 18 Batch 16000] loss 3.25, ppl 25.89, throughput 5753.54 samples/s
[Epoch 18 Batch 17000] loss 3.25, ppl 25.90, throughput 5736.17 samples/s
[Epoch 18 Batch 18000] loss 3.26, ppl 25.94, throughput 5785.47 samples/s
[Epoch 18 Batch 19000] loss 3.26, ppl 26.01, throughput 5539.38 samples/s
[Epoch 18 Batch 20000] loss 3.26, ppl 26.06, throughput 5740.85 samples/s
[Epoch 18 Batch 21000] loss 3.26, ppl 26.09, throughput 5727.34 samples/s
[Epoch 18 Batch 22000] loss 3.26, ppl 25.98, throughput 5767.68 samples/s
[Epoch 18 Batch 23000] loss 3.26, ppl 26.07, throughput 5563.23 samples/s
[Epoch 18 Batch 24000] loss 3.26, ppl 26.18, throughput 5717.32 samples/s
[Epoch 18 Batch 25000] loss 3.26, ppl 26.10, throughput 5771.98 samples/s
[Epoch 18 Batch 26000] loss 3.26, ppl 26.09, throughput 5725.98 samples/s
[Epoch 18 Batch 27000] loss 3.26, ppl 26.03, throughput 5316.66 samples/s
[Epoch 18 Batch 28000] loss 3.26, ppl 25.99, throughput 4673.97 samples/s
[Epoch 18 Batch 29000] loss 3.26, ppl 26.04, throughput 4604.08 samples/s
[Epoch 18 Batch 30000] loss 3.26, ppl 26.08, throughput 4490.03 samples/s
[Epoch 18 Batch 31000] loss 3.26, ppl 26.09, throughput 4662.17 samples/s
[Epoch 18 Batch 32000] loss 3.25, ppl 25.89, throughput 4646.90 samples/s
[Epoch 18 Batch 33000] loss 3.26, ppl 26.17, throughput 5137.64 samples/s
[Epoch 18 Batch 34000] loss 3.26, ppl 26.13, throughput 5564.16 samples/s
[Epoch 18 Batch 35000] loss 3.26, ppl 26.16, throughput 5711.13 samples/s
[Epoch 18 Batch 36000] loss 3.26, ppl 26.13, throughput 5730.03 samples/s
[Epoch 18 Batch 37000] loss 3.26, ppl 26.05, throughput 5743.71 samples/s
[Epoch 18 Batch 38000] loss 3.27, ppl 26.21, throughput 5566.96 samples/s
[Epoch 18 Batch 39000] loss 3.27, ppl 26.20, throughput 5731.32 samples/s
[Epoch 18 Batch 40000] loss 3.27, ppl 26.22, throughput 5755.64 samples/s
[Epoch 18 Batch 41000] loss 3.27, ppl 26.23, throughput 5618.83 samples/s
[Epoch 18 Batch 42000] loss 3.26, ppl 26.14, throughput 5783.87 samples/s
[Epoch 18 Batch 43000] loss 3.26, ppl 26.16, throughput 5738.53 samples/s
[Epoch 18 Batch 44000] loss 3.26, ppl 26.13, throughput 5734.55 samples/s
[Epoch 18 Batch 45000] loss 3.27, ppl 26.27, throughput 5579.56 samples/s
[Epoch 18 Batch 46000] loss 3.27, ppl 26.30, throughput 5755.54 samples/s
[Epoch 18 Batch 47000] loss 3.27, ppl 26.19, throughput 5764.52 samples/s
[Epoch 18 Batch 48000] loss 3.27, ppl 26.26, throughput 5808.86 samples/s
[Epoch 18 Batch 49000] loss 3.27, ppl 26.18, throughput 5592.56 samples/s
[Epoch 18 Batch 50000] loss 3.26, ppl 26.13, throughput 5711.80 samples/s
[Epoch 18 Batch 51000] loss 3.26, ppl 26.17, throughput 5727.93 samples/s
[Epoch 18 Batch 52000] loss 3.27, ppl 26.18, throughput 5755.50 samples/s
[Epoch 18 Batch 53000] loss 3.26, ppl 26.13, throughput 5621.31 samples/s
[Epoch 18 Batch 54000] loss 3.27, ppl 26.28, throughput 5741.49 samples/s
[Epoch 18 Batch 55000] loss 3.27, ppl 26.22, throughput 5685.45 samples/s
[Epoch 18 Batch 56000] loss 3.27, ppl 26.21, throughput 5574.21 samples/s
[Epoch 18 Batch 57000] loss 3.26, ppl 26.16, throughput 5739.67 samples/s
[Epoch 18 Batch 58000] loss 3.27, ppl 26.19, throughput 5782.66 samples/s
[Epoch 18 Batch 59000] loss 3.27, ppl 26.27, throughput 5730.72 samples/s
[Epoch 18 Batch 60000] loss 3.27, ppl 26.29, throughput 5595.87 samples/s
[Epoch 18 Batch 61000] loss 3.27, ppl 26.34, throughput 5720.86 samples/s
[Epoch 18 Batch 62000] loss 3.27, ppl 26.28, throughput 5699.62 samples/s
[Epoch 18 Batch 63000] loss 3.27, ppl 26.18, throughput 5731.98 samples/s
[Epoch 18 Batch 64000] loss 3.27, ppl 26.24, throughput 5630.06 samples/s
[Epoch 18 Batch 65000] loss 3.26, ppl 26.15, throughput 5760.30 samples/s
[Epoch 18 Batch 66000] loss 3.27, ppl 26.24, throughput 5743.27 samples/s
[Epoch 18 Batch 67000] loss 3.26, ppl 26.15, throughput 5535.04 samples/s
[Epoch 18 Batch 68000] loss 3.27, ppl 26.38, throughput 5695.67 samples/s
[Epoch 18 Batch 69000] loss 3.27, ppl 26.23, throughput 5752.87 samples/s
[Epoch 18 Batch 70000] loss 3.27, ppl 26.30, throughput 5731.37 samples/s
[Epoch 18 Batch 71000] loss 3.27, ppl 26.28, throughput 5566.18 samples/s
[Epoch 18 Batch 72000] loss 3.27, ppl 26.30, throughput 5715.39 samples/s
[Epoch 18 Batch 73000] loss 3.27, ppl 26.27, throughput 5740.35 samples/s
[Epoch 18 Batch 74000] loss 3.27, ppl 26.30, throughput 5696.93 samples/s
[Epoch 18 Batch 75000] loss 3.27, ppl 26.30, throughput 5551.15 samples/s
[Epoch 18 Batch 76000] loss 3.27, ppl 26.28, throughput 5755.70 samples/s
[Epoch 18 Batch 77000] loss 3.27, ppl 26.28, throughput 5761.64 samples/s
[Epoch 18 Batch 78000] loss 3.27, ppl 26.26, throughput 5797.14 samples/s
Epoch 18 took 7191.16 seconds.
[Epoch 19 Batch 1000] loss 3.25, ppl 25.73, throughput 5309.15 samples/s
[Epoch 19 Batch 2000] loss 3.24, ppl 25.47, throughput 4604.72 samples/s
[Epoch 19 Batch 3000] loss 3.23, ppl 25.32, throughput 4575.14 samples/s
[Epoch 19 Batch 4000] loss 3.25, ppl 25.71, throughput 4353.71 samples/s
[Epoch 19 Batch 5000] loss 3.25, ppl 25.89, throughput 4174.93 samples/s
[Epoch 19 Batch 6000] loss 3.25, ppl 25.87, throughput 4173.03 samples/s
[Epoch 19 Batch 7000] loss 3.25, ppl 25.73, throughput 4219.29 samples/s
[Epoch 19 Batch 8000] loss 3.24, ppl 25.65, throughput 4429.28 samples/s
[Epoch 19 Batch 9000] loss 3.24, ppl 25.53, throughput 5296.26 samples/s
[Epoch 19 Batch 10000] loss 3.26, ppl 26.02, throughput 5324.82 samples/s
[Epoch 19 Batch 11000] loss 3.25, ppl 25.88, throughput 5345.82 samples/s
[Epoch 19 Batch 12000] loss 3.24, ppl 25.53, throughput 5600.73 samples/s
[Epoch 19 Batch 13000] loss 3.26, ppl 25.93, throughput 5725.03 samples/s
[Epoch 19 Batch 14000] loss 3.26, ppl 25.97, throughput 5805.37 samples/s
[Epoch 19 Batch 15000] loss 3.25, ppl 25.83, throughput 5560.06 samples/s
[Epoch 19 Batch 16000] loss 3.26, ppl 25.92, throughput 5741.99 samples/s
[Epoch 19 Batch 17000] loss 3.25, ppl 25.81, throughput 5741.20 samples/s
[Epoch 19 Batch 18000] loss 3.25, ppl 25.89, throughput 5749.50 samples/s
[Epoch 19 Batch 19000] loss 3.26, ppl 25.94, throughput 5589.51 samples/s
[Epoch 19 Batch 20000] loss 3.26, ppl 26.00, throughput 5736.96 samples/s
[Epoch 19 Batch 21000] loss 3.26, ppl 26.02, throughput 5734.37 samples/s
[Epoch 19 Batch 22000] loss 3.26, ppl 26.01, throughput 5747.88 samples/s
[Epoch 19 Batch 23000] loss 3.25, ppl 25.89, throughput 5590.10 samples/s
[Epoch 19 Batch 24000] loss 3.26, ppl 25.97, throughput 5766.53 samples/s
[Epoch 19 Batch 25000] loss 3.26, ppl 26.02, throughput 5745.07 samples/s
[Epoch 19 Batch 26000] loss 3.26, ppl 25.94, throughput 5561.86 samples/s
[Epoch 19 Batch 27000] loss 3.26, ppl 26.03, throughput 5673.53 samples/s
[Epoch 19 Batch 28000] loss 3.26, ppl 26.02, throughput 5699.60 samples/s
[Epoch 19 Batch 29000] loss 3.25, ppl 25.88, throughput 5761.19 samples/s
[Epoch 19 Batch 30000] loss 3.26, ppl 26.07, throughput 5577.69 samples/s
[Epoch 19 Batch 31000] loss 3.25, ppl 25.91, throughput 5699.89 samples/s
[Epoch 19 Batch 32000] loss 3.26, ppl 26.08, throughput 5713.74 samples/s
[Epoch 19 Batch 33000] loss 3.26, ppl 26.08, throughput 5746.54 samples/s
[Epoch 19 Batch 34000] loss 3.26, ppl 26.05, throughput 5556.20 samples/s
[Epoch 19 Batch 35000] loss 3.26, ppl 26.03, throughput 5766.03 samples/s
[Epoch 19 Batch 36000] loss 3.26, ppl 26.05, throughput 5710.73 samples/s
[Epoch 19 Batch 37000] loss 3.26, ppl 25.98, throughput 5797.20 samples/s
[Epoch 19 Batch 38000] loss 3.26, ppl 26.07, throughput 5600.57 samples/s
[Epoch 19 Batch 39000] loss 3.26, ppl 26.04, throughput 5717.80 samples/s
[Epoch 19 Batch 40000] loss 3.26, ppl 25.98, throughput 5736.25 samples/s
[Epoch 19 Batch 41000] loss 3.26, ppl 25.96, throughput 5572.23 samples/s
[Epoch 19 Batch 42000] loss 3.26, ppl 26.08, throughput 5743.00 samples/s
[Epoch 19 Batch 43000] loss 3.25, ppl 25.82, throughput 5763.26 samples/s
[Epoch 19 Batch 44000] loss 3.26, ppl 26.08, throughput 5710.65 samples/s
[Epoch 19 Batch 45000] loss 3.26, ppl 26.05, throughput 5539.99 samples/s
[Epoch 19 Batch 46000] loss 3.26, ppl 26.02, throughput 5767.95 samples/s
[Epoch 19 Batch 47000] loss 3.26, ppl 26.07, throughput 5710.01 samples/s
[Epoch 19 Batch 48000] loss 3.26, ppl 26.17, throughput 5813.07 samples/s
[Epoch 19 Batch 49000] loss 3.26, ppl 26.07, throughput 5584.12 samples/s
[Epoch 19 Batch 50000] loss 3.26, ppl 26.15, throughput 5706.31 samples/s
[Epoch 19 Batch 51000] loss 3.26, ppl 26.03, throughput 5741.99 samples/s
[Epoch 19 Batch 52000] loss 3.26, ppl 26.07, throughput 5731.89 samples/s
[Epoch 19 Batch 53000] loss 3.26, ppl 26.05, throughput 5534.89 samples/s
[Epoch 19 Batch 54000] loss 3.26, ppl 26.04, throughput 5693.09 samples/s
[Epoch 19 Batch 55000] loss 3.26, ppl 26.10, throughput 4556.74 samples/s
[Epoch 19 Batch 56000] loss 3.26, ppl 26.08, throughput 4431.53 samples/s
[Epoch 19 Batch 57000] loss 3.26, ppl 26.06, throughput 4621.32 samples/s
[Epoch 19 Batch 58000] loss 3.27, ppl 26.19, throughput 4604.89 samples/s
[Epoch 19 Batch 59000] loss 3.26, ppl 26.05, throughput 4654.81 samples/s
[Epoch 19 Batch 60000] loss 3.26, ppl 26.13, throughput 4865.72 samples/s
[Epoch 19 Batch 61000] loss 3.27, ppl 26.21, throughput 5708.77 samples/s
[Epoch 19 Batch 62000] loss 3.26, ppl 26.12, throughput 5754.20 samples/s
[Epoch 19 Batch 63000] loss 3.26, ppl 26.16, throughput 5722.94 samples/s
[Epoch 19 Batch 64000] loss 3.26, ppl 26.07, throughput 5595.17 samples/s
[Epoch 19 Batch 65000] loss 3.26, ppl 26.06, throughput 5734.86 samples/s
[Epoch 19 Batch 66000] loss 3.26, ppl 26.08, throughput 5720.50 samples/s
[Epoch 19 Batch 67000] loss 3.26, ppl 26.17, throughput 5576.70 samples/s
[Epoch 19 Batch 68000] loss 3.26, ppl 26.17, throughput 5707.11 samples/s
[Epoch 19 Batch 69000] loss 3.26, ppl 26.06, throughput 5754.22 samples/s
[Epoch 19 Batch 70000] loss 3.26, ppl 26.15, throughput 5753.79 samples/s
[Epoch 19 Batch 71000] loss 3.26, ppl 26.17, throughput 5556.97 samples/s
[Epoch 19 Batch 72000] loss 3.26, ppl 26.05, throughput 5794.66 samples/s
[Epoch 19 Batch 73000] loss 3.26, ppl 26.16, throughput 5777.70 samples/s
[Epoch 19 Batch 74000] loss 3.26, ppl 26.14, throughput 5710.34 samples/s
[Epoch 19 Batch 75000] loss 3.26, ppl 26.12, throughput 5540.54 samples/s
[Epoch 19 Batch 76000] loss 3.26, ppl 26.15, throughput 5709.12 samples/s
[Epoch 19 Batch 77000] loss 3.26, ppl 26.13, throughput 5785.02 samples/s
[Epoch 19 Batch 78000] loss 3.26, ppl 26.08, throughput 5672.57 samples/s
Epoch 19 took 7362.63 seconds.
[Epoch 20 Batch 1000] loss 3.24, ppl 25.63, throughput 5553.38 samples/s
[Epoch 20 Batch 2000] loss 3.24, ppl 25.48, throughput 5705.88 samples/s
[Epoch 20 Batch 3000] loss 3.23, ppl 25.35, throughput 5714.75 samples/s
[Epoch 20 Batch 4000] loss 3.24, ppl 25.56, throughput 5590.21 samples/s
[Epoch 20 Batch 5000] loss 3.24, ppl 25.41, throughput 5512.19 samples/s
[Epoch 20 Batch 6000] loss 3.24, ppl 25.41, throughput 4712.11 samples/s
[Epoch 20 Batch 7000] loss 3.24, ppl 25.53, throughput 5252.58 samples/s
[Epoch 20 Batch 8000] loss 3.24, ppl 25.55, throughput 5131.10 samples/s
[Epoch 20 Batch 9000] loss 3.24, ppl 25.58, throughput 5317.42 samples/s
[Epoch 20 Batch 10000] loss 3.25, ppl 25.87, throughput 5270.53 samples/s
[Epoch 20 Batch 11000] loss 3.26, ppl 25.97, throughput 5437.06 samples/s
[Epoch 20 Batch 12000] loss 3.24, ppl 25.65, throughput 5599.21 samples/s
[Epoch 20 Batch 13000] loss 3.25, ppl 25.88, throughput 5745.44 samples/s
[Epoch 20 Batch 14000] loss 3.24, ppl 25.61, throughput 5744.34 samples/s
[Epoch 20 Batch 15000] loss 3.25, ppl 25.74, throughput 5605.33 samples/s
[Epoch 20 Batch 16000] loss 3.25, ppl 25.72, throughput 5747.16 samples/s
[Epoch 20 Batch 17000] loss 3.25, ppl 25.77, throughput 5754.59 samples/s
[Epoch 20 Batch 18000] loss 3.26, ppl 25.99, throughput 5752.70 samples/s
[Epoch 20 Batch 19000] loss 3.25, ppl 25.86, throughput 5597.84 samples/s
[Epoch 20 Batch 20000] loss 3.25, ppl 25.90, throughput 5768.72 samples/s
[Epoch 20 Batch 21000] loss 3.25, ppl 25.79, throughput 5715.81 samples/s
[Epoch 20 Batch 22000] loss 3.25, ppl 25.89, throughput 5745.35 samples/s
[Epoch 20 Batch 23000] loss 3.26, ppl 25.94, throughput 5573.49 samples/s
[Epoch 20 Batch 24000] loss 3.25, ppl 25.91, throughput 5680.36 samples/s
[Epoch 20 Batch 25000] loss 3.25, ppl 25.84, throughput 5758.44 samples/s
[Epoch 20 Batch 26000] loss 3.25, ppl 25.90, throughput 5561.40 samples/s
[Epoch 20 Batch 27000] loss 3.25, ppl 25.89, throughput 5733.33 samples/s
[Epoch 20 Batch 28000] loss 3.25, ppl 25.78, throughput 4808.88 samples/s
[Epoch 20 Batch 29000] loss 3.25, ppl 25.68, throughput 4620.93 samples/s
[Epoch 20 Batch 30000] loss 3.26, ppl 25.93, throughput 4489.48 samples/s
[Epoch 20 Batch 31000] loss 3.25, ppl 25.86, throughput 4643.00 samples/s
[Epoch 20 Batch 32000] loss 3.26, ppl 25.99, throughput 4580.55 samples/s
[Epoch 20 Batch 33000] loss 3.25, ppl 25.83, throughput 4788.99 samples/s
[Epoch 20 Batch 34000] loss 3.25, ppl 25.81, throughput 5573.43 samples/s
[Epoch 20 Batch 35000] loss 3.26, ppl 25.98, throughput 5726.76 samples/s
[Epoch 20 Batch 36000] loss 3.25, ppl 25.90, throughput 5699.92 samples/s
[Epoch 20 Batch 37000] loss 3.26, ppl 25.93, throughput 5767.82 samples/s
[Epoch 20 Batch 38000] loss 3.25, ppl 25.87, throughput 5605.18 samples/s
[Epoch 20 Batch 39000] loss 3.25, ppl 25.77, throughput 5758.06 samples/s
[Epoch 20 Batch 40000] loss 3.26, ppl 25.98, throughput 5763.28 samples/s
[Epoch 20 Batch 41000] loss 3.25, ppl 25.90, throughput 5573.59 samples/s
[Epoch 20 Batch 42000] loss 3.26, ppl 25.95, throughput 5726.37 samples/s
[Epoch 20 Batch 43000] loss 3.26, ppl 25.93, throughput 5737.63 samples/s
[Epoch 20 Batch 44000] loss 3.25, ppl 25.89, throughput 5744.60 samples/s
[Epoch 20 Batch 45000] loss 3.25, ppl 25.87, throughput 5531.07 samples/s
[Epoch 20 Batch 46000] loss 3.26, ppl 25.95, throughput 5765.06 samples/s
[Epoch 20 Batch 47000] loss 3.26, ppl 26.03, throughput 5724.35 samples/s
[Epoch 20 Batch 48000] loss 3.26, ppl 26.04, throughput 5701.92 samples/s
[Epoch 20 Batch 49000] loss 3.26, ppl 25.96, throughput 5541.54 samples/s
[Epoch 20 Batch 50000] loss 3.26, ppl 25.95, throughput 5763.40 samples/s
[Epoch 20 Batch 51000] loss 3.26, ppl 25.92, throughput 5755.30 samples/s
[Epoch 20 Batch 52000] loss 3.26, ppl 26.00, throughput 5746.01 samples/s
[Epoch 20 Batch 53000] loss 3.26, ppl 26.00, throughput 5551.42 samples/s
[Epoch 20 Batch 54000] loss 3.26, ppl 26.04, throughput 5736.49 samples/s
[Epoch 20 Batch 55000] loss 3.26, ppl 25.97, throughput 5772.83 samples/s
[Epoch 20 Batch 56000] loss 3.26, ppl 25.97, throughput 5572.96 samples/s
[Epoch 20 Batch 57000] loss 3.26, ppl 26.03, throughput 5770.45 samples/s
[Epoch 20 Batch 58000] loss 3.26, ppl 26.01, throughput 5736.48 samples/s
[Epoch 20 Batch 59000] loss 3.26, ppl 25.97, throughput 5747.48 samples/s
[Epoch 20 Batch 60000] loss 3.25, ppl 25.91, throughput 5590.60 samples/s
[Epoch 20 Batch 61000] loss 3.26, ppl 25.96, throughput 5708.57 samples/s
[Epoch 20 Batch 62000] loss 3.26, ppl 25.96, throughput 5688.10 samples/s
[Epoch 20 Batch 63000] loss 3.26, ppl 25.94, throughput 5750.16 samples/s
[Epoch 20 Batch 64000] loss 3.26, ppl 26.03, throughput 5567.31 samples/s
[Epoch 20 Batch 65000] loss 3.26, ppl 26.02, throughput 5771.56 samples/s
[Epoch 20 Batch 66000] loss 3.26, ppl 25.97, throughput 5763.42 samples/s
[Epoch 20 Batch 67000] loss 3.25, ppl 25.86, throughput 5580.43 samples/s
[Epoch 20 Batch 68000] loss 3.26, ppl 26.04, throughput 5700.84 samples/s
[Epoch 20 Batch 69000] loss 3.26, ppl 25.94, throughput 5739.64 samples/s
[Epoch 20 Batch 70000] loss 3.26, ppl 26.02, throughput 5725.68 samples/s
[Epoch 20 Batch 71000] loss 3.26, ppl 26.07, throughput 5563.58 samples/s
[Epoch 20 Batch 72000] loss 3.26, ppl 26.06, throughput 5708.29 samples/s
[Epoch 20 Batch 73000] loss 3.26, ppl 26.06, throughput 5722.42 samples/s
[Epoch 20 Batch 74000] loss 3.26, ppl 26.05, throughput 5763.47 samples/s
[Epoch 20 Batch 75000] loss 3.26, ppl 26.06, throughput 5641.20 samples/s
[Epoch 20 Batch 76000] loss 3.26, ppl 26.05, throughput 5722.62 samples/s
[Epoch 20 Batch 77000] loss 3.26, ppl 26.02, throughput 5769.57 samples/s
[Epoch 20 Batch 78000] loss 3.26, ppl 25.95, throughput 5752.83 samples/s
Epoch 20 took 7197.45 seconds.
[Epoch 21 Batch 1000] loss 3.22, ppl 25.15, throughput 5566.07 samples/s
[Epoch 21 Batch 2000] loss 3.25, ppl 25.74, throughput 4800.51 samples/s
[Epoch 21 Batch 3000] loss 3.24, ppl 25.42, throughput 4620.85 samples/s
[Epoch 21 Batch 4000] loss 3.24, ppl 25.63, throughput 4374.15 samples/s
[Epoch 21 Batch 5000] loss 3.25, ppl 25.69, throughput 4216.29 samples/s
[Epoch 21 Batch 6000] loss 3.25, ppl 25.70, throughput 4196.11 samples/s
[Epoch 21 Batch 7000] loss 3.24, ppl 25.61, throughput 4158.68 samples/s
[Epoch 21 Batch 8000] loss 3.25, ppl 25.66, throughput 3980.17 samples/s
[Epoch 21 Batch 9000] loss 3.24, ppl 25.62, throughput 5297.06 samples/s
[Epoch 21 Batch 10000] loss 3.24, ppl 25.63, throughput 5295.72 samples/s
[Epoch 21 Batch 11000] loss 3.25, ppl 25.77, throughput 5231.98 samples/s
[Epoch 21 Batch 12000] loss 3.25, ppl 25.70, throughput 5502.94 samples/s
[Epoch 21 Batch 13000] loss 3.24, ppl 25.50, throughput 5747.92 samples/s
[Epoch 21 Batch 14000] loss 3.23, ppl 25.37, throughput 5697.21 samples/s
[Epoch 21 Batch 15000] loss 3.24, ppl 25.50, throughput 5597.66 samples/s
[Epoch 21 Batch 16000] loss 3.25, ppl 25.72, throughput 5740.61 samples/s
[Epoch 21 Batch 17000] loss 3.24, ppl 25.58, throughput 5745.98 samples/s
[Epoch 21 Batch 18000] loss 3.24, ppl 25.60, throughput 5766.09 samples/s
[Epoch 21 Batch 19000] loss 3.24, ppl 25.63, throughput 5525.72 samples/s
[Epoch 21 Batch 20000] loss 3.25, ppl 25.76, throughput 5721.95 samples/s
[Epoch 21 Batch 21000] loss 3.25, ppl 25.79, throughput 5709.22 samples/s
[Epoch 21 Batch 22000] loss 3.25, ppl 25.69, throughput 5732.63 samples/s
[Epoch 21 Batch 23000] loss 3.25, ppl 25.74, throughput 5578.43 samples/s
[Epoch 21 Batch 24000] loss 3.25, ppl 25.68, throughput 5732.72 samples/s
[Epoch 21 Batch 25000] loss 3.24, ppl 25.65, throughput 5761.63 samples/s
[Epoch 21 Batch 26000] loss 3.24, ppl 25.66, throughput 5567.43 samples/s
[Epoch 21 Batch 27000] loss 3.25, ppl 25.74, throughput 5733.59 samples/s
[Epoch 21 Batch 28000] loss 3.24, ppl 25.58, throughput 5747.11 samples/s
[Epoch 21 Batch 29000] loss 3.25, ppl 25.77, throughput 5736.34 samples/s
[Epoch 21 Batch 30000] loss 3.25, ppl 25.74, throughput 5554.15 samples/s
[Epoch 21 Batch 31000] loss 3.25, ppl 25.85, throughput 5764.55 samples/s
[Epoch 21 Batch 32000] loss 3.25, ppl 25.89, throughput 5759.52 samples/s
[Epoch 21 Batch 33000] loss 3.25, ppl 25.79, throughput 5722.10 samples/s
[Epoch 21 Batch 34000] loss 3.26, ppl 25.92, throughput 5542.43 samples/s
[Epoch 21 Batch 35000] loss 3.25, ppl 25.69, throughput 5769.85 samples/s
[Epoch 21 Batch 36000] loss 3.25, ppl 25.73, throughput 5749.96 samples/s
[Epoch 21 Batch 37000] loss 3.25, ppl 25.76, throughput 5766.01 samples/s
[Epoch 21 Batch 38000] loss 3.26, ppl 25.93, throughput 5571.40 samples/s
[Epoch 21 Batch 39000] loss 3.25, ppl 25.70, throughput 5757.24 samples/s
[Epoch 21 Batch 40000] loss 3.25, ppl 25.81, throughput 5706.36 samples/s
[Epoch 21 Batch 41000] loss 3.25, ppl 25.77, throughput 5586.82 samples/s
[Epoch 21 Batch 42000] loss 3.25, ppl 25.80, throughput 5731.14 samples/s
[Epoch 21 Batch 43000] loss 3.25, ppl 25.79, throughput 5786.09 samples/s
[Epoch 21 Batch 44000] loss 3.25, ppl 25.86, throughput 5739.91 samples/s
[Epoch 21 Batch 45000] loss 3.25, ppl 25.85, throughput 5564.60 samples/s
[Epoch 21 Batch 46000] loss 3.25, ppl 25.87, throughput 5717.95 samples/s
[Epoch 21 Batch 47000] loss 3.25, ppl 25.77, throughput 5749.30 samples/s
[Epoch 21 Batch 48000] loss 3.25, ppl 25.75, throughput 5490.75 samples/s
[Epoch 21 Batch 49000] loss 3.25, ppl 25.78, throughput 4469.71 samples/s
[Epoch 21 Batch 50000] loss 3.25, ppl 25.89, throughput 4639.30 samples/s
[Epoch 21 Batch 51000] loss 3.25, ppl 25.88, throughput 4617.93 samples/s
[Epoch 21 Batch 52000] loss 3.25, ppl 25.86, throughput 4467.01 samples/s
[Epoch 21 Batch 53000] loss 3.26, ppl 25.95, throughput 4616.48 samples/s
[Epoch 21 Batch 54000] loss 3.25, ppl 25.89, throughput 5207.72 samples/s
[Epoch 21 Batch 55000] loss 3.25, ppl 25.78, throughput 5772.39 samples/s
[Epoch 21 Batch 56000] loss 3.25, ppl 25.84, throughput 5574.42 samples/s
[Epoch 21 Batch 57000] loss 3.25, ppl 25.77, throughput 5713.29 samples/s
[Epoch 21 Batch 58000] loss 3.25, ppl 25.82, throughput 5693.90 samples/s
[Epoch 21 Batch 59000] loss 3.25, ppl 25.85, throughput 5704.96 samples/s
[Epoch 21 Batch 60000] loss 3.25, ppl 25.86, throughput 5599.48 samples/s
[Epoch 21 Batch 61000] loss 3.25, ppl 25.83, throughput 5786.97 samples/s
[Epoch 21 Batch 62000] loss 3.25, ppl 25.80, throughput 5733.66 samples/s
[Epoch 21 Batch 63000] loss 3.25, ppl 25.80, throughput 5740.38 samples/s
[Epoch 21 Batch 64000] loss 3.25, ppl 25.89, throughput 5601.73 samples/s
[Epoch 21 Batch 65000] loss 3.25, ppl 25.84, throughput 5723.65 samples/s
[Epoch 21 Batch 66000] loss 3.25, ppl 25.87, throughput 5688.80 samples/s
[Epoch 21 Batch 67000] loss 3.26, ppl 25.92, throughput 5575.32 samples/s
[Epoch 21 Batch 68000] loss 3.26, ppl 25.96, throughput 5778.49 samples/s
[Epoch 21 Batch 69000] loss 3.25, ppl 25.86, throughput 5722.12 samples/s
[Epoch 21 Batch 70000] loss 3.25, ppl 25.90, throughput 5734.58 samples/s
[Epoch 21 Batch 71000] loss 3.26, ppl 25.97, throughput 5566.91 samples/s
[Epoch 21 Batch 72000] loss 3.25, ppl 25.88, throughput 5769.27 samples/s
[Epoch 21 Batch 73000] loss 3.26, ppl 25.95, throughput 5838.05 samples/s
[Epoch 21 Batch 74000] loss 3.26, ppl 25.93, throughput 5690.60 samples/s
[Epoch 21 Batch 75000] loss 3.26, ppl 26.02, throughput 5576.01 samples/s
[Epoch 21 Batch 76000] loss 3.26, ppl 25.95, throughput 5736.50 samples/s
[Epoch 21 Batch 77000] loss 3.25, ppl 25.91, throughput 5772.40 samples/s
[Epoch 21 Batch 78000] loss 3.25, ppl 25.90, throughput 5708.97 samples/s
Epoch 21 took 7364.35 seconds.
[Epoch 22 Batch 1000] loss 3.23, ppl 25.17, throughput 5586.10 samples/s
[Epoch 22 Batch 2000] loss 3.24, ppl 25.53, throughput 5724.17 samples/s
[Epoch 22 Batch 3000] loss 3.23, ppl 25.30, throughput 5762.21 samples/s
[Epoch 22 Batch 4000] loss 3.23, ppl 25.24, throughput 5606.43 samples/s
[Epoch 22 Batch 5000] loss 3.24, ppl 25.60, throughput 5521.45 samples/s
[Epoch 22 Batch 6000] loss 3.24, ppl 25.51, throughput 5335.92 samples/s
[Epoch 22 Batch 7000] loss 3.24, ppl 25.54, throughput 5365.32 samples/s
[Epoch 22 Batch 8000] loss 3.24, ppl 25.53, throughput 5171.46 samples/s
[Epoch 22 Batch 9000] loss 3.24, ppl 25.58, throughput 5292.77 samples/s
[Epoch 22 Batch 10000] loss 3.24, ppl 25.49, throughput 5337.53 samples/s
[Epoch 22 Batch 11000] loss 3.24, ppl 25.44, throughput 5340.44 samples/s
[Epoch 22 Batch 12000] loss 3.23, ppl 25.38, throughput 5530.47 samples/s
[Epoch 22 Batch 13000] loss 3.24, ppl 25.61, throughput 5774.56 samples/s
[Epoch 22 Batch 14000] loss 3.25, ppl 25.69, throughput 5776.11 samples/s
[Epoch 22 Batch 15000] loss 3.24, ppl 25.63, throughput 5534.21 samples/s
[Epoch 22 Batch 16000] loss 3.24, ppl 25.62, throughput 5700.05 samples/s
[Epoch 22 Batch 17000] loss 3.24, ppl 25.50, throughput 5764.83 samples/s
[Epoch 22 Batch 18000] loss 3.24, ppl 25.48, throughput 5773.04 samples/s
[Epoch 22 Batch 19000] loss 3.24, ppl 25.63, throughput 5558.59 samples/s
[Epoch 22 Batch 20000] loss 3.25, ppl 25.71, throughput 5692.38 samples/s
[Epoch 22 Batch 21000] loss 3.24, ppl 25.47, throughput 5788.99 samples/s
[Epoch 22 Batch 22000] loss 3.24, ppl 25.44, throughput 4849.90 samples/s
[Epoch 22 Batch 23000] loss 3.24, ppl 25.61, throughput 4512.16 samples/s
[Epoch 22 Batch 24000] loss 3.25, ppl 25.67, throughput 4595.10 samples/s
[Epoch 22 Batch 25000] loss 3.24, ppl 25.64, throughput 4574.91 samples/s
[Epoch 22 Batch 26000] loss 3.24, ppl 25.59, throughput 4595.12 samples/s
[Epoch 22 Batch 27000] loss 3.25, ppl 25.69, throughput 4662.87 samples/s
[Epoch 22 Batch 28000] loss 3.25, ppl 25.66, throughput 5735.85 samples/s
[Epoch 22 Batch 29000] loss 3.25, ppl 25.69, throughput 5734.86 samples/s
[Epoch 22 Batch 30000] loss 3.25, ppl 25.66, throughput 5620.24 samples/s
[Epoch 22 Batch 31000] loss 3.25, ppl 25.78, throughput 5751.36 samples/s
[Epoch 22 Batch 32000] loss 3.25, ppl 25.73, throughput 5759.02 samples/s
[Epoch 22 Batch 33000] loss 3.24, ppl 25.56, throughput 5752.10 samples/s
[Epoch 22 Batch 34000] loss 3.25, ppl 25.80, throughput 5510.94 samples/s
[Epoch 22 Batch 35000] loss 3.24, ppl 25.64, throughput 5737.79 samples/s
[Epoch 22 Batch 36000] loss 3.24, ppl 25.62, throughput 5750.30 samples/s
[Epoch 22 Batch 37000] loss 3.24, ppl 25.63, throughput 5756.05 samples/s
[Epoch 22 Batch 38000] loss 3.25, ppl 25.71, throughput 5572.03 samples/s
[Epoch 22 Batch 39000] loss 3.24, ppl 25.66, throughput 5769.11 samples/s
[Epoch 22 Batch 40000] loss 3.24, ppl 25.64, throughput 5728.54 samples/s
[Epoch 22 Batch 41000] loss 3.25, ppl 25.68, throughput 5558.56 samples/s
[Epoch 22 Batch 42000] loss 3.24, ppl 25.60, throughput 5715.14 samples/s
[Epoch 22 Batch 43000] loss 3.25, ppl 25.70, throughput 5712.10 samples/s
[Epoch 22 Batch 44000] loss 3.25, ppl 25.69, throughput 5713.89 samples/s
[Epoch 22 Batch 45000] loss 3.24, ppl 25.61, throughput 5553.65 samples/s
[Epoch 22 Batch 46000] loss 3.25, ppl 25.68, throughput 5743.81 samples/s
[Epoch 22 Batch 47000] loss 3.24, ppl 25.65, throughput 5722.96 samples/s
[Epoch 22 Batch 48000] loss 3.25, ppl 25.82, throughput 5710.64 samples/s
[Epoch 22 Batch 49000] loss 3.25, ppl 25.75, throughput 5606.01 samples/s
[Epoch 22 Batch 50000] loss 3.25, ppl 25.79, throughput 5746.33 samples/s
[Epoch 22 Batch 51000] loss 3.25, ppl 25.73, throughput 5776.00 samples/s
[Epoch 22 Batch 52000] loss 3.25, ppl 25.78, throughput 5713.86 samples/s
[Epoch 22 Batch 53000] loss 3.25, ppl 25.70, throughput 5626.99 samples/s
[Epoch 22 Batch 54000] loss 3.25, ppl 25.82, throughput 5746.29 samples/s
[Epoch 22 Batch 55000] loss 3.25, ppl 25.76, throughput 5774.36 samples/s
[Epoch 22 Batch 56000] loss 3.25, ppl 25.82, throughput 5586.80 samples/s
[Epoch 22 Batch 57000] loss 3.25, ppl 25.73, throughput 5740.75 samples/s
[Epoch 22 Batch 58000] loss 3.25, ppl 25.83, throughput 5791.29 samples/s
[Epoch 22 Batch 59000] loss 3.25, ppl 25.73, throughput 5714.82 samples/s
[Epoch 22 Batch 60000] loss 3.25, ppl 25.82, throughput 5566.46 samples/s
[Epoch 22 Batch 61000] loss 3.25, ppl 25.72, throughput 5759.25 samples/s
[Epoch 22 Batch 62000] loss 3.25, ppl 25.76, throughput 5681.95 samples/s
[Epoch 22 Batch 63000] loss 3.25, ppl 25.85, throughput 5736.83 samples/s
[Epoch 22 Batch 64000] loss 3.25, ppl 25.72, throughput 5527.44 samples/s
[Epoch 22 Batch 65000] loss 3.25, ppl 25.91, throughput 5721.09 samples/s
[Epoch 22 Batch 66000] loss 3.25, ppl 25.84, throughput 5708.49 samples/s
[Epoch 22 Batch 67000] loss 3.25, ppl 25.82, throughput 5579.88 samples/s
[Epoch 22 Batch 68000] loss 3.25, ppl 25.71, throughput 5748.70 samples/s
[Epoch 22 Batch 69000] loss 3.25, ppl 25.76, throughput 5754.67 samples/s
[Epoch 22 Batch 70000] loss 3.25, ppl 25.81, throughput 5763.81 samples/s
[Epoch 22 Batch 71000] loss 3.25, ppl 25.75, throughput 5575.36 samples/s
[Epoch 22 Batch 72000] loss 3.25, ppl 25.84, throughput 5710.73 samples/s
[Epoch 22 Batch 73000] loss 3.25, ppl 25.81, throughput 5757.16 samples/s
[Epoch 22 Batch 74000] loss 3.25, ppl 25.77, throughput 5174.21 samples/s
[Epoch 22 Batch 75000] loss 3.25, ppl 25.79, throughput 4503.87 samples/s
[Epoch 22 Batch 76000] loss 3.25, ppl 25.75, throughput 4651.40 samples/s
[Epoch 22 Batch 77000] loss 3.25, ppl 25.77, throughput 4620.15 samples/s
[Epoch 22 Batch 78000] loss 3.25, ppl 25.81, throughput 4645.93 samples/s
Epoch 22 took 7279.61 seconds.
[Epoch 23 Batch 1000] loss 3.20, ppl 24.42, throughput 4698.58 samples/s
[Epoch 23 Batch 2000] loss 3.23, ppl 25.41, throughput 5774.39 samples/s
[Epoch 23 Batch 3000] loss 3.23, ppl 25.38, throughput 5748.27 samples/s
[Epoch 23 Batch 4000] loss 3.24, ppl 25.45, throughput 5303.93 samples/s
[Epoch 23 Batch 5000] loss 3.23, ppl 25.34, throughput 5299.09 samples/s
[Epoch 23 Batch 6000] loss 3.23, ppl 25.24, throughput 5326.35 samples/s
[Epoch 23 Batch 7000] loss 3.23, ppl 25.39, throughput 5280.85 samples/s
[Epoch 23 Batch 8000] loss 3.23, ppl 25.18, throughput 5157.95 samples/s
[Epoch 23 Batch 9000] loss 3.24, ppl 25.52, throughput 5301.15 samples/s
[Epoch 23 Batch 10000] loss 3.23, ppl 25.38, throughput 5436.67 samples/s
[Epoch 23 Batch 11000] loss 3.23, ppl 25.35, throughput 5750.48 samples/s
[Epoch 23 Batch 12000] loss 3.24, ppl 25.42, throughput 5577.86 samples/s
[Epoch 23 Batch 13000] loss 3.24, ppl 25.52, throughput 5777.01 samples/s
[Epoch 23 Batch 14000] loss 3.24, ppl 25.49, throughput 5739.63 samples/s
[Epoch 23 Batch 15000] loss 3.24, ppl 25.51, throughput 5558.61 samples/s
[Epoch 23 Batch 16000] loss 3.24, ppl 25.54, throughput 5723.18 samples/s
[Epoch 23 Batch 17000] loss 3.24, ppl 25.49, throughput 5757.39 samples/s
[Epoch 23 Batch 18000] loss 3.24, ppl 25.42, throughput 5712.95 samples/s
[Epoch 23 Batch 19000] loss 3.23, ppl 25.37, throughput 5560.61 samples/s
[Epoch 23 Batch 20000] loss 3.24, ppl 25.56, throughput 5734.15 samples/s
[Epoch 23 Batch 21000] loss 3.24, ppl 25.58, throughput 5714.69 samples/s
[Epoch 23 Batch 22000] loss 3.24, ppl 25.51, throughput 5757.97 samples/s
[Epoch 23 Batch 23000] loss 3.24, ppl 25.46, throughput 5578.99 samples/s
[Epoch 23 Batch 24000] loss 3.24, ppl 25.45, throughput 5754.30 samples/s
[Epoch 23 Batch 25000] loss 3.24, ppl 25.58, throughput 5734.05 samples/s
[Epoch 23 Batch 26000] loss 3.24, ppl 25.59, throughput 5533.27 samples/s
[Epoch 23 Batch 27000] loss 3.24, ppl 25.51, throughput 5735.41 samples/s
[Epoch 23 Batch 28000] loss 3.24, ppl 25.45, throughput 5732.62 samples/s
[Epoch 23 Batch 29000] loss 3.24, ppl 25.48, throughput 5772.84 samples/s
[Epoch 23 Batch 30000] loss 3.24, ppl 25.60, throughput 5582.64 samples/s
[Epoch 23 Batch 31000] loss 3.24, ppl 25.60, throughput 5735.86 samples/s
[Epoch 23 Batch 32000] loss 3.24, ppl 25.54, throughput 5762.04 samples/s
[Epoch 23 Batch 33000] loss 3.24, ppl 25.41, throughput 5759.10 samples/s
[Epoch 23 Batch 34000] loss 3.24, ppl 25.58, throughput 5601.60 samples/s
[Epoch 23 Batch 35000] loss 3.24, ppl 25.58, throughput 5744.39 samples/s
[Epoch 23 Batch 36000] loss 3.25, ppl 25.67, throughput 5758.09 samples/s
[Epoch 23 Batch 37000] loss 3.24, ppl 25.43, throughput 5720.98 samples/s
[Epoch 23 Batch 38000] loss 3.24, ppl 25.55, throughput 5497.89 samples/s
[Epoch 23 Batch 39000] loss 3.24, ppl 25.42, throughput 5742.25 samples/s
[Epoch 23 Batch 40000] loss 3.24, ppl 25.52, throughput 5701.12 samples/s
[Epoch 23 Batch 41000] loss 3.24, ppl 25.63, throughput 5583.78 samples/s
[Epoch 23 Batch 42000] loss 3.24, ppl 25.61, throughput 5754.77 samples/s
[Epoch 23 Batch 43000] loss 3.25, ppl 25.69, throughput 5724.32 samples/s
[Epoch 23 Batch 44000] loss 3.24, ppl 25.58, throughput 5755.87 samples/s
[Epoch 23 Batch 45000] loss 3.24, ppl 25.54, throughput 5515.36 samples/s
[Epoch 23 Batch 46000] loss 3.24, ppl 25.54, throughput 5798.67 samples/s
[Epoch 23 Batch 47000] loss 3.24, ppl 25.60, throughput 5696.46 samples/s
[Epoch 23 Batch 48000] loss 3.24, ppl 25.53, throughput 4650.71 samples/s
[Epoch 23 Batch 49000] loss 3.24, ppl 25.60, throughput 4495.92 samples/s
[Epoch 23 Batch 50000] loss 3.24, ppl 25.56, throughput 4658.89 samples/s
[Epoch 23 Batch 51000] loss 3.24, ppl 25.58, throughput 4607.39 samples/s
[Epoch 23 Batch 52000] loss 3.25, ppl 25.72, throughput 4568.85 samples/s
[Epoch 23 Batch 53000] loss 3.25, ppl 25.77, throughput 4750.18 samples/s
[Epoch 23 Batch 54000] loss 3.25, ppl 25.79, throughput 5673.23 samples/s
[Epoch 23 Batch 55000] loss 3.25, ppl 25.67, throughput 5626.01 samples/s
[Epoch 23 Batch 56000] loss 3.24, ppl 25.66, throughput 5507.30 samples/s
[Epoch 23 Batch 57000] loss 3.25, ppl 25.68, throughput 5762.44 samples/s
[Epoch 23 Batch 58000] loss 3.24, ppl 25.62, throughput 5717.00 samples/s
[Epoch 23 Batch 59000] loss 3.25, ppl 25.74, throughput 5787.89 samples/s
[Epoch 23 Batch 60000] loss 3.24, ppl 25.63, throughput 5538.26 samples/s
[Epoch 23 Batch 61000] loss 3.25, ppl 25.71, throughput 5682.69 samples/s
[Epoch 23 Batch 62000] loss 3.25, ppl 25.70, throughput 5707.77 samples/s
[Epoch 23 Batch 63000] loss 3.25, ppl 25.72, throughput 5734.08 samples/s
[Epoch 23 Batch 64000] loss 3.25, ppl 25.70, throughput 5580.23 samples/s
[Epoch 23 Batch 65000] loss 3.25, ppl 25.75, throughput 5754.44 samples/s
[Epoch 23 Batch 66000] loss 3.25, ppl 25.70, throughput 5705.74 samples/s
[Epoch 23 Batch 67000] loss 3.25, ppl 25.73, throughput 5586.11 samples/s
[Epoch 23 Batch 68000] loss 3.25, ppl 25.73, throughput 5700.44 samples/s
[Epoch 23 Batch 69000] loss 3.25, ppl 25.71, throughput 5742.91 samples/s
[Epoch 23 Batch 70000] loss 3.25, ppl 25.68, throughput 5716.23 samples/s
[Epoch 23 Batch 71000] loss 3.25, ppl 25.67, throughput 5554.62 samples/s
[Epoch 23 Batch 72000] loss 3.25, ppl 25.74, throughput 5757.14 samples/s
[Epoch 23 Batch 73000] loss 3.25, ppl 25.75, throughput 5739.93 samples/s
[Epoch 23 Batch 74000] loss 3.25, ppl 25.70, throughput 5702.09 samples/s
[Epoch 23 Batch 75000] loss 3.25, ppl 25.77, throughput 5573.16 samples/s
[Epoch 23 Batch 76000] loss 3.25, ppl 25.80, throughput 5728.09 samples/s
[Epoch 23 Batch 77000] loss 3.25, ppl 25.75, throughput 5764.86 samples/s
[Epoch 23 Batch 78000] loss 3.25, ppl 25.80, throughput 5751.53 samples/s
Epoch 23 took 7212.27 seconds.
[Epoch 24 Batch 1000] loss 3.23, ppl 25.27, throughput 5565.46 samples/s
[Epoch 24 Batch 2000] loss 3.22, ppl 24.90, throughput 5729.17 samples/s
[Epoch 24 Batch 3000] loss 3.22, ppl 25.13, throughput 5699.27 samples/s
[Epoch 24 Batch 4000] loss 3.22, ppl 25.00, throughput 5277.63 samples/s
[Epoch 24 Batch 5000] loss 3.23, ppl 25.36, throughput 5304.09 samples/s
[Epoch 24 Batch 6000] loss 3.23, ppl 25.16, throughput 5206.14 samples/s
[Epoch 24 Batch 7000] loss 3.23, ppl 25.19, throughput 4633.74 samples/s
[Epoch 24 Batch 8000] loss 3.23, ppl 25.29, throughput 4531.17 samples/s
[Epoch 24 Batch 9000] loss 3.23, ppl 25.24, throughput 4749.37 samples/s
[Epoch 24 Batch 10000] loss 3.23, ppl 25.28, throughput 5592.03 samples/s
[Epoch 24 Batch 11000] loss 3.22, ppl 25.08, throughput 5720.69 samples/s
[Epoch 24 Batch 12000] loss 3.23, ppl 25.34, throughput 5535.00 samples/s
[Epoch 24 Batch 13000] loss 3.23, ppl 25.23, throughput 5768.24 samples/s
[Epoch 24 Batch 14000] loss 3.23, ppl 25.35, throughput 5743.78 samples/s
[Epoch 24 Batch 15000] loss 3.23, ppl 25.39, throughput 5611.25 samples/s
[Epoch 24 Batch 16000] loss 3.24, ppl 25.41, throughput 5725.23 samples/s
[Epoch 24 Batch 17000] loss 3.23, ppl 25.34, throughput 5767.31 samples/s
[Epoch 24 Batch 18000] loss 3.24, ppl 25.50, throughput 5754.58 samples/s
[Epoch 24 Batch 19000] loss 3.23, ppl 25.39, throughput 5556.89 samples/s
[Epoch 24 Batch 20000] loss 3.23, ppl 25.40, throughput 5546.42 samples/s
[Epoch 24 Batch 21000] loss 3.23, ppl 25.37, throughput 4635.49 samples/s
[Epoch 24 Batch 22000] loss 3.24, ppl 25.52, throughput 4590.48 samples/s
[Epoch 24 Batch 23000] loss 3.24, ppl 25.46, throughput 4437.95 samples/s
[Epoch 24 Batch 24000] loss 3.23, ppl 25.34, throughput 4546.07 samples/s
[Epoch 24 Batch 25000] loss 3.23, ppl 25.36, throughput 4564.69 samples/s
[Epoch 24 Batch 26000] loss 3.24, ppl 25.45, throughput 5070.14 samples/s
[Epoch 24 Batch 27000] loss 3.24, ppl 25.44, throughput 5597.93 samples/s
[Epoch 24 Batch 28000] loss 3.23, ppl 25.39, throughput 5679.20 samples/s
[Epoch 24 Batch 29000] loss 3.24, ppl 25.57, throughput 5737.78 samples/s
[Epoch 24 Batch 30000] loss 3.23, ppl 25.40, throughput 5621.20 samples/s
[Epoch 24 Batch 31000] loss 3.24, ppl 25.52, throughput 5745.64 samples/s
[Epoch 24 Batch 32000] loss 3.24, ppl 25.50, throughput 5692.00 samples/s
[Epoch 24 Batch 33000] loss 3.24, ppl 25.41, throughput 5735.37 samples/s
[Epoch 24 Batch 34000] loss 3.24, ppl 25.41, throughput 5544.49 samples/s
[Epoch 24 Batch 35000] loss 3.23, ppl 25.41, throughput 5758.68 samples/s
[Epoch 24 Batch 36000] loss 3.24, ppl 25.48, throughput 5721.22 samples/s
[Epoch 24 Batch 37000] loss 3.24, ppl 25.50, throughput 5760.73 samples/s
[Epoch 24 Batch 38000] loss 3.24, ppl 25.44, throughput 5539.69 samples/s
[Epoch 24 Batch 39000] loss 3.24, ppl 25.60, throughput 5740.48 samples/s
[Epoch 24 Batch 40000] loss 3.24, ppl 25.45, throughput 5737.12 samples/s
[Epoch 24 Batch 41000] loss 3.24, ppl 25.51, throughput 5588.07 samples/s
[Epoch 24 Batch 42000] loss 3.24, ppl 25.54, throughput 5715.82 samples/s
[Epoch 24 Batch 43000] loss 3.24, ppl 25.50, throughput 5734.29 samples/s
[Epoch 24 Batch 44000] loss 3.24, ppl 25.58, throughput 5727.00 samples/s
[Epoch 24 Batch 45000] loss 3.24, ppl 25.52, throughput 5531.05 samples/s
[Epoch 24 Batch 46000] loss 3.24, ppl 25.61, throughput 5747.32 samples/s
[Epoch 24 Batch 47000] loss 3.24, ppl 25.58, throughput 5730.37 samples/s
[Epoch 24 Batch 48000] loss 3.24, ppl 25.54, throughput 5700.02 samples/s
[Epoch 24 Batch 49000] loss 3.24, ppl 25.49, throughput 5603.32 samples/s
[Epoch 24 Batch 50000] loss 3.24, ppl 25.54, throughput 5738.23 samples/s
[Epoch 24 Batch 51000] loss 3.24, ppl 25.58, throughput 5708.62 samples/s
[Epoch 24 Batch 52000] loss 3.24, ppl 25.50, throughput 5777.99 samples/s
[Epoch 24 Batch 53000] loss 3.24, ppl 25.50, throughput 5588.33 samples/s
[Epoch 24 Batch 54000] loss 3.24, ppl 25.50, throughput 5712.52 samples/s
[Epoch 24 Batch 55000] loss 3.24, ppl 25.55, throughput 5754.65 samples/s
[Epoch 24 Batch 56000] loss 3.24, ppl 25.59, throughput 5559.34 samples/s
[Epoch 24 Batch 57000] loss 3.24, ppl 25.53, throughput 5750.70 samples/s
[Epoch 24 Batch 58000] loss 3.24, ppl 25.64, throughput 5758.61 samples/s
[Epoch 24 Batch 59000] loss 3.24, ppl 25.63, throughput 5783.28 samples/s
[Epoch 24 Batch 60000] loss 3.24, ppl 25.65, throughput 5562.83 samples/s
[Epoch 24 Batch 61000] loss 3.24, ppl 25.58, throughput 5729.51 samples/s
[Epoch 24 Batch 62000] loss 3.24, ppl 25.65, throughput 5706.00 samples/s
[Epoch 24 Batch 63000] loss 3.24, ppl 25.62, throughput 5713.54 samples/s
[Epoch 24 Batch 64000] loss 3.24, ppl 25.52, throughput 5554.61 samples/s
[Epoch 24 Batch 65000] loss 3.24, ppl 25.59, throughput 5681.27 samples/s
[Epoch 24 Batch 66000] loss 3.24, ppl 25.60, throughput 5740.90 samples/s
[Epoch 24 Batch 67000] loss 3.24, ppl 25.63, throughput 5535.85 samples/s
[Epoch 24 Batch 68000] loss 3.25, ppl 25.69, throughput 5768.11 samples/s
[Epoch 24 Batch 69000] loss 3.25, ppl 25.78, throughput 5747.17 samples/s
[Epoch 24 Batch 70000] loss 3.25, ppl 25.67, throughput 5752.04 samples/s
[Epoch 24 Batch 71000] loss 3.24, ppl 25.65, throughput 5572.72 samples/s
[Epoch 24 Batch 72000] loss 3.24, ppl 25.58, throughput 5705.85 samples/s
[Epoch 24 Batch 73000] loss 3.24, ppl 25.64, throughput 4745.36 samples/s
[Epoch 24 Batch 74000] loss 3.24, ppl 25.60, throughput 4583.79 samples/s
[Epoch 24 Batch 75000] loss 3.24, ppl 25.62, throughput 4474.14 samples/s
[Epoch 24 Batch 76000] loss 3.25, ppl 25.70, throughput 4610.61 samples/s
[Epoch 24 Batch 77000] loss 3.24, ppl 25.60, throughput 4603.75 samples/s
[Epoch 24 Batch 78000] loss 3.24, ppl 25.60, throughput 4814.16 samples/s
Epoch 24 took 7360.23 seconds.
[Epoch 25 Batch 1000] loss 3.23, ppl 25.25, throughput 5576.50 samples/s
[Epoch 25 Batch 2000] loss 3.22, ppl 25.10, throughput 5456.40 samples/s
[Epoch 25 Batch 3000] loss 3.22, ppl 25.12, throughput 5258.96 samples/s
[Epoch 25 Batch 4000] loss 3.22, ppl 25.07, throughput 5215.85 samples/s
[Epoch 25 Batch 5000] loss 3.22, ppl 25.09, throughput 5300.41 samples/s
[Epoch 25 Batch 6000] loss 3.22, ppl 25.09, throughput 5344.00 samples/s
[Epoch 25 Batch 7000] loss 3.23, ppl 25.32, throughput 5306.12 samples/s
[Epoch 25 Batch 8000] loss 3.23, ppl 25.26, throughput 5299.32 samples/s
[Epoch 25 Batch 9000] loss 3.23, ppl 25.21, throughput 5813.74 samples/s
[Epoch 25 Batch 10000] loss 3.23, ppl 25.26, throughput 5777.86 samples/s
[Epoch 25 Batch 11000] loss 3.22, ppl 24.94, throughput 5711.70 samples/s
[Epoch 25 Batch 12000] loss 3.23, ppl 25.25, throughput 5589.84 samples/s
[Epoch 25 Batch 13000] loss 3.23, ppl 25.20, throughput 5731.16 samples/s
[Epoch 25 Batch 14000] loss 3.22, ppl 25.09, throughput 5703.91 samples/s
[Epoch 25 Batch 15000] loss 3.23, ppl 25.26, throughput 5591.45 samples/s
[Epoch 25 Batch 16000] loss 3.23, ppl 25.39, throughput 5700.77 samples/s
[Epoch 25 Batch 17000] loss 3.23, ppl 25.17, throughput 5708.57 samples/s
[Epoch 25 Batch 18000] loss 3.22, ppl 25.15, throughput 5724.15 samples/s
[Epoch 25 Batch 19000] loss 3.22, ppl 25.09, throughput 5620.86 samples/s
[Epoch 25 Batch 20000] loss 3.23, ppl 25.23, throughput 5713.62 samples/s
[Epoch 25 Batch 21000] loss 3.24, ppl 25.41, throughput 5703.88 samples/s
[Epoch 25 Batch 22000] loss 3.24, ppl 25.43, throughput 5742.29 samples/s
[Epoch 25 Batch 23000] loss 3.23, ppl 25.24, throughput 5580.52 samples/s
[Epoch 25 Batch 24000] loss 3.23, ppl 25.32, throughput 5785.75 samples/s
[Epoch 25 Batch 25000] loss 3.23, ppl 25.29, throughput 5699.07 samples/s
[Epoch 25 Batch 26000] loss 3.24, ppl 25.42, throughput 5715.27 samples/s
[Epoch 25 Batch 27000] loss 3.23, ppl 25.32, throughput 5584.54 samples/s
[Epoch 25 Batch 28000] loss 3.23, ppl 25.31, throughput 5717.52 samples/s
[Epoch 25 Batch 29000] loss 3.23, ppl 25.31, throughput 5718.62 samples/s
[Epoch 25 Batch 30000] loss 3.23, ppl 25.35, throughput 5616.40 samples/s
[Epoch 25 Batch 31000] loss 3.23, ppl 25.32, throughput 5728.38 samples/s
[Epoch 25 Batch 32000] loss 3.23, ppl 25.28, throughput 5772.27 samples/s
[Epoch 25 Batch 33000] loss 3.23, ppl 25.38, throughput 5719.68 samples/s
[Epoch 25 Batch 34000] loss 3.23, ppl 25.37, throughput 5565.93 samples/s
[Epoch 25 Batch 35000] loss 3.23, ppl 25.25, throughput 5729.62 samples/s
[Epoch 25 Batch 36000] loss 3.23, ppl 25.36, throughput 5786.19 samples/s
[Epoch 25 Batch 37000] loss 3.23, ppl 25.37, throughput 5718.67 samples/s
[Epoch 25 Batch 38000] loss 3.24, ppl 25.48, throughput 5551.79 samples/s
[Epoch 25 Batch 39000] loss 3.24, ppl 25.45, throughput 5696.39 samples/s
[Epoch 25 Batch 40000] loss 3.24, ppl 25.47, throughput 5744.46 samples/s
[Epoch 25 Batch 41000] loss 3.24, ppl 25.49, throughput 5643.39 samples/s
[Epoch 25 Batch 42000] loss 3.24, ppl 25.45, throughput 5771.57 samples/s
[Epoch 25 Batch 43000] loss 3.23, ppl 25.39, throughput 5700.75 samples/s
[Epoch 25 Batch 44000] loss 3.24, ppl 25.45, throughput 5708.31 samples/s
[Epoch 25 Batch 45000] loss 3.24, ppl 25.48, throughput 5585.53 samples/s
[Epoch 25 Batch 46000] loss 3.24, ppl 25.52, throughput 5118.14 samples/s
[Epoch 25 Batch 47000] loss 3.23, ppl 25.39, throughput 4624.41 samples/s
[Epoch 25 Batch 48000] loss 3.24, ppl 25.55, throughput 4610.68 samples/s
[Epoch 25 Batch 49000] loss 3.24, ppl 25.48, throughput 4470.29 samples/s
[Epoch 25 Batch 50000] loss 3.24, ppl 25.51, throughput 4585.95 samples/s
[Epoch 25 Batch 51000] loss 3.24, ppl 25.46, throughput 4576.30 samples/s
[Epoch 25 Batch 52000] loss 3.24, ppl 25.54, throughput 5539.67 samples/s
[Epoch 25 Batch 53000] loss 3.24, ppl 25.50, throughput 5561.88 samples/s
[Epoch 25 Batch 54000] loss 3.24, ppl 25.47, throughput 5732.04 samples/s
[Epoch 25 Batch 55000] loss 3.24, ppl 25.49, throughput 5732.94 samples/s
[Epoch 25 Batch 56000] loss 3.24, ppl 25.44, throughput 5546.11 samples/s
[Epoch 25 Batch 57000] loss 3.24, ppl 25.53, throughput 5775.74 samples/s
[Epoch 25 Batch 58000] loss 3.24, ppl 25.55, throughput 5726.19 samples/s
[Epoch 25 Batch 59000] loss 3.24, ppl 25.42, throughput 5745.88 samples/s
[Epoch 25 Batch 60000] loss 3.24, ppl 25.46, throughput 5564.25 samples/s
[Epoch 25 Batch 61000] loss 3.24, ppl 25.51, throughput 5705.52 samples/s
[Epoch 25 Batch 62000] loss 3.24, ppl 25.54, throughput 5712.23 samples/s
[Epoch 25 Batch 63000] loss 3.24, ppl 25.49, throughput 5667.94 samples/s
[Epoch 25 Batch 64000] loss 3.24, ppl 25.53, throughput 5574.84 samples/s
[Epoch 25 Batch 65000] loss 3.24, ppl 25.53, throughput 5727.13 samples/s
[Epoch 25 Batch 66000] loss 3.24, ppl 25.46, throughput 5701.26 samples/s
[Epoch 25 Batch 67000] loss 3.24, ppl 25.54, throughput 5610.88 samples/s
[Epoch 25 Batch 68000] loss 3.24, ppl 25.53, throughput 5727.26 samples/s
[Epoch 25 Batch 69000] loss 3.24, ppl 25.53, throughput 5729.43 samples/s
[Epoch 25 Batch 70000] loss 3.24, ppl 25.59, throughput 5767.52 samples/s
[Epoch 25 Batch 71000] loss 3.24, ppl 25.60, throughput 5568.89 samples/s
[Epoch 25 Batch 72000] loss 3.24, ppl 25.51, throughput 5740.18 samples/s
[Epoch 25 Batch 73000] loss 3.24, ppl 25.62, throughput 5715.63 samples/s
[Epoch 25 Batch 74000] loss 3.24, ppl 25.51, throughput 5729.76 samples/s
[Epoch 25 Batch 75000] loss 3.24, ppl 25.55, throughput 5545.32 samples/s
[Epoch 25 Batch 76000] loss 3.24, ppl 25.58, throughput 5713.05 samples/s
[Epoch 25 Batch 77000] loss 3.24, ppl 25.60, throughput 5736.76 samples/s
[Epoch 25 Batch 78000] loss 3.24, ppl 25.50, throughput 5703.04 samples/s
Epoch 25 took 7196.49 seconds.
[Epoch 26 Batch 1000] loss 3.23, ppl 25.22, throughput 5623.18 samples/s
[Epoch 26 Batch 2000] loss 3.22, ppl 25.12, throughput 5413.33 samples/s
[Epoch 26 Batch 3000] loss 3.22, ppl 25.05, throughput 5350.81 samples/s
[Epoch 26 Batch 4000] loss 3.21, ppl 24.68, throughput 5122.49 samples/s
[Epoch 26 Batch 5000] loss 3.23, ppl 25.19, throughput 4654.33 samples/s
[Epoch 26 Batch 6000] loss 3.22, ppl 25.02, throughput 4725.07 samples/s
[Epoch 26 Batch 7000] loss 3.23, ppl 25.15, throughput 5313.72 samples/s
[Epoch 26 Batch 8000] loss 3.23, ppl 25.22, throughput 5221.59 samples/s
[Epoch 26 Batch 9000] loss 3.22, ppl 25.04, throughput 5716.12 samples/s
[Epoch 26 Batch 10000] loss 3.22, ppl 25.04, throughput 5722.79 samples/s
[Epoch 26 Batch 11000] loss 3.23, ppl 25.22, throughput 5784.81 samples/s
[Epoch 26 Batch 12000] loss 3.23, ppl 25.17, throughput 5596.60 samples/s
[Epoch 26 Batch 13000] loss 3.23, ppl 25.21, throughput 5723.98 samples/s
[Epoch 26 Batch 14000] loss 3.22, ppl 25.10, throughput 5745.63 samples/s
[Epoch 26 Batch 15000] loss 3.23, ppl 25.25, throughput 5600.43 samples/s
[Epoch 26 Batch 16000] loss 3.23, ppl 25.21, throughput 5762.30 samples/s
[Epoch 26 Batch 17000] loss 3.23, ppl 25.20, throughput 5706.58 samples/s
[Epoch 26 Batch 18000] loss 3.23, ppl 25.25, throughput 5737.63 samples/s
[Epoch 26 Batch 19000] loss 3.23, ppl 25.36, throughput 4936.30 samples/s
[Epoch 26 Batch 20000] loss 3.23, ppl 25.28, throughput 4579.83 samples/s
[Epoch 26 Batch 21000] loss 3.22, ppl 25.09, throughput 4557.80 samples/s
[Epoch 26 Batch 22000] loss 3.23, ppl 25.22, throughput 4513.20 samples/s
[Epoch 26 Batch 23000] loss 3.23, ppl 25.23, throughput 4342.11 samples/s
[Epoch 26 Batch 24000] loss 3.23, ppl 25.21, throughput 4497.08 samples/s
[Epoch 26 Batch 25000] loss 3.23, ppl 25.25, throughput 5680.30 samples/s
[Epoch 26 Batch 26000] loss 3.23, ppl 25.22, throughput 5775.52 samples/s
[Epoch 26 Batch 27000] loss 3.23, ppl 25.26, throughput 5552.28 samples/s
[Epoch 26 Batch 28000] loss 3.23, ppl 25.38, throughput 5696.74 samples/s
[Epoch 26 Batch 29000] loss 3.23, ppl 25.26, throughput 5759.75 samples/s
[Epoch 26 Batch 30000] loss 3.23, ppl 25.34, throughput 5511.05 samples/s
[Epoch 26 Batch 31000] loss 3.23, ppl 25.31, throughput 5747.64 samples/s
[Epoch 26 Batch 32000] loss 3.23, ppl 25.25, throughput 5669.27 samples/s
[Epoch 26 Batch 33000] loss 3.23, ppl 25.34, throughput 5748.46 samples/s
[Epoch 26 Batch 34000] loss 3.23, ppl 25.35, throughput 5509.73 samples/s
[Epoch 26 Batch 35000] loss 3.23, ppl 25.36, throughput 5732.98 samples/s
[Epoch 26 Batch 36000] loss 3.23, ppl 25.31, throughput 5707.87 samples/s
[Epoch 26 Batch 37000] loss 3.23, ppl 25.20, throughput 5729.24 samples/s
[Epoch 26 Batch 38000] loss 3.23, ppl 25.35, throughput 5584.47 samples/s
[Epoch 26 Batch 39000] loss 3.23, ppl 25.33, throughput 5737.67 samples/s
[Epoch 26 Batch 40000] loss 3.23, ppl 25.34, throughput 5750.71 samples/s
[Epoch 26 Batch 41000] loss 3.23, ppl 25.30, throughput 5519.73 samples/s
[Epoch 26 Batch 42000] loss 3.24, ppl 25.42, throughput 5756.42 samples/s
[Epoch 26 Batch 43000] loss 3.23, ppl 25.31, throughput 5729.87 samples/s
[Epoch 26 Batch 44000] loss 3.23, ppl 25.31, throughput 5734.97 samples/s
[Epoch 26 Batch 45000] loss 3.23, ppl 25.35, throughput 5584.59 samples/s
[Epoch 26 Batch 46000] loss 3.23, ppl 25.28, throughput 5704.71 samples/s
[Epoch 26 Batch 47000] loss 3.23, ppl 25.36, throughput 5758.83 samples/s
[Epoch 26 Batch 48000] loss 3.23, ppl 25.26, throughput 5688.83 samples/s
[Epoch 26 Batch 49000] loss 3.24, ppl 25.41, throughput 5584.01 samples/s
[Epoch 26 Batch 50000] loss 3.23, ppl 25.39, throughput 5715.55 samples/s
[Epoch 26 Batch 51000] loss 3.23, ppl 25.41, throughput 5663.12 samples/s
[Epoch 26 Batch 52000] loss 3.23, ppl 25.30, throughput 5704.46 samples/s
[Epoch 26 Batch 53000] loss 3.23, ppl 25.40, throughput 5622.84 samples/s
[Epoch 26 Batch 54000] loss 3.23, ppl 25.38, throughput 5776.60 samples/s
[Epoch 26 Batch 55000] loss 3.24, ppl 25.41, throughput 5718.09 samples/s
[Epoch 26 Batch 56000] loss 3.23, ppl 25.36, throughput 5619.99 samples/s
[Epoch 26 Batch 57000] loss 3.24, ppl 25.46, throughput 5748.46 samples/s
[Epoch 26 Batch 58000] loss 3.24, ppl 25.48, throughput 5730.41 samples/s
[Epoch 26 Batch 59000] loss 3.24, ppl 25.51, throughput 5714.83 samples/s
[Epoch 26 Batch 60000] loss 3.24, ppl 25.42, throughput 5579.31 samples/s
[Epoch 26 Batch 61000] loss 3.24, ppl 25.44, throughput 5721.53 samples/s
[Epoch 26 Batch 62000] loss 3.24, ppl 25.43, throughput 5713.72 samples/s
[Epoch 26 Batch 63000] loss 3.24, ppl 25.43, throughput 5741.59 samples/s
[Epoch 26 Batch 64000] loss 3.23, ppl 25.40, throughput 5502.58 samples/s
[Epoch 26 Batch 65000] loss 3.24, ppl 25.44, throughput 5693.57 samples/s
[Epoch 26 Batch 66000] loss 3.23, ppl 25.40, throughput 5758.68 samples/s
[Epoch 26 Batch 67000] loss 3.23, ppl 25.37, throughput 5614.80 samples/s
[Epoch 26 Batch 68000] loss 3.24, ppl 25.47, throughput 5721.88 samples/s
[Epoch 26 Batch 69000] loss 3.24, ppl 25.42, throughput 5749.44 samples/s
[Epoch 26 Batch 70000] loss 3.23, ppl 25.40, throughput 5712.26 samples/s
[Epoch 26 Batch 71000] loss 3.24, ppl 25.42, throughput 5167.77 samples/s
[Epoch 26 Batch 72000] loss 3.23, ppl 25.40, throughput 4557.51 samples/s
[Epoch 26 Batch 73000] loss 3.24, ppl 25.44, throughput 4586.29 samples/s
[Epoch 26 Batch 74000] loss 3.23, ppl 25.36, throughput 4607.42 samples/s
[Epoch 26 Batch 75000] loss 3.24, ppl 25.44, throughput 4512.40 samples/s
[Epoch 26 Batch 76000] loss 3.24, ppl 25.44, throughput 4574.48 samples/s
[Epoch 26 Batch 77000] loss 3.24, ppl 25.43, throughput 5408.67 samples/s
[Epoch 26 Batch 78000] loss 3.24, ppl 25.49, throughput 5636.93 samples/s
Epoch 26 took 7356.13 seconds.
[Epoch 27 Batch 1000] loss 3.22, ppl 24.98, throughput 5589.05 samples/s
[Epoch 27 Batch 2000] loss 3.22, ppl 24.95, throughput 5758.85 samples/s
[Epoch 27 Batch 3000] loss 3.22, ppl 25.01, throughput 5724.75 samples/s
[Epoch 27 Batch 4000] loss 3.22, ppl 25.06, throughput 5573.52 samples/s
[Epoch 27 Batch 5000] loss 3.20, ppl 24.62, throughput 5745.32 samples/s
[Epoch 27 Batch 6000] loss 3.22, ppl 25.05, throughput 5761.02 samples/s
[Epoch 27 Batch 7000] loss 3.22, ppl 25.13, throughput 4719.73 samples/s
[Epoch 27 Batch 8000] loss 3.22, ppl 25.13, throughput 4631.14 samples/s
[Epoch 27 Batch 9000] loss 3.22, ppl 25.10, throughput 4995.13 samples/s
[Epoch 27 Batch 10000] loss 3.22, ppl 25.14, throughput 4617.16 samples/s
[Epoch 27 Batch 11000] loss 3.23, ppl 25.16, throughput 4698.14 samples/s
[Epoch 27 Batch 12000] loss 3.22, ppl 24.95, throughput 4695.62 samples/s
[Epoch 27 Batch 13000] loss 3.22, ppl 25.04, throughput 5692.58 samples/s
[Epoch 27 Batch 14000] loss 3.22, ppl 24.97, throughput 5710.63 samples/s
[Epoch 27 Batch 15000] loss 3.22, ppl 25.02, throughput 5570.94 samples/s
[Epoch 27 Batch 16000] loss 3.23, ppl 25.27, throughput 5697.23 samples/s
[Epoch 27 Batch 17000] loss 3.23, ppl 25.20, throughput 5717.54 samples/s
[Epoch 27 Batch 18000] loss 3.22, ppl 24.96, throughput 5743.84 samples/s
[Epoch 27 Batch 19000] loss 3.22, ppl 25.11, throughput 5581.48 samples/s
[Epoch 27 Batch 20000] loss 3.22, ppl 25.09, throughput 5739.89 samples/s
[Epoch 27 Batch 21000] loss 3.23, ppl 25.17, throughput 5716.60 samples/s
[Epoch 27 Batch 22000] loss 3.23, ppl 25.18, throughput 5684.81 samples/s
[Epoch 27 Batch 23000] loss 3.23, ppl 25.27, throughput 5575.15 samples/s
[Epoch 27 Batch 24000] loss 3.23, ppl 25.17, throughput 5777.49 samples/s
[Epoch 27 Batch 25000] loss 3.23, ppl 25.22, throughput 5732.47 samples/s
[Epoch 27 Batch 26000] loss 3.23, ppl 25.19, throughput 5715.42 samples/s
[Epoch 27 Batch 27000] loss 3.22, ppl 25.12, throughput 5547.71 samples/s
[Epoch 27 Batch 28000] loss 3.23, ppl 25.18, throughput 5734.58 samples/s
[Epoch 27 Batch 29000] loss 3.22, ppl 25.03, throughput 5750.08 samples/s
[Epoch 27 Batch 30000] loss 3.23, ppl 25.23, throughput 5579.58 samples/s
[Epoch 27 Batch 31000] loss 3.23, ppl 25.28, throughput 5694.47 samples/s
[Epoch 27 Batch 32000] loss 3.23, ppl 25.23, throughput 5772.28 samples/s
[Epoch 27 Batch 33000] loss 3.23, ppl 25.20, throughput 5748.96 samples/s
[Epoch 27 Batch 34000] loss 3.23, ppl 25.32, throughput 5548.76 samples/s
[Epoch 27 Batch 35000] loss 3.23, ppl 25.29, throughput 5774.08 samples/s
[Epoch 27 Batch 36000] loss 3.23, ppl 25.20, throughput 5732.36 samples/s
[Epoch 27 Batch 37000] loss 3.23, ppl 25.31, throughput 5225.88 samples/s
[Epoch 27 Batch 38000] loss 3.23, ppl 25.24, throughput 4486.21 samples/s
[Epoch 27 Batch 39000] loss 3.23, ppl 25.35, throughput 4606.67 samples/s
[Epoch 27 Batch 40000] loss 3.23, ppl 25.33, throughput 4576.38 samples/s
[Epoch 27 Batch 41000] loss 3.23, ppl 25.27, throughput 4499.47 samples/s
[Epoch 27 Batch 42000] loss 3.23, ppl 25.33, throughput 4605.51 samples/s
[Epoch 27 Batch 43000] loss 3.23, ppl 25.33, throughput 5498.99 samples/s
[Epoch 27 Batch 44000] loss 3.23, ppl 25.38, throughput 5758.27 samples/s
[Epoch 27 Batch 45000] loss 3.23, ppl 25.24, throughput 5555.47 samples/s
[Epoch 27 Batch 46000] loss 3.23, ppl 25.24, throughput 5745.48 samples/s
[Epoch 27 Batch 47000] loss 3.23, ppl 25.16, throughput 5758.50 samples/s
[Epoch 27 Batch 48000] loss 3.23, ppl 25.17, throughput 5726.27 samples/s
[Epoch 27 Batch 49000] loss 3.23, ppl 25.29, throughput 5573.15 samples/s
[Epoch 27 Batch 50000] loss 3.23, ppl 25.18, throughput 5703.12 samples/s
[Epoch 27 Batch 51000] loss 3.23, ppl 25.22, throughput 5710.53 samples/s
[Epoch 27 Batch 52000] loss 3.23, ppl 25.37, throughput 5722.77 samples/s
[Epoch 27 Batch 53000] loss 3.23, ppl 25.29, throughput 5592.97 samples/s
[Epoch 27 Batch 54000] loss 3.23, ppl 25.30, throughput 5733.67 samples/s
[Epoch 27 Batch 55000] loss 3.23, ppl 25.25, throughput 5726.58 samples/s
[Epoch 27 Batch 56000] loss 3.23, ppl 25.26, throughput 5600.88 samples/s
[Epoch 27 Batch 57000] loss 3.23, ppl 25.26, throughput 5735.10 samples/s
[Epoch 27 Batch 58000] loss 3.23, ppl 25.29, throughput 5716.87 samples/s
[Epoch 27 Batch 59000] loss 3.23, ppl 25.37, throughput 5768.67 samples/s
[Epoch 27 Batch 60000] loss 3.23, ppl 25.40, throughput 5530.65 samples/s
[Epoch 27 Batch 61000] loss 3.23, ppl 25.37, throughput 5733.79 samples/s
[Epoch 27 Batch 62000] loss 3.23, ppl 25.38, throughput 5762.78 samples/s
[Epoch 27 Batch 63000] loss 3.23, ppl 25.32, throughput 5720.22 samples/s
[Epoch 27 Batch 64000] loss 3.23, ppl 25.32, throughput 5593.85 samples/s
[Epoch 27 Batch 65000] loss 3.24, ppl 25.48, throughput 5741.70 samples/s
[Epoch 27 Batch 66000] loss 3.23, ppl 25.30, throughput 5686.01 samples/s
[Epoch 27 Batch 67000] loss 3.24, ppl 25.41, throughput 5579.89 samples/s
[Epoch 27 Batch 68000] loss 3.23, ppl 25.33, throughput 5706.84 samples/s
[Epoch 27 Batch 69000] loss 3.23, ppl 25.35, throughput 5778.38 samples/s
[Epoch 27 Batch 70000] loss 3.23, ppl 25.39, throughput 5740.58 samples/s
[Epoch 27 Batch 71000] loss 3.24, ppl 25.41, throughput 5536.63 samples/s
[Epoch 27 Batch 72000] loss 3.23, ppl 25.36, throughput 5696.68 samples/s
[Epoch 27 Batch 73000] loss 3.23, ppl 25.35, throughput 5717.97 samples/s
[Epoch 27 Batch 74000] loss 3.23, ppl 25.39, throughput 5738.56 samples/s
[Epoch 27 Batch 75000] loss 3.23, ppl 25.38, throughput 5558.16 samples/s
[Epoch 27 Batch 76000] loss 3.23, ppl 25.38, throughput 5744.68 samples/s
[Epoch 27 Batch 77000] loss 3.23, ppl 25.33, throughput 5746.88 samples/s
[Epoch 27 Batch 78000] loss 3.23, ppl 25.36, throughput 5717.86 samples/s
Epoch 27 took 7256.25 seconds.
[Epoch 28 Batch 1000] loss 3.21, ppl 24.81, throughput 5605.49 samples/s
[Epoch 28 Batch 2000] loss 3.22, ppl 25.01, throughput 5719.44 samples/s
[Epoch 28 Batch 3000] loss 3.22, ppl 24.95, throughput 5738.80 samples/s
[Epoch 28 Batch 4000] loss 3.21, ppl 24.81, throughput 5552.19 samples/s
[Epoch 28 Batch 5000] loss 3.21, ppl 24.86, throughput 5699.02 samples/s
[Epoch 28 Batch 6000] loss 3.21, ppl 24.90, throughput 4819.83 samples/s
[Epoch 28 Batch 7000] loss 3.22, ppl 24.92, throughput 4570.71 samples/s
[Epoch 28 Batch 8000] loss 3.22, ppl 24.94, throughput 4462.39 samples/s
[Epoch 28 Batch 9000] loss 3.22, ppl 25.01, throughput 4574.09 samples/s
[Epoch 28 Batch 10000] loss 3.22, ppl 25.00, throughput 4210.70 samples/s
[Epoch 28 Batch 11000] loss 3.22, ppl 25.08, throughput 4202.70 samples/s
[Epoch 28 Batch 12000] loss 3.22, ppl 25.06, throughput 4349.22 samples/s
[Epoch 28 Batch 13000] loss 3.21, ppl 24.89, throughput 4620.78 samples/s
[Epoch 28 Batch 14000] loss 3.22, ppl 25.13, throughput 4617.27 samples/s
[Epoch 28 Batch 15000] loss 3.22, ppl 25.01, throughput 4479.92 samples/s
[Epoch 28 Batch 16000] loss 3.22, ppl 24.93, throughput 5003.36 samples/s
[Epoch 28 Batch 17000] loss 3.22, ppl 25.11, throughput 5693.00 samples/s
[Epoch 28 Batch 18000] loss 3.22, ppl 24.95, throughput 5724.23 samples/s
[Epoch 28 Batch 19000] loss 3.22, ppl 25.10, throughput 5550.58 samples/s
[Epoch 28 Batch 20000] loss 3.22, ppl 25.06, throughput 5734.04 samples/s
[Epoch 28 Batch 21000] loss 3.22, ppl 25.00, throughput 5750.82 samples/s
[Epoch 28 Batch 22000] loss 3.22, ppl 25.00, throughput 5681.62 samples/s
[Epoch 28 Batch 23000] loss 3.22, ppl 25.08, throughput 5533.27 samples/s
[Epoch 28 Batch 24000] loss 3.22, ppl 25.04, throughput 5728.56 samples/s
[Epoch 28 Batch 25000] loss 3.22, ppl 25.05, throughput 5748.40 samples/s
[Epoch 28 Batch 26000] loss 3.22, ppl 25.14, throughput 5774.98 samples/s
[Epoch 28 Batch 27000] loss 3.22, ppl 25.07, throughput 5556.32 samples/s
[Epoch 28 Batch 28000] loss 3.23, ppl 25.17, throughput 5713.51 samples/s
[Epoch 28 Batch 29000] loss 3.22, ppl 25.03, throughput 5760.31 samples/s
[Epoch 28 Batch 30000] loss 3.22, ppl 25.08, throughput 5580.46 samples/s
[Epoch 28 Batch 31000] loss 3.22, ppl 25.07, throughput 5763.29 samples/s
[Epoch 28 Batch 32000] loss 3.22, ppl 25.00, throughput 5672.49 samples/s
[Epoch 28 Batch 33000] loss 3.22, ppl 25.11, throughput 5695.54 samples/s
[Epoch 28 Batch 34000] loss 3.23, ppl 25.23, throughput 5573.57 samples/s
[Epoch 28 Batch 35000] loss 3.23, ppl 25.19, throughput 5710.69 samples/s
[Epoch 28 Batch 36000] loss 3.23, ppl 25.28, throughput 5691.68 samples/s
[Epoch 28 Batch 37000] loss 3.23, ppl 25.22, throughput 5733.83 samples/s
[Epoch 28 Batch 38000] loss 3.22, ppl 25.11, throughput 5611.12 samples/s
[Epoch 28 Batch 39000] loss 3.23, ppl 25.16, throughput 5708.98 samples/s
[Epoch 28 Batch 40000] loss 3.22, ppl 25.05, throughput 5725.46 samples/s
[Epoch 28 Batch 41000] loss 3.22, ppl 25.15, throughput 5565.56 samples/s
[Epoch 28 Batch 42000] loss 3.23, ppl 25.19, throughput 5706.41 samples/s
[Epoch 28 Batch 43000] loss 3.22, ppl 25.12, throughput 5770.82 samples/s
[Epoch 28 Batch 44000] loss 3.22, ppl 25.07, throughput 5769.37 samples/s
[Epoch 28 Batch 45000] loss 3.23, ppl 25.23, throughput 5530.62 samples/s
[Epoch 28 Batch 46000] loss 3.23, ppl 25.28, throughput 5814.35 samples/s
[Epoch 28 Batch 47000] loss 3.23, ppl 25.22, throughput 5720.34 samples/s
[Epoch 28 Batch 48000] loss 3.23, ppl 25.22, throughput 5753.68 samples/s
[Epoch 28 Batch 49000] loss 3.23, ppl 25.26, throughput 5630.15 samples/s
[Epoch 28 Batch 50000] loss 3.23, ppl 25.25, throughput 5722.16 samples/s
[Epoch 28 Batch 51000] loss 3.23, ppl 25.29, throughput 5771.57 samples/s
[Epoch 28 Batch 52000] loss 3.23, ppl 25.20, throughput 5746.93 samples/s
[Epoch 28 Batch 53000] loss 3.22, ppl 25.15, throughput 5590.44 samples/s
[Epoch 28 Batch 54000] loss 3.23, ppl 25.29, throughput 5763.02 samples/s
[Epoch 28 Batch 55000] loss 3.23, ppl 25.20, throughput 5766.68 samples/s
[Epoch 28 Batch 56000] loss 3.23, ppl 25.20, throughput 5631.14 samples/s
[Epoch 28 Batch 57000] loss 3.23, ppl 25.20, throughput 5751.55 samples/s
[Epoch 28 Batch 58000] loss 3.23, ppl 25.32, throughput 5731.65 samples/s
[Epoch 28 Batch 59000] loss 3.23, ppl 25.30, throughput 5712.27 samples/s
[Epoch 28 Batch 60000] loss 3.23, ppl 25.32, throughput 5568.73 samples/s
[Epoch 28 Batch 61000] loss 3.23, ppl 25.34, throughput 5778.42 samples/s
[Epoch 28 Batch 62000] loss 3.23, ppl 25.22, throughput 5722.89 samples/s
[Epoch 28 Batch 63000] loss 3.23, ppl 25.28, throughput 4926.58 samples/s
[Epoch 28 Batch 64000] loss 3.23, ppl 25.30, throughput 4459.47 samples/s
[Epoch 28 Batch 65000] loss 3.23, ppl 25.26, throughput 4548.69 samples/s
[Epoch 28 Batch 66000] loss 3.23, ppl 25.29, throughput 4645.54 samples/s
[Epoch 28 Batch 67000] loss 3.23, ppl 25.37, throughput 4474.35 samples/s
[Epoch 28 Batch 68000] loss 3.23, ppl 25.34, throughput 4677.90 samples/s
[Epoch 28 Batch 69000] loss 3.23, ppl 25.37, throughput 5701.15 samples/s
[Epoch 28 Batch 70000] loss 3.23, ppl 25.27, throughput 5702.42 samples/s
[Epoch 28 Batch 71000] loss 3.23, ppl 25.37, throughput 5563.94 samples/s
[Epoch 28 Batch 72000] loss 3.23, ppl 25.28, throughput 5744.22 samples/s
[Epoch 28 Batch 73000] loss 3.23, ppl 25.31, throughput 5773.71 samples/s
[Epoch 28 Batch 74000] loss 3.23, ppl 25.32, throughput 5760.68 samples/s
[Epoch 28 Batch 75000] loss 3.23, ppl 25.39, throughput 5584.24 samples/s
[Epoch 28 Batch 76000] loss 3.23, ppl 25.25, throughput 5697.55 samples/s
[Epoch 28 Batch 77000] loss 3.23, ppl 25.23, throughput 5781.99 samples/s
[Epoch 28 Batch 78000] loss 3.23, ppl 25.34, throughput 5703.16 samples/s
Epoch 28 took 7399.69 seconds.
[Epoch 29 Batch 1000] loss 3.21, ppl 24.77, throughput 5623.15 samples/s
[Epoch 29 Batch 2000] loss 3.20, ppl 24.62, throughput 5677.89 samples/s
[Epoch 29 Batch 3000] loss 3.21, ppl 24.81, throughput 5754.99 samples/s
[Epoch 29 Batch 4000] loss 3.22, ppl 24.94, throughput 5560.14 samples/s
[Epoch 29 Batch 5000] loss 3.22, ppl 24.99, throughput 5318.75 samples/s
[Epoch 29 Batch 6000] loss 3.21, ppl 24.86, throughput 5285.87 samples/s
[Epoch 29 Batch 7000] loss 3.22, ppl 25.01, throughput 5266.98 samples/s
[Epoch 29 Batch 8000] loss 3.21, ppl 24.86, throughput 5179.32 samples/s
[Epoch 29 Batch 9000] loss 3.21, ppl 24.68, throughput 5290.07 samples/s
[Epoch 29 Batch 10000] loss 3.20, ppl 24.64, throughput 4711.65 samples/s
[Epoch 29 Batch 11000] loss 3.21, ppl 24.77, throughput 5537.94 samples/s
[Epoch 29 Batch 12000] loss 3.22, ppl 25.02, throughput 5589.78 samples/s
[Epoch 29 Batch 13000] loss 3.22, ppl 24.98, throughput 5731.20 samples/s
[Epoch 29 Batch 14000] loss 3.21, ppl 24.87, throughput 5790.49 samples/s
[Epoch 29 Batch 15000] loss 3.22, ppl 25.01, throughput 5551.19 samples/s
[Epoch 29 Batch 16000] loss 3.22, ppl 24.99, throughput 5696.65 samples/s
[Epoch 29 Batch 17000] loss 3.22, ppl 24.99, throughput 5768.21 samples/s
[Epoch 29 Batch 18000] loss 3.22, ppl 24.99, throughput 5746.21 samples/s
[Epoch 29 Batch 19000] loss 3.22, ppl 25.05, throughput 5586.48 samples/s
[Epoch 29 Batch 20000] loss 3.22, ppl 24.99, throughput 5672.69 samples/s
[Epoch 29 Batch 21000] loss 3.22, ppl 24.99, throughput 5737.62 samples/s
[Epoch 29 Batch 22000] loss 3.22, ppl 25.01, throughput 5760.80 samples/s
[Epoch 29 Batch 23000] loss 3.22, ppl 25.02, throughput 5550.99 samples/s
[Epoch 29 Batch 24000] loss 3.21, ppl 24.89, throughput 5750.12 samples/s
[Epoch 29 Batch 25000] loss 3.22, ppl 25.03, throughput 5704.82 samples/s
[Epoch 29 Batch 26000] loss 3.22, ppl 25.12, throughput 5737.54 samples/s
[Epoch 29 Batch 27000] loss 3.22, ppl 25.06, throughput 5545.17 samples/s
[Epoch 29 Batch 28000] loss 3.22, ppl 25.06, throughput 5656.25 samples/s
[Epoch 29 Batch 29000] loss 3.22, ppl 25.11, throughput 5721.67 samples/s
[Epoch 29 Batch 30000] loss 3.22, ppl 25.13, throughput 5568.62 samples/s
[Epoch 29 Batch 31000] loss 3.22, ppl 25.04, throughput 5777.15 samples/s
[Epoch 29 Batch 32000] loss 3.22, ppl 25.09, throughput 5708.30 samples/s
[Epoch 29 Batch 33000] loss 3.22, ppl 25.07, throughput 5727.67 samples/s
[Epoch 29 Batch 34000] loss 3.22, ppl 25.07, throughput 5534.81 samples/s
[Epoch 29 Batch 35000] loss 3.22, ppl 25.14, throughput 5728.75 samples/s
[Epoch 29 Batch 36000] loss 3.22, ppl 25.00, throughput 5027.35 samples/s
[Epoch 29 Batch 37000] loss 3.22, ppl 25.09, throughput 4609.77 samples/s
[Epoch 29 Batch 38000] loss 3.22, ppl 25.09, throughput 4465.63 samples/s
[Epoch 29 Batch 39000] loss 3.22, ppl 25.03, throughput 4593.06 samples/s
[Epoch 29 Batch 40000] loss 3.23, ppl 25.20, throughput 4616.03 samples/s
[Epoch 29 Batch 41000] loss 3.22, ppl 25.11, throughput 4456.71 samples/s
[Epoch 29 Batch 42000] loss 3.22, ppl 25.06, throughput 5617.07 samples/s
[Epoch 29 Batch 43000] loss 3.22, ppl 25.08, throughput 5754.35 samples/s
[Epoch 29 Batch 44000] loss 3.23, ppl 25.20, throughput 5717.43 samples/s
[Epoch 29 Batch 45000] loss 3.22, ppl 25.02, throughput 5598.10 samples/s
[Epoch 29 Batch 46000] loss 3.23, ppl 25.21, throughput 5681.05 samples/s
[Epoch 29 Batch 47000] loss 3.22, ppl 25.09, throughput 5729.29 samples/s
[Epoch 29 Batch 48000] loss 3.22, ppl 25.13, throughput 5750.19 samples/s
[Epoch 29 Batch 49000] loss 3.23, ppl 25.19, throughput 5543.55 samples/s
[Epoch 29 Batch 50000] loss 3.22, ppl 25.12, throughput 5681.14 samples/s
[Epoch 29 Batch 51000] loss 3.22, ppl 25.11, throughput 5728.03 samples/s
[Epoch 29 Batch 52000] loss 3.22, ppl 25.13, throughput 5703.67 samples/s
[Epoch 29 Batch 53000] loss 3.22, ppl 25.12, throughput 5539.51 samples/s
[Epoch 29 Batch 54000] loss 3.22, ppl 25.15, throughput 5720.60 samples/s
[Epoch 29 Batch 55000] loss 3.23, ppl 25.16, throughput 5748.49 samples/s
[Epoch 29 Batch 56000] loss 3.22, ppl 25.15, throughput 5565.23 samples/s
[Epoch 29 Batch 57000] loss 3.23, ppl 25.19, throughput 5733.90 samples/s
[Epoch 29 Batch 58000] loss 3.23, ppl 25.21, throughput 5756.06 samples/s
[Epoch 29 Batch 59000] loss 3.23, ppl 25.23, throughput 5723.14 samples/s
[Epoch 29 Batch 60000] loss 3.23, ppl 25.18, throughput 5615.58 samples/s
[Epoch 29 Batch 61000] loss 3.23, ppl 25.24, throughput 5734.54 samples/s
[Epoch 29 Batch 62000] loss 3.22, ppl 25.15, throughput 5728.33 samples/s
[Epoch 29 Batch 63000] loss 3.22, ppl 25.11, throughput 5712.40 samples/s
[Epoch 29 Batch 64000] loss 3.22, ppl 25.13, throughput 5468.61 samples/s
[Epoch 29 Batch 65000] loss 3.23, ppl 25.22, throughput 5799.57 samples/s
[Epoch 29 Batch 66000] loss 3.23, ppl 25.31, throughput 5772.97 samples/s
[Epoch 29 Batch 67000] loss 3.23, ppl 25.26, throughput 5557.65 samples/s
[Epoch 29 Batch 68000] loss 3.23, ppl 25.21, throughput 5735.39 samples/s
[Epoch 29 Batch 69000] loss 3.23, ppl 25.23, throughput 5682.69 samples/s
[Epoch 29 Batch 70000] loss 3.23, ppl 25.20, throughput 5694.59 samples/s
[Epoch 29 Batch 71000] loss 3.23, ppl 25.22, throughput 5584.02 samples/s
[Epoch 29 Batch 72000] loss 3.23, ppl 25.23, throughput 5725.00 samples/s
[Epoch 29 Batch 73000] loss 3.23, ppl 25.18, throughput 5768.03 samples/s
[Epoch 29 Batch 74000] loss 3.23, ppl 25.16, throughput 5734.23 samples/s
[Epoch 29 Batch 75000] loss 3.23, ppl 25.32, throughput 5585.47 samples/s
[Epoch 29 Batch 76000] loss 3.23, ppl 25.25, throughput 5728.93 samples/s
[Epoch 29 Batch 77000] loss 3.23, ppl 25.26, throughput 5731.03 samples/s
[Epoch 29 Batch 78000] loss 3.23, ppl 25.27, throughput 5751.64 samples/s
Epoch 29 took 7212.51 seconds.
[Epoch 30 Batch 1000] loss 3.21, ppl 24.77, throughput 5608.12 samples/s
[Epoch 30 Batch 2000] loss 3.21, ppl 24.87, throughput 5713.76 samples/s
[Epoch 30 Batch 3000] loss 3.21, ppl 24.69, throughput 5742.90 samples/s
[Epoch 30 Batch 4000] loss 3.21, ppl 24.80, throughput 5125.47 samples/s
[Epoch 30 Batch 5000] loss 3.21, ppl 24.67, throughput 4575.02 samples/s
[Epoch 30 Batch 6000] loss 3.21, ppl 24.84, throughput 4600.02 samples/s
[Epoch 30 Batch 7000] loss 3.21, ppl 24.76, throughput 4615.06 samples/s
[Epoch 30 Batch 8000] loss 3.21, ppl 24.77, throughput 4456.98 samples/s
[Epoch 30 Batch 9000] loss 3.21, ppl 24.88, throughput 4098.04 samples/s
[Epoch 30 Batch 10000] loss 3.22, ppl 24.98, throughput 4330.87 samples/s
[Epoch 30 Batch 11000] loss 3.21, ppl 24.82, throughput 4590.74 samples/s
[Epoch 30 Batch 12000] loss 3.21, ppl 24.84, throughput 4455.84 samples/s
[Epoch 30 Batch 13000] loss 3.22, ppl 25.00, throughput 4635.66 samples/s
[Epoch 30 Batch 14000] loss 3.21, ppl 24.80, throughput 4536.58 samples/s
[Epoch 30 Batch 15000] loss 3.21, ppl 24.70, throughput 5322.92 samples/s
[Epoch 30 Batch 16000] loss 3.22, ppl 25.05, throughput 5745.25 samples/s
[Epoch 30 Batch 17000] loss 3.22, ppl 24.91, throughput 5777.28 samples/s
[Epoch 30 Batch 18000] loss 3.22, ppl 24.93, throughput 5729.05 samples/s
[Epoch 30 Batch 19000] loss 3.22, ppl 25.00, throughput 5582.60 samples/s
[Epoch 30 Batch 20000] loss 3.22, ppl 25.05, throughput 5693.05 samples/s
[Epoch 30 Batch 21000] loss 3.22, ppl 25.00, throughput 5742.74 samples/s
[Epoch 30 Batch 22000] loss 3.22, ppl 25.02, throughput 5765.33 samples/s
[Epoch 30 Batch 23000] loss 3.22, ppl 25.01, throughput 5613.76 samples/s
[Epoch 30 Batch 24000] loss 3.22, ppl 24.99, throughput 5726.03 samples/s
[Epoch 30 Batch 25000] loss 3.22, ppl 24.95, throughput 5729.21 samples/s
[Epoch 30 Batch 26000] loss 3.22, ppl 24.99, throughput 5722.89 samples/s
[Epoch 30 Batch 27000] loss 3.22, ppl 24.99, throughput 5608.54 samples/s
[Epoch 30 Batch 28000] loss 3.22, ppl 25.07, throughput 5729.08 samples/s
[Epoch 30 Batch 29000] loss 3.22, ppl 24.96, throughput 5713.26 samples/s
[Epoch 30 Batch 30000] loss 3.21, ppl 24.88, throughput 5578.68 samples/s
[Epoch 30 Batch 31000] loss 3.22, ppl 25.00, throughput 5716.06 samples/s
[Epoch 30 Batch 32000] loss 3.22, ppl 24.97, throughput 5741.56 samples/s
[Epoch 30 Batch 33000] loss 3.22, ppl 24.96, throughput 5709.17 samples/s
[Epoch 30 Batch 34000] loss 3.22, ppl 25.03, throughput 5592.56 samples/s
[Epoch 30 Batch 35000] loss 3.22, ppl 24.95, throughput 5718.20 samples/s
[Epoch 30 Batch 36000] loss 3.22, ppl 25.00, throughput 5771.45 samples/s
[Epoch 30 Batch 37000] loss 3.22, ppl 25.02, throughput 5736.16 samples/s
[Epoch 30 Batch 38000] loss 3.22, ppl 25.00, throughput 5601.19 samples/s
[Epoch 30 Batch 39000] loss 3.22, ppl 25.05, throughput 5756.32 samples/s
[Epoch 30 Batch 40000] loss 3.22, ppl 25.01, throughput 5723.41 samples/s
[Epoch 30 Batch 41000] loss 3.22, ppl 25.08, throughput 5579.59 samples/s
[Epoch 30 Batch 42000] loss 3.22, ppl 25.00, throughput 5746.83 samples/s
[Epoch 30 Batch 43000] loss 3.22, ppl 25.05, throughput 5724.96 samples/s
[Epoch 30 Batch 44000] loss 3.22, ppl 25.09, throughput 5733.15 samples/s
[Epoch 30 Batch 45000] loss 3.22, ppl 25.00, throughput 5608.48 samples/s
[Epoch 30 Batch 46000] loss 3.23, ppl 25.18, throughput 5745.30 samples/s
[Epoch 30 Batch 47000] loss 3.22, ppl 25.12, throughput 5747.03 samples/s
[Epoch 30 Batch 48000] loss 3.22, ppl 25.03, throughput 5725.33 samples/s
[Epoch 30 Batch 49000] loss 3.22, ppl 25.01, throughput 5578.89 samples/s
[Epoch 30 Batch 50000] loss 3.22, ppl 25.08, throughput 5727.54 samples/s
[Epoch 30 Batch 51000] loss 3.22, ppl 25.02, throughput 5701.02 samples/s
[Epoch 30 Batch 52000] loss 3.22, ppl 25.06, throughput 5711.99 samples/s
[Epoch 30 Batch 53000] loss 3.22, ppl 25.04, throughput 5520.50 samples/s
[Epoch 30 Batch 54000] loss 3.22, ppl 25.13, throughput 5679.87 samples/s
[Epoch 30 Batch 55000] loss 3.22, ppl 25.05, throughput 5694.71 samples/s
[Epoch 30 Batch 56000] loss 3.22, ppl 25.05, throughput 5669.68 samples/s
[Epoch 30 Batch 57000] loss 3.22, ppl 24.98, throughput 5766.17 samples/s
[Epoch 30 Batch 58000] loss 3.22, ppl 25.02, throughput 5759.12 samples/s
[Epoch 30 Batch 59000] loss 3.22, ppl 25.03, throughput 5679.02 samples/s
[Epoch 30 Batch 60000] loss 3.23, ppl 25.18, throughput 5527.64 samples/s
[Epoch 30 Batch 61000] loss 3.23, ppl 25.16, throughput 5476.41 samples/s
[Epoch 30 Batch 62000] loss 3.22, ppl 25.08, throughput 4621.23 samples/s
[Epoch 30 Batch 63000] loss 3.22, ppl 25.11, throughput 4578.64 samples/s
[Epoch 30 Batch 64000] loss 3.22, ppl 25.12, throughput 4451.25 samples/s
[Epoch 30 Batch 65000] loss 3.22, ppl 25.11, throughput 4614.64 samples/s
[Epoch 30 Batch 66000] loss 3.22, ppl 25.15, throughput 4601.85 samples/s
[Epoch 30 Batch 67000] loss 3.22, ppl 25.15, throughput 5033.38 samples/s
[Epoch 30 Batch 68000] loss 3.23, ppl 25.21, throughput 5781.94 samples/s
[Epoch 30 Batch 69000] loss 3.22, ppl 25.08, throughput 5700.45 samples/s
[Epoch 30 Batch 70000] loss 3.23, ppl 25.20, throughput 5747.85 samples/s
[Epoch 30 Batch 71000] loss 3.22, ppl 25.14, throughput 5555.08 samples/s
[Epoch 30 Batch 72000] loss 3.23, ppl 25.21, throughput 5769.99 samples/s
[Epoch 30 Batch 73000] loss 3.23, ppl 25.16, throughput 5804.20 samples/s
[Epoch 30 Batch 74000] loss 3.22, ppl 25.11, throughput 5713.58 samples/s
[Epoch 30 Batch 75000] loss 3.23, ppl 25.18, throughput 5573.87 samples/s
[Epoch 30 Batch 76000] loss 3.22, ppl 25.09, throughput 5774.06 samples/s
[Epoch 30 Batch 77000] loss 3.23, ppl 25.20, throughput 5745.65 samples/s
[Epoch 30 Batch 78000] loss 3.23, ppl 25.20, throughput 5759.98 samples/s
Epoch 30 took 7398.99 seconds.
[Epoch 31 Batch 1000] loss 3.20, ppl 24.51, throughput 5609.30 samples/s
[Epoch 31 Batch 2000] loss 3.20, ppl 24.62, throughput 5730.00 samples/s
[Epoch 31 Batch 3000] loss 3.19, ppl 24.33, throughput 5290.57 samples/s
[Epoch 31 Batch 4000] loss 3.21, ppl 24.76, throughput 5157.76 samples/s
[Epoch 31 Batch 5000] loss 3.21, ppl 24.70, throughput 5268.80 samples/s
[Epoch 31 Batch 6000] loss 3.20, ppl 24.43, throughput 5358.25 samples/s
[Epoch 31 Batch 7000] loss 3.21, ppl 24.67, throughput 5304.42 samples/s
[Epoch 31 Batch 8000] loss 3.21, ppl 24.90, throughput 5120.97 samples/s
[Epoch 31 Batch 9000] loss 3.22, ppl 24.97, throughput 5617.28 samples/s
[Epoch 31 Batch 10000] loss 3.21, ppl 24.77, throughput 5711.48 samples/s
[Epoch 31 Batch 11000] loss 3.21, ppl 24.88, throughput 5805.00 samples/s
[Epoch 31 Batch 12000] loss 3.22, ppl 24.97, throughput 5596.26 samples/s
[Epoch 31 Batch 13000] loss 3.21, ppl 24.74, throughput 5759.68 samples/s
[Epoch 31 Batch 14000] loss 3.21, ppl 24.67, throughput 5700.39 samples/s
[Epoch 31 Batch 15000] loss 3.21, ppl 24.89, throughput 5598.33 samples/s
[Epoch 31 Batch 16000] loss 3.21, ppl 24.78, throughput 5739.18 samples/s
[Epoch 31 Batch 17000] loss 3.21, ppl 24.82, throughput 5802.87 samples/s
[Epoch 31 Batch 18000] loss 3.21, ppl 24.88, throughput 5757.06 samples/s
[Epoch 31 Batch 19000] loss 3.22, ppl 24.93, throughput 5614.90 samples/s
[Epoch 31 Batch 20000] loss 3.22, ppl 24.95, throughput 5707.61 samples/s
[Epoch 31 Batch 21000] loss 3.21, ppl 24.87, throughput 5751.06 samples/s
[Epoch 31 Batch 22000] loss 3.21, ppl 24.83, throughput 5733.33 samples/s
[Epoch 31 Batch 23000] loss 3.21, ppl 24.88, throughput 5595.18 samples/s
[Epoch 31 Batch 24000] loss 3.21, ppl 24.84, throughput 5755.16 samples/s
[Epoch 31 Batch 25000] loss 3.22, ppl 24.93, throughput 5719.23 samples/s
[Epoch 31 Batch 26000] loss 3.22, ppl 25.03, throughput 5743.31 samples/s
[Epoch 31 Batch 27000] loss 3.21, ppl 24.86, throughput 5633.78 samples/s
[Epoch 31 Batch 28000] loss 3.22, ppl 24.94, throughput 5723.11 samples/s
[Epoch 31 Batch 29000] loss 3.22, ppl 25.01, throughput 5745.62 samples/s
[Epoch 31 Batch 30000] loss 3.21, ppl 24.77, throughput 5592.82 samples/s
[Epoch 31 Batch 31000] loss 3.22, ppl 24.93, throughput 5753.13 samples/s
[Epoch 31 Batch 32000] loss 3.22, ppl 24.96, throughput 5689.05 samples/s
[Epoch 31 Batch 33000] loss 3.21, ppl 24.86, throughput 5735.52 samples/s
[Epoch 31 Batch 34000] loss 3.22, ppl 24.99, throughput 5558.23 samples/s
[Epoch 31 Batch 35000] loss 3.22, ppl 24.93, throughput 4887.23 samples/s
[Epoch 31 Batch 36000] loss 3.22, ppl 24.94, throughput 4594.64 samples/s
[Epoch 31 Batch 37000] loss 3.22, ppl 24.97, throughput 4558.57 samples/s
[Epoch 31 Batch 38000] loss 3.22, ppl 25.00, throughput 4463.16 samples/s
[Epoch 31 Batch 39000] loss 3.22, ppl 24.95, throughput 4503.82 samples/s
[Epoch 31 Batch 40000] loss 3.21, ppl 24.84, throughput 4641.15 samples/s
[Epoch 31 Batch 41000] loss 3.22, ppl 24.97, throughput 5555.48 samples/s
[Epoch 31 Batch 42000] loss 3.22, ppl 24.94, throughput 5735.40 samples/s
[Epoch 31 Batch 43000] loss 3.22, ppl 24.91, throughput 5721.82 samples/s
[Epoch 31 Batch 44000] loss 3.22, ppl 25.09, throughput 5724.47 samples/s
[Epoch 31 Batch 45000] loss 3.22, ppl 25.04, throughput 5563.33 samples/s
[Epoch 31 Batch 46000] loss 3.22, ppl 24.94, throughput 5766.98 samples/s
[Epoch 31 Batch 47000] loss 3.22, ppl 25.11, throughput 5712.36 samples/s
[Epoch 31 Batch 48000] loss 3.22, ppl 25.04, throughput 5712.57 samples/s
[Epoch 31 Batch 49000] loss 3.22, ppl 24.96, throughput 5502.56 samples/s
[Epoch 31 Batch 50000] loss 3.22, ppl 25.00, throughput 5712.93 samples/s
[Epoch 31 Batch 51000] loss 3.22, ppl 25.02, throughput 5717.69 samples/s
[Epoch 31 Batch 52000] loss 3.22, ppl 25.01, throughput 5777.73 samples/s
[Epoch 31 Batch 53000] loss 3.22, ppl 25.03, throughput 5554.07 samples/s
[Epoch 31 Batch 54000] loss 3.22, ppl 25.06, throughput 5722.96 samples/s
[Epoch 31 Batch 55000] loss 3.22, ppl 25.05, throughput 5698.32 samples/s
[Epoch 31 Batch 56000] loss 3.22, ppl 24.99, throughput 5532.97 samples/s
[Epoch 31 Batch 57000] loss 3.23, ppl 25.17, throughput 5778.01 samples/s
[Epoch 31 Batch 58000] loss 3.22, ppl 25.07, throughput 5699.78 samples/s
[Epoch 31 Batch 59000] loss 3.22, ppl 25.08, throughput 5710.26 samples/s
[Epoch 31 Batch 60000] loss 3.22, ppl 25.04, throughput 5577.03 samples/s
[Epoch 31 Batch 61000] loss 3.22, ppl 25.00, throughput 5736.75 samples/s
[Epoch 31 Batch 62000] loss 3.22, ppl 25.13, throughput 5688.92 samples/s
[Epoch 31 Batch 63000] loss 3.22, ppl 25.13, throughput 5704.55 samples/s
[Epoch 31 Batch 64000] loss 3.22, ppl 25.02, throughput 5595.81 samples/s
[Epoch 31 Batch 65000] loss 3.22, ppl 25.00, throughput 5747.29 samples/s
[Epoch 31 Batch 66000] loss 3.22, ppl 25.13, throughput 5796.32 samples/s
[Epoch 31 Batch 67000] loss 3.22, ppl 25.15, throughput 5549.71 samples/s
[Epoch 31 Batch 68000] loss 3.22, ppl 24.99, throughput 5698.73 samples/s
[Epoch 31 Batch 69000] loss 3.22, ppl 25.01, throughput 5706.77 samples/s
[Epoch 31 Batch 70000] loss 3.22, ppl 25.04, throughput 5678.02 samples/s
[Epoch 31 Batch 71000] loss 3.22, ppl 25.10, throughput 5592.88 samples/s
[Epoch 31 Batch 72000] loss 3.22, ppl 25.08, throughput 5728.92 samples/s
[Epoch 31 Batch 73000] loss 3.22, ppl 25.07, throughput 5711.47 samples/s
[Epoch 31 Batch 74000] loss 3.22, ppl 25.08, throughput 5734.80 samples/s
[Epoch 31 Batch 75000] loss 3.22, ppl 25.06, throughput 5627.79 samples/s
[Epoch 31 Batch 76000] loss 3.22, ppl 25.07, throughput 5731.64 samples/s
[Epoch 31 Batch 77000] loss 3.22, ppl 25.15, throughput 5754.90 samples/s
[Epoch 31 Batch 78000] loss 3.22, ppl 25.08, throughput 5768.35 samples/s
Epoch 31 took 7197.20 seconds.
[Epoch 32 Batch 1000] loss 3.21, ppl 24.70, throughput 5586.28 samples/s
[Epoch 32 Batch 2000] loss 3.20, ppl 24.58, throughput 4769.25 samples/s
[Epoch 32 Batch 3000] loss 3.20, ppl 24.51, throughput 4185.82 samples/s
[Epoch 32 Batch 4000] loss 3.21, ppl 24.66, throughput 4074.27 samples/s
[Epoch 32 Batch 5000] loss 3.21, ppl 24.73, throughput 4137.07 samples/s
[Epoch 32 Batch 6000] loss 3.21, ppl 24.80, throughput 4102.48 samples/s
[Epoch 32 Batch 7000] loss 3.21, ppl 24.77, throughput 4230.06 samples/s
[Epoch 32 Batch 8000] loss 3.20, ppl 24.46, throughput 4036.27 samples/s
[Epoch 32 Batch 9000] loss 3.21, ppl 24.73, throughput 4202.04 samples/s
[Epoch 32 Batch 10000] loss 3.20, ppl 24.62, throughput 5163.85 samples/s
[Epoch 32 Batch 11000] loss 3.21, ppl 24.79, throughput 5717.09 samples/s
[Epoch 32 Batch 12000] loss 3.21, ppl 24.67, throughput 5582.17 samples/s
[Epoch 32 Batch 13000] loss 3.21, ppl 24.85, throughput 5770.11 samples/s
[Epoch 32 Batch 14000] loss 3.21, ppl 24.81, throughput 5766.52 samples/s
[Epoch 32 Batch 15000] loss 3.21, ppl 24.79, throughput 5643.12 samples/s
[Epoch 32 Batch 16000] loss 3.21, ppl 24.84, throughput 5701.70 samples/s
[Epoch 32 Batch 17000] loss 3.20, ppl 24.65, throughput 5717.60 samples/s
[Epoch 32 Batch 18000] loss 3.20, ppl 24.60, throughput 5713.79 samples/s
[Epoch 32 Batch 19000] loss 3.22, ppl 24.94, throughput 5573.84 samples/s
[Epoch 32 Batch 20000] loss 3.21, ppl 24.78, throughput 5726.77 samples/s
[Epoch 32 Batch 21000] loss 3.21, ppl 24.84, throughput 5776.18 samples/s
[Epoch 32 Batch 22000] loss 3.21, ppl 24.76, throughput 5759.30 samples/s
[Epoch 32 Batch 23000] loss 3.22, ppl 24.94, throughput 5627.29 samples/s
[Epoch 32 Batch 24000] loss 3.21, ppl 24.86, throughput 5701.68 samples/s
[Epoch 32 Batch 25000] loss 3.21, ppl 24.86, throughput 5748.35 samples/s
[Epoch 32 Batch 26000] loss 3.22, ppl 24.91, throughput 5568.84 samples/s
[Epoch 32 Batch 27000] loss 3.21, ppl 24.77, throughput 5768.15 samples/s
[Epoch 32 Batch 28000] loss 3.21, ppl 24.82, throughput 5708.30 samples/s
[Epoch 32 Batch 29000] loss 3.21, ppl 24.87, throughput 5704.85 samples/s
[Epoch 32 Batch 30000] loss 3.21, ppl 24.73, throughput 5541.21 samples/s
[Epoch 32 Batch 31000] loss 3.21, ppl 24.87, throughput 5721.32 samples/s
[Epoch 32 Batch 32000] loss 3.21, ppl 24.73, throughput 5713.97 samples/s
[Epoch 32 Batch 33000] loss 3.21, ppl 24.81, throughput 5717.63 samples/s
[Epoch 32 Batch 34000] loss 3.22, ppl 24.91, throughput 5575.59 samples/s
[Epoch 32 Batch 35000] loss 3.22, ppl 24.93, throughput 5760.51 samples/s
[Epoch 32 Batch 36000] loss 3.22, ppl 24.93, throughput 5731.36 samples/s
[Epoch 32 Batch 37000] loss 3.22, ppl 24.91, throughput 5678.07 samples/s
[Epoch 32 Batch 38000] loss 3.21, ppl 24.87, throughput 5574.90 samples/s
[Epoch 32 Batch 39000] loss 3.22, ppl 24.91, throughput 5712.62 samples/s
[Epoch 32 Batch 40000] loss 3.21, ppl 24.82, throughput 5767.33 samples/s
[Epoch 32 Batch 41000] loss 3.22, ppl 24.91, throughput 5519.07 samples/s
[Epoch 32 Batch 42000] loss 3.22, ppl 24.91, throughput 5712.96 samples/s
[Epoch 32 Batch 43000] loss 3.22, ppl 24.93, throughput 5792.50 samples/s
[Epoch 32 Batch 44000] loss 3.21, ppl 24.88, throughput 5789.30 samples/s
[Epoch 32 Batch 45000] loss 3.22, ppl 25.01, throughput 5515.08 samples/s
[Epoch 32 Batch 46000] loss 3.22, ppl 24.95, throughput 5786.11 samples/s
[Epoch 32 Batch 47000] loss 3.22, ppl 25.01, throughput 5738.08 samples/s
[Epoch 32 Batch 48000] loss 3.22, ppl 24.98, throughput 5777.36 samples/s
[Epoch 32 Batch 49000] loss 3.22, ppl 24.96, throughput 5518.86 samples/s
[Epoch 32 Batch 50000] loss 3.22, ppl 24.95, throughput 5692.37 samples/s
[Epoch 32 Batch 51000] loss 3.22, ppl 24.94, throughput 5741.63 samples/s
[Epoch 32 Batch 52000] loss 3.22, ppl 24.92, throughput 5752.68 samples/s
[Epoch 32 Batch 53000] loss 3.22, ppl 24.95, throughput 5598.03 samples/s
[Epoch 32 Batch 54000] loss 3.22, ppl 24.96, throughput 5706.65 samples/s
[Epoch 32 Batch 55000] loss 3.22, ppl 24.98, throughput 5737.49 samples/s
[Epoch 32 Batch 56000] loss 3.21, ppl 24.90, throughput 5334.89 samples/s
[Epoch 32 Batch 57000] loss 3.22, ppl 25.05, throughput 4618.79 samples/s
[Epoch 32 Batch 58000] loss 3.22, ppl 25.03, throughput 4612.97 samples/s
[Epoch 32 Batch 59000] loss 3.22, ppl 25.02, throughput 4608.21 samples/s
[Epoch 32 Batch 60000] loss 3.21, ppl 24.89, throughput 4496.36 samples/s
[Epoch 32 Batch 61000] loss 3.22, ppl 24.99, throughput 4562.08 samples/s
[Epoch 32 Batch 62000] loss 3.22, ppl 25.02, throughput 5140.35 samples/s
[Epoch 32 Batch 63000] loss 3.22, ppl 25.02, throughput 5770.65 samples/s
[Epoch 32 Batch 64000] loss 3.22, ppl 24.96, throughput 5577.24 samples/s
[Epoch 32 Batch 65000] loss 3.22, ppl 25.05, throughput 5742.57 samples/s
[Epoch 32 Batch 66000] loss 3.22, ppl 24.94, throughput 5696.73 samples/s
[Epoch 32 Batch 67000] loss 3.22, ppl 25.08, throughput 5630.94 samples/s
[Epoch 32 Batch 68000] loss 3.22, ppl 24.97, throughput 5703.28 samples/s
[Epoch 32 Batch 69000] loss 3.22, ppl 24.97, throughput 5765.60 samples/s
[Epoch 32 Batch 70000] loss 3.22, ppl 25.04, throughput 5729.16 samples/s
[Epoch 32 Batch 71000] loss 3.22, ppl 25.04, throughput 5557.17 samples/s
[Epoch 32 Batch 72000] loss 3.22, ppl 24.96, throughput 5731.83 samples/s
[Epoch 32 Batch 73000] loss 3.22, ppl 24.94, throughput 5814.44 samples/s
[Epoch 32 Batch 74000] loss 3.22, ppl 25.02, throughput 5778.68 samples/s
[Epoch 32 Batch 75000] loss 3.22, ppl 25.07, throughput 5556.79 samples/s
[Epoch 32 Batch 76000] loss 3.22, ppl 25.03, throughput 5785.24 samples/s
[Epoch 32 Batch 77000] loss 3.22, ppl 25.04, throughput 5732.12 samples/s
[Epoch 32 Batch 78000] loss 3.22, ppl 25.03, throughput 5731.70 samples/s
Epoch 32 took 7406.32 seconds.
[Epoch 33 Batch 1000] loss 3.20, ppl 24.56, throughput 5582.17 samples/s
[Epoch 33 Batch 2000] loss 3.20, ppl 24.59, throughput 5766.55 samples/s
[Epoch 33 Batch 3000] loss 3.20, ppl 24.52, throughput 5713.96 samples/s
[Epoch 33 Batch 4000] loss 3.21, ppl 24.72, throughput 5268.71 samples/s
[Epoch 33 Batch 5000] loss 3.19, ppl 24.33, throughput 5266.79 samples/s
[Epoch 33 Batch 6000] loss 3.21, ppl 24.71, throughput 5311.01 samples/s
[Epoch 33 Batch 7000] loss 3.21, ppl 24.70, throughput 5299.27 samples/s
[Epoch 33 Batch 8000] loss 3.19, ppl 24.40, throughput 5164.60 samples/s
[Epoch 33 Batch 9000] loss 3.20, ppl 24.53, throughput 5312.17 samples/s
[Epoch 33 Batch 10000] loss 3.20, ppl 24.60, throughput 5435.56 samples/s
[Epoch 33 Batch 11000] loss 3.20, ppl 24.48, throughput 5769.42 samples/s
[Epoch 33 Batch 12000] loss 3.21, ppl 24.81, throughput 5619.73 samples/s
[Epoch 33 Batch 13000] loss 3.21, ppl 24.74, throughput 5772.89 samples/s
[Epoch 33 Batch 14000] loss 3.20, ppl 24.61, throughput 5694.47 samples/s
[Epoch 33 Batch 15000] loss 3.21, ppl 24.77, throughput 5614.11 samples/s
[Epoch 33 Batch 16000] loss 3.20, ppl 24.59, throughput 5734.42 samples/s
[Epoch 33 Batch 17000] loss 3.20, ppl 24.62, throughput 5748.48 samples/s
[Epoch 33 Batch 18000] loss 3.21, ppl 24.81, throughput 5709.26 samples/s
[Epoch 33 Batch 19000] loss 3.21, ppl 24.66, throughput 5557.43 samples/s
[Epoch 33 Batch 20000] loss 3.21, ppl 24.80, throughput 5751.16 samples/s
[Epoch 33 Batch 21000] loss 3.21, ppl 24.80, throughput 5770.23 samples/s
[Epoch 33 Batch 22000] loss 3.21, ppl 24.78, throughput 5772.05 samples/s
[Epoch 33 Batch 23000] loss 3.21, ppl 24.70, throughput 5599.90 samples/s
[Epoch 33 Batch 24000] loss 3.21, ppl 24.82, throughput 5696.24 samples/s
[Epoch 33 Batch 25000] loss 3.21, ppl 24.68, throughput 5772.11 samples/s
[Epoch 33 Batch 26000] loss 3.21, ppl 24.77, throughput 5533.47 samples/s
[Epoch 33 Batch 27000] loss 3.21, ppl 24.70, throughput 5722.27 samples/s
[Epoch 33 Batch 28000] loss 3.21, ppl 24.83, throughput 5707.78 samples/s
[Epoch 33 Batch 29000] loss 3.21, ppl 24.89, throughput 5740.37 samples/s
[Epoch 33 Batch 30000] loss 3.21, ppl 24.89, throughput 5526.01 samples/s
[Epoch 33 Batch 31000] loss 3.21, ppl 24.89, throughput 5767.68 samples/s
[Epoch 33 Batch 32000] loss 3.22, ppl 24.91, throughput 5715.36 samples/s
[Epoch 33 Batch 33000] loss 3.21, ppl 24.85, throughput 5748.77 samples/s
[Epoch 33 Batch 34000] loss 3.21, ppl 24.73, throughput 5571.61 samples/s
[Epoch 33 Batch 35000] loss 3.21, ppl 24.81, throughput 5771.58 samples/s
[Epoch 33 Batch 36000] loss 3.22, ppl 24.91, throughput 5695.99 samples/s
[Epoch 33 Batch 37000] loss 3.22, ppl 24.93, throughput 5707.13 samples/s
[Epoch 33 Batch 38000] loss 3.21, ppl 24.83, throughput 5553.98 samples/s
[Epoch 33 Batch 39000] loss 3.21, ppl 24.84, throughput 5725.16 samples/s
[Epoch 33 Batch 40000] loss 3.21, ppl 24.89, throughput 5727.03 samples/s
[Epoch 33 Batch 41000] loss 3.21, ppl 24.81, throughput 5617.38 samples/s
[Epoch 33 Batch 42000] loss 3.21, ppl 24.81, throughput 5723.62 samples/s
[Epoch 33 Batch 43000] loss 3.21, ppl 24.80, throughput 5761.72 samples/s
[Epoch 33 Batch 44000] loss 3.21, ppl 24.82, throughput 5713.89 samples/s
[Epoch 33 Batch 45000] loss 3.21, ppl 24.86, throughput 5605.48 samples/s
[Epoch 33 Batch 46000] loss 3.21, ppl 24.84, throughput 5751.09 samples/s
[Epoch 33 Batch 47000] loss 3.21, ppl 24.77, throughput 5740.44 samples/s
[Epoch 33 Batch 48000] loss 3.21, ppl 24.79, throughput 5707.36 samples/s
[Epoch 33 Batch 49000] loss 3.22, ppl 24.93, throughput 5581.49 samples/s
[Epoch 33 Batch 50000] loss 3.22, ppl 24.96, throughput 5785.37 samples/s
[Epoch 33 Batch 51000] loss 3.22, ppl 24.94, throughput 5721.39 samples/s
[Epoch 33 Batch 52000] loss 3.21, ppl 24.88, throughput 5682.68 samples/s
[Epoch 33 Batch 53000] loss 3.22, ppl 24.91, throughput 5581.22 samples/s
[Epoch 33 Batch 54000] loss 3.22, ppl 24.91, throughput 5753.96 samples/s
[Epoch 33 Batch 55000] loss 3.22, ppl 24.92, throughput 5765.33 samples/s
[Epoch 33 Batch 56000] loss 3.21, ppl 24.89, throughput 5515.81 samples/s
[Epoch 33 Batch 57000] loss 3.21, ppl 24.87, throughput 5802.02 samples/s
[Epoch 33 Batch 58000] loss 3.21, ppl 24.88, throughput 5730.28 samples/s
[Epoch 33 Batch 59000] loss 3.22, ppl 24.98, throughput 5764.14 samples/s
[Epoch 33 Batch 60000] loss 3.22, ppl 24.96, throughput 5613.51 samples/s
[Epoch 33 Batch 61000] loss 3.22, ppl 24.95, throughput 5743.42 samples/s
[Epoch 33 Batch 62000] loss 3.22, ppl 24.95, throughput 5769.70 samples/s
[Epoch 33 Batch 63000] loss 3.22, ppl 25.00, throughput 5711.38 samples/s
[Epoch 33 Batch 64000] loss 3.22, ppl 24.95, throughput 5543.31 samples/s
[Epoch 33 Batch 65000] loss 3.22, ppl 24.93, throughput 5752.40 samples/s
[Epoch 33 Batch 66000] loss 3.22, ppl 24.99, throughput 5710.16 samples/s
[Epoch 33 Batch 67000] loss 3.22, ppl 25.00, throughput 5536.75 samples/s
[Epoch 33 Batch 68000] loss 3.22, ppl 24.99, throughput 5772.64 samples/s
[Epoch 33 Batch 69000] loss 3.22, ppl 24.94, throughput 5745.49 samples/s
[Epoch 33 Batch 70000] loss 3.22, ppl 24.96, throughput 5734.46 samples/s
[Epoch 33 Batch 71000] loss 3.22, ppl 24.93, throughput 5526.44 samples/s
[Epoch 33 Batch 72000] loss 3.22, ppl 24.99, throughput 5731.01 samples/s
[Epoch 33 Batch 73000] loss 3.21, ppl 24.90, throughput 5755.00 samples/s
[Epoch 33 Batch 74000] loss 3.22, ppl 24.92, throughput 5732.31 samples/s
[Epoch 33 Batch 75000] loss 3.22, ppl 24.97, throughput 5514.52 samples/s
[Epoch 33 Batch 76000] loss 3.22, ppl 24.90, throughput 5754.37 samples/s
[Epoch 33 Batch 77000] loss 3.22, ppl 24.96, throughput 5724.11 samples/s
[Epoch 33 Batch 78000] loss 3.22, ppl 24.98, throughput 5734.91 samples/s
Epoch 33 took 7065.52 seconds.
[Epoch 34 Batch 1000] loss 3.20, ppl 24.60, throughput 5598.66 samples/s
[Epoch 34 Batch 2000] loss 3.19, ppl 24.27, throughput 5718.32 samples/s
[Epoch 34 Batch 3000] loss 3.19, ppl 24.37, throughput 5788.25 samples/s
[Epoch 34 Batch 4000] loss 3.20, ppl 24.64, throughput 5557.45 samples/s
[Epoch 34 Batch 5000] loss 3.20, ppl 24.62, throughput 5568.33 samples/s
[Epoch 34 Batch 6000] loss 3.20, ppl 24.59, throughput 5361.66 samples/s
[Epoch 34 Batch 7000] loss 3.20, ppl 24.59, throughput 5286.84 samples/s
[Epoch 34 Batch 8000] loss 3.20, ppl 24.49, throughput 5141.58 samples/s
[Epoch 34 Batch 9000] loss 3.19, ppl 24.30, throughput 5317.19 samples/s
[Epoch 34 Batch 10000] loss 3.19, ppl 24.38, throughput 5361.66 samples/s
[Epoch 34 Batch 11000] loss 3.21, ppl 24.72, throughput 5116.45 samples/s
[Epoch 34 Batch 12000] loss 3.20, ppl 24.60, throughput 5548.94 samples/s
[Epoch 34 Batch 13000] loss 3.21, ppl 24.70, throughput 5724.63 samples/s
[Epoch 34 Batch 14000] loss 3.20, ppl 24.54, throughput 5736.35 samples/s
[Epoch 34 Batch 15000] loss 3.20, ppl 24.49, throughput 5530.16 samples/s
[Epoch 34 Batch 16000] loss 3.20, ppl 24.59, throughput 5785.01 samples/s
[Epoch 34 Batch 17000] loss 3.21, ppl 24.75, throughput 5719.15 samples/s
[Epoch 34 Batch 18000] loss 3.21, ppl 24.74, throughput 5726.74 samples/s
[Epoch 34 Batch 19000] loss 3.21, ppl 24.76, throughput 5575.04 samples/s
[Epoch 34 Batch 20000] loss 3.20, ppl 24.62, throughput 5735.98 samples/s
[Epoch 34 Batch 21000] loss 3.21, ppl 24.78, throughput 5793.05 samples/s
[Epoch 34 Batch 22000] loss 3.21, ppl 24.84, throughput 5759.60 samples/s
[Epoch 34 Batch 23000] loss 3.21, ppl 24.73, throughput 5592.26 samples/s
[Epoch 34 Batch 24000] loss 3.21, ppl 24.71, throughput 5706.91 samples/s
[Epoch 34 Batch 25000] loss 3.20, ppl 24.63, throughput 5803.12 samples/s
[Epoch 34 Batch 26000] loss 3.21, ppl 24.75, throughput 5736.02 samples/s
[Epoch 34 Batch 27000] loss 3.21, ppl 24.76, throughput 5623.39 samples/s
[Epoch 34 Batch 28000] loss 3.21, ppl 24.66, throughput 5660.16 samples/s
[Epoch 34 Batch 29000] loss 3.21, ppl 24.77, throughput 5787.18 samples/s
[Epoch 34 Batch 30000] loss 3.21, ppl 24.71, throughput 5576.18 samples/s
[Epoch 34 Batch 31000] loss 3.21, ppl 24.77, throughput 5782.80 samples/s
[Epoch 34 Batch 32000] loss 3.21, ppl 24.79, throughput 5746.04 samples/s
[Epoch 34 Batch 33000] loss 3.21, ppl 24.77, throughput 5744.85 samples/s
[Epoch 34 Batch 34000] loss 3.21, ppl 24.81, throughput 5544.60 samples/s
[Epoch 34 Batch 35000] loss 3.21, ppl 24.79, throughput 5815.75 samples/s
[Epoch 34 Batch 36000] loss 3.21, ppl 24.69, throughput 5721.07 samples/s
[Epoch 34 Batch 37000] loss 3.21, ppl 24.73, throughput 5778.31 samples/s
[Epoch 34 Batch 38000] loss 3.21, ppl 24.82, throughput 5519.44 samples/s
[Epoch 34 Batch 39000] loss 3.21, ppl 24.77, throughput 5760.94 samples/s
[Epoch 34 Batch 40000] loss 3.21, ppl 24.80, throughput 5710.79 samples/s
[Epoch 34 Batch 41000] loss 3.21, ppl 24.87, throughput 5580.03 samples/s
[Epoch 34 Batch 42000] loss 3.21, ppl 24.82, throughput 5781.79 samples/s
[Epoch 34 Batch 43000] loss 3.21, ppl 24.68, throughput 5761.35 samples/s
[Epoch 34 Batch 44000] loss 3.21, ppl 24.77, throughput 5738.63 samples/s
[Epoch 34 Batch 45000] loss 3.21, ppl 24.80, throughput 5609.76 samples/s
[Epoch 34 Batch 46000] loss 3.21, ppl 24.83, throughput 5771.89 samples/s
[Epoch 34 Batch 47000] loss 3.21, ppl 24.85, throughput 5710.35 samples/s
[Epoch 34 Batch 48000] loss 3.21, ppl 24.80, throughput 5769.10 samples/s
[Epoch 34 Batch 49000] loss 3.21, ppl 24.88, throughput 5575.84 samples/s
[Epoch 34 Batch 50000] loss 3.21, ppl 24.86, throughput 5673.15 samples/s
[Epoch 34 Batch 51000] loss 3.21, ppl 24.78, throughput 5754.74 samples/s
[Epoch 34 Batch 52000] loss 3.21, ppl 24.87, throughput 5776.46 samples/s
[Epoch 34 Batch 53000] loss 3.21, ppl 24.88, throughput 5551.06 samples/s
[Epoch 34 Batch 54000] loss 3.22, ppl 24.91, throughput 5786.64 samples/s
[Epoch 34 Batch 55000] loss 3.21, ppl 24.86, throughput 5798.60 samples/s
[Epoch 34 Batch 56000] loss 3.21, ppl 24.87, throughput 5615.47 samples/s
[Epoch 34 Batch 57000] loss 3.21, ppl 24.81, throughput 5761.74 samples/s
[Epoch 34 Batch 58000] loss 3.21, ppl 24.86, throughput 5754.13 samples/s
[Epoch 34 Batch 59000] loss 3.21, ppl 24.85, throughput 5776.80 samples/s
[Epoch 34 Batch 60000] loss 3.21, ppl 24.84, throughput 5575.27 samples/s
[Epoch 34 Batch 61000] loss 3.21, ppl 24.86, throughput 5730.14 samples/s
[Epoch 34 Batch 62000] loss 3.21, ppl 24.82, throughput 5815.62 samples/s
[Epoch 34 Batch 63000] loss 3.21, ppl 24.89, throughput 5750.22 samples/s
[Epoch 34 Batch 64000] loss 3.21, ppl 24.85, throughput 5589.73 samples/s
[Epoch 34 Batch 65000] loss 3.21, ppl 24.87, throughput 5745.90 samples/s
[Epoch 34 Batch 66000] loss 3.22, ppl 24.94, throughput 5706.31 samples/s
[Epoch 34 Batch 67000] loss 3.21, ppl 24.89, throughput 5577.12 samples/s
[Epoch 34 Batch 68000] loss 3.21, ppl 24.85, throughput 5764.58 samples/s
[Epoch 34 Batch 69000] loss 3.22, ppl 24.94, throughput 5687.46 samples/s
[Epoch 34 Batch 70000] loss 3.21, ppl 24.87, throughput 5767.50 samples/s
[Epoch 34 Batch 71000] loss 3.21, ppl 24.90, throughput 5559.15 samples/s
[Epoch 34 Batch 72000] loss 3.21, ppl 24.82, throughput 5749.47 samples/s
[Epoch 34 Batch 73000] loss 3.22, ppl 24.95, throughput 5679.48 samples/s
[Epoch 34 Batch 74000] loss 3.21, ppl 24.89, throughput 5716.36 samples/s
[Epoch 34 Batch 75000] loss 3.21, ppl 24.88, throughput 5539.07 samples/s
[Epoch 34 Batch 76000] loss 3.22, ppl 24.96, throughput 5759.88 samples/s
[Epoch 34 Batch 77000] loss 3.21, ppl 24.90, throughput 5709.09 samples/s
[Epoch 34 Batch 78000] loss 3.22, ppl 24.93, throughput 5767.82 samples/s
Epoch 34 took 7059.84 seconds.
[Epoch 35 Batch 1000] loss 3.20, ppl 24.53, throughput 5581.61 samples/s
[Epoch 35 Batch 2000] loss 3.20, ppl 24.46, throughput 5708.57 samples/s
[Epoch 35 Batch 3000] loss 3.19, ppl 24.38, throughput 5757.23 samples/s
[Epoch 35 Batch 4000] loss 3.19, ppl 24.18, throughput 5551.92 samples/s
[Epoch 35 Batch 5000] loss 3.20, ppl 24.46, throughput 5704.83 samples/s
[Epoch 35 Batch 6000] loss 3.20, ppl 24.44, throughput 5728.93 samples/s
[Epoch 35 Batch 7000] loss 3.20, ppl 24.56, throughput 4683.97 samples/s
[Epoch 35 Batch 8000] loss 3.19, ppl 24.32, throughput 4517.64 samples/s
[Epoch 35 Batch 9000] loss 3.20, ppl 24.45, throughput 4761.47 samples/s
[Epoch 35 Batch 10000] loss 3.20, ppl 24.63, throughput 5316.66 samples/s
[Epoch 35 Batch 11000] loss 3.20, ppl 24.60, throughput 5331.80 samples/s
[Epoch 35 Batch 12000] loss 3.20, ppl 24.56, throughput 5150.54 samples/s
[Epoch 35 Batch 13000] loss 3.20, ppl 24.43, throughput 5704.00 samples/s
[Epoch 35 Batch 14000] loss 3.20, ppl 24.64, throughput 5741.38 samples/s
[Epoch 35 Batch 15000] loss 3.20, ppl 24.60, throughput 5566.27 samples/s
[Epoch 35 Batch 16000] loss 3.20, ppl 24.47, throughput 5708.26 samples/s
[Epoch 35 Batch 17000] loss 3.20, ppl 24.52, throughput 5667.74 samples/s
[Epoch 35 Batch 18000] loss 3.21, ppl 24.67, throughput 5720.52 samples/s
[Epoch 35 Batch 19000] loss 3.20, ppl 24.51, throughput 5561.83 samples/s
[Epoch 35 Batch 20000] loss 3.21, ppl 24.66, throughput 5751.76 samples/s
[Epoch 35 Batch 21000] loss 3.20, ppl 24.58, throughput 5692.48 samples/s
[Epoch 35 Batch 22000] loss 3.20, ppl 24.60, throughput 5800.16 samples/s
[Epoch 35 Batch 23000] loss 3.20, ppl 24.65, throughput 5581.50 samples/s
[Epoch 35 Batch 24000] loss 3.20, ppl 24.54, throughput 5696.04 samples/s
[Epoch 35 Batch 25000] loss 3.21, ppl 24.67, throughput 5676.60 samples/s
[Epoch 35 Batch 26000] loss 3.21, ppl 24.77, throughput 5561.15 samples/s
[Epoch 35 Batch 27000] loss 3.21, ppl 24.75, throughput 5752.01 samples/s
[Epoch 35 Batch 28000] loss 3.21, ppl 24.67, throughput 5808.28 samples/s
[Epoch 35 Batch 29000] loss 3.21, ppl 24.68, throughput 5752.32 samples/s
[Epoch 35 Batch 30000] loss 3.20, ppl 24.61, throughput 5581.48 samples/s
[Epoch 35 Batch 31000] loss 3.21, ppl 24.73, throughput 5753.77 samples/s
[Epoch 35 Batch 32000] loss 3.21, ppl 24.72, throughput 5724.08 samples/s
[Epoch 35 Batch 33000] loss 3.21, ppl 24.73, throughput 5772.08 samples/s
[Epoch 35 Batch 34000] loss 3.21, ppl 24.79, throughput 5596.52 samples/s
[Epoch 35 Batch 35000] loss 3.21, ppl 24.73, throughput 5698.61 samples/s
[Epoch 35 Batch 36000] loss 3.21, ppl 24.67, throughput 5695.90 samples/s
[Epoch 35 Batch 37000] loss 3.20, ppl 24.64, throughput 5775.50 samples/s
[Epoch 35 Batch 38000] loss 3.21, ppl 24.66, throughput 5555.54 samples/s
[Epoch 35 Batch 39000] loss 3.21, ppl 24.81, throughput 5719.63 samples/s
[Epoch 35 Batch 40000] loss 3.21, ppl 24.74, throughput 5740.22 samples/s
[Epoch 35 Batch 41000] loss 3.21, ppl 24.73, throughput 5541.32 samples/s
[Epoch 35 Batch 42000] loss 3.21, ppl 24.75, throughput 5753.99 samples/s
[Epoch 35 Batch 43000] loss 3.21, ppl 24.79, throughput 5777.54 samples/s
[Epoch 35 Batch 44000] loss 3.21, ppl 24.70, throughput 5748.43 samples/s
[Epoch 35 Batch 45000] loss 3.21, ppl 24.72, throughput 5564.63 samples/s
[Epoch 35 Batch 46000] loss 3.21, ppl 24.78, throughput 5714.57 samples/s
[Epoch 35 Batch 47000] loss 3.21, ppl 24.71, throughput 5714.48 samples/s
[Epoch 35 Batch 48000] loss 3.21, ppl 24.77, throughput 5779.15 samples/s
[Epoch 35 Batch 49000] loss 3.21, ppl 24.86, throughput 5560.51 samples/s
[Epoch 35 Batch 50000] loss 3.21, ppl 24.80, throughput 5769.46 samples/s
[Epoch 35 Batch 51000] loss 3.21, ppl 24.80, throughput 5742.39 samples/s
[Epoch 35 Batch 52000] loss 3.21, ppl 24.82, throughput 5755.34 samples/s
[Epoch 35 Batch 53000] loss 3.21, ppl 24.87, throughput 5601.84 samples/s
[Epoch 35 Batch 54000] loss 3.21, ppl 24.77, throughput 5761.16 samples/s
[Epoch 35 Batch 55000] loss 3.22, ppl 24.91, throughput 5767.95 samples/s
[Epoch 35 Batch 56000] loss 3.21, ppl 24.81, throughput 5554.58 samples/s
[Epoch 35 Batch 57000] loss 3.21, ppl 24.81, throughput 5736.23 samples/s
[Epoch 35 Batch 58000] loss 3.21, ppl 24.80, throughput 5711.15 samples/s
[Epoch 35 Batch 59000] loss 3.21, ppl 24.75, throughput 5705.45 samples/s
[Epoch 35 Batch 60000] loss 3.21, ppl 24.82, throughput 5531.45 samples/s
[Epoch 35 Batch 61000] loss 3.21, ppl 24.81, throughput 5742.09 samples/s
[Epoch 35 Batch 62000] loss 3.21, ppl 24.87, throughput 5751.40 samples/s
[Epoch 35 Batch 63000] loss 3.21, ppl 24.89, throughput 5725.82 samples/s
[Epoch 35 Batch 64000] loss 3.21, ppl 24.74, throughput 5549.51 samples/s
[Epoch 35 Batch 65000] loss 3.21, ppl 24.79, throughput 5702.92 samples/s
[Epoch 35 Batch 66000] loss 3.21, ppl 24.76, throughput 5713.27 samples/s
[Epoch 35 Batch 67000] loss 3.21, ppl 24.76, throughput 5631.69 samples/s
[Epoch 35 Batch 68000] loss 3.22, ppl 24.94, throughput 5760.53 samples/s
[Epoch 35 Batch 69000] loss 3.21, ppl 24.85, throughput 5733.03 samples/s
[Epoch 35 Batch 70000] loss 3.21, ppl 24.87, throughput 5737.39 samples/s
[Epoch 35 Batch 71000] loss 3.21, ppl 24.81, throughput 5533.84 samples/s
[Epoch 35 Batch 72000] loss 3.21, ppl 24.83, throughput 5694.37 samples/s
[Epoch 35 Batch 73000] loss 3.21, ppl 24.81, throughput 5776.82 samples/s
[Epoch 35 Batch 74000] loss 3.21, ppl 24.82, throughput 5700.96 samples/s
[Epoch 35 Batch 75000] loss 3.22, ppl 24.92, throughput 5575.81 samples/s
[Epoch 35 Batch 76000] loss 3.21, ppl 24.86, throughput 5719.04 samples/s
[Epoch 35 Batch 77000] loss 3.21, ppl 24.90, throughput 5712.32 samples/s
[Epoch 35 Batch 78000] loss 3.21, ppl 24.87, throughput 5717.21 samples/s
Epoch 35 took 7104.95 seconds.
[Epoch 36 Batch 1000] loss 3.20, ppl 24.43, throughput 5279.76 samples/s
[Epoch 36 Batch 2000] loss 3.20, ppl 24.53, throughput 5346.44 samples/s
[Epoch 36 Batch 3000] loss 3.20, ppl 24.46, throughput 5342.75 samples/s
[Epoch 36 Batch 4000] loss 3.20, ppl 24.41, throughput 5214.85 samples/s
[Epoch 36 Batch 5000] loss 3.20, ppl 24.46, throughput 5270.09 samples/s
[Epoch 36 Batch 6000] loss 3.20, ppl 24.60, throughput 5274.60 samples/s
[Epoch 36 Batch 7000] loss 3.20, ppl 24.65, throughput 5431.91 samples/s
[Epoch 36 Batch 8000] loss 3.20, ppl 24.61, throughput 5582.20 samples/s
[Epoch 36 Batch 9000] loss 3.20, ppl 24.49, throughput 5730.63 samples/s
[Epoch 36 Batch 10000] loss 3.20, ppl 24.45, throughput 5742.33 samples/s
[Epoch 36 Batch 11000] loss 3.20, ppl 24.52, throughput 5760.42 samples/s
[Epoch 36 Batch 12000] loss 3.20, ppl 24.56, throughput 5589.82 samples/s
[Epoch 36 Batch 13000] loss 3.20, ppl 24.50, throughput 5736.81 samples/s
[Epoch 36 Batch 14000] loss 3.19, ppl 24.40, throughput 5750.84 samples/s
[Epoch 36 Batch 15000] loss 3.20, ppl 24.44, throughput 5613.19 samples/s
[Epoch 36 Batch 16000] loss 3.19, ppl 24.40, throughput 5736.93 samples/s
[Epoch 36 Batch 17000] loss 3.20, ppl 24.49, throughput 5738.97 samples/s
[Epoch 36 Batch 18000] loss 3.20, ppl 24.49, throughput 5792.47 samples/s
[Epoch 36 Batch 19000] loss 3.20, ppl 24.55, throughput 5562.10 samples/s
[Epoch 36 Batch 20000] loss 3.20, ppl 24.48, throughput 5770.09 samples/s
[Epoch 36 Batch 21000] loss 3.20, ppl 24.45, throughput 5787.55 samples/s
[Epoch 36 Batch 22000] loss 3.20, ppl 24.45, throughput 5753.29 samples/s
[Epoch 36 Batch 23000] loss 3.20, ppl 24.54, throughput 5638.00 samples/s
[Epoch 36 Batch 24000] loss 3.20, ppl 24.56, throughput 5763.98 samples/s
[Epoch 36 Batch 25000] loss 3.20, ppl 24.51, throughput 5705.60 samples/s
[Epoch 36 Batch 26000] loss 3.20, ppl 24.50, throughput 5707.59 samples/s
[Epoch 36 Batch 27000] loss 3.20, ppl 24.61, throughput 5612.54 samples/s
[Epoch 36 Batch 28000] loss 3.20, ppl 24.62, throughput 5718.46 samples/s
[Epoch 36 Batch 29000] loss 3.20, ppl 24.61, throughput 5750.66 samples/s
[Epoch 36 Batch 30000] loss 3.21, ppl 24.66, throughput 5576.52 samples/s
[Epoch 36 Batch 31000] loss 3.21, ppl 24.77, throughput 5751.10 samples/s
[Epoch 36 Batch 32000] loss 3.21, ppl 24.68, throughput 5759.97 samples/s
[Epoch 36 Batch 33000] loss 3.20, ppl 24.65, throughput 5760.31 samples/s
[Epoch 36 Batch 34000] loss 3.20, ppl 24.57, throughput 5532.42 samples/s
[Epoch 36 Batch 35000] loss 3.21, ppl 24.68, throughput 5724.56 samples/s
[Epoch 36 Batch 36000] loss 3.21, ppl 24.66, throughput 5792.36 samples/s
[Epoch 36 Batch 37000] loss 3.20, ppl 24.59, throughput 5759.10 samples/s
[Epoch 36 Batch 38000] loss 3.20, ppl 24.54, throughput 5545.25 samples/s
[Epoch 36 Batch 39000] loss 3.21, ppl 24.74, throughput 5763.92 samples/s
[Epoch 36 Batch 40000] loss 3.21, ppl 24.67, throughput 5707.22 samples/s
[Epoch 36 Batch 41000] loss 3.20, ppl 24.58, throughput 5616.38 samples/s
[Epoch 36 Batch 42000] loss 3.21, ppl 24.70, throughput 5795.79 samples/s
[Epoch 36 Batch 43000] loss 3.21, ppl 24.76, throughput 5694.99 samples/s
[Epoch 36 Batch 44000] loss 3.21, ppl 24.67, throughput 5777.57 samples/s
[Epoch 36 Batch 45000] loss 3.21, ppl 24.73, throughput 5594.63 samples/s
[Epoch 36 Batch 46000] loss 3.20, ppl 24.63, throughput 5766.41 samples/s
[Epoch 36 Batch 47000] loss 3.21, ppl 24.69, throughput 5745.28 samples/s
[Epoch 36 Batch 48000] loss 3.21, ppl 24.83, throughput 5750.12 samples/s
[Epoch 36 Batch 49000] loss 3.21, ppl 24.73, throughput 5614.97 samples/s
[Epoch 36 Batch 50000] loss 3.21, ppl 24.75, throughput 5783.63 samples/s
[Epoch 36 Batch 51000] loss 3.21, ppl 24.68, throughput 5697.56 samples/s
[Epoch 36 Batch 52000] loss 3.21, ppl 24.76, throughput 5769.46 samples/s
[Epoch 36 Batch 53000] loss 3.20, ppl 24.63, throughput 5600.59 samples/s
[Epoch 36 Batch 54000] loss 3.21, ppl 24.74, throughput 5759.29 samples/s
[Epoch 36 Batch 55000] loss 3.21, ppl 24.70, throughput 5768.75 samples/s
[Epoch 36 Batch 56000] loss 3.21, ppl 24.82, throughput 5525.90 samples/s
[Epoch 36 Batch 57000] loss 3.21, ppl 24.68, throughput 5758.12 samples/s
[Epoch 36 Batch 58000] loss 3.21, ppl 24.75, throughput 5725.36 samples/s
[Epoch 36 Batch 59000] loss 3.21, ppl 24.66, throughput 5669.47 samples/s
[Epoch 36 Batch 60000] loss 3.21, ppl 24.77, throughput 5554.44 samples/s
[Epoch 36 Batch 61000] loss 3.21, ppl 24.81, throughput 5766.96 samples/s
[Epoch 36 Batch 62000] loss 3.21, ppl 24.77, throughput 5732.52 samples/s
[Epoch 36 Batch 63000] loss 3.21, ppl 24.83, throughput 5655.03 samples/s
[Epoch 36 Batch 64000] loss 3.21, ppl 24.76, throughput 5554.25 samples/s
[Epoch 36 Batch 65000] loss 3.21, ppl 24.71, throughput 5686.12 samples/s
[Epoch 36 Batch 66000] loss 3.21, ppl 24.83, throughput 5770.67 samples/s
[Epoch 36 Batch 67000] loss 3.21, ppl 24.83, throughput 5585.63 samples/s
[Epoch 36 Batch 68000] loss 3.21, ppl 24.81, throughput 5711.07 samples/s
[Epoch 36 Batch 69000] loss 3.21, ppl 24.79, throughput 5741.77 samples/s
[Epoch 36 Batch 70000] loss 3.21, ppl 24.82, throughput 5742.86 samples/s
[Epoch 36 Batch 71000] loss 3.21, ppl 24.74, throughput 5569.80 samples/s
[Epoch 36 Batch 72000] loss 3.21, ppl 24.84, throughput 5749.86 samples/s
[Epoch 36 Batch 73000] loss 3.21, ppl 24.82, throughput 5742.56 samples/s
[Epoch 36 Batch 74000] loss 3.21, ppl 24.89, throughput 5754.35 samples/s
[Epoch 36 Batch 75000] loss 3.21, ppl 24.77, throughput 5553.80 samples/s
[Epoch 36 Batch 76000] loss 3.21, ppl 24.86, throughput 5767.48 samples/s
[Epoch 36 Batch 77000] loss 3.21, ppl 24.81, throughput 5726.20 samples/s
[Epoch 36 Batch 78000] loss 3.21, ppl 24.84, throughput 5753.68 samples/s
Epoch 36 took 7060.10 seconds.
[Epoch 37 Batch 1000] loss 3.18, ppl 24.02, throughput 5606.98 samples/s
[Epoch 37 Batch 2000] loss 3.17, ppl 23.71, throughput 5637.24 samples/s
[Epoch 37 Batch 3000] loss 3.19, ppl 24.37, throughput 5268.97 samples/s
[Epoch 37 Batch 4000] loss 3.19, ppl 24.39, throughput 4499.92 samples/s
[Epoch 37 Batch 5000] loss 3.20, ppl 24.52, throughput 4642.99 samples/s
[Epoch 37 Batch 6000] loss 3.20, ppl 24.50, throughput 4589.79 samples/s
[Epoch 37 Batch 7000] loss 3.19, ppl 24.37, throughput 4623.91 samples/s
[Epoch 37 Batch 8000] loss 3.20, ppl 24.45, throughput 5019.13 samples/s
[Epoch 37 Batch 9000] loss 3.19, ppl 24.37, throughput 5753.05 samples/s
[Epoch 37 Batch 10000] loss 3.20, ppl 24.51, throughput 5712.64 samples/s
[Epoch 37 Batch 11000] loss 3.20, ppl 24.43, throughput 5746.97 samples/s
[Epoch 37 Batch 12000] loss 3.20, ppl 24.50, throughput 5584.62 samples/s
[Epoch 37 Batch 13000] loss 3.19, ppl 24.38, throughput 5740.55 samples/s
[Epoch 37 Batch 14000] loss 3.20, ppl 24.44, throughput 5790.52 samples/s
[Epoch 37 Batch 15000] loss 3.20, ppl 24.50, throughput 5553.55 samples/s
[Epoch 37 Batch 16000] loss 3.20, ppl 24.52, throughput 5741.47 samples/s
[Epoch 37 Batch 17000] loss 3.20, ppl 24.61, throughput 5755.16 samples/s
[Epoch 37 Batch 18000] loss 3.20, ppl 24.58, throughput 5784.34 samples/s
[Epoch 37 Batch 19000] loss 3.20, ppl 24.45, throughput 5600.56 samples/s
[Epoch 37 Batch 20000] loss 3.20, ppl 24.46, throughput 5716.59 samples/s
[Epoch 37 Batch 21000] loss 3.20, ppl 24.51, throughput 5723.39 samples/s
[Epoch 37 Batch 22000] loss 3.20, ppl 24.55, throughput 5769.47 samples/s
[Epoch 37 Batch 23000] loss 3.20, ppl 24.57, throughput 5574.23 samples/s
[Epoch 37 Batch 24000] loss 3.20, ppl 24.45, throughput 5708.12 samples/s
[Epoch 37 Batch 25000] loss 3.20, ppl 24.65, throughput 5744.37 samples/s
[Epoch 37 Batch 26000] loss 3.20, ppl 24.56, throughput 5703.43 samples/s
[Epoch 37 Batch 27000] loss 3.21, ppl 24.66, throughput 5592.05 samples/s
[Epoch 37 Batch 28000] loss 3.20, ppl 24.51, throughput 5740.09 samples/s
[Epoch 37 Batch 29000] loss 3.20, ppl 24.56, throughput 5762.51 samples/s
[Epoch 37 Batch 30000] loss 3.20, ppl 24.57, throughput 5594.97 samples/s
[Epoch 37 Batch 31000] loss 3.21, ppl 24.68, throughput 5776.33 samples/s
[Epoch 37 Batch 32000] loss 3.20, ppl 24.55, throughput 5795.36 samples/s
[Epoch 37 Batch 33000] loss 3.20, ppl 24.44, throughput 5670.05 samples/s
[Epoch 37 Batch 34000] loss 3.20, ppl 24.58, throughput 5541.46 samples/s
[Epoch 37 Batch 35000] loss 3.20, ppl 24.56, throughput 5718.88 samples/s
[Epoch 37 Batch 36000] loss 3.20, ppl 24.57, throughput 5707.73 samples/s
[Epoch 37 Batch 37000] loss 3.20, ppl 24.65, throughput 5724.65 samples/s
[Epoch 37 Batch 38000] loss 3.21, ppl 24.67, throughput 5608.71 samples/s
[Epoch 37 Batch 39000] loss 3.20, ppl 24.64, throughput 5751.15 samples/s
[Epoch 37 Batch 40000] loss 3.21, ppl 24.66, throughput 5693.11 samples/s
[Epoch 37 Batch 41000] loss 3.20, ppl 24.63, throughput 5559.17 samples/s
[Epoch 37 Batch 42000] loss 3.20, ppl 24.66, throughput 5776.75 samples/s
[Epoch 37 Batch 43000] loss 3.21, ppl 24.71, throughput 5718.33 samples/s
[Epoch 37 Batch 44000] loss 3.20, ppl 24.62, throughput 5742.11 samples/s
[Epoch 37 Batch 45000] loss 3.21, ppl 24.68, throughput 5542.80 samples/s
[Epoch 37 Batch 46000] loss 3.20, ppl 24.64, throughput 5737.83 samples/s
[Epoch 37 Batch 47000] loss 3.21, ppl 24.70, throughput 5707.23 samples/s
[Epoch 37 Batch 48000] loss 3.20, ppl 24.57, throughput 5776.64 samples/s
[Epoch 37 Batch 49000] loss 3.21, ppl 24.67, throughput 5573.20 samples/s
[Epoch 37 Batch 50000] loss 3.21, ppl 24.69, throughput 5733.48 samples/s
[Epoch 37 Batch 51000] loss 3.20, ppl 24.64, throughput 5753.21 samples/s
[Epoch 37 Batch 52000] loss 3.21, ppl 24.72, throughput 5716.77 samples/s
[Epoch 37 Batch 53000] loss 3.21, ppl 24.69, throughput 5596.68 samples/s
[Epoch 37 Batch 54000] loss 3.21, ppl 24.72, throughput 5711.39 samples/s
[Epoch 37 Batch 55000] loss 3.21, ppl 24.67, throughput 5739.26 samples/s
[Epoch 37 Batch 56000] loss 3.21, ppl 24.69, throughput 5578.71 samples/s
[Epoch 37 Batch 57000] loss 3.21, ppl 24.66, throughput 5753.57 samples/s
[Epoch 37 Batch 58000] loss 3.20, ppl 24.65, throughput 5728.45 samples/s
[Epoch 37 Batch 59000] loss 3.21, ppl 24.77, throughput 5732.98 samples/s
[Epoch 37 Batch 60000] loss 3.21, ppl 24.79, throughput 5566.36 samples/s
[Epoch 37 Batch 61000] loss 3.21, ppl 24.73, throughput 5738.80 samples/s
[Epoch 37 Batch 62000] loss 3.21, ppl 24.69, throughput 5755.92 samples/s
[Epoch 37 Batch 63000] loss 3.21, ppl 24.68, throughput 5712.34 samples/s
[Epoch 37 Batch 64000] loss 3.21, ppl 24.71, throughput 5574.46 samples/s
[Epoch 37 Batch 65000] loss 3.21, ppl 24.71, throughput 5764.05 samples/s
[Epoch 37 Batch 66000] loss 3.21, ppl 24.68, throughput 5723.13 samples/s
[Epoch 37 Batch 67000] loss 3.21, ppl 24.72, throughput 5543.98 samples/s
[Epoch 37 Batch 68000] loss 3.21, ppl 24.74, throughput 5756.89 samples/s
[Epoch 37 Batch 69000] loss 3.21, ppl 24.72, throughput 5726.40 samples/s
[Epoch 37 Batch 70000] loss 3.21, ppl 24.78, throughput 5734.80 samples/s
[Epoch 37 Batch 71000] loss 3.21, ppl 24.68, throughput 5591.48 samples/s
[Epoch 37 Batch 72000] loss 3.21, ppl 24.73, throughput 5748.72 samples/s
[Epoch 37 Batch 73000] loss 3.21, ppl 24.74, throughput 5718.79 samples/s
[Epoch 37 Batch 74000] loss 3.21, ppl 24.73, throughput 5707.85 samples/s
[Epoch 37 Batch 75000] loss 3.21, ppl 24.67, throughput 5547.15 samples/s
[Epoch 37 Batch 76000] loss 3.21, ppl 24.79, throughput 5752.60 samples/s
[Epoch 37 Batch 77000] loss 3.21, ppl 24.84, throughput 5721.42 samples/s
[Epoch 37 Batch 78000] loss 3.21, ppl 24.79, throughput 5731.43 samples/s
Epoch 37 took 7124.60 seconds.
[Epoch 38 Batch 1000] loss 3.18, ppl 24.15, throughput 5601.82 samples/s
[Epoch 38 Batch 2000] loss 3.20, ppl 24.42, throughput 5747.95 samples/s
[Epoch 38 Batch 3000] loss 3.20, ppl 24.46, throughput 5116.96 samples/s
[Epoch 38 Batch 4000] loss 3.19, ppl 24.40, throughput 4484.97 samples/s
[Epoch 38 Batch 5000] loss 3.19, ppl 24.17, throughput 4791.22 samples/s
[Epoch 38 Batch 6000] loss 3.18, ppl 24.09, throughput 5280.32 samples/s
[Epoch 38 Batch 7000] loss 3.20, ppl 24.44, throughput 5329.80 samples/s
[Epoch 38 Batch 8000] loss 3.19, ppl 24.34, throughput 5072.94 samples/s
[Epoch 38 Batch 9000] loss 3.20, ppl 24.42, throughput 5308.21 samples/s
[Epoch 38 Batch 10000] loss 3.19, ppl 24.26, throughput 5735.24 samples/s
[Epoch 38 Batch 11000] loss 3.19, ppl 24.27, throughput 5755.06 samples/s
[Epoch 38 Batch 12000] loss 3.19, ppl 24.40, throughput 5538.14 samples/s
[Epoch 38 Batch 13000] loss 3.20, ppl 24.48, throughput 5722.19 samples/s
[Epoch 38 Batch 14000] loss 3.20, ppl 24.50, throughput 5737.73 samples/s
[Epoch 38 Batch 15000] loss 3.20, ppl 24.43, throughput 5573.86 samples/s
[Epoch 38 Batch 16000] loss 3.20, ppl 24.44, throughput 5789.45 samples/s
[Epoch 38 Batch 17000] loss 3.20, ppl 24.48, throughput 5790.49 samples/s
[Epoch 38 Batch 18000] loss 3.20, ppl 24.51, throughput 5762.52 samples/s
[Epoch 38 Batch 19000] loss 3.19, ppl 24.36, throughput 5556.47 samples/s
[Epoch 38 Batch 20000] loss 3.20, ppl 24.50, throughput 5775.43 samples/s
[Epoch 38 Batch 21000] loss 3.20, ppl 24.49, throughput 5706.63 samples/s
[Epoch 38 Batch 22000] loss 3.20, ppl 24.42, throughput 5730.78 samples/s
[Epoch 38 Batch 23000] loss 3.20, ppl 24.41, throughput 5585.96 samples/s
[Epoch 38 Batch 24000] loss 3.19, ppl 24.36, throughput 5722.36 samples/s
[Epoch 38 Batch 25000] loss 3.20, ppl 24.57, throughput 5731.53 samples/s
[Epoch 38 Batch 26000] loss 3.20, ppl 24.47, throughput 5766.04 samples/s
[Epoch 38 Batch 27000] loss 3.20, ppl 24.52, throughput 5642.02 samples/s
[Epoch 38 Batch 28000] loss 3.20, ppl 24.49, throughput 5736.39 samples/s
[Epoch 38 Batch 29000] loss 3.20, ppl 24.59, throughput 5771.42 samples/s
[Epoch 38 Batch 30000] loss 3.20, ppl 24.60, throughput 5565.20 samples/s
[Epoch 38 Batch 31000] loss 3.20, ppl 24.47, throughput 5697.96 samples/s
[Epoch 38 Batch 32000] loss 3.20, ppl 24.45, throughput 5795.05 samples/s
[Epoch 38 Batch 33000] loss 3.20, ppl 24.60, throughput 5812.51 samples/s
[Epoch 38 Batch 34000] loss 3.20, ppl 24.49, throughput 5572.63 samples/s
[Epoch 38 Batch 35000] loss 3.20, ppl 24.53, throughput 5750.36 samples/s
[Epoch 38 Batch 36000] loss 3.20, ppl 24.57, throughput 5769.74 samples/s
[Epoch 38 Batch 37000] loss 3.20, ppl 24.51, throughput 5707.39 samples/s
[Epoch 38 Batch 38000] loss 3.21, ppl 24.66, throughput 5604.35 samples/s
[Epoch 38 Batch 39000] loss 3.20, ppl 24.52, throughput 5751.48 samples/s
[Epoch 38 Batch 40000] loss 3.20, ppl 24.60, throughput 5721.08 samples/s
[Epoch 38 Batch 41000] loss 3.20, ppl 24.55, throughput 5600.07 samples/s
[Epoch 38 Batch 42000] loss 3.20, ppl 24.59, throughput 5728.48 samples/s
[Epoch 38 Batch 43000] loss 3.20, ppl 24.53, throughput 5730.64 samples/s
[Epoch 38 Batch 44000] loss 3.20, ppl 24.58, throughput 5771.42 samples/s
[Epoch 38 Batch 45000] loss 3.20, ppl 24.61, throughput 5613.97 samples/s
[Epoch 38 Batch 46000] loss 3.20, ppl 24.58, throughput 5736.68 samples/s
[Epoch 38 Batch 47000] loss 3.20, ppl 24.61, throughput 5759.95 samples/s
[Epoch 38 Batch 48000] loss 3.21, ppl 24.66, throughput 5813.60 samples/s
[Epoch 38 Batch 49000] loss 3.21, ppl 24.66, throughput 5537.80 samples/s
[Epoch 38 Batch 50000] loss 3.20, ppl 24.57, throughput 5796.98 samples/s
[Epoch 38 Batch 51000] loss 3.20, ppl 24.55, throughput 5748.10 samples/s
[Epoch 38 Batch 52000] loss 3.20, ppl 24.65, throughput 5736.57 samples/s
[Epoch 38 Batch 53000] loss 3.20, ppl 24.64, throughput 5588.26 samples/s
[Epoch 38 Batch 54000] loss 3.20, ppl 24.57, throughput 5706.21 samples/s
[Epoch 38 Batch 55000] loss 3.20, ppl 24.57, throughput 5754.01 samples/s
[Epoch 38 Batch 56000] loss 3.21, ppl 24.69, throughput 5556.91 samples/s
[Epoch 38 Batch 57000] loss 3.20, ppl 24.62, throughput 5739.07 samples/s
[Epoch 38 Batch 58000] loss 3.21, ppl 24.68, throughput 5698.71 samples/s
[Epoch 38 Batch 59000] loss 3.21, ppl 24.69, throughput 5719.55 samples/s
[Epoch 38 Batch 60000] loss 3.20, ppl 24.63, throughput 5545.73 samples/s
[Epoch 38 Batch 61000] loss 3.21, ppl 24.70, throughput 5681.75 samples/s
[Epoch 38 Batch 62000] loss 3.21, ppl 24.66, throughput 5705.52 samples/s
[Epoch 38 Batch 63000] loss 3.21, ppl 24.73, throughput 5710.67 samples/s
[Epoch 38 Batch 64000] loss 3.21, ppl 24.68, throughput 5534.73 samples/s
[Epoch 38 Batch 65000] loss 3.20, ppl 24.61, throughput 5749.19 samples/s
[Epoch 38 Batch 66000] loss 3.21, ppl 24.68, throughput 5750.72 samples/s
[Epoch 38 Batch 67000] loss 3.20, ppl 24.65, throughput 5608.61 samples/s
[Epoch 38 Batch 68000] loss 3.21, ppl 24.78, throughput 5703.52 samples/s
[Epoch 38 Batch 69000] loss 3.21, ppl 24.70, throughput 5790.39 samples/s
[Epoch 38 Batch 70000] loss 3.21, ppl 24.71, throughput 5752.41 samples/s
[Epoch 38 Batch 71000] loss 3.21, ppl 24.80, throughput 5566.67 samples/s
[Epoch 38 Batch 72000] loss 3.21, ppl 24.66, throughput 5752.01 samples/s
[Epoch 38 Batch 73000] loss 3.21, ppl 24.67, throughput 5757.59 samples/s
[Epoch 38 Batch 74000] loss 3.20, ppl 24.65, throughput 5747.63 samples/s
[Epoch 38 Batch 75000] loss 3.21, ppl 24.75, throughput 5572.20 samples/s
[Epoch 38 Batch 76000] loss 3.21, ppl 24.77, throughput 5749.37 samples/s
[Epoch 38 Batch 77000] loss 3.21, ppl 24.69, throughput 5715.41 samples/s
[Epoch 38 Batch 78000] loss 3.21, ppl 24.72, throughput 5688.63 samples/s
Epoch 38 took 7094.80 seconds.
[Epoch 39 Batch 1000] loss 3.18, ppl 24.09, throughput 5651.78 samples/s
[Epoch 39 Batch 2000] loss 3.18, ppl 24.09, throughput 5730.32 samples/s
[Epoch 39 Batch 3000] loss 3.18, ppl 24.14, throughput 5690.33 samples/s
[Epoch 39 Batch 4000] loss 3.18, ppl 24.15, throughput 4928.42 samples/s
[Epoch 39 Batch 5000] loss 3.19, ppl 24.32, throughput 4586.51 samples/s
[Epoch 39 Batch 6000] loss 3.18, ppl 24.10, throughput 4596.66 samples/s
[Epoch 39 Batch 7000] loss 3.18, ppl 24.10, throughput 4604.82 samples/s
[Epoch 39 Batch 8000] loss 3.19, ppl 24.21, throughput 4559.59 samples/s
[Epoch 39 Batch 9000] loss 3.19, ppl 24.40, throughput 4880.59 samples/s
[Epoch 39 Batch 10000] loss 3.19, ppl 24.41, throughput 5693.35 samples/s
[Epoch 39 Batch 11000] loss 3.19, ppl 24.25, throughput 5739.77 samples/s
[Epoch 39 Batch 12000] loss 3.19, ppl 24.41, throughput 5548.89 samples/s
[Epoch 39 Batch 13000] loss 3.19, ppl 24.40, throughput 5728.01 samples/s
[Epoch 39 Batch 14000] loss 3.19, ppl 24.23, throughput 5731.76 samples/s
[Epoch 39 Batch 15000] loss 3.19, ppl 24.30, throughput 5596.23 samples/s
[Epoch 39 Batch 16000] loss 3.20, ppl 24.41, throughput 5741.98 samples/s
[Epoch 39 Batch 17000] loss 3.20, ppl 24.43, throughput 5731.92 samples/s
[Epoch 39 Batch 18000] loss 3.20, ppl 24.53, throughput 5739.17 samples/s
[Epoch 39 Batch 19000] loss 3.19, ppl 24.38, throughput 5585.95 samples/s
[Epoch 39 Batch 20000] loss 3.20, ppl 24.51, throughput 5789.39 samples/s
[Epoch 39 Batch 21000] loss 3.20, ppl 24.45, throughput 5705.34 samples/s
[Epoch 39 Batch 22000] loss 3.20, ppl 24.46, throughput 5792.89 samples/s
[Epoch 39 Batch 23000] loss 3.20, ppl 24.44, throughput 5581.34 samples/s
[Epoch 39 Batch 24000] loss 3.20, ppl 24.44, throughput 5741.55 samples/s
[Epoch 39 Batch 25000] loss 3.20, ppl 24.45, throughput 5141.44 samples/s
[Epoch 39 Batch 26000] loss 3.20, ppl 24.54, throughput 4965.59 samples/s
[Epoch 39 Batch 27000] loss 3.20, ppl 24.47, throughput 4958.11 samples/s
[Epoch 39 Batch 28000] loss 3.20, ppl 24.54, throughput 5050.75 samples/s
[Epoch 39 Batch 29000] loss 3.20, ppl 24.46, throughput 4920.94 samples/s
[Epoch 39 Batch 30000] loss 3.20, ppl 24.53, throughput 4948.39 samples/s
[Epoch 39 Batch 31000] loss 3.20, ppl 24.46, throughput 5071.64 samples/s
[Epoch 39 Batch 32000] loss 3.20, ppl 24.42, throughput 4972.31 samples/s
[Epoch 39 Batch 33000] loss 3.20, ppl 24.42, throughput 4994.18 samples/s
[Epoch 39 Batch 34000] loss 3.20, ppl 24.48, throughput 4873.19 samples/s
[Epoch 39 Batch 35000] loss 3.20, ppl 24.52, throughput 5067.99 samples/s
[Epoch 39 Batch 36000] loss 3.20, ppl 24.45, throughput 5089.40 samples/s
[Epoch 39 Batch 37000] loss 3.20, ppl 24.50, throughput 5000.21 samples/s
[Epoch 39 Batch 38000] loss 3.20, ppl 24.53, throughput 4892.56 samples/s
[Epoch 39 Batch 39000] loss 3.20, ppl 24.49, throughput 5128.35 samples/s
[Epoch 39 Batch 40000] loss 3.20, ppl 24.60, throughput 4861.94 samples/s
[Epoch 39 Batch 41000] loss 3.20, ppl 24.50, throughput 4832.00 samples/s
[Epoch 39 Batch 42000] loss 3.20, ppl 24.43, throughput 4988.43 samples/s
[Epoch 39 Batch 43000] loss 3.20, ppl 24.63, throughput 5088.19 samples/s
[Epoch 39 Batch 44000] loss 3.20, ppl 24.57, throughput 5144.58 samples/s
[Epoch 39 Batch 45000] loss 3.20, ppl 24.51, throughput 4895.15 samples/s
[Epoch 39 Batch 46000] loss 3.20, ppl 24.46, throughput 5071.56 samples/s
[Epoch 39 Batch 47000] loss 3.20, ppl 24.58, throughput 5096.79 samples/s
[Epoch 39 Batch 48000] loss 3.20, ppl 24.58, throughput 5027.79 samples/s
[Epoch 39 Batch 49000] loss 3.20, ppl 24.63, throughput 4999.65 samples/s
[Epoch 39 Batch 50000] loss 3.20, ppl 24.63, throughput 4900.77 samples/s
[Epoch 39 Batch 51000] loss 3.20, ppl 24.62, throughput 5111.08 samples/s
[Epoch 39 Batch 52000] loss 3.20, ppl 24.65, throughput 4992.84 samples/s
[Epoch 39 Batch 53000] loss 3.20, ppl 24.55, throughput 4856.18 samples/s
[Epoch 39 Batch 54000] loss 3.20, ppl 24.57, throughput 5017.81 samples/s
[Epoch 39 Batch 55000] loss 3.20, ppl 24.58, throughput 5015.06 samples/s
[Epoch 39 Batch 56000] loss 3.20, ppl 24.59, throughput 4869.85 samples/s
[Epoch 39 Batch 57000] loss 3.20, ppl 24.57, throughput 5022.35 samples/s
[Epoch 39 Batch 58000] loss 3.20, ppl 24.50, throughput 5063.90 samples/s
[Epoch 39 Batch 59000] loss 3.20, ppl 24.57, throughput 4974.49 samples/s
[Epoch 39 Batch 60000] loss 3.20, ppl 24.55, throughput 4864.38 samples/s
[Epoch 39 Batch 61000] loss 3.20, ppl 24.61, throughput 4998.86 samples/s
[Epoch 39 Batch 62000] loss 3.21, ppl 24.66, throughput 5115.24 samples/s
[Epoch 39 Batch 63000] loss 3.20, ppl 24.65, throughput 5147.94 samples/s
[Epoch 39 Batch 64000] loss 3.21, ppl 24.66, throughput 4961.97 samples/s
[Epoch 39 Batch 65000] loss 3.21, ppl 24.67, throughput 5108.07 samples/s
[Epoch 39 Batch 66000] loss 3.20, ppl 24.62, throughput 5012.22 samples/s
[Epoch 39 Batch 67000] loss 3.20, ppl 24.56, throughput 4939.73 samples/s
[Epoch 39 Batch 68000] loss 3.21, ppl 24.69, throughput 5058.30 samples/s
[Epoch 39 Batch 69000] loss 3.21, ppl 24.70, throughput 5015.93 samples/s
[Epoch 39 Batch 70000] loss 3.20, ppl 24.62, throughput 5061.53 samples/s
[Epoch 39 Batch 71000] loss 3.21, ppl 24.72, throughput 4760.39 samples/s
[Epoch 39 Batch 72000] loss 3.21, ppl 24.67, throughput 5063.28 samples/s
[Epoch 39 Batch 73000] loss 3.20, ppl 24.65, throughput 5030.02 samples/s
[Epoch 39 Batch 74000] loss 3.20, ppl 24.61, throughput 5020.48 samples/s
[Epoch 39 Batch 75000] loss 3.21, ppl 24.74, throughput 4967.72 samples/s
[Epoch 39 Batch 76000] loss 3.20, ppl 24.64, throughput 4975.27 samples/s
[Epoch 39 Batch 77000] loss 3.20, ppl 24.63, throughput 5111.50 samples/s
[Epoch 39 Batch 78000] loss 3.20, ppl 24.65, throughput 5089.58 samples/s
Epoch 39 took 7805.22 seconds.
[Epoch 40 Batch 1000] loss 3.19, ppl 24.19, throughput 4940.26 samples/s
[Epoch 40 Batch 2000] loss 3.19, ppl 24.29, throughput 4969.80 samples/s
[Epoch 40 Batch 3000] loss 3.18, ppl 24.09, throughput 5057.78 samples/s
[Epoch 40 Batch 4000] loss 3.19, ppl 24.26, throughput 4308.36 samples/s
[Epoch 40 Batch 5000] loss 3.18, ppl 24.07, throughput 4242.06 samples/s
[Epoch 40 Batch 6000] loss 3.18, ppl 24.15, throughput 4183.47 samples/s
[Epoch 40 Batch 7000] loss 3.19, ppl 24.24, throughput 4209.20 samples/s
[Epoch 40 Batch 8000] loss 3.19, ppl 24.38, throughput 4051.12 samples/s
[Epoch 40 Batch 9000] loss 3.19, ppl 24.32, throughput 4315.04 samples/s
[Epoch 40 Batch 10000] loss 3.18, ppl 24.16, throughput 5082.38 samples/s
[Epoch 40 Batch 11000] loss 3.19, ppl 24.31, throughput 5041.95 samples/s
[Epoch 40 Batch 12000] loss 3.19, ppl 24.31, throughput 5048.76 samples/s
[Epoch 40 Batch 13000] loss 3.19, ppl 24.19, throughput 4986.22 samples/s
[Epoch 40 Batch 14000] loss 3.19, ppl 24.33, throughput 5166.23 samples/s
[Epoch 40 Batch 15000] loss 3.20, ppl 24.42, throughput 4878.20 samples/s
[Epoch 40 Batch 16000] loss 3.20, ppl 24.42, throughput 5075.68 samples/s
[Epoch 40 Batch 17000] loss 3.19, ppl 24.29, throughput 5121.80 samples/s
[Epoch 40 Batch 18000] loss 3.19, ppl 24.32, throughput 5142.32 samples/s
[Epoch 40 Batch 19000] loss 3.19, ppl 24.24, throughput 5029.31 samples/s
[Epoch 40 Batch 20000] loss 3.20, ppl 24.50, throughput 5032.07 samples/s
[Epoch 40 Batch 21000] loss 3.20, ppl 24.44, throughput 5066.76 samples/s
[Epoch 40 Batch 22000] loss 3.19, ppl 24.34, throughput 5093.44 samples/s
[Epoch 40 Batch 23000] loss 3.20, ppl 24.43, throughput 4808.28 samples/s
[Epoch 40 Batch 24000] loss 3.19, ppl 24.38, throughput 5027.44 samples/s
[Epoch 40 Batch 25000] loss 3.20, ppl 24.50, throughput 5027.39 samples/s
[Epoch 40 Batch 26000] loss 3.19, ppl 24.37, throughput 4898.84 samples/s
[Epoch 40 Batch 27000] loss 3.19, ppl 24.33, throughput 5078.02 samples/s
[Epoch 40 Batch 28000] loss 3.20, ppl 24.42, throughput 5055.26 samples/s
[Epoch 40 Batch 29000] loss 3.20, ppl 24.45, throughput 5072.10 samples/s
[Epoch 40 Batch 30000] loss 3.20, ppl 24.53, throughput 4969.61 samples/s
[Epoch 40 Batch 31000] loss 3.19, ppl 24.38, throughput 4974.51 samples/s
[Epoch 40 Batch 32000] loss 3.19, ppl 24.36, throughput 5159.18 samples/s
[Epoch 40 Batch 33000] loss 3.19, ppl 24.26, throughput 5025.50 samples/s
[Epoch 40 Batch 34000] loss 3.19, ppl 24.36, throughput 4883.53 samples/s
[Epoch 40 Batch 35000] loss 3.20, ppl 24.48, throughput 5005.73 samples/s
[Epoch 40 Batch 36000] loss 3.20, ppl 24.48, throughput 5142.99 samples/s
[Epoch 40 Batch 37000] loss 3.20, ppl 24.44, throughput 5034.52 samples/s
[Epoch 40 Batch 38000] loss 3.20, ppl 24.48, throughput 4914.06 samples/s
[Epoch 40 Batch 39000] loss 3.20, ppl 24.46, throughput 4973.03 samples/s
[Epoch 40 Batch 40000] loss 3.20, ppl 24.47, throughput 5080.98 samples/s
[Epoch 40 Batch 41000] loss 3.20, ppl 24.50, throughput 4895.05 samples/s
[Epoch 40 Batch 42000] loss 3.20, ppl 24.48, throughput 4987.92 samples/s
[Epoch 40 Batch 43000] loss 3.20, ppl 24.54, throughput 5032.46 samples/s
[Epoch 40 Batch 44000] loss 3.20, ppl 24.58, throughput 5100.90 samples/s
[Epoch 40 Batch 45000] loss 3.20, ppl 24.58, throughput 4864.68 samples/s
[Epoch 40 Batch 46000] loss 3.20, ppl 24.57, throughput 4987.84 samples/s
[Epoch 40 Batch 47000] loss 3.20, ppl 24.53, throughput 4999.65 samples/s
[Epoch 40 Batch 48000] loss 3.20, ppl 24.57, throughput 5061.56 samples/s
[Epoch 40 Batch 49000] loss 3.20, ppl 24.53, throughput 4926.86 samples/s
[Epoch 40 Batch 50000] loss 3.20, ppl 24.54, throughput 5059.90 samples/s
[Epoch 40 Batch 51000] loss 3.20, ppl 24.50, throughput 5146.98 samples/s
[Epoch 40 Batch 52000] loss 3.20, ppl 24.51, throughput 4914.57 samples/s
[Epoch 40 Batch 53000] loss 3.20, ppl 24.52, throughput 5031.68 samples/s
[Epoch 40 Batch 54000] loss 3.20, ppl 24.60, throughput 5080.52 samples/s
[Epoch 40 Batch 55000] loss 3.20, ppl 24.59, throughput 5124.14 samples/s
[Epoch 40 Batch 56000] loss 3.20, ppl 24.53, throughput 4963.41 samples/s
[Epoch 40 Batch 57000] loss 3.20, ppl 24.60, throughput 4979.02 samples/s
[Epoch 40 Batch 58000] loss 3.20, ppl 24.51, throughput 5119.00 samples/s
[Epoch 40 Batch 59000] loss 3.20, ppl 24.56, throughput 4974.11 samples/s
[Epoch 40 Batch 60000] loss 3.20, ppl 24.54, throughput 4914.12 samples/s
[Epoch 40 Batch 61000] loss 3.20, ppl 24.55, throughput 5053.27 samples/s
[Epoch 40 Batch 62000] loss 3.20, ppl 24.49, throughput 5018.67 samples/s
[Epoch 40 Batch 63000] loss 3.20, ppl 24.52, throughput 5075.65 samples/s
[Epoch 40 Batch 64000] loss 3.20, ppl 24.62, throughput 5016.28 samples/s
[Epoch 40 Batch 65000] loss 3.20, ppl 24.57, throughput 5048.99 samples/s
[Epoch 40 Batch 66000] loss 3.20, ppl 24.56, throughput 4988.21 samples/s
[Epoch 40 Batch 67000] loss 3.20, ppl 24.56, throughput 4839.18 samples/s
[Epoch 40 Batch 68000] loss 3.20, ppl 24.57, throughput 5146.17 samples/s
[Epoch 40 Batch 69000] loss 3.20, ppl 24.64, throughput 4987.37 samples/s
[Epoch 40 Batch 70000] loss 3.20, ppl 24.59, throughput 5057.05 samples/s
[Epoch 40 Batch 71000] loss 3.20, ppl 24.51, throughput 4842.31 samples/s
[Epoch 40 Batch 72000] loss 3.20, ppl 24.63, throughput 4981.76 samples/s
[Epoch 40 Batch 73000] loss 3.20, ppl 24.59, throughput 5045.47 samples/s
[Epoch 40 Batch 74000] loss 3.20, ppl 24.57, throughput 5166.74 samples/s
[Epoch 40 Batch 75000] loss 3.20, ppl 24.64, throughput 4855.44 samples/s
[Epoch 40 Batch 76000] loss 3.20, ppl 24.57, throughput 4946.20 samples/s
[Epoch 40 Batch 77000] loss 3.20, ppl 24.62, throughput 5089.08 samples/s
[Epoch 40 Batch 78000] loss 3.20, ppl 24.64, throughput 4999.71 samples/s
Epoch 40 took 8083.10 seconds.
[Epoch 41 Batch 1000] loss 3.18, ppl 24.08, throughput 4825.05 samples/s
[Epoch 41 Batch 2000] loss 3.19, ppl 24.22, throughput 4997.84 samples/s
[Epoch 41 Batch 3000] loss 3.18, ppl 24.03, throughput 5295.45 samples/s
[Epoch 41 Batch 4000] loss 3.18, ppl 24.05, throughput 5347.77 samples/s
[Epoch 41 Batch 5000] loss 3.18, ppl 24.13, throughput 5416.10 samples/s
[Epoch 41 Batch 6000] loss 3.18, ppl 24.07, throughput 5482.13 samples/s
[Epoch 41 Batch 7000] loss 3.19, ppl 24.26, throughput 5438.80 samples/s
[Epoch 41 Batch 8000] loss 3.18, ppl 24.14, throughput 5295.16 samples/s
[Epoch 41 Batch 9000] loss 3.19, ppl 24.38, throughput 5329.52 samples/s
[Epoch 41 Batch 10000] loss 3.19, ppl 24.31, throughput 5041.14 samples/s
[Epoch 41 Batch 11000] loss 3.18, ppl 24.03, throughput 5020.45 samples/s
[Epoch 41 Batch 12000] loss 3.19, ppl 24.33, throughput 4824.03 samples/s
[Epoch 41 Batch 13000] loss 3.19, ppl 24.28, throughput 5117.68 samples/s
[Epoch 41 Batch 14000] loss 3.19, ppl 24.38, throughput 4969.68 samples/s
[Epoch 41 Batch 15000] loss 3.19, ppl 24.32, throughput 5008.50 samples/s
[Epoch 41 Batch 16000] loss 3.19, ppl 24.31, throughput 4909.73 samples/s
[Epoch 41 Batch 17000] loss 3.19, ppl 24.29, throughput 5164.40 samples/s
[Epoch 41 Batch 18000] loss 3.19, ppl 24.23, throughput 5044.70 samples/s
[Epoch 41 Batch 19000] loss 3.19, ppl 24.32, throughput 4852.21 samples/s
[Epoch 41 Batch 20000] loss 3.19, ppl 24.35, throughput 4965.10 samples/s
[Epoch 41 Batch 21000] loss 3.19, ppl 24.36, throughput 5111.12 samples/s
[Epoch 41 Batch 22000] loss 3.19, ppl 24.25, throughput 4976.67 samples/s
[Epoch 41 Batch 23000] loss 3.19, ppl 24.25, throughput 4905.07 samples/s
[Epoch 41 Batch 24000] loss 3.20, ppl 24.51, throughput 5048.12 samples/s
[Epoch 41 Batch 25000] loss 3.20, ppl 24.43, throughput 5147.06 samples/s
[Epoch 41 Batch 26000] loss 3.19, ppl 24.30, throughput 4901.50 samples/s
[Epoch 41 Batch 27000] loss 3.19, ppl 24.34, throughput 5083.15 samples/s
[Epoch 41 Batch 28000] loss 3.19, ppl 24.27, throughput 5160.00 samples/s
[Epoch 41 Batch 29000] loss 3.19, ppl 24.34, throughput 4950.72 samples/s
[Epoch 41 Batch 30000] loss 3.20, ppl 24.45, throughput 4852.28 samples/s
[Epoch 41 Batch 31000] loss 3.20, ppl 24.43, throughput 5066.58 samples/s
[Epoch 41 Batch 32000] loss 3.20, ppl 24.42, throughput 4946.82 samples/s
[Epoch 41 Batch 33000] loss 3.20, ppl 24.43, throughput 5092.73 samples/s
[Epoch 41 Batch 34000] loss 3.20, ppl 24.44, throughput 4967.38 samples/s
[Epoch 41 Batch 35000] loss 3.20, ppl 24.46, throughput 4974.38 samples/s
[Epoch 41 Batch 36000] loss 3.19, ppl 24.40, throughput 4999.47 samples/s
[Epoch 41 Batch 37000] loss 3.20, ppl 24.52, throughput 5127.81 samples/s
[Epoch 41 Batch 38000] loss 3.20, ppl 24.48, throughput 5001.67 samples/s
[Epoch 41 Batch 39000] loss 3.20, ppl 24.47, throughput 4965.30 samples/s
[Epoch 41 Batch 40000] loss 3.20, ppl 24.41, throughput 5033.13 samples/s
[Epoch 41 Batch 41000] loss 3.20, ppl 24.44, throughput 4916.23 samples/s
[Epoch 41 Batch 42000] loss 3.20, ppl 24.42, throughput 5014.59 samples/s
[Epoch 41 Batch 43000] loss 3.20, ppl 24.43, throughput 5028.82 samples/s
[Epoch 41 Batch 44000] loss 3.20, ppl 24.48, throughput 4949.22 samples/s
[Epoch 41 Batch 45000] loss 3.20, ppl 24.42, throughput 4971.55 samples/s
[Epoch 41 Batch 46000] loss 3.20, ppl 24.42, throughput 5063.15 samples/s
[Epoch 41 Batch 47000] loss 3.20, ppl 24.54, throughput 5018.52 samples/s
[Epoch 41 Batch 48000] loss 3.19, ppl 24.38, throughput 5114.56 samples/s
[Epoch 41 Batch 49000] loss 3.20, ppl 24.44, throughput 4867.21 samples/s
[Epoch 41 Batch 50000] loss 3.19, ppl 24.41, throughput 4980.67 samples/s
[Epoch 41 Batch 51000] loss 3.20, ppl 24.51, throughput 5004.40 samples/s
[Epoch 41 Batch 52000] loss 3.19, ppl 24.39, throughput 4997.40 samples/s
[Epoch 41 Batch 53000] loss 3.20, ppl 24.45, throughput 5003.62 samples/s
[Epoch 41 Batch 54000] loss 3.20, ppl 24.47, throughput 5084.11 samples/s
[Epoch 41 Batch 55000] loss 3.20, ppl 24.54, throughput 5019.36 samples/s
[Epoch 41 Batch 56000] loss 3.20, ppl 24.60, throughput 5027.52 samples/s
[Epoch 41 Batch 57000] loss 3.20, ppl 24.55, throughput 5025.61 samples/s
[Epoch 41 Batch 58000] loss 3.20, ppl 24.47, throughput 5098.70 samples/s
[Epoch 41 Batch 59000] loss 3.20, ppl 24.58, throughput 5061.40 samples/s
[Epoch 41 Batch 60000] loss 3.20, ppl 24.55, throughput 5061.55 samples/s
[Epoch 41 Batch 61000] loss 3.20, ppl 24.50, throughput 5014.70 samples/s
[Epoch 41 Batch 62000] loss 3.20, ppl 24.46, throughput 5183.68 samples/s
[Epoch 41 Batch 63000] loss 3.20, ppl 24.45, throughput 4970.67 samples/s
[Epoch 41 Batch 64000] loss 3.20, ppl 24.51, throughput 4996.07 samples/s
[Epoch 41 Batch 65000] loss 3.20, ppl 24.53, throughput 5062.92 samples/s
[Epoch 41 Batch 66000] loss 3.20, ppl 24.49, throughput 5025.62 samples/s
[Epoch 41 Batch 67000] loss 3.20, ppl 24.56, throughput 4839.96 samples/s
[Epoch 41 Batch 68000] loss 3.20, ppl 24.57, throughput 4963.89 samples/s
[Epoch 41 Batch 69000] loss 3.20, ppl 24.59, throughput 5089.68 samples/s
[Epoch 41 Batch 70000] loss 3.20, ppl 24.42, throughput 5094.89 samples/s
[Epoch 41 Batch 71000] loss 3.20, ppl 24.62, throughput 4873.00 samples/s
[Epoch 41 Batch 72000] loss 3.20, ppl 24.58, throughput 5102.50 samples/s
[Epoch 41 Batch 73000] loss 3.20, ppl 24.61, throughput 5011.35 samples/s
[Epoch 41 Batch 74000] loss 3.20, ppl 24.49, throughput 5099.71 samples/s
[Epoch 41 Batch 75000] loss 3.20, ppl 24.65, throughput 4845.11 samples/s
[Epoch 41 Batch 76000] loss 3.20, ppl 24.63, throughput 4899.14 samples/s
[Epoch 41 Batch 77000] loss 3.20, ppl 24.64, throughput 4920.82 samples/s
[Epoch 41 Batch 78000] loss 3.20, ppl 24.54, throughput 5037.60 samples/s
Epoch 41 took 7935.48 seconds.