# Image Captioning Project

## Training the network

---
In this notebook, we adjust our architectures and hyperparameters while training on the COCO dataset.

<a id='step1'></a>
## Step 1: Training Setup

Important parameters that affect the training of our network:
- `batch_size` - the number of image-caption pairs in each training batch, i.e. the number of pairs used to update the model weights in each training step.
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file that records the loss and perplexity at every training step.
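
To make the effect of `vocab_threshold` concrete, here is a minimal sketch that counts words in a toy caption list and keeps only those that occur at least `vocab_threshold` times. The `build_vocab` helper and the toy captions are assumptions made for illustration; the project's real vocabulary is built by the data loader and may differ in tokenization and special tokens.

```python
from collections import Counter

def build_vocab(captions, vocab_threshold):
    """Return the sorted list of words that occur at least `vocab_threshold` times.

    Hypothetical helper for illustration; it is not the project's vocabulary code.
    """
    counts = Counter()
    for caption in captions:
        # Naive tokenization: lowercase and split on whitespace.
        counts.update(caption.lower().split())
    return sorted(word for word, n in counts.items() if n >= vocab_threshold)

# Toy captions (made up for this example).
captions = [
    "a dog runs on the grass",
    "a dog sits on a sofa",
    "a cat sleeps on the sofa",
]

for threshold in (1, 2, 3):
    vocab = build_vocab(captions, threshold)
    print(threshold, len(vocab), vocab)
```

On this toy data the vocabulary shrinks from 10 words at a threshold of 1 to 5 words at 2 and 2 words at 3, which is the trade-off described for `vocab_threshold` above.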

In [1]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append(r'C:\Users\Project\Documents\Python Scripts\Github_Clone\cocoapi-master\PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math

batch_size = 64          # batch size
vocab_threshold = 3        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 256           # dimensionality of image and word embeddings
hidden_size = 1048          # number of features in hidden state of the RNN decoder
num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 200          # print the batch loss on its own line every print_every steps
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().to(device)

# Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.embed.parameters())

# Define the optimizer.
optimizer = torch.optim.Adam(params, lr=0.0001)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)
total_step

Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.92s)
creating index...
index created!
Obtaining caption lengths...


100%|████████████████████████████████████████████████████████████████████████| 414113/414113 [00:56<00:00, 7316.64it/s]


6471
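
The value `6471` printed above follows from the caption count shown in the progress bar: 414113 captions split into batches of 64, rounded up. A small check of that arithmetic is sketched here; the variable `num_captions` is illustrative and its value is read off the output above.

```python
import math

num_captions = 414113   # caption count shown in the progress bar above
batch_size = 64         # same value as in the setup cell

# One step consumes one batch, so round up to cover the final partial batch.
total_step = math.ceil(num_captions / batch_size)
print(total_step)  # 6471
```

Doubling `batch_size` would roughly halve the number of steps per epoch, so `print_every` and the expected run time scale with it.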

<a id='step2'></a>
## Step 2: Train our Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
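# Load pre-trained weights before resuming training.
# `encoder_file` and `decoder_file` are the checkpoint names you noted, e.g. 'encoder-1.pkl'.
import os
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file), map_location=device))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file), map_location=device))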
```
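
If you save weights every epoch, picking `encoder_file` and `decoder_file` by hand is easy to get wrong. The sketch below finds the highest-numbered epoch for which both files exist, assuming the `encoder-i.pkl` / `decoder-i.pkl` naming described in Step 1. The helper name `latest_checkpoint` is hypothetical, and it only inspects file names; it does not load any weights.

```python
import os
import re

def latest_checkpoint(model_dir):
    """Return (epoch, encoder_file, decoder_file) for the latest complete pair, or None.

    Hypothetical helper: assumes files are named encoder-<i>.pkl and decoder-<i>.pkl.
    """
    if not os.path.isdir(model_dir):
        return None
    pattern = re.compile(r'^(encoder|decoder)-(\d+)\.pkl$')
    epochs = {'encoder': set(), 'decoder': set()}
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match:
            epochs[match.group(1)].add(int(match.group(2)))
    # Only epochs with both an encoder and a decoder file are usable.
    complete = epochs['encoder'] & epochs['decoder']
    if not complete:
        return None
    epoch = max(complete)
    return epoch, 'encoder-%d.pkl' % epoch, 'decoder-%d.pkl' % epoch

found = latest_checkpoint('./models')
if found is None:
    print('No complete checkpoint pair found; training starts from scratch.')
else:
    epoch, encoder_file, decoder_file = found
    print('Resume from epoch %d using %s and %s' % (epoch, encoder_file, decoder_file))
```

Requiring both files guards against resuming from a run that was interrupted between the two `torch.save` calls.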

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).
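
One low-effort way to keep such notes is to write the settings, plus a short summary of the training log, to a small JSON file per run. The sketch below is only one possible approach; `summarize_log`, `write_run_record`, and the `run_record.json` file name are hypothetical. It parses lines in the `Epoch [..], Step [..], Loss: .., Perplexity: ..` format used for the training statistics, and averages the last few losses because the per-step loss is noisy.

```python
import json
import math
import re

# Matches the statistics line format written to the training log.
STATS_RE = re.compile(r'Epoch \[(\d+)/\d+\], Step \[(\d+)/\d+\], Loss: ([\d.]+)')

def summarize_log(lines, window=100):
    """Average the loss over the last `window` logged steps (hypothetical helper)."""
    losses = [float(m.group(3)) for m in map(STATS_RE.search, lines) if m]
    if not losses:
        return None
    recent = losses[-window:]
    avg_loss = sum(recent) / len(recent)
    # Perplexity is the exponential of the cross-entropy loss.
    return {'steps_logged': len(losses),
            'avg_loss': avg_loss,
            'perplexity_of_avg_loss': math.exp(avg_loss)}

def write_run_record(settings, summary, path):
    """Store settings and log summary together so a run can be identified later."""
    with open(path, 'w') as fh:
        json.dump({'settings': settings, 'summary': summary}, fh, indent=2)

# Settings copied from the setup cell; extend with anything else you change.
settings = {'batch_size': 64, 'vocab_threshold': 3, 'embed_size': 256,
            'hidden_size': 1048, 'num_epochs': 3, 'lr': 0.0001}

# Two sample lines in the log format (values taken from this notebook's output).
sample_lines = [
    'Epoch [1/3], Step [1/6471], Loss: 9.3614, Perplexity: 11630.2678',
    'Epoch [1/3], Step [2/6471], Loss: 9.3488, Perplexity: 11484.7450',
]
summary = summarize_log(sample_lines, window=2)
write_run_record(settings, summary, 'run_record.json')
print(summary)
```

For a real run you would pass the lines of the log file instead of `sample_lines`, for example `summarize_log(open(log_file).readlines())`, and use a different output file name per run.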

In [2]:
import torch.utils.data as data
import numpy as np
import os
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()


for epoch in range(1, num_epochs+1):
    
    for i_step in range(1, total_step+1):
      
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))
        
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            print('\r' + stats)
            
    # Save the weights (create the models/ folder first if it is missing).
    if epoch % save_every == 0:
        os.makedirs('./models', exist_ok=True)
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

# Close the training log file.
f.close()

the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1/6471], Loss: 9.3614, Perplexity: 11630.2678the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2/6471], Loss: 9.3488, Perplexity: 11484.7450the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3/6471], Loss: 9.3309, Perplexity: 11280.7443the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4/6471], Loss: 9.3269, Perplexity: 11235.8326the hiddens.shape: torch.Size([64, 26, 1048])
Epoch [1/3], Step [5/6471], Loss: 9.3194, Perplexity: 11152.6376the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6/6471], Loss: 9.2895, Perplexity: 10824.0624the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [7/6471], Loss: 9.2787, Perplexity: 10707.4609the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [8/6471], Loss: 9.2581, Perplexity: 10488.8602the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [9/6471], Loss: 9.2511, Perplexity: 10415.9531

Epoch [1/3], Step [75/6471], Loss: 5.3452, Perplexity: 209.6003the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [76/6471], Loss: 5.4562, Perplexity: 234.2149the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [77/6471], Loss: 5.3846, Perplexity: 218.0242the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [78/6471], Loss: 5.6499, Perplexity: 284.2695the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [79/6471], Loss: 5.2254, Perplexity: 185.9341the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [80/6471], Loss: 5.3495, Perplexity: 210.5120the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [81/6471], Loss: 5.3541, Perplexity: 211.4757the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [82/6471], Loss: 5.3385, Perplexity: 208.1934the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [83/6471], Loss: 5.3123, Perplexity: 202.8217the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [1/3], Step [149/6471], Loss: 4.9074, Perplexity: 135.2891the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [150/6471], Loss: 4.8782, Perplexity: 131.3898the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [151/6471], Loss: 4.9639, Perplexity: 143.1479the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [1/3], Step [152/6471], Loss: 5.4775, Perplexity: 239.2377the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [153/6471], Loss: 5.1129, Perplexity: 166.1477the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [154/6471], Loss: 4.8338, Perplexity: 125.6881the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [155/6471], Loss: 5.1011, Perplexity: 164.1962the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [156/6471], Loss: 4.9338, Perplexity: 138.9102the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [157/6471], Loss: 4.9932, Perplexity: 147.4043the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [223/6471], Loss: 5.0476, Perplexity: 155.6443the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [224/6471], Loss: 4.7047, Perplexity: 110.4676the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [225/6471], Loss: 4.7121, Perplexity: 111.2840the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [226/6471], Loss: 4.6455, Perplexity: 104.1195the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [227/6471], Loss: 4.9383, Perplexity: 139.5323the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [228/6471], Loss: 4.6734, Perplexity: 107.0624the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [229/6471], Loss: 4.6371, Perplexity: 103.2470the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [230/6471], Loss: 5.4075, Perplexity: 223.0639the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [231/6471], Loss: 4.6187, Perplexity: 101.3625the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [297/6471], Loss: 4.3830, Perplexity: 80.0808the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [298/6471], Loss: 4.3632, Perplexity: 78.5045the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [299/6471], Loss: 4.5877, Perplexity: 98.2670the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [300/6471], Loss: 4.4213, Perplexity: 83.2069the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [301/6471], Loss: 4.2205, Perplexity: 68.0662the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [302/6471], Loss: 4.2548, Perplexity: 70.4409the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [303/6471], Loss: 4.4040, Perplexity: 81.7781the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [304/6471], Loss: 4.3619, Perplexity: 78.4073the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [305/6471], Loss: 4.7269, Perplexity: 112.9504the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [372/6471], Loss: 4.3293, Perplexity: 75.8949the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [373/6471], Loss: 4.5481, Perplexity: 94.4493the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [374/6471], Loss: 4.1253, Perplexity: 61.8876the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [375/6471], Loss: 4.2501, Perplexity: 70.1154the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [376/6471], Loss: 4.0717, Perplexity: 58.6594the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [377/6471], Loss: 4.3695, Perplexity: 79.0036the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [378/6471], Loss: 4.0730, Perplexity: 58.7313the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [379/6471], Loss: 4.3057, Perplexity: 74.1229the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [380/6471], Loss: 4.1480, Perplexity: 63.3077the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [447/6471], Loss: 4.0134, Perplexity: 55.3342the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [448/6471], Loss: 4.1427, Perplexity: 62.9757the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [449/6471], Loss: 4.0723, Perplexity: 58.6923the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [450/6471], Loss: 4.2023, Perplexity: 66.8411the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [451/6471], Loss: 4.1204, Perplexity: 61.5810the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [452/6471], Loss: 4.1028, Perplexity: 60.5094the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [453/6471], Loss: 4.0773, Perplexity: 58.9859the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [454/6471], Loss: 4.2085, Perplexity: 67.2538the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [455/6471], Loss: 4.0079, Perplexity: 55.0333the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [522/6471], Loss: 3.9740, Perplexity: 53.1973the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [523/6471], Loss: 4.1063, Perplexity: 60.7219the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [524/6471], Loss: 3.6600, Perplexity: 38.8599the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [525/6471], Loss: 3.9403, Perplexity: 51.4321the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [526/6471], Loss: 4.1314, Perplexity: 62.2653the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [527/6471], Loss: 4.1219, Perplexity: 61.6791the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [528/6471], Loss: 4.0906, Perplexity: 59.7730the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [529/6471], Loss: 3.9070, Perplexity: 49.7504the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [530/6471], Loss: 4.2971, Perplexity: 73.4833the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [597/6471], Loss: 4.0422, Perplexity: 56.9539the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [598/6471], Loss: 4.0439, Perplexity: 57.0493the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [599/6471], Loss: 3.8144, Perplexity: 45.3480the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [600/6471], Loss: 3.8341, Perplexity: 46.2507
the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [601/6471], Loss: 3.7372, Perplexity: 41.9798the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [602/6471], Loss: 3.8240, Perplexity: 45.7867the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [603/6471], Loss: 3.6807, Perplexity: 39.6752the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [604/6471], Loss: 3.8509, Perplexity: 47.0348the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [605/6471], Loss: 3.9378, Perplexity: 51.3081the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [672/6471], Loss: 3.7742, Perplexity: 43.5638the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [673/6471], Loss: 3.9054, Perplexity: 49.6678the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [674/6471], Loss: 4.1122, Perplexity: 61.0833the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [675/6471], Loss: 3.8111, Perplexity: 45.1987the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [676/6471], Loss: 3.7045, Perplexity: 40.6307the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [677/6471], Loss: 3.7254, Perplexity: 41.4858the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [678/6471], Loss: 3.7413, Perplexity: 42.1546the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [679/6471], Loss: 3.6057, Perplexity: 36.8085the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [680/6471], Loss: 3.9940, Perplexity: 54.2706the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [747/6471], Loss: 3.4683, Perplexity: 32.0828the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [748/6471], Loss: 3.9128, Perplexity: 50.0371the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [749/6471], Loss: 3.4631, Perplexity: 31.9171the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [750/6471], Loss: 3.5947, Perplexity: 36.4048the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [751/6471], Loss: 3.4163, Perplexity: 30.4575the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [752/6471], Loss: 3.6193, Perplexity: 37.3119the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [753/6471], Loss: 3.6679, Perplexity: 39.1688the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [754/6471], Loss: 4.2312, Perplexity: 68.7993the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [755/6471], Loss: 3.9396, Perplexity: 51.3961the hiddens.shape: torch.Size([64, 19, 1048])

Epoch [1/3], Step [822/6471], Loss: 3.5722, Perplexity: 35.5937the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [823/6471], Loss: 3.8891, Perplexity: 48.8651the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [824/6471], Loss: 3.7951, Perplexity: 44.4808the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [825/6471], Loss: 3.8526, Perplexity: 47.1172the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [826/6471], Loss: 4.2592, Perplexity: 70.7503the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [827/6471], Loss: 3.5692, Perplexity: 35.4865the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [828/6471], Loss: 3.6675, Perplexity: 39.1554the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [829/6471], Loss: 3.5207, Perplexity: 33.8087the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [830/6471], Loss: 3.6620, Perplexity: 38.9384the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [897/6471], Loss: 3.7187, Perplexity: 41.2096the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [898/6471], Loss: 3.5715, Perplexity: 35.5713the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [899/6471], Loss: 3.4254, Perplexity: 30.7361the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [900/6471], Loss: 3.7106, Perplexity: 40.8769the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [901/6471], Loss: 3.4402, Perplexity: 31.1923the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [902/6471], Loss: 3.3901, Perplexity: 29.6686the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [903/6471], Loss: 3.4065, Perplexity: 30.1605the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [904/6471], Loss: 3.5762, Perplexity: 35.7381the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [905/6471], Loss: 3.2111, Perplexity: 24.8064the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [1/3], Step [972/6471], Loss: 3.5703, Perplexity: 35.5274the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [973/6471], Loss: 3.5852, Perplexity: 36.0619the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [974/6471], Loss: 3.5273, Perplexity: 34.0335the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [975/6471], Loss: 3.6244, Perplexity: 37.5016the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [976/6471], Loss: 3.2786, Perplexity: 26.5382the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [977/6471], Loss: 3.6012, Perplexity: 36.6423the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [978/6471], Loss: 4.0877, Perplexity: 59.6046the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [979/6471], Loss: 3.5652, Perplexity: 35.3454the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [980/6471], Loss: 3.8266, Perplexity: 45.9045the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [1046/6471], Loss: 3.3894, Perplexity: 29.6478the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1047/6471], Loss: 3.4931, Perplexity: 32.8881the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1048/6471], Loss: 3.7079, Perplexity: 40.7687the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [1049/6471], Loss: 3.6602, Perplexity: 38.8707the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1050/6471], Loss: 3.5278, Perplexity: 34.0490the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [1051/6471], Loss: 4.1102, Perplexity: 60.9576the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1052/6471], Loss: 3.3343, Perplexity: 28.0582the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1053/6471], Loss: 3.5266, Perplexity: 34.0092the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [1054/6471], Loss: 3.9236, Perplexity: 50.5828the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [1120/6471], Loss: 3.8016, Perplexity: 44.7717the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1121/6471], Loss: 3.4045, Perplexity: 30.0991the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1122/6471], Loss: 3.2675, Perplexity: 26.2448the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1123/6471], Loss: 3.7602, Perplexity: 42.9587the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1124/6471], Loss: 3.3069, Perplexity: 27.2994the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1125/6471], Loss: 3.2846, Perplexity: 26.6970the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1126/6471], Loss: 3.1363, Perplexity: 23.0195the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1127/6471], Loss: 3.6321, Perplexity: 37.7932the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1128/6471], Loss: 3.2039, Perplexity: 24.6276the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [1194/6471], Loss: 3.2067, Perplexity: 24.6967the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1195/6471], Loss: 3.4926, Perplexity: 32.8710the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [1196/6471], Loss: 3.8383, Perplexity: 46.4458the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [1197/6471], Loss: 3.7470, Perplexity: 42.3919the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1198/6471], Loss: 3.4222, Perplexity: 30.6382the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1199/6471], Loss: 3.3596, Perplexity: 28.7765the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1200/6471], Loss: 3.4114, Perplexity: 30.3088
the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1201/6471], Loss: 3.4082, Perplexity: 30.2112the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1202/6471], Loss: 3.3713, Perplexity: 29.1178the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [1268/6471], Loss: 3.4477, Perplexity: 31.4285the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1269/6471], Loss: 3.3359, Perplexity: 28.1048the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1270/6471], Loss: 3.4287, Perplexity: 30.8364the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [1271/6471], Loss: 3.6772, Perplexity: 39.5343the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1272/6471], Loss: 3.3963, Perplexity: 29.8525the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1273/6471], Loss: 3.2514, Perplexity: 25.8269the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1274/6471], Loss: 3.3452, Perplexity: 28.3661the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1275/6471], Loss: 3.3391, Perplexity: 28.1948the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1276/6471], Loss: 3.3709, Perplexity: 29.1041the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [1342/6471], Loss: 3.4422, Perplexity: 31.2547the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [1343/6471], Loss: 3.9064, Perplexity: 49.7191the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1344/6471], Loss: 3.3546, Perplexity: 28.6340the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1345/6471], Loss: 3.4998, Perplexity: 33.1091the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1346/6471], Loss: 3.3546, Perplexity: 28.6339the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [1/3], Step [1347/6471], Loss: 4.2737, Perplexity: 71.7855the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1348/6471], Loss: 3.4434, Perplexity: 31.2940the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [1349/6471], Loss: 3.8842, Perplexity: 48.6261the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1350/6471], Loss: 3.1813, Perplexity: 24.0774the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [1416/6471], Loss: 3.3295, Perplexity: 27.9254the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [1417/6471], Loss: 3.8503, Perplexity: 47.0091the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1418/6471], Loss: 3.3699, Perplexity: 29.0763the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1419/6471], Loss: 3.2403, Perplexity: 25.5415the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [1420/6471], Loss: 3.8051, Perplexity: 44.9312the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1421/6471], Loss: 3.0676, Perplexity: 21.4892the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1422/6471], Loss: 3.2302, Perplexity: 25.2835the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1423/6471], Loss: 3.5174, Perplexity: 33.6972the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1424/6471], Loss: 3.0939, Perplexity: 22.0626the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [1490/6471], Loss: 3.6444, Perplexity: 38.2601the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1491/6471], Loss: 3.3329, Perplexity: 28.0194the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1492/6471], Loss: 3.2188, Perplexity: 24.9971the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1493/6471], Loss: 3.3680, Perplexity: 29.0194the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1494/6471], Loss: 3.4114, Perplexity: 30.3071the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1495/6471], Loss: 3.1296, Perplexity: 22.8657the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1496/6471], Loss: 3.3684, Perplexity: 29.0326the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1497/6471], Loss: 3.3401, Perplexity: 28.2223the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1498/6471], Loss: 3.4282, Perplexity: 30.8209the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [1564/6471], Loss: 3.2257, Perplexity: 25.1703the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1565/6471], Loss: 3.1893, Perplexity: 24.2723the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1566/6471], Loss: 3.3176, Perplexity: 27.5950the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1567/6471], Loss: 3.1582, Perplexity: 23.5282the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1568/6471], Loss: 3.3011, Perplexity: 27.1414the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [1569/6471], Loss: 3.7850, Perplexity: 44.0366the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [1570/6471], Loss: 3.8265, Perplexity: 45.8996the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [1571/6471], Loss: 3.4375, Perplexity: 31.1097the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1572/6471], Loss: 3.1961, Perplexity: 24.4376the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [1/3], Step [1638/6471], Loss: 3.4878, Perplexity: 32.7125the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1639/6471], Loss: 3.3277, Perplexity: 27.8745the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1640/6471], Loss: 3.2051, Perplexity: 24.6588the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1641/6471], Loss: 3.2554, Perplexity: 25.9310the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1642/6471], Loss: 3.2399, Perplexity: 25.5322the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1643/6471], Loss: 3.1643, Perplexity: 23.6724the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1644/6471], Loss: 3.1668, Perplexity: 23.7314the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1645/6471], Loss: 3.0803, Perplexity: 21.7653the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1646/6471], Loss: 3.1487, Perplexity: 23.3057the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [1712/6471], Loss: 3.0759, Perplexity: 21.6700the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1713/6471], Loss: 3.0331, Perplexity: 20.7622the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1714/6471], Loss: 3.1395, Perplexity: 23.0919the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1715/6471], Loss: 3.2430, Perplexity: 25.6108the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [1716/6471], Loss: 3.7722, Perplexity: 43.4763the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1717/6471], Loss: 3.3980, Perplexity: 29.9035the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1718/6471], Loss: 3.5770, Perplexity: 35.7664the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [1719/6471], Loss: 3.5990, Perplexity: 36.5631the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1720/6471], Loss: 3.2300, Perplexity: 25.2794the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [1786/6471], Loss: 3.2815, Perplexity: 26.6153the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1787/6471], Loss: 3.1737, Perplexity: 23.8950the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1788/6471], Loss: 3.1843, Perplexity: 24.1498the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [1789/6471], Loss: 3.6690, Perplexity: 39.2115the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1790/6471], Loss: 3.2389, Perplexity: 25.5050the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1791/6471], Loss: 3.1495, Perplexity: 23.3242the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1792/6471], Loss: 3.3360, Perplexity: 28.1057the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1793/6471], Loss: 3.1872, Perplexity: 24.2206the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1794/6471], Loss: 3.3025, Perplexity: 27.1814the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3

Epoch [1/3], Step [1860/6471], Loss: 3.1451, Perplexity: 23.2224the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1861/6471], Loss: 3.2071, Perplexity: 24.7063the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1862/6471], Loss: 3.1136, Perplexity: 22.5020the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1863/6471], Loss: 3.2737, Perplexity: 26.4099the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [1864/6471], Loss: 3.1679, Perplexity: 23.7580the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1865/6471], Loss: 2.8088, Perplexity: 16.5894the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1866/6471], Loss: 3.3918, Perplexity: 29.7180the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [1867/6471], Loss: 4.1153, Perplexity: 61.2679the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1868/6471], Loss: 2.9007, Perplexity: 18.1871the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [1934/6471], Loss: 3.6830, Perplexity: 39.7673the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1935/6471], Loss: 3.2617, Perplexity: 26.0947the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [1936/6471], Loss: 2.8862, Perplexity: 17.9255the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1937/6471], Loss: 3.2746, Perplexity: 26.4317the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1938/6471], Loss: 3.1589, Perplexity: 23.5437the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [1939/6471], Loss: 3.2378, Perplexity: 25.4784the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [1940/6471], Loss: 3.2926, Perplexity: 26.9119the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [1941/6471], Loss: 3.1324, Perplexity: 22.9287the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [1942/6471], Loss: 3.2755, Perplexity: 26.4570the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3

Epoch [1/3], Step [2008/6471], Loss: 3.6093, Perplexity: 36.9392the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2009/6471], Loss: 2.8499, Perplexity: 17.2863the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2010/6471], Loss: 3.2123, Perplexity: 24.8372the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2011/6471], Loss: 3.2419, Perplexity: 25.5829the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2012/6471], Loss: 3.0696, Perplexity: 21.5323the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2013/6471], Loss: 3.0689, Perplexity: 21.5178the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2014/6471], Loss: 3.1081, Perplexity: 22.3779the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2015/6471], Loss: 2.9919, Perplexity: 19.9236the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2016/6471], Loss: 3.2260, Perplexity: 25.1794the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3

Epoch [1/3], Step [2082/6471], Loss: 3.2146, Perplexity: 24.8921the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2083/6471], Loss: 3.0446, Perplexity: 21.0026the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2084/6471], Loss: 3.2303, Perplexity: 25.2871the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2085/6471], Loss: 3.1561, Perplexity: 23.4784the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2086/6471], Loss: 2.9613, Perplexity: 19.3228the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2087/6471], Loss: 3.1123, Perplexity: 22.4718the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2088/6471], Loss: 2.9993, Perplexity: 20.0719the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2089/6471], Loss: 3.0820, Perplexity: 21.8024the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [2090/6471], Loss: 3.6449, Perplexity: 38.2804the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3

Epoch [1/3], Step [2156/6471], Loss: 3.2089, Perplexity: 24.7515the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2157/6471], Loss: 3.1002, Perplexity: 22.2033the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2158/6471], Loss: 3.0466, Perplexity: 21.0432the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2159/6471], Loss: 2.9531, Perplexity: 19.1653the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2160/6471], Loss: 2.8665, Perplexity: 17.5759the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2161/6471], Loss: 3.1105, Perplexity: 22.4320the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [2162/6471], Loss: 3.6229, Perplexity: 37.4453the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [2163/6471], Loss: 3.3794, Perplexity: 29.3520the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2164/6471], Loss: 3.0973, Perplexity: 22.1384the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [2230/6471], Loss: 3.4648, Perplexity: 31.9708the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2231/6471], Loss: 3.0587, Perplexity: 21.2995the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2232/6471], Loss: 3.0041, Perplexity: 20.1672the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2233/6471], Loss: 3.1475, Perplexity: 23.2769the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2234/6471], Loss: 2.9854, Perplexity: 19.7947the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2235/6471], Loss: 3.2405, Perplexity: 25.5476the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2236/6471], Loss: 2.9322, Perplexity: 18.7690the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2237/6471], Loss: 3.1584, Perplexity: 23.5334the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2238/6471], Loss: 3.1645, Perplexity: 23.6772the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3

Epoch [1/3], Step [2304/6471], Loss: 2.9840, Perplexity: 19.7665the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2305/6471], Loss: 2.9312, Perplexity: 18.7501the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2306/6471], Loss: 2.9621, Perplexity: 19.3378the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2307/6471], Loss: 2.8037, Perplexity: 16.5049the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2308/6471], Loss: 3.0562, Perplexity: 21.2472the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2309/6471], Loss: 2.9818, Perplexity: 19.7230the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2310/6471], Loss: 2.9293, Perplexity: 18.7149the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2311/6471], Loss: 3.2129, Perplexity: 24.8516the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [2312/6471], Loss: 3.3525, Perplexity: 28.5753the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3

Epoch [1/3], Step [2378/6471], Loss: 2.7738, Perplexity: 16.0189the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2379/6471], Loss: 2.8022, Perplexity: 16.4811the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2380/6471], Loss: 3.0539, Perplexity: 21.1973the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2381/6471], Loss: 2.9081, Perplexity: 18.3211the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2382/6471], Loss: 2.8697, Perplexity: 17.6310the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2383/6471], Loss: 3.1582, Perplexity: 23.5290the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2384/6471], Loss: 3.1028, Perplexity: 22.2599the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2385/6471], Loss: 2.8011, Perplexity: 16.4622the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [2386/6471], Loss: 3.2339, Perplexity: 25.3779the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3

Epoch [1/3], Step [2452/6471], Loss: 3.0083, Perplexity: 20.2533the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2453/6471], Loss: 2.8785, Perplexity: 17.7877the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2454/6471], Loss: 3.1416, Perplexity: 23.1409the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2455/6471], Loss: 3.1512, Perplexity: 23.3651the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [2456/6471], Loss: 3.4403, Perplexity: 31.1961the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [2457/6471], Loss: 3.2911, Perplexity: 26.8718the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2458/6471], Loss: 3.0990, Perplexity: 22.1747the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2459/6471], Loss: 3.0284, Perplexity: 20.6649the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2460/6471], Loss: 3.0689, Perplexity: 21.5178the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3

Epoch [1/3], Step [2526/6471], Loss: 2.7071, Perplexity: 14.9854the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2527/6471], Loss: 2.8015, Perplexity: 16.4700the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2528/6471], Loss: 3.0547, Perplexity: 21.2153the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2529/6471], Loss: 3.0955, Perplexity: 22.0982the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2530/6471], Loss: 2.9711, Perplexity: 19.5126the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2531/6471], Loss: 2.9112, Perplexity: 18.3782the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2532/6471], Loss: 3.2807, Perplexity: 26.5948the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2533/6471], Loss: 2.9353, Perplexity: 18.8280the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2534/6471], Loss: 2.9296, Perplexity: 18.7193the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [2600/6471], Loss: 3.0576, Perplexity: 21.2754
the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2601/6471], Loss: 2.8849, Perplexity: 17.9025the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2602/6471], Loss: 3.1442, Perplexity: 23.2020the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2603/6471], Loss: 2.8357, Perplexity: 17.0420the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2604/6471], Loss: 3.0980, Perplexity: 22.1533the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2605/6471], Loss: 2.8040, Perplexity: 16.5110the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2606/6471], Loss: 3.2086, Perplexity: 24.7453the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [2607/6471], Loss: 3.2672, Perplexity: 26.2383the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2608/6471], Loss: 2.8156, Perplexity: 16.7034the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/

Epoch [1/3], Step [2674/6471], Loss: 3.3656, Perplexity: 28.9522the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2675/6471], Loss: 3.0653, Perplexity: 21.4403the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2676/6471], Loss: 2.7410, Perplexity: 15.5018the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2677/6471], Loss: 2.9901, Perplexity: 19.8869the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2678/6471], Loss: 2.9278, Perplexity: 18.6870the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2679/6471], Loss: 2.8544, Perplexity: 17.3636the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2680/6471], Loss: 2.7941, Perplexity: 16.3482the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2681/6471], Loss: 3.0562, Perplexity: 21.2459the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2682/6471], Loss: 2.8164, Perplexity: 16.7158the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3

Epoch [1/3], Step [2748/6471], Loss: 2.9522, Perplexity: 19.1480the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2749/6471], Loss: 2.7869, Perplexity: 16.2298the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2750/6471], Loss: 3.0987, Perplexity: 22.1687the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [2751/6471], Loss: 3.6778, Perplexity: 39.5590the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2752/6471], Loss: 2.8447, Perplexity: 17.1970the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2753/6471], Loss: 2.8968, Perplexity: 18.1158the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2754/6471], Loss: 2.8799, Perplexity: 17.8117the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2755/6471], Loss: 2.8925, Perplexity: 18.0383the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2756/6471], Loss: 2.9379, Perplexity: 18.8757the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3

Epoch [1/3], Step [2822/6471], Loss: 2.8419, Perplexity: 17.1477the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2823/6471], Loss: 3.0670, Perplexity: 21.4784the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [2824/6471], Loss: 2.7398, Perplexity: 15.4840the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2825/6471], Loss: 2.8376, Perplexity: 17.0745the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [2826/6471], Loss: 3.3016, Perplexity: 27.1567the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2827/6471], Loss: 2.9841, Perplexity: 19.7696the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [2828/6471], Loss: 3.0288, Perplexity: 20.6722the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2829/6471], Loss: 3.0138, Perplexity: 20.3639the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2830/6471], Loss: 3.0170, Perplexity: 20.4296the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3

Epoch [1/3], Step [2896/6471], Loss: 3.1418, Perplexity: 23.1451the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2897/6471], Loss: 2.7603, Perplexity: 15.8039the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2898/6471], Loss: 2.7557, Perplexity: 15.7316the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2899/6471], Loss: 2.9087, Perplexity: 18.3336the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2900/6471], Loss: 2.6954, Perplexity: 14.8119the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [2901/6471], Loss: 3.0032, Perplexity: 20.1498the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [2902/6471], Loss: 2.8040, Perplexity: 16.5110the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2903/6471], Loss: 2.9668, Perplexity: 19.4293the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2904/6471], Loss: 2.8925, Perplexity: 18.0384the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [2970/6471], Loss: 2.9652, Perplexity: 19.3993the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2971/6471], Loss: 2.9284, Perplexity: 18.6971the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2972/6471], Loss: 2.6983, Perplexity: 14.8547the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2973/6471], Loss: 2.7500, Perplexity: 15.6427the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2974/6471], Loss: 3.0117, Perplexity: 20.3211the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [2975/6471], Loss: 3.0451, Perplexity: 21.0117the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [2976/6471], Loss: 3.6636, Perplexity: 39.0012the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [2977/6471], Loss: 2.6521, Perplexity: 14.1841the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [2978/6471], Loss: 3.0452, Perplexity: 21.0139the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3

Epoch [1/3], Step [3044/6471], Loss: 2.6877, Perplexity: 14.6973the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3045/6471], Loss: 3.0240, Perplexity: 20.5739the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3046/6471], Loss: 2.7520, Perplexity: 15.6744the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3047/6471], Loss: 2.8363, Perplexity: 17.0531the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3048/6471], Loss: 2.8402, Perplexity: 17.1187the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3049/6471], Loss: 2.7497, Perplexity: 15.6386the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3050/6471], Loss: 2.9257, Perplexity: 18.6465the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3051/6471], Loss: 2.9661, Perplexity: 19.4160the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3052/6471], Loss: 2.9145, Perplexity: 18.4391the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3

Epoch [1/3], Step [3118/6471], Loss: 3.0087, Perplexity: 20.2620the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [1/3], Step [3119/6471], Loss: 3.1528, Perplexity: 23.4017the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3120/6471], Loss: 2.7883, Perplexity: 16.2528the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3121/6471], Loss: 2.7090, Perplexity: 15.0139the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3122/6471], Loss: 2.7358, Perplexity: 15.4227the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3123/6471], Loss: 3.0853, Perplexity: 21.8739the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [3124/6471], Loss: 3.7506, Perplexity: 42.5475the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3125/6471], Loss: 3.1122, Perplexity: 22.4706the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [3126/6471], Loss: 3.5943, Perplexity: 36.3906the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3

Epoch [1/3], Step [3192/6471], Loss: 2.6783, Perplexity: 14.5605the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [3193/6471], Loss: 3.4392, Perplexity: 31.1616the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3194/6471], Loss: 2.8342, Perplexity: 17.0168the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [1/3], Step [3195/6471], Loss: 3.1975, Perplexity: 24.4722the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3196/6471], Loss: 2.8915, Perplexity: 18.0212the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3197/6471], Loss: 2.8197, Perplexity: 16.7719the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3198/6471], Loss: 2.8537, Perplexity: 17.3519the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3199/6471], Loss: 2.7483, Perplexity: 15.6154the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3200/6471], Loss: 2.9292, Perplexity: 18.7134
the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/

Epoch [1/3], Step [3266/6471], Loss: 2.8777, Perplexity: 17.7732the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3267/6471], Loss: 2.8376, Perplexity: 17.0747the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3268/6471], Loss: 2.6434, Perplexity: 14.0607the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3269/6471], Loss: 2.7311, Perplexity: 15.3501the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3270/6471], Loss: 2.7989, Perplexity: 16.4268the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3271/6471], Loss: 2.5914, Perplexity: 13.3490the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3272/6471], Loss: 3.0739, Perplexity: 21.6257the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3273/6471], Loss: 2.5540, Perplexity: 12.8588the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3274/6471], Loss: 2.6466, Perplexity: 14.1063the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3

Epoch [1/3], Step [3340/6471], Loss: 3.1626, Perplexity: 23.6324the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3341/6471], Loss: 2.8461, Perplexity: 17.2201the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3342/6471], Loss: 2.7071, Perplexity: 14.9858the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3343/6471], Loss: 2.9463, Perplexity: 19.0363the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3344/6471], Loss: 2.8581, Perplexity: 17.4291the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3345/6471], Loss: 2.8387, Perplexity: 17.0937the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3346/6471], Loss: 2.8469, Perplexity: 17.2351the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3347/6471], Loss: 2.7250, Perplexity: 15.2557the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3348/6471], Loss: 2.8278, Perplexity: 16.9090the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [3414/6471], Loss: 2.6769, Perplexity: 14.5397the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3415/6471], Loss: 2.7228, Perplexity: 15.2224the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [1/3], Step [3416/6471], Loss: 2.9131, Perplexity: 18.4133the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3417/6471], Loss: 3.0315, Perplexity: 20.7278the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3418/6471], Loss: 2.7557, Perplexity: 15.7318the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3419/6471], Loss: 3.0057, Perplexity: 20.2000the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3420/6471], Loss: 2.9335, Perplexity: 18.7933the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3421/6471], Loss: 2.7337, Perplexity: 15.3899the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3422/6471], Loss: 2.8324, Perplexity: 16.9855the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3

Epoch [1/3], Step [3488/6471], Loss: 2.8058, Perplexity: 16.5396the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3489/6471], Loss: 2.6389, Perplexity: 13.9974the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3490/6471], Loss: 3.1221, Perplexity: 22.6933the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3491/6471], Loss: 2.6652, Perplexity: 14.3704the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [3492/6471], Loss: 3.5143, Perplexity: 33.5932the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3493/6471], Loss: 2.8828, Perplexity: 17.8640the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3494/6471], Loss: 2.9123, Perplexity: 18.3997the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3495/6471], Loss: 2.8544, Perplexity: 17.3634the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3496/6471], Loss: 2.8461, Perplexity: 17.2211the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [1/3

Epoch [1/3], Step [3562/6471], Loss: 2.7434, Perplexity: 15.5401the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3563/6471], Loss: 2.6615, Perplexity: 14.3171the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3564/6471], Loss: 2.9519, Perplexity: 19.1414the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3565/6471], Loss: 2.7296, Perplexity: 15.3265the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3566/6471], Loss: 2.5898, Perplexity: 13.3272the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3567/6471], Loss: 2.5212, Perplexity: 12.4434the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3568/6471], Loss: 2.8795, Perplexity: 17.8056the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3569/6471], Loss: 2.7838, Perplexity: 16.1800the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [3570/6471], Loss: 2.9986, Perplexity: 20.0568the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3

Epoch [1/3], Step [3636/6471], Loss: 2.6668, Perplexity: 14.3942the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [3637/6471], Loss: 3.4122, Perplexity: 30.3330the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3638/6471], Loss: 2.5862, Perplexity: 13.2794the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3639/6471], Loss: 2.6967, Perplexity: 14.8305the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3640/6471], Loss: 2.7958, Perplexity: 16.3760the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3641/6471], Loss: 2.6884, Perplexity: 14.7084the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3642/6471], Loss: 2.7953, Perplexity: 16.3677the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3643/6471], Loss: 2.7286, Perplexity: 15.3121the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [3644/6471], Loss: 3.7645, Perplexity: 43.1426the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [3710/6471], Loss: 2.8857, Perplexity: 17.9165the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3711/6471], Loss: 2.6753, Perplexity: 14.5166the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3712/6471], Loss: 2.9739, Perplexity: 19.5679the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [3713/6471], Loss: 3.2505, Perplexity: 25.8037the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3714/6471], Loss: 2.6827, Perplexity: 14.6251the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3715/6471], Loss: 2.6491, Perplexity: 14.1417the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3716/6471], Loss: 2.9070, Perplexity: 18.3016the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3717/6471], Loss: 2.8571, Perplexity: 17.4102the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3718/6471], Loss: 2.4982, Perplexity: 12.1606the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3

Epoch [1/3], Step [3784/6471], Loss: 2.5294, Perplexity: 12.5463the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3785/6471], Loss: 2.7832, Perplexity: 16.1710the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3786/6471], Loss: 2.5897, Perplexity: 13.3261the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3787/6471], Loss: 2.9267, Perplexity: 18.6654the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3788/6471], Loss: 2.6838, Perplexity: 14.6409the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3789/6471], Loss: 3.0139, Perplexity: 20.3660the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3790/6471], Loss: 2.5594, Perplexity: 12.9283the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3791/6471], Loss: 2.5544, Perplexity: 12.8640the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3792/6471], Loss: 2.6843, Perplexity: 14.6479the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3

Epoch [1/3], Step [3858/6471], Loss: 2.7401, Perplexity: 15.4885the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3859/6471], Loss: 2.5580, Perplexity: 12.9100the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3860/6471], Loss: 2.8107, Perplexity: 16.6217the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3861/6471], Loss: 2.8786, Perplexity: 17.7898the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [3862/6471], Loss: 3.0226, Perplexity: 20.5444the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3863/6471], Loss: 2.6629, Perplexity: 14.3381the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [3864/6471], Loss: 2.6001, Perplexity: 13.4649the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [3865/6471], Loss: 2.8168, Perplexity: 16.7240the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3866/6471], Loss: 2.6610, Perplexity: 14.3103the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3

Epoch [1/3], Step [3932/6471], Loss: 2.6477, Perplexity: 14.1213the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [3933/6471], Loss: 2.4343, Perplexity: 11.4075the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3934/6471], Loss: 2.5941, Perplexity: 13.3845the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [3935/6471], Loss: 2.5936, Perplexity: 13.3775the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [3936/6471], Loss: 2.9443, Perplexity: 18.9971the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [3937/6471], Loss: 3.0055, Perplexity: 20.1970the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [3938/6471], Loss: 2.7725, Perplexity: 15.9988the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [3939/6471], Loss: 3.1531, Perplexity: 23.4087the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [3940/6471], Loss: 2.8890, Perplexity: 17.9761the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3

Epoch [1/3], Step [4006/6471], Loss: 2.6318, Perplexity: 13.8987the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4007/6471], Loss: 2.5088, Perplexity: 12.2899the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4008/6471], Loss: 2.5547, Perplexity: 12.8672the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4009/6471], Loss: 2.4701, Perplexity: 11.8241the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4010/6471], Loss: 2.5962, Perplexity: 13.4129the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4011/6471], Loss: 2.6546, Perplexity: 14.2193the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4012/6471], Loss: 2.7216, Perplexity: 15.2040the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4013/6471], Loss: 2.6064, Perplexity: 13.5505the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4014/6471], Loss: 2.7314, Perplexity: 15.3550the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3

Epoch [1/3], Step [4080/6471], Loss: 2.7185, Perplexity: 15.1578the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [4081/6471], Loss: 3.0235, Perplexity: 20.5628the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4082/6471], Loss: 2.4291, Perplexity: 11.3488the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4083/6471], Loss: 2.5906, Perplexity: 13.3373the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4084/6471], Loss: 2.4141, Perplexity: 11.1795the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4085/6471], Loss: 2.6200, Perplexity: 13.7364the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4086/6471], Loss: 2.4479, Perplexity: 11.5639the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4087/6471], Loss: 2.6752, Perplexity: 14.5158the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4088/6471], Loss: 2.7637, Perplexity: 15.8592the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [4154/6471], Loss: 2.6833, Perplexity: 14.6328the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4155/6471], Loss: 3.1727, Perplexity: 23.8711the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4156/6471], Loss: 2.6826, Perplexity: 14.6224the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4157/6471], Loss: 2.7963, Perplexity: 16.3846the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4158/6471], Loss: 2.6179, Perplexity: 13.7067the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4159/6471], Loss: 2.5827, Perplexity: 13.2332the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4160/6471], Loss: 2.7402, Perplexity: 15.4902the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4161/6471], Loss: 2.7164, Perplexity: 15.1260the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4162/6471], Loss: 2.7904, Perplexity: 16.2873the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [4228/6471], Loss: 2.8642, Perplexity: 17.5344the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4229/6471], Loss: 2.5224, Perplexity: 12.4580the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4230/6471], Loss: 2.7206, Perplexity: 15.1893the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4231/6471], Loss: 2.7310, Perplexity: 15.3477the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4232/6471], Loss: 2.6140, Perplexity: 13.6539the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4233/6471], Loss: 2.6619, Perplexity: 14.3234the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4234/6471], Loss: 2.8111, Perplexity: 16.6288the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4235/6471], Loss: 2.3868, Perplexity: 10.8788the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4236/6471], Loss: 3.1143, Perplexity: 22.5177the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [4302/6471], Loss: 2.6832, Perplexity: 14.6318the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4303/6471], Loss: 2.6103, Perplexity: 13.6030the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4304/6471], Loss: 2.9143, Perplexity: 18.4356the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4305/6471], Loss: 2.9113, Perplexity: 18.3815the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4306/6471], Loss: 2.6827, Perplexity: 14.6250the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4307/6471], Loss: 2.4921, Perplexity: 12.0872the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4308/6471], Loss: 2.7221, Perplexity: 15.2122the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4309/6471], Loss: 2.6045, Perplexity: 13.5249the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4310/6471], Loss: 2.4903, Perplexity: 12.0650the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [4376/6471], Loss: 2.8390, Perplexity: 17.0987the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4377/6471], Loss: 2.7975, Perplexity: 16.4038the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4378/6471], Loss: 3.0722, Perplexity: 21.5898the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4379/6471], Loss: 2.5828, Perplexity: 13.2337the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4380/6471], Loss: 2.5669, Perplexity: 13.0257the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4381/6471], Loss: 2.7160, Perplexity: 15.1193the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4382/6471], Loss: 2.5841, Perplexity: 13.2509the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4383/6471], Loss: 2.6865, Perplexity: 14.6797the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4384/6471], Loss: 2.3219, Perplexity: 10.1946the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [1/3], Step [4450/6471], Loss: 2.7863, Perplexity: 16.2205the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4451/6471], Loss: 3.0000, Perplexity: 20.0859the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4452/6471], Loss: 2.8954, Perplexity: 18.0905the hiddens.shape: torch.Size([64, 28, 1048])
Epoch [1/3], Step [4453/6471], Loss: 3.9582, Perplexity: 52.3622the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4454/6471], Loss: 2.6050, Perplexity: 13.5315the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4455/6471], Loss: 2.8506, Perplexity: 17.2974the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4456/6471], Loss: 2.9033, Perplexity: 18.2349the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4457/6471], Loss: 2.9041, Perplexity: 18.2494the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4458/6471], Loss: 2.8122, Perplexity: 16.6467the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [1/3], Step [4524/6471], Loss: 2.6190, Perplexity: 13.7218the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4525/6471], Loss: 2.8487, Perplexity: 17.2651the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [4526/6471], Loss: 3.3946, Perplexity: 29.8033the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [1/3], Step [4527/6471], Loss: 2.9075, Perplexity: 18.3112the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4528/6471], Loss: 2.6996, Perplexity: 14.8742the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4529/6471], Loss: 2.5933, Perplexity: 13.3734the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4530/6471], Loss: 2.7501, Perplexity: 15.6447the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [4531/6471], Loss: 3.1425, Perplexity: 23.1613the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4532/6471], Loss: 2.8096, Perplexity: 16.6025the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [4598/6471], Loss: 2.6409, Perplexity: 14.0260the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4599/6471], Loss: 2.3062, Perplexity: 10.0365the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [1/3], Step [4600/6471], Loss: 3.2788, Perplexity: 26.5432
the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4601/6471], Loss: 2.5863, Perplexity: 13.2806the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [4602/6471], Loss: 2.7033, Perplexity: 14.9284the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [1/3], Step [4603/6471], Loss: 3.3557, Perplexity: 28.6647the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [4604/6471], Loss: 3.0125, Perplexity: 20.3378the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4605/6471], Loss: 2.6433, Perplexity: 14.0597the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4606/6471], Loss: 2.6693, Perplexity: 14.4294the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [4672/6471], Loss: 2.7937, Perplexity: 16.3411the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4673/6471], Loss: 2.9649, Perplexity: 19.3930the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4674/6471], Loss: 2.7662, Perplexity: 15.8984the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4675/6471], Loss: 2.5105, Perplexity: 12.3110the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [4676/6471], Loss: 3.2190, Perplexity: 25.0023the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4677/6471], Loss: 2.9518, Perplexity: 19.1395the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4678/6471], Loss: 2.6170, Perplexity: 13.6951the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4679/6471], Loss: 2.4651, Perplexity: 11.7649the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4680/6471], Loss: 2.5888, Perplexity: 13.3132the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [4746/6471], Loss: 2.6965, Perplexity: 14.8278the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [4747/6471], Loss: 3.2684, Perplexity: 26.2704the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4748/6471], Loss: 2.5324, Perplexity: 12.5837the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [4749/6471], Loss: 3.3070, Perplexity: 27.3041the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4750/6471], Loss: 2.8068, Perplexity: 16.5573the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [4751/6471], Loss: 3.1037, Perplexity: 22.2795the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4752/6471], Loss: 2.5427, Perplexity: 12.7142the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [4753/6471], Loss: 2.8868, Perplexity: 17.9355the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4754/6471], Loss: 2.3778, Perplexity: 10.7812the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [4820/6471], Loss: 2.6186, Perplexity: 13.7164the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4821/6471], Loss: 2.7918, Perplexity: 16.3099the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4822/6471], Loss: 2.6590, Perplexity: 14.2819the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4823/6471], Loss: 2.6658, Perplexity: 14.3797the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4824/6471], Loss: 2.6351, Perplexity: 13.9443the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4825/6471], Loss: 2.4413, Perplexity: 11.4883the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4826/6471], Loss: 2.6336, Perplexity: 13.9236the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4827/6471], Loss: 2.6611, Perplexity: 14.3114the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4828/6471], Loss: 2.5827, Perplexity: 13.2325the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [4894/6471], Loss: 2.5817, Perplexity: 13.2191the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [4895/6471], Loss: 2.9666, Perplexity: 19.4266the hiddens.shape: torch.Size([64, 28, 1048])
Epoch [1/3], Step [4896/6471], Loss: 3.8883, Perplexity: 48.8262the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4897/6471], Loss: 2.5597, Perplexity: 12.9317the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4898/6471], Loss: 2.4143, Perplexity: 11.1824the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4899/6471], Loss: 2.3171, Perplexity: 10.1463the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4900/6471], Loss: 2.4973, Perplexity: 12.1491the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4901/6471], Loss: 2.5732, Perplexity: 13.1080the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4902/6471], Loss: 2.3879, Perplexity: 10.8908the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [4968/6471], Loss: 2.4850, Perplexity: 12.0015the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [4969/6471], Loss: 3.0167, Perplexity: 20.4235the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4970/6471], Loss: 2.5239, Perplexity: 12.4774the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4971/6471], Loss: 2.4875, Perplexity: 12.0313the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [4972/6471], Loss: 2.6500, Perplexity: 14.1536the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [4973/6471], Loss: 2.3956, Perplexity: 10.9745the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [4974/6471], Loss: 2.3996, Perplexity: 11.0192the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4975/6471], Loss: 2.7947, Perplexity: 16.3572the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [4976/6471], Loss: 2.8675, Perplexity: 17.5924the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [1/3], Step [5042/6471], Loss: 2.4927, Perplexity: 12.0940the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5043/6471], Loss: 2.7085, Perplexity: 15.0067the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5044/6471], Loss: 2.5275, Perplexity: 12.5223the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5045/6471], Loss: 2.6998, Perplexity: 14.8765the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5046/6471], Loss: 2.5978, Perplexity: 13.4338the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5047/6471], Loss: 2.4808, Perplexity: 11.9510the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5048/6471], Loss: 2.5578, Perplexity: 12.9073the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5049/6471], Loss: 2.6454, Perplexity: 14.0885the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5050/6471], Loss: 2.6034, Perplexity: 13.5095the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [5116/6471], Loss: 2.7802, Perplexity: 16.1228the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5117/6471], Loss: 2.3867, Perplexity: 10.8774the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5118/6471], Loss: 2.5438, Perplexity: 12.7274the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5119/6471], Loss: 2.6847, Perplexity: 14.6537the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5120/6471], Loss: 2.8014, Perplexity: 16.4679the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5121/6471], Loss: 2.4765, Perplexity: 11.8992the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5122/6471], Loss: 2.5392, Perplexity: 12.6693the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5123/6471], Loss: 2.4591, Perplexity: 11.6938the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5124/6471], Loss: 2.4993, Perplexity: 12.1744the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [5190/6471], Loss: 2.7777, Perplexity: 16.0827the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5191/6471], Loss: 2.7855, Perplexity: 16.2081the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5192/6471], Loss: 2.4458, Perplexity: 11.5396the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5193/6471], Loss: 2.5008, Perplexity: 12.1922the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5194/6471], Loss: 2.6319, Perplexity: 13.9000the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5195/6471], Loss: 2.6890, Perplexity: 14.7173the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [5196/6471], Loss: 3.1475, Perplexity: 23.2783the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5197/6471], Loss: 2.5464, Perplexity: 12.7608the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5198/6471], Loss: 2.5911, Perplexity: 13.3438the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [5264/6471], Loss: 2.7497, Perplexity: 15.6374the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [5265/6471], Loss: 3.3279, Perplexity: 27.8791the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5266/6471], Loss: 2.5503, Perplexity: 12.8109the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5267/6471], Loss: 2.7081, Perplexity: 15.0002the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [5268/6471], Loss: 3.3633, Perplexity: 28.8832the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5269/6471], Loss: 2.4801, Perplexity: 11.9422the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5270/6471], Loss: 2.5199, Perplexity: 12.4277the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5271/6471], Loss: 2.5431, Perplexity: 12.7191the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5272/6471], Loss: 2.5218, Perplexity: 12.4512the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [5338/6471], Loss: 2.6467, Perplexity: 14.1079the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5339/6471], Loss: 2.5405, Perplexity: 12.6860the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5340/6471], Loss: 2.3365, Perplexity: 10.3448the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5341/6471], Loss: 2.8842, Perplexity: 17.8885the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5342/6471], Loss: 2.7241, Perplexity: 15.2424the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5343/6471], Loss: 2.5278, Perplexity: 12.5254the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5344/6471], Loss: 3.0948, Perplexity: 22.0823the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5345/6471], Loss: 2.3846, Perplexity: 10.8548the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5346/6471], Loss: 2.4758, Perplexity: 11.8917the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [5412/6471], Loss: 2.8316, Perplexity: 16.9731the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5413/6471], Loss: 2.4660, Perplexity: 11.7757the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [5414/6471], Loss: 2.6974, Perplexity: 14.8417the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5415/6471], Loss: 2.5408, Perplexity: 12.6893the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5416/6471], Loss: 2.5717, Perplexity: 13.0880the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5417/6471], Loss: 2.3916, Perplexity: 10.9313the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5418/6471], Loss: 2.5176, Perplexity: 12.3985the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5419/6471], Loss: 2.7192, Perplexity: 15.1676the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5420/6471], Loss: 2.6297, Perplexity: 13.8702the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [5486/6471], Loss: 2.8183, Perplexity: 16.7480the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5487/6471], Loss: 2.6214, Perplexity: 13.7549the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5488/6471], Loss: 2.6174, Perplexity: 13.6996the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5489/6471], Loss: 2.4955, Perplexity: 12.1281the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5490/6471], Loss: 2.5333, Perplexity: 12.5946the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [5491/6471], Loss: 2.9091, Perplexity: 18.3407the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5492/6471], Loss: 2.6027, Perplexity: 13.5002the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5493/6471], Loss: 2.5738, Perplexity: 13.1151the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5494/6471], Loss: 2.7395, Perplexity: 15.4799the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [5560/6471], Loss: 2.4428, Perplexity: 11.5053the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5561/6471], Loss: 2.5104, Perplexity: 12.3104the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5562/6471], Loss: 2.6465, Perplexity: 14.1044the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5563/6471], Loss: 2.4322, Perplexity: 11.3845the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [5564/6471], Loss: 2.9816, Perplexity: 19.7203the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5565/6471], Loss: 2.4273, Perplexity: 11.3284the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5566/6471], Loss: 2.5335, Perplexity: 12.5978the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5567/6471], Loss: 2.5263, Perplexity: 12.5068the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5568/6471], Loss: 2.6611, Perplexity: 14.3125the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [1/3], Step [5634/6471], Loss: 2.6272, Perplexity: 13.8343the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5635/6471], Loss: 2.4121, Perplexity: 11.1576the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5636/6471], Loss: 2.3668, Perplexity: 10.6637the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5637/6471], Loss: 2.5414, Perplexity: 12.6977the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5638/6471], Loss: 2.6160, Perplexity: 13.6815the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5639/6471], Loss: 3.1315, Perplexity: 22.9072the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5640/6471], Loss: 2.6151, Perplexity: 13.6691the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5641/6471], Loss: 2.3537, Perplexity: 10.5244the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [5642/6471], Loss: 2.8182, Perplexity: 16.7471the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [1/3], Step [5708/6471], Loss: 2.2786, Perplexity: 9.7629the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5709/6471], Loss: 2.8537, Perplexity: 17.3521the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5710/6471], Loss: 2.5070, Perplexity: 12.2681the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [5711/6471], Loss: 2.6685, Perplexity: 14.4179the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5712/6471], Loss: 2.6473, Perplexity: 14.1154the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5713/6471], Loss: 2.4240, Perplexity: 11.2913the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5714/6471], Loss: 2.4411, Perplexity: 11.4852the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [5715/6471], Loss: 2.5547, Perplexity: 12.8677the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5716/6471], Loss: 2.6672, Perplexity: 14.4000the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [1/3], Step [5782/6471], Loss: 2.4850, Perplexity: 12.0015the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5783/6471], Loss: 2.5110, Perplexity: 12.3175the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5784/6471], Loss: 2.7194, Perplexity: 15.1720the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5785/6471], Loss: 2.7351, Perplexity: 15.4106the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [5786/6471], Loss: 3.0071, Perplexity: 20.2294the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5787/6471], Loss: 2.4580, Perplexity: 11.6819the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5788/6471], Loss: 2.5432, Perplexity: 12.7198the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [5789/6471], Loss: 2.5452, Perplexity: 12.7461the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5790/6471], Loss: 2.4336, Perplexity: 11.3998the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [5856/6471], Loss: 2.5145, Perplexity: 12.3609the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5857/6471], Loss: 2.3077, Perplexity: 10.0511the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5858/6471], Loss: 2.5975, Perplexity: 13.4305the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5859/6471], Loss: 2.4453, Perplexity: 11.5341the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5860/6471], Loss: 2.6041, Perplexity: 13.5194the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5861/6471], Loss: 2.4010, Perplexity: 11.0342the hiddens.shape: torch.Size([64, 26, 1048])
Epoch [1/3], Step [5862/6471], Loss: 3.6200, Perplexity: 37.3394the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [5863/6471], Loss: 2.5830, Perplexity: 13.2362the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5864/6471], Loss: 2.4079, Perplexity: 11.1109the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [1/3], Step [5930/6471], Loss: 2.5802, Perplexity: 13.2000the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [5931/6471], Loss: 2.5591, Perplexity: 12.9238the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5932/6471], Loss: 2.6515, Perplexity: 14.1758the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5933/6471], Loss: 2.6455, Perplexity: 14.0907the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [5934/6471], Loss: 2.4934, Perplexity: 12.1027the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [5935/6471], Loss: 3.0271, Perplexity: 20.6380the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5936/6471], Loss: 2.3352, Perplexity: 10.3314the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [5937/6471], Loss: 2.4978, Perplexity: 12.1562the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [1/3], Step [5938/6471], Loss: 3.2923, Perplexity: 26.9049the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [6004/6471], Loss: 2.3663, Perplexity: 10.6584the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6005/6471], Loss: 2.4847, Perplexity: 11.9979the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [6006/6471], Loss: 3.2353, Perplexity: 25.4140the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6007/6471], Loss: 2.4542, Perplexity: 11.6376the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6008/6471], Loss: 2.5545, Perplexity: 12.8652the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6009/6471], Loss: 2.6319, Perplexity: 13.8998the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6010/6471], Loss: 2.4479, Perplexity: 11.5636the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6011/6471], Loss: 2.4468, Perplexity: 11.5514the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6012/6471], Loss: 2.7013, Perplexity: 14.8988the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [6078/6471], Loss: 2.8672, Perplexity: 17.5881the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [6079/6471], Loss: 2.6986, Perplexity: 14.8583the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6080/6471], Loss: 2.5977, Perplexity: 13.4323the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [6081/6471], Loss: 2.8872, Perplexity: 17.9438the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6082/6471], Loss: 2.4276, Perplexity: 11.3313the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6083/6471], Loss: 2.4594, Perplexity: 11.6973the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6084/6471], Loss: 2.4165, Perplexity: 11.2063the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6085/6471], Loss: 2.5226, Perplexity: 12.4611the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [1/3], Step [6086/6471], Loss: 3.1558, Perplexity: 23.4712the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [1/3], Step [6152/6471], Loss: 2.3113, Perplexity: 10.0873the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6153/6471], Loss: 2.5415, Perplexity: 12.6984the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6154/6471], Loss: 2.8628, Perplexity: 17.5101the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6155/6471], Loss: 2.4770, Perplexity: 11.9057the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6156/6471], Loss: 2.6486, Perplexity: 14.1348the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [6157/6471], Loss: 2.4246, Perplexity: 11.2979the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6158/6471], Loss: 2.7695, Perplexity: 15.9505the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6159/6471], Loss: 2.6486, Perplexity: 14.1343the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [1/3], Step [6160/6471], Loss: 2.8496, Perplexity: 17.2808the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [6226/6471], Loss: 2.2471, Perplexity: 9.4604the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6227/6471], Loss: 2.4961, Perplexity: 12.1356the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6228/6471], Loss: 2.5972, Perplexity: 13.4268the hiddens.shape: torch.Size([64, 30, 1048])
Epoch [1/3], Step [6229/6471], Loss: 3.9428, Perplexity: 51.5604the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [6230/6471], Loss: 2.5423, Perplexity: 12.7094the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6231/6471], Loss: 2.6725, Perplexity: 14.4767the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6232/6471], Loss: 2.4545, Perplexity: 11.6408the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6233/6471], Loss: 2.2825, Perplexity: 9.8011the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6234/6471], Loss: 2.4654, Perplexity: 11.7677the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [1/3], Step [6300/6471], Loss: 2.4154, Perplexity: 11.1944the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [6301/6471], Loss: 2.5808, Perplexity: 13.2078the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [1/3], Step [6302/6471], Loss: 2.9457, Perplexity: 19.0238the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6303/6471], Loss: 2.5503, Perplexity: 12.8112the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [1/3], Step [6304/6471], Loss: 2.9142, Perplexity: 18.4344the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6305/6471], Loss: 2.3742, Perplexity: 10.7429the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [1/3], Step [6306/6471], Loss: 3.4631, Perplexity: 31.9171the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [1/3], Step [6307/6471], Loss: 2.5943, Perplexity: 13.3875the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6308/6471], Loss: 2.5033, Perplexity: 12.2233the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [1/3], Step [6374/6471], Loss: 2.4939, Perplexity: 12.1084the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6375/6471], Loss: 2.5124, Perplexity: 12.3348the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6376/6471], Loss: 2.3762, Perplexity: 10.7639the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6377/6471], Loss: 2.4466, Perplexity: 11.5495the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6378/6471], Loss: 2.5542, Perplexity: 12.8605the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [1/3], Step [6379/6471], Loss: 2.2936, Perplexity: 9.9101the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6380/6471], Loss: 2.3963, Perplexity: 10.9821the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [1/3], Step [6381/6471], Loss: 2.9309, Perplexity: 18.7449the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [1/3], Step [6382/6471], Loss: 3.0992, Perplexity: 22.1808the hiddens.shape: torch.Size([64, 19, 1048])

Epoch [1/3], Step [6448/6471], Loss: 2.2889, Perplexity: 9.8643the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6449/6471], Loss: 2.3753, Perplexity: 10.7546the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [1/3], Step [6450/6471], Loss: 3.5760, Perplexity: 35.7318the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [1/3], Step [6451/6471], Loss: 2.6850, Perplexity: 14.6575the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [1/3], Step [6452/6471], Loss: 2.8158, Perplexity: 16.7073the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [1/3], Step [6453/6471], Loss: 2.4802, Perplexity: 11.9440the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6454/6471], Loss: 2.3530, Perplexity: 10.5174the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6455/6471], Loss: 2.4473, Perplexity: 11.5565the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [1/3], Step [6456/6471], Loss: 2.4071, Perplexity: 11.1018the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [52/6471], Loss: 2.5829, Perplexity: 13.2353the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [53/6471], Loss: 2.4465, Perplexity: 11.5482the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [54/6471], Loss: 2.3896, Perplexity: 10.9089the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [55/6471], Loss: 2.4466, Perplexity: 11.5486the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [56/6471], Loss: 2.4015, Perplexity: 11.0395the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [57/6471], Loss: 2.5589, Perplexity: 12.9216the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [58/6471], Loss: 2.4451, Perplexity: 11.5322the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [59/6471], Loss: 2.6257, Perplexity: 13.8137the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [60/6471], Loss: 2.4689, Perplexity: 11.8094the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [127/6471], Loss: 2.3545, Perplexity: 10.5327the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [128/6471], Loss: 2.5494, Perplexity: 12.7999the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [129/6471], Loss: 2.2921, Perplexity: 9.8959the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [130/6471], Loss: 2.6720, Perplexity: 14.4683the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [131/6471], Loss: 2.3457, Perplexity: 10.4403the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [132/6471], Loss: 2.4331, Perplexity: 11.3941the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [133/6471], Loss: 2.4057, Perplexity: 11.0863the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [134/6471], Loss: 2.3239, Perplexity: 10.2153the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [135/6471], Loss: 2.2910, Perplexity: 9.8846the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [202/6471], Loss: 2.6366, Perplexity: 13.9650the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [203/6471], Loss: 2.4996, Perplexity: 12.1775the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [204/6471], Loss: 2.4230, Perplexity: 11.2799the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [205/6471], Loss: 2.6090, Perplexity: 13.5849the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [206/6471], Loss: 2.4677, Perplexity: 11.7948the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [207/6471], Loss: 2.4004, Perplexity: 11.0275the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [208/6471], Loss: 2.4667, Perplexity: 11.7832the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [209/6471], Loss: 2.8655, Perplexity: 17.5579the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [210/6471], Loss: 2.4508, Perplexity: 11.5972the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [277/6471], Loss: 2.2884, Perplexity: 9.8593the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [278/6471], Loss: 2.3406, Perplexity: 10.3870the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [279/6471], Loss: 2.4107, Perplexity: 11.1420the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [280/6471], Loss: 2.4801, Perplexity: 11.9424the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [281/6471], Loss: 2.8290, Perplexity: 16.9290the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [282/6471], Loss: 2.4875, Perplexity: 12.0310the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [283/6471], Loss: 2.3327, Perplexity: 10.3055the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [284/6471], Loss: 2.2959, Perplexity: 9.9337the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [285/6471], Loss: 2.7737, Perplexity: 16.0183the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [352/6471], Loss: 2.2751, Perplexity: 9.7292the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [353/6471], Loss: 2.6573, Perplexity: 14.2583the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [354/6471], Loss: 2.2484, Perplexity: 9.4726the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [355/6471], Loss: 2.3725, Perplexity: 10.7244the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [356/6471], Loss: 2.3515, Perplexity: 10.5010the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [357/6471], Loss: 2.5729, Perplexity: 13.1038the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [358/6471], Loss: 2.4033, Perplexity: 11.0594the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [359/6471], Loss: 2.4847, Perplexity: 11.9973the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [360/6471], Loss: 2.5138, Perplexity: 12.3513the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [427/6471], Loss: 2.6119, Perplexity: 13.6245the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [428/6471], Loss: 2.3348, Perplexity: 10.3271the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [429/6471], Loss: 2.2680, Perplexity: 9.6603the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [430/6471], Loss: 2.4073, Perplexity: 11.1044the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [431/6471], Loss: 2.4800, Perplexity: 11.9414the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [432/6471], Loss: 2.4206, Perplexity: 11.2531the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [433/6471], Loss: 2.7742, Perplexity: 16.0259the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [434/6471], Loss: 2.8898, Perplexity: 17.9892the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [435/6471], Loss: 2.9587, Perplexity: 19.2735the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [502/6471], Loss: 2.7857, Perplexity: 16.2113the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [503/6471], Loss: 2.3234, Perplexity: 10.2100the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [504/6471], Loss: 2.8628, Perplexity: 17.5106the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [505/6471], Loss: 2.3899, Perplexity: 10.9121the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [506/6471], Loss: 2.6927, Perplexity: 14.7708the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [507/6471], Loss: 2.4514, Perplexity: 11.6049the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [508/6471], Loss: 2.4566, Perplexity: 11.6647the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [509/6471], Loss: 2.5276, Perplexity: 12.5239the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [510/6471], Loss: 2.8216, Perplexity: 16.8038the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [2/3], Step [577/6471], Loss: 2.4342, Perplexity: 11.4066the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [578/6471], Loss: 3.2411, Perplexity: 25.5618the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [579/6471], Loss: 2.3121, Perplexity: 10.0960the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [580/6471], Loss: 2.7539, Perplexity: 15.7036the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [581/6471], Loss: 2.5400, Perplexity: 12.6795the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [582/6471], Loss: 2.5461, Perplexity: 12.7576the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [583/6471], Loss: 2.3255, Perplexity: 10.2322the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [584/6471], Loss: 2.5973, Perplexity: 13.4279the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [585/6471], Loss: 2.5659, Perplexity: 13.0125the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [652/6471], Loss: 2.4903, Perplexity: 12.0646the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [653/6471], Loss: 3.2188, Perplexity: 24.9984the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [654/6471], Loss: 2.3926, Perplexity: 10.9424the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [655/6471], Loss: 2.4159, Perplexity: 11.1997the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [656/6471], Loss: 2.5149, Perplexity: 12.3658the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [657/6471], Loss: 2.5930, Perplexity: 13.3695the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [658/6471], Loss: 2.4188, Perplexity: 11.2324the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [2/3], Step [659/6471], Loss: 3.4650, Perplexity: 31.9762the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [660/6471], Loss: 2.2569, Perplexity: 9.5533the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [2/3], Step [727/6471], Loss: 3.3195, Perplexity: 27.6472the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [728/6471], Loss: 2.4556, Perplexity: 11.6532the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [729/6471], Loss: 2.2037, Perplexity: 9.0588the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [730/6471], Loss: 2.5231, Perplexity: 12.4678the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [731/6471], Loss: 2.5617, Perplexity: 12.9578the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [732/6471], Loss: 2.4380, Perplexity: 11.4501the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [733/6471], Loss: 2.3900, Perplexity: 10.9137the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [734/6471], Loss: 2.3609, Perplexity: 10.6007the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [735/6471], Loss: 2.4145, Perplexity: 11.1845the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [802/6471], Loss: 2.4685, Perplexity: 11.8052the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [803/6471], Loss: 2.2195, Perplexity: 9.2026the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [2/3], Step [804/6471], Loss: 2.7027, Perplexity: 14.9204the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [805/6471], Loss: 2.3878, Perplexity: 10.8892the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [806/6471], Loss: 2.7988, Perplexity: 16.4253the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [807/6471], Loss: 2.3244, Perplexity: 10.2208the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [808/6471], Loss: 2.4547, Perplexity: 11.6428the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [809/6471], Loss: 2.4750, Perplexity: 11.8821the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [810/6471], Loss: 2.2823, Perplexity: 9.7988the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [877/6471], Loss: 2.3485, Perplexity: 10.4703the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [878/6471], Loss: 2.3730, Perplexity: 10.7291the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [879/6471], Loss: 2.3652, Perplexity: 10.6465the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [880/6471], Loss: 2.3481, Perplexity: 10.4655the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [881/6471], Loss: 2.3364, Perplexity: 10.3440the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [882/6471], Loss: 2.3616, Perplexity: 10.6083the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [883/6471], Loss: 2.6680, Perplexity: 14.4111the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [884/6471], Loss: 2.9424, Perplexity: 18.9609the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [885/6471], Loss: 2.4985, Perplexity: 12.1648the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [952/6471], Loss: 2.1998, Perplexity: 9.0229the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [953/6471], Loss: 2.5549, Perplexity: 12.8697the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [954/6471], Loss: 2.5900, Perplexity: 13.3300the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [955/6471], Loss: 2.4321, Perplexity: 11.3831the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [956/6471], Loss: 2.4938, Perplexity: 12.1071the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [957/6471], Loss: 2.3988, Perplexity: 11.0104the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [958/6471], Loss: 2.4481, Perplexity: 11.5660the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [959/6471], Loss: 2.4760, Perplexity: 11.8931the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [960/6471], Loss: 2.2813, Perplexity: 9.7899the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [1027/6471], Loss: 2.4808, Perplexity: 11.9507the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1028/6471], Loss: 2.3984, Perplexity: 11.0060the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1029/6471], Loss: 2.3204, Perplexity: 10.1797the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1030/6471], Loss: 2.2438, Perplexity: 9.4289the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1031/6471], Loss: 2.3744, Perplexity: 10.7449the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [1032/6471], Loss: 3.3483, Perplexity: 28.4554the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1033/6471], Loss: 2.3844, Perplexity: 10.8528the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [1034/6471], Loss: 2.7723, Perplexity: 15.9950the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1035/6471], Loss: 2.5861, Perplexity: 13.2773the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [1101/6471], Loss: 2.4170, Perplexity: 11.2123the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1102/6471], Loss: 2.6524, Perplexity: 14.1885the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1103/6471], Loss: 2.3520, Perplexity: 10.5069the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1104/6471], Loss: 2.4180, Perplexity: 11.2237the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1105/6471], Loss: 2.3373, Perplexity: 10.3532the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1106/6471], Loss: 2.2945, Perplexity: 9.9191the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [1107/6471], Loss: 2.8226, Perplexity: 16.8208the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1108/6471], Loss: 2.3574, Perplexity: 10.5630the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1109/6471], Loss: 2.4322, Perplexity: 11.3840the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [1175/6471], Loss: 2.4850, Perplexity: 12.0007the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1176/6471], Loss: 2.3753, Perplexity: 10.7537the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1177/6471], Loss: 2.1759, Perplexity: 8.8100the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1178/6471], Loss: 2.4934, Perplexity: 12.1021the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1179/6471], Loss: 2.4533, Perplexity: 11.6272the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [1180/6471], Loss: 2.7058, Perplexity: 14.9661the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1181/6471], Loss: 2.4718, Perplexity: 11.8437the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1182/6471], Loss: 2.3597, Perplexity: 10.5878the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1183/6471], Loss: 2.5266, Perplexity: 12.5107the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [1249/6471], Loss: 2.7854, Perplexity: 16.2070the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1250/6471], Loss: 2.4077, Perplexity: 11.1085the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1251/6471], Loss: 2.4098, Perplexity: 11.1314the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1252/6471], Loss: 2.3515, Perplexity: 10.5015the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1253/6471], Loss: 2.6916, Perplexity: 14.7547the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1254/6471], Loss: 2.6813, Perplexity: 14.6042the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1255/6471], Loss: 2.3102, Perplexity: 10.0768the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1256/6471], Loss: 2.3530, Perplexity: 10.5173the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1257/6471], Loss: 2.3413, Perplexity: 10.3943the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [1323/6471], Loss: 2.1788, Perplexity: 8.8358the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1324/6471], Loss: 2.4475, Perplexity: 11.5590the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1325/6471], Loss: 2.2974, Perplexity: 9.9483the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1326/6471], Loss: 2.4303, Perplexity: 11.3627the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [1327/6471], Loss: 2.9878, Perplexity: 19.8413the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1328/6471], Loss: 2.4335, Perplexity: 11.3990the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1329/6471], Loss: 2.2892, Perplexity: 9.8672the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1330/6471], Loss: 2.2990, Perplexity: 9.9639the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1331/6471], Loss: 2.3778, Perplexity: 10.7807the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [1397/6471], Loss: 2.2867, Perplexity: 9.8426the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1398/6471], Loss: 2.2615, Perplexity: 9.5973the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1399/6471], Loss: 2.1842, Perplexity: 8.8839the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [1400/6471], Loss: 2.6827, Perplexity: 14.6241
the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1401/6471], Loss: 2.2251, Perplexity: 9.2541the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1402/6471], Loss: 2.2349, Perplexity: 9.3453the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1403/6471], Loss: 2.5062, Perplexity: 12.2577the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1404/6471], Loss: 2.4508, Perplexity: 11.5975the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1405/6471], Loss: 2.1377, Perplexity: 8.4798the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [1471/6471], Loss: 2.3389, Perplexity: 10.3698the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1472/6471], Loss: 2.2487, Perplexity: 9.4753the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1473/6471], Loss: 2.4247, Perplexity: 11.2993the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [1474/6471], Loss: 2.9633, Perplexity: 19.3610the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1475/6471], Loss: 2.3077, Perplexity: 10.0511the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1476/6471], Loss: 2.4322, Perplexity: 11.3839the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1477/6471], Loss: 2.2904, Perplexity: 9.8793the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1478/6471], Loss: 2.1503, Perplexity: 8.5870the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1479/6471], Loss: 2.2989, Perplexity: 9.9630the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [1545/6471], Loss: 2.8233, Perplexity: 16.8315the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1546/6471], Loss: 2.4884, Perplexity: 12.0419the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1547/6471], Loss: 2.4110, Perplexity: 11.1447the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1548/6471], Loss: 2.1623, Perplexity: 8.6914the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1549/6471], Loss: 2.4687, Perplexity: 11.8068the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1550/6471], Loss: 2.3775, Perplexity: 10.7774the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1551/6471], Loss: 2.5490, Perplexity: 12.7941the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1552/6471], Loss: 2.4841, Perplexity: 11.9905the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1553/6471], Loss: 2.1715, Perplexity: 8.7717the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [1619/6471], Loss: 2.4113, Perplexity: 11.1490the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1620/6471], Loss: 2.7051, Perplexity: 14.9565the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1621/6471], Loss: 2.5508, Perplexity: 12.8175the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1622/6471], Loss: 2.2215, Perplexity: 9.2214the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1623/6471], Loss: 2.3392, Perplexity: 10.3727the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1624/6471], Loss: 2.7269, Perplexity: 15.2859the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1625/6471], Loss: 2.6515, Perplexity: 14.1756the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [1626/6471], Loss: 2.2702, Perplexity: 9.6814the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1627/6471], Loss: 2.4035, Perplexity: 11.0616the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [1693/6471], Loss: 2.2324, Perplexity: 9.3224the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [2/3], Step [1694/6471], Loss: 2.6302, Perplexity: 13.8763the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1695/6471], Loss: 2.5644, Perplexity: 12.9925the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1696/6471], Loss: 2.2367, Perplexity: 9.3627the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1697/6471], Loss: 2.4097, Perplexity: 11.1302the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1698/6471], Loss: 2.3596, Perplexity: 10.5865the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1699/6471], Loss: 2.1817, Perplexity: 8.8614the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1700/6471], Loss: 2.5536, Perplexity: 12.8529the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1701/6471], Loss: 2.3240, Perplexity: 10.2161the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [1767/6471], Loss: 2.3021, Perplexity: 9.9948the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1768/6471], Loss: 2.5860, Perplexity: 13.2762the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1769/6471], Loss: 2.5810, Perplexity: 13.2105the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1770/6471], Loss: 2.3686, Perplexity: 10.6821the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [1771/6471], Loss: 2.8570, Perplexity: 17.4096the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1772/6471], Loss: 2.4389, Perplexity: 11.4602the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [1773/6471], Loss: 2.6383, Perplexity: 13.9899the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1774/6471], Loss: 2.2811, Perplexity: 9.7872the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [1775/6471], Loss: 3.1864, Perplexity: 24.2020the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [1841/6471], Loss: 2.3968, Perplexity: 10.9881the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [1842/6471], Loss: 3.1898, Perplexity: 24.2833the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [1843/6471], Loss: 2.8574, Perplexity: 17.4154the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1844/6471], Loss: 2.1350, Perplexity: 8.4571the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [1845/6471], Loss: 2.6888, Perplexity: 14.7133the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1846/6471], Loss: 2.3871, Perplexity: 10.8821the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1847/6471], Loss: 2.5092, Perplexity: 12.2957the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1848/6471], Loss: 2.0743, Perplexity: 7.9593the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1849/6471], Loss: 2.4692, Perplexity: 11.8130the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [2/3], Step [1915/6471], Loss: 2.3515, Perplexity: 10.5013the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1916/6471], Loss: 2.2576, Perplexity: 9.5597the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1917/6471], Loss: 2.2165, Perplexity: 9.1748the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1918/6471], Loss: 2.3098, Perplexity: 10.0728the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1919/6471], Loss: 2.3418, Perplexity: 10.3996the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1920/6471], Loss: 2.1479, Perplexity: 8.5669the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [1921/6471], Loss: 2.6506, Perplexity: 14.1621the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1922/6471], Loss: 2.2652, Perplexity: 9.6330the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [1923/6471], Loss: 2.2843, Perplexity: 9.8188the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [1989/6471], Loss: 2.2835, Perplexity: 9.8110the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1990/6471], Loss: 2.2500, Perplexity: 9.4878the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1991/6471], Loss: 2.4139, Perplexity: 11.1777the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [2/3], Step [1992/6471], Loss: 3.1125, Perplexity: 22.4762the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [1993/6471], Loss: 2.3422, Perplexity: 10.4036the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1994/6471], Loss: 2.2633, Perplexity: 9.6150the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [1995/6471], Loss: 2.4956, Perplexity: 12.1288the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [1996/6471], Loss: 2.4665, Perplexity: 11.7810the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [1997/6471], Loss: 3.0915, Perplexity: 22.0099the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [2063/6471], Loss: 2.5595, Perplexity: 12.9290the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [2064/6471], Loss: 2.8073, Perplexity: 16.5645the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [2/3], Step [2065/6471], Loss: 2.6076, Perplexity: 13.5660the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2066/6471], Loss: 2.5365, Perplexity: 12.6347the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2067/6471], Loss: 2.2985, Perplexity: 9.9595the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [2068/6471], Loss: 2.6847, Perplexity: 14.6544the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2069/6471], Loss: 2.4703, Perplexity: 11.8257the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2070/6471], Loss: 2.5991, Perplexity: 13.4519the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2071/6471], Loss: 2.2338, Perplexity: 9.3350the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [2137/6471], Loss: 3.1434, Perplexity: 23.1825the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2138/6471], Loss: 2.5205, Perplexity: 12.4343the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2139/6471], Loss: 2.2087, Perplexity: 9.1041the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2140/6471], Loss: 2.1641, Perplexity: 8.7067the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2141/6471], Loss: 2.3970, Perplexity: 10.9903the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2142/6471], Loss: 2.3003, Perplexity: 9.9774the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2143/6471], Loss: 2.1780, Perplexity: 8.8286the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2144/6471], Loss: 2.5663, Perplexity: 13.0174the hiddens.shape: torch.Size([64, 31, 1048])
Epoch [2/3], Step [2145/6471], Loss: 4.0023, Perplexity: 54.7263the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [2211/6471], Loss: 2.5364, Perplexity: 12.6336the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2212/6471], Loss: 2.3371, Perplexity: 10.3509the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2213/6471], Loss: 2.2905, Perplexity: 9.8800the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2214/6471], Loss: 2.3302, Perplexity: 10.2804the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2215/6471], Loss: 2.4695, Perplexity: 11.8160the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2216/6471], Loss: 2.2384, Perplexity: 9.3788the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2217/6471], Loss: 2.2559, Perplexity: 9.5442the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2218/6471], Loss: 2.4242, Perplexity: 11.2934the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2219/6471], Loss: 2.4039, Perplexity: 11.0667the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [2285/6471], Loss: 2.2311, Perplexity: 9.3105the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2286/6471], Loss: 2.3364, Perplexity: 10.3435the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2287/6471], Loss: 2.6448, Perplexity: 14.0811the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2288/6471], Loss: 2.5697, Perplexity: 13.0621the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2289/6471], Loss: 2.4957, Perplexity: 12.1304the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2290/6471], Loss: 2.2570, Perplexity: 9.5543the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [2291/6471], Loss: 2.7964, Perplexity: 16.3853the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2292/6471], Loss: 2.4954, Perplexity: 12.1265the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2293/6471], Loss: 2.3441, Perplexity: 10.4235the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [2359/6471], Loss: 2.9812, Perplexity: 19.7118the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2360/6471], Loss: 2.3143, Perplexity: 10.1178the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2361/6471], Loss: 2.4508, Perplexity: 11.5974the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2362/6471], Loss: 2.3413, Perplexity: 10.3952the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2363/6471], Loss: 2.2608, Perplexity: 9.5905the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2364/6471], Loss: 2.1668, Perplexity: 8.7300the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2365/6471], Loss: 2.4307, Perplexity: 11.3670the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2366/6471], Loss: 2.5079, Perplexity: 12.2791the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2367/6471], Loss: 2.4560, Perplexity: 11.6584the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [2433/6471], Loss: 2.4422, Perplexity: 11.4989the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [2/3], Step [2434/6471], Loss: 3.2652, Perplexity: 26.1851the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2435/6471], Loss: 2.2769, Perplexity: 9.7465the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2436/6471], Loss: 2.5162, Perplexity: 12.3819the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2437/6471], Loss: 2.3394, Perplexity: 10.3755the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2438/6471], Loss: 2.3463, Perplexity: 10.4470the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [2439/6471], Loss: 2.6965, Perplexity: 14.8278the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2440/6471], Loss: 2.3848, Perplexity: 10.8572the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2441/6471], Loss: 2.4173, Perplexity: 11.2153the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [2507/6471], Loss: 2.5181, Perplexity: 12.4045the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2508/6471], Loss: 2.1980, Perplexity: 9.0067the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2509/6471], Loss: 2.3654, Perplexity: 10.6480the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2510/6471], Loss: 2.6790, Perplexity: 14.5709the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [2511/6471], Loss: 3.0319, Perplexity: 20.7372the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [2512/6471], Loss: 2.5554, Perplexity: 12.8761the hiddens.shape: torch.Size([64, 24, 1048])
Epoch [2/3], Step [2513/6471], Loss: 3.3278, Perplexity: 27.8760the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2514/6471], Loss: 2.3517, Perplexity: 10.5032the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2515/6471], Loss: 2.3890, Perplexity: 10.9023the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [2581/6471], Loss: 2.3442, Perplexity: 10.4251the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [2582/6471], Loss: 2.6403, Perplexity: 14.0175the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2583/6471], Loss: 2.2140, Perplexity: 9.1520the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2584/6471], Loss: 2.4912, Perplexity: 12.0759the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2585/6471], Loss: 2.0678, Perplexity: 7.9072the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2586/6471], Loss: 2.4715, Perplexity: 11.8402the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2587/6471], Loss: 2.3155, Perplexity: 10.1296the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [2588/6471], Loss: 2.7122, Perplexity: 15.0619the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2589/6471], Loss: 2.4661, Perplexity: 11.7769the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [2655/6471], Loss: 2.4515, Perplexity: 11.6054the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2656/6471], Loss: 2.4445, Perplexity: 11.5242the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2657/6471], Loss: 2.4688, Perplexity: 11.8079the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2658/6471], Loss: 2.6093, Perplexity: 13.5897the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2659/6471], Loss: 2.2412, Perplexity: 9.4050the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2660/6471], Loss: 2.3897, Perplexity: 10.9100the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2661/6471], Loss: 2.3960, Perplexity: 10.9787the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2662/6471], Loss: 2.7597, Perplexity: 15.7955the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2663/6471], Loss: 2.2490, Perplexity: 9.4780the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [2729/6471], Loss: 2.4052, Perplexity: 11.0807the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [2730/6471], Loss: 3.1786, Perplexity: 24.0127the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2731/6471], Loss: 2.3475, Perplexity: 10.4592the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2732/6471], Loss: 2.3973, Perplexity: 10.9930the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [2/3], Step [2733/6471], Loss: 2.8719, Perplexity: 17.6699the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2734/6471], Loss: 2.3769, Perplexity: 10.7712the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2735/6471], Loss: 2.4029, Perplexity: 11.0548the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2736/6471], Loss: 2.4813, Perplexity: 11.9569the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2737/6471], Loss: 2.2262, Perplexity: 9.2646the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [2803/6471], Loss: 2.4411, Perplexity: 11.4856the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2804/6471], Loss: 2.1716, Perplexity: 8.7723the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2805/6471], Loss: 2.5329, Perplexity: 12.5902the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [2806/6471], Loss: 2.4269, Perplexity: 11.3238the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2807/6471], Loss: 2.2880, Perplexity: 9.8556the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2808/6471], Loss: 2.2756, Perplexity: 9.7340the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2809/6471], Loss: 2.4695, Perplexity: 11.8170the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [2810/6471], Loss: 2.6108, Perplexity: 13.6103the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2811/6471], Loss: 2.3911, Perplexity: 10.9250the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [2877/6471], Loss: 2.4551, Perplexity: 11.6472the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2878/6471], Loss: 2.4644, Perplexity: 11.7569the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2879/6471], Loss: 2.4171, Perplexity: 11.2137the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2880/6471], Loss: 2.3547, Perplexity: 10.5349the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2881/6471], Loss: 2.2629, Perplexity: 9.6105the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2882/6471], Loss: 2.0550, Perplexity: 7.8068the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [2883/6471], Loss: 3.1291, Perplexity: 22.8526the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2884/6471], Loss: 2.4462, Perplexity: 11.5443the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [2885/6471], Loss: 2.4338, Perplexity: 11.4017the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [2951/6471], Loss: 2.3472, Perplexity: 10.4567the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2952/6471], Loss: 2.1869, Perplexity: 8.9077the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2953/6471], Loss: 2.3416, Perplexity: 10.3979the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2954/6471], Loss: 2.3459, Perplexity: 10.4432the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [2955/6471], Loss: 2.1669, Perplexity: 8.7316the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2956/6471], Loss: 2.3132, Perplexity: 10.1069the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [2957/6471], Loss: 2.2086, Perplexity: 9.1033the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [2958/6471], Loss: 1.8804, Perplexity: 6.5563the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [2959/6471], Loss: 2.2333, Perplexity: 9.3308the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [3025/6471], Loss: 2.1203, Perplexity: 8.3336the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3026/6471], Loss: 2.3901, Perplexity: 10.9143the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3027/6471], Loss: 2.3568, Perplexity: 10.5572the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3028/6471], Loss: 2.2654, Perplexity: 9.6349the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3029/6471], Loss: 2.2774, Perplexity: 9.7514the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3030/6471], Loss: 2.3032, Perplexity: 10.0063the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3031/6471], Loss: 2.3209, Perplexity: 10.1853the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3032/6471], Loss: 2.1317, Perplexity: 8.4290the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [3033/6471], Loss: 2.6544, Perplexity: 14.2167the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [2/3], Step [3099/6471], Loss: 2.5024, Perplexity: 12.2118the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3100/6471], Loss: 2.3163, Perplexity: 10.1383the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [3101/6471], Loss: 2.4428, Perplexity: 11.5055the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3102/6471], Loss: 2.2895, Perplexity: 9.8699the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [2/3], Step [3103/6471], Loss: 3.1509, Perplexity: 23.3575the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3104/6471], Loss: 2.0295, Perplexity: 7.6105the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3105/6471], Loss: 2.3226, Perplexity: 10.2017the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3106/6471], Loss: 2.1023, Perplexity: 8.1852the hiddens.shape: torch.Size([64, 30, 1048])
Epoch [2/3], Step [3107/6471], Loss: 3.6239, Perplexity: 37.4839the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [3173/6471], Loss: 2.3263, Perplexity: 10.2405the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [3174/6471], Loss: 2.8773, Perplexity: 17.7654the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3175/6471], Loss: 2.2116, Perplexity: 9.1304the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3176/6471], Loss: 2.2712, Perplexity: 9.6914the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3177/6471], Loss: 2.1999, Perplexity: 9.0237the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3178/6471], Loss: 2.3042, Perplexity: 10.0160the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [2/3], Step [3179/6471], Loss: 2.6144, Perplexity: 13.6589the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3180/6471], Loss: 2.2150, Perplexity: 9.1613the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3181/6471], Loss: 2.3115, Perplexity: 10.0895the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [3247/6471], Loss: 2.3701, Perplexity: 10.6981the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3248/6471], Loss: 2.3058, Perplexity: 10.0325the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3249/6471], Loss: 2.2458, Perplexity: 9.4477the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3250/6471], Loss: 2.3715, Perplexity: 10.7134the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3251/6471], Loss: 2.1018, Perplexity: 8.1808the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [2/3], Step [3252/6471], Loss: 3.0949, Perplexity: 22.0861the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3253/6471], Loss: 2.2412, Perplexity: 9.4045the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3254/6471], Loss: 2.1818, Perplexity: 8.8623the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3255/6471], Loss: 2.4245, Perplexity: 11.2971the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [3321/6471], Loss: 2.5466, Perplexity: 12.7636the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3322/6471], Loss: 1.9301, Perplexity: 6.8904the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3323/6471], Loss: 2.3762, Perplexity: 10.7637the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3324/6471], Loss: 2.4279, Perplexity: 11.3356the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3325/6471], Loss: 2.3576, Perplexity: 10.5657the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3326/6471], Loss: 2.5951, Perplexity: 13.3977the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3327/6471], Loss: 2.3052, Perplexity: 10.0260the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3328/6471], Loss: 2.3820, Perplexity: 10.8262the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3329/6471], Loss: 2.0887, Perplexity: 8.0747the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [3395/6471], Loss: 2.8068, Perplexity: 16.5561the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3396/6471], Loss: 2.2306, Perplexity: 9.3051the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3397/6471], Loss: 2.4189, Perplexity: 11.2332the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3398/6471], Loss: 2.1865, Perplexity: 8.9039the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3399/6471], Loss: 2.3049, Perplexity: 10.0231the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3400/6471], Loss: 2.2626, Perplexity: 9.6076
the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3401/6471], Loss: 2.2106, Perplexity: 9.1215the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3402/6471], Loss: 2.3821, Perplexity: 10.8276the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3403/6471], Loss: 2.2225, Perplexity: 9.2302the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [3469/6471], Loss: 2.3721, Perplexity: 10.7202the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [3470/6471], Loss: 2.6087, Perplexity: 13.5815the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3471/6471], Loss: 2.1757, Perplexity: 8.8080the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3472/6471], Loss: 2.3812, Perplexity: 10.8179the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [3473/6471], Loss: 2.8278, Perplexity: 16.9080the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [3474/6471], Loss: 2.6702, Perplexity: 14.4429the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3475/6471], Loss: 2.2103, Perplexity: 9.1183the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3476/6471], Loss: 2.3917, Perplexity: 10.9316the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [3477/6471], Loss: 2.3998, Perplexity: 11.0211the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [3543/6471], Loss: 2.3841, Perplexity: 10.8496the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3544/6471], Loss: 2.3491, Perplexity: 10.4766the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3545/6471], Loss: 2.2866, Perplexity: 9.8409the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3546/6471], Loss: 2.0593, Perplexity: 7.8407the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3547/6471], Loss: 2.2003, Perplexity: 9.0280the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3548/6471], Loss: 2.3061, Perplexity: 10.0350the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [3549/6471], Loss: 2.9450, Perplexity: 19.0104the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3550/6471], Loss: 2.2487, Perplexity: 9.4757the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3551/6471], Loss: 2.1235, Perplexity: 8.3601the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [3617/6471], Loss: 2.4411, Perplexity: 11.4862the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3618/6471], Loss: 2.4050, Perplexity: 11.0781the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3619/6471], Loss: 2.5041, Perplexity: 12.2331the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3620/6471], Loss: 2.3986, Perplexity: 11.0072the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3621/6471], Loss: 2.4016, Perplexity: 11.0412the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3622/6471], Loss: 2.4753, Perplexity: 11.8858the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3623/6471], Loss: 2.2993, Perplexity: 9.9667the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3624/6471], Loss: 2.4253, Perplexity: 11.3054the hiddens.shape: torch.Size([64, 28, 1048])
Epoch [2/3], Step [3625/6471], Loss: 3.3333, Perplexity: 28.0316the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [3691/6471], Loss: 2.2483, Perplexity: 9.4712the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3692/6471], Loss: 2.3533, Perplexity: 10.5204the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3693/6471], Loss: 2.4248, Perplexity: 11.2995the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3694/6471], Loss: 2.2726, Perplexity: 9.7043the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [3695/6471], Loss: 2.6334, Perplexity: 13.9210the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3696/6471], Loss: 2.2344, Perplexity: 9.3411the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3697/6471], Loss: 2.3230, Perplexity: 10.2067the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3698/6471], Loss: 2.1243, Perplexity: 8.3671the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3699/6471], Loss: 2.1482, Perplexity: 8.5696the hiddens.shape: torch.Size([64, 25, 1048])

Epoch [2/3], Step [3765/6471], Loss: 2.5028, Perplexity: 12.2162the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3766/6471], Loss: 2.0442, Perplexity: 7.7227the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3767/6471], Loss: 2.5065, Perplexity: 12.2615the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [3768/6471], Loss: 2.3626, Perplexity: 10.6187the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3769/6471], Loss: 2.3532, Perplexity: 10.5190the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3770/6471], Loss: 2.1177, Perplexity: 8.3118the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3771/6471], Loss: 2.2849, Perplexity: 9.8249the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3772/6471], Loss: 2.1473, Perplexity: 8.5615the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [3773/6471], Loss: 2.4417, Perplexity: 11.4922the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [3839/6471], Loss: 2.2612, Perplexity: 9.5949the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3840/6471], Loss: 2.4542, Perplexity: 11.6366the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3841/6471], Loss: 2.4381, Perplexity: 11.4507the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3842/6471], Loss: 2.1019, Perplexity: 8.1817the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3843/6471], Loss: 2.1471, Perplexity: 8.5598the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [3844/6471], Loss: 3.0658, Perplexity: 21.4516the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3845/6471], Loss: 2.2923, Perplexity: 9.8979the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [3846/6471], Loss: 2.6271, Perplexity: 13.8341the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [3847/6471], Loss: 3.0210, Perplexity: 20.5119the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [3913/6471], Loss: 2.5554, Perplexity: 12.8771the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [3914/6471], Loss: 2.2870, Perplexity: 9.8458the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3915/6471], Loss: 2.3877, Perplexity: 10.8886the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3916/6471], Loss: 1.9810, Perplexity: 7.2501the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [3917/6471], Loss: 2.6911, Perplexity: 14.7482the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3918/6471], Loss: 2.1961, Perplexity: 8.9902the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [3919/6471], Loss: 2.4995, Perplexity: 12.1764the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [3920/6471], Loss: 2.7075, Perplexity: 14.9918the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3921/6471], Loss: 2.2211, Perplexity: 9.2177the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [3987/6471], Loss: 2.2223, Perplexity: 9.2281the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3988/6471], Loss: 2.1509, Perplexity: 8.5923the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3989/6471], Loss: 2.2553, Perplexity: 9.5380the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [3990/6471], Loss: 2.4430, Perplexity: 11.5070the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3991/6471], Loss: 2.4470, Perplexity: 11.5535the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [3992/6471], Loss: 2.5218, Perplexity: 12.4513the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [3993/6471], Loss: 2.2370, Perplexity: 9.3652the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [3994/6471], Loss: 2.2276, Perplexity: 9.2773the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [3995/6471], Loss: 2.1232, Perplexity: 8.3574the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [4061/6471], Loss: 2.5172, Perplexity: 12.3935the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4062/6471], Loss: 2.4575, Perplexity: 11.6756the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4063/6471], Loss: 2.3331, Perplexity: 10.3099the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4064/6471], Loss: 2.2357, Perplexity: 9.3531the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [4065/6471], Loss: 2.3543, Perplexity: 10.5310the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [4066/6471], Loss: 2.8486, Perplexity: 17.2632the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4067/6471], Loss: 2.4321, Perplexity: 11.3822the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4068/6471], Loss: 2.1173, Perplexity: 8.3084the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4069/6471], Loss: 2.1588, Perplexity: 8.6604the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [2/3], Step [4135/6471], Loss: 2.0057, Perplexity: 7.4310the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [4136/6471], Loss: 3.0701, Perplexity: 21.5448the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4137/6471], Loss: 2.3732, Perplexity: 10.7314the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4138/6471], Loss: 2.4590, Perplexity: 11.6925the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [4139/6471], Loss: 2.7973, Perplexity: 16.4007the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4140/6471], Loss: 2.1117, Perplexity: 8.2626the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4141/6471], Loss: 2.1034, Perplexity: 8.1937the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4142/6471], Loss: 2.2240, Perplexity: 9.2438the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4143/6471], Loss: 2.2831, Perplexity: 9.8075the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [4209/6471], Loss: 2.2248, Perplexity: 9.2516the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4210/6471], Loss: 2.4124, Perplexity: 11.1608the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4211/6471], Loss: 2.1387, Perplexity: 8.4887the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4212/6471], Loss: 2.1477, Perplexity: 8.5650the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4213/6471], Loss: 2.5140, Perplexity: 12.3542the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4214/6471], Loss: 2.6445, Perplexity: 14.0759the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4215/6471], Loss: 2.3465, Perplexity: 10.4490the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4216/6471], Loss: 2.1113, Perplexity: 8.2589the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4217/6471], Loss: 2.5491, Perplexity: 12.7960the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [4283/6471], Loss: 2.2325, Perplexity: 9.3232the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4284/6471], Loss: 2.2717, Perplexity: 9.6957the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4285/6471], Loss: 2.3669, Perplexity: 10.6639the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [4286/6471], Loss: 2.6130, Perplexity: 13.6404the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4287/6471], Loss: 2.1326, Perplexity: 8.4369the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [4288/6471], Loss: 2.5921, Perplexity: 13.3584the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4289/6471], Loss: 2.0824, Perplexity: 8.0236the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4290/6471], Loss: 2.3660, Perplexity: 10.6546the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4291/6471], Loss: 2.2420, Perplexity: 9.4126the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [4357/6471], Loss: 2.3034, Perplexity: 10.0083the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4358/6471], Loss: 2.1506, Perplexity: 8.5904the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [4359/6471], Loss: 2.7861, Perplexity: 16.2182the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4360/6471], Loss: 2.5267, Perplexity: 12.5121the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [4361/6471], Loss: 2.8849, Perplexity: 17.9011the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4362/6471], Loss: 2.1348, Perplexity: 8.4552the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4363/6471], Loss: 2.1818, Perplexity: 8.8620the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4364/6471], Loss: 2.3433, Perplexity: 10.4154the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4365/6471], Loss: 2.4421, Perplexity: 11.4977the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [2/3], Step [4431/6471], Loss: 2.2632, Perplexity: 9.6135the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4432/6471], Loss: 2.1390, Perplexity: 8.4907the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4433/6471], Loss: 2.2639, Perplexity: 9.6209the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4434/6471], Loss: 2.2447, Perplexity: 9.4373the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4435/6471], Loss: 2.2624, Perplexity: 9.6066the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4436/6471], Loss: 2.5640, Perplexity: 12.9877the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4437/6471], Loss: 2.0602, Perplexity: 7.8472the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4438/6471], Loss: 2.3255, Perplexity: 10.2314the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4439/6471], Loss: 2.3433, Perplexity: 10.4153the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [4505/6471], Loss: 2.3535, Perplexity: 10.5228the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4506/6471], Loss: 2.1816, Perplexity: 8.8606the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4507/6471], Loss: 2.5347, Perplexity: 12.6132the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4508/6471], Loss: 2.3816, Perplexity: 10.8226the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4509/6471], Loss: 2.0865, Perplexity: 8.0566the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [4510/6471], Loss: 2.4513, Perplexity: 11.6035the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4511/6471], Loss: 2.2696, Perplexity: 9.6754the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4512/6471], Loss: 2.2450, Perplexity: 9.4401the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4513/6471], Loss: 2.5767, Perplexity: 13.1530the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [4579/6471], Loss: 2.3193, Perplexity: 10.1690the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4580/6471], Loss: 2.2809, Perplexity: 9.7860the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4581/6471], Loss: 2.1775, Perplexity: 8.8241the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4582/6471], Loss: 2.2438, Perplexity: 9.4286the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4583/6471], Loss: 2.3625, Perplexity: 10.6170the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4584/6471], Loss: 2.0882, Perplexity: 8.0702the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4585/6471], Loss: 2.1539, Perplexity: 8.6180the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4586/6471], Loss: 2.3884, Perplexity: 10.8964the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4587/6471], Loss: 2.2973, Perplexity: 9.9474the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [4653/6471], Loss: 2.3428, Perplexity: 10.4107the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4654/6471], Loss: 2.2387, Perplexity: 9.3811the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [2/3], Step [4655/6471], Loss: 3.1288, Perplexity: 22.8458the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4656/6471], Loss: 2.1382, Perplexity: 8.4839the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4657/6471], Loss: 2.1974, Perplexity: 9.0017the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4658/6471], Loss: 2.2451, Perplexity: 9.4414the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [4659/6471], Loss: 3.1259, Perplexity: 22.7807the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4660/6471], Loss: 2.0973, Perplexity: 8.1445the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4661/6471], Loss: 2.0685, Perplexity: 7.9126the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [4727/6471], Loss: 2.3256, Perplexity: 10.2333the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4728/6471], Loss: 2.2150, Perplexity: 9.1612the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4729/6471], Loss: 2.2176, Perplexity: 9.1849the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4730/6471], Loss: 2.2500, Perplexity: 9.4880the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4731/6471], Loss: 2.3234, Perplexity: 10.2103the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4732/6471], Loss: 2.1328, Perplexity: 8.4383the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4733/6471], Loss: 2.2230, Perplexity: 9.2352the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4734/6471], Loss: 2.5022, Perplexity: 12.2096the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4735/6471], Loss: 2.2509, Perplexity: 9.4963the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [2/3], Step [4801/6471], Loss: 2.2340, Perplexity: 9.3373the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4802/6471], Loss: 2.2158, Perplexity: 9.1685the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4803/6471], Loss: 2.3934, Perplexity: 10.9507the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [2/3], Step [4804/6471], Loss: 3.0675, Perplexity: 21.4889the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4805/6471], Loss: 2.0856, Perplexity: 8.0490the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4806/6471], Loss: 2.2861, Perplexity: 9.8365the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4807/6471], Loss: 2.6337, Perplexity: 13.9255the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [4808/6471], Loss: 2.6178, Perplexity: 13.7052the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4809/6471], Loss: 2.2000, Perplexity: 9.0250the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [4875/6471], Loss: 2.1203, Perplexity: 8.3333the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [4876/6471], Loss: 2.7276, Perplexity: 15.2963the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4877/6471], Loss: 2.1057, Perplexity: 8.2132the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [4878/6471], Loss: 2.4099, Perplexity: 11.1327the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [4879/6471], Loss: 2.1218, Perplexity: 8.3457the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4880/6471], Loss: 2.1150, Perplexity: 8.2895the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4881/6471], Loss: 2.2729, Perplexity: 9.7071the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4882/6471], Loss: 2.1272, Perplexity: 8.3910the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4883/6471], Loss: 2.2494, Perplexity: 9.4825the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [4949/6471], Loss: 2.3725, Perplexity: 10.7243the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4950/6471], Loss: 2.1182, Perplexity: 8.3164the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4951/6471], Loss: 2.1302, Perplexity: 8.4166the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4952/6471], Loss: 2.3635, Perplexity: 10.6281the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4953/6471], Loss: 2.2914, Perplexity: 9.8889the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [4954/6471], Loss: 2.3286, Perplexity: 10.2639the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [4955/6471], Loss: 2.2416, Perplexity: 9.4081the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [4956/6471], Loss: 2.2554, Perplexity: 9.5392the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [4957/6471], Loss: 2.1764, Perplexity: 8.8145the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [5023/6471], Loss: 2.3230, Perplexity: 10.2065the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5024/6471], Loss: 2.1627, Perplexity: 8.6948the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5025/6471], Loss: 2.1988, Perplexity: 9.0146the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5026/6471], Loss: 2.2293, Perplexity: 9.2933the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5027/6471], Loss: 2.0564, Perplexity: 7.8178the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5028/6471], Loss: 2.2663, Perplexity: 9.6438the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5029/6471], Loss: 2.4897, Perplexity: 12.0578the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5030/6471], Loss: 2.1324, Perplexity: 8.4349the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5031/6471], Loss: 2.2041, Perplexity: 9.0619the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [5097/6471], Loss: 2.5058, Perplexity: 12.2536the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5098/6471], Loss: 2.1119, Perplexity: 8.2642the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5099/6471], Loss: 2.3615, Perplexity: 10.6072the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5100/6471], Loss: 2.3253, Perplexity: 10.2295the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5101/6471], Loss: 2.4879, Perplexity: 12.0358the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5102/6471], Loss: 2.2692, Perplexity: 9.6718the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5103/6471], Loss: 1.9302, Perplexity: 6.8909the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5104/6471], Loss: 2.2873, Perplexity: 9.8485the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5105/6471], Loss: 2.2625, Perplexity: 9.6070the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [5171/6471], Loss: 2.5557, Perplexity: 12.8798the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5172/6471], Loss: 2.3666, Perplexity: 10.6614the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5173/6471], Loss: 2.6768, Perplexity: 14.5378the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5174/6471], Loss: 2.2168, Perplexity: 9.1783the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5175/6471], Loss: 2.1880, Perplexity: 8.9172the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5176/6471], Loss: 2.2136, Perplexity: 9.1489the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [5177/6471], Loss: 2.9631, Perplexity: 19.3571the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5178/6471], Loss: 2.1430, Perplexity: 8.5248the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5179/6471], Loss: 2.3051, Perplexity: 10.0248the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [5245/6471], Loss: 2.2897, Perplexity: 9.8722the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5246/6471], Loss: 2.1851, Perplexity: 8.8913the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5247/6471], Loss: 2.2739, Perplexity: 9.7175the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5248/6471], Loss: 2.4491, Perplexity: 11.5785the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5249/6471], Loss: 2.3908, Perplexity: 10.9220the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5250/6471], Loss: 2.2197, Perplexity: 9.2046the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5251/6471], Loss: 2.6257, Perplexity: 13.8143the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5252/6471], Loss: 2.4810, Perplexity: 11.9536the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5253/6471], Loss: 2.2213, Perplexity: 9.2195the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [5319/6471], Loss: 2.1808, Perplexity: 8.8536the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5320/6471], Loss: 2.1871, Perplexity: 8.9096the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5321/6471], Loss: 2.4643, Perplexity: 11.7547the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [5322/6471], Loss: 2.5763, Perplexity: 13.1487the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5323/6471], Loss: 2.3503, Perplexity: 10.4886the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5324/6471], Loss: 2.2114, Perplexity: 9.1288the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5325/6471], Loss: 2.1443, Perplexity: 8.5362the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5326/6471], Loss: 2.2524, Perplexity: 9.5106the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5327/6471], Loss: 2.5682, Perplexity: 13.0426the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [5393/6471], Loss: 4.6695, Perplexity: 106.6435the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5394/6471], Loss: 2.0903, Perplexity: 8.0873the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5395/6471], Loss: 2.4593, Perplexity: 11.6963the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5396/6471], Loss: 2.3414, Perplexity: 10.3961the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5397/6471], Loss: 2.2436, Perplexity: 9.4275the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5398/6471], Loss: 2.0100, Perplexity: 7.4632the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5399/6471], Loss: 2.2151, Perplexity: 9.1621the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5400/6471], Loss: 2.1351, Perplexity: 8.4578
the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5401/6471], Loss: 2.2391, Perplexity: 9.3849the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [2/3], Step [5467/6471], Loss: 2.2867, Perplexity: 9.8420the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5468/6471], Loss: 2.3309, Perplexity: 10.2873the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5469/6471], Loss: 2.1042, Perplexity: 8.2002the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5470/6471], Loss: 2.2094, Perplexity: 9.1099the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5471/6471], Loss: 2.4865, Perplexity: 12.0192the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5472/6471], Loss: 2.2600, Perplexity: 9.5831the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5473/6471], Loss: 2.1622, Perplexity: 8.6898the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5474/6471], Loss: 2.1500, Perplexity: 8.5851the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5475/6471], Loss: 2.1571, Perplexity: 8.6462the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [2/3], Step [5541/6471], Loss: 2.3158, Perplexity: 10.1335the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5542/6471], Loss: 2.0925, Perplexity: 8.1048the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5543/6471], Loss: 2.2007, Perplexity: 9.0313the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5544/6471], Loss: 2.3499, Perplexity: 10.4846the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5545/6471], Loss: 2.3960, Perplexity: 10.9790the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [2/3], Step [5546/6471], Loss: 2.8403, Perplexity: 17.1213the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5547/6471], Loss: 2.3944, Perplexity: 10.9619the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5548/6471], Loss: 2.3817, Perplexity: 10.8228the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5549/6471], Loss: 2.2125, Perplexity: 9.1388the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [2/3], Step [5615/6471], Loss: 2.0648, Perplexity: 7.8841the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5616/6471], Loss: 2.3265, Perplexity: 10.2416the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5617/6471], Loss: 2.1575, Perplexity: 8.6494the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5618/6471], Loss: 2.4577, Perplexity: 11.6784the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5619/6471], Loss: 2.1630, Perplexity: 8.6969the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5620/6471], Loss: 2.2136, Perplexity: 9.1487the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5621/6471], Loss: 2.0331, Perplexity: 7.6377the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5622/6471], Loss: 2.4847, Perplexity: 11.9978the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5623/6471], Loss: 2.1680, Perplexity: 8.7404the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [5689/6471], Loss: 2.3723, Perplexity: 10.7215the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5690/6471], Loss: 2.2555, Perplexity: 9.5403the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5691/6471], Loss: 2.3439, Perplexity: 10.4221the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5692/6471], Loss: 2.4550, Perplexity: 11.6467the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5693/6471], Loss: 2.2752, Perplexity: 9.7300the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5694/6471], Loss: 2.2152, Perplexity: 9.1637the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5695/6471], Loss: 2.2525, Perplexity: 9.5117the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5696/6471], Loss: 2.2099, Perplexity: 9.1146the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5697/6471], Loss: 2.2185, Perplexity: 9.1935the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [5763/6471], Loss: 2.2001, Perplexity: 9.0263the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [2/3], Step [5764/6471], Loss: 2.7817, Perplexity: 16.1468the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5765/6471], Loss: 2.0714, Perplexity: 7.9359the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5766/6471], Loss: 2.1449, Perplexity: 8.5408the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5767/6471], Loss: 2.2672, Perplexity: 9.6525the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5768/6471], Loss: 2.1806, Perplexity: 8.8519the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5769/6471], Loss: 2.0748, Perplexity: 7.9633the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [5770/6471], Loss: 2.0998, Perplexity: 8.1646the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5771/6471], Loss: 2.4294, Perplexity: 11.3525the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [5837/6471], Loss: 2.3911, Perplexity: 10.9252the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5838/6471], Loss: 2.4259, Perplexity: 11.3120the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5839/6471], Loss: 2.2778, Perplexity: 9.7553the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5840/6471], Loss: 2.1148, Perplexity: 8.2876the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5841/6471], Loss: 2.3167, Perplexity: 10.1419the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5842/6471], Loss: 2.0425, Perplexity: 7.7096the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [5843/6471], Loss: 2.3866, Perplexity: 10.8764the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5844/6471], Loss: 2.5318, Perplexity: 12.5759the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5845/6471], Loss: 2.1054, Perplexity: 8.2100the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [5911/6471], Loss: 2.0519, Perplexity: 7.7823the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5912/6471], Loss: 2.4207, Perplexity: 11.2536the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5913/6471], Loss: 2.1013, Perplexity: 8.1772the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5914/6471], Loss: 2.3110, Perplexity: 10.0847the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5915/6471], Loss: 2.2221, Perplexity: 9.2263the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5916/6471], Loss: 2.1565, Perplexity: 8.6405the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5917/6471], Loss: 2.2480, Perplexity: 9.4684the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [5918/6471], Loss: 2.5411, Perplexity: 12.6936the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [5919/6471], Loss: 2.2355, Perplexity: 9.3512the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [5985/6471], Loss: 2.1992, Perplexity: 9.0177the hiddens.shape: torch.Size([64, 26, 1048])
Epoch [2/3], Step [5986/6471], Loss: 3.2458, Perplexity: 25.6831the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [2/3], Step [5987/6471], Loss: 2.7373, Perplexity: 15.4448the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5988/6471], Loss: 2.1738, Perplexity: 8.7917the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [5989/6471], Loss: 2.1899, Perplexity: 8.9345the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5990/6471], Loss: 2.1734, Perplexity: 8.7884the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5991/6471], Loss: 2.1515, Perplexity: 8.5977the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [5992/6471], Loss: 2.1632, Perplexity: 8.6993the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [5993/6471], Loss: 2.3189, Perplexity: 10.1646the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [2/3], Step [6059/6471], Loss: 2.2040, Perplexity: 9.0609the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6060/6471], Loss: 2.2820, Perplexity: 9.7962the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6061/6471], Loss: 2.2194, Perplexity: 9.2021the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [6062/6471], Loss: 2.1949, Perplexity: 8.9790the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [6063/6471], Loss: 2.3252, Perplexity: 10.2284the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6064/6471], Loss: 2.2327, Perplexity: 9.3251the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [6065/6471], Loss: 2.6048, Perplexity: 13.5287the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [6066/6471], Loss: 2.2778, Perplexity: 9.7552the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6067/6471], Loss: 2.0968, Perplexity: 8.1403the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [6133/6471], Loss: 3.1463, Perplexity: 23.2498the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6134/6471], Loss: 2.1803, Perplexity: 8.8488the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [6135/6471], Loss: 2.3113, Perplexity: 10.0874the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6136/6471], Loss: 2.3541, Perplexity: 10.5290the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [6137/6471], Loss: 2.4206, Perplexity: 11.2521the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6138/6471], Loss: 2.2270, Perplexity: 9.2722the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6139/6471], Loss: 2.0642, Perplexity: 7.8791the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [6140/6471], Loss: 2.6077, Perplexity: 13.5682the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6141/6471], Loss: 2.5283, Perplexity: 12.5317the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [2/3], Step [6207/6471], Loss: 2.3233, Perplexity: 10.2090the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6208/6471], Loss: 2.1942, Perplexity: 8.9726the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6209/6471], Loss: 2.2093, Perplexity: 9.1096the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6210/6471], Loss: 2.0249, Perplexity: 7.5751the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6211/6471], Loss: 2.6806, Perplexity: 14.5943the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6212/6471], Loss: 2.3737, Perplexity: 10.7367the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6213/6471], Loss: 2.2007, Perplexity: 9.0311the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [2/3], Step [6214/6471], Loss: 2.5403, Perplexity: 12.6839the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6215/6471], Loss: 2.2843, Perplexity: 9.8187the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [2/3], Step [6281/6471], Loss: 2.5244, Perplexity: 12.4831the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [2/3], Step [6282/6471], Loss: 2.4342, Perplexity: 11.4066the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [6283/6471], Loss: 2.3646, Perplexity: 10.6393the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6284/6471], Loss: 2.0963, Perplexity: 8.1362the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6285/6471], Loss: 2.5703, Perplexity: 13.0695the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6286/6471], Loss: 2.2445, Perplexity: 9.4356the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [2/3], Step [6287/6471], Loss: 2.1572, Perplexity: 8.6467the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [6288/6471], Loss: 2.1507, Perplexity: 8.5909the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6289/6471], Loss: 2.7273, Perplexity: 15.2917the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [2/3], Step [6355/6471], Loss: 2.3434, Perplexity: 10.4170the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6356/6471], Loss: 2.4497, Perplexity: 11.5850the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6357/6471], Loss: 2.1544, Perplexity: 8.6227the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [2/3], Step [6358/6471], Loss: 2.4855, Perplexity: 12.0066the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6359/6471], Loss: 2.1529, Perplexity: 8.6096the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6360/6471], Loss: 2.2120, Perplexity: 9.1337the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [6361/6471], Loss: 2.2431, Perplexity: 9.4225the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6362/6471], Loss: 2.1560, Perplexity: 8.6362the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6363/6471], Loss: 2.1786, Perplexity: 8.8341the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [2/3], Step [6429/6471], Loss: 2.2476, Perplexity: 9.4647the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6430/6471], Loss: 2.2425, Perplexity: 9.4169the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [2/3], Step [6431/6471], Loss: 2.2215, Perplexity: 9.2212the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6432/6471], Loss: 2.1160, Perplexity: 8.2982the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [2/3], Step [6433/6471], Loss: 2.1460, Perplexity: 8.5509the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6434/6471], Loss: 2.2491, Perplexity: 9.4789the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6435/6471], Loss: 2.2108, Perplexity: 9.1226the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [2/3], Step [6436/6471], Loss: 2.1880, Perplexity: 8.9172the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [2/3], Step [6437/6471], Loss: 2.2411, Perplexity: 9.4036the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [3/3], Step [33/6471], Loss: 2.2032, Perplexity: 9.0536the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [34/6471], Loss: 2.3695, Perplexity: 10.6918the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [35/6471], Loss: 2.3416, Perplexity: 10.3974the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [36/6471], Loss: 2.7164, Perplexity: 15.1255the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [37/6471], Loss: 2.3900, Perplexity: 10.9131the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [38/6471], Loss: 1.9837, Perplexity: 7.2698the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [3/3], Step [39/6471], Loss: 3.0600, Perplexity: 21.3269the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [40/6471], Loss: 2.1306, Perplexity: 8.4202the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [41/6471], Loss: 2.8854, Perplexity: 17.9112the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [109/6471], Loss: 2.4887, Perplexity: 12.0451the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [110/6471], Loss: 2.0942, Perplexity: 8.1188the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [3/3], Step [111/6471], Loss: 2.8918, Perplexity: 18.0256the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [112/6471], Loss: 2.8628, Perplexity: 17.5103the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [113/6471], Loss: 2.4526, Perplexity: 11.6182the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [114/6471], Loss: 2.0986, Perplexity: 8.1546the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [115/6471], Loss: 2.3569, Perplexity: 10.5583the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [116/6471], Loss: 2.2905, Perplexity: 9.8795the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [117/6471], Loss: 2.1370, Perplexity: 8.4743the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [184/6471], Loss: 2.2348, Perplexity: 9.3450the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [185/6471], Loss: 2.2279, Perplexity: 9.2802the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [186/6471], Loss: 2.1577, Perplexity: 8.6512the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [187/6471], Loss: 2.1631, Perplexity: 8.6977the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [188/6471], Loss: 2.5358, Perplexity: 12.6260the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [189/6471], Loss: 2.1678, Perplexity: 8.7387the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [190/6471], Loss: 2.2405, Perplexity: 9.3980the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [191/6471], Loss: 2.0389, Perplexity: 7.6818the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [192/6471], Loss: 2.1341, Perplexity: 8.4491the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [3/3], Step [259/6471], Loss: 2.0144, Perplexity: 7.4964the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [260/6471], Loss: 2.3358, Perplexity: 10.3376the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [261/6471], Loss: 2.2765, Perplexity: 9.7424the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [262/6471], Loss: 2.4043, Perplexity: 11.0705the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [263/6471], Loss: 2.2086, Perplexity: 9.1033the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [264/6471], Loss: 2.2026, Perplexity: 9.0481the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [265/6471], Loss: 2.2408, Perplexity: 9.4009the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [266/6471], Loss: 2.1216, Perplexity: 8.3447the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [267/6471], Loss: 2.2977, Perplexity: 9.9513the hiddens.shape: torch.Size([64, 19, 1048])

Epoch [3/3], Step [334/6471], Loss: 2.1488, Perplexity: 8.5747the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [335/6471], Loss: 2.3026, Perplexity: 10.0000the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [336/6471], Loss: 2.4147, Perplexity: 11.1862the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [337/6471], Loss: 2.4189, Perplexity: 11.2339the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [338/6471], Loss: 2.3145, Perplexity: 10.1194the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [339/6471], Loss: 2.5830, Perplexity: 13.2368the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [340/6471], Loss: 2.2036, Perplexity: 9.0579the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [341/6471], Loss: 2.3069, Perplexity: 10.0434the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [342/6471], Loss: 2.3755, Perplexity: 10.7567the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [409/6471], Loss: 2.1443, Perplexity: 8.5360the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [410/6471], Loss: 2.1793, Perplexity: 8.8405the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [411/6471], Loss: 2.2337, Perplexity: 9.3345the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [412/6471], Loss: 2.0922, Perplexity: 8.1030the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [413/6471], Loss: 2.9913, Perplexity: 19.9114the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [414/6471], Loss: 2.1873, Perplexity: 8.9110the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [415/6471], Loss: 2.3907, Perplexity: 10.9208the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [416/6471], Loss: 2.3106, Perplexity: 10.0807the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [417/6471], Loss: 2.4397, Perplexity: 11.4692the hiddens.shape: torch.Size([64, 20, 1048])

Epoch [3/3], Step [484/6471], Loss: 2.1079, Perplexity: 8.2312the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [485/6471], Loss: 2.0367, Perplexity: 7.6652the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [486/6471], Loss: 2.3081, Perplexity: 10.0558the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [487/6471], Loss: 1.9867, Perplexity: 7.2915the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [488/6471], Loss: 2.3610, Perplexity: 10.6010the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [489/6471], Loss: 2.1421, Perplexity: 8.5171the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [490/6471], Loss: 2.1940, Perplexity: 8.9713the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [491/6471], Loss: 2.1620, Perplexity: 8.6887the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [492/6471], Loss: 2.1422, Perplexity: 8.5182the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [559/6471], Loss: 2.2344, Perplexity: 9.3407the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [560/6471], Loss: 2.2198, Perplexity: 9.2056the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [561/6471], Loss: 2.1642, Perplexity: 8.7080the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [562/6471], Loss: 2.1051, Perplexity: 8.2080the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [563/6471], Loss: 2.0227, Perplexity: 7.5587the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [564/6471], Loss: 2.1872, Perplexity: 8.9104the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [565/6471], Loss: 2.0587, Perplexity: 7.8356the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [566/6471], Loss: 2.4672, Perplexity: 11.7899the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [567/6471], Loss: 2.2217, Perplexity: 9.2233the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [3/3], Step [634/6471], Loss: 2.0400, Perplexity: 7.6907the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [635/6471], Loss: 1.8672, Perplexity: 6.4701the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [636/6471], Loss: 2.1612, Perplexity: 8.6813the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [637/6471], Loss: 2.1687, Perplexity: 8.7472the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [638/6471], Loss: 2.4642, Perplexity: 11.7544the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [639/6471], Loss: 2.0968, Perplexity: 8.1400the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [640/6471], Loss: 2.3857, Perplexity: 10.8668the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [641/6471], Loss: 2.0474, Perplexity: 7.7480the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [642/6471], Loss: 2.4475, Perplexity: 11.5597the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [643/64

Epoch [3/3], Step [709/6471], Loss: 2.1800, Perplexity: 8.8463the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [710/6471], Loss: 2.0815, Perplexity: 8.0164the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [711/6471], Loss: 2.1655, Perplexity: 8.7193the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [712/6471], Loss: 2.2949, Perplexity: 9.9239the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [713/6471], Loss: 2.2065, Perplexity: 9.0841the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [714/6471], Loss: 2.3146, Perplexity: 10.1212the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [715/6471], Loss: 2.1512, Perplexity: 8.5949the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [716/6471], Loss: 2.0946, Perplexity: 8.1219the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [717/6471], Loss: 2.2396, Perplexity: 9.3894the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [718/6471

Epoch [3/3], Step [784/6471], Loss: 2.3739, Perplexity: 10.7393the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [785/6471], Loss: 2.1284, Perplexity: 8.4013the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [786/6471], Loss: 1.9212, Perplexity: 6.8289the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [787/6471], Loss: 2.5184, Perplexity: 12.4085the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [788/6471], Loss: 2.3100, Perplexity: 10.0745the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [789/6471], Loss: 2.3014, Perplexity: 9.9880the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [790/6471], Loss: 2.4762, Perplexity: 11.8963the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [791/6471], Loss: 2.2954, Perplexity: 9.9283the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [792/6471], Loss: 2.1896, Perplexity: 8.9312the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [793/6

Epoch [3/3], Step [859/6471], Loss: 2.4060, Perplexity: 11.0894the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [860/6471], Loss: 2.1366, Perplexity: 8.4704the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [861/6471], Loss: 2.2203, Perplexity: 9.2105the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [862/6471], Loss: 2.2517, Perplexity: 9.5038the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [863/6471], Loss: 2.3601, Perplexity: 10.5922the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [864/6471], Loss: 2.0827, Perplexity: 8.0258the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [865/6471], Loss: 2.8164, Perplexity: 16.7169the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [866/6471], Loss: 2.1732, Perplexity: 8.7863the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [867/6471], Loss: 1.9501, Perplexity: 7.0294the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [868/64

Epoch [3/3], Step [934/6471], Loss: 2.1186, Perplexity: 8.3195the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [935/6471], Loss: 2.1673, Perplexity: 8.7346the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [936/6471], Loss: 2.7632, Perplexity: 15.8513the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [937/6471], Loss: 2.0467, Perplexity: 7.7422the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [938/6471], Loss: 2.1265, Perplexity: 8.3858the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [939/6471], Loss: 1.9296, Perplexity: 6.8868the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [940/6471], Loss: 2.1805, Perplexity: 8.8509the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [941/6471], Loss: 2.0513, Perplexity: 7.7778the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [942/6471], Loss: 2.1506, Perplexity: 8.5901the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [943/6471

Epoch [3/3], Step [1009/6471], Loss: 1.9432, Perplexity: 6.9809the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1010/6471], Loss: 2.0315, Perplexity: 7.6254the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1011/6471], Loss: 2.2326, Perplexity: 9.3239the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1012/6471], Loss: 2.3194, Perplexity: 10.1691the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1013/6471], Loss: 2.1605, Perplexity: 8.6751the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [1014/6471], Loss: 2.5589, Perplexity: 12.9213the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1015/6471], Loss: 2.0203, Perplexity: 7.5409the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1016/6471], Loss: 2.1990, Perplexity: 9.0159the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1017/6471], Loss: 2.1303, Perplexity: 8.4173the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1084/6471], Loss: 2.3872, Perplexity: 10.8827the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1085/6471], Loss: 2.0492, Perplexity: 7.7620the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1086/6471], Loss: 2.3267, Perplexity: 10.2438the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1087/6471], Loss: 2.2870, Perplexity: 9.8455the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1088/6471], Loss: 2.0264, Perplexity: 7.5866the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1089/6471], Loss: 2.2099, Perplexity: 9.1152the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1090/6471], Loss: 2.0243, Perplexity: 7.5708the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1091/6471], Loss: 2.0406, Perplexity: 7.6951the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1092/6471], Loss: 2.1628, Perplexity: 8.6954the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1158/6471], Loss: 2.2559, Perplexity: 9.5438the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1159/6471], Loss: 1.9727, Perplexity: 7.1900the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1160/6471], Loss: 2.0194, Perplexity: 7.5335the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1161/6471], Loss: 2.0558, Perplexity: 7.8131the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1162/6471], Loss: 2.1476, Perplexity: 8.5646the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1163/6471], Loss: 2.3219, Perplexity: 10.1952the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1164/6471], Loss: 2.1241, Perplexity: 8.3653the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1165/6471], Loss: 2.2046, Perplexity: 9.0669the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1166/6471], Loss: 2.1469, Perplexity: 8.5582the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [1232/6471], Loss: 2.2351, Perplexity: 9.3471the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1233/6471], Loss: 2.0112, Perplexity: 7.4725the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1234/6471], Loss: 2.1454, Perplexity: 8.5458the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [1235/6471], Loss: 2.5237, Perplexity: 12.4750the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1236/6471], Loss: 2.1811, Perplexity: 8.8560the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1237/6471], Loss: 2.4298, Perplexity: 11.3568the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1238/6471], Loss: 2.3324, Perplexity: 10.3023the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1239/6471], Loss: 2.2637, Perplexity: 9.6189the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1240/6471], Loss: 2.2467, Perplexity: 9.4566the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [1306/6471], Loss: 2.3240, Perplexity: 10.2167the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1307/6471], Loss: 2.1291, Perplexity: 8.4075the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1308/6471], Loss: 2.2520, Perplexity: 9.5063the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1309/6471], Loss: 2.2132, Perplexity: 9.1449the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1310/6471], Loss: 2.0415, Perplexity: 7.7018the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1311/6471], Loss: 2.2585, Perplexity: 9.5688the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1312/6471], Loss: 1.9281, Perplexity: 6.8764the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1313/6471], Loss: 2.3846, Perplexity: 10.8548the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1314/6471], Loss: 2.1735, Perplexity: 8.7890the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1381/6471], Loss: 2.0935, Perplexity: 8.1129the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1382/6471], Loss: 2.0543, Perplexity: 7.8013the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1383/6471], Loss: 2.1697, Perplexity: 8.7556the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [1384/6471], Loss: 2.4546, Perplexity: 11.6417the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [1385/6471], Loss: 2.3951, Perplexity: 10.9690the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [1386/6471], Loss: 2.2771, Perplexity: 9.7484the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1387/6471], Loss: 2.1380, Perplexity: 8.4825the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [1388/6471], Loss: 2.6211, Perplexity: 13.7513the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [1389/6471], Loss: 2.5930, Perplexity: 13.3703the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], St

Epoch [3/3], Step [1455/6471], Loss: 2.2639, Perplexity: 9.6209the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1456/6471], Loss: 1.9428, Perplexity: 6.9785the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1457/6471], Loss: 2.4036, Perplexity: 11.0632the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [3/3], Step [1458/6471], Loss: 2.3431, Perplexity: 10.4134the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1459/6471], Loss: 2.1530, Perplexity: 8.6110the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1460/6471], Loss: 2.0938, Perplexity: 8.1158the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1461/6471], Loss: 2.1894, Perplexity: 8.9303the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [1462/6471], Loss: 2.5185, Perplexity: 12.4105the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1463/6471], Loss: 2.1572, Perplexity: 8.6468the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [1529/6471], Loss: 2.2674, Perplexity: 9.6544the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [1530/6471], Loss: 2.6274, Perplexity: 13.8377the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1531/6471], Loss: 2.1431, Perplexity: 8.5260the hiddens.shape: torch.Size([64, 26, 1048])
Epoch [3/3], Step [1532/6471], Loss: 3.2375, Perplexity: 25.4690the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1533/6471], Loss: 2.2360, Perplexity: 9.3556the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1534/6471], Loss: 2.2883, Perplexity: 9.8577the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1535/6471], Loss: 2.1417, Perplexity: 8.5136the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1536/6471], Loss: 2.1765, Perplexity: 8.8156the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1537/6471], Loss: 2.1427, Perplexity: 8.5223the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1603/6471], Loss: 2.0537, Perplexity: 7.7970the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1604/6471], Loss: 1.9489, Perplexity: 7.0209the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1605/6471], Loss: 2.1004, Perplexity: 8.1698the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1606/6471], Loss: 2.3766, Perplexity: 10.7683the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1607/6471], Loss: 2.0206, Perplexity: 7.5426the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1608/6471], Loss: 2.2772, Perplexity: 9.7490the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1609/6471], Loss: 2.1180, Perplexity: 8.3149the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1610/6471], Loss: 2.1985, Perplexity: 9.0118the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1611/6471], Loss: 2.2427, Perplexity: 9.4186the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [1678/6471], Loss: 2.1590, Perplexity: 8.6625the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [1679/6471], Loss: 2.2138, Perplexity: 9.1505the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1680/6471], Loss: 2.2171, Perplexity: 9.1809the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1681/6471], Loss: 2.4116, Perplexity: 11.1513the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1682/6471], Loss: 2.0619, Perplexity: 7.8608the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1683/6471], Loss: 2.4150, Perplexity: 11.1898the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1684/6471], Loss: 2.2191, Perplexity: 9.1987the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1685/6471], Loss: 2.2758, Perplexity: 9.7352the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1686/6471], Loss: 2.2550, Perplexity: 9.5350the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1753/6471], Loss: 2.2573, Perplexity: 9.5568the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1754/6471], Loss: 2.2529, Perplexity: 9.5157the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1755/6471], Loss: 1.9763, Perplexity: 7.2161the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [1756/6471], Loss: 2.4117, Perplexity: 11.1534the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [1757/6471], Loss: 2.4084, Perplexity: 11.1163the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1758/6471], Loss: 2.1831, Perplexity: 8.8738the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1759/6471], Loss: 2.1376, Perplexity: 8.4788the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1760/6471], Loss: 2.1780, Perplexity: 8.8283the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1761/6471], Loss: 1.9715, Perplexity: 7.1816the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [1827/6471], Loss: 2.1264, Perplexity: 8.3842the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [1828/6471], Loss: 2.2963, Perplexity: 9.9373the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1829/6471], Loss: 2.1335, Perplexity: 8.4440the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1830/6471], Loss: 2.1416, Perplexity: 8.5134the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1831/6471], Loss: 2.0604, Perplexity: 7.8488the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1832/6471], Loss: 2.2780, Perplexity: 9.7574the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1833/6471], Loss: 2.1202, Perplexity: 8.3331the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [1834/6471], Loss: 2.5846, Perplexity: 13.2577the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1835/6471], Loss: 2.1712, Perplexity: 8.7691the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [1901/6471], Loss: 2.0282, Perplexity: 7.6003the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1902/6471], Loss: 2.3572, Perplexity: 10.5617the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1903/6471], Loss: 1.9736, Perplexity: 7.1963the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1904/6471], Loss: 2.1749, Perplexity: 8.8013the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [1905/6471], Loss: 2.6521, Perplexity: 14.1840the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1906/6471], Loss: 1.9921, Perplexity: 7.3306the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1907/6471], Loss: 2.2559, Perplexity: 9.5443the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1908/6471], Loss: 2.3088, Perplexity: 10.0622the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1909/6471], Loss: 2.2428, Perplexity: 9.4197the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [1975/6471], Loss: 2.1784, Perplexity: 8.8321the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1976/6471], Loss: 2.3152, Perplexity: 10.1268the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1977/6471], Loss: 1.9981, Perplexity: 7.3751the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [1978/6471], Loss: 2.0769, Perplexity: 7.9797the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [1979/6471], Loss: 2.1517, Perplexity: 8.5991the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1980/6471], Loss: 2.2085, Perplexity: 9.1025the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [1981/6471], Loss: 2.3626, Perplexity: 10.6190the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [1982/6471], Loss: 2.2340, Perplexity: 9.3376the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [1983/6471], Loss: 2.0355, Perplexity: 7.6562the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [2049/6471], Loss: 2.0896, Perplexity: 8.0813the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2050/6471], Loss: 2.2978, Perplexity: 9.9518the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2051/6471], Loss: 2.2178, Perplexity: 9.1869the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2052/6471], Loss: 2.1773, Perplexity: 8.8220the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2053/6471], Loss: 2.3213, Perplexity: 10.1886the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2054/6471], Loss: 1.9231, Perplexity: 6.8422the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [3/3], Step [2055/6471], Loss: 2.9044, Perplexity: 18.2542the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2056/6471], Loss: 2.3105, Perplexity: 10.0791the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2057/6471], Loss: 2.1072, Perplexity: 8.2248the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [2123/6471], Loss: 2.1206, Perplexity: 8.3361the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2124/6471], Loss: 2.4880, Perplexity: 12.0370the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2125/6471], Loss: 2.1393, Perplexity: 8.4932the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2126/6471], Loss: 2.2466, Perplexity: 9.4560the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2127/6471], Loss: 2.0845, Perplexity: 8.0409the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2128/6471], Loss: 2.2219, Perplexity: 9.2253the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [2129/6471], Loss: 2.6090, Perplexity: 13.5856the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [2130/6471], Loss: 2.4775, Perplexity: 11.9115the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2131/6471], Loss: 2.1070, Perplexity: 8.2237the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [2198/6471], Loss: 2.2977, Perplexity: 9.9514the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2199/6471], Loss: 2.1851, Perplexity: 8.8914the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2200/6471], Loss: 1.9659, Perplexity: 7.1416
the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2201/6471], Loss: 2.4131, Perplexity: 11.1681the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2202/6471], Loss: 2.4263, Perplexity: 11.3167the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2203/6471], Loss: 2.3004, Perplexity: 9.9779the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2204/6471], Loss: 2.0435, Perplexity: 7.7174the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2205/6471], Loss: 2.4584, Perplexity: 11.6858the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2206/6471], Loss: 1.9773, Perplexity: 7.2232the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], St

Epoch [3/3], Step [2272/6471], Loss: 2.2648, Perplexity: 9.6291the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2273/6471], Loss: 2.0268, Perplexity: 7.5896the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2274/6471], Loss: 2.2266, Perplexity: 9.2679the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2275/6471], Loss: 2.4212, Perplexity: 11.2596the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2276/6471], Loss: 2.5784, Perplexity: 13.1762the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2277/6471], Loss: 2.0413, Perplexity: 7.7008the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2278/6471], Loss: 1.9538, Perplexity: 7.0558the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2279/6471], Loss: 2.1985, Perplexity: 9.0117the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2280/6471], Loss: 2.1410, Perplexity: 8.5082the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [2346/6471], Loss: 2.1601, Perplexity: 8.6720the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2347/6471], Loss: 2.2766, Perplexity: 9.7438the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2348/6471], Loss: 1.8931, Perplexity: 6.6400the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2349/6471], Loss: 2.2654, Perplexity: 9.6347the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2350/6471], Loss: 2.4298, Perplexity: 11.3567the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2351/6471], Loss: 2.0879, Perplexity: 8.0682the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2352/6471], Loss: 2.2416, Perplexity: 9.4087the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2353/6471], Loss: 2.0355, Perplexity: 7.6558the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [2354/6471], Loss: 2.3206, Perplexity: 10.1822the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [2420/6471], Loss: 2.2465, Perplexity: 9.4546the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2421/6471], Loss: 2.4933, Perplexity: 12.1016the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2422/6471], Loss: 1.9096, Perplexity: 6.7506the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2423/6471], Loss: 2.0462, Perplexity: 7.7381the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2424/6471], Loss: 2.3930, Perplexity: 10.9461the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2425/6471], Loss: 2.2002, Perplexity: 9.0269the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2426/6471], Loss: 2.2798, Perplexity: 9.7745the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2427/6471], Loss: 2.2332, Perplexity: 9.3299the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2428/6471], Loss: 2.1251, Perplexity: 8.3739the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [2495/6471], Loss: 2.1475, Perplexity: 8.5636the hiddens.shape: torch.Size([64, 24, 1048])
Epoch [3/3], Step [2496/6471], Loss: 3.0936, Perplexity: 22.0572the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2497/6471], Loss: 2.2639, Perplexity: 9.6202the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2498/6471], Loss: 2.0796, Perplexity: 8.0015the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2499/6471], Loss: 2.4243, Perplexity: 11.2941the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2500/6471], Loss: 2.1206, Perplexity: 8.3364the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2501/6471], Loss: 2.2266, Perplexity: 9.2678the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2502/6471], Loss: 2.2478, Perplexity: 9.4666the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2503/6471], Loss: 2.1426, Perplexity: 8.5217the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [2569/6471], Loss: 2.1079, Perplexity: 8.2306the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2570/6471], Loss: 2.0305, Perplexity: 7.6179the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2571/6471], Loss: 2.0832, Perplexity: 8.0304the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2572/6471], Loss: 1.9078, Perplexity: 6.7386the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2573/6471], Loss: 2.0479, Perplexity: 7.7516the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2574/6471], Loss: 2.0898, Perplexity: 8.0832the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2575/6471], Loss: 2.0718, Perplexity: 7.9391the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2576/6471], Loss: 2.2780, Perplexity: 9.7576the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2577/6471], Loss: 2.3505, Perplexity: 10.4907the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [2644/6471], Loss: 2.0668, Perplexity: 7.8996the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2645/6471], Loss: 1.9163, Perplexity: 6.7957the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2646/6471], Loss: 2.1233, Perplexity: 8.3586the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2647/6471], Loss: 1.9502, Perplexity: 7.0301the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2648/6471], Loss: 2.0533, Perplexity: 7.7937the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2649/6471], Loss: 2.3575, Perplexity: 10.5641the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2650/6471], Loss: 1.9334, Perplexity: 6.9129the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2651/6471], Loss: 2.0765, Perplexity: 7.9769the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2652/6471], Loss: 2.3001, Perplexity: 9.9747the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [2718/6471], Loss: 2.1696, Perplexity: 8.7545the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2719/6471], Loss: 2.2192, Perplexity: 9.2003the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2720/6471], Loss: 2.0926, Perplexity: 8.1058the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2721/6471], Loss: 2.1098, Perplexity: 8.2462the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2722/6471], Loss: 2.2565, Perplexity: 9.5494the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [3/3], Step [2723/6471], Loss: 2.7509, Perplexity: 15.6561the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2724/6471], Loss: 2.1805, Perplexity: 8.8505the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2725/6471], Loss: 2.1354, Perplexity: 8.4607the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2726/6471], Loss: 2.0510, Perplexity: 7.7756the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [2793/6471], Loss: 2.1629, Perplexity: 8.6963the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2794/6471], Loss: 2.1003, Perplexity: 8.1684the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2795/6471], Loss: 2.2021, Perplexity: 9.0438the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2796/6471], Loss: 2.1555, Perplexity: 8.6321the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [2797/6471], Loss: 2.1655, Perplexity: 8.7186the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2798/6471], Loss: 2.2611, Perplexity: 9.5933the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2799/6471], Loss: 2.2070, Perplexity: 9.0880the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2800/6471], Loss: 2.0751, Perplexity: 7.9656
the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2801/6471], Loss: 1.9256, Perplexity: 6.8596the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [2868/6471], Loss: 2.1126, Perplexity: 8.2699the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2869/6471], Loss: 2.1189, Perplexity: 8.3218the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2870/6471], Loss: 1.9407, Perplexity: 6.9638the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2871/6471], Loss: 2.0839, Perplexity: 8.0357the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [2872/6471], Loss: 2.7689, Perplexity: 15.9412the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [2873/6471], Loss: 2.4150, Perplexity: 11.1894the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [2874/6471], Loss: 2.7282, Perplexity: 15.3052the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2875/6471], Loss: 2.0176, Perplexity: 7.5203the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [2876/6471], Loss: 2.0137, Perplexity: 7.4909the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [2942/6471], Loss: 2.4410, Perplexity: 11.4848the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [3/3], Step [2943/6471], Loss: 3.0007, Perplexity: 20.1000the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2944/6471], Loss: 2.1914, Perplexity: 8.9475the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2945/6471], Loss: 2.0240, Perplexity: 7.5683the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2946/6471], Loss: 2.3597, Perplexity: 10.5877the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [3/3], Step [2947/6471], Loss: 2.9626, Perplexity: 19.3490the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [2948/6471], Loss: 2.2833, Perplexity: 9.8086the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [2949/6471], Loss: 2.2577, Perplexity: 9.5608the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [2950/6471], Loss: 2.0447, Perplexity: 7.7265the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], St

Epoch [3/3], Step [3016/6471], Loss: 2.1221, Perplexity: 8.3490the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3017/6471], Loss: 2.2175, Perplexity: 9.1844the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3018/6471], Loss: 2.1071, Perplexity: 8.2246the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3019/6471], Loss: 2.2726, Perplexity: 9.7045the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3020/6471], Loss: 2.1393, Perplexity: 8.4934the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3021/6471], Loss: 2.0652, Perplexity: 7.8871the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3022/6471], Loss: 2.0896, Perplexity: 8.0819the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3023/6471], Loss: 1.9550, Perplexity: 7.0640the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3024/6471], Loss: 2.2423, Perplexity: 9.4149the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [3091/6471], Loss: 2.4136, Perplexity: 11.1741the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3092/6471], Loss: 1.9925, Perplexity: 7.3335the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3093/6471], Loss: 2.2747, Perplexity: 9.7255the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [3094/6471], Loss: 2.5162, Perplexity: 12.3820the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3095/6471], Loss: 2.2732, Perplexity: 9.7100the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [3/3], Step [3096/6471], Loss: 3.0861, Perplexity: 21.8923the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3097/6471], Loss: 1.9737, Perplexity: 7.1969the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3098/6471], Loss: 1.9171, Perplexity: 6.8012the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3099/6471], Loss: 2.3132, Perplexity: 10.1066the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [3166/6471], Loss: 2.0342, Perplexity: 7.6462the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3167/6471], Loss: 2.2457, Perplexity: 9.4468the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3168/6471], Loss: 2.0208, Perplexity: 7.5444the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3169/6471], Loss: 2.3315, Perplexity: 10.2933the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3170/6471], Loss: 2.0559, Perplexity: 7.8138the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3171/6471], Loss: 1.9605, Perplexity: 7.1029the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3172/6471], Loss: 2.2926, Perplexity: 9.9006the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3173/6471], Loss: 2.4189, Perplexity: 11.2340the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3174/6471], Loss: 2.2869, Perplexity: 9.8448the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [3/3], Step [3240/6471], Loss: 2.0488, Perplexity: 7.7590the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3241/6471], Loss: 2.3970, Perplexity: 10.9896the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3242/6471], Loss: 2.1216, Perplexity: 8.3448the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3243/6471], Loss: 2.0498, Perplexity: 7.7667the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3244/6471], Loss: 1.9865, Perplexity: 7.2897the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3245/6471], Loss: 2.2574, Perplexity: 9.5585the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3246/6471], Loss: 2.0880, Perplexity: 8.0690the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3247/6471], Loss: 1.9557, Perplexity: 7.0688the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3248/6471], Loss: 2.2084, Perplexity: 9.1014the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [3315/6471], Loss: 2.4220, Perplexity: 11.2689the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3316/6471], Loss: 2.1176, Perplexity: 8.3110the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3317/6471], Loss: 1.9499, Perplexity: 7.0282the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3318/6471], Loss: 2.2434, Perplexity: 9.4256the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3319/6471], Loss: 2.3386, Perplexity: 10.3665the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3320/6471], Loss: 2.0839, Perplexity: 8.0357the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3321/6471], Loss: 2.0423, Perplexity: 7.7081the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3322/6471], Loss: 2.0924, Perplexity: 8.1041the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3323/6471], Loss: 2.0028, Perplexity: 7.4097the hiddens.shape: torch.Size([64, 17, 1048])

Epoch [3/3], Step [3389/6471], Loss: 2.3406, Perplexity: 10.3870the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3390/6471], Loss: 2.2462, Perplexity: 9.4522the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3391/6471], Loss: 2.2209, Perplexity: 9.2153the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3392/6471], Loss: 1.9867, Perplexity: 7.2917the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3393/6471], Loss: 2.0729, Perplexity: 7.9475the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3394/6471], Loss: 2.1921, Perplexity: 8.9537the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3395/6471], Loss: 2.4139, Perplexity: 11.1779the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3396/6471], Loss: 1.9344, Perplexity: 6.9201the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3397/6471], Loss: 2.2023, Perplexity: 9.0456the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [3464/6471], Loss: 2.1171, Perplexity: 8.3066the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3465/6471], Loss: 2.3930, Perplexity: 10.9459the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3466/6471], Loss: 1.9769, Perplexity: 7.2205the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3467/6471], Loss: 1.9894, Perplexity: 7.3109the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3468/6471], Loss: 2.0978, Perplexity: 8.1482the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3469/6471], Loss: 2.0585, Perplexity: 7.8346the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3470/6471], Loss: 2.1592, Perplexity: 8.6639the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3471/6471], Loss: 2.1700, Perplexity: 8.7585the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3472/6471], Loss: 2.0956, Perplexity: 8.1301the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [3539/6471], Loss: 1.8842, Perplexity: 6.5808the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3540/6471], Loss: 2.0175, Perplexity: 7.5191the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [3/3], Step [3541/6471], Loss: 2.8009, Perplexity: 16.4598the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3542/6471], Loss: 2.2632, Perplexity: 9.6142the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3543/6471], Loss: 2.4319, Perplexity: 11.3805the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3544/6471], Loss: 2.1244, Perplexity: 8.3680the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3545/6471], Loss: 2.0742, Perplexity: 7.9582the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3546/6471], Loss: 2.0358, Perplexity: 7.6581the hiddens.shape: torch.Size([64, 26, 1048])
Epoch [3/3], Step [3547/6471], Loss: 3.0298, Perplexity: 20.6936the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [3/3], Step [3614/6471], Loss: 2.1354, Perplexity: 8.4604the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3615/6471], Loss: 2.0961, Perplexity: 8.1341the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3616/6471], Loss: 2.0356, Perplexity: 7.6571the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3617/6471], Loss: 2.2420, Perplexity: 9.4126the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3618/6471], Loss: 2.1048, Perplexity: 8.2052the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3619/6471], Loss: 2.0071, Perplexity: 7.4417the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3620/6471], Loss: 2.2631, Perplexity: 9.6127the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3621/6471], Loss: 2.1275, Perplexity: 8.3937the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3622/6471], Loss: 2.1530, Perplexity: 8.6107the hiddens.shape: torch.Size([64, 11, 1048])

Epoch [3/3], Step [3689/6471], Loss: 2.2867, Perplexity: 9.8425the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3690/6471], Loss: 2.1947, Perplexity: 8.9777the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3691/6471], Loss: 2.0908, Perplexity: 8.0916the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3692/6471], Loss: 2.1403, Perplexity: 8.5020the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [3693/6471], Loss: 2.3177, Perplexity: 10.1521the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3694/6471], Loss: 2.2895, Perplexity: 9.8702the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3695/6471], Loss: 2.0011, Perplexity: 7.3974the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3696/6471], Loss: 1.9738, Perplexity: 7.1982the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3697/6471], Loss: 2.2168, Perplexity: 9.1777the hiddens.shape: torch.Size([64, 15, 1048])

Epoch [3/3], Step [3764/6471], Loss: 2.0110, Perplexity: 7.4708the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [3765/6471], Loss: 2.7722, Perplexity: 15.9931the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3766/6471], Loss: 2.1336, Perplexity: 8.4454the hiddens.shape: torch.Size([64, 24, 1048])
Epoch [3/3], Step [3767/6471], Loss: 3.0895, Perplexity: 21.9653the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3768/6471], Loss: 2.2622, Perplexity: 9.6045the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3769/6471], Loss: 2.4172, Perplexity: 11.2139the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3770/6471], Loss: 2.2760, Perplexity: 9.7380the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3771/6471], Loss: 2.2934, Perplexity: 9.9087the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3772/6471], Loss: 2.1242, Perplexity: 8.3661the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [3839/6471], Loss: 2.2490, Perplexity: 9.4780the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3840/6471], Loss: 2.3505, Perplexity: 10.4908the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3841/6471], Loss: 2.2692, Perplexity: 9.6718the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3842/6471], Loss: 2.0676, Perplexity: 7.9057the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3843/6471], Loss: 1.9866, Perplexity: 7.2904the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3844/6471], Loss: 2.1590, Perplexity: 8.6624the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3845/6471], Loss: 1.8517, Perplexity: 6.3708the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3846/6471], Loss: 2.2191, Perplexity: 9.1991the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [3847/6471], Loss: 2.4330, Perplexity: 11.3933the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [3914/6471], Loss: 2.1892, Perplexity: 8.9283the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3915/6471], Loss: 2.3551, Perplexity: 10.5397the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3916/6471], Loss: 2.0955, Perplexity: 8.1294the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3917/6471], Loss: 2.1868, Perplexity: 8.9067the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3918/6471], Loss: 2.1739, Perplexity: 8.7921the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3919/6471], Loss: 1.8860, Perplexity: 6.5932the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3920/6471], Loss: 1.9418, Perplexity: 6.9714the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3921/6471], Loss: 2.2203, Perplexity: 9.2102the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [3922/6471], Loss: 2.1933, Perplexity: 8.9644the hiddens.shape: torch.Size([64, 19, 1048])

Epoch [3/3], Step [3989/6471], Loss: 2.1904, Perplexity: 8.9386the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3990/6471], Loss: 2.3039, Perplexity: 10.0129the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3991/6471], Loss: 2.1001, Perplexity: 8.1671the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [3992/6471], Loss: 2.0556, Perplexity: 7.8113the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [3993/6471], Loss: 2.3078, Perplexity: 10.0523the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [3994/6471], Loss: 2.1712, Perplexity: 8.7684the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [3995/6471], Loss: 2.1478, Perplexity: 8.5660the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [3996/6471], Loss: 2.3923, Perplexity: 10.9383the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [3997/6471], Loss: 2.5097, Perplexity: 12.3008the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [4064/6471], Loss: 2.1584, Perplexity: 8.6574the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4065/6471], Loss: 2.0906, Perplexity: 8.0899the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4066/6471], Loss: 2.5521, Perplexity: 12.8338the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4067/6471], Loss: 2.1673, Perplexity: 8.7343the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4068/6471], Loss: 2.0856, Perplexity: 8.0497the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4069/6471], Loss: 2.0802, Perplexity: 8.0058the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4070/6471], Loss: 2.5278, Perplexity: 12.5260the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4071/6471], Loss: 1.9594, Perplexity: 7.0949the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [4072/6471], Loss: 2.3387, Perplexity: 10.3682the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [4139/6471], Loss: 2.3639, Perplexity: 10.6324the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4140/6471], Loss: 1.9927, Perplexity: 7.3357the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4141/6471], Loss: 2.0709, Perplexity: 7.9322the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4142/6471], Loss: 1.9323, Perplexity: 6.9056the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [4143/6471], Loss: 2.6780, Perplexity: 14.5564the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4144/6471], Loss: 2.1664, Perplexity: 8.7266the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4145/6471], Loss: 2.1895, Perplexity: 8.9308the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4146/6471], Loss: 1.9640, Perplexity: 7.1280the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4147/6471], Loss: 2.0586, Perplexity: 7.8353the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [4214/6471], Loss: 1.9902, Perplexity: 7.3172the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4215/6471], Loss: 1.9444, Perplexity: 6.9892the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4216/6471], Loss: 2.0756, Perplexity: 7.9697the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4217/6471], Loss: 2.4498, Perplexity: 11.5858the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4218/6471], Loss: 2.1377, Perplexity: 8.4797the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4219/6471], Loss: 2.0872, Perplexity: 8.0627the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4220/6471], Loss: 2.1568, Perplexity: 8.6433the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4221/6471], Loss: 2.1323, Perplexity: 8.4341the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4222/6471], Loss: 2.0846, Perplexity: 8.0410the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [4289/6471], Loss: 2.3679, Perplexity: 10.6748the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4290/6471], Loss: 2.0746, Perplexity: 7.9613the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4291/6471], Loss: 2.0910, Perplexity: 8.0928the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4292/6471], Loss: 2.0813, Perplexity: 8.0151the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4293/6471], Loss: 2.0271, Perplexity: 7.5920the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [4294/6471], Loss: 2.3043, Perplexity: 10.0176the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4295/6471], Loss: 2.1482, Perplexity: 8.5697the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [4296/6471], Loss: 2.6320, Perplexity: 13.9019the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4297/6471], Loss: 2.1195, Perplexity: 8.3269the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [4364/6471], Loss: 2.2194, Perplexity: 9.2017the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4365/6471], Loss: 1.9219, Perplexity: 6.8336the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4366/6471], Loss: 2.0660, Perplexity: 7.8935the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4367/6471], Loss: 1.9809, Perplexity: 7.2491the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4368/6471], Loss: 2.1008, Perplexity: 8.1726the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4369/6471], Loss: 1.9432, Perplexity: 6.9811the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4370/6471], Loss: 1.8864, Perplexity: 6.5953the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4371/6471], Loss: 2.1970, Perplexity: 8.9977the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4372/6471], Loss: 2.2629, Perplexity: 9.6105the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [4438/6471], Loss: 2.1650, Perplexity: 8.7150the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4439/6471], Loss: 1.9292, Perplexity: 6.8843the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4440/6471], Loss: 2.0008, Perplexity: 7.3947the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4441/6471], Loss: 2.2099, Perplexity: 9.1146the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4442/6471], Loss: 2.2189, Perplexity: 9.1976the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4443/6471], Loss: 2.0700, Perplexity: 7.9246the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4444/6471], Loss: 2.1201, Perplexity: 8.3324the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4445/6471], Loss: 2.3158, Perplexity: 10.1325the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [4446/6471], Loss: 2.4478, Perplexity: 11.5634the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [3/3], Step [4512/6471], Loss: 1.9823, Perplexity: 7.2591the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4513/6471], Loss: 2.0884, Perplexity: 8.0716the hiddens.shape: torch.Size([64, 23, 1048])
Epoch [3/3], Step [4514/6471], Loss: 2.9773, Perplexity: 19.6338the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4515/6471], Loss: 1.8556, Perplexity: 6.3955the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4516/6471], Loss: 2.1138, Perplexity: 8.2793the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4517/6471], Loss: 2.1245, Perplexity: 8.3689the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4518/6471], Loss: 2.1115, Perplexity: 8.2604the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4519/6471], Loss: 2.3851, Perplexity: 10.8598the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4520/6471], Loss: 2.0443, Perplexity: 7.7235the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [4587/6471], Loss: 2.2050, Perplexity: 9.0707the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4588/6471], Loss: 2.5042, Perplexity: 12.2339the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4589/6471], Loss: 2.0406, Perplexity: 7.6952the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [4590/6471], Loss: 2.3505, Perplexity: 10.4911the hiddens.shape: torch.Size([64, 24, 1048])
Epoch [3/3], Step [4591/6471], Loss: 2.8937, Perplexity: 18.0598the hiddens.shape: torch.Size([64, 19, 1048])
Epoch [3/3], Step [4592/6471], Loss: 2.5839, Perplexity: 13.2488the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4593/6471], Loss: 2.4346, Perplexity: 11.4111the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4594/6471], Loss: 2.4380, Perplexity: 11.4500the hiddens.shape: torch.Size([64, 10, 1048])
Epoch [3/3], Step [4595/6471], Loss: 2.2641, Perplexity: 9.6226the hiddens.shape: torch.Size([64, 14, 1048])

Epoch [3/3], Step [4662/6471], Loss: 2.2810, Perplexity: 9.7863the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4663/6471], Loss: 2.6330, Perplexity: 13.9158the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4664/6471], Loss: 2.0449, Perplexity: 7.7284the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4665/6471], Loss: 2.3292, Perplexity: 10.2701the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4666/6471], Loss: 1.9461, Perplexity: 7.0016the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4667/6471], Loss: 1.8903, Perplexity: 6.6216the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4668/6471], Loss: 2.1137, Perplexity: 8.2791the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4669/6471], Loss: 2.3457, Perplexity: 10.4406the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4670/6471], Loss: 2.2414, Perplexity: 9.4062the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [3/3], Step [4737/6471], Loss: 2.3415, Perplexity: 10.3967the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4738/6471], Loss: 2.0714, Perplexity: 7.9357the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4739/6471], Loss: 2.0539, Perplexity: 7.7985the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4740/6471], Loss: 2.1133, Perplexity: 8.2758the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4741/6471], Loss: 1.8141, Perplexity: 6.1358the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4742/6471], Loss: 1.9906, Perplexity: 7.3200the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4743/6471], Loss: 1.8129, Perplexity: 6.1284the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4744/6471], Loss: 2.0984, Perplexity: 8.1530the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4745/6471], Loss: 2.2112, Perplexity: 9.1269the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [3/3], Step [4812/6471], Loss: 1.9550, Perplexity: 7.0639the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4813/6471], Loss: 2.4015, Perplexity: 11.0398the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4814/6471], Loss: 2.2425, Perplexity: 9.4168the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4815/6471], Loss: 2.0251, Perplexity: 7.5767the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4816/6471], Loss: 1.9263, Perplexity: 6.8644the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [4817/6471], Loss: 2.4083, Perplexity: 11.1148the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4818/6471], Loss: 2.0832, Perplexity: 8.0299the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [4819/6471], Loss: 2.0544, Perplexity: 7.8025the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4820/6471], Loss: 2.0534, Perplexity: 7.7945the hiddens.shape: torch.Size([64, 20, 1048])

Epoch [3/3], Step [4887/6471], Loss: 1.9557, Perplexity: 7.0691the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [4888/6471], Loss: 2.3291, Perplexity: 10.2686the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4889/6471], Loss: 2.2533, Perplexity: 9.5192the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4890/6471], Loss: 2.1630, Perplexity: 8.6969the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4891/6471], Loss: 2.2290, Perplexity: 9.2901the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [4892/6471], Loss: 2.1021, Perplexity: 8.1837the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4893/6471], Loss: 1.9384, Perplexity: 6.9478the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4894/6471], Loss: 2.3122, Perplexity: 10.0963the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4895/6471], Loss: 2.1306, Perplexity: 8.4197the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [3/3], Step [4962/6471], Loss: 2.0628, Perplexity: 7.8677the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [4963/6471], Loss: 2.0921, Perplexity: 8.1019the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4964/6471], Loss: 1.9590, Perplexity: 7.0923the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4965/6471], Loss: 2.2219, Perplexity: 9.2248the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [4966/6471], Loss: 2.4962, Perplexity: 12.1364the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [4967/6471], Loss: 2.1652, Perplexity: 8.7163the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [4968/6471], Loss: 2.3712, Perplexity: 10.7100the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [4969/6471], Loss: 2.1665, Perplexity: 8.7280the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [4970/6471], Loss: 2.3996, Perplexity: 11.0187the hiddens.shape: torch.Size([64, 12, 1048])

Epoch [3/3], Step [5036/6471], Loss: 2.1863, Perplexity: 8.9021the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5037/6471], Loss: 2.0025, Perplexity: 7.4078the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5038/6471], Loss: 2.4477, Perplexity: 11.5612the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5039/6471], Loss: 2.4334, Perplexity: 11.3980the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5040/6471], Loss: 2.2354, Perplexity: 9.3504the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5041/6471], Loss: 2.2173, Perplexity: 9.1828the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5042/6471], Loss: 2.0629, Perplexity: 7.8684the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5043/6471], Loss: 2.1159, Perplexity: 8.2971the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5044/6471], Loss: 2.1250, Perplexity: 8.3731the hiddens.shape: torch.Size([64, 18, 1048])

Epoch [3/3], Step [5111/6471], Loss: 2.1479, Perplexity: 8.5672the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5112/6471], Loss: 2.1971, Perplexity: 8.9987the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [5113/6471], Loss: 2.2676, Perplexity: 9.6565the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5114/6471], Loss: 2.0689, Perplexity: 7.9161the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5115/6471], Loss: 2.1210, Perplexity: 8.3395the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5116/6471], Loss: 2.4287, Perplexity: 11.3447the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5117/6471], Loss: 2.0824, Perplexity: 8.0241the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5118/6471], Loss: 2.1703, Perplexity: 8.7610the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5119/6471], Loss: 1.9937, Perplexity: 7.3429the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [3/3], Step [5186/6471], Loss: 2.0663, Perplexity: 7.8957the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5187/6471], Loss: 2.0807, Perplexity: 8.0103the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5188/6471], Loss: 2.0037, Perplexity: 7.4165the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5189/6471], Loss: 2.0408, Perplexity: 7.6965the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5190/6471], Loss: 2.3097, Perplexity: 10.0716the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [5191/6471], Loss: 2.4155, Perplexity: 11.1950the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5192/6471], Loss: 2.1807, Perplexity: 8.8528the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5193/6471], Loss: 2.0755, Perplexity: 7.9688the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5194/6471], Loss: 2.2794, Perplexity: 9.7707the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [3/3], Step [5261/6471], Loss: 2.0501, Perplexity: 7.7685the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5262/6471], Loss: 2.1795, Perplexity: 8.8419the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5263/6471], Loss: 2.1374, Perplexity: 8.4777the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5264/6471], Loss: 2.0100, Perplexity: 7.4636the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5265/6471], Loss: 2.3620, Perplexity: 10.6119the hiddens.shape: torch.Size([64, 22, 1048])
Epoch [3/3], Step [5266/6471], Loss: 2.9496, Perplexity: 19.0979the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5267/6471], Loss: 2.2458, Perplexity: 9.4478the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5268/6471], Loss: 2.5185, Perplexity: 12.4096the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5269/6471], Loss: 2.0238, Perplexity: 7.5670the hiddens.shape: torch.Size([64, 16, 1048])

Epoch [3/3], Step [5336/6471], Loss: 2.1557, Perplexity: 8.6342the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5337/6471], Loss: 2.0165, Perplexity: 7.5122the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5338/6471], Loss: 2.1277, Perplexity: 8.3956the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5339/6471], Loss: 2.0383, Perplexity: 7.6775the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5340/6471], Loss: 2.0295, Perplexity: 7.6101the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5341/6471], Loss: 2.3860, Perplexity: 10.8694the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5342/6471], Loss: 2.1004, Perplexity: 8.1696the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5343/6471], Loss: 2.0228, Perplexity: 7.5595the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5344/6471], Loss: 2.1277, Perplexity: 8.3954the hiddens.shape: torch.Size([64, 13, 1048])

Epoch [3/3], Step [5411/6471], Loss: 2.4491, Perplexity: 11.5781the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5412/6471], Loss: 2.0209, Perplexity: 7.5448the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5413/6471], Loss: 2.3567, Perplexity: 10.5564the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5414/6471], Loss: 2.0310, Perplexity: 7.6218the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [5415/6471], Loss: 2.3657, Perplexity: 10.6518the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5416/6471], Loss: 1.9986, Perplexity: 7.3786the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5417/6471], Loss: 1.9780, Perplexity: 7.2283the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5418/6471], Loss: 2.0716, Perplexity: 7.9379the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5419/6471], Loss: 2.2851, Perplexity: 9.8268the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [5486/6471], Loss: 2.2512, Perplexity: 9.4988the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5487/6471], Loss: 2.2033, Perplexity: 9.0544the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5488/6471], Loss: 1.9373, Perplexity: 6.9398the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5489/6471], Loss: 2.0195, Perplexity: 7.5349the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5490/6471], Loss: 2.0756, Perplexity: 7.9692the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5491/6471], Loss: 2.0972, Perplexity: 8.1430the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5492/6471], Loss: 1.9357, Perplexity: 6.9287the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5493/6471], Loss: 2.0366, Perplexity: 7.6647the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [5494/6471], Loss: 2.3032, Perplexity: 10.0061the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [5561/6471], Loss: 2.2775, Perplexity: 9.7522the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5562/6471], Loss: 2.0609, Perplexity: 7.8527the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5563/6471], Loss: 2.0585, Perplexity: 7.8341the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5564/6471], Loss: 1.9425, Perplexity: 6.9764the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5565/6471], Loss: 1.9359, Perplexity: 6.9306the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5566/6471], Loss: 1.9981, Perplexity: 7.3749the hiddens.shape: torch.Size([64, 31, 1048])
Epoch [3/3], Step [5567/6471], Loss: 3.5623, Perplexity: 35.2433the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5568/6471], Loss: 2.2986, Perplexity: 9.9602the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5569/6471], Loss: 2.3473, Perplexity: 10.4575the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [5636/6471], Loss: 1.8565, Perplexity: 6.4014the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5637/6471], Loss: 2.0480, Perplexity: 7.7524the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5638/6471], Loss: 2.2892, Perplexity: 9.8675the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5639/6471], Loss: 2.2294, Perplexity: 9.2943the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5640/6471], Loss: 2.3727, Perplexity: 10.7265the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5641/6471], Loss: 1.9976, Perplexity: 7.3714the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5642/6471], Loss: 2.2094, Perplexity: 9.1106the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5643/6471], Loss: 2.3674, Perplexity: 10.6695the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5644/6471], Loss: 2.0080, Perplexity: 7.4485the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [5711/6471], Loss: 2.0785, Perplexity: 7.9928the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5712/6471], Loss: 2.0352, Perplexity: 7.6539the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5713/6471], Loss: 1.9798, Perplexity: 7.2415the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5714/6471], Loss: 2.0688, Perplexity: 7.9157the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5715/6471], Loss: 2.2021, Perplexity: 9.0442the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5716/6471], Loss: 2.1931, Perplexity: 8.9630the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5717/6471], Loss: 2.0236, Perplexity: 7.5659the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5718/6471], Loss: 2.1012, Perplexity: 8.1759the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5719/6471], Loss: 2.0606, Perplexity: 7.8508the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [

Epoch [3/3], Step [5786/6471], Loss: 2.1846, Perplexity: 8.8867the hiddens.shape: torch.Size([64, 25, 1048])
Epoch [3/3], Step [5787/6471], Loss: 3.0105, Perplexity: 20.2974the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5788/6471], Loss: 2.3248, Perplexity: 10.2251the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5789/6471], Loss: 2.1914, Perplexity: 8.9482the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5790/6471], Loss: 2.2553, Perplexity: 9.5382the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5791/6471], Loss: 2.1823, Perplexity: 8.8665the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5792/6471], Loss: 1.8915, Perplexity: 6.6290the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5793/6471], Loss: 1.9642, Perplexity: 7.1290the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [5794/6471], Loss: 2.3816, Perplexity: 10.8226the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [5861/6471], Loss: 1.9011, Perplexity: 6.6935the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5862/6471], Loss: 1.9591, Perplexity: 7.0927the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [5863/6471], Loss: 2.2131, Perplexity: 9.1444the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5864/6471], Loss: 2.2727, Perplexity: 9.7052the hiddens.shape: torch.Size([64, 29, 1048])
Epoch [3/3], Step [5865/6471], Loss: 3.5646, Perplexity: 35.3238the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5866/6471], Loss: 2.1154, Perplexity: 8.2930the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5867/6471], Loss: 1.9884, Perplexity: 7.3037the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5868/6471], Loss: 2.0183, Perplexity: 7.5253the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5869/6471], Loss: 2.2714, Perplexity: 9.6932the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [5936/6471], Loss: 2.1557, Perplexity: 8.6338the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5937/6471], Loss: 2.1004, Perplexity: 8.1696the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [5938/6471], Loss: 2.1214, Perplexity: 8.3428the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [5939/6471], Loss: 2.2446, Perplexity: 9.4370the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5940/6471], Loss: 2.0891, Perplexity: 8.0773the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5941/6471], Loss: 2.3898, Perplexity: 10.9117the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [5942/6471], Loss: 2.1631, Perplexity: 8.6979the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [5943/6471], Loss: 2.1429, Perplexity: 8.5240the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [5944/6471], Loss: 2.0478, Perplexity: 7.7507the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [6011/6471], Loss: 1.9510, Perplexity: 7.0355the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6012/6471], Loss: 2.4483, Perplexity: 11.5682the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6013/6471], Loss: 2.0290, Perplexity: 7.6068the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [6014/6471], Loss: 2.3471, Perplexity: 10.4553the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6015/6471], Loss: 2.1973, Perplexity: 9.0003the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6016/6471], Loss: 1.9461, Perplexity: 7.0014the hiddens.shape: torch.Size([64, 18, 1048])
Epoch [3/3], Step [6017/6471], Loss: 2.4775, Perplexity: 11.9110the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [6018/6471], Loss: 2.0843, Perplexity: 8.0388the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6019/6471], Loss: 1.9528, Perplexity: 7.0487the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Ste

Epoch [3/3], Step [6086/6471], Loss: 2.5421, Perplexity: 12.7065the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6087/6471], Loss: 2.1512, Perplexity: 8.5949the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6088/6471], Loss: 2.1011, Perplexity: 8.1751the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6089/6471], Loss: 1.9923, Perplexity: 7.3322the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6090/6471], Loss: 2.1022, Perplexity: 8.1839the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6091/6471], Loss: 1.9449, Perplexity: 6.9927the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6092/6471], Loss: 2.1545, Perplexity: 8.6238the hiddens.shape: torch.Size([64, 21, 1048])
Epoch [3/3], Step [6093/6471], Loss: 2.7775, Perplexity: 16.0795the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6094/6471], Loss: 2.0936, Perplexity: 8.1142the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [6161/6471], Loss: 2.0202, Perplexity: 7.5400the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6162/6471], Loss: 2.1203, Perplexity: 8.3333the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6163/6471], Loss: 2.1871, Perplexity: 8.9093the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6164/6471], Loss: 2.1016, Perplexity: 8.1792the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [6165/6471], Loss: 2.3213, Perplexity: 10.1893the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [6166/6471], Loss: 2.2235, Perplexity: 9.2394the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6167/6471], Loss: 2.2713, Perplexity: 9.6921the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [6168/6471], Loss: 2.1543, Perplexity: 8.6218the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6169/6471], Loss: 2.0674, Perplexity: 7.9040the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step 

Epoch [3/3], Step [6236/6471], Loss: 2.1480, Perplexity: 8.5681the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6237/6471], Loss: 2.1318, Perplexity: 8.4304the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [6238/6471], Loss: 2.0921, Perplexity: 8.1016the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6239/6471], Loss: 1.9944, Perplexity: 7.3475the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [6240/6471], Loss: 2.2334, Perplexity: 9.3319the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6241/6471], Loss: 2.2443, Perplexity: 9.4339the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6242/6471], Loss: 2.0869, Perplexity: 8.0599the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6243/6471], Loss: 1.9504, Perplexity: 7.0316the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6244/6471], Loss: 1.9486, Perplexity: 7.0186the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [

Epoch [3/3], Step [6311/6471], Loss: 1.8359, Perplexity: 6.2708the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6312/6471], Loss: 2.1156, Perplexity: 8.2946the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [6313/6471], Loss: 2.3705, Perplexity: 10.7023the hiddens.shape: torch.Size([64, 20, 1048])
Epoch [3/3], Step [6314/6471], Loss: 2.6081, Perplexity: 13.5728the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6315/6471], Loss: 1.9820, Perplexity: 7.2571the hiddens.shape: torch.Size([64, 15, 1048])
Epoch [3/3], Step [6316/6471], Loss: 2.1363, Perplexity: 8.4684the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6317/6471], Loss: 2.2264, Perplexity: 9.2662the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6318/6471], Loss: 2.1689, Perplexity: 8.7489the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6319/6471], Loss: 1.8974, Perplexity: 6.6685the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [6386/6471], Loss: 2.3351, Perplexity: 10.3306the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6387/6471], Loss: 2.0527, Perplexity: 7.7885the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6388/6471], Loss: 1.9637, Perplexity: 7.1254the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6389/6471], Loss: 2.1255, Perplexity: 8.3775the hiddens.shape: torch.Size([64, 11, 1048])
Epoch [3/3], Step [6390/6471], Loss: 2.4953, Perplexity: 12.1257the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6391/6471], Loss: 2.1672, Perplexity: 8.7336the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6392/6471], Loss: 1.9336, Perplexity: 6.9146the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [6393/6471], Loss: 2.3016, Perplexity: 9.9902the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6394/6471], Loss: 2.1237, Perplexity: 8.3620the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step

Epoch [3/3], Step [6461/6471], Loss: 2.2944, Perplexity: 9.9180the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6462/6471], Loss: 2.0846, Perplexity: 8.0415the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6463/6471], Loss: 1.9218, Perplexity: 6.8334the hiddens.shape: torch.Size([64, 12, 1048])
Epoch [3/3], Step [6464/6471], Loss: 2.0234, Perplexity: 7.5641the hiddens.shape: torch.Size([64, 17, 1048])
Epoch [3/3], Step [6465/6471], Loss: 2.3343, Perplexity: 10.3221the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6466/6471], Loss: 1.9337, Perplexity: 6.9151the hiddens.shape: torch.Size([64, 13, 1048])
Epoch [3/3], Step [6467/6471], Loss: 1.8836, Perplexity: 6.5773the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step [6468/6471], Loss: 1.9484, Perplexity: 7.0176the hiddens.shape: torch.Size([64, 16, 1048])
Epoch [3/3], Step [6469/6471], Loss: 2.1847, Perplexity: 8.8880the hiddens.shape: torch.Size([64, 14, 1048])
Epoch [3/3], Step 