# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your CNN-RNN model.  

You are welcome and encouraged to try out many different architectures and hyperparameters when searching for a good model.

This does have the potential to make the project quite messy!  Before submitting your project, make sure that you clean up:
- the code you write in this notebook.  The notebook should describe how to train a single CNN-RNN architecture, corresponding to your final choice of hyperparameters.  You should structure the notebook so that the reviewer can replicate your results by running the code in this notebook.  
- the output of the code cell in **Step 2**.  It should show the results obtained when training the model from scratch.

This notebook **will be graded**.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, you will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values you set now will be used when training your model in **Step 2** below.

You should only amend blocks of code that are preceded by a `TODO` statement.  **Any code blocks that are not preceded by a `TODO` statement should not be modified**.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to update the model weights in each training step.
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.
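
To make the effect of `batch_size` and `vocab_threshold` concrete, here is a minimal sketch in plain Python. It does not use the project's `data_loader`: the helper names `steps_per_epoch` and `build_vocab` are hypothetical, and the rule of keeping words that occur at least `vocab_threshold` times is an assumption about how the vocabulary is built.

```python
import math
from collections import Counter

def steps_per_epoch(num_captions, batch_size):
    """Number of batches needed to cover every caption once."""
    return math.ceil(num_captions / batch_size)

def build_vocab(captions, vocab_threshold):
    """Keep words occurring at least `vocab_threshold` times (assumed rule)."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    return sorted(word for word, n in counts.items() if n >= vocab_threshold)

# Toy captions, invented for illustration only.
captions = ["a dog runs", "a dog sleeps", "a cat sleeps on a sofa"]
print(build_vocab(captions, vocab_threshold=1))  # every word is kept
print(build_vocab(captions, vocab_threshold=2))  # rare words are dropped
print(steps_per_epoch(414113, 32))
```

With 414113 captions (the count reported while the data loader was built) and a batch size of 32, the last call gives 12942, which is the step count shown in the training output of Step 2.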

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance!  **To avoid spending too long on this notebook**, you are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** Given the complexity of the task and the training time required, I kept the model simple. The pre-trained ResNet CNN was left unchanged apart from removing its last linear layer in order to transform the output. The RNN decoder is a simple LSTM with only the basic inputs.


### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.
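
As a rough illustration of the first point, the sketch below computes the image size after a `Resize(256)`-style step and checks that a 224x224 crop fits. It is plain Python rather than torchvision, `resize_shorter_edge` and `crop_fits` are hypothetical helpers, and the rounding is an assumption that may differ by a pixel from what torchvision does.

```python
def resize_shorter_edge(height, width, target=256):
    """Scale so the shorter edge equals `target`, keeping the aspect ratio."""
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)

def crop_fits(height, width, crop=224):
    """A crop x crop window fits only if both edges are at least `crop`."""
    return height >= crop and width >= crop

# Example (height, width) pairs, chosen for illustration only.
for h, w in [(480, 640), (640, 427), (333, 500)]:
    new_h, new_w = resize_shorter_edge(h, w)
    print((h, w), "->", (new_h, new_w), "crop fits:", crop_fits(new_h, new_w))
```

Whatever the original aspect ratio, the shorter edge ends up at 256 pixels, so a 224x224 random crop is always available and every image in a batch has the same shape.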

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:** A resolution of 224x224 is suitable for the task, so `transform_train` was left as provided, and a similar transform was used for `transform_test` as well.

### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```
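
A minimal sketch of this idea on a toy model, assuming PyTorch is installed. `TinyEncoder` is a hypothetical stand-in for `EncoderCNN` (its `backbone` layer plays the role of the frozen pre-trained CNN and `embed` the trainable embedding layer), and a single linear layer stands in for `DecoderRNN`. It is not the project's actual model.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in: a frozen backbone followed by a trainable embed layer."""
    def __init__(self, in_features=8, feat_size=4, embed_size=3):
        super().__init__()
        self.backbone = nn.Linear(in_features, feat_size)
        self.embed = nn.Linear(feat_size, embed_size)
        # Freeze the backbone so no gradients are computed for it.
        for p in self.backbone.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        return self.embed(self.backbone(x))

encoder = TinyEncoder()
decoder = nn.Linear(3, 5)  # stand-in for DecoderRNN

# Only these parameters are handed to the optimizer.
params = list(decoder.parameters()) + list(encoder.embed.parameters())
optimizer = torch.optim.Adam(params, lr=0.001)

backbone_before = encoder.backbone.weight.detach().clone()
embed_before = encoder.embed.weight.detach().clone()

# One training step on random data with a dummy loss.
loss = decoder(encoder(torch.randn(2, 8))).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("backbone unchanged:", torch.equal(backbone_before, encoder.backbone.weight))
print("embed changed:", not torch.equal(embed_before, encoder.embed.weight))
```

Leaving a parameter out of `params` is enough to keep the optimizer from updating it. Setting `requires_grad` to `False` as well avoids computing gradients that would never be used.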

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** I researched what other people have done for this type of problem and chose the parameters so that training converges faster at some cost in accuracy (a slightly higher learning rate, for example). The hidden size was gradually increased from 3 to 512 because increasing it seemed to lower the perplexity and loss.

### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** I chose the Adam optimizer because it works well on most problems.
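
For readers who have not looked inside Adam, the sketch below applies the update rule from the Adam paper to a single scalar parameter in plain Python. It is only an illustration using the commonly cited default coefficients (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`). `adam_step` is a hypothetical helper, not `torch.optim.Adam`.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x**2 starting from x = 1.0 (the gradient is 2x).
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
print(abs(x) < 0.05)
```

Because the step is the ratio of the two running averages, the very first update has a size close to `lr` whatever the scale of the gradient, which is one reason Adam tends to need little tuning.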

In [10]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math
import nltk
nltk.download('punkt')


## TODO #1: Select appropriate values for the Python variables below.
batch_size = 32           # batch size
vocab_threshold = 4        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 512           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 1             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

# (Optional) TODO #2: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()

# TODO #3: Specify the learnable parameters of the model.
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
          
# TODO #4: Define the optimizer.
optimizer = torch.optim.Adam(params, lr=0.001)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.89s)
creating index...
index created!
Obtaining caption lengths...

<a id='step2'></a>
## Step 2: Train your Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
# Load pre-trained weights before resuming training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))
```
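
If several epochs have been saved, a small helper can pick the most recent pair of files. The sketch below is plain Python and only parses file names that follow the `encoder-i.pkl` / `decoder-i.pkl` pattern described in Step 1. `latest_checkpoint` is a hypothetical helper, and in the notebook the list of names would come from something like `os.listdir('./models')`.

```python
import re

def latest_checkpoint(filenames, prefix):
    """Return the '<prefix>-<epoch>.pkl' name with the highest epoch, or None."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+)\.pkl$")
    best_epoch, best_name = -1, None
    for name in filenames:
        match = pattern.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return best_name

# Example directory listing, invented for illustration.
files = ["encoder-1.pkl", "decoder-1.pkl", "encoder-2.pkl", "decoder-2.pkl",
         "encoder-10.pkl", "decoder-10.pkl", "notes.txt"]
encoder_file = latest_checkpoint(files, "encoder")
decoder_file = latest_checkpoint(files, "decoder")
print(encoder_file, decoder_file)
```

The epoch is compared as an integer, so epoch 10 is correctly treated as later than epoch 2, which a plain string sort would get wrong.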

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).
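
One lightweight way to keep such notes is to write the settings of each run to a small JSON file next to the training log. The sketch below is plain Python. The file name `run_settings.json` and the helper `save_run_settings` are hypothetical, and the values shown are the ones set in Step 1 of this notebook.

```python
import json
import os
import tempfile

def save_run_settings(path, settings):
    """Write the hyperparameters of a training run to a JSON file."""
    with open(path, "w") as fh:
        json.dump(settings, fh, indent=2, sort_keys=True)

settings = {
    "batch_size": 32, "vocab_threshold": 4, "embed_size": 512,
    "hidden_size": 512, "num_epochs": 1, "optimizer": "Adam", "lr": 0.001,
}

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_settings.json")
    save_run_settings(path, settings)
    with open(path) as fh:
        restored = json.load(fh)
print(restored == settings)
```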

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  
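
Because the per-batch loss is noisy, it can help to smooth it before judging a run. The sketch below is plain Python: it parses lines in the format written to `log_file` by the training cell, shows that the perplexity is the exponential of the loss, and computes a simple moving average. The helper names `parse_log_line` and `moving_average` are hypothetical, and the three sample lines are copied from the training output in this notebook.

```python
import math
import re

LOG_PATTERN = re.compile(
    r"Epoch \[(\d+)/(\d+)\], Step \[(\d+)/(\d+)\], Loss: ([\d.]+), Perplexity: ([\d.]+)"
)

def parse_log_line(line):
    """Return (epoch, step, loss, perplexity) or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    epoch, _, step, _, loss, perplexity = match.groups()
    return int(epoch), int(step), float(loss), float(perplexity)

def moving_average(values, window):
    """Average each value with up to `window - 1` values before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

lines = [
    "Epoch [1/1], Step [1/12942], Loss: 9.0907, Perplexity: 8872.4974",
    "Epoch [1/1], Step [2/12942], Loss: 9.0149, Perplexity: 8224.7805",
    "Epoch [1/1], Step [3/12942], Loss: 8.8595, Perplexity: 7040.7930",
]
records = [parse_log_line(line) for line in lines]
losses = [r[2] for r in records]
for _, step, loss, perplexity in records:
    # Perplexity is exp(loss); small differences come from rounding the printed loss.
    print(step, round(math.exp(loss), 1), perplexity)
print(moving_average(losses, window=2))
```

In practice a window closer to `print_every` steps gives a curve that is much easier to read than the raw per-batch values.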

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

For this project, you need not worry about overfitting. **This project does not have strict requirements regarding the performance of your model**, and you just need to demonstrate that your model has learned **_something_** when you generate captions on the test data.  For now, we strongly encourage you to train your model for the suggested 3 epochs without worrying about performance; then, you should immediately transition to the next notebook in the sequence (**3_Inference.ipynb**) to see how your model performs on the test data.  If your model needs to be changed, you can come back to this notebook, amend hyperparameters (if necessary), and re-train the model.

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.

In [11]:
import torch.utils.data as data
import numpy as np
import os
import requests
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})

for epoch in range(1, num_epochs+1):
    
    for i_step in range(1, total_step+1):
        
        if time.time() - old_time > 60:
            old_time = time.time()
            requests.request("POST", 
                             "https://nebula.udacity.com/api/v1/remote/keep-alive", 
                             headers={'Authorization': "STAR " + response.text})
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))
        
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            print('\r' + stats)
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

# Close the training log file.
f.close()

Epoch [1/1], Step [1/12942], Loss: 9.0907, Perplexity: 8872.4974
Epoch [1/1], Step [2/12942], Loss: 9.0149, Perplexity: 8224.7805
Epoch [1/1], Step [3/12942], Loss: 8.8595, Perplexity: 7040.7930
Epoch [1/1], Step [4/12942], Loss: 8.4765, Perplexity: 4800.6162
Epoch [1/1], Step [5/12942], Loss: 7.6955, Perplexity: 2198.3780
Epoch [1/1], Step [6/12942], Loss: 6.9648, Perplexity: 1058.6546
Epoch [1/1], Step [7/12942], Loss: 6.2208, Perplexity: 503.1175
Epoch [1/1], Step [8/12942], Loss: 5.9713, Perplexity: 392.0245

Epoch [1/1], Step [71/12942], Loss: 4.7151, Perplexity: 111.6245
Epoch [1/1], Step [72/12942], Loss: 4.3707, Perplexity: 79.1025
Epoch [1/1], Step [73/12942], Loss: 4.3863, Perplexity: 80.3400
Epoch [1/1], Step [74/12942], Loss: 4.3212, Perplexity: 75.2801
Epoch [1/1], Step [75/12942], Loss: 4.6418, Perplexity: 103.7317
Epoch [1/1], Step [76/12942], Loss: 4.6927, Perplexity: 109.1452
Epoch [1/1], Step [77/12942], Loss: 4.7175, Perplexity: 111.8933
Epoch [1/1], Step [78/12942], Loss: 4.9875, Perplexity: 146.5637
Epoch [1/1], Step [79/12942], Loss: 4.4118, Perplexity: 82.4202

Epoch [1/1], Step [141/12942], Loss: 4.4479, Perplexity: 85.4451
Epoch [1/1], Step [142/12942], Loss: 4.1290, Perplexity: 62.1174
Epoch [1/1], Step [143/12942], Loss: 4.3734, Perplexity: 79.3136
Epoch [1/1], Step [144/12942], Loss: 4.2340, Perplexity: 68.9926
Epoch [1/1], Step [145/12942], Loss: 4.1662, Perplexity: 64.4726
Epoch [1/1], Step [146/12942], Loss: 4.5133, Perplexity: 91.2226
Epoch [1/1], Step [147/12942], Loss: 4.1339, Perplexity: 62.4180
Epoch [1/1], Step [148/12942], Loss: 4.1008, Perplexity: 60.3889
Epoch [1/1], Step [149/12942], Loss: 4.2860, Perplexity: 72.6753

Epoch [1/1], Step [211/12942], Loss: 3.7694, Perplexity: 43.3542
Epoch [1/1], Step [212/12942], Loss: 3.8211, Perplexity: 45.6549
Epoch [1/1], Step [213/12942], Loss: 4.0388, Perplexity: 56.7554
Epoch [1/1], Step [214/12942], Loss: 3.8981, Perplexity: 49.3104
Epoch [1/1], Step [215/12942], Loss: 3.7807, Perplexity: 43.8482
Epoch [1/1], Step [216/12942], Loss: 3.8671, Perplexity: 47.8034
Epoch [1/1], Step [217/12942], Loss: 4.4813, Perplexity: 88.3528
Epoch [1/1], Step [218/12942], Loss: 4.0686, Perplexity: 58.4770
Epoch [1/1], Step [219/12942], Loss: 4.0481, Perplexity: 57.2892

Epoch [1/1], Step [281/12942], Loss: 3.9372, Perplexity: 51.2736
Epoch [1/1], Step [282/12942], Loss: 3.9741, Perplexity: 53.2020
Epoch [1/1], Step [283/12942], Loss: 3.6825, Perplexity: 39.7461
Epoch [1/1], Step [284/12942], Loss: 3.7407, Perplexity: 42.1294
Epoch [1/1], Step [285/12942], Loss: 3.7457, Perplexity: 42.3370
Epoch [1/1], Step [286/12942], Loss: 3.7336, Perplexity: 41.8283
Epoch [1/1], Step [287/12942], Loss: 3.3295, Perplexity: 27.9248
Epoch [1/1], Step [288/12942], Loss: 3.6538, Perplexity: 38.6199
Epoch [1/1], Step [289/12942], Loss: 3.4999, Perplexity: 33.1111

Epoch [1/1], Step [351/12942], Loss: 3.8368, Perplexity: 46.3771
Epoch [1/1], Step [352/12942], Loss: 3.3373, Perplexity: 28.1436
Epoch [1/1], Step [353/12942], Loss: 3.4905, Perplexity: 32.8014
Epoch [1/1], Step [354/12942], Loss: 3.4019, Perplexity: 30.0212
Epoch [1/1], Step [355/12942], Loss: 3.6428, Perplexity: 38.2003
Epoch [1/1], Step [356/12942], Loss: 3.6569, Perplexity: 38.7411
Epoch [1/1], Step [357/12942], Loss: 3.7516, Perplexity: 42.5895
Epoch [1/1], Step [358/12942], Loss: 3.6339, Perplexity: 37.8585
Epoch [1/1], Step [359/12942], Loss: 3.4667, Perplexity: 32.0324

Epoch [1/1], Step [421/12942], Loss: 3.4953, Perplexity: 32.9601
Epoch [1/1], Step [422/12942], Loss: 3.4555, Perplexity: 31.6730
Epoch [1/1], Step [423/12942], Loss: 3.4029, Perplexity: 30.0501
Epoch [1/1], Step [424/12942], Loss: 3.5935, Perplexity: 36.3627
Epoch [1/1], Step [425/12942], Loss: 3.2950, Perplexity: 26.9764
Epoch [1/1], Step [426/12942], Loss: 3.6659, Perplexity: 39.0930
Epoch [1/1], Step [427/12942], Loss: 3.7725, Perplexity: 43.4903
Epoch [1/1], Step [428/12942], Loss: 3.2533, Perplexity: 25.8752
Epoch [1/1], Step [429/12942], Loss: 3.6965, Perplexity: 40.3053

Epoch [1/1], Step [491/12942], Loss: 3.7580, Perplexity: 42.8626
Epoch [1/1], Step [492/12942], Loss: 3.6472, Perplexity: 38.3665
Epoch [1/1], Step [493/12942], Loss: 3.5098, Perplexity: 33.4410
Epoch [1/1], Step [494/12942], Loss: 3.4944, Perplexity: 32.9311
Epoch [1/1], Step [495/12942], Loss: 3.1776, Perplexity: 23.9890
Epoch [1/1], Step [496/12942], Loss: 3.3711, Perplexity: 29.1111
Epoch [1/1], Step [497/12942], Loss: 3.3357, Perplexity: 28.0991
Epoch [1/1], Step [498/12942], Loss: 3.0566, Perplexity: 21.2543
Epoch [1/1], Step [499/12942], Loss: 3.3876, Perplexity: 29.5961

Epoch [1/1], Step [561/12942], Loss: 3.3860, Perplexity: 29.5483
Epoch [1/1], Step [562/12942], Loss: 3.3475, Perplexity: 28.4326
Epoch [1/1], Step [563/12942], Loss: 3.4013, Perplexity: 30.0023
Epoch [1/1], Step [564/12942], Loss: 3.5703, Perplexity: 35.5275
Epoch [1/1], Step [565/12942], Loss: 3.5217, Perplexity: 33.8422
Epoch [1/1], Step [566/12942], Loss: 3.4336, Perplexity: 30.9867
Epoch [1/1], Step [567/12942], Loss: 3.2576, Perplexity: 25.9868
Epoch [1/1], Step [568/12942], Loss: 3.3189, Perplexity: 27.6305
Epoch [1/1], Step [569/12942], Loss: 3.6821, Perplexity: 39.7301

Epoch [1/1], Step [631/12942], Loss: 3.6074, Perplexity: 36.8687
Epoch [1/1], Step [632/12942], Loss: 3.0724, Perplexity: 21.5934
Epoch [1/1], Step [633/12942], Loss: 3.8821, Perplexity: 48.5267
Epoch [1/1], Step [634/12942], Loss: 3.2009, Perplexity: 24.5543
Epoch [1/1], Step [635/12942], Loss: 3.3534, Perplexity: 28.5988
Epoch [1/1], Step [636/12942], Loss: 3.3401, Perplexity: 28.2215
Epoch [1/1], Step [637/12942], Loss: 3.6563, Perplexity: 38.7173
Epoch [1/1], Step [638/12942], Loss: 3.2836, Perplexity: 26.6725
Epoch [1/1], Step [639/12942], Loss: 3.4654, Perplexity: 31.9886

Epoch [1/1], Step [701/12942], Loss: 3.3252, Perplexity: 27.8037
Epoch [1/1], Step [702/12942], Loss: 3.2816, Perplexity: 26.6174
Epoch [1/1], Step [703/12942], Loss: 3.7212, Perplexity: 41.3124
Epoch [1/1], Step [704/12942], Loss: 3.5156, Perplexity: 33.6350
Epoch [1/1], Step [705/12942], Loss: 4.0338, Perplexity: 56.4772
Epoch [1/1], Step [706/12942], Loss: 3.2419, Perplexity: 25.5823
Epoch [1/1], Step [707/12942], Loss: 3.5541, Perplexity: 34.9553
Epoch [1/1], Step [708/12942], Loss: 3.3237, Perplexity: 27.7640
Epoch [1/1], Step [709/12942], Loss: 3.2029, Perplexity: 24.6038

Epoch [1/1], Step [771/12942], Loss: 3.2951, Perplexity: 26.9809
Epoch [1/1], Step [772/12942], Loss: 3.5909, Perplexity: 36.2655
Epoch [1/1], Step [773/12942], Loss: 3.4123, Perplexity: 30.3350
Epoch [1/1], Step [774/12942], Loss: 3.3588, Perplexity: 28.7533
Epoch [1/1], Step [775/12942], Loss: 3.4920, Perplexity: 32.8531
Epoch [1/1], Step [776/12942], Loss: 3.0287, Perplexity: 20.6711
torch.Size([32, 17, 8855])
Epoch [1/1], Step [777/12942], Loss: 3.8502, Perplexity: 47.0034torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [778/12942], Loss: 2.8897, Perplexity: 17.9888torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [779/12942], Loss: 3.2572, Perplexity: 25.9765

Epoch [1/1], Step [841/12942], Loss: 3.1565, Perplexity: 23.4876torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [842/12942], Loss: 3.0875, Perplexity: 21.9218torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [843/12942], Loss: 2.9821, Perplexity: 19.7286torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [844/12942], Loss: 4.1852, Perplexity: 65.7062torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [845/12942], Loss: 3.5827, Perplexity: 35.9708torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [846/12942], Loss: 3.2778, Perplexity: 26.5165torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [847/12942], Loss: 3.4232, Perplexity: 30.6681torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [848/12942], Loss: 3.0151, Perplexity: 20.3909torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [849/12942], Loss: 3.2702, Perplexity: 26.3179

Epoch [1/1], Step [911/12942], Loss: 3.1862, Perplexity: 24.1957torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [912/12942], Loss: 3.5194, Perplexity: 33.7650torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [913/12942], Loss: 3.6289, Perplexity: 37.6722torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [914/12942], Loss: 3.2736, Perplexity: 26.4050torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [915/12942], Loss: 3.0942, Perplexity: 22.0689torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [916/12942], Loss: 3.1506, Perplexity: 23.3498torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [917/12942], Loss: 3.5506, Perplexity: 34.8336torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [918/12942], Loss: 3.2858, Perplexity: 26.7294torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [919/12942], Loss: 3.0333, Perplexity: 20.7650

Epoch [1/1], Step [981/12942], Loss: 2.9375, Perplexity: 18.8677torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [982/12942], Loss: 3.1977, Perplexity: 24.4764torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [983/12942], Loss: 2.9576, Perplexity: 19.2523torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [984/12942], Loss: 3.1657, Perplexity: 23.7043torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [985/12942], Loss: 3.5096, Perplexity: 33.4365torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [986/12942], Loss: 3.0145, Perplexity: 20.3799torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [987/12942], Loss: 3.5071, Perplexity: 33.3499torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [988/12942], Loss: 3.2129, Perplexity: 24.8522torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [989/12942], Loss: 2.9554, Perplexity: 19.2097

Epoch [1/1], Step [1051/12942], Loss: 3.2249, Perplexity: 25.1500torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1052/12942], Loss: 3.1195, Perplexity: 22.6342torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [1053/12942], Loss: 3.1354, Perplexity: 22.9983torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1054/12942], Loss: 2.9239, Perplexity: 18.6128torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1055/12942], Loss: 2.8895, Perplexity: 17.9846torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1056/12942], Loss: 2.9754, Perplexity: 19.5984torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1057/12942], Loss: 3.1138, Perplexity: 22.5067torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1058/12942], Loss: 3.0213, Perplexity: 20.5184torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1059/12942], Loss: 3.0857, Perplexity

Epoch [1/1], Step [1120/12942], Loss: 3.0633, Perplexity: 21.3975torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1121/12942], Loss: 3.0603, Perplexity: 21.3341torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1122/12942], Loss: 2.7349, Perplexity: 15.4081torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1123/12942], Loss: 2.9800, Perplexity: 19.6881torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1124/12942], Loss: 2.9041, Perplexity: 18.2495torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1125/12942], Loss: 3.0702, Perplexity: 21.5472torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1126/12942], Loss: 3.2890, Perplexity: 26.8159torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1127/12942], Loss: 3.1371, Perplexity: 23.0365torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1128/12942], Loss: 2.5989, Perplexity

Epoch [1/1], Step [1189/12942], Loss: 3.2284, Perplexity: 25.2381torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1190/12942], Loss: 3.0122, Perplexity: 20.3313torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1191/12942], Loss: 2.9531, Perplexity: 19.1659torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1192/12942], Loss: 2.7749, Perplexity: 16.0377torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1193/12942], Loss: 2.8480, Perplexity: 17.2527torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1194/12942], Loss: 2.8145, Perplexity: 16.6840torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1195/12942], Loss: 3.4862, Perplexity: 32.6613torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1196/12942], Loss: 2.8960, Perplexity: 18.1018torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1197/12942], Loss: 3.1603, Perplexity

Epoch [1/1], Step [1258/12942], Loss: 2.9092, Perplexity: 18.3423torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1259/12942], Loss: 2.8476, Perplexity: 17.2465torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1260/12942], Loss: 3.3099, Perplexity: 27.3827torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1261/12942], Loss: 2.9027, Perplexity: 18.2225torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1262/12942], Loss: 3.0924, Perplexity: 22.0289torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1263/12942], Loss: 2.9561, Perplexity: 19.2224torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1264/12942], Loss: 3.3432, Perplexity: 28.3105torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1265/12942], Loss: 3.0591, Perplexity: 21.3089torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1266/12942], Loss: 2.6787, Perplexity

Epoch [1/1], Step [1327/12942], Loss: 2.8286, Perplexity: 16.9217torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1328/12942], Loss: 2.9051, Perplexity: 18.2678torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [1329/12942], Loss: 3.2022, Perplexity: 24.5867torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1330/12942], Loss: 2.8824, Perplexity: 17.8568torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [1331/12942], Loss: 2.9645, Perplexity: 19.3845torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1332/12942], Loss: 2.8091, Perplexity: 16.5947torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1333/12942], Loss: 2.7804, Perplexity: 16.1256torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1334/12942], Loss: 2.8537, Perplexity: 17.3510torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1335/12942], Loss: 2.9256, Perplexity

Epoch [1/1], Step [1396/12942], Loss: 3.0521, Perplexity: 21.1591torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1397/12942], Loss: 3.0117, Perplexity: 20.3226torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [1398/12942], Loss: 2.7359, Perplexity: 15.4243torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1399/12942], Loss: 2.8266, Perplexity: 16.8876torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1400/12942], Loss: 2.9036, Perplexity: 18.2397
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1401/12942], Loss: 3.0390, Perplexity: 20.8834torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1402/12942], Loss: 2.7288, Perplexity: 15.3138torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1403/12942], Loss: 2.9671, Perplexity: 19.4353torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1404/12942], Loss: 2.6596, Perplexit

Epoch [1/1], Step [1465/12942], Loss: 3.2537, Perplexity: 25.8866torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1466/12942], Loss: 2.8469, Perplexity: 17.2350torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [1467/12942], Loss: 3.1759, Perplexity: 23.9486torch.Size([32, 22, 512])
torch.Size([32, 22, 8855])
Epoch [1/1], Step [1468/12942], Loss: 3.6754, Perplexity: 39.4646torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1469/12942], Loss: 3.0180, Perplexity: 20.4510torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1470/12942], Loss: 2.7907, Perplexity: 16.2920torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [1471/12942], Loss: 3.2546, Perplexity: 25.9093torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1472/12942], Loss: 2.9235, Perplexity: 18.6055torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1473/12942], Loss: 2.8280, Perplexity

Epoch [1/1], Step [1534/12942], Loss: 2.8159, Perplexity: 16.7080torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1535/12942], Loss: 2.7127, Perplexity: 15.0696torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1536/12942], Loss: 2.8811, Perplexity: 17.8339torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1537/12942], Loss: 2.9248, Perplexity: 18.6307torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1538/12942], Loss: 2.7076, Perplexity: 14.9937torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1539/12942], Loss: 2.8500, Perplexity: 17.2879torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1540/12942], Loss: 2.7791, Perplexity: 16.1040torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1541/12942], Loss: 3.2384, Perplexity: 25.4933torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [1542/12942], Loss: 3.2382, Perplexity

Epoch [1/1], Step [1603/12942], Loss: 2.9500, Perplexity: 19.1051torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1604/12942], Loss: 2.7153, Perplexity: 15.1086torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1605/12942], Loss: 2.8163, Perplexity: 16.7154torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1606/12942], Loss: 2.9235, Perplexity: 18.6068torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1607/12942], Loss: 2.8215, Perplexity: 16.8020torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1608/12942], Loss: 3.0030, Perplexity: 20.1458torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1609/12942], Loss: 2.9118, Perplexity: 18.3898torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1610/12942], Loss: 2.5823, Perplexity: 13.2275torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1611/12942], Loss: 3.0224, Perplexity

Epoch [1/1], Step [1672/12942], Loss: 3.1067, Perplexity: 22.3465torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1673/12942], Loss: 2.7040, Perplexity: 14.9399torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [1674/12942], Loss: 3.4162, Perplexity: 30.4523torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1675/12942], Loss: 2.7126, Perplexity: 15.0687torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [1676/12942], Loss: 3.1011, Perplexity: 22.2220torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1677/12942], Loss: 2.5489, Perplexity: 12.7932torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1678/12942], Loss: 2.6876, Perplexity: 14.6963torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1679/12942], Loss: 2.7317, Perplexity: 15.3591torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1680/12942], Loss: 2.8722, Perplexity

Epoch [1/1], Step [1741/12942], Loss: 2.9007, Perplexity: 18.1862torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1742/12942], Loss: 2.9545, Perplexity: 19.1917torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [1743/12942], Loss: 3.1421, Perplexity: 23.1516torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1744/12942], Loss: 2.9347, Perplexity: 18.8155torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1745/12942], Loss: 2.8519, Perplexity: 17.3201torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1746/12942], Loss: 2.5634, Perplexity: 12.9798torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1747/12942], Loss: 2.8686, Perplexity: 17.6129torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1748/12942], Loss: 3.1606, Perplexity: 23.5841torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1749/12942], Loss: 2.9622, Perplexity

Epoch [1/1], Step [1810/12942], Loss: 2.8731, Perplexity: 17.6915torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1811/12942], Loss: 2.5637, Perplexity: 12.9833torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1812/12942], Loss: 2.8532, Perplexity: 17.3433torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [1813/12942], Loss: 3.1332, Perplexity: 22.9484torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [1814/12942], Loss: 3.1477, Perplexity: 23.2821torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1815/12942], Loss: 2.7792, Perplexity: 16.1054torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1816/12942], Loss: 2.5968, Perplexity: 13.4212torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [1817/12942], Loss: 3.1583, Perplexity: 23.5295torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1818/12942], Loss: 2.4702, Perplexity

Epoch [1/1], Step [1879/12942], Loss: 3.0075, Perplexity: 20.2363torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [1880/12942], Loss: 3.2092, Perplexity: 24.7603torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1881/12942], Loss: 2.4615, Perplexity: 11.7224torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1882/12942], Loss: 2.8903, Perplexity: 17.9987torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1883/12942], Loss: 2.9438, Perplexity: 18.9871torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1884/12942], Loss: 2.5075, Perplexity: 12.2747torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1885/12942], Loss: 2.6346, Perplexity: 13.9371torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [1886/12942], Loss: 2.9662, Perplexity: 19.4184torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1887/12942], Loss: 2.8423, Perplexity

Epoch [1/1], Step [1948/12942], Loss: 2.7843, Perplexity: 16.1887torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1949/12942], Loss: 2.9955, Perplexity: 19.9958torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [1950/12942], Loss: 2.7699, Perplexity: 15.9565torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [1951/12942], Loss: 2.4795, Perplexity: 11.9357torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [1952/12942], Loss: 3.4266, Perplexity: 30.7719torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1953/12942], Loss: 2.5942, Perplexity: 13.3858torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [1954/12942], Loss: 2.7034, Perplexity: 14.9298torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1955/12942], Loss: 2.6779, Perplexity: 14.5550torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [1956/12942], Loss: 3.0329, Perplexity

Epoch [1/1], Step [2017/12942], Loss: 2.8257, Perplexity: 16.8731torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2018/12942], Loss: 2.5346, Perplexity: 12.6118torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2019/12942], Loss: 2.6021, Perplexity: 13.4916torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2020/12942], Loss: 2.7408, Perplexity: 15.4987torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2021/12942], Loss: 2.6831, Perplexity: 14.6307torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2022/12942], Loss: 2.6177, Perplexity: 13.7043torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [2023/12942], Loss: 3.5874, Perplexity: 36.1410torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2024/12942], Loss: 2.5258, Perplexity: 12.5007torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2025/12942], Loss: 2.6897, Perplexity

Epoch [1/1], Step [2086/12942], Loss: 2.7149, Perplexity: 15.1024torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2087/12942], Loss: 2.3105, Perplexity: 10.0791torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2088/12942], Loss: 2.5098, Perplexity: 12.3020torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [2089/12942], Loss: 3.1066, Perplexity: 22.3445torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2090/12942], Loss: 2.5460, Perplexity: 12.7561torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2091/12942], Loss: 2.6671, Perplexity: 14.3981torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2092/12942], Loss: 2.4189, Perplexity: 11.2330torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2093/12942], Loss: 3.0750, Perplexity: 21.6506torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2094/12942], Loss: 2.7278, Perplexity

Epoch [1/1], Step [2155/12942], Loss: 3.0259, Perplexity: 20.6115torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2156/12942], Loss: 3.1891, Perplexity: 24.2671torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2157/12942], Loss: 2.6736, Perplexity: 14.4924torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2158/12942], Loss: 2.7008, Perplexity: 14.8911torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2159/12942], Loss: 3.0351, Perplexity: 20.8025torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2160/12942], Loss: 2.8679, Perplexity: 17.6007torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2161/12942], Loss: 2.7782, Perplexity: 16.0900torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2162/12942], Loss: 2.6488, Perplexity: 14.1372torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2163/12942], Loss: 3.0125, Perplexity

Epoch [1/1], Step [2224/12942], Loss: 2.7057, Perplexity: 14.9651torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2225/12942], Loss: 2.6790, Perplexity: 14.5702torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [2226/12942], Loss: 3.3448, Perplexity: 28.3546torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2227/12942], Loss: 2.5848, Perplexity: 13.2606torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2228/12942], Loss: 2.4311, Perplexity: 11.3710torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2229/12942], Loss: 2.6151, Perplexity: 13.6692torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2230/12942], Loss: 2.5168, Perplexity: 12.3884torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2231/12942], Loss: 2.8961, Perplexity: 18.1042torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2232/12942], Loss: 2.9746, Perplexity

Epoch [1/1], Step [2293/12942], Loss: 3.0222, Perplexity: 20.5358torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2294/12942], Loss: 2.8901, Perplexity: 17.9955torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2295/12942], Loss: 2.7298, Perplexity: 15.3306torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2296/12942], Loss: 2.5968, Perplexity: 13.4212torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2297/12942], Loss: 2.8062, Perplexity: 16.5466torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2298/12942], Loss: 2.3183, Perplexity: 10.1584torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2299/12942], Loss: 2.7961, Perplexity: 16.3801torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2300/12942], Loss: 2.4575, Perplexity: 11.6752
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2301/12942], Loss: 2.8794, Perplexit

Epoch [1/1], Step [2362/12942], Loss: 2.6627, Perplexity: 14.3344torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2363/12942], Loss: 2.6561, Perplexity: 14.2401torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2364/12942], Loss: 2.6469, Perplexity: 14.1098torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2365/12942], Loss: 2.8504, Perplexity: 17.2943torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2366/12942], Loss: 2.9730, Perplexity: 19.5497torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2367/12942], Loss: 2.7101, Perplexity: 15.0301torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2368/12942], Loss: 2.2335, Perplexity: 9.3328torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2369/12942], Loss: 2.9901, Perplexity: 19.8878torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2370/12942], Loss: 2.5466, Perplexity:

Epoch [1/1], Step [2431/12942], Loss: 2.9211, Perplexity: 18.5619torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2432/12942], Loss: 2.1870, Perplexity: 8.9083torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2433/12942], Loss: 2.6195, Perplexity: 13.7293torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2434/12942], Loss: 2.3737, Perplexity: 10.7366torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2435/12942], Loss: 2.6175, Perplexity: 13.7009torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2436/12942], Loss: 3.3181, Perplexity: 27.6075torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2437/12942], Loss: 2.6493, Perplexity: 14.1445torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2438/12942], Loss: 2.7029, Perplexity: 14.9237torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2439/12942], Loss: 2.2654, Perplexity:

Epoch [1/1], Step [2500/12942], Loss: 2.4585, Perplexity: 11.6877
torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [2501/12942], Loss: 3.3796, Perplexity: 29.3576torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2502/12942], Loss: 2.5238, Perplexity: 12.4760torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2503/12942], Loss: 3.0354, Perplexity: 20.8091torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [2504/12942], Loss: 3.0822, Perplexity: 21.8054torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2505/12942], Loss: 2.7160, Perplexity: 15.1194torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2506/12942], Loss: 2.9057, Perplexity: 18.2783torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [2507/12942], Loss: 3.1989, Perplexity: 24.5056torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2508/12942], Loss: 3.2522, Perplexit

Epoch [1/1], Step [2569/12942], Loss: 3.1985, Perplexity: 24.4946torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [2570/12942], Loss: 3.0637, Perplexity: 21.4076torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2571/12942], Loss: 2.8219, Perplexity: 16.8083torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2572/12942], Loss: 2.7094, Perplexity: 15.0210torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2573/12942], Loss: 2.3849, Perplexity: 10.8579torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [2574/12942], Loss: 2.9328, Perplexity: 18.7800torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2575/12942], Loss: 2.8897, Perplexity: 17.9880torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2576/12942], Loss: 2.4439, Perplexity: 11.5178torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2577/12942], Loss: 2.5238, Perplexity

Epoch [1/1], Step [2638/12942], Loss: 2.8702, Perplexity: 17.6413torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2639/12942], Loss: 2.7666, Perplexity: 15.9052torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2640/12942], Loss: 2.7079, Perplexity: 14.9971torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2641/12942], Loss: 2.6125, Perplexity: 13.6334torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2642/12942], Loss: 2.8104, Perplexity: 16.6161torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2643/12942], Loss: 2.6442, Perplexity: 14.0724torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2644/12942], Loss: 2.7835, Perplexity: 16.1762torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2645/12942], Loss: 2.7439, Perplexity: 15.5481torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2646/12942], Loss: 2.8262, Perplexity

Epoch [1/1], Step [2707/12942], Loss: 2.8171, Perplexity: 16.7278torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2708/12942], Loss: 2.6347, Perplexity: 13.9385torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2709/12942], Loss: 2.8121, Perplexity: 16.6456torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2710/12942], Loss: 2.6619, Perplexity: 14.3236torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2711/12942], Loss: 2.6380, Perplexity: 13.9857torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2712/12942], Loss: 2.6312, Perplexity: 13.8898torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2713/12942], Loss: 2.7912, Perplexity: 16.3012torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2714/12942], Loss: 2.7532, Perplexity: 15.6930torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2715/12942], Loss: 2.7059, Perplexity

Epoch [1/1], Step [2776/12942], Loss: 2.7167, Perplexity: 15.1299torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2777/12942], Loss: 2.7625, Perplexity: 15.8387torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2778/12942], Loss: 2.7610, Perplexity: 15.8153torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2779/12942], Loss: 2.4763, Perplexity: 11.8970torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2780/12942], Loss: 2.4964, Perplexity: 12.1383torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [2781/12942], Loss: 2.9583, Perplexity: 19.2646torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2782/12942], Loss: 2.6494, Perplexity: 14.1452torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2783/12942], Loss: 2.7545, Perplexity: 15.7128torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2784/12942], Loss: 2.5348, Perplexity

Epoch [1/1], Step [2845/12942], Loss: 2.7502, Perplexity: 15.6454torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2846/12942], Loss: 2.2811, Perplexity: 9.7875torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2847/12942], Loss: 2.6058, Perplexity: 13.5415torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [2848/12942], Loss: 2.8713, Perplexity: 17.6602torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2849/12942], Loss: 2.4563, Perplexity: 11.6615torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2850/12942], Loss: 2.5929, Perplexity: 13.3689torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2851/12942], Loss: 2.7188, Perplexity: 15.1627torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2852/12942], Loss: 2.6409, Perplexity: 14.0265torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2853/12942], Loss: 2.5449, Perplexity:

Epoch [1/1], Step [2914/12942], Loss: 2.3850, Perplexity: 10.8589torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2915/12942], Loss: 2.4047, Perplexity: 11.0750torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2916/12942], Loss: 2.3794, Perplexity: 10.7987torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [2917/12942], Loss: 2.5089, Perplexity: 12.2909torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2918/12942], Loss: 2.6964, Perplexity: 14.8259torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [2919/12942], Loss: 3.0207, Perplexity: 20.5066torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2920/12942], Loss: 2.5317, Perplexity: 12.5749torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2921/12942], Loss: 2.5076, Perplexity: 12.2755torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [2922/12942], Loss: 2.5743, Perplexity

Epoch [1/1], Step [2983/12942], Loss: 2.5027, Perplexity: 12.2153torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [2984/12942], Loss: 3.0916, Perplexity: 22.0125torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [2985/12942], Loss: 2.4694, Perplexity: 11.8158torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2986/12942], Loss: 2.8982, Perplexity: 18.1415torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [2987/12942], Loss: 2.8925, Perplexity: 18.0377torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [2988/12942], Loss: 2.5746, Perplexity: 13.1264torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2989/12942], Loss: 2.5820, Perplexity: 13.2232torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [2990/12942], Loss: 2.4146, Perplexity: 11.1849torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [2991/12942], Loss: 2.9599, Perplexity

Epoch [1/1], Step [3052/12942], Loss: 2.4780, Perplexity: 11.9176torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3053/12942], Loss: 2.2628, Perplexity: 9.6099torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3054/12942], Loss: 2.5525, Perplexity: 12.8395torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3055/12942], Loss: 2.7338, Perplexity: 15.3916torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3056/12942], Loss: 2.2436, Perplexity: 9.4271torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3057/12942], Loss: 2.4957, Perplexity: 12.1301torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3058/12942], Loss: 2.6367, Perplexity: 13.9667torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [3059/12942], Loss: 2.9565, Perplexity: 19.2297torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3060/12942], Loss: 2.4309, Perplexity: 

Epoch [1/1], Step [3121/12942], Loss: 3.1501, Perplexity: 23.3375torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3122/12942], Loss: 2.7901, Perplexity: 16.2831torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3123/12942], Loss: 2.6641, Perplexity: 14.3546torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3124/12942], Loss: 2.7318, Perplexity: 15.3601torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3125/12942], Loss: 2.3463, Perplexity: 10.4464torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3126/12942], Loss: 2.7977, Perplexity: 16.4065torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3127/12942], Loss: 2.4977, Perplexity: 12.1551torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3128/12942], Loss: 2.7451, Perplexity: 15.5661torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [3129/12942], Loss: 3.1769, Perplexity

Epoch [1/1], Step [3190/12942], Loss: 2.8171, Perplexity: 16.7275torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3191/12942], Loss: 2.4047, Perplexity: 11.0748torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [3192/12942], Loss: 2.8194, Perplexity: 16.7669torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3193/12942], Loss: 2.4156, Perplexity: 11.1964torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3194/12942], Loss: 2.6821, Perplexity: 14.6159torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3195/12942], Loss: 2.7166, Perplexity: 15.1291torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3196/12942], Loss: 2.4006, Perplexity: 11.0299torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3197/12942], Loss: 2.5699, Perplexity: 13.0651torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3198/12942], Loss: 2.3918, Perplexity

Epoch [1/1], Step [3259/12942], Loss: 2.7440, Perplexity: 15.5488torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3260/12942], Loss: 2.4498, Perplexity: 11.5860torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3261/12942], Loss: 2.8772, Perplexity: 17.7641torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3262/12942], Loss: 2.6955, Perplexity: 14.8136torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3263/12942], Loss: 2.5432, Perplexity: 12.7207torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3264/12942], Loss: 2.4885, Perplexity: 12.0431torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3265/12942], Loss: 2.2267, Perplexity: 9.2695torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3266/12942], Loss: 2.4988, Perplexity: 12.1674torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3267/12942], Loss: 2.6254, Perplexity:

Epoch [1/1], Step [3328/12942], Loss: 2.3995, Perplexity: 11.0177torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3329/12942], Loss: 2.3591, Perplexity: 10.5819torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3330/12942], Loss: 2.6565, Perplexity: 14.2456torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [3331/12942], Loss: 2.9682, Perplexity: 19.4573torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3332/12942], Loss: 2.8022, Perplexity: 16.4805torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3333/12942], Loss: 2.3798, Perplexity: 10.8032torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3334/12942], Loss: 2.1518, Perplexity: 8.6002torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3335/12942], Loss: 3.0621, Perplexity: 21.3723torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3336/12942], Loss: 2.2669, Perplexity:

Epoch [1/1], Step [3397/12942], Loss: 2.7490, Perplexity: 15.6276torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3398/12942], Loss: 2.4788, Perplexity: 11.9271torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3399/12942], Loss: 2.4661, Perplexity: 11.7759torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3400/12942], Loss: 2.8371, Perplexity: 17.0662
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3401/12942], Loss: 2.6015, Perplexity: 13.4841torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3402/12942], Loss: 2.5807, Perplexity: 13.2061torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [3403/12942], Loss: 2.6433, Perplexity: 14.0593torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [3404/12942], Loss: 2.8117, Perplexity: 16.6388torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3405/12942], Loss: 2.3635, Perplexit

Epoch [1/1], Step [3466/12942], Loss: 2.4098, Perplexity: 11.1317torch.Size([32, 29, 512])
torch.Size([32, 29, 8855])
Epoch [1/1], Step [3467/12942], Loss: 3.6626, Perplexity: 38.9626torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3468/12942], Loss: 2.7648, Perplexity: 15.8752torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3469/12942], Loss: 2.3224, Perplexity: 10.2005torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [3470/12942], Loss: 2.4684, Perplexity: 11.8032torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3471/12942], Loss: 2.5249, Perplexity: 12.4892torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3472/12942], Loss: 2.4743, Perplexity: 11.8731torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3473/12942], Loss: 2.5255, Perplexity: 12.4973torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3474/12942], Loss: 2.3485, Perplexity

Epoch [1/1], Step [3535/12942], Loss: 2.3886, Perplexity: 10.8979torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3536/12942], Loss: 2.3906, Perplexity: 10.9199torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3537/12942], Loss: 2.5051, Perplexity: 12.2443torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [3538/12942], Loss: 2.5841, Perplexity: 13.2518torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3539/12942], Loss: 2.3097, Perplexity: 10.0716torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [3540/12942], Loss: 2.9328, Perplexity: 18.7800torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3541/12942], Loss: 2.5063, Perplexity: 12.2596torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3542/12942], Loss: 2.3580, Perplexity: 10.5695torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3543/12942], Loss: 2.2694, Perplexity

Epoch [1/1], Step [3604/12942], Loss: 2.6100, Perplexity: 13.5992torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3605/12942], Loss: 2.4503, Perplexity: 11.5922torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [3606/12942], Loss: 2.7444, Perplexity: 15.5552torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3607/12942], Loss: 2.7534, Perplexity: 15.6958torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3608/12942], Loss: 2.6324, Perplexity: 13.9072torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3609/12942], Loss: 2.7156, Perplexity: 15.1139torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3610/12942], Loss: 2.3663, Perplexity: 10.6576torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3611/12942], Loss: 2.4399, Perplexity: 11.4718torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3612/12942], Loss: 2.3136, Perplexity

Epoch [1/1], Step [3673/12942], Loss: 2.2835, Perplexity: 9.8110torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3674/12942], Loss: 2.2557, Perplexity: 9.5415torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3675/12942], Loss: 2.5714, Perplexity: 13.0837torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3676/12942], Loss: 2.3998, Perplexity: 11.0206torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3677/12942], Loss: 2.7169, Perplexity: 15.1326torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3678/12942], Loss: 2.8649, Perplexity: 17.5475torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3679/12942], Loss: 2.8183, Perplexity: 16.7480torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3680/12942], Loss: 2.4278, Perplexity: 11.3334torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3681/12942], Loss: 2.7286, Perplexity: 

Epoch [1/1], Step [3742/12942], Loss: 2.5278, Perplexity: 12.5264torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [3743/12942], Loss: 2.7394, Perplexity: 15.4779torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3744/12942], Loss: 2.5659, Perplexity: 13.0123torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3745/12942], Loss: 2.3318, Perplexity: 10.2970torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3746/12942], Loss: 2.3876, Perplexity: 10.8869torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3747/12942], Loss: 2.7647, Perplexity: 15.8741torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3748/12942], Loss: 2.2535, Perplexity: 9.5208torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3749/12942], Loss: 2.5578, Perplexity: 12.9072torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3750/12942], Loss: 2.3753, Perplexity:

Epoch [1/1], Step [3811/12942], Loss: 2.5981, Perplexity: 13.4385torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3812/12942], Loss: 2.4739, Perplexity: 11.8692torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3813/12942], Loss: 2.6141, Perplexity: 13.6544torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3814/12942], Loss: 2.3735, Perplexity: 10.7344torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [3815/12942], Loss: 2.9330, Perplexity: 18.7831torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3816/12942], Loss: 2.5787, Perplexity: 13.1805torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3817/12942], Loss: 2.6214, Perplexity: 13.7543torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3818/12942], Loss: 2.5709, Perplexity: 13.0780torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3819/12942], Loss: 2.7492, Perplexity

Epoch [1/1], Step [3880/12942], Loss: 2.5947, Perplexity: 13.3931torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3881/12942], Loss: 2.6301, Perplexity: 13.8745torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3882/12942], Loss: 2.6449, Perplexity: 14.0821torch.Size([32, 30, 512])
torch.Size([32, 30, 8855])
Epoch [1/1], Step [3883/12942], Loss: 3.9025, Perplexity: 49.5253torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3884/12942], Loss: 2.5106, Perplexity: 12.3119torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [3885/12942], Loss: 2.7840, Perplexity: 16.1836torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [3886/12942], Loss: 2.3825, Perplexity: 10.8320torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3887/12942], Loss: 2.4146, Perplexity: 11.1851torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3888/12942], Loss: 2.5737, Perplexity

Epoch [1/1], Step [3949/12942], Loss: 2.4898, Perplexity: 12.0583torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3950/12942], Loss: 2.5813, Perplexity: 13.2138torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3951/12942], Loss: 2.7347, Perplexity: 15.4053torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [3952/12942], Loss: 2.7313, Perplexity: 15.3533torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3953/12942], Loss: 2.3867, Perplexity: 10.8772torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [3954/12942], Loss: 2.3369, Perplexity: 10.3493torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [3955/12942], Loss: 2.3081, Perplexity: 10.0549torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [3956/12942], Loss: 2.8630, Perplexity: 17.5136torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [3957/12942], Loss: 2.7297, Perplexity

Epoch [1/1], Step [4018/12942], Loss: 2.3754, Perplexity: 10.7554torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4019/12942], Loss: 2.3693, Perplexity: 10.6898torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4020/12942], Loss: 2.6174, Perplexity: 13.7007torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4021/12942], Loss: 2.7413, Perplexity: 15.5078torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4022/12942], Loss: 2.8354, Perplexity: 17.0375torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [4023/12942], Loss: 2.7050, Perplexity: 14.9543torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4024/12942], Loss: 2.5029, Perplexity: 12.2174torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [4025/12942], Loss: 2.7836, Perplexity: 16.1768torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4026/12942], Loss: 2.5636, Perplexity

Epoch [1/1], Step [4087/12942], Loss: 2.5029, Perplexity: 12.2184torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4088/12942], Loss: 2.4022, Perplexity: 11.0477torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4089/12942], Loss: 2.0588, Perplexity: 7.8363torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4090/12942], Loss: 2.8143, Perplexity: 16.6813torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4091/12942], Loss: 2.6297, Perplexity: 13.8699torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4092/12942], Loss: 2.6956, Perplexity: 14.8147torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [4093/12942], Loss: 2.6735, Perplexity: 14.4903torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [4094/12942], Loss: 3.2892, Perplexity: 26.8202torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4095/12942], Loss: 2.7369, Perplexity:

Epoch [1/1], Step [4156/12942], Loss: 2.5027, Perplexity: 12.2150torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4157/12942], Loss: 2.5580, Perplexity: 12.9098torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4158/12942], Loss: 2.4769, Perplexity: 11.9037torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4159/12942], Loss: 2.6444, Perplexity: 14.0756torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4160/12942], Loss: 2.4080, Perplexity: 11.1118torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4161/12942], Loss: 2.1735, Perplexity: 8.7891torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4162/12942], Loss: 2.1731, Perplexity: 8.7854torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4163/12942], Loss: 2.7149, Perplexity: 15.1028torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4164/12942], Loss: 2.3550, Perplexity: 

Epoch [1/1], Step [4225/12942], Loss: 2.4678, Perplexity: 11.7968torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4226/12942], Loss: 2.9066, Perplexity: 18.2937torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4227/12942], Loss: 2.5696, Perplexity: 13.0612torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4228/12942], Loss: 2.3163, Perplexity: 10.1381torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4229/12942], Loss: 2.1159, Perplexity: 8.2971torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4230/12942], Loss: 2.3619, Perplexity: 10.6114torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4231/12942], Loss: 2.3152, Perplexity: 10.1272torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4232/12942], Loss: 2.4054, Perplexity: 11.0830torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4233/12942], Loss: 2.5376, Perplexity:

Epoch [1/1], Step [4294/12942], Loss: 2.6033, Perplexity: 13.5076torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4295/12942], Loss: 2.6057, Perplexity: 13.5412torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4296/12942], Loss: 2.7146, Perplexity: 15.0988torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4297/12942], Loss: 2.5123, Perplexity: 12.3336torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4298/12942], Loss: 2.3037, Perplexity: 10.0116torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4299/12942], Loss: 2.6888, Perplexity: 14.7141torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4300/12942], Loss: 2.1234, Perplexity: 8.3595
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4301/12942], Loss: 2.5143, Perplexity: 12.3576torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4302/12942], Loss: 2.6491, Perplexity

Epoch [1/1], Step [4363/12942], Loss: 2.4495, Perplexity: 11.5826torch.Size([32, 23, 512])
torch.Size([32, 23, 8855])
Epoch [1/1], Step [4364/12942], Loss: 3.3256, Perplexity: 27.8144torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4365/12942], Loss: 2.3540, Perplexity: 10.5278torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4366/12942], Loss: 2.4099, Perplexity: 11.1326torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4367/12942], Loss: 2.4068, Perplexity: 11.0987torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4368/12942], Loss: 2.6905, Perplexity: 14.7396torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4369/12942], Loss: 2.4609, Perplexity: 11.7151torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4370/12942], Loss: 2.5601, Perplexity: 12.9377torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4371/12942], Loss: 2.3237, Perplexity

Epoch [1/1], Step [4432/12942], Loss: 2.4761, Perplexity: 11.8943torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4433/12942], Loss: 2.2755, Perplexity: 9.7327torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4434/12942], Loss: 2.3181, Perplexity: 10.1560torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4435/12942], Loss: 2.1959, Perplexity: 8.9885torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4436/12942], Loss: 2.1078, Perplexity: 8.2301torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [4437/12942], Loss: 2.9133, Perplexity: 18.4181torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4438/12942], Loss: 2.1833, Perplexity: 8.8753torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4439/12942], Loss: 2.4319, Perplexity: 11.3806torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4440/12942], Loss: 2.4682, Perplexity: 11

Epoch [1/1], Step [4502/12942], Loss: 2.3375, Perplexity: 10.3551torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4503/12942], Loss: 2.4197, Perplexity: 11.2421torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4504/12942], Loss: 2.4027, Perplexity: 11.0525torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4505/12942], Loss: 2.1782, Perplexity: 8.8302torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4506/12942], Loss: 2.0364, Perplexity: 7.6626torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4507/12942], Loss: 2.4558, Perplexity: 11.6559torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4508/12942], Loss: 2.3879, Perplexity: 10.8909torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4509/12942], Loss: 2.4464, Perplexity: 11.5469torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4510/12942], Loss: 2.5071, Perplexity: 

Epoch [1/1], Step [4571/12942], Loss: 2.2252, Perplexity: 9.2553torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4572/12942], Loss: 2.6645, Perplexity: 14.3614torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [4573/12942], Loss: 2.5704, Perplexity: 13.0704torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4574/12942], Loss: 2.4474, Perplexity: 11.5579torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4575/12942], Loss: 2.4286, Perplexity: 11.3431torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4576/12942], Loss: 2.5225, Perplexity: 12.4592torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [4577/12942], Loss: 2.6056, Perplexity: 13.5389torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4578/12942], Loss: 2.2860, Perplexity: 9.8353torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4579/12942], Loss: 2.2387, Perplexity: 

Epoch [1/1], Step [4641/12942], Loss: 2.6670, Perplexity: 14.3962torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4642/12942], Loss: 2.5037, Perplexity: 12.2272torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [4643/12942], Loss: 2.4902, Perplexity: 12.0635torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4644/12942], Loss: 2.3152, Perplexity: 10.1274torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [4645/12942], Loss: 2.5270, Perplexity: 12.5159torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4646/12942], Loss: 2.3977, Perplexity: 10.9977torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4647/12942], Loss: 2.2619, Perplexity: 9.6017torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4648/12942], Loss: 2.6190, Perplexity: 13.7223torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4649/12942], Loss: 2.4702, Perplexity:

Epoch [1/1], Step [4711/12942], Loss: 2.4117, Perplexity: 11.1534torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [4712/12942], Loss: 2.8728, Perplexity: 17.6873torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4713/12942], Loss: 2.4676, Perplexity: 11.7940torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4714/12942], Loss: 2.1619, Perplexity: 8.6877torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4715/12942], Loss: 2.0834, Perplexity: 8.0318torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4716/12942], Loss: 2.3217, Perplexity: 10.1926torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4717/12942], Loss: 2.7843, Perplexity: 16.1891torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4718/12942], Loss: 2.5569, Perplexity: 12.8958torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4719/12942], Loss: 2.1362, Perplexity: 

Epoch [1/1], Step [4780/12942], Loss: 2.2483, Perplexity: 9.4713torch.Size([32, 24, 512])
torch.Size([32, 24, 8855])
Epoch [1/1], Step [4781/12942], Loss: 3.2329, Perplexity: 25.3529torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4782/12942], Loss: 2.3146, Perplexity: 10.1213torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4783/12942], Loss: 2.4422, Perplexity: 11.4981torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4784/12942], Loss: 2.5099, Perplexity: 12.3031torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4785/12942], Loss: 2.3590, Perplexity: 10.5805torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4786/12942], Loss: 2.4504, Perplexity: 11.5932torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4787/12942], Loss: 2.2567, Perplexity: 9.5519torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4788/12942], Loss: 2.5375, Perplexity: 

Epoch [1/1], Step [4850/12942], Loss: 2.3731, Perplexity: 10.7308torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4851/12942], Loss: 2.4663, Perplexity: 11.7785torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4852/12942], Loss: 2.3220, Perplexity: 10.1960torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4853/12942], Loss: 2.1128, Perplexity: 8.2710torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4854/12942], Loss: 2.5131, Perplexity: 12.3427torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4855/12942], Loss: 2.2431, Perplexity: 9.4229torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4856/12942], Loss: 2.1337, Perplexity: 8.4463torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4857/12942], Loss: 2.1221, Perplexity: 8.3487torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4858/12942], Loss: 2.1787, Perplexity: 8.

Epoch [1/1], Step [4920/12942], Loss: 2.2893, Perplexity: 9.8677torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [4921/12942], Loss: 2.7488, Perplexity: 15.6246torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4922/12942], Loss: 2.1658, Perplexity: 8.7218torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4923/12942], Loss: 2.4008, Perplexity: 11.0316torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4924/12942], Loss: 2.5306, Perplexity: 12.5613torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [4925/12942], Loss: 2.3686, Perplexity: 10.6820torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4926/12942], Loss: 2.4679, Perplexity: 11.7977torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4927/12942], Loss: 2.2626, Perplexity: 9.6082torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [4928/12942], Loss: 2.4546, Perplexity: 1

Epoch [1/1], Step [4990/12942], Loss: 2.0929, Perplexity: 8.1085torch.Size([32, 26, 512])
torch.Size([32, 26, 8855])
Epoch [1/1], Step [4991/12942], Loss: 3.1654, Perplexity: 23.6974torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [4992/12942], Loss: 2.5683, Perplexity: 13.0438torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4993/12942], Loss: 2.4495, Perplexity: 11.5830torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [4994/12942], Loss: 2.4725, Perplexity: 11.8515torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [4995/12942], Loss: 2.9122, Perplexity: 18.3966torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4996/12942], Loss: 2.3584, Perplexity: 10.5735torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [4997/12942], Loss: 2.2208, Perplexity: 9.2145torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [4998/12942], Loss: 2.0924, Perplexity: 

Epoch [1/1], Step [5060/12942], Loss: 2.4513, Perplexity: 11.6033
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5061/12942], Loss: 2.5391, Perplexity: 12.6679
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5062/12942], Loss: 2.4481, Perplexity: 11.5667
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5063/12942], Loss: 2.7026, Perplexity: 14.9182
torch.Size([32, 9, 512])
torch.Size([32, 9, 8855])
Epoch [1/1], Step [5064/12942], Loss: 2.7202, Perplexity: 15.1832
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [5065/12942], Loss: 2.7836, Perplexity: 16.1767
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5066/12942], Loss: 2.5800, Perplexity: 13.1970
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5067/12942], Loss: 2.2938, Perplexity: 9.9124
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5068/12942], Loss: 2.6203, Perplexity: 1

Epoch [1/1], Step [5130/12942], Loss: 2.4727, Perplexity: 11.8544
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5131/12942], Loss: 2.4597, Perplexity: 11.7014
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5132/12942], Loss: 2.7500, Perplexity: 15.6419
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5133/12942], Loss: 2.3616, Perplexity: 10.6082
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5134/12942], Loss: 2.3461, Perplexity: 10.4450
torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [5135/12942], Loss: 3.0957, Perplexity: 22.1032
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5136/12942], Loss: 2.5514, Perplexity: 12.8255
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5137/12942], Loss: 2.3416, Perplexity: 10.3979
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5138/12942], Loss: 2.6470, Perplexity

Epoch [1/1], Step [5200/12942], Loss: 2.8346, Perplexity: 17.0229
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5201/12942], Loss: 2.4050, Perplexity: 11.0779
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5202/12942], Loss: 2.5355, Perplexity: 12.6230
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5203/12942], Loss: 2.2663, Perplexity: 9.6433
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5204/12942], Loss: 2.2812, Perplexity: 9.7889
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5205/12942], Loss: 2.0525, Perplexity: 7.7876
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5206/12942], Loss: 2.4536, Perplexity: 11.6304
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5207/12942], Loss: 2.1602, Perplexity: 8.6728
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5208/12942], Loss: 2.1862, Perplexity: 8

Epoch [1/1], Step [5270/12942], Loss: 2.2048, Perplexity: 9.0687
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5271/12942], Loss: 2.5469, Perplexity: 12.7669
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5272/12942], Loss: 2.3511, Perplexity: 10.4967
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5273/12942], Loss: 2.4224, Perplexity: 11.2723
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5274/12942], Loss: 2.2977, Perplexity: 9.9515
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5275/12942], Loss: 2.3458, Perplexity: 10.4420
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5276/12942], Loss: 2.4339, Perplexity: 11.4036
torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [5277/12942], Loss: 2.8988, Perplexity: 18.1532
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5278/12942], Loss: 2.7797, Perplexity: 

Epoch [1/1], Step [5340/12942], Loss: 2.2649, Perplexity: 9.6300
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5341/12942], Loss: 2.4279, Perplexity: 11.3346
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5342/12942], Loss: 2.2858, Perplexity: 9.8339
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5343/12942], Loss: 2.2008, Perplexity: 9.0320
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5344/12942], Loss: 2.7428, Perplexity: 15.5303
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5345/12942], Loss: 2.2533, Perplexity: 9.5193
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5346/12942], Loss: 2.8784, Perplexity: 17.7850
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5347/12942], Loss: 2.5681, Perplexity: 13.0414
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5348/12942], Loss: 2.0625, Perplexity: 7.

Epoch [1/1], Step [5410/12942], Loss: 2.3264, Perplexity: 10.2413
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5411/12942], Loss: 2.1940, Perplexity: 8.9707
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5412/12942], Loss: 2.5901, Perplexity: 13.3310
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5413/12942], Loss: 2.3315, Perplexity: 10.2937
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5414/12942], Loss: 2.4066, Perplexity: 11.0960
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [5415/12942], Loss: 2.6542, Perplexity: 14.2130
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5416/12942], Loss: 2.7865, Perplexity: 16.2236
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5417/12942], Loss: 2.1493, Perplexity: 8.5790
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5418/12942], Loss: 2.4761, Perplexity: 

Epoch [1/1], Step [5480/12942], Loss: 2.7005, Perplexity: 14.8871
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5481/12942], Loss: 2.4191, Perplexity: 11.2356
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5482/12942], Loss: 2.2931, Perplexity: 9.9051
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5483/12942], Loss: 2.3192, Perplexity: 10.1673
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5484/12942], Loss: 2.3496, Perplexity: 10.4815
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5485/12942], Loss: 2.2903, Perplexity: 9.8776
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5486/12942], Loss: 2.5615, Perplexity: 12.9547
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5487/12942], Loss: 2.5113, Perplexity: 12.3205
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5488/12942], Loss: 2.2822, Perplexity: 

Epoch [1/1], Step [5550/12942], Loss: 2.7034, Perplexity: 14.9298
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5551/12942], Loss: 2.4462, Perplexity: 11.5448
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5552/12942], Loss: 2.4067, Perplexity: 11.0973
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5553/12942], Loss: 2.8623, Perplexity: 17.5025
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5554/12942], Loss: 2.4224, Perplexity: 11.2732
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5555/12942], Loss: 2.1978, Perplexity: 9.0050
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [5556/12942], Loss: 2.8239, Perplexity: 16.8422
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5557/12942], Loss: 2.3950, Perplexity: 10.9679
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5558/12942], Loss: 2.3690, Perplexity:

Epoch [1/1], Step [5620/12942], Loss: 2.3837, Perplexity: 10.8452
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5621/12942], Loss: 2.1254, Perplexity: 8.3767
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5622/12942], Loss: 2.4628, Perplexity: 11.7374
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5623/12942], Loss: 2.1773, Perplexity: 8.8227
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5624/12942], Loss: 2.4254, Perplexity: 11.3068
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5625/12942], Loss: 2.4745, Perplexity: 11.8762
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5626/12942], Loss: 2.5305, Perplexity: 12.5602
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5627/12942], Loss: 2.3101, Perplexity: 10.0749
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5628/12942], Loss: 2.3698, Perplexity: 

Epoch [1/1], Step [5690/12942], Loss: 2.4198, Perplexity: 11.2435
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [5691/12942], Loss: 2.7057, Perplexity: 14.9655
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5692/12942], Loss: 2.6688, Perplexity: 14.4224
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5693/12942], Loss: 2.0356, Perplexity: 7.6568
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5694/12942], Loss: 2.0967, Perplexity: 8.1394
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5695/12942], Loss: 2.1995, Perplexity: 9.0208
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5696/12942], Loss: 2.3051, Perplexity: 10.0255
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5697/12942], Loss: 2.1753, Perplexity: 8.8046
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5698/12942], Loss: 2.4391, Perplexity: 11

Epoch [1/1], Step [5760/12942], Loss: 2.7208, Perplexity: 15.1922
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5761/12942], Loss: 2.5763, Perplexity: 13.1490
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5762/12942], Loss: 2.2488, Perplexity: 9.4760
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5763/12942], Loss: 2.3187, Perplexity: 10.1621
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5764/12942], Loss: 2.3557, Perplexity: 10.5456
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5765/12942], Loss: 2.4000, Perplexity: 11.0233
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5766/12942], Loss: 2.4025, Perplexity: 11.0511
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [5767/12942], Loss: 2.1836, Perplexity: 8.8779
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5768/12942], Loss: 2.2956, Perplexity: 

Epoch [1/1], Step [5830/12942], Loss: 2.3095, Perplexity: 10.0695
torch.Size([32, 9, 512])
torch.Size([32, 9, 8855])
Epoch [1/1], Step [5831/12942], Loss: 2.8044, Perplexity: 16.5172
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5832/12942], Loss: 2.5842, Perplexity: 13.2520
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5833/12942], Loss: 2.3883, Perplexity: 10.8954
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5834/12942], Loss: 2.5360, Perplexity: 12.6295
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5835/12942], Loss: 2.3786, Perplexity: 10.7902
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5836/12942], Loss: 2.3567, Perplexity: 10.5558
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5837/12942], Loss: 2.3313, Perplexity: 10.2918
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [5838/12942], Loss: 3.1397, Perplexity: 

Epoch [1/1], Step [5900/12942], Loss: 2.2391, Perplexity: 9.3850
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5901/12942], Loss: 2.0910, Perplexity: 8.0929
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5902/12942], Loss: 2.3928, Perplexity: 10.9445
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5903/12942], Loss: 2.2879, Perplexity: 9.8542
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5904/12942], Loss: 2.8961, Perplexity: 18.1042
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [5905/12942], Loss: 2.5343, Perplexity: 12.6074
torch.Size([32, 24, 512])
torch.Size([32, 24, 8855])
Epoch [1/1], Step [5906/12942], Loss: 3.3502, Perplexity: 28.5077
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [5907/12942], Loss: 2.7005, Perplexity: 14.8878
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [5908/12942], Loss: 2.2964, Perplexity: 

Epoch [1/1], Step [5970/12942], Loss: 2.1936, Perplexity: 8.9673
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [5971/12942], Loss: 2.4336, Perplexity: 11.4004
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5972/12942], Loss: 2.7428, Perplexity: 15.5310
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5973/12942], Loss: 2.6638, Perplexity: 14.3508
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [5974/12942], Loss: 2.2888, Perplexity: 9.8634
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5975/12942], Loss: 2.3859, Perplexity: 10.8685
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [5976/12942], Loss: 2.8825, Perplexity: 17.8597
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [5977/12942], Loss: 2.3163, Perplexity: 10.1384
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [5978/12942], Loss: 2.6834, Perplexity: 

Epoch [1/1], Step [6040/12942], Loss: 2.7471, Perplexity: 15.5979
torch.Size([32, 22, 512])
torch.Size([32, 22, 8855])
Epoch [1/1], Step [6041/12942], Loss: 3.1550, Perplexity: 23.4528
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6042/12942], Loss: 2.4681, Perplexity: 11.7995
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6043/12942], Loss: 2.2785, Perplexity: 9.7622
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6044/12942], Loss: 1.9319, Perplexity: 6.9024
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6045/12942], Loss: 2.3111, Perplexity: 10.0859
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6046/12942], Loss: 2.1935, Perplexity: 8.9663
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6047/12942], Loss: 2.3633, Perplexity: 10.6261
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6048/12942], Loss: 2.2590, Perplexity: 9

Epoch [1/1], Step [6110/12942], Loss: 2.2257, Perplexity: 9.2602
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6111/12942], Loss: 2.5682, Perplexity: 13.0428
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6112/12942], Loss: 2.2992, Perplexity: 9.9661
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6113/12942], Loss: 2.3361, Perplexity: 10.3405
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6114/12942], Loss: 2.4124, Perplexity: 11.1611
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6115/12942], Loss: 2.3786, Perplexity: 10.7897
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6116/12942], Loss: 2.4060, Perplexity: 11.0896
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6117/12942], Loss: 2.1207, Perplexity: 8.3370
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6118/12942], Loss: 2.3320, Perplexity: 1

Epoch [1/1], Step [6180/12942], Loss: 2.1694, Perplexity: 8.7528
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6181/12942], Loss: 2.5815, Perplexity: 13.2167
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6182/12942], Loss: 2.2192, Perplexity: 9.1997
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6183/12942], Loss: 2.4349, Perplexity: 11.4143
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6184/12942], Loss: 2.2629, Perplexity: 9.6108
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [6185/12942], Loss: 2.7412, Perplexity: 15.5050
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6186/12942], Loss: 2.1547, Perplexity: 8.6253
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6187/12942], Loss: 2.5644, Perplexity: 12.9935
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6188/12942], Loss: 2.2976, Perplexity: 9.

Epoch [1/1], Step [6250/12942], Loss: 2.3205, Perplexity: 10.1812
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6251/12942], Loss: 2.4972, Perplexity: 12.1483
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6252/12942], Loss: 2.5762, Perplexity: 13.1469
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [6253/12942], Loss: 2.6749, Perplexity: 14.5109
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6254/12942], Loss: 2.6298, Perplexity: 13.8706
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [6255/12942], Loss: 2.3579, Perplexity: 10.5688
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6256/12942], Loss: 2.0187, Perplexity: 7.5288
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [6257/12942], Loss: 2.4389, Perplexity: 11.4600
torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [6258/12942], Loss: 2.9204, Perplexity:

Epoch [1/1], Step [6320/12942], Loss: 2.3115, Perplexity: 10.0899
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6321/12942], Loss: 2.0771, Perplexity: 7.9811
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6322/12942], Loss: 1.9917, Perplexity: 7.3281
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6323/12942], Loss: 2.6485, Perplexity: 14.1324
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6324/12942], Loss: 2.1168, Perplexity: 8.3041
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6325/12942], Loss: 2.3998, Perplexity: 11.0209
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6326/12942], Loss: 2.2621, Perplexity: 9.6036
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6327/12942], Loss: 2.3071, Perplexity: 10.0455
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6328/12942], Loss: 2.3939, Perplexity: 10

Epoch [1/1], Step [6390/12942], Loss: 2.2815, Perplexity: 9.7916
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6391/12942], Loss: 2.4739, Perplexity: 11.8691
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6392/12942], Loss: 2.5135, Perplexity: 12.3484
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6393/12942], Loss: 2.4205, Perplexity: 11.2515
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [6394/12942], Loss: 2.6276, Perplexity: 13.8401
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6395/12942], Loss: 2.2536, Perplexity: 9.5219
torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [6396/12942], Loss: 2.1731, Perplexity: 8.7851
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6397/12942], Loss: 2.7158, Perplexity: 15.1167
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6398/12942], Loss: 2.3270, Perplexity: 1

Epoch [1/1], Step [6460/12942], Loss: 2.1528, Perplexity: 8.6093
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6461/12942], Loss: 2.3762, Perplexity: 10.7643
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6462/12942], Loss: 1.9886, Perplexity: 7.3053
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6463/12942], Loss: 2.5231, Perplexity: 12.4677
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6464/12942], Loss: 2.0863, Perplexity: 8.0548
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6465/12942], Loss: 2.1717, Perplexity: 8.7733
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6466/12942], Loss: 2.5488, Perplexity: 12.7915
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [6467/12942], Loss: 2.7829, Perplexity: 16.1652
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6468/12942], Loss: 2.2359, Perplexity: 9.

Epoch [1/1], Step [6530/12942], Loss: 2.5362, Perplexity: 12.6314
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6531/12942], Loss: 2.1550, Perplexity: 8.6275
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6532/12942], Loss: 2.3723, Perplexity: 10.7220
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6533/12942], Loss: 2.0391, Perplexity: 7.6837
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6534/12942], Loss: 2.4588, Perplexity: 11.6912
torch.Size([32, 25, 512])
torch.Size([32, 25, 8855])
Epoch [1/1], Step [6535/12942], Loss: 3.2290, Perplexity: 25.2551
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6536/12942], Loss: 2.1921, Perplexity: 8.9544
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6537/12942], Loss: 2.5045, Perplexity: 12.2372
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6538/12942], Loss: 2.4161, Perplexity: 1

Epoch [1/1], Step [6600/12942], Loss: 2.3754, Perplexity: 10.7552
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6601/12942], Loss: 2.5928, Perplexity: 13.3669
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6602/12942], Loss: 2.3940, Perplexity: 10.9575
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6603/12942], Loss: 2.0131, Perplexity: 7.4865
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6604/12942], Loss: 2.1527, Perplexity: 8.6077
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6605/12942], Loss: 2.3632, Perplexity: 10.6246
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [6606/12942], Loss: 2.6716, Perplexity: 14.4628
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6607/12942], Loss: 2.4231, Perplexity: 11.2803
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6608/12942], Loss: 2.0787, Perplexity:

Epoch [1/1], Step [6670/12942], Loss: 2.7106, Perplexity: 15.0388
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6671/12942], Loss: 2.5187, Perplexity: 12.4119
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6672/12942], Loss: 2.2701, Perplexity: 9.6804
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6673/12942], Loss: 1.9221, Perplexity: 6.8350
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6674/12942], Loss: 2.3161, Perplexity: 10.1360
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6675/12942], Loss: 2.3378, Perplexity: 10.3585
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6676/12942], Loss: 2.5053, Perplexity: 12.2468
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6677/12942], Loss: 2.0263, Perplexity: 7.5858
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6678/12942], Loss: 2.3087, Perplexity: 1

Epoch [1/1], Step [6740/12942], Loss: 2.2100, Perplexity: 9.1155
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6741/12942], Loss: 2.3412, Perplexity: 10.3941
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6742/12942], Loss: 2.2586, Perplexity: 9.5697
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [6743/12942], Loss: 2.5295, Perplexity: 12.5474
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6744/12942], Loss: 2.1457, Perplexity: 8.5482
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6745/12942], Loss: 2.2768, Perplexity: 9.7455
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6746/12942], Loss: 2.4610, Perplexity: 11.7166
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6747/12942], Loss: 2.0815, Perplexity: 8.0162
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [6748/12942], Loss: 2.5507, Perplexity: 12.

Epoch [1/1], Step [6810/12942], Loss: 2.0874, Perplexity: 8.0637
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6811/12942], Loss: 2.1309, Perplexity: 8.4228
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6812/12942], Loss: 2.3031, Perplexity: 10.0047
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [6813/12942], Loss: 2.2779, Perplexity: 9.7564
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [6814/12942], Loss: 2.6932, Perplexity: 14.7789
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6815/12942], Loss: 2.3783, Perplexity: 10.7866
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6816/12942], Loss: 2.3224, Perplexity: 10.2001
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6817/12942], Loss: 2.3365, Perplexity: 10.3454
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6818/12942], Loss: 2.3389, Perplexity: 1

Epoch [1/1], Step [6880/12942], Loss: 2.0629, Perplexity: 7.8687
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6881/12942], Loss: 2.3050, Perplexity: 10.0246
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6882/12942], Loss: 2.3782, Perplexity: 10.7857
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6883/12942], Loss: 2.3329, Perplexity: 10.3079
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6884/12942], Loss: 2.7856, Perplexity: 16.2090
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6885/12942], Loss: 2.3594, Perplexity: 10.5846
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [6886/12942], Loss: 2.0711, Perplexity: 7.9336
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [6887/12942], Loss: 2.3765, Perplexity: 10.7672
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6888/12942], Loss: 2.2895, Perplexity: 

Epoch [1/1], Step [6950/12942], Loss: 2.3944, Perplexity: 10.9621
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6951/12942], Loss: 2.2682, Perplexity: 9.6622
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6952/12942], Loss: 2.3621, Perplexity: 10.6137
torch.Size([32, 24, 512])
torch.Size([32, 24, 8855])
Epoch [1/1], Step [6953/12942], Loss: 3.2489, Perplexity: 25.7632
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [6954/12942], Loss: 2.2661, Perplexity: 9.6417
torch.Size([32, 25, 512])
torch.Size([32, 25, 8855])
Epoch [1/1], Step [6955/12942], Loss: 3.3441, Perplexity: 28.3357
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [6956/12942], Loss: 2.4950, Perplexity: 12.1222
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [6957/12942], Loss: 2.1377, Perplexity: 8.4797
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [6958/12942], Loss: 2.4165, Perplexity: 1

Epoch [1/1], Step [7020/12942], Loss: 2.2493, Perplexity: 9.4808
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7021/12942], Loss: 2.1973, Perplexity: 9.0008
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7022/12942], Loss: 2.3845, Perplexity: 10.8533
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7023/12942], Loss: 2.2536, Perplexity: 9.5221
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7024/12942], Loss: 2.2959, Perplexity: 9.9329
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7025/12942], Loss: 2.3655, Perplexity: 10.6498
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7026/12942], Loss: 2.5651, Perplexity: 13.0022
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7027/12942], Loss: 2.1929, Perplexity: 8.9610
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7028/12942], Loss: 2.6767, Perplexity: 14.

Epoch [1/1], Step [7090/12942], Loss: 2.8300, Perplexity: 16.9448
torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [7091/12942], Loss: 2.9299, Perplexity: 18.7254
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7092/12942], Loss: 2.2403, Perplexity: 9.3961
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7093/12942], Loss: 2.5670, Perplexity: 13.0264
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7094/12942], Loss: 2.2329, Perplexity: 9.3270
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7095/12942], Loss: 2.3584, Perplexity: 10.5738
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7096/12942], Loss: 2.2771, Perplexity: 9.7483
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7097/12942], Loss: 1.9916, Perplexity: 7.3274
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7098/12942], Loss: 2.2985, Perplexity: 9.

Epoch [1/1], Step [7160/12942], Loss: 2.4335, Perplexity: 11.3992
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7161/12942], Loss: 2.0642, Perplexity: 7.8789
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7162/12942], Loss: 2.2971, Perplexity: 9.9457
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7163/12942], Loss: 2.2927, Perplexity: 9.9016
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [7164/12942], Loss: 2.7157, Perplexity: 15.1145
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7165/12942], Loss: 2.0333, Perplexity: 7.6392
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7166/12942], Loss: 2.2520, Perplexity: 9.5066
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7167/12942], Loss: 2.2555, Perplexity: 9.5398
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7168/12942], Loss: 2.2185, Perplexity: 9.19

Epoch [1/1], Step [7230/12942], Loss: 2.2915, Perplexity: 9.8901
torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7231/12942], Loss: 2.4864, Perplexity: 12.0182
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7232/12942], Loss: 2.5446, Perplexity: 12.7380
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7233/12942], Loss: 2.2597, Perplexity: 9.5804
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7234/12942], Loss: 2.3722, Perplexity: 10.7204
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7235/12942], Loss: 2.2644, Perplexity: 9.6257
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7236/12942], Loss: 2.4460, Perplexity: 11.5426
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7237/12942], Loss: 2.2179, Perplexity: 9.1879
torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7238/12942], Loss: 2.3740, Perplexity: 10

Epoch [1/1], Step [7300/12942], Loss: 2.4021, Perplexity: 11.0461
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7301/12942], Loss: 2.3076, Perplexity: 10.0501
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7302/12942], Loss: 2.5014, Perplexity: 12.1995
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7303/12942], Loss: 2.3104, Perplexity: 10.0780
torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7304/12942], Loss: 2.5308, Perplexity: 12.5639
torch.Size([32, 23, 512])
torch.Size([32, 23, 8855])
Epoch [1/1], Step [7305/12942], Loss: 2.9866, Perplexity: 19.8176
torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7306/12942], Loss: 2.3905, Perplexity: 10.9192
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7307/12942], Loss: 2.2145, Perplexity: 9.1568
torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7308/12942], Loss: 2.1361, Perplexity

Epoch [1/1], Step [7370/12942], Loss: 2.1963, Perplexity: 8.9914torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7371/12942], Loss: 2.4303, Perplexity: 11.3626torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7372/12942], Loss: 2.3460, Perplexity: 10.4440torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [7373/12942], Loss: 2.4766, Perplexity: 11.9005torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [7374/12942], Loss: 2.4727, Perplexity: 11.8539torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7375/12942], Loss: 2.2401, Perplexity: 9.3939torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7376/12942], Loss: 2.3046, Perplexity: 10.0203torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7377/12942], Loss: 2.5132, Perplexity: 12.3445torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7378/12942], Loss: 2.2676, Perplexity: 

Epoch [1/1], Step [7440/12942], Loss: 3.0325, Perplexity: 20.7500torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7441/12942], Loss: 2.2446, Perplexity: 9.4366torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7442/12942], Loss: 2.0212, Perplexity: 7.5473torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7443/12942], Loss: 2.4361, Perplexity: 11.4283torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7444/12942], Loss: 2.2233, Perplexity: 9.2381torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7445/12942], Loss: 2.0575, Perplexity: 7.8262torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7446/12942], Loss: 2.3733, Perplexity: 10.7327torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7447/12942], Loss: 2.2639, Perplexity: 9.6208torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7448/12942], Loss: 1.9550, Perplexity: 7.0

Epoch [1/1], Step [7510/12942], Loss: 2.3336, Perplexity: 10.3146torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7511/12942], Loss: 2.4212, Perplexity: 11.2593torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7512/12942], Loss: 2.1530, Perplexity: 8.6105torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7513/12942], Loss: 2.4297, Perplexity: 11.3556torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7514/12942], Loss: 2.2311, Perplexity: 9.3097torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7515/12942], Loss: 2.5495, Perplexity: 12.8003torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7516/12942], Loss: 2.7636, Perplexity: 15.8561torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7517/12942], Loss: 1.9533, Perplexity: 7.0522torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7518/12942], Loss: 2.3505, Perplexity: 1

Epoch [1/1], Step [7580/12942], Loss: 2.4428, Perplexity: 11.5047torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [7581/12942], Loss: 2.7140, Perplexity: 15.0888torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [7582/12942], Loss: 2.7296, Perplexity: 15.3274torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7583/12942], Loss: 2.1244, Perplexity: 8.3676torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7584/12942], Loss: 2.0132, Perplexity: 7.4870torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7585/12942], Loss: 2.4144, Perplexity: 11.1832torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7586/12942], Loss: 2.3523, Perplexity: 10.5097torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7587/12942], Loss: 2.0788, Perplexity: 7.9950torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7588/12942], Loss: 2.2591, Perplexity: 9

Epoch [1/1], Step [7650/12942], Loss: 2.5539, Perplexity: 12.8566torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7651/12942], Loss: 2.2442, Perplexity: 9.4325torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7652/12942], Loss: 2.2337, Perplexity: 9.3342torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7653/12942], Loss: 2.3288, Perplexity: 10.2654torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7654/12942], Loss: 2.1961, Perplexity: 8.9902torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7655/12942], Loss: 2.1658, Perplexity: 8.7219torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7656/12942], Loss: 2.2833, Perplexity: 9.8093torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7657/12942], Loss: 2.3964, Perplexity: 10.9835torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7658/12942], Loss: 2.4697, Perplexity: 11.

Epoch [1/1], Step [7720/12942], Loss: 2.2540, Perplexity: 9.5261torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7721/12942], Loss: 2.3054, Perplexity: 10.0281torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [7722/12942], Loss: 2.6288, Perplexity: 13.8576torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7723/12942], Loss: 2.2259, Perplexity: 9.2614torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7724/12942], Loss: 2.2194, Perplexity: 9.2018torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7725/12942], Loss: 2.4139, Perplexity: 11.1771torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7726/12942], Loss: 2.0861, Perplexity: 8.0531torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7727/12942], Loss: 2.2367, Perplexity: 9.3627torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7728/12942], Loss: 2.2353, Perplexity: 9.3

Epoch [1/1], Step [7790/12942], Loss: 2.0767, Perplexity: 7.9780torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7791/12942], Loss: 2.3119, Perplexity: 10.0932torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7792/12942], Loss: 2.2547, Perplexity: 9.5326torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [7793/12942], Loss: 2.8099, Perplexity: 16.6090torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7794/12942], Loss: 2.4915, Perplexity: 12.0799torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7795/12942], Loss: 2.3596, Perplexity: 10.5866torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7796/12942], Loss: 2.4458, Perplexity: 11.5395torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7797/12942], Loss: 2.3514, Perplexity: 10.4999torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [7798/12942], Loss: 2.3231, Perplexity: 

Epoch [1/1], Step [7860/12942], Loss: 2.8214, Perplexity: 16.7997torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7861/12942], Loss: 2.5928, Perplexity: 13.3665torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7862/12942], Loss: 2.3969, Perplexity: 10.9896torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7863/12942], Loss: 1.9969, Perplexity: 7.3660torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [7864/12942], Loss: 2.4892, Perplexity: 12.0519torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7865/12942], Loss: 2.2905, Perplexity: 9.8794torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7866/12942], Loss: 1.8229, Perplexity: 6.1896torch.Size([32, 25, 512])
torch.Size([32, 25, 8855])
Epoch [1/1], Step [7867/12942], Loss: 3.0514, Perplexity: 21.1439torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7868/12942], Loss: 2.3480, Perplexity: 1

Epoch [1/1], Step [7930/12942], Loss: 2.3093, Perplexity: 10.0676torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7931/12942], Loss: 2.3297, Perplexity: 10.2747torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [7932/12942], Loss: 2.3406, Perplexity: 10.3874torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [7933/12942], Loss: 2.0586, Perplexity: 7.8346torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7934/12942], Loss: 2.2528, Perplexity: 9.5140torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [7935/12942], Loss: 2.3363, Perplexity: 10.3429torch.Size([32, 23, 512])
torch.Size([32, 23, 8855])
Epoch [1/1], Step [7936/12942], Loss: 3.1575, Perplexity: 23.5126torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [7937/12942], Loss: 2.3901, Perplexity: 10.9146torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [7938/12942], Loss: 2.7305, Perplexity: 

Epoch [1/1], Step [8000/12942], Loss: 2.4480, Perplexity: 11.5653
torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [8001/12942], Loss: 2.6580, Perplexity: 14.2683torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8002/12942], Loss: 2.3136, Perplexity: 10.1108torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8003/12942], Loss: 2.2408, Perplexity: 9.4006torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8004/12942], Loss: 2.2242, Perplexity: 9.2463torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8005/12942], Loss: 2.3593, Perplexity: 10.5832torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8006/12942], Loss: 2.4578, Perplexity: 11.6791torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8007/12942], Loss: 2.2304, Perplexity: 9.3038torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8008/12942], Loss: 2.2775, Perplexity: 

Epoch [1/1], Step [8070/12942], Loss: 2.1584, Perplexity: 8.6569torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8071/12942], Loss: 2.3531, Perplexity: 10.5180torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8072/12942], Loss: 2.2224, Perplexity: 9.2293torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8073/12942], Loss: 2.3479, Perplexity: 10.4631torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8074/12942], Loss: 2.1533, Perplexity: 8.6130torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8075/12942], Loss: 2.6318, Perplexity: 13.8994torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8076/12942], Loss: 2.0208, Perplexity: 7.5444torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8077/12942], Loss: 2.1549, Perplexity: 8.6268torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8078/12942], Loss: 2.5144, Perplexity: 12.

Epoch [1/1], Step [8140/12942], Loss: 2.4582, Perplexity: 11.6839torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [8141/12942], Loss: 2.6026, Perplexity: 13.4992torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8142/12942], Loss: 2.1740, Perplexity: 8.7933torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8143/12942], Loss: 2.0306, Perplexity: 7.6185torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8144/12942], Loss: 2.5954, Perplexity: 13.4019torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8145/12942], Loss: 2.7725, Perplexity: 15.9989torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8146/12942], Loss: 2.2686, Perplexity: 9.6662torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8147/12942], Loss: 2.2040, Perplexity: 9.0607torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8148/12942], Loss: 2.2123, Perplexity: 9.

Epoch [1/1], Step [8210/12942], Loss: 2.3715, Perplexity: 10.7139torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8211/12942], Loss: 2.5519, Perplexity: 12.8312torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8212/12942], Loss: 2.0460, Perplexity: 7.7367torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8213/12942], Loss: 2.1870, Perplexity: 8.9084torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8214/12942], Loss: 2.0037, Perplexity: 7.4161torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8215/12942], Loss: 2.6188, Perplexity: 13.7192torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8216/12942], Loss: 2.2004, Perplexity: 9.0287torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8217/12942], Loss: 2.0174, Perplexity: 7.5189torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8218/12942], Loss: 2.2232, Perplexity: 9.2

Epoch [1/1], Step [8280/12942], Loss: 2.2874, Perplexity: 9.8489torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8281/12942], Loss: 2.4740, Perplexity: 11.8700torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8282/12942], Loss: 2.3365, Perplexity: 10.3455torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8283/12942], Loss: 2.3166, Perplexity: 10.1409torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [8284/12942], Loss: 2.8808, Perplexity: 17.8287torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8285/12942], Loss: 2.4223, Perplexity: 11.2714torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8286/12942], Loss: 2.3906, Perplexity: 10.9203torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8287/12942], Loss: 2.2170, Perplexity: 9.1801torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8288/12942], Loss: 2.3638, Perplexity: 

Epoch [1/1], Step [8350/12942], Loss: 2.7080, Perplexity: 14.9985torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8351/12942], Loss: 2.2970, Perplexity: 9.9447torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8352/12942], Loss: 2.4820, Perplexity: 11.9654torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8353/12942], Loss: 2.3556, Perplexity: 10.5440torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [8354/12942], Loss: 2.6683, Perplexity: 14.4154torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8355/12942], Loss: 2.2341, Perplexity: 9.3380torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8356/12942], Loss: 2.1744, Perplexity: 8.7967torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8357/12942], Loss: 1.9831, Perplexity: 7.2652torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8358/12942], Loss: 2.5676, Perplexity: 13

Epoch [1/1], Step [8420/12942], Loss: 2.2918, Perplexity: 9.8931torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [8421/12942], Loss: 2.8635, Perplexity: 17.5228torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [8422/12942], Loss: 2.4043, Perplexity: 11.0702torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [8423/12942], Loss: 2.8934, Perplexity: 18.0541torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8424/12942], Loss: 2.0249, Perplexity: 7.5757torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8425/12942], Loss: 2.0509, Perplexity: 7.7746torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [8426/12942], Loss: 2.6517, Perplexity: 14.1775torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8427/12942], Loss: 2.1918, Perplexity: 8.9514torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [8428/12942], Loss: 3.0171, Perplexity: 20

Epoch [1/1], Step [8490/12942], Loss: 2.2700, Perplexity: 9.6797torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8491/12942], Loss: 2.2635, Perplexity: 9.6169torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8492/12942], Loss: 2.5262, Perplexity: 12.5057torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8493/12942], Loss: 2.3687, Perplexity: 10.6838torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8494/12942], Loss: 2.3190, Perplexity: 10.1655torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8495/12942], Loss: 2.2056, Perplexity: 9.0760torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8496/12942], Loss: 1.9591, Perplexity: 7.0928torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8497/12942], Loss: 2.3230, Perplexity: 10.2068torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8498/12942], Loss: 2.6048, Perplexity: 13

Epoch [1/1], Step [8560/12942], Loss: 2.2224, Perplexity: 9.2290torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8561/12942], Loss: 2.4735, Perplexity: 11.8642torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8562/12942], Loss: 2.2025, Perplexity: 9.0474torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8563/12942], Loss: 2.3657, Perplexity: 10.6514torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8564/12942], Loss: 2.0608, Perplexity: 7.8525torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [8565/12942], Loss: 2.7573, Perplexity: 15.7566torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [8566/12942], Loss: 2.7312, Perplexity: 15.3517torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8567/12942], Loss: 2.2796, Perplexity: 9.7724torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8568/12942], Loss: 1.9813, Perplexity: 7.

Epoch [1/1], Step [8630/12942], Loss: 2.3944, Perplexity: 10.9613torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8631/12942], Loss: 2.1448, Perplexity: 8.5404torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8632/12942], Loss: 2.2837, Perplexity: 9.8125torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8633/12942], Loss: 2.2646, Perplexity: 9.6268torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8634/12942], Loss: 2.1052, Perplexity: 8.2085torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8635/12942], Loss: 2.2725, Perplexity: 9.7039torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8636/12942], Loss: 2.1727, Perplexity: 8.7819torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8637/12942], Loss: 1.9778, Perplexity: 7.2269torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [8638/12942], Loss: 2.2112, Perplexity: 9.127

Epoch [1/1], Step [8700/12942], Loss: 2.1229, Perplexity: 8.3557
torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8701/12942], Loss: 2.2900, Perplexity: 9.8746torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8702/12942], Loss: 2.1539, Perplexity: 8.6182torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8703/12942], Loss: 2.1164, Perplexity: 8.3015torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8704/12942], Loss: 2.1908, Perplexity: 8.9428torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8705/12942], Loss: 2.1987, Perplexity: 9.0128torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8706/12942], Loss: 2.4849, Perplexity: 12.0001torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [8707/12942], Loss: 2.6032, Perplexity: 13.5070torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8708/12942], Loss: 2.3108, Perplexity: 10.

Epoch [1/1], Step [8770/12942], Loss: 2.3435, Perplexity: 10.4173torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8771/12942], Loss: 2.4162, Perplexity: 11.2028torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8772/12942], Loss: 2.1151, Perplexity: 8.2903torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8773/12942], Loss: 2.1870, Perplexity: 8.9088torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8774/12942], Loss: 2.2349, Perplexity: 9.3455torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8775/12942], Loss: 2.3467, Perplexity: 10.4512torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8776/12942], Loss: 2.3679, Perplexity: 10.6752torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8777/12942], Loss: 2.1566, Perplexity: 8.6417torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [8778/12942], Loss: 2.1520, Perplexity: 8.

Epoch [1/1], Step [8840/12942], Loss: 2.3265, Perplexity: 10.2416torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8841/12942], Loss: 2.3890, Perplexity: 10.9022torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8842/12942], Loss: 2.1858, Perplexity: 8.8980torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8843/12942], Loss: 2.2780, Perplexity: 9.7572torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [8844/12942], Loss: 2.3585, Perplexity: 10.5746torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [8845/12942], Loss: 2.6454, Perplexity: 14.0889torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8846/12942], Loss: 2.1804, Perplexity: 8.8497torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8847/12942], Loss: 2.0469, Perplexity: 7.7436torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8848/12942], Loss: 2.2634, Perplexity: 9.

Epoch [1/1], Step [8910/12942], Loss: 2.3046, Perplexity: 10.0202torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8911/12942], Loss: 1.9666, Perplexity: 7.1461torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8912/12942], Loss: 2.2012, Perplexity: 9.0359torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8913/12942], Loss: 2.1113, Perplexity: 8.2590torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8914/12942], Loss: 1.9779, Perplexity: 7.2274torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8915/12942], Loss: 2.1950, Perplexity: 8.9804torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8916/12942], Loss: 2.1486, Perplexity: 8.5729torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8917/12942], Loss: 2.2915, Perplexity: 9.8900torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8918/12942], Loss: 2.0749, Perplexity: 7.963

Epoch [1/1], Step [8980/12942], Loss: 2.2785, Perplexity: 9.7624torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8981/12942], Loss: 2.3141, Perplexity: 10.1161torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8982/12942], Loss: 2.2159, Perplexity: 9.1692torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [8983/12942], Loss: 2.5358, Perplexity: 12.6266torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [8984/12942], Loss: 2.5780, Perplexity: 13.1707torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [8985/12942], Loss: 2.1964, Perplexity: 8.9923torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [8986/12942], Loss: 2.0411, Perplexity: 7.6993torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [8987/12942], Loss: 2.4161, Perplexity: 11.2020torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [8988/12942], Loss: 2.2466, Perplexity: 9.

Epoch [1/1], Step [9050/12942], Loss: 2.6246, Perplexity: 13.7989torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9051/12942], Loss: 1.9545, Perplexity: 7.0603torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9052/12942], Loss: 2.7632, Perplexity: 15.8497torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9053/12942], Loss: 2.2687, Perplexity: 9.6666torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9054/12942], Loss: 2.2208, Perplexity: 9.2145torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9055/12942], Loss: 2.0271, Perplexity: 7.5922torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9056/12942], Loss: 2.4087, Perplexity: 11.1195torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9057/12942], Loss: 2.1646, Perplexity: 8.7114torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [9058/12942], Loss: 2.9086, Perplexity: 18.

Epoch [1/1], Step [9120/12942], Loss: 2.2109, Perplexity: 9.1240torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9121/12942], Loss: 2.1945, Perplexity: 8.9755torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9122/12942], Loss: 2.4702, Perplexity: 11.8246torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [9123/12942], Loss: 2.8061, Perplexity: 16.5459torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9124/12942], Loss: 2.4386, Perplexity: 11.4567torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9125/12942], Loss: 2.2900, Perplexity: 9.8747torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9126/12942], Loss: 2.5759, Perplexity: 13.1436torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9127/12942], Loss: 2.3618, Perplexity: 10.6102torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9128/12942], Loss: 2.3022, Perplexity: 9

Epoch [1/1], Step [9190/12942], Loss: 1.8097, Perplexity: 6.1089torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9191/12942], Loss: 2.3865, Perplexity: 10.8750torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9192/12942], Loss: 1.9832, Perplexity: 7.2658torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9193/12942], Loss: 2.4455, Perplexity: 11.5364torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9194/12942], Loss: 2.2008, Perplexity: 9.0324torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9195/12942], Loss: 2.1210, Perplexity: 8.3396torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9196/12942], Loss: 2.2971, Perplexity: 9.9451torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9197/12942], Loss: 2.6765, Perplexity: 14.5343torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9198/12942], Loss: 2.4126, Perplexity: 11.

Epoch [1/1], Step [9260/12942], Loss: 2.0908, Perplexity: 8.0913torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9261/12942], Loss: 2.1698, Perplexity: 8.7569torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9262/12942], Loss: 2.4314, Perplexity: 11.3747torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9263/12942], Loss: 2.3667, Perplexity: 10.6616torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9264/12942], Loss: 2.4287, Perplexity: 11.3439torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [9265/12942], Loss: 2.9406, Perplexity: 18.9265torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9266/12942], Loss: 2.3189, Perplexity: 10.1641torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9267/12942], Loss: 2.1296, Perplexity: 8.4115torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9268/12942], Loss: 2.1926, Perplexity: 8

Epoch [1/1], Step [9330/12942], Loss: 2.0903, Perplexity: 8.0874torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9331/12942], Loss: 2.3807, Perplexity: 10.8129torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9332/12942], Loss: 2.3446, Perplexity: 10.4288torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9333/12942], Loss: 2.1743, Perplexity: 8.7962torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [9334/12942], Loss: 2.7005, Perplexity: 14.8875torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9335/12942], Loss: 2.1635, Perplexity: 8.7012torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9336/12942], Loss: 2.0797, Perplexity: 8.0023torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9337/12942], Loss: 2.6199, Perplexity: 13.7343torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9338/12942], Loss: 2.2710, Perplexity: 9.

Epoch [1/1], Step [9400/12942], Loss: 2.1734, Perplexity: 8.7880
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9401/12942], Loss: 2.3306, Perplexity: 10.2844torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9402/12942], Loss: 2.4004, Perplexity: 11.0280torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9403/12942], Loss: 2.6055, Perplexity: 13.5378torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9404/12942], Loss: 2.1202, Perplexity: 8.3324torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9405/12942], Loss: 2.1981, Perplexity: 9.0076torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9406/12942], Loss: 2.5747, Perplexity: 13.1273torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9407/12942], Loss: 2.7006, Perplexity: 14.8884torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9408/12942], Loss: 2.3405, Perplexity: 

Epoch [1/1], Step [9470/12942], Loss: 2.0963, Perplexity: 8.1359torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9471/12942], Loss: 2.0916, Perplexity: 8.0981torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9472/12942], Loss: 2.0258, Perplexity: 7.5819torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9473/12942], Loss: 2.4895, Perplexity: 12.0547torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9474/12942], Loss: 2.3213, Perplexity: 10.1887torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9475/12942], Loss: 2.2289, Perplexity: 9.2898torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9476/12942], Loss: 2.4648, Perplexity: 11.7615torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9477/12942], Loss: 1.8066, Perplexity: 6.0898torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9478/12942], Loss: 2.0329, Perplexity: 7.6

Epoch [1/1], Step [9540/12942], Loss: 2.3651, Perplexity: 10.6454torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9541/12942], Loss: 2.0430, Perplexity: 7.7134torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9542/12942], Loss: 2.1904, Perplexity: 8.9386torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9543/12942], Loss: 2.2197, Perplexity: 9.2047torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9544/12942], Loss: 2.2849, Perplexity: 9.8251torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9545/12942], Loss: 2.2690, Perplexity: 9.6697torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9546/12942], Loss: 2.1479, Perplexity: 8.5668torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9547/12942], Loss: 2.2163, Perplexity: 9.1737torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9548/12942], Loss: 2.2468, Perplexity: 9.457

Epoch [1/1], Step [9610/12942], Loss: 2.3588, Perplexity: 10.5783torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9611/12942], Loss: 2.3371, Perplexity: 10.3516torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9612/12942], Loss: 2.6615, Perplexity: 14.3183torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9613/12942], Loss: 2.1392, Perplexity: 8.4930torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9614/12942], Loss: 2.3431, Perplexity: 10.4132torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9615/12942], Loss: 2.3429, Perplexity: 10.4113torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9616/12942], Loss: 2.2537, Perplexity: 9.5233torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9617/12942], Loss: 2.0902, Perplexity: 8.0863torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9618/12942], Loss: 2.2471, Perplexity: 9

Epoch [1/1], Step [9680/12942], Loss: 2.0016, Perplexity: 7.4011torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9681/12942], Loss: 2.3862, Perplexity: 10.8720torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9682/12942], Loss: 2.4749, Perplexity: 11.8801torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [9683/12942], Loss: 2.4524, Perplexity: 11.6161torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9684/12942], Loss: 2.3079, Perplexity: 10.0534torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [9685/12942], Loss: 2.7167, Perplexity: 15.1310torch.Size([32, 22, 512])
torch.Size([32, 22, 8855])
Epoch [1/1], Step [9686/12942], Loss: 3.3426, Perplexity: 28.2934torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9687/12942], Loss: 2.4927, Perplexity: 12.0939torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9688/12942], Loss: 2.2960, Perplexity:

Epoch [1/1], Step [9750/12942], Loss: 2.3008, Perplexity: 9.9826torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9751/12942], Loss: 2.2011, Perplexity: 9.0352torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9752/12942], Loss: 2.0966, Perplexity: 8.1384torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9753/12942], Loss: 2.0210, Perplexity: 7.5461torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9754/12942], Loss: 2.0547, Perplexity: 7.8042torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9755/12942], Loss: 2.1078, Perplexity: 8.2305torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9756/12942], Loss: 2.3227, Perplexity: 10.2031torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9757/12942], Loss: 2.2414, Perplexity: 9.4066torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9758/12942], Loss: 2.4754, Perplexity: 11.88

Epoch [1/1], Step [9820/12942], Loss: 2.3940, Perplexity: 10.9568torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9821/12942], Loss: 2.0469, Perplexity: 7.7441torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9822/12942], Loss: 2.4432, Perplexity: 11.5093torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9823/12942], Loss: 2.0506, Perplexity: 7.7727torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [9824/12942], Loss: 2.7369, Perplexity: 15.4391torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9825/12942], Loss: 2.3136, Perplexity: 10.1110torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9826/12942], Loss: 2.2921, Perplexity: 9.8955torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9827/12942], Loss: 2.3112, Perplexity: 10.0870torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [9828/12942], Loss: 2.0935, Perplexity: 8

Epoch [1/1], Step [9890/12942], Loss: 2.1451, Perplexity: 8.5426torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9891/12942], Loss: 2.2566, Perplexity: 9.5502torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9892/12942], Loss: 2.5514, Perplexity: 12.8244torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [9893/12942], Loss: 2.4209, Perplexity: 11.2557torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [9894/12942], Loss: 2.5533, Perplexity: 12.8492torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9895/12942], Loss: 2.4152, Perplexity: 11.1918torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9896/12942], Loss: 2.2649, Perplexity: 9.6297torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9897/12942], Loss: 2.1588, Perplexity: 8.6609torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [9898/12942], Loss: 2.0528, Perplexity: 7.

Epoch [1/1], Step [9960/12942], Loss: 2.3642, Perplexity: 10.6352torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9961/12942], Loss: 2.0357, Perplexity: 7.6578torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9962/12942], Loss: 2.3103, Perplexity: 10.0779torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9963/12942], Loss: 2.0103, Perplexity: 7.4657torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9964/12942], Loss: 2.2907, Perplexity: 9.8817torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [9965/12942], Loss: 2.0244, Perplexity: 7.5714torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [9966/12942], Loss: 2.1584, Perplexity: 8.6571torch.Size([32, 20, 512])
torch.Size([32, 20, 8855])
Epoch [1/1], Step [9967/12942], Loss: 2.8118, Perplexity: 16.6399torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [9968/12942], Loss: 2.2013, Perplexity: 9.0

Epoch [1/1], Step [10029/12942], Loss: 2.3419, Perplexity: 10.4013torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10030/12942], Loss: 2.2683, Perplexity: 9.6626torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10031/12942], Loss: 2.2191, Perplexity: 9.1988torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10032/12942], Loss: 1.8694, Perplexity: 6.4842torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10033/12942], Loss: 2.4630, Perplexity: 11.7403torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10034/12942], Loss: 1.9394, Perplexity: 6.9549torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10035/12942], Loss: 2.5006, Perplexity: 12.1897torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10036/12942], Loss: 2.2249, Perplexity: 9.2528torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10037/12942], Loss: 2.3068, Perple

Epoch [1/1], Step [10098/12942], Loss: 2.1027, Perplexity: 8.1879torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10099/12942], Loss: 2.0695, Perplexity: 7.9211torch.Size([32, 22, 512])
torch.Size([32, 22, 8855])
Epoch [1/1], Step [10100/12942], Loss: 3.0098, Perplexity: 20.2842
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10101/12942], Loss: 2.3060, Perplexity: 10.0340torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [10102/12942], Loss: 2.3502, Perplexity: 10.4881torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10103/12942], Loss: 2.1862, Perplexity: 8.9010torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10104/12942], Loss: 2.3280, Perplexity: 10.2571torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10105/12942], Loss: 2.2116, Perplexity: 9.1305torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10106/12942], Loss: 2.5355, Perp

Epoch [1/1], Step [10167/12942], Loss: 2.3681, Perplexity: 10.6775torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10168/12942], Loss: 2.1061, Perplexity: 8.2162torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10169/12942], Loss: 2.1599, Perplexity: 8.6707torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10170/12942], Loss: 2.1042, Perplexity: 8.2007torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10171/12942], Loss: 1.9569, Perplexity: 7.0771torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10172/12942], Loss: 2.1461, Perplexity: 8.5516torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10173/12942], Loss: 2.4973, Perplexity: 12.1493torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10174/12942], Loss: 2.2713, Perplexity: 9.6919torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10175/12942], Loss: 2.2844, Perplex

Epoch [1/1], Step [10236/12942], Loss: 2.1661, Perplexity: 8.7242torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10237/12942], Loss: 2.3488, Perplexity: 10.4725torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10238/12942], Loss: 2.4636, Perplexity: 11.7474torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10239/12942], Loss: 2.0119, Perplexity: 7.4774torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10240/12942], Loss: 2.4146, Perplexity: 11.1848torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10241/12942], Loss: 2.2452, Perplexity: 9.4420torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10242/12942], Loss: 2.2371, Perplexity: 9.3661torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10243/12942], Loss: 2.6838, Perplexity: 14.6403torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10244/12942], Loss: 2.1993, Perpl

Epoch [1/1], Step [10305/12942], Loss: 1.9811, Perplexity: 7.2510torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10306/12942], Loss: 2.1290, Perplexity: 8.4065torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10307/12942], Loss: 2.2352, Perplexity: 9.3484torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [10308/12942], Loss: 2.5949, Perplexity: 13.3956torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10309/12942], Loss: 2.1445, Perplexity: 8.5382torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10310/12942], Loss: 2.3960, Perplexity: 10.9790torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [10311/12942], Loss: 2.2211, Perplexity: 9.2174torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10312/12942], Loss: 2.3165, Perplexity: 10.1404torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10313/12942], Loss: 2.5005, Perple

Epoch [1/1], Step [10374/12942], Loss: 2.4434, Perplexity: 11.5127torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10375/12942], Loss: 2.0617, Perplexity: 7.8593torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10376/12942], Loss: 2.4093, Perplexity: 11.1261torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10377/12942], Loss: 2.4523, Perplexity: 11.6151torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [10378/12942], Loss: 2.7092, Perplexity: 15.0168torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10379/12942], Loss: 2.0614, Perplexity: 7.8566torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10380/12942], Loss: 2.2695, Perplexity: 9.6743torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10381/12942], Loss: 2.3293, Perplexity: 10.2704torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10382/12942], Loss: 2.2195, Perp

Epoch [1/1], Step [10443/12942], Loss: 2.4174, Perplexity: 11.2167torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10444/12942], Loss: 2.0939, Perplexity: 8.1165torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10445/12942], Loss: 1.9453, Perplexity: 6.9959torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10446/12942], Loss: 2.3267, Perplexity: 10.2437torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10447/12942], Loss: 2.5528, Perplexity: 12.8428torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10448/12942], Loss: 2.1299, Perplexity: 8.4139torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10449/12942], Loss: 1.9750, Perplexity: 7.2066torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10450/12942], Loss: 2.4255, Perplexity: 11.3077torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10451/12942], Loss: 2.7528, Perpl

Epoch [1/1], Step [10512/12942], Loss: 2.0779, Perplexity: 7.9873torch.Size([32, 21, 512])
torch.Size([32, 21, 8855])
Epoch [1/1], Step [10513/12942], Loss: 2.9287, Perplexity: 18.7029torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10514/12942], Loss: 2.4317, Perplexity: 11.3778torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10515/12942], Loss: 2.3743, Perplexity: 10.7439torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10516/12942], Loss: 2.4565, Perplexity: 11.6643torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10517/12942], Loss: 2.6080, Perplexity: 13.5719torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [10518/12942], Loss: 2.9630, Perplexity: 19.3568torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10519/12942], Loss: 2.0686, Perplexity: 7.9138torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10520/12942], Loss: 2.0769, Per

Epoch [1/1], Step [10581/12942], Loss: 2.3798, Perplexity: 10.8023torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [10582/12942], Loss: 2.4108, Perplexity: 11.1425torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10583/12942], Loss: 2.2567, Perplexity: 9.5520torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10584/12942], Loss: 2.3908, Perplexity: 10.9224torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10585/12942], Loss: 2.0909, Perplexity: 8.0925torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10586/12942], Loss: 2.2699, Perplexity: 9.6786torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10587/12942], Loss: 2.2402, Perplexity: 9.3949torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10588/12942], Loss: 2.1402, Perplexity: 8.5010torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10589/12942], Loss: 2.2660, Perple

Epoch [1/1], Step [10650/12942], Loss: 2.2975, Perplexity: 9.9494torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10651/12942], Loss: 1.9351, Perplexity: 6.9244torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10652/12942], Loss: 2.0962, Perplexity: 8.1352torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10653/12942], Loss: 2.0649, Perplexity: 7.8843torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10654/12942], Loss: 2.1154, Perplexity: 8.2932torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10655/12942], Loss: 2.0918, Perplexity: 8.0998torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [10656/12942], Loss: 2.7187, Perplexity: 15.1604torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10657/12942], Loss: 2.3007, Perplexity: 9.9807torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10658/12942], Loss: 2.1102, Perplexi

Epoch [1/1], Step [10719/12942], Loss: 2.6684, Perplexity: 14.4168torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10720/12942], Loss: 2.0577, Perplexity: 7.8280torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10721/12942], Loss: 2.3828, Perplexity: 10.8351torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10722/12942], Loss: 2.4894, Perplexity: 12.0540torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10723/12942], Loss: 2.2335, Perplexity: 9.3326torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10724/12942], Loss: 2.2054, Perplexity: 9.0743torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10725/12942], Loss: 1.8833, Perplexity: 6.5754torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10726/12942], Loss: 2.4047, Perplexity: 11.0751torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10727/12942], Loss: 2.0763, Perpl

Epoch [1/1], Step [10788/12942], Loss: 2.0343, Perplexity: 7.6469torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10789/12942], Loss: 2.0686, Perplexity: 7.9135torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10790/12942], Loss: 2.6888, Perplexity: 14.7141torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10791/12942], Loss: 2.3994, Perplexity: 11.0170torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10792/12942], Loss: 1.8703, Perplexity: 6.4901torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10793/12942], Loss: 2.0834, Perplexity: 8.0316torch.Size([32, 25, 512])
torch.Size([32, 25, 8855])
Epoch [1/1], Step [10794/12942], Loss: 3.4290, Perplexity: 30.8447torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [10795/12942], Loss: 2.5197, Perplexity: 12.4244torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10796/12942], Loss: 2.1960, Perpl

Epoch [1/1], Step [10857/12942], Loss: 2.2691, Perplexity: 9.6704torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10858/12942], Loss: 2.2730, Perplexity: 9.7090torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10859/12942], Loss: 2.1678, Perplexity: 8.7390torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10860/12942], Loss: 1.9969, Perplexity: 7.3660torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10861/12942], Loss: 2.2952, Perplexity: 9.9261torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10862/12942], Loss: 2.0951, Perplexity: 8.1259torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [10863/12942], Loss: 2.6048, Perplexity: 13.5283torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10864/12942], Loss: 2.1756, Perplexity: 8.8078torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10865/12942], Loss: 2.0354, Perplexi

Epoch [1/1], Step [10926/12942], Loss: 2.6801, Perplexity: 14.5871torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10927/12942], Loss: 2.1400, Perplexity: 8.4996torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10928/12942], Loss: 2.1165, Perplexity: 8.3024torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10929/12942], Loss: 1.8841, Perplexity: 6.5801torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10930/12942], Loss: 2.5743, Perplexity: 13.1220torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [10931/12942], Loss: 2.3574, Perplexity: 10.5633torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [10932/12942], Loss: 2.2502, Perplexity: 9.4897torch.Size([32, 9, 512])
torch.Size([32, 9, 8855])
Epoch [1/1], Step [10933/12942], Loss: 2.5326, Perplexity: 12.5858torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [10934/12942], Loss: 2.3783, Perplex

Epoch [1/1], Step [10995/12942], Loss: 2.4071, Perplexity: 11.1019torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10996/12942], Loss: 2.2793, Perplexity: 9.7700torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [10997/12942], Loss: 2.5625, Perplexity: 12.9676torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [10998/12942], Loss: 2.4499, Perplexity: 11.5873torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [10999/12942], Loss: 1.9622, Perplexity: 7.1148torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11000/12942], Loss: 1.8137, Perplexity: 6.1331
torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [11001/12942], Loss: 2.6221, Perplexity: 13.7646torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11002/12942], Loss: 2.1535, Perplexity: 8.6147torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11003/12942], Loss: 2.1644, Perp

Epoch [1/1], Step [11064/12942], Loss: 2.4797, Perplexity: 11.9379torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11065/12942], Loss: 2.1055, Perplexity: 8.2110torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11066/12942], Loss: 2.4534, Perplexity: 11.6276torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11067/12942], Loss: 2.3936, Perplexity: 10.9525torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11068/12942], Loss: 2.2293, Perplexity: 9.2935torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [11069/12942], Loss: 2.4416, Perplexity: 11.4914torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11070/12942], Loss: 2.0049, Perplexity: 7.4255torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11071/12942], Loss: 2.1386, Perplexity: 8.4879torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11072/12942], Loss: 2.1942, Perpl

Epoch [1/1], Step [11133/12942], Loss: 2.3259, Perplexity: 10.2363torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11134/12942], Loss: 2.2176, Perplexity: 9.1852torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11135/12942], Loss: 2.0289, Perplexity: 7.6056torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11136/12942], Loss: 1.8722, Perplexity: 6.5023torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11137/12942], Loss: 2.0431, Perplexity: 7.7143torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [11138/12942], Loss: 2.7173, Perplexity: 15.1394torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11139/12942], Loss: 2.2728, Perplexity: 9.7064torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11140/12942], Loss: 2.3973, Perplexity: 10.9929torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11141/12942], Loss: 2.6013, Perple

Epoch [1/1], Step [11202/12942], Loss: 2.3180, Perplexity: 10.1552torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11203/12942], Loss: 2.3477, Perplexity: 10.4615torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11204/12942], Loss: 2.2388, Perplexity: 9.3817torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11205/12942], Loss: 1.9743, Perplexity: 7.2013torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11206/12942], Loss: 2.3374, Perplexity: 10.3543torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11207/12942], Loss: 2.1893, Perplexity: 8.9290torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11208/12942], Loss: 2.5133, Perplexity: 12.3459torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11209/12942], Loss: 2.2430, Perplexity: 9.4217torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11210/12942], Loss: 2.3224, Perpl

Epoch [1/1], Step [11271/12942], Loss: 2.3723, Perplexity: 10.7216torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11272/12942], Loss: 2.1170, Perplexity: 8.3058torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [11273/12942], Loss: 2.5766, Perplexity: 13.1518torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11274/12942], Loss: 2.0775, Perplexity: 7.9848torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11275/12942], Loss: 2.0302, Perplexity: 7.6153torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11276/12942], Loss: 2.1417, Perplexity: 8.5142torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11277/12942], Loss: 2.0993, Perplexity: 8.1608torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11278/12942], Loss: 2.0945, Perplexity: 8.1211torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11279/12942], Loss: 2.1661, Perplex

Epoch [1/1], Step [11340/12942], Loss: 2.1768, Perplexity: 8.8181torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11341/12942], Loss: 2.4364, Perplexity: 11.4318torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11342/12942], Loss: 2.2066, Perplexity: 9.0844torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11343/12942], Loss: 2.3350, Perplexity: 10.3290torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11344/12942], Loss: 1.9476, Perplexity: 7.0116torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11345/12942], Loss: 2.1515, Perplexity: 8.5973torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11346/12942], Loss: 2.0923, Perplexity: 8.1038torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11347/12942], Loss: 2.0368, Perplexity: 7.6658torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11348/12942], Loss: 2.1355, Perplex

Epoch [1/1], Step [11409/12942], Loss: 2.1675, Perplexity: 8.7366torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11410/12942], Loss: 2.1174, Perplexity: 8.3093torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11411/12942], Loss: 2.3252, Perplexity: 10.2284torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11412/12942], Loss: 2.0169, Perplexity: 7.5153torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [11413/12942], Loss: 2.7419, Perplexity: 15.5166torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11414/12942], Loss: 2.2763, Perplexity: 9.7401torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11415/12942], Loss: 2.2893, Perplexity: 9.8678torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [11416/12942], Loss: 2.1777, Perplexity: 8.8256torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11417/12942], Loss: 2.1550, Perplex

Epoch [1/1], Step [11478/12942], Loss: 2.4333, Perplexity: 11.3969torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11479/12942], Loss: 2.4637, Perplexity: 11.7483torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11480/12942], Loss: 2.3269, Perplexity: 10.2458torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11481/12942], Loss: 2.0533, Perplexity: 7.7934torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11482/12942], Loss: 2.1870, Perplexity: 8.9088torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11483/12942], Loss: 2.1071, Perplexity: 8.2240torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11484/12942], Loss: 2.0324, Perplexity: 7.6321torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11485/12942], Loss: 2.2413, Perplexity: 9.4052torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11486/12942], Loss: 2.1502, Perple

Epoch [1/1], Step [11547/12942], Loss: 2.0209, Perplexity: 7.5450torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11548/12942], Loss: 2.2884, Perplexity: 9.8590torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [11549/12942], Loss: 2.4735, Perplexity: 11.8635torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11550/12942], Loss: 2.2287, Perplexity: 9.2874torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11551/12942], Loss: 2.2645, Perplexity: 9.6262torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11552/12942], Loss: 2.3049, Perplexity: 10.0231torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11553/12942], Loss: 2.3519, Perplexity: 10.5053torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11554/12942], Loss: 2.0653, Perplexity: 7.8874torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11555/12942], Loss: 1.9423, Perple

Epoch [1/1], Step [11616/12942], Loss: 2.3430, Perplexity: 10.4121torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11617/12942], Loss: 2.1116, Perplexity: 8.2611torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11618/12942], Loss: 2.3903, Perplexity: 10.9167torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11619/12942], Loss: 2.3505, Perplexity: 10.4903torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11620/12942], Loss: 2.0964, Perplexity: 8.1366torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11621/12942], Loss: 2.1616, Perplexity: 8.6849torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11622/12942], Loss: 2.1853, Perplexity: 8.8932torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11623/12942], Loss: 2.2664, Perplexity: 9.6449torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11624/12942], Loss: 2.0641, Perple

Epoch [1/1], Step [11685/12942], Loss: 2.1613, Perplexity: 8.6828torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11686/12942], Loss: 2.2916, Perplexity: 9.8903torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11687/12942], Loss: 1.8891, Perplexity: 6.6135torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11688/12942], Loss: 2.4735, Perplexity: 11.8635torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11689/12942], Loss: 2.4060, Perplexity: 11.0895torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11690/12942], Loss: 2.0702, Perplexity: 7.9264torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11691/12942], Loss: 2.3230, Perplexity: 10.2061torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11692/12942], Loss: 2.3201, Perplexity: 10.1772torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11693/12942], Loss: 2.0811, Perpl

Epoch [1/1], Step [11754/12942], Loss: 2.6135, Perplexity: 13.6472torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11755/12942], Loss: 2.1757, Perplexity: 8.8087torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11756/12942], Loss: 2.4596, Perplexity: 11.7007torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11757/12942], Loss: 2.1167, Perplexity: 8.3039torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11758/12942], Loss: 2.2846, Perplexity: 9.8222torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11759/12942], Loss: 2.2102, Perplexity: 9.1174torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11760/12942], Loss: 2.0699, Perplexity: 7.9239torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11761/12942], Loss: 1.9787, Perplexity: 7.2334torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11762/12942], Loss: 2.2593, Perplex

Epoch [1/1], Step [11823/12942], Loss: 2.1692, Perplexity: 8.7516torch.Size([32, 9, 512])
torch.Size([32, 9, 8855])
Epoch [1/1], Step [11824/12942], Loss: 2.3892, Perplexity: 10.9048torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11825/12942], Loss: 2.0534, Perplexity: 7.7940torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11826/12942], Loss: 2.4922, Perplexity: 12.0883torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11827/12942], Loss: 1.9875, Perplexity: 7.2974torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11828/12942], Loss: 2.0790, Perplexity: 7.9965torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [11829/12942], Loss: 2.4387, Perplexity: 11.4577torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11830/12942], Loss: 2.2654, Perplexity: 9.6352torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11831/12942], Loss: 2.0247, Perplexi

Epoch [1/1], Step [11892/12942], Loss: 2.1352, Perplexity: 8.4589torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11893/12942], Loss: 2.6701, Perplexity: 14.4415torch.Size([32, 25, 512])
torch.Size([32, 25, 8855])
Epoch [1/1], Step [11894/12942], Loss: 2.9948, Perplexity: 19.9814torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11895/12942], Loss: 2.0453, Perplexity: 7.7313torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11896/12942], Loss: 2.7733, Perplexity: 16.0116torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [11897/12942], Loss: 2.4892, Perplexity: 12.0514torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11898/12942], Loss: 2.1813, Perplexity: 8.8582torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11899/12942], Loss: 1.9903, Perplexity: 7.3174torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11900/12942], Loss: 2.0252, Perpl

Epoch [1/1], Step [11961/12942], Loss: 2.1054, Perplexity: 8.2105torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11962/12942], Loss: 2.3465, Perplexity: 10.4492torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [11963/12942], Loss: 2.0743, Perplexity: 7.9586torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [11964/12942], Loss: 2.0849, Perplexity: 8.0440torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11965/12942], Loss: 2.2246, Perplexity: 9.2494torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [11966/12942], Loss: 2.5185, Perplexity: 12.4095torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [11967/12942], Loss: 2.0030, Perplexity: 7.4113torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11968/12942], Loss: 2.3167, Perplexity: 10.1417torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [11969/12942], Loss: 2.0736, Perple

Epoch [1/1], Step [12030/12942], Loss: 2.0285, Perplexity: 7.6024torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12031/12942], Loss: 1.8299, Perplexity: 6.2335torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12032/12942], Loss: 2.1069, Perplexity: 8.2228torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12033/12942], Loss: 2.1649, Perplexity: 8.7135torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12034/12942], Loss: 2.0770, Perplexity: 7.9804torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12035/12942], Loss: 2.6067, Perplexity: 13.5542torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12036/12942], Loss: 2.0657, Perplexity: 7.8907torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12037/12942], Loss: 2.2218, Perplexity: 9.2237torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12038/12942], Loss: 2.0915, Perplexi

Epoch [1/1], Step [12099/12942], Loss: 2.1578, Perplexity: 8.6522torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12100/12942], Loss: 1.9309, Perplexity: 6.8959
torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12101/12942], Loss: 2.2430, Perplexity: 9.4217torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12102/12942], Loss: 1.8661, Perplexity: 6.4632torch.Size([32, 26, 512])
torch.Size([32, 26, 8855])
Epoch [1/1], Step [12103/12942], Loss: 3.2502, Perplexity: 25.7964torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12104/12942], Loss: 1.9370, Perplexity: 6.9382torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12105/12942], Loss: 2.1915, Perplexity: 8.9484torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [12106/12942], Loss: 2.3700, Perplexity: 10.6977torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12107/12942], Loss: 2.3680, Perple

Epoch [1/1], Step [12168/12942], Loss: 2.2830, Perplexity: 9.8060torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12169/12942], Loss: 2.2453, Perplexity: 9.4429torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12170/12942], Loss: 2.2886, Perplexity: 9.8611torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12171/12942], Loss: 2.1867, Perplexity: 8.9060torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12172/12942], Loss: 2.1508, Perplexity: 8.5920torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12173/12942], Loss: 2.2196, Perplexity: 9.2039torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12174/12942], Loss: 2.1751, Perplexity: 8.8033torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12175/12942], Loss: 1.9608, Perplexity: 7.1049torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12176/12942], Loss: 2.0110, Perplexit

Epoch [1/1], Step [12237/12942], Loss: 2.3037, Perplexity: 10.0114torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12238/12942], Loss: 2.2503, Perplexity: 9.4907torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12239/12942], Loss: 2.2265, Perplexity: 9.2677torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12240/12942], Loss: 2.3375, Perplexity: 10.3556torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12241/12942], Loss: 2.2176, Perplexity: 9.1851torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12242/12942], Loss: 1.9764, Perplexity: 7.2168torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12243/12942], Loss: 2.0559, Perplexity: 7.8139torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12244/12942], Loss: 2.0005, Perplexity: 7.3926torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [12245/12942], Loss: 2.5235, Perplex

Epoch [1/1], Step [12306/12942], Loss: 2.1669, Perplexity: 8.7316torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12307/12942], Loss: 2.4240, Perplexity: 11.2907torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12308/12942], Loss: 2.0916, Perplexity: 8.0979torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12309/12942], Loss: 2.2629, Perplexity: 9.6109torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12310/12942], Loss: 2.1922, Perplexity: 8.9549torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12311/12942], Loss: 2.3210, Perplexity: 10.1857torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12312/12942], Loss: 2.0851, Perplexity: 8.0453torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12313/12942], Loss: 2.0644, Perplexity: 7.8806torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12314/12942], Loss: 2.1532, Perplex

Epoch [1/1], Step [12375/12942], Loss: 2.1827, Perplexity: 8.8702torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12376/12942], Loss: 2.2333, Perplexity: 9.3305torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12377/12942], Loss: 2.3634, Perplexity: 10.6266torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12378/12942], Loss: 2.0378, Perplexity: 7.6736torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12379/12942], Loss: 2.0542, Perplexity: 7.8004torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12380/12942], Loss: 2.2056, Perplexity: 9.0759torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12381/12942], Loss: 2.7457, Perplexity: 15.5762torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12382/12942], Loss: 2.0787, Perplexity: 7.9939torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12383/12942], Loss: 2.0900, Perplex

Epoch [1/1], Step [12444/12942], Loss: 1.9088, Perplexity: 6.7447torch.Size([32, 19, 512])
torch.Size([32, 19, 8855])
Epoch [1/1], Step [12445/12942], Loss: 2.8063, Perplexity: 16.5492torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12446/12942], Loss: 2.1377, Perplexity: 8.4800torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12447/12942], Loss: 2.6293, Perplexity: 13.8641torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12448/12942], Loss: 2.3558, Perplexity: 10.5464torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12449/12942], Loss: 2.2724, Perplexity: 9.7025torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12450/12942], Loss: 2.5296, Perplexity: 12.5489torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12451/12942], Loss: 1.9706, Perplexity: 7.1747torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [12452/12942], Loss: 2.6110, Perpl

Epoch [1/1], Step [12513/12942], Loss: 2.2633, Perplexity: 9.6147torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12514/12942], Loss: 1.9324, Perplexity: 6.9058torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12515/12942], Loss: 1.9956, Perplexity: 7.3570torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12516/12942], Loss: 2.1947, Perplexity: 8.9770torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12517/12942], Loss: 2.4514, Perplexity: 11.6042torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12518/12942], Loss: 2.2484, Perplexity: 9.4726torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12519/12942], Loss: 2.1323, Perplexity: 8.4346torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12520/12942], Loss: 1.9806, Perplexity: 7.2470torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12521/12942], Loss: 2.1203, Perplexi

Epoch [1/1], Step [12582/12942], Loss: 2.0229, Perplexity: 7.5603torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12583/12942], Loss: 1.8214, Perplexity: 6.1805torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12584/12942], Loss: 2.0184, Perplexity: 7.5261torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12585/12942], Loss: 2.2248, Perplexity: 9.2521torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12586/12942], Loss: 2.0964, Perplexity: 8.1369torch.Size([32, 17, 512])
torch.Size([32, 17, 8855])
Epoch [1/1], Step [12587/12942], Loss: 2.6200, Perplexity: 13.7361torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12588/12942], Loss: 2.3264, Perplexity: 10.2415torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12589/12942], Loss: 2.3568, Perplexity: 10.5569torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12590/12942], Loss: 2.1755, Perple

Epoch [1/1], Step [12651/12942], Loss: 2.3319, Perplexity: 10.2978torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12652/12942], Loss: 2.2373, Perplexity: 9.3680torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12653/12942], Loss: 2.5055, Perplexity: 12.2501torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12654/12942], Loss: 2.6603, Perplexity: 14.3003torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12655/12942], Loss: 2.0853, Perplexity: 8.0471torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12656/12942], Loss: 2.1334, Perplexity: 8.4438torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12657/12942], Loss: 2.5502, Perplexity: 12.8097torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12658/12942], Loss: 1.8250, Perplexity: 6.2025torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12659/12942], Loss: 1.9882, Perpl

Epoch [1/1], Step [12720/12942], Loss: 2.7179, Perplexity: 15.1484torch.Size([32, 10, 512])
torch.Size([32, 10, 8855])
Epoch [1/1], Step [12721/12942], Loss: 2.2130, Perplexity: 9.1428torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12722/12942], Loss: 2.3267, Perplexity: 10.2441torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12723/12942], Loss: 2.1698, Perplexity: 8.7564torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12724/12942], Loss: 1.9204, Perplexity: 6.8239torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12725/12942], Loss: 1.9803, Perplexity: 7.2447torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12726/12942], Loss: 2.0552, Perplexity: 7.8083torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12727/12942], Loss: 1.9049, Perplexity: 6.7187torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12728/12942], Loss: 2.3616, Perplex

Epoch [1/1], Step [12789/12942], Loss: 2.5921, Perplexity: 13.3579torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12790/12942], Loss: 2.3664, Perplexity: 10.6587torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12791/12942], Loss: 2.3328, Perplexity: 10.3066torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12792/12942], Loss: 2.1056, Perplexity: 8.2124torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12793/12942], Loss: 1.9117, Perplexity: 6.7648torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12794/12942], Loss: 2.0558, Perplexity: 7.8130torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12795/12942], Loss: 1.9907, Perplexity: 7.3206torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12796/12942], Loss: 2.0677, Perplexity: 7.9064torch.Size([32, 15, 512])
torch.Size([32, 15, 8855])
Epoch [1/1], Step [12797/12942], Loss: 2.1311, Perple

Epoch [1/1], Step [12858/12942], Loss: 2.0952, Perplexity: 8.1269torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12859/12942], Loss: 2.1010, Perplexity: 8.1740torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12860/12942], Loss: 2.2430, Perplexity: 9.4218torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12861/12942], Loss: 2.0467, Perplexity: 7.7424torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12862/12942], Loss: 2.1260, Perplexity: 8.3813torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12863/12942], Loss: 1.9379, Perplexity: 6.9443torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12864/12942], Loss: 2.2207, Perplexity: 9.2139torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12865/12942], Loss: 2.1188, Perplexity: 8.3213torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12866/12942], Loss: 2.1396, Perplexit

Epoch [1/1], Step [12927/12942], Loss: 1.8045, Perplexity: 6.0768torch.Size([32, 13, 512])
torch.Size([32, 13, 8855])
Epoch [1/1], Step [12928/12942], Loss: 2.0989, Perplexity: 8.1571torch.Size([32, 11, 512])
torch.Size([32, 11, 8855])
Epoch [1/1], Step [12929/12942], Loss: 2.3900, Perplexity: 10.9135torch.Size([32, 18, 512])
torch.Size([32, 18, 8855])
Epoch [1/1], Step [12930/12942], Loss: 2.3865, Perplexity: 10.8749torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12931/12942], Loss: 2.0397, Perplexity: 7.6882torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12932/12942], Loss: 2.4289, Perplexity: 11.3460torch.Size([32, 14, 512])
torch.Size([32, 14, 8855])
Epoch [1/1], Step [12933/12942], Loss: 2.1313, Perplexity: 8.4260torch.Size([32, 16, 512])
torch.Size([32, 16, 8855])
Epoch [1/1], Step [12934/12942], Loss: 2.5094, Perplexity: 12.2972torch.Size([32, 12, 512])
torch.Size([32, 12, 8855])
Epoch [1/1], Step [12935/12942], Loss: 2.1501, Perpl

<a id='step3'></a>
## Step 3: (Optional) Validate your Model

One way to check for overfitting is to measure performance on a validation set.  If you decide to do this **optional** task, you must first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove very useful here. 

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/val2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.
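
As a starting point for **data_loader_val.py**, the sketch below groups the reference captions by image. It assumes the annotation file follows the standard COCO captions layout (an `images` list with `id` and `file_name`, and an `annotations` list with `image_id` and `caption`). The function names `group_references` and `load_references` are hypothetical and not part of the project code.

```python
import json
from collections import defaultdict


def group_references(data):
    """Group reference captions by image id.

    `data` is assumed to be a dict in the COCO captions layout.
    Returns (file_names, captions): image id -> file name, and
    image id -> list of reference captions.
    """
    file_names = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[ann["image_id"]].append(ann["caption"])
    return file_names, dict(captions)


def load_references(annotation_path):
    """Read an annotation file from disk and group its captions (hypothetical helper)."""
    with open(annotation_path) as f:
        return group_references(json.load(f))
```

In the workspace you would call `load_references('/opt/cocoapi/annotations/captions_val2014.json')` and then iterate over the returned image ids when building the validation data loader.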

The suggested approach to validating your model involves creating a JSON file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as METEOR and CIDEr), in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.
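
If you choose to write your own script, the following is a minimal sketch of both steps, under some assumptions. `write_results` saves predictions as a list of `{"image_id": ..., "caption": ...}` records, which is the layout of the linked example file. `sentence_bleu` is a simplified, unsmoothed, sentence-level BLEU with whitespace tokenization. It is not the corpus-level implementation in the linked coco-caption repository, so its scores will not match that tool exactly. Both function names are hypothetical.

```python
import json
import math
from collections import Counter


def write_results(predictions, path):
    """Save {image_id: caption} predictions as a list of result records."""
    records = [{"image_id": int(i), "caption": c} for i, c in predictions.items()]
    with open(path, "w") as f:
        json.dump(records, f)


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Simplified BLEU: clipped n-gram precisions times a brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        # Each n-gram is credited at most as often as it appears in any one reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Because there is no smoothing, any caption that shares no 4-gram with its references scores zero under the default `max_n=4`; passing a smaller `max_n` (for example `max_n=1` for BLEU-1) gives a more forgiving number for short captions.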

In [4]:
# (Optional) TODO: Validate your model.