# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, you will train your CNN-RNN model.  

You are welcome and encouraged to try out many different architectures and hyperparameters when searching for a good model.

This does have the potential to make the project quite messy!  Before submitting your project, make sure that you clean up:
- the code you write in this notebook.  The notebook should describe how to train a single CNN-RNN architecture, corresponding to your final choice of hyperparameters.  You should structure the notebook so that the reviewer can replicate your results by running the code in this notebook.  
- the output of the code cell in **Step 2**.  The cell should show the output obtained when training the model from scratch.

This notebook **will be graded**.  

Feel free to use the links below to navigate the notebook:
- [Step 1](#step1): Training Setup
- [Step 2](#step2): Train your Model
- [Step 3](#step3): (Optional) Validate your Model

<a id='step1'></a>
## Step 1: Training Setup

In this step of the notebook, you will customize the training of your CNN-RNN model by specifying hyperparameters and setting other options that are important to the training procedure.  The values you set now will be used when training your model in **Step 2** below.

You should only amend blocks of code that are preceded by a `TODO` statement.  **Any code blocks that are not preceded by a `TODO` statement should not be modified**.

### Task #1

Begin by setting the following variables:
- `batch_size` - the batch size of each training batch.  It is the number of image-caption pairs used to amend the model weights in each training step. 
- `vocab_threshold` - the minimum word count threshold.  Note that a larger threshold will result in a smaller vocabulary, whereas a smaller threshold will include rarer words and result in a larger vocabulary.  
- `vocab_from_file` - a Boolean that decides whether to load the vocabulary from file. 
- `embed_size` - the dimensionality of the image and word embeddings.  
- `hidden_size` - the number of features in the hidden state of the RNN decoder.  
- `num_epochs` - the number of epochs to train the model.  We recommend that you set `num_epochs=3`, but feel free to increase or decrease this number as you wish.  [This paper](https://arxiv.org/pdf/1502.03044.pdf) trained a captioning model on a single state-of-the-art GPU for 3 days, but you'll soon see that you can get reasonable results in a matter of a few hours!  (_But of course, if you want your model to compete with current research, you will have to train for much longer._)
- `save_every` - determines how often to save the model weights.  We recommend that you set `save_every=1`, to save the model weights after each epoch.  This way, after the `i`th epoch, the encoder and decoder weights will be saved in the `models/` folder as `encoder-i.pkl` and `decoder-i.pkl`, respectively.
- `print_every` - determines how often to print the batch loss to the Jupyter notebook while training.  Note that you **will not** observe a monotonic decrease in the loss function while training - this is perfectly fine and completely expected!  You are encouraged to keep this at its default value of `100` to avoid clogging the notebook, but feel free to change it.
- `log_file` - the name of the text file containing - for every step - how the loss and perplexity evolved during training.
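
To make the effect of `vocab_threshold` concrete, here is a minimal sketch in plain Python. The helper `build_vocab` and the toy captions are hypothetical and are not the project's `vocabulary.py`; the sketch only assumes that a word is kept when its count across all captions reaches the threshold.

```python
from collections import Counter

def build_vocab(captions, vocab_threshold):
    """Return the sorted words that occur at least `vocab_threshold` times.

    Hypothetical helper for illustration; the project's vocabulary.py may
    tokenize differently and also adds special tokens.
    """
    counts = Counter(word for caption in captions for word in caption.lower().split())
    return sorted(word for word, count in counts.items() if count >= vocab_threshold)

# Toy captions (made up for this example).
captions = [
    "a dog runs on the grass",
    "a cat sits on the sofa",
    "a dog sleeps on the sofa",
]

small_threshold_vocab = build_vocab(captions, vocab_threshold=1)
large_threshold_vocab = build_vocab(captions, vocab_threshold=3)

print(len(small_threshold_vocab), small_threshold_vocab)
print(len(large_threshold_vocab), large_threshold_vocab)
```

With the smaller threshold every word survives, while the larger threshold keeps only the words that appear in all three captions.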

If you're not sure where to begin to set some of the values above, you can peruse [this paper](https://arxiv.org/pdf/1502.03044.pdf) and [this paper](https://arxiv.org/pdf/1411.4555.pdf) for useful guidance!  **To avoid spending too long on this notebook**, you are encouraged to consult these suggested research papers to obtain a strong initial guess for which hyperparameters are likely to work best.  Then, train a single model, and proceed to the next notebook (**3_Inference.ipynb**).  If you are unhappy with your performance, you can return to this notebook to tweak the hyperparameters (and/or the architecture in **model.py**) and re-train your model.

### Question 1

**Question:** Describe your CNN-RNN architecture in detail.  With this architecture in mind, how did you select the values of the variables in Task 1?  If you consulted a research paper detailing a successful implementation of an image captioning model, please provide the reference.

**Answer:** The architecture is straightforward and follows the paper "Show and Tell: A Neural Image Caption Generator". 
It uses the basic encoder-decoder architecture. 

The encoder consists of a ResNet backbone with a dense layer placed on top of it. I stuck with Udacity's default choice here. The output is a feature vector, which serves as the initial input to the decoder.

The heart of the decoder is a single LSTM cell which takes a word vector as input. The output of the LSTM is fed through a dense layer which outputs a score distribution for the next generated word. The distribution is converted to proper probabilities via the softmax function. In inference mode, new word indices are sampled from these probabilities. Given a new word index, either from sampling or as part of a training sequence, an embedding layer converts it into a new input for the LSTM. The size of the word vector matches the size of the feature vector from the encoder, so subsequently generated word vectors are compatible with the initial input from the encoder.
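
The step from scores to a sampled word index can be sketched in plain Python. This is only an illustration under stated assumptions: the scores are made up, the vocabulary has four entries, and the helpers `softmax` and `sample_index` are hypothetical rather than taken from `model.py`.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probabilities, rng):
    """Draw one word index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

# Hypothetical scores for a 4-word vocabulary, as a dense layer might output them.
scores = [2.0, 1.0, 0.1, -1.0]
probabilities = softmax(scores)

rng = random.Random(0)  # fixed seed so the run is repeatable
next_word_index = sample_index(probabilities, rng)

print(probabilities)
print(next_word_index)
```

The sampled index would then be passed through the embedding layer to form the next LSTM input.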

Parameter choice:

* batch_size: Stuck with the number from 1_Preliminaries ...
* vocab_threshold: From the paper.
* embed_size & hidden_size: From the paper. It looks like they used the same size for both.
* num_epochs: Stuck with the suggested count.


### (Optional) Task #2

Note that we have provided a recommended image transform `transform_train` for pre-processing the training images, but you are welcome (and encouraged!) to modify it as you wish.  When modifying this transform, keep in mind that:
- the images in the dataset have varying heights and widths, and 
- if using a pre-trained model, you must perform the corresponding appropriate normalization.

### Question 2

**Question:** How did you select the transform in `transform_train`?  If you left the transform at its provided value, why do you think that it is a good choice for your CNN architecture?

**Answer:** I left it as provided. It ticks all the boxes for a sensible transform:

* An image size of 224 pixels is commonly used for image classification, and the ResNet backbone was probably trained on it, so it should work reasonably well. However, since the ResNet ends in a global average pooling layer, I think other input sizes would also be possible.
* A random crop is a simple form of data augmentation that can help reduce overfitting, and it is good practice.
* The same applies to random horizontal flipping.
* The normalization should match the normalization that was used during the training of the backbone. I trust that the provided values are sensible in this regard.
* One could do more, e.g. color augmentation as described in the implementation section of the ResNet paper (https://arxiv.org/pdf/1512.03385.pdf). But since I don't have the time to do ablation experiments I might as well keep things simple and not do more.
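
As a small worked example of the normalization step, the sketch below applies the per-channel mean and standard deviation from `transform_train` to single pixels. The helper `normalize_pixel` is hypothetical; it only mirrors the arithmetic of `ToTensor` (scale 8-bit values to the range 0 to 1) followed by `Normalize` ((x - mean) / std).

```python
# Per-channel statistics copied from `transform_train` in this notebook.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb_uint8):
    """Map one 8-bit RGB pixel to the normalized values the backbone sees."""
    scaled = [value / 255.0 for value in rgb_uint8]               # ToTensor step
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]    # Normalize step

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

Black pixels map to negative values and white pixels to positive ones, so the inputs end up roughly centered around zero, which is the range the pre-trained backbone expects.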

### Task #3

Next, you will specify a Python list containing the learnable parameters of the model.  For instance, if you decide to make all weights in the decoder trainable, but only want to train the weights in the embedding layer of the encoder, then you should set `params` to something like:
```
params = list(decoder.parameters()) + list(encoder.embed.parameters()) 
```

### Question 3

**Question:** How did you select the trainable parameters of your architecture?  Why do you think this is a good choice?

**Answer:** I only trained the parameters that are not already pretrained. According to the authors of the captioning paper "Show and Tell: A Neural Image Caption Generator", this is a good choice: they remarked that training the CNN weights had a negative impact on their results!

### Task #4

Finally, you will select an [optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Optimizer).

### Question 4

**Question:** How did you select the optimizer used to train your model?

**Answer:** I just picked my default, the Adam optimizer. It seems to work well for most purposes, and in my experience it is not very sensitive to the learning rate; something like 1e-4 often works fine. It did so in this case, albeit probably not optimally. I did not bother changing it since the loss was steadily decreasing.

In [1]:
import torch
import torch.nn as nn
from torchvision import transforms
import sys
sys.path.append('/opt/cocoapi/PythonAPI')
from pycocotools.coco import COCO
from data_loader import get_loader
from model import EncoderCNN, DecoderRNN
import math
import itertools

## TODO #1: Select appropriate values for the Python variables below.
batch_size = 10          # batch size
vocab_threshold = 5        # minimum word count threshold
vocab_from_file = True    # if True, load existing vocab file
embed_size = 512           # dimensionality of image and word embeddings
hidden_size = 512          # number of features in hidden state of the RNN decoder
num_epochs = 3             # number of training epochs
save_every = 1             # determines frequency of saving model weights
print_every = 100          # determines window for printing average loss
log_file = 'training_log.txt'       # name of file with saved training loss and perplexity

# (Optional) TODO #2: Amend the image transform below.
transform_train = transforms.Compose([ 
    transforms.Resize(256),                          # smaller edge of image resized to 256
    transforms.RandomCrop(224),                      # get 224x224 crop from random location
    transforms.RandomHorizontalFlip(),               # horizontally flip image with probability=0.5
    transforms.ToTensor(),                           # convert the PIL Image to a tensor
    transforms.Normalize((0.485, 0.456, 0.406),      # normalize image for pre-trained model
                         (0.229, 0.224, 0.225))])

# Build data loader.
data_loader = get_loader(transform=transform_train,
                         mode='train',
                         batch_size=batch_size,
                         vocab_threshold=vocab_threshold,
                         vocab_from_file=vocab_from_file)

# The size of the vocabulary.
vocab_size = len(data_loader.dataset.vocab)
#vocab_size = 1000

# Initialize the encoder and decoder. 
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

# Move models to GPU if CUDA is available. 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder.to(device)
decoder.to(device)

# Define the loss function. 
criterion = nn.CrossEntropyLoss().to(device)

# TODO #3: Specify the learnable parameters of the model.
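# Only encoder parameters that still require gradients (i.e. the non-frozen ones)
# plus all decoder parameters. A list, unlike a generator, can be reused later.
params = [p for p in encoder.parameters() if p.requires_grad] + list(decoder.parameters())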

# TODO #4: Define the optimizer.
optimizer = torch.optim.Adam(params, lr=1.e-4)

# Set the total number of training steps per epoch.
total_step = math.ceil(len(data_loader.dataset.caption_lengths) / data_loader.batch_sampler.batch_size)

Vocabulary successfully loaded from vocab.pkl file!
loading annotations into memory...
Done (t=0.87s)
creating index...


  0%|          | 806/414113 [00:00<01:48, 3825.12it/s]

index created!
Obtaining caption lengths...


100%|██████████| 414113/414113 [01:32<00:00, 4469.77it/s]
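
The number of steps per epoch that appears in the training output follows directly from the caption count in the progress bar and the chosen batch size. A minimal check of that arithmetic, using only the values shown in this notebook:

```python
import math

num_captions = 414113   # caption count shown in the progress bar
batch_size = 10         # value chosen in TODO #1
num_epochs = 3          # value chosen in TODO #1

# Same formula as `total_step` in the code cell: round up so the last,
# partially filled batch still counts as a step.
total_step = math.ceil(num_captions / batch_size)
total_updates = total_step * num_epochs

print(total_step)
print(total_updates)
```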


<a id='step2'></a>
## Step 2: Train your Model

Once you have executed the code cell in **Step 1**, the training procedure below should run without issue.  

It is completely fine to leave the code cell below as-is without modifications to train your model.  However, if you would like to modify the code used to train the model below, you must ensure that your changes are easily parsed by your reviewer.  In other words, make sure to provide appropriate comments to describe how your code works!  

You may find it useful to load saved weights to resume training.  In that case, note the names of the files containing the encoder and decoder weights that you'd like to load (`encoder_file` and `decoder_file`).  Then you can load the weights by using the lines below:

```python
import os

# Load pre-trained weights before resuming training.
encoder.load_state_dict(torch.load(os.path.join('./models', encoder_file)))
decoder.load_state_dict(torch.load(os.path.join('./models', decoder_file)))
```

While trying out parameters, make sure to take extensive notes and record the settings that you used in your various training runs.  In particular, you don't want to encounter a situation where you've trained a model for several hours but can't remember what settings you used :).
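
One lightweight way to keep such notes is to write the settings of each run to a small JSON file. The sketch below is only a suggestion: the helper `save_run_settings` and the file name `run_settings.json` are made up for illustration, and the values mirror the code cell in Step 1.

```python
import json
import os
import tempfile
import time

def save_run_settings(path, settings):
    """Write the settings of one training run to a JSON file and return the record."""
    record = dict(settings)
    record["saved_at"] = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return record

# Values mirror the code cell in Step 1.
settings = {
    "batch_size": 10,
    "vocab_threshold": 5,
    "embed_size": 512,
    "hidden_size": 512,
    "num_epochs": 3,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
}

# A temporary directory keeps the sketch free of side effects; in practice you
# would likely store the file next to training_log.txt.
settings_path = os.path.join(tempfile.gettempdir(), "run_settings.json")
record = save_run_settings(settings_path, settings)
print(record)
```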

### A Note on Tuning Hyperparameters

To figure out how well your model is doing, you can look at how the training loss and perplexity evolve during training - and for the purposes of this project, you are encouraged to amend the hyperparameters based on this information.  
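
Because the per-batch loss is noisy, it can help to smooth it before judging a trend. The sketch below parses lines in the format that the training loop writes to the log file, checks that perplexity is the exponential of the loss, and computes a simple moving average. The helpers `parse_log` and `moving_average` are hypothetical, and the window size is an arbitrary choice.

```python
import math
import re

LOG_LINE = re.compile(
    r"Epoch \[(\d+)/\d+\], Step \[(\d+)/\d+\], Loss: ([\d.]+), Perplexity: ([\d.]+)"
)

def parse_log(lines):
    """Extract (epoch, step, loss, perplexity) tuples from complete log lines."""
    records = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            epoch, step, loss, perplexity = match.groups()
            records.append((int(epoch), int(step), float(loss), float(perplexity)))
    return records

def moving_average(values, window):
    """Average each value with up to `window - 1` preceding values."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Three lines in the format written by the training loop in Step 2.
sample = [
    "Epoch [1/3], Step [100/41412], Loss: 4.8684, Perplexity: 130.1107",
    "Epoch [1/3], Step [200/41412], Loss: 4.2609, Perplexity: 70.87338",
    "Epoch [1/3], Step [300/41412], Loss: 4.0605, Perplexity: 58.00278",
]

records = parse_log(sample)
losses = [loss for _, _, loss, _ in records]
smoothed = moving_average(losses, window=2)

for (_, step, loss, perplexity), average in zip(records, smoothed):
    # Perplexity is exp(loss), up to the rounding of the printed loss.
    print(step, loss, round(math.exp(loss), 2), perplexity, round(average, 4))
```

To apply this to a real run, pass the lines of the file named by `log_file` to `parse_log` and pick a larger window, for example a few hundred steps.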

However, this will not tell you if your model is overfitting to the training data, and, unfortunately, overfitting is a problem that is commonly encountered when training image captioning models.  

For this project, you need not worry about overfitting. **This project does not have strict requirements regarding the performance of your model**, and you just need to demonstrate that your model has learned **_something_** when you generate captions on the test data.  For now, we strongly encourage you to train your model for the suggested 3 epochs without worrying about performance; then, you should immediately transition to the next notebook in the sequence (**3_Inference.ipynb**) to see how your model performs on the test data.  If your model needs to be changed, you can come back to this notebook, amend hyperparameters (if necessary), and re-train the model.

That said, if you would like to go above and beyond in this project, you can read about some approaches to minimizing overfitting in section 4.3.1 of [this paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7505636).  In the next (optional) step of this notebook, we provide some guidance for assessing the performance on the validation dataset.

In [None]:
import torch.utils.data as data
import numpy as np
import os
import requests
import time

# Open the training log file.
f = open(log_file, 'w')

old_time = time.time()
response = requests.request("GET", 
                            "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token", 
                            headers={"Metadata-Flavor":"Google"})

for epoch in range(1, num_epochs+1):
    
    for i_step in range(1, total_step+1):
        
        if time.time() - old_time > 60:
            old_time = time.time()
            requests.request("POST", 
                             "https://nebula.udacity.com/api/v1/remote/keep-alive", 
                             headers={'Authorization': "STAR " + response.text})
        
        # Randomly sample a caption length, and sample indices with that length.
        indices = data_loader.dataset.get_train_indices()
        # Create and assign a batch sampler to retrieve a batch with the sampled indices.
        new_sampler = data.sampler.SubsetRandomSampler(indices=indices)
        data_loader.batch_sampler.sampler = new_sampler
        
        # Obtain the batch.
        images, captions = next(iter(data_loader))

        # Move batch of images and captions to GPU if CUDA is available.
        images = images.to(device)
        captions = captions.to(device)
        
        # Zero the gradients.
        decoder.zero_grad()
        encoder.zero_grad()
        
        # Pass the inputs through the CNN-RNN model.
        features = encoder(images)
        outputs = decoder(features, captions)
        
        # Calculate the batch loss.
        loss = criterion(outputs.view(-1, vocab_size), captions.view(-1))
        
        # Backward pass.
        loss.backward()
        
        # Update the parameters in the optimizer.
        optimizer.step()
            
        # Get training statistics.
        stats = 'Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %5.4f' % (epoch, num_epochs, i_step, total_step, loss.item(), np.exp(loss.item()))
        
        # Print training statistics (on same line).
        print('\r' + stats, end="")
        sys.stdout.flush()
        
        # Print training statistics to file.
        f.write(stats + '\n')
        f.flush()
        
        # Print training statistics (on different line).
        if i_step % print_every == 0:
            print('\r' + stats)
            
    # Save the weights.
    if epoch % save_every == 0:
        torch.save(decoder.state_dict(), os.path.join('./models', 'decoder-%d.pkl' % epoch))
        torch.save(encoder.state_dict(), os.path.join('./models', 'encoder-%d.pkl' % epoch))

# Close the training log file.
f.close()

Epoch [1/3], Step [100/41412], Loss: 4.8684, Perplexity: 130.1107
Epoch [1/3], Step [200/41412], Loss: 4.2609, Perplexity: 70.87338
Epoch [1/3], Step [300/41412], Loss: 4.0605, Perplexity: 58.00278
Epoch [1/3], Step [400/41412], Loss: 4.1995, Perplexity: 66.65582
Epoch [1/3], Step [500/41412], Loss: 4.0027, Perplexity: 54.74687
Epoch [1/3], Step [600/41412], Loss: 3.8690, Perplexity: 47.89244
Epoch [1/3], Step [700/41412], Loss: 3.6313, Perplexity: 37.76212
Epoch [1/3], Step [800/41412], Loss: 3.6163, Perplexity: 37.19894
Epoch [1/3], Step [900/41412], Loss: 4.0417, Perplexity: 56.92577
Epoch [1/3], Step [1000/41412], Loss: 3.8701, Perplexity: 47.9456
Epoch [1/3], Step [1100/41412], Loss: 3.9982, Perplexity: 54.49883
Epoch [1/3], Step [1200/41412], Loss: 3.8118, Perplexity: 45.23164
Epoch [1/3], Step [1300/41412], Loss: 3.1942, Perplexity: 24.39142
Epoch [1/3], Step [1400/41412], Loss: 3.8697, Perplexity: 47.92911
Epoch [1/3], Step [1500/41412], Loss: 3.9907, Perplexity: 54.09142
Epoch

Epoch [1/3], Step [24600/41412], Loss: 2.2356, Perplexity: 9.35228
Epoch [1/3], Step [24700/41412], Loss: 2.0933, Perplexity: 8.11149
Epoch [1/3], Step [24800/41412], Loss: 2.0650, Perplexity: 7.88533
Epoch [1/3], Step [24900/41412], Loss: 2.2593, Perplexity: 9.57647
Epoch [1/3], Step [25000/41412], Loss: 3.3767, Perplexity: 29.2740
Epoch [1/3], Step [25100/41412], Loss: 2.4643, Perplexity: 11.7548
Epoch [1/3], Step [25200/41412], Loss: 2.1507, Perplexity: 8.59089
Epoch [1/3], Step [25300/41412], Loss: 2.2472, Perplexity: 9.46160
Epoch [1/3], Step [25400/41412], Loss: 2.0458, Perplexity: 7.73550
Epoch [1/3], Step [25500/41412], Loss: 2.2472, Perplexity: 9.46101
Epoch [1/3], Step [25600/41412], Loss: 2.8139, Perplexity: 16.6756
Epoch [1/3], Step [25700/41412], Loss: 2.1131, Perplexity: 8.27405
Epoch [1/3], Step [25800/41412], Loss: 2.2478, Perplexity: 9.46698
Epoch [1/3], Step [25900/41412], Loss: 2.6886, Perplexity: 14.7112
Epoch [1/3], Step [26000/41412], Loss: 3.3732, Perplexity: 29.

Epoch [2/3], Step [7700/41412], Loss: 2.2932, Perplexity: 9.90697
Epoch [2/3], Step [7800/41412], Loss: 1.9202, Perplexity: 6.82211
Epoch [2/3], Step [7900/41412], Loss: 2.5339, Perplexity: 12.6021
Epoch [2/3], Step [8000/41412], Loss: 2.7616, Perplexity: 15.82506
Epoch [2/3], Step [8100/41412], Loss: 2.2728, Perplexity: 9.70649
Epoch [2/3], Step [8200/41412], Loss: 2.1416, Perplexity: 8.51287
Epoch [2/3], Step [8300/41412], Loss: 2.3781, Perplexity: 10.7843
Epoch [2/3], Step [8400/41412], Loss: 2.0399, Perplexity: 7.69022
Epoch [2/3], Step [8500/41412], Loss: 2.3766, Perplexity: 10.7686
Epoch [2/3], Step [8600/41412], Loss: 1.8601, Perplexity: 6.42412
Epoch [2/3], Step [8700/41412], Loss: 1.7294, Perplexity: 5.63743
Epoch [2/3], Step [8800/41412], Loss: 2.3416, Perplexity: 10.3983
Epoch [2/3], Step [8900/41412], Loss: 2.5923, Perplexity: 13.3603
Epoch [2/3], Step [9000/41412], Loss: 2.3615, Perplexity: 10.6072
Epoch [2/3], Step [9100/41412], Loss: 2.0707, Perplexity: 7.93046
Epoch [2/

Epoch [2/3], Step [32200/41412], Loss: 2.4381, Perplexity: 11.4514
Epoch [2/3], Step [32300/41412], Loss: 2.5122, Perplexity: 12.3323
Epoch [2/3], Step [32400/41412], Loss: 2.5324, Perplexity: 12.5842
Epoch [2/3], Step [32500/41412], Loss: 1.8249, Perplexity: 6.20250
Epoch [2/3], Step [32600/41412], Loss: 2.7723, Perplexity: 15.9954
Epoch [2/3], Step [32700/41412], Loss: 2.0026, Perplexity: 7.40830
Epoch [2/3], Step [32800/41412], Loss: 1.5855, Perplexity: 4.88188
Epoch [2/3], Step [32900/41412], Loss: 2.2291, Perplexity: 9.29145
Epoch [2/3], Step [33000/41412], Loss: 1.8088, Perplexity: 6.10340
Epoch [2/3], Step [33100/41412], Loss: 2.2380, Perplexity: 9.37485
Epoch [2/3], Step [33200/41412], Loss: 2.2637, Perplexity: 9.61834
Epoch [2/3], Step [33300/41412], Loss: 2.2063, Perplexity: 9.08212
Epoch [2/3], Step [33400/41412], Loss: 2.2688, Perplexity: 9.66829
Epoch [2/3], Step [33500/41412], Loss: 2.7743, Perplexity: 16.0280
Epoch [2/3], Step [33600/41412], Loss: 2.0546, Perplexity: 7.8

Epoch [3/3], Step [15300/41412], Loss: 1.8165, Perplexity: 6.15063
Epoch [3/3], Step [15400/41412], Loss: 2.2938, Perplexity: 9.91213
Epoch [3/3], Step [15500/41412], Loss: 2.0379, Perplexity: 7.67468
Epoch [3/3], Step [15600/41412], Loss: 1.9168, Perplexity: 6.79939
Epoch [3/3], Step [15700/41412], Loss: 1.7888, Perplexity: 5.98211
Epoch [3/3], Step [15800/41412], Loss: 2.0532, Perplexity: 7.79319
Epoch [3/3], Step [15900/41412], Loss: 1.7223, Perplexity: 5.59774
Epoch [3/3], Step [16000/41412], Loss: 1.8938, Perplexity: 6.64444
Epoch [3/3], Step [16100/41412], Loss: 1.6484, Perplexity: 5.19869
Epoch [3/3], Step [16200/41412], Loss: 2.3036, Perplexity: 10.0099
Epoch [3/3], Step [16300/41412], Loss: 2.0540, Perplexity: 7.79947
Epoch [3/3], Step [16400/41412], Loss: 2.4898, Perplexity: 12.0590
Epoch [3/3], Step [16500/41412], Loss: 1.8555, Perplexity: 6.39493
Epoch [3/3], Step [16600/41412], Loss: 2.8601, Perplexity: 17.4641
Epoch [3/3], Step [16700/41412], Loss: 1.8767, Perplexity: 6.5

In [1]:
%ls

0_Dataset.ipynb                               data_loader.py
0_Dataset-zh.ipynb                            filelist.txt
1_Preliminaries.ipynb                         images/
1_Preliminaries-zh.ipynb                      model.py
2_Training.ipynb                              models/
2_Training-zh.ipynb                           __pycache__/
3_Inference.ipynb                             training_log.txt
3_Inference-zh.ipynb                          vocab.pkl
4_Zip Your Project Files and Submit.ipynb     vocabulary.py
4_Zip Your Project Files and Submit-zh.ipynb


In [2]:
%ls models/

decoder-1.pkl  decoder-3.pkl  encoder-2.pkl
decoder-2.pkl  encoder-1.pkl  encoder-3.pkl


In [None]:
%cat training_log.txt

Epoch [1/3], Step [1/41412], Loss: 9.1051, Perplexity: 9000.6493
Epoch [1/3], Step [2/41412], Loss: 9.0855, Perplexity: 8826.5273
Epoch [1/3], Step [3/41412], Loss: 9.0553, Perplexity: 8564.1307
Epoch [1/3], Step [4/41412], Loss: 9.0461, Perplexity: 8485.3289
Epoch [1/3], Step [5/41412], Loss: 9.0199, Perplexity: 8265.6614
Epoch [1/3], Step [6/41412], Loss: 8.9969, Perplexity: 8077.9230
Epoch [1/3], Step [7/41412], Loss: 8.9808, Perplexity: 7949.1984
Epoch [1/3], Step [8/41412], Loss: 8.9240, Perplexity: 7510.2470
Epoch [1/3], Step [9/41412], Loss: 8.9354, Perplexity: 7596.4350
Epoch [1/3], Step [10/41412], Loss: 8.8800, Perplexity: 7186.6682
Epoch [1/3], Step [11/41412], Loss: 8.8654, Perplexity: 7082.5065
Epoch [1/3], Step [12/41412], Loss: 8.8816, Perplexity: 7198.6105
Epoch [1/3], Step [13/41412], Loss: 8.8921, Perplexity: 7274.1428
Epoch [1/3], Step [14/41412], Loss: 8.7936, Perplexity: 6592.1661
Epoch [1/3], Step [15/41412], Loss: 8.7840, Perplexity: 6529.1613
Epoc

Epoch [1/3], Step [5398/41412], Loss: 3.0800, Perplexity: 21.7586
Epoch [1/3], Step [5399/41412], Loss: 3.0951, Perplexity: 22.0888
Epoch [1/3], Step [5400/41412], Loss: 3.1134, Perplexity: 22.4971
Epoch [1/3], Step [5401/41412], Loss: 3.3447, Perplexity: 28.3522
Epoch [1/3], Step [5402/41412], Loss: 3.1111, Perplexity: 22.4447
Epoch [1/3], Step [5403/41412], Loss: 2.6704, Perplexity: 14.4450
Epoch [1/3], Step [5404/41412], Loss: 3.0627, Perplexity: 21.3846
Epoch [1/3], Step [5405/41412], Loss: 2.8152, Perplexity: 16.6960
Epoch [1/3], Step [5406/41412], Loss: 3.2335, Perplexity: 25.3693
Epoch [1/3], Step [5407/41412], Loss: 3.0036, Perplexity: 20.1578
Epoch [1/3], Step [5408/41412], Loss: 3.1482, Perplexity: 23.2932
Epoch [1/3], Step [5409/41412], Loss: 3.3553, Perplexity: 28.6554
Epoch [1/3], Step [5410/41412], Loss: 3.7109, Perplexity: 40.8914
Epoch [1/3], Step [5411/41412], Loss: 2.9574, Perplexity: 19.2488
Epoch [1/3], Step [5412/41412], Loss: 2.8894, Perplexity: 17.9

Epoch [1/3], Step [9265/41412], Loss: 2.1774, Perplexity: 8.8233
Epoch [1/3], Step [9266/41412], Loss: 3.0358, Perplexity: 20.8183
Epoch [1/3], Step [9267/41412], Loss: 2.6941, Perplexity: 14.7919
Epoch [1/3], Step [9268/41412], Loss: 2.8169, Perplexity: 16.7242
Epoch [1/3], Step [9269/41412], Loss: 2.8625, Perplexity: 17.5061
Epoch [1/3], Step [9270/41412], Loss: 2.9800, Perplexity: 19.6880
Epoch [1/3], Step [9271/41412], Loss: 2.6162, Perplexity: 13.6839
Epoch [1/3], Step [9272/41412], Loss: 2.7704, Perplexity: 15.9642
Epoch [1/3], Step [9273/41412], Loss: 2.4360, Perplexity: 11.4268
Epoch [1/3], Step [9274/41412], Loss: 2.0990, Perplexity: 8.1576
Epoch [1/3], Step [9275/41412], Loss: 2.9989, Perplexity: 20.0627
Epoch [1/3], Step [9276/41412], Loss: 2.4715, Perplexity: 11.8402
Epoch [1/3], Step [9277/41412], Loss: 2.9947, Perplexity: 19.9800
Epoch [1/3], Step [9278/41412], Loss: 2.8340, Perplexity: 17.0135
Epoch [1/3], Step [9279/41412], Loss: 3.4020, Perplexity: 30.025

Epoch [1/3], Step [11718/41412], Loss: 3.3712, Perplexity: 29.1146
Epoch [1/3], Step [11719/41412], Loss: 1.9966, Perplexity: 7.3637
Epoch [1/3], Step [11720/41412], Loss: 2.6724, Perplexity: 14.4741
Epoch [1/3], Step [11721/41412], Loss: 2.6223, Perplexity: 13.7680
Epoch [1/3], Step [11722/41412], Loss: 2.2585, Perplexity: 9.5683
Epoch [1/3], Step [11723/41412], Loss: 2.4977, Perplexity: 12.1543
Epoch [1/3], Step [11724/41412], Loss: 2.7500, Perplexity: 15.6433
Epoch [1/3], Step [11725/41412], Loss: 2.7603, Perplexity: 15.8044
Epoch [1/3], Step [11726/41412], Loss: 2.8523, Perplexity: 17.3282
Epoch [1/3], Step [11727/41412], Loss: 3.0941, Perplexity: 22.0678
Epoch [1/3], Step [11728/41412], Loss: 3.0737, Perplexity: 21.6225
Epoch [1/3], Step [11729/41412], Loss: 2.4568, Perplexity: 11.6671
Epoch [1/3], Step [11730/41412], Loss: 2.4864, Perplexity: 12.0180
Epoch [1/3], Step [11731/41412], Loss: 2.7129, Perplexity: 15.0729
Epoch [1/3], Step [11732/41412], Loss: 3.3349, Per

Epoch [1/3], Step [14172/41412], Loss: 2.2266, Perplexity: 9.2681
Epoch [1/3], Step [14173/41412], Loss: 2.0431, Perplexity: 7.7147
Epoch [1/3], Step [14174/41412], Loss: 2.2892, Perplexity: 9.8669
Epoch [1/3], Step [14175/41412], Loss: 2.8049, Perplexity: 16.5253
Epoch [1/3], Step [14176/41412], Loss: 1.9055, Perplexity: 6.7227
Epoch [1/3], Step [14177/41412], Loss: 2.7376, Perplexity: 15.4495
Epoch [1/3], Step [14178/41412], Loss: 3.1313, Perplexity: 22.9043
Epoch [1/3], Step [14179/41412], Loss: 2.5358, Perplexity: 12.6266
Epoch [1/3], Step [14180/41412], Loss: 2.4014, Perplexity: 11.0383
Epoch [1/3], Step [14181/41412], Loss: 2.4860, Perplexity: 12.0134
Epoch [1/3], Step [14182/41412], Loss: 2.6705, Perplexity: 14.4469
Epoch [1/3], Step [14183/41412], Loss: 2.4098, Perplexity: 11.1321
Epoch [1/3], Step [14184/41412], Loss: 2.3884, Perplexity: 10.8956
Epoch [1/3], Step [14185/41412], Loss: 2.2807, Perplexity: 9.7834
Epoch [1/3], Step [14186/41412], Loss: 3.3838, Perple

Epoch [1/3], Step [16126/41412], Loss: 2.9313, Perplexity: 18.7523
Epoch [1/3], Step [16127/41412], Loss: 2.7508, Perplexity: 15.6545
Epoch [1/3], Step [16128/41412], Loss: 2.6496, Perplexity: 14.1482
Epoch [1/3], Step [16129/41412], Loss: 2.8507, Perplexity: 17.3003
Epoch [1/3], Step [16130/41412], Loss: 2.0761, Perplexity: 7.9734
Epoch [1/3], Step [16131/41412], Loss: 2.8590, Perplexity: 17.4443
Epoch [1/3], Step [16132/41412], Loss: 2.2676, Perplexity: 9.6563
Epoch [1/3], Step [16133/41412], Loss: 2.6206, Perplexity: 13.7436
Epoch [1/3], Step [16134/41412], Loss: 2.7343, Perplexity: 15.3986
Epoch [1/3], Step [16135/41412], Loss: 3.2291, Perplexity: 25.2575
Epoch [1/3], Step [16136/41412], Loss: 2.4238, Perplexity: 11.2889
Epoch [1/3], Step [16137/41412], Loss: 2.4864, Perplexity: 12.0177
Epoch [1/3], Step [16138/41412], Loss: 2.0682, Perplexity: 7.9104
Epoch [1/3], Step [16139/41412], Loss: 2.9388, Perplexity: 18.8927
Epoch [1/3], Step [16140/41412], Loss: 2.7583, Perp

Epoch [1/3], Step [18008/41412], Loss: 2.7262, Perplexity: 15.2746
Epoch [1/3], Step [18009/41412], Loss: 2.7849, Perplexity: 16.1984
Epoch [1/3], Step [18010/41412], Loss: 1.6793, Perplexity: 5.3616
Epoch [1/3], Step [18011/41412], Loss: 2.6753, Perplexity: 14.5167
Epoch [1/3], Step [18012/41412], Loss: 2.5197, Perplexity: 12.4254
Epoch [1/3], Step [18013/41412], Loss: 2.4555, Perplexity: 11.6520
Epoch [1/3], Step [18014/41412], Loss: 2.8329, Perplexity: 16.9943
Epoch [1/3], Step [18015/41412], Loss: 4.0479, Perplexity: 57.2760
Epoch [1/3], Step [18016/41412], Loss: 2.0818, Perplexity: 8.0191
Epoch [1/3], Step [18017/41412], Loss: 2.1098, Perplexity: 8.2466
Epoch [1/3], Step [18018/41412], Loss: 2.2961, Perplexity: 9.9357
Epoch [1/3], Step [18019/41412], Loss: 2.1396, Perplexity: 8.4957
Epoch [1/3], Step [18020/41412], Loss: 2.6987, Perplexity: 14.8606
Epoch [1/3], Step [18021/41412], Loss: 3.1093, Perplexity: 22.4043
Epoch [1/3], Step [18022/41412], Loss: 2.1664, Perple

Epoch [1/3], Step [19678/41412], Loss: 2.6734, Perplexity: 14.4895
Epoch [1/3], Step [19679/41412], Loss: 2.1317, Perplexity: 8.4290
Epoch [1/3], Step [19680/41412], Loss: 2.4374, Perplexity: 11.4434
Epoch [1/3], Step [19681/41412], Loss: 2.3803, Perplexity: 10.8083
Epoch [1/3], Step [19682/41412], Loss: 1.9858, Perplexity: 7.2852
Epoch [1/3], Step [19683/41412], Loss: 2.2990, Perplexity: 9.9639
Epoch [1/3], Step [19684/41412], Loss: 2.5909, Perplexity: 13.3415
Epoch [1/3], Step [19685/41412], Loss: 2.8951, Perplexity: 18.0845
Epoch [1/3], Step [19686/41412], Loss: 2.2164, Perplexity: 9.1745
Epoch [1/3], Step [19687/41412], Loss: 2.5129, Perplexity: 12.3401
Epoch [1/3], Step [19688/41412], Loss: 2.8945, Perplexity: 18.0737
Epoch [1/3], Step [19689/41412], Loss: 2.2986, Perplexity: 9.9602
Epoch [1/3], Step [19690/41412], Loss: 2.4380, Perplexity: 11.4498
Epoch [1/3], Step [19691/41412], Loss: 2.6340, Perplexity: 13.9289
Epoch [1/3], Step [19692/41412], Loss: 2.2810, Perple

Epoch [1/3], Step [21934/41412], Loss: 2.9144, Perplexity: 18.4374
Epoch [1/3], Step [21935/41412], Loss: 2.8084, Perplexity: 16.5829
Epoch [1/3], Step [21936/41412], Loss: 1.9567, Perplexity: 7.0758
Epoch [1/3], Step [21937/41412], Loss: 2.3645, Perplexity: 10.6384
Epoch [1/3], Step [21938/41412], Loss: 2.4338, Perplexity: 11.4020
Epoch [1/3], Step [21939/41412], Loss: 2.8274, Perplexity: 16.9008
Epoch [1/3], Step [21940/41412], Loss: 2.6074, Perplexity: 13.5639
Epoch [1/3], Step [21941/41412], Loss: 2.6913, Perplexity: 14.7515
Epoch [1/3], Step [21942/41412], Loss: 3.0469, Perplexity: 21.0493
Epoch [1/3], Step [21943/41412], Loss: 2.1966, Perplexity: 8.9948
Epoch [1/3], Step [21944/41412], Loss: 2.1784, Perplexity: 8.8323
Epoch [1/3], Step [21945/41412], Loss: 2.3492, Perplexity: 10.4774
Epoch [1/3], Step [21946/41412], Loss: 2.1887, Perplexity: 8.9234
Epoch [1/3], Step [21947/41412], Loss: 1.9986, Perplexity: 7.3791
Epoch [1/3], Step [21948/41412], Loss: 2.7750, Perple

Epoch [1/3], Step [23447/41412], Loss: 2.2639, Perplexity: 9.6207
Epoch [1/3], Step [23448/41412], Loss: 2.5553, Perplexity: 12.8755
Epoch [1/3], Step [23449/41412], Loss: 2.0773, Perplexity: 7.9826
Epoch [1/3], Step [23450/41412], Loss: 1.7966, Perplexity: 6.0293
Epoch [1/3], Step [23451/41412], Loss: 2.4138, Perplexity: 11.1764
Epoch [1/3], Step [23452/41412], Loss: 2.0014, Perplexity: 7.3993
Epoch [1/3], Step [23453/41412], Loss: 2.5434, Perplexity: 12.7232
Epoch [1/3], Step [23454/41412], Loss: 3.5438, Perplexity: 34.5993
Epoch [1/3], Step [23455/41412], Loss: 2.7692, Perplexity: 15.9465
Epoch [1/3], Step [23456/41412], Loss: 2.6417, Perplexity: 14.0373
Epoch [1/3], Step [23457/41412], Loss: 2.0674, Perplexity: 7.9040
Epoch [1/3], Step [23458/41412], Loss: 2.5782, Perplexity: 13.1730
Epoch [1/3], Step [23459/41412], Loss: 2.4932, Perplexity: 12.1001
Epoch [1/3], Step [23460/41412], Loss: 2.4486, Perplexity: 11.5724
Epoch [1/3], Step [23461/41412], Loss: 2.6134, Perple

Epoch [1/3], Step [24842/41412], Loss: 2.4146, Perplexity: 11.1854
Epoch [1/3], Step [24843/41412], Loss: 2.6224, Perplexity: 13.7682
Epoch [1/3], Step [24844/41412], Loss: 2.8784, Perplexity: 17.7861
Epoch [1/3], Step [24845/41412], Loss: 2.7815, Perplexity: 16.1434
Epoch [1/3], Step [24846/41412], Loss: 2.6014, Perplexity: 13.4823
Epoch [1/3], Step [24847/41412], Loss: 1.8966, Perplexity: 6.6633
Epoch [1/3], Step [24848/41412], Loss: 2.0237, Perplexity: 7.5662
Epoch [1/3], Step [24849/41412], Loss: 2.0409, Perplexity: 7.6978
Epoch [1/3], Step [24850/41412], Loss: 3.3335, Perplexity: 28.0377
Epoch [1/3], Step [24851/41412], Loss: 2.6169, Perplexity: 13.6934
Epoch [1/3], Step [24852/41412], Loss: 2.2778, Perplexity: 9.7548
Epoch [1/3], Step [24853/41412], Loss: 2.0866, Perplexity: 8.0573
Epoch [1/3], Step [24854/41412], Loss: 2.1665, Perplexity: 8.7277
Epoch [1/3], Step [24855/41412], Loss: 3.3315, Perplexity: 27.9798
Epoch [1/3], Step [24856/41412], Loss: 2.4344, Perplex

Epoch [1/3], Step [26703/41412], Loss: 2.3329, Perplexity: 10.3076
Epoch [1/3], Step [26704/41412], Loss: 2.5438, Perplexity: 12.7285
Epoch [1/3], Step [26705/41412], Loss: 2.4776, Perplexity: 11.9124
Epoch [1/3], Step [26706/41412], Loss: 2.4792, Perplexity: 11.9317
Epoch [1/3], Step [26707/41412], Loss: 2.2189, Perplexity: 9.1973
Epoch [1/3], Step [26708/41412], Loss: 2.1419, Perplexity: 8.5157
Epoch [1/3], Step [26709/41412], Loss: 2.3519, Perplexity: 10.5050
Epoch [1/3], Step [26710/41412], Loss: 2.6879, Perplexity: 14.7004
Epoch [1/3], Step [26711/41412], Loss: 2.5297, Perplexity: 12.5492
Epoch [1/3], Step [26712/41412], Loss: 2.7097, Perplexity: 15.0251
Epoch [1/3], Step [26713/41412], Loss: 2.8918, Perplexity: 18.0252
Epoch [1/3], Step [26714/41412], Loss: 2.0747, Perplexity: 7.9620
Epoch [1/3], Step [26715/41412], Loss: 2.0550, Perplexity: 7.8066
Epoch [1/3], Step [26716/41412], Loss: 2.7928, Perplexity: 16.3273
Epoch [1/3], Step [26717/41412], Loss: 2.3394, Perpl

Epoch [1/3], Step [29339/41412], Loss: 2.2282, Perplexity: 9.2831
Epoch [1/3], Step [29340/41412], Loss: 3.2816, Perplexity: 26.6191
Epoch [1/3], Step [29341/41412], Loss: 2.9404, Perplexity: 18.9229
Epoch [1/3], Step [29342/41412], Loss: 2.1736, Perplexity: 8.7902
Epoch [1/3], Step [29343/41412], Loss: 2.3691, Perplexity: 10.6877
Epoch [1/3], Step [29344/41412], Loss: 3.5479, Perplexity: 34.7403
Epoch [1/3], Step [29345/41412], Loss: 2.3467, Perplexity: 10.4510
Epoch [1/3], Step [29346/41412], Loss: 2.6991, Perplexity: 14.8657
Epoch [1/3], Step [29347/41412], Loss: 2.8059, Perplexity: 16.5422
Epoch [1/3], Step [29348/41412], Loss: 3.0588, Perplexity: 21.3020
Epoch [1/3], Step [29349/41412], Loss: 2.0436, Perplexity: 7.7184
Epoch [1/3], Step [29350/41412], Loss: 2.8913, Perplexity: 18.0175
Epoch [1/3], Step [29351/41412], Loss: 2.0343, Perplexity: 7.6472
Epoch [1/3], Step [29352/41412], Loss: 3.0721, Perplexity: 21.5863
Epoch [1/3], Step [29353/41412], Loss: 2.0466, Perpl

Epoch [1/3], Step [30599/41412], Loss: 2.1280, Perplexity: 8.3984
Epoch [1/3], Step [30600/41412], Loss: 2.7689, Perplexity: 15.9413
Epoch [1/3], Step [30601/41412], Loss: 2.6488, Perplexity: 14.1367
Epoch [1/3], Step [30602/41412], Loss: 2.1130, Perplexity: 8.2729
Epoch [1/3], Step [30603/41412], Loss: 1.9145, Perplexity: 6.7834
Epoch [1/3], Step [30604/41412], Loss: 2.0920, Perplexity: 8.1012
Epoch [1/3], Step [30605/41412], Loss: 2.4152, Perplexity: 11.1924
Epoch [1/3], Step [30606/41412], Loss: 2.6139, Perplexity: 13.6517
Epoch [1/3], Step [30607/41412], Loss: 2.2390, Perplexity: 9.3838
Epoch [1/3], Step [30608/41412], Loss: 2.0340, Perplexity: 7.6446
Epoch [1/3], Step [30609/41412], Loss: 2.4938, Perplexity: 12.1071
Epoch [1/3], Step [30610/41412], Loss: 2.4492, Perplexity: 11.5790
Epoch [1/3], Step [30611/41412], Loss: 2.0128, Perplexity: 7.4845
Epoch [1/3], Step [30612/41412], Loss: 2.3708, Perplexity: 10.7063
Epoch [1/3], Step [30613/41412], Loss: 2.8690, Perplexi

Epoch [1/3], Step [31839/41412], Loss: 2.4247, Perplexity: 11.2984
Epoch [1/3], Step [31840/41412], Loss: 2.5131, Perplexity: 12.3426
Epoch [1/3], Step [31841/41412], Loss: 2.2647, Perplexity: 9.6285
Epoch [1/3], Step [31842/41412], Loss: 2.5757, Perplexity: 13.1399
Epoch [1/3], Step [31843/41412], Loss: 1.9218, Perplexity: 6.8331
Epoch [1/3], Step [31844/41412], Loss: 2.2405, Perplexity: 9.3983
Epoch [1/3], Step [31845/41412], Loss: 2.5311, Perplexity: 12.5670
Epoch [1/3], Step [31846/41412], Loss: 2.2769, Perplexity: 9.7469
Epoch [1/3], Step [31847/41412], Loss: 3.0088, Perplexity: 20.2627
Epoch [1/3], Step [31848/41412], Loss: 2.9043, Perplexity: 18.2517
Epoch [1/3], Step [31849/41412], Loss: 2.6415, Perplexity: 14.0345
Epoch [1/3], Step [31850/41412], Loss: 2.3848, Perplexity: 10.8570
Epoch [1/3], Step [31851/41412], Loss: 2.0116, Perplexity: 7.4754
Epoch [1/3], Step [31852/41412], Loss: 2.2112, Perplexity: 9.1269
Epoch [1/3], Step [31853/41412], Loss: 2.4639, Perplex

Epoch [1/3], Step [33000/41412], Loss: 1.9784, Perplexity: 7.2310
Epoch [1/3], Step [33001/41412], Loss: 2.6310, Perplexity: 13.8880
Epoch [1/3], Step [33002/41412], Loss: 2.3263, Perplexity: 10.2402
Epoch [1/3], Step [33003/41412], Loss: 1.9475, Perplexity: 7.0109
Epoch [1/3], Step [33004/41412], Loss: 2.4461, Perplexity: 11.5435
Epoch [1/3], Step [33005/41412], Loss: 3.0952, Perplexity: 22.0907
Epoch [1/3], Step [33006/41412], Loss: 2.6006, Perplexity: 13.4723
Epoch [1/3], Step [33007/41412], Loss: 1.8346, Perplexity: 6.2626
Epoch [1/3], Step [33008/41412], Loss: 2.2801, Perplexity: 9.7778
Epoch [1/3], Step [33009/41412], Loss: 3.0641, Perplexity: 21.4156
Epoch [1/3], Step [33010/41412], Loss: 2.1009, Perplexity: 8.1739
Epoch [1/3], Step [33011/41412], Loss: 2.9106, Perplexity: 18.3686
Epoch [1/3], Step [33012/41412], Loss: 2.1383, Perplexity: 8.4854
Epoch [1/3], Step [33013/41412], Loss: 2.0437, Perplexity: 7.7188
Epoch [1/3], Step [33014/41412], Loss: 2.4772, Perplexi

Epoch [1/3], Step [34183/41412], Loss: 1.7279, Perplexity: 5.6286
Epoch [1/3], Step [34184/41412], Loss: 2.5039, Perplexity: 12.2306
Epoch [1/3], Step [34185/41412], Loss: 2.6885, Perplexity: 14.7092
Epoch [1/3], Step [34186/41412], Loss: 2.4458, Perplexity: 11.5396
Epoch [1/3], Step [34187/41412], Loss: 2.0521, Perplexity: 7.7843
Epoch [1/3], Step [34188/41412], Loss: 2.1955, Perplexity: 8.9844
Epoch [1/3], Step [34189/41412], Loss: 2.0763, Perplexity: 7.9750
Epoch [1/3], Step [34190/41412], Loss: 1.7161, Perplexity: 5.5626
Epoch [1/3], Step [34191/41412], Loss: 2.5285, Perplexity: 12.5344
Epoch [1/3], Step [34192/41412], Loss: 2.3696, Perplexity: 10.6929
Epoch [1/3], Step [34193/41412], Loss: 2.5292, Perplexity: 12.5438
Epoch [1/3], Step [34194/41412], Loss: 2.7086, Perplexity: 15.0088
Epoch [1/3], Step [34195/41412], Loss: 2.5770, Perplexity: 13.1580
Epoch [1/3], Step [34196/41412], Loss: 2.6365, Perplexity: 13.9642
Epoch [1/3], Step [34197/41412], Loss: 2.6251, Perple

Epoch [1/3], Step [35283/41412], Loss: 1.9035, Perplexity: 6.7095
Epoch [1/3], Step [35284/41412], Loss: 2.7041, Perplexity: 14.9407
Epoch [1/3], Step [35285/41412], Loss: 2.5692, Perplexity: 13.0548
Epoch [1/3], Step [35286/41412], Loss: 1.7455, Perplexity: 5.7288
Epoch [1/3], Step [35287/41412], Loss: 1.8690, Perplexity: 6.4816
Epoch [1/3], Step [35288/41412], Loss: 2.6348, Perplexity: 13.9405
Epoch [1/3], Step [35289/41412], Loss: 1.9841, Perplexity: 7.2724
Epoch [1/3], Step [35290/41412], Loss: 2.5874, Perplexity: 13.2955
Epoch [1/3], Step [35291/41412], Loss: 2.1446, Perplexity: 8.5384
Epoch [1/3], Step [35292/41412], Loss: 2.2789, Perplexity: 9.7661
Epoch [1/3], Step [35293/41412], Loss: 2.3467, Perplexity: 10.4508
Epoch [1/3], Step [35294/41412], Loss: 1.8950, Perplexity: 6.6523
Epoch [1/3], Step [35295/41412], Loss: 2.2123, Perplexity: 9.1370
Epoch [1/3], Step [35296/41412], Loss: 2.2750, Perplexity: 9.7284
Epoch [1/3], Step [35297/41412], Loss: 2.3566, Perplexity

Epoch [1/3], Step [36347/41412], Loss: 3.1426, Perplexity: 23.1638
Epoch [1/3], Step [36348/41412], Loss: 1.9802, Perplexity: 7.2443
Epoch [1/3], Step [36349/41412], Loss: 1.9804, Perplexity: 7.2458
Epoch [1/3], Step [36350/41412], Loss: 2.5043, Perplexity: 12.2355
Epoch [1/3], Step [36351/41412], Loss: 2.2613, Perplexity: 9.5958
Epoch [1/3], Step [36352/41412], Loss: 1.7082, Perplexity: 5.5189
Epoch [1/3], Step [36353/41412], Loss: 2.2619, Perplexity: 9.6014
Epoch [1/3], Step [36354/41412], Loss: 2.2259, Perplexity: 9.2617
Epoch [1/3], Step [36355/41412], Loss: 1.8523, Perplexity: 6.3744
Epoch [1/3], Step [36356/41412], Loss: 2.8060, Perplexity: 16.5444
Epoch [1/3], Step [36357/41412], Loss: 2.3876, Perplexity: 10.8874
Epoch [1/3], Step [36358/41412], Loss: 2.5312, Perplexity: 12.5687
Epoch [1/3], Step [36359/41412], Loss: 2.6237, Perplexity: 13.7871
Epoch [1/3], Step [36360/41412], Loss: 2.5021, Perplexity: 12.2083
Epoch [1/3], Step [36361/41412], Loss: 2.2833, Perplexi

Epoch [1/3], Step [37410/41412], Loss: 2.3013, Perplexity: 9.9869
Epoch [1/3], Step [37411/41412], Loss: 2.1175, Perplexity: 8.3102
Epoch [1/3], Step [37412/41412], Loss: 2.1948, Perplexity: 8.9779
Epoch [1/3], Step [37413/41412], Loss: 2.7069, Perplexity: 14.9825
Epoch [1/3], Step [37414/41412], Loss: 1.9848, Perplexity: 7.2778
Epoch [1/3], Step [37415/41412], Loss: 2.6606, Perplexity: 14.3046
Epoch [1/3], Step [37416/41412], Loss: 2.1884, Perplexity: 8.9210
Epoch [1/3], Step [37417/41412], Loss: 2.0823, Perplexity: 8.0232
Epoch [1/3], Step [37418/41412], Loss: 2.1190, Perplexity: 8.3225
Epoch [1/3], Step [37419/41412], Loss: 1.9076, Perplexity: 6.7367
Epoch [1/3], Step [37420/41412], Loss: 2.2389, Perplexity: 9.3832
Epoch [1/3], Step [37421/41412], Loss: 1.8545, Perplexity: 6.3884
Epoch [1/3], Step [37422/41412], Loss: 2.4711, Perplexity: 11.8353
Epoch [1/3], Step [37423/41412], Loss: 2.0825, Perplexity: 8.0243
Epoch [1/3], Step [37424/41412], Loss: 2.4918, Perplexity: 

Epoch [1/3], Step [38466/41412], Loss: 2.2585, Perplexity: 9.5687
Epoch [1/3], Step [38467/41412], Loss: 2.4842, Perplexity: 11.9915
Epoch [1/3], Step [38468/41412], Loss: 2.1018, Perplexity: 8.1812
Epoch [1/3], Step [38469/41412], Loss: 1.7016, Perplexity: 5.4829
Epoch [1/3], Step [38470/41412], Loss: 2.6653, Perplexity: 14.3721
Epoch [1/3], Step [38471/41412], Loss: 2.9615, Perplexity: 19.3270
Epoch [1/3], Step [38472/41412], Loss: 2.4211, Perplexity: 11.2577
Epoch [1/3], Step [38473/41412], Loss: 2.3737, Perplexity: 10.7368
Epoch [1/3], Step [38474/41412], Loss: 2.2011, Perplexity: 9.0346
Epoch [1/3], Step [38475/41412], Loss: 2.4427, Perplexity: 11.5041
Epoch [1/3], Step [38476/41412], Loss: 1.9443, Perplexity: 6.9885
Epoch [1/3], Step [38477/41412], Loss: 2.6385, Perplexity: 13.9924
Epoch [1/3], Step [38478/41412], Loss: 2.1519, Perplexity: 8.6013
Epoch [1/3], Step [38479/41412], Loss: 1.8358, Perplexity: 6.2702
Epoch [1/3], Step [38480/41412], Loss: 2.3198, Perplexi

Epoch [1/3], Step [39530/41412], Loss: 3.1422, Perplexity: 23.1548
Epoch [1/3], Step [39531/41412], Loss: 2.3079, Perplexity: 10.0535
Epoch [1/3], Step [39532/41412], Loss: 2.4786, Perplexity: 11.9244
Epoch [1/3], Step [39533/41412], Loss: 3.7855, Perplexity: 44.0572
Epoch [1/3], Step [39534/41412], Loss: 2.6207, Perplexity: 13.7448
Epoch [1/3], Step [39535/41412], Loss: 2.4653, Perplexity: 11.7669
Epoch [1/3], Step [39536/41412], Loss: 2.4722, Perplexity: 11.8482
Epoch [1/3], Step [39537/41412], Loss: 1.7396, Perplexity: 5.6949
Epoch [1/3], Step [39538/41412], Loss: 2.7245, Perplexity: 15.2487
Epoch [1/3], Step [39539/41412], Loss: 2.3000, Perplexity: 9.9745
Epoch [1/3], Step [39540/41412], Loss: 2.5500, Perplexity: 12.8076
Epoch [1/3], Step [39541/41412], Loss: 2.3595, Perplexity: 10.5856
Epoch [1/3], Step [39542/41412], Loss: 1.9933, Perplexity: 7.3398
Epoch [1/3], Step [39543/41412], Loss: 2.1840, Perplexity: 8.8814
Epoch [1/3], Step [39544/41412], Loss: 2.2050, Perpl

Epoch [1/3], Step [40533/41412], Loss: 2.3081, Perplexity: 10.0558
Epoch [1/3], Step [40534/41412], Loss: 2.7161, Perplexity: 15.1218
Epoch [1/3], Step [40535/41412], Loss: 2.4587, Perplexity: 11.6899
Epoch [1/3], Step [40536/41412], Loss: 2.0219, Perplexity: 7.5529
Epoch [1/3], Step [40537/41412], Loss: 2.7679, Perplexity: 15.9256
Epoch [1/3], Step [40538/41412], Loss: 2.4286, Perplexity: 11.3433
Epoch [1/3], Step [40539/41412], Loss: 1.7462, Perplexity: 5.7328
Epoch [1/3], Step [40540/41412], Loss: 2.5701, Perplexity: 13.0675
Epoch [1/3], Step [40541/41412], Loss: 2.0188, Perplexity: 7.5296
Epoch [1/3], Step [40542/41412], Loss: 1.4968, Perplexity: 4.4674
Epoch [1/3], Step [40543/41412], Loss: 1.8020, Perplexity: 6.0618
Epoch [1/3], Step [40544/41412], Loss: 1.6852, Perplexity: 5.3933
Epoch [1/3], Step [40545/41412], Loss: 2.4282, Perplexity: 11.3390
Epoch [1/3], Step [40546/41412], Loss: 2.3285, Perplexity: 10.2625
Epoch [1/3], Step [40547/41412], Loss: 2.5408, Perplex

Epoch [2/3], Step [93/41412], Loss: 2.7915, Perplexity: 16.3054
Epoch [2/3], Step [94/41412], Loss: 2.1756, Perplexity: 8.8071
Epoch [2/3], Step [95/41412], Loss: 2.6190, Perplexity: 13.7214
Epoch [2/3], Step [96/41412], Loss: 2.2562, Perplexity: 9.5471
Epoch [2/3], Step [97/41412], Loss: 2.6527, Perplexity: 14.1921
Epoch [2/3], Step [98/41412], Loss: 2.2802, Perplexity: 9.7785
Epoch [2/3], Step [99/41412], Loss: 2.1911, Perplexity: 8.9451
Epoch [2/3], Step [100/41412], Loss: 2.4124, Perplexity: 11.1603
Epoch [2/3], Step [101/41412], Loss: 2.0799, Perplexity: 8.0035
Epoch [2/3], Step [102/41412], Loss: 2.2249, Perplexity: 9.2522
Epoch [2/3], Step [103/41412], Loss: 2.0447, Perplexity: 7.7270
Epoch [2/3], Step [104/41412], Loss: 2.4862, Perplexity: 12.0160
Epoch [2/3], Step [105/41412], Loss: 2.2705, Perplexity: 9.6841
Epoch [2/3], Step [106/41412], Loss: 2.3958, Perplexity: 10.9765
Epoch [2/3], Step [107/41412], Loss: 2.2243, Perplexity: 9.2466
Epoch [2/3], Step [108/414

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
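
The notice above is emitted by the notebook server when a cell writes output faster than the configured data rate, which is why the log jumps from step 108 to step 26514 of epoch 2. Besides raising the limit, one way to stay under it is to print a statistics line only every few steps. The following is a minimal sketch, not the notebook's own training code. It assumes that the perplexity column is `exp(loss)`, which is consistent with the logged values (for example, `exp(2.4374)` is roughly 11.443). The helper names `format_stats` and `should_print` and the `print_every` interval are hypothetical.

```python
import math


def format_stats(epoch, num_epochs, step, total_step, loss):
    """Build one log line in the same layout as the training output.

    Assumption: perplexity is computed as exp(loss).
    """
    return "Epoch [%d/%d], Step [%d/%d], Loss: %.4f, Perplexity: %.4f" % (
        epoch, num_epochs, step, total_step, loss, math.exp(loss))


def should_print(step, print_every=100):
    """Return True only on every `print_every`-th step (hypothetical interval)."""
    return step % print_every == 0


if __name__ == "__main__":
    # (step, loss) pairs copied from the epoch 1 log above.
    logged = [(19680, 2.4374), (19681, 2.3803), (19690, 2.4380)]
    for step, loss in logged:
        # With print_every=10, step 19681 is skipped.
        if should_print(step, print_every=10):
            print(format_stats(1, 3, step, 41412, loss))
```

Printing less often reduces the volume of cell output and does not change training itself. The interval is a trade-off between how closely the loss can be followed and how much output the notebook has to render.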

Epoch [2/3], Step [26514/41412], Loss: 2.3093, Perplexity: 10.0672
Epoch [2/3], Step [26515/41412], Loss: 1.9071, Perplexity: 6.7339
Epoch [2/3], Step [26516/41412], Loss: 1.9533, Perplexity: 7.0518
Epoch [2/3], Step [26517/41412], Loss: 1.9132, Perplexity: 6.7749
Epoch [2/3], Step [26518/41412], Loss: 2.3128, Perplexity: 10.1030
Epoch [2/3], Step [26519/41412], Loss: 2.2663, Perplexity: 9.6433
Epoch [2/3], Step [26520/41412], Loss: 1.7399, Perplexity: 5.6966
Epoch [2/3], Step [26521/41412], Loss: 1.9611, Perplexity: 7.1072
Epoch [2/3], Step [26522/41412], Loss: 2.9968, Perplexity: 20.0215
Epoch [2/3], Step [26523/41412], Loss: 2.3088, Perplexity: 10.0627
Epoch [2/3], Step [26524/41412], Loss: 2.5644, Perplexity: 12.9934
Epoch [2/3], Step [26525/41412], Loss: 1.8740, Perplexity: 6.5145
Epoch [2/3], Step [26526/41412], Loss: 2.3136, Perplexity: 10.1106
Epoch [2/3], Step [26527/41412], Loss: 2.4717, Perplexity: 11.8423
Epoch [2/3], Step [26528/41412], Loss: 2.2426, Perplex

Epoch [2/3], Step [27179/41412], Loss: 2.1841, Perplexity: 8.8829
Epoch [2/3], Step [27180/41412], Loss: 1.7388, Perplexity: 5.6908
Epoch [2/3], Step [27181/41412], Loss: 2.2227, Perplexity: 9.2319
Epoch [2/3], Step [27182/41412], Loss: 2.0540, Perplexity: 7.7987
Epoch [2/3], Step [27183/41412], Loss: 1.6750, Perplexity: 5.3387
Epoch [2/3], Step [27184/41412], Loss: 1.8204, Perplexity: 6.1743
Epoch [2/3], Step [27185/41412], Loss: 1.5333, Perplexity: 4.6334
Epoch [2/3], Step [27186/41412], Loss: 1.8807, Perplexity: 6.5584
Epoch [2/3], Step [27187/41412], Loss: 2.0056, Perplexity: 7.4308
Epoch [2/3], Step [27188/41412], Loss: 2.0773, Perplexity: 7.9833
Epoch [2/3], Step [27189/41412], Loss: 2.5125, Perplexity: 12.3354
Epoch [2/3], Step [27190/41412], Loss: 2.7596, Perplexity: 15.7936
Epoch [2/3], Step [27191/41412], Loss: 2.9698, Perplexity: 19.4879
Epoch [2/3], Step [27192/41412], Loss: 2.5338, Perplexity: 12.6013
Epoch [2/3], Step [27193/41412], Loss: 1.7066, Perplexity:

Epoch [2/3], Step [27852/41412], Loss: 2.3989, Perplexity: 11.0115
Epoch [2/3], Step [27853/41412], Loss: 2.3867, Perplexity: 10.8777
Epoch [2/3], Step [27854/41412], Loss: 2.3120, Perplexity: 10.0947
Epoch [2/3], Step [27855/41412], Loss: 2.0914, Perplexity: 8.0962
Epoch [2/3], Step [27856/41412], Loss: 2.1846, Perplexity: 8.8870
Epoch [2/3], Step [27857/41412], Loss: 2.3969, Perplexity: 10.9891
Epoch [2/3], Step [27858/41412], Loss: 2.1909, Perplexity: 8.9435
Epoch [2/3], Step [27859/41412], Loss: 1.8506, Perplexity: 6.3637
Epoch [2/3], Step [27860/41412], Loss: 1.9918, Perplexity: 7.3291
Epoch [2/3], Step [27861/41412], Loss: 2.2067, Perplexity: 9.0858
Epoch [2/3], Step [27862/41412], Loss: 1.5829, Perplexity: 4.8689
Epoch [2/3], Step [27863/41412], Loss: 2.1304, Perplexity: 8.4184
Epoch [2/3], Step [27864/41412], Loss: 2.4489, Perplexity: 11.5754
Epoch [2/3], Step [27865/41412], Loss: 2.0086, Perplexity: 7.4528
Epoch [2/3], Step [27866/41412], Loss: 2.7130, Perplexity

Epoch [2/3], Step [28487/41412], Loss: 2.1205, Perplexity: 8.3351
Epoch [2/3], Step [28488/41412], Loss: 2.1306, Perplexity: 8.4196
Epoch [2/3], Step [28489/41412], Loss: 1.8707, Perplexity: 6.4928
Epoch [2/3], Step [28490/41412], Loss: 1.9410, Perplexity: 6.9659
Epoch [2/3], Step [28491/41412], Loss: 1.9162, Perplexity: 6.7953
Epoch [2/3], Step [28492/41412], Loss: 2.2392, Perplexity: 9.3854
Epoch [2/3], Step [28493/41412], Loss: 2.1133, Perplexity: 8.2758
Epoch [2/3], Step [28494/41412], Loss: 1.8914, Perplexity: 6.6286
Epoch [2/3], Step [28495/41412], Loss: 1.9054, Perplexity: 6.7223
Epoch [2/3], Step [28496/41412], Loss: 1.9451, Perplexity: 6.9946
Epoch [2/3], Step [28497/41412], Loss: 2.1219, Perplexity: 8.3474
Epoch [2/3], Step [28498/41412], Loss: 2.3499, Perplexity: 10.4840
Epoch [2/3], Step [28499/41412], Loss: 1.8538, Perplexity: 6.3842
Epoch [2/3], Step [28500/41412], Loss: 2.2791, Perplexity: 9.7684
Epoch [2/3], Step [28501/41412], Loss: 1.9817, Perplexity: 7.

Epoch [2/3], Step [29152/41412], Loss: 1.8977, Perplexity: 6.6708
Epoch [2/3], Step [29153/41412], Loss: 2.3026, Perplexity: 10.0006
Epoch [2/3], Step [29154/41412], Loss: 2.2096, Perplexity: 9.1125
Epoch [2/3], Step [29155/41412], Loss: 2.1557, Perplexity: 8.6342
Epoch [2/3], Step [29156/41412], Loss: 2.5168, Perplexity: 12.3890
Epoch [2/3], Step [29157/41412], Loss: 2.0802, Perplexity: 8.0057
Epoch [2/3], Step [29158/41412], Loss: 2.4094, Perplexity: 11.1272
Epoch [2/3], Step [29159/41412], Loss: 1.7099, Perplexity: 5.5281
Epoch [2/3], Step [29160/41412], Loss: 2.3051, Perplexity: 10.0255
Epoch [2/3], Step [29161/41412], Loss: 1.9319, Perplexity: 6.9029
Epoch [2/3], Step [29162/41412], Loss: 1.8660, Perplexity: 6.4622
Epoch [2/3], Step [29163/41412], Loss: 2.3988, Perplexity: 11.0098
Epoch [2/3], Step [29164/41412], Loss: 2.2088, Perplexity: 9.1047
Epoch [2/3], Step [29165/41412], Loss: 2.2871, Perplexity: 9.8459
Epoch [2/3], Step [29166/41412], Loss: 2.2347, Perplexity

Epoch [2/3], Step [29883/41412], Loss: 1.8768, Perplexity: 6.5325
Epoch [2/3], Step [29884/41412], Loss: 1.9868, Perplexity: 7.2923
Epoch [2/3], Step [29885/41412], Loss: 2.3207, Perplexity: 10.1825
Epoch [2/3], Step [29886/41412], Loss: 2.1473, Perplexity: 8.5613
Epoch [2/3], Step [29887/41412], Loss: 1.7373, Perplexity: 5.6819
Epoch [2/3], Step [29888/41412], Loss: 1.6706, Perplexity: 5.3156
Epoch [2/3], Step [29889/41412], Loss: 2.5790, Perplexity: 13.1835
Epoch [2/3], Step [29890/41412], Loss: 2.1274, Perplexity: 8.3927
Epoch [2/3], Step [29891/41412], Loss: 2.3573, Perplexity: 10.5624
Epoch [2/3], Step [29892/41412], Loss: 2.1835, Perplexity: 8.8775
Epoch [2/3], Step [29893/41412], Loss: 2.3420, Perplexity: 10.4020
Epoch [2/3], Step [29894/41412], Loss: 2.0139, Perplexity: 7.4923
Epoch [2/3], Step [29895/41412], Loss: 2.4576, Perplexity: 11.6762
Epoch [2/3], Step [29896/41412], Loss: 2.0289, Perplexity: 7.6058
Epoch [2/3], Step [29897/41412], Loss: 2.1489, Perplexity

Epoch [2/3], Step [30518/41412], Loss: 1.6726, Perplexity: 5.3259
Epoch [2/3], Step [30519/41412], Loss: 2.9515, Perplexity: 19.1344
Epoch [2/3], Step [30520/41412], Loss: 2.1675, Perplexity: 8.7364
Epoch [2/3], Step [30521/41412], Loss: 2.0742, Perplexity: 7.9582
Epoch [2/3], Step [30522/41412], Loss: 2.3872, Perplexity: 10.8828
Epoch [2/3], Step [30523/41412], Loss: 2.0057, Perplexity: 7.4316
Epoch [2/3], Step [30524/41412], Loss: 2.1415, Perplexity: 8.5124
Epoch [2/3], Step [30525/41412], Loss: 2.0355, Perplexity: 7.6564
Epoch [2/3], Step [30526/41412], Loss: 2.7588, Perplexity: 15.7807
Epoch [2/3], Step [30527/41412], Loss: 2.3182, Perplexity: 10.1576
Epoch [2/3], Step [30528/41412], Loss: 1.8305, Perplexity: 6.2368
Epoch [2/3], Step [30529/41412], Loss: 2.4509, Perplexity: 11.5991
Epoch [2/3], Step [30530/41412], Loss: 2.5953, Perplexity: 13.4006
Epoch [2/3], Step [30531/41412], Loss: 1.5960, Perplexity: 4.9335
Epoch [2/3], Step [30532/41412], Loss: 1.9796, Perplexit

Epoch [2/3], Step [31100/41412], Loss: 1.8370, Perplexity: 6.2779
Epoch [2/3], Step [31101/41412], Loss: 2.5479, Perplexity: 12.7800
Epoch [2/3], Step [31102/41412], Loss: 2.3740, Perplexity: 10.7403
Epoch [2/3], Step [31103/41412], Loss: 2.6506, Perplexity: 14.1629
Epoch [2/3], Step [31104/41412], Loss: 2.4980, Perplexity: 12.1578
Epoch [2/3], Step [31105/41412], Loss: 2.2929, Perplexity: 9.9033
Epoch [2/3], Step [31106/41412], Loss: 1.9403, Perplexity: 6.9608
Epoch [2/3], Step [31107/41412], Loss: 2.0482, Perplexity: 7.7541
Epoch [2/3], Step [31108/41412], Loss: 3.3672, Perplexity: 28.9977
Epoch [2/3], Step [31109/41412], Loss: 2.4487, Perplexity: 11.5733
Epoch [2/3], Step [31110/41412], Loss: 2.2820, Perplexity: 9.7958
Epoch [2/3], Step [31111/41412], Loss: 2.0402, Perplexity: 7.6918
Epoch [2/3], Step [31112/41412], Loss: 3.0483, Perplexity: 21.0793
Epoch [2/3], Step [31113/41412], Loss: 2.4850, Perplexity: 12.0013
Epoch [2/3], Step [31114/41412], Loss: 2.1497, Perplex

Epoch [2/3], Step [31736/41412], Loss: 2.2712, Perplexity: 9.6912
Epoch [2/3], Step [31737/41412], Loss: 2.4789, Perplexity: 11.9287
Epoch [2/3], Step [31738/41412], Loss: 1.6154, Perplexity: 5.0299
Epoch [2/3], Step [31739/41412], Loss: 1.8642, Perplexity: 6.4508
Epoch [2/3], Step [31740/41412], Loss: 1.9846, Perplexity: 7.2759
Epoch [2/3], Step [31741/41412], Loss: 2.2353, Perplexity: 9.3495
Epoch [2/3], Step [31742/41412], Loss: 2.5657, Perplexity: 13.0095
Epoch [2/3], Step [31743/41412], Loss: 2.2476, Perplexity: 9.4649
Epoch [2/3], Step [31744/41412], Loss: 1.8374, Perplexity: 6.2803
Epoch [2/3], Step [31745/41412], Loss: 2.3062, Perplexity: 10.0366
Epoch [2/3], Step [31746/41412], Loss: 2.5080, Perplexity: 12.2807
Epoch [2/3], Step [31747/41412], Loss: 3.1334, Perplexity: 22.9528
Epoch [2/3], Step [31748/41412], Loss: 2.4066, Perplexity: 11.0960
Epoch [2/3], Step [31749/41412], Loss: 2.6522, Perplexity: 14.1859
Epoch [2/3], Step [31750/41412], Loss: 2.0985, Perplexi

Epoch [2/3], Step [32343/41412], Loss: 3.0797, Perplexity: 21.7513
Epoch [2/3], Step [32344/41412], Loss: 3.3835, Perplexity: 29.4744
Epoch [2/3], Step [32345/41412], Loss: 2.3619, Perplexity: 10.6116
Epoch [2/3], Step [32346/41412], Loss: 2.3331, Perplexity: 10.3098
Epoch [2/3], Step [32347/41412], Loss: 2.4844, Perplexity: 11.9944
Epoch [2/3], Step [32348/41412], Loss: 2.2361, Perplexity: 9.3568
Epoch [2/3], Step [32349/41412], Loss: 1.8535, Perplexity: 6.3819
Epoch [2/3], Step [32350/41412], Loss: 2.1541, Perplexity: 8.6199
Epoch [2/3], Step [32351/41412], Loss: 2.1581, Perplexity: 8.6550
Epoch [2/3], Step [32352/41412], Loss: 2.3815, Perplexity: 10.8209
Epoch [2/3], Step [32353/41412], Loss: 1.8890, Perplexity: 6.6128
Epoch [2/3], Step [32354/41412], Loss: 2.2670, Perplexity: 9.6506
Epoch [2/3], Step [32355/41412], Loss: 2.1733, Perplexity: 8.7874
Epoch [2/3], Step [32356/41412], Loss: 2.3877, Perplexity: 10.8885
Epoch [2/3], Step [32357/41412], Loss: 2.0459, Perplexi

Epoch [2/3], Step [32983/41412], Loss: 2.4180, Perplexity: 11.2232
Epoch [2/3], Step [32984/41412], Loss: 2.2782, Perplexity: 9.7591
Epoch [2/3], Step [32985/41412], Loss: 2.3662, Perplexity: 10.6563
Epoch [2/3], Step [32986/41412], Loss: 2.5174, Perplexity: 12.3958
Epoch [2/3], Step [32987/41412], Loss: 2.4492, Perplexity: 11.5792
Epoch [2/3], Step [32988/41412], Loss: 2.0640, Perplexity: 7.8777
Epoch [2/3], Step [32989/41412], Loss: 2.3192, Perplexity: 10.1674
Epoch [2/3], Step [32990/41412], Loss: 1.7585, Perplexity: 5.8040
Epoch [2/3], Step [32991/41412], Loss: 2.4285, Perplexity: 11.3416
Epoch [2/3], Step [32992/41412], Loss: 2.5712, Perplexity: 13.0817
Epoch [2/3], Step [32993/41412], Loss: 2.3617, Perplexity: 10.6089
Epoch [2/3], Step [32994/41412], Loss: 2.2574, Perplexity: 9.5584
Epoch [2/3], Step [32995/41412], Loss: 2.9307, Perplexity: 18.7408
Epoch [2/3], Step [32996/41412], Loss: 1.9948, Perplexity: 7.3504
Epoch [2/3], Step [32997/41412], Loss: 2.3711, Perple

Epoch [2/3], Step [33589/41412], Loss: 1.7652, Perplexity: 5.8426
Epoch [2/3], Step [33590/41412], Loss: 1.9776, Perplexity: 7.2252
Epoch [2/3], Step [33591/41412], Loss: 1.5719, Perplexity: 4.8159
Epoch [2/3], Step [33592/41412], Loss: 2.0291, Perplexity: 7.6069
Epoch [2/3], Step [33593/41412], Loss: 2.3470, Perplexity: 10.4545
Epoch [2/3], Step [33594/41412], Loss: 2.2499, Perplexity: 9.4872
Epoch [2/3], Step [33595/41412], Loss: 2.1817, Perplexity: 8.8611
Epoch [2/3], Step [33596/41412], Loss: 1.7648, Perplexity: 5.8403
Epoch [2/3], Step [33597/41412], Loss: 2.0941, Perplexity: 8.1184
Epoch [2/3], Step [33598/41412], Loss: 2.3336, Perplexity: 10.3154
Epoch [2/3], Step [33599/41412], Loss: 1.8666, Perplexity: 6.4661
Epoch [2/3], Step [33600/41412], Loss: 2.0546, Perplexity: 7.8039
Epoch [2/3], Step [33601/41412], Loss: 1.7718, Perplexity: 5.8815
Epoch [2/3], Step [33602/41412], Loss: 2.5319, Perplexity: 12.5774
Epoch [2/3], Step [33603/41412], Loss: 2.1002, Perplexity: 

Epoch [2/3], Step [34194/41412], Loss: 2.2014, Perplexity: 9.0377
Epoch [2/3], Step [34195/41412], Loss: 2.3804, Perplexity: 10.8097
Epoch [2/3], Step [34196/41412], Loss: 1.8788, Perplexity: 6.5453
Epoch [2/3], Step [34197/41412], Loss: 1.8610, Perplexity: 6.4304
Epoch [2/3], Step [34198/41412], Loss: 1.9618, Perplexity: 7.1120
Epoch [2/3], Step [34199/41412], Loss: 2.3610, Perplexity: 10.6014
Epoch [2/3], Step [34200/41412], Loss: 2.4676, Perplexity: 11.7941
Epoch [2/3], Step [34201/41412], Loss: 1.7905, Perplexity: 5.9922
Epoch [2/3], Step [34202/41412], Loss: 2.1050, Perplexity: 8.2072
Epoch [2/3], Step [34203/41412], Loss: 2.2310, Perplexity: 9.3095
Epoch [2/3], Step [34204/41412], Loss: 2.8495, Perplexity: 17.2786
Epoch [2/3], Step [34205/41412], Loss: 2.1996, Perplexity: 9.0212
Epoch [2/3], Step [34206/41412], Loss: 1.8506, Perplexity: 6.3637
Epoch [2/3], Step [34207/41412], Loss: 2.1297, Perplexity: 8.4123
Epoch [2/3], Step [34208/41412], Loss: 1.9151, Perplexity:

Epoch [2/3], Step [34801/41412], Loss: 1.7928, Perplexity: 6.0065
Epoch [2/3], Step [34802/41412], Loss: 2.6392, Perplexity: 14.0023
Epoch [2/3], Step [34803/41412], Loss: 2.1519, Perplexity: 8.6009
Epoch [2/3], Step [34804/41412], Loss: 1.5976, Perplexity: 4.9411
Epoch [2/3], Step [34805/41412], Loss: 2.1112, Perplexity: 8.2585
Epoch [2/3], Step [34806/41412], Loss: 2.0870, Perplexity: 8.0606
Epoch [2/3], Step [34807/41412], Loss: 2.0650, Perplexity: 7.8852
Epoch [2/3], Step [34808/41412], Loss: 2.1229, Perplexity: 8.3552
Epoch [2/3], Step [34809/41412], Loss: 2.2147, Perplexity: 9.1590
Epoch [2/3], Step [34810/41412], Loss: 2.3932, Perplexity: 10.9489
Epoch [2/3], Step [34811/41412], Loss: 1.7980, Perplexity: 6.0375
Epoch [2/3], Step [34812/41412], Loss: 2.0300, Perplexity: 7.6143
Epoch [2/3], Step [34813/41412], Loss: 1.9207, Perplexity: 6.8256
Epoch [2/3], Step [34814/41412], Loss: 2.5897, Perplexity: 13.3252
Epoch [2/3], Step [34815/41412], Loss: 1.9325, Perplexity: 

Epoch [2/3], Step [35436/41412], Loss: 2.1670, Perplexity: 8.7319
Epoch [2/3], Step [35437/41412], Loss: 2.1326, Perplexity: 8.4364
Epoch [2/3], Step [35438/41412], Loss: 2.1551, Perplexity: 8.6288
Epoch [2/3], Step [35439/41412], Loss: 2.2248, Perplexity: 9.2512
Epoch [2/3], Step [35440/41412], Loss: 2.3381, Perplexity: 10.3617
Epoch [2/3], Step [35441/41412], Loss: 2.4142, Perplexity: 11.1804
Epoch [2/3], Step [35442/41412], Loss: 2.2720, Perplexity: 9.6987
Epoch [2/3], Step [35443/41412], Loss: 1.8274, Perplexity: 6.2176
Epoch [2/3], Step [35444/41412], Loss: 2.7535, Perplexity: 15.6970
Epoch [2/3], Step [35445/41412], Loss: 2.4215, Perplexity: 11.2632
Epoch [2/3], Step [35446/41412], Loss: 1.8024, Perplexity: 6.0641
Epoch [2/3], Step [35447/41412], Loss: 1.8131, Perplexity: 6.1296
Epoch [2/3], Step [35448/41412], Loss: 2.2347, Perplexity: 9.3434
Epoch [2/3], Step [35449/41412], Loss: 2.0458, Perplexity: 7.7351
Epoch [2/3], Step [35450/41412], Loss: 2.1030, Perplexity:

Epoch [2/3], Step [35988/41412], Loss: 1.4324, Perplexity: 4.1888
Epoch [2/3], Step [35989/41412], Loss: 2.0772, Perplexity: 7.9823
Epoch [2/3], Step [35990/41412], Loss: 1.9089, Perplexity: 6.7455
Epoch [2/3], Step [35991/41412], Loss: 1.9013, Perplexity: 6.6946
Epoch [2/3], Step [35992/41412], Loss: 2.3448, Perplexity: 10.4309
Epoch [2/3], Step [35993/41412], Loss: 2.2514, Perplexity: 9.5010
Epoch [2/3], Step [35994/41412], Loss: 1.9373, Perplexity: 6.9397
Epoch [2/3], Step [35995/41412], Loss: 2.5143, Perplexity: 12.3583
Epoch [2/3], Step [35996/41412], Loss: 1.9642, Perplexity: 7.1294
Epoch [2/3], Step [35997/41412], Loss: 2.2849, Perplexity: 9.8251
Epoch [2/3], Step [35998/41412], Loss: 2.2056, Perplexity: 9.0753
Epoch [2/3], Step [35999/41412], Loss: 1.9101, Perplexity: 6.7537
Epoch [2/3], Step [36000/41412], Loss: 1.9354, Perplexity: 6.9266
Epoch [2/3], Step [36001/41412], Loss: 1.9112, Perplexity: 6.7610
Epoch [2/3], Step [36002/41412], Loss: 1.9325, Perplexity: 6

Epoch [2/3], Step [36564/41412], Loss: 2.0036, Perplexity: 7.4161
Epoch [2/3], Step [36565/41412], Loss: 1.9809, Perplexity: 7.2491
Epoch [2/3], Step [36566/41412], Loss: 2.0410, Perplexity: 7.6980
Epoch [2/3], Step [36567/41412], Loss: 2.4894, Perplexity: 12.0544
Epoch [2/3], Step [36568/41412], Loss: 2.5029, Perplexity: 12.2183
Epoch [2/3], Step [36569/41412], Loss: 2.5630, Perplexity: 12.9747
Epoch [2/3], Step [36570/41412], Loss: 2.9121, Perplexity: 18.3953
Epoch [2/3], Step [36571/41412], Loss: 1.7051, Perplexity: 5.5020
Epoch [2/3], Step [36572/41412], Loss: 1.9809, Perplexity: 7.2492
Epoch [2/3], Step [36573/41412], Loss: 2.4946, Perplexity: 12.1171
Epoch [2/3], Step [36574/41412], Loss: 2.0119, Perplexity: 7.4774
Epoch [2/3], Step [36575/41412], Loss: 1.8363, Perplexity: 6.2731
Epoch [2/3], Step [36576/41412], Loss: 2.2650, Perplexity: 9.6313
Epoch [2/3], Step [36577/41412], Loss: 2.2324, Perplexity: 9.3220
Epoch [2/3], Step [36578/41412], Loss: 2.6861, Perplexity

Epoch [2/3], Step [37141/41412], Loss: 1.7907, Perplexity: 5.9937
Epoch [2/3], Step [37142/41412], Loss: 2.3004, Perplexity: 9.9778
Epoch [2/3], Step [37143/41412], Loss: 2.4408, Perplexity: 11.4823
Epoch [2/3], Step [37144/41412], Loss: 2.2907, Perplexity: 9.8815
Epoch [2/3], Step [37145/41412], Loss: 2.0429, Perplexity: 7.7128
Epoch [2/3], Step [37146/41412], Loss: 2.4530, Perplexity: 11.6227
Epoch [2/3], Step [37147/41412], Loss: 1.6577, Perplexity: 5.2474
Epoch [2/3], Step [37148/41412], Loss: 2.2794, Perplexity: 9.7710
Epoch [2/3], Step [37149/41412], Loss: 2.7378, Perplexity: 15.4530
Epoch [2/3], Step [37150/41412], Loss: 1.8418, Perplexity: 6.3078
Epoch [2/3], Step [37151/41412], Loss: 1.9792, Perplexity: 7.2369
Epoch [2/3], Step [37152/41412], Loss: 2.4598, Perplexity: 11.7026
Epoch [2/3], Step [37153/41412], Loss: 1.9116, Perplexity: 6.7637
Epoch [2/3], Step [37154/41412], Loss: 2.2465, Perplexity: 9.4549
Epoch [2/3], Step [37155/41412], Loss: 2.5487, Perplexity:

Epoch [2/3], Step [37688/41412], Loss: 1.9117, Perplexity: 6.7649
Epoch [2/3], Step [37689/41412], Loss: 2.2881, Perplexity: 9.8564
Epoch [2/3], Step [37690/41412], Loss: 2.1460, Perplexity: 8.5503
Epoch [2/3], Step [37691/41412], Loss: 2.3812, Perplexity: 10.8182
Epoch [2/3], Step [37692/41412], Loss: 2.0691, Perplexity: 7.9179
Epoch [2/3], Step [37693/41412], Loss: 1.7195, Perplexity: 5.5818
Epoch [2/3], Step [37694/41412], Loss: 1.9555, Perplexity: 7.0676
Epoch [2/3], Step [37695/41412], Loss: 1.8987, Perplexity: 6.6772
Epoch [2/3], Step [37696/41412], Loss: 2.3919, Perplexity: 10.9341
Epoch [2/3], Step [37697/41412], Loss: 1.6787, Perplexity: 5.3584
Epoch [2/3], Step [37698/41412], Loss: 1.8404, Perplexity: 6.2989
Epoch [2/3], Step [37699/41412], Loss: 1.7273, Perplexity: 5.6253
Epoch [2/3], Step [37700/41412], Loss: 2.0955, Perplexity: 8.1294
Epoch [2/3], Step [37701/41412], Loss: 2.5881, Perplexity: 13.3048
Epoch [2/3], Step [37702/41412], Loss: 2.4024, Perplexity: 

Epoch [2/3], Step [38235/41412], Loss: 1.9404, Perplexity: 6.9614
Epoch [2/3], Step [38236/41412], Loss: 2.2420, Perplexity: 9.4119
Epoch [2/3], Step [38237/41412], Loss: 2.5820, Perplexity: 13.2231
Epoch [2/3], Step [38238/41412], Loss: 2.0199, Perplexity: 7.5377
Epoch [2/3], Step [38239/41412], Loss: 2.5750, Perplexity: 13.1309
Epoch [2/3], Step [38240/41412], Loss: 1.9437, Perplexity: 6.9844
Epoch [2/3], Step [38241/41412], Loss: 2.1133, Perplexity: 8.2759
Epoch [2/3], Step [38242/41412], Loss: 1.8933, Perplexity: 6.6412
Epoch [2/3], Step [38243/41412], Loss: 3.3370, Perplexity: 28.1360
Epoch [2/3], Step [38244/41412], Loss: 2.7510, Perplexity: 15.6582
Epoch [2/3], Step [38245/41412], Loss: 1.5313, Perplexity: 4.6241
Epoch [2/3], Step [38246/41412], Loss: 1.6177, Perplexity: 5.0414
Epoch [2/3], Step [38247/41412], Loss: 2.6014, Perplexity: 13.4832
Epoch [2/3], Step [38248/41412], Loss: 2.4472, Perplexity: 11.5554
Epoch [2/3], Step [38249/41412], Loss: 1.8740, Perplexit

Epoch [2/3], Step [38781/41412], Loss: 2.4049, Perplexity: 11.0778
Epoch [2/3], Step [38782/41412], Loss: 1.8493, Perplexity: 6.3556
Epoch [2/3], Step [38783/41412], Loss: 2.1115, Perplexity: 8.2605
Epoch [2/3], Step [38784/41412], Loss: 2.6658, Perplexity: 14.3791
Epoch [2/3], Step [38785/41412], Loss: 1.9858, Perplexity: 7.2847
Epoch [2/3], Step [38786/41412], Loss: 2.6974, Perplexity: 14.8408
Epoch [2/3], Step [38787/41412], Loss: 1.6581, Perplexity: 5.2494
Epoch [2/3], Step [38788/41412], Loss: 1.9632, Perplexity: 7.1218
Epoch [2/3], Step [38789/41412], Loss: 1.9333, Perplexity: 6.9122
Epoch [2/3], Step [38790/41412], Loss: 2.1617, Perplexity: 8.6855
Epoch [2/3], Step [38791/41412], Loss: 1.9941, Perplexity: 7.3453
Epoch [2/3], Step [38792/41412], Loss: 2.0158, Perplexity: 7.5069
Epoch [2/3], Step [38793/41412], Loss: 1.8045, Perplexity: 6.0766
Epoch [2/3], Step [38794/41412], Loss: 2.2038, Perplexity: 9.0595
Epoch [2/3], Step [38795/41412], Loss: 2.2655, Perplexity: 

Epoch [2/3], Step [39328/41412], Loss: 2.7063, Perplexity: 14.9742
Epoch [2/3], Step [39329/41412], Loss: 1.9966, Perplexity: 7.3641
Epoch [2/3], Step [39330/41412], Loss: 3.3326, Perplexity: 28.0100
Epoch [2/3], Step [39331/41412], Loss: 2.5753, Perplexity: 13.1349
Epoch [2/3], Step [39332/41412], Loss: 2.4530, Perplexity: 11.6232
Epoch [2/3], Step [39333/41412], Loss: 1.7565, Perplexity: 5.7919
Epoch [2/3], Step [39334/41412], Loss: 2.0984, Perplexity: 8.1528
Epoch [2/3], Step [39335/41412], Loss: 1.9921, Perplexity: 7.3307
Epoch [2/3], Step [39336/41412], Loss: 2.6155, Perplexity: 13.6736
Epoch [2/3], Step [39337/41412], Loss: 1.9135, Perplexity: 6.7767
Epoch [2/3], Step [39338/41412], Loss: 1.8951, Perplexity: 6.6531
Epoch [2/3], Step [39339/41412], Loss: 2.5256, Perplexity: 12.4980
Epoch [2/3], Step [39340/41412], Loss: 1.9377, Perplexity: 6.9429
Epoch [2/3], Step [39341/41412], Loss: 1.8130, Perplexity: 6.1290
Epoch [2/3], Step [39342/41412], Loss: 2.1868, Perplexit

Epoch [2/3], Step [39876/41412], Loss: 1.9929, Perplexity: 7.3370
Epoch [2/3], Step [39877/41412], Loss: 2.4346, Perplexity: 11.4115
Epoch [2/3], Step [39878/41412], Loss: 1.8070, Perplexity: 6.0922
Epoch [2/3], Step [39879/41412], Loss: 2.1846, Perplexity: 8.8869
Epoch [2/3], Step [39880/41412], Loss: 2.2502, Perplexity: 9.4901
Epoch [2/3], Step [39881/41412], Loss: 2.2531, Perplexity: 9.5177
Epoch [2/3], Step [39882/41412], Loss: 1.9743, Perplexity: 7.2016
Epoch [2/3], Step [39883/41412], Loss: 1.9149, Perplexity: 6.7859
Epoch [2/3], Step [39884/41412], Loss: 2.0558, Perplexity: 7.8130
Epoch [2/3], Step [39885/41412], Loss: 1.7613, Perplexity: 5.8200
Epoch [2/3], Step [39886/41412], Loss: 1.7865, Perplexity: 5.9682
Epoch [2/3], Step [39887/41412], Loss: 2.3484, Perplexity: 10.4689
Epoch [2/3], Step [39888/41412], Loss: 1.8785, Perplexity: 6.5440
Epoch [2/3], Step [39889/41412], Loss: 2.0696, Perplexity: 7.9214
Epoch [2/3], Step [39890/41412], Loss: 2.2777, Perplexity: 9

Epoch [2/3], Step [40393/41412], Loss: 2.1387, Perplexity: 8.4884
Epoch [2/3], Step [40394/41412], Loss: 2.1239, Perplexity: 8.3636
Epoch [2/3], Step [40395/41412], Loss: 2.2485, Perplexity: 9.4732
Epoch [2/3], Step [40396/41412], Loss: 2.1192, Perplexity: 8.3243
Epoch [2/3], Step [40397/41412], Loss: 1.9828, Perplexity: 7.2631
Epoch [2/3], Step [40398/41412], Loss: 1.9826, Perplexity: 7.2615
Epoch [2/3], Step [40399/41412], Loss: 1.6558, Perplexity: 5.2373
Epoch [2/3], Step [40400/41412], Loss: 2.0597, Perplexity: 7.8437
Epoch [2/3], Step [40401/41412], Loss: 2.7762, Perplexity: 16.0587
Epoch [2/3], Step [40402/41412], Loss: 1.8526, Perplexity: 6.3765
Epoch [2/3], Step [40403/41412], Loss: 2.2554, Perplexity: 9.5389
Epoch [2/3], Step [40404/41412], Loss: 2.0854, Perplexity: 8.0476
Epoch [2/3], Step [40405/41412], Loss: 2.4508, Perplexity: 11.5974
Epoch [2/3], Step [40406/41412], Loss: 1.9627, Perplexity: 7.1185
Epoch [2/3], Step [40407/41412], Loss: 2.1033, Perplexity: 8

Epoch [2/3], Step [40971/41412], Loss: 1.6306, Perplexity: 5.1070
Epoch [2/3], Step [40972/41412], Loss: 2.8643, Perplexity: 17.5372
Epoch [2/3], Step [40973/41412], Loss: 2.1620, Perplexity: 8.6886
Epoch [2/3], Step [40974/41412], Loss: 1.9766, Perplexity: 7.2180
Epoch [2/3], Step [40975/41412], Loss: 2.3629, Perplexity: 10.6219
Epoch [2/3], Step [40976/41412], Loss: 2.5255, Perplexity: 12.4972
Epoch [2/3], Step [40977/41412], Loss: 2.1267, Perplexity: 8.3871
Epoch [2/3], Step [40978/41412], Loss: 1.8896, Perplexity: 6.6167
Epoch [2/3], Step [40979/41412], Loss: 2.5246, Perplexity: 12.4864
Epoch [2/3], Step [40980/41412], Loss: 2.2629, Perplexity: 9.6111
Epoch [2/3], Step [40981/41412], Loss: 2.5830, Perplexity: 13.2371
Epoch [2/3], Step [40982/41412], Loss: 2.4454, Perplexity: 11.5354
Epoch [2/3], Step [40983/41412], Loss: 2.1344, Perplexity: 8.4523
Epoch [2/3], Step [40984/41412], Loss: 2.1433, Perplexity: 8.5279
Epoch [2/3], Step [40985/41412], Loss: 2.8955, Perplexit

Epoch [3/3], Step [110/41412], Loss: 2.7993, Perplexity: 16.4325
Epoch [3/3], Step [111/41412], Loss: 1.8633, Perplexity: 6.4449
Epoch [3/3], Step [112/41412], Loss: 2.1820, Perplexity: 8.8636
Epoch [3/3], Step [113/41412], Loss: 1.6137, Perplexity: 5.0216
Epoch [3/3], Step [114/41412], Loss: 1.5787, Perplexity: 4.8484
Epoch [3/3], Step [115/41412], Loss: 2.5824, Perplexity: 13.2291
Epoch [3/3], Step [116/41412], Loss: 2.4466, Perplexity: 11.5490
Epoch [3/3], Step [117/41412], Loss: 2.0907, Perplexity: 8.0904
Epoch [3/3], Step [118/41412], Loss: 2.8811, Perplexity: 17.8341
Epoch [3/3], Step [119/41412], Loss: 2.4671, Perplexity: 11.7884
Epoch [3/3], Step [120/41412], Loss: 2.0015, Perplexity: 7.4004
Epoch [3/3], Step [121/41412], Loss: 2.2350, Perplexity: 9.3466
Epoch [3/3], Step [122/41412], Loss: 2.3612, Perplexity: 10.6038
Epoch [3/3], Step [123/41412], Loss: 2.0890, Perplexity: 8.0769
Epoch [3/3], Step [124/41412], Loss: 2.1179, Perplexity: 8.3139

Epoch [3/3], Step [821/41412], Loss: 1.9155, Perplexity: 6.7901
Epoch [3/3], Step [822/41412], Loss: 2.0653, Perplexity: 7.8880
Epoch [3/3], Step [823/41412], Loss: 2.2658, Perplexity: 9.6388
Epoch [3/3], Step [824/41412], Loss: 3.0479, Perplexity: 21.0715
Epoch [3/3], Step [825/41412], Loss: 2.3577, Perplexity: 10.5661
Epoch [3/3], Step [826/41412], Loss: 2.4108, Perplexity: 11.1430
Epoch [3/3], Step [827/41412], Loss: 1.9606, Perplexity: 7.1035
Epoch [3/3], Step [828/41412], Loss: 1.6357, Perplexity: 5.1332
Epoch [3/3], Step [829/41412], Loss: 1.9118, Perplexity: 6.7651
Epoch [3/3], Step [830/41412], Loss: 2.1683, Perplexity: 8.7430
Epoch [3/3], Step [831/41412], Loss: 2.1338, Perplexity: 8.4470
Epoch [3/3], Step [832/41412], Loss: 1.8239, Perplexity: 6.1960
Epoch [3/3], Step [833/41412], Loss: 2.3764, Perplexity: 10.7659
Epoch [3/3], Step [834/41412], Loss: 1.8616, Perplexity: 6.4339
Epoch [3/3], Step [835/41412], Loss: 1.8965, Perplexity: 6.6627

Epoch [3/3], Step [1376/41412], Loss: 2.0954, Perplexity: 8.1288
Epoch [3/3], Step [1377/41412], Loss: 2.1679, Perplexity: 8.7395
Epoch [3/3], Step [1378/41412], Loss: 1.6233, Perplexity: 5.0696
Epoch [3/3], Step [1379/41412], Loss: 2.1775, Perplexity: 8.8243
Epoch [3/3], Step [1380/41412], Loss: 1.8290, Perplexity: 6.2274
Epoch [3/3], Step [1381/41412], Loss: 1.9645, Perplexity: 7.1316
Epoch [3/3], Step [1382/41412], Loss: 2.2695, Perplexity: 9.6745
Epoch [3/3], Step [1383/41412], Loss: 2.2672, Perplexity: 9.6525
Epoch [3/3], Step [1384/41412], Loss: 2.2828, Perplexity: 9.8039
Epoch [3/3], Step [1385/41412], Loss: 2.4914, Perplexity: 12.0782
Epoch [3/3], Step [1386/41412], Loss: 2.4484, Perplexity: 11.5697
Epoch [3/3], Step [1387/41412], Loss: 3.0128, Perplexity: 20.3436
Epoch [3/3], Step [1388/41412], Loss: 1.8212, Perplexity: 6.1791
Epoch [3/3], Step [1389/41412], Loss: 2.6330, Perplexity: 13.9153
Epoch [3/3], Step [1390/41412], Loss: 1.8554, Perplexity: 6.3944

Epoch [3/3], Step [1862/41412], Loss: 2.0077, Perplexity: 7.4459
Epoch [3/3], Step [1863/41412], Loss: 1.8872, Perplexity: 6.6010
Epoch [3/3], Step [1864/41412], Loss: 2.1407, Perplexity: 8.5052
Epoch [3/3], Step [1865/41412], Loss: 2.3214, Perplexity: 10.1899
Epoch [3/3], Step [1866/41412], Loss: 3.6390, Perplexity: 38.0526
Epoch [3/3], Step [1867/41412], Loss: 2.5973, Perplexity: 13.4281
Epoch [3/3], Step [1868/41412], Loss: 2.8090, Perplexity: 16.5935
Epoch [3/3], Step [1869/41412], Loss: 1.8255, Perplexity: 6.2057
Epoch [3/3], Step [1870/41412], Loss: 1.5127, Perplexity: 4.5391
Epoch [3/3], Step [1871/41412], Loss: 2.0737, Perplexity: 7.9541
Epoch [3/3], Step [1872/41412], Loss: 1.8569, Perplexity: 6.4040
Epoch [3/3], Step [1873/41412], Loss: 2.3808, Perplexity: 10.8140
Epoch [3/3], Step [1874/41412], Loss: 2.4752, Perplexity: 11.8839
Epoch [3/3], Step [1875/41412], Loss: 2.7017, Perplexity: 14.9057
Epoch [3/3], Step [1876/41412], Loss: 2.1878, Perplexity: 8.9160

Epoch [3/3], Step [2409/41412], Loss: 2.2642, Perplexity: 9.6236
Epoch [3/3], Step [2410/41412], Loss: 2.9352, Perplexity: 18.8257
Epoch [3/3], Step [2411/41412], Loss: 1.8768, Perplexity: 6.5324
Epoch [3/3], Step [2412/41412], Loss: 2.9416, Perplexity: 18.9463
Epoch [3/3], Step [2413/41412], Loss: 1.7815, Perplexity: 5.9388
Epoch [3/3], Step [2414/41412], Loss: 2.2111, Perplexity: 9.1254
Epoch [3/3], Step [2415/41412], Loss: 1.9694, Perplexity: 7.1665
Epoch [3/3], Step [2416/41412], Loss: 2.0262, Perplexity: 7.5854
Epoch [3/3], Step [2417/41412], Loss: 2.7087, Perplexity: 15.0100
Epoch [3/3], Step [2418/41412], Loss: 2.1320, Perplexity: 8.4317
Epoch [3/3], Step [2419/41412], Loss: 1.8636, Perplexity: 6.4471
Epoch [3/3], Step [2420/41412], Loss: 1.9582, Perplexity: 7.0868
Epoch [3/3], Step [2421/41412], Loss: 1.8066, Perplexity: 6.0896
Epoch [3/3], Step [2422/41412], Loss: 1.8749, Perplexity: 6.5200
Epoch [3/3], Step [2423/41412], Loss: 2.0061, Perplexity: 7.4346

Epoch [3/3], Step [3017/41412], Loss: 1.5862, Perplexity: 4.8852
Epoch [3/3], Step [3018/41412], Loss: 2.5161, Perplexity: 12.3800
Epoch [3/3], Step [3019/41412], Loss: 2.2117, Perplexity: 9.1316
Epoch [3/3], Step [3020/41412], Loss: 2.1545, Perplexity: 8.6234
Epoch [3/3], Step [3021/41412], Loss: 2.2016, Perplexity: 9.0398
Epoch [3/3], Step [3022/41412], Loss: 2.0718, Perplexity: 7.9389
Epoch [3/3], Step [3023/41412], Loss: 2.2233, Perplexity: 9.2377
Epoch [3/3], Step [3024/41412], Loss: 2.3611, Perplexity: 10.6028
Epoch [3/3], Step [3025/41412], Loss: 1.9741, Perplexity: 7.2001
Epoch [3/3], Step [3026/41412], Loss: 2.3370, Perplexity: 10.3501
Epoch [3/3], Step [3027/41412], Loss: 2.4818, Perplexity: 11.9629
Epoch [3/3], Step [3028/41412], Loss: 2.2658, Perplexity: 9.6393
Epoch [3/3], Step [3029/41412], Loss: 2.0890, Perplexity: 8.0768
Epoch [3/3], Step [3030/41412], Loss: 1.7836, Perplexity: 5.9512
Epoch [3/3], Step [3031/41412], Loss: 2.1636, Perplexity: 8.7024

<a id='step3'></a>
## Step 3: (Optional) Validate your Model

One way to check for overfitting is to evaluate performance on a validation set.  If you decide to do this **optional** task, you must first complete all of the steps in the next notebook in the sequence (**3_Inference.ipynb**); as part of that notebook, you will write and test code (specifically, the `sample` method in the `DecoderRNN` class) that uses your RNN decoder to generate captions.  That code will prove very useful here.

If you decide to validate your model, please do not edit the data loader in **data_loader.py**.  Instead, create a new file named **data_loader_val.py** containing the code for obtaining the data loader for the validation data.  You can access:
- the validation images at filepath `'/opt/cocoapi/images/val2014/'`, and
- the validation image caption annotation file at filepath `'/opt/cocoapi/annotations/captions_val2014.json'`.

The suggested approach to validating your model involves creating a JSON file such as [this one](https://github.com/cocodataset/cocoapi/blob/master/results/captions_val2014_fakecap_results.json) containing your model's predicted captions for the validation images.  Then, you can write your own script or use one that you [find online](https://github.com/tylin/coco-caption) to calculate the BLEU score of your model.  You can read more about the BLEU score, along with other evaluation metrics (such as METEOR and CIDEr), in section 4.1 of [this paper](https://arxiv.org/pdf/1411.4555.pdf).  For more information about how to use the annotation file, check out the [website](http://cocodataset.org/#download) for the COCO dataset.
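As a rough illustration of the results file described above, the sketch below writes predicted captions to a JSON file as a list of entries, each with an `image_id` and a `caption`, which is the layout used by the linked example file.  The helper names (`build_results`, `save_results`), the output filename, and the image IDs and captions are all hypothetical placeholders; in practice the captions would come from your own `sample` method applied to the validation images.

```python
import json
import os
import tempfile


def build_results(predictions):
    """Convert a {image_id: caption} dict into a list of result entries.

    Each entry holds an integer `image_id` and a `caption` string,
    mirroring the layout of the linked example results file.
    """
    return [
        {"image_id": int(image_id), "caption": caption.strip()}
        for image_id, caption in sorted(predictions.items())
    ]


def save_results(predictions, path):
    """Write the predicted captions to `path` as JSON and return the entries."""
    results = build_results(predictions)
    with open(path, "w") as f:
        json.dump(results, f)
    return results


# Placeholder predictions: these IDs and captions are made up for illustration
# and are not real model output.
example_predictions = {
    522418: "a woman cutting a cake with a knife ",
    391895: "a man riding a motorcycle on a dirt road",
}

# Hypothetical output location; choose any path that suits your workspace.
results_path = os.path.join(
    tempfile.gettempdir(), "captions_val2014_mymodel_results.json"
)
saved = save_results(example_predictions, results_path)
```

A file written this way can then be passed to whichever scoring script you choose, alongside the validation annotation file.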

In [None]:
# (Optional) TODO: Validate your model.