Fine-tuning the best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the newly extracted sentences from the book **Grammaire de Wolof Moderne**, without considering the definitions. After a hyperparameter search with `wandb`, we obtained a best BLEU score of **4.281** for the French-to-Wolof translation model. Below are the main evaluation figures obtained from the hyperparameter search step. Note that training is evaluated on the validation dataset.

- Parallel coordinates panel:

- Parameter importance chart:
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

We can see in the above chart that the batch size is the most important parameter, with a negative correlation with the BLEU score (meaning that a smaller batch size is better). The probability of modifying a character in the French corpus is also important: a higher probability provides a better BLEU score.

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed, AdamW, get_linear_schedule_with_warmup, T5ForConditionalGeneration,\
    get_cosine_schedule_with_warmup, Adafactor
from wolof_translate.utils.sent_transformers import TransformerSequences
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v2 import SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

os.environ["WANDB_DISABLED"] = "true"



## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):

  # Create augmentation to add on French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Load the training dataset
  train_dataset_aug = SentenceDataset("data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Load the validation dataset
  valid_dataset = SentenceDataset("data/extractions/new_data/valid_set.csv",
                                        tokenizer,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset
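
The effect of the two augmentation probabilities can be pictured with a small, self-contained sketch. It is not `nlpaug`'s implementation: the neighbour map `NEIGHBOURS` and the function `noisy_sentence` are hypothetical, and each word and character is drawn independently here, which only approximates how `KeyboardAug` interprets `aug_word_p` and `aug_char_p`.

```python
import random

# Hypothetical, tiny keyboard-neighbour map used only for illustration;
# nlpaug ships its own keyboard models.
NEIGHBOURS = {"a": "zq", "e": "zr", "i": "uo", "o": "ip", "u": "yi", "s": "qd", "n": "b"}


def noisy_sentence(sentence, char_p, word_p, rng):
    """Replace some characters by a keyboard neighbour.

    A word is selected with probability `word_p`; inside a selected word,
    each character that has neighbours is replaced with probability `char_p`.
    """
    words = []
    for word in sentence.split(" "):
        if rng.random() < word_p:
            word = "".join(
                rng.choice(NEIGHBOURS[c]) if c in NEIGHBOURS and rng.random() < char_p else c
                for c in word
            )
        words.append(word)
    return " ".join(words)


rng = random.Random(0)
# A high character probability with a low word probability, as in the tuned configuration
print(noisy_sentence("nous allons au marché ce matin", char_p=0.57, word_p=0.03, rng=rng))
# Forcing both probabilities to 1 makes the effect visible on every word
print(noisy_sentence("nous allons au marché ce matin", char_p=1.0, word_p=1.0, rng=rng))
```

With a word probability as low as the tuned one, most sentences pass through unchanged, so the noise acts as a light regularizer rather than a heavy corruption of the French input.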

### Configure the model and the evaluation function ⚙️

Let us evaluate the predictions with the `bleu` metric.

In [4]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):
        """Decode predictions and labels, then compute BLEU and the mean generated length."""
        preds, labels = eval_preds

        # Some models return a tuple; the generated token ids come first
        if isinstance(preds, tuple):
            preds = preds[0]

        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # Replace the ignored label positions (-100) with the pad token id so they can be decoded
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        # sacrebleu expects a list of references for each prediction
        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        result = {"bleu": result["score"]}

        # Mean number of non-padding tokens per prediction
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        result["gen_len"] = np.mean(prediction_lens)

        result = {k: round(v, 4) for k, v in result.items()}

        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py


Let us initialize the evaluation object.

In [5]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)
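
Two steps of `compute_metrics` are easy to misread: the `-100` label positions are turned back into padding before decoding, and `gen_len` counts only non-padding tokens. The sketch below reproduces both steps with plain Python lists instead of NumPy arrays. The pad id `0` and the token ids are made-up values for illustration, and the helper names are hypothetical.

```python
IGNORE_INDEX = -100  # value used to mask label positions in the loss
PAD_ID = 0           # assumed pad token id, for illustration only


def restore_padding(labels, pad_id=PAD_ID):
    """Replace every ignored label position by the pad token id."""
    return [[tok if tok != IGNORE_INDEX else pad_id for tok in seq] for seq in labels]


def mean_generated_length(preds, pad_id=PAD_ID):
    """Average number of non-padding tokens per predicted sequence."""
    lengths = [sum(1 for tok in seq if tok != pad_id) for seq in preds]
    return sum(lengths) / len(lengths)


# Made-up token ids: two label sequences and two generated sequences
labels = [[12, 7, 1, -100, -100], [5, 1, -100, -100, -100]]
preds = [[0, 12, 7, 1, 0], [0, 5, 1, 0, 0]]

print(restore_padding(labels))
print(mean_generated_length(preds))
```

Because padding is excluded from the count, `gen_len` reflects how many tokens the model actually generated, which is why it can drop sharply when the model collapses to near-empty outputs.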


### Searching for the best parameters 🕖

In [6]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data



In [7]:
# let us initialize the hyperparameter configuration
config = {
    'random_state': 0,
    'fr_char_p': 0.5682074095811468,
    'fr_word_p': 0.026223984144846748,
    'learning_rate': 0.008879286602093731,
    'weight_decay': 0.5215535876633939,
    'batch_size': 32,
    'warmup_ratio': 0.0,
    'max_epoch': 956,
    'bleu': 4.881,
    'model_dir': 'data/checkpoints/fw_t5_small_custom_train_v3_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_small_custom_train_results_fw_v3/'
}

# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 1, evaluation = evaluation)

# split the data
split_data(config['random_state'])

# load the training and validation sets (the validation set is used as the test set)
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'])

# let us calculate the total number of training steps and the warmup steps from the configuration
length = len(train_dataset)

n_steps = length // config['batch_size']

num_steps = config['max_epoch'] * n_steps

warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

# Initialize the scheduler parameters
scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    'betas': (0.9, 0.98),
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=get_linear_schedule_with_warmup,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/t5_small_custom_train_fw_v3"
                )

# We resume training from checkpoints, so let us load the model
# trainer.load(config['model_dir'], load_best=True) # Only for the first loading
trainer.load(config['new_model_dir'])
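
The learning rates printed in the training log can be checked by hand. The sketch below is a plain-Python rendering of a linear warmup followed by a linear decay, the shape that `get_linear_schedule_with_warmup` applies to the base learning rate. Two values are assumptions rather than facts taken from the code: `n_steps = 40` (the result of `len(train_dataset) // batch_size`, which is not printed anywhere) and 41 optimizer steps per epoch, read from the progress bars. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


base_lr = 0.008879286602093731  # config['learning_rate']
max_epoch = 956                 # config['max_epoch']
warmup_ratio = 0.0              # config['warmup_ratio']
n_steps = 40                    # assumed value of len(train_dataset) // config['batch_size']
batches_per_epoch = 41          # assumed from the "41/41" progress bars in the log

num_training_steps = max_epoch * n_steps
num_warmup_steps = int(num_training_steps * warmup_ratio)

# Four epochs have been completed when the log reports "epoch 5", five for "epoch 6"
lr_epoch_5 = linear_lr(4 * batches_per_epoch, base_lr, num_warmup_steps, num_training_steps)
lr_epoch_6 = linear_lr(5 * batches_per_epoch, base_lr, num_warmup_steps, num_training_steps)

print(lr_epoch_5, lr_epoch_6)
```

Under these assumptions the two values should sit very close to the logged `0.008841205979637053` and `0.008831685824022883`. The sketch also shows a side effect of the integer division: `num_training_steps` counts 40 steps per epoch while 41 are actually taken, so the decay would reach zero slightly before epoch 956.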

        

In [12]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 5: {Learning rate: [0.008841205979637053]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.35802058911905055, 'test_loss': 0.3686817422509193, 'bleu': 4.1962, 'gen_len': 8.7192}




  0%|          | 1/952 [00:17<4:40:17, 17.68s/it]

For epoch 6: {Learning rate: [0.008831685824022883]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.28919440507888794, 'test_loss': 0.3806706488132477, 'bleu': 7.5427, 'gen_len': 7.7397}




  0%|          | 2/952 [00:32<4:16:48, 16.22s/it]

For epoch 7: {Learning rate: [0.008822165668408714]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.2436344274660436, 'test_loss': 0.4040205731987953, 'bleu': 6.7306, 'gen_len': 8.1438}




  0%|          | 3/952 [00:47<4:03:31, 15.40s/it]

For epoch 8: {Learning rate: [0.008812645512794544]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.21992226998980452, 'test_loss': 0.41580230444669725, 'bleu': 6.0713, 'gen_len': 7.7877}




  0%|          | 4/952 [01:01<3:55:59, 14.94s/it]

For epoch 9: {Learning rate: [0.008803125357180374]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.20159636910368756, 'test_loss': 0.42675782591104505, 'bleu': 7.617, 'gen_len': 6.9932}




  1%|          | 5/952 [01:17<3:58:54, 15.14s/it]

For epoch 10: {Learning rate: [0.008793605201566204]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.1920666531091783, 'test_loss': 0.40567089766263964, 'bleu': 10.5949, 'gen_len': 7.0822}




  1%|          | 6/952 [01:32<4:00:16, 15.24s/it]

For epoch 11: {Learning rate: [0.008784085045952036]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.18594242414323295, 'test_loss': 0.42565407752990725, 'bleu': 11.3044, 'gen_len': 6.9795}




  1%|          | 7/952 [01:51<4:20:29, 16.54s/it]

For epoch 12: {Learning rate: [0.008774564890337866]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.17865310827406441, 'test_loss': 0.4133643306791782, 'bleu': 13.3263, 'gen_len': 6.7466}




  1%|          | 8/952 [02:06<4:13:49, 16.13s/it]

For epoch 13: {Learning rate: [0.008765044734723696]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.50batches/s]



Metrics: {'train_loss': 0.17306013496183767, 'test_loss': 0.41047521233558654, 'bleu': 11.7723, 'gen_len': 7.3973}




  1%|          | 9/952 [02:21<4:05:45, 15.64s/it]

For epoch 14: {Learning rate: [0.008755524579109527]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.17596057166413562, 'test_loss': 0.42439097017049787, 'bleu': 11.0866, 'gen_len': 6.4863}




  1%|          | 10/952 [02:36<4:01:05, 15.36s/it]

For epoch 15: {Learning rate: [0.008746004423495357]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.17779543923168648, 'test_loss': 0.4109851598739624, 'bleu': 9.8598, 'gen_len': 7.7534}




  1%|          | 11/952 [02:50<3:56:19, 15.07s/it]

For epoch 16: {Learning rate: [0.008736484267881187]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.17430340680407314, 'test_loss': 0.41284233145415783, 'bleu': 13.2382, 'gen_len': 6.3836}




  1%|▏         | 12/952 [03:04<3:52:46, 14.86s/it]

For epoch 17: {Learning rate: [0.008726964112267019]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.17583428868433323, 'test_loss': 0.4215084470808506, 'bleu': 8.6631, 'gen_len': 6.8836}




  1%|▏         | 13/952 [03:19<3:51:08, 14.77s/it]

For epoch 18: {Learning rate: [0.008717443956652849]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.184687766905238, 'test_loss': 0.4222396418452263, 'bleu': 8.5616, 'gen_len': 6.8014}




  1%|▏         | 14/952 [03:34<3:50:13, 14.73s/it]

For epoch 19: {Learning rate: [0.00870792380103868]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.19087315259910212, 'test_loss': 0.41704543232917785, 'bleu': 11.3287, 'gen_len': 7.4726}




  2%|▏         | 15/952 [03:49<3:50:27, 14.76s/it]

For epoch 20: {Learning rate: [0.00869840364542451]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.17batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.19470275284313573, 'test_loss': 0.40862575098872184, 'bleu': 11.6164, 'gen_len': 7.0616}




  2%|▏         | 16/952 [04:06<4:04:11, 15.65s/it]

For epoch 21: {Learning rate: [0.00868888348981034]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.20314740734856304, 'test_loss': 0.41230800077319146, 'bleu': 9.7719, 'gen_len': 6.8904}




  2%|▏         | 17/952 [04:21<4:00:33, 15.44s/it]

For epoch 22: {Learning rate: [0.00867936333419617]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.2159883579829844, 'test_loss': 0.4172617197036743, 'bleu': 9.3875, 'gen_len': 7.5205}




  2%|▏         | 18/952 [04:36<3:56:52, 15.22s/it]

For epoch 23: {Learning rate: [0.008669843178582002]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.21931946350307, 'test_loss': 0.41800147145986555, 'bleu': 9.1857, 'gen_len': 6.4658}




  2%|▏         | 19/952 [04:51<3:54:35, 15.09s/it]

For epoch 24: {Learning rate: [0.008660323022967832]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.11batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.22831263774778784, 'test_loss': 0.4428780511021614, 'bleu': 7.1672, 'gen_len': 9.6781}




  2%|▏         | 20/952 [05:08<4:04:44, 15.76s/it]

For epoch 25: {Learning rate: [0.008650802867353662]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.2388188330138602, 'test_loss': 0.43363429307937623, 'bleu': 8.8266, 'gen_len': 7.1233}




  2%|▏         | 21/952 [05:23<3:59:23, 15.43s/it]

For epoch 26: {Learning rate: [0.008641282711739492]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.44batches/s]



Metrics: {'train_loss': 0.25525691669161726, 'test_loss': 0.4323226109147072, 'bleu': 6.3456, 'gen_len': 7.4658}




  2%|▏         | 22/952 [05:37<3:56:22, 15.25s/it]

For epoch 27: {Learning rate: [0.008631762556125322]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.25256935997707086, 'test_loss': 0.4459503024816513, 'bleu': 8.7293, 'gen_len': 6.137}




  2%|▏         | 23/952 [05:52<3:53:32, 15.08s/it]

For epoch 28: {Learning rate: [0.008622242400511152]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.96batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.26125882202532236, 'test_loss': 0.42832369208335874, 'bleu': 9.1516, 'gen_len': 7.3288}




  3%|▎         | 24/952 [06:08<3:56:50, 15.31s/it]

For epoch 29: {Learning rate: [0.008612722244896984]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.40batches/s]



Metrics: {'train_loss': 0.26621164417848353, 'test_loss': 0.4338162764906883, 'bleu': 8.943, 'gen_len': 6.9247}




  3%|▎         | 25/952 [06:24<3:57:34, 15.38s/it]

For epoch 30: {Learning rate: [0.008603202089282815]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.90batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.27344151495433433, 'test_loss': 0.4437884375452995, 'bleu': 6.66, 'gen_len': 6.5822}




  3%|▎         | 26/952 [06:40<4:01:41, 15.66s/it]

For epoch 31: {Learning rate: [0.008593681933668645]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.05batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.29264452253899925, 'test_loss': 0.44183192402124405, 'bleu': 4.1841, 'gen_len': 6.7945}




  3%|▎         | 27/952 [06:58<4:13:51, 16.47s/it]

For epoch 32: {Learning rate: [0.008584161778054475]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.28221210582954126, 'test_loss': 0.4498401716351509, 'bleu': 5.9211, 'gen_len': 7.6164}




  3%|▎         | 28/952 [07:13<4:04:34, 15.88s/it]

For epoch 33: {Learning rate: [0.008574641622440305]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.18batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.2790471340824918, 'test_loss': 0.44453089237213134, 'bleu': 6.9538, 'gen_len': 7.4589}




  3%|▎         | 29/952 [07:28<4:01:22, 15.69s/it]

For epoch 34: {Learning rate: [0.008565121466826135]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.08batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.41batches/s]



Metrics: {'train_loss': 0.28059591461972494, 'test_loss': 0.4600356489419937, 'bleu': 8.0032, 'gen_len': 5.9932}




  3%|▎         | 30/952 [07:44<4:01:15, 15.70s/it]

For epoch 35: {Learning rate: [0.008555601311211967]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.01batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.49216699018710997, 'test_loss': 0.5016797333955765, 'bleu': 2.2042, 'gen_len': 5.6096}




  3%|▎         | 31/952 [07:59<4:00:55, 15.70s/it]

For epoch 36: {Learning rate: [0.008546081155597797]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.08batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.3680733849362629, 'test_loss': 0.4645692154765129, 'bleu': 3.357, 'gen_len': 7.4658}




  3%|▎         | 32/952 [08:15<4:01:28, 15.75s/it]

For epoch 37: {Learning rate: [0.008536560999983627]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.3180810740081275, 'test_loss': 0.4569860026240349, 'bleu': 6.5062, 'gen_len': 5.6918}




  3%|▎         | 33/952 [08:32<4:04:23, 15.96s/it]

For epoch 38: {Learning rate: [0.008527040844369458]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.96batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.02batches/s]



Metrics: {'train_loss': 0.31136413936207935, 'test_loss': 0.4486724302172661, 'bleu': 2.8871, 'gen_len': 7.3973}




  4%|▎         | 34/952 [08:49<4:09:05, 16.28s/it]

For epoch 39: {Learning rate: [0.00851752068875529]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.04batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.31122560435678903, 'test_loss': 0.4365707740187645, 'bleu': 6.7004, 'gen_len': 7.2466}




  4%|▎         | 35/952 [09:05<4:06:36, 16.14s/it]

For epoch 40: {Learning rate: [0.008508000533141118]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.97batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.31207493710808637, 'test_loss': 0.45778545290231704, 'bleu': 5.9434, 'gen_len': 7.1233}




  4%|▍         | 36/952 [09:21<4:05:31, 16.08s/it]

For epoch 41: {Learning rate: [0.00849848037752695]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.3053440760548522, 'test_loss': 0.43850625306367874, 'bleu': 4.1741, 'gen_len': 6.8151}




  4%|▍         | 37/952 [09:38<4:10:39, 16.44s/it]

For epoch 42: {Learning rate: [0.00848896022191278]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.79batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.30370842647261737, 'test_loss': 0.4380831211805344, 'bleu': 5.4256, 'gen_len': 6.2192}




  4%|▍         | 38/952 [09:54<4:09:54, 16.40s/it]

For epoch 43: {Learning rate: [0.00847944006629861]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.29843987124722177, 'test_loss': 0.4483352601528168, 'bleu': 3.2464, 'gen_len': 6.4178}




  4%|▍         | 39/952 [10:11<4:10:35, 16.47s/it]

For epoch 44: {Learning rate: [0.00846991991068444]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.3029792123451466, 'test_loss': 0.4409001588821411, 'bleu': 7.7521, 'gen_len': 6.7808}




  4%|▍         | 40/952 [10:29<4:20:25, 17.13s/it]

For epoch 45: {Learning rate: [0.008460399755070272]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.12batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.29842147594544943, 'test_loss': 0.4377399802207947, 'bleu': 5.1994, 'gen_len': 7.2466}




  4%|▍         | 41/952 [10:45<4:12:38, 16.64s/it]

For epoch 46: {Learning rate: [0.0084508795994561]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.03batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.2893065399513012, 'test_loss': 0.4458030372858047, 'bleu': 9.469, 'gen_len': 6.4589}




  4%|▍         | 42/952 [11:01<4:08:26, 16.38s/it]

For epoch 47: {Learning rate: [0.008441359443841933]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.91batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.3032946204993783, 'test_loss': 0.4447393253445625, 'bleu': 6.8499, 'gen_len': 6.7329}




  5%|▍         | 43/952 [11:17<4:07:32, 16.34s/it]

For epoch 48: {Learning rate: [0.008431839288227763]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.29549240375437386, 'test_loss': 0.4347097247838974, 'bleu': 8.4187, 'gen_len': 6.3151}




  5%|▍         | 44/952 [11:34<4:10:15, 16.54s/it]

For epoch 49: {Learning rate: [0.008422319132613593]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.03batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.36batches/s]



Metrics: {'train_loss': 0.28116262286174587, 'test_loss': 0.42807875126600264, 'bleu': 7.9467, 'gen_len': 7.4247}




  5%|▍         | 45/952 [11:50<4:07:48, 16.39s/it]

For epoch 50: {Learning rate: [0.008412798976999423]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.28275697878221184, 'test_loss': 0.4491478353738785, 'bleu': 5.7914, 'gen_len': 6.5205}




  5%|▍         | 46/952 [12:06<4:06:06, 16.30s/it]

For epoch 51: {Learning rate: [0.008403278821385255]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.95batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.46batches/s]



Metrics: {'train_loss': 0.5358445855175576, 'test_loss': 0.7287889212369919, 'bleu': 0.0, 'gen_len': 2.0}




  5%|▍         | 47/952 [12:22<4:04:22, 16.20s/it]

For epoch 52: {Learning rate: [0.008393758665771084]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.95batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.6942096686944729, 'test_loss': 0.6461636990308761, 'bleu': 0.0, 'gen_len': 4.5548}




  5%|▌         | 48/952 [12:38<4:05:12, 16.27s/it]

For epoch 53: {Learning rate: [0.008384238510156915]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.04batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.16batches/s]



Metrics: {'train_loss': 0.6338244531212783, 'test_loss': 0.6130709677934647, 'bleu': 0.2063, 'gen_len': 14.089}




  5%|▌         | 49/952 [12:55<4:05:22, 16.30s/it]

For epoch 54: {Learning rate: [0.008374718354542746]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.87batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.41batches/s]



Metrics: {'train_loss': 0.5914880895033116, 'test_loss': 0.5694733411073685, 'bleu': 1.8397, 'gen_len': 5.6918}




  5%|▌         | 50/952 [13:11<4:05:49, 16.35s/it]

For epoch 55: {Learning rate: [0.008365198198928576]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.33batches/s]



Metrics: {'train_loss': 0.5503924430870428, 'test_loss': 0.5730072259902954, 'bleu': 1.7034, 'gen_len': 5.6918}




  5%|▌         | 51/952 [13:27<4:04:26, 16.28s/it]

For epoch 56: {Learning rate: [0.008355678043314406]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.92batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.31batches/s]



Metrics: {'train_loss': 0.5230824082362943, 'test_loss': 0.5613634079694748, 'bleu': 1.5978, 'gen_len': 6.0137}




  5%|▌         | 52/952 [13:44<4:04:14, 16.28s/it]

For epoch 57: {Learning rate: [0.008346157887700238]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.36batches/s]



Metrics: {'train_loss': 0.504730777042668, 'test_loss': 0.534907491505146, 'bleu': 2.3812, 'gen_len': 6.4178}




  6%|▌         | 53/952 [14:01<4:06:44, 16.47s/it]

For epoch 58: {Learning rate: [0.008336637732086066]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.75batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.4879165786068614, 'test_loss': 0.5437783285975456, 'bleu': 1.9089, 'gen_len': 7.1164}




  6%|▌         | 54/952 [14:17<4:06:58, 16.50s/it]

For epoch 59: {Learning rate: [0.008327117576471898]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.37batches/s]



Metrics: {'train_loss': 0.4770909802215855, 'test_loss': 0.5454996764659882, 'bleu': 0.6317, 'gen_len': 8.0411}




  6%|▌         | 55/952 [14:34<4:08:11, 16.60s/it]

For epoch 60: {Learning rate: [0.008317597420857728]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.28batches/s]



Metrics: {'train_loss': 0.4611286600915397, 'test_loss': 0.5422486335039138, 'bleu': 1.2084, 'gen_len': 7.1918}




  6%|▌         | 56/952 [14:51<4:09:26, 16.70s/it]

For epoch 61: {Learning rate: [0.008308077265243559]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.4515404068842167, 'test_loss': 0.5447551608085632, 'bleu': 1.5805, 'gen_len': 6.0753}




  6%|▌         | 57/952 [15:08<4:11:06, 16.83s/it]

For epoch 62: {Learning rate: [0.008298557109629389]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.15batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.19batches/s]



Metrics: {'train_loss': 0.44307632926033763, 'test_loss': 0.5343262851238251, 'bleu': 1.2142, 'gen_len': 6.1233}




  6%|▌         | 58/952 [15:24<4:06:44, 16.56s/it]

For epoch 63: {Learning rate: [0.00828903695401522]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.43137297906526706, 'test_loss': 0.5288067609071732, 'bleu': 0.939, 'gen_len': 6.5959}




  6%|▌         | 59/952 [15:40<4:05:32, 16.50s/it]

For epoch 64: {Learning rate: [0.00827951679840105]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.40batches/s]



Metrics: {'train_loss': 0.4228745618971383, 'test_loss': 0.5301647454500198, 'bleu': 1.4248, 'gen_len': 9.5068}




  6%|▋         | 60/952 [15:57<4:07:14, 16.63s/it]

For epoch 65: {Learning rate: [0.008269996642786881]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.95batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.26batches/s]



Metrics: {'train_loss': 0.41779336624029206, 'test_loss': 0.5277258694171906, 'bleu': 1.8388, 'gen_len': 6.5616}




  6%|▋         | 61/952 [16:14<4:06:15, 16.58s/it]

For epoch 66: {Learning rate: [0.008260476487172711]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.84batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.46batches/s]



Metrics: {'train_loss': 0.40999256619592994, 'test_loss': 0.5241810262203217, 'bleu': 1.6583, 'gen_len': 6.3288}




  7%|▋         | 62/952 [16:30<4:04:16, 16.47s/it]

For epoch 67: {Learning rate: [0.008250956331558541]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.4033110170829587, 'test_loss': 0.5148045063018799, 'bleu': 2.4093, 'gen_len': 6.3904}




  7%|▋         | 63/952 [16:50<4:19:11, 17.49s/it]

For epoch 68: {Learning rate: [0.008241436175944371]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.39949773506420416, 'test_loss': 0.5316051304340362, 'bleu': 1.7041, 'gen_len': 6.1644}




  7%|▋         | 64/952 [17:05<4:07:13, 16.70s/it]

For epoch 69: {Learning rate: [0.008231916020330203]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.13batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.3946445082745901, 'test_loss': 0.5245103821158409, 'bleu': 2.4221, 'gen_len': 6.1986}




  7%|▋         | 65/952 [17:20<4:01:39, 16.35s/it]

For epoch 70: {Learning rate: [0.008222395864716034]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.38832758112651544, 'test_loss': 0.5116777509450913, 'bleu': 2.2931, 'gen_len': 6.9726}




  7%|▋         | 66/952 [17:37<4:01:16, 16.34s/it]

For epoch 71: {Learning rate: [0.008212875709101864]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.73batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.38689055093904823, 'test_loss': 0.5117860794067383, 'bleu': 0.938, 'gen_len': 6.5616}




  7%|▋         | 67/952 [17:53<4:02:06, 16.41s/it]

For epoch 72: {Learning rate: [0.008203355553487694]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.38136244692453525, 'test_loss': 0.5093629375100136, 'bleu': 2.2837, 'gen_len': 6.1712}




  7%|▋         | 68/952 [18:11<4:08:05, 16.84s/it]

For epoch 73: {Learning rate: [0.008193835397873524]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.05batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.37419409722816654, 'test_loss': 0.5151037320494651, 'bleu': 2.2569, 'gen_len': 6.226}




  7%|▋         | 69/952 [18:27<4:03:57, 16.58s/it]

For epoch 74: {Learning rate: [0.008184315242259354]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.37427569671374994, 'test_loss': 0.5184222757816315, 'bleu': 2.8849, 'gen_len': 6.7534}




  7%|▋         | 70/952 [18:45<4:08:59, 16.94s/it]

For epoch 75: {Learning rate: [0.008174795086645186]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.372225994017066, 'test_loss': 0.49686616361141206, 'bleu': 2.2529, 'gen_len': 5.2808}




  7%|▋         | 71/952 [19:01<4:05:38, 16.73s/it]

For epoch 76: {Learning rate: [0.008165274931031016]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.18batches/s]



Metrics: {'train_loss': 0.36747080669170473, 'test_loss': 0.5073222175240517, 'bleu': 0.9423, 'gen_len': 6.0548}




  8%|▊         | 72/952 [19:19<4:10:20, 17.07s/it]

For epoch 77: {Learning rate: [0.008155754775416846]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  3.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.363677813512523, 'test_loss': 0.5065492331981659, 'bleu': 2.578, 'gen_len': 5.6712}




  8%|▊         | 73/952 [19:35<4:06:10, 16.80s/it]

For epoch 78: {Learning rate: [0.008146234619802677]}


Train batch number 40: 100%|██████████| 41/41 [00:20<00:00,  2.00batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.70s/batches]



Metrics: {'train_loss': 0.3642731345281368, 'test_loss': 0.506260085105896, 'bleu': 1.9456, 'gen_len': 6.4521}




  8%|▊         | 74/952 [20:15<5:48:04, 23.79s/it]

For epoch 79: {Learning rate: [0.008136714464188507]}


Train batch number 40: 100%|██████████| 41/41 [00:34<00:00,  1.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.97s/batches]



Metrics: {'train_loss': 0.357867263439225, 'test_loss': 0.5150900736451149, 'bleu': 2.2111, 'gen_len': 6.2397}




  8%|▊         | 75/952 [21:16<8:31:38, 35.00s/it]

For epoch 80: {Learning rate: [0.008127194308574337]}


Train batch number 40: 100%|██████████| 41/41 [00:34<00:00,  1.20batches/s]
Test batch number 1:  10%|█         | 1/10 [00:02<00:21,  2.35s/batches]
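The per-epoch output above is long, and the BLEU score moves noticeably from one epoch to the next, so it can be easier to read the log programmatically than by eye. The snippet below is a minimal sketch, assuming the log keeps the exact `For epoch N: {Learning rate: [...]}` and `Metrics: {...}` line formats shown above. The helper names `parse_training_log` and `best_epoch` are hypothetical and are not part of `wolof_translate`.

```python
import ast
import re

# Patterns for the two kinds of log lines printed by the training loop above.
EPOCH_RE = re.compile(r"For epoch (\d+): \{Learning rate: \[([0-9.eE+-]+)\]\}")
METRICS_RE = re.compile(r"Metrics: (\{.*\})")


def parse_training_log(text):
    """Return one dict per completed epoch with its learning rate and metrics."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            # A new epoch header; an epoch interrupted before its metrics is dropped.
            current = {"epoch": int(epoch_match.group(1)),
                       "lr": float(epoch_match.group(2))}
            continue
        metrics_match = METRICS_RE.search(line)
        if metrics_match and current is not None:
            # The metrics line is a Python dict literal, so literal_eval parses it safely.
            current.update(ast.literal_eval(metrics_match.group(1)))
            records.append(current)
            current = None
    return records


def best_epoch(records, metric="bleu", maximize=True):
    """Pick the record with the highest (or lowest) value of the given metric."""
    pick = max if maximize else min
    return pick(records, key=lambda record: record[metric])


# Three epochs copied from the output above (progress bars omitted).
sample_log = """
For epoch 61: {Learning rate: [0.008308077265243559]}
Metrics: {'train_loss': 0.4515404068842167, 'test_loss': 0.5447551608085632, 'bleu': 1.5805, 'gen_len': 6.0753}
For epoch 62: {Learning rate: [0.008298557109629389]}
Metrics: {'train_loss': 0.44307632926033763, 'test_loss': 0.5343262851238251, 'bleu': 1.2142, 'gen_len': 6.1233}
For epoch 63: {Learning rate: [0.00828903695401522]}
Metrics: {'train_loss': 0.43137297906526706, 'test_loss': 0.5288067609071732, 'bleu': 0.939, 'gen_len': 6.5959}
"""

records = parse_training_log(sample_log)
best = best_epoch(records)
print(best["epoch"], best["bleu"])
```

Applied to the full output, the same records can also be used to check the learning rate: consecutive logged values differ by about 9.52e-6, which is what a linear decay would produce. Passing `metric="test_loss", maximize=False` to `best_epoch` selects on validation loss instead of BLEU.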


In [8]:
# Resume training for the remaining epochs (max_epoch minus the epochs already completed).
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])



For epoch 80: {Learning rate: [0.008127194308574337]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.01batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.25batches/s]



Metrics: {'train_loss': 0.35787499823221347, 'test_loss': 0.49289117753505707, 'bleu': 2.3578, 'gen_len': 6.3562}




  0%|          | 1/877 [00:15<3:44:01, 15.34s/it]

For epoch 81: {Learning rate: [0.008117674152960169]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.3554441027524995, 'test_loss': 0.5087539911270141, 'bleu': 2.6532, 'gen_len': 5.6849}




  0%|          | 2/877 [00:27<3:15:20, 13.39s/it]

For epoch 82: {Learning rate: [0.008108153997345999]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.3549925626778021, 'test_loss': 0.5062556520104409, 'bleu': 0.8663, 'gen_len': 6.3425}




  0%|          | 3/877 [00:39<3:04:42, 12.68s/it]

For epoch 83: {Learning rate: [0.00809863384173183]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.3522432511899529, 'test_loss': 0.49156564474105835, 'bleu': 2.699, 'gen_len': 6.4315}




  0%|          | 4/877 [00:51<3:02:03, 12.51s/it]

For epoch 84: {Learning rate: [0.00808911368611766]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.34928004407301183, 'test_loss': 0.5020077496767044, 'bleu': 2.0534, 'gen_len': 5.6096}




  1%|          | 5/877 [01:03<2:59:35, 12.36s/it]

For epoch 85: {Learning rate: [0.00807959353050349]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.348417489993863, 'test_loss': 0.505195152759552, 'bleu': 2.2114, 'gen_len': 5.6096}




  1%|          | 6/877 [01:15<2:59:12, 12.35s/it]

For epoch 86: {Learning rate: [0.00807007337488932]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.348244169136373, 'test_loss': 0.4917103052139282, 'bleu': 1.4972, 'gen_len': 5.6301}




  1%|          | 7/877 [01:28<2:59:13, 12.36s/it]

For epoch 87: {Learning rate: [0.008060553219275152]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.25batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.3479044401064152, 'test_loss': 0.5027521952986718, 'bleu': 1.8557, 'gen_len': 7.7397}




  1%|          | 8/877 [01:40<2:59:57, 12.43s/it]

For epoch 88: {Learning rate: [0.008051033063660982]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.34395950741884185, 'test_loss': 0.5065710589289665, 'bleu': 2.061, 'gen_len': 6.5411}




  1%|          | 9/877 [01:54<3:04:43, 12.77s/it]

For epoch 89: {Learning rate: [0.008041512908046812]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.3426916497509654, 'test_loss': 0.5017499804496766, 'bleu': 4.2851, 'gen_len': 6.3836}




  1%|          | 10/877 [02:08<3:10:02, 13.15s/it]

For epoch 90: {Learning rate: [0.008031992752432642]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.34193732825721185, 'test_loss': 0.5095344603061676, 'bleu': 2.0694, 'gen_len': 6.1849}




  1%|▏         | 11/877 [02:23<3:20:39, 13.90s/it]

For epoch 91: {Learning rate: [0.008022472596818472]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.3406717886285084, 'test_loss': 0.5043902441859245, 'bleu': 2.4517, 'gen_len': 7.1918}




  1%|▏         | 12/877 [02:38<3:24:14, 14.17s/it]

For epoch 92: {Learning rate: [0.008012952441204303]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3382835082891511, 'test_loss': 0.4997985288500786, 'bleu': 1.3633, 'gen_len': 7.3151}




  1%|▏         | 13/877 [02:53<3:26:12, 14.32s/it]

For epoch 93: {Learning rate: [0.008003432285590134]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.3409911466807854, 'test_loss': 0.48380786925554276, 'bleu': 2.438, 'gen_len': 6.8082}




  2%|▏         | 14/877 [03:09<3:34:20, 14.90s/it]

For epoch 94: {Learning rate: [0.007993912129975965]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.3401884807319176, 'test_loss': 0.49700492024421694, 'bleu': 3.2717, 'gen_len': 6.1849}




  2%|▏         | 15/877 [03:23<3:29:22, 14.57s/it]

For epoch 95: {Learning rate: [0.007984391974361795]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.3394285876576493, 'test_loss': 0.4975299075245857, 'bleu': 1.6216, 'gen_len': 7.7808}




  2%|▏         | 16/877 [03:37<3:26:57, 14.42s/it]

For epoch 96: {Learning rate: [0.007974871818747625]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.3408144639759529, 'test_loss': 0.5099182769656181, 'bleu': 2.0338, 'gen_len': 6.0548}




  2%|▏         | 17/877 [03:51<3:24:25, 14.26s/it]

For epoch 97: {Learning rate: [0.007965351663133455]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.3391167216184663, 'test_loss': 0.4913698688149452, 'bleu': 2.6685, 'gen_len': 6.2397}




  2%|▏         | 18/877 [04:05<3:24:48, 14.31s/it]

For epoch 98: {Learning rate: [0.007955831507519285]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.33670036894519156, 'test_loss': 0.5003373101353645, 'bleu': 3.5976, 'gen_len': 7.1096}




  2%|▏         | 19/877 [04:19<3:22:26, 14.16s/it]

For epoch 99: {Learning rate: [0.007946311351905117]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.33532233499899144, 'test_loss': 0.49954831004142763, 'bleu': 2.041, 'gen_len': 7.2397}




  2%|▏         | 20/877 [04:33<3:21:55, 14.14s/it]

For epoch 100: {Learning rate: [0.007936791196290947]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.36batches/s]



Metrics: {'train_loss': 0.3369556774453419, 'test_loss': 0.4945245340466499, 'bleu': 1.0653, 'gen_len': 6.1644}




  2%|▏         | 21/877 [04:48<3:23:41, 14.28s/it]

For epoch 101: {Learning rate: [0.007927271040676778]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.33349508773989794, 'test_loss': 0.49945783615112305, 'bleu': 2.2026, 'gen_len': 6.6096}




  3%|▎         | 22/877 [05:02<3:22:36, 14.22s/it]

For epoch 102: {Learning rate: [0.007917750885062608]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.3336068711629728, 'test_loss': 0.4796327739953995, 'bleu': 1.8847, 'gen_len': 6.3493}




  3%|▎         | 23/877 [05:16<3:22:10, 14.20s/it]

For epoch 103: {Learning rate: [0.00790823072944844]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.33487219199901674, 'test_loss': 0.4893687441945076, 'bleu': 2.8982, 'gen_len': 5.9863}




  3%|▎         | 24/877 [05:30<3:21:21, 14.16s/it]

For epoch 104: {Learning rate: [0.007898710573834268]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.3333823462811912, 'test_loss': 0.5028675764799118, 'bleu': 1.6766, 'gen_len': 7.7808}




  3%|▎         | 25/877 [05:44<3:21:21, 14.18s/it]

For epoch 105: {Learning rate: [0.0078891904182201]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.337548215941685, 'test_loss': 0.48851595669984815, 'bleu': 1.6749, 'gen_len': 6.3767}




  3%|▎         | 26/877 [05:59<3:21:09, 14.18s/it]

For epoch 106: {Learning rate: [0.00787967026260593]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.33214027561792514, 'test_loss': 0.5061636984348297, 'bleu': 3.2886, 'gen_len': 6.8699}




  3%|▎         | 27/877 [06:13<3:20:34, 14.16s/it]

For epoch 107: {Learning rate: [0.00787015010699176]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.33134046742101997, 'test_loss': 0.502210122346878, 'bleu': 2.0423, 'gen_len': 5.6027}




  3%|▎         | 28/877 [06:26<3:18:39, 14.04s/it]

For epoch 108: {Learning rate: [0.00786062995137759]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.33543561326294413, 'test_loss': 0.5013447433710099, 'bleu': 1.51, 'gen_len': 7.7123}




  3%|▎         | 29/877 [06:43<3:29:47, 14.84s/it]

For epoch 109: {Learning rate: [0.007851109795763422]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3336426361304958, 'test_loss': 0.5096229806542396, 'bleu': 2.544, 'gen_len': 5.9932}




  3%|▎         | 30/877 [06:57<3:26:19, 14.62s/it]

For epoch 110: {Learning rate: [0.00784158964014925]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.33107241733771997, 'test_loss': 0.49675237089395524, 'bleu': 1.4629, 'gen_len': 10.8699}




  4%|▎         | 31/877 [07:11<3:21:50, 14.32s/it]

For epoch 111: {Learning rate: [0.007832069484535083]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.3302882066587122, 'test_loss': 0.4929403856396675, 'bleu': 2.7165, 'gen_len': 6.226}




  4%|▎         | 32/877 [07:25<3:18:56, 14.13s/it]

For epoch 112: {Learning rate: [0.007822549328920913]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.3324659408592596, 'test_loss': 0.4915787592530251, 'bleu': 3.7418, 'gen_len': 6.5068}




  4%|▍         | 33/877 [07:38<3:17:34, 14.05s/it]

For epoch 113: {Learning rate: [0.007813029173306743]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.3313728425560928, 'test_loss': 0.48799195885658264, 'bleu': 0.8123, 'gen_len': 9.0753}




  4%|▍         | 34/877 [07:52<3:15:54, 13.94s/it]

For epoch 114: {Learning rate: [0.007803509017692573]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.33161129675260403, 'test_loss': 0.4870875790715218, 'bleu': 3.2524, 'gen_len': 6.411}




  4%|▍         | 35/877 [08:06<3:14:04, 13.83s/it]

For epoch 115: {Learning rate: [0.007793988862078404]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.3308856396413431, 'test_loss': 0.493117593228817, 'bleu': 1.914, 'gen_len': 6.6575}




  4%|▍         | 36/877 [08:19<3:13:41, 13.82s/it]

For epoch 116: {Learning rate: [0.0077844687064642345]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.3300340161090944, 'test_loss': 0.4975974515080452, 'bleu': 2.1234, 'gen_len': 6.1096}




  4%|▍         | 37/877 [08:33<3:13:37, 13.83s/it]

For epoch 117: {Learning rate: [0.0077749485508500655]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.3339380872685735, 'test_loss': 0.48489102721214294, 'bleu': 3.0448, 'gen_len': 7.0411}




  4%|▍         | 38/877 [08:47<3:12:49, 13.79s/it]

For epoch 118: {Learning rate: [0.007765428395235896]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.32953515176366016, 'test_loss': 0.4869856685400009, 'bleu': 2.5938, 'gen_len': 6.6301}




  4%|▍         | 39/877 [09:01<3:11:27, 13.71s/it]

For epoch 119: {Learning rate: [0.007755908239621727]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.3256216405368433, 'test_loss': 0.5082556277513504, 'bleu': 2.8012, 'gen_len': 6.6849}




  5%|▍         | 40/877 [09:14<3:11:53, 13.76s/it]

For epoch 120: {Learning rate: [0.007746388084007556]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.3287719052012374, 'test_loss': 0.48201084285974505, 'bleu': 2.0852, 'gen_len': 7.1712}




  5%|▍         | 41/877 [09:28<3:11:54, 13.77s/it]

For epoch 121: {Learning rate: [0.007736867928393387]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.3280044001776998, 'test_loss': 0.4897316634654999, 'bleu': 3.8411, 'gen_len': 6.411}




  5%|▍         | 42/877 [09:42<3:13:46, 13.92s/it]

For epoch 122: {Learning rate: [0.007727347772779217]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.3272501182992284, 'test_loss': 0.48892608284950256, 'bleu': 1.683, 'gen_len': 7.2466}




  5%|▍         | 43/877 [09:56<3:12:26, 13.84s/it]

For epoch 123: {Learning rate: [0.007717827617165048]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.33151819538779376, 'test_loss': 0.493604251742363, 'bleu': 2.5113, 'gen_len': 7.0205}




  5%|▌         | 44/877 [10:10<3:11:27, 13.79s/it]

For epoch 124: {Learning rate: [0.007708307461550878]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.3285785877123112, 'test_loss': 0.48324686139822004, 'bleu': 2.4272, 'gen_len': 8.0274}




  5%|▌         | 45/877 [10:24<3:14:01, 13.99s/it]

For epoch 125: {Learning rate: [0.0076987873059367095]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.3294301556377876, 'test_loss': 0.48126296848058703, 'bleu': 2.149, 'gen_len': 7.4521}




  5%|▌         | 46/877 [10:39<3:15:49, 14.14s/it]

For epoch 126: {Learning rate: [0.007689267150322539]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.3274019909341161, 'test_loss': 0.47087737917900085, 'bleu': 3.5644, 'gen_len': 6.7123}




  5%|▌         | 47/877 [10:53<3:16:36, 14.21s/it]

For epoch 127: {Learning rate: [0.00767974699470837]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.32792964795740637, 'test_loss': 0.46984362304210664, 'bleu': 3.1855, 'gen_len': 6.8836}




  5%|▌         | 48/877 [11:07<3:16:51, 14.25s/it]

For epoch 128: {Learning rate: [0.0076702268390942]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3265478051290279, 'test_loss': 0.4876352190971375, 'bleu': 3.1781, 'gen_len': 7.5342}




  6%|▌         | 49/877 [11:22<3:16:01, 14.20s/it]

For epoch 129: {Learning rate: [0.007660706683480031]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.32896814041021394, 'test_loss': 0.49596709460020066, 'bleu': 2.8269, 'gen_len': 6.5685}




  6%|▌         | 50/877 [11:36<3:14:42, 14.13s/it]

For epoch 130: {Learning rate: [0.007651186527865861]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.56batches/s]



Metrics: {'train_loss': 0.3268613321025197, 'test_loss': 0.4910737738013268, 'bleu': 3.616, 'gen_len': 8.1096}




  6%|▌         | 51/877 [11:50<3:16:15, 14.26s/it]

For epoch 131: {Learning rate: [0.007641666372251692]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.40batches/s]



Metrics: {'train_loss': 0.3271311583315454, 'test_loss': 0.48525514602661135, 'bleu': 2.6205, 'gen_len': 7.6644}




  6%|▌         | 52/877 [12:05<3:17:51, 14.39s/it]

For epoch 132: {Learning rate: [0.0076321462166375215]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.3244384839767363, 'test_loss': 0.48502014130353927, 'bleu': 3.0597, 'gen_len': 6.2123}




  6%|▌         | 53/877 [12:19<3:16:57, 14.34s/it]

For epoch 133: {Learning rate: [0.007622626061023353]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.3270149608937705, 'test_loss': 0.4885032445192337, 'bleu': 2.0554, 'gen_len': 6.2808}




  6%|▌         | 54/877 [12:33<3:16:18, 14.31s/it]

For epoch 134: {Learning rate: [0.007613105905409183]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.3271952227848332, 'test_loss': 0.49276083558797834, 'bleu': 2.0255, 'gen_len': 6.3699}




  6%|▋         | 55/877 [12:48<3:16:06, 14.31s/it]

For epoch 135: {Learning rate: [0.007603585749795014]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.3248094142210193, 'test_loss': 0.4882825344800949, 'bleu': 2.5465, 'gen_len': 6.7808}




  6%|▋         | 56/877 [13:02<3:16:50, 14.39s/it]

For epoch 136: {Learning rate: [0.007594065594180844]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.3251047788596735, 'test_loss': 0.49193472117185594, 'bleu': 1.4064, 'gen_len': 6.2123}




  6%|▋         | 57/877 [13:16<3:15:14, 14.29s/it]

For epoch 137: {Learning rate: [0.007584545438566675]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.32354992773474717, 'test_loss': 0.49008146226406096, 'bleu': 2.5874, 'gen_len': 6.3904}




  7%|▋         | 58/877 [13:30<3:14:17, 14.23s/it]

For epoch 138: {Learning rate: [0.007575025282952504]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.3234863150410536, 'test_loss': 0.4760271191596985, 'bleu': 2.4532, 'gen_len': 7.4521}




  7%|▋         | 59/877 [13:45<3:16:36, 14.42s/it]

For epoch 139: {Learning rate: [0.007565505127338335]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.32445235441370707, 'test_loss': 0.48066545128822324, 'bleu': 2.4191, 'gen_len': 8.0068}




  7%|▋         | 60/877 [14:04<3:34:28, 15.75s/it]

For epoch 140: {Learning rate: [0.0075559849717241655]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.32277035422441436, 'test_loss': 0.47129472270607947, 'bleu': 3.2415, 'gen_len': 8.1301}




  7%|▋         | 61/877 [14:18<3:28:02, 15.30s/it]

For epoch 141: {Learning rate: [0.0075464648161099965]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.32432329472972127, 'test_loss': 0.4824048012495041, 'bleu': 3.9101, 'gen_len': 6.637}




  7%|▋         | 62/877 [14:32<3:23:02, 14.95s/it]

For epoch 142: {Learning rate: [0.007536944660495827]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.3251161466284496, 'test_loss': 0.49630106538534163, 'bleu': 3.1969, 'gen_len': 6.5616}




  7%|▋         | 63/877 [14:47<3:19:38, 14.72s/it]

For epoch 143: {Learning rate: [0.007527424504881658]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.56batches/s]



Metrics: {'train_loss': 0.3247269726381069, 'test_loss': 0.4830759152770042, 'bleu': 5.4569, 'gen_len': 7.0822}




  7%|▋         | 64/877 [15:01<3:18:28, 14.65s/it]

For epoch 144: {Learning rate: [0.007517904349267488]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.32305744944549186, 'test_loss': 0.49839281737804414, 'bleu': 3.2962, 'gen_len': 6.5274}




  7%|▋         | 65/877 [15:16<3:17:39, 14.60s/it]

For epoch 145: {Learning rate: [0.007508384193653318]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.32539259805911924, 'test_loss': 0.4891510888934135, 'bleu': 4.4105, 'gen_len': 6.9658}




  8%|▊         | 66/877 [15:31<3:18:55, 14.72s/it]

For epoch 146: {Learning rate: [0.007498864038039148]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.32212482792575187, 'test_loss': 0.49228391945362093, 'bleu': 3.9316, 'gen_len': 6.6301}




  8%|▊         | 67/877 [15:45<3:18:11, 14.68s/it]

For epoch 147: {Learning rate: [0.007489343882424979]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.3236211109452131, 'test_loss': 0.48593914061784743, 'bleu': 4.4553, 'gen_len': 6.7671}




  8%|▊         | 68/877 [15:59<3:16:17, 14.56s/it]

For epoch 148: {Learning rate: [0.0074798237268108095]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.3205532776873286, 'test_loss': 0.4863647028803825, 'bleu': 3.5532, 'gen_len': 7.0616}




  8%|▊         | 69/877 [16:14<3:15:50, 14.54s/it]

For epoch 149: {Learning rate: [0.0074703035711966405]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.32054988313012006, 'test_loss': 0.493027937412262, 'bleu': 1.4204, 'gen_len': 7.3082}




  8%|▊         | 70/877 [16:28<3:14:34, 14.47s/it]

For epoch 150: {Learning rate: [0.007460783415582471]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.3228107807112903, 'test_loss': 0.5028642952442169, 'bleu': 3.0411, 'gen_len': 7.0342}




  8%|▊         | 71/877 [16:42<3:11:39, 14.27s/it]

For epoch 151: {Learning rate: [0.007451263259968301]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.3233050169741235, 'test_loss': 0.49045671373605726, 'bleu': 3.7034, 'gen_len': 6.5068}




  8%|▊         | 72/877 [16:56<3:10:16, 14.18s/it]

For epoch 152: {Learning rate: [0.007441743104354131]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.32397674750990985, 'test_loss': 0.48834074288606644, 'bleu': 2.206, 'gen_len': 6.6918}




  8%|▊         | 73/877 [17:10<3:08:57, 14.10s/it]

For epoch 153: {Learning rate: [0.007432222948739962]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.3204257266550529, 'test_loss': 0.49089538156986234, 'bleu': 3.3186, 'gen_len': 6.5068}




  8%|▊         | 74/877 [17:24<3:08:02, 14.05s/it]

For epoch 154: {Learning rate: [0.007422702793125792]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.3179085901597651, 'test_loss': 0.49056960344314576, 'bleu': 2.5921, 'gen_len': 6.8767}




  9%|▊         | 75/877 [17:40<3:18:19, 14.84s/it]

For epoch 155: {Learning rate: [0.007413182637511623]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.3219077238222448, 'test_loss': 0.4803501695394516, 'bleu': 3.6992, 'gen_len': 7.7397}




  9%|▊         | 76/877 [17:54<3:14:28, 14.57s/it]

For epoch 156: {Learning rate: [0.0074036624818974535]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.3196770388905595, 'test_loss': 0.4828282862901688, 'bleu': 2.6935, 'gen_len': 6.7329}




  9%|▉         | 77/877 [18:08<3:10:52, 14.32s/it]

For epoch 157: {Learning rate: [0.0073941423262832845]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.46batches/s]



Metrics: {'train_loss': 0.318982959520526, 'test_loss': 0.4777162566781044, 'bleu': 2.5266, 'gen_len': 7.3493}




  9%|▉         | 78/877 [18:23<3:11:18, 14.37s/it]

For epoch 158: {Learning rate: [0.007384622170669114]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.318763083074151, 'test_loss': 0.47760910987854005, 'bleu': 4.3055, 'gen_len': 6.3699}




  9%|▉         | 79/877 [18:37<3:12:17, 14.46s/it]

For epoch 159: {Learning rate: [0.007375102015054945]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.64batches/s]



Metrics: {'train_loss': 0.3179759841139724, 'test_loss': 0.48629462718963623, 'bleu': 3.1079, 'gen_len': 6.9247}




  9%|▉         | 80/877 [18:52<3:13:01, 14.53s/it]

For epoch 160: {Learning rate: [0.007365581859440775]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.31856726109981537, 'test_loss': 0.4834911495447159, 'bleu': 2.1254, 'gen_len': 7.2192}




  9%|▉         | 81/877 [19:06<3:11:07, 14.41s/it]

For epoch 161: {Learning rate: [0.007356061703826606]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.3180409764371267, 'test_loss': 0.47983068972826004, 'bleu': 3.6728, 'gen_len': 7.0685}




  9%|▉         | 82/877 [19:21<3:10:56, 14.41s/it]

For epoch 162: {Learning rate: [0.007346541548212436]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.21batches/s]



Metrics: {'train_loss': 0.3192939721956486, 'test_loss': 0.47556995153427123, 'bleu': 3.5687, 'gen_len': 7.5342}




  9%|▉         | 83/877 [19:36<3:13:59, 14.66s/it]

For epoch 163: {Learning rate: [0.007337021392598267]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.31batches/s]



Metrics: {'train_loss': 0.3191699222093675, 'test_loss': 0.4948083937168121, 'bleu': 2.8346, 'gen_len': 6.7192}




 10%|▉         | 84/877 [19:51<3:15:36, 14.80s/it]

For epoch 164: {Learning rate: [0.0073275012369840966]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.32046187541833737, 'test_loss': 0.4769418254494667, 'bleu': 3.4053, 'gen_len': 7.3151}




 10%|▉         | 85/877 [20:05<3:13:37, 14.67s/it]

For epoch 165: {Learning rate: [0.007317981081369928]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.3184770621904513, 'test_loss': 0.4784339591860771, 'bleu': 4.1268, 'gen_len': 6.9658}




 10%|▉         | 86/877 [20:20<3:11:56, 14.56s/it]

For epoch 166: {Learning rate: [0.007308460925755758]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.3171043683116029, 'test_loss': 0.4816316500306129, 'bleu': 2.1987, 'gen_len': 7.6644}




 10%|▉         | 87/877 [20:34<3:12:39, 14.63s/it]

For epoch 167: {Learning rate: [0.007298940770141589]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.56batches/s]



Metrics: {'train_loss': 0.31844950594553134, 'test_loss': 0.4840386539697647, 'bleu': 2.258, 'gen_len': 7.8493}




 10%|█         | 88/877 [20:49<3:11:33, 14.57s/it]

For epoch 168: {Learning rate: [0.007289420614527419]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.316082015997026, 'test_loss': 0.48719414323568344, 'bleu': 2.4102, 'gen_len': 6.9178}




 10%|█         | 89/877 [21:03<3:10:20, 14.49s/it]

For epoch 169: {Learning rate: [0.00727990045891325]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.3185643648228994, 'test_loss': 0.47536919862031934, 'bleu': 2.2161, 'gen_len': 7.2123}




 10%|█         | 90/877 [21:18<3:10:31, 14.53s/it]

For epoch 170: {Learning rate: [0.007270380303299079]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.31566902777043787, 'test_loss': 0.4890047773718834, 'bleu': 3.0313, 'gen_len': 7.0548}




 10%|█         | 91/877 [21:32<3:10:05, 14.51s/it]

For epoch 171: {Learning rate: [0.00726086014768491]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.31611897014990087, 'test_loss': 0.4893541172146797, 'bleu': 4.049, 'gen_len': 6.6096}




 10%|█         | 92/877 [21:47<3:10:47, 14.58s/it]

For epoch 172: {Learning rate: [0.0072513399920707405]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.3154755205642886, 'test_loss': 0.48127051293849943, 'bleu': 3.3977, 'gen_len': 7.9247}




 11%|█         | 93/877 [22:02<3:10:29, 14.58s/it]

For epoch 173: {Learning rate: [0.007241819836456571]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.31402118831146053, 'test_loss': 0.49641327410936353, 'bleu': 3.2766, 'gen_len': 5.9178}




 11%|█         | 94/877 [22:16<3:10:21, 14.59s/it]

For epoch 174: {Learning rate: [0.007232299680842402]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.31851564529465465, 'test_loss': 0.4719820275902748, 'bleu': 3.4953, 'gen_len': 7.3493}




 11%|█         | 95/877 [22:31<3:10:34, 14.62s/it]

For epoch 175: {Learning rate: [0.007222779525228232]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.31286389217144106, 'test_loss': 0.4805505990982056, 'bleu': 2.0484, 'gen_len': 7.1233}




 11%|█         | 96/877 [22:46<3:11:43, 14.73s/it]

For epoch 176: {Learning rate: [0.007213259369614062]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.31334422528743744, 'test_loss': 0.48372637331485746, 'bleu': 2.8741, 'gen_len': 6.8767}




 11%|█         | 97/877 [23:00<3:11:02, 14.70s/it]

For epoch 177: {Learning rate: [0.007203739213999892]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.3184017008397637, 'test_loss': 0.48085020631551745, 'bleu': 2.7807, 'gen_len': 6.4932}




 11%|█         | 98/877 [23:15<3:09:44, 14.61s/it]

For epoch 178: {Learning rate: [0.007194219058385723]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.3148430287837982, 'test_loss': 0.4865194633603096, 'bleu': 1.6635, 'gen_len': 8.0274}




 11%|█▏        | 99/877 [23:29<3:08:44, 14.56s/it]

For epoch 179: {Learning rate: [0.0071846989027715535]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.31495148525005434, 'test_loss': 0.48485142439603807, 'bleu': 1.8953, 'gen_len': 7.637}




 11%|█▏        | 100/877 [23:44<3:08:51, 14.58s/it]

For epoch 180: {Learning rate: [0.0071751787471573845]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.3165519052162403, 'test_loss': 0.48736423403024676, 'bleu': 2.7843, 'gen_len': 6.8425}




 12%|█▏        | 101/877 [23:58<3:08:00, 14.54s/it]

For epoch 181: {Learning rate: [0.007165658591543215]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.3155698543641625, 'test_loss': 0.48700015246868134, 'bleu': 5.502, 'gen_len': 6.6507}




 12%|█▏        | 102/877 [24:13<3:07:19, 14.50s/it]

For epoch 182: {Learning rate: [0.007156138435929046]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.31316714606634, 'test_loss': 0.49260923117399213, 'bleu': 1.7768, 'gen_len': 7.8836}




 12%|█▏        | 103/877 [24:27<3:07:16, 14.52s/it]

For epoch 183: {Learning rate: [0.007146618280314875]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.3135969878696814, 'test_loss': 0.4740543693304062, 'bleu': 4.5711, 'gen_len': 6.589}




 12%|█▏        | 104/877 [24:42<3:06:31, 14.48s/it]

For epoch 184: {Learning rate: [0.007137098124700706]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.36batches/s]



Metrics: {'train_loss': 0.31415258057233764, 'test_loss': 0.4924491822719574, 'bleu': 3.0086, 'gen_len': 6.9041}




 12%|█▏        | 105/877 [24:56<3:07:25, 14.57s/it]

For epoch 185: {Learning rate: [0.007127577969086536]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.31532478296175237, 'test_loss': 0.48784765750169756, 'bleu': 3.3722, 'gen_len': 7.9041}




 12%|█▏        | 106/877 [25:11<3:07:58, 14.63s/it]

For epoch 186: {Learning rate: [0.007118057813472367]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.34batches/s]



Metrics: {'train_loss': 0.31574251012104315, 'test_loss': 0.4756080970168114, 'bleu': 5.4038, 'gen_len': 6.0753}




 12%|█▏        | 107/877 [25:28<3:14:15, 15.14s/it]

For epoch 187: {Learning rate: [0.0071085376578581974]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.31410058496928794, 'test_loss': 0.48397808969020845, 'bleu': 4.465, 'gen_len': 7.4863}




 12%|█▏        | 108/877 [25:43<3:13:17, 15.08s/it]

For epoch 188: {Learning rate: [0.0070990175022440285]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.3159273295867734, 'test_loss': 0.4862870439887047, 'bleu': 2.502, 'gen_len': 5.8767}




 12%|█▏        | 109/877 [25:58<3:14:30, 15.20s/it]

For epoch 189: {Learning rate: [0.007089497346629858]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.31277814580173025, 'test_loss': 0.4794291153550148, 'bleu': 3.2385, 'gen_len': 7.4589}




 13%|█▎        | 110/877 [26:13<3:12:05, 15.03s/it]

For epoch 190: {Learning rate: [0.007079977191015689]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.3137936468531446, 'test_loss': 0.4756088644266129, 'bleu': 1.5497, 'gen_len': 8.6918}




 13%|█▎        | 111/877 [26:27<3:07:46, 14.71s/it]

For epoch 191: {Learning rate: [0.007070457035401519]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3127948311043949, 'test_loss': 0.4754203960299492, 'bleu': 3.8375, 'gen_len': 6.5}




 13%|█▎        | 112/877 [26:41<3:05:20, 14.54s/it]

For epoch 192: {Learning rate: [0.00706093687978735]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.313435839443672, 'test_loss': 0.48395506143569944, 'bleu': 3.4874, 'gen_len': 7.0274}




 13%|█▎        | 113/877 [26:55<3:02:45, 14.35s/it]

For epoch 193: {Learning rate: [0.00705141672417318]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.3099506188456605, 'test_loss': 0.488277430832386, 'bleu': 2.1894, 'gen_len': 6.3973}




 13%|█▎        | 114/877 [27:09<3:01:54, 14.30s/it]

For epoch 194: {Learning rate: [0.007041896568559011]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.31264426722759153, 'test_loss': 0.48166666775941847, 'bleu': 2.9175, 'gen_len': 6.8973}




 13%|█▎        | 115/877 [27:23<2:59:46, 14.16s/it]

For epoch 195: {Learning rate: [0.0070323764129448406]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.3118757336604886, 'test_loss': 0.48542328774929044, 'bleu': 2.2916, 'gen_len': 6.3836}




 13%|█▎        | 116/877 [27:36<2:58:15, 14.05s/it]

For epoch 196: {Learning rate: [0.007022856257330672]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.3105738958934458, 'test_loss': 0.4815559178590775, 'bleu': 2.2516, 'gen_len': 7.3425}




 13%|█▎        | 117/877 [27:50<2:57:45, 14.03s/it]

For epoch 197: {Learning rate: [0.007013336101716502]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3137879030006688, 'test_loss': 0.4915165439248085, 'bleu': 3.1225, 'gen_len': 6.0479}




 13%|█▎        | 118/877 [28:04<2:57:07, 14.00s/it]

For epoch 198: {Learning rate: [0.007003815946102333]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.309638558364496, 'test_loss': 0.48351292312145233, 'bleu': 3.2709, 'gen_len': 6.9452}




 14%|█▎        | 119/877 [28:18<2:56:32, 13.97s/it]

For epoch 199: {Learning rate: [0.006994295790488163]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.3079828976131067, 'test_loss': 0.5014308959245681, 'bleu': 3.3322, 'gen_len': 6.4589}




 14%|█▎        | 120/877 [28:32<2:56:28, 13.99s/it]

For epoch 200: {Learning rate: [0.006984775634873994]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.30906047326762504, 'test_loss': 0.49781920313835143, 'bleu': 3.4267, 'gen_len': 6.4795}




 14%|█▍        | 121/877 [28:47<2:58:06, 14.14s/it]

For epoch 201: {Learning rate: [0.006975255479259824]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.3090658235113795, 'test_loss': 0.48462740480899813, 'bleu': 2.2048, 'gen_len': 7.1849}




 14%|█▍        | 122/877 [29:01<2:58:55, 14.22s/it]

For epoch 202: {Learning rate: [0.006965735323645654]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.3101598166111039, 'test_loss': 0.4834200620651245, 'bleu': 1.957, 'gen_len': 8.2534}




 14%|█▍        | 123/877 [29:15<2:58:41, 14.22s/it]

For epoch 203: {Learning rate: [0.0069562151680314845]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.3096205094238607, 'test_loss': 0.4908932313323021, 'bleu': 3.4451, 'gen_len': 7.2123}




 14%|█▍        | 124/877 [29:29<2:57:34, 14.15s/it]

For epoch 204: {Learning rate: [0.006946695012417316]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.31348895227036827, 'test_loss': 0.4974935084581375, 'bleu': 2.026, 'gen_len': 6.1575}




 14%|█▍        | 125/877 [29:44<2:57:18, 14.15s/it]

For epoch 205: {Learning rate: [0.006937174856803146]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.3102620403941085, 'test_loss': 0.47950068265199663, 'bleu': 1.8776, 'gen_len': 7.6781}




 14%|█▍        | 126/877 [29:59<3:00:44, 14.44s/it]

For epoch 206: {Learning rate: [0.006927654701188977]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.30788720962477895, 'test_loss': 0.49613128006458285, 'bleu': 2.3177, 'gen_len': 6.589}




 14%|█▍        | 127/877 [30:13<2:58:30, 14.28s/it]

For epoch 207: {Learning rate: [0.006918134545574807]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.30905467635247763, 'test_loss': 0.4815659895539284, 'bleu': 2.2885, 'gen_len': 7.0342}




 15%|█▍        | 128/877 [30:27<2:58:58, 14.34s/it]

For epoch 208: {Learning rate: [0.006908614389960637]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.3074448039618934, 'test_loss': 0.4853266254067421, 'bleu': 2.4109, 'gen_len': 6.5548}




 15%|█▍        | 129/877 [30:41<2:57:12, 14.21s/it]

For epoch 209: {Learning rate: [0.006899094234346467]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.3090593727623544, 'test_loss': 0.48240077346563337, 'bleu': 2.4357, 'gen_len': 7.9726}




 15%|█▍        | 130/877 [30:55<2:55:15, 14.08s/it]

For epoch 210: {Learning rate: [0.006889574078732298]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.3107216409066828, 'test_loss': 0.5002588570117951, 'bleu': 5.6069, 'gen_len': 6.9863}




 15%|█▍        | 131/877 [31:08<2:53:31, 13.96s/it]

For epoch 211: {Learning rate: [0.0068800539231181285]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.3106380868248823, 'test_loss': 0.48651739209890366, 'bleu': 2.8687, 'gen_len': 7.1712}




 15%|█▌        | 132/877 [31:22<2:51:59, 13.85s/it]

For epoch 212: {Learning rate: [0.0068705337675039595]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.3096819032983082, 'test_loss': 0.4758743345737457, 'bleu': 4.2795, 'gen_len': 6.6986}




 15%|█▌        | 133/877 [31:36<2:51:58, 13.87s/it]

For epoch 213: {Learning rate: [0.00686101361188979]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.3062056855457585, 'test_loss': 0.4747950181365013, 'bleu': 2.1236, 'gen_len': 6.5822}




 15%|█▌        | 134/877 [31:50<2:52:32, 13.93s/it]

For epoch 214: {Learning rate: [0.006851493456275621]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.3089456729045728, 'test_loss': 0.48453245162963865, 'bleu': 3.7142, 'gen_len': 6.6644}




 15%|█▌        | 135/877 [32:04<2:52:09, 13.92s/it]

For epoch 215: {Learning rate: [0.00684197330066145]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.3055791157047923, 'test_loss': 0.48594170063734055, 'bleu': 3.5097, 'gen_len': 6.3973}




 16%|█▌        | 136/877 [32:18<2:50:48, 13.83s/it]

For epoch 216: {Learning rate: [0.006832453145047281]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.30834838348190957, 'test_loss': 0.48332712799310684, 'bleu': 1.3973, 'gen_len': 7.3836}




 16%|█▌        | 137/877 [32:31<2:49:59, 13.78s/it]

For epoch 217: {Learning rate: [0.006822932989433111]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.30725859759784324, 'test_loss': 0.5012581393122673, 'bleu': 3.07, 'gen_len': 6.5479}




 16%|█▌        | 138/877 [32:45<2:49:34, 13.77s/it]

For epoch 218: {Learning rate: [0.006813412833818942]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.3059252275199425, 'test_loss': 0.4766694620251656, 'bleu': 4.2137, 'gen_len': 7.2192}




 16%|█▌        | 139/877 [32:59<2:49:42, 13.80s/it]

For epoch 219: {Learning rate: [0.0068038926782047725]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.3086651905280788, 'test_loss': 0.4823186591267586, 'bleu': 1.8949, 'gen_len': 7.0616}




 16%|█▌        | 140/877 [33:13<2:49:50, 13.83s/it]

For epoch 220: {Learning rate: [0.0067943725225906035]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.30682446825795057, 'test_loss': 0.4755541458725929, 'bleu': 1.7667, 'gen_len': 6.5479}




 16%|█▌        | 141/877 [33:27<2:49:43, 13.84s/it]

For epoch 221: {Learning rate: [0.006784852366976433]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.3056715955821479, 'test_loss': 0.48077516108751295, 'bleu': 4.1354, 'gen_len': 6.3493}




 16%|█▌        | 142/877 [33:41<2:51:49, 14.03s/it]

For epoch 222: {Learning rate: [0.006775332211362264]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.30309575832471614, 'test_loss': 0.4760854333639145, 'bleu': 3.1476, 'gen_len': 7.3219}




 16%|█▋        | 143/877 [33:55<2:51:19, 14.00s/it]

For epoch 223: {Learning rate: [0.006765812055748094]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.30175790190696716, 'test_loss': 0.48131078481674194, 'bleu': 2.073, 'gen_len': 7.1918}




 16%|█▋        | 144/877 [34:09<2:50:13, 13.93s/it]

For epoch 224: {Learning rate: [0.006756291900133925]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.3040306204702796, 'test_loss': 0.49474141001701355, 'bleu': 4.7168, 'gen_len': 6.137}




 17%|█▋        | 145/877 [34:23<2:49:54, 13.93s/it]

For epoch 225: {Learning rate: [0.006746771744519755]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.3055544259344659, 'test_loss': 0.49066360890865324, 'bleu': 2.8381, 'gen_len': 6.4589}




 17%|█▋        | 146/877 [34:38<2:54:37, 14.33s/it]

For epoch 226: {Learning rate: [0.006737251588905586]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.303676781857886, 'test_loss': 0.4828293725848198, 'bleu': 3.9093, 'gen_len': 5.8836}




 17%|█▋        | 147/877 [34:53<2:55:32, 14.43s/it]

For epoch 227: {Learning rate: [0.006727731433291416]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.303227399907461, 'test_loss': 0.4961747795343399, 'bleu': 3.2737, 'gen_len': 6.4041}




 17%|█▋        | 148/877 [35:06<2:53:21, 14.27s/it]

For epoch 228: {Learning rate: [0.006718211277677247]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.30240587708426686, 'test_loss': 0.4804406642913818, 'bleu': 1.8901, 'gen_len': 7.6575}




 17%|█▋        | 149/877 [35:20<2:51:26, 14.13s/it]

For epoch 229: {Learning rate: [0.006708691122063077]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.3019148253086137, 'test_loss': 0.4829527959227562, 'bleu': 2.8514, 'gen_len': 6.8767}




 17%|█▋        | 150/877 [35:34<2:51:21, 14.14s/it]

For epoch 230: {Learning rate: [0.006699170966448908]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.3051661638224997, 'test_loss': 0.476346829533577, 'bleu': 4.1625, 'gen_len': 7.6644}




 17%|█▋        | 151/877 [35:48<2:50:46, 14.11s/it]

For epoch 231: {Learning rate: [0.006689650810834738]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.3038494608751157, 'test_loss': 0.47860066443681715, 'bleu': 3.015, 'gen_len': 7.5205}




 17%|█▋        | 152/877 [36:02<2:49:14, 14.01s/it]

For epoch 232: {Learning rate: [0.006680130655220569]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.30261453186593407, 'test_loss': 0.4845928892493248, 'bleu': 3.6872, 'gen_len': 6.5548}




 17%|█▋        | 153/877 [36:16<2:48:14, 13.94s/it]

For epoch 233: {Learning rate: [0.006670610499606398]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.3013827564512811, 'test_loss': 0.48387505263090136, 'bleu': 4.4404, 'gen_len': 6.2055}




 18%|█▊        | 154/877 [36:30<2:46:42, 13.84s/it]

For epoch 234: {Learning rate: [0.006661090343992229]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.3043595875908689, 'test_loss': 0.4801594644784927, 'bleu': 3.4763, 'gen_len': 7.1027}




 18%|█▊        | 155/877 [36:43<2:45:35, 13.76s/it]

For epoch 235: {Learning rate: [0.0066515701883780596]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.3036648674709041, 'test_loss': 0.48881769478321074, 'bleu': 2.5481, 'gen_len': 6.8973}




 18%|█▊        | 156/877 [36:57<2:44:41, 13.71s/it]

For epoch 236: {Learning rate: [0.006642050032763891]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.3029680604614863, 'test_loss': 0.491717591881752, 'bleu': 2.4688, 'gen_len': 6.6781}




 18%|█▊        | 157/877 [37:10<2:44:19, 13.69s/it]

For epoch 237: {Learning rate: [0.006632529877149721]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.30320088274595214, 'test_loss': 0.4988493755459785, 'bleu': 2.5067, 'gen_len': 6.9247}




 18%|█▊        | 158/877 [37:24<2:43:26, 13.64s/it]

For epoch 238: {Learning rate: [0.006623009721535552]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.3012366276688692, 'test_loss': 0.4898044764995575, 'bleu': 3.9569, 'gen_len': 6.3151}




 18%|█▊        | 159/877 [37:38<2:43:26, 13.66s/it]

For epoch 239: {Learning rate: [0.006613489565921382]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.30299601976464435, 'test_loss': 0.4843781113624573, 'bleu': 2.0243, 'gen_len': 7.7877}




 18%|█▊        | 160/877 [37:52<2:45:11, 13.82s/it]

For epoch 240: {Learning rate: [0.006603969410307212]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.3018377546857043, 'test_loss': 0.49164145439863205, 'bleu': 4.0814, 'gen_len': 6.6644}




 18%|█▊        | 161/877 [38:06<2:44:54, 13.82s/it]

For epoch 241: {Learning rate: [0.006594449254693042]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.3006021307735908, 'test_loss': 0.490033695101738, 'bleu': 2.1122, 'gen_len': 8.2945}




 18%|█▊        | 162/877 [38:19<2:43:50, 13.75s/it]

For epoch 242: {Learning rate: [0.006584929099078873]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.3015711827249062, 'test_loss': 0.4810897156596184, 'bleu': 2.7288, 'gen_len': 6.8562}




 19%|█▊        | 163/877 [38:33<2:43:44, 13.76s/it]

For epoch 243: {Learning rate: [0.0065754089434647035]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.30385248108607965, 'test_loss': 0.48855134546756745, 'bleu': 2.3372, 'gen_len': 6.5}




 19%|█▊        | 164/877 [38:47<2:43:37, 13.77s/it]

For epoch 244: {Learning rate: [0.006565888787850535]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.30164575431405044, 'test_loss': 0.48460646718740463, 'bleu': 1.9906, 'gen_len': 7.5342}




 19%|█▉        | 165/877 [39:01<2:43:18, 13.76s/it]

For epoch 245: {Learning rate: [0.006556368632236365]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.3049483357406244, 'test_loss': 0.4962180256843567, 'bleu': 6.0343, 'gen_len': 6.2945}




 19%|█▉        | 166/877 [39:15<2:44:27, 13.88s/it]

For epoch 246: {Learning rate: [0.006546848476622196]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.3006921857595444, 'test_loss': 0.5039267480373383, 'bleu': 3.7652, 'gen_len': 7.3288}




 19%|█▉        | 167/877 [39:29<2:43:51, 13.85s/it]

For epoch 247: {Learning rate: [0.006537328321008025]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.39batches/s]



Metrics: {'train_loss': 0.30165059501078073, 'test_loss': 0.46998237520456315, 'bleu': 4.6837, 'gen_len': 7.3836}




 19%|█▉        | 168/877 [39:43<2:46:43, 14.11s/it]

For epoch 248: {Learning rate: [0.006527808165393856]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.2993896937951809, 'test_loss': 0.48518710881471633, 'bleu': 2.3458, 'gen_len': 7.0342}




 19%|█▉        | 169/877 [39:57<2:45:28, 14.02s/it]

For epoch 249: {Learning rate: [0.006518288009779686]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.30189023221411354, 'test_loss': 0.4774787962436676, 'bleu': 2.8761, 'gen_len': 7.4041}




 19%|█▉        | 170/877 [40:11<2:44:05, 13.93s/it]

For epoch 250: {Learning rate: [0.006508767854165517]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.29876755541417654, 'test_loss': 0.4751048877835274, 'bleu': 2.3821, 'gen_len': 6.9247}




 19%|█▉        | 171/877 [40:25<2:43:17, 13.88s/it]

For epoch 251: {Learning rate: [0.0064992476985513475]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.297307583980444, 'test_loss': 0.5031994014978409, 'bleu': 1.87, 'gen_len': 6.589}




 20%|█▉        | 172/877 [40:38<2:42:16, 13.81s/it]

For epoch 252: {Learning rate: [0.0064897275429371785]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.30075167010470133, 'test_loss': 0.47480113953351977, 'bleu': 3.7861, 'gen_len': 7.6507}




 20%|█▉        | 173/877 [40:52<2:43:15, 13.91s/it]

For epoch 253: {Learning rate: [0.006480207387323008]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.2997554824119661, 'test_loss': 0.47948236018419266, 'bleu': 2.855, 'gen_len': 7.3082}




 20%|█▉        | 174/877 [41:06<2:42:09, 13.84s/it]

For epoch 254: {Learning rate: [0.006470687231708839]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2981922415698447, 'test_loss': 0.4799270212650299, 'bleu': 1.9435, 'gen_len': 7.2877}




 20%|█▉        | 175/877 [41:20<2:41:16, 13.78s/it]

For epoch 255: {Learning rate: [0.006461167076094669]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.30166813985603613, 'test_loss': 0.47803124934434893, 'bleu': 3.3361, 'gen_len': 6.8356}




 20%|██        | 176/877 [41:33<2:40:43, 13.76s/it]

For epoch 256: {Learning rate: [0.0064516469204805]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.2984235250368351, 'test_loss': 0.49316045790910723, 'bleu': 2.5341, 'gen_len': 7.3219}




 20%|██        | 177/877 [41:47<2:40:05, 13.72s/it]

For epoch 257: {Learning rate: [0.00644212676486633]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.30331888322423145, 'test_loss': 0.474428229033947, 'bleu': 2.8396, 'gen_len': 7.1438}




 20%|██        | 178/877 [42:01<2:39:34, 13.70s/it]

For epoch 258: {Learning rate: [0.006432606609252161]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.30004495091554595, 'test_loss': 0.4834862291812897, 'bleu': 3.0752, 'gen_len': 7.0068}




 20%|██        | 179/877 [42:14<2:39:34, 13.72s/it]

For epoch 259: {Learning rate: [0.006423086453637991]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2972908630603697, 'test_loss': 0.48550691902637483, 'bleu': 3.7034, 'gen_len': 7.0411}




 21%|██        | 180/877 [42:28<2:38:37, 13.65s/it]

For epoch 260: {Learning rate: [0.006413566298023822]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.3004985735183809, 'test_loss': 0.4754669338464737, 'bleu': 2.0902, 'gen_len': 7.4932}




 21%|██        | 181/877 [42:42<2:38:58, 13.71s/it]

For epoch 261: {Learning rate: [0.006404046142409652]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2988226493684257, 'test_loss': 0.4820979222655296, 'bleu': 1.8254, 'gen_len': 7.9178}




 21%|██        | 182/877 [42:55<2:38:19, 13.67s/it]

For epoch 262: {Learning rate: [0.006394525986795483]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.3014644413459592, 'test_loss': 0.5018213868141175, 'bleu': 1.4809, 'gen_len': 6.6164}




 21%|██        | 183/877 [43:09<2:38:18, 13.69s/it]

For epoch 263: {Learning rate: [0.006385005831181313]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.30090210132482575, 'test_loss': 0.48064109981060027, 'bleu': 5.4186, 'gen_len': 6.4521}




 21%|██        | 184/877 [43:23<2:37:54, 13.67s/it]

For epoch 264: {Learning rate: [0.006375485675567144]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.29977586683703633, 'test_loss': 0.47522277384996414, 'bleu': 2.2018, 'gen_len': 7.9863}




 21%|██        | 185/877 [43:37<2:39:19, 13.81s/it]

For epoch 265: {Learning rate: [0.006365965519952973]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.2988055929905031, 'test_loss': 0.4715257242321968, 'bleu': 3.9544, 'gen_len': 7.0685}




 21%|██        | 186/877 [43:51<2:40:13, 13.91s/it]

For epoch 266: {Learning rate: [0.006356445364338804]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.29515235453117183, 'test_loss': 0.4682412326335907, 'bleu': 4.4639, 'gen_len': 6.726}




 21%|██▏       | 187/877 [44:05<2:39:22, 13.86s/it]

For epoch 267: {Learning rate: [0.006346925208724635]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.2947966463682128, 'test_loss': 0.4842700630426407, 'bleu': 3.8748, 'gen_len': 7.3767}




 21%|██▏       | 188/877 [44:18<2:37:59, 13.76s/it]

For epoch 268: {Learning rate: [0.006337405053110466]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.2964432046907704, 'test_loss': 0.48807902336120607, 'bleu': 2.0403, 'gen_len': 7.8562}




 22%|██▏       | 189/877 [44:32<2:37:52, 13.77s/it]

For epoch 269: {Learning rate: [0.006327884897496296]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.29720833897590637, 'test_loss': 0.47926700562238694, 'bleu': 2.572, 'gen_len': 7.0548}




 22%|██▏       | 190/877 [44:46<2:37:57, 13.79s/it]

For epoch 270: {Learning rate: [0.006318364741882127]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.2927464675612566, 'test_loss': 0.4764126047492027, 'bleu': 2.5489, 'gen_len': 6.9247}




 22%|██▏       | 191/877 [45:00<2:37:19, 13.76s/it]

For epoch 271: {Learning rate: [0.006308844586267957]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.29551908519209885, 'test_loss': 0.478716179728508, 'bleu': 4.4965, 'gen_len': 6.1644}




 22%|██▏       | 192/877 [45:13<2:36:40, 13.72s/it]

For epoch 272: {Learning rate: [0.006299324430653787]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.29529083792756244, 'test_loss': 0.4926921337842941, 'bleu': 4.7091, 'gen_len': 6.6027}




 22%|██▏       | 193/877 [45:27<2:36:37, 13.74s/it]

For epoch 273: {Learning rate: [0.006289804275039617]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2957981087085677, 'test_loss': 0.48729112297296523, 'bleu': 1.8661, 'gen_len': 7.3151}




 22%|██▏       | 194/877 [45:41<2:36:20, 13.73s/it]

For epoch 274: {Learning rate: [0.006280284119425448]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.29664857540188766, 'test_loss': 0.48979356288909914, 'bleu': 3.4659, 'gen_len': 6.2466}




 22%|██▏       | 195/877 [45:55<2:36:46, 13.79s/it]

For epoch 275: {Learning rate: [0.0062707639638112786]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2924991090123246, 'test_loss': 0.49052560776472093, 'bleu': 4.7295, 'gen_len': 6.3493}




 22%|██▏       | 196/877 [46:08<2:35:50, 13.73s/it]

For epoch 276: {Learning rate: [0.00626124380819711]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.29436109778357716, 'test_loss': 0.487569722533226, 'bleu': 2.3619, 'gen_len': 6.5685}




 22%|██▏       | 197/877 [46:22<2:35:58, 13.76s/it]

For epoch 277: {Learning rate: [0.00625172365258294]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.29515969135412357, 'test_loss': 0.4916557252407074, 'bleu': 2.873, 'gen_len': 6.3836}




 23%|██▎       | 198/877 [46:36<2:35:24, 13.73s/it]

For epoch 278: {Learning rate: [0.006242203496968771]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.29313239273501607, 'test_loss': 0.4767627984285355, 'bleu': 2.4419, 'gen_len': 6.9589}




 23%|██▎       | 199/877 [46:50<2:35:58, 13.80s/it]

For epoch 279: {Learning rate: [0.0062326833413546]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.28926013119337035, 'test_loss': 0.486750029027462, 'bleu': 2.2951, 'gen_len': 8.2671}




 23%|██▎       | 200/877 [47:03<2:35:46, 13.81s/it]

For epoch 280: {Learning rate: [0.006223163185740431]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.29445346572050235, 'test_loss': 0.484219329059124, 'bleu': 4.0801, 'gen_len': 7.0}




 23%|██▎       | 201/877 [47:17<2:35:30, 13.80s/it]

For epoch 281: {Learning rate: [0.006213643030126261]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.29429665098829966, 'test_loss': 0.4749786347150803, 'bleu': 5.191, 'gen_len': 7.1233}




 23%|██▎       | 202/877 [47:31<2:34:56, 13.77s/it]

For epoch 282: {Learning rate: [0.006204122874512092]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.2957755265439429, 'test_loss': 0.4850395992398262, 'bleu': 3.8668, 'gen_len': 6.226}




 23%|██▎       | 203/877 [47:45<2:34:53, 13.79s/it]

For epoch 283: {Learning rate: [0.0061946027188979225]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.28961500125687295, 'test_loss': 0.48315177410840987, 'bleu': 2.672, 'gen_len': 6.7397}




 23%|██▎       | 204/877 [47:59<2:34:52, 13.81s/it]

For epoch 284: {Learning rate: [0.006185082563283754]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.2919874561995995, 'test_loss': 0.49006218910217286, 'bleu': 4.0403, 'gen_len': 6.8973}




 23%|██▎       | 205/877 [48:13<2:34:58, 13.84s/it]

For epoch 285: {Learning rate: [0.006175562407669583]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.29130546675949565, 'test_loss': 0.4902900218963623, 'bleu': 2.6774, 'gen_len': 6.5685}




 23%|██▎       | 206/877 [48:26<2:34:11, 13.79s/it]

For epoch 286: {Learning rate: [0.006166042252055414]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.29133678463900964, 'test_loss': 0.49358098208904266, 'bleu': 4.8024, 'gen_len': 6.6781}




 24%|██▎       | 207/877 [48:40<2:34:33, 13.84s/it]

For epoch 287: {Learning rate: [0.006156522096441244]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2920713192079125, 'test_loss': 0.47341360449790953, 'bleu': 2.9335, 'gen_len': 7.2877}




 24%|██▎       | 208/877 [48:54<2:33:56, 13.81s/it]

For epoch 288: {Learning rate: [0.006147001940827075]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.49batches/s]



Metrics: {'train_loss': 0.2895431300488914, 'test_loss': 0.4818298667669296, 'bleu': 2.9788, 'gen_len': 7.089}




 24%|██▍       | 209/877 [49:08<2:35:41, 13.98s/it]

For epoch 289: {Learning rate: [0.006137481785212905]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.29097441638388283, 'test_loss': 0.4849574714899063, 'bleu': 3.0572, 'gen_len': 7.3562}




 24%|██▍       | 210/877 [49:22<2:34:47, 13.92s/it]

For epoch 290: {Learning rate: [0.006127961629598736]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.2889434735222561, 'test_loss': 0.49043964594602585, 'bleu': 2.4844, 'gen_len': 6.8562}




 24%|██▍       | 211/877 [49:37<2:36:19, 14.08s/it]

For epoch 291: {Learning rate: [0.006118441473984566]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.2928615993842846, 'test_loss': 0.48032962083816527, 'bleu': 4.938, 'gen_len': 7.4726}




 24%|██▍       | 212/877 [49:51<2:36:22, 14.11s/it]

For epoch 292: {Learning rate: [0.006108921318370397]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.29378575268315105, 'test_loss': 0.49002849161624906, 'bleu': 2.4731, 'gen_len': 6.8904}




 24%|██▍       | 213/877 [50:05<2:35:21, 14.04s/it]

For epoch 293: {Learning rate: [0.006099401162756227]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.2905884904832375, 'test_loss': 0.48469667285680773, 'bleu': 4.0569, 'gen_len': 7.5959}




 24%|██▍       | 214/877 [50:18<2:34:12, 13.95s/it]

For epoch 294: {Learning rate: [0.006089881007142058]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.2925620849539594, 'test_loss': 0.488410958647728, 'bleu': 2.7327, 'gen_len': 7.2671}




 25%|██▍       | 215/877 [50:32<2:33:14, 13.89s/it]

For epoch 295: {Learning rate: [0.006080360851527888]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.28925817332616666, 'test_loss': 0.496192654967308, 'bleu': 2.1201, 'gen_len': 6.3836}




 25%|██▍       | 216/877 [50:46<2:32:01, 13.80s/it]

For epoch 296: {Learning rate: [0.006070840695913719]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.28859033163000897, 'test_loss': 0.47055129408836366, 'bleu': 4.0098, 'gen_len': 7.1164}




 25%|██▍       | 217/877 [50:59<2:31:41, 13.79s/it]

For epoch 297: {Learning rate: [0.006061320540299548]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.2866077721118927, 'test_loss': 0.48063365668058394, 'bleu': 3.6885, 'gen_len': 7.7466}




 25%|██▍       | 218/877 [51:13<2:31:17, 13.78s/it]

For epoch 298: {Learning rate: [0.0060518003846853794]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.2891807341721, 'test_loss': 0.4852726548910141, 'bleu': 5.9796, 'gen_len': 6.4795}




 25%|██▍       | 219/877 [51:27<2:30:47, 13.75s/it]

For epoch 299: {Learning rate: [0.00604228022907121]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.2893300107339533, 'test_loss': 0.4685216829180717, 'bleu': 4.3509, 'gen_len': 7.9589}




 25%|██▌       | 220/877 [51:40<2:30:03, 13.70s/it]

For epoch 300: {Learning rate: [0.006032760073457041]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.28845106283339056, 'test_loss': 0.48324306309223175, 'bleu': 2.5356, 'gen_len': 6.4863}




 25%|██▌       | 221/877 [51:54<2:29:38, 13.69s/it]

For epoch 301: {Learning rate: [0.006023239917842871]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.2887953552531033, 'test_loss': 0.479940664768219, 'bleu': 3.6438, 'gen_len': 7.2603}




 25%|██▌       | 222/877 [52:08<2:30:01, 13.74s/it]

For epoch 302: {Learning rate: [0.006013719762228702]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.28832283325311614, 'test_loss': 0.48427720218896864, 'bleu': 5.2413, 'gen_len': 7.1644}




 25%|██▌       | 223/877 [52:22<2:29:41, 13.73s/it]

For epoch 303: {Learning rate: [0.006004199606614532]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.28531329115716425, 'test_loss': 0.48533870875835416, 'bleu': 1.7301, 'gen_len': 7.5137}




 26%|██▌       | 224/877 [52:35<2:29:08, 13.70s/it]

For epoch 304: {Learning rate: [0.005994679451000362]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2861511220292347, 'test_loss': 0.47759095281362535, 'bleu': 2.3802, 'gen_len': 8.2329}




 26%|██▌       | 225/877 [52:49<2:28:50, 13.70s/it]

For epoch 305: {Learning rate: [0.005985159295386192]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.28607302031865933, 'test_loss': 0.4864430665969849, 'bleu': 2.7366, 'gen_len': 7.089}




 26%|██▌       | 226/877 [53:03<2:29:27, 13.77s/it]

For epoch 306: {Learning rate: [0.005975639139772023]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2879839877529842, 'test_loss': 0.4683127045631409, 'bleu': 2.5507, 'gen_len': 6.9315}




 26%|██▌       | 227/877 [53:17<2:28:47, 13.74s/it]

For epoch 307: {Learning rate: [0.005966118984157854]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.28706043340810916, 'test_loss': 0.4703521877527237, 'bleu': 4.1791, 'gen_len': 7.3425}




 26%|██▌       | 228/877 [53:31<2:29:32, 13.83s/it]

For epoch 308: {Learning rate: [0.005956598828543685]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.72batches/s]



Metrics: {'train_loss': 0.2833303045935747, 'test_loss': 0.46856421530246734, 'bleu': 2.787, 'gen_len': 8.2466}




 26%|██▌       | 229/877 [53:45<2:29:43, 13.86s/it]

For epoch 309: {Learning rate: [0.005947078672929515]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.2867373656935808, 'test_loss': 0.4746184483170509, 'bleu': 4.5901, 'gen_len': 7.5616}




 26%|██▌       | 230/877 [53:58<2:29:01, 13.82s/it]

For epoch 310: {Learning rate: [0.005937558517315345]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.2857958994260648, 'test_loss': 0.4644870162010193, 'bleu': 4.4214, 'gen_len': 7.0616}




 26%|██▋       | 231/877 [54:12<2:27:55, 13.74s/it]

For epoch 311: {Learning rate: [0.005928038361701175]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.2846668228143599, 'test_loss': 0.4756211146712303, 'bleu': 2.4168, 'gen_len': 7.5685}




 26%|██▋       | 232/877 [54:26<2:27:37, 13.73s/it]

For epoch 312: {Learning rate: [0.005918518206087006]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.28554490790134524, 'test_loss': 0.4741002231836319, 'bleu': 3.5151, 'gen_len': 8.4521}




 27%|██▋       | 233/877 [54:39<2:27:44, 13.76s/it]

For epoch 313: {Learning rate: [0.005908998050472836]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.28434219084134915, 'test_loss': 0.4821914196014404, 'bleu': 2.3131, 'gen_len': 7.8082}




 27%|██▋       | 234/877 [54:53<2:27:14, 13.74s/it]

For epoch 314: {Learning rate: [0.005899477894858667]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2842111471222668, 'test_loss': 0.476377834379673, 'bleu': 2.8268, 'gen_len': 7.0959}




 27%|██▋       | 235/877 [55:07<2:26:38, 13.70s/it]

For epoch 315: {Learning rate: [0.0058899577392444976]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.28330489956751104, 'test_loss': 0.48583332300186155, 'bleu': 2.9566, 'gen_len': 6.6849}




 27%|██▋       | 236/877 [55:20<2:26:28, 13.71s/it]

For epoch 316: {Learning rate: [0.005880437583630329]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.2864914834499359, 'test_loss': 0.46940592378377916, 'bleu': 3.9233, 'gen_len': 6.8356}




 27%|██▋       | 237/877 [55:34<2:25:34, 13.65s/it]

For epoch 317: {Learning rate: [0.005870917428016158]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.28452523125381, 'test_loss': 0.48434377908706666, 'bleu': 5.3089, 'gen_len': 6.726}




 27%|██▋       | 238/877 [55:48<2:25:41, 13.68s/it]

For epoch 318: {Learning rate: [0.005861397272401989]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.28380441011452096, 'test_loss': 0.478620944917202, 'bleu': 3.4469, 'gen_len': 6.9589}




 27%|██▋       | 239/877 [56:01<2:25:31, 13.69s/it]

For epoch 319: {Learning rate: [0.005851877116787819]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.2831357379512089, 'test_loss': 0.47336848825216293, 'bleu': 2.5316, 'gen_len': 7.6438}




 27%|██▋       | 240/877 [56:15<2:25:45, 13.73s/it]

For epoch 320: {Learning rate: [0.00584235696117365]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.28106243217863686, 'test_loss': 0.46876674294471743, 'bleu': 6.0841, 'gen_len': 7.1027}




 27%|██▋       | 241/877 [56:29<2:25:02, 13.68s/it]

For epoch 321: {Learning rate: [0.00583283680555948]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.28200930175257893, 'test_loss': 0.45886383652687074, 'bleu': 3.1436, 'gen_len': 7.9178}




 28%|██▊       | 242/877 [56:42<2:24:37, 13.67s/it]

For epoch 322: {Learning rate: [0.005823316649945311]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.283080238031178, 'test_loss': 0.4711862787604332, 'bleu': 3.9418, 'gen_len': 6.863}




 28%|██▊       | 243/877 [56:56<2:24:22, 13.66s/it]

For epoch 323: {Learning rate: [0.005813796494331141]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.2844223976135254, 'test_loss': 0.4711559936404228, 'bleu': 3.7066, 'gen_len': 7.9932}




 28%|██▊       | 244/877 [57:10<2:24:06, 13.66s/it]

For epoch 324: {Learning rate: [0.005804276338716972]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.28465546740264425, 'test_loss': 0.46591165512800214, 'bleu': 4.2688, 'gen_len': 7.3562}




 28%|██▊       | 245/877 [57:23<2:23:57, 13.67s/it]

For epoch 325: {Learning rate: [0.005794756183102802]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.28133527898206945, 'test_loss': 0.47950317710638046, 'bleu': 3.0273, 'gen_len': 8.1781}




 28%|██▊       | 246/877 [57:37<2:23:56, 13.69s/it]

For epoch 326: {Learning rate: [0.005785236027488633]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.2791279052815786, 'test_loss': 0.4902585506439209, 'bleu': 3.7609, 'gen_len': 7.7808}




 28%|██▊       | 247/877 [57:51<2:23:45, 13.69s/it]

For epoch 327: {Learning rate: [0.005775715871874463]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.2788568586111069, 'test_loss': 0.483621883392334, 'bleu': 3.6541, 'gen_len': 6.9178}




 28%|██▊       | 248/877 [58:04<2:23:08, 13.65s/it]

For epoch 328: {Learning rate: [0.005766195716260294]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.2808130837795211, 'test_loss': 0.48487487286329267, 'bleu': 2.3618, 'gen_len': 7.2603}




 28%|██▊       | 249/877 [58:18<2:23:33, 13.72s/it]

For epoch 329: {Learning rate: [0.0057566755606461234]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.28095160606430797, 'test_loss': 0.4763629913330078, 'bleu': 3.3135, 'gen_len': 8.4795}




 29%|██▊       | 250/877 [58:32<2:23:03, 13.69s/it]

For epoch 330: {Learning rate: [0.0057471554050319545]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.27885236681961434, 'test_loss': 0.47216366827487943, 'bleu': 4.5064, 'gen_len': 6.6507}




 29%|██▊       | 251/877 [58:46<2:24:24, 13.84s/it]

For epoch 331: {Learning rate: [0.005737635249417785]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.2766020752307845, 'test_loss': 0.499031700193882, 'bleu': 4.4358, 'gen_len': 6.774}




 29%|██▊       | 252/877 [59:00<2:25:26, 13.96s/it]

For epoch 332: {Learning rate: [0.005728115093803616]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.2834111266746754, 'test_loss': 0.47325232326984407, 'bleu': 2.3979, 'gen_len': 8.363}




 29%|██▉       | 253/877 [59:14<2:24:20, 13.88s/it]

For epoch 333: {Learning rate: [0.005718594938189446]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2802629241856133, 'test_loss': 0.48736949563026427, 'bleu': 4.0115, 'gen_len': 6.9041}




 29%|██▉       | 254/877 [59:28<2:23:06, 13.78s/it]

For epoch 334: {Learning rate: [0.005709074782575277]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.2750015367821949, 'test_loss': 0.4788248226046562, 'bleu': 5.1478, 'gen_len': 7.1301}




 29%|██▉       | 255/877 [59:42<2:23:23, 13.83s/it]

For epoch 335: {Learning rate: [0.005699554626961107]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2752803389619036, 'test_loss': 0.4771325096487999, 'bleu': 2.5204, 'gen_len': 7.2466}




 29%|██▉       | 256/877 [59:55<2:22:15, 13.74s/it]

For epoch 336: {Learning rate: [0.005690034471346937]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.27690048334075185, 'test_loss': 0.4929326429963112, 'bleu': 4.9604, 'gen_len': 6.4589}




 29%|██▉       | 257/877 [1:00:09<2:22:30, 13.79s/it]

For epoch 337: {Learning rate: [0.005680514315732767]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2788019783613158, 'test_loss': 0.4763061925768852, 'bleu': 2.5361, 'gen_len': 7.4863}




 29%|██▉       | 258/877 [1:00:23<2:21:31, 13.72s/it]

For epoch 338: {Learning rate: [0.0056709941601185984]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.27746224512414236, 'test_loss': 0.4705515906214714, 'bleu': 5.0431, 'gen_len': 6.9726}




 30%|██▉       | 259/877 [1:00:36<2:21:03, 13.69s/it]

For epoch 339: {Learning rate: [0.005661474004504429]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.2773839235305786, 'test_loss': 0.4590911492705345, 'bleu': 2.4697, 'gen_len': 6.8425}




 30%|██▉       | 260/877 [1:00:50<2:20:32, 13.67s/it]

For epoch 340: {Learning rate: [0.00565195384889026]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.2769593687319174, 'test_loss': 0.46251353025436404, 'bleu': 3.9252, 'gen_len': 7.5274}




 30%|██▉       | 261/877 [1:01:04<2:20:27, 13.68s/it]

For epoch 341: {Learning rate: [0.00564243369327609]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.27777769398398516, 'test_loss': 0.46626915633678434, 'bleu': 2.4727, 'gen_len': 7.6301}




 30%|██▉       | 262/877 [1:01:17<2:20:07, 13.67s/it]

For epoch 342: {Learning rate: [0.00563291353766192]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.27724131214909437, 'test_loss': 0.4608163468539715, 'bleu': 2.1212, 'gen_len': 7.911}




 30%|██▉       | 263/877 [1:01:31<2:20:04, 13.69s/it]

For epoch 343: {Learning rate: [0.00562339338204775]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.2758736813940653, 'test_loss': 0.4701121062040329, 'bleu': 3.233, 'gen_len': 7.4932}




 30%|███       | 264/877 [1:01:44<2:19:21, 13.64s/it]

For epoch 344: {Learning rate: [0.00561387322643358]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.2744860652743316, 'test_loss': 0.480135503411293, 'bleu': 4.7773, 'gen_len': 6.9795}




 30%|███       | 265/877 [1:01:58<2:20:09, 13.74s/it]

For epoch 345: {Learning rate: [0.005604353070819411]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.27477671642129015, 'test_loss': 0.4735031321644783, 'bleu': 2.8606, 'gen_len': 7.5616}




 30%|███       | 266/877 [1:02:12<2:19:24, 13.69s/it]

For epoch 346: {Learning rate: [0.0055948329152052416]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.27623238505386727, 'test_loss': 0.48059927076101305, 'bleu': 3.7135, 'gen_len': 6.9658}




 30%|███       | 267/877 [1:02:26<2:19:39, 13.74s/it]

For epoch 347: {Learning rate: [0.005585312759591073]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.27628645882373903, 'test_loss': 0.46921976655721664, 'bleu': 3.3746, 'gen_len': 8.0}




 31%|███       | 268/877 [1:02:39<2:19:10, 13.71s/it]

For epoch 348: {Learning rate: [0.005575792603976902]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.2755327962520646, 'test_loss': 0.45866930261254313, 'bleu': 4.7611, 'gen_len': 7.6712}




 31%|███       | 269/877 [1:02:53<2:18:41, 13.69s/it]

For epoch 349: {Learning rate: [0.005566272448362733]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.2764538951763293, 'test_loss': 0.4712155178189278, 'bleu': 3.1242, 'gen_len': 7.726}




 31%|███       | 270/877 [1:03:07<2:18:34, 13.70s/it]

For epoch 350: {Learning rate: [0.005556752292748563]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.27137895655341265, 'test_loss': 0.47544923722743987, 'bleu': 4.16, 'gen_len': 8.589}




 31%|███       | 271/877 [1:03:20<2:18:20, 13.70s/it]

For epoch 351: {Learning rate: [0.005547232137134394]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.27379220251629993, 'test_loss': 0.47612234801054, 'bleu': 3.2154, 'gen_len': 7.7877}




 31%|███       | 272/877 [1:03:34<2:18:18, 13.72s/it]

For epoch 352: {Learning rate: [0.005537711981520224]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.27516121144701794, 'test_loss': 0.4701095104217529, 'bleu': 2.3944, 'gen_len': 8.4041}




 31%|███       | 273/877 [1:03:48<2:18:13, 13.73s/it]

For epoch 353: {Learning rate: [0.005528191825906055]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.27487241640323545, 'test_loss': 0.4611749887466431, 'bleu': 4.4276, 'gen_len': 7.3014}




 31%|███       | 274/877 [1:04:02<2:17:49, 13.71s/it]

For epoch 354: {Learning rate: [0.005518671670291885]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.27479820244195985, 'test_loss': 0.46559766232967376, 'bleu': 2.4443, 'gen_len': 9.0137}




 31%|███▏      | 275/877 [1:04:15<2:17:26, 13.70s/it]

For epoch 355: {Learning rate: [0.005509151514677716]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.2738164038192935, 'test_loss': 0.4656894877552986, 'bleu': 3.2653, 'gen_len': 7.363}




 31%|███▏      | 276/877 [1:04:29<2:18:07, 13.79s/it]

For epoch 356: {Learning rate: [0.005499631359063546]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2744932439995975, 'test_loss': 0.4621025905013084, 'bleu': 3.3853, 'gen_len': 7.8904}




 32%|███▏      | 277/877 [1:04:43<2:17:16, 13.73s/it]

For epoch 357: {Learning rate: [0.005490111203449377]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.2715881619511581, 'test_loss': 0.46868072599172594, 'bleu': 4.5113, 'gen_len': 6.8836}




 32%|███▏      | 278/877 [1:04:57<2:17:59, 13.82s/it]

For epoch 358: {Learning rate: [0.005480591047835207]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.27292688227281336, 'test_loss': 0.47554914057254793, 'bleu': 3.5, 'gen_len': 7.1507}




 32%|███▏      | 279/877 [1:05:11<2:17:12, 13.77s/it]

For epoch 359: {Learning rate: [0.005471070892221038]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.2707369469287919, 'test_loss': 0.4500320203602314, 'bleu': 3.5875, 'gen_len': 7.9384}




 32%|███▏      | 280/877 [1:05:24<2:16:36, 13.73s/it]

For epoch 360: {Learning rate: [0.005461550736606868]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.2742266186126849, 'test_loss': 0.46148035675287247, 'bleu': 2.9768, 'gen_len': 7.3836}




 32%|███▏      | 281/877 [1:05:38<2:16:12, 13.71s/it]

For epoch 361: {Learning rate: [0.0054520305809926985]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.27328671624020834, 'test_loss': 0.4714091420173645, 'bleu': 3.4588, 'gen_len': 8.3151}




 32%|███▏      | 282/877 [1:05:52<2:16:05, 13.72s/it]

For epoch 362: {Learning rate: [0.005442510425378529]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.2744720334686884, 'test_loss': 0.45992668718099594, 'bleu': 3.6015, 'gen_len': 8.2877}




 32%|███▏      | 283/877 [1:06:05<2:15:30, 13.69s/it]

For epoch 363: {Learning rate: [0.00543299026976436]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.2744278947754604, 'test_loss': 0.4736099734902382, 'bleu': 2.6749, 'gen_len': 8.3973}




 32%|███▏      | 284/877 [1:06:19<2:15:26, 13.70s/it]

For epoch 364: {Learning rate: [0.00542347011415019]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.2731312128101907, 'test_loss': 0.48279399871826173, 'bleu': 2.7557, 'gen_len': 7.089}




 32%|███▏      | 285/877 [1:06:33<2:14:52, 13.67s/it]

For epoch 365: {Learning rate: [0.005413949958536021]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.2710919376553559, 'test_loss': 0.4730250328779221, 'bleu': 2.7363, 'gen_len': 7.3082}




 33%|███▎      | 286/877 [1:06:49<2:22:01, 14.42s/it]

For epoch 366: {Learning rate: [0.005404429802921851]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2704405820951229, 'test_loss': 0.475186724960804, 'bleu': 4.2532, 'gen_len': 6.8699}




 33%|███▎      | 287/877 [1:07:02<2:19:20, 14.17s/it]

For epoch 367: {Learning rate: [0.005394909647307681]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2691497075848463, 'test_loss': 0.4712817117571831, 'bleu': 3.5238, 'gen_len': 7.0342}




 33%|███▎      | 288/877 [1:07:16<2:17:08, 13.97s/it]

For epoch 368: {Learning rate: [0.005385389491693511]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.2711922766231909, 'test_loss': 0.4672241970896721, 'bleu': 3.3795, 'gen_len': 7.3014}




 33%|███▎      | 289/877 [1:07:30<2:16:04, 13.88s/it]

For epoch 369: {Learning rate: [0.0053758693360793424]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.2707088226225318, 'test_loss': 0.4706027999520302, 'bleu': 5.3808, 'gen_len': 7.5479}




 33%|███▎      | 290/877 [1:07:43<2:15:08, 13.81s/it]

For epoch 370: {Learning rate: [0.005366349180465173]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.2682190170375312, 'test_loss': 0.4699255540966988, 'bleu': 1.876, 'gen_len': 7.9452}




 33%|███▎      | 291/877 [1:07:57<2:15:11, 13.84s/it]

For epoch 371: {Learning rate: [0.005356829024851004]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.2685692288526675, 'test_loss': 0.45831720605492593, 'bleu': 3.2721, 'gen_len': 7.9178}




 33%|███▎      | 292/877 [1:08:11<2:14:03, 13.75s/it]

For epoch 372: {Learning rate: [0.005347308869236834]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.268227764382595, 'test_loss': 0.45907353311777116, 'bleu': 4.4655, 'gen_len': 7.589}




 33%|███▎      | 293/877 [1:08:24<2:13:49, 13.75s/it]

For epoch 373: {Learning rate: [0.005337788713622665]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.2701174551393928, 'test_loss': 0.4687724396586418, 'bleu': 2.7283, 'gen_len': 9.4795}




 34%|███▎      | 294/877 [1:08:38<2:13:03, 13.69s/it]

For epoch 374: {Learning rate: [0.005328268558008494]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.2689400340725736, 'test_loss': 0.4654043272137642, 'bleu': 2.323, 'gen_len': 8.7123}




 34%|███▎      | 295/877 [1:08:52<2:14:36, 13.88s/it]

For epoch 375: {Learning rate: [0.005318748402394325]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.2671382743410948, 'test_loss': 0.46887498795986177, 'bleu': 4.8143, 'gen_len': 6.9452}




 34%|███▍      | 296/877 [1:09:06<2:13:42, 13.81s/it]

For epoch 376: {Learning rate: [0.005309228246780155]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.2667975785528741, 'test_loss': 0.4541195146739483, 'bleu': 2.3511, 'gen_len': 9.4658}




 34%|███▍      | 297/877 [1:09:20<2:13:00, 13.76s/it]

For epoch 377: {Learning rate: [0.005299708091165986]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.27025790294496027, 'test_loss': 0.4652064099907875, 'bleu': 2.8163, 'gen_len': 8.0}




 34%|███▍      | 298/877 [1:09:34<2:14:35, 13.95s/it]

For epoch 378: {Learning rate: [0.005290187935551817]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.26545890693257496, 'test_loss': 0.465106788277626, 'bleu': 3.3184, 'gen_len': 8.6986}




 34%|███▍      | 299/877 [1:09:48<2:14:46, 13.99s/it]

For epoch 379: {Learning rate: [0.005280667779937648]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.26645007038988716, 'test_loss': 0.4589654251933098, 'bleu': 3.8482, 'gen_len': 7.5822}




 34%|███▍      | 300/877 [1:10:02<2:14:07, 13.95s/it]

For epoch 380: {Learning rate: [0.005271147624323477]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.26860150440437036, 'test_loss': 0.452333627641201, 'bleu': 3.204, 'gen_len': 7.0616}




 34%|███▍      | 301/877 [1:10:15<2:12:42, 13.82s/it]

For epoch 381: {Learning rate: [0.005261627468709308]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.26804953341077015, 'test_loss': 0.4670350655913353, 'bleu': 2.9843, 'gen_len': 7.9795}




 34%|███▍      | 302/877 [1:10:29<2:12:16, 13.80s/it]

For epoch 382: {Learning rate: [0.005252107313095138]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.26772381256266337, 'test_loss': 0.462703937292099, 'bleu': 4.2359, 'gen_len': 8.2123}




 35%|███▍      | 303/877 [1:10:43<2:11:21, 13.73s/it]

For epoch 383: {Learning rate: [0.005242587157480969]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.26831947412432694, 'test_loss': 0.4688781797885895, 'bleu': 5.1555, 'gen_len': 7.3219}




 35%|███▍      | 304/877 [1:10:57<2:11:18, 13.75s/it]

For epoch 384: {Learning rate: [0.005233067001866799]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.26213920516211814, 'test_loss': 0.46225456297397616, 'bleu': 3.577, 'gen_len': 8.1438}




 35%|███▍      | 305/877 [1:11:10<2:10:47, 13.72s/it]

For epoch 385: {Learning rate: [0.00522354684625263]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.26517887122747374, 'test_loss': 0.46236988455057143, 'bleu': 6.1132, 'gen_len': 7.2397}




 35%|███▍      | 306/877 [1:11:24<2:10:01, 13.66s/it]

For epoch 386: {Learning rate: [0.00521402669063846]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.2678288839212278, 'test_loss': 0.4552995428442955, 'bleu': 5.1949, 'gen_len': 7.6918}




 35%|███▌      | 307/877 [1:11:37<2:09:56, 13.68s/it]

For epoch 387: {Learning rate: [0.005204506535024291]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2679371906489861, 'test_loss': 0.4661698296666145, 'bleu': 5.7464, 'gen_len': 7.7877}




 35%|███▌      | 308/877 [1:11:51<2:09:39, 13.67s/it]

For epoch 388: {Learning rate: [0.005194986379410121]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.2642797769569769, 'test_loss': 0.46192832440137865, 'bleu': 3.3854, 'gen_len': 7.2671}




 35%|███▌      | 309/877 [1:12:05<2:10:12, 13.75s/it]

For epoch 389: {Learning rate: [0.005185466223795952]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.26536451280117035, 'test_loss': 0.45864229574799537, 'bleu': 4.776, 'gen_len': 6.6301}




 35%|███▌      | 310/877 [1:12:19<2:09:25, 13.70s/it]

For epoch 390: {Learning rate: [0.005175946068181782]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.263459555623008, 'test_loss': 0.46343906670808793, 'bleu': 4.6216, 'gen_len': 8.4178}




 35%|███▌      | 311/877 [1:12:32<2:09:14, 13.70s/it]

For epoch 391: {Learning rate: [0.005166425912567613]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.2656314056820986, 'test_loss': 0.4511997178196907, 'bleu': 5.5294, 'gen_len': 7.8219}




 36%|███▌      | 312/877 [1:12:46<2:08:41, 13.67s/it]

For epoch 392: {Learning rate: [0.0051569057569534425]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.26353186768729514, 'test_loss': 0.4547980137169361, 'bleu': 5.6932, 'gen_len': 7.089}




 36%|███▌      | 313/877 [1:13:00<2:08:49, 13.70s/it]

For epoch 393: {Learning rate: [0.0051473856013392735]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.2652596100801375, 'test_loss': 0.4657291129231453, 'bleu': 3.0391, 'gen_len': 6.9384}




 36%|███▌      | 314/877 [1:13:13<2:08:17, 13.67s/it]

For epoch 394: {Learning rate: [0.005137865445725104]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.2648678647308815, 'test_loss': 0.46387965232133865, 'bleu': 4.9265, 'gen_len': 7.6918}




 36%|███▌      | 315/877 [1:13:27<2:08:42, 13.74s/it]

For epoch 395: {Learning rate: [0.005128345290110935]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.26534071164887124, 'test_loss': 0.46012042760849, 'bleu': 2.4175, 'gen_len': 7.3973}




 36%|███▌      | 316/877 [1:13:41<2:08:16, 13.72s/it]

For epoch 396: {Learning rate: [0.005118825134496765]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.26252025329485174, 'test_loss': 0.4638278320431709, 'bleu': 3.5671, 'gen_len': 6.8493}




 36%|███▌      | 317/877 [1:13:55<2:09:22, 13.86s/it]

For epoch 397: {Learning rate: [0.005109304978882596]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.26660258123060554, 'test_loss': 0.4589418537914753, 'bleu': 4.9358, 'gen_len': 6.9726}




 36%|███▋      | 318/877 [1:14:09<2:08:07, 13.75s/it]

For epoch 398: {Learning rate: [0.005099784823268426]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.26209179003064226, 'test_loss': 0.46728030443191526, 'bleu': 5.4954, 'gen_len': 7.2123}




 36%|███▋      | 319/877 [1:14:23<2:08:31, 13.82s/it]

For epoch 399: {Learning rate: [0.005090264667654256]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.2610670906014559, 'test_loss': 0.4554185099899769, 'bleu': 4.1104, 'gen_len': 7.9315}




 36%|███▋      | 320/877 [1:14:36<2:07:54, 13.78s/it]

For epoch 400: {Learning rate: [0.005080744512040086]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.26041755661731814, 'test_loss': 0.4549801304936409, 'bleu': 7.1066, 'gen_len': 7.1781}




 37%|███▋      | 321/877 [1:14:50<2:07:33, 13.77s/it]

For epoch 401: {Learning rate: [0.0050712243564259175]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.26363149394349356, 'test_loss': 0.4449952691793442, 'bleu': 4.8821, 'gen_len': 8.4452}




 37%|███▋      | 322/877 [1:15:04<2:06:47, 13.71s/it]

For epoch 402: {Learning rate: [0.005061704200811748]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.26013729775824196, 'test_loss': 0.4692604601383209, 'bleu': 4.5051, 'gen_len': 7.0205}




 37%|███▋      | 323/877 [1:15:17<2:06:41, 13.72s/it]

For epoch 403: {Learning rate: [0.005052184045197579]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.26194044111705406, 'test_loss': 0.4584700509905815, 'bleu': 3.0724, 'gen_len': 7.3973}




 37%|███▋      | 324/877 [1:15:31<2:06:20, 13.71s/it]

For epoch 404: {Learning rate: [0.005042663889583409]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.26136943915995153, 'test_loss': 0.44249061718583105, 'bleu': 5.2125, 'gen_len': 7.8082}




 37%|███▋      | 325/877 [1:15:45<2:06:15, 13.72s/it]

For epoch 405: {Learning rate: [0.00503314373396924]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.25852212855001777, 'test_loss': 0.4495159566402435, 'bleu': 4.7578, 'gen_len': 7.5068}




 37%|███▋      | 326/877 [1:16:00<2:11:09, 14.28s/it]

For epoch 406: {Learning rate: [0.005023623578355069]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.2566266481469317, 'test_loss': 0.4601869985461235, 'bleu': 4.694, 'gen_len': 7.6507}




 37%|███▋      | 327/877 [1:16:14<2:09:13, 14.10s/it]

For epoch 407: {Learning rate: [0.0050141034227409]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.25818376134081583, 'test_loss': 0.4594258010387421, 'bleu': 3.1303, 'gen_len': 9.0822}




 37%|███▋      | 328/877 [1:16:28<2:07:50, 13.97s/it]

For epoch 408: {Learning rate: [0.00500458326712673]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.2596414986906982, 'test_loss': 0.46121277660131454, 'bleu': 4.4611, 'gen_len': 7.6644}




 38%|███▊      | 329/877 [1:16:41<2:06:43, 13.87s/it]

For epoch 409: {Learning rate: [0.0049950631115125614]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.25947793482280357, 'test_loss': 0.4562712967395782, 'bleu': 4.3311, 'gen_len': 7.7945}




 38%|███▊      | 330/877 [1:16:55<2:06:38, 13.89s/it]

For epoch 410: {Learning rate: [0.004985542955898392]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.2592685037269825, 'test_loss': 0.46328519433736803, 'bleu': 4.7685, 'gen_len': 7.2397}




 38%|███▊      | 331/877 [1:17:09<2:05:56, 13.84s/it]

For epoch 411: {Learning rate: [0.004976022800284223]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.25794441307463295, 'test_loss': 0.4727852612733841, 'bleu': 3.4098, 'gen_len': 7.0548}




 38%|███▊      | 332/877 [1:17:23<2:06:46, 13.96s/it]

For epoch 412: {Learning rate: [0.004966502644670052]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.99batches/s]



Metrics: {'train_loss': 0.2579066749753022, 'test_loss': 0.45371262952685354, 'bleu': 2.3808, 'gen_len': 8.1712}




 38%|███▊      | 333/877 [1:17:39<2:11:57, 14.55s/it]

For epoch 413: {Learning rate: [0.004956982489055883]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.95batches/s]



Metrics: {'train_loss': 0.2600836201411922, 'test_loss': 0.4485882095992565, 'bleu': 3.9871, 'gen_len': 7.7329}




 38%|███▊      | 334/877 [1:17:55<2:16:37, 15.10s/it]

For epoch 414: {Learning rate: [0.004947462333441713]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.93batches/s]



Metrics: {'train_loss': 0.2584289461374283, 'test_loss': 0.4543960288167, 'bleu': 3.5357, 'gen_len': 8.4863}




 38%|███▊      | 335/877 [1:18:12<2:20:06, 15.51s/it]

For epoch 415: {Learning rate: [0.004937942177827544]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.98batches/s]



Metrics: {'train_loss': 0.2576163999918031, 'test_loss': 0.45667546093463895, 'bleu': 4.9096, 'gen_len': 7.5753}




 38%|███▊      | 336/877 [1:18:28<2:21:00, 15.64s/it]

For epoch 416: {Learning rate: [0.004928422022213374]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.30batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.80batches/s]



Metrics: {'train_loss': 0.2548985266830863, 'test_loss': 0.4416883707046509, 'bleu': 5.944, 'gen_len': 8.363}




 38%|███▊      | 337/877 [1:18:45<2:24:41, 16.08s/it]

For epoch 417: {Learning rate: [0.004918901866599205]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.98batches/s]



Metrics: {'train_loss': 0.25782364207070047, 'test_loss': 0.445974350720644, 'bleu': 6.8762, 'gen_len': 7.2055}




 39%|███▊      | 338/877 [1:19:01<2:24:47, 16.12s/it]

For epoch 418: {Learning rate: [0.004909381710985035]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.44batches/s]



Metrics: {'train_loss': 0.2545498591370699, 'test_loss': 0.4498396493494511, 'bleu': 6.5242, 'gen_len': 6.9932}




 39%|███▊      | 339/877 [1:19:16<2:20:56, 15.72s/it]

For epoch 419: {Learning rate: [0.004899861555370866]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:06<00:00,  1.61batches/s]



Metrics: {'train_loss': 0.25518976660763343, 'test_loss': 0.44552419409155847, 'bleu': 6.7851, 'gen_len': 7.9452}




 39%|███▉      | 340/877 [1:19:33<2:25:01, 16.20s/it]

For epoch 420: {Learning rate: [0.004890341399756696]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.11batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.94batches/s]



Metrics: {'train_loss': 0.2515757912542762, 'test_loss': 0.4570526823401451, 'bleu': 5.3069, 'gen_len': 6.4178}




 39%|███▉      | 341/877 [1:19:51<2:28:10, 16.59s/it]

For epoch 421: {Learning rate: [0.004880821244142527]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.22batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.84batches/s]



Metrics: {'train_loss': 0.25532955621800774, 'test_loss': 0.4550593100488186, 'bleu': 5.0623, 'gen_len': 7.6301}




 39%|███▉      | 342/877 [1:20:08<2:30:24, 16.87s/it]

For epoch 422: {Learning rate: [0.004871301088528357]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.2557639966650707, 'test_loss': 0.4681544691324234, 'bleu': 8.2091, 'gen_len': 6.911}




 39%|███▉      | 343/877 [1:20:27<2:34:14, 17.33s/it]

For epoch 423: {Learning rate: [0.004861780932914188]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.25459336871054117, 'test_loss': 0.4807933434844017, 'bleu': 5.879, 'gen_len': 7.4726}




 39%|███▉      | 344/877 [1:20:42<2:27:14, 16.57s/it]

For epoch 424: {Learning rate: [0.0048522607773000175]}


Train batch number 40: 100%|██████████| 41/41 [00:11<00:00,  3.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:09<00:00,  1.01batches/s]



Metrics: {'train_loss': 0.25501441374057676, 'test_loss': 0.4810309514403343, 'bleu': 4.526, 'gen_len': 7.5274}




 39%|███▉      | 345/877 [1:21:05<2:45:50, 18.70s/it]

For epoch 425: {Learning rate: [0.0048427406216858485]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.14batches/s]



Metrics: {'train_loss': 0.25103786224272195, 'test_loss': 0.4656991109251976, 'bleu': 7.0811, 'gen_len': 7.9315}




 39%|███▉      | 346/877 [1:21:21<2:37:29, 17.80s/it]

For epoch 426: {Learning rate: [0.004833220466071679]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.99batches/s]



Metrics: {'train_loss': 0.25310331146891524, 'test_loss': 0.4545183405280113, 'bleu': 7.2436, 'gen_len': 7.3767}




 40%|███▉      | 347/877 [1:21:37<2:32:56, 17.31s/it]

For epoch 427: {Learning rate: [0.00482370031045751]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.11batches/s]



Metrics: {'train_loss': 0.2517176692078753, 'test_loss': 0.48551021963357927, 'bleu': 6.3157, 'gen_len': 7.3699}




 40%|███▉      | 348/877 [1:21:53<2:29:48, 16.99s/it]

For epoch 428: {Learning rate: [0.00481418015484334]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.10batches/s]



Metrics: {'train_loss': 0.25442162083416453, 'test_loss': 0.4708062678575516, 'bleu': 6.5782, 'gen_len': 6.8836}




 40%|███▉      | 349/877 [1:22:09<2:26:06, 16.60s/it]

For epoch 429: {Learning rate: [0.004804659999229171]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.00batches/s]



Metrics: {'train_loss': 0.2510694283537748, 'test_loss': 0.4658889055252075, 'bleu': 6.6092, 'gen_len': 6.4041}




 40%|███▉      | 350/877 [1:22:25<2:24:04, 16.40s/it]

For epoch 430: {Learning rate: [0.004795139843615001]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.05batches/s]



Metrics: {'train_loss': 0.2506575355442559, 'test_loss': 0.4658467426896095, 'bleu': 5.2177, 'gen_len': 7.6233}




 40%|████      | 351/877 [1:22:41<2:22:13, 16.22s/it]

For epoch 431: {Learning rate: [0.004785619688000831]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.72batches/s]



Metrics: {'train_loss': 0.25426201013530175, 'test_loss': 0.4699069976806641, 'bleu': 5.5368, 'gen_len': 6.7945}




 40%|████      | 352/877 [1:22:58<2:24:54, 16.56s/it]

For epoch 432: {Learning rate: [0.0047760995323866615]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.30batches/s]



Metrics: {'train_loss': 0.24974585778829528, 'test_loss': 0.47691062539815904, 'bleu': 7.6766, 'gen_len': 6.9932}




 40%|████      | 353/877 [1:23:13<2:21:19, 16.18s/it]

For epoch 433: {Learning rate: [0.0047665793767724925]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.24868555330648656, 'test_loss': 0.4586398094892502, 'bleu': 4.7136, 'gen_len': 7.4247}




 40%|████      | 354/877 [1:23:28<2:18:03, 15.84s/it]

For epoch 434: {Learning rate: [0.004757059221158323]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.24954049579980897, 'test_loss': 0.45916709676384926, 'bleu': 6.2769, 'gen_len': 7.0959}




 40%|████      | 355/877 [1:23:44<2:16:41, 15.71s/it]

For epoch 435: {Learning rate: [0.004747539065544154]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.2507014965138784, 'test_loss': 0.48499267995357515, 'bleu': 5.2932, 'gen_len': 6.9726}




 41%|████      | 356/877 [1:23:59<2:14:48, 15.53s/it]

For epoch 436: {Learning rate: [0.004738018909929984]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.39batches/s]



Metrics: {'train_loss': 0.2511617123353772, 'test_loss': 0.4697052076458931, 'bleu': 5.2078, 'gen_len': 7.1575}




 41%|████      | 357/877 [1:24:14<2:13:12, 15.37s/it]

For epoch 437: {Learning rate: [0.004728498754315815]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.34batches/s]



Metrics: {'train_loss': 0.25111322213963766, 'test_loss': 0.4693104162812233, 'bleu': 6.6935, 'gen_len': 6.8288}




 41%|████      | 358/877 [1:24:29<2:11:36, 15.22s/it]

For epoch 438: {Learning rate: [0.004718978598701644]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.86batches/s]



Metrics: {'train_loss': 0.24758113811655744, 'test_loss': 0.4539840042591095, 'bleu': 4.0216, 'gen_len': 7.0205}




 41%|████      | 359/877 [1:24:45<2:14:12, 15.54s/it]

For epoch 439: {Learning rate: [0.004709458443087475]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.50batches/s]



Metrics: {'train_loss': 0.24996737626994528, 'test_loss': 0.46018054634332656, 'bleu': 4.7862, 'gen_len': 8.1849}




 41%|████      | 360/877 [1:25:00<2:12:00, 15.32s/it]

For epoch 440: {Learning rate: [0.0046999382874733054]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.24766341905768324, 'test_loss': 0.4651894599199295, 'bleu': 6.0632, 'gen_len': 7.3425}




 41%|████      | 361/877 [1:25:14<2:08:29, 14.94s/it]

For epoch 441: {Learning rate: [0.0046904181318591365]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.2490758725055834, 'test_loss': 0.45162944346666334, 'bleu': 5.4732, 'gen_len': 7.5479}




 41%|████▏     | 362/877 [1:25:28<2:05:09, 14.58s/it]

For epoch 442: {Learning rate: [0.004680897976244967]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.24748402216085574, 'test_loss': 0.4876313090324402, 'bleu': 5.9358, 'gen_len': 6.9041}




 41%|████▏     | 363/877 [1:25:42<2:03:28, 14.41s/it]

For epoch 443: {Learning rate: [0.004671377820630798]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.24662928792034708, 'test_loss': 0.4763077154755592, 'bleu': 4.0342, 'gen_len': 8.1712}




 42%|████▏     | 364/877 [1:25:57<2:05:04, 14.63s/it]

For epoch 444: {Learning rate: [0.004661857665016627]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.23batches/s]



Metrics: {'train_loss': 0.24529720334018149, 'test_loss': 0.46997190862894056, 'bleu': 5.9573, 'gen_len': 7.6438}




 42%|████▏     | 365/877 [1:26:12<2:07:10, 14.90s/it]

For epoch 445: {Learning rate: [0.004652337509402458]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.46batches/s]



Metrics: {'train_loss': 0.2453406729349276, 'test_loss': 0.46806939467787745, 'bleu': 5.6564, 'gen_len': 7.0548}




 42%|████▏     | 366/877 [1:26:27<2:05:30, 14.74s/it]

For epoch 446: {Learning rate: [0.004642817353788288]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.26batches/s]



Metrics: {'train_loss': 0.24549147777441072, 'test_loss': 0.46759810000658036, 'bleu': 5.313, 'gen_len': 7.3425}




 42%|████▏     | 367/877 [1:26:42<2:06:22, 14.87s/it]

For epoch 447: {Learning rate: [0.004633297198174119]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.2451790276823974, 'test_loss': 0.4745862051844597, 'bleu': 5.935, 'gen_len': 7.3219}




 42%|████▏     | 368/877 [1:26:56<2:04:16, 14.65s/it]

For epoch 448: {Learning rate: [0.004623777042559949]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.24216122852592933, 'test_loss': 0.45254242047667503, 'bleu': 7.7703, 'gen_len': 6.6575}




 42%|████▏     | 369/877 [1:27:10<2:02:12, 14.43s/it]

For epoch 449: {Learning rate: [0.0046142568869457804]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.24627649020857928, 'test_loss': 0.4649318531155586, 'bleu': 6.617, 'gen_len': 7.5068}




 42%|████▏     | 370/877 [1:27:24<2:00:32, 14.26s/it]

For epoch 450: {Learning rate: [0.00460473673133161]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.98batches/s]



Metrics: {'train_loss': 0.24142467902927864, 'test_loss': 0.4861150234937668, 'bleu': 4.9886, 'gen_len': 7.6096}




 42%|████▏     | 371/877 [1:27:40<2:04:48, 14.80s/it]

For epoch 451: {Learning rate: [0.004595216575717441]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.05batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.24517846870713117, 'test_loss': 0.47362402081489563, 'bleu': 5.9565, 'gen_len': 7.4041}




 42%|████▏     | 372/877 [1:27:56<2:06:33, 15.04s/it]

For epoch 452: {Learning rate: [0.004585696420103271]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.24264851575944482, 'test_loss': 0.4706452816724777, 'bleu': 6.8286, 'gen_len': 7.2329}




 43%|████▎     | 373/877 [1:28:10<2:04:19, 14.80s/it]

For epoch 453: {Learning rate: [0.004576176264489102]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.2433555097114749, 'test_loss': 0.4715747743844986, 'bleu': 5.8444, 'gen_len': 7.637}




 43%|████▎     | 374/877 [1:28:24<2:01:37, 14.51s/it]

For epoch 454: {Learning rate: [0.004566656108874932]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.24046701338233017, 'test_loss': 0.4693236112594604, 'bleu': 7.0149, 'gen_len': 7.2671}




 43%|████▎     | 375/877 [1:28:38<2:00:26, 14.39s/it]

For epoch 455: {Learning rate: [0.004557135953260763]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.24665867846186568, 'test_loss': 0.46985592097043993, 'bleu': 6.5494, 'gen_len': 7.4589}




 43%|████▎     | 376/877 [1:28:52<1:59:52, 14.36s/it]

For epoch 456: {Learning rate: [0.0045476157976465925]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.24600453311350287, 'test_loss': 0.4639644280076027, 'bleu': 6.9689, 'gen_len': 7.0685}




 43%|████▎     | 377/877 [1:29:06<1:59:48, 14.38s/it]

For epoch 457: {Learning rate: [0.0045380956420324236]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.24382246558259174, 'test_loss': 0.48078139424324035, 'bleu': 6.9154, 'gen_len': 7.4315}




 43%|████▎     | 378/877 [1:29:21<1:58:58, 14.31s/it]

For epoch 458: {Learning rate: [0.004528575486418254]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.24498391769281247, 'test_loss': 0.47129958122968674, 'bleu': 7.0622, 'gen_len': 7.089}




 43%|████▎     | 379/877 [1:29:35<1:59:31, 14.40s/it]

For epoch 459: {Learning rate: [0.004519055330804085]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.64batches/s]



Metrics: {'train_loss': 0.24283444227241888, 'test_loss': 0.47008284032344816, 'bleu': 6.5662, 'gen_len': 7.2603}




 43%|████▎     | 380/877 [1:29:49<1:59:02, 14.37s/it]

For epoch 460: {Learning rate: [0.004509535175189915]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.2443536232884337, 'test_loss': 0.4570194773375988, 'bleu': 5.4454, 'gen_len': 7.0685}




 43%|████▎     | 381/877 [1:30:04<1:57:56, 14.27s/it]

For epoch 461: {Learning rate: [0.004500015019575746]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.24320759024561905, 'test_loss': 0.46581106558442115, 'bleu': 6.6332, 'gen_len': 7.1781}




 44%|████▎     | 382/877 [1:30:17<1:56:54, 14.17s/it]

For epoch 462: {Learning rate: [0.004490494863961576]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.2421756732027705, 'test_loss': 0.4697619989514351, 'bleu': 2.9374, 'gen_len': 6.8493}




 44%|████▎     | 383/877 [1:30:32<1:57:02, 14.22s/it]

For epoch 463: {Learning rate: [0.004480974708347406]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.2406233352858846, 'test_loss': 0.4715845361351967, 'bleu': 6.4908, 'gen_len': 6.5959}




 44%|████▍     | 384/877 [1:30:46<1:56:13, 14.14s/it]

For epoch 464: {Learning rate: [0.0044714545527332365]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.24165675262125527, 'test_loss': 0.45766731053590776, 'bleu': 5.6074, 'gen_len': 7.7329}




 44%|████▍     | 385/877 [1:31:00<1:55:23, 14.07s/it]

For epoch 465: {Learning rate: [0.0044619343971190675]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.24244326970926144, 'test_loss': 0.4665111228823662, 'bleu': 5.8297, 'gen_len': 6.8425}




 44%|████▍     | 386/877 [1:31:14<1:54:41, 14.02s/it]

For epoch 466: {Learning rate: [0.004452414241504898]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.24120922859122113, 'test_loss': 0.4505649633705616, 'bleu': 5.7519, 'gen_len': 7.3493}




 44%|████▍     | 387/877 [1:31:27<1:53:46, 13.93s/it]

For epoch 467: {Learning rate: [0.004442894085890729]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.24270594846911547, 'test_loss': 0.4638578586280346, 'bleu': 6.2852, 'gen_len': 7.2808}




 44%|████▍     | 388/877 [1:31:42<1:56:33, 14.30s/it]

For epoch 468: {Learning rate: [0.004433373930276559]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.23984753794786406, 'test_loss': 0.47268673181533816, 'bleu': 7.9906, 'gen_len': 7.4658}




 44%|████▍     | 389/877 [1:31:56<1:55:17, 14.18s/it]

For epoch 469: {Learning rate: [0.004423853774662389]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.2403966115742195, 'test_loss': 0.4620583295822144, 'bleu': 5.7652, 'gen_len': 7.8493}




 44%|████▍     | 390/877 [1:32:11<1:55:58, 14.29s/it]

For epoch 470: {Learning rate: [0.004414333619048219]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.2351770520937152, 'test_loss': 0.46129451170563696, 'bleu': 8.758, 'gen_len': 7.3836}




 45%|████▍     | 391/877 [1:32:25<1:54:46, 14.17s/it]

For epoch 471: {Learning rate: [0.00440481346343405]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.23751547024017428, 'test_loss': 0.45866756290197375, 'bleu': 7.023, 'gen_len': 6.7329}




 45%|████▍     | 392/877 [1:32:38<1:53:05, 13.99s/it]

For epoch 472: {Learning rate: [0.0043952933078198805]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.2380006698573508, 'test_loss': 0.4631552830338478, 'bleu': 8.5362, 'gen_len': 7.8425}




 45%|████▍     | 393/877 [1:32:52<1:52:23, 13.93s/it]

For epoch 473: {Learning rate: [0.004385773152205711]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.23941304516501544, 'test_loss': 0.46849360764026643, 'bleu': 7.6978, 'gen_len': 7.4863}




 45%|████▍     | 394/877 [1:33:06<1:51:27, 13.85s/it]

For epoch 474: {Learning rate: [0.004376252996591542]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.2382453425628383, 'test_loss': 0.46268621608614924, 'bleu': 5.7707, 'gen_len': 7.0959}




 45%|████▌     | 395/877 [1:33:20<1:51:34, 13.89s/it]

For epoch 475: {Learning rate: [0.004366732840977372]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2389494834876642, 'test_loss': 0.4621312037110329, 'bleu': 6.6718, 'gen_len': 6.9315}




 45%|████▌     | 396/877 [1:33:33<1:50:45, 13.82s/it]

For epoch 476: {Learning rate: [0.004357212685363202]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.2377336232400522, 'test_loss': 0.457246758043766, 'bleu': 7.3547, 'gen_len': 7.6644}




 45%|████▌     | 397/877 [1:33:47<1:51:04, 13.88s/it]

For epoch 477: {Learning rate: [0.004347692529749033]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.23615160984237019, 'test_loss': 0.45451919212937353, 'bleu': 8.6374, 'gen_len': 7.2671}




 45%|████▌     | 398/877 [1:34:01<1:50:34, 13.85s/it]

For epoch 478: {Learning rate: [0.004338172374134863]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.23464548914897732, 'test_loss': 0.4507646843791008, 'bleu': 5.1782, 'gen_len': 7.911}




 45%|████▌     | 399/877 [1:34:15<1:50:17, 13.84s/it]

For epoch 479: {Learning rate: [0.004328652218520693]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.23799053234298054, 'test_loss': 0.46882989481091497, 'bleu': 7.4681, 'gen_len': 7.0685}




 46%|████▌     | 400/877 [1:34:29<1:49:46, 13.81s/it]

For epoch 480: {Learning rate: [0.0043191320629065244]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.23699693817917894, 'test_loss': 0.4635212063789368, 'bleu': 5.2899, 'gen_len': 7.637}




 46%|████▌     | 401/877 [1:34:43<1:49:24, 13.79s/it]

For epoch 481: {Learning rate: [0.004309611907292355]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.23746609651460882, 'test_loss': 0.46005340442061426, 'bleu': 7.2272, 'gen_len': 7.8767}




 46%|████▌     | 402/877 [1:34:56<1:49:25, 13.82s/it]

For epoch 482: {Learning rate: [0.004300091751678185]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.23540505512458523, 'test_loss': 0.4615600660443306, 'bleu': 7.0442, 'gen_len': 7.3973}




 46%|████▌     | 403/877 [1:35:10<1:49:33, 13.87s/it]

For epoch 483: {Learning rate: [0.004290571596064016]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.24001438479597975, 'test_loss': 0.45548035502433776, 'bleu': 5.6256, 'gen_len': 8.0479}




 46%|████▌     | 404/877 [1:35:24<1:49:20, 13.87s/it]

For epoch 484: {Learning rate: [0.004281051440449846]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.23763755109252, 'test_loss': 0.4618499994277954, 'bleu': 5.9873, 'gen_len': 7.5685}




 46%|████▌     | 405/877 [1:35:38<1:49:15, 13.89s/it]

For epoch 485: {Learning rate: [0.004271531284835676]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.23298575856336734, 'test_loss': 0.46107284128665926, 'bleu': 4.4047, 'gen_len': 7.4247}




 46%|████▋     | 406/877 [1:35:54<1:52:38, 14.35s/it]

For epoch 486: {Learning rate: [0.004262011129221507]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.23587957388017236, 'test_loss': 0.4585390828549862, 'bleu': 8.6928, 'gen_len': 7.7397}




 46%|████▋     | 407/877 [1:36:09<1:54:27, 14.61s/it]

For epoch 487: {Learning rate: [0.004252490973607337]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.05batches/s]



Metrics: {'train_loss': 0.23511033050897645, 'test_loss': 0.47067988067865374, 'bleu': 6.7191, 'gen_len': 7.0548}




 47%|████▋     | 408/877 [1:36:25<1:57:49, 15.07s/it]

For epoch 488: {Learning rate: [0.0042429708179931675]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.30batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.91batches/s]



Metrics: {'train_loss': 0.23297234024943375, 'test_loss': 0.4823199287056923, 'bleu': 7.4963, 'gen_len': 7.0479}




 47%|████▋     | 409/877 [1:36:42<2:01:30, 15.58s/it]

For epoch 489: {Learning rate: [0.004233450662378999]}


Train batch number 40: 100%|██████████| 41/41 [00:10<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.95batches/s]



Metrics: {'train_loss': 0.23388739838832762, 'test_loss': 0.4602425217628479, 'bleu': 7.6993, 'gen_len': 7.0822}




 47%|████▋     | 410/877 [1:36:59<2:04:50, 16.04s/it]

For epoch 490: {Learning rate: [0.004223930506764829]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.15batches/s]



Metrics: {'train_loss': 0.23433103721316267, 'test_loss': 0.4702172055840492, 'bleu': 7.2941, 'gen_len': 7.4178}




 47%|████▋     | 411/877 [1:37:15<2:04:20, 16.01s/it]

For epoch 491: {Learning rate: [0.004214410351150659]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.52batches/s]



Metrics: {'train_loss': 0.2347262730685676, 'test_loss': 0.46524187326431277, 'bleu': 7.6756, 'gen_len': 6.6712}




 47%|████▋     | 412/877 [1:37:30<2:01:18, 15.65s/it]

For epoch 492: {Learning rate: [0.00420489019553649]}


Train batch number 14:  34%|███▍      | 14/41 [00:03<00:06,  4.50batches/s]

### ---
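
The next cell resumes the interrupted run by passing `config['max_epoch'] - trainer.current_epoch` as the number of epochs. The sketch below works through that subtraction. It is illustrative only: `remaining_epochs` is a hypothetical helper that is not part of `wolof_translate`. The two values are assumptions read off the log that follows: `current_epoch = 492` is the first epoch printed after resuming, and `max_epoch = 957` is inferred as 492 plus the progress-bar total of 465. The clamp to zero is an added guard that the original call does not have.

```python
def remaining_epochs(max_epoch: int, current_epoch: int) -> int:
    """Return how many epochs are left when resuming at `current_epoch`.

    Hypothetical helper: clamps at zero so that a run that has already
    reached `max_epoch` does not request a negative number of epochs.
    """
    return max(max_epoch - current_epoch, 0)


# Assumed values inferred from the log, not read from the real config:
# the resumed run starts at epoch 492 and its progress bar counts 465 steps.
epochs_left = remaining_epochs(max_epoch=957, current_epoch=492)
print(epochs_left)  # 465
```

Under these assumptions the result matches the `465` total shown in the resumed progress bar.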

In [8]:
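# Resume training for the remaining epochs (max_epoch minus the epochs already run); the best model is selected by maximizing BLEU
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])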



For epoch 492: {Learning rate: [0.00420489019553649]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.40batches/s]



Metrics: {'train_loss': 0.23064579142303002, 'test_loss': 0.47452086210250854, 'bleu': 9.0274, 'gen_len': 7.1233}




  0%|          | 1/465 [00:15<1:57:45, 15.23s/it]

For epoch 493: {Learning rate: [0.00419537003992232]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.48batches/s]



Metrics: {'train_loss': 0.23169769164992543, 'test_loss': 0.4656444251537323, 'bleu': 7.5686, 'gen_len': 7.1164}




  0%|          | 2/465 [00:28<1:46:44, 13.83s/it]

For epoch 494: {Learning rate: [0.004185849884308151]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.30batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.28batches/s]



Metrics: {'train_loss': 0.22971682105122543, 'test_loss': 0.4775928735733032, 'bleu': 8.5122, 'gen_len': 6.6575}




  1%|          | 3/465 [00:41<1:46:26, 13.82s/it]

For epoch 495: {Learning rate: [0.004176329728693981]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.13batches/s]



Metrics: {'train_loss': 0.233059320144537, 'test_loss': 0.47714146226644516, 'bleu': 7.3515, 'gen_len': 6.6507}




  1%|          | 4/465 [00:57<1:50:33, 14.39s/it]

For epoch 496: {Learning rate: [0.0041668095730798115]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.34batches/s]



Metrics: {'train_loss': 0.2334437864582713, 'test_loss': 0.46878292560577395, 'bleu': 8.3085, 'gen_len': 7.4041}




  1%|          | 5/465 [01:10<1:48:35, 14.16s/it]

For epoch 497: {Learning rate: [0.0041572894174656426]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.23109213317312846, 'test_loss': 0.4634499177336693, 'bleu': 7.4597, 'gen_len': 7.2877}




  1%|▏         | 6/465 [01:24<1:46:55, 13.98s/it]

For epoch 498: {Learning rate: [0.004147769261851473]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.2298001534328228, 'test_loss': 0.4676465027034283, 'bleu': 6.6689, 'gen_len': 7.089}




  2%|▏         | 7/465 [01:36<1:42:11, 13.39s/it]

For epoch 499: {Learning rate: [0.004138249106237303]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.23032007457279577, 'test_loss': 0.46301301270723344, 'bleu': 5.1496, 'gen_len': 7.4041}




  2%|▏         | 8/465 [01:48<1:38:42, 12.96s/it]

For epoch 500: {Learning rate: [0.004128728950623134]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.22911849836023843, 'test_loss': 0.4676524966955185, 'bleu': 6.769, 'gen_len': 8.0548}




  2%|▏         | 9/465 [02:00<1:36:22, 12.68s/it]

For epoch 501: {Learning rate: [0.004119208795008964]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.23142792938686, 'test_loss': 0.4754118680953979, 'bleu': 6.8998, 'gen_len': 6.9795}




  2%|▏         | 10/465 [02:13<1:35:47, 12.63s/it]

For epoch 502: {Learning rate: [0.004109688639394794]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  5.09batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.23039091287589655, 'test_loss': 0.4698532447218895, 'bleu': 7.7598, 'gen_len': 7.8151}




  2%|▏         | 11/465 [02:26<1:36:24, 12.74s/it]

For epoch 503: {Learning rate: [0.004100168483780625]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  5.09batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.2287247806060605, 'test_loss': 0.46942700892686845, 'bleu': 9.4924, 'gen_len': 6.8562}




  3%|▎         | 12/465 [02:40<1:38:44, 13.08s/it]

For epoch 504: {Learning rate: [0.0040906483281664555]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.22765046394452815, 'test_loss': 0.46619201749563216, 'bleu': 6.6598, 'gen_len': 7.5753}




  3%|▎         | 13/465 [02:53<1:40:05, 13.29s/it]

For epoch 505: {Learning rate: [0.004081128172552286]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.23154740907797, 'test_loss': 0.46900095641613004, 'bleu': 7.3969, 'gen_len': 7.6712}




  3%|▎         | 14/465 [03:11<1:48:33, 14.44s/it]

For epoch 506: {Learning rate: [0.004071608016938117]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.97batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.22842423218052563, 'test_loss': 0.46651497632265093, 'bleu': 7.7382, 'gen_len': 6.7397}




  3%|▎         | 15/465 [03:25<1:47:56, 14.39s/it]

For epoch 507: {Learning rate: [0.004062087861323947]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.88batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.2297225641768153, 'test_loss': 0.473103404045105, 'bleu': 7.114, 'gen_len': 6.9521}




  3%|▎         | 16/465 [03:38<1:45:36, 14.11s/it]

For epoch 508: {Learning rate: [0.004052567705709777]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.23049018550209882, 'test_loss': 0.45615589767694475, 'bleu': 8.4509, 'gen_len': 7.1644}




  4%|▎         | 17/465 [03:52<1:44:04, 13.94s/it]

For epoch 509: {Learning rate: [0.004043047550095608]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.78batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.2293783684329289, 'test_loss': 0.4638485535979271, 'bleu': 10.5128, 'gen_len': 6.9932}




  4%|▍         | 18/465 [04:05<1:42:18, 13.73s/it]

For epoch 510: {Learning rate: [0.004033527394481438]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.22926085315099576, 'test_loss': 0.45821858048439024, 'bleu': 7.4778, 'gen_len': 7.9384}




  4%|▍         | 19/465 [04:18<1:40:25, 13.51s/it]

For epoch 511: {Learning rate: [0.004024007238867268]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.22658278011694188, 'test_loss': 0.4620973639190197, 'bleu': 7.0561, 'gen_len': 7.5137}




  4%|▍         | 20/465 [04:32<1:40:38, 13.57s/it]

For epoch 512: {Learning rate: [0.0040144870832530995]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.22638790200396283, 'test_loss': 0.4750009462237358, 'bleu': 7.9904, 'gen_len': 6.7055}




  5%|▍         | 21/465 [04:45<1:40:11, 13.54s/it]

For epoch 513: {Learning rate: [0.00400496692763893]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.2259700785322887, 'test_loss': 0.46691356152296065, 'bleu': 7.3534, 'gen_len': 7.4863}




  5%|▍         | 22/465 [04:59<1:39:33, 13.49s/it]

For epoch 514: {Learning rate: [0.00399544677202476]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.75batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.22451856550646992, 'test_loss': 0.4635754570364952, 'bleu': 9.2424, 'gen_len': 7.1096}




  5%|▍         | 23/465 [05:12<1:39:05, 13.45s/it]

For epoch 515: {Learning rate: [0.003985926616410591]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.2239118796296236, 'test_loss': 0.45904341265559195, 'bleu': 7.2474, 'gen_len': 7.3014}




  5%|▌         | 24/465 [05:26<1:39:20, 13.51s/it]

For epoch 516: {Learning rate: [0.003976406460796421]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.22532362799818922, 'test_loss': 0.4625184953212738, 'bleu': 6.9786, 'gen_len': 8.1644}




  5%|▌         | 25/465 [05:39<1:38:39, 13.45s/it]

For epoch 517: {Learning rate: [0.003966886305182251]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.2275356617642612, 'test_loss': 0.4636092960834503, 'bleu': 8.9976, 'gen_len': 7.5616}




  6%|▌         | 26/465 [05:53<1:38:48, 13.51s/it]

For epoch 518: {Learning rate: [0.003957366149568082]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.22547232850295743, 'test_loss': 0.45903973281383514, 'bleu': 5.1393, 'gen_len': 8.6438}




  6%|▌         | 27/465 [06:06<1:38:29, 13.49s/it]

For epoch 519: {Learning rate: [0.003947845993953912]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.226313244278838, 'test_loss': 0.46678290218114854, 'bleu': 6.5421, 'gen_len': 7.6644}




  6%|▌         | 28/465 [06:20<1:38:29, 13.52s/it]

For epoch 520: {Learning rate: [0.003938325838339743]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.22350153581398288, 'test_loss': 0.4636632725596428, 'bleu': 7.1563, 'gen_len': 7.2877}




  6%|▌         | 29/465 [06:33<1:38:21, 13.53s/it]

For epoch 521: {Learning rate: [0.003928805682725574]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.22263762282162178, 'test_loss': 0.47092433273792267, 'bleu': 7.7962, 'gen_len': 7.2466}




  6%|▋         | 30/465 [06:47<1:38:20, 13.57s/it]

For epoch 522: {Learning rate: [0.003919285527111404]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.22469875834337094, 'test_loss': 0.451921696215868, 'bleu': 6.3122, 'gen_len': 8.3082}




  7%|▋         | 31/465 [07:00<1:37:50, 13.53s/it]

For epoch 523: {Learning rate: [0.003909765371497234]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.22253871690936206, 'test_loss': 0.46974716186523435, 'bleu': 6.0709, 'gen_len': 7.9247}




  7%|▋         | 32/465 [07:14<1:37:19, 13.49s/it]

For epoch 524: {Learning rate: [0.003900245215883065]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.2238042957899047, 'test_loss': 0.463631372153759, 'bleu': 7.2375, 'gen_len': 6.9932}




  7%|▋         | 33/465 [07:27<1:36:16, 13.37s/it]

For epoch 525: {Learning rate: [0.003890725060268895]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.75batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.39batches/s]



Metrics: {'train_loss': 0.2220938256601008, 'test_loss': 0.46366423517465594, 'bleu': 9.9277, 'gen_len': 6.863}




  7%|▋         | 34/465 [07:41<1:38:05, 13.65s/it]

For epoch 526: {Learning rate: [0.0038812049046547258]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.02batches/s]



Metrics: {'train_loss': 0.21982014869771352, 'test_loss': 0.45022769644856453, 'bleu': 9.4442, 'gen_len': 7.3425}




  8%|▊         | 35/465 [07:57<1:41:52, 14.21s/it]

For epoch 527: {Learning rate: [0.0038716847490405564]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.22345448376202, 'test_loss': 0.4689529001712799, 'bleu': 8.7095, 'gen_len': 6.9521}




  8%|▊         | 36/465 [08:10<1:40:17, 14.03s/it]

For epoch 528: {Learning rate: [0.0038621645934263866]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.22111838292784808, 'test_loss': 0.4562089741230011, 'bleu': 7.163, 'gen_len': 8.0411}




  8%|▊         | 37/465 [08:24<1:39:30, 13.95s/it]

For epoch 529: {Learning rate: [0.003852644437812217]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.60batches/s]



Metrics: {'train_loss': 0.2208664638967049, 'test_loss': 0.45954821929335593, 'bleu': 5.9908, 'gen_len': 7.8082}




  8%|▊         | 38/465 [08:38<1:39:24, 13.97s/it]

For epoch 530: {Learning rate: [0.0038431242821980478]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.2187430186242592, 'test_loss': 0.4775440156459808, 'bleu': 6.44, 'gen_len': 7.1918}




  8%|▊         | 39/465 [08:52<1:39:07, 13.96s/it]

For epoch 531: {Learning rate: [0.003833604126583878]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.2212915177025446, 'test_loss': 0.45587187334895135, 'bleu': 8.3461, 'gen_len': 7.5411}




  9%|▊         | 40/465 [09:06<1:38:13, 13.87s/it]

For epoch 532: {Learning rate: [0.0038240839709697085]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.22115088672172734, 'test_loss': 0.46944013237953186, 'bleu': 8.7525, 'gen_len': 7.137}




  9%|▉         | 41/465 [09:19<1:37:28, 13.79s/it]

For epoch 533: {Learning rate: [0.003814563815355539]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.2199942335122969, 'test_loss': 0.45824885070323945, 'bleu': 7.0063, 'gen_len': 8.3904}




  9%|▉         | 42/465 [09:33<1:37:25, 13.82s/it]

For epoch 534: {Learning rate: [0.0038050436597413693]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.2184111574074117, 'test_loss': 0.46462518721818924, 'bleu': 6.2665, 'gen_len': 7.411}




  9%|▉         | 43/465 [09:47<1:37:38, 13.88s/it]

For epoch 535: {Learning rate: [0.0037955235041272]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.21567414155820522, 'test_loss': 0.45663699582219125, 'bleu': 8.4546, 'gen_len': 7.3699}




  9%|▉         | 44/465 [10:01<1:37:44, 13.93s/it]

For epoch 536: {Learning rate: [0.0037860033485130305]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.21994497354437664, 'test_loss': 0.45731194242835044, 'bleu': 6.0083, 'gen_len': 6.8151}




 10%|▉         | 45/465 [10:15<1:37:14, 13.89s/it]

For epoch 537: {Learning rate: [0.003776483192898861]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.21997561578343555, 'test_loss': 0.48348230868577957, 'bleu': 7.668, 'gen_len': 7.5616}




 10%|▉         | 46/465 [10:29<1:37:13, 13.92s/it]

For epoch 538: {Learning rate: [0.0037669630372846913]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.21784509173253688, 'test_loss': 0.47399566173553465, 'bleu': 8.0333, 'gen_len': 7.6438}




 10%|█         | 47/465 [10:42<1:36:13, 13.81s/it]

For epoch 539: {Learning rate: [0.003757442881670522]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.21641180864194545, 'test_loss': 0.46311434656381606, 'bleu': 9.1044, 'gen_len': 6.9178}




 10%|█         | 48/465 [10:57<1:36:30, 13.89s/it]

For epoch 540: {Learning rate: [0.0037479227260563525]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.21653459929838414, 'test_loss': 0.46927732303738595, 'bleu': 8.5866, 'gen_len': 6.4932}




 11%|█         | 49/465 [11:10<1:35:38, 13.79s/it]

For epoch 541: {Learning rate: [0.0037384025704421827]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.21937666360924885, 'test_loss': 0.4582121044397354, 'bleu': 9.0393, 'gen_len': 7.5137}




 11%|█         | 50/465 [11:24<1:35:21, 13.79s/it]

For epoch 542: {Learning rate: [0.0037288824148280133]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.2180332824951265, 'test_loss': 0.46691402047872543, 'bleu': 7.3114, 'gen_len': 7.6233}




 11%|█         | 51/465 [11:38<1:34:56, 13.76s/it]

For epoch 543: {Learning rate: [0.003719362259213844]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.2120621811325957, 'test_loss': 0.4674268767237663, 'bleu': 9.3674, 'gen_len': 7.1918}




 11%|█         | 52/465 [11:51<1:34:32, 13.73s/it]

For epoch 544: {Learning rate: [0.003709842103599674]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.21484179831132655, 'test_loss': 0.4681229844689369, 'bleu': 6.4414, 'gen_len': 7.4041}




 11%|█▏        | 53/465 [12:05<1:33:53, 13.67s/it]

For epoch 545: {Learning rate: [0.0037003219479855047]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.21490958950868466, 'test_loss': 0.4784576043486595, 'bleu': 8.6973, 'gen_len': 6.4795}




 12%|█▏        | 54/465 [12:18<1:33:01, 13.58s/it]

For epoch 546: {Learning rate: [0.0036908017923713353]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.21768261928383897, 'test_loss': 0.4704307518899441, 'bleu': 7.2476, 'gen_len': 6.9795}




 12%|█▏        | 55/465 [12:32<1:32:22, 13.52s/it]

For epoch 547: {Learning rate: [0.0036812816367571654]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.2169432018588229, 'test_loss': 0.4621335566043854, 'bleu': 9.0737, 'gen_len': 6.6096}




 12%|█▏        | 56/465 [12:45<1:32:07, 13.51s/it]

For epoch 548: {Learning rate: [0.003671761481142996]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.21528012759801818, 'test_loss': 0.45461109429597857, 'bleu': 6.6877, 'gen_len': 8.0205}




 12%|█▏        | 57/465 [12:59<1:32:01, 13.53s/it]

For epoch 549: {Learning rate: [0.0036622413255288267]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.21005220456821164, 'test_loss': 0.46421362161636354, 'bleu': 9.5752, 'gen_len': 7.4384}




 12%|█▏        | 58/465 [13:12<1:31:46, 13.53s/it]

For epoch 550: {Learning rate: [0.003652721169914657]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.2101623128827025, 'test_loss': 0.47180885300040243, 'bleu': 9.3873, 'gen_len': 6.9932}




 13%|█▎        | 59/465 [13:25<1:31:08, 13.47s/it]

For epoch 551: {Learning rate: [0.0036432010143004874]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.2152041494846344, 'test_loss': 0.46468003988265993, 'bleu': 9.0762, 'gen_len': 7.0411}




 13%|█▎        | 60/465 [13:39<1:31:08, 13.50s/it]

For epoch 552: {Learning rate: [0.003633680858686318]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.2146007109705995, 'test_loss': 0.45820154920220374, 'bleu': 7.6119, 'gen_len': 7.2123}




 13%|█▎        | 61/465 [13:52<1:30:35, 13.46s/it]

For epoch 553: {Learning rate: [0.0036241607030721486]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.21426970311781254, 'test_loss': 0.46033095344901087, 'bleu': 7.1885, 'gen_len': 6.7877}




 13%|█▎        | 62/465 [14:06<1:30:21, 13.45s/it]

For epoch 554: {Learning rate: [0.003614640547457979]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.21139652627270397, 'test_loss': 0.46263542547822, 'bleu': 8.9975, 'gen_len': 7.2877}




 14%|█▎        | 63/465 [14:20<1:30:53, 13.57s/it]

For epoch 555: {Learning rate: [0.0036051203918438094]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.2108633420089396, 'test_loss': 0.46843024119734766, 'bleu': 9.8801, 'gen_len': 6.9315}




 14%|█▍        | 64/465 [14:33<1:30:12, 13.50s/it]

For epoch 556: {Learning rate: [0.00359560023622964]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.21024985393372977, 'test_loss': 0.4681056633591652, 'bleu': 4.6418, 'gen_len': 6.6781}




 14%|█▍        | 65/465 [14:47<1:30:26, 13.57s/it]

For epoch 557: {Learning rate: [0.00358608008061547]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.2116241109807317, 'test_loss': 0.47409233152866365, 'bleu': 8.8716, 'gen_len': 6.774}




 14%|█▍        | 66/465 [15:00<1:29:49, 13.51s/it]

For epoch 558: {Learning rate: [0.003576559925001301]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.21002596180613448, 'test_loss': 0.45208646431565286, 'bleu': 7.3316, 'gen_len': 7.1438}




 14%|█▍        | 67/465 [15:14<1:29:25, 13.48s/it]

For epoch 559: {Learning rate: [0.0035670397693871314]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.20885230600833893, 'test_loss': 0.4607992485165596, 'bleu': 11.5194, 'gen_len': 7.3425}




 15%|█▍        | 68/465 [15:27<1:29:05, 13.46s/it]

For epoch 560: {Learning rate: [0.0035575196137729616]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.13batches/s]



Metrics: {'train_loss': 0.21082086715756393, 'test_loss': 0.4617252886295319, 'bleu': 8.5628, 'gen_len': 7.8493}




 15%|█▍        | 69/465 [15:42<1:32:45, 14.05s/it]

For epoch 561: {Learning rate: [0.003547999458158792]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.20850484254883556, 'test_loss': 0.4688154742121696, 'bleu': 8.81, 'gen_len': 7.4521}




 15%|█▌        | 70/465 [15:56<1:32:18, 14.02s/it]

For epoch 562: {Learning rate: [0.003538479302544623]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.20931565252746023, 'test_loss': 0.4750537037849426, 'bleu': 9.9753, 'gen_len': 7.3356}




 15%|█▌        | 71/465 [16:10<1:31:45, 13.97s/it]

For epoch 563: {Learning rate: [0.003528959146930453]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.2081675409543805, 'test_loss': 0.46512590497732165, 'bleu': 8.6273, 'gen_len': 8.0274}




 15%|█▌        | 72/465 [16:24<1:31:22, 13.95s/it]

For epoch 564: {Learning rate: [0.0035194389913162836]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.20429645515069728, 'test_loss': 0.46493272930383683, 'bleu': 9.0503, 'gen_len': 6.9726}




 16%|█▌        | 73/465 [16:38<1:30:51, 13.91s/it]

For epoch 565: {Learning rate: [0.003509918835702114]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.2094114873467422, 'test_loss': 0.4668074741959572, 'bleu': 7.2146, 'gen_len': 7.1644}




 16%|█▌        | 74/465 [16:52<1:30:22, 13.87s/it]

For epoch 566: {Learning rate: [0.0035003986800879443]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.20665219462499385, 'test_loss': 0.4695710584521294, 'bleu': 7.6631, 'gen_len': 7.2877}




 16%|█▌        | 75/465 [17:05<1:29:49, 13.82s/it]

For epoch 567: {Learning rate: [0.003490878524473775]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.20700462344216136, 'test_loss': 0.4739427149295807, 'bleu': 8.3131, 'gen_len': 8.0822}




 16%|█▋        | 76/465 [17:19<1:29:53, 13.87s/it]

For epoch 568: {Learning rate: [0.0034813583688596056]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.2066804239662682, 'test_loss': 0.4734477765858173, 'bleu': 8.0923, 'gen_len': 7.5959}




 17%|█▋        | 77/465 [17:33<1:29:47, 13.88s/it]

For epoch 569: {Learning rate: [0.003471838213245436]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.20573163250597512, 'test_loss': 0.47093606665730475, 'bleu': 11.3556, 'gen_len': 7.0959}




 17%|█▋        | 78/465 [17:47<1:29:36, 13.89s/it]

For epoch 570: {Learning rate: [0.0034623180576312663]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.202361111597317, 'test_loss': 0.45494943037629126, 'bleu': 8.7373, 'gen_len': 7.5}




 17%|█▋        | 79/465 [18:01<1:28:59, 13.83s/it]

For epoch 571: {Learning rate: [0.003452797902017097]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.20395190214238515, 'test_loss': 0.4721055723726749, 'bleu': 9.197, 'gen_len': 7.3973}




 17%|█▋        | 80/465 [18:15<1:28:36, 13.81s/it]

For epoch 572: {Learning rate: [0.0034432777464029275]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.20610036210315982, 'test_loss': 0.4720112383365631, 'bleu': 6.4091, 'gen_len': 7.4521}




 17%|█▋        | 81/465 [18:28<1:28:03, 13.76s/it]

For epoch 573: {Learning rate: [0.0034337575907887577]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.20469977543121431, 'test_loss': 0.46327529698610304, 'bleu': 8.4759, 'gen_len': 7.1712}




 18%|█▊        | 82/465 [18:43<1:28:45, 13.91s/it]

For epoch 574: {Learning rate: [0.0034242374351745883]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.203857512735739, 'test_loss': 0.45800446420907975, 'bleu': 8.4776, 'gen_len': 7.4178}




 18%|█▊        | 83/465 [18:56<1:28:05, 13.84s/it]

For epoch 575: {Learning rate: [0.003414717279560419]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.20479421717364613, 'test_loss': 0.46154830381274226, 'bleu': 11.1693, 'gen_len': 7.3288}




 18%|█▊        | 84/465 [19:10<1:27:22, 13.76s/it]

For epoch 576: {Learning rate: [0.003405197123946249]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.20426868120344674, 'test_loss': 0.4773610830307007, 'bleu': 9.816, 'gen_len': 6.8151}




 18%|█▊        | 85/465 [19:23<1:26:18, 13.63s/it]

For epoch 577: {Learning rate: [0.0033956769683320797]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.20407432862898198, 'test_loss': 0.46595966815948486, 'bleu': 7.6344, 'gen_len': 7.9863}




 18%|█▊        | 86/465 [19:37<1:25:51, 13.59s/it]

For epoch 578: {Learning rate: [0.0033861568127179103]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.2018644806088471, 'test_loss': 0.4678616732358932, 'bleu': 11.0538, 'gen_len': 6.774}




 19%|█▊        | 87/465 [19:50<1:25:45, 13.61s/it]

For epoch 579: {Learning rate: [0.0033766366571037405]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.2031763683005077, 'test_loss': 0.47225269824266436, 'bleu': 8.345, 'gen_len': 7.726}




 19%|█▉        | 88/465 [20:04<1:25:11, 13.56s/it]

For epoch 580: {Learning rate: [0.003367116501489571]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.19854798513214764, 'test_loss': 0.47146652191877364, 'bleu': 6.8861, 'gen_len': 8.7055}




 19%|█▉        | 89/465 [20:17<1:24:53, 13.55s/it]

For epoch 581: {Learning rate: [0.0033575963458754017]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.20224709808826447, 'test_loss': 0.4694652907550335, 'bleu': 9.1323, 'gen_len': 7.1301}




 19%|█▉        | 90/465 [20:31<1:24:26, 13.51s/it]

For epoch 582: {Learning rate: [0.003348076190261232]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.20143346343098617, 'test_loss': 0.46185197681188583, 'bleu': 8.554, 'gen_len': 7.2192}




 20%|█▉        | 91/465 [20:44<1:24:01, 13.48s/it]

For epoch 583: {Learning rate: [0.0033385560346470625]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.2003543373288178, 'test_loss': 0.4756917908787727, 'bleu': 9.7001, 'gen_len': 6.6986}




 20%|█▉        | 92/465 [21:00<1:28:19, 14.21s/it]

For epoch 584: {Learning rate: [0.003329035879032893]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.2021160950747932, 'test_loss': 0.45867898836731913, 'bleu': 7.5117, 'gen_len': 7.589}




 20%|██        | 93/465 [21:14<1:26:55, 14.02s/it]

For epoch 585: {Learning rate: [0.0033195157234187232]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.19994788853133597, 'test_loss': 0.46867084205150605, 'bleu': 10.416, 'gen_len': 7.411}




 20%|██        | 94/465 [21:27<1:26:10, 13.94s/it]

For epoch 586: {Learning rate: [0.003309995567804554]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.19968074996296953, 'test_loss': 0.458117937296629, 'bleu': 8.685, 'gen_len': 7.5959}




 20%|██        | 95/465 [21:41<1:25:05, 13.80s/it]

For epoch 587: {Learning rate: [0.0033004754121903845]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.19875789270168398, 'test_loss': 0.4815156556665897, 'bleu': 7.485, 'gen_len': 7.5753}




 21%|██        | 96/465 [21:54<1:23:58, 13.66s/it]

For epoch 588: {Learning rate: [0.003290955256576215]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.14batches/s]



Metrics: {'train_loss': 0.2003511452093357, 'test_loss': 0.4626190535724163, 'bleu': 8.6271, 'gen_len': 7.2192}




 21%|██        | 97/465 [22:07<1:23:09, 13.56s/it]

For epoch 589: {Learning rate: [0.0032814351009620452]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.19771375852387127, 'test_loss': 0.45451469495892527, 'bleu': 9.4267, 'gen_len': 7.7877}




 21%|██        | 98/465 [22:22<1:23:54, 13.72s/it]

For epoch 590: {Learning rate: [0.003271914945347876]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.19968817946387502, 'test_loss': 0.46735441163182256, 'bleu': 5.9215, 'gen_len': 7.9795}




 21%|██▏       | 99/465 [22:36<1:24:21, 13.83s/it]

For epoch 591: {Learning rate: [0.0032623947897337064]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.19850936132233318, 'test_loss': 0.47757103815674784, 'bleu': 6.6071, 'gen_len': 7.4658}




 22%|██▏       | 100/465 [22:49<1:23:41, 13.76s/it]

For epoch 592: {Learning rate: [0.0032528746341195366]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.20021922595617248, 'test_loss': 0.464480397105217, 'bleu': 7.9298, 'gen_len': 6.9178}




 22%|██▏       | 101/465 [23:03<1:23:06, 13.70s/it]

For epoch 593: {Learning rate: [0.003243354478505367]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.1981812497464622, 'test_loss': 0.4661931648850441, 'bleu': 7.364, 'gen_len': 7.3699}




 22%|██▏       | 102/465 [23:16<1:22:34, 13.65s/it]

For epoch 594: {Learning rate: [0.003233834322891198]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.19541404814254948, 'test_loss': 0.46108092069625856, 'bleu': 10.2927, 'gen_len': 7.637}




 22%|██▏       | 103/465 [23:30<1:21:51, 13.57s/it]

For epoch 595: {Learning rate: [0.003224314167277028]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.19586768273900196, 'test_loss': 0.47603671327233316, 'bleu': 8.8992, 'gen_len': 6.5}




 22%|██▏       | 104/465 [23:43<1:21:24, 13.53s/it]

For epoch 596: {Learning rate: [0.0032147940116628586]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.19622499295851079, 'test_loss': 0.4687031164765358, 'bleu': 9.2038, 'gen_len': 7.0}




 23%|██▎       | 105/465 [23:57<1:21:05, 13.51s/it]

For epoch 597: {Learning rate: [0.003205273856048689]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.19426403576281012, 'test_loss': 0.46374288499355315, 'bleu': 7.8902, 'gen_len': 7.6507}




 23%|██▎       | 106/465 [24:10<1:21:00, 13.54s/it]

For epoch 598: {Learning rate: [0.0031957537004345194]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.19722058751234195, 'test_loss': 0.46657717376947405, 'bleu': 11.1512, 'gen_len': 7.2192}




 23%|██▎       | 107/465 [24:24<1:20:55, 13.56s/it]

For epoch 599: {Learning rate: [0.0031862335448203495]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.19631747610685302, 'test_loss': 0.4785266757011414, 'bleu': 9.3425, 'gen_len': 7.1438}




 23%|██▎       | 108/465 [24:38<1:21:28, 13.69s/it]

For epoch 600: {Learning rate: [0.00317671338920618]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.19256569481477503, 'test_loss': 0.4707088991999626, 'bleu': 9.0661, 'gen_len': 7.3288}




 23%|██▎       | 109/465 [24:51<1:20:52, 13.63s/it]

For epoch 601: {Learning rate: [0.0031671932335920103]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.19727577014667233, 'test_loss': 0.46051797941327094, 'bleu': 9.2009, 'gen_len': 7.3288}




 24%|██▎       | 110/465 [25:05<1:20:08, 13.54s/it]

For epoch 602: {Learning rate: [0.003157673077977841]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.19391855815561806, 'test_loss': 0.46291167959570884, 'bleu': 7.7929, 'gen_len': 7.8288}




 24%|██▍       | 111/465 [25:18<1:20:26, 13.63s/it]

For epoch 603: {Learning rate: [0.0031481529223636715]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.19272288462010825, 'test_loss': 0.4701864905655384, 'bleu': 9.5097, 'gen_len': 7.4863}




 24%|██▍       | 112/465 [25:32<1:19:44, 13.55s/it]

For epoch 604: {Learning rate: [0.0031386327667495017]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.195738137131784, 'test_loss': 0.4665227435529232, 'bleu': 8.5107, 'gen_len': 6.9521}




 24%|██▍       | 113/465 [25:45<1:19:26, 13.54s/it]

For epoch 605: {Learning rate: [0.0031291126111353323]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.19159030042043546, 'test_loss': 0.47457323148846625, 'bleu': 10.4602, 'gen_len': 7.0274}




 25%|██▍       | 114/465 [25:59<1:19:27, 13.58s/it]

For epoch 606: {Learning rate: [0.003119592455521163]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.1924852372669592, 'test_loss': 0.4691509924829006, 'bleu': 5.9653, 'gen_len': 8.1233}




 25%|██▍       | 115/465 [26:12<1:18:56, 13.53s/it]

For epoch 607: {Learning rate: [0.003110072299906993]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.19048009413044628, 'test_loss': 0.4620765849947929, 'bleu': 7.311, 'gen_len': 7.9247}




 25%|██▍       | 116/465 [26:26<1:18:50, 13.55s/it]

For epoch 608: {Learning rate: [0.0031005521442928237]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.19130599135305823, 'test_loss': 0.4765947550535202, 'bleu': 9.7979, 'gen_len': 7.5685}




 25%|██▌       | 117/465 [26:39<1:18:21, 13.51s/it]

For epoch 609: {Learning rate: [0.0030910319886786543]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.19133598339266894, 'test_loss': 0.4637996681034565, 'bleu': 11.1791, 'gen_len': 7.8767}




 25%|██▌       | 118/465 [26:53<1:18:07, 13.51s/it]

For epoch 610: {Learning rate: [0.003081511833064485]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.1882637215823662, 'test_loss': 0.47273759767413137, 'bleu': 9.6535, 'gen_len': 7.7808}




 26%|██▌       | 119/465 [27:06<1:17:36, 13.46s/it]

For epoch 611: {Learning rate: [0.003071991677450315]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.18997523661066845, 'test_loss': 0.46417844071984293, 'bleu': 9.0464, 'gen_len': 7.7877}




 26%|██▌       | 120/465 [27:20<1:17:05, 13.41s/it]

For epoch 612: {Learning rate: [0.0030624715218361457]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.1893292534642103, 'test_loss': 0.45757944211363794, 'bleu': 7.0587, 'gen_len': 8.2192}




 26%|██▌       | 121/465 [27:33<1:16:55, 13.42s/it]

For epoch 613: {Learning rate: [0.0030529513662219763]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.19198361657014706, 'test_loss': 0.4735742315649986, 'bleu': 7.26, 'gen_len': 7.5548}




 26%|██▌       | 122/465 [27:47<1:16:55, 13.45s/it]

For epoch 614: {Learning rate: [0.0030434312106078065]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.1890809259763578, 'test_loss': 0.4667065903544426, 'bleu': 10.4515, 'gen_len': 7.0479}




 26%|██▋       | 123/465 [28:00<1:16:25, 13.41s/it]

For epoch 615: {Learning rate: [0.003033911054993637]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.18944725212527486, 'test_loss': 0.47041949778795245, 'bleu': 9.5035, 'gen_len': 6.9726}




 27%|██▋       | 124/465 [28:13<1:16:07, 13.39s/it]

For epoch 616: {Learning rate: [0.0030243908993794677]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.18641239077579685, 'test_loss': 0.4695796750485897, 'bleu': 7.8395, 'gen_len': 7.8288}




 27%|██▋       | 125/465 [28:27<1:16:01, 13.42s/it]

For epoch 617: {Learning rate: [0.003014870743765298]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.18919184258798274, 'test_loss': 0.47574363872408865, 'bleu': 8.7005, 'gen_len': 6.9452}




 27%|██▋       | 126/465 [28:40<1:16:16, 13.50s/it]

For epoch 618: {Learning rate: [0.0030053505881511284]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.18654371798038483, 'test_loss': 0.4518886148929596, 'bleu': 9.1911, 'gen_len': 7.6781}




 27%|██▋       | 127/465 [28:54<1:15:57, 13.48s/it]

For epoch 619: {Learning rate: [0.002995830432536959]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.1869880669727558, 'test_loss': 0.46589379087090493, 'bleu': 8.8415, 'gen_len': 7.4521}




 28%|██▊       | 128/465 [29:07<1:15:20, 13.41s/it]

For epoch 620: {Learning rate: [0.0029863102769227892]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.1862545751216935, 'test_loss': 0.46425383165478706, 'bleu': 10.0416, 'gen_len': 7.8014}




 28%|██▊       | 129/465 [29:21<1:15:43, 13.52s/it]

For epoch 621: {Learning rate: [0.00297679012130862]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.1855751988364429, 'test_loss': 0.4707189962267876, 'bleu': 6.1609, 'gen_len': 7.7466}




 28%|██▊       | 130/465 [29:34<1:15:12, 13.47s/it]

For epoch 622: {Learning rate: [0.0029672699656944504]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.15batches/s]



Metrics: {'train_loss': 0.1865346042848215, 'test_loss': 0.46919824555516243, 'bleu': 9.8917, 'gen_len': 7.5411}




 28%|██▊       | 131/465 [29:47<1:14:38, 13.41s/it]

For epoch 623: {Learning rate: [0.0029577498100802806]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.18628432910616805, 'test_loss': 0.4700216926634312, 'bleu': 9.5521, 'gen_len': 7.5068}




 28%|██▊       | 132/465 [30:01<1:14:03, 13.34s/it]

For epoch 624: {Learning rate: [0.002948229654466111]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.18408088386058807, 'test_loss': 0.4871743991971016, 'bleu': 9.0648, 'gen_len': 7.137}




 29%|██▊       | 133/465 [30:14<1:13:53, 13.35s/it]

For epoch 625: {Learning rate: [0.002938709498851942]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.18529284763626935, 'test_loss': 0.4780286930501461, 'bleu': 10.9015, 'gen_len': 7.2466}




 29%|██▉       | 134/465 [30:27<1:13:22, 13.30s/it]

For epoch 626: {Learning rate: [0.002929189343237772]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.1857683902106634, 'test_loss': 0.46690677031874656, 'bleu': 9.5256, 'gen_len': 7.5137}




 29%|██▉       | 135/465 [30:41<1:13:10, 13.30s/it]

For epoch 627: {Learning rate: [0.0029196691876236026]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.14batches/s]



Metrics: {'train_loss': 0.18509032304694012, 'test_loss': 0.4769591026008129, 'bleu': 8.3886, 'gen_len': 7.137}




 29%|██▉       | 136/465 [30:54<1:12:51, 13.29s/it]

For epoch 628: {Learning rate: [0.002910149032009433]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.18241497510816992, 'test_loss': 0.4772972099483013, 'bleu': 9.8771, 'gen_len': 7.1164}




 29%|██▉       | 137/465 [31:07<1:12:43, 13.30s/it]

For epoch 629: {Learning rate: [0.002900628876395264]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.1839140088456433, 'test_loss': 0.4728549227118492, 'bleu': 9.0038, 'gen_len': 7.7603}




 30%|██▉       | 138/465 [31:20<1:12:17, 13.26s/it]

For epoch 630: {Learning rate: [0.002891108720781094]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.1860426277285669, 'test_loss': 0.4726815275847912, 'bleu': 9.4694, 'gen_len': 7.2123}




 30%|██▉       | 139/465 [31:34<1:12:19, 13.31s/it]

For epoch 631: {Learning rate: [0.0028815885651669246]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.12batches/s]



Metrics: {'train_loss': 0.18084787959005774, 'test_loss': 0.4683655798435211, 'bleu': 7.1717, 'gen_len': 7.5137}




 30%|███       | 140/465 [31:47<1:11:56, 13.28s/it]

For epoch 632: {Learning rate: [0.002872068409552755]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.17924824938541506, 'test_loss': 0.4657833635807037, 'bleu': 10.0645, 'gen_len': 8.0342}




 30%|███       | 141/465 [32:00<1:12:11, 13.37s/it]

For epoch 633: {Learning rate: [0.0028625482539385854]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.1820623394919605, 'test_loss': 0.4694742068648338, 'bleu': 9.7584, 'gen_len': 7.8356}




 31%|███       | 142/465 [32:14<1:11:53, 13.35s/it]

For epoch 634: {Learning rate: [0.002853028098324416]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.18386263454832683, 'test_loss': 0.46853393614292144, 'bleu': 8.5728, 'gen_len': 7.1233}




 31%|███       | 143/465 [32:27<1:11:41, 13.36s/it]

For epoch 635: {Learning rate: [0.0028435079427102466]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.18049061516436135, 'test_loss': 0.4751579873263836, 'bleu': 8.1646, 'gen_len': 8.1712}




 31%|███       | 144/465 [32:41<1:11:34, 13.38s/it]

For epoch 636: {Learning rate: [0.0028339877870960767]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.17987373834703027, 'test_loss': 0.4579414285719395, 'bleu': 9.7212, 'gen_len': 6.8767}




 31%|███       | 145/465 [32:54<1:11:13, 13.35s/it]

For epoch 637: {Learning rate: [0.0028244676314819073]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.1803207484687247, 'test_loss': 0.4712677523493767, 'bleu': 8.8032, 'gen_len': 8.0205}




 31%|███▏      | 146/465 [33:07<1:11:12, 13.39s/it]

For epoch 638: {Learning rate: [0.002814947475867738]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.18068583673093377, 'test_loss': 0.4685316517949104, 'bleu': 10.5594, 'gen_len': 7.4795}




 32%|███▏      | 147/465 [33:21<1:10:50, 13.37s/it]

For epoch 639: {Learning rate: [0.002805427320253568]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.1799557270800195, 'test_loss': 0.46886005252599716, 'bleu': 10.7476, 'gen_len': 7.1096}




 32%|███▏      | 148/465 [33:34<1:10:45, 13.39s/it]

For epoch 640: {Learning rate: [0.0027959071646393987]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.1789473418782397, 'test_loss': 0.4673731803894043, 'bleu': 12.9213, 'gen_len': 7.1164}




 32%|███▏      | 149/465 [33:47<1:10:20, 13.36s/it]

For epoch 641: {Learning rate: [0.0027863870090252293]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.17574996555723796, 'test_loss': 0.47067669183015826, 'bleu': 10.2986, 'gen_len': 7.2329}




 32%|███▏      | 150/465 [34:01<1:10:19, 13.40s/it]

For epoch 642: {Learning rate: [0.0027768668534110595]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.1786498205690849, 'test_loss': 0.47500754445791243, 'bleu': 9.3476, 'gen_len': 8.3219}




 32%|███▏      | 151/465 [34:14<1:09:53, 13.35s/it]

For epoch 643: {Learning rate: [0.00276734669779689]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.17724071043293652, 'test_loss': 0.477756667137146, 'bleu': 9.9038, 'gen_len': 7.5205}




 33%|███▎      | 152/465 [34:28<1:09:42, 13.36s/it]

For epoch 644: {Learning rate: [0.0027578265421827207]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.17698673922114255, 'test_loss': 0.46294952258467675, 'bleu': 8.325, 'gen_len': 7.6507}




 33%|███▎      | 153/465 [34:41<1:09:30, 13.37s/it]

For epoch 645: {Learning rate: [0.0027483063865685513]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.17418319095925586, 'test_loss': 0.46932604238390924, 'bleu': 9.4351, 'gen_len': 7.411}




 33%|███▎      | 154/465 [34:54<1:09:09, 13.34s/it]

For epoch 646: {Learning rate: [0.0027387862309543815]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.17622112192031814, 'test_loss': 0.4749059483408928, 'bleu': 8.4876, 'gen_len': 7.6918}




 33%|███▎      | 155/465 [35:08<1:09:00, 13.36s/it]

For epoch 647: {Learning rate: [0.002729266075340212]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.17867322974815603, 'test_loss': 0.47221412807703017, 'bleu': 11.3457, 'gen_len': 6.7671}




 34%|███▎      | 156/465 [35:21<1:09:15, 13.45s/it]

For epoch 648: {Learning rate: [0.0027197459197260427]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.17569611821232772, 'test_loss': 0.4577432684600353, 'bleu': 9.8643, 'gen_len': 7.8767}




 34%|███▍      | 157/465 [35:35<1:08:55, 13.43s/it]

For epoch 649: {Learning rate: [0.002710225764111873]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.20batches/s]



Metrics: {'train_loss': 0.1759171146081715, 'test_loss': 0.47774958312511445, 'bleu': 9.4756, 'gen_len': 7.9247}




 34%|███▍      | 158/465 [35:48<1:08:23, 13.37s/it]

For epoch 650: {Learning rate: [0.0027007056084977035]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.17429366053604498, 'test_loss': 0.46762900799512863, 'bleu': 11.1769, 'gen_len': 7.7192}




 34%|███▍      | 159/465 [36:01<1:07:56, 13.32s/it]

For epoch 651: {Learning rate: [0.002691185452883534]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.1754929764968593, 'test_loss': 0.45792983323335645, 'bleu': 8.8348, 'gen_len': 8.4863}




 34%|███▍      | 160/465 [36:15<1:07:57, 13.37s/it]

For epoch 652: {Learning rate: [0.0026816652972693642]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.17281918627459827, 'test_loss': 0.46376090571284295, 'bleu': 12.8112, 'gen_len': 7.3836}




 35%|███▍      | 161/465 [36:28<1:07:49, 13.39s/it]

For epoch 653: {Learning rate: [0.002672145141655195]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.1718662522914933, 'test_loss': 0.47587124109268186, 'bleu': 10.8047, 'gen_len': 7.2329}




 35%|███▍      | 162/465 [36:41<1:07:41, 13.41s/it]

For epoch 654: {Learning rate: [0.0026626249860410255]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.1716376006966684, 'test_loss': 0.47081589698791504, 'bleu': 11.0469, 'gen_len': 7.2603}




 35%|███▌      | 163/465 [36:55<1:07:32, 13.42s/it]

For epoch 655: {Learning rate: [0.0026531048304268556]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.12batches/s]



Metrics: {'train_loss': 0.17245285503747987, 'test_loss': 0.4846278727054596, 'bleu': 10.0685, 'gen_len': 8.1712}




 35%|███▌      | 164/465 [37:08<1:07:04, 13.37s/it]

For epoch 656: {Learning rate: [0.0026435846748126862]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.17123225731093708, 'test_loss': 0.4805100500583649, 'bleu': 10.4892, 'gen_len': 6.8425}




 35%|███▌      | 165/465 [37:22<1:07:11, 13.44s/it]

For epoch 657: {Learning rate: [0.002634064519198517]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.1701209745029124, 'test_loss': 0.4680322740226984, 'bleu': 9.948, 'gen_len': 7.7329}




 36%|███▌      | 166/465 [37:35<1:06:41, 13.38s/it]

For epoch 658: {Learning rate: [0.002624544363584347]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.17095073694136084, 'test_loss': 0.463417674601078, 'bleu': 8.8841, 'gen_len': 7.863}




 36%|███▌      | 167/465 [37:48<1:06:11, 13.33s/it]

For epoch 659: {Learning rate: [0.0026150242079701776]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.17111049919593624, 'test_loss': 0.47113361284136773, 'bleu': 11.3723, 'gen_len': 6.9589}




 36%|███▌      | 168/465 [38:02<1:06:09, 13.37s/it]

For epoch 660: {Learning rate: [0.0026055040523560082]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.17024883446170064, 'test_loss': 0.4650512523949146, 'bleu': 8.4644, 'gen_len': 8.1233}




 36%|███▋      | 169/465 [38:15<1:05:42, 13.32s/it]

For epoch 661: {Learning rate: [0.002595983896741839]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.1695415057787081, 'test_loss': 0.46121554896235467, 'bleu': 11.5384, 'gen_len': 7.2808}




 37%|███▋      | 170/465 [38:29<1:06:00, 13.43s/it]

For epoch 662: {Learning rate: [0.002586463741127669]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.1679676353204541, 'test_loss': 0.4640152260661125, 'bleu': 12.3979, 'gen_len': 7.4521}




 37%|███▋      | 171/465 [38:42<1:06:05, 13.49s/it]

For epoch 663: {Learning rate: [0.0025769435855134996]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.16715437064810498, 'test_loss': 0.454832062497735, 'bleu': 9.6111, 'gen_len': 7.4178}




 37%|███▋      | 172/465 [38:56<1:05:40, 13.45s/it]

For epoch 664: {Learning rate: [0.00256742342989933]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.16595428818609656, 'test_loss': 0.4731673844158649, 'bleu': 9.7823, 'gen_len': 7.8493}




 37%|███▋      | 173/465 [39:09<1:05:05, 13.38s/it]

For epoch 665: {Learning rate: [0.0025579032742851604]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.16945238229705067, 'test_loss': 0.4761534884572029, 'bleu': 12.5341, 'gen_len': 7.5068}




 37%|███▋      | 174/465 [39:22<1:04:53, 13.38s/it]

For epoch 666: {Learning rate: [0.002548383118670991]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.16717457298825428, 'test_loss': 0.4596993800252676, 'bleu': 10.9925, 'gen_len': 7.6918}




 38%|███▊      | 175/465 [39:36<1:04:50, 13.41s/it]

For epoch 667: {Learning rate: [0.0025388629630568216]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.12batches/s]



Metrics: {'train_loss': 0.16643903458990703, 'test_loss': 0.4668041430413723, 'bleu': 11.3226, 'gen_len': 7.2534}




 38%|███▊      | 176/465 [39:49<1:04:25, 13.37s/it]

For epoch 668: {Learning rate: [0.0025293428074426518]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.1645896100416416, 'test_loss': 0.4638968206942081, 'bleu': 9.3218, 'gen_len': 7.6438}




 38%|███▊      | 177/465 [40:02<1:04:19, 13.40s/it]

For epoch 669: {Learning rate: [0.0025198226518284824]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.1650694637763791, 'test_loss': 0.4728436663746834, 'bleu': 11.152, 'gen_len': 7.8014}




 38%|███▊      | 178/465 [40:16<1:04:01, 13.39s/it]

For epoch 670: {Learning rate: [0.002510302496214313]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.16515314033845577, 'test_loss': 0.4771930783987045, 'bleu': 12.8186, 'gen_len': 7.3425}




 38%|███▊      | 179/465 [40:29<1:03:47, 13.38s/it]

For epoch 671: {Learning rate: [0.002500782340600143]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.16495867109880213, 'test_loss': 0.44957316145300863, 'bleu': 8.8328, 'gen_len': 8.363}




 39%|███▊      | 180/465 [40:43<1:03:42, 13.41s/it]

For epoch 672: {Learning rate: [0.0024912621849859738]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.16384719439396045, 'test_loss': 0.46231108009815214, 'bleu': 8.9062, 'gen_len': 7.8973}




 39%|███▉      | 181/465 [40:56<1:03:16, 13.37s/it]

For epoch 673: {Learning rate: [0.0024817420293718044]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.16376158667773735, 'test_loss': 0.47209499552845957, 'bleu': 10.6702, 'gen_len': 7.9726}




 39%|███▉      | 182/465 [41:09<1:02:58, 13.35s/it]

For epoch 674: {Learning rate: [0.0024722218737576345]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.16164815425872803, 'test_loss': 0.47752281427383425, 'bleu': 9.8248, 'gen_len': 8.1027}




 39%|███▉      | 183/465 [41:24<1:04:43, 13.77s/it]

For epoch 675: {Learning rate: [0.002462701718143465]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.16368362016794158, 'test_loss': 0.4551019035279751, 'bleu': 10.914, 'gen_len': 8.1986}




 40%|███▉      | 184/465 [41:37<1:04:15, 13.72s/it]

For epoch 676: {Learning rate: [0.0024531815625292957]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.16320066372068917, 'test_loss': 0.46184631772339346, 'bleu': 9.5836, 'gen_len': 8.7945}




 40%|███▉      | 185/465 [41:51<1:03:35, 13.63s/it]

For epoch 677: {Learning rate: [0.0024436614069151263]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.16278533092359218, 'test_loss': 0.4673647917807102, 'bleu': 9.6149, 'gen_len': 7.7808}




 40%|████      | 186/465 [42:04<1:02:54, 13.53s/it]

For epoch 678: {Learning rate: [0.0024341412513009565]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.16169156025095685, 'test_loss': 0.47369730323553083, 'bleu': 12.481, 'gen_len': 7.2466}




 40%|████      | 187/465 [42:18<1:02:26, 13.47s/it]

For epoch 679: {Learning rate: [0.002424621095686787]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.15986272983434724, 'test_loss': 0.4655338615179062, 'bleu': 10.4311, 'gen_len': 8.3562}




 40%|████      | 188/465 [42:31<1:01:56, 13.42s/it]

For epoch 680: {Learning rate: [0.0024151009400726177]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.14batches/s]



Metrics: {'train_loss': 0.15727257873953843, 'test_loss': 0.46030845791101455, 'bleu': 10.2019, 'gen_len': 7.3493}




 41%|████      | 189/465 [42:44<1:01:21, 13.34s/it]

For epoch 681: {Learning rate: [0.002405580784458448]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.15893413653460944, 'test_loss': 0.47374713122844697, 'bleu': 13.048, 'gen_len': 7.2466}




 41%|████      | 190/465 [42:57<1:01:10, 13.35s/it]

For epoch 682: {Learning rate: [0.0023960606288442785]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.15879884780180165, 'test_loss': 0.464760085195303, 'bleu': 10.6722, 'gen_len': 8.2877}




 41%|████      | 191/465 [43:11<1:00:51, 13.33s/it]

For epoch 683: {Learning rate: [0.002386540473230109]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.15818564502931223, 'test_loss': 0.4773699246346951, 'bleu': 9.2944, 'gen_len': 7.6233}




 41%|████▏     | 192/465 [43:24<1:00:47, 13.36s/it]

For epoch 684: {Learning rate: [0.0023770203176159393]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.16112322360277176, 'test_loss': 0.45964278280735016, 'bleu': 11.3392, 'gen_len': 7.9384}




 42%|████▏     | 193/465 [43:37<1:00:31, 13.35s/it]

For epoch 685: {Learning rate: [0.00236750016200177]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.15843929968229153, 'test_loss': 0.47478248178958893, 'bleu': 10.4052, 'gen_len': 7.9247}




 42%|████▏     | 194/465 [43:51<1:00:23, 13.37s/it]

For epoch 686: {Learning rate: [0.0023579800063876005]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.15795312712832196, 'test_loss': 0.47758850529789926, 'bleu': 10.2469, 'gen_len': 7.3219}




 42%|████▏     | 195/465 [44:04<59:53, 13.31s/it]

For epoch 687: {Learning rate: [0.0023484598507734307]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.15779548738060928, 'test_loss': 0.4693596884608269, 'bleu': 11.8069, 'gen_len': 7.6096}




 42%|████▏     | 196/465 [44:17<59:33, 13.29s/it]

For epoch 688: {Learning rate: [0.0023389396951592613]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.15732827691770182, 'test_loss': 0.4682926572859287, 'bleu': 13.3186, 'gen_len': 7.5068}




 42%|████▏     | 197/465 [44:30<59:15, 13.27s/it]

For epoch 689: {Learning rate: [0.002329419539545092]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.15493463597646573, 'test_loss': 0.4684576950967312, 'bleu': 12.767, 'gen_len': 7.1918}




 43%|████▎     | 198/465 [44:44<59:12, 13.30s/it]

For epoch 690: {Learning rate: [0.002319899383930922]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.15682577359967115, 'test_loss': 0.4837391138076782, 'bleu': 9.0783, 'gen_len': 7.8288}




 43%|████▎     | 199/465 [44:57<58:49, 13.27s/it]

For epoch 691: {Learning rate: [0.0023103792283167526]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.1573935619941572, 'test_loss': 0.46640492379665377, 'bleu': 11.6199, 'gen_len': 7.637}




 43%|████▎     | 200/465 [45:10<58:40, 13.28s/it]

For epoch 692: {Learning rate: [0.0023008590727025833]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.15512942686313536, 'test_loss': 0.47885355949401853, 'bleu': 13.1424, 'gen_len': 7.4247}




 43%|████▎     | 201/465 [45:24<58:31, 13.30s/it]

For epoch 693: {Learning rate: [0.0022913389170884134]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.15481977019368148, 'test_loss': 0.46837600991129874, 'bleu': 10.1248, 'gen_len': 7.8699}




 43%|████▎     | 202/465 [45:37<58:29, 13.34s/it]

For epoch 694: {Learning rate: [0.002281818761474244]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.15494311855333606, 'test_loss': 0.4821519762277603, 'bleu': 13.3134, 'gen_len': 7.5137}




 44%|████▎     | 203/465 [45:50<58:03, 13.29s/it]

For epoch 695: {Learning rate: [0.0022722986058600746]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.15211283215662327, 'test_loss': 0.4683121211826801, 'bleu': 11.1615, 'gen_len': 8.0616}




 44%|████▍     | 204/465 [46:04<57:57, 13.32s/it]

For epoch 696: {Learning rate: [0.0022627784502459052]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.1545338625224625, 'test_loss': 0.47070243656635286, 'bleu': 12.551, 'gen_len': 7.7466}




 44%|████▍     | 205/465 [46:17<57:40, 13.31s/it]

For epoch 697: {Learning rate: [0.0022532582946317354]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.15537242336971005, 'test_loss': 0.47527896538376807, 'bleu': 12.0718, 'gen_len': 7.4726}




 44%|████▍     | 206/465 [46:30<57:35, 13.34s/it]

For epoch 698: {Learning rate: [0.002243738139017566]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.1546460524564836, 'test_loss': 0.4633336253464222, 'bleu': 8.8578, 'gen_len': 8.1096}




 45%|████▍     | 207/465 [46:44<57:12, 13.30s/it]

For epoch 699: {Learning rate: [0.0022342179834033966]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.1515395610434253, 'test_loss': 0.479997019469738, 'bleu': 13.2214, 'gen_len': 7.2603}




 45%|████▍     | 208/465 [46:57<57:06, 13.33s/it]

For epoch 700: {Learning rate: [0.002224697827789227]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.17batches/s]



Metrics: {'train_loss': 0.14999322611384275, 'test_loss': 0.46564973443746566, 'bleu': 11.8027, 'gen_len': 7.7192}




 45%|████▍     | 209/465 [47:10<56:42, 13.29s/it]

For epoch 701: {Learning rate: [0.0022151776721750574]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.149964509213843, 'test_loss': 0.47303748354315756, 'bleu': 12.2139, 'gen_len': 7.1438}




 45%|████▌     | 210/465 [47:24<57:11, 13.46s/it]

For epoch 702: {Learning rate: [0.002205657516560888]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.14886343170229982, 'test_loss': 0.47339099943637847, 'bleu': 12.6313, 'gen_len': 7.274}




 45%|████▌     | 211/465 [47:37<56:54, 13.44s/it]

For epoch 703: {Learning rate: [0.002196137360946718]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.1501375874731599, 'test_loss': 0.46290162652730943, 'bleu': 11.2378, 'gen_len': 7.6918}




 46%|████▌     | 212/465 [47:51<56:33, 13.41s/it]

For epoch 704: {Learning rate: [0.0021866172053325488]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.15267437660112615, 'test_loss': 0.47310829162597656, 'bleu': 11.1569, 'gen_len': 8.0137}




 46%|████▌     | 213/465 [48:04<56:11, 13.38s/it]

For epoch 705: {Learning rate: [0.0021770970497183794]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.14989230363834194, 'test_loss': 0.4650776758790016, 'bleu': 10.5397, 'gen_len': 8.4726}




 46%|████▌     | 214/465 [48:17<55:47, 13.34s/it]

For epoch 706: {Learning rate: [0.0021675768941042096]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.14771204223720039, 'test_loss': 0.4672006078064442, 'bleu': 14.4325, 'gen_len': 7.8151}




 46%|████▌     | 215/465 [48:32<57:13, 13.73s/it]

For epoch 707: {Learning rate: [0.00215805673849004]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.14764632121091936, 'test_loss': 0.47071052715182304, 'bleu': 12.1697, 'gen_len': 7.2945}




 46%|████▋     | 216/465 [48:45<56:37, 13.64s/it]

For epoch 708: {Learning rate: [0.0021485365828758708]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.1501063519134754, 'test_loss': 0.46011183187365534, 'bleu': 10.8686, 'gen_len': 8.1027}




 47%|████▋     | 217/465 [48:59<56:01, 13.55s/it]

For epoch 709: {Learning rate: [0.002139016427261701]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.1472847697938361, 'test_loss': 0.47523392662405967, 'bleu': 12.9763, 'gen_len': 7.4658}




 47%|████▋     | 218/465 [49:12<55:26, 13.47s/it]

For epoch 710: {Learning rate: [0.0021294962716475315]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.1452418738385526, 'test_loss': 0.4656100168824196, 'bleu': 12.0773, 'gen_len': 7.5479}




 47%|████▋     | 219/465 [49:26<55:14, 13.48s/it]

For epoch 711: {Learning rate: [0.002119976116033362]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.14810977312849788, 'test_loss': 0.49006886035203934, 'bleu': 10.6987, 'gen_len': 7.8493}




 47%|████▋     | 220/465 [49:39<54:44, 13.41s/it]

For epoch 712: {Learning rate: [0.0021104559604191928]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.1491881012916565, 'test_loss': 0.46527800112962725, 'bleu': 12.9768, 'gen_len': 7.4863}




 48%|████▊     | 221/465 [49:52<54:23, 13.38s/it]

For epoch 713: {Learning rate: [0.002100935804805023]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.18batches/s]



Metrics: {'train_loss': 0.14496380290607128, 'test_loss': 0.4800374127924442, 'bleu': 12.4778, 'gen_len': 7.4589}




 48%|████▊     | 222/465 [50:05<53:47, 13.28s/it]

For epoch 714: {Learning rate: [0.0020914156491908535]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.1465481170793859, 'test_loss': 0.478478067368269, 'bleu': 11.7573, 'gen_len': 7.6575}




 48%|████▊     | 223/465 [50:19<53:43, 13.32s/it]

For epoch 715: {Learning rate: [0.002081895493576684]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.14527018604482092, 'test_loss': 0.46918816566467286, 'bleu': 12.5193, 'gen_len': 7.7603}




 48%|████▊     | 224/465 [50:32<53:41, 13.37s/it]

For epoch 716: {Learning rate: [0.0020723753379625143]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.14359431604786618, 'test_loss': 0.46911690160632136, 'bleu': 11.833, 'gen_len': 7.2877}




 48%|████▊     | 225/465 [50:45<53:30, 13.38s/it]

For epoch 717: {Learning rate: [0.002062855182348345]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.14393307232275243, 'test_loss': 0.4690990477800369, 'bleu': 12.4013, 'gen_len': 7.5205}




 49%|████▊     | 226/465 [50:59<53:13, 13.36s/it]

For epoch 718: {Learning rate: [0.0020533350267341755]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.14374527644093443, 'test_loss': 0.4771321572363377, 'bleu': 11.6925, 'gen_len': 7.5753}




 49%|████▉     | 227/465 [51:12<53:00, 13.36s/it]

For epoch 719: {Learning rate: [0.0020438148711200057]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.15batches/s]



Metrics: {'train_loss': 0.14378339869946968, 'test_loss': 0.4679240144789219, 'bleu': 11.4586, 'gen_len': 7.6301}




 49%|████▉     | 228/465 [51:25<52:32, 13.30s/it]

For epoch 720: {Learning rate: [0.0020342947155058363]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.1438152139506689, 'test_loss': 0.4674143984913826, 'bleu': 12.8568, 'gen_len': 7.4589}




 49%|████▉     | 229/465 [51:39<52:46, 13.42s/it]

For epoch 721: {Learning rate: [0.002024774559891667]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.14130795220049416, 'test_loss': 0.45944769978523253, 'bleu': 12.5936, 'gen_len': 7.637}




 49%|████▉     | 230/465 [51:52<52:27, 13.39s/it]

For epoch 722: {Learning rate: [0.002015254404277497]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.14085163321436905, 'test_loss': 0.49163961932063105, 'bleu': 11.7929, 'gen_len': 7.0685}




 50%|████▉     | 231/465 [52:06<52:01, 13.34s/it]

For epoch 723: {Learning rate: [0.0020057342486633277]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.14362837610448279, 'test_loss': 0.4646700367331505, 'bleu': 10.2129, 'gen_len': 8.1918}




 50%|████▉     | 232/465 [52:19<51:39, 13.30s/it]

For epoch 724: {Learning rate: [0.0019962140930491583]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.1427011540750178, 'test_loss': 0.46763034611940385, 'bleu': 12.4863, 'gen_len': 8.4247}




 50%|█████     | 233/465 [52:32<51:30, 13.32s/it]

For epoch 725: {Learning rate: [0.0019866939374349885]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.1401437281108484, 'test_loss': 0.46781961992383003, 'bleu': 12.0669, 'gen_len': 7.589}




 50%|█████     | 234/465 [52:46<51:24, 13.35s/it]

For epoch 726: {Learning rate: [0.001977173781820819]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.13749883305735705, 'test_loss': 0.474080441147089, 'bleu': 10.3614, 'gen_len': 7.9247}




 51%|█████     | 235/465 [52:59<51:22, 13.40s/it]

For epoch 727: {Learning rate: [0.0019676536262066492]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.06batches/s]



Metrics: {'train_loss': 0.13791280130787595, 'test_loss': 0.475958950817585, 'bleu': 11.6497, 'gen_len': 7.2808}




 51%|█████     | 236/465 [53:12<50:58, 13.35s/it]

For epoch 728: {Learning rate: [0.00195813347059248]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.1409238535093098, 'test_loss': 0.47319473177194593, 'bleu': 11.747, 'gen_len': 7.3288}




 51%|█████     | 237/465 [53:26<50:56, 13.41s/it]

For epoch 729: {Learning rate: [0.0019486133149783102]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.13977090850835894, 'test_loss': 0.49851519614458084, 'bleu': 10.2176, 'gen_len': 7.0411}




 51%|█████     | 238/465 [53:39<50:42, 13.40s/it]

For epoch 730: {Learning rate: [0.0019390931593641408]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.13562106522845058, 'test_loss': 0.473518131673336, 'bleu': 11.6917, 'gen_len': 7.6301}




 51%|█████▏    | 239/465 [53:53<50:35, 13.43s/it]

For epoch 731: {Learning rate: [0.0019295730037499712]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.1371061251294322, 'test_loss': 0.4787501558661461, 'bleu': 13.3773, 'gen_len': 7.5205}




 52%|█████▏    | 240/465 [54:06<50:12, 13.39s/it]

For epoch 732: {Learning rate: [0.0019200528481358016]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.13839596473589177, 'test_loss': 0.48474639281630516, 'bleu': 11.7419, 'gen_len': 7.2534}




 52%|█████▏    | 241/465 [54:20<50:08, 13.43s/it]

For epoch 733: {Learning rate: [0.0019105326925216322]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.13570293747797246, 'test_loss': 0.46726742908358576, 'bleu': 11.0606, 'gen_len': 8.0822}




 52%|█████▏    | 242/465 [54:33<49:42, 13.37s/it]

For epoch 734: {Learning rate: [0.0019010125369074626]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.13517233402263829, 'test_loss': 0.47738695070147513, 'bleu': 9.0446, 'gen_len': 8.6986}




 52%|█████▏    | 243/465 [54:46<49:33, 13.39s/it]

For epoch 735: {Learning rate: [0.0018914923812932932]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.13388884321945468, 'test_loss': 0.4618647314608097, 'bleu': 10.2248, 'gen_len': 7.8219}




 52%|█████▏    | 244/465 [55:00<49:33, 13.45s/it]

For epoch 736: {Learning rate: [0.0018819722256791236]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.1329322057526286, 'test_loss': 0.4762712389230728, 'bleu': 12.5628, 'gen_len': 7.589}




 53%|█████▎    | 245/465 [55:13<49:12, 13.42s/it]

For epoch 737: {Learning rate: [0.001872452070064954]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.14batches/s]



Metrics: {'train_loss': 0.13299811631441116, 'test_loss': 0.4847584247589111, 'bleu': 10.467, 'gen_len': 7.8767}




 53%|█████▎    | 246/465 [55:26<48:48, 13.37s/it]

For epoch 738: {Learning rate: [0.0018629319144507846]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.13261744416341548, 'test_loss': 0.47075634598731997, 'bleu': 13.1483, 'gen_len': 7.3699}




 53%|█████▎    | 247/465 [55:40<48:51, 13.45s/it]

For epoch 739: {Learning rate: [0.001853411758836615]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.13203901470434376, 'test_loss': 0.4719203434884548, 'bleu': 11.7899, 'gen_len': 7.9247}




 53%|█████▎    | 248/465 [55:53<48:27, 13.40s/it]

For epoch 740: {Learning rate: [0.0018438916032224454]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.13241841862114465, 'test_loss': 0.4783067837357521, 'bleu': 13.285, 'gen_len': 7.4041}




 54%|█████▎    | 249/465 [56:07<48:16, 13.41s/it]

For epoch 741: {Learning rate: [0.001834371447608276]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.13330174055768224, 'test_loss': 0.4756235808134079, 'bleu': 12.6478, 'gen_len': 8.2397}




 54%|█████▍    | 250/465 [56:20<48:08, 13.44s/it]

For epoch 742: {Learning rate: [0.0018248512919941064]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.13077073507919545, 'test_loss': 0.4754000544548035, 'bleu': 11.2455, 'gen_len': 7.5479}




 54%|█████▍    | 251/465 [56:34<47:56, 13.44s/it]

For epoch 743: {Learning rate: [0.0018153311363799367]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.13040366223672542, 'test_loss': 0.4669643145054579, 'bleu': 12.4289, 'gen_len': 7.7603}




 54%|█████▍    | 252/465 [56:47<47:43, 13.45s/it]

For epoch 744: {Learning rate: [0.0018058109807657674]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.13061056976638188, 'test_loss': 0.47800578773021696, 'bleu': 12.4499, 'gen_len': 7.411}




 54%|█████▍    | 253/465 [57:01<47:27, 13.43s/it]

For epoch 745: {Learning rate: [0.0017962908251515977]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.13172944889562885, 'test_loss': 0.4751656502485275, 'bleu': 11.8583, 'gen_len': 7.8219}




 55%|█████▍    | 254/465 [57:14<47:20, 13.46s/it]

For epoch 746: {Learning rate: [0.0017867706695374283]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.13127961863831775, 'test_loss': 0.4761966861784458, 'bleu': 11.9116, 'gen_len': 7.8356}




 55%|█████▍    | 255/465 [57:28<47:12, 13.49s/it]

For epoch 747: {Learning rate: [0.0017772505139232587]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.13025509584240796, 'test_loss': 0.48021780177950857, 'bleu': 12.3401, 'gen_len': 7.637}




 55%|█████▌    | 256/465 [57:41<47:04, 13.51s/it]

For epoch 748: {Learning rate: [0.0017677303583090891]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.12826514516661808, 'test_loss': 0.4672838397324085, 'bleu': 14.5669, 'gen_len': 7.7055}




 55%|█████▌    | 257/465 [57:55<47:18, 13.65s/it]

For epoch 749: {Learning rate: [0.0017582102026949197]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.1280407222305856, 'test_loss': 0.466390161216259, 'bleu': 10.6509, 'gen_len': 8.1438}




 55%|█████▌    | 258/465 [58:09<46:47, 13.56s/it]

For epoch 750: {Learning rate: [0.0017486900470807501]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.1267554720000523, 'test_loss': 0.4691167950630188, 'bleu': 12.8564, 'gen_len': 7.8082}




 56%|█████▌    | 259/465 [58:22<46:20, 13.50s/it]

For epoch 751: {Learning rate: [0.0017391698914665805]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.125291748562964, 'test_loss': 0.4663002550601959, 'bleu': 13.312, 'gen_len': 7.4726}




 56%|█████▌    | 260/465 [58:36<46:47, 13.70s/it]

For epoch 752: {Learning rate: [0.0017296497358524111]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.12524849739743443, 'test_loss': 0.47407170087099076, 'bleu': 14.7698, 'gen_len': 7.6164}




 56%|█████▌    | 261/465 [58:50<47:10, 13.87s/it]

For epoch 753: {Learning rate: [0.0017201295802382415]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.12523446904449928, 'test_loss': 0.4769352041184902, 'bleu': 10.5858, 'gen_len': 8.0548}




 56%|█████▋    | 262/465 [59:04<46:50, 13.84s/it]

For epoch 754: {Learning rate: [0.001710609424624072]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.12504314885633747, 'test_loss': 0.4767480447888374, 'bleu': 13.9877, 'gen_len': 7.363}




 57%|█████▋    | 263/465 [59:17<46:08, 13.70s/it]

For epoch 755: {Learning rate: [0.0017010892690099025]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.12635645415724778, 'test_loss': 0.4749244250357151, 'bleu': 13.4952, 'gen_len': 7.4795}




 57%|█████▋    | 264/465 [59:31<45:55, 13.71s/it]

For epoch 756: {Learning rate: [0.0016915691133957329]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.1263910909978355, 'test_loss': 0.4945968687534332, 'bleu': 12.0926, 'gen_len': 7.6233}




 57%|█████▋    | 265/465 [59:45<45:24, 13.62s/it]

For epoch 757: {Learning rate: [0.0016820489577815635]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.12356417626142502, 'test_loss': 0.48189244568347933, 'bleu': 12.1801, 'gen_len': 7.7671}




 57%|█████▋    | 266/465 [59:58<44:51, 13.53s/it]

For epoch 758: {Learning rate: [0.0016725288021673939]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.12481926654170199, 'test_loss': 0.4700011685490608, 'bleu': 11.7815, 'gen_len': 7.8082}




 57%|█████▋    | 267/465 [1:00:11<44:28, 13.48s/it]

For epoch 759: {Learning rate: [0.0016630086465532243]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.12269818601084918, 'test_loss': 0.46213143691420555, 'bleu': 14.5335, 'gen_len': 8.1781}




 58%|█████▊    | 268/465 [1:00:25<44:10, 13.45s/it]

For epoch 760: {Learning rate: [0.0016534884909390549]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.17batches/s]



Metrics: {'train_loss': 0.12277620721880983, 'test_loss': 0.4669437721371651, 'bleu': 13.3366, 'gen_len': 7.1712}




 58%|█████▊    | 269/465 [1:00:38<43:49, 13.42s/it]

For epoch 761: {Learning rate: [0.0016439683353248853]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.12218601747256953, 'test_loss': 0.46692952811717986, 'bleu': 13.4844, 'gen_len': 7.3767}




 58%|█████▊    | 270/465 [1:00:51<43:32, 13.40s/it]

For epoch 762: {Learning rate: [0.0016344481797107159]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.12141123395867465, 'test_loss': 0.4846562832593918, 'bleu': 13.7624, 'gen_len': 7.7808}




 58%|█████▊    | 271/465 [1:01:05<43:17, 13.39s/it]

For epoch 763: {Learning rate: [0.0016249280240965462]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.11892046506811933, 'test_loss': 0.4936097785830498, 'bleu': 12.3268, 'gen_len': 7.5274}




 58%|█████▊    | 272/465 [1:01:18<43:00, 13.37s/it]

For epoch 764: {Learning rate: [0.0016154078684823766]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.12147042355159433, 'test_loss': 0.4761470153927803, 'bleu': 12.1345, 'gen_len': 7.8699}




 59%|█████▊    | 273/465 [1:01:31<42:48, 13.38s/it]

For epoch 765: {Learning rate: [0.0016058877128682072]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.12194186194640834, 'test_loss': 0.4752812720835209, 'bleu': 12.1057, 'gen_len': 7.8425}




 59%|█████▉    | 274/465 [1:01:45<42:46, 13.44s/it]

For epoch 766: {Learning rate: [0.0015963675572540376]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.12200885102516268, 'test_loss': 0.4670534111559391, 'bleu': 13.4749, 'gen_len': 7.3425}




 59%|█████▉    | 275/465 [1:01:58<42:25, 13.40s/it]

For epoch 767: {Learning rate: [0.001586847401639868]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.11870789437032328, 'test_loss': 0.4740638457238674, 'bleu': 14.5465, 'gen_len': 6.8288}




 59%|█████▉    | 276/465 [1:02:12<42:06, 13.37s/it]

For epoch 768: {Learning rate: [0.0015773272460256986]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.11798428834938421, 'test_loss': 0.4741910383105278, 'bleu': 13.2714, 'gen_len': 7.411}




 60%|█████▉    | 277/465 [1:02:25<42:10, 13.46s/it]

For epoch 769: {Learning rate: [0.001567807090411529]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.11631232481904147, 'test_loss': 0.4815141066908836, 'bleu': 12.8904, 'gen_len': 7.0137}




 60%|█████▉    | 278/465 [1:02:39<42:03, 13.49s/it]

For epoch 770: {Learning rate: [0.0015582869347973596]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.11780754113342704, 'test_loss': 0.465433981269598, 'bleu': 14.0249, 'gen_len': 6.6712}




 60%|██████    | 279/465 [1:02:52<41:42, 13.45s/it]

For epoch 771: {Learning rate: [0.00154876677918319]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.11583063297155427, 'test_loss': 0.4709685042500496, 'bleu': 13.7003, 'gen_len': 7.4247}




 60%|██████    | 280/465 [1:03:06<41:18, 13.40s/it]

For epoch 772: {Learning rate: [0.0015392466235690204]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.11647907862576043, 'test_loss': 0.4830322444438934, 'bleu': 13.9075, 'gen_len': 7.6027}




 60%|██████    | 281/465 [1:03:20<42:09, 13.75s/it]

For epoch 773: {Learning rate: [0.001529726467954851]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.11599499895805265, 'test_loss': 0.4817263513803482, 'bleu': 13.1233, 'gen_len': 7.4452}




 61%|██████    | 282/465 [1:03:33<41:36, 13.64s/it]

For epoch 774: {Learning rate: [0.0015202063123406814]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.11619570942186727, 'test_loss': 0.47458558306097987, 'bleu': 13.548, 'gen_len': 7.8288}




 61%|██████    | 283/465 [1:03:47<41:24, 13.65s/it]

For epoch 775: {Learning rate: [0.0015106861567265118]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.11310995424666055, 'test_loss': 0.4779780961573124, 'bleu': 14.1713, 'gen_len': 7.7397}




 61%|██████    | 284/465 [1:04:00<40:49, 13.53s/it]

For epoch 776: {Learning rate: [0.0015011660011123424]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.04batches/s]



Metrics: {'train_loss': 0.11397425503265567, 'test_loss': 0.47016144432127477, 'bleu': 11.2582, 'gen_len': 7.911}




 61%|██████▏   | 285/465 [1:04:14<40:27, 13.49s/it]

For epoch 777: {Learning rate: [0.0014916458454981728]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.11390834665153085, 'test_loss': 0.4842330664396286, 'bleu': 13.2183, 'gen_len': 7.6781}




 62%|██████▏   | 286/465 [1:04:27<40:02, 13.42s/it]

For epoch 778: {Learning rate: [0.0014821256898840034]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.1129870322055933, 'test_loss': 0.47177321314811704, 'bleu': 13.4432, 'gen_len': 7.3151}




 62%|██████▏   | 287/465 [1:04:41<39:51, 13.44s/it]

For epoch 779: {Learning rate: [0.0014726055342698338]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.11232011310937928, 'test_loss': 0.4773030035197735, 'bleu': 16.0436, 'gen_len': 7.4795}




 62%|██████▏   | 288/465 [1:04:55<40:10, 13.62s/it]

For epoch 780: {Learning rate: [0.0014630853786556642]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.11267229042402128, 'test_loss': 0.4683572746813297, 'bleu': 13.9343, 'gen_len': 7.9384}




 62%|██████▏   | 289/465 [1:05:08<39:44, 13.55s/it]

For epoch 781: {Learning rate: [0.0014535652230414948]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.1118881947747091, 'test_loss': 0.477932570874691, 'bleu': 11.8389, 'gen_len': 8.1096}




 62%|██████▏   | 290/465 [1:05:21<39:27, 13.53s/it]

For epoch 782: {Learning rate: [0.0014440450674273251]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.1106608006648901, 'test_loss': 0.4602737993001938, 'bleu': 14.3435, 'gen_len': 7.9521}




 63%|██████▎   | 291/465 [1:05:35<39:08, 13.50s/it]

For epoch 783: {Learning rate: [0.0014345249118131555]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.11000163380692644, 'test_loss': 0.4829005554318428, 'bleu': 12.3165, 'gen_len': 7.9315}




 63%|██████▎   | 292/465 [1:05:48<38:46, 13.45s/it]

For epoch 784: {Learning rate: [0.0014250047561989861]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.11069719384356243, 'test_loss': 0.4875015884637833, 'bleu': 14.2518, 'gen_len': 7.2329}




 63%|██████▎   | 293/465 [1:06:02<38:43, 13.51s/it]

For epoch 785: {Learning rate: [0.0014154846005848165]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.10891001071871781, 'test_loss': 0.47391567640006543, 'bleu': 13.6963, 'gen_len': 7.4041}




 63%|██████▎   | 294/465 [1:06:15<38:18, 13.44s/it]

For epoch 786: {Learning rate: [0.0014059644449706471]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.10756886205295237, 'test_loss': 0.48556589931249616, 'bleu': 14.5242, 'gen_len': 7.5685}




 63%|██████▎   | 295/465 [1:06:29<38:02, 13.43s/it]

For epoch 787: {Learning rate: [0.0013964442893564775]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.10961975793286068, 'test_loss': 0.4719475869089365, 'bleu': 14.4906, 'gen_len': 7.5616}




 64%|██████▎   | 296/465 [1:06:42<37:44, 13.40s/it]

For epoch 788: {Learning rate: [0.001386924133742308]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.10619623232178571, 'test_loss': 0.46680501028895377, 'bleu': 12.9206, 'gen_len': 7.5411}




 64%|██████▍   | 297/465 [1:06:55<37:36, 13.43s/it]

For epoch 789: {Learning rate: [0.0013774039781281385]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.09batches/s]



Metrics: {'train_loss': 0.10864106329475962, 'test_loss': 0.4780521087348461, 'bleu': 12.2376, 'gen_len': 7.9658}




 64%|██████▍   | 298/465 [1:07:09<37:16, 13.39s/it]

For epoch 790: {Learning rate: [0.001367883822513969]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.10840476340636974, 'test_loss': 0.4819617122411728, 'bleu': 13.3592, 'gen_len': 7.1781}




 64%|██████▍   | 299/465 [1:07:22<37:03, 13.39s/it]

For epoch 791: {Learning rate: [0.0013583636668997993]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.1058458767649604, 'test_loss': 0.4798515260219574, 'bleu': 14.1272, 'gen_len': 7.4863}




 65%|██████▍   | 300/465 [1:07:35<36:49, 13.39s/it]

For epoch 792: {Learning rate: [0.00134884351128563]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.10546652954526065, 'test_loss': 0.47774757742881774, 'bleu': 15.1877, 'gen_len': 7.5753}




 65%|██████▍   | 301/465 [1:07:49<36:31, 13.36s/it]

For epoch 793: {Learning rate: [0.0013393233556714603]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.02batches/s]



Metrics: {'train_loss': 0.10555717494429612, 'test_loss': 0.4759845580905676, 'bleu': 12.3872, 'gen_len': 8.2123}




 65%|██████▍   | 302/465 [1:08:02<36:29, 13.43s/it]

For epoch 794: {Learning rate: [0.0013298032000572909]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.00batches/s]



Metrics: {'train_loss': 0.10578792814801379, 'test_loss': 0.46910007745027543, 'bleu': 14.6837, 'gen_len': 7.7671}




 65%|██████▌   | 303/465 [1:08:16<36:17, 13.44s/it]

For epoch 795: {Learning rate: [0.0013202830444431213]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.10557273466412614, 'test_loss': 0.4706326358020306, 'bleu': 13.6337, 'gen_len': 7.5205}




 65%|██████▌   | 304/465 [1:08:30<36:17, 13.53s/it]

For epoch 796: {Learning rate: [0.0013107628888289517]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.10378658662482006, 'test_loss': 0.4760622769594193, 'bleu': 12.1538, 'gen_len': 8.2123}




 66%|██████▌   | 305/465 [1:08:43<36:09, 13.56s/it]

For epoch 797: {Learning rate: [0.0013012427332147823]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.10426516521994661, 'test_loss': 0.4631235498934984, 'bleu': 15.6598, 'gen_len': 7.1096}




 66%|██████▌   | 306/465 [1:08:57<35:54, 13.55s/it]

For epoch 798: {Learning rate: [0.0012917225776006127]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.1029903439850342, 'test_loss': 0.46238066293299196, 'bleu': 10.287, 'gen_len': 8.274}




 66%|██████▌   | 307/465 [1:09:10<35:34, 13.51s/it]

For epoch 799: {Learning rate: [0.001282202421986443]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.07batches/s]



Metrics: {'train_loss': 0.10237670017451775, 'test_loss': 0.4733426637947559, 'bleu': 14.0896, 'gen_len': 7.3014}




 66%|██████▌   | 308/465 [1:09:23<35:13, 13.46s/it]

For epoch 800: {Learning rate: [0.0012726822663722737]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.98batches/s]



Metrics: {'train_loss': 0.10076505999739577, 'test_loss': 0.4843521811068058, 'bleu': 13.7835, 'gen_len': 7.6438}




 66%|██████▋   | 309/465 [1:09:37<34:56, 13.44s/it]

For epoch 801: {Learning rate: [0.001263162110758104]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.08batches/s]



Metrics: {'train_loss': 0.10098263893912478, 'test_loss': 0.47671916969120504, 'bleu': 11.6261, 'gen_len': 7.9041}




 67%|██████▋   | 310/465 [1:09:50<34:39, 13.41s/it]

For epoch 802: {Learning rate: [0.0012536419551439344]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.10235618527342634, 'test_loss': 0.4742803663015366, 'bleu': 14.8048, 'gen_len': 7.5616}




 67%|██████▋   | 311/465 [1:10:04<34:27, 13.42s/it]

For epoch 803: {Learning rate: [0.001244121799529765]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.10238548277354823, 'test_loss': 0.4777773402631283, 'bleu': 13.5695, 'gen_len': 7.6781}




 67%|██████▋   | 312/465 [1:10:17<34:09, 13.39s/it]

For epoch 804: {Learning rate: [0.0012346016439155954]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.09976906665578121, 'test_loss': 0.4690069776028395, 'bleu': 10.9211, 'gen_len': 8.2534}




 67%|██████▋   | 313/465 [1:10:30<33:59, 13.42s/it]

For epoch 805: {Learning rate: [0.001225081488301426]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.11batches/s]



Metrics: {'train_loss': 0.0999587216391796, 'test_loss': 0.4698702085763216, 'bleu': 13.3263, 'gen_len': 7.7055}




 68%|██████▊   | 314/465 [1:10:44<33:43, 13.40s/it]

For epoch 806: {Learning rate: [0.0012155613326872564]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.09798203990226839, 'test_loss': 0.47811277210712433, 'bleu': 10.7994, 'gen_len': 8.2534}




 68%|██████▊   | 315/465 [1:10:57<33:28, 13.39s/it]

For epoch 807: {Learning rate: [0.0012060411770730868]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.09978703845564912, 'test_loss': 0.48117906525731086, 'bleu': 12.3272, 'gen_len': 7.1918}




 68%|██████▊   | 316/465 [1:11:11<33:13, 13.38s/it]

For epoch 808: {Learning rate: [0.0011965210214589174]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.09816181496149157, 'test_loss': 0.4758201539516449, 'bleu': 13.6503, 'gen_len': 7.8082}




 68%|██████▊   | 317/465 [1:11:24<32:55, 13.35s/it]

For epoch 809: {Learning rate: [0.0011870008658447478]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.05batches/s]



Metrics: {'train_loss': 0.09689204876379269, 'test_loss': 0.47361025139689444, 'bleu': 13.1363, 'gen_len': 7.6918}




 68%|██████▊   | 318/465 [1:11:37<32:45, 13.37s/it]

For epoch 810: {Learning rate: [0.0011774807102305782]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.09650245218015299, 'test_loss': 0.47226928286254405, 'bleu': 12.7882, 'gen_len': 7.5548}




 69%|██████▊   | 319/465 [1:11:51<32:35, 13.40s/it]

For epoch 811: {Learning rate: [0.0011679605546164088]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.10batches/s]



Metrics: {'train_loss': 0.09721237984372348, 'test_loss': 0.4826685801148415, 'bleu': 15.9046, 'gen_len': 6.7877}




 69%|██████▉   | 320/465 [1:12:04<32:16, 13.36s/it]

For epoch 812: {Learning rate: [0.001158440399002239]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.09666610109369929, 'test_loss': 0.4755272876471281, 'bleu': 11.3974, 'gen_len': 7.6233}




 69%|██████▉   | 321/465 [1:12:17<32:03, 13.36s/it]

For epoch 813: {Learning rate: [0.0011489202433880694]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.09605044235543507, 'test_loss': 0.4839719347655773, 'bleu': 13.1057, 'gen_len': 7.4795}




 69%|██████▉   | 322/465 [1:12:31<31:53, 13.38s/it]

For epoch 814: {Learning rate: [0.0011394000877739]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.09489600651147889, 'test_loss': 0.4858357414603233, 'bleu': 13.1241, 'gen_len': 7.7055}




 69%|██████▉   | 323/465 [1:12:44<31:48, 13.44s/it]

For epoch 815: {Learning rate: [0.0011298799321597303]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.03batches/s]



Metrics: {'train_loss': 0.09545839159953885, 'test_loss': 0.4824264977127314, 'bleu': 13.0943, 'gen_len': 7.637}




 70%|██████▉   | 324/465 [1:12:58<31:34, 13.44s/it]

For epoch 816: {Learning rate: [0.001120359776545561]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.09414439521184782, 'test_loss': 0.47512370608747007, 'bleu': 17.3903, 'gen_len': 6.9247}




 70%|██████▉   | 325/465 [1:13:12<31:49, 13.64s/it]

For epoch 817: {Learning rate: [0.0011108396209313913]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.13batches/s]



Metrics: {'train_loss': 0.09405788052372815, 'test_loss': 0.4773532098159194, 'bleu': 15.5249, 'gen_len': 7.3014}




 70%|███████   | 326/465 [1:13:25<31:26, 13.57s/it]

For epoch 818: {Learning rate: [0.001101319465317222]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.09498366150187283, 'test_loss': 0.481505611166358, 'bleu': 15.3793, 'gen_len': 7.5685}




 70%|███████   | 327/465 [1:13:39<31:20, 13.63s/it]

For epoch 819: {Learning rate: [0.0010917993097030523]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.0921851965348895, 'test_loss': 0.48632284328341485, 'bleu': 12.6193, 'gen_len': 7.2123}




 71%|███████   | 328/465 [1:13:53<31:03, 13.60s/it]

For epoch 820: {Learning rate: [0.001082279154088883]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.09167593590370039, 'test_loss': 0.48651455491781237, 'bleu': 13.9088, 'gen_len': 7.3288}




 71%|███████   | 329/465 [1:14:06<30:51, 13.61s/it]

For epoch 821: {Learning rate: [0.0010727589984747133]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.01batches/s]



Metrics: {'train_loss': 0.09117229700815387, 'test_loss': 0.4758783593773842, 'bleu': 13.6997, 'gen_len': 7.7808}




 71%|███████   | 330/465 [1:14:20<30:26, 13.53s/it]

For epoch 822: {Learning rate: [0.0010632388428605437]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.09107910678154085, 'test_loss': 0.48061547577381136, 'bleu': 14.0502, 'gen_len': 7.3699}




 71%|███████   | 331/465 [1:14:33<30:12, 13.53s/it]

For epoch 823: {Learning rate: [0.0010537186872463743]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.0909470392436516, 'test_loss': 0.4721619591116905, 'bleu': 11.9298, 'gen_len': 7.4658}




 71%|███████▏  | 332/465 [1:14:47<30:06, 13.59s/it]

For epoch 824: {Learning rate: [0.0010441985316322047]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.0889161410855084, 'test_loss': 0.4733009420335293, 'bleu': 15.334, 'gen_len': 7.6164}




 72%|███████▏  | 333/465 [1:15:00<29:49, 13.55s/it]

For epoch 825: {Learning rate: [0.0010346783760180353]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.99batches/s]



Metrics: {'train_loss': 0.08708403350376501, 'test_loss': 0.4878332309424877, 'bleu': 13.806, 'gen_len': 7.4521}




 72%|███████▏  | 334/465 [1:15:14<29:32, 13.53s/it]

For epoch 826: {Learning rate: [0.0010251582204038657]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.0883695740343594, 'test_loss': 0.4798432968556881, 'bleu': 16.305, 'gen_len': 7.5753}




 72%|███████▏  | 335/465 [1:15:27<29:18, 13.53s/it]

For epoch 827: {Learning rate: [0.001015638064789696]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  3.12batches/s]



Metrics: {'train_loss': 0.08869103742081945, 'test_loss': 0.47524032443761827, 'bleu': 14.4524, 'gen_len': 7.3288}




 72%|███████▏  | 336/465 [1:15:41<28:54, 13.44s/it]

For epoch 828: {Learning rate: [0.0010061179091755267]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.0873980404036801, 'test_loss': 0.4736367747187614, 'bleu': 15.4729, 'gen_len': 7.8219}




 72%|███████▏  | 337/465 [1:15:54<28:42, 13.45s/it]

For epoch 829: {Learning rate: [0.000996597753561357]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.41batches/s]



Metrics: {'train_loss': 0.08664811502506094, 'test_loss': 0.4798587366938591, 'bleu': 13.1159, 'gen_len': 7.8425}




 73%|███████▎  | 338/465 [1:16:08<28:58, 13.69s/it]

For epoch 830: {Learning rate: [0.0009870775979471875]}


Train batch number 40: 100%|██████████| 41/41 [1:15:37<00:00, 110.66s/batches]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.93batches/s]



Metrics: {'train_loss': 0.0857142438975776, 'test_loss': 0.46965294852852824, 'bleu': 13.9191, 'gen_len': 7.8904}




 73%|███████▎  | 339/465 [2:31:52<48:02:45, 1372.74s/it]

For epoch 831: {Learning rate: [0.000977557442333018]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.08529503516307692, 'test_loss': 0.47968339547514915, 'bleu': 16.2075, 'gen_len': 7.0548}




 73%|███████▎  | 340/465 [2:32:06<33:30:50, 965.20s/it]

For epoch 832: {Learning rate: [0.0009680372867188485]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.23batches/s]



Metrics: {'train_loss': 0.08565895350241079, 'test_loss': 0.47420374378561975, 'bleu': 13.9429, 'gen_len': 7.363}




 73%|███████▎  | 341/465 [2:32:20<23:24:38, 679.66s/it]

For epoch 833: {Learning rate: [0.000958517131104679]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.26batches/s]



Metrics: {'train_loss': 0.08427664074229031, 'test_loss': 0.49096110835671425, 'bleu': 14.9608, 'gen_len': 7.3904}




 74%|███████▎  | 342/465 [2:32:33<16:23:32, 479.77s/it]

For epoch 834: {Learning rate: [0.0009489969754905094]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.25batches/s]



Metrics: {'train_loss': 0.08428882543997067, 'test_loss': 0.475429505109787, 'bleu': 14.7337, 'gen_len': 7.6301}




 74%|███████▍  | 343/465 [2:32:47<11:31:30, 340.09s/it]

For epoch 835: {Learning rate: [0.0009394768198763397]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.0830420298365558, 'test_loss': 0.48123000711202624, 'bleu': 14.2233, 'gen_len': 7.3562}




 74%|███████▍  | 344/465 [2:33:00<8:08:02, 242.01s/it]

For epoch 836: {Learning rate: [0.0009299566642621702]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.22batches/s]



Metrics: {'train_loss': 0.08205621516922625, 'test_loss': 0.4892395235598087, 'bleu': 15.4305, 'gen_len': 7.4178}




 74%|███████▍  | 345/465 [2:33:14<5:47:00, 173.51s/it]

For epoch 837: {Learning rate: [0.0009204365086480007]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.18batches/s]



Metrics: {'train_loss': 0.08347986084295482, 'test_loss': 0.47844487354159354, 'bleu': 14.3169, 'gen_len': 7.6986}




 74%|███████▍  | 346/465 [2:33:28<4:09:06, 125.60s/it]

For epoch 838: {Learning rate: [0.0009109163530338312]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.21batches/s]



Metrics: {'train_loss': 0.08115308286576736, 'test_loss': 0.46395729333162306, 'bleu': 15.4841, 'gen_len': 7.363}




 75%|███████▍  | 347/465 [2:33:42<3:01:07, 92.10s/it]

For epoch 839: {Learning rate: [0.0009013961974196616]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.11batches/s]



Metrics: {'train_loss': 0.08088003662301273, 'test_loss': 0.48978727012872697, 'bleu': 13.2687, 'gen_len': 7.2123}




 75%|███████▍  | 348/465 [2:33:56<2:13:52, 68.65s/it]

For epoch 840: {Learning rate: [0.0008918760418054921]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.08093988477456861, 'test_loss': 0.4979137137532234, 'bleu': 15.0846, 'gen_len': 7.3425}




 75%|███████▌  | 349/465 [2:34:09<1:40:42, 52.09s/it]

For epoch 841: {Learning rate: [0.0008823558861913226]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.22batches/s]



Metrics: {'train_loss': 0.0798460120653234, 'test_loss': 0.4762627586722374, 'bleu': 15.5899, 'gen_len': 7.0685}




 75%|███████▌  | 350/465 [2:34:23<1:17:52, 40.63s/it]

For epoch 842: {Learning rate: [0.0008728357305771531]}


Train batch number 40: 100%|██████████| 41/41 [00:07<00:00,  5.21batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.27batches/s]



Metrics: {'train_loss': 0.08089319280371433, 'test_loss': 0.48484230116009713, 'bleu': 16.6493, 'gen_len': 6.7397}




 75%|███████▌  | 351/465 [2:34:37<1:02:03, 32.66s/it]

For epoch 843: {Learning rate: [0.0008633155749629835]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  5.00batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.01batches/s]



Metrics: {'train_loss': 0.0782855296643769, 'test_loss': 0.484022019803524, 'bleu': 14.3324, 'gen_len': 7.3904}




 76%|███████▌  | 352/465 [2:34:52<51:27, 27.33s/it]

For epoch 844: {Learning rate: [0.000853795419348814]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.21batches/s]



Metrics: {'train_loss': 0.07809639149686186, 'test_loss': 0.4834129996597767, 'bleu': 16.3846, 'gen_len': 7.226}




 76%|███████▌  | 353/465 [2:35:07<44:07, 23.63s/it]

For epoch 845: {Learning rate: [0.0008442752637346445]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.28batches/s]



Metrics: {'train_loss': 0.07812871247893427, 'test_loss': 0.47279936671257017, 'bleu': 17.477, 'gen_len': 7.0822}




 76%|███████▌  | 354/465 [2:35:25<40:25, 21.85s/it]

For epoch 846: {Learning rate: [0.000834755108120475]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.25batches/s]



Metrics: {'train_loss': 0.0759932490565428, 'test_loss': 0.47326589971780775, 'bleu': 16.7252, 'gen_len': 7.5137}




 76%|███████▋  | 355/465 [2:35:40<36:14, 19.77s/it]

For epoch 847: {Learning rate: [0.0008252349525063054]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.84batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.0776048844180456, 'test_loss': 0.476935949921608, 'bleu': 16.5326, 'gen_len': 7.1644}




 77%|███████▋  | 356/465 [2:35:54<32:49, 18.07s/it]

For epoch 848: {Learning rate: [0.0008157147968921359]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.0765787257653911, 'test_loss': 0.49591001570224763, 'bleu': 15.9787, 'gen_len': 7.0753}




 77%|███████▋  | 357/465 [2:36:08<30:27, 16.92s/it]

For epoch 849: {Learning rate: [0.0008061946412779664]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.76batches/s]



Metrics: {'train_loss': 0.07513603468130274, 'test_loss': 0.4994076430797577, 'bleu': 16.1369, 'gen_len': 7.6027}




 77%|███████▋  | 358/465 [2:36:22<28:29, 15.98s/it]

For epoch 850: {Learning rate: [0.0007966744856637969]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.07520287547533105, 'test_loss': 0.5019745662808418, 'bleu': 15.4525, 'gen_len': 7.2945}




 77%|███████▋  | 359/465 [2:36:36<27:07, 15.36s/it]

For epoch 851: {Learning rate: [0.0007871543300496273]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.07370492252634793, 'test_loss': 0.500941850990057, 'bleu': 14.0999, 'gen_len': 7.0137}




 77%|███████▋  | 360/465 [2:36:50<26:12, 14.97s/it]

For epoch 852: {Learning rate: [0.0007776341744354578]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.0745862343507569, 'test_loss': 0.49228243827819823, 'bleu': 16.9421, 'gen_len': 7.2534}




 78%|███████▊  | 361/465 [2:37:04<25:35, 14.76s/it]

For epoch 853: {Learning rate: [0.0007681140188212882]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.07264774846957951, 'test_loss': 0.4891134977340698, 'bleu': 15.5262, 'gen_len': 7.3356}




 78%|███████▊  | 362/465 [2:37:19<25:29, 14.85s/it]

For epoch 854: {Learning rate: [0.0007585938632071187]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.54batches/s]



Metrics: {'train_loss': 0.07224466669850232, 'test_loss': 0.4769108723849058, 'bleu': 14.6734, 'gen_len': 7.1233}




 78%|███████▊  | 363/465 [2:37:34<25:12, 14.82s/it]

For epoch 855: {Learning rate: [0.0007490737075929491]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.10batches/s]



Metrics: {'train_loss': 0.07160468463127206, 'test_loss': 0.4899436078965664, 'bleu': 16.6564, 'gen_len': 7.1438}




 78%|███████▊  | 364/465 [2:37:50<25:20, 15.06s/it]

For epoch 856: {Learning rate: [0.0007395535519787796]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.07166697012215126, 'test_loss': 0.4877410281449556, 'bleu': 15.2854, 'gen_len': 7.7123}




 78%|███████▊  | 365/465 [2:38:04<24:55, 14.96s/it]

For epoch 857: {Learning rate: [0.0007300333963646101]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.25batches/s]



Metrics: {'train_loss': 0.0711963971758761, 'test_loss': 0.48060154020786283, 'bleu': 14.2211, 'gen_len': 7.8767}




 79%|███████▊  | 366/465 [2:38:19<24:38, 14.94s/it]

For epoch 858: {Learning rate: [0.0007205132407504405]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.89batches/s]



Metrics: {'train_loss': 0.06901114079646947, 'test_loss': 0.4916338637471199, 'bleu': 16.9139, 'gen_len': 7.7329}




 79%|███████▉  | 367/465 [2:38:35<25:00, 15.31s/it]

For epoch 859: {Learning rate: [0.000710993085136271]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.07011398246012084, 'test_loss': 0.4863159589469433, 'bleu': 15.2002, 'gen_len': 7.3356}




 79%|███████▉  | 368/465 [2:38:50<24:17, 15.02s/it]

For epoch 860: {Learning rate: [0.0007014729295221015]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.0700280054858545, 'test_loss': 0.4829140957444906, 'bleu': 16.9446, 'gen_len': 7.226}




 79%|███████▉  | 369/465 [2:39:04<23:42, 14.82s/it]

For epoch 861: {Learning rate: [0.000691952773907932]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.31batches/s]



Metrics: {'train_loss': 0.06977071513126536, 'test_loss': 0.48601757287979125, 'bleu': 15.8084, 'gen_len': 7.5274}




 80%|███████▉  | 370/465 [2:39:19<23:26, 14.80s/it]

For epoch 862: {Learning rate: [0.0006824326182937624]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.06793186759076468, 'test_loss': 0.4853790566325188, 'bleu': 15.6125, 'gen_len': 7.7945}




 80%|███████▉  | 371/465 [2:39:33<23:05, 14.74s/it]

For epoch 863: {Learning rate: [0.0006729124626795929]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.0671660977347595, 'test_loss': 0.48577880635857584, 'bleu': 15.2969, 'gen_len': 7.5274}




 80%|████████  | 372/465 [2:39:49<23:02, 14.87s/it]

For epoch 864: {Learning rate: [0.0006633923070654234]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.07batches/s]



Metrics: {'train_loss': 0.06630894868839078, 'test_loss': 0.49508751183748245, 'bleu': 17.1367, 'gen_len': 7.2055}




 80%|████████  | 373/465 [2:40:04<23:15, 15.16s/it]

For epoch 865: {Learning rate: [0.0006538721514512539]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.30batches/s]



Metrics: {'train_loss': 0.06712320301590896, 'test_loss': 0.4978281155228615, 'bleu': 17.1239, 'gen_len': 7.4726}




 80%|████████  | 374/465 [2:40:19<22:56, 15.13s/it]

For epoch 866: {Learning rate: [0.0006443519958370843]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.06534045662094908, 'test_loss': 0.4904784630984068, 'bleu': 19.2318, 'gen_len': 7.3151}




 81%|████████  | 375/465 [2:40:35<22:48, 15.20s/it]

For epoch 867: {Learning rate: [0.0006348318402229148]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.06578655640889959, 'test_loss': 0.49008971862494943, 'bleu': 17.7603, 'gen_len': 7.2671}




 81%|████████  | 376/465 [2:40:49<22:09, 14.94s/it]

For epoch 868: {Learning rate: [0.0006253116846087453]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.14batches/s]



Metrics: {'train_loss': 0.0643590823179338, 'test_loss': 0.48196091279387476, 'bleu': 16.4384, 'gen_len': 7.1438}




 81%|████████  | 377/465 [2:41:04<22:01, 15.02s/it]

For epoch 869: {Learning rate: [0.0006157915289945758]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.06478889913457196, 'test_loss': 0.4819104313850403, 'bleu': 17.9236, 'gen_len': 7.1918}




 81%|████████▏ | 378/465 [2:41:20<22:14, 15.33s/it]

For epoch 870: {Learning rate: [0.0006062713733804062]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.07batches/s]



Metrics: {'train_loss': 0.06406808399209162, 'test_loss': 0.4913905203342438, 'bleu': 16.1108, 'gen_len': 7.4726}




 82%|████████▏ | 379/465 [2:41:36<22:17, 15.55s/it]

For epoch 871: {Learning rate: [0.0005967512177662366]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.12batches/s]



Metrics: {'train_loss': 0.0637647295143546, 'test_loss': 0.49764613956213, 'bleu': 15.9613, 'gen_len': 7.2603}




 82%|████████▏ | 380/465 [2:41:52<21:59, 15.53s/it]

For epoch 872: {Learning rate: [0.0005872310621520671]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.0620396424357484, 'test_loss': 0.4770782269537449, 'bleu': 15.7811, 'gen_len': 7.4726}




 82%|████████▏ | 381/465 [2:42:07<21:24, 15.29s/it]

For epoch 873: {Learning rate: [0.0005777109065378976]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.06271529343070054, 'test_loss': 0.48459879755973817, 'bleu': 15.9145, 'gen_len': 7.5616}




 82%|████████▏ | 382/465 [2:42:21<20:41, 14.95s/it]

For epoch 874: {Learning rate: [0.000568190750923728]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.06228777848002387, 'test_loss': 0.4921753317117691, 'bleu': 15.8257, 'gen_len': 7.2055}




 82%|████████▏ | 383/465 [2:42:35<19:57, 14.60s/it]

For epoch 875: {Learning rate: [0.0005586705953095585]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.0626650947804858, 'test_loss': 0.4856807358562946, 'bleu': 16.4179, 'gen_len': 7.3767}




 83%|████████▎ | 384/465 [2:42:49<19:30, 14.45s/it]

For epoch 876: {Learning rate: [0.0005491504396953889]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.060698864845240987, 'test_loss': 0.48449279367923737, 'bleu': 14.7745, 'gen_len': 7.0479}




 83%|████████▎ | 385/465 [2:43:02<18:57, 14.22s/it]

For epoch 877: {Learning rate: [0.0005396302840812194]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.061339857829053226, 'test_loss': 0.47725576497614386, 'bleu': 15.8704, 'gen_len': 7.3836}




 83%|████████▎ | 386/465 [2:43:16<18:37, 14.14s/it]

For epoch 878: {Learning rate: [0.0005301101284670499]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.061190300780098614, 'test_loss': 0.48565558418631555, 'bleu': 16.6463, 'gen_len': 7.2123}




 83%|████████▎ | 387/465 [2:43:30<18:11, 14.00s/it]

For epoch 879: {Learning rate: [0.0005205899728528804]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.059439176193824626, 'test_loss': 0.5011105753481389, 'bleu': 13.5239, 'gen_len': 6.9452}




 83%|████████▎ | 388/465 [2:43:44<17:56, 13.98s/it]

For epoch 880: {Learning rate: [0.0005110698172387108]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.70batches/s]



Metrics: {'train_loss': 0.058118082401229114, 'test_loss': 0.4857471965253353, 'bleu': 16.0149, 'gen_len': 7.4315}




 84%|████████▎ | 389/465 [2:43:58<17:42, 13.97s/it]

For epoch 881: {Learning rate: [0.0005015496616245413]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.058733239043049695, 'test_loss': 0.48226153925061227, 'bleu': 16.3288, 'gen_len': 7.2877}




 84%|████████▍ | 390/465 [2:44:12<17:25, 13.94s/it]

For epoch 882: {Learning rate: [0.0004920295060103718]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.058051958498431415, 'test_loss': 0.48241626769304274, 'bleu': 16.0692, 'gen_len': 7.3973}




 84%|████████▍ | 391/465 [2:44:25<17:06, 13.87s/it]

For epoch 883: {Learning rate: [0.00048250935039620223]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.057840905901862354, 'test_loss': 0.47732581943273544, 'bleu': 16.7279, 'gen_len': 7.0822}




 84%|████████▍ | 392/465 [2:44:39<16:50, 13.84s/it]

For epoch 884: {Learning rate: [0.00047298919478203273]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.05780617056823358, 'test_loss': 0.47812268435955046, 'bleu': 17.0878, 'gen_len': 7.2397}




 85%|████████▍ | 393/465 [2:44:53<16:38, 13.87s/it]

For epoch 885: {Learning rate: [0.00046346903916786317]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.056162325710785096, 'test_loss': 0.4866900511085987, 'bleu': 17.3652, 'gen_len': 7.363}




 85%|████████▍ | 394/465 [2:45:07<16:25, 13.89s/it]

For epoch 886: {Learning rate: [0.00045394888355369367]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.05605077461861983, 'test_loss': 0.492694029211998, 'bleu': 19.007, 'gen_len': 7.0753}




 85%|████████▍ | 395/465 [2:45:21<16:10, 13.86s/it]

For epoch 887: {Learning rate: [0.00044442872793952406]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.05575355614830808, 'test_loss': 0.4828959070146084, 'bleu': 18.7127, 'gen_len': 7.089}




 85%|████████▌ | 396/465 [2:45:35<15:52, 13.80s/it]

For epoch 888: {Learning rate: [0.0004349085723253545]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.053815669130261354, 'test_loss': 0.48331646993756294, 'bleu': 17.295, 'gen_len': 7.4315}




 85%|████████▌ | 397/465 [2:45:49<15:41, 13.85s/it]

For epoch 889: {Learning rate: [0.000425388416711185]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.05339548664122093, 'test_loss': 0.4912895172834396, 'bleu': 18.2899, 'gen_len': 7.089}




 86%|████████▌ | 398/465 [2:46:02<15:29, 13.88s/it]

For epoch 890: {Learning rate: [0.00041586826109701544]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.05476331692643282, 'test_loss': 0.4916199021041393, 'bleu': 16.0954, 'gen_len': 7.6849}




 86%|████████▌ | 399/465 [2:46:16<15:16, 13.88s/it]

For epoch 891: {Learning rate: [0.00040634810548284593]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.054152475897131895, 'test_loss': 0.4876385651528835, 'bleu': 19.0838, 'gen_len': 7.1164}




 86%|████████▌ | 400/465 [2:46:30<14:54, 13.77s/it]

For epoch 892: {Learning rate: [0.0003968279498686764]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.05396272105778136, 'test_loss': 0.4880983307957649, 'bleu': 17.2678, 'gen_len': 7.4041}




 86%|████████▌ | 401/465 [2:46:44<14:38, 13.73s/it]

For epoch 893: {Learning rate: [0.0003873077942545069]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.05268306094335347, 'test_loss': 0.48119195476174353, 'bleu': 17.2346, 'gen_len': 7.3904}




 86%|████████▋ | 402/465 [2:46:57<14:24, 13.72s/it]

For epoch 894: {Learning rate: [0.0003777876386403373]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.05159034870746659, 'test_loss': 0.48325358219444753, 'bleu': 18.9731, 'gen_len': 7.3288}




 87%|████████▋ | 403/465 [2:47:11<14:11, 13.73s/it]

For epoch 895: {Learning rate: [0.0003682674830261678]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.051801181239325825, 'test_loss': 0.48917583413422105, 'bleu': 16.9378, 'gen_len': 6.863}




 87%|████████▋ | 404/465 [2:47:25<13:57, 13.73s/it]

For epoch 896: {Learning rate: [0.00035874732741199826]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.05198569250542943, 'test_loss': 0.4783700253814459, 'bleu': 17.084, 'gen_len': 7.1575}




 87%|████████▋ | 405/465 [2:47:38<13:40, 13.68s/it]

For epoch 897: {Learning rate: [0.00034922717179782875]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.05095891490942094, 'test_loss': 0.4816520527005196, 'bleu': 17.4435, 'gen_len': 6.8699}




 87%|████████▋ | 406/465 [2:47:52<13:26, 13.67s/it]

For epoch 898: {Learning rate: [0.0003397070161836592]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.05019778185864774, 'test_loss': 0.4771391671150923, 'bleu': 18.8357, 'gen_len': 7.1918}




 88%|████████▊ | 407/465 [2:48:06<13:12, 13.67s/it]

For epoch 899: {Learning rate: [0.0003301868605694897]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.04917741548724291, 'test_loss': 0.484659905359149, 'bleu': 18.4211, 'gen_len': 7.2123}




 88%|████████▊ | 408/465 [2:48:19<12:59, 13.67s/it]

For epoch 900: {Learning rate: [0.00032066670495532013]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.35batches/s]



Metrics: {'train_loss': 0.0487464181170231, 'test_loss': 0.48449203819036485, 'bleu': 18.0332, 'gen_len': 7.2466}




 88%|████████▊ | 409/465 [2:48:34<12:59, 13.92s/it]

For epoch 901: {Learning rate: [0.00031114654934115063]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.04841474843461339, 'test_loss': 0.49076181203126906, 'bleu': 16.2351, 'gen_len': 7.5274}




 88%|████████▊ | 410/465 [2:48:48<12:43, 13.89s/it]

For epoch 902: {Learning rate: [0.0003016263937269811]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.04799142766107873, 'test_loss': 0.4916473224759102, 'bleu': 14.5576, 'gen_len': 7.8151}




 88%|████████▊ | 411/465 [2:49:01<12:24, 13.79s/it]

For epoch 903: {Learning rate: [0.00029210623811281157]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.04790308744442172, 'test_loss': 0.4922840271145105, 'bleu': 17.3941, 'gen_len': 7.5274}




 89%|████████▊ | 412/465 [2:49:15<12:08, 13.75s/it]

For epoch 904: {Learning rate: [0.000282586082498642]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.046825293724129836, 'test_loss': 0.5037984751164913, 'bleu': 19.0851, 'gen_len': 7.1712}




 89%|████████▉ | 413/465 [2:49:28<11:52, 13.71s/it]

For epoch 905: {Learning rate: [0.00027306592688447246]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.04701281970412266, 'test_loss': 0.49638362377882006, 'bleu': 17.817, 'gen_len': 7.6096}




 89%|████████▉ | 414/465 [2:49:42<11:35, 13.64s/it]

For epoch 906: {Learning rate: [0.00026354577127030295]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.046252395548835035, 'test_loss': 0.4922282934188843, 'bleu': 17.4496, 'gen_len': 7.4041}




 89%|████████▉ | 415/465 [2:49:56<11:24, 13.69s/it]

For epoch 907: {Learning rate: [0.0002540256156561334]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.04702457272243209, 'test_loss': 0.4867952026426792, 'bleu': 17.4104, 'gen_len': 7.4521}




 89%|████████▉ | 416/465 [2:50:10<11:13, 13.75s/it]

For epoch 908: {Learning rate: [0.0002445054600419639]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.69batches/s]



Metrics: {'train_loss': 0.045615463176878486, 'test_loss': 0.4924787972122431, 'bleu': 16.744, 'gen_len': 7.3425}




 90%|████████▉ | 417/465 [2:50:24<11:06, 13.89s/it]

For epoch 909: {Learning rate: [0.00023498530442779433]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.04506971073768488, 'test_loss': 0.49520233646035194, 'bleu': 17.2942, 'gen_len': 6.9521}




 90%|████████▉ | 418/465 [2:50:38<10:51, 13.86s/it]

For epoch 910: {Learning rate: [0.0002254651488136248]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.045169044013430436, 'test_loss': 0.48787569999694824, 'bleu': 18.3365, 'gen_len': 7.2123}




 90%|█████████ | 419/465 [2:50:51<10:32, 13.76s/it]

For epoch 911: {Learning rate: [0.00021594499319945527]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.04471765504014201, 'test_loss': 0.4912487160414457, 'bleu': 17.2778, 'gen_len': 7.3082}




 90%|█████████ | 420/465 [2:51:05<10:16, 13.71s/it]

For epoch 912: {Learning rate: [0.00020642483758528574]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.04401756791261638, 'test_loss': 0.49336192272603513, 'bleu': 17.8971, 'gen_len': 7.1301}




 91%|█████████ | 421/465 [2:51:18<10:00, 13.65s/it]

For epoch 913: {Learning rate: [0.0001969046819711162]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.04423096862326308, 'test_loss': 0.49227499924600127, 'bleu': 16.982, 'gen_len': 7.089}




 91%|█████████ | 422/465 [2:51:32<09:50, 13.73s/it]

For epoch 914: {Learning rate: [0.00018738452635694666]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.04311833248996153, 'test_loss': 0.4977465867996216, 'bleu': 18.6586, 'gen_len': 7.2329}




 91%|█████████ | 423/465 [2:51:46<09:34, 13.67s/it]

For epoch 915: {Learning rate: [0.00017786437074277713]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.04305561704606545, 'test_loss': 0.48903034403920176, 'bleu': 17.8625, 'gen_len': 7.274}




 91%|█████████ | 424/465 [2:51:59<09:20, 13.67s/it]

For epoch 916: {Learning rate: [0.0001683442151286076]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.042721421344251165, 'test_loss': 0.49401730671525, 'bleu': 16.0133, 'gen_len': 7.1164}




 91%|█████████▏| 425/465 [2:52:13<09:06, 13.65s/it]

For epoch 917: {Learning rate: [0.00015882405951443806]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.04281116127059227, 'test_loss': 0.4912476189434528, 'bleu': 17.0085, 'gen_len': 7.1575}




 92%|█████████▏| 426/465 [2:52:27<08:58, 13.81s/it]

For epoch 918: {Learning rate: [0.00014930390390026853]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.041908496036762145, 'test_loss': 0.4941379681229591, 'bleu': 16.2012, 'gen_len': 7.2123}




 92%|█████████▏| 427/465 [2:52:41<08:43, 13.77s/it]

For epoch 919: {Learning rate: [0.000139783748286099]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.0415153571563523, 'test_loss': 0.4951135773211718, 'bleu': 17.4167, 'gen_len': 7.1712}




 92%|█████████▏| 428/465 [2:52:55<08:30, 13.81s/it]

For epoch 920: {Learning rate: [0.00013026359267192947]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.04152565158721877, 'test_loss': 0.4969640769064426, 'bleu': 16.2489, 'gen_len': 7.1507}




 92%|█████████▏| 429/465 [2:53:08<08:14, 13.75s/it]

For epoch 921: {Learning rate: [0.00012074343705775994]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.04108006825170866, 'test_loss': 0.49578670151531695, 'bleu': 16.8264, 'gen_len': 7.0616}




 92%|█████████▏| 430/465 [2:53:22<08:04, 13.84s/it]

For epoch 922: {Learning rate: [0.0001112232814435904]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.0407974012196064, 'test_loss': 0.49137350991368295, 'bleu': 17.6523, 'gen_len': 7.1986}




 93%|█████████▎| 431/465 [2:53:36<07:47, 13.74s/it]

For epoch 923: {Learning rate: [0.00010170312582942087]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.04067599727976613, 'test_loss': 0.4969899863004684, 'bleu': 16.3712, 'gen_len': 7.1644}




 93%|█████████▎| 432/465 [2:53:50<07:33, 13.73s/it]

For epoch 924: {Learning rate: [9.218297021525134e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.040314022769651764, 'test_loss': 0.4990922976285219, 'bleu': 16.5995, 'gen_len': 7.1096}




 93%|█████████▎| 433/465 [2:54:03<07:19, 13.73s/it]

For epoch 925: {Learning rate: [8.266281460108181e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.040115161549027376, 'test_loss': 0.4985639274120331, 'bleu': 16.2179, 'gen_len': 7.1781}




 93%|█████████▎| 434/465 [2:54:17<07:04, 13.69s/it]

For epoch 926: {Learning rate: [7.314265898691228e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.04008345633018308, 'test_loss': 0.49796501696109774, 'bleu': 17.6115, 'gen_len': 7.089}




 94%|█████████▎| 435/465 [2:54:30<06:49, 13.66s/it]

For epoch 927: {Learning rate: [6.362250337274273e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.039926249942765, 'test_loss': 0.4984249543398619, 'bleu': 17.2608, 'gen_len': 7.1164}




 94%|█████████▍| 436/465 [2:54:44<06:34, 13.62s/it]

For epoch 928: {Learning rate: [5.41023477585732e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.03974258236405326, 'test_loss': 0.4987957011908293, 'bleu': 16.7461, 'gen_len': 7.1781}




 94%|█████████▍| 437/465 [2:54:57<06:20, 13.58s/it]

For epoch 929: {Learning rate: [4.458219214440367e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.039384302978472015, 'test_loss': 0.49757291190326214, 'bleu': 17.1793, 'gen_len': 7.1507}




 94%|█████████▍| 438/465 [2:55:11<06:06, 13.56s/it]

For epoch 930: {Learning rate: [3.506203653023414e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.03904408566290286, 'test_loss': 0.4978446289896965, 'bleu': 17.4016, 'gen_len': 7.1301}




 94%|█████████▍| 439/465 [2:55:25<05:57, 13.76s/it]

For epoch 931: {Learning rate: [2.5541880916064603e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.03891243149594563, 'test_loss': 0.49588767141103746, 'bleu': 17.2997, 'gen_len': 7.1301}




 95%|█████████▍| 440/465 [2:55:39<05:45, 13.83s/it]

For epoch 932: {Learning rate: [1.602172530189507e-05]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.32batches/s]



Metrics: {'train_loss': 0.03909105902946577, 'test_loss': 0.4957246284931898, 'bleu': 17.2568, 'gen_len': 7.1096}




 95%|█████████▍| 441/465 [2:55:54<05:37, 14.05s/it]

For epoch 933: {Learning rate: [6.501569687725536e-06]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.038966659729073684, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 95%|█████████▌| 442/465 [2:56:07<05:20, 13.94s/it]

For epoch 934: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.03884932971218737, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 95%|█████████▌| 443/465 [2:56:21<05:04, 13.85s/it]

For epoch 935: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.82batches/s]



Metrics: {'train_loss': 0.03921581168727177, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 95%|█████████▌| 444/465 [2:56:35<04:50, 13.81s/it]

For epoch 936: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.038489772642894486, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 96%|█████████▌| 445/465 [2:56:48<04:34, 13.74s/it]

For epoch 937: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.03899019525000235, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 96%|█████████▌| 446/465 [2:57:02<04:21, 13.74s/it]

For epoch 938: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.03891002295947656, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 96%|█████████▌| 447/465 [2:57:16<04:06, 13.70s/it]

For epoch 939: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.03850320263243303, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 96%|█████████▋| 448/465 [2:57:29<03:52, 13.68s/it]

For epoch 940: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.03870845104499561, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 97%|█████████▋| 449/465 [2:57:43<03:38, 13.68s/it]

For epoch 941: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.03922688974658164, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 97%|█████████▋| 450/465 [2:57:57<03:25, 13.68s/it]

For epoch 942: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.03888304118157887, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 97%|█████████▋| 451/465 [2:58:10<03:11, 13.68s/it]

For epoch 943: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.038901780936412694, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 97%|█████████▋| 452/465 [2:58:24<02:58, 13.77s/it]

For epoch 944: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.039265907437699595, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 97%|█████████▋| 453/465 [2:58:38<02:46, 13.86s/it]

For epoch 945: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.03909504022903559, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 98%|█████████▊| 454/465 [2:58:53<02:33, 13.98s/it]

For epoch 946: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.0388043451054794, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 98%|█████████▊| 455/465 [2:59:06<02:18, 13.87s/it]

For epoch 947: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.03848912385178775, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 98%|█████████▊| 456/465 [2:59:20<02:04, 13.84s/it]

For epoch 948: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.038943934049911616, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 98%|█████████▊| 457/465 [2:59:34<01:50, 13.81s/it]

For epoch 949: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.0388662581581895, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 98%|█████████▊| 458/465 [2:59:47<01:36, 13.75s/it]

For epoch 950: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:08<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.038915594858003826, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 99%|█████████▊| 459/465 [3:00:01<01:22, 13.75s/it]

For epoch 951: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.0389924706754888, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 99%|█████████▉| 460/465 [3:00:17<01:11, 14.35s/it]

For epoch 952: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.19batches/s]



Metrics: {'train_loss': 0.039222680986291024, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 99%|█████████▉| 461/465 [3:00:33<00:59, 14.86s/it]

For epoch 953: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.08batches/s]



Metrics: {'train_loss': 0.03922341723085904, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




 99%|█████████▉| 462/465 [3:00:49<00:45, 15.08s/it]

For epoch 954: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.18batches/s]



Metrics: {'train_loss': 0.038651127686224335, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




100%|█████████▉| 463/465 [3:01:04<00:30, 15.20s/it]

For epoch 955: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.44batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.10batches/s]



Metrics: {'train_loss': 0.03865489599908271, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




100%|█████████▉| 464/465 [3:01:20<00:15, 15.33s/it]

For epoch 956: {Learning rate: [0.0]}


Train batch number 40: 100%|██████████| 41/41 [00:09<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.14batches/s]



Metrics: {'train_loss': 0.03899736888706684, 'test_loss': 0.4955804590135813, 'bleu': 17.2568, 'gen_len': 7.1096}




100%|██████████| 465/465 [3:01:35<00:00, 23.43s/it]


--------------
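
The per-epoch log above is long, so it can help to extract the best epoch from it programmatically. The sketch below is a minimal, hypothetical helper (`best_epoch` is not part of `wolof_translate`); it assumes the log was saved as plain text with the same `For epoch N:` and `Metrics: {...}` lines that are printed above. The three sample entries are copied from that log.

```python
import ast
import re

# Patterns matching the two kinds of log lines printed by the training loop
EPOCH_RE = re.compile(r"For epoch (\d+):")
METRICS_RE = re.compile(r"Metrics: (\{.*\})")

def best_epoch(log_text, metric="bleu"):
    """Return (epoch, metrics) for the epoch with the highest `metric`.

    Hypothetical helper: each 'Metrics:' line is attributed to the most
    recent 'For epoch N:' line that precedes it.
    """
    current_epoch = None
    best = None
    for line in log_text.splitlines():
        match = EPOCH_RE.search(line)
        if match:
            current_epoch = int(match.group(1))
            continue
        match = METRICS_RE.search(line)
        if match and current_epoch is not None:
            # The metrics are printed as a Python dict literal
            metrics = ast.literal_eval(match.group(1))
            if best is None or metrics[metric] > best[1][metric]:
                best = (current_epoch, metrics)
    return best

# Three entries copied from the log above
sample_log = """
For epoch 903: {Learning rate: [0.00029210623811281157]}
Metrics: {'train_loss': 0.04790308744442172, 'test_loss': 0.4922840271145105, 'bleu': 17.3941, 'gen_len': 7.5274}
For epoch 904: {Learning rate: [0.000282586082498642]}
Metrics: {'train_loss': 0.046825293724129836, 'test_loss': 0.5037984751164913, 'bleu': 19.0851, 'gen_len': 7.1712}
For epoch 905: {Learning rate: [0.00027306592688447246]}
Metrics: {'train_loss': 0.04701281970412266, 'test_loss': 0.49638362377882006, 'bleu': 17.817, 'gen_len': 7.6096}
"""

epoch, metrics = best_epoch(sample_log)
print(epoch, metrics["bleu"])
```

Passing `metric="test_loss"` would select the highest loss, so a minimizing variant would need the comparison reversed.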

### ---

In [7]:
# let us initialize the hyperparameter configuration
config = {
    'random_state': 0,
    'fr_char_p': 0.2725135383958951,
    'fr_word_p': 0.8803557671452257,
    'learning_rate': 0.0007899243330013148,
    'weight_decay': 0.19140839339569107,
    'batch_size': 8,
    'warmup_ratio': 0.0,
    'max_epoch': 927,
    'bleu': 1.9136,
    'model_dir': 'data/checkpoints/fw_t5_small_custom_train_v3_2_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_small_custom_train_results_fw_v3_2/'
}

# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 1, evaluation = evaluation, optimizer=Adafactor)

# split the data
split_data(config['random_state'])

# retrieve the train and test sets
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'])

# let us calculate the total number of training steps and the warmup steps
# (over config['max_epoch'] epochs)
length = len(train_dataset)

n_steps = length // config['batch_size']

num_steps = config['max_epoch'] * n_steps

# the scheduler expects an integer number of warmup steps
warmup_steps = int(num_steps * config['warmup_ratio'])

# Initialize the scheduler parameters
scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    # 'betas': (0.9, 0.98),
    'relative_step': False
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=get_cosine_schedule_with_warmup,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/t5_small_custom_train_fw_v3_2"
                )

# We will resume from checkpoints, so let us load the model
trainer.load(config['model_dir'], load_best=True) # Only for the first loading
# trainer.load(config['new_model_dir'])
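
A note on the step count used for the scheduler in the cell above: `n_steps = length // config['batch_size']` rounds down, whereas a PyTorch `DataLoader` keeps the last partial batch by default (`drop_last=False`). If the dataset size is not a multiple of the batch size, the scheduler is then given slightly fewer training steps than are actually run. The sketch below is only an illustration under that assumption; `schedule_steps` is a hypothetical helper and the dataset size of 1310 is made up for the example.

```python
import math

def schedule_steps(n_samples, batch_size, max_epoch, warmup_ratio, drop_last=False):
    """Return (steps_per_epoch, total_steps, warmup_steps) as integers.

    Hypothetical helper, not part of wolof_translate or transformers.
    """
    if drop_last:
        # The last partial batch is discarded
        steps_per_epoch = n_samples // batch_size
    else:
        # The last partial batch still counts as one optimizer step
        steps_per_epoch = math.ceil(n_samples / batch_size)
    total_steps = max_epoch * steps_per_epoch
    # Schedulers expect an integer number of warmup steps
    warmup_steps = int(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

# Made-up dataset size, chosen so that it is not a multiple of the batch size
steps_per_epoch, total_steps, warmup_steps = schedule_steps(
    n_samples=1310, batch_size=8, max_epoch=927, warmup_ratio=0.0
)
print(steps_per_epoch, total_steps, warmup_steps)
```

With these made-up numbers, rounding up gives 164 steps per epoch, while floor division would give 163; over 927 epochs that difference adds up to several epochs' worth of steps.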

        

### ---

In [8]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 4: {Learning rate: [0.0006099460743927353]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  5.88batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.6670737142969922, 'test_loss': 0.6173947781324387, 'bleu': 0.7582, 'gen_len': 6.9863}




  0%|          | 1/467 [00:35<4:33:30, 35.21s/it]

For epoch 5: {Learning rate: [0.0008132614325236472]}


Train batch number 163: 100%|██████████| 164/164 [00:23<00:00,  6.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.5947909109839579, 'test_loss': 0.5756663888692856, 'bleu': 1.1484, 'gen_len': 6.3151}




  0%|          | 2/467 [01:05<4:10:59, 32.39s/it]

For epoch 6: {Learning rate: [0.001016576790654559]}


Train batch number 163: 100%|██████████| 164/164 [00:23<00:00,  6.92batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.64batches/s]



Metrics: {'train_loss': 0.5204292920122786, 'test_loss': 0.5491259142756462, 'bleu': 1.8947, 'gen_len': 6.1918}




  1%|          | 3/467 [01:43<4:29:11, 34.81s/it]

For epoch 7: {Learning rate: [0.0012198921487854707]}


Train batch number 163: 100%|██████████| 164/164 [00:23<00:00,  7.00batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.16batches/s]



Metrics: {'train_loss': 0.441221484051245, 'test_loss': 0.5286498919129372, 'bleu': 2.5919, 'gen_len': 6.4041}




  1%|          | 4/467 [02:14<4:17:18, 33.35s/it]

For epoch 8: {Learning rate: [0.0014232075069163827]}


Train batch number 163: 100%|██████████| 164/164 [00:24<00:00,  6.79batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.58batches/s]



Metrics: {'train_loss': 0.3674245117822798, 'test_loss': 0.5313937559723854, 'bleu': 2.1825, 'gen_len': 5.9932}




  1%|          | 5/467 [02:44<4:08:35, 32.28s/it]

For epoch 9: {Learning rate: [0.0016265228650472945]}


Train batch number 163: 100%|██████████| 164/164 [00:23<00:00,  7.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.3056399251629667, 'test_loss': 0.5248313203454018, 'bleu': 2.3792, 'gen_len': 6.7877}




  1%|▏         | 6/467 [03:13<3:59:43, 31.20s/it]

For epoch 10: {Learning rate: [0.0018298382231782062]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.2563545144185787, 'test_loss': 0.5134955465793609, 'bleu': 2.9479, 'gen_len': 7.9384}




  1%|▏         | 7/467 [03:50<4:11:50, 32.85s/it]

For epoch 11: {Learning rate: [0.002033153581309118]}


Train batch number 163: 100%|██████████| 164/164 [00:24<00:00,  6.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.21852976806098368, 'test_loss': 0.524639543890953, 'bleu': 6.0248, 'gen_len': 6.7671}




  2%|▏         | 8/467 [04:24<4:14:57, 33.33s/it]

For epoch 12: {Learning rate: [0.00223646893944003]}


Train batch number 163: 100%|██████████| 164/164 [00:26<00:00,  6.11batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.12batches/s]



Metrics: {'train_loss': 0.1945633771942883, 'test_loss': 0.538030031323433, 'bleu': 5.6195, 'gen_len': 6.2534}




  2%|▏         | 9/467 [04:58<4:15:36, 33.49s/it]

For epoch 13: {Learning rate: [0.0024397842975709414]}


Train batch number 163: 100%|██████████| 164/164 [00:29<00:00,  5.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.41batches/s]



Metrics: {'train_loss': 0.18490039284636334, 'test_loss': 0.5214846804738045, 'bleu': 3.7503, 'gen_len': 7.2534}




  2%|▏         | 10/467 [05:34<4:20:17, 34.17s/it]

For epoch 14: {Learning rate: [0.0026430996557018534]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  6.05batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.92batches/s]



Metrics: {'train_loss': 0.17922210561611304, 'test_loss': 0.534231586754322, 'bleu': 4.5916, 'gen_len': 7.5068}




  2%|▏         | 11/467 [06:08<4:19:38, 34.16s/it]

For epoch 15: {Learning rate: [0.0028464150138327654]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.35batches/s]



Metrics: {'train_loss': 0.17415801930900027, 'test_loss': 0.5186135053634644, 'bleu': 5.6104, 'gen_len': 7.3288}




  3%|▎         | 12/467 [06:40<4:14:13, 33.52s/it]

For epoch 16: {Learning rate: [0.003049730371963677]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.16892985749717165, 'test_loss': 0.5360594511032104, 'bleu': 3.8059, 'gen_len': 7.1301}




  3%|▎         | 13/467 [07:11<4:09:29, 32.97s/it]

For epoch 17: {Learning rate: [0.003253045730094589]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.28batches/s]



Metrics: {'train_loss': 0.18367366086146453, 'test_loss': 0.5103440552949905, 'bleu': 5.2491, 'gen_len': 6.911}




  3%|▎         | 14/467 [07:43<4:06:27, 32.64s/it]

For epoch 18: {Learning rate: [0.0034563610882255005]}


Train batch number 163: 100%|██████████| 164/164 [00:26<00:00,  6.28batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.27batches/s]



Metrics: {'train_loss': 0.1780156845726618, 'test_loss': 0.5074075534939766, 'bleu': 5.2753, 'gen_len': 8.0822}




  3%|▎         | 15/467 [08:16<4:05:35, 32.60s/it]

For epoch 19: {Learning rate: [0.0036596764463564125]}


Train batch number 163: 100%|██████████| 164/164 [00:26<00:00,  6.27batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.30batches/s]



Metrics: {'train_loss': 0.19008216457214297, 'test_loss': 0.49808707386255263, 'bleu': 5.4707, 'gen_len': 7.4247}




  3%|▎         | 16/467 [08:48<4:04:23, 32.51s/it]

For epoch 20: {Learning rate: [0.0038629918044873245]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  5.92batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.47batches/s]



Metrics: {'train_loss': 0.19345010817050934, 'test_loss': 0.5006389483809471, 'bleu': 6.914, 'gen_len': 6.8151}




  4%|▎         | 17/467 [09:25<4:14:22, 33.92s/it]

For epoch 21: {Learning rate: [0.004066307162618236]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.28batches/s]



Metrics: {'train_loss': 0.20878721864485159, 'test_loss': 0.4732896491885185, 'bleu': 6.6174, 'gen_len': 6.9384}




  4%|▍         | 18/467 [09:57<4:08:07, 33.16s/it]

For epoch 22: {Learning rate: [0.004269622520749148]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.37batches/s]



Metrics: {'train_loss': 0.21177325588537427, 'test_loss': 0.489973746240139, 'bleu': 7.3116, 'gen_len': 7.2329}




  4%|▍         | 19/467 [10:30<4:07:25, 33.14s/it]

For epoch 23: {Learning rate: [0.00447293787888006]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.22856671926451894, 'test_loss': 0.4873707488179207, 'bleu': 4.3445, 'gen_len': 7.4247}




  4%|▍         | 20/467 [11:01<4:03:34, 32.70s/it]

For epoch 24: {Learning rate: [0.004676253237010972]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.46batches/s]



Metrics: {'train_loss': 0.24182206355943914, 'test_loss': 0.47043479457497595, 'bleu': 3.1855, 'gen_len': 8.1575}




  4%|▍         | 21/467 [11:33<4:00:42, 32.38s/it]

For epoch 25: {Learning rate: [0.004879568595141883]}


Train batch number 163: 100%|██████████| 164/164 [00:29<00:00,  5.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:05<00:00,  1.81batches/s]



Metrics: {'train_loss': 0.24890993417399684, 'test_loss': 0.4686092555522919, 'bleu': 5.7921, 'gen_len': 7.6301}




  5%|▍         | 22/467 [12:10<4:10:15, 33.74s/it]

For epoch 26: {Learning rate: [0.005082883953272795]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  5.87batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.20batches/s]



Metrics: {'train_loss': 0.26946339842568084, 'test_loss': 0.4798166826367378, 'bleu': 2.2519, 'gen_len': 6.089}




  5%|▍         | 23/467 [12:44<4:11:12, 33.95s/it]

For epoch 27: {Learning rate: [0.005286199311403707]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  5.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.50batches/s]



Metrics: {'train_loss': 0.2732226709585364, 'test_loss': 0.464755243062973, 'bleu': 2.8803, 'gen_len': 7.5411}




  5%|▌         | 24/467 [13:18<4:09:48, 33.83s/it]

For epoch 28: {Learning rate: [0.005489514669534619]}


Train batch number 163: 100%|██████████| 164/164 [00:27<00:00,  5.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.2995375098615158, 'test_loss': 0.4577317535877228, 'bleu': 3.6677, 'gen_len': 6.5548}




  5%|▌         | 25/467 [13:53<4:10:52, 34.05s/it]

For epoch 29: {Learning rate: [0.005692830027665531]}


Train batch number 163: 100%|██████████| 164/164 [00:26<00:00,  6.26batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:03<00:00,  2.57batches/s]



Metrics: {'train_loss': 0.31050738310668524, 'test_loss': 0.4655227243900299, 'bleu': 2.0358, 'gen_len': 9.1781}




  6%|▌         | 26/467 [14:25<4:05:39, 33.42s/it]

For epoch 30: {Learning rate: [0.005896145385796442]}


Train batch number 163: 100%|██████████| 164/164 [00:25<00:00,  6.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:04<00:00,  2.27batches/s]



Metrics: {'train_loss': 0.32658602651662944, 'test_loss': 0.48370583057403566, 'bleu': 1.9209, 'gen_len': 5.2671}




  6%|▌         | 27/467 [14:56<4:00:57, 32.86s/it]

For epoch 31: {Learning rate: [0.006099460743927354]}


### ---

In [8]:
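# continue training for the remaining epochs, using BLEU (to be maximized) as the model-selection metric
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])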



For epoch 4: {Learning rate: [0.0007899036688778694]}


Train batch number 164: 100%|██████████| 164/164 [00:38<00:00,  4.25batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.65batches/s]



Metrics: {'train_loss': 0.5944981680410665, 'test_loss': 0.5867837905883789, 'bleu': 0.9198, 'gen_len': 6.3082}




  0%|          | 1/924 [00:45<11:33:45, 45.10s/it]

For epoch 5: {Learning rate: [0.0007898875970310081]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.47batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.5643130628074088, 'test_loss': 0.5865157037973404, 'bleu': 0.8552, 'gen_len': 6.7123}




  0%|          | 2/924 [01:27<11:08:03, 43.47s/it]

For epoch 6: {Learning rate: [0.0007898669335482351]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.61batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.45batches/s]



Metrics: {'train_loss': 0.5451048398163261, 'test_loss': 0.572836683690548, 'bleu': 0.9264, 'gen_len': 6.2123}




  0%|          | 3/924 [02:09<10:54:39, 42.65s/it]

For epoch 7: {Learning rate: [0.0007898416786697965]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.51batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.5250344836130375, 'test_loss': 0.561720785498619, 'bleu': 1.3054, 'gen_len': 6.8699}




  0%|          | 4/924 [02:51<10:54:49, 42.71s/it]

For epoch 8: {Learning rate: [0.0007898118326893202]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.56batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.63batches/s]



Metrics: {'train_loss': 0.5043951425610519, 'test_loss': 0.564942017197609, 'bleu': 0.8109, 'gen_len': 6.589}




  1%|          | 5/924 [03:33<10:50:41, 42.48s/it]

For epoch 9: {Learning rate: [0.0007897773959538135]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.47batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.48723794519901276, 'test_loss': 0.5392887413501739, 'bleu': 1.2364, 'gen_len': 6.8836}




  1%|          | 6/924 [04:15<10:44:33, 42.13s/it]

For epoch 10: {Learning rate: [0.0007897383688636578]}


Train batch number 164: 100%|██████████| 164/164 [00:39<00:00,  4.17batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.46737417160737804, 'test_loss': 0.5443646818399429, 'bleu': 1.1505, 'gen_len': 6.863}




  1%|          | 7/924 [04:59<10:53:07, 42.73s/it]

For epoch 11: {Learning rate: [0.0007896947518726054]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.57batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.45127932618303995, 'test_loss': 0.5426753893494606, 'bleu': 1.0989, 'gen_len': 6.7192}




  1%|          | 8/924 [05:40<10:43:12, 42.13s/it]

For epoch 12: {Learning rate: [0.0007896465454877729]}


Train batch number 164: 100%|██████████| 164/164 [00:39<00:00,  4.17batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.64batches/s]



Metrics: {'train_loss': 0.4348718680259658, 'test_loss': 0.5329298868775367, 'bleu': 1.1535, 'gen_len': 7.0068}




  1%|          | 9/924 [06:24<10:52:11, 42.77s/it]

For epoch 13: {Learning rate: [0.0007895937502696361]}


Train batch number 164: 100%|██████████| 164/164 [00:37<00:00,  4.38batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.41700041094204277, 'test_loss': 0.5355749860405922, 'bleu': 0.8418, 'gen_len': 6.6164}




  1%|          | 10/924 [07:07<10:52:02, 42.80s/it]

For epoch 14: {Learning rate: [0.0007895363668320237]}


Train batch number 164: 100%|██████████| 164/164 [00:37<00:00,  4.43batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.4024898214492856, 'test_loss': 0.5190224424004555, 'bleu': 0.8216, 'gen_len': 7.0479}




  1%|          | 11/924 [07:49<10:49:33, 42.69s/it]

For epoch 15: {Learning rate: [0.0007894743958421091]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.51batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.64batches/s]



Metrics: {'train_loss': 0.3869156777495291, 'test_loss': 0.5379147097468376, 'bleu': 0.8123, 'gen_len': 6.4589}




  1%|▏         | 12/924 [08:31<10:44:12, 42.38s/it]

For epoch 16: {Learning rate: [0.0007894078380204036]}


Train batch number 164: 100%|██████████| 164/164 [00:38<00:00,  4.26batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.59batches/s]



Metrics: {'train_loss': 0.37270856012658377, 'test_loss': 0.5323194593191147, 'bleu': 0.844, 'gen_len': 6.7534}




  1%|▏         | 13/924 [09:15<10:52:32, 42.98s/it]

For epoch 17: {Learning rate: [0.0007893366941407478]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.49batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.35363869450804664, 'test_loss': 0.5445445477962494, 'bleu': 0.9245, 'gen_len': 6.1438}




  2%|▏         | 14/924 [09:57<10:45:44, 42.58s/it]

For epoch 18: {Learning rate: [0.0007892609650303023]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.76batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.34201622054707714, 'test_loss': 0.5304709792137146, 'bleu': 0.9582, 'gen_len': 6.4795}




  2%|▏         | 15/924 [10:36<10:28:41, 41.50s/it]

For epoch 19: {Learning rate: [0.0007891806515695383]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.79batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.3252134483035018, 'test_loss': 0.5200485706329345, 'bleu': 1.3507, 'gen_len': 6.6781}




  2%|▏         | 16/924 [11:15<10:18:41, 40.88s/it]

For epoch 20: {Learning rate: [0.0007890957546922276]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.70batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.3114710154875023, 'test_loss': 0.528302264213562, 'bleu': 1.932, 'gen_len': 6.4315}




  2%|▏         | 17/924 [11:56<10:16:33, 40.79s/it]

For epoch 21: {Learning rate: [0.0007890062753854314]}


Train batch number 164: 100%|██████████| 164/164 [00:40<00:00,  4.10batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.42batches/s]



Metrics: {'train_loss': 0.29759043137111313, 'test_loss': 0.5230146050453186, 'bleu': 1.9027, 'gen_len': 6.5822}




  2%|▏         | 18/924 [12:41<10:36:27, 42.15s/it]

For epoch 22: {Learning rate: [0.0007889122146894886]}


Train batch number 164: 100%|██████████| 164/164 [00:42<00:00,  3.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.38batches/s]



Metrics: {'train_loss': 0.2844102197304005, 'test_loss': 0.5230530649423599, 'bleu': 1.9029, 'gen_len': 6.6164}




  2%|▏         | 19/924 [13:29<11:00:01, 43.76s/it]

For epoch 23: {Learning rate: [0.0007888135736980047]}


Train batch number 164: 100%|██████████| 164/164 [00:42<00:00,  3.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.44batches/s]



Metrics: {'train_loss': 0.2714919821336502, 'test_loss': 0.5186380386352539, 'bleu': 1.8131, 'gen_len': 6.8836}




  2%|▏         | 20/924 [14:16<11:15:31, 44.84s/it]

For epoch 24: {Learning rate: [0.0007887103535578377]}


Train batch number 164: 100%|██████████| 164/164 [00:42<00:00,  3.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.15batches/s]



Metrics: {'train_loss': 0.2558402121975655, 'test_loss': 0.5224869191646576, 'bleu': 1.7501, 'gen_len': 6.6438}




  2%|▏         | 21/924 [15:04<11:28:38, 45.76s/it]

For epoch 25: {Learning rate: [0.0007886025554690858]}


Train batch number 164: 100%|██████████| 164/164 [00:39<00:00,  4.19batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.50batches/s]



Metrics: {'train_loss': 0.24439441726156852, 'test_loss': 0.5218195348978043, 'bleu': 1.8515, 'gen_len': 6.6849}




  2%|▏         | 22/924 [15:48<11:21:17, 45.32s/it]

For epoch 26: {Learning rate: [0.0007884901806850734]}


Train batch number 164: 100%|██████████| 164/164 [00:40<00:00,  4.06batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.43batches/s]



Metrics: {'train_loss': 0.23250414862683633, 'test_loss': 0.5235583513975144, 'bleu': 3.3871, 'gen_len': 6.637}




  2%|▏         | 23/924 [16:34<11:23:32, 45.52s/it]

For epoch 27: {Learning rate: [0.0007883732305123359]}


Train batch number 164: 100%|██████████| 164/164 [00:40<00:00,  4.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.44batches/s]



Metrics: {'train_loss': 0.22229368425905704, 'test_loss': 0.5337047874927521, 'bleu': 3.1542, 'gen_len': 6.4521}




  3%|▎         | 24/924 [17:20<11:24:32, 45.64s/it]

For epoch 28: {Learning rate: [0.0007882517063106049]}


Train batch number 164: 100%|██████████| 164/164 [00:39<00:00,  4.11batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.05batches/s]



Metrics: {'train_loss': 0.20977857240998163, 'test_loss': 0.52165407538414, 'bleu': 1.6994, 'gen_len': 6.4726}




  3%|▎         | 25/924 [18:06<11:25:10, 45.73s/it]

For epoch 29: {Learning rate: [0.0007881256094927924]}


Train batch number 164: 100%|██████████| 164/164 [00:39<00:00,  4.10batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.50batches/s]



Metrics: {'train_loss': 0.19821283321191624, 'test_loss': 0.5051108986139298, 'bleu': 1.6804, 'gen_len': 6.911}




  3%|▎         | 26/924 [18:51<11:22:33, 45.61s/it]

For epoch 30: {Learning rate: [0.0007879949415249745]}


Train batch number 164: 100%|██████████| 164/164 [00:40<00:00,  4.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.40batches/s]



Metrics: {'train_loss': 0.18739141183110272, 'test_loss': 0.5084527522325516, 'bleu': 3.6513, 'gen_len': 6.8219}




  3%|▎         | 27/924 [19:38<11:25:12, 45.83s/it]

For epoch 31: {Learning rate: [0.0007878597039263739]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.60batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.1793556391011651, 'test_loss': 0.5041384652256966, 'bleu': 4.6626, 'gen_len': 6.8425}




  3%|▎         | 28/924 [20:18<11:00:33, 44.23s/it]

For epoch 32: {Learning rate: [0.0007877198982693427]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.68batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.84batches/s]



Metrics: {'train_loss': 0.16875481546470306, 'test_loss': 0.5095775723457336, 'bleu': 4.1516, 'gen_len': 6.8219}




  3%|▎         | 29/924 [20:58<10:39:07, 42.85s/it]

For epoch 33: {Learning rate: [0.0007875755261793439]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.55batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.15834912939406023, 'test_loss': 0.5188314080238342, 'bleu': 4.2823, 'gen_len': 6.7808}




  3%|▎         | 30/924 [21:39<10:29:23, 42.24s/it]

For epoch 34: {Learning rate: [0.0007874265893349326]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.77batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.90batches/s]



Metrics: {'train_loss': 0.14903407248600228, 'test_loss': 0.5221192479133606, 'bleu': 6.6905, 'gen_len': 6.637}




  3%|▎         | 31/924 [22:20<10:22:10, 41.80s/it]

For epoch 35: {Learning rate: [0.0007872730894677362]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.72batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.92batches/s]



Metrics: {'train_loss': 0.1418559008800402, 'test_loss': 0.5304420560598373, 'bleu': 5.2303, 'gen_len': 6.7671}




  3%|▎         | 32/924 [22:59<10:10:10, 41.04s/it]

For epoch 36: {Learning rate: [0.0007871150283624349]}


Train batch number 164: 100%|██████████| 164/164 [00:33<00:00,  4.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.94batches/s]



Metrics: {'train_loss': 0.13472267824066123, 'test_loss': 0.5301214575767517, 'bleu': 5.4434, 'gen_len': 6.7603}




  4%|▎         | 33/924 [23:37<9:57:24, 40.23s/it] 

For epoch 37: {Learning rate: [0.0007869524078567401]}


Train batch number 164: 100%|██████████| 164/164 [00:37<00:00,  4.37batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.12616859821648133, 'test_loss': 0.5623002618551254, 'bleu': 4.4218, 'gen_len': 6.4452}




  4%|▎         | 34/924 [24:20<10:06:15, 40.87s/it]

For epoch 38: {Learning rate: [0.0007867852298413738]}


Train batch number 164: 100%|██████████| 164/164 [00:38<00:00,  4.24batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.78batches/s]



Metrics: {'train_loss': 0.12056277142610491, 'test_loss': 0.5487404078245163, 'bleu': 5.4744, 'gen_len': 6.8014}




  4%|▍         | 35/924 [25:03<10:18:07, 41.72s/it]

For epoch 39: {Learning rate: [0.0007866134962600461]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.63batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.77batches/s]



Metrics: {'train_loss': 0.11424108880867319, 'test_loss': 0.5271997958421707, 'bleu': 6.3666, 'gen_len': 7.1849}




  4%|▍         | 36/924 [25:43<10:10:14, 41.23s/it]

For epoch 40: {Learning rate: [0.0007864372091094331]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.69batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.51batches/s]



Metrics: {'train_loss': 0.10611343583682688, 'test_loss': 0.5434063762426377, 'bleu': 5.4247, 'gen_len': 6.7945}




  4%|▍         | 37/924 [26:23<10:03:55, 40.85s/it]

For epoch 41: {Learning rate: [0.0007862563704391528]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.58batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.09955620879261959, 'test_loss': 0.5499060809612274, 'bleu': 4.9974, 'gen_len': 6.7603}




  4%|▍         | 38/924 [27:04<10:04:20, 40.93s/it]

For epoch 42: {Learning rate: [0.0007860709823517424]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.48batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.67batches/s]



Metrics: {'train_loss': 0.0950865240131573, 'test_loss': 0.5406321167945862, 'bleu': 5.5656, 'gen_len': 6.8014}




  4%|▍         | 39/924 [27:46<10:05:55, 41.08s/it]

For epoch 43: {Learning rate: [0.0007858810470026331]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.70batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.09072694877480589, 'test_loss': 0.5436183750629425, 'bleu': 7.1152, 'gen_len': 6.9932}




  4%|▍         | 40/924 [28:26<9:59:50, 40.71s/it] 

For epoch 44: {Learning rate: [0.0007856865666001254]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.68batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.62batches/s]



Metrics: {'train_loss': 0.08485209383070469, 'test_loss': 0.5436257570981979, 'bleu': 5.6926, 'gen_len': 6.9795}




  4%|▍         | 41/924 [29:05<9:54:58, 40.43s/it]

For epoch 45: {Learning rate: [0.0007854875434053628]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.65batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.71batches/s]



Metrics: {'train_loss': 0.08107480888323086, 'test_loss': 0.5409320592880249, 'bleu': 7.3653, 'gen_len': 6.9795}




  5%|▍         | 42/924 [29:46<9:53:53, 40.40s/it]

For epoch 46: {Learning rate: [0.0007852839797323066]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.45batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.55batches/s]



Metrics: {'train_loss': 0.07730913875488246, 'test_loss': 0.5451938480138778, 'bleu': 8.0267, 'gen_len': 6.8767}




  5%|▍         | 43/924 [30:28<10:00:46, 40.91s/it]

For epoch 47: {Learning rate: [0.0007850758779477079]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.69batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.07363809733765155, 'test_loss': 0.5395638585090637, 'bleu': 8.8432, 'gen_len': 6.9795}




  5%|▍         | 44/924 [31:08<9:56:08, 40.65s/it] 

For epoch 48: {Learning rate: [0.000784863240471081]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.50batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.85batches/s]



Metrics: {'train_loss': 0.07005867895829242, 'test_loss': 0.5431196987628937, 'bleu': 8.3825, 'gen_len': 6.9932}




  5%|▍         | 45/924 [31:49<9:57:12, 40.77s/it]

For epoch 49: {Learning rate: [0.0007846460697746743]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.67batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.06725596507057184, 'test_loss': 0.548516008257866, 'bleu': 7.5384, 'gen_len': 7.0411}




  5%|▍         | 46/924 [32:29<9:52:21, 40.48s/it]

For epoch 50: {Learning rate: [0.0007844243683834424]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.62batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.91batches/s]



Metrics: {'train_loss': 0.0641980488761896, 'test_loss': 0.5578081429004669, 'bleu': 7.1613, 'gen_len': 6.8288}




  5%|▌         | 47/924 [33:09<9:49:52, 40.36s/it]

For epoch 51: {Learning rate: [0.0007841981388750166]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.51batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.06110850478545195, 'test_loss': 0.5515132486820221, 'bleu': 8.6565, 'gen_len': 6.7329}




  5%|▌         | 48/924 [33:50<9:51:21, 40.50s/it]

For epoch 52: {Learning rate: [0.0007839673838796742]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.61batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.34batches/s]



Metrics: {'train_loss': 0.059088235039536544, 'test_loss': 0.5420749217271805, 'bleu': 9.2902, 'gen_len': 7.137}




  5%|▌         | 49/924 [34:31<9:55:19, 40.82s/it]

For epoch 53: {Learning rate: [0.0007837321060803093]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.55batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.054112356644487235, 'test_loss': 0.5403976708650589, 'bleu': 8.0435, 'gen_len': 7.1096}




  5%|▌         | 50/924 [35:12<9:53:58, 40.78s/it]

For epoch 54: {Learning rate: [0.0007834923082124]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.73batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.05050167350507364, 'test_loss': 0.5421034008264541, 'bleu': 10.058, 'gen_len': 6.9863}




  6%|▌         | 51/924 [35:52<9:50:09, 40.56s/it]

For epoch 55: {Learning rate: [0.0007832479930639779]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.50batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.049678588944782574, 'test_loss': 0.5521167010068894, 'bleu': 8.9466, 'gen_len': 7.0685}




  6%|▌         | 52/924 [36:33<9:51:59, 40.73s/it]

For epoch 56: {Learning rate: [0.0007829991634755946]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.56batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.04755852276050463, 'test_loss': 0.5415709227323532, 'bleu': 9.6389, 'gen_len': 7.137}




  6%|▌         | 53/924 [37:15<9:58:23, 41.22s/it]

For epoch 57: {Learning rate: [0.0007827458223402901]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.64batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.047682991966877766, 'test_loss': 0.5515088438987732, 'bleu': 9.3782, 'gen_len': 7.0274}




  6%|▌         | 54/924 [37:56<9:53:46, 40.95s/it]

For epoch 58: {Learning rate: [0.0007824879726035576]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.69batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.97batches/s]



Metrics: {'train_loss': 0.04476221905248921, 'test_loss': 0.5594667106866836, 'bleu': 7.1811, 'gen_len': 7.089}




  6%|▌         | 55/924 [38:36<9:49:03, 40.67s/it]

For epoch 59: {Learning rate: [0.0007822256172633099]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.68batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.93batches/s]



Metrics: {'train_loss': 0.04279203717483253, 'test_loss': 0.5473783671855926, 'bleu': 9.0659, 'gen_len': 6.9795}




  6%|▌         | 56/924 [39:16<9:45:28, 40.47s/it]

For epoch 60: {Learning rate: [0.0007819587593698452]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.51batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.04213112673903929, 'test_loss': 0.5514399498701096, 'bleu': 9.8174, 'gen_len': 7.0822}




  6%|▌         | 57/924 [39:57<9:47:47, 40.68s/it]

For epoch 61: {Learning rate: [0.0007816874020258108]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.61batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.04030789363356989, 'test_loss': 0.5648405343294144, 'bleu': 9.6799, 'gen_len': 6.8699}




  6%|▋         | 58/924 [40:37<9:45:50, 40.59s/it]

For epoch 62: {Learning rate: [0.0007814115483861669]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.63batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.66batches/s]



Metrics: {'train_loss': 0.03860947833873513, 'test_loss': 0.5380843043327331, 'bleu': 8.6367, 'gen_len': 7.1849}




  6%|▋         | 59/924 [41:17<9:43:01, 40.44s/it]

For epoch 63: {Learning rate: [0.0007811312016581507]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.61batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.037682257927727045, 'test_loss': 0.5370070338249207, 'bleu': 8.6365, 'gen_len': 7.1781}




  6%|▋         | 60/924 [41:58<9:41:35, 40.39s/it]

For epoch 64: {Learning rate: [0.0007808463651012385]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.51batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.037389427389404394, 'test_loss': 0.5306890964508056, 'bleu': 10.718, 'gen_len': 7.1918}




  7%|▋         | 61/924 [42:39<9:44:49, 40.66s/it]

For epoch 65: {Learning rate: [0.0007805570420271081]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.68batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.79batches/s]



Metrics: {'train_loss': 0.03551594413271764, 'test_loss': 0.5604428797960281, 'bleu': 9.1118, 'gen_len': 7.0822}




  7%|▋         | 62/924 [43:19<9:39:15, 40.32s/it]

For epoch 66: {Learning rate: [0.0007802632357996002]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.69batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.75batches/s]



Metrics: {'train_loss': 0.036207207285503785, 'test_loss': 0.550083139538765, 'bleu': 10.2498, 'gen_len': 7.137}




  7%|▋         | 63/924 [43:58<9:35:16, 40.09s/it]

For epoch 67: {Learning rate: [0.0007799649498346791]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.49batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.03437534545934418, 'test_loss': 0.559477686882019, 'bleu': 9.1816, 'gen_len': 7.1027}




  7%|▋         | 64/924 [44:39<9:39:01, 40.40s/it]

For epoch 68: {Learning rate: [0.0007796621876003934]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.64batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.74batches/s]



Metrics: {'train_loss': 0.03274873091566672, 'test_loss': 0.558627986907959, 'bleu': 9.5647, 'gen_len': 7.1644}




  7%|▋         | 65/924 [45:19<9:36:43, 40.28s/it]

For epoch 69: {Learning rate: [0.0007793549526168355]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.57batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.73batches/s]



Metrics: {'train_loss': 0.03180344944547226, 'test_loss': 0.56312775015831, 'bleu': 11.4919, 'gen_len': 7.0137}




  7%|▋         | 66/924 [46:00<9:38:36, 40.46s/it]

For epoch 70: {Learning rate: [0.0007790432484561001]}


Train batch number 164: 100%|██████████| 164/164 [00:37<00:00,  4.35batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.10batches/s]



Metrics: {'train_loss': 0.030844314994926497, 'test_loss': 0.5529612928628922, 'bleu': 9.9249, 'gen_len': 7.1164}




  7%|▋         | 67/924 [46:44<9:50:38, 41.35s/it]

For epoch 71: {Learning rate: [0.0007787270787422437]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.78batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.030328105607001884, 'test_loss': 0.5571311682462692, 'bleu': 11.1376, 'gen_len': 7.0616}




  7%|▋         | 68/924 [47:22<9:39:28, 40.62s/it]

For epoch 72: {Learning rate: [0.0007784064471512421]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.77batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.87batches/s]



Metrics: {'train_loss': 0.029580032089497985, 'test_loss': 0.555533567070961, 'bleu': 10.6921, 'gen_len': 7.2055}




  7%|▋         | 69/924 [48:01<9:31:12, 40.08s/it]

For epoch 73: {Learning rate: [0.000778081357410947]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.66batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.89batches/s]



Metrics: {'train_loss': 0.0301446861916835, 'test_loss': 0.5499492198228836, 'bleu': 10.1932, 'gen_len': 7.1301}




  8%|▊         | 70/924 [48:41<9:29:04, 39.98s/it]

For epoch 74: {Learning rate: [0.0007777518133010433]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.79batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.83batches/s]



Metrics: {'train_loss': 0.028228622108766036, 'test_loss': 0.5380699098110199, 'bleu': 11.1746, 'gen_len': 7.5342}




  8%|▊         | 71/924 [49:20<9:23:31, 39.64s/it]

For epoch 75: {Learning rate: [0.0007774178186530052]}


Train batch number 164: 100%|██████████| 164/164 [00:33<00:00,  4.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.02851461699209744, 'test_loss': 0.5455362856388092, 'bleu': 9.92, 'gen_len': 7.3288}




  8%|▊         | 72/924 [49:58<9:16:06, 39.16s/it]

For epoch 76: {Learning rate: [0.0007770793773500515]}


Train batch number 164: 100%|██████████| 164/164 [00:33<00:00,  4.83batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.88batches/s]



Metrics: {'train_loss': 0.026958736221919338, 'test_loss': 0.5386208891868591, 'bleu': 11.1192, 'gen_len': 7.5274}




  8%|▊         | 73/924 [50:36<9:11:54, 38.91s/it]

For epoch 77: {Learning rate: [0.0007767364933271002]}


Train batch number 164: 100%|██████████| 164/164 [00:33<00:00,  4.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.96batches/s]



Metrics: {'train_loss': 0.027244669953134, 'test_loss': 0.5466272205114364, 'bleu': 10.0943, 'gen_len': 7.4795}




  8%|▊         | 74/924 [51:14<9:07:55, 38.68s/it]

For epoch 78: {Learning rate: [0.0007763891705707233]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.62batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.95batches/s]



Metrics: {'train_loss': 0.025388697417816374, 'test_loss': 0.5444027036428452, 'bleu': 9.6745, 'gen_len': 7.4726}




  8%|▊         | 75/924 [51:54<9:12:53, 39.07s/it]

For epoch 79: {Learning rate: [0.0007760374131191]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.75batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.81batches/s]



Metrics: {'train_loss': 0.02493798170556746, 'test_loss': 0.5352234601974487, 'bleu': 11.1878, 'gen_len': 7.2671}




  8%|▊         | 76/924 [52:34<9:13:09, 39.14s/it]

For epoch 80: {Learning rate: [0.0007756812250619693]}


Train batch number 164: 100%|██████████| 164/164 [00:34<00:00,  4.80batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.86batches/s]



Metrics: {'train_loss': 0.025315889276609552, 'test_loss': 0.5438223630189896, 'bleu': 11.4748, 'gen_len': 7.137}




  8%|▊         | 77/924 [53:12<9:10:32, 39.00s/it]

For epoch 81: {Learning rate: [0.0007753206105405844]}


Train batch number 164: 100%|██████████| 164/164 [00:35<00:00,  4.65batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:06<00:00,  1.62batches/s]



Metrics: {'train_loss': 0.024309042102952556, 'test_loss': 0.5431822776794434, 'bleu': 10.0744, 'gen_len': 7.2466}




  8%|▊         | 78/924 [53:55<9:25:49, 40.13s/it]

For epoch 82: {Learning rate: [0.0007749555737476617]}


Train batch number 164: 100%|██████████| 164/164 [00:43<00:00,  3.80batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:05<00:00,  1.98batches/s]



Metrics: {'train_loss': 0.023640633157522576, 'test_loss': 0.546173757314682, 'bleu': 10.5782, 'gen_len': 7.0411}




  9%|▊         | 79/924 [54:48<10:18:26, 43.91s/it]

For epoch 83: {Learning rate: [0.0007745861189273344]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.44batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.53batches/s]



Metrics: {'train_loss': 0.022655687431191524, 'test_loss': 0.5412687674164772, 'bleu': 10.9917, 'gen_len': 7.2534}




  9%|▊         | 80/924 [55:30<10:10:54, 43.43s/it]

For epoch 84: {Learning rate: [0.0007742122503751022]}


Train batch number 164: 100%|██████████| 164/164 [00:37<00:00,  4.43batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.80batches/s]



Metrics: {'train_loss': 0.023605166122362745, 'test_loss': 0.5423970460891724, 'bleu': 10.3179, 'gen_len': 7.3356}




  9%|▉         | 81/924 [56:12<10:03:06, 42.93s/it]

For epoch 85: {Learning rate: [0.000773833972437781]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.46batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.61batches/s]



Metrics: {'train_loss': 0.023024466780309633, 'test_loss': 0.5494770705699921, 'bleu': 9.7444, 'gen_len': 7.363}




  9%|▉         | 82/924 [56:53<9:56:47, 42.53s/it] 

For epoch 86: {Learning rate: [0.0007734512895134536]}


Train batch number 164: 100%|██████████| 164/164 [00:36<00:00,  4.52batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:04<00:00,  2.29batches/s]



Metrics: {'train_loss': 0.02317194913218661, 'test_loss': 0.5426514238119126, 'bleu': 10.3122, 'gen_len': 7.5137}




  9%|▉         | 83/924 [57:35<9:53:21, 42.33s/it]

For epoch 87: {Learning rate: [0.0007730642060514169]}


Train batch number 164: 100%|██████████| 164/164 [00:43<00:00,  3.78batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:05<00:00,  1.87batches/s]



Metrics: {'train_loss': 0.021605744514977786, 'test_loss': 0.5494884133338929, 'bleu': 8.6836, 'gen_len': 7.6507}




  9%|▉         | 84/924 [58:26<10:26:32, 44.75s/it]

For epoch 88: {Learning rate: [0.0007726727265521316]}


Train batch number 164: 100%|██████████| 164/164 [00:42<00:00,  3.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:03<00:00,  2.68batches/s]



Metrics: {'train_loss': 0.021213641438474196, 'test_loss': 0.54854137301445, 'bleu': 9.8897, 'gen_len': 7.3082}




  9%|▉         | 85/924 [59:13<10:38:01, 45.63s/it]

For epoch 89: {Learning rate: [0.0007722768555671693]}


Train batch number 29:  17%|█▋        | 28/164 [00:06<00:32,  4.21batches/s]
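
The log above shows the validation BLEU fluctuating from epoch to epoch while the training loss keeps decreasing, so the last epoch is not necessarily the best one. As a minimal sketch, assuming the log text is available as a string in the format printed above, the per-epoch metrics can be parsed and the epoch with the highest BLEU picked out. The names `LOG` and `parse_metrics` are hypothetical helpers introduced for illustration and are not part of `wolof_translate`.

```python
import ast
import re

# Excerpt of the log format printed above (values copied from epochs 82 to 84).
LOG = """
For epoch 82: {Learning rate: [0.0007749555737476617]}
Metrics: {'train_loss': 0.023640633157522576, 'test_loss': 0.546173757314682, 'bleu': 10.5782, 'gen_len': 7.0411}
For epoch 83: {Learning rate: [0.0007745861189273344]}
Metrics: {'train_loss': 0.022655687431191524, 'test_loss': 0.5412687674164772, 'bleu': 10.9917, 'gen_len': 7.2534}
For epoch 84: {Learning rate: [0.0007742122503751022]}
Metrics: {'train_loss': 0.023605166122362745, 'test_loss': 0.5423970460891724, 'bleu': 10.3179, 'gen_len': 7.3356}
"""


def parse_metrics(log_text):
    """Return one dict per 'Metrics:' line, tagged with the preceding epoch number."""
    records = []
    epoch = None
    for line in log_text.splitlines():
        line = line.strip()
        match = re.match(r"For epoch (\d+):", line)
        if match:
            epoch = int(match.group(1))
        elif line.startswith("Metrics:"):
            # The metrics are printed as a Python dict literal, so literal_eval is enough.
            metrics = ast.literal_eval(line[len("Metrics:"):].strip())
            records.append({"epoch": epoch, **metrics})
    return records


records = parse_metrics(LOG)
best = max(records, key=lambda record: record["bleu"])
print(best["epoch"], best["bleu"])  # 83 10.9917
```

On this excerpt the best BLEU is reached at epoch 83, which is also the epoch with the lowest validation loss of the three.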

In [9]:
# let us get the best model
# model = T5ForConditionalGeneration.from_pretrained('data/checkpoints/t5_results_fw_v3/...')

# let us get the test set
test_dataset = SentenceDataset("data/extractions/new_data/test_set.csv",
                               tokenizer,
                               truncation=True)

### Predictions and Evaluation

Let us generate the translations for the test set and store them in a DataFrame alongside the source sentences and the reference labels.
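
The generation step can be sketched as follows. This is only an illustration under assumptions: `collect_predictions` and `dummy_translate` are hypothetical names, and in the notebook the `translate_fn` argument would wrap the fine-tuned model (for example `model.generate` followed by `tokenizer.batch_decode(..., skip_special_tokens=True)`). The record keys match the column names of the table displayed in this section.

```python
def collect_predictions(sources, references, translate_fn, batch_size=16):
    """Translate `sources` in batches and pair each output with its reference."""
    if len(sources) != len(references):
        raise ValueError("sources and references must have the same length")
    records = []
    for start in range(0, len(sources), batch_size):
        batch = sources[start:start + batch_size]
        predictions = translate_fn(batch)
        for offset, predicted in enumerate(predictions):
            records.append({
                "original_text": batch[offset],
                "original_label": references[start + offset],
                "predicted_label": predicted,
            })
    return records


# Stand-in translator used only to make the sketch runnable without a model.
def dummy_translate(batch):
    return [text.upper() for text in batch]


records = collect_predictions(
    ["Qui est-ce?", "A Moussa!"],
    ["Ñan la?", "Musaa!"],
    dummy_translate,
    batch_size=1,
)
print(records[0])
```

Passing the returned list to `pd.DataFrame` yields a frame with the three columns `original_text`, `original_label` and `predicted_label`.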

In [11]:
df_ft_to_wf.tail(10)

Unnamed: 0,original_text,original_label,predicted_label
152,"Homme, lion, boeuf... allaient de concert.","Nit, gayndé, nag... àndoon nañu fi.","Nit, gayndé, nag, àndoon nañu fi."
153,C'est toi qui eusses été élu,Yaa doonkoon falu,Yaa doonkoon wax
154,L'homme ne cultivera pas,Góor gi du bày,Góor gi bëggul
155,S'agiter simplement ne suffit à rien résoudre.,Di tel-teli doŋŋ taxul sotal dara.,Nit ñenn ñi yegseeguñu.
156,C'était son hôte habituellement.,Moo doon ganam.,Man xar mépp.
157,Je parle de ceux-là!,Yenn xar yooyuu laa wax!,Yaw moomu laa wax
158,Tu reconnais cet enfant-ci?,Xammee ŋga bee xale?,Xammee ŋga waa jooju?
159,"Alors l'homme entra, les enfants le virent, il...","Noona góor gi dugg, xale yi gis ka, mu toog, ñ...","Noona Góor gaa ŋgi, mu ñëw."
160,C'est leur ami!,Suñu xarit la!,Su demee
161,Il était Lebou de Yoff.,Mu doon Lebu Yoff.,Dafa doon nitu dëgg.


In [12]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

Unnamed: 0,original_text,original_label,predicted_label
105,Qui est-ce?,Ñan la?,Ku mu?
80,Tu as dit cela.,La ŋga wax la.,Li ŋga wax loolu.
52,A Moussa!,Musaa!,Musaa
132,Je connais l'enfant.,Xam naa xale bi.,Xam naa xale bi.
59,L'homme qui eût travaillé,Waa ji liggéeykoon,Góor gi waxkoon na
54,Le voilà qui part!,Mi ŋgiiy!,Ma ŋgee doon dem
115,Que tu partes ou que tu ne partes pas il viendra.,Dana ñëw soo demul ag soo demee itam.,"Soo demee ag soo demul itam, dana ñëw."
114,C'est l'homme qui a soutenu qu'il est sain d'e...,"Góor gee ni nit la, soo demee!",Góor gee ni soo demee nit la
46,J'ai vu mes amis!,Gis naa sana xarit yi!,Gis naa sama xarit yeneen yooyuu
147,Appelle l'homme qui ne part pas,Wool góor gi dul dem,Wool góor gi dul dem
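
Besides BLEU, a quick sanity check on such a table is the share of predictions that reproduce the reference exactly. The following is a minimal sketch, with `exact_match_rate` as a hypothetical helper and four (reference, prediction) pairs copied from the sample above. It is a much stricter measure than BLEU and is not the metric used during training.

```python
def exact_match_rate(references, predictions):
    """Fraction of predictions identical to their reference (ignoring outer whitespace)."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        return 0.0
    matches = sum(
        ref.strip() == pred.strip() for ref, pred in zip(references, predictions)
    )
    return matches / len(references)


# (original_label, predicted_label) pairs taken from rows 105, 132, 52 and 147.
pairs = [
    ("Ñan la?", "Ku mu?"),
    ("Xam naa xale bi.", "Xam naa xale bi."),
    ("Musaa!", "Musaa"),
    ("Wool góor gi dul dem", "Wool góor gi dul dem"),
]
references = [ref for ref, _ in pairs]
predictions = [pred for _, pred in pairs]

rate = exact_match_rate(references, predictions)
print(rate)  # 0.5
```

Row 52 illustrates how strict the measure is: the prediction differs from the reference only by the missing exclamation mark and still counts as a miss.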


## Colab download and remove step

In [None]:
import shutil

# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')
shutil.rmtree('wandb')
# shutil.make_archive('wandb', 'zip', 'wandb')
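
If the logs need to be kept, the archive has to be created before the directory is removed. A minimal sketch of that ordering, assuming a hypothetical helper `archive_then_remove` and demonstrated on a temporary directory rather than the real `wandb` folder:

```python
import os
import shutil
import tempfile


def archive_then_remove(directory, archive_base):
    """Zip `directory` into `<archive_base>.zip`, then delete the directory."""
    if not os.path.isdir(directory):
        return None  # nothing to archive or remove
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=directory)
    shutil.rmtree(directory)
    return archive_path


# Demonstration on a throwaway directory.
workspace = tempfile.mkdtemp()
run_dir = os.path.join(workspace, "wandb")
os.makedirs(run_dir)
with open(os.path.join(run_dir, "run.log"), "w") as handle:
    handle.write("example\n")

archive = archive_then_remove(run_dir, os.path.join(workspace, "wandb"))
print(archive)
```

The helper returns `None` when the directory does not exist, so re-running the cell after a cleanup does not raise an error.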