Fine-tuning the best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the sentences newly extracted from the book **Grammaire de Wolof Moderne**, leaving out the definitions. Below, we show the main evaluation figures obtained from the hyperparameter search step. Training is evaluated on the validation dataset.

- Parallel coordinates from panel:

- Parameter importance chart: 
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

The chart above shows that the batch size is the most important parameter and is negatively correlated with the BLEU score (a smaller batch size is better). The probability of modifying a character in the French corpus is the next most important parameter, and a higher probability yields a better BLEU score.

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed, AdamW, get_linear_schedule_with_warmup, T5ForConditionalGeneration,\
    get_cosine_schedule_with_warmup, Adafactor
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.utils.improvements.end_marks import add_end_mark # added
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v3 import SentenceDataset # v2 -> v3
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

os.environ["WANDB_DISABLED"] = "true"



## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float, max_len: int, end_mark_opt: int):

  # Select the end-mark option
  if end_mark_opt == 1:
    # Create augmentation to add on French sentences
    fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                          remove_mark_space, delete_guillemet_space)

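  else:

    if end_mark_opt == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)

    elif end_mark_opt == 3:

      end_mark_fn = partial(add_end_mark)

    elif end_mark_opt == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')

    else:

      # Fail early instead of hitting a NameError on end_mark_fn below
      raise ValueError(f"end_mark_opt must be 1, 2, 3 or 4, got {end_mark_opt}")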

    # Create augmentation to add on French sentences
    fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p, 
                                                          aug_word_max= max_len),
                                          remove_mark_space, delete_guillemet_space, end_mark_fn)
    
  # Load the training dataset (with augmentation on the French sentences)
  train_dataset_aug = SentenceDataset("data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True, max_len=max_len,
                                        cp1_transformer = fr_augmentation)

  # Load the validation dataset (no augmentation)
  valid_dataset = SentenceDataset("data/extractions/new_data/valid_set.csv",
                                        tokenizer, max_len=max_len,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset
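
To make the roles of `fr_char_p` and `fr_word_p` more concrete, here is a minimal pure-Python stand-in for a character-level augmenter. It is not the `nlpaug` implementation: `noisy_copy` and its random letter substitution are hypothetical, whereas `KeyboardAug` picks replacements from neighbouring keyboard keys. The sketch only illustrates the idea that a fraction of the words is selected and, within each selected word, a fraction of the characters is altered.

```python
import math
import random
import string


def noisy_copy(sentence: str, char_p: float, word_p: float, seed: int = 0) -> str:
    """Return a copy of `sentence` with random character substitutions.

    Hypothetical stand-in: roughly `word_p` of the words are selected and,
    inside each selected word, roughly `char_p` of the characters are
    replaced by a random lowercase letter.
    """
    rng = random.Random(seed)
    words = sentence.split()
    if not words or word_p <= 0 or char_p <= 0:
        return sentence

    # Number of words to alter (at least one when word_p > 0)
    n_words = min(len(words), max(1, math.ceil(word_p * len(words))))
    for w_idx in rng.sample(range(len(words)), n_words):
        chars = list(words[w_idx])
        # Number of characters to alter inside this word
        n_chars = min(len(chars), max(1, math.ceil(char_p * len(chars))))
        for c_idx in rng.sample(range(len(chars)), n_chars):
            chars[c_idx] = rng.choice(string.ascii_lowercase)
        words[w_idx] = "".join(chars)
    return " ".join(words)


sentence = "le chat dort sur le lit"
noisy = noisy_copy(sentence, char_p=0.17, word_p=0.15)
print(noisy)
```

Higher probabilities make the French input noisier, so the model sees more spelling variants of each training sentence.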

### Configure the model and the evaluation function ⚙️

Let us evaluate the predictions with the `bleu` metric.

In [4]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
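
The data-shaping steps inside `compute_metrics` can be illustrated without loading a model or a metric. The sketch below uses plain Python lists instead of NumPy arrays; `PAD_ID = 0`, the helper names and the sample strings are assumptions made for the example. It shows how label positions masked with `-100` are mapped back to the padding id before decoding, how the generation length is counted, and how each reference is wrapped in its own list, which is the `references` layout passed to the metric.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value used to mask positions ignored by the loss
PAD_ID = 0           # assumption: padding id of the tokenizer in this example


def restore_padding(labels: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Replace masked label positions with the padding id so they can be decoded."""
    return [[pad_id if tok == IGNORE_INDEX else tok for tok in row] for row in labels]


def generated_lengths(preds: List[List[int]], pad_id: int = PAD_ID) -> List[int]:
    """Count the non-padding tokens of each prediction (the basis of `gen_len`)."""
    return [sum(tok != pad_id for tok in row) for row in preds]


def postprocess_text(preds: List[str], labels: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Strip whitespace and wrap each reference in a list (one reference per prediction)."""
    return [p.strip() for p in preds], [[l.strip()] for l in labels]


# Placeholder token ids and strings, for illustration only
labels = [[12, 7, 1, -100, -100], [5, 1, -100, -100, -100]]
preds = [[0, 12, 7, 1, 0], [0, 5, 9, 1, 0]]

print(restore_padding(labels))
print(generated_lengths(preds))
print(postprocess_text([" first output "], ["first reference. "]))
```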


Let us initialize the evaluation object.

In [5]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)


Using the latest cached version of the module from C:\Users\Oumar Kane\.cache\huggingface\modules\evaluate_modules\metrics\evaluate-metric--sacrebleu\28676bf65b4f88b276df566e48e603732d0b4afd237603ebdf92acaacf5be99b (last modified on Wed Apr 26 19:02:40 2023) since it couldn't be found locally at evaluate-metric--sacrebleu, or remotely on the Hugging Face Hub.


### Searching for the best parameters 🕖

In [6]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data




-------------

### --- Wandb v3

In [7]:
# let us initialize the hyperparameter configuration 
config = {
    'random_state': 0,
    'fr_char_p': 0.16802057037858978,
    'fr_word_p': 0.14803592458095208,
    'learning_rate': 0.00030583792974076316,
    'weight_decay': 0.636712624031075,
    'batch_size': 8,
    'warmup_ratio': 0.0,
    'max_epoch': 965,
    'max_len': 51,
    'end_mark': 4,
    'bleu': 2.8517,
    'model_dir': 'data/checkpoints/fw_t5_base_custom_train_v3_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_base_custom_train_results_fw_v3/'
}

# Initialize the model name
model_name = 't5-base'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 1, evaluation = evaluation, optimizer=Adafactor)

# split the data
split_data(config['random_state'])

# load the training and validation sets
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'], config['max_len'],
                                                    config['end_mark'])

# let us calculate the appropriate warmup steps from the configured max epoch
length = len(train_dataset)

n_steps = length // config['batch_size']

num_steps = config['max_epoch'] * n_steps

warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

# Initialize the scheduler parameters
scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    # 'betas': (0.9, 0.98),
    'relative_step': False
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=get_linear_schedule_with_warmup,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/t5_base_custom_train_fw_v3"
                )

# We resume training from checkpoints, so let us load the model
# trainer.load(config['model_dir'], load_best=True) # Only for the first loading
trainer.load(config['new_model_dir'])
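
For reference, the step arithmetic of the cell above and the learning-rate multiplier of a linear schedule with warmup can be reproduced in plain Python. This sketch follows the usual definition (linear ramp from 0 to 1 during warmup, then linear decay to 0 at the last step). `dataset_size` is a hypothetical value, and `linear_multiplier` is an illustrative helper, not the `transformers` function itself.

```python
def linear_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values taken from the configuration above
batch_size = 8
max_epoch = 965
warmup_ratio = 0.0

# Hypothetical dataset size, chosen only for illustration
dataset_size = 1312

steps_per_epoch = dataset_size // batch_size    # floor division, as in the cell above
total_steps = max_epoch * steps_per_epoch
warmup_steps = int(total_steps * warmup_ratio)  # cast to int: step counts are whole numbers

print(steps_per_epoch, total_steps, warmup_steps)
print(linear_multiplier(0, warmup_steps, total_steps))
print(linear_multiplier(total_steps // 2, warmup_steps, total_steps))
```

With `warmup_ratio = 0.0` there is no warmup phase, so the multiplier starts at 1 and decreases linearly until the last step.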

        

### ---

In [12]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 4: {Learning rate: [0.006173700159858874]}


Train batch number 164: 100%|██████████| 164/164 [01:12<00:00,  2.25batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.39batches/s]



Metrics: {'train_loss': 0.257166282887139, 'test_loss': 0.4534699246287346, 'bleu': 1.388, 'gen_len': 8.2055}




  0%|          | 1/962 [01:36<25:42:38, 96.31s/it]

For epoch 5: {Learning rate: [0.006167243097007928]}


Train batch number 164: 100%|██████████| 164/164 [01:19<00:00,  2.06batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.21444461803610732, 'test_loss': 0.46686111986637113, 'bleu': 1.5982, 'gen_len': 6.2192}




  0%|          | 2/962 [03:18<26:39:31, 99.97s/it]

For epoch 6: {Learning rate: [0.0061607860341569825]}


Train batch number 164: 100%|██████████| 164/164 [01:19<00:00,  2.05batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.18970944509818788, 'test_loss': 0.4713015168905258, 'bleu': 0.6705, 'gen_len': 12.3288}




  0%|          | 3/962 [04:54<26:05:31, 97.95s/it]

For epoch 7: {Learning rate: [0.006154328971306036]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.1737764768484162, 'test_loss': 0.46738528460264206, 'bleu': 0.7354, 'gen_len': 8.9932}




  0%|          | 4/962 [06:31<25:57:38, 97.56s/it]

For epoch 8: {Learning rate: [0.006147871908455091]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.16543982223403164, 'test_loss': 0.46827168166637423, 'bleu': 1.0535, 'gen_len': 7.0685}




  1%|          | 5/962 [08:10<26:06:15, 98.20s/it]

For epoch 9: {Learning rate: [0.006141414845604145]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.15932651909022796, 'test_loss': 0.47090916335582733, 'bleu': 0.6277, 'gen_len': 8.6233}




  1%|          | 6/962 [09:47<25:59:39, 97.89s/it]

For epoch 10: {Learning rate: [0.006134957782753199]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.15187018628164037, 'test_loss': 0.4755176708102226, 'bleu': 0.6277, 'gen_len': 8.1986}




  1%|          | 7/962 [11:28<26:10:43, 98.68s/it]

For epoch 11: {Learning rate: [0.006128500719902253]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.1471098776815868, 'test_loss': 0.4547003641724586, 'bleu': 0.7824, 'gen_len': 7.6096}




  1%|          | 8/962 [13:07<26:13:45, 98.98s/it]

For epoch 12: {Learning rate: [0.006122043657051308]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.37batches/s]



Metrics: {'train_loss': 0.13927027701241215, 'test_loss': 0.45156280845403673, 'bleu': 0.436, 'gen_len': 10.8082}




  1%|          | 9/962 [14:47<26:15:16, 99.18s/it]

For epoch 13: {Learning rate: [0.006115586594200362]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.13407413597877432, 'test_loss': 0.4547923028469086, 'bleu': 1.7546, 'gen_len': 7.5685}




  1%|          | 10/962 [16:33<26:45:32, 101.19s/it]

For epoch 14: {Learning rate: [0.006109129531349416]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.12848897569063233, 'test_loss': 0.4699230432510376, 'bleu': 1.6696, 'gen_len': 7.8014}




  1%|          | 11/962 [18:12<26:32:58, 100.50s/it]

For epoch 15: {Learning rate: [0.00610267246849847]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.12324001521962445, 'test_loss': 0.451073956489563, 'bleu': 1.2822, 'gen_len': 8.8493}




  1%|          | 12/962 [19:51<26:27:22, 100.26s/it]

For epoch 16: {Learning rate: [0.0060962154056475246]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.1170676555938837, 'test_loss': 0.45911286771297455, 'bleu': 4.1404, 'gen_len': 7.1438}




  1%|▏         | 13/962 [21:37<26:52:38, 101.96s/it]

For epoch 17: {Learning rate: [0.006089758342796579]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.11235896879579962, 'test_loss': 0.4478358328342438, 'bleu': 3.1962, 'gen_len': 8.0959}




  1%|▏         | 14/962 [23:18<26:46:23, 101.67s/it]

For epoch 18: {Learning rate: [0.006083301279945633]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.37batches/s]



Metrics: {'train_loss': 0.10669749357351442, 'test_loss': 0.445649591088295, 'bleu': 4.4044, 'gen_len': 8.6986}




  2%|▏         | 15/962 [25:04<27:06:05, 103.03s/it]

For epoch 19: {Learning rate: [0.0060768442170946865]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.10226333813696373, 'test_loss': 0.4583529531955719, 'bleu': 2.0427, 'gen_len': 8.6027}




  2%|▏         | 16/962 [26:44<26:47:58, 101.99s/it]

For epoch 20: {Learning rate: [0.006070387154243741]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.09413744309326498, 'test_loss': 0.45733753889799117, 'bleu': 4.0007, 'gen_len': 8.3562}




  2%|▏         | 17/962 [28:24<26:37:15, 101.41s/it]

For epoch 21: {Learning rate: [0.006063930091392796]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.08927194730991997, 'test_loss': 0.4529800400137901, 'bleu': 5.8463, 'gen_len': 7.9658}




  2%|▏         | 18/962 [30:11<27:03:38, 103.20s/it]

For epoch 22: {Learning rate: [0.006057473028541849]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.08716044336466527, 'test_loss': 0.4549557790160179, 'bleu': 4.253, 'gen_len': 7.911}




  2%|▏         | 19/962 [31:52<26:49:31, 102.41s/it]

For epoch 23: {Learning rate: [0.006051015965690904]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.10batches/s]



Metrics: {'train_loss': 0.08118535712270475, 'test_loss': 0.46245292127132415, 'bleu': 4.3412, 'gen_len': 8.7466}




  2%|▏         | 20/962 [33:34<26:44:24, 102.19s/it]

For epoch 24: {Learning rate: [0.0060445589028399575]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.07684126248719489, 'test_loss': 0.45288110226392747, 'bleu': 4.2178, 'gen_len': 8.1781}




  2%|▏         | 21/962 [35:14<26:36:03, 101.77s/it]

For epoch 25: {Learning rate: [0.006038101839989012]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.07164699970403822, 'test_loss': 0.4487046629190445, 'bleu': 7.218, 'gen_len': 7.8836}




  2%|▏         | 22/962 [37:01<26:56:40, 103.19s/it]

For epoch 26: {Learning rate: [0.006031644777138067]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.06798276720886551, 'test_loss': 0.45966849476099014, 'bleu': 4.5326, 'gen_len': 8.8699}




  2%|▏         | 23/962 [38:40<26:36:10, 101.99s/it]

For epoch 27: {Learning rate: [0.00602518771428712]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.06395100170701015, 'test_loss': 0.47585643976926806, 'bleu': 8.5917, 'gen_len': 8.6164}




  2%|▏         | 24/962 [40:26<26:53:02, 103.18s/it]

For epoch 28: {Learning rate: [0.006018730651436174]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.06033079244378137, 'test_loss': 0.46809063404798507, 'bleu': 9.2447, 'gen_len': 7.5753}




  3%|▎         | 25/962 [42:14<27:15:11, 104.71s/it]

For epoch 29: {Learning rate: [0.006012273588585229]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.05742375069956591, 'test_loss': 0.46412360966205596, 'bleu': 12.3852, 'gen_len': 8.6644}




  3%|▎         | 26/962 [44:00<27:19:14, 105.08s/it]

For epoch 30: {Learning rate: [0.006005816525734283]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.052807538179544415, 'test_loss': 0.4634905904531479, 'bleu': 7.1612, 'gen_len': 8.5068}




  3%|▎         | 27/962 [45:40<26:51:11, 103.39s/it]

For epoch 31: {Learning rate: [0.005999359462883337]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.04959579468591184, 'test_loss': 0.46126754134893416, 'bleu': 11.2298, 'gen_len': 8.2192}




  3%|▎         | 28/962 [47:21<26:37:35, 102.63s/it]

For epoch 32: {Learning rate: [0.005992902400032391]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.04603971732871198, 'test_loss': 0.470794078707695, 'bleu': 13.9874, 'gen_len': 8.1233}




  3%|▎         | 29/962 [49:09<27:02:33, 104.34s/it]

For epoch 33: {Learning rate: [0.005986445337181446]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.04492069758111384, 'test_loss': 0.46413999795913696, 'bleu': 13.4403, 'gen_len': 8.6781}




  3%|▎         | 30/962 [50:50<26:43:16, 103.21s/it]

For epoch 34: {Learning rate: [0.0059799882743305005]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.04262480563370556, 'test_loss': 0.47710768282413485, 'bleu': 13.4697, 'gen_len': 8.0479}




  3%|▎         | 31/962 [52:32<26:36:46, 102.91s/it]

For epoch 35: {Learning rate: [0.005973531211479554]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.04150175161818724, 'test_loss': 0.4572422608733177, 'bleu': 14.9812, 'gen_len': 8.3288}




  3%|▎         | 32/962 [54:21<27:03:25, 104.74s/it]

For epoch 36: {Learning rate: [0.005967074148628608]}


Train batch number 97:  59%|█████▊    | 96/164 [00:50<00:36,  1.86batches/s]
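
Since the trainer prints one `Metrics:` dictionary per epoch, the best epoch can be recovered from a saved copy of this output. The sketch below is a hypothetical helper (`best_epoch` is not part of `wolof_translate`). It assumes the log keeps the exact `For epoch N:` / `Metrics: {...}` layout shown above, and uses two entries copied from this run as sample input.

```python
import ast
import re
from typing import Dict, Tuple

EPOCH_RE = re.compile(r"^For epoch (\d+):")
METRICS_RE = re.compile(r"^Metrics: (\{.*\})\s*$")


def best_epoch(log_text: str, metric: str = "bleu") -> Tuple[int, Dict[str, float]]:
    """Return (epoch, metrics) for the epoch with the highest value of `metric`."""
    current = None
    best = None
    for line in log_text.splitlines():
        line = line.strip()
        m = EPOCH_RE.match(line)
        if m:
            # Remember which epoch the next "Metrics:" line belongs to
            current = int(m.group(1))
            continue
        m = METRICS_RE.match(line)
        if m and current is not None:
            # The dictionary is printed as a Python literal, so it can be parsed safely
            metrics = ast.literal_eval(m.group(1))
            if best is None or metrics[metric] > best[1][metric]:
                best = (current, metrics)
    if best is None:
        raise ValueError("no 'Metrics:' line found in the log")
    return best


sample_log = """
For epoch 34: {Learning rate: [0.0059799882743305005]}
Metrics: {'train_loss': 0.04262480563370556, 'test_loss': 0.47710768282413485, 'bleu': 13.4697, 'gen_len': 8.0479}
For epoch 35: {Learning rate: [0.005973531211479554]}
Metrics: {'train_loss': 0.04150175161818724, 'test_loss': 0.4572422608733177, 'bleu': 14.9812, 'gen_len': 8.3288}
"""

epoch, metrics = best_epoch(sample_log)
print(epoch, metrics["bleu"])
```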

### ---

In [8]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 36: {Learning rate: [0.005967074148628608]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.38batches/s]



Metrics: {'train_loss': 0.03819228132346236, 'test_loss': 0.47707190215587614, 'bleu': 14.1767, 'gen_len': 7.4726}




  0%|          | 1/930 [01:35<24:45:33, 95.95s/it]

For epoch 37: {Learning rate: [0.005960617085777662]}


Train batch number 164: 100%|██████████| 164/164 [01:14<00:00,  2.19batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.03683541160894603, 'test_loss': 0.4637154720723629, 'bleu': 12.2902, 'gen_len': 8.6986}




  0%|          | 2/930 [03:06<23:54:00, 92.72s/it]

For epoch 38: {Learning rate: [0.005954160022926717]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.035306159935028454, 'test_loss': 0.47934024408459663, 'bleu': 16.3635, 'gen_len': 8.4863}




  0%|          | 3/930 [04:53<25:36:39, 99.46s/it]

For epoch 39: {Learning rate: [0.005947702960075771]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.03278520123510644, 'test_loss': 0.4749445170164108, 'bleu': 12.2676, 'gen_len': 8.3767}




  0%|          | 4/930 [06:31<25:25:23, 98.84s/it]

For epoch 40: {Learning rate: [0.005941245897224825]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.031969252417272914, 'test_loss': 0.4805635079741478, 'bleu': 17.5028, 'gen_len': 8.1027}




  1%|          | 5/930 [08:18<26:06:27, 101.61s/it]

For epoch 41: {Learning rate: [0.005934788834373879]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.02894298150488062, 'test_loss': 0.4790022101253271, 'bleu': 13.4764, 'gen_len': 8.4384}




  1%|          | 6/930 [09:59<26:01:56, 101.43s/it]

For epoch 42: {Learning rate: [0.0059283317715229334]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.02946154050892446, 'test_loss': 0.4748890195041895, 'bleu': 14.065, 'gen_len': 8.4315}




  1%|          | 7/930 [11:38<25:50:43, 100.81s/it]

For epoch 43: {Learning rate: [0.005921874708671988]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.027215226610168453, 'test_loss': 0.4789775848388672, 'bleu': 17.6093, 'gen_len': 8.3699}




  1%|          | 8/930 [13:32<26:52:55, 104.96s/it]

For epoch 44: {Learning rate: [0.005915417645821042]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.0259451469031685, 'test_loss': 0.47619872689247134, 'bleu': 16.6542, 'gen_len': 8.5411}




  1%|          | 9/930 [15:12<26:27:45, 103.44s/it]

For epoch 45: {Learning rate: [0.005908960582970095]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.025742479355824067, 'test_loss': 0.4741707891225815, 'bleu': 16.9348, 'gen_len': 8.1918}




  1%|          | 10/930 [16:54<26:16:57, 102.85s/it]

For epoch 46: {Learning rate: [0.00590250352011915]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.026143629132292984, 'test_loss': 0.470525985211134, 'bleu': 17.7871, 'gen_len': 8.1164}




  1%|          | 11/930 [18:48<27:06:30, 106.19s/it]

For epoch 47: {Learning rate: [0.0058960464572682045]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.024612590119742404, 'test_loss': 0.4801926136016846, 'bleu': 15.8567, 'gen_len': 8.4247}




  1%|▏         | 12/930 [20:28<26:39:21, 104.53s/it]

For epoch 48: {Learning rate: [0.005889589394417258]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.023988754397667036, 'test_loss': 0.4951056972146034, 'bleu': 15.5235, 'gen_len': 8.4178}




  1%|▏         | 13/930 [22:10<26:21:57, 103.51s/it]

For epoch 49: {Learning rate: [0.005883132331566313]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.022457480948793178, 'test_loss': 0.48339259922504424, 'bleu': 15.5512, 'gen_len': 8.5616}




  2%|▏         | 14/930 [23:52<26:13:23, 103.06s/it]

For epoch 50: {Learning rate: [0.005876675268715366]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.020819501435986106, 'test_loss': 0.4793257012963295, 'bleu': 16.278, 'gen_len': 8.2808}




  2%|▏         | 15/930 [25:34<26:10:39, 102.99s/it]

For epoch 51: {Learning rate: [0.005870218205864421]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.02073534383586176, 'test_loss': 0.47898009680211545, 'bleu': 16.5707, 'gen_len': 8.4658}




  2%|▏         | 16/930 [27:16<26:02:32, 102.57s/it]

For epoch 52: {Learning rate: [0.0058637611430134755]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.01878388056236251, 'test_loss': 0.49066803641617296, 'bleu': 18.3684, 'gen_len': 8.3356}




  2%|▏         | 17/930 [29:04<26:24:36, 104.14s/it]

For epoch 53: {Learning rate: [0.005857304080162529]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.020690067177184107, 'test_loss': 0.4895695194602013, 'bleu': 19.1318, 'gen_len': 8.5753}




  2%|▏         | 18/930 [30:53<26:44:30, 105.56s/it]

For epoch 54: {Learning rate: [0.005850847017311584]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.020099602287589776, 'test_loss': 0.4947419837117195, 'bleu': 17.6572, 'gen_len': 8.6781}




  2%|▏         | 19/930 [32:31<26:09:21, 103.36s/it]

For epoch 55: {Learning rate: [0.005844389954460638]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.018243574037602763, 'test_loss': 0.487200365960598, 'bleu': 19.3227, 'gen_len': 8.4384}




  2%|▏         | 20/930 [34:19<26:31:31, 104.94s/it]

For epoch 56: {Learning rate: [0.005837932891609692]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.01750671527371174, 'test_loss': 0.4961160574108362, 'bleu': 17.481, 'gen_len': 8.5342}




  2%|▏         | 21/930 [35:58<26:00:33, 103.01s/it]

For epoch 57: {Learning rate: [0.0058314758287587466]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.01845939226857409, 'test_loss': 0.48993171453475953, 'bleu': 17.867, 'gen_len': 8.3973}




  2%|▏         | 22/930 [37:37<25:40:57, 101.83s/it]

For epoch 58: {Learning rate: [0.0058250187659078]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.017091984201858684, 'test_loss': 0.48478200044482944, 'bleu': 17.3081, 'gen_len': 8.5616}




  2%|▏         | 23/930 [39:16<25:25:04, 100.89s/it]

For epoch 59: {Learning rate: [0.005818561703056855]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.016945205069510492, 'test_loss': 0.4866363354027271, 'bleu': 16.6511, 'gen_len': 8.6507}




  3%|▎         | 24/930 [40:56<25:21:06, 100.74s/it]

For epoch 60: {Learning rate: [0.005812104640205909]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.016743421147111803, 'test_loss': 0.4861594453454018, 'bleu': 19.5006, 'gen_len': 8.0479}




  3%|▎         | 25/930 [42:44<25:49:57, 102.76s/it]

For epoch 61: {Learning rate: [0.005805647577354963]}


Train batch number 164: 100%|██████████| 164/164 [01:32<00:00,  1.77batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.01686422212656996, 'test_loss': 0.4869706057012081, 'bleu': 15.9726, 'gen_len': 8.4795}




  3%|▎         | 26/930 [44:33<26:17:17, 104.69s/it]

For epoch 62: {Learning rate: [0.005799190514504017]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.015597753800315464, 'test_loss': 0.4864132758229971, 'bleu': 18.4983, 'gen_len': 8.7603}




  3%|▎         | 27/930 [46:16<26:09:08, 104.26s/it]

For epoch 63: {Learning rate: [0.005792733451653071]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.04batches/s]



Metrics: {'train_loss': 0.01635149172109711, 'test_loss': 0.4778881182894111, 'bleu': 17.5221, 'gen_len': 8.3699}




  3%|▎         | 28/930 [47:59<26:01:30, 103.87s/it]

For epoch 64: {Learning rate: [0.005786276388802126]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.014303686593449107, 'test_loss': 0.47989868130534885, 'bleu': 19.0594, 'gen_len': 8.4863}




  3%|▎         | 29/930 [49:43<26:00:45, 103.94s/it]

For epoch 65: {Learning rate: [0.0057798193259511795]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.013785408478281348, 'test_loss': 0.4910534702241421, 'bleu': 18.7825, 'gen_len': 8.3973}




  3%|▎         | 30/930 [51:28<26:03:05, 104.21s/it]

For epoch 66: {Learning rate: [0.005773362263100234]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.014456219416077635, 'test_loss': 0.4906672425568104, 'bleu': 18.4737, 'gen_len': 8.5753}




  3%|▎         | 31/930 [53:09<25:46:54, 103.24s/it]

For epoch 67: {Learning rate: [0.005766905200249288]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.015488866872790202, 'test_loss': 0.48535415250808, 'bleu': 17.428, 'gen_len': 8.2192}




  3%|▎         | 32/930 [54:53<25:48:50, 103.49s/it]

For epoch 68: {Learning rate: [0.005760448137398342]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.015110146073999292, 'test_loss': 0.49804111905395987, 'bleu': 17.7074, 'gen_len': 8.1781}




  4%|▎         | 33/930 [56:36<25:45:50, 103.40s/it]

For epoch 69: {Learning rate: [0.005753991074547397]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.02batches/s]



Metrics: {'train_loss': 0.01469403894606796, 'test_loss': 0.4877496179193258, 'bleu': 16.2529, 'gen_len': 8.4247}




  4%|▎         | 34/930 [58:18<25:37:47, 102.98s/it]

For epoch 70: {Learning rate: [0.0057475340116964506]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.013868779178265846, 'test_loss': 0.5045388601720333, 'bleu': 18.0912, 'gen_len': 8.2808}




  4%|▍         | 35/930 [1:00:02<25:38:57, 103.17s/it]

For epoch 71: {Learning rate: [0.005741076948845504]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.01345110827738919, 'test_loss': 0.4834831841289997, 'bleu': 17.9928, 'gen_len': 8.3699}




  4%|▍         | 36/930 [1:01:46<25:40:45, 103.41s/it]

For epoch 72: {Learning rate: [0.00573461988599456]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.08s/batches]



Metrics: {'train_loss': 0.013096524347130937, 'test_loss': 0.47803243771195414, 'bleu': 17.8009, 'gen_len': 8.5205}




  4%|▍         | 37/930 [1:03:32<25:51:08, 104.22s/it]

For epoch 73: {Learning rate: [0.005728162823143613]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.013049100667647109, 'test_loss': 0.48567099422216414, 'bleu': 17.7895, 'gen_len': 8.5411}




  4%|▍         | 38/930 [1:05:18<25:56:13, 104.68s/it]

For epoch 74: {Learning rate: [0.005721705760292668]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.013673734358545938, 'test_loss': 0.48633677437901496, 'bleu': 18.3566, 'gen_len': 8.2397}




  4%|▍         | 39/930 [1:06:59<25:41:46, 103.82s/it]

For epoch 75: {Learning rate: [0.005715248697441722]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.013291250609099955, 'test_loss': 0.48977228030562403, 'bleu': 15.4626, 'gen_len': 8.8562}




  4%|▍         | 40/930 [1:08:42<25:34:18, 103.44s/it]

For epoch 76: {Learning rate: [0.005708791634590776]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.011793216900536564, 'test_loss': 0.5025056257843972, 'bleu': 15.1064, 'gen_len': 8.4589}




  4%|▍         | 41/930 [1:10:26<25:35:10, 103.61s/it]

For epoch 77: {Learning rate: [0.005702334571739831]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.012163838998431613, 'test_loss': 0.49115499258041384, 'bleu': 17.1976, 'gen_len': 8.3973}




  5%|▍         | 42/930 [1:12:05<25:13:26, 102.26s/it]

For epoch 78: {Learning rate: [0.005695877508888884]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.012738386316724666, 'test_loss': 0.47515571042895316, 'bleu': 18.4726, 'gen_len': 8.5205}




  5%|▍         | 43/930 [1:13:49<25:19:07, 102.76s/it]

For epoch 79: {Learning rate: [0.005689420446037938]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.011418544830010476, 'test_loss': 0.48829284608364104, 'bleu': 17.6885, 'gen_len': 8.4932}




  5%|▍         | 44/930 [1:15:32<25:18:02, 102.80s/it]

For epoch 80: {Learning rate: [0.005682963383186993]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.01137744728688764, 'test_loss': 0.47145743649452926, 'bleu': 19.472, 'gen_len': 8.5616}




  5%|▍         | 45/930 [1:17:14<25:13:54, 102.64s/it]

For epoch 81: {Learning rate: [0.005676506320336047]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.011839227860569158, 'test_loss': 0.4845841094851494, 'bleu': 19.2087, 'gen_len': 8.3356}




  5%|▍         | 46/930 [1:18:57<25:15:00, 102.83s/it]

For epoch 82: {Learning rate: [0.005670049257485101]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.012577167697245165, 'test_loss': 0.4775491276755929, 'bleu': 20.2524, 'gen_len': 8.3082}




  5%|▌         | 47/930 [1:20:47<25:44:40, 104.96s/it]

For epoch 83: {Learning rate: [0.005663592194634155]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.12batches/s]



Metrics: {'train_loss': 0.011303811879947827, 'test_loss': 0.47283904887735845, 'bleu': 17.3848, 'gen_len': 8.4041}




  5%|▌         | 48/930 [1:22:29<25:29:53, 104.07s/it]

For epoch 84: {Learning rate: [0.005657135131783209]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.009639888080816558, 'test_loss': 0.49377799928188326, 'bleu': 16.6188, 'gen_len': 8.363}




  5%|▌         | 49/930 [1:24:14<25:31:58, 104.33s/it]

For epoch 85: {Learning rate: [0.005650678068932264]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.009924074481049745, 'test_loss': 0.48168723583221434, 'bleu': 18.4434, 'gen_len': 8.411}
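
The per-epoch `Metrics:` lines above are hard to scan by eye for the best BLEU score. Below is a minimal sketch of one way to do it, assuming the cell output has been saved as plain text. The helper `best_epoch_by_bleu` and the two regular expressions are hypothetical names introduced for illustration; they are not part of `wolof_translate`. A `Metrics:` line with no preceding `For epoch N:` header is skipped.

```python
import ast
import re

# Hypothetical patterns matching the log lines printed in this notebook.
EPOCH_RE = re.compile(r"^For epoch (\d+):")
METRICS_RE = re.compile(r"^Metrics: (\{.*\})\s*$")


def best_epoch_by_bleu(log_text):
    """Return (epoch, metrics) for the highest 'bleu' in the log, or None."""
    best = None
    current_epoch = None
    for line in log_text.splitlines():
        line = line.strip()
        epoch_match = EPOCH_RE.match(line)
        if epoch_match:
            current_epoch = int(epoch_match.group(1))
            continue
        metrics_match = METRICS_RE.match(line)
        if metrics_match and current_epoch is not None:
            # The metrics are printed as a Python dict literal.
            metrics = ast.literal_eval(metrics_match.group(1))
            if best is None or metrics["bleu"] > best[1]["bleu"]:
                best = (current_epoch, metrics)
    return best


# Two epochs copied from the output above (losses shortened for readability).
sample = (
    "For epoch 54: {Learning rate: [0.005850847017311584]}\n"
    "Metrics: {'train_loss': 0.0201, 'test_loss': 0.4947, 'bleu': 17.6572, 'gen_len': 8.6781}\n"
    "For epoch 55: {Learning rate: [0.005844389954460638]}\n"
    "Metrics: {'train_loss': 0.0182, 'test_loss': 0.4872, 'bleu': 19.3227, 'gen_len': 8.4384}\n"
)
epoch, metrics = best_epoch_by_bleu(sample)
print(epoch, metrics["bleu"])  # epoch 55 has the higher BLEU of the two
```

This only summarizes the printed output. It does not replace the model selection done by the trainer itself.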




### ---

In [9]:
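# continue training for the remaining epochs (max_epoch minus the epochs already run), using BLEU as the metric for the best model
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])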



For epoch 86: {Learning rate: [0.005644221006081318]}


Train batch number 164: 100%|██████████| 164/164 [01:33<00:00,  1.76batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.010900877158458503, 'test_loss': 0.48287404626607894, 'bleu': 18.5719, 'gen_len': 8.4863}




  0%|          | 1/880 [01:50<27:06:01, 110.99s/it]

For epoch 87: {Learning rate: [0.005637763943230372]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.010752648401582363, 'test_loss': 0.47343857064843176, 'bleu': 17.8764, 'gen_len': 8.4795}




  0%|          | 2/880 [03:35<26:09:10, 107.23s/it]

For epoch 88: {Learning rate: [0.005631306880379426]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.010527441911995637, 'test_loss': 0.4754669634625316, 'bleu': 18.8959, 'gen_len': 8.411}




  0%|          | 3/880 [05:14<25:12:35, 103.48s/it]

For epoch 89: {Learning rate: [0.005624849817528481]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.01074110487406426, 'test_loss': 0.48421466574072836, 'bleu': 18.19, 'gen_len': 8.2945}




  0%|          | 4/880 [06:57<25:08:16, 103.31s/it]

For epoch 90: {Learning rate: [0.005618392754677535]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.010986960430029018, 'test_loss': 0.4790479900315404, 'bleu': 17.8137, 'gen_len': 8.8288}




  1%|          | 5/880 [08:41<25:11:52, 103.67s/it]

For epoch 91: {Learning rate: [0.005611935691826588]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.010851152199703814, 'test_loss': 0.47378216264769435, 'bleu': 20.7818, 'gen_len': 8.3082}




  1%|          | 6/880 [10:35<25:57:20, 106.91s/it]

For epoch 92: {Learning rate: [0.005605478628975643]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.008640904596754031, 'test_loss': 0.4934361159801483, 'bleu': 19.7373, 'gen_len': 8.4178}




  1%|          | 7/880 [12:19<25:43:29, 106.08s/it]

For epoch 93: {Learning rate: [0.0055990215661246975]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.008986228276626207, 'test_loss': 0.4793516807258129, 'bleu': 17.9674, 'gen_len': 8.6301}




  1%|          | 8/880 [14:04<25:38:19, 105.85s/it]

For epoch 94: {Learning rate: [0.005592564503273752]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.009325036131663293, 'test_loss': 0.48231811001896857, 'bleu': 18.1445, 'gen_len': 8.5}




  1%|          | 9/880 [15:46<25:17:10, 104.51s/it]

For epoch 95: {Learning rate: [0.005586107440422806]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.008961988112565539, 'test_loss': 0.4796189896762371, 'bleu': 21.5024, 'gen_len': 8.3219}




  1%|          | 10/880 [17:38<25:48:48, 106.81s/it]

For epoch 96: {Learning rate: [0.005579650377571859]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.82batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.01batches/s]



Metrics: {'train_loss': 0.008287531884157144, 'test_loss': 0.4761825453490019, 'bleu': 20.5762, 'gen_len': 8.0479}




  1%|▏         | 11/880 [19:27<25:55:16, 107.38s/it]

For epoch 97: {Learning rate: [0.005573193314720914]}


Train batch number 164: 100%|██████████| 164/164 [01:31<00:00,  1.80batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.009919249175099383, 'test_loss': 0.479929081723094, 'bleu': 18.0023, 'gen_len': 8.2534}




  1%|▏         | 12/880 [21:15<25:57:02, 107.63s/it]

For epoch 98: {Learning rate: [0.0055667362518699685]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.09batches/s]



Metrics: {'train_loss': 0.008980475859106064, 'test_loss': 0.4752262391149998, 'bleu': 20.5217, 'gen_len': 8.3356}




  1%|▏         | 13/880 [23:02<25:52:25, 107.43s/it]

For epoch 99: {Learning rate: [0.005560279189019022]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.08batches/s]



Metrics: {'train_loss': 0.00945651674147684, 'test_loss': 0.4710296854376793, 'bleu': 20.1634, 'gen_len': 8.2945}




  2%|▏         | 14/880 [24:47<25:41:24, 106.79s/it]

For epoch 100: {Learning rate: [0.005553822126168077]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.83batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.009945802474692577, 'test_loss': 0.48170366175472734, 'bleu': 18.741, 'gen_len': 8.4178}




  2%|▏         | 15/880 [26:34<25:39:05, 106.76s/it]

For epoch 101: {Learning rate: [0.0055473650633171305]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.07batches/s]



Metrics: {'train_loss': 0.009243944593103284, 'test_loss': 0.4800200551748276, 'bleu': 17.5592, 'gen_len': 8.6849}




  2%|▏         | 16/880 [28:20<25:36:43, 106.72s/it]

For epoch 102: {Learning rate: [0.005540908000466185]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.009784805786827147, 'test_loss': 0.48291702605783937, 'bleu': 17.2542, 'gen_len': 8.3082}




  2%|▏         | 17/880 [30:06<25:32:12, 106.53s/it]

For epoch 103: {Learning rate: [0.00553445093761524]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.80batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.10batches/s]



Metrics: {'train_loss': 0.008450767824997013, 'test_loss': 0.4771898296661675, 'bleu': 18.7745, 'gen_len': 8.4521}




  2%|▏         | 18/880 [31:56<25:41:23, 107.29s/it]

For epoch 104: {Learning rate: [0.005527993874764293]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.008678325991396134, 'test_loss': 0.4931886851787567, 'bleu': 18.6468, 'gen_len': 8.5}




  2%|▏         | 19/880 [33:42<25:36:01, 107.04s/it]

For epoch 105: {Learning rate: [0.005521536811913347]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.00842530490183158, 'test_loss': 0.48596997633576394, 'bleu': 18.453, 'gen_len': 8.226}




  2%|▏         | 20/880 [35:21<24:58:46, 104.57s/it]

For epoch 106: {Learning rate: [0.0055150797490624015]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.009022401078432378, 'test_loss': 0.48717884831130503, 'bleu': 19.7825, 'gen_len': 8.3767}




  2%|▏         | 21/880 [37:01<24:38:09, 103.25s/it]

For epoch 107: {Learning rate: [0.005508622686211456]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.01s/batches]



Metrics: {'train_loss': 0.009133258577160788, 'test_loss': 0.4930767672136426, 'bleu': 18.8114, 'gen_len': 8.363}




  2%|▎         | 22/880 [38:45<24:39:14, 103.44s/it]

For epoch 108: {Learning rate: [0.00550216562336051]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.008833021733153146, 'test_loss': 0.4873193148523569, 'bleu': 19.6905, 'gen_len': 8.089}




  3%|▎         | 23/880 [40:26<24:28:46, 102.83s/it]

For epoch 109: {Learning rate: [0.005495708560509564]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.10batches/s]



Metrics: {'train_loss': 0.008042450027793033, 'test_loss': 0.4904258966445923, 'bleu': 18.4686, 'gen_len': 8.4726}




  3%|▎         | 24/880 [42:09<24:28:40, 102.94s/it]

For epoch 110: {Learning rate: [0.005489251497658618]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.008133656481776398, 'test_loss': 0.4841260457411408, 'bleu': 19.0185, 'gen_len': 8.2671}




  3%|▎         | 25/880 [43:49<24:11:17, 101.84s/it]

For epoch 111: {Learning rate: [0.0054827944348076725]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.008048074600349248, 'test_loss': 0.4905277371406555, 'bleu': 18.1504, 'gen_len': 8.3493}




  3%|▎         | 26/880 [45:32<24:16:31, 102.33s/it]

For epoch 112: {Learning rate: [0.005476337371956727]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.007515322083923909, 'test_loss': 0.4873209480196238, 'bleu': 19.3504, 'gen_len': 8.363}




  3%|▎         | 27/880 [47:18<24:28:12, 103.27s/it]

For epoch 113: {Learning rate: [0.005469880309105781]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.83batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.09s/batches]



Metrics: {'train_loss': 0.008200015154526648, 'test_loss': 0.495061369240284, 'bleu': 16.5056, 'gen_len': 8.4521}




  3%|▎         | 28/880 [49:07<24:50:28, 104.96s/it]

For epoch 114: {Learning rate: [0.005463423246254835]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.81batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.01s/batches]



Metrics: {'train_loss': 0.008598847338495344, 'test_loss': 0.49646984934806826, 'bleu': 19.3883, 'gen_len': 8.3014}




  3%|▎         | 29/880 [50:58<25:17:45, 107.01s/it]

For epoch 115: {Learning rate: [0.00545696618340389]}


Train batch number 164: 100%|██████████| 164/164 [01:43<00:00,  1.59batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.05batches/s]



Metrics: {'train_loss': 0.007970585006682442, 'test_loss': 0.48401788957417013, 'bleu': 18.6153, 'gen_len': 8.5479}




  3%|▎         | 30/880 [53:00<26:18:53, 111.45s/it]

For epoch 116: {Learning rate: [0.005450509120552944]}


Train batch number 164: 100%|██████████| 164/164 [01:31<00:00,  1.79batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.00789363804133467, 'test_loss': 0.4825442243367434, 'bleu': 18.2801, 'gen_len': 8.5137}




  4%|▎         | 31/880 [54:49<26:05:32, 110.64s/it]

For epoch 117: {Learning rate: [0.005444052057701998]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.83batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.008884657864322567, 'test_loss': 0.4684673771262169, 'bleu': 20.1328, 'gen_len': 8.3219}




  4%|▎         | 32/880 [56:36<25:47:44, 109.51s/it]

For epoch 118: {Learning rate: [0.005437594994851052]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.81batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.007502214090292421, 'test_loss': 0.4804682407528162, 'bleu': 19.168, 'gen_len': 8.2192}




  4%|▍         | 33/880 [58:24<25:38:49, 109.01s/it]

For epoch 119: {Learning rate: [0.005431137932000106]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.01s/batches]



Metrics: {'train_loss': 0.00809476404147728, 'test_loss': 0.48475802727043626, 'bleu': 15.5779, 'gen_len': 8.8973}




  4%|▍         | 34/880 [1:00:11<25:31:26, 108.61s/it]

For epoch 120: {Learning rate: [0.005424680869149161]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.008975917127028835, 'test_loss': 0.4884779494255781, 'bleu': 19.0533, 'gen_len': 8.2329}




  4%|▍         | 35/880 [1:01:50<24:46:40, 105.56s/it]

For epoch 121: {Learning rate: [0.005418223806298215]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.007741353028771899, 'test_loss': 0.4908464288339019, 'bleu': 17.792, 'gen_len': 8.1918}




  4%|▍         | 36/880 [1:03:29<24:16:39, 103.55s/it]

For epoch 122: {Learning rate: [0.005411766743447268]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.008487027061985317, 'test_loss': 0.48625226840376856, 'bleu': 19.1904, 'gen_len': 8.6438}




  4%|▍         | 37/880 [1:05:07<23:52:31, 101.96s/it]

For epoch 123: {Learning rate: [0.005405309680596323]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.007167284843822793, 'test_loss': 0.4875902608036995, 'bleu': 19.2196, 'gen_len': 8.2808}




  4%|▍         | 38/880 [1:06:46<23:38:37, 101.09s/it]

For epoch 124: {Learning rate: [0.005398852617745377]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.007492548248747636, 'test_loss': 0.4863208081573248, 'bleu': 18.8094, 'gen_len': 8.2466}




  4%|▍         | 39/880 [1:08:24<23:22:53, 100.09s/it]

For epoch 125: {Learning rate: [0.005392395554894431]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.007837528788602468, 'test_loss': 0.49577311500906945, 'bleu': 17.637, 'gen_len': 8.5822}




  5%|▍         | 40/880 [1:10:07<23:33:17, 100.95s/it]

For epoch 126: {Learning rate: [0.005385938492043486]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.008050415740155319, 'test_loss': 0.4845694452524185, 'bleu': 18.6672, 'gen_len': 8.411}




  5%|▍         | 41/880 [1:11:48<23:32:52, 101.04s/it]

For epoch 127: {Learning rate: [0.005379481429192539]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.0070663999356883125, 'test_loss': 0.4955284398049116, 'bleu': 20.0036, 'gen_len': 8.2671}




  5%|▍         | 42/880 [1:13:29<23:32:43, 101.15s/it]

For epoch 128: {Learning rate: [0.005373024366341594]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.006806766846971075, 'test_loss': 0.5053371995687485, 'bleu': 18.4602, 'gen_len': 8.5137}




  5%|▍         | 43/880 [1:15:09<23:25:21, 100.74s/it]

For epoch 129: {Learning rate: [0.0053665673034906484]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.006654358095746408, 'test_loss': 0.5053524613380432, 'bleu': 19.2877, 'gen_len': 8.5685}




  5%|▌         | 44/880 [1:16:51<23:27:54, 101.05s/it]

For epoch 130: {Learning rate: [0.005360110240639702]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.006639894254742724, 'test_loss': 0.4919954476878047, 'bleu': 20.0385, 'gen_len': 8.5411}




  5%|▌         | 45/880 [1:18:31<23:21:33, 100.71s/it]

For epoch 131: {Learning rate: [0.005353653177788756]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.01s/batches]



Metrics: {'train_loss': 0.007590647297591914, 'test_loss': 0.4952439650893211, 'bleu': 19.0552, 'gen_len': 8.3425}




  5%|▌         | 46/880 [1:20:11<23:18:01, 100.58s/it]

For epoch 132: {Learning rate: [0.005347196114937811]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.0068693860874457354, 'test_loss': 0.4976315269246697, 'bleu': 17.7833, 'gen_len': 8.5342}




  5%|▌         | 47/880 [1:21:51<23:13:49, 100.40s/it]

For epoch 133: {Learning rate: [0.005340739052086865]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.007022496085349902, 'test_loss': 0.4975840948522091, 'bleu': 18.9034, 'gen_len': 8.4726}




  5%|▌         | 48/880 [1:23:28<22:59:52, 99.51s/it] 

For epoch 134: {Learning rate: [0.0053342819892359195]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.006005478724546265, 'test_loss': 0.5023626573383808, 'bleu': 17.9518, 'gen_len': 8.4384}




  6%|▌         | 49/880 [1:25:07<22:53:02, 99.14s/it]

For epoch 135: {Learning rate: [0.005327824926384973]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.006811129679568339, 'test_loss': 0.48725941255688665, 'bleu': 20.1052, 'gen_len': 8.4863}




  6%|▌         | 50/880 [1:26:46<22:49:56, 99.03s/it]

For epoch 136: {Learning rate: [0.005321367863534028]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.005927240727653886, 'test_loss': 0.4957412973046303, 'bleu': 20.8368, 'gen_len': 8.7603}




  6%|▌         | 51/880 [1:28:24<22:45:42, 98.84s/it]

For epoch 137: {Learning rate: [0.005314910800683082]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.006884677852212917, 'test_loss': 0.485458667576313, 'bleu': 18.0652, 'gen_len': 8.7192}




  6%|▌         | 52/880 [1:30:01<22:34:32, 98.16s/it]

For epoch 138: {Learning rate: [0.005308453737832136]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.007123745794027544, 'test_loss': 0.4918370388448238, 'bleu': 19.0094, 'gen_len': 8.4247}




  6%|▌         | 53/880 [1:31:39<22:34:53, 98.30s/it]

For epoch 139: {Learning rate: [0.00530199667498119]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.005648415505253602, 'test_loss': 0.4987149339169264, 'bleu': 20.0069, 'gen_len': 8.3014}




  6%|▌         | 54/880 [1:33:16<22:27:33, 97.89s/it]

For epoch 140: {Learning rate: [0.005295539612130244]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.006626019516616219, 'test_loss': 0.4936957910656929, 'bleu': 17.9985, 'gen_len': 8.5479}




  6%|▋         | 55/880 [1:34:54<22:24:38, 97.79s/it]

For epoch 141: {Learning rate: [0.005289082549279299]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.006227375554749969, 'test_loss': 0.4963867351412773, 'bleu': 20.2492, 'gen_len': 8.4521}




  6%|▋         | 56/880 [1:36:31<22:23:00, 97.79s/it]

For epoch 142: {Learning rate: [0.0052826254864283525]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.006765339783358774, 'test_loss': 0.48756181336939336, 'bleu': 19.717, 'gen_len': 8.1712}




  6%|▋         | 57/880 [1:38:10<22:25:19, 98.08s/it]

For epoch 143: {Learning rate: [0.005276168423577407]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.005728405727806115, 'test_loss': 0.489780330657959, 'bleu': 19.6845, 'gen_len': 8.363}




  7%|▋         | 58/880 [1:39:51<22:35:47, 98.96s/it]

For epoch 144: {Learning rate: [0.005269711360726461]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.006250756472315804, 'test_loss': 0.5022975251078605, 'bleu': 20.6666, 'gen_len': 8.4384}




  7%|▋         | 59/880 [1:41:33<22:44:42, 99.73s/it]

For epoch 145: {Learning rate: [0.005263254297875515]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.006271915373295863, 'test_loss': 0.4980825178325176, 'bleu': 19.5268, 'gen_len': 8.4521}




  7%|▋         | 60/880 [1:43:11<22:36:46, 99.28s/it]

For epoch 146: {Learning rate: [0.00525679723502457]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.005857485907632153, 'test_loss': 0.509405805170536, 'bleu': 18.6939, 'gen_len': 8.6233}




  7%|▋         | 61/880 [1:44:48<22:27:59, 98.75s/it]

For epoch 147: {Learning rate: [0.0052503401721736235]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.006535187416419631, 'test_loss': 0.4970967762172222, 'bleu': 20.3843, 'gen_len': 8.6644}




  7%|▋         | 62/880 [1:46:26<22:20:40, 98.34s/it]

For epoch 148: {Learning rate: [0.005243883109322677]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.0070347054849063595, 'test_loss': 0.49571226760745046, 'bleu': 18.4586, 'gen_len': 8.0137}




  7%|▋         | 63/880 [1:48:03<22:15:18, 98.06s/it]

For epoch 149: {Learning rate: [0.005237426046471733]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.006846142742355502, 'test_loss': 0.4936381604522467, 'bleu': 18.5826, 'gen_len': 8.4041}




  7%|▋         | 64/880 [1:49:40<22:09:22, 97.75s/it]

For epoch 150: {Learning rate: [0.005230968983620786]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.006111497696554401, 'test_loss': 0.4962434310466051, 'bleu': 20.5134, 'gen_len': 8.0479}




  7%|▋         | 65/880 [1:51:19<22:11:32, 98.03s/it]

For epoch 151: {Learning rate: [0.00522451192076984]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.006186606473021107, 'test_loss': 0.5055241912603379, 'bleu': 21.0304, 'gen_len': 8.2603}




  8%|▊         | 66/880 [1:52:56<22:06:24, 97.77s/it]

For epoch 152: {Learning rate: [0.0052180548579188945]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.005552218832989882, 'test_loss': 0.4973263446241617, 'bleu': 20.0115, 'gen_len': 8.3014}




  8%|▊         | 67/880 [1:54:34<22:04:36, 97.76s/it]

For epoch 153: {Learning rate: [0.005211597795067949]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.006582519956104878, 'test_loss': 0.4979657521471381, 'bleu': 19.8656, 'gen_len': 8.6438}




  8%|▊         | 68/880 [1:56:11<22:00:48, 97.60s/it]

For epoch 154: {Learning rate: [0.005205140732217004]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.006584589434546282, 'test_loss': 0.49190116915851834, 'bleu': 18.9583, 'gen_len': 8.8151}




  8%|▊         | 69/880 [1:57:48<21:56:31, 97.40s/it]

For epoch 155: {Learning rate: [0.005198683669366057]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.37batches/s]



Metrics: {'train_loss': 0.007009030712297096, 'test_loss': 0.48862215541303156, 'bleu': 20.3234, 'gen_len': 8.4863}




  8%|▊         | 70/880 [1:59:24<21:50:52, 97.10s/it]

For epoch 156: {Learning rate: [0.005192226606515111]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.005555932026387692, 'test_loss': 0.5061743803322315, 'bleu': 20.1422, 'gen_len': 8.5479}




  8%|▊         | 71/880 [2:01:02<21:49:44, 97.14s/it]

For epoch 157: {Learning rate: [0.0051857695436641656]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.005960816922219592, 'test_loss': 0.501911246497184, 'bleu': 18.0224, 'gen_len': 8.3356}




  8%|▊         | 72/880 [2:02:39<21:49:18, 97.23s/it]

For epoch 158: {Learning rate: [0.00517931248081322]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.006051451102393912, 'test_loss': 0.504501573741436, 'bleu': 18.7034, 'gen_len': 8.6027}




  8%|▊         | 73/880 [2:04:16<21:44:33, 96.99s/it]

For epoch 159: {Learning rate: [0.005172855417962274]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.0059241498627937, 'test_loss': 0.4902939986437559, 'bleu': 18.6291, 'gen_len': 8.637}
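
The per-epoch `Metrics:` lines above can be scanned programmatically to find which epoch scored best on a given metric. The sketch below is only an illustration, not part of the `wolof_translate` trainer. The helper names `parse_training_log` and `best_epoch` are hypothetical, and it assumes the log keeps the `For epoch N:` / `Metrics: {...}` layout printed in this notebook. The sample text reuses the three most recent epochs logged above.

```python
import ast
import re


def parse_training_log(log_text):
    """Return a list of (epoch, metrics_dict) pairs found in the printed log."""
    records = []
    current_epoch = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        epoch_match = re.match(r"For epoch (\d+):", line)
        if epoch_match:
            # Remember the epoch so the next Metrics line can be attached to it
            current_epoch = int(epoch_match.group(1))
        elif line.startswith("Metrics:") and current_epoch is not None:
            # The metrics are printed as a Python dict literal
            metrics = ast.literal_eval(line[len("Metrics:"):].strip())
            records.append((current_epoch, metrics))
    return records


def best_epoch(records, metric="bleu", objective="maximize"):
    """Pick the (epoch, metrics) pair with the best value of `metric`."""
    pick = max if objective == "maximize" else min
    return pick(records, key=lambda record: record[1][metric])


# Three epochs copied from the log above
sample_log = """
For epoch 157: {Learning rate: [0.0051857695436641656]}
Metrics: {'train_loss': 0.005960816922219592, 'test_loss': 0.501911246497184, 'bleu': 18.0224, 'gen_len': 8.3356}
For epoch 158: {Learning rate: [0.00517931248081322]}
Metrics: {'train_loss': 0.006051451102393912, 'test_loss': 0.504501573741436, 'bleu': 18.7034, 'gen_len': 8.6027}
For epoch 159: {Learning rate: [0.005172855417962274]}
Metrics: {'train_loss': 0.0059241498627937, 'test_loss': 0.4902939986437559, 'bleu': 18.6291, 'gen_len': 8.637}
"""

records = parse_training_log(sample_log)
epoch, metrics = best_epoch(records, metric="bleu", objective="maximize")
print(f"Best BLEU in the sample: epoch {epoch} ({metrics['bleu']})")
```

Passing `metric="test_loss"` with `objective="minimize"` ranks the same records by validation loss instead. In the logs above the two criteria do not always agree.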




### ---

In [None]:
# resume training for the remaining epochs (max_epoch minus the epochs already completed)
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])



For epoch 160: {Learning rate: [0.005166398355111328]}


Train batch number 164: 100%|██████████| 164/164 [01:14<00:00,  2.20batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.0059549497336109465, 'test_loss': 0.5013580966740847, 'bleu': 18.4026, 'gen_len': 8.6644}




  0%|          | 1/806 [01:30<20:17:22, 90.74s/it]

For epoch 161: {Learning rate: [0.005159941292260382]}


Train batch number 164: 100%|██████████| 164/164 [01:18<00:00,  2.08batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.005939603022054099, 'test_loss': 0.4924946501851082, 'bleu': 19.3459, 'gen_len': 8.137}




  0%|          | 2/806 [03:05<20:50:08, 93.29s/it]

For epoch 162: {Learning rate: [0.005153484229409437]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.006088046526689673, 'test_loss': 0.48542294688522813, 'bleu': 18.4273, 'gen_len': 8.2603}




  0%|          | 3/806 [04:49<21:52:21, 98.06s/it]

For epoch 163: {Learning rate: [0.005147027166558491]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.005768591831771873, 'test_loss': 0.48902260065078734, 'bleu': 17.9698, 'gen_len': 8.3151}




  0%|          | 4/806 [06:29<22:02:38, 98.95s/it]

For epoch 164: {Learning rate: [0.005140570103707545]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.81batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.005299376647688005, 'test_loss': 0.5042872197926045, 'bleu': 17.0666, 'gen_len': 8.3836}




  1%|          | 5/806 [08:17<22:42:04, 102.03s/it]

For epoch 165: {Learning rate: [0.0051341130408565985]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.005377380286163094, 'test_loss': 0.49661434404551985, 'bleu': 18.991, 'gen_len': 8.411}




  1%|          | 6/806 [09:57<22:32:40, 101.45s/it]

For epoch 166: {Learning rate: [0.005127655978005653]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.0064642438587836255, 'test_loss': 0.49607397653162477, 'bleu': 20.0287, 'gen_len': 8.3425}




  1%|          | 7/806 [11:36<22:18:26, 100.51s/it]

For epoch 167: {Learning rate: [0.005121198915154708]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.005599884515881635, 'test_loss': 0.49838277585804464, 'bleu': 19.4148, 'gen_len': 8.5342}




  1%|          | 8/806 [13:14<22:08:30, 99.89s/it] 

For epoch 168: {Learning rate: [0.005114741852303761]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.005900395420900257, 'test_loss': 0.4943507395684719, 'bleu': 20.5789, 'gen_len': 8.4247}




  1%|          | 9/806 [14:52<21:57:39, 99.20s/it]

For epoch 169: {Learning rate: [0.005108284789452816]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.00555053328948483, 'test_loss': 0.4971000589430332, 'bleu': 21.6229, 'gen_len': 8.3356}




  1%|          | 10/806 [16:39<22:28:08, 101.62s/it]

For epoch 170: {Learning rate: [0.0051018277266018704]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.005574574554995662, 'test_loss': 0.5014278501272201, 'bleu': 19.9339, 'gen_len': 8.1301}




  1%|▏         | 11/806 [18:16<22:09:27, 100.34s/it]

For epoch 171: {Learning rate: [0.005095370663750924]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.006095876080507878, 'test_loss': 0.49123134166002275, 'bleu': 20.9047, 'gen_len': 8.1301}




  1%|▏         | 12/806 [19:58<22:12:08, 100.67s/it]

For epoch 172: {Learning rate: [0.005088913600899979]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.00565701465130565, 'test_loss': 0.48780797943472864, 'bleu': 18.4877, 'gen_len': 8.3973}




  2%|▏         | 13/806 [21:37<22:02:53, 100.09s/it]

For epoch 173: {Learning rate: [0.005082456538049032]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.005343636994306197, 'test_loss': 0.4923470377922058, 'bleu': 19.0839, 'gen_len': 8.5411}




  2%|▏         | 14/806 [23:16<21:59:07, 99.93s/it] 

For epoch 174: {Learning rate: [0.005075999475198087]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.0053363314987885735, 'test_loss': 0.4943215135484934, 'bleu': 20.5817, 'gen_len': 8.2877}




  2%|▏         | 15/806 [24:55<21:54:14, 99.69s/it]

For epoch 175: {Learning rate: [0.0050695424123471415]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.004902158955945463, 'test_loss': 0.4970378313213587, 'bleu': 18.4545, 'gen_len': 8.3699}




  2%|▏         | 16/806 [26:35<21:52:27, 99.68s/it]

For epoch 176: {Learning rate: [0.005063085349496195]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.005428318919812006, 'test_loss': 0.49681963101029397, 'bleu': 19.1337, 'gen_len': 8.5685}




  2%|▏         | 17/806 [28:14<21:48:58, 99.54s/it]

For epoch 177: {Learning rate: [0.00505662828664525]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.005239913889498694, 'test_loss': 0.49358680546283723, 'bleu': 19.0422, 'gen_len': 8.1233}




  2%|▏         | 18/806 [29:55<21:50:42, 99.80s/it]

For epoch 178: {Learning rate: [0.005050171223794303]}


Train batch number 164: 100%|██████████| 164/164 [01:31<00:00,  1.79batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:11<00:00,  1.18s/batches]



Metrics: {'train_loss': 0.005384523031463642, 'test_loss': 0.4932085007429123, 'bleu': 19.8618, 'gen_len': 8.5274}




  2%|▏         | 19/806 [31:47<22:37:36, 103.50s/it]

For epoch 179: {Learning rate: [0.005043714160943358]}


Train batch number 164: 100%|██████████| 164/164 [01:32<00:00,  1.77batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.005961863980923137, 'test_loss': 0.5024944946169854, 'bleu': 20.2236, 'gen_len': 8.4863}




  2%|▏         | 20/806 [33:37<23:00:52, 105.41s/it]

For epoch 180: {Learning rate: [0.0050372570980924125]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.004931966735567563, 'test_loss': 0.49693833813071253, 'bleu': 20.0361, 'gen_len': 8.5137}




  3%|▎         | 21/806 [35:21<22:53:25, 104.97s/it]

For epoch 181: {Learning rate: [0.005030800035241466]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.0055227286081900254, 'test_loss': 0.4944369297474623, 'bleu': 20.3036, 'gen_len': 8.2329}




  3%|▎         | 22/806 [37:03<22:42:59, 104.31s/it]

For epoch 182: {Learning rate: [0.00502434297239052]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.005212713847424489, 'test_loss': 0.4980650182813406, 'bleu': 19.1688, 'gen_len': 8.7466}




  3%|▎         | 23/806 [38:44<22:28:26, 103.33s/it]

For epoch 183: {Learning rate: [0.0050178859095395744]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.09batches/s]



Metrics: {'train_loss': 0.005666995194166326, 'test_loss': 0.5006027169525623, 'bleu': 20.2855, 'gen_len': 8.4247}




  3%|▎         | 24/806 [40:29<22:32:32, 103.78s/it]

For epoch 184: {Learning rate: [0.005011428846688629]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.005675866317554809, 'test_loss': 0.5055985566228628, 'bleu': 19.4246, 'gen_len': 8.4247}




  3%|▎         | 25/806 [42:13<22:31:22, 103.82s/it]

For epoch 185: {Learning rate: [0.005004971783837683]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.005456359433393539, 'test_loss': 0.4993519127368927, 'bleu': 22.2609, 'gen_len': 8.2466}




  3%|▎         | 26/806 [44:02<22:50:39, 105.43s/it]

For epoch 186: {Learning rate: [0.004998514720986737]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.0048840270870508596, 'test_loss': 0.5179280653595925, 'bleu': 22.1226, 'gen_len': 8.2671}




  3%|▎         | 27/806 [45:44<22:33:50, 104.28s/it]

For epoch 187: {Learning rate: [0.004992057658135791]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.0053730115669611955, 'test_loss': 0.5065708786249161, 'bleu': 18.8393, 'gen_len': 8.0274}




  3%|▎         | 28/806 [47:28<22:30:34, 104.16s/it]

For epoch 188: {Learning rate: [0.0049856005952848455]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.0054221507452894, 'test_loss': 0.5078316889703274, 'bleu': 16.4438, 'gen_len': 8.4589}




  4%|▎         | 29/806 [49:14<22:35:54, 104.70s/it]

For epoch 189: {Learning rate: [0.0049791435324339]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.005528074002696241, 'test_loss': 0.5005578123033047, 'bleu': 20.6339, 'gen_len': 8.3356}




  4%|▎         | 30/806 [50:58<22:32:58, 104.61s/it]

For epoch 190: {Learning rate: [0.004972686469582954]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.07batches/s]



Metrics: {'train_loss': 0.005441419393652567, 'test_loss': 0.5052222810685635, 'bleu': 19.442, 'gen_len': 8.2945}




  4%|▍         | 31/806 [52:44<22:35:29, 104.94s/it]

For epoch 191: {Learning rate: [0.004966229406732007]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.005123219450757549, 'test_loss': 0.5027111485600472, 'bleu': 21.249, 'gen_len': 8.3356}




  4%|▍         | 32/806 [54:26<22:24:36, 104.23s/it]

For epoch 192: {Learning rate: [0.004959772343881063]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.0050037090355855495, 'test_loss': 0.5070903472602367, 'bleu': 19.1169, 'gen_len': 8.3493}




  4%|▍         | 33/806 [56:10<22:21:35, 104.13s/it]

For epoch 193: {Learning rate: [0.0049533152810301165]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.88batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.005083557128960909, 'test_loss': 0.5109806425869465, 'bleu': 18.3033, 'gen_len': 8.5822}




  4%|▍         | 34/806 [57:55<22:22:04, 104.31s/it]

For epoch 194: {Learning rate: [0.004946858218179171]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.005330683215572386, 'test_loss': 0.49441706836223603, 'bleu': 21.1313, 'gen_len': 8.5479}




  4%|▍         | 35/806 [59:41<22:25:14, 104.69s/it]

For epoch 195: {Learning rate: [0.004940401155328225]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.004736465703367352, 'test_loss': 0.50141436830163, 'bleu': 20.8838, 'gen_len': 8.2466}




  4%|▍         | 36/806 [1:01:23<22:15:55, 104.10s/it]

For epoch 196: {Learning rate: [0.004933944092477279]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.004929813848476198, 'test_loss': 0.5028044946491719, 'bleu': 18.8075, 'gen_len': 8.363}




  5%|▍         | 37/806 [1:03:06<22:09:22, 103.72s/it]

For epoch 197: {Learning rate: [0.004927487029626334]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.09batches/s]



Metrics: {'train_loss': 0.004994985935077349, 'test_loss': 0.5058948241174221, 'bleu': 19.5582, 'gen_len': 8.2808}




  5%|▍         | 38/806 [1:04:50<22:06:14, 103.61s/it]

For epoch 198: {Learning rate: [0.0049210299667753876]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.005389007393404892, 'test_loss': 0.4977235995233059, 'bleu': 20.4174, 'gen_len': 8.3082}




  5%|▍         | 39/806 [1:06:34<22:06:10, 103.74s/it]

For epoch 199: {Learning rate: [0.004914572903924441]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.004599140845551496, 'test_loss': 0.5070281740278005, 'bleu': 20.5061, 'gen_len': 8.3425}




  5%|▍         | 40/806 [1:08:17<22:02:04, 103.56s/it]

For epoch 200: {Learning rate: [0.004908115841073496]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.00553193710291497, 'test_loss': 0.5125220574438571, 'bleu': 18.1867, 'gen_len': 8.2671}




  5%|▌         | 41/806 [1:10:00<21:58:23, 103.40s/it]

For epoch 201: {Learning rate: [0.00490165877822255]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.005115218184339649, 'test_loss': 0.5155674748122692, 'bleu': 18.8, 'gen_len': 8.5479}




  5%|▌         | 42/806 [1:11:41<21:49:56, 102.88s/it]

For epoch 202: {Learning rate: [0.004895201715371604]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.0052789100221317066, 'test_loss': 0.5089228812605142, 'bleu': 18.1834, 'gen_len': 8.4247}




  5%|▌         | 43/806 [1:13:21<21:35:54, 101.91s/it]

For epoch 203: {Learning rate: [0.004888744652520659]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.0052859673178761845, 'test_loss': 0.5149091128259897, 'bleu': 21.0148, 'gen_len': 8.3356}




  5%|▌         | 44/806 [1:15:02<21:30:33, 101.62s/it]

For epoch 204: {Learning rate: [0.004882287589669712]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.005074593304681646, 'test_loss': 0.5078780427575111, 'bleu': 20.5911, 'gen_len': 8.3493}




  6%|▌         | 45/806 [1:16:45<21:32:43, 101.92s/it]

For epoch 205: {Learning rate: [0.004875830526818767]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004833681661072726, 'test_loss': 0.5122073467820882, 'bleu': 18.6387, 'gen_len': 8.274}




  6%|▌         | 46/806 [1:18:28<21:36:22, 102.35s/it]

For epoch 206: {Learning rate: [0.004869373463967821]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.004480321309351783, 'test_loss': 0.508399095479399, 'bleu': 18.8938, 'gen_len': 8.5548}




  6%|▌         | 47/806 [1:20:14<21:48:24, 103.43s/it]

For epoch 207: {Learning rate: [0.004862916401116875]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.0044797042158656614, 'test_loss': 0.5267911486327648, 'bleu': 19.1386, 'gen_len': 8.2397}




  6%|▌         | 48/806 [1:21:58<21:49:16, 103.64s/it]

For epoch 208: {Learning rate: [0.004856459338265929]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.83batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.11batches/s]



Metrics: {'train_loss': 0.004758667293527851, 'test_loss': 0.512888240814209, 'bleu': 17.0595, 'gen_len': 8.3151}




  6%|▌         | 49/806 [1:23:45<21:59:22, 104.57s/it]

For epoch 209: {Learning rate: [0.004850002275414984]}


Train batch number 164: 100%|██████████| 164/164 [01:55<00:00,  1.43batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.0051424675846469545, 'test_loss': 0.5066626634448766, 'bleu': 20.0817, 'gen_len': 8.4315}




  6%|▌         | 50/806 [1:25:58<23:44:02, 113.02s/it]

For epoch 210: {Learning rate: [0.004843545212564038]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.0047386577213573465, 'test_loss': 0.5151666355319321, 'bleu': 20.0969, 'gen_len': 8.1644}




  6%|▋         | 51/806 [1:27:39<22:56:57, 109.43s/it]

For epoch 211: {Learning rate: [0.0048370881497130916]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.00batches/s]



Metrics: {'train_loss': 0.004827853326654493, 'test_loss': 0.515923871845007, 'bleu': 18.6935, 'gen_len': 8.4521}




  6%|▋         | 52/806 [1:29:21<22:29:51, 107.42s/it]

For epoch 212: {Learning rate: [0.004830631086862146]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004638872900480836, 'test_loss': 0.5154665499925614, 'bleu': 19.0498, 'gen_len': 8.2603}




  7%|▋         | 53/806 [1:31:05<22:12:44, 106.19s/it]

For epoch 213: {Learning rate: [0.004824174024011201]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.0044272166313731294, 'test_loss': 0.5125186301767826, 'bleu': 20.3378, 'gen_len': 8.274}




  7%|▋         | 54/806 [1:32:45<21:47:46, 104.34s/it]

For epoch 214: {Learning rate: [0.004817716961160255]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.004903883376089918, 'test_loss': 0.5057087190449238, 'bleu': 18.0738, 'gen_len': 8.4863}




  7%|▋         | 55/806 [1:34:27<21:37:03, 103.63s/it]

For epoch 215: {Learning rate: [0.004811259898309309]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.004456489690932269, 'test_loss': 0.509416963160038, 'bleu': 20.8378, 'gen_len': 8.3973}




  7%|▋         | 56/806 [1:36:08<21:27:23, 102.99s/it]

For epoch 216: {Learning rate: [0.004804802835458363]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.0045143720516778825, 'test_loss': 0.5094841528683901, 'bleu': 21.1472, 'gen_len': 8.3699}




  7%|▋         | 57/806 [1:37:50<21:22:02, 102.70s/it]

For epoch 217: {Learning rate: [0.004798345772607417]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.004586916138001337, 'test_loss': 0.5101900653913617, 'bleu': 21.6224, 'gen_len': 8.4315}




  7%|▋         | 58/806 [1:39:34<21:24:46, 103.06s/it]

For epoch 218: {Learning rate: [0.004791888709756472]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.004090922221510526, 'test_loss': 0.5164309404790401, 'bleu': 20.7946, 'gen_len': 8.1507}




  7%|▋         | 59/806 [1:41:18<21:26:03, 103.30s/it]

For epoch 219: {Learning rate: [0.004785431646905525]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.0041520192621757775, 'test_loss': 0.5184634130448103, 'bleu': 20.287, 'gen_len': 8.4452}




  7%|▋         | 60/806 [1:43:01<21:23:57, 103.27s/it]

For epoch 220: {Learning rate: [0.00477897458405458]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004376972162323754, 'test_loss': 0.5106577871367335, 'bleu': 18.6131, 'gen_len': 8.3425}




  8%|▊         | 61/806 [1:44:46<21:27:17, 103.67s/it]

For epoch 221: {Learning rate: [0.004772517521203634]}


Train batch number 164: 100%|██████████| 164/164 [01:42<00:00,  1.60batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.004498700950690119, 'test_loss': 0.5165519263595343, 'bleu': 20.1501, 'gen_len': 8.2466}




  8%|▊         | 62/806 [1:46:46<22:25:47, 108.53s/it]

For epoch 222: {Learning rate: [0.004766060458352688]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.11batches/s]



Metrics: {'train_loss': 0.004617030937391441, 'test_loss': 0.5123066952452063, 'bleu': 20.9907, 'gen_len': 8.4041}




  8%|▊         | 63/806 [1:48:27<21:57:21, 106.38s/it]

For epoch 223: {Learning rate: [0.004759603395501743]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.005225679006288396, 'test_loss': 0.5045823570340872, 'bleu': 20.1229, 'gen_len': 8.3562}




  8%|▊         | 64/806 [1:50:08<21:36:41, 104.85s/it]

For epoch 224: {Learning rate: [0.004753146332650796]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.004432270558559472, 'test_loss': 0.5123186014592648, 'bleu': 20.3613, 'gen_len': 8.3699}




  8%|▊         | 65/806 [1:51:49<21:20:17, 103.67s/it]

For epoch 225: {Learning rate: [0.00474668926979985]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.004955477039402467, 'test_loss': 0.5096656244248152, 'bleu': 19.8796, 'gen_len': 8.5548}




  8%|▊         | 66/806 [1:53:28<21:02:11, 102.34s/it]

For epoch 226: {Learning rate: [0.0047402322069489055]}


Train batch number 75:  45%|████▌     | 74/164 [00:40<00:49,  1.83batches/s]
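
The log above prints a `For epoch N` header and then a `Metrics: {...}` line for each epoch, so we can read the best epoch straight from the text. Below is a minimal sketch, assuming the log has been copied into a string. The helper names `parse_log` and `best_record` are hypothetical and are not part of `wolof_translate`. This is only an offline illustration of maximizing BLEU, not the trainer's own checkpoint-selection code. The sample lines are copied from epochs 219 to 222 above.

```python
import ast
import re

# Patterns for the two kinds of log lines printed once per epoch.
EPOCH_RE = re.compile(r"^For epoch (\d+): \{Learning rate: \[([^\]]+)\]\}")
METRICS_RE = re.compile(r"^Metrics: (\{.*\})$")


def parse_log(text):
    """Pair each 'For epoch N' header with the next 'Metrics: {...}' line."""
    records, pending = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = EPOCH_RE.match(line)
        if header:
            pending = {"epoch": int(header.group(1)), "lr": float(header.group(2))}
            continue
        metrics = METRICS_RE.match(line)
        if metrics and pending is not None:
            # The metrics are printed as a Python dict literal.
            pending.update(ast.literal_eval(metrics.group(1)))
            records.append(pending)
            pending = None
    return records


def best_record(records, metric="bleu", objective="maximize"):
    """Return the record with the highest (or lowest) value of `metric`."""
    pick = max if objective == "maximize" else min
    return pick(records, key=lambda r: r[metric])


sample = """
For epoch 219: {Learning rate: [0.004785431646905525]}
Metrics: {'train_loss': 0.0041520192621757775, 'test_loss': 0.5184634130448103, 'bleu': 20.287, 'gen_len': 8.4452}
For epoch 220: {Learning rate: [0.00477897458405458]}
Metrics: {'train_loss': 0.004376972162323754, 'test_loss': 0.5106577871367335, 'bleu': 18.6131, 'gen_len': 8.3425}
For epoch 221: {Learning rate: [0.004772517521203634]}
Metrics: {'train_loss': 0.004498700950690119, 'test_loss': 0.5165519263595343, 'bleu': 20.1501, 'gen_len': 8.2466}
For epoch 222: {Learning rate: [0.004766060458352688]}
Metrics: {'train_loss': 0.004617030937391441, 'test_loss': 0.5123066952452063, 'bleu': 20.9907, 'gen_len': 8.4041}
"""

records = parse_log(sample)
best = best_record(records)
print(best["epoch"], best["bleu"])  # 222 20.9907

# Difference in learning rate between consecutive epochs.
lr_steps = [a["lr"] - b["lr"] for a, b in zip(records, records[1:])]
```

On this excerpt every entry of `lr_steps` is about 6.457e-06, that is, the logged learning rate drops by the same amount each epoch, which is what a linear decay would produce. Calling `best_record(records, "test_loss", "minimize")` picks epoch 220 instead, so the best epoch depends on which metric is tracked.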

### ---

In [8]:
# Resume training for the remaining epochs, tracking BLEU (maximized) as the best-model metric
trainer.train(epochs=config['max_epoch'] - trainer.current_epoch, auto_save=True,
              metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory=config['new_model_dir'])



For epoch 226: {Learning rate: [0.0047402322069489055]}


Train batch number 164: 100%|██████████| 164/164 [01:13<00:00,  2.24batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:06<00:00,  1.44batches/s]



Metrics: {'train_loss': 0.004636473270538107, 'test_loss': 0.5110413627699018, 'bleu': 18.6669, 'gen_len': 8.4521}




  0%|          | 1/740 [01:29<18:17:17, 89.09s/it]

For epoch 227: {Learning rate: [0.004733775144097959]}


Train batch number 164: 100%|██████████| 164/164 [01:18<00:00,  2.09batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.11batches/s]



Metrics: {'train_loss': 0.004259910349657464, 'test_loss': 0.5154160179197789, 'bleu': 20.1116, 'gen_len': 8.3493}




  0%|          | 2/740 [03:04<19:04:42, 93.07s/it]

For epoch 228: {Learning rate: [0.004727318081247013]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.004388363023747111, 'test_loss': 0.509854313544929, 'bleu': 18.5472, 'gen_len': 8.6644}




  0%|          | 3/740 [04:42<19:26:48, 94.99s/it]

For epoch 229: {Learning rate: [0.0047208610183960675]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.05batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004456182204349472, 'test_loss': 0.5155967690050602, 'bleu': 18.1494, 'gen_len': 8.4726}




  1%|          | 4/740 [06:19<19:33:59, 95.71s/it]

For epoch 230: {Learning rate: [0.004714403955545122]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.39batches/s]



Metrics: {'train_loss': 0.004675029255339745, 'test_loss': 0.5173316411674023, 'bleu': 17.8126, 'gen_len': 8.3425}




  1%|          | 5/740 [07:56<19:41:58, 96.49s/it]

For epoch 231: {Learning rate: [0.004707946892694176]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.005317732499573073, 'test_loss': 0.514179231133312, 'bleu': 20.4516, 'gen_len': 8.226}




  1%|          | 6/740 [09:36<19:53:22, 97.55s/it]

For epoch 232: {Learning rate: [0.00470148982984323]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.00553970340204215, 'test_loss': 0.5148325452581048, 'bleu': 18.6592, 'gen_len': 8.226}




  1%|          | 7/740 [11:15<19:55:30, 97.86s/it]

For epoch 233: {Learning rate: [0.004695032766992284]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.00567059104810696, 'test_loss': 0.49535274687223135, 'bleu': 19.6019, 'gen_len': 8.2055}




  1%|          | 8/740 [12:54<19:59:48, 98.34s/it]

For epoch 234: {Learning rate: [0.0046885757041413385]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.004291198724009829, 'test_loss': 0.5007011219859123, 'bleu': 20.283, 'gen_len': 8.1849}




  1%|          | 9/740 [14:36<20:13:31, 99.61s/it]

For epoch 235: {Learning rate: [0.004682118641290393]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.004867776495524506, 'test_loss': 0.5059314666315913, 'bleu': 20.5336, 'gen_len': 8.3904}




  1%|▏         | 10/740 [16:15<20:07:58, 99.29s/it]

For epoch 236: {Learning rate: [0.004675661578439447]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.03s/batches]



Metrics: {'train_loss': 0.00468992440330955, 'test_loss': 0.5073605658486485, 'bleu': 20.5102, 'gen_len': 8.1507}




  1%|▏         | 11/740 [17:57<20:17:43, 100.22s/it]

For epoch 237: {Learning rate: [0.004669204515588501]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.13batches/s]



Metrics: {'train_loss': 0.005073355382614309, 'test_loss': 0.5032914789393544, 'bleu': 21.8271, 'gen_len': 8.3973}




  2%|▏         | 12/740 [19:41<20:28:10, 101.22s/it]

For epoch 238: {Learning rate: [0.004662747452737555]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.0046295468312941585, 'test_loss': 0.5053722795564681, 'bleu': 18.6253, 'gen_len': 8.3699}




  2%|▏         | 13/740 [21:23<20:28:41, 101.40s/it]

For epoch 239: {Learning rate: [0.0046562903898866095]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.004392618010059799, 'test_loss': 0.5059478639625012, 'bleu': 19.4746, 'gen_len': 8.4041}




  2%|▏         | 14/740 [23:05<20:31:25, 101.77s/it]

For epoch 240: {Learning rate: [0.004649833327035664]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.004462125738649686, 'test_loss': 0.5040244970470666, 'bleu': 19.8445, 'gen_len': 8.274}




  2%|▏         | 15/740 [24:44<20:18:54, 100.87s/it]

For epoch 241: {Learning rate: [0.004643376264184718]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.004340763307437755, 'test_loss': 0.5094567119143903, 'bleu': 20.5114, 'gen_len': 8.5685}




  2%|▏         | 16/740 [26:23<20:11:15, 100.38s/it]

For epoch 242: {Learning rate: [0.0046369192013337715]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.004203399213576809, 'test_loss': 0.5071454612538219, 'bleu': 17.7419, 'gen_len': 8.6301}




  2%|▏         | 17/740 [28:01<20:01:08, 99.68s/it] 

For epoch 243: {Learning rate: [0.004630462138482826]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.00427703561253133, 'test_loss': 0.5114874558523297, 'bleu': 18.8334, 'gen_len': 8.4658}




  2%|▏         | 18/740 [29:43<20:07:36, 100.36s/it]

For epoch 244: {Learning rate: [0.004624005075631881]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.06s/batches]



Metrics: {'train_loss': 0.004071399399679915, 'test_loss': 0.5159735614433885, 'bleu': 19.9821, 'gen_len': 8.2603}




  3%|▎         | 19/740 [31:27<20:17:01, 101.28s/it]

For epoch 245: {Learning rate: [0.004617548012780934]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.00435796386728925, 'test_loss': 0.5176012389361858, 'bleu': 20.3759, 'gen_len': 8.4247}




  3%|▎         | 20/740 [33:07<20:10:51, 100.91s/it]

For epoch 246: {Learning rate: [0.004611090949929989]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.004892166837701707, 'test_loss': 0.5153631318360568, 'bleu': 18.9777, 'gen_len': 8.2192}




  3%|▎         | 21/740 [34:47<20:08:14, 100.83s/it]

For epoch 247: {Learning rate: [0.0046046338870790425]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.0052451077981237266, 'test_loss': 0.5025236491113901, 'bleu': 19.7055, 'gen_len': 8.4521}




  3%|▎         | 22/740 [36:30<20:11:53, 101.27s/it]

For epoch 248: {Learning rate: [0.004598176824228097]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.005015854636040764, 'test_loss': 0.5034972801804543, 'bleu': 21.1948, 'gen_len': 8.4384}




  3%|▎         | 23/740 [38:09<20:02:46, 100.65s/it]

For epoch 249: {Learning rate: [0.004591719761377152]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.004365456990410344, 'test_loss': 0.5179946945980192, 'bleu': 19.8187, 'gen_len': 8.2945}




  3%|▎         | 24/740 [39:49<20:00:39, 100.61s/it]

For epoch 250: {Learning rate: [0.004585262698526205]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004224155758448632, 'test_loss': 0.5160039108246565, 'bleu': 18.7991, 'gen_len': 8.6986}




  3%|▎         | 25/740 [41:29<19:54:26, 100.23s/it]

For epoch 251: {Learning rate: [0.00457880563567526]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.004190283078503675, 'test_loss': 0.5173523366451264, 'bleu': 19.0365, 'gen_len': 8.4315}




  4%|▎         | 26/740 [43:08<19:48:30, 99.87s/it] 

For epoch 252: {Learning rate: [0.004572348572824314]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.004610613232039103, 'test_loss': 0.5164807893335819, 'bleu': 19.9685, 'gen_len': 8.4315}




  4%|▎         | 27/740 [44:47<19:43:53, 99.63s/it]

For epoch 253: {Learning rate: [0.004565891509973368]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.004282217572037508, 'test_loss': 0.5117859918624162, 'bleu': 21.69, 'gen_len': 8.4932}




  4%|▍         | 28/740 [46:25<19:38:00, 99.27s/it]

For epoch 254: {Learning rate: [0.004559434447122423]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.0045464247473651865, 'test_loss': 0.5169719748198986, 'bleu': 21.9902, 'gen_len': 8.3288}




  4%|▍         | 29/740 [48:06<19:42:55, 99.82s/it]

For epoch 255: {Learning rate: [0.004552977384271476]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.24batches/s]



Metrics: {'train_loss': 0.0040146432637834, 'test_loss': 0.5167984038591384, 'bleu': 20.1605, 'gen_len': 8.5}




  4%|▍         | 30/740 [49:48<19:46:30, 100.27s/it]

For epoch 256: {Learning rate: [0.004546520321420531]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.003993038773309885, 'test_loss': 0.5178216088563203, 'bleu': 20.0817, 'gen_len': 8.4384}




  4%|▍         | 31/740 [51:27<19:43:02, 100.12s/it]

For epoch 257: {Learning rate: [0.0045400632585695854]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.004692118435231372, 'test_loss': 0.5165381710976362, 'bleu': 19.6102, 'gen_len': 8.5753}




  4%|▍         | 32/740 [53:10<19:49:06, 100.77s/it]

For epoch 258: {Learning rate: [0.004533606195718639]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.20batches/s]



Metrics: {'train_loss': 0.004944252533122625, 'test_loss': 0.5141812849789857, 'bleu': 18.8476, 'gen_len': 8.1712}




  4%|▍         | 33/740 [54:52<19:51:41, 101.13s/it]

For epoch 259: {Learning rate: [0.004527149132867693]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.005020899487490391, 'test_loss': 0.5104330956935883, 'bleu': 21.3407, 'gen_len': 8.5342}




  5%|▍         | 34/740 [56:34<19:55:04, 101.56s/it]

For epoch 260: {Learning rate: [0.004520692070016747]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.003968455643633028, 'test_loss': 0.5124306730926037, 'bleu': 21.7456, 'gen_len': 8.4658}




  5%|▍         | 35/740 [58:15<19:49:00, 101.19s/it]

For epoch 261: {Learning rate: [0.004514235007165802]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.0036968619081052586, 'test_loss': 0.518300766311586, 'bleu': 19.8236, 'gen_len': 8.5479}




  5%|▍         | 36/740 [59:55<19:46:31, 101.12s/it]

For epoch 262: {Learning rate: [0.004507777944314856]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.01s/batches]



Metrics: {'train_loss': 0.004037758247937834, 'test_loss': 0.5132275857031345, 'bleu': 20.7114, 'gen_len': 8.3904}




  5%|▌         | 37/740 [1:01:41<20:01:02, 102.51s/it]

For epoch 263: {Learning rate: [0.00450132088146391]}


Train batch number 164: 100%|██████████| 164/164 [01:41<00:00,  1.61batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.003981778380143734, 'test_loss': 0.5104985043406487, 'bleu': 19.7126, 'gen_len': 8.4384}




  5%|▌         | 38/740 [1:03:40<20:55:41, 107.32s/it]

For epoch 264: {Learning rate: [0.004494863818612964]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.11batches/s]



Metrics: {'train_loss': 0.0038132950689387366, 'test_loss': 0.5150652872398496, 'bleu': 19.8052, 'gen_len': 8.363}




  5%|▌         | 39/740 [1:05:22<20:34:33, 105.67s/it]

For epoch 265: {Learning rate: [0.004488406755762018]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.004112998998030756, 'test_loss': 0.5169057577848435, 'bleu': 18.9854, 'gen_len': 8.2877}




  5%|▌         | 40/740 [1:07:05<20:26:36, 105.14s/it]

For epoch 266: {Learning rate: [0.004481949692911073]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.0039109926237426585, 'test_loss': 0.5178463123738766, 'bleu': 18.9957, 'gen_len': 8.4589}




  6%|▌         | 41/740 [1:08:49<20:19:59, 104.72s/it]

For epoch 267: {Learning rate: [0.004475492630060127]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.004319902206833187, 'test_loss': 0.5108829289674759, 'bleu': 20.3117, 'gen_len': 8.3767}




  6%|▌         | 42/740 [1:10:32<20:11:48, 104.17s/it]

For epoch 268: {Learning rate: [0.00446903556720918]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.004329862028609524, 'test_loss': 0.5170114908367396, 'bleu': 22.1189, 'gen_len': 8.411}




  6%|▌         | 43/740 [1:12:15<20:05:12, 103.75s/it]

For epoch 269: {Learning rate: [0.004462578504358236]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.94batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.004379981521374358, 'test_loss': 0.5075180444866418, 'bleu': 19.7759, 'gen_len': 8.2055}




  6%|▌         | 44/740 [1:13:56<19:52:49, 102.83s/it]

For epoch 270: {Learning rate: [0.0044561214415072894]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.02batches/s]



Metrics: {'train_loss': 0.0042694731160825145, 'test_loss': 0.5069333657622337, 'bleu': 20.504, 'gen_len': 8.3425}




  6%|▌         | 45/740 [1:15:38<19:49:57, 102.73s/it]

For epoch 271: {Learning rate: [0.004449664378656343]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.005093298157456646, 'test_loss': 0.5084239132702351, 'bleu': 19.7716, 'gen_len': 8.5137}




  6%|▌         | 46/740 [1:17:21<19:48:39, 102.77s/it]

For epoch 272: {Learning rate: [0.004443207315805398]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.004445826456193151, 'test_loss': 0.5104258626699447, 'bleu': 20.7945, 'gen_len': 8.3973}




  6%|▋         | 47/740 [1:19:04<19:47:00, 102.77s/it]

For epoch 273: {Learning rate: [0.004436750252954452]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.003954786413908927, 'test_loss': 0.5147350586950779, 'bleu': 20.377, 'gen_len': 8.3493}




  6%|▋         | 48/740 [1:20:45<19:41:34, 102.45s/it]

For epoch 274: {Learning rate: [0.004430293190103507]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.003985613820587765, 'test_loss': 0.518757251650095, 'bleu': 19.6239, 'gen_len': 8.6164}




  7%|▋         | 49/740 [1:22:29<19:42:16, 102.66s/it]

For epoch 275: {Learning rate: [0.0044238361272525605]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.17batches/s]



Metrics: {'train_loss': 0.003924125792409319, 'test_loss': 0.515002466365695, 'bleu': 19.6777, 'gen_len': 8.2534}




  7%|▋         | 50/740 [1:24:11<19:41:24, 102.73s/it]

For epoch 276: {Learning rate: [0.004417379064401614]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.93batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.004168284569459502, 'test_loss': 0.49363266164436936, 'bleu': 20.3807, 'gen_len': 8.5479}




  7%|▋         | 51/740 [1:25:53<19:37:12, 102.51s/it]

For epoch 277: {Learning rate: [0.004410922001550669]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.004049299501674817, 'test_loss': 0.5128565421327949, 'bleu': 20.7666, 'gen_len': 8.4932}




  7%|▋         | 52/740 [1:27:36<19:36:44, 102.62s/it]

For epoch 278: {Learning rate: [0.004404464938699723]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.004302464909193166, 'test_loss': 0.5153625551611185, 'bleu': 20.2439, 'gen_len': 8.4315}




  7%|▋         | 53/740 [1:29:17<19:28:04, 102.02s/it]

For epoch 279: {Learning rate: [0.004398007875848777]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.97batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.003827815932332092, 'test_loss': 0.520525373518467, 'bleu': 19.4817, 'gen_len': 8.5616}




  7%|▋         | 54/740 [1:30:56<19:17:40, 101.25s/it]

For epoch 280: {Learning rate: [0.0043915508129978315]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.19batches/s]



Metrics: {'train_loss': 0.0041041982405170905, 'test_loss': 0.5140411946922541, 'bleu': 21.3246, 'gen_len': 8.3904}




  7%|▋         | 55/740 [1:32:40<19:24:46, 102.02s/it]

For epoch 281: {Learning rate: [0.004385093750146885]}


Train batch number 164: 100%|██████████| 164/164 [01:29<00:00,  1.84batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.004655132280654137, 'test_loss': 0.5254070997238159, 'bleu': 21.7795, 'gen_len': 8.2123}




  8%|▊         | 56/740 [1:34:27<19:38:18, 103.36s/it]

For epoch 282: {Learning rate: [0.00437863668729594]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.07batches/s]



Metrics: {'train_loss': 0.004058777726707531, 'test_loss': 0.5064813259989023, 'bleu': 20.3201, 'gen_len': 8.2055}




  8%|▊         | 57/740 [1:36:11<19:40:29, 103.70s/it]

For epoch 283: {Learning rate: [0.004372179624444994]}


Train batch number 164: 100%|██████████| 164/164 [01:30<00:00,  1.82batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.004267184752701191, 'test_loss': 0.5171544626355171, 'bleu': 21.7708, 'gen_len': 8.411}




  8%|▊         | 58/740 [1:37:58<19:50:27, 104.73s/it]

For epoch 284: {Learning rate: [0.004365722561594048]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.0036119659863044678, 'test_loss': 0.5148059546947479, 'bleu': 19.0975, 'gen_len': 8.3699}




  8%|▊         | 59/740 [1:39:44<19:51:37, 104.99s/it]

For epoch 285: {Learning rate: [0.004359265498743102]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.12batches/s]



Metrics: {'train_loss': 0.0037564248915902968, 'test_loss': 0.5146012604236603, 'bleu': 19.9334, 'gen_len': 8.3288}




  8%|▊         | 60/740 [1:41:28<19:47:50, 104.81s/it]

For epoch 286: {Learning rate: [0.004352808435892157]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.12batches/s]



Metrics: {'train_loss': 0.0039840030760121695, 'test_loss': 0.5159700144082308, 'bleu': 21.4062, 'gen_len': 8.2945}




  8%|▊         | 61/740 [1:43:15<19:52:20, 105.36s/it]

For epoch 287: {Learning rate: [0.004346351373041211]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.0034843335584677023, 'test_loss': 0.5205450098961591, 'bleu': 20.8878, 'gen_len': 8.1849}




  8%|▊         | 62/740 [1:45:00<19:49:02, 105.23s/it]

For epoch 288: {Learning rate: [0.0043398943101902645]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.87batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.0038936676651135757, 'test_loss': 0.519446475058794, 'bleu': 21.6468, 'gen_len': 8.4315}




  9%|▊         | 63/740 [1:46:45<19:47:13, 105.22s/it]

For epoch 289: {Learning rate: [0.004333437247339319]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:09<00:00,  1.10batches/s]



Metrics: {'train_loss': 0.003918770625404715, 'test_loss': 0.526067103818059, 'bleu': 20.5684, 'gen_len': 8.4247}




  9%|▊         | 64/740 [1:48:32<19:50:47, 105.69s/it]

For epoch 290: {Learning rate: [0.004326980184488374]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.0035847382015362665, 'test_loss': 0.517977224290371, 'bleu': 20.262, 'gen_len': 8.3973}




  9%|▉         | 65/740 [1:50:12<19:31:38, 104.15s/it]

For epoch 291: {Learning rate: [0.004320523121637427]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.27batches/s]



Metrics: {'train_loss': 0.003547232667099196, 'test_loss': 0.5255750045180321, 'bleu': 20.3319, 'gen_len': 8.3288}




  9%|▉         | 66/740 [1:51:51<19:11:42, 102.53s/it]

For epoch 292: {Learning rate: [0.004314066058786482]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.0036850100678726093, 'test_loss': 0.5160747349262238, 'bleu': 19.7794, 'gen_len': 8.2055}




  9%|▉         | 67/740 [1:53:29<18:54:39, 101.16s/it]

For epoch 293: {Learning rate: [0.0043076089959355355]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.90batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:10<00:00,  1.03s/batches]



Metrics: {'train_loss': 0.003855825959739954, 'test_loss': 0.526866103336215, 'bleu': 20.9059, 'gen_len': 8.1438}




  9%|▉         | 68/740 [1:55:14<19:06:33, 102.37s/it]

For epoch 294: {Learning rate: [0.00430115193308459]}


Train batch number 164: 100%|██████████| 164/164 [01:28<00:00,  1.85batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.003888430073280555, 'test_loss': 0.5248782236129046, 'bleu': 20.8311, 'gen_len': 8.5753}




  9%|▉         | 69/740 [1:57:00<19:14:48, 103.26s/it]

For epoch 295: {Learning rate: [0.004294694870233645]}


Train batch number 164: 100%|██████████| 164/164 [01:27<00:00,  1.86batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.0040104282550423595, 'test_loss': 0.5198617216199637, 'bleu': 20.6293, 'gen_len': 8.226}




  9%|▉         | 70/740 [1:58:44<19:17:52, 103.69s/it]

For epoch 296: {Learning rate: [0.004288237807382698]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.005002808353399415, 'test_loss': 0.5161661267280578, 'bleu': 20.7152, 'gen_len': 8.4521}




 10%|▉         | 71/740 [2:00:23<18:58:04, 102.07s/it]

For epoch 297: {Learning rate: [0.004281780744531753]}


Train batch number 164: 100%|██████████| 164/164 [01:23<00:00,  1.96batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.004055806254544267, 'test_loss': 0.5268006380647421, 'bleu': 19.3999, 'gen_len': 8.4041}




 10%|▉         | 72/740 [2:02:03<18:49:53, 101.49s/it]

For epoch 298: {Learning rate: [0.0042753236816808066]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.0037685484725576627, 'test_loss': 0.5253100227564573, 'bleu': 21.1701, 'gen_len': 8.4178}




 10%|▉         | 73/740 [2:03:42<18:39:34, 100.71s/it]

For epoch 299: {Learning rate: [0.004268866618829861]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.0035503182060471396, 'test_loss': 0.5286560654640198, 'bleu': 20.3459, 'gen_len': 8.5616}




 10%|█         | 74/740 [2:05:21<18:32:47, 100.25s/it]

For epoch 300: {Learning rate: [0.004262409555978916]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.003748083083115578, 'test_loss': 0.5187773428857326, 'bleu': 20.2527, 'gen_len': 8.3014}




 10%|█         | 75/740 [2:06:59<18:25:34, 99.75s/it] 

For epoch 301: {Learning rate: [0.004255952493127969]}


Train batch number 164: 100%|██████████| 164/164 [01:24<00:00,  1.95batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.0042410878721101455, 'test_loss': 0.512333894520998, 'bleu': 18.8964, 'gen_len': 8.6027}




 10%|█         | 76/740 [2:08:40<18:27:05, 100.04s/it]

For epoch 302: {Learning rate: [0.004249495430277023]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.0035409554755283243, 'test_loss': 0.519505849853158, 'bleu': 20.4046, 'gen_len': 8.3973}




 10%|█         | 77/740 [2:10:19<18:23:06, 99.83s/it] 

For epoch 303: {Learning rate: [0.004243038367426078]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.91batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.003499352572827952, 'test_loss': 0.5320948760956525, 'bleu': 21.9202, 'gen_len': 8.4315}




 11%|█         | 78/740 [2:12:01<18:28:33, 100.47s/it]

For epoch 304: {Learning rate: [0.004236581304575132]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:11<00:00,  1.12s/batches]



Metrics: {'train_loss': 0.0037803432822817364, 'test_loss': 0.5430816255509854, 'bleu': 22.5757, 'gen_len': 8.3767}




 11%|█         | 79/740 [2:13:51<18:56:22, 103.15s/it]

For epoch 305: {Learning rate: [0.004230124241724186]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.18batches/s]



Metrics: {'train_loss': 0.00370123155512485, 'test_loss': 0.5276652447879314, 'bleu': 20.7038, 'gen_len': 8.7534}




 11%|█         | 80/740 [2:15:35<18:57:06, 103.37s/it]

For epoch 306: {Learning rate: [0.00422366717887324]}


Train batch number 164: 100%|██████████| 164/164 [01:25<00:00,  1.92batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.15batches/s]



Metrics: {'train_loss': 0.003732516755118878, 'test_loss': 0.5235530991107226, 'bleu': 21.9735, 'gen_len': 8.4452}




 11%|█         | 81/740 [2:17:18<18:53:30, 103.20s/it]

For epoch 307: {Learning rate: [0.004217210116022295]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.0034539594053945265, 'test_loss': 0.5278616487979889, 'bleu': 21.4448, 'gen_len': 8.5753}




 11%|█         | 82/740 [2:18:56<18:37:16, 101.88s/it]

For epoch 308: {Learning rate: [0.004210753053171349]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.16batches/s]



Metrics: {'train_loss': 0.0037463223296338296, 'test_loss': 0.5279822053387762, 'bleu': 20.8886, 'gen_len': 8.5959}




 11%|█         | 83/740 [2:20:35<18:25:10, 100.93s/it]

For epoch 309: {Learning rate: [0.004204295990320403]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.0036411891744093025, 'test_loss': 0.5285628400743008, 'bleu': 20.6956, 'gen_len': 8.4521}




 11%|█▏        | 84/740 [2:22:12<18:11:47, 99.86s/it] 

For epoch 310: {Learning rate: [0.004197838927469457]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.0044542004030606736, 'test_loss': 0.5200766762718558, 'bleu': 22.6016, 'gen_len': 8.3699}




 11%|█▏        | 85/740 [2:23:58<18:28:14, 101.52s/it]

For epoch 311: {Learning rate: [0.0041913818646185114]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.0038337310580107264, 'test_loss': 0.525522681325674, 'bleu': 21.9155, 'gen_len': 8.3219}




 12%|█▏        | 86/740 [2:25:34<18:09:51, 99.99s/it] 

For epoch 312: {Learning rate: [0.004184924801767566]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.12batches/s]



Metrics: {'train_loss': 0.003733142180241951, 'test_loss': 0.5353856109082699, 'bleu': 21.9384, 'gen_len': 8.4932}




 12%|█▏        | 87/740 [2:27:14<18:05:57, 99.78s/it]

For epoch 313: {Learning rate: [0.00417846773891662]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.003945621665790407, 'test_loss': 0.5367072440683842, 'bleu': 22.1913, 'gen_len': 8.3493}




 12%|█▏        | 88/740 [2:28:51<17:55:27, 98.97s/it]

For epoch 314: {Learning rate: [0.004172010676065674]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.003652039954054422, 'test_loss': 0.5331553153693676, 'bleu': 22.6801, 'gen_len': 8.4589}




 12%|█▏        | 89/740 [2:30:36<18:13:32, 100.79s/it]

For epoch 315: {Learning rate: [0.004165553613214728]}


Train batch number 164: 100%|██████████| 164/164 [01:19<00:00,  2.06batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.003742310359994895, 'test_loss': 0.5296663172543049, 'bleu': 21.1005, 'gen_len': 8.4726}




 12%|█▏        | 90/740 [2:32:11<17:55:34, 99.28s/it] 

For epoch 316: {Learning rate: [0.0041590965503637825]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.21batches/s]



Metrics: {'train_loss': 0.0035225246991148, 'test_loss': 0.52937694452703, 'bleu': 20.8361, 'gen_len': 8.2603}




 12%|█▏        | 91/740 [2:33:50<17:50:20, 98.95s/it]

For epoch 317: {Learning rate: [0.004152639487512837]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.004186448330606587, 'test_loss': 0.5286027554422617, 'bleu': 21.8422, 'gen_len': 8.5}




 12%|█▏        | 92/740 [2:35:26<17:40:41, 98.21s/it]

For epoch 318: {Learning rate: [0.004146182424661891]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.25batches/s]



Metrics: {'train_loss': 0.004048846665815601, 'test_loss': 0.522997535765171, 'bleu': 20.4726, 'gen_len': 8.3836}




 13%|█▎        | 93/740 [2:37:03<17:35:41, 97.90s/it]

For epoch 319: {Learning rate: [0.004139725361810944]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  2.00batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.003658134728184228, 'test_loss': 0.5241936065256596, 'bleu': 22.2768, 'gen_len': 8.5068}




 13%|█▎        | 94/740 [2:38:42<17:35:26, 98.03s/it]

For epoch 320: {Learning rate: [0.004133268298959999]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.29batches/s]



Metrics: {'train_loss': 0.003951084586288943, 'test_loss': 0.5380347955971956, 'bleu': 20.3304, 'gen_len': 8.3288}




 13%|█▎        | 95/740 [2:40:19<17:30:37, 97.73s/it]

For epoch 321: {Learning rate: [0.0041268112361090535]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.0035866171884419333, 'test_loss': 0.5269684512168169, 'bleu': 22.1395, 'gen_len': 8.5616}




 13%|█▎        | 96/740 [2:41:56<17:28:11, 97.66s/it]

For epoch 322: {Learning rate: [0.004120354173258107]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.0039363095740732315, 'test_loss': 0.5310158412903547, 'bleu': 22.1337, 'gen_len': 8.6027}




 13%|█▎        | 97/740 [2:43:33<17:23:31, 97.37s/it]

For epoch 323: {Learning rate: [0.004113897110407162]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.0036402031695527003, 'test_loss': 0.5340776104480028, 'bleu': 21.4437, 'gen_len': 8.7603}




 13%|█▎        | 98/740 [2:45:11<17:23:11, 97.49s/it]

For epoch 324: {Learning rate: [0.0041074400475562154]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.003508229599058878, 'test_loss': 0.5461837727576494, 'bleu': 21.5855, 'gen_len': 8.5}




 13%|█▎        | 99/740 [2:46:47<17:19:08, 97.27s/it]

For epoch 325: {Learning rate: [0.00410098298470527]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.98batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.36batches/s]



Metrics: {'train_loss': 0.003518652263975549, 'test_loss': 0.5436039976775646, 'bleu': 22.7724, 'gen_len': 8.226}




 14%|█▎        | 100/740 [2:48:33<17:45:14, 99.87s/it]

For epoch 326: {Learning rate: [0.0040945259218543245]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.003443346852624772, 'test_loss': 0.543676009774208, 'bleu': 22.1315, 'gen_len': 8.4247}




 14%|█▎        | 101/740 [2:50:10<17:33:54, 98.96s/it]

For epoch 327: {Learning rate: [0.004088068859003378]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.02batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.0033101946041048306, 'test_loss': 0.5444538563489913, 'bleu': 21.7119, 'gen_len': 8.3973}




 14%|█▍        | 102/740 [2:51:47<17:26:43, 98.44s/it]

For epoch 328: {Learning rate: [0.004081611796152432]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.0033298508741853123, 'test_loss': 0.54693255238235, 'bleu': 20.9818, 'gen_len': 8.4795}




 14%|█▍        | 103/740 [2:53:24<17:18:09, 97.79s/it]

For epoch 329: {Learning rate: [0.004075154733301487]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.0035109608659517908, 'test_loss': 0.5389573119580746, 'bleu': 22.2287, 'gen_len': 8.637}




 14%|█▍        | 104/740 [2:55:01<17:15:19, 97.67s/it]

For epoch 330: {Learning rate: [0.004068697670450541]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.30batches/s]



Metrics: {'train_loss': 0.0040510695305384595, 'test_loss': 0.5306688280776143, 'bleu': 22.1462, 'gen_len': 8.6233}




 14%|█▍        | 105/740 [2:56:37<17:09:43, 97.30s/it]

For epoch 331: {Learning rate: [0.004062240607599595]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.0038850337068430324, 'test_loss': 0.5357846166938544, 'bleu': 22.4493, 'gen_len': 8.6301}




 14%|█▍        | 106/740 [2:58:15<17:09:58, 97.47s/it]

For epoch 332: {Learning rate: [0.004055783544748649]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.32batches/s]



Metrics: {'train_loss': 0.003886371219911181, 'test_loss': 0.5430528089404106, 'bleu': 22.7915, 'gen_len': 8.4932}




 14%|█▍        | 107/740 [3:00:00<17:32:11, 99.73s/it]

For epoch 333: {Learning rate: [0.004049326481897704]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.05batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.0038661203497013493, 'test_loss': 0.5315558376489207, 'bleu': 23.5856, 'gen_len': 8.3082}




 15%|█▍        | 108/740 [3:01:44<17:43:50, 101.00s/it]

For epoch 334: {Learning rate: [0.004042869419046758]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.04batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.004067715200218982, 'test_loss': 0.5324797607026994, 'bleu': 23.5096, 'gen_len': 8.4178}




 15%|█▍        | 109/740 [3:03:21<17:27:54, 99.64s/it] 

For epoch 335: {Learning rate: [0.004036412356195812]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.34batches/s]



Metrics: {'train_loss': 0.0040358641061885, 'test_loss': 0.5229344635736197, 'bleu': 23.6281, 'gen_len': 8.4932}




 15%|█▍        | 110/740 [3:05:05<17:41:52, 101.13s/it]

For epoch 336: {Learning rate: [0.004029955293344866]}


Train batch number 164: 100%|██████████| 164/164 [01:20<00:00,  2.03batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.28batches/s]



Metrics: {'train_loss': 0.003880466116597967, 'test_loss': 0.5323702247813344, 'bleu': 21.8696, 'gen_len': 8.5411}




 15%|█▌        | 111/740 [3:06:42<17:27:29, 99.92s/it] 

For epoch 337: {Learning rate: [0.00402349823049392]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.33batches/s]



Metrics: {'train_loss': 0.0038666382302702603, 'test_loss': 0.5328129097819329, 'bleu': 21.4526, 'gen_len': 8.4315}




 15%|█▌        | 112/740 [3:08:20<17:18:28, 99.22s/it]

For epoch 338: {Learning rate: [0.004017041167642975]}


Train batch number 164: 100%|██████████| 164/164 [01:21<00:00,  2.01batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:07<00:00,  1.31batches/s]



Metrics: {'train_loss': 0.0037509203816958555, 'test_loss': 0.5299997904337943, 'bleu': 22.5401, 'gen_len': 8.274}




 15%|█▌        | 113/740 [3:09:58<17:12:58, 98.85s/it]

For epoch 339: {Learning rate: [0.0040105841047920286]}


Train batch number 164: 100%|██████████| 164/164 [01:26<00:00,  1.89batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.14batches/s]



Metrics: {'train_loss': 0.004071245895341612, 'test_loss': 0.547951515391469, 'bleu': 22.1694, 'gen_len': 8.3288}




 15%|█▌        | 114/740 [3:11:42<17:27:10, 100.37s/it]

For epoch 340: {Learning rate: [0.004004127041941083]}


Train batch number 164: 100%|██████████| 164/164 [01:22<00:00,  1.99batches/s]
Test batch number 10: 100%|██████████| 10/10 [00:08<00:00,  1.22batches/s]



Metrics: {'train_loss': 0.0036326103979773487, 'test_loss': 0.5440607860684394, 'bleu': 20.9407, 'gen_len': 8.2945}
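
The per-epoch BLEU fluctuates by a point or two from one epoch to the next, so the best epoch is easier to find programmatically than by eye. The following is a minimal sketch, not part of the original training code: it assumes the log is available as plain text in the format printed above, and `parse_metrics` and `log_text` are hypothetical names introduced for illustration. The excerpt embedded here copies epochs 333 to 335 verbatim.

```python
import ast
import re

# Three epochs copied verbatim from the log above (hypothetical variable name).
log_text = """
For epoch 333: {Learning rate: [0.004049326481897704]}
Metrics: {'train_loss': 0.0038661203497013493, 'test_loss': 0.5315558376489207, 'bleu': 23.5856, 'gen_len': 8.3082}
For epoch 334: {Learning rate: [0.004042869419046758]}
Metrics: {'train_loss': 0.004067715200218982, 'test_loss': 0.5324797607026994, 'bleu': 23.5096, 'gen_len': 8.4178}
For epoch 335: {Learning rate: [0.004036412356195812]}
Metrics: {'train_loss': 0.0040358641061885, 'test_loss': 0.5229344635736197, 'bleu': 23.6281, 'gen_len': 8.4932}
"""


def parse_metrics(text):
    """Pair each 'For epoch N:' line with the 'Metrics: {...}' line that follows it."""
    records = []
    epoch = None
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"For epoch (\d+):", line)
        if match:
            epoch = int(match.group(1))
        elif line.startswith("Metrics:") and epoch is not None:
            # The metrics payload is a Python dict literal, so literal_eval parses it safely.
            metrics = ast.literal_eval(line[len("Metrics:"):].strip())
            records.append({"epoch": epoch, **metrics})
            epoch = None
    return records


records = parse_metrics(log_text)
best = max(records, key=lambda r: r["bleu"])
print(best["epoch"], best["bleu"])
```

Applied to the full excerpt for epochs 300 to 340, the same selection picks epoch 335 (BLEU 23.6281), which is the highest value logged in that range.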




### Predictions and Evaluation

In [8]:
# load the held-out test set
test_dataset = SentenceDataset("data/extractions/new_data/test_set.csv",
                               tokenizer,
                               truncation=True)

Let us evaluate the model on the test set and print the predicted sentences.

In [9]:
# evaluate on the test set; element 1 of the result is the DataFrame of predictions used below
df_ft_to_wf = trainer.evaluate(test_dataset)

Evaluation batch number 11: 100%|██████████| 11/11 [00:10<00:00,  1.07batches/s]


In [10]:
df_ft_to_wf[1].tail(10)

Unnamed: 0,original_sentences,translations,predictions
152,"Te voila, tu as été","Yaa ŋgi, dem ŋga","Samba, yaw, yaa dem"
153,Il veut que tu viennes,Bëgg na ŋga dem,Bëgg na gëléem.
154,Les travailleurs c'est toi et moi.,Liggéeykat yi man ag yaw la.,Moo di jaŋgkat bi.
155,Où?,Foofee fan?,Ana ŋga?
156,"Te voilà, le voilà","Yaa ŋgi, mi ŋgi",Yaa ŋgoogu rekk!
157,Tu as vu celui-ci?,Gis ŋga kooku?,Samba?
158,J'ai été jusqu'à lui.,Dem naa ba ci moom.,Maa demoon
159,Il parle de vous?,Yéen ñan la wax?,Yéen ñan la wax?
160,C'était son hôte habituellement.,Moo doon ganam.,Mu doon Lebu Yoff.
161,Dis à la personne qu'elle vienne,Nil waa ji na ñëw,Tay ci ŋgoon.
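
As a rough complement to BLEU, one can count how many predictions reproduce the reference translation exactly. The sketch below is an illustration only, not the notebook's evaluation code: `exact_match_rate` is a hypothetical helper, and the two lists copy the `translations` and `predictions` columns of rows 152 to 161 shown above.

```python
def exact_match_rate(references, predictions):
    """Fraction of predictions identical to their reference, ignoring surrounding whitespace."""
    pairs = list(zip(references, predictions))
    if not pairs:
        return 0.0
    hits = sum(ref.strip() == pred.strip() for ref, pred in pairs)
    return hits / len(pairs)


# Rows 152-161 of the table above, copied verbatim.
references = [
    "Yaa ŋgi, dem ŋga",
    "Bëgg na ŋga dem",
    "Liggéeykat yi man ag yaw la.",
    "Foofee fan?",
    "Yaa ŋgi, mi ŋgi",
    "Gis ŋga kooku?",
    "Dem naa ba ci moom.",
    "Yéen ñan la wax?",
    "Moo doon ganam.",
    "Nil waa ji na ñëw",
]
predictions = [
    "Samba, yaw, yaa dem",
    "Bëgg na gëléem.",
    "Moo di jaŋgkat bi.",
    "Ana ŋga?",
    "Yaa ŋgoogu rekk!",
    "Samba?",
    "Maa demoon",
    "Yéen ñan la wax?",
    "Mu doon Lebu Yoff.",
    "Tay ci ŋgoon.",
]

rate = exact_match_rate(references, predictions)
print(f"exact match rate: {rate:.2f}")
```

On these ten rows only row 159 matches exactly. With the full results, the same function could presumably be called on `df_ft_to_wf[1]["translations"]` and `df_ft_to_wf[1]["predictions"]`, assuming those columns hold plain strings.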


In [11]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf[1].sample(100)

Unnamed: 0,original_sentences,translations,predictions
94,Tu parles de quelle maison (ici)?,Bii néeg ban ŋga wax?,Bii néeg ban ŋga wax?
121,Partout où il ira la paix descendra là.,Fépp fu mu jëm foofu jàmm dana fa wacc.,"Liggéey bii, ba mu sotti!"
46,C'est l'autre que nous connaissons.,Keneen ki la ñu xam.,Lawbe bi la!
128,Quelles femmes se sont égarées?,Jigéen ñan ñoo réer?,Jigéen jan a réer?
27,Je croie qu'aujourd'hui il viendra!,Defe naa tày dana ñëw!,Defe naa ni keneen ki la!
4,J'ai vu cet enfant-là?,Gis naa xale booba?,Gis naa booba xale?
153,Il veut que tu viennes,Bëgg na ŋga dem,Bëgg na gëléem.
157,Tu as vu celui-ci?,Gis ŋga kooku?,Samba?
106,Surveille-moi les-uns que voilà!,Seetal ma ñenn ñuu!,"Te kat, na ŋga ma nég!"
83,Ces enfants que voilà ne sont pas sages.,Xale yule yaruwuñu.,Duŋgeen lekk
