Training the best custom Transformer 🤖
-----------------------------------

In this notebook, we will continue the training of the best custom transformer on the new extracted sentences from the bool **Grammaire de Wolof Moderne**. We obtained, after a hyperparameter tuning with `wandb`, a best bleu score of **?** for french to wolof translation model. We provide, bellow, the main evaluation figures, obtained from the hyperparameter search step.

- Parallel coordinates:

- Parameter importance (from [panel]()):


Let us add some libraries bellow:

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed, AdamW
from wolof_translate.utils.sent_transformers import TransformerSequences
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v2 import SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os


  from .autonotebook import tqdm as notebook_tqdm


### Steps

We must add some classes that we implemented when making the hyperparameter search including:
- The custom Sinusoidal-based encoder
- The custom Size prediction module
- The custom Transformer requiring the `pytorch encoder and decoder stacked layers`
- The custom Transformer' learning rate scheduler
- The custom Trainer

And include them in our `wolof-translate` package.

-------------------

After that we will continue the training of the custom Transformer, for which we will resume its parameters from the saved checkpoints.

-------------------

The last part is to evaluate the model on the test set.

Let us go into our pipeline 👌

### Add custom modules

#### Custom Positional Encoder

Let us add bellow the positional encoder module which will permit us to put the positions of the sequence elements on the embedding vector.

In [2]:
%%writefile wolof-translate/wolof_translate/models/transformers/position.py

from torch import nn
import numpy as np
import torch

class PositionalEncoding(nn.Module):

    def __init__(self, n_poses_max: int = 500, d_model: int = 512):
        super(PositionalEncoding, self).__init__()    
        
        self.n_poses = n_poses_max
        
        self.n_dims = d_model
        
        # the angle is calculated as following
        angle = lambda pos, i: pos / 10000 ** (i / self.n_dims)

        # let's initialize the different token positions
        poses = np.arange(0, self.n_poses)

        # let's initialize also the different dimension indexes
        dims = np.arange(0, self.n_dims)

        # let's initialize the index of the different positional vector values
        circle_index = np.arange(0, self.n_dims / 2)

        # let's create the possible combinations between a position and a dimension index
        xv, yv = np.meshgrid(poses, circle_index)

        # let's create a matrix which will contain all the different points initialized
        points = np.zeros((self.n_poses, self.n_dims))

        # let's calculate the circle y axis coordinates
        points[:, ::2] = np.sin(angle(xv.T, yv.T))

        # let's calculate the circle x axis coordinates
        points[:, 1::2] = np.cos(angle(xv.T, yv.T))
        
        self.register_buffer('pe', torch.from_numpy(points).unsqueeze(0))
    
    def forward(self, input_: torch.Tensor):
        
        # let's scale the input
        input_ = input_ * torch.sqrt(torch.tensor(self.n_dims))
        
        # let's recuperate the result of the sum between the input and the positional encoding vectors
        return input_ + self.pe[:, :input_.size(1), :].type_as(input_)
    

Overwriting wolof-translate/wolof_translate/models/transformers/position.py


#### Size Prediction module

Let us define bellow the Size Prediction's module. It is a multi layer perceptron with multiple layers of `linear + relu activation + drop out + layer normalization`. `The number of features`, `the number of layers`, `the layer normalization' activation function` and `the drop out rate` are given as parameters to the module.


In [3]:
%%writefile wolof-translate/wolof_translate/models/transformers/size.py

from torch import nn
import torch

class SizePredict(nn.Module):
    
    def __init__(self, input_size: int, target_size: int = 1, n_features: int = 100, n_layers: int = 1, normalization: bool = True, drop_out: float = 0.1):
        super(SizePredict, self).__init__()
        
        self.layers = nn.ModuleList([])
        
        for l in range(n_layers):
            
            # we have to add batch normalization and drop_out if their are specified
            self.layers.append(
                nn.Sequential(
                    nn.Linear(input_size if l == 0 else n_features, n_features),
                    nn.ReLU(),
                    nn.Dropout(drop_out),
                    nn.LayerNorm(n_features) if normalization else nn.Identity(),
                )
            )
        
        # Initiate the last linear layer
        self.output_layer = nn.Linear(n_features, target_size)
    
    def forward(self, input_: torch.Tensor):
        
        # let's pass the input into the different sequences
        out = input_
        
        for layer in self.layers:
            
            out = layer(out)
        
        # return the final result (you have to take the absolute value of the result to make the number positive)
        return self.output_layer(out)
        
        

Overwriting wolof-translate/wolof_translate/models/transformers/size.py


#### Transformer

The following module is the primary transformer model. It takes as argument:
- a pytorch encoder and a pytorch decoder (they are defined outside of the module)
- the input size or vocabulary size
- the class criterion or loss function of the predict labels (as default to None but can be `nn.CrossEntropyLoss`, which apply the softmax transformation is made on the logits before calculation. label_smoothing can be added to the loss to prevent the model to over-fit according to the prediction values.)
- the size criterion (`Mean Squared Error` $\frac{1}{n}\sum_{i = 1}^n (y_i - \hat{y}_i)^2$ where $n$ is the batch size, $y_i$ is the true label and $\hat{y}_i$ is the predicted label)
- the number of features and the number of layers of the size prediction module
- the max number of positions (it must be the max number of tokens defined when creating the pytorch dataset or the tokenizer)
- the projection type (can be 'embedding' for 2-dimensional data containing integers as we are using or 'linear' for any other type of data different from the sequence of integers).

For the `forward` method we have the following arguments:

- the input sequence
- the input padding mask
- the target sequence or labels
- the target padding mask
- the padding token id.

For the `generate` method we have the following arguments:

- the input sequence
- the input padding mask
- the temperature
- the padding token id.

We added also two exception modules to handle errors.

In [4]:
%%writefile wolof-translate/wolof_translate/models/transformers/main.py

from wolof_translate.models.transformers.position import PositionalEncoding
from wolof_translate.models.transformers.size import SizePredict
from torch import nn
import torch
import copy


# new Exception for that transformer
class TargetException(Exception):
    
    def __init__(self, error):
        
        print(error)

class GenerationException(Exception):

    def __init__(self, error):

        print(error)

class Transformer(nn.Module):
    
    def __init__(self, 
                 vocab_size: int,
                 encoder,
                 decoder,
                 class_criterion = nn.CrossEntropyLoss(label_smoothing=0.1),
                 size_criterion = nn.MSELoss(),
                 n_features: int = 100,
                 n_layers: int = 2,
                 n_poses_max: int = 500,
                 projection_type: str = "embedding"):
        
        super(Transformer, self).__init__()
        
        assert len(encoder.layers) > 0 and len(decoder.layers) > 0
    
        self.dropout = encoder.layers._modules['0'].dropout.p
        
        self.enc_embed_dim = encoder.layers._modules['0'].linear1.in_features
        
        self.dec_embed_dim = decoder.layers._modules['0'].linear1.in_features
        
        # we can initiate the positional encoding model
        self.pe = PositionalEncoding(n_poses_max, self.enc_embed_dim)
        
        if projection_type == "embedding":
            
            self.embedding_layer = nn.Embedding(vocab_size, self.enc_embed_dim)
        
        elif projection_type == "linear":
            
            self.embedding_layer = nn.Linear(vocab_size, self.enc_embed_dim)
        
        # initialize the first encoder and decoder
        self.encoder = encoder
        
        self.decoder = decoder
        
        self.class_criterion = class_criterion
        
        self.size_criterion = size_criterion
        
        # let's initiate the mlp for predicting the target size
        self.size_prediction = SizePredict(
            self.enc_embed_dim,
            n_features=n_features,
            n_layers=n_layers,
            normalization=True, # we always use normalization
            drop_out=self.dropout
            )
      
        self.classifier = nn.Linear(self.dec_embed_dim, vocab_size)

        # let us share the weights between the embedding layer and classification
        # linear layer
        self.classifier.weight.data = self.embedding_layer.weight.data
        
    def forward(self, input_, input_mask = None, target = None, target_mask = None, 
                pad_token_id:int = 3):

        # ---> Encoder prediction
        input_embed = self.embedding_layer(input_)
        
        # recuperate the last input (before position)
        last_input = input_embed[:, -1:]
       
        # add position to input_embedding
        input_embed = self.pe(input_embed)
        
        # recuperate the input mask for pytorch encoder
        pad_mask1 = (input_mask == 0).bool().to(next(self.parameters()).device) if not input_mask is None else None
        
        # let us compute the states
        input_embed = input_embed.type_as(next(self.encoder.parameters()))
        
        states = self.encoder(input_embed, src_key_padding_mask = pad_mask1)
   
        # ---> Decoder prediction
        # let's predict the size of the target 
        target_size = self.size_prediction(states).mean(axis = 1)
        
        target_embed = self.embedding_layer(target)
        
        # recuperate target mask for pytorch decoder            
        pad_mask2 = (target_mask == 0).bool().to(next(self.parameters()).device) if not target_mask is None else None
        
        # define the attention mask
        targ_mask = self.get_target_mask(target_embed.size(1))

        # let's concatenate the last input and the target shifted from one position to the right (new seq dim = target seq dim)
        target_embed = torch.cat((last_input, target_embed[:, :-1]), dim = 1)
        
        # add position to target embed
        target_embed = self.pe(target_embed)
        
        # we pass all of the shifted target sequence to the decoder if training mode
        if self.training:
            
            target_embed = target_embed.type_as(next(self.encoder.parameters()))
            
            outputs = self.decoder(target_embed, states, tgt_mask = targ_mask, tgt_key_padding_mask = pad_mask2)
            
        else: ## This part was understand with the help of the professor Bousso.
            
            # if we are in evaluation mode we will not use the target but the outputs to make prediction and it is
            # sequentially done (see comments)
            
            # let us recuperate the last input as the current outputs
            outputs = last_input.type_as(next(self.encoder.parameters()))
            
            # for each target that we want to predict
            for t in range(target.size(1)):
                
                # recuperate the target mask of the current decoder input
                current_targ_mask = targ_mask[:t+1, :t+1] # all attentions between the elements before the last target
                
                # we do the same for the padding mask
                current_pad_mask = None
                
                if not pad_mask2 is None:
                    
                    current_pad_mask = pad_mask2[:, :t+1]
                
                # make new predictions
                out = self.decoder(outputs, states, tgt_mask = current_targ_mask, tgt_key_padding_mask = current_pad_mask) 
                
                # add the last new prediction to the decoder inputs
                outputs = torch.cat((outputs, out[:, -1:]), dim = 1) # the prediction of the last output is the last to add (!)
            
            # let's take only the predictions (the last input will not be taken)
            outputs = outputs[:, 1:]
        
        # let us add padding index to the outputs
        if not target_mask is None: 
          target = copy.deepcopy(target.cpu())
          target = target.to(target_mask.device).masked_fill_(target_mask == 0, -100)

        # ---> Loss Calculation
        # let us calculate the loss of the size prediction
        size_loss = 0
        if not self.size_criterion is None:
            
            size_loss = self.size_criterion(target_size, target_mask.sum(axis = -1).unsqueeze(1).type_as(next(self.parameters())))
            
        outputs = self.classifier(outputs)
        
        # let us permute the two last dimensions of the outputs
        outputs_ = outputs.permute(0, -1, -2)

        # calculate the loss
        loss = self.class_criterion(outputs_, target)

        outputs = torch.softmax(outputs, dim = -1)

        # calculate the predictionos
        outputs = copy.deepcopy(outputs.detach().cpu())
        predictions = torch.argmax(outputs, dim = -1).to(target_mask.device).masked_fill_(target_mask == 0, pad_token_id)

        return {'loss': loss + size_loss, 'preds': predictions}
    
    def generate(self, input_, input_mask = None, temperature: float = 0, pad_token_id:int = 3):

        if self.training:

          raise GenerationException("You cannot generate when the model is on training mode!")

        # ---> Encoder prediction
        input_embed = self.embedding_layer(input_)
        
        # recuperate the last input (before position)
        last_input = input_embed[:, -1:]
       
        # add position to input_embedding
        input_embed = self.pe(input_embed)
        
        # recuperate the input mask for pytorch encoder
        pad_mask1 = (input_mask == 0).bool().to(next(self.parameters()).device) if not input_mask is None else None
        
        # let us compute the states
        input_embed = input_embed.type_as(next(self.encoder.parameters()))
        
        states = self.encoder(input_embed, src_key_padding_mask = pad_mask1)
   
        # ---> Decoder prediction
        # let's predict the size of the target, the target and the target mask
        target_size = self.size_prediction(states).mean(axis = 1).round().clip(1, input_.size(1))

        target_ = copy.deepcopy(target_size.cpu())

        target_ = [int(size[0])*[1] + [0] * (input_.size(1) - int(size[0])) for size in target_.tolist()]

        target = torch.tensor(target_.copy()).long().to(next(self.parameters()).device)

        target_mask = torch.tensor(target_.copy()).bool().to(next(self.parameters()).device)
        
        target_embed = self.embedding_layer(target)
        
        # recuperate target mask for pytorch decoder            
        pad_mask2 = (target_mask == 0).bool().to(next(self.parameters()).device) if not target_mask is None else None
        
        # define the attention mask
        targ_mask = self.get_target_mask(target_embed.size(1))

        # let's concatenate the last input and the target shifted from one position to the right (new seq dim = target seq dim)
        target_embed = torch.cat((last_input, target_embed[:, :-1]), dim = 1)
        
        # add position to target embed
        target_embed = self.pe(target_embed)
            
        # if we are in evaluation mode we will not use the target but the outputs to make prediction and it is
        # sequentially done (see comments)
        
        # let us recuperate the last input as the current outputs
        outputs = last_input.type_as(next(self.encoder.parameters()))
        
        # for each target that we want to predict
        for t in range(target.size(1)):
            
            # recuperate the target mask of the current decoder input
            current_targ_mask = targ_mask[:t+1, :t+1] # all attentions between the elements before the last target
            
            # we do the same for the padding mask
            current_pad_mask = None
            
            if not pad_mask2 is None:
                
                current_pad_mask = pad_mask2[:, :t+1]
            
            # make new predictions
            out = self.decoder(outputs, states, tgt_mask = current_targ_mask, tgt_key_padding_mask = current_pad_mask) 
            
            # add the last new prediction to the decoder inputs
            outputs = torch.cat((outputs, out[:, -1:]), dim = 1) # the prediction of the last output is the last to add (!)
        
        # let's take only the predictions (the last input will not be taken)
        outputs = outputs[:, 1:]
        
        # let us add padding index to the outputs
        if not target_mask is None: 
          target = copy.deepcopy(target.cpu())
          target = target.to(target_mask.device).masked_fill_(target_mask == 0, -100)

        # ---> Final predictions
        outputs = self.classifier(outputs)

        # calculate the resulted outputs with temperature
        if temperature > 0:

          outputs = torch.softmax(outputs / temperature, dim = -1)
        
        else:

          outputs = torch.softmax(outputs, dim = -1)

        # calculate the predictions
        outputs = copy.deepcopy(outputs.detach().cpu())
        predictions = torch.argmax(outputs, dim = -1).to(target_mask.device).masked_fill_(target_mask == 0, pad_token_id)

        return predictions
    

    def get_target_mask(self, attention_size: int):
        
        return torch.triu(torch.ones((attention_size, attention_size), dtype = bool), diagonal = 1).to(next(self.parameters()).device)

Overwriting wolof-translate/wolof_translate/models/transformers/main.py


#### Learning scheduler

Let us create our own learning rate scheduler according to the paper [Attention Is All You Need](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) paper.

![scheduler_transformer](https://i.stack.imgur.com/GQurA.png)

In [4]:
%%writefile wolof-translate/wolof_translate/models/transformers/optimization.py

from torch.optim.lr_scheduler import _LRScheduler
from torch import optim
from typing import *

class TransformerScheduler(_LRScheduler):
    
    def __init__(self, optimizer: Union[optim.AdamW, optim.Adam], d_model = 512, lr_warmup_step = 100, **kwargs):
        
        self._optimizer = optimizer
        
        self._dmodel = d_model
        
        self._lr_warmup = lr_warmup_step

        # get the number of parameters
        self.len_param_groups = len(self._optimizer.param_groups)

        # provide the LRScheduler parameters
        super().__init__(self._optimizer, **kwargs)
        
    def get_lr(self):
        
        # recuperate the step number
        _step_num = self._step_count
        
        # calculate the learning rate
        lr = self._dmodel ** -0.5 * min(_step_num ** -0.5, 
                                              _step_num * self._lr_warmup ** -1.5)
        # provide the corresponding learning rate of each parameter vector
        # for updating
        return [lr] * self.len_param_groups

        
        

Overwriting wolof-translate/wolof_translate/models/transformers/optimization.py


In [5]:
%%writefile wolof-translate/wolof_translate/models/transformers/optimization.py

from torch.optim.lr_scheduler import _LRScheduler
from torch import optim
from typing import *
class TransformerScheduler(_LRScheduler):
    
    def __init__(self, optimizer: Union[optim.AdamW, optim.Adam], scale_factor = 1.0, lr_warmup_step = 100, **kwargs):

        self._optimizer = optimizer

        self._scale_factor = scale_factor
        
        self._lr_warmup = lr_warmup_step

        # get the number of parameters
        self.len_param_groups = len(self._optimizer.param_groups)

        # provide the LRScheduler parameters
        super().__init__(self._optimizer, **kwargs)
        
    def get_lr(self):
        
        # recuperate the step number
        _step_num = self._step_count
        
        # calculate the learning rate
        lr = self._scale_factor * min(_step_num ** -0.5, 
                                              _step_num * self._lr_warmup ** -1.5)
        # provide the corresponding learning rate of each parameter vector
        # for updating
        return [lr] * self.len_param_groups

        
        

Overwriting wolof-translate/wolof_translate/models/transformers/optimization.py


#### Trainer

Let us define bellow a part of our long training class that we create and which is available in github. But the lines are commented in French.

In [2]:
%%writefile wolof-translate/wolof_translate/trainers/transformer_trainer.py
"""Nouvelle classe d'entraînement. On la fournit un modèle et des hyperparamètres en entrée.
Nous allons créer des classes supplémentaire qui vont supporter la classe d'entraînement
"""

from wolof_translate.utils.evaluation import TranslationEvaluation
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import Dataset, DataLoader
from tokenizers import Tokenizer
from tqdm import tqdm, trange
from torch.nn import utils
from torch import optim
from typing import *
from torch import nn
import string
import torch
import json
import copy
import os

# choose letters for random words
letters = string.ascii_lowercase

class PredictionError(Exception):
    
    def __init__(self, error: Union[str, None] = None):

        if not error is None:
            
            print(error)
        
        else:
            
            print("You cannot with this type of data! Provide a list of tensors, a list of numpy arrays, a numpy array or a torch tensor.")

class LossError(Exception):
    
    def __init__(self, error: Union[str, None] = None):

        if not error is None:
            
            print(error)
        
        else:
            
            print("A list of losses is provided for multiple outputs.")
        
class ModelRunner:

    def __init__(
        self,
        model: nn.Module,
        optimizer = optim.AdamW,
        seed: Union[int, None] = None, 
        evaluation: Union[TranslationEvaluation, None] = None
    ):

        # Initialisation de la graine du générateur
        self.seed = seed

        # Recuperate the evaluation metric
        self.evaluation = evaluation

        # Initialisation du générateur
        if self.seed:
            torch.manual_seed(self.seed)

        # Le modèle à utiliser pour les différents entraînements
        self.orig_model = model

        # L'optimiseur à utiliser pour les différentes mises à jour du modèle
        self.orig_optimizer = optimizer

        # Récupération du type de 'device'
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.compilation = False

    # ------------------------------ Training staffs (Partie entraînement et compilation) --------------------------
    
    def batch_train(self, input_: torch.Tensor, input_mask: torch.Tensor,
                    labels: torch.Tensor, labels_mask: torch.Tensor, pad_token_id: int = 3):

        if self.hugging_face: # Nous allons utilise un modèle text to text de hugging face (but only for fine-tuning)
          
          # effectuons un passage vers l'avant
          outputs = self.model(input_ids = input_, attention_mask = input_mask, 
                               labels = labels)
          
          # recuperate the predictions and the loss
          preds, loss = outputs.logits, outputs.loss
        
        else:

          # effectuons un passage vers l'avant
          outputs = self.model(input_, input_mask, labels, labels_mask, pad_token_id = pad_token_id)

          # recuperate the predictions and the loss
          preds, loss = outputs['preds'], outputs['loss']

        # effectuons un passage vers l'arrière
        loss.backward()

        # forcons les valeurs des gradients à se tenir dans un certain interval si nécessaire
        if not self.clipping_value is None:

            utils.clip_grad_value_(
                self.model.parameters(), clip_value=self.clipping_value
            )

        # mettons à jour les paramètres
        self.optimizer.step()

        # reinitialisation des gradients
        self.optimizer.zero_grad()

        return preds, loss

    def batch_eval(self, input_: torch.Tensor, input_mask: torch.Tensor,
                    labels: torch.Tensor, labels_mask: torch.Tensor, pad_token_id: int = 3):

        if self.hugging_face: # Nous allons utilise un modèle text to text de hugging face (but only for fine-tuning)

          # effectuons un passage vers l'avant
          outputs = self.model(input_ids = input_, attention_mask = input_mask, 
                               labels = labels)
          
          # recuperate the predictions and the loss
          preds, loss = outputs.logits, outputs.loss

        else:

          # effectuons un passage vers l'avant
          outputs = self.model(input_, input_mask, labels, labels_mask, pad_token_id = pad_token_id)

          # recuperate the predictions and the loss
          preds, loss = outputs['preds'], outputs['loss']

        return preds, loss

    # On a décidé d'ajouter quelques paramètres qui ont été utiles au niveau des enciennes classes d'entraînement
    def compile(
        self,
        train_dataset: Dataset,
        test_dataset: Union[Dataset, None] = None,
        tokenizer: Union[Tokenizer, None] = None,
        train_loader_kwargs: dict = {"batch_size": 16},
        test_loader_kwargs: dict = {"batch_size": 16},
        optimizer_kwargs: dict = {"lr": 1e-4, "weight_decay": 0.4},
        model_kwargs: dict = {'class_criterion': nn.CrossEntropyLoss(label_smoothing=0.1)},
        lr_scheduler_kwargs: dict = {'d_model': 512, 'lr_warmup_step': 100},
        lr_scheduler = None,
        gradient_clipping_value: Union[float, torch.Tensor, None] = None,
        predict_with_generate: bool = False,
        logging_dir: Union[str, None] = None,
        hugging_face: bool = False,
    ):

        if self.seed:
            torch.manual_seed(self.seed)

        # On devra utiliser la méthode 'spread' car on ne connait pas les paramètres du modèle
        if isinstance(self.orig_model, nn.Module): # si c'est une instance d'un modèle alors pas de paramètres requis
            
            self.model = copy.deepcopy(self.orig_model).to(self.device)
        
        else: # sinon on fournit les paramètres
        
            self.model = copy.deepcopy(self.orig_model(**model_kwargs)).to(self.device)

        # Initialisation des paramètres de l'optimiseur
        self.optimizer = self.orig_optimizer(
            self.model.parameters(), **optimizer_kwargs
        )
        
        # On ajoute un réducteur de taux d'apprentissage si nécessaire
        self.lr_scheduling = None

        if not lr_scheduler is None and self.lr_scheduling is None:

            self.lr_scheduling = lr_scheduler(self.optimizer, **lr_scheduler_kwargs)

        self.train_loader = DataLoader(
            train_dataset,
            shuffle=True,
            **train_loader_kwargs,
        )
        
        if test_dataset:
          self.test_loader = DataLoader(
              test_dataset,
              shuffle=False,
              **test_loader_kwargs,
          )
        
        else:
          self.test_loader = None
        
        # Let us initialize the clipping value to make gradient clipping
        self.clipping_value = gradient_clipping_value

        # Other parameters for step tracking and metrics
        self.compilation = True

        self.current_epoch = None

        self.best_score = None

        self.best_epoch = self.current_epoch

        # Recuperate some boolean attributes
        self.predict_with_generate = predict_with_generate

        # Recuperate tokenizer
        self.tokenizer = tokenizer
        
        # Recuperate the logging directory
        self.logging_dir = logging_dir
        
        # Initialize the metrics
        self.metrics = {}

        # Initialize the attribute which indicate if the model is from huggingface
        self.hugging_face = hugging_face
        

    def train(
        self,
        epochs: int = 100,
        auto_save: bool = False,
        log_step: Union[int, None] = None,
        saving_directory: str = "data/checkpoints/last_checkpoints",
        file_name: str = "checkpoints",
        save_best: bool = True,
        metric_for_best_model: str = 'test_loss',
        metric_objective: str = 'minimize'
    ):
        """Entraînement du modèle

        Args:
            epochs (int, optional): Le nombre d'itérations. Defaults to 100.
            auto_save (bool, optional): Auto-sauvegarde du modèle. Defaults to False.
            log_step (int, optional): Le nombre d'itération avant d'afficher les performances. Defaults to 1.
            saving_directory (str, optional): Le dossier de sauvegarde du modèle. Defaults to "inception_package/storage".
            file_name (str, optional): Le nom du fichier de sauvegarde. Defaults to "checkpoints".
            save_best_only (bool): Une varible booléenne indiquant si l'on souhaite ne sauvegarder que le meilleur modèle. Defaults to True.
            metric_for_best_model (str): Le nom de la métrique qui permet de choisir le meilleur modèle. Defaults to 'eval_loss'.
            metric_objective (str): Indique si la métrique doit être maximisée 'maximize' ou minimisée 'minimize'. Defaults to 'minimize'.
            add_to_wandb (bool): Indique si les métrique seront ajouté à un projet dans wandb. Defaults to False. 

        Raises:
            Exception: L'entraînement implique d'avoir déja initialisé les paramètres
        """

        # the file name cannot be "best_checkpoints"
        assert file_name != "best_checkpoints"
        
        ##################### Error Handling ##################################################
        if not self.compilation:
            raise Exception("You must initialize datasets and\
                            parameters with `compile` method. Make sure you don't forget any of them before \n \
                                training the model"
            )

        ##################### Initializations #################################################

        if metric_objective in ['maximize', 'minimize']:

          best_score = float('-inf') if metric_objective == 'maximize' else float('inf')

        else:

          raise ValueError("The metric objective can only between 'maximize' or minimize!")

        if not self.best_score is None:

          best_score = self.best_score

        start_epoch = self.current_epoch if not self.current_epoch is None else 0

        ##################### Training ########################################################

        modes = ['train', 'test'] if not self.test_loader is None else ['train']

        for epoch in tqdm(range(start_epoch, start_epoch + epochs)):

            # Print the actual learning rate
            print(f"For epoch {epoch + 1}: {{Learning rate: {self.lr_scheduling.get_lr()}}}")

            self.metrics = {}
        
            for mode in modes:

                with torch.set_grad_enabled(mode == "train"):

                    # Initialize the loss of the current mode
                    self.metrics[f'{mode}_loss'] = 0

                    if mode == "train":

                        self.model.train()

                        loader = list(iter(self.train_loader))

                    else:

                        self.model.eval()

                        loader = list(iter(self.test_loader))
                    
                    with trange(len(loader), unit = "batches", position = 0, leave = True) as pbar:

                      for i in pbar:
                        
                        pbar.set_description(f"{mode[0].upper() + mode[1:]} batch number {i}")
                        
                        data = loader[i]

                        input_ = data[0].long().to(self.device)
                        
                        input_mask = data[1].to(self.device)

                        labels = data[2].long().to(self.device)

                        labels_mask = data[3].to(self.device)
                        
                        # Récupération de identifiant token du padding (par défaut = 3)
                        pad_token_id = 3 if self.tokenizer is None else self.tokenizer.pad_token_id

                        preds, loss = (
                            self.batch_train(input_, input_mask, labels, labels_mask, pad_token_id)
                            if mode == "train"
                            else self.batch_eval(input_, input_mask, labels, labels_mask, pad_token_id)
                        )

                        n_attr = 0

                        self.metrics[f"{mode}_loss"] += loss.item()

                        # Réduction du taux d'apprentissage à chaque itération si nécessaire
                        if not self.lr_scheduling is None:

                            self.model.train()

                            self.lr_scheduling.step()

                        if not self.evaluation is None and mode == "test":
                          
                          if self.predict_with_generate:

                            if self.hugging_face:

                                preds = self.model.generate(input_, attention_mask = input_mask)

                            else:

                                preds = self.model.generate(input_, input_mask, pad_token_id = pad_token_id)

                                labels = labels.masked_fill_(labels_mask == 0, -100)
                                
                          else:

                            if self.hugging_face:

                                preds = torch.argmax(preds, dim = -1)
                            
                            else:
                                
                                labels = labels.masked_fill_(labels_mask == 0, -100)
                          
                          self.metrics.update(self.evaluation.compute_metrics((preds.cpu(), labels.cpu())))
                      
                      # torch.cuda.empty_cache()

            self.metrics[f"train_loss"] = self.metrics[f"train_loss"] / len(self.train_loader)

            for metric in self.metrics:

               if metric != 'train_loss':

                self.metrics[metric] = self.metrics[metric] / len(self.test_loader)

            # Affichage des métriques
            if not log_step is None and (epoch + 1) % log_step == 0:

              print(f"\nMetrics: {self.metrics}")
              
              if not self.logging_dir is None:
                  
                  with SummaryWriter(self.logging_dir) as writer:
                      
                      for metric in self.metrics:
                          
                        writer.add_scalar(metric, self.metrics[metric], global_step = epoch)
                        
                        writer.add_scalar("global_step", epoch)

            print("\n=============================\n")

            ##################### Model saving #########################################################

            # Save the model in the end of the current epoch. Sauvegarde du modèle à la fin d'une itération
            if auto_save:

                self.current_epoch = epoch
                
                if save_best:

                  # verify if the current score is best and recuperate it if yes
                  if metric_objective == 'maximize':
                    
                    last_score = best_score < self.metrics[metric_for_best_model]
                  
                  elif metric_objective == 'minimize':

                    last_score = best_score > self.metrics[metric_for_best_model]
                  
                  # recuperate the best score
                  if last_score: 

                    best_score = self.metrics[metric_for_best_model]

                    self.best_epoch = self.current_epoch + 1
                    
                    self.best_score = best_score 
                    
                    self.save(saving_directory, "best_checkpoints")
                             
                self.save(saving_directory, file_name)

    # Pour la méthode nous allons nous inspirer sur la méthode save de l'agent ddpg (RL) que l'on avait créée
    def save(
        self,
        directory: str = "data/checkpoints/last_checkpoints",
        file_name: str = "checkpoints"
    ):

          if not os.path.exists(directory):
              os.makedirs(directory)

          file_path = os.path.join(directory, f"{file_name}.pth")

          checkpoints = {
              "model_state_dict": self.model.state_dict(),
              "optimizer_state_dict": self.optimizer.state_dict(),
              "current_epoch": self.current_epoch,
              "metrics": self.metrics,
              "best_score": self.best_score,
              "best_epoch": self.best_epoch,
              "lr_scheduler_state_dict": self.lr_scheduling.state_dict()
          }

          torch.save(checkpoints, file_path)

          # update metrics and the best score dict
          self.metrics['current_epoch'] = self.current_epoch + 1

          best_score_dict = {"best_score": self.best_score, "best_epoch": self.best_epoch}

          # save the metrics as json file
          metrics = json.dumps({'metrics': self.metrics, "best_performance": best_score_dict}, indent=4)

          with open(os.path.join(directory, f'{file_name}.json'), 'w') as f:

            f.write(metrics)   
          
    # Ainsi que pour la méthode load
    def load(
        self,
        directory: str = "data/checkpoints/last_checkpoints",
        file_name: str = "checkpoints",
        load_best: bool = False
    ):

        if load_best: file_name = "best_checkpoints"
        
        file_path = os.path.join(
            directory, 
            f"{file_name}.pth"
        )

        if os.path.exists(file_path):

            checkpoints = torch.load(file_path)

            self.model.load_state_dict(checkpoints["model_state_dict"])

            self.optimizer.load_state_dict(checkpoints["optimizer_state_dict"])

            self.current_epoch = checkpoints["current_epoch"]

            self.best_score = checkpoints["best_score"]

            self.best_epoch = checkpoints["best_epoch"]

            self.lr_scheduling.load_state_dict(checkpoints["lr_scheduler_state_dict"])

        else:

            raise OSError(
                f"Le fichier {file_path} est introuvable. Vérifiez si le chemin fourni est correct!"
            )
    
    def evaluate(self, test_dataset, batch_size: int = 16, loader_kwargs: dict = {}):

        test_loader = list(iter(DataLoader(
            test_dataset,
            batch_size,
            shuffle=False,
            **loader_kwargs,
        )))

        metrics = {'test_loss': 0}

        results = {'original_sentences': [], 'translations': [], 'predictions': []}

        with torch.no_grad():

          with trange(len(test_loader), unit = "batches", position = 0, leave = True) as pbar:

            for i in pbar:
              
              pbar.set_description(f"Evaluation batch number {i}")
              
              data = test_loader[i]
                          
              try:
                  input_ = data[0].float().to(self.device)
              except AttributeError:
                  input_ = data[0]
              
              input_mask = data[1].to(self.device)

              labels = data[2].long().to(self.device)

              labels_mask = data[3].to(self.device)

              preds, loss = self.batch_eval(input_, input_mask, labels, labels_mask, test_dataset.tokenizer.pad_token_id)

              self.metrics[f"test_loss"] += loss.item()

              # let us recuperate the original sentences
              results['original_sentences'].extend(test_dataset.tokenizer.batch_decode(input_, skip_special_tokens = True))

              results['translations'].extend(test_dataset.tokenizer.batch_decode(labels, skip_special_tokens = True))

              results['predictions'].extend(test_dataset.tokenizer.batch_decode(preds, skip_special_tokens = True))

              if not self.evaluation is None:
                
                # labels = labels.masked_fill_(labels_mask == 0, -100)

                metrics.update(self.evaluation.compute_metrics((preds.cpu(), labels.cpu())))

          for metric in self.metrics:

            self.metrics[metric] = self.metrics[metric] / len(self.test_loader)

          return self.metrics, results
        
            
            
            
            

Overwriting wolof-translate/wolof_translate/trainers/transformer_trainer.py


## French to wolof

### Configure dataset 🔠

In [7]:
%%writefile wolof-translate/wolof_translate/utils/split_with_valid.py
""" This module contains a function which split the data. It will consider adding the validation set
"""
from sklearn.model_selection import train_test_split
import pandas as pd
import os

def split_data(random_state: int = 50, data_directory: str = "data/extractions/new_data"):
  """Split data between train, validation and test sets

  Args:
    random_state (int): the seed of the splitting generator. Defaults to 50
  """
  # load the corpora and split into train and test sets
  corpora = pd.read_csv(os.path.join(data_directory, "sentences.csv"))

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # let us save the final training set when performing

  train_set, valid_set = train_test_split(train_set, test_size=0.1, random_state=random_state)

  train_set.to_csv(os.path.join(data_directory, "final_train_set.csv"), index=False)

  # let us save the sets
  train_set.to_csv(os.path.join(data_directory, "train_set.csv"), index=False)

  valid_set.to_csv(os.path.join(data_directory, "valid_set.csv"), index=False)

  test_set.to_csv(os.path.join(data_directory, "test_set.csv"), index=False)

Overwriting wolof-translate/wolof_translate/utils/split_with_valid.py


In [8]:
# recuperate the tokenizer from a json file
tokenizer = T5TokenizerFast(tokenizer_file=f"wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


The following function is used to recuperate the datasets from csv files. The test test is not anymore the validation set which is now part of the train set.

In [9]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):

  # Create augmentation to add on French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(f"data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Recuperate the validation dataset
  valid_dataset = SentenceDataset(f"data/extractions/new_data/valid_set.csv",
                                        tokenizer,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

### Configure the evaluation class ⚙️

We will evaluate the predictions with the `bleu` metric. The predictions will be generated like we did when making hyperparameter search.

In [10]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py


### Training

Let us import the transformer, the data splitter, the learning rate scheduler and the evaluation function, bellow:

In [11]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data


Using the latest cached version of the module from C:\Users\Oumar Kane\.cache\huggingface\modules\evaluate_modules\metrics\evaluate-metric--sacrebleu\28676bf65b4f88b276df566e48e603732d0b4afd237603ebdf92acaacf5be99b (last modified on Wed Apr 26 19:02:40 2023) since it couldn't be found locally at evaluate-metric--sacrebleu, or remotely on the Hugging Face Hub.


Let us configure the parameters.

In [12]:
# let us initialize the hyperparameter configuration
config = {
    'random_state': 0,
    'fr_char_p': 0.19260604905553697,
    'fr_word_p': 0.5561734876075831,
    'dim_ff': 2199,
    'drop_out_rate': 0.5561734876075831,
    'label_smoothing': 0.1,
    'n_layers': 10,
    'n_features': 138,
    'learning_rate': 0.5561734876075831,
    'weight_decay': 0.5415009274693046,
    'batch_size': 16,
    'model_dir': 'data/checkpoints/fw_custom_v3_checkpoints/',
    'new_model_dir': 'data/checkpoints/custom_results_fw_v3/'
}

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model = Transformer, seed = 0, evaluation = evaluation)

# split the data
split_data(config['random_state'])

# recuperate train and test set
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'])

# initialize the encoder and the decoder layers
encoder_layer = nn.TransformerEncoderLayer(256, 
                                            8,
                                            config['dim_ff'],
                                            config['drop_out_rate'], batch_first = True)

decoder_layer = nn.TransformerDecoderLayer(256, 
                                            8,
                                            config['dim_ff'],
                                            config['drop_out_rate'], batch_first = True)

# let us initialize the encoder and the decoder
encoder = nn.TransformerEncoder(encoder_layer, 6)

decoder = nn.TransformerDecoder(decoder_layer, 6)

# Initialize the scheduler parameters
scheduler_args = {'d_model': 256, 'lr_warmup_step': 1000}

# Initialize the transformer parameters
model_args = {
    'vocab_size': len(tokenizer),
    'encoder': encoder,
    'decoder': decoder,
    'class_criterion': nn.CrossEntropyLoss(label_smoothing = config['label_smoothing']),
    'n_poses_max': train_dataset.max_len,
    'n_layers': config['n_layers'],
    'n_features': config['n_features']
}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    'betas': (0.9, 0.98),
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args, model_kwargs = model_args,
                lr_scheduler=TransformerScheduler,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                logging_dir="data/logs/custom_fw_v3"
                )

# We will from checkpoints so let us the model
trainer.load(config['model_dir'])

        

Let us train the model.

In [13]:
# Train the model 
trainer.train(200, auto_save = True, log_step = 1, saving_directory=config['model_dir'], 
              metric_for_best_model='bleu',
              metric_objective='maximize')

  0%|          | 0/200 [00:00<?, ?it/s]

For epoch 3: {Learning rate: [0.00025]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.84batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 42.677930343441844, 'test_loss': 39.06557769775391, 'bleu': 0.20981, 'gen_len': 1.0}




  0%|          | 1/200 [00:42<2:22:28, 42.96s/it]

For epoch 4: {Learning rate: [0.0003125]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.16batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.70s/batches]



Metrics: {'train_loss': 28.823188176969204, 'test_loss': 45.23988304138184, 'bleu': 0.18739, 'gen_len': 1.1}




  1%|          | 2/200 [01:21<2:12:45, 40.23s/it]

For epoch 5: {Learning rate: [0.000375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.70s/batches]



Metrics: {'train_loss': 25.558435509844525, 'test_loss': 43.74214324951172, 'bleu': 0.18739, 'gen_len': 1.1}




  2%|▏         | 3/200 [01:57<2:06:43, 38.59s/it]

For epoch 6: {Learning rate: [0.0004375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 22.855377267046673, 'test_loss': 40.622360420227054, 'bleu': 0.22284, 'gen_len': 1.1}




  2%|▏         | 4/200 [02:33<2:02:30, 37.50s/it]

For epoch 7: {Learning rate: [0.0005]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 20.670675138147868, 'test_loss': 37.15907287597656, 'bleu': 0.18739, 'gen_len': 1.1}




  2%|▎         | 5/200 [03:10<2:00:54, 37.20s/it]

For epoch 8: {Learning rate: [0.0005625000000000001]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 18.589839481725924, 'test_loss': 32.99825649261474, 'bleu': 0.18739, 'gen_len': 1.1}




  3%|▎         | 6/200 [03:46<1:59:27, 36.94s/it]

For epoch 9: {Learning rate: [0.000625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 16.94987369165188, 'test_loss': 33.76158199310303, 'bleu': 0.18739, 'gen_len': 1.1}




  4%|▎         | 7/200 [04:23<1:58:35, 36.87s/it]

For epoch 10: {Learning rate: [0.0006875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.60s/batches]



Metrics: {'train_loss': 15.713123123820235, 'test_loss': 35.5274995803833, 'bleu': 0.18739, 'gen_len': 1.1}




  4%|▍         | 8/200 [05:00<1:57:43, 36.79s/it]

For epoch 11: {Learning rate: [0.00075]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 14.769121088632723, 'test_loss': 140.88019256591798, 'bleu': 0.36671, 'gen_len': 0.6}




  4%|▍         | 9/200 [05:37<1:57:10, 36.81s/it]

For epoch 12: {Learning rate: [0.0008125000000000001]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.60s/batches]



Metrics: {'train_loss': 14.100800706119072, 'test_loss': 45.54373836517334, 'bleu': 0.0, 'gen_len': 0.8}




  5%|▌         | 10/200 [06:13<1:56:16, 36.72s/it]

For epoch 13: {Learning rate: [0.000875]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 13.982559442520142, 'test_loss': 29.37886772155762, 'bleu': 0.25585, 'gen_len': 0.85}




  6%|▌         | 11/200 [06:51<1:56:52, 37.11s/it]

For epoch 14: {Learning rate: [0.0009375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 13.666302349509262, 'test_loss': 32.599399185180665, 'bleu': 0.18739, 'gen_len': 1.1}




  6%|▌         | 12/200 [07:28<1:56:03, 37.04s/it]

For epoch 15: {Learning rate: [0.001]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.69s/batches]



Metrics: {'train_loss': 13.460913675587351, 'test_loss': 25.029061126708985, 'bleu': 0.20981, 'gen_len': 1.0}




  6%|▋         | 13/200 [08:07<1:57:29, 37.70s/it]

For epoch 16: {Learning rate: [0.0010625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 13.280211902246243, 'test_loss': 33.097079277038574, 'bleu': 0.22284, 'gen_len': 1.1}




  7%|▋         | 14/200 [08:43<1:55:17, 37.19s/it]

For epoch 17: {Learning rate: [0.0011250000000000001]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 13.190874099731445, 'test_loss': 28.809939193725587, 'bleu': 0.20981, 'gen_len': 1.0}




  8%|▊         | 15/200 [09:19<1:53:38, 36.86s/it]

For epoch 18: {Learning rate: [0.0011875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.44batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.71s/batches]



Metrics: {'train_loss': 13.212477561904162, 'test_loss': 29.166506004333495, 'bleu': 0.2495, 'gen_len': 1.0}




  8%|▊         | 16/200 [09:57<1:53:35, 37.04s/it]

For epoch 19: {Learning rate: [0.00125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.60s/batches]



Metrics: {'train_loss': 13.152416863092562, 'test_loss': 30.870107460021973, 'bleu': 0.0, 'gen_len': 1.1}




  8%|▊         | 17/200 [10:33<1:51:54, 36.69s/it]

For epoch 20: {Learning rate: [0.0013125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 13.19662106909403, 'test_loss': 27.290746116638182, 'bleu': 0.20981, 'gen_len': 1.0}




  9%|▉         | 18/200 [11:09<1:50:47, 36.52s/it]

For epoch 21: {Learning rate: [0.001375]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 13.152004910678398, 'test_loss': 31.942746353149413, 'bleu': 0.51061, 'gen_len': 1.1}




 10%|▉         | 19/200 [11:44<1:48:59, 36.13s/it]

For epoch 22: {Learning rate: [0.0014375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 13.072381205675079, 'test_loss': 27.139996910095213, 'bleu': 0.0, 'gen_len': 1.0}




 10%|█         | 20/200 [12:21<1:48:54, 36.30s/it]

For epoch 23: {Learning rate: [0.0015]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 13.044931900210496, 'test_loss': 22.851058578491212, 'bleu': 0.20981, 'gen_len': 1.0}




 10%|█         | 21/200 [12:58<1:49:14, 36.62s/it]

For epoch 24: {Learning rate: [0.0015625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.943584651481814, 'test_loss': 22.593192481994627, 'bleu': 0.0, 'gen_len': 0.9}




 11%|█         | 22/200 [13:34<1:48:06, 36.44s/it]

For epoch 25: {Learning rate: [0.0016250000000000001]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.966275796657655, 'test_loss': 25.35263137817383, 'bleu': 0.23839000000000002, 'gen_len': 1.0}




 12%|█▏        | 23/200 [14:10<1:47:24, 36.41s/it]

For epoch 26: {Learning rate: [0.0016875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.951127325616232, 'test_loss': 26.87117805480957, 'bleu': 0.27612000000000003, 'gen_len': 1.0}




 12%|█▏        | 24/200 [14:46<1:46:23, 36.27s/it]

For epoch 27: {Learning rate: [0.00175]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.911633055384566, 'test_loss': 22.426051712036134, 'bleu': 0.0, 'gen_len': 0.9}




 12%|█▎        | 25/200 [15:23<1:46:03, 36.36s/it]

For epoch 28: {Learning rate: [0.0018125]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.87623822398302, 'test_loss': 21.629340744018556, 'bleu': 0.0, 'gen_len': 0.9}




 13%|█▎        | 26/200 [16:04<1:49:35, 37.79s/it]

For epoch 29: {Learning rate: [0.001875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.904119834667299, 'test_loss': 24.755013084411623, 'bleu': 0.0, 'gen_len': 1.0}




 14%|█▎        | 27/200 [16:40<1:47:40, 37.34s/it]

For epoch 30: {Learning rate: [0.0019375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.794695639028781, 'test_loss': 19.904039287567137, 'bleu': 0.23839000000000002, 'gen_len': 0.9}




 14%|█▍        | 28/200 [17:16<1:45:58, 36.97s/it]

For epoch 31: {Learning rate: [0.002]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.69s/batches]



Metrics: {'train_loss': 12.842719677017957, 'test_loss': 25.530256080627442, 'bleu': 0.0, 'gen_len': 1.0}




 14%|█▍        | 29/200 [17:53<1:45:16, 36.94s/it]

For epoch 32: {Learning rate: [0.0020625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.775162161850348, 'test_loss': 23.45066032409668, 'bleu': 0.32836, 'gen_len': 0.9}




 15%|█▌        | 30/200 [18:30<1:44:18, 36.81s/it]

For epoch 33: {Learning rate: [0.002125]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.26batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.739238890205941, 'test_loss': 19.96805648803711, 'bleu': 0.51061, 'gen_len': 0.9}




 16%|█▌        | 31/200 [19:07<1:44:09, 36.98s/it]

For epoch 34: {Learning rate: [0.0021875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.71192507046025, 'test_loss': 19.026727962493897, 'bleu': 0.3629, 'gen_len': 0.9}




 16%|█▌        | 32/200 [19:43<1:42:54, 36.75s/it]

For epoch 35: {Learning rate: [0.0022500000000000003]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.74078570924154, 'test_loss': 21.90325927734375, 'bleu': 0.0, 'gen_len': 0.9}




 16%|█▋        | 33/200 [20:19<1:41:39, 36.53s/it]

For epoch 36: {Learning rate: [0.0023125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.670165079395945, 'test_loss': 18.265416812896728, 'bleu': 0.0, 'gen_len': 0.8}




 17%|█▋        | 34/200 [20:56<1:41:01, 36.51s/it]

For epoch 37: {Learning rate: [0.002375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.708665126707496, 'test_loss': 19.120588302612305, 'bleu': 0.32836, 'gen_len': 0.9}




 18%|█▊        | 35/200 [21:32<1:40:09, 36.42s/it]

For epoch 38: {Learning rate: [0.0024375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.618057611511976, 'test_loss': 19.20975160598755, 'bleu': 0.0, 'gen_len': 0.9}




 18%|█▊        | 36/200 [22:09<1:39:58, 36.58s/it]

For epoch 39: {Learning rate: [0.0025]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.56038447124202, 'test_loss': 17.308978652954103, 'bleu': 0.0, 'gen_len': 0.8}




 18%|█▊        | 37/200 [22:45<1:38:54, 36.41s/it]

For epoch 40: {Learning rate: [0.0025625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.578545779716677, 'test_loss': 19.250236797332764, 'bleu': 0.0, 'gen_len': 0.9}




 19%|█▉        | 38/200 [23:22<1:38:49, 36.60s/it]

For epoch 41: {Learning rate: [0.002625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.471155143365628, 'test_loss': 16.481003284454346, 'bleu': 0.32836, 'gen_len': 0.8}




 20%|█▉        | 39/200 [24:01<1:40:05, 37.30s/it]

For epoch 42: {Learning rate: [0.0026875000000000002]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.444178354449388, 'test_loss': 20.3317271232605, 'bleu': 0.0, 'gen_len': 0.9}




 20%|██        | 40/200 [24:36<1:38:00, 36.75s/it]

For epoch 43: {Learning rate: [0.00275]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.23batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.350052978934311, 'test_loss': 17.04861898422241, 'bleu': 0.0, 'gen_len': 0.8}




 20%|██        | 41/200 [25:14<1:38:08, 37.03s/it]

For epoch 44: {Learning rate: [0.0028125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.470755611977927, 'test_loss': 21.803909873962404, 'bleu': 0.0, 'gen_len': 0.9}




 21%|██        | 42/200 [25:52<1:38:13, 37.30s/it]

For epoch 45: {Learning rate: [0.002875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.328283775143507, 'test_loss': 16.780427265167237, 'bleu': 0.40583, 'gen_len': 0.8}




 22%|██▏       | 43/200 [26:28<1:36:39, 36.94s/it]

For epoch 46: {Learning rate: [0.0029375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.376909994497531, 'test_loss': 17.878368282318114, 'bleu': 0.0, 'gen_len': 0.8}




 22%|██▏       | 44/200 [27:05<1:35:43, 36.82s/it]

For epoch 47: {Learning rate: [0.003]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.16batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.249056699799329, 'test_loss': 20.548646926879883, 'bleu': 0.23839000000000002, 'gen_len': 0.9}




 22%|██▎       | 45/200 [27:42<1:35:47, 37.08s/it]

For epoch 48: {Learning rate: [0.0030625]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.20batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.518638465462661, 'test_loss': 18.29765625, 'bleu': 0.0, 'gen_len': 0.8}




 23%|██▎       | 46/200 [28:20<1:35:16, 37.12s/it]

For epoch 49: {Learning rate: [0.003125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.366115436321351, 'test_loss': 18.208839225769044, 'bleu': 0.0, 'gen_len': 0.8}




 24%|██▎       | 47/200 [28:56<1:33:56, 36.84s/it]

For epoch 50: {Learning rate: [0.0031875000000000002]}


Train batch number 81: 100%|██████████| 82/82 [00:52<00:00,  1.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:58<00:00,  5.83s/batches]



Metrics: {'train_loss': 12.338353034926623, 'test_loss': 16.568228721618652, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 24%|██▍       | 48/200 [30:49<2:31:21, 59.74s/it]

For epoch 51: {Learning rate: [0.0032500000000000003]}


Train batch number 81: 100%|██████████| 82/82 [00:58<00:00,  1.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:58<00:00,  5.81s/batches]



Metrics: {'train_loss': 12.400510090153391, 'test_loss': 17.602467918395995, 'bleu': 0.32836, 'gen_len': 0.8}




 24%|██▍       | 49/200 [32:48<3:14:52, 77.44s/it]

For epoch 52: {Learning rate: [0.0033125]}


Train batch number 81: 100%|██████████| 82/82 [00:56<00:00,  1.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:53<00:00,  5.35s/batches]



Metrics: {'train_loss': 12.313397233079119, 'test_loss': 16.086814403533936, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 25%|██▌       | 50/200 [34:41<3:40:15, 88.11s/it]

For epoch 53: {Learning rate: [0.003375]}


Train batch number 81: 100%|██████████| 82/82 [00:53<00:00,  1.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:53<00:00,  5.37s/batches]



Metrics: {'train_loss': 12.311412694977552, 'test_loss': 19.777555084228517, 'bleu': 0.0, 'gen_len': 0.8}




 26%|██▌       | 51/200 [36:35<3:58:07, 95.89s/it]

For epoch 54: {Learning rate: [0.0034375]}


Train batch number 81: 100%|██████████| 82/82 [00:56<00:00,  1.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:54<00:00,  5.48s/batches]



Metrics: {'train_loss': 12.431403177540476, 'test_loss': 18.316141223907472, 'bleu': 0.29994, 'gen_len': 0.8}




 26%|██▌       | 52/200 [38:29<4:09:57, 101.33s/it]

For epoch 55: {Learning rate: [0.0035]}


Train batch number 81: 100%|██████████| 82/82 [00:55<00:00,  1.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:52<00:00,  5.29s/batches]



Metrics: {'train_loss': 12.378835637394975, 'test_loss': 20.055805206298828, 'bleu': 0.0, 'gen_len': 0.9}




 26%|██▋       | 53/200 [40:20<4:15:35, 104.32s/it]

For epoch 56: {Learning rate: [0.0035625]}


Train batch number 81: 100%|██████████| 82/82 [00:55<00:00,  1.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:51<00:00,  5.14s/batches]



Metrics: {'train_loss': 12.513912945258909, 'test_loss': 16.479571628570557, 'bleu': 0.0, 'gen_len': 0.8}




 27%|██▋       | 54/200 [42:09<4:17:30, 105.83s/it]

For epoch 57: {Learning rate: [0.003625]}


Train batch number 81: 100%|██████████| 82/82 [00:55<00:00,  1.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:52<00:00,  5.29s/batches]



Metrics: {'train_loss': 12.43775372970395, 'test_loss': 18.54354600906372, 'bleu': 0.0, 'gen_len': 0.8}




 28%|██▊       | 55/200 [44:01<4:19:33, 107.41s/it]

For epoch 58: {Learning rate: [0.0036875000000000002]}


Train batch number 81: 100%|██████████| 82/82 [00:56<00:00,  1.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:59<00:00,  5.95s/batches]



Metrics: {'train_loss': 12.339438072065027, 'test_loss': 17.02654685974121, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 28%|██▊       | 56/200 [45:59<4:25:59, 110.83s/it]

For epoch 59: {Learning rate: [0.00375]}


Train batch number 81: 100%|██████████| 82/82 [00:56<00:00,  1.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:54<00:00,  5.46s/batches]



Metrics: {'train_loss': 12.441719171477526, 'test_loss': 15.66793966293335, 'bleu': 0.46082, 'gen_len': 0.8}




 28%|██▊       | 57/200 [47:53<4:26:23, 111.77s/it]

For epoch 60: {Learning rate: [0.0038125]}


Train batch number 81: 100%|██████████| 82/82 [00:56<00:00,  1.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:53<00:00,  5.37s/batches]



Metrics: {'train_loss': 12.450922785735711, 'test_loss': 19.26196155548096, 'bleu': 0.0, 'gen_len': 0.8}




 29%|██▉       | 58/200 [49:46<4:25:13, 112.07s/it]

For epoch 61: {Learning rate: [0.003875]}


Train batch number 81: 100%|██████████| 82/82 [01:01<00:00,  1.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [01:02<00:00,  6.24s/batches]



Metrics: {'train_loss': 12.377104410311071, 'test_loss': 20.194690895080566, 'bleu': 0.0, 'gen_len': 0.8}




 30%|██▉       | 59/200 [51:53<4:34:06, 116.64s/it]

For epoch 62: {Learning rate: [0.0039375]}


Train batch number 81: 100%|██████████| 82/82 [01:22<00:00,  1.01s/batches]
Test batch number 9: 100%|██████████| 10/10 [01:09<00:00,  6.92s/batches]



Metrics: {'train_loss': 12.466320537939303, 'test_loss': 18.16190195083618, 'bleu': 0.0, 'gen_len': 0.8}




 30%|███       | 60/200 [54:29<4:59:45, 128.47s/it]

For epoch 63: {Learning rate: [0.004]}


Train batch number 81: 100%|██████████| 82/82 [00:59<00:00,  1.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:47<00:00,  4.72s/batches]



Metrics: {'train_loss': 12.572107408104873, 'test_loss': 17.291724300384523, 'bleu': 0.0, 'gen_len': 0.8}




 30%|███       | 61/200 [56:18<4:43:54, 122.55s/it]

For epoch 64: {Learning rate: [0.0040625]}


Train batch number 81: 100%|██████████| 82/82 [00:15<00:00,  5.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.51s/batches]



Metrics: {'train_loss': 12.640457327772932, 'test_loss': 16.591095638275146, 'bleu': 0.0, 'gen_len': 0.8}




 31%|███       | 62/200 [56:51<3:39:41, 95.52s/it] 

For epoch 65: {Learning rate: [0.004125]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.46305911133929, 'test_loss': 17.34724826812744, 'bleu': 0.0, 'gen_len': 0.8}




 32%|███▏      | 63/200 [57:25<2:56:25, 77.27s/it]

For epoch 66: {Learning rate: [0.0041875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.659225649949981, 'test_loss': 16.497353744506835, 'bleu': 0.46082, 'gen_len': 0.8}




 32%|███▏      | 64/200 [58:00<2:26:13, 64.51s/it]

For epoch 67: {Learning rate: [0.00425]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.58670597541623, 'test_loss': 16.870601749420167, 'bleu': 0.6237199999999999, 'gen_len': 0.8}




 32%|███▎      | 65/200 [58:36<2:06:03, 56.02s/it]

For epoch 68: {Learning rate: [0.0043125]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.612471935225695, 'test_loss': 16.494792556762697, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 33%|███▎      | 66/200 [59:13<1:52:21, 50.31s/it]

For epoch 69: {Learning rate: [0.004375]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.457941834519549, 'test_loss': 19.65790367126465, 'bleu': 0.32836, 'gen_len': 0.8}




 34%|███▎      | 67/200 [59:50<1:42:33, 46.27s/it]

For epoch 70: {Learning rate: [0.0044375000000000005]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.650409343766004, 'test_loss': 17.793671989440917, 'bleu': 0.0, 'gen_len': 0.8}




 34%|███▍      | 68/200 [1:00:30<1:37:26, 44.29s/it]

For epoch 71: {Learning rate: [0.0045000000000000005]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.55s/batches]



Metrics: {'train_loss': 12.67968947712968, 'test_loss': 16.81099967956543, 'bleu': 0.0, 'gen_len': 0.8}




 34%|███▍      | 69/200 [1:01:04<1:30:04, 41.25s/it]

For epoch 72: {Learning rate: [0.0045625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.530528237180013, 'test_loss': 18.279810905456543, 'bleu': 0.32836, 'gen_len': 0.8}




 35%|███▌      | 70/200 [1:01:39<1:25:15, 39.35s/it]

For epoch 73: {Learning rate: [0.004625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.652537933210047, 'test_loss': 16.714177513122557, 'bleu': 0.0, 'gen_len': 0.8}




 36%|███▌      | 71/200 [1:02:15<1:22:28, 38.36s/it]

For epoch 74: {Learning rate: [0.0046875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.637621280623645, 'test_loss': 17.869838809967042, 'bleu': 0.0, 'gen_len': 0.8}




 36%|███▌      | 72/200 [1:02:50<1:19:55, 37.46s/it]

For epoch 75: {Learning rate: [0.00475]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.74batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.6126048971967, 'test_loss': 18.436441898345947, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 36%|███▋      | 73/200 [1:03:26<1:17:54, 36.81s/it]

For epoch 76: {Learning rate: [0.0048125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.904958381885436, 'test_loss': 16.94376344680786, 'bleu': 0.0, 'gen_len': 0.8}




 37%|███▋      | 74/200 [1:04:01<1:16:31, 36.44s/it]

For epoch 77: {Learning rate: [0.004875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 13.438204811840523, 'test_loss': 16.66407356262207, 'bleu': 0.0, 'gen_len': 0.7}




 38%|███▊      | 75/200 [1:04:36<1:14:51, 35.93s/it]

For epoch 78: {Learning rate: [0.0049375]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.60s/batches]



Metrics: {'train_loss': 12.915557646169894, 'test_loss': 16.24409112930298, 'bleu': 0.6237199999999999, 'gen_len': 0.8}




 38%|███▊      | 76/200 [1:05:11<1:13:44, 35.68s/it]

For epoch 79: {Learning rate: [0.005]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 12.93851960577616, 'test_loss': 16.8388879776001, 'bleu': 0.29994, 'gen_len': 0.8}




 38%|███▊      | 77/200 [1:05:48<1:13:57, 36.08s/it]

For epoch 80: {Learning rate: [0.0050625]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.830159617633354, 'test_loss': 16.534305763244628, 'bleu': 0.0, 'gen_len': 0.7}




 39%|███▉      | 78/200 [1:06:24<1:13:22, 36.09s/it]

For epoch 81: {Learning rate: [0.005125]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.807587536369882, 'test_loss': 16.557968139648438, 'bleu': 0.32836, 'gen_len': 0.8}




 40%|███▉      | 79/200 [1:06:59<1:12:15, 35.83s/it]

For epoch 82: {Learning rate: [0.0051875]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.28batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.861869021159846, 'test_loss': 16.575935459136964, 'bleu': 0.0, 'gen_len': 0.7}




 40%|████      | 80/200 [1:07:36<1:11:57, 35.98s/it]

For epoch 83: {Learning rate: [0.00525]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.85979287217303, 'test_loss': 17.712771129608154, 'bleu': 0.0, 'gen_len': 0.8}




 40%|████      | 81/200 [1:08:11<1:10:58, 35.78s/it]

For epoch 84: {Learning rate: [0.0053125]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.44batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.85063012053327, 'test_loss': 15.743866062164306, 'bleu': 0.40583, 'gen_len': 0.8}




 41%|████      | 82/200 [1:08:47<1:10:19, 35.76s/it]

For epoch 85: {Learning rate: [0.0053750000000000004]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.822488447514976, 'test_loss': 18.64317512512207, 'bleu': 0.0, 'gen_len': 0.9}




 42%|████▏     | 83/200 [1:09:22<1:09:19, 35.55s/it]

For epoch 86: {Learning rate: [0.0054375000000000005]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.82554363622898, 'test_loss': 16.650219535827638, 'bleu': 0.0, 'gen_len': 0.8}




 42%|████▏     | 84/200 [1:09:56<1:08:07, 35.23s/it]

For epoch 87: {Learning rate: [0.0055]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.837488436117404, 'test_loss': 16.58106985092163, 'bleu': 0.0, 'gen_len': 0.8}




 42%|████▎     | 85/200 [1:10:34<1:09:01, 36.01s/it]

For epoch 88: {Learning rate: [0.0055625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.85627302309362, 'test_loss': 15.292312717437744, 'bleu': 0.0, 'gen_len': 0.7}




 43%|████▎     | 86/200 [1:11:08<1:07:26, 35.50s/it]

For epoch 89: {Learning rate: [0.005625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.55s/batches]



Metrics: {'train_loss': 12.904859682408775, 'test_loss': 14.860434627532959, 'bleu': 0.32836, 'gen_len': 0.8}




 44%|████▎     | 87/200 [1:11:43<1:06:26, 35.28s/it]

For epoch 90: {Learning rate: [0.0056875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.902568165848894, 'test_loss': 16.350125122070313, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 44%|████▍     | 88/200 [1:12:18<1:05:25, 35.05s/it]

For epoch 91: {Learning rate: [0.00575]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.966647241173721, 'test_loss': 13.945644092559814, 'bleu': 0.0, 'gen_len': 0.7}




 44%|████▍     | 89/200 [1:12:53<1:04:46, 35.01s/it]

For epoch 92: {Learning rate: [0.0058125]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.928322867649358, 'test_loss': 18.85077142715454, 'bleu': 0.23839000000000002, 'gen_len': 0.9}




 45%|████▌     | 90/200 [1:13:27<1:04:06, 34.97s/it]

For epoch 93: {Learning rate: [0.005875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.880683294156702, 'test_loss': 18.04403257369995, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 46%|████▌     | 91/200 [1:14:03<1:03:54, 35.18s/it]

For epoch 94: {Learning rate: [0.0059375]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.98835067051213, 'test_loss': 16.123044872283934, 'bleu': 0.0, 'gen_len': 0.8}




 46%|████▌     | 92/200 [1:14:38<1:03:20, 35.19s/it]

For epoch 95: {Learning rate: [0.006]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.60s/batches]



Metrics: {'train_loss': 12.915140739301355, 'test_loss': 18.76964635848999, 'bleu': 0.23839000000000002, 'gen_len': 0.9}




 46%|████▋     | 93/200 [1:15:13<1:02:39, 35.13s/it]

For epoch 96: {Learning rate: [0.0060625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.85s/batches]



Metrics: {'train_loss': 12.832684639023572, 'test_loss': 14.966939353942871, 'bleu': 0.0, 'gen_len': 0.7}




 47%|████▋     | 94/200 [1:15:51<1:03:25, 35.90s/it]

For epoch 97: {Learning rate: [0.006125]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.991557987724862, 'test_loss': 15.784984588623047, 'bleu': 0.0, 'gen_len': 0.8}




 48%|████▊     | 95/200 [1:16:26<1:02:23, 35.65s/it]

For epoch 98: {Learning rate: [0.0061875]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.936553315418523, 'test_loss': 17.548742294311523, 'bleu': 0.0, 'gen_len': 0.8}




 48%|████▊     | 96/200 [1:17:01<1:01:28, 35.47s/it]

For epoch 99: {Learning rate: [0.00625]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.971190150191145, 'test_loss': 15.495287132263183, 'bleu': 0.32836, 'gen_len': 0.8}




 48%|████▊     | 97/200 [1:17:37<1:00:50, 35.44s/it]

For epoch 100: {Learning rate: [0.006218982438812432]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.73batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.985943788435401, 'test_loss': 16.236849308013916, 'bleu': 0.40583, 'gen_len': 0.8}




 49%|████▉     | 98/200 [1:18:12<1:00:05, 35.35s/it]

For epoch 101: {Learning rate: [0.006188422143604214]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 13.104951602656667, 'test_loss': 17.253271293640136, 'bleu': 0.0, 'gen_len': 0.8}




 50%|████▉     | 99/200 [1:18:48<59:51, 35.56s/it]  

For epoch 102: {Learning rate: [0.0061583079885268325]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 13.143378414758821, 'test_loss': 16.683351039886475, 'bleu': 0.0, 'gen_len': 0.8}




 50%|█████     | 100/200 [1:19:23<58:59, 35.40s/it]

For epoch 103: {Learning rate: [0.006128629223068251]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.74batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.95172708208968, 'test_loss': 20.325542831420897, 'bleu': 0.40583, 'gen_len': 0.9}




 50%|█████     | 101/200 [1:19:58<58:16, 35.31s/it]

For epoch 104: {Learning rate: [0.006099375455928332]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 13.25640181215798, 'test_loss': 15.091315650939942, 'bleu': 0.32836, 'gen_len': 0.7}




 51%|█████     | 102/200 [1:20:33<57:47, 35.38s/it]

For epoch 105: {Learning rate: [0.006070536639732901]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 13.011621504295164, 'test_loss': 15.099224662780761, 'bleu': 0.0, 'gen_len': 0.8}




 52%|█████▏    | 103/200 [1:21:09<57:17, 35.44s/it]

For epoch 106: {Learning rate: [0.006042103056535397]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 13.155700916197242, 'test_loss': 16.89585952758789, 'bleu': 0.48328, 'gen_len': 0.8}




 52%|█████▏    | 104/200 [1:21:44<56:35, 35.37s/it]

For epoch 107: {Learning rate: [0.006014065304058602]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 13.04373819072072, 'test_loss': 15.642766094207763, 'bleu': 0.0, 'gen_len': 0.8}




 52%|█████▎    | 105/200 [1:22:20<56:04, 35.42s/it]

For epoch 108: {Learning rate: [0.005986414282632196]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.895498723518557, 'test_loss': 15.387954711914062, 'bleu': 0.0, 'gen_len': 0.8}




 53%|█████▎    | 106/200 [1:22:55<55:22, 35.35s/it]

For epoch 109: {Learning rate: [0.005959141182784952]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.75s/batches]



Metrics: {'train_loss': 12.977800497194616, 'test_loss': 15.983740043640136, 'bleu': 0.0, 'gen_len': 0.8}




 54%|█████▎    | 107/200 [1:23:32<55:47, 36.00s/it]

For epoch 110: {Learning rate: [0.005932237473453119]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 13.040412850496246, 'test_loss': 15.506861591339112, 'bleu': 0.0, 'gen_len': 0.7}




 54%|█████▍    | 108/200 [1:24:08<54:56, 35.83s/it]

For epoch 111: {Learning rate: [0.005905694890769176]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.976079673301882, 'test_loss': 15.798943138122558, 'bleu': 0.0, 'gen_len': 0.8}




 55%|█████▍    | 109/200 [1:24:44<54:18, 35.80s/it]

For epoch 112: {Learning rate: [0.005879505427397483]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.60s/batches]



Metrics: {'train_loss': 13.047729846907824, 'test_loss': 16.29779796600342, 'bleu': 0.0, 'gen_len': 0.8}




 55%|█████▌    | 110/200 [1:25:19<53:29, 35.66s/it]

For epoch 113: {Learning rate: [0.005853661322385587]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.958714711956862, 'test_loss': 16.14857635498047, 'bleu': 0.0, 'gen_len': 0.8}




 56%|█████▌    | 111/200 [1:25:56<53:19, 35.94s/it]

For epoch 114: {Learning rate: [0.005828155051501961]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 13.046672809414748, 'test_loss': 16.71113004684448, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 56%|█████▌    | 112/200 [1:26:31<52:29, 35.79s/it]

For epoch 115: {Learning rate: [0.005802979318032871]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.842651181104706, 'test_loss': 14.513495922088623, 'bleu': 0.43215000000000003, 'gen_len': 0.7}




 56%|█████▋    | 113/200 [1:27:07<52:06, 35.94s/it]

For epoch 116: {Learning rate: [0.005778127044012803]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.953490199112311, 'test_loss': 15.756228828430176, 'bleu': 0.0, 'gen_len': 0.8}




 57%|█████▋    | 114/200 [1:27:43<51:19, 35.81s/it]

For epoch 117: {Learning rate: [0.005753591361864521]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.875303303323141, 'test_loss': 14.358842182159425, 'bleu': 0.32836, 'gen_len': 0.7}




 57%|█████▊    | 115/200 [1:28:18<50:37, 35.74s/it]

For epoch 118: {Learning rate: [0.005729365606426321]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.930230844311598, 'test_loss': 15.864393424987792, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 58%|█████▊    | 116/200 [1:28:55<50:22, 35.99s/it]

For epoch 119: {Learning rate: [0.00570544330734548]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.890129531302103, 'test_loss': 16.552230834960938, 'bleu': 0.0, 'gen_len': 0.8}




 58%|█████▊    | 117/200 [1:29:30<49:26, 35.74s/it]

For epoch 120: {Learning rate: [0.005681818181818182]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.947211352790275, 'test_loss': 17.9154109954834, 'bleu': 0.0, 'gen_len': 0.8}




 59%|█████▉    | 118/200 [1:30:06<48:55, 35.80s/it]

For epoch 121: {Learning rate: [0.005658484127657408]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 13.010080291003716, 'test_loss': 14.58332986831665, 'bleu': 0.0, 'gen_len': 0.7}




 60%|█████▉    | 119/200 [1:30:41<48:06, 35.63s/it]

For epoch 122: {Learning rate: [0.005635435216671452]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.996974462416114, 'test_loss': 14.577226734161377, 'bleu': 0.0, 'gen_len': 0.7}




 60%|██████    | 120/200 [1:31:17<47:26, 35.58s/it]

For epoch 123: {Learning rate: [0.005612665688336716]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.848591502119856, 'test_loss': 14.734476375579835, 'bleu': 0.0, 'gen_len': 0.7}




 60%|██████    | 121/200 [1:31:52<46:51, 35.59s/it]

For epoch 124: {Learning rate: [0.005590169943749474]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 13.132556700124972, 'test_loss': 15.13656826019287, 'bleu': 0.0, 'gen_len': 0.7}




 61%|██████    | 122/200 [1:32:28<46:20, 35.64s/it]

For epoch 125: {Learning rate: [0.005567942539842175]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.900277201722307, 'test_loss': 15.369274139404297, 'bleu': 0.0, 'gen_len': 0.8}




 62%|██████▏   | 123/200 [1:33:03<45:38, 35.57s/it]

For epoch 126: {Learning rate: [0.005545978183850711]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 13.044495629101265, 'test_loss': 16.317916297912596, 'bleu': 0.0, 'gen_len': 0.8}




 62%|██████▏   | 124/200 [1:33:40<45:31, 35.95s/it]

For epoch 127: {Learning rate: [0.005524271728019903]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.92150173536161, 'test_loss': 16.517501735687254, 'bleu': 0.0, 'gen_len': 0.8}




 62%|██████▎   | 125/200 [1:34:16<44:44, 35.80s/it]

For epoch 128: {Learning rate: [0.005502818164535149]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.879887336637916, 'test_loss': 14.76160717010498, 'bleu': 0.46082, 'gen_len': 0.7}




 63%|██████▎   | 126/200 [1:34:51<44:06, 35.76s/it]

For epoch 129: {Learning rate: [0.005481612620668932]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.981116213449617, 'test_loss': 14.200682353973388, 'bleu': 0.0, 'gen_len': 0.7}




 64%|██████▎   | 127/200 [1:35:27<43:35, 35.82s/it]

For epoch 130: {Learning rate: [0.005460650354131487]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.874354507864975, 'test_loss': 14.554147529602051, 'bleu': 0.0, 'gen_len': 0.7}




 64%|██████▍   | 128/200 [1:36:04<43:18, 36.09s/it]

For epoch 131: {Learning rate: [0.005439926748615558]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.813466479138631, 'test_loss': 15.410532379150391, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 64%|██████▍   | 129/200 [1:36:40<42:46, 36.15s/it]

For epoch 132: {Learning rate: [0.00541943730952575]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.868919169030539, 'test_loss': 14.27790994644165, 'bleu': 0.0, 'gen_len': 0.7}




 65%|██████▌   | 130/200 [1:37:17<42:12, 36.18s/it]

For epoch 133: {Learning rate: [0.005399177659883501]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 12.906121253967285, 'test_loss': 14.464609813690185, 'bleu': 0.0, 'gen_len': 0.7}




 66%|██████▌   | 131/200 [1:37:53<41:40, 36.24s/it]

For epoch 134: {Learning rate: [0.00537914353639919]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.820798949497503, 'test_loss': 14.495271492004395, 'bleu': 0.0, 'gen_len': 0.7}




 66%|██████▌   | 132/200 [1:38:29<41:00, 36.19s/it]

For epoch 135: {Learning rate: [0.0053593307857034015]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.30batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.864380981863999, 'test_loss': 16.812628841400148, 'bleu': 0.3629, 'gen_len': 0.8}




 66%|██████▋   | 133/200 [1:39:05<40:28, 36.25s/it]

For epoch 136: {Learning rate: [0.005339735360729756]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.849325691781393, 'test_loss': 16.34139528274536, 'bleu': 0.0, 'gen_len': 0.8}




 67%|██████▋   | 134/200 [1:39:41<39:35, 35.99s/it]

For epoch 137: {Learning rate: [0.005320353317242179]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.879421414398566, 'test_loss': 14.343085765838623, 'bleu': 0.0, 'gen_len': 0.7}




 68%|██████▊   | 135/200 [1:40:17<39:10, 36.16s/it]

For epoch 138: {Learning rate: [0.005301180810499818]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.933278676940173, 'test_loss': 14.75425329208374, 'bleu': 0.48328, 'gen_len': 0.7}




 68%|██████▊   | 136/200 [1:40:53<38:24, 36.02s/it]

For epoch 139: {Learning rate: [0.005282214092053228]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.901044502490905, 'test_loss': 15.038621997833252, 'bleu': 0.0, 'gen_len': 0.7}




 68%|██████▊   | 137/200 [1:41:29<37:55, 36.12s/it]

For epoch 140: {Learning rate: [0.005263449506665743]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.89140701875454, 'test_loss': 14.940618801116944, 'bleu': 0.0, 'gen_len': 0.7}




 69%|██████▉   | 138/200 [1:42:05<37:13, 36.02s/it]

For epoch 141: {Learning rate: [0.005244883489354307]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.43batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.874279871219542, 'test_loss': 13.776911544799805, 'bleu': 0.32836, 'gen_len': 0.7}




 70%|██████▉   | 139/200 [1:42:41<36:34, 35.98s/it]

For epoch 142: {Learning rate: [0.0052265125625443175]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.71s/batches]



Metrics: {'train_loss': 12.935641940047102, 'test_loss': 15.098335456848144, 'bleu': 0.32836, 'gen_len': 0.8}




 70%|███████   | 140/200 [1:43:18<36:18, 36.32s/it]

For epoch 143: {Learning rate: [0.005208333333333333]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 12.82892258574323, 'test_loss': 15.098147201538087, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 70%|███████   | 141/200 [1:43:56<36:12, 36.82s/it]

For epoch 144: {Learning rate: [0.005190342490858748]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.900680012819244, 'test_loss': 14.67774772644043, 'bleu': 0.32836, 'gen_len': 0.7}




 71%|███████   | 142/200 [1:44:32<35:23, 36.61s/it]

For epoch 145: {Learning rate: [0.0051725368037648]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.85171843156582, 'test_loss': 14.541127967834473, 'bleu': 0.40583, 'gen_len': 0.7}




 72%|███████▏  | 143/200 [1:45:09<34:39, 36.49s/it]

For epoch 146: {Learning rate: [0.005154913117764516]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 12.804075305054827, 'test_loss': 13.811814022064208, 'bleu': 0.40583, 'gen_len': 0.7}




 72%|███████▏  | 144/200 [1:45:46<34:27, 36.93s/it]

For epoch 147: {Learning rate: [0.005137468353292415]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.989227748498685, 'test_loss': 14.765222072601318, 'bleu': 0.32836, 'gen_len': 0.7}




 72%|███████▎  | 145/200 [1:46:23<33:45, 36.83s/it]

For epoch 148: {Learning rate: [0.005120199503244003]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.896381924792033, 'test_loss': 14.773280334472656, 'bleu': 0.0, 'gen_len': 0.7}




 73%|███████▎  | 146/200 [1:47:00<33:02, 36.71s/it]

For epoch 149: {Learning rate: [0.005103103630798288]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.822922410034552, 'test_loss': 14.87212314605713, 'bleu': 0.0, 'gen_len': 0.7}




 74%|███████▎  | 147/200 [1:47:37<32:36, 36.91s/it]

For epoch 150: {Learning rate: [0.005086177867319746]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.60s/batches]



Metrics: {'train_loss': 12.80018184243179, 'test_loss': 14.577331066131592, 'bleu': 0.0, 'gen_len': 0.7}




 74%|███████▍  | 148/200 [1:48:13<31:40, 36.55s/it]

For epoch 151: {Learning rate: [0.0050694194103363295]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 12.84397493920675, 'test_loss': 14.400599670410156, 'bleu': 0.0, 'gen_len': 0.7}




 74%|███████▍  | 149/200 [1:48:50<31:19, 36.86s/it]

For epoch 152: {Learning rate: [0.0050528255215902705]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.901801318657107, 'test_loss': 14.274520301818848, 'bleu': 0.0, 'gen_len': 0.7}




 75%|███████▌  | 150/200 [1:49:26<30:28, 36.57s/it]

For epoch 153: {Learning rate: [0.005036393525158627]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.71s/batches]



Metrics: {'train_loss': 12.766573557039587, 'test_loss': 14.590434265136718, 'bleu': 0.51061, 'gen_len': 0.7}




 76%|███████▌  | 151/200 [1:50:03<29:51, 36.56s/it]

For epoch 154: {Learning rate: [0.005020120805640618]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.83215811775952, 'test_loss': 15.199682140350342, 'bleu': 0.0, 'gen_len': 0.8}




 76%|███████▌  | 152/200 [1:50:39<29:10, 36.48s/it]

For epoch 155: {Learning rate: [0.005004004806408973]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.868070701273476, 'test_loss': 13.793290138244629, 'bleu': 0.32836, 'gen_len': 0.7}




 76%|███████▋  | 153/200 [1:51:15<28:28, 36.35s/it]

For epoch 156: {Learning rate: [0.004988043027922638]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.57s/batches]



Metrics: {'train_loss': 12.972172306805122, 'test_loss': 15.062430572509765, 'bleu': 0.0, 'gen_len': 0.8}




 77%|███████▋  | 154/200 [1:51:50<27:38, 36.05s/it]

For epoch 157: {Learning rate: [0.004972233026098313]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.840145029672762, 'test_loss': 14.935468578338623, 'bleu': 0.0, 'gen_len': 0.7}




 78%|███████▊  | 155/200 [1:52:27<27:15, 36.35s/it]

For epoch 158: {Learning rate: [0.004956572410738401]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.67batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.08s/batches]



Metrics: {'train_loss': 12.760240310575904, 'test_loss': 14.04172487258911, 'bleu': 0.32836, 'gen_len': 0.7}




 78%|███████▊  | 156/200 [1:53:08<27:33, 37.57s/it]

For epoch 159: {Learning rate: [0.004941058844013093]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.84batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.75s/batches]



Metrics: {'train_loss': 12.846170349818903, 'test_loss': 15.617830562591553, 'bleu': 0.0, 'gen_len': 0.8}




 78%|███████▊  | 157/200 [1:53:48<27:35, 38.50s/it]

For epoch 160: {Learning rate: [0.004925690038994379]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.809740932976327, 'test_loss': 13.956801605224609, 'bleu': 0.32836, 'gen_len': 0.7}




 79%|███████▉  | 158/200 [1:54:26<26:42, 38.15s/it]

For epoch 161: {Learning rate: [0.004910463758239913]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.824684712944961, 'test_loss': 15.662259960174561, 'bleu': 0.0, 'gen_len': 0.7}




 80%|███████▉  | 159/200 [1:55:02<25:45, 37.68s/it]

For epoch 162: {Learning rate: [0.004895377812424734]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.844290494918823, 'test_loss': 14.409135818481445, 'bleu': 0.0, 'gen_len': 0.7}




 80%|████████  | 160/200 [1:55:40<25:06, 37.66s/it]

For epoch 163: {Learning rate: [0.00488043005901894]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.898980460515837, 'test_loss': 14.63941011428833, 'bleu': 0.0, 'gen_len': 0.7}




 80%|████████  | 161/200 [1:56:16<24:11, 37.21s/it]

For epoch 164: {Learning rate: [0.0048656184010095185]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.902730273037422, 'test_loss': 14.524483013153077, 'bleu': 0.5489200000000001, 'gen_len': 0.7}




 81%|████████  | 162/200 [1:56:52<23:22, 36.90s/it]

For epoch 165: {Learning rate: [0.00485094078566458]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 12.780966595905584, 'test_loss': 14.425917053222657, 'bleu': 0.5489200000000001, 'gen_len': 0.7}




 82%|████████▏ | 163/200 [1:57:30<22:55, 37.19s/it]

For epoch 166: {Learning rate: [0.004836395203338355]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.42batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.90s/batches]



Metrics: {'train_loss': 12.840291465201028, 'test_loss': 15.039176940917969, 'bleu': 0.0, 'gen_len': 0.7}




 82%|████████▏ | 164/200 [1:58:09<22:38, 37.72s/it]

For epoch 167: {Learning rate: [0.004821979686315372]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.979771253539294, 'test_loss': 14.788806247711182, 'bleu': 0.0, 'gen_len': 0.7}




 82%|████████▎ | 165/200 [1:58:47<21:58, 37.66s/it]

For epoch 168: {Learning rate: [0.004807692307692308]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.825675964355469, 'test_loss': 14.195613670349122, 'bleu': 0.32836, 'gen_len': 0.7}




 83%|████████▎ | 166/200 [1:59:25<21:23, 37.75s/it]

For epoch 169: {Learning rate: [0.004793531180296066]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.776410341262817, 'test_loss': 14.4087064743042, 'bleu': 0.32836, 'gen_len': 0.7}




 84%|████████▎ | 167/200 [2:00:02<20:43, 37.68s/it]

For epoch 170: {Learning rate: [0.004779494455636703]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 12.814449257966949, 'test_loss': 15.83489923477173, 'bleu': 0.39049, 'gen_len': 0.8}




 84%|████████▍ | 168/200 [2:00:39<19:54, 37.33s/it]

For epoch 171: {Learning rate: [0.004765580322893896]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.71s/batches]



Metrics: {'train_loss': 12.764267688844262, 'test_loss': 14.476733875274657, 'bleu': 0.0, 'gen_len': 0.7}




 84%|████████▍ | 169/200 [2:01:15<19:11, 37.14s/it]

For epoch 172: {Learning rate: [0.0047517870079356594]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.78864015602484, 'test_loss': 14.253598403930663, 'bleu': 0.0, 'gen_len': 0.7}




 85%|████████▌ | 170/200 [2:01:52<18:28, 36.96s/it]

For epoch 173: {Learning rate: [0.004738112772368146]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.11batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.851154013377863, 'test_loss': 14.590885734558105, 'bleu': 0.0, 'gen_len': 0.7}




 86%|████████▌ | 171/200 [2:02:30<18:01, 37.29s/it]

For epoch 174: {Learning rate: [0.00472455591261534]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.814008265006834, 'test_loss': 15.23328447341919, 'bleu': 0.0, 'gen_len': 0.8}




 86%|████████▌ | 172/200 [2:03:08<17:30, 37.53s/it]

For epoch 175: {Learning rate: [0.004711114759027557]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.740174485415947, 'test_loss': 14.129410934448241, 'bleu': 0.46082, 'gen_len': 0.7}




 86%|████████▋ | 173/200 [2:03:46<16:53, 37.54s/it]

For epoch 176: {Learning rate: [0.00469778767501768]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.87984284540502, 'test_loss': 16.162276172637938, 'bleu': 0.0, 'gen_len': 0.8}




 87%|████████▋ | 174/200 [2:04:22<16:10, 37.34s/it]

For epoch 177: {Learning rate: [0.004684573056224134]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.60s/batches]



Metrics: {'train_loss': 12.867844401336297, 'test_loss': 14.791545867919922, 'bleu': 0.0, 'gen_len': 0.7}




 88%|████████▊ | 175/200 [2:04:59<15:26, 37.05s/it]

For epoch 178: {Learning rate: [0.004671469329699599]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.40s/batches]



Metrics: {'train_loss': 12.775437395747115, 'test_loss': 13.987255096435547, 'bleu': 0.0, 'gen_len': 0.7}




 88%|████████▊ | 176/200 [2:05:45<15:51, 39.65s/it]

For epoch 179: {Learning rate: [0.004658474953124562]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.36s/batches]



Metrics: {'train_loss': 12.811296375786386, 'test_loss': 14.418723011016846, 'bleu': 0.0, 'gen_len': 0.7}




 88%|████████▊ | 177/200 [2:06:33<16:15, 42.42s/it]

For epoch 180: {Learning rate: [0.00464558841404479]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.86s/batches]



Metrics: {'train_loss': 12.808338583969489, 'test_loss': 16.741734981536865, 'bleu': 0.5489200000000001, 'gen_len': 0.8}




 89%|████████▉ | 178/200 [2:07:15<15:30, 42.30s/it]

For epoch 181: {Learning rate: [0.004632808229131882]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.87batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.726102154429366, 'test_loss': 14.432224559783936, 'bleu': 0.0, 'gen_len': 0.7}




 90%|████████▉ | 179/200 [2:07:56<14:39, 41.86s/it]

For epoch 182: {Learning rate: [0.00462013294346608]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.41s/batches]



Metrics: {'train_loss': 12.759043077143227, 'test_loss': 14.047781562805175, 'bleu': 0.0, 'gen_len': 0.7}




 90%|█████████ | 180/200 [2:08:45<14:38, 43.95s/it]

For epoch 183: {Learning rate: [0.0046075611298405355]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.91s/batches]



Metrics: {'train_loss': 12.732412977916438, 'test_loss': 14.700665473937988, 'bleu': 0.0, 'gen_len': 0.7}




 90%|█████████ | 181/200 [2:09:30<13:58, 44.13s/it]

For epoch 184: {Learning rate: [0.004595091388086298]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.00s/batches]



Metrics: {'train_loss': 12.818203396913482, 'test_loss': 14.844503021240234, 'bleu': 0.0, 'gen_len': 0.7}




 91%|█████████ | 182/200 [2:10:13<13:07, 43.76s/it]

For epoch 185: {Learning rate: [0.004582722344417291]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.96s/batches]



Metrics: {'train_loss': 12.701036807967395, 'test_loss': 15.093740940093994, 'bleu': 0.74173, 'gen_len': 0.8}




 92%|█████████▏| 183/200 [2:10:53<12:05, 42.67s/it]

For epoch 186: {Learning rate: [0.004570452650794567]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.91batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.745048848594108, 'test_loss': 16.038245964050294, 'bleu': 0.47517, 'gen_len': 0.8}




 92%|█████████▏| 184/200 [2:11:34<11:14, 42.13s/it]

For epoch 187: {Learning rate: [0.004558280984309205]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.02s/batches]



Metrics: {'train_loss': 12.80456160336006, 'test_loss': 15.086432361602784, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 92%|█████████▎| 185/200 [2:12:18<10:41, 42.79s/it]

For epoch 188: {Learning rate: [0.0045462060465831745]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.92s/batches]



Metrics: {'train_loss': 12.822854507260207, 'test_loss': 15.586955356597901, 'bleu': 0.40583, 'gen_len': 0.8}




 93%|█████████▎| 186/200 [2:12:57<09:42, 41.58s/it]

For epoch 189: {Learning rate: [0.004534226563187573]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.16s/batches]



Metrics: {'train_loss': 12.773990131006009, 'test_loss': 14.749978256225585, 'bleu': 0.0, 'gen_len': 0.7}




 94%|█████████▎| 187/200 [2:13:42<09:16, 42.81s/it]

For epoch 190: {Learning rate: [0.004522341283077635]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.21s/batches]



Metrics: {'train_loss': 12.758430562368254, 'test_loss': 15.544607639312744, 'bleu': 0.0, 'gen_len': 0.8}




 94%|█████████▍| 188/200 [2:14:30<08:52, 44.37s/it]

For epoch 191: {Learning rate: [0.004510548978043951]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.73batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.47s/batches]



Metrics: {'train_loss': 12.715417385101318, 'test_loss': 14.181504821777343, 'bleu': 0.0, 'gen_len': 0.7}




 94%|█████████▍| 189/200 [2:15:19<08:22, 45.64s/it]

For epoch 192: {Learning rate: [0.004498848442179341]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.05s/batches]



Metrics: {'train_loss': 12.838749734366813, 'test_loss': 15.17668514251709, 'bleu': 0.48328, 'gen_len': 0.7}




 95%|█████████▌| 190/200 [2:16:00<07:21, 44.18s/it]

For epoch 193: {Learning rate: [0.004487238491360864]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.99batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.77457569285137, 'test_loss': 15.060381507873535, 'bleu': 0.32836, 'gen_len': 0.7}




 96%|█████████▌| 191/200 [2:16:39<06:24, 42.73s/it]

For epoch 194: {Learning rate: [0.004475717962746455]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.21batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.94s/batches]



Metrics: {'train_loss': 12.787938379659884, 'test_loss': 15.096821784973145, 'bleu': 0.48328, 'gen_len': 0.8}




 96%|█████████▌| 192/200 [2:17:20<05:37, 42.13s/it]

For epoch 195: {Learning rate: [0.004464285714285714]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.740544394748968, 'test_loss': 14.479191780090332, 'bleu': 0.32836, 'gen_len': 0.7}




 96%|█████████▋| 193/200 [2:17:59<04:47, 41.11s/it]

For epoch 196: {Learning rate: [0.004452940624244353]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.21batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.839780068979032, 'test_loss': 14.256989479064941, 'bleu': 0.0, 'gen_len': 0.7}




 97%|█████████▋| 194/200 [2:18:36<04:00, 40.10s/it]

For epoch 197: {Learning rate: [0.004441681590741884]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.97batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.84148916384069, 'test_loss': 14.40122365951538, 'bleu': 0.0, 'gen_len': 0.7}




 98%|█████████▊| 195/200 [2:19:16<03:19, 39.95s/it]

For epoch 198: {Learning rate: [0.0044305075313021]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.13batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 12.769711058314254, 'test_loss': 14.544385147094726, 'bleu': 0.32836, 'gen_len': 0.7}




 98%|█████████▊| 196/200 [2:19:56<02:39, 39.90s/it]

For epoch 199: {Learning rate: [0.004419417382415922]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.765274518873634, 'test_loss': 15.035828399658204, 'bleu': 0.32836, 'gen_len': 0.7}




 98%|█████████▊| 197/200 [2:20:34<01:58, 39.51s/it]

For epoch 200: {Learning rate: [0.004408410099116239]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.776805953281682, 'test_loss': 15.385392761230468, 'bleu': 0.0, 'gen_len': 0.7}




 99%|█████████▉| 198/200 [2:21:12<01:18, 39.12s/it]

For epoch 201: {Learning rate: [0.004397484654564324]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.11batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.98s/batches]



Metrics: {'train_loss': 12.731154122003694, 'test_loss': 14.166371059417724, 'bleu': 0.0, 'gen_len': 0.7}




100%|█████████▉| 199/200 [2:21:54<00:39, 39.83s/it]

For epoch 202: {Learning rate: [0.004386640039647478]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.87s/batches]



Metrics: {'train_loss': 12.90196399572419, 'test_loss': 15.332494163513184, 'bleu': 0.0, 'gen_len': 0.7}




100%|██████████| 200/200 [2:22:33<00:00, 42.77s/it]


In [13]:
# We will from checkpoints so let us the model
trainer.load(config['model_dir'])

In [14]:
# Train the model 
trainer.train(200, auto_save = True, log_step = 1, saving_directory=config['model_dir'], 
              metric_for_best_model='bleu',
              metric_objective='maximize')

  0%|          | 0/200 [00:00<?, ?it/s]

For epoch 202: {Learning rate: [0.00437587526258753]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.49batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.08s/batches]



Metrics: {'train_loss': 12.858221740257449, 'test_loss': 14.127605628967284, 'bleu': 0.0, 'gen_len': 0.7}




  0%|          | 1/200 [00:48<2:41:12, 48.60s/it]

For epoch 203: {Learning rate: [0.0043651893485598635]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.23s/batches]



Metrics: {'train_loss': 12.771917581558228, 'test_loss': 14.267854690551758, 'bleu': 0.32836, 'gen_len': 0.7}




  1%|          | 2/200 [01:36<2:39:30, 48.34s/it]

For epoch 204: {Learning rate: [0.0043545813393226105]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.88batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.05s/batches]



Metrics: {'train_loss': 12.785439223777956, 'test_loss': 14.757156944274902, 'bleu': 0.0, 'gen_len': 0.7}




  2%|▏         | 3/200 [02:20<2:31:56, 46.28s/it]

For epoch 205: {Learning rate: [0.004344050292855724]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.93batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.06s/batches]



Metrics: {'train_loss': 12.767860755687806, 'test_loss': 14.081186103820801, 'bleu': 0.0, 'gen_len': 0.7}




  2%|▏         | 4/200 [03:04<2:28:02, 45.32s/it]

For epoch 206: {Learning rate: [0.004333595283009603]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.93batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.05s/batches]



Metrics: {'train_loss': 12.679996304395722, 'test_loss': 13.98392105102539, 'bleu': 0.32836, 'gen_len': 0.7}




  2%|▎         | 5/200 [03:48<2:25:27, 44.75s/it]

For epoch 207: {Learning rate: [0.004323215399162967]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.92batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.09s/batches]



Metrics: {'train_loss': 12.693587791628953, 'test_loss': 14.351985454559326, 'bleu': 0.32836, 'gen_len': 0.7}




  3%|▎         | 6/200 [04:32<2:24:17, 44.63s/it]

For epoch 208: {Learning rate: [0.004312909745889714]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.83batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:31<00:00,  3.13s/batches]



Metrics: {'train_loss': 12.658336296314147, 'test_loss': 14.061067199707031, 'bleu': 0.0, 'gen_len': 0.7}




  4%|▎         | 7/200 [05:28<2:35:19, 48.29s/it]

For epoch 209: {Learning rate: [0.004302677442634464]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.38s/batches]



Metrics: {'train_loss': 12.749797594256517, 'test_loss': 14.75880479812622, 'bleu': 0.0, 'gen_len': 0.7}




  4%|▍         | 8/200 [06:18<2:35:58, 48.74s/it]

For epoch 210: {Learning rate: [0.004292517623396532]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.27s/batches]



Metrics: {'train_loss': 12.675761467073022, 'test_loss': 14.614576244354248, 'bleu': 0.0, 'gen_len': 0.7}




  4%|▍         | 9/200 [07:04<2:33:14, 48.14s/it]

For epoch 211: {Learning rate: [0.004282429436422073]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.75batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:26<00:00,  2.67s/batches]



Metrics: {'train_loss': 12.82095000802017, 'test_loss': 13.864013814926148, 'bleu': 0.0, 'gen_len': 0.6}




  5%|▌         | 10/200 [07:58<2:37:56, 49.88s/it]

For epoch 212: {Learning rate: [0.004272412043904146]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.30s/batches]



Metrics: {'train_loss': 12.773477519430765, 'test_loss': 14.134716606140136, 'bleu': 0.0, 'gen_len': 0.7}




  6%|▌         | 11/200 [08:49<2:38:00, 50.16s/it]

For epoch 213: {Learning rate: [0.004262464621690459]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.26s/batches]



Metrics: {'train_loss': 12.78917085252157, 'test_loss': 14.401257801055909, 'bleu': 0.0, 'gen_len': 0.7}




  6%|▌         | 12/200 [09:38<2:35:54, 49.76s/it]

For epoch 214: {Learning rate: [0.004252586358998573]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.97batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.91s/batches]



Metrics: {'train_loss': 12.771766540480822, 'test_loss': 14.307787895202637, 'bleu': 0.40583, 'gen_len': 0.7}




  6%|▋         | 13/200 [10:20<2:28:05, 47.52s/it]

For epoch 215: {Learning rate: [0.0042427764581383165]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.02s/batches]



Metrics: {'train_loss': 12.688798282204605, 'test_loss': 14.642671298980712, 'bleu': 0.0, 'gen_len': 0.7}




  7%|▋         | 14/200 [11:04<2:23:42, 46.36s/it]

For epoch 216: {Learning rate: [0.004233034134241228]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.01batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.01s/batches]



Metrics: {'train_loss': 12.8178444548351, 'test_loss': 13.88313913345337, 'bleu': 0.32836, 'gen_len': 0.7}




  8%|▊         | 15/200 [11:47<2:19:29, 45.24s/it]

For epoch 217: {Learning rate: [0.004223358614996787]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.84batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.02s/batches]



Metrics: {'train_loss': 12.664972229701716, 'test_loss': 14.744686508178711, 'bleu': 0.32836, 'gen_len': 0.7}




  8%|▊         | 16/200 [12:30<2:17:07, 44.72s/it]

For epoch 218: {Learning rate: [0.004213749140395263]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.14s/batches]



Metrics: {'train_loss': 12.714651706742078, 'test_loss': 14.369641494750976, 'bleu': 0.0, 'gen_len': 0.7}




  8%|▊         | 17/200 [13:15<2:16:47, 44.85s/it]

For epoch 219: {Learning rate: [0.004204204962476953]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.38s/batches]



Metrics: {'train_loss': 12.748192548751831, 'test_loss': 15.121641921997071, 'bleu': 0.0, 'gen_len': 0.7}




  9%|▉         | 18/200 [14:03<2:18:22, 45.62s/it]

For epoch 220: {Learning rate: [0.004194725345087652]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.32s/batches]



Metrics: {'train_loss': 12.71346137581802, 'test_loss': 14.069414615631104, 'bleu': 0.46082, 'gen_len': 0.7}




 10%|▉         | 19/200 [14:51<2:20:01, 46.42s/it]

For epoch 221: {Learning rate: [0.004185309563640156]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.27s/batches]



Metrics: {'train_loss': 12.766939163208008, 'test_loss': 14.294399166107178, 'bleu': 0.5489200000000001, 'gen_len': 0.7}




 10%|█         | 20/200 [15:40<2:22:03, 47.35s/it]

For epoch 222: {Learning rate: [0.004175956904881631]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.13s/batches]



Metrics: {'train_loss': 12.75622752236157, 'test_loss': 14.180955696105958, 'bleu': 0.32836, 'gen_len': 0.7}




 10%|█         | 21/200 [16:29<2:22:50, 47.88s/it]

For epoch 223: {Learning rate: [0.004166666666666667]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.80batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.03s/batches]



Metrics: {'train_loss': 12.691267368270129, 'test_loss': 14.6866024017334, 'bleu': 0.32836, 'gen_len': 0.7}




 11%|█         | 22/200 [17:15<2:19:46, 47.12s/it]

For epoch 224: {Learning rate: [0.004157438157735871]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.12s/batches]



Metrics: {'train_loss': 12.68343971415264, 'test_loss': 14.028268241882325, 'bleu': 0.39049, 'gen_len': 0.7}




 12%|█▏        | 23/200 [18:00<2:17:35, 46.64s/it]

For epoch 225: {Learning rate: [0.004148270697499825]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.79batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.10s/batches]



Metrics: {'train_loss': 12.622573614120483, 'test_loss': 13.7927237033844, 'bleu': 0.0, 'gen_len': 0.7}




 12%|█▏        | 24/200 [18:45<2:15:07, 46.06s/it]

For epoch 226: {Learning rate: [0.004139163615828262]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.07s/batches]



Metrics: {'train_loss': 12.702716815762404, 'test_loss': 13.81421513557434, 'bleu': 0.3629, 'gen_len': 0.7}




 12%|█▎        | 25/200 [19:30<2:12:57, 45.58s/it]

For epoch 227: {Learning rate: [0.004130116252844311]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.13s/batches]



Metrics: {'train_loss': 12.70619144090792, 'test_loss': 13.986900520324706, 'bleu': 0.48328, 'gen_len': 0.7}




 13%|█▎        | 26/200 [20:16<2:12:59, 45.86s/it]

For epoch 228: {Learning rate: [0.004121127958723669]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.11s/batches]



Metrics: {'train_loss': 12.7000108114103, 'test_loss': 13.818062686920166, 'bleu': 0.0, 'gen_len': 0.7}




 14%|█▎        | 27/200 [21:01<2:11:51, 45.73s/it]

For epoch 229: {Learning rate: [0.004112198093498556]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.66batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.07s/batches]



Metrics: {'train_loss': 12.749597869268278, 'test_loss': 14.406417942047119, 'bleu': 0.39049, 'gen_len': 0.7}




 14%|█▍        | 28/200 [21:47<2:10:37, 45.57s/it]

For epoch 230: {Learning rate: [0.0041033260268663295]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.07s/batches]



Metrics: {'train_loss': 12.725112118372103, 'test_loss': 15.421640110015868, 'bleu': 0.0, 'gen_len': 0.7}




 14%|█▍        | 29/200 [22:32<2:09:43, 45.52s/it]

For epoch 231: {Learning rate: [0.004094511138002615]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.02s/batches]



Metrics: {'train_loss': 12.660287217396062, 'test_loss': 14.9580002784729, 'bleu': 0.32836, 'gen_len': 0.7}




 15%|█▌        | 30/200 [23:17<2:08:25, 45.32s/it]

For epoch 232: {Learning rate: [0.004085752815378834]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.76batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.06s/batches]



Metrics: {'train_loss': 12.70042225209678, 'test_loss': 15.439726257324219, 'bleu': 0.32836, 'gen_len': 0.7}




 16%|█▌        | 31/200 [24:02<2:07:18, 45.20s/it]

For epoch 233: {Learning rate: [0.004077050456584014]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.20s/batches]



Metrics: {'train_loss': 12.691125753449231, 'test_loss': 14.689115905761719, 'bleu': 0.40583, 'gen_len': 0.7}




 16%|█▌        | 32/200 [24:48<2:07:13, 45.44s/it]

For epoch 234: {Learning rate: [0.0040684034681507455]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.12s/batches]



Metrics: {'train_loss': 12.721056472964403, 'test_loss': 14.624356746673584, 'bleu': 0.0, 'gen_len': 0.7}




 16%|█▋        | 33/200 [25:33<2:06:21, 45.40s/it]

For epoch 235: {Learning rate: [0.004059811265385193]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.16s/batches]



Metrics: {'train_loss': 12.809082723245389, 'test_loss': 14.562788391113282, 'bleu': 0.40583, 'gen_len': 0.7}




 17%|█▋        | 34/200 [26:20<2:06:30, 45.73s/it]

For epoch 236: {Learning rate: [0.0040512732722010275]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.74batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.97s/batches]



Metrics: {'train_loss': 12.7519918011456, 'test_loss': 14.487902450561524, 'bleu': 0.0, 'gen_len': 0.7}




 18%|█▊        | 35/200 [27:06<2:06:40, 46.06s/it]

For epoch 237: {Learning rate: [0.004042788920957193]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.11s/batches]



Metrics: {'train_loss': 12.64658654026869, 'test_loss': 16.080552291870116, 'bleu': 0.0, 'gen_len': 0.8}




 18%|█▊        | 36/200 [27:51<2:04:34, 45.58s/it]

For epoch 238: {Learning rate: [0.004034357652299393]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.04s/batches]



Metrics: {'train_loss': 12.690851525562566, 'test_loss': 14.984047794342041, 'bleu': 0.0, 'gen_len': 0.7}




 18%|█▊        | 37/200 [28:36<2:03:17, 45.38s/it]

For epoch 239: {Learning rate: [0.004025978915005193]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.05s/batches]



Metrics: {'train_loss': 12.680387229454226, 'test_loss': 13.983243370056153, 'bleu': 0.32836, 'gen_len': 0.7}




 19%|█▉        | 38/200 [29:20<2:01:30, 45.00s/it]

For epoch 240: {Learning rate: [0.004017652165832657]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.17s/batches]



Metrics: {'train_loss': 12.8125617445969, 'test_loss': 14.169450855255127, 'bleu': 0.0, 'gen_len': 0.7}




 20%|█▉        | 39/200 [30:06<2:01:40, 45.34s/it]

For epoch 241: {Learning rate: [0.004009376869372401]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.10s/batches]



Metrics: {'train_loss': 12.674533960295886, 'test_loss': 14.493872356414794, 'bleu': 0.39049, 'gen_len': 0.7}




 20%|██        | 40/200 [30:51<2:00:32, 45.21s/it]

For epoch 242: {Learning rate: [0.004001152497902999]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.82batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.08s/batches]



Metrics: {'train_loss': 12.703512598828572, 'test_loss': 13.869960308074951, 'bleu': 0.0, 'gen_len': 0.7}




 20%|██        | 41/200 [31:35<1:59:12, 44.98s/it]

For epoch 243: {Learning rate: [0.0039929785312496245]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.69batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.06s/batches]



Metrics: {'train_loss': 12.702969080064355, 'test_loss': 14.768200969696045, 'bleu': 0.0, 'gen_len': 0.7}




 21%|██        | 42/200 [32:21<1:58:30, 45.01s/it]

For epoch 244: {Learning rate: [0.003984854456645865]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.10s/batches]



Metrics: {'train_loss': 12.68161181705754, 'test_loss': 14.733510589599609, 'bleu': 0.0, 'gen_len': 0.7}




 22%|██▏       | 43/200 [33:05<1:57:36, 44.95s/it]

For epoch 245: {Learning rate: [0.003976779768598611]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.75batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.06s/batches]



Metrics: {'train_loss': 12.652084187763494, 'test_loss': 14.29676170349121, 'bleu': 0.32836, 'gen_len': 0.7}




 22%|██▏       | 44/200 [33:50<1:56:41, 44.88s/it]

For epoch 246: {Learning rate: [0.003968753968755953]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.31s/batches]



Metrics: {'train_loss': 12.630093440776918, 'test_loss': 14.509889221191406, 'bleu': 0.32836, 'gen_len': 0.7}




 22%|██▎       | 45/200 [34:39<1:59:23, 46.22s/it]

For epoch 247: {Learning rate: [0.003960776565777987]}


Train batch number 81: 100%|██████████| 82/82 [00:25<00:00,  3.25batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.37s/batches]



Metrics: {'train_loss': 12.67248874757348, 'test_loss': 15.851583671569824, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 23%|██▎       | 46/200 [35:31<2:02:25, 47.70s/it]

For epoch 248: {Learning rate: [0.003952847075210474]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.47s/batches]



Metrics: {'train_loss': 12.701934622555244, 'test_loss': 13.999089813232422, 'bleu': 0.32836, 'gen_len': 0.7}




 24%|██▎       | 47/200 [36:23<2:05:04, 49.05s/it]

For epoch 249: {Learning rate: [0.0039449650193612695]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.30s/batches]



Metrics: {'train_loss': 12.706787719959166, 'test_loss': 15.137683963775634, 'bleu': 0.3629, 'gen_len': 0.8}




 24%|██▍       | 48/200 [37:13<2:05:25, 49.51s/it]

For epoch 250: {Learning rate: [0.0039371299271794506]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.40s/batches]



Metrics: {'train_loss': 12.735631494987302, 'test_loss': 14.713428783416749, 'bleu': 0.0, 'gen_len': 0.7}




 24%|██▍       | 49/200 [38:04<2:05:25, 49.84s/it]

For epoch 251: {Learning rate: [0.003929341334137072]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.43s/batches]



Metrics: {'train_loss': 12.712988004451844, 'test_loss': 15.077911949157714, 'bleu': 0.48328, 'gen_len': 0.8}




 25%|██▌       | 50/200 [38:55<2:05:50, 50.34s/it]

For epoch 252: {Learning rate: [0.003921598782113491]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:27<00:00,  2.74s/batches]



Metrics: {'train_loss': 12.671479661290238, 'test_loss': 14.234451007843017, 'bleu': 0.32836, 'gen_len': 0.7}




 26%|██▌       | 51/200 [39:50<2:08:12, 51.63s/it]

For epoch 253: {Learning rate: [0.003913901819282185]}


Train batch number 81: 100%|██████████| 82/82 [00:25<00:00,  3.24batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:25<00:00,  2.53s/batches]



Metrics: {'train_loss': 12.630812691479194, 'test_loss': 14.139599323272705, 'bleu': 0.32836, 'gen_len': 0.7}




 26%|██▌       | 52/200 [40:44<2:09:21, 52.44s/it]

For epoch 254: {Learning rate: [0.00390625]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:29<00:00,  2.94s/batches]



Metrics: {'train_loss': 12.63962545627501, 'test_loss': 14.356419897079467, 'bleu': 0.0, 'gen_len': 0.6}




 26%|██▋       | 53/200 [41:41<2:11:23, 53.63s/it]

For epoch 255: {Learning rate: [0.0038986428846987833]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.44batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:24<00:00,  2.43s/batches]



Metrics: {'train_loss': 12.808619220082353, 'test_loss': 14.122355842590332, 'bleu': 0.39049, 'gen_len': 0.7}




 27%|██▋       | 54/200 [42:31<2:08:16, 52.71s/it]

For epoch 256: {Learning rate: [0.0038910800397793147]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.36s/batches]



Metrics: {'train_loss': 12.680473100848314, 'test_loss': 14.226104354858398, 'bleu': 0.0, 'gen_len': 0.7}




 28%|██▊       | 55/200 [43:22<2:05:42, 52.02s/it]

For epoch 257: {Learning rate: [0.0038835610375075004]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:27<00:00,  2.70s/batches]



Metrics: {'train_loss': 12.687568542433947, 'test_loss': 15.070810317993164, 'bleu': 0.0, 'gen_len': 0.7}




 28%|██▊       | 56/200 [44:17<2:07:17, 53.04s/it]

For epoch 258: {Learning rate: [0.003876085455912764]}


Train batch number 81: 100%|██████████| 82/82 [00:25<00:00,  3.24batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:27<00:00,  2.78s/batches]



Metrics: {'train_loss': 12.720140125693344, 'test_loss': 16.03750352859497, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 28%|██▊       | 57/200 [45:15<2:09:50, 54.48s/it]

For epoch 259: {Learning rate: [0.0038686528786885804]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.44batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:25<00:00,  2.58s/batches]



Metrics: {'train_loss': 12.75147258363119, 'test_loss': 14.68843870162964, 'bleu': 0.32836, 'gen_len': 0.7}




 29%|██▉       | 58/200 [46:07<2:07:15, 53.77s/it]

For epoch 260: {Learning rate: [0.003861262895095097]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.34s/batches]



Metrics: {'train_loss': 12.680670749850389, 'test_loss': 14.609723949432373, 'bleu': 0.0, 'gen_len': 0.7}




 30%|██▉       | 59/200 [46:57<2:03:37, 52.61s/it]

For epoch 261: {Learning rate: [0.0038539150998637963]}


Train batch number 81: 100%|██████████| 82/82 [00:22<00:00,  3.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:26<00:00,  2.65s/batches]



Metrics: {'train_loss': 12.66550264125917, 'test_loss': 14.184258079528808, 'bleu': 0.0, 'gen_len': 0.7}




 30%|███       | 60/200 [47:51<2:03:21, 52.87s/it]

For epoch 262: {Learning rate: [0.003846609093104148]}


Train batch number 81: 100%|██████████| 82/82 [00:24<00:00,  3.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.21s/batches]



Metrics: {'train_loss': 12.8029465675354, 'test_loss': 14.24028730392456, 'bleu': 0.46082, 'gen_len': 0.7}




 30%|███       | 61/200 [48:39<1:59:34, 51.61s/it]

For epoch 263: {Learning rate: [0.003839344480212195]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.30s/batches]



Metrics: {'train_loss': 12.65901840605387, 'test_loss': 14.241136646270752, 'bleu': 0.32836, 'gen_len': 0.7}




 31%|███       | 62/200 [49:28<1:56:36, 50.70s/it]

For epoch 264: {Learning rate: [0.0038321208717810363]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.27s/batches]



Metrics: {'train_loss': 12.62582882439218, 'test_loss': 14.063442802429199, 'bleu': 0.0, 'gen_len': 0.7}




 32%|███▏      | 63/200 [50:17<1:54:34, 50.18s/it]

For epoch 265: {Learning rate: [0.0038249378835131537]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.31s/batches]



Metrics: {'train_loss': 12.687086675225235, 'test_loss': 14.998522853851318, 'bleu': 0.0, 'gen_len': 0.7}




 32%|███▏      | 64/200 [51:05<1:52:45, 49.75s/it]

For epoch 266: {Learning rate: [0.0038177951361345382]}


Train batch number 81: 100%|██████████| 82/82 [00:23<00:00,  3.48batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.20s/batches]



Metrics: {'train_loss': 12.686710322775491, 'test_loss': 13.96467833518982, 'bleu': 0.0, 'gen_len': 0.6}




 32%|███▎      | 65/200 [51:53<1:50:24, 49.07s/it]

For epoch 267: {Learning rate: [0.0038106922553105774]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.29batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.636505185103998, 'test_loss': 13.699907684326172, 'bleu': 0.40583, 'gen_len': 0.7}




 33%|███▎      | 66/200 [52:31<1:42:27, 45.88s/it]

For epoch 268: {Learning rate: [0.0038036288715636536]}


Train batch number 5:   6%|▌         | 5/82 [00:01<00:19,  4.03batches/s]

In [15]:
# Train the model 
trainer.train(150, auto_save = True, log_step = 1, saving_directory=config['model_dir'], 
              metric_for_best_model='bleu',
              metric_objective='maximize')

  0%|          | 0/150 [00:00<?, ?it/s]

For epoch 267: {Learning rate: [0.0038036288715636536]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.11batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.658672954978012, 'test_loss': 13.975855827331543, 'bleu': 0.60722, 'gen_len': 0.7}




  1%|          | 1/150 [00:38<1:35:42, 38.54s/it]

For epoch 268: {Learning rate: [0.003796604620192419]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 12.669705972438905, 'test_loss': 13.831021881103515, 'bleu': 0.32836, 'gen_len': 0.7}




  1%|▏         | 2/150 [01:14<1:31:51, 37.24s/it]

For epoch 269: {Learning rate: [0.0037896191411927026]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.59s/batches]



Metrics: {'train_loss': 12.708249801542701, 'test_loss': 15.04109811782837, 'bleu': 0.0, 'gen_len': 0.7}




  2%|▏         | 3/150 [01:49<1:28:10, 35.99s/it]

For epoch 270: {Learning rate: [0.003782672079180015]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.85batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.641125016096161, 'test_loss': 14.230920600891114, 'bleu': 0.32836, 'gen_len': 0.7}




  3%|▎         | 4/150 [02:25<1:27:42, 36.05s/it]

For epoch 271: {Learning rate: [0.003775763083313606]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.73batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.67s/batches]



Metrics: {'train_loss': 12.597859580342362, 'test_loss': 14.034138584136963, 'bleu': 0.32836, 'gen_len': 0.7}




  3%|▎         | 5/150 [03:01<1:26:56, 35.97s/it]

For epoch 272: {Learning rate: [0.003768891807222045]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.78batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.61s/batches]



Metrics: {'train_loss': 12.606584915300695, 'test_loss': 14.13065071105957, 'bleu': 0.32836, 'gen_len': 0.7}




  4%|▍         | 6/150 [03:36<1:25:25, 35.59s/it]

For epoch 273: {Learning rate: [0.0037620579089302874]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.65batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.661000385517028, 'test_loss': 15.85502223968506, 'bleu': 0.39049, 'gen_len': 0.8}




  5%|▍         | 7/150 [04:11<1:24:38, 35.51s/it]

For epoch 274: {Learning rate: [0.0037552610507881855]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.736456295339073, 'test_loss': 15.743404960632324, 'bleu': 0.0, 'gen_len': 0.7}




  5%|▌         | 8/150 [04:48<1:24:53, 35.87s/it]

For epoch 275: {Learning rate: [0.0037485008994004197]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.72batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.60s/batches]



Metrics: {'train_loss': 12.629388006722055, 'test_loss': 14.289960384368896, 'bleu': 0.0, 'gen_len': 0.7}




  6%|▌         | 9/150 [05:23<1:23:37, 35.58s/it]

For epoch 276: {Learning rate: [0.0037417771255578106]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.65527229774289, 'test_loss': 14.487878036499023, 'bleu': 0.0, 'gen_len': 0.7}




  7%|▋         | 10/150 [05:58<1:23:06, 35.62s/it]

For epoch 277: {Learning rate: [0.00373508940416998]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.68batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.663823592953566, 'test_loss': 14.88656997680664, 'bleu': 0.32836, 'gen_len': 0.7}




  7%|▋         | 11/150 [06:34<1:22:45, 35.73s/it]

For epoch 278: {Learning rate: [0.003728437414199335]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.646568810067526, 'test_loss': 14.23114080429077, 'bleu': 0.0, 'gen_len': 0.7}




  8%|▊         | 12/150 [07:10<1:22:14, 35.76s/it]

For epoch 279: {Learning rate: [0.0037218208385963354]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.07s/batches]



Metrics: {'train_loss': 12.631642742854792, 'test_loss': 14.60594654083252, 'bleu': 0.40583, 'gen_len': 0.7}




  9%|▊         | 13/150 [07:52<1:26:08, 37.73s/it]

For epoch 280: {Learning rate: [0.003715239364236025]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.74batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.30s/batches]



Metrics: {'train_loss': 12.728442081590979, 'test_loss': 14.569773197174072, 'bleu': 0.51061, 'gen_len': 0.7}




  9%|▉         | 14/150 [08:40<1:32:07, 40.64s/it]

For epoch 281: {Learning rate: [0.003708692681855792]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.24s/batches]



Metrics: {'train_loss': 12.632014082699287, 'test_loss': 13.819097232818603, 'bleu': 0.0, 'gen_len': 0.7}




 10%|█         | 15/150 [09:26<1:35:31, 42.45s/it]

For epoch 282: {Learning rate: [0.003702180485994327]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.81batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.22s/batches]



Metrics: {'train_loss': 12.640607386100584, 'test_loss': 14.15187339782715, 'bleu': 0.0, 'gen_len': 0.7}




 11%|█         | 16/150 [10:12<1:37:13, 43.53s/it]

For epoch 283: {Learning rate: [0.003695702474931766]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.74batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.22s/batches]



Metrics: {'train_loss': 12.615992354183662, 'test_loss': 14.28606481552124, 'bleu': 0.0, 'gen_len': 0.7}




 11%|█▏        | 17/150 [10:59<1:38:32, 44.46s/it]

For epoch 284: {Learning rate: [0.0036892583506309704]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.92batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.89s/batches]



Metrics: {'train_loss': 12.637596043144784, 'test_loss': 14.384871578216552, 'bleu': 0.32836, 'gen_len': 0.7}




 12%|█▏        | 18/150 [11:41<1:36:15, 43.75s/it]

For epoch 285: {Learning rate: [0.003682847818679935]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.25batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.89s/batches]



Metrics: {'train_loss': 12.614023481927267, 'test_loss': 14.668813705444336, 'bleu': 0.0, 'gen_len': 0.7}




 13%|█▎        | 19/150 [12:21<1:33:08, 42.66s/it]

For epoch 286: {Learning rate: [0.003676470588235294]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.24batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.90s/batches]



Metrics: {'train_loss': 12.624171448916924, 'test_loss': 14.632987213134765, 'bleu': 0.39049, 'gen_len': 0.7}




 13%|█▎        | 20/150 [13:02<1:30:55, 41.96s/it]

For epoch 287: {Learning rate: [0.0036701263719668966]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.20batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.90s/batches]



Metrics: {'train_loss': 12.576866830267557, 'test_loss': 14.102198505401612, 'bleu': 0.32836, 'gen_len': 0.7}




 14%|█▍        | 21/150 [13:42<1:29:14, 41.51s/it]

For epoch 288: {Learning rate: [0.003663814886003432]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.26batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.92s/batches]



Metrics: {'train_loss': 12.711094024704725, 'test_loss': 14.140999507904052, 'bleu': 0.32836, 'gen_len': 0.7}




 15%|█▍        | 22/150 [14:22<1:27:45, 41.13s/it]

For epoch 289: {Learning rate: [0.0036575358498790803]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.23batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.95s/batches]



Metrics: {'train_loss': 12.562436528322174, 'test_loss': 13.963122463226318, 'bleu': 0.0, 'gen_len': 0.7}




 15%|█▌        | 23/150 [15:03<1:26:49, 41.02s/it]

For epoch 290: {Learning rate: [0.0036512889864811623]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.91s/batches]



Metrics: {'train_loss': 12.611732808555045, 'test_loss': 15.434176349639893, 'bleu': 0.40583, 'gen_len': 0.8}




 16%|█▌        | 24/150 [15:44<1:25:50, 40.88s/it]

For epoch 291: {Learning rate: [0.003645074021998777]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.89s/batches]



Metrics: {'train_loss': 12.657720769323953, 'test_loss': 14.036186218261719, 'bleu': 0.40583, 'gen_len': 0.7}




 17%|█▋        | 25/150 [16:24<1:24:58, 40.79s/it]

For epoch 292: {Learning rate: [0.003638890685872387]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.25batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.90s/batches]



Metrics: {'train_loss': 12.654114449896463, 'test_loss': 13.841159391403199, 'bleu': 0.40583, 'gen_len': 0.6}




 17%|█▋        | 26/150 [17:05<1:23:59, 40.64s/it]

For epoch 293: {Learning rate: [0.0036327387107443526]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.24batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.90s/batches]



Metrics: {'train_loss': 12.72511450837298, 'test_loss': 14.470464706420898, 'bleu': 0.0, 'gen_len': 0.7}




 18%|█▊        | 27/150 [17:45<1:23:03, 40.52s/it]

For epoch 294: {Learning rate: [0.0036266178324103715]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.26batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.87s/batches]



Metrics: {'train_loss': 12.621738364056844, 'test_loss': 15.55629358291626, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 19%|█▊        | 28/150 [18:25<1:22:00, 40.33s/it]

For epoch 295: {Learning rate: [0.0036205277897718266]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.20batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.76s/batches]



Metrics: {'train_loss': 12.68554536889239, 'test_loss': 14.521294689178466, 'bleu': 0.0, 'gen_len': 0.7}




 19%|█▉        | 29/150 [19:04<1:20:26, 39.89s/it]

For epoch 296: {Learning rate: [0.0036144683247890013]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.62batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.772041402212004, 'test_loss': 14.139497184753418, 'bleu': 0.32836, 'gen_len': 0.7}




 20%|██        | 30/150 [19:39<1:17:22, 38.69s/it]

For epoch 297: {Learning rate: [0.003608439182435161]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.596589640873235, 'test_loss': 14.073621273040771, 'bleu': 0.0, 'gen_len': 0.7}




 21%|██        | 31/150 [20:15<1:15:09, 37.90s/it]

For epoch 298: {Learning rate: [0.0036024401106514686]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.66s/batches]



Metrics: {'train_loss': 12.609732145216407, 'test_loss': 14.50914888381958, 'bleu': 0.5489200000000001, 'gen_len': 0.7}




 21%|██▏       | 32/150 [20:53<1:14:09, 37.71s/it]

For epoch 299: {Learning rate: [0.003596470860302725]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.652920467097585, 'test_loss': 14.13361930847168, 'bleu': 0.0, 'gen_len': 0.7}




 22%|██▏       | 33/150 [21:28<1:12:20, 37.10s/it]

For epoch 300: {Learning rate: [0.003590531185133913]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.56batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.70s/batches]



Metrics: {'train_loss': 12.615499275486643, 'test_loss': 14.40838975906372, 'bleu': 0.0, 'gen_len': 0.7}




 23%|██▎       | 34/150 [22:05<1:11:26, 36.95s/it]

For epoch 301: {Learning rate: [0.0035846208417275277]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.609328275773583, 'test_loss': 15.001454067230224, 'bleu': 0.0, 'gen_len': 0.7}




 23%|██▎       | 35/150 [22:41<1:10:15, 36.66s/it]

For epoch 302: {Learning rate: [0.0035787395894616766]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.58batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.68s/batches]



Metrics: {'train_loss': 12.656618350889625, 'test_loss': 14.670851516723634, 'bleu': 0.0, 'gen_len': 0.7}




 24%|██▍       | 36/150 [23:17<1:09:27, 36.56s/it]

For epoch 303: {Learning rate: [0.0035728871904689343]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.63batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.63803514620153, 'test_loss': 14.10641622543335, 'bleu': 0.40583, 'gen_len': 0.7}




 25%|██▍       | 37/150 [23:53<1:08:20, 36.29s/it]

For epoch 304: {Learning rate: [0.003567063409595935]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.54batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.56855015638398, 'test_loss': 14.32993974685669, 'bleu': 0.0, 'gen_len': 0.7}




 25%|██▌       | 38/150 [24:29<1:07:32, 36.18s/it]

For epoch 305: {Learning rate: [0.003561268014363686]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.60batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.723511765642863, 'test_loss': 14.278737545013428, 'bleu': 0.0, 'gen_len': 0.7}




 26%|██▌       | 39/150 [25:05<1:06:45, 36.08s/it]

For epoch 306: {Learning rate: [0.0035555007749285892]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.64s/batches]



Metrics: {'train_loss': 12.586993653599809, 'test_loss': 14.47523536682129, 'bleu': 0.0, 'gen_len': 0.7}




 27%|██▋       | 40/150 [25:41<1:06:13, 36.12s/it]

For epoch 307: {Learning rate: [0.003549761464044155]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.61batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.630482755056242, 'test_loss': 14.821363830566407, 'bleu': 0.0, 'gen_len': 0.7}




 27%|██▋       | 41/150 [26:17<1:05:27, 36.03s/it]

For epoch 308: {Learning rate: [0.003544049857023392]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.63s/batches]



Metrics: {'train_loss': 12.677390238133873, 'test_loss': 14.53652572631836, 'bleu': 0.32836, 'gen_len': 0.7}




 28%|██▊       | 42/150 [26:56<1:06:23, 36.89s/it]

For epoch 309: {Learning rate: [0.0035383657317018618]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.85batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.55s/batches]



Metrics: {'train_loss': 12.60798019897647, 'test_loss': 14.018474769592284, 'bleu': 0.32836, 'gen_len': 0.7}




 29%|██▊       | 43/150 [27:29<1:04:06, 35.95s/it]

For epoch 310: {Learning rate: [0.0035327088684013845]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.94batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.54s/batches]



Metrics: {'train_loss': 12.720086115162546, 'test_loss': 14.099175071716308, 'bleu': 0.0, 'gen_len': 0.7}




 29%|██▉       | 44/150 [28:03<1:02:13, 35.22s/it]

For epoch 311: {Learning rate: [0.003527079049894377]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.88batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.634734857373122, 'test_loss': 13.902803468704224, 'bleu': 0.39049, 'gen_len': 0.7}




 30%|███       | 45/150 [28:39<1:02:14, 35.56s/it]

For epoch 312: {Learning rate: [0.0035214760613688193]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.89batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.51s/batches]



Metrics: {'train_loss': 12.633569833708972, 'test_loss': 13.876786136627198, 'bleu': 0.39049, 'gen_len': 0.7}




 31%|███       | 46/150 [29:13<1:00:40, 35.01s/it]

For epoch 313: {Learning rate: [0.003515899690393825]}


Train batch number 81: 100%|██████████| 82/82 [00:16<00:00,  4.86batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:15<00:00,  1.58s/batches]



Metrics: {'train_loss': 12.664381067927291, 'test_loss': 14.34045763015747, 'bleu': 0.32836, 'gen_len': 0.7}




 31%|███▏      | 47/150 [29:47<59:45, 34.81s/it]  

For epoch 314: {Learning rate: [0.0035103497268858153]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:26<00:00,  2.67s/batches]



Metrics: {'train_loss': 12.63506678255593, 'test_loss': 14.41978931427002, 'bleu': 0.32836, 'gen_len': 0.7}




 32%|███▏      | 48/150 [30:35<1:05:55, 38.78s/it]

For epoch 315: {Learning rate: [0.0035048259630752767]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.87batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.08s/batches]



Metrics: {'train_loss': 12.641494314845016, 'test_loss': 14.34506196975708, 'bleu': 0.39049, 'gen_len': 0.7}




 33%|███▎      | 49/150 [31:20<1:07:58, 40.38s/it]

For epoch 316: {Learning rate: [0.003499328193474089]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.90batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.24s/batches]



Metrics: {'train_loss': 12.684349071688768, 'test_loss': 13.939120674133301, 'bleu': 0.32836, 'gen_len': 0.7}




 33%|███▎      | 50/150 [32:05<1:09:48, 41.89s/it]

For epoch 317: {Learning rate: [0.0034938562148434213]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.697160092795768, 'test_loss': 14.555039215087891, 'bleu': 0.32836, 'gen_len': 0.7}




 34%|███▍      | 51/150 [32:42<1:06:56, 40.57s/it]

For epoch 318: {Learning rate: [0.003488409826162172]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.24batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.73s/batches]



Metrics: {'train_loss': 12.646170435882196, 'test_loss': 14.501553630828857, 'bleu': 0.0, 'gen_len': 0.7}




 35%|███▍      | 52/150 [33:21<1:05:21, 40.01s/it]

For epoch 319: {Learning rate: [0.003482988828595955]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.71batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.65s/batches]



Metrics: {'train_loss': 12.583555256448141, 'test_loss': 13.785840320587159, 'bleu': 0.60722, 'gen_len': 0.7}




 35%|███▌      | 53/150 [33:57<1:02:30, 38.67s/it]

For epoch 320: {Learning rate: [0.0034775930254666077]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:16<00:00,  1.62s/batches]



Metrics: {'train_loss': 12.682213498324883, 'test_loss': 14.404471492767334, 'bleu': 0.32836, 'gen_len': 0.7}




 36%|███▌      | 54/150 [34:33<1:00:34, 37.86s/it]

For epoch 321: {Learning rate: [0.003472222222222222]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.70batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 12.619395331638616, 'test_loss': 14.14080228805542, 'bleu': 0.0, 'gen_len': 0.7}




 37%|███▋      | 55/150 [35:10<59:55, 37.85s/it]  

For epoch 322: {Learning rate: [0.003466876226407682]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.27batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.92s/batches]



Metrics: {'train_loss': 12.59260099690135, 'test_loss': 15.129222297668457, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 37%|███▋      | 56/150 [35:51<1:00:32, 38.64s/it]

For epoch 323: {Learning rate: [0.0034615548476356955]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.14batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.09s/batches]



Metrics: {'train_loss': 12.559183196323675, 'test_loss': 14.688575839996338, 'bleu': 0.39049, 'gen_len': 0.7}




 38%|███▊      | 57/150 [36:34<1:01:43, 39.82s/it]

For epoch 324: {Learning rate: [0.003456257897558319]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.28batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.661015876909582, 'test_loss': 14.341379737854004, 'bleu': 0.0, 'gen_len': 0.7}




 39%|███▊      | 58/150 [37:12<1:00:38, 39.55s/it]

For epoch 325: {Learning rate: [0.0034509851898389546]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.588548892881812, 'test_loss': 14.515635967254639, 'bleu': 0.32836, 'gen_len': 0.7}




 39%|███▉      | 59/150 [37:50<59:15, 39.08s/it]  

For epoch 326: {Learning rate: [0.0034457365401248203]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.582396280474779, 'test_loss': 14.265187454223632, 'bleu': 0.0, 'gen_len': 0.7}




 40%|████      | 60/150 [38:29<58:21, 38.91s/it]

For epoch 327: {Learning rate: [0.0034405117660198767]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.55batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.74s/batches]



Metrics: {'train_loss': 12.621614130531869, 'test_loss': 13.755756711959839, 'bleu': 0.32836, 'gen_len': 0.7}




 41%|████      | 61/150 [39:06<56:56, 38.39s/it]

For epoch 328: {Learning rate: [0.0034353106870582046]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.57batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.63086960955364, 'test_loss': 14.615915775299072, 'bleu': 0.0, 'gen_len': 0.7}




 41%|████▏     | 62/150 [39:44<56:01, 38.20s/it]

For epoch 329: {Learning rate: [0.0034301331246778233]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.86s/batches]



Metrics: {'train_loss': 12.581509113311768, 'test_loss': 14.796258163452148, 'bleu': 0.0, 'gen_len': 0.7}




 42%|████▏     | 63/150 [40:22<55:32, 38.30s/it]

For epoch 330: {Learning rate: [0.0034249789021949437]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.51batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.617219139889974, 'test_loss': 14.074411773681641, 'bleu': 0.39049, 'gen_len': 0.7}




 43%|████▎     | 64/150 [41:00<54:37, 38.11s/it]

For epoch 331: {Learning rate: [0.0034198478447786426]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.52batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.628752411865607, 'test_loss': 14.68356637954712, 'bleu': 0.0, 'gen_len': 0.7}




 43%|████▎     | 65/150 [41:38<53:54, 38.05s/it]

For epoch 332: {Learning rate: [0.0034147397794259565]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.5627253288176, 'test_loss': 15.010082244873047, 'bleu': 0.0, 'gen_len': 0.7}




 44%|████▍     | 66/150 [42:16<53:19, 38.09s/it]

For epoch 333: {Learning rate: [0.003409654534937381]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.12batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.87s/batches]



Metrics: {'train_loss': 12.63872792081135, 'test_loss': 14.145523738861083, 'bleu': 0.0, 'gen_len': 0.7}




 45%|████▍     | 67/150 [42:57<53:47, 38.89s/it]

For epoch 334: {Learning rate: [0.0034045919418927706]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.77batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.621713184728854, 'test_loss': 14.014991092681885, 'bleu': 0.32836, 'gen_len': 0.7}




 45%|████▌     | 68/150 [43:39<54:25, 39.82s/it]

For epoch 335: {Learning rate: [0.0033995518326276324]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.12batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:23<00:00,  2.37s/batches]



Metrics: {'train_loss': 12.547850172694137, 'test_loss': 14.171343517303466, 'bleu': 0.0, 'gen_len': 0.7}




In [26]:
# Train the model 
trainer.train(65, auto_save = True, log_step = 1, saving_directory=config['model_dir'], 
              metric_for_best_model='bleu',
              metric_objective='maximize')

  0%|          | 0/65 [00:00<?, ?it/s]

For epoch 335: {Learning rate: [0.0033945340412098023]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.76s/batches]



Metrics: {'train_loss': 12.629058140080149, 'test_loss': 14.007775974273681, 'bleu': 0.51061, 'gen_len': 0.7}




  2%|▏         | 1/65 [00:37<40:20, 37.81s/it]

For epoch 336: {Learning rate: [0.0033895384034165026]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.56764600335098, 'test_loss': 14.499281215667725, 'bleu': 0.32836, 'gen_len': 0.7}




  3%|▎         | 2/65 [01:17<41:07, 39.17s/it]

For epoch 337: {Learning rate: [0.0033845647567117645]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.627489165561956, 'test_loss': 14.25087308883667, 'bleu': 0.65278, 'gen_len': 0.7}




  5%|▍         | 3/65 [01:56<40:17, 38.98s/it]

For epoch 338: {Learning rate: [0.0033796129402242194]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.94batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.55660811866202, 'test_loss': 15.232921981811524, 'bleu': 0.0, 'gen_len': 0.7}




  6%|▌         | 4/65 [02:37<40:24, 39.74s/it]

For epoch 339: {Learning rate: [0.003374682794725243]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.22batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:21<00:00,  2.16s/batches]



Metrics: {'train_loss': 12.53303769158154, 'test_loss': 14.733649063110352, 'bleu': 0.0, 'gen_len': 0.7}




  8%|▊         | 5/65 [03:20<40:52, 40.87s/it]

For epoch 340: {Learning rate: [0.0033697741626074504]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.12batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 12.555385234879285, 'test_loss': 14.421464824676514, 'bleu': 0.46082, 'gen_len': 0.7}




  9%|▉         | 6/65 [04:00<39:55, 40.60s/it]

For epoch 341: {Learning rate: [0.0033648868878635345]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.647082311351125, 'test_loss': 14.879205894470214, 'bleu': 0.32836, 'gen_len': 0.7}




 11%|█         | 7/65 [04:38<38:24, 39.74s/it]

For epoch 342: {Learning rate: [0.0033600208160654396]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.561704903114133, 'test_loss': 14.326913738250733, 'bleu': 0.32836, 'gen_len': 0.7}




 12%|█▏        | 8/65 [05:17<37:26, 39.42s/it]

For epoch 343: {Learning rate: [0.0033551757943438686]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  3.96batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:22<00:00,  2.27s/batches]



Metrics: {'train_loss': 12.596391741822405, 'test_loss': 14.582627105712891, 'bleu': 0.32836, 'gen_len': 0.7}




 14%|█▍        | 9/65 [06:02<38:26, 41.18s/it]

For epoch 344: {Learning rate: [0.003350351671368109]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.01batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:20<00:00,  2.08s/batches]



Metrics: {'train_loss': 12.613458046099035, 'test_loss': 14.359029960632324, 'bleu': 0.46082, 'gen_len': 0.7}




 15%|█▌        | 10/65 [06:45<38:26, 41.93s/it]

For epoch 345: {Learning rate: [0.0033455482973261826]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.95s/batches]



Metrics: {'train_loss': 12.619198746797515, 'test_loss': 14.387747764587402, 'bleu': 0.0, 'gen_len': 0.7}




 17%|█▋        | 11/65 [07:27<37:39, 41.85s/it]

For epoch 346: {Learning rate: [0.003340765523905305]}


Train batch number 81: 100%|██████████| 82/82 [00:21<00:00,  3.83batches/s]
Test batch number 6:  60%|██████    | 6/10 [00:11<00:08,  2.03s/batches]

In [14]:
# Train the model 
trainer.train(54, auto_save = True, log_step = 1, saving_directory=config['model_dir'], 
              metric_for_best_model='bleu',
              metric_objective='maximize')

  0%|          | 0/54 [00:00<?, ?it/s]

For epoch 345: {Learning rate: [0.003340765523905305]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 12.629660635459715, 'test_loss': 14.263394165039063, 'bleu': 0.0, 'gen_len': 0.7}




  2%|▏         | 1/54 [00:38<34:07, 38.63s/it]

For epoch 346: {Learning rate: [0.0033360032042726484]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.64304045933049, 'test_loss': 14.388862323760986, 'bleu': 0.0, 'gen_len': 0.7}




  4%|▎         | 2/54 [01:16<33:18, 38.44s/it]

For epoch 347: {Learning rate: [0.003331261193056413]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.59batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.91s/batches]



Metrics: {'train_loss': 12.636578949486337, 'test_loss': 14.538853073120118, 'bleu': 0.0, 'gen_len': 0.7}




  6%|▌         | 3/54 [01:55<32:54, 38.72s/it]

For epoch 348: {Learning rate: [0.0033265393463271843]}


Train batch number 81: 100%|██████████| 82/82 [00:17<00:00,  4.64batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.609664585532212, 'test_loss': 13.993708848953247, 'bleu': 0.32836, 'gen_len': 0.7}




  7%|▋         | 4/54 [02:33<31:49, 38.18s/it]

For epoch 349: {Learning rate: [0.0033218375215795866]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.50batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.72s/batches]



Metrics: {'train_loss': 12.583042871661302, 'test_loss': 14.660491561889648, 'bleu': 0.0, 'gen_len': 0.7}




  9%|▉         | 5/54 [03:10<30:56, 37.89s/it]

For epoch 350: {Learning rate: [0.0033171555777142202]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.53batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.92s/batches]



Metrics: {'train_loss': 12.624120776246233, 'test_loss': 14.593277168273925, 'bleu': 0.40583, 'gen_len': 0.7}




 11%|█         | 6/54 [03:49<30:41, 38.36s/it]

For epoch 351: {Learning rate: [0.003312493375019875]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.76s/batches]



Metrics: {'train_loss': 12.535577878719423, 'test_loss': 16.00751323699951, 'bleu': 0.27612000000000003, 'gen_len': 0.8}




 13%|█▎        | 7/54 [04:28<29:59, 38.30s/it]

For epoch 352: {Learning rate: [0.00330785077515602]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.49batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.562840310538688, 'test_loss': 14.13433084487915, 'bleu': 0.40583, 'gen_len': 0.7}




 15%|█▍        | 8/54 [05:06<29:20, 38.28s/it]

For epoch 353: {Learning rate: [0.0033032276411355623]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.47batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.562615976101014, 'test_loss': 14.737193489074707, 'bleu': 0.0, 'gen_len': 0.7}




 17%|█▋        | 9/54 [05:44<28:40, 38.23s/it]

For epoch 354: {Learning rate: [0.003298623837307872]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.628948630356208, 'test_loss': 13.907479190826416, 'bleu': 0.0, 'gen_len': 0.7}




 19%|█▊        | 10/54 [06:23<28:05, 38.31s/it]

For epoch 355: {Learning rate: [0.0032940392293420617]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.59543832336984, 'test_loss': 14.359315109252929, 'bleu': 0.0, 'gen_len': 0.7}




 20%|██        | 11/54 [07:01<27:34, 38.47s/it]

For epoch 356: {Learning rate: [0.003289473684210526]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.567941089955772, 'test_loss': 14.548884010314941, 'bleu': 0.32836, 'gen_len': 0.7}




 22%|██▏       | 12/54 [07:40<26:55, 38.46s/it]

For epoch 357: {Learning rate: [0.003284927070172729]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.586177250234092, 'test_loss': 14.832741737365723, 'bleu': 0.0, 'gen_len': 0.7}




 24%|██▍       | 13/54 [08:19<26:21, 38.58s/it]

For epoch 358: {Learning rate: [0.0032803992567592374]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.586292022612037, 'test_loss': 14.5789005279541, 'bleu': 0.0, 'gen_len': 0.7}




 26%|██▌       | 14/54 [08:57<25:43, 38.59s/it]

For epoch 359: {Learning rate: [0.0032758901147559947]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.571100607150939, 'test_loss': 14.959103298187255, 'bleu': 0.39049, 'gen_len': 0.7}




 28%|██▊       | 15/54 [09:36<25:06, 38.63s/it]

For epoch 360: {Learning rate: [0.0032713995161888355]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.631683948563367, 'test_loss': 14.183402061462402, 'bleu': 0.0, 'gen_len': 0.7}




 30%|██▉       | 16/54 [10:14<24:24, 38.55s/it]

For epoch 361: {Learning rate: [0.0032669273343082293]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.533266369889422, 'test_loss': 13.866189670562743, 'bleu': 0.0, 'gen_len': 0.6}




 31%|███▏      | 17/54 [10:53<23:47, 38.59s/it]

For epoch 362: {Learning rate: [0.0032624734435742534]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.86s/batches]



Metrics: {'train_loss': 12.624689561564749, 'test_loss': 14.223624420166015, 'bleu': 0.0, 'gen_len': 0.7}




 33%|███▎      | 18/54 [11:32<23:13, 38.71s/it]

For epoch 363: {Learning rate: [0.0032580377196417933]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.19batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.99s/batches]



Metrics: {'train_loss': 12.589187982605726, 'test_loss': 14.44616765975952, 'bleu': 0.74173, 'gen_len': 0.6}




 35%|███▌      | 19/54 [12:14<23:11, 39.76s/it]

For epoch 364: {Learning rate: [0.0032536200393459597]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.66143661010556, 'test_loss': 14.356874084472656, 'bleu': 0.0, 'gen_len': 0.7}




 37%|███▋      | 20/54 [12:53<22:18, 39.36s/it]

For epoch 365: {Learning rate: [0.003249220280687727]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.585124829920327, 'test_loss': 14.10213794708252, 'bleu': 0.32836, 'gen_len': 0.7}




 39%|███▉      | 21/54 [13:31<21:28, 39.05s/it]

For epoch 366: {Learning rate: [0.0032448383228197816]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.07batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.85s/batches]



Metrics: {'train_loss': 12.603022371850363, 'test_loss': 13.856917428970338, 'bleu': 0.0, 'gen_len': 0.7}




 41%|████      | 22/54 [14:12<21:05, 39.55s/it]

For epoch 367: {Learning rate: [0.003240474046032579]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.88s/batches]



Metrics: {'train_loss': 12.511541645701339, 'test_loss': 14.90884771347046, 'bleu': 0.32836, 'gen_len': 0.7}




 43%|████▎     | 23/54 [14:51<20:27, 39.60s/it]

For epoch 368: {Learning rate: [0.0032361273317406108]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.547026384167555, 'test_loss': 14.384672164916992, 'bleu': 0.0, 'gen_len': 0.7}




 44%|████▍     | 24/54 [15:30<19:41, 39.37s/it]

For epoch 369: {Learning rate: [0.0032317980624688696]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.5507808836495, 'test_loss': 14.953842926025391, 'bleu': 0.32836, 'gen_len': 0.7}




 46%|████▋     | 25/54 [16:09<18:57, 39.22s/it]

For epoch 370: {Learning rate: [0.003227486121839514]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.578080124971343, 'test_loss': 14.384054470062257, 'bleu': 0.32836, 'gen_len': 0.7}




 48%|████▊     | 26/54 [16:48<18:14, 39.09s/it]

For epoch 371: {Learning rate: [0.0032231913945587293]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.576162082392996, 'test_loss': 13.915660762786866, 'bleu': 0.0, 'gen_len': 0.7}




 50%|█████     | 27/54 [17:26<17:31, 38.93s/it]

For epoch 372: {Learning rate: [0.0032189137664037797]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.566604678223772, 'test_loss': 14.16174602508545, 'bleu': 0.0, 'gen_len': 0.7}




 52%|█████▏    | 28/54 [18:05<16:50, 38.87s/it]

For epoch 373: {Learning rate: [0.003214653124210248]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.598593938641432, 'test_loss': 14.47129783630371, 'bleu': 0.32836, 'gen_len': 0.7}




 54%|█████▎    | 29/54 [18:44<16:09, 38.79s/it]

For epoch 374: {Learning rate: [0.0032104093558594634]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.85s/batches]



Metrics: {'train_loss': 12.515903374043907, 'test_loss': 14.077149963378906, 'bleu': 0.5489200000000001, 'gen_len': 0.7}




 56%|█████▌    | 30/54 [19:23<15:34, 38.93s/it]

For epoch 375: {Learning rate: [0.0032061823502661066]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.45batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.613299236065004, 'test_loss': 14.405285549163818, 'bleu': 0.3629, 'gen_len': 0.7}




 57%|█████▋    | 31/54 [20:01<14:50, 38.72s/it]

For epoch 376: {Learning rate: [0.0032019719973659998]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.543129211518822, 'test_loss': 14.265832233428956, 'bleu': 0.32836, 'gen_len': 0.7}




 59%|█████▉    | 32/54 [20:40<14:12, 38.76s/it]

For epoch 377: {Learning rate: [0.003197778188104068]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.35batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.55022608361593, 'test_loss': 13.80153226852417, 'bleu': 0.0, 'gen_len': 0.7}




 61%|██████    | 33/54 [21:19<13:31, 38.66s/it]

For epoch 378: {Learning rate: [0.003193600814422475]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.601170975987504, 'test_loss': 14.08078145980835, 'bleu': 0.39049, 'gen_len': 0.7}




 63%|██████▎   | 34/54 [21:57<12:54, 38.72s/it]

For epoch 379: {Learning rate: [0.00318943976924893]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.84s/batches]



Metrics: {'train_loss': 12.57596021745263, 'test_loss': 13.967436790466309, 'bleu': 0.0, 'gen_len': 0.7}




 65%|██████▍   | 35/54 [22:36<12:15, 38.72s/it]

For epoch 380: {Learning rate: [0.0031852949464851598]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.580357220114731, 'test_loss': 14.096674823760987, 'bleu': 0.0, 'gen_len': 0.7}




 67%|██████▋   | 36/54 [23:15<11:35, 38.64s/it]

For epoch 381: {Learning rate: [0.0031811662409955473]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.32batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:19<00:00,  1.98s/batches]



Metrics: {'train_loss': 12.558404835259042, 'test_loss': 14.52008752822876, 'bleu': 0.0, 'gen_len': 0.7}




 69%|██████▊   | 37/54 [23:55<11:07, 39.25s/it]

For epoch 382: {Learning rate: [0.0031770535485959304]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.77s/batches]



Metrics: {'train_loss': 12.539669560223091, 'test_loss': 14.367604827880859, 'bleu': 0.32836, 'gen_len': 0.7}




 70%|███████   | 38/54 [24:34<10:24, 39.02s/it]

For epoch 383: {Learning rate: [0.0031729567660425595]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.30batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.544421992650847, 'test_loss': 14.355961227416993, 'bleu': 0.0, 'gen_len': 0.7}




 72%|███████▏  | 39/54 [25:14<09:51, 39.45s/it]

For epoch 384: {Learning rate: [0.0031688757910212115]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.46batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.83s/batches]



Metrics: {'train_loss': 12.536541130484604, 'test_loss': 14.101783561706544, 'bleu': 0.0, 'gen_len': 0.7}




 74%|███████▍  | 40/54 [25:53<09:08, 39.21s/it]

For epoch 385: {Learning rate: [0.0031648105221364583]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.554352347443743, 'test_loss': 14.126078987121582, 'bleu': 0.40583, 'gen_len': 0.6}




 76%|███████▌  | 41/54 [26:31<08:26, 38.98s/it]

For epoch 386: {Learning rate: [0.003160760858901085]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.33batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.569220804586642, 'test_loss': 14.219939422607421, 'bleu': 0.5837600000000001, 'gen_len': 0.7}




 78%|███████▊  | 42/54 [27:11<07:48, 39.07s/it]

For epoch 387: {Learning rate: [0.003156726701725659]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.512187132021277, 'test_loss': 14.122212409973145, 'bleu': 0.74173, 'gen_len': 0.7}




 80%|███████▉  | 43/54 [27:49<07:07, 38.89s/it]

For epoch 388: {Learning rate: [0.0031527079519082396]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.36batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.615889462029061, 'test_loss': 14.00363368988037, 'bleu': 0.32836, 'gen_len': 0.7}




 81%|████████▏ | 44/54 [28:28<06:28, 38.86s/it]

For epoch 389: {Learning rate: [0.00314870451162424]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.83s/batches]



Metrics: {'train_loss': 12.517385651425618, 'test_loss': 14.77435884475708, 'bleu': 0.0, 'gen_len': 0.7}




 83%|████████▎ | 45/54 [29:07<05:50, 38.91s/it]

For epoch 390: {Learning rate: [0.0031447162839164226]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.38batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.524858102565858, 'test_loss': 14.69449520111084, 'bleu': 0.0, 'gen_len': 0.7}




 85%|████████▌ | 46/54 [29:45<05:10, 38.75s/it]

For epoch 391: {Learning rate: [0.003140743172685038]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.40batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.52794361114502, 'test_loss': 14.249109268188477, 'bleu': 0.32836, 'gen_len': 0.7}




 87%|████████▋ | 47/54 [30:24<04:31, 38.73s/it]

For epoch 392: {Learning rate: [0.0031367850826780974]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.39batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.82s/batches]



Metrics: {'train_loss': 12.587338331269056, 'test_loss': 14.254604244232178, 'bleu': 0.0, 'gen_len': 0.7}




 89%|████████▉ | 48/54 [31:03<03:52, 38.74s/it]

For epoch 393: {Learning rate: [0.0031328419194817845]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.31batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.78s/batches]



Metrics: {'train_loss': 12.569716302359977, 'test_loss': 14.08548936843872, 'bleu': 0.0, 'gen_len': 0.7}




 91%|█████████ | 49/54 [31:41<03:13, 38.75s/it]

For epoch 394: {Learning rate: [0.003128913589510993]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.41batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.79s/batches]



Metrics: {'train_loss': 12.539892621156646, 'test_loss': 14.197038745880127, 'bleu': 0.48328, 'gen_len': 0.7}




 93%|█████████▎| 50/54 [32:20<02:34, 38.63s/it]

For epoch 395: {Learning rate: [0.003125]}


Train batch number 81: 100%|██████████| 82/82 [00:19<00:00,  4.28batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.525810875543733, 'test_loss': 14.552533197402955, 'bleu': 0.40583, 'gen_len': 0.6}




 94%|█████████▍| 51/54 [32:59<01:56, 38.79s/it]

For epoch 396: {Learning rate: [0.0031211010589932645]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.37batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.85s/batches]



Metrics: {'train_loss': 12.664681830057283, 'test_loss': 14.71955394744873, 'bleu': 0.0, 'gen_len': 0.7}




 96%|█████████▋| 52/54 [33:38<01:17, 38.87s/it]

For epoch 397: {Learning rate: [0.0031172166753363527]}


Train batch number 81: 100%|██████████| 82/82 [00:20<00:00,  4.04batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:18<00:00,  1.81s/batches]



Metrics: {'train_loss': 12.545361315331808, 'test_loss': 15.36596736907959, 'bleu': 0.0, 'gen_len': 0.7}




 98%|█████████▊| 53/54 [34:18<00:39, 39.30s/it]

For epoch 398: {Learning rate: [0.0031133467586669868]}


Train batch number 81: 100%|██████████| 82/82 [00:18<00:00,  4.34batches/s]
Test batch number 9: 100%|██████████| 10/10 [00:17<00:00,  1.80s/batches]



Metrics: {'train_loss': 12.49540159760452, 'test_loss': 13.923585891723633, 'bleu': 0.48262, 'gen_len': 0.6}




100%|██████████| 54/54 [34:57<00:00, 38.84s/it]
