Fine-tuning best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on sentences taken from the book `Grammaire de Wolof Moderne` by Pathe Diagne, in addition to sentences taken from the `Wolof version of L'Africain` by Daouda Ndiaye. Below, we provide the main evaluation figures obtained from the hyperparameter search step. We will evaluate the training on the validation dataset.

- Parallel coordinates from panel:

- Parameter importance chart: 
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

We can see in the above chart that the batch size is the most important parameter and that it is negatively correlated with the BLEU score (meaning that a lower batch size is better). Next, the probability of modifying a character in the French corpus is also important, and a high probability provides a better BLEU score.

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed, AdamW, get_linear_schedule_with_warmup, T5ForConditionalGeneration,\
    get_cosine_schedule_with_warmup, Adafactor
from wolof_translate.utils.sent_transformers import TransformerSequences
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v3 import SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

os.environ["WANDB_DISABLED"] = "true"



## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v4.json")


In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float, max_len: int):
  """Load the augmented training dataset and the validation dataset."""

  # Create the keyboard-typo augmentation applied to the French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p,
                                                         aug_word_max=max_len),
                                        remove_mark_space, delete_guillemet_space)

  # Load the training dataset (French sentences are augmented on the fly)
  train_dataset_aug = SentenceDataset("data/extractions/new_data/train_set.csv", max_len=max_len,
                                        tokenizer=tokenizer,
                                        truncation=True,
                                        cp1_transformer=fr_augmentation)

  # Load the validation dataset (no augmentation)
  valid_dataset = SentenceDataset("data/extractions/new_data/valid_set.csv", max_len=max_len,
                                        tokenizer=tokenizer,
                                        truncation=True)

  # Return the datasets
  return train_dataset_aug, valid_dataset
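
To make the roles of `fr_char_p` and `fr_word_p` more concrete, here is a minimal, dependency-free sketch of a word-then-character perturbation. It is not nlpaug's `KeyboardAug` implementation: the function name `perturb_sentence`, the random lowercase substitution, and the independent per-word and per-character sampling are assumptions made for illustration only (the real augmenter substitutes keyboard-adjacent characters).

```python
import random
import string


def perturb_sentence(sentence: str, char_p: float, word_p: float, seed: int = 0) -> str:
    """Hypothetical stand-in for a character-level augmenter.

    Each word is selected with probability `word_p`; inside a selected word,
    each alphabetic character is replaced with probability `char_p`.
    """
    rng = random.Random(seed)
    words = []
    for word in sentence.split(" "):
        if rng.random() < word_p:
            # Replace some letters of the selected word by random lowercase letters
            chars = [
                rng.choice(string.ascii_lowercase)
                if c.isalpha() and rng.random() < char_p
                else c
                for c in word
            ]
            word = "".join(chars)
        words.append(word)
    return " ".join(words)


# Example call with probabilities close to the ones used in this notebook
noisy = perturb_sentence("je parle wolof", char_p=0.34, word_p=0.12)
```

Because characters are replaced one for one, the sentence length and the number of words are preserved; only the spelling of the selected words changes.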

### Configure the model and the evaluation function ⚙️

Let us evaluate the predictions with the `bleu` metric.

In [4]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        # some models return a tuple; the predicted token ids come first
        if isinstance(preds, tuple):

            preds = preds[0]

        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # replace the ignore index (-100) with the pad token id so that the labels can be decoded
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)

        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        # strip the texts and wrap each reference in a list, as expected by the metric
        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)

        result = {"bleu": result["score"]}

        # number of non-padding tokens in each prediction
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]

        result["gen_len"] = np.mean(prediction_lens)

        result = {k: round(v, 4) for k, v in result.items()}

        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
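
The two least obvious steps in `compute_metrics` are the replacement of the ignore index `-100` before decoding and the computation of `gen_len`. The following sketch reproduces both on toy token ids with plain Python lists. The pad id `0`, the toy ids, and the helper names `unmask_labels` and `mean_generated_length` are assumptions for illustration, not values or functions taken from the actual tokenizer or library.

```python
PAD_ID = 0           # assumed pad token id, for illustration only
IGNORE_INDEX = -100  # value used to mask label positions that the loss ignores


def unmask_labels(labels):
    """Replace the ignore index with the pad id so the ids can be decoded."""
    return [[PAD_ID if tok == IGNORE_INDEX else tok for tok in row] for row in labels]


def mean_generated_length(preds):
    """Average number of non-padding tokens per prediction (the `gen_len` value)."""
    lengths = [sum(tok != PAD_ID for tok in row) for row in preds]
    return sum(lengths) / len(lengths)


# Toy batch: two label sequences padded with -100, two predictions padded with 0
labels = [[12, 7, 1, -100, -100], [5, 1, -100, -100, -100]]
preds = [[12, 7, 1, 0, 0], [5, 9, 4, 1, 0]]

clean_labels = unmask_labels(labels)      # -100 entries become PAD_ID
gen_len = mean_generated_length(preds)    # (3 + 4) / 2 non-padding tokens
```

Without the first step, decoding would fail or produce garbage, since `-100` is not a valid token id.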


Let us initialize the evaluation object.

In [5]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

In [6]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data



In [7]:
# let us initialize the hyperparameter configuration
config = {
    'random_state': 0,
    'fr_char_p': 0.33865575033761214,
    'fr_word_p': 0.1215427458724321,
    'learning_rate': 0.009397216172457796,
    'weight_decay': 0.036296976653260773,
    'batch_size': 16,
    'warmup_ratio': 0.0,
    'max_epoch': 1175,
    'max_len': 104,
    'bleu': 0.5746,
    'model_dir': 'data/checkpoints/fw_t5_small_custom_train_v4_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_small_custom_train_results_fw_v4/'
}

# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 1, evaluation = evaluation, optimizer=Adafactor)

# split the data
split_data(config['random_state'], csv_file="corpora_v4.csv")

# load the train and validation sets (the validation set is stored in `test_dataset`)
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'],
                                                    max_len=config['max_len'])

# let us calculate the total number of training steps and the warmup steps (the max epoch comes from the config)
length = len(train_dataset)

n_steps = length // config['batch_size']

num_steps = config['max_epoch'] * n_steps

warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

# Initialize the scheduler parameters
scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    # 'betas': (0.9, 0.98),
    'relative_step': False
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=get_linear_schedule_with_warmup,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/t5_small_custom_train_fw_v4"
                )

# We resume training from a checkpoint, so let us load the model
trainer.load(config['model_dir'], load_best=True) # Only for the first loading
# trainer.load(config['new_model_dir'])
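
As a sanity check on the step arithmetic in the cell above, the sketch below re-implements the linear warmup and decay rule in plain Python and compares it with the learning rate logged for epoch 4 in the training output. The function name `linear_lr` is ours. The value of 126 scheduled steps per epoch is an assumption inferred from the logged learning rates (it is what `length // 16` would give if the last batch is partial); 127 is the number of training batches shown in the progress bars.

```python
def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to `base_lr`, then linear decay to zero at `total_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


base_lr = 0.009397216172457796   # config['learning_rate']
max_epoch = 1175                 # config['max_epoch']
scheduled_steps_per_epoch = 126  # assumption: len(train_dataset) // batch_size
actual_batches_per_epoch = 127   # read from the training progress bars

# Total number of steps given to the scheduler (warmup_ratio is 0.0)
total_steps = max_epoch * scheduled_steps_per_epoch

# Learning rate after 3 full epochs, i.e. the value reported "For epoch 4"
lr_epoch_4 = linear_lr(3 * actual_batches_per_epoch, base_lr, 0, total_steps)

# Epoch at which the schedule would reach zero under these assumptions
zero_epoch = total_steps / actual_batches_per_epoch
```

Under these assumptions the sketch gives roughly 0.0093730 for epoch 4, in line with the log, and suggests that the learning rate would reach zero near epoch 1166, slightly before `max_epoch`, because floor division ignores the final partial batch of each epoch. Rounding the number of steps per epoch up (for example with `math.ceil`) would be one way to align the two counts.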

        

In [8]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 4: {Learning rate: [0.009373032860321989]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.22batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.65batches/s]



Metrics: {'train_loss': 0.6396655408650871, 'test_loss': 0.8404125978549322, 'bleu': 0.6164, 'gen_len': 12.0844}




  0%|          | 1/1172 [01:11<23:22:31, 71.86s/it]

For epoch 5: {Learning rate: [0.009364971756276718]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.32batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.87batches/s]



Metrics: {'train_loss': 0.5658696134259381, 'test_loss': 0.8473371063669523, 'bleu': 0.4908, 'gen_len': 10.1778}




  0%|          | 2/1172 [02:17<22:07:05, 68.06s/it]

For epoch 6: {Learning rate: [0.009356910652231449]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.44batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:06<00:00,  2.30batches/s]



Metrics: {'train_loss': 0.49778958122561295, 'test_loss': 0.8723774358630181, 'bleu': 0.9595, 'gen_len': 11.4044}




  0%|          | 3/1172 [03:18<21:09:48, 65.17s/it]

For epoch 7: {Learning rate: [0.00934884954818618]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.42908084885341913, 'test_loss': 0.9052557662129402, 'bleu': 0.7864, 'gen_len': 11.2311}




  0%|          | 4/1172 [04:21<20:45:52, 64.00s/it]

For epoch 8: {Learning rate: [0.00934078844414091]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.54batches/s]



Metrics: {'train_loss': 0.35951348534953875, 'test_loss': 0.9593725110093753, 'bleu': 1.0326, 'gen_len': 10.2133}




  0%|          | 5/1172 [05:30<21:25:07, 66.07s/it]

For epoch 9: {Learning rate: [0.00933272734009564]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.36batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.45batches/s]



Metrics: {'train_loss': 0.30335200492908637, 'test_loss': 0.9855174926420053, 'bleu': 1.2899, 'gen_len': 11.9378}




  1%|          | 6/1172 [06:38<21:32:18, 66.50s/it]

For epoch 10: {Learning rate: [0.009324666236050373]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.28batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.25331692374128056, 'test_loss': 1.0313124624391397, 'bleu': 1.8498, 'gen_len': 11.44}




  1%|          | 7/1172 [07:44<21:29:45, 66.43s/it]

For epoch 11: {Learning rate: [0.009316605132005102]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.39batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.02batches/s]



Metrics: {'train_loss': 0.20675146122147717, 'test_loss': 1.0892622527976832, 'bleu': 2.0299, 'gen_len': 11.3778}




  1%|          | 8/1172 [08:48<21:11:59, 65.57s/it]

For epoch 12: {Learning rate: [0.009308544027959833]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.31batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.76batches/s]



Metrics: {'train_loss': 0.16950597643382906, 'test_loss': 1.122325330848495, 'bleu': 2.0449, 'gen_len': 11.9822}




  1%|          | 9/1172 [09:54<21:15:15, 65.79s/it]

For epoch 13: {Learning rate: [0.009300482923914563]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.23batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.82batches/s]



Metrics: {'train_loss': 0.13726942035860903, 'test_loss': 1.168237425883611, 'bleu': 1.9491, 'gen_len': 11.3289}




  1%|          | 10/1172 [11:02<21:28:46, 66.55s/it]

For epoch 14: {Learning rate: [0.009292421819869293]}


Train batch number 127: 100%|██████████| 127/127 [01:02<00:00,  2.04batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.54batches/s]



Metrics: {'train_loss': 0.11747406276426917, 'test_loss': 1.1858299940824508, 'bleu': 2.2116, 'gen_len': 12.1467}




  1%|          | 11/1172 [12:18<22:21:48, 69.34s/it]

For epoch 15: {Learning rate: [0.009284360715824024]}


Train batch number 127: 100%|██████████| 127/127 [01:00<00:00,  2.10batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.13batches/s]



Metrics: {'train_loss': 0.09862760216819019, 'test_loss': 1.2158544386426609, 'bleu': 2.2682, 'gen_len': 11.5422}




  1%|          | 12/1172 [13:29<22:29:03, 69.78s/it]

For epoch 16: {Learning rate: [0.009276299611778754]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.93batches/s]



Metrics: {'train_loss': 0.08470561959612088, 'test_loss': 1.2195446595549584, 'bleu': 2.4208, 'gen_len': 12.0978}




  1%|          | 13/1172 [14:35<22:08:28, 68.77s/it]

For epoch 17: {Learning rate: [0.009268238507733485]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.11batches/s]



Metrics: {'train_loss': 0.07481852125405795, 'test_loss': 1.2567684551080067, 'bleu': 2.5894, 'gen_len': 11.9422}




  1%|          | 14/1172 [15:38<21:32:37, 66.98s/it]

For epoch 18: {Learning rate: [0.009260177403688216]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.45batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.12batches/s]



Metrics: {'train_loss': 0.06829568648373517, 'test_loss': 1.2564526585241158, 'bleu': 2.434, 'gen_len': 12.2489}




  1%|▏         | 15/1172 [16:39<20:57:11, 65.20s/it]

For epoch 19: {Learning rate: [0.009252116299642947]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.06131653473015845, 'test_loss': 1.2523178145289422, 'bleu': 3.0307, 'gen_len': 12.28}




  1%|▏         | 16/1172 [17:42<20:41:32, 64.44s/it]

For epoch 20: {Learning rate: [0.009244055195597676]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.40batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.89batches/s]



Metrics: {'train_loss': 0.054507003613109666, 'test_loss': 1.251902869095405, 'bleu': 2.4038, 'gen_len': 11.6311}




  1%|▏         | 17/1172 [18:45<20:33:53, 64.10s/it]

For epoch 21: {Learning rate: [0.009235994091552407]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.37batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:06<00:00,  2.17batches/s]



Metrics: {'train_loss': 0.05156587171331635, 'test_loss': 1.2569687803586325, 'bleu': 2.5742, 'gen_len': 12.4489}




  2%|▏         | 18/1172 [19:48<20:24:50, 63.68s/it]

For epoch 22: {Learning rate: [0.009227932987507138]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.00batches/s]



Metrics: {'train_loss': 0.04465526181793823, 'test_loss': 1.2829449673493702, 'bleu': 2.715, 'gen_len': 11.7067}




  2%|▏         | 19/1172 [20:51<20:18:21, 63.40s/it]

For epoch 23: {Learning rate: [0.009219871883461869]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.32batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.66batches/s]



Metrics: {'train_loss': 0.042608285601448824, 'test_loss': 1.2911169464389483, 'bleu': 2.3709, 'gen_len': 11.6711}




  2%|▏         | 20/1172 [21:56<20:31:08, 64.12s/it]

For epoch 24: {Learning rate: [0.0092118107794166]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.39batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.04329907415273387, 'test_loss': 1.2854806527495384, 'bleu': 2.299, 'gen_len': 11.4578}




  2%|▏         | 21/1172 [22:59<20:23:19, 63.77s/it]

For epoch 25: {Learning rate: [0.009203749675371329]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.42batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.89batches/s]



Metrics: {'train_loss': 0.0389226196245767, 'test_loss': 1.2795045914749303, 'bleu': 2.716, 'gen_len': 11.8178}




  2%|▏         | 22/1172 [24:03<20:19:54, 63.65s/it]

For epoch 26: {Learning rate: [0.00919568857132606]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.03928799622171507, 'test_loss': 1.2956522847215335, 'bleu': 2.6126, 'gen_len': 11.8267}




  2%|▏         | 23/1172 [25:06<20:16:45, 63.54s/it]

For epoch 27: {Learning rate: [0.00918762746728079]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.29batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.81batches/s]



Metrics: {'train_loss': 0.03471670787429481, 'test_loss': 1.2946669047077497, 'bleu': 2.5913, 'gen_len': 11.7556}




  2%|▏         | 24/1172 [26:12<20:31:59, 64.39s/it]

For epoch 28: {Learning rate: [0.009179566363235522]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.27batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.98batches/s]



Metrics: {'train_loss': 0.03405567647991922, 'test_loss': 1.2661456694205602, 'bleu': 2.6605, 'gen_len': 11.7467}




  2%|▏         | 25/1172 [27:19<20:43:55, 65.07s/it]

For epoch 29: {Learning rate: [0.009171505259190253]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.29batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.78batches/s]



Metrics: {'train_loss': 0.03291528374983335, 'test_loss': 1.286092329521974, 'bleu': 3.0035, 'gen_len': 12.4}




  2%|▏         | 26/1172 [28:26<20:51:10, 65.51s/it]

For epoch 30: {Learning rate: [0.009163444155144983]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.0320729845108127, 'test_loss': 1.2685337488849957, 'bleu': 2.8018, 'gen_len': 12.3467}




  2%|▏         | 27/1172 [29:29<20:38:00, 64.87s/it]

For epoch 31: {Learning rate: [0.009155383051099713]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.36batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.90batches/s]



Metrics: {'train_loss': 0.029723278730814384, 'test_loss': 1.2981869881351789, 'bleu': 2.7537, 'gen_len': 12.1733}




  2%|▏         | 28/1172 [30:33<20:31:55, 64.61s/it]

For epoch 32: {Learning rate: [0.009147321947054443]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.89batches/s]



Metrics: {'train_loss': 0.02869540852619203, 'test_loss': 1.2921462843815485, 'bleu': 2.8654, 'gen_len': 11.7778}




  2%|▏         | 29/1172 [31:37<20:29:22, 64.53s/it]

For epoch 33: {Learning rate: [0.009139260843009174]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.44batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.79batches/s]



Metrics: {'train_loss': 0.02840149535350208, 'test_loss': 1.3130461330215135, 'bleu': 3.294, 'gen_len': 14.8356}




  3%|▎         | 30/1172 [32:41<20:23:00, 64.26s/it]

For epoch 34: {Learning rate: [0.009131199738963905]}


Train batch number 127: 100%|██████████| 127/127 [00:56<00:00,  2.27batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.96batches/s]



Metrics: {'train_loss': 0.02682523680924196, 'test_loss': 1.3025061771273614, 'bleu': 2.6165, 'gen_len': 12.0978}




  3%|▎         | 31/1172 [33:47<20:33:18, 64.85s/it]

For epoch 35: {Learning rate: [0.009123138634918636]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.66batches/s]



Metrics: {'train_loss': 0.02616055567902843, 'test_loss': 1.2735323302447796, 'bleu': 3.3245, 'gen_len': 12.5111}




  3%|▎         | 32/1172 [34:52<20:30:04, 64.74s/it]

For epoch 36: {Learning rate: [0.009115077530873365]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.34batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.86batches/s]



Metrics: {'train_loss': 0.025541117766828048, 'test_loss': 1.2743814662098885, 'bleu': 2.7384, 'gen_len': 11.9289}




  3%|▎         | 33/1172 [35:57<20:31:54, 64.89s/it]

For epoch 37: {Learning rate: [0.009107016426828096]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.50batches/s]



Metrics: {'train_loss': 0.023586882560653246, 'test_loss': 1.2806941814720632, 'bleu': 2.6625, 'gen_len': 11.8133}




  3%|▎         | 34/1172 [37:04<20:44:25, 65.61s/it]

For epoch 38: {Learning rate: [0.009098955322782827]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.47batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.05batches/s]



Metrics: {'train_loss': 0.023501737303591855, 'test_loss': 1.2785256519913673, 'bleu': 2.6888, 'gen_len': 12.0089}




  3%|▎         | 35/1172 [38:05<20:18:10, 64.28s/it]

For epoch 39: {Learning rate: [0.009090894218737558]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.48batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.69batches/s]



Metrics: {'train_loss': 0.021979067518603145, 'test_loss': 1.2787995643913745, 'bleu': 2.6452, 'gen_len': 11.8978}




  3%|▎         | 36/1172 [39:08<20:08:28, 63.83s/it]

For epoch 40: {Learning rate: [0.009082833114692289]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.45batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.07batches/s]



Metrics: {'train_loss': 0.022176320525252913, 'test_loss': 1.2804197664062182, 'bleu': 2.9022, 'gen_len': 11.7244}




  3%|▎         | 37/1172 [40:09<19:52:51, 63.06s/it]

For epoch 41: {Learning rate: [0.00907477201064702]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.45batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.12batches/s]



Metrics: {'train_loss': 0.021343534420002398, 'test_loss': 1.2591455115626256, 'bleu': 2.9153, 'gen_len': 11.7911}




  3%|▎         | 38/1172 [41:10<19:40:21, 62.45s/it]

For epoch 42: {Learning rate: [0.009066710906601749]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.47batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.07batches/s]



Metrics: {'train_loss': 0.020047769658120832, 'test_loss': 1.2841236010193824, 'bleu': 2.7043, 'gen_len': 11.4711}




  3%|▎         | 39/1172 [42:11<19:31:25, 62.03s/it]

For epoch 43: {Learning rate: [0.00905864980255648]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.48batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.019886214227422955, 'test_loss': 1.2833851118882498, 'bleu': 2.9588, 'gen_len': 11.8444}




  3%|▎         | 40/1172 [43:13<19:25:06, 61.75s/it]

For epoch 44: {Learning rate: [0.00905058869851121]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.45batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.99batches/s]



Metrics: {'train_loss': 0.019945380050600984, 'test_loss': 1.2755233585834502, 'bleu': 3.0222, 'gen_len': 12.5467}




  3%|▎         | 41/1172 [44:14<19:23:21, 61.72s/it]

For epoch 45: {Learning rate: [0.00904252759446594]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.47batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.00batches/s]



Metrics: {'train_loss': 0.019396390762238754, 'test_loss': 1.2594849308331808, 'bleu': 3.0171, 'gen_len': 11.4933}




  4%|▎         | 42/1172 [45:15<19:19:45, 61.58s/it]

For epoch 46: {Learning rate: [0.009034466490420672]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.43batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.58batches/s]



Metrics: {'train_loss': 0.019182914812997803, 'test_loss': 1.2727840319275856, 'bleu': 3.0137, 'gen_len': 12.2222}




  4%|▎         | 43/1172 [46:20<19:33:33, 62.37s/it]

For epoch 47: {Learning rate: [0.009026405386375402]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.21batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.90batches/s]



Metrics: {'train_loss': 0.019076449173231293, 'test_loss': 1.285535177588463, 'bleu': 2.9168, 'gen_len': 11.6978}




  4%|▍         | 44/1172 [47:28<20:04:49, 64.09s/it]

For epoch 48: {Learning rate: [0.009018344282330133]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.37batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.81batches/s]



Metrics: {'train_loss': 0.01728074785915944, 'test_loss': 1.2777830402056376, 'bleu': 2.9735, 'gen_len': 11.6089}




  4%|▍         | 45/1172 [48:32<20:05:38, 64.19s/it]

For epoch 49: {Learning rate: [0.009010283178284863]}


Train batch number 127: 100%|██████████| 127/127 [00:59<00:00,  2.15batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.59batches/s]



Metrics: {'train_loss': 0.017291182655782448, 'test_loss': 1.2619646539290745, 'bleu': 3.1722, 'gen_len': 11.8933}




  4%|▍         | 46/1172 [49:44<20:45:42, 66.38s/it]

For epoch 50: {Learning rate: [0.009002222074239594]}


Train batch number 127: 100%|██████████| 127/127 [00:56<00:00,  2.25batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.94batches/s]



Metrics: {'train_loss': 0.016589510250895275, 'test_loss': 1.2690670763452847, 'bleu': 3.0906, 'gen_len': 11.9511}




  4%|▍         | 47/1172 [50:51<20:48:52, 66.61s/it]

For epoch 51: {Learning rate: [0.008994160970194323]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.35batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.87batches/s]



Metrics: {'train_loss': 0.015968054368006666, 'test_loss': 1.2776118824879328, 'bleu': 3.1052, 'gen_len': 12.3111}




  4%|▍         | 48/1172 [51:55<20:35:43, 65.96s/it]

For epoch 52: {Learning rate: [0.008986099866149056]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.28batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.82batches/s]



Metrics: {'train_loss': 0.01618501619531179, 'test_loss': 1.290345831712087, 'bleu': 3.0273, 'gen_len': 11.8267}




  4%|▍         | 49/1172 [53:02<20:36:26, 66.06s/it]

For epoch 53: {Learning rate: [0.008978038762103785]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.56batches/s]



Metrics: {'train_loss': 0.01655503252418492, 'test_loss': 1.2790705849726995, 'bleu': 2.7919, 'gen_len': 11.56}




  4%|▍         | 50/1172 [54:09<20:43:45, 66.51s/it]

For epoch 54: {Learning rate: [0.008969977658058516]}


Train batch number 127: 100%|██████████| 127/127 [00:58<00:00,  2.16batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.62batches/s]



Metrics: {'train_loss': 0.015960717584761815, 'test_loss': 1.2592861304680507, 'bleu': 3.2477, 'gen_len': 11.9689}




  4%|▍         | 51/1172 [55:20<21:08:10, 67.88s/it]

For epoch 55: {Learning rate: [0.008961916554013247]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.35batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.66batches/s]



Metrics: {'train_loss': 0.015214998617870953, 'test_loss': 1.277988630036513, 'bleu': 3.0649, 'gen_len': 11.9689}




  4%|▍         | 52/1172 [56:26<20:57:49, 67.38s/it]

For epoch 56: {Learning rate: [0.008953855449967976]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.66batches/s]



Metrics: {'train_loss': 0.014713839160307772, 'test_loss': 1.264412888387839, 'bleu': 3.2779, 'gen_len': 11.96}




  5%|▍         | 53/1172 [57:33<20:50:00, 67.02s/it]

For epoch 57: {Learning rate: [0.008945794345922707]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.31batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.97batches/s]



Metrics: {'train_loss': 0.014775067956487494, 'test_loss': 1.2535011400779088, 'bleu': 2.921, 'gen_len': 12.0267}




  5%|▍         | 54/1172 [58:39<20:42:50, 66.70s/it]

For epoch 58: {Learning rate: [0.008937733241877438]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.29batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.88batches/s]



Metrics: {'train_loss': 0.015316532240945875, 'test_loss': 1.2525913447141648, 'bleu': 3.0528, 'gen_len': 11.7244}




  5%|▍         | 55/1172 [59:45<20:38:22, 66.52s/it]

For epoch 59: {Learning rate: [0.008929672137832169]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.34batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.75batches/s]



Metrics: {'train_loss': 0.015251041267169859, 'test_loss': 1.2623561958471934, 'bleu': 2.801, 'gen_len': 11.6044}




  5%|▍         | 56/1172 [1:00:50<20:31:29, 66.21s/it]

For epoch 60: {Learning rate: [0.0089216110337869]}


Train batch number 127: 100%|██████████| 127/127 [00:59<00:00,  2.13batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.59batches/s]



Metrics: {'train_loss': 0.014266353899862354, 'test_loss': 1.2725721513231596, 'bleu': 2.8484, 'gen_len': 12.04}




  5%|▍         | 57/1172 [1:02:02<20:59:40, 67.78s/it]

For epoch 61: {Learning rate: [0.00891354992974163]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.35batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.87batches/s]



Metrics: {'train_loss': 0.014286225778967376, 'test_loss': 1.2677295366923014, 'bleu': 2.6808, 'gen_len': 11.7111}




  5%|▍         | 58/1172 [1:03:07<20:42:59, 66.95s/it]

For epoch 62: {Learning rate: [0.00890548882569636]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.38batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.56batches/s]



Metrics: {'train_loss': 0.01468441715669327, 'test_loss': 1.254246364037196, 'bleu': 3.0833, 'gen_len': 12.4089}




  5%|▌         | 59/1172 [1:04:12<20:35:05, 66.58s/it]

For epoch 63: {Learning rate: [0.00889742772165109]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.36batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.62batches/s]



Metrics: {'train_loss': 0.013914496080553907, 'test_loss': 1.2674093589186668, 'bleu': 3.1902, 'gen_len': 12.2089}




  5%|▌         | 60/1172 [1:05:18<20:29:36, 66.35s/it]

For epoch 64: {Learning rate: [0.008889366617605822]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.37batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.04batches/s]



Metrics: {'train_loss': 0.013801654219612713, 'test_loss': 1.2633440082271894, 'bleu': 2.9605, 'gen_len': 11.9422}




  5%|▌         | 61/1172 [1:06:22<20:14:37, 65.60s/it]

For epoch 65: {Learning rate: [0.008881305513560553]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.32batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.97batches/s]



Metrics: {'train_loss': 0.013372923196183415, 'test_loss': 1.2586288223663966, 'bleu': 3.2671, 'gen_len': 12.1556}




  5%|▌         | 62/1172 [1:07:27<20:08:04, 65.30s/it]

For epoch 66: {Learning rate: [0.008873244409515283]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.44batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.10batches/s]



Metrics: {'train_loss': 0.012618114299133537, 'test_loss': 1.253600569566091, 'bleu': 3.1416, 'gen_len': 12.2133}




  5%|▌         | 63/1172 [1:08:28<19:43:49, 64.05s/it]

For epoch 67: {Learning rate: [0.008865183305470013]}


Train batch number 127: 100%|██████████| 127/127 [00:51<00:00,  2.46batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.72batches/s]



Metrics: {'train_loss': 0.011483285317608104, 'test_loss': 1.244955080250899, 'bleu': 3.0369, 'gen_len': 11.7422}




  5%|▌         | 64/1172 [1:09:30<19:35:06, 63.63s/it]

For epoch 68: {Learning rate: [0.008857122201424743]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.73batches/s]



Metrics: {'train_loss': 0.011518455312154658, 'test_loss': 1.2464750468730927, 'bleu': 3.2783, 'gen_len': 11.7378}




  6%|▌         | 65/1172 [1:10:36<19:45:20, 64.25s/it]

For epoch 69: {Learning rate: [0.008849061097379474]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.28batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.81batches/s]



Metrics: {'train_loss': 0.01128906418393388, 'test_loss': 1.2438841914137204, 'bleu': 3.5055, 'gen_len': 11.8933}




  6%|▌         | 66/1172 [1:11:44<20:02:49, 65.25s/it]

For epoch 70: {Learning rate: [0.008840999993334205]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.35batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.86batches/s]



Metrics: {'train_loss': 0.01139642784782634, 'test_loss': 1.2490117887655894, 'bleu': 3.1016, 'gen_len': 11.7022}




  6%|▌         | 67/1172 [1:12:48<19:58:08, 65.06s/it]

For epoch 71: {Learning rate: [0.008832938889288936]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.37batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.77batches/s]



Metrics: {'train_loss': 0.010796213924048805, 'test_loss': 1.2551923568050067, 'bleu': 3.0572, 'gen_len': 11.9556}




  6%|▌         | 68/1172 [1:13:52<19:52:03, 64.79s/it]

For epoch 72: {Learning rate: [0.008824877785243667]}


Train batch number 127: 100%|██████████| 127/127 [00:56<00:00,  2.27batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.92batches/s]



Metrics: {'train_loss': 0.011415059196095414, 'test_loss': 1.2575892224907874, 'bleu': 3.0184, 'gen_len': 11.7467}




  6%|▌         | 69/1172 [1:14:59<19:58:31, 65.20s/it]

For epoch 73: {Learning rate: [0.008816816681198396]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.94batches/s]



Metrics: {'train_loss': 0.011356013984690735, 'test_loss': 1.2533033097783723, 'bleu': 2.9738, 'gen_len': 11.8}




  6%|▌         | 70/1172 [1:16:03<19:55:36, 65.10s/it]

For epoch 74: {Learning rate: [0.008808755577153127]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.27batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.05batches/s]



Metrics: {'train_loss': 0.01157918280379216, 'test_loss': 1.2663427953918776, 'bleu': 2.7826, 'gen_len': 11.9911}




  6%|▌         | 71/1172 [1:17:10<20:01:02, 65.45s/it]

For epoch 75: {Learning rate: [0.008800694473107858]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.32batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.90batches/s]



Metrics: {'train_loss': 0.010977354973158616, 'test_loss': 1.24895894775788, 'bleu': 3.1457, 'gen_len': 12.0178}




  6%|▌         | 72/1172 [1:18:15<19:58:36, 65.38s/it]

For epoch 76: {Learning rate: [0.008792633369062589]}


Train batch number 127: 100%|██████████| 127/127 [00:56<00:00,  2.23batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.83batches/s]



Metrics: {'train_loss': 0.01068192625997632, 'test_loss': 1.237419489522775, 'bleu': 3.1233, 'gen_len': 12.0844}




  6%|▌         | 73/1172 [1:19:23<20:10:16, 66.08s/it]

For epoch 77: {Learning rate: [0.00878457226501732]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.29batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.81batches/s]



Metrics: {'train_loss': 0.010671870625306537, 'test_loss': 1.2685782119631768, 'bleu': 3.0597, 'gen_len': 11.84}




  6%|▋         | 74/1172 [1:20:29<20:13:30, 66.31s/it]

For epoch 78: {Learning rate: [0.008776511160972049]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.31batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.65batches/s]



Metrics: {'train_loss': 0.010841947168067802, 'test_loss': 1.238839610417684, 'bleu': 3.0547, 'gen_len': 12.0356}




  6%|▋         | 75/1172 [1:21:37<20:19:52, 66.72s/it]

For epoch 79: {Learning rate: [0.00876845005692678]}


Train batch number 127: 100%|██████████| 127/127 [01:02<00:00,  2.03batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.44batches/s]



Metrics: {'train_loss': 0.010681639848951631, 'test_loss': 1.264217951397101, 'bleu': 3.0239, 'gen_len': 11.8178}




  6%|▋         | 76/1172 [1:22:53<21:09:14, 69.48s/it]

For epoch 80: {Learning rate: [0.00876038895288151]}


Train batch number 127: 100%|██████████| 127/127 [01:01<00:00,  2.06batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.52batches/s]



Metrics: {'train_loss': 0.010533636670530313, 'test_loss': 1.2479702283938725, 'bleu': 3.1835, 'gen_len': 11.5067}




  7%|▋         | 77/1172 [1:24:08<21:39:36, 71.21s/it]

For epoch 81: {Learning rate: [0.008752327848836242]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.23batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.79batches/s]



Metrics: {'train_loss': 0.010373710476844681, 'test_loss': 1.2457130586107572, 'bleu': 3.1153, 'gen_len': 12.1733}




  7%|▋         | 78/1172 [1:25:18<21:29:56, 70.75s/it]

For epoch 82: {Learning rate: [0.008744266744790972]}


Train batch number 127: 100%|██████████| 127/127 [01:02<00:00,  2.02batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.47batches/s]



Metrics: {'train_loss': 0.009749921220244737, 'test_loss': 1.2578890115022658, 'bleu': 3.4427, 'gen_len': 12.0356}




  7%|▋         | 79/1172 [1:26:34<21:55:32, 72.22s/it]

For epoch 83: {Learning rate: [0.008736205640745703]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.21batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.84batches/s]



Metrics: {'train_loss': 0.008704951016496367, 'test_loss': 1.2561772247155507, 'bleu': 3.5245, 'gen_len': 11.8178}




  7%|▋         | 80/1172 [1:27:43<21:37:13, 71.28s/it]

For epoch 84: {Learning rate: [0.008728144536700433]}


Train batch number 127: 100%|██████████| 127/127 [01:02<00:00,  2.02batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:16<00:00,  1.13s/batches]



Metrics: {'train_loss': 0.009714044119033404, 'test_loss': 1.2473492483297983, 'bleu': 3.1548, 'gen_len': 11.8089}




  7%|▋         | 81/1172 [1:29:06<22:41:11, 74.86s/it]

For epoch 85: {Learning rate: [0.008720083432655163]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.20batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.00batches/s]



Metrics: {'train_loss': 0.009715430532017445, 'test_loss': 1.2578106611967086, 'bleu': 3.0339, 'gen_len': 11.6533}




  7%|▋         | 82/1172 [1:30:14<22:03:48, 72.87s/it]

For epoch 86: {Learning rate: [0.008712022328609894]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.73batches/s]



Metrics: {'train_loss': 0.009467858157494641, 'test_loss': 1.2422298987706502, 'bleu': 3.0377, 'gen_len': 11.7689}




  7%|▋         | 83/1172 [1:31:20<21:23:49, 70.73s/it]

For epoch 87: {Learning rate: [0.008703961224564623]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.33batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.72batches/s]



Metrics: {'train_loss': 0.009166292780203613, 'test_loss': 1.2621286322673162, 'bleu': 2.9259, 'gen_len': 11.9022}




  7%|▋         | 84/1172 [1:32:26<20:55:25, 69.23s/it]

For epoch 88: {Learning rate: [0.008695900120519356]}


Train batch number 127: 100%|██████████| 127/127 [00:57<00:00,  2.20batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:12<00:00,  1.23batches/s]



Metrics: {'train_loss': 0.009470785187014679, 'test_loss': 1.241885278125604, 'bleu': 3.2487, 'gen_len': 11.4667}




  7%|▋         | 85/1172 [1:33:38<21:13:41, 70.31s/it]

For epoch 89: {Learning rate: [0.008687839016474085]}


Train batch number 127: 100%|██████████| 127/127 [01:03<00:00,  2.01batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.49batches/s]



Metrics: {'train_loss': 0.00923583906063238, 'test_loss': 1.240811205903689, 'bleu': 3.7498, 'gen_len': 12.0178}




  7%|▋         | 86/1172 [1:34:55<21:47:40, 72.25s/it]

For epoch 90: {Learning rate: [0.008679777912428816]}


Train batch number 127: 100%|██████████| 127/127 [00:58<00:00,  2.17batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:09<00:00,  1.64batches/s]



Metrics: {'train_loss': 0.008812962680153079, 'test_loss': 1.2554672588904698, 'bleu': 2.7211, 'gen_len': 11.48}




  7%|▋         | 87/1172 [1:36:06<21:37:06, 71.73s/it]

For epoch 91: {Learning rate: [0.008671716808383547]}


Train batch number 127: 100%|██████████| 127/127 [01:00<00:00,  2.11batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.47batches/s]



Metrics: {'train_loss': 0.008352183812026551, 'test_loss': 1.240593791504701, 'bleu': 3.0456, 'gen_len': 11.88}




  8%|▊         | 88/1172 [1:37:21<21:53:52, 72.72s/it]

For epoch 92: {Learning rate: [0.008663655704338278]}


Train batch number 127: 100%|██████████| 127/127 [01:01<00:00,  2.05batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.85batches/s]



Metrics: {'train_loss': 0.007838771295345088, 'test_loss': 1.260141459107399, 'bleu': 2.9027, 'gen_len': 11.5556}




  8%|▊         | 89/1172 [1:38:35<22:03:21, 73.32s/it]

For epoch 93: {Learning rate: [0.008655594600293007]}


Train batch number 127: 100%|██████████| 127/127 [00:58<00:00,  2.19batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.77batches/s]



Metrics: {'train_loss': 0.008547606161157564, 'test_loss': 1.244903488457203, 'bleu': 3.3342, 'gen_len': 12.0933}




  8%|▊         | 90/1172 [1:39:45<21:40:53, 72.14s/it]

For epoch 94: {Learning rate: [0.00864753349624774]}


Train batch number 127: 100%|██████████| 127/127 [00:55<00:00,  2.30batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.85batches/s]



Metrics: {'train_loss': 0.008085974222635777, 'test_loss': 1.2459709386030833, 'bleu': 2.8157, 'gen_len': 12.1689}




  8%|▊         | 91/1172 [1:40:51<21:05:04, 70.22s/it]

For epoch 95: {Learning rate: [0.008639472392202469]}


Train batch number 127: 100%|██████████| 127/127 [00:53<00:00,  2.37batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.68batches/s]



Metrics: {'train_loss': 0.00823400850869774, 'test_loss': 1.2464361131191253, 'bleu': 3.1481, 'gen_len': 11.72}




  8%|▊         | 92/1172 [1:41:56<20:35:58, 68.67s/it]

For epoch 96: {Learning rate: [0.0086314112881572]}


Train batch number 127: 100%|██████████| 127/127 [01:00<00:00,  2.09batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.42batches/s]



Metrics: {'train_loss': 0.008565918553444579, 'test_loss': 1.2408836990594865, 'bleu': 2.8425, 'gen_len': 11.6889}




  8%|▊         | 93/1172 [1:43:10<21:05:05, 70.35s/it]

For epoch 97: {Learning rate: [0.00862335018411193]}


Train batch number 127: 100%|██████████| 127/127 [01:03<00:00,  2.00batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.40batches/s]



Metrics: {'train_loss': 0.008901374671226881, 'test_loss': 1.2536312202612558, 'bleu': 3.2971, 'gen_len': 11.96}




  8%|▊         | 94/1172 [1:44:27<21:40:41, 72.39s/it]

For epoch 98: {Learning rate: [0.00861528908006666]}


Train batch number 127: 100%|██████████| 127/127 [01:00<00:00,  2.11batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:11<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.008308192878213572, 'test_loss': 1.2490251590808232, 'bleu': 3.282, 'gen_len': 11.7867}




  8%|▊         | 95/1172 [1:45:42<21:54:24, 73.23s/it]

For epoch 99: {Learning rate: [0.00860722797602139]}


Train batch number 127: 100%|██████████| 127/127 [01:06<00:00,  1.91batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  2.00batches/s]



Metrics: {'train_loss': 0.008718898024879337, 'test_loss': 1.2336109509070714, 'bleu': 3.2139, 'gen_len': 12.08}




  8%|▊         | 96/1172 [1:47:00<22:17:30, 74.58s/it]

For epoch 100: {Learning rate: [0.008599166871976122]}


Train batch number 127: 100%|██████████| 127/127 [01:01<00:00,  2.06batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:11<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.008138679501446566, 'test_loss': 1.252328943212827, 'bleu': 2.7632, 'gen_len': 11.8889}




  8%|▊         | 97/1172 [1:48:17<22:27:02, 75.18s/it]

For epoch 101: {Learning rate: [0.008591105767930853]}


Train batch number 127: 100%|██████████| 127/127 [00:59<00:00,  2.14batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.78batches/s]



Metrics: {'train_loss': 0.008658011974656852, 'test_loss': 1.2516943742831548, 'bleu': 3.3903, 'gen_len': 11.6933}




  8%|▊         | 98/1172 [1:49:27<21:59:04, 73.69s/it]

For epoch 102: {Learning rate: [0.008583044663885583]}


Train batch number 127: 100%|██████████| 127/127 [00:52<00:00,  2.41batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:07<00:00,  1.92batches/s]



Metrics: {'train_loss': 0.008323601119799583, 'test_loss': 1.2326380034287772, 'bleu': 3.0034, 'gen_len': 11.8978}




  8%|▊         | 99/1172 [1:50:29<20:58:43, 70.38s/it]

For epoch 103: {Learning rate: [0.008574983559840314]}


Train batch number 127: 100%|██████████| 127/127 [00:59<00:00,  2.14batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:10<00:00,  1.45batches/s]



Metrics: {'train_loss': 0.008425784555784478, 'test_loss': 1.2456810176372528, 'bleu': 3.5155, 'gen_len': 11.6133}




  9%|▊         | 100/1172 [1:51:42<21:10:18, 71.10s/it]

For epoch 104: {Learning rate: [0.008566922455795043]}


Train batch number 127: 100%|██████████| 127/127 [01:08<00:00,  1.86batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:11<00:00,  1.26batches/s]



Metrics: {'train_loss': 0.008773180291305964, 'test_loss': 1.247572856148084, 'bleu': 3.3875, 'gen_len': 11.9378}




  9%|▊         | 101/1172 [1:53:06<22:16:57, 74.90s/it]

For epoch 105: {Learning rate: [0.008558861351749776]}


Train batch number 127: 100%|██████████| 127/127 [01:01<00:00,  2.06batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.86batches/s]



Metrics: {'train_loss': 0.00891343998634733, 'test_loss': 1.2530828575293222, 'bleu': 3.304, 'gen_len': 11.9556}




  9%|▊         | 102/1172 [1:54:20<22:11:07, 74.64s/it]

For epoch 106: {Learning rate: [0.008550800247704505]}


Train batch number 127: 100%|██████████| 127/127 [00:54<00:00,  2.31batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:08<00:00,  1.72batches/s]



Metrics: {'train_loss': 0.007870847318966792, 'test_loss': 1.2429321552316348, 'bleu': 2.9701, 'gen_len': 11.6578}




  9%|▉         | 103/1172 [1:55:26<21:24:05, 72.07s/it]

For epoch 107: {Learning rate: [0.008542739143659236]}


Train batch number 127: 100%|██████████| 127/127 [00:56<00:00,  2.26batches/s]
Test batch number 15: 100%|██████████| 15/15 [00:11<00:00,  1.35batches/s]



Metrics: {'train_loss': 0.008017809968075062, 'test_loss': 1.2528435369332631, 'bleu': 2.9606, 'gen_len': 11.7111}




  9%|▉         | 104/1172 [1:56:36<21:11:17, 71.42s/it]

For epoch 108: {Learning rate: [0.008534678039613967]}


Train batch number 12:   9%|▊         | 11/127 [00:05<00:55,  2.09batches/s]
  9%|▉         | 104/1172 [1:56:46<19:59:10, 67.37s/it]


KeyboardInterrupt: 
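
The run above was stopped manually during epoch 108. Over the logged epochs the training loss keeps falling while the test loss stays near 1.25 and BLEU fluctuates between roughly 2.7 and 3.7, so it helps to find which epoch scored best before choosing a checkpoint. The snippet below is a minimal sketch that assumes the log was saved as plain text in the format printed above; `parse_log` and `best_epoch` are hypothetical helper names, not part of `wolof_translate`.

```python
import ast
import re

# Patterns for the two kinds of log lines printed by the training loop above.
EPOCH_RE = re.compile(r"^For epoch (\d+):")
METRICS_RE = re.compile(r"^Metrics: (\{.*\})\s*$")


def parse_log(text):
    """Return a list of (epoch, metrics_dict) pairs found in the log text.

    The epoch is None for a metrics line whose "For epoch" header is missing.
    """
    records, epoch = [], None
    for line in text.splitlines():
        line = line.strip()
        match = EPOCH_RE.match(line)
        if match:
            epoch = int(match.group(1))
            continue
        match = METRICS_RE.match(line)
        if match:
            # The metrics are printed as a Python dict literal.
            records.append((epoch, ast.literal_eval(match.group(1))))
    return records


def best_epoch(records, key="bleu", higher_is_better=True):
    """Pick the (epoch, metrics) pair with the best value for `key`."""
    pick = max if higher_is_better else min
    return pick(records, key=lambda record: record[1][key])


# Two epochs copied from the log above.
sample = """
For epoch 89: {Learning rate: [0.008687839016474085]}
Metrics: {'train_loss': 0.00923583906063238, 'test_loss': 1.240811205903689, 'bleu': 3.7498, 'gen_len': 12.0178}
For epoch 90: {Learning rate: [0.008679777912428816]}
Metrics: {'train_loss': 0.008812962680153079, 'test_loss': 1.2554672588904698, 'bleu': 2.7211, 'gen_len': 11.48}
"""

epoch, metrics = best_epoch(parse_log(sample))
print(epoch, metrics["bleu"])  # 89 3.7498
```

Passing `key="test_loss", higher_is_better=False` selects the epoch with the lowest test loss instead, which need not be the epoch with the best BLEU.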

### Predictions and Evaluation

In [None]:
# let us get the best model
# model = T5ForConditionalGeneration.from_pretrained('data/checkpoints/t5_results_fw_v3/...')

# let us get the test set
test_dataset = SentenceDataset("data/extractions/new_data/test_set.csv",
                               tokenizer,
                               truncation=True)

Let us run the evaluation on the test set and collect the predicted sentences.

In [None]:
# evaluation with test set
df_ft_to_wf = trainer.evaluate(test_dataset)

Evaluation batch number 11: 100%|██████████| 11/11 [00:05<00:00,  1.88batches/s]


Let us display the last 10 sentences.

In [11]:
df_ft_to_wf.tail(10)

Unnamed: 0,original_text,original_label,predicted_label
152,"Homme, lion, boeuf... allaient de concert.","Nit, gayndé, nag... àndoon nañu fi.","Nit, gayndé, nag, àndoon nañu fi."
153,C'est toi qui eusses été élu,Yaa doonkoon falu,Yaa doonkoon wax
154,L'homme ne cultivera pas,Góor gi du bày,Góor gi bëggul
155,S'agiter simplement ne suffit à rien résoudre.,Di tel-teli doŋŋ taxul sotal dara.,Nit ñenn ñi yegseeguñu.
156,C'était son hôte habituellement.,Moo doon ganam.,Man xar mépp.
157,Je parle de ceux-là!,Yenn xar yooyuu laa wax!,Yaw moomu laa wax
158,Tu reconnais cet enfant-ci?,Xammee ŋga bee xale?,Xammee ŋga waa jooju?
159,"Alors l'homme entra, les enfants le virent, il...","Noona góor gi dugg, xale yi gis ka, mu toog, ñ...","Noona Góor gaa ŋgi, mu ñëw."
160,C'est leur ami!,Suñu xarit la!,Su demee
161,Il était Lebou de Yoff.,Mu doon Lebu Yoff.,Dafa doon nitu dëgg.
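
BLEU averages n-gram overlap over the corpus, so it does not show how many sentences are translated exactly as in the reference. As a complementary check, the sketch below computes an exact-match rate between the `original_label` and `predicted_label` columns. `exact_match_rate` is a hypothetical helper, and it assumes every label is a string.

```python
def exact_match_rate(references, predictions):
    """Share of predictions identical to their reference, ignoring outer whitespace."""
    pairs = list(zip(references, predictions))
    if not pairs:
        return 0.0
    hits = sum(ref.strip() == pred.strip() for ref, pred in pairs)
    return hits / len(pairs)


# Three rows copied from the table above (indices 153, 154 and 161).
refs = ["Yaa doonkoon falu", "Góor gi du bày", "Mu doon Lebu Yoff."]
preds = ["Yaa doonkoon wax", "Góor gi bëggul", "Dafa doon nitu dëgg."]
print(exact_match_rate(refs, preds))  # 0.0
```

On the full dataframe the call would be `exact_match_rate(df_ft_to_wf["original_label"], df_ft_to_wf["predicted_label"])`, since iterating over a pandas Series yields its values.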


Let us display 100 randomly sampled predictions.

In [12]:
# allow pandas to print up to 100 rows, then draw 100 random rows
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

Unnamed: 0,original_text,original_label,predicted_label
105,Qui est-ce?,Ñan la?,Ku mu?
80,Tu as dit cela.,La ŋga wax la.,Li ŋga wax loolu.
52,A Moussa!,Musaa!,Musaa
132,Je connais l'enfant.,Xam naa xale bi.,Xam naa xale bi.
59,L'homme qui eût travaillé,Waa ji liggéeykoon,Góor gi waxkoon na
54,Le voilà qui part!,Mi ŋgiiy!,Ma ŋgee doon dem
115,Que tu partes ou que tu ne partes pas il viendra.,Dana ñëw soo demul ag soo demee itam.,"Soo demee ag soo demul itam, dana ñëw."
114,C'est l'homme qui a soutenu qu'il est sain d'e...,"Góor gee ni nit la, soo demee!",Góor gee ni soo demee nit la
46,J'ai vu mes amis!,Gis naa sana xarit yi!,Gis naa sama xarit yeneen yooyuu
147,Appelle l'homme qui ne part pas,Wool góor gi dul dem,Wool góor gi dul dem
