Fine-tuning the best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the newly extracted sentences from the book **Grammaire de Wolof Moderne**, without considering the definitions. After a hyperparameter search with `wandb`, we obtained a best BLEU score of **4.281** for the French-to-Wolof translation model. Below, we provide the main evaluation figures obtained from the hyperparameter search step. Note that training is evaluated on the validation dataset.

- Parallel coordinates panel:

- Parameter importance chart:
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

We can see in the above chart that the batch size is the most important parameter, with a negative correlation with the BLEU score (meaning that a smaller batch size is better). Next, the probability of modifying a character in the French corpus is also important, and a high probability provides a better BLEU score.

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, AdamW, set_seed
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.data.dataset_v2 import T5SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
from functools import partial
from tqdm import tqdm
import pandas as pd
import numpy as np
import evaluate
import torch
import os

os.environ["WANDB_DISABLED"] = "true"



## French to Wolof

### Configure dataset 🔠

In [2]:
def split_data(random_state: int = 50):
  """Split data between train, validation and test sets

  Args:
    random_state (int): the seed of the splitting generator. Defaults to 50
  """
  # load the corpora and hold out 10% of the sentences as the test set
  corpora = pd.read_csv("data/extractions/new_data/sentences.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # hold out 10% of the remaining sentences as the validation set
  train_set, valid_set = train_test_split(train_set, test_size=0.1, random_state=random_state)

  # NOTE: this file is written after the validation split, so it has the same content as train_set.csv
  train_set.to_csv("data/extractions/new_data/final_train_set.csv", index=False)

  # let us save the sets
  train_set.to_csv("data/extractions/new_data/train_set.csv", index=False)

  valid_set.to_csv("data/extractions/new_data/valid_set.csv", index=False)

  test_set.to_csv("data/extractions/new_data/test_set.csv", index=False)
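
Two successive 10% hold-outs do not produce an 80/10/10 split: the validation set is 10% of the remaining 90%, so the proportions are roughly 81% / 9% / 10%. The sketch below checks this on a hypothetical corpus of 1000 row indices. The helper `hold_out` is hypothetical and uses only the standard library as a stand-in for `train_test_split`; it does not reproduce scikit-learn's shuffling, so only the set sizes are meaningful.

```python
import random


def hold_out(rows, percent, seed):
    """Shuffle `rows` and hold out `percent`% of them (rounded up).

    Hypothetical stand-in for train_test_split: only the resulting
    set sizes are of interest here, not the row assignment.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_held = (len(rows) * percent + 99) // 100  # ceiling of len * percent / 100
    return rows[n_held:], rows[:n_held]


corpus = range(1000)  # hypothetical corpus size, not the size of sentences.csv
train_valid, test = hold_out(corpus, percent=10, seed=0)
train, valid = hold_out(train_valid, percent=10, seed=0)

print(len(train), len(valid), len(test))  # 810 90 100
```

The three sets are disjoint by construction, which is the property that matters when the validation BLEU is used to select the best checkpoint.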

In [3]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [4]:
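def recuperate_datasets(fr_char_p: float, fr_word_p: float):
  """Build the augmented training dataset and the validation dataset

  Args:
    fr_char_p (float): value passed to `KeyboardAug` as `aug_char_p` (character-level augmentation rate)
    fr_word_p (float): value passed to `KeyboardAug` as `aug_word_p` (word-level augmentation rate)

  Returns:
    tuple: the augmented training dataset and the validation dataset
  """
  # Create the augmentation applied to the French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Load the training dataset (with augmentation)
  train_dataset_aug = T5SentenceDataset("data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Load the validation dataset (no augmentation)
  valid_dataset = T5SentenceDataset("data/extractions/new_data/valid_set.csv",
                                        tokenizer,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset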
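
To give an intuition for what `fr_char_p` and `fr_word_p` control, here is a minimal standard-library sketch of keyboard-style noise. It is a simplified illustration and not nlpaug's implementation: the function `keyboard_noise`, the tiny `NEIGHBOURS` map, and the per-word and per-character random draws are all assumptions made for the example.

```python
import random

# Tiny hypothetical neighbour map (a few AZERTY-adjacent keys); nlpaug ships its own layouts.
NEIGHBOURS = {"a": "zq", "e": "zr", "o": "ip", "u": "yi", "n": "b", "l": "km"}


def keyboard_noise(sentence, char_p, word_p, seed=0):
    """Replace characters with keyboard neighbours.

    Each word is selected with probability `word_p`; inside a selected word,
    each character that has neighbours is replaced with probability `char_p`.
    This is a simplified illustration, not nlpaug's selection logic.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split(" "):
        if rng.random() < word_p:
            word = "".join(
                rng.choice(NEIGHBOURS[c]) if c in NEIGHBOURS and rng.random() < char_p else c
                for c in word
            )
        noisy_words.append(word)
    return " ".join(noisy_words)


sentence = "bonjour tout le monde"
print(keyboard_noise(sentence, char_p=0.0, word_p=0.0))        # unchanged
print(keyboard_noise(sentence, char_p=0.6448, word_p=0.7690))  # rates rounded from the best run
```

With rates as high as those of the best run, most French source sentences seen during training contain typos, which acts as a strong regularizer on the encoder side; the validation set is left untouched.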

### Configure the model and the evaluation function ⚙️

Let us load the model and resize the token embeddings.

In [5]:
# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, add_cross_attention = True)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

Embedding(1942, 512)
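
The printed `Embedding(1942, 512)` means that the embedding matrix now has one 512-dimensional row per token of the custom tokenizer. As a rough sketch of what the resize changes, the arithmetic below compares it with the original vocabulary. The value 32128 is an assumption that does not come from this notebook: it is the vocabulary size of the published `t5-small` configuration.

```python
D_MODEL = 512           # embedding width, from the printed Embedding(1942, 512)
NEW_VOCAB = 1942        # len(tokenizer), from the same output
ORIGINAL_VOCAB = 32128  # assumption: vocab_size of the published t5-small config


def embedding_weights(vocab_size, d_model=D_MODEL):
    """Number of weights in a vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


before = embedding_weights(ORIGINAL_VOCAB)
after = embedding_weights(NEW_VOCAB)
print(before, after, before - after)  # 16449536 994304 15455232
```

Resizing is required because every id produced by the tokenizer must index a row of this matrix. The rows that remain were learned for the original vocabulary, so they most likely have to be re-learned for the new token ids during fine-tuning.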

Let us evaluate the predictions with the BLEU metric (computed with `sacrebleu`).

In [6]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        # some models return a tuple; the generated ids come first
        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # replace the ignored label positions (-100) with the pad token id before decoding
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        # strip the texts and wrap each reference in a list, as expected by sacrebleu
        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        # average number of non-padding tokens in the generated sequences
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
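
Two steps of `compute_metrics` are easy to misread: the labels contain `-100` at positions the loss ignores, which cannot be decoded, and `gen_len` counts only non-padding tokens. The sketch below mirrors both steps on plain Python lists instead of NumPy arrays. The pad id `0` and the token ids are hypothetical; the notebook reads the pad id from `tokenizer.pad_token_id`.

```python
IGNORE_INDEX = -100  # label value that the loss ignores
PAD_ID = 0           # hypothetical pad id; the notebook reads it from tokenizer.pad_token_id


def restore_padding(label_rows, pad_id=PAD_ID):
    """Replace every IGNORE_INDEX by pad_id so the rows can be decoded."""
    return [[pad_id if tok == IGNORE_INDEX else tok for tok in row] for row in label_rows]


def mean_generated_length(pred_rows, pad_id=PAD_ID):
    """Average number of non-pad tokens per prediction (the `gen_len` value)."""
    lengths = [sum(tok != pad_id for tok in row) for row in pred_rows]
    return sum(lengths) / len(lengths)


labels = [[12, 7, 1, -100, -100], [5, 1, -100, -100, -100]]
preds = [[12, 7, 1, 0, 0], [5, 9, 4, 1, 0]]

print(restore_padding(labels))       # [[12, 7, 1, 0, 0], [5, 1, 0, 0, 0]]
print(mean_generated_length(preds))  # 3.5
```

Because padding is decoded with `skip_special_tokens=True`, the restored pad ids disappear from the reference text and do not affect the BLEU score.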


Let us initialize the evaluation object.

In [7]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

Let us define the data collator.

In [8]:
def data_collator(batch):
    """Generate a batch of data to provide to trainer

    Args:
        batch (list): A list of samples, each a tuple (input_ids, attention_mask, labels) of tensors

    Returns:
        dict: A dictionary containing the ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0].squeeze(0) for b in batch])
    
    attention_mask = torch.stack([b[1].squeeze(0) for b in batch])
    
    labels = torch.stack([b[2].squeeze(0) for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}

Let us continue the training until reaching 1000 epochs.
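
The progress bar in the training log reads `82/82000` after the first epoch, that is 82 optimizer steps per epoch times 1000 epochs. The sketch below reproduces this arithmetic and, assuming the trainer's default linear schedule without warm-up (the notebook does not set a scheduler), the learning rates reported in the log. The training-set size 1312 is hypothetical: any size from 1297 to 1312 gives 82 batches of 16. The logged learning rates match 80 and 162 scheduler steps rather than 82 and 164; a plausible but unconfirmed explanation is that two early optimizer steps were skipped under `fp16`.

```python
import math

LEARNING_RATE = 0.002487755684767169  # from the training arguments
STEPS_PER_EPOCH = 82                  # from the progress bar (82/82000 after epoch 1)
EPOCHS = 1000
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS


def steps_per_epoch(n_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


def linear_lr(scheduler_step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS):
    """Linear decay from base_lr to 0 with no warm-up."""
    return base_lr * max(0.0, 1.0 - scheduler_step / total_steps)


print(TOTAL_STEPS)                # 82000
print(steps_per_epoch(1312, 16))  # 82 (1312 is a hypothetical training-set size)
print(linear_lr(80))              # close to the value logged after epoch 1
print(linear_lr(162))             # close to the value logged after epoch 2
```

With such a slow decay, the learning rate stays close to its initial value for the first few dozen epochs, so the fluctuations of the validation BLEU in the log cannot be attributed to the schedule.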


In [9]:
# %%wandb

"""Best parameters
learning_rate = 0.002487755684767169
weight_decay = 0.6508206071726469
train_batch_size = 16
random_state = 0
fr_char_p = 0.6448488763023017
fr_word_p = 0.7689809343290729
eval/bleu = 3.4417
"""

# let us define a directory
directory = "data/checkpoints/t5_results_fw_v3"

# seed
set_seed(0)

# split the data
split_data(random_state=0)

# let us recuperate the datasets
train_dataset, valid_dataset = recuperate_datasets(0.6448488763023017, 0.7689809343290729)

# set training arguments
training_args = Seq2SeqTrainingArguments(directory,
                                    logging_dir="data/logs/results_fw_v3",
                                    num_train_epochs=1000,
                                    load_best_model_at_end=True,
                                    save_strategy="epoch",
                                    evaluation_strategy="epoch",
                                    logging_strategy="epoch",
                                    per_device_train_batch_size=16, 
                                    per_device_eval_batch_size=16,
                                    learning_rate=0.002487755684767169,
                                    weight_decay=0.6508206071726469,
                                    predict_with_generate=True, # we will use predict with generate in order to obtain more valuable test results
                                    fp16 = True,
                                    metric_for_best_model = 'bleu', # a bleu score will be used to find the best model
                                    greater_is_better = True,
                                    save_total_limit = 2, # keep at most two checkpoints: the best one and the most recent one
                                    )   

# define training loop
trainer = Seq2SeqTrainer(model = model,
                  args=training_args,
                  train_dataset=train_dataset, 
                  eval_dataset=valid_dataset,
                  data_collator=data_collator,
                  compute_metrics=evaluation.compute_metrics
                  )

# load last checkpoint
# trainer._load_from_checkpoint("data/training2/results/checkpoint-147")

# start training loop
trainer.train()
# trainer.train('data/checkpoints/fw_t5_small_v3_checkpoints/') # from the searching best model



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
  0%|          | 82/82000 [00:26<5:46:00,  3.95it/s] 

{'loss': 0.9112, 'learning_rate': 0.002485328606050323, 'epoch': 1.0}


                                                    
  0%|          | 82/82000 [00:31<5:46:00,  3.95it/s]

{'eval_loss': 0.6643708944320679, 'eval_bleu': 1.888, 'eval_gen_len': 5.4178, 'eval_runtime': 5.455, 'eval_samples_per_second': 26.764, 'eval_steps_per_second': 1.833, 'epoch': 1.0}


  0%|          | 164/82000 [00:50<4:02:44,  5.62it/s]

{'loss': 0.6128, 'learning_rate': 0.002482840850365556, 'epoch': 2.0}


                                                     
  0%|          | 164/82000 [00:54<4:02:44,  5.62it/s]

{'eval_loss': 0.5834963321685791, 'eval_bleu': 1.1492, 'eval_gen_len': 7.7603, 'eval_runtime': 4.2014, 'eval_samples_per_second': 34.751, 'eval_steps_per_second': 2.38, 'epoch': 2.0}


  0%|          | 246/82000 [01:11<4:58:35,  4.56it/s] 

{'loss': 0.5008, 'learning_rate': 0.0024803530946807886, 'epoch': 3.0}


                                                     
  0%|          | 246/82000 [01:16<4:58:35,  4.56it/s]

{'eval_loss': 0.5533992648124695, 'eval_bleu': 3.141, 'eval_gen_len': 6.1438, 'eval_runtime': 4.3954, 'eval_samples_per_second': 33.216, 'eval_steps_per_second': 2.275, 'epoch': 3.0}


  0%|          | 328/82000 [01:35<4:55:49,  4.60it/s] 

{'loss': 0.4091, 'learning_rate': 0.0024778653389960215, 'epoch': 4.0}


                                                     
  0%|          | 328/82000 [01:41<4:55:49,  4.60it/s]

{'eval_loss': 0.5341570973396301, 'eval_bleu': 2.5513, 'eval_gen_len': 7.589, 'eval_runtime': 5.1677, 'eval_samples_per_second': 28.253, 'eval_steps_per_second': 1.935, 'epoch': 4.0}


  0%|          | 410/82000 [02:00<4:33:40,  4.97it/s] 

{'loss': 0.3335, 'learning_rate': 0.0024753775833112543, 'epoch': 5.0}


                                                     
  0%|          | 410/82000 [02:04<4:33:40,  4.97it/s]

{'eval_loss': 0.5343299508094788, 'eval_bleu': 6.1364, 'eval_gen_len': 5.4247, 'eval_runtime': 3.8755, 'eval_samples_per_second': 37.673, 'eval_steps_per_second': 2.58, 'epoch': 5.0}


  1%|          | 492/82000 [02:22<4:21:37,  5.19it/s] 

{'loss': 0.2737, 'learning_rate': 0.002472889827626487, 'epoch': 6.0}


                                                     
  1%|          | 492/82000 [02:27<4:21:37,  5.19it/s]

{'eval_loss': 0.5192663073539734, 'eval_bleu': 2.062, 'eval_gen_len': 8.2466, 'eval_runtime': 4.8164, 'eval_samples_per_second': 30.313, 'eval_steps_per_second': 2.076, 'epoch': 6.0}


  1%|          | 574/82000 [02:46<4:21:07,  5.20it/s] 

{'loss': 0.2346, 'learning_rate': 0.00247040207194172, 'epoch': 7.0}


                                                     
  1%|          | 574/82000 [02:50<4:21:07,  5.20it/s]

{'eval_loss': 0.5272434949874878, 'eval_bleu': 3.2023, 'eval_gen_len': 6.6438, 'eval_runtime': 3.9532, 'eval_samples_per_second': 36.932, 'eval_steps_per_second': 2.53, 'epoch': 7.0}


  1%|          | 656/82000 [03:08<7:03:37,  3.20it/s] 

{'loss': 0.2068, 'learning_rate': 0.002467914316256953, 'epoch': 8.0}


                                                     
  1%|          | 656/82000 [03:13<7:03:37,  3.20it/s]

{'eval_loss': 0.5259911417961121, 'eval_bleu': 3.7674, 'eval_gen_len': 6.7123, 'eval_runtime': 4.5033, 'eval_samples_per_second': 32.421, 'eval_steps_per_second': 2.221, 'epoch': 8.0}


  1%|          | 738/82000 [03:33<5:44:54,  3.93it/s] 

{'loss': 0.1842, 'learning_rate': 0.0024654265605721856, 'epoch': 9.0}


                                                     
  1%|          | 738/82000 [03:38<5:44:54,  3.93it/s]

{'eval_loss': 0.525863528251648, 'eval_bleu': 5.5586, 'eval_gen_len': 6.5753, 'eval_runtime': 5.1455, 'eval_samples_per_second': 28.374, 'eval_steps_per_second': 1.943, 'epoch': 9.0}


  1%|          | 820/82000 [03:55<3:57:44,  5.69it/s] 

{'loss': 0.1696, 'learning_rate': 0.0024629388048874185, 'epoch': 10.0}


                                                     
  1%|          | 820/82000 [04:00<3:57:44,  5.69it/s]

{'eval_loss': 0.5171877145767212, 'eval_bleu': 3.2642, 'eval_gen_len': 6.637, 'eval_runtime': 4.4291, 'eval_samples_per_second': 32.964, 'eval_steps_per_second': 2.258, 'epoch': 10.0}


  1%|          | 902/82000 [04:21<6:15:04,  3.60it/s] 

{'loss': 0.167, 'learning_rate': 0.0024604510492026513, 'epoch': 11.0}


                                                     
  1%|          | 902/82000 [04:27<6:15:04,  3.60it/s]

{'eval_loss': 0.5169656872749329, 'eval_bleu': 7.2005, 'eval_gen_len': 6.9041, 'eval_runtime': 5.773, 'eval_samples_per_second': 25.29, 'eval_steps_per_second': 1.732, 'epoch': 11.0}


  1%|          | 984/82000 [04:46<4:44:57,  4.74it/s] 

{'loss': 0.1656, 'learning_rate': 0.002457963293517884, 'epoch': 12.0}


                                                     
  1%|          | 984/82000 [04:51<4:44:57,  4.74it/s]

{'eval_loss': 0.5134344100952148, 'eval_bleu': 3.5828, 'eval_gen_len': 6.5822, 'eval_runtime': 4.5545, 'eval_samples_per_second': 32.056, 'eval_steps_per_second': 2.196, 'epoch': 12.0}


  1%|▏         | 1066/82000 [05:11<4:32:21,  4.95it/s]

{'loss': 0.1611, 'learning_rate': 0.002455475537833117, 'epoch': 13.0}


                                                      
  1%|▏         | 1066/82000 [05:15<4:32:21,  4.95it/s]

{'eval_loss': 0.519092321395874, 'eval_bleu': 4.3172, 'eval_gen_len': 6.3699, 'eval_runtime': 4.0482, 'eval_samples_per_second': 36.065, 'eval_steps_per_second': 2.47, 'epoch': 13.0}


  1%|▏         | 1148/82000 [05:34<4:21:12,  5.16it/s] 

{'loss': 0.1584, 'learning_rate': 0.0024529877821483498, 'epoch': 14.0}


                                                      
  1%|▏         | 1148/82000 [05:38<4:21:12,  5.16it/s]

{'eval_loss': 0.5253440141677856, 'eval_bleu': 7.1389, 'eval_gen_len': 5.9452, 'eval_runtime': 4.0076, 'eval_samples_per_second': 36.431, 'eval_steps_per_second': 2.495, 'epoch': 14.0}


  2%|▏         | 1230/82000 [05:56<4:21:11,  5.15it/s] 

{'loss': 0.1596, 'learning_rate': 0.0024505000264635826, 'epoch': 15.0}


                                                      
  2%|▏         | 1230/82000 [06:00<4:21:11,  5.15it/s]

{'eval_loss': 0.5103241205215454, 'eval_bleu': 4.8722, 'eval_gen_len': 6.9726, 'eval_runtime': 4.1031, 'eval_samples_per_second': 35.582, 'eval_steps_per_second': 2.437, 'epoch': 15.0}


  2%|▏         | 1312/82000 [06:18<6:00:00,  3.74it/s] 

{'loss': 0.1618, 'learning_rate': 0.0024480122707788154, 'epoch': 16.0}


                                                      
  2%|▏         | 1312/82000 [06:24<6:00:00,  3.74it/s]

{'eval_loss': 0.5146852731704712, 'eval_bleu': 5.0764, 'eval_gen_len': 6.5753, 'eval_runtime': 5.3964, 'eval_samples_per_second': 27.055, 'eval_steps_per_second': 1.853, 'epoch': 16.0}


  2%|▏         | 1394/82000 [06:41<4:15:51,  5.25it/s] 

{'loss': 0.1669, 'learning_rate': 0.0024455245150940483, 'epoch': 17.0}


                                                      
  2%|▏         | 1394/82000 [06:45<4:15:51,  5.25it/s]

{'eval_loss': 0.4839060604572296, 'eval_bleu': 7.3675, 'eval_gen_len': 6.3288, 'eval_runtime': 4.0212, 'eval_samples_per_second': 36.308, 'eval_steps_per_second': 2.487, 'epoch': 17.0}


  2%|▏         | 1476/82000 [07:02<4:05:35,  5.46it/s] 

{'loss': 0.1583, 'learning_rate': 0.002443036759409281, 'epoch': 18.0}


                                                      
  2%|▏         | 1476/82000 [07:06<4:05:35,  5.46it/s]

{'eval_loss': 0.5204567909240723, 'eval_bleu': 5.0009, 'eval_gen_len': 6.411, 'eval_runtime': 4.1527, 'eval_samples_per_second': 35.158, 'eval_steps_per_second': 2.408, 'epoch': 18.0}


  2%|▏         | 1558/82000 [07:25<4:19:50,  5.16it/s] 

{'loss': 0.1687, 'learning_rate': 0.002440549003724514, 'epoch': 19.0}


                                                      
  2%|▏         | 1558/82000 [07:29<4:19:50,  5.16it/s]

{'eval_loss': 0.4810434579849243, 'eval_bleu': 4.6553, 'eval_gen_len': 6.5822, 'eval_runtime': 4.0493, 'eval_samples_per_second': 36.056, 'eval_steps_per_second': 2.47, 'epoch': 19.0}


  2%|▏         | 1640/82000 [07:46<4:14:34,  5.26it/s] 

{'loss': 0.2038, 'learning_rate': 0.0024380612480397468, 'epoch': 20.0}


                                                      
  2%|▏         | 1640/82000 [07:51<4:14:34,  5.26it/s]

{'eval_loss': 0.5236563682556152, 'eval_bleu': 5.5922, 'eval_gen_len': 5.7329, 'eval_runtime': 4.4539, 'eval_samples_per_second': 32.781, 'eval_steps_per_second': 2.245, 'epoch': 20.0}


  2%|▏         | 1722/82000 [08:08<4:06:58,  5.42it/s] 

{'loss': 0.2054, 'learning_rate': 0.0024355734923549796, 'epoch': 21.0}


                                                      
  2%|▏         | 1722/82000 [08:12<4:06:58,  5.42it/s]

{'eval_loss': 0.4889267086982727, 'eval_bleu': 4.7385, 'eval_gen_len': 7.1301, 'eval_runtime': 3.8619, 'eval_samples_per_second': 37.805, 'eval_steps_per_second': 2.589, 'epoch': 21.0}


  2%|▏         | 1804/82000 [08:30<4:33:03,  4.89it/s] 

{'loss': 0.1801, 'learning_rate': 0.0024330857366702124, 'epoch': 22.0}


                                                      
  2%|▏         | 1804/82000 [08:34<4:33:03,  4.89it/s]

{'eval_loss': 0.4763026833534241, 'eval_bleu': 8.717, 'eval_gen_len': 6.0822, 'eval_runtime': 4.0225, 'eval_samples_per_second': 36.296, 'eval_steps_per_second': 2.486, 'epoch': 22.0}


  2%|▏         | 1886/82000 [08:52<4:02:55,  5.50it/s] 

{'loss': 0.1697, 'learning_rate': 0.0024305979809854453, 'epoch': 23.0}


                                                      
  2%|▏         | 1886/82000 [08:56<4:02:55,  5.50it/s]

{'eval_loss': 0.4976660907268524, 'eval_bleu': 8.0366, 'eval_gen_len': 5.8425, 'eval_runtime': 4.089, 'eval_samples_per_second': 35.705, 'eval_steps_per_second': 2.446, 'epoch': 23.0}


  2%|▏         | 1968/82000 [09:14<5:34:11,  3.99it/s] 

{'loss': 0.1774, 'learning_rate': 0.002428110225300678, 'epoch': 24.0}


                                                      
  2%|▏         | 1968/82000 [09:18<5:34:11,  3.99it/s]

{'eval_loss': 0.48663225769996643, 'eval_bleu': 8.4666, 'eval_gen_len': 5.4178, 'eval_runtime': 4.0834, 'eval_samples_per_second': 35.755, 'eval_steps_per_second': 2.449, 'epoch': 24.0}


  2%|▎         | 2050/82000 [09:36<4:23:59,  5.05it/s] 

{'loss': 0.1822, 'learning_rate': 0.002425622469615911, 'epoch': 25.0}


                                                      
  2%|▎         | 2050/82000 [09:40<4:23:59,  5.05it/s]

{'eval_loss': 0.4910357892513275, 'eval_bleu': 5.8051, 'eval_gen_len': 5.911, 'eval_runtime': 4.088, 'eval_samples_per_second': 35.714, 'eval_steps_per_second': 2.446, 'epoch': 25.0}


  3%|▎         | 2132/82000 [09:57<3:55:22,  5.66it/s] 

{'loss': 0.1978, 'learning_rate': 0.0024231347139311438, 'epoch': 26.0}


                                                      
  3%|▎         | 2132/82000 [10:01<3:55:22,  5.66it/s]

{'eval_loss': 0.5004274845123291, 'eval_bleu': 4.0842, 'eval_gen_len': 5.6781, 'eval_runtime': 4.122, 'eval_samples_per_second': 35.419, 'eval_steps_per_second': 2.426, 'epoch': 26.0}


  3%|▎         | 2214/82000 [10:19<5:01:23,  4.41it/s] 

{'loss': 0.2005, 'learning_rate': 0.0024206469582463766, 'epoch': 27.0}


                                                      
  3%|▎         | 2214/82000 [10:24<5:01:23,  4.41it/s]

{'eval_loss': 0.4862079322338104, 'eval_bleu': 7.3671, 'eval_gen_len': 5.911, 'eval_runtime': 4.5802, 'eval_samples_per_second': 31.876, 'eval_steps_per_second': 2.183, 'epoch': 27.0}


  3%|▎         | 2296/82000 [10:43<4:25:02,  5.01it/s] 

{'loss': 0.2034, 'learning_rate': 0.0024181592025616094, 'epoch': 28.0}


                                                      
  3%|▎         | 2296/82000 [10:47<4:25:02,  5.01it/s]

{'eval_loss': 0.47637924551963806, 'eval_bleu': 6.3722, 'eval_gen_len': 6.9658, 'eval_runtime': 3.947, 'eval_samples_per_second': 36.99, 'eval_steps_per_second': 2.534, 'epoch': 28.0}


  3%|▎         | 2378/82000 [11:04<4:28:37,  4.94it/s] 

{'loss': 0.2114, 'learning_rate': 0.0024156714468768423, 'epoch': 29.0}


                                                      
  3%|▎         | 2378/82000 [11:08<4:28:37,  4.94it/s]

{'eval_loss': 0.4773467481136322, 'eval_bleu': 6.0272, 'eval_gen_len': 6.2534, 'eval_runtime': 4.0697, 'eval_samples_per_second': 35.875, 'eval_steps_per_second': 2.457, 'epoch': 29.0}


  3%|▎         | 2460/82000 [11:25<4:19:45,  5.10it/s] 

{'loss': 0.2149, 'learning_rate': 0.002413183691192075, 'epoch': 30.0}


                                                      
  3%|▎         | 2460/82000 [11:29<4:19:45,  5.10it/s]

{'eval_loss': 0.45777401328086853, 'eval_bleu': 8.5253, 'eval_gen_len': 6.6164, 'eval_runtime': 4.0266, 'eval_samples_per_second': 36.259, 'eval_steps_per_second': 2.483, 'epoch': 30.0}


  3%|▎         | 2542/82000 [11:47<4:38:37,  4.75it/s] 

{'loss': 0.2068, 'learning_rate': 0.002410695935507308, 'epoch': 31.0}


                                                      
  3%|▎         | 2542/82000 [11:51<4:38:37,  4.75it/s]

{'eval_loss': 0.46189969778060913, 'eval_bleu': 6.7469, 'eval_gen_len': 6.6301, 'eval_runtime': 4.1227, 'eval_samples_per_second': 35.414, 'eval_steps_per_second': 2.426, 'epoch': 31.0}


  3%|▎         | 2624/82000 [12:09<4:09:49,  5.30it/s] 

{'loss': 0.2154, 'learning_rate': 0.0024082081798225408, 'epoch': 32.0}


                                                      
  3%|▎         | 2624/82000 [12:13<4:09:49,  5.30it/s]

{'eval_loss': 0.4839116036891937, 'eval_bleu': 6.9788, 'eval_gen_len': 6.1849, 'eval_runtime': 4.6632, 'eval_samples_per_second': 31.309, 'eval_steps_per_second': 2.144, 'epoch': 32.0}


  3%|▎         | 2706/82000 [12:36<4:09:48,  5.29it/s] 

{'loss': 0.2184, 'learning_rate': 0.0024057204241377736, 'epoch': 33.0}


                                                      
  3%|▎         | 2706/82000 [12:42<4:09:48,  5.29it/s]

{'eval_loss': 0.46528708934783936, 'eval_bleu': 6.4353, 'eval_gen_len': 6.3904, 'eval_runtime': 5.265, 'eval_samples_per_second': 27.73, 'eval_steps_per_second': 1.899, 'epoch': 33.0}


  3%|▎         | 2788/82000 [13:00<4:45:20,  4.63it/s] 

{'loss': 0.213, 'learning_rate': 0.0024032326684530064, 'epoch': 34.0}


                                                      
  3%|▎         | 2788/82000 [13:03<4:45:20,  4.63it/s]

{'eval_loss': 0.479687362909317, 'eval_bleu': 6.277, 'eval_gen_len': 5.726, 'eval_runtime': 3.7447, 'eval_samples_per_second': 38.988, 'eval_steps_per_second': 2.67, 'epoch': 34.0}


  4%|▎         | 2870/82000 [13:22<5:07:50,  4.28it/s] 

{'loss': 0.2091, 'learning_rate': 0.0024007449127682393, 'epoch': 35.0}


                                                      
  4%|▎         | 2870/82000 [13:27<5:07:50,  4.28it/s]

{'eval_loss': 0.4764259457588196, 'eval_bleu': 7.8847, 'eval_gen_len': 6.2397, 'eval_runtime': 4.4508, 'eval_samples_per_second': 32.803, 'eval_steps_per_second': 2.247, 'epoch': 35.0}


  4%|▎         | 2952/82000 [13:45<3:36:51,  6.08it/s] 

{'loss': 0.2183, 'learning_rate': 0.002398257157083472, 'epoch': 36.0}


                                                      
  4%|▎         | 2952/82000 [13:49<3:36:51,  6.08it/s]

{'eval_loss': 0.4749408960342407, 'eval_bleu': 9.1261, 'eval_gen_len': 5.774, 'eval_runtime': 3.8577, 'eval_samples_per_second': 37.847, 'eval_steps_per_second': 2.592, 'epoch': 36.0}


  4%|▎         | 3034/82000 [14:07<5:05:29,  4.31it/s] 

{'loss': 0.2176, 'learning_rate': 0.002395769401398705, 'epoch': 37.0}


                                                      
  4%|▎         | 3034/82000 [14:12<5:05:29,  4.31it/s]

{'eval_loss': 0.4760516881942749, 'eval_bleu': 5.9746, 'eval_gen_len': 6.7055, 'eval_runtime': 4.7695, 'eval_samples_per_second': 30.611, 'eval_steps_per_second': 2.097, 'epoch': 37.0}


  4%|▍         | 3116/82000 [14:33<5:15:21,  4.17it/s] 

{'loss': 0.2471, 'learning_rate': 0.0023932816457139378, 'epoch': 38.0}


                                                      
  4%|▍         | 3116/82000 [14:38<5:15:21,  4.17it/s]

{'eval_loss': 0.46392279863357544, 'eval_bleu': 5.7462, 'eval_gen_len': 6.4521, 'eval_runtime': 5.0936, 'eval_samples_per_second': 28.663, 'eval_steps_per_second': 1.963, 'epoch': 38.0}


  4%|▍         | 3198/82000 [14:59<4:56:13,  4.43it/s] 

{'loss': 0.2457, 'learning_rate': 0.0023907938900291706, 'epoch': 39.0}


                                                      
  4%|▍         | 3198/82000 [15:03<4:56:13,  4.43it/s]

{'eval_loss': 0.5003670454025269, 'eval_bleu': 5.8031, 'eval_gen_len': 5.1986, 'eval_runtime': 4.4717, 'eval_samples_per_second': 32.65, 'eval_steps_per_second': 2.236, 'epoch': 39.0}


  4%|▍         | 3280/82000 [15:21<4:17:33,  5.09it/s] 

{'loss': 0.2203, 'learning_rate': 0.0023883061343444034, 'epoch': 40.0}


                                                      
  4%|▍         | 3280/82000 [15:25<4:17:33,  5.09it/s]

{'eval_loss': 0.4742293059825897, 'eval_bleu': 4.1487, 'eval_gen_len': 5.8973, 'eval_runtime': 4.1287, 'eval_samples_per_second': 35.362, 'eval_steps_per_second': 2.422, 'epoch': 40.0}


  4%|▍         | 3362/82000 [15:42<3:59:52,  5.46it/s] 

{'loss': 0.209, 'learning_rate': 0.0023858183786596363, 'epoch': 41.0}


                                                      
  4%|▍         | 3362/82000 [15:46<3:59:52,  5.46it/s]

{'eval_loss': 0.47078371047973633, 'eval_bleu': 7.4815, 'eval_gen_len': 5.7671, 'eval_runtime': 4.3157, 'eval_samples_per_second': 33.83, 'eval_steps_per_second': 2.317, 'epoch': 41.0}


  4%|▍         | 3444/82000 [16:05<4:48:08,  4.54it/s] 

{'loss': 0.1996, 'learning_rate': 0.002383330622974869, 'epoch': 42.0}


                                                      
  4%|▍         | 3444/82000 [16:10<4:48:08,  4.54it/s]

{'eval_loss': 0.4753517210483551, 'eval_bleu': 6.6572, 'eval_gen_len': 5.6781, 'eval_runtime': 4.979, 'eval_samples_per_second': 29.323, 'eval_steps_per_second': 2.008, 'epoch': 42.0}


  4%|▍         | 3526/82000 [16:30<4:49:19,  4.52it/s] 

{'loss': 0.2095, 'learning_rate': 0.002380842867290102, 'epoch': 43.0}


                                                      
  4%|▍         | 3526/82000 [16:35<4:49:19,  4.52it/s]

{'eval_loss': 0.4772045314311981, 'eval_bleu': 5.3854, 'eval_gen_len': 5.7603, 'eval_runtime': 4.7726, 'eval_samples_per_second': 30.591, 'eval_steps_per_second': 2.095, 'epoch': 43.0}


  4%|▍         | 3608/82000 [17:00<4:36:51,  4.72it/s] 

{'loss': 0.2141, 'learning_rate': 0.0023783551116053347, 'epoch': 44.0}


                                                      
  4%|▍         | 3608/82000 [17:04<4:36:51,  4.72it/s]

{'eval_loss': 0.47274649143218994, 'eval_bleu': 4.1865, 'eval_gen_len': 5.5616, 'eval_runtime': 3.9681, 'eval_samples_per_second': 36.793, 'eval_steps_per_second': 2.52, 'epoch': 44.0}


  4%|▍         | 3690/82000 [17:23<4:34:01,  4.76it/s] 

{'loss': 0.2087, 'learning_rate': 0.0023758673559205676, 'epoch': 45.0}


                                                      
  4%|▍         | 3690/82000 [17:27<4:34:01,  4.76it/s]

{'eval_loss': 0.4692542850971222, 'eval_bleu': 7.1957, 'eval_gen_len': 6.6233, 'eval_runtime': 3.9872, 'eval_samples_per_second': 36.617, 'eval_steps_per_second': 2.508, 'epoch': 45.0}


  5%|▍         | 3772/82000 [17:44<4:22:37,  4.96it/s] 

{'loss': 0.2063, 'learning_rate': 0.0023733796002358004, 'epoch': 46.0}


                                                      
  5%|▍         | 3772/82000 [17:49<4:22:37,  4.96it/s]

{'eval_loss': 0.47898393869400024, 'eval_bleu': 6.3666, 'eval_gen_len': 5.7397, 'eval_runtime': 4.2162, 'eval_samples_per_second': 34.628, 'eval_steps_per_second': 2.372, 'epoch': 46.0}


  5%|▍         | 3854/82000 [18:06<3:55:00,  5.54it/s] 

{'loss': 0.2128, 'learning_rate': 0.0023708918445510332, 'epoch': 47.0}


                                                      
  5%|▍         | 3854/82000 [18:10<3:55:00,  5.54it/s]

{'eval_loss': 0.47242772579193115, 'eval_bleu': 5.819, 'eval_gen_len': 7.7466, 'eval_runtime': 4.5867, 'eval_samples_per_second': 31.831, 'eval_steps_per_second': 2.18, 'epoch': 47.0}


  5%|▍         | 3936/82000 [18:29<4:32:23,  4.78it/s] 

{'loss': 0.2045, 'learning_rate': 0.002368404088866266, 'epoch': 48.0}


                                                      
  5%|▍         | 3936/82000 [18:34<4:32:23,  4.78it/s]

{'eval_loss': 0.45519763231277466, 'eval_bleu': 6.9455, 'eval_gen_len': 6.0959, 'eval_runtime': 4.599, 'eval_samples_per_second': 31.746, 'eval_steps_per_second': 2.174, 'epoch': 48.0}


  5%|▍         | 4018/82000 [18:53<6:33:36,  3.30it/s] 

{'loss': 0.2019, 'learning_rate': 0.002365916333181499, 'epoch': 49.0}


                                                      
  5%|▍         | 4018/82000 [18:57<6:33:36,  3.30it/s]

{'eval_loss': 0.4645976126194, 'eval_bleu': 5.9773, 'eval_gen_len': 6.1096, 'eval_runtime': 4.6225, 'eval_samples_per_second': 31.585, 'eval_steps_per_second': 2.163, 'epoch': 49.0}


  5%|▌         | 4100/82000 [19:18<5:24:00,  4.01it/s] 

{'loss': 0.2052, 'learning_rate': 0.0023634285774967317, 'epoch': 50.0}


                                                      
  5%|▌         | 4100/82000 [19:23<5:24:00,  4.01it/s]

{'eval_loss': 0.48357945680618286, 'eval_bleu': 7.7766, 'eval_gen_len': 6.1712, 'eval_runtime': 5.4932, 'eval_samples_per_second': 26.578, 'eval_steps_per_second': 1.82, 'epoch': 50.0}


  5%|▌         | 4182/82000 [19:42<4:36:14,  4.70it/s] 

{'loss': 0.2013, 'learning_rate': 0.0023609408218119646, 'epoch': 51.0}


                                                      
  5%|▌         | 4182/82000 [19:47<4:36:14,  4.70it/s]

{'eval_loss': 0.48906153440475464, 'eval_bleu': 5.868, 'eval_gen_len': 5.8288, 'eval_runtime': 4.6291, 'eval_samples_per_second': 31.539, 'eval_steps_per_second': 2.16, 'epoch': 51.0}


  5%|▌         | 4264/82000 [20:06<4:22:12,  4.94it/s] 

{'loss': 0.2066, 'learning_rate': 0.0023584530661271974, 'epoch': 52.0}


                                                      
  5%|▌         | 4264/82000 [20:11<4:22:12,  4.94it/s]

{'eval_loss': 0.4712497293949127, 'eval_bleu': 7.7348, 'eval_gen_len': 6.9589, 'eval_runtime': 4.5831, 'eval_samples_per_second': 31.856, 'eval_steps_per_second': 2.182, 'epoch': 52.0}


  5%|▌         | 4346/82000 [20:31<4:59:47,  4.32it/s] 

{'loss': 0.2028, 'learning_rate': 0.0023559653104424302, 'epoch': 53.0}


                                                      
  5%|▌         | 4346/82000 [20:36<4:59:47,  4.32it/s]

{'eval_loss': 0.49783748388290405, 'eval_bleu': 4.3008, 'eval_gen_len': 6.2671, 'eval_runtime': 5.4445, 'eval_samples_per_second': 26.816, 'eval_steps_per_second': 1.837, 'epoch': 53.0}


  5%|▌         | 4428/82000 [20:57<5:36:07,  3.85it/s] 

{'loss': 0.2029, 'learning_rate': 0.002353477554757663, 'epoch': 54.0}


                                                      
  5%|▌         | 4428/82000 [21:03<5:36:07,  3.85it/s]

{'eval_loss': 0.4777381122112274, 'eval_bleu': 6.2268, 'eval_gen_len': 6.137, 'eval_runtime': 6.2295, 'eval_samples_per_second': 23.437, 'eval_steps_per_second': 1.605, 'epoch': 54.0}


  6%|▌         | 4510/82000 [21:24<4:01:21,  5.35it/s] 

{'loss': 0.2004, 'learning_rate': 0.002350989799072896, 'epoch': 55.0}


                                                      
  6%|▌         | 4510/82000 [21:29<4:01:21,  5.35it/s]

{'eval_loss': 0.47311684489250183, 'eval_bleu': 8.486, 'eval_gen_len': 6.5548, 'eval_runtime': 4.4148, 'eval_samples_per_second': 33.07, 'eval_steps_per_second': 2.265, 'epoch': 55.0}


  6%|▌         | 4592/82000 [21:50<4:49:39,  4.45it/s] 

{'loss': 0.1975, 'learning_rate': 0.0023485020433881287, 'epoch': 56.0}


                                                      
  6%|▌         | 4592/82000 [21:56<4:49:39,  4.45it/s]

{'eval_loss': 0.47009047865867615, 'eval_bleu': 7.7914, 'eval_gen_len': 5.9178, 'eval_runtime': 6.2847, 'eval_samples_per_second': 23.231, 'eval_steps_per_second': 1.591, 'epoch': 56.0}


  6%|▌         | 4674/82000 [22:17<5:13:24,  4.11it/s] 

{'loss': 0.1857, 'learning_rate': 0.0023460142877033616, 'epoch': 57.0}


                                                      
  6%|▌         | 4674/82000 [22:23<5:13:24,  4.11it/s]

{'eval_loss': 0.468588262796402, 'eval_bleu': 6.6398, 'eval_gen_len': 6.7055, 'eval_runtime': 6.0301, 'eval_samples_per_second': 24.212, 'eval_steps_per_second': 1.658, 'epoch': 57.0}


  6%|▌         | 4756/82000 [22:43<5:35:54,  3.83it/s] 

{'loss': 0.1856, 'learning_rate': 0.0023435265320185944, 'epoch': 58.0}


                                                      
  6%|▌         | 4756/82000 [22:48<5:35:54,  3.83it/s]

{'eval_loss': 0.47128406167030334, 'eval_bleu': 5.2491, 'eval_gen_len': 5.8356, 'eval_runtime': 5.1113, 'eval_samples_per_second': 28.564, 'eval_steps_per_second': 1.956, 'epoch': 58.0}


  6%|▌         | 4838/82000 [23:06<4:54:13,  4.37it/s] 

{'loss': 0.1851, 'learning_rate': 0.0023410387763338272, 'epoch': 59.0}


                                                      
  6%|▌         | 4838/82000 [23:11<4:54:13,  4.37it/s]

{'eval_loss': 0.4811716079711914, 'eval_bleu': 5.6479, 'eval_gen_len': 6.3425, 'eval_runtime': 5.0808, 'eval_samples_per_second': 28.736, 'eval_steps_per_second': 1.968, 'epoch': 59.0}


  6%|▌         | 4920/82000 [23:32<4:03:44,  5.27it/s] 

{'loss': 0.1882, 'learning_rate': 0.00233855102064906, 'epoch': 60.0}


                                                      
  6%|▌         | 4920/82000 [23:37<4:03:44,  5.27it/s]

{'eval_loss': 0.49536243081092834, 'eval_bleu': 6.4925, 'eval_gen_len': 5.3493, 'eval_runtime': 5.0773, 'eval_samples_per_second': 28.756, 'eval_steps_per_second': 1.97, 'epoch': 60.0}


  6%|▌         | 5002/82000 [23:56<4:29:14,  4.77it/s] 

{'loss': 0.1831, 'learning_rate': 0.002336063264964293, 'epoch': 61.0}


                                                      
  6%|▌         | 5002/82000 [24:02<4:29:14,  4.77it/s]

{'eval_loss': 0.4776536226272583, 'eval_bleu': 5.2777, 'eval_gen_len': 5.9521, 'eval_runtime': 5.2311, 'eval_samples_per_second': 27.91, 'eval_steps_per_second': 1.912, 'epoch': 61.0}


  6%|▌         | 5084/82000 [24:21<5:08:19,  4.16it/s] 

{'loss': 0.1767, 'learning_rate': 0.0023335755092795257, 'epoch': 62.0}


                                                      
  6%|▌         | 5084/82000 [24:28<5:08:19,  4.16it/s]

{'eval_loss': 0.49045467376708984, 'eval_bleu': 6.2196, 'eval_gen_len': 5.8699, 'eval_runtime': 6.5645, 'eval_samples_per_second': 22.241, 'eval_steps_per_second': 1.523, 'epoch': 62.0}


  6%|▋         | 5166/82000 [24:49<5:36:18,  3.81it/s] 

{'loss': 0.179, 'learning_rate': 0.0023310877535947586, 'epoch': 63.0}


                                                      
  6%|▋         | 5166/82000 [24:55<5:36:18,  3.81it/s]

{'eval_loss': 0.4959859848022461, 'eval_bleu': 3.4197, 'eval_gen_len': 6.3425, 'eval_runtime': 5.2711, 'eval_samples_per_second': 27.698, 'eval_steps_per_second': 1.897, 'epoch': 63.0}


  6%|▋         | 5248/82000 [25:16<4:06:18,  5.19it/s] 

{'loss': 0.1842, 'learning_rate': 0.0023285999979099914, 'epoch': 64.0}


                                                      
  6%|▋         | 5248/82000 [25:21<4:06:18,  5.19it/s]

{'eval_loss': 0.4693678617477417, 'eval_bleu': 7.3652, 'eval_gen_len': 6.4795, 'eval_runtime': 4.7946, 'eval_samples_per_second': 30.451, 'eval_steps_per_second': 2.086, 'epoch': 64.0}


  6%|▋         | 5330/82000 [25:44<4:28:23,  4.76it/s] 

{'loss': 0.2087, 'learning_rate': 0.0023261122422252242, 'epoch': 65.0}


                                                      
  6%|▋         | 5330/82000 [25:48<4:28:23,  4.76it/s]

{'eval_loss': 0.48264437913894653, 'eval_bleu': 5.7392, 'eval_gen_len': 7.3904, 'eval_runtime': 4.6573, 'eval_samples_per_second': 31.349, 'eval_steps_per_second': 2.147, 'epoch': 65.0}


  7%|▋         | 5412/82000 [26:08<4:09:21,  5.12it/s] 

{'loss': 0.2232, 'learning_rate': 0.002323624486540457, 'epoch': 66.0}


                                                      
  7%|▋         | 5412/82000 [26:13<4:09:21,  5.12it/s]

{'eval_loss': 0.5075202584266663, 'eval_bleu': 6.6024, 'eval_gen_len': 5.8562, 'eval_runtime': 4.646, 'eval_samples_per_second': 31.425, 'eval_steps_per_second': 2.152, 'epoch': 66.0}


  7%|▋         | 5494/82000 [26:32<5:01:41,  4.23it/s] 

{'loss': 0.2261, 'learning_rate': 0.00232113673085569, 'epoch': 67.0}


                                                      
  7%|▋         | 5494/82000 [26:37<5:01:41,  4.23it/s]

{'eval_loss': 0.47951602935791016, 'eval_bleu': 4.1243, 'eval_gen_len': 5.6301, 'eval_runtime': 4.5113, 'eval_samples_per_second': 32.363, 'eval_steps_per_second': 2.217, 'epoch': 67.0}


  7%|▋         | 5576/82000 [26:54<4:46:14,  4.45it/s] 

{'loss': 0.1827, 'learning_rate': 0.0023186489751709227, 'epoch': 68.0}


                                                      
  7%|▋         | 5576/82000 [27:01<4:46:14,  4.45it/s]

{'eval_loss': 0.4812113046646118, 'eval_bleu': 6.1741, 'eval_gen_len': 6.3356, 'eval_runtime': 6.7176, 'eval_samples_per_second': 21.734, 'eval_steps_per_second': 1.489, 'epoch': 68.0}


  7%|▋         | 5658/82000 [27:20<5:02:40,  4.20it/s] 

{'loss': 0.16, 'learning_rate': 0.0023161612194861556, 'epoch': 69.0}


                                                      
  7%|▋         | 5658/82000 [27:25<5:02:40,  4.20it/s]

{'eval_loss': 0.4848424792289734, 'eval_bleu': 6.501, 'eval_gen_len': 6.6233, 'eval_runtime': 4.561, 'eval_samples_per_second': 32.011, 'eval_steps_per_second': 2.193, 'epoch': 69.0}


  7%|▋         | 5740/82000 [27:46<4:55:14,  4.30it/s] 

{'loss': 0.1623, 'learning_rate': 0.0023136734638013884, 'epoch': 70.0}


                                                      
  7%|▋         | 5740/82000 [27:51<4:55:14,  4.30it/s]

{'eval_loss': 0.4958149194717407, 'eval_bleu': 5.6741, 'eval_gen_len': 6.4315, 'eval_runtime': 4.8936, 'eval_samples_per_second': 29.835, 'eval_steps_per_second': 2.043, 'epoch': 70.0}


  7%|▋         | 5822/82000 [28:11<4:27:37,  4.74it/s] 

{'loss': 0.1653, 'learning_rate': 0.0023111857081166212, 'epoch': 71.0}


                                                      
  7%|▋         | 5822/82000 [28:16<4:27:37,  4.74it/s]

{'eval_loss': 0.4899645149707794, 'eval_bleu': 6.4462, 'eval_gen_len': 5.8493, 'eval_runtime': 5.5073, 'eval_samples_per_second': 26.51, 'eval_steps_per_second': 1.816, 'epoch': 71.0}


  7%|▋         | 5904/82000 [28:34<4:14:10,  4.99it/s] 

{'loss': 0.1708, 'learning_rate': 0.002308697952431854, 'epoch': 72.0}


                                                      
  7%|▋         | 5904/82000 [28:39<4:14:10,  4.99it/s]

{'eval_loss': 0.4849715530872345, 'eval_bleu': 6.2025, 'eval_gen_len': 5.9315, 'eval_runtime': 4.6944, 'eval_samples_per_second': 31.101, 'eval_steps_per_second': 2.13, 'epoch': 72.0}


  7%|▋         | 5986/82000 [28:57<4:20:03,  4.87it/s] 

{'loss': 0.1748, 'learning_rate': 0.002306210196747087, 'epoch': 73.0}


                                                      
  7%|▋         | 5986/82000 [29:02<4:20:03,  4.87it/s]

{'eval_loss': 0.4870261549949646, 'eval_bleu': 7.1217, 'eval_gen_len': 6.2945, 'eval_runtime': 4.4486, 'eval_samples_per_second': 32.82, 'eval_steps_per_second': 2.248, 'epoch': 73.0}


  7%|▋         | 6068/82000 [29:20<4:08:46,  5.09it/s] 

{'loss': 0.1676, 'learning_rate': 0.0023037224410623197, 'epoch': 74.0}


                                                      
  7%|▋         | 6068/82000 [29:25<4:08:46,  5.09it/s]

{'eval_loss': 0.4801238179206848, 'eval_bleu': 9.0445, 'eval_gen_len': 6.0342, 'eval_runtime': 4.7137, 'eval_samples_per_second': 30.973, 'eval_steps_per_second': 2.121, 'epoch': 74.0}


  8%|▊         | 6150/82000 [29:45<5:14:15,  4.02it/s] 

{'loss': 0.1653, 'learning_rate': 0.0023012346853775525, 'epoch': 75.0}


                                                      
  8%|▊         | 6150/82000 [29:51<5:14:15,  4.02it/s]

{'eval_loss': 0.491851806640625, 'eval_bleu': 7.819, 'eval_gen_len': 6.4521, 'eval_runtime': 6.0685, 'eval_samples_per_second': 24.059, 'eval_steps_per_second': 1.648, 'epoch': 75.0}


  8%|▊         | 6232/82000 [30:12<4:14:27,  4.96it/s] 

{'loss': 0.1686, 'learning_rate': 0.0022987469296927854, 'epoch': 76.0}


                                                      
  8%|▊         | 6232/82000 [30:17<4:14:27,  4.96it/s]

{'eval_loss': 0.48406845331192017, 'eval_bleu': 7.3434, 'eval_gen_len': 6.2397, 'eval_runtime': 5.2277, 'eval_samples_per_second': 27.928, 'eval_steps_per_second': 1.913, 'epoch': 76.0}


  8%|▊         | 6314/82000 [30:37<4:29:23,  4.68it/s] 

{'loss': 0.1675, 'learning_rate': 0.002296259174008018, 'epoch': 77.0}


                                                      
  8%|▊         | 6314/82000 [30:41<4:29:23,  4.68it/s]

{'eval_loss': 0.4651334583759308, 'eval_bleu': 8.0966, 'eval_gen_len': 6.1712, 'eval_runtime': 4.6725, 'eval_samples_per_second': 31.247, 'eval_steps_per_second': 2.14, 'epoch': 77.0}


  8%|▊         | 6396/82000 [31:01<3:57:51,  5.30it/s] 

{'loss': 0.1579, 'learning_rate': 0.002293771418323251, 'epoch': 78.0}


                                                      
  8%|▊         | 6396/82000 [31:05<3:57:51,  5.30it/s]

{'eval_loss': 0.500201940536499, 'eval_bleu': 6.2408, 'eval_gen_len': 6.8082, 'eval_runtime': 4.3568, 'eval_samples_per_second': 33.511, 'eval_steps_per_second': 2.295, 'epoch': 78.0}


  8%|▊         | 6478/82000 [31:26<4:23:42,  4.77it/s] 

{'loss': 0.2111, 'learning_rate': 0.002291283662638484, 'epoch': 79.0}


                                                      
  8%|▊         | 6478/82000 [31:31<4:23:42,  4.77it/s]

{'eval_loss': 0.5133199691772461, 'eval_bleu': 3.9163, 'eval_gen_len': 6.0068, 'eval_runtime': 4.5934, 'eval_samples_per_second': 31.785, 'eval_steps_per_second': 2.177, 'epoch': 79.0}


  8%|▊         | 6560/82000 [31:52<4:33:57,  4.59it/s] 

{'loss': 0.1892, 'learning_rate': 0.0022887959069537167, 'epoch': 80.0}


                                                      
  8%|▊         | 6560/82000 [31:56<4:33:57,  4.59it/s]

{'eval_loss': 0.5045763254165649, 'eval_bleu': 4.6097, 'eval_gen_len': 5.6096, 'eval_runtime': 4.5741, 'eval_samples_per_second': 31.919, 'eval_steps_per_second': 2.186, 'epoch': 80.0}


  8%|▊         | 6642/82000 [32:15<3:59:45,  5.24it/s] 

{'loss': 0.1621, 'learning_rate': 0.0022863081512689495, 'epoch': 81.0}


                                                      
  8%|▊         | 6642/82000 [32:20<3:59:45,  5.24it/s]

{'eval_loss': 0.4803810119628906, 'eval_bleu': 7.4209, 'eval_gen_len': 6.5616, 'eval_runtime': 5.3475, 'eval_samples_per_second': 27.303, 'eval_steps_per_second': 1.87, 'epoch': 81.0}


  8%|▊         | 6724/82000 [32:40<4:57:40,  4.21it/s] 

{'loss': 0.1549, 'learning_rate': 0.0022838203955841824, 'epoch': 82.0}


                                                      
  8%|▊         | 6724/82000 [32:48<4:57:40,  4.21it/s]

{'eval_loss': 0.4950057864189148, 'eval_bleu': 6.266, 'eval_gen_len': 5.9178, 'eval_runtime': 7.1115, 'eval_samples_per_second': 20.53, 'eval_steps_per_second': 1.406, 'epoch': 82.0}


  8%|▊         | 6806/82000 [33:19<17:16:08,  1.21it/s]

{'loss': 0.1531, 'learning_rate': 0.002281332639899415, 'epoch': 83.0}


                                                       
  8%|▊         | 6806/82000 [33:34<17:16:08,  1.21it/s]

{'eval_loss': 0.49541300535202026, 'eval_bleu': 4.2621, 'eval_gen_len': 6.4384, 'eval_runtime': 15.2853, 'eval_samples_per_second': 9.552, 'eval_steps_per_second': 0.654, 'epoch': 83.0}


  8%|▊         | 6888/82000 [35:20<26:17:42,  1.26s/it] 

{'loss': 0.1481, 'learning_rate': 0.002278844884214648, 'epoch': 84.0}


                                                       
  8%|▊         | 6888/82000 [35:42<26:17:42,  1.26s/it]

{'eval_loss': 0.4936107099056244, 'eval_bleu': 5.6579, 'eval_gen_len': 6.7534, 'eval_runtime': 21.4349, 'eval_samples_per_second': 6.811, 'eval_steps_per_second': 0.467, 'epoch': 84.0}


  8%|▊         | 6970/82000 [37:30<23:32:42,  1.13s/it] 

{'loss': 0.1514, 'learning_rate': 0.002276357128529881, 'epoch': 85.0}


                                                       
  8%|▊         | 6970/82000 [37:48<23:32:42,  1.13s/it]

{'eval_loss': 0.5032789707183838, 'eval_bleu': 6.4357, 'eval_gen_len': 6.4315, 'eval_runtime': 17.9237, 'eval_samples_per_second': 8.146, 'eval_steps_per_second': 0.558, 'epoch': 85.0}


  9%|▊         | 7052/82000 [38:54<3:17:39,  6.32it/s]  

{'loss': 0.1573, 'learning_rate': 0.0022738693728451137, 'epoch': 86.0}


                                                      
  9%|▊         | 7052/82000 [38:57<3:17:39,  6.32it/s]

{'eval_loss': 0.5053951144218445, 'eval_bleu': 7.4303, 'eval_gen_len': 6.4726, 'eval_runtime': 3.6049, 'eval_samples_per_second': 40.5, 'eval_steps_per_second': 2.774, 'epoch': 86.0}


  9%|▊         | 7134/82000 [39:14<4:40:51,  4.44it/s] 

{'loss': 0.1553, 'learning_rate': 0.0022713816171603465, 'epoch': 87.0}


                                                      
  9%|▊         | 7134/82000 [39:18<4:40:51,  4.44it/s]

{'eval_loss': 0.5428372621536255, 'eval_bleu': 7.1441, 'eval_gen_len': 6.2534, 'eval_runtime': 4.0791, 'eval_samples_per_second': 35.792, 'eval_steps_per_second': 2.452, 'epoch': 87.0}


  9%|▉         | 7216/82000 [39:36<4:37:12,  4.50it/s] 

{'loss': 0.1537, 'learning_rate': 0.0022688938614755794, 'epoch': 88.0}


                                                      
  9%|▉         | 7216/82000 [39:42<4:37:12,  4.50it/s]

{'eval_loss': 0.48996207118034363, 'eval_bleu': 7.2779, 'eval_gen_len': 6.8151, 'eval_runtime': 5.9392, 'eval_samples_per_second': 24.582, 'eval_steps_per_second': 1.684, 'epoch': 88.0}


  9%|▉         | 7298/82000 [40:01<4:12:47,  4.93it/s] 

{'loss': 0.1521, 'learning_rate': 0.002266406105790812, 'epoch': 89.0}


                                                      
  9%|▉         | 7298/82000 [40:05<4:12:47,  4.93it/s]

{'eval_loss': 0.48815542459487915, 'eval_bleu': 6.4876, 'eval_gen_len': 7.1301, 'eval_runtime': 4.5468, 'eval_samples_per_second': 32.111, 'eval_steps_per_second': 2.199, 'epoch': 89.0}


  9%|▉         | 7380/82000 [40:27<7:19:33,  2.83it/s] 

{'loss': 0.1464, 'learning_rate': 0.002263918350106045, 'epoch': 90.0}


                                                      
  9%|▉         | 7380/82000 [40:34<7:19:33,  2.83it/s]

{'eval_loss': 0.47839125990867615, 'eval_bleu': 7.3947, 'eval_gen_len': 6.3562, 'eval_runtime': 7.4414, 'eval_samples_per_second': 19.62, 'eval_steps_per_second': 1.344, 'epoch': 90.0}


  9%|▉         | 7462/82000 [40:52<4:05:05,  5.07it/s] 

{'loss': 0.1801, 'learning_rate': 0.002261430594421278, 'epoch': 91.0}


                                                      
  9%|▉         | 7462/82000 [40:56<4:05:05,  5.07it/s]

{'eval_loss': 0.5145735144615173, 'eval_bleu': 6.322, 'eval_gen_len': 6.5753, 'eval_runtime': 4.0131, 'eval_samples_per_second': 36.381, 'eval_steps_per_second': 2.492, 'epoch': 91.0}


  9%|▉         | 7544/82000 [41:15<3:51:43,  5.36it/s] 

{'loss': 0.1919, 'learning_rate': 0.0022589428387365107, 'epoch': 92.0}


                                                      
  9%|▉         | 7544/82000 [41:19<3:51:43,  5.36it/s]

{'eval_loss': 0.4906582832336426, 'eval_bleu': 6.8099, 'eval_gen_len': 6.9795, 'eval_runtime': 4.0324, 'eval_samples_per_second': 36.206, 'eval_steps_per_second': 2.48, 'epoch': 92.0}


  9%|▉         | 7626/82000 [41:37<4:26:12,  4.66it/s] 

{'loss': 0.1559, 'learning_rate': 0.0022564550830517435, 'epoch': 93.0}


                                                      
  9%|▉         | 7626/82000 [41:41<4:26:12,  4.66it/s]

{'eval_loss': 0.4875680208206177, 'eval_bleu': 7.7576, 'eval_gen_len': 6.7671, 'eval_runtime': 4.1656, 'eval_samples_per_second': 35.049, 'eval_steps_per_second': 2.401, 'epoch': 93.0}


  9%|▉         | 7708/82000 [42:01<4:25:46,  4.66it/s] 

{'loss': 0.1422, 'learning_rate': 0.0022539673273669764, 'epoch': 94.0}


                                                      
  9%|▉         | 7708/82000 [42:06<4:25:46,  4.66it/s]

{'eval_loss': 0.4981619417667389, 'eval_bleu': 5.3896, 'eval_gen_len': 6.5, 'eval_runtime': 5.3409, 'eval_samples_per_second': 27.336, 'eval_steps_per_second': 1.872, 'epoch': 94.0}


 10%|▉         | 7790/82000 [42:28<4:34:42,  4.50it/s] 

{'loss': 0.1371, 'learning_rate': 0.002251479571682209, 'epoch': 95.0}


                                                      
 10%|▉         | 7790/82000 [42:33<4:34:42,  4.50it/s]

{'eval_loss': 0.5120676159858704, 'eval_bleu': 7.0196, 'eval_gen_len': 6.4795, 'eval_runtime': 4.9142, 'eval_samples_per_second': 29.71, 'eval_steps_per_second': 2.035, 'epoch': 95.0}


 10%|▉         | 7872/82000 [42:52<4:47:18,  4.30it/s] 

{'loss': 0.1396, 'learning_rate': 0.002248991815997442, 'epoch': 96.0}


                                                      
 10%|▉         | 7872/82000 [42:58<4:47:18,  4.30it/s]

{'eval_loss': 0.49389663338661194, 'eval_bleu': 8.5284, 'eval_gen_len': 6.5479, 'eval_runtime': 6.0243, 'eval_samples_per_second': 24.235, 'eval_steps_per_second': 1.66, 'epoch': 96.0}


 10%|▉         | 7954/82000 [43:19<4:18:54,  4.77it/s] 

{'loss': 0.1451, 'learning_rate': 0.002246504060312675, 'epoch': 97.0}


                                                      
 10%|▉         | 7954/82000 [43:25<4:18:54,  4.77it/s]

{'eval_loss': 0.4866081774234772, 'eval_bleu': 6.6478, 'eval_gen_len': 5.9726, 'eval_runtime': 5.4442, 'eval_samples_per_second': 26.818, 'eval_steps_per_second': 1.837, 'epoch': 97.0}


 10%|▉         | 8036/82000 [43:45<5:11:54,  3.95it/s] 

{'loss': 0.1458, 'learning_rate': 0.0022440163046279077, 'epoch': 98.0}


                                                      
 10%|▉         | 8036/82000 [43:51<5:11:54,  3.95it/s]

{'eval_loss': 0.4897317886352539, 'eval_bleu': 5.9613, 'eval_gen_len': 6.5274, 'eval_runtime': 5.94, 'eval_samples_per_second': 24.579, 'eval_steps_per_second': 1.684, 'epoch': 98.0}


 10%|▉         | 8118/82000 [44:09<4:01:05,  5.11it/s] 

{'loss': 0.1409, 'learning_rate': 0.0022415285489431405, 'epoch': 99.0}


                                                      
 10%|▉         | 8118/82000 [44:14<4:01:05,  5.11it/s]

{'eval_loss': 0.4985682964324951, 'eval_bleu': 5.3536, 'eval_gen_len': 6.1849, 'eval_runtime': 4.1957, 'eval_samples_per_second': 34.797, 'eval_steps_per_second': 2.383, 'epoch': 99.0}


 10%|█         | 8200/82000 [44:32<4:41:09,  4.37it/s] 

{'loss': 0.1394, 'learning_rate': 0.0022390407932583734, 'epoch': 100.0}


                                                      
 10%|█         | 8200/82000 [44:36<4:41:09,  4.37it/s]

{'eval_loss': 0.49360913038253784, 'eval_bleu': 4.2261, 'eval_gen_len': 6.7877, 'eval_runtime': 4.3629, 'eval_samples_per_second': 33.464, 'eval_steps_per_second': 2.292, 'epoch': 100.0}


 10%|█         | 8282/82000 [44:55<4:59:04,  4.11it/s] 

{'loss': 0.1514, 'learning_rate': 0.002236553037573606, 'epoch': 101.0}


                                                      
 10%|█         | 8282/82000 [44:59<4:59:04,  4.11it/s]

{'eval_loss': 0.48317405581474304, 'eval_bleu': 7.1605, 'eval_gen_len': 5.9863, 'eval_runtime': 4.325, 'eval_samples_per_second': 33.757, 'eval_steps_per_second': 2.312, 'epoch': 101.0}


 10%|█         | 8364/82000 [45:20<4:46:44,  4.28it/s] 

{'loss': 0.1473, 'learning_rate': 0.002234065281888839, 'epoch': 102.0}


                                                      
 10%|█         | 8364/82000 [45:25<4:46:44,  4.28it/s]

{'eval_loss': 0.4996851086616516, 'eval_bleu': 7.5467, 'eval_gen_len': 6.2877, 'eval_runtime': 5.0837, 'eval_samples_per_second': 28.719, 'eval_steps_per_second': 1.967, 'epoch': 102.0}


 10%|█         | 8446/82000 [45:46<4:30:25,  4.53it/s] 

{'loss': 0.1387, 'learning_rate': 0.002231577526204072, 'epoch': 103.0}


                                                      
 10%|█         | 8446/82000 [45:51<4:30:25,  4.53it/s]

{'eval_loss': 0.4979124665260315, 'eval_bleu': 6.7977, 'eval_gen_len': 6.6644, 'eval_runtime': 4.4632, 'eval_samples_per_second': 32.712, 'eval_steps_per_second': 2.241, 'epoch': 103.0}


 10%|█         | 8528/82000 [46:11<4:15:37,  4.79it/s] 

{'loss': 0.136, 'learning_rate': 0.0022290897705193047, 'epoch': 104.0}


                                                      
 10%|█         | 8528/82000 [46:16<4:15:37,  4.79it/s]

{'eval_loss': 0.5312088131904602, 'eval_bleu': 5.6546, 'eval_gen_len': 5.9178, 'eval_runtime': 5.5097, 'eval_samples_per_second': 26.499, 'eval_steps_per_second': 1.815, 'epoch': 104.0}


 10%|█         | 8610/82000 [46:37<4:21:16,  4.68it/s] 

{'loss': 0.1365, 'learning_rate': 0.0022266020148345375, 'epoch': 105.0}


                                                      
 10%|█         | 8610/82000 [46:42<4:21:16,  4.68it/s]

{'eval_loss': 0.5316005349159241, 'eval_bleu': 5.8395, 'eval_gen_len': 6.8219, 'eval_runtime': 4.3532, 'eval_samples_per_second': 33.539, 'eval_steps_per_second': 2.297, 'epoch': 105.0}


 11%|█         | 8692/82000 [47:01<4:35:25,  4.44it/s] 

{'loss': 0.141, 'learning_rate': 0.0022241445976337306, 'epoch': 106.0}


                                                      
 11%|█         | 8692/82000 [47:06<4:35:25,  4.44it/s]

{'eval_loss': 0.5071889162063599, 'eval_bleu': 8.5416, 'eval_gen_len': 6.411, 'eval_runtime': 5.0003, 'eval_samples_per_second': 29.198, 'eval_steps_per_second': 2.0, 'epoch': 106.0}


 11%|█         | 8774/82000 [47:26<4:14:43,  4.79it/s] 

{'loss': 0.1384, 'learning_rate': 0.0022216568419489635, 'epoch': 107.0}


                                                      
 11%|█         | 8774/82000 [47:31<4:14:43,  4.79it/s]

{'eval_loss': 0.5086562037467957, 'eval_bleu': 7.1303, 'eval_gen_len': 6.274, 'eval_runtime': 5.0694, 'eval_samples_per_second': 28.8, 'eval_steps_per_second': 1.973, 'epoch': 107.0}


 11%|█         | 8856/82000 [47:50<3:50:07,  5.30it/s] 

{'loss': 0.1365, 'learning_rate': 0.0022191690862641963, 'epoch': 108.0}


                                                      
 11%|█         | 8856/82000 [47:56<3:50:07,  5.30it/s]

{'eval_loss': 0.4930969774723053, 'eval_bleu': 7.0651, 'eval_gen_len': 6.6849, 'eval_runtime': 6.0869, 'eval_samples_per_second': 23.986, 'eval_steps_per_second': 1.643, 'epoch': 108.0}


 11%|█         | 8938/82000 [48:16<4:38:26,  4.37it/s] 

{'loss': 0.1409, 'learning_rate': 0.002216681330579429, 'epoch': 109.0}


                                                      
 11%|█         | 8938/82000 [48:22<4:38:26,  4.37it/s]

{'eval_loss': 0.5075874924659729, 'eval_bleu': 7.4662, 'eval_gen_len': 6.0685, 'eval_runtime': 6.1111, 'eval_samples_per_second': 23.891, 'eval_steps_per_second': 1.636, 'epoch': 109.0}


 11%|█         | 9020/82000 [48:42<4:17:51,  4.72it/s] 

{'loss': 0.1384, 'learning_rate': 0.002214193574894662, 'epoch': 110.0}


                                                      
 11%|█         | 9020/82000 [48:48<4:17:51,  4.72it/s]

{'eval_loss': 0.4940423369407654, 'eval_bleu': 8.4899, 'eval_gen_len': 6.3562, 'eval_runtime': 5.9886, 'eval_samples_per_second': 24.38, 'eval_steps_per_second': 1.67, 'epoch': 110.0}


 11%|█         | 9102/82000 [49:09<5:52:53,  3.44it/s] 

{'loss': 0.1451, 'learning_rate': 0.002211705819209895, 'epoch': 111.0}


                                                      
 11%|█         | 9102/82000 [49:15<5:52:53,  3.44it/s]

{'eval_loss': 0.5100040435791016, 'eval_bleu': 5.6064, 'eval_gen_len': 6.2329, 'eval_runtime': 6.3018, 'eval_samples_per_second': 23.168, 'eval_steps_per_second': 1.587, 'epoch': 111.0}


 11%|█         | 9184/82000 [49:35<3:20:06,  6.06it/s] 

{'loss': 0.1434, 'learning_rate': 0.0022092484020090884, 'epoch': 112.0}


                                                      
 11%|█         | 9184/82000 [49:40<3:20:06,  6.06it/s]

{'eval_loss': 0.4990517497062683, 'eval_bleu': 7.5452, 'eval_gen_len': 6.3288, 'eval_runtime': 4.1762, 'eval_samples_per_second': 34.96, 'eval_steps_per_second': 2.395, 'epoch': 112.0}


 11%|█▏        | 9266/82000 [49:59<4:17:23,  4.71it/s] 

{'loss': 0.1441, 'learning_rate': 0.002206790984808282, 'epoch': 113.0}


                                                      
 11%|█▏        | 9266/82000 [50:04<4:17:23,  4.71it/s]

{'eval_loss': 0.4916530251502991, 'eval_bleu': 7.8085, 'eval_gen_len': 7.6575, 'eval_runtime': 4.3541, 'eval_samples_per_second': 33.532, 'eval_steps_per_second': 2.297, 'epoch': 113.0}


 11%|█▏        | 9348/82000 [50:24<4:37:00,  4.37it/s] 

{'loss': 0.1369, 'learning_rate': 0.0022043032291235148, 'epoch': 114.0}


                                                      
 11%|█▏        | 9348/82000 [50:29<4:37:00,  4.37it/s]

{'eval_loss': 0.49071577191352844, 'eval_bleu': 6.6159, 'eval_gen_len': 6.3425, 'eval_runtime': 4.684, 'eval_samples_per_second': 31.17, 'eval_steps_per_second': 2.135, 'epoch': 114.0}


 12%|█▏        | 9430/82000 [50:49<4:56:29,  4.08it/s] 

{'loss': 0.1349, 'learning_rate': 0.0022018154734387476, 'epoch': 115.0}


                                                      
 12%|█▏        | 9430/82000 [50:55<4:56:29,  4.08it/s]

{'eval_loss': 0.5072570443153381, 'eval_bleu': 6.0752, 'eval_gen_len': 6.6781, 'eval_runtime': 5.8458, 'eval_samples_per_second': 24.975, 'eval_steps_per_second': 1.711, 'epoch': 115.0}


 12%|█▏        | 9512/82000 [51:14<4:37:56,  4.35it/s] 

{'loss': 0.1333, 'learning_rate': 0.0021993277177539804, 'epoch': 116.0}


                                                      
 12%|█▏        | 9512/82000 [51:18<4:37:56,  4.35it/s]

{'eval_loss': 0.5047363042831421, 'eval_bleu': 5.8883, 'eval_gen_len': 6.3151, 'eval_runtime': 4.4244, 'eval_samples_per_second': 32.999, 'eval_steps_per_second': 2.26, 'epoch': 116.0}


 12%|█▏        | 9594/82000 [51:40<4:10:02,  4.83it/s] 

{'loss': 0.1249, 'learning_rate': 0.0021968399620692133, 'epoch': 117.0}


                                                      
 12%|█▏        | 9594/82000 [51:45<4:10:02,  4.83it/s]

{'eval_loss': 0.5266658663749695, 'eval_bleu': 6.7293, 'eval_gen_len': 6.6507, 'eval_runtime': 4.9171, 'eval_samples_per_second': 29.692, 'eval_steps_per_second': 2.034, 'epoch': 117.0}


 12%|█▏        | 9676/82000 [52:06<4:06:16,  4.89it/s] 

{'loss': 0.1261, 'learning_rate': 0.002194352206384446, 'epoch': 118.0}


                                                      
 12%|█▏        | 9676/82000 [52:10<4:06:16,  4.89it/s]

{'eval_loss': 0.4933232367038727, 'eval_bleu': 7.9556, 'eval_gen_len': 6.4384, 'eval_runtime': 4.4631, 'eval_samples_per_second': 32.713, 'eval_steps_per_second': 2.241, 'epoch': 118.0}


 12%|█▏        | 9758/82000 [52:29<5:00:56,  4.00it/s] 

{'loss': 0.1254, 'learning_rate': 0.002191864450699679, 'epoch': 119.0}


                                                      
 12%|█▏        | 9758/82000 [52:35<5:00:56,  4.00it/s]

{'eval_loss': 0.4786854684352875, 'eval_bleu': 10.2154, 'eval_gen_len': 6.5548, 'eval_runtime': 6.1774, 'eval_samples_per_second': 23.634, 'eval_steps_per_second': 1.619, 'epoch': 119.0}


 12%|█▏        | 9840/82000 [52:53<4:37:49,  4.33it/s] 

{'loss': 0.1254, 'learning_rate': 0.0021893766950149118, 'epoch': 120.0}


                                                      
 12%|█▏        | 9840/82000 [52:58<4:37:49,  4.33it/s]

{'eval_loss': 0.5067509412765503, 'eval_bleu': 6.023, 'eval_gen_len': 6.6575, 'eval_runtime': 4.4185, 'eval_samples_per_second': 33.043, 'eval_steps_per_second': 2.263, 'epoch': 120.0}


 12%|█▏        | 9922/82000 [53:17<3:39:33,  5.47it/s] 

{'loss': 0.1299, 'learning_rate': 0.0021868889393301446, 'epoch': 121.0}


                                                      
 12%|█▏        | 9922/82000 [53:22<3:39:33,  5.47it/s]

{'eval_loss': 0.500360906124115, 'eval_bleu': 8.7239, 'eval_gen_len': 6.8082, 'eval_runtime': 4.5469, 'eval_samples_per_second': 32.11, 'eval_steps_per_second': 2.199, 'epoch': 121.0}


 12%|█▏        | 10004/82000 [53:43<4:03:25,  4.93it/s]

{'loss': 0.1434, 'learning_rate': 0.0021844011836453774, 'epoch': 122.0}


                                                       
 12%|█▏        | 10004/82000 [53:48<4:03:25,  4.93it/s]

{'eval_loss': 0.5508583784103394, 'eval_bleu': 4.4753, 'eval_gen_len': 6.9726, 'eval_runtime': 5.0465, 'eval_samples_per_second': 28.931, 'eval_steps_per_second': 1.982, 'epoch': 122.0}


 12%|█▏        | 10086/82000 [54:06<3:59:34,  5.00it/s] 

{'loss': 0.1875, 'learning_rate': 0.00218191342796061, 'epoch': 123.0}


                                                       
 12%|█▏        | 10086/82000 [54:10<3:59:34,  5.00it/s]

{'eval_loss': 0.544540286064148, 'eval_bleu': 3.7924, 'eval_gen_len': 5.9315, 'eval_runtime': 4.3664, 'eval_samples_per_second': 33.437, 'eval_steps_per_second': 2.29, 'epoch': 123.0}


 12%|█▏        | 10168/82000 [54:28<4:06:08,  4.86it/s] 

{'loss': 0.1561, 'learning_rate': 0.0021794256722758427, 'epoch': 124.0}


                                                       
 12%|█▏        | 10168/82000 [54:33<4:06:08,  4.86it/s]

{'eval_loss': 0.5226793885231018, 'eval_bleu': 6.0314, 'eval_gen_len': 6.0548, 'eval_runtime': 4.1331, 'eval_samples_per_second': 35.325, 'eval_steps_per_second': 2.42, 'epoch': 124.0}


 12%|█▎        | 10250/82000 [54:51<3:45:57,  5.29it/s] 

{'loss': 0.1314, 'learning_rate': 0.0021769379165910755, 'epoch': 125.0}


                                                       
 12%|█▎        | 10250/82000 [54:56<3:45:57,  5.29it/s]

{'eval_loss': 0.5073304772377014, 'eval_bleu': 10.0184, 'eval_gen_len': 6.2945, 'eval_runtime': 4.5583, 'eval_samples_per_second': 32.029, 'eval_steps_per_second': 2.194, 'epoch': 125.0}


 13%|█▎        | 10332/82000 [55:15<7:43:32,  2.58it/s] 

{'loss': 0.112, 'learning_rate': 0.0021744501609063088, 'epoch': 126.0}


                                                       
 13%|█▎        | 10332/82000 [55:20<7:43:32,  2.58it/s]

{'eval_loss': 0.514306366443634, 'eval_bleu': 8.4685, 'eval_gen_len': 6.5479, 'eval_runtime': 5.2332, 'eval_samples_per_second': 27.899, 'eval_steps_per_second': 1.911, 'epoch': 126.0}


 13%|█▎        | 10414/82000 [55:40<3:58:32,  5.00it/s] 

{'loss': 0.1141, 'learning_rate': 0.0021719624052215416, 'epoch': 127.0}


                                                       
 13%|█▎        | 10414/82000 [55:44<3:58:32,  5.00it/s]

{'eval_loss': 0.5142415761947632, 'eval_bleu': 7.5555, 'eval_gen_len': 7.0616, 'eval_runtime': 4.3457, 'eval_samples_per_second': 33.596, 'eval_steps_per_second': 2.301, 'epoch': 127.0}


 13%|█▎        | 10496/82000 [56:05<4:13:40,  4.70it/s] 

{'loss': 0.1231, 'learning_rate': 0.0021694746495367744, 'epoch': 128.0}


                                                       
 13%|█▎        | 10496/82000 [56:09<4:13:40,  4.70it/s]

{'eval_loss': 0.501114547252655, 'eval_bleu': 6.5259, 'eval_gen_len': 6.774, 'eval_runtime': 4.0337, 'eval_samples_per_second': 36.195, 'eval_steps_per_second': 2.479, 'epoch': 128.0}


 13%|█▎        | 10578/82000 [56:26<3:42:47,  5.34it/s] 

{'loss': 0.1224, 'learning_rate': 0.0021669868938520073, 'epoch': 129.0}


                                                       

{'eval_loss': 0.4976256489753723, 'eval_bleu': 8.8776, 'eval_gen_len': 6.2055, 'eval_runtime': 3.9755, 'eval_samples_per_second': 36.725, 'eval_steps_per_second': 2.515, 'epoch': 129.0}


 13%|█▎        | 10660/82000 [56:47<3:42:41,  5.34it/s] 

{'loss': 0.1253, 'learning_rate': 0.00216449913816724, 'epoch': 130.0}


                                                       

{'eval_loss': 0.5048703551292419, 'eval_bleu': 10.3285, 'eval_gen_len': 5.8493, 'eval_runtime': 4.2502, 'eval_samples_per_second': 34.351, 'eval_steps_per_second': 2.353, 'epoch': 130.0}


 13%|█▎        | 10742/82000 [57:09<3:56:32,  5.02it/s] 

{'loss': 0.1254, 'learning_rate': 0.002162011382482473, 'epoch': 131.0}


                                                       

{'eval_loss': 0.4959777891635895, 'eval_bleu': 7.5246, 'eval_gen_len': 6.9795, 'eval_runtime': 3.8968, 'eval_samples_per_second': 37.467, 'eval_steps_per_second': 2.566, 'epoch': 131.0}


 13%|█▎        | 10824/82000 [57:32<3:55:52,  5.03it/s] 

{'loss': 0.1223, 'learning_rate': 0.0021595236267977058, 'epoch': 132.0}


                                                       

{'eval_loss': 0.5123797059059143, 'eval_bleu': 7.6226, 'eval_gen_len': 6.2877, 'eval_runtime': 3.9585, 'eval_samples_per_second': 36.882, 'eval_steps_per_second': 2.526, 'epoch': 132.0}


 13%|█▎        | 10906/82000 [57:53<3:43:08,  5.31it/s] 

{'loss': 0.1231, 'learning_rate': 0.0021570358711129386, 'epoch': 133.0}


                                                       

{'eval_loss': 0.5097094774246216, 'eval_bleu': 9.562, 'eval_gen_len': 7.0342, 'eval_runtime': 4.1207, 'eval_samples_per_second': 35.431, 'eval_steps_per_second': 2.427, 'epoch': 133.0}


 13%|█▎        | 10988/82000 [58:14<5:39:15,  3.49it/s] 

{'loss': 0.1271, 'learning_rate': 0.0021545481154281714, 'epoch': 134.0}


                                                       

{'eval_loss': 0.5256674885749817, 'eval_bleu': 4.0291, 'eval_gen_len': 5.7123, 'eval_runtime': 4.7498, 'eval_samples_per_second': 30.738, 'eval_steps_per_second': 2.105, 'epoch': 134.0}


 14%|█▎        | 11070/82000 [58:36<3:40:32,  5.36it/s] 

{'loss': 0.1253, 'learning_rate': 0.0021520603597434042, 'epoch': 135.0}


                                                       

{'eval_loss': 0.5051003098487854, 'eval_bleu': 6.0442, 'eval_gen_len': 5.9932, 'eval_runtime': 3.9359, 'eval_samples_per_second': 37.094, 'eval_steps_per_second': 2.541, 'epoch': 135.0}


 14%|█▎        | 11152/82000 [58:59<5:07:12,  3.84it/s] 

{'loss': 0.1177, 'learning_rate': 0.002149572604058637, 'epoch': 136.0}


                                                       

{'eval_loss': 0.4961634576320648, 'eval_bleu': 6.9543, 'eval_gen_len': 5.9726, 'eval_runtime': 5.9921, 'eval_samples_per_second': 24.365, 'eval_steps_per_second': 1.669, 'epoch': 136.0}


 14%|█▎        | 11234/82000 [59:23<4:16:32,  4.60it/s] 

{'loss': 0.1281, 'learning_rate': 0.00214708484837387, 'epoch': 137.0}


                                                       

{'eval_loss': 0.5144163966178894, 'eval_bleu': 7.3024, 'eval_gen_len': 6.9247, 'eval_runtime': 4.7198, 'eval_samples_per_second': 30.933, 'eval_steps_per_second': 2.119, 'epoch': 137.0}


 14%|█▍        | 11316/82000 [59:48<3:40:32,  5.34it/s] 

{'loss': 0.1251, 'learning_rate': 0.0021445970926891027, 'epoch': 138.0}


                                                       

{'eval_loss': 0.5245012044906616, 'eval_bleu': 5.7405, 'eval_gen_len': 6.4247, 'eval_runtime': 5.1259, 'eval_samples_per_second': 28.483, 'eval_steps_per_second': 1.951, 'epoch': 138.0}


 14%|█▍        | 11398/82000 [1:00:13<4:01:12,  4.88it/s]

{'loss': 0.1186, 'learning_rate': 0.0021421093370043356, 'epoch': 139.0}


                                                         

{'eval_loss': 0.5172784328460693, 'eval_bleu': 7.1585, 'eval_gen_len': 6.1575, 'eval_runtime': 4.9027, 'eval_samples_per_second': 29.779, 'eval_steps_per_second': 2.04, 'epoch': 139.0}


 14%|█▍        | 11480/82000 [1:00:37<3:36:31,  5.43it/s] 

{'loss': 0.1199, 'learning_rate': 0.0021396215813195684, 'epoch': 140.0}


                                                         

{'eval_loss': 0.5180147886276245, 'eval_bleu': 5.311, 'eval_gen_len': 6.2877, 'eval_runtime': 4.1532, 'eval_samples_per_second': 35.154, 'eval_steps_per_second': 2.408, 'epoch': 140.0}


 14%|█▍        | 11562/82000 [1:01:03<4:17:49,  4.55it/s] 

{'loss': 0.1207, 'learning_rate': 0.0021371641641187615, 'epoch': 141.0}


                                                         

{'eval_loss': 0.5096808671951294, 'eval_bleu': 5.6077, 'eval_gen_len': 6.1918, 'eval_runtime': 4.7327, 'eval_samples_per_second': 30.849, 'eval_steps_per_second': 2.113, 'epoch': 141.0}


 14%|█▍        | 11644/82000 [1:01:28<4:14:52,  4.60it/s] 

{'loss': 0.1461, 'learning_rate': 0.0021346764084339944, 'epoch': 142.0}


                                                         

{'eval_loss': 0.5062730312347412, 'eval_bleu': 9.9282, 'eval_gen_len': 6.3425, 'eval_runtime': 5.5385, 'eval_samples_per_second': 26.361, 'eval_steps_per_second': 1.806, 'epoch': 142.0}


 14%|█▍        | 11726/82000 [1:01:52<3:55:34,  4.97it/s] 

{'loss': 0.1316, 'learning_rate': 0.002132188652749227, 'epoch': 143.0}


                                                         

{'eval_loss': 0.5500849485397339, 'eval_bleu': 6.0739, 'eval_gen_len': 6.1918, 'eval_runtime': 4.0089, 'eval_samples_per_second': 36.419, 'eval_steps_per_second': 2.494, 'epoch': 143.0}


 14%|█▍        | 11808/82000 [1:02:14<4:09:29,  4.69it/s] 

{'loss': 0.1139, 'learning_rate': 0.00212970089706446, 'epoch': 144.0}


                                                         

{'eval_loss': 0.5344101786613464, 'eval_bleu': 9.1448, 'eval_gen_len': 6.589, 'eval_runtime': 3.9267, 'eval_samples_per_second': 37.181, 'eval_steps_per_second': 2.547, 'epoch': 144.0}


 14%|█▍        | 11890/82000 [1:02:36<3:37:10,  5.38it/s] 

{'loss': 0.1082, 'learning_rate': 0.002127213141379693, 'epoch': 145.0}


                                                         

{'eval_loss': 0.5235180854797363, 'eval_bleu': 6.8277, 'eval_gen_len': 6.3973, 'eval_runtime': 4.0534, 'eval_samples_per_second': 36.019, 'eval_steps_per_second': 2.467, 'epoch': 145.0}


 15%|█▍        | 11972/82000 [1:02:58<5:42:12,  3.41it/s] 

{'loss': 0.1096, 'learning_rate': 0.0021247253856949257, 'epoch': 146.0}


                                                         

{'eval_loss': 0.49514561891555786, 'eval_bleu': 8.0262, 'eval_gen_len': 5.9247, 'eval_runtime': 3.8552, 'eval_samples_per_second': 37.871, 'eval_steps_per_second': 2.594, 'epoch': 146.0}


 15%|█▍        | 12054/82000 [1:03:19<3:53:21,  5.00it/s] 

{'loss': 0.1205, 'learning_rate': 0.0021222376300101585, 'epoch': 147.0}


                                                         

{'eval_loss': 0.4873931109905243, 'eval_bleu': 8.9332, 'eval_gen_len': 6.589, 'eval_runtime': 4.2086, 'eval_samples_per_second': 34.691, 'eval_steps_per_second': 2.376, 'epoch': 147.0}


 15%|█▍        | 12136/82000 [1:03:40<3:37:06,  5.36it/s] 

{'loss': 0.1152, 'learning_rate': 0.0021197498743253914, 'epoch': 148.0}


                                                         

{'eval_loss': 0.4985153079032898, 'eval_bleu': 7.0945, 'eval_gen_len': 6.1849, 'eval_runtime': 4.2453, 'eval_samples_per_second': 34.391, 'eval_steps_per_second': 2.356, 'epoch': 148.0}


 15%|█▍        | 12218/82000 [1:04:02<4:31:02,  4.29it/s] 

{'loss': 0.1169, 'learning_rate': 0.002117262118640624, 'epoch': 149.0}


                                                         

{'eval_loss': 0.49700218439102173, 'eval_bleu': 8.3732, 'eval_gen_len': 6.8836, 'eval_runtime': 3.8796, 'eval_samples_per_second': 37.633, 'eval_steps_per_second': 2.578, 'epoch': 149.0}


 15%|█▌        | 12300/82000 [1:04:23<3:40:16,  5.27it/s] 

{'loss': 0.1215, 'learning_rate': 0.002114774362955857, 'epoch': 150.0}


                                                         

{'eval_loss': 0.5136995911598206, 'eval_bleu': 5.7733, 'eval_gen_len': 6.7123, 'eval_runtime': 3.8864, 'eval_samples_per_second': 37.567, 'eval_steps_per_second': 2.573, 'epoch': 150.0}


 15%|█▌        | 12382/82000 [1:04:46<5:21:34,  3.61it/s] 

{'loss': 0.1159, 'learning_rate': 0.00211228660727109, 'epoch': 151.0}


                                                         

{'eval_loss': 0.5166859030723572, 'eval_bleu': 6.81, 'eval_gen_len': 6.6575, 'eval_runtime': 6.141, 'eval_samples_per_second': 23.775, 'eval_steps_per_second': 1.628, 'epoch': 151.0}


 15%|█▌        | 12464/82000 [1:05:19<4:51:49,  3.97it/s] 

{'loss': 0.1211, 'learning_rate': 0.0021097988515863227, 'epoch': 152.0}


                                                         

{'eval_loss': 0.512475848197937, 'eval_bleu': 7.3754, 'eval_gen_len': 6.5, 'eval_runtime': 6.1339, 'eval_samples_per_second': 23.802, 'eval_steps_per_second': 1.63, 'epoch': 152.0}


 15%|█▌        | 12546/82000 [1:05:50<5:12:34,  3.70it/s] 

{'loss': 0.1194, 'learning_rate': 0.0021073110959015555, 'epoch': 153.0}


                                                         

{'eval_loss': 0.5251861214637756, 'eval_bleu': 8.4095, 'eval_gen_len': 6.1164, 'eval_runtime': 5.7945, 'eval_samples_per_second': 25.196, 'eval_steps_per_second': 1.726, 'epoch': 153.0}


 15%|█▌        | 12628/82000 [1:06:18<4:34:36,  4.21it/s] 

{'loss': 0.1194, 'learning_rate': 0.0021048233402167884, 'epoch': 154.0}


                                                         

{'eval_loss': 0.5534837245941162, 'eval_bleu': 5.4285, 'eval_gen_len': 6.2123, 'eval_runtime': 5.292, 'eval_samples_per_second': 27.589, 'eval_steps_per_second': 1.89, 'epoch': 154.0}


 16%|█▌        | 12710/82000 [1:06:45<7:24:12,  2.60it/s] 

{'loss': 0.1464, 'learning_rate': 0.002102335584532021, 'epoch': 155.0}


                                                         

{'eval_loss': 0.534020185470581, 'eval_bleu': 6.2803, 'eval_gen_len': 6.137, 'eval_runtime': 5.8028, 'eval_samples_per_second': 25.16, 'eval_steps_per_second': 1.723, 'epoch': 155.0}


 16%|█▌        | 12792/82000 [1:07:12<4:33:29,  4.22it/s] 

{'loss': 0.1205, 'learning_rate': 0.002099847828847254, 'epoch': 156.0}


                                                         

{'eval_loss': 0.5385541915893555, 'eval_bleu': 6.6103, 'eval_gen_len': 6.1712, 'eval_runtime': 5.7903, 'eval_samples_per_second': 25.215, 'eval_steps_per_second': 1.727, 'epoch': 156.0}


 16%|█▌        | 12874/82000 [1:07:39<4:56:49,  3.88it/s] 

{'loss': 0.1059, 'learning_rate': 0.002097360073162487, 'epoch': 157.0}


                                                         

{'eval_loss': 0.5182590484619141, 'eval_bleu': 6.231, 'eval_gen_len': 6.1712, 'eval_runtime': 5.3683, 'eval_samples_per_second': 27.197, 'eval_steps_per_second': 1.863, 'epoch': 157.0}


 16%|█▌        | 12956/82000 [1:08:03<3:26:52,  5.56it/s] 

{'loss': 0.0978, 'learning_rate': 0.0020948723174777197, 'epoch': 158.0}


                                                         

{'eval_loss': 0.5206412076950073, 'eval_bleu': 8.1952, 'eval_gen_len': 6.4041, 'eval_runtime': 4.2867, 'eval_samples_per_second': 34.059, 'eval_steps_per_second': 2.333, 'epoch': 158.0}


 16%|█▌        | 13038/82000 [1:08:25<4:06:40,  4.66it/s] 

{'loss': 0.1087, 'learning_rate': 0.0020923845617929525, 'epoch': 159.0}


                                                         

{'eval_loss': 0.5528596043586731, 'eval_bleu': 7.4187, 'eval_gen_len': 6.1027, 'eval_runtime': 5.1688, 'eval_samples_per_second': 28.247, 'eval_steps_per_second': 1.935, 'epoch': 159.0}


 16%|█▌        | 13120/82000 [1:08:47<3:46:13,  5.07it/s] 

{'loss': 0.1166, 'learning_rate': 0.0020898968061081854, 'epoch': 160.0}


                                                         

{'eval_loss': 0.5074155926704407, 'eval_bleu': 3.8511, 'eval_gen_len': 6.5137, 'eval_runtime': 4.5544, 'eval_samples_per_second': 32.057, 'eval_steps_per_second': 2.196, 'epoch': 160.0}


 16%|█▌        | 13202/82000 [1:09:12<3:50:10,  4.98it/s] 

{'loss': 0.1135, 'learning_rate': 0.002087409050423418, 'epoch': 161.0}


                                                         

{'eval_loss': 0.5513588786125183, 'eval_bleu': 7.8814, 'eval_gen_len': 6.4795, 'eval_runtime': 4.2777, 'eval_samples_per_second': 34.131, 'eval_steps_per_second': 2.338, 'epoch': 161.0}


 16%|█▌        | 13284/82000 [1:09:35<3:52:48,  4.92it/s] 

{'loss': 0.1233, 'learning_rate': 0.002084921294738651, 'epoch': 162.0}


                                                         

{'eval_loss': 0.5458780527114868, 'eval_bleu': 7.7583, 'eval_gen_len': 6.1301, 'eval_runtime': 4.2748, 'eval_samples_per_second': 34.153, 'eval_steps_per_second': 2.339, 'epoch': 162.0}


 16%|█▋        | 13366/82000 [1:09:56<3:52:28,  4.92it/s] 

{'loss': 0.1292, 'learning_rate': 0.002082433539053884, 'epoch': 163.0}


                                                         

{'eval_loss': 0.5405319333076477, 'eval_bleu': 7.5182, 'eval_gen_len': 6.1781, 'eval_runtime': 4.4353, 'eval_samples_per_second': 32.917, 'eval_steps_per_second': 2.255, 'epoch': 163.0}


 16%|█▋        | 13448/82000 [1:10:20<4:42:32,  4.04it/s] 

{'loss': 0.114, 'learning_rate': 0.0020799457833691167, 'epoch': 164.0}


                                                         

{'eval_loss': 0.5276998281478882, 'eval_bleu': 6.6454, 'eval_gen_len': 6.3973, 'eval_runtime': 5.271, 'eval_samples_per_second': 27.699, 'eval_steps_per_second': 1.897, 'epoch': 164.0}


 16%|█▋        | 13530/82000 [1:10:44<3:48:12,  5.00it/s] 

{'loss': 0.105, 'learning_rate': 0.0020774580276843495, 'epoch': 165.0}


                                                         

{'eval_loss': 0.5469301342964172, 'eval_bleu': 9.0363, 'eval_gen_len': 6.0342, 'eval_runtime': 4.9219, 'eval_samples_per_second': 29.663, 'eval_steps_per_second': 2.032, 'epoch': 165.0}


 17%|█▋        | 13612/82000 [1:11:06<3:58:05,  4.79it/s] 

{'loss': 0.105, 'learning_rate': 0.0020749702719995823, 'epoch': 166.0}


                                                         

{'eval_loss': 0.5180948972702026, 'eval_bleu': 6.9277, 'eval_gen_len': 6.2123, 'eval_runtime': 4.7381, 'eval_samples_per_second': 30.814, 'eval_steps_per_second': 2.111, 'epoch': 166.0}


 17%|█▋        | 13694/82000 [1:11:29<3:47:09,  5.01it/s] 

{'loss': 0.1002, 'learning_rate': 0.002072482516314815, 'epoch': 167.0}


                                                         

{'eval_loss': 0.5277482271194458, 'eval_bleu': 7.8556, 'eval_gen_len': 6.5205, 'eval_runtime': 4.4331, 'eval_samples_per_second': 32.934, 'eval_steps_per_second': 2.256, 'epoch': 167.0}


 17%|█▋        | 13776/82000 [1:11:51<3:39:33,  5.18it/s] 

{'loss': 0.1056, 'learning_rate': 0.002069994760630048, 'epoch': 168.0}


                                                         

{'eval_loss': 0.5423967838287354, 'eval_bleu': 5.7292, 'eval_gen_len': 6.1233, 'eval_runtime': 4.5692, 'eval_samples_per_second': 31.953, 'eval_steps_per_second': 2.189, 'epoch': 168.0}


 17%|█▋        | 13858/82000 [1:12:14<3:53:01,  4.87it/s] 

{'loss': 0.118, 'learning_rate': 0.002067507004945281, 'epoch': 169.0}


                                                         

{'eval_loss': 0.5250555872917175, 'eval_bleu': 5.317, 'eval_gen_len': 6.4315, 'eval_runtime': 4.8447, 'eval_samples_per_second': 30.136, 'eval_steps_per_second': 2.064, 'epoch': 169.0}


 17%|█▋        | 13940/82000 [1:12:37<3:49:19,  4.95it/s] 

{'loss': 0.1141, 'learning_rate': 0.0020650192492605137, 'epoch': 170.0}


                                                         

{'eval_loss': 0.5307872295379639, 'eval_bleu': 9.7397, 'eval_gen_len': 6.6096, 'eval_runtime': 4.4332, 'eval_samples_per_second': 32.933, 'eval_steps_per_second': 2.256, 'epoch': 170.0}


 17%|█▋        | 14022/82000 [1:13:01<3:33:46,  5.30it/s] 

{'loss': 0.1073, 'learning_rate': 0.0020625314935757465, 'epoch': 171.0}


                                                         

{'eval_loss': 0.5274799466133118, 'eval_bleu': 8.1622, 'eval_gen_len': 6.7192, 'eval_runtime': 4.4033, 'eval_samples_per_second': 33.157, 'eval_steps_per_second': 2.271, 'epoch': 171.0}


 17%|█▋        | 14104/82000 [1:13:23<3:46:17,  5.00it/s] 

{'loss': 0.1103, 'learning_rate': 0.0020600437378909793, 'epoch': 172.0}


                                                         

{'eval_loss': 0.5326807498931885, 'eval_bleu': 10.4253, 'eval_gen_len': 6.7671, 'eval_runtime': 4.4064, 'eval_samples_per_second': 33.134, 'eval_steps_per_second': 2.269, 'epoch': 172.0}


 17%|█▋        | 14186/82000 [1:13:44<3:17:02,  5.74it/s] 

{'loss': 0.1113, 'learning_rate': 0.002057555982206212, 'epoch': 173.0}


                                                         

{'eval_loss': 0.5251772403717041, 'eval_bleu': 8.4362, 'eval_gen_len': 6.8151, 'eval_runtime': 3.6992, 'eval_samples_per_second': 39.468, 'eval_steps_per_second': 2.703, 'epoch': 173.0}


 17%|█▋        | 14268/82000 [1:14:05<3:10:45,  5.92it/s] 

{'loss': 0.1185, 'learning_rate': 0.002055068226521445, 'epoch': 174.0}


                                                         

{'eval_loss': 0.5513583421707153, 'eval_bleu': 6.713, 'eval_gen_len': 5.726, 'eval_runtime': 3.9558, 'eval_samples_per_second': 36.908, 'eval_steps_per_second': 2.528, 'epoch': 174.0}


 18%|█▊        | 14350/82000 [1:14:24<3:15:15,  5.77it/s] 

{'loss': 0.129, 'learning_rate': 0.002052580470836678, 'epoch': 175.0}


                                                         

{'eval_loss': 0.5216081142425537, 'eval_bleu': 9.0458, 'eval_gen_len': 6.9589, 'eval_runtime': 4.0569, 'eval_samples_per_second': 35.988, 'eval_steps_per_second': 2.465, 'epoch': 175.0}


 18%|█▊        | 14432/82000 [1:14:44<2:58:30,  6.31it/s] 

{'loss': 0.1149, 'learning_rate': 0.0020500927151519107, 'epoch': 176.0}


                                                         

{'eval_loss': 0.5254054665565491, 'eval_bleu': 9.0431, 'eval_gen_len': 6.6027, 'eval_runtime': 3.7286, 'eval_samples_per_second': 39.157, 'eval_steps_per_second': 2.682, 'epoch': 176.0}


 18%|█▊        | 14514/82000 [1:15:02<2:46:04,  6.77it/s] 

{'loss': 0.1022, 'learning_rate': 0.0020476049594671435, 'epoch': 177.0}


                                                         

{'eval_loss': 0.520235002040863, 'eval_bleu': 7.7327, 'eval_gen_len': 6.7671, 'eval_runtime': 3.7781, 'eval_samples_per_second': 38.644, 'eval_steps_per_second': 2.647, 'epoch': 177.0}


 18%|█▊        | 14596/82000 [1:15:21<3:40:25,  5.10it/s] 

{'loss': 0.0964, 'learning_rate': 0.0020451172037823763, 'epoch': 178.0}


                                                         

{'eval_loss': 0.513776421546936, 'eval_bleu': 6.3458, 'eval_gen_len': 6.1164, 'eval_runtime': 3.5026, 'eval_samples_per_second': 41.683, 'eval_steps_per_second': 2.855, 'epoch': 178.0}


 18%|█▊        | 14678/82000 [1:15:39<3:07:38,  5.98it/s] 

{'loss': 0.0964, 'learning_rate': 0.002042629448097609, 'epoch': 179.0}


                                                         

{'eval_loss': 0.5230356454849243, 'eval_bleu': 7.7555, 'eval_gen_len': 6.2055, 'eval_runtime': 3.6591, 'eval_samples_per_second': 39.9, 'eval_steps_per_second': 2.733, 'epoch': 179.0}


 18%|█▊        | 14760/82000 [1:15:58<2:56:58,  6.33it/s] 

{'loss': 0.0973, 'learning_rate': 0.002040141692412842, 'epoch': 180.0}


                                                         

{'eval_loss': 0.5236246585845947, 'eval_bleu': 6.6986, 'eval_gen_len': 6.9658, 'eval_runtime': 3.6918, 'eval_samples_per_second': 39.547, 'eval_steps_per_second': 2.709, 'epoch': 180.0}


 18%|█▊        | 14842/82000 [1:16:16<3:07:35,  5.97it/s] 

{'loss': 0.0994, 'learning_rate': 0.002037653936728075, 'epoch': 181.0}


                                                         

{'eval_loss': 0.5190406441688538, 'eval_bleu': 8.9629, 'eval_gen_len': 6.4452, 'eval_runtime': 3.5836, 'eval_samples_per_second': 40.741, 'eval_steps_per_second': 2.791, 'epoch': 181.0}


 18%|█▊        | 14924/82000 [1:16:41<4:32:29,  4.10it/s] 

{'loss': 0.1064, 'learning_rate': 0.0020351661810433077, 'epoch': 182.0}


                                                         

{'eval_loss': 0.528886616230011, 'eval_bleu': 6.3307, 'eval_gen_len': 6.0616, 'eval_runtime': 3.7785, 'eval_samples_per_second': 38.639, 'eval_steps_per_second': 2.647, 'epoch': 182.0}


 18%|█▊        | 15006/82000 [1:17:00<3:28:50,  5.35it/s] 

{'loss': 0.1075, 'learning_rate': 0.0020326784253585405, 'epoch': 183.0}


                                                         

{'eval_loss': 0.527590274810791, 'eval_bleu': 5.8786, 'eval_gen_len': 6.3767, 'eval_runtime': 3.7633, 'eval_samples_per_second': 38.796, 'eval_steps_per_second': 2.657, 'epoch': 183.0}


 18%|█▊        | 15088/82000 [1:17:20<3:22:35,  5.50it/s] 

{'loss': 0.1097, 'learning_rate': 0.0020301906696737733, 'epoch': 184.0}


                                                         

{'eval_loss': 0.5171642303466797, 'eval_bleu': 7.3903, 'eval_gen_len': 6.5274, 'eval_runtime': 3.624, 'eval_samples_per_second': 40.287, 'eval_steps_per_second': 2.759, 'epoch': 184.0}


 18%|█▊        | 15170/82000 [1:17:40<3:10:57,  5.83it/s] 

{'loss': 0.1037, 'learning_rate': 0.002027702913989006, 'epoch': 185.0}


                                                         

{'eval_loss': 0.5195764303207397, 'eval_bleu': 7.3873, 'eval_gen_len': 6.8014, 'eval_runtime': 3.9665, 'eval_samples_per_second': 36.808, 'eval_steps_per_second': 2.521, 'epoch': 185.0}


 19%|█▊        | 15252/82000 [1:17:59<3:13:54,  5.74it/s] 

{'loss': 0.1042, 'learning_rate': 0.002025215158304239, 'epoch': 186.0}


                                                         

{'eval_loss': 0.5029560327529907, 'eval_bleu': 9.8145, 'eval_gen_len': 6.6575, 'eval_runtime': 3.9888, 'eval_samples_per_second': 36.602, 'eval_steps_per_second': 2.507, 'epoch': 186.0}


 19%|█▊        | 15334/82000 [1:18:18<3:07:46,  5.92it/s] 

{'loss': 0.104, 'learning_rate': 0.002022727402619472, 'epoch': 187.0}


                                                         

{'eval_loss': 0.4994444251060486, 'eval_bleu': 7.6915, 'eval_gen_len': 6.7123, 'eval_runtime': 4.1808, 'eval_samples_per_second': 34.922, 'eval_steps_per_second': 2.392, 'epoch': 187.0}


 19%|█▉        | 15416/82000 [1:18:38<3:10:39,  5.82it/s] 

{'loss': 0.1041, 'learning_rate': 0.002020239646934705, 'epoch': 188.0}


                                                         

{'eval_loss': 0.5184868574142456, 'eval_bleu': 7.1547, 'eval_gen_len': 6.5342, 'eval_runtime': 3.9519, 'eval_samples_per_second': 36.945, 'eval_steps_per_second': 2.53, 'epoch': 188.0}


 19%|█▉        | 15498/82000 [1:18:58<3:19:25,  5.56it/s] 

{'loss': 0.1063, 'learning_rate': 0.002017751891249938, 'epoch': 189.0}


                                                         

{'eval_loss': 0.51961350440979, 'eval_bleu': 6.3064, 'eval_gen_len': 5.9178, 'eval_runtime': 4.2884, 'eval_samples_per_second': 34.045, 'eval_steps_per_second': 2.332, 'epoch': 189.0}


 19%|█▉        | 15580/82000 [1:19:18<2:53:42,  6.37it/s] 

{'loss': 0.1043, 'learning_rate': 0.0020152641355651708, 'epoch': 190.0}


                                                         

{'eval_loss': 0.527941882610321, 'eval_bleu': 8.5753, 'eval_gen_len': 6.5616, 'eval_runtime': 3.6972, 'eval_samples_per_second': 39.49, 'eval_steps_per_second': 2.705, 'epoch': 190.0}


 19%|█▉        | 15662/82000 [1:19:37<2:52:57,  6.39it/s] 

{'loss': 0.1114, 'learning_rate': 0.0020127763798804036, 'epoch': 191.0}


                                                         

{'eval_loss': 0.5228583812713623, 'eval_bleu': 6.9301, 'eval_gen_len': 5.8219, 'eval_runtime': 3.556, 'eval_samples_per_second': 41.057, 'eval_steps_per_second': 2.812, 'epoch': 191.0}


 19%|█▉        | 15744/82000 [1:19:57<4:14:25,  4.34it/s] 

{'loss': 0.1027, 'learning_rate': 0.0020102886241956364, 'epoch': 192.0}


                                                         

{'eval_loss': 0.540436863899231, 'eval_bleu': 7.0287, 'eval_gen_len': 6.1301, 'eval_runtime': 3.641, 'eval_samples_per_second': 40.099, 'eval_steps_per_second': 2.746, 'epoch': 192.0}


 19%|█▉        | 15826/82000 [1:20:15<3:09:55,  5.81it/s] 

{'loss': 0.102, 'learning_rate': 0.0020078008685108692, 'epoch': 193.0}


                                                         

{'eval_loss': 0.5504733324050903, 'eval_bleu': 5.6357, 'eval_gen_len': 6.3151, 'eval_runtime': 4.4934, 'eval_samples_per_second': 32.492, 'eval_steps_per_second': 2.225, 'epoch': 193.0}


 19%|█▉        | 15908/82000 [1:20:36<3:06:56,  5.89it/s] 

{'loss': 0.1202, 'learning_rate': 0.002005313112826102, 'epoch': 194.0}


                                                         

{'eval_loss': 0.5360766649246216, 'eval_bleu': 6.3532, 'eval_gen_len': 6.863, 'eval_runtime': 3.6434, 'eval_samples_per_second': 40.073, 'eval_steps_per_second': 2.745, 'epoch': 194.0}


 20%|█▉        | 15990/82000 [1:20:55<3:22:52,  5.42it/s] 

{'loss': 0.1244, 'learning_rate': 0.002002855695625295, 'epoch': 195.0}


                                                         

{'eval_loss': 0.5300918221473694, 'eval_bleu': 9.7912, 'eval_gen_len': 6.3767, 'eval_runtime': 3.7924, 'eval_samples_per_second': 38.498, 'eval_steps_per_second': 2.637, 'epoch': 195.0}


 20%|█▉        | 16072/82000 [1:21:15<3:15:14,  5.63it/s] 

{'loss': 0.1139, 'learning_rate': 0.002000367939940528, 'epoch': 196.0}


                                                         

{'eval_loss': 0.5240688323974609, 'eval_bleu': 7.3026, 'eval_gen_len': 6.5616, 'eval_runtime': 3.9392, 'eval_samples_per_second': 37.063, 'eval_steps_per_second': 2.539, 'epoch': 196.0}


 20%|█▉        | 16154/82000 [1:21:34<3:23:01,  5.41it/s] 

{'loss': 0.1029, 'learning_rate': 0.001997880184255761, 'epoch': 197.0}


                                                         

{'eval_loss': 0.5170548558235168, 'eval_bleu': 8.3729, 'eval_gen_len': 6.4863, 'eval_runtime': 3.6348, 'eval_samples_per_second': 40.167, 'eval_steps_per_second': 2.751, 'epoch': 197.0}


 20%|█▉        | 16236/82000 [1:21:53<3:01:13,  6.05it/s] 

{'loss': 0.0908, 'learning_rate': 0.0019953924285709937, 'epoch': 198.0}


                                                         

{'eval_loss': 0.5161277651786804, 'eval_bleu': 7.4749, 'eval_gen_len': 6.2877, 'eval_runtime': 4.8036, 'eval_samples_per_second': 30.394, 'eval_steps_per_second': 2.082, 'epoch': 198.0}


 20%|█▉        | 16318/82000 [1:22:18<3:53:18,  4.69it/s] 

{'loss': 0.0871, 'learning_rate': 0.0019929046728862265, 'epoch': 199.0}


                                                         

{'eval_loss': 0.5333148837089539, 'eval_bleu': 8.0462, 'eval_gen_len': 6.8904, 'eval_runtime': 3.8267, 'eval_samples_per_second': 38.153, 'eval_steps_per_second': 2.613, 'epoch': 199.0}


 20%|██        | 16400/82000 [1:22:38<3:24:49,  5.34it/s] 

{'loss': 0.0926, 'learning_rate': 0.0019904169172014594, 'epoch': 200.0}


                                                         

{'eval_loss': 0.5190795063972473, 'eval_bleu': 8.4966, 'eval_gen_len': 6.5959, 'eval_runtime': 4.4661, 'eval_samples_per_second': 32.691, 'eval_steps_per_second': 2.239, 'epoch': 200.0}


 20%|██        | 16482/82000 [1:23:04<4:25:19,  4.12it/s] 

{'loss': 0.096, 'learning_rate': 0.001987929161516692, 'epoch': 201.0}


                                                         

{'eval_loss': 0.5336241126060486, 'eval_bleu': 8.6227, 'eval_gen_len': 6.3288, 'eval_runtime': 5.6374, 'eval_samples_per_second': 25.899, 'eval_steps_per_second': 1.774, 'epoch': 201.0}


 20%|██        | 16564/82000 [1:23:32<5:27:58,  3.33it/s] 

{'loss': 0.1055, 'learning_rate': 0.001985441405831925, 'epoch': 202.0}


                                                         
 20%|██        | 16564/82000 [1:23:38<5:27:58,  3.33it/s]

{'eval_loss': 0.5225945115089417, 'eval_bleu': 7.9969, 'eval_gen_len': 6.3014, 'eval_runtime': 5.6976, 'eval_samples_per_second': 25.625, 'eval_steps_per_second': 1.755, 'epoch': 202.0}


 20%|██        | 16574/82000 [1:23:41<5:56:09,  3.06it/s] 

KeyboardInterrupt: 
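
The evaluation records printed during training are plain dictionaries, so they can be compared directly. The sketch below copies only the six records visible above (epochs 197 to 202) and picks the best one for a given metric. `best_record` is a hypothetical helper, not part of `transformers` or of this notebook, and the result says nothing about epochs outside this excerpt.

```python
# Evaluation records copied from the log above (epochs 197 to 202 only).
eval_records = [
    {"epoch": 197.0, "eval_loss": 0.5170548558235168, "eval_bleu": 8.3729},
    {"epoch": 198.0, "eval_loss": 0.5161277651786804, "eval_bleu": 7.4749},
    {"epoch": 199.0, "eval_loss": 0.5333148837089539, "eval_bleu": 8.0462},
    {"epoch": 200.0, "eval_loss": 0.5190795063972473, "eval_bleu": 8.4966},
    {"epoch": 201.0, "eval_loss": 0.5336241126060486, "eval_bleu": 8.6227},
    {"epoch": 202.0, "eval_loss": 0.5225945115089417, "eval_bleu": 7.9969},
]


def best_record(records, metric="eval_bleu", higher_is_better=True):
    """Return the record with the best value of `metric` (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(records, key=lambda record: record[metric])


best_bleu = best_record(eval_records)
best_loss = best_record(eval_records, metric="eval_loss", higher_is_better=False)

print(best_bleu["epoch"], best_bleu["eval_bleu"])
print(best_loss["epoch"], best_loss["eval_loss"])
```

Within this excerpt the highest BLEU and the lowest validation loss fall on different epochs, which is one reason to select the checkpoint on the metric of interest and not on the loss.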

In [10]:
trainer.state.best_model_checkpoint

'data/checkpoints/t5_results_fw_v3\\checkpoint-1048'

We obtained a final BLEU score of **25.7015** for the best model.

In [9]:
# let us get the best model
model = AutoModelForSeq2SeqLM.from_pretrained('data/checkpoints/t5_results_fw_v3/...')

# let us get the test set
test_dataset = T5SentenceDataset("data/extractions/new_data/test_set.csv",
                                 tokenizer,
                                 truncation = True)

### Predictions

Let us generate a translation for every sentence of the test set and store the results in a DataFrame.

In [10]:

# set the model to eval mode
_ = model.eval()

# run model inference on all test data
original_translations, predicted_translations, original_texts, scores = [], [], [], {}

for data, attention_mask, labels in tqdm(DataLoader(test_dataset)):
    
    # decode the source sentence and its reference translation
    original_text = tokenizer.decode(data[0], skip_special_tokens=True)
    
    original_translation = tokenizer.decode(labels[0], skip_special_tokens=True)
    
    # retrieve the pad token id
    pad_token_id = tokenizer.pad_token_id
    
    # perform prediction with greedy decoding: the DataLoader already yields tensors, and
    # the sampling arguments (top_k, top_p, temperature) have no effect when do_sample = False
    predictions = model.generate(data, do_sample = False, max_length = test_dataset.max_len,
                                 attention_mask = attention_mask, pad_token_id = pad_token_id)
    
    # compute the metrics of the current batch (on a copy of the labels)
    result = evaluation.compute_metrics((predictions, labels.clone().detach()))
    
    # accumulate the metrics as running sums over the batches
    if not scores: scores.update({k: v for k, v in result.items()})
    
    else: scores.update({k: round(scores[k] + v, 4) for k, v in result.items()})
    
    # decode the predicted tokens into texts
    predicted_translation = list(test_dataset.decode(predictions))
    
    print(predicted_translation[0])
    
    # append results
    original_translations.append(original_translation)
    
    predicted_translations.extend(predicted_translation)
    
    original_texts.append(original_text)

# transform result into data frame
df_ft_to_wf = pd.DataFrame({'original_text': original_texts,
                            'original_label': original_translations,
                            'predicted_label': predicted_translations})

# print the result
df_ft_to_wf.head()

  1%|          | 1/162 [00:01<04:39,  1.74s/it]

Mbaa jan?


  1%|          | 2/162 [00:02<03:28,  1.30s/it]

Góor gi kenn bañ Moom


  2%|▏         | 3/162 [00:04<03:36,  1.36s/it]

Ci biir ŋgeen jëm?


  2%|▏         | 4/162 [00:05<03:17,  1.25s/it]

Dem naa ci keneen ki ñëw.


  3%|▎         | 5/162 [00:06<02:59,  1.14s/it]

Dil nitu réew mi


  4%|▎         | 6/162 [00:07<02:50,  1.09s/it]

Doo jëm?


  4%|▍         | 7/162 [00:08<03:25,  1.32s/it]

Mbaa kenn demul?


  5%|▍         | 8/162 [00:10<03:32,  1.38s/it]

Séen naa am xar.


  6%|▌         | 9/162 [00:11<03:29,  1.37s/it]

Yaw mi ŋga


  6%|▌         | 10/162 [00:12<03:15,  1.29s/it]

Ñii dañu demul woon


  7%|▋         | 11/162 [00:14<03:20,  1.33s/it]

Ku Loolu?


  7%|▋         | 12/162 [00:15<03:05,  1.24s/it]

Dóor na ka ba mi ŋgi.


  8%|▊         | 13/162 [00:16<02:52,  1.16s/it]

Demal rekk


  9%|▊         | 14/162 [00:17<02:49,  1.15s/it]

Waxtaan ak kenn kan?


  9%|▉         | 15/162 [00:18<02:46,  1.13s/it]

Nit ki rekk a ñëwul.


 10%|▉         | 16/162 [00:19<02:49,  1.16s/it]

Ci kii.


 10%|█         | 17/162 [00:20<02:40,  1.10s/it]

Moo di dem.


 11%|█         | 18/162 [00:21<02:33,  1.07s/it]

Waxtaan ŋga ag góor gi doon dem


 12%|█▏        | 19/162 [00:22<02:28,  1.04s/it]

Yéen mi ŋgi ci foofu


 12%|█▏        | 20/162 [00:23<02:31,  1.07s/it]

Soo demee, mu ñëw.


 13%|█▎        | 21/162 [00:24<02:25,  1.03s/it]

Bëgg naa góor gi ñëw


 14%|█▎        | 22/162 [00:25<02:22,  1.02s/it]

Gis ŋga xale yooyule?


 14%|█▍        | 23/162 [00:26<02:18,  1.00it/s]

Noona xale yi set nañu


 15%|█▍        | 24/162 [00:29<03:20,  1.45s/it]

Gor gii di Lawbe Ndar.


 15%|█▌        | 25/162 [00:31<03:35,  1.58s/it]

Yan ñoo ñëw?


 16%|█▌        | 26/162 [00:32<03:20,  1.47s/it]

Lépp jeex na.


 17%|█▋        | 27/162 [00:34<03:29,  1.56s/it]

Benn boobule laa la may.


 17%|█▋        | 28/162 [00:35<03:20,  1.50s/it]

Yaa ka gis moom Samba.


 18%|█▊        | 29/162 [00:37<03:28,  1.57s/it]

Xale yi bëgg nañu dikk, te mag ni ñaan nañu ŋgeen dem


 19%|█▊        | 30/162 [00:38<03:05,  1.40s/it]

Moontin nag, bëgg nañu dem


 19%|█▉        | 31/162 [00:39<02:58,  1.36s/it]

Waxuma yooyale xale?


 20%|█▉        | 32/162 [00:40<02:44,  1.27s/it]

Gis na keneen ki woon.


 20%|██        | 33/162 [00:41<02:30,  1.17s/it]

Noona sa waajur ñëw


 21%|██        | 34/162 [00:42<02:18,  1.08s/it]

Ku ñëw?


 22%|██▏       | 35/162 [00:43<02:08,  1.01s/it]

Gis naa ki woon.


 22%|██▏       | 36/162 [00:44<02:05,  1.00it/s]

Yaa ñëwkóon


 23%|██▎       | 37/162 [00:45<02:02,  1.02it/s]

Yéen demulwoon


 23%|██▎       | 38/162 [00:46<02:00,  1.03it/s]

Waw kookule.


 24%|██▍       | 39/162 [00:47<01:58,  1.04it/s]

Gis ŋga nit kee?


 25%|██▍       | 40/162 [00:47<01:55,  1.06it/s]

Ni ŋga def noonu.


 25%|██▌       | 41/162 [00:48<01:55,  1.05it/s]

Demal rekk!


 26%|██▌       | 42/162 [00:49<01:51,  1.07it/s]

Kenn ki dem na


 27%|██▋       | 43/162 [00:50<01:47,  1.10it/s]

Wax ji yépp, bañ-ŋga-ñëw la.


 27%|██▋       | 44/162 [00:51<01:47,  1.10it/s]

Ñun ñii lay set.


 28%|██▊       | 45/162 [00:52<01:46,  1.10it/s]

Dem nañu


 28%|██▊       | 46/162 [00:53<01:44,  1.11it/s]

Na dem su bëggul


 29%|██▉       | 47/162 [00:54<01:44,  1.10it/s]

Gis naa sama xarit yeneen yooyuu


 30%|██▉       | 48/162 [00:55<01:40,  1.13it/s]

Yooyale deey bëggu leen!


 30%|███       | 49/162 [00:55<01:39,  1.13it/s]

Su dee dem


 31%|███       | 50/162 [00:56<01:37,  1.14it/s]

Kii dafa demkoon


 31%|███▏      | 51/162 [00:57<01:35,  1.16it/s]

Soo demee, ci ñëw


 32%|███▏      | 52/162 [00:58<01:35,  1.16it/s]

Gis ŋga xale be?


 33%|███▎      | 53/162 [00:59<01:36,  1.12it/s]

Musaa


 33%|███▎      | 54/162 [01:00<01:34,  1.14it/s]

Doo dem?


 34%|███▍      | 55/162 [01:01<01:32,  1.16it/s]

Ma ŋgee doon dem


 35%|███▍      | 56/162 [01:01<01:30,  1.17it/s]

Ndax kan dem?


 35%|███▌      | 57/162 [01:02<01:29,  1.17it/s]

Góor gi moo demulwoon


 36%|███▌      | 58/162 [01:03<01:31,  1.13it/s]

Laobe ŋga woon.


 36%|███▋      | 59/162 [01:04<01:31,  1.13it/s]

Séen naa am guy.


 37%|███▋      | 60/162 [01:05<01:29,  1.14it/s]

Góor gi waxkoon na


 38%|███▊      | 61/162 [01:06<01:27,  1.15it/s]

Tann ŋga doomu benn jigéen.


 38%|███▊      | 62/162 [01:07<01:26,  1.16it/s]

Ma may ñan?


 39%|███▉      | 63/162 [01:08<01:24,  1.18it/s]

Ku mu?


 40%|███▉      | 64/162 [01:09<01:27,  1.12it/s]

Waxal ak ñooñule!


 40%|████      | 65/162 [01:09<01:25,  1.14it/s]

Gis ŋga nag yii yépp, woowuu moo ci gën.


 41%|████      | 66/162 [01:10<01:23,  1.15it/s]

Soo demee ag soo demul


 41%|████▏     | 67/162 [01:11<01:21,  1.17it/s]

Gis ŋga buu?


 42%|████▏     | 68/162 [01:12<01:20,  1.17it/s]

Doo nitu jamm


 43%|████▎     | 69/162 [01:13<01:19,  1.18it/s]

Kookule,?


 43%|████▎     | 70/162 [01:14<01:20,  1.14it/s]

Faatim la.


 44%|████▍     | 71/162 [01:15<01:19,  1.14it/s]

Waxal ag ndaw soo demul itam.


 44%|████▍     | 72/162 [01:15<01:18,  1.15it/s]

Ibraayima


 45%|████▌     | 73/162 [01:16<01:17,  1.15it/s]

Loo jëm?


 46%|████▌     | 74/162 [01:17<01:16,  1.15it/s]

Góor gi bëggul


 46%|████▋     | 75/162 [01:18<01:17,  1.12it/s]

Kookule la soo demee


 47%|████▋     | 76/162 [01:19<01:17,  1.11it/s]

Nit kookuu génn laa wax.


 48%|████▊     | 77/162 [01:20<01:15,  1.13it/s]

Góor gi gisul meneen.


 48%|████▊     | 78/162 [01:21<01:13,  1.14it/s]

Gisoon seen ban xarit?


 49%|████▉     | 79/162 [01:22<01:11,  1.15it/s]

Góor gi nee na la fi saŋx, ŋga dem ci biti.


 49%|████▉     | 80/162 [01:22<01:10,  1.16it/s]

Xale bi tawat la wax.


 50%|█████     | 81/162 [01:23<01:11,  1.13it/s]

Li ŋga wax loolu.


 51%|█████     | 82/162 [01:24<01:11,  1.11it/s]

Defe naa du ñëw


 51%|█████     | 83/162 [01:25<01:09,  1.13it/s]

Nit kookuu ci sama wet.


 52%|█████▏    | 84/162 [01:26<01:07,  1.15it/s]

Dem


 52%|█████▏    | 85/162 [01:27<01:06,  1.16it/s]

Feneen fi bëttóon foofu.


 53%|█████▎    | 86/162 [01:28<01:06,  1.15it/s]

Aminta ñëw?


 54%|█████▎    | 87/162 [01:29<01:06,  1.12it/s]

Yaa ñëw na.


 54%|█████▍    | 88/162 [01:30<01:05,  1.14it/s]

Dem nañu


 55%|█████▍    | 89/162 [01:30<01:03,  1.15it/s]

Ci fi góor gi dem.


 56%|█████▌    | 90/162 [01:31<01:04,  1.12it/s]

Loolule lépp.


 56%|█████▌    | 91/162 [01:32<01:03,  1.12it/s]

Nit ag gaynde duñu dëkkóo.


 57%|█████▋    | 92/162 [01:33<01:04,  1.08it/s]

Sa yay nee dana ñëw ci ŋgoon.


 57%|█████▋    | 93/162 [01:34<01:06,  1.05it/s]

Xale yile yarunañu.


 58%|█████▊    | 94/162 [01:35<01:03,  1.07it/s]

Yan kan ŋga dem?


 59%|█████▊    | 95/162 [01:36<01:01,  1.09it/s]

Góor gi moo dulwoon


 59%|█████▉    | 96/162 [01:37<01:00,  1.10it/s]

Daŋga gis kan?


 60%|█████▉    | 97/162 [01:38<00:58,  1.12it/s]

Seetil nag yépp!


 60%|██████    | 98/162 [01:39<00:58,  1.10it/s]

Xale bi mayul dara kii.


 61%|██████    | 99/162 [01:40<00:56,  1.11it/s]

Nit la.


 62%|██████▏   | 100/162 [01:40<00:54,  1.13it/s]

Dem na


 62%|██████▏   | 101/162 [01:41<00:54,  1.13it/s]

Xale bi tawat la wax.


 63%|██████▎   | 102/162 [01:42<00:53,  1.12it/s]

Gis naa booba xale?


 64%|██████▎   | 103/162 [01:43<00:52,  1.13it/s]

Dafa di nitu tay.


 64%|██████▍   | 104/162 [01:44<00:53,  1.08it/s]

Góor gi du t


 65%|██████▍   | 105/162 [01:45<00:51,  1.10it/s]

Dem ŋga dem te mu dem ag sama xarit ya.


 65%|██████▌   | 106/162 [01:46<00:50,  1.10it/s]

Ku mu?


 66%|██████▌   | 107/162 [01:47<00:49,  1.11it/s]

Boobu néeg ban ŋga wax?


 67%|██████▋   | 108/162 [01:48<00:48,  1.11it/s]

Góor gi dem?


 67%|██████▋   | 109/162 [01:49<00:48,  1.08it/s]

Su dee Lawbe


 68%|██████▊   | 110/162 [01:50<00:48,  1.08it/s]

Jan ŋga gis?


 69%|██████▊   | 111/162 [01:51<00:47,  1.07it/s]

Baax na?


 69%|██████▉   | 112/162 [01:51<00:47,  1.06it/s]

Demkoonuma


 70%|██████▉   | 113/162 [01:52<00:46,  1.05it/s]

Gis na keneen ki.


 70%|███████   | 114/162 [01:54<00:47,  1.02it/s]

Gis na keneen ki woon.


 71%|███████   | 115/162 [01:55<00:46,  1.01it/s]

Góor gee ni soo demee nit la


 72%|███████▏  | 116/162 [01:55<00:45,  1.01it/s]

Soo demee ag soo demul itam, dana ñëw.


 72%|███████▏  | 117/162 [01:56<00:43,  1.04it/s]

Séen naa am xar.


 73%|███████▎  | 118/162 [01:57<00:42,  1.04it/s]

Ñëwël ndax xale yi di ay liggéeykat, di ay jambaar


 73%|███████▎  | 119/162 [01:58<00:43,  1.00s/it]

Ci kooku, ndax mu wettëliku


 74%|███████▍  | 120/162 [01:59<00:41,  1.01it/s]

Lii lan?


 75%|███████▍  | 121/162 [02:00<00:39,  1.04it/s]

Koo gis?


 75%|███████▌  | 122/162 [02:01<00:36,  1.09it/s]

Du ŋgeen


 76%|███████▌  | 123/162 [02:02<00:35,  1.10it/s]

Bi ŋga dee dem


 77%|███████▋  | 124/162 [02:03<00:34,  1.10it/s]

Gis naa gaynde.


 77%|███████▋  | 125/162 [02:04<00:34,  1.08it/s]

Nit kookuu doo ka.


 78%|███████▊  | 126/162 [02:05<00:32,  1.11it/s]

Góor gi jëm


 78%|███████▊  | 127/162 [02:06<00:30,  1.14it/s]

Foofu, góor gi dem ba mi ŋgi fi.


 79%|███████▉  | 128/162 [02:06<00:29,  1.15it/s]

Gis naa xar yi gannaaw yaw.


 80%|███████▉  | 129/162 [02:07<00:28,  1.15it/s]

Doo dem?


 80%|████████  | 130/162 [02:08<00:27,  1.15it/s]

Ma ŋgoogule foofu.


 81%|████████  | 131/162 [02:09<00:27,  1.14it/s]

Fee la.


 81%|████████▏ | 132/162 [02:10<00:26,  1.15it/s]

Jile jigéen jan ŋgeen wax?


 82%|████████▏ | 133/162 [02:11<00:25,  1.15it/s]

Xam naa xale bi.


 83%|████████▎ | 134/162 [02:12<00:24,  1.13it/s]

Samba la?


 83%|████████▎ | 135/162 [02:13<00:23,  1.15it/s]

Ñeñeen lañu.


 84%|████████▍ | 136/162 [02:13<00:23,  1.12it/s]

Seet ŋga ñooñale ñan?


 85%|████████▍ | 137/162 [02:14<00:22,  1.12it/s]

Jigéen jan ñoo réer?


 85%|████████▌ | 138/162 [02:15<00:21,  1.14it/s]

Waxu la.


 86%|████████▌ | 139/162 [02:16<00:19,  1.15it/s]

Nit, demkoon


 86%|████████▋ | 140/162 [02:17<00:18,  1.16it/s]

Génnéel képp nit koo gis!


 87%|████████▋ | 141/162 [02:18<00:18,  1.17it/s]

Bëgguma du yar.


 88%|████████▊ | 142/162 [02:19<00:17,  1.14it/s]

Menn xar réerul.


 88%|████████▊ | 143/162 [02:20<00:16,  1.16it/s]

Yéen dem ŋga


 89%|████████▉ | 144/162 [02:20<00:15,  1.16it/s]

Jënd ñaa menn xar mi.


 90%|████████▉ | 145/162 [02:21<00:14,  1.17it/s]

Demkoonuma


 90%|█████████ | 146/162 [02:22<00:13,  1.18it/s]

Kookule la soo demee


 91%|█████████ | 147/162 [02:23<00:12,  1.17it/s]

Ci Séeréer yi ag Pël yi


 91%|█████████▏| 148/162 [02:24<00:12,  1.16it/s]

Wool góor gi dul dem


 92%|█████████▏| 149/162 [02:25<00:11,  1.18it/s]

Ku dem?


 93%|█████████▎| 150/162 [02:25<00:09,  1.21it/s]

Gis naa xar.


 93%|█████████▎| 151/162 [02:26<00:09,  1.22it/s]

Nit ka, moom nit kooka la.


 94%|█████████▍| 152/162 [02:27<00:08,  1.22it/s]

Dem na


 94%|█████████▍| 153/162 [02:28<00:07,  1.21it/s]

Nit, gayndé, nag, àndoon nañu fi.


 95%|█████████▌| 154/162 [02:29<00:06,  1.19it/s]

Yaa doonkoon wax


 96%|█████████▌| 155/162 [02:30<00:05,  1.20it/s]

Góor gi bëggul


 96%|█████████▋| 156/162 [02:30<00:04,  1.21it/s]

Nit ñenn ñi yegseeguñu.


 97%|█████████▋| 157/162 [02:31<00:04,  1.23it/s]

Man xar mépp.


 98%|█████████▊| 158/162 [02:32<00:03,  1.24it/s]

Yaw moomu laa wax


 98%|█████████▊| 159/162 [02:33<00:02,  1.23it/s]

Xammee ŋga waa jooju?


 99%|█████████▉| 160/162 [02:34<00:01,  1.21it/s]

Noona Góor gaa ŋgi, mu ñëw.


 99%|█████████▉| 161/162 [02:34<00:00,  1.22it/s]

Su demee


100%|██████████| 162/162 [02:35<00:00,  1.04it/s]

Dafa doon nitu dëgg.





index,original_text,original_label,predicted_label
0,Que j'attrape quelles vaches?,Ma japp nag yee yan?,Mbaa jan?
1,Et que nul ne bouge!,Te bu fi kenn jogé!,Góor gi kenn bañ Moom
2,C'est à l'intérieur que tu dis?,Ci biir ŋga wax?,Ci biir ŋgeen jëm?
3,J'ai donné le livre à l'homme qui est venu.,Jox naa téere bi góor gi ñëw.,Dem naa ci keneen ki ñëw.
4,Sois homme de ce pays!,Dil nitu réew mi!,Dil nitu réew mi
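
The inference loop keeps the metrics in `scores` as running sums over the batches, so they still have to be divided by the number of batches before they can be read as averages. A minimal sketch of that last step follows. `average_scores` is a hypothetical helper, the metric names `bleu` and `gen_len` are assumed from the `eval_bleu` and `eval_gen_len` keys of the training log, and the numbers are toy values, not results from this notebook.

```python
def average_scores(summed_scores, n_batches):
    """Divide each accumulated metric by the number of batches (hypothetical helper)."""
    if n_batches <= 0:
        raise ValueError("n_batches must be positive")
    return {name: round(total / n_batches, 4) for name, total in summed_scores.items()}


# Toy running sums over 4 batches; illustrative values only.
toy_sums = {"bleu": 40.0, "gen_len": 26.0}
toy_average = average_scores(toy_sums, 4)
print(toy_average)
```

In the notebook this would be called as `average_scores(scores, len(test_dataset))`, assuming one sentence per batch as in the loop above. Note that an average of sentence-level BLEU scores is generally not equal to a corpus-level BLEU score computed over all sentences at once.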


In [11]:
df_ft_to_wf.tail(10)

index,original_text,original_label,predicted_label
152,"Homme, lion, boeuf... allaient de concert.","Nit, gayndé, nag... àndoon nañu fi.","Nit, gayndé, nag, àndoon nañu fi."
153,C'est toi qui eusses été élu,Yaa doonkoon falu,Yaa doonkoon wax
154,L'homme ne cultivera pas,Góor gi du bày,Góor gi bëggul
155,S'agiter simplement ne suffit à rien résoudre.,Di tel-teli doŋŋ taxul sotal dara.,Nit ñenn ñi yegseeguñu.
156,C'était son hôte habituellement.,Moo doon ganam.,Man xar mépp.
157,Je parle de ceux-là!,Yenn xar yooyuu laa wax!,Yaw moomu laa wax
158,Tu reconnais cet enfant-ci?,Xammee ŋga bee xale?,Xammee ŋga waa jooju?
159,"Alors l'homme entra, les enfants le virent, il...","Noona góor gi dugg, xale yi gis ka, mu toog, ñ...","Noona Góor gaa ŋgi, mu ñëw."
160,C'est leur ami!,Suñu xarit la!,Su demee
161,Il était Lebou de Yoff.,Mu doon Lebu Yoff.,Dafa doon nitu dëgg.
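
Several predictions differ from their reference only by the final punctuation mark (row 4, for example), so a strict string comparison understates how often the model is right. The sketch below compares a strict and a relaxed exact-match rate on four rows copied from the two tables above. `normalize` and `exact_match_rate` are hypothetical helpers, and the relaxed rule (ignore case and trailing punctuation) is an assumption about what should count as a match.

```python
import string


def normalize(text):
    """Lowercase and drop surrounding whitespace and trailing punctuation."""
    return text.strip().rstrip(string.punctuation).strip().casefold()


def exact_match_rate(pairs, relaxed=False):
    """Share of (reference, prediction) pairs that are identical."""
    if not pairs:
        return 0.0
    prepare = normalize if relaxed else (lambda text: text)
    hits = sum(prepare(ref) == prepare(pred) for ref, pred in pairs)
    return hits / len(pairs)


# (original_label, predicted_label) for rows 4, 153, 158 and 160 of the tables above.
pairs = [
    ("Dil nitu réew mi!", "Dil nitu réew mi"),
    ("Yaa doonkoon falu", "Yaa doonkoon wax"),
    ("Xammee ŋga bee xale?", "Xammee ŋga waa jooju?"),
    ("Suñu xarit la!", "Su demee"),
]

strict_rate = exact_match_rate(pairs)
relaxed_rate = exact_match_rate(pairs, relaxed=True)
print(strict_rate, relaxed_rate)
```

On the full DataFrame the same function could be applied to `zip(df_ft_to_wf['original_label'], df_ft_to_wf['predicted_label'])`.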


In [12]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

index,original_text,original_label,predicted_label
105,Qui est-ce?,Ñan la?,Ku mu?
80,Tu as dit cela.,La ŋga wax la.,Li ŋga wax loolu.
52,A Moussa!,Musaa!,Musaa
132,Je connais l'enfant.,Xam naa xale bi.,Xam naa xale bi.
59,L'homme qui eût travaillé,Waa ji liggéeykoon,Góor gi waxkoon na
54,Le voilà qui part!,Mi ŋgiiy!,Ma ŋgee doon dem
115,Que tu partes ou que tu ne partes pas il viendra.,Dana ñëw soo demul ag soo demee itam.,"Soo demee ag soo demul itam, dana ñëw."
114,C'est l'homme qui a soutenu qu'il est sain d'e...,"Góor gee ni nit la, soo demee!",Góor gee ni soo demee nit la
46,J'ai vu mes amis!,Gis naa sana xarit yi!,Gis naa sama xarit yeneen yooyuu
147,Appelle l'homme qui ne part pas,Wool góor gi dul dem,Wool góor gi dul dem
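
Row 115 shows a prediction that contains the same words as the reference in a different order, which an exact comparison counts as a miss. A word-overlap measure makes this visible. The sketch below computes a clipped unigram precision, which is one ingredient of BLEU and not BLEU itself, on three rows copied from the sample above. `tokens` and `unigram_precision` are hypothetical helpers, and the simple `\w+` tokenisation is an assumption that differs from the tokenizer used by the model.

```python
import re
import unicodedata
from collections import Counter


def tokens(text):
    """Split a sentence into lowercase word tokens, ignoring punctuation."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.findall(r"\w+", text)


def unigram_precision(prediction, reference):
    """Share of predicted words also found in the reference, with clipped counts."""
    pred_counts = Counter(tokens(prediction))
    ref_counts = Counter(tokens(reference))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps, for each word, the smaller of the two counts.
    overlap = sum((pred_counts & ref_counts).values())
    return overlap / total


# (original_label, predicted_label) for rows 115, 105 and 46 of the sample above.
row_115 = ("Dana ñëw soo demul ag soo demee itam.", "Soo demee ag soo demul itam, dana ñëw.")
row_105 = ("Ñan la?", "Ku mu?")
row_46 = ("Gis naa sana xarit yi!", "Gis naa sama xarit yeneen yooyuu")

for reference, prediction in (row_115, row_105, row_46):
    print(prediction, "->", unigram_precision(prediction, reference))
```

Because it ignores word order and length, this measure is only a rough complement to the BLEU score reported by the evaluation step.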


## Colab download and remove step

In [None]:
import shutil

# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')
shutil.rmtree('wandb', ignore_errors=True)  # do not fail if the directory does not exist
# shutil.make_archive('wandb', 'zip', 'wandb')