Fine-tuning best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the sentences newly extracted from the book **Grammaire de Wolof Moderne**. After hyperparameter tuning with `wandb`, we obtained a best BLEU score of **2.47** for the French-to-Wolof translation model. We provide below the main evaluation figures obtained from the hyperparameter search step.

- Parallel coordinates from panel:


- Parameter importance chart (from panel):

![parameter_importance]()

In [1]:
# let us extend the paths of the system
import sys

# path = "/content/drive/MyDrive/Memoire/subject2/T5/"

# sys.path.extend([path, f"{path}new_data"])

In [3]:
# !pip install -qq wandb --upgrade

In [4]:
# !pip install evaluate -qq
# !pip install sacrebleu -qq
# !pip install optuna -qq
# !pip install transformers -qq 
# !pip install tokenizers -qq
# !pip install nlpaug -qq
# !pip install ray[tune] -qq
# !python -m spacy download fr_core_news_lg 

In [5]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.data.dataset_v2 import T5SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from functools import partial
from tqdm import tqdm
import pandas as pd
import numpy as np
import evaluate
import wandb
import torch

# wandb.login(key="<your-wandb-api-key>")




## French to Wolof

### Configure dataset 🔠

In [6]:
def split_data(random_state: int = 50):
  """Split the corpus into train, validation and test sets and save them as CSV files

  Args:
    random_state (int): the seed of the splitting generator. Defaults to 50
  """
  # load the corpus and hold out 10% of the sentences as the test set
  corpora = pd.read_csv("data/additional_documents/diagne_sentences/extractions.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # hold out 10% of the remaining sentences as the validation set
  train_set, valid_set = train_test_split(train_set, test_size=0.1, random_state=random_state)

  # the final training set is written after the validation split,
  # so it contains the same rows as train_set.csv
  train_set.to_csv("data/additional_documents/diagne_sentences/final_train_set.csv", index=False)

  # let us save the sets
  train_set.to_csv("data/additional_documents/diagne_sentences/train_set.csv", index=False)

  valid_set.to_csv("data/additional_documents/diagne_sentences/valid_set.csv", index=False)

  test_set.to_csv("data/additional_documents/diagne_sentences/test_set.csv", index=False)
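
The two successive 10% hold-outs in `split_data` leave roughly 81% of the corpus for training, 9% for validation and 10% for testing. The sketch below works out the sizes for a hypothetical corpus of 1000 sentence pairs (the real corpus size is not stated here). `split_sizes` is an illustrative helper, not part of the project, and it assumes that the held-out part of each split is rounded up, which is how `train_test_split` handles a fractional `test_size`.

```python
import math

def split_sizes(n_rows, test_size=0.1, valid_size=0.1):
    """Return (n_train, n_valid, n_test) for two successive hold-out splits.

    Assumption: the held-out part of each split is rounded up and the
    remaining rows are kept, as with a float `test_size` in scikit-learn.
    """
    n_test = math.ceil(test_size * n_rows)
    n_remaining = n_rows - n_test
    n_valid = math.ceil(valid_size * n_remaining)
    n_train = n_remaining - n_valid
    return n_train, n_valid, n_test

# hypothetical corpus of 1000 sentence pairs
n_train, n_valid, n_test = split_sizes(1000)
print(n_train, n_valid, n_test)  # 810 90 100
```

Because the second split is taken from what remains after the first, the validation share is 10% of 90%, not 10% of the whole corpus.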

In [7]:
# load the tokenizer from a json file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [8]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):
  """Build the augmented training dataset and the test dataset

  Args:
    fr_char_p (float): the proportion of characters altered in each augmented French word
    fr_word_p (float): the proportion of words altered in each French sentence

  Returns:
    tuple: the augmented training dataset and the test dataset
  """
  # Create the augmentation to apply to the French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Load the train dataset
  train_dataset_aug = T5SentenceDataset("data/additional_documents/diagne_sentences/final_train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Load the test dataset
  test_dataset = T5SentenceDataset("data/additional_documents/diagne_sentences/test_set.csv",
                                        tokenizer,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, test_dataset

### Configure the model and the evaluation function ⚙️

Let us load the model and resize its token embeddings to the size of the tokenizer's vocabulary.

In [9]:
def t5_model_init(tokenizer):

  # Initialize the model name
  model_name = 't5-small'
  # model_name = 'data/checkpoints/vf_t5_small_v2_checkpoints_2/' # from checkpoint

  # import the model with its pre-trained weights
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

  # resize the token embeddings
  model.resize_token_embeddings(len(tokenizer))

  return model

Let us evaluate the predictions with the `bleu` metric.

In [37]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        # use the tokenizer stored on the instance: this module defines no global `tokenizer`
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # replace the ignored label positions (-100) with the padding id before decoding
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        # average number of non-padding tokens in the generated sequences
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
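
The two bookkeeping steps of `compute_metrics` (restoring the positions masked with `-100` and measuring the generated length) can be illustrated without a tokenizer. The sketch below is only an illustration: it uses plain Python lists in place of NumPy arrays, the padding id `0` and the token ids are made-up values, and `unmask_labels` and `mean_generated_length` are hypothetical helper names that do not exist in `wolof_translate`.

```python
IGNORE_INDEX = -100  # label value ignored by the loss

def unmask_labels(labels, pad_token_id):
    """Replace every ignored position with the padding id so the ids can be decoded."""
    return [[pad_token_id if tok == IGNORE_INDEX else tok for tok in row]
            for row in labels]

def mean_generated_length(preds, pad_token_id):
    """Average number of non-padding tokens per prediction (the `gen_len` value)."""
    lengths = [sum(tok != pad_token_id for tok in row) for row in preds]
    return sum(lengths) / len(lengths)

PAD_ID = 0  # assumed padding id, for illustration only
labels = [[12, 7, 1, -100, -100], [5, 1, -100, -100, -100]]
preds = [[12, 7, 1, 0, 0], [5, 9, 4, 1, 0]]

print(unmask_labels(labels, PAD_ID))         # [[12, 7, 1, 0, 0], [5, 1, 0, 0, 0]]
print(mean_generated_length(preds, PAD_ID))  # 3.5
```

Decoding with `skip_special_tokens=True` then drops the padding tokens, so the restored positions do not appear in the reference text.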


Let us initialize the evaluation object.

In [12]:
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

Let us define the data collator.

In [13]:
def data_collator(batch):
    """Generate a batch of data to provide to the trainer

    Args:
        batch (list): The samples, each a tuple (input ids, attention mask, labels) of tensors

    Returns:
        dict: A dictionary containing the ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0].squeeze(0) for b in batch])
    
    attention_mask = torch.stack([b[1].squeeze(0) for b in batch])
    
    labels = torch.stack([b[2].squeeze(0) for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}

Let us initialize the training arguments with the best parameters found by the random search and launch the training.

In [14]:
# %%wandb

"""Best parameters
learning_rate = 0.0029455426961160418
weight_decay = 0.3273145442978588
train_batch_size = 16
random_state = 2
fr_char_p = 0.2646960611549013
fr_word_p = 0.32759507689127154
eval/bleu = 3.0599
"""

# let us define a directory
directory = "data/checkpoints/t5_results_fw_v2_2"

# seed
set_seed(0)

# split the data
split_data(random_state=2)

# let us recuperate the datasets
train_dataset, test_dataset = recuperate_datasets(0.2646960611549013, 0.32759507689127154)

# set training arguments
training_args = Seq2SeqTrainingArguments(directory,
                                    logging_dir="data/logs/results_fw_v2_2",
                                    num_train_epochs=300,
                                    load_best_model_at_end=True,
                                    save_strategy="epoch",
                                    evaluation_strategy="epoch",
                                    logging_strategy="epoch",
                                    per_device_train_batch_size=16, 
                                    per_device_eval_batch_size=16,
                                    learning_rate=0.0029455426961160418,
                                    # learning_rate=0.00003113,
                                    weight_decay=0.3273145442978588,
                                    predict_with_generate=True, # we will use predict with generate in order to obtain more valuable test results
                                    fp16 = True,
                                    metric_for_best_model = 'bleu', # a bleu score will be used to find the best model
                                    greater_is_better = True,
                                    save_total_limit = 1, # we will save only the best model
                                    )   

# define training loop
trainer = Seq2SeqTrainer(model_init=partial(t5_model_init, tokenizer = train_dataset.tokenizer),
                  args=training_args,
                  train_dataset=train_dataset, 
                  eval_dataset=test_dataset,
                  data_collator=data_collator,
                  compute_metrics=evaluation.compute_metrics
                  )

# load last checkpoint
# trainer._load_from_checkpoint("data/training2/results/checkpoint-147")

# start training loop
trainer.train()
# trainer.train('data/checkpoints/vf_t5_small_v2_checkpoints_2/')
# trainer.train('data/checkpoints/vf_t5_small_v2_checkpoints/') # from the searching best model
# trainer.train('data/checkpoints/results_fw_v2/last_checkpoint/') # from last checkpoint

# let us save the best model
trainer.save_state()
trainer.save_model(directory)
trainer._save('data/checkpoints/')

with open(f'{directory}/optimizer.pt', 'wb') as f:
    
    torch.save(trainer.optimizer.state_dict(), f)
    
with open(f'{directory}/scheduler.pt', 'wb') as f:
    
    torch.save(trainer.lr_scheduler.state_dict(), f)
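
The learning rates that the trainer reports at the end of the first epochs of this run (0.002935789243479896 after epoch 1, 0.002925970767826176 after epoch 2) are consistent with a linear decay from the configured `learning_rate` to zero over the 45300 scheduled steps, which is the default schedule of `Seq2SeqTrainingArguments` when no warmup is set. The sketch below reproduces these two values. `linear_lr` is a hypothetical helper, and the one-step offset (the logged value being the rate used for the last step of the epoch) is an assumption inferred from the numbers, not something the run states.

```python
def linear_lr(step, base_lr, total_steps):
    """Learning rate after `step` scheduler updates under linear decay to zero, no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

BASE_LR = 0.0029455426961160418  # learning_rate passed to the training arguments
STEPS_PER_EPOCH = 151            # optimizer steps per epoch in this run
TOTAL_STEPS = 300 * STEPS_PER_EPOCH  # 300 epochs, 45300 steps

# assumption: the value logged at the end of epoch k is the rate used for its
# last step, that is after k * 151 - 1 scheduler updates
for epoch, logged in [(1, 0.002935789243479896), (2, 0.002925970767826176)]:
    step = epoch * STEPS_PER_EPOCH - 1
    print(epoch, linear_lr(step, BASE_LR, TOTAL_STEPS), logged)
```

Later epochs can drift from this formula by a step if the trainer skips an optimizer update, for instance under mixed precision, so the match is best checked on the first epochs.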


Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: oumar-kane (oumar-kane-team). Use `wandb login --relogin` to force relogin


Metrics logged at the end of each epoch (151 optimizer steps per epoch, 45300 steps scheduled in total):

| Epoch | Step | Train loss | Learning rate | Eval loss | Eval BLEU | Eval gen. length | Eval runtime (s) | Eval samples/s | Eval steps/s |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 151 | 0.6276 | 0.002935789243479896 | 0.5270705819129944 | 0.9137 | 4.4545 | 7.24 | 41.022 | 2.624 |
| 2 | 302 | 0.4564 | 0.002925970767826176 | 0.46776384115219116 | 2.222 | 4.697 | 7.7699 | 38.225 | 2.445 |
| 3 | 453 | 0.384 | 0.002916152292172456 | 0.44104301929473877 | 0.6911 | 4.4411 | 7.9583 | 37.32 | 2.387 |
| 4 | 604 | 0.3266 | 0.0029063338165187355 | 0.42615368962287903 | 2.6885 | 4.4916 | 7.7338 | 38.403 | 2.457 |
| 5 | 755 | 0.2805 | 0.0028965153408650156 | 0.41543784737586975 | 2.832 | 5.1279 | 8.176 | 36.326 | 2.324 |
| 6 | 906 | 0.2463 | 0.002886696865211295 | 0.40771427750587463 | 5.176 | 5.2391 | 7.9598 | 37.313 | 2.387 |
| 7 | 1057 | 0.2203 | 0.002876878389557575 | 0.42331331968307495 | 5.0309 | 5.0236 | 7.7996 | 38.079 | 2.436 |
| 8 | 1208 | 0.201 | 0.002867059913903855 | 0.42926743626594543 | 4.992 | 5.1145 | 8.0745 | 36.783 | 2.353 |
| 9 | 1359 | 0.1901 | 0.002857241438250135 | 0.40176886320114136 | 2.8842 | 6.697 | 7.602 | 39.069 | 2.499 |
| 10 | 1510 | 0.1845 | 0.0028474229625964145 | 0.40594586730003357 | 5.5343 | 5.4781 | 7.3356 | 40.487 | 2.59 |
| 11 | 1661 | 0.1817 | 0.0028376044869426946 | 0.40573203563690186 | 4.584 | 5.6768 | 8.4377 | 35.199 | 2.252 |
| 12 | 1812 | 0.1814 | 0.0028277860112889746 | 0.40791839361190796 | 3.7621 | 4.8451 | 9.3347 | 31.817 | 2.035 |
| 13 | 1963 | 0.1749 | 0.0028179675356352542 | 0.4031504690647125 | 6.8884 | 4.6936 | 8.9485 | 33.19 | 2.123 |
| 14 | 2114 | 0.1743 | 0.002808149059981534 | 0.39857351779937744 | 4.434 | 5.0707 | 7.9157 | 37.52 | 2.4 |
| 15 | 2265 | 0.1727 | 0.002798330584327814 | 0.4056515097618103 | 5.0873 | 4.6835 | 10.5843 | 28.06 | 1.795 |
| 16 | 2416 | 0.1684 | 0.002788512108674094 | 0.4093344807624817 | 6.2989 | 4.6936 | 7.0439 | 42.164 | 2.697 |
| 17 | 2567 | 0.1696 | 0.0027786936330203736 | 0.4002370238304138 | 6.9217 | 4.963 | 7.2048 | 41.223 | 2.637 |
| 18 | 2718 | 0.1797 | 0.0027688751573666536 | 0.3963322043418884 | 7.484 | 5.101 | 29.0639 | 10.219 | 0.654 |
| 19 | 2869 | 0.1701 | 0.0027590566817129337 | 0.39431053400039673 | 7.2493 | 4.9428 | 20.8061 | 14.275 | 0.913 |
| 20 | 3020 | 0.1688 | 0.0027492382060592133 | 0.3915559947490692 | 8.5012 | 4.6195 | 18.9726 | 15.654 | 1.001 |
| 21 | 3171 | 0.1659 | 0.0027394197304054933 | 0.38538435101509094 | 7.1165 | 5.1549 | 30.956 | 9.594 | 0.614 |
| 22 | 3322 | 0.1664 | 0.002729601254751773 | 0.4039405882358551 | 8.4632 | 4.3199 | 6.6002 | 44.999 | 2.879 |
| 23 | 3473 | 0.1905 | 0.0027197827790980526 | 0.38120904564857483 | 5.8697 | 5.2795 | 6.5597 | 45.276 | 2.896 |
| 24 | 3624 | 0.1716 | 0.0027099643034443326 | 0.38579124212265015 | 6.7711 | 5.7609 | 6.7606 | 43.931 | 2.81 |
| 25 | 3775 | 0.1573 | 0.0027001458277906127 | 0.37721219658851624 | 6.1187 | 4.9461 | 6.7272 | 44.149 | 2.824 |
| 26 | 3926 | 0.1558 | 0.0026903273521368923 | 0.37709590792655945 | 8.2721 | 5.6128 | 7.1728 | 41.406 | 2.649 |
| 27 | 4077 | 0.1593 | 0.0026805088764831723 | 0.3845570385456085 | 7.5127 | 4.5455 | 6.6588 | 44.603 | 2.853 |
| 28 | 4228 | 0.1622 | 0.0026706904008294524 | 0.38674473762512207 | 9.5372 | 4.7946 | 6.8199 | 43.549 | 2.786 |
| 29 | 4379 | 0.165 | 0.002660871925175732 | 0.38039854168891907 | 6.7476 | 4.6667 | 8.1054 | 36.642 | 2.344 |
| 30 | 4530 | 0.1609 | 0.0026510534495220116 | 0.3833341896533966 | 4.4469 | 4.3704 | 7.7387 | 38.378 | 2.455 |
| 31 | 4681 | 0.1589 | 0.0026412349738682917 | 0.37858521938323975 | 8.1351 | 5.138 | 8.0492 | 36.898 | 2.36 |
| 32 | 4832 | 0.1624 | 0.0026314164982145717 | 0.37160608172416687 | 8.1333 | 5.2593 | 8.3994 | 35.36 | 2.262 |
| 33 | 4983 | 0.1586 | 0.0026215980225608513 | 0.3840201497077942 | 8.2431 | 5.4074 | 8.2666 | 35.928 | 2.298 |
| 34 | 5134 | 0.1604 | 0.0026117795469071314 | 0.379743367433548 | 10.4333 | 4.8788 | 8.0237 | 37.015 | 2.368 |
| 35 | 5285 | 0.1595 | 0.0026019610712534114 | 0.38635826110839844 | 9.781 | 5.2896 | 8.2036 | 36.204 | 2.316 |
| 36 | 5436 | 0.1583 | 0.002592142595599691 | 0.38095858693122864 | 7.2446 | 5.0337 | 8.05 | 36.894 | 2.36 |
| 37 | 5587 | 0.1552 | 0.0025823241199459707 | 0.3698638379573822 | 9.6076 | 5.0673 | 8.2911 | 35.822 | 2.292 |
| 38 | 5738 | 0.1556 | 0.0025725056442922507 | 0.3773082494735718 | 7.7312 | 4.9293 | 8.4006 | 35.355 | 2.262 |
| 39 | 5889 | 0.1646 | 0.0025626871686385303 | 0.3821181058883667 | 4.6046 | 5.3266 | 8.2547 | 35.979 | 2.302 |
| 40 | 6040 | 0.1599 | 0.0025528686929848104 | 0.3803960680961609 | 8.7255 | 4.9562 | 8.3417 | 35.604 | 2.278 |
| 41 | 6191 | 0.1542 | 0.0025430502173310904 | 0.3799428343772888 | 11.1894 | 4.8653 | 6.9995 | 42.432 | 2.714 |
| 42 | 6342 | 0.154 | 0.0025332317416773705 | 0.37520816922187805 | 8.4046 | 5.0842 | 7.1953 | 41.277 | 2.641 |
| 43 | 6493 | 0.1493 | 0.00252341326602365 | 0.3867420554161072 | 5.7854 | 4.936 | 7.167 | 41.44 | 2.651 |
| 44 | 6644 | 0.1553 | 0.00251359479036993 | 0.398648202419281 | 5.2877 | 4.9057 | 6.939 | 42.801 | 2.738 |
| 45 | 6795 | 0.1664 | 0.0025038413377337842 | 0.3854109048843384 | 8.1779 | 4.633 | 6.8788 | 43.176 | 2.762 |
| 46 | 6946 | 0.1489 | 0.002494022862080064 | 0.37227076292037964 | 9.1659 | 4.9663 | 7.3749 | 40.272 | 2.576 |
| 47 | 7097 | 0.1355 | 0.002484204386426344 | 0.3776971697807312 | 9.5106 | 4.862 | 7.4103 | 40.079 | 2.564 |
| 48 | 7248 | 0.1355 | 0.0024743859107726235 | 0.3799944818019867 | 8.3337 | 4.9226 | 7.0687 | 42.016 | 2.688 |
| 49 | 7399 | 0.1419 | 0.0024645674351189036 | 0.3702561855316162 | 11.2498 | 4.9865 | 6.9752 | 42.579 | 2.724 |
| 50 | 7550 | 0.1396 | 0.0024547489594651836 | 0.3957270085811615 | 7.4511 | 4.6296 | 7.104 | 41.807 | 2.675 |
| 51 | 7701 | 0.1377 | 0.0024449304838114632 | 0.3785589337348938 | 6.441 | 4.9966 | 6.9749 | 42.581 | 2.724 |
| 52 | 7852 | 0.1361 | 0.0024351120081577433 | 0.38805902004241943 | 6.8066 | 4.8754 | 7.1982 | 41.26 | 2.64 |
| 53 | 8003 | 0.1392 | 0.002425293532504023 | 0.40088900923728943 | 6.9255 | 4.404 | 7.3546 | 40.383 | 2.583 |
| 54 | 8154 | 0.14 | 0.002415475056850303 | 0.39081528782844543 | 7.6447 | 4.899 | 7.0932 | 41.871 | 2.679 |
| 55 | 8305 | 0.139 | 0.0024056565811965826 | 0.3799628019332886 | 12.0002 | 5.2694 | 7.5393 | 39.393 | 2.52 |
| 56 | 8456 | 0.13 | 0.0023958381055428626 | 0.3956621587276459 | 5.4688 | 4.9596 | 6.8282 | 43.496 | 2.783 |


 19%|█▉        | 8607/45300 [39:09<1:28:42,  6.89it/s] 

{'loss': 0.1318, 'learning_rate': 0.0023860196298891422, 'epoch': 57.0}


                                                      
 19%|█▉        | 8607/45300 [39:18<1:28:42,  6.89it/s]

{'eval_loss': 0.3849455416202545, 'eval_bleu': 8.9938, 'eval_gen_len': 5.2492, 'eval_runtime': 8.9128, 'eval_samples_per_second': 33.323, 'eval_steps_per_second': 2.132, 'epoch': 57.0}


 19%|█▉        | 8758/45300 [39:45<1:27:01,  7.00it/s] 

{'loss': 0.131, 'learning_rate': 0.0023762011542354223, 'epoch': 58.0}


                                                      
 19%|█▉        | 8758/45300 [39:52<1:27:01,  7.00it/s]

{'eval_loss': 0.38571152091026306, 'eval_bleu': 8.2431, 'eval_gen_len': 4.9899, 'eval_runtime': 7.0176, 'eval_samples_per_second': 42.322, 'eval_steps_per_second': 2.707, 'epoch': 58.0}


 20%|█▉        | 8909/45300 [40:19<1:43:35,  5.86it/s] 

{'loss': 0.1266, 'learning_rate': 0.0023663826785817023, 'epoch': 59.0}


                                                      
 20%|█▉        | 8909/45300 [40:26<1:43:35,  5.86it/s]

{'eval_loss': 0.3874642550945282, 'eval_bleu': 4.2708, 'eval_gen_len': 6.1785, 'eval_runtime': 6.9676, 'eval_samples_per_second': 42.626, 'eval_steps_per_second': 2.727, 'epoch': 59.0}


 20%|██        | 9060/45300 [40:51<1:31:33,  6.60it/s] 

{'loss': 0.1578, 'learning_rate': 0.0023565642029279824, 'epoch': 60.0}


                                                      
 20%|██        | 9060/45300 [40:58<1:31:33,  6.60it/s]

{'eval_loss': 0.3804610073566437, 'eval_bleu': 8.8729, 'eval_gen_len': 5.7003, 'eval_runtime': 6.8684, 'eval_samples_per_second': 43.241, 'eval_steps_per_second': 2.766, 'epoch': 60.0}


 20%|██        | 9211/45300 [41:23<1:33:37,  6.42it/s] 

{'loss': 0.1313, 'learning_rate': 0.002346745727274262, 'epoch': 61.0}


                                                      
 20%|██        | 9211/45300 [41:30<1:33:37,  6.42it/s]

{'eval_loss': 0.3824653923511505, 'eval_bleu': 9.292, 'eval_gen_len': 5.1414, 'eval_runtime': 7.423, 'eval_samples_per_second': 40.011, 'eval_steps_per_second': 2.56, 'epoch': 61.0}


 21%|██        | 9362/45300 [41:54<1:25:56,  6.97it/s] 

{'loss': 0.1196, 'learning_rate': 0.0023369272516205416, 'epoch': 62.0}


                                                      
 21%|██        | 9362/45300 [42:01<1:25:56,  6.97it/s]

{'eval_loss': 0.3911241888999939, 'eval_bleu': 10.4006, 'eval_gen_len': 5.7845, 'eval_runtime': 6.9921, 'eval_samples_per_second': 42.477, 'eval_steps_per_second': 2.717, 'epoch': 62.0}


 21%|██        | 9513/45300 [42:26<1:32:23,  6.46it/s] 

{'loss': 0.1154, 'learning_rate': 0.0023271087759668217, 'epoch': 63.0}


                                                      
 21%|██        | 9513/45300 [42:35<1:32:23,  6.46it/s]

{'eval_loss': 0.40041306614875793, 'eval_bleu': 9.9675, 'eval_gen_len': 4.8047, 'eval_runtime': 8.1868, 'eval_samples_per_second': 36.278, 'eval_steps_per_second': 2.321, 'epoch': 63.0}


 21%|██▏       | 9664/45300 [43:02<1:27:36,  6.78it/s] 

{'loss': 0.1183, 'learning_rate': 0.0023172903003131013, 'epoch': 64.0}


                                                      
 21%|██▏       | 9664/45300 [43:09<1:27:36,  6.78it/s]

{'eval_loss': 0.3814089894294739, 'eval_bleu': 11.0197, 'eval_gen_len': 5.9158, 'eval_runtime': 6.9874, 'eval_samples_per_second': 42.505, 'eval_steps_per_second': 2.719, 'epoch': 64.0}


 22%|██▏       | 9815/45300 [43:34<1:26:02,  6.87it/s] 

{'loss': 0.1276, 'learning_rate': 0.0023074718246593813, 'epoch': 65.0}


                                                      
 22%|██▏       | 9815/45300 [43:41<1:26:02,  6.87it/s]

{'eval_loss': 0.3824313282966614, 'eval_bleu': 9.9195, 'eval_gen_len': 5.1279, 'eval_runtime': 7.2715, 'eval_samples_per_second': 40.845, 'eval_steps_per_second': 2.613, 'epoch': 65.0}


 22%|██▏       | 9966/45300 [44:06<1:24:43,  6.95it/s] 

{'loss': 0.1248, 'learning_rate': 0.0022976533490056614, 'epoch': 66.0}


                                                      
 22%|██▏       | 9966/45300 [44:13<1:24:43,  6.95it/s]

{'eval_loss': 0.4015386998653412, 'eval_bleu': 9.139, 'eval_gen_len': 4.8451, 'eval_runtime': 7.0481, 'eval_samples_per_second': 42.139, 'eval_steps_per_second': 2.696, 'epoch': 66.0}


 22%|██▏       | 10117/45300 [44:42<1:58:38,  4.94it/s]

{'loss': 0.1208, 'learning_rate': 0.002287834873351941, 'epoch': 67.0}


                                                       
 22%|██▏       | 10117/45300 [44:50<1:58:38,  4.94it/s]

{'eval_loss': 0.38683804869651794, 'eval_bleu': 9.2822, 'eval_gen_len': 5.1212, 'eval_runtime': 8.4015, 'eval_samples_per_second': 35.351, 'eval_steps_per_second': 2.262, 'epoch': 67.0}


 23%|██▎       | 10268/45300 [45:21<1:35:11,  6.13it/s] 

{'loss': 0.1169, 'learning_rate': 0.002278016397698221, 'epoch': 68.0}


                                                       
 23%|██▎       | 10268/45300 [45:28<1:35:11,  6.13it/s]

{'eval_loss': 0.39260026812553406, 'eval_bleu': 9.4811, 'eval_gen_len': 5.6599, 'eval_runtime': 7.6957, 'eval_samples_per_second': 38.593, 'eval_steps_per_second': 2.469, 'epoch': 68.0}


 23%|██▎       | 10419/45300 [1:05:39<7:23:03,  1.31it/s]    

{'loss': 0.1371, 'learning_rate': 0.0022681979220445007, 'epoch': 69.0}


                                                         
 23%|██▎       | 10419/45300 [1:06:01<7:23:03,  1.31it/s]

{'eval_loss': 0.40077173709869385, 'eval_bleu': 7.3185, 'eval_gen_len': 5.2896, 'eval_runtime': 22.0603, 'eval_samples_per_second': 13.463, 'eval_steps_per_second': 0.861, 'epoch': 69.0}


 23%|██▎       | 10570/45300 [1:08:18<7:58:41,  1.21it/s] 

{'loss': 0.1263, 'learning_rate': 0.0022583794463907803, 'epoch': 70.0}


                                                         
 23%|██▎       | 10570/45300 [1:08:43<7:58:41,  1.21it/s]

{'eval_loss': 0.3949489891529083, 'eval_bleu': 8.8043, 'eval_gen_len': 5.3737, 'eval_runtime': 25.3072, 'eval_samples_per_second': 11.736, 'eval_steps_per_second': 0.751, 'epoch': 70.0}


 24%|██▎       | 10721/45300 [1:10:54<7:25:10,  1.29it/s] 

{'loss': 0.1181, 'learning_rate': 0.0022485609707370603, 'epoch': 71.0}


                                                         
 24%|██▎       | 10721/45300 [1:11:13<7:25:10,  1.29it/s]

{'eval_loss': 0.4003666937351227, 'eval_bleu': 11.1089, 'eval_gen_len': 4.9024, 'eval_runtime': 19.4613, 'eval_samples_per_second': 15.261, 'eval_steps_per_second': 0.976, 'epoch': 71.0}


 24%|██▍       | 10872/45300 [1:13:07<8:59:34,  1.06it/s] 

{'loss': 0.1103, 'learning_rate': 0.0022388075181009144, 'epoch': 72.0}


                                                         
 24%|██▍       | 10872/45300 [1:13:29<8:59:34,  1.06it/s]

{'eval_loss': 0.3897719979286194, 'eval_bleu': 9.6623, 'eval_gen_len': 4.7138, 'eval_runtime': 21.2203, 'eval_samples_per_second': 13.996, 'eval_steps_per_second': 0.895, 'epoch': 72.0}


 24%|██▍       | 11023/45300 [1:15:19<7:22:30,  1.29it/s] 

{'loss': 0.1076, 'learning_rate': 0.0022289890424471944, 'epoch': 73.0}


                                                         
 24%|██▍       | 11023/45300 [1:15:39<7:22:30,  1.29it/s]

{'eval_loss': 0.40450000762939453, 'eval_bleu': 6.4359, 'eval_gen_len': 5.0, 'eval_runtime': 20.1637, 'eval_samples_per_second': 14.729, 'eval_steps_per_second': 0.942, 'epoch': 73.0}


 25%|██▍       | 11174/45300 [1:17:34<6:24:55,  1.48it/s] 

{'loss': 0.1099, 'learning_rate': 0.0022191705667934745, 'epoch': 74.0}


                                                         
 25%|██▍       | 11174/45300 [1:17:54<6:24:55,  1.48it/s]

{'eval_loss': 0.40639299154281616, 'eval_bleu': 10.2408, 'eval_gen_len': 4.2727, 'eval_runtime': 19.9632, 'eval_samples_per_second': 14.877, 'eval_steps_per_second': 0.952, 'epoch': 74.0}


 25%|██▌       | 11325/45300 [1:19:47<7:27:40,  1.26it/s] 

{'loss': 0.1152, 'learning_rate': 0.002209352091139754, 'epoch': 75.0}


                                                         
 25%|██▌       | 11325/45300 [1:20:07<7:27:40,  1.26it/s]

{'eval_loss': 0.40274477005004883, 'eval_bleu': 8.5977, 'eval_gen_len': 4.9158, 'eval_runtime': 19.629, 'eval_samples_per_second': 15.131, 'eval_steps_per_second': 0.968, 'epoch': 75.0}


 25%|██▌       | 11476/45300 [1:21:57<6:25:25,  1.46it/s] 

{'loss': 0.1157, 'learning_rate': 0.002199533615486034, 'epoch': 76.0}


                                                         
 25%|██▌       | 11476/45300 [1:22:18<6:25:25,  1.46it/s]

{'eval_loss': 0.40107426047325134, 'eval_bleu': 8.0841, 'eval_gen_len': 5.4646, 'eval_runtime': 20.4971, 'eval_samples_per_second': 14.49, 'eval_steps_per_second': 0.927, 'epoch': 76.0}


 26%|██▌       | 11627/45300 [1:24:09<6:57:29,  1.34it/s] 

{'loss': 0.1117, 'learning_rate': 0.002189715139832314, 'epoch': 77.0}


                                                         
 26%|██▌       | 11627/45300 [1:24:29<6:57:29,  1.34it/s]

{'eval_loss': 0.3979099690914154, 'eval_bleu': 11.8276, 'eval_gen_len': 5.2256, 'eval_runtime': 19.9427, 'eval_samples_per_second': 14.893, 'eval_steps_per_second': 0.953, 'epoch': 77.0}


 26%|██▌       | 11778/45300 [1:26:25<6:41:53,  1.39it/s] 

{'loss': 0.1128, 'learning_rate': 0.002179896664178594, 'epoch': 78.0}


                                                         
 26%|██▌       | 11778/45300 [1:26:45<6:41:53,  1.39it/s]

{'eval_loss': 0.40391865372657776, 'eval_bleu': 9.8773, 'eval_gen_len': 5.1515, 'eval_runtime': 19.744, 'eval_samples_per_second': 15.043, 'eval_steps_per_second': 0.962, 'epoch': 78.0}


 26%|██▋       | 11929/45300 [1:28:38<6:37:52,  1.40it/s] 

{'loss': 0.1175, 'learning_rate': 0.0021700781885248734, 'epoch': 79.0}


                                                         
 26%|██▋       | 11929/45300 [1:28:59<6:37:52,  1.40it/s]

{'eval_loss': 0.3935452401638031, 'eval_bleu': 7.4091, 'eval_gen_len': 5.1111, 'eval_runtime': 21.4963, 'eval_samples_per_second': 13.816, 'eval_steps_per_second': 0.884, 'epoch': 79.0}


 27%|██▋       | 12080/45300 [1:30:59<6:23:04,  1.45it/s] 

{'loss': 0.1131, 'learning_rate': 0.002160324735888728, 'epoch': 80.0}


                                                         
 27%|██▋       | 12080/45300 [1:31:20<6:23:04,  1.45it/s]

{'eval_loss': 0.42951807379722595, 'eval_bleu': 4.9064, 'eval_gen_len': 5.4949, 'eval_runtime': 20.2382, 'eval_samples_per_second': 14.675, 'eval_steps_per_second': 0.939, 'epoch': 80.0}


 27%|██▋       | 12231/45300 [1:33:12<6:39:59,  1.38it/s] 

{'loss': 0.1179, 'learning_rate': 0.0021505062602350075, 'epoch': 81.0}


                                                         
 27%|██▋       | 12231/45300 [1:33:33<6:39:59,  1.38it/s]

{'eval_loss': 0.4062117338180542, 'eval_bleu': 8.891, 'eval_gen_len': 4.8687, 'eval_runtime': 21.3671, 'eval_samples_per_second': 13.9, 'eval_steps_per_second': 0.889, 'epoch': 81.0}


 27%|██▋       | 12382/45300 [1:35:25<7:10:24,  1.27it/s] 

{'loss': 0.1064, 'learning_rate': 0.0021406877845812876, 'epoch': 82.0}


                                                         
 27%|██▋       | 12382/45300 [1:35:45<7:10:24,  1.27it/s]

{'eval_loss': 0.41024941205978394, 'eval_bleu': 8.2284, 'eval_gen_len': 5.2997, 'eval_runtime': 20.4766, 'eval_samples_per_second': 14.504, 'eval_steps_per_second': 0.928, 'epoch': 82.0}


 28%|██▊       | 12533/45300 [1:42:10<1:34:32,  5.78it/s]  

{'loss': 0.1003, 'learning_rate': 0.002130869308927567, 'epoch': 83.0}


                                                         
 28%|██▊       | 12533/45300 [1:42:17<1:34:32,  5.78it/s]

{'eval_loss': 0.40042346715927124, 'eval_bleu': 12.3312, 'eval_gen_len': 4.7172, 'eval_runtime': 7.4385, 'eval_samples_per_second': 39.927, 'eval_steps_per_second': 2.554, 'epoch': 83.0}


 28%|██▊       | 12684/45300 [1:42:40<1:21:37,  6.66it/s] 

{'loss': 0.1032, 'learning_rate': 0.0021210508332738473, 'epoch': 84.0}


                                                         
 28%|██▊       | 12684/45300 [1:42:47<1:21:37,  6.66it/s]

{'eval_loss': 0.40509963035583496, 'eval_bleu': 10.4942, 'eval_gen_len': 5.0976, 'eval_runtime': 6.4349, 'eval_samples_per_second': 46.155, 'eval_steps_per_second': 2.953, 'epoch': 84.0}


 28%|██▊       | 12835/45300 [1:43:11<1:14:27,  7.27it/s] 

{'loss': 0.1037, 'learning_rate': 0.0021112323576201273, 'epoch': 85.0}


                                                         
 28%|██▊       | 12835/45300 [1:43:17<1:14:27,  7.27it/s]

{'eval_loss': 0.43255677819252014, 'eval_bleu': 7.0683, 'eval_gen_len': 5.0168, 'eval_runtime': 6.5714, 'eval_samples_per_second': 45.196, 'eval_steps_per_second': 2.891, 'epoch': 85.0}


 29%|██▊       | 12986/45300 [1:43:41<1:20:17,  6.71it/s] 

{'loss': 0.1119, 'learning_rate': 0.002101413881966407, 'epoch': 86.0}


                                                         
 29%|██▊       | 12986/45300 [1:43:48<1:20:17,  6.71it/s]

{'eval_loss': 0.40983325242996216, 'eval_bleu': 8.2579, 'eval_gen_len': 4.9495, 'eval_runtime': 6.7054, 'eval_samples_per_second': 44.293, 'eval_steps_per_second': 2.834, 'epoch': 86.0}


 29%|██▉       | 13137/45300 [1:44:12<1:19:41,  6.73it/s] 

{'loss': 0.1044, 'learning_rate': 0.002091595406312687, 'epoch': 87.0}


                                                         
 29%|██▉       | 13137/45300 [1:44:19<1:19:41,  6.73it/s]

{'eval_loss': 0.4079781472682953, 'eval_bleu': 8.3934, 'eval_gen_len': 4.7104, 'eval_runtime': 6.6431, 'eval_samples_per_second': 44.708, 'eval_steps_per_second': 2.86, 'epoch': 87.0}


 29%|██▉       | 13288/45300 [1:44:43<1:22:14,  6.49it/s] 

{'loss': 0.1003, 'learning_rate': 0.0020817769306589666, 'epoch': 88.0}


                                                         
 29%|██▉       | 13288/45300 [1:44:49<1:22:14,  6.49it/s]

{'eval_loss': 0.41156572103500366, 'eval_bleu': 10.1419, 'eval_gen_len': 4.5084, 'eval_runtime': 6.8131, 'eval_samples_per_second': 43.593, 'eval_steps_per_second': 2.789, 'epoch': 88.0}


 30%|██▉       | 13439/45300 [1:45:16<1:25:12,  6.23it/s] 

{'loss': 0.1008, 'learning_rate': 0.0020719584550052466, 'epoch': 89.0}


                                                         
 30%|██▉       | 13439/45300 [1:45:23<1:25:12,  6.23it/s]

{'eval_loss': 0.4061426520347595, 'eval_bleu': 11.9164, 'eval_gen_len': 4.734, 'eval_runtime': 6.8109, 'eval_samples_per_second': 43.606, 'eval_steps_per_second': 2.79, 'epoch': 89.0}


 30%|███       | 13590/45300 [1:45:48<1:15:46,  6.97it/s] 

{'loss': 0.1004, 'learning_rate': 0.0020621399793515263, 'epoch': 90.0}


                                                         
 30%|███       | 13590/45300 [1:45:55<1:15:46,  6.97it/s]

{'eval_loss': 0.4118405878543854, 'eval_bleu': 10.232, 'eval_gen_len': 5.3737, 'eval_runtime': 6.8145, 'eval_samples_per_second': 43.584, 'eval_steps_per_second': 2.788, 'epoch': 90.0}


 30%|███       | 13741/45300 [1:46:19<1:17:57,  6.75it/s] 

{'loss': 0.101, 'learning_rate': 0.0020523215036978063, 'epoch': 91.0}


                                                         
 30%|███       | 13741/45300 [1:46:26<1:17:57,  6.75it/s]

{'eval_loss': 0.40770480036735535, 'eval_bleu': 8.9015, 'eval_gen_len': 5.0269, 'eval_runtime': 7.1433, 'eval_samples_per_second': 41.577, 'eval_steps_per_second': 2.66, 'epoch': 91.0}


 31%|███       | 13892/45300 [1:46:50<1:14:48,  7.00it/s] 

{'loss': 0.1058, 'learning_rate': 0.0020425030280440864, 'epoch': 92.0}


                                                         
 31%|███       | 13892/45300 [1:46:57<1:14:48,  7.00it/s]

{'eval_loss': 0.41835570335388184, 'eval_bleu': 11.6401, 'eval_gen_len': 5.5455, 'eval_runtime': 6.7976, 'eval_samples_per_second': 43.692, 'eval_steps_per_second': 2.795, 'epoch': 92.0}


 31%|███       | 14043/45300 [1:47:24<1:21:48,  6.37it/s] 

{'loss': 0.1031, 'learning_rate': 0.002032684552390366, 'epoch': 93.0}


                                                         
 31%|███       | 14043/45300 [1:47:31<1:21:48,  6.37it/s]

{'eval_loss': 0.43648064136505127, 'eval_bleu': 9.7277, 'eval_gen_len': 4.3131, 'eval_runtime': 7.3095, 'eval_samples_per_second': 40.632, 'eval_steps_per_second': 2.599, 'epoch': 93.0}


 31%|███▏      | 14194/45300 [1:47:57<1:16:04,  6.81it/s] 

{'loss': 0.1008, 'learning_rate': 0.002022866076736646, 'epoch': 94.0}


                                                         
 31%|███▏      | 14194/45300 [1:48:04<1:16:04,  6.81it/s]

{'eval_loss': 0.41664794087409973, 'eval_bleu': 12.5753, 'eval_gen_len': 4.9832, 'eval_runtime': 6.8494, 'eval_samples_per_second': 43.361, 'eval_steps_per_second': 2.774, 'epoch': 94.0}


 32%|███▏      | 14345/45300 [1:48:30<1:16:33,  6.74it/s] 

{'loss': 0.0969, 'learning_rate': 0.002013047601082926, 'epoch': 95.0}


                                                         
 32%|███▏      | 14345/45300 [1:48:37<1:16:33,  6.74it/s]

{'eval_loss': 0.4105168581008911, 'eval_bleu': 12.5047, 'eval_gen_len': 5.1347, 'eval_runtime': 7.4164, 'eval_samples_per_second': 40.047, 'eval_steps_per_second': 2.562, 'epoch': 95.0}


 32%|███▏      | 14496/45300 [1:49:02<1:13:18,  7.00it/s] 

{'loss': 0.0992, 'learning_rate': 0.0020032291254292057, 'epoch': 96.0}


                                                         
 32%|███▏      | 14496/45300 [1:49:09<1:13:18,  7.00it/s]

{'eval_loss': 0.42442625761032104, 'eval_bleu': 9.4354, 'eval_gen_len': 5.0842, 'eval_runtime': 7.4802, 'eval_samples_per_second': 39.705, 'eval_steps_per_second': 2.54, 'epoch': 96.0}


 32%|███▏      | 14647/45300 [1:49:38<1:27:42,  5.82it/s] 

{'loss': 0.1058, 'learning_rate': 0.0019934106497754853, 'epoch': 97.0}


                                                         
 32%|███▏      | 14647/45300 [1:49:45<1:27:42,  5.82it/s]

{'eval_loss': 0.41824519634246826, 'eval_bleu': 8.1795, 'eval_gen_len': 4.9966, 'eval_runtime': 7.2663, 'eval_samples_per_second': 40.873, 'eval_steps_per_second': 2.615, 'epoch': 97.0}


 33%|███▎      | 14798/45300 [1:50:09<1:25:00,  5.98it/s] 

{'loss': 0.0972, 'learning_rate': 0.0019835921741217654, 'epoch': 98.0}


                                                         
 33%|███▎      | 14798/45300 [1:50:16<1:25:00,  5.98it/s]

{'eval_loss': 0.4111669361591339, 'eval_bleu': 13.1449, 'eval_gen_len': 4.6801, 'eval_runtime': 7.0231, 'eval_samples_per_second': 42.289, 'eval_steps_per_second': 2.705, 'epoch': 98.0}


 33%|███▎      | 14949/45300 [1:50:41<1:11:04,  7.12it/s] 

{'loss': 0.0932, 'learning_rate': 0.001973773698468045, 'epoch': 99.0}


                                                         
 33%|███▎      | 14949/45300 [1:50:48<1:11:04,  7.12it/s]

{'eval_loss': 0.41163137555122375, 'eval_bleu': 8.6046, 'eval_gen_len': 5.771, 'eval_runtime': 7.3957, 'eval_samples_per_second': 40.158, 'eval_steps_per_second': 2.569, 'epoch': 99.0}


 33%|███▎      | 15100/45300 [1:51:13<1:12:58,  6.90it/s] 

{'loss': 0.0938, 'learning_rate': 0.001963955222814325, 'epoch': 100.0}


                                                         
 33%|███▎      | 15100/45300 [1:51:20<1:12:58,  6.90it/s]

{'eval_loss': 0.4111463725566864, 'eval_bleu': 11.4353, 'eval_gen_len': 5.1515, 'eval_runtime': 7.1322, 'eval_samples_per_second': 41.642, 'eval_steps_per_second': 2.664, 'epoch': 100.0}


 34%|███▎      | 15251/45300 [1:51:46<1:19:06,  6.33it/s] 

{'loss': 0.1001, 'learning_rate': 0.001954136747160605, 'epoch': 101.0}


                                                         
 34%|███▎      | 15251/45300 [1:51:53<1:19:06,  6.33it/s]

{'eval_loss': 0.4275825619697571, 'eval_bleu': 8.973, 'eval_gen_len': 4.9562, 'eval_runtime': 7.0946, 'eval_samples_per_second': 41.863, 'eval_steps_per_second': 2.678, 'epoch': 101.0}


 34%|███▍      | 15402/45300 [1:52:18<1:10:07,  7.11it/s] 

{'loss': 0.0957, 'learning_rate': 0.001944318271506885, 'epoch': 102.0}


                                                         
 34%|███▍      | 15402/45300 [1:52:25<1:10:07,  7.11it/s]

{'eval_loss': 0.407002329826355, 'eval_bleu': 14.8994, 'eval_gen_len': 5.0168, 'eval_runtime': 6.8313, 'eval_samples_per_second': 43.476, 'eval_steps_per_second': 2.781, 'epoch': 102.0}


 34%|███▍      | 15553/45300 [1:52:50<1:11:10,  6.97it/s] 

{'loss': 0.0916, 'learning_rate': 0.0019344997958531645, 'epoch': 103.0}


                                                         
 34%|███▍      | 15553/45300 [1:52:57<1:11:10,  6.97it/s]

{'eval_loss': 0.4073601961135864, 'eval_bleu': 9.5786, 'eval_gen_len': 5.2391, 'eval_runtime': 6.9835, 'eval_samples_per_second': 42.529, 'eval_steps_per_second': 2.721, 'epoch': 103.0}


 35%|███▍      | 15704/45300 [1:53:21<1:12:35,  6.79it/s] 

{'loss': 0.0943, 'learning_rate': 0.0019246813201994446, 'epoch': 104.0}


                                                         
 35%|███▍      | 15704/45300 [1:53:28<1:12:35,  6.79it/s]

{'eval_loss': 0.41789156198501587, 'eval_bleu': 12.5793, 'eval_gen_len': 5.0168, 'eval_runtime': 6.9136, 'eval_samples_per_second': 42.959, 'eval_steps_per_second': 2.748, 'epoch': 104.0}


 35%|███▌      | 15855/45300 [1:53:55<1:25:55,  5.71it/s] 

{'loss': 0.0933, 'learning_rate': 0.0019148628445457244, 'epoch': 105.0}


                                                         
 35%|███▌      | 15855/45300 [1:54:02<1:25:55,  5.71it/s]

{'eval_loss': 0.4125231206417084, 'eval_bleu': 9.4763, 'eval_gen_len': 5.367, 'eval_runtime': 7.0639, 'eval_samples_per_second': 42.044, 'eval_steps_per_second': 2.69, 'epoch': 105.0}


 35%|███▌      | 16006/45300 [1:54:27<1:20:48,  6.04it/s] 

{'loss': 0.0931, 'learning_rate': 0.001905044368892004, 'epoch': 106.0}


                                                         
 35%|███▌      | 16006/45300 [1:54:34<1:20:48,  6.04it/s]

{'eval_loss': 0.4259220063686371, 'eval_bleu': 10.5651, 'eval_gen_len': 5.0303, 'eval_runtime': 6.9796, 'eval_samples_per_second': 42.552, 'eval_steps_per_second': 2.722, 'epoch': 106.0}


 36%|███▌      | 16157/45300 [1:54:59<1:09:00,  7.04it/s] 

{'loss': 0.0912, 'learning_rate': 0.001895225893238284, 'epoch': 107.0}


                                                         
 36%|███▌      | 16157/45300 [1:55:07<1:09:00,  7.04it/s]

{'eval_loss': 0.4003058075904846, 'eval_bleu': 9.7974, 'eval_gen_len': 5.569, 'eval_runtime': 8.0896, 'eval_samples_per_second': 36.714, 'eval_steps_per_second': 2.349, 'epoch': 107.0}


 36%|███▌      | 16308/45300 [1:55:35<1:12:13,  6.69it/s] 

{'loss': 0.0895, 'learning_rate': 0.0018854074175845641, 'epoch': 108.0}


                                                         
 36%|███▌      | 16308/45300 [1:55:42<1:12:13,  6.69it/s]

{'eval_loss': 0.42523032426834106, 'eval_bleu': 10.2409, 'eval_gen_len': 5.0539, 'eval_runtime': 6.934, 'eval_samples_per_second': 42.832, 'eval_steps_per_second': 2.74, 'epoch': 108.0}


 36%|███▋      | 16459/45300 [1:56:07<1:11:17,  6.74it/s] 

{'loss': 0.0894, 'learning_rate': 0.0018755889419308437, 'epoch': 109.0}


                                                         
 36%|███▋      | 16459/45300 [1:56:14<1:11:17,  6.74it/s]

{'eval_loss': 0.43465960025787354, 'eval_bleu': 7.0881, 'eval_gen_len': 4.7374, 'eval_runtime': 7.0554, 'eval_samples_per_second': 42.096, 'eval_steps_per_second': 2.693, 'epoch': 109.0}


 37%|███▋      | 16610/45300 [1:56:39<1:11:19,  6.70it/s] 

{'loss': 0.0975, 'learning_rate': 0.0018657704662771236, 'epoch': 110.0}


                                                         
 37%|███▋      | 16610/45300 [1:56:46<1:11:19,  6.70it/s]

{'eval_loss': 0.4336528182029724, 'eval_bleu': 8.2296, 'eval_gen_len': 4.4815, 'eval_runtime': 7.1715, 'eval_samples_per_second': 41.414, 'eval_steps_per_second': 2.649, 'epoch': 110.0}


 37%|███▋      | 16761/45300 [1:57:11<1:22:49,  5.74it/s] 

{'loss': 0.0966, 'learning_rate': 0.0018559519906234036, 'epoch': 111.0}


                                                         
 37%|███▋      | 16761/45300 [1:57:18<1:22:49,  5.74it/s]

{'eval_loss': 0.42156463861465454, 'eval_bleu': 11.6433, 'eval_gen_len': 4.7576, 'eval_runtime': 7.1803, 'eval_samples_per_second': 41.363, 'eval_steps_per_second': 2.646, 'epoch': 111.0}


 37%|███▋      | 16912/45300 [1:57:44<1:16:56,  6.15it/s] 

{'loss': 0.0931, 'learning_rate': 0.0018461335149696832, 'epoch': 112.0}


                                                         
 37%|███▋      | 16912/45300 [1:57:51<1:16:56,  6.15it/s]

{'eval_loss': 0.42493122816085815, 'eval_bleu': 10.3659, 'eval_gen_len': 5.3131, 'eval_runtime': 7.0437, 'eval_samples_per_second': 42.165, 'eval_steps_per_second': 2.697, 'epoch': 112.0}


 38%|███▊      | 17063/45300 [1:58:16<1:18:58,  5.96it/s] 

{'loss': 0.0866, 'learning_rate': 0.0018363150393159633, 'epoch': 113.0}


                                                         
 38%|███▊      | 17063/45300 [1:58:23<1:18:58,  5.96it/s]

{'eval_loss': 0.4132138788700104, 'eval_bleu': 10.9064, 'eval_gen_len': 5.1717, 'eval_runtime': 6.8704, 'eval_samples_per_second': 43.229, 'eval_steps_per_second': 2.765, 'epoch': 113.0}


 38%|███▊      | 17214/45300 [1:58:49<1:12:48,  6.43it/s] 

{'loss': 0.0848, 'learning_rate': 0.0018264965636622431, 'epoch': 114.0}


                                                         
 38%|███▊      | 17214/45300 [1:58:56<1:12:48,  6.43it/s]

{'eval_loss': 0.42637860774993896, 'eval_bleu': 10.1518, 'eval_gen_len': 5.3401, 'eval_runtime': 6.9214, 'eval_samples_per_second': 42.91, 'eval_steps_per_second': 2.745, 'epoch': 114.0}


 38%|███▊      | 17365/45300 [1:59:24<1:09:54,  6.66it/s] 

{'loss': 0.0933, 'learning_rate': 0.0018166780880085232, 'epoch': 115.0}


                                                         
 38%|███▊      | 17365/45300 [1:59:31<1:09:54,  6.66it/s]

{'eval_loss': 0.41507241129875183, 'eval_bleu': 12.1653, 'eval_gen_len': 5.4141, 'eval_runtime': 6.9754, 'eval_samples_per_second': 42.578, 'eval_steps_per_second': 2.724, 'epoch': 115.0}


 39%|███▊      | 17516/45300 [1:59:57<1:13:54,  6.27it/s] 

{'loss': 0.0918, 'learning_rate': 0.0018068596123548028, 'epoch': 116.0}


                                                         
 39%|███▊      | 17516/45300 [2:00:04<1:13:54,  6.27it/s]

{'eval_loss': 0.4433766305446625, 'eval_bleu': 9.3277, 'eval_gen_len': 4.4579, 'eval_runtime': 7.0364, 'eval_samples_per_second': 42.209, 'eval_steps_per_second': 2.7, 'epoch': 116.0}


 39%|███▉      | 17667/45300 [2:00:30<1:07:30,  6.82it/s] 

{'loss': 0.0907, 'learning_rate': 0.0017970411367010826, 'epoch': 117.0}


                                                         
 39%|███▉      | 17667/45300 [2:00:37<1:07:30,  6.82it/s]

{'eval_loss': 0.4279710650444031, 'eval_bleu': 13.0756, 'eval_gen_len': 4.6902, 'eval_runtime': 7.4956, 'eval_samples_per_second': 39.623, 'eval_steps_per_second': 2.535, 'epoch': 117.0}


 39%|███▉      | 17818/45300 [2:01:02<1:09:16,  6.61it/s] 

{'loss': 0.0863, 'learning_rate': 0.0017872226610473627, 'epoch': 118.0}


                                                         
 39%|███▉      | 17818/45300 [2:01:10<1:09:16,  6.61it/s]

{'eval_loss': 0.42009440064430237, 'eval_bleu': 14.2779, 'eval_gen_len': 4.9158, 'eval_runtime': 7.0579, 'eval_samples_per_second': 42.08, 'eval_steps_per_second': 2.692, 'epoch': 118.0}


 40%|███▉      | 17969/45300 [2:01:34<1:06:44,  6.82it/s] 

{'loss': 0.0824, 'learning_rate': 0.0017774041853936423, 'epoch': 119.0}


                                                         
 40%|███▉      | 17969/45300 [2:01:41<1:06:44,  6.82it/s]

{'eval_loss': 0.41543152928352356, 'eval_bleu': 12.2373, 'eval_gen_len': 5.5118, 'eval_runtime': 7.1372, 'eval_samples_per_second': 41.613, 'eval_steps_per_second': 2.662, 'epoch': 119.0}


 40%|████      | 18120/45300 [2:02:07<1:07:46,  6.68it/s] 

{'loss': 0.0801, 'learning_rate': 0.0017675857097399223, 'epoch': 120.0}


                                                         
 40%|████      | 18120/45300 [2:02:15<1:07:46,  6.68it/s]

{'eval_loss': 0.43615686893463135, 'eval_bleu': 12.967, 'eval_gen_len': 4.468, 'eval_runtime': 7.2573, 'eval_samples_per_second': 40.924, 'eval_steps_per_second': 2.618, 'epoch': 120.0}


 40%|████      | 18271/45300 [2:02:40<1:09:10,  6.51it/s] 

{'loss': 0.0822, 'learning_rate': 0.0017577672340862022, 'epoch': 121.0}


                                                         
 40%|████      | 18271/45300 [2:02:47<1:09:10,  6.51it/s]

{'eval_loss': 0.41361671686172485, 'eval_bleu': 13.6962, 'eval_gen_len': 4.6633, 'eval_runtime': 7.1927, 'eval_samples_per_second': 41.292, 'eval_steps_per_second': 2.642, 'epoch': 121.0}


 41%|████      | 18422/45300 [2:03:13<1:08:29,  6.54it/s] 

{'loss': 0.0819, 'learning_rate': 0.0017479487584324818, 'epoch': 122.0}


                                                         
 41%|████      | 18422/45300 [2:03:20<1:08:29,  6.54it/s]

{'eval_loss': 0.42615410685539246, 'eval_bleu': 11.0561, 'eval_gen_len': 4.936, 'eval_runtime': 7.2714, 'eval_samples_per_second': 40.845, 'eval_steps_per_second': 2.613, 'epoch': 122.0}


 41%|████      | 18573/45300 [2:03:46<1:06:28,  6.70it/s] 

{'loss': 0.0859, 'learning_rate': 0.0017381953057963363, 'epoch': 123.0}


                                                         
 41%|████      | 18573/45300 [2:03:53<1:06:28,  6.70it/s]

{'eval_loss': 0.42157766222953796, 'eval_bleu': 10.9045, 'eval_gen_len': 4.8586, 'eval_runtime': 7.3628, 'eval_samples_per_second': 40.338, 'eval_steps_per_second': 2.581, 'epoch': 123.0}


 41%|████▏     | 18724/45300 [2:04:19<1:09:37,  6.36it/s] 

{'loss': 0.0928, 'learning_rate': 0.0017283768301426159, 'epoch': 124.0}


                                                         
 41%|████▏     | 18724/45300 [2:04:26<1:09:37,  6.36it/s]

{'eval_loss': 0.4561442732810974, 'eval_bleu': 11.1742, 'eval_gen_len': 4.3502, 'eval_runtime': 6.83, 'eval_samples_per_second': 43.485, 'eval_steps_per_second': 2.782, 'epoch': 124.0}


 42%|████▏     | 18875/45300 [2:04:53<1:02:24,  7.06it/s] 

{'loss': 0.0867, 'learning_rate': 0.00171862337750647, 'epoch': 125.0}


                                                         
{'eval_loss': 0.42192941904067993, 'eval_bleu': 12.2023, 'eval_gen_len': 4.8081, 'eval_runtime': 6.5385, 'eval_samples_per_second': 45.423, 'eval_steps_per_second': 2.906, 'epoch': 125.0}


 42%|████▏     | 19026/45300 [2:05:24<1:01:24,  7.13it/s] 

{'loss': 0.0808, 'learning_rate': 0.00170880490185275, 'epoch': 126.0}


                                                         
{'eval_loss': 0.4137322008609772, 'eval_bleu': 13.8495, 'eval_gen_len': 4.9697, 'eval_runtime': 6.4515, 'eval_samples_per_second': 46.036, 'eval_steps_per_second': 2.945, 'epoch': 126.0}


 42%|████▏     | 19177/45300 [2:05:54<1:03:40,  6.84it/s] 

{'loss': 0.0794, 'learning_rate': 0.00169898642619903, 'epoch': 127.0}


                                                         
{'eval_loss': 0.436249703168869, 'eval_bleu': 10.4045, 'eval_gen_len': 5.0741, 'eval_runtime': 6.5548, 'eval_samples_per_second': 45.31, 'eval_steps_per_second': 2.899, 'epoch': 127.0}


 43%|████▎     | 19328/45300 [2:06:26<1:00:55,  7.11it/s] 

{'loss': 0.0807, 'learning_rate': 0.0016891679505453099, 'epoch': 128.0}


                                                         
{'eval_loss': 0.4420241415500641, 'eval_bleu': 11.7673, 'eval_gen_len': 4.9764, 'eval_runtime': 6.8062, 'eval_samples_per_second': 43.637, 'eval_steps_per_second': 2.792, 'epoch': 128.0}


 43%|████▎     | 19479/45300 [2:06:56<1:01:47,  6.97it/s] 

{'loss': 0.0809, 'learning_rate': 0.0016793494748915895, 'epoch': 129.0}


                                                         
{'eval_loss': 0.4349178075790405, 'eval_bleu': 12.2379, 'eval_gen_len': 4.9966, 'eval_runtime': 6.8107, 'eval_samples_per_second': 43.608, 'eval_steps_per_second': 2.79, 'epoch': 129.0}


 43%|████▎     | 19630/45300 [2:07:28<1:04:38,  6.62it/s] 

{'loss': 0.0841, 'learning_rate': 0.0016695309992378695, 'epoch': 130.0}


                                                         
{'eval_loss': 0.43102750182151794, 'eval_bleu': 11.328, 'eval_gen_len': 5.7475, 'eval_runtime': 6.6141, 'eval_samples_per_second': 44.904, 'eval_steps_per_second': 2.873, 'epoch': 130.0}


 44%|████▎     | 19781/45300 [2:07:58<1:05:11,  6.52it/s] 

{'loss': 0.0849, 'learning_rate': 0.0016597125235841496, 'epoch': 131.0}


                                                         
{'eval_loss': 0.4487870931625366, 'eval_bleu': 9.4327, 'eval_gen_len': 4.2862, 'eval_runtime': 6.5203, 'eval_samples_per_second': 45.55, 'eval_steps_per_second': 2.914, 'epoch': 131.0}


 44%|████▍     | 19932/45300 [2:08:28<1:12:42,  5.81it/s] 

{'loss': 0.0787, 'learning_rate': 0.0016498940479304292, 'epoch': 132.0}


                                                         
{'eval_loss': 0.43753525614738464, 'eval_bleu': 13.6072, 'eval_gen_len': 4.9192, 'eval_runtime': 6.7121, 'eval_samples_per_second': 44.249, 'eval_steps_per_second': 2.831, 'epoch': 132.0}


 44%|████▍     | 20083/45300 [2:08:58<1:03:19,  6.64it/s] 

{'loss': 0.0792, 'learning_rate': 0.001640075572276709, 'epoch': 133.0}


                                                         
{'eval_loss': 0.43979066610336304, 'eval_bleu': 12.7978, 'eval_gen_len': 4.7643, 'eval_runtime': 6.52, 'eval_samples_per_second': 45.552, 'eval_steps_per_second': 2.914, 'epoch': 133.0}


 45%|████▍     | 20234/45300 [2:09:30<1:14:59,  5.57it/s] 

{'loss': 0.0756, 'learning_rate': 0.001630257096622989, 'epoch': 134.0}


                                                         
{'eval_loss': 0.433457612991333, 'eval_bleu': 9.8978, 'eval_gen_len': 5.0067, 'eval_runtime': 6.6448, 'eval_samples_per_second': 44.697, 'eval_steps_per_second': 2.859, 'epoch': 134.0}


 45%|████▌     | 20385/45300 [2:10:01<1:01:57,  6.70it/s] 

{'loss': 0.0857, 'learning_rate': 0.0016204386209692687, 'epoch': 135.0}


                                                         
{'eval_loss': 0.442108154296875, 'eval_bleu': 9.111, 'eval_gen_len': 4.9933, 'eval_runtime': 6.6672, 'eval_samples_per_second': 44.546, 'eval_steps_per_second': 2.85, 'epoch': 135.0}


 45%|████▌     | 20536/45300 [2:10:32<1:13:15,  5.63it/s] 

{'loss': 0.0849, 'learning_rate': 0.0016106851683331232, 'epoch': 136.0}


                                                         
{'eval_loss': 0.43018653988838196, 'eval_bleu': 12.7772, 'eval_gen_len': 4.6094, 'eval_runtime': 6.8578, 'eval_samples_per_second': 43.308, 'eval_steps_per_second': 2.771, 'epoch': 136.0}


 46%|████▌     | 20687/45300 [2:11:02<1:02:14,  6.59it/s] 

{'loss': 0.0766, 'learning_rate': 0.0016008666926794028, 'epoch': 137.0}


                                                         
{'eval_loss': 0.43379828333854675, 'eval_bleu': 10.3954, 'eval_gen_len': 4.8047, 'eval_runtime': 6.6974, 'eval_samples_per_second': 44.346, 'eval_steps_per_second': 2.837, 'epoch': 137.0}


 46%|████▌     | 20838/45300 [2:11:33<1:00:47,  6.71it/s] 

{'loss': 0.0748, 'learning_rate': 0.0015910482170256827, 'epoch': 138.0}


                                                         
{'eval_loss': 0.4230984151363373, 'eval_bleu': 10.9809, 'eval_gen_len': 4.8754, 'eval_runtime': 6.9173, 'eval_samples_per_second': 42.936, 'eval_steps_per_second': 2.747, 'epoch': 138.0}


 46%|████▋     | 20989/45300 [2:12:03<1:02:43,  6.46it/s] 

{'loss': 0.0757, 'learning_rate': 0.0015812297413719627, 'epoch': 139.0}


                                                         
{'eval_loss': 0.4327877461910248, 'eval_bleu': 11.176, 'eval_gen_len': 4.8653, 'eval_runtime': 6.7168, 'eval_samples_per_second': 44.217, 'eval_steps_per_second': 2.829, 'epoch': 139.0}


 47%|████▋     | 21140/45300 [2:12:33<1:08:17,  5.90it/s] 

{'loss': 0.0749, 'learning_rate': 0.0015714112657182423, 'epoch': 140.0}


                                                         
{'eval_loss': 0.4315702021121979, 'eval_bleu': 13.1722, 'eval_gen_len': 4.5185, 'eval_runtime': 6.4606, 'eval_samples_per_second': 45.971, 'eval_steps_per_second': 2.941, 'epoch': 140.0}


 47%|████▋     | 21291/45300 [2:13:02<58:34,  6.83it/s]   

{'loss': 0.0738, 'learning_rate': 0.0015615927900645224, 'epoch': 141.0}


                                                       
{'eval_loss': 0.4387146830558777, 'eval_bleu': 13.7096, 'eval_gen_len': 4.9024, 'eval_runtime': 6.5599, 'eval_samples_per_second': 45.275, 'eval_steps_per_second': 2.896, 'epoch': 141.0}


 47%|████▋     | 21442/45300 [2:13:32<53:41,  7.41it/s]   

{'loss': 0.0727, 'learning_rate': 0.0015517743144108022, 'epoch': 142.0}


                                                       
{'eval_loss': 0.45716461539268494, 'eval_bleu': 12.0023, 'eval_gen_len': 4.5623, 'eval_runtime': 6.4879, 'eval_samples_per_second': 45.778, 'eval_steps_per_second': 2.929, 'epoch': 142.0}


 48%|████▊     | 21593/45300 [2:14:02<53:42,  7.36it/s]   

{'loss': 0.0743, 'learning_rate': 0.0015419558387570818, 'epoch': 143.0}


                                                       
{'eval_loss': 0.44120731949806213, 'eval_bleu': 12.5904, 'eval_gen_len': 4.6364, 'eval_runtime': 6.4894, 'eval_samples_per_second': 45.767, 'eval_steps_per_second': 2.928, 'epoch': 143.0}


 48%|████▊     | 21744/45300 [2:14:32<55:15,  7.11it/s]   

{'loss': 0.0767, 'learning_rate': 0.0015321373631033619, 'epoch': 144.0}


                                                       
{'eval_loss': 0.43647298216819763, 'eval_bleu': 8.9891, 'eval_gen_len': 4.9293, 'eval_runtime': 6.5039, 'eval_samples_per_second': 45.665, 'eval_steps_per_second': 2.921, 'epoch': 144.0}


 48%|████▊     | 21895/45300 [2:15:02<1:03:34,  6.14it/s] 

{'loss': 0.0848, 'learning_rate': 0.0015223188874496417, 'epoch': 145.0}


                                                         
{'eval_loss': 0.4413142800331116, 'eval_bleu': 10.1654, 'eval_gen_len': 5.3468, 'eval_runtime': 6.5305, 'eval_samples_per_second': 45.479, 'eval_steps_per_second': 2.909, 'epoch': 145.0}


 49%|████▊     | 22046/45300 [2:15:35<1:12:26,  5.35it/s] 

{'loss': 0.0817, 'learning_rate': 0.0015125004117959215, 'epoch': 146.0}


                                                         
{'eval_loss': 0.4554881453514099, 'eval_bleu': 10.3343, 'eval_gen_len': 4.8249, 'eval_runtime': 6.4369, 'eval_samples_per_second': 46.14, 'eval_steps_per_second': 2.952, 'epoch': 146.0}


 49%|████▉     | 22197/45300 [2:16:05<56:45,  6.78it/s]   

{'loss': 0.0733, 'learning_rate': 0.0015026819361422014, 'epoch': 147.0}


                                                       
{'eval_loss': 0.42119356989860535, 'eval_bleu': 11.5471, 'eval_gen_len': 5.4108, 'eval_runtime': 6.4435, 'eval_samples_per_second': 46.093, 'eval_steps_per_second': 2.949, 'epoch': 147.0}


 49%|████▉     | 22348/45300 [2:16:35<53:32,  7.14it/s]   

{'loss': 0.0687, 'learning_rate': 0.0014928634604884814, 'epoch': 148.0}


                                                       
{'eval_loss': 0.4297952353954315, 'eval_bleu': 12.8099, 'eval_gen_len': 4.9394, 'eval_runtime': 6.4472, 'eval_samples_per_second': 46.067, 'eval_steps_per_second': 2.947, 'epoch': 148.0}


 50%|████▉     | 22499/45300 [2:17:05<55:42,  6.82it/s]   

{'loss': 0.068, 'learning_rate': 0.0014830449848347613, 'epoch': 149.0}


                                                       
{'eval_loss': 0.4394037127494812, 'eval_bleu': 11.548, 'eval_gen_len': 4.8653, 'eval_runtime': 6.7445, 'eval_samples_per_second': 44.036, 'eval_steps_per_second': 2.817, 'epoch': 149.0}


 50%|█████     | 22650/45300 [2:17:35<54:58,  6.87it/s]   

{'loss': 0.0678, 'learning_rate': 0.0014732265091810409, 'epoch': 150.0}


                                                       
{'eval_loss': 0.42669743299484253, 'eval_bleu': 12.076, 'eval_gen_len': 5.2896, 'eval_runtime': 6.3427, 'eval_samples_per_second': 46.825, 'eval_steps_per_second': 2.996, 'epoch': 150.0}


 50%|█████     | 22801/45300 [2:18:04<57:37,  6.51it/s]   

{'loss': 0.069, 'learning_rate': 0.001463408033527321, 'epoch': 151.0}


                                                       
{'eval_loss': 0.42002633213996887, 'eval_bleu': 11.2809, 'eval_gen_len': 5.037, 'eval_runtime': 6.5351, 'eval_samples_per_second': 45.447, 'eval_steps_per_second': 2.907, 'epoch': 151.0}


 51%|█████     | 22952/45300 [2:18:33<51:17,  7.26it/s]   

{'loss': 0.0684, 'learning_rate': 0.0014535895578736008, 'epoch': 152.0}


                                                       
{'eval_loss': 0.43634217977523804, 'eval_bleu': 9.3785, 'eval_gen_len': 5.1111, 'eval_runtime': 6.6499, 'eval_samples_per_second': 44.663, 'eval_steps_per_second': 2.857, 'epoch': 152.0}


 51%|█████     | 23103/45300 [2:19:03<55:14,  6.70it/s]   

{'loss': 0.0694, 'learning_rate': 0.0014437710822198806, 'epoch': 153.0}


                                                       
{'eval_loss': 0.4276743531227112, 'eval_bleu': 13.1783, 'eval_gen_len': 5.1279, 'eval_runtime': 6.6563, 'eval_samples_per_second': 44.62, 'eval_steps_per_second': 2.854, 'epoch': 153.0}


 51%|█████▏    | 23254/45300 [2:19:35<52:50,  6.95it/s]   

{'loss': 0.0698, 'learning_rate': 0.0014339526065661604, 'epoch': 154.0}


                                                       
{'eval_loss': 0.4255842864513397, 'eval_bleu': 12.4168, 'eval_gen_len': 5.0067, 'eval_runtime': 6.3259, 'eval_samples_per_second': 46.95, 'eval_steps_per_second': 3.004, 'epoch': 154.0}


 52%|█████▏    | 23405/45300 [2:20:04<49:58,  7.30it/s]   

{'loss': 0.0718, 'learning_rate': 0.0014241341309124403, 'epoch': 155.0}


                                                       
{'eval_loss': 0.43629634380340576, 'eval_bleu': 9.4698, 'eval_gen_len': 5.4949, 'eval_runtime': 6.7336, 'eval_samples_per_second': 44.107, 'eval_steps_per_second': 2.822, 'epoch': 155.0}


 52%|█████▏    | 23556/45300 [2:20:34<49:07,  7.38it/s]   

{'loss': 0.0744, 'learning_rate': 0.0014143156552587203, 'epoch': 156.0}


                                                       
{'eval_loss': 0.4406436085700989, 'eval_bleu': 11.719, 'eval_gen_len': 4.7104, 'eval_runtime': 6.523, 'eval_samples_per_second': 45.531, 'eval_steps_per_second': 2.913, 'epoch': 156.0}


 52%|█████▏    | 23707/45300 [2:21:03<51:39,  6.97it/s]   

{'loss': 0.0722, 'learning_rate': 0.001404497179605, 'epoch': 157.0}


                                                       
{'eval_loss': 0.4311971068382263, 'eval_bleu': 9.3836, 'eval_gen_len': 5.2424, 'eval_runtime': 6.6798, 'eval_samples_per_second': 44.462, 'eval_steps_per_second': 2.844, 'epoch': 157.0}


 53%|█████▎    | 23858/45300 [2:21:32<51:07,  6.99it/s]   

{'loss': 0.0657, 'learning_rate': 0.00139467870395128, 'epoch': 158.0}


                                                       
{'eval_loss': 0.423944890499115, 'eval_bleu': 13.1098, 'eval_gen_len': 5.0808, 'eval_runtime': 6.7503, 'eval_samples_per_second': 43.998, 'eval_steps_per_second': 2.815, 'epoch': 158.0}


 53%|█████▎    | 24009/45300 [2:22:02<51:25,  6.90it/s]   

{'loss': 0.0645, 'learning_rate': 0.0013848602282975598, 'epoch': 159.0}


                                                       
{'eval_loss': 0.4194636046886444, 'eval_bleu': 13.3912, 'eval_gen_len': 5.0707, 'eval_runtime': 6.7109, 'eval_samples_per_second': 44.257, 'eval_steps_per_second': 2.831, 'epoch': 159.0}


 53%|█████▎    | 24160/45300 [2:22:31<48:15,  7.30it/s]   

{'loss': 0.0641, 'learning_rate': 0.0013750417526438396, 'epoch': 160.0}


                                                       
{'eval_loss': 0.43361184000968933, 'eval_bleu': 14.7841, 'eval_gen_len': 4.7138, 'eval_runtime': 6.7934, 'eval_samples_per_second': 43.719, 'eval_steps_per_second': 2.797, 'epoch': 160.0}


 54%|█████▎    | 24311/45300 [2:23:01<49:37,  7.05it/s]   

{'loss': 0.0666, 'learning_rate': 0.0013652232769901195, 'epoch': 161.0}


                                                       
{'eval_loss': 0.4345463812351227, 'eval_bleu': 13.8483, 'eval_gen_len': 4.596, 'eval_runtime': 6.4355, 'eval_samples_per_second': 46.15, 'eval_steps_per_second': 2.952, 'epoch': 161.0}


 54%|█████▍    | 24462/45300 [2:23:30<47:40,  7.28it/s]   

{'loss': 0.0663, 'learning_rate': 0.0013554048013363993, 'epoch': 162.0}


                                                       
{'eval_loss': 0.4368448555469513, 'eval_bleu': 10.4492, 'eval_gen_len': 5.2121, 'eval_runtime': 6.3628, 'eval_samples_per_second': 46.678, 'eval_steps_per_second': 2.986, 'epoch': 162.0}


 54%|█████▍    | 24613/45300 [2:23:59<54:15,  6.35it/s]   

{'loss': 0.0676, 'learning_rate': 0.0013455863256826791, 'epoch': 163.0}


                                                       
{'eval_loss': 0.44530192017555237, 'eval_bleu': 11.2202, 'eval_gen_len': 4.3569, 'eval_runtime': 6.3596, 'eval_samples_per_second': 46.701, 'eval_steps_per_second': 2.988, 'epoch': 163.0}


 55%|█████▍    | 24764/45300 [2:24:29<58:05,  5.89it/s]   

{'loss': 0.0672, 'learning_rate': 0.0013357678500289592, 'epoch': 164.0}


                                                       
{'eval_loss': 0.44037219882011414, 'eval_bleu': 10.1423, 'eval_gen_len': 4.8451, 'eval_runtime': 6.4686, 'eval_samples_per_second': 45.914, 'eval_steps_per_second': 2.937, 'epoch': 164.0}


 55%|█████▌    | 24915/45300 [2:24:59<47:31,  7.15it/s]   

{'loss': 0.0655, 'learning_rate': 0.0013259493743752388, 'epoch': 165.0}


                                                       
{'eval_loss': 0.4348912239074707, 'eval_bleu': 12.8527, 'eval_gen_len': 5.0404, 'eval_runtime': 6.7448, 'eval_samples_per_second': 44.034, 'eval_steps_per_second': 2.817, 'epoch': 165.0}


 55%|█████▌    | 25066/45300 [2:25:28<47:26,  7.11it/s]   

{'loss': 0.0655, 'learning_rate': 0.0013161308987215189, 'epoch': 166.0}


                                                       
{'eval_loss': 0.4352060854434967, 'eval_bleu': 10.2146, 'eval_gen_len': 4.8182, 'eval_runtime': 6.4744, 'eval_samples_per_second': 45.873, 'eval_steps_per_second': 2.935, 'epoch': 166.0}


 56%|█████▌    | 25217/45300 [2:26:00<52:44,  6.35it/s]   

{'loss': 0.0649, 'learning_rate': 0.0013063124230677987, 'epoch': 167.0}


                                                       
{'eval_loss': 0.4263344705104828, 'eval_bleu': 13.7351, 'eval_gen_len': 5.1313, 'eval_runtime': 6.4487, 'eval_samples_per_second': 46.056, 'eval_steps_per_second': 2.946, 'epoch': 167.0}


 56%|█████▌    | 25368/45300 [2:26:29<45:57,  7.23it/s]   

{'loss': 0.0643, 'learning_rate': 0.0012964939474140785, 'epoch': 168.0}


                                                       
{'eval_loss': 0.4351728558540344, 'eval_bleu': 12.6776, 'eval_gen_len': 5.2054, 'eval_runtime': 6.5519, 'eval_samples_per_second': 45.33, 'eval_steps_per_second': 2.9, 'epoch': 168.0}


 56%|█████▋    | 25519/45300 [2:26:58<47:03,  7.01it/s]   

{'loss': 0.0648, 'learning_rate': 0.0012866754717603584, 'epoch': 169.0}


                                                       
{'eval_loss': 0.43393775820732117, 'eval_bleu': 15.5603, 'eval_gen_len': 5.0, 'eval_runtime': 7.0663, 'eval_samples_per_second': 42.031, 'eval_steps_per_second': 2.689, 'epoch': 169.0}


 57%|█████▋    | 25670/45300 [2:27:28<43:30,  7.52it/s]   

{'loss': 0.0619, 'learning_rate': 0.0012768569961066382, 'epoch': 170.0}


                                                       
{'eval_loss': 0.44794780015945435, 'eval_bleu': 11.246, 'eval_gen_len': 5.0303, 'eval_runtime': 6.4411, 'eval_samples_per_second': 46.11, 'eval_steps_per_second': 2.95, 'epoch': 170.0}


 57%|█████▋    | 25821/45300 [2:27:58<46:25,  6.99it/s]   

{'loss': 0.0624, 'learning_rate': 0.0012670385204529182, 'epoch': 171.0}


                                                       
{'eval_loss': 0.43895766139030457, 'eval_bleu': 16.3791, 'eval_gen_len': 4.5892, 'eval_runtime': 6.4262, 'eval_samples_per_second': 46.217, 'eval_steps_per_second': 2.957, 'epoch': 171.0}


 57%|█████▋    | 25972/45300 [2:28:27<49:16,  6.54it/s]   

{'loss': 0.0603, 'learning_rate': 0.001257220044799198, 'epoch': 172.0}


                                                       
{'eval_loss': 0.42058441042900085, 'eval_bleu': 13.8405, 'eval_gen_len': 5.2525, 'eval_runtime': 6.4456, 'eval_samples_per_second': 46.078, 'eval_steps_per_second': 2.948, 'epoch': 172.0}


 58%|█████▊    | 26123/45300 [2:28:56<44:10,  7.24it/s]   

{'loss': 0.0604, 'learning_rate': 0.0012474015691454777, 'epoch': 173.0}


                                                       
{'eval_loss': 0.452865332365036, 'eval_bleu': 12.1426, 'eval_gen_len': 4.4478, 'eval_runtime': 6.4138, 'eval_samples_per_second': 46.307, 'eval_steps_per_second': 2.962, 'epoch': 173.0}


 58%|█████▊    | 26274/45300 [2:29:27<45:11,  7.02it/s]   

{'loss': 0.0628, 'learning_rate': 0.0012375830934917577, 'epoch': 174.0}


                                                       
{'eval_loss': 0.43393823504447937, 'eval_bleu': 12.1847, 'eval_gen_len': 5.1582, 'eval_runtime': 6.614, 'eval_samples_per_second': 44.905, 'eval_steps_per_second': 2.873, 'epoch': 174.0}


 58%|█████▊    | 26425/45300 [2:29:57<43:46,  7.19it/s]   

{'loss': 0.0606, 'learning_rate': 0.0012277646178380376, 'epoch': 175.0}


                                                       
{'eval_loss': 0.4318387806415558, 'eval_bleu': 14.5699, 'eval_gen_len': 4.963, 'eval_runtime': 6.4452, 'eval_samples_per_second': 46.081, 'eval_steps_per_second': 2.948, 'epoch': 175.0}


 59%|█████▊    | 26576/45300 [2:30:27<48:32,  6.43it/s]   

{'loss': 0.0591, 'learning_rate': 0.0012179461421843174, 'epoch': 176.0}


                                                       
{'eval_loss': 0.44348371028900146, 'eval_bleu': 14.5871, 'eval_gen_len': 4.9798, 'eval_runtime': 6.4945, 'eval_samples_per_second': 45.731, 'eval_steps_per_second': 2.926, 'epoch': 176.0}


 59%|█████▉    | 26727/45300 [2:30:56<40:48,  7.59it/s]   

{'loss': 0.0581, 'learning_rate': 0.0012081926895481717, 'epoch': 177.0}


                                                       
{'eval_loss': 0.41637372970581055, 'eval_bleu': 14.2908, 'eval_gen_len': 5.2323, 'eval_runtime': 6.64, 'eval_samples_per_second': 44.729, 'eval_steps_per_second': 2.861, 'epoch': 177.0}


 59%|█████▉    | 26878/45300 [2:31:26<49:09,  6.25it/s]   

{'loss': 0.0573, 'learning_rate': 0.0011983742138944513, 'epoch': 178.0}


                                                       
{'eval_loss': 0.4348854124546051, 'eval_bleu': 10.7072, 'eval_gen_len': 4.7407, 'eval_runtime': 6.5097, 'eval_samples_per_second': 45.624, 'eval_steps_per_second': 2.919, 'epoch': 178.0}


 60%|█████▉    | 27029/45300 [2:31:55<46:39,  6.53it/s]   

{'loss': 0.0613, 'learning_rate': 0.0011885557382407313, 'epoch': 179.0}


                                                       
{'eval_loss': 0.43369466066360474, 'eval_bleu': 8.067, 'eval_gen_len': 6.0673, 'eval_runtime': 6.5528, 'eval_samples_per_second': 45.324, 'eval_steps_per_second': 2.9, 'epoch': 179.0}


 60%|██████    | 27180/45300 [2:32:24<42:26,  7.12it/s]   

{'loss': 0.0599, 'learning_rate': 0.0011787372625870112, 'epoch': 180.0}


                                                       
{'eval_loss': 0.4319347143173218, 'eval_bleu': 12.8378, 'eval_gen_len': 5.2761, 'eval_runtime': 7.0266, 'eval_samples_per_second': 42.268, 'eval_steps_per_second': 2.704, 'epoch': 180.0}


 60%|██████    | 27331/45300 [2:32:54<42:32,  7.04it/s]   

{'loss': 0.0588, 'learning_rate': 0.001168918786933291, 'epoch': 181.0}


                                                       
{'eval_loss': 0.45759084820747375, 'eval_bleu': 11.6946, 'eval_gen_len': 4.6195, 'eval_runtime': 6.3507, 'eval_samples_per_second': 46.766, 'eval_steps_per_second': 2.992, 'epoch': 181.0}


 61%|██████    | 27482/45300 [2:33:23<45:02,  6.59it/s]   

{'loss': 0.0569, 'learning_rate': 0.0011591003112795708, 'epoch': 182.0}


                                                       
{'eval_loss': 0.46262872219085693, 'eval_bleu': 11.7057, 'eval_gen_len': 4.6094, 'eval_runtime': 6.3214, 'eval_samples_per_second': 46.983, 'eval_steps_per_second': 3.006, 'epoch': 182.0}


 61%|██████    | 27633/45300 [2:33:52<39:38,  7.43it/s]   

{'loss': 0.062, 'learning_rate': 0.0011492818356258507, 'epoch': 183.0}


                                                       
{'eval_loss': 0.4489828944206238, 'eval_bleu': 10.054, 'eval_gen_len': 4.7845, 'eval_runtime': 6.3592, 'eval_samples_per_second': 46.704, 'eval_steps_per_second': 2.988, 'epoch': 183.0}


 61%|██████▏   | 27784/45300 [2:34:23<40:46,  7.16it/s]   

{'loss': 0.0565, 'learning_rate': 0.0011394633599721305, 'epoch': 184.0}


                                                       
{'eval_loss': 0.4342659115791321, 'eval_bleu': 13.9421, 'eval_gen_len': 4.9226, 'eval_runtime': 6.5114, 'eval_samples_per_second': 45.613, 'eval_steps_per_second': 2.918, 'epoch': 184.0}


 62%|██████▏   | 27935/45300 [2:34:53<40:38,  7.12it/s]   

{'loss': 0.0561, 'learning_rate': 0.0011296448843184106, 'epoch': 185.0}


                                                       
{'eval_loss': 0.4345179498195648, 'eval_bleu': 17.12, 'eval_gen_len': 4.9966, 'eval_runtime': 6.3873, 'eval_samples_per_second': 46.499, 'eval_steps_per_second': 2.975, 'epoch': 185.0}


 62%|██████▏   | 28086/45300 [2:35:22<47:01,  6.10it/s]   

{'loss': 0.0548, 'learning_rate': 0.0011198264086646902, 'epoch': 186.0}


                                                       
{'eval_loss': 0.442286878824234, 'eval_bleu': 12.4482, 'eval_gen_len': 5.0438, 'eval_runtime': 6.3897, 'eval_samples_per_second': 46.481, 'eval_steps_per_second': 2.974, 'epoch': 186.0}


 62%|██████▏   | 28237/45300 [2:35:51<43:00,  6.61it/s]   

{'loss': 0.0555, 'learning_rate': 0.0011100079330109702, 'epoch': 187.0}


                                                       
{'eval_loss': 0.4291687309741974, 'eval_bleu': 16.5154, 'eval_gen_len': 5.1717, 'eval_runtime': 6.4154, 'eval_samples_per_second': 46.295, 'eval_steps_per_second': 2.962, 'epoch': 187.0}


 63%|██████▎   | 28388/45300 [2:36:21<41:25,  6.80it/s]   

{'loss': 0.0544, 'learning_rate': 0.00110018945735725, 'epoch': 188.0}


                                                       
{'eval_loss': 0.4505194425582886, 'eval_bleu': 11.848, 'eval_gen_len': 5.1313, 'eval_runtime': 6.5363, 'eval_samples_per_second': 45.438, 'eval_steps_per_second': 2.907, 'epoch': 188.0}


 63%|██████▎   | 28539/45300 [2:36:50<47:57,  5.82it/s]   

{'loss': 0.0593, 'learning_rate': 0.00109037098170353, 'epoch': 189.0}


                                                       
{'eval_loss': 0.4359145164489746, 'eval_bleu': 12.792, 'eval_gen_len': 4.7912, 'eval_runtime': 6.8545, 'eval_samples_per_second': 43.329, 'eval_steps_per_second': 2.772, 'epoch': 189.0}


 63%|██████▎   | 28690/45300 [2:37:20<38:45,  7.14it/s]   

{'loss': 0.0559, 'learning_rate': 0.0010805525060498097, 'epoch': 190.0}


                                                       
{'eval_loss': 0.4345765709877014, 'eval_bleu': 12.9506, 'eval_gen_len': 4.6835, 'eval_runtime': 6.7456, 'eval_samples_per_second': 44.029, 'eval_steps_per_second': 2.817, 'epoch': 190.0}


 64%|██████▎   | 28841/45300 [2:37:50<38:27,  7.13it/s]   

{'loss': 0.0506, 'learning_rate': 0.0010707340303960896, 'epoch': 191.0}


                                                       
{'eval_loss': 0.4381664991378784, 'eval_bleu': 13.6968, 'eval_gen_len': 5.2189, 'eval_runtime': 6.4356, 'eval_samples_per_second': 46.149, 'eval_steps_per_second': 2.952, 'epoch': 191.0}


 64%|██████▍   | 28992/45300 [2:38:19<37:49,  7.19it/s]   

{'loss': 0.0511, 'learning_rate': 0.0010609155547423696, 'epoch': 192.0}


                                                       
{'eval_loss': 0.4365082383155823, 'eval_bleu': 14.1482, 'eval_gen_len': 5.2997, 'eval_runtime': 6.4258, 'eval_samples_per_second': 46.22, 'eval_steps_per_second': 2.957, 'epoch': 192.0}


 64%|██████▍   | 29143/45300 [2:38:48<37:43,  7.14it/s]   

{'loss': 0.0514, 'learning_rate': 0.0010511621021062237, 'epoch': 193.0}


                                                       
{'eval_loss': 0.44008439779281616, 'eval_bleu': 12.3596, 'eval_gen_len': 5.229, 'eval_runtime': 6.4407, 'eval_samples_per_second': 46.113, 'eval_steps_per_second': 2.95, 'epoch': 193.0}


 65%|██████▍   | 29294/45300 [2:39:20<36:23,  7.33it/s]   

{'loss': 0.0528, 'learning_rate': 0.0010413436264525035, 'epoch': 194.0}


                                                       
{'eval_loss': 0.4392881989479065, 'eval_bleu': 13.7163, 'eval_gen_len': 5.1717, 'eval_runtime': 6.5521, 'eval_samples_per_second': 45.329, 'eval_steps_per_second': 2.9, 'epoch': 194.0}


 65%|██████▌   | 29445/45300 [2:39:51<36:19,  7.27it/s]   

{'loss': 0.0504, 'learning_rate': 0.0010315251507987833, 'epoch': 195.0}


                                                       
{'eval_loss': 0.4301779270172119, 'eval_bleu': 12.2447, 'eval_gen_len': 5.6936, 'eval_runtime': 6.4877, 'eval_samples_per_second': 45.779, 'eval_steps_per_second': 2.929, 'epoch': 195.0}


 65%|██████▌   | 29596/45300 [2:40:21<37:53,  6.91it/s]   

{'loss': 0.0506, 'learning_rate': 0.0010217066751450632, 'epoch': 196.0}


                                                       
{'eval_loss': 0.42883220314979553, 'eval_bleu': 13.0498, 'eval_gen_len': 5.4714, 'eval_runtime': 6.4934, 'eval_samples_per_second': 45.739, 'eval_steps_per_second': 2.926, 'epoch': 196.0}


 66%|██████▌   | 29747/45300 [2:40:50<35:24,  7.32it/s]   

{'loss': 0.0538, 'learning_rate': 0.001011888199491343, 'epoch': 197.0}


                                                       
{'eval_loss': 0.4353415369987488, 'eval_bleu': 15.8786, 'eval_gen_len': 4.8249, 'eval_runtime': 6.4811, 'eval_samples_per_second': 45.826, 'eval_steps_per_second': 2.932, 'epoch': 197.0}


 66%|██████▌   | 29898/45300 [2:41:20<36:06,  7.11it/s]   

{'loss': 0.0508, 'learning_rate': 0.001002069723837623, 'epoch': 198.0}


                                                       
{'eval_loss': 0.4456596076488495, 'eval_bleu': 11.5931, 'eval_gen_len': 4.9899, 'eval_runtime': 6.5456, 'eval_samples_per_second': 45.374, 'eval_steps_per_second': 2.903, 'epoch': 198.0}


 66%|██████▋   | 30049/45300 [2:41:50<35:43,  7.11it/s]  

{'loss': 0.0493, 'learning_rate': 0.0009922512481839027, 'epoch': 199.0}


                                                       
 66%|██████▋   | 30049/45300 [2:41:56<35:43,  7.11it/s]

{'eval_loss': 0.4450569450855255, 'eval_bleu': 10.0054, 'eval_gen_len': 4.7407, 'eval_runtime': 6.7361, 'eval_samples_per_second': 44.091, 'eval_steps_per_second': 2.821, 'epoch': 199.0}


 67%|██████▋   | 30200/45300 [3:05:29<55:54,  4.50it/s]      

{'loss': 0.0503, 'learning_rate': 0.0009824327725301827, 'epoch': 200.0}


                                                       
 67%|██████▋   | 30200/45300 [3:05:38<55:54,  4.50it/s]

{'eval_loss': 0.4367793798446655, 'eval_bleu': 12.6697, 'eval_gen_len': 5.3165, 'eval_runtime': 9.4245, 'eval_samples_per_second': 31.514, 'eval_steps_per_second': 2.016, 'epoch': 200.0}


 67%|██████▋   | 30351/45300 [3:06:08<35:44,  6.97it/s]   

{'loss': 0.0506, 'learning_rate': 0.0009726142968764625, 'epoch': 201.0}


                                                       
 67%|██████▋   | 30351/45300 [3:06:16<35:44,  6.97it/s]

{'eval_loss': 0.4483695924282074, 'eval_bleu': 11.8873, 'eval_gen_len': 5.1852, 'eval_runtime': 8.1076, 'eval_samples_per_second': 36.632, 'eval_steps_per_second': 2.343, 'epoch': 201.0}


 67%|██████▋   | 30502/45300 [3:06:44<37:02,  6.66it/s]   

{'loss': 0.0474, 'learning_rate': 0.0009627958212227424, 'epoch': 202.0}


                                                       
 67%|██████▋   | 30502/45300 [3:06:52<37:02,  6.66it/s]

{'eval_loss': 0.43247878551483154, 'eval_bleu': 15.1266, 'eval_gen_len': 5.1044, 'eval_runtime': 8.03, 'eval_samples_per_second': 36.986, 'eval_steps_per_second': 2.366, 'epoch': 202.0}


 68%|██████▊   | 30653/45300 [3:07:21<43:31,  5.61it/s]   

{'loss': 0.0499, 'learning_rate': 0.0009529773455690223, 'epoch': 203.0}


                                                       
 68%|██████▊   | 30653/45300 [3:07:30<43:31,  5.61it/s]

{'eval_loss': 0.4284421503543854, 'eval_bleu': 13.099, 'eval_gen_len': 4.8956, 'eval_runtime': 8.4254, 'eval_samples_per_second': 35.25, 'eval_steps_per_second': 2.255, 'epoch': 203.0}


 68%|██████▊   | 30804/45300 [3:07:59<40:00,  6.04it/s]   

{'loss': 0.049, 'learning_rate': 0.000943158869915302, 'epoch': 204.0}


                                                       
 68%|██████▊   | 30804/45300 [3:08:08<40:00,  6.04it/s]

{'eval_loss': 0.42994073033332825, 'eval_bleu': 11.587, 'eval_gen_len': 5.633, 'eval_runtime': 8.7142, 'eval_samples_per_second': 34.082, 'eval_steps_per_second': 2.18, 'epoch': 204.0}


 68%|██████▊   | 30955/45300 [3:08:34<39:59,  5.98it/s]   

{'loss': 0.0474, 'learning_rate': 0.0009333403942615821, 'epoch': 205.0}


                                                       
 68%|██████▊   | 30955/45300 [3:08:42<39:59,  5.98it/s]

{'eval_loss': 0.4324076175689697, 'eval_bleu': 14.6688, 'eval_gen_len': 4.9663, 'eval_runtime': 7.3174, 'eval_samples_per_second': 40.588, 'eval_steps_per_second': 2.597, 'epoch': 205.0}


 69%|██████▊   | 31106/45300 [3:09:06<42:03,  5.62it/s]   

{'loss': 0.0451, 'learning_rate': 0.0009235219186078618, 'epoch': 206.0}


                                                       
 69%|██████▊   | 31106/45300 [3:09:14<42:03,  5.62it/s]

{'eval_loss': 0.4397449493408203, 'eval_bleu': 13.9221, 'eval_gen_len': 5.0303, 'eval_runtime': 7.6553, 'eval_samples_per_second': 38.797, 'eval_steps_per_second': 2.482, 'epoch': 206.0}


 69%|██████▉   | 31257/45300 [3:09:40<39:00,  6.00it/s]   

{'loss': 0.0464, 'learning_rate': 0.0009137034429541417, 'epoch': 207.0}


                                                       
 69%|██████▉   | 31257/45300 [3:09:47<39:00,  6.00it/s]

{'eval_loss': 0.4279995858669281, 'eval_bleu': 15.4244, 'eval_gen_len': 5.1313, 'eval_runtime': 7.0802, 'eval_samples_per_second': 41.948, 'eval_steps_per_second': 2.684, 'epoch': 207.0}


 69%|██████▉   | 31408/45300 [3:10:13<35:00,  6.61it/s]   

{'loss': 0.0451, 'learning_rate': 0.0009038849673004216, 'epoch': 208.0}


                                                       
 69%|██████▉   | 31408/45300 [3:10:20<35:00,  6.61it/s]

{'eval_loss': 0.4402582049369812, 'eval_bleu': 16.3179, 'eval_gen_len': 4.7912, 'eval_runtime': 6.9881, 'eval_samples_per_second': 42.501, 'eval_steps_per_second': 2.719, 'epoch': 208.0}


 70%|██████▉   | 31559/45300 [3:10:46<40:04,  5.71it/s]  

{'loss': 0.045, 'learning_rate': 0.0008940664916467014, 'epoch': 209.0}


                                                       
 70%|██████▉   | 31559/45300 [3:10:53<40:04,  5.71it/s]

{'eval_loss': 0.44381847977638245, 'eval_bleu': 14.1498, 'eval_gen_len': 5.2088, 'eval_runtime': 7.1937, 'eval_samples_per_second': 41.286, 'eval_steps_per_second': 2.641, 'epoch': 209.0}


 70%|███████   | 31710/45300 [3:11:18<34:00,  6.66it/s]  

{'loss': 0.0471, 'learning_rate': 0.0008842480159929812, 'epoch': 210.0}


                                                       
 70%|███████   | 31710/45300 [3:11:25<34:00,  6.66it/s]

{'eval_loss': 0.4345110356807709, 'eval_bleu': 16.7166, 'eval_gen_len': 5.2795, 'eval_runtime': 6.7428, 'eval_samples_per_second': 44.047, 'eval_steps_per_second': 2.818, 'epoch': 210.0}


 70%|███████   | 31861/45300 [3:11:50<31:55,  7.02it/s]  

{'loss': 0.0457, 'learning_rate': 0.0008744945633568354, 'epoch': 211.0}


                                                       
 70%|███████   | 31861/45300 [3:11:57<31:55,  7.02it/s]

{'eval_loss': 0.4480481445789337, 'eval_bleu': 11.39, 'eval_gen_len': 5.2626, 'eval_runtime': 7.2855, 'eval_samples_per_second': 40.766, 'eval_steps_per_second': 2.608, 'epoch': 211.0}


 71%|███████   | 32012/45300 [3:12:22<32:36,  6.79it/s]  

{'loss': 0.0435, 'learning_rate': 0.0008646760877031153, 'epoch': 212.0}


                                                       
 71%|███████   | 32012/45300 [3:12:29<32:36,  6.79it/s]

{'eval_loss': 0.4414404630661011, 'eval_bleu': 13.9083, 'eval_gen_len': 5.0505, 'eval_runtime': 6.954, 'eval_samples_per_second': 42.709, 'eval_steps_per_second': 2.732, 'epoch': 212.0}


 71%|███████   | 32163/45300 [3:12:54<34:10,  6.41it/s]  

{'loss': 0.0446, 'learning_rate': 0.0008548576120493952, 'epoch': 213.0}


                                                       
 71%|███████   | 32163/45300 [3:13:01<34:10,  6.41it/s]

{'eval_loss': 0.4495558440685272, 'eval_bleu': 13.139, 'eval_gen_len': 5.8316, 'eval_runtime': 7.1645, 'eval_samples_per_second': 41.455, 'eval_steps_per_second': 2.652, 'epoch': 213.0}


 71%|███████▏  | 32314/45300 [3:13:26<36:28,  5.93it/s]  

{'loss': 0.046, 'learning_rate': 0.0008451041594132494, 'epoch': 214.0}


                                                       
 71%|███████▏  | 32314/45300 [3:13:33<36:28,  5.93it/s]

{'eval_loss': 0.431618332862854, 'eval_bleu': 12.9828, 'eval_gen_len': 5.4007, 'eval_runtime': 6.9307, 'eval_samples_per_second': 42.853, 'eval_steps_per_second': 2.741, 'epoch': 214.0}


 72%|███████▏  | 32465/45300 [3:13:59<34:48,  6.15it/s]  

{'loss': 0.0434, 'learning_rate': 0.0008352856837595292, 'epoch': 215.0}


                                                       
 72%|███████▏  | 32465/45300 [3:14:06<34:48,  6.15it/s]

{'eval_loss': 0.43971729278564453, 'eval_bleu': 16.8234, 'eval_gen_len': 5.0, 'eval_runtime': 7.107, 'eval_samples_per_second': 41.79, 'eval_steps_per_second': 2.673, 'epoch': 215.0}


 72%|███████▏  | 32616/45300 [3:14:31<34:21,  6.15it/s]  

{'loss': 0.0444, 'learning_rate': 0.0008254672081058091, 'epoch': 216.0}


                                                       
 72%|███████▏  | 32616/45300 [3:14:38<34:21,  6.15it/s]

{'eval_loss': 0.44295647740364075, 'eval_bleu': 12.6858, 'eval_gen_len': 5.2357, 'eval_runtime': 7.0363, 'eval_samples_per_second': 42.21, 'eval_steps_per_second': 2.7, 'epoch': 216.0}


 72%|███████▏  | 32767/45300 [3:15:03<32:54,  6.35it/s]  

{'loss': 0.0418, 'learning_rate': 0.0008156487324520889, 'epoch': 217.0}


                                                       
 72%|███████▏  | 32767/45300 [3:15:11<32:54,  6.35it/s]

{'eval_loss': 0.4352467954158783, 'eval_bleu': 14.9595, 'eval_gen_len': 5.0539, 'eval_runtime': 7.4837, 'eval_samples_per_second': 39.686, 'eval_steps_per_second': 2.539, 'epoch': 217.0}


 73%|███████▎  | 32918/45300 [3:15:37<39:23,  5.24it/s]  

{'loss': 0.0403, 'learning_rate': 0.0008058302567983687, 'epoch': 218.0}


                                                       
 73%|███████▎  | 32918/45300 [3:15:44<39:23,  5.24it/s]

{'eval_loss': 0.44077134132385254, 'eval_bleu': 17.1645, 'eval_gen_len': 5.4848, 'eval_runtime': 6.8738, 'eval_samples_per_second': 43.207, 'eval_steps_per_second': 2.764, 'epoch': 218.0}


 73%|███████▎  | 33069/45300 [3:16:09<31:09,  6.54it/s]  

{'loss': 0.0409, 'learning_rate': 0.0007960117811446486, 'epoch': 219.0}


                                                       
 73%|███████▎  | 33069/45300 [3:16:16<31:09,  6.54it/s]

{'eval_loss': 0.44004589319229126, 'eval_bleu': 14.4402, 'eval_gen_len': 5.202, 'eval_runtime': 6.9723, 'eval_samples_per_second': 42.597, 'eval_steps_per_second': 2.725, 'epoch': 219.0}


 73%|███████▎  | 33220/45300 [3:16:42<29:29,  6.83it/s]  

{'loss': 0.0448, 'learning_rate': 0.0007862583285085028, 'epoch': 220.0}


                                                       
 73%|███████▎  | 33220/45300 [3:16:49<29:29,  6.83it/s]

{'eval_loss': 0.4315206706523895, 'eval_bleu': 15.1611, 'eval_gen_len': 5.0707, 'eval_runtime': 7.211, 'eval_samples_per_second': 41.187, 'eval_steps_per_second': 2.635, 'epoch': 220.0}


 74%|███████▎  | 33371/45300 [3:17:14<31:04,  6.40it/s]  

{'loss': 0.04, 'learning_rate': 0.0007764398528547825, 'epoch': 221.0}


                                                       
 74%|███████▎  | 33371/45300 [3:17:21<31:04,  6.40it/s]

{'eval_loss': 0.4359614849090576, 'eval_bleu': 16.8979, 'eval_gen_len': 5.3064, 'eval_runtime': 7.0478, 'eval_samples_per_second': 42.141, 'eval_steps_per_second': 2.696, 'epoch': 221.0}


 74%|███████▍  | 33522/45300 [3:17:47<36:14,  5.42it/s]  

{'loss': 0.0397, 'learning_rate': 0.0007666213772010626, 'epoch': 222.0}


                                                       
 74%|███████▍  | 33522/45300 [3:17:54<36:14,  5.42it/s]

{'eval_loss': 0.4356461465358734, 'eval_bleu': 14.6547, 'eval_gen_len': 4.8923, 'eval_runtime': 6.9741, 'eval_samples_per_second': 42.586, 'eval_steps_per_second': 2.724, 'epoch': 222.0}


 74%|███████▍  | 33673/45300 [3:18:19<32:46,  5.91it/s]  

{'loss': 0.0403, 'learning_rate': 0.0007568029015473423, 'epoch': 223.0}


                                                       
 74%|███████▍  | 33673/45300 [3:18:27<32:46,  5.91it/s]

{'eval_loss': 0.4350486397743225, 'eval_bleu': 15.646, 'eval_gen_len': 5.1852, 'eval_runtime': 7.1373, 'eval_samples_per_second': 41.612, 'eval_steps_per_second': 2.662, 'epoch': 223.0}


 75%|███████▍  | 33824/45300 [3:18:52<30:37,  6.25it/s]  

{'loss': 0.0362, 'learning_rate': 0.0007469844258936224, 'epoch': 224.0}


                                                       
 75%|███████▍  | 33824/45300 [3:18:59<30:37,  6.25it/s]

{'eval_loss': 0.4459276795387268, 'eval_bleu': 15.5825, 'eval_gen_len': 4.9461, 'eval_runtime': 6.9611, 'eval_samples_per_second': 42.666, 'eval_steps_per_second': 2.729, 'epoch': 224.0}


 75%|███████▌  | 33975/45300 [3:19:26<30:48,  6.13it/s]  

{'loss': 0.0376, 'learning_rate': 0.0007371659502399021, 'epoch': 225.0}


                                                       
 75%|███████▌  | 33975/45300 [3:19:33<30:48,  6.13it/s]

{'eval_loss': 0.4362315535545349, 'eval_bleu': 14.9502, 'eval_gen_len': 4.9259, 'eval_runtime': 6.966, 'eval_samples_per_second': 42.636, 'eval_steps_per_second': 2.728, 'epoch': 225.0}


 75%|███████▌  | 34126/45300 [3:19:59<29:12,  6.38it/s]  

{'loss': 0.0412, 'learning_rate': 0.000727347474586182, 'epoch': 226.0}


                                                       
 75%|███████▌  | 34126/45300 [3:20:06<29:12,  6.38it/s]

{'eval_loss': 0.45209792256355286, 'eval_bleu': 18.5056, 'eval_gen_len': 4.7407, 'eval_runtime': 6.959, 'eval_samples_per_second': 42.678, 'eval_steps_per_second': 2.73, 'epoch': 226.0}


 76%|███████▌  | 34277/45300 [3:20:32<28:26,  6.46it/s]  

{'loss': 0.0399, 'learning_rate': 0.0007175289989324619, 'epoch': 227.0}


                                                       
 76%|███████▌  | 34277/45300 [3:20:39<28:26,  6.46it/s]

{'eval_loss': 0.4334268867969513, 'eval_bleu': 13.229, 'eval_gen_len': 5.0101, 'eval_runtime': 7.2588, 'eval_samples_per_second': 40.916, 'eval_steps_per_second': 2.618, 'epoch': 227.0}


 76%|███████▌  | 34428/45300 [3:21:05<30:56,  5.86it/s]  

{'loss': 0.0398, 'learning_rate': 0.0007077105232787417, 'epoch': 228.0}


                                                       
 76%|███████▌  | 34428/45300 [3:21:11<30:56,  5.86it/s]

{'eval_loss': 0.4287923574447632, 'eval_bleu': 15.8159, 'eval_gen_len': 4.8148, 'eval_runtime': 6.856, 'eval_samples_per_second': 43.319, 'eval_steps_per_second': 2.771, 'epoch': 228.0}


 76%|███████▋  | 34579/45300 [3:21:36<26:58,  6.62it/s]  

{'loss': 0.0365, 'learning_rate': 0.0006978920476250215, 'epoch': 229.0}


                                                       
 76%|███████▋  | 34579/45300 [3:21:44<26:58,  6.62it/s]

{'eval_loss': 0.44344979524612427, 'eval_bleu': 14.7988, 'eval_gen_len': 4.7643, 'eval_runtime': 7.0342, 'eval_samples_per_second': 42.222, 'eval_steps_per_second': 2.701, 'epoch': 229.0}


 77%|███████▋  | 34730/45300 [3:22:09<26:46,  6.58it/s]  

{'loss': 0.0359, 'learning_rate': 0.0006880735719713015, 'epoch': 230.0}


                                                       
 77%|███████▋  | 34730/45300 [3:22:16<26:46,  6.58it/s]

{'eval_loss': 0.4394179880619049, 'eval_bleu': 13.1259, 'eval_gen_len': 5.101, 'eval_runtime': 6.9262, 'eval_samples_per_second': 42.881, 'eval_steps_per_second': 2.743, 'epoch': 230.0}


 77%|███████▋  | 34881/45300 [3:22:42<27:16,  6.37it/s]  

{'loss': 0.0353, 'learning_rate': 0.0006782550963175813, 'epoch': 231.0}


                                                       
 77%|███████▋  | 34881/45300 [3:22:49<27:16,  6.37it/s]

{'eval_loss': 0.44093087315559387, 'eval_bleu': 13.1515, 'eval_gen_len': 5.0303, 'eval_runtime': 7.1464, 'eval_samples_per_second': 41.56, 'eval_steps_per_second': 2.659, 'epoch': 231.0}


 77%|███████▋  | 35032/45300 [3:23:14<26:53,  6.36it/s]  

{'loss': 0.0368, 'learning_rate': 0.0006684366206638611, 'epoch': 232.0}


                                                       
 77%|███████▋  | 35032/45300 [3:23:21<26:53,  6.36it/s]

{'eval_loss': 0.44256600737571716, 'eval_bleu': 14.722, 'eval_gen_len': 4.9125, 'eval_runtime': 6.9983, 'eval_samples_per_second': 42.439, 'eval_steps_per_second': 2.715, 'epoch': 232.0}


 78%|███████▊  | 35183/45300 [3:23:47<25:38,  6.58it/s]  

{'loss': 0.0349, 'learning_rate': 0.000658618145010141, 'epoch': 233.0}


                                                       
 78%|███████▊  | 35183/45300 [3:23:54<25:38,  6.58it/s]

{'eval_loss': 0.4554038643836975, 'eval_bleu': 13.0274, 'eval_gen_len': 4.9024, 'eval_runtime': 7.0519, 'eval_samples_per_second': 42.116, 'eval_steps_per_second': 2.694, 'epoch': 233.0}


 78%|███████▊  | 35334/45300 [3:24:20<25:56,  6.40it/s]  

{'loss': 0.0349, 'learning_rate': 0.0006487996693564208, 'epoch': 234.0}


                                                       
 78%|███████▊  | 35334/45300 [3:24:27<25:56,  6.40it/s]

{'eval_loss': 0.4289938509464264, 'eval_bleu': 13.3444, 'eval_gen_len': 5.5084, 'eval_runtime': 7.0956, 'eval_samples_per_second': 41.857, 'eval_steps_per_second': 2.678, 'epoch': 234.0}


 78%|███████▊  | 35485/45300 [3:24:53<28:37,  5.71it/s]  

{'loss': 0.0341, 'learning_rate': 0.0006389811937027007, 'epoch': 235.0}


                                                       
 78%|███████▊  | 35485/45300 [3:25:00<28:37,  5.71it/s]

{'eval_loss': 0.436056524515152, 'eval_bleu': 13.8195, 'eval_gen_len': 5.1313, 'eval_runtime': 7.0485, 'eval_samples_per_second': 42.137, 'eval_steps_per_second': 2.696, 'epoch': 235.0}


 79%|███████▊  | 35636/45300 [3:25:26<26:05,  6.17it/s]  

{'loss': 0.0337, 'learning_rate': 0.0006291627180489806, 'epoch': 236.0}


                                                       
 79%|███████▊  | 35636/45300 [3:25:33<26:05,  6.17it/s]

{'eval_loss': 0.43672725558280945, 'eval_bleu': 16.0837, 'eval_gen_len': 5.138, 'eval_runtime': 6.9834, 'eval_samples_per_second': 42.529, 'eval_steps_per_second': 2.721, 'epoch': 236.0}


 79%|███████▉  | 35787/45300 [3:25:58<26:52,  5.90it/s]  

{'loss': 0.0325, 'learning_rate': 0.0006193442423952604, 'epoch': 237.0}


                                                       
 79%|███████▉  | 35787/45300 [3:26:05<26:52,  5.90it/s]

{'eval_loss': 0.43366748094558716, 'eval_bleu': 16.8218, 'eval_gen_len': 4.8047, 'eval_runtime': 7.0084, 'eval_samples_per_second': 42.378, 'eval_steps_per_second': 2.711, 'epoch': 237.0}


 79%|███████▉  | 35938/45300 [3:26:31<23:37,  6.60it/s]  

{'loss': 0.0318, 'learning_rate': 0.0006095257667415402, 'epoch': 238.0}


                                                       
 79%|███████▉  | 35938/45300 [3:26:38<23:37,  6.60it/s]

{'eval_loss': 0.4333776533603668, 'eval_bleu': 14.813, 'eval_gen_len': 4.9259, 'eval_runtime': 7.314, 'eval_samples_per_second': 40.607, 'eval_steps_per_second': 2.598, 'epoch': 238.0}


 80%|███████▉  | 36089/45300 [3:27:04<28:03,  5.47it/s]  

{'loss': 0.0318, 'learning_rate': 0.0005997072910878202, 'epoch': 239.0}


                                                       
 80%|███████▉  | 36089/45300 [3:27:11<28:03,  5.47it/s]

{'eval_loss': 0.4469885528087616, 'eval_bleu': 14.0204, 'eval_gen_len': 4.835, 'eval_runtime': 7.0727, 'eval_samples_per_second': 41.993, 'eval_steps_per_second': 2.686, 'epoch': 239.0}


 80%|████████  | 36240/45300 [3:27:36<22:33,  6.69it/s]  

{'loss': 0.0328, 'learning_rate': 0.0005898888154341, 'epoch': 240.0}


                                                       
 80%|████████  | 36240/45300 [3:27:43<22:33,  6.69it/s]

{'eval_loss': 0.4348984658718109, 'eval_bleu': 14.0032, 'eval_gen_len': 4.7475, 'eval_runtime': 6.9305, 'eval_samples_per_second': 42.854, 'eval_steps_per_second': 2.742, 'epoch': 240.0}


 80%|████████  | 36391/45300 [3:28:08<26:20,  5.64it/s]  

{'loss': 0.0322, 'learning_rate': 0.0005800703397803799, 'epoch': 241.0}


                                                       
 80%|████████  | 36391/45300 [3:28:16<26:20,  5.64it/s]

{'eval_loss': 0.43743762373924255, 'eval_bleu': 17.6128, 'eval_gen_len': 5.1111, 'eval_runtime': 7.0752, 'eval_samples_per_second': 41.978, 'eval_steps_per_second': 2.685, 'epoch': 241.0}


 81%|████████  | 36542/45300 [3:28:41<22:09,  6.59it/s]  

{'loss': 0.0317, 'learning_rate': 0.0005702518641266597, 'epoch': 242.0}


                                                       
 81%|████████  | 36542/45300 [3:28:48<22:09,  6.59it/s]

{'eval_loss': 0.4492499828338623, 'eval_bleu': 16.0363, 'eval_gen_len': 4.9865, 'eval_runtime': 7.2751, 'eval_samples_per_second': 40.824, 'eval_steps_per_second': 2.612, 'epoch': 242.0}


 81%|████████  | 36693/45300 [3:29:15<27:02,  5.30it/s]  

{'loss': 0.0305, 'learning_rate': 0.0005604333884729396, 'epoch': 243.0}


                                                       
 81%|████████  | 36693/45300 [3:29:22<27:02,  5.30it/s]

{'eval_loss': 0.44094201922416687, 'eval_bleu': 13.9294, 'eval_gen_len': 5.2323, 'eval_runtime': 7.2853, 'eval_samples_per_second': 40.767, 'eval_steps_per_second': 2.608, 'epoch': 243.0}


 81%|████████▏ | 36844/45300 [3:29:50<24:58,  5.64it/s]  

{'loss': 0.0301, 'learning_rate': 0.0005506149128192195, 'epoch': 244.0}


                                                       
 81%|████████▏ | 36844/45300 [3:29:57<24:58,  5.64it/s]

{'eval_loss': 0.4427143931388855, 'eval_bleu': 16.544, 'eval_gen_len': 5.1852, 'eval_runtime': 6.9055, 'eval_samples_per_second': 43.009, 'eval_steps_per_second': 2.751, 'epoch': 244.0}


 82%|████████▏ | 36995/45300 [3:30:23<23:02,  6.01it/s]  

{'loss': 0.0303, 'learning_rate': 0.0005407964371654993, 'epoch': 245.0}


                                                       
 82%|████████▏ | 36995/45300 [3:30:30<23:02,  6.01it/s]

{'eval_loss': 0.44392871856689453, 'eval_bleu': 17.8229, 'eval_gen_len': 5.1751, 'eval_runtime': 7.2315, 'eval_samples_per_second': 41.071, 'eval_steps_per_second': 2.627, 'epoch': 245.0}


 82%|████████▏ | 37146/45300 [3:30:55<20:16,  6.70it/s]  

{'loss': 0.0307, 'learning_rate': 0.0005309779615117791, 'epoch': 246.0}


                                                       
 82%|████████▏ | 37146/45300 [3:31:02<20:16,  6.70it/s]

{'eval_loss': 0.4553370177745819, 'eval_bleu': 14.8574, 'eval_gen_len': 5.1717, 'eval_runtime': 7.0482, 'eval_samples_per_second': 42.139, 'eval_steps_per_second': 2.696, 'epoch': 246.0}


 82%|████████▏ | 37297/45300 [3:31:28<22:27,  5.94it/s]  

{'loss': 0.031, 'learning_rate': 0.0005211594858580591, 'epoch': 247.0}


                                                       
 82%|████████▏ | 37297/45300 [3:31:35<22:27,  5.94it/s]

{'eval_loss': 0.4298984408378601, 'eval_bleu': 15.6155, 'eval_gen_len': 5.2424, 'eval_runtime': 7.1645, 'eval_samples_per_second': 41.455, 'eval_steps_per_second': 2.652, 'epoch': 247.0}


 83%|████████▎ | 37448/45300 [3:32:01<20:54,  6.26it/s]  

{'loss': 0.0291, 'learning_rate': 0.0005113410102043389, 'epoch': 248.0}


                                                       
 83%|████████▎ | 37448/45300 [3:32:08<20:54,  6.26it/s]

{'eval_loss': 0.4470827281475067, 'eval_bleu': 15.0865, 'eval_gen_len': 5.1549, 'eval_runtime': 6.8438, 'eval_samples_per_second': 43.397, 'eval_steps_per_second': 2.776, 'epoch': 248.0}


 83%|████████▎ | 37599/45300 [3:32:33<21:22,  6.01it/s]  

{'loss': 0.028, 'learning_rate': 0.0005015225345506188, 'epoch': 249.0}


                                                       
 83%|████████▎ | 37599/45300 [3:32:40<21:22,  6.01it/s]

{'eval_loss': 0.438398152589798, 'eval_bleu': 15.8531, 'eval_gen_len': 5.0909, 'eval_runtime': 7.0772, 'eval_samples_per_second': 41.966, 'eval_steps_per_second': 2.685, 'epoch': 249.0}


 83%|████████▎ | 37750/45300 [3:33:06<19:47,  6.36it/s]  

{'loss': 0.0267, 'learning_rate': 0.0004917040588968986, 'epoch': 250.0}


                                                       
 83%|████████▎ | 37750/45300 [3:33:13<19:47,  6.36it/s]

{'eval_loss': 0.4371975064277649, 'eval_bleu': 16.5963, 'eval_gen_len': 4.8889, 'eval_runtime': 7.0366, 'eval_samples_per_second': 42.208, 'eval_steps_per_second': 2.7, 'epoch': 250.0}


 84%|████████▎ | 37901/45300 [3:33:39<23:01,  5.36it/s]  

{'loss': 0.0276, 'learning_rate': 0.00048188558324317846, 'epoch': 251.0}


                                                       
 84%|████████▎ | 37901/45300 [3:33:46<23:01,  5.36it/s]

{'eval_loss': 0.44213107228279114, 'eval_bleu': 16.946, 'eval_gen_len': 4.8687, 'eval_runtime': 7.0264, 'eval_samples_per_second': 42.269, 'eval_steps_per_second': 2.704, 'epoch': 251.0}


 84%|████████▍ | 38052/45300 [3:34:18<25:00,  4.83it/s]  

{'loss': 0.0263, 'learning_rate': 0.00047206710758945834, 'epoch': 252.0}


                                                       
 84%|████████▍ | 38052/45300 [3:34:27<25:00,  4.83it/s]

{'eval_loss': 0.44021689891815186, 'eval_bleu': 18.0179, 'eval_gen_len': 4.9259, 'eval_runtime': 9.1293, 'eval_samples_per_second': 32.532, 'eval_steps_per_second': 2.081, 'epoch': 252.0}


 84%|████████▍ | 38203/45300 [3:35:00<21:50,  5.41it/s]  

{'loss': 0.0265, 'learning_rate': 0.00046224863193573823, 'epoch': 253.0}


                                                       
 84%|████████▍ | 38203/45300 [3:35:08<21:50,  5.41it/s]

{'eval_loss': 0.4419268071651459, 'eval_bleu': 15.6058, 'eval_gen_len': 4.7104, 'eval_runtime': 8.2913, 'eval_samples_per_second': 35.821, 'eval_steps_per_second': 2.292, 'epoch': 253.0}


 85%|████████▍ | 38354/45300 [3:35:36<18:32,  6.24it/s]  

{'loss': 0.0262, 'learning_rate': 0.000452430156282018, 'epoch': 254.0}


                                                       
 85%|████████▍ | 38354/45300 [3:35:44<18:32,  6.24it/s]

{'eval_loss': 0.4503368139266968, 'eval_bleu': 17.7771, 'eval_gen_len': 4.8586, 'eval_runtime': 8.2066, 'eval_samples_per_second': 36.19, 'eval_steps_per_second': 2.315, 'epoch': 254.0}


 85%|████████▌ | 38505/45300 [3:36:11<18:14,  6.21it/s]  

{'loss': 0.0257, 'learning_rate': 0.0004426116806282979, 'epoch': 255.0}


                                                       
 85%|████████▌ | 38505/45300 [3:36:18<18:14,  6.21it/s]

{'eval_loss': 0.4460180997848511, 'eval_bleu': 17.2515, 'eval_gen_len': 4.8384, 'eval_runtime': 7.3694, 'eval_samples_per_second': 40.302, 'eval_steps_per_second': 2.578, 'epoch': 255.0}


 85%|████████▌ | 38656/45300 [3:36:44<20:36,  5.37it/s]  

{'loss': 0.0259, 'learning_rate': 0.0004327932049745778, 'epoch': 256.0}


                                                       
 85%|████████▌ | 38656/45300 [3:36:54<20:36,  5.37it/s]

{'eval_loss': 0.4377048909664154, 'eval_bleu': 16.6967, 'eval_gen_len': 5.3603, 'eval_runtime': 9.195, 'eval_samples_per_second': 32.3, 'eval_steps_per_second': 2.066, 'epoch': 256.0}


 86%|████████▌ | 38807/45300 [3:37:19<17:14,  6.28it/s]  

{'loss': 0.0251, 'learning_rate': 0.0004229747293208577, 'epoch': 257.0}


                                                       
 86%|████████▌ | 38807/45300 [3:37:26<17:14,  6.28it/s]

{'eval_loss': 0.4352237284183502, 'eval_bleu': 15.3586, 'eval_gen_len': 5.5084, 'eval_runtime': 7.3212, 'eval_samples_per_second': 40.567, 'eval_steps_per_second': 2.595, 'epoch': 257.0}


 86%|████████▌ | 38958/45300 [3:37:55<17:01,  6.21it/s]  

{'loss': 0.0243, 'learning_rate': 0.00041315625366713756, 'epoch': 258.0}


                                                       
 86%|████████▌ | 38958/45300 [3:38:02<17:01,  6.21it/s]

{'eval_loss': 0.4460381269454956, 'eval_bleu': 17.3834, 'eval_gen_len': 4.9933, 'eval_runtime': 6.682, 'eval_samples_per_second': 44.448, 'eval_steps_per_second': 2.843, 'epoch': 258.0}


 86%|████████▋ | 39109/45300 [3:38:37<16:56,  6.09it/s]  

{'loss': 0.0243, 'learning_rate': 0.00040333777801341734, 'epoch': 259.0}


                                                       
 86%|████████▋ | 39109/45300 [3:38:44<16:56,  6.09it/s]

{'eval_loss': 0.4389822483062744, 'eval_bleu': 14.9787, 'eval_gen_len': 5.138, 'eval_runtime': 7.5416, 'eval_samples_per_second': 39.381, 'eval_steps_per_second': 2.519, 'epoch': 259.0}


 87%|████████▋ | 39260/45300 [3:39:12<21:28,  4.69it/s]  

{'loss': 0.0234, 'learning_rate': 0.00039351930235969723, 'epoch': 260.0}


                                                       
 87%|████████▋ | 39260/45300 [3:39:21<21:28,  4.69it/s]

{'eval_loss': 0.44571688771247864, 'eval_bleu': 16.7958, 'eval_gen_len': 5.101, 'eval_runtime': 8.4675, 'eval_samples_per_second': 35.075, 'eval_steps_per_second': 2.244, 'epoch': 260.0}


 87%|████████▋ | 39411/45300 [3:39:47<16:22,  5.99it/s]  

{'loss': 0.0225, 'learning_rate': 0.0003837008267059771, 'epoch': 261.0}


                                                       
 87%|████████▋ | 39411/45300 [3:39:54<16:22,  5.99it/s]

{'eval_loss': 0.43826448917388916, 'eval_bleu': 18.3902, 'eval_gen_len': 4.9562, 'eval_runtime': 7.0452, 'eval_samples_per_second': 42.156, 'eval_steps_per_second': 2.697, 'epoch': 261.0}


 87%|████████▋ | 39562/45300 [3:40:22<16:41,  5.73it/s]  

{'loss': 0.022, 'learning_rate': 0.000373882351052257, 'epoch': 262.0}


                                                       
 87%|████████▋ | 39562/45300 [3:40:29<16:41,  5.73it/s]

{'eval_loss': 0.4673096537590027, 'eval_bleu': 16.0867, 'eval_gen_len': 4.7306, 'eval_runtime': 7.2292, 'eval_samples_per_second': 41.084, 'eval_steps_per_second': 2.628, 'epoch': 262.0}


 88%|████████▊ | 39713/45300 [3:40:55<14:35,  6.38it/s]  

{'loss': 0.0224, 'learning_rate': 0.00036412889841611117, 'epoch': 263.0}


                                                       
 88%|████████▊ | 39713/45300 [3:41:02<14:35,  6.38it/s]

{'eval_loss': 0.4383811950683594, 'eval_bleu': 17.1106, 'eval_gen_len': 5.0404, 'eval_runtime': 7.5246, 'eval_samples_per_second': 39.471, 'eval_steps_per_second': 2.525, 'epoch': 263.0}


 88%|████████▊ | 39864/45300 [3:41:27<14:10,  6.39it/s]  

{'loss': 0.0215, 'learning_rate': 0.000354310422762391, 'epoch': 264.0}


                                                       
 88%|████████▊ | 39864/45300 [3:41:34<14:10,  6.39it/s]

{'eval_loss': 0.4510791003704071, 'eval_bleu': 17.4136, 'eval_gen_len': 4.9933, 'eval_runtime': 7.0915, 'eval_samples_per_second': 41.881, 'eval_steps_per_second': 2.679, 'epoch': 264.0}


 88%|████████▊ | 40015/45300 [3:41:59<13:05,  6.73it/s]  

{'loss': 0.0211, 'learning_rate': 0.00034449194710867083, 'epoch': 265.0}


                                                       
 88%|████████▊ | 40015/45300 [3:42:06<13:05,  6.73it/s]

{'eval_loss': 0.46134090423583984, 'eval_bleu': 16.4622, 'eval_gen_len': 4.8855, 'eval_runtime': 7.0611, 'eval_samples_per_second': 42.061, 'eval_steps_per_second': 2.691, 'epoch': 265.0}


 89%|████████▊ | 40166/45300 [3:42:31<13:58,  6.12it/s]  

{'loss': 0.0206, 'learning_rate': 0.0003346734714549507, 'epoch': 266.0}


                                                       
 89%|████████▊ | 40166/45300 [3:42:37<13:58,  6.12it/s]

{'eval_loss': 0.44701823592185974, 'eval_bleu': 15.1877, 'eval_gen_len': 5.1818, 'eval_runtime': 6.7578, 'eval_samples_per_second': 43.949, 'eval_steps_per_second': 2.812, 'epoch': 266.0}


 89%|████████▉ | 40317/45300 [3:43:02<12:22,  6.71it/s]  

{'loss': 0.0206, 'learning_rate': 0.00032485499580123055, 'epoch': 267.0}


                                                       
 89%|████████▉ | 40317/45300 [3:43:09<12:22,  6.71it/s]

{'eval_loss': 0.4405946433544159, 'eval_bleu': 16.9015, 'eval_gen_len': 5.229, 'eval_runtime': 6.8112, 'eval_samples_per_second': 43.604, 'eval_steps_per_second': 2.79, 'epoch': 267.0}


 89%|████████▉ | 40468/45300 [3:43:34<12:32,  6.42it/s]  

{'loss': 0.0205, 'learning_rate': 0.00031503652014751044, 'epoch': 268.0}


                                                       
 89%|████████▉ | 40468/45300 [3:43:41<12:32,  6.42it/s]

{'eval_loss': 0.4511674642562866, 'eval_bleu': 18.0591, 'eval_gen_len': 4.9158, 'eval_runtime': 6.8573, 'eval_samples_per_second': 43.311, 'eval_steps_per_second': 2.771, 'epoch': 268.0}


 90%|████████▉ | 40619/45300 [3:44:13<15:36,  5.00it/s]  

{'loss': 0.02, 'learning_rate': 0.0003052180444937903, 'epoch': 269.0}


                                                       
 90%|████████▉ | 40619/45300 [3:44:21<15:36,  5.00it/s]

{'eval_loss': 0.451105535030365, 'eval_bleu': 18.734, 'eval_gen_len': 5.0034, 'eval_runtime': 8.9499, 'eval_samples_per_second': 33.185, 'eval_steps_per_second': 2.123, 'epoch': 269.0}


 90%|█████████ | 40770/45300 [3:44:51<14:55,  5.06it/s]  

{'loss': 0.0206, 'learning_rate': 0.0002953995688400701, 'epoch': 270.0}


                                                       
 90%|█████████ | 40770/45300 [3:44:59<14:55,  5.06it/s]

{'eval_loss': 0.4390237629413605, 'eval_bleu': 16.215, 'eval_gen_len': 5.1414, 'eval_runtime': 7.8321, 'eval_samples_per_second': 37.921, 'eval_steps_per_second': 2.426, 'epoch': 270.0}


 90%|█████████ | 40921/45300 [3:45:27<12:00,  6.08it/s]  

{'loss': 0.0195, 'learning_rate': 0.00028558109318635, 'epoch': 271.0}


                                                       
 90%|█████████ | 40921/45300 [3:45:36<12:00,  6.08it/s]

{'eval_loss': 0.4444037675857544, 'eval_bleu': 19.485, 'eval_gen_len': 5.037, 'eval_runtime': 8.6151, 'eval_samples_per_second': 34.474, 'eval_steps_per_second': 2.205, 'epoch': 271.0}


 91%|█████████ | 41072/45300 [3:46:08<12:58,  5.43it/s]  

{'loss': 0.0178, 'learning_rate': 0.00027576261753262983, 'epoch': 272.0}


                                                       
 91%|█████████ | 41072/45300 [3:46:16<12:58,  5.43it/s]

{'eval_loss': 0.46062707901000977, 'eval_bleu': 16.8723, 'eval_gen_len': 4.8956, 'eval_runtime': 7.8836, 'eval_samples_per_second': 37.673, 'eval_steps_per_second': 2.41, 'epoch': 272.0}


 91%|█████████ | 41223/45300 [3:46:45<12:29,  5.44it/s]  

{'loss': 0.0174, 'learning_rate': 0.0002659441418789097, 'epoch': 273.0}


                                                       
 91%|█████████ | 41223/45300 [3:46:53<12:29,  5.44it/s]

{'eval_loss': 0.44175633788108826, 'eval_bleu': 17.6368, 'eval_gen_len': 5.4141, 'eval_runtime': 7.9123, 'eval_samples_per_second': 37.537, 'eval_steps_per_second': 2.401, 'epoch': 273.0}


 91%|█████████▏| 41374/45300 [3:47:24<17:00,  3.85it/s]  

{'loss': 0.0171, 'learning_rate': 0.0002561256662251896, 'epoch': 274.0}


                                                       
 91%|█████████▏| 41374/45300 [3:47:32<17:00,  3.85it/s]

{'eval_loss': 0.4482037127017975, 'eval_bleu': 19.1108, 'eval_gen_len': 5.0909, 'eval_runtime': 8.8813, 'eval_samples_per_second': 33.441, 'eval_steps_per_second': 2.139, 'epoch': 274.0}


 92%|█████████▏| 41525/45300 [3:48:03<13:05,  4.81it/s]  

{'loss': 0.0166, 'learning_rate': 0.00024630719057146944, 'epoch': 275.0}


                                                       
 92%|█████████▏| 41525/45300 [3:48:12<13:05,  4.81it/s]

{'eval_loss': 0.4472634792327881, 'eval_bleu': 16.6683, 'eval_gen_len': 4.9428, 'eval_runtime': 9.0617, 'eval_samples_per_second': 32.775, 'eval_steps_per_second': 2.097, 'epoch': 275.0}


 92%|█████████▏| 41676/45300 [3:48:42<11:02,  5.47it/s]  

{'loss': 0.0167, 'learning_rate': 0.00023648871491774933, 'epoch': 276.0}


                                                       
 92%|█████████▏| 41676/45300 [3:48:50<11:02,  5.47it/s]

{'eval_loss': 0.44834616780281067, 'eval_bleu': 19.9836, 'eval_gen_len': 4.9865, 'eval_runtime': 8.3833, 'eval_samples_per_second': 35.428, 'eval_steps_per_second': 2.266, 'epoch': 276.0}


 92%|█████████▏| 41827/45300 [3:49:22<09:24,  6.15it/s]  

{'loss': 0.0167, 'learning_rate': 0.00022667023926402916, 'epoch': 277.0}


                                                       
 92%|█████████▏| 41827/45300 [3:49:29<09:24,  6.15it/s]

{'eval_loss': 0.45016247034072876, 'eval_bleu': 18.8206, 'eval_gen_len': 4.9562, 'eval_runtime': 7.4531, 'eval_samples_per_second': 39.849, 'eval_steps_per_second': 2.549, 'epoch': 277.0}


 93%|█████████▎| 41978/45300 [3:50:03<09:20,  5.93it/s]  

{'loss': 0.0153, 'learning_rate': 0.00021685176361030905, 'epoch': 278.0}


                                                       
 93%|█████████▎| 41978/45300 [3:50:11<09:20,  5.93it/s]

{'eval_loss': 0.44741031527519226, 'eval_bleu': 15.9409, 'eval_gen_len': 5.0, 'eval_runtime': 7.1761, 'eval_samples_per_second': 41.388, 'eval_steps_per_second': 2.648, 'epoch': 278.0}


 93%|█████████▎| 42129/45300 [3:50:37<08:17,  6.38it/s]  

{'loss': 0.0154, 'learning_rate': 0.00020703328795658888, 'epoch': 279.0}


                                                       
 93%|█████████▎| 42129/45300 [3:50:45<08:17,  6.38it/s]

{'eval_loss': 0.45396658778190613, 'eval_bleu': 17.2076, 'eval_gen_len': 4.9865, 'eval_runtime': 7.0947, 'eval_samples_per_second': 41.862, 'eval_steps_per_second': 2.678, 'epoch': 279.0}


 93%|█████████▎| 42280/45300 [3:51:11<08:31,  5.90it/s]  

{'loss': 0.0154, 'learning_rate': 0.00019721481230286877, 'epoch': 280.0}


                                                       
 93%|█████████▎| 42280/45300 [3:51:18<08:31,  5.90it/s]

{'eval_loss': 0.44747933745384216, 'eval_bleu': 15.8376, 'eval_gen_len': 5.0202, 'eval_runtime': 7.2291, 'eval_samples_per_second': 41.084, 'eval_steps_per_second': 2.628, 'epoch': 280.0}


 94%|█████████▎| 42431/45300 [3:51:46<08:42,  5.49it/s]  

{'loss': 0.0142, 'learning_rate': 0.0001873963366491486, 'epoch': 281.0}


                                                       
 94%|█████████▎| 42431/45300 [3:51:54<08:42,  5.49it/s]

{'eval_loss': 0.4514760375022888, 'eval_bleu': 19.1921, 'eval_gen_len': 4.9697, 'eval_runtime': 7.1813, 'eval_samples_per_second': 41.358, 'eval_steps_per_second': 2.646, 'epoch': 281.0}


 94%|█████████▍| 42582/45300 [3:52:20<07:31,  6.02it/s]  

{'loss': 0.0139, 'learning_rate': 0.0001775778609954285, 'epoch': 282.0}


                                                       
 94%|█████████▍| 42582/45300 [3:52:28<07:31,  6.02it/s]

{'eval_loss': 0.44277647137641907, 'eval_bleu': 20.9229, 'eval_gen_len': 5.0337, 'eval_runtime': 8.0186, 'eval_samples_per_second': 37.039, 'eval_steps_per_second': 2.369, 'epoch': 282.0}


 94%|█████████▍| 42733/45300 [3:53:00<06:37,  6.45it/s]  

{'loss': 0.0139, 'learning_rate': 0.00016775938534170835, 'epoch': 283.0}


                                                       
 94%|█████████▍| 42733/45300 [3:53:07<06:37,  6.45it/s]

{'eval_loss': 0.45893311500549316, 'eval_bleu': 18.3669, 'eval_gen_len': 4.8586, 'eval_runtime': 7.0189, 'eval_samples_per_second': 42.314, 'eval_steps_per_second': 2.707, 'epoch': 283.0}


 95%|█████████▍| 42884/45300 [3:53:38<09:35,  4.19it/s]  

{'loss': 0.0138, 'learning_rate': 0.0001579409096879882, 'epoch': 284.0}


                                                       
 95%|█████████▍| 42884/45300 [3:53:46<09:35,  4.19it/s]

{'eval_loss': 0.4510662853717804, 'eval_bleu': 17.1927, 'eval_gen_len': 5.1111, 'eval_runtime': 8.3936, 'eval_samples_per_second': 35.384, 'eval_steps_per_second': 2.264, 'epoch': 284.0}


 95%|█████████▌| 43035/45300 [3:54:15<06:07,  6.17it/s]  

{'loss': 0.013, 'learning_rate': 0.00014812243403426807, 'epoch': 285.0}


                                                       
 95%|█████████▌| 43035/45300 [3:54:22<06:07,  6.17it/s]

{'eval_loss': 0.4490536153316498, 'eval_bleu': 19.1545, 'eval_gen_len': 5.2424, 'eval_runtime': 7.1709, 'eval_samples_per_second': 41.418, 'eval_steps_per_second': 2.65, 'epoch': 285.0}


 95%|█████████▌| 43186/45300 [3:54:48<06:13,  5.67it/s]  

{'loss': 0.0134, 'learning_rate': 0.00013830395838054793, 'epoch': 286.0}


                                                       
 95%|█████████▌| 43186/45300 [3:54:55<06:13,  5.67it/s]

{'eval_loss': 0.45711520314216614, 'eval_bleu': 20.0681, 'eval_gen_len': 5.0808, 'eval_runtime': 7.071, 'eval_samples_per_second': 42.003, 'eval_steps_per_second': 2.687, 'epoch': 286.0}


 96%|█████████▌| 43337/45300 [3:55:20<05:06,  6.41it/s]  

{'loss': 0.0127, 'learning_rate': 0.00012848548272682776, 'epoch': 287.0}


                                                       
 96%|█████████▌| 43337/45300 [3:55:27<05:06,  6.41it/s]

{'eval_loss': 0.4571076035499573, 'eval_bleu': 20.367, 'eval_gen_len': 5.1751, 'eval_runtime': 7.6132, 'eval_samples_per_second': 39.011, 'eval_steps_per_second': 2.496, 'epoch': 287.0}


 96%|█████████▌| 43488/45300 [3:55:53<05:21,  5.63it/s]  

{'loss': 0.0126, 'learning_rate': 0.00011866700707310764, 'epoch': 288.0}


                                                       
 96%|█████████▌| 43488/45300 [3:56:00<05:21,  5.63it/s]

{'eval_loss': 0.4535294473171234, 'eval_bleu': 20.3978, 'eval_gen_len': 5.3131, 'eval_runtime': 7.2467, 'eval_samples_per_second': 40.984, 'eval_steps_per_second': 2.622, 'epoch': 288.0}


 96%|█████████▋| 43639/45300 [3:56:25<06:15,  4.42it/s]  

{'loss': 0.0123, 'learning_rate': 0.0001088485314193875, 'epoch': 289.0}


                                                       
 96%|█████████▋| 43639/45300 [3:56:34<06:15,  4.42it/s]

{'eval_loss': 0.4616987407207489, 'eval_bleu': 20.4517, 'eval_gen_len': 5.2559, 'eval_runtime': 8.2369, 'eval_samples_per_second': 36.057, 'eval_steps_per_second': 2.307, 'epoch': 289.0}


 97%|█████████▋| 43790/45300 [3:57:03<04:42,  5.35it/s]  

{'loss': 0.012, 'learning_rate': 9.903005576566737e-05, 'epoch': 290.0}


                                                       
 97%|█████████▋| 43790/45300 [3:57:12<04:42,  5.35it/s]

{'eval_loss': 0.4523962140083313, 'eval_bleu': 20.0246, 'eval_gen_len': 5.4411, 'eval_runtime': 9.2008, 'eval_samples_per_second': 32.28, 'eval_steps_per_second': 2.065, 'epoch': 290.0}


 97%|█████████▋| 43941/45300 [3:57:46<03:58,  5.70it/s]  

{'loss': 0.0114, 'learning_rate': 8.927660312952154e-05, 'epoch': 291.0}


                                                       
 97%|█████████▋| 43941/45300 [3:57:54<03:58,  5.70it/s]

{'eval_loss': 0.45781442523002625, 'eval_bleu': 19.6264, 'eval_gen_len': 5.4478, 'eval_runtime': 7.9915, 'eval_samples_per_second': 37.164, 'eval_steps_per_second': 2.378, 'epoch': 291.0}


 97%|█████████▋| 44092/45300 [3:58:23<03:31,  5.72it/s]  

{'loss': 0.011, 'learning_rate': 7.94581274758014e-05, 'epoch': 292.0}


                                                       
 97%|█████████▋| 44092/45300 [3:58:30<03:31,  5.72it/s]

{'eval_loss': 0.4590538442134857, 'eval_bleu': 20.7096, 'eval_gen_len': 4.9663, 'eval_runtime': 6.9893, 'eval_samples_per_second': 42.494, 'eval_steps_per_second': 2.718, 'epoch': 292.0}


 98%|█████████▊| 44243/45300 [3:58:58<02:38,  6.65it/s]

{'loss': 0.0108, 'learning_rate': 6.963965182208124e-05, 'epoch': 293.0}


                                                       
 98%|█████████▊| 44243/45300 [3:59:06<02:38,  6.65it/s]

{'eval_loss': 0.4569489359855652, 'eval_bleu': 21.1886, 'eval_gen_len': 5.1448, 'eval_runtime': 7.6785, 'eval_samples_per_second': 38.679, 'eval_steps_per_second': 2.474, 'epoch': 293.0}


 98%|█████████▊| 44394/45300 [3:59:37<03:31,  4.28it/s]

{'loss': 0.0105, 'learning_rate': 5.982117616836112e-05, 'epoch': 294.0}


                                                       
 98%|█████████▊| 44394/45300 [3:59:46<03:31,  4.28it/s]

{'eval_loss': 0.45751649141311646, 'eval_bleu': 20.8517, 'eval_gen_len': 5.0707, 'eval_runtime': 8.5781, 'eval_samples_per_second': 34.623, 'eval_steps_per_second': 2.215, 'epoch': 294.0}


 98%|█████████▊| 44545/45300 [4:00:14<02:04,  6.05it/s]

{'loss': 0.0106, 'learning_rate': 5.0067723532215275e-05, 'epoch': 295.0}


                                                       
 98%|█████████▊| 44545/45300 [4:00:23<02:04,  6.05it/s]

{'eval_loss': 0.4673160910606384, 'eval_bleu': 19.3657, 'eval_gen_len': 5.0067, 'eval_runtime': 8.312, 'eval_samples_per_second': 35.731, 'eval_steps_per_second': 2.286, 'epoch': 295.0}


 99%|█████████▊| 44696/45300 [4:00:53<01:31,  6.63it/s]

{'loss': 0.0103, 'learning_rate': 4.0314270896069446e-05, 'epoch': 296.0}


                                                       
 99%|█████████▊| 44696/45300 [4:01:00<01:31,  6.63it/s]

{'eval_loss': 0.47035592794418335, 'eval_bleu': 20.3329, 'eval_gen_len': 4.9697, 'eval_runtime': 6.8941, 'eval_samples_per_second': 43.081, 'eval_steps_per_second': 2.756, 'epoch': 296.0}


 99%|█████████▉| 44847/45300 [4:01:27<01:21,  5.58it/s]

{'loss': 0.0095, 'learning_rate': 3.0495795242349306e-05, 'epoch': 297.0}


                                                       
 99%|█████████▉| 44847/45300 [4:01:36<01:21,  5.58it/s]

{'eval_loss': 0.46865126490592957, 'eval_bleu': 18.9917, 'eval_gen_len': 4.9529, 'eval_runtime': 8.9516, 'eval_samples_per_second': 33.178, 'eval_steps_per_second': 2.123, 'epoch': 297.0}


 99%|█████████▉| 44998/45300 [4:02:09<01:12,  4.17it/s]

{'loss': 0.0096, 'learning_rate': 2.067731958862917e-05, 'epoch': 298.0}


                                                       
 99%|█████████▉| 44998/45300 [4:02:20<01:12,  4.17it/s]

{'eval_loss': 0.4684959352016449, 'eval_bleu': 20.3238, 'eval_gen_len': 4.9966, 'eval_runtime': 11.5211, 'eval_samples_per_second': 25.779, 'eval_steps_per_second': 1.649, 'epoch': 298.0}


100%|█████████▉| 45149/45300 [4:02:56<00:30,  4.93it/s]

{'loss': 0.0095, 'learning_rate': 1.0858843934909029e-05, 'epoch': 299.0}


                                                       
100%|█████████▉| 45149/45300 [4:03:07<00:30,  4.93it/s]

{'eval_loss': 0.4690772593021393, 'eval_bleu': 20.2729, 'eval_gen_len': 5.0034, 'eval_runtime': 10.8602, 'eval_samples_per_second': 27.347, 'eval_steps_per_second': 1.75, 'epoch': 299.0}


100%|██████████| 45300/45300 [4:03:42<00:00,  3.28it/s]

{'loss': 0.0093, 'learning_rate': 1.0403682811888888e-06, 'epoch': 300.0}


                                                       
100%|██████████| 45300/45300 [4:03:54<00:00,  3.28it/s]

{'eval_loss': 0.46803587675094604, 'eval_bleu': 20.2492, 'eval_gen_len': 5.0236, 'eval_runtime': 11.7114, 'eval_samples_per_second': 25.36, 'eval_steps_per_second': 1.622, 'epoch': 300.0}


100%|██████████| 45300/45300 [4:03:55<00:00,  3.28it/s]

{'train_runtime': 14640.3478, 'train_samples_per_second': 49.241, 'train_steps_per_second': 3.094, 'train_loss': 0.08464616105782802, 'epoch': 300.0}


100%|██████████| 45300/45300 [4:03:55<00:00,  3.10it/s]


TrainOutput(global_step=45300, training_loss=0.08464616105782802, metrics={'train_runtime': 14640.3478, 'train_samples_per_second': 49.241, 'train_steps_per_second': 3.094, 'train_loss': 0.08464616105782802, 'epoch': 300.0})
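
In the records above, the last epoch is not the best one: `eval_bleu` reaches 21.1886 at epoch 293 and ends at 20.2492 at epoch 300, while `eval_loss` drifts upward. The sketch below shows one way to find the best evaluation record. It assumes the records are available as a list of dictionaries shaped like the logged lines, which is what `trainer.state.log_history` holds in a live session. The helper name `best_eval_record` is hypothetical, and only a few records are copied from the log for illustration.

```python
# A few records copied from the training log above (illustrative subset only).
log_history = [
    {'eval_loss': 0.44277647137641907, 'eval_bleu': 20.9229, 'epoch': 282.0},
    {'loss': 0.0108, 'learning_rate': 6.963965182208124e-05, 'epoch': 293.0},
    {'eval_loss': 0.4569489359855652, 'eval_bleu': 21.1886, 'epoch': 293.0},
    {'eval_loss': 0.46803587675094604, 'eval_bleu': 20.2492, 'epoch': 300.0},
]


def best_eval_record(records, metric='eval_bleu'):
    """Return the record with the highest value of `metric` (hypothetical helper)."""
    # training-step records carry 'loss' but not the evaluation metric, so skip them
    eval_records = [r for r in records if metric in r]
    if not eval_records:
        raise ValueError(f"no record contains {metric!r}")
    return max(eval_records, key=lambda r: r[metric])


best = best_eval_record(log_history)
print(best['epoch'], best['eval_bleu'])
```

Earlier epochs are not shown here, so the full history may contain a higher score than the one found in this subset.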

In [31]:
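# let us save the best model
# write the trainer state (log history, step counters) to the output directory
trainer.save_state()
# write the model weights and configuration
trainer.save_model('data/checkpoints/t5_results_fw_v2_2/')
# note: _save is a private Trainer method and may change between versions
trainer._save('data/checkpoints/')

# save the optimizer state so that training can be resumed later
with open('data/checkpoints/t5_results_fw_v2_2/optimizer.pt', 'wb') as f:
    
    torch.save(trainer.optimizer.state_dict(), f)
    
# save the learning rate scheduler state as well
with open('data/checkpoints/t5_results_fw_v2_2/scheduler.pt', 'wb') as f:
    
    torch.save(trainer.lr_scheduler.state_dict(), f)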


In [32]:
# let us get the best model
model = AutoModelForSeq2SeqLM.from_pretrained('data/checkpoints/t5_results_fw_v2_2/final_checkpoint/')

### Predictions

Let us generate translations for the test sentences and store them in a DataFrame.
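
The inference cell in this section runs with a batch size of one and adds each sentence's metrics to a running total in `scores`, so the totals have to be divided by the number of test sentences to obtain means. The sketch below shows that step. The helper name `average_scores`, the metric keys and the totals are hypothetical; only the count of 297 sentences comes from the progress bar of the run.

```python
def average_scores(summed_scores, n_sentences, ndigits=4):
    """Divide each summed metric by the number of sentences (hypothetical helper)."""
    if n_sentences <= 0:
        raise ValueError("n_sentences must be positive")
    return {k: round(v / n_sentences, ndigits) for k, v in summed_scores.items()}


# hypothetical totals after a pass over the 297 test sentences
summed = {'bleu': 5940.0, 'gen_len': 1485.0}

means = average_scores(summed, 297)
print(means)
```

A mean of sentence-level BLEU scores is not the same quantity as a corpus-level BLEU computed over all sentences at once, so the two should not be compared directly.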

In [33]:

# set the model to eval mode
_ = model.eval()

# run model inference on all test data
original_translations, predicted_translations, original_texts, scores = [], [], [], {}

# retrieve the pad token id once, before the loop
pad_token_id = tokenizer.pad_token_id

for data, attention_mask, labels in tqdm(DataLoader(test_dataset)):
    
    # decode the source sentence and its reference translation
    original_text = tokenizer.decode(data[0], skip_special_tokens=True)
    
    original_translation = tokenizer.decode(labels[0], skip_special_tokens=True)
    
    # copy the input tensors (clone().detach() avoids the UserWarning raised by torch.tensor(tensor))
    generated = data.clone().detach()
    
    attention_mask = attention_mask.clone().detach()
    
    # perform prediction with greedy decoding: do_sample = False, so the sampling
    # arguments (top_k, top_p, temperature) and num_return_sequences are left out
    predictions = model.generate(generated, do_sample = False, max_length = test_dataset.max_len,
                                    attention_mask = attention_mask, pad_token_id = pad_token_id)
    
    # compute the metrics of this sentence and add them to the running totals
    result = evaluation.compute_metrics((predictions, labels.clone().detach()))
    
    if not scores: scores.update({k: v for k, v in result.items()})
    
    else: scores.update({k: round(scores[k] + v, 4) for k, v in result.items()})
    
    # decode the predicted tokens into texts
    predicted_translation = list(test_dataset.decode(predictions))
    
    print(predicted_translation[0])
    
    # append results
    original_translations.append(original_translation)
    
    predicted_translations.extend(predicted_translation)
    
    original_texts.append(original_text)

# transform the results into a data frame
df_ft_to_wf = pd.DataFrame({'original_text': original_texts,
                            'original_label': original_translations,
                            'predicted_label': predicted_translations})

# show the first rows
df_ft_to_wf.head()

  0%|          | 1/297 [00:01<09:29,  1.93s/it]

Seetil


  1%|          | 2/297 [00:02<06:47,  1.38s/it]

dog


  1%|          | 3/297 [00:03<05:35,  1.14s/it]

Jambaar du bare wax.


  1%|▏         | 4/297 [00:04<05:21,  1.10s/it]

Su góor gi dee ñëw


  2%|▏         | 5/297 [00:05<05:00,  1.03s/it]

bokk


  2%|▏         | 6/297 [00:06<05:06,  1.05s/it]

Mbaa...


  2%|▏         | 7/297 [00:07<04:59,  1.03s/it]

ba


  3%|▎         | 8/297 [00:08<04:43,  1.02it/s]

Ndaw senn réerul.


  3%|▎         | 9/297 [00:09<04:43,  1.02it/s]

moyaal


  3%|▎         | 10/297 [00:10<04:34,  1.05it/s]

Nit ki dóor na nag wi ak bant.


  4%|▎         | 11/297 [00:13<07:36,  1.60s/it]

Maa dem


  4%|▍         | 12/297 [00:14<06:40,  1.40s/it]

Ma õgee doon dem


  4%|▍         | 13/297 [00:15<05:54,  1.25s/it]

Waaye man sax dama dul ñëw


  5%|▍         | 14/297 [00:16<05:39,  1.20s/it]

ëndi


  5%|▌         | 15/297 [00:17<05:11,  1.10s/it]

ca


  5%|▌         | 16/297 [00:18<04:54,  1.05s/it]

mpar


  6%|▌         | 17/297 [00:19<04:48,  1.03s/it]

sabar


  6%|▌         | 18/297 [00:20<04:40,  1.01s/it]

Kookule la?


  6%|▋         | 19/297 [00:21<04:44,  1.02s/it]

maggat


  7%|▋         | 20/297 [00:22<04:38,  1.01s/it]

segg


  7%|▋         | 21/297 [00:23<04:32,  1.01it/s]

Góor gi nit la.


  7%|▋         | 22/297 [00:26<07:26,  1.62s/it]

Benn bi rëcc laay seet.


  8%|▊         | 23/297 [00:27<06:49,  1.50s/it]

sëgg


  8%|▊         | 24/297 [00:28<06:20,  1.39s/it]

maggat


  8%|▊         | 25/297 [00:29<05:49,  1.28s/it]

ag it


  9%|▉         | 26/297 [00:30<05:20,  1.18s/it]

lemu


  9%|▉         | 27/297 [00:31<05:12,  1.16s/it]

siifal


  9%|▉         | 28/297 [00:32<05:10,  1.15s/it]

xer


 10%|▉         | 29/297 [00:33<04:52,  1.09s/it]

Doo dem, xanaa?


 10%|█         | 30/297 [00:34<04:42,  1.06s/it]

fatu


 10%|█         | 31/297 [00:36<05:03,  1.14s/it]

Yéen demuloo fenn


 11%|█         | 32/297 [00:37<05:32,  1.25s/it]

Góor gi moo ñëw na.


 11%|█         | 33/297 [00:38<05:05,  1.16s/it]

wet


 11%|█▏        | 34/297 [00:39<04:45,  1.08s/it]

yi


 12%|█▏        | 35/297 [00:40<04:28,  1.02s/it]

Góor gi dem ba xale ba?


 12%|█▏        | 36/297 [00:41<04:21,  1.00s/it]

Bi mu demee


 12%|█▏        | 37/297 [00:42<04:23,  1.01s/it]

wax woon


 13%|█▎        | 38/297 [00:43<04:15,  1.01it/s]

Moom daal, léegi addina dafa soppeeku.


 13%|█▎        | 39/297 [00:44<04:25,  1.03s/it]

Ba mu demee


 13%|█▎        | 40/297 [00:45<04:20,  1.01s/it]

pas


 14%|█▍        | 41/297 [00:46<04:22,  1.02s/it]

wàñ


 14%|█▍        | 42/297 [00:49<06:38,  1.56s/it]

Bu ñu dem


 14%|█▍        | 43/297 [00:50<05:47,  1.37s/it]

Kooku dem na


 15%|█▍        | 44/297 [00:51<05:23,  1.28s/it]

Nit ag gaynde duñu dëkkóo.


 15%|█▌        | 45/297 [00:52<05:03,  1.20s/it]

Te ci bir keneen ki ci buntu kër gi.


 15%|█▌        | 46/297 [00:53<04:40,  1.12s/it]

Mi õgi noonu rekk


 16%|█▌        | 47/297 [00:54<04:30,  1.08s/it]

seetu


 16%|█▌        | 48/297 [00:55<04:13,  1.02s/it]

Dem nañu...


 16%|█▋        | 49/297 [00:56<04:05,  1.01it/s]

foofa


 17%|█▋        | 50/297 [00:57<04:15,  1.03s/it]

magg


 17%|█▋        | 51/297 [00:58<04:05,  1.00it/s]

araféef raññee


 18%|█▊        | 52/297 [00:59<04:07,  1.01s/it]

waxale


 18%|█▊        | 53/297 [01:01<05:18,  1.30s/it]

jéex


 18%|█▊        | 54/297 [01:02<05:46,  1.42s/it]

mépp


 19%|█▊        | 55/297 [01:03<05:13,  1.30s/it]

ndax kenn bañ génn


 19%|█▉        | 56/297 [01:04<05:01,  1.25s/it]

looluu


 19%|█▉        | 57/297 [01:06<04:48,  1.20s/it]

nit


 20%|█▉        | 58/297 [01:07<04:38,  1.17s/it]

May na ka keneen ku sawar.


 20%|█▉        | 59/297 [01:08<04:18,  1.09s/it]

mee


 20%|██        | 60/297 [01:08<04:03,  1.03s/it]

cëriñ


 21%|██        | 61/297 [01:09<04:02,  1.03s/it]

Yaw daawuloo coow.


 21%|██        | 62/297 [01:10<03:59,  1.02s/it]

Góor gee ni mi õgi fi, soo demee


 21%|██        | 63/297 [01:12<04:04,  1.04s/it]

jaambur


 22%|██▏       | 64/297 [01:13<04:13,  1.09s/it]

Bi mu dee dem


 22%|██▏       | 65/297 [01:14<03:59,  1.03s/it]

Maa õgi, te dem õga, te dem na.


 22%|██▏       | 66/297 [01:15<03:54,  1.01s/it]

Dem na ca subë.


 23%|██▎       | 67/297 [01:16<03:49,  1.00it/s]

sawwu


 23%|██▎       | 68/297 [01:17<04:03,  1.06s/it]

Gis na keneen ki woon.


 23%|██▎       | 69/297 [01:18<03:53,  1.02s/it]

toogukaay


 24%|██▎       | 70/297 [01:19<03:45,  1.01it/s]

faj


 24%|██▍       | 71/297 [01:20<03:45,  1.00it/s]

Góor gi di dem


 24%|██▍       | 72/297 [01:21<03:39,  1.03it/s]

Mu doon Lebu Yoff.


 25%|██▍       | 73/297 [01:22<03:53,  1.04s/it]

yooyu


 25%|██▍       | 74/297 [01:23<04:04,  1.10s/it]

wàñ


 25%|██▌       | 75/297 [01:24<03:54,  1.06s/it]

Maa demulkoon


 26%|██▌       | 76/297 [01:25<03:50,  1.04s/it]

Xale ya mbër lañu fa woon.


 26%|██▌       | 77/297 [01:26<03:58,  1.09s/it]

waxé


 26%|██▋       | 78/297 [01:27<03:49,  1.05s/it]

nit ku góor


 27%|██▋       | 79/297 [01:28<03:35,  1.01it/s]

Fu mu demoon?


 27%|██▋       | 80/297 [01:29<03:29,  1.04it/s]

Xale yi set nañu ci biir, tày ci biti itam.


 27%|██▋       | 81/297 [01:30<03:23,  1.06it/s]

Kooka ak kile, bokkuñu


 28%|██▊       | 82/297 [01:31<03:30,  1.02it/s]

Su góor gi dee ñëw


 28%|██▊       | 83/297 [01:32<03:53,  1.09s/it]

laaw


 28%|██▊       | 84/297 [01:33<03:44,  1.05s/it]

keneen ku jigéen


 29%|██▊       | 85/297 [01:36<05:26,  1.54s/it]

Kooku ci biir.


 29%|██▉       | 86/297 [01:37<05:00,  1.42s/it]

dendandoo


 29%|██▉       | 87/297 [01:38<04:28,  1.28s/it]

wextan


 30%|██▉       | 88/297 [01:39<04:10,  1.20s/it]

Góor gi demoon na


 30%|██▉       | 89/297 [01:40<03:52,  1.12s/it]

Su demoon


 30%|███       | 90/297 [01:41<03:45,  1.09s/it]

te


 31%|███       | 91/297 [01:42<03:44,  1.09s/it]

Yooyuu


 31%|███       | 92/297 [01:43<03:38,  1.06s/it]

des


 31%|███▏      | 93/297 [01:44<03:31,  1.03s/it]

maggat


 32%|███▏      | 94/297 [01:45<03:22,  1.00it/s]

Maa õgi ci bir kär gi


 32%|███▏      | 95/297 [01:46<03:48,  1.13s/it]

Deõk naa jigéen ju.


 32%|███▏      | 96/297 [01:48<03:52,  1.15s/it]

Jambaar du bare wax.


 33%|███▎      | 97/297 [01:49<03:45,  1.13s/it]

Seetal sarax su.


 33%|███▎      | 98/297 [01:49<03:30,  1.06s/it]

om


 33%|███▎      | 99/297 [01:50<03:22,  1.02s/it]

Foo jëm?


 34%|███▎      | 100/297 [01:52<03:32,  1.08s/it]

Bëgg naa ci juróomi waxtu, ci subë, õga ñëw


 34%|███▍      | 101/297 [01:53<03:31,  1.08s/it]

jaambur-jaambur


 34%|███▍      | 102/297 [01:54<03:26,  1.06s/it]

Gisoon na, moom, Musâ


 35%|███▍      | 103/297 [01:55<03:18,  1.02s/it]

araf raññee


 35%|███▌      | 104/297 [01:56<03:13,  1.00s/it]

Dem na.


 35%|███▌      | 105/297 [01:57<03:16,  1.02s/it]

Su góor gi dee ñëw


 36%|███▌      | 106/297 [02:00<05:06,  1.60s/it]

cemp


 36%|███▌      | 107/297 [02:01<04:44,  1.50s/it]

Bi õga demee la.


 36%|███▋      | 108/297 [02:02<04:23,  1.40s/it]

Wall wuu wépp.


 37%|███▋      | 109/297 [02:03<04:07,  1.32s/it]

ndax


 37%|███▋      | 110/297 [02:04<03:54,  1.26s/it]

araféef


 37%|███▋      | 111/297 [02:05<03:39,  1.18s/it]

Na góor gi dem


 38%|███▊      | 112/297 [02:07<03:44,  1.21s/it]

Daan na di génn.


 38%|███▊      | 113/297 [02:08<03:49,  1.25s/it]

Dama bëgg dem


 38%|███▊      | 114/297 [02:10<04:47,  1.57s/it]

Séen naa ay ndaw.


 39%|███▊      | 115/297 [02:11<04:21,  1.44s/it]

leneen leneen


 39%|███▉      | 116/297 [02:12<03:54,  1.30s/it]

Soo dee góor


 39%|███▉      | 117/297 [02:13<03:41,  1.23s/it]

góox


 40%|███▉      | 118/297 [02:15<03:33,  1.19s/it]

Ci foofu õga taxaw.


 40%|████      | 119/297 [02:16<03:22,  1.14s/it]

Gis naa jeeg bi.


 40%|████      | 120/297 [02:17<03:21,  1.14s/it]

Dem nañu...


 41%|████      | 121/297 [02:18<03:08,  1.07s/it]

su


 41%|████      | 122/297 [02:18<02:58,  1.02s/it]

xamadi


 41%|████▏     | 123/297 [02:20<03:04,  1.06s/it]

Wax jéppu mu Lëf lan?


 42%|████▏     | 124/297 [02:22<03:56,  1.37s/it]

xaj


 42%|████▏     | 125/297 [02:23<04:12,  1.47s/it]

Dem na góor gi.


 42%|████▏     | 126/297 [02:24<03:43,  1.31s/it]

Góor gee ni kookule la, soo demee


 43%|████▎     | 127/297 [02:25<03:27,  1.22s/it]

Séen naa ay yëf.


 43%|████▎     | 128/297 [02:26<03:20,  1.18s/it]

Nit la ci


 43%|████▎     | 129/297 [02:28<03:13,  1.15s/it]

ndax


 44%|████▍     | 130/297 [02:28<02:57,  1.06s/it]

dindiku


 44%|████▍     | 131/297 [02:29<02:49,  1.02s/it]

nëbbante


 44%|████▍     | 132/297 [02:30<02:50,  1.03s/it]

Naan?


 45%|████▍     | 133/297 [02:32<02:53,  1.06s/it]

Wool kee dul dem


 45%|████▌     | 134/297 [02:33<02:52,  1.06s/it]

waxkat


 45%|████▌     | 135/297 [02:34<03:25,  1.27s/it]

coppeekuwaay


 46%|████▌     | 136/297 [02:36<03:52,  1.45s/it]

fatu


 46%|████▌     | 137/297 [02:37<03:33,  1.33s/it]

Noona Su dem Noon dem Noon Noon Noon Noon Noon dem dem dem dem dem Noon Noon Noon Noon Noon dem dem dem dem dem Noon Noon Noon Noon Noon dem dem dem dem dem Noon Noon Noon Noon Noon Noon dem dem dem dem dem Noon Noon Noon


 46%|████▋     | 138/297 [02:38<03:13,  1.22s/it]

yan


 47%|████▋     | 139/297 [02:39<02:56,  1.12s/it]

jëmentalukaay


 47%|████▋     | 140/297 [02:40<02:45,  1.06s/it]

Góor gi may na nit dara.


 47%|████▋     | 141/297 [02:41<02:46,  1.07s/it]

wàcc


 48%|████▊     | 142/297 [02:43<03:01,  1.17s/it]

xasaw


 48%|████▊     | 143/297 [02:44<02:56,  1.15s/it]

Ma õgii dem.


 48%|████▊     | 144/297 [02:45<02:48,  1.10s/it]

Wooyil Musaa moom mi di dem


 49%|████▉     | 145/297 [02:46<02:41,  1.06s/it]

Gaynde gee lekk moomu mépp


 49%|████▉     | 146/297 [02:47<03:14,  1.29s/it]

Faatim la, mu ni.


 49%|████▉     | 147/297 [02:48<03:04,  1.23s/it]

Góor gi nee xale yi demkoon nañu fa.


 50%|████▉     | 148/297 [02:49<02:50,  1.14s/it]

Kooy waxal?


 50%|█████     | 149/297 [02:50<02:43,  1.10s/it]

dak


 51%|█████     | 150/297 [02:52<02:49,  1.15s/it]

Góor gi dem


 51%|█████     | 151/297 [02:53<02:51,  1.17s/it]

Su dee dem


 51%|█████     | 152/297 [02:54<02:45,  1.14s/it]

Duma dem


 52%|█████▏    | 153/297 [02:55<02:32,  1.06s/it]

Dama dem


 52%|█████▏    | 154/297 [02:56<02:29,  1.05s/it]

fatu


 52%|█████▏    | 155/297 [02:57<02:47,  1.18s/it]

foofii


 53%|█████▎    | 156/297 [02:59<02:49,  1.20s/it]

raññe ameef


 53%|█████▎    | 157/297 [03:00<02:39,  1.14s/it]

Gor gii di Lawbe Ndar.


 53%|█████▎    | 158/297 [03:01<02:32,  1.10s/it]

Séen naa ab néeg.


 54%|█████▎    | 159/297 [03:02<02:31,  1.10s/it]

wàt


 54%|█████▍    | 160/297 [03:03<02:31,  1.10s/it]

ka


 54%|█████▍    | 161/297 [03:04<02:35,  1.14s/it]

jëmentalukaay


 55%|█████▍    | 162/297 [03:05<02:26,  1.08s/it]

Boobu néeg ban õga wax?


 55%|█████▍    | 163/297 [03:06<02:25,  1.08s/it]

lépp loolu woon


 55%|█████▌    | 164/297 [03:07<02:26,  1.10s/it]

yooyu


 56%|█████▌    | 165/297 [03:10<03:25,  1.55s/it]

faj


 56%|█████▌    | 166/297 [03:11<03:09,  1.45s/it]

macc


 56%|█████▌    | 167/297 [03:12<03:01,  1.40s/it]

Góor gi gisul xale bi.


 57%|█████▋    | 168/297 [03:13<02:51,  1.33s/it]

Bëgg naa õga dem te it õga noppi


 57%|█████▋    | 169/297 [03:15<02:48,  1.32s/it]

mooy


 57%|█████▋    | 170/297 [03:16<02:45,  1.30s/it]

tur


 58%|█████▊    | 171/297 [03:17<02:37,  1.25s/it]

Gis õga nit kee?


 58%|█████▊    | 172/297 [03:18<02:26,  1.17s/it]

õgeenal ma ñenn ñuu!


 58%|█████▊    | 173/297 [03:19<02:17,  1.11s/it]

weddi


 59%|█████▊    | 174/297 [03:21<02:37,  1.28s/it]

nale


 59%|█████▉    | 175/297 [03:22<02:32,  1.25s/it]

Góor gi nee na soo demee mi õgi fi


 59%|█████▉    | 176/297 [03:23<02:22,  1.18s/it]

Demkoon õga na ma keneen.


 60%|█████▉    | 177/297 [03:24<02:12,  1.10s/it]

demal


 60%|█████▉    | 178/297 [03:25<02:10,  1.10s/it]

Koo gis?


 60%|██████    | 179/297 [03:27<02:30,  1.28s/it]

daõdaõluji


 61%|██████    | 180/297 [03:28<02:35,  1.33s/it]

Geneen gaynde laa bëgg


 61%|██████    | 181/297 [03:29<02:27,  1.27s/it]

Dafa doon liggéey.


 61%|██████▏   | 182/297 [03:30<02:18,  1.21s/it]

bëgg


 62%|██████▏   | 183/297 [03:33<03:21,  1.77s/it]

jaam


 62%|██████▏   | 184/297 [03:35<02:58,  1.58s/it]

Dañu defewoon ci õgoon ag ci suba yépp da dem


 62%|██████▏   | 185/297 [03:36<02:41,  1.44s/it]

mpar


 63%|██████▎   | 186/297 [03:38<02:56,  1.59s/it]

xar xar


 63%|██████▎   | 187/297 [03:39<02:44,  1.50s/it]

Më õgile demkoon


 63%|██████▎   | 188/297 [03:40<02:33,  1.41s/it]

Wool góor gi dul dem


 64%|██████▎   | 189/297 [03:41<02:26,  1.36s/it]

So demee, mi õgiy wax


 64%|██████▍   | 190/297 [03:43<02:26,  1.37s/it]

Réew mi am na alal ndax?


 64%|██████▍   | 191/297 [03:44<02:20,  1.33s/it]

Jigéen jule demul.


 65%|██████▍   | 192/297 [03:45<02:11,  1.26s/it]

mbaa


 65%|██████▍   | 193/297 [03:46<02:03,  1.19s/it]

sax-sax


 65%|██████▌   | 194/297 [03:48<02:13,  1.29s/it]

sax-sax


 66%|██████▌   | 195/297 [03:49<02:05,  1.23s/it]

dog


 66%|██████▌   | 196/297 [03:50<02:00,  1.19s/it]

jabartu


 66%|██████▋   | 197/297 [03:51<01:53,  1.14s/it]

wex


 67%|██████▋   | 198/297 [03:52<01:54,  1.15s/it]

car


 67%|██████▋   | 199/297 [03:53<01:59,  1.22s/it]

defarkat


 67%|██████▋   | 200/297 [03:55<02:12,  1.37s/it]

Ku dem


 68%|██████▊   | 201/297 [03:56<01:58,  1.23s/it]

Gis na seeni xarit yooyu yépp


 68%|██████▊   | 202/297 [03:57<01:50,  1.16s/it]

Góor ñi dañu tayel te jigéen ñi du ñu deglóo.


 68%|██████▊   | 203/297 [03:58<01:52,  1.20s/it]

góor


 69%|██████▊   | 204/297 [04:00<01:53,  1.22s/it]

noonile


 69%|██████▉   | 205/297 [04:01<01:48,  1.18s/it]

Wooyil Musaa moom mi di dem


 69%|██████▉   | 206/297 [04:02<01:41,  1.12s/it]

nii


 70%|██████▉   | 207/297 [04:03<01:47,  1.19s/it]

loolu doõõ


 70%|███████   | 208/297 [04:04<01:45,  1.18s/it]

far


 70%|███████   | 209/297 [04:05<01:40,  1.14s/it]

Kenn nit ki ñëw na.


 71%|███████   | 210/297 [04:08<02:19,  1.60s/it]

Dem õga...


 71%|███████   | 211/297 [04:09<02:07,  1.49s/it]

saf


 71%|███████▏  | 212/297 [04:10<01:57,  1.38s/it]

kookuu


 72%|███████▏  | 213/297 [04:11<01:44,  1.24s/it]

Ñenn nit ñi yegseeguñu.


 72%|███████▏  | 214/297 [04:12<01:38,  1.19s/it]

Jigéen ji ag góor gi ñjool lañu.


 72%|███████▏  | 215/297 [04:13<01:35,  1.16s/it]

Bëgg naa õgeen dem


 73%|███████▎  | 216/297 [04:15<01:36,  1.20s/it]

Ñëwël ndax xale yi di mbër te it ñu di ay jambaar.


 73%|███████▎  | 217/297 [04:16<01:40,  1.26s/it]

Yobul na fi xar góor gi.


 73%|███████▎  | 218/297 [04:17<01:35,  1.21s/it]

Yaw milekoon yalwaan tay.


 74%|███████▎  | 219/297 [04:19<01:43,  1.32s/it]

Feneen fi bëtt na mu jëm.


 74%|███████▍  | 220/297 [04:20<01:40,  1.30s/it]

Jigéen jale.


 74%|███████▍  | 221/297 [04:21<01:36,  1.27s/it]

goreedi


 75%|███████▍  | 222/297 [04:23<01:38,  1.32s/it]

Su demee


 75%|███████▌  | 223/297 [04:24<01:34,  1.27s/it]

Gis na ma xarit yeneen yi


 75%|███████▌  | 224/297 [04:25<01:29,  1.23s/it]

Yaa doonkoon wax


 76%|███████▌  | 225/297 [04:26<01:27,  1.22s/it]

kennu nit


 76%|███████▌  | 226/297 [04:27<01:25,  1.21s/it]

dajji


 76%|███████▋  | 227/297 [04:28<01:23,  1.19s/it]

añ


 77%|███████▋  | 228/297 [04:30<01:33,  1.36s/it]

bëgg-bëgg


 77%|███████▋  | 229/297 [04:31<01:30,  1.33s/it]

jog


 77%|███████▋  | 230/297 [04:32<01:24,  1.27s/it]

anal


 78%|███████▊  | 231/297 [04:34<01:19,  1.21s/it]

Loolu la.


 78%|███████▊  | 232/297 [04:35<01:15,  1.16s/it]

Nit lañu!


 78%|███████▊  | 233/297 [04:36<01:17,  1.21s/it]

Góor gi bëgg na


 79%|███████▉  | 234/297 [04:38<01:23,  1.32s/it]

xéewlu


 79%|███████▉  | 235/297 [04:39<01:18,  1.26s/it]

Koo gis?


 79%|███████▉  | 236/297 [04:40<01:18,  1.29s/it]

naa


 80%|███████▉  | 237/297 [04:43<01:42,  1.71s/it]

Kenn ki ci buntu kër gi


 80%|████████  | 238/297 [04:44<01:29,  1.51s/it]

Su dee dem


 80%|████████  | 239/297 [04:45<01:19,  1.37s/it]

Jigéen ñi demuñu Ndar ba tày.


 81%|████████  | 240/297 [04:46<01:12,  1.28s/it]

lenn lale


 81%|████████  | 241/297 [04:47<01:13,  1.31s/it]

Ndaw senn réerul.


 81%|████████▏ | 242/297 [04:49<01:14,  1.35s/it]

Xammee õga bee xale?


 82%|████████▏ | 243/297 [04:50<01:07,  1.25s/it]

Xale


 82%|████████▏ | 244/297 [04:51<01:01,  1.17s/it]

May na ka keneen ku sawar.


 82%|████████▏ | 245/297 [04:52<01:02,  1.19s/it]

lekke


 83%|████████▎ | 246/297 [04:54<01:09,  1.36s/it]

Demuma fa woon


 83%|████████▎ | 247/297 [04:55<01:02,  1.25s/it]

fenn


 84%|████████▎ | 248/297 [04:56<00:57,  1.18s/it]

Maa di dem


 84%|████████▍ | 249/297 [04:57<00:58,  1.23s/it]

waaye


 84%|████████▍ | 250/297 [05:00<01:16,  1.63s/it]

Dellu biir Ndar, soo bëggée.


 85%|████████▍ | 251/297 [05:01<01:18,  1.70s/it]

tëccu


 85%|████████▍ | 252/297 [05:04<01:25,  1.90s/it]

jëf


 85%|████████▌ | 253/297 [05:06<01:26,  1.96s/it]

um


 86%|████████▌ | 254/297 [05:07<01:11,  1.66s/it]

wante itam


 86%|████████▌ | 255/297 [05:08<01:00,  1.43s/it]

Yan ñoo yeksi?


 86%|████████▌ | 256/297 [05:09<00:54,  1.33s/it]

ñennu nit


 87%|████████▋ | 257/297 [05:10<00:50,  1.26s/it]

Maa di dem


 87%|████████▋ | 258/297 [05:11<00:44,  1.13s/it]

tiit


 87%|████████▋ | 259/297 [05:12<00:40,  1.06s/it]

Ñu dikkul?


 88%|████████▊ | 260/297 [05:13<00:36,  1.00it/s]

Nit la woon


 88%|████████▊ | 261/297 [05:13<00:33,  1.07it/s]

lekke


 88%|████████▊ | 262/297 [05:15<00:40,  1.16s/it]

bu


 89%|████████▊ | 263/297 [05:18<00:58,  1.73s/it]

Na õgeen def?


 89%|████████▉ | 264/297 [05:19<00:51,  1.56s/it]

Maa di dem


 89%|████████▉ | 265/297 [05:21<00:47,  1.50s/it]

Gis õga coroom la woon?


 90%|████████▉ | 266/297 [05:22<00:42,  1.36s/it]

Du kenn.


 90%|████████▉ | 267/297 [05:23<00:40,  1.34s/it]

laõk


 90%|█████████ | 268/297 [05:24<00:37,  1.28s/it]

Gisoon naa nit ñooña ñépp.


 91%|█████████ | 269/297 [05:25<00:36,  1.32s/it]

Yéena dikkulwoon


 91%|█████████ | 270/297 [05:27<00:39,  1.47s/it]

Dogul cu seen biir.


 91%|█████████ | 271/297 [05:29<00:41,  1.60s/it]

te


 92%|█████████▏| 272/297 [05:31<00:38,  1.53s/it]

ka ka moom, ka ka ka ka ka moom ka ka moom, ka ka ka ka ka ka moom ka ka moom, ka ka ka ka ka ka moom ka ka moom, ka ka ka ka ka ka moom ka ka moom, ka ka ka


 92%|█████████▏| 273/297 [05:32<00:39,  1.63s/it]

Xar mi.


 92%|█████████▏| 274/297 [05:35<00:40,  1.77s/it]

baq


 93%|█████████▎| 275/297 [05:36<00:36,  1.64s/it]

Yooyale deey bëggu leen


 93%|█████████▎| 276/297 [05:37<00:32,  1.53s/it]

Yaa ka gis moom.


 93%|█████████▎| 277/297 [05:38<00:28,  1.45s/it]

te


 94%|█████████▎| 278/297 [05:40<00:29,  1.56s/it]

ba


 94%|█████████▍| 279/297 [05:42<00:31,  1.74s/it]

Gis õga sa xarit yeneen yi?


 94%|█████████▍| 280/297 [05:44<00:27,  1.59s/it]

su


 95%|█████████▍| 281/297 [05:45<00:23,  1.50s/it]

Yéen ñan la wax


 95%|█████████▍| 282/297 [05:46<00:21,  1.40s/it]

Menn nit ñëwul.


 95%|█████████▌| 283/297 [05:48<00:20,  1.46s/it]

Mi õgii dem ba delusi


 96%|█████████▌| 284/297 [05:49<00:18,  1.43s/it]

Su demee, nit lay na na...


 96%|█████████▌| 285/297 [05:50<00:16,  1.39s/it]

Ñépp ñan õga gis?


 96%|█████████▋| 286/297 [05:52<00:15,  1.37s/it]

l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l


 97%|█████████▋| 287/297 [05:54<00:16,  1.69s/it]

baxbax-lu


 97%|█████████▋| 288/297 [05:55<00:14,  1.58s/it]

ub


 97%|█████████▋| 289/297 [05:57<00:11,  1.50s/it]

nile


 98%|█████████▊| 290/297 [05:58<00:09,  1.39s/it]

Gaynde la, ku dem


 98%|█████████▊| 291/297 [05:59<00:08,  1.39s/it]

xam xam lu


 98%|█████████▊| 292/297 [06:01<00:07,  1.42s/it]

Ñeneen lañu.


 99%|█████████▊| 293/297 [06:02<00:05,  1.38s/it]

dem-na


 99%|█████████▉| 294/297 [06:04<00:04,  1.45s/it]

kubéer


 99%|█████████▉| 295/297 [06:05<00:03,  1.55s/it]

Gor gii dem?


100%|█████████▉| 296/297 [06:08<00:01,  1.72s/it]

Góor gi


100%|██████████| 297/297 [06:09<00:00,  1.24s/it]

Man demuma





Unnamed: 0,original_text,original_label,predicted_label
0,Va les voir!,Gisi leen!,Seetil
1,couper,dagg,dog
2,Ce n'était pas un homme de Saint-Louis.,Du woon góoru Ndar.,Jambaar du bare wax.
3,Peut-être l'homme a-t-il dit que c'est celui-là!,Soo demee góor gee ni kookule la!,Su góor gi dee ñëw
4,indignité,goreedi,bokk


In [34]:
df_ft_to_wf.tail(10)

Unnamed: 0,original_text,original_label,predicted_label
287,fermer,up,ub
288,de quelle manière,naan,nile
289,C'est peut-être un lion!,"Ku dem, gaynde la!","Gaynde la, ku dem"
290,suinter,siit,xam xam lu
291,"Autres, ils sont.",Ñeñeen lañu.,Ñeneen lañu.
292,il y a un moment,saõx,dem-na
293,qui nie,weddikat,kubéer
294,"Tu sais, cet homme?",Gis õga nit kookale?,Gor gii dem?
295,L'homme partira aujourd'hui,Góor gi dana dem tay ji,Góor gi
296,Moi-même je n'ai pas été,Man mii demuma,Man demuma
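
The tail rows above can be checked quickly for exact agreement between `original_label` and `predicted_label`. The sketch below is a plain-Python illustration, not part of the original evaluation pipeline: `exact_match_rate` is a hypothetical helper name, and the pairs are copied by hand from the ten rows printed above.

```python
def exact_match_rate(pairs):
    """Return the fraction of (reference, prediction) pairs that are
    identical after stripping surrounding whitespace."""
    if not pairs:
        return 0.0
    hits = sum(ref.strip() == pred.strip() for ref, pred in pairs)
    return hits / len(pairs)


# (original_label, predicted_label) pairs copied from rows 287-296 above
tail_pairs = [
    ("up", "ub"),
    ("naan", "nile"),
    ("Ku dem, gaynde la!", "Gaynde la, ku dem"),
    ("siit", "xam xam lu"),
    ("Ñeñeen lañu.", "Ñeneen lañu."),
    ("saõx", "dem-na"),
    ("weddikat", "kubéer"),
    ("Gis õga nit kookale?", "Gor gii dem?"),
    ("Góor gi dana dem tay ji", "Góor gi"),
    ("Man mii demuma", "Man demuma"),
]

print(exact_match_rate(tail_pairs))
```

None of these ten predictions matches its reference exactly, although several differ by a single character or word. On the full frame, the same quantity could be computed with `(df_ft_to_wf["original_label"].str.strip() == df_ft_to_wf["predicted_label"].str.strip()).mean()`, assuming both columns contain only strings.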


In [35]:
# let us display 100 random samples (raise the row limit so that all of them are shown)
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

Unnamed: 0,original_text,original_label,predicted_label
167,"Je veux, si vous avez fini, que tu viennes et ...",Bëgg naa õga ñëw mu dem su õgeen noppée!,Bëgg naa õga dem te it õga noppi
211,celui-ci près de toi,kookii,kookuu
63,Du moment qu'il part,Bi mu dee dem,Bi mu dee dem
154,là a cet endroit,foofile,foofii
5,Est-ce comme je le crains que...!,Mbaa...!,Mbaa...
77,êtres humains de sexe masculin,nit ñu góor,nit ku góor
183,On croyait que tu allais partir matin et soir!,Dañu defewoon ni daa dem ci subë ag ci õgoon y...,Dañu defewoon ci õgoon ag ci suba yépp da dem
158,traîner,wàtal,wàt
9,L'homme n'a rien mangé avec la main.,Nit ki lekkul dara ak loxoom.,Nit ki dóor na nag wi ak bant.
139,L'homme a donné quelque chose à quelqu'un.,Góor gi may na dara nit.,Góor gi may na nit dara.


## Colab download and remove step

In [None]:
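import shutil

# Remove the training results stored on Google Drive (disabled by default)
# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')

# Delete the local wandb run directory; ignore_errors avoids a failure if it does not exist
shutil.rmtree('wandb', ignore_errors=True)

# Archive the wandb directory instead (run this before deleting it)
# shutil.make_archive('wandb', 'zip', 'wandb')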