Fine-tuning the best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the sentences newly extracted from the book **Grammaire de Wolof Moderne**. After hyperparameter tuning with `wandb`, we obtained a best BLEU score of **2.47** for the French-to-Wolof translation model. The main evaluation figures from the hyperparameter search step are given below.

- Parallel coordinates (from the panel):

- Parameter importance chart (from [panel]()):

![parameter_importance]()

In [1]:
# let us extend the paths of the system
import sys
import os

# path = "/content/drive/MyDrive/Memoire/subject2/T5/"

# sys.path.extend([path, f"{path}new_data"])

In [2]:
os.environ["WANDB_DISABLED"] = "true"

In [3]:
# !pip install -qq wandb --upgrade

In [4]:
# !pip install evaluate -qq
# !pip install sacrebleu -qq
# !pip install optuna -qq
# !pip install transformers -qq 
# !pip install tokenizers -qq
# !pip install nlpaug -qq
# !pip install ray[tune] -qq
# !python -m spacy download fr_core_news_lg 

In [5]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.data.dataset_v2 import T5SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from functools import partial
from tqdm import tqdm
import pandas as pd
import numpy as np
import evaluate
import wandb
import torch





--------------

## French to Wolof

### Configure dataset 🔠

In [6]:
def split_data(random_state: int = 50):
  """Split the corpus into train, validation and test sets and save them as CSV files

  Args:
    random_state (int): the seed of the splitting generator. Defaults to 50
  """
  data_dir = "data/additional_documents/diagne_sentences"

  # load the corpus and hold out 10% of it as the test set
  corpora = pd.read_csv(f"{data_dir}/extractions.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # hold out 10% of the remaining sentences as the validation set
  train_set, valid_set = train_test_split(train_set, test_size=0.1, random_state=random_state)

  # save the final training set (written after the validation split, so it holds the same rows as train_set.csv)
  train_set.to_csv(f"{data_dir}/final_train_set.csv", index=False)

  # save the three sets
  train_set.to_csv(f"{data_dir}/train_set.csv", index=False)

  valid_set.to_csv(f"{data_dir}/valid_set.csv", index=False)

  test_set.to_csv(f"{data_dir}/test_set.csv", index=False)
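
The two successive calls to `train_test_split` each hold out 10% of the rows they receive, so the corpus ends up split into roughly 81% train, 9% validation and 10% test. The sketch below reproduces that arithmetic in plain Python. The corpus size of 1000 rows is a made-up figure, `holdout_sizes` and `three_way_sizes` are illustrative helpers, and the rounding (held-out part rounded up, the remainder kept) is assumed to match what scikit-learn does for a float `test_size`.

```python
import math
from typing import Dict, Tuple


def holdout_sizes(n_rows: int, test_size: float = 0.1) -> Tuple[int, int]:
    """Return (n_kept, n_held_out) for one split; the held-out part is rounded up."""
    n_held_out = math.ceil(test_size * n_rows)
    return n_rows - n_held_out, n_held_out


def three_way_sizes(n_rows: int, test_size: float = 0.1) -> Dict[str, int]:
    """Sizes after a train/test split followed by a train/validation split."""
    n_rest, n_test = holdout_sizes(n_rows, test_size)
    n_train, n_valid = holdout_sizes(n_rest, test_size)
    return {"train": n_train, "valid": n_valid, "test": n_test}


# hypothetical corpus of 1000 sentence pairs (not the real corpus size)
sizes = three_way_sizes(1000)
print(sizes)
```

Because the second split works on the 90% that is left, the validation set is 10% of 90%, that is 9% of the whole corpus.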

In [7]:
# load the tokenizer from a json file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [8]:
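def recuperate_datasets(fr_char_p: float, fr_word_p: float):
  """Build the augmented training dataset and the test dataset

  Args:
    fr_char_p (float): share of characters to perturb in each augmented French word
    fr_word_p (float): share of French words to augment in each sentence

  Returns:
    tuple: the augmented training dataset and the test dataset
  """
  # create the augmentation to apply to the French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # load the training dataset (with augmentation on the French side)
  train_dataset_aug = T5SentenceDataset("data/additional_documents/diagne_sentences/final_train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # load the test dataset (no augmentation)
  test_dataset = T5SentenceDataset("data/additional_documents/diagne_sentences/test_set.csv",
                                        tokenizer,
                                        truncation = True)

  # return the datasets
  return train_dataset_aug, test_dataset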
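
`fr_char_p` and `fr_word_p` are forwarded to `nac.KeyboardAug` as `aug_char_p` and `aug_word_p`, which set how many characters inside a selected word and how many words of a sentence get perturbed. The sketch below is a simplified stand-in meant only to illustrate the two knobs, not nlpaug's implementation: it treats both values as independent probabilities, `NEIGHBOURS` is a tiny hypothetical map of adjacent keys, and `keyboard_noise` is an illustrative helper.

```python
import random

# hypothetical, deliberately tiny map of adjacent keys (AZERTY-like layout)
NEIGHBOURS = {"a": "zq", "e": "zr", "i": "uo", "o": "ip", "u": "yi", "s": "qd", "n": "bh"}


def keyboard_noise(sentence: str, char_p: float, word_p: float, seed: int = 0) -> str:
    """Replace some characters by a neighbouring key in a random subset of words."""
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        if rng.random() < word_p:  # this word is selected for augmentation
            word = "".join(
                rng.choice(NEIGHBOURS[c]) if c in NEIGHBOURS and rng.random() < char_p else c
                for c in word
            )
        noisy_words.append(word)
    return " ".join(noisy_words)


sentence = "je vais au marché demain matin"
print(keyboard_noise(sentence, char_p=0.26, word_p=0.33))
```

Higher values of either parameter make the French input noisier, which is why both were included in the hyperparameter search.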

### Configure the model and the evaluation function ⚙️

Let us load the pre-trained model and resize its token embeddings to match the size of our tokenizer's vocabulary.

In [9]:
def t5_model_init(tokenizer):

  # Initialize the model name
  model_name = 't5-small'
  # model_name = 'data/checkpoints/vf_t5_small_v2_checkpoints_2/' # from checkpoint

  # import the model with its pre-trained weights
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

  # resize the token embeddings
  model.resize_token_embeddings(len(tokenizer))

  return model

Let us evaluate the predictions with the BLEU metric (computed with `sacrebleu`).

In [10]:
# %%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        # decode with the tokenizer stored on the instance rather than a global variable
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # -100 marks the label positions ignored by the loss: replace them with the pad id before decoding
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        # average number of non-padding tokens in the generated sequences
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result
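
Two steps of `compute_metrics` are easy to miss: the `-100` entries in the labels are swapped for the pad id before decoding, and `gen_len` is the average count of non-pad tokens per prediction. The sketch below replays both steps on plain Python lists instead of NumPy arrays. `PAD_ID = 0` and the token ids are made-up values, and `unmask_labels` and `mean_generated_length` are illustrative helpers, not part of the class.

```python
IGNORE_INDEX = -100  # value marking label positions that the loss ignores
PAD_ID = 0           # hypothetical pad token id, for illustration only


def unmask_labels(labels, pad_id=PAD_ID):
    """Replace every ignored position with the pad id so the ids can be decoded."""
    return [[tok if tok != IGNORE_INDEX else pad_id for tok in row] for row in labels]


def mean_generated_length(preds, pad_id=PAD_ID):
    """Average number of non-pad tokens per prediction (the `gen_len` value)."""
    lengths = [sum(tok != pad_id for tok in row) for row in preds]
    return sum(lengths) / len(lengths)


# made-up token ids: the first label row is padded with -100
labels = [[12, 7, 1, -100, -100], [5, 9, 4, 3, 1]]
preds = [[0, 12, 7, 1, 0], [0, 5, 9, 4, 1]]

print(unmask_labels(labels))
print(mean_generated_length(preds))
```

Without the first step, the tokenizer would be asked to decode the id `-100`, which does not correspond to any token in the vocabulary.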

In [11]:
# %run wolof-translate/wolof_translate/utils/evaluation.py

Let us initialize the evaluation object.

In [12]:
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

Let us define the data collator.

In [13]:
def data_collator(batch):
    """Stack the samples of a batch into the tensors expected by the trainer

    Args:
        batch (list): The samples, each one indexable as (input ids, attention mask, labels)

    Returns:
        dict: A dictionary containing the input ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0].squeeze(0) for b in batch])
    
    attention_mask = torch.stack([b[1].squeeze(0) for b in batch])
    
    labels = torch.stack([b[2].squeeze(0) for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}

Let us initialize the training arguments with the best hyperparameters found during the search and launch the training.
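
With `num_train_epochs=500` and the 151 optimizer steps per epoch shown by the progress bar of the run, training is scheduled for 151 × 500 = 75500 steps. No scheduler is set explicitly in the arguments, so the sketch below assumes the `Trainer`'s default linear schedule without warmup, under which the learning rate decays from its initial value to zero over those steps. `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


base_lr = 0.0029455426961160418  # learning rate used in this notebook
steps_per_epoch = 151            # read from the progress bar of the run
total_steps = steps_per_epoch * 500

# learning rate at the end of the first three epochs
for epoch in (1, 2, 3):
    print(epoch, linear_lr(epoch * steps_per_epoch, base_lr, total_steps))
```

The printed values are close to the `learning_rate` entries reported in the training log (about 0.00294 after the first epoch), which is consistent with a linear decay over 75500 steps.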

In [14]:
# %%wandb

"""Best parameters
learning_rate = 0.0029455426961160418
weight_decay = 0.3273145442978588
train_batch_size = 16
random_state = 2
fr_char_p = 0.2646960611549013
fr_word_p = 0.32759507689127154
eval/bleu = 3.0599
"""

# let us define a directory
directory = "data/checkpoints/t5_results_fw_v2_3"

# seed
set_seed(0)

# split the data
split_data(random_state=2)

# let us recuperate the datasets
train_dataset, test_dataset = recuperate_datasets(0.2646960611549013, 0.32759507689127154)

# set training arguments
training_args = Seq2SeqTrainingArguments(directory,
                                    logging_dir="data/logs/results_fw_v2_3",
                                    overwrite_output_dir=True,
                                    num_train_epochs=500,
                                    load_best_model_at_end=True,
                                    save_strategy="epoch",
                                    evaluation_strategy="epoch",
                                    logging_strategy="epoch",
                                    per_device_train_batch_size=16, 
                                    per_device_eval_batch_size=16,
                                    learning_rate=0.0029455426961160418,
                                    # learning_rate=0.00003113,
                                    weight_decay=0.3273145442978588,
                                    predict_with_generate=True, # we will use predict with generate in order to obtain more valuable test results
                                    fp16 = True,
                                    metric_for_best_model = 'bleu', # a bleu score will be used to find the best model
                                    greater_is_better = True,
                                    save_total_limit = 2, # keep at most two checkpoints on disk (the best one and the most recent one)
                                    )   

# define training loop
trainer = Seq2SeqTrainer(model_init=partial(t5_model_init, tokenizer = train_dataset.tokenizer),
                  args=training_args,
                  train_dataset=train_dataset, 
                  eval_dataset=test_dataset,
                  data_collator=data_collator,
                  compute_metrics=evaluation.compute_metrics
                  )

# load last checkpoint
# trainer._load_from_checkpoint("data/training2/results/checkpoint-147")

# start training loop
trainer.train()
# trainer.train(resume_from_checkpoint=True)
# trainer.train('data/checkpoints/t5_results_fw_v2_2/')
# trainer.train('data/checkpoints/vf_t5_small_v2_checkpoints_2/')
# trainer.train('data/checkpoints/vf_t5_small_v2_checkpoints/') # from the searching best model
# trainer.train('data/checkpoints/results_fw_v2/last_checkpoint/') # from last checkpoint



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
  0%|          | 151/75500 [00:27<2:58:32,  7.03it/s]

{'loss': 0.6273, 'learning_rate': 0.0029396906245343544, 'epoch': 1.0}


                                                     
  0%|          | 151/75500 [00:33<2:58:32,  7.03it/s]

{'eval_loss': 0.5181533098220825, 'eval_bleu': 0.6952, 'eval_gen_len': 4.1582, 'eval_runtime': 6.2139, 'eval_samples_per_second': 47.796, 'eval_steps_per_second': 3.058, 'epoch': 1.0}


  0%|          | 302/75500 [00:58<2:52:12,  7.28it/s] 

{'loss': 0.4553, 'learning_rate': 0.0029337995391421224, 'epoch': 2.0}


                                                     
  0%|          | 302/75500 [01:05<2:52:12,  7.28it/s]

{'eval_loss': 0.48194652795791626, 'eval_bleu': 2.08, 'eval_gen_len': 4.4411, 'eval_runtime': 6.559, 'eval_samples_per_second': 45.281, 'eval_steps_per_second': 2.897, 'epoch': 2.0}


  1%|          | 453/75500 [01:31<2:58:06,  7.02it/s] 

{'loss': 0.3816, 'learning_rate': 0.00292790845374989, 'epoch': 3.0}


                                                     
  1%|          | 453/75500 [01:37<2:58:06,  7.02it/s]

{'eval_loss': 0.4365682303905487, 'eval_bleu': 3.6015, 'eval_gen_len': 4.1785, 'eval_runtime': 6.3955, 'eval_samples_per_second': 46.439, 'eval_steps_per_second': 2.971, 'epoch': 3.0}


  1%|          | 604/75500 [02:03<3:37:14,  5.75it/s] 

{'loss': 0.3232, 'learning_rate': 0.002922017368357658, 'epoch': 4.0}


                                                     
  1%|          | 604/75500 [02:10<3:37:14,  5.75it/s]

{'eval_loss': 0.4345305562019348, 'eval_bleu': 2.1008, 'eval_gen_len': 4.6296, 'eval_runtime': 7.0493, 'eval_samples_per_second': 42.132, 'eval_steps_per_second': 2.695, 'epoch': 4.0}


  1%|          | 755/75500 [02:36<3:13:08,  6.45it/s] 

{'loss': 0.2769, 'learning_rate': 0.002916126282965426, 'epoch': 5.0}


                                                     
  1%|          | 755/75500 [02:42<3:13:08,  6.45it/s]

{'eval_loss': 0.41947734355926514, 'eval_bleu': 4.6062, 'eval_gen_len': 4.9529, 'eval_runtime': 6.4604, 'eval_samples_per_second': 45.973, 'eval_steps_per_second': 2.941, 'epoch': 5.0}


  1%|          | 906/75500 [03:07<3:33:02,  5.84it/s] 

{'loss': 0.2412, 'learning_rate': 0.002910235197573194, 'epoch': 6.0}


                                                     
  1%|          | 906/75500 [03:14<3:33:02,  5.84it/s]

{'eval_loss': 0.4115380644798279, 'eval_bleu': 5.5016, 'eval_gen_len': 5.3939, 'eval_runtime': 7.362, 'eval_samples_per_second': 40.342, 'eval_steps_per_second': 2.581, 'epoch': 6.0}


  1%|▏         | 1057/75500 [03:39<3:06:37,  6.65it/s]

{'loss': 0.2181, 'learning_rate': 0.0029043441121809617, 'epoch': 7.0}


                                                      
  1%|▏         | 1057/75500 [03:46<3:06:37,  6.65it/s]

{'eval_loss': 0.4145619869232178, 'eval_bleu': 5.7085, 'eval_gen_len': 5.229, 'eval_runtime': 6.619, 'eval_samples_per_second': 44.871, 'eval_steps_per_second': 2.871, 'epoch': 7.0}


  2%|▏         | 1208/75500 [04:11<3:23:44,  6.08it/s] 

{'loss': 0.2008, 'learning_rate': 0.0028984530267887297, 'epoch': 8.0}


                                                      
  2%|▏         | 1208/75500 [04:18<3:23:44,  6.08it/s]

{'eval_loss': 0.4210500121116638, 'eval_bleu': 5.2822, 'eval_gen_len': 5.0505, 'eval_runtime': 6.5931, 'eval_samples_per_second': 45.047, 'eval_steps_per_second': 2.882, 'epoch': 8.0}


  2%|▏         | 1359/75500 [04:44<4:30:23,  4.57it/s] 

{'loss': 0.1856, 'learning_rate': 0.0028925619413964977, 'epoch': 9.0}


                                                      
  2%|▏         | 1359/75500 [04:51<4:30:23,  4.57it/s]

{'eval_loss': 0.4126814901828766, 'eval_bleu': 7.1717, 'eval_gen_len': 5.9327, 'eval_runtime': 7.4492, 'eval_samples_per_second': 39.87, 'eval_steps_per_second': 2.551, 'epoch': 9.0}


  2%|▏         | 1510/75500 [05:21<3:07:55,  6.56it/s] 

{'loss': 0.1812, 'learning_rate': 0.0028866708560042658, 'epoch': 10.0}


                                                      
  2%|▏         | 1510/75500 [05:29<3:07:55,  6.56it/s]

{'eval_loss': 0.39520978927612305, 'eval_bleu': 9.409, 'eval_gen_len': 5.2795, 'eval_runtime': 8.1311, 'eval_samples_per_second': 36.526, 'eval_steps_per_second': 2.337, 'epoch': 10.0}


  2%|▏         | 1661/75500 [05:58<3:27:24,  5.93it/s] 

{'loss': 0.1806, 'learning_rate': 0.0028807797706120334, 'epoch': 11.0}


                                                      
  2%|▏         | 1661/75500 [06:06<3:27:24,  5.93it/s]

{'eval_loss': 0.40272119641304016, 'eval_bleu': 8.0553, 'eval_gen_len': 5.2795, 'eval_runtime': 7.8262, 'eval_samples_per_second': 37.949, 'eval_steps_per_second': 2.428, 'epoch': 11.0}


  2%|▏         | 1812/75500 [06:36<3:35:01,  5.71it/s] 

{'loss': 0.1839, 'learning_rate': 0.0028748886852198014, 'epoch': 12.0}


                                                      
  2%|▏         | 1812/75500 [06:44<3:35:01,  5.71it/s]

{'eval_loss': 0.401095986366272, 'eval_bleu': 5.6073, 'eval_gen_len': 5.1549, 'eval_runtime': 8.1825, 'eval_samples_per_second': 36.297, 'eval_steps_per_second': 2.322, 'epoch': 12.0}


  3%|▎         | 1963/75500 [07:12<3:50:06,  5.33it/s] 

{'loss': 0.1772, 'learning_rate': 0.0028689975998275694, 'epoch': 13.0}


                                                      
  3%|▎         | 1963/75500 [07:21<3:50:06,  5.33it/s]

{'eval_loss': 0.39647069573402405, 'eval_bleu': 6.0461, 'eval_gen_len': 5.1481, 'eval_runtime': 9.0721, 'eval_samples_per_second': 32.738, 'eval_steps_per_second': 2.094, 'epoch': 13.0}


  3%|▎         | 2114/75500 [07:52<4:00:49,  5.08it/s] 

{'loss': 0.1765, 'learning_rate': 0.002863106514435337, 'epoch': 14.0}


                                                      
  3%|▎         | 2114/75500 [07:59<4:00:49,  5.08it/s]

{'eval_loss': 0.41492760181427, 'eval_bleu': 4.4338, 'eval_gen_len': 4.6768, 'eval_runtime': 6.9124, 'eval_samples_per_second': 42.966, 'eval_steps_per_second': 2.749, 'epoch': 14.0}


  3%|▎         | 2265/75500 [08:24<3:13:37,  6.30it/s] 

{'loss': 0.1768, 'learning_rate': 0.002857215429043105, 'epoch': 15.0}


                                                      
  3%|▎         | 2265/75500 [08:30<3:13:37,  6.30it/s]

{'eval_loss': 0.3964254856109619, 'eval_bleu': 6.1307, 'eval_gen_len': 4.7879, 'eval_runtime': 6.4599, 'eval_samples_per_second': 45.976, 'eval_steps_per_second': 2.941, 'epoch': 15.0}


  3%|▎         | 2416/75500 [08:55<2:54:17,  6.99it/s] 

{'loss': 0.1694, 'learning_rate': 0.002851324343650873, 'epoch': 16.0}


                                                      
  3%|▎         | 2416/75500 [09:01<2:54:17,  6.99it/s]

{'eval_loss': 0.40103667974472046, 'eval_bleu': 5.8791, 'eval_gen_len': 4.7374, 'eval_runtime': 6.6971, 'eval_samples_per_second': 44.348, 'eval_steps_per_second': 2.837, 'epoch': 16.0}


  3%|▎         | 2567/75500 [09:26<2:55:28,  6.93it/s] 

{'loss': 0.1688, 'learning_rate': 0.002845433258258641, 'epoch': 17.0}


                                                      
  3%|▎         | 2567/75500 [09:32<2:55:28,  6.93it/s]

{'eval_loss': 0.41996681690216064, 'eval_bleu': 8.3831, 'eval_gen_len': 4.5589, 'eval_runtime': 6.5996, 'eval_samples_per_second': 45.003, 'eval_steps_per_second': 2.879, 'epoch': 17.0}


  4%|▎         | 2718/75500 [09:57<3:03:17,  6.62it/s] 

{'loss': 0.1832, 'learning_rate': 0.0028395421728664087, 'epoch': 18.0}


                                                      
  4%|▎         | 2718/75500 [10:03<3:03:17,  6.62it/s]

{'eval_loss': 0.39436429738998413, 'eval_bleu': 4.7023, 'eval_gen_len': 5.2054, 'eval_runtime': 6.5099, 'eval_samples_per_second': 45.623, 'eval_steps_per_second': 2.919, 'epoch': 18.0}


  4%|▍         | 2869/75500 [10:28<3:01:11,  6.68it/s] 

{'loss': 0.1747, 'learning_rate': 0.0028336510874741767, 'epoch': 19.0}


                                                      
  4%|▍         | 2869/75500 [10:34<3:01:11,  6.68it/s]

{'eval_loss': 0.3899122178554535, 'eval_bleu': 8.168, 'eval_gen_len': 4.9125, 'eval_runtime': 6.4975, 'eval_samples_per_second': 45.71, 'eval_steps_per_second': 2.924, 'epoch': 19.0}


  4%|▍         | 3020/75500 [11:02<3:03:41,  6.58it/s] 

{'loss': 0.1724, 'learning_rate': 0.0028277600020819448, 'epoch': 20.0}


                                                      
  4%|▍         | 3020/75500 [11:09<3:03:41,  6.58it/s]

{'eval_loss': 0.3939327597618103, 'eval_bleu': 5.8136, 'eval_gen_len': 5.2323, 'eval_runtime': 7.0507, 'eval_samples_per_second': 42.123, 'eval_steps_per_second': 2.695, 'epoch': 20.0}


  4%|▍         | 3171/75500 [11:34<3:18:48,  6.06it/s] 

{'loss': 0.1837, 'learning_rate': 0.002821868916689713, 'epoch': 21.0}


                                                      
  4%|▍         | 3171/75500 [11:41<3:18:48,  6.06it/s]

{'eval_loss': 0.39104098081588745, 'eval_bleu': 6.8538, 'eval_gen_len': 4.7239, 'eval_runtime': 6.537, 'eval_samples_per_second': 45.434, 'eval_steps_per_second': 2.907, 'epoch': 21.0}


  4%|▍         | 3322/75500 [12:06<3:24:26,  5.88it/s] 

{'loss': 0.1702, 'learning_rate': 0.0028159778312974804, 'epoch': 22.0}


                                                      
  4%|▍         | 3322/75500 [12:15<3:24:26,  5.88it/s]

{'eval_loss': 0.3855467140674591, 'eval_bleu': 8.9744, 'eval_gen_len': 4.862, 'eval_runtime': 8.2666, 'eval_samples_per_second': 35.928, 'eval_steps_per_second': 2.298, 'epoch': 22.0}


  5%|▍         | 3473/75500 [12:45<3:57:30,  5.05it/s] 

{'loss': 0.167, 'learning_rate': 0.0028100867459052484, 'epoch': 23.0}


                                                      
  5%|▍         | 3473/75500 [12:55<3:57:30,  5.05it/s]

{'eval_loss': 0.39310988783836365, 'eval_bleu': 6.8782, 'eval_gen_len': 5.734, 'eval_runtime': 9.5134, 'eval_samples_per_second': 31.219, 'eval_steps_per_second': 1.997, 'epoch': 23.0}


  5%|▍         | 3624/75500 [13:32<3:28:22,  5.75it/s] 

{'loss': 0.1723, 'learning_rate': 0.0028041956605130165, 'epoch': 24.0}


                                                      
  5%|▍         | 3624/75500 [13:40<3:28:22,  5.75it/s]

{'eval_loss': 0.3889075219631195, 'eval_bleu': 5.7761, 'eval_gen_len': 5.3636, 'eval_runtime': 8.2252, 'eval_samples_per_second': 36.108, 'eval_steps_per_second': 2.31, 'epoch': 24.0}


  5%|▌         | 3775/75500 [14:11<3:49:49,  5.20it/s] 

{'loss': 0.1685, 'learning_rate': 0.002798304575120784, 'epoch': 25.0}


                                                      
  5%|▌         | 3775/75500 [14:20<3:49:49,  5.20it/s]

{'eval_loss': 0.38117873668670654, 'eval_bleu': 4.5441, 'eval_gen_len': 5.2323, 'eval_runtime': 8.8453, 'eval_samples_per_second': 33.577, 'eval_steps_per_second': 2.148, 'epoch': 25.0}


  5%|▌         | 3926/75500 [14:49<3:29:10,  5.70it/s] 

{'loss': 0.1676, 'learning_rate': 0.002792413489728552, 'epoch': 26.0}


                                                      
  5%|▌         | 3926/75500 [14:58<3:29:10,  5.70it/s]

{'eval_loss': 0.3819454610347748, 'eval_bleu': 8.8362, 'eval_gen_len': 5.4747, 'eval_runtime': 8.0959, 'eval_samples_per_second': 36.685, 'eval_steps_per_second': 2.347, 'epoch': 26.0}


  5%|▌         | 4077/75500 [15:28<4:07:32,  4.81it/s] 

{'loss': 0.1676, 'learning_rate': 0.00278652240433632, 'epoch': 27.0}


                                                      
  5%|▌         | 4077/75500 [15:36<4:07:32,  4.81it/s]

{'eval_loss': 0.37211498618125916, 'eval_bleu': 4.7065, 'eval_gen_len': 5.2357, 'eval_runtime': 7.8978, 'eval_samples_per_second': 37.606, 'eval_steps_per_second': 2.406, 'epoch': 27.0}


  6%|▌         | 4228/75500 [16:06<3:25:14,  5.79it/s] 

{'loss': 0.1749, 'learning_rate': 0.002780631318944088, 'epoch': 28.0}


                                                      
  6%|▌         | 4228/75500 [16:14<3:25:14,  5.79it/s]

{'eval_loss': 0.4173600971698761, 'eval_bleu': 1.7174, 'eval_gen_len': 5.0505, 'eval_runtime': 7.8533, 'eval_samples_per_second': 37.818, 'eval_steps_per_second': 2.419, 'epoch': 28.0}


  6%|▌         | 4379/75500 [16:43<3:47:28,  5.21it/s] 

{'loss': 0.2134, 'learning_rate': 0.0027747792473624007, 'epoch': 29.0}


                                                      
  6%|▌         | 4379/75500 [16:51<3:47:28,  5.21it/s]

{'eval_loss': 0.3758368492126465, 'eval_bleu': 8.8214, 'eval_gen_len': 4.8384, 'eval_runtime': 7.7767, 'eval_samples_per_second': 38.191, 'eval_steps_per_second': 2.443, 'epoch': 29.0}


  6%|▌         | 4530/75500 [17:20<3:36:17,  5.47it/s] 

{'loss': 0.1683, 'learning_rate': 0.0027688881619701683, 'epoch': 30.0}


                                                      
  6%|▌         | 4530/75500 [17:28<3:36:17,  5.47it/s]

{'eval_loss': 0.3713303506374359, 'eval_bleu': 7.2795, 'eval_gen_len': 5.2862, 'eval_runtime': 7.9047, 'eval_samples_per_second': 37.573, 'eval_steps_per_second': 2.404, 'epoch': 30.0}


  6%|▌         | 4681/75500 [17:57<3:52:26,  5.08it/s] 

{'loss': 0.1575, 'learning_rate': 0.0027629970765779364, 'epoch': 31.0}


                                                      
  6%|▌         | 4681/75500 [18:05<3:52:26,  5.08it/s]

{'eval_loss': 0.3791356086730957, 'eval_bleu': 7.6058, 'eval_gen_len': 5.0943, 'eval_runtime': 7.9609, 'eval_samples_per_second': 37.307, 'eval_steps_per_second': 2.387, 'epoch': 31.0}


  6%|▋         | 4832/75500 [18:34<3:36:53,  5.43it/s] 

{'loss': 0.1627, 'learning_rate': 0.0027571059911857044, 'epoch': 32.0}


                                                      
  6%|▋         | 4832/75500 [18:42<3:36:53,  5.43it/s]

{'eval_loss': 0.37833577394485474, 'eval_bleu': 6.7727, 'eval_gen_len': 5.5118, 'eval_runtime': 7.9849, 'eval_samples_per_second': 37.195, 'eval_steps_per_second': 2.379, 'epoch': 32.0}


  7%|▋         | 4983/75500 [19:11<3:33:45,  5.50it/s] 

{'loss': 0.1659, 'learning_rate': 0.002751214905793472, 'epoch': 33.0}


                                                      
  7%|▋         | 4983/75500 [19:19<3:33:45,  5.50it/s]

{'eval_loss': 0.37163832783699036, 'eval_bleu': 5.8245, 'eval_gen_len': 5.5993, 'eval_runtime': 7.9528, 'eval_samples_per_second': 37.345, 'eval_steps_per_second': 2.389, 'epoch': 33.0}


  7%|▋         | 5134/75500 [19:49<3:42:02,  5.28it/s] 

{'loss': 0.1674, 'learning_rate': 0.00274532382040124, 'epoch': 34.0}


                                                      
  7%|▋         | 5134/75500 [19:57<3:42:02,  5.28it/s]

{'eval_loss': 0.3793576657772064, 'eval_bleu': 9.3712, 'eval_gen_len': 4.9024, 'eval_runtime': 7.7345, 'eval_samples_per_second': 38.399, 'eval_steps_per_second': 2.457, 'epoch': 34.0}


  7%|▋         | 5285/75500 [20:29<3:45:14,  5.20it/s] 

{'loss': 0.1655, 'learning_rate': 0.002739432735009008, 'epoch': 35.0}


                                                      
  7%|▋         | 5285/75500 [20:39<3:45:14,  5.20it/s]

{'eval_loss': 0.39754828810691833, 'eval_bleu': 7.729, 'eval_gen_len': 5.0707, 'eval_runtime': 10.6151, 'eval_samples_per_second': 27.979, 'eval_steps_per_second': 1.79, 'epoch': 35.0}


  7%|▋         | 5436/75500 [21:10<4:13:55,  4.60it/s] 

{'loss': 0.1668, 'learning_rate': 0.002733541649616776, 'epoch': 36.0}


                                                      
  7%|▋         | 5436/75500 [21:18<4:13:55,  4.60it/s]

{'eval_loss': 0.3835105001926422, 'eval_bleu': 7.0455, 'eval_gen_len': 4.6397, 'eval_runtime': 8.6411, 'eval_samples_per_second': 34.37, 'eval_steps_per_second': 2.199, 'epoch': 36.0}


  7%|▋         | 5587/75500 [21:48<3:52:06,  5.02it/s] 

{'loss': 0.1651, 'learning_rate': 0.0027276505642245437, 'epoch': 37.0}


                                                      
  7%|▋         | 5587/75500 [21:57<3:52:06,  5.02it/s]

{'eval_loss': 0.3739379048347473, 'eval_bleu': 8.6389, 'eval_gen_len': 4.8653, 'eval_runtime': 8.8475, 'eval_samples_per_second': 33.569, 'eval_steps_per_second': 2.148, 'epoch': 37.0}


  8%|▊         | 5738/75500 [22:30<3:11:06,  6.08it/s] 

{'loss': 0.1638, 'learning_rate': 0.0027217594788323117, 'epoch': 38.0}


                                                      
  8%|▊         | 5738/75500 [22:37<3:11:06,  6.08it/s]

{'eval_loss': 0.37637975811958313, 'eval_bleu': 9.1822, 'eval_gen_len': 5.4411, 'eval_runtime': 6.8228, 'eval_samples_per_second': 43.53, 'eval_steps_per_second': 2.785, 'epoch': 38.0}


  8%|▊         | 5889/75500 [23:01<2:42:53,  7.12it/s] 

{'loss': 0.1645, 'learning_rate': 0.0027158683934400797, 'epoch': 39.0}


                                                      
  8%|▊         | 5889/75500 [23:08<2:42:53,  7.12it/s]

{'eval_loss': 0.3740963935852051, 'eval_bleu': 7.0817, 'eval_gen_len': 5.1886, 'eval_runtime': 6.6969, 'eval_samples_per_second': 44.349, 'eval_steps_per_second': 2.837, 'epoch': 39.0}


  8%|▊         | 6040/75500 [23:32<2:50:08,  6.80it/s] 

{'loss': 0.1764, 'learning_rate': 0.0027099773080478478, 'epoch': 40.0}


                                                      
  8%|▊         | 6040/75500 [23:38<2:50:08,  6.80it/s]

{'eval_loss': 0.41168901324272156, 'eval_bleu': 5.7697, 'eval_gen_len': 4.5118, 'eval_runtime': 6.5753, 'eval_samples_per_second': 45.169, 'eval_steps_per_second': 2.89, 'epoch': 40.0}


  8%|▊         | 6191/75500 [24:04<3:16:42,  5.87it/s] 

{'loss': 0.1704, 'learning_rate': 0.0027040862226556154, 'epoch': 41.0}


                                                      
  8%|▊         | 6191/75500 [24:12<3:16:42,  5.87it/s]

{'eval_loss': 0.3858857750892639, 'eval_bleu': 9.1821, 'eval_gen_len': 4.8889, 'eval_runtime': 7.2853, 'eval_samples_per_second': 40.767, 'eval_steps_per_second': 2.608, 'epoch': 41.0}


  8%|▊         | 6342/75500 [24:36<2:41:53,  7.12it/s] 

{'loss': 0.1565, 'learning_rate': 0.0026981951372633834, 'epoch': 42.0}


                                                      
  8%|▊         | 6342/75500 [24:43<2:41:53,  7.12it/s]

{'eval_loss': 0.38327261805534363, 'eval_bleu': 7.6858, 'eval_gen_len': 4.9293, 'eval_runtime': 6.5585, 'eval_samples_per_second': 45.285, 'eval_steps_per_second': 2.897, 'epoch': 42.0}


  9%|▊         | 6493/75500 [25:07<2:58:18,  6.45it/s] 

{'loss': 0.1589, 'learning_rate': 0.0026923040518711514, 'epoch': 43.0}


                                                      
  9%|▊         | 6493/75500 [25:14<2:58:18,  6.45it/s]

{'eval_loss': 0.3761862814426422, 'eval_bleu': 4.7814, 'eval_gen_len': 5.1481, 'eval_runtime': 7.0186, 'eval_samples_per_second': 42.316, 'eval_steps_per_second': 2.707, 'epoch': 43.0}


  9%|▉         | 6644/75500 [25:38<2:56:13,  6.51it/s] 

{'loss': 0.1599, 'learning_rate': 0.002686412966478919, 'epoch': 44.0}


                                                      
  9%|▉         | 6644/75500 [25:44<2:56:13,  6.51it/s]

{'eval_loss': 0.38023850321769714, 'eval_bleu': 5.1395, 'eval_gen_len': 5.4377, 'eval_runtime': 6.7863, 'eval_samples_per_second': 43.764, 'eval_steps_per_second': 2.8, 'epoch': 44.0}


  9%|▉         | 6795/75500 [26:08<3:20:13,  5.72it/s] 

{'loss': 0.1549, 'learning_rate': 0.002680521881086687, 'epoch': 45.0}


                                                      
  9%|▉         | 6795/75500 [26:15<3:20:13,  5.72it/s]

{'eval_loss': 0.3896496891975403, 'eval_bleu': 7.5497, 'eval_gen_len': 4.6566, 'eval_runtime': 6.5999, 'eval_samples_per_second': 45.0, 'eval_steps_per_second': 2.879, 'epoch': 45.0}


  9%|▉         | 6946/75500 [26:39<2:44:57,  6.93it/s] 

{'loss': 0.152, 'learning_rate': 0.002674630795694455, 'epoch': 46.0}


                                                      
  9%|▉         | 6946/75500 [26:46<2:44:57,  6.93it/s]

{'eval_loss': 0.3912084698677063, 'eval_bleu': 7.9083, 'eval_gen_len': 5.4444, 'eval_runtime': 6.5767, 'eval_samples_per_second': 45.159, 'eval_steps_per_second': 2.889, 'epoch': 46.0}


  9%|▉         | 7097/75500 [27:12<3:05:54,  6.13it/s] 

{'loss': 0.1519, 'learning_rate': 0.002668739710302223, 'epoch': 47.0}


                                                      
  9%|▉         | 7097/75500 [27:22<3:05:54,  6.13it/s]

{'eval_loss': 0.38145896792411804, 'eval_bleu': 7.4649, 'eval_gen_len': 5.5791, 'eval_runtime': 9.2139, 'eval_samples_per_second': 32.234, 'eval_steps_per_second': 2.062, 'epoch': 47.0}


 10%|▉         | 7248/75500 [27:50<2:45:28,  6.87it/s] 

{'loss': 0.1897, 'learning_rate': 0.0026628876387205357, 'epoch': 48.0}


                                                      
 10%|▉         | 7248/75500 [27:57<2:45:28,  6.87it/s]

{'eval_loss': 0.3903833329677582, 'eval_bleu': 5.6843, 'eval_gen_len': 5.2694, 'eval_runtime': 6.4932, 'eval_samples_per_second': 45.74, 'eval_steps_per_second': 2.926, 'epoch': 48.0}


 10%|▉         | 7399/75500 [28:22<3:11:05,  5.94it/s] 

{'loss': 0.1603, 'learning_rate': 0.0026569965533283033, 'epoch': 49.0}


                                                      
 10%|▉         | 7399/75500 [28:30<3:11:05,  5.94it/s]

{'eval_loss': 0.3879314363002777, 'eval_bleu': 5.169, 'eval_gen_len': 4.9293, 'eval_runtime': 8.5231, 'eval_samples_per_second': 34.846, 'eval_steps_per_second': 2.229, 'epoch': 49.0}


 10%|█         | 7550/75500 [29:00<3:09:42,  5.97it/s] 

{'loss': 0.1444, 'learning_rate': 0.0026511054679360714, 'epoch': 50.0}


                                                      
 10%|█         | 7550/75500 [29:07<3:09:42,  5.97it/s]

{'eval_loss': 0.3926709294319153, 'eval_bleu': 6.2069, 'eval_gen_len': 4.8822, 'eval_runtime': 7.0351, 'eval_samples_per_second': 42.217, 'eval_steps_per_second': 2.701, 'epoch': 50.0}


 10%|█         | 7701/75500 [29:35<2:54:29,  6.48it/s] 

{'loss': 0.1385, 'learning_rate': 0.0026452143825438394, 'epoch': 51.0}


                                                      
 10%|█         | 7701/75500 [29:42<2:54:29,  6.48it/s]

{'eval_loss': 0.3867543637752533, 'eval_bleu': 7.5159, 'eval_gen_len': 4.8754, 'eval_runtime': 6.9631, 'eval_samples_per_second': 42.653, 'eval_steps_per_second': 2.729, 'epoch': 51.0}


 10%|█         | 7852/75500 [30:08<3:15:42,  5.76it/s] 

{'loss': 0.1416, 'learning_rate': 0.0026393232971516074, 'epoch': 52.0}


                                                      
 10%|█         | 7852/75500 [30:15<3:15:42,  5.76it/s]

{'eval_loss': 0.38602954149246216, 'eval_bleu': 6.9365, 'eval_gen_len': 4.9529, 'eval_runtime': 7.2573, 'eval_samples_per_second': 40.924, 'eval_steps_per_second': 2.618, 'epoch': 52.0}


 11%|█         | 8003/75500 [30:42<3:58:15,  4.72it/s] 

{'loss': 0.1462, 'learning_rate': 0.002633432211759375, 'epoch': 53.0}


                                                      
 11%|█         | 8003/75500 [30:52<3:58:15,  4.72it/s]

{'eval_loss': 0.3838023245334625, 'eval_bleu': 7.5017, 'eval_gen_len': 4.5791, 'eval_runtime': 10.4531, 'eval_samples_per_second': 28.413, 'eval_steps_per_second': 1.818, 'epoch': 53.0}


 11%|█         | 8154/75500 [31:26<3:16:33,  5.71it/s] 

{'loss': 0.1551, 'learning_rate': 0.002627541126367143, 'epoch': 54.0}


                                                      
 11%|█         | 8154/75500 [31:34<3:16:33,  5.71it/s]

{'eval_loss': 0.4147220551967621, 'eval_bleu': 7.8441, 'eval_gen_len': 4.1919, 'eval_runtime': 8.417, 'eval_samples_per_second': 35.286, 'eval_steps_per_second': 2.257, 'epoch': 54.0}


 11%|█         | 8305/75500 [32:04<3:15:49,  5.72it/s] 

{'loss': 0.1516, 'learning_rate': 0.002621650040974911, 'epoch': 55.0}


                                                      
 11%|█         | 8305/75500 [32:11<3:15:49,  5.72it/s]

{'eval_loss': 0.375941663980484, 'eval_bleu': 11.2967, 'eval_gen_len': 5.3502, 'eval_runtime': 7.6832, 'eval_samples_per_second': 38.656, 'eval_steps_per_second': 2.473, 'epoch': 55.0}


 11%|█         | 8456/75500 [32:36<2:57:14,  6.30it/s] 

{'loss': 0.1356, 'learning_rate': 0.0026157589555826787, 'epoch': 56.0}


                                                      
 11%|█         | 8456/75500 [32:42<2:57:14,  6.30it/s]

{'eval_loss': 0.39261800050735474, 'eval_bleu': 6.3159, 'eval_gen_len': 4.9327, 'eval_runtime': 6.6629, 'eval_samples_per_second': 44.575, 'eval_steps_per_second': 2.852, 'epoch': 56.0}


 11%|█▏        | 8607/75500 [33:07<2:35:54,  7.15it/s] 

{'loss': 0.1367, 'learning_rate': 0.0026098678701904467, 'epoch': 57.0}


                                                      
 11%|█▏        | 8607/75500 [33:14<2:35:54,  7.15it/s]

{'eval_loss': 0.38576337695121765, 'eval_bleu': 9.8815, 'eval_gen_len': 5.0269, 'eval_runtime': 6.6343, 'eval_samples_per_second': 44.768, 'eval_steps_per_second': 2.864, 'epoch': 57.0}


 12%|█▏        | 8758/75500 [33:45<2:58:53,  6.22it/s] 

{'loss': 0.1379, 'learning_rate': 0.0026039767847982147, 'epoch': 58.0}


                                                      
 12%|█▏        | 8758/75500 [33:52<2:58:53,  6.22it/s]

{'eval_loss': 0.3861828148365021, 'eval_bleu': 5.4359, 'eval_gen_len': 5.6397, 'eval_runtime': 7.0288, 'eval_samples_per_second': 42.255, 'eval_steps_per_second': 2.703, 'epoch': 58.0}


 12%|█▏        | 8909/75500 [34:18<2:51:04,  6.49it/s] 

{'loss': 0.1363, 'learning_rate': 0.0025980856994059828, 'epoch': 59.0}


                                                      
 12%|█▏        | 8909/75500 [34:25<2:51:04,  6.49it/s]

{'eval_loss': 0.38377541303634644, 'eval_bleu': 5.4032, 'eval_gen_len': 5.6094, 'eval_runtime': 7.0507, 'eval_samples_per_second': 42.124, 'eval_steps_per_second': 2.695, 'epoch': 59.0}


 12%|█▏        | 9060/75500 [34:50<3:06:55,  5.92it/s] 

{'loss': 0.1374, 'learning_rate': 0.0025921946140137504, 'epoch': 60.0}


                                                      
 12%|█▏        | 9060/75500 [34:57<3:06:55,  5.92it/s]

{'eval_loss': 0.37680166959762573, 'eval_bleu': 7.3521, 'eval_gen_len': 5.7003, 'eval_runtime': 6.5617, 'eval_samples_per_second': 45.263, 'eval_steps_per_second': 2.896, 'epoch': 60.0}


 12%|█▏        | 9211/75500 [35:21<2:48:52,  6.54it/s] 

{'loss': 0.1359, 'learning_rate': 0.0025863035286215184, 'epoch': 61.0}


                                                      
 12%|█▏        | 9211/75500 [35:28<2:48:52,  6.54it/s]

{'eval_loss': 0.3830239772796631, 'eval_bleu': 9.3768, 'eval_gen_len': 5.5791, 'eval_runtime': 6.6041, 'eval_samples_per_second': 44.972, 'eval_steps_per_second': 2.877, 'epoch': 61.0}


 12%|█▏        | 9362/75500 [35:53<3:03:03,  6.02it/s] 

{'loss': 0.1374, 'learning_rate': 0.0025804124432292864, 'epoch': 62.0}


                                                      
 12%|█▏        | 9362/75500 [36:00<3:03:03,  6.02it/s]

{'eval_loss': 0.3841252326965332, 'eval_bleu': 9.3917, 'eval_gen_len': 5.5118, 'eval_runtime': 6.9206, 'eval_samples_per_second': 42.916, 'eval_steps_per_second': 2.745, 'epoch': 62.0}


 13%|█▎        | 9513/75500 [36:25<2:41:50,  6.80it/s] 

{'loss': 0.1352, 'learning_rate': 0.0025745213578370544, 'epoch': 63.0}


                                                      
 13%|█▎        | 9513/75500 [36:32<2:41:50,  6.80it/s]

{'eval_loss': 0.3910960555076599, 'eval_bleu': 9.2324, 'eval_gen_len': 5.1414, 'eval_runtime': 6.7175, 'eval_samples_per_second': 44.213, 'eval_steps_per_second': 2.828, 'epoch': 63.0}


 13%|█▎        | 9664/75500 [36:56<2:56:20,  6.22it/s] 

{'loss': 0.1287, 'learning_rate': 0.002568630272444822, 'epoch': 64.0}


                                                      
 13%|█▎        | 9664/75500 [37:03<2:56:20,  6.22it/s]

{'eval_loss': 0.3881823718547821, 'eval_bleu': 10.568, 'eval_gen_len': 5.2963, 'eval_runtime': 6.7323, 'eval_samples_per_second': 44.116, 'eval_steps_per_second': 2.822, 'epoch': 64.0}


 13%|█▎        | 9815/75500 [37:29<2:42:15,  6.75it/s] 

{'loss': 0.1355, 'learning_rate': 0.00256273918705259, 'epoch': 65.0}


                                                      
 13%|█▎        | 9815/75500 [37:36<2:42:15,  6.75it/s]

{'eval_loss': 0.4097009599208832, 'eval_bleu': 6.4583, 'eval_gen_len': 5.2189, 'eval_runtime': 6.99, 'eval_samples_per_second': 42.489, 'eval_steps_per_second': 2.718, 'epoch': 65.0}


 13%|█▎        | 9966/75500 [38:01<2:59:47,  6.07it/s] 

{'loss': 0.1593, 'learning_rate': 0.002556848101660358, 'epoch': 66.0}


                                                      
 13%|█▎        | 9966/75500 [38:08<2:59:47,  6.07it/s]

{'eval_loss': 0.42498770356178284, 'eval_bleu': 4.9677, 'eval_gen_len': 4.1212, 'eval_runtime': 6.988, 'eval_samples_per_second': 42.502, 'eval_steps_per_second': 2.719, 'epoch': 66.0}


 13%|█▎        | 10117/75500 [38:33<2:42:54,  6.69it/s]

{'loss': 0.1326, 'learning_rate': 0.0025509570162681257, 'epoch': 67.0}


                                                       
 13%|█▎        | 10117/75500 [38:40<2:42:54,  6.69it/s]

{'eval_loss': 0.3825269937515259, 'eval_bleu': 7.6332, 'eval_gen_len': 5.4815, 'eval_runtime': 6.978, 'eval_samples_per_second': 42.562, 'eval_steps_per_second': 2.723, 'epoch': 67.0}


 14%|█▎        | 10268/75500 [39:08<2:30:20,  7.23it/s] 

{'loss': 0.1223, 'learning_rate': 0.0025450659308758937, 'epoch': 68.0}


                                                       
 14%|█▎        | 10268/75500 [39:15<2:30:20,  7.23it/s]

{'eval_loss': 0.4018552601337433, 'eval_bleu': 6.7936, 'eval_gen_len': 5.6869, 'eval_runtime': 6.6668, 'eval_samples_per_second': 44.549, 'eval_steps_per_second': 2.85, 'epoch': 68.0}


 14%|█▍        | 10419/75500 [39:39<2:43:47,  6.62it/s] 

{'loss': 0.1332, 'learning_rate': 0.0025391748454836618, 'epoch': 69.0}


                                                       
 14%|█▍        | 10419/75500 [39:46<2:43:47,  6.62it/s]

{'eval_loss': 0.39526140689849854, 'eval_bleu': 6.7167, 'eval_gen_len': 5.5791, 'eval_runtime': 6.6556, 'eval_samples_per_second': 44.624, 'eval_steps_per_second': 2.855, 'epoch': 69.0}


 14%|█▍        | 10570/75500 [40:11<2:32:58,  7.07it/s] 

{'loss': 0.1304, 'learning_rate': 0.00253328376009143, 'epoch': 70.0}


                                                       
 14%|█▍        | 10570/75500 [40:18<2:32:58,  7.07it/s]

{'eval_loss': 0.3966847062110901, 'eval_bleu': 7.5837, 'eval_gen_len': 5.9192, 'eval_runtime': 6.5922, 'eval_samples_per_second': 45.053, 'eval_steps_per_second': 2.882, 'epoch': 70.0}


 14%|█▍        | 10721/75500 [40:44<2:39:02,  6.79it/s] 

{'loss': 0.1294, 'learning_rate': 0.0025273926746991974, 'epoch': 71.0}


                                                       
 14%|█▍        | 10721/75500 [40:54<2:39:02,  6.79it/s]

{'eval_loss': 0.3866104483604431, 'eval_bleu': 11.819, 'eval_gen_len': 5.6027, 'eval_runtime': 9.8041, 'eval_samples_per_second': 30.293, 'eval_steps_per_second': 1.938, 'epoch': 71.0}


 14%|█▍        | 10872/75500 [41:22<2:42:48,  6.62it/s] 

{'loss': 0.1202, 'learning_rate': 0.0025215015893069654, 'epoch': 72.0}


                                                       
 14%|█▍        | 10872/75500 [41:29<2:42:48,  6.62it/s]

{'eval_loss': 0.3936781585216522, 'eval_bleu': 9.6881, 'eval_gen_len': 5.1684, 'eval_runtime': 6.9666, 'eval_samples_per_second': 42.632, 'eval_steps_per_second': 2.727, 'epoch': 72.0}


 15%|█▍        | 11023/75500 [41:55<3:30:20,  5.11it/s] 

{'loss': 0.1219, 'learning_rate': 0.0025156105039147334, 'epoch': 73.0}


                                                       
 15%|█▍        | 11023/75500 [42:02<3:30:20,  5.11it/s]

{'eval_loss': 0.4036162793636322, 'eval_bleu': 6.4267, 'eval_gen_len': 4.7407, 'eval_runtime': 7.3953, 'eval_samples_per_second': 40.161, 'eval_steps_per_second': 2.569, 'epoch': 73.0}


 15%|█▍        | 11174/75500 [42:28<2:50:25,  6.29it/s] 

{'loss': 0.1227, 'learning_rate': 0.002509719418522501, 'epoch': 74.0}


                                                       
 15%|█▍        | 11174/75500 [42:35<2:50:25,  6.29it/s]

{'eval_loss': 0.4004063904285431, 'eval_bleu': 8.448, 'eval_gen_len': 4.8923, 'eval_runtime': 7.256, 'eval_samples_per_second': 40.932, 'eval_steps_per_second': 2.619, 'epoch': 74.0}


 15%|█▌        | 11325/75500 [43:00<2:42:26,  6.58it/s] 

{'loss': 0.1223, 'learning_rate': 0.002503828333130269, 'epoch': 75.0}


                                                       
 15%|█▌        | 11325/75500 [43:07<2:42:26,  6.58it/s]

{'eval_loss': 0.40603122115135193, 'eval_bleu': 10.6559, 'eval_gen_len': 5.0471, 'eval_runtime': 6.6762, 'eval_samples_per_second': 44.486, 'eval_steps_per_second': 2.846, 'epoch': 75.0}


 15%|█▌        | 11476/75500 [43:31<3:07:41,  5.69it/s] 

{'loss': 0.1201, 'learning_rate': 0.0024979762615485817, 'epoch': 76.0}


                                                       
 15%|█▌        | 11476/75500 [43:38<3:07:41,  5.69it/s]

{'eval_loss': 0.40803253650665283, 'eval_bleu': 9.7988, 'eval_gen_len': 4.6936, 'eval_runtime': 6.6817, 'eval_samples_per_second': 44.45, 'eval_steps_per_second': 2.844, 'epoch': 76.0}


 15%|█▌        | 11627/75500 [44:04<2:38:06,  6.73it/s] 

{'loss': 0.1258, 'learning_rate': 0.0024920851761563497, 'epoch': 77.0}


                                                       
 15%|█▌        | 11627/75500 [44:11<2:38:06,  6.73it/s]

{'eval_loss': 0.39564067125320435, 'eval_bleu': 7.0313, 'eval_gen_len': 5.3805, 'eval_runtime': 7.1558, 'eval_samples_per_second': 41.505, 'eval_steps_per_second': 2.655, 'epoch': 77.0}


 16%|█▌        | 11778/75500 [44:39<3:47:08,  4.68it/s] 

{'loss': 0.1405, 'learning_rate': 0.0024862331045746623, 'epoch': 78.0}


                                                       
 16%|█▌        | 11778/75500 [44:47<3:47:08,  4.68it/s]

{'eval_loss': 0.4084324836730957, 'eval_bleu': 4.2681, 'eval_gen_len': 5.4141, 'eval_runtime': 7.6513, 'eval_samples_per_second': 38.817, 'eval_steps_per_second': 2.483, 'epoch': 78.0}


 16%|█▌        | 11929/75500 [45:16<2:39:11,  6.66it/s] 

{'loss': 0.1356, 'learning_rate': 0.0024803420191824303, 'epoch': 79.0}


                                                       
 16%|█▌        | 11929/75500 [45:23<2:39:11,  6.66it/s]

{'eval_loss': 0.4182598888874054, 'eval_bleu': 7.7899, 'eval_gen_len': 4.4983, 'eval_runtime': 6.7402, 'eval_samples_per_second': 44.064, 'eval_steps_per_second': 2.819, 'epoch': 79.0}


 16%|█▌        | 12080/75500 [45:49<2:49:42,  6.23it/s] 

{'loss': 0.1174, 'learning_rate': 0.002474450933790198, 'epoch': 80.0}


                                                       
 16%|█▌        | 12080/75500 [45:56<2:49:42,  6.23it/s]

{'eval_loss': 0.40602734684944153, 'eval_bleu': 9.1803, 'eval_gen_len': 5.1953, 'eval_runtime': 7.4228, 'eval_samples_per_second': 40.012, 'eval_steps_per_second': 2.56, 'epoch': 80.0}


 16%|█▌        | 12231/75500 [46:23<2:41:53,  6.51it/s] 

{'loss': 0.1107, 'learning_rate': 0.002468559848397966, 'epoch': 81.0}


                                                       
 16%|█▌        | 12231/75500 [46:31<2:41:53,  6.51it/s]

{'eval_loss': 0.3974735736846924, 'eval_bleu': 9.4847, 'eval_gen_len': 5.101, 'eval_runtime': 7.9695, 'eval_samples_per_second': 37.267, 'eval_steps_per_second': 2.384, 'epoch': 81.0}


 16%|█▋        | 12382/75500 [47:00<3:15:35,  5.38it/s] 

{'loss': 0.1123, 'learning_rate': 0.002462668763005734, 'epoch': 82.0}


                                                       
 16%|█▋        | 12382/75500 [47:08<3:15:35,  5.38it/s]

{'eval_loss': 0.39956343173980713, 'eval_bleu': 8.9138, 'eval_gen_len': 5.5387, 'eval_runtime': 8.3822, 'eval_samples_per_second': 35.432, 'eval_steps_per_second': 2.267, 'epoch': 82.0}


 17%|█▋        | 12533/75500 [47:36<2:57:04,  5.93it/s] 

{'loss': 0.1206, 'learning_rate': 0.002456777677613502, 'epoch': 83.0}


                                                       
 17%|█▋        | 12533/75500 [47:42<2:57:04,  5.93it/s]

{'eval_loss': 0.4108518660068512, 'eval_bleu': 8.2669, 'eval_gen_len': 4.532, 'eval_runtime': 6.7412, 'eval_samples_per_second': 44.058, 'eval_steps_per_second': 2.819, 'epoch': 83.0}


 17%|█▋        | 12684/75500 [48:07<2:33:20,  6.83it/s] 

{'loss': 0.1193, 'learning_rate': 0.0024508865922212696, 'epoch': 84.0}


                                                       
 17%|█▋        | 12684/75500 [48:14<2:33:20,  6.83it/s]

{'eval_loss': 0.3953816890716553, 'eval_bleu': 9.7246, 'eval_gen_len': 5.2492, 'eval_runtime': 6.8399, 'eval_samples_per_second': 43.422, 'eval_steps_per_second': 2.778, 'epoch': 84.0}


 17%|█▋        | 12835/75500 [48:41<2:28:00,  7.06it/s] 

{'loss': 0.1178, 'learning_rate': 0.0024449955068290377, 'epoch': 85.0}


                                                       
 17%|█▋        | 12835/75500 [48:48<2:28:00,  7.06it/s]

{'eval_loss': 0.4141477942466736, 'eval_bleu': 11.3027, 'eval_gen_len': 4.7576, 'eval_runtime': 6.9619, 'eval_samples_per_second': 42.661, 'eval_steps_per_second': 2.729, 'epoch': 85.0}


 17%|█▋        | 12986/75500 [49:13<2:52:06,  6.05it/s] 

{'loss': 0.1281, 'learning_rate': 0.0024391044214368057, 'epoch': 86.0}


                                                       
 17%|█▋        | 12986/75500 [49:20<2:52:06,  6.05it/s]

{'eval_loss': 0.4153177738189697, 'eval_bleu': 6.2409, 'eval_gen_len': 5.4714, 'eval_runtime': 6.678, 'eval_samples_per_second': 44.474, 'eval_steps_per_second': 2.845, 'epoch': 86.0}


 17%|█▋        | 13137/75500 [49:45<2:50:18,  6.10it/s] 

{'loss': 0.1422, 'learning_rate': 0.0024332133360445733, 'epoch': 87.0}


                                                       
 17%|█▋        | 13137/75500 [49:52<2:50:18,  6.10it/s]

{'eval_loss': 0.41224542260169983, 'eval_bleu': 8.504, 'eval_gen_len': 5.4646, 'eval_runtime': 7.3072, 'eval_samples_per_second': 40.645, 'eval_steps_per_second': 2.6, 'epoch': 87.0}


 18%|█▊        | 13288/75500 [50:19<2:48:10,  6.17it/s] 

{'loss': 0.1137, 'learning_rate': 0.0024273222506523413, 'epoch': 88.0}


                                                       
 18%|█▊        | 13288/75500 [50:27<2:48:10,  6.17it/s]

{'eval_loss': 0.3981569707393646, 'eval_bleu': 13.4413, 'eval_gen_len': 4.9125, 'eval_runtime': 7.8056, 'eval_samples_per_second': 38.05, 'eval_steps_per_second': 2.434, 'epoch': 88.0}


 18%|█▊        | 13439/75500 [50:54<2:40:55,  6.43it/s] 

{'loss': 0.1121, 'learning_rate': 0.0024214311652601093, 'epoch': 89.0}


                                                       
 18%|█▊        | 13439/75500 [51:01<2:40:55,  6.43it/s]

{'eval_loss': 0.3959297239780426, 'eval_bleu': 9.7432, 'eval_gen_len': 5.6869, 'eval_runtime': 7.2755, 'eval_samples_per_second': 40.822, 'eval_steps_per_second': 2.611, 'epoch': 89.0}


 18%|█▊        | 13590/75500 [51:29<2:42:19,  6.36it/s] 

{'loss': 0.1076, 'learning_rate': 0.0024155400798678774, 'epoch': 90.0}


                                                       
 18%|█▊        | 13590/75500 [51:37<2:42:19,  6.36it/s]

{'eval_loss': 0.40869197249412537, 'eval_bleu': 9.9054, 'eval_gen_len': 4.9933, 'eval_runtime': 7.8925, 'eval_samples_per_second': 37.631, 'eval_steps_per_second': 2.407, 'epoch': 90.0}


 18%|█▊        | 13741/75500 [52:04<2:32:26,  6.75it/s] 

{'loss': 0.1093, 'learning_rate': 0.002409648994475645, 'epoch': 91.0}


                                                       
 18%|█▊        | 13741/75500 [52:11<2:32:26,  6.75it/s]

{'eval_loss': 0.4112171232700348, 'eval_bleu': 9.9929, 'eval_gen_len': 4.8384, 'eval_runtime': 7.2392, 'eval_samples_per_second': 41.026, 'eval_steps_per_second': 2.625, 'epoch': 91.0}


 18%|█▊        | 13892/75500 [52:41<3:29:22,  4.90it/s] 

{'loss': 0.1128, 'learning_rate': 0.002403757909083413, 'epoch': 92.0}


                                                       
 18%|█▊        | 13892/75500 [52:49<3:29:22,  4.90it/s]

{'eval_loss': 0.4133070409297943, 'eval_bleu': 11.6407, 'eval_gen_len': 4.9865, 'eval_runtime': 7.7257, 'eval_samples_per_second': 38.443, 'eval_steps_per_second': 2.459, 'epoch': 92.0}


 19%|█▊        | 14043/75500 [53:15<2:50:49,  6.00it/s] 

{'loss': 0.1117, 'learning_rate': 0.002397866823691181, 'epoch': 93.0}


                                                       
 19%|█▊        | 14043/75500 [53:23<2:50:49,  6.00it/s]

{'eval_loss': 0.40679603815078735, 'eval_bleu': 9.4882, 'eval_gen_len': 4.9024, 'eval_runtime': 7.4978, 'eval_samples_per_second': 39.612, 'eval_steps_per_second': 2.534, 'epoch': 93.0}


 19%|█▉        | 14194/75500 [53:50<2:43:58,  6.23it/s] 

{'loss': 0.1166, 'learning_rate': 0.002391975738298949, 'epoch': 94.0}


                                                       
 19%|█▉        | 14194/75500 [53:58<2:43:58,  6.23it/s]

{'eval_loss': 0.4224262237548828, 'eval_bleu': 7.2652, 'eval_gen_len': 4.697, 'eval_runtime': 8.0935, 'eval_samples_per_second': 36.696, 'eval_steps_per_second': 2.348, 'epoch': 94.0}


 19%|█▉        | 14345/75500 [54:25<2:59:12,  5.69it/s] 

{'loss': 0.1182, 'learning_rate': 0.0023860846529067167, 'epoch': 95.0}


                                                       
 19%|█▉        | 14345/75500 [54:33<2:59:12,  5.69it/s]

{'eval_loss': 0.4285012185573578, 'eval_bleu': 8.6639, 'eval_gen_len': 5.3333, 'eval_runtime': 7.3902, 'eval_samples_per_second': 40.188, 'eval_steps_per_second': 2.571, 'epoch': 95.0}


 19%|█▉        | 14496/75500 [55:02<2:33:32,  6.62it/s] 

{'loss': 0.1312, 'learning_rate': 0.0023801935675144847, 'epoch': 96.0}


                                                       
 19%|█▉        | 14496/75500 [55:09<2:33:32,  6.62it/s]

{'eval_loss': 0.4251411259174347, 'eval_bleu': 7.456, 'eval_gen_len': 4.7643, 'eval_runtime': 6.9648, 'eval_samples_per_second': 42.643, 'eval_steps_per_second': 2.728, 'epoch': 96.0}


 19%|█▉        | 14647/75500 [55:35<2:43:12,  6.21it/s] 

{'loss': 0.1139, 'learning_rate': 0.0023743024821222527, 'epoch': 97.0}


                                                       
 19%|█▉        | 14647/75500 [55:43<2:43:12,  6.21it/s]

{'eval_loss': 0.4057214558124542, 'eval_bleu': 7.8745, 'eval_gen_len': 6.165, 'eval_runtime': 7.6362, 'eval_samples_per_second': 38.894, 'eval_steps_per_second': 2.488, 'epoch': 97.0}


 20%|█▉        | 14798/75500 [56:09<2:40:56,  6.29it/s] 

{'loss': 0.1052, 'learning_rate': 0.0023684113967300203, 'epoch': 98.0}


                                                       
 20%|█▉        | 14798/75500 [56:17<2:40:56,  6.29it/s]

{'eval_loss': 0.4186013340950012, 'eval_bleu': 8.9021, 'eval_gen_len': 5.1481, 'eval_runtime': 7.3105, 'eval_samples_per_second': 40.626, 'eval_steps_per_second': 2.599, 'epoch': 98.0}


 20%|█▉        | 14949/75500 [56:43<2:46:33,  6.06it/s] 

{'loss': 0.1059, 'learning_rate': 0.0023625203113377883, 'epoch': 99.0}


                                                       
 20%|█▉        | 14949/75500 [56:50<2:46:33,  6.06it/s]

{'eval_loss': 0.4083940386772156, 'eval_bleu': 10.8775, 'eval_gen_len': 4.5522, 'eval_runtime': 7.244, 'eval_samples_per_second': 40.999, 'eval_steps_per_second': 2.623, 'epoch': 99.0}


 20%|██        | 15100/75500 [57:18<2:59:36,  5.60it/s] 

{'loss': 0.1178, 'learning_rate': 0.0023566292259455564, 'epoch': 100.0}


                                                       
 20%|██        | 15100/75500 [57:25<2:59:36,  5.60it/s]

{'eval_loss': 0.4156941771507263, 'eval_bleu': 8.1511, 'eval_gen_len': 5.3333, 'eval_runtime': 7.4263, 'eval_samples_per_second': 39.993, 'eval_steps_per_second': 2.558, 'epoch': 100.0}


 20%|██        | 15251/75500 [57:52<2:42:49,  6.17it/s] 

{'loss': 0.1135, 'learning_rate': 0.0023507381405533244, 'epoch': 101.0}


                                                       
 20%|██        | 15251/75500 [58:00<2:42:49,  6.17it/s]

{'eval_loss': 0.41720643639564514, 'eval_bleu': 8.1746, 'eval_gen_len': 5.5455, 'eval_runtime': 7.4432, 'eval_samples_per_second': 39.902, 'eval_steps_per_second': 2.553, 'epoch': 101.0}


 20%|██        | 15402/75500 [58:30<2:38:53,  6.30it/s] 

{'loss': 0.1068, 'learning_rate': 0.002344847055161092, 'epoch': 102.0}


                                                       
 20%|██        | 15402/75500 [58:37<2:38:53,  6.30it/s]

{'eval_loss': 0.4081767201423645, 'eval_bleu': 11.683, 'eval_gen_len': 5.2963, 'eval_runtime': 7.3985, 'eval_samples_per_second': 40.143, 'eval_steps_per_second': 2.568, 'epoch': 102.0}


 21%|██        | 15553/75500 [59:05<2:42:24,  6.15it/s] 

{'loss': 0.1066, 'learning_rate': 0.00233895596976886, 'epoch': 103.0}


                                                       
 21%|██        | 15553/75500 [59:12<2:42:24,  6.15it/s]

{'eval_loss': 0.4023890495300293, 'eval_bleu': 12.0033, 'eval_gen_len': 5.6397, 'eval_runtime': 7.401, 'eval_samples_per_second': 40.13, 'eval_steps_per_second': 2.567, 'epoch': 103.0}


 21%|██        | 15704/75500 [59:40<2:47:45,  5.94it/s] 

{'loss': 0.1083, 'learning_rate': 0.002333064884376628, 'epoch': 104.0}


                                                       
 21%|██        | 15704/75500 [59:48<2:47:45,  5.94it/s]

{'eval_loss': 0.4264695346355438, 'eval_bleu': 11.2135, 'eval_gen_len': 4.9731, 'eval_runtime': 7.8962, 'eval_samples_per_second': 37.613, 'eval_steps_per_second': 2.406, 'epoch': 104.0}


 21%|██        | 15855/75500 [1:00:14<2:48:06,  5.91it/s]

{'loss': 0.1091, 'learning_rate': 0.002327173798984396, 'epoch': 105.0}


                                                         
 21%|██        | 15855/75500 [1:00:22<2:48:06,  5.91it/s]

{'eval_loss': 0.42842844128608704, 'eval_bleu': 6.171, 'eval_gen_len': 4.9764, 'eval_runtime': 7.8657, 'eval_samples_per_second': 37.759, 'eval_steps_per_second': 2.416, 'epoch': 105.0}


 21%|██        | 16006/75500 [1:00:49<2:48:22,  5.89it/s] 

{'loss': 0.1067, 'learning_rate': 0.0023212827135921637, 'epoch': 106.0}


                                                         
 21%|██        | 16006/75500 [1:00:56<2:48:22,  5.89it/s]

{'eval_loss': 0.41516944766044617, 'eval_bleu': 6.8178, 'eval_gen_len': 4.7441, 'eval_runtime': 7.5502, 'eval_samples_per_second': 39.337, 'eval_steps_per_second': 2.517, 'epoch': 106.0}


 21%|██▏       | 16157/75500 [1:01:25<2:56:20,  5.61it/s] 

{'loss': 0.1097, 'learning_rate': 0.0023153916281999317, 'epoch': 107.0}


                                                         
 21%|██▏       | 16157/75500 [1:01:33<2:56:20,  5.61it/s]

{'eval_loss': 0.4126267731189728, 'eval_bleu': 8.1643, 'eval_gen_len': 5.5488, 'eval_runtime': 8.0202, 'eval_samples_per_second': 37.031, 'eval_steps_per_second': 2.369, 'epoch': 107.0}


 22%|██▏       | 16308/75500 [1:01:58<2:29:52,  6.58it/s] 

{'loss': 0.1111, 'learning_rate': 0.0023095005428076998, 'epoch': 108.0}


                                                         
 22%|██▏       | 16308/75500 [1:02:05<2:29:52,  6.58it/s]

{'eval_loss': 0.41000521183013916, 'eval_bleu': 11.9465, 'eval_gen_len': 4.5421, 'eval_runtime': 7.0908, 'eval_samples_per_second': 41.885, 'eval_steps_per_second': 2.68, 'epoch': 108.0}


 22%|██▏       | 16459/75500 [1:02:32<3:02:48,  5.38it/s] 

{'loss': 0.1058, 'learning_rate': 0.0023036094574154673, 'epoch': 109.0}


                                                         
 22%|██▏       | 16459/75500 [1:02:40<3:02:48,  5.38it/s]

{'eval_loss': 0.42497503757476807, 'eval_bleu': 7.8819, 'eval_gen_len': 5.0572, 'eval_runtime': 7.7515, 'eval_samples_per_second': 38.315, 'eval_steps_per_second': 2.451, 'epoch': 109.0}


 22%|██▏       | 16610/75500 [1:03:07<2:56:03,  5.57it/s] 

{'loss': 0.1096, 'learning_rate': 0.0022977183720232354, 'epoch': 110.0}


                                                         
 22%|██▏       | 16610/75500 [1:03:14<2:56:03,  5.57it/s]

{'eval_loss': 0.42757126688957214, 'eval_bleu': 10.3528, 'eval_gen_len': 4.5084, 'eval_runtime': 7.44, 'eval_samples_per_second': 39.919, 'eval_steps_per_second': 2.554, 'epoch': 110.0}


 22%|██▏       | 16761/75500 [1:03:42<3:14:09,  5.04it/s] 

{'loss': 0.1045, 'learning_rate': 0.0022918272866310034, 'epoch': 111.0}


                                                         
 22%|██▏       | 16761/75500 [1:03:50<3:14:09,  5.04it/s]

{'eval_loss': 0.41396844387054443, 'eval_bleu': 13.9072, 'eval_gen_len': 5.0067, 'eval_runtime': 7.404, 'eval_samples_per_second': 40.113, 'eval_steps_per_second': 2.566, 'epoch': 111.0}


 22%|██▏       | 16912/75500 [1:04:18<2:28:47,  6.56it/s] 

{'loss': 0.1148, 'learning_rate': 0.002285975215049316, 'epoch': 112.0}


                                                         
 22%|██▏       | 16912/75500 [1:04:25<2:28:47,  6.56it/s]

{'eval_loss': 0.4279059171676636, 'eval_bleu': 5.3515, 'eval_gen_len': 5.0909, 'eval_runtime': 7.5605, 'eval_samples_per_second': 39.283, 'eval_steps_per_second': 2.513, 'epoch': 112.0}


 23%|██▎       | 17063/75500 [1:04:54<3:02:07,  5.35it/s] 

{'loss': 0.1196, 'learning_rate': 0.002280084129657084, 'epoch': 113.0}


                                                         
 23%|██▎       | 17063/75500 [1:05:03<3:02:07,  5.35it/s]

{'eval_loss': 0.4244072437286377, 'eval_bleu': 6.0797, 'eval_gen_len': 5.4343, 'eval_runtime': 8.3229, 'eval_samples_per_second': 35.685, 'eval_steps_per_second': 2.283, 'epoch': 113.0}


 23%|██▎       | 17214/75500 [1:05:30<2:39:00,  6.11it/s] 

{'loss': 0.1056, 'learning_rate': 0.0022741930442648516, 'epoch': 114.0}


                                                         
 23%|██▎       | 17214/75500 [1:05:37<2:39:00,  6.11it/s]

{'eval_loss': 0.4227845370769501, 'eval_bleu': 10.0421, 'eval_gen_len': 4.8822, 'eval_runtime': 7.7076, 'eval_samples_per_second': 38.534, 'eval_steps_per_second': 2.465, 'epoch': 114.0}


 23%|██▎       | 17365/75500 [1:06:05<2:28:15,  6.54it/s] 

{'loss': 0.1102, 'learning_rate': 0.0022683019588726197, 'epoch': 115.0}


                                                         
 23%|██▎       | 17365/75500 [1:06:13<2:28:15,  6.54it/s]

{'eval_loss': 0.4331040382385254, 'eval_bleu': 9.2018, 'eval_gen_len': 4.9933, 'eval_runtime': 7.4274, 'eval_samples_per_second': 39.987, 'eval_steps_per_second': 2.558, 'epoch': 115.0}


 23%|██▎       | 17516/75500 [1:06:39<2:30:42,  6.41it/s] 

{'loss': 0.1077, 'learning_rate': 0.0022624108734803877, 'epoch': 116.0}


                                                         
 23%|██▎       | 17516/75500 [1:06:46<2:30:42,  6.41it/s]

{'eval_loss': 0.4154239892959595, 'eval_bleu': 10.0865, 'eval_gen_len': 5.6061, 'eval_runtime': 7.3714, 'eval_samples_per_second': 40.291, 'eval_steps_per_second': 2.578, 'epoch': 116.0}


 23%|██▎       | 17667/75500 [1:07:17<3:44:15,  4.30it/s] 

{'loss': 0.1015, 'learning_rate': 0.0022565197880881553, 'epoch': 117.0}


                                                         
 23%|██▎       | 17667/75500 [1:07:26<3:44:15,  4.30it/s]

{'eval_loss': 0.41760990023612976, 'eval_bleu': 8.6207, 'eval_gen_len': 5.8182, 'eval_runtime': 9.5554, 'eval_samples_per_second': 31.082, 'eval_steps_per_second': 1.988, 'epoch': 117.0}


 24%|██▎       | 17818/75500 [1:07:55<2:39:47,  6.02it/s] 

{'loss': 0.1006, 'learning_rate': 0.0022506287026959233, 'epoch': 118.0}


                                                         
 24%|██▎       | 17818/75500 [1:08:03<2:39:47,  6.02it/s]

{'eval_loss': 0.4206564724445343, 'eval_bleu': 9.493, 'eval_gen_len': 5.2795, 'eval_runtime': 7.5299, 'eval_samples_per_second': 39.443, 'eval_steps_per_second': 2.523, 'epoch': 118.0}


 24%|██▍       | 17969/75500 [1:08:33<3:10:57,  5.02it/s] 

{'loss': 0.1032, 'learning_rate': 0.0022447376173036914, 'epoch': 119.0}


                                                         
 24%|██▍       | 17969/75500 [1:08:41<3:10:57,  5.02it/s]

{'eval_loss': 0.4135223627090454, 'eval_bleu': 8.7511, 'eval_gen_len': 5.2929, 'eval_runtime': 8.624, 'eval_samples_per_second': 34.439, 'eval_steps_per_second': 2.203, 'epoch': 119.0}


 24%|██▍       | 18120/75500 [1:09:10<2:47:39,  5.70it/s] 

{'loss': 0.1149, 'learning_rate': 0.0022388465319114594, 'epoch': 120.0}


                                                         
 24%|██▍       | 18120/75500 [1:09:17<2:47:39,  5.70it/s]

{'eval_loss': 0.4269636869430542, 'eval_bleu': 6.7067, 'eval_gen_len': 5.3468, 'eval_runtime': 7.4612, 'eval_samples_per_second': 39.806, 'eval_steps_per_second': 2.547, 'epoch': 120.0}


 24%|██▍       | 18271/75500 [1:09:44<2:38:29,  6.02it/s] 

{'loss': 0.1059, 'learning_rate': 0.002232955446519227, 'epoch': 121.0}


                                                         
 24%|██▍       | 18271/75500 [1:09:51<2:38:29,  6.02it/s]

{'eval_loss': 0.4234825074672699, 'eval_bleu': 9.3843, 'eval_gen_len': 5.1448, 'eval_runtime': 7.3337, 'eval_samples_per_second': 40.498, 'eval_steps_per_second': 2.591, 'epoch': 121.0}


 24%|██▍       | 18422/75500 [1:10:18<3:13:02,  4.93it/s] 

{'loss': 0.0996, 'learning_rate': 0.002227064361126995, 'epoch': 122.0}


                                                         
 24%|██▍       | 18422/75500 [1:10:25<3:13:02,  4.93it/s]

{'eval_loss': 0.44466182589530945, 'eval_bleu': 9.8197, 'eval_gen_len': 4.7912, 'eval_runtime': 7.2052, 'eval_samples_per_second': 41.22, 'eval_steps_per_second': 2.637, 'epoch': 122.0}


 25%|██▍       | 18573/75500 [1:10:52<2:51:30,  5.53it/s] 

{'loss': 0.0966, 'learning_rate': 0.002221173275734763, 'epoch': 123.0}


                                                         
 25%|██▍       | 18573/75500 [1:11:00<2:51:30,  5.53it/s]

{'eval_loss': 0.42030763626098633, 'eval_bleu': 9.9072, 'eval_gen_len': 5.0034, 'eval_runtime': 7.7405, 'eval_samples_per_second': 38.369, 'eval_steps_per_second': 2.455, 'epoch': 123.0}


 25%|██▍       | 18724/75500 [1:11:28<2:36:41,  6.04it/s] 

{'loss': 0.1004, 'learning_rate': 0.002215282190342531, 'epoch': 124.0}


                                                         
 25%|██▍       | 18724/75500 [1:11:37<2:36:41,  6.04it/s]

{'eval_loss': 0.41559654474258423, 'eval_bleu': 9.1086, 'eval_gen_len': 4.9158, 'eval_runtime': 8.9371, 'eval_samples_per_second': 33.232, 'eval_steps_per_second': 2.126, 'epoch': 124.0}


 25%|██▌       | 18875/75500 [1:12:05<2:14:45,  7.00it/s] 

{'loss': 0.1074, 'learning_rate': 0.0022093911049502987, 'epoch': 125.0}


                                                         
 25%|██▌       | 18875/75500 [1:12:12<2:14:45,  7.00it/s]

{'eval_loss': 0.44235825538635254, 'eval_bleu': 7.6992, 'eval_gen_len': 4.7778, 'eval_runtime': 7.0555, 'eval_samples_per_second': 42.095, 'eval_steps_per_second': 2.693, 'epoch': 125.0}


 25%|██▌       | 19026/75500 [1:12:37<2:17:18,  6.86it/s] 

{'loss': 0.1099, 'learning_rate': 0.0022035000195580667, 'epoch': 126.0}


                                                         
 25%|██▌       | 19026/75500 [1:12:43<2:17:18,  6.86it/s]

{'eval_loss': 0.4277889132499695, 'eval_bleu': 7.7667, 'eval_gen_len': 5.1279, 'eval_runtime': 6.7194, 'eval_samples_per_second': 44.2, 'eval_steps_per_second': 2.828, 'epoch': 126.0}


 25%|██▌       | 19177/75500 [1:13:09<2:11:37,  7.13it/s] 

{'loss': 0.1031, 'learning_rate': 0.0021976089341658347, 'epoch': 127.0}


                                                         
 25%|██▌       | 19177/75500 [1:13:16<2:11:37,  7.13it/s]

{'eval_loss': 0.41705822944641113, 'eval_bleu': 12.214, 'eval_gen_len': 5.4512, 'eval_runtime': 6.6175, 'eval_samples_per_second': 44.881, 'eval_steps_per_second': 2.871, 'epoch': 127.0}


 26%|██▌       | 19328/75500 [1:13:41<2:25:03,  6.45it/s] 

{'loss': 0.0965, 'learning_rate': 0.0021917178487736023, 'epoch': 128.0}


                                                         
 26%|██▌       | 19328/75500 [1:13:49<2:25:03,  6.45it/s]

{'eval_loss': 0.4231695830821991, 'eval_bleu': 7.3555, 'eval_gen_len': 5.5657, 'eval_runtime': 7.2992, 'eval_samples_per_second': 40.689, 'eval_steps_per_second': 2.603, 'epoch': 128.0}


 26%|██▌       | 19479/75500 [1:14:14<2:41:24,  5.78it/s] 

{'loss': 0.097, 'learning_rate': 0.0021858267633813704, 'epoch': 129.0}


                                                         
 26%|██▌       | 19479/75500 [1:14:21<2:41:24,  5.78it/s]

{'eval_loss': 0.4136282205581665, 'eval_bleu': 8.7725, 'eval_gen_len': 5.0943, 'eval_runtime': 7.1735, 'eval_samples_per_second': 41.402, 'eval_steps_per_second': 2.649, 'epoch': 129.0}


 26%|██▌       | 19630/75500 [1:14:46<2:28:40,  6.26it/s] 

{'loss': 0.1002, 'learning_rate': 0.0021799356779891384, 'epoch': 130.0}


                                                         
 26%|██▌       | 19630/75500 [1:14:53<2:28:40,  6.26it/s]

{'eval_loss': 0.43241703510284424, 'eval_bleu': 8.7467, 'eval_gen_len': 5.9832, 'eval_runtime': 6.6726, 'eval_samples_per_second': 44.51, 'eval_steps_per_second': 2.847, 'epoch': 130.0}


 26%|██▌       | 19781/75500 [1:15:18<2:45:01,  5.63it/s] 

{'loss': 0.1051, 'learning_rate': 0.0021740445925969064, 'epoch': 131.0}


                                                         
 26%|██▌       | 19781/75500 [1:15:25<2:45:01,  5.63it/s]

{'eval_loss': 0.4238881468772888, 'eval_bleu': 9.9915, 'eval_gen_len': 5.1785, 'eval_runtime': 7.0786, 'eval_samples_per_second': 41.957, 'eval_steps_per_second': 2.684, 'epoch': 131.0}


 26%|██▋       | 19932/75500 [1:15:53<3:08:39,  4.91it/s] 

{'loss': 0.1046, 'learning_rate': 0.002168153507204674, 'epoch': 132.0}


                                                         
 26%|██▋       | 19932/75500 [1:16:00<3:08:39,  4.91it/s]

{'eval_loss': 0.43018773198127747, 'eval_bleu': 9.4425, 'eval_gen_len': 4.9832, 'eval_runtime': 7.3875, 'eval_samples_per_second': 40.203, 'eval_steps_per_second': 2.572, 'epoch': 132.0}


 27%|██▋       | 20083/75500 [1:16:26<2:18:05,  6.69it/s] 

{'loss': 0.1041, 'learning_rate': 0.002162262421812442, 'epoch': 133.0}


                                                         
 27%|██▋       | 20083/75500 [1:16:33<2:18:05,  6.69it/s]

{'eval_loss': 0.421854168176651, 'eval_bleu': 9.943, 'eval_gen_len': 5.2357, 'eval_runtime': 6.9683, 'eval_samples_per_second': 42.622, 'eval_steps_per_second': 2.727, 'epoch': 133.0}


 27%|██▋       | 20234/75500 [1:16:59<2:30:49,  6.11it/s] 

{'loss': 0.1031, 'learning_rate': 0.00215637133642021, 'epoch': 134.0}


                                                         
 27%|██▋       | 20234/75500 [1:17:05<2:30:49,  6.11it/s]

{'eval_loss': 0.43100032210350037, 'eval_bleu': 8.0592, 'eval_gen_len': 5.0404, 'eval_runtime': 6.7265, 'eval_samples_per_second': 44.154, 'eval_steps_per_second': 2.825, 'epoch': 134.0}


 27%|██▋       | 20385/75500 [1:17:31<2:23:39,  6.39it/s] 

{'loss': 0.1053, 'learning_rate': 0.002150480251027978, 'epoch': 135.0}


                                                         
 27%|██▋       | 20385/75500 [1:17:39<2:23:39,  6.39it/s]

{'eval_loss': 0.4344368875026703, 'eval_bleu': 8.2468, 'eval_gen_len': 4.9125, 'eval_runtime': 7.5066, 'eval_samples_per_second': 39.565, 'eval_steps_per_second': 2.531, 'epoch': 135.0}


 27%|██▋       | 20536/75500 [1:18:05<2:17:01,  6.69it/s] 

{'loss': 0.0969, 'learning_rate': 0.0021445891656357457, 'epoch': 136.0}


                                                         
 27%|██▋       | 20536/75500 [1:18:12<2:17:01,  6.69it/s]

{'eval_loss': 0.4403402507305145, 'eval_bleu': 10.3049, 'eval_gen_len': 4.7811, 'eval_runtime': 7.5827, 'eval_samples_per_second': 39.168, 'eval_steps_per_second': 2.506, 'epoch': 136.0}


 27%|██▋       | 20687/75500 [1:18:40<2:39:09,  5.74it/s] 

{'loss': 0.0959, 'learning_rate': 0.0021386980802435137, 'epoch': 137.0}


                                                         
 27%|██▋       | 20687/75500 [1:18:47<2:39:09,  5.74it/s]

{'eval_loss': 0.42602986097335815, 'eval_bleu': 10.6327, 'eval_gen_len': 5.2626, 'eval_runtime': 7.4051, 'eval_samples_per_second': 40.107, 'eval_steps_per_second': 2.566, 'epoch': 137.0}


 28%|██▊       | 20838/75500 [1:19:13<2:45:22,  5.51it/s] 

{'loss': 0.0966, 'learning_rate': 0.0021328069948512818, 'epoch': 138.0}


                                                         
 28%|██▊       | 20838/75500 [1:19:21<2:45:22,  5.51it/s]

{'eval_loss': 0.4281405806541443, 'eval_bleu': 7.8005, 'eval_gen_len': 5.1717, 'eval_runtime': 7.2397, 'eval_samples_per_second': 41.024, 'eval_steps_per_second': 2.624, 'epoch': 138.0}


 28%|██▊       | 20989/75500 [1:19:48<2:31:24,  6.00it/s] 

{'loss': 0.0942, 'learning_rate': 0.0021269159094590494, 'epoch': 139.0}


                                                         
 28%|██▊       | 20989/75500 [1:19:55<2:31:24,  6.00it/s]

{'eval_loss': 0.42476439476013184, 'eval_bleu': 12.2408, 'eval_gen_len': 5.165, 'eval_runtime': 7.1892, 'eval_samples_per_second': 41.312, 'eval_steps_per_second': 2.643, 'epoch': 139.0}


 28%|██▊       | 21140/75500 [1:20:21<2:04:20,  7.29it/s] 

{'loss': 0.0967, 'learning_rate': 0.0021210248240668174, 'epoch': 140.0}


                                                         
 28%|██▊       | 21140/75500 [1:20:27<2:04:20,  7.29it/s]

{'eval_loss': 0.42288216948509216, 'eval_bleu': 10.0156, 'eval_gen_len': 5.5084, 'eval_runtime': 6.3668, 'eval_samples_per_second': 46.648, 'eval_steps_per_second': 2.984, 'epoch': 140.0}


 28%|██▊       | 21291/75500 [1:20:50<2:19:19,  6.48it/s] 

{'loss': 0.0977, 'learning_rate': 0.0021151337386745854, 'epoch': 141.0}


                                                         
 28%|██▊       | 21291/75500 [1:20:57<2:19:19,  6.48it/s]

{'eval_loss': 0.42598071694374084, 'eval_bleu': 11.4787, 'eval_gen_len': 5.5589, 'eval_runtime': 6.7307, 'eval_samples_per_second': 44.126, 'eval_steps_per_second': 2.823, 'epoch': 141.0}


 28%|██▊       | 21442/75500 [1:21:29<2:28:29,  6.07it/s] 

{'loss': 0.1019, 'learning_rate': 0.0021092426532823534, 'epoch': 142.0}


                                                         
 28%|██▊       | 21442/75500 [1:21:36<2:28:29,  6.07it/s]

{'eval_loss': 0.44204434752464294, 'eval_bleu': 7.6844, 'eval_gen_len': 5.0236, 'eval_runtime': 7.3985, 'eval_samples_per_second': 40.143, 'eval_steps_per_second': 2.568, 'epoch': 142.0}


 29%|██▊       | 21593/75500 [1:22:02<2:24:16,  6.23it/s] 

{'loss': 0.1022, 'learning_rate': 0.002103390581700666, 'epoch': 143.0}


                                                         
 29%|██▊       | 21593/75500 [1:22:10<2:24:16,  6.23it/s]

{'eval_loss': 0.4274404048919678, 'eval_bleu': 10.2902, 'eval_gen_len': 5.6061, 'eval_runtime': 7.4221, 'eval_samples_per_second': 40.016, 'eval_steps_per_second': 2.56, 'epoch': 143.0}


 29%|██▉       | 21744/75500 [1:22:39<2:33:05,  5.85it/s] 

{'loss': 0.0986, 'learning_rate': 0.0020974994963084337, 'epoch': 144.0}


                                                         
 29%|██▉       | 21744/75500 [1:22:46<2:33:05,  5.85it/s]

{'eval_loss': 0.43711918592453003, 'eval_bleu': 10.2799, 'eval_gen_len': 4.6599, 'eval_runtime': 7.1192, 'eval_samples_per_second': 41.718, 'eval_steps_per_second': 2.669, 'epoch': 144.0}


 29%|██▉       | 21895/75500 [1:23:13<2:17:34,  6.49it/s] 

{'loss': 0.0982, 'learning_rate': 0.0020916084109162017, 'epoch': 145.0}


                                                         
 29%|██▉       | 21895/75500 [1:23:20<2:17:34,  6.49it/s]

{'eval_loss': 0.4254867732524872, 'eval_bleu': 8.4605, 'eval_gen_len': 5.6061, 'eval_runtime': 7.8334, 'eval_samples_per_second': 37.915, 'eval_steps_per_second': 2.426, 'epoch': 145.0}


 29%|██▉       | 22046/75500 [1:23:47<2:11:51,  6.76it/s] 

{'loss': 0.0951, 'learning_rate': 0.0020857173255239697, 'epoch': 146.0}


                                                         
 29%|██▉       | 22046/75500 [1:23:54<2:11:51,  6.76it/s]

{'eval_loss': 0.4437207579612732, 'eval_bleu': 5.9607, 'eval_gen_len': 5.1515, 'eval_runtime': 7.0228, 'eval_samples_per_second': 42.291, 'eval_steps_per_second': 2.705, 'epoch': 146.0}


 29%|██▉       | 22197/75500 [1:24:20<2:26:27,  6.07it/s] 

{'loss': 0.099, 'learning_rate': 0.0020798262401317377, 'epoch': 147.0}


                                                         
 29%|██▉       | 22197/75500 [1:24:27<2:26:27,  6.07it/s]

{'eval_loss': 0.42965131998062134, 'eval_bleu': 8.5628, 'eval_gen_len': 5.5657, 'eval_runtime': 7.5421, 'eval_samples_per_second': 39.379, 'eval_steps_per_second': 2.519, 'epoch': 147.0}


 30%|██▉       | 22348/75500 [1:24:57<2:21:38,  6.25it/s] 

{'loss': 0.1011, 'learning_rate': 0.0020739351547395053, 'epoch': 148.0}


                                                         
 30%|██▉       | 22348/75500 [1:25:04<2:21:38,  6.25it/s]

{'eval_loss': 0.43983474373817444, 'eval_bleu': 5.0603, 'eval_gen_len': 5.4579, 'eval_runtime': 7.5163, 'eval_samples_per_second': 39.514, 'eval_steps_per_second': 2.528, 'epoch': 148.0}


 30%|██▉       | 22499/75500 [1:25:31<2:32:51,  5.78it/s] 

{'loss': 0.1019, 'learning_rate': 0.0020680440693472734, 'epoch': 149.0}


                                                         
 30%|██▉       | 22499/75500 [1:25:39<2:32:51,  5.78it/s]

{'eval_loss': 0.4286673069000244, 'eval_bleu': 9.8827, 'eval_gen_len': 6.0707, 'eval_runtime': 7.4876, 'eval_samples_per_second': 39.666, 'eval_steps_per_second': 2.538, 'epoch': 149.0}


 30%|███       | 22650/75500 [1:26:06<2:33:53,  5.72it/s] 

{'loss': 0.1001, 'learning_rate': 0.0020621529839550414, 'epoch': 150.0}


                                                         
 30%|███       | 22650/75500 [1:26:13<2:33:53,  5.72it/s]

{'eval_loss': 0.4237881004810333, 'eval_bleu': 9.3637, 'eval_gen_len': 5.0, 'eval_runtime': 7.1163, 'eval_samples_per_second': 41.735, 'eval_steps_per_second': 2.67, 'epoch': 150.0}


 30%|███       | 22801/75500 [1:26:38<2:06:51,  6.92it/s] 

{'loss': 0.0885, 'learning_rate': 0.002056261898562809, 'epoch': 151.0}


                                                         
 30%|███       | 22801/75500 [1:26:45<2:06:51,  6.92it/s]

{'eval_loss': 0.4435705244541168, 'eval_bleu': 10.7246, 'eval_gen_len': 4.8316, 'eval_runtime': 6.8881, 'eval_samples_per_second': 43.118, 'eval_steps_per_second': 2.758, 'epoch': 151.0}


 30%|███       | 22952/75500 [1:27:09<2:18:50,  6.31it/s] 

{'loss': 0.0876, 'learning_rate': 0.002050370813170577, 'epoch': 152.0}


                                                         
 30%|███       | 22952/75500 [1:27:16<2:18:50,  6.31it/s]

{'eval_loss': 0.42367908358573914, 'eval_bleu': 10.7446, 'eval_gen_len': 4.9798, 'eval_runtime': 7.0805, 'eval_samples_per_second': 41.946, 'eval_steps_per_second': 2.683, 'epoch': 152.0}


 31%|███       | 23103/75500 [1:27:42<2:34:45,  5.64it/s] 

{'loss': 0.0872, 'learning_rate': 0.002044479727778345, 'epoch': 153.0}


                                                         
 31%|███       | 23103/75500 [1:27:50<2:34:45,  5.64it/s]

{'eval_loss': 0.41776931285858154, 'eval_bleu': 11.8502, 'eval_gen_len': 4.6667, 'eval_runtime': 7.6234, 'eval_samples_per_second': 38.959, 'eval_steps_per_second': 2.492, 'epoch': 153.0}


 31%|███       | 23254/75500 [1:28:17<2:10:23,  6.68it/s] 

{'loss': 0.094, 'learning_rate': 0.002038588642386113, 'epoch': 154.0}


                                                         
 31%|███       | 23254/75500 [1:28:24<2:10:23,  6.68it/s]

{'eval_loss': 0.4207862913608551, 'eval_bleu': 10.0983, 'eval_gen_len': 5.1044, 'eval_runtime': 7.3453, 'eval_samples_per_second': 40.434, 'eval_steps_per_second': 2.587, 'epoch': 154.0}


 31%|███       | 23405/75500 [1:28:50<2:21:07,  6.15it/s] 

{'loss': 0.0971, 'learning_rate': 0.0020326975569938807, 'epoch': 155.0}


                                                         
 31%|███       | 23405/75500 [1:28:57<2:21:07,  6.15it/s]

{'eval_loss': 0.4374209940433502, 'eval_bleu': 8.7552, 'eval_gen_len': 5.1684, 'eval_runtime': 6.9833, 'eval_samples_per_second': 42.53, 'eval_steps_per_second': 2.721, 'epoch': 155.0}


 31%|███       | 23556/75500 [1:29:23<2:15:14,  6.40it/s] 

{'loss': 0.0972, 'learning_rate': 0.0020268064716016487, 'epoch': 156.0}


                                                         
 31%|███       | 23556/75500 [1:29:30<2:15:14,  6.40it/s]

{'eval_loss': 0.4307003319263458, 'eval_bleu': 10.5207, 'eval_gen_len': 5.303, 'eval_runtime': 7.0137, 'eval_samples_per_second': 42.346, 'eval_steps_per_second': 2.709, 'epoch': 156.0}


 31%|███▏      | 23707/75500 [1:29:59<3:11:27,  4.51it/s] 

{'loss': 0.096, 'learning_rate': 0.0020209153862094167, 'epoch': 157.0}


                                                         
 31%|███▏      | 23707/75500 [1:30:07<3:11:27,  4.51it/s]

{'eval_loss': 0.4400905966758728, 'eval_bleu': 8.6511, 'eval_gen_len': 4.7374, 'eval_runtime': 7.6233, 'eval_samples_per_second': 38.959, 'eval_steps_per_second': 2.492, 'epoch': 157.0}


 32%|███▏      | 23858/75500 [1:30:35<2:37:32,  5.46it/s] 

{'loss': 0.0905, 'learning_rate': 0.0020150243008171848, 'epoch': 158.0}


                                                         
 32%|███▏      | 23858/75500 [1:30:44<2:37:32,  5.46it/s]

{'eval_loss': 0.43113425374031067, 'eval_bleu': 11.8356, 'eval_gen_len': 5.2896, 'eval_runtime': 8.6755, 'eval_samples_per_second': 34.235, 'eval_steps_per_second': 2.19, 'epoch': 158.0}


 32%|███▏      | 24009/75500 [1:31:17<2:41:17,  5.32it/s] 

{'loss': 0.0928, 'learning_rate': 0.0020091332154249524, 'epoch': 159.0}


                                                         
 32%|███▏      | 24009/75500 [1:31:24<2:41:17,  5.32it/s]

{'eval_loss': 0.4198872447013855, 'eval_bleu': 10.3941, 'eval_gen_len': 5.7542, 'eval_runtime': 7.3109, 'eval_samples_per_second': 40.624, 'eval_steps_per_second': 2.599, 'epoch': 159.0}


 32%|███▏      | 24160/75500 [1:59:18<11:10:33,  1.28it/s]   

{'loss': 0.0897, 'learning_rate': 0.0020032421300327204, 'epoch': 160.0}


                                                          
 32%|███▏      | 24160/75500 [1:59:39<11:10:33,  1.28it/s]

{'eval_loss': 0.4351554811000824, 'eval_bleu': 13.0572, 'eval_gen_len': 5.4781, 'eval_runtime': 20.6155, 'eval_samples_per_second': 14.407, 'eval_steps_per_second': 0.922, 'epoch': 160.0}


 32%|███▏      | 24311/75500 [2:01:28<12:49:38,  1.11it/s] 

{'loss': 0.0922, 'learning_rate': 0.0019973510446404884, 'epoch': 161.0}


                                                          
 32%|███▏      | 24311/75500 [2:01:50<12:49:38,  1.11it/s]

{'eval_loss': 0.4109604060649872, 'eval_bleu': 9.1496, 'eval_gen_len': 5.367, 'eval_runtime': 21.676, 'eval_samples_per_second': 13.702, 'eval_steps_per_second': 0.877, 'epoch': 161.0}


 32%|███▏      | 24462/75500 [2:03:46<10:59:18,  1.29it/s] 

{'loss': 0.0919, 'learning_rate': 0.001991459959248256, 'epoch': 162.0}


                                                          
 32%|███▏      | 24462/75500 [2:04:06<10:59:18,  1.29it/s]

{'eval_loss': 0.43145936727523804, 'eval_bleu': 10.4236, 'eval_gen_len': 5.7744, 'eval_runtime': 20.7669, 'eval_samples_per_second': 14.302, 'eval_steps_per_second': 0.915, 'epoch': 162.0}


 33%|███▎      | 24613/75500 [2:05:54<8:26:52,  1.67it/s]  

{'loss': 0.0955, 'learning_rate': 0.001985568873856024, 'epoch': 163.0}


                                                         
 33%|███▎      | 24613/75500 [2:06:13<8:26:52,  1.67it/s]

{'eval_loss': 0.42856529355049133, 'eval_bleu': 9.2081, 'eval_gen_len': 4.8754, 'eval_runtime': 18.909, 'eval_samples_per_second': 15.707, 'eval_steps_per_second': 1.005, 'epoch': 163.0}


 33%|███▎      | 24764/75500 [2:07:51<8:47:33,  1.60it/s] 

{'loss': 0.0951, 'learning_rate': 0.0019797558160848812, 'epoch': 164.0}


                                                         
 33%|███▎      | 24764/75500 [2:08:10<8:47:33,  1.60it/s]

{'eval_loss': 0.42701783776283264, 'eval_bleu': 12.5318, 'eval_gen_len': 5.7643, 'eval_runtime': 19.0904, 'eval_samples_per_second': 15.558, 'eval_steps_per_second': 0.995, 'epoch': 164.0}


 33%|███▎      | 24915/75500 [2:09:16<1:57:38,  7.17it/s] 

{'loss': 0.0919, 'learning_rate': 0.0019738647306926493, 'epoch': 165.0}


                                                         
 33%|███▎      | 24915/75500 [2:09:23<1:57:38,  7.17it/s]

{'eval_loss': 0.42339250445365906, 'eval_bleu': 9.8771, 'eval_gen_len': 5.2458, 'eval_runtime': 7.2289, 'eval_samples_per_second': 41.085, 'eval_steps_per_second': 2.628, 'epoch': 165.0}


 33%|███▎      | 25066/75500 [2:09:53<2:24:50,  5.80it/s] 

{'loss': 0.0915, 'learning_rate': 0.0019679736453004173, 'epoch': 166.0}


                                                         
 33%|███▎      | 25066/75500 [2:09:59<2:24:50,  5.80it/s]

{'eval_loss': 0.41619715094566345, 'eval_bleu': 14.4122, 'eval_gen_len': 5.5758, 'eval_runtime': 6.3338, 'eval_samples_per_second': 46.891, 'eval_steps_per_second': 3.0, 'epoch': 166.0}


 33%|███▎      | 25217/75500 [2:10:24<2:05:14,  6.69it/s] 

{'loss': 0.0898, 'learning_rate': 0.0019620825599081853, 'epoch': 167.0}


                                                         
 33%|███▎      | 25217/75500 [2:10:30<2:05:14,  6.69it/s]

{'eval_loss': 0.42627230286598206, 'eval_bleu': 10.7645, 'eval_gen_len': 5.8552, 'eval_runtime': 6.4364, 'eval_samples_per_second': 46.143, 'eval_steps_per_second': 2.952, 'epoch': 167.0}


 34%|███▎      | 25368/75500 [2:10:55<2:18:45,  6.02it/s] 

{'loss': 0.09, 'learning_rate': 0.001956191474515953, 'epoch': 168.0}


                                                         
 34%|███▎      | 25368/75500 [2:11:01<2:18:45,  6.02it/s]

{'eval_loss': 0.4373660087585449, 'eval_bleu': 12.2352, 'eval_gen_len': 5.3872, 'eval_runtime': 6.3606, 'eval_samples_per_second': 46.694, 'eval_steps_per_second': 2.987, 'epoch': 168.0}


 34%|███▍      | 25519/75500 [2:11:26<2:12:37,  6.28it/s] 

{'loss': 0.0932, 'learning_rate': 0.001950300389123721, 'epoch': 169.0}


                                                         
 34%|███▍      | 25519/75500 [2:11:33<2:12:37,  6.28it/s]

{'eval_loss': 0.4335831105709076, 'eval_bleu': 10.6118, 'eval_gen_len': 4.6027, 'eval_runtime': 6.6282, 'eval_samples_per_second': 44.808, 'eval_steps_per_second': 2.867, 'epoch': 169.0}


 34%|███▍      | 25670/75500 [2:11:57<1:59:52,  6.93it/s] 

{'loss': 0.0919, 'learning_rate': 0.0019444093037314888, 'epoch': 170.0}


                                                         
 34%|███▍      | 25670/75500 [2:12:03<1:59:52,  6.93it/s]

{'eval_loss': 0.4330669343471527, 'eval_bleu': 8.4161, 'eval_gen_len': 4.9899, 'eval_runtime': 6.4135, 'eval_samples_per_second': 46.308, 'eval_steps_per_second': 2.962, 'epoch': 170.0}


 34%|███▍      | 25821/75500 [2:12:29<2:02:35,  6.75it/s] 

{'loss': 0.0932, 'learning_rate': 0.0019385182183392568, 'epoch': 171.0}


                                                         
 34%|███▍      | 25821/75500 [2:12:36<2:02:35,  6.75it/s]

{'eval_loss': 0.4333926737308502, 'eval_bleu': 11.6074, 'eval_gen_len': 5.2155, 'eval_runtime': 6.8412, 'eval_samples_per_second': 43.414, 'eval_steps_per_second': 2.777, 'epoch': 171.0}


 34%|███▍      | 25972/75500 [2:13:00<2:00:56,  6.83it/s] 

{'loss': 0.0891, 'learning_rate': 0.0019326271329470246, 'epoch': 172.0}


                                                         
 34%|███▍      | 25972/75500 [2:13:06<2:00:56,  6.83it/s]

{'eval_loss': 0.4236242175102234, 'eval_bleu': 12.1504, 'eval_gen_len': 5.165, 'eval_runtime': 6.3913, 'eval_samples_per_second': 46.469, 'eval_steps_per_second': 2.973, 'epoch': 172.0}


 35%|███▍      | 26123/75500 [2:13:30<2:32:20,  5.40it/s] 

{'loss': 0.0864, 'learning_rate': 0.0019267360475547926, 'epoch': 173.0}


                                                         
 35%|███▍      | 26123/75500 [2:13:37<2:32:20,  5.40it/s]

{'eval_loss': 0.4238069951534271, 'eval_bleu': 12.2063, 'eval_gen_len': 4.7576, 'eval_runtime': 6.4915, 'eval_samples_per_second': 45.752, 'eval_steps_per_second': 2.927, 'epoch': 173.0}


 35%|███▍      | 26274/75500 [2:14:00<2:08:41,  6.38it/s] 

{'loss': 0.0876, 'learning_rate': 0.0019208449621625605, 'epoch': 174.0}


                                                         
 35%|███▍      | 26274/75500 [2:14:06<2:08:41,  6.38it/s]

{'eval_loss': 0.4426722228527069, 'eval_bleu': 8.3938, 'eval_gen_len': 5.4209, 'eval_runtime': 6.4572, 'eval_samples_per_second': 45.996, 'eval_steps_per_second': 2.942, 'epoch': 174.0}


 35%|███▌      | 26425/75500 [2:14:30<2:17:46,  5.94it/s] 

{'loss': 0.0932, 'learning_rate': 0.0019149538767703285, 'epoch': 175.0}


                                                         
 35%|███▌      | 26425/75500 [2:14:37<2:17:46,  5.94it/s]

{'eval_loss': 0.43071699142456055, 'eval_bleu': 11.0909, 'eval_gen_len': 5.33, 'eval_runtime': 6.6677, 'eval_samples_per_second': 44.543, 'eval_steps_per_second': 2.85, 'epoch': 175.0}


 35%|███▌      | 26576/75500 [2:15:01<1:54:58,  7.09it/s] 

{'loss': 0.0885, 'learning_rate': 0.0019090627913780963, 'epoch': 176.0}


                                                         
 35%|███▌      | 26576/75500 [2:15:08<1:54:58,  7.09it/s]

{'eval_loss': 0.42993518710136414, 'eval_bleu': 12.8765, 'eval_gen_len': 5.5623, 'eval_runtime': 6.5873, 'eval_samples_per_second': 45.087, 'eval_steps_per_second': 2.884, 'epoch': 176.0}


 35%|███▌      | 26727/75500 [2:15:37<2:40:05,  5.08it/s] 

{'loss': 0.0858, 'learning_rate': 0.0019031717059858643, 'epoch': 177.0}


                                                         
 35%|███▌      | 26727/75500 [2:15:44<2:40:05,  5.08it/s]

{'eval_loss': 0.42676547169685364, 'eval_bleu': 12.5126, 'eval_gen_len': 4.8047, 'eval_runtime': 7.9384, 'eval_samples_per_second': 37.413, 'eval_steps_per_second': 2.393, 'epoch': 177.0}


 36%|███▌      | 26878/75500 [2:16:08<2:04:25,  6.51it/s] 

{'loss': 0.084, 'learning_rate': 0.0018972806205936321, 'epoch': 178.0}


                                                         
 36%|███▌      | 26878/75500 [2:16:15<2:04:25,  6.51it/s]

{'eval_loss': 0.42208772897720337, 'eval_bleu': 11.0323, 'eval_gen_len': 5.4916, 'eval_runtime': 6.4371, 'eval_samples_per_second': 46.139, 'eval_steps_per_second': 2.952, 'epoch': 178.0}


 36%|███▌      | 27029/75500 [2:16:38<2:19:20,  5.80it/s] 

{'loss': 0.0854, 'learning_rate': 0.0018913895352014002, 'epoch': 179.0}


                                                         
 36%|███▌      | 27029/75500 [2:16:45<2:19:20,  5.80it/s]

{'eval_loss': 0.43477684259414673, 'eval_bleu': 13.4276, 'eval_gen_len': 4.8249, 'eval_runtime': 6.524, 'eval_samples_per_second': 45.524, 'eval_steps_per_second': 2.912, 'epoch': 179.0}


 36%|███▌      | 27180/75500 [2:17:09<1:59:31,  6.74it/s] 

{'loss': 0.0887, 'learning_rate': 0.001885498449809168, 'epoch': 180.0}


                                                         
 36%|███▌      | 27180/75500 [2:17:16<1:59:31,  6.74it/s]

{'eval_loss': 0.43661078810691833, 'eval_bleu': 11.5958, 'eval_gen_len': 4.8485, 'eval_runtime': 6.5283, 'eval_samples_per_second': 45.494, 'eval_steps_per_second': 2.91, 'epoch': 180.0}


 36%|███▌      | 27331/75500 [2:17:39<1:58:13,  6.79it/s] 

{'loss': 0.0903, 'learning_rate': 0.0018796073644169358, 'epoch': 181.0}


                                                         
 36%|███▌      | 27331/75500 [2:17:47<1:58:13,  6.79it/s]

{'eval_loss': 0.4286207854747772, 'eval_bleu': 10.7152, 'eval_gen_len': 5.1852, 'eval_runtime': 7.6666, 'eval_samples_per_second': 38.739, 'eval_steps_per_second': 2.478, 'epoch': 181.0}


 36%|███▋      | 27482/75500 [2:18:16<2:34:45,  5.17it/s] 

{'loss': 0.0863, 'learning_rate': 0.0018737162790247038, 'epoch': 182.0}


                                                         
 36%|███▋      | 27482/75500 [2:18:24<2:34:45,  5.17it/s]

{'eval_loss': 0.4278983771800995, 'eval_bleu': 9.1781, 'eval_gen_len': 5.1178, 'eval_runtime': 7.8652, 'eval_samples_per_second': 37.761, 'eval_steps_per_second': 2.416, 'epoch': 182.0}


 37%|███▋      | 27633/75500 [2:18:54<2:21:51,  5.62it/s] 

{'loss': 0.0861, 'learning_rate': 0.0018678251936324716, 'epoch': 183.0}


                                                         
 37%|███▋      | 27633/75500 [2:19:02<2:21:51,  5.62it/s]

{'eval_loss': 0.4262077510356903, 'eval_bleu': 10.6112, 'eval_gen_len': 5.1818, 'eval_runtime': 7.9188, 'eval_samples_per_second': 37.506, 'eval_steps_per_second': 2.399, 'epoch': 183.0}


 37%|███▋      | 27784/75500 [2:19:32<3:04:15,  4.32it/s] 

{'loss': 0.0867, 'learning_rate': 0.0018619341082402397, 'epoch': 184.0}


                                                         
 37%|███▋      | 27784/75500 [2:19:40<3:04:15,  4.32it/s]

{'eval_loss': 0.43160825967788696, 'eval_bleu': 10.566, 'eval_gen_len': 5.3468, 'eval_runtime': 8.1967, 'eval_samples_per_second': 36.234, 'eval_steps_per_second': 2.318, 'epoch': 184.0}


 37%|███▋      | 27935/75500 [2:20:04<2:04:51,  6.35it/s] 

{'loss': 0.0875, 'learning_rate': 0.0018560430228480075, 'epoch': 185.0}


                                                         
 37%|███▋      | 27935/75500 [2:20:11<2:04:51,  6.35it/s]

{'eval_loss': 0.45109325647354126, 'eval_bleu': 11.77, 'eval_gen_len': 5.0707, 'eval_runtime': 6.3326, 'eval_samples_per_second': 46.9, 'eval_steps_per_second': 3.0, 'epoch': 185.0}


 37%|███▋      | 28086/75500 [2:20:35<2:01:58,  6.48it/s] 

{'loss': 0.0915, 'learning_rate': 0.0018501519374557755, 'epoch': 186.0}


                                                         
 37%|███▋      | 28086/75500 [2:20:41<2:01:58,  6.48it/s]

{'eval_loss': 0.4399144947528839, 'eval_bleu': 11.4884, 'eval_gen_len': 5.037, 'eval_runtime': 6.5143, 'eval_samples_per_second': 45.592, 'eval_steps_per_second': 2.917, 'epoch': 186.0}


 37%|███▋      | 28237/75500 [2:21:05<1:51:16,  7.08it/s] 

{'loss': 0.0834, 'learning_rate': 0.0018442608520635433, 'epoch': 187.0}


                                                         
 37%|███▋      | 28237/75500 [2:21:13<1:51:16,  7.08it/s]

{'eval_loss': 0.4251938462257385, 'eval_bleu': 9.9173, 'eval_gen_len': 5.1145, 'eval_runtime': 7.6892, 'eval_samples_per_second': 38.625, 'eval_steps_per_second': 2.471, 'epoch': 187.0}


 38%|███▊      | 28388/75500 [2:21:36<2:04:31,  6.31it/s] 

{'loss': 0.0806, 'learning_rate': 0.0018383697666713114, 'epoch': 188.0}


                                                         
 38%|███▊      | 28388/75500 [2:21:43<2:04:31,  6.31it/s]

{'eval_loss': 0.42492911219596863, 'eval_bleu': 11.2341, 'eval_gen_len': 5.2121, 'eval_runtime': 6.5784, 'eval_samples_per_second': 45.147, 'eval_steps_per_second': 2.888, 'epoch': 188.0}


 38%|███▊      | 28539/75500 [2:22:09<1:56:11,  6.74it/s] 

{'loss': 0.0854, 'learning_rate': 0.0018324786812790792, 'epoch': 189.0}


                                                         
 38%|███▊      | 28539/75500 [2:22:16<1:56:11,  6.74it/s]

{'eval_loss': 0.4266643226146698, 'eval_bleu': 10.5939, 'eval_gen_len': 4.9731, 'eval_runtime': 6.4459, 'eval_samples_per_second': 46.076, 'eval_steps_per_second': 2.948, 'epoch': 189.0}


 38%|███▊      | 28690/75500 [2:22:40<1:55:11,  6.77it/s] 

{'loss': 0.0826, 'learning_rate': 0.0018265875958868472, 'epoch': 190.0}


                                                         
 38%|███▊      | 28690/75500 [2:22:46<1:55:11,  6.77it/s]

{'eval_loss': 0.43619412183761597, 'eval_bleu': 12.3909, 'eval_gen_len': 4.9596, 'eval_runtime': 6.496, 'eval_samples_per_second': 45.72, 'eval_steps_per_second': 2.925, 'epoch': 190.0}


 38%|███▊      | 28841/75500 [2:23:10<2:04:45,  6.23it/s] 

{'loss': 0.0838, 'learning_rate': 0.001820696510494615, 'epoch': 191.0}


                                                         
 38%|███▊      | 28841/75500 [2:23:17<2:04:45,  6.23it/s]

{'eval_loss': 0.43915122747421265, 'eval_bleu': 10.7634, 'eval_gen_len': 4.9091, 'eval_runtime': 6.5865, 'eval_samples_per_second': 45.092, 'eval_steps_per_second': 2.885, 'epoch': 191.0}


 38%|███▊      | 28992/75500 [2:23:41<1:57:31,  6.60it/s] 

{'loss': 0.0866, 'learning_rate': 0.0018148054251023828, 'epoch': 192.0}


                                                         
 38%|███▊      | 28992/75500 [2:23:47<1:57:31,  6.60it/s]

{'eval_loss': 0.428827702999115, 'eval_bleu': 9.2427, 'eval_gen_len': 5.6128, 'eval_runtime': 6.4366, 'eval_samples_per_second': 46.143, 'eval_steps_per_second': 2.952, 'epoch': 192.0}


 39%|███▊      | 29143/75500 [2:24:10<1:48:07,  7.15it/s] 

{'loss': 0.0864, 'learning_rate': 0.0018089143397101509, 'epoch': 193.0}


                                                         
 39%|███▊      | 29143/75500 [2:24:17<1:48:07,  7.15it/s]

{'eval_loss': 0.4413028955459595, 'eval_bleu': 11.5168, 'eval_gen_len': 5.1717, 'eval_runtime': 6.5491, 'eval_samples_per_second': 45.35, 'eval_steps_per_second': 2.901, 'epoch': 193.0}


 39%|███▉      | 29294/75500 [2:24:40<1:49:44,  7.02it/s] 

{'loss': 0.0879, 'learning_rate': 0.0018030232543179187, 'epoch': 194.0}


                                                         
 39%|███▉      | 29294/75500 [2:24:47<1:49:44,  7.02it/s]

{'eval_loss': 0.4428988993167877, 'eval_bleu': 9.024, 'eval_gen_len': 5.1448, 'eval_runtime': 6.5334, 'eval_samples_per_second': 45.458, 'eval_steps_per_second': 2.908, 'epoch': 194.0}


 39%|███▉      | 29445/75500 [2:25:11<1:58:27,  6.48it/s] 

{'loss': 0.0838, 'learning_rate': 0.0017971321689256867, 'epoch': 195.0}


                                                         
 39%|███▉      | 29445/75500 [2:25:17<1:58:27,  6.48it/s]

{'eval_loss': 0.4413330852985382, 'eval_bleu': 11.7966, 'eval_gen_len': 5.5993, 'eval_runtime': 6.6246, 'eval_samples_per_second': 44.833, 'eval_steps_per_second': 2.868, 'epoch': 195.0}


 39%|███▉      | 29596/75500 [2:25:41<2:02:22,  6.25it/s] 

{'loss': 0.081, 'learning_rate': 0.0017912410835334545, 'epoch': 196.0}


                                                         
 39%|███▉      | 29596/75500 [2:25:48<2:02:22,  6.25it/s]

{'eval_loss': 0.4338628053665161, 'eval_bleu': 11.3732, 'eval_gen_len': 5.5657, 'eval_runtime': 6.7267, 'eval_samples_per_second': 44.152, 'eval_steps_per_second': 2.825, 'epoch': 196.0}


 39%|███▉      | 29747/75500 [2:26:11<2:13:00,  5.73it/s] 

{'loss': 0.0796, 'learning_rate': 0.0017853499981412225, 'epoch': 197.0}


                                                         
 39%|███▉      | 29747/75500 [2:26:18<2:13:00,  5.73it/s]

{'eval_loss': 0.44332459568977356, 'eval_bleu': 13.6046, 'eval_gen_len': 4.8215, 'eval_runtime': 6.4826, 'eval_samples_per_second': 45.815, 'eval_steps_per_second': 2.931, 'epoch': 197.0}


 40%|███▉      | 29898/75500 [2:26:41<1:48:08,  7.03it/s] 

{'loss': 0.0805, 'learning_rate': 0.0017794589127489904, 'epoch': 198.0}


                                                         
 40%|███▉      | 29898/75500 [2:26:48<1:48:08,  7.03it/s]

{'eval_loss': 0.43173012137413025, 'eval_bleu': 9.8112, 'eval_gen_len': 5.4444, 'eval_runtime': 6.6482, 'eval_samples_per_second': 44.674, 'eval_steps_per_second': 2.858, 'epoch': 198.0}


 40%|███▉      | 30049/75500 [2:27:11<1:57:20,  6.46it/s] 

{'loss': 0.0827, 'learning_rate': 0.0017735678273567584, 'epoch': 199.0}


                                                         
 40%|███▉      | 30049/75500 [2:27:18<1:57:20,  6.46it/s]

{'eval_loss': 0.4536064863204956, 'eval_bleu': 9.3688, 'eval_gen_len': 4.9394, 'eval_runtime': 6.5098, 'eval_samples_per_second': 45.624, 'eval_steps_per_second': 2.919, 'epoch': 199.0}


 40%|████      | 30200/75500 [2:27:45<1:42:40,  7.35it/s] 

{'loss': 0.0867, 'learning_rate': 0.0017676767419645262, 'epoch': 200.0}


                                                         
 40%|████      | 30200/75500 [2:27:52<1:42:40,  7.35it/s]

{'eval_loss': 0.44534698128700256, 'eval_bleu': 11.5628, 'eval_gen_len': 5.0337, 'eval_runtime': 7.3239, 'eval_samples_per_second': 40.552, 'eval_steps_per_second': 2.594, 'epoch': 200.0}


 40%|████      | 30351/75500 [2:28:16<1:47:13,  7.02it/s] 

{'loss': 0.0857, 'learning_rate': 0.0017617856565722942, 'epoch': 201.0}


                                                         
 40%|████      | 30351/75500 [2:28:23<1:47:13,  7.02it/s]

{'eval_loss': 0.4426601827144623, 'eval_bleu': 8.1409, 'eval_gen_len': 5.5421, 'eval_runtime': 6.8619, 'eval_samples_per_second': 43.282, 'eval_steps_per_second': 2.769, 'epoch': 201.0}


 40%|████      | 30502/75500 [2:28:47<2:10:56,  5.73it/s] 

{'loss': 0.0808, 'learning_rate': 0.001755894571180062, 'epoch': 202.0}


                                                         
 40%|████      | 30502/75500 [2:28:54<2:10:56,  5.73it/s]

{'eval_loss': 0.45541083812713623, 'eval_bleu': 11.7826, 'eval_gen_len': 4.5253, 'eval_runtime': 6.3663, 'eval_samples_per_second': 46.652, 'eval_steps_per_second': 2.984, 'epoch': 202.0}


 41%|████      | 30653/75500 [2:29:17<1:59:57,  6.23it/s] 

{'loss': 0.0832, 'learning_rate': 0.0017500034857878299, 'epoch': 203.0}


                                                         
 41%|████      | 30653/75500 [2:29:24<1:59:57,  6.23it/s]

{'eval_loss': 0.45118194818496704, 'eval_bleu': 8.6007, 'eval_gen_len': 4.5253, 'eval_runtime': 6.3515, 'eval_samples_per_second': 46.761, 'eval_steps_per_second': 2.991, 'epoch': 203.0}


 41%|████      | 30804/75500 [2:29:48<1:45:30,  7.06it/s] 

{'loss': 0.0799, 'learning_rate': 0.0017441124003955979, 'epoch': 204.0}


                                                         
 41%|████      | 30804/75500 [2:29:54<1:45:30,  7.06it/s]

{'eval_loss': 0.43013566732406616, 'eval_bleu': 12.0015, 'eval_gen_len': 4.8552, 'eval_runtime': 6.4612, 'eval_samples_per_second': 45.966, 'eval_steps_per_second': 2.941, 'epoch': 204.0}


 41%|████      | 30955/75500 [2:30:19<2:02:10,  6.08it/s] 

{'loss': 0.0768, 'learning_rate': 0.0017382213150033657, 'epoch': 205.0}


                                                         
 41%|████      | 30955/75500 [2:30:25<2:02:10,  6.08it/s]

{'eval_loss': 0.4388542175292969, 'eval_bleu': 12.998, 'eval_gen_len': 5.2559, 'eval_runtime': 6.5506, 'eval_samples_per_second': 45.339, 'eval_steps_per_second': 2.9, 'epoch': 205.0}


 41%|████      | 31106/75500 [2:30:49<1:51:42,  6.62it/s] 

{'loss': 0.0782, 'learning_rate': 0.0017323692434216783, 'epoch': 206.0}


                                                         
 41%|████      | 31106/75500 [2:30:55<1:51:42,  6.62it/s]

{'eval_loss': 0.4396767020225525, 'eval_bleu': 10.7475, 'eval_gen_len': 5.367, 'eval_runtime': 6.3426, 'eval_samples_per_second': 46.826, 'eval_steps_per_second': 2.996, 'epoch': 206.0}


 41%|████▏     | 31257/75500 [2:31:20<1:41:03,  7.30it/s] 

{'loss': 0.0824, 'learning_rate': 0.0017264781580294463, 'epoch': 207.0}


                                                         
 41%|████▏     | 31257/75500 [2:31:27<1:41:03,  7.30it/s]

{'eval_loss': 0.43621861934661865, 'eval_bleu': 13.733, 'eval_gen_len': 5.5084, 'eval_runtime': 6.4781, 'eval_samples_per_second': 45.847, 'eval_steps_per_second': 2.933, 'epoch': 207.0}


 42%|████▏     | 31408/75500 [2:31:50<1:59:15,  6.16it/s] 

{'loss': 0.0815, 'learning_rate': 0.0017205870726372142, 'epoch': 208.0}


                                                         
 42%|████▏     | 31408/75500 [2:31:57<1:59:15,  6.16it/s]

{'eval_loss': 0.4437578618526459, 'eval_bleu': 11.3145, 'eval_gen_len': 5.8215, 'eval_runtime': 6.3468, 'eval_samples_per_second': 46.795, 'eval_steps_per_second': 2.994, 'epoch': 208.0}


 42%|████▏     | 31559/75500 [2:32:21<1:58:18,  6.19it/s] 

{'loss': 0.082, 'learning_rate': 0.0017146959872449822, 'epoch': 209.0}


                                                         
 42%|████▏     | 31559/75500 [2:32:27<1:58:18,  6.19it/s]

{'eval_loss': 0.4456592798233032, 'eval_bleu': 13.8259, 'eval_gen_len': 5.0168, 'eval_runtime': 6.5162, 'eval_samples_per_second': 45.579, 'eval_steps_per_second': 2.916, 'epoch': 209.0}


 42%|████▏     | 31710/75500 [2:32:51<1:50:52,  6.58it/s] 

{'loss': 0.0838, 'learning_rate': 0.00170880490185275, 'epoch': 210.0}


                                                         
 42%|████▏     | 31710/75500 [2:32:57<1:50:52,  6.58it/s]

{'eval_loss': 0.4529172480106354, 'eval_bleu': 11.1206, 'eval_gen_len': 5.4512, 'eval_runtime': 6.4118, 'eval_samples_per_second': 46.321, 'eval_steps_per_second': 2.963, 'epoch': 210.0}


 42%|████▏     | 31861/75500 [2:33:24<1:53:44,  6.39it/s] 

{'loss': 0.0809, 'learning_rate': 0.001702913816460518, 'epoch': 211.0}


                                                         
 42%|████▏     | 31861/75500 [2:33:31<1:53:44,  6.39it/s]

{'eval_loss': 0.45243749022483826, 'eval_bleu': 13.1566, 'eval_gen_len': 5.202, 'eval_runtime': 6.6413, 'eval_samples_per_second': 44.72, 'eval_steps_per_second': 2.861, 'epoch': 211.0}


 42%|████▏     | 32012/75500 [2:33:55<1:45:31,  6.87it/s] 

{'loss': 0.0757, 'learning_rate': 0.0016970227310682858, 'epoch': 212.0}


                                                         
 42%|████▏     | 32012/75500 [2:34:01<1:45:31,  6.87it/s]

{'eval_loss': 0.4414975047111511, 'eval_bleu': 10.8034, 'eval_gen_len': 5.3872, 'eval_runtime': 6.4713, 'eval_samples_per_second': 45.895, 'eval_steps_per_second': 2.936, 'epoch': 212.0}


 43%|████▎     | 32163/75500 [2:34:24<1:52:15,  6.43it/s] 

{'loss': 0.0767, 'learning_rate': 0.0016911316456760537, 'epoch': 213.0}


                                                         
 43%|████▎     | 32163/75500 [2:34:31<1:52:15,  6.43it/s]

{'eval_loss': 0.44392046332359314, 'eval_bleu': 9.0381, 'eval_gen_len': 5.8081, 'eval_runtime': 6.7642, 'eval_samples_per_second': 43.907, 'eval_steps_per_second': 2.809, 'epoch': 213.0}


 43%|████▎     | 32314/75500 [2:34:55<1:34:35,  7.61it/s] 

{'loss': 0.077, 'learning_rate': 0.0016852795740943663, 'epoch': 214.0}


                                                         
 43%|████▎     | 32314/75500 [2:35:02<1:34:35,  7.61it/s]

{'eval_loss': 0.433945894241333, 'eval_bleu': 12.8697, 'eval_gen_len': 5.5758, 'eval_runtime': 6.5316, 'eval_samples_per_second': 45.471, 'eval_steps_per_second': 2.909, 'epoch': 214.0}


 43%|████▎     | 32465/75500 [2:35:24<1:41:35,  7.06it/s] 

{'loss': 0.0753, 'learning_rate': 0.0016793884887021343, 'epoch': 215.0}


                                                         
 43%|████▎     | 32465/75500 [2:35:30<1:41:35,  7.06it/s]

{'eval_loss': 0.4470890462398529, 'eval_bleu': 8.949, 'eval_gen_len': 5.532, 'eval_runtime': 6.2668, 'eval_samples_per_second': 47.392, 'eval_steps_per_second': 3.032, 'epoch': 215.0}


 43%|████▎     | 32616/75500 [2:35:52<1:33:30,  7.64it/s] 

{'loss': 0.0774, 'learning_rate': 0.001673497403309902, 'epoch': 216.0}


                                                         
 43%|████▎     | 32616/75500 [2:35:59<1:33:30,  7.64it/s]

{'eval_loss': 0.43353620171546936, 'eval_bleu': 9.9258, 'eval_gen_len': 5.6229, 'eval_runtime': 6.3423, 'eval_samples_per_second': 46.829, 'eval_steps_per_second': 2.996, 'epoch': 216.0}


 43%|████▎     | 32767/75500 [2:36:21<1:35:15,  7.48it/s] 

{'loss': 0.0805, 'learning_rate': 0.0016676063179176701, 'epoch': 217.0}


                                                         
 43%|████▎     | 32767/75500 [2:36:27<1:35:15,  7.48it/s]

{'eval_loss': 0.43214964866638184, 'eval_bleu': 11.3892, 'eval_gen_len': 5.3737, 'eval_runtime': 6.2445, 'eval_samples_per_second': 47.562, 'eval_steps_per_second': 3.043, 'epoch': 217.0}


 44%|████▎     | 32918/75500 [2:36:49<1:34:03,  7.55it/s] 

{'loss': 0.081, 'learning_rate': 0.001661715232525438, 'epoch': 218.0}


                                                         
 44%|████▎     | 32918/75500 [2:36:56<1:34:03,  7.55it/s]

{'eval_loss': 0.44193604588508606, 'eval_bleu': 11.2989, 'eval_gen_len': 5.2862, 'eval_runtime': 6.2113, 'eval_samples_per_second': 47.816, 'eval_steps_per_second': 3.059, 'epoch': 218.0}


 44%|████▍     | 33069/75500 [2:37:17<1:38:35,  7.17it/s] 

{'loss': 0.0804, 'learning_rate': 0.001655824147133206, 'epoch': 219.0}


                                                         
 44%|████▍     | 33069/75500 [2:37:24<1:38:35,  7.17it/s]

{'eval_loss': 0.44773992896080017, 'eval_bleu': 12.4476, 'eval_gen_len': 5.0539, 'eval_runtime': 6.3182, 'eval_samples_per_second': 47.007, 'eval_steps_per_second': 3.007, 'epoch': 219.0}


 44%|████▍     | 33220/75500 [2:37:46<1:36:15,  7.32it/s] 

{'loss': 0.0813, 'learning_rate': 0.0016499720755515186, 'epoch': 220.0}


                                                         
 44%|████▍     | 33220/75500 [2:37:52<1:36:15,  7.32it/s]

{'eval_loss': 0.4360567331314087, 'eval_bleu': 11.8109, 'eval_gen_len': 4.771, 'eval_runtime': 6.6193, 'eval_samples_per_second': 44.869, 'eval_steps_per_second': 2.87, 'epoch': 220.0}


 44%|████▍     | 33371/75500 [2:38:14<1:36:45,  7.26it/s] 

{'loss': 0.0761, 'learning_rate': 0.0016440809901592864, 'epoch': 221.0}


                                                         
 44%|████▍     | 33371/75500 [2:38:21<1:36:45,  7.26it/s]

{'eval_loss': 0.4346165955066681, 'eval_bleu': 8.5836, 'eval_gen_len': 5.6162, 'eval_runtime': 6.3586, 'eval_samples_per_second': 46.708, 'eval_steps_per_second': 2.988, 'epoch': 221.0}


 44%|████▍     | 33522/75500 [2:38:49<1:41:25,  6.90it/s] 

{'loss': 0.0728, 'learning_rate': 0.0016381899047670544, 'epoch': 222.0}


                                                         
 44%|████▍     | 33522/75500 [2:38:55<1:41:25,  6.90it/s]

{'eval_loss': 0.44890958070755005, 'eval_bleu': 13.4561, 'eval_gen_len': 5.1448, 'eval_runtime': 6.4629, 'eval_samples_per_second': 45.954, 'eval_steps_per_second': 2.94, 'epoch': 222.0}


 45%|████▍     | 33673/75500 [2:39:17<1:34:23,  7.39it/s] 

{'loss': 0.0755, 'learning_rate': 0.0016322988193748222, 'epoch': 223.0}


                                                         
 45%|████▍     | 33673/75500 [2:39:23<1:34:23,  7.39it/s]

{'eval_loss': 0.44327157735824585, 'eval_bleu': 10.9933, 'eval_gen_len': 5.0707, 'eval_runtime': 6.3448, 'eval_samples_per_second': 46.81, 'eval_steps_per_second': 2.995, 'epoch': 223.0}


 45%|████▍     | 33824/75500 [2:39:46<1:37:00,  7.16it/s] 

{'loss': 0.0729, 'learning_rate': 0.00162640773398259, 'epoch': 224.0}


                                                         
 45%|████▍     | 33824/75500 [2:39:53<1:37:00,  7.16it/s]

{'eval_loss': 0.44168466329574585, 'eval_bleu': 11.7217, 'eval_gen_len': 5.2626, 'eval_runtime': 6.2423, 'eval_samples_per_second': 47.579, 'eval_steps_per_second': 3.044, 'epoch': 224.0}


 45%|████▌     | 33975/75500 [2:40:15<1:33:19,  7.42it/s] 

{'loss': 0.0748, 'learning_rate': 0.001620516648590358, 'epoch': 225.0}


                                                         
 45%|████▌     | 33975/75500 [2:40:21<1:33:19,  7.42it/s]

{'eval_loss': 0.4394729435443878, 'eval_bleu': 11.2313, 'eval_gen_len': 5.0337, 'eval_runtime': 6.2427, 'eval_samples_per_second': 47.576, 'eval_steps_per_second': 3.044, 'epoch': 225.0}


 45%|████▌     | 34126/75500 [2:40:44<1:46:44,  6.46it/s] 

{'loss': 0.0782, 'learning_rate': 0.001614625563198126, 'epoch': 226.0}


                                                         
 45%|████▌     | 34126/75500 [2:40:50<1:46:44,  6.46it/s]

{'eval_loss': 0.4318881034851074, 'eval_bleu': 10.8658, 'eval_gen_len': 5.1077, 'eval_runtime': 6.3251, 'eval_samples_per_second': 46.955, 'eval_steps_per_second': 3.004, 'epoch': 226.0}


 45%|████▌     | 34277/75500 [2:41:13<2:02:00,  5.63it/s] 

{'loss': 0.0791, 'learning_rate': 0.001608734477805894, 'epoch': 227.0}


                                                         
 45%|████▌     | 34277/75500 [2:41:20<2:02:00,  5.63it/s]

{'eval_loss': 0.42933517694473267, 'eval_bleu': 12.3484, 'eval_gen_len': 5.7071, 'eval_runtime': 6.4967, 'eval_samples_per_second': 45.715, 'eval_steps_per_second': 2.925, 'epoch': 227.0}


 46%|████▌     | 34428/75500 [2:41:41<1:32:50,  7.37it/s] 

{'loss': 0.0762, 'learning_rate': 0.0016028433924136617, 'epoch': 228.0}


                                                         
 46%|████▌     | 34428/75500 [2:41:48<1:32:50,  7.37it/s]

{'eval_loss': 0.43813085556030273, 'eval_bleu': 12.5171, 'eval_gen_len': 5.3401, 'eval_runtime': 6.27, 'eval_samples_per_second': 47.368, 'eval_steps_per_second': 3.03, 'epoch': 228.0}


 46%|████▌     | 34579/75500 [2:42:10<1:39:11,  6.88it/s] 

{'loss': 0.0744, 'learning_rate': 0.0015969523070214298, 'epoch': 229.0}


                                                         
 46%|████▌     | 34579/75500 [2:42:17<1:39:11,  6.88it/s]

{'eval_loss': 0.4494926929473877, 'eval_bleu': 10.2279, 'eval_gen_len': 5.0236, 'eval_runtime': 6.5448, 'eval_samples_per_second': 45.379, 'eval_steps_per_second': 2.903, 'epoch': 229.0}


 46%|████▌     | 34730/75500 [2:42:39<1:44:28,  6.50it/s] 

{'loss': 0.0737, 'learning_rate': 0.0015910612216291976, 'epoch': 230.0}


                                                         
 46%|████▌     | 34730/75500 [2:42:45<1:44:28,  6.50it/s]

{'eval_loss': 0.4319930076599121, 'eval_bleu': 11.3557, 'eval_gen_len': 5.2929, 'eval_runtime': 6.2648, 'eval_samples_per_second': 47.408, 'eval_steps_per_second': 3.033, 'epoch': 230.0}


 46%|████▌     | 34881/75500 [2:43:07<1:31:24,  7.41it/s] 

{'loss': 0.0726, 'learning_rate': 0.0015851701362369656, 'epoch': 231.0}


                                                         
 46%|████▌     | 34881/75500 [2:43:13<1:31:24,  7.41it/s]

{'eval_loss': 0.43294137716293335, 'eval_bleu': 11.9884, 'eval_gen_len': 5.3939, 'eval_runtime': 6.3583, 'eval_samples_per_second': 46.71, 'eval_steps_per_second': 2.988, 'epoch': 231.0}


 46%|████▋     | 35032/75500 [2:43:36<1:38:18,  6.86it/s] 

{'loss': 0.076, 'learning_rate': 0.0015792790508447334, 'epoch': 232.0}


                                                         
 46%|████▋     | 35032/75500 [2:43:42<1:38:18,  6.86it/s]

{'eval_loss': 0.4400367736816406, 'eval_bleu': 13.552, 'eval_gen_len': 5.2896, 'eval_runtime': 6.3502, 'eval_samples_per_second': 46.77, 'eval_steps_per_second': 2.992, 'epoch': 232.0}


 47%|████▋     | 35183/75500 [2:44:04<1:36:18,  6.98it/s] 

{'loss': 0.0721, 'learning_rate': 0.0015733879654525015, 'epoch': 233.0}


                                                         
 47%|████▋     | 35183/75500 [2:44:10<1:36:18,  6.98it/s]

{'eval_loss': 0.4557778835296631, 'eval_bleu': 10.7596, 'eval_gen_len': 4.8956, 'eval_runtime': 6.1864, 'eval_samples_per_second': 48.008, 'eval_steps_per_second': 3.071, 'epoch': 233.0}


 47%|████▋     | 35334/75500 [2:44:33<1:29:09,  7.51it/s] 

{'loss': 0.0784, 'learning_rate': 0.0015674968800602693, 'epoch': 234.0}


                                                         
 47%|████▋     | 35334/75500 [2:44:39<1:29:09,  7.51it/s]

{'eval_loss': 0.4403255879878998, 'eval_bleu': 10.5308, 'eval_gen_len': 5.6801, 'eval_runtime': 6.4648, 'eval_samples_per_second': 45.941, 'eval_steps_per_second': 2.939, 'epoch': 234.0}


 47%|████▋     | 35485/75500 [2:45:01<1:28:21,  7.55it/s] 

{'loss': 0.0787, 'learning_rate': 0.001561605794668037, 'epoch': 235.0}


                                                         
 47%|████▋     | 35485/75500 [2:45:07<1:28:21,  7.55it/s]

{'eval_loss': 0.4406326115131378, 'eval_bleu': 16.2981, 'eval_gen_len': 4.9158, 'eval_runtime': 6.2279, 'eval_samples_per_second': 47.688, 'eval_steps_per_second': 3.051, 'epoch': 235.0}


 47%|████▋     | 35636/75500 [2:45:29<1:47:39,  6.17it/s] 

{'loss': 0.0701, 'learning_rate': 0.0015557147092758051, 'epoch': 236.0}


                                                         
 47%|████▋     | 35636/75500 [2:45:36<1:47:39,  6.17it/s]

{'eval_loss': 0.42977577447891235, 'eval_bleu': 17.0177, 'eval_gen_len': 5.1279, 'eval_runtime': 6.5269, 'eval_samples_per_second': 45.504, 'eval_steps_per_second': 2.911, 'epoch': 236.0}


 47%|████▋     | 35787/75500 [2:45:57<1:30:11,  7.34it/s] 

{'loss': 0.0685, 'learning_rate': 0.001549823623883573, 'epoch': 237.0}


                                                         
 47%|████▋     | 35787/75500 [2:46:03<1:30:11,  7.34it/s]

{'eval_loss': 0.45456457138061523, 'eval_bleu': 13.7921, 'eval_gen_len': 4.8519, 'eval_runtime': 6.2195, 'eval_samples_per_second': 47.753, 'eval_steps_per_second': 3.055, 'epoch': 237.0}


 48%|████▊     | 35938/75500 [2:46:25<1:32:16,  7.15it/s] 

{'loss': 0.0694, 'learning_rate': 0.001543932538491341, 'epoch': 238.0}


                                                         
 48%|████▊     | 35938/75500 [2:46:31<1:32:16,  7.15it/s]

{'eval_loss': 0.44854745268821716, 'eval_bleu': 12.77, 'eval_gen_len': 5.1515, 'eval_runtime': 6.2391, 'eval_samples_per_second': 47.603, 'eval_steps_per_second': 3.045, 'epoch': 238.0}


 48%|████▊     | 36089/75500 [2:46:54<1:38:12,  6.69it/s] 

{'loss': 0.0719, 'learning_rate': 0.0015380414530991088, 'epoch': 239.0}


                                                         
 48%|████▊     | 36089/75500 [2:47:00<1:38:12,  6.69it/s]

{'eval_loss': 0.43951961398124695, 'eval_bleu': 15.4417, 'eval_gen_len': 5.1684, 'eval_runtime': 6.2153, 'eval_samples_per_second': 47.786, 'eval_steps_per_second': 3.057, 'epoch': 239.0}


 48%|████▊     | 36240/75500 [2:47:22<1:31:06,  7.18it/s] 

{'loss': 0.0711, 'learning_rate': 0.0015321503677068768, 'epoch': 240.0}


                                                         
 48%|████▊     | 36240/75500 [2:47:28<1:31:06,  7.18it/s]

{'eval_loss': 0.43143829703330994, 'eval_bleu': 13.4558, 'eval_gen_len': 5.1212, 'eval_runtime': 6.2949, 'eval_samples_per_second': 47.181, 'eval_steps_per_second': 3.018, 'epoch': 240.0}


 48%|████▊     | 36391/75500 [2:47:50<1:33:47,  6.95it/s] 

{'loss': 0.0712, 'learning_rate': 0.0015262592823146446, 'epoch': 241.0}


                                                         
 48%|████▊     | 36391/75500 [2:47:57<1:33:47,  6.95it/s]

{'eval_loss': 0.4290425479412079, 'eval_bleu': 13.1194, 'eval_gen_len': 5.4512, 'eval_runtime': 6.3017, 'eval_samples_per_second': 47.13, 'eval_steps_per_second': 3.015, 'epoch': 241.0}


 48%|████▊     | 36542/75500 [2:48:18<1:26:48,  7.48it/s] 

{'loss': 0.0775, 'learning_rate': 0.0015204072107329572, 'epoch': 242.0}


                                                         
 48%|████▊     | 36542/75500 [2:48:25<1:26:48,  7.48it/s]

{'eval_loss': 0.45259326696395874, 'eval_bleu': 11.6662, 'eval_gen_len': 4.8687, 'eval_runtime': 6.4697, 'eval_samples_per_second': 45.906, 'eval_steps_per_second': 2.937, 'epoch': 242.0}


 49%|████▊     | 36693/75500 [2:48:53<1:29:01,  7.27it/s] 

{'loss': 0.0749, 'learning_rate': 0.0015145161253407253, 'epoch': 243.0}


                                                         
 49%|████▊     | 36693/75500 [2:48:59<1:29:01,  7.27it/s]

{'eval_loss': 0.44631171226501465, 'eval_bleu': 11.5181, 'eval_gen_len': 5.0303, 'eval_runtime': 6.2439, 'eval_samples_per_second': 47.567, 'eval_steps_per_second': 3.043, 'epoch': 243.0}


 49%|████▉     | 36844/75500 [2:49:21<1:26:54,  7.41it/s] 

{'loss': 0.0734, 'learning_rate': 0.001508625039948493, 'epoch': 244.0}


                                                         
 49%|████▉     | 36844/75500 [2:49:27<1:26:54,  7.41it/s]

{'eval_loss': 0.47119027376174927, 'eval_bleu': 13.4142, 'eval_gen_len': 4.367, 'eval_runtime': 6.1568, 'eval_samples_per_second': 48.239, 'eval_steps_per_second': 3.086, 'epoch': 244.0}


 49%|████▉     | 36995/75500 [2:49:50<1:55:42,  5.55it/s] 

{'loss': 0.0777, 'learning_rate': 0.0015027339545562609, 'epoch': 245.0}


                                                         
 49%|████▉     | 36995/75500 [2:49:59<1:55:42,  5.55it/s]

{'eval_loss': 0.43204763531684875, 'eval_bleu': 16.7761, 'eval_gen_len': 5.0909, 'eval_runtime': 8.7253, 'eval_samples_per_second': 34.039, 'eval_steps_per_second': 2.178, 'epoch': 245.0}


 49%|████▉     | 37146/75500 [2:50:29<2:06:13,  5.06it/s] 

{'loss': 0.0834, 'learning_rate': 0.001496842869164029, 'epoch': 246.0}


                                                         
 49%|████▉     | 37146/75500 [2:50:37<2:06:13,  5.06it/s]

{'eval_loss': 0.4621550142765045, 'eval_bleu': 9.0404, 'eval_gen_len': 4.5354, 'eval_runtime': 7.8496, 'eval_samples_per_second': 37.836, 'eval_steps_per_second': 2.42, 'epoch': 246.0}


 49%|████▉     | 37297/75500 [2:51:07<2:13:49,  4.76it/s] 

{'loss': 0.0688, 'learning_rate': 0.0014909517837717967, 'epoch': 247.0}


                                                         
 49%|████▉     | 37297/75500 [2:51:17<2:13:49,  4.76it/s]

{'eval_loss': 0.4426197111606598, 'eval_bleu': 13.5017, 'eval_gen_len': 4.8721, 'eval_runtime': 9.7204, 'eval_samples_per_second': 30.554, 'eval_steps_per_second': 1.955, 'epoch': 247.0}


 50%|████▉     | 37448/75500 [2:51:45<1:30:11,  7.03it/s] 

{'loss': 0.0656, 'learning_rate': 0.0014850606983795648, 'epoch': 248.0}


                                                         
 50%|████▉     | 37448/75500 [2:51:52<1:30:11,  7.03it/s]

{'eval_loss': 0.43879714608192444, 'eval_bleu': 12.3979, 'eval_gen_len': 5.3805, 'eval_runtime': 6.5332, 'eval_samples_per_second': 45.46, 'eval_steps_per_second': 2.908, 'epoch': 248.0}


 50%|████▉     | 37599/75500 [2:52:16<1:37:12,  6.50it/s] 

{'loss': 0.0653, 'learning_rate': 0.0014791696129873326, 'epoch': 249.0}


                                                         
 50%|████▉     | 37599/75500 [2:52:22<1:37:12,  6.50it/s]

{'eval_loss': 0.4354567527770996, 'eval_bleu': 17.3976, 'eval_gen_len': 4.8451, 'eval_runtime': 6.4362, 'eval_samples_per_second': 46.145, 'eval_steps_per_second': 2.952, 'epoch': 249.0}


 50%|█████     | 37750/75500 [2:52:46<1:31:12,  6.90it/s] 

{'loss': 0.0664, 'learning_rate': 0.0014732785275951006, 'epoch': 250.0}


                                                         
 50%|█████     | 37750/75500 [2:52:53<1:31:12,  6.90it/s]

{'eval_loss': 0.44237110018730164, 'eval_bleu': 13.1278, 'eval_gen_len': 5.0168, 'eval_runtime': 6.4126, 'eval_samples_per_second': 46.315, 'eval_steps_per_second': 2.963, 'epoch': 250.0}


 50%|█████     | 37901/75500 [2:53:17<1:47:19,  5.84it/s] 

{'loss': 0.067, 'learning_rate': 0.0014673874422028684, 'epoch': 251.0}


                                                         
 50%|█████     | 37901/75500 [2:53:23<1:47:19,  5.84it/s]

{'eval_loss': 0.43400898575782776, 'eval_bleu': 10.6707, 'eval_gen_len': 5.1717, 'eval_runtime': 6.3942, 'eval_samples_per_second': 46.448, 'eval_steps_per_second': 2.971, 'epoch': 251.0}


 50%|█████     | 38052/75500 [2:53:47<1:23:41,  7.46it/s] 

{'loss': 0.0755, 'learning_rate': 0.0014614963568106364, 'epoch': 252.0}


                                                         
 50%|█████     | 38052/75500 [2:53:53<1:23:41,  7.46it/s]

{'eval_loss': 0.44788938760757446, 'eval_bleu': 11.1762, 'eval_gen_len': 5.3872, 'eval_runtime': 6.3253, 'eval_samples_per_second': 46.954, 'eval_steps_per_second': 3.004, 'epoch': 252.0}


 51%|█████     | 38203/75500 [2:54:17<1:28:36,  7.02it/s] 

{'loss': 0.0727, 'learning_rate': 0.0014556052714184043, 'epoch': 253.0}


                                                         
 51%|█████     | 38203/75500 [2:54:24<1:28:36,  7.02it/s]

{'eval_loss': 0.44046905636787415, 'eval_bleu': 10.7564, 'eval_gen_len': 5.4377, 'eval_runtime': 6.5546, 'eval_samples_per_second': 45.312, 'eval_steps_per_second': 2.899, 'epoch': 253.0}


 51%|█████     | 38354/75500 [2:54:51<1:32:51,  6.67it/s] 

{'loss': 0.0695, 'learning_rate': 0.0014497141860261723, 'epoch': 254.0}


                                                         
 51%|█████     | 38354/75500 [2:54:57<1:32:51,  6.67it/s]

{'eval_loss': 0.42072322964668274, 'eval_bleu': 10.0027, 'eval_gen_len': 5.4007, 'eval_runtime': 6.4012, 'eval_samples_per_second': 46.398, 'eval_steps_per_second': 2.968, 'epoch': 254.0}


 51%|█████     | 38505/75500 [2:55:21<1:33:02,  6.63it/s] 

{'loss': 0.0678, 'learning_rate': 0.00144382310063394, 'epoch': 255.0}


                                                         
 51%|█████     | 38505/75500 [2:55:28<1:33:02,  6.63it/s]

{'eval_loss': 0.45615383982658386, 'eval_bleu': 11.874, 'eval_gen_len': 4.8081, 'eval_runtime': 6.7965, 'eval_samples_per_second': 43.699, 'eval_steps_per_second': 2.796, 'epoch': 255.0}


 51%|█████     | 38656/75500 [2:55:52<1:26:43,  7.08it/s] 

{'loss': 0.0662, 'learning_rate': 0.001437932015241708, 'epoch': 256.0}


                                                         
 51%|█████     | 38656/75500 [2:55:58<1:26:43,  7.08it/s]

{'eval_loss': 0.4446524679660797, 'eval_bleu': 13.4791, 'eval_gen_len': 5.6027, 'eval_runtime': 6.3113, 'eval_samples_per_second': 47.058, 'eval_steps_per_second': 3.01, 'epoch': 256.0}


 51%|█████▏    | 38807/75500 [2:56:22<1:33:29,  6.54it/s] 

{'loss': 0.0665, 'learning_rate': 0.001432040929849476, 'epoch': 257.0}


                                                         
 51%|█████▏    | 38807/75500 [2:56:29<1:33:29,  6.54it/s]

{'eval_loss': 0.43756258487701416, 'eval_bleu': 13.973, 'eval_gen_len': 5.0471, 'eval_runtime': 6.6172, 'eval_samples_per_second': 44.883, 'eval_steps_per_second': 2.871, 'epoch': 257.0}


 52%|█████▏    | 38958/75500 [2:56:53<1:24:37,  7.20it/s] 

{'loss': 0.0697, 'learning_rate': 0.0014261498444572438, 'epoch': 258.0}


                                                         
 52%|█████▏    | 38958/75500 [2:56:59<1:24:37,  7.20it/s]

{'eval_loss': 0.4440707266330719, 'eval_bleu': 12.1634, 'eval_gen_len': 5.0202, 'eval_runtime': 6.4844, 'eval_samples_per_second': 45.802, 'eval_steps_per_second': 2.93, 'epoch': 258.0}


 52%|█████▏    | 39109/75500 [2:57:23<1:45:22,  5.76it/s] 

{'loss': 0.0713, 'learning_rate': 0.0014202587590650118, 'epoch': 259.0}


                                                         
 52%|█████▏    | 39109/75500 [2:57:30<1:45:22,  5.76it/s]

{'eval_loss': 0.4502444267272949, 'eval_bleu': 11.7267, 'eval_gen_len': 5.1987, 'eval_runtime': 6.4559, 'eval_samples_per_second': 46.005, 'eval_steps_per_second': 2.943, 'epoch': 259.0}


 52%|█████▏    | 39260/75500 [2:57:53<1:32:51,  6.50it/s] 

{'loss': 0.0668, 'learning_rate': 0.0014143676736727796, 'epoch': 260.0}


                                                         
 52%|█████▏    | 39260/75500 [2:58:00<1:32:51,  6.50it/s]

{'eval_loss': 0.4553050100803375, 'eval_bleu': 14.5974, 'eval_gen_len': 5.0943, 'eval_runtime': 6.4613, 'eval_samples_per_second': 45.966, 'eval_steps_per_second': 2.941, 'epoch': 260.0}


 52%|█████▏    | 39411/75500 [2:58:24<1:45:04,  5.72it/s] 

{'loss': 0.0666, 'learning_rate': 0.0014084765882805476, 'epoch': 261.0}


                                                         
 52%|█████▏    | 39411/75500 [2:58:31<1:45:04,  5.72it/s]

{'eval_loss': 0.4579183757305145, 'eval_bleu': 11.5984, 'eval_gen_len': 5.4848, 'eval_runtime': 6.6727, 'eval_samples_per_second': 44.51, 'eval_steps_per_second': 2.847, 'epoch': 261.0}


 52%|█████▏    | 39562/75500 [2:58:56<1:23:49,  7.15it/s] 

{'loss': 0.065, 'learning_rate': 0.0014025855028883154, 'epoch': 262.0}


                                                         
 52%|█████▏    | 39562/75500 [2:59:02<1:23:49,  7.15it/s]

{'eval_loss': 0.43631258606910706, 'eval_bleu': 12.0762, 'eval_gen_len': 5.5118, 'eval_runtime': 6.2644, 'eval_samples_per_second': 47.411, 'eval_steps_per_second': 3.033, 'epoch': 262.0}


 53%|█████▎    | 39713/75500 [2:59:26<1:25:46,  6.95it/s] 

{'loss': 0.0659, 'learning_rate': 0.0013966944174960835, 'epoch': 263.0}


                                                         
 53%|█████▎    | 39713/75500 [2:59:32<1:25:46,  6.95it/s]

{'eval_loss': 0.44578680396080017, 'eval_bleu': 17.6444, 'eval_gen_len': 5.1852, 'eval_runtime': 6.3875, 'eval_samples_per_second': 46.497, 'eval_steps_per_second': 2.975, 'epoch': 263.0}


 53%|█████▎    | 39864/75500 [2:59:57<1:34:22,  6.29it/s] 

{'loss': 0.0676, 'learning_rate': 0.0013908033321038513, 'epoch': 264.0}


                                                         
 53%|█████▎    | 39864/75500 [3:00:05<1:34:22,  6.29it/s]

{'eval_loss': 0.4567624032497406, 'eval_bleu': 10.4894, 'eval_gen_len': 5.0909, 'eval_runtime': 7.5514, 'eval_samples_per_second': 39.331, 'eval_steps_per_second': 2.516, 'epoch': 264.0}


 53%|█████▎    | 40015/75500 [3:00:36<1:57:11,  5.05it/s] 

{'loss': 0.0679, 'learning_rate': 0.001384912246711619, 'epoch': 265.0}


                                                         
 53%|█████▎    | 40015/75500 [3:00:44<1:57:11,  5.05it/s]

{'eval_loss': 0.44132739305496216, 'eval_bleu': 13.1704, 'eval_gen_len': 4.9024, 'eval_runtime': 8.2773, 'eval_samples_per_second': 35.881, 'eval_steps_per_second': 2.295, 'epoch': 265.0}


 53%|█████▎    | 40166/75500 [3:01:12<1:52:37,  5.23it/s] 

{'loss': 0.0703, 'learning_rate': 0.0013790211613193871, 'epoch': 266.0}


                                                         
 53%|█████▎    | 40166/75500 [3:01:19<1:52:37,  5.23it/s]

{'eval_loss': 0.4553527235984802, 'eval_bleu': 11.0872, 'eval_gen_len': 4.9158, 'eval_runtime': 6.9704, 'eval_samples_per_second': 42.609, 'eval_steps_per_second': 2.726, 'epoch': 266.0}


 53%|█████▎    | 40317/75500 [3:01:44<1:36:05,  6.10it/s] 

{'loss': 0.069, 'learning_rate': 0.001373130075927155, 'epoch': 267.0}


                                                         
 53%|█████▎    | 40317/75500 [3:01:51<1:36:05,  6.10it/s]

{'eval_loss': 0.44217705726623535, 'eval_bleu': 10.7028, 'eval_gen_len': 5.3872, 'eval_runtime': 6.9859, 'eval_samples_per_second': 42.514, 'eval_steps_per_second': 2.72, 'epoch': 267.0}


 54%|█████▎    | 40468/75500 [3:03:55<10:13:06,  1.05s/it]

{'loss': 0.0647, 'learning_rate': 0.001367238990534923, 'epoch': 268.0}


                                                          
 54%|█████▎    | 40468/75500 [3:04:23<10:13:06,  1.05s/it]

{'eval_loss': 0.4502562880516052, 'eval_bleu': 11.845, 'eval_gen_len': 4.8418, 'eval_runtime': 28.1994, 'eval_samples_per_second': 10.532, 'eval_steps_per_second': 0.674, 'epoch': 268.0}


 54%|█████▍    | 40619/75500 [3:06:56<11:49:43,  1.22s/it]

{'loss': 0.0654, 'learning_rate': 0.0013613479051426908, 'epoch': 269.0}


                                                          
 54%|█████▍    | 40619/75500 [3:07:24<11:49:43,  1.22s/it]

{'eval_loss': 0.45285189151763916, 'eval_bleu': 15.1978, 'eval_gen_len': 4.798, 'eval_runtime': 28.4447, 'eval_samples_per_second': 10.441, 'eval_steps_per_second': 0.668, 'epoch': 269.0}


 54%|█████▍    | 40770/75500 [3:09:39<8:02:13,  1.20it/s] 

{'loss': 0.0635, 'learning_rate': 0.0013554568197504588, 'epoch': 270.0}


                                                         
 54%|█████▍    | 40770/75500 [3:10:00<8:02:13,  1.20it/s]

{'eval_loss': 0.4537598192691803, 'eval_bleu': 12.2163, 'eval_gen_len': 4.9933, 'eval_runtime': 20.6972, 'eval_samples_per_second': 14.35, 'eval_steps_per_second': 0.918, 'epoch': 270.0}


 54%|█████▍    | 40921/75500 [3:12:05<7:52:13,  1.22it/s] 

{'loss': 0.0604, 'learning_rate': 0.0013495657343582266, 'epoch': 271.0}


                                                         
 54%|█████▍    | 40921/75500 [3:12:25<7:52:13,  1.22it/s]

{'eval_loss': 0.44513726234436035, 'eval_bleu': 15.179, 'eval_gen_len': 4.8519, 'eval_runtime': 19.541, 'eval_samples_per_second': 15.199, 'eval_steps_per_second': 0.972, 'epoch': 271.0}


 54%|█████▍    | 41072/75500 [3:15:01<9:06:11,  1.05it/s] 

{'loss': 0.0618, 'learning_rate': 0.0013436746489659947, 'epoch': 272.0}


                                                         
 54%|█████▍    | 41072/75500 [3:15:25<9:06:11,  1.05it/s]

{'eval_loss': 0.4532168209552765, 'eval_bleu': 9.9656, 'eval_gen_len': 4.8283, 'eval_runtime': 24.02, 'eval_samples_per_second': 12.365, 'eval_steps_per_second': 0.791, 'epoch': 272.0}


 55%|█████▍    | 41223/75500 [3:18:01<9:55:51,  1.04s/it] 

{'loss': 0.0696, 'learning_rate': 0.0013378225773843073, 'epoch': 273.0}


                                                         
 55%|█████▍    | 41223/75500 [3:18:22<9:55:51,  1.04s/it]

{'eval_loss': 0.44215571880340576, 'eval_bleu': 14.6843, 'eval_gen_len': 5.2391, 'eval_runtime': 20.8265, 'eval_samples_per_second': 14.261, 'eval_steps_per_second': 0.912, 'epoch': 273.0}


 55%|█████▍    | 41374/75500 [3:20:37<7:15:52,  1.30it/s] 

{'loss': 0.0674, 'learning_rate': 0.001331931491992075, 'epoch': 274.0}


                                                         
 55%|█████▍    | 41374/75500 [3:21:02<7:15:52,  1.30it/s]

{'eval_loss': 0.45181962847709656, 'eval_bleu': 13.8351, 'eval_gen_len': 5.3266, 'eval_runtime': 24.9546, 'eval_samples_per_second': 11.902, 'eval_steps_per_second': 0.761, 'epoch': 274.0}


 55%|█████▌    | 41525/75500 [3:23:38<9:15:51,  1.02it/s] 

{'loss': 0.0638, 'learning_rate': 0.001326040406599843, 'epoch': 275.0}


                                                         
 55%|█████▌    | 41525/75500 [3:24:04<9:15:51,  1.02it/s]

{'eval_loss': 0.4538716971874237, 'eval_bleu': 13.6056, 'eval_gen_len': 4.5354, 'eval_runtime': 26.3607, 'eval_samples_per_second': 11.267, 'eval_steps_per_second': 0.721, 'epoch': 275.0}


 55%|█████▌    | 41676/75500 [3:26:34<8:58:41,  1.05it/s] 

{'loss': 0.0606, 'learning_rate': 0.001320149321207611, 'epoch': 276.0}


                                                         
 55%|█████▌    | 41676/75500 [3:27:02<8:58:41,  1.05it/s]

{'eval_loss': 0.44836169481277466, 'eval_bleu': 15.0147, 'eval_gen_len': 5.3199, 'eval_runtime': 27.7418, 'eval_samples_per_second': 10.706, 'eval_steps_per_second': 0.685, 'epoch': 276.0}


 55%|█████▌    | 41827/75500 [3:29:32<9:01:52,  1.04it/s] 

{'loss': 0.0597, 'learning_rate': 0.0013142582358153787, 'epoch': 277.0}


                                                         
 55%|█████▌    | 41827/75500 [3:30:03<9:01:52,  1.04it/s]

{'eval_loss': 0.45157313346862793, 'eval_bleu': 11.1353, 'eval_gen_len': 5.4815, 'eval_runtime': 30.2918, 'eval_samples_per_second': 9.805, 'eval_steps_per_second': 0.627, 'epoch': 277.0}


 56%|█████▌    | 41978/75500 [3:32:43<10:13:31,  1.10s/it]

{'loss': 0.0599, 'learning_rate': 0.0013083671504231468, 'epoch': 278.0}


                                                          
 56%|█████▌    | 41978/75500 [3:33:10<10:13:31,  1.10s/it]

{'eval_loss': 0.45254507660865784, 'eval_bleu': 13.9268, 'eval_gen_len': 5.1751, 'eval_runtime': 27.2571, 'eval_samples_per_second': 10.896, 'eval_steps_per_second': 0.697, 'epoch': 278.0}


 56%|█████▌    | 42129/75500 [3:33:37<1:38:36,  5.64it/s] 

{'loss': 0.0623, 'learning_rate': 0.0013024760650309146, 'epoch': 279.0}


                                                         
 56%|█████▌    | 42129/75500 [3:33:45<1:38:36,  5.64it/s]

{'eval_loss': 0.4472326338291168, 'eval_bleu': 14.0188, 'eval_gen_len': 5.0034, 'eval_runtime': 7.9073, 'eval_samples_per_second': 37.56, 'eval_steps_per_second': 2.403, 'epoch': 279.0}


 56%|█████▌    | 42280/75500 [3:34:12<1:30:39,  6.11it/s] 

{'loss': 0.0637, 'learning_rate': 0.0012965849796386826, 'epoch': 280.0}


                                                         
 56%|█████▌    | 42280/75500 [3:34:19<1:30:39,  6.11it/s]

{'eval_loss': 0.46102556586265564, 'eval_bleu': 12.8078, 'eval_gen_len': 5.1448, 'eval_runtime': 7.182, 'eval_samples_per_second': 41.354, 'eval_steps_per_second': 2.646, 'epoch': 280.0}


 56%|█████▌    | 42431/75500 [3:34:45<1:38:36,  5.59it/s] 

{'loss': 0.064, 'learning_rate': 0.0012906938942464504, 'epoch': 281.0}


                                                         
 56%|█████▌    | 42431/75500 [3:34:53<1:38:36,  5.59it/s]

{'eval_loss': 0.4465502202510834, 'eval_bleu': 12.4512, 'eval_gen_len': 5.5758, 'eval_runtime': 8.6693, 'eval_samples_per_second': 34.259, 'eval_steps_per_second': 2.192, 'epoch': 281.0}


 56%|█████▋    | 42582/75500 [3:35:23<1:25:35,  6.41it/s] 

{'loss': 0.0628, 'learning_rate': 0.0012848028088542184, 'epoch': 282.0}


                                                         
 56%|█████▋    | 42582/75500 [3:35:31<1:25:35,  6.41it/s]

{'eval_loss': 0.44783735275268555, 'eval_bleu': 11.4389, 'eval_gen_len': 4.7071, 'eval_runtime': 7.555, 'eval_samples_per_second': 39.311, 'eval_steps_per_second': 2.515, 'epoch': 282.0}


 57%|█████▋    | 42733/75500 [3:35:56<1:23:37,  6.53it/s] 

{'loss': 0.0675, 'learning_rate': 0.0012789117234619863, 'epoch': 283.0}


                                                         
 57%|█████▋    | 42733/75500 [3:36:03<1:23:37,  6.53it/s]

{'eval_loss': 0.45614540576934814, 'eval_bleu': 9.3742, 'eval_gen_len': 5.9697, 'eval_runtime': 6.4703, 'eval_samples_per_second': 45.902, 'eval_steps_per_second': 2.936, 'epoch': 283.0}


 57%|█████▋    | 42884/75500 [3:36:29<1:37:10,  5.59it/s] 

{'loss': 0.0686, 'learning_rate': 0.0012730206380697543, 'epoch': 284.0}


                                                         
 57%|█████▋    | 42884/75500 [3:36:35<1:37:10,  5.59it/s]

{'eval_loss': 0.4471043050289154, 'eval_bleu': 13.6017, 'eval_gen_len': 4.9259, 'eval_runtime': 6.7134, 'eval_samples_per_second': 44.24, 'eval_steps_per_second': 2.83, 'epoch': 284.0}


 57%|█████▋    | 43035/75500 [3:37:05<1:37:57,  5.52it/s] 

{'loss': 0.0593, 'learning_rate': 0.001267129552677522, 'epoch': 285.0}


                                                         
 57%|█████▋    | 43035/75500 [3:37:14<1:37:57,  5.52it/s]

{'eval_loss': 0.4638572633266449, 'eval_bleu': 12.0881, 'eval_gen_len': 4.6801, 'eval_runtime': 9.0447, 'eval_samples_per_second': 32.837, 'eval_steps_per_second': 2.101, 'epoch': 285.0}


 57%|█████▋    | 43186/75500 [3:37:41<1:52:51,  4.77it/s] 

{'loss': 0.0557, 'learning_rate': 0.0012612384672852901, 'epoch': 286.0}


                                                         
 57%|█████▋    | 43186/75500 [3:37:48<1:52:51,  4.77it/s]

{'eval_loss': 0.44486650824546814, 'eval_bleu': 15.116, 'eval_gen_len': 4.963, 'eval_runtime': 7.7143, 'eval_samples_per_second': 38.5, 'eval_steps_per_second': 2.463, 'epoch': 286.0}


 57%|█████▋    | 43337/75500 [3:38:16<1:30:10,  5.95it/s] 

{'loss': 0.0566, 'learning_rate': 0.001255347381893058, 'epoch': 287.0}


                                                         
 57%|█████▋    | 43337/75500 [3:38:23<1:30:10,  5.95it/s]

{'eval_loss': 0.44482094049453735, 'eval_bleu': 11.7983, 'eval_gen_len': 5.1987, 'eval_runtime': 6.8225, 'eval_samples_per_second': 43.533, 'eval_steps_per_second': 2.785, 'epoch': 287.0}


 58%|█████▊    | 43488/75500 [3:38:49<1:26:53,  6.14it/s] 

{'loss': 0.0578, 'learning_rate': 0.0012494562965008258, 'epoch': 288.0}


                                                         
 58%|█████▊    | 43488/75500 [3:38:57<1:26:53,  6.14it/s]

{'eval_loss': 0.4640776216983795, 'eval_bleu': 14.1203, 'eval_gen_len': 4.6902, 'eval_runtime': 7.5884, 'eval_samples_per_second': 39.139, 'eval_steps_per_second': 2.504, 'epoch': 288.0}


 58%|█████▊    | 43639/75500 [3:39:22<1:33:00,  5.71it/s] 

{'loss': 0.0581, 'learning_rate': 0.0012435652111085938, 'epoch': 289.0}


                                                         
 58%|█████▊    | 43639/75500 [3:39:29<1:33:00,  5.71it/s]

{'eval_loss': 0.4541202783584595, 'eval_bleu': 12.1226, 'eval_gen_len': 4.9798, 'eval_runtime': 6.9638, 'eval_samples_per_second': 42.649, 'eval_steps_per_second': 2.728, 'epoch': 289.0}


 58%|█████▊    | 43790/75500 [3:39:56<1:28:38,  5.96it/s] 

{'loss': 0.0576, 'learning_rate': 0.0012376741257163616, 'epoch': 290.0}


                                                         
 58%|█████▊    | 43790/75500 [3:40:04<1:28:38,  5.96it/s]

{'eval_loss': 0.4715152978897095, 'eval_bleu': 13.014, 'eval_gen_len': 4.5219, 'eval_runtime': 7.6122, 'eval_samples_per_second': 39.017, 'eval_steps_per_second': 2.496, 'epoch': 290.0}


 58%|█████▊    | 43941/75500 [3:40:33<1:40:16,  5.25it/s] 

{'loss': 0.0612, 'learning_rate': 0.0012317830403241296, 'epoch': 291.0}


                                                         
 58%|█████▊    | 43941/75500 [3:40:41<1:40:16,  5.25it/s]

{'eval_loss': 0.44636058807373047, 'eval_bleu': 13.9863, 'eval_gen_len': 5.3401, 'eval_runtime': 8.0269, 'eval_samples_per_second': 37.001, 'eval_steps_per_second': 2.367, 'epoch': 291.0}


 58%|█████▊    | 44092/75500 [3:41:11<2:01:30,  4.31it/s] 

{'loss': 0.0585, 'learning_rate': 0.0012258919549318974, 'epoch': 292.0}


                                                         
 58%|█████▊    | 44092/75500 [3:41:20<2:01:30,  4.31it/s]

{'eval_loss': 0.46559029817581177, 'eval_bleu': 14.6724, 'eval_gen_len': 4.8081, 'eval_runtime': 8.1911, 'eval_samples_per_second': 36.259, 'eval_steps_per_second': 2.32, 'epoch': 292.0}


 59%|█████▊    | 44243/75500 [3:41:49<1:40:28,  5.18it/s] 

{'loss': 0.0589, 'learning_rate': 0.0012200008695396655, 'epoch': 293.0}


                                                         
 59%|█████▊    | 44243/75500 [3:41:58<1:40:28,  5.18it/s]

{'eval_loss': 0.44738492369651794, 'eval_bleu': 13.1631, 'eval_gen_len': 5.1919, 'eval_runtime': 9.0277, 'eval_samples_per_second': 32.899, 'eval_steps_per_second': 2.105, 'epoch': 293.0}


 59%|█████▉    | 44394/75500 [3:42:27<1:25:15,  6.08it/s] 

{'loss': 0.0607, 'learning_rate': 0.0012141097841474333, 'epoch': 294.0}


                                                         
 59%|█████▉    | 44394/75500 [3:42:36<1:25:15,  6.08it/s]

{'eval_loss': 0.461429625749588, 'eval_bleu': 12.1417, 'eval_gen_len': 5.0539, 'eval_runtime': 8.8741, 'eval_samples_per_second': 33.468, 'eval_steps_per_second': 2.141, 'epoch': 294.0}


 59%|█████▉    | 44545/75500 [3:43:05<1:26:45,  5.95it/s] 

{'loss': 0.0601, 'learning_rate': 0.0012082186987552013, 'epoch': 295.0}


                                                         
 59%|█████▉    | 44545/75500 [3:43:12<1:26:45,  5.95it/s]

{'eval_loss': 0.4441702961921692, 'eval_bleu': 11.7031, 'eval_gen_len': 5.4512, 'eval_runtime': 7.0909, 'eval_samples_per_second': 41.884, 'eval_steps_per_second': 2.679, 'epoch': 295.0}


 59%|█████▉    | 44696/75500 [3:43:37<1:23:19,  6.16it/s] 

{'loss': 0.0598, 'learning_rate': 0.0012023276133629691, 'epoch': 296.0}


                                                         
 59%|█████▉    | 44696/75500 [3:43:44<1:23:19,  6.16it/s]

{'eval_loss': 0.46037986874580383, 'eval_bleu': 10.8872, 'eval_gen_len': 5.101, 'eval_runtime': 6.6218, 'eval_samples_per_second': 44.852, 'eval_steps_per_second': 2.869, 'epoch': 296.0}


 59%|█████▉    | 44847/75500 [3:44:11<1:51:36,  4.58it/s] 

{'loss': 0.0571, 'learning_rate': 0.001196436527970737, 'epoch': 297.0}


                                                         
 59%|█████▉    | 44847/75500 [3:44:19<1:51:36,  4.58it/s]

{'eval_loss': 0.4406484365463257, 'eval_bleu': 15.5389, 'eval_gen_len': 5.3704, 'eval_runtime': 8.0209, 'eval_samples_per_second': 37.028, 'eval_steps_per_second': 2.369, 'epoch': 297.0}


 60%|█████▉    | 44998/75500 [3:44:50<1:37:04,  5.24it/s] 

{'loss': 0.0557, 'learning_rate': 0.001190545442578505, 'epoch': 298.0}


                                                         
 60%|█████▉    | 44998/75500 [3:44:58<1:37:04,  5.24it/s]

{'eval_loss': 0.4544776380062103, 'eval_bleu': 11.9638, 'eval_gen_len': 5.3939, 'eval_runtime': 7.9285, 'eval_samples_per_second': 37.46, 'eval_steps_per_second': 2.396, 'epoch': 298.0}


 60%|█████▉    | 45149/75500 [3:45:25<1:10:47,  7.15it/s] 

{'loss': 0.0573, 'learning_rate': 0.0011846543571862728, 'epoch': 299.0}


                                                         
 60%|█████▉    | 45149/75500 [3:45:32<1:10:47,  7.15it/s]

{'eval_loss': 0.45032498240470886, 'eval_bleu': 14.226, 'eval_gen_len': 5.138, 'eval_runtime': 6.5618, 'eval_samples_per_second': 45.262, 'eval_steps_per_second': 2.896, 'epoch': 299.0}


 60%|██████    | 45300/75500 [3:45:58<1:46:44,  4.72it/s] 

{'loss': 0.0554, 'learning_rate': 0.001178763271794041, 'epoch': 300.0}


                                                         
 60%|██████    | 45300/75500 [3:46:07<1:46:44,  4.72it/s]

{'eval_loss': 0.45898911356925964, 'eval_bleu': 11.8872, 'eval_gen_len': 4.6465, 'eval_runtime': 8.8723, 'eval_samples_per_second': 33.475, 'eval_steps_per_second': 2.141, 'epoch': 300.0}


 60%|██████    | 45451/75500 [3:46:34<1:26:31,  5.79it/s] 

{'loss': 0.056, 'learning_rate': 0.0011728721864018089, 'epoch': 301.0}


                                                         
 60%|██████    | 45451/75500 [3:46:42<1:26:31,  5.79it/s]

{'eval_loss': 0.44729623198509216, 'eval_bleu': 12.6686, 'eval_gen_len': 4.9461, 'eval_runtime': 7.7304, 'eval_samples_per_second': 38.42, 'eval_steps_per_second': 2.458, 'epoch': 301.0}


 60%|██████    | 45602/75500 [3:47:13<1:18:37,  6.34it/s] 

{'loss': 0.0545, 'learning_rate': 0.0011669811010095767, 'epoch': 302.0}


                                                         
 60%|██████    | 45602/75500 [3:47:21<1:18:37,  6.34it/s]

{'eval_loss': 0.4434472918510437, 'eval_bleu': 13.4811, 'eval_gen_len': 4.9798, 'eval_runtime': 7.8958, 'eval_samples_per_second': 37.615, 'eval_steps_per_second': 2.406, 'epoch': 302.0}


 61%|██████    | 45753/75500 [3:47:48<1:21:56,  6.05it/s] 

{'loss': 0.0567, 'learning_rate': 0.0011610900156173447, 'epoch': 303.0}


                                                         
 61%|██████    | 45753/75500 [3:47:55<1:21:56,  6.05it/s]

{'eval_loss': 0.4372875988483429, 'eval_bleu': 13.6173, 'eval_gen_len': 5.0438, 'eval_runtime': 7.1001, 'eval_samples_per_second': 41.831, 'eval_steps_per_second': 2.676, 'epoch': 303.0}


 61%|██████    | 45904/75500 [3:48:22<1:27:55,  5.61it/s] 

{'loss': 0.0595, 'learning_rate': 0.0011551989302251125, 'epoch': 304.0}


                                                         
 61%|██████    | 45904/75500 [3:48:31<1:27:55,  5.61it/s]

{'eval_loss': 0.4554373621940613, 'eval_bleu': 11.7729, 'eval_gen_len': 5.7104, 'eval_runtime': 8.7809, 'eval_samples_per_second': 33.824, 'eval_steps_per_second': 2.164, 'epoch': 304.0}


 61%|██████    | 46055/75500 [3:49:02<1:37:39,  5.03it/s] 

{'loss': 0.0569, 'learning_rate': 0.0011493078448328805, 'epoch': 305.0}


                                                         
 61%|██████    | 46055/75500 [3:49:10<1:37:39,  5.03it/s]

{'eval_loss': 0.44479307532310486, 'eval_bleu': 12.3003, 'eval_gen_len': 5.6195, 'eval_runtime': 7.9962, 'eval_samples_per_second': 37.142, 'eval_steps_per_second': 2.376, 'epoch': 305.0}


 61%|██████    | 46206/75500 [3:49:39<1:23:51,  5.82it/s] 

{'loss': 0.0554, 'learning_rate': 0.0011434167594406484, 'epoch': 306.0}


                                                         
 61%|██████    | 46206/75500 [3:49:45<1:23:51,  5.82it/s]

{'eval_loss': 0.44462621212005615, 'eval_bleu': 13.3851, 'eval_gen_len': 5.0505, 'eval_runtime': 6.7331, 'eval_samples_per_second': 44.111, 'eval_steps_per_second': 2.822, 'epoch': 306.0}


 61%|██████▏   | 46357/75500 [3:50:14<1:45:54,  4.59it/s] 

{'loss': 0.0534, 'learning_rate': 0.0011375256740484164, 'epoch': 307.0}


                                                         
 61%|██████▏   | 46357/75500 [3:50:22<1:45:54,  4.59it/s]

{'eval_loss': 0.44397881627082825, 'eval_bleu': 14.5176, 'eval_gen_len': 4.9259, 'eval_runtime': 8.2326, 'eval_samples_per_second': 36.076, 'eval_steps_per_second': 2.308, 'epoch': 307.0}


 62%|██████▏   | 46508/75500 [3:50:51<1:09:30,  6.95it/s] 

{'loss': 0.0499, 'learning_rate': 0.0011316736024667288, 'epoch': 308.0}


                                                         
 62%|██████▏   | 46508/75500 [3:50:58<1:09:30,  6.95it/s]

{'eval_loss': 0.454097718000412, 'eval_bleu': 15.3397, 'eval_gen_len': 4.9933, 'eval_runtime': 6.7926, 'eval_samples_per_second': 43.724, 'eval_steps_per_second': 2.797, 'epoch': 308.0}


 62%|██████▏   | 46659/75500 [3:51:24<1:14:03,  6.49it/s] 

{'loss': 0.0498, 'learning_rate': 0.0011257825170744966, 'epoch': 309.0}


                                                         
 62%|██████▏   | 46659/75500 [3:51:31<1:14:03,  6.49it/s]

{'eval_loss': 0.4474908709526062, 'eval_bleu': 12.6991, 'eval_gen_len': 5.101, 'eval_runtime': 7.0728, 'eval_samples_per_second': 41.992, 'eval_steps_per_second': 2.686, 'epoch': 309.0}


 62%|██████▏   | 46810/75500 [3:51:56<1:12:42,  6.58it/s] 

{'loss': 0.0568, 'learning_rate': 0.0011198914316822646, 'epoch': 310.0}


                                                         
 62%|██████▏   | 46810/75500 [3:52:02<1:12:42,  6.58it/s]

{'eval_loss': 0.4562422037124634, 'eval_bleu': 12.5015, 'eval_gen_len': 4.7205, 'eval_runtime': 6.7023, 'eval_samples_per_second': 44.313, 'eval_steps_per_second': 2.835, 'epoch': 310.0}


 62%|██████▏   | 46961/75500 [3:52:27<1:18:21,  6.07it/s] 

{'loss': 0.056, 'learning_rate': 0.0011140003462900324, 'epoch': 311.0}


                                                         
 62%|██████▏   | 46961/75500 [3:52:35<1:18:21,  6.07it/s]

{'eval_loss': 0.4564032256603241, 'eval_bleu': 13.0492, 'eval_gen_len': 4.8384, 'eval_runtime': 7.3348, 'eval_samples_per_second': 40.492, 'eval_steps_per_second': 2.59, 'epoch': 311.0}


 62%|██████▏   | 47112/75500 [3:53:00<1:18:23,  6.04it/s] 

{'loss': 0.0572, 'learning_rate': 0.0011081092608978005, 'epoch': 312.0}


                                                         
 62%|██████▏   | 47112/75500 [3:53:07<1:18:23,  6.04it/s]

{'eval_loss': 0.45263662934303284, 'eval_bleu': 13.9543, 'eval_gen_len': 4.936, 'eval_runtime': 7.1799, 'eval_samples_per_second': 41.366, 'eval_steps_per_second': 2.646, 'epoch': 312.0}


 63%|██████▎   | 47263/75500 [3:53:37<1:40:56,  4.66it/s] 

{'loss': 0.0569, 'learning_rate': 0.0011022181755055683, 'epoch': 313.0}


                                                         
 63%|██████▎   | 47263/75500 [3:53:45<1:40:56,  4.66it/s]

{'eval_loss': 0.45446649193763733, 'eval_bleu': 12.0549, 'eval_gen_len': 4.7744, 'eval_runtime': 8.252, 'eval_samples_per_second': 35.991, 'eval_steps_per_second': 2.302, 'epoch': 313.0}


 63%|██████▎   | 47414/75500 [3:54:13<1:15:48,  6.17it/s] 

{'loss': 0.055, 'learning_rate': 0.0010963270901133363, 'epoch': 314.0}


                                                         
 63%|██████▎   | 47414/75500 [3:54:20<1:15:48,  6.17it/s]

{'eval_loss': 0.45209717750549316, 'eval_bleu': 9.7578, 'eval_gen_len': 5.367, 'eval_runtime': 6.9918, 'eval_samples_per_second': 42.478, 'eval_steps_per_second': 2.717, 'epoch': 314.0}


 63%|██████▎   | 47565/75500 [3:54:46<1:19:56,  5.82it/s] 

{'loss': 0.053, 'learning_rate': 0.0010904360047211041, 'epoch': 315.0}


                                                         
 63%|██████▎   | 47565/75500 [3:54:53<1:19:56,  5.82it/s]

{'eval_loss': 0.4679547846317291, 'eval_bleu': 10.939, 'eval_gen_len': 5.2593, 'eval_runtime': 6.6274, 'eval_samples_per_second': 44.814, 'eval_steps_per_second': 2.867, 'epoch': 315.0}


 63%|██████▎   | 47716/75500 [3:55:18<1:09:53,  6.62it/s] 

{'loss': 0.0561, 'learning_rate': 0.0010845449193288721, 'epoch': 316.0}


                                                         
 63%|██████▎   | 47716/75500 [3:55:24<1:09:53,  6.62it/s]

{'eval_loss': 0.44039952754974365, 'eval_bleu': 12.3295, 'eval_gen_len': 5.5354, 'eval_runtime': 6.259, 'eval_samples_per_second': 47.451, 'eval_steps_per_second': 3.036, 'epoch': 316.0}


 63%|██████▎   | 47867/75500 [3:55:49<1:27:42,  5.25it/s] 

{'loss': 0.052, 'learning_rate': 0.00107865383393664, 'epoch': 317.0}


                                                         
 63%|██████▎   | 47867/75500 [3:55:56<1:27:42,  5.25it/s]

{'eval_loss': 0.4524540305137634, 'eval_bleu': 13.1137, 'eval_gen_len': 5.1481, 'eval_runtime': 7.6483, 'eval_samples_per_second': 38.832, 'eval_steps_per_second': 2.484, 'epoch': 317.0}


 64%|██████▎   | 48018/75500 [3:56:24<1:32:43,  4.94it/s] 

{'loss': 0.0503, 'learning_rate': 0.0010727627485444078, 'epoch': 318.0}


                                                         
 64%|██████▎   | 48018/75500 [3:56:33<1:32:43,  4.94it/s]

{'eval_loss': 0.45200422406196594, 'eval_bleu': 12.8551, 'eval_gen_len': 5.7778, 'eval_runtime': 9.0614, 'eval_samples_per_second': 32.776, 'eval_steps_per_second': 2.097, 'epoch': 318.0}


 64%|██████▍   | 48169/75500 [3:57:02<1:39:11,  4.59it/s] 

{'loss': 0.0492, 'learning_rate': 0.0010668716631521758, 'epoch': 319.0}


                                                         
 64%|██████▍   | 48169/75500 [3:57:10<1:39:11,  4.59it/s]

{'eval_loss': 0.45415017008781433, 'eval_bleu': 14.7069, 'eval_gen_len': 4.9394, 'eval_runtime': 8.4179, 'eval_samples_per_second': 35.282, 'eval_steps_per_second': 2.257, 'epoch': 319.0}


 64%|██████▍   | 48320/75500 [3:57:41<1:32:26,  4.90it/s] 

{'loss': 0.0508, 'learning_rate': 0.0010609805777599438, 'epoch': 320.0}


                                                         
 64%|██████▍   | 48320/75500 [3:57:50<1:32:26,  4.90it/s]

{'eval_loss': 0.4459410309791565, 'eval_bleu': 14.0255, 'eval_gen_len': 5.0101, 'eval_runtime': 8.5267, 'eval_samples_per_second': 34.832, 'eval_steps_per_second': 2.228, 'epoch': 320.0}


 64%|██████▍   | 48471/75500 [3:58:17<1:02:34,  7.20it/s] 

{'loss': 0.0491, 'learning_rate': 0.0010550894923677119, 'epoch': 321.0}


                                                         
 64%|██████▍   | 48471/75500 [3:58:24<1:02:34,  7.20it/s]

{'eval_loss': 0.45328590273857117, 'eval_bleu': 15.0218, 'eval_gen_len': 5.1852, 'eval_runtime': 6.7643, 'eval_samples_per_second': 43.907, 'eval_steps_per_second': 2.809, 'epoch': 321.0}


 64%|██████▍   | 48622/75500 [3:58:53<1:11:35,  6.26it/s] 

{'loss': 0.0495, 'learning_rate': 0.0010491984069754797, 'epoch': 322.0}


                                                         
 64%|██████▍   | 48622/75500 [3:59:01<1:11:35,  6.26it/s]

{'eval_loss': 0.45035725831985474, 'eval_bleu': 11.6368, 'eval_gen_len': 5.2088, 'eval_runtime': 7.2851, 'eval_samples_per_second': 40.768, 'eval_steps_per_second': 2.608, 'epoch': 322.0}


 65%|██████▍   | 48773/75500 [3:59:32<1:18:01,  5.71it/s] 

{'loss': 0.0495, 'learning_rate': 0.001043346335393792, 'epoch': 323.0}


                                                         
 65%|██████▍   | 48773/75500 [3:59:41<1:18:01,  5.71it/s]

{'eval_loss': 0.44042614102363586, 'eval_bleu': 12.7389, 'eval_gen_len': 5.2997, 'eval_runtime': 8.8007, 'eval_samples_per_second': 33.747, 'eval_steps_per_second': 2.159, 'epoch': 323.0}


 65%|██████▍   | 48924/75500 [4:00:08<1:21:02,  5.47it/s] 

{'loss': 0.0475, 'learning_rate': 0.00103745525000156, 'epoch': 324.0}


                                                         
 65%|██████▍   | 48924/75500 [4:00:16<1:21:02,  5.47it/s]

{'eval_loss': 0.42751219868659973, 'eval_bleu': 13.4425, 'eval_gen_len': 5.798, 'eval_runtime': 7.9131, 'eval_samples_per_second': 37.533, 'eval_steps_per_second': 2.401, 'epoch': 324.0}


 65%|██████▌   | 49075/75500 [4:00:46<1:09:45,  6.31it/s] 

{'loss': 0.0509, 'learning_rate': 0.001031564164609328, 'epoch': 325.0}


                                                         
 65%|██████▌   | 49075/75500 [4:00:54<1:09:45,  6.31it/s]

{'eval_loss': 0.45049700140953064, 'eval_bleu': 15.9742, 'eval_gen_len': 4.9394, 'eval_runtime': 7.2723, 'eval_samples_per_second': 40.84, 'eval_steps_per_second': 2.613, 'epoch': 325.0}


 65%|██████▌   | 49226/75500 [4:01:24<1:18:31,  5.58it/s] 

{'loss': 0.0537, 'learning_rate': 0.001025673079217096, 'epoch': 326.0}


                                                         
 65%|██████▌   | 49226/75500 [4:01:34<1:18:31,  5.58it/s]

{'eval_loss': 0.4462011158466339, 'eval_bleu': 13.4659, 'eval_gen_len': 5.3434, 'eval_runtime': 9.6523, 'eval_samples_per_second': 30.77, 'eval_steps_per_second': 1.968, 'epoch': 326.0}


 65%|██████▌   | 49377/75500 [4:02:00<1:17:03,  5.65it/s] 

{'loss': 0.0511, 'learning_rate': 0.0010197819938248638, 'epoch': 327.0}


                                                         
 65%|██████▌   | 49377/75500 [4:02:09<1:17:03,  5.65it/s]

{'eval_loss': 0.45786556601524353, 'eval_bleu': 13.2608, 'eval_gen_len': 4.9697, 'eval_runtime': 8.8065, 'eval_samples_per_second': 33.725, 'eval_steps_per_second': 2.157, 'epoch': 327.0}


 66%|██████▌   | 49528/75500 [4:02:39<1:07:56,  6.37it/s] 

{'loss': 0.0529, 'learning_rate': 0.0010138909084326318, 'epoch': 328.0}


                                                         
 66%|██████▌   | 49528/75500 [4:02:47<1:07:56,  6.37it/s]

{'eval_loss': 0.45039254426956177, 'eval_bleu': 13.9421, 'eval_gen_len': 5.2727, 'eval_runtime': 8.0722, 'eval_samples_per_second': 36.793, 'eval_steps_per_second': 2.354, 'epoch': 328.0}


 66%|██████▌   | 49679/75500 [4:03:18<1:40:34,  4.28it/s] 

{'loss': 0.052, 'learning_rate': 0.0010079998230403996, 'epoch': 329.0}


                                                         
 66%|██████▌   | 49679/75500 [4:03:27<1:40:34,  4.28it/s]

{'eval_loss': 0.45591261982917786, 'eval_bleu': 10.3168, 'eval_gen_len': 5.2256, 'eval_runtime': 9.5074, 'eval_samples_per_second': 31.239, 'eval_steps_per_second': 1.998, 'epoch': 329.0}


 66%|██████▌   | 49830/75500 [4:03:57<1:08:56,  6.21it/s] 

{'loss': 0.0486, 'learning_rate': 0.0010021087376481674, 'epoch': 330.0}


                                                         
 66%|██████▌   | 49830/75500 [4:04:05<1:08:56,  6.21it/s]

{'eval_loss': 0.43540436029434204, 'eval_bleu': 15.6417, 'eval_gen_len': 5.4512, 'eval_runtime': 7.6201, 'eval_samples_per_second': 38.976, 'eval_steps_per_second': 2.493, 'epoch': 330.0}


 66%|██████▌   | 49981/75500 [4:04:35<1:45:51,  4.02it/s] 

{'loss': 0.0489, 'learning_rate': 0.0009962176522559354, 'epoch': 331.0}


                                                         
 66%|██████▌   | 49981/75500 [4:04:43<1:45:51,  4.02it/s]

{'eval_loss': 0.43668878078460693, 'eval_bleu': 13.787, 'eval_gen_len': 5.2424, 'eval_runtime': 8.2801, 'eval_samples_per_second': 35.869, 'eval_steps_per_second': 2.295, 'epoch': 331.0}


 66%|██████▋   | 50132/75500 [4:05:12<1:16:52,  5.50it/s] 

{'loss': 0.048, 'learning_rate': 0.0009903265668637033, 'epoch': 332.0}


                                                         
 66%|██████▋   | 50132/75500 [4:05:19<1:16:52,  5.50it/s]

{'eval_loss': 0.45513126254081726, 'eval_bleu': 14.0249, 'eval_gen_len': 4.9562, 'eval_runtime': 7.1667, 'eval_samples_per_second': 41.441, 'eval_steps_per_second': 2.651, 'epoch': 332.0}


 67%|██████▋   | 50283/75500 [4:05:46<1:03:48,  6.59it/s] 

{'loss': 0.0473, 'learning_rate': 0.0009844354814714713, 'epoch': 333.0}


                                                         
 67%|██████▋   | 50283/75500 [4:05:54<1:03:48,  6.59it/s]

{'eval_loss': 0.45350560545921326, 'eval_bleu': 13.7401, 'eval_gen_len': 5.1852, 'eval_runtime': 8.1189, 'eval_samples_per_second': 36.581, 'eval_steps_per_second': 2.34, 'epoch': 333.0}


 67%|██████▋   | 50434/75500 [4:06:24<1:29:38,  4.66it/s] 

{'loss': 0.0455, 'learning_rate': 0.000978544396079239, 'epoch': 334.0}


                                                         
 67%|██████▋   | 50434/75500 [4:06:31<1:29:38,  4.66it/s]

{'eval_loss': 0.43311071395874023, 'eval_bleu': 12.5222, 'eval_gen_len': 5.1852, 'eval_runtime': 7.8406, 'eval_samples_per_second': 37.88, 'eval_steps_per_second': 2.423, 'epoch': 334.0}


 67%|██████▋   | 50585/75500 [4:07:00<1:02:47,  6.61it/s] 

{'loss': 0.0485, 'learning_rate': 0.000972653310687007, 'epoch': 335.0}


                                                         
 67%|██████▋   | 50585/75500 [4:07:07<1:02:47,  6.61it/s]

{'eval_loss': 0.44091540575027466, 'eval_bleu': 14.7261, 'eval_gen_len': 5.5387, 'eval_runtime': 7.6567, 'eval_samples_per_second': 38.79, 'eval_steps_per_second': 2.481, 'epoch': 335.0}


 67%|██████▋   | 50736/75500 [4:07:35<1:06:23,  6.22it/s] 

{'loss': 0.0471, 'learning_rate': 0.0009667622252947749, 'epoch': 336.0}


                                                         
 67%|██████▋   | 50736/75500 [4:07:42<1:06:23,  6.22it/s]

{'eval_loss': 0.44258764386177063, 'eval_bleu': 15.8196, 'eval_gen_len': 5.3737, 'eval_runtime': 6.7606, 'eval_samples_per_second': 43.931, 'eval_steps_per_second': 2.81, 'epoch': 336.0}


 67%|██████▋   | 50887/75500 [4:08:10<1:25:32,  4.80it/s] 

{'loss': 0.0475, 'learning_rate': 0.0009608711399025429, 'epoch': 337.0}


                                                         
 67%|██████▋   | 50887/75500 [4:08:17<1:25:32,  4.80it/s]

{'eval_loss': 0.4523126482963562, 'eval_bleu': 14.126, 'eval_gen_len': 4.7306, 'eval_runtime': 7.5611, 'eval_samples_per_second': 39.28, 'eval_steps_per_second': 2.513, 'epoch': 337.0}


 68%|██████▊   | 51038/75500 [4:08:45<1:12:00,  5.66it/s] 

{'loss': 0.0466, 'learning_rate': 0.0009549800545103108, 'epoch': 338.0}


                                                         
 68%|██████▊   | 51038/75500 [4:08:52<1:12:00,  5.66it/s]

{'eval_loss': 0.4482431709766388, 'eval_bleu': 13.6077, 'eval_gen_len': 5.3737, 'eval_runtime': 7.3783, 'eval_samples_per_second': 40.253, 'eval_steps_per_second': 2.575, 'epoch': 338.0}


 68%|██████▊   | 51189/75500 [4:09:19<1:09:51,  5.80it/s] 

{'loss': 0.0461, 'learning_rate': 0.0009490889691180787, 'epoch': 339.0}


                                                         
 68%|██████▊   | 51189/75500 [4:09:26<1:09:51,  5.80it/s]

{'eval_loss': 0.45128458738327026, 'eval_bleu': 11.849, 'eval_gen_len': 6.2189, 'eval_runtime': 7.5824, 'eval_samples_per_second': 39.17, 'eval_steps_per_second': 2.506, 'epoch': 339.0}


 68%|██████▊   | 51340/75500 [4:09:56<1:03:58,  6.29it/s] 

{'loss': 0.0467, 'learning_rate': 0.0009431978837258466, 'epoch': 340.0}


                                                         
 68%|██████▊   | 51340/75500 [4:10:04<1:03:58,  6.29it/s]

{'eval_loss': 0.4428773522377014, 'eval_bleu': 14.0656, 'eval_gen_len': 5.633, 'eval_runtime': 7.8539, 'eval_samples_per_second': 37.816, 'eval_steps_per_second': 2.419, 'epoch': 340.0}


 68%|██████▊   | 51491/75500 [4:10:32<1:15:38,  5.29it/s] 

{'loss': 0.0465, 'learning_rate': 0.0009373067983336147, 'epoch': 341.0}


                                                         
 68%|██████▊   | 51491/75500 [4:10:40<1:15:38,  5.29it/s]

{'eval_loss': 0.44695788621902466, 'eval_bleu': 13.8467, 'eval_gen_len': 5.4074, 'eval_runtime': 7.6743, 'eval_samples_per_second': 38.701, 'eval_steps_per_second': 2.476, 'epoch': 341.0}


 68%|██████▊   | 51642/75500 [4:11:09<1:16:23,  5.21it/s] 

{'loss': 0.0461, 'learning_rate': 0.0009314547267519272, 'epoch': 342.0}


                                                         
 68%|██████▊   | 51642/75500 [4:11:17<1:16:23,  5.21it/s]

{'eval_loss': 0.4549578130245209, 'eval_bleu': 14.6715, 'eval_gen_len': 5.1684, 'eval_runtime': 7.6468, 'eval_samples_per_second': 38.84, 'eval_steps_per_second': 2.485, 'epoch': 342.0}


 69%|██████▊   | 51793/75500 [4:11:41<1:03:23,  6.23it/s] 

{'loss': 0.0436, 'learning_rate': 0.0009255636413596951, 'epoch': 343.0}


                                                         
 69%|██████▊   | 51793/75500 [4:11:47<1:03:23,  6.23it/s]

{'eval_loss': 0.4377445578575134, 'eval_bleu': 16.5988, 'eval_gen_len': 5.1953, 'eval_runtime': 6.3616, 'eval_samples_per_second': 46.686, 'eval_steps_per_second': 2.987, 'epoch': 343.0}


 69%|██████▉   | 51944/75500 [4:12:11<59:23,  6.61it/s]   

{'loss': 0.0466, 'learning_rate': 0.000919672555967463, 'epoch': 344.0}


                                                       
 69%|██████▉   | 51944/75500 [4:12:18<59:23,  6.61it/s]

{'eval_loss': 0.43400657176971436, 'eval_bleu': 19.7191, 'eval_gen_len': 5.3199, 'eval_runtime': 6.9215, 'eval_samples_per_second': 42.91, 'eval_steps_per_second': 2.745, 'epoch': 344.0}


 69%|██████▉   | 52095/75500 [4:12:42<58:30,  6.67it/s]   

{'loss': 0.0464, 'learning_rate': 0.0009137814705752308, 'epoch': 345.0}


                                                       

{'eval_loss': 0.443419486284256, 'eval_bleu': 11.5068, 'eval_gen_len': 5.4747, 'eval_runtime': 6.8325, 'eval_samples_per_second': 43.469, 'eval_steps_per_second': 2.781, 'epoch': 345.0}


 69%|██████▉   | 52246/75500 [4:13:14<1:19:59,  4.85it/s] 

{'loss': 0.0506, 'learning_rate': 0.0009078903851829987, 'epoch': 346.0}


                                                         

{'eval_loss': 0.4472288489341736, 'eval_bleu': 15.1123, 'eval_gen_len': 5.1616, 'eval_runtime': 6.5019, 'eval_samples_per_second': 45.679, 'eval_steps_per_second': 2.922, 'epoch': 346.0}


 69%|██████▉   | 52397/75500 [4:13:45<55:06,  6.99it/s]   

{'loss': 0.0451, 'learning_rate': 0.0009019992997907667, 'epoch': 347.0}


                                                       

{'eval_loss': 0.44267138838768005, 'eval_bleu': 17.3401, 'eval_gen_len': 4.8956, 'eval_runtime': 6.368, 'eval_samples_per_second': 46.64, 'eval_steps_per_second': 2.984, 'epoch': 347.0}


 70%|██████▉   | 52548/75500 [4:14:18<57:42,  6.63it/s]   

{'loss': 0.0431, 'learning_rate': 0.0008961082143985346, 'epoch': 348.0}


                                                       

{'eval_loss': 0.44847041368484497, 'eval_bleu': 17.3534, 'eval_gen_len': 5.0, 'eval_runtime': 6.4757, 'eval_samples_per_second': 45.864, 'eval_steps_per_second': 2.934, 'epoch': 348.0}


 70%|██████▉   | 52699/75500 [4:14:49<59:01,  6.44it/s]   

{'loss': 0.0398, 'learning_rate': 0.0008902171290063025, 'epoch': 349.0}


                                                       

{'eval_loss': 0.4448075592517853, 'eval_bleu': 14.8116, 'eval_gen_len': 5.3939, 'eval_runtime': 6.4936, 'eval_samples_per_second': 45.737, 'eval_steps_per_second': 2.926, 'epoch': 349.0}


 70%|███████   | 52850/75500 [4:15:26<1:06:02,  5.72it/s] 

{'loss': 0.0417, 'learning_rate': 0.0008843260436140704, 'epoch': 350.0}


                                                         

{'eval_loss': 0.44407257437705994, 'eval_bleu': 16.9112, 'eval_gen_len': 5.3401, 'eval_runtime': 7.0477, 'eval_samples_per_second': 42.141, 'eval_steps_per_second': 2.696, 'epoch': 350.0}


 70%|███████   | 53001/75500 [4:16:00<1:00:25,  6.21it/s] 

{'loss': 0.0426, 'learning_rate': 0.0008784349582218383, 'epoch': 351.0}


                                                         

{'eval_loss': 0.440590500831604, 'eval_bleu': 16.44, 'eval_gen_len': 5.1886, 'eval_runtime': 6.5182, 'eval_samples_per_second': 45.565, 'eval_steps_per_second': 2.915, 'epoch': 351.0}


 70%|███████   | 53152/75500 [4:16:30<1:01:26,  6.06it/s] 

{'loss': 0.0421, 'learning_rate': 0.0008725438728296063, 'epoch': 352.0}


                                                         

{'eval_loss': 0.43574026226997375, 'eval_bleu': 13.6771, 'eval_gen_len': 5.532, 'eval_runtime': 6.3576, 'eval_samples_per_second': 46.716, 'eval_steps_per_second': 2.989, 'epoch': 352.0}


 71%|███████   | 53303/75500 [4:17:01<1:00:20,  6.13it/s] 

{'loss': 0.0416, 'learning_rate': 0.0008666527874373742, 'epoch': 353.0}


                                                         

{'eval_loss': 0.43367964029312134, 'eval_bleu': 18.0754, 'eval_gen_len': 5.2593, 'eval_runtime': 6.4858, 'eval_samples_per_second': 45.792, 'eval_steps_per_second': 2.929, 'epoch': 353.0}


 71%|███████   | 53454/75500 [4:17:31<52:28,  7.00it/s]   

{'loss': 0.0429, 'learning_rate': 0.0008607617020451421, 'epoch': 354.0}


                                                       

{'eval_loss': 0.44679656624794006, 'eval_bleu': 15.5831, 'eval_gen_len': 5.3266, 'eval_runtime': 6.6395, 'eval_samples_per_second': 44.732, 'eval_steps_per_second': 2.862, 'epoch': 354.0}


 71%|███████   | 53605/75500 [4:18:01<55:26,  6.58it/s]   

{'loss': 0.0439, 'learning_rate': 0.0008548706166529099, 'epoch': 355.0}


                                                       

{'eval_loss': 0.44081512093544006, 'eval_bleu': 16.1645, 'eval_gen_len': 5.5118, 'eval_runtime': 6.4354, 'eval_samples_per_second': 46.151, 'eval_steps_per_second': 2.952, 'epoch': 355.0}


 71%|███████   | 53756/75500 [4:18:33<1:14:16,  4.88it/s] 

{'loss': 0.0447, 'learning_rate': 0.0008489795312606778, 'epoch': 356.0}


                                                         

{'eval_loss': 0.44105544686317444, 'eval_bleu': 14.3714, 'eval_gen_len': 5.1582, 'eval_runtime': 9.5601, 'eval_samples_per_second': 31.067, 'eval_steps_per_second': 1.987, 'epoch': 356.0}


 71%|███████▏  | 53907/75500 [4:19:14<1:03:56,  5.63it/s] 

{'loss': 0.0441, 'learning_rate': 0.0008430884458684458, 'epoch': 357.0}


                                                         

{'eval_loss': 0.45168840885162354, 'eval_bleu': 15.9534, 'eval_gen_len': 5.4108, 'eval_runtime': 6.9189, 'eval_samples_per_second': 42.926, 'eval_steps_per_second': 2.746, 'epoch': 357.0}


 72%|███████▏  | 54058/75500 [4:19:46<57:28,  6.22it/s]   

{'loss': 0.0415, 'learning_rate': 0.0008371973604762137, 'epoch': 358.0}


                                                       

{'eval_loss': 0.45959389209747314, 'eval_bleu': 14.4022, 'eval_gen_len': 5.0539, 'eval_runtime': 7.363, 'eval_samples_per_second': 40.337, 'eval_steps_per_second': 2.58, 'epoch': 358.0}


 72%|███████▏  | 54209/75500 [4:20:19<51:09,  6.94it/s]   

{'loss': 0.0406, 'learning_rate': 0.0008313062750839816, 'epoch': 359.0}


                                                       

{'eval_loss': 0.44859063625335693, 'eval_bleu': 15.1533, 'eval_gen_len': 5.3165, 'eval_runtime': 6.8477, 'eval_samples_per_second': 43.372, 'eval_steps_per_second': 2.775, 'epoch': 359.0}


 72%|███████▏  | 54360/75500 [4:20:55<1:00:26,  5.83it/s] 

{'loss': 0.0406, 'learning_rate': 0.0008254542035022942, 'epoch': 360.0}


                                                         

{'eval_loss': 0.4434463679790497, 'eval_bleu': 15.0912, 'eval_gen_len': 4.9192, 'eval_runtime': 11.2407, 'eval_samples_per_second': 26.422, 'eval_steps_per_second': 1.69, 'epoch': 360.0}


 72%|███████▏  | 54511/75500 [4:21:38<1:04:16,  5.44it/s] 

{'loss': 0.0385, 'learning_rate': 0.0008195631181100621, 'epoch': 361.0}


                                                         

{'eval_loss': 0.4485945403575897, 'eval_bleu': 14.825, 'eval_gen_len': 5.4411, 'eval_runtime': 8.9241, 'eval_samples_per_second': 33.28, 'eval_steps_per_second': 2.129, 'epoch': 361.0}


 72%|███████▏  | 54662/75500 [4:22:16<1:09:08,  5.02it/s] 

{'loss': 0.0414, 'learning_rate': 0.0008136720327178301, 'epoch': 362.0}


                                                         

{'eval_loss': 0.4403042793273926, 'eval_bleu': 15.1152, 'eval_gen_len': 5.1684, 'eval_runtime': 7.8668, 'eval_samples_per_second': 37.754, 'eval_steps_per_second': 2.415, 'epoch': 362.0}


 73%|███████▎  | 54813/75500 [4:22:51<57:55,  5.95it/s]   

{'loss': 0.0413, 'learning_rate': 0.000807780947325598, 'epoch': 363.0}


                                                       

{'eval_loss': 0.44155576825141907, 'eval_bleu': 16.8122, 'eval_gen_len': 5.202, 'eval_runtime': 7.2756, 'eval_samples_per_second': 40.821, 'eval_steps_per_second': 2.611, 'epoch': 363.0}


 73%|███████▎  | 54964/75500 [4:23:26<1:22:05,  4.17it/s] 

{'loss': 0.039, 'learning_rate': 0.0008018898619333659, 'epoch': 364.0}


                                                         

{'eval_loss': 0.43899449706077576, 'eval_bleu': 17.3178, 'eval_gen_len': 4.9125, 'eval_runtime': 7.2837, 'eval_samples_per_second': 40.776, 'eval_steps_per_second': 2.609, 'epoch': 364.0}


 73%|███████▎  | 55115/75500 [4:24:02<58:40,  5.79it/s]   

{'loss': 0.0386, 'learning_rate': 0.0007959987765411338, 'epoch': 365.0}


                                                       

{'eval_loss': 0.44213759899139404, 'eval_bleu': 14.6708, 'eval_gen_len': 5.2189, 'eval_runtime': 6.5676, 'eval_samples_per_second': 45.222, 'eval_steps_per_second': 2.893, 'epoch': 365.0}


 73%|███████▎  | 55266/75500 [4:24:34<55:26,  6.08it/s]   

{'loss': 0.0409, 'learning_rate': 0.0007901076911489016, 'epoch': 366.0}


                                                       

{'eval_loss': 0.4464757740497589, 'eval_bleu': 12.5182, 'eval_gen_len': 5.1515, 'eval_runtime': 6.8016, 'eval_samples_per_second': 43.666, 'eval_steps_per_second': 2.793, 'epoch': 366.0}


 73%|███████▎  | 55417/75500 [4:25:05<46:24,  7.21it/s]   

{'loss': 0.0407, 'learning_rate': 0.0007842166057566696, 'epoch': 367.0}


                                                       

{'eval_loss': 0.441022664308548, 'eval_bleu': 13.5014, 'eval_gen_len': 5.734, 'eval_runtime': 7.3567, 'eval_samples_per_second': 40.371, 'eval_steps_per_second': 2.583, 'epoch': 367.0}


 74%|███████▎  | 55568/75500 [4:25:39<1:00:46,  5.47it/s] 

{'loss': 0.0404, 'learning_rate': 0.0007783255203644375, 'epoch': 368.0}


                                                         

{'eval_loss': 0.444527268409729, 'eval_bleu': 13.6132, 'eval_gen_len': 4.8215, 'eval_runtime': 7.452, 'eval_samples_per_second': 39.855, 'eval_steps_per_second': 2.55, 'epoch': 368.0}


 74%|███████▍  | 55719/75500 [4:26:17<1:14:13,  4.44it/s] 

{'loss': 0.0372, 'learning_rate': 0.0007724344349722054, 'epoch': 369.0}


                                                         

{'eval_loss': 0.4526370167732239, 'eval_bleu': 15.918, 'eval_gen_len': 4.8519, 'eval_runtime': 8.0944, 'eval_samples_per_second': 36.692, 'eval_steps_per_second': 2.347, 'epoch': 369.0}


 74%|███████▍  | 55870/75500 [4:26:56<1:04:03,  5.11it/s] 

{'loss': 0.037, 'learning_rate': 0.0007665433495799733, 'epoch': 370.0}


                                                         

{'eval_loss': 0.4523293972015381, 'eval_bleu': 13.2569, 'eval_gen_len': 4.8721, 'eval_runtime': 7.5849, 'eval_samples_per_second': 39.157, 'eval_steps_per_second': 2.505, 'epoch': 370.0}


 74%|███████▍  | 56021/75500 [4:27:29<48:56,  6.63it/s]   

{'loss': 0.0381, 'learning_rate': 0.0007606522641877412, 'epoch': 371.0}


                                                       

{'eval_loss': 0.44694504141807556, 'eval_bleu': 14.6942, 'eval_gen_len': 5.2256, 'eval_runtime': 7.2077, 'eval_samples_per_second': 41.206, 'eval_steps_per_second': 2.636, 'epoch': 371.0}


 74%|███████▍  | 56172/75500 [4:28:03<55:08,  5.84it/s]   

{'loss': 0.0387, 'learning_rate': 0.0007547611787955092, 'epoch': 372.0}


                                                       

{'eval_loss': 0.4450422525405884, 'eval_bleu': 16.0712, 'eval_gen_len': 4.9428, 'eval_runtime': 7.1652, 'eval_samples_per_second': 41.45, 'eval_steps_per_second': 2.652, 'epoch': 372.0}


 75%|███████▍  | 56323/75500 [4:28:40<1:01:24,  5.20it/s] 

{'loss': 0.0381, 'learning_rate': 0.0007488700934032771, 'epoch': 373.0}


                                                         

{'eval_loss': 0.4460837244987488, 'eval_bleu': 16.8659, 'eval_gen_len': 5.2862, 'eval_runtime': 8.6112, 'eval_samples_per_second': 34.49, 'eval_steps_per_second': 2.206, 'epoch': 373.0}


 75%|███████▍  | 56474/75500 [4:29:14<45:08,  7.02it/s]   

{'loss': 0.0361, 'learning_rate': 0.000742979008011045, 'epoch': 374.0}


                                                       

{'eval_loss': 0.457532674074173, 'eval_bleu': 15.8995, 'eval_gen_len': 5.0943, 'eval_runtime': 6.4107, 'eval_samples_per_second': 46.329, 'eval_steps_per_second': 2.964, 'epoch': 374.0}


 75%|███████▌  | 56625/75500 [4:29:45<50:53,  6.18it/s]   

{'loss': 0.0362, 'learning_rate': 0.0007370879226188129, 'epoch': 375.0}


                                                       

{'eval_loss': 0.4473254382610321, 'eval_bleu': 15.9188, 'eval_gen_len': 5.3906, 'eval_runtime': 6.4001, 'eval_samples_per_second': 46.405, 'eval_steps_per_second': 2.969, 'epoch': 375.0}


 75%|███████▌  | 56776/75500 [4:30:15<42:07,  7.41it/s]   

{'loss': 0.0376, 'learning_rate': 0.0007311968372265809, 'epoch': 376.0}


                                                       

{'eval_loss': 0.4505336880683899, 'eval_bleu': 16.6997, 'eval_gen_len': 5.2189, 'eval_runtime': 6.1481, 'eval_samples_per_second': 48.307, 'eval_steps_per_second': 3.09, 'epoch': 376.0}


 75%|███████▌  | 56927/75500 [4:30:44<48:18,  6.41it/s]   

{'loss': 0.0369, 'learning_rate': 0.0007253057518343488, 'epoch': 377.0}


                                                       

{'eval_loss': 0.4424002766609192, 'eval_bleu': 16.613, 'eval_gen_len': 5.2761, 'eval_runtime': 8.187, 'eval_samples_per_second': 36.277, 'eval_steps_per_second': 2.321, 'epoch': 377.0}


 76%|███████▌  | 57078/75500 [4:31:26<58:32,  5.24it/s]   

{'loss': 0.0353, 'learning_rate': 0.0007194536802526613, 'epoch': 378.0}


                                                       

{'eval_loss': 0.4496085047721863, 'eval_bleu': 13.3611, 'eval_gen_len': 5.6835, 'eval_runtime': 8.6618, 'eval_samples_per_second': 34.289, 'eval_steps_per_second': 2.194, 'epoch': 378.0}


 76%|███████▌  | 57229/75500 [4:32:03<48:50,  6.24it/s]   

{'loss': 0.0341, 'learning_rate': 0.0007135625948604292, 'epoch': 379.0}


                                                       

{'eval_loss': 0.4466317296028137, 'eval_bleu': 17.0332, 'eval_gen_len': 5.3939, 'eval_runtime': 6.662, 'eval_samples_per_second': 44.581, 'eval_steps_per_second': 2.852, 'epoch': 379.0}


 76%|███████▌  | 57380/75500 [4:32:36<54:02,  5.59it/s]   

{'loss': 0.0353, 'learning_rate': 0.0007076715094681971, 'epoch': 380.0}


                                                       

{'eval_loss': 0.4572691321372986, 'eval_bleu': 14.9272, 'eval_gen_len': 4.7475, 'eval_runtime': 6.6755, 'eval_samples_per_second': 44.491, 'eval_steps_per_second': 2.846, 'epoch': 380.0}


 76%|███████▌  | 57531/75500 [4:33:08<46:47,  6.40it/s]   

{'loss': 0.0355, 'learning_rate': 0.000701780424075965, 'epoch': 381.0}


                                                       

{'eval_loss': 0.4473365843296051, 'eval_bleu': 15.0493, 'eval_gen_len': 4.9327, 'eval_runtime': 7.2684, 'eval_samples_per_second': 40.862, 'eval_steps_per_second': 2.614, 'epoch': 381.0}


 76%|███████▋  | 57682/75500 [4:33:43<58:29,  5.08it/s]   

{'loss': 0.0352, 'learning_rate': 0.000695889338683733, 'epoch': 382.0}


                                                       

{'eval_loss': 0.4557546377182007, 'eval_bleu': 16.3515, 'eval_gen_len': 5.2054, 'eval_runtime': 7.7309, 'eval_samples_per_second': 38.417, 'eval_steps_per_second': 2.458, 'epoch': 382.0}


 77%|███████▋  | 57833/75500 [4:34:19<44:28,  6.62it/s]   

{'loss': 0.034, 'learning_rate': 0.0006899982532915009, 'epoch': 383.0}


                                                       

{'eval_loss': 0.4448692798614502, 'eval_bleu': 18.0528, 'eval_gen_len': 5.1684, 'eval_runtime': 7.8209, 'eval_samples_per_second': 37.975, 'eval_steps_per_second': 2.429, 'epoch': 383.0}


 77%|███████▋  | 57984/75500 [4:35:01<59:00,  4.95it/s]   

{'loss': 0.0355, 'learning_rate': 0.0006841071678992688, 'epoch': 384.0}


                                                       

{'eval_loss': 0.448324590921402, 'eval_bleu': 15.796, 'eval_gen_len': 5.2525, 'eval_runtime': 8.5193, 'eval_samples_per_second': 34.862, 'eval_steps_per_second': 2.23, 'epoch': 384.0}


 77%|███████▋  | 58135/75500 [4:35:36<58:53,  4.91it/s]   

{'loss': 0.0356, 'learning_rate': 0.0006782160825070367, 'epoch': 385.0}


                                                       

{'eval_loss': 0.4363519549369812, 'eval_bleu': 15.8846, 'eval_gen_len': 5.5421, 'eval_runtime': 8.4036, 'eval_samples_per_second': 35.342, 'eval_steps_per_second': 2.261, 'epoch': 385.0}


 77%|███████▋  | 58286/75500 [4:36:12<49:11,  5.83it/s]   

{'loss': 0.0348, 'learning_rate': 0.0006723249971148046, 'epoch': 386.0}


                                                       

{'eval_loss': 0.45100799202919006, 'eval_bleu': 16.4782, 'eval_gen_len': 5.4545, 'eval_runtime': 6.633, 'eval_samples_per_second': 44.776, 'eval_steps_per_second': 2.864, 'epoch': 386.0}


 77%|███████▋  | 58437/75500 [4:36:46<43:08,  6.59it/s]   

{'loss': 0.0324, 'learning_rate': 0.0006664339117225725, 'epoch': 387.0}


                                                       

{'eval_loss': 0.45617881417274475, 'eval_bleu': 16.6753, 'eval_gen_len': 5.4545, 'eval_runtime': 7.085, 'eval_samples_per_second': 41.919, 'eval_steps_per_second': 2.682, 'epoch': 387.0}


 78%|███████▊  | 58588/75500 [4:37:23<1:06:35,  4.23it/s] 

{'loss': 0.033, 'learning_rate': 0.0006605428263303404, 'epoch': 388.0}


                                                         

{'eval_loss': 0.4381517469882965, 'eval_bleu': 14.5622, 'eval_gen_len': 5.4209, 'eval_runtime': 7.616, 'eval_samples_per_second': 38.997, 'eval_steps_per_second': 2.495, 'epoch': 388.0}


 78%|███████▊  | 58739/75500 [4:38:00<43:45,  6.38it/s]   

{'loss': 0.0319, 'learning_rate': 0.0006546517409381083, 'epoch': 389.0}


                                                       

{'eval_loss': 0.45481497049331665, 'eval_bleu': 16.9208, 'eval_gen_len': 5.1246, 'eval_runtime': 8.27, 'eval_samples_per_second': 35.913, 'eval_steps_per_second': 2.297, 'epoch': 389.0}


 78%|███████▊  | 58890/75500 [4:38:37<49:13,  5.62it/s]   

{'loss': 0.0332, 'learning_rate': 0.0006487606555458762, 'epoch': 390.0}


                                                       

{'eval_loss': 0.44829869270324707, 'eval_bleu': 16.7131, 'eval_gen_len': 5.1515, 'eval_runtime': 7.897, 'eval_samples_per_second': 37.609, 'eval_steps_per_second': 2.406, 'epoch': 390.0}


 78%|███████▊  | 59041/75500 [4:39:10<39:21,  6.97it/s]   

{'loss': 0.0325, 'learning_rate': 0.0006428695701536441, 'epoch': 391.0}


                                                       

{'eval_loss': 0.4462846517562866, 'eval_bleu': 15.0424, 'eval_gen_len': 5.2323, 'eval_runtime': 7.1659, 'eval_samples_per_second': 41.446, 'eval_steps_per_second': 2.651, 'epoch': 391.0}


 78%|███████▊  | 59192/75500 [4:39:44<48:54,  5.56it/s]   

{'loss': 0.0319, 'learning_rate': 0.0006369784847614121, 'epoch': 392.0}


                                                       

{'eval_loss': 0.44802308082580566, 'eval_bleu': 15.6024, 'eval_gen_len': 5.1212, 'eval_runtime': 6.9063, 'eval_samples_per_second': 43.004, 'eval_steps_per_second': 2.751, 'epoch': 392.0}


 79%|███████▊  | 59343/75500 [4:40:18<49:38,  5.42it/s]   

{'loss': 0.0337, 'learning_rate': 0.0006311264131797246, 'epoch': 393.0}


                                                       

{'eval_loss': 0.44215330481529236, 'eval_bleu': 13.6943, 'eval_gen_len': 5.4579, 'eval_runtime': 7.401, 'eval_samples_per_second': 40.13, 'eval_steps_per_second': 2.567, 'epoch': 393.0}


 79%|███████▉  | 59494/75500 [4:40:51<51:23,  5.19it/s]   

{'loss': 0.0321, 'learning_rate': 0.0006252353277874925, 'epoch': 394.0}


                                                       

{'eval_loss': 0.44808876514434814, 'eval_bleu': 12.8177, 'eval_gen_len': 5.1111, 'eval_runtime': 6.8019, 'eval_samples_per_second': 43.664, 'eval_steps_per_second': 2.793, 'epoch': 394.0}


 79%|███████▉  | 59645/75500 [4:41:28<1:01:56,  4.27it/s] 

{'loss': 0.033, 'learning_rate': 0.0006193442423952604, 'epoch': 395.0}


                                                         

{'eval_loss': 0.450851172208786, 'eval_bleu': 13.2151, 'eval_gen_len': 5.1414, 'eval_runtime': 8.892, 'eval_samples_per_second': 33.401, 'eval_steps_per_second': 2.137, 'epoch': 395.0}


 79%|███████▉  | 59796/75500 [4:42:08<47:17,  5.54it/s]   

{'loss': 0.0321, 'learning_rate': 0.0006134531570030283, 'epoch': 396.0}


                                                       

{'eval_loss': 0.4565580189228058, 'eval_bleu': 13.1489, 'eval_gen_len': 5.1717, 'eval_runtime': 7.8534, 'eval_samples_per_second': 37.818, 'eval_steps_per_second': 2.419, 'epoch': 396.0}


 79%|███████▉  | 59947/75500 [4:42:42<39:26,  6.57it/s]   

{'loss': 0.032, 'learning_rate': 0.0006076010854213408, 'epoch': 397.0}


                                                       

{'eval_loss': 0.4501765966415405, 'eval_bleu': 14.9346, 'eval_gen_len': 4.9731, 'eval_runtime': 6.908, 'eval_samples_per_second': 42.994, 'eval_steps_per_second': 2.75, 'epoch': 397.0}


 80%|███████▉  | 60098/75500 [4:43:15<37:20,  6.88it/s]   

{'loss': 0.0303, 'learning_rate': 0.0006017100000291088, 'epoch': 398.0}


                                                       

{'eval_loss': 0.4529598355293274, 'eval_bleu': 16.5099, 'eval_gen_len': 4.9562, 'eval_runtime': 7.458, 'eval_samples_per_second': 39.823, 'eval_steps_per_second': 2.548, 'epoch': 398.0}


 80%|███████▉  | 60249/75500 [4:43:49<46:33,  5.46it/s]   

{'loss': 0.0309, 'learning_rate': 0.0005958189146368767, 'epoch': 399.0}


                                                       

{'eval_loss': 0.4486302435398102, 'eval_bleu': 16.2385, 'eval_gen_len': 5.1549, 'eval_runtime': 7.0766, 'eval_samples_per_second': 41.97, 'eval_steps_per_second': 2.685, 'epoch': 399.0}


 80%|████████  | 60400/75500 [4:44:24<57:42,  4.36it/s]   

{'loss': 0.0299, 'learning_rate': 0.0005899278292446446, 'epoch': 400.0}


                                                       

{'eval_loss': 0.45346829295158386, 'eval_bleu': 16.0796, 'eval_gen_len': 5.2795, 'eval_runtime': 9.7316, 'eval_samples_per_second': 30.519, 'eval_steps_per_second': 1.952, 'epoch': 400.0}


 80%|████████  | 60551/75500 [4:45:02<47:31,  5.24it/s]   

{'loss': 0.0284, 'learning_rate': 0.0005840367438524125, 'epoch': 401.0}


                                                       

{'eval_loss': 0.44766584038734436, 'eval_bleu': 17.8809, 'eval_gen_len': 5.3468, 'eval_runtime': 7.3303, 'eval_samples_per_second': 40.517, 'eval_steps_per_second': 2.592, 'epoch': 401.0}


 80%|████████  | 60702/75500 [4:45:34<39:25,  6.25it/s]   

{'loss': 0.0288, 'learning_rate': 0.0005781456584601804, 'epoch': 402.0}


                                                       

{'eval_loss': 0.4455082416534424, 'eval_bleu': 14.2839, 'eval_gen_len': 5.5892, 'eval_runtime': 6.6367, 'eval_samples_per_second': 44.751, 'eval_steps_per_second': 2.863, 'epoch': 402.0}


 81%|████████  | 60853/75500 [4:46:05<35:24,  6.89it/s]   

{'loss': 0.0299, 'learning_rate': 0.0005722545730679484, 'epoch': 403.0}


                                                       

{'eval_loss': 0.44754764437675476, 'eval_bleu': 15.533, 'eval_gen_len': 4.9293, 'eval_runtime': 7.2896, 'eval_samples_per_second': 40.743, 'eval_steps_per_second': 2.606, 'epoch': 403.0}


 81%|████████  | 61004/75500 [4:46:38<39:10,  6.17it/s]   

{'loss': 0.03, 'learning_rate': 0.0005663634876757163, 'epoch': 404.0}


                                                       

{'eval_loss': 0.44707614183425903, 'eval_bleu': 15.1874, 'eval_gen_len': 5.1953, 'eval_runtime': 6.8038, 'eval_samples_per_second': 43.652, 'eval_steps_per_second': 2.793, 'epoch': 404.0}


 81%|████████  | 61155/75500 [4:47:08<35:40,  6.70it/s]  

{'loss': 0.0286, 'learning_rate': 0.0005604724022834842, 'epoch': 405.0}


                                                       

{'eval_loss': 0.45330512523651123, 'eval_bleu': 14.2858, 'eval_gen_len': 5.4545, 'eval_runtime': 6.6974, 'eval_samples_per_second': 44.346, 'eval_steps_per_second': 2.837, 'epoch': 405.0}


 81%|████████  | 61306/75500 [4:47:41<35:37,  6.64it/s]   

{'loss': 0.0294, 'learning_rate': 0.0005545813168912521, 'epoch': 406.0}


                                                       

{'eval_loss': 0.4498440623283386, 'eval_bleu': 17.6565, 'eval_gen_len': 5.2761, 'eval_runtime': 7.0119, 'eval_samples_per_second': 42.357, 'eval_steps_per_second': 2.71, 'epoch': 406.0}


 81%|████████▏ | 61457/75500 [4:48:13<36:17,  6.45it/s]  

{'loss': 0.029, 'learning_rate': 0.00054869023149902, 'epoch': 407.0}


                                                       

{'eval_loss': 0.45663928985595703, 'eval_bleu': 13.8433, 'eval_gen_len': 5.4276, 'eval_runtime': 6.7081, 'eval_samples_per_second': 44.275, 'eval_steps_per_second': 2.832, 'epoch': 407.0}


 82%|████████▏ | 61608/75500 [4:48:44<34:49,  6.65it/s]  

{'loss': 0.0306, 'learning_rate': 0.000542799146106788, 'epoch': 408.0}


                                                       

{'eval_loss': 0.46802836656570435, 'eval_bleu': 16.314, 'eval_gen_len': 5.229, 'eval_runtime': 6.8692, 'eval_samples_per_second': 43.237, 'eval_steps_per_second': 2.766, 'epoch': 408.0}


 82%|████████▏ | 61759/75500 [4:49:21<56:00,  4.09it/s]  

{'loss': 0.0298, 'learning_rate': 0.0005369080607145559, 'epoch': 409.0}


                                                       

{'eval_loss': 0.44674158096313477, 'eval_bleu': 16.6359, 'eval_gen_len': 5.6566, 'eval_runtime': 11.2679, 'eval_samples_per_second': 26.358, 'eval_steps_per_second': 1.686, 'epoch': 409.0}


 82%|████████▏ | 61910/75500 [4:50:03<45:50,  4.94it/s]   

{'loss': 0.0286, 'learning_rate': 0.0005310169753223238, 'epoch': 410.0}


                                                       

{'eval_loss': 0.44980546832084656, 'eval_bleu': 17.2045, 'eval_gen_len': 4.9125, 'eval_runtime': 7.384, 'eval_samples_per_second': 40.222, 'eval_steps_per_second': 2.573, 'epoch': 410.0}


 82%|████████▏ | 62061/75500 [4:50:39<39:24,  5.68it/s]   

{'loss': 0.0274, 'learning_rate': 0.0005251258899300917, 'epoch': 411.0}


                                                       

{'eval_loss': 0.4400363564491272, 'eval_bleu': 15.3169, 'eval_gen_len': 5.4781, 'eval_runtime': 6.6327, 'eval_samples_per_second': 44.778, 'eval_steps_per_second': 2.865, 'epoch': 411.0}


 82%|████████▏ | 62212/75500 [4:51:11<42:52,  5.16it/s]  

{'loss': 0.0268, 'learning_rate': 0.0005192348045378595, 'epoch': 412.0}


                                                       

{'eval_loss': 0.4504562020301819, 'eval_bleu': 16.7009, 'eval_gen_len': 5.0135, 'eval_runtime': 7.0706, 'eval_samples_per_second': 42.005, 'eval_steps_per_second': 2.687, 'epoch': 412.0}


 83%|████████▎ | 62363/75500 [4:51:45<34:20,  6.38it/s]  

{'loss': 0.0265, 'learning_rate': 0.0005133437191456275, 'epoch': 413.0}


                                                       

{'eval_loss': 0.4494490623474121, 'eval_bleu': 16.7993, 'eval_gen_len': 5.3805, 'eval_runtime': 6.6821, 'eval_samples_per_second': 44.447, 'eval_steps_per_second': 2.843, 'epoch': 413.0}


 83%|████████▎ | 62514/75500 [4:52:20<34:53,  6.20it/s]   

{'loss': 0.026, 'learning_rate': 0.0005074526337533954, 'epoch': 414.0}


                                                       

{'eval_loss': 0.4412469267845154, 'eval_bleu': 18.6503, 'eval_gen_len': 5.2222, 'eval_runtime': 6.5695, 'eval_samples_per_second': 45.209, 'eval_steps_per_second': 2.892, 'epoch': 414.0}


 83%|████████▎ | 62665/75500 [4:52:51<33:05,  6.46it/s]  

{'loss': 0.0245, 'learning_rate': 0.0005015615483611633, 'epoch': 415.0}


                                                       

{'eval_loss': 0.4423464238643646, 'eval_bleu': 18.0056, 'eval_gen_len': 5.367, 'eval_runtime': 8.0655, 'eval_samples_per_second': 36.824, 'eval_steps_per_second': 2.356, 'epoch': 415.0}


 83%|████████▎ | 62816/75500 [4:53:28<38:31,  5.49it/s]   

{'loss': 0.0253, 'learning_rate': 0.0004956704629689312, 'epoch': 416.0}


                                                       

{'eval_loss': 0.4405679404735565, 'eval_bleu': 16.8157, 'eval_gen_len': 5.3098, 'eval_runtime': 9.1156, 'eval_samples_per_second': 32.582, 'eval_steps_per_second': 2.084, 'epoch': 416.0}


 83%|████████▎ | 62967/75500 [4:54:05<41:38,  5.02it/s]   

{'loss': 0.0255, 'learning_rate': 0.0004897793775766992, 'epoch': 417.0}


                                                       

{'eval_loss': 0.44902917742729187, 'eval_bleu': 18.7278, 'eval_gen_len': 5.1684, 'eval_runtime': 7.143, 'eval_samples_per_second': 41.579, 'eval_steps_per_second': 2.66, 'epoch': 417.0}


 84%|████████▎ | 63118/75500 [4:54:39<34:16,  6.02it/s]  

{'loss': 0.0249, 'learning_rate': 0.0004838882921844671, 'epoch': 418.0}


                                                       
 84%|████████▎ | 63118/75500 [4:54:46<34:16,  6.02it/s]

{'eval_loss': 0.4500688314437866, 'eval_bleu': 15.7179, 'eval_gen_len': 5.229, 'eval_runtime': 7.1191, 'eval_samples_per_second': 41.719, 'eval_steps_per_second': 2.669, 'epoch': 418.0}


 84%|████████▍ | 63269/75500 [4:55:12<35:52,  5.68it/s]  

{'loss': 0.0252, 'learning_rate': 0.00047799720679223505, 'epoch': 419.0}


                                                       
 84%|████████▍ | 63269/75500 [4:55:20<35:52,  5.68it/s]

{'eval_loss': 0.44399356842041016, 'eval_bleu': 17.3117, 'eval_gen_len': 5.3872, 'eval_runtime': 7.4368, 'eval_samples_per_second': 39.937, 'eval_steps_per_second': 2.555, 'epoch': 419.0}


 84%|████████▍ | 63420/75500 [4:55:47<31:54,  6.31it/s]  

{'loss': 0.024, 'learning_rate': 0.0004721451352105475, 'epoch': 420.0}


                                                       
 84%|████████▍ | 63420/75500 [4:55:54<31:54,  6.31it/s]

{'eval_loss': 0.4676544964313507, 'eval_bleu': 15.6316, 'eval_gen_len': 4.5859, 'eval_runtime': 7.5067, 'eval_samples_per_second': 39.565, 'eval_steps_per_second': 2.531, 'epoch': 420.0}


 84%|████████▍ | 63571/75500 [4:56:22<30:31,  6.51it/s]  

{'loss': 0.024, 'learning_rate': 0.0004662540498183154, 'epoch': 421.0}


                                                       
 84%|████████▍ | 63571/75500 [4:56:30<30:31,  6.51it/s]

{'eval_loss': 0.4428177773952484, 'eval_bleu': 16.3674, 'eval_gen_len': 5.2458, 'eval_runtime': 7.8711, 'eval_samples_per_second': 37.733, 'eval_steps_per_second': 2.414, 'epoch': 421.0}


 84%|████████▍ | 63722/75500 [4:57:00<48:03,  4.08it/s]  

{'loss': 0.0232, 'learning_rate': 0.0004603629644260833, 'epoch': 422.0}


                                                       
 84%|████████▍ | 63722/75500 [4:57:07<48:03,  4.08it/s]

{'eval_loss': 0.4485437572002411, 'eval_bleu': 15.1438, 'eval_gen_len': 5.165, 'eval_runtime': 7.0095, 'eval_samples_per_second': 42.371, 'eval_steps_per_second': 2.711, 'epoch': 422.0}


 85%|████████▍ | 63873/75500 [4:57:36<38:57,  4.97it/s]  

{'loss': 0.025, 'learning_rate': 0.0004544718790338513, 'epoch': 423.0}


                                                       
 85%|████████▍ | 63873/75500 [4:57:45<38:57,  4.97it/s]

{'eval_loss': 0.44518405199050903, 'eval_bleu': 16.185, 'eval_gen_len': 5.5286, 'eval_runtime': 9.1312, 'eval_samples_per_second': 32.526, 'eval_steps_per_second': 2.081, 'epoch': 423.0}


 85%|████████▍ | 64024/75500 [4:58:16<42:25,  4.51it/s]   

{'loss': 0.0259, 'learning_rate': 0.00044858079364161924, 'epoch': 424.0}


                                                       
 85%|████████▍ | 64024/75500 [4:58:24<42:25,  4.51it/s]

{'eval_loss': 0.4350792467594147, 'eval_bleu': 16.0279, 'eval_gen_len': 5.1279, 'eval_runtime': 7.5111, 'eval_samples_per_second': 39.541, 'eval_steps_per_second': 2.53, 'epoch': 424.0}


 85%|████████▌ | 64175/75500 [4:58:54<32:22,  5.83it/s]  

{'loss': 0.0247, 'learning_rate': 0.0004426897082493871, 'epoch': 425.0}


                                                       
 85%|████████▌ | 64175/75500 [4:59:02<32:22,  5.83it/s]

{'eval_loss': 0.4352477490901947, 'eval_bleu': 17.4661, 'eval_gen_len': 5.2626, 'eval_runtime': 8.1047, 'eval_samples_per_second': 36.645, 'eval_steps_per_second': 2.344, 'epoch': 425.0}


 85%|████████▌ | 64326/75500 [4:59:28<25:43,  7.24it/s]  

{'loss': 0.0241, 'learning_rate': 0.00043679862285715503, 'epoch': 426.0}


                                                       
 85%|████████▌ | 64326/75500 [4:59:36<25:43,  7.24it/s]

{'eval_loss': 0.44744041562080383, 'eval_bleu': 15.8372, 'eval_gen_len': 5.1785, 'eval_runtime': 7.5225, 'eval_samples_per_second': 39.482, 'eval_steps_per_second': 2.526, 'epoch': 426.0}


 85%|████████▌ | 64477/75500 [5:00:03<27:18,  6.73it/s]  

{'loss': 0.0237, 'learning_rate': 0.00043090753746492295, 'epoch': 427.0}


                                                       
 85%|████████▌ | 64477/75500 [5:00:10<27:18,  6.73it/s]

{'eval_loss': 0.4488372206687927, 'eval_bleu': 14.1898, 'eval_gen_len': 5.1751, 'eval_runtime': 7.5283, 'eval_samples_per_second': 39.451, 'eval_steps_per_second': 2.524, 'epoch': 427.0}


 86%|████████▌ | 64628/75500 [5:00:36<26:57,  6.72it/s]  

{'loss': 0.0235, 'learning_rate': 0.00042501645207269087, 'epoch': 428.0}


                                                       
 86%|████████▌ | 64628/75500 [5:00:43<26:57,  6.72it/s]

{'eval_loss': 0.43851447105407715, 'eval_bleu': 17.2818, 'eval_gen_len': 5.2896, 'eval_runtime': 6.9924, 'eval_samples_per_second': 42.474, 'eval_steps_per_second': 2.717, 'epoch': 428.0}


 86%|████████▌ | 64779/75500 [5:01:08<35:13,  5.07it/s]  

{'loss': 0.022, 'learning_rate': 0.0004191253666804588, 'epoch': 429.0}


                                                       
 86%|████████▌ | 64779/75500 [5:01:16<35:13,  5.07it/s]

{'eval_loss': 0.4391949474811554, 'eval_bleu': 20.1177, 'eval_gen_len': 5.0135, 'eval_runtime': 7.9447, 'eval_samples_per_second': 37.384, 'eval_steps_per_second': 2.392, 'epoch': 429.0}


 86%|████████▌ | 64930/75500 [5:01:49<33:07,  5.32it/s]  

{'loss': 0.0223, 'learning_rate': 0.00041323428128822666, 'epoch': 430.0}


                                                       
 86%|████████▌ | 64930/75500 [5:01:58<33:07,  5.32it/s]

{'eval_loss': 0.44459468126296997, 'eval_bleu': 19.6365, 'eval_gen_len': 5.2323, 'eval_runtime': 8.8297, 'eval_samples_per_second': 33.637, 'eval_steps_per_second': 2.152, 'epoch': 430.0}


 86%|████████▌ | 65081/75500 [5:02:26<25:35,  6.79it/s]  

{'loss': 0.0215, 'learning_rate': 0.0004073431958959946, 'epoch': 431.0}


                                                       
 86%|████████▌ | 65081/75500 [5:02:33<25:35,  6.79it/s]

{'eval_loss': 0.4516051709651947, 'eval_bleu': 19.3731, 'eval_gen_len': 4.9226, 'eval_runtime': 6.7904, 'eval_samples_per_second': 43.738, 'eval_steps_per_second': 2.798, 'epoch': 431.0}


 86%|████████▋ | 65232/75500 [5:03:02<26:14,  6.52it/s]  

{'loss': 0.0221, 'learning_rate': 0.0004014521105037625, 'epoch': 432.0}


                                                       
 86%|████████▋ | 65232/75500 [5:03:10<26:14,  6.52it/s]

{'eval_loss': 0.45638638734817505, 'eval_bleu': 19.3666, 'eval_gen_len': 4.7172, 'eval_runtime': 8.5358, 'eval_samples_per_second': 34.795, 'eval_steps_per_second': 2.226, 'epoch': 432.0}


 87%|████████▋ | 65383/75500 [5:03:41<32:03,  5.26it/s]  

{'loss': 0.022, 'learning_rate': 0.0003955610251115304, 'epoch': 433.0}


                                                       
 87%|████████▋ | 65383/75500 [5:03:49<32:03,  5.26it/s]

{'eval_loss': 0.4518551230430603, 'eval_bleu': 18.3353, 'eval_gen_len': 4.9832, 'eval_runtime': 7.8035, 'eval_samples_per_second': 38.06, 'eval_steps_per_second': 2.435, 'epoch': 433.0}


 87%|████████▋ | 65534/75500 [5:04:20<51:19,  3.24it/s]  

{'loss': 0.0207, 'learning_rate': 0.00038966993971929835, 'epoch': 434.0}


                                                       
 87%|████████▋ | 65534/75500 [5:04:29<51:19,  3.24it/s]

{'eval_loss': 0.45286327600479126, 'eval_bleu': 17.7158, 'eval_gen_len': 4.8956, 'eval_runtime': 8.8969, 'eval_samples_per_second': 33.382, 'eval_steps_per_second': 2.136, 'epoch': 434.0}


 87%|████████▋ | 65685/75500 [5:04:58<30:32,  5.36it/s]  

{'loss': 0.0207, 'learning_rate': 0.0003837788543270662, 'epoch': 435.0}


                                                       
 87%|████████▋ | 65685/75500 [5:05:05<30:32,  5.36it/s]

{'eval_loss': 0.44913604855537415, 'eval_bleu': 19.0654, 'eval_gen_len': 5.1852, 'eval_runtime': 6.8479, 'eval_samples_per_second': 43.371, 'eval_steps_per_second': 2.775, 'epoch': 435.0}


 87%|████████▋ | 65836/75500 [5:05:34<23:42,  6.79it/s]  

{'loss': 0.0208, 'learning_rate': 0.00037788776893483414, 'epoch': 436.0}


                                                       
 87%|████████▋ | 65836/75500 [5:05:42<23:42,  6.79it/s]

{'eval_loss': 0.4437306821346283, 'eval_bleu': 18.1345, 'eval_gen_len': 5.2088, 'eval_runtime': 7.8489, 'eval_samples_per_second': 37.84, 'eval_steps_per_second': 2.421, 'epoch': 436.0}


 87%|████████▋ | 65987/75500 [5:06:11<31:09,  5.09it/s]  

{'loss': 0.0201, 'learning_rate': 0.00037199668354260206, 'epoch': 437.0}


                                                       
 87%|████████▋ | 65987/75500 [5:06:18<31:09,  5.09it/s]

{'eval_loss': 0.44659486413002014, 'eval_bleu': 17.5944, 'eval_gen_len': 5.2828, 'eval_runtime': 7.6946, 'eval_samples_per_second': 38.599, 'eval_steps_per_second': 2.469, 'epoch': 437.0}


 88%|████████▊ | 66138/75500 [5:06:46<32:50,  4.75it/s]  

{'loss': 0.0204, 'learning_rate': 0.00036610559815037003, 'epoch': 438.0}


                                                       
 88%|████████▊ | 66138/75500 [5:06:53<32:50,  4.75it/s]

{'eval_loss': 0.4578809142112732, 'eval_bleu': 18.2172, 'eval_gen_len': 4.9158, 'eval_runtime': 7.4516, 'eval_samples_per_second': 39.857, 'eval_steps_per_second': 2.55, 'epoch': 438.0}


 88%|████████▊ | 66289/75500 [5:07:21<29:17,  5.24it/s]  

{'loss': 0.0205, 'learning_rate': 0.0003602145127581379, 'epoch': 439.0}


                                                       
 88%|████████▊ | 66289/75500 [5:07:28<29:17,  5.24it/s]

{'eval_loss': 0.44323793053627014, 'eval_bleu': 16.3145, 'eval_gen_len': 4.9428, 'eval_runtime': 6.6398, 'eval_samples_per_second': 44.73, 'eval_steps_per_second': 2.862, 'epoch': 439.0}


 88%|████████▊ | 66440/75500 [5:07:55<24:14,  6.23it/s]  

{'loss': 0.0217, 'learning_rate': 0.0003543234273659058, 'epoch': 440.0}


                                                       
 88%|████████▊ | 66440/75500 [5:08:02<24:14,  6.23it/s]

{'eval_loss': 0.4389957785606384, 'eval_bleu': 20.1743, 'eval_gen_len': 5.3771, 'eval_runtime': 7.1516, 'eval_samples_per_second': 41.529, 'eval_steps_per_second': 2.657, 'epoch': 440.0}


 88%|████████▊ | 66591/75500 [5:08:30<27:22,  5.42it/s]  

{'loss': 0.0203, 'learning_rate': 0.00034843234197367374, 'epoch': 441.0}


                                                       
 88%|████████▊ | 66591/75500 [5:08:38<27:22,  5.42it/s]

{'eval_loss': 0.44376206398010254, 'eval_bleu': 17.823, 'eval_gen_len': 5.2795, 'eval_runtime': 7.1221, 'eval_samples_per_second': 41.701, 'eval_steps_per_second': 2.668, 'epoch': 441.0}


 88%|████████▊ | 66742/75500 [5:09:04<25:06,  5.81it/s]  

{'loss': 0.0192, 'learning_rate': 0.00034254125658144167, 'epoch': 442.0}


                                                       
 88%|████████▊ | 66742/75500 [5:09:11<25:06,  5.81it/s]

{'eval_loss': 0.4435098469257355, 'eval_bleu': 19.0787, 'eval_gen_len': 5.1987, 'eval_runtime': 7.5033, 'eval_samples_per_second': 39.583, 'eval_steps_per_second': 2.532, 'epoch': 442.0}


 89%|████████▊ | 66893/75500 [5:09:38<25:16,  5.68it/s]  

{'loss': 0.0185, 'learning_rate': 0.0003366501711892096, 'epoch': 443.0}


                                                       
 89%|████████▊ | 66893/75500 [5:09:46<25:16,  5.68it/s]

{'eval_loss': 0.4405202269554138, 'eval_bleu': 17.6108, 'eval_gen_len': 5.1987, 'eval_runtime': 7.6082, 'eval_samples_per_second': 39.037, 'eval_steps_per_second': 2.497, 'epoch': 443.0}


 89%|████████▉ | 67044/75500 [5:10:12<26:37,  5.29it/s]  

{'loss': 0.0195, 'learning_rate': 0.0003307590857969775, 'epoch': 444.0}


                                                       
 89%|████████▉ | 67044/75500 [5:10:20<26:37,  5.29it/s]

{'eval_loss': 0.4472813606262207, 'eval_bleu': 15.7472, 'eval_gen_len': 5.0236, 'eval_runtime': 7.6005, 'eval_samples_per_second': 39.076, 'eval_steps_per_second': 2.5, 'epoch': 444.0}


 89%|████████▉ | 67195/75500 [5:10:50<27:49,  4.97it/s]  

{'loss': 0.0195, 'learning_rate': 0.00032486800040474543, 'epoch': 445.0}


                                                       
 89%|████████▉ | 67195/75500 [5:10:57<27:49,  4.97it/s]

{'eval_loss': 0.45668894052505493, 'eval_bleu': 17.4271, 'eval_gen_len': 5.1279, 'eval_runtime': 7.3616, 'eval_samples_per_second': 40.345, 'eval_steps_per_second': 2.581, 'epoch': 445.0}


 89%|████████▉ | 67346/75500 [5:11:25<25:05,  5.42it/s]  

{'loss': 0.0187, 'learning_rate': 0.00031897691501251335, 'epoch': 446.0}


                                                       
 89%|████████▉ | 67346/75500 [5:11:33<25:05,  5.42it/s]

{'eval_loss': 0.4585432708263397, 'eval_bleu': 17.1522, 'eval_gen_len': 4.7845, 'eval_runtime': 7.6271, 'eval_samples_per_second': 38.94, 'eval_steps_per_second': 2.491, 'epoch': 446.0}


 89%|████████▉ | 67497/75500 [5:12:01<21:48,  6.11it/s]  

{'loss': 0.0179, 'learning_rate': 0.0003130858296202813, 'epoch': 447.0}


                                                       
 89%|████████▉ | 67497/75500 [5:12:08<21:48,  6.11it/s]

{'eval_loss': 0.4564291834831238, 'eval_bleu': 17.0221, 'eval_gen_len': 4.9529, 'eval_runtime': 6.9808, 'eval_samples_per_second': 42.545, 'eval_steps_per_second': 2.722, 'epoch': 447.0}


 90%|████████▉ | 67648/75500 [5:12:35<24:41,  5.30it/s]  

{'loss': 0.0174, 'learning_rate': 0.00030719474422804914, 'epoch': 448.0}


                                                       
 90%|████████▉ | 67648/75500 [5:12:42<24:41,  5.30it/s]

{'eval_loss': 0.4535679817199707, 'eval_bleu': 18.9682, 'eval_gen_len': 5.0572, 'eval_runtime': 7.3656, 'eval_samples_per_second': 40.323, 'eval_steps_per_second': 2.58, 'epoch': 448.0}


 90%|████████▉ | 67799/75500 [5:13:08<20:02,  6.40it/s]  

{'loss': 0.0174, 'learning_rate': 0.00030130365883581706, 'epoch': 449.0}


                                                       
 90%|████████▉ | 67799/75500 [5:13:16<20:02,  6.40it/s]

{'eval_loss': 0.4548608958721161, 'eval_bleu': 16.9418, 'eval_gen_len': 4.9428, 'eval_runtime': 7.9394, 'eval_samples_per_second': 37.408, 'eval_steps_per_second': 2.393, 'epoch': 449.0}


 90%|█████████ | 67950/75500 [5:13:42<20:04,  6.27it/s]  

{'loss': 0.0167, 'learning_rate': 0.0002954515872541296, 'epoch': 450.0}


                                                       
 90%|█████████ | 67950/75500 [5:13:49<20:04,  6.27it/s]

{'eval_loss': 0.45198413729667664, 'eval_bleu': 18.5716, 'eval_gen_len': 5.2054, 'eval_runtime': 7.1805, 'eval_samples_per_second': 41.362, 'eval_steps_per_second': 2.646, 'epoch': 450.0}


 90%|█████████ | 68101/75500 [5:14:17<25:36,  4.82it/s]  

{'loss': 0.0173, 'learning_rate': 0.0002895605018618975, 'epoch': 451.0}


                                                       
 90%|█████████ | 68101/75500 [5:14:24<25:36,  4.82it/s]

{'eval_loss': 0.45512673258781433, 'eval_bleu': 16.8753, 'eval_gen_len': 5.0, 'eval_runtime': 7.2412, 'eval_samples_per_second': 41.015, 'eval_steps_per_second': 2.624, 'epoch': 451.0}


 90%|█████████ | 68252/75500 [5:14:50<20:00,  6.04it/s]  

{'loss': 0.0176, 'learning_rate': 0.0002836694164696654, 'epoch': 452.0}


                                                       
 90%|█████████ | 68252/75500 [5:14:57<20:00,  6.04it/s]

{'eval_loss': 0.44847413897514343, 'eval_bleu': 18.3359, 'eval_gen_len': 5.1347, 'eval_runtime': 7.084, 'eval_samples_per_second': 41.925, 'eval_steps_per_second': 2.682, 'epoch': 452.0}


 91%|█████████ | 68403/75500 [5:15:23<19:56,  5.93it/s]  

{'loss': 0.0167, 'learning_rate': 0.00027777833107743333, 'epoch': 453.0}


                                                       
 91%|█████████ | 68403/75500 [5:15:31<19:56,  5.93it/s]

{'eval_loss': 0.45969948172569275, 'eval_bleu': 19.6262, 'eval_gen_len': 4.8923, 'eval_runtime': 7.1929, 'eval_samples_per_second': 41.29, 'eval_steps_per_second': 2.641, 'epoch': 453.0}


 91%|█████████ | 68554/75500 [5:15:58<18:33,  6.24it/s]  

{'loss': 0.0162, 'learning_rate': 0.00027188724568520125, 'epoch': 454.0}


                                                       
 91%|█████████ | 68554/75500 [5:16:05<18:33,  6.24it/s]

{'eval_loss': 0.45611119270324707, 'eval_bleu': 19.265, 'eval_gen_len': 5.2424, 'eval_runtime': 6.6376, 'eval_samples_per_second': 44.745, 'eval_steps_per_second': 2.862, 'epoch': 454.0}


 91%|█████████ | 68705/75500 [5:16:30<15:57,  7.09it/s]  

{'loss': 0.0166, 'learning_rate': 0.00026599616029296917, 'epoch': 455.0}


                                                       
 91%|█████████ | 68705/75500 [5:16:38<15:57,  7.09it/s]

{'eval_loss': 0.45887258648872375, 'eval_bleu': 18.1887, 'eval_gen_len': 5.2189, 'eval_runtime': 7.1907, 'eval_samples_per_second': 41.303, 'eval_steps_per_second': 2.642, 'epoch': 455.0}


 91%|█████████ | 68856/75500 [5:17:03<19:46,  5.60it/s]  

{'loss': 0.0162, 'learning_rate': 0.0002601050749007371, 'epoch': 456.0}


                                                       
 91%|█████████ | 68856/75500 [5:17:10<19:46,  5.60it/s]

{'eval_loss': 0.4505019783973694, 'eval_bleu': 20.5131, 'eval_gen_len': 5.0438, 'eval_runtime': 6.6535, 'eval_samples_per_second': 44.638, 'eval_steps_per_second': 2.856, 'epoch': 456.0}


 91%|█████████▏| 69007/75500 [5:17:38<18:59,  5.70it/s]  

{'loss': 0.0156, 'learning_rate': 0.000254213989508505, 'epoch': 457.0}


                                                       
 91%|█████████▏| 69007/75500 [5:17:45<18:59,  5.70it/s]

{'eval_loss': 0.4571772515773773, 'eval_bleu': 17.9533, 'eval_gen_len': 5.2054, 'eval_runtime': 7.7787, 'eval_samples_per_second': 38.181, 'eval_steps_per_second': 2.443, 'epoch': 457.0}


 92%|█████████▏| 69158/75500 [5:18:16<16:29,  6.41it/s]  

{'loss': 0.0156, 'learning_rate': 0.00024832290411627294, 'epoch': 458.0}


                                                       
 92%|█████████▏| 69158/75500 [5:18:24<16:29,  6.41it/s]

{'eval_loss': 0.44723019003868103, 'eval_bleu': 20.1555, 'eval_gen_len': 5.2222, 'eval_runtime': 8.054, 'eval_samples_per_second': 36.876, 'eval_steps_per_second': 2.359, 'epoch': 458.0}


 92%|█████████▏| 69309/75500 [5:18:53<21:39,  4.76it/s]  

{'loss': 0.0149, 'learning_rate': 0.00024243181872404083, 'epoch': 459.0}


                                                       
 92%|█████████▏| 69309/75500 [5:19:01<21:39,  4.76it/s]

{'eval_loss': 0.44564130902290344, 'eval_bleu': 19.6377, 'eval_gen_len': 5.3939, 'eval_runtime': 7.4812, 'eval_samples_per_second': 39.7, 'eval_steps_per_second': 2.54, 'epoch': 459.0}


 92%|█████████▏| 69460/75500 [5:19:30<20:14,  4.97it/s]  

{'loss': 0.0142, 'learning_rate': 0.00023654073333180875, 'epoch': 460.0}


                                                       
 92%|█████████▏| 69460/75500 [5:19:38<20:14,  4.97it/s]

{'eval_loss': 0.45166435837745667, 'eval_bleu': 21.0035, 'eval_gen_len': 5.2054, 'eval_runtime': 8.3854, 'eval_samples_per_second': 35.419, 'eval_steps_per_second': 2.266, 'epoch': 460.0}


 92%|█████████▏| 69611/75500 [5:20:09<16:06,  6.09it/s]  

{'loss': 0.0156, 'learning_rate': 0.00023064964793957667, 'epoch': 461.0}


                                                       
 92%|█████████▏| 69611/75500 [5:20:17<16:06,  6.09it/s]

{'eval_loss': 0.44371840357780457, 'eval_bleu': 20.6479, 'eval_gen_len': 5.3569, 'eval_runtime': 7.516, 'eval_samples_per_second': 39.516, 'eval_steps_per_second': 2.528, 'epoch': 461.0}


 92%|█████████▏| 69762/75500 [5:20:43<17:54,  5.34it/s]  

{'loss': 0.0148, 'learning_rate': 0.00022475856254734457, 'epoch': 462.0}


                                                       
 92%|█████████▏| 69762/75500 [5:20:51<17:54,  5.34it/s]

{'eval_loss': 0.4451574385166168, 'eval_bleu': 19.1693, 'eval_gen_len': 5.3535, 'eval_runtime': 8.0468, 'eval_samples_per_second': 36.909, 'eval_steps_per_second': 2.361, 'epoch': 462.0}


 93%|█████████▎| 69913/75500 [5:21:22<16:01,  5.81it/s]  

{'loss': 0.0152, 'learning_rate': 0.00021886747715511252, 'epoch': 463.0}


                                                       
 93%|█████████▎| 69913/75500 [5:21:29<16:01,  5.81it/s]

{'eval_loss': 0.4500994384288788, 'eval_bleu': 15.4108, 'eval_gen_len': 4.899, 'eval_runtime': 7.6453, 'eval_samples_per_second': 38.847, 'eval_steps_per_second': 2.485, 'epoch': 463.0}


 93%|█████████▎| 70064/75500 [5:21:56<18:35,  4.87it/s]  

{'loss': 0.0146, 'learning_rate': 0.00021297639176288044, 'epoch': 464.0}


                                                       
 93%|█████████▎| 70064/75500 [5:22:04<18:35,  4.87it/s]

{'eval_loss': 0.4622388780117035, 'eval_bleu': 18.6133, 'eval_gen_len': 4.8316, 'eval_runtime': 7.6921, 'eval_samples_per_second': 38.611, 'eval_steps_per_second': 2.47, 'epoch': 464.0}


 93%|█████████▎| 70215/75500 [5:22:31<15:19,  5.75it/s]  

{'loss': 0.0152, 'learning_rate': 0.0002071243201811929, 'epoch': 465.0}


                                                       
 93%|█████████▎| 70215/75500 [5:22:39<15:19,  5.75it/s]

{'eval_loss': 0.44227713346481323, 'eval_bleu': 19.5611, 'eval_gen_len': 5.2256, 'eval_runtime': 8.3662, 'eval_samples_per_second': 35.5, 'eval_steps_per_second': 2.271, 'epoch': 465.0}


 93%|█████████▎| 70366/75500 [5:23:11<15:01,  5.69it/s]  

{'loss': 0.0136, 'learning_rate': 0.00020123323478896083, 'epoch': 466.0}


                                                       
 93%|█████████▎| 70366/75500 [5:23:20<15:01,  5.69it/s]

{'eval_loss': 0.4563552439212799, 'eval_bleu': 17.5293, 'eval_gen_len': 5.0741, 'eval_runtime': 9.1036, 'eval_samples_per_second': 32.624, 'eval_steps_per_second': 2.087, 'epoch': 466.0}


 93%|█████████▎| 70517/75500 [5:23:46<13:39,  6.08it/s]  

{'loss': 0.013, 'learning_rate': 0.00019534214939672873, 'epoch': 467.0}


                                                       
 93%|█████████▎| 70517/75500 [5:23:55<13:39,  6.08it/s]

{'eval_loss': 0.4487968981266022, 'eval_bleu': 18.5855, 'eval_gen_len': 5.1684, 'eval_runtime': 8.4017, 'eval_samples_per_second': 35.35, 'eval_steps_per_second': 2.261, 'epoch': 467.0}


 94%|█████████▎| 70668/75500 [5:24:24<13:48,  5.83it/s]  

{'loss': 0.0125, 'learning_rate': 0.0001894510640044967, 'epoch': 468.0}


                                                       
 94%|█████████▎| 70668/75500 [5:24:31<13:48,  5.83it/s]

{'eval_loss': 0.4455564320087433, 'eval_bleu': 18.0055, 'eval_gen_len': 5.266, 'eval_runtime': 7.1874, 'eval_samples_per_second': 41.322, 'eval_steps_per_second': 2.644, 'epoch': 468.0}


 94%|█████████▍| 70819/75500 [5:25:00<12:28,  6.25it/s]  

{'loss': 0.0123, 'learning_rate': 0.00018355997861226457, 'epoch': 469.0}


                                                       
 94%|█████████▍| 70819/75500 [5:25:10<12:28,  6.25it/s]

{'eval_loss': 0.46174728870391846, 'eval_bleu': 19.1825, 'eval_gen_len': 5.3333, 'eval_runtime': 9.8174, 'eval_samples_per_second': 30.253, 'eval_steps_per_second': 1.935, 'epoch': 469.0}


 94%|█████████▍| 70970/75500 [5:25:38<11:28,  6.58it/s]  

{'loss': 0.0126, 'learning_rate': 0.00017766889322003252, 'epoch': 470.0}


                                                       
 94%|█████████▍| 70970/75500 [5:25:44<11:28,  6.58it/s]

{'eval_loss': 0.4576067626476288, 'eval_bleu': 19.0699, 'eval_gen_len': 5.2525, 'eval_runtime': 6.4825, 'eval_samples_per_second': 45.816, 'eval_steps_per_second': 2.931, 'epoch': 470.0}


 94%|█████████▍| 71121/75500 [5:26:10<12:09,  6.00it/s]  

{'loss': 0.0126, 'learning_rate': 0.0001717778078278004, 'epoch': 471.0}


                                                       
 94%|█████████▍| 71121/75500 [5:26:18<12:09,  6.00it/s]

{'eval_loss': 0.4571508467197418, 'eval_bleu': 18.5564, 'eval_gen_len': 5.2862, 'eval_runtime': 7.26, 'eval_samples_per_second': 40.909, 'eval_steps_per_second': 2.617, 'epoch': 471.0}


 94%|█████████▍| 71272/75500 [5:26:45<11:01,  6.39it/s]  

{'loss': 0.0127, 'learning_rate': 0.00016588672243556833, 'epoch': 472.0}


                                                       
 94%|█████████▍| 71272/75500 [5:26:51<11:01,  6.39it/s]

{'eval_loss': 0.46528980135917664, 'eval_bleu': 18.2916, 'eval_gen_len': 5.2391, 'eval_runtime': 6.2901, 'eval_samples_per_second': 47.217, 'eval_steps_per_second': 3.021, 'epoch': 472.0}


 95%|█████████▍| 71423/75500 [5:27:16<09:37,  7.06it/s]  

{'loss': 0.0119, 'learning_rate': 0.00015999563704333626, 'epoch': 473.0}


                                                       
 95%|█████████▍| 71423/75500 [5:27:24<09:37,  7.06it/s]

{'eval_loss': 0.46760883927345276, 'eval_bleu': 17.7619, 'eval_gen_len': 5.0337, 'eval_runtime': 7.3735, 'eval_samples_per_second': 40.279, 'eval_steps_per_second': 2.577, 'epoch': 473.0}


 95%|█████████▍| 71574/75500 [5:27:53<11:37,  5.63it/s]  

{'loss': 0.0119, 'learning_rate': 0.00015410455165110418, 'epoch': 474.0}


                                                       
 95%|█████████▍| 71574/75500 [5:28:01<11:37,  5.63it/s]

{'eval_loss': 0.4641357362270355, 'eval_bleu': 19.5732, 'eval_gen_len': 4.9899, 'eval_runtime': 7.5723, 'eval_samples_per_second': 39.222, 'eval_steps_per_second': 2.509, 'epoch': 474.0}


 95%|█████████▌| 71725/75500 [5:28:26<08:56,  7.03it/s]  

{'loss': 0.0114, 'learning_rate': 0.0001482134662588721, 'epoch': 475.0}


                                                       
 95%|█████████▌| 71725/75500 [5:28:33<08:56,  7.03it/s]

{'eval_loss': 0.45794811844825745, 'eval_bleu': 18.0637, 'eval_gen_len': 5.2761, 'eval_runtime': 6.9264, 'eval_samples_per_second': 42.879, 'eval_steps_per_second': 2.743, 'epoch': 475.0}


 95%|█████████▌| 71876/75500 [5:28:58<09:24,  6.42it/s]  

{'loss': 0.0113, 'learning_rate': 0.00014232238086664, 'epoch': 476.0}


                                                       
 95%|█████████▌| 71876/75500 [5:29:05<09:24,  6.42it/s]

{'eval_loss': 0.467293381690979, 'eval_bleu': 20.5474, 'eval_gen_len': 4.9327, 'eval_runtime': 6.6195, 'eval_samples_per_second': 44.868, 'eval_steps_per_second': 2.87, 'epoch': 476.0}


 95%|█████████▌| 72027/75500 [5:29:30<09:29,  6.09it/s]  

{'loss': 0.011, 'learning_rate': 0.00013643129547440792, 'epoch': 477.0}


                                                       
 95%|█████████▌| 72027/75500 [5:29:37<09:29,  6.09it/s]

{'eval_loss': 0.46116071939468384, 'eval_bleu': 20.0222, 'eval_gen_len': 5.1448, 'eval_runtime': 7.0853, 'eval_samples_per_second': 41.918, 'eval_steps_per_second': 2.682, 'epoch': 477.0}


 96%|█████████▌| 72178/75500 [5:30:08<13:30,  4.10it/s]  

{'loss': 0.0111, 'learning_rate': 0.00013054021008217584, 'epoch': 478.0}


                                                       
 96%|█████████▌| 72178/75500 [5:30:18<13:30,  4.10it/s]

{'eval_loss': 0.4538271129131317, 'eval_bleu': 18.8719, 'eval_gen_len': 5.2559, 'eval_runtime': 9.8195, 'eval_samples_per_second': 30.246, 'eval_steps_per_second': 1.935, 'epoch': 478.0}


 96%|█████████▌| 72329/75500 [5:30:52<11:23,  4.64it/s]  

{'loss': 0.0111, 'learning_rate': 0.00012464912468994376, 'epoch': 479.0}


                                                       
 96%|█████████▌| 72329/75500 [5:31:01<11:23,  4.64it/s]

{'eval_loss': 0.45800504088401794, 'eval_bleu': 18.9043, 'eval_gen_len': 5.1414, 'eval_runtime': 9.2926, 'eval_samples_per_second': 31.961, 'eval_steps_per_second': 2.045, 'epoch': 479.0}


 96%|█████████▌| 72480/75500 [5:31:32<09:31,  5.29it/s]  

{'loss': 0.0108, 'learning_rate': 0.00011875803929771167, 'epoch': 480.0}


                                                       
 96%|█████████▌| 72480/75500 [5:31:40<09:31,  5.29it/s]

{'eval_loss': 0.4603463113307953, 'eval_bleu': 18.4849, 'eval_gen_len': 5.1616, 'eval_runtime': 7.818, 'eval_samples_per_second': 37.989, 'eval_steps_per_second': 2.43, 'epoch': 480.0}


 96%|█████████▌| 72631/75500 [5:32:09<09:38,  4.96it/s]  

{'loss': 0.0105, 'learning_rate': 0.00011286695390547958, 'epoch': 481.0}


                                                       
 96%|█████████▌| 72631/75500 [5:32:16<09:38,  4.96it/s]

{'eval_loss': 0.4551814794540405, 'eval_bleu': 20.2356, 'eval_gen_len': 5.2963, 'eval_runtime': 7.6318, 'eval_samples_per_second': 38.916, 'eval_steps_per_second': 2.49, 'epoch': 481.0}


 96%|█████████▋| 72782/75500 [5:32:41<06:22,  7.10it/s]  

{'loss': 0.0101, 'learning_rate': 0.00010697586851324751, 'epoch': 482.0}


                                                       
 96%|█████████▋| 72782/75500 [5:32:47<06:22,  7.10it/s]

{'eval_loss': 0.45219242572784424, 'eval_bleu': 19.1875, 'eval_gen_len': 5.2492, 'eval_runtime': 6.3384, 'eval_samples_per_second': 46.857, 'eval_steps_per_second': 2.998, 'epoch': 482.0}


 97%|█████████▋| 72933/75500 [5:33:11<06:43,  6.35it/s]  

{'loss': 0.0101, 'learning_rate': 0.00010108478312101542, 'epoch': 483.0}


                                                       
 97%|█████████▋| 72933/75500 [5:33:18<06:43,  6.35it/s]

{'eval_loss': 0.46097779273986816, 'eval_bleu': 20.268, 'eval_gen_len': 5.1145, 'eval_runtime': 6.3355, 'eval_samples_per_second': 46.879, 'eval_steps_per_second': 2.999, 'epoch': 483.0}


 97%|█████████▋| 73084/75500 [5:33:43<06:21,  6.34it/s]  

{'loss': 0.0099, 'learning_rate': 9.519369772878333e-05, 'epoch': 484.0}


                                                       
 97%|█████████▋| 73084/75500 [5:33:49<06:21,  6.34it/s]

{'eval_loss': 0.45771291851997375, 'eval_bleu': 19.1149, 'eval_gen_len': 5.1448, 'eval_runtime': 6.308, 'eval_samples_per_second': 47.083, 'eval_steps_per_second': 3.012, 'epoch': 484.0}


 97%|█████████▋| 73235/75500 [5:34:13<06:10,  6.11it/s]  

{'loss': 0.0101, 'learning_rate': 8.934162614709584e-05, 'epoch': 485.0}


                                                       
 97%|█████████▋| 73235/75500 [5:34:19<06:10,  6.11it/s]

{'eval_loss': 0.46697089076042175, 'eval_bleu': 19.3269, 'eval_gen_len': 5.0505, 'eval_runtime': 6.4149, 'eval_samples_per_second': 46.299, 'eval_steps_per_second': 2.962, 'epoch': 485.0}


 97%|█████████▋| 73386/75500 [5:34:45<05:22,  6.56it/s]  

{'loss': 0.0098, 'learning_rate': 8.345054075486375e-05, 'epoch': 486.0}


                                                       
 97%|█████████▋| 73386/75500 [5:34:52<05:22,  6.56it/s]

{'eval_loss': 0.4658014178276062, 'eval_bleu': 18.4325, 'eval_gen_len': 5.1178, 'eval_runtime': 6.6585, 'eval_samples_per_second': 44.605, 'eval_steps_per_second': 2.853, 'epoch': 486.0}


 97%|█████████▋| 73537/75500 [5:35:18<05:12,  6.27it/s]  

{'loss': 0.0094, 'learning_rate': 7.755945536263167e-05, 'epoch': 487.0}


                                                       
 97%|█████████▋| 73537/75500 [5:35:27<05:12,  6.27it/s]

{'eval_loss': 0.47225967049598694, 'eval_bleu': 17.8146, 'eval_gen_len': 5.0909, 'eval_runtime': 9.0373, 'eval_samples_per_second': 32.864, 'eval_steps_per_second': 2.102, 'epoch': 487.0}


 98%|█████████▊| 73688/75500 [5:35:57<04:36,  6.55it/s]  

{'loss': 0.0088, 'learning_rate': 7.166836997039958e-05, 'epoch': 488.0}


                                                       
 98%|█████████▊| 73688/75500 [5:36:05<04:36,  6.55it/s]

{'eval_loss': 0.4707978665828705, 'eval_bleu': 18.3079, 'eval_gen_len': 5.2155, 'eval_runtime': 8.0852, 'eval_samples_per_second': 36.734, 'eval_steps_per_second': 2.35, 'epoch': 488.0}


 98%|█████████▊| 73839/75500 [5:36:32<04:32,  6.09it/s]  

{'loss': 0.009, 'learning_rate': 6.57772845781675e-05, 'epoch': 489.0}


                                                       
 98%|█████████▊| 73839/75500 [5:36:39<04:32,  6.09it/s]

{'eval_loss': 0.4743085205554962, 'eval_bleu': 19.3226, 'eval_gen_len': 5.1111, 'eval_runtime': 7.0472, 'eval_samples_per_second': 42.144, 'eval_steps_per_second': 2.696, 'epoch': 489.0}


 98%|█████████▊| 73990/75500 [5:37:06<03:41,  6.83it/s]  

{'loss': 0.009, 'learning_rate': 5.9886199185935414e-05, 'epoch': 490.0}


                                                       
 98%|█████████▊| 73990/75500 [5:37:13<03:41,  6.83it/s]

{'eval_loss': 0.469818115234375, 'eval_bleu': 18.3763, 'eval_gen_len': 5.1279, 'eval_runtime': 6.6846, 'eval_samples_per_second': 44.431, 'eval_steps_per_second': 2.842, 'epoch': 490.0}


 98%|█████████▊| 74141/75500 [5:37:43<03:50,  5.91it/s]  

{'loss': 0.0087, 'learning_rate': 5.3995113793703336e-05, 'epoch': 491.0}


                                                       
 98%|█████████▊| 74141/75500 [5:37:51<03:50,  5.91it/s]

{'eval_loss': 0.4727153778076172, 'eval_bleu': 18.3388, 'eval_gen_len': 5.0909, 'eval_runtime': 8.2276, 'eval_samples_per_second': 36.098, 'eval_steps_per_second': 2.309, 'epoch': 491.0}


 98%|█████████▊| 74292/75500 [5:38:20<03:14,  6.22it/s]  

{'loss': 0.0085, 'learning_rate': 4.810402840147125e-05, 'epoch': 492.0}


                                                       
 98%|█████████▊| 74292/75500 [5:38:27<03:14,  6.22it/s]

{'eval_loss': 0.4748472571372986, 'eval_bleu': 19.3864, 'eval_gen_len': 5.0202, 'eval_runtime': 7.0534, 'eval_samples_per_second': 42.107, 'eval_steps_per_second': 2.694, 'epoch': 492.0}


 99%|█████████▊| 74443/75500 [5:38:59<04:10,  4.22it/s]

{'loss': 0.0084, 'learning_rate': 4.2212943009239166e-05, 'epoch': 493.0}


                                                       
 99%|█████████▊| 74443/75500 [5:39:06<04:10,  4.22it/s]

{'eval_loss': 0.46810850501060486, 'eval_bleu': 18.94, 'eval_gen_len': 5.2121, 'eval_runtime': 6.9849, 'eval_samples_per_second': 42.52, 'eval_steps_per_second': 2.72, 'epoch': 493.0}


 99%|█████████▉| 74594/75500 [5:39:36<02:38,  5.70it/s]

{'loss': 0.0083, 'learning_rate': 3.632185761700708e-05, 'epoch': 494.0}


                                                       
 99%|█████████▉| 74594/75500 [5:39:42<02:38,  5.70it/s]

{'eval_loss': 0.4710828363895416, 'eval_bleu': 19.0243, 'eval_gen_len': 5.0943, 'eval_runtime': 6.6133, 'eval_samples_per_second': 44.909, 'eval_steps_per_second': 2.873, 'epoch': 494.0}


 99%|█████████▉| 74745/75500 [5:40:06<01:45,  7.14it/s]

{'loss': 0.0079, 'learning_rate': 3.0430772224775002e-05, 'epoch': 495.0}


                                                       
 99%|█████████▉| 74745/75500 [5:40:12<01:45,  7.14it/s]

{'eval_loss': 0.4694432318210602, 'eval_bleu': 18.6767, 'eval_gen_len': 5.2626, 'eval_runtime': 6.4172, 'eval_samples_per_second': 46.282, 'eval_steps_per_second': 2.961, 'epoch': 495.0}


 99%|█████████▉| 74896/75500 [5:40:38<01:33,  6.46it/s]

{'loss': 0.008, 'learning_rate': 2.453968683254292e-05, 'epoch': 496.0}


                                                       
 99%|█████████▉| 74896/75500 [5:40:45<01:33,  6.46it/s]

{'eval_loss': 0.47067537903785706, 'eval_bleu': 17.9649, 'eval_gen_len': 5.1953, 'eval_runtime': 6.6731, 'eval_samples_per_second': 44.507, 'eval_steps_per_second': 2.847, 'epoch': 496.0}


 99%|█████████▉| 75047/75500 [5:41:10<01:27,  5.17it/s]

{'loss': 0.0076, 'learning_rate': 1.8648601440310832e-05, 'epoch': 497.0}


                                                       
 99%|█████████▉| 75047/75500 [5:41:17<01:27,  5.17it/s]

{'eval_loss': 0.4714444875717163, 'eval_bleu': 18.0617, 'eval_gen_len': 5.2357, 'eval_runtime': 7.0593, 'eval_samples_per_second': 42.072, 'eval_steps_per_second': 2.691, 'epoch': 497.0}


100%|█████████▉| 75198/75500 [5:41:44<00:50,  5.94it/s]

{'loss': 0.0077, 'learning_rate': 1.275751604807875e-05, 'epoch': 498.0}


                                                       
100%|█████████▉| 75198/75500 [5:41:51<00:50,  5.94it/s]

{'eval_loss': 0.47351258993148804, 'eval_bleu': 18.0929, 'eval_gen_len': 5.2458, 'eval_runtime': 6.7888, 'eval_samples_per_second': 43.749, 'eval_steps_per_second': 2.799, 'epoch': 498.0}


100%|█████████▉| 75349/75500 [5:42:15<00:26,  5.72it/s]

{'loss': 0.0075, 'learning_rate': 6.866430655846666e-06, 'epoch': 499.0}


                                                       
100%|█████████▉| 75349/75500 [5:42:22<00:26,  5.72it/s]

{'eval_loss': 0.4736345708370209, 'eval_bleu': 18.0159, 'eval_gen_len': 5.2222, 'eval_runtime': 6.5953, 'eval_samples_per_second': 45.032, 'eval_steps_per_second': 2.881, 'epoch': 499.0}


100%|██████████| 75500/75500 [5:42:48<00:00,  6.23it/s]

{'loss': 0.0075, 'learning_rate': 1.0143590741591668e-06, 'epoch': 500.0}


                                                       
100%|██████████| 75500/75500 [5:42:55<00:00,  6.23it/s]

{'eval_loss': 0.4737825095653534, 'eval_bleu': 17.9489, 'eval_gen_len': 5.2391, 'eval_runtime': 6.9379, 'eval_samples_per_second': 42.808, 'eval_steps_per_second': 2.739, 'epoch': 500.0}


100%|██████████| 75500/75500 [5:42:56<00:00,  3.67it/s]

{'train_runtime': 20576.0209, 'train_samples_per_second': 58.393, 'train_steps_per_second': 3.669, 'train_loss': 0.07788922929763795, 'epoch': 500.0}





TrainOutput(global_step=75500, training_loss=0.07788922929763795, metrics={'train_runtime': 20576.0209, 'train_samples_per_second': 58.393, 'train_steps_per_second': 3.669, 'train_loss': 0.07788922929763795, 'epoch': 500.0})

In [16]:
trainer.state.best_model_checkpoint

'data/checkpoints/t5_results_fw_v2_3\\checkpoint-69460'
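
The checkpoint name above is a global step, not an epoch. The training log reports 75500 steps for 500 epochs, i.e. 151 steps per epoch, so `checkpoint-69460` would correspond to epoch 460. A minimal sketch of that conversion follows; the helper name `checkpoint_epoch` is hypothetical and it assumes the folder is named `checkpoint-<global_step>`, as the path suggests.

```python
import re


def checkpoint_epoch(checkpoint_path: str, total_steps: int, total_epochs: int) -> float:
    """Map a 'checkpoint-<step>' folder name to the epoch at which it was saved.

    Hypothetical helper: assumes every epoch has the same number of steps.
    """
    # extract the global step from the end of the path
    match = re.search(r"checkpoint-(\d+)$", checkpoint_path)
    if match is None:
        raise ValueError(f"No 'checkpoint-<step>' suffix in {checkpoint_path!r}")

    step = int(match.group(1))

    # number of optimizer steps in one epoch (75500 / 500 = 151 in the log above)
    steps_per_epoch = total_steps / total_epochs

    return step / steps_per_epoch


best_epoch = checkpoint_epoch('data/checkpoints/t5_results_fw_v2_3\\checkpoint-69460', 75500, 500)
print(best_epoch)  # 460.0
```

The evaluation line logged for that epoch can then be looked up in the training output to read the corresponding `eval_bleu`.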

In [None]:
# # let us save the best model
# trainer.save_state()
# trainer.save_model('data/checkpoints/t5_results_fw_v2_2/')
# trainer._save('data/checkpoints/')

# with open('data/checkpoints/t5_results_fw_v2_2/optimizer.pt', 'wb') as f:
    
#     torch.save(trainer.optimizer.state_dict(), f)
    
# with open('data/checkpoints/t5_results_fw_v2_2/scheduler.pt', 'wb') as f:
    
#     torch.save(trainer.lr_scheduler.state_dict(), f)


In [17]:
# let us get the best model
model = AutoModelForSeq2SeqLM.from_pretrained('data/checkpoints/t5_results_fw_v2_3/checkpoint-69460')

### Predictions

Let us generate a translation for every sentence of the test set and store the results in a DataFrame.

In [18]:

# set the model to eval mode
_ = model.eval()

# run model inference on all test data
original_translations, predicted_translations, original_texts, scores = [], [], [], {}

for data, attention_mask, labels in tqdm(DataLoader(test_dataset)):

    # decode the source sentence and its reference translation
    original_text = tokenizer.decode(data[0], skip_special_tokens=True)

    original_translation = tokenizer.decode(labels[0], skip_special_tokens=True)

    # the DataLoader already yields tensors: copy them with clone().detach()
    # (torch.tensor(tensor) raises a UserWarning)
    generated = data.clone().detach()

    attention_mask = attention_mask.clone().detach()

    # retrieve the pad token id
    pad_token_id = tokenizer.pad_token_id

    # perform prediction; with do_sample = False decoding is deterministic,
    # so the sampling parameters (top_k, top_p, temperature) have no effect
    predictions = model.generate(generated, do_sample = False, top_k = 50, max_length = test_dataset.max_len, top_p = 0.90,
                                    temperature = 0, num_return_sequences = 0, attention_mask = attention_mask, pad_token_id = pad_token_id)

    # compute the metrics of this sentence and add them to the running totals
    result = evaluation.compute_metrics((predictions, labels.clone().detach()))

    if not scores: scores.update({k: v for k, v in result.items()})

    else: scores.update({k: round(scores[k] + v, 4) for k, v in result.items()})

    # decode the predicted tokens into texts
    predicted_translation = list(test_dataset.decode(predictions))

    print(predicted_translation[0])

    # append results
    original_translations.append(original_translation)

    predicted_translations.extend(predicted_translation)

    original_texts.append(original_text)

# transform the results into a data frame
df_ft_to_wf = pd.DataFrame({'original_text': original_texts,
                            'original_label': original_translations,
                            'predicted_label': predicted_translations})

# display the first rows
df_ft_to_wf.head()

  0%|          | 1/297 [00:01<06:26,  1.31s/it]

leen


  1%|          | 2/297 [00:02<05:14,  1.07s/it]

dog


  1%|          | 3/297 [00:02<04:36,  1.06it/s]

Du góor.


  1%|▏         | 4/297 [00:03<04:34,  1.07it/s]

Soo demee góor gee ni nit la


  2%|▏         | 5/297 [00:04<04:14,  1.15it/s]

waaraw


  2%|▏         | 6/297 [00:05<04:03,  1.19it/s]

Mbaa...


  2%|▏         | 7/297 [00:06<03:56,  1.23it/s]

xam


  3%|▎         | 8/297 [00:06<03:49,  1.26it/s]

Ndaw senn réerul.


  3%|▎         | 9/297 [00:07<03:50,  1.25it/s]

fatarñi


  3%|▎         | 10/297 [00:08<03:58,  1.21it/s]

Góor gi waxtaan na ag yaw.


  4%|▎         | 11/297 [00:09<03:56,  1.21it/s]

Maa dem


  4%|▍         | 12/297 [00:10<03:48,  1.25it/s]

Ma õgee doon dem


  4%|▍         | 13/297 [00:11<03:46,  1.25it/s]

Waaye man sax dama dul ñëw


  5%|▍         | 14/297 [00:11<03:49,  1.23it/s]

seet


  5%|▌         | 15/297 [00:12<03:46,  1.25it/s]

ci biir


  5%|▌         | 16/297 [00:13<03:51,  1.21it/s]

mpar


  6%|▌         | 17/297 [00:14<03:50,  1.22it/s]

sabar


  6%|▌         | 18/297 [00:15<03:45,  1.24it/s]

joyee


  6%|▋         | 19/297 [00:15<03:41,  1.25it/s]

ñam


  7%|▋         | 20/297 [00:16<03:43,  1.24it/s]

ub


  7%|▋         | 21/297 [00:17<03:41,  1.25it/s]

Góor gi nit la.


  7%|▋         | 22/297 [00:18<03:34,  1.28it/s]

Benn bi rëcc laay seet.


  8%|▊         | 23/297 [00:19<03:43,  1.23it/s]

jiyi


  8%|▊         | 24/297 [00:19<03:39,  1.25it/s]

ñam


  8%|▊         | 25/297 [00:20<03:34,  1.27it/s]

ndax itam


  9%|▉         | 26/297 [00:21<03:34,  1.27it/s]

lemu


  9%|▉         | 27/297 [00:22<03:40,  1.22it/s]

siifal


  9%|▉         | 28/297 [00:23<03:33,  1.26it/s]

waxaale


 10%|▉         | 29/297 [00:23<03:40,  1.22it/s]

Doo dem, xanaa?


 10%|█         | 30/297 [00:24<03:37,  1.23it/s]

takk


 10%|█         | 31/297 [00:25<03:34,  1.24it/s]

Yaa õgi demuloo


 11%|█         | 32/297 [00:26<03:32,  1.25it/s]

Góor gi ñëw.


 11%|█         | 33/297 [00:27<03:32,  1.25it/s]

gannaaw


 11%|█▏        | 34/297 [00:27<03:30,  1.25it/s]

ñi


 12%|█▏        | 35/297 [00:28<03:35,  1.22it/s]

Góor gi dem ba xale?


 12%|█▏        | 36/297 [00:29<03:36,  1.21it/s]

Samba lale


 12%|█▏        | 37/297 [00:30<03:34,  1.21it/s]

li woon


 13%|█▎        | 38/297 [00:31<03:38,  1.19it/s]

Waaye,,, mu m,.


 13%|█▎        | 39/297 [00:32<03:34,  1.20it/s]

Baax na mu tollu


 13%|█▎        | 40/297 [00:32<03:31,  1.21it/s]

pas


 14%|█▍        | 41/297 [00:33<03:37,  1.18it/s]

sopp


 14%|█▍        | 42/297 [00:34<03:33,  1.20it/s]

Bu ñu dem


 14%|█▍        | 43/297 [00:35<03:29,  1.21it/s]

Soo demee, mu ñëw


 15%|█▍        | 44/297 [00:36<03:35,  1.17it/s]

Nit ki ñëw ba ñu xammee ka.


 15%|█▌        | 45/297 [00:37<03:33,  1.18it/s]

Dem na ci biir.


 15%|█▌        | 46/297 [00:38<03:28,  1.21it/s]

Yaa õgile


 16%|█▌        | 47/297 [00:38<03:34,  1.17it/s]

seetu


 16%|█▌        | 48/297 [00:39<03:33,  1.17it/s]

Dafa di dem.


 16%|█▋        | 49/297 [00:40<03:30,  1.18it/s]

foofa


 17%|█▋        | 50/297 [00:41<03:24,  1.21it/s]

nit ku góor


 17%|█▋        | 51/297 [00:42<03:26,  1.19it/s]

araféef raññee


 18%|█▊        | 52/297 [00:43<03:27,  1.18it/s]

wax


 18%|█▊        | 53/297 [00:44<03:32,  1.15it/s]

suufu


 18%|█▊        | 54/297 [00:44<03:25,  1.18it/s]

Yeewal bépp xar


 19%|█▊        | 55/297 [00:45<03:22,  1.19it/s]

Waw ndax nak na du du yar


 19%|█▉        | 56/297 [00:46<03:19,  1.21it/s]

loolule


 19%|█▉        | 57/297 [00:47<03:18,  1.21it/s]

nit nit nit nit nit


 20%|█▉        | 58/297 [00:48<03:23,  1.17it/s]

May na ka keneen ku sawar.


 20%|█▉        | 59/297 [00:49<03:35,  1.10it/s]

loolale


 20%|██        | 60/297 [00:50<03:36,  1.09it/s]

anal


 21%|██        | 61/297 [00:51<03:32,  1.11it/s]

Waaye daawul wax.


 21%|██        | 62/297 [00:51<03:25,  1.14it/s]

Góor gee dem


 21%|██        | 63/297 [00:52<03:20,  1.17it/s]

Waxtaanal ag ñooñu doõõ!


 22%|██▏       | 64/297 [00:53<03:17,  1.18it/s]

Bi mu dee dem


 22%|██▏       | 65/297 [00:54<03:18,  1.17it/s]

Dem õga te dem na.


 22%|██▏       | 66/297 [00:55<03:15,  1.18it/s]

Dem na ca subë.


 23%|██▎       | 67/297 [00:56<03:14,  1.18it/s]

noona nak


 23%|██▎       | 68/297 [00:56<03:09,  1.21it/s]

Gis naa ki woon.


 23%|██▎       | 69/297 [00:57<03:06,  1.23it/s]

toogukaay


 24%|██▎       | 70/297 [00:58<03:06,  1.22it/s]

faj


 24%|██▍       | 71/297 [00:59<03:12,  1.18it/s]

Waa ji demkoon na


 24%|██▍       | 72/297 [01:00<03:11,  1.17it/s]

Yobul na t lekke.


 25%|██▍       | 73/297 [01:01<03:10,  1.18it/s]

ñu


 25%|██▍       | 74/297 [01:02<03:18,  1.12it/s]

wàñ


 25%|██▌       | 75/297 [01:03<03:27,  1.07it/s]

araféegul


 26%|██▌       | 76/297 [01:04<03:37,  1.02it/s]

Góor ñi daawul wax.


 26%|██▌       | 77/297 [01:05<03:30,  1.05it/s]

waxé


 26%|██▋       | 78/297 [01:05<03:18,  1.10it/s]

nit ku góor


 27%|██▋       | 79/297 [01:06<03:16,  1.11it/s]

Fu mu demoon?


 27%|██▋       | 80/297 [01:07<03:09,  1.15it/s]

Liggéeykat yi ñun la.


 27%|██▋       | 81/297 [01:08<03:07,  1.16it/s]

Kooka ak kile, bokkuñu


 28%|██▊       | 82/297 [01:09<03:09,  1.13it/s]

ñappati


 28%|██▊       | 83/297 [01:10<03:04,  1.16it/s]

naw


 28%|██▊       | 84/297 [01:10<02:59,  1.18it/s]

keneen ku jigéen


 29%|██▊       | 85/297 [01:11<02:57,  1.19it/s]

Ñii daawuñu dem.


 29%|██▉       | 86/297 [01:12<02:57,  1.19it/s]

Waaye nag, xale yi bëggunu


 29%|██▉       | 87/297 [01:13<02:56,  1.19it/s]

xar xar


 30%|██▉       | 88/297 [01:14<03:00,  1.16it/s]

Góor gii demoon


 30%|██▉       | 89/297 [01:15<02:53,  1.20it/s]

Joxal téere bi doomu nit ku yaru kooku


 30%|███       | 90/297 [01:15<02:50,  1.21it/s]

ndax


 31%|███       | 91/297 [01:16<02:48,  1.22it/s]

ñooñuu


 31%|███       | 92/297 [01:17<02:46,  1.23it/s]

Nit doõõ õga


 31%|███▏      | 93/297 [01:18<02:45,  1.23it/s]

mbubb


 32%|███▏      | 94/297 [01:19<02:50,  1.19it/s]

Néeg bépp


 32%|███▏      | 95/297 [01:20<02:47,  1.20it/s]

Gis naa jaõx yi woon.


 32%|███▏      | 96/297 [01:20<02:46,  1.21it/s]

Góor dëgg du fen.


 33%|███▎      | 97/297 [01:21<02:44,  1.22it/s]

gannaaw.


 33%|███▎      | 98/297 [01:22<02:42,  1.22it/s]

om


 33%|███▎      | 99/297 [01:23<02:40,  1.23it/s]

Fan õga jëm?


 34%|███▎      | 100/297 [01:24<02:47,  1.17it/s]

Bëgg naa ci juróomi waxtu, ci subë, õga ñëw


 34%|███▍      | 101/297 [01:25<02:44,  1.19it/s]

jaambur-jaambur


 34%|███▍      | 102/297 [01:25<02:40,  1.21it/s]

Bonag, dey mu ñibbi


 35%|███▍      | 103/297 [01:26<02:37,  1.24it/s]

araféef raññee


 35%|███▌      | 104/297 [01:27<02:35,  1.24it/s]

Dem na.


 35%|███▌      | 105/297 [01:28<02:35,  1.23it/s]

Su góor gi dana dem


 36%|███▌      | 106/297 [01:29<02:40,  1.19it/s]

apiko-dantaal


 36%|███▌      | 107/297 [01:29<02:37,  1.20it/s]

Bi õga demee la.


 36%|███▋      | 108/297 [01:30<02:39,  1.18it/s]

Jëlël lépp.


 37%|███▋      | 109/297 [01:31<02:37,  1.19it/s]

walla


 37%|███▋      | 110/297 [01:32<02:33,  1.22it/s]

araféef wu léer


 37%|███▋      | 111/297 [01:33<02:32,  1.22it/s]

Na õgeen dem ndax gerte gi ñëw te nak ñjaay mi jar benn yoon?


 38%|███▊      | 112/297 [01:34<02:38,  1.17it/s]

Daan na di génn.


 38%|███▊      | 113/297 [01:35<02:36,  1.18it/s]

Moo bëgg


 38%|███▊      | 114/297 [01:35<02:33,  1.19it/s]

Séen naa ay yëf.


 39%|███▊      | 115/297 [01:36<02:32,  1.19it/s]

leneen rekk


 39%|███▉      | 116/297 [01:37<02:30,  1.21it/s]

Soo dee góor


 39%|███▉      | 117/297 [01:38<02:30,  1.19it/s]

góox


 40%|███▉      | 118/297 [01:39<02:37,  1.14it/s]

Ci foofu õga taxaw.


 40%|████      | 119/297 [01:40<02:37,  1.13it/s]

Giséegul


 40%|████      | 120/297 [01:41<02:33,  1.16it/s]

Yéen bëgg õgeen


 41%|████      | 121/297 [01:41<02:27,  1.19it/s]

su


 41%|████      | 122/297 [01:42<02:26,  1.19it/s]

xamadi


 41%|████▏     | 123/297 [01:43<02:27,  1.18it/s]

Lëf lan a réer?


 42%|████▏     | 124/297 [01:44<02:31,  1.14it/s]

sagar


 42%|████▏     | 125/297 [01:45<02:27,  1.17it/s]

Nit kookuu génn na mbër.


 42%|████▏     | 126/297 [01:46<02:25,  1.18it/s]

Góor gee ni, soo demee, mi õgi fi


 43%|████▎     | 127/297 [01:47<02:24,  1.17it/s]

Séen naa ay ndaw.


 43%|████▎     | 128/297 [01:47<02:22,  1.19it/s]

Nit la ci


 43%|████▎     | 129/297 [01:48<02:20,  1.20it/s]

sa xarit yi ag sa ba itam


 44%|████▍     | 130/297 [01:49<02:26,  1.14it/s]

dindiku


 44%|████▍     | 131/297 [01:50<02:22,  1.16it/s]

nëbboo


 44%|████▍     | 132/297 [01:51<02:19,  1.19it/s]

Naan?


 45%|████▍     | 133/297 [01:52<02:16,  1.21it/s]

jar


 45%|████▌     | 134/297 [01:52<02:16,  1.20it/s]

sax


 45%|████▌     | 135/297 [01:53<02:20,  1.15it/s]

coppeekuwaay


 46%|████▌     | 136/297 [01:54<02:24,  1.11it/s]

takk


 46%|████▌     | 137/297 [01:55<02:20,  1.14it/s]

bax


 46%|████▋     | 138/297 [01:56<02:17,  1.16it/s]

yan


 47%|████▋     | 139/297 [01:57<02:17,  1.15it/s]

Bul dem


 47%|████▋     | 140/297 [01:58<02:12,  1.18it/s]

Góor gi may na nit dara.


 47%|████▋     | 141/297 [01:59<02:20,  1.11it/s]

Su wax ba xale yi


 48%|████▊     | 142/297 [01:59<02:16,  1.13it/s]

xaf


 48%|████▊     | 143/297 [02:00<02:12,  1.16it/s]

Buur bi buur la fii.


 48%|████▊     | 144/297 [02:01<02:08,  1.19it/s]

Waaye, moom lañu


 49%|████▉     | 145/297 [02:02<02:06,  1.20it/s]

Moo di bày


 49%|████▉     | 146/297 [02:03<02:05,  1.20it/s]

Faatim la,.


 49%|████▉     | 147/297 [02:04<02:08,  1.16it/s]

Dans góor gi bëgg na õga dem te xale yi it õga ñëw te nak ñu nit yi yar.


 50%|████▉     | 148/297 [02:04<02:05,  1.18it/s]

Kooy waxal?


 50%|█████     | 149/297 [02:05<02:03,  1.20it/s]

bëggante


 51%|█████     | 150/297 [02:06<02:01,  1.21it/s]

Góor gi dem na


 51%|█████     | 151/297 [02:07<01:58,  1.23it/s]

Na dugg ci biir su bëggée


 51%|█████     | 152/297 [02:08<02:01,  1.19it/s]

Buleen dem


 52%|█████▏    | 153/297 [02:09<02:05,  1.14it/s]

Dama dem


 52%|█████▏    | 154/297 [02:10<02:03,  1.16it/s]

Defal


 52%|█████▏    | 155/297 [02:10<01:59,  1.19it/s]

foofa


 53%|█████▎    | 156/297 [02:11<01:58,  1.19it/s]

raññe ameef


 53%|█████▎    | 157/297 [02:12<01:56,  1.20it/s]

Gor gii di Lawbe Ndar.


 53%|█████▎    | 158/297 [02:13<01:53,  1.22it/s]

Dem naa ba Ndar


 54%|█████▎    | 159/297 [02:14<01:58,  1.16it/s]

wàt


 54%|█████▍    | 160/297 [02:15<01:55,  1.18it/s]

Na mu dem mu Rao


 54%|█████▍    | 161/297 [02:15<01:53,  1.20it/s]

Wooyil Musaa moom mi di dem


 55%|█████▍    | 162/297 [02:16<01:54,  1.17it/s]

Dëkku Séeréer ban õga wax?


 55%|█████▍    | 163/297 [02:17<01:53,  1.19it/s]

lépp loolu woon


 55%|█████▌    | 164/297 [02:18<01:51,  1.19it/s]

ñu


 56%|█████▌    | 165/297 [02:19<01:54,  1.15it/s]

faj


 56%|█████▌    | 166/297 [02:20<01:54,  1.15it/s]

boli


 56%|█████▌    | 167/297 [02:21<01:51,  1.17it/s]

Góor gi gisul xale boobale woon.


 57%|█████▋    | 168/297 [02:21<01:49,  1.17it/s]

Bëgg naa góor ñi ñëw, xale yi ñëw, jigéen ñi toog!


 57%|█████▋    | 169/297 [02:22<01:46,  1.21it/s]

Bëñ


 57%|█████▋    | 170/297 [02:23<01:44,  1.22it/s]

Daõõar


 58%|█████▊    | 171/297 [02:24<01:48,  1.17it/s]

Gis õga nit kee?


 58%|█████▊    | 172/297 [02:25<01:48,  1.15it/s]

demal


 58%|█████▊    | 173/297 [02:26<01:46,  1.17it/s]

weddi


 59%|█████▊    | 174/297 [02:26<01:44,  1.18it/s]

li woon


 59%|█████▉    | 175/297 [02:27<01:44,  1.16it/s]

Góor gi dem na ca


 59%|█████▉    | 176/297 [02:28<01:45,  1.15it/s]

Xammee õga jabaram joojuwoon.


 60%|█████▉    | 177/297 [02:29<01:47,  1.12it/s]

jar


 60%|█████▉    | 178/297 [02:30<01:42,  1.16it/s]

Koo gis?


 60%|██████    | 179/297 [02:31<01:40,  1.17it/s]

ñaareel


 61%|██████    | 180/297 [02:32<01:38,  1.19it/s]

Beneen ndab laa bëgg.


 61%|██████    | 181/297 [02:32<01:38,  1.18it/s]

Dana tukki tay mbaa ëlëk.


 61%|██████▏   | 182/297 [02:33<01:41,  1.14it/s]

bëgg bëgg bëgg


 62%|██████▏   | 183/297 [02:34<01:40,  1.13it/s]

dox


 62%|██████▏   | 184/297 [02:35<01:37,  1.16it/s]

Dañu defewoon ci õgoon ag ci suba yépp da dem


 62%|██████▏   | 185/297 [02:36<01:34,  1.19it/s]

xamadi


 63%|██████▎   | 186/297 [02:37<01:31,  1.21it/s]

seet


 63%|██████▎   | 187/297 [02:38<01:30,  1.22it/s]

Më õgile demkoon


 63%|██████▎   | 188/297 [02:38<01:31,  1.19it/s]

Wool góor gi dul dem


 64%|██████▎   | 189/297 [02:39<01:31,  1.17it/s]

So demee, mi õgiy wax


 64%|██████▍   | 190/297 [02:40<01:30,  1.18it/s]

Réew mi am na alal ndax?


 64%|██████▍   | 191/297 [02:41<01:29,  1.18it/s]

Tannal fasu Waalo wu jigéen.


 65%|██████▍   | 192/297 [02:42<01:32,  1.14it/s]

mbaa


 65%|██████▍   | 193/297 [02:43<01:29,  1.16it/s]

araf wu ubbéeku


 65%|██████▌   | 194/297 [02:44<01:31,  1.13it/s]

sax-sax


 66%|██████▌   | 195/297 [02:45<01:29,  1.14it/s]

dog


 66%|██████▌   | 196/297 [02:45<01:26,  1.17it/s]

raõ raõ lu


 66%|██████▋   | 197/297 [02:46<01:23,  1.20it/s]

wàõõ


 67%|██████▋   | 198/297 [02:47<01:21,  1.21it/s]

aaréen


 67%|██████▋   | 199/297 [02:48<01:21,  1.20it/s]

aaréen


 67%|██████▋   | 200/297 [02:49<01:22,  1.17it/s]

Ku dem


 68%|██████▊   | 201/297 [02:50<01:23,  1.15it/s]

Gis na seeni xarit yooyu yépp


 68%|██████▊   | 202/297 [02:50<01:21,  1.16it/s]

Góor ñi dañu tayel te jigéen ñi du ñu deglóo.


 68%|██████▊   | 203/297 [02:51<01:19,  1.18it/s]

siiwal


 69%|██████▊   | 204/297 [02:52<01:17,  1.20it/s]

ntëng


 69%|██████▉   | 205/297 [02:53<01:15,  1.21it/s]

Yaa õgi, dem


 69%|██████▉   | 206/297 [02:54<01:17,  1.17it/s]

nii


 70%|██████▉   | 207/297 [02:55<01:17,  1.17it/s]

Keneen la moom


 70%|███████   | 208/297 [02:55<01:14,  1.19it/s]

saf


 70%|███████   | 209/297 [02:56<01:14,  1.19it/s]

Benn nit ki ñëw na.


 71%|███████   | 210/297 [02:57<01:12,  1.21it/s]

Dem õga...


 71%|███████   | 211/297 [02:58<01:11,  1.20it/s]

saf


 71%|███████▏  | 212/297 [02:59<01:15,  1.13it/s]

foofale


 72%|███████▏  | 213/297 [03:00<01:13,  1.14it/s]

Ñenn nit ñi yegseeguñu.


 72%|███████▏  | 214/297 [03:01<01:11,  1.16it/s]

Jigéen ji ag góor gi ñjool lañu.


 72%|███████▏  | 215/297 [03:02<01:10,  1.16it/s]

Yeneen fas laa bëgg.


 73%|███████▎  | 216/297 [03:02<01:09,  1.17it/s]

Ñëwël ndax xale yi di mbër te it ñu di ay jambaar.


 73%|███████▎  | 217/297 [03:03<01:09,  1.15it/s]

Yobul na fi xar góor gi.


 73%|███████▎  | 218/297 [03:04<01:09,  1.13it/s]

Yaa doonkoon wax.


 74%|███████▎  | 219/297 [03:05<01:07,  1.15it/s]

Ci foofu jàmm dana fa wacc.


 74%|███████▍  | 220/297 [03:06<01:07,  1.14it/s]

Duõgeen woon ñëw.


 74%|███████▍  | 221/297 [03:07<01:05,  1.17it/s]

miir


 75%|███████▍  | 222/297 [03:08<01:03,  1.18it/s]

Su demee


 75%|███████▌  | 223/297 [03:08<01:04,  1.15it/s]

Gis na sama xarit yeneen yooyuu!


 75%|███████▌  | 224/297 [03:09<01:03,  1.16it/s]

Yaa ñëwkóon


 76%|███████▌  | 225/297 [03:10<01:01,  1.17it/s]

fule


 76%|███████▌  | 226/297 [03:11<01:00,  1.18it/s]

takk


 76%|███████▋  | 227/297 [03:12<00:59,  1.19it/s]

nii


 77%|███████▋  | 228/297 [03:13<00:57,  1.19it/s]

mooy


 77%|███████▋  | 229/297 [03:14<01:00,  1.13it/s]

nuyu


 77%|███████▋  | 230/297 [03:15<00:59,  1.13it/s]

anal


 78%|███████▊  | 231/297 [03:15<00:57,  1.15it/s]

Loolu la.


 78%|███████▊  | 232/297 [03:16<00:55,  1.18it/s]

Nit lañu!


 78%|███████▊  | 233/297 [03:17<00:54,  1.18it/s]

Góor gi di dem


 79%|███████▉  | 234/297 [03:18<00:52,  1.20it/s]

xéewlu


 79%|███████▉  | 235/297 [03:19<00:53,  1.16it/s]

Koo gis?


 79%|███████▉  | 236/297 [03:20<00:52,  1.17it/s]

Defe naa, kodd.


 80%|███████▉  | 237/297 [03:20<00:51,  1.18it/s]

Kenn kooku yeksi lawoon.


 80%|████████  | 238/297 [03:21<00:51,  1.14it/s]

Su dee dem


 80%|████████  | 239/297 [03:22<00:49,  1.17it/s]

jigéen ñi wérul


 81%|████████  | 240/297 [03:23<00:48,  1.18it/s]

Waxtaanal ag ñooñu doõõ!


 81%|████████  | 241/297 [03:24<00:49,  1.13it/s]

Ndaw senn réerul.


 81%|████████▏ | 242/297 [03:25<00:47,  1.16it/s]

Xammee õga bee xale?


 82%|████████▏ | 243/297 [03:26<00:46,  1.17it/s]

yennu nit õga dee gis


 82%|████████▏ | 244/297 [03:26<00:45,  1.18it/s]

Yobul na ci biir góor gi.


 82%|████████▏ | 245/297 [03:27<00:43,  1.19it/s]

lekke


 83%|████████▎ | 246/297 [03:28<00:42,  1.19it/s]

Dinaa dem


 83%|████████▎ | 247/297 [03:29<00:44,  1.12it/s]

Mbaa kenn demul


 84%|████████▎ | 248/297 [03:30<00:42,  1.15it/s]

Maa di dem


 84%|████████▍ | 249/297 [03:31<00:41,  1.16it/s]

waaye


 84%|████████▍ | 250/297 [03:32<00:40,  1.17it/s]

Dellu biir Ndar, soo bëggée.


 85%|████████▍ | 251/297 [03:32<00:39,  1.17it/s]

xeeñ


 85%|████████▍ | 252/297 [03:33<00:37,  1.21it/s]

Bëñ


 85%|████████▌ | 253/297 [03:34<00:36,  1.19it/s]

um


 86%|████████▌ | 254/297 [03:35<00:36,  1.19it/s]

waxaale


 86%|████████▌ | 255/297 [03:36<00:34,  1.22it/s]

Yan ñoo yeksi?


 86%|████████▌ | 256/297 [03:37<00:33,  1.22it/s]

yennu nit


 87%|████████▋ | 257/297 [03:37<00:31,  1.26it/s]

Maa di dem


 87%|████████▋ | 258/297 [03:38<00:30,  1.30it/s]

tiit


 87%|████████▋ | 259/297 [03:39<00:31,  1.20it/s]

Kii?


 88%|████████▊ | 260/297 [03:40<00:31,  1.16it/s]

Nit ku baaxkoon la


 88%|████████▊ | 261/297 [03:41<00:30,  1.19it/s]

lekke


 88%|████████▊ | 262/297 [03:41<00:28,  1.23it/s]

bu


 89%|████████▊ | 263/297 [03:42<00:27,  1.25it/s]

Na õgeen def?


 89%|████████▉ | 264/297 [03:43<00:25,  1.27it/s]

Maa dem


 89%|████████▉ | 265/297 [03:44<00:25,  1.24it/s]

Gis õga xale bee?


 90%|████████▉ | 266/297 [03:45<00:24,  1.26it/s]

Du rekk.


 90%|████████▉ | 267/297 [03:46<00:25,  1.17it/s]

seet


 90%|█████████ | 268/297 [03:47<00:25,  1.12it/s]

Gisoon naa nit ñooña ñépp.


 91%|█████████ | 269/297 [03:47<00:23,  1.18it/s]

Yéen dem õgeen?


 91%|█████████ | 270/297 [03:48<00:22,  1.22it/s]

Duñu la woon.


 91%|█████████ | 271/297 [03:49<00:22,  1.17it/s]

ndax


 92%|█████████▏| 272/297 [03:50<00:21,  1.16it/s]

ka


 92%|█████████▏| 273/297 [03:51<00:19,  1.21it/s]

leneen rekk


 92%|█████████▏| 274/297 [03:51<00:18,  1.24it/s]

baq


 93%|█████████▎| 275/297 [03:52<00:17,  1.26it/s]

Waaye nag, xale yi bëggunu


 93%|█████████▎| 276/297 [03:53<00:16,  1.30it/s]

Yaa ñëwkóon


 93%|█████████▎| 277/297 [03:54<00:15,  1.27it/s]

ndax


 94%|█████████▎| 278/297 [03:54<00:15,  1.26it/s]

xale ya õga dem waaye õgay deey, dem dana dellusi


 94%|█████████▍| 279/297 [03:55<00:14,  1.26it/s]

Gis õga nit ñan?


 94%|█████████▍| 280/297 [03:56<00:13,  1.23it/s]

su


 95%|█████████▍| 281/297 [03:57<00:12,  1.26it/s]

Yéen ñan la wax


 95%|█████████▍| 282/297 [03:58<00:11,  1.28it/s]

Menn nit ñëwul.


 95%|█████████▌| 283/297 [03:58<00:11,  1.27it/s]

Mi õgii dem ba delusi


 96%|█████████▌| 284/297 [03:59<00:10,  1.20it/s]

So demee, góor gee ni nit la.


 96%|█████████▌| 285/297 [04:00<00:09,  1.23it/s]

Ñépp ñan õga gis?


 96%|█████████▋| 286/297 [04:01<00:09,  1.21it/s]

lépp loolu woon


 97%|█████████▋| 287/297 [04:02<00:08,  1.17it/s]

baxbax-lu


 97%|█████████▋| 288/297 [04:03<00:07,  1.17it/s]

ub


 97%|█████████▋| 289/297 [04:04<00:07,  1.09it/s]

nule


 98%|█████████▊| 290/297 [04:05<00:06,  1.07it/s]

Gaynde la, ku dem


 98%|█████████▊| 291/297 [04:06<00:05,  1.03it/s]

pac


 98%|█████████▊| 292/297 [04:07<00:04,  1.07it/s]

Ñeneen lañu.


 99%|█████████▊| 293/297 [04:08<00:03,  1.11it/s]

gëmm nitu xam


 99%|█████████▉| 294/297 [04:09<00:02,  1.01it/s]

waxkat


 99%|█████████▉| 295/297 [04:10<00:01,  1.07it/s]

Waaye nit kookii?


100%|█████████▉| 296/297 [04:10<00:00,  1.13it/s]

Góor gi bëgg na


100%|██████████| 297/297 [04:11<00:00,  1.18it/s]

Man demuma





|  | original_text | original_label | predicted_label |
|---|---|---|---|
| 0 | Va les voir! | Gisi leen! | leen |
| 1 | couper | dagg | dog |
| 2 | Ce n'était pas un homme de Saint-Louis. | Du woon góoru Ndar. | Du góor. |
| 3 | Peut-être l'homme a-t-il dit que c'est celui-là! | Soo demee góor gee ni kookule la! | Soo demee góor gee ni nit la |
| 4 | indignité | goreedi | waaraw |
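
The loop in cell `In [18]` adds the metrics of each test sentence to `scores`, so after the loop the dictionary holds sums rather than averages. A minimal sketch of the missing averaging step is given below. The helper name `average_scores` is hypothetical, the metric names `bleu` and `gen_len` are assumed from the keys logged during training, and the totals in the example are invented for illustration only (they are not results of this notebook).

```python
def average_scores(summed_scores: dict, n_samples: int) -> dict:
    """Divide every accumulated metric by the number of evaluated samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")

    return {name: round(total / n_samples, 4) for name, total in summed_scores.items()}


# illustrative totals only: three sentences whose metrics were summed
example = average_scores({"bleu": 30.0, "gen_len": 15.0}, 3)
print(example)  # {'bleu': 10.0, 'gen_len': 5.0}
```

In the notebook this would be called as `average_scores(scores, len(test_dataset))`. Note that a mean of sentence-level BLEU scores is generally not equal to the corpus-level BLEU reported during training, so the two numbers should not be compared directly.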


In [19]:
df_ft_to_wf.tail(10)

|  | original_text | original_label | predicted_label |
|---|---|---|---|
| 287 | fermer | up | ub |
| 288 | de quelle manière | naan | nule |
| 289 | C'est peut-être un lion! | Ku dem, gaynde la! | Gaynde la, ku dem |
| 290 | suinter | siit | pac |
| 291 | Autres, ils sont. | Ñeñeen lañu. | Ñeneen lañu. |
| 292 | il y a un moment | saõx | gëmm nitu xam |
| 293 | qui nie | weddikat | waxkat |
| 294 | Tu sais, cet homme? | Gis õga nit kookale? | Waaye nit kookii? |
| 295 | L'homme partira aujourd'hui | Góor gi dana dem tay ji | Góor gi bëgg na |
| 296 | Moi-même je n'ai pas été | Man mii demuma | Man demuma |


In [20]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

|  | original_text | original_label | predicted_label |
|---|---|---|---|
| 167 | Je veux, si vous avez fini, que tu viennes et ... | Bëgg naa õga ñëw mu dem su õgeen noppée! | Bëgg naa góor ñi ñëw, xale yi ñëw, jigéen ñi t... |
| 211 | celui-ci près de toi | kookii | foofale |
| 63 | Du moment qu'il part | Bi mu dee dem | Bi mu dee dem |
| 154 | là a cet endroit | foofile | foofa |
| 5 | Est-ce comme je le crains que...! | Mbaa...! | Mbaa... |
| 77 | êtres humains de sexe masculin | nit ñu góor | nit ku góor |
| 183 | On croyait que tu allais partir matin et soir! | Dañu defewoon ni daa dem ci subë ag ci õgoon y... | Dañu defewoon ci õgoon ag ci suba yépp da dem |
| 158 | traîner | wàtal | wàt |
| 9 | L'homme n'a rien mangé avec la main. | Nit ki lekkul dara ak loxoom. | Góor gi waxtaan na ag yaw. |
| 139 | L'homme a donné quelque chose à quelqu'un. | Góor gi may na dara nit. | Góor gi may na nit dara. |
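
As a quick complement to BLEU, one can count how many predictions are strictly identical to their reference. The sketch below is an assumption-laden illustration, not part of the original evaluation: the helper name `exact_match_rate` is hypothetical, and the five pairs are copied from the sample displayed above.

```python
def exact_match_rate(references, predictions):
    """Share of predictions identical to their reference after trimming whitespace."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one pair is required")

    matches = sum(ref.strip() == pred.strip() for ref, pred in zip(references, predictions))

    return matches / len(references)


# five (reference, prediction) pairs taken from the sample shown above
references = ["kookii", "Bi mu dee dem", "foofile", "nit ñu góor", "wàtal"]
predictions = ["foofale", "Bi mu dee dem", "foofa", "nit ku góor", "wàt"]

rate = exact_match_rate(references, predictions)
print(rate)  # 0.2
```

On the full test set the same function could be applied to `df_ft_to_wf['original_label'].tolist()` and `df_ft_to_wf['predicted_label'].tolist()`. Exact match is a strict criterion: near misses such as `wàtal` / `wàt` count as errors, so it is best read alongside BLEU rather than instead of it.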


## Colab download and remove step

In [None]:
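import shutil

# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')
# remove the local wandb run directory (ignore_errors avoids a failure if it does not exist)
shutil.rmtree('wandb', ignore_errors=True)
# shutil.make_archive('wandb', 'zip', 'wandb')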
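
If the run directory should be kept for later download, it can be archived before it is deleted. The following is a minimal sketch of that "archive, then remove" order using only the standard library. The helper name `archive_and_remove` and the `wandb_demo` folder are hypothetical; the demonstration runs on a temporary directory, not on the real `wandb` folder.

```python
import shutil
import tempfile
from pathlib import Path


def archive_and_remove(directory: str, archive_base: str) -> str:
    """Zip `directory` into `<archive_base>.zip`, then delete the directory."""
    # make_archive returns the full path of the created archive
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=directory)

    # the directory is removed only after the archive has been written
    shutil.rmtree(directory)

    return archive_path


# demonstration on a throwaway directory
workspace = Path(tempfile.mkdtemp())
run_dir = workspace / "wandb_demo"
run_dir.mkdir()
(run_dir / "run.log").write_text("dummy log line\n")

archive = archive_and_remove(str(run_dir), str(workspace / "wandb_demo_backup"))
print(Path(archive).exists(), run_dir.exists())  # True False
```

The archive is written outside the directory being zipped, which avoids including the archive in itself.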