Hyper-parameter search with the Text-To-Text Transformer 🤖 (Random)
-----------------------------------

In this project, we apply transfer learning with the Text-To-Text Transformer (T5) model to translate French sentences into Wolof sentences and vice versa. We use the `random` method to run a hyperparameter search and compare its results with those obtained with the `bayes` method. We use the `wandb` library to analyze the runs more efficiently with `Parallel coordinates` and `Parameter importance` charts. After finding the best model, we will take its checkpoints and continue the training in another notebook. Let us dive into the process.

We want to know the best combination of values of the following hyperparameters:

- **learning rate** $\sim Log U(1e-5, 1e-3)$
- **weight decay** $\in \{0.0, 0.1, 0.2, 0.3, 0.4, 0.5\}$
- **random state** (seed of the data splitting generator) $\in \{0, 10, 20, 30, 40, 50, 60, 70, 80, 100\}$

1. For the translation from French to Wolof

  - **fr_char_p** (probability of modifying a character from a French word) $\sim U(0.0, 0.9)$
  - **fr_word_p** (probability of modifying a word from a French sentence) $\sim U(0.0, 0.9)$

2. For the translation from Wolof to French

  - **wf_char_p** (probability of modifying a character from a Wolof word) $\sim U(0.0, 0.9)$
  - **wf_word_p** (probability of modifying a word from a Wolof sentence) $\sim U(0.0, 0.9)$
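
The search space above can be sketched in plain Python. This is only an illustration of what a random search draws for the French to Wolof direction, not the code that `wandb` runs internally; the function name `sample_configuration` and the dictionary layout are our own choices.

```python
import math
import random

# Discrete choices copied from the list above.
WEIGHT_DECAYS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
RANDOM_STATES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 100]


def sample_configuration(rng: random.Random) -> dict:
    """Draw one hyperparameter configuration, as a random search would."""
    # Log-uniform: draw the logarithm uniformly, then exponentiate, so that
    # each order of magnitude between 1e-5 and 1e-3 is equally likely.
    log_lr = rng.uniform(math.log(1e-5), math.log(1e-3))
    return {
        "learning_rate": math.exp(log_lr),
        "weight_decay": rng.choice(WEIGHT_DECAYS),
        "random_state": rng.choice(RANDOM_STATES),
        "fr_char_p": rng.uniform(0.0, 0.9),
        "fr_word_p": rng.uniform(0.0, 0.9),
    }


rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))
```

Each draw is independent of the previous ones, which is the main difference from the `bayes` method, where earlier results guide the next configuration.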


The Bayes method requires a metric to optimize, and we declare the same metric for the random search so that both searches can be compared. We evaluate the model on the test set, so the metric in the hyperparameter setting can be either the `cross entropy loss` computed on the test set or the `BLEU` score. Since this is a machine translation task, the BLEU score is the more useful evaluation metric.

**Objective**: We will try to `maximize the metric`. For the moment, we aim for a `BLEU` score above `0.5`.
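
To make the metric concrete, here is a minimal sketch of a BLEU computation for a single sentence pair. It is a simplified, unsmoothed version with whitespace tokenization and one reference, so its values will differ from those of `sacrebleu`, which applies its own tokenization and smoothing. The function names `ngram_counts` and `simple_bleu` are ours. Like `sacrebleu`, the sketch reports the score on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(prediction: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU on a 0-100 scale (illustrative only)."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0

    log_precision = 0.0
    for n in range(1, max_n + 1):
        pred_counts, ref_counts = ngram_counts(pred, n), ngram_counts(ref, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
        total = sum(pred_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, a single empty n-gram order zeroes the score.
            return 0.0
        log_precision += math.log(overlap / total) / max_n

    # Brevity penalty: predictions shorter than the reference are penalized.
    brevity = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return 100 * brevity * math.exp(log_precision)


print(simple_bleu("the cat sat on the mat", "the cat sat on a mat"))
print(simple_bleu("a b c", "a b c"))  # fewer than 4 tokens: no 4-gram exists
```

In this unsmoothed form, a prediction with fewer than four tokens always scores 0, even when it matches the reference exactly.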

In [2]:
# let us extend the paths of the system
import sys

path = "/content/drive/MyDrive/Memoire/subject2/T5/"

sys.path.extend([path, f"{path}new_data"])

In [3]:
# define wandb environment
%env WANDB_LOG_MODEL=true
%env WANDB_NOTEBOOK_NAME=training_gpt2_2.ipynb
%env WANDB_API_KEY=YOUR_WANDB_API_KEY

env: WANDB_LOG_MODEL=true
env: WANDB_NOTEBOOK_NAME=training_gpt2_2.ipynb
env: WANDB_API_KEY=YOUR_WANDB_API_KEY


In [4]:
!pip install -qq wandb --upgrade


In [5]:
!pip install evaluate -qq
!pip install sacrebleu -qq
# !pip install optuna -qq
!pip install transformers -qq 
!pip install tokenizers -qq
!pip install nlpaug -qq
!pip install ray[tune] -qq
!python -m spacy download fr_core_news_lg

In [6]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.data.dataset_v2 import T5SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# `load_metric` from `datasets` is replaced by the `evaluate` package
# (pip install evaluate, plus sacrebleu for the BLEU metric)
from functools import partial
from tqdm import tqdm
import pandas as pd
import numpy as np
import evaluate
import wandb
import torch


We will create two models: 

- One translating the French corpus into a Wolof corpus: [french_to_wolof](#french-to-wolof)
- One translating the Wolof corpus into a French corpus: [wolof_to_french](#wolof-to-french)

--------------

## French to Wolof

### Configure dataset 🔠

We split the sentences into a train set (to fit the model) and a test set (to evaluate how well it generalizes). Which samples land in each set is determined by `the random state`. We tune the random state to find the split that gives the model's best fit. In other words, we want the model to learn from the training sentences and generalize that learning to the test sentences. That is not always the case, especially with a small dataset like ours.

In [7]:
def split_data(random_state: int = 50):
  """Split data between train and test sets

  Args:
    random_state (int): the seed of the splitting generator. Defaults to 50
  """
  # load the corpora and split into train and test sets
  corpora = pd.read_csv(f"{path}new_data/corpora_v3.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # let us save the sets
  train_set.to_csv(f"{path}new_data/train_set.csv", index=False)

  test_set.to_csv(f"{path}new_data/test_set.csv", index=False)
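
To see why the random state is worth tuning, the sketch below shows how a seed decides which sentences end up in the test set. It is a standard-library stand-in for `train_test_split`, not scikit-learn's own shuffling, so the actual splits produced by `split_data` will differ; `seeded_split` and the toy sentences are hypothetical.

```python
import math
import random


def seeded_split(items, test_size=0.1, random_state=50):
    """Shuffle a copy of `items` with a seeded generator and cut off the test part."""
    shuffled = list(items)
    random.Random(random_state).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    # The first n_test shuffled items form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]


sentences = [f"sentence {i}" for i in range(20)]
for seed in (0, 10, 20):
    _, test = seeded_split(sentences, random_state=seed)
    print(seed, test)
```

The same seed always reproduces the same split, which is what makes the random state a hyperparameter we can search over.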

Let us load the French and Wolof corpora's common tokenizer.

In [8]:
# load the tokenizer from a json file
tokenizer = T5TokenizerFast(tokenizer_file=f"{path}wolof_translate/tokenizers/t5_tokenizers/tokenizer_v1.json")


The following function builds the train and test datasets.

In [9]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):

  # Create augmentation to add on French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Recuperate the train dataset
  train_dataset_aug = T5SentenceDataset(f"{path}new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Recuperate the test dataset
  test_dataset = T5SentenceDataset(f"{path}new_data/test_set.csv",
                                        tokenizer,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, test_dataset
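
To give an intuition for what `fr_char_p` and `fr_word_p` control, here is a toy character-noise function. It is not the algorithm of `nlpaug`'s `KeyboardAug`, which substitutes characters with keyboard neighbours; the toy below swaps in random lowercase letters and treats both parameters as independent probabilities, which is an assumption made for illustration. The name `toy_char_noise` is hypothetical.

```python
import random
import string


def toy_char_noise(sentence: str, char_p: float, word_p: float, rng: random.Random) -> str:
    """Select each word with probability word_p, then replace each character
    of a selected word with probability char_p (toy stand-in, see lead-in)."""
    noisy_words = []
    for word in sentence.split():
        if rng.random() < word_p:
            word = "".join(
                rng.choice(string.ascii_lowercase) if rng.random() < char_p else ch
                for ch in word
            )
        noisy_words.append(word)
    return " ".join(noisy_words)


rng = random.Random(0)
sentence = "bonjour tout le monde"
for char_p, word_p in [(0.1, 0.3), (0.5, 0.5), (0.9, 0.9)]:
    print(char_p, word_p, "->", toy_char_noise(sentence, char_p, word_p, rng))
```

Values near the upper bound of 0.9 leave little of the source sentence intact, which is why these probabilities are worth searching rather than fixing by hand.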

### Configure hyperparameter search ⚙️

We have to configure the search space, the search method and the metric. 

In [10]:
wandb.login()  # reads the key from the WANDB_API_KEY environment variable set above

# hyperparameters
sweep_config = {
    'method': 'random',
    'metric':{
          'goal': 'maximize',
          'name': 'bleu'
      },
    'parameters':
    {
      'epochs': {
          'value': 1
      },
      'batch_size': {
          'values': [5, 16, 32]
      },
      'learning_rate': {
          'distribution': 'log_uniform_values',
          'min': 1e-5,
          'max': 1e-3
      },
      'weight_decay': {
          'values': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
      },
     'fr_char_p': {
         'min': 0.0,
         'max': 0.9
     },
     'fr_word_p': {
          'min': 0.0,
          'max': 0.9
     },
     'random_state': {
         'values': [0, 10, 20, 30, 40, 50, 60, 70, 80, 100]
     }
    }
}

# Initialize the hyperparameter search
sweep_id = wandb.sweep(sweep_config, project = "small-t5-translation-random-hpsearch")



wandb: Currently logged in as: oumar-kane (oumar-kane-team). Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: 3mtzr07a
Sweep URL: https://wandb.ai/oumar-kane-team/t5-translation-random-hpsearch/sweeps/3mtzr07a


### Configure the model and the evaluation function ⚙️

Let us load the model and resize the token embeddings.

**Note**: For the first training we use t5-small. If we don't obtain good results we will switch to t5-base, which contains more parameters. See below the configurations of the t5-small and the t5-base models, respectively.

In [None]:
small_model_name = 't5-small'
base_model_name = 't5-base'

# import the small model with its pre-trained weights
small_model = AutoModelForSeq2SeqLM.from_pretrained(small_model_name)

# import the base model with its pre-trained weights
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name)


In [12]:
# print the small configuration
small_model.config

T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefi

In [13]:
# print the base configuration
base_model.config

T5Config {
  "_name_or_path": "t5-base",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 3072,
  "d_kv": 64,
  "d_model": 768,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 12,
  "num_heads": 12,
  "num_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "pre

The small model has the same dimensions as the original Transformer of Ashish Vaswani, Noam Shazeer, et al. in the article [Attention_is_all_you_need](https://arxiv.org/pdf/1706.03762).

The base model contains more parameters: it uses 12 attention heads instead of 8 and 12 stacked layers in both the encoder and the decoder instead of 6; its feed-forward dimension is 3072, 1024 more than the small model's 2048; and its embedding dimension is 768 instead of 512. The base model contains about 220 million parameters, which is a large number. But since it is pre-trained, we can directly apply transfer learning from the already trained weights. The base model was first described in the article [Text_To_Text_Transformer](https://arxiv.org/pdf/1910.10683). 
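
The parameter counts can be checked by hand from the configuration values printed above. The sketch below counts the weights of an original T5 model, which has no bias terms and shares its embedding matrix with the output layer. We assume the original T5 vocabulary size of 32128, a value not visible in the truncated configurations above; the function name `t5_parameter_count` is ours.

```python
def t5_parameter_count(d_model, d_ff, d_kv, num_heads, num_layers,
                       vocab_size=32128, num_buckets=32):
    """Count the weights of an original T5 model (no biases, tied embeddings)."""
    inner = num_heads * d_kv
    attention = 4 * d_model * inner      # query, key, value and output projections
    feed_forward = 2 * d_model * d_ff    # two dense layers
    layer_norm = d_model                 # scale only, no bias

    encoder_layer = attention + feed_forward + 2 * layer_norm
    # A decoder layer adds a cross-attention block and its layer norm.
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm
    # Per stack: one relative position bias table plus the final layer norm.
    stack_extras = num_buckets * num_heads + layer_norm

    embedding = vocab_size * d_model     # assumed vocabulary size, see lead-in
    return (embedding
            + num_layers * encoder_layer + stack_extras
            + num_layers * decoder_layer + stack_extras)


small = t5_parameter_count(d_model=512, d_ff=2048, d_kv=64, num_heads=8, num_layers=6)
base = t5_parameter_count(d_model=768, d_ff=3072, d_kv=64, num_heads=12, num_layers=12)
print(f"t5-small: {small:,}")
print(f"t5-base:  {base:,}")
```

Under these assumptions the counts come to roughly 60.5 million for t5-small and 222.9 million for t5-base. Once the token embeddings are resized to our own tokenizer, the embedding term changes with the size of that vocabulary.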

In [14]:
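def gpt2_model_init(tokenizer):
  """Load the pre-trained t5-small model and fit its embeddings to the tokenizer.

  Note: despite its name, this function loads a T5 model.
  """
  # Initialize the model name
  model_name = 't5-small'

  # import the model with its pre-trained weights
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

  # resize the token embeddings to the size of our vocabulary
  model.resize_token_embeddings(len(tokenizer))

  return model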

Let us evaluate the predictions with the `bleu` metric. The metric computation comes from the following `Hugging Face` tutorial: [translation](https://huggingface.co/docs/transformers/tasks/translation). We wrap it in a class so that we can add more parameters later if needed.

In [15]:
# %%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        # -100 marks the label positions ignored by the loss; restore the pad id before decoding
        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result


In [16]:
# %run wolof-translate/wolof_translate/utils/evaluation.py

Let us initialize the evaluation object.

In [17]:
evaluation = TranslationEvaluation(tokenizer)

### Searching for the best parameters 🕖

Let us define the data collator.

In [18]:
def data_collator(batch):
    """Generate a batch of data to provide to the trainer

    Args:
        batch (list): The samples of the batch, each a tuple of tensors (input ids, attention mask, labels)

    Returns:
        dict: A dictionary containing the ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0].squeeze(0) for b in batch])
    
    attention_mask = torch.stack([b[1].squeeze(0) for b in batch])
    
    labels = torch.stack([b[2].squeeze(0) for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}

Let us initialize the training arguments and search for the best model. The latter will be saved as an artefact inside our `wandb` project.
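
The batch size in the search space also sets how many optimizer steps fit into the single epoch of each run. The run summaries printed further down report `train/global_step` values of 172, 54 and 27 for batch sizes 5, 16 and 32. Assuming one device, no gradient accumulation and a kept last partial batch (none of which this notebook overrides, as far as we can tell), the number of steps per epoch is $\lceil N / \text{batch size} \rceil$, which lets us bound the training set size $N$:

```python
import math

# (batch_size -> train/global_step) pairs read from the run summaries of this sweep
observed = {5: 172, 16: 54, 32: 27}

# Keep every training set size that reproduces all three observed step counts.
candidates = [
    n for n in range(1, 2000)
    if all(math.ceil(n / batch) == steps for batch, steps in observed.items())
]
print(candidates)
```

Under these assumptions the training set holds between 856 and 860 sentences, and a run with batch size 32 receives only 27 weight updates in its single epoch.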

In [19]:
# %%wandb

def train(config = None):

  with wandb.init(config = config):

    # seed
    set_seed(0)

    # set sweep configuration
    config = wandb.config

    # split the data
    split_data(config.random_state)

    # let us recuperate the datasets
    train_dataset, test_dataset = recuperate_datasets(config.fr_char_p, config.fr_word_p)

    # set training arguments
    training_args = Seq2SeqTrainingArguments(f"{path}training/bayes_search_results",
                                      report_to = "wandb",
                                      num_train_epochs=config.epochs,
                                      load_best_model_at_end=True,
                                      save_strategy="epoch",
                                      evaluation_strategy="epoch",
                                      logging_strategy = 'epoch',
                                      per_device_train_batch_size=config.batch_size, 
                                      per_device_eval_batch_size=16,
                                      learning_rate=config.learning_rate,
                                      weight_decay=config.weight_decay,
                                      predict_with_generate=True, # we will use predict with generate in order to obtain more valuable test results
                                      fp16 = True,
                                      )   

    # define training loop
    trainer = Seq2SeqTrainer(model_init=partial(gpt2_model_init, tokenizer = train_dataset.tokenizer),
                      args=training_args,
                      train_dataset=train_dataset, 
                      eval_dataset=test_dataset,
                      data_collator=data_collator,
                      compute_metrics=evaluation.compute_metrics
                      )

    # start training loop
    trainer.train()

agent = wandb.agent(sweep_id, train, count = 35)


wandb: Agent Starting Run: 6c2ykgt4 with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.09578974556310609
wandb: 	fr_word_p: 0.35391759493444985
wandb: 	learning_rate: 0.000211583084569954
wandb: 	random_state: 100
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.0076,1.004148,0.0,2.4375


Metric,Value
eval/bleu,0.0
eval/gen_len,2.4375
eval/loss,1.00415
eval/runtime,3.4997
eval/samples_per_second,27.431
eval/steps_per_second,1.714
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.0076


wandb: Agent Starting Run: jjtt3d0h with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.27298879831637657
wandb: 	fr_word_p: 0.016438704116760374
wandb: 	learning_rate: 0.0002983469585105299
wandb: 	random_state: 100
wandb: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,0.9334,0.911538,0.0,3.0


Metric,Value
eval/bleu,0.0
eval/gen_len,3.0
eval/loss,0.91154
eval/runtime,2.6344
eval/samples_per_second,36.441
eval/steps_per_second,2.278
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,0.9334


wandb: Agent Starting Run: ie2nsry8 with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.2825889271324897
wandb: 	fr_word_p: 0.04804707448940956
wandb: 	learning_rate: 0.00028703674001639485
wandb: 	random_state: 70
wandb: 	weight_decay: 0.5




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.7924,1.428247,0.0,4.0625


Metric,Value
eval/bleu,0.0
eval/gen_len,4.0625
eval/loss,1.42825
eval/runtime,2.1607
eval/samples_per_second,44.43
eval/steps_per_second,2.777
train/epoch,1.0
train/global_step,27.0
train/learning_rate,3e-05
train/loss,1.7924


wandb: Agent Starting Run: one8apnd with config:
wandb: 	batch_size: 16
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.804081467235025
wandb: 	fr_word_p: 0.6318115575428167
wandb: 	learning_rate: 6.890112150282818e-05
wandb: 	random_state: 10
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,2.1161,1.421832,0.0757,12.4792


Metric,Value
eval/bleu,0.0757
eval/gen_len,12.4792
eval/loss,1.42183
eval/runtime,2.1588
eval/samples_per_second,44.47
eval/steps_per_second,2.779
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,2.1161


wandb: Agent Starting Run: twz3epjc with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.2695417296270744
wandb: 	fr_word_p: 0.16278641680044156
wandb: 	learning_rate: 0.0004773619557047186
wandb: 	random_state: 0
wandb: 	weight_decay: 0.3




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,2.0673,1.205763,0.0,4.1979


Metric,Value
eval/bleu,0.0
eval/gen_len,4.1979
eval/loss,1.20576
eval/runtime,2.1921
eval/samples_per_second,43.793
eval/steps_per_second,2.737
train/epoch,1.0
train/global_step,27.0
train/learning_rate,7e-05
train/loss,2.0673


wandb: Agent Starting Run: ffw2hrna with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.3928067116048982
wandb: 	fr_word_p: 0.6932476075667772
wandb: 	learning_rate: 0.0008592828383864983
wandb: 	random_state: 20
wandb: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.3286,0.908215,0.0,2.2917


Metric,Value
eval/bleu,0.0
eval/gen_len,2.2917
eval/loss,0.90822
eval/runtime,2.1995
eval/samples_per_second,43.647
eval/steps_per_second,2.728
train/epoch,1.0
train/global_step,27.0
train/learning_rate,3e-05
train/loss,1.3286


wandb: Agent Starting Run: 0k06b2sf with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.3442911701958876
wandb: 	fr_word_p: 0.03524927860286549
wandb: 	learning_rate: 0.00012438622162966358
wandb: 	random_state: 70
wandb: 	weight_decay: 0.3




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,2.182,1.820869,0.007,10.4792


Metric,Value
eval/bleu,0.007
eval/gen_len,10.4792
eval/loss,1.82087
eval/runtime,2.772
eval/samples_per_second,34.633
eval/steps_per_second,2.165
train/epoch,1.0
train/global_step,27.0
train/learning_rate,1e-05
train/loss,2.182


wandb: Agent Starting Run: p5m1qvxs with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.4418418094343982
wandb: 	fr_word_p: 0.1233563795181484
wandb: 	learning_rate: 1.3007978644790536e-05
wandb: 	random_state: 10
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,2.505,1.853689,0.1561,16.8229


Metric,Value
eval/bleu,0.1561
eval/gen_len,16.8229
eval/loss,1.85369
eval/runtime,2.1954
eval/samples_per_second,43.728
eval/steps_per_second,2.733
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,2.505


wandb: Agent Starting Run: t6kmrftw with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.3496231232830341
wandb: 	fr_word_p: 0.6307760338185459
wandb: 	learning_rate: 0.00021591722083488932
wandb: 	random_state: 20
wandb: 	weight_decay: 0.4




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,0.9741,0.884295,0.0,2.4062


Metric,Value
eval/bleu,0.0
eval/gen_len,2.4062
eval/loss,0.8843
eval/runtime,2.5391
eval/samples_per_second,37.808
eval/steps_per_second,2.363
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,0.9741


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: cxobshpp with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.8020646775932108
wandb: 	fr_word_p: 0.6418994012862635
wandb: 	learning_rate: 2.558011430708031e-05
wandb: 	random_state: 30
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.59,1.484496,0.0114,9.8125


Metric,Value
eval/bleu,0.0114
eval/gen_len,9.8125
eval/loss,1.4845
eval/runtime,2.5604
eval/samples_per_second,37.494
eval/steps_per_second,2.343
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.59


wandb: Agent Starting Run: 4qhka48t with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.860918992362395
wandb: 	fr_word_p: 0.774769971585576
wandb: 	learning_rate: 4.7055783503530355e-05
wandb: 	random_state: 10
wandb: 	weight_decay: 0.1




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.3551,1.227722,0.008,8.0833


Metric,Value
eval/bleu,0.008
eval/gen_len,8.0833
eval/loss,1.22772
eval/runtime,3.4668
eval/samples_per_second,27.691
eval/steps_per_second,1.731
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.3551


wandb: Agent Starting Run: u5e5polq with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.26419416861986456
wandb: 	fr_word_p: 0.5752396177714139
wandb: 	learning_rate: 2.971122452459015e-05
wandb: 	random_state: 100
wandb: 	weight_decay: 0.1




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,4.0146,3.695528,0.0374,18.4583


Metric,Value
eval/bleu,0.0374
eval/gen_len,18.4583
eval/loss,3.69553
eval/runtime,2.7887
eval/samples_per_second,34.425
eval/steps_per_second,2.152
train/epoch,1.0
train/global_step,27.0
train/learning_rate,0.0
train/loss,4.0146


wandb: Agent Starting Run: y60de1ro with config:
wandb: 	batch_size: 16
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.4309345405122542
wandb: 	fr_word_p: 0.8051782563677614
wandb: 	learning_rate: 0.00014702318701319245
wandb: 	random_state: 60
wandb: 	weight_decay: 0.5




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.4225,1.249934,0.0008,5.4167


Metric,Value
eval/bleu,0.0008
eval/gen_len,5.4167
eval/loss,1.24993
eval/runtime,2.8844
eval/samples_per_second,33.283
eval/steps_per_second,2.08
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.4225


wandb: Agent Starting Run: 6sj4oo36 with config:
wandb: 	batch_size: 32
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.4194551488317243
wandb: 	fr_word_p: 0.455997518994076
wandb: 	learning_rate: 4.9409129951555865e-05
wandb: 	random_state: 70
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,3.6872,2.986617,0.0321,17.4167


Metric,Value
eval/bleu,0.0321
eval/gen_len,17.4167
eval/loss,2.98662
eval/runtime,2.515
eval/samples_per_second,38.172
eval/steps_per_second,2.386
train/epoch,1.0
train/global_step,27.0
train/learning_rate,1e-05
train/loss,3.6872


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: xho3mm5u with config:
wandb: 	batch_size: 16
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.7594012357387808
wandb: 	fr_word_p: 0.815200516887035
wandb: 	learning_rate: 0.0008049450489338906
wandb: 	random_state: 0
wandb: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2509,1.000572,0.0,3.625


Metric,Value
eval/bleu,0.0
eval/gen_len,3.625
eval/loss,1.00057
eval/runtime,2.2278
eval/samples_per_second,43.092
eval/steps_per_second,2.693
train/epoch,1.0
train/global_step,54.0
train/learning_rate,4e-05
train/loss,1.2509


wandb: Agent Starting Run: ztf1je0n with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.7061637871889751
wandb: 	fr_word_p: 0.4826097090271555
wandb: 	learning_rate: 6.715537134045317e-05
wandb: 	random_state: 60
wandb: 	weight_decay: 0.3




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2022,1.21719,0.0009,5.7083


Metric,Value
eval/bleu,0.0009
eval/gen_len,5.7083
eval/loss,1.21719
eval/runtime,2.8879
eval/samples_per_second,33.242
eval/steps_per_second,2.078
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.2022


wandb: Agent Starting Run: e6pnykch with config:
wandb: 	batch_size: 5
wandb: 	epochs: 1
wandb: 	fr_char_p: 0.34536222000050565
wandb: 	fr_word_p: 0.006295541415295669
wandb: 	learning_rate: 0.0008216878457528646
wandb: 	random_state: 40
wandb: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,0.8883,0.755061,0.006,7.625


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.006
eval/gen_len,7.625
eval/loss,0.75506
eval/runtime,2.297
eval/samples_per_second,41.793
eval/steps_per_second,2.612
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,0.8883


[34m[1mwandb[0m: Agent Starting Run: 52pozg1h with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.7382113966298139
[34m[1mwandb[0m: 	fr_word_p: 0.32959050832075065
[34m[1mwandb[0m: 	learning_rate: 2.594556657913986e-05
[34m[1mwandb[0m: 	random_state: 60
[34m[1mwandb[0m: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,4.1705,3.871779,0.0363,18.6354


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0363
eval/gen_len,18.6354
eval/loss,3.87178
eval/runtime,2.2438
eval/samples_per_second,42.785
eval/steps_per_second,2.674
train/epoch,1.0
train/global_step,27.0
train/learning_rate,0.0
train/loss,4.1705


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: e1z7hhfs with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.036823178187751784
[34m[1mwandb[0m: 	fr_word_p: 0.3432131142137884
[34m[1mwandb[0m: 	learning_rate: 9.618059581397212e-05
[34m[1mwandb[0m: 	random_state: 100
[34m[1mwandb[0m: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.1472,1.221476,0.0,3.1146


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,3.1146
eval/loss,1.22148
eval/runtime,2.2386
eval/samples_per_second,42.884
eval/steps_per_second,2.68
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.1472


[34m[1mwandb[0m: Agent Starting Run: 821bdsn0 with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.12413886978074325
[34m[1mwandb[0m: 	fr_word_p: 0.07855962584983993
[34m[1mwandb[0m: 	learning_rate: 0.0001157231677812616
[34m[1mwandb[0m: 	random_state: 50
[34m[1mwandb[0m: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.1543,1.055873,0.0,3.1667


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,3.1667
eval/loss,1.05587
eval/runtime,2.6501
eval/samples_per_second,36.225
eval/steps_per_second,2.264
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.1543


[34m[1mwandb[0m: Agent Starting Run: 0dcur7r7 with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.483001777961202
[34m[1mwandb[0m: 	fr_word_p: 0.7391343203332741
[34m[1mwandb[0m: 	learning_rate: 6.398069712972872e-05
[34m[1mwandb[0m: 	random_state: 100
[34m[1mwandb[0m: 	weight_decay: 0.4




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2602,1.326939,0.0003,4.9688


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0003
eval/gen_len,4.9688
eval/loss,1.32694
eval/runtime,2.5819
eval/samples_per_second,37.182
eval/steps_per_second,2.324
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.2602


[34m[1mwandb[0m: Agent Starting Run: twsfibgy with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.5013491803470502
[34m[1mwandb[0m: 	fr_word_p: 0.34408863921261534
[34m[1mwandb[0m: 	learning_rate: 0.0007794458156605326
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,0.9224,0.674547,0.0001,4.1979


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0001
eval/gen_len,4.1979
eval/loss,0.67455
eval/runtime,2.237
eval/samples_per_second,42.914
eval/steps_per_second,2.682
train/epoch,1.0
train/global_step,172.0
train/learning_rate,1e-05
train/loss,0.9224


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: s3aiug5j with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.3666409883524076
[34m[1mwandb[0m: 	fr_word_p: 0.5586590696205764
[34m[1mwandb[0m: 	learning_rate: 0.0006927502443643693
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.1




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.0843,0.864624,0.0,2.25


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,2.25
eval/loss,0.86462
eval/runtime,2.2261
eval/samples_per_second,43.125
eval/steps_per_second,2.695
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.0843


[34m[1mwandb[0m: Agent Starting Run: w5s4w6vc with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.142322122265327
[34m[1mwandb[0m: 	fr_word_p: 0.700432846123837
[34m[1mwandb[0m: 	learning_rate: 0.000273264650141292
[34m[1mwandb[0m: 	random_state: 100
[34m[1mwandb[0m: 	weight_decay: 0.1




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2328,1.121522,0.0,2.625


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,2.625
eval/loss,1.12152
eval/runtime,2.2302
eval/samples_per_second,43.045
eval/steps_per_second,2.69
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.2328


[34m[1mwandb[0m: Agent Starting Run: thooc665 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.03542170415930151
[34m[1mwandb[0m: 	fr_word_p: 0.7703259065836922
[34m[1mwandb[0m: 	learning_rate: 0.000235193834639328
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2334,1.13152,0.0,3.3438


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,3.3438
eval/loss,1.13152
eval/runtime,2.2107
eval/samples_per_second,43.424
eval/steps_per_second,2.714
train/epoch,1.0
train/global_step,54.0
train/learning_rate,0.0
train/loss,1.2334


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: rdcre4z7 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.17744087303962247
[34m[1mwandb[0m: 	fr_word_p: 0.5367257928181528
[34m[1mwandb[0m: 	learning_rate: 0.00011888153469919643
[34m[1mwandb[0m: 	random_state: 0
[34m[1mwandb[0m: 	weight_decay: 0.3




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.702,1.578847,0.002,6.5312


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.002
eval/gen_len,6.5312
eval/loss,1.57885
eval/runtime,2.9233
eval/samples_per_second,32.839
eval/steps_per_second,2.052
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.702


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: b2k3fstn with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.5505484510553698
[34m[1mwandb[0m: 	fr_word_p: 0.8440319547570221
[34m[1mwandb[0m: 	learning_rate: 7.916511914256649e-05
[34m[1mwandb[0m: 	random_state: 50
[34m[1mwandb[0m: 	weight_decay: 0.1




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.2262,1.297477,0.0004,4.6667


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0004
eval/gen_len,4.6667
eval/loss,1.29748
eval/runtime,2.7123
eval/samples_per_second,35.394
eval/steps_per_second,2.212
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.2262


[34m[1mwandb[0m: Agent Starting Run: n6h7k9lr with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.49303865404347896
[34m[1mwandb[0m: 	fr_word_p: 0.6620112472226801
[34m[1mwandb[0m: 	learning_rate: 1.1812457625355704e-05
[34m[1mwandb[0m: 	random_state: 70
[34m[1mwandb[0m: 	weight_decay: 0.5




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,4.2777,4.172988,0.1017,18.1771


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.1017
eval/gen_len,18.1771
eval/loss,4.17299
eval/runtime,2.7944
eval/samples_per_second,34.354
eval/steps_per_second,2.147
train/epoch,1.0
train/global_step,54.0
train/learning_rate,0.0
train/loss,4.2777


[34m[1mwandb[0m: Agent Starting Run: ejeuiuaz with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.204703625860697
[34m[1mwandb[0m: 	fr_word_p: 0.45687410478364554
[34m[1mwandb[0m: 	learning_rate: 0.00010195914137966136
[34m[1mwandb[0m: 	random_state: 70
[34m[1mwandb[0m: 	weight_decay: 0.2




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.7268,1.632492,0.0027,7.1979


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0027
eval/gen_len,7.1979
eval/loss,1.63249
eval/runtime,2.5974
eval/samples_per_second,36.959
eval/steps_per_second,2.31
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.7268


[34m[1mwandb[0m: Agent Starting Run: 4zih95dd with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.8922658627752631
[34m[1mwandb[0m: 	fr_word_p: 0.08428507512773574
[34m[1mwandb[0m: 	learning_rate: 7.82513372467444e-05
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.9827,1.343049,0.0244,10.875


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0244
eval/gen_len,10.875
eval/loss,1.34305
eval/runtime,2.2429
eval/samples_per_second,42.802
eval/steps_per_second,2.675
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.9827


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: s5tl3e0g with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.7817237427980834
[34m[1mwandb[0m: 	fr_word_p: 0.4724365980854402
[34m[1mwandb[0m: 	learning_rate: 9.293306187024969e-05
[34m[1mwandb[0m: 	random_state: 60
[34m[1mwandb[0m: 	weight_decay: 0.4




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.952,1.430696,0.0054,8.3021


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0054
eval/gen_len,8.3021
eval/loss,1.4307
eval/runtime,2.6535
eval/samples_per_second,36.179
eval/steps_per_second,2.261
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.952


[34m[1mwandb[0m: Agent Starting Run: p84kmjyu with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.8062404622739876
[34m[1mwandb[0m: 	fr_word_p: 0.7640357457888571
[34m[1mwandb[0m: 	learning_rate: 8.36708850883483e-05
[34m[1mwandb[0m: 	random_state: 80
[34m[1mwandb[0m: 	weight_decay: 0.3




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.8546,1.237831,0.0023,8.5625


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0023
eval/gen_len,8.5625
eval/loss,1.23783
eval/runtime,2.7758
eval/samples_per_second,34.585
eval/steps_per_second,2.162
train/epoch,1.0
train/global_step,54.0
train/learning_rate,1e-05
train/loss,1.8546


[34m[1mwandb[0m: Agent Starting Run: 1krhu92m with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.7082922925532461
[34m[1mwandb[0m: 	fr_word_p: 0.17749138887057617
[34m[1mwandb[0m: 	learning_rate: 8.504474760197488e-05
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0.4




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.189,1.04071,0.0006,5.0104


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0006
eval/gen_len,5.0104
eval/loss,1.04071
eval/runtime,2.3804
eval/samples_per_second,40.33
eval/steps_per_second,2.521
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.189


[34m[1mwandb[0m: Agent Starting Run: 3g9705f3 with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.09156496230786491
[34m[1mwandb[0m: 	fr_word_p: 0.45554044111098047
[34m[1mwandb[0m: 	learning_rate: 6.902519774628652e-05
[34m[1mwandb[0m: 	random_state: 20
[34m[1mwandb[0m: 	weight_decay: 0.5




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.308,1.093692,0.0,3.4271


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0
eval/gen_len,3.4271
eval/loss,1.09369
eval/runtime,2.3498
eval/samples_per_second,40.854
eval/steps_per_second,2.553
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.308


[34m[1mwandb[0m: Agent Starting Run: dkvci7ke with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fr_char_p: 0.5476828397128914
[34m[1mwandb[0m: 	fr_word_p: 0.2617097176914787
[34m[1mwandb[0m: 	learning_rate: 3.603048379733686e-05
[34m[1mwandb[0m: 	random_state: 0
[34m[1mwandb[0m: 	weight_decay: 0.5




Epoch,Training Loss,Validation Loss,Bleu,Gen Len
1,1.4711,1.682662,0.0076,8.9167


0,1
eval/bleu,▁
eval/gen_len,▁
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁

0,1
eval/bleu,0.0076
eval/gen_len,8.9167
eval/loss,1.68266
eval/runtime,2.5456
eval/samples_per_second,37.712
eval/steps_per_second,2.357
train/epoch,1.0
train/global_step,172.0
train/learning_rate,0.0
train/loss,1.4711


------------------

## Wolof to French

The only thing that changes is the order of the sentences: the Wolof sentence is now the source and is written first, followed by its French translation.

### Configure dataset 🔠

We can use the same custom dataset that we created in [text_augmentation](text_augmentation.ipynb), but we first need to split the data into train and test sets and save them.

In [None]:
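import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(random_state: int = 50):
  """Split the corpora into train and test sets and save them as CSV files."""

  # load the corpora and split them into train (90%) and test (10%) sets
  corpora = pd.read_csv(f"{path}new_data/sent_extraction.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # save both sets so that the dataset classes can read them back
  train_set.to_csv(f"{path}new_data/train_set.csv", index=False)

  test_set.to_csv(f"{path}new_data/test_set.csv", index=False)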
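
To illustrate why `random_state` is treated as a hyperparameter, here is a minimal standard-library sketch of a seeded 90/10 split. It does not reproduce the shuffling done by scikit-learn's `train_test_split`; `seeded_split` and the toy list of rows are hypothetical names used only for illustration.

```python
import math
import random


def seeded_split(rows, test_size=0.1, random_state=50):
    """Shuffle a copy of rows with a seeded generator and cut off a test part."""
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # number of test rows, rounded up
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# toy data standing in for the sentence pairs
rows = [f"sentence_{i}" for i in range(20)]

train_a, test_a = seeded_split(rows, random_state=50)
train_b, test_b = seeded_split(rows, random_state=50)
train_c, test_c = seeded_split(rows, random_state=10)

print(test_a == test_b)  # the same seed reproduces the same split
print(test_a, test_c)    # another seed will usually hold out other rows
```

Because the held-out rows change with the seed, the evaluation loss of two runs is only directly comparable when they share the same `random_state`.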

Let us load the datasets.

In [None]:
def recuperate_datasets(wf_char_p: float, wf_word_p: float):

  # with augmentation
  wf_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=wf_char_p, aug_word_p=wf_word_p),
                                        remove_mark_space, delete_guillemet_space)

  train_dataset_aug = SentenceDataset(f"{path}new_data/train_set.csv", 
                                  tokenizer_path = f"{path}wolof-translate/wolof_translate/tokenizers/tokenizer_v1.json",
                                  corpus_1="wolof_corpus",
                                  corpus_2="french_corpus",
                                  cp1_transformer=wf_augmentation, truncation=True,
                                  max_len=579)

  test_dataset = SentenceDataset(f"{path}new_data/test_set.csv",
                                tokenizer_path = f"{path}wolof-translate/wolof_translate/tokenizers/tokenizer_v1.json",
                                corpus_1="wolof_corpus",
                                corpus_2="french_corpus",
                                truncation=True, max_len=579)
  
  return train_dataset_aug, test_dataset
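
The role of `wf_word_p` and `wf_char_p` can be sketched with the standard library alone. The example below follows the description given at the top of this notebook (a probability of modifying a word, then a probability of modifying a character inside that word). It is a simplification and not the `nlpaug` `KeyboardAug` implementation; `noisy_sentence`, its replacement alphabet and the toy sentence are hypothetical.

```python
import random


def noisy_sentence(sentence, char_p, word_p, seed=0,
                   alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Pick words with probability word_p, then replace their characters with probability char_p."""
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        if rng.random() < word_p:
            # this word was selected: perturb each character independently
            word = "".join(
                rng.choice(alphabet) if rng.random() < char_p else char
                for char in word
            )
        noisy_words.append(word)
    return " ".join(noisy_words)


sentence = "nanga def sama xarit"  # toy input
print(noisy_sentence(sentence, char_p=0.0, word_p=0.0))  # nothing is selected, so the text is unchanged
print(noisy_sentence(sentence, char_p=0.3, word_p=0.5))
```

Only the training set is augmented in `recuperate_datasets`; the test set is read without a transformer so that the evaluation is done on clean sentences.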

### Configure hyperparameter search ⚙️

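We have to configure the search space and the search method. Note that the configuration in the next cell sets `method` to "bayes" and minimizes `eval_loss`; set `method` to "random" to run a random search instead.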

In [None]:
import wandb

# log in without hardcoding the API key in the notebook: wandb reads it from
# the WANDB_API_KEY environment variable or prompts for it
wandb.login()

# hyperparameters
sweep_config = {
    'method': 'bayes',
    'metric':{
          'goal': 'minimize',
          'name': 'eval_loss'
      },
    'parameters':
    {
      'epochs': {
          'value': 1
      },
      'batch_size': {
          'values': [2, 3, 5]
      },
      'learning_rate': {
          'distribution': 'log_uniform_values',
          'min': 1e-5,
          'max': 1e-3
      },
      'weight_decay': {
          'values': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
      },
     'wf_char_p': {
          'min': 0.0,
          'max': 0.7
     },
     'wf_word_p': {
          'min': 0.0,
          'max': 0.7
     },
     'random_state': {
         'values': [0, 10, 20, 30, 40, 50, 60, 70, 80, 100]
     }
    }
}

# Initialize the hyperparameter search
sweep_id = wandb.sweep(sweep_config, project = "gpt2-wolof-french-translation_bayes1_1")



[34m[1mwandb[0m: Currently logged in as: [33moumar-kane[0m ([33moumar-kane-team[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: alygo14y
Sweep URL: https://wandb.ai/oumar-kane-team/gpt2-wolof-french-translation_bayes1_1/sweeps/alygo14y
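
To make the search space concrete, the sketch below draws configurations from it with the standard library. It only illustrates how one independent draw could look (which is what a random search does); it is not how W&B samples internally, and `sample_config` is a hypothetical helper.

```python
import math
import random

# categorical parameters of the search space defined above
search_space = {
    "batch_size": [2, 3, 5],
    "weight_decay": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "random_state": [0, 10, 20, 30, 40, 50, 60, 70, 80, 100],
}


def sample_config(rng):
    """Draw one configuration from the search space."""
    config = {name: rng.choice(values) for name, values in search_space.items()}
    # log-uniform: sample the exponent uniformly so that each decade is equally likely
    log_lr = rng.uniform(math.log(1e-5), math.log(1e-3))
    config["learning_rate"] = math.exp(log_lr)
    # plain uniform sampling for the augmentation probabilities
    config["wf_char_p"] = rng.uniform(0.0, 0.7)
    config["wf_word_p"] = rng.uniform(0.0, 0.7)
    # constant parameter
    config["epochs"] = 1
    return config


rng = random.Random(0)
for _ in range(3):
    print(sample_config(rng))
```

Sampling the learning rate on a log scale matters here because the range spans two orders of magnitude: a plain uniform draw would place about 90% of the samples above `1e-4`.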


### Configure the model and the evaluation function ⚙️

Let us load the pretrained model and resize its token embeddings to match the vocabulary of our tokenizer.

In [None]:
from transformers import GPT2LMHeadModel

def gpt2_model_init(tokenizer):
  # set the model name
  model_name = "gpt2"

  # load the pretrained model and move it to the GPU
  model = GPT2LMHeadModel.from_pretrained(model_name).cuda()

  # resize the token embeddings to the vocabulary size of the dataset tokenizer
  model.resize_token_embeddings(len(tokenizer))

  return model

Let us evaluate the predictions with the `bleu` metric.

In [None]:
# %%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import Callable, Union
import numpy as np
import evaluate
import wandb

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):
        
        # unpack the predictions and the labels provided by the trainer
        preds, labels = eval_preds
        
        if isinstance(preds, tuple):
            
            preds = preds[0]
        
        if self.decoder is None:
            
            decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
            
            decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
            
            decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)
            
            result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
            
            result = {"bleu": result["score"]}
            
            prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
            
            result["gen_len"] = np.mean(prediction_lens)
        
        else:
            
            predictions = list(self.decoder(preds))
            
            labels = list(self.decoder(labels))
      
            decoded_preds, decoded_labels = self.postprocess_text(predictions, labels)
            
            # score the post-processed texts, as in the branch without a custom decoder
            result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
            
            result = {"bleu": result["score"]}
        
        result = {k:round(v, 4) for k, v in result.items()}

        # wandb.log expects a dictionary of metric names and values
        wandb.log({"bleu": result["bleu"]})
            
        return result
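
To give an intuition of what the `bleu` score rewards, the sketch below computes a clipped n-gram precision, one ingredient of BLEU. It is not the `sacrebleu` computation, which also applies its own tokenization, combines several n-gram orders and adds a brevity penalty; `ngrams`, `clipped_precision` and the toy sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(prediction, reference, n=1):
    """Share of predicted n-grams found in the reference, each counted at most as often as it occurs there."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    return matched / total


prediction = "le chat est sur le tapis"
reference = "le chat est assis sur le tapis"

print(clipped_precision(prediction, reference, n=1))  # every predicted word occurs in the reference
print(clipped_precision(prediction, reference, n=2))  # the bigram "est sur" does not
```

A very short prediction can reach a high precision with few words, which is one reason to log `gen_len` next to the score.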


In [None]:
# %run wolof-translate/wolof_translate/utils/evaluation.py

Let us initialize the evaluation object.

In [None]:
# translation_eval = TranslationEvaluation(test_dataset.tokenizer)

### Searching for the best parameters 🕖

Let us define the data collator.

In [None]:
import torch

def data_collator(batch):
    """Generate a batch of data to provide to the trainer

    Args:
        batch (list): The dataset items; each item holds the input ids first and the attention mask second

    Returns:
        dict: A dictionary containing the ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0] for b in batch])
    
    attention_mask = torch.stack([b[1] for b in batch])
    
    # causal language modeling: the labels are the input ids themselves
    labels = torch.stack([b[0] for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}
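
The collator reuses `input_ids` as `labels`. This is the usual setup for a causal language model such as GPT-2, where the model shifts the labels internally so that each position is trained to predict the next token. The plain-Python sketch below shows which (context, target) pairs result from that shift; `next_token_pairs` and the token ids are hypothetical.

```python
def next_token_pairs(input_ids):
    """Pair each token with the token that the model should predict after it."""
    labels = list(input_ids)    # the labels are a copy of the inputs
    contexts = input_ids[:-1]   # the last position has nothing left to predict
    targets = labels[1:]        # the targets are the labels shifted left by one
    return list(zip(contexts, targets))


# made-up token ids for one short sequence
input_ids = [101, 7, 42, 13, 102]

for context, target in next_token_pairs(input_ids):
    print(f"after token {context} predict token {target}")
```

In this setup the Wolof source and the French target live in the same sequence, so the loss covers both parts of the sentence pair.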

Let us initialize the training arguments and run the hyperparameter search.

In [None]:
# %%wandb
from functools import partial
from transformers import Trainer, TrainingArguments

def train(config = None):

  with wandb.init(config = config):

    # seed
    torch.manual_seed(50)

    # set sweep configuration
    config = wandb.config

    # split the data
    split_data(config.random_state)

    # let us recuperate the datasets
    train_dataset, test_dataset = recuperate_datasets(config.wf_char_p, config.wf_word_p)

    # get train and test datasets according to the config

    # train_dataset = datasets[config.dataset_aug]['train_dataset']

    # test_dataset = datasets[config.dataset_aug]['test_dataset']

    # set training arguments
    training_args = TrainingArguments(f"{path}training2/Results1",
                                      report_to = "wandb",
                                      num_train_epochs=config.epochs,
                                      # logging_steps=100,
                                      load_best_model_at_end=True,
                                      save_strategy="epoch",
                                      evaluation_strategy="epoch",
                                      logging_strategy = 'epoch',
                                      per_device_train_batch_size=config.batch_size, 
                                      per_device_eval_batch_size=5,
                                      learning_rate=config.learning_rate,
                                      weight_decay=config.weight_decay,
                                      remove_unused_columns = False,
                                      fp16 = True,
                                      )   

    # define training loop
    trainer = Trainer(model_init=partial(gpt2_model_init, tokenizer = train_dataset.tokenizer),
                      args=training_args,
                      train_dataset=train_dataset, 
                      eval_dataset=test_dataset,
                      data_collator=data_collator,
                      # compute_metrics=translation_eval.compute_metrics
                      )

    # start training loop
    trainer.train()

agent = wandb.agent(sweep_id, train, count = 25)
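
The `train/global_step` values reported in the run summaries follow from the batch size: with one epoch, a single device and no gradient accumulation, the number of optimizer steps is the number of training examples divided by the batch size, rounded up. The sketch below assumes 734 training examples, a value inferred from the logged steps and not stated in the notebook; `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to see every example once (the last batch may be smaller)."""
    return math.ceil(n_examples / batch_size)


n_train = 734  # assumption: inferred from the logs, not given in the notebook

for batch_size in (2, 3, 5):
    print(batch_size, steps_per_epoch(n_train, batch_size))
```

Runs with a smaller batch size therefore take more optimizer steps per epoch, which should be kept in mind when comparing losses after a single epoch.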


[34m[1mwandb[0m: Agent Starting Run: a0u0t6k2 with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 6.702369179262155e-05
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.4
[34m[1mwandb[0m: 	wf_char_p: 0.2819068695463206
[34m[1mwandb[0m: 	wf_word_p: 0.3271474379445852





Epoch,Training Loss,Validation Loss
1,1.3314,0.972646


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.97265
eval/runtime,2.7358
eval/samples_per_second,29.973
eval/steps_per_second,6.214
train/epoch,1.0
train/global_step,367.0
train/learning_rate,0.0
train/loss,1.3314
train/total_flos,216590170752000.0
train/train_loss,1.33135


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 8p97mqyj with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.000687378518112751
[34m[1mwandb[0m: 	random_state: 20
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: 	wf_char_p: 0.10242928904824668
[34m[1mwandb[0m: 	wf_word_p: 0.3761238934195836




Epoch,Training Loss,Validation Loss
1,1.2118,0.902809


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.90281
eval/runtime,2.7187
eval/samples_per_second,30.162
eval/steps_per_second,6.253
train/epoch,1.0
train/global_step,367.0
train/learning_rate,1e-05
train/loss,1.2118
train/total_flos,216590170752000.0
train/train_loss,1.21175


[34m[1mwandb[0m: Agent Starting Run: cufx9n8t with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.226951404890168e-05
[34m[1mwandb[0m: 	random_state: 70
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: 	wf_char_p: 0.23759176734591644
[34m[1mwandb[0m: 	wf_word_p: 0.13227163155096622




Epoch,Training Loss,Validation Loss
1,1.5713,0.956762


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.95676
eval/runtime,2.7129
eval/samples_per_second,30.226
eval/steps_per_second,6.266
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.5713
train/total_flos,216590170752000.0
train/train_loss,1.57131


[34m[1mwandb[0m: Agent Starting Run: y2zo0avu with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.0007890211017920526
[34m[1mwandb[0m: 	random_state: 20
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.2153084724244564
[34m[1mwandb[0m: 	wf_word_p: 0.03750263309251056




Epoch,Training Loss,Validation Loss
1,1.3947,0.852979


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.85298
eval/runtime,2.7128
eval/samples_per_second,30.227
eval/steps_per_second,6.267
train/epoch,1.0
train/global_step,147.0
train/learning_rate,4e-05
train/loss,1.3947
train/total_flos,216590170752000.0
train/train_loss,1.39468


[34m[1mwandb[0m: Agent Starting Run: 6zl4nshz with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.00013060682353049685
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.6581358201308465
[34m[1mwandb[0m: 	wf_word_p: 0.010288732393246668




Epoch,Training Loss,Validation Loss
1,1.1252,0.843749


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.84375
eval/runtime,2.7146
eval/samples_per_second,30.207
eval/steps_per_second,6.262
train/epoch,1.0
train/global_step,367.0
train/learning_rate,0.0
train/loss,1.1252
train/total_flos,216590170752000.0
train/train_loss,1.12516


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: lmlh43bi with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 4.6544061486102166e-05
[34m[1mwandb[0m: 	random_state: 20
[34m[1mwandb[0m: 	weight_decay: 0.4
[34m[1mwandb[0m: 	wf_char_p: 0.601761705072325
[34m[1mwandb[0m: 	wf_word_p: 0.5485382582443594
[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin




Epoch,Training Loss,Validation Loss
1,1.5112,0.959907


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.95991
eval/runtime,2.7127
eval/samples_per_second,30.228
eval/steps_per_second,6.267
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.5112
train/total_flos,216590170752000.0
train/train_loss,1.5112


[34m[1mwandb[0m: Agent Starting Run: b3709lrr with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 8.01812368676995e-05
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.3414514505517595
[34m[1mwandb[0m: 	wf_word_p: 0.06590909422021479




Epoch,Training Loss,Validation Loss
1,1.2984,0.943387


0,1
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁
train/learning_rate,▁
train/loss,▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.94339
eval/runtime,2.7141
eval/samples_per_second,30.212
eval/steps_per_second,6.264
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.2984
train/total_flos,216590170752000.0
train/train_loss,1.2984


[34m[1mwandb[0m: Agent Starting Run: ixh1wkga with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 7.665404061522188e-05
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.2025899023149373
[34m[1mwandb[0m: 	wf_word_p: 0.30403171896970477




Epoch,Training Loss,Validation Loss
1,1.6004,0.898786


0,1
eval/loss,0.89879
eval/runtime,2.7076
eval/samples_per_second,30.285
eval/steps_per_second,6.279
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.6004
train/total_flos,216590170752000.0
train/train_loss,1.60037


[34m[1mwandb[0m: Agent Starting Run: yqxwql6m with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.0009940796140095907
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0.4
[34m[1mwandb[0m: 	wf_char_p: 0.5740543904824967
[34m[1mwandb[0m: 	wf_word_p: 0.2462419315938406




Epoch,Training Loss,Validation Loss
1,1.2845,0.848199


0,1
eval/loss,0.8482
eval/runtime,2.7189
eval/samples_per_second,30.159
eval/steps_per_second,6.253
train/epoch,1.0
train/global_step,367.0
train/learning_rate,2e-05
train/loss,1.2845
train/total_flos,216590170752000.0
train/train_loss,1.28448


[34m[1mwandb[0m: Agent Starting Run: 6z8kvttx with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 3.136396657280805e-05
[34m[1mwandb[0m: 	random_state: 80
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.496992386293468
[34m[1mwandb[0m: 	wf_word_p: 0.09128048050662484




Epoch,Training Loss,Validation Loss
1,1.6337,0.925104


0,1
eval/loss,0.9251
eval/runtime,2.7135
eval/samples_per_second,30.22
eval/steps_per_second,6.265
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.6337
train/total_flos,216590170752000.0
train/train_loss,1.63375


[34m[1mwandb[0m: Agent Starting Run: m9sb4aqc with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.292925960874299e-05
[34m[1mwandb[0m: 	random_state: 80
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: 	wf_char_p: 0.1392332134258406
[34m[1mwandb[0m: 	wf_word_p: 0.33276367556881936




Epoch,Training Loss,Validation Loss
1,1.3772,0.921299


0,1
eval/loss,0.9213
eval/runtime,2.7224
eval/samples_per_second,30.12
eval/steps_per_second,6.244
train/epoch,1.0
train/global_step,367.0
train/learning_rate,0.0
train/loss,1.3772
train/total_flos,216590170752000.0
train/train_loss,1.37722


[34m[1mwandb[0m: Agent Starting Run: jbput8ui with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 6.444484556126347e-05
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0
[34m[1mwandb[0m: 	wf_char_p: 0.29893712569494857
[34m[1mwandb[0m: 	wf_word_p: 0.26117783521919574




Epoch,Training Loss,Validation Loss
1,1.4321,0.896062


0,1
eval/loss,0.89606
eval/runtime,2.708
eval/samples_per_second,30.281
eval/steps_per_second,6.278
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.4321
train/total_flos,216590170752000.0
train/train_loss,1.43209


[34m[1mwandb[0m: Agent Starting Run: 4fwhdrrr with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.5593238990374552e-05
[34m[1mwandb[0m: 	random_state: 0
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.2943749543935819
[34m[1mwandb[0m: 	wf_word_p: 0.12398175089457102




Epoch,Training Loss,Validation Loss
1,1.6999,0.85214


0,1
eval/loss,0.85214
eval/runtime,2.7191
eval/samples_per_second,30.157
eval/steps_per_second,6.252
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.6999
train/total_flos,216590170752000.0
train/train_loss,1.69991


[34m[1mwandb[0m: Agent Starting Run: 1bz3dbic with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 3.890480473611398e-05
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.2378445346533837
[34m[1mwandb[0m: 	wf_word_p: 0.4209643755950993




Epoch,Training Loss,Validation Loss
1,1.4832,0.875917


0,1
eval/loss,0.87592
eval/runtime,2.7525
eval/samples_per_second,29.792
eval/steps_per_second,6.176
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.4832
train/total_flos,216590170752000.0
train/train_loss,1.48315


[34m[1mwandb[0m: Agent Starting Run: ho6ta14m with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.0002774553527447285
[34m[1mwandb[0m: 	random_state: 40
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: 	wf_char_p: 0.6714497099264989
[34m[1mwandb[0m: 	wf_word_p: 0.03656663081592687




Epoch,Training Loss,Validation Loss
1,1.1247,0.822917


0,1
eval/loss,0.82292
eval/runtime,2.7172
eval/samples_per_second,30.178
eval/steps_per_second,6.256
train/epoch,1.0
train/global_step,367.0
train/learning_rate,1e-05
train/loss,1.1247
train/total_flos,216590170752000.0
train/train_loss,1.12474


[34m[1mwandb[0m: Agent Starting Run: wmlfiw0r with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.0003923791647223955
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.2
[34m[1mwandb[0m: 	wf_char_p: 0.06807537344840917
[34m[1mwandb[0m: 	wf_word_p: 0.5604330614348828




Epoch,Training Loss,Validation Loss
1,1.5006,0.949637


0,1
eval/loss,0.94964
eval/runtime,2.7103
eval/samples_per_second,30.255
eval/steps_per_second,6.272
train/epoch,1.0
train/global_step,147.0
train/learning_rate,2e-05
train/loss,1.5006
train/total_flos,216590170752000.0
train/train_loss,1.5006


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: r8gcrole with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.821341363881095e-05
[34m[1mwandb[0m: 	random_state: 50
[34m[1mwandb[0m: 	weight_decay: 0.3
[34m[1mwandb[0m: 	wf_char_p: 0.4377052667800509
[34m[1mwandb[0m: 	wf_word_p: 0.28337028938356845




Epoch,Training Loss,Validation Loss
1,1.8462,0.981831


0,1
eval/loss,0.98183
eval/runtime,2.7136
eval/samples_per_second,30.218
eval/steps_per_second,6.265
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.8462
train/total_flos,216590170752000.0
train/train_loss,1.8462


[34m[1mwandb[0m: Agent Starting Run: phwfj18n with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 7.811279212646426e-05
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.2317434997460037
[34m[1mwandb[0m: 	wf_word_p: 0.4361895012572056




Epoch,Training Loss,Validation Loss
1,1.6396,0.980084


0,1
eval/loss,0.98008
eval/runtime,2.767
eval/samples_per_second,29.635
eval/steps_per_second,6.144
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.6396
train/total_flos,216590170752000.0
train/train_loss,1.63955


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: jlqkadbj with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.00010056956001033137
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.6390337876205812
[34m[1mwandb[0m: 	wf_word_p: 0.23792329132405943




Epoch,Training Loss,Validation Loss
1,1.6119,0.873367


0,1
eval/loss,0.87337
eval/runtime,2.7126
eval/samples_per_second,30.23
eval/steps_per_second,6.267
train/epoch,1.0
train/global_step,147.0
train/learning_rate,1e-05
train/loss,1.6119
train/total_flos,216590170752000.0
train/train_loss,1.61191


[34m[1mwandb[0m: Agent Starting Run: 6p270685 with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 7.129742665577746e-05
[34m[1mwandb[0m: 	random_state: 20
[34m[1mwandb[0m: 	weight_decay: 0.5
[34m[1mwandb[0m: 	wf_char_p: 0.39725467091195105
[34m[1mwandb[0m: 	wf_word_p: 0.5004309074101329




Epoch,Training Loss,Validation Loss
1,1.4593,0.955068


0,1
eval/loss,0.95507
eval/runtime,2.7243
eval/samples_per_second,30.099
eval/steps_per_second,6.24
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.4593
train/total_flos,216590170752000.0
train/train_loss,1.45931


[34m[1mwandb[0m: Agent Starting Run: gcv6axp9 with config:
[34m[1mwandb[0m: 	batch_size: 2
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 6.252438217932992e-05
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0
[34m[1mwandb[0m: 	wf_char_p: 0.29046645972247725
[34m[1mwandb[0m: 	wf_word_p: 0.2600785813543463




Epoch,Training Loss,Validation Loss
1,1.3169,0.971379


0,1
eval/loss,0.97138
eval/runtime,2.7496
eval/samples_per_second,29.823
eval/steps_per_second,6.183
train/epoch,1.0
train/global_step,367.0
train/learning_rate,0.0
train/loss,1.3169
train/total_flos,216590170752000.0
train/train_loss,1.31693


[34m[1mwandb[0m: Agent Starting Run: yhyy9e23 with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 3.647196100197014e-05
[34m[1mwandb[0m: 	random_state: 10
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.1717366963251063
[34m[1mwandb[0m: 	wf_word_p: 0.1316624948710261




Epoch,Training Loss,Validation Loss
1,1.3969,0.856822


0,1
eval/loss,0.85682
eval/runtime,2.7137
eval/samples_per_second,30.217
eval/steps_per_second,6.265
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.3969
train/total_flos,216590170752000.0
train/train_loss,1.3969


[34m[1mwandb[0m: Agent Starting Run: xup68ew4 with config:
[34m[1mwandb[0m: 	batch_size: 5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 3.637652455538814e-05
[34m[1mwandb[0m: 	random_state: 60
[34m[1mwandb[0m: 	weight_decay: 0.3
[34m[1mwandb[0m: 	wf_char_p: 0.677355252029728
[34m[1mwandb[0m: 	wf_word_p: 0.16163588455481578




Epoch,Training Loss,Validation Loss
1,1.6871,0.952795


0,1
eval/loss,0.9528
eval/runtime,2.7376
eval/samples_per_second,29.953
eval/steps_per_second,6.21
train/epoch,1.0
train/global_step,147.0
train/learning_rate,0.0
train/loss,1.6871
train/total_flos,216590170752000.0
train/train_loss,1.68711


[34m[1mwandb[0m: Agent Starting Run: wa2re6yd with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 0.00015372099844614283
[34m[1mwandb[0m: 	random_state: 0
[34m[1mwandb[0m: 	weight_decay: 0.1
[34m[1mwandb[0m: 	wf_char_p: 0.4593404124892092
[34m[1mwandb[0m: 	wf_word_p: 0.5384324544438147




Epoch,Training Loss,Validation Loss
1,1.4028,0.823874


0,1
eval/loss,0.82387
eval/runtime,2.7275
eval/samples_per_second,30.064
eval/steps_per_second,6.233
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.4028
train/total_flos,216590170752000.0
train/train_loss,1.40277


[34m[1mwandb[0m: Agent Starting Run: i305ffth with config:
[34m[1mwandb[0m: 	batch_size: 3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 4.284698923411304e-05
[34m[1mwandb[0m: 	random_state: 30
[34m[1mwandb[0m: 	weight_decay: 0.4
[34m[1mwandb[0m: 	wf_char_p: 0.6295786463189464
[34m[1mwandb[0m: 	wf_word_p: 0.6968639258681786




Epoch,Training Loss,Validation Loss
1,1.5093,0.986804


0,1
eval/loss,0.9868
eval/runtime,2.7162
eval/samples_per_second,30.189
eval/steps_per_second,6.259
train/epoch,1.0
train/global_step,245.0
train/learning_rate,0.0
train/loss,1.5093
train/total_flos,216590170752000.0
train/train_loss,1.5093
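
The per-run summaries printed above report `eval/loss` only, so a quick way to compare the runs in this part of the log is to rank them by that value. The following is a minimal sketch, assuming the captured agent output is available as plain text; the function name `rank_runs_by_eval_loss` and the `sample` excerpt are hypothetical helpers introduced for illustration, and the three losses in `sample` are copied from the runs logged above. A ranking on the `BLEU` score would need that metric to be logged as well.

```python
import re

# Matches the run id in lines such as "wandb: Agent Starting Run: ho6ta14m with config:"
RUN_RE = re.compile(r"Agent Starting Run: (\w+)")
# Matches only numeric summary lines, so sparkline rows like "eval/loss,▁" are skipped
LOSS_RE = re.compile(r"^eval/loss,([0-9.eE+-]+)$")


def rank_runs_by_eval_loss(log_text):
    """Return (run_id, eval_loss) pairs sorted from lowest to highest loss."""
    losses = {}
    current_run = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        run_match = RUN_RE.search(line)
        if run_match:
            current_run = run_match.group(1)
            continue
        loss_match = LOSS_RE.match(line)
        if loss_match and current_run is not None:
            losses[current_run] = float(loss_match.group(1))
    return sorted(losses.items(), key=lambda item: item[1])


# Excerpt built from three of the runs logged above (hypothetical variable name)
sample = """wandb: Agent Starting Run: ho6ta14m with config:
eval/loss,▁
eval/loss,0.82292
wandb: Agent Starting Run: wmlfiw0r with config:
eval/loss,▁
eval/loss,0.94964
wandb: Agent Starting Run: wa2re6yd with config:
eval/loss,0.82387
"""

ranking = rank_runs_by_eval_loss(sample)
for run_id, loss in ranking:
    print(f"{run_id}: {loss}")
```

Among the runs listed in this part of the log, `ho6ta14m` (0.82292) and `wa2re6yd` (0.82387) have the lowest validation loss after one epoch.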


-----------

## Colab download and remove step

In [None]:
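import shutil

# Optional: archive the local wandb logs so they can be downloaded
# (this must run before the directory is removed)
# shutil.make_archive('wandb', 'zip', 'wandb')

# Remove the search results saved on Drive and the local wandb logs
shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/T5/training/bayes_search_results')
shutil.rmtree('wandb')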
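
Since this step both downloads and removes, the archive has to be created before the directory is deleted. A minimal sketch of that ordering follows, assuming only the standard library; `archive_then_remove` is a hypothetical helper name, not part of the project code.

```python
import shutil
from pathlib import Path


def archive_then_remove(directory):
    """Zip `directory` next to itself, delete it, and return the archive path.

    Returns None when the directory does not exist, so the cell can be re-run safely.
    """
    directory = Path(directory).resolve()
    if not directory.is_dir():
        return None  # nothing to archive or remove
    # base_name has no extension: make_archive appends ".zip" itself
    archive_path = shutil.make_archive(str(directory), "zip", root_dir=directory)
    shutil.rmtree(directory)
    return Path(archive_path)
```

In the notebook this could be called as `archive_then_remove('wandb')`; on Colab the resulting file can then be fetched with `google.colab.files.download`.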