Custom Transformer Training
-------------------------------

In this notebook we train the custom transformer on multiple GPUs, if they are available, within a single machine. In [multiple](_custom_transformer_train_multiple.ipynb), we use SageMaker to distribute the training of the model over multiple instances.

The notebook follows these steps:

- Load the libraries
- Create a function to retrieve the datasets (arguments: char_p, word_p, max_len, end_mark, tokenizer, corpus_1, corpus_2, train_file, test_file)
- Train the model, which is saved automatically (argument: the config dictionary initialized beforehand)
- Make predictions

-------------------------------------------

#### French-Wolof v5

➡️ Import the libraries.

In [1]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0

➡️ Function to retrieve the datasets

In [2]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof', 
                        train_file: str = 'data/extractions/new_data/train_set.csv', 
                        test_file: str = 'data/extractions/new_data/test_file.csv'):

  # Select the optional end-mark step according to the `end_mark` option
  if end_mark == 1:
    # No end-mark step is appended to the sequences
    end_mark_fns = []
  elif end_mark == 2:
    end_mark_fns = [partial(add_end_mark, end_mark_to_remove = '!', replace = True)]
  elif end_mark == 3:
    end_mark_fns = [partial(add_end_mark)]
  elif end_mark == 4:
    end_mark_fns = [partial(add_end_mark, end_mark_to_remove = '!')]
  else:
    raise ValueError(f'No end mark number {end_mark}')

  # Augmentation applied to the French sentences: keyboard noise, then punctuation clean-up
  fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space,
                                          *end_mark_fns)

  # Same clean-up steps without the keyboard noise
  fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space,
                                           *end_mark_fns)

    
  # Build the training dataset (keyboard noise on corpus_1 only)
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Build the validation dataset (no keyboard noise)
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py


In [3]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py
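
The function above chains several text transformations and, depending on `end_mark`, appends one more step built with `partial`. The following is a minimal sketch of that chaining idea only. It assumes that `TransformerSequences` applies its callables in order; `strip_spaces`, `append_mark`, and `apply_sequence` are hypothetical stand-ins, not functions from `wolof_translate`.

```python
from functools import partial


def strip_spaces(sentence: str) -> str:
    """Hypothetical clean-up step: collapse repeated whitespace."""
    return " ".join(sentence.split())


def append_mark(sentence: str, mark: str = ".") -> str:
    """Hypothetical end-mark step: append `mark` unless punctuation is already there."""
    return sentence if sentence.endswith((".", "!", "?")) else sentence + mark


def apply_sequence(sentence: str, *steps) -> str:
    """Apply each step in order, feeding the output of one step into the next."""
    for step in steps:
        sentence = step(sentence)
    return sentence


# `partial` freezes the keyword argument so the step takes a single sentence
pipeline_steps = [strip_spaces, partial(append_mark, mark=".")]

result = apply_sequence("  Bonjour   le monde ", *pipeline_steps)
print(result)
```

Passing an empty list of extra steps leaves the base sequence unchanged, which is how the `end_mark == 1` case can share the same call as the other options.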

➡️ Training

In [12]:
# initialize the configurations
config = {
    'epochs': 21,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'french',
    'corpus_2': 'wolof',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.291121690756753,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2024,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': 1e-3,
    'weight_decay': 0.0,
    'char_p': 0.8986208054599546,
    'word_p': 0.7876712525708085,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 23, 43, 64, 84, 104],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4],
    'batch_size': None, 
    'warmup_init': False,
    'relative_step': False,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 't5_small_v5_fw',
    'new_model_dir': 't5_small_v5_fw',
    'continue': False, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/t5_small_fw',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v4.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'ad_sentences.csv',
    'version': 5,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
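
The `boundaries` and `batch_sizes` entries describe length buckets: six boundaries define seven buckets, and each bucket has its own batch size, so longer sequences are grouped into smaller batches. The sketch below shows one plausible mapping from a sequence length to a batch size. The exact edge handling of `SequenceLengthBatchSampler` is not shown in this notebook, so the convention used here (a length equal to a boundary falls into the next bucket) is an assumption, and `bucket_batch_size` is a hypothetical helper.

```python
from bisect import bisect_right

boundaries = [2, 23, 43, 64, 84, 104]
batch_sizes = [256, 128, 64, 32, 16, 8, 4]


def bucket_batch_size(seq_len: int, boundaries: list, batch_sizes: list) -> int:
    """Return the batch size of the bucket that `seq_len` falls into.

    Assumed convention: bucket i holds lengths in [boundaries[i-1], boundaries[i]).
    """
    if len(batch_sizes) != len(boundaries) + 1:
        raise ValueError("batch_sizes must have exactly one more entry than boundaries")
    # bisect_right counts how many boundaries are <= seq_len, which is the bucket index
    return batch_sizes[bisect_right(boundaries, seq_len)]


for length in (1, 10, 23, 50, 500):
    print(length, bucket_batch_size(length, boundaries, batch_sizes))
```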

In [13]:
%%writefile wolof-translate/wolof_translate/utils/hg_training.py
from wolof_translate import *
import warnings

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    config.update({"num_workers": 1, "pin_memory": True} if use_cuda else {})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None:
            logger.info(
                "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                    config['backend'], dist.get_world_size()
                )
                + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
            )
    # ---------------------------------------
    
    # split the data
    if config['include_split']: split_data(config['random_state'], config['data_directory'], config['data_file'])

    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # Initialize the model name
    model_name = 't5-small'

    # import the model with its pre-trained weights
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # resize the token embeddings
    model.resize_token_embeddings(len(tokenizer))
    
    # build the train and test sets
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = model, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    hugging_face = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train for the remaining epochs (target minus epochs already completed)
    trainer.train(config['epochs'] - trainer.current_epoch, auto_save = True,
                  log_step = config['log_step'], saving_directory = config['new_model_dir'],
                  save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'],
                  metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/hg_training.py
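
The distributed branch of `train` derives the world size and the rank of the current machine from the `hosts`, `current_host`, and `backend` entries of the config. The sketch below isolates that logic so it can be checked without initializing a process group. `distributed_settings` is a hypothetical helper, and the host names are illustrative values only.

```python
def distributed_settings(hosts: list, current_host, backend) -> dict:
    """Mirror the decision made in `train`: distributed only with several hosts and a backend."""
    is_distributed = len(hosts) > 1 and backend is not None
    if not is_distributed:
        # Single machine: one process group member with rank 0
        return {"is_distributed": False, "world_size": 1, "rank": 0}
    return {
        "is_distributed": True,
        "world_size": len(hosts),
        # The rank is the position of the current host in the host list
        "rank": hosts.index(current_host),
    }


# Values from the config in this notebook: no hosts, no backend
print(distributed_settings([], None, None))

# Illustrative two-host setup (host names and backend are assumptions)
print(distributed_settings(["algo-1", "algo-2"], "algo-2", "gloo"))
```

With the config used in this notebook (`hosts` empty, `backend` set to `None`), the function reports a non-distributed run, which matches the single-machine setting described at the top.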


Below, we train the model and save it manually if needed.

In [14]:
from wolof_translate.utils.hg_training import train
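
The training output contains a `huggingface/tokenizers` warning about forked processes, and the warning itself suggests setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch of that option, assuming it is run before the tokenizer is first used in the process:

```python
import os

# Disable tokenizer-level parallelism so forked data-loader workers do not trigger the warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Setting the value to `"false"` trades tokenizer-side parallelism for quieter logs; whether that matters for speed here has not been measured in this notebook.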

In [37]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.41batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.23batches/s]



Metrics: {'train_loss': 2.5114653375916345, 'test_loss': 2.7761967109911367, 'bleu': 2.266399494949495, 'gen_len': 13.217162121212121}




  4%|▍         | 1/25 [00:10<04:22, 10.92s/it]

For epoch 7: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.40batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.98batches/s]



Metrics: {'train_loss': 2.42468051816903, 'test_loss': 2.6943533950381813, 'bleu': 2.4339045454545456, 'gen_len': 12.560631313131314}




  8%|▊         | 2/25 [00:22<04:17, 11.18s/it]

For epoch 8: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.49batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.98batches/s]



Metrics: {'train_loss': 2.3517192236461786, 'test_loss': 2.634918566906091, 'bleu': 2.3541358585858587, 'gen_len': 13.671723737373739}




 12%|█▏        | 3/25 [00:33<04:06, 11.22s/it]

For epoch 9: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.39batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.99batches/s]



Metrics: {'train_loss': 2.2777167031615697, 'test_loss': 2.590723254463889, 'bleu': 2.4163686868686867, 'gen_len': 14.858569696969699}




 16%|█▌        | 4/25 [00:44<03:56, 11.26s/it]

For epoch 10: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.29batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.13batches/s]



Metrics: {'train_loss': 2.223224872882271, 'test_loss': 2.6014229601079766, 'bleu': 2.3500434343434353, 'gen_len': 11.525243434343436}




 20%|██        | 5/25 [00:54<03:35, 10.80s/it]

For epoch 11: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:06<00:00,  7.09batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.07batches/s]



Metrics: {'train_loss': 2.1764694935607425, 'test_loss': 2.5320193478555386, 'bleu': 2.618009595959596, 'gen_len': 12.217173737373738}




 24%|██▍       | 6/25 [01:06<03:29, 11.01s/it]

For epoch 12: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.34batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.32batches/s]



Metrics: {'train_loss': 2.1303964158841753, 'test_loss': 2.5434020721551147, 'bleu': 2.744057070707071, 'gen_len': 11.03030404040404}




 28%|██▊       | 7/25 [01:15<03:09, 10.55s/it]

For epoch 13: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.40batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.02batches/s]



Metrics: {'train_loss': 2.0804740817089926, 'test_loss': 2.5099077537806354, 'bleu': 2.563536868686869, 'gen_len': 12.616167676767677}




 32%|███▏      | 8/25 [01:27<03:03, 10.78s/it]

For epoch 14: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.33batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.17batches/s]



Metrics: {'train_loss': 2.0455848709229487, 'test_loss': 2.52968150919134, 'bleu': 3.1110070707070716, 'gen_len': 11.015133333333333}




 36%|███▌      | 9/25 [01:36<02:47, 10.47s/it]

For epoch 15: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.37batches/s]
Test batch number 3:  14%|█▍        | 1/7 [00:00<00:01,  4.58batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.09batches/s]



Metrics: {'train_loss': 1.9926729948742343, 'test_loss': 2.5232395668222445, 'bleu': 2.864400505050505, 'gen_len': 11.368671212121212}




 40%|████      | 10/25 [01:46<02:34, 10.30s/it]

For epoch 16: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.33batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.00batches/s]



Metrics: {'train_loss': 1.9518446629745607, 'test_loss': 2.5120888189835986, 'bleu': 2.8665191919191924, 'gen_len': 11.176735858585861}




 44%|████▍     | 11/25 [01:56<02:23, 10.23s/it]

For epoch 17: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.32batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.07batches/s]



Metrics: {'train_loss': 1.9194935183974475, 'test_loss': 2.5144295403451626, 'bleu': 3.2974808080808082, 'gen_len': 11.974762626262626}




 48%|████▊     | 12/25 [02:06<02:11, 10.13s/it]

For epoch 18: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:06<00:00,  7.16batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.09batches/s]



Metrics: {'train_loss': 1.8835548197144882, 'test_loss': 2.5088922808868714, 'bleu': 2.9407616161616166, 'gen_len': 11.671747474747475}




 52%|█████▏    | 13/25 [02:18<02:05, 10.50s/it]

For epoch 19: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.24batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.14batches/s]



Metrics: {'train_loss': 1.842434627055217, 'test_loss': 2.5319321829863273, 'bleu': 2.9120015151515157, 'gen_len': 10.03538484848485}




 56%|█████▌    | 14/25 [02:28<01:53, 10.31s/it]

For epoch 20: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.34batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.05batches/s]



Metrics: {'train_loss': 1.8105094993061694, 'test_loss': 2.497430601505318, 'bleu': 2.875379292929293, 'gen_len': 11.111095454545456}




 60%|██████    | 15/25 [02:39<01:45, 10.59s/it]

For epoch 21: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.26batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.26batches/s]



Metrics: {'train_loss': 1.7762581537525384, 'test_loss': 2.4893684290876292, 'bleu': 3.0885227272727276, 'gen_len': 11.181794444444446}




 64%|██████▍   | 16/25 [02:50<01:36, 10.72s/it]

For epoch 22: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.25batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:02<00:00,  2.47batches/s]



Metrics: {'train_loss': 1.7403693729228116, 'test_loss': 2.5324128372500643, 'bleu': 2.810114141414142, 'gen_len': 9.777791919191921}




 68%|██████▊   | 17/25 [02:59<01:22, 10.36s/it]

For epoch 23: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.26batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.31batches/s]



Metrics: {'train_loss': 1.6994967128113057, 'test_loss': 2.506803387343282, 'bleu': 2.749485858585859, 'gen_len': 10.61617676767677}




 72%|███████▏  | 18/25 [03:09<01:11, 10.17s/it]

For epoch 24: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.27batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.06batches/s]



Metrics: {'train_loss': 1.6744661027309942, 'test_loss': 2.52493722992714, 'bleu': 2.8192727272727276, 'gen_len': 12.17172676767677}




 76%|███████▌  | 19/25 [03:19<01:00, 10.13s/it]

For epoch 25: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.19batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.08batches/s]



Metrics: {'train_loss': 1.6336687108069412, 'test_loss': 2.5186826291710447, 'bleu': 3.126379797979798, 'gen_len': 11.545429292929294}




 80%|████████  | 20/25 [03:29<00:50, 10.12s/it]

For epoch 26: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.23batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.04batches/s]



Metrics: {'train_loss': 1.595658942463276, 'test_loss': 2.5293197246512986, 'bleu': 3.3226782828282833, 'gen_len': 11.797979292929293}




 84%|████████▍ | 21/25 [03:39<00:40, 10.10s/it]

For epoch 27: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.20batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.08batches/s]



Metrics: {'train_loss': 1.5806173942723958, 'test_loss': 2.5458529356754185, 'bleu': 3.031581818181818, 'gen_len': 12.040430303030304}




 88%|████████▊ | 22/25 [03:49<00:30, 10.07s/it]

For epoch 28: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.21batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.96batches/s]



Metrics: {'train_loss': 1.5406483437802352, 'test_loss': 2.5303414036529235, 'bleu': 3.12880101010101, 'gen_len': 12.808074747474748}




 92%|█████████▏| 23/25 [03:59<00:20, 10.11s/it]

For epoch 29: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:05<00:00,  7.20batches/s]
Test batch number 3:  14%|█▍        | 1/7 [00:00<00:01,  4.60batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.09batches/s]



Metrics: {'train_loss': 1.5093537220692503, 'test_loss': 2.5414568535005206, 'bleu': 3.5049414141414146, 'gen_len': 12.07069797979798}




 96%|█████████▌| 24/25 [04:09<00:10, 10.09s/it]

For epoch 30: 


Train batch number 2:   0%|          | 0/43 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 44: 100%|██████████| 43/43 [00:06<00:00,  7.15batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.11batches/s]



Metrics: {'train_loss': 1.4772157685655651, 'test_loss': 2.5400167571173777, 'bleu': 3.1790439393939396, 'gen_len': 12.126261111111113}




100%|██████████| 25/25 [04:20<00:00, 10.40s/it]
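
The per-epoch metrics logged above can be compared directly to see which epoch a "best model" rule would keep. The sketch below is only an illustration: the values are copied from the log and rounded, the first metrics line in this stretch is taken to be epoch 23 because it precedes "For epoch 24", and `best_epoch` is a hypothetical helper, not the selection logic of `ModelRunner` (which the notebook does not show). The `'maximize'` objective name is also an assumption; only `'minimize'` appears in the configuration.

```python
# (epoch, test_loss, bleu) copied from the log above, rounded to four decimals.
history = [
    (23, 2.5068, 2.7495),
    (24, 2.5249, 2.8193),
    (25, 2.5187, 3.1264),
    (26, 2.5293, 3.3227),
    (27, 2.5459, 3.0316),
    (28, 2.5303, 3.1288),
    (29, 2.5415, 3.5049),
    (30, 2.5400, 3.1790),
]


def best_epoch(history, column, objective):
    """Return the epoch whose metric in `column` is best for `objective`.

    Hypothetical helper: 'minimize' picks the smallest value,
    anything else picks the largest.
    """
    pick = min if objective == "minimize" else max
    return pick(history, key=lambda row: row[column])[0]


best_by_loss = best_epoch(history, column=1, objective="minimize")
best_by_bleu = best_epoch(history, column=2, objective="maximize")
print(best_by_loss, best_by_bleu)
```

Over these epochs the train loss keeps falling while the test loss stays near 2.5, so a rule based on `test_loss` and a rule based on `bleu` need not agree on the epoch to keep.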


In [15]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

0it [00:00, ?it/s]


➡️ Predictions


In [16]:
if trainer is not None:

    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])

    # build the test dataset
    # initialize the transformation sequence
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)

    # load the test set
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                            tokenizer = tokenizer,
                                            cp1_transformer = augmentation,
                                            cp2_transformer = augmentation,
                                            corpus_1=config['corpus_1'],
                                            corpus_2=config['corpus_2'],
                                            truncation = False)

    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                            'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 2:   0%|          | 0/6 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Evaluation batch number 7: 100%|██████████| 6/6 [00:03<00:00,  1.64batches/s]
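
The `huggingface/tokenizers` warning that recurs in the logs names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in the first cell of the notebook, before any tokenizer is used and before the data loaders fork worker processes:

```python
import os

# Disable the tokenizers' internal thread pool so that forking
# DataLoader workers no longer triggers the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting the value to `"true"` instead keeps the parallelism but leaves the deadlock risk that the warning describes.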


In [17]:
metrics

{'test_loss': 2.249727627243659,
 'bleu': 2.7937212121212123,
 'gen_len': 7.303034343434343}
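
One way to read the `test_loss` above is through perplexity. The sketch below assumes that `test_loss` is a mean per-token cross-entropy in nats; the notebook does not show how `trainer.evaluate` aggregates the loss, so treat the result as indicative only.

```python
import math

# Value copied from the metrics output above.
test_loss = 2.249727627243659

# Assumption: test_loss is a mean per-token cross-entropy in nats,
# in which case its exponential is the perplexity.
perplexity = math.exp(test_loss)
print(f"perplexity = {perplexity:.2f}")
```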

In [18]:
prediction

| | original_sentences | translations | predictions |
|---|---|---|---|
| 0 | C'est des femmes. | Jigéen lanu. | Nit la. |
| 1 | Cet homme qui avait voulu. | Góor gii bëggóon. | Góor gii ŋga dem. |
| 2 | Par ici? | Ci fii? | Ci wax? |
| 3 | Qu'il entre! | Na dugg ci biir su bëggée! | Koo dem! |
| 4 | À l'intérieur si tu ne veux pas! | Ci biir soo bëggul! | Ci biir! |
| ... | ... | ... | ... |
| 193 | Ceux-là, cependant, sont des cases. Celle qui ... | Waaw lii nag ay néegi ñax la, néegi ñax bi ci ... | Lii ab néeg la, néeg bi dañ kooale, néeg bi d... |
| 194 | On voit sur la photo beaucoup de personnes sor... | Ñu gis ci nataal bi ay nit ñu bari ñu génn ci ... | Nataal bii de gis naa ci benn bool bu weex ak... |
| 195 | Ceux-là, aussi, sont des gendarmes. Ils siègen... | Ñii moom tamit ay takk-der nañ. Ñi ñi ngi bàyy... | Lii ab néeg la, néeg bi dañ kooale, néeg bi d... |
| 196 | Ceci, cependant, on a l'habitude de faire les ... | Lii nag dañ ciy faral di def ndugg maanaam jig... | Waaw nataal bii de ay bunt yu dóomu-taal moo ... |
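
A quick way to eyeball rows like these is a clipped unigram precision between each prediction and its reference. This is a rough proxy written for illustration only: it is not the BLEU score reported in the metrics, and `unigram_precision` is a hypothetical helper that is not part of `wolof_translate`. The sentence pairs are copied from the first rows of the output.

```python
from collections import Counter


def unigram_precision(prediction, reference):
    """Share of predicted words that also occur in the reference (clipped counts)."""
    def words(text):
        # Lowercase, split on whitespace, strip surrounding punctuation.
        return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

    pred_counts = Counter(words(prediction))
    ref_counts = Counter(words(reference))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[word]) for word, count in pred_counts.items())
    return matched / total


# (reference translation, model prediction) pairs from the first rows shown.
pairs = [
    ("Jigéen lanu.", "Nit la."),
    ("Góor gii bëggóon.", "Góor gii ŋga dem."),
    ("Ci biir soo bëggul!", "Ci biir!"),
]

for reference, prediction in pairs:
    print(f"{prediction!r}: {unigram_precision(prediction, reference):.2f}")
```

A prediction such as "Ci biir!" gets a perfect precision while dropping half of the reference, which is why BLEU adds a brevity penalty on top of n-gram precision.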


----------------------------------

#### Wolof-French v5

➡️ Import the libraries.

In [22]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0

➡️ Function to recuperate datasets

In [23]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof', 
                        train_file: str = 'data/extractions/new_data/train_set.csv', 
                        test_file: str = 'data/extractions/new_data/test_file.csv'):

  # Select the end-mark handling option
  if end_mark == 1:
    # Create the augmentations: keyboard noise plus cleaning for corpus_1 in training, cleaning only otherwise
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')
    
    else:

      raise ValueError(f'No end mark number {end_mark}')

    # Create the same augmentations, with the end-mark function appended
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py


In [24]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py
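
The `end_mark` branches in `recuperate_datasets` all rely on `functools.partial` to pre-bind keyword arguments. The sketch below shows the same dispatch as a lookup table. `add_end_mark_demo` is a hypothetical toy stand-in whose behavior is invented for the example; the real `add_end_mark` lives in `wolof_translate` and may behave differently. Option 1 is left out because that branch uses no end-mark function at all.

```python
from functools import partial


def add_end_mark_demo(sentence, end_mark_to_remove=None, replace=False):
    """Toy stand-in (NOT the library function): normalize the final mark."""
    if end_mark_to_remove and sentence.endswith(end_mark_to_remove):
        stripped = sentence[: -len(end_mark_to_remove)]
        # Either swap the removed mark for a period or just drop it.
        return stripped + "." if replace else stripped
    if sentence and sentence[-1] in ".!?":
        return sentence
    return sentence + "."


# Same keyword bindings as the if/elif chain in recuperate_datasets.
END_MARK_OPTIONS = {
    2: partial(add_end_mark_demo, end_mark_to_remove="!", replace=True),
    3: partial(add_end_mark_demo),
    4: partial(add_end_mark_demo, end_mark_to_remove="!"),
}


def get_end_mark_fn(end_mark):
    """Return the pre-configured function or raise like the original code."""
    try:
        return END_MARK_OPTIONS[end_mark]
    except KeyError:
        raise ValueError(f"No end mark number {end_mark}") from None


print(get_end_mark_fn(2)("Koo dem!"))
print(get_end_mark_fn(3)("Ci fii"))
```

Each `partial` object is called with a sentence only, which is what lets it sit in a transformation sequence next to single-argument functions.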

➡️ Training

In [33]:
# initialize the configurations
config = {
    'epochs': 50,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'wolof',
    'corpus_2': 'french',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.291121690756753,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2024,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': 1e-3,
    'weight_decay': 0.0,
    'char_p': 0.5275538662009825,
    'word_p': 0.8981250882159111,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 23, 43, 64, 84, 104],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4],
    'batch_size': None, 
    'warmup_init': False,
    'relative_step': False,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 't5_small_v5_wf',
    'new_model_dir': 't5_small_v5_wf',
    'continue': True, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/t5_small_wf',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v4.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'ad_sentences.csv',
    'version': 5,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
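
The six `boundaries` split sequence lengths into seven buckets, which is why `batch_sizes` has seven entries: shorter sequences are batched in larger groups. The sketch below shows one plausible mapping from a sequence length to its batch size. It assumes the convention where a length equal to a boundary falls into the next bucket; the internals of `SequenceLengthBatchSampler` are not shown in this notebook and may differ, and `batch_size_for` is a hypothetical helper.

```python
from bisect import bisect_right

# Values copied from the config above.
boundaries = [2, 23, 43, 64, 84, 104]
batch_sizes = [256, 128, 64, 32, 16, 8, 4]

# One more bucket than boundaries: below the first, between each pair, above the last.
assert len(batch_sizes) == len(boundaries) + 1


def batch_size_for(length):
    """Return the batch size of the bucket that `length` falls into.

    Assumption: bucket i holds lengths in [boundaries[i-1], boundaries[i]).
    """
    return batch_sizes[bisect_right(boundaries, length)]


for length in (1, 10, 50, 200):
    print(length, batch_size_for(length))
```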

In [34]:
%%writefile wolof-translate/wolof_translate/utils/hg_training.py
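from wolof_translate import *
import os
import warnings

# explicit import for the distributed calls below (the star import may already provide it)
import torch.distributed as dist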

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    config.update({"num_workers": 1, "pin_memory": True} if use_cuda else {})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None:
            logger.info(
                "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                    config['backend'], dist.get_world_size()
                )
                + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
            )
    # ---------------------------------------
    
    # split the data
    if config['include_split']:
        split_data(config['random_state'], config['data_directory'], config['data_file'])

    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # Initialize the model name
    model_name = 't5-small'

    # import the model with its pre-trained weights
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # resize the token embeddings
    model.resize_token_embeddings(len(tokenizer))
    
    # recuperate train and test set
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = model, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    hugging_face = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train the model for the epochs remaining after the loaded checkpoint
    trainer.train(config['epochs'] - trainer.current_epoch, auto_save = True,
                  log_step = config['log_step'], saving_directory = config['new_model_dir'],
                  save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'],
                  metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/hg_training.py
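
The distributed branch of `train` derives the world size and the rank from the list of hosts. The sketch below isolates that logic so it can be checked without starting a process group. `distributed_env` is a hypothetical helper, and the host names and backend in the second call are made-up examples.

```python
def distributed_env(hosts, current_host, backend):
    """Return the WORLD_SIZE and RANK values that train() would export.

    An empty dict means the run is not distributed (fewer than two hosts
    or no backend), which is the case for the config in this notebook.
    """
    is_distributed = len(hosts) > 1 and backend is not None
    if not is_distributed:
        return {}
    return {
        "WORLD_SIZE": str(len(hosts)),
        # The rank is the position of the current host in the host list.
        "RANK": str(hosts.index(current_host)),
    }


# Config used in this notebook: single machine, no backend.
print(distributed_env([], None, None))

# Hypothetical two-host setup.
print(distributed_env(["host-a", "host-b"], "host-b", "gloo"))
```

With `'hosts': []` and `'backend': None`, `is_distributed` is false, so `dist.init_process_group` is never called in this notebook.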


Train the model below and save it if needed.

In [35]:
from wolof_translate.utils.hg_training import train

In [32]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.64batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.75batches/s]



Metrics: {'train_loss': 3.025106288450267, 'test_loss': 3.0578086183528708, 'bleu': 1.330838383838384, 'gen_len': 19.964669191919192}




  4%|▍         | 1/25 [00:09<03:43,  9.31s/it]

For epoch 7: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.60batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.96batches/s]



Metrics: {'train_loss': 2.888327236344436, 'test_loss': 2.932071378736785, 'bleu': 1.3087666666666669, 'gen_len': 16.136386363636365}




  8%|▊         | 2/25 [00:19<03:49,  9.96s/it]

For epoch 8: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.56batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.95batches/s]



Metrics: {'train_loss': 2.7692004121759877, 'test_loss': 2.849424434430672, 'bleu': 1.4679595959595961, 'gen_len': 16.43939494949495}




 12%|█▏        | 3/25 [00:30<03:44, 10.19s/it]

For epoch 9: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.53batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.77batches/s]



Metrics: {'train_loss': 2.663334673526241, 'test_loss': 2.788634987792584, 'bleu': 1.5415909090909092, 'gen_len': 16.055548484848487}




 16%|█▌        | 4/25 [00:41<03:39, 10.45s/it]

For epoch 10: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.53batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:04<00:00,  1.74batches/s]



Metrics: {'train_loss': 2.5818608457117724, 'test_loss': 2.7095980692391444, 'bleu': 1.8785469696969699, 'gen_len': 13.808072222222222}




 20%|██        | 5/25 [00:51<03:32, 10.62s/it]

For epoch 11: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.60batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:04<00:00,  1.73batches/s]



Metrics: {'train_loss': 2.4917865036259266, 'test_loss': 2.6703582159196495, 'bleu': 2.21769898989899, 'gen_len': 15.121238383838381}




 24%|██▍       | 6/25 [01:02<03:23, 10.71s/it]

For epoch 12: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.59batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.82batches/s]



Metrics: {'train_loss': 2.4179942812970907, 'test_loss': 2.6327337351712314, 'bleu': 2.2543474747474748, 'gen_len': 16.212137373737374}




 28%|██▊       | 7/25 [01:13<03:12, 10.70s/it]

For epoch 13: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.58batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:02<00:00,  2.44batches/s]



Metrics: {'train_loss': 2.362779219643667, 'test_loss': 2.6032126094355728, 'bleu': 2.085411616161616, 'gen_len': 12.797957575757577}




 32%|███▏      | 8/25 [01:23<02:56, 10.40s/it]

For epoch 14: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.43batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.75batches/s]



Metrics: {'train_loss': 2.292300087677235, 'test_loss': 2.561257794649914, 'bleu': 2.3605954545454546, 'gen_len': 15.70708686868687}




 36%|███▌      | 9/25 [01:34<02:49, 10.58s/it]

For epoch 15: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.51batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.86batches/s]



Metrics: {'train_loss': 2.2410037018391695, 'test_loss': 2.5626361875823047, 'bleu': 2.6147292929292933, 'gen_len': 15.65151666666667}




 40%|████      | 10/25 [01:43<02:33, 10.23s/it]

For epoch 16: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.47batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.07batches/s]



Metrics: {'train_loss': 2.1965482313372076, 'test_loss': 2.527354256071226, 'bleu': 2.484096969696969, 'gen_len': 13.2525}




 44%|████▍     | 11/25 [01:54<02:23, 10.26s/it]

For epoch 17: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.45batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.93batches/s]



Metrics: {'train_loss': 2.1407281559226945, 'test_loss': 2.497628905556419, 'bleu': 2.9381095959595958, 'gen_len': 14.272710606060606}




 48%|████▊     | 12/25 [02:04<02:14, 10.36s/it]

For epoch 18: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.53batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.76batches/s]



Metrics: {'train_loss': 2.0784834201844107, 'test_loss': 2.497836967911384, 'bleu': 2.899290404040405, 'gen_len': 15.631311111111112}




 52%|█████▏    | 13/25 [02:14<02:01, 10.14s/it]

For epoch 19: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.49batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.89batches/s]



Metrics: {'train_loss': 2.05341728807325, 'test_loss': 2.481565406828216, 'bleu': 2.6675671717171716, 'gen_len': 16.57071616161616}




 56%|█████▌    | 14/25 [02:24<01:53, 10.28s/it]

For epoch 20: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.41batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:04<00:00,  1.68batches/s]



Metrics: {'train_loss': 2.002027263300544, 'test_loss': 2.461949396615077, 'bleu': 3.311483333333334, 'gen_len': 15.63637171717172}




 60%|██████    | 15/25 [02:36<01:45, 10.55s/it]

For epoch 21: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.41batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.78batches/s]



Metrics: {'train_loss': 1.9573933808280268, 'test_loss': 2.440742301218437, 'bleu': 3.6045080808080816, 'gen_len': 16.005034343434343}




 64%|██████▍   | 16/25 [02:47<01:36, 10.74s/it]

For epoch 22: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.43batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.82batches/s]



Metrics: {'train_loss': 1.9155979083354977, 'test_loss': 2.457561742175709, 'bleu': 4.930958080808081, 'gen_len': 16.479829292929296}




 68%|██████▊   | 17/25 [02:56<01:23, 10.40s/it]

For epoch 23: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:05<00:00,  6.38batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.87batches/s]



Metrics: {'train_loss': 1.8723649379952205, 'test_loss': 2.454600731531779, 'bleu': 4.282234343434344, 'gen_len': 15.005060101010102}




 72%|███████▏  | 18/25 [03:06<01:11, 10.16s/it]

For epoch 24: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.52batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.75batches/s]



Metrics: {'train_loss': 1.8316938165318433, 'test_loss': 2.439377421080464, 'bleu': 4.814183333333334, 'gen_len': 16.166674242424243}




 76%|███████▌  | 19/25 [03:17<01:02, 10.38s/it]

For epoch 25: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.48batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:04<00:00,  1.73batches/s]



Metrics: {'train_loss': 1.7985615124814047, 'test_loss': 2.456443261618566, 'bleu': 4.883823232323234, 'gen_len': 16.010118686868687}




 80%|████████  | 20/25 [03:27<00:51, 10.22s/it]

For epoch 26: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.44batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.79batches/s]



Metrics: {'train_loss': 1.7628219490334776, 'test_loss': 2.4467217958334717, 'bleu': 4.541628787878789, 'gen_len': 16.626272222222223}




 84%|████████▍ | 21/25 [03:36<00:40, 10.04s/it]

For epoch 27: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.44batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.14batches/s]



Metrics: {'train_loss': 1.7351601044638563, 'test_loss': 2.4418391073592987, 'bleu': 4.809811616161616, 'gen_len': 13.74244191919192}




 88%|████████▊ | 22/25 [03:45<00:29,  9.72s/it]

For epoch 28: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.43batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.84batches/s]



Metrics: {'train_loss': 1.6943547316249008, 'test_loss': 2.4684393875526665, 'bleu': 4.866994444444445, 'gen_len': 15.510115151515153}




 92%|█████████▏| 23/25 [03:55<00:19,  9.65s/it]

For epoch 29: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.45batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.76batches/s]



Metrics: {'train_loss': 1.6575911842065094, 'test_loss': 2.433619892958439, 'bleu': 5.274884848484849, 'gen_len': 15.570673232323234}




 96%|█████████▌| 24/25 [04:06<00:10, 10.04s/it]

For epoch 30: 


Train batch number 2:   0%|          | 0/32 [00:00<?, ?batches/s]



Train batch number 33: 100%|██████████| 32/32 [00:04<00:00,  6.44batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.98batches/s]



Metrics: {'train_loss': 1.6241250370666247, 'test_loss': 2.4374969884602713, 'bleu': 4.865085353535354, 'gen_len': 15.782837878787882}




100%|██████████| 25/25 [04:15<00:00, 10.22s/it]
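
The per-epoch `Metrics:` lines printed above can be collected and compared instead of being read by eye. The following is only a minimal sketch, assuming the log text has been captured as a string; `parse_metrics` is a hypothetical helper (it is not part of `wolof_translate`), and the sample values are rounded from epochs 28 to 30 of the log above.

```python
import ast

# Sample log lines (values rounded from the output above).
log_text = """
For epoch 28:
Metrics: {'train_loss': 1.6944, 'test_loss': 2.4684, 'bleu': 4.8670, 'gen_len': 15.5101}
For epoch 29:
Metrics: {'train_loss': 1.6576, 'test_loss': 2.4336, 'bleu': 5.2749, 'gen_len': 15.5707}
For epoch 30:
Metrics: {'train_loss': 1.6241, 'test_loss': 2.4375, 'bleu': 4.8651, 'gen_len': 15.7828}
"""


def parse_metrics(text):
    """Hypothetical helper: return a list of (epoch, metrics_dict) pairs."""
    results, epoch = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("For epoch"):
            # "For epoch 29:" -> 29
            epoch = int(line.split()[2].rstrip(":"))
        elif line.startswith("Metrics:"):
            # The dictionary is printed as a Python literal, so it can be
            # parsed safely with ast.literal_eval.
            metrics = ast.literal_eval(line[len("Metrics:"):].strip())
            results.append((epoch, metrics))
    return results


history = parse_metrics(log_text)
best_bleu = max(history, key=lambda item: item[1]["bleu"])
lowest_test_loss = min(history, key=lambda item: item[1]["test_loss"])
print("Best BLEU at epoch", best_bleu[0])
print("Lowest test loss at epoch", lowest_test_loss[0])
```

Comparing `train_loss` and `test_loss` this way also makes it easier to see when the test loss stops improving while the training loss keeps decreasing.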


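The `huggingface/tokenizers` message in the output above suggests its own remedy: set the environment variable `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it is run in the first cell of the notebook, before the tokenizer is used and before any data-loading worker process is forked:

```python
import os

# "false" turns off the tokenizers library's internal parallelism, which is
# what the message asks to be decided explicitly. Set it before the tokenizer
# is first used (assumption: at the very top of the notebook).
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting the value to `"true"` is the other option named by the message; `"false"` is the conservative choice when data loaders fork worker processes.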
In [36]:
# Optional: wrap the call below in `warnings.catch_warnings()` with
# `warnings.simplefilter("ignore")` to suppress Python warnings.
trainer = train(config)

# save if necessary

  0%|          | 0/20 [00:00<?, ?it/s]

For epoch 31: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.74batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.99batches/s]



Metrics: {'train_loss': 1.6031698549495028, 'test_loss': 2.48018310286782, 'bleu': 5.082866161616162, 'gen_len': 14.929275757575759}




  5%|▌         | 1/20 [00:09<02:54,  9.20s/it]

For epoch 32: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.66batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.83batches/s]



Metrics: {'train_loss': 1.56718063592006, 'test_loss': 2.4559369737451733, 'bleu': 5.706334848484849, 'gen_len': 15.72220505050505}




 10%|█         | 2/20 [00:18<02:50,  9.47s/it]

For epoch 33: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.62batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.79batches/s]



Metrics: {'train_loss': 1.5363509753774343, 'test_loss': 2.462271097934608, 'bleu': 4.999496464646464, 'gen_len': 16.78284292929293}




 15%|█▌        | 3/20 [00:28<02:42,  9.58s/it]

For epoch 34: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.68batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.76batches/s]



Metrics: {'train_loss': 1.4993085373171824, 'test_loss': 2.4647206417237872, 'bleu': 5.2097772727272735, 'gen_len': 16.76260101010101}




 20%|██        | 4/20 [00:38<02:33,  9.62s/it]

For epoch 35: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.65batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.18batches/s]



Metrics: {'train_loss': 1.4714344369505565, 'test_loss': 2.52563930039454, 'bleu': 5.206471717171718, 'gen_len': 13.84849494949495}




 25%|██▌       | 5/20 [00:47<02:20,  9.37s/it]

For epoch 36: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.63batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.97batches/s]



Metrics: {'train_loss': 1.4317455306828286, 'test_loss': 2.4958050130593654, 'bleu': 4.841671717171717, 'gen_len': 14.994919191919193}




 30%|███       | 6/20 [00:56<02:11,  9.39s/it]

For epoch 37: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.71batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:04<00:00,  1.71batches/s]



Metrics: {'train_loss': 1.4115574834544322, 'test_loss': 2.4901456832885747, 'bleu': 5.709369696969697, 'gen_len': 16.217164646464646}




 35%|███▌      | 7/20 [01:06<02:04,  9.55s/it]

For epoch 38: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.58batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.84batches/s]



Metrics: {'train_loss': 1.3803954382134265, 'test_loss': 2.5174373015008786, 'bleu': 5.082258080808081, 'gen_len': 14.899022222222225}




 40%|████      | 8/20 [01:16<01:55,  9.59s/it]

For epoch 39: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.64batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.33batches/s]



Metrics: {'train_loss': 1.3583325295603628, 'test_loss': 2.531702978442414, 'bleu': 5.318264646464646, 'gen_len': 12.757601515151515}




 45%|████▌     | 9/20 [01:24<01:42,  9.35s/it]

For epoch 40: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.52batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.91batches/s]



Metrics: {'train_loss': 1.3195869691721784, 'test_loss': 2.5422805367094097, 'bleu': 5.3666823232323235, 'gen_len': 15.166634848484849}




 50%|█████     | 10/20 [01:34<01:34,  9.43s/it]

For epoch 41: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.67batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.86batches/s]



Metrics: {'train_loss': 1.297868958158934, 'test_loss': 2.5708903038140503, 'bleu': 4.87779494949495, 'gen_len': 15.833323232323233}




 55%|█████▌    | 11/20 [01:44<01:25,  9.45s/it]

For epoch 42: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.62batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.88batches/s]



Metrics: {'train_loss': 1.2691234390449861, 'test_loss': 2.5951932126825508, 'bleu': 5.3977151515151505, 'gen_len': 15.232341414141414}




 60%|██████    | 12/20 [01:53<01:15,  9.48s/it]

For epoch 43: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.54batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.87batches/s]



Metrics: {'train_loss': 1.2421140144853033, 'test_loss': 2.590024545939282, 'bleu': 6.711281818181818, 'gen_len': 13.772703535353537}




 65%|██████▌   | 13/20 [02:03<01:06,  9.52s/it]

For epoch 44: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:04<00:00,  6.61batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.78batches/s]



Metrics: {'train_loss': 1.2233287424018762, 'test_loss': 2.5700962663900975, 'bleu': 5.34880303030303, 'gen_len': 16.36870303030303}




 70%|███████   | 14/20 [02:12<00:57,  9.59s/it]

For epoch 45: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.53batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.81batches/s]



Metrics: {'train_loss': 1.2071754128061318, 'test_loss': 2.6190930038991604, 'bleu': 5.325007070707071, 'gen_len': 15.06563181818182}




 75%|███████▌  | 15/20 [02:22<00:48,  9.62s/it]

For epoch 46: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.56batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.79batches/s]



Metrics: {'train_loss': 1.167711973529613, 'test_loss': 2.5440620123737996, 'bleu': 6.071860606060607, 'gen_len': 15.292952525252526}




 80%|████████  | 16/20 [02:32<00:38,  9.66s/it]

For epoch 47: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.55batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.83batches/s]



Metrics: {'train_loss': 1.151745243541128, 'test_loss': 2.610427420548719, 'bleu': 6.87779696969697, 'gen_len': 14.500006060606061}




 85%|████████▌ | 17/20 [02:42<00:28,  9.65s/it]

For epoch 48: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.51batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.76batches/s]



Metrics: {'train_loss': 1.1192170883152472, 'test_loss': 2.5948631040977714, 'bleu': 5.628589393939393, 'gen_len': 16.36867777777778}




 90%|█████████ | 18/20 [02:51<00:19,  9.70s/it]

For epoch 49: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.54batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  2.09batches/s]



Metrics: {'train_loss': 1.0961309396564627, 'test_loss': 2.650956584949686, 'bleu': 7.354377777777779, 'gen_len': 14.343432323232324}




 95%|█████████▌| 19/20 [03:01<00:09,  9.55s/it]

For epoch 50: 


Train batch number 2:   0%|          | 0/33 [00:00<?, ?batches/s]



Train batch number 34: 100%|██████████| 33/33 [00:05<00:00,  6.53batches/s]
Test batch number 2:   0%|          | 0/7 [00:00<?, ?batches/s]



Test batch number 8: 100%|██████████| 7/7 [00:03<00:00,  1.94batches/s]



Metrics: {'train_loss': 1.0732805997670491, 'test_loss': 2.6258159791580358, 'bleu': 6.835085353535354, 'gen_len': 14.914118686868687}




100%|██████████| 20/20 [03:10<00:00,  9.53s/it]
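
The log above shows the train loss decreasing steadily while the test loss and the BLEU score fluctuate from one epoch to the next. As a minimal sketch for reading such a log, the snippet below copies the test metrics of epochs 45 to 50 (rounded to four decimals) into a dictionary and looks up the epoch with the highest BLEU and the one with the lowest test loss. The `epoch_metrics` dictionary is a name introduced here for illustration only; it is not produced by the trainer, and nothing here says which checkpoint the trainer actually keeps.

```python
# Test metrics copied by hand from the training log above (rounded to 4 decimals).
# `epoch_metrics` is a hypothetical helper structure, not an object returned by the trainer.
epoch_metrics = {
    45: {"test_loss": 2.6191, "bleu": 5.3250},
    46: {"test_loss": 2.5441, "bleu": 6.0719},
    47: {"test_loss": 2.6104, "bleu": 6.8778},
    48: {"test_loss": 2.5949, "bleu": 5.6286},
    49: {"test_loss": 2.6510, "bleu": 7.3544},
    50: {"test_loss": 2.6258, "bleu": 6.8351},
}

# epoch with the highest BLEU score
best_bleu_epoch = max(epoch_metrics, key=lambda epoch: epoch_metrics[epoch]["bleu"])

# epoch with the lowest test loss
lowest_loss_epoch = min(epoch_metrics, key=lambda epoch: epoch_metrics[epoch]["test_loss"])

print(f"Highest BLEU: epoch {best_bleu_epoch} ({epoch_metrics[best_bleu_epoch]['bleu']})")
print(f"Lowest test loss: epoch {lowest_loss_epoch} ({epoch_metrics[lowest_loss_epoch]['test_loss']})")
```

The two criteria do not agree here (epoch 49 for BLEU, epoch 46 for the test loss), so the choice of the monitored metric matters when deciding which checkpoint to keep.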


➡️ Predictions


In [15]:
if trainer is not None:

    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])

    # build the transformation sequence applied to both corpora of the test set
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)

    # load the test set (without truncation)
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                   tokenizer = tokenizer,
                                   cp1_transformer = augmentation,
                                   cp2_transformer = augmentation,
                                   corpus_1 = config['corpus_1'],
                                   corpus_2 = config['corpus_2'],
                                   truncation = False)

    # initialize the bucket sampler for the test set
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                              boundaries = boundaries,
                                              batch_sizes = batch_sizes)

    # arguments passed to the test data loader
    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # evaluate on the test set: we get the metrics and the predictions
    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 2:   0%|          | 0/6 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Evaluation batch number 7: 100%|██████████| 6/6 [00:02<00:00,  2.03batches/s]
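
The `huggingface/tokenizers` warning in the output above is emitted when the data loader forks worker processes after the tokenizer has already been used. As the message itself suggests, it can be avoided by setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch, assuming it is placed in the first cell of the notebook, before any tokenizer is used:

```python
import os

# Set the variable before any tokenizer is used and before the data loader
# forks its workers; "false" disables the tokenizer's internal parallelism.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting the variable to `"false"` trades the tokenizer's internal parallelism for a clean output; since parallelism gets disabled after the fork anyway, this mainly removes the repeated message.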


In [16]:
metrics

{'test_loss': 2.876970546414154,
 'bleu': 8.537555050505052,
 'gen_len': 7.237385353535354}

In [17]:
prediction

Unnamed: 0,original_sentences,translations,predictions
0,L'homme est parti je crois!,Ma defe góor gi dem na!,Góor gi!
1,Tu as vu l'homme?,Gis ŋga nit ki?,Gis ŋga nit ki woon?
2,Tu vois ces femmes-là?,Gis ŋga jigéen ñooñii?,Gis ŋga xale bii?
3,Sois un être de raison!,Dil nit!,Saal!
4,Je veux que l'homme vienne.,Bëgg naa góor gi ñëw.,Degguma.
...,...,...,...
193,"Ceux-là, cependant, sont des cases. Celle qui ...","Waaw lii nag ay néegi ñax la, néegi ñax bi ci ...","Lii, ag kër la, kër gu yàqu. Boo ko gisee xam..."
194,On voit sur la photo beaucoup de personnes sor...,Ñu gis ci nataal bi ay nit ñu bari ñu génn ci ...,"Waxu ñu ngi ci ag kër la, kër gu yàqu."
195,"Ceux-là, aussi, sont des gendarmes. Ils siègen...",Ñii moom tamit ay takk-der nañ. Ñi ñi ngi bàyy...,Lii nag ëe kaas la ak palaat. Nu ciy xelli ka...
196,"Ceci, cependant, on a l'habitude de faire les ...",Lii nag dañ ciy faral di def ndugg maanaam jig...,nii nag mu mel ni benn bërëb jullikaay wala.....
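
To get a quick feeling for the predictions beyond the BLEU score, we can compare each prediction with its reference translation. The sketch below is only an illustration: it copies the five complete rows shown at the top of the table above, counts the exact matches, and computes the ratio between the number of whitespace-separated tokens in the prediction and in the reference. The `rows` list and the `length_ratio` function are names introduced here, not part of `wolof_translate`.

```python
# (reference translation, model prediction) pairs copied from rows 0 to 4 of the table above.
# `rows` and `length_ratio` are hypothetical helpers used only for this illustration.
rows = [
    ("Ma defe góor gi dem na!", "Góor gi!"),
    ("Gis ŋga nit ki?", "Gis ŋga nit ki woon?"),
    ("Gis ŋga jigéen ñooñii?", "Gis ŋga xale bii?"),
    ("Dil nit!", "Saal!"),
    ("Bëgg naa góor gi ñëw.", "Degguma."),
]


def length_ratio(reference: str, prediction: str) -> float:
    """Number of whitespace tokens in the prediction divided by that of the reference."""
    return len(prediction.split()) / len(reference.split())


# number of predictions identical to their reference
exact_matches = sum(reference == prediction for reference, prediction in rows)

# length ratio of each pair and its mean over the sample
ratios = [length_ratio(reference, prediction) for reference, prediction in rows]
mean_ratio = sum(ratios) / len(ratios)

# number of predictions shorter than their reference
shorter = sum(ratio < 1 for ratio in ratios)

print(f"Exact matches: {exact_matches}/{len(rows)}")
print(f"Mean length ratio: {mean_ratio:.2f}")
print(f"Predictions shorter than the reference: {shorter}/{len(rows)}")
```

On the full `prediction` table the same comparison could be made over the `translations` and `predictions` columns; five rows are far too few to draw any conclusion about the model.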
