Custom Transformer Training
-------------------------------

In this notebook we train the custom transformer on multiple GPUs, when they are available, within a single machine. In [multiple](_custom_transformer_train_multiple.ipynb), we use SageMaker to distribute the training of the model over multiple instances.

We will follow these steps:

- Load the libraries
- Create a function to retrieve the datasets (arguments: char_p, word_p, max_len, end_mark, tokenizer, corpus_1, corpus_2, train_file, test_file)
- Train the model, which is saved automatically (argument: the config dictionary initialized beforehand)
- Make predictions

--------------------------

#### French-Wolof v6

➡️ Import the libraries.

In [5]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0

➡️ Function to retrieve the datasets

In [6]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

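def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof',
                        train_file: str = 'data/extractions/new_data/train_set.csv',
                        test_file: str = 'data/extractions/new_data/test_file.csv'):
  """Build the augmented training dataset and the validation dataset.

  char_p, word_p and max_len are forwarded to nac.KeyboardAug as aug_char_p,
  aug_word_p and aug_word_max. end_mark (1 to 4) selects how add_end_mark is
  applied; 1 skips it. Returns (train_dataset_aug, valid_dataset).
  """

  # Select the end-mark handling according to the end_mark option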
  if end_mark == 1:
    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')
    
    else:  
        
        raise ValueError(f'No end mark number {end_mark}')

    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py


In [7]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py
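The `end_mark` branches in `recuperate_datasets` amount to a lookup from an integer option to a pre-configured callable. The sketch below isolates that pattern under stated assumptions: `build_end_mark_fn`, `END_MARK_KWARGS` and `echo_fn` are hypothetical names written for this example, and `echo_fn` only echoes its arguments, so it does not reproduce what the real `add_end_mark` does. Only the keyword names `end_mark_to_remove` and `replace` come from the cell above.

```python
from functools import partial

# Keyword arguments bound for each end_mark option (taken from the cell above).
END_MARK_KWARGS = {
    2: {"end_mark_to_remove": "!", "replace": True},
    3: {},
    4: {"end_mark_to_remove": "!"},
}

def build_end_mark_fn(end_mark, base_fn):
    """Return base_fn with the keyword arguments of the chosen option bound."""
    if end_mark == 1:
        # Option 1 adds no end-mark step to the augmentation pipeline.
        return None
    if end_mark not in END_MARK_KWARGS:
        raise ValueError(f"No end mark number {end_mark}")
    return partial(base_fn, **END_MARK_KWARGS[end_mark])

def echo_fn(sentence, **kwargs):
    # Hypothetical stand-in: returns its inputs so the bound options are visible.
    return sentence, kwargs

print(build_end_mark_fn(2, echo_fn)("Bonjour"))
print(build_end_mark_fn(3, echo_fn)("Bonjour"))
```

Keeping the options in a dictionary would make it easier to add a fifth variant without another `elif` branch; whether that is worth changing in the package is a design choice left open here.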

➡️ Training

In [8]:
# initialize the configurations
config = {
    'epochs': 30,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'french',
    'corpus_2': 'wolof',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.2419294660308021,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2086,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': None,
    'weight_decay': 0.0,
    'char_p': 0.3527965684636239,
    'word_p': 0.037437213564754435,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 31, 59, 87, 115, 143, 171],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4, 2],
    'batch_size': None, 
    'warmup_init': True,
    'relative_step': True,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 'custom_transformer_v6_fw_best',
    'new_model_dir': 'custom_transformer_v6_fw',
    'continue': False, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/custom_transformer_fw',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v5.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'corpora_v6.csv',
    'version': 6,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
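`boundaries` holds 7 sequence-length cut points and `batch_sizes` holds 8 sizes, one per resulting length bucket, so longer sequences are grouped into smaller batches. The sketch below shows one plausible way such a lookup can work. `batch_size_for_length` is a hypothetical helper, not part of `wolof_translate`, and whether `SequenceLengthBatchSampler` treats a boundary as inclusive or exclusive is an assumption made here.

```python
from bisect import bisect_right

# Values copied from the config cell above.
boundaries = [2, 31, 59, 87, 115, 143, 171]
batch_sizes = [256, 128, 64, 32, 16, 8, 4, 2]

def batch_size_for_length(length, boundaries, batch_sizes):
    """Return the batch size of the bucket that a sequence length falls into."""
    if len(batch_sizes) != len(boundaries) + 1:
        raise ValueError("expected one more batch size than boundaries")
    # bisect_right counts the boundaries <= length, which gives the bucket index
    # (assumption: a length equal to a boundary belongs to the next bucket).
    return batch_sizes[bisect_right(boundaries, length)]

for length in (1, 20, 31, 100, 200):
    print(length, batch_size_for_length(length, boundaries, batch_sizes))
```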

In [9]:
%%writefile wolof-translate/wolof_translate/utils/training.py
from wolof_translate import *
import warnings

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    config.update({"num_workers": 1, "pin_memory": True} if use_cuda else {})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None: logger.info(
            "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                config['backend'], dist.get_world_size()
            )
            + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
        )
    # ---------------------------------------
    
    # split the data
    if config['include_split']: split_data(config['random_state'], config['data_directory'], config['data_file'])

    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate train and test set
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = Transformer, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    # initialize the encoder and the decoder layers
    encoder_layer = nn.TransformerEncoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    decoder_layer = nn.TransformerDecoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    # let us initialize the encoder and the decoder
    encoder = nn.TransformerEncoder(encoder_layer, config['n_encoders'])

    decoder = nn.TransformerDecoder(decoder_layer, config['n_decoders'])

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the transformer parameters
    model_args = {
        'vocab_size': len(tokenizer),
        'encoder': encoder,
        'decoder': decoder,
        'class_criterion': nn.CrossEntropyLoss(label_smoothing = config['label_smoothing']),
        'max_len': config['max_len']
    }

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    model_kwargs = model_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train the model
    trainer.train(config['epochs'] - trainer.current_epoch, auto_save = True, log_step = config['log_step'], saving_directory=config['new_model_dir'], save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'], metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/training.py
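The distributed branch of `train` derives its settings from `config['hosts']`, `config['current_host']` and `config['backend']`: the world size is the number of hosts and the rank is the position of the current host in that list. The following is a minimal sketch of that bookkeeping only, without calling `dist.init_process_group`. `resolve_distributed` and the host names are hypothetical and serve purely as illustration.

```python
def resolve_distributed(hosts, current_host, backend):
    """Mirror the is_distributed / world size / rank logic used in train()."""
    is_distributed = len(hosts) > 1 and backend is not None
    if not is_distributed:
        # Single-machine case: one process group member with rank 0.
        return {"is_distributed": False, "world_size": 1, "rank": 0}
    return {
        "is_distributed": True,
        "world_size": len(hosts),
        "rank": hosts.index(current_host),
    }

# With the config of this notebook (no hosts, no backend) nothing is distributed.
print(resolve_distributed([], None, None))

# Hypothetical two-host setup.
print(resolve_distributed(["host-a", "host-b"], "host-b", "gloo"))
```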


Below, we train the model and save it if needed.

In [10]:
from wolof_translate.utils.training import train
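Before launching the run, the optimizer options in `config` can be sanity-checked. The sketch below assumes that `Adafactor` is the Hugging Face `transformers` implementation, which rejects a manual learning rate combined with `relative_step=True` and requires `relative_step=True` whenever `warmup_init=True`. `check_adafactor_options` is a hypothetical helper that only inspects the values; it does not import or instantiate the optimizer.

```python
def check_adafactor_options(learning_rate, relative_step, warmup_init):
    """Return a list of problems found in the Adafactor-related options."""
    problems = []
    if learning_rate is not None and relative_step:
        problems.append("a manual learning_rate cannot be combined with relative_step=True")
    if warmup_init and not relative_step:
        problems.append("warmup_init=True requires relative_step=True")
    return problems

# Values used in the config cell of this notebook.
print(check_adafactor_options(learning_rate=None, relative_step=True, warmup_init=True))

# A combination that would be rejected under the assumption above.
print(check_adafactor_options(learning_rate=1e-3, relative_step=True, warmup_init=True))
```

With the values from the config cell (`learning_rate` set to `None`, `relative_step` and `warmup_init` both `True`) the check returns an empty list.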

In [11]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 


Train batch number 2:   0%|          | 0/44 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.45batches/s]
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
  return torch._native_multi_head_attention(


Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.26s/batches]



Metrics: {'train_loss': 7.934772714340482, 'test_loss': 7.833881711626386, 'accuracy': 0.06772622377622378, 'bleu': 0.14415874125874129, 'gen_len': 34.04895104895105}




  4%|▍         | 1/25 [00:35<14:16, 35.70s/it]

For epoch 7: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.64batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.26s/batches]



Metrics: {'train_loss': 7.760191181835046, 'test_loss': 7.694770501210139, 'accuracy': 0.06672902097902098, 'bleu': 0.1445832167832168, 'gen_len': 34.04895104895105}




  8%|▊         | 2/25 [01:10<13:31, 35.28s/it]

For epoch 8: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.57batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.28s/batches]



Metrics: {'train_loss': 7.61706943328484, 'test_loss': 7.552814035148888, 'accuracy': 0.06966188811188809, 'bleu': 0.14514405594405597, 'gen_len': 34.04895104895105}




 12%|█▏        | 3/25 [01:46<12:57, 35.35s/it]

For epoch 9: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.56batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.28s/batches]



Metrics: {'train_loss': 7.46579562997422, 'test_loss': 7.413245406184163, 'accuracy': 0.0685716783216783, 'bleu': 0.1469937062937063, 'gen_len': 34.04895104895105}




 16%|█▌        | 4/25 [02:21<12:23, 35.41s/it]

For epoch 10: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.54batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.30s/batches]



Metrics: {'train_loss': 7.302814248038733, 'test_loss': 7.2720197864345755, 'accuracy': 0.07874020979020978, 'bleu': 0.1592762237762238, 'gen_len': 34.04895104895105}




 20%|██        | 5/25 [02:57<11:50, 35.52s/it]

For epoch 11: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.50batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.30s/batches]



Metrics: {'train_loss': 7.131047444412374, 'test_loss': 7.171542259363027, 'accuracy': 0.08227237762237762, 'bleu': 0.16601888111888116, 'gen_len': 34.04895104895105}




 24%|██▍       | 6/25 [03:33<11:17, 35.66s/it]

For epoch 12: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.55batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.27s/batches]



Metrics: {'train_loss': 6.932831786113584, 'test_loss': 7.105062326351245, 'accuracy': 0.08631433566433566, 'bleu': 0.17152167832167833, 'gen_len': 34.04895104895105}




 28%|██▊       | 7/25 [04:08<10:40, 35.58s/it]

For epoch 13: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.51batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.28s/batches]



Metrics: {'train_loss': 6.723532695878681, 'test_loss': 7.050896767969731, 'accuracy': 0.08469195804195803, 'bleu': 0.1650629370629371, 'gen_len': 34.04895104895105}




 32%|███▏      | 8/25 [04:44<10:05, 35.61s/it]

For epoch 14: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.57batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.29s/batches]



Metrics: {'train_loss': 6.55946037182393, 'test_loss': 6.996962205513373, 'accuracy': 0.0912590909090909, 'bleu': 0.18054685314685315, 'gen_len': 34.04895104895105}




 36%|███▌      | 9/25 [05:19<09:29, 35.58s/it]

For epoch 15: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.53batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.29s/batches]



Metrics: {'train_loss': 6.410641746120682, 'test_loss': 6.931248434773692, 'accuracy': 0.1069444055944056, 'bleu': 0.2089811188811189, 'gen_len': 34.04895104895105}




 40%|████      | 10/25 [05:55<08:54, 35.64s/it]

For epoch 16: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.60batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.28s/batches]



Metrics: {'train_loss': 6.304844635878788, 'test_loss': 6.900436131270615, 'accuracy': 0.11669265734265735, 'bleu': 0.2242853146853147, 'gen_len': 34.04895104895105}




 44%|████▍     | 11/25 [06:31<08:17, 35.56s/it]

For epoch 17: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.60batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.31s/batches]



Metrics: {'train_loss': 6.221980996042241, 'test_loss': 6.931699144256699, 'accuracy': 0.11473531468531467, 'bleu': 0.15758741258741257, 'gen_len': 34.04895104895105}




 48%|████▊     | 12/25 [07:04<07:35, 35.06s/it]

For epoch 18: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.58batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.28s/batches]



Metrics: {'train_loss': 6.145892856524507, 'test_loss': 6.938010539208258, 'accuracy': 0.09204650349650349, 'bleu': 0.16453286713286713, 'gen_len': 34.04895104895105}




 52%|█████▏    | 13/25 [07:38<06:56, 34.67s/it]

For epoch 19: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.59batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.22s/batches]



Metrics: {'train_loss': 6.088921090323276, 'test_loss': 6.951084475417236, 'accuracy': 0.0859660839160839, 'bleu': 0.17963146853146858, 'gen_len': 34.04895104895105}




 56%|█████▌    | 14/25 [08:11<06:16, 34.23s/it]

For epoch 20: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.63batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.19s/batches]



Metrics: {'train_loss': 6.036534367975419, 'test_loss': 6.917959348305121, 'accuracy': 0.08607517482517482, 'bleu': 0.18342657342657342, 'gen_len': 34.04895104895105}




 60%|██████    | 15/25 [08:44<05:37, 33.79s/it]

For epoch 21: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.67batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.20s/batches]



Metrics: {'train_loss': 5.987669150382086, 'test_loss': 6.926544929717804, 'accuracy': 0.08562377622377623, 'bleu': 0.218943006993007, 'gen_len': 34.04895104895105}




 64%|██████▍   | 16/25 [09:17<05:01, 33.48s/it]

For epoch 22: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.70batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.19s/batches]



Metrics: {'train_loss': 5.940447477466633, 'test_loss': 6.961722484001748, 'accuracy': 0.07660174825174824, 'bleu': 0.19005734265734267, 'gen_len': 34.04895104895105}




 68%|██████▊   | 17/25 [09:49<04:25, 33.19s/it]

For epoch 23: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.62batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.20s/batches]



Metrics: {'train_loss': 5.8911104306520015, 'test_loss': 6.929494771090421, 'accuracy': 0.06923216783216785, 'bleu': 0.19322027972027975, 'gen_len': 34.04895104895105}




 72%|███████▏  | 18/25 [10:22<03:51, 33.11s/it]

For epoch 24: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.63batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.20s/batches]



Metrics: {'train_loss': 5.84470271230213, 'test_loss': 6.902182347290998, 'accuracy': 0.0744006993006993, 'bleu': 0.2032332167832168, 'gen_len': 34.04895104895105}




 76%|███████▌  | 19/25 [10:55<03:18, 33.04s/it]

For epoch 25: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.61batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.19s/batches]



Metrics: {'train_loss': 5.798899612001417, 'test_loss': 6.878334332179357, 'accuracy': 0.0780034965034965, 'bleu': 0.1954346153846154, 'gen_len': 34.04895104895105}




 80%|████████  | 20/25 [11:30<02:47, 33.49s/it]

For epoch 26: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.68batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.22s/batches]



Metrics: {'train_loss': 5.749149788296477, 'test_loss': 6.976302328643264, 'accuracy': 0.06902132867132868, 'bleu': 0.16852727272727275, 'gen_len': 34.04895104895105}




 84%|████████▍ | 21/25 [12:03<02:13, 33.32s/it]

For epoch 27: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.70batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.21s/batches]



Metrics: {'train_loss': 5.715225611496139, 'test_loss': 6.953042335443564, 'accuracy': 0.06098496503496504, 'bleu': 0.18417237762237762, 'gen_len': 34.04895104895105}




 88%|████████▊ | 22/25 [12:35<01:39, 33.15s/it]

For epoch 28: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.68batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:19<00:00,  2.22s/batches]



Metrics: {'train_loss': 5.670965863358364, 'test_loss': 6.90616479453507, 'accuracy': 0.06299230769230771, 'bleu': 0.19109615384615383, 'gen_len': 34.04895104895105}




 92%|█████████▏| 23/25 [13:08<01:06, 33.07s/it]

For epoch 29: 




Train batch number 45: 100%|██████████| 44/44 [00:11<00:00,  3.67batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.23s/batches]



Metrics: {'train_loss': 5.626204092428576, 'test_loss': 6.971113051567878, 'accuracy': 0.06677622377622379, 'bleu': 0.16144090909090905, 'gen_len': 34.04895104895105}




 96%|█████████▌| 24/25 [13:41<00:33, 33.06s/it]

For epoch 30: 




Train batch number 45: 100%|██████████| 44/44 [00:12<00:00,  3.55batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:20<00:00,  2.25s/batches]



Metrics: {'train_loss': 5.57767354321365, 'test_loss': 6.944304607964896, 'accuracy': 0.06408951048951048, 'bleu': 0.21971853146853149, 'gen_len': 34.04895104895105}




100%|██████████| 25/25 [14:15<00:00, 34.22s/it]
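
The training configuration selects the best checkpoint with `metric_for_best_model = 'test_loss'` and `metric_objective = 'minimize'`. The sketch below illustrates that selection rule on four `Metrics:` lines copied from the log above (trimmed to three keys for brevity). The helper names `parse_metrics` and `best_epoch` are hypothetical and are not part of `wolof_translate`; the trainer's own bookkeeping may differ.

```python
import ast


def parse_metrics(line: str) -> dict:
    """Turn a 'Metrics: {...}' log line into a dictionary."""
    _, _, payload = line.partition("Metrics:")
    return ast.literal_eval(payload.strip())


def best_epoch(metrics_by_epoch: dict, metric: str = "test_loss",
               objective: str = "minimize") -> int:
    """Return the epoch whose metric is best under the given objective."""
    pick = min if objective == "minimize" else max
    return pick(metrics_by_epoch, key=lambda epoch: metrics_by_epoch[epoch][metric])


# Values copied from the log above; the epoch numbers come from the "For epoch N:" headers.
logged = {
    24: "Metrics: {'train_loss': 5.84470271230213, 'test_loss': 6.902182347290998, 'bleu': 0.2032332167832168}",
    25: "Metrics: {'train_loss': 5.798899612001417, 'test_loss': 6.878334332179357, 'bleu': 0.1954346153846154}",
    26: "Metrics: {'train_loss': 5.749149788296477, 'test_loss': 6.976302328643264, 'bleu': 0.16852727272727275}",
    30: "Metrics: {'train_loss': 5.57767354321365, 'test_loss': 6.944304607964896, 'bleu': 0.21971853146853149}",
}

metrics_by_epoch = {epoch: parse_metrics(line) for epoch, line in logged.items()}

print("best by test_loss:", best_epoch(metrics_by_epoch))
print("best by bleu:", best_epoch(metrics_by_epoch, "bleu", "maximize"))
```

Among these four epochs the two criteria disagree: the training loss keeps falling while the test loss stays near 6.9, so the lowest test loss and the highest BLEU score occur at different epochs.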


➡️ Predictions


In [10]:
if trainer is not None:
    
    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # load the test dataset
    # initialize the transformation sequence
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)


    # let us get the test set
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                            tokenizer = tokenizer,
                                            cp1_transformer = augmentation,
                                            cp2_transformer = augmentation,
                                            corpus_1=config['corpus_1'],
                                            corpus_2=config['corpus_2'],
                                            truncation = False)

    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                            'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 8: 100%|██████████| 7/7 [00:39<00:00,  5.66s/batches]


In [11]:
metrics

{'test_loss': 6.96613918031965,
 'accuracy': 0.06841428571428572,
 'bleu': 0.28232857142857143,
 'gen_len': 82.14285714285714}

In [12]:
prediction

Unnamed: 0,original_sentences,translations,predictions
0,L'homme t'avait vu.,Góor gi gisóon na la.,Gis Gis naa na na.....
1,Tu te rappelles son amour?,Gis ŋga coroom la woon?,Gis?????????
2,La nuit se passe bien.,Guddi gaangi fi rek.,Gis Gis naa na na.....
3,Cela simplement!,Loolu doŋŋ!,Gis Gis Gis la la la!!!!
4,Où le mets-tu?,Foo kay def?,Gis?????????
...,...,...,...
281,"Ce n'est que longtemps après, quand l'égoïsme ...","Teg nañ ciy ati-at ma door a jëli ni jigéen, n...","Gis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
282,"J'ai ressenti de l'étonnement, et même de l'in...","Li wóor te wér moo di ne bi loolu lépp weesoo,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
283,À quel point les arbres aux troncs rectilignes...,"Dàtti garab yaa ngi lunk, sànneeku jëm ca kow,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
284,Je peux ressentir l'émotion qu'il éprouve à tr...,"Li koy yëngal noonu, xam naa ko. Lan moo ko dà...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."


----------------------------------------------------------------

#### Wolof-French v6

➡️ Import the libraries.

In [12]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0
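
The training logs in this notebook repeatedly print a `huggingface/tokenizers` warning about the process being forked after parallelism was used. The warning itself suggests setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch of that suggestion follows; it assumes the cell runs before the tokenizer is first used, and it is optional since the warning does not stop training.

```python
import os

# Set the variable before any tokenizer is created or used,
# so the tokenizers library never starts its own thread pool.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Choosing `"false"` trades some tokenization speed for quieter logs when the data loaders fork worker processes.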

➡️ Function to recuperate datasets

In [13]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof', 
                        train_file: str = 'data/extractions/new_data/train_set.csv', 
                        test_file: str = 'data/extractions/new_data/test_file.csv'):

  # Select the end-mark option
  if end_mark == 1:
    # Create the augmentation applied to the corpus_1 sentences (keyboard noise, no end mark added)
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')
    
    else:

      raise ValueError(f'Unknown end_mark option {end_mark}; expected 1, 2, 3 or 4')

    # Create the augmentation applied to the corpus_1 sentences (keyboard noise plus the end-mark function)
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py


In [14]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py

➡️ Training

In [15]:
# initialize the configurations
config = {
    'epochs': 30,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'wolof',
    'corpus_2': 'french',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.1919742253902882,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2001,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': None,
    'weight_decay': 0.0,
    'char_p': 0.05135686578146414,
    'word_p': 0.1726149822377257,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 31, 59, 87, 115, 143, 171],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4, 2],
    'batch_size': None, 
    'warmup_init': True,
    'relative_step': True,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 'custom_transformer_v6_wf_best',
    'new_model_dir': 'custom_transformer_v6_wf',
    'continue': False, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/custom_transformer_wf',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v5.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'corpora_v6.csv',
    'version': 6,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
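
The `boundaries` list has 7 entries and `batch_sizes` has 8, one batch size per length bucket. The sketch below shows one plausible reading of that pairing: short sequences go into large batches and long sequences into small ones. It assumes a sequence of length `n` falls into the bucket given by the number of boundaries that are less than or equal to `n`; the actual rule inside `SequenceLengthBatchSampler` is not shown in this notebook and may differ. The function name `batch_size_for_length` is hypothetical.

```python
from bisect import bisect_right

# Values copied from the config above.
boundaries = [2, 31, 59, 87, 115, 143, 171]
batch_sizes = [256, 128, 64, 32, 16, 8, 4, 2]


def batch_size_for_length(length: int) -> int:
    """Return the batch size of the bucket a sequence of this length falls into.

    Assumed convention: bucket i holds lengths in [boundaries[i-1], boundaries[i]).
    """
    if len(batch_sizes) != len(boundaries) + 1:
        raise ValueError("batch_sizes needs exactly one more entry than boundaries")
    bucket = bisect_right(boundaries, length)
    return batch_sizes[bucket]


for length in (1, 30, 31, 100, 200):
    print(length, "->", batch_size_for_length(length))
```

Under this convention each batch holds a roughly comparable number of tokens, which keeps padding and memory use in check.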

In [16]:
%%writefile wolof-translate/wolof_translate/utils/training.py
from wolof_translate import *
import warnings

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    config.update({"num_workers": 1, "pin_memory": True} if use_cuda else {})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None:
            logger.info(
                "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                    config['backend'], dist.get_world_size()
                )
                + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
            )
    # ---------------------------------------
    
    # split the data
    if config['include_split']: split_data(config['random_state'], config['data_directory'], config['data_file'])

    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate train and test set
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = Transformer, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    # initialize the encoder and the decoder layers
    encoder_layer = nn.TransformerEncoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    decoder_layer = nn.TransformerDecoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    # let us initialize the encoder and the decoder
    encoder = nn.TransformerEncoder(encoder_layer, config['n_encoders'])

    decoder = nn.TransformerDecoder(decoder_layer, config['n_decoders'])

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the transformer parameters
    model_args = {
        'vocab_size': len(tokenizer),
        'encoder': encoder,
        'decoder': decoder,
        'class_criterion': nn.CrossEntropyLoss(label_smoothing = config['label_smoothing']),
        'max_len': config['max_len']
    }

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    model_kwargs = model_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train the model
    trainer.train(config['epochs'] - trainer.current_epoch, auto_save = True, log_step = config['log_step'], saving_directory=config['new_model_dir'], save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'], metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/training.py
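
The distributed branch of `train` is only taken when more than one host is listed and a backend is given; the rank of a host is its position in `config['hosts']`. The sketch below isolates that decision in plain Python so it can be checked without starting a process group. The function name `distributed_settings`, the host names, and the fallback values `world_size = 1` and `rank = 0` for the single-machine case are assumptions made for illustration.

```python
def distributed_settings(hosts: list, current_host, backend) -> dict:
    """Mirror the is_distributed / world_size / rank logic used in train()."""
    is_distributed = len(hosts) > 1 and backend is not None
    if not is_distributed:
        # Assumed convention for the single-machine case.
        return {"is_distributed": False, "world_size": 1, "rank": 0}
    return {
        "is_distributed": True,
        "world_size": len(hosts),
        "rank": hosts.index(current_host),
    }


# Settings from the config above: no hosts and no backend, so training stays on one machine.
print(distributed_settings([], None, None))

# Hypothetical two-host setup.
print(distributed_settings(["host-a", "host-b"], "host-b", "gloo"))
```

With the configuration used in this notebook (`'hosts': []`, `'backend': None`), the distributed initialization is skipped.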


Train the model below. Checkpoints are saved automatically during training (`auto_save = True`), so a manual save is optional.

In [17]:
from wolof_translate.utils.training import train

In [18]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 




huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.02batches/s]
  return torch._native_multi_head_attention(


Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.44s/batches]



Metrics: {'train_loss': 7.8540980479161595, 'test_loss': 7.793455522377175, 'accuracy': 0.031582167832167836, 'bleu': 0.11423706293706293, 'gen_len': 35.7937062937063}




  4%|▍         | 1/25 [00:34<13:42, 34.27s/it]

For epoch 7: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.72batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.46s/batches]



Metrics: {'train_loss': 7.666568416875681, 'test_loss': 7.63682396762021, 'accuracy': 0.032080069930069934, 'bleu': 0.11967762237762239, 'gen_len': 35.7937062937063}




  8%|▊         | 2/25 [01:11<13:49, 36.07s/it]

For epoch 8: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.89batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.46s/batches]



Metrics: {'train_loss': 7.494213374879717, 'test_loss': 7.475065116282111, 'accuracy': 0.04199475524475525, 'bleu': 0.1450615384615385, 'gen_len': 35.7937062937063}




 12%|█▏        | 3/25 [01:48<13:20, 36.37s/it]

For epoch 9: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.93batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.47s/batches]



Metrics: {'train_loss': 7.297473667488381, 'test_loss': 7.344157237272997, 'accuracy': 0.0549409090909091, 'bleu': 0.06410699300699302, 'gen_len': 35.7937062937063}




 16%|█▌        | 4/25 [02:25<12:46, 36.50s/it]

For epoch 10: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.86batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.40s/batches]



Metrics: {'train_loss': 7.07282889603422, 'test_loss': 7.2855046862488875, 'accuracy': 0.05592552447552448, 'bleu': 0.061746153846153846, 'gen_len': 35.7937062937063}




 20%|██        | 5/25 [03:01<12:08, 36.41s/it]

For epoch 11: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.91batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.37s/batches]



Metrics: {'train_loss': 6.818478773606623, 'test_loss': 7.228219250699024, 'accuracy': 0.058067482517482526, 'bleu': 0.05647727272727273, 'gen_len': 35.7937062937063}




 24%|██▍       | 6/25 [03:37<11:28, 36.22s/it]

For epoch 12: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.92batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.39s/batches]



Metrics: {'train_loss': 6.584398720845104, 'test_loss': 7.222171061522477, 'accuracy': 0.059243706293706296, 'bleu': 0.06019405594405595, 'gen_len': 35.7937062937063}




 28%|██▊       | 7/25 [04:13<10:50, 36.16s/it]

For epoch 13: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.97batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.37s/batches]



Metrics: {'train_loss': 6.376424780046258, 'test_loss': 7.15495913345497, 'accuracy': 0.06287517482517482, 'bleu': 0.08258601398601399, 'gen_len': 35.7937062937063}




 32%|███▏      | 8/25 [04:48<10:12, 36.00s/it]

For epoch 14: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.10batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.36s/batches]



Metrics: {'train_loss': 6.189679010497151, 'test_loss': 7.106904018175352, 'accuracy': 0.048919930069930076, 'bleu': 0.12576433566433567, 'gen_len': 35.7937062937063}




 36%|███▌      | 9/25 [05:24<09:31, 35.74s/it]

For epoch 15: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.97batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.38s/batches]



Metrics: {'train_loss': 6.03270690640248, 'test_loss': 7.083049730821089, 'accuracy': 0.04290174825174825, 'bleu': 0.1851367132867133, 'gen_len': 35.7937062937063}




 40%|████      | 10/25 [05:59<08:56, 35.75s/it]

For epoch 16: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.93batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.39s/batches]



Metrics: {'train_loss': 5.89679601967048, 'test_loss': 7.087487886002014, 'accuracy': 0.05546993006993008, 'bleu': 0.16696398601398604, 'gen_len': 35.7937062937063}




 44%|████▍     | 11/25 [06:34<08:14, 35.33s/it]

For epoch 17: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.84batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.41s/batches]



Metrics: {'train_loss': 5.780763471392394, 'test_loss': 7.084635017635107, 'accuracy': 0.05911993006993007, 'bleu': 0.1853625874125874, 'gen_len': 35.7937062937063}




 48%|████▊     | 12/25 [07:09<07:37, 35.19s/it]

For epoch 18: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.87batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.44s/batches]



Metrics: {'train_loss': 5.672451679811865, 'test_loss': 7.146355477246371, 'accuracy': 0.052254545454545456, 'bleu': 0.09402972027972027, 'gen_len': 35.7937062937063}




 52%|█████▏    | 13/25 [07:43<07:01, 35.12s/it]

For epoch 19: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.00batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.46s/batches]



Metrics: {'train_loss': 5.594154247197199, 'test_loss': 7.167966852654942, 'accuracy': 0.04208916083916084, 'bleu': 0.09471608391608392, 'gen_len': 35.7937062937063}




 56%|█████▌    | 14/25 [08:18<06:25, 35.03s/it]

For epoch 20: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.98batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.42s/batches]



Metrics: {'train_loss': 5.51876312747083, 'test_loss': 7.166435568482726, 'accuracy': 0.03020244755244755, 'bleu': 0.11120839160839162, 'gen_len': 35.7937062937063}




 60%|██████    | 15/25 [08:53<05:48, 34.88s/it]

For epoch 21: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.00batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.38s/batches]



Metrics: {'train_loss': 5.4514359105972945, 'test_loss': 7.144553823070926, 'accuracy': 0.03715734265734264, 'bleu': 0.11291398601398603, 'gen_len': 35.7937062937063}




 64%|██████▍   | 16/25 [09:27<05:11, 34.65s/it]

For epoch 22: 




Train batch number 48: 100%|██████████| 47/47 [00:12<00:00,  3.75batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.37s/batches]



Metrics: {'train_loss': 5.371711763482871, 'test_loss': 7.16178085920694, 'accuracy': 0.03411153846153846, 'bleu': 0.0949506993006993, 'gen_len': 35.7937062937063}




 68%|██████▊   | 17/25 [10:02<04:37, 34.69s/it]

For epoch 23: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.11batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.37s/batches]



Metrics: {'train_loss': 5.3065845456559115, 'test_loss': 7.310934065105197, 'accuracy': 0.026518181818181816, 'bleu': 0.09702552447552447, 'gen_len': 35.7937062937063}




 72%|███████▏  | 18/25 [10:35<04:00, 34.38s/it]

For epoch 24: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.12batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.35s/batches]



Metrics: {'train_loss': 5.241957541266853, 'test_loss': 7.172978327824519, 'accuracy': 0.03166608391608392, 'bleu': 0.10376363636363636, 'gen_len': 35.7937062937063}




 76%|███████▌  | 19/25 [11:09<03:24, 34.10s/it]

For epoch 25: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.05batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.36s/batches]



Metrics: {'train_loss': 5.167650884554069, 'test_loss': 7.401669292183189, 'accuracy': 0.02298951048951049, 'bleu': 0.09336713286713287, 'gen_len': 35.7937062937063}




 80%|████████  | 20/25 [11:43<02:49, 33.98s/it]

For epoch 26: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.98batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.38s/batches]



Metrics: {'train_loss': 5.108312452939279, 'test_loss': 7.252266538726699, 'accuracy': 0.025374125874125873, 'bleu': 0.1244839160839161, 'gen_len': 35.7937062937063}




 84%|████████▍ | 21/25 [12:17<02:16, 34.03s/it]

For epoch 27: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.97batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.42s/batches]



Metrics: {'train_loss': 5.046437872514281, 'test_loss': 7.38431969889394, 'accuracy': 0.02766083916083916, 'bleu': 0.10793181818181817, 'gen_len': 35.7937062937063}




 88%|████████▊ | 22/25 [12:51<01:42, 34.20s/it]

For epoch 28: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.04batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:21<00:00,  2.42s/batches]



Metrics: {'train_loss': 4.989349711759674, 'test_loss': 7.518584968326808, 'accuracy': 0.024191958041958044, 'bleu': 0.0770493006993007, 'gen_len': 35.7937062937063}




 92%|█████████▏| 23/25 [13:26<01:08, 34.25s/it]

For epoch 29: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  3.98batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.45s/batches]



Metrics: {'train_loss': 4.918835839904133, 'test_loss': 7.587331466741493, 'accuracy': 0.02368006993006993, 'bleu': 0.07536363636363635, 'gen_len': 35.7937062937063}




 96%|█████████▌| 24/25 [14:00<00:34, 34.41s/it]

For epoch 30: 




Train batch number 48: 100%|██████████| 47/47 [00:11<00:00,  4.00batches/s]
Test batch number 10: 100%|██████████| 9/9 [00:22<00:00,  2.47s/batches]



Metrics: {'train_loss': 4.869492654879645, 'test_loss': 7.702398358525096, 'accuracy': 0.024270279720279717, 'bleu': 0.058753496503496504, 'gen_len': 35.7937062937063}




100%|██████████| 25/25 [14:35<00:00, 35.03s/it]
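
In the three `Metrics` lines logged above (epochs 28 to 30), the test loss rises while the train loss keeps falling. The sketch below shows how a "keep the best checkpoint" rule that minimizes `test_loss` could choose among them. The helper `best_epoch` is hypothetical and is not part of `wolof_translate`; the values are rounded from the logs.

```python
# Metrics copied (rounded) from the three log lines above.
# Epoch numbers follow the "For epoch N" headers in the log.
logged = {
    28: {"train_loss": 4.9893, "test_loss": 7.5186, "bleu": 0.0770},
    29: {"train_loss": 4.9188, "test_loss": 7.5873, "bleu": 0.0754},
    30: {"train_loss": 4.8695, "test_loss": 7.7024, "bleu": 0.0588},
}


def best_epoch(history, metric="test_loss", objective="minimize"):
    """Hypothetical helper: return the epoch whose metric is best under the objective."""
    if objective not in ("minimize", "maximize"):
        raise ValueError(f"Unknown objective: {objective}")
    pick = min if objective == "minimize" else max
    return pick(history, key=lambda epoch: history[epoch][metric])


print(best_epoch(logged))                      # epoch with the lowest test loss
print(best_epoch(logged, "bleu", "maximize"))  # epoch with the highest BLEU
```

Under either criterion, none of these last three epochs improves on the earlier ones in the run, which is why saving the best model separately from the latest one matters here.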


➡️ Predictions


In [10]:
if trainer is not None:
    
    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate the test dataset
    # initialize the transformation sequence
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)


    # let us get the test set
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                            tokenizer = tokenizer,
                                            cp1_transformer = augmentation,
                                            cp2_transformer = augmentation,
                                            corpus_1=config['corpus_1'],
                                            corpus_2=config['corpus_2'],
                                            truncation = False)

    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                            'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 8: 100%|██████████| 7/7 [00:39<00:00,  5.66s/batches]


In [11]:
metrics

{'test_loss': 6.96613918031965,
 'accuracy': 0.06841428571428572,
 'bleu': 0.28232857142857143,
 'gen_len': 82.14285714285714}

In [12]:
prediction

Unnamed: 0,original_sentences,translations,predictions
0,L'homme t'avait vu.,Góor gi gisóon na la.,Gis Gis naa na na.....
1,Tu te rappelles son amour?,Gis ŋga coroom la woon?,Gis?????????
2,La nuit se passe bien.,Guddi gaangi fi rek.,Gis Gis naa na na.....
3,Cela simplement!,Loolu doŋŋ!,Gis Gis Gis la la la!!!!
4,Où le mets-tu?,Foo kay def?,Gis?????????
...,...,...,...
281,"Ce n'est que longtemps après, quand l'égoïsme ...","Teg nañ ciy ati-at ma door a jëli ni jigéen, n...","Gis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
282,"J'ai ressenti de l'étonnement, et même de l'in...","Li wóor te wér moo di ne bi loolu lépp weesoo,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
283,À quel point les arbres aux troncs rectilignes...,"Dàtti garab yaa ngi lunk, sànneeku jëm ca kow,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
284,Je peux ressentir l'émotion qu'il éprouve à tr...,"Li koy yëngal noonu, xam naa ko. Lan moo ko dà...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."


-------------------------------

--------------------------

#### French-Wolof v7

➡️ Import the libraries.

In [1]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0
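
The training logs in this notebook repeat a `huggingface/tokenizers` warning about the process being forked after parallelism was used. The warning itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs before any tokenizer is used (ideally in this first cell):

```python
import os

# Set this before the tokenizer is first used. "false" disables tokenizer
# parallelism, which is what the library falls back to after the fork anyway.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Choosing `"false"` is an assumption that suits a run where the data loader forks worker processes; `"true"` is the other value the warning accepts.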

➡️ Function to recuperate datasets

In [2]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof', 
                        train_file: str = 'data/extractions/new_data/train_set.csv', 
                        test_file: str = 'data/extractions/new_data/test_file.csv'):

  # Let us recuperate the end_mark adding option
  if end_mark == 1:
    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')
    
    else:

      raise ValueError(f'Unknown end_mark option {end_mark}; expected 1, 2, 3 or 4')

    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py
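
The `end_mark` option only decides which keyword arguments are pre-bound to `add_end_mark` with `partial` (option 1 adds no end-mark step at all). The sketch below illustrates that binding with a stand-in function that simply echoes its arguments. `echo_end_mark_args` and its default values are placeholders; the real `add_end_mark` lives in `wolof_translate` and its behavior is not reproduced here.

```python
from functools import partial


def echo_end_mark_args(sentence, end_mark_to_remove=None, replace=False):
    """Stand-in for add_end_mark: returns the arguments it would receive."""
    return sentence, end_mark_to_remove, replace


# Same keyword bindings as the end_mark == 2, 3 and 4 branches above.
end_mark_options = {
    2: partial(echo_end_mark_args, end_mark_to_remove='!', replace=True),
    3: partial(echo_end_mark_args),
    4: partial(echo_end_mark_args, end_mark_to_remove='!'),
}

for option, fn in end_mark_options.items():
    print(option, fn("Cela simplement!"))
```

A dictionary lookup like this is one possible alternative to the `if`/`elif` chain; an unknown option would then surface as a `KeyError` unless it is checked first.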


In [3]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py

➡️ Training

In [4]:
# initialize the configurations
config = {
    'epochs': 30,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'french',
    'corpus_2': 'wolof',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.0975888869998125,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2069,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': None,
    'weight_decay': 0.0,
    'char_p': 0.5571906485431747,
    'word_p': 0.5830875624838612,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 30, 57, 84, 112, 139, 166],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4, 2],
    'batch_size': None, 
    'warmup_init': True,
    'relative_step': True,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 'custom_transformer_v7_fw_best',
    'new_model_dir': 'custom_transformer_v7_fw',
    'continue': False, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/custom_transformer_fw',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v6.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'corpora_v7.csv',
    'version': 7,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
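
`boundaries` has 7 entries and `batch_sizes` has 8: seven cut points give eight length buckets, each with its own batch size, so longer sequences are grouped into smaller batches. The exact rule inside `SequenceLengthBatchSampler` is not shown in this notebook. The sketch below assumes the common convention that bucket `i` holds lengths from `boundaries[i-1]` (inclusive) up to `boundaries[i]` (exclusive), with a final open-ended bucket.

```python
from bisect import bisect_right

boundaries = [2, 30, 57, 84, 112, 139, 166]
batch_sizes = [256, 128, 64, 32, 16, 8, 4, 2]

# One batch size per bucket: len(boundaries) cut points give len(boundaries) + 1 buckets.
assert len(batch_sizes) == len(boundaries) + 1


def batch_size_for(length):
    """Assumed rule: bucket i holds lengths in [boundaries[i-1], boundaries[i])."""
    return batch_sizes[bisect_right(boundaries, length)]


for n in (1, 25, 60, 200):
    print(n, batch_size_for(n))
```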

In [5]:
%%writefile wolof-translate/wolof_translate/utils/training.py
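from wolof_translate import *
import os
import warnings

# explicit import of a name used below (wolof_translate may already re-export it)
import torch.distributed as dist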

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    if use_cuda:
        config.update({"num_workers": 1, "pin_memory": True})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None:
            logger.info(
                "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                    config['backend'], dist.get_world_size()
                )
                + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
            )
    # ---------------------------------------
    
    # split the data
    if config['include_split']: split_data(config['random_state'], config['data_directory'], config['data_file'])

    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate train and test set
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = Transformer, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    # initialize the encoder and the decoder layers
    encoder_layer = nn.TransformerEncoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    decoder_layer = nn.TransformerDecoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    # let us initialize the encoder and the decoder
    encoder = nn.TransformerEncoder(encoder_layer, config['n_encoders'])

    decoder = nn.TransformerDecoder(decoder_layer, config['n_decoders'])

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the transformer parameters
    model_args = {
        'vocab_size': len(tokenizer),
        'encoder': encoder,
        'decoder': decoder,
        'class_criterion': nn.CrossEntropyLoss(label_smoothing = config['label_smoothing']),
        'max_len': config['max_len']
    }

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    model_kwargs = model_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train the model for the epochs that remain after the loaded checkpoint
    trainer.train(config['epochs'] - trainer.current_epoch,
                  auto_save = True,
                  log_step = config['log_step'],
                  saving_directory = config['new_model_dir'],
                  save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'],
                  metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/training.py
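
`train` asks the trainer for `config['epochs'] - trainer.current_epoch` epochs, so `epochs` is the target total and not the number of additional epochs. A small sketch of that arithmetic follows. The helper `remaining_epochs` is illustrative only, and the checkpoint epoch of 5 is inferred from the log below, which shows a 25-step progress bar starting at "For epoch 6".

```python
def remaining_epochs(target_epochs, current_epoch):
    """Epochs still to run when resuming from a checkpoint.

    The clamp at zero is an addition for this sketch; `train` itself
    passes the raw difference to the trainer.
    """
    return max(target_epochs - current_epoch, 0)


print(remaining_epochs(30, 5))   # target of 30 epochs, checkpoint saved at epoch 5
print(remaining_epochs(30, 40))  # checkpoint already past the target
```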


Train the model below. The model is saved automatically to `config['new_model_dir']` (`auto_save = True` in `train`).

In [6]:
from wolof_translate.utils.training import train

In [7]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]



  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 


Train batch number 156: 100%|██████████| 155/155 [00:53<00:00,  2.89batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:22<00:00,  2.04s/batches]



Metrics: {'train_loss': 6.347812943421847, 'test_loss': 7.286884536091079, 'accuracy': 0.077728547008547, 'bleu': 0.17436769230769228, 'gen_len': 21.938461538461535}




  4%|▍         | 1/25 [01:16<30:39, 76.64s/it]

For epoch 7: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.68batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:18<00:00,  1.67s/batches]



Metrics: {'train_loss': 6.202891310022058, 'test_loss': 7.304513001238178, 'accuracy': 0.04969128205128205, 'bleu': 0.14563726495726492, 'gen_len': 21.938461538461535}




  8%|▊         | 2/25 [02:09<23:57, 62.49s/it]

For epoch 8: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.63batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.74s/batches]



Metrics: {'train_loss': 6.058869433464199, 'test_loss': 7.341238081353342, 'accuracy': 0.04436923076923077, 'bleu': 0.10297914529914529, 'gen_len': 21.938461538461535}




 12%|█▏        | 3/25 [03:02<21:27, 58.51s/it]

For epoch 9: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.62batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.75s/batches]



Metrics: {'train_loss': 5.930035534019244, 'test_loss': 7.4412640253702795, 'accuracy': 0.04950905982905983, 'bleu': 0.12722085470085467, 'gen_len': 21.938461538461535}




 16%|█▌        | 4/25 [03:57<19:51, 56.76s/it]

For epoch 10: 




Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.54batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:18<00:00,  1.70s/batches]



Metrics: {'train_loss': 5.8172357590204635, 'test_loss': 7.608230721237313, 'accuracy': 0.03900478632478633, 'bleu': 0.14578940170940174, 'gen_len': 21.938461538461535}




 20%|██        | 5/25 [04:51<18:36, 55.82s/it]

For epoch 11: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.60batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.78s/batches]



Metrics: {'train_loss': 5.710812221540148, 'test_loss': 7.692296189528245, 'accuracy': 0.029491623931623932, 'bleu': 0.15800495726495722, 'gen_len': 21.938461538461535}




 24%|██▍       | 6/25 [05:45<17:31, 55.33s/it]

For epoch 12: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.63batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.79s/batches]



Metrics: {'train_loss': 5.60540836116409, 'test_loss': 7.878433681553245, 'accuracy': 0.029438290598290607, 'bleu': 0.13506888888888888, 'gen_len': 21.938461538461535}




 28%|██▊       | 7/25 [06:40<16:30, 55.04s/it]

For epoch 13: 




Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.52batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.80s/batches]



Metrics: {'train_loss': 5.506191526446414, 'test_loss': 7.862267953921586, 'accuracy': 0.028039829059829057, 'bleu': 0.13487794871794873, 'gen_len': 21.938461538461535}




 32%|███▏      | 8/25 [07:35<15:36, 55.11s/it]

For epoch 14: 




Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.62batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.82s/batches]



Metrics: {'train_loss': 5.395847317599348, 'test_loss': 7.77997603294177, 'accuracy': 0.03172529914529914, 'bleu': 0.1402991452991453, 'gen_len': 21.938461538461535}




 36%|███▌      | 9/25 [08:30<14:40, 55.01s/it]

For epoch 15: 




Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.56batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.85s/batches]



Metrics: {'train_loss': 5.294270834421072, 'test_loss': 7.772342659061789, 'accuracy': 0.0426936752136752, 'bleu': 0.14857555555555552, 'gen_len': 21.938461538461535}




 40%|████      | 10/25 [09:25<13:47, 55.17s/it]

For epoch 16: 






Train batch number 156: 100%|██████████| 155/155 [00:35<00:00,  4.39batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.88s/batches]



Metrics: {'train_loss': 5.191183352337829, 'test_loss': 7.956884904193062, 'accuracy': 0.03180905982905982, 'bleu': 0.15204923076923083, 'gen_len': 21.938461538461535}




 44%|████▍     | 11/25 [10:22<13:01, 55.79s/it]

For epoch 17: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.44batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.86s/batches]



Metrics: {'train_loss': 5.080068744381242, 'test_loss': 7.9777169015672476, 'accuracy': 0.02449384615384616, 'bleu': 0.1429555555555555, 'gen_len': 21.938461538461535}




 48%|████▊     | 12/25 [11:19<12:08, 56.06s/it]

For epoch 18: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.53batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.81s/batches]



Metrics: {'train_loss': 4.9651269427316596, 'test_loss': 8.1770618088225, 'accuracy': 0.023213504273504274, 'bleu': 0.18393162393162393, 'gen_len': 21.938461538461535}




 52%|█████▏    | 13/25 [12:16<11:16, 56.40s/it]

For epoch 19: 






Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.58batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.80s/batches]



Metrics: {'train_loss': 4.868774271500834, 'test_loss': 8.119646360120202, 'accuracy': 0.02043264957264957, 'bleu': 0.08229042735042737, 'gen_len': 21.938461538461535}




 56%|█████▌    | 14/25 [13:11<10:15, 55.96s/it]

For epoch 20: 






Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.58batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.80s/batches]



Metrics: {'train_loss': 4.771150579811474, 'test_loss': 8.223424989749226, 'accuracy': 0.0152382905982906, 'bleu': 0.13169008547008548, 'gen_len': 21.938461538461535}




 60%|██████    | 15/25 [14:06<09:16, 55.69s/it]

For epoch 21: 






Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.59batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.81s/batches]



Metrics: {'train_loss': 4.646613261044986, 'test_loss': 8.249136626414764, 'accuracy': 0.025264102564102567, 'bleu': 0.12399247863247863, 'gen_len': 21.938461538461535}




 64%|██████▍   | 16/25 [15:01<08:18, 55.44s/it]

For epoch 22: 






Train batch number 156: 100%|██████████| 155/155 [00:33<00:00,  4.57batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.81s/batches]



Metrics: {'train_loss': 4.55536812696832, 'test_loss': 8.414478187887077, 'accuracy': 0.012407008547008548, 'bleu': 0.09996495726495724, 'gen_len': 21.938461538461535}




 68%|██████▊   | 17/25 [15:56<07:22, 55.30s/it]

For epoch 23: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.50batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.82s/batches]



Metrics: {'train_loss': 4.439106575207797, 'test_loss': 8.376678977053388, 'accuracy': 0.03142102564102564, 'bleu': 0.12846273504273503, 'gen_len': 21.938461538461535}




 72%|███████▏  | 18/25 [16:52<06:27, 55.41s/it]

For epoch 24: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.50batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.85s/batches]



Metrics: {'train_loss': 4.338459440334114, 'test_loss': 8.655875682015704, 'accuracy': 0.010894358974358977, 'bleu': 0.10579264957264954, 'gen_len': 21.938461538461535}




 76%|███████▌  | 19/25 [17:48<05:33, 55.62s/it]

For epoch 25: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.54batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.86s/batches]



Metrics: {'train_loss': 4.225645205217921, 'test_loss': 8.8050875753419, 'accuracy': 0.016605128205128206, 'bleu': 0.22742307692307692, 'gen_len': 21.938461538461535}




 80%|████████  | 20/25 [18:44<04:38, 55.70s/it]

For epoch 26: 






Train batch number 156: 100%|██████████| 155/155 [00:35<00:00,  4.41batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.85s/batches]



Metrics: {'train_loss': 4.105685008500151, 'test_loss': 8.83837815309182, 'accuracy': 0.016957094017094016, 'bleu': 0.15468034188034185, 'gen_len': 21.938461538461535}




 84%|████████▍ | 21/25 [19:40<03:44, 56.00s/it]

For epoch 27: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.54batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.84s/batches]



Metrics: {'train_loss': 4.005466308243974, 'test_loss': 8.870791153214938, 'accuracy': 0.022394188034188037, 'bleu': 0.1484733333333334, 'gen_len': 21.938461538461535}




 88%|████████▊ | 22/25 [20:36<02:47, 55.89s/it]

For epoch 28: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.48batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.84s/batches]



Metrics: {'train_loss': 3.8837388504051154, 'test_loss': 9.03520376417372, 'accuracy': 0.02045128205128205, 'bleu': 0.19044888888888886, 'gen_len': 21.938461538461535}




 92%|█████████▏| 23/25 [21:32<01:51, 55.98s/it]

For epoch 29: 






Train batch number 156: 100%|██████████| 155/155 [00:35<00:00,  4.42batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.87s/batches]



Metrics: {'train_loss': 3.774458319611584, 'test_loss': 8.72389763400086, 'accuracy': 0.017678461538461535, 'bleu': 0.12553982905982908, 'gen_len': 21.938461538461535}




 96%|█████████▌| 24/25 [22:29<00:56, 56.26s/it]

For epoch 30: 






Train batch number 156: 100%|██████████| 155/155 [00:34<00:00,  4.50batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.82s/batches]



Metrics: {'train_loss': 3.6655619917413533, 'test_loss': 9.069508009690505, 'accuracy': 0.01737521367521367, 'bleu': 0.16014529914529918, 'gen_len': 21.938461538461535}




100%|██████████| 25/25 [23:25<00:00, 56.21s/it]
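
The per-epoch metrics logged above can be compared programmatically. The following is a minimal sketch, assuming the metrics are collected into a dictionary keyed by epoch (the `history` dictionary and the `best_epoch` helper are hypothetical and not part of `wolof_translate`). It mirrors the `metric_for_best_model` and `metric_objective` configuration keys, but how the trainer actually selects the best checkpoint is not shown in this notebook. The values are copied from epochs 15 to 17 of the log and rounded to four decimals.

```python
# Hypothetical container: three of the per-epoch metric dicts logged above,
# rounded to four decimals.
history = {
    15: {"train_loss": 5.2943, "test_loss": 7.7723, "bleu": 0.1486},
    16: {"train_loss": 5.1912, "test_loss": 7.9569, "bleu": 0.1520},
    17: {"train_loss": 5.0801, "test_loss": 7.9777, "bleu": 0.1430},
}


def best_epoch(history, metric="test_loss", objective="minimize"):
    """Return the epoch whose `metric` is best under `objective`."""
    if objective not in ("minimize", "maximize"):
        raise ValueError(f"Unknown objective: {objective}")
    pick = min if objective == "minimize" else max
    # min/max iterate over the epoch keys and compare the chosen metric
    return pick(history, key=lambda epoch: history[epoch][metric])


print(best_epoch(history))                      # epoch with the lowest test loss
print(best_epoch(history, "bleu", "maximize"))  # epoch with the highest BLEU
```

On these three epochs the train loss keeps decreasing while the test loss increases, so selecting on `test_loss` and selecting on `bleu` do not pick the same epoch.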


➡️ Predictions


In [10]:
if trainer is not None:
    
    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate the test dataset
    # initialize the transformation sequence
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)


    # let us get the test set
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                            tokenizer = tokenizer,
                                            cp1_transformer = augmentation,
                                            cp2_transformer = augmentation,
                                            corpus_1=config['corpus_1'],
                                            corpus_2=config['corpus_2'],
                                            truncation = False)

    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                            'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 8: 100%|██████████| 7/7 [00:39<00:00,  5.66s/batches]


In [11]:
metrics

{'test_loss': 6.96613918031965,
 'accuracy': 0.06841428571428572,
 'bleu': 0.28232857142857143,
 'gen_len': 82.14285714285714}

In [12]:
prediction

Unnamed: 0,original_sentences,translations,predictions
0,L'homme t'avait vu.,Góor gi gisóon na la.,Gis Gis naa na na.....
1,Tu te rappelles son amour?,Gis ŋga coroom la woon?,Gis?????????
2,La nuit se passe bien.,Guddi gaangi fi rek.,Gis Gis naa na na.....
3,Cela simplement!,Loolu doŋŋ!,Gis Gis Gis la la la!!!!
4,Où le mets-tu?,Foo kay def?,Gis?????????
...,...,...,...
281,"Ce n'est que longtemps après, quand l'égoïsme ...","Teg nañ ciy ati-at ma door a jëli ni jigéen, n...","Gis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
282,"J'ai ressenti de l'étonnement, et même de l'in...","Li wóor te wér moo di ne bi loolu lépp weesoo,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
283,À quel point les arbres aux troncs rectilignes...,"Dàtti garab yaa ngi lunk, sànneeku jëm ca kow,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
284,Je peux ressentir l'émotion qu'il éprouve à tr...,"Li koy yëngal noonu, xam naa ko. Lan moo ko dà...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."


----------------------------------------------------------------

#### Wolof-French v7

➡️ Import the libraries.

In [1]:
from wolof_translate import *

# specify a seed for everything
lt.seed_everything(0)

Global seed set to 0


0
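
Training in this notebook can emit a `huggingface/tokenizers` warning when a process is forked after the tokenizer has already been used. The warning itself suggests setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch, assuming the variable is set at the top of the notebook before the tokenizer is first used (the choice of `"false"` is an assumption; `"true"` is the other accepted value):

```python
import os

# Set this before the tokenizer is first used, i.e. before any DataLoader
# worker process is forked. "false" disables the tokenizers' internal
# parallelism, which is what the library falls back to after a fork anyway.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```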

➡️ Function to recuperate datasets

In [2]:
%%writefile wolof-translate/wolof_translate/utils/recuperate_datasets.py
from wolof_translate import *

def recuperate_datasets(char_p: float, word_p: float, max_len: int, end_mark: int, tokenizer: T5TokenizerFast,
                        corpus_1: str = 'french', corpus_2: str = 'wolof', 
                        train_file: str = 'data/extractions/new_data/train_set.csv', 
                        test_file: str = 'data/extractions/new_data/test_file.csv'):

  # Let us recuperate the end_mark adding option
  if end_mark == 1:
    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')
    
    else:

      raise ValueError(f'No end mark number {end_mark}')

    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=char_p, aug_word_p=word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(train_file,
                                        tokenizer,
                                        truncation = False,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(test_file,
                                        tokenizer,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        corpus_1=corpus_1,
                                        corpus_2=corpus_2,
                                        truncation = False)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset

Overwriting wolof-translate/wolof_translate/utils/recuperate_datasets.py


In [3]:
%run wolof-translate/wolof_translate/utils/recuperate_datasets.py

➡️ Training

In [4]:
# initialize the configurations
config = {
    'epochs': 30,
    'max_epoch': None,
    'log_step': 1,
    'metric_for_best_model': 'test_loss',
    'metric_objective': 'minimize',
    'corpus_1': 'wolof',
    'corpus_2': 'french',
    'train_file': 'data/extractions/new_data/train_set.csv',
    'test_file': 'data/extractions/new_data/valid_set.csv',
    'drop_out_rate': 0.14135351426879492,
    'd_model': 512,
    'n_head': 8,
    'dim_ff': 2031,
    'n_encoders': 6,
    'n_decoders': 6,
    'learning_rate': None,
    'weight_decay': 0.0,
    'char_p': 0.15461558886293383,
    'word_p': 0.8189330024450924,
    'end_mark': 3,
    'label_smoothing': 0.1,
    'max_len': 20,
    'random_state': 0,
    'boundaries': [2, 30, 57, 84, 112, 139, 166],
    'batch_sizes': [256, 128, 64, 32, 16, 8, 4, 2],
    'batch_size': None, 
    'warmup_init': True,
    'relative_step': True,
    'num_workers': 0,
    'pin_memory': False,
    # --------------------> Must be changed when continuing a training
    'model_dir': 'custom_transformer_v7_wf_best',
    'new_model_dir': 'custom_transformer_v7_wf',
    'continue': False, # --------------------------> Must be changed when continuing training
    'logging_dir': 'data/logs/custom_transformer_wf',
    'save_best': True,
    'tokenizer_path': 'wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v6.model',
    'data_directory': 'data/extractions/new_data/',
    'data_file': 'corpora_v7.csv',
    'version': 7,
    # in the case of a distributed training
    'backend': None,
    'hosts': [],
    'current_host': None,
    'num_gpus': 5,
    'logger': None,
    'return_trainer': True,
    'include_split': True,
}
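
The `boundaries` and `batch_sizes` entries in the configuration describe length buckets: there is one more batch size than there are boundaries, and longer sequences get smaller batches. The sketch below illustrates one plausible mapping from a sequence length to a batch size. It assumes the half-open bucket convention used by TensorFlow's `bucket_by_sequence_length` (`[0, b0)`, `[b0, b1)`, ..., `[b_last, inf)`); the internals of `SequenceLengthBatchSampler` are not shown in this notebook, and `bucket_batch_size` is a hypothetical helper.

```python
from bisect import bisect_right

# Values copied from the configuration above
boundaries = [2, 30, 57, 84, 112, 139, 166]
batch_sizes = [256, 128, 64, 32, 16, 8, 4, 2]


def bucket_batch_size(length, boundaries, batch_sizes):
    """Return the batch size of the bucket that `length` falls into.

    Assumes half-open buckets [0, b0), [b0, b1), ..., [b_last, inf).
    """
    if len(batch_sizes) != len(boundaries) + 1:
        raise ValueError("batch_sizes needs one more entry than boundaries")
    # bisect_right counts the boundaries that are <= length,
    # which is exactly the bucket index under this convention
    return batch_sizes[bisect_right(boundaries, length)]


for length in (1, 29, 30, 200):
    print(length, bucket_batch_size(length, boundaries, batch_sizes))
```

Under this convention a sentence of 29 tokens is batched with up to 127 others, whereas a sentence of 200 tokens shares its batch with only one other sequence, which keeps the number of padded tokens per batch roughly bounded.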

In [5]:
%%writefile wolof-translate/wolof_translate/utils/training.py
from wolof_translate import *
import warnings

def train(config: dict):
    
    # ---------------------------------------
    # add distribution if necessary (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_mnist/mnist.py)
    
    logger = config['logger']
    
    is_distributed = len(config['hosts']) > 1 and config['backend'] is not None
    
    use_cuda = config['num_gpus'] > 0
    
    config.update({"num_workers": 1, "pin_memory": True} if use_cuda else {})

    if logger is not None:
        
        logger.debug("Distributed training - {}".format(is_distributed))
        
        logger.debug("Number of gpus available - {}".format(config['num_gpus']))
        
    if is_distributed:
        # Initialize the distributed environment.
        world_size = len(config['hosts'])
        
        os.environ["WORLD_SIZE"] = str(world_size)
        
        host_rank = config['hosts'].index(config['current_host'])
        
        os.environ["RANK"] = str(host_rank)
        
        dist.init_process_group(backend=config['backend'], rank=host_rank, world_size=world_size)
        
        if logger is not None: logger.info(
            "Initialized the distributed environment: '{}' backend on {} nodes. ".format(
                config['backend'], dist.get_world_size()
            )
            + "Current host rank is {}. Number of gpus: {}".format(dist.get_rank(), config['num_gpus'])
        )
    # ---------------------------------------
    
    # split the data
    if config['include_split']: split_data(config['random_state'], config['data_directory'], config['data_file'])

    # recuperate the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])
    
    # recuperate train and test set
    train_dataset, test_dataset = recuperate_datasets(config['char_p'],
                                                        config['word_p'], config['max_len'],
                                                        config['end_mark'], tokenizer, config['corpus_1'],
                                                        config['corpus_2'],
                                                        config['train_file'], config['test_file'])
    
    # initialize the evaluation object
    evaluation = TranslationEvaluation(tokenizer, train_dataset.decode)

    # let us initialize the trainer
    trainer = ModelRunner(model = Transformer, version=config['version'], seed = 0, evaluation = evaluation, optimizer = Adafactor)

    # initialize the encoder and the decoder layers
    encoder_layer = nn.TransformerEncoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    decoder_layer = nn.TransformerDecoderLayer(config['d_model'],
                                                config['n_head'],
                                                config['dim_ff'],
                                                config['drop_out_rate'], batch_first = True)

    # let us initialize the encoder and the decoder
    encoder = nn.TransformerEncoder(encoder_layer, config['n_encoders'])

    decoder = nn.TransformerDecoder(decoder_layer, config['n_decoders'])

    #-------------------------------------
    # in the case when the linear learning rate scheduler with warmup is used
    
    # let us calculate the appropriate warmup steps (let us take a max epoch of 100)
    # length = len(train_dataset)

    # n_steps = length // config['batch_size']

    # num_steps = config['max_epoch'] * n_steps

    # warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

    # Initialize the scheduler parameters
    # scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}
    #-------------------------------------

    # Initialize the transformer parameters
    model_args = {
        'vocab_size': len(tokenizer),
        'encoder': encoder,
        'decoder': decoder,
        'class_criterion': nn.CrossEntropyLoss(label_smoothing = config['label_smoothing']),
        'max_len': config['max_len']
    }

    # Initialize the optimizer parameters
    optimizer_args = {
        'lr': config['learning_rate'],
        'weight_decay': config['weight_decay'],
        # 'betas': (0.9, 0.98),
        'warmup_init': config['warmup_init'],
        'relative_step': config['relative_step']
    }

    # ----------------------------
    # initialize the bucket samplers for distributed environment
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    train_sampler = SequenceLengthBatchSampler(train_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                                boundaries = boundaries,
                                                batch_sizes = batch_sizes)

    # ------------------------------
    # initialize a bucket sampler with fixed batch size in the case of single machine
    # with parallelization on multiple gpus
    # train_sampler = BucketSampler(train_dataset, config['batch_size'])

    # test_sampler = BucketSampler(test_dataset, config['batch_size'])
    
    # ------------------------------

    # Initialize the loaders parameters
    train_loader_args = {'batch_sampler': train_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # Add the datasets and hyperparameters to trainer
    trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                    test_loader_args, optimizer_kwargs = optimizer_args,
                    model_kwargs = model_args,
                    # lr_scheduler=get_linear_schedule_with_warmup,
                    # lr_scheduler_kwargs=scheduler_args,
                    predict_with_generate = True,
                    is_distributed=is_distributed,
                    logging_dir=config['logging_dir'],
                    dist=dist
                    )

    # load the model
    trainer.load(config['model_dir'], load_best = not config['continue'])
    
    # Train the model
    trainer.train(config['epochs'] - trainer.current_epoch,
                  auto_save = True,
                  log_step = config['log_step'],
                  saving_directory = config['new_model_dir'],
                  save_best = config['save_best'],
                  metric_for_best_model = config['metric_for_best_model'],
                  metric_objective = config['metric_objective'])
    
    if config['return_trainer']:
        
        return trainer
    
    return None


Overwriting wolof-translate/wolof_translate/utils/training.py
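
The distributed branch of `train` derives the world size from `config['hosts']` and the rank from the position of `config['current_host']` in that list. The sketch below isolates that derivation so it can be checked without initializing a process group. The helper name `distributed_env` and the host names are assumptions for illustration and are not part of `wolof_translate`.

```python
def distributed_env(hosts, current_host, backend):
    """Mirror the checks in `train`: returns (is_distributed, world_size, rank)."""
    # Same condition as in `train`: more than one host and a backend is configured
    is_distributed = len(hosts) > 1 and backend is not None
    world_size = len(hosts)
    # list.index raises ValueError when current_host is not one of the hosts
    rank = hosts.index(current_host)
    return is_distributed, world_size, rank


# Hypothetical two-instance job; the host names are placeholders.
print(distributed_env(["algo-1", "algo-2"], "algo-2", "gloo"))
# A single host never counts as distributed, whatever the backend.
print(distributed_env(["algo-1"], "algo-1", "gloo"))
```

Note that `train` only computes the rank when `is_distributed` is true; the helper computes it in every case, purely to keep the example short.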


Below, we train the model. Checkpoints are saved automatically during training, so an extra manual save is optional.
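
The training output in this notebook shows a `huggingface/tokenizers` fork warning, and the warning itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. Here is a minimal sketch, assuming it runs before the tokenizer is first used (for example, at the top of the notebook). Choosing `false` is an assumption: it trades the tokenizer's internal parallelism for quieter logs.

```python
import os

# Set this before the tokenizer is used and before any DataLoader workers are forked.
# "false" disables the tokenizers library's internal parallelism; "true" is the other accepted value.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```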

In [6]:
from wolof_translate.utils.training import train
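
`train` reads its settings from a single `config` dictionary and raises a `KeyError` on the first missing entry. The sketch below checks a dictionary against the keys that the visible part of `train` reads unconditionally. `REQUIRED_KEYS` and `missing_config_keys` are hypothetical names, and the list is transcribed from the function body above, so it may be incomplete.

```python
# Keys that the visible part of `train` reads unconditionally.
# Transcribed by hand from the function body; this list may be incomplete.
REQUIRED_KEYS = [
    'logger', 'hosts', 'backend', 'num_gpus', 'include_split',
    'tokenizer_path', 'char_p', 'word_p', 'max_len', 'end_mark',
    'corpus_1', 'corpus_2', 'train_file', 'test_file', 'version',
    'd_model', 'n_head', 'dim_ff', 'drop_out_rate', 'n_encoders', 'n_decoders',
    'label_smoothing', 'learning_rate', 'weight_decay', 'warmup_init',
    'relative_step', 'boundaries', 'batch_sizes', 'logging_dir', 'model_dir',
    'continue', 'epochs', 'log_step', 'new_model_dir', 'save_best',
    'metric_for_best_model', 'metric_objective', 'return_trainer',
]


def missing_config_keys(config):
    """Return the required keys absent from `config`, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in config]


# A deliberately incomplete, made-up config to show the check in action.
partial_config = {'hosts': ['algo-1'], 'backend': None, 'num_gpus': 0}
print(missing_config_keys(partial_config))
```

Running this check before `train(config)` surfaces every missing key at once instead of one `KeyError` per attempt.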

In [7]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary

  0%|          | 0/25 [00:00<?, ?it/s]

For epoch 6: 


Train batch number 2:   0%|          | 0/79 [00:00<?, ?batches/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Train batch number 80: 100%|██████████| 79/79 [00:24<00:00,  3.18batches/s]
Test batch number 2:   0%|          | 0/11 [00:00<?, ?batches/s]


  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
  return torch._native_multi_head_attention(
Test batch number 12: 100%|██████████| 11/11 [00:22<00:00,  2.04s/batches]



Metrics: {'train_loss': 7.187684862607562, 'test_loss': 7.44390119650425, 'accuracy': 0.04530717948717949, 'bleu': 0.0936411965811966, 'gen_len': 22.355555555555554}




  4%|▍         | 1/25 [00:47<19:09, 47.90s/it]

For epoch 7: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.40batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:21<00:00,  1.91s/batches]



Metrics: {'train_loss': 6.747729900252421, 'test_loss': 7.33677910861806, 'accuracy': 0.04709897435897436, 'bleu': 0.06707452991452992, 'gen_len': 22.355555555555554}




  8%|▊         | 2/25 [01:41<19:44, 51.49s/it]

For epoch 8: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.41batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.83s/batches]



Metrics: {'train_loss': 6.38781906326578, 'test_loss': 7.304613937475742, 'accuracy': 0.02995247863247863, 'bleu': 0.06842615384615384, 'gen_len': 22.355555555555554}




 12%|█▏        | 3/25 [02:36<19:25, 52.98s/it]

For epoch 9: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.42batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.82s/batches]



Metrics: {'train_loss': 6.120712252536119, 'test_loss': 7.241891223548824, 'accuracy': 0.030185641025641023, 'bleu': 0.06756735042735043, 'gen_len': 22.355555555555554}




 16%|█▌        | 4/25 [03:33<19:05, 54.56s/it]

For epoch 10: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.40batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.85s/batches]



Metrics: {'train_loss': 5.932615137487729, 'test_loss': 7.255839188893636, 'accuracy': 0.03287025641025641, 'bleu': 0.08893299145299147, 'gen_len': 22.355555555555554}




 20%|██        | 5/25 [04:22<17:27, 52.38s/it]

For epoch 11: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.41batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.82s/batches]



Metrics: {'train_loss': 5.797690454051471, 'test_loss': 7.232953423720139, 'accuracy': 0.04010769230769231, 'bleu': 0.09735948717948718, 'gen_len': 22.355555555555554}




 24%|██▍       | 6/25 [05:08<15:53, 50.20s/it]

For epoch 12: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.40batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.86s/batches]



Metrics: {'train_loss': 5.681706225494731, 'test_loss': 7.2330093840248555, 'accuracy': 0.046400000000000004, 'bleu': 0.10081623931623933, 'gen_len': 22.355555555555554}




 28%|██▊       | 7/25 [05:53<14:33, 48.51s/it]

For epoch 13: 




Train batch number 80: 100%|██████████| 79/79 [00:34<00:00,  2.30batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:24<00:00,  2.23s/batches]



Metrics: {'train_loss': 5.5792333383250154, 'test_loss': 7.208712688674274, 'accuracy': 0.04488615384615384, 'bleu': 0.10237880341880343, 'gen_len': 22.355555555555554}




 32%|███▏      | 8/25 [06:57<15:10, 53.56s/it]

For epoch 14: 




Train batch number 80: 100%|██████████| 79/79 [00:34<00:00,  2.29batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:23<00:00,  2.18s/batches]



Metrics: {'train_loss': 5.476012399198664, 'test_loss': 7.2477222589346075, 'accuracy': 0.04687213675213675, 'bleu': 0.10007282051282052, 'gen_len': 22.355555555555554}




 36%|███▌      | 9/25 [07:57<14:49, 55.57s/it]

For epoch 15: 




Train batch number 80: 100%|██████████| 79/79 [00:37<00:00,  2.12batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:24<00:00,  2.22s/batches]



Metrics: {'train_loss': 5.3841612896825435, 'test_loss': 7.36185769382705, 'accuracy': 0.04649282051282052, 'bleu': 0.12334717948717946, 'gen_len': 22.355555555555554}




 40%|████      | 10/25 [09:00<14:27, 57.80s/it]

For epoch 16: 




Train batch number 80: 100%|██████████| 79/79 [00:36<00:00,  2.15batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:25<00:00,  2.28s/batches]



Metrics: {'train_loss': 5.301969289983613, 'test_loss': 7.4997970776680205, 'accuracy': 0.04429470085470085, 'bleu': 0.1381153846153846, 'gen_len': 22.355555555555554}




 44%|████▍     | 11/25 [10:03<13:51, 59.37s/it]

For epoch 17: 




Train batch number 80: 100%|██████████| 79/79 [00:39<00:00,  2.00batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:25<00:00,  2.29s/batches]



Metrics: {'train_loss': 5.211127134854208, 'test_loss': 7.516774188147651, 'accuracy': 0.04765230769230769, 'bleu': 0.11910256410256409, 'gen_len': 22.355555555555554}




 48%|████▊     | 12/25 [11:09<13:17, 61.36s/it]

For epoch 18: 




Train batch number 80: 100%|██████████| 79/79 [00:40<00:00,  1.94batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:25<00:00,  2.33s/batches]



Metrics: {'train_loss': 5.1259991126350055, 'test_loss': 7.569934063283807, 'accuracy': 0.0496382905982906, 'bleu': 0.14772769230769228, 'gen_len': 22.355555555555554}




 52%|█████▏    | 13/25 [12:16<12:38, 63.18s/it]

For epoch 19: 




Train batch number 80: 100%|██████████| 79/79 [00:37<00:00,  2.12batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:24<00:00,  2.24s/batches]



Metrics: {'train_loss': 5.056288480758666, 'test_loss': 7.461149239336325, 'accuracy': 0.04562341880341881, 'bleu': 0.13461829059829059, 'gen_len': 22.355555555555554}




 56%|█████▌    | 14/25 [13:19<11:34, 63.11s/it]

For epoch 20: 




Train batch number 80: 100%|██████████| 79/79 [00:34<00:00,  2.29batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:25<00:00,  2.31s/batches]



Metrics: {'train_loss': 4.973348451129795, 'test_loss': 7.602865479950212, 'accuracy': 0.04509128205128205, 'bleu': 0.11657692307692306, 'gen_len': 22.355555555555554}




 60%|██████    | 15/25 [14:20<10:24, 62.45s/it]

For epoch 21: 




Train batch number 80: 100%|██████████| 79/79 [00:34<00:00,  2.32batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:24<00:00,  2.23s/batches]



Metrics: {'train_loss': 4.9049403679687, 'test_loss': 7.513884770972097, 'accuracy': 0.03593521367521368, 'bleu': 0.12593247863247864, 'gen_len': 22.355555555555554}




 64%|██████▍   | 16/25 [15:20<09:14, 61.61s/it]

For epoch 22: 




Train batch number 80: 100%|██████████| 79/79 [00:31<00:00,  2.51batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.86s/batches]



Metrics: {'train_loss': 4.831563282053729, 'test_loss': 7.738234032524956, 'accuracy': 0.041381196581196586, 'bleu': 0.13351470085470082, 'gen_len': 22.355555555555554}




 68%|██████▊   | 17/25 [16:13<07:52, 59.03s/it]

For epoch 23: 




Train batch number 80: 100%|██████████| 79/79 [00:25<00:00,  3.10batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.86s/batches]



Metrics: {'train_loss': 4.7601258937266255, 'test_loss': 7.682662941859319, 'accuracy': 0.03750940170940171, 'bleu': 0.10915487179487181, 'gen_len': 22.355555555555554}




 72%|███████▏  | 18/25 [17:00<06:27, 55.41s/it]

For epoch 24: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.36batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.80s/batches]



Metrics: {'train_loss': 4.7026086196214125, 'test_loss': 7.721827574672862, 'accuracy': 0.03278598290598291, 'bleu': 0.119928547008547, 'gen_len': 22.355555555555554}




 76%|███████▌  | 19/25 [17:44<05:12, 52.08s/it]

For epoch 25: 




Train batch number 80: 100%|██████████| 79/79 [00:22<00:00,  3.44batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.79s/batches]



Metrics: {'train_loss': 4.638420750485001, 'test_loss': 7.652609861814059, 'accuracy': 0.03794940170940171, 'bleu': 0.10516444444444445, 'gen_len': 22.355555555555554}




 80%|████████  | 20/25 [18:27<04:07, 49.53s/it]

For epoch 26: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.32batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:22<00:00,  2.03s/batches]



Metrics: {'train_loss': 4.568095618175379, 'test_loss': 7.708123462220541, 'accuracy': 0.03454478632478634, 'bleu': 0.09843179487179486, 'gen_len': 22.355555555555554}




 84%|████████▍ | 21/25 [19:15<03:15, 48.80s/it]

For epoch 27: 




Train batch number 80: 100%|██████████| 79/79 [00:22<00:00,  3.44batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.78s/batches]



Metrics: {'train_loss': 4.51142231667521, 'test_loss': 7.753795340122322, 'accuracy': 0.032625982905982905, 'bleu': 0.09674119658119658, 'gen_len': 22.355555555555554}




 88%|████████▊ | 22/25 [19:58<02:21, 47.23s/it]

For epoch 28: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.43batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.80s/batches]



Metrics: {'train_loss': 4.446778978620255, 'test_loss': 7.779957402058137, 'accuracy': 0.03822615384615386, 'bleu': 0.11845350427350426, 'gen_len': 22.355555555555554}




 92%|█████████▏| 23/25 [20:42<01:32, 46.20s/it]

For epoch 29: 




Train batch number 80: 100%|██████████| 79/79 [00:22<00:00,  3.45batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:19<00:00,  1.81s/batches]



Metrics: {'train_loss': 4.387733264546603, 'test_loss': 7.86080608775473, 'accuracy': 0.03896136752136753, 'bleu': 0.0989528205128205, 'gen_len': 22.355555555555554}




 96%|█████████▌| 24/25 [21:26<00:45, 45.49s/it]

For epoch 30: 




Train batch number 80: 100%|██████████| 79/79 [00:23<00:00,  3.42batches/s]
Test batch number 12: 100%|██████████| 11/11 [00:20<00:00,  1.84s/batches]



Metrics: {'train_loss': 4.334654654775348, 'test_loss': 7.966567238375672, 'accuracy': 0.03769863247863248, 'bleu': 0.11237811965811965, 'gen_len': 22.355555555555554}




100%|██████████| 25/25 [22:10<00:00, 53.22s/it]
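Over epochs 23 to 30 the training loss keeps decreasing while the test loss drifts upward, which suggests the model is starting to overfit. The sketch below is only an illustration: it copies the logged losses (rounded to four decimals) into a hypothetical `history` dictionary and uses two hypothetical helpers, `best_epoch` and `generalization_gap`, which are not part of `wolof_translate`.

```python
# Losses copied from the log above for epochs 23-30, rounded to 4 decimals.
# `history`, `best_epoch` and `generalization_gap` are illustrative names only.
history = {
    23: {"train_loss": 4.7601, "test_loss": 7.6827},
    24: {"train_loss": 4.7026, "test_loss": 7.7218},
    25: {"train_loss": 4.6384, "test_loss": 7.6526},
    26: {"train_loss": 4.5681, "test_loss": 7.7081},
    27: {"train_loss": 4.5114, "test_loss": 7.7538},
    28: {"train_loss": 4.4468, "test_loss": 7.7800},
    29: {"train_loss": 4.3877, "test_loss": 7.8608},
    30: {"train_loss": 4.3347, "test_loss": 7.9666},
}


def best_epoch(history, key="test_loss"):
    """Return the epoch whose logged value for `key` is the smallest."""
    return min(history, key=lambda epoch: history[epoch][key])


def generalization_gap(history):
    """Return test_loss - train_loss for every logged epoch."""
    return {
        epoch: round(m["test_loss"] - m["train_loss"], 4)
        for epoch, m in history.items()
    }


print("Epoch with the lowest test loss:", best_epoch(history))
print("Gap per epoch:", generalization_gap(history))
```

Among these epochs the lowest test loss is logged at epoch 25, and the gap between test and training loss widens afterwards.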


In [None]:
# with warnings.catch_warnings():
    # warnings.simplefilter("ignore")
trainer = train(config)

# save if necessary
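
The training log contains a `huggingface/tokenizers` warning about the process being forked after parallelism was used, and the message itself suggests setting the `TOKENIZERS_PARALLELISM` environment variable. A minimal sketch, assuming it runs in a cell before the tokenizer is first used and before the data loader workers are started:

```python
import os

# Turn off the tokenizers library's internal parallelism, as the warning suggests.
# Assumption: this runs before any tokenizer call in the session.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```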

➡️ Predictions


In [10]:
if trainer is not None:

    # load the tokenizer
    tokenizer = T5TokenizerFast(config['tokenizer_path'])

    # build the transformation sequence applied to both corpora of the test set
    end_mark_fn = partial(add_end_mark)
    augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)

    # load the test set (no truncation, so full sentences are evaluated)
    test_dataset = SentenceDataset(f"{config['data_directory']}test_set.csv",
                                   tokenizer=tokenizer,
                                   cp1_transformer=augmentation,
                                   cp2_transformer=augmentation,
                                   corpus_1=config['corpus_1'],
                                   corpus_2=config['corpus_2'],
                                   truncation=False)

    # initialize the bucket sampler, which batches sentences of similar length
    boundaries = config['boundaries']
    batch_sizes = config['batch_sizes']

    test_sampler = SequenceLengthBatchSampler(test_dataset,
                                              boundaries=boundaries,
                                              batch_sizes=batch_sizes)

    # arguments forwarded to the test data loader
    test_loader_args = {'batch_sampler': test_sampler, 'collate_fn': collate_fn,
                        'num_workers': config['num_workers'], 'pin_memory': config['pin_memory']}

    # evaluate the model and collect the metrics and the predicted sentences
    metrics, prediction = trainer.evaluate(test_dataset, test_loader_args)


Evaluation batch number 8: 100%|██████████| 7/7 [00:39<00:00,  5.66s/batches]


In [11]:
metrics

{'test_loss': 6.96613918031965,
 'accuracy': 0.06841428571428572,
 'bleu': 0.28232857142857143,
 'gen_len': 82.14285714285714}

In [12]:
prediction

Unnamed: 0,original_sentences,translations,predictions
0,L'homme t'avait vu.,Góor gi gisóon na la.,Gis Gis naa na na.....
1,Tu te rappelles son amour?,Gis ŋga coroom la woon?,Gis?????????
2,La nuit se passe bien.,Guddi gaangi fi rek.,Gis Gis naa na na.....
3,Cela simplement!,Loolu doŋŋ!,Gis Gis Gis la la la!!!!
4,Où le mets-tu?,Foo kay def?,Gis?????????
...,...,...,...
281,"Ce n'est que longtemps après, quand l'égoïsme ...","Teg nañ ciy ati-at ma door a jëli ni jigéen, n...","Gis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
282,"J'ai ressenti de l'étonnement, et même de l'in...","Li wóor te wér moo di ne bi loolu lépp weesoo,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
283,À quel point les arbres aux troncs rectilignes...,"Dàtti garab yaa ngi lunk, sànneeku jëm ca kow,...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
284,Je peux ressentir l'émotion qu'il éprouve à tr...,"Li koy yëngal noonu, xam naa ko. Lan moo ko dà...",",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,..."
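
The predictions displayed above collapse into repeated tokens and punctuation (for example `Gis?????????`). A rough sketch for flagging such degenerate outputs follows. The `repetition_ratio` helper and the `samples` list are hypothetical, and the 0.5 cut-off is an arbitrary assumption rather than a tuned value.

```python
from collections import Counter


def repetition_ratio(text: str) -> float:
    """Share of non-space characters taken up by the most frequent character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    most_common_count = Counter(chars).most_common(1)[0][1]
    return most_common_count / len(chars)


# One prediction and one reference translation copied from the table above.
samples = ["Gis?????????", "Góor gi gisóon na la."]

for sentence in samples:
    ratio = repetition_ratio(sentence)
    # 0.5 is an arbitrary threshold chosen for illustration only.
    label = "degenerate" if ratio > 0.5 else "ok"
    print(f"{sentence!r}: {ratio:.2f} ({label})")
```

Assuming `prediction` is a pandas DataFrame with a `predictions` column, as the display suggests, `prediction['predictions'].map(repetition_ratio)` would give one score per row.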
