Fine-tuning the best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the newly extracted sentences from the book **Grammaire de Wolof Moderne**, together with those extracted from the book **Baay Sama, ...** and sentences collected from the web. Below, we provide the main evaluation figures obtained during the hyperparameter search step. Training is evaluated on the validation dataset.

- Parallel coordinates panel:

- Parameter importance chart: [t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

The chart above shows that the batch size is the most important parameter and that it is negatively correlated with the BLEU score, i.e. smaller batches tend to give better scores. The probability of modifying a character in the French corpus comes next, and higher probabilities are associated with better BLEU scores.
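Next to the importance score, the W&B parameter-importance panel also reports the linear correlation between each hyperparameter and the target metric, which is where the sign ("negatively correlated") comes from. A minimal plain-Python sketch of that correlation column, assuming each sweep run is a dict of hyperparameters plus its `bleu`; the `runs` values are made up for illustration and are not the actual sweep results.

```python
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)  # undefined if a parameter never varies across runs


def correlations_with_target(runs: List[Dict[str, float]], target: str = "bleu") -> Dict[str, float]:
    """Correlate every hyperparameter found in `runs` with the target metric."""
    params = [key for key in runs[0] if key != target]
    ys = [run[target] for run in runs]
    return {p: pearson([run[p] for run in runs], ys) for p in params}


# Hypothetical sweep results (illustrative numbers only)
runs = [
    {"batch_size": 8, "fr_char_p": 0.8, "bleu": 0.9},
    {"batch_size": 16, "fr_char_p": 0.6, "bleu": 0.6},
    {"batch_size": 32, "fr_char_p": 0.3, "bleu": 0.2},
]
print(correlations_with_target(runs))
```

A negative value for `batch_size` and a positive one for `fr_char_p` would reproduce the pattern described above; the importance score itself is computed differently and is not reproduced here.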

In [1]:
# let us import all necessary libraries
from transformers import (AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast,
                          set_seed, AdamW, get_linear_schedule_with_warmup, T5ForConditionalGeneration,
                          get_cosine_schedule_with_warmup, Adafactor)
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.utils.improvements.end_marks import add_end_mark # added
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v3 import SentenceDataset # v2 -> v3
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
# from datasets import load_metric  # deprecated: run `pip install evaluate` instead
# (and `pip install sacrebleu` for the BLEU metric)
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
import pytorch_lightning as tl
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

# set a global seed
tl.seed_everything(0)

os.environ["WANDB_DISABLED"] = "true"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

Global seed set to 0


## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file="wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v5.json")


In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float, max_len: int, end_mark_opt: int):
    """Load the augmented training set and the validation set.

    end_mark_opt: 1 leaves end marks untouched; 2, 3 and 4 add `add_end_mark`
    to the French transformations with different arguments.
    """

    # Keyboard-typo augmentation applied to the French sentences
    keyboard_aug = nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p,
                                   aug_word_max=max_len)

    if end_mark_opt == 1:
        fr_augmentation = TransformerSequences(keyboard_aug, remove_mark_space,
                                               delete_guillemet_space)

    else:

        # Select how the end marks are handled
        if end_mark_opt == 2:
            end_mark_fn = partial(add_end_mark, end_mark_to_remove='!', replace=True)
        elif end_mark_opt == 3:
            end_mark_fn = partial(add_end_mark)
        elif end_mark_opt == 4:
            end_mark_fn = partial(add_end_mark, end_mark_to_remove='!')
        else:
            # Without this check, `end_mark_fn` would be undefined below
            raise ValueError(f"end_mark_opt must be 1, 2, 3 or 4, got {end_mark_opt}")

        fr_augmentation = TransformerSequences(keyboard_aug, remove_mark_space,
                                               delete_guillemet_space, end_mark_fn)

    # Load the training set, with the augmentation applied to the French corpus
    train_dataset_aug = SentenceDataset("data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation=True, max_len=max_len,
                                        cp1_transformer=fr_augmentation)

    # Load the validation set (no augmentation)
    valid_dataset = SentenceDataset("data/extractions/new_data/valid_set.csv",
                                    tokenizer, max_len=max_len,
                                    truncation=True)

    return train_dataset_aug, valid_dataset
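`fr_char_p` and `fr_word_p` are passed to nlpaug's `KeyboardAug` as `aug_char_p` (share of characters changed inside each selected word) and `aug_word_p` (share of words selected), with `aug_word_max` capping the number of selected words. The sketch below is a simplified stand-in written only to illustrate how these three knobs interact; it is not nlpaug's implementation, and the small `NEIGHBOURS` map is a hypothetical toy keyboard model.

```python
import random
from typing import Dict

# Hypothetical toy keyboard model: a few keys and some neighbouring keys
NEIGHBOURS: Dict[str, str] = {
    "a": "qzs", "e": "zrd", "i": "uok", "o": "ipl", "u": "yij",
    "n": "bhj", "s": "qdz", "t": "ryg", "r": "etf", "l": "kmo",
}


def keyboard_typos(sentence: str, word_p: float, char_p: float,
                   max_words: int, seed: int = 0) -> str:
    """Simplified keyboard-typo augmentation (illustration, not nlpaug).

    word_p:    share of words that receive typos (cf. aug_word_p)
    char_p:    share of characters changed inside a chosen word (cf. aug_char_p)
    max_words: upper bound on the number of augmented words (cf. aug_word_max)
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    words = sentence.split(" ")
    n_words = min(max_words, int(len(words) * word_p))
    for idx in rng.sample(range(len(words)), n_words):
        chars = list(words[idx])
        n_chars = int(len(chars) * char_p)
        for pos in rng.sample(range(len(chars)), n_chars):
            options = NEIGHBOURS.get(chars[pos].lower())
            if options:  # characters without known neighbours stay unchanged
                chars[pos] = rng.choice(options)
        words[idx] = "".join(chars)
    return " ".join(words)


print(keyboard_typos("le chat est sur la table", word_p=0.5, char_p=0.8, max_words=117))
```

With the selected configuration (`fr_char_p` ≈ 0.81, `fr_word_p` ≈ 0.16), few words are touched, but the touched words are heavily corrupted.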

### Configure the model and the evaluation function ⚙️

We evaluate the predictions with the BLEU metric, using the SacreBLEU implementation from the `evaluate` library.

In [4]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from transformers import PreTrainedTokenizerBase
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: PreTrainedTokenizerBase,
                 decoder: Union[Callable, None] = None,
                 metric = None,
                 ):
        
        # Hugging Face tokenizer (provides `batch_decode` and `pad_token_id`)
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        # Load SacreBLEU when the object is created, not once at class-definition time
        self.metric = metric if metric is not None else evaluate.load('sacrebleu')
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        # SacreBLEU expects a list of references for each prediction
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        # -100 marks positions ignored by the loss; replace it with the pad id before decoding
        preds = np.where(preds != -100, preds, self.tokenizer.pad_token_id)
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        # SacreBLEU corpus score, on a 0 to 100 scale
        result = {"bleu": result["score"]}

        # Average number of non-padding tokens per generated sequence
        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
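The two array manipulations in `compute_metrics` can be checked in isolation. This sketch reproduces them with NumPy on toy token ids; `PAD_ID = 0` is an assumption (it is the pad id of the standard T5 vocabulary, but the custom `tokenizer_v5.json` may differ).

```python
import numpy as np

PAD_ID = 0           # assumption: pad token id of the tokenizer
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def restore_padding(labels: np.ndarray, pad_id: int = PAD_ID) -> np.ndarray:
    """Replace the loss-masking value -100 with the pad id so labels can be decoded."""
    return np.where(labels != IGNORE_INDEX, labels, pad_id)


def mean_generation_length(preds: np.ndarray, pad_id: int = PAD_ID) -> float:
    """Average number of non-pad tokens per generated sequence (the `gen_len` metric)."""
    return float(np.mean([np.count_nonzero(p != pad_id) for p in preds]))


# Toy batch: two label sequences padded with -100, two generated sequences padded with 0
labels = np.array([[37, 5, 1, -100], [12, 1, -100, -100]])
preds = np.array([[37, 5, 1, 0], [12, 8, 1, 0]])

print(restore_padding(labels))
print(mean_generation_length(preds))  # 3.0
```

Because the end-of-sequence token is not a pad token, it is counted in `gen_len`.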


Let us initialize the evaluation object.

In [5]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

In [6]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data


-------------

### --- Wandb V6

In [7]:
# let us initialize the hyperparameter configuration 
config = {
    'random_state': 0,
    'fr_char_p': 0.8119681909961385,
    'fr_word_p': 0.16226004599890684,
    'learning_rate': 0.0046261590792525484,
    'weight_decay': 0.09029588340921264,
    'batch_size': 16,
    'warmup_ratio': 0.0,
    'max_epoch': 293,
    'max_len': 117,
    'end_mark': 4,
    'bleu': 0.5602,
    'model_dir': 'data/checkpoints/fw_t5_small_custom_train_v6_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_small_custom_train_results_fw_v6/'
}

# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 1, evaluation = evaluation, optimizer = Adafactor)

# split the data
split_data(config['random_state'], csv_file = "corpora_v6.csv")

# load the training and validation sets (the validation set is passed to the trainer as `test_dataset`)
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'], config['max_len'],
                                                    config['end_mark'])

# compute the warmup steps for a linear scheduler (currently disabled: no learning-rate scheduler is used in this run)
# length = len(train_dataset)

# n_steps = length // config['batch_size']

# num_steps = config['max_epoch'] * n_steps

# warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

# # Initialize the scheduler parameters
# scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    # 'betas': (0.9, 0.98),
    'relative_step': False
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                # lr_scheduler=get_linear_schedule_with_warmup,
                # lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/t5_small_custom_train_fw_v6"
                )

# We resume training from saved checkpoints, so let us load the model
trainer.load(config['model_dir'], load_best=True) # only for the first loading
# trainer.load(config['new_model_dir'], load_best=True)
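The commented-out warmup computation uses `length // config['batch_size']`, which undercounts by one batch whenever the dataset size is not a multiple of the batch size, since a `DataLoader` with the default `drop_last=False` also yields the last partial batch. A minimal sketch with `math.ceil`; `schedule_steps` is a hypothetical helper, and 2,220 is a hypothetical dataset size (any size from 2,209 to 2,224 gives the 139 training batches of 16 seen in the training log).

```python
import math
from typing import Tuple


def schedule_steps(num_examples: int, batch_size: int, max_epoch: int,
                   warmup_ratio: float) -> Tuple[int, int, int]:
    """Return (steps per epoch, total training steps, warmup steps)."""
    # ceil rather than floor: the last, partial batch is also an optimizer step
    steps_per_epoch = math.ceil(num_examples / batch_size)
    num_steps = max_epoch * steps_per_epoch
    # schedulers expect an integer number of warmup steps
    warmup_steps = int(num_steps * warmup_ratio)
    return steps_per_epoch, num_steps, warmup_steps


print(schedule_steps(2220, batch_size=16, max_epoch=293, warmup_ratio=0.0))  # (139, 40727, 0)
```

The resulting values could then fill `scheduler_args = {'num_warmup_steps': ..., 'num_training_steps': ...}` if a scheduler such as `get_linear_schedule_with_warmup` is re-enabled.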

        

### ---

In [8]:
trainer.train(epochs = config['max_epoch'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])

Training log: the run resumes at epoch 4 (290 of the 293 epochs remain). Each epoch runs 139 training batches and 16 validation batches and takes roughly 70 to 82 seconds. Training was interrupted during epoch 48, after 82 of its 139 training batches. Losses are rounded to four decimals.

| Epoch | train_loss | test_loss | bleu | gen_len |
|---|---|---|---|---|
| 4 | 0.6756 | 0.7457 | 0.308 | 10.0243 |
| 5 | 0.6299 | 0.7363 | 0.4886 | 10.7287 |
| 6 | 0.5845 | 0.7322 | 0.3412 | 10.5101 |
| 7 | 0.5407 | 0.7479 | 0.4296 | 10.1012 |
| 8 | 0.4981 | 0.7450 | 0.4617 | 10.7409 |
| 9 | 0.4529 | 0.7573 | 0.6232 | 11.1579 |
| 10 | 0.4092 | 0.7689 | 0.8281 | 12.0283 |
| 11 | 0.3662 | 0.8105 | 0.6895 | 11.8745 |
| 12 | 0.3266 | 0.8075 | 0.99 | 12.3158 |
| 13 | 0.2879 | 0.8294 | 1.0053 | 11.8462 |
| 14 | 0.2532 | 0.8502 | 0.8544 | 11.5547 |
| 15 | 0.2189 | 0.8742 | 1.1133 | 11.753 |
| 16 | 0.1914 | 0.8914 | 0.9704 | 11.3603 |
| 17 | 0.1663 | 0.8954 | 1.5538 | 12.2915 |
| 18 | 0.1464 | 0.9100 | 1.6797 | 11.7166 |
| 19 | 0.1289 | 0.9260 | 1.5937 | 12.0243 |
| 20 | 0.1138 | 0.9347 | 1.9582 | 11.7652 |
| 21 | 0.0996 | 0.9483 | 1.8551 | 11.6073 |
| 22 | 0.0897 | 0.9453 | 1.9598 | 11.6761 |
| 23 | 0.0792 | 0.9510 | 2.0056 | 11.9393 |
| 24 | 0.0729 | 0.9551 | 2.7117 | 11.7247 |
| 25 | 0.0660 | 0.9648 | 2.0919 | 10.749 |
| 26 | 0.0603 | 0.9580 | 2.844 | 12.0688 |
| 27 | 0.0547 | 0.9609 | 2.6851 | 11.8381 |
| 28 | 0.0503 | 0.9718 | 1.6612 | 11.3198 |
| 29 | 0.0466 | 0.9824 | 2.8358 | 12.0243 |
| 30 | 0.0435 | 0.9748 | 2.8916 | 12.0243 |
| 31 | 0.0414 | 0.9726 | 2.8743 | 11.8016 |
| 32 | 0.0376 | 0.9769 | 3.0232 | 11.4089 |
| 33 | 0.0351 | 0.9742 | 2.5906 | 11.6235 |
| 34 | 0.0315 | 0.9798 | 2.6801 | 11.6761 |
| 35 | 0.0300 | 0.9751 | 3.0561 | 12.0 |
| 36 | 0.0278 | 0.9791 | 3.0667 | 11.587 |
| 37 | 0.0269 | 0.9856 | 2.8842 | 11.6883 |
| 38 | 0.0262 | 0.9795 | 2.6458 | 11.2389 |
| 39 | 0.0243 | 0.9722 | 3.1193 | 11.7449 |
| 40 | 0.0240 | 0.9646 | 2.7094 | 11.6518 |
| 41 | 0.0232 | 0.9771 | 2.837 | 11.5709 |
| 42 | 0.0226 | 0.9670 | 3.0384 | 11.3968 |
| 43 | 0.0216 | 0.9795 | 3.2554 | 11.749 |
| 44 | 0.0205 | 0.9790 | 3.0417 | 11.8057 |
| 45 | 0.0200 | 0.9851 | 3.1262 | 11.7206 |
| 46 | 0.0186 | 0.9668 | 2.9289 | 11.2753 |
| 47 | 0.0181 | 0.9818 | 2.8319 | 11.2672 |
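The training call passes `metric_for_best_model='bleu'` and `metric_objective='maximize'`, which suggests (an assumption about `ModelRunner`, whose code is not shown here) that the saved best checkpoint is the epoch with the highest validation BLEU. A minimal sketch of that selection rule on a few epochs copied from the training log (losses rounded to four decimals); `best_epoch` is a hypothetical helper. It also shows that the two possible criteria disagree: the validation loss is lowest at epoch 6, while BLEU keeps improving until epoch 43.

```python
from typing import Dict, List

# A few epochs copied from the training log
history: List[Dict[str, float]] = [
    {"epoch": 6, "test_loss": 0.7322, "bleu": 0.3412},
    {"epoch": 12, "test_loss": 0.8075, "bleu": 0.99},
    {"epoch": 24, "test_loss": 0.9551, "bleu": 2.7117},
    {"epoch": 39, "test_loss": 0.9722, "bleu": 3.1193},
    {"epoch": 43, "test_loss": 0.9795, "bleu": 3.2554},
    {"epoch": 47, "test_loss": 0.9818, "bleu": 2.8319},
]


def best_epoch(history: List[Dict[str, float]], metric: str = "bleu",
               maximize: bool = True) -> Dict[str, float]:
    """Return the record with the best value of `metric`."""
    pick = max if maximize else min
    return pick(history, key=lambda record: record[metric])


print(best_epoch(history)["epoch"])                               # 43
print(best_epoch(history, "test_loss", maximize=False)["epoch"])  # 6
```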

### Predictions and Evaluation

In [15]:
# load the test set (Wolof sentences as `corpus_1`, French sentences as `corpus_2`)
test_dataset = SentenceDataset("data/extractions/new_data/test_set.csv",
                               corpus_1='wolof',
                               corpus_2='french',
                               tokenizer=tokenizer,
                               truncation=True)

Let us evaluate the model on the test set and print the predicted sentences.

In [16]:
# evaluation with test set
df_ft_to_wf = trainer.evaluate(test_dataset)

Evaluation batch number 11: 100%|██████████| 11/11 [00:04<00:00,  2.68batches/s]


In [17]:
df_ft_to_wf[1].tail(10)

| index | original_sentences | translations | predictions |
|---|---|---|---|
| 152 | Yaa ŋgi, dem ŋga | Te voila, tu as été | Toi, tu as été |
| 153 | Bëgg na ŋga dem | Il veut que tu viennes | Il veut que tu as été |
| 154 | Liggéeykat yi man ag yaw la. | Les travailleurs c'est toi et moi. | Il a vu les dames. |
| 155 | Foofee fan? | Où? | Là-bas où? |
| 156 | Yaa ŋgi, mi ŋgi | Te voilà, le voilà | Toi, tu n'as pas été |
| 157 | Gis ŋga kooku? | Tu as vu celui-ci? | Tu as vu celui-là? |
| 158 | Dem naa ba ci moom. | J'ai été jusqu'à lui. | J'ai été jusqu'à Saint-Louis. |
| 159 | Yéen ñan la wax? | Il parle de vous? | Il parle desquelles de vous? |
| 160 | Moo doon ganam. | C'était son hôte habituellement. | C'est le Laobe. |
| 161 | Nil waa ji na ñëw | Dis à la personne qu'elle vienne | L'homme est venu |


In [18]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf[1].sample(100)

| index | original_sentences | translations | predictions |
|---|---|---|---|
| 135 | Góor gi demul | L'homme ne part pas | L'homme n'a pas voulu |
| 106 | Seetal ma ñenn ñuu! | Surveille-moi les-uns que voilà! | Surveille-moi ceux-là! |
| 5 | Naka ŋgeen bëggé góor gi dimëlé leen? | Comment voulez-vous que l'homme vous aide? | Vous êtes des enfants seulement? |
| 96 | Di tel-teli doŋŋ taxul sotal dara. | S'agiter simplement ne suffit à rien résoudre. | Sois quelqu'un de studieux. |
| 145 | Kooku dem ku më bëgg la! | Celui qui est parti, c'est quelqu'un que j'app... | C'est quelqu'un que j'apprécie, celui qui est ... |
| 4 | Gis naa xale booba? | J'ai vu cet enfant-là? | J'ai vu cet enfant-là? |
| 141 | Gis naa am xar. | J'ai vu un mouton. | J'ai vu un mouton. |
| 66 | Yaa daan ganu Mustaf | Tu étais d'habitude l'hôte de Mustapha | C'est toi qui eusses été élu |
| 51 | Xam naa xale bi. | Je connais l'enfant. | Je vois les gens. |
| 13 | Ma japp nag yee yan? | Que j'attrape quelles vaches? | Quelles personnes se sont égarées? |
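SacreBLEU reports scores on a 0 to 100 scale, so the BLEU values printed during training (at most about 3.3) are low in absolute terms. A cruder but easy-to-read complement is the exact-match rate between the `translations` and `predictions` columns; in the 10-row sample above, 2 predictions (rows 4 and 141) match their reference exactly. The helper below is a hypothetical plain-Python sketch, not part of `wolof_translate`.

```python
from typing import List


def exact_match_rate(references: List[str], predictions: List[str]) -> float:
    """Share of predictions identical to their reference, ignoring surrounding whitespace."""
    if not references or len(references) != len(predictions):
        raise ValueError("references and predictions must be non-empty and of equal length")
    hits = sum(r.strip() == p.strip() for r, p in zip(references, predictions))
    return hits / len(references)


# Three rows taken from the sample above
refs = ["J'ai vu cet enfant-là?", "J'ai vu un mouton.", "Je connais l'enfant."]
preds = ["J'ai vu cet enfant-là?", "J'ai vu un mouton.", "Je vois les gens."]
print(round(exact_match_rate(refs, preds), 4))  # 0.6667
```

On the full evaluation DataFrame, this could be called as `exact_match_rate(df_ft_to_wf[1]['translations'].tolist(), df_ft_to_wf[1]['predictions'].tolist())`.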
