Fine-tuning best T5 Transformer 🤖
-----------------------------------

In this notebook, we continue fine-tuning the T5 transformer on the newly extracted sentences from the book **Grammaire de Wolof Moderne**, without considering the definitions. After hyperparameter tuning with `wandb`, we obtained a best BLEU score of **4.281** for the French-to-Wolof translation model. Below, we provide the main evaluation figures obtained from the hyperparameter search step. Note that training is evaluated on the validation dataset.

- Parallel coordinates from panel:

- Parameter importance chart:
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

We can see in the above chart that the batch size is the most important parameter, with a negative correlation with the BLEU score (meaning that a lower batch size is better). Next, the probability of modifying a character in the French corpus is also important, and a high probability provides a better BLEU score.
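To make the notion of a "negative correlation" concrete, here is a minimal sketch that computes the Pearson correlation between a hyperparameter and a metric across runs. The arrays are toy values chosen for illustration; they are not results from the sweep, and `wandb` may compute its importance and correlation panels differently.

```python
import numpy as np

# Toy values for illustration only; these are NOT results from the sweep.
batch_sizes = np.array([16, 32, 64, 128], dtype=float)
bleu_scores = np.array([4.0, 3.5, 2.0, 1.0])

# Pearson correlation between the hyperparameter and the metric
correlation = np.corrcoef(batch_sizes, bleu_scores)[0, 1]

# A negative value means that larger batches went with lower BLEU in this toy data
print(f"correlation = {correlation:.3f}")
```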

In [1]:
# let us import all necessary libraries
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, T5TokenizerFast, set_seed, AdamW, get_linear_schedule_with_warmup, T5ForConditionalGeneration
from wolof_translate.utils.sent_transformers import TransformerSequences
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v2 import SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

os.environ["WANDB_DISABLED"] = "true"



## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = T5TokenizerFast(tokenizer_file=f"wolof-translate/wolof_translate/tokenizers/t5_tokenizers/tokenizer_v3.json")


In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):
  """Build the augmented training dataset and the validation dataset."""

  # Create the augmentation applied to the French sentences
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  # Load the training dataset (French sentences are augmented)
  train_dataset_aug = SentenceDataset(f"data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True,
                                        cp1_transformer = fr_augmentation)

  # Load the validation dataset (no augmentation)
  valid_dataset = SentenceDataset(f"data/extractions/new_data/valid_set.csv",
                                        tokenizer,
                                        truncation = True)

  # Return the datasets
  return train_dataset_aug, valid_dataset
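To give an intuition for what `fr_char_p` and `fr_word_p` control, the following is a rough pure-Python stand-in for a character-level augmenter. It is not the `nlpaug` implementation: the function `toy_char_augment` is hypothetical, and it replaces characters with random lowercase letters rather than with neighbouring keyboard keys. It only illustrates the idea that one probability selects the share of words to perturb and the other the share of characters inside each selected word.

```python
import random
import string


def toy_char_augment(sentence: str, char_p: float, word_p: float, seed: int = 0) -> str:
    """Hypothetical stand-in: perturb a share of characters in a share of words."""
    rng = random.Random(seed)
    words = sentence.split()
    if not words or word_p <= 0 or char_p <= 0:
        return sentence

    # Select the words to perturb (at least one)
    n_words = min(len(words), max(1, round(word_p * len(words))))
    chosen = set(rng.sample(range(len(words)), n_words))

    augmented_words = []
    for index, word in enumerate(words):
        if index in chosen:
            chars = list(word)
            # Select the characters to replace inside the word (at least one)
            n_chars = min(len(chars), max(1, round(char_p * len(chars))))
            for position in rng.sample(range(len(chars)), n_chars):
                chars[position] = rng.choice(string.ascii_lowercase)
            word = "".join(chars)
        augmented_words.append(word)

    return " ".join(augmented_words)


original = "Je connais l'enfant."
augmented = toy_char_augment(original, char_p=0.7, word_p=0.33)
print(augmented)
```

The sentence keeps its number of words and its length; only some characters change, which is the kind of noise the model has to become robust to during training.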

### Configure the model and the evaluation function ⚙️

Let us evaluate the predictions with the `bleu` metric.

In [4]:
%%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):

        preds, labels = eval_preds

        if isinstance(preds, tuple):
        
            preds = preds[0]
        
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)

        labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
        
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)

        result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
        
        result = {"bleu": result["score"]}

        prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
        
        result["gen_len"] = np.mean(prediction_lens)
        
        result = {k: round(v, 4) for k, v in result.items()}
        
        return result

Overwriting wolof-translate/wolof_translate/utils/evaluation.py
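Two steps of `compute_metrics` above can be checked in isolation with NumPy: replacing the `-100` label positions by the pad token id before decoding, and counting the non-pad tokens to obtain `gen_len`. The sketch below uses toy token ids and assumes a pad id of `0` purely for illustration.

```python
import numpy as np

pad_token_id = 0  # assumed pad id for this illustration

# Toy batch of labels: -100 marks the positions ignored by the loss
labels = np.array([[12, 7, 1, -100, -100],
                   [5, 1, -100, -100, -100]])

# Replace -100 by the pad id so that the labels can be decoded
decodable = np.where(labels != -100, labels, pad_token_id)

# Toy predictions, padded with the pad id
preds = np.array([[12, 7, 9, 1, 0],
                  [5, 1, 0, 0, 0]])

# Number of non-pad tokens per prediction, then the batch average
lengths = [int(np.count_nonzero(pred != pad_token_id)) for pred in preds]
gen_len = float(np.mean(lengths))

print(decodable.tolist(), lengths, gen_len)
```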


Let us initialize the evaluation object.

In [5]:
%run wolof-translate/wolof_translate/utils/evaluation.py
evaluation = TranslationEvaluation(tokenizer)


### Searching for the best parameters 🕖

Let us continue the training until reaching 1000 epochs.

### ---

In [6]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data


In [8]:
# let us initialize the hyperparameter configuration
config = {
    'random_state': 0,
    'fr_char_p': 0.7041942989344284,
    'fr_word_p': 0.33493399432318166,
    'learning_rate': 0.00013306522113618738,
    'weight_decay': 0.1438110093320219,
    'batch_size': 32,
    'warmup_steps': 738.9370685156708,
    'scale_factor': 5.04736665125969,
    'model_dir': 'data/checkpoints/fw_t5_small_custom_train_v3_checkpoints/',
    'new_model_dir': 'data/checkpoints/t5_small_custom_train_results_fw_v3/'
}

# Initialize the model name
model_name = 't5-small'

# import the model with its pre-trained weights
model = T5ForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, evaluation = evaluation)

# split the data
split_data(config['random_state'])

# load the train and validation sets (the validation set is stored in `test_dataset`)
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'])

# Initialize the scheduler parameters
scheduler_args = {'scale_factor': config['scale_factor'], 'lr_warmup_step': config['warmup_steps']}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    'betas': (0.9, 0.98),
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=TransformerScheduler,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face=True,
                logging_dir="data/logs/t5_small_custom_train_fw_v3"
                )

# We resume from checkpoints, so let us load the model
trainer.load(config['model_dir'], load_best=True)
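The implementation of `TransformerScheduler` is not shown in this notebook. If it follows the inverse-square-root warmup schedule commonly used with Transformers, the `warmup_steps` and `scale_factor` settings would act roughly as in the sketch below. This is an assumption, not the library's code: the function `inverse_sqrt_lr` is hypothetical, the default values are rounded from `config`, and `d_model = 512` is the hidden size of `t5-small`.

```python
def inverse_sqrt_lr(step: int,
                    d_model: int = 512,
                    warmup_steps: float = 739.0,
                    scale_factor: float = 5.05) -> float:
    """Assumed schedule: linear warmup, then decay proportional to step ** -0.5."""
    step = max(step, 1)  # avoid division by zero at step 0
    return scale_factor * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate grows during warmup and decays afterwards
for step in (1, 100, 739, 5000):
    print(step, inverse_sqrt_lr(step))
```

Under this assumption, a larger `warmup_steps` lowers and delays the peak learning rate, while `scale_factor` rescales the whole curve.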

        

In [None]:
trainer.train(auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1)

In [10]:
trainer.state.best_model_checkpoint

'data/checkpoints/t5_results_fw_v3\\checkpoint-1048'

We obtained a final BLEU score of **25.7015** for the best model.

In [9]:
# let us get the best model
model = AutoModelForSeq2SeqLM.from_pretrained('data/checkpoints/t5_results_fw_v3/...')

# let us get the test set
test_dataset = SentenceDataset(f"data/extractions/new_data/test_set.csv",
                                        tokenizer,
                                        truncation = True)

### Predictions

Let us generate texts and store into a DataFrame.

In [10]:

# set the model to eval mode
_ = model.eval()

# run model inference on all test data
original_translations, predicted_translations, original_texts, scores = [], [], [], {}

for data, attention_mask, labels in tqdm(DataLoader(test_dataset)):
    
    # decode the source sentence and its reference translation
    original_text = tokenizer.decode(data[0], skip_special_tokens=True)
    
    original_translation = tokenizer.decode(labels[0], skip_special_tokens=True)
    
    # get tokens
    generated = torch.tensor(data)
    
    attention_mask = torch.tensor(attention_mask)
    
    # get the pad token id
    pad_token_id = tokenizer.pad_token_id
    
    # perform prediction
    predictions = model.generate(generated, do_sample = False, top_k = 50, max_length = test_dataset.max_len, top_p = 0.90,
                                    temperature = 0, num_return_sequences = 0, attention_mask = attention_mask, pad_token_id = pad_token_id)
    
    # compute the metrics for this sentence and add them to the running totals
    result = evaluation.compute_metrics((predictions, torch.tensor(labels)))
    
    if not scores: scores.update({k: v for k, v in result.items()})
    
    else: scores.update({k: round(scores[k] + v, 4) for k, v in result.items()})
    
    # decode the predicted tokens into texts
    predicted_translation = list(test_dataset.decode(predictions))
    
    print(predicted_translation[0])
    
    # append results
    original_translations.append(original_translation)
    
    predicted_translations.extend(predicted_translation)
    
    original_texts.append(original_text)

# transform result into data frame
df_ft_to_wf = pd.DataFrame({'original_text': original_texts,
                            'original_label': original_translations,
                            'predicted_label': predicted_translations})

# print the result
df_ft_to_wf.head()

100%|██████████| 162/162 [02:35<00:00,  1.04it/s]

Mbaa jan?
Góor gi kenn bañ Moom
Ci biir ŋgeen jëm?
Dem naa ci keneen ki ñëw.
Dil nitu réew mi
Doo jëm?
Mbaa kenn demul?
Séen naa am xar.
Yaw mi ŋga
Ñii dañu demul woon
Ku Loolu?
Dóor na ka ba mi ŋgi.
Demal rekk
Waxtaan ak kenn kan?
Nit ki rekk a ñëwul.
Ci kii.
Moo di dem.
Waxtaan ŋga ag góor gi doon dem
Yéen mi ŋgi ci foofu
Soo demee, mu ñëw.
Bëgg naa góor gi ñëw
Gis ŋga xale yooyule?
Noona xale yi set nañu
Gor gii di Lawbe Ndar.
Yan ñoo ñëw?
Lépp jeex na.
Benn boobule laa la may.
Yaa ka gis moom Samba.
Xale yi bëgg nañu dikk, te mag ni ñaan nañu ŋgeen dem
Moontin nag, bëgg nañu dem
Waxuma yooyale xale?
Gis na keneen ki woon.
Noona sa waajur ñëw
Ku ñëw?
Gis naa ki woon.
Yaa ñëwkóon
Yéen demulwoon
Waw kookule.
Gis ŋga nit kee?
Ni ŋga def noonu.
Demal rekk!
Kenn ki dem na
Wax ji yépp, bañ-ŋga-ñëw la.
Ñun ñii lay set.
Dem nañu
Na dem su bëggul
Gis naa sama xarit yeneen yooyuu
Yooyale deey bëggu leen!
Su dee dem
Kii dafa demkoon
Soo demee, ci ñëw
Gis ŋga xale be?
Musaa
Doo dem?
Ma ŋgee doon dem
Ndax kan dem?
Góor gi moo demulwoon
Laobe ŋga woon.
Séen naa am guy.
Góor gi waxkoon na
Tann ŋga doomu benn jigéen.
Ma may ñan?
Ku mu?
Waxal ak ñooñule!
Gis ŋga nag yii yépp, woowuu moo ci gën.
Soo demee ag soo demul
Gis ŋga buu?
Doo nitu jamm
Kookule,?
Faatim la.
Waxal ag ndaw soo demul itam.
Ibraayima
Loo jëm?
Góor gi bëggul
Kookule la soo demee
Nit kookuu génn laa wax.
Góor gi gisul meneen.
Gisoon seen ban xarit?
Góor gi nee na la fi saŋx, ŋga dem ci biti.
Xale bi tawat la wax.
Li ŋga wax loolu.
Defe naa du ñëw
Nit kookuu ci sama wet.
Dem
Feneen fi bëttóon foofu.
Aminta ñëw?
Yaa ñëw na.
Dem nañu
Ci fi góor gi dem.
Loolule lépp.
Nit ag gaynde duñu dëkkóo.
Sa yay nee dana ñëw ci ŋgoon.
Xale yile yarunañu.
Yan kan ŋga dem?
Góor gi moo dulwoon
Daŋga gis kan?
Seetil nag yépp!
Xale bi mayul dara kii.
Nit la.
Dem na
Xale bi tawat la wax.
Gis naa booba xale?
Dafa di nitu tay.
Góor gi du t
Dem ŋga dem te mu dem ag sama xarit ya.
Ku mu?
Boobu néeg ban ŋga wax?
Góor gi dem?
Su dee Lawbe
Jan ŋga gis?
Baax na?
Demkoonuma
Gis na keneen ki.
Gis na keneen ki woon.
Góor gee ni soo demee nit la
Soo demee ag soo demul itam, dana ñëw.
Séen naa am xar.
Ñëwël ndax xale yi di ay liggéeykat, di ay jambaar
Ci kooku, ndax mu wettëliku
Lii lan?
Koo gis?
Du ŋgeen
Bi ŋga dee dem
Gis naa gaynde.
Nit kookuu doo ka.
Góor gi jëm
Foofu, góor gi dem ba mi ŋgi fi.
Gis naa xar yi gannaaw yaw.
Doo dem?
Ma ŋgoogule foofu.
Fee la.
Jile jigéen jan ŋgeen wax?
Xam naa xale bi.
Samba la?
Ñeñeen lañu.
Seet ŋga ñooñale ñan?
Jigéen jan ñoo réer?
Waxu la.
Nit, demkoon
Génnéel képp nit koo gis!
Bëgguma du yar.
Menn xar réerul.
Yéen dem ŋga
Jënd ñaa menn xar mi.
Demkoonuma
Kookule la soo demee
Ci Séeréer yi ag Pël yi
Wool góor gi dul dem
Ku dem?
Gis naa xar.
Nit ka, moom nit kooka la.
Dem na
Nit, gayndé, nag, àndoon nañu fi.
Yaa doonkoon wax
Góor gi bëggul
Nit ñenn ñi yegseeguñu.
Man xar mépp.
Yaw moomu laa wax
Xammee ŋga waa jooju?
Noona Góor gaa ŋgi, mu ñëw.
Su demee
Dafa doon nitu dëgg.





| | original_text | original_label | predicted_label |
|---|---|---|---|
| 0 | Que j'attrape quelles vaches? | Ma japp nag yee yan? | Mbaa jan? |
| 1 | Et que nul ne bouge! | Te bu fi kenn jogé! | Góor gi kenn bañ Moom |
| 2 | C'est à l'intérieur que tu dis? | Ci biir ŋga wax? | Ci biir ŋgeen jëm? |
| 3 | J'ai donné le livre à l'homme qui est venu. | Jox naa téere bi góor gi ñëw. | Dem naa ci keneen ki ñëw. |
| 4 | Sois homme de ce pays! | Dil nitu réew mi! | Dil nitu réew mi |
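The prediction loop sums the per-sentence metrics into `scores` but never divides by the number of sentences. A minimal sketch of how such per-batch results could be averaged is given below; the helper `average_metrics` and the toy values are hypothetical. Keep in mind that an average of sentence-level BLEU scores generally differs from a corpus-level BLEU computed over all sentences at once.

```python
from typing import Dict, Iterable


def average_metrics(per_batch: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average metric dictionaries that share the same keys."""
    totals: Dict[str, float] = {}
    count = 0
    for result in per_batch:
        for key, value in result.items():
            totals[key] = totals.get(key, 0.0) + value
        count += 1
    if count == 0:
        return {}
    return {key: round(total / count, 4) for key, total in totals.items()}


# Toy per-sentence results, for illustration only
toy_results = [{"bleu": 10.0, "gen_len": 4.0}, {"bleu": 30.0, "gen_len": 6.0}]
averaged = average_metrics(toy_results)
print(averaged)
```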


In [11]:
df_ft_to_wf.tail(10)

| | original_text | original_label | predicted_label |
|---|---|---|---|
| 152 | Homme, lion, boeuf... allaient de concert. | Nit, gayndé, nag... àndoon nañu fi. | Nit, gayndé, nag, àndoon nañu fi. |
| 153 | C'est toi qui eusses été élu | Yaa doonkoon falu | Yaa doonkoon wax |
| 154 | L'homme ne cultivera pas | Góor gi du bày | Góor gi bëggul |
| 155 | S'agiter simplement ne suffit à rien résoudre. | Di tel-teli doŋŋ taxul sotal dara. | Nit ñenn ñi yegseeguñu. |
| 156 | C'était son hôte habituellement. | Moo doon ganam. | Man xar mépp. |
| 157 | Je parle de ceux-là! | Yenn xar yooyuu laa wax! | Yaw moomu laa wax |
| 158 | Tu reconnais cet enfant-ci? | Xammee ŋga bee xale? | Xammee ŋga waa jooju? |
| 159 | Alors l'homme entra, les enfants le virent, il... | Noona góor gi dugg, xale yi gis ka, mu toog, ñ... | Noona Góor gaa ŋgi, mu ñëw. |
| 160 | C'est leur ami! | Suñu xarit la! | Su demee |
| 161 | Il était Lebou de Yoff. | Mu doon Lebu Yoff. | Dafa doon nitu dëgg. |


In [12]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf.sample(100)

| | original_text | original_label | predicted_label |
|---|---|---|---|
| 105 | Qui est-ce? | Ñan la? | Ku mu? |
| 80 | Tu as dit cela. | La ŋga wax la. | Li ŋga wax loolu. |
| 52 | A Moussa! | Musaa! | Musaa |
| 132 | Je connais l'enfant. | Xam naa xale bi. | Xam naa xale bi. |
| 59 | L'homme qui eût travaillé | Waa ji liggéeykoon | Góor gi waxkoon na |
| 54 | Le voilà qui part! | Mi ŋgiiy! | Ma ŋgee doon dem |
| 115 | Que tu partes ou que tu ne partes pas il viendra. | Dana ñëw soo demul ag soo demee itam. | Soo demee ag soo demul itam, dana ñëw. |
| 114 | C'est l'homme qui a soutenu qu'il est sain d'e... | Góor gee ni nit la, soo demee! | Góor gee ni soo demee nit la |
| 46 | J'ai vu mes amis! | Gis naa sana xarit yi! | Gis naa sama xarit yeneen yooyuu |
| 147 | Appelle l'homme qui ne part pas | Wool góor gi dul dem | Wool góor gi dul dem |


## Colab download and remove step

In [None]:
import shutil

# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')
shutil.rmtree('wandb')
# shutil.make_archive('wandb', 'zip', 'wandb')