First Training result with the GPT-2 decoder 🤖 (after bayes method)
-----------------------------------

In this notebook, we continue fine-tuning the pre-trained GPT-2 model provided by OpenAI. After hyperparameter tuning with `wandb`, we obtained a minimal evaluation cross-entropy loss of **0.71** for the French-to-Wolof translation model. Let us load the model with the best hyperparameter setting and continue the training.
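
For reference, a Bayesian search of this kind is usually described to `wandb` with a configuration dictionary. The sketch below is only an assumption about how such a search could be declared: the parameter names mirror the ones reported in this notebook, but every range and candidate value is a hypothetical placeholder, not the search space that was actually used.

```python
# Hypothetical sweep configuration: all ranges below are illustrative placeholders.
sweep_config = {
    "method": "bayes",  # Bayesian optimization over the search space
    "metric": {"name": "eval/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-6, "max": 1e-4},
        "weight_decay": {"values": [0.0, 0.1, 0.3]},
        "train_batch_size": {"values": [2, 4]},
        "fr_char_p": {"distribution": "uniform", "min": 0.0, "max": 1.0},
        "fr_word_p": {"distribution": "uniform", "min": 0.0, "max": 1.0},
    },
}

# With wandb installed and logged in, the sweep could then be registered and run with:
# sweep_id = wandb.sweep(sweep_config, project="<project-name>")
# wandb.agent(sweep_id, function=train_fn, count=20)  # train_fn is a hypothetical training function
```

Each run of the sweep would read its sampled values from `wandb.config` and report `eval/loss`, which the optimizer uses to choose the next candidate.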

Parallel coordinates from panel:

![bayes](parallel_coord_bayes_gpt2.png)


We also see from the following `Parameter importance` chart (from [panel](https://wandb.ai/oumar-kane-team/gpt2-wolof-french-translation_bayes1/reports/undefined-23-04-30-22-32-51---Vmlldzo0MjIzOTM1?accessToken=9wnl2kvqzq3tfg35pp9zl5y0etpg8xy2jr7b4hi5crxfk8on4vdxz9baxrr4hack)) that the evaluation loss depends mostly on the probability of modifying words in a French sentence (`fr_word_p`):

![parameter_importance](Parameter_importance_bayes_gpt2.png)

The evaluation loss is also negatively correlated with the learning rate and positively correlated with `fr_word_p` (the probability of modifying words in a French sentence).

In [2]:
# let us extend the paths of the system
import sys

path = "/content/drive/MyDrive/Memoire/subject2/"

sys.path.extend([f"{path}new_data", f"{path}wolof-translate"])

In [3]:
# define environment
%env WANDB_LOG_MODEL=true
%env WANDB_NOTEBOOK_NAME=training_gpt2_2.ipynb
%env WANDB_API_KEY=237a8450cd2568ea1c8e1f8e0400708e79b6b4ee

env: WANDB_LOG_MODEL=true
env: WANDB_NOTEBOOK_NAME=training_gpt2_2.ipynb
env: WANDB_API_KEY=237a8450cd2568ea1c8e1f8e0400708e79b6b4ee


In [4]:
!pip install -qq wandb --upgrade



In [5]:
!pip install evaluate -qq
!pip install sacrebleu -qq
!pip install optuna -qq
!pip install transformers -qq 
!pip install tokenizers -qq
!pip install nlpaug -qq
!pip install ray[tune] -qq
!python -m spacy download fr_core_news_lg 


In [6]:
# let us import all necessary libraries
from transformers import GPT2LMHeadModel, TrainingArguments, Trainer, EarlyStoppingCallback
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.data.dataset_v1 import SentenceDataset
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from functools import partial
from tqdm import tqdm
import pandas as pd
import numpy as np
import evaluate
import torch
import wandb

wandb.login(key="237a8450cd2568ea1c8e1f8e0400708e79b6b4ee")


[34m[1mwandb[0m: Currently logged in as: [33moumar-kane[0m ([33moumar-kane-team[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

We will create two models:

- One translating the French corpus into Wolof: [french_to_wolof](#french-to-wolof)
- One translating the Wolof corpus into French: [wolof_to_french](#wolof-to-french)

--------------

## French to wolof

### Configure dataset 🔠

We can use the same custom dataset that we created in [text_augmentation](text_augmentation.ipynb), but we first need to split the data into train and test sets and save them.

In [7]:
def split_data(random_state: int = 50):

  # load the corpora and split into train and test sets
  corpora = pd.read_csv(f"{path}new_data/sent_extraction.csv")

  train_set, test_set = train_test_split(corpora, test_size=0.1, random_state=random_state)

  # let us save the sets
  train_set.to_csv(f"{path}new_data/train_set.csv", index=False)

  test_set.to_csv(f"{path}new_data/test_set.csv", index=False)
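
To make the effect of `test_size` and `random_state` concrete, here is a minimal pure-Python sketch of a seeded 90/10 split. It is a simplified stand-in for `train_test_split`, not its implementation: the helper `split_rows` and the toy rows are hypothetical, and the shuffling order will differ from scikit-learn's.

```python
import math
import random

def split_rows(rows, test_size=0.1, random_state=0):
    """Shuffle a copy of `rows` with a fixed seed and cut off the test portion."""
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # number of test rows, rounded up
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

# hypothetical sentence pairs standing in for the corpus rows
pairs = [(f"fr {i}", f"wo {i}") for i in range(20)]

train_rows, test_rows = split_rows(pairs, test_size=0.1, random_state=0)
print(len(train_rows), len(test_rows))
```

Because the seed is fixed, calling the helper again with the same `random_state` returns the same split, which is what keeps the saved train and test files reproducible.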

Let us load the training set with augmentation and the test set without it.

In [8]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float):

  # with augmentation
  fr_augmentation = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p),
                                        remove_mark_space, delete_guillemet_space)

  train_dataset_aug = SentenceDataset(f"{path}new_data/train_set.csv", 
                                  tokenizer_path = f"{path}wolof-translate/wolof_translate/tokenizers/tokenizer_v1.json",
                                  cp1_transformer=fr_augmentation, truncation=True,
                                  max_len=579)

  test_dataset = SentenceDataset(f"{path}new_data/test_set.csv",
                                tokenizer_path = f"{path}wolof-translate/wolof_translate/tokenizers/tokenizer_v1.json",
                                truncation=True, max_len=579)
  
  return train_dataset_aug, test_dataset
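
The roles of the two probabilities can be illustrated with a small pure-Python sketch. It assumes that `fr_word_p` selects which words are altered and `fr_char_p` selects which characters inside a selected word are replaced. The function `noisy_sentence` is hypothetical and replaces characters with random letters, whereas `KeyboardAug` simulates keyboard typos, so this is only an approximation of the idea.

```python
import random

def noisy_sentence(sentence, aug_char_p, aug_word_p, seed=0):
    """Replace characters in randomly selected words (simplified stand-in for typo noise)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    noisy_words = []
    for word in sentence.split():
        # decide whether this word is modified at all
        if rng.random() < aug_word_p:
            # inside a selected word, each letter is replaced with probability aug_char_p
            word = "".join(
                rng.choice(alphabet) if c.isalpha() and rng.random() < aug_char_p else c
                for c in word
            )
        noisy_words.append(word)
    return " ".join(noisy_words)

example = noisy_sentence("je vais au marché", aug_char_p=0.6, aug_word_p=0.25)
print(example)
```

With `aug_word_p=0` the sentence is returned unchanged, and raising either probability increases the amount of noise seen during training.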

### Configure the model and the evaluation function ⚙️

Let us load the model and resize its token embeddings.

In [9]:
def gpt2_model_init(tokenizer):
  # set the model name
  model_name = "gpt2"

  # configure the model
  model = GPT2LMHeadModel.from_pretrained(model_name).cuda()

  # resize the token embeddings
  model.resize_token_embeddings(len(tokenizer))

  return model

Let us evaluate the predictions with the `bleu` metric.

In [10]:
# %%writefile wolof-translate/wolof_translate/utils/evaluation.py
from tokenizers import Tokenizer
from typing import *
import numpy as np
import evaluate
import wandb

class TranslationEvaluation:
    
    def __init__(self, 
                 tokenizer: Tokenizer,
                 decoder: Union[Callable, None] = None,
                 metric = evaluate.load('sacrebleu'),
                 ):
        
        self.tokenizer = tokenizer
        
        self.decoder = decoder
        
        self.metric = metric
    
    def postprocess_text(self, preds, labels):
        
        preds = [pred.strip() for pred in preds]
        
        labels = [[label.strip()] for label in labels]
        
        return preds, labels

    def compute_metrics(self, eval_preds):
        
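        # unpack the predictions and the labels
        preds, labels = eval_preds
        
        if isinstance(preds, tuple):
            
            preds = preds[0]
        
        # tensors (e.g. from `model.generate`) are moved to the CPU
        if hasattr(preds, "detach"): preds = preds.detach().cpu()
        
        if hasattr(labels, "detach"): labels = labels.detach().cpu()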
        
        if self.decoder is None:
            
            decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
            
            decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
            
            decoded_preds, decoded_labels = self.postprocess_text(decoded_preds, decoded_labels)
            
            result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
            
            result = {"bleu": result["score"]}
            
            prediction_lens = [np.count_nonzero(pred != self.tokenizer.pad_token_id) for pred in preds]
            
            result["gen_len"] = np.mean(prediction_lens)
        
        else:
            
            predictions = list(self.decoder(preds))
            
            labels = list(self.decoder(labels))
      
            decoded_preds, decoded_labels = self.postprocess_text(predictions, labels)
            
            # score the post-processed texts (each reference is wrapped in a list)
            result = self.metric.compute(predictions=decoded_preds, references=decoded_labels)
            
            result = {"bleu": result["score"]}
        
        result = {k:round(v, 4) for k, v in result.items()}

        # wandb.log expects a dictionary of metrics
        wandb.log({"bleu": result["bleu"]})
            
        return result
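
To give an intuition for what the metric rewards, the sketch below computes a clipped unigram precision, one of the ingredients of BLEU. It is not the `sacrebleu` implementation (which combines several n-gram orders and a brevity penalty), and the function name and toy sentences are hypothetical.

```python
from collections import Counter

def clipped_unigram_precision(prediction, reference):
    """Fraction of predicted tokens found in the reference, each counted at most as often as it appears there."""
    pred_tokens = prediction.split()
    if not pred_tokens:
        return 0.0
    pred_counts = Counter(pred_tokens)
    ref_counts = Counter(reference.split())
    # a repeated token is only credited up to its count in the reference
    clipped = sum(min(count, ref_counts[token]) for token, count in pred_counts.items())
    return clipped / len(pred_tokens)

# a prediction that repeats one word gets little credit
print(clipped_unigram_precision("the the the the", "the cat sat"))  # 0.25
print(clipped_unigram_precision("the cat sat", "the cat sat"))      # 1.0
```

The clipping is why a translation that loops on the same token scores poorly even when that token does occur in the reference. Note also that `sacrebleu` expects one list of references per prediction, which is why `postprocess_text` wraps each label in a list.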


In [11]:
# %run wolof-translate/wolof_translate/utils/evaluation.py

### Searching for the best parameters 🕖

Let us define the data collator.

In [13]:
def data_collator(batch):
    """Generate a batch of data to provide to trainer

    Args:
        batch (_type_): The batch

    Returns:
        dict: A dictionary containing the ids, the attention mask and the labels
    """
    input_ids = torch.stack([b[0] for b in batch])
    
    attention_mask = torch.stack([b[1] for b in batch])
    
    labels = torch.stack([b[0] for b in batch])
    
    return {'input_ids': input_ids, 'attention_mask': attention_mask,
            'labels': labels}
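
The collator passes the same token ids as `input_ids` and as `labels`. This works for a causal language model because `GPT2LMHeadModel` shifts the labels internally, so that each position is trained to predict the following token. The sketch below illustrates that shift on a plain list; the helper name and the token ids are hypothetical.

```python
def next_token_pairs(token_ids):
    """Pair each token with the token that follows it, as in next-token prediction."""
    return list(zip(token_ids[:-1], token_ids[1:]))

ids = [101, 7, 8, 9, 102]  # hypothetical token ids
pairs = next_token_pairs(ids)
print(pairs)  # [(101, 7), (7, 8), (8, 9), (9, 102)]
```

If padding positions should be excluded from the loss, a common convention is to set the corresponding label values to `-100`, the index ignored by the loss. Whether that is desirable for this dataset is left open here.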

Let us initialize the training arguments with the best hyperparameters found by the search and continue the training.

In [None]:
# %%wandb

"""Grid search best parameters
learning_rate = 0.000008605037398250715
weight_decay = 0.3
train_batch_size = 2
random_state = 0
fr_char_p = 0.6167489331342644
fr_word_p = 0.24656203270287985
eval/loss = 0.71915203332901
"""

# seed
torch.manual_seed(50)

# Initialize the splits
split_data(0)

# Initialize wandb
wandb.init(project = "gpt2-wolof-french-translation_bayes2")

# let us recuperate the datasets
train_dataset, test_dataset = recuperate_datasets(0.6167489331342644, 0.24656203270287985)

# set training arguments
training_args = TrainingArguments(f"{path}training2/results3",
                                  report_to = "wandb",
                                  num_train_epochs=20,
                                  # logging_steps=100,
                                  load_best_model_at_end=True,
                                  save_strategy="epoch",
                                  evaluation_strategy="epoch",
                                  logging_strategy = 'epoch',
                                  per_device_train_batch_size=2, 
                                  per_device_eval_batch_size=5,
                                  learning_rate = 0.000008605037398250715,
                                  weight_decay=0.3,
                                  remove_unused_columns = False,
                                  fp16 = True,
                                  metric_for_best_model="eval_loss",
                                  greater_is_better=False,
                                  )   

# define training loop
trainer = Trainer(model_init=partial(gpt2_model_init, tokenizer = train_dataset.tokenizer),
                  args=training_args,
                  train_dataset=train_dataset, 
                  eval_dataset=test_dataset,
                  data_collator=data_collator,
                  # compute_metrics=translation_eval.compute_metrics
                  )

# load last checkpoint
# trainer._load_from_checkpoint("data/training2/results/checkpoint-147")

# start training loop
trainer.train('/content/drive/MyDrive/Memoire/subject2/training3/checkpoint')

# finish wandb
wandb.finish()


We see that the model is over-fitted. We must tune the model further and augment the data to add some noise during training.
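
One common safeguard against over-fitting is to stop once the evaluation loss has not improved for a few epochs. `EarlyStoppingCallback` is imported at the top of this notebook but is not passed to the `Trainer`. The sketch below shows the patience rule in plain Python as an illustration of the idea, not the callback's implementation. The function name and the loss values are hypothetical.

```python
def best_epoch_with_patience(eval_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a simple patience rule on the evaluation loss."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss:
            # improvement: remember this epoch and reset the counter
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(eval_losses) - 1

losses = [1.20, 0.95, 0.80, 0.78, 0.81, 0.85, 0.90]  # hypothetical evaluation losses
print(best_epoch_with_patience(losses, patience=2))  # (3, 5)
```

With the `Trainer`, a comparable behaviour could be obtained by adding `callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]`, which relies on `load_best_model_at_end=True` and `metric_for_best_model` being set, as they already are above. The patience value of 2 is an arbitrary choice for illustration.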

## Predictions

Let us load the best model.

In [16]:
# load from a checkpoint the best model
trainer._load_from_checkpoint('/content/drive/MyDrive/Memoire/subject2/training3/results1/checkpoint-1835')

model = trainer.model

# get the tokenizer
tokenizer = test_dataset.tokenizer

# let us initialize the evaluation class
translation_eval = TranslationEvaluation(tokenizer)

Let us generate texts and store them in a DataFrame.

In [17]:

# set the model to eval mode
_ = model.eval()

# run model inference on all test data
original_traduction, predicted_traduction, original_text, scores = [], [], [], {}

for data in tqdm(DataLoader(test_dataset)):
    
    # retrieve the two parts of the sentence
    sents = list(test_dataset.decode(data[0]))
    
    cp1_sent, cp2_sent = sents[0][0], sents[0][1] 
    
    # create the sentence to translate
    sent1 = f'{test_dataset.cls_token}{cp1_sent}{test_dataset.sep_token}'
    
    # generate tokens
    encoding = tokenizer(sent1, return_tensors='pt')
    
    generated = encoding.input_ids.cuda()
    
    attention_mask = encoding.attention_mask.cuda()
    
    # retrieve the pad token id
    pad_token_id = tokenizer.pad_token_id
    
    # perform prediction with greedy decoding: the sampling-only arguments (top_k, top_p, temperature)
    # are not used when do_sample is False, and at least one sequence must be returned
    sample_outputs = model.generate(generated, do_sample = False, max_length = test_dataset.max_len,
                                    num_return_sequences = 1, attention_mask = attention_mask, pad_token_id = pad_token_id)
    
    # calculate the score and add it to the score
    # result = translation_eval.compute_metrics((sample_outputs, generated))
    
    # if not scores: scores.update({k: v for k, v in result.items()})
    
    # else: scores.update({k: round((scores[k] + v) / 2, 4) for k, v in result.items()})
    
    # decode the predicted tokens into texts
    sent2 = list(test_dataset.decode(sample_outputs, True))[0]
    
    print(sent2)
    # append results
    original_traduction.append(cp2_sent)
    predicted_traduction.append(sent2)
    original_text.append(cp1_sent)

# transform result into data frame
df_ft_to_wf = pd.DataFrame({'original_text': original_text,
                            'original_label': original_traduction,
                            'predicted_label': predicted_traduction})

# print the result
df_ft_to_wf.head()

  1%|          | 1/82 [00:09<12:27,  9.22s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma daan ma doon.


  2%|▏         | 2/82 [00:15<10:08,  7.60s/it]

Mu ma nee, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


  4%|▎         | 3/82 [00:24<10:48,  8.21s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daanJt, ma daanJt, la, la, la, la, la, la, la, la, la, la, la, ma daanJt, la, la, la, la, la, la, la, ma daanJt, la, la, ma mère, ma daanJt, la, ma mère, ma mère, ma mère, ma mère, ma mère, ma mère, ma mère, ma mère, ma doon.


  5%|▍         | 4/82 [00:30<09:29,  7.30s/it]

Mu ngi fàttaliku, di, di, di, di, di, di, di, di, di, di, di, di, di.


  6%|▌         | 5/82 [00:38<09:46,  7.62s/it]

Mu ngi fàttaliku, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


  7%|▋         | 6/82 [00:44<09:00,  7.11s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon, di ma doon, di ma doon, di ma doon.


  9%|▊         | 7/82 [00:52<08:54,  7.13s/it]

Mu ngi fàttaliku, di ma daan def, di ma daan def, di ma daan def, di ma daan def.


 10%|▉         | 8/82 [01:00<09:17,  7.54s/it]

Mu doon fa, di ma nee, di, di ma daan def, di, di, di, di, di, di, di.


 11%|█         | 9/82 [01:06<08:33,  7.03s/it]

Mu ngi fàttaliku, di, di ma daan def, di, di, di, di, di, di, di, di, di.


 12%|█▏        | 10/82 [01:14<08:47,  7.33s/it]

Li ma nee, di ma nee, di, di, di, di, di, di, di, di, di, di, di, di, di.


 13%|█▎        | 11/82 [01:21<08:37,  7.29s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma doon.


 15%|█▍        | 12/82 [01:32<09:38,  8.26s/it]

Li ma nee, di, di, di ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 16%|█▌        | 13/82 [01:39<09:14,  8.03s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma doon.


 17%|█▋        | 14/82 [01:45<08:26,  7.45s/it]

Mu ngi fàttaliku, di ma daan def, di, di ma daan def, di, di, di, di, di, di, di, di, di, di.


 18%|█▊        | 15/82 [01:54<08:40,  7.77s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 20%|█▉        | 16/82 [02:00<07:58,  7.24s/it]

Mu ngi fàttaliku, di ma nee, di daan def, di, di, di, di, di, di, di, di.


 21%|██        | 17/82 [02:07<07:58,  7.37s/it]

Mu, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, ma daan ma daan ma, ma daanJt, ma daan9e, la, ma daanJt, la, la, ma daanJt, la, la, la, ma daanJt, la, la, ma daan9e, ma daan9e, ma nee, ma daan9e, ma doon.


 22%|██▏       | 18/82 [02:15<07:52,  7.39s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma ma daan ma doon.


 23%|██▎       | 19/82 [02:21<07:30,  7.15s/it]

Mu, di ma daan ma daan ma daan ma daan ma doon, di ma doon.


 24%|██▍       | 20/82 [02:30<07:53,  7.63s/it]

Mu ma daan ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma doon.


 26%|██▌       | 21/82 [02:35<07:04,  6.95s/it]

Li ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon, di, di ma doon, di ma doon, di ma doon, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


 27%|██▋       | 22/82 [02:43<07:04,  7.07s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma doon.


 28%|██▊       | 23/82 [02:50<07:01,  7.14s/it]

Mu doon fa, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di.


 29%|██▉       | 24/82 [02:56<06:40,  6.90s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma doon.


 30%|███       | 25/82 [03:05<07:01,  7.40s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 32%|███▏      | 26/82 [03:11<06:38,  7.11s/it]

Mu, di ma daan ma daan ma daan ma daan ma doon, di ma doon.


 33%|███▎      | 27/82 [03:19<06:41,  7.29s/it]

Mu ngi fàttaliku, di ma daan def, di ma daan def, di, di, di, di, di, di, di, di, di, di.


 34%|███▍      | 28/82 [03:26<06:32,  7.27s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 35%|███▌      | 29/82 [03:33<06:10,  7.00s/it]

Mu ngi fàttaliku, di ma nee, di ma daan def, di, di, di, di, di, di, di, di, di.


 37%|███▋      | 30/82 [03:41<06:29,  7.49s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daanJt, ma daanJt, la, la, la, la, la, la, la, la, la, la, la, la, la, ma daanJt, la, la, la, la, la, la, ma daanJt, la, la, la, ma daanJt, la ma daanJt, ma daanJt, ma daanJt, ma daanJt, ma daanJt, ma doon.


 38%|███▊      | 31/82 [03:47<06:00,  7.06s/it]

Mu, di ma nee, di ma daan ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma, di ma daan ma daan ma, di ma doon.


 39%|███▉      | 32/82 [03:56<06:12,  7.46s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma, di ma daan ma, di ma daan ma, di ma doon.


 40%|████      | 33/82 [04:03<05:55,  7.26s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, ma daan ma daan ma daan ma daan ma, ma daanJt, la, la, la, la, la, la, la, ma bëgg, ma daanJt, la, la, la, la, la, la, la, ma bëgg, ma daan9e, ma bëgg, ma doon.


 41%|████▏     | 34/82 [04:10<05:44,  7.18s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma doon.


 43%|████▎     | 35/82 [04:20<06:28,  8.27s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma doon.


 44%|████▍     | 36/82 [04:27<05:58,  7.80s/it]

Mu doon fa, di ma nee, di, di, di, di, di, di, di, di, di, di.


 45%|████▌     | 37/82 [04:36<06:04,  8.11s/it]

Li ma nee, di ma mu ma mu ma mu ma mu ma mu ma doon.


 46%|████▋     | 38/82 [04:42<05:29,  7.49s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 48%|████▊     | 39/82 [04:49<05:20,  7.46s/it]

Mu, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma daan ma, di ma doon.


 49%|████▉     | 40/82 [04:57<05:16,  7.54s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma doon.


 50%|█████     | 41/82 [05:04<04:54,  7.19s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma doon.


 51%|█████     | 42/82 [05:12<05:04,  7.60s/it]

Mu ngi fàttaliku, di ma nee, di daan def, di, di daan def, di, di, di daan def.


 52%|█████▏    | 43/82 [05:18<04:38,  7.14s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma doon.


 54%|█████▎    | 44/82 [05:24<04:21,  6.89s/it]

Li ma nee, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di ma doon.


 55%|█████▍    | 45/82 [05:32<04:25,  7.17s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma doon.


 56%|█████▌    | 46/82 [05:39<04:09,  6.94s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma doon.


 57%|█████▋    | 47/82 [05:47<04:21,  7.48s/it]

Mu ma nee, di ma daan def, di ma daan def, di, di ma daan def.


 59%|█████▊    | 48/82 [05:53<03:59,  7.05s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma doon.


 60%|█████▉    | 49/82 [06:01<04:01,  7.31s/it]

Mu ma nee, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma doon.


 61%|██████    | 50/82 [06:09<03:52,  7.27s/it]

Mu, di ma nee, di ma nee, di ma daan def, di, di, di, di, di, di, di.


 62%|██████▏   | 51/82 [06:15<03:39,  7.09s/it]

Mu ma daan ma daan ma daan ma daan ma daan ma doon, di ma doon, di ma doon, di ma doon.


 63%|██████▎   | 52/82 [06:24<03:44,  7.49s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma, di ma doon.


 65%|██████▍   | 53/82 [06:30<03:24,  7.05s/it]

Mu ngi fàttaliku, di ma daan def, di ma daan def, di ma daan def.


 66%|██████▌   | 54/82 [06:37<03:21,  7.21s/it]

Li ma nee, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


 67%|██████▋   | 55/82 [06:44<03:14,  7.19s/it]

Mu, di ma daan ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma daan ma daan ma, di ma, di ma doon.


 68%|██████▊   | 56/82 [06:50<02:57,  6.84s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 70%|██████▉   | 57/82 [06:58<02:59,  7.18s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma doon, di ma doon, di ma doon.


 71%|███████   | 58/82 [07:06<02:53,  7.22s/it]

Mu doon fa, di ma nee, di ma daan def, di, di, di, di, di, di, di, di, di.


 72%|███████▏  | 59/82 [07:16<03:06,  8.12s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma doon.


 73%|███████▎  | 60/82 [07:23<02:49,  7.69s/it]

Mu, di ma daan ma doon, di ma doon, di ma doon.


 74%|███████▍  | 61/82 [07:29<02:35,  7.39s/it]

Mu ngi fàttaliku, di ma daan def, di, di, di, di, di, di, di, di, di, di, di.


 76%|███████▌  | 62/82 [07:37<02:30,  7.51s/it]

Mu ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma doon.


 77%|███████▋  | 63/82 [07:43<02:11,  6.93s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma doon, di ma doon, di ma doon, di ma doon, di, di ma doon.


 78%|███████▊  | 64/82 [07:51<02:10,  7.26s/it]

Li ma nee, di ma daan ma daan ma nee, di ma daan ma daan ma daan ma doon.


 79%|███████▉  | 65/82 [07:57<01:59,  7.06s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma, di ma doon.


 80%|████████  | 66/82 [08:04<01:50,  6.89s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma doon.


 82%|████████▏ | 67/82 [08:12<01:48,  7.25s/it]

Li ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma doon.


 83%|████████▎ | 68/82 [08:18<01:37,  6.99s/it]

Mu, di ma doon, di ma doon, di ma doon, di ma doon.


 84%|████████▍ | 69/82 [08:27<01:37,  7.51s/it]

Mu doon fa, di, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma, di ma daan ma, di ma daan ma, di ma doon.


 85%|████████▌ | 70/82 [08:33<01:25,  7.16s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma doon.


 87%|████████▋ | 71/82 [08:39<01:14,  6.76s/it]

Li ma doon tey ci seen bopp, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


 88%|████████▊ | 72/82 [08:47<01:12,  7.21s/it]

Mu doon, di ma nee, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di, di.


 89%|████████▉ | 73/82 [08:53<01:01,  6.86s/it]

Li ma nee, di ma daan ma nee, di ma nee ci seen bopp, di, di, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma doon.


 90%|█████████ | 74/82 [09:02<00:59,  7.40s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma daan ma, di ma daan ma, di ma doon.


 91%|█████████▏| 75/82 [09:08<00:48,  6.88s/it]

Li ma nee ngi fàttaliku, di ma daan ma daan ma daan ma daan ma doon, di ma doon tey ci biir, di ma doon tey ci biir, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di ma doon.


 93%|█████████▎| 76/82 [09:14<00:40,  6.74s/it]

Mu ngi fàttaliku, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di.


 94%|█████████▍| 77/82 [09:23<00:36,  7.27s/it]

Mu, di ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma doon.


 95%|█████████▌| 78/82 [09:29<00:27,  6.92s/it]

Mu ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma, di ma doon.


 96%|█████████▋| 79/82 [09:37<00:21,  7.19s/it]

Mu ngi fàttaliku, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di, di, di, di.


 98%|█████████▊| 80/82 [09:44<00:14,  7.14s/it]

Mu, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma daan ma, di ma daan ma, di ma, di ma daan ma, di ma daan ma, di ma doon.


 99%|█████████▉| 81/82 [09:51<00:07,  7.06s/it]

Mu ma nee, di ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma daan ma, di ma daan ma daan ma daan ma, di ma daan ma daan ma, di ma, di ma daan ma, di ma doon.


100%|██████████| 82/82 [10:01<00:00,  7.33s/it]

Mu ngi fàttaliku, di ma daan def, di, di, di, di, di, di, di, di, di, di, di, di, di.





| index | original_text | original_label | predicted_label |
|---|---|---|---|
| 0 | Son travail de médecin devient pour lui une ob... | Mu daldi sóobu nag ci liggéey bi, ngir fàtte. | Mu ma nee, di ma daan ma daan ma daan ma daan ... |
| 1 | Mon père et ma mère ont juste le temps de rass... | Baay ak sama yaay daldi gaawtu roñ seeni yéree... | Mu ma nee, di, di, di, di, di, di, di, di, di,... |
| 2 | Sans doute un jardin. | Tool waru faa ñàkk, moom. | Mu, di ma nee, di ma daan ma daan ma daan ma d... |
| 3 | À l'époque où il parcourt la province du Nord-... | Su weesoo dénd wi làrme réewum Almaañ defar ci... | Mu ngi fàttaliku, di, di, di, di, di, di, di, ... |
| 4 | Elle brillait dans ces noms qui entraient en m... | May yëgaat ni muy nes-nesilee ay tur yu may mi... | Mu ngi fàttaliku, di ma daan def, di, di, di, ... |
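
The predictions above collapse into repeated fragments such as `di ma daan ma`. One way to quantify that repetition is the share of distinct word n-grams in a prediction, sketched below under the assumption that labels are available as plain strings. The function name `distinct_ngram_ratio` and the choice of `n=2` are illustrative and do not come from the notebook.

```python
def distinct_ngram_ratio(text: str, n: int = 2) -> float:
    """Return the share of distinct word n-grams in `text` (1.0 means no repeated n-gram)."""
    # Treat commas and full stops as separators so "def," and "def" count as the same token.
    tokens = text.replace(",", " ").replace(".", " ").split()
    if len(tokens) < n:
        return 1.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


# A reference label and a prediction copied from the outputs above.
reference = "Mu daldi sóobu nag ci liggéey bi, ngir fàtte."
prediction = "Mu ngi fàttaliku, di ma daan def, di ma daan def, di ma daan def."

reference_ratio = distinct_ngram_ratio(reference)
prediction_ratio = distinct_ngram_ratio(prediction)
print(reference_ratio, prediction_ratio)
```

A ratio well below 1.0 for most predictions would be consistent with the looping seen in the generated text, although this is only a rough diagnostic and not a translation quality metric.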


## Colab download and remove step

In [None]:
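import shutil

# Archive the local wandb run directory first, then delete it.
# shutil.make_archive('wandb', 'zip', 'wandb')
# shutil.rmtree('wandb')

# Delete the saved training results from Drive.
# shutil.rmtree('/content/drive/MyDrive/Memoire/subject2/training2/results2')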
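
The commented-out calls above can be combined into one helper that archives a directory before deleting it, so that the run logs are not lost. This is a minimal sketch using only `shutil`, `tempfile`, and `pathlib`; the helper name `archive_then_remove` is hypothetical, and the demo runs on a temporary directory instead of the notebook's real `wandb` folder.

```python
import shutil
import tempfile
from pathlib import Path


def archive_then_remove(directory: str, archive_base: str) -> str:
    """Zip `directory` into `<archive_base>.zip`, then delete the directory."""
    source = Path(directory)
    if not source.is_dir():
        raise FileNotFoundError(f"{directory} is not a directory")
    # make_archive appends the ".zip" extension and returns the archive path.
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=str(source))
    shutil.rmtree(source)
    return archive_path


# Demo on a throwaway directory (assumption: stands in for the real run folder).
workdir = Path(tempfile.mkdtemp())
run_dir = workdir / "wandb"
run_dir.mkdir()
(run_dir / "run.log").write_text("example log line\n")

zip_path = archive_then_remove(str(run_dir), str(workdir / "wandb"))
print(zip_path)
```

In Colab, the resulting archive could then be fetched with `files.download(zip_path)` from `google.colab`, which is not called here so that the sketch also runs outside Colab.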