Fine-tuning best BART 🤖
-----------------------------------

In this notebook, we continue fine-tuning the BART model on the sentences newly extracted from the book **Grammaire de Wolof Moderne**, together with additional sentences. Below, we provide the main evaluation figures obtained from the hyperparameter search step. Training is evaluated on the validation dataset.

- Parallel coordinates from panel:

- Parameter importance chart: 
[t5_v3_importance](https://wandb.ai/oumar-kane-team/small-t5-cross-fw-translation-bayes-hpsearch-v3/reports/undefined-23-05-16-10-36-17---Vmlldzo0Mzc4NDY0?accessToken=eyaiyrid0qz1zg2jkq3fc65biw53084dpfitbi0dgonq6mweupw6kgjml9d2nv1w)

We can see in the above chart that the batch size is the most important parameter, with a negative correlation with the BLEU score (meaning that a lower batch size is better). Next, the probability of modifying a character in the French corpus is also important, and a high probability provides a better BLEU score.

In [1]:
# let us import all necessary libraries
from transformers import BartModel, BartForConditionalGeneration, Seq2SeqTrainer, BartTokenizerFast, set_seed, AdamW, get_linear_schedule_with_warmup,\
                          get_cosine_schedule_with_warmup, Adafactor
from wolof_translate.utils.sent_transformers import TransformerSequences
from wolof_translate.utils.improvements.end_marks import add_end_mark # added
from torch.nn import TransformerEncoderLayer, TransformerDecoderLayer
from torch.utils.data import Dataset, DataLoader, random_split
from wolof_translate.data.dataset_v4 import SentenceDataset # v2 -> v3 -> v4
from wolof_translate.utils.sent_corrections import *
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import _LRScheduler
# from custom_rnn.utils.kwargs import Kwargs
from torch.nn.utils.rnn import pad_sequence
from plotly.subplots import make_subplots
from nlpaug.augmenter import char as nac
from torch.utils.data import DataLoader
# from datasets  import load_metric # make pip install evaluate instead
# and pip install sacrebleu for instance
from torch.nn import functional as F
import plotly.graph_objects as go
from tokenizers import Tokenizer
import matplotlib.pyplot as plt
import pytorch_lightning as lt
from tqdm import tqdm, trange
from functools import partial
from torch.nn import utils
from copy import deepcopy
from torch import optim
from typing import *
from torch import nn
import pandas as pd
import numpy as np
import itertools
import evaluate
import random
import string
import shutil
import wandb
import torch
import json
import copy
import os

# add seed for everything
lt.seed_everything(0)

os.environ["WANDB_DISABLED"] = "true"

Global seed set to 0


## French to Wolof

### Configure dataset 🔠

In [2]:
# load the tokenizer from a JSON file
tokenizer = BartTokenizerFast(tokenizer_file=f"wolof-translate/wolof_translate/tokenizers/bart_tokenizers/tokenizer_v3_2.json")

In [3]:
def recuperate_datasets(fr_char_p: float, fr_word_p: float, max_len: int, end_mark_opt: int):

  # Let us recuperate the end_mark adding option
  if end_mark_opt == 1:
    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space)

    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space)
    
  else:
    
    if end_mark_opt == 2:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!', replace = True)
    
    elif end_mark_opt == 3:

      end_mark_fn = partial(add_end_mark)
    
    elif end_mark_opt == 4:

      end_mark_fn = partial(add_end_mark, end_mark_to_remove = '!')

    # Create augmentation to add on French sentences
    fr_augmentation_1 = TransformerSequences(nac.KeyboardAug(aug_char_p=fr_char_p, aug_word_p=fr_word_p,
                                                             aug_word_max = max_len),
                                          remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
    fr_augmentation_2 = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)
    
  # Recuperate the train dataset
  train_dataset_aug = SentenceDataset(f"data/extractions/new_data/train_set.csv",
                                        tokenizer,
                                        truncation = True, max_len=max_len,
                                        cp1_transformer = fr_augmentation_1,
                                        cp2_transformer = fr_augmentation_2,
                                        add_bos_token=True
                                        )

  # Recuperate the valid dataset
  valid_dataset = SentenceDataset(f"data/extractions/new_data/valid_set.csv",
                                        tokenizer, max_len=max_len,
                                        cp1_transformer = fr_augmentation_2,
                                        cp2_transformer = fr_augmentation_2,
                                        add_bos_token=True,
                                        truncation = True)
  
  # Return the datasets
  return train_dataset_aug, valid_dataset
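
To make the roles of `fr_char_p` and `fr_word_p` more concrete, here is a minimal, self-contained sketch of character-level noise injection. It is not the `nlpaug` implementation: the helper `noisy_sentence` and the tiny `NEIGHBORS` keyboard map are hypothetical, and we assume, as a simplification, that `word_p` is the chance of selecting a word and `char_p` the chance of replacing each character inside a selected word.

```python
import random

# Hypothetical, very small map of keyboard neighbours (illustration only).
NEIGHBORS = {"a": "zq", "e": "zr", "i": "uo", "o": "ip", "u": "yi", "s": "qd", "t": "ry"}

def noisy_sentence(sentence: str, char_p: float, word_p: float, rng: random.Random) -> str:
    """Replace some characters by a keyboard neighbour.

    word_p: probability of selecting a word for augmentation (assumption).
    char_p: probability of altering each character of a selected word (assumption).
    """
    words = []
    for word in sentence.split(" "):
        if rng.random() < word_p:
            chars = [
                rng.choice(NEIGHBORS[c]) if c in NEIGHBORS and rng.random() < char_p else c
                for c in word
            ]
            word = "".join(chars)
        words.append(word)
    return " ".join(words)

rng = random.Random(0)
clean = "Toutes les portes étaient ouvertes."
print(noisy_sentence(clean, char_p=0.19, word_p=0.07, rng=rng))

# With both probabilities at zero the sentence is returned unchanged.
assert noisy_sentence(clean, char_p=0.0, word_p=0.0, rng=rng) == clean
```

In the notebook, only the training set receives this kind of noise on the French side (`fr_augmentation_1`); the validation set goes through the noise-free `fr_augmentation_2`.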

### Searching for the best parameters 🕖

In [4]:
from wolof_translate.models.transformers.optimization import TransformerScheduler
from wolof_translate.trainers.transformer_trainer import ModelRunner
from wolof_translate.utils.evaluation import TranslationEvaluation
from wolof_translate.models.transformers.main import Transformer
from wolof_translate.utils.split_with_valid import split_data


-------------

### --- Wandb v5 2000

In [12]:
# let us initialize the hyperparameter configuration 
config = {
    'random_state': 0,
    'fr_char_p': 0.19310415677880952,
    'fr_word_p': 0.07285606562901713,
    'learning_rate': 0.002762393179839311,
    'weight_decay': 0.033353528922291854,
    'batch_size': 16,
    'warmup_ratio': 0.0,
    'max_epoch': 454,
    'epochs': 45,
    'mid_epoch': 45,
    'max_len': 38,
    'end_mark': 3,
    'bleu': 7.3684,
    'model_dir': 'data/checkpoints/fw_bart_custom_train_v5_checkpoints',
    'new_model_dir': 'data/checkpoints/bart_custom_train_results_fw_v5'
}

# Initialize the model name
model_name = 'facebook/bart-base'

# import the model with its pre-trained weights
model = BartForConditionalGeneration.from_pretrained(model_name)

# resize the token embeddings
model.resize_token_embeddings(len(tokenizer))

# let us initialize the evaluation class
evaluation = TranslationEvaluation(tokenizer)

# let us initialize the trainer
trainer = ModelRunner(model, seed = 0, version = 5, evaluation = evaluation, optimizer=Adafactor)

# split the data
split_data(config['random_state'], csv_file = "ad_sentences.csv")

# get the train and validation sets (the validation set is named test_dataset here)
train_dataset, test_dataset = recuperate_datasets(config['fr_char_p'], 
                                                    config['fr_word_p'], config['max_len'],
                                                    config['end_mark'])

# let us calculate the appropriate warmup steps (the schedule spans config['max_epoch'] epochs)
length = len(train_dataset)

n_steps = length // config['batch_size']

num_steps = config['max_epoch'] * n_steps

warmup_steps = (config['max_epoch'] * n_steps) * config['warmup_ratio']

# # Initialize the scheduler parameters
scheduler_args = {'num_warmup_steps': warmup_steps, 'num_training_steps': num_steps}

# Initialize the optimizer parameters
optimizer_args = {
    'lr': config['learning_rate'],
    'weight_decay': config['weight_decay'],
    # 'betas': (0.9, 0.98),
    'warmup_init': False,
    'relative_step': False
}

# Initialize the loaders parameters
train_loader_args = {'batch_size': config['batch_size']}

# Add the datasets and hyperparameters to trainer
trainer.compile(train_dataset, test_dataset, tokenizer, train_loader_args,
                optimizer_kwargs = optimizer_args,
                lr_scheduler=get_linear_schedule_with_warmup,
                lr_scheduler_kwargs=scheduler_args, 
                predict_with_generate = True,
                hugging_face = True,
                logging_dir="data/logs/bart_custom_train_fw"
                )

# We resume from checkpoints, so let us load the model
# trainer.load(config['model_dir'], load_best=True) # Only for the first loading
trainer.load(config['new_model_dir'], load_best=True)
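
The scheduler arithmetic in the cell above can be checked by hand. The sketch below reimplements, in plain Python, the multiplier that a linear schedule with warmup applies to the base learning rate: a linear ramp during warmup, then a linear decay to zero at `num_training_steps`. The function name `linear_lr_factor` and the figure of 99 optimizer steps per epoch are assumptions made for illustration; the printed values are not meant to reproduce the logged learning rates, which depend on the state restored from the checkpoint.

```python
def linear_lr_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # linear ramp from 0 to 1 during warmup
        return step / max(1, num_warmup_steps)
    # linear decay from 1 to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

# Values mirroring the configuration above; steps_per_epoch is an assumption.
max_epoch, warmup_ratio, base_lr = 454, 0.0, 0.002762393179839311
steps_per_epoch = 99
num_steps = max_epoch * steps_per_epoch
warmup_steps = int(num_steps * warmup_ratio)

for epoch in (0, 45, 227, 454):
    factor = linear_lr_factor(epoch * steps_per_epoch, warmup_steps, num_steps)
    print(f"epoch {epoch:3d}: lr = {base_lr * factor:.6f}")
```

With `warmup_ratio` set to 0.0 there is no warmup phase, and because the schedule spans 454 epochs while only 45 are trained, the learning rate decreases by only about 10% over the run.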

        

### --- Linear

In [11]:
trainer.train(epochs = config['epochs'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])



For epoch 6: 
{Learning rate: [0.0027708459507466156]}


Train batch number 99: 100%|██████████| 99/99 [00:32<00:00,  3.02batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:13<00:00,  1.07s/batches]



Metrics: {'train_loss': 0.5331674415354777, 'test_loss': 1.0587910505441518, 'bleu': 6.9969, 'gen_len': 10.8283}




  2%|▎         | 1/40 [00:55<36:20, 55.90s/it]

For epoch 7: 
{Learning rate: [0.002764611122255493]}


Train batch number 99: 100%|██████████| 99/99 [00:37<00:00,  2.67batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.28s/batches]



Metrics: {'train_loss': 0.4071016970909003, 'test_loss': 1.1061589167668269, 'bleu': 8.3503, 'gen_len': 12.8788}




  5%|▌         | 2/40 [01:58<38:01, 60.05s/it]

For epoch 8: 
{Learning rate: [0.002758376293764371]}


Train batch number 99: 100%|██████████| 99/99 [00:41<00:00,  2.40batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.27s/batches]



Metrics: {'train_loss': 0.31022625649818264, 'test_loss': 1.1236080481455877, 'bleu': 11.1201, 'gen_len': 12.8434}




  8%|▊         | 3/40 [03:06<39:01, 63.30s/it]

For epoch 9: 
{Learning rate: [0.0027521414652732484]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.56batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.33s/batches]



Metrics: {'train_loss': 0.23214480017471795, 'test_loss': 1.1920990256162791, 'bleu': 11.3352, 'gen_len': 12.803}




 10%|█         | 4/40 [04:11<38:32, 64.23s/it]

For epoch 10: 
{Learning rate: [0.002745906636782126]}


Train batch number 99: 100%|██████████| 99/99 [00:42<00:00,  2.34batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:18<00:00,  1.40s/batches]



Metrics: {'train_loss': 0.17824547617423414, 'test_loss': 1.1930893522042494, 'bleu': 15.4479, 'gen_len': 12.9192}




 12%|█▎        | 5/40 [05:21<38:36, 66.17s/it]

For epoch 11: 
{Learning rate: [0.0027396718082910035]}


Train batch number 99: 100%|██████████| 99/99 [00:43<00:00,  2.28batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.48s/batches]



Metrics: {'train_loss': 0.13561171368517058, 'test_loss': 1.23938492169747, 'bleu': 11.4807, 'gen_len': 12.8333}




 15%|█▌        | 6/40 [06:28<37:45, 66.63s/it]

For epoch 12: 
{Learning rate: [0.002733436979799881]}


Train batch number 99: 100%|██████████| 99/99 [00:43<00:00,  2.28batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.51s/batches]



Metrics: {'train_loss': 0.11536957188086076, 'test_loss': 1.269504345380343, 'bleu': 14.2235, 'gen_len': 12.7071}




 18%|█▊        | 7/40 [07:37<37:02, 67.35s/it]

For epoch 13: 
{Learning rate: [0.0027272021513087587]}


Train batch number 99: 100%|██████████| 99/99 [00:44<00:00,  2.22batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:20<00:00,  1.55s/batches]



Metrics: {'train_loss': 0.09750010362929767, 'test_loss': 1.3005648943094106, 'bleu': 14.2558, 'gen_len': 13.1212}




 20%|██        | 8/40 [08:47<36:22, 68.21s/it]

For epoch 14: 
{Learning rate: [0.0027209673228176363]}


Train batch number 99: 100%|██████████| 99/99 [00:47<00:00,  2.09batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.51s/batches]



Metrics: {'train_loss': 0.0875176875052428, 'test_loss': 1.2973823547363281, 'bleu': 14.4621, 'gen_len': 13.1263}




 22%|██▎       | 9/40 [09:59<35:52, 69.43s/it]

For epoch 15: 
{Learning rate: [0.002714732494326514]}


Train batch number 99: 100%|██████████| 99/99 [00:42<00:00,  2.33batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.31s/batches]



Metrics: {'train_loss': 0.07727035767201221, 'test_loss': 1.3073506813782911, 'bleu': 14.6288, 'gen_len': 12.6313}




 25%|██▌       | 10/40 [11:05<34:12, 68.43s/it]

For epoch 16: 
{Learning rate: [0.0027084976658353914]}


Train batch number 99: 100%|██████████| 99/99 [00:44<00:00,  2.23batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.52s/batches]



Metrics: {'train_loss': 0.06916370265411609, 'test_loss': 1.3111402850884657, 'bleu': 13.4481, 'gen_len': 13.3384}




 28%|██▊       | 11/40 [12:15<33:11, 68.66s/it]

For epoch 17: 
{Learning rate: [0.0027022628373442686]}


Train batch number 99: 100%|██████████| 99/99 [00:46<00:00,  2.14batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:18<00:00,  1.40s/batches]



Metrics: {'train_loss': 0.06065154491425163, 'test_loss': 1.350119411945343, 'bleu': 14.5282, 'gen_len': 12.2626}




 30%|███       | 12/40 [13:24<32:10, 68.96s/it]

For epoch 18: 
{Learning rate: [0.002696028008853146]}


Train batch number 99: 100%|██████████| 99/99 [00:43<00:00,  2.27batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.35s/batches]



Metrics: {'train_loss': 0.05891040919555558, 'test_loss': 1.3430194808886602, 'bleu': 13.8888, 'gen_len': 13.1364}




 32%|███▎      | 13/40 [14:30<30:38, 68.10s/it]

For epoch 19: 
{Learning rate: [0.0026897931803620237]}


Train batch number 99: 100%|██████████| 99/99 [00:41<00:00,  2.37batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.26s/batches]



Metrics: {'train_loss': 0.0532477025619962, 'test_loss': 1.3583722343811622, 'bleu': 12.8109, 'gen_len': 12.399}




 35%|███▌      | 14/40 [15:34<28:52, 66.65s/it]

For epoch 20: 
{Learning rate: [0.0026835583518709013]}


Train batch number 99: 100%|██████████| 99/99 [00:41<00:00,  2.40batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.46s/batches]



Metrics: {'train_loss': 0.05010606170716611, 'test_loss': 1.3336971723116362, 'bleu': 13.5189, 'gen_len': 12.8939}




 38%|███▊      | 15/40 [16:39<27:38, 66.32s/it]

For epoch 21: 
{Learning rate: [0.002677323523379779]}


Train batch number 99: 100%|██████████| 99/99 [00:49<00:00,  1.98batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:18<00:00,  1.44s/batches]



Metrics: {'train_loss': 0.045728730865650706, 'test_loss': 1.3664012092810411, 'bleu': 15.4021, 'gen_len': 12.6465}




 40%|████      | 16/40 [17:53<27:28, 68.69s/it]

For epoch 22: 
{Learning rate: [0.0026710886948886564]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.43batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.35s/batches]



Metrics: {'train_loss': 0.04393004452941394, 'test_loss': 1.359107723602882, 'bleu': 15.0084, 'gen_len': 12.9141}




 42%|████▎     | 17/40 [18:57<25:43, 67.12s/it]

For epoch 23: 
{Learning rate: [0.002664853866397534]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.44batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.28s/batches]



Metrics: {'train_loss': 0.042096095648829386, 'test_loss': 1.343453443967379, 'bleu': 13.885, 'gen_len': 12.6566}




 45%|████▌     | 18/40 [19:59<24:05, 65.69s/it]

For epoch 24: 
{Learning rate: [0.0026586190379064116]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.43batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.54s/batches]



Metrics: {'train_loss': 0.04308000018801352, 'test_loss': 1.3771590131979723, 'bleu': 14.8924, 'gen_len': 12.8687}




 48%|████▊     | 19/40 [21:05<22:59, 65.68s/it]

For epoch 25: 
{Learning rate: [0.002652384209415289]}


Train batch number 99: 100%|██████████| 99/99 [00:45<00:00,  2.19batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:19<00:00,  1.49s/batches]



Metrics: {'train_loss': 0.03766482586812491, 'test_loss': 1.3727525885288532, 'bleu': 15.2926, 'gen_len': 13.1515}




 50%|█████     | 20/40 [22:15<22:20, 67.02s/it]

For epoch 26: 
{Learning rate: [0.0026461493809241668]}


Train batch number 99: 100%|██████████| 99/99 [00:45<00:00,  2.15batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:18<00:00,  1.42s/batches]



Metrics: {'train_loss': 0.037945234433117536, 'test_loss': 1.367100633107699, 'bleu': 14.912, 'gen_len': 12.9495}




 52%|█████▎    | 21/40 [23:25<21:29, 67.87s/it]

For epoch 27: 
{Learning rate: [0.0026399145524330443]}


Train batch number 99: 100%|██████████| 99/99 [00:45<00:00,  2.19batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:18<00:00,  1.40s/batches]



Metrics: {'train_loss': 0.037307741440305804, 'test_loss': 1.4190766215324402, 'bleu': 13.8761, 'gen_len': 12.4495}




 55%|█████▌    | 22/40 [24:33<20:25, 68.06s/it]

For epoch 28: 
{Learning rate: [0.002633679723941922]}


Train batch number 99: 100%|██████████| 99/99 [00:43<00:00,  2.28batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.33s/batches]



Metrics: {'train_loss': 0.030500840057026257, 'test_loss': 1.3966215207026556, 'bleu': 14.8674, 'gen_len': 12.8889}




 57%|█████▊    | 23/40 [25:39<19:06, 67.43s/it]

For epoch 29: 
{Learning rate: [0.0026274448954507995]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.45batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.30s/batches]



Metrics: {'train_loss': 0.030101753197459863, 'test_loss': 1.4298279652228723, 'bleu': 13.4793, 'gen_len': 12.8939}




 60%|██████    | 24/40 [26:42<17:34, 65.90s/it]

For epoch 30: 
{Learning rate: [0.0026212100669596766]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.42batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.37s/batches]



Metrics: {'train_loss': 0.027759757824242115, 'test_loss': 1.3836947771219106, 'bleu': 15.3309, 'gen_len': 13.2677}




 62%|██████▎   | 25/40 [27:45<16:18, 65.23s/it]

For epoch 31: 
{Learning rate: [0.002614975238468554]}


Train batch number 99: 100%|██████████| 99/99 [00:40<00:00,  2.43batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.38s/batches]



Metrics: {'train_loss': 0.028990222587993348, 'test_loss': 1.3779189403240497, 'bleu': 15.9085, 'gen_len': 13.1313}




 65%|██████▌   | 26/40 [28:56<15:35, 66.83s/it]

For epoch 32: 
{Learning rate: [0.002608740409977432]}


Train batch number 99: 100%|██████████| 99/99 [00:46<00:00,  2.12batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:21<00:00,  1.67s/batches]



Metrics: {'train_loss': 0.023429580773650246, 'test_loss': 1.4177517524132361, 'bleu': 13.9763, 'gen_len': 12.8535}




 68%|██████▊   | 27/40 [30:10<14:54, 68.84s/it]

For epoch 33: 
{Learning rate: [0.0026025055814863094]}


Train batch number 99: 100%|██████████| 99/99 [00:42<00:00,  2.35batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.36s/batches]



Metrics: {'train_loss': 0.025069100744646005, 'test_loss': 1.3987321440990155, 'bleu': 15.3394, 'gen_len': 13.1162}




 70%|███████   | 28/40 [31:14<13:32, 67.67s/it]

For epoch 34: 
{Learning rate: [0.002596270752995187]}


Train batch number 99: 100%|██████████| 99/99 [00:42<00:00,  2.33batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.28s/batches]



Metrics: {'train_loss': 0.025375749219698134, 'test_loss': 1.4128950467476478, 'bleu': 14.4662, 'gen_len': 13.2576}




 72%|███████▎  | 29/40 [32:19<12:13, 66.69s/it]

For epoch 35: 
{Learning rate: [0.0025900359245040645]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.54batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.29s/batches]



Metrics: {'train_loss': 0.02386736071602714, 'test_loss': 1.4082451233497033, 'bleu': 16.5471, 'gen_len': 13.0859}




 75%|███████▌  | 30/40 [33:24<11:01, 66.16s/it]

For epoch 36: 
{Learning rate: [0.0025838010960129417]}


Train batch number 99: 100%|██████████| 99/99 [00:39<00:00,  2.51batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:15<00:00,  1.22s/batches]



Metrics: {'train_loss': 0.022211450810610045, 'test_loss': 1.4087164677106416, 'bleu': 15.9951, 'gen_len': 12.6566}




 78%|███████▊  | 31/40 [34:24<09:40, 64.46s/it]

For epoch 37: 
{Learning rate: [0.0025775662675218193]}


Train batch number 99: 100%|██████████| 99/99 [00:39<00:00,  2.52batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:15<00:00,  1.19s/batches]



Metrics: {'train_loss': 0.0220291623516441, 'test_loss': 1.435927308522738, 'bleu': 14.6881, 'gen_len': 12.7879}




 80%|████████  | 32/40 [35:25<08:26, 63.30s/it]

For epoch 38: 
{Learning rate: [0.0025713314390306973]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.56batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.28s/batches]



Metrics: {'train_loss': 0.02161514088793686, 'test_loss': 1.4156509041786194, 'bleu': 16.5164, 'gen_len': 12.9899}




 82%|████████▎ | 33/40 [36:25<07:17, 62.48s/it]

For epoch 39: 
{Learning rate: [0.002565096610539575]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.54batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:17<00:00,  1.32s/batches]



Metrics: {'train_loss': 0.02076074972071431, 'test_loss': 1.4084171606944158, 'bleu': 16.3584, 'gen_len': 13.303}




 85%|████████▌ | 34/40 [37:27<06:12, 62.12s/it]

For epoch 40: 
{Learning rate: [0.0025588617820484524]}


Train batch number 99: 100%|██████████| 99/99 [00:39<00:00,  2.53batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.30s/batches]



Metrics: {'train_loss': 0.0210536178201437, 'test_loss': 1.4041353372427134, 'bleu': 16.3989, 'gen_len': 13.1414}




 88%|████████▊ | 35/40 [38:28<05:09, 61.81s/it]

For epoch 41: 
{Learning rate: [0.00255262695355733]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.55batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.28s/batches]



Metrics: {'train_loss': 0.019451439347720208, 'test_loss': 1.4175263734964223, 'bleu': 15.2361, 'gen_len': 12.8081}




 90%|█████████ | 36/40 [39:28<04:05, 61.40s/it]

For epoch 42: 
{Learning rate: [0.002546392125066207]}


Train batch number 99: 100%|██████████| 99/99 [00:39<00:00,  2.54batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:15<00:00,  1.20s/batches]



Metrics: {'train_loss': 0.017754000148293796, 'test_loss': 1.43679894392307, 'bleu': 16.6226, 'gen_len': 12.601}




 92%|█████████▎| 37/40 [40:32<03:06, 62.13s/it]

For epoch 43: 
{Learning rate: [0.0025401572965750847]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.54batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.25s/batches]



Metrics: {'train_loss': 0.019889665734624924, 'test_loss': 1.4331138271551866, 'bleu': 15.3736, 'gen_len': 13.1515}




 95%|█████████▌| 38/40 [41:33<02:03, 61.74s/it]

For epoch 44: 
{Learning rate: [0.0025339224680839623]}


Train batch number 99: 100%|██████████| 99/99 [00:38<00:00,  2.55batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:16<00:00,  1.27s/batches]



Metrics: {'train_loss': 0.018458652404146365, 'test_loss': 1.4297356788928692, 'bleu': 15.0022, 'gen_len': 13.0909}




 98%|█████████▊| 39/40 [42:33<01:01, 61.37s/it]

For epoch 45: 
{Learning rate: [0.00252768763959284]}


Train batch number 99: 100%|██████████| 99/99 [00:39<00:00,  2.53batches/s]
Test batch number 13: 100%|██████████| 13/13 [00:15<00:00,  1.21s/batches]



Metrics: {'train_loss': 0.018007289124370524, 'test_loss': 1.455772720850431, 'bleu': 15.0748, 'gen_len': 12.7172}




100%|██████████| 40/40 [43:33<00:00, 65.35s/it]
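
Since the run saves the best model according to BLEU (`metric_for_best_model='bleu'`, `metric_objective='maximize'`), it helps to see which epoch that criterion picks. The sketch below applies the same rule to a handful of entries copied from the log above, with losses rounded to four decimals; the `history` list and the `best_epoch` helper are illustrative names and are not part of `ModelRunner`.

```python
# A few (epoch, test_loss, bleu) entries copied from the training log above.
history = [
    (6, 1.0588, 6.9969),
    (10, 1.1931, 15.4479),
    (21, 1.3664, 15.4021),
    (35, 1.4082, 16.5471),
    (42, 1.4368, 16.6226),
    (45, 1.4558, 15.0748),
]

def best_epoch(records, index, objective="maximize"):
    """Return the record that is best for the metric stored at position `index`."""
    pick = max if objective == "maximize" else min
    return pick(records, key=lambda record: record[index])

print("best by BLEU:     ", best_epoch(history, index=2))
print("best by test loss:", best_epoch(history, index=1, objective="minimize"))
```

Among the logged epochs the two criteria disagree: the validation loss is lowest at epoch 6, while BLEU peaks at epoch 42 (16.6226), which suggests that the loss alone would be a poor guide for choosing a checkpoint here.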


### --- Cosine

In [None]:
trainer.train(epochs = config['epochs'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])

### ---

In [None]:
trainer.train(epochs = config['epochs'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])

### ---

In [None]:
trainer.train(epochs = config['epochs'] - trainer.current_epoch, auto_save=True, metric_for_best_model='bleu', metric_objective='maximize', log_step=1,
              saving_directory = config['new_model_dir'])

### Predictions and Evaluation

In [13]:
# initialize the transformation sequence
end_mark_fn = partial(add_end_mark)
fr_augmentation = TransformerSequences(remove_mark_space, delete_guillemet_space, add_mark_space, end_mark_fn)

# let us get the test set
test_dataset = SentenceDataset(f"data/extractions/new_data/test_set.csv",
                                        tokenizer = tokenizer,
                                        cp1_transformer = fr_augmentation,
                                        cp2_transformer = fr_augmentation,
                                        truncation = True, add_bos_token=True)

Let us run the evaluation and print the predicted sentences.

In [14]:
# evaluation with test set
df_ft_to_wf = trainer.evaluate(test_dataset)

Evaluation batch number 13: 100%|██████████| 13/13 [00:44<00:00,  3.40s/batches]

predictions_: [[2, 0, 3684, 850, 136, 888, 5, 8, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 3673, 157, 1294, 13, 40, 282, 946, 82, 5, 8, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 3690, 850, 11, 5, 8, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 3684, 14, 14, 1175, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 1574, 1625, 77, 2501, 23, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 3674, 17, 2029, 633, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [2, 0, 3684, 14, 14, 1643, 274, 23, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 




In [15]:
df_ft_to_wf[1].tail(10)

| Unnamed: 0 | original_sentences | translations | predictions |
|---|---|---|---|
| 188 | Donne le travail à un autre! | Joxal kenn liggéey bi! | Joxal téere bi doomu benn jigéen! |
| 189 | Cet homme qui a été. | Góor gii demoon. | Gor gii dem. |
| 190 | Dites-lui. | Nileen ka. | Nileen leen. |
| 191 | Il est là. | Ma ŋgoogule foofu. | Mi ŋgi fi. |
| 192 | Tu vois cet homme là-bas? | Gis ŋga nit kale? | Gis ŋga nit kee? |
| 193 | L'homme est parti je crois! | Ma defe góor gi dem na! | Góor gi dem na ma defe! |
| 194 | J'ai aperçu un baobab. | Séen naa ag guy. | Séen naa aw fas. |
| 195 | Celui-ci serait parti. | Kii dafa demkoon. | Kii dafa demkoon. |
| 196 | Toutes les portes étaient ouvertes. | Bunt yi yépp a tëjju woon. | Ci biir foofu yiooyale deey bëggu leen. |
| 197 | C'est Fatim, aujourd'hui. | Faatim la, tay. | Faatim la soo demee. |


In [16]:
# let us display 100 samples
pd.options.display.max_rows = 100
df_ft_to_wf[1].sample(100)

| Unnamed: 0 | original_sentences | translations | predictions |
|---|---|---|---|
| 79 | Nous fûmes Laobé. | Laobe lanu woon. | Laobe lañu woon. |
| 48 | Quiconque s'en va, celui-là est un froussard. | Képp ku dem, kooku raggal la. | Kann, kookuleuñu dara. |
| 68 | Qu'ils ne partent pas! | Bu ñu dem! | Bu ñu dem! |
| 38 | Le gars est arrivé un peu en retard. | Góor gi egg na wanté tardé na tuuti. | Lépp jeex na. |
| 60 | Mes voeux d'anniversaire. | Mangui lay ndokkel ci sa bess bu délu si bi. | Mangui lay set. |
| 40 | Le voilà assis là-bas. | Ma ŋgii toog. | Ñu ŋgoogee. |
| 100 | Quel démon! | Moo di loola! | Moo di loola! |
| 54 | Ils ont tous convenu qu'ils travailleraient to... | Degganté ñu ñoom ñép ci lu ñép di liggéeyi ci ... | Laobe lañu woon. |
| 52 | Et pourtant, je l'ai vu. | Moontin dey, gis naa ka. | Moontin, man, më dem. |
| 127 | C'est que j'ai effectivement été. | Dama dem. | Maa ŋgii demoon. |


### Most similar tokens

In this subsection, we want to find which tokens are most similar to each other according to their embeddings. We must first extract the weights of the embedding layer.

In [19]:
# the following function compares two vectors using the cosine similarity
def cosine_similarity(embed_1: np.ndarray, embed_2: np.ndarray):
  return np.dot(embed_1, embed_2)/(np.linalg.norm(embed_1) * np.linalg.norm(embed_2))

# recuperate the embeddings
embeddings = trainer.model.model.shared.weight

# let us compare the token at index 10 and 100
cosine_similarity(embeddings[10].cpu().detach().numpy(), embeddings[100].cpu().detach().numpy())

0.2508285

The cosine similarity lies between -1 and 1. A higher value represents a higher similarity between two tokens.
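
As a quick check of the range of values, the following plain-Python sketch evaluates the cosine similarity on three hand-picked pairs of 2-D vectors. The helper `cosine` is a stand-in written for this illustration and is not the notebook's NumPy version.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine([1.0, 0.0], [2.0, 0.0]))   # parallel vectors  -> 1.0
print(cosine([1.0, 0.0], [0.0, 3.0]))   # orthogonal vectors -> 0.0
print(cosine([1.0, 0.0], [-4.0, 0.0]))  # opposite vectors  -> -1.0
```

Opposite vectors give -1, orthogonal vectors give 0, and parallel vectors give 1, so negative values are possible for learned embeddings.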

To compare two groups of words, we must tokenize them, average the vectors of the resulting tokens, and calculate the similarity between the two groups' mean vectors.

Let us take the example of the words 'wax ak kii' and 'waxtaan ak yow'.

In [22]:
word1 = 'wax ak kii'

word2 = 'waxtaan ak yow'

# get the token ids
word1_id = tokenizer(word1, return_tensors='pt')['input_ids'][0]

word2_id = tokenizer(word2, return_tensors='pt')['input_ids'][0]

word1_id, word2_id


(tensor([2753,   73,  542]), tensor([2753,   31, 1116,   73,   48, 3452]))

Let us get their vectors with the following function.

In [37]:
def get_embedding(model: torch.nn.Module, word_id: torch.Tensor):
  
  embeddings = model.model.shared.weight.cpu().detach().numpy()
  
  return embeddings[word_id].mean(axis=0) if word_id.shape[0] else embeddings[word_id]

embeddings_1 = get_embedding(trainer.model, word1_id)

embeddings_2 = get_embedding(trainer.model, word2_id)

We can then calculate the cosine similarity between the embeddings.

In [39]:
similarity = cosine_similarity(embeddings_1, embeddings_2)

similarity

0.5306478

Let us combine the two steps into a single function.

In [45]:
def calculate_similary(model: torch.nn.Module, word1: str, word2: str):
  
  word1_id = tokenizer(word1, return_tensors='pt')['input_ids'][0]
  
  word2_id = tokenizer(word2, return_tensors='pt')['input_ids'][0]
  
  embeddings_1 = get_embedding(model, word1_id)
  
  embeddings_2 = get_embedding(model, word2_id)
  
  return cosine_similarity(embeddings_1, embeddings_2)

calculate_similary(trainer.model, 'wax ak kii', 'waxtaan ak yow')

0.5306478
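
To go from pairwise comparisons to the tokens most similar to a given one, each row of the embedding matrix can be ranked by its cosine similarity to the query row. The sketch below does this on a tiny hand-written table; `toy_embeddings`, its values, and the `most_similar` helper are hypothetical, and with the real model one would rank the rows of `trainer.model.model.shared.weight` instead.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(embeddings, query_id, top_k=2):
    """Return the top_k (token_id, similarity) pairs closest to embeddings[query_id]."""
    query = embeddings[query_id]
    scores = [
        (token_id, cosine(query, vector))
        for token_id, vector in enumerate(embeddings)
        if token_id != query_id  # skip the query token itself
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical 4-token vocabulary with 3-dimensional embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]

for token_id, score in most_similar(toy_embeddings, query_id=0):
    print(token_id, round(score, 4))
```

The returned ids can be mapped back to strings with `tokenizer.convert_ids_to_tokens`. For a full vocabulary, the same ranking is better done with a single matrix product on normalized embeddings rather than a Python loop.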