<a href="https://colab.research.google.com/github/EverlynAsiko/Neural_Machine_Translation_for_African_Languages/blob/main/All_Multilingual_NMT_results1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multilingual neural machine translation.

Multilingual neural machine translation uses more than one language pair in a single translation model. One model can be trained on several language pairs to perform many-to-one, one-to-many, or many-to-many translation.

In this notebook, we perform many-to-one translation:
{Kinyarwanda, Luganda, Luhya} to English. Because every sentence is translated into the same target language, the model needs no special language tags; we simply concatenate the datasets.

In [None]:
# Linking to drive
from google.colab import drive
drive.mount("/content/gdrive")


Mounted at /content/gdrive


In [None]:
# Importing needed libraries for preprocessing and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Install PyTorch v1.8.0 with GPU (CUDA 10.1) support.
! pip install torch==1.8.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.0+cu101
  Downloading https://download.pytorch.org/whl/cu101/torch-1.8.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (763.5 MB)
     |████████████████████████████████| 763.5 MB 14 kB/s
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.9.0+cu102
    Uninstalling torch-1.9.0+cu102:
      Successfully uninstalled torch-1.9.0+cu102
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.10.0+cu102 requires torch==1.9.0, but you have torch 1.8.0+cu101 which is incompatible.
torchtext 0.10.0 requires torch==1.9.0, but you have torch 1.8.0+cu101 which is incompatible.
Successfully installed torch-1.8.0+cu101


In [None]:
# Filtering warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Loading the drive
import os
os.chdir("/content/gdrive/Shared drives/NMT_for_African_Language")

In [None]:
# Setting the language codes used as file extensions.
# Note: the model config below swaps them, so the model translates lg_rw_lh -> en.
source_language = "en"
target_language = "lg_rw_lh"

os.environ["src"] = source_language 
os.environ["tgt"] = target_language

## Data preprocessing

In [None]:
! head Luganda/train.*
! head Luganda/dev.*

==> Luganda/train.bpe.en <==
Ev@@ en@@ tually , however , the tru@@ ths I learned from the Bible began to sin@@ k deep@@ er into my heart . I real@@ ized that if I wanted to serve Jehovah , I had to change my pol@@ it@@ ical view@@ poin@@ ts and associ@@ ations .
At last , I have the st@@ able family life that I always cr@@ av@@ ed , and I have the loving Father that I always wanted .
I was a new husband , only 25 years old and very in@@ experienced , but off we went with confidence in Jehovah .
What can you do to show these de@@ a@@ f brothers personal attention ?
R@@ ef@@ er@@ r@@ ing to what the rul@@ er@@ ship of God’s Son will accompl@@ ish , Isaiah 9 : 7 says : “ The very z@@ eal of Jehovah of arm@@ ies will do this . ”
Jesus is the m@@ igh@@ ti@@ est of all of Jehovah’s spirit sons .
The ste@@ ad@@ f@@ ast example set by J@@ ac@@ o@@ b and R@@ ac@@ he@@ l no doubt had a powerful effect on their son Joseph , influ@@ enc@@ ing how he would hand@@ le t@@ ests of his own faith .
Whe

In [None]:
! head Kinyarwanda/train.*
! head Kinyarwanda/dev.*

==> Kinyarwanda/train.bpe.en <==
R@@ ight after his bapt@@ ism , he “ went off into Ar@@ ab@@ ia ” ​ — e@@ ither the S@@ y@@ ri@@ an D@@ es@@ ert or pos@@ sib@@ ly some qu@@ i@@ et place on the Ar@@ ab@@ ian P@@ en@@ ins@@ ul@@ a that was conduc@@ ive to med@@ it@@ ation .
You will see the time when God br@@ ings righteous rule to all the earth , und@@ o@@ ing the d@@ am@@ age and inj@@ ust@@ ice brought by human rul@@ er@@ ship .
Let us consider f@@ ive reas@@ ons why we should want to follow the Christ .
Even in the Bible , the id@@ ea of pers@@ u@@ as@@ ion som@@ et@@ imes has n@@ eg@@ ative con@@ no@@ t@@ ations , den@@ ot@@ ing a cor@@ rup@@ ting or a lead@@ ing as@@ tr@@ ay .
For God’s servants to be deliv@@ ered , Satan and his ent@@ ire world@@ wide system of things need to be rem@@ ov@@ ed .
I had never heard that name used in my ch@@ urch .
S@@ imp@@ ly having authority or a wid@@ er name recogn@@ ition is not the important thing .
M@@ ost people do not believe in the spir@@ 

In [None]:
! head Luhyia/train.*
! head Luhyia/dev.*

==> Luhyia/train.bpe.en <==
Then Pilate entered the P@@ ra@@ et@@ or@@ i@@ um again , called Jesus , and said to Him , “ Are You the King of the Jews ? ”
If anyone th@@ in@@ ks himself to be a prophet or spirit@@ ual , let him ac@@ knowledge that the things which I write to you are the commandments of the Lord .
E@@ very br@@ an@@ ch in Me that does not bear fruit He tak@@ es away ; and every br@@ an@@ ch that be@@ ars fruit He pr@@ un@@ es , that it may bear more fruit .
D@@ em@@ et@@ ri@@ us has a good testimony from all , and from the truth its@@ el@@ f . And we also bear witness , and you know that our testimony is true .
And supp@@ er being ended , the devil having already put it into the heart of Judas Is@@ c@@ ari@@ ot , Simon ’ s son , to betr@@ ay Him ,
im@@ pl@@ or@@ ing us with much ur@@ gen@@ c@@ y that we would receive the gift and the fel@@ low@@ ship of the minis@@ ter@@ ing to the saints .
It is written in the prophets , ‘ And they shall all be taught by G@@ od@@ . ’ Th

In [None]:
pre = '/content/gdrive/Shared drives/NMT_for_African_Language/'
# Train data source
filenames = [pre+'Luganda/train.en', pre+'Kinyarwanda/train.en',pre+'Luhyia/train.en']

# Train data target
filenames2 = [pre+'Luganda/train.lg', pre+'Kinyarwanda/train.rw',pre+'Luhyia/train.lh']

# Dev data source
file1 = [pre+'Luganda/dev.en', pre+'Kinyarwanda/dev.en',pre+'Luhyia/dev.en']

# Dev data target
file2 = [pre+'Luganda/dev.lg', pre+'Kinyarwanda/dev.rw',pre+'Luhyia/dev.lh']

In [None]:
# Changing to Multilingual directory
os.chdir("/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual")

In [None]:
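# Concatenate several text files into one output file
def create_file(x, filename):
  # Open the output file in write mode
  with open(filename, 'w', encoding='utf-8') as outfile:
      for names in x:
          # Open each input file in read mode and copy its contents
          with open(names, encoding='utf-8') as infile:
              data = infile.read()
          outfile.write(data)
          # Add a newline only if the file does not already end with one,
          # so that no empty line is introduced between corpora
          if data and not data.endswith("\n"):
              outfile.write("\n")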

In [None]:
# Creating multilingual files
create_file(filenames,'train.en')
create_file(filenames2,'train.lg_rw_lh')
create_file(file1,'dev.en')
create_file(file2,'dev.lg_rw_lh')
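The concatenated files are only useful if each source line still pairs with its target line, so a quick sanity check is worthwhile. The sketch below is an illustration and not part of the original pipeline: `count_lines` and `check_parallel` are hypothetical helper names, and the demo runs on small temporary files with placeholder text rather than the real corpora.

```python
import os
import tempfile

def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def check_parallel(src_path, tgt_path):
    """Return (n_src, n_tgt, aligned) for a pair of parallel files."""
    n_src, n_tgt = count_lines(src_path), count_lines(tgt_path)
    return n_src, n_tgt, n_src == n_tgt

# Demo on throwaway files (placeholder text, not real corpus data)
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "demo.en")
demo_tgt = os.path.join(demo_dir, "demo.lg_rw_lh")
with open(demo_src, "w", encoding="utf-8") as f:
    f.write("source 1\nsource 2\nsource 3\n")
with open(demo_tgt, "w", encoding="utf-8") as f:
    f.write("target 1\ntarget 2\ntarget 3\n")

demo_result = check_parallel(demo_src, demo_tgt)
print(demo_result)
```

On the real data, the same call would take `train.en` and `train.lg_rw_lh` (and the corresponding `dev` files); a mismatch in the two counts signals that the corpora have drifted out of alignment.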

### BPE codes

In [None]:
#! git clone https://github.com/joeynmt/joeynmt.git
! cd joeynmt; pip3 install .

Processing /content/gdrive/Shareddrives/NMT_for_African_Language/Multilingual/joeynmt
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Collecting numpy==1.20.1
  Downloading numpy-1.20.1-cp37-cp37m-manylinux2010_x86_64.whl (15.3 MB)
     |████████████████████████████████| 15.3 MB 96 kB/s 
Collecting torchtext==0.9.0
  Downloading torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)
     |████████████████████████████████| 7.1 MB 17.4 MB/s 
Collecting sacrebleu>=1.3.6
  Downloading sacrebleu-1.5.1-py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 3.1 MB/s 
Collecting subword-nmt
  Downloading

#### Baseline BPEs

In [None]:
# Learn joint BPE codes (4000 merge operations) and per-language vocabularies from the training data.
! subword-nmt learn-joint-bpe-and-vocab --input train.$src train.$tgt -s 4000 -o bpe.codes.4000 --write-vocabulary vocab.$src vocab.$tgt

# Apply BPE splits to the training and development data.
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < train.$src > train.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < train.$tgt > train.bpe.$tgt

! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < dev.$src > dev.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < dev.$tgt > dev.bpe.$tgt

# Build the joint vocabulary file with JoeyNMT's build_vocab script
! sudo chmod 777 joeynmt/scripts/build_vocab.py
! joeynmt/scripts/build_vocab.py train.bpe.$src train.bpe.$tgt --output_path vocab.txt

In [None]:
# Apply BPE to the three test sets (test1: Luhya, test2: Luganda, test3: Kinyarwanda)
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < test1.$src > test.bpe.en1
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < test1.lh > test.bpe.lh

! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < test2.$src > test.bpe.en2
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < test2.lg > test.bpe.lg

! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < test3.$src > test.bpe.en3
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < test3.rw > test.bpe.rw

In [None]:
# Some output
! echo "BPE Sentences"
! tail -n 5 test.bpe.$tgt
! echo "Combined BPE Vocab"
! tail -n 10 vocab.txt

BPE Sentences
N@@ asi , n@@ ir@@ ee@@ ba end@@ i , ‘ N@@ iwe w@@ ina , Omw@@ ami ? ’ Omw@@ oyo ok@@ wo , n@@ i@@ kum@@ bo@@ ol@@ el@@ a kuri , ‘ N@@ is@@ ie Yesu o@@ wa N@@ az@@ ar@@ eti ow@@ os@@ a@@ and@@ inj@@ ia . ’
sh@@ ic@@ hil@@ a , omuk@@ h@@ aan@@ awe omut@@ el@@ wa , ow@@ em@@ iy@@ ika ek@@ hum@@ i n@@ ach@@ ib@@ ili yali n@@ any@@ iranga . N@@ e olwa yali nat@@ s@@ it@@ s@@ anga , aband@@ u , ba@@ mw@@ ib@@ um@@ bak@@ h@@ wo okh@@ ur@@ ula mut@@ s@@ imb@@ eka t@@ si@@ osi .
N@@ e olwa kab@@ is@@ ibwa mbu kh@@ uk@@ ho@@ y@@ ile okh@@ ut@@ si@@ ila , m@@ um@@ e@@ eli okh@@ uula I@@ tal@@ ia , bah@@ aana Paul@@ o nende aba@@ bo@@ he , b@@ andi k@@ hum@@ us@@ inj@@ il@@ ili w@@ el@@ ihe J@@ ul@@ i@@ asi ow@@ e@@ ing@@ '@@ anda e@@ ya , esh@@ ir@@ oma ey@@ il@@ angwa mbu , “ I@@ ng@@ '@@ anda ey@@ il@@ ind@@ anga , O@@ mur@@ uc@@ h@@ i . ”
Ol@@ uny@@ um@@ ak@@ h@@ wo , aba@@ ku@@ uka be@@ f@@ we , aba@@ bu@@ kul@@ a l@@ ih@@ e@@ ema el@@ o okh@@ ur@@ ula kh@@ u@@ bas@@ abwe , bal

In [None]:
! tail train.*
! tail dev.*

==> train.bpe.en <==
But if anyone lov@@ es God , this one is known by H@@ im .
And the second is like it : ‘ You shall love your neigh@@ b@@ or as yourself . ’
until the day in which He was taken up , after He through the H@@ ol@@ y Sp@@ ir@@ it had given command@@ ments to the apostles whom He had cho@@ s@@ en ,
For what if some did not believe ? W@@ ill their un@@ beli@@ ef make the faith@@ ful@@ ness of God without ef@@ fect ?
And when you go into a hous@@ eho@@ ld , gre@@ et it .
And a very great mul@@ t@@ itude sp@@ read their clo@@ th@@ es on the ro@@ ad ; others c@@ ut down br@@ an@@ ch@@ es from the tre@@ es and sp@@ read them on the ro@@ ad .
And we heard this vo@@ ice which came from heaven when we were with H@@ im on the holy m@@ ount@@ ain .
O@@ r those e@@ igh@@ te@@ en on whom the tower in S@@ il@@ o@@ am f@@ ell and kill@@ ed them , do you think that they were wor@@ se sin@@ n@@ ers than all other men who dw@@ el@@ t in Jerusalem ?
For I be@@ ar him witness that he has 
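In the listings above, `@@` at the end of a token marks a subword that continues into the next token. As a rough illustration of how segmented text maps back to words, the sketch below joins the pieces again. `remove_bpe` is a hypothetical helper, and the regular expression mirrors the commonly used `sed` idiom for subword-nmt output rather than anything specific to this notebook; the sample sentence is copied from the `train.bpe.en` output.

```python
import re

def remove_bpe(line):
    """Undo BPE segmentation by deleting the '@@ ' continuation markers."""
    # Drop '@@ ' inside the line and a dangling '@@' at the very end
    return re.sub(r"(@@ )|(@@ ?$)", "", line)

segmented = "And when you go into a hous@@ eho@@ ld , gre@@ et it ."
restored = remove_bpe(segmented)
print(restored)
```

The restored sentence is still tokenized (punctuation is separated by spaces), which is a separate step from BPE removal.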

## Modelling

### Baseline MNMT

In [None]:
name = '%s%s' % (target_language, source_language)

# Create the config
config = """
name: "{target_language}{source_language}_reverse_transformer"

data:
    src: "{target_language}"
    trg: "{source_language}"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    #load_model: "{gdrive_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau"           # TODO: try switching from plateau to Noam scheduling
    patience: 5                     # For plateau: decrease learning rate by decrease_factor if validation score has not improved for this many validation rounds.
    learning_rate_factor: 0.5       # factor for Noam scheduler (used with Transformer)
    learning_rate_warmup: 1000      # warmup steps for Noam scheduler (used with Transformer)
    decrease_factor: 0.7
    loss: "crossentropy"
    learning_rate: 0.0003
    learning_rate_min: 0.00000001
    weight_decay: 0.0
    label_smoothing: 0.1
    batch_size: 4096
    batch_type: "token"
    eval_batch_size: 1000
    eval_batch_type: "token"
    batch_multiplier: 1
    early_stopping_metric: "ppl"
    epochs: 30                  # TODO: Decrease when experimenting; around 30 epochs is sufficient to check whether training works at all
    validation_freq: 5000         # TODO: Set to at least once per epoch.
    logging_freq: 200
    eval_metric: "bleu"
    model_dir: "models/{name}_reverse_transformer"
    overwrite: True 
    shuffle: True
    use_cuda: True
    max_output_length: 100
    print_valid_sents: [0, 1, 2, 3]
    keep_last_ckpts: 3

model:
    initializer: "xavier"
    bias_initializer: "zeros"
    init_gain: 1.0
    embed_initializer: "xavier"
    embed_init_gain: 1.0
    tied_embeddings: True
    tied_softmax: True
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4             # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256   # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         # TODO: Increase to 512 for larger data.
        ff_size: 1024            # TODO: Increase to 2048 for larger data.
        dropout: 0.3
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4              # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256    # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         # TODO: Increase to 512 for larger data.
        ff_size: 1024            # TODO: Increase to 2048 for larger data.
        dropout: 0.3
""".format(name=name, gdrive_path="/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual", source_language=source_language, target_language=target_language)
with open("joeynmt/configs/transformer_reverse_{name}.yaml".format(name=name),'w') as f:
    f.write(config)
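The config selects `scheduling: "plateau"` with `patience: 5` and `decrease_factor: 0.7`. As a rough mental model, the sketch below simulates a simplified reduce-on-plateau rule on a made-up sequence of validation perplexities. `simulate_plateau` is a hypothetical function, the perplexity values are invented for illustration, and the trigger condition (more than `patience` validations without improvement) is an assumption modelled on PyTorch's `ReduceLROnPlateau`; the real scheduler has further options such as thresholds and cooldown.

```python
def simulate_plateau(scores, lr=0.0003, patience=5, decrease_factor=0.7,
                     lr_min=1e-8):
    """Return the learning rate after each validation (lower score is better)."""
    best = float("inf")
    bad_rounds = 0
    history = []
    for score in scores:
        if score < best:
            best = score
            bad_rounds = 0
        else:
            bad_rounds += 1
        # Assumed rule: decay once more than `patience` rounds bring no improvement
        if bad_rounds > patience:
            lr = max(lr * decrease_factor, lr_min)
            bad_rounds = 0
        history.append(lr)
    return history

# Invented perplexities: two improvements followed by six stagnant validations
toy_ppl = [10.0, 9.0, 9.1, 9.2, 9.1, 9.3, 9.2, 9.4]
lr_history = simulate_plateau(toy_ppl)
print(lr_history)
```

In this toy run the learning rate stays at 0.0003 for the first seven validations and drops to 0.7 times that value on the eighth.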

In [None]:
# Train the model
!cd joeynmt; python3 -m joeynmt train configs/transformer_reverse_$tgt$src.yaml

2021-07-13 09:47:18,455 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-13 09:47:18,522 - INFO - joeynmt.data - Loading training data...
2021-07-13 09:47:31,882 - INFO - joeynmt.data - Building vocabulary...
2021-07-13 09:47:32,193 - INFO - joeynmt.data - Loading dev data...
2021-07-13 09:47:32,324 - INFO - joeynmt.data - Loading test data...
2021-07-13 09:47:32,915 - INFO - joeynmt.data - Data loaded.
2021-07-13 09:47:32,915 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-13 09:47:33,310 - INFO - joeynmt.model - Enc-dec model built.
2021-07-13 09:47:33.555721: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-13 09:47:37,600 - INFO - joeynmt.training - Total params: 12179456
2021-07-13 09:47:48,366 - INFO - joeynmt.helpers - cfg.name                           : lg_rw_lhen_reverse_transformer
2021-07-13 09:47:48,366 - INFO - joeynmt.helpers - cfg.data.src                    

15 epochs completed

In [None]:
# Build a config that resumes training from checkpoint 120000 of the first run
ckpt_number = 120000
reload_config = config.replace(
    f'#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"', 
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer/{ckpt_number}.ckpt"').replace(
        f'model_dir: "models/lg_rw_lhen_reverse_transformer"', f'model_dir: "models/lg_rw_lhen_reverse_transformer_continued"').replace(
        f'epochs: 30', f'epochs: 15')
        
with open("joeynmt/configs/transformer_{name}_reload.yaml".format(name=name),'w') as f:
    f.write(reload_config)
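The reload cell is repeated several times in this notebook, with only the checkpoint, the model directory, and the epoch count changing. One way to reduce the duplication, sketched here as an assumption rather than the authors' code, is a small helper that edits the config line by line. `make_reload_config` and the toy config string with its placeholder paths are hypothetical.

```python
def make_reload_config(base_config, ckpt_path, new_model_dir, epochs=None):
    """Return a copy of a YAML config string that resumes from ckpt_path."""
    lines = []
    for line in base_config.splitlines():
        stripped = line.strip()
        indent = line[:len(line) - len(line.lstrip())]
        if stripped.startswith(("#load_model:", "load_model:")):
            # Activate checkpoint loading
            line = f'{indent}load_model: "{ckpt_path}"'
        elif stripped.startswith("model_dir:"):
            # Write the continued run to a fresh directory
            line = f'{indent}model_dir: "{new_model_dir}"'
        elif stripped.startswith("epochs:") and epochs is not None:
            line = f"{indent}epochs: {epochs}"
        lines.append(line)
    return "\n".join(lines) + "\n"

# Toy config fragment with placeholder paths (not the notebook's real config)
toy_config = '''training:
    #load_model: "models/demo/1.ckpt"
    epochs: 30
    model_dir: "models/demo"
'''
toy_reload = make_reload_config(toy_config, "models/demo/120000.ckpt",
                                "models/demo_continued", epochs=15)
print(toy_reload)
```

Matching on the key at the start of each line avoids having to reproduce the full original path inside `str.replace`, at the cost of dropping any trailing comment on the rewritten lines.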

In [None]:
!cat "joeynmt/configs/transformer_lg_rw_lhen_reload.yaml"


name: "lg_rw_lhen_reverse_transformer"

data:
    src: "lg_rw_lh"
    trg: "en"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer/120000.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau"          

In [None]:
# Train continued
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload.yaml

2021-07-13 15:53:21,970 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-13 15:53:22,042 - INFO - joeynmt.data - Loading training data...
2021-07-13 15:53:35,796 - INFO - joeynmt.data - Building vocabulary...
2021-07-13 15:53:36,469 - INFO - joeynmt.data - Loading dev data...
2021-07-13 15:53:37,293 - INFO - joeynmt.data - Loading test data...
2021-07-13 15:53:38,048 - INFO - joeynmt.data - Data loaded.
2021-07-13 15:53:38,049 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-13 15:53:38,431 - INFO - joeynmt.model - Enc-dec model built.
2021-07-13 15:53:38.688099: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-13 15:53:42,331 - INFO - joeynmt.training - Total params: 12179456
2021-07-13 15:53:52,972 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer/120000.ckpt
2021-

4 epochs completed

In [None]:
# Build a config that resumes training from checkpoint 150000 of the first continued run
ckpt_number = 150000
reload_config = config.replace(
    f'#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"', 
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued/{ckpt_number}.ckpt"').replace(
        f'model_dir: "models/lg_rw_lhen_reverse_transformer"', f'model_dir: "models/lg_rw_lhen_reverse_transformer_continued2"').replace(
        f'epochs: 30', f'epochs: 11')
        
with open("joeynmt/configs/transformer_{name}_reload2.yaml".format(name=name),'w') as f:
    f.write(reload_config)

In [None]:
!cat "joeynmt/configs/transformer_lg_rw_lhen_reload2.yaml"


name: "lg_rw_lhen_reverse_transformer"

data:
    src: "lg_rw_lh"
    trg: "en"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued/150000.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau"

In [None]:
# Train continued
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload2.yaml

2021-07-14 13:01:03,868 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-14 13:01:03,939 - INFO - joeynmt.data - Loading training data...
2021-07-14 13:01:21,005 - INFO - joeynmt.data - Building vocabulary...
2021-07-14 13:01:22,065 - INFO - joeynmt.data - Loading dev data...
2021-07-14 13:01:24,475 - INFO - joeynmt.data - Loading test data...
2021-07-14 13:01:26,043 - INFO - joeynmt.data - Data loaded.
2021-07-14 13:01:26,043 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-14 13:01:26,475 - INFO - joeynmt.model - Enc-dec model built.
2021-07-14 13:01:26.790320: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-14 13:01:30,874 - INFO - joeynmt.training - Total params: 12179456
2021-07-14 13:01:39,587 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued/150000.

7 epochs completed

In [None]:
# Print the validation scores (loss, perplexity, BLEU) logged during the first training run
! cat "joeynmt/models/lg_rw_lhen_reverse_transformer/validations.txt"

Steps: 5000	Loss: 294844.96875	PPL: 26.54260	bleu: 3.53660	LR: 0.00030000	*
Steps: 10000	Loss: 262393.87500	PPL: 18.50215	bleu: 5.77667	LR: 0.00030000	*
Steps: 15000	Loss: 244500.31250	PPL: 15.16372	bleu: 8.66136	LR: 0.00030000	*
Steps: 20000	Loss: 232730.06250	PPL: 13.30337	bleu: 10.26265	LR: 0.00030000	*
Steps: 25000	Loss: 224283.95312	PPL: 12.11076	bleu: 11.83755	LR: 0.00030000	*
Steps: 30000	Loss: 218069.34375	PPL: 11.30208	bleu: 12.22675	LR: 0.00030000	*
Steps: 35000	Loss: 212952.45312	PPL: 10.67693	bleu: 13.58401	LR: 0.00030000	*
Steps: 40000	Loss: 211624.50000	PPL: 10.52042	bleu: 13.86809	LR: 0.00030000	*
Steps: 45000	Loss: 206207.64062	PPL: 9.90541	bleu: 13.85557	LR: 0.00030000	*
Steps: 50000	Loss: 203779.25000	PPL: 9.64150	bleu: 14.47956	LR: 0.00030000	*
Steps: 55000	Loss: 200319.60938	PPL: 9.27762	bleu: 15.54748	LR: 0.00030000	*
Steps: 60000	Loss: 199524.12500	PPL: 9.19591	bleu: 15.40850	LR: 0.00030000	*
Steps: 65000	Loss: 198351.85938	PPL: 9.07681	bleu: 14.63837	LR: 0.000300

In [None]:
! cat "joeynmt/models/lg_rw_lhen_reverse_transformer_continued/validations.txt"

Steps: 125000	Loss: 185210.06250	PPL: 7.84270	bleu: 17.83400	LR: 0.00030000	*
Steps: 130000	Loss: 184858.45312	PPL: 7.81209	bleu: 17.96464	LR: 0.00030000	*
Steps: 135000	Loss: 184420.81250	PPL: 7.77417	bleu: 17.77071	LR: 0.00030000	*
Steps: 140000	Loss: 183049.64062	PPL: 7.65653	bleu: 18.16814	LR: 0.00030000	*
Steps: 145000	Loss: 182512.57812	PPL: 7.61094	bleu: 18.16392	LR: 0.00030000	*
Steps: 150000	Loss: 182100.42188	PPL: 7.57613	bleu: 18.47894	LR: 0.00030000	*


In [None]:
! cat "joeynmt/models/lg_rw_lhen_reverse_transformer_continued2/validations.txt"

Steps: 155000	Loss: 181684.31250	PPL: 7.54116	bleu: 18.17956	LR: 0.00030000	*
Steps: 160000	Loss: 181753.37500	PPL: 7.54695	bleu: 18.35824	LR: 0.00030000	
Steps: 165000	Loss: 180658.29688	PPL: 7.45561	bleu: 18.38394	LR: 0.00030000	*
Steps: 170000	Loss: 180725.34375	PPL: 7.46117	bleu: 18.63909	LR: 0.00030000	
Steps: 175000	Loss: 181152.57812	PPL: 7.49670	bleu: 18.44216	LR: 0.00030000	
Steps: 180000	Loss: 179871.68750	PPL: 7.39067	bleu: 18.46554	LR: 0.00030000	*
Steps: 185000	Loss: 179396.95312	PPL: 7.35176	bleu: 18.45346	LR: 0.00030000	*
Steps: 190000	Loss: 178744.23438	PPL: 7.29859	bleu: 18.66191	LR: 0.00030000	*
Steps: 195000	Loss: 178519.01562	PPL: 7.28034	bleu: 18.88726	LR: 0.00030000	*
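To compare runs without reading the logs by eye, the validation lines can be parsed into numbers. The sketch below is an illustration: `parse_validation_line` is a hypothetical helper, the two sample lines are copied from the output above, and the relation PPL = exp(loss / number of target tokens) is an assumption about how the logged values are related.

```python
import math
import re

def parse_validation_line(line):
    """Parse one line of validations.txt into a dict of numbers."""
    fields = dict(re.findall(r"(\w+): ([\d.]+)", line))
    return {
        "steps": int(fields["Steps"]),
        "loss": float(fields["Loss"]),
        "ppl": float(fields["PPL"]),
        "bleu": float(fields["bleu"]),
        "lr": float(fields["LR"]),
        # A trailing '*' appears on lines where the run reached a new best score
        "is_best": line.rstrip().endswith("*"),
    }

sample_lines = [
    "Steps: 190000\tLoss: 178744.23438\tPPL: 7.29859\tbleu: 18.66191\tLR: 0.00030000\t*",
    "Steps: 195000\tLoss: 178519.01562\tPPL: 7.28034\tbleu: 18.88726\tLR: 0.00030000\t*",
]
records = [parse_validation_line(l) for l in sample_lines]
best = max(records, key=lambda r: r["bleu"])
print(best["steps"], best["bleu"])

# Under the assumed relation, loss / ln(PPL) estimates the number of dev tokens
implied_tokens = [r["loss"] / math.log(r["ppl"]) for r in records]
print([round(t) for t in implied_tokens])
```

If the assumed relation holds, the two implied token counts should agree closely, since the development set does not change between validations.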


In [None]:
!cd joeynmt; python -m joeynmt test 'models/lg_rw_lhen_reverse_transformer_continued2/config.yaml'

2021-07-14 21:30:33,594 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-14 21:30:33,598 - INFO - joeynmt.data - Building vocabulary...
2021-07-14 21:30:34,314 - INFO - joeynmt.data - Loading dev data...
2021-07-14 21:30:34,978 - INFO - joeynmt.data - Loading test data...
2021-07-14 21:30:35,641 - INFO - joeynmt.data - Data loaded.
2021-07-14 21:30:35,711 - INFO - joeynmt.prediction - Process device: cuda, n_gpu: 1, batch_size per device: 5000 (with beam_size)
2021-07-14 21:30:46,753 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-14 21:30:47,121 - INFO - joeynmt.model - Enc-dec model built.
2021-07-14 21:30:47,193 - INFO - joeynmt.prediction - Decoding on dev set (/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe.en)...
2021-07-14 21:32:30,406 - INFO - joeynmt.prediction -  dev bleu[13a]:  19.21 [Beam search decoding with beam size = 5 and alpha = 1.0]
2021-07-14 21:32:30,406 - INFO - joeynmt.prediction - Decoding on test s

In [None]:
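def empty_counter(x):
  # Return the 0-based indices of the empty lines in file x
  empty = []

  # Open the file in a context manager so that it is closed afterwards
  with open(x, "r") as infile:
    for i, line in enumerate(infile):
      if not line.strip():
        empty.append(i)

  return empty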

# Reference: https://thispointer.com/python-how-to-delete-specific-lines-in-a-file-in-a-memory-efficient-way/
def delete_multiple_lines(original_file, line_numbers):
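    """In a file, delete the lines whose 0-based line numbers are in the given list"""
    # Use a set for fast membership tests on large files
    line_numbers = set(line_numbers)
    is_skipped = False
    counter = 0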
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exist in list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1
    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)

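# Indices of the empty lines in the concatenated training target file
x = empty_counter("train.lg_rw_lh")
x

# Remove those lines from both sides to keep the parallel data aligned
delete_multiple_lines("train.lg_rw_lh",x)
delete_multiple_lines("train.en",x)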

# Remove from the English references the lines whose test source sentence is empty
delete_multiple_lines("test3.en",empty_counter("test3.rw"))

delete_multiple_lines("test2.en",empty_counter("test2.lg"))
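An alternative to collecting line numbers and deleting them file by file is to filter both sides of a parallel corpus in one pass, which keeps the two files aligned by construction. This is a sketch of that idea, not the procedure used above; `drop_empty_pairs` is a hypothetical helper, and the demo uses placeholder text in temporary files.

```python
import os
import tempfile

def drop_empty_pairs(src_in, tgt_in, src_out, tgt_out):
    """Copy a parallel corpus, skipping pairs where either side is blank."""
    kept = 0
    with open(src_in, encoding="utf-8") as fs, \
         open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, \
         open(tgt_out, "w", encoding="utf-8") as out_t:
        # zip stops at the shorter file, so check line counts beforehand
        for src_line, tgt_line in zip(fs, ft):
            if src_line.strip() and tgt_line.strip():
                out_s.write(src_line)
                out_t.write(tgt_line)
                kept += 1
    return kept

tmp = tempfile.mkdtemp()
paths = {k: os.path.join(tmp, k) for k in ("in.src", "in.tgt", "out.src", "out.tgt")}
with open(paths["in.src"], "w", encoding="utf-8") as f:
    f.write("a\n\nc\nd\n")
with open(paths["in.tgt"], "w", encoding="utf-8") as f:
    f.write("A\nB\n\nD\n")

n_kept = drop_empty_pairs(paths["in.src"], paths["in.tgt"],
                          paths["out.src"], paths["out.tgt"])
with open(paths["out.src"], encoding="utf-8") as f:
    kept_src = f.read()
with open(paths["out.tgt"], encoding="utf-8") as f:
    kept_tgt = f.read()
print(n_kept)
```

Writing to new output files instead of editing in place also leaves the original data untouched if something goes wrong.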

In [None]:
# Build a config that resumes training from checkpoint 195000 of the second continued run
ckpt_number = 195000
reload_config = config.replace(
    f'#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"', 
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued2/{ckpt_number}.ckpt"').replace(
        f'model_dir: "models/lg_rw_lhen_reverse_transformer"', f'model_dir: "models/lg_rw_lhen_reverse_transformer_continued3"').replace(
        f'epochs: 30', f'epochs: 4')
        
with open("joeynmt/configs/transformer_{name}_reload3.yaml".format(name=name),'w') as f:
    f.write(reload_config)

In [None]:
!cat "joeynmt/configs/transformer_lg_rw_lhen_reload3.yaml"


name: "lg_rw_lhen_reverse_transformer"

data:
    src: "lg_rw_lh"
    trg: "en"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued2/195000.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau

In [None]:
# Train continued
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload3.yaml

2021-07-22 09:03:19,993 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-22 09:03:20,030 - INFO - joeynmt.data - Loading training data...
2021-07-22 09:03:34,833 - INFO - joeynmt.data - Building vocabulary...
2021-07-22 09:03:35,151 - INFO - joeynmt.data - Loading dev data...
2021-07-22 09:03:35,275 - INFO - joeynmt.data - Loading test data...
2021-07-22 09:03:35,290 - INFO - joeynmt.data - Data loaded.
2021-07-22 09:03:35,291 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-22 09:03:35,529 - INFO - joeynmt.model - Enc-dec model built.
2021-07-22 09:03:35.788189: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-22 09:03:39,648 - INFO - joeynmt.training - Total params: 12179456
2021-07-22 09:03:42,522 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued2/195000

In [None]:
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued3/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.lh" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation.bpe.lh_en"

2021-07-26 08:04:54,859 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 08:05:04,977 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 08:05:05,387 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
!cat "translation.bpe.lh_en" | sacrebleu "test1.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 8.8 41.2/14.3/5.6/2.8 (BP = 0.891 ratio = 0.897 hyp_len = 23360 ref_len = 26044)


In [None]:
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued3/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.rw" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation.bpe.rw_en"

2021-07-26 08:10:58,707 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 08:11:01,758 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 08:11:02,027 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
!cat "translation.bpe.rw_en" | sacrebleu "test3.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 36.8 68.0/46.0/34.8/27.4 (BP = 0.884 ratio = 0.890 hyp_len = 37790 ref_len = 42439)


In [None]:
# Translate the Luganda (lg) test set with the continued3 model
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued3/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.lg" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation.bpe.lg_en"

2021-07-26 08:18:24,485 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 08:18:27,528 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 08:18:27,792 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
# Score the Luganda translations against the reference test2.en with sacreBLEU
!cat "translation.bpe.lg_en" | sacrebleu "test2.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 35.4 65.6/43.6/32.5/25.3 (BP = 0.905 ratio = 0.909 hyp_len = 39211 ref_len = 43116)
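
The six cells above repeat one translate/score pattern per source language. A hedged sketch of how those commands could be generated in a loop is shown below. It only builds the command strings and does not run them. The helper `build_commands` and the constants are hypothetical names, and the relative paths in the scoring command assume the working directory is the `Multilingual` folder, as the cells above appear to.

```python
import shlex

# Absolute data directory used by the translate cells above.
DATA_DIR = "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual"
# Language code -> English reference file, as paired in the scoring cells above.
REFERENCES = {"lh": "test1.en", "rw": "test3.en", "lg": "test2.en"}


def build_commands(model_name, prefix="translation"):
    """Return (lang, translate_cmd, score_cmd) tuples, one per source language."""
    config_path = shlex.quote(f"models/{model_name}/config.yaml")
    commands = []
    for lang, reference in REFERENCES.items():
        hypothesis = f"{prefix}.bpe.{lang}_en"
        # Quote paths because the Drive folder name contains a space.
        source = shlex.quote(f"{DATA_DIR}/test.bpe.{lang}")
        output = shlex.quote(f"{DATA_DIR}/{hypothesis}")
        translate_cmd = (
            f"cd joeynmt; python -m joeynmt translate {config_path} "
            f"< {source} > {output}"
        )
        score_cmd = f"cat {shlex.quote(hypothesis)} | sacrebleu {shlex.quote(reference)}"
        commands.append((lang, translate_cmd, score_cmd))
    return commands


for lang, translate_cmd, score_cmd in build_commands("lg_rw_lhen_reverse_transformer_continued3"):
    print(f"[{lang}] {translate_cmd}")
    print(f"[{lang}] {score_cmd}")
```

In a notebook, each generated string could then be passed to the shell (for example with `!{translate_cmd}`), which keeps the language-to-reference pairing in one place.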


In [None]:
# Build a config that resumes training from checkpoint `{ckpt_number}.ckpt`
# of the previous run (continued3) and saves the new run to continued4.
ckpt_number = 225000
reload_config = config.replace(
    '#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"',
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued3/{ckpt_number}.ckpt"',
).replace(
    'model_dir: "models/lg_rw_lhen_reverse_transformer"',
    'model_dir: "models/lg_rw_lhen_reverse_transformer_continued4"',
)

with open(f"joeynmt/configs/transformer_{name}_reload4.yaml", "w") as f:
    f.write(reload_config)
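
One caveat with the chained `str.replace` calls above: if a search string does not occur in `config` (for example after a path or epoch value changes), `replace` silently returns the text unchanged and training restarts with the wrong settings. A minimal sketch of a stricter helper follows. The name `replace_checked` and the toy config are hypothetical and only illustrate the idea.

```python
def replace_checked(text: str, old: str, new: str) -> str:
    """Like str.replace, but raise if `old` does not occur in `text`."""
    if old not in text:
        raise ValueError(f"pattern not found in config: {old!r}")
    return text.replace(old, new)


# Toy stand-in for the real `config` string (illustrative values only).
toy_config = '#load_model: "old/1.ckpt"\nmodel_dir: "models/run"\nepochs: 30\n'

reloaded = replace_checked(
    toy_config, '#load_model: "old/1.ckpt"', 'load_model: "models/run/225000.ckpt"'
)
reloaded = replace_checked(
    reloaded, 'model_dir: "models/run"', 'model_dir: "models/run_continued"'
)
print(reloaded)
```

With this helper, a typo in the search string fails immediately instead of producing a config that quietly ignores the checkpoint.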

In [None]:
# Continue training from the reloaded checkpoint
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload4.yaml

2021-07-22 11:04:01,211 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-22 11:04:01,237 - INFO - joeynmt.data - Loading training data...
2021-07-22 11:04:14,924 - INFO - joeynmt.data - Building vocabulary...
2021-07-22 11:04:15,256 - INFO - joeynmt.data - Loading dev data...
2021-07-22 11:04:15,383 - INFO - joeynmt.data - Loading test data...
2021-07-22 11:04:15,398 - INFO - joeynmt.data - Data loaded.
2021-07-22 11:04:15,398 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-22 11:04:15,634 - INFO - joeynmt.model - Enc-dec model built.
2021-07-22 11:04:15.849913: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-22 11:04:19,270 - INFO - joeynmt.training - Total params: 12179456
2021-07-22 11:04:22,079 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued3/225000

In [None]:
# Build a config that resumes training from checkpoint `{ckpt_number}.ckpt`
# of the previous run (continued4), saves the new run to continued5,
# and sets the epoch limit to 18.
ckpt_number = 310000
reload_config = config.replace(
    '#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"',
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued4/{ckpt_number}.ckpt"',
).replace(
    'model_dir: "models/lg_rw_lhen_reverse_transformer"',
    'model_dir: "models/lg_rw_lhen_reverse_transformer_continued5"',
).replace('epochs: 30', 'epochs: 18')

with open(f"joeynmt/configs/transformer_{name}_reload5.yaml", "w") as f:
    f.write(reload_config)

In [None]:
!cat "joeynmt/configs/transformer_lg_rw_lhen_reload5.yaml"


name: "lg_rw_lhen_reverse_transformer"

data:
    src: "lg_rw_lh"
    trg: "en"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued4/310000.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau

In [None]:
# Continue training from the reloaded checkpoint
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload5.yaml

2021-07-25 07:21:54,318 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-25 07:21:54,392 - INFO - joeynmt.data - Loading training data...
2021-07-25 07:22:09,628 - INFO - joeynmt.data - Building vocabulary...
2021-07-25 07:22:10,160 - INFO - joeynmt.data - Loading dev data...
2021-07-25 07:22:10,868 - INFO - joeynmt.data - Loading test data...
2021-07-25 07:22:11,674 - INFO - joeynmt.data - Data loaded.
2021-07-25 07:22:11,674 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-25 07:22:12,053 - INFO - joeynmt.model - Enc-dec model built.
2021-07-25 07:22:12.301664: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-25 07:22:16,034 - INFO - joeynmt.training - Total params: 12179456
2021-07-25 07:22:26,804 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued4/310000

In [None]:
# Build a config that resumes training from checkpoint `{ckpt_number}.ckpt`
# of the previous run (continued5), saves the new run to continued6,
# and sets the epoch limit to 2.
ckpt_number = 450000
reload_config = config.replace(
    '#load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/models/lg_rw_lhen_transformer/1.ckpt"',
    f'load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued5/{ckpt_number}.ckpt"',
).replace(
    'model_dir: "models/lg_rw_lhen_reverse_transformer"',
    'model_dir: "models/lg_rw_lhen_reverse_transformer_continued6"',
).replace('epochs: 30', 'epochs: 2')

with open(f"joeynmt/configs/transformer_{name}_reload6.yaml", "w") as f:
    f.write(reload_config)

In [None]:
!cat "joeynmt/configs/transformer_lg_rw_lhen_reload6.yaml"


name: "lg_rw_lhen_reverse_transformer"

data:
    src: "lg_rw_lh"
    trg: "en"
    train: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/train.bpe"
    dev:   "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe"
    test:  "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"
    trg_vocab: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    load_model: "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued5/450000.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau

In [None]:
# Continue training from the reloaded checkpoint
!cd joeynmt; python3 -m joeynmt train configs/transformer_lg_rw_lhen_reload6.yaml

2021-07-25 14:33:18,079 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-25 14:33:18,166 - INFO - joeynmt.data - Loading training data...
2021-07-25 14:33:36,505 - INFO - joeynmt.data - Building vocabulary...
2021-07-25 14:33:37,627 - INFO - joeynmt.data - Loading dev data...
2021-07-25 14:33:41,987 - INFO - joeynmt.data - Loading test data...
2021-07-25 14:33:43,510 - INFO - joeynmt.data - Data loaded.
2021-07-25 14:33:43,510 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-25 14:33:43,930 - INFO - joeynmt.model - Enc-dec model built.
2021-07-25 14:33:44.177548: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-25 14:33:48,669 - INFO - joeynmt.training - Total params: 12179456
2021-07-25 14:33:57,385 - INFO - joeynmt.training - Loading model from /content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/joeynmt/models/lg_rw_lhen_reverse_transformer_continued5/450000

In [None]:
# Evaluate the continued6 model on the dev and test sets
!cd joeynmt; python -m joeynmt test 'models/lg_rw_lhen_reverse_transformer_continued6/config.yaml'

2021-07-26 09:33:10,321 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 09:33:10,339 - INFO - joeynmt.data - Building vocabulary...
2021-07-26 09:33:11,544 - INFO - joeynmt.data - Loading dev data...
2021-07-26 09:33:13,098 - INFO - joeynmt.data - Loading test data...
2021-07-26 09:33:14,166 - INFO - joeynmt.data - Data loaded.
2021-07-26 09:33:14,231 - INFO - joeynmt.prediction - Process device: cuda, n_gpu: 1, batch_size per device: 5000 (with beam_size)
2021-07-26 09:33:23,372 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 09:33:23,780 - INFO - joeynmt.model - Enc-dec model built.
2021-07-26 09:33:23,864 - INFO - joeynmt.prediction - Decoding on dev set (/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/dev.bpe.en)...
2021-07-26 09:37:19,657 - INFO - joeynmt.prediction -  dev bleu[13a]:  20.81 [Beam search decoding with beam size = 5 and alpha = 1.0]
2021-07-26 09:37:19,657 - INFO - joeynmt.prediction - Decoding on test s

In [None]:
# Translate the Luhya (lh) test set with the continued6 model
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued6/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.lh" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation2.bpe.lh_en"

2021-07-26 09:39:09,755 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 09:39:12,630 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 09:39:12,889 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
# Score the Luhya translations against the reference test1.en with sacreBLEU
!cat "translation2.bpe.lh_en" | sacrebleu "test1.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 10.2 41.6/15.1/6.5/3.4 (BP = 0.944 ratio = 0.946 hyp_len = 24628 ref_len = 26044)


In [None]:
# Translate the Kinyarwanda (rw) test set with the continued6 model
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued6/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.rw" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation2.bpe.rw_en"

2021-07-26 09:41:04,443 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 09:41:07,329 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 09:41:07,598 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
# Score the Kinyarwanda translations against the reference test3.en with sacreBLEU
!cat "translation2.bpe.rw_en" | sacrebleu "test3.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 38.2 69.0/47.4/36.5/29.1 (BP = 0.885 ratio = 0.891 hyp_len = 37818 ref_len = 42439)


In [None]:
# Translate the Luganda (lg) test set with the continued6 model
!cd joeynmt; python -m joeynmt translate 'models/lg_rw_lhen_reverse_transformer_continued6/config.yaml' < "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/test.bpe.lg" > "/content/gdrive/Shared drives/NMT_for_African_Language/Multilingual/translation2.bpe.lg_en"

2021-07-26 09:42:55,100 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-07-26 09:42:57,936 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-07-26 09:42:58,207 - INFO - joeynmt.model - Enc-dec model built.


In [None]:
# Score the Luganda translations against the reference test2.en with sacreBLEU
!cat "translation2.bpe.lg_en" | sacrebleu "test2.en"

sacreBLEU: That's 100 lines that end in a tokenized period ('.')
sacreBLEU: It looks like you forgot to detokenize your test data, which may hurt your score.
sacreBLEU: If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = 37.1 66.9/45.4/34.3/27.1 (BP = 0.904 ratio = 0.909 hyp_len = 39177 ref_len = 43116)
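
To compare the two evaluated models (continued3 and continued6) per language, the BLEU value can be pulled out of each sacreBLEU score line. The sketch below copies the six score lines from the outputs in this notebook and prints the change per language. The regular expression and the names `parse_bleu`, `CONTINUED3` and `CONTINUED6` are our own and assume the sacreBLEU 1.5.1 output format shown above.

```python
import re

PREFIX = "BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1 = "

# Score lines copied from the sacreBLEU outputs in this notebook.
CONTINUED3 = {
    "lh": PREFIX + "8.8 41.2/14.3/5.6/2.8 (BP = 0.891 ratio = 0.897 hyp_len = 23360 ref_len = 26044)",
    "rw": PREFIX + "36.8 68.0/46.0/34.8/27.4 (BP = 0.884 ratio = 0.890 hyp_len = 37790 ref_len = 42439)",
    "lg": PREFIX + "35.4 65.6/43.6/32.5/25.3 (BP = 0.905 ratio = 0.909 hyp_len = 39211 ref_len = 43116)",
}
CONTINUED6 = {
    "lh": PREFIX + "10.2 41.6/15.1/6.5/3.4 (BP = 0.944 ratio = 0.946 hyp_len = 24628 ref_len = 26044)",
    "rw": PREFIX + "38.2 69.0/47.4/36.5/29.1 (BP = 0.885 ratio = 0.891 hyp_len = 37818 ref_len = 42439)",
    "lg": PREFIX + "37.1 66.9/45.4/34.3/27.1 (BP = 0.904 ratio = 0.909 hyp_len = 39177 ref_len = 43116)",
}


def parse_bleu(line: str) -> float:
    """Extract the corpus BLEU score: the first number that follows '= '."""
    match = re.search(r"= (\d+(?:\.\d+)?) ", line)
    if match is None:
        raise ValueError(f"no BLEU score found in: {line!r}")
    return float(match.group(1))


for lang in ("lh", "rw", "lg"):
    before = parse_bleu(CONTINUED3[lang])
    after = parse_bleu(CONTINUED6[lang])
    print(f"{lang}: {before:.1f} -> {after:.1f} ({after - before:+.1f})")
# lh: 8.8 -> 10.2 (+1.4)
# rw: 36.8 -> 38.2 (+1.4)
# lg: 35.4 -> 37.1 (+1.7)
```

All three languages improve with the additional training, while Luhya stays far below Kinyarwanda and Luganda.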
