<a href="https://colab.research.google.com/github/Shingirai98/Xhosa_English_Translation/blob/main/30_epochs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Mount Google Drive, which is used to store the data and models.

In [25]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Set up the source and target languages using the shorthand language codes from the JW300 JSON data.

In [26]:
# TODO: Set your source and target languages. Keep in mind, these traditionally use language codes as found here:
# These will also become the suffixes of all vocab and corpus files used throughout
import os
source_language = "en"
target_language = "xh" 
lc = False  # If True, lowercase the data.
seed = 42  # Random seed for shuffling.
tag = "baseline_10epochs" # Give a unique name to your folder - this is to ensure you don't overwrite any models you've already submitted

os.environ["src"] = source_language # Sets them in bash as well, since we often use bash scripts
os.environ["tgt"] = target_language
os.environ["tag"] = tag

# This will save it to a folder in our gdrive 
!mkdir -p "/content/drive/My Drive/m/$tgt-$src-$tag"
os.environ["gdrive_path"] = "/content/drive/My Drive/m/%s-%s-%s" % (target_language, source_language, tag)

In [27]:
# Print the Google Drive folder path to confirm it is set

!echo $gdrive_path

/content/drive/My Drive/m/xh-en-baseline_10epochs


In [28]:
# install the opus tools
! pip install opustools-pkg



In [29]:
# Downloading our corpus
! opus_read -d XhosaNavy -s $src -t $tgt -wm moses -w xhosanavy.$src xhosanavy.$tgt -q

# extract the corpus file
! gunzip XhosaNavy_latest_xml_$src-$tgt.xml.gz


Alignment file /proj/nlpl/data/OPUS/XhosaNavy/latest/xml/en-xh.xml.gz not found. The following files are available for downloading:

        ./XhosaNavy_latest_xml_en.zip already exists
        ./XhosaNavy_latest_xml_xh.zip already exists
 379 KB https://object.pouta.csc.fi/OPUS-XhosaNavy/v1/xml/en-xh.xml.gz

 379 KB Total size
./XhosaNavy_latest_xml_en-xh.xml.gz ... 100% of 379 KB
gzip: XhosaNavy_latest_xml_en-xh.xml already exists; do you wish to overwrite (y or n)? y


In [30]:
# Download the global test set.
! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-any.en
  
# And the specific test set for this language pair.
os.environ["trg"] = target_language 
os.environ["src"] = source_language 

! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-$trg.en 
! mv test.en-$trg.en test.en
! wget https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-$trg.$trg 
! mv test.en-$trg.$trg test.$trg

--2021-10-24 08:11:50--  https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-any.en
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 277791 (271K) [text/plain]
Saving to: ‘test.en-any.en.1’


2021-10-24 08:11:51 (18.2 MB/s) - ‘test.en-any.en.1’ saved [277791/277791]

--2021-10-24 08:11:51--  https://raw.githubusercontent.com/juliakreutzer/masakhane/master/jw300_utils/test/test.en-xh.en
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 206162 (201K) [text/plain]
Saving to: ‘test.en-xh.en’


2021-1
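Re-running the download cell makes wget keep the existing file and save the new copy under a numbered name, which is why the log above shows `test.en-any.en.1`. A minimal sketch of an idempotent download, assuming a hypothetical helper `download_once` (not part of the original notebook) that skips files already on disk:

```python
import os
import urllib.request


def download_once(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    `download_once` and its `fetch` parameter are hypothetical names used
    for illustration; `fetch` defaults to the standard-library downloader.
    Returns True if a download happened, False if the file was reused.
    """
    if os.path.exists(dest):
        return False
    fetch(url, dest)
    return True
```

It could be called as `download_once(url, "test.en-any.en")` with the same URL passed to wget above. The `fetch` argument is only there so the function can be exercised without network access.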

In [31]:
# Read the test data to filter from train and dev splits.
# Store english portion in set for quick filtering checks.
en_test_sents = set()
filter_test_sents = "test.en-any.en"
j = 0
with open(filter_test_sents) as f:
  for line in f:
    en_test_sents.add(line.strip())
    j += 1
print('Loaded {} global test sentences to filter from the training/dev data.'.format(j))

Loaded 3571 global test sentences to filter from the training/dev data.


In [32]:
import pandas as pd

# Load the Moses-format parallel files into a dataframe
source_file = 'xhosanavy.' + source_language
target_file = 'xhosanavy.' + target_language

source = []
target = []
# Collect the line numbers of the source portion to skip the same lines for the target portion.
# A set keeps the membership test in the second loop fast.
skip_lines = set()
with open(source_file) as f:
    for i, line in enumerate(f):
        # Skip sentences that are contained in the test set.
        if line.strip() not in en_test_sents:
            source.append(line.strip())
        else:
            skip_lines.add(i)
with open(target_file) as f:
    for j, line in enumerate(f):
        # Only add to corpus if corresponding source was not skipped.
        if j not in skip_lines:
            target.append(line.strip())
    
print('Loaded data and skipped {}/{} lines since contained in test set.'.format(len(skip_lines), i))
    
df = pd.DataFrame(zip(source, target), columns=['source_sentence', 'target_sentence'])

df.head(3)

Loaded data and skipped 1/50097 lines since contained in test set.


| | source_sentence | target_sentence |
|---|---|---|
| 0 | Rope and its Usage | Intambo nomsebenzi ewenzayo . |
| 1 | In this chapter are described the various type... | Kwesi sahluko sixelelwa ngendindi zeentambo at... |
| 2 | The chapter has been divided into seven sectio... | Esi sahluko sahlulwa - hlulwe kasixhenxe , zic... |
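The filtering step above keeps a target line only when its source line survived. A minimal in-memory sketch of the same idea, assuming a hypothetical helper `filter_parallel` that walks both sides in lockstep so the two lists cannot drift out of alignment:

```python
def filter_parallel(source_lines, target_lines, held_out):
    """Drop sentence pairs whose stripped source sentence is in held_out.

    `filter_parallel` is a hypothetical helper for illustration. Iterating
    over zip() keeps source and target aligned without tracking indices.
    """
    kept_source, kept_target, skipped = [], [], 0
    for src, tgt in zip(source_lines, target_lines):
        src = src.strip()
        if src in held_out:
            skipped += 1
            continue
        kept_source.append(src)
        kept_target.append(tgt.strip())
    return kept_source, kept_target, skipped


# Toy placeholder data, not taken from the corpus.
kept_src, kept_tgt, n_skipped = filter_parallel(
    ["a test sentence\n", "keep me\n"],
    ["t1\n", "t2\n"],
    held_out={"a test sentence"},
)
print(kept_src, kept_tgt, n_skipped)
```

Note that `zip()` stops at the shorter input, so a length check on the two files beforehand is a sensible extra guard.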


Preprocessing and Export

In [33]:
# drop duplicate translations
df_pp = df.drop_duplicates()

# drop conflicting translations
# (this is optional and something that you might want to comment out 
# depending on the size of your corpus)
# Reassigning the result instead of using inplace=True avoids pandas' SettingWithCopyWarning.
df_pp = df_pp.drop_duplicates(subset='source_sentence')
df_pp = df_pp.drop_duplicates(subset='target_sentence')

# Shuffle the data to remove bias in dev set selection.
df_pp = df_pp.sample(frac=1, random_state=seed).reset_index(drop=True)
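To make the order of the three deduplication passes explicit, here is a plain-Python sketch on a toy list of pairs. The helper name `dedupe_pairs` is hypothetical, and it assumes the pandas default of keeping the first occurrence of each duplicate:

```python
def dedupe_pairs(pairs):
    """Apply the three passes in order: exact pairs, then source, then target."""

    def keep_first(items, key):
        seen, kept = set(), []
        for item in items:
            k = key(item)
            if k not in seen:
                seen.add(k)
                kept.append(item)
        return kept

    pairs = keep_first(pairs, key=lambda p: p)     # duplicate translations
    pairs = keep_first(pairs, key=lambda p: p[0])  # same source, different target
    pairs = keep_first(pairs, key=lambda p: p[1])  # same target, different source
    return pairs


toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
deduped = dedupe_pairs(toy)
print(deduped)  # [('a', 'x'), ('c', 'z')]
```

The passes are sequential, so a pair removed in the source pass can no longer block another pair in the target pass.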
  


Remove near-duplicate sentences that appear in both the test and training datasets

In [34]:
# Install fuzzywuzzy to remove "almost duplicate" sentences in the
# test and training sets.
! pip install fuzzywuzzy
! pip install python-Levenshtein
import time
from fuzzywuzzy import process
import numpy as np
from os import cpu_count
from functools import partial
from multiprocessing import Pool


# reset the index of the training set after previous filtering
df_pp.reset_index(drop=False, inplace=True)

# Remove samples from the training data set if they "almost overlap" with the
# samples in the test set.

# Filtering function. Adjust pad to narrow down the candidate matches to
# within a certain length of characters of the given sample.
def fuzzfilter(sample, candidates, pad):
  candidates = [x for x in candidates if len(x) <= len(sample)+pad and len(x) >= len(sample)-pad] 
  if len(candidates) > 0:
    return process.extractOne(sample, candidates)[1]
  else:
    return np.nan
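The length window in `fuzzfilter` limits how many candidates get scored. The following sketch shows the same two-stage idea using only the standard library. It swaps in `difflib.SequenceMatcher` for fuzzywuzzy's scorer, so the scores are on a 0 to 100 scale but will not match fuzzywuzzy's values. `best_similarity` is a hypothetical name and the sentences are toy data:

```python
import math
from difflib import SequenceMatcher


def best_similarity(sample, candidates, pad):
    """Return the best 0-100 similarity among length-compatible candidates.

    Stand-in for fuzzfilter: uses difflib instead of fuzzywuzzy, so the
    exact scores differ. Returns NaN when no candidate is within `pad`
    characters of the sample's length.
    """
    window = [c for c in candidates if abs(len(c) - len(sample)) <= pad]
    if not window:
        return math.nan
    return max(100 * SequenceMatcher(None, sample, c).ratio() for c in window)


toy_test_sents = ["The ship left the harbour .", "Hello"]
near = best_similarity("The ship left the harbor .", toy_test_sents, pad=5)
none_in_window = best_similarity(
    "A much longer sentence that has no close match in length", toy_test_sents, pad=5
)
print(near, none_in_window)
```

A larger `pad` scores more candidates per sample, which catches more near-duplicates at the cost of run time.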



Split the parallel corpus into train and dev sets, then save them as separate files


In [35]:
# start_time = time.time()
# # ### Iterating over pandas dataframe rows is not recommended, so let's use multiprocessing to apply the function

# with Pool(cpu_count()-1) as pool:
#     scores = pool.map(partial(fuzzfilter, candidates=list(en_test_sents), pad=5), df_pp['source_sentence'])
# hours, rem = divmod(time.time() - start_time, 3600)
# minutes, seconds = divmod(rem, 60)
# print("done in {}h:{}min:{}seconds".format(hours, minutes, seconds))

# # Filter out "almost overlapping samples"
# df_pp = df_pp.assign(scores=scores)
# df_pp = df_pp[df_pp['scores'] < 95]
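One detail of the threshold filter above is worth checking before re-enabling it: `fuzzfilter` returns NaN when no test sentence falls inside the length window, and any ordering comparison with NaN is false (pandas follows the same rule). A filter written as `score < 95` therefore also drops the rows that had no candidates at all. A small sketch with toy scores, where `keep_row` is a hypothetical helper:

```python
import math


def keep_row(score, threshold=95):
    """Keep a row unless it has a real score at or above the threshold."""
    return math.isnan(score) or score < threshold


toy_scores = [40.0, 97.0, math.nan]

# Plain comparison: NaN < 95 is False, so the NaN row is dropped too.
naive = [s < 95 for s in toy_scores]

# NaN-aware version: rows with no candidates are kept.
aware = [keep_row(s) for s in toy_scores]

print(naive)  # [True, False, False]
print(aware)  # [True, False, True]
```

Whether rows without candidates should be kept is a judgment call; the sketch only makes the choice explicit.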

In [36]:
import csv

# Do the split between dev/train and create parallel corpora
num_dev_patterns = 1000

# Optional: lower case the corpora - this will make it easier to generalize, but without proper casing.
if lc:  # Julia: making lowercasing optional
    df_pp["source_sentence"] = df_pp["source_sentence"].str.lower()
    df_pp["target_sentence"] = df_pp["target_sentence"].str.lower()

# Julia: test sets are already generated
dev = df_pp.tail(num_dev_patterns) # Herman: Error in original
stripped = df_pp.drop(df_pp.tail(num_dev_patterns).index)

with open("train."+source_language, "w") as src_file, open("train."+target_language, "w") as trg_file:
  for index, row in stripped.iterrows():
    src_file.write(row["source_sentence"]+"\n")
    trg_file.write(row["target_sentence"]+"\n")
    
with open("dev."+source_language, "w") as src_file, open("dev."+target_language, "w") as trg_file:
  for index, row in dev.iterrows():
    src_file.write(row["source_sentence"]+"\n")
    trg_file.write(row["target_sentence"]+"\n")
! head train.*
! head dev.*

==> train.bpe.en <==
It is adv@@ is@@ able to st@@ op engines , however , a little later than us@@ ual .
H@@ er@@ m@@ es , of the H@@ igh@@ fl@@ y@@ er class , was conver@@ ted into a se@@ a@@ pl@@ ane carri@@ er just before the war .
C@@ oun@@ ter@@ mine
E@@ ven in the future , ships for the carriage of general cargo will not exc@@ eed a deadweight of 2@@ 5@@ ,000 tons , with the maj@@ ority having dead@@ we@@ ights of about 12@@ ,000 to 1@@ 5@@ ,000 tons .
Their ch@@ ief char@@ ac@@ ter@@ ist@@ ic was speed .
This naval strength dec@@ lin@@ ed a little after the war .
He does not re@@ b@@ uke an in@@ experi@@ enc@@ ed Officer of the Watch who oc@@ cas@@ ion@@ ally c@@ alls him on some tr@@ iv@@ ial pre@@ t@@ ext .
For this rem@@ ar@@ k@@ able instrum@@ ent , could keep ac@@ cur@@ ate time for a long period at sea des@@ p@@ ite chang@@ es of tem@@ per@@ ature and the mo@@ tion of the vessel .
To exp@@ ed@@ ite the ad@@ min@@ is@@ tr@@ ation of a dec@@ eas@@ ed 's est@@ ate , the follo
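The split takes the last `num_dev_patterns` rows of the shuffled data as the dev set and everything before them as training data. A minimal sketch on a plain list, with `split_train_dev` as a hypothetical helper:

```python
def split_train_dev(rows, num_dev):
    """Return (train, dev), where dev is the last num_dev rows."""
    if not 0 < num_dev < len(rows):
        raise ValueError("num_dev must be between 1 and len(rows) - 1")
    return rows[:-num_dev], rows[-num_dev:]


toy_rows = list(range(10))  # stand-in for shuffled sentence pairs
train_rows, dev_rows = split_train_dev(toy_rows, num_dev=3)
print(len(train_rows), len(dev_rows))  # 7 3
```

The guard matters because `rows[:-0]` is an empty list, so a dev size of zero would silently produce an empty training set.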

## Installation of JoeyNMT
JoeyNMT is a simple, minimal neural machine translation package for learning and teaching.

In [37]:


# Install JoeyNMT
! git clone https://github.com/joeynmt/joeynmt.git
! cd joeynmt; pip3 install .
# Install PyTorch with GPU support (the original note mentions v1.7.1, but this command does not pin a version).
! pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html

fatal: destination path 'joeynmt' already exists and is not an empty directory.
Processing /content/joeynmt
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Building wheels for collected packages: joeynmt
  Building wheel for joeynmt (setup.py) ... done
  Created wheel for joeynmt: filename=joeynmt-1.3-py3-none-any.whl size=86029 sha256=4284e98ea9156455c33dedf1a8ec6b1ce70eeadea3b48135cfd4367ee374338a
  Stored in directory: /tmp/pip-ephem-wheel-cache-xpb21ije/wheels/0a/f4/bf/6c9d3b8efbfece6cd209f865be37382b02e7c3584df2e28ca4
Successfully built joeynmt
Installing collected packages: joeynmt
  Attempting uninstall: joeynmt
    

## Preprocessing the data into Subword BPE Tokens

In [38]:
from os import path
os.environ["src"] = source_language # Sets them in bash as well, since we often use bash scripts
os.environ["tgt"] = target_language

# Learn BPEs on the training data.
os.environ["data_path"] = path.join("joeynmt", "data",target_language + source_language ) # Herman! 
! subword-nmt learn-joint-bpe-and-vocab --input train.$src train.$tgt -s 4000 -o bpe.codes.4000 --write-vocabulary vocab.$src vocab.$tgt

# Apply BPE splits to the training, development and test data.
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < train.$src > train.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < train.$tgt > train.bpe.$tgt

! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < dev.$src > dev.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < dev.$tgt > dev.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$src < test.$src > test.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.4000 --vocabulary vocab.$tgt < test.$tgt > test.bpe.$tgt

# Create directory, copy everything we care about to the correct location
! mkdir -p $data_path
! cp train.* $data_path
! cp test.* $data_path
! cp dev.* $data_path
! cp bpe.codes.4000 $data_path
! ls $data_path

! cp train.* "$gdrive_path"
! cp test.* "$gdrive_path"
! cp dev.* "$gdrive_path"
! cp bpe.codes.4000 "$gdrive_path"
! ls "$gdrive_path"

# Create that vocab using build_vocab
! sudo chmod 777 joeynmt/scripts/build_vocab.py
! joeynmt/scripts/build_vocab.py joeynmt/data/$tgt$src/train.bpe.$src joeynmt/data/$tgt$src/train.bpe.$tgt --output_path joeynmt/data/$tgt$src/vocab.txt

# Some output
! echo "BPE isiXhosa Sentences"
! tail -n 5 test.bpe.$tgt
! echo "Combined BPE Vocab"
! tail -n 10 joeynmt/data/$tgt$src/vocab.txt  # Herman

bpe.codes.4000	dev.xh	     test.en-any.en    train.bpe.xh
dev.bpe.en	test.bpe.en  test.en-any.en.1  train.en
dev.bpe.xh	test.bpe.xh  test.xh	       train.xh
dev.en		test.en      train.bpe.en      vocab.txt
bpe.codes.4000	dev.en	     test.bpe.xh     test.en-any.en.1  train.bpe.xh
dev.bpe.en	dev.xh	     test.en	     test.xh	       train.en
dev.bpe.xh	test.bpe.en  test.en-any.en  train.bpe.en      train.xh
BPE isiXhosa Sentences
Oku kw@@ aph@@ umela ekub@@ eni nd@@ id@@ ume njengom@@ ntu ong@@ any@@ anis@@ ekanga .
Xa nd@@ af@@ und@@ a iny@@ aniso , and@@ iz@@ ange nd@@ iph@@ inde nd@@ iv@@ ume uku@@ qhub@@ eka n@@ dis@@ enza oko , naku@@ b@@ eni nd@@ and@@ ihl@@ awulwa um@@ v@@ uzo on@@ c@@ um@@ isayo .
N@@ d@@ ing@@ umzekelo om@@ hle ko@@ ony@@ ana b@@ am ab@@ abini yaye n@@ di@@ ye nd@@ af@@ an@@ elek@@ ela am@@ alung@@ elo ang@@ akumbi eb@@ andl@@ eni .
N@@ d@@ aziwa njengom@@ ntu ony@@ anis@@ ekileyo ngab@@ antu end@@ ish@@ ish@@ ina kunye n@@ abo kwan@@ ab@@ ahl@@ ol@@ i boku@@ m@@ 
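The `@@` markers in the output above show where a word was split: a token ending in `@@` continues into the next token. To read segmented text, or to compare it with plain references, the markers can be removed again. A minimal sketch, assuming the `@@` separator seen in this output (`undo_bpe` is a hypothetical helper):

```python
import re


def undo_bpe(line):
    """Join subword units by removing '@@ ' markers and any trailing '@@'."""
    return re.sub(r"(@@ )|(@@ ?$)", "", line)


segmented = "It is adv@@ is@@ able to st@@ op engines"
restored = undo_bpe(segmented)
print(restored)  # It is advisable to stop engines
```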

In [39]:
# Also copy everything we care about to a mounted location in Google Drive (relevant if running in Colab) at gdrive_path
! cp train.* "$gdrive_path"
! cp test.* "$gdrive_path"
! cp dev.* "$gdrive_path"
! cp bpe.codes.4000 "$gdrive_path"
! ls "$gdrive_path"

bpe.codes.4000	dev.en	     test.bpe.xh     test.en-any.en.1  train.bpe.xh
dev.bpe.en	dev.xh	     test.en	     test.xh	       train.en
dev.bpe.xh	test.bpe.en  test.en-any.en  train.bpe.en      train.xh


In [40]:
name = '%s%s' % (target_language, source_language)
# gdrive_path = os.environ["gdrive_path"]

# Create the config
config = """
name: "{target_language}{source_language}_reverse_transformer"

data:
    src: "{target_language}"
    trg: "{source_language}"
    train: "data/{name}/train.bpe"
    dev:   "data/{name}/dev.bpe"
    test:  "data/{name}/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "data/{name}/vocab.txt"
    trg_vocab: "data/{name}/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    #load_model: "{gdrive_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999] 
    scheduling: "plateau"           
    patience: 5                     
    learning_rate_factor: 0.5       
    learning_rate_warmup: 1000      
    decrease_factor: 0.7
    loss: "crossentropy"
    learning_rate: 0.0003
    learning_rate_min: 0.00000001
    weight_decay: 0.0
    label_smoothing: 0.1
    batch_size: 4096
    batch_type: "token"
    eval_batch_size: 3600
    eval_batch_type: "token"
    batch_multiplier: 1
    early_stopping_metric: "ppl"
    epochs: 30                  
    validation_freq: 1000          
    logging_freq: 100
    eval_metric: "bleu"
    model_dir: "models/{name}_reverse_transformer"
    overwrite: True              
    shuffle: True
    use_cuda: True
    max_output_length: 100
    print_valid_sents: [0, 1, 2, 3]
    keep_last_ckpts: 3

model:
    initializer: "xavier"
    bias_initializer: "zeros"
    init_gain: 1.0
    embed_initializer: "xavier"
    embed_init_gain: 1.0
    tied_embeddings: True
    tied_softmax: True
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4             
        embeddings:
            embedding_dim: 256   
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         
        ff_size: 1024            
        dropout: 0.3
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4              
        embeddings:
            embedding_dim: 256    
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256         
        ff_size: 1024            
        dropout: 0.3
""".format(name=name, gdrive_path=os.environ["gdrive_path"], source_language=source_language, target_language=target_language)
with open("joeynmt/configs/transformer_reverse_{name}.yaml".format(name=name),'w') as f:
    f.write(config)
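The encoder and decoder sizes in the config are linked: multi-head attention splits `hidden_size` evenly across `num_heads`, and the config comment suggests an `ff_size` of four times `hidden_size`. A small sketch that checks these relations for the values used above. `check_transformer_dims` is a hypothetical helper, and the 4x ratio is a convention rather than a requirement:

```python
def check_transformer_dims(hidden_size, num_heads, ff_size, embedding_dim):
    """Return the per-head size and simple consistency flags for one layer stack."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return {
        "head_dim": hidden_size // num_heads,
        "ff_is_4x_hidden": ff_size == 4 * hidden_size,
        "embedding_matches_hidden": embedding_dim == hidden_size,
    }


report = check_transformer_dims(
    hidden_size=256, num_heads=4, ff_size=1024, embedding_dim=256
)
print(report)
```

With the values from this config, each of the 4 heads works on 64 dimensions.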


In [41]:
# Train the model
# You can press Ctrl-C to stop. And then run the next cell to save your checkpoints! 
!cd joeynmt; python3 -m joeynmt train configs/transformer_reverse_$tgt$src.yaml

2021-10-24 08:12:59,810 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-10-24 08:12:59,837 - INFO - joeynmt.data - Loading training data...
2021-10-24 08:13:00,485 - INFO - joeynmt.data - Building vocabulary...
2021-10-24 08:13:00,787 - INFO - joeynmt.data - Loading dev data...
2021-10-24 08:13:00,801 - INFO - joeynmt.data - Loading test data...
2021-10-24 08:13:00,980 - INFO - joeynmt.data - Data loaded.
2021-10-24 08:13:00,980 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-10-24 08:13:01,226 - INFO - joeynmt.model - Enc-dec model built.
2021-10-24 08:13:02,752 - INFO - joeynmt.training - Total params: 12135424
2021-10-24 08:13:05,611 - INFO - joeynmt.helpers - cfg.name                           : xhen_reverse_transformer
2021-10-24 08:13:05,612 - INFO - joeynmt.helpers - cfg.data.src                       : xh
2021-10-24 08:13:05,612 - INFO - joeynmt.helpers - cfg.data.trg                       : en
2021-10-24 08:13:05,612 - INFO - joeynmt.helpers - cf

In [53]:
# Copy the created models from the notebook storage to Google Drive for persistent storage 
!cp -r joeynmt/models/${tgt}${src}_reverse_transformer/ "$gdrive_path/models/${tgt}${src}_reverse_transformer/"

In [50]:
# Quote the path: gdrive_path contains a space ("My Drive").
! touch "$gdrive_path/models/${tgt}${src}_reverse_transformer/validations.txt"
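Because `gdrive_path` contains a space (`My Drive`), the shell splits an unquoted `$gdrive_path` into two arguments. The standard-library `shlex` module approximates shell word splitting closely enough to illustrate the effect. The path below is the one printed earlier in this notebook, and the snippet only inspects strings without touching the file system:

```python
import shlex

drive_path = "/content/drive/My Drive/m/xh-en-baseline_10epochs"

# Unquoted: the command line is split into two arguments after "touch".
unquoted_args = shlex.split("touch " + drive_path)
print(unquoted_args)

# Quoted: the path stays a single argument.
quoted_args = shlex.split("touch " + shlex.quote(drive_path))
print(quoted_args)
```

This is why the other cells in this notebook wrap `$gdrive_path` in double quotes.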


In [54]:
# Output our validation scores (loss, perplexity, BLEU)
! cat "$gdrive_path/models/${tgt}${src}_reverse_transformer/validations.txt"

Steps: 1000	Loss: 112401.50000	PPL: 84.39928	bleu: 0.31721	LR: 0.00030000	*
Steps: 2000	Loss: 99845.14062	PPL: 51.42185	bleu: 0.80554	LR: 0.00030000	*
Steps: 3000	Loss: 91615.40625	PPL: 37.16262	bleu: 1.80184	LR: 0.00030000	*
Steps: 4000	Loss: 85857.57031	PPL: 29.60932	bleu: 2.66721	LR: 0.00030000	*
Steps: 5000	Loss: 81840.14844	PPL: 25.26841	bleu: 3.06048	LR: 0.00030000	*
Steps: 6000	Loss: 78571.08594	PPL: 22.21020	bleu: 4.06724	LR: 0.00030000	*
Steps: 7000	Loss: 75983.10156	PPL: 20.05394	bleu: 5.11519	LR: 0.00030000	*
Steps: 8000	Loss: 74091.54688	PPL: 18.61153	bleu: 5.42913	LR: 0.00030000	*
Steps: 9000	Loss: 72134.07812	PPL: 17.22801	bleu: 6.38756	LR: 0.00030000	*
Steps: 10000	Loss: 70563.35156	PPL: 16.19257	bleu: 6.56053	LR: 0.00030000	*
Steps: 11000	Loss: 69272.80469	PPL: 15.38858	bleu: 6.89533	LR: 0.00030000	*
Steps: 12000	Loss: 67891.14844	PPL: 14.57201	bleu: 7.55864	LR: 0.00030000	*
Steps: 13000	Loss: 66740.70312	PPL: 13.92526	bleu: 8.31587	LR: 0.00030000	*
Steps: 14000	Loss: 6
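Each line of `validations.txt` is a tab-separated record, so the best checkpoint by BLEU can be picked out programmatically. A minimal parsing sketch, using the first three lines of the output above as sample data. `parse_validation_line` is a hypothetical helper, and the field layout is assumed from this output only:

```python
import math
import re

SAMPLE = (
    "Steps: 1000\tLoss: 112401.50000\tPPL: 84.39928\tbleu: 0.31721\tLR: 0.00030000\t*\n"
    "Steps: 2000\tLoss: 99845.14062\tPPL: 51.42185\tbleu: 0.80554\tLR: 0.00030000\t*\n"
    "Steps: 3000\tLoss: 91615.40625\tPPL: 37.16262\tbleu: 1.80184\tLR: 0.00030000\t*\n"
)

FIELD = re.compile(r"(\w+): ([0-9.eE+-]+)")


def parse_validation_line(line):
    """Turn one 'Name: value' record into a dict of floats keyed by lowercase name."""
    return {name.lower(): float(value) for name, value in FIELD.findall(line)}


records = [parse_validation_line(line) for line in SAMPLE.splitlines()]
best = max(records, key=lambda r: r["bleu"])
print(int(best["steps"]), best["bleu"])

# Assumption: PPL = exp(loss / n_tokens), so loss / ln(PPL) estimates the
# number of dev tokens. If the assumption holds, the estimates agree across lines.
token_estimates = [r["loss"] / math.log(r["ppl"]) for r in records]
print([round(t) for t in token_estimates])
```

If the token estimates agree from line to line, that supports reading the reported loss as a sum over dev-set tokens rather than a per-token average.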

In [55]:
# Test our model
! cd joeynmt; python3 -m joeynmt test "$gdrive_path/models/${tgt}${src}_reverse_transformer/config.yaml"

2021-10-24 11:23:14,426 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-10-24 11:23:14,427 - INFO - joeynmt.data - Building vocabulary...
2021-10-24 11:23:14,708 - INFO - joeynmt.data - Loading dev data...
2021-10-24 11:23:14,719 - INFO - joeynmt.data - Loading test data...
2021-10-24 11:23:14,773 - INFO - joeynmt.data - Data loaded.
2021-10-24 11:23:14,799 - INFO - joeynmt.prediction - Process device: cuda, n_gpu: 1, batch_size per device: 3600
2021-10-24 11:23:14,800 - INFO - joeynmt.prediction - Loading model from models/xhen_reverse_transformer/26000.ckpt
2021-10-24 11:23:17,452 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-10-24 11:23:17,698 - INFO - joeynmt.model - Enc-dec model built.
2021-10-24 11:23:17,784 - INFO - joeynmt.prediction - Decoding on dev set (data/xhen/dev.bpe.en)...
2021-10-24 11:24:42,983 - INFO - joeynmt.prediction -  dev bleu[13a]:  14.57 [Beam search decoding with beam size = 5 and alpha = 1.0]
2021-10-24 11:24:42,984 - INFO 

In [59]:
# Human eval test
! cd joeynmt; touch my_input.txt
! cd joeynmt; echo $'Esi sahluko sahlulwa - hlulwe kasixhenxe , zicalulwe ngokulandelayo' > my_input.txt
! cd joeynmt; python3 -m joeynmt translate "$gdrive_path/models/${tgt}${src}_reverse_transformer/config.yaml" < my_input.txt
#! echo myinput.txt

2021-10-24 11:28:44,873 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-10-24 11:28:45,169 - INFO - joeynmt.prediction - Loading model from models/xhen_reverse_transformer/26000.ckpt
2021-10-24 11:28:47,746 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-10-24 11:28:48,025 - INFO - joeynmt.model - Enc-dec model built.
Flooding


In [57]:
# Human eval test
! cd joeynmt; touch my_input.txt
! cd joeynmt; echo $'Ewe' > my_input.txt
! cd joeynmt; python3 -m joeynmt translate "$gdrive_path/models/${tgt}${src}_reverse_transformer/config.yaml" < my_input.txt
#! echo myinput.txt

2021-10-24 11:28:22,736 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-10-24 11:28:23,029 - INFO - joeynmt.prediction - Loading model from models/xhen_reverse_transformer/26000.ckpt
2021-10-24 11:28:25,571 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-10-24 11:28:25,844 - INFO - joeynmt.model - Enc-dec model built.
Apparent .


In [63]:
# Human eval test
! cd joeynmt; touch my_input.txt
! cd joeynmt; echo $'Ndilahlekelwe yi isingxobo' > my_input.txt
! cd joeynmt; python3 -m joeynmt translate "$gdrive_path/models/${tgt}${src}_reverse_transformer/config.yaml" < my_input.txt
#! echo myinput.txt

2021-10-24 11:30:43,874 - INFO - root - Hello! This is Joey-NMT (version 1.3).
2021-10-24 11:30:44,181 - INFO - joeynmt.prediction - Loading model from models/xhen_reverse_transformer/26000.ckpt
2021-10-24 11:30:46,788 - INFO - joeynmt.model - Building an encoder-decoder model...
2021-10-24 11:30:47,076 - INFO - joeynmt.model - Enc-dec model built.
Apparent .
