Efficiently translating Latin to English using a sequence-to-sequence transformer augmented with learnable morphologically-derived grammatical embeddings. 🚀

UE2020/DeclineFormer


DeclineFormer

Efficiently translating Latin to English using a sequence-to-sequence transformer augmented with learnable morphologically-derived grammatical embeddings. 🚀

Usage

Evaluation

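Evaluation takes two steps: converting the Latin sentence to an intermediate representation (IR) with DeclEngine, then translating that IR with the trained model.

First, clone and compile DeclEngine. Then translate a sentence to IR using the test.py script in DeclEngine: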

$ python3 test.py "Qui Deum non audiunt, certe peribunt."
how<SEP><ACC>God<SEP>not<SEP>they hear,<SEP>surely<SEP>they will die.
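The IR string above can be taken apart for inspection. The following is a minimal sketch, not part of DeclEngine: it assumes that segments are delimited by `<SEP>` and that grammatical markers are uppercase tags in angle brackets (only `<ACC>` appears in the example). The name `parse_ir` is hypothetical.

```python
import re

SEP = "<SEP>"
# Assumed tag shape: uppercase letters in angle brackets, e.g. <ACC>.
TAG_RE = re.compile(r"<([A-Z]+)>")


def parse_ir(ir: str):
    """Split an IR string into (tags, text) pairs, one per <SEP>-delimited segment."""
    segments = []
    for raw in ir.split(SEP):
        tags = TAG_RE.findall(raw)          # grammatical tags attached to this segment
        text = TAG_RE.sub("", raw).strip()  # the English gloss without its tags
        segments.append((tags, text))
    return segments


ir = "how<SEP><ACC>God<SEP>not<SEP>they hear,<SEP>surely<SEP>they will die."
for tags, text in parse_ir(ir):
    print(tags, text)
```

Each pair keeps the gloss together with any tags attached to it, which makes it easier to see what grammatical information the model receives alongside the words.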

Second, pass the resulting IR string to the test subcommand together with a model checkpoint:

$ ./target/release/seq2seq test model.pt "how<SEP><ACC>God<SEP>not<SEP>they hear,<SEP>surely<SEP>they will die."
output: <BOS> those who do not listen to god, will they die.<EOS>
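The two steps can be chained from a script. This is a rough sketch under several assumptions: that test.py prints only the IR string on stdout, that DeclEngine is checked out in a directory called `DeclEngine`, and that the seq2seq binary prints a line starting with `output:` as shown above. The helper names `clean_output` and `translate` are hypothetical.

```python
import subprocess


def clean_output(line: str) -> str:
    """Strip the 'output:' prefix and the <BOS>/<EOS> markers from a result line."""
    text = line.strip()
    if text.startswith("output:"):
        text = text[len("output:"):]
    text = text.replace("<BOS>", "").replace("<EOS>", "")
    return " ".join(text.split())  # collapse leftover whitespace


def translate(latin: str, model: str = "model.pt", declengine_dir: str = "DeclEngine") -> str:
    """Run Latin -> IR -> English using the two commands from this README."""
    # Step 1: Latin -> IR (assumes test.py prints only the IR string).
    ir = subprocess.run(
        ["python3", "test.py", latin],
        cwd=declengine_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Step 2: IR -> English with the trained model.
    out = subprocess.run(
        ["./target/release/seq2seq", "test", model, ir],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.startswith("output:"):
            return clean_output(line)
    raise RuntimeError("no 'output:' line found in seq2seq output")


print(clean_output("output: <BOS> those who do not listen to god, will they die.<EOS>"))
```

Only `clean_output` runs here; `translate` needs both tools built and a trained checkpoint on disk.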

Well done! Comparing Google Translate and our result:

| Source | Translation |
| --- | --- |
| GTranslate | Those who do not listen to God will surely perish. |
| Our result | those who do not listen to god, will they die. |

A comparison of translations from the Latin Vulgate (Genesis 8:7):

| Source | Translation |
| --- | --- |
| Latin | qui egrediebatur, et non revertebatur, donec siccarentur aquae super terram. |
| Ground Truth | which went forth and did not return, until the waters were dried up across the earth. |
| GTranslate | who went out and did not return until the waters were dried up on the earth. |
| Our result | and he went out, and he did not return, until the waters were dried up upon the earth. |

Tokenization

Tokenizers can be tested using the test-tok command:

$ ./target/release/seq2seq test-tok tokenizer.json "This is a test." # <tokenizer> <test-sentence>
["Ġthis", "Ġis", "Ġa", "Ġtest", "."]
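The token list can be turned back into readable text. This sketch assumes the tokenizer is a GPT-2 style byte-level BPE tokenizer, in which the character Ġ (U+0120) at the start of a token stands for a preceding space; the README does not state which tokenizer type is used. The function name `detokenize` is hypothetical, and the sketch only handles plain ASCII text, not the full byte-to-character mapping.

```python
SPACE_MARKER = "\u0120"  # 'Ġ', assumed to encode a leading space (byte-level BPE convention)


def detokenize(tokens):
    """Join tokens and turn the space marker back into real spaces (ASCII-only sketch)."""
    return "".join(tokens).replace(SPACE_MARKER, " ").strip()


tokens = ["\u0120this", "\u0120is", "\u0120a", "\u0120test", "."]
print(detokenize(tokens))
```

The leading marker on the first token suggests the tokenizer adds a prefix space to the input, which is why the result is stripped.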

Training

Run the following commands to train the model. TensorBoard logs are written to ./logdir/train:

$ python3 src/model.py # generate torchscripts
$ python3 src/split.py # split data
$ ./target/release/seq2seq train ir.txt en.txt 5000 ir-en.txt false 1 # last parameter is number of hours before quitting
Epoch 1 complete!
Epoch 2 complete!
...
Epoch 18 complete!

Checkpoints are saved to model_<EPOCH>.pt, where <EPOCH> is the epoch number.
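After a run that produces many checkpoints, it can help to pick the most recent one for evaluation. The sketch below assumes `<EPOCH>` is a plain decimal integer (for example `model_18.pt`) and that checkpoints sit in the current directory; `latest_checkpoint` is a hypothetical helper, not part of this repository.

```python
import re
from pathlib import Path

# Assumed file name pattern: model_<integer>.pt
CKPT_RE = re.compile(r"^model_(\d+)\.pt$")


def latest_checkpoint(names):
    """Return the checkpoint name with the highest epoch number, or None if there is none."""
    best = None
    for name in names:
        match = CKPT_RE.match(name)
        if match:
            epoch = int(match.group(1))  # compare numerically so 18 sorts after 9
            if best is None or epoch > best[0]:
                best = (epoch, name)
    return best[1] if best else None


names = [p.name for p in Path(".").glob("model_*.pt")]
print(latest_checkpoint(names))
```

Comparing epochs as integers avoids the lexicographic trap where `model_9.pt` would sort after `model_18.pt`.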
