DeclineFormer

Efficiently translating Latin to English using a sequence-to-sequence transformer augmented with learnable morphologically-derived grammatical embeddings. 🚀

Usage

Evaluation

Evaluation requires two steps.

First, DeclEngine should be cloned and compiled.

A sentence should be translated to IR using the test.py script in DeclEngine:

$ python3 test.py "Qui Deum non audiunt, certe peribunt."
how<SEP><ACC>God<SEP>not<SEP>they hear,<SEP>surely<SEP>they will die.

Then the string should be used with the test subcommand:

$ ./target/release/seq2seq test model.pt "how<SEP><ACC>God<SEP>not<SEP>they hear,<SEP>surely<SEP>they will die."
output: <BOS> those who do not listen to god, will they die.<EOS>

Well done! Comparing Google Translate and our result:

	Translation
GTranslate	Those who do not listen to God will surely perish.
Our result	those who do not listen to god, will they die.

A comparison of translations from the Latin Vulgate (Genesis 8:7):

	Translation
Latin	qui egrediebatur, et non revertebatur, donec siccarentur aquae super terram.
Ground Truth	which went forth and did not return, until the waters were dried up across the earth.
GTranslate	who went out and did not return until the waters were dried up on the earth.
Our result	and he went out, and he did not return, until the waters were dried up upon the earth.

Tokenization

Tokenizers can be tested using the test-tok command:

$ ./target/release/seq2seq test-tok tokenizer.json "This is a test." # <tokenizer> <test-sentence>
["Ġthis", "Ġis", "Ġa", "Ġtest", "."]

Training

Training (tensorboard logs are written to ./logdir/train):

$ python3 src/model.py # generate torchscripts
$ python3 src/split.py # split data
$ ./target/release/seq2seq train ir.txt en.txt 5000 ir-en.txt false 1 # last parameter is number of hours before quitting
Epoch 1 complete!
Epoch 2 complete!
...
Epoch 18 complete!

Checkpoints will be saved to model_<EPOCH>.pt

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
src		src
.gitignore		.gitignore
.gitpod.yml		.gitpod.yml
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
lt.txt		lt.txt
translate.sh		translate.sh
translations.tsv		translations.tsv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DeclineFormer

Usage

Evaluation

Tokenization

Training

About

Releases

Packages

Languages

UE2020/DeclineFormer

Folders and files

Latest commit

History

Repository files navigation

DeclineFormer

Usage

Evaluation

Tokenization

Training

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages