(Under construction) This repository will host the machine translation models between Spanish and Quechua (Chanka), Aymara, Ashaninka, and Shipibo-Konibo, as well as the new evaluation dataset (Kirika) for Shipibo-Konibo. You can find all the details in the poster or in the paper.
Peru is a multilingual country with a long history of contact between its indigenous languages and Spanish. This context can be exploited for machine translation with multilingual approaches that jointly learn unsupervised subword segmentation and neural machine translation models. The study proposes the first multilingual translation models for four languages spoken in Peru: Aymara, Ashaninka, Quechua, and Shipibo-Konibo, providing both many-to-Spanish and Spanish-to-many models that outperform pairwise baselines in most cases. Training exploited a large English-Spanish dataset for pre-training, monolingual texts with tagged back-translation, and parallel corpora aligned with English. Finally, by fine-tuning the best models, we also assessed their out-of-domain capabilities on two evaluation datasets for Quechua and a new one for Shipibo-Konibo.
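As a rough illustration of the data preparation mentioned above, a common recipe for multilingual NMT is to prepend a language tag to each source line, and to mark synthetic sources produced by back-translation with an extra tag ("tagged back-translation"). The sketch below is hypothetical: the tag tokens (`<quy>`, `<BT>`) and the helper function are illustrative and not necessarily the exact ones used for these models.

```python
# Hypothetical sketch of tagged multilingual training data, assuming:
#  - a per-language tag token (e.g. "<quy>" for Quechua Chanka),
#  - an extra "<BT>" token marking back-translated (synthetic) sources.
# Tag names and the helper are illustrative, not the paper's exact setup.

def tag_example(src: str, lang_tag: str, back_translated: bool = False) -> str:
    """Prepend a language tag (and optionally a back-translation tag)
    to one source-side training sentence."""
    tags = [lang_tag]
    if back_translated:
        # Tagged back-translation: mark synthetic sources so the model
        # can distinguish them from authentic parallel data.
        tags.insert(0, "<BT>")
    return " ".join(tags + [src.strip()])

# Authentic Quechua->Spanish pair:
print(tag_example("allinllachu", "<quy>"))
# Synthetic source obtained via back-translation:
print(tag_example("allinllachu", "<quy>", back_translated=True))
```

With one shared model, the language tag tells the decoder which direction is being translated, which is what allows a single many-to-Spanish (or Spanish-to-many) system to cover all four languages.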
@inproceedings{oncevay-2021-peru,
title = "{P}eru is Multilingual, Its Machine Translation Should Be Too?",
author = "Oncevay, Arturo",
booktitle = "Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.americasnlp-1.22",
pages = "194--201"
}