parfdaWMT15

parfda WMT'15 SMT Datasets

We make available, for research purposes, the English, Czech, Finnish, French, German, and Russian datasets used to build the ParFDA Moses SMT systems:

https://drive.google.com/a/dcu.ie/folderview?id=0B6Jae6trZb1afjJ1T0ZOZlZFZUk0S2R3Z0U3eVdxN2tpQlVwTUgyX0tteVk4TnlhRHVJR2M&usp=sharing

Reference translations for the test set are available from http://www.statmt.org/wmt15/translation-task.html. Results are presented in the following paper from WMT'15 (http://www.statmt.org/wmt15/).

Citation:

Ergun Biçici, Qun Liu, and Andy Way. ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics. In Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, September 2015.

The datasets and the SMT results can serve as a benchmark for SMT research, including work that applies further linguistic processing. The datasets also allow fast deployment of accurate SMT systems.

The language model corpora contain 15M sentences, some of which were selected by ParFDA from LDC Gigaword corpora. Five language pairs use the LDC English Gigaword 5th edition:

Czech - English
Finnish - English
French - English
German - English
Russian - English

One language pair uses the LDC French Gigaword 3rd edition:

English - French

LICENSE: Dublin City University License for Open Data allowing use for research and academic purposes.

