bicici/ParFDAWMT15

parfdaWMT15

parfda WMT'15 SMT Datasets

For research purposes, we make available the English, Czech, Finnish, French, German, and Russian datasets used to build the parfda Moses SMT systems:

https://drive.google.com/a/dcu.ie/folderview?id=0B6Jae6trZb1afjJ1T0ZOZlZFZUk0S2R3Z0U3eVdxN2tpQlVwTUgyX0tteVk4TnlhRHVJR2M&usp=sharing

Reference translations for the test set are available from http://www.statmt.org/wmt15/translation-task.html. Results are presented in the following paper from WMT'15 (http://www.statmt.org/wmt15/).

Citation:

Ergun Biçici, Qun Liu, and Andy Way. ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics. In Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, September 2015.

The datasets and the SMT results can serve as a benchmark for SMT research, including work that applies further linguistic processing. The datasets also allow fast deployment of accurate SMT systems.
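As a starting point for working with the data, the following is a minimal sketch of reading a sentence-aligned corpus. It assumes the source and target sides are stored as two plain-text UTF-8 files with one sentence per line, which is the usual Moses convention but is not confirmed here for these datasets. The function name `read_parallel` and any file names passed to it are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Assumption: line i of src_path is the translation of line i of tgt_path.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

Raising on a length mismatch is a deliberate choice: misaligned parallel files usually indicate a download or preprocessing problem that is better caught before training.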

The language model corpora contain 15M sentences, some of which are selected by ParFDA from LDC Gigaword corpora. Five language pairs use the LDC English Gigaword 5th edition:

  • Czech - English
  • Finnish - English
  • French - English
  • German - English
  • Russian - English

One language pair uses the LDC French Gigaword 3rd edition:

  • English - French

LICENSE: Dublin City University License for Open Data, which allows use for research and academic purposes.
