AboutLessie/wmt-2017-cs-en


WMT2017 Czech-English machine translation challenge for news

Translate news articles from Czech into English.

This is the WMT2017 news translation task reformatted as a Gonito.net challenge; all the data were taken from http://www.statmt.org/wmt17/translation-task.html.

BLEU is used as the evaluation metric.
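For readers new to the metric, here is a minimal sketch of corpus-level BLEU: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It assumes one reference per sentence and plain whitespace tokenisation, and applies no smoothing. It is not the scorer Gonito.net actually runs, so its numbers need not match the leaderboard; `corpus_bleu` and `ngrams` are hypothetical helper names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per sentence, whitespace tokenisation.

    Illustrative sketch only: no smoothing, no language-specific tokeniser.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams produced by the system, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by how often it occurs in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # some order has no match, so unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalise output that is shorter than the references overall
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

The brevity penalty is computed over the whole corpus rather than per sentence, which is why BLEU is normally reported at corpus level.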

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • train/ — directory with training data
  • train/commoncrawl.tsv.xz — Common Crawl parallel corpus
  • train/news-commentary-v12.tsv.xz — News Commentary parallel corpus
  • train/europarl-v7.tsv.xz — Europarl parallel corpus
  • dev-0/ — directory with dev (test) data (Newstest 2013)
  • dev-0/in.tsv — Czech input text for the dev set
  • dev-0/expected.tsv — English reference translation for the dev set
  • test-A/ — directory with test data
  • test-A/in.tsv — Czech input data for the test set (WMT2017 test set)
  • test-A/expected.tsv — English reference translation for the test set
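The dev and test directories share the same layout, so one loader can serve both. The sketch below pairs each line of in.tsv with the same line of expected.tsv, assuming UTF-8 files aligned line by line. Writing system output to out.tsv follows the usual Gonito convention but is an assumption here, as are the helper names `load_split` and `write_output`.

```python
from pathlib import Path


def load_split(split_dir):
    """Return (czech_source, english_reference) pairs for a split such as dev-0.

    Assumes in.tsv and expected.tsv are UTF-8 and aligned line by line.
    """
    split = Path(split_dir)
    with open(split / "in.tsv", encoding="utf-8") as f_in, \
         open(split / "expected.tsv", encoding="utf-8") as f_exp:
        sources = [line.rstrip("\n") for line in f_in]
        references = [line.rstrip("\n") for line in f_exp]
    if len(sources) != len(references):
        raise ValueError(f"{split}: {len(sources)} inputs vs {len(references)} references")
    return list(zip(sources, references))


def write_output(split_dir, translations):
    """Write one translation per line to out.tsv (assumed output file name)."""
    with open(Path(split_dir) / "out.tsv", "w", encoding="utf-8") as f:
        for sentence in translations:
            # Keep the one-sentence-per-line alignment even if a system emits newlines
            f.write(sentence.replace("\n", " ") + "\n")
```

For test-A only in.tsv is needed to produce a submission, so in practice the input side would be read on its own there.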

Training sets

All training sets are compressed with xz; use xzcat to decompress them:

$ xzcat train/*.tsv.xz | ...
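When preprocessing happens in Python rather than in a shell pipeline, the standard `lzma` module can stream the same files without decompressing them to disk. This is a minimal sketch of an `xzcat train/*.tsv.xz` equivalent; the function name `iter_training_lines` is hypothetical, and sorting the glob only approximates the shell's file order.

```python
import glob
import lzma


def iter_training_lines(pattern="train/*.tsv.xz"):
    """Yield decompressed lines from every matching file, like `xzcat train/*.tsv.xz`."""
    for path in sorted(glob.glob(pattern)):
        # "rt" mode decompresses and decodes UTF-8 on the fly, with no temporary file
        with lzma.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")
```

Because it is a generator, the corpora are read one line at a time, which matters for a corpus the size of Common Crawl.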

Pairs in which the Czech or the English side is empty were removed from the training sets.
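If you add other parallel data of your own, a check along these lines keeps it consistent with that cleaning. The sketch assumes each line holds exactly two tab-separated columns with Czech first and English second; both the column order and treating whitespace-only sides as empty are assumptions to verify against the actual files. `split_pair` and `clean_pairs` are hypothetical names.

```python
def split_pair(line):
    """Split a training line into (czech, english), or return None if a side is empty.

    Assumes two tab-separated columns, Czech first (unverified column order).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        raise ValueError(f"expected 2 tab-separated columns, got {len(fields)}")
    czech, english = (field.strip() for field in fields)
    # Whitespace-only sides count as empty in this sketch
    if not czech or not english:
        return None
    return czech, english


def clean_pairs(lines):
    """Keep only pairs whose Czech and English sides are both non-empty."""
    return [pair for pair in map(split_pair, lines) if pair is not None]
```

Raising on a wrong column count, rather than silently skipping the line, makes a misaligned file visible early.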

Test sets

Reference English translations in the dev and test sets are not tokenised.

Monolingual data

Monolingual data was not included here.

About

TAU assignment 2019/20 - machine translation from Czech into English using Moses
