Files related to the publication "A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions" at EACL 2017
antot/neural_vs_-phrasebased_smt_eacl17
This repository contains files related to the publication:

Antonio Toral and Víctor M. Sánchez-Cartagena. A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions. 15th EACL Conference. 2017.

-------------------
Contents (folders):
-------------------

code/
    Code developed (Python):
    - Stemmer
    - Tokenizer
    - Scores by length

data*/
    URLs to download the monolingual and parallel training data used

third/
    Third-party code used:
    - chrF evaluation metric
    - Czech stemmer
    - hjerson
    - Moses v3 scripts

references/
    References of WMT16 in their original format (sgm) and processed:
    - tokenised (.tok)
    - truecased (.true)
    - stemmed (.base). Czech stemming has 2 variants (-aggresive, -light)

systems/
    MT outputs of the best submissions at WMT16 in their original format (sgm) and processed:
    - tokenised (.tok)
    - truecased (.true)
    - stemmed (.base). Czech stemming has 2 variants (-aggresive, -light)

overlaps/
    Output of the output similarity experiment (Section 3)

scores_*length/
    Outputs of the experiment regarding sentence length (Section 6)

hjerson/
    Outputs of the error categories experiment (Section 7) in different formats:
    - .cats
    - .errs
    - .html
    - .out (used to report results in the paper, Tables 7 and 8)
    - .sents

-----------------
Contents (files):
-----------------

preprocessing.sh
    Data preprocessing:
    - desgmise
    - train monolingual data
    - train parallel data
    - reference translations and MT outputs

experiments.sh
    Experiments:
    - Output similarity (Section 3). Function overlap_metric
    - Fluency (Section 4). VSC TODO
    - Reordering (Section 5). VSC TODO
    - Sentence length (Section 6). Function scores_by_length
    - Error categories (Section 7). Function hjerson

systems_list.txt
    List and description of the machine translation systems used in the experiments

russian_stem_fix.txt
    Instructions to fix the output of the stemmer used for Russian
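
The sentence length experiment (Section 6) scores MT output separately for sentences of different lengths. The following is a minimal Python sketch of that general idea, not the code shipped in code/ or the scores_by_length function in experiments.sh. The function names bucket_by_length and unigram_precision, the bucket boundaries, and the toy unigram-precision score are all assumptions made for illustration; the paper's experiments rely on the metrics listed under third/ (for example chrF), not on this score.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def bucket_by_length(references, hypotheses, upper_bounds=(10, 20, 30, 40)):
    """Group (reference, hypothesis) pairs by reference length in tokens.

    upper_bounds holds inclusive upper limits for each bucket; anything longer
    lands in a final open-ended bucket. The bounds are illustrative only and
    are not taken from the paper.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be parallel")
    buckets = defaultdict(list)
    for ref, hyp in zip(references, hypotheses):
        n_tokens = len(ref.split())  # whitespace tokens of tokenised text
        idx = bisect_left(upper_bounds, n_tokens)
        buckets[idx].append((ref, hyp))
    return dict(buckets)


def unigram_precision(pairs):
    """Toy stand-in score: clipped unigram precision over a list of pairs."""
    matched = total = 0
    for ref, hyp in pairs:
        ref_counts = Counter(ref.split())
        for tok in hyp.split():
            total += 1
            if ref_counts[tok] > 0:
                ref_counts[tok] -= 1  # each reference token matches once
                matched += 1
    return matched / total if total else 0.0


def scores_per_bucket(references, hypotheses, upper_bounds=(10, 20, 30, 40)):
    """Return {bucket index: score} for every non-empty length bucket."""
    buckets = bucket_by_length(references, hypotheses, upper_bounds)
    return {idx: unigram_precision(pairs) for idx, pairs in sorted(buckets.items())}
```

To use it, read a tokenised reference file (.tok) and the matching tokenised system output line by line into two parallel lists and pass them to scores_per_bucket. Bucket index 0 covers references of up to 10 tokens, index 1 covers 11 to 20 tokens, and so on, with the last index collecting everything longer than the final bound.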