GitHub - WMT-Metrics-task/wmt20-metrics

Repo to hold code for WMT Metrics Task evaluation

Requirements: Python >= 3.6, numpy and pandas

Optionally, to get the "winners" of a language pair (the metrics not significantly outperformed by any other metric according to the Williams test for statistical significance), you will also need the rpy2 library in Python and the psych package in R.
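For orientation, here is a minimal sketch of the Williams test statistic for comparing two dependent correlations, written with the Python standard library only. It is not the repository's code, which relies on the psych package in R. The function name `williams_t` and its argument names are hypothetical: `r12` and `r13` stand for the correlations of two metrics with the human scores, `r23` for the correlation between the two metrics, and `n` for the number of scored items.

```python
import math


def williams_t(r12: float, r13: float, r23: float, n: int) -> float:
    """Williams t statistic for the difference between two dependent correlations.

    r12: correlation of metric A with human scores (hypothetical naming)
    r13: correlation of metric B with human scores
    r23: correlation between metric A and metric B
    n:   number of observations; the statistic has n - 3 degrees of freedom
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Determinant of the 3x3 correlation matrix; non-negative for valid inputs.
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denominator = math.sqrt(2 * k * (n - 1) / (n - 3) + r_bar**2 * (1 - r23) ** 3)
    return (r12 - r13) * math.sqrt((n - 1) * (1 + r23)) / denominator
```

A positive value means metric A correlates more strongly with the human scores than metric B. To obtain a p-value, the statistic would be compared against a t distribution with n - 3 degrees of freedom.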

Run results/get-all-results.sh to reproduce results

Intermediate tables with metric scores and correlations for each language pair are in results/output

Final LaTeX tables are in results/tables

The notebook results/p0-preprocess_scores_and_visualize.ipynb contains code for preprocessing human and metric scores, visualising system-level scores, and identifying outliers.
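This README does not say how the notebook identifies outliers. As an illustration only, the sketch below flags outlier systems with a robust score based on the median absolute deviation (MAD), which is one common approach. The function name `find_outliers`, the 2.5 cutoff, and the system names in the usage lines are assumptions, not taken from the repository.

```python
import statistics


def find_outliers(system_scores: dict, cutoff: float = 2.5) -> list:
    """Return the names of systems whose score is a robust outlier.

    system_scores: mapping from system name to its system-level human score
    cutoff:        assumed threshold on the robust score (not from the repository)
    """
    values = list(system_scores.values())
    median = statistics.median(values)
    # Median absolute deviation, scaled to be comparable to a standard deviation.
    mad = statistics.median(abs(v - median) for v in values)
    scaled_mad = 1.4826 * mad
    if scaled_mad == 0:
        # All deviations collapse to zero; nothing can be flagged reliably.
        return []
    return [
        name
        for name, score in system_scores.items()
        if abs(score - median) / scaled_mad > cutoff
    ]


# Hypothetical scores for five systems; sysE is far below the rest.
example_scores = {"sysA": 0.10, "sysB": 0.20, "sysC": 0.15, "sysD": 0.12, "sysE": -3.0}
flagged = find_outliers(example_scores)
```

Using the median rather than the mean keeps a single extreme system from masking itself by inflating the spread estimate.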

Folders and files:

baselines-computed
final-metric-scores
input
manual-evaluation
results
submissions-as-received
.gitignore
README.md
