This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825299.
This tool was built as part of the GoURMET Project to carry out Direct Assessment evaluation of machine translation models, and is open sourced under GPL v3. Issues should be raised via GitHub Issues, and code changes can be proposed by opening a pull request.
Direct Assessment is a standard evaluation approach used in academic research to assess the quality of machine translation. Unlike automatic metrics such as BLEU, it relies on human judgement rather than an algorithm. A human evaluator compares a machine translated sentence with a human translated sentence, and the human translation is treated as the gold standard. Each evaluation case requires a set of three sentences:
- A sentence in the source language
- The same sentence translated into the target language by a human
- The same sentence translated into the target language by a machine
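As a rough illustration of the three-sentence structure above, one evaluation case could be modelled as a small immutable record. This is a minimal Python sketch: the class name `EvaluationItem`, its field names and the example sentences are hypothetical and are not taken from the tool's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationItem:
    """One Direct Assessment case (hypothetical schema, not the tool's own format)."""
    source: str               # sentence in the source language
    human_translation: str    # reference translation, treated as the gold standard
    machine_translation: str  # output of the model under evaluation

    def shown_to_evaluator(self) -> tuple[str, str]:
        # Only the two target-language sentences are displayed for rating.
        return self.human_translation, self.machine_translation


# Made-up example sentences, for illustration only.
item = EvaluationItem(
    source="Das Haus ist klein.",
    human_translation="The house is small.",
    machine_translation="The house is little.",
)
print(item.shown_to_evaluator())
# ('The house is small.', 'The house is little.')
```

Keeping the source sentence in the record, even though it is not displayed, makes it easy to trace each score back to the original input.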
The evaluator is shown the human translated sentence and the machine translated sentence and asked to rate, on a scale from 0 to 100, whether:
- the machine translated sentence adequately expresses the meaning of the human translated sentence;
- the machine translated sentence is a well-written phrase or sentence that is grammatically and idiomatically correct.
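The sketch below shows one way the collected scores could be validated and compared, assuming each of the two criteria receives its own 0 to 100 score. The `Rating` class, its `adequacy` and `fluency` fields and the `z_scores` function are hypothetical. Standardising scores per evaluator (z-scores) is a common step in the Direct Assessment literature for offsetting differences between lenient and strict evaluators, but this README does not say whether the tool applies it.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Rating:
    """One evaluator's judgement of one item (hypothetical schema)."""
    evaluator: str
    item_id: int
    adequacy: float  # does the MT sentence express the meaning of the human translation?
    fluency: float   # is the MT sentence grammatically and idiomatically correct?

    def __post_init__(self) -> None:
        # Reject scores outside the 0-100 rating scale.
        for name in ("adequacy", "fluency"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


def z_scores(ratings: list[Rating], field: str) -> dict[tuple[str, int], float]:
    """Standardise `field` ("adequacy" or "fluency") per evaluator.

    Each score becomes (score - evaluator mean) / evaluator population std dev.
    Assumption: this mirrors common DA practice, not necessarily this tool.
    """
    per_evaluator: dict[str, list[float]] = {}
    for r in ratings:
        per_evaluator.setdefault(r.evaluator, []).append(getattr(r, field))
    # Precompute (mean, std dev) for every evaluator.
    stats = {e: (mean(s), pstdev(s)) for e, s in per_evaluator.items()}

    result: dict[tuple[str, int], float] = {}
    for r in ratings:
        mu, sd = stats[r.evaluator]
        # An evaluator who gives the same score to every item has sd == 0.
        result[(r.evaluator, r.item_id)] = 0.0 if sd == 0 else (getattr(r, field) - mu) / sd
    return result


# Made-up ratings: evaluator "b" is consistently 40 points harsher than "a".
ratings = [
    Rating("a", 1, adequacy=80, fluency=90),
    Rating("a", 2, adequacy=60, fluency=70),
    Rating("b", 1, adequacy=40, fluency=50),
    Rating("b", 2, adequacy=20, fluency=30),
]
print(z_scores(ratings, "adequacy"))
# {('a', 1): 1.0, ('a', 2): -1.0, ('b', 1): 1.0, ('b', 2): -1.0}
```

Although the raw scores of the two evaluators differ by 40 points, their standardised scores agree, which is why averaging z-scores rather than raw scores can give a fairer comparison across evaluators.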
A more in-depth explanation of Direct Assessment can be found in the papers "Continuous Measurement Scales in Human Evaluation of Machine Translation" and "Is all that Glitters in Machine Translation Quality Estimation really Gold?"