WCE LIG: an open-source toolkit for Word Confidence Estimation V1.5
This toolkit, written in Python (Python 3), enables you to estimate the quality of an automatic translation at the word level. It outputs a good (G) or a bad (B) label for each word of the translation hypothesis.
Source: give me some pills
Translation hypothesis: me donner des pilules
WCE: B B G G
Human post-edition: donnes moi des pilules
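To make the G/B labels of this example concrete, here is a simplified sketch that marks a hypothesis word as good when it survives in the human post-edition. It assumes whitespace tokenization and ignores word order; reference labels for WCE are often obtained by aligning the hypothesis with its post-edition using an edit-distance tool such as TER, so treat this bag-of-words check only as an approximation. The function name `naive_word_labels` is hypothetical.

```python
from collections import Counter
from typing import List


def naive_word_labels(hypothesis: str, post_edition: str) -> List[str]:
    """Label each hypothesis word G if it also appears in the post-edition, else B.

    Simplified bag-of-words approximation (hypothetical helper): word order and
    edit alignment are ignored, and each post-edited token validates at most one
    hypothesis token.
    """
    available = Counter(post_edition.split())
    labels = []
    for word in hypothesis.split():
        if available[word] > 0:
            available[word] -= 1  # consume the match so repeated words are not over-counted
            labels.append("G")
        else:
            labels.append("B")
    return labels


print(naive_word_labels("me donner des pilules", "donnes moi des pilules"))
# ['B', 'B', 'G', 'G']
```

On this example the approximation happens to reproduce the WCE line; on real data, reorderings and near-synonyms make an alignment-based labelling more reliable.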
What does the toolkit do?
First, the toolkit pre-processes the data; then, it extracts internal and external features.
Finally, based on those features, it outputs a good (G) or a bad (B) label for each word of the translation hypothesis.
The internal features come from the translation system itself, whereas the external features are extracted with external toolkits and carry linguistic or probabilistic information.
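The three stages (pre-processing, feature extraction, labelling) can be pictured as a small pipeline. This is a hypothetical sketch, not the toolkit's code: the function names, the word posterior probability used as the internal feature, the numeric test used as the external feature, and the fixed threshold that stands in for a trained classifier are all assumptions for illustration.

```python
from typing import Dict, List


def preprocess(sentence: str) -> List[str]:
    # Hypothetical pre-processing step: lowercase and split on whitespace.
    return sentence.lower().split()


def extract_features(words: List[str], posteriors: List[float]) -> List[Dict[str, object]]:
    if len(words) != len(posteriors):
        raise ValueError("one posterior score is expected per word")
    features = []
    for word, posterior in zip(words, posteriors):
        features.append({
            "word": word,
            # Internal feature: a word posterior probability assumed to be
            # exported by the translation system.
            "posterior": posterior,
            # External feature: a property computed without the MT system.
            "is_numeric": word.isdigit(),
        })
    return features


def classify(features: List[Dict[str, object]], threshold: float = 0.5) -> List[str]:
    # Placeholder decision rule standing in for a trained classifier:
    # confident words and numbers are labelled good (G), the rest bad (B).
    return ["G" if f["posterior"] >= threshold or f["is_numeric"] else "B" for f in features]


words = preprocess("Me donner des pilules")
print(classify(extract_features(words, [0.2, 0.3, 0.9, 0.8])))
# ['B', 'B', 'G', 'G']
```

Keeping each stage as a separate function mirrors the description above: new internal or external features only change `extract_features`, while the labelling step can be swapped for a real classifier.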
+ New: DBnary has been added as a feature component. More details are given in the tools directory.
Which features are extracted?
Here is the list of all the features used in the toolkit.
| 1 Proper Name | 15 Target Word | 25 WPP Exact |
| 2 Unknown Stemming | 16 Target Stem | 26 WPP Any |
| 3 Number of Word Occurrences | 17 Left Target POS | 27 Max |
| 4 Number of Stemming Occurrences | 18 Left Target Word | 28 Min |
| 5 Source POS | 19 Left Target Stem | 29 Nodes |
| 6 Source Word | 20 Right Target POS | 30 Constituent Label |
| 7 Source Stem | 21 Right Target Word | 31 Distance To Root |
| 8 Left Source POS | 22 Right Target Stem | 32 Numeric |
| 9 Left Source Word | 23 Longest Target N-gram Length | 33 Punctuation |
| 10 Left Source Stem | 24 Longest Source N-gram Length | 34 Stop Word |
| 11 Right Source POS | | 35 Occur in Google Translate |
| 12 Right Source Word | | 36 Occur in Bing Translator |
| 13 Right Source Stem | | 37 Polysemy Count -- Target |
| 14 Target POS | | 38 Backoff Behaviour -- Target |
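A few of the surface features in this table can be computed directly from the tokenized hypothesis. The sketch below is illustrative only and none of its details are taken from the toolkit: it assumes whitespace tokenization, counts word occurrences within the sentence, uses `<s>`/`</s>` as sentence-boundary markers, relies on a tiny hand-written stop-word list, and replaces a real language model with a plain set of known n-grams for feature 23. The function names `longest_ngram_length` and `target_features` are hypothetical.

```python
import string
from collections import Counter
from typing import Dict, List, Set, Tuple

# Tiny illustrative French stop-word list (assumption, not the toolkit's resource).
STOP_WORDS = {"le", "la", "les", "des", "de", "du", "me", "moi"}


def longest_ngram_length(words: List[str], i: int,
                         known_ngrams: Set[Tuple[str, ...]], max_order: int = 4) -> int:
    """Length of the longest n-gram ending at position i found in known_ngrams (0 if none)."""
    best = 0
    for n in range(1, min(max_order, i + 1) + 1):
        if tuple(words[i - n + 1:i + 1]) in known_ngrams:
            best = n
    return best


def target_features(words: List[str], known_ngrams: Set[Tuple[str, ...]]) -> List[Dict[str, object]]:
    counts = Counter(words)
    rows = []
    for i, word in enumerate(words):
        rows.append({
            "target_word": word,                                                  # feature 15
            "left_target_word": words[i - 1] if i > 0 else "<s>",                 # feature 18
            "right_target_word": words[i + 1] if i + 1 < len(words) else "</s>",  # feature 21
            "word_occurrences": counts[word],           # feature 3, counted in the sentence (assumption)
            "longest_target_ngram": longest_ngram_length(words, i, known_ngrams),  # feature 23
            # Feature 32: digits with at most one decimal separator ("3,5" or "3.5").
            "numeric": word.replace(",", ".", 1).replace(".", "", 1).isdigit(),
            # Feature 33: ASCII punctuation only in this sketch.
            "punctuation": all(ch in string.punctuation for ch in word),
            "stop_word": word.lower() in STOP_WORDS,                              # feature 34
        })
    return rows


hypothesis = "me donner des pilules .".split()
known = {("me",), ("des",), ("pilules",), ("des", "pilules"), (".",)}
for row in target_features(hypothesis, known):
    print(row["target_word"], row["left_target_word"], row["right_target_word"],
          row["longest_target_ngram"], row["stop_word"])
```

Returning one dictionary per word keeps the features aligned with the hypothesis tokens, which is the shape a word-level classifier expects.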
A detailed description can be found in the paper directory.
How far can we go?
With this toolkit, you can achieve state-of-the-art WCE results on the WMT 2014 shared task (http://www.statmt.org/wmt14/quality-estimation-task.html) for the English-French quality estimation task.
The metrics used are Precision (Pr), Recall (Rc) and F-measure (F1).
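For word-level labels, these metrics are computed for one label at a time (G or B) with the usual definitions Pr = TP / (TP + FP), Rc = TP / (TP + FN) and F1 = 2 · Pr · Rc / (Pr + Rc). The sketch below is a minimal illustration assuming gold and predicted labels are already aligned word by word; the function name is hypothetical, and it does not restate which label or averaging the WMT evaluation reports.

```python
from typing import List, Tuple


def precision_recall_f1(gold: List[str], predicted: List[str], label: str = "B") -> Tuple[float, float, float]:
    """Precision, recall and F1 for one label over aligned word-level label sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    # Guard against empty denominators (e.g. the label is never predicted).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold labels from the example at the top; one B word is missed by the prediction.
print(precision_recall_f1(["B", "B", "G", "G"], ["B", "G", "G", "G"], label="B"))
# precision 1.0, recall 0.5, F1 about 0.667
```

Reporting the B label separately matters because bad words are usually the minority class, so a system labelling everything G can look good on accuracy while finding no errors.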
What is needed?
- Set the WCE_ROOT environment variable (see the README file)
- NLTK for Python 3
- tools: see the tools directory
- 7-Zip to decompress the data in the input_data directory
What does the toolkit contain?
- wce_system: contains the core of the system
- input_data: contains the data used to train your WCE system
- tools: contains all the tools needed to use the toolkit fully
- docs: contains the documentation and the scientific papers related to this toolkit
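Before running the toolkit, the requirements and directory layout listed above can be checked with a short script. This is a hypothetical pre-flight check, not part of the toolkit: the function name `check_setup` is an assumption, the directory names are taken from the list above, and it assumes the 7-Zip command-line binary is installed as `7z` or `7za`, which is common but platform-dependent.

```python
import importlib.util
import os
import shutil
from typing import List, Optional

# Directory names taken from the layout described above.
EXPECTED_DIRS = ("wce_system", "input_data", "tools", "docs")


def check_setup(root: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    # Fall back to the WCE_ROOT environment variable when no root is given.
    root = root if root is not None else os.environ.get("WCE_ROOT")
    if not root:
        problems.append("WCE_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append(f"WCE_ROOT does not point to a directory: {root}")
    else:
        for name in EXPECTED_DIRS:
            if not os.path.isdir(os.path.join(root, name)):
                problems.append(f"missing directory: {name}")
    # NLTK must be importable from the Python 3 interpreter running this script.
    if importlib.util.find_spec("nltk") is None:
        problems.append("NLTK is not installed for this Python 3 interpreter")
    # 7-Zip binary names are an assumption; both common names are checked.
    if shutil.which("7z") is None and shutil.which("7za") is None:
        problems.append("7-Zip (7z or 7za) not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

The check only reports problems instead of exiting, so it can be extended with toolkit-specific tests (for example, for the external tools in the tools directory) without changing its interface.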
This toolkit is part of the KEHATH project (https://kehath.imag.fr/), funded by the French National Research Agency.
When using this software, please cite:
Christophe Servan, Ngoc Tien Le, Ngoc Quang Luong, Benjamin Lecouteux and Laurent Besacier, “An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation”, in Proceedings of the 12th International Workshop on Spoken Language Translation (IWSLT 2015), Da Nang, Vietnam, Dec. 2015.