WCE LIG: an open-source toolkit for Word Confidence Estimation V1.5

This toolkit, written in Python (python3), lets you estimate the quality of an automatic translation at the word level. It outputs a good (G) or bad (B) label for each word of the translation hypothesis.

For instance:

Source: give me some pills
Translation hypothesis: me donner des pilules
Human post-edition: donnes moi des pilules
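
To make the G/B labels concrete, here is a minimal sketch that derives them for the example above by aligning the hypothesis with the post-edition. It uses Python's difflib as a stand-in aligner. That choice is an assumption for illustration: the toolkit's own labeling procedure may differ (word-level quality estimation often relies on TER alignments), and the function name label_words is hypothetical.

```python
from difflib import SequenceMatcher


def label_words(hypothesis, post_edition):
    """Label a hypothesis word G if it is aligned to an identical
    post-edition word, B otherwise (difflib alignment, not TER)."""
    hyp = hypothesis.split()
    ref = post_edition.split()
    labels = ["B"] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    # Each matching block is a run of identical words in both sequences.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "G"
    return list(zip(hyp, labels))


for word, label in label_words("me donner des pilules", "donnes moi des pilules"):
    print(word, label)
# me B
# donner B
# des G
# pilules G
```

With this alignment, "des" and "pilules" are kept by the post-editor and get G, while "me" and "donner" are changed and get B.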

What does the toolkit do?

First, the toolkit pre-processes the data; then it extracts internal and external features. Finally, based on those features, it outputs a good (G) or bad (B) label for each word of the translation hypothesis. Internal features come from the translation system itself, whereas external features rely on external toolkits to extract linguistic or probabilistic information.

+ New: DBnary has been added as a feature component. More details are in the tools directory.

Which features are extracted?

Here is the list of all the features used in the toolkit.

1  Proper Name                      14 Target POS                       27 Max
2  Unknown Stemming                 15 Target Word                      28 Min
3  Number of Word Occurrences       16 Target Stem                      29 Nodes
4  Number of Stemming Occurrences   17 Left Target POS                  30 Constituent Label
5  Source POS                       18 Left Target Word                 31 Distance To Root
6  Source Word                      19 Left Target Stem                 32 Numeric
7  Source Stem                      20 Right Target POS                 33 Punctuation
8  Left Source POS                  21 Right Target Word                34 Stop Word
9  Left Source Word                 22 Right Target Stem                35 Occur in Google Translate
10 Left Source Stem                 23 Longest Target N-gram Length     36 Occur in Bing Translator
11 Right Source POS                 24 Longest Source N-gram Length     37 Polysemy Count -- Target
12 Right Source Word                25 WPP Exact                        38 Backoff Behaviour -- Target
13 Right Source Stem                26 WPP Any

A detailed description can be found in the paper directory.
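
As a rough illustration of what per-word features look like, the sketch below computes a handful of the simpler surface features from the list (target word, left and right target word, number of word occurrences, numeric, punctuation). The function name, the dictionary keys, the sentence-boundary tokens and the exact feature definitions are all assumptions made for this example. They are not taken from the toolkit's code.

```python
import string
from collections import Counter


def surface_features(target_words):
    """Return one feature dict per target word (illustrative subset only)."""
    counts = Counter(target_words)
    last = len(target_words) - 1
    rows = []
    for i, word in enumerate(target_words):
        rows.append({
            "target_word": word,
            # "<s>" and "</s>" are hypothetical sentence-boundary placeholders.
            "left_target_word": target_words[i - 1] if i > 0 else "<s>",
            "right_target_word": target_words[i + 1] if i < last else "</s>",
            # Assumed definition: occurrences of the word in the sentence.
            "occurrences": counts[word],
            "numeric": word.isdigit(),
            "punctuation": all(ch in string.punctuation for ch in word),
        })
    return rows


for row in surface_features("me donner des pilules".split()):
    print(row)
```

Features such as POS tags, stems, WPP scores or DBnary polysemy counts need the external tools and translation-system outputs mentioned above, so they are left out of this sketch.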

How far can we go?

You can achieve state-of-the-art WCE results on the WMT shared task. For the English-French quality estimation task:

B Pr=0.4831 Rc=0.3615 F1=0.4135
G Pr=0.8417 Rc=0.8978 F1=0.8688

The metrics used are Precision (Pr), Recall (Rc), and F-measure (F1).
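
As a quick sanity check, the F1 values above can be recomputed from the reported precision and recall, assuming F1 is the usual harmonic mean of the two. The helper name f_measure is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the standard F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above for the B and G labels.
reported = {"B": (0.4831, 0.3615), "G": (0.8417, 0.8978)}
for label, (pr, rc) in reported.items():
    print(label, round(f_measure(pr, rc), 4))
# B 0.4135
# G 0.8688
```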

What is needed?

  • Set the WCE_ROOT environment variable (see Readme file)
  • python3
  • PyYAML-3.11
  • NLTK for python 3
  • tools: see tools directory
  • 7zip to decompress data in the input_data directory
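
The following sketch checks the prerequisites listed above from Python before you run anything. It is only a convenience example and not part of the toolkit: the function name is hypothetical, and the 7zip executable names it looks for (7z, 7za, 7zr) are assumptions that may not match your installation. It does not check the contents of the tools directory.

```python
import importlib.util
import os
import shutil


def missing_prerequisites(env=None):
    """Return a list of prerequisites that could not be found."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("WCE_ROOT"):
        missing.append("WCE_ROOT environment variable")
    # PyYAML is imported as "yaml"; NLTK is imported as "nltk".
    for module, name in (("yaml", "PyYAML"), ("nltk", "NLTK")):
        if importlib.util.find_spec(module) is None:
            missing.append(name)
    # Assumed executable names for 7zip; adjust for your platform.
    if not any(shutil.which(exe) for exe in ("7z", "7za", "7zr")):
        missing.append("7zip")
    return missing


for item in missing_prerequisites():
    print("missing:", item)
```

An empty output means that everything the check covers was found.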

Repository description

  • wce_system: contains the core of the system
  • input_data: contains the data used to train your WCE system
  • tools: contains all the tools needed to make full use of the toolkit
  • docs: contains the documentation and the scientific papers related to this toolkit


This toolkit is part of the KEHATH project, funded by the French National Research Agency.


When using this software, please cite:

Christophe Servan, Ngoc Tien Le, Ngoc Quang Luong, Benjamin Lecouteux and Laurent Besacier, 
“An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation”, 
in The Proceedings of The 12th International Workshop on Spoken Language Translation (IWSLT 2015), 
Da Nang, Vietnam, Dec. 2015.


