Appraise evaluation system for manual evaluation of machine translation output

Appraise Evaluation System

The current release is used to run the evaluation of the ACL 2016 First Conference on Machine Translation (WMT16). It has also been used for WMT 2015, 2014 and 2013. The second major release came in time for the Seventh MT Marathon 2012, which took place September 3-8, 2012 in Edinburgh, Scotland. The initial import into GitHub was on Oct 23, 2011. The first versions of this software appeared in summer 2008...


We are currently finishing preparations for WMT16 — Evaluation campaign at Stay tuned for official kick off.


Appraise has been updated for WMT15 — Evaluation campaign at — Follow #WMT15 on for updates. Invite tokens have been sent out to participants. For research group registration details or problems drop me an email: cfedermann [at] gmail [dot] com

Updates 2015


Here we go again! #WMT15 evaluation campaign is running!
Happy annotating! -- #WMT #Appraise

— Christian Federmann (@cfedermann) May 8, 2015


Follow #WMT14 on — Evaluation campaign at

For research group registration details or problems drop me a note via email: cfedermann [at] gmail [dot] com

Updates 2014

2014-03-19 User changeable passwords and new action menu in navigation bar; go to when logged in to change the password for your Appraise account. You can also use the lovely new user action menu on the top right of the navigation bar ("Admin", of course, only visible to some):


Finally! #WMT14 evaluation campaign is live! --

— Christian Federmann (@cfedermann) March 18, 2014

There's a new release of Appraise for use in WMT '14; see the new Django app inside appraise.wmt14 for more details. This version also integrates with Amazon's Mechanical Turk, making it possible to collect even more manual annotations.


Appraise is an open-source tool for manual evaluation of Machine Translation output. Appraise lets you collect human judgments on translation output, implementing annotation tasks such as

  1. translation quality checking;
  2. ranking of translations;
  3. error classification;
  4. manual post-editing.

It features an extensible XML import/output format and can easily be adapted to new annotation tasks. The next version of Appraise will also include automatic computation of inter-annotator agreement, giving quick access to evaluation results.
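
To give a rough idea of what an inter-annotator agreement computation involves, here is a minimal sketch of Cohen's kappa for two annotators. The README does not say which agreement measure Appraise will use, so the choice of Cohen's kappa, the function name `cohen_kappa` and the label values are assumptions made purely for illustration; this is not code from Appraise.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Illustrative helper only; not part of Appraise.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = float(len(labels_a))

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n

    # Expected chance agreement, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1.0 means perfect agreement, while 0.0 means the agreement is no better than chance given how often each annotator used each label.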

Appraise is available under an open, BSD-style license.

What does it look like?

You can see a deployed version of Appraise here. If you want to play around with it, you will need an account to log in to the system. I’ll be happy to create one for you; just drop me an email: cfedermann [at] gmail [dot] com.

System Requirements

Appraise is based on the Django framework, version 1.3 or newer. You will need Python 2.7 to run it locally. For deployment, a FastCGI compatible web server such as lighttpd is required.
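
The version requirements above can be checked programmatically before trying to run Appraise. The following is a minimal sketch; the helper names `parse_version`, `django_is_supported` and `python_is_supported` are hypothetical and not part of Appraise or Django.

```python
import re

MINIMUM_DJANGO = (1, 3)   # "version 1.3 or newer"
REQUIRED_PYTHON = (2, 7)  # "You will need Python 2.7"


def parse_version(text):
    """Turn a string such as '1.3.1' into a tuple of ints, e.g. (1, 3, 1)."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component without leading digits
        numbers.append(int(match.group()))
    return tuple(numbers)


def django_is_supported(version_text):
    """True if the major.minor part of a Django version string is at least 1.3."""
    return parse_version(version_text)[:2] >= MINIMUM_DJANGO


def python_is_supported(version_info):
    """True if a (major, minor, ...) version tuple denotes Python 2.7."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON
```

With Django installed, one way to use these helpers is to call `django_is_supported(django.get_version())` and `python_is_supported(sys.version_info)`.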

Quickstart Instructions

Assuming you have already installed Python and Django, you can clone a local copy of Appraise using the following command; you can change the folder name Appraise-Software to anything you like.

$ git clone git:// Appraise-Software

After having cloned the GitHub project, you have to initialise Appraise. This is a two-step process:

  1. Initialise the SQLite database:

    $ cd Appraise-Software/appraise
    $ python manage.py syncdb
  2. Collect static files and copy them into Appraise-Software/appraise/static-files. Answer yes when asked whether you want to overwrite existing files.

    $ python manage.py collectstatic

    More information on handling of static files in Django 1.3+ is available here.

Finally, you can start up your local copy of Django using the runserver command:

$ python manage.py runserver

You should be greeted with the following output from your terminal:

Validating models...

0 errors found
Django version 1.3.1, using settings 'appraise.settings'
Development server is running at
Quit the server with CONTROL-C.

Point your browser to and there it is…

Add users

Users can be added here.

Add evaluation tasks

Evaluation tasks can be created here.

You need an XML file in the proper format to upload a task; an example file can be found in examples/sample-ranking-task.xml.
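
Before uploading a task file, it can help to confirm that it is at least well-formed XML. The sketch below uses only Python's standard library; the function name `summarise_xml` is hypothetical, and the check says nothing about whether the file follows the element structure Appraise expects (see the sample file for that).

```python
import xml.etree.ElementTree as ET


def summarise_xml(xml_text):
    """Parse XML text and return (root tag, number of direct child elements).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    Illustrative helper only; it does not validate Appraise's task format.
    """
    root = ET.fromstring(xml_text)
    return root.tag, len(root)
```

To check a file, read its contents and pass them in, for example `summarise_xml(open("examples/sample-ranking-task.xml").read())`. A `ParseError` points to the line and column of the first problem.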

Deployment with lighttpd

You will need to create a customised script inside Appraise-Software/appraise. There is a .sample file available in this folder which should help you get started quickly. In a nutshell, you have to uncomment and edit the last two lines:

# /path/to/bin/python runfcgi host= port=1234 method=threaded pidfile=$DJANGO_PID

The first line tells Django to start up in FastCGI mode, binding to the configured host and port 1234 in our example, running a threaded server and writing the process ID to the file $DJANGO_PID. The .pid files are later used to shut Appraise down properly.

Using Django’s runfcgi command requires you to also install flup into the site-packages folder of your Python installation. It is available from here.

# /path/to/sbin/lighttpd -f /path/to/lighttpd/etc/appraise.conf

The second line starts up the lighttpd server using an appropriate configuration file appraise.conf. Have a look at Appraise-Software/examples/appraise-lighttpd.conf to create your own.

Once the various /path/to/XYZ settings are properly configured, you should be able to launch Appraise in production mode.
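
The role of the .pid file mentioned above can be illustrated with a short sketch: a shutdown helper reads the stored process ID and sends that process a termination signal. This is an assumption about how such a helper could work, not the script shipped with Appraise; the function names `read_pid` and `stop_process` are hypothetical.

```python
import os
import signal


def read_pid(pid_path):
    """Return the process ID stored in a .pid file as an integer."""
    with open(pid_path) as handle:
        return int(handle.read().strip())


def stop_process(pid_path):
    """Send SIGTERM to the process recorded in pid_path, then remove the file.

    Hypothetical helper; assumes a POSIX system and a valid, current pidfile.
    """
    pid = read_pid(pid_path)
    os.kill(pid, signal.SIGTERM)
    os.remove(pid_path)
    return pid
```

Calling `stop_process` with the path stored in $DJANGO_PID would ask the FastCGI process to terminate. A stale pidfile can point at an unrelated process, so a real script should check that the process is the expected one first.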


If you use Appraise in your research, please cite the MT Marathon 2012 paper:

Christian Federmann. Appraise: An Open-Source Toolkit for Manual Evaluation of Machine Translation Output. In The Prague Bulletin of Mathematical Linguistics, volume 98, Prague, Czech Republic, September 2012.


@article{federmann2012appraise,
  author =  {Christian Federmann},
  title =   {Appraise: An Open-Source Toolkit for Manual Evaluation of Machine Translation Output},
  journal = {The Prague Bulletin of Mathematical Linguistics},
  volume =  {98},
  pages =   {25--35},
  year =    {2012},
  address = {Prague, Czech Republic},
  month =   {September}
}

A previous version of Appraise was published at LREC 2010:

Christian Federmann. Appraise: An Open-Source Toolkit for Manual Phrase-Based Evaluation of Translations. In Proceedings of the Seventh Conference on International Language Resources and Evaluation, Valletta, Malta, LREC, May 2010.