Welcome to the Duolingo 2020 Shared Task! The shared task website is here:

This repository has code for:

  • Scoring a predictions file
  • Training an example baseline model with fairseq

Python 3.6+ is required. It is strongly recommended that you run this in a virtual environment.
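If you want to confirm your setup before going further, the short check below is one way to do it. It is only a sketch and not part of this repository: the helper names `python_ok` and `in_virtualenv` are made up for illustration, and the virtual environment test assumes the environment was created with the standard `venv` module.

```python
import sys

MIN_VERSION = (3, 6)  # the README states that Python 3.6+ is required


def python_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True when running inside a venv-created environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base interpreter.
    """
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, you can create an environment with `python3 -m venv <dir>` and activate it before installing anything.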



There are no special requirements for running the scoring function.


You can score a predictions file as follows (using the AWS baseline as an example, and running from the top-level directory of the repo):

$ python --goldfile staple-2020-train/en_vi/  --predfile staple-2020-train/en_vi/train.en_vi.aws_baseline.pred.txt

Training models

If all you want to do is evaluation, then ignore this section.

Most participants will probably write their own code for this task, but we also provide code for training a vanilla sequence-to-sequence model using fairseq. This does not produce the best results for this task, but it is an obvious baseline and may give you a head start. This code is adapted from the fairseq translation tutorials.


Certain scripts require perl to run. If you are on macOS or Linux, you probably already have it. See here for more details.
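To check whether perl is available before running those scripts, something like the following may help. This is a sketch, not part of the repository; `find_tool` is a hypothetical helper name, and the check only looks for an executable called `perl` on your `PATH`.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    perl_path = find_tool("perl")
    if perl_path is None:
        print("perl was not found on PATH; install it before running the scripts.")
    else:
        print("perl found at", perl_path)
```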

Next, get these repositories:

$ git clone
$ git clone

Go to the file and set the paths for MOSES and SUBWORDNMT accordingly.

Install the Python requirements:

$ pip install fairseq sacremoses subword_nmt sacrebleu tqdm
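After installing, you may want to confirm that the packages can actually be found from the active environment. The snippet below is a small sketch for that purpose and is not part of the repository. It assumes each package is importable under the same name used in the `pip install` command above; `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Assumption: import names match the names passed to pip above.
REQUIRED_MODULES = ["fairseq", "sacremoses", "subword_nmt", "sacrebleu", "tqdm"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for a large package such as fairseq.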


The following files are provided.

  • : common BASH variables
  • : to preprocess the data for training with fairseq
  • : to train the model using preprocessed data
  • : script to run pretrained fairseq models
  • : used to convert outputs from fairseq into shared task format files (used in
  • : converts shared task format files into fairseq-readable format (used in

The most relevant files are,, and

Good luck!

If you have questions, feel free to check or post to the mailing list
