Steps to train a Transformer model on the WMT English-German dataset


Requirements

  • OpenNMT-tf (>= 1.10.0)
  • SentencePiece

Follow the instructions in the SentencePiece repository to build and install it. If it is installed in a custom location, change the SP_PATH variable in the scripts.
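If you need to build SentencePiece from source, the steps below are a sketch following the upstream README (the install prefix and SP_PATH value are assumptions; adjust them for your system):

```shell
# Build and install SentencePiece from source
# (assumes git, cmake and a C++ compiler are available).
git clone https://github.com/google/sentencepiece.git
cd sentencepiece
mkdir build && cd build
cmake ..
make -j "$(nproc)"
sudo make install
sudo ldconfig

# If the binaries end up outside the default location, point the
# scripts at them, e.g. (hypothetical prefix):
export SP_PATH=/usr/local/bin
```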

pip install "OpenNMT-tf[tensorflow_gpu]>=1.10.0"


Data preparation

Before running the script, download the datasets from the links listed in the script header. Depending on the task, you may need to change the file names and folder paths.

./ /data/wmt/

where /data/wmt/ contains the raw parallel datasets.

The script trains a SentencePiece model with a vocabulary shared between source and target, tokenizes the dataset, and prepares the train/valid/test files. The generated files are placed in a new data/ directory.
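As an illustration, the core of that preparation can be sketched with the SentencePiece command-line tools (the file names and vocabulary size below are assumptions, not the script's exact values):

```shell
# Train one model on the concatenation of both sides so that English
# and German share a single subword vocabulary.
cat train.en train.de > train.ende
$SP_PATH/spm_train --input=train.ende --model_prefix=wmtende \
    --vocab_size=32000 --character_coverage=1.0

# Tokenize each side with the shared model.
$SP_PATH/spm_encode --model=wmtende.model --output_format=piece \
    < train.en > train.en.sp
$SP_PATH/spm_encode --model=wmtende.model --output_format=piece \
    < train.de > train.de.sp
```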


Training

We recommend training on 4 GPUs to get the best performance:


Or if you have only 1 GPU, run the dedicated script:



./ /data/wmt/
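Both training scripts are thin wrappers around OpenNMT-tf's onmt-main entry point; a rough sketch of the underlying call (the configuration file name here is an assumption):

```shell
# 4-GPU replicated training of the Transformer model.
onmt-main train_and_eval --model_type Transformer \
    --config config.yml --num_gpus 4

# Single-GPU variant: drop --num_gpus (it defaults to 1).
onmt-main train_and_eval --model_type Transformer --config config.yml
```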

Lazy run

This model achieved the following scores:

Test set       NIST BLEU
newstest2014   26.9
newstest2017   28.0