Attention-based sequence to sequence learning


Dependencies

  • TensorFlow 1.2+ for Python 3
  • YAML and Matplotlib modules for Python 3: sudo apt-get install python3-yaml python3-matplotlib
  • A recent NVIDIA GPU

How to use

Train a model (CONFIG is a YAML configuration file, such as config/default.yaml):

./ CONFIG --train -v 

Translate text using an existing model:


or for interactive decoding:

./ CONFIG --decode

Example English→French model

This is the same model and dataset as Bahdanau et al. 2015.

config/WMT14/    # download WMT14 data into raw_data/WMT14
config/WMT14/     # preprocess the data, and copy the files to data/WMT14
./ config/WMT14/baseline.yaml --train -v   # train a baseline model on this data

You should get BLEU scores similar to these (our model was trained on a single Titan X for about 4 days).

Dev    Test   +beam  Steps  Time
25.04  28.64  29.22  240k   60h
25.25  28.67  29.28  330k   80h
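
For readers unfamiliar with the metric reported above, here is a minimal sketch of corpus-level BLEU (clipped n-gram precisions up to 4-grams, combined with a brevity penalty). It is not the evaluation script used by this toolkit: the names `ngram_counts` and `corpus_bleu` are illustrative, a single reference per sentence is assumed, and tokens are assumed to be whitespace-split, so absolute numbers may differ from those in the table.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize outputs shorter than the references.
    penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * penalty * math.exp(log_precision)


hyp = ["the cat sat on the mat".split()]
ref = ["the cat sat on a mat".split()]
print(round(corpus_bleu(hyp, ref), 2))
```

Scores are computed over the whole corpus at once (counts are summed before the precisions are taken), which is why sentence-level averages would not give the same number.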

Download this model here. To use this model, just extract the archive into the seq2seq/models folder, and run:

 ./ models/WMT14/config.yaml --decode -v

Example German→English model

This is the same dataset as Ranzato et al. 2015.

./ config/IWSLT14/baseline.yaml --train -v

Dev    Test   +beam  Steps
28.32  25.33  26.74  44k

The model is available for download here.

Audio pre-processing

If you want to use the toolkit for Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST), then you'll need to pre-process your audio files accordingly. This README details how it can be done. You'll need to install the Yaafe library, and use scripts/speech/ to extract MFCCs from a set of wav files.
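
To give an idea of what MFCC extraction computes, here is a rough, dependency-free sketch: frame the signal, apply a Hamming window, take the power spectrum, apply a triangular mel filterbank, take logs, then a DCT. It does not use Yaafe and is not the toolkit's script; all function names and parameter values (`frame_len`, `hop`, `n_filters`, `n_ceps`) are assumptions chosen for illustration, and the resulting values will not match Yaafe's output.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(n^2) DFT; fine for a demo, use an FFT library in practice."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(samples, sample_rate, frame_len, hop, n_filters=20, n_ceps=13):
    """Return one list of n_ceps coefficients per frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        power = power_spectrum(frame)
        # Small constant avoids log(0) for empty filters.
        log_energy = [math.log(sum(p * f for p, f in zip(power, row)) + 1e-10)
                      for row in bank]
        # DCT-II decorrelates the log filterbank energies.
        features.append([
            sum(e * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for i in range(n_ceps)
        ])
    return features


# 0.1 s of a 440 Hz tone at 8 kHz, 25 ms frames with a 10 ms hop.
samples = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
features = mfcc(samples, 8000, frame_len=200, hop=80)
print(len(features), len(features[0]))
```

Each audio file becomes a sequence of fixed-size feature vectors, one per frame. For real recordings, samples could be read with Python's standard `wave` module before being passed to a function like this.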


Features

  • YAML configuration files
  • Beam-search decoder
  • Ensemble decoding
  • Multiple encoders
  • Hierarchical encoder
  • Bidirectional encoder
  • Local attention model
  • Convolutional attention model
  • Detailed logging
  • Periodic BLEU evaluation
  • Periodic checkpoints
  • Multi-task training: train on several tasks at once (e.g. French→English and German→English MT)
  • Subwords training and decoding
  • Input binary features instead of text
  • Pre-processing script: we provide a fully-featured Python script for data pre-processing (vocabulary creation, lowercasing, tokenizing, splitting, etc.)
  • Dynamic RNNs: we use symbolic loops instead of statically unrolled RNNs. This means that we don't need to manually configure bucket sizes, and that model creation is much faster.
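
As an illustration of the beam-search decoding mentioned in the list above, here is a minimal, framework-free sketch. It is not this toolkit's decoder: `beam_search`, `step_fn` and the toy probability table are hypothetical names and data made up for the example. `step_fn` stands in for a trained model and returns log-probabilities for the next token given the tokens decoded so far.

```python
import math


def beam_search(step_fn, beam_size=4, max_len=10, eos="</s>"):
    """Return (log-prob, tokens) of the best hypothesis found."""
    beams = [(0.0, [])]   # unfinished hypotheses: (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, logp in step_fn(tokens).items():
                candidates.append((score + logp, tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the best beam_size expansions; set aside those that ended.
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete
    return max(finished, key=lambda c: c[0])


# Toy "model" (made-up probabilities): the locally best first token "a"
# leads to a worse complete sequence than "b".
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"</s>": 0.55, "c": 0.45},
    ("b",): {"</s>": 0.9, "c": 0.1},
}


def toy_step(tokens):
    probs = TABLE.get(tuple(tokens), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_step, beam_size=1)[1])  # greedy: ['a', '</s>']
print(beam_search(toy_step, beam_size=2)[1])  # beam:   ['b', '</s>']
```

With a beam of size 1 the search is greedy and commits to "a"; with a beam of size 2 it keeps "b" alive and finds the more probable sequence. This is the kind of effect behind the gap between the Test and +beam columns in the tables above.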


