Generative Flow based Sequence-to-Sequence Toolkit written in Python.

FlowSeq: a Generative Flow based Sequence-to-Sequence Toolkit.

This is the PyTorch implementation of FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow, accepted by EMNLP 2019.

We propose an efficient and effective model for non-autoregressive sequence generation using latent variable models. We model the complex distributions with generative flows, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. On several machine translation benchmark datasets (wmt14-ende, wmt16-enro), we achieve performance comparable to state-of-the-art non-autoregressive NMT models, with decoding time that is almost constant w.r.t. the sequence length.

Requirements

  • Python version >= 3.6
  • PyTorch version >= 1.1
  • apex
  • Perl

Installation

  1. Install NVIDIA-apex.
  2. Install PyTorch and torchvision.

Data

  1. WMT'14 English to German (EN-DE) can be obtained with scripts provided in fairseq.
  2. WMT'16 English to Romanian (EN-RO) can be obtained from here.

Training a new model

The MT datasets should be named in the format train.{language code}, dev.{language code}, test.{language code}, e.g. "". Suppose we put the WMT14-ENDE data sets under data/wmt14-ende/real-bpe/; we can then train FlowSeq on this data on one node with the following script:

cd experiments

python -u  \
    --nnodes 1 --node_rank 0 --nproc_per_node <num of gpus per node> --master_addr <address of master node> \
    --master_port <port ID> \
    --config configs/wmt14/config-transformer-base.json --model_path <path to the saved model> \
    --data_path data/wmt14-ende/real-bpe/ \
    --batch_size 2048 --batch_steps 1 --init_batch_size 512 --eval_batch_size 32 \
    --src en --tgt de \
    --lr 0.0005 --beta1 0.9 --beta2 0.999 --eps 1e-8 --grad_clip 1.0 --amsgrad \
    --lr_decay 'expo' --weight_decay 0.001 \
    --init_steps 30000 --kl_warmup_steps 10000 \
    --subword 'joint-bpe' --bucket_batch 1 --create_vocab 
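
Before launching the command above, it may help to confirm that the data directory follows the train/dev/test naming convention. The following is a minimal sketch, not part of FlowSeq: the helper name missing_data_files is hypothetical, and it only assumes that each split has one file per language code (here en and de, matching --src and --tgt).

```python
from pathlib import Path


def missing_data_files(data_path, src, tgt, splits=("train", "dev", "test")):
    """Return the expected file names that are absent from data_path.

    Hypothetical helper: expects files named {split}.{language code},
    e.g. one file per split for the source and target language.
    """
    root = Path(data_path)
    expected = [f"{split}.{lang}" for split in splits for lang in (src, tgt)]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Paths and language codes mirror the training command above.
    missing = missing_data_files("data/wmt14-ende/real-bpe/", "en", "de")
    print("missing files:", missing if missing else "none")
```

An empty result means every expected file was found; any listed name points to a split that still needs to be prepared or renamed.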

After training, under the , there will be saved checkpoints,, config.json, log.txt, vocab directory and intermediate translation results under the translations directory.


  • The argument --batch_steps enables gradient accumulation, which trades speed for memory. The size of each segment of a data batch is batch_size / (num_gpus * batch_steps).
  • To train FlowSeq on multiple nodes, we provide a script for the Slurm cluster environment /experiments/; alternatively, please refer to the PyTorch distributed parallel training tutorial.
  • To create a distillation dataset, please use fairseq to train a Transformer model and translate the source data set.
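
The segment-size formula in the first note can be worked through with a short sketch. The function name segment_batch_size and the 8-GPU setup are hypothetical, chosen only for illustration; the divisibility check is an assumption of this sketch, not a documented FlowSeq requirement.

```python
def segment_batch_size(batch_size, num_gpus, batch_steps):
    """Size of each data segment: batch_size / (num_gpus * batch_steps)."""
    denom = num_gpus * batch_steps
    # Assumption of this sketch: the batch should split evenly.
    if batch_size % denom != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by "
            f"num_gpus * batch_steps = {denom}"
        )
    return batch_size // denom


if __name__ == "__main__":
    # --batch_size 2048 on a hypothetical node with 8 GPUs.
    print(segment_batch_size(2048, num_gpus=8, batch_steps=1))  # 256
    # Doubling --batch_steps halves the per-step segment, lowering memory use
    # at the cost of two forward/backward passes per optimizer update.
    print(segment_batch_size(2048, num_gpus=8, batch_steps=2))  # 128
```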

Translation and evaluation

cd experiments

python -u \
    --model_path <path to the saved model> \
    --data_path data/wmt14-ende/real-bpe/ \
    --batch_size 32 --bucket_batch 1 \
    --decode {'argmax', 'iw', 'sample'} \
    --tau 0.0 --nlen 3 --ntr 1

Please check details of arguments here.

To keep the output translations in the original order of the input test data, use --bucket_batch 0.
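
Why bucketed batching changes the output order can be illustrated with a minimal sketch, assuming that bucketing groups sentences of similar length into the same batch. The names bucket_by_length and restore_order are hypothetical and are not part of FlowSeq; the sketch only shows how carrying the original indices along makes it possible to recover the input order afterwards.

```python
def bucket_by_length(sentences, batch_size):
    """Group sentences of similar length into batches of (index, sentence)."""
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i].split()))
    return [
        [(i, sentences[i]) for i in order[start:start + batch_size]]
        for start in range(0, len(order), batch_size)
    ]


def restore_order(batches, n):
    """Place each (index, output) pair back at its original position."""
    outputs = [None] * n
    for batch in batches:
        for idx, out in batch:
            outputs[idx] = out
    return outputs


if __name__ == "__main__":
    test_set = ["a b c", "a", "a b"]
    batches = bucket_by_length(test_set, batch_size=2)
    # Stand-in for translation: upper-case each sentence.
    translated = [[(i, s.upper()) for i, s in batch] for batch in batches]
    print(restore_order(translated, len(test_set)))
```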


    title = {FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow},
    author = {Ma, Xuezhe and Zhou, Chunting and Li, Xian and Neubig, Graham and Hovy, Eduard},
    booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
    address = {Hong Kong},
    month = {November},
    year = {2019}