Fast Interleaved Bidirectional Sequence Generation

Sequence-to-sequence (seq2seq) models, the Transformer in particular, still suffer from slow decoding due to the autoregressive decoding constraint. Researchers therefore attempt to relax this constraint with semi-autoregressive and non-autoregressive modeling, which often speeds up translation but at the cost of model performance.

We follow this direction and try to produce multiple target tokens per decoding step, with a specific focus on semi-autoregressive (SA) modeling. One drawback of the vanilla SA model is that it imposes an independence assumption on neighbouring target tokens, ignoring the fact that neighbouring words are often strongly correlated. By contrast, we explore bidirectional generation from the left-to-right and right-to-left directions simultaneously, and show evidence that the independence assumptions in our model are more felicitous.
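
Schematically (a simplified sketch in our own notation, not the exact formulation from the paper), a vanilla SA model that emits two adjacent tokens per step factorizes the target y = (y_1, ..., y_n) as

    p(y) \approx \prod_{t=1}^{\lceil n/2 \rceil} p(y_{2t-1} \mid y_{<2t-1}) \, p(y_{2t} \mid y_{<2t-1}),

treating the two adjacent tokens as independent given the prefix. Interleaved bidirectional generation instead emits one token from each end per step,

    p(y) \approx \prod_{t=1}^{\lceil n/2 \rceil} p(y_t \mid C_t) \, p(y_{n+1-t} \mid C_t), \qquad C_t = \{y_{<t}\} \cup \{y_{>n+1-t}\},

so the tokens predicted jointly at each step are far apart in the sentence rather than adjacent.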

We propose the interleaved bidirectional decoder (IBDecoder), which interleaves target words from the left-to-right and right-to-left directions and separates their positions, so that any standard unidirectional decoder can be reused. See the figure below for an illustration.
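
The reordering itself is simple. The sketch below is plain Python with hypothetical helper names; the real implementation, including the separate position assignment and the handling of special end symbols, lives in the interleaved_bidirectional_transformer branch.

    # Hypothetical sketch: reorder a target sequence so that tokens from the two
    # ends alternate, [y1, y2, ..., yn] -> [y1, yn, y2, y(n-1), ...], and undo it.
    def interleave(tokens):
        out, left, right = [], 0, len(tokens) - 1
        while left < right:
            out.append(tokens[left])   # next token from the left-to-right direction
            out.append(tokens[right])  # next token from the right-to-left direction
            left, right = left + 1, right - 1
        if left == right:              # odd length: the middle token comes last
            out.append(tokens[left])
        return out

    def deinterleave(tokens):
        forward = tokens[0::2]         # tokens generated left-to-right
        backward = tokens[1::2][::-1]  # tokens generated right-to-left, restored to reading order
        return forward + backward

    sentence = ["we", "propose", "an", "interleaved", "decoder", "."]
    mixed = interleave(sentence)
    # -> ["we", ".", "propose", "decoder", "an", "interleaved"]
    assert deinterleave(mixed) == sentence

At decoding time the model emits one token from each direction per step, so a sentence of length n takes roughly n/2 steps, which is where the ~2x speedup without distillation comes from.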

Our experiments on several seq2seq tasks, including machine translation and document summarization, demonstrate the superiority of IBDecoder: without knowledge distillation it yields performance comparable to the autoregressive baseline while delivering a speedup of ~2x. With knowledge distillation, IBDecoder achieves 4x-11x speedups across tasks at the cost of <1 BLEU or <0.5 ROUGE (on average) by producing more target tokens (beyond two) at each decoding step.

Model Training & Evaluation

Please go to the interleaved_bidirectional_transformer branch for more details. The source code is compatible with the master version: no special preprocessing of the corpus is needed for bidirectional modeling.

Performance and Download

We provide pretrained models and preprocessed datasets for all tasks. Detailed results are below: quality is reported in BLEU for machine translation and ROUGE for document summarization, and speedups are measured against the autoregressive Transformer baseline.

Preprocessed data (one download per task):

| | En-De | En-Fr | Ro-En | En-Ru | En-Ja | Gigaword | CNN/DailyMail |
|---|---|---|---|---|---|---|---|
| Data | download | download | download | download | download | download | download |

Quality, beam search (B = 4):

| Model | KD | En-De | En-Fr | Ro-En | En-Ru | En-Ja | Gigaword | CNN/DailyMail |
|---|---|---|---|---|---|---|---|---|
| Transformer | no | 26.9 | 32.1 | 32.7 | 27.7 | 43.97 | 35.03 | 36.88 |
| IBDecoder | no | 26.2 model txt | 32.1 model txt | 33.3 model txt | 27.0 model txt | 43.51 model txt | 34.57 model txt | 36.11 model txt |
| +SA | no | 23.0 model txt | 30.3 model txt | 31.3 model txt | 25.0 model txt | 41.75 model txt | 33.65 model txt | 35.27 model txt |
| IBDecoder | yes | 27.1 model txt | 32.7 model txt | 33.5 model txt | 27.5 model txt | 43.76 model txt | 35.12 model txt | 36.46 model txt |
| +SA | yes | 26.3 model txt | 31.3 model txt | 32.7 model txt | 26.4 model txt | 42.99 model txt | 34.74 model txt | 36.27 model txt |

Speedup, beam search (B = 4):

| Model | KD | En-De | En-Fr | Ro-En | En-Ru | En-Ja | Gigaword | CNN/DailyMail |
|---|---|---|---|---|---|---|---|---|
| IBDecoder | yes | 1.90x | 1.75x | 1.79x | 1.82x | 1.86x | 2.35x | 3.02x |
| +SA | yes | 3.31x | 3.41x | 3.37x | 3.30x | 3.10x | 4.20x | 6.55x |

Quality, greedy search:

| Model | KD | En-De | En-Fr | Ro-En | En-Ru | En-Ja | Gigaword | CNN/DailyMail |
|---|---|---|---|---|---|---|---|---|
| Transformer | no | 26.0 | 31.6 | 32.3 | 27.8 | 42.95 | 34.88 | 34.51 |
| IBDecoder | no | 25.0 txt | 31.7 txt | 32.6 txt | 26.8 txt | 43.29 txt | 34.22 txt | 36.74 txt |
| +SA | no | 21.7 txt | 29.0 txt | 30.4 txt | 24.3 txt | 41.05 txt | 33.25 txt | 35.04 txt |
| IBDecoder | yes | 26.8 txt | 32.2 txt | 33.2 txt | 28.2 txt | 43.79 txt | 35.18 txt | 37.03 txt |
| +SA | yes | 26.0 txt | 30.7 txt | 32.4 txt | 26.5 txt | 42.70 txt | 34.63 txt | 36.39 txt |

Speedup, greedy search:

| Model | KD | En-De | En-Fr | Ro-En | En-Ru | En-Ja | Gigaword | CNN/DailyMail |
|---|---|---|---|---|---|---|---|---|
| IBDecoder | yes | 2.33x | 2.18x | 2.37x | 2.37x | 2.40x | 3.51x | 6.36x |
| +SA | yes | 4.35x | 4.20x | 4.17x | 4.14x | 4.34x | 5.83x | 11.15x |

Note: "download", "model" and "txt" are links to the preprocessed dataset, the pretrained model, and the decoded output for the corresponding entry. En-De, En-Fr, Ro-En, En-Ru and En-Ja are machine translation tasks; Gigaword and CNN/DailyMail are document summarization tasks. KD denotes knowledge distillation.

Citation

Please consider citing our paper as follows:

Biao Zhang, Ivan Titov, Rico Sennrich (2020). Fast Interleaved Bidirectional Sequence Generation. In Proceedings of the Fifth Conference on Machine Translation (WMT20).

@inproceedings{zhang-etal-2020-fast,
    title = "Fast Interleaved Bidirectional Sequence Generation",
    author = {Zhang, Biao  and
      Titov, Ivan  and
      Sennrich, Rico},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation (WMT20)",
    year = "2020",
}