Experiments on Multilingual NMT


Attention is All you Need (Transformer)

This repository implements the Transformer model introduced in the NIPS 2017 paper Attention is All You Need: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

This codebase was also used for the bilingual and multilingual translation experiments in the paper "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models".

A high-level view of the Transformer model is shown below:

(Figure: Transformer architecture diagram)
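
The core operation of the model is scaled dot-product attention. The snippet below is a minimal PyTorch sketch of that operation for illustration only; the function name and tensor shapes are chosen here, and the repository's own attention module may differ in details such as masking and dropout.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: tensors of shape (batch, heads, seq_len, d_k)
    d_k = query.size(-1)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # mask out positions that should not be attended to (e.g. padding)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, value), weights
```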

The code in this repository implements the following features:

  • positional encoding
  • multi-head dot-product attention
  • label smoothing
  • Adam optimizer with a warm-up learning-rate schedule (a sketch of the schedule follows this list)
  • shared weights between the embedding and softmax layers
  • beam search with length normalization
  • exponential moving average (EMA) of parameters for checkpointing
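
For the warm-up schedule, the paper sets the learning rate to lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)). Below is a small sketch of that formula; the default values (d_model = 512, warmup_steps = 4000) follow the paper, and the settings used in this repository's optimizer.py may differ.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """Learning rate from the Transformer paper's warm-up schedule."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Example usage: update Adam's learning rate before each optimizer step.
# for group in optimizer.param_groups:
#     group["lr"] = noam_learning_rate(global_step)
```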

Software Requirements

  • Python 3.6
  • PyTorch v0.3.1
  • torchtext
  • Chainer
  • NumPy

These packages can be installed using the requirements file:

```bash
pip install -r requirements.txt
```

Usage

Please refer to the scripts under the "tools" directory for usage examples.

More details will be added soon.

Dataset

Statistics (number of sentence pairs per split) for the datasets included in the data directory:

| Dataset | Train | Dev | Test |
|---|---|---|---|
| English-Vietnamese (IWSLT 2015) | 133,317 | 1,553 | 1,268 |
| English-German (TED talks) | 167,888 | 4,148 | 4,491 |
| English-Romanian (TED talks) | 180,484 | 3,904 | 4,631 |
| English-Dutch (TED talks) | 183,767 | 4,459 | 5,006 |

Experiments

Bilingual Translation Tasks

BLEU scores on the bilingual translation tasks, compared with tensor2tensor and GNMT baselines:

| Direction | This Repo | tensor2tensor | GNMT |
|---|---|---|---|
| En -> Vi | 28.84 | 28.12 | 26.50 |
| En -> De | 29.31 | 28.68 | 27.01 |
| En -> Ro | 26.81 | 26.38 | 23.92 |
| En -> Nl | 32.42 | 31.74 | 30.64 |
| De -> En | 37.33 | 36.96 | 35.46 |
| Ro -> En | 37.00 | 35.45 | 34.77 |
| Nl -> En | 38.59 | 37.71 | 35.81 |

Citation

If you find this code useful, please consider citing our paper as:

@InProceedings{devendra2018multilingual,
  author    = "Sachan, Devendra and Neubig, Graham",
  title     = "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models",
  booktitle = "Proceedings of the Third Conference on Machine Translation",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  location  = "Brussels, Belgium"
}

Acknowledgements