A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need


The Transformer model in "Attention Is All You Need": a Keras implementation.

A Keras+TensorFlow implementation of the Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)


Please refer to en2de_main.py and pinyin_main.py



  • The code achieves results close to those of the repository: about 70% validation accuracy. With smaller model parameters, such as layers=2 and d_model=256, the validation accuracy is better because the task is quite small.

For your own data

  • Just preprocess your source and target sequences into the format used in en2de.s2s.txt and pinyin.corpus.examples.txt.

Some notes

  • For a larger number of layers, the special learning rate scheduler reported in the paper is necessary.
  • In pinyin_main.py, I tried another method to train the deep network: I train the first layer and the embedding layer first, then a 2-layer model, then a 3-layer model, and so on. It works for this task.
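
For reference, the learning rate schedule from the paper (Section 5.3) increases the rate linearly over the warmup steps and then decays it with the inverse square root of the step number. The following is a minimal sketch in plain Python, not the code used in this repository: the function name noam_learning_rate is hypothetical, and the defaults d_model=512 and warmup_steps=4000 are the values reported in the paper for its base model.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """Learning rate at a given training step, following the paper's formula:

        lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

    The function name and defaults are illustrative assumptions,
    not taken from this repository.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few steps: rising during warmup, falling afterwards.
    for s in (1, 1000, 4000, 16000):
        print(s, noam_learning_rate(s))
```

The schedule is defined per training step (batch), not per epoch, so whatever mechanism applies it to the optimizer has to update the rate after every batch.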