Basic transformer tutorial
The Naive-Transformer

This tutorial implements a naive Transformer in PyTorch, following the model described in "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017).


  • python 3.5+
  • pytorch 0.4.1+
  • tqdm
  • numpy


The dataset is a toy dataset, generated randomly with NumPy.
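As a rough illustration of what such a randomly generated dataset could look like, here is a minimal sketch. The function name `make_random_dataset`, the sample count, sequence length, vocabulary size, and the choice to reserve token id 0 for padding are all assumptions made for this example; they are not taken from the repository.

```python
import numpy as np


def make_random_dataset(num_samples=1000, seq_len=10, vocab_size=100, seed=0):
    """Return random (source, target) token-id arrays.

    Hypothetical helper: all sizes here are illustrative assumptions.
    """
    rng = np.random.RandomState(seed)
    # Token id 0 is kept free for padding (an assumption), so ids are
    # drawn from 1 .. vocab_size - 1 (the upper bound is exclusive).
    src = rng.randint(1, vocab_size, size=(num_samples, seq_len))
    tgt = rng.randint(1, vocab_size, size=(num_samples, seq_len))
    return src, tgt


src, tgt = make_random_dataset()
print(src.shape, tgt.shape)
```

Because source and target are drawn independently, there is nothing meaningful for the model to learn; data like this is mainly useful for checking that the training loop runs end to end.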

The epoch is 1/10:
The average loss is 4.904755330085754.
The epoch is 2/10:
The average loss is 4.624333620071411.
The epoch is 3/10:
The average loss is 4.62061635017395.
The epoch is 4/10:
The average loss is 4.61753402709961.
The epoch is 5/10:
The average loss is 4.616219253540039.
The epoch is 6/10:
The average loss is 4.615866875648498.
The epoch is 7/10:
The average loss is 4.614339418411255.
The epoch is 8/10:
The average loss is 4.613479561805725.
The epoch is 9/10:
The average loss is 4.613105096817017.
The epoch is 10/10:
The average loss is 4.613206758499145.
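
The loss levels off near 4.61 after the second epoch. One possible reading, offered as an interpretation and not something stated by the author: when targets are random, the best a model can do is predict a uniform distribution over the vocabulary, which gives a cross-entropy of ln(V) for V classes. The sketch below computes that baseline; the vocabulary size of 100 is an assumption chosen for illustration, not a value confirmed by the repository.

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)


# Assumed vocabulary size; the actual value used in the repository is not
# given in this document.
baseline = uniform_cross_entropy(100)
print(round(baseline, 4))  # prints 4.6052
```

If that assumption holds, a plateau slightly above this value would be the expected outcome on random data and would not by itself indicate a bug in the model.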


MIT © Sohone Guo

