The source code for The Annotated Transformer blog post (http://nlp.seas.harvard.edu/2018/04/03/attention.html). Copied from https://github.com/harvardnlp/annotated-transformer, with some useful modules added and the code moved to PyTorch 0.4.