Releases: Rishit-dagli/GLU

GLU v0.1.0

04 May 08:39
c857d1d

This is the initial release of glu-tf, an easy-to-use library for GLU (Gated Linear Units) and GLU variants in TensorFlow. This repository allows you to easily make use of the following activation functions:

  • GLU introduced in the paper Language Modeling with Gated Convolutional Networks [1]
  • Bilinear introduced in the paper Language Modeling with Gated Convolutional Networks [1], attributed to Mnih et al. [2]
  • ReGLU introduced in the paper GLU Variants Improve Transformer [3]
  • GEGLU introduced in the paper GLU Variants Improve Transformer [3]
  • SwiGLU introduced in the paper GLU Variants Improve Transformer [3]
  • SeGLU

A Gated Linear Unit is the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variants of GLU replace the sigmoid with a different nonlinear (or even linear) function. In the paper GLU Variants Improve Transformer [3], which evaluates them in a fine-tuning scenario, the new variants seem to produce better perplexities for the de-noising objective used in pre-training, as well as better results on many downstream language-understanding tasks. Furthermore, they do not have any apparent computational drawbacks.
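
To make the definition above concrete, here is a minimal pure-Python sketch of the gating computation for the variants defined in [1] and [3]. It is an illustration only and is not the glu-tf API: the names `linear`, `gated_linear_unit`, `ACTIVATIONS`, and the parameters `W`, `b`, `V`, `c` are hypothetical, Swish is assumed to use β = 1, and SeGLU is left out because these notes do not define its gate function.

```python
import math


def linear(x, weights, bias):
    """Project vector x with a (d_in x d_out) weight matrix plus a bias."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def gelu(z):
    # Exact GELU, written with the Gaussian CDF via erf.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def swish(z, beta=1.0):
    # beta = 1 is an assumption made for this sketch.
    return z * sigmoid(beta * z)


# Gate function applied to the first projection for each variant.
ACTIVATIONS = {
    "glu": sigmoid,
    "bilinear": lambda z: z,  # no nonlinearity at all
    "reglu": relu,
    "geglu": gelu,
    "swiglu": swish,
}


def gated_linear_unit(x, W, b, V, c, variant="glu"):
    """Return act(xW + b) * (xV + c), multiplied component-wise."""
    act = ACTIVATIONS[variant]
    gate = linear(x, W, b)   # projection that passes through the gate function
    value = linear(x, V, c)  # projection that is left linear
    return [act(g) * v for g, v in zip(gate, value)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [0.0, 0.0]
    for name in ACTIVATIONS:
        print(name, gated_linear_unit(x, identity, zeros, identity, zeros, name))
```

With identity projections and zero biases, as in the usage lines at the bottom, both projections equal the input, so the printed values show only the effect of swapping the gate function.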

References

[1] Dauphin, Yann N., et al. ‘Language Modeling with Gated Convolutional Networks’. ArXiv:1612.08083 [Cs], Sept. 2017. arXiv.org, http://arxiv.org/abs/1612.08083.

[2] Mnih, Andriy, and Geoffrey Hinton. ‘Three New Graphical Models for Statistical Language Modelling’. Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 641–648.

[3] Shazeer, Noam. ‘GLU Variants Improve Transformer’. ArXiv:2002.05202 [Cs, Stat], Feb. 2020. arXiv.org, http://arxiv.org/abs/2002.05202.