This repository contains code for replacing the implicit and explicit regularization effects of dropout with analytically derived regularizers. The accompanying paper is available at https://arxiv.org/abs/2002.12915.
Please see the lstm-qrnn directory for our LSTM experiments on PTB and WikiText-2, and the transformer directory for our Transformer experiments on WikiText-103.
The code is based on the following repositories: https://github.com/salesforce/awd-lstm-lm and https://github.com/kimiyoung/transformer-xl