Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)

Pre-trained models

| Description | Parameters | Dataset | Model and Test set(s) |
| --- | --- | --- | --- |
| Adaptive Inputs (Baevski and Auli, 2018) | 1026M | Google Billion Words | download (.tar.bz2) |
| Adaptive Inputs (Baevski and Auli, 2018) | 247M | WikiText-103 | download (.tar.bz2) |

Training an LM with adaptive inputs

First, see the general language modeling README for instructions on preprocessing the WikiText-103 data.

Then train a model with adaptive inputs using the transformer_lm_wiki103 architecture:

fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm_wiki103 \
    --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
    --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
    --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d
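
The batch-size flags in the command above interact: --max-tokens caps the tokens in each per-GPU batch, and --update-freq accumulates gradients over that many batches before each optimizer step. The following is a minimal sketch of the resulting arithmetic, not part of fairseq itself. The GPU count (num_gpus) is a hypothetical value chosen for illustration and should be replaced with your own setup.

```shell
# Values taken from the training command above.
max_tokens=3072     # --max-tokens: upper bound on tokens per GPU per batch
update_freq=3       # --update-freq: batches accumulated before each optimizer step

# ASSUMPTION: the number of GPUs is hypothetical; it is not stated in this README.
num_gpus=8

# Tokens one GPU contributes to a single optimizer step (upper bound).
tokens_per_gpu_step=$(( max_tokens * update_freq ))

# Tokens contributing to a single optimizer step across all GPUs (upper bound).
tokens_per_update=$(( tokens_per_gpu_step * num_gpus ))

echo "tokens per GPU per update: at most ${tokens_per_gpu_step}"
echo "tokens per update across ${num_gpus} GPUs: at most ${tokens_per_update}"
```

When training on fewer GPUs than a reference setup, raising --update-freq in proportion keeps this product roughly constant, which is the usual way to approximate the same effective batch size.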

Citation

@inproceedings{
    baevski2018adaptive,
    title={Adaptive Input Representations for Neural Language Modeling},
    author={Alexei Baevski and Michael Auli},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=ByxZX20qFQ},
}