v1.6.0

This release adds support for a new 196M-parameter model, a.k.a. "large", improves WER on long utterances, increases training speed and makes a number of smaller changes. For a summary of inference performance, WER and training times for the {base, large} models, please refer to the top-level README.

This release:

  • Adds the large model configuration
  • Adds 'Random State Passing' (RSP) as in Narayanan et al., 2019. On in-house validation data this improves WER on long utterances by ~40% relative (a minimal sketch follows this list)
  • Removes the hard-LSTM fine-tuning instructions as we now support soft-LSTMs in hardware
  • Makes the following changes to training script defaults:
    • WEIGHT_DECAY=0.001 -> WEIGHT_DECAY=0.01
    • HOLD_STEPS=10880 -> HOLD_STEPS=18000. We find that this, combined with the change to WEIGHT_DECAY, results in a ~5% relative reduction in WER (the schedule shape is sketched below)
    • custom_lstm: false -> custom_lstm: true in the YAML configs. This is required to support RSP
  • Increases training speed (see the top-level README summary):
    • by packing samples in the loss calculation in order to skip padding computation. This may also allow higher per-GPU batch sizes (see the packing sketch below)
    • for WebDataset reading by using multiprocessing (see the loader sketch below)
  • Makes miscellaneous changes including:
    • setting of SEED in the dataloader to make runs deterministic. Previously, data order and weights were deterministic but there was some run-to-run variation due to dither (see the seeding sketch below)
    • addition of schema checking to ensure trained and exported model checkpoints are compatible with the downstream inference server (see the sketch below)
    • addition of gradient noise augmentation (off by default; see the sketch below)
    • switching the order in which WEIGHTS_INIT_SCALE=0.5 and forget_gate_bias=1.0 are applied during weight initialisation, so that the LSTM forget gate bias is now (correctly) initialised to 1.0 (see the sketch below)
    • code organisation and refactoring (e.g. we add new Setup classes to reduce object building repetition)
    • improvements to Tensorboard launch script
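
To illustrate Random State Passing, here is a minimal PyTorch sketch. The class and names (`RSPEncoder`, `rsp_prob`) are ours for illustration and are not the repo's actual API; it assumes a fixed batch size so that carried states line up with the next batch.

```python
import torch

class RSPEncoder(torch.nn.Module):
    def __init__(self, input_size=240, hidden_size=1024, num_layers=2, rsp_prob=0.5):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.rsp_prob = rsp_prob
        self._carried_state = None  # (h, c) from the previous forward pass

    def forward(self, x):
        state = None  # default: zero initial states
        if (self.training and self._carried_state is not None
                and torch.rand(1).item() < self.rsp_prob):
            # Re-use the previous batch's final states instead of zeros,
            # simulating utterances longer than the training segments.
            state = tuple(s.detach() for s in self._carried_state)
        out, final_state = self.lstm(x, state)
        self._carried_state = final_state
        return out

enc = RSPEncoder()
out = enc(torch.randn(8, 200, 240))  # (batch, time, features)
```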
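
HOLD_STEPS controls the flat region of the learning-rate schedule. The sketch below assumes a warmup/hold/exponential-decay shape; the WARMUP_STEPS and DECAY_FACTOR values are illustrative and the repo's actual schedule may differ.

```python
import torch

WARMUP_STEPS = 1632     # illustrative value
HOLD_STEPS = 18000      # new default (was 10880)
DECAY_FACTOR = 0.99995  # illustrative per-step decay after the hold

def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)  # linear warmup
    if step < WARMUP_STEPS + HOLD_STEPS:
        return 1.0                          # hold at peak LR
    return DECAY_FACTOR ** (step - WARMUP_STEPS - HOLD_STEPS)

model = torch.nn.Linear(8, 8)
optim = torch.optim.AdamW(model.parameters(), lr=4e-3, weight_decay=0.01)  # new default
sched = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda)
```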
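
The loss-packing change can be pictured with a generic per-frame loss: valid frames are gathered with a length mask before the loss is computed, so no FLOPs are spent on padding. The transducer-loss packing in this release is more involved; this is illustrative only.

```python
import torch
import torch.nn.functional as F

def packed_ce_loss(logits, targets, lengths):
    """logits: (B, T, C); targets: (B, T); lengths: (B,) valid frame counts."""
    B, T, C = logits.shape
    # mask[b, t] is True for the first lengths[b] frames of sample b
    mask = torch.arange(T, device=lengths.device)[None, :] < lengths[:, None]
    packed_logits = logits[mask]    # (sum(lengths), C) -- padding dropped
    packed_targets = targets[mask]  # (sum(lengths),)
    return F.cross_entropy(packed_logits, packed_targets)

logits = torch.randn(4, 50, 29)
targets = torch.randint(0, 29, (4, 50))
lengths = torch.tensor([50, 37, 12, 44])
loss = packed_ce_loss(logits, targets, lengths)
```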
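
Parallel WebDataset reading amounts to letting DataLoader worker processes each read their own shards. The shard pattern and sample keys below are placeholders.

```python
import webdataset as wds
from torch.utils.data import DataLoader

urls = "shards/train-{000000..000099}.tar"  # placeholder shard pattern
dataset = (
    wds.WebDataset(urls)
    .decode(wds.torch_audio)    # decode audio samples by file extension
    .to_tuple("flac", "txt")    # placeholder keys: audio + transcript
)
# num_workers > 0 spreads shard reading across worker processes.
loader = DataLoader(dataset.batched(8), batch_size=None, num_workers=4)
```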
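
The SEED change can be sketched as seeding every source of randomness the dataloader touches, including the per-worker RNGs used by stochastic augmentation such as dither. The SEED value and dataset below are placeholders.

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 1234  # placeholder value

def worker_init_fn(worker_id: int):
    # Give each worker a distinct but run-deterministic seed so that
    # augmentation (e.g. dither) is reproducible across runs.
    worker_seed = SEED + worker_id
    random.seed(worker_seed)
    np.random.seed(worker_seed)
    torch.manual_seed(worker_seed)

generator = torch.Generator()
generator.manual_seed(SEED)  # fixes the shuffled sample order

dataset = TensorDataset(torch.randn(100, 10))  # placeholder dataset
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4,
                    worker_init_fn=worker_init_fn, generator=generator)
```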
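
A schema check of the kind described can be as simple as comparing tensor names and shapes in an exported state dict against what the inference server expects. The schema entries below are hypothetical.

```python
import torch

# Hypothetical schema: tensor name -> expected shape.
EXPECTED_SCHEMA = {
    "encoder.lstm.weight_ih_l0": (4096, 240),
    "encoder.lstm.weight_hh_l0": (4096, 1024),
}

def check_schema(state_dict, schema):
    missing = schema.keys() - state_dict.keys()
    unexpected = state_dict.keys() - schema.keys()
    if missing or unexpected:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={unexpected}")
    for name, shape in schema.items():
        found = tuple(state_dict[name].shape)
        if found != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {found}")

# usage:
#   ckpt = torch.load("checkpoint.pt", map_location="cpu")
#   check_schema(ckpt["state_dict"], EXPECTED_SCHEMA)
```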
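
Gradient noise augmentation adds annealed Gaussian noise to the gradients before the optimiser step, in the style of Neelakantan et al., 2015. The hyperparameters below are illustrative, and the feature ships off by default.

```python
import torch

def add_gradient_noise(model, step, eta=0.01, gamma=0.55):
    # Noise variance anneals as eta / (1 + step)^gamma.
    std = (eta / (1 + step) ** gamma) ** 0.5
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(torch.randn_like(p.grad) * std)

# usage inside the training loop, after loss.backward():
#   add_gradient_noise(model, step)
#   optimizer.step()
```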
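
The initialisation-order fix matters because scaling the weights after setting the forget gate bias would halve the bias to 0.5. Below is a sketch of the corrected order, assuming a standard PyTorch LSTM; the parameter names and gate layout are PyTorch's, not necessarily those of the repo's custom LSTM.

```python
import torch

WEIGHTS_INIT_SCALE = 0.5
FORGET_GATE_BIAS = 1.0

lstm = torch.nn.LSTM(input_size=240, hidden_size=1024)
with torch.no_grad():
    # 1) scale all parameters first ...
    for p in lstm.parameters():
        p.mul_(WEIGHTS_INIT_SCALE)
    # 2) ... then set the forget gate bias, so it ends up at 1.0 rather
    #    than 1.0 * WEIGHTS_INIT_SCALE. PyTorch packs biases as
    #    [i, f, g, o] and sums bias_ih + bias_hh per gate.
    h = lstm.hidden_size
    for name, p in lstm.named_parameters():
        if "bias_ih" in name:
            p[h:2 * h].fill_(FORGET_GATE_BIAS)  # forget-gate slice
        elif "bias_hh" in name:
            p[h:2 * h].zero_()                  # avoid double-counting
```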