v1.6.0
This release adds support for a new 196M-parameter model, a.k.a. "large", improves WER on long utterances, increases training speed and makes a number of smaller changes. For a summary of the {base, large} inference performance, WER and training times, please refer to the top-level README.
This release:
- Adds the `large` model configuration
- Adds 'Random State Passing' (RSP) as in Narayanan et al., 2019. On in-house validation data this improves WER on long utterances by ~40% relative (see sketch below)
- Removes the hard-LSTM finetune instructions as we now support soft-LSTMs in hardware
- Makes the following changes to training script defaults:
  - `WEIGHT_DECAY=0.001` -> `WEIGHT_DECAY=0.01`
  - `HOLD_STEPS=10880` -> `HOLD_STEPS=18000`. We find that this, combined with the change to `WEIGHT_DECAY`, results in ~5% relative reduction in WER
  - `custom_lstm: false` -> `custom_lstm: true` in yaml configs. This is required to support RSP
- Increases training speed (see summary):
  - by packing samples in the loss calculation in order to skip padding computation. This may facilitate higher per-GPU batch sizes (see sketch below)
  - for `WebDataset` reading by using multiprocessing (see sketch below)
- Makes miscellaneous changes including:
  - setting of `SEED` in the dataloader to make runs deterministic. Previously, data order and weights were deterministic but there was some run-to-run variation due to dither (see sketch below)
  - addition of schema checking to ensure trained and exported model checkpoints are compatible with the downstream inference server
  - addition of gradient noise augmentation (off by default; see sketch below)
  - switching the order of `WEIGHTS_INIT_SCALE=0.5` and `forget_gate_bias=1.0` during weight initialisation so that we now (correctly) initialise the LSTM forget gate bias to 1.0 (see sketch below)
  - code organisation and refactoring (e.g. we add new `Setup` classes to reduce object building repetition)
  - improvements to the Tensorboard launch script
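The sketches below illustrate some of the changes listed above. They are minimal, illustrative Python/PyTorch snippets rather than the repo's actual code, and any names or constants not mentioned in the notes (e.g. `rsp_prob`) are assumptions.

Random State Passing, as described in Narayanan et al., 2019, initialises the encoder LSTM state from the final state of a previous training step with some probability, so that training sees state histories longer than a single utterance:

```python
import random
import torch
import torch.nn as nn

class RSPEncoder(nn.Module):
    """Toy LSTM encoder with Random State Passing (illustrative sketch).

    Assumes a constant batch size so a cached state from the previous
    step can be re-used directly.
    """

    def __init__(self, feat_dim=80, hidden=1024, layers=2, rsp_prob=0.5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=layers, batch_first=True)
        self.rsp_prob = rsp_prob  # hypothetical name for the pass-through probability
        self._cached_state = None  # (h, c) from the previous training step

    def forward(self, feats):
        state = None  # default: zero initial state
        if self.training and self._cached_state is not None and random.random() < self.rsp_prob:
            # Re-use the final state of the previous batch, as if this batch
            # were a continuation of a longer utterance.
            state = self._cached_state
        out, final_state = self.lstm(feats, state)
        # Cache for the next step, detached so gradients don't flow across steps.
        self._cached_state = tuple(s.detach() for s in final_state)
        return out
```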
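The loss-packing speedup avoids spending compute on padded frames by gathering only the valid positions before the loss is evaluated. A sketch of the idea for a generic frame-level loss (the real change applies to the transducer loss path):

```python
import torch
import torch.nn.functional as F

def packed_ce_loss(logits, targets, lengths):
    """Cross-entropy computed only on non-padded frames (illustrative sketch).

    logits:  (B, T, V) frame-level scores, zero-padded along T
    targets: (B, T)    frame-level labels, padded arbitrarily
    lengths: (B,)      number of valid frames per sample
    """
    B, T, _ = logits.shape
    # Boolean mask of valid (non-padded) frames.
    mask = torch.arange(T, device=lengths.device)[None, :] < lengths[:, None]
    # Pack: gather only the valid frames, so the loss kernel never sees padding.
    packed_logits = logits[mask]    # (sum(lengths), V)
    packed_targets = targets[mask]  # (sum(lengths),)
    return F.cross_entropy(packed_logits, packed_targets)
```

Because padding never reaches the loss, activation memory per step drops, which is why this change may allow larger per-GPU batch sizes.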
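The `WebDataset` speedup comes from reading tar shards in dataloader worker processes instead of the main training process. A sketch using the webdataset package and a PyTorch `DataLoader`; the shard pattern and sample keys are placeholders:

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Placeholder shard pattern; real training shards live elsewhere.
urls = "data/train-shard-{000000..000099}.tar"

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)            # shuffle within an in-memory buffer of samples
    .to_tuple("flac", "txt")  # raw (audio bytes, transcript bytes) pairs
)

# num_workers > 0 moves tar reading into worker processes, overlapping
# data loading with training on the GPU.
loader = DataLoader(dataset, batch_size=None, num_workers=4)
```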
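The determinism change seeds both the shuffle order and the per-worker randomness that drives augmentations such as dither. A sketch using the standard PyTorch reproducibility recipe; `SEED` and the stand-in dataset are illustrative:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 1

def seed_worker(worker_id):
    # Each dataloader worker derives a deterministic seed from the base seed,
    # so augmentations applied in workers (e.g. dither) are reproducible.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(SEED)

train_dataset = TensorDataset(torch.randn(32, 80))  # stand-in dataset

loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    worker_init_fn=seed_worker,
    generator=g,  # fixes the shuffle order across runs
)
```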
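Gradient noise augmentation adds zero-mean Gaussian noise to the gradients before each optimizer step, usually with a variance that decays over training; one common schedule is that of Neelakantan et al., 2015. A sketch, with illustrative constants:

```python
import torch

def add_gradient_noise(model, step, eta=0.01, gamma=0.55):
    """Add annealed Gaussian noise to every gradient (illustrative sketch).

    Noise variance follows sigma^2 = eta / (1 + step)^gamma, as in
    Neelakantan et al., 2015.
    """
    std = (eta / (1 + step) ** gamma) ** 0.5
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.add_(torch.randn_like(p.grad) * std)

# Usage inside the training loop, between backward() and the optimizer step:
#   loss.backward()
#   add_gradient_noise(model, step)
#   optimizer.step()
```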
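The weight-initialisation fix is purely about ordering: if the forget gate bias is set to 1.0 before the global `WEIGHTS_INIT_SCALE=0.5` scaling, the scaling would also halve the bias; scaling first and setting the bias last leaves it at 1.0. A sketch against `nn.LSTM`'s (input, forget, cell, output) gate layout:

```python
import torch
import torch.nn as nn

def init_lstm(lstm: nn.LSTM, weights_init_scale: float = 0.5,
              forget_gate_bias: float = 1.0) -> None:
    """Scale weights first, then set the forget gate bias (illustrative sketch)."""
    h = lstm.hidden_size
    with torch.no_grad():
        # 1) Scale every parameter by WEIGHTS_INIT_SCALE.
        for p in lstm.parameters():
            p.mul_(weights_init_scale)
        # 2) Only then set the forget gate bias, so the scaling above cannot
        #    shrink it. nn.LSTM stores biases as [input, forget, cell, output].
        for name, p in lstm.named_parameters():
            if name.startswith("bias_ih"):
                p[h:2 * h].fill_(forget_gate_bias)
            elif name.startswith("bias_hh"):
                p[h:2 * h].zero_()

# Example: init_lstm(nn.LSTM(240, 1024, num_layers=2))
```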