v1.6.0
This release adds support for a new 196M-parameter model, a.k.a. "large", improves WER on long utterances, increases training speed and makes a number of smaller changes. For a summary of the {base, large} inference performance, WER and training times, please refer to the top-level README.
This release:
- Adds the `large` model configuration
- Adds 'Random State Passing' (RSP) as in Narayanan et al., 2019. On in-house validation data this improves WER on long utterances by ~40% relative (see sketch below)
- Removes the hard-LSTM finetune instructions as we now support soft-LSTMs in hardware
- Makes the following changes to training script defaults:
  - `WEIGHT_DECAY=0.001` -> `WEIGHT_DECAY=0.01`
  - `HOLD_STEPS=10880` -> `HOLD_STEPS=18000`. We find that this, combined with the change to `WEIGHT_DECAY`, results in ~5% relative reduction in WER
  - `custom_lstm: false` -> `custom_lstm: true` in yaml configs. This is required to support RSP
- Increases training speed (see summary):
  - by packing samples in the loss calculation in order to skip padding computation. This may facilitate higher per-GPU batch sizes (see sketch below)
  - for `WebDataset` reading by using multiprocessing (see sketch below)
- Makes miscellaneous changes including:
  - setting of `SEED` in the dataloader to make runs deterministic. Previously, data order and weights were deterministic but there was some run-to-run variation due to dither (see sketch below)
  - addition of schema checking to ensure trained and exported model checkpoints are compatible with the downstream inference server
  - addition of gradient noise augmentation (off by default; see sketch below)
  - switching the order of `WEIGHTS_INIT_SCALE=0.5` and `forget_gate_bias=1.0` during weight initialisation so that we now (correctly) initialise the LSTM forget gate bias to 1.0 (see sketch below)
  - code organisation and refactoring (e.g. we add new `Setup` classes to reduce object building repetition)
  - improvements to the Tensorboard launch script
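The sketches below illustrate some of the changes listed above. They are minimal, illustrative Python/PyTorch snippets rather than the repo's actual code, and any names or constants not mentioned in the notes (e.g. `rsp_prob`) are assumptions.

Random State Passing, as described in Narayanan et al., 2019, initialises the encoder LSTM state from the final state of a previous training step with some probability, so that training sees state histories longer than a single utterance:

```python
import random
import torch
import torch.nn as nn

class RSPEncoder(nn.Module):
    """Toy LSTM encoder with Random State Passing (illustrative sketch).

    Assumes a constant batch size so a cached state from the previous
    step can be re-used directly.
    """

    def __init__(self, feat_dim=80, hidden=1024, layers=2, rsp_prob=0.5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=layers, batch_first=True)
        self.rsp_prob = rsp_prob  # hypothetical name for the pass-through probability
        self._cached_state = None  # (h, c) from the previous training step

    def forward(self, feats):
        state = None  # default: zero initial state
        if self.training and self._cached_state is not None and random.random() < self.rsp_prob:
            # Re-use the final state of the previous batch, as if this batch
            # were a continuation of a longer utterance.
            state = self._cached_state
        out, final_state = self.lstm(feats, state)
        # Cache for the next step, detached so gradients don't flow across steps.
        self._cached_state = tuple(s.detach() for s in final_state)
        return out
```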
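The loss-packing speedup avoids spending compute on padded frames by gathering only the valid positions before the loss is evaluated. A sketch of the idea for a generic frame-level loss (the real change applies to the transducer loss path):

```python
import torch
import torch.nn.functional as F

def packed_ce_loss(logits, targets, lengths):
    """Cross-entropy computed only on non-padded frames (illustrative sketch).

    logits:  (B, T, V) frame-level scores, zero-padded along T
    targets: (B, T)    frame-level labels, padded arbitrarily
    lengths: (B,)      number of valid frames per sample
    """
    B, T, _ = logits.shape
    # Boolean mask of valid (non-padded) frames.
    mask = torch.arange(T, device=lengths.device)[None, :] < lengths[:, None]
    # Pack: gather only the valid frames, so the loss kernel never sees padding.
    packed_logits = logits[mask]    # (sum(lengths), V)
    packed_targets = targets[mask]  # (sum(lengths),)
    return F.cross_entropy(packed_logits, packed_targets)
```

Because padding never reaches the loss, activation memory per step drops, which is why this change may allow larger per-GPU batch sizes.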
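The `WebDataset` speedup comes from reading tar shards in dataloader worker processes instead of the main training process. A sketch using the webdataset package and a PyTorch `DataLoader`; the shard pattern and sample keys are placeholders:

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Placeholder shard pattern; real training shards live elsewhere.
urls = "data/train-shard-{000000..000099}.tar"

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)            # shuffle within an in-memory buffer of samples
    .to_tuple("flac", "txt")  # raw (audio bytes, transcript bytes) pairs
)

# num_workers > 0 moves tar reading into worker processes, overlapping
# data loading with training on the GPU.
loader = DataLoader(dataset, batch_size=None, num_workers=4)
```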
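The determinism change seeds both the shuffle order and the per-worker randomness that drives augmentations such as dither. A sketch using the standard PyTorch reproducibility recipe; `SEED` and the stand-in dataset are illustrative:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 1

def seed_worker(worker_id):
    # Each dataloader worker derives a deterministic seed from the base seed,
    # so augmentations applied in workers (e.g. dither) are reproducible.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(SEED)

train_dataset = TensorDataset(torch.randn(32, 80))  # stand-in dataset

loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    worker_init_fn=seed_worker,
    generator=g,  # fixes the shuffle order across runs
)
```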
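Gradient noise augmentation adds zero-mean Gaussian noise to the gradients before each optimizer step, usually with a variance that decays over training; one common schedule is that of Neelakantan et al., 2015. A sketch, with illustrative constants:

```python
import torch

def add_gradient_noise(model, step, eta=0.01, gamma=0.55):
    """Add annealed Gaussian noise to every gradient (illustrative sketch).

    Noise variance follows sigma^2 = eta / (1 + step)^gamma, as in
    Neelakantan et al., 2015.
    """
    std = (eta / (1 + step) ** gamma) ** 0.5
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.add_(torch.randn_like(p.grad) * std)

# Usage inside the training loop, between backward() and the optimizer step:
#   loss.backward()
#   add_gradient_noise(model, step)
#   optimizer.step()
```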
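The weight-initialisation fix is purely about ordering: if the forget gate bias is set to 1.0 before the global `WEIGHTS_INIT_SCALE=0.5` scaling, the scaling would also halve the bias; scaling first and setting the bias last leaves it at 1.0. A sketch against `nn.LSTM`'s (input, forget, cell, output) gate layout:

```python
import torch
import torch.nn as nn

def init_lstm(lstm: nn.LSTM, weights_init_scale: float = 0.5,
              forget_gate_bias: float = 1.0) -> None:
    """Scale weights first, then set the forget gate bias (illustrative sketch)."""
    h = lstm.hidden_size
    with torch.no_grad():
        # 1) Scale every parameter by WEIGHTS_INIT_SCALE.
        for p in lstm.parameters():
            p.mul_(weights_init_scale)
        # 2) Only then set the forget gate bias, so the scaling above cannot
        #    shrink it. nn.LSTM stores biases as [input, forget, cell, output].
        for name, p in lstm.named_parameters():
            if name.startswith("bias_ih"):
                p[h:2 * h].fill_(forget_gate_bias)
            elif name.startswith("bias_hh"):
                p[h:2 * h].zero_()

# Example: init_lstm(nn.LSTM(240, 1024, num_layers=2))
```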