
Releases: MyrtleSoftware/caiman-asr

v1.10.1

10 Jun 10:07

Release notes

This release improves the documentation, including updated latency figures and hardware requirements.

v1.10.0

16 Apr 17:20

This release adds a script to run live transcriptions from a user's microphone. See docs: [markdown] [browser]
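As a rough illustration only, the hypothetical sketch below captures microphone audio in fixed-size chunks with sounddevice and hands each chunk to a transcription callback. The chunk size, sample rate, and transcribe_chunk callback are placeholders; the repo's actual script and API may differ (see the linked docs).

```python
# Hypothetical sketch of chunked microphone capture; not the repo's script.
# `transcribe_chunk` is a placeholder for sending audio to the ASR model/server.
import sounddevice as sd

SAMPLE_RATE = 16000   # assumed model sample rate
CHUNK_FRAMES = 4000   # 0.25 s per chunk (illustrative)

def transcribe_chunk(audio):
    """Placeholder: pass the float32 chunk to the transcription backend."""

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
    while True:
        chunk, _overflowed = stream.read(CHUNK_FRAMES)  # shape: (frames, 1)
        transcribe_chunk(chunk[:, 0])
```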

Full Changelog: v1.9.0...v1.10.0

v1.9.0

03 Apr 13:34

Release notes

This release reduces the WER on long utterances. Specifically, the best Earnings21 WER when training on open-source data decreases from 21.85% to 15.57% (see latest WERs here). There are a few contributing changes, but the most important is the addition of "state resets with overlaps".

This release:

  • Adds hosted documentation here
  • Improves validation. Specifically it:
    • Adds state resets with overlaps. This technique is on by default and is described in more detail here
    • Enables running validation on Hugging Face datasets. See here
    • Automatically reduces validation VRAM usage. See here
  • Adds an augmentation technique in which we sample across possible tokenizations during training rather than always using the default sentencepiece tokenization. This is on by default. See here (a sentencepiece sketch follows this list)
  • Updates the training script to control checkpoint saving and evaluation using steps rather than epochs. This fixes an issue where users training on large datasets saved checkpoints only rarely
    • Relatedly, the total length of training is now controlled using --training_steps rather than --epochs
  • Changes the way in which activations are normalized at the input to the RNN-T. Specifically:
    • Streaming norm is replaced with normalization using precomputed mean and stddev of training data mel-bins
    • This change is made because streaming norm was used at inference time only and resulted in some WER degradation
    • See here
  • Makes the following miscellaneous changes:
    • Renames repository and python library to CAIMAN-ASR to match product name
    • Reduces the time to start training by >50% by parallelizing the transcript tokenization
    • Updates code structure to make it easier to navigate
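
The tokenization-sampling item above can be pictured with stock sentencepiece subword regularization. This is a minimal sketch assuming a sentencepiece model file; the model path and the alpha/nbest_size values are illustrative, not the repo's actual training-pipeline code or the meaning of the yaml sampling: 0.05 setting.

```python
# Minimal sketch of sampling across tokenizations with sentencepiece subword
# regularization. Path and sampling parameters are illustrative assumptions.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # hypothetical path

text = "state resets reduce long utterance wer"

# Default deterministic segmentation (always the same token sequence):
print(sp.encode(text, out_type=str))

# Sampled segmentation: draw one of many valid tokenizations so the model sees
# varied token sequences for the same transcript during training.
print(sp.encode(text, out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1))
```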

train.sh defaults: summary of changes

  • --training_steps=100000 (instead of --epochs=100)
  • --sr_segment=15, --sr_overlap=3 (addition of state resets)
  • --max_inputs_per_batch=1e7 (reduces validation VRAM usage)
  • yaml: sampling: 0.05 (adds tokenizer sampling)
  • yaml: stats_path: /datasets/stats/STATS_SUBDIR (adds dataset stats normalization)
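
The stats_path default above corresponds to the switch from streaming norm to precomputed mel-bin statistics. A minimal sketch of the idea, assuming a simple .npz layout (the repo's actual stats file format and loading code may differ):

```python
# Sketch: normalize log-mel features with precomputed per-mel-bin mean/stddev
# from the training data, replacing streaming normalization. The file layout
# and shapes are assumptions.
import numpy as np

stats = np.load("/datasets/stats/STATS_SUBDIR/mel_stats.npz")  # hypothetical file
mean, stddev = stats["mean"], stats["stddev"]                  # each (n_mel_bins,)

def normalize(log_mels: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """log_mels: (time, n_mel_bins) array."""
    return (log_mels - mean) / (stddev + eps)
```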

v1.8.0

22 Feb 22:07

Release notes

This release adds a number of features and increases the training speed by 1.1-2.0x depending on the {model size, hardware} combination.

This release:

  • Changes the train.sh and val.sh scripts' API so that args are now passed as named command line arguments instead of environment variables (--num_gpus=2 instead of NUM_GPUS=2)
    • This means argument names are now spell-checked by the scripts: previously, if you set NUM_GPU=2 (missing the plural 's'), the scripts would silently fall back to the default rather than alerting the user that the provided arg didn't exist
    • The scripts scripts/legacy/train.sh and scripts/legacy/val.sh still use the former API but these do not support features introduced after v1.7.1, and they will be removed in a future release
  • Increases training throughput (see updated training times):
    • Adds batch splitting. This involves splitting the encoder/prediction batches into smaller sub-batches that can run through the joint network & loss without going out of memory. This results in higher GPU utilisation and is described in more detail here. See --batch_split_factor (a sketch follows this list)
    • Uses fewer DALI dataloading processes per core during dataloading when training with --num_gpus > 1. See --dali_processes_per_cpu
  • Adds background noise augmentation using CAIMAN-ASR-BackgroundNoise. See --prob_background_noise
  • Standardizes the WER calculation. Hypotheses and transcripts are now normalised with the Whisper EnglishSpellingNormalizer before WERs are calculated, as described here. This is on by default but can be turned off by setting standardize_wer: false in the yaml config (a sketch follows the defaults list below)
  • Makes the following miscellaneous changes:
    • Adds ability to validate on directories of files using --val_txt_dir and --val_audio_dir as described here
    • Removes the valCPU.sh script. Validation on cpu is now performed by passing the --cpu flag to val.sh
    • Bumps PyTorch from 2.0 -> 2.1 and Ubuntu from 20.04 -> 22.04
    • Reduces audio volume during narrowband downsampling in order to reduce clipping and improve WER. See --prob_train_narrowband
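
To make the batch-splitting item above concrete, here is a conceptual sketch: the encoder and prediction networks see the full batch, while the memory-hungry joint network and loss are fed sub-batches. The function and tensor names are illustrative assumptions, not the repo's implementation, and --batch_split_factor may be wired up differently.

```python
# Conceptual sketch of batch splitting: only the joint network + loss see
# sub-batches, keeping peak memory down. Names/shapes are assumptions.
import torch

def joint_loss_with_batch_split(enc_out, pred_out, targets, joint_and_loss, split_factor):
    """enc_out, pred_out, targets: tensors with the batch on dim 0."""
    losses = []
    for enc_b, pred_b, tgt_b in zip(
        enc_out.chunk(split_factor, dim=0),
        pred_out.chunk(split_factor, dim=0),
        targets.chunk(split_factor, dim=0),
    ):
        losses.append(joint_and_loss(enc_b, pred_b, tgt_b))  # memory-heavy step
    return torch.stack(losses).mean()
```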

train.sh defaults: summary of changes

  • --half_life_steps=10880 (up from 2805)
  • --prob_background_noise=0.25. By default background noise is now added to 25% of utterances
  • --dali_processes_per_cpu=1
  • yaml: normalize_transcripts: true
  • yaml: standardize_wer: true
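
A minimal sketch of the standardized-WER idea behind standardize_wer: true, using openai-whisper's text normalizer (which wraps the EnglishSpellingNormalizer mentioned above) and jiwer; the repo's exact scoring pipeline may differ, and the example strings are illustrative.

```python
# Sketch: normalize hypothesis and reference before scoring so spelling
# variants are not counted as errors. Not the repo's actual pipeline.
from jiwer import wer
from whisper.normalizers import EnglishTextNormalizer

normalize = EnglishTextNormalizer()

reference  = "the colour of the aluminium wing"
hypothesis = "the color of the aluminum wing"

print(wer(reference, hypothesis))                        # spelling variants count as errors
print(wer(normalize(reference), normalize(hypothesis)))  # likely 0.0 after normalization
```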

Full Changelog: v1.7.1...v1.8.0

v1.7.1

29 Jan 19:27

This release makes small changes. Specifically it adds:

  • Narrowband (8 kHz) audio augmentation (off by default). Use with PROB_TRAIN_NARROWBAND (a sketch follows this list)
  • Training profiling (off by default). Use with PROFILER=true
  • Ability to train on subset of data via N_UTTERANCES_ONLY
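
As a rough illustration of the narrowband augmentation above, the sketch below simulates 8 kHz telephony audio by resampling down and back up with torchaudio; the repo's actual augmentation (including the volume handling changed in v1.8.0) may differ.

```python
# Sketch: simulate narrowband (8 kHz) audio by downsampling then upsampling.
# Illustrative only; not the repo's implementation.
import torch
import torchaudio.functional as F

def narrowband(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    down = F.resample(waveform, orig_freq=sample_rate, new_freq=8000)
    return F.resample(down, orig_freq=8000, new_freq=sample_rate)
```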

Full Changelog: v1.6.0...v1.7.1

v1.7.1 patch

This patch to v1.7.0:

  • Uses top instead of htop for logging cpu usage when PROFILER=true (to avoid truncation with a large number of CPUs)
  • Correctly sets version numbers

v1.6.0

22 Jan 16:30


This release adds support for a new 196M parameter model a.k.a. "large", improves WER on long utterances, increases training speed, and makes a number of smaller changes. For a summary of the {base, large} inference performance, WER and training times, please refer to the top-level README.

This release:

  • Adds the large model configuration
  • Adds 'Random State Passing' (RSP) as in Narayanan et al., 2019. On in-house validation data this improves WER on long utterances by ~40% relative (a sketch follows this list)
  • Removes the hard-LSTM finetune instructions as we now support soft-LSTMs in hardware
  • Makes the following changes to training script defaults:
    • WEIGHT_DECAY=0.001 -> WEIGHT_DECAY=0.01
    • HOLD_STEPS=10880 -> HOLD_STEPS=18000. We find that this, combined with the change to WEIGHT_DECAY, results in a ~5% relative reduction in WER
    • custom_lstm: false -> custom_lstm: true in yaml configs. This is required to support RSP
  • Increases training speed (see summary):
    • by packing samples in loss calculation in order to skip padding computation. This may facilitate higher per-gpu batch sizes
    • for WebDataset reading by using multiprocessing
  • Makes miscellaneous changes including:
    • setting of SEED in dataloader to make runs deterministic. Previously, data order and weights were deterministic but there was some run-to-run variation due to dither
    • addition of schema checking to ensure trained and exported model checkpoints are compatible with the downstream inference server
    • addition of gradient noise augmentation (off by default)
    • switching the order of WEIGHTS_INIT_SCALE=0.5 and forget_gate_bias=1.0 during weight initialisation so that we now (correctly) initialise the LSTM forget gate bias to 1.0
    • code organisation and refactoring (e.g. we add new Setup classes to reduce object building repetition)
    • improvements to Tensorboard launch script

v1.5.0

23 Oct 11:06

v1.5.0 adds support for evaluation on long utterances, improves logging and makes other small fixes.

This release:

  • Adds support for validation on long utterances:
    • Adds NO_LOSS arg in val.sh script to avoid going OOM (use NO_LOSS=true)
    • Uses a faster Levenshtein distance calculation (a sketch follows this list)
    • Unsets MAX_SYMBOLS_PER_SAMPLE cap on decoding length in validation scripts
  • Improves logging:
    • Fixes incorrectly scaled loss in tensorboard
    • Records configuration and stdout to files
    • Adds per-layer weight & grad norm diagnostics
    • Removes historical MLPerf logging remnants
  • Misc:
    • Removes SAVE_MILESTONES arg: the Checkpointer class no longer deletes any checkpoints
    • Fixes issues with WebDataset reading: filename parsing and filtering
    • Fixes race condition with mel-stats export
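
For reference, word-level WER is built on the Levenshtein (edit) distance between the hypothesis and reference word sequences. The sketch below is a textbook dynamic-programming version, shown only to make the term concrete; it is not the faster implementation this release switched to.

```python
# Textbook word-level Levenshtein distance (the WER numerator); shown for
# reference only - not the faster implementation used in this release.
def levenshtein(ref_words, hyp_words):
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1]

# WER = edit distance / number of reference words
ref, hyp = "the cat sat".split(), "the cat sat down".split()
print(levenshtein(ref, hyp) / len(ref))  # 0.333...
```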

v1.4.0

05 Sep 13:46

This release:

  1. Upgrades pytorch to 2.0
  2. Adds new faster custom LSTM
  3. Provides more details on hardware checkpoint fine tuning

v1.3.0

31 Jul 09:47

This release:

  1. Adds support for training & validation from the WebDataset format, including README instructions (a sketch follows this list)
  2. Upgrades DALI to 1.18
  3. Alters repository structure: all training code is now in the pip installable rnnt_train package
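
As a rough illustration of consuming the WebDataset format from item 1, the sketch below iterates audio/transcript pairs from tar shards. The shard naming pattern and sample keys ("flac"/"txt") are assumptions, not necessarily the layout the repo expects.

```python
# Sketch: iterate audio/transcript pairs from WebDataset tar shards.
# Shard pattern and sample keys are assumptions about the data layout.
import io
import torchaudio
import webdataset as wds

dataset = wds.WebDataset("shards/train-{000000..000009}.tar").to_tuple("flac", "txt")

for audio_bytes, transcript_bytes in dataset:
    waveform, sample_rate = torchaudio.load(io.BytesIO(audio_bytes), format="flac")
    transcript = transcript_bytes.decode("utf-8")
    # ... feed (waveform, transcript) into the training pipeline
```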

v1.2.0: Model Update

06 Jul 12:12

This release:

  • Adds config and training code for a higher-quality 85M parameter model called base. The existing 49M parameter model is renamed to testing because it isn't optimised for acceleration on FPGA with Myrtle.ai's IP and should only be used for debugging purposes. See this table for more information
  • Adds support for docker following symlinks inside mounted volumes