NVIDIA Neural Modules 1.13.0
Highlights
NeMo ASR
- Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
- Support for code-switched manifests during training
- Support for Language ID during inference for multilingual models
- Support for cache-aware streaming for offline models
- Word confidence estimation for CTC & RNNT greedy decoding
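The word-confidence idea above can be sketched in plain Python: take the per-frame probability of the greedy (argmax) token and aggregate over the frames of a word. This is a hypothetical illustration of the concept only; the function names are invented, and NeMo's actual confidence estimators and aggregation options differ in detail.

```python
import math

# Hypothetical sketch of confidence estimation for greedy decoding:
# the softmax probability of the argmax token per frame, aggregated
# over a word's frames with the mean. Not NeMo's API.
def frame_confidences(logits):
    confs = []
    for frame in logits:
        m = max(frame)
        exps = [math.exp(x - m) for x in frame]  # numerically stable softmax
        confs.append(max(exps) / sum(exps))      # probability of the greedy token
    return confs

def word_confidence(logits):
    confs = frame_confidences(logits)
    return sum(confs) / len(confs)  # mean aggregation; min or product are alternatives

# Three frames of toy logits over a 3-symbol vocabulary
print(round(word_confidence([[2.0, 0.1, 0.1], [1.5, 1.4, 0.1], [3.0, 0.2, 0.2]]), 3))
```

Peaky frame distributions yield confidences near 1.0; near-uniform frames (like the second one above) pull the word confidence down.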
NeMo Megatron
- Interleaved Pipeline schedule
- Transformer Engine for GPT
- HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
- IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
- Pipeline Parallel Support for T5 Prompt Learning
- MegatronNMT export
NeMo TTS
- TTS introductory tutorial
- Phonemizer/espeak removal (Spanish/German)
- Char-only support for Spanish/German models
- Documentation Refactor
NeMo Core
- Upgrade to NGC PyTorch 22.09 container
- Add pre-commit hooks
- Exponential moving average (EMA) of weights during training
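The EMA feature keeps a smoothed copy of the model weights alongside the raw ones during training. A minimal sketch of the idea, with a hypothetical function name and a stand-in update loop (not NeMo's API):

```python
# Minimal sketch of exponential moving average (EMA) of model weights.
# The function name and the toy "optimizer step" are hypothetical.
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the running average a small step toward the current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

weights = [1.0, 2.0]
ema = list(weights)  # initialize the average at the starting weights
for step in range(100):
    weights = [w + 0.01 for w in weights]  # stand-in for an optimizer step
    ema = ema_update(ema, weights)

print(ema)  # lags behind the raw weights, smoothing step-to-step noise
```

Evaluating with the EMA weights rather than the raw ones often gives a modest accuracy boost, which is why frameworks maintain both copies.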
NeMo Models
- ASR Conformer Croatian: stt_hr_conformer_ctc_large and stt_hr_conformer_transducer_large
- ASR Conformer Belarusian: stt_be_conformer_ctc_large and stt_be_conformer_transducer_large
- ASR Squeezeformer Librispeech: 6 checkpoints (XS, S, SM, M, ML, L)
- SLURP Intent Classification / Slot Filling: slu_conformer_transformer_large_slurp
- LanguageID AmberNet: langid_ambernet
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.09
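Once pulled, the container can be started with GPU access. The invocation below is illustrative, not from the release notes: the `--gpus` flag requires the NVIDIA Container Toolkit, and the mount path should be adjusted to your setup.

```shell
# Start the NeMo 22.09 container interactively with all GPUs visible;
# the volume mount exposes the current directory inside the container.
docker run --gpus all -it --rm -v "$PWD":/workspace/local nvcr.io/nvidia/nemo:22.09
```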
Known Issues
Issues
- The pytest for RadTTSModel_export_to_torchscript fails intermittently due to random input values. Fixed in main.
ASR
Changelog
- Add docs tutorial on Kinyarwanda ASR by @bene-ges :: PR: #4953
- ASR code-switching by @bmwshop :: PR: #4821
- Add test for nested ASR model by @titu1994 :: PR: #5002
- Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
- [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
- Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
- Add loss normalization options to RNNT joint by @bmwshop :: PR: #4829
- ASR concat dataloader by @bmwshop :: PR: #5108
- Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
- Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
- ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
- Update ASR Scores and Results by @titu1994 :: PR: #5254
- [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340
TTS
Changelog
- [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
- [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
- [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
- [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
- [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
- Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
- Refactor TTS documentation organization and add new content by @XuesongYang :: PR: #5137
- [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
- [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
- [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
- [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
- [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
- [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
- Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
- Radtts 1.13 by @borisfom :: PR: #5451
- Radtts 1.13 plus by @borisfom :: PR: #5457
NLP / NMT
Changelog
- IA3 support for GPT and T5 by @arendu :: PR: #4909
- Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
- Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
- Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
- GPT IA3 CI tests by @arendu :: PR: #5140
- Fix NMT Eval Sampler by @aklife97 :: PR: #5154
- Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
- Fix for bug in BigNLP by @arendu :: PR: #5172
- Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
- Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
- Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
- Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
Text Normalization / Inverse Text Normalization
Changelog
- [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128
NeMo Tools
Export
Changelog
- Fix export bug by @VahidooX :: PR: #5009
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
- Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
- Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
- replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
- export_utils bugfix by @Davood-M :: PR: #5480
- Export fixes for Riva by @borisfom :: PR: #5496
General Improvements and Bugfixes
Changelog
- Don't use bfloat16 when in JIT by @bmwshop :: PR: #5051
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
- Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
- Fix changelog builder (#4962) by @titu1994 :: PR: #4963
- Checkpoint averaging class fix by @michalivne :: PR: #4946
- Add ability to give separate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
- Add simple pre-commit file by @SeanNaren :: PR: #4983
- Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
- Improvements to AMI script by @SeanNaren :: PR: #4974
- Clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
- Update libraries by @titu1994 :: PR: #5010
- Add GitHub action to close inactive issues and PRs by @XuesongYang :: PR: #5015
- Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
- Add black to pre-commit by @SeanNaren :: PR: #5027
- [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
- Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
- Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
- text_memmap dataset index range testing fix by @michalivne :: PR: #5034
- Fix undefined constant in code example by @bene-ges :: PR: #5046
- Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
- Lids by @bmwshop :: PR: #4820
- Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
- Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
- Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
- [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
- Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
- Transformer Engine Integration by @ericharper :: PR: #5104
- Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
- Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
- Fix link to inference notebook by @redoctopus :: PR: #5247
- Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
- Fix bug into Dialogue tutorial by @Zhilin123 :: PR: #5277
- PCLA tutorial typo fix by @jubick1337 :: PR: #5288
- Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
- Small bugfix for r1.13.0 by @fayejf :: PR: #5310
- Add italian model checkpoints by @Kipok :: PR: #5316
- Pcla tutorial fixes by @jubick1337 :: PR: #5313
- Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
- P&C LA tutorial fixes by @jubick1337 :: PR: #5354
- Add SDP documentation by @erastorgueva-nv :: PR: #5274
- [Bugfix] Added rm -f / wget -nc commands in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
- Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
- Fix for num_workers=0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
- Fixed bug in notebook by @vadam5 :: PR: #5382
- Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
- Fix for prompt table restore error by @vadam5 :: PR: #5393
- Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
- Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
- Disable PC test by @ekmb :: PR: #5426
- Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
- Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
- Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
- Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
- T5 Eval bugfix by @Davood-M :: PR: #5521
- Added set_start_method + function param bugfix by @Davood-M :: PR: #5539
- Remove notebook by @ericharper :: PR: #5548
- Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
- Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561