NVIDIA Neural Modules 1.13.0
Highlights
NeMo ASR
- Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
- Support for code-switched manifests during training
- Support for Language ID during inference for multilingual models
- Support for cache-aware streaming for offline models
- Word confidence estimation for CTC & RNNT greedy decoding
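The word-confidence idea above can be sketched in plain Python: take the per-frame probability of the greedy (argmax) token and aggregate over the frames of a word. This is a hypothetical illustration of the concept only; the function names are invented, and NeMo's actual confidence estimators and aggregation options differ in detail.

```python
import math

# Hypothetical sketch of confidence estimation for greedy decoding:
# the softmax probability of the argmax token per frame, aggregated
# over a word's frames with the mean. Not NeMo's API.
def frame_confidences(logits):
    confs = []
    for frame in logits:
        m = max(frame)
        exps = [math.exp(x - m) for x in frame]  # numerically stable softmax
        confs.append(max(exps) / sum(exps))      # probability of the greedy token
    return confs

def word_confidence(logits):
    confs = frame_confidences(logits)
    return sum(confs) / len(confs)  # mean aggregation; min or product are alternatives

# Three frames of toy logits over a 3-symbol vocabulary
print(round(word_confidence([[2.0, 0.1, 0.1], [1.5, 1.4, 0.1], [3.0, 0.2, 0.2]]), 3))
```

Peaky frame distributions yield confidences near 1.0; near-uniform frames (like the second one above) pull the word confidence down.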
NeMo Megatron
- Interleaved Pipeline schedule
- Transformer Engine for GPT
- HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
- IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
- Pipeline Parallel Support for T5 Prompt Learning
- MegatronNMT export
NeMo TTS
- TTS introductory tutorial
- Phonemizer/espeak removal (Spanish/German)
- Char-only support for Spanish/German models
- Documentation Refactor
NeMo Core
- Upgrade to NGC PyTorch 22.09 container
- Add pre-commit hooks
- Exponential moving average (EMA) of weights during training
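The EMA feature keeps a smoothed copy of the model weights alongside the raw ones during training. A minimal sketch of the idea, with a hypothetical function name and a stand-in update loop (not NeMo's API):

```python
# Minimal sketch of exponential moving average (EMA) of model weights.
# The function name and the toy "optimizer step" are hypothetical.
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the running average a small step toward the current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

weights = [1.0, 2.0]
ema = list(weights)  # initialize the average at the starting weights
for step in range(100):
    weights = [w + 0.01 for w in weights]  # stand-in for an optimizer step
    ema = ema_update(ema, weights)

print(ema)  # lags behind the raw weights, smoothing step-to-step noise
```

Evaluating with the EMA weights rather than the raw ones often gives a modest accuracy boost, which is why frameworks maintain both copies.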
NeMo Models
- ASR Conformer Croatian: stt_hr_conformer_ctc_large and stt_hr_conformer_transducer_large
- ASR Conformer Belarusian: stt_be_conformer_ctc_large and stt_be_conformer_transducer_large
- ASR Squeezeformer Librispeech: 6 checkpoints (XS, S, SM, M, ML, L)
- SLURP Intent Classification / Slot Filling: slu_conformer_transformer_large_slurp
- LanguageID AmberNet: langid_ambernet
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.09
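Once pulled, the container can be started with GPU access. The invocation below is illustrative, not from the release notes: the `--gpus` flag requires the NVIDIA Container Toolkit, and the mount path should be adjusted to your setup.

```shell
# Start the NeMo 22.09 container interactively with all GPUs visible;
# the volume mount exposes the current directory inside the container.
docker run --gpus all -it --rm -v "$PWD":/workspace/local nvcr.io/nvidia/nemo:22.09
```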
Known Issues
Issues
- The pytest for RadTTSModel_export_to_torchscript fails intermittently due to random input values. Fixed in main.
ASR
Changelog
- Add docs tutorial on Kinyarwanda ASR by @bene-ges :: PR: #4953
- ASR code-switching by @bmwshop :: PR: #4821
- Add test for nested ASR model by @titu1994 :: PR: #5002
- Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
- [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
- Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
- Add loss normalization options to RNNT joint by @bmwshop :: PR: #4829
- ASR concat dataloader by @bmwshop :: PR: #5108
- Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
- Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
- ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
- Update ASR Scores and Results by @titu1994 :: PR: #5254
- [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340
TTS
Changelog
- [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
- [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
- [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
- [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
- [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
- Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
- Refactor TTS documentation organization and add new content by @XuesongYang :: PR: #5137
- [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
- [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
- [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
- [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
- [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
- [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
- Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
- Radtts 1.13 by @borisfom :: PR: #5451
- Radtts 1.13 plus by @borisfom :: PR: #5457
NLP / NMT
Changelog
- IA3 support for GPT and T5 by @arendu :: PR: #4909
- Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
- Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
- Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
- GPT IA3 CI tests by @arendu :: PR: #5140
- Fix NMT Eval Sampler by @aklife97 :: PR: #5154
- Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
- Fix for bug in BigNLP by @arendu :: PR: #5172
- Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
- Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
- Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
- Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
Text Normalization / Inverse Text Normalization
Changelog
- [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128
NeMo Tools
Export
Changelog
- Fix export bug by @VahidooX :: PR: #5009
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
- Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
- Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
- replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
- export_utils bugfix by @Davood-M :: PR: #5480
- Export fixes for Riva by @borisfom :: PR: #5496
General Improvements and Bugfixes
Changelog
- Don't use bfloat16 when in JIT by @bmwshop :: PR: #5051
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
- Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
- Fix changelog builder (#4962) by @titu1994 :: PR: #4963
- Checkpoint averaging class fix by @michalivne :: PR: #4946
- Add ability to give separate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
- Add simple pre-commit file by @SeanNaren :: PR: #4983
- Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
- Improvements to AMI script by @SeanNaren :: PR: #4974
- Clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
- Update libraries by @titu1994 :: PR: #5010
- Add GitHub action to close inactive issues and PRs by @XuesongYang :: PR: #5015
- Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
- Add black to pre-commit by @SeanNaren :: PR: #5027
- [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
- Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
- Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
- text_memmap dataset index range testing fix by @michalivne :: PR: #5034
- Fix undefined constant in code example by @bene-ges :: PR: #5046
- Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
- Lids by @bmwshop :: PR: #4820
- Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
- Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
- Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
- [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
- Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
- Transformer Engine Integration by @ericharper :: PR: #5104
- Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
- Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
- Fix link to inference notebook by @redoctopus :: PR: #5247
- Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
- Fix bug into Dialogue tutorial by @Zhilin123 :: PR: #5277
- PCLA tutorial typo fix by @jubick1337 :: PR: #5288
- Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
- Small bugfix for r1.13.0 by @fayejf :: PR: #5310
- Add italian model checkpoints by @Kipok :: PR: #5316
- Pcla tutorial fixes by @jubick1337 :: PR: #5313
- Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
- P&C LA tutorial fixes by @jubick1337 :: PR: #5354
- Add SDP documentation by @erastorgueva-nv :: PR: #5274
- [Bugfix] Added rm -f / wget -nc commands in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
- Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
- Fix for num_workers=0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
- Fixed bug in notebook by @vadam5 :: PR: #5382
- Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
- Fix for prompt table restore error by @vadam5 :: PR: #5393
- Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
- Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
- Disable PC test by @ekmb :: PR: #5426
- Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
- Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
- Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
- Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
- T5 Eval bugfix by @Davood-M :: PR: #5521
- Added set_start_method + function param bugfix by @Davood-M :: PR: #5539
- Remove notebook by @ericharper :: PR: #5548
- Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
- Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561