Refactored all ASR collections documentation #15542

Merged
pzelasko merged 24 commits into main from asr-collections-ref on May 4, 2026

Collaborator

@Ssofja Ssofja commented Mar 23, 2026

What does this PR do

This PR is a full refactoring of the ASR collections documentation.
Collection: [docs]

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Signed-off-by: Ssofja <sofiakostandian@gmail.com>
@Ssofja Ssofja requested a review from pzelasko March 23, 2026 23:34
@github-actions github-actions Bot added the ASR label Mar 23, 2026
@Ssofja Ssofja requested review from artbataev and nithinraok March 23, 2026 23:34
@pzelasko pzelasko changed the title Refactored all ASR collections module Refactored all ASR collections documentation Mar 23, 2026
Comment thread docs/source/asr/intro.rst Outdated
Comment thread docs/source/asr/models.rst Outdated
Comment thread docs/source/asr/models.rst Outdated
Comment thread docs/source/asr/models.rst Outdated
Comment thread docs/source/asr/asr_checkpoints.rst Outdated

10) Cleanup step. Compute full batch WER and log. Concatenate loss list and pass to PTL to compute the equivalent of the original (full batch) Joint step. Delete ancillary objects necessary for sub-batching.

Transducer Decoding
Collaborator

Note to self and other reviewers - decoding docs are now placed in Inference and ASR Language Modeling and Customization


Refer to the :ref:`Audio Augmentors <asr-api-audio-augmentors>` API section for more details.

Tokenizer Configurations
Collaborator

We need to add one more code block: an example of AggregateTokenizer


.. _asr-configs-augmentation-configurations:

Augmentation Configurations
Collaborator

I feel we should keep the SpecAugment part of this section.


.. _asr-configs-preprocessor-configuration:

Preprocessor Configuration
Collaborator

I think this should be kept

Collaborator

yeah, users are normally confused by this portion so would need more documentation - if anything.

use_cer: false
log_prediction: true

BLEU Score
Collaborator

I would revert the compaction of this section - I think it's pretty recent and describes various config tweaks introduced by @bonham79

Collaborator

yeah this is deleting a lot of things that are hidden in the code and some improved user functionality. without this you're basically just forcing dependence on torchmetrics documentation - and that ain't pretty.

@nithinraok
Member

/claude review

Comment thread docs/source/asr/fine_tuning.rst Outdated
Comment on lines +150 to +151
* `CTC Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
* `Transducer Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
Contributor

Both links point to the exact same URL (examples/asr/conf/asr_finetune). The Transducer link should presumably point to a different location (e.g., examples/asr/asr_transducer or examples/asr/conf/asr_finetune with an anchor for transducer-specific instructions). As-is, labeling two identical URLs as "CTC" and "Transducer" is misleading.
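
A check like the one described in this comment can be automated. The following is a minimal sketch, assuming links are written in the inline RST form `` `Label <URL>`_ `` as in the two quoted lines; the function name `find_duplicate_targets` is hypothetical and is not part of NeMo or Sphinx.

```python
import re
from collections import defaultdict

# Matches inline RST external links: `Label <URL>`_ or `Label <URL>`__
RST_LINK = re.compile(r"`([^`<]+?)\s*<([^>]+)>`__?")


def find_duplicate_targets(rst_text):
    """Return {url: [labels]} for URLs that are used under more than one distinct label."""
    labels_by_url = defaultdict(list)
    for label, url in RST_LINK.findall(rst_text):
        label = label.strip()
        if label not in labels_by_url[url]:
            labels_by_url[url].append(label)
    return {url: labels for url, labels in labels_by_url.items() if len(labels) > 1}


# The two lines quoted in this review thread.
sample = """
* `CTC Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
* `Transducer Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
"""

duplicates = find_duplicate_targets(sample)
for url, labels in duplicates.items():
    print(f"{url} is used by: {', '.join(labels)}")
```

A URL shared by several labels is not always a mistake, so the result is best read as a list of links to review by hand.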

Contributor

claude Bot commented Mar 24, 2026

Overall this is a clean docs refactor. One issue found:

  • fine_tuning.rst: The CTC and Transducer fine-tuning README links both point to the same URL — one of them likely needs a different target.

Minor note: docs/source/asr/all_chkpt.rst appears to be orphaned after this PR (no remaining references point to it). Consider deleting it or adding a redirect if it was intentionally replaced by asr_checkpoints.rst.
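
The orphan check mentioned in the minor note can be approximated with a short script. This is a rough sketch under stated assumptions: a document counts as referenced if its name appears as a whole token in any other document, which is much cruder than Sphinx's own toctree resolution. The function name `find_unreferenced_docs` and the sample file contents are hypothetical, not taken from the NeMo repository.

```python
import re


def find_unreferenced_docs(docs, roots=("index",)):
    """Return names of documents that no other document mentions.

    `docs` maps a document name (without extension) to its RST source text.
    Names listed in `roots` are entry points and are never reported.
    """
    unreferenced = []
    for name in docs:
        if name in roots:
            continue
        # Whole-token match so that "intro" does not match inside "introduction".
        pattern = re.compile(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])")
        referenced = any(
            pattern.search(text) for other, text in docs.items() if other != name
        )
        if not referenced:
            unreferenced.append(name)
    return sorted(unreferenced)


# Hypothetical miniature docs tree, for illustration only.
docs = {
    "index": ".. toctree::\n\n   intro\n   asr_checkpoints\n",
    "intro": "See :doc:`asr_checkpoints` for the model list.\n",
    "asr_checkpoints": "Checkpoints\n===========\n",
    "all_chkpt": "Old checkpoint list.\n",
}

print(find_unreferenced_docs(docs))
```

Sphinx itself warns about documents that are not included in any toctree unless they carry the `:orphan:` field, so the build log remains the authoritative source; a script like this only gives a quick pre-check.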

.. list-table::
   :header-rows: 1

   * - Model
Collaborator

iirc some of these didn't really prioritize PnC no?

Comment thread docs/source/asr/asr_checkpoints.rst Outdated
* - `nemotron-speech-streaming-en-0.6b <https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b>`__
  - Hybrid
  - ASR, streaming
  - en
Collaborator

It may be more economical to just list the architecture and configure a list of supported language models, or maybe a matrix?

Comment thread docs/source/asr/asr_checkpoints.rst Outdated
* - `stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc <https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc>`__
  - Hybrid
  - ASR, PnC, streaming
  - ka
Collaborator

Yeah, on Piotr's point above: few people know the Georgian language code offhand.

Comment thread docs/source/asr/asr_checkpoints.rst
Comment thread docs/source/asr/fine_tuning.rst Outdated
2. **Use Lhotse dataloading** for efficient training with dynamic batching. See :doc:`Lhotse Dataloading </dataloaders>`.
3. **Monitor validation WER** closely — fine-tuning can overfit quickly on small datasets.
4. **Use spec augmentation** during fine-tuning to improve robustness.
5. **For multilingual fine-tuning**, consider using ``AggregateTokenizer`` and the Hybrid model with prompt conditioning.
Collaborator

provide link for both

Comment thread docs/source/asr/fine_tuning.rst Outdated
1. **Start with a low learning rate** — fine-tuning with too high a learning rate can destroy pretrained features.
2. **Use Lhotse dataloading** for efficient training with dynamic batching. See :doc:`Lhotse Dataloading </dataloaders>`.
3. **Monitor validation WER** closely — fine-tuning can overfit quickly on small datasets.
4. **Use spec augmentation** during fine-tuning to improve robustness.
Collaborator

link to doc page


.. code-block:: python

    config = model.get_transcribe_config()
Collaborator

give example transcribe config. this is a more obfuscated aspect of transcription in the codebase

Comment thread docs/source/asr/models.rst Outdated
@@ -1,17 +1,9 @@
Models
Collaborator

move parakeet before canary - more successful so people will be hunting for it


.. _Conformer-HAT_model:

Conformer-HAT
Collaborator

can we keep these on a legacy model page?

@artbataev artbataev mentioned this pull request Mar 25, 2026
8 tasks
Ssofja and others added 17 commits March 29, 2026 18:41
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Signed-off-by: Ssofja <78349198+Ssofja@users.noreply.github.com>
Signed-off-by: Ssofja <sofiakostandian@gmail.com>
Merge branch 'asr-collections-ref' of github.com:NVIDIA/NeMo into asr-collections-ref

Signed-off-by: Ssofja <sofiakostandian@gmail.com>
@Ssofja Ssofja force-pushed the asr-collections-ref branch from 17d3941 to 4ad4a65 Compare April 14, 2026 21:37
Comment thread docs/source/asr/asr_checkpoints.rst Outdated
* - **PnC**
  - Punctuation and Capitalization in the output
* - **Streaming**
  - Real-time / cache-aware inference capability
Collaborator

Add SALM - Speech augmented Language Model in the glossary for canary qwen

Parakeet, Nemotron Speech, and the ``stt_*_fastconformer_*`` models below all share the same underlying FastConformer encoder;
the different names reflect release branding, not architectural differences.

.. list-table::
Collaborator

Why does this table define language in Size, and the next table of streaming models defines language in Language? Add Language column here.

Comment thread docs/source/asr/asr_checkpoints.rst Outdated
* - `parakeet-rnnt-110m-da-dk <https://huggingface.co/nvidia/parakeet-rnnt-110m-da-dk>`__
  - RNN-T
  - ASR
  - 110M (Danish)
Collaborator

This comment should not have been resolved, it wasn't addressed. Similar cases above. @Ssofja
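
The quoted Size cell, `110M (Danish)`, mixes two facts in one column. A minimal sketch of splitting such cells into separate Size and Language values follows, assuming the cells follow the `SIZE (Language)` pattern seen in the quoted row; the helper name `split_size_cell` is hypothetical.

```python
import re

# "110M (Danish)" -> size "110M", language "Danish"
SIZE_WITH_LANGUAGE = re.compile(
    r"^\s*(?P<size>[^()]+?)\s*\((?P<language>[^()]+)\)\s*$"
)


def split_size_cell(cell):
    """Split a combined Size cell into (size, language); language is "" if absent."""
    match = SIZE_WITH_LANGUAGE.match(cell)
    if match is None:
        return cell.strip(), ""
    return match.group("size"), match.group("language")


print(split_size_cell("110M (Danish)"))
print(split_size_cell("0.6B"))
```

Cells without a parenthesized language come back with an empty language, which makes rows that still need a value in the new Language column easy to spot.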

Comment thread docs/source/asr/asr_checkpoints.rst Outdated
Loading Models
--------------

All models can be loaded via the ``from_pretrained()`` API:
Collaborator

Revise:

All models (except SALM) ...  # + make SALM linked to SpeechLM2 docs

@@ -1,102 +1,92 @@
.. _asr-configs-dataset-configuration:

NeMo ASR Configuration Files
Collaborator

Reviewing this file from scratch again I now see that this PR discards the entire documentation about setting model hyperparameters (how to set a given encoder type, layer dimension, decoder type, loss type, loss hparams, etc.) - we need those back, if anything the documentation was maybe even too obscure in the first place. It's OK to discard OLD things like LSTM encoder but for FastConformer we need a comprehensive doc with available options.


.. code-block:: bash

    python examples/asr/speech_to_text_finetune.py \
Collaborator

this command actually wouldn't work because it doesn't specify init_from_nemo/pretrained_model. Let's either show a proper example using config, or proper example using CLI options, but make sure that if somebody tries to run it this way, it will work OK.

Comment thread docs/source/asr/fine_tuning.rst Outdated
- joint


Enforcing a Single Language During Inference
Collaborator

What does this have to do with fine-tuning? Shouldn't this be in the inference documentation?

Fine-Tuning with HuggingFace Datasets
---------------------------------------

NeMo supports loading datasets directly from HuggingFace:
Collaborator

Add a note saying this is not currently supported in lhotse dataloader.

Comment thread docs/source/asr/fine_tuning.rst Outdated
For the complete configuration reference, see :doc:`Configuration Files <./configs>`.


Execution Flow
Collaborator

These link to training execution flow and not finetuning execution flows, do we need these?

Comment thread docs/source/asr/fine_tuning.rst Outdated
1. **Start with a low learning rate** — fine-tuning with too high a learning rate can destroy pretrained features. Typical fine-tuning LRs are 1e-4 to 1e-5. If your pretrained config uses the Noam (warmup + decay) scheduler, override it with a constant or cosine-annealing schedule to avoid the warmup phase resetting to a high LR.
2. **Use Lhotse dataloading** for efficient training with dynamic batching. See :doc:`Lhotse Dataloading </dataloaders>`.
3. **Use spec augmentation** during fine-tuning to improve robustness. See :ref:`Augmentation Configurations <asr-configs-augmentation-configurations>`.
4. **For multilingual fine-tuning**, consider using ``AggregateTokenizer`` (see :doc:`Configs <./configs>`) and the :ref:`Hybrid model with prompt conditioning <Hybrid-Transducer-CTC-Prompt_model__Config>`.
Collaborator

I'm not sure that this is good advice. Where is it coming from?

Comment thread docs/source/asr/inference.rst Outdated
# HuggingFace (prefix with nvidia/)
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")

# NGC (no prefix)
Collaborator

Discard NGC

Comment thread docs/source/asr/inference.rst Outdated
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.restore_from("path/to/checkpoint.nemo")

**From HuggingFace or NGC:**
Collaborator

Discard NGC

Comment thread docs/source/asr/inference.rst Outdated

.. code-block:: python

    outputs = model.transcribe(audio=["file1.wav", "file2.wav"], batch_size=4)
Collaborator

Suggested change
- outputs = model.transcribe(audio=["file1.wav", "file2.wav"], batch_size=4)
+ outputs = model.transcribe(audio=["file1.wav", "file2.wav"], batch_size=2)

Comment thread docs/source/asr/inference.rst Outdated

**Advanced configuration:**

See :doc:`Configs <./configs>` for all available ``decoding`` options and :doc:`ASR Language Modeling and Customization <./asr_language_modeling_and_customization>` for decoding customization (confidence, CUDA graphs, language models, word boosting).
Collaborator

Configs doesn't explain all available decoding options - where can we find them now? Add if missing and link here.

Comment thread docs/source/asr/inference.rst Outdated

.. code-block:: json

    {"audio_filepath": "/path/to/audio.wav", "duration": null, "source_lang": "en", "target_lang": "en", "pnc": "yes", "answer": "na"}
Collaborator

This is redefined, link to the page in docs explaining Canary2 manifest format
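
For readers following the thread, the quoted manifest entry is one JSON object serialized on a single line. Below is a minimal sketch of producing such a line with the standard library. The field names and default values are copied from the quoted line only; the helper name `make_manifest_line` is hypothetical, and the authoritative field list is whatever the manifest format page referenced in this comment specifies.

```python
import json


def make_manifest_line(audio_filepath, source_lang="en", target_lang="en",
                       pnc="yes", answer="na", duration=None):
    """Serialize one manifest entry as a single JSON line.

    Field names mirror the line quoted in the review; they are not
    checked against NeMo here.
    """
    entry = {
        "audio_filepath": audio_filepath,
        "duration": duration,  # None is serialized as JSON null
        "source_lang": source_lang,
        "target_lang": target_lang,
        "pnc": pnc,
        "answer": answer,
    }
    return json.dumps(entry)


line = make_manifest_line("/path/to/audio.wav")
print(line)
```

Writing one such line per audio file yields a JSON-lines manifest with the same shape as the quoted example.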

Comment thread docs/source/asr/models.rst Outdated
@@ -1,518 +1,101 @@
Models
Collaborator

Rename this whole page to Featured Models

Comment thread docs/source/asr/scores.rst Outdated
@@ -1,3 +1,5 @@
:orphan:
Collaborator

Is this used? If not, remove.

Ssofja and others added 3 commits April 18, 2026 12:22
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Signed-off-by: Ssofja <78349198+Ssofja@users.noreply.github.com>

copy-pr-bot Bot commented Apr 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Ssofja added 2 commits April 27, 2026 21:22
Signed-off-by: Ssofja <sofiakostandian@gmail.com>
Signed-off-by: Ssofja <sofiakostandian@gmail.com>
Collaborator

@pzelasko pzelasko left a comment

LGTM

Collaborator

pzelasko commented May 4, 2026

/ok to test 3937652

Collaborator

@tbartley94 tbartley94 left a comment

My comments are resolved

Collaborator

pzelasko commented May 4, 2026

bypassing CI - it's a documentation PR and the docs checks passed

@pzelasko pzelasko merged commit fb9a6c4 into main May 4, 2026
68 checks passed
@pzelasko pzelasko deleted the asr-collections-ref branch May 4, 2026 20:50

5 participants