Add ASR fine-tuning skill by pzelasko · Pull Request #15733 · NVIDIA-NeMo/NeMo

pzelasko · 2026-05-27T15:31:40Z

Summary

- Add a repo-local nemo-speech-asr-finetune skill for NeMo Speech ASR fine-tuning workflows
- Split detailed guidance into staged references for setup/checkpoints, data/Lhotse, architecture/tokenizer/metrics, and training/evaluation
- Include Lhotse-first dataloader guidance, the OOMptimizer workflow, AED/Canary multitask metrics, checkpoint averaging, and evaluation recommendations
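Checkpoint averaging, listed in the summary, usually means taking an element-wise mean of the same parameters across several checkpoints from one run. The sketch below is minimal and assumes each checkpoint is already loaded as a plain dict that maps parameter names to lists of floats. This stands in for PyTorch state dicts. `average_checkpoints` is a hypothetical helper, not the skill's or NeMo's API.

```python
from typing import Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[StateDict]) -> StateDict:
    """Element-wise mean of parameters across checkpoints with identical keys and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    n = len(checkpoints)
    averaged: StateDict = {}
    for name in keys:
        values = [ckpt[name] for ckpt in checkpoints]
        if any(len(v) != len(values[0]) for v in values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*values) walks the same element position across all checkpoints.
        averaged[name] = [sum(elems) / n for elems in zip(*values)]
    return averaged
```

With real state dicts, the same loop would run over tensors. Averaging is only meaningful for checkpoints that share an architecture, typically the last few saved in a single run.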

Validation

I tested this skill by asking Codex to fine-tune Parakeet v3 on a Polish HF dataset (bigos-v2) and evaluate the improvement on the test set. It autonomously created the experiment config, set up bucketing and OOMptimizer, and reduced the WER from 18.49% to 17.71%.
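Bucketing groups utterances of similar duration so that batches waste less padding. One common way to choose bucket boundaries is to split the sorted durations so each bucket holds roughly the same total amount of audio. The sketch below is minimal and assumes per-utterance durations in seconds have already been read from the manifest. `estimate_duration_bins` is a hypothetical helper and may not match how NeMo's own bin-estimation tooling works.

```python
from typing import List


def estimate_duration_bins(durations: List[float], num_buckets: int) -> List[float]:
    """Return up to num_buckets - 1 upper boundaries that split total duration roughly evenly."""
    if num_buckets < 2:
        return []
    ordered = sorted(durations)
    target = sum(ordered) / num_buckets  # audio seconds each bucket should hold
    bins: List[float] = []
    running = 0.0
    for d in ordered:
        running += d
        # Emit a boundary (at most one per utterance) when cumulative duration
        # crosses the next multiple of the per-bucket target.
        if running >= target * (len(bins) + 1) and len(bins) < num_buckets - 1:
            bins.append(d)
    return bins
```

For durations 1 to 10 s and two buckets, the boundary falls at 7 s. That gives 28 s of audio at or below it and 27 s above.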

copy-pr-bot · 2026-05-27T15:31:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nithinraok · 2026-05-28T15:26:57Z

+- CTC: `examples/asr/asr_ctc/speech_to_text_ctc_bpe.py`
+- RNNT: `examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py`
+- Hybrid RNNT/CTC or TDT/CTC: `examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py`
+- AED/Canary: `examples/asr/speech_multitask/speech_to_text_aed.py`


should we point them to speechlm2 scripts instead or add a note about it?

No, speechlm2 should have its own skill later. I want to start with ASR because it is very stable already.

nithinraok · 2026-05-28T15:31:34Z

+Before launching a long fine-tune, spend a few minutes on cheap failure checks:
+
+- Confirm the intended NeMo checkout is imported from inside the container.
+- Confirm each training/validation manifest exists, has non-empty `text`, valid `audio_filepath`, and usable


should we provide a manifest row example here showing what to expect?

It's described in data-lhotse.md already
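The preflight bullet quoted above asks for each manifest to exist and for its rows to have non-empty `text` and a valid `audio_filepath`. The sketch below is minimal and assumes standard ASR JSONL. It also assumes that audio paths are absolute or relative to the current working directory. The bullet is truncated at "usable", so the positive-`duration` check is an assumption. `check_manifest` is a hypothetical helper.

```python
import json
import os
from typing import List


def check_manifest(path: str) -> List[str]:
    """Return human-readable problems found in an ASR JSONL manifest (empty list means OK)."""
    if not os.path.isfile(path):
        return [f"manifest not found: {path}"]
    problems: List[str] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict):
                problems.append(f"line {lineno}: row is not a JSON object")
                continue
            text = row.get("text")
            if not isinstance(text, str) or not text.strip():
                problems.append(f"line {lineno}: empty or missing 'text'")
            audio = row.get("audio_filepath")
            if not isinstance(audio, str) or not os.path.isfile(audio):
                problems.append(f"line {lineno}: missing audio file {audio!r}")
            # Assumption: a usable row has a positive numeric duration.
            duration = row.get("duration")
            if not isinstance(duration, (int, float)) or duration <= 0:
                problems.append(f"line {lineno}: non-positive or missing 'duration'")
    return problems
```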

nithinraok · 2026-05-28T15:34:01Z

+Standard ASR JSONL:
+
+```json
+{"audio_filepath": "/data/audio/sample.wav", "text": "transcript text", "duration": 3.42}


answer key?

It's mentioned below as Canary's special manifest format
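The standard ASR JSONL row quoted above has three fields: `audio_filepath`, `text`, and `duration`. The sketch below shows one way to produce such rows. It assumes uncompressed PCM WAV input that Python's standard `wave` module can read; formats such as FLAC or MP3 would need a different reader. `manifest_row` is a hypothetical helper. It does not cover the additional fields in Canary's multitask manifest format.

```python
import json
import wave


def manifest_row(audio_filepath: str, text: str) -> str:
    """Build one standard ASR JSONL line, reading the duration from a PCM WAV header."""
    with wave.open(audio_filepath, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    row = {"audio_filepath": audio_filepath, "text": text, "duration": round(duration, 2)}
    # ensure_ascii=False keeps non-ASCII transcripts (e.g. Polish) human-readable.
    return json.dumps(row, ensure_ascii=False)
```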

nithinraok · 2026-05-28T15:35:38Z

+```python
+from nemo.collections.asr.models import ASRModel
+
+cfg = ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2", return_config=True)


nit: let's change to v3, since that's what we'd suggest in most cases.

nithinraok · 2026-05-28T15:36:14Z

+Use `examples/asr/speech_to_text_finetune.py` for compatible-architecture fine-tuning. For architecture-specific
+recipes:
+
+- CTC: `examples/asr/asr_ctc/speech_to_text_ctc_bpe.py`


might it also be better to include example configs here?
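The skill has two kinds of entry point. The generic script is for compatible-architecture fine-tuning, and separate recipes exist per architecture, as quoted earlier in the review. An agent or launcher could encode this choice as a lookup table. The sketch below is minimal: the script paths come from the quoted diffs, while the short architecture keys and the `pick_training_script` helper are hypothetical. It assumes `compatible_finetune=True` means the pretrained model's architecture is kept unchanged.

```python
from typing import Dict

GENERIC_FINETUNE = "examples/asr/speech_to_text_finetune.py"

# Paths from the quoted diff; the short keys are illustrative only.
TRAINING_SCRIPTS: Dict[str, str] = {
    "ctc": "examples/asr/asr_ctc/speech_to_text_ctc_bpe.py",
    "rnnt": "examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py",
    "hybrid": "examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py",
    "aed": "examples/asr/speech_multitask/speech_to_text_aed.py",
}


def pick_training_script(architecture: str, compatible_finetune: bool) -> str:
    """Return the example script to launch for a given architecture."""
    if compatible_finetune:
        return GENERIC_FINETUNE
    try:
        return TRAINING_SCRIPTS[architecture.lower()]
    except KeyError:
        known = ", ".join(sorted(TRAINING_SCRIPTS))
        raise ValueError(f"unknown architecture {architecture!r}; expected one of: {known}") from None
```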

nithinraok · 2026-05-28T15:37:34Z

+  use_cer=False
+```
+
+Use `examples/asr/transcribe_speech.py` for direct transcription and streaming or chunked inference scripts for


Suggested change

Use `examples/asr/transcribe_speech.py` for direct transcription and streaming or chunked inference scripts for

Use `examples/asr/transcribe_speech.py` for direct offline transcription and streaming or chunked inference scripts for
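The `use_cer=False` flag in the excerpt above selects word-level rather than character-level scoring. The sketch below makes the difference concrete by computing corpus-level WER/CER from Levenshtein distance, with the flag name mirrored. It is not NeMo's metric implementation. It also skips text normalization (casing, punctuation), which strongly affects reported numbers. Whether spaces count toward CER varies between tools; this sketch keeps them.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion of r
                curr[j - 1] + 1,  # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], use_cer: bool = False) -> float:
    """Corpus-level WER (or CER when use_cer=True): total edits / total reference tokens."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = list(ref) if use_cer else ref.split()
        hyp_tokens = list(hyp) if use_cer else hyp.split()
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return edits / total
```

The validation result reported in the PR summary, 18.49% to 17.71% WER, is an absolute reduction of 0.78 points, or about 4.2% relative.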

nithinraok · 2026-05-28T16:32:22Z

/claude review

claude

LGTM

nithinraok

Great work @pzelasko . LGTM!

Add ASR fine-tuning skill draft

4b949bf

pzelasko changed the title ~~Add ASR fine-tuning skill draft~~ Add ASR fine-tuning skill May 27, 2026

pzelasko requested review from artbataev, nithinraok, tango4j and tbartley94 May 27, 2026 15:33

pzelasko added 4 commits May 27, 2026 11:35

Refine ASR fine-tuning skill guidance

d71cda6

Improve ASR fine-tuning guardrails

91e16c3

Add ASR refinement and export guidance

25c98bf

Refine ASR fine-tuning iteration guidance

399c792

nithinraok reviewed May 28, 2026

Add transcript style preflight to ASR finetune skill

f74d0c4

claude Bot reviewed May 28, 2026

pzelasko added 2 commits May 28, 2026 10:48

Refine ASR finetune skill evaluation guidance

0477c5b

Address ASR finetune skill review feedback

f795ba4

nithinraok approved these changes May 28, 2026

pzelasko merged commit 5ccc6c8 into main May 28, 2026
44 checks passed

pzelasko deleted the codex/nemo-speech-asr-finetune-skill branch May 28, 2026 23:28
