ci(speech): split L0_Unit_Tests GPU/CPU ASR and CPU SpeechLM2 into parallel buckets by ko3n1g · Pull Request #15654 · NVIDIA-NeMo/NeMo

ko3n1g · 2026-04-29T20:16:10Z

Claude summary

Splits three monolithic L0 CI jobs into parallel buckets to reduce wall-clock time:

L0_Unit_Tests_GPU_ASR (was 40 min, 60 min timeout) → 5 parallel buckets (15 min timeout each):

Bucket 1 (~8 min): confidence/, decoding/test_batched_beam_decoding.py, test_batched_beam_hyps.py, test_batched_hyps_and_alignments.py, test_biasing_multi_model.py, test_ctc_decoding.py, test_cuda_graph_rnnt_greedy_decoding.py, test_multi_task_decoding.py
Bucket 2 (~8 min): decoding/test_multi_task_streaming_decoding.py, test_rnnt_alignments.py, test_rnnt_decoding.py, test_streaming_buffer.py, test_streaming_decoding.py, inference/, k2/, mixins/
Bucket 3 (~11 min): numba/, test_asr_classification_model.py through test_asr_multitalker_models.py
Bucket 4 (~11 min): test_asr_multitask_model_bpe.py alone
Bucket 5 (~6 min): test_asr_parts_submodules_batchnorm.py through utils/

L0_Unit_Tests_CPU_SpeechLM2 (was 17 min, 20 min timeout) → 4 parallel buckets (12 min timeout each):

Bucket 1 (shard 0/4): test_salm_asr_decoder.py (dominant training_step test)
Bucket 2 (shard 1/4): test_duplex_eartts.py, test_duplex_s2s.py, test_duplex_s2s_speech_decoder.py, test_duplex_stt.py
Bucket 3 (shard 2/4): remaining test files (first half)
Bucket 4 (shard 3/4): remaining test files (second half)
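
The "shard k/4" labels above imply that each job runs only its own slice of the collected tests. A minimal sketch of how such a deterministic split can work, assuming a hash-modulo assignment on test IDs; this illustrates the general idea and is not taken from the PR's scripts or from any plugin's source. The helper names and the test function names are hypothetical.

```python
import hashlib


def shard_of(node_id: str, num_shards: int) -> int:
    """Map a test ID to a shard index in [0, num_shards) via a stable hash."""
    digest = hashlib.sha256(node_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def select_shard(node_ids, shard_id: int, num_shards: int):
    """Keep only the tests that belong to the given shard."""
    return [n for n in node_ids if shard_of(n, num_shards) == shard_id]


# File names come from the PR description; the test function names are made up.
collected = [
    "test_salm_asr_decoder.py::test_training_step",
    "test_duplex_eartts.py::test_forward",
    "test_duplex_s2s.py::test_forward",
    "test_duplex_s2s_speech_decoder.py::test_forward",
    "test_duplex_stt.py::test_forward",
]

NUM_SHARDS = 4
shards = [select_shard(collected, i, NUM_SHARDS) for i in range(NUM_SHARDS)]
for i, tests in enumerate(shards):
    print(f"shard {i}/{NUM_SHARDS}: {tests}")
```

A hash-based split balances the number of tests per shard, not their duration, so one dominant test can still determine the wall-clock time of the shard it lands in.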

L0_Unit_Tests_CPU_ASR (was 30 min, 30 min timeout) → 4 parallel buckets (12 min timeout each):

Bucket 1 (~8 min): decoding/test_batched_beam_decoding.py (375 s fixture setup) + small decoding tests
Bucket 2 (~8 min): confidence/ + test_asr_multitask_model_bpe.py
Bucket 3 (~7 min): remaining decoding tests, mixins/, test_asr_local_attn.py
Bucket 4 (~7 min): all remaining files (numba/, inference/, k2/, standalone test files, utils/)

The monolithic job ran 40+ min total. Each bucket targets ≤10 min, distributed by observed wall-clock time from run 25112807335.

Bucket mapping (approx times):

1 (~6.6m): confidence/ + fast decoding tests
2 (~9.8m): streaming/rnnt decoding + inference/ + k2/ + mixins/
3 (~8.6m): numba/ + hybrid/interctc/local_attn models
4 (~10.6m): test_asr_multitask_model_bpe.py (single file, irreducible)
5 (~4.9m): rnnt encoder models + remaining small tests + utils/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
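
Distributing test groups by observed wall-clock time, as this commit message describes, is a small bin-packing problem. One common heuristic is sketched below: sort groups by duration and always place the next one in the currently lightest bucket. The PR does not say how the mapping was actually derived, so this is only an assumption about one reasonable approach. Apart from the ~10.6 min figure for test_asr_multitask_model_bpe.py, the durations are illustrative placeholders, not measurements.

```python
import heapq


def pack_into_buckets(durations, num_buckets):
    """Greedy longest-first packing: each group goes to the lightest bucket."""
    heap = [(0.0, i) for i in range(num_buckets)]  # (total minutes, bucket index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_buckets)]
    for name, minutes in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + minutes, idx))
    totals = [0.0] * num_buckets
    for total, idx in heap:
        totals[idx] = total
    return buckets, totals


# Hypothetical per-group durations in minutes (only the first is from the PR).
DURATIONS = {
    "test_asr_multitask_model_bpe.py": 10.6,
    "decoding/": 9.0,
    "numba/": 5.0,
    "confidence/": 4.0,
    "inference/": 3.5,
    "k2/": 3.0,
    "mixins/": 2.5,
    "utils/": 2.0,
}

buckets, totals = pack_into_buckets(DURATIONS, num_buckets=5)
for i, (names, total) in enumerate(zip(buckets, totals), start=1):
    print(f"bucket {i} (~{total:.1f}m): {names}")
```

With these inputs the longest file ends up alone in its bucket, which matches the commit's note that a single irreducible file sets the floor for the slowest bucket.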

… flaky batchnorm test

ASR_1/ASR_2 were failing because the split scripts were missing TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1, which is required by all other model-loading test scripts (Core, Common, TTS). The .nemo checkpoints on /home/TestData use a legacy PyTorch storage format incompatible with weights_only=True in PyTorch 2.6+.

ASR_5 was failing because test_from_batchnorm is order-dependent: the monolithic run consumed random state from ~1641 prior tests, whereas the isolated split starts fresh. Fix: add torch.manual_seed(0) for determinism and use atol=1e-5 to reflect the float32 rounding difference between fused (x*W+B) and standard ((x-mean)/std*w+b) formulations.

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
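
The two batchnorm formulations named in this commit message are algebraically equal but round differently in float32, which is why an absolute tolerance is needed. A standard-library sketch of the effect follows. It is not the actual test_from_batchnorm code: `random.Random(seed)` stands in for torch.manual_seed(0), `struct` is used to emulate float32 rounding, and the input ranges and the `eps` value are assumptions chosen for illustration.

```python
import math
import random
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def standard_bn(x, mean, var, w, b, eps=1e-5):
    """Standard form: (x - mean) / std * w + b, rounded after every step."""
    std = f32(math.sqrt(f32(var + eps)))
    return f32(f32(f32(f32(x - mean) / std) * w) + b)


def fused_bn(x, mean, var, w, b, eps=1e-5):
    """Fused form: x * W + B with W = w / std and B = b - mean * W."""
    std = f32(math.sqrt(f32(var + eps)))
    W = f32(w / std)
    B = f32(b - f32(mean * W))
    return f32(f32(x * W) + B)


def make_inputs(seed: int, n: int = 1000):
    """Seeded inputs, so results do not depend on what ran earlier."""
    rng = random.Random(seed)
    ranges = [(-1.0, 1.0), (-0.5, 0.5), (0.5, 1.5), (0.5, 1.5), (-0.5, 0.5)]
    return [tuple(f32(rng.uniform(lo, hi)) for lo, hi in ranges) for _ in range(n)]


inputs = make_inputs(seed=0)
max_diff = max(abs(standard_bn(*row) - fused_bn(*row)) for row in inputs)
print(f"max |standard - fused| = {max_diff:.3e}")
print("within atol=1e-5:", max_diff < 1e-5)
```

Seeding inside the test removes the dependence on how much random state earlier tests consumed, which is the order dependence the commit message describes.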

…ripts

The original L0_Unit_Tests_GPU_ASR.sh launched with:

    python -c "from nemo.collections.asr.models import ASRModel" && ...

which performs the required module initialization before running the test suite. All five splits were missing this prefix, causing failures in model-loading tests (kenlm, RNNT decoding) that depend on the import side-effects triggered by ASRModel initialization.

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>

Signed-off-by: oliver könig <okoenig@nvidia.com>

github-actions · 2026-04-30T11:45:27Z

[🤖]: Hi @ko3n1g 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>

github-actions · 2026-04-30T13:32:12Z

[🤖]: Hi @ko3n1g 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

pzelasko

Thanks @ko3n1g !

github-actions Bot added the CI label Apr 29, 2026

copy-pr-bot Bot temporarily deployed to test April 29, 2026 20:18 Inactive

github-actions Bot added the ASR label Apr 29, 2026

copy-pr-bot Bot temporarily deployed to test April 29, 2026 21:25 Inactive

ko3n1g and others added 2 commits April 29, 2026 22:06

fix(ci): drop TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD from ASR split scripts

1940e77

Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>

github-actions Bot removed the ASR label Apr 29, 2026

copy-pr-bot Bot temporarily deployed to test April 29, 2026 22:09 Inactive

feat(ci): split L0_Unit_Tests_CPU_SpeechLM2 into 3 parallel buckets

d7817b1

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot had a problem deploying to test April 29, 2026 22:57 Error

feat(ci): split L0_Unit_Tests_CPU_ASR into 4 parallel buckets

72cdd25

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 29, 2026 23:01 Inactive

ko3n1g changed the title ~~ci(speech): split L0_Unit_Tests_GPU_ASR into 5 parallel buckets~~ ci(speech): split L0_Unit_Tests GPU/CPU ASR and CPU SpeechLM2 into parallel buckets Apr 29, 2026

ko3n1g requested a review from pzelasko April 29, 2026 23:29

ko3n1g enabled auto-merge (squash) April 29, 2026 23:30

ko3n1g added 2 commits April 29, 2026 23:36

refactor(ci): replace explicit file lists with pytest-shard

e627520

Signed-off-by: oliver könig <okoenig@nvidia.com>

build(deps): add pytest-shard to test dependencies

4ab7439

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 29, 2026 23:39 Inactive

feat(ci): add one more shard to GPU ASR (6) and CPU ASR (5)

e52a4c0

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 30, 2026 08:51 Inactive

feat(ci): split GPU Audio into 2 shards, expand CPU ASR to 6 shards

8739c5e

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 30, 2026 09:43 Inactive

chore: merge origin/main

b61c6ee

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 30, 2026 10:31 Inactive

ci(speechlm2): split L0_Unit_Tests_CPU_SpeechLM2_2 into four shards

252dd6b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 30, 2026 12:08 Inactive

pzelasko approved these changes Apr 30, 2026

ko3n1g merged commit 01485cd into main Apr 30, 2026
183 of 185 checks passed

ko3n1g deleted the ko3n1g/feat/split-asr-l0-unit-tests branch April 30, 2026 13:59

ko3n1g merged 12 commits into main from ko3n1g/feat/split-asr-l0-unit-tests
