ci(speech): split L0_Unit_Tests GPU/CPU ASR and CPU SpeechLM2 into parallel buckets #15654

Merged
ko3n1g merged 12 commits into main from ko3n1g/feat/split-asr-l0-unit-tests on Apr 30, 2026

ko3n1g commented Apr 29, 2026

Claude summary

Splits three monolithic L0 CI jobs into parallel buckets to reduce wall-clock time:

L0_Unit_Tests_GPU_ASR (was 40 min, timeout 60) → 5 parallel buckets (timeout 15 each):

  • Bucket 1 (~8 min): confidence/, decoding/test_batched_beam_decoding.py, test_batched_beam_hyps.py, test_batched_hyps_and_alignments.py, test_biasing_multi_model.py, test_ctc_decoding.py, test_cuda_graph_rnnt_greedy_decoding.py, test_multi_task_decoding.py
  • Bucket 2 (~8 min): decoding/test_multi_task_streaming_decoding.py, test_rnnt_alignments.py, test_rnnt_decoding.py, test_streaming_buffer.py, test_streaming_decoding.py, inference/, k2/, mixins/
  • Bucket 3 (~11 min): numba/, test_asr_classification_model.py through test_asr_multitalker_models.py
  • Bucket 4 (~11 min): test_asr_multitask_model_bpe.py alone
  • Bucket 5 (~6 min): test_asr_parts_submodules_batchnorm.py through utils/

L0_Unit_Tests_CPU_SpeechLM2 (was 17 min, timeout 20) → 4 parallel buckets (timeout 12 each):

  • Bucket 1 (shard 0/4): test_salm_asr_decoder.py (dominant training_step test)
  • Bucket 2 (shard 1/4): test_duplex_eartts.py, test_duplex_s2s.py, test_duplex_s2s_speech_decoder.py, test_duplex_stt.py
  • Bucket 3 (shard 2/4): remaining test files (first half)
  • Bucket 4 (shard 3/4): remaining test files (second half)
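The SpeechLM2 layout above pins the heavy files to shards 0 and 1 and splits the rest in two. Below is a minimal Python sketch of that assignment. It assumes the "remaining" files are sorted by name and cut into contiguous halves, which the PR does not state; `build_shards`, `PINNED`, and the `test_a.py`-style names are hypothetical.

```python
from typing import Dict, List

# Hypothetical pinned layout mirroring buckets 1 and 2 above; shards are 0-indexed.
PINNED: Dict[int, List[str]] = {
    0: ["test_salm_asr_decoder.py"],
    1: [
        "test_duplex_eartts.py",
        "test_duplex_s2s.py",
        "test_duplex_s2s_speech_decoder.py",
        "test_duplex_stt.py",
    ],
}


def build_shards(all_files: List[str], pinned: Dict[int, List[str]], n: int = 4) -> Dict[int, List[str]]:
    """Keep pinned files on their shards; split the sorted remainder contiguously over the free shards."""
    shards = {i: list(pinned.get(i, [])) for i in range(n)}
    pinned_set = {f for files in pinned.values() for f in files}
    remaining = sorted(f for f in all_files if f not in pinned_set)
    free = [i for i in range(n) if i not in pinned]
    for j, shard in enumerate(free):
        # Integer boundaries, so chunk sizes differ by at most one file.
        start = j * len(remaining) // len(free)
        end = (j + 1) * len(remaining) // len(free)
        shards[shard] = remaining[start:end]
    return shards


# Usage with made-up names for the "remaining" files:
files = PINNED[0] + PINNED[1] + ["test_a.py", "test_b.py", "test_c.py"]
for shard, members in build_shards(files, PINNED).items():
    print(f"shard {shard}/4: {members}")
```

When the number of remaining files is odd, this version gives the extra file to the second half. If the first half should take it instead, change the boundary rounding.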

L0_Unit_Tests_CPU_ASR (was 30 min, timeout 30) → 4 parallel buckets (timeout 12 each):

  • Bucket 1 (~8 min): decoding/test_batched_beam_decoding.py (375 s fixture setup) + small decoding tests
  • Bucket 2 (~8 min): confidence/ + test_asr_multitask_model_bpe.py
  • Bucket 3 (~7 min): remaining decoding tests, mixins/, test_asr_local_attn.py
  • Bucket 4 (~7 min): all remaining files (numba/, inference/, k2/, standalone test files, utils/)
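Bucket 4 is defined as "all remaining files". As long as that catch-all is computed from the full list, the split cannot silently drop a test. Below is a short Python sketch of that idea, together with a coverage check that flags files no bucket runs and files that more than one bucket runs. The bucket names and the abbreviated file list are hypothetical and do not come from the PR's actual scripts.

```python
from collections import Counter
from typing import Dict, List, Tuple


def with_catch_all(all_files: List[str], explicit: Dict[str, List[str]], name: str) -> Dict[str, List[str]]:
    """Add a catch-all bucket holding every file not explicitly assigned elsewhere."""
    assigned = {f for files in explicit.values() for f in files}
    return {**explicit, name: [f for f in all_files if f not in assigned]}


def check_partition(all_files: List[str], buckets: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (missing, duplicated): files no bucket runs, and files more than one bucket runs."""
    counts = Counter(f for files in buckets.values() for f in files)
    missing = sorted(f for f in all_files if counts[f] == 0)
    duplicated = sorted(f for f, c in counts.items() if c > 1)
    return missing, duplicated


# Hypothetical, abbreviated view of the CPU ASR test tree (the real one has more entries).
ALL = ["confidence/", "decoding/test_batched_beam_decoding.py", "decoding/test_rnnt_decoding.py",
       "mixins/", "numba/", "test_asr_local_attn.py", "test_asr_multitask_model_bpe.py", "utils/"]
EXPLICIT = {
    "cpu_asr_1": ["decoding/test_batched_beam_decoding.py"],
    "cpu_asr_2": ["confidence/", "test_asr_multitask_model_bpe.py"],
    "cpu_asr_3": ["decoding/test_rnnt_decoding.py", "mixins/", "test_asr_local_attn.py"],
}
buckets = with_catch_all(ALL, EXPLICIT, "cpu_asr_4")
print(buckets["cpu_asr_4"])
print(check_partition(ALL, buckets))
```

This kind of check is most useful when the explicit lists are edited by hand or the catch-all is expressed some other way, such as shell globs, where gaps and overlaps are easy to introduce.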

The monolithic job ran 40+ min total. Each bucket targets ≤10 min;
files were distributed across buckets by their observed wall-clock
time in run 25112807335.
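One way to turn observed per-file wall-clock times into buckets of contiguous paths (matching the "X through Y" ranges above) is a greedy cut. Walk the files in order and start a new bucket whenever the next file would push the current bucket past the target. This is a sketch of that technique, not necessarily how these buckets were produced, and the durations below are made up.

```python
from typing import List, Tuple


def contiguous_buckets(durations: List[Tuple[str, float]], target: float) -> List[List[str]]:
    """Greedily cut an ordered list of (path, minutes) into contiguous buckets of <= target minutes.

    A single file longer than `target` gets a bucket of its own, since it cannot be split further.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    total = 0.0
    for path, minutes in durations:
        # Close the current bucket if adding this file would overshoot the target.
        if current and total + minutes > target:
            buckets.append(current)
            current, total = [], 0.0
        current.append(path)
        total += minutes
    if current:
        buckets.append(current)
    return buckets


# Hypothetical per-path durations in minutes, already in alphabetical order.
observed = [("a/", 4.0), ("b.py", 5.0), ("c.py", 10.6), ("d/", 3.0), ("e.py", 2.0)]
print(contiguous_buckets(observed, target=10.0))
```

A file that alone exceeds the target ends up by itself, which mirrors the single-file test_asr_multitask_model_bpe.py bucket.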

Bucket mapping (approx times):
  1 (~6.6m): confidence/ + fast decoding tests
  2 (~9.8m): streaming/rnnt decoding + inference/ + k2/ + mixins/
  3 (~8.6m): numba/ + hybrid/interctc/local_attn models
  4 (~10.6m): test_asr_multitask_model_bpe.py (single file, irreducible)
  5 (~4.9m): rnnt encoder models + remaining small tests + utils/
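A few lines of arithmetic can check the approximate bucket times above. Run back to back, they add up to the monolithic job's 40+ minutes; run in parallel, the wall-clock time is set by the slowest bucket. This assumes enough runners to start all five buckets at once and ignores per-job setup overhead (container start, data mounts), so real savings will be somewhat smaller.

```python
# Approximate per-bucket wall-clock times (minutes) from the bucket mapping above.
bucket_minutes = {1: 6.6, 2: 9.8, 3: 8.6, 4: 10.6, 5: 4.9}
TIMEOUT_MIN = 15  # per-bucket timeout stated for the GPU ASR split
TARGET_MIN = 10   # per-bucket target stated in the commit message

serial_total = sum(bucket_minutes.values())   # time if the buckets ran back to back
parallel_wall = max(bucket_minutes.values())  # time if all buckets start together
speedup = serial_total / parallel_wall
over_target = [b for b, m in bucket_minutes.items() if m > TARGET_MIN]
headroom = {b: round(TIMEOUT_MIN - m, 1) for b, m in bucket_minutes.items()}

print(f"serial {serial_total:.1f} min, parallel {parallel_wall:.1f} min, ~{speedup:.1f}x")
print("over target:", over_target)
print("headroom before timeout:", headroom)
```

Bucket 4 is the only one over the 10-minute target, and every bucket stays under the 15-minute per-bucket timeout.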

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
… flaky batchnorm test

ASR_1/ASR_2 were failing because the split scripts were missing
TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1, which is required by all other
model-loading test scripts (Core, Common, TTS). The .nemo checkpoints
on /home/TestData use a legacy PyTorch storage format incompatible with
weights_only=True in PyTorch 2.6+.
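In the shell scripts, the fix amounts to prefixing the pytest invocation with `TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1`. For a Python-driven runner, a sketch of building the equivalent environment follows; `bucket_env` is a hypothetical helper. In PyTorch 2.6+, the variable only changes `torch.load` calls that don't pass `weights_only` explicitly.

```python
import os
from typing import Dict, Mapping, Optional


def bucket_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with legacy-checkpoint loading enabled."""
    env = dict(os.environ if base is None else base)
    # PyTorch 2.6+ defaults torch.load to weights_only=True; setting this to "1"
    # makes torch.load use weights_only=False when the caller didn't pass the argument.
    env["TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD"] = "1"
    return env


print(bucket_env({"PATH": "/usr/bin"}))
```

Copying the mapping instead of mutating `os.environ` keeps the override scoped to the child process that runs the bucket.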

ASR_5 was failing because test_from_batchnorm is order-dependent: the
monolithic run consumed random state from ~1641 prior tests, whereas
the isolated split starts from a fresh state. Fix: add torch.manual_seed(0)
for determinism and use atol=1e-5 to tolerate the float32 rounding
difference between the fused (x*W+B) and standard ((x-mean)/std*w+b)
formulations.
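The two formulations agree mathematically (W = w/std and B = b - mean*W) but round differently in float32. The pure-Python sketch below illustrates that effect; it is not the NeMo test. It emulates float32 by rounding after every operation via `struct`, which is an assumption: real kernels may fuse or reorder operations, so exact error sizes will differ. Inputs come from arbitrary ranges, and a privately seeded `random.Random` stands in for `torch.manual_seed(0)`, so the result doesn't depend on what ran earlier.

```python
import math
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def bn_std(var: float, eps: float = 1e-5) -> float:
    return f32(math.sqrt(f32(var + eps)))


def standard_bn(x: float, mean: float, var: float, w: float, b: float) -> float:
    """(x - mean) / std * w + b, rounding to float32 after each operation."""
    std = bn_std(var)
    return f32(f32(f32(f32(x - mean) / std) * w) + b)


def fused_bn(x: float, mean: float, var: float, w: float, b: float) -> float:
    """x * W + B with W = w / std and B = b - mean * W folded ahead of time."""
    std = bn_std(var)
    W = f32(w / std)
    B = f32(b - f32(mean * W))
    return f32(f32(x * W) + B)


def max_discrepancy(seed: int = 0, n: int = 10_000) -> float:
    rng = random.Random(seed)  # private seeded RNG: unaffected by earlier consumers of global state
    worst = 0.0
    for _ in range(n):
        x, mean, w, b = (f32(rng.uniform(-1.0, 1.0)) for _ in range(4))
        var = f32(rng.uniform(0.5, 2.0))
        worst = max(worst, abs(standard_bn(x, mean, var, w, b) - fused_bn(x, mean, var, w, b)))
    return worst


print(f"max |standard - fused| = {max_discrepancy():.2e}")
```

With these input ranges the worst-case difference is nonzero but stays below 1e-5. An exact comparison would therefore be fragile, while atol=1e-5 holds.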

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g and others added 2 commits April 29, 2026 22:06
…ripts

The original L0_Unit_Tests_GPU_ASR.sh launched with:
  python -c "from nemo.collections.asr.models import ASRModel" && ...
which performs the required module initialization before running the test
suite. All five splits were missing this prefix, causing failures in
model-loading tests (kenlm, RNNT decoding) that depend on the side effects
of importing ASRModel.
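The `&&` in that prefix means pytest only starts if the import succeeds. The Python sketch below reproduces the same short-circuit chaining, but only the ordering, not whatever the import initializes. `run_chain` is a hypothetical helper, and the runner is injectable so the logic can be exercised without NeMo installed.

```python
import subprocess
import sys
from typing import Callable, Mapping, Sequence

# The import check from the original script, run with the current interpreter.
PREFIX = [sys.executable, "-c", "from nemo.collections.asr.models import ASRModel"]


def run_chain(
    commands: Sequence[Sequence[str]],
    env: Mapping[str, str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run commands in order and stop at the first non-zero exit code, like `a && b` in a shell."""
    for cmd in commands:
        result = runner(list(cmd), env=dict(env))
        if result.returncode != 0:
            return result.returncode  # later commands (e.g. pytest) never start
    return 0
```

A real call might look like `run_chain([PREFIX, [sys.executable, "-m", "pytest", "tests/collections/asr/numba/"]], os.environ)`, where the test path is a hypothetical example.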

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g changed the title from ci(speech): split L0_Unit_Tests_GPU_ASR into 5 parallel buckets to ci(speech): split L0_Unit_Tests GPU/CPU ASR and CPU SpeechLM2 into parallel buckets Apr 29, 2026
ko3n1g requested a review from pzelasko April 29, 2026 23:29
ko3n1g enabled auto-merge (squash) April 29, 2026 23:30
ko3n1g added 2 commits April 29, 2026 23:36
Signed-off-by: oliver könig <okoenig@nvidia.com>
@github-actions

[🤖]: Hi @ko3n1g 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@github-actions

[🤖]: Hi @ko3n1g 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

pzelasko left a comment

Thanks @ko3n1g!

ko3n1g merged commit 01485cd into main Apr 30, 2026
183 of 185 checks passed
ko3n1g deleted the ko3n1g/feat/split-asr-l0-unit-tests branch April 30, 2026 13:59