Fix Sortformer tutorial issues and add InferenceSortformerStage benchmark #1764
sarahyurick merged 5 commits into main
Conversation
…mark Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Greptile Summary

This PR fixes two runtime bugs in the Sortformer tutorial pipeline (sox → ffmpeg for mono conversion, and a model-path resolution fix) and adds an InferenceSortformerStage benchmark.

Confidence Score: 4/5. Safe to merge after addressing the review findings. All new findings in this review are P2; however, a P1 from the previous review round remains in tutorials/audio/callhome_diar/run.py (Ray cluster cleanup).

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CallHomeReaderStage\n(num_workers_per_node=1)\nDiscovers WAV+CHA pairs"] --> B["EnsureMonoStage\n(ffmpeg -ac 1 -ar 16000)\nDownmix stereo → mono"]
    B --> C["InferenceSortformerStage\nsetup_on_node: snapshot_download\n_resolve_model_path → .nemo\n_configure_streaming + _extend_pos_enc"]
    C --> D["DERComputationStage\nCHA ground-truth scoring\ncollar tolerance"]
    D --> E["Results JSON + RTTM files"]
    subgraph "sortformer.py setup flow"
        F["setup_on_node()\nsnapshot_download only"] --> G["_resolve_model_path()\nmodel_path override\nOR sorted .nemo from cache"]
        G --> H["SortformerEncLabelModel\n.restore_from()"]
        H --> I["_configure_streaming()\nchunk_len / chunk_left_context\nchunk_right_context / fifo_len"]
        I --> J["_extend_pos_enc_for_long_audio()\nextend_pe(max_len=30000)"]
    end
```
Reviews (6): Last reviewed commit: "Merge branch 'main' into sortformer-fixe..."
sarahyurick
left a comment
LGTM. We can merge after approval from @mohammadaaftabv and Satish.
@mohammadaaftabv Could you please take another look? I've made the requested changes.
```python
repo_dir = getattr(self, "_cached_repo_dir", None) or snapshot_download(
    repo_id=self.model_name, cache_dir=self.cache_dir
)
```
This shouldn't need `snapshot_download`, right? Since `setup_on_node` is always called.
Suggested change:

```python
repo_dir = getattr(self, "_cached_repo_dir", None)
```
Workers don't see `_cached_repo_dir` due to serialization, so the second `snapshot_download()` call is needed; it just reads from the cache, with no re-download.
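The driver/worker split can be illustrated without Ray or huggingface_hub. The class below is a hypothetical stand-in for the stage (not the real `InferenceSortformerStage`), with `__getstate__` simulating the serialization boundary that drops driver-side state:

```python
import pickle

class StageSketch:
    """Hypothetical stand-in for the stage; illustrates the getattr fallback."""

    def __init__(self):
        self._cached_repo_dir = None

    def setup_on_node(self):
        # Driver-side: download once and remember the local path.
        self._cached_repo_dir = "/cache/models/sortformer"

    def __getstate__(self):
        # Simulate the serialization boundary: workers are rebuilt
        # without the attribute the driver populated.
        return {"_cached_repo_dir": None}

    def setup(self):
        # Worker-side: fall back to re-resolution, standing in for the
        # second snapshot_download() call that hits the local cache.
        return getattr(self, "_cached_repo_dir", None) or "resolved-from-cache"

driver = StageSketch()
driver.setup_on_node()
worker = pickle.loads(pickle.dumps(driver))
print(driver.setup())  # "/cache/models/sortformer"
print(worker.setup())  # "resolved-from-cache"
```

On the driver the cached path short-circuits the `or`; on a deserialized worker the attribute is gone, so the fallback runs, which is why dropping the second call would break workers.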
```python
spkcache_update_period: int = 300
spkcache_len: int = 188
inference_batch_size: int = 1
batch_duration: int = 100000
```
batch_duration declared but never passed to diarize()
batch_duration is documented as "Maximum total audio duration (seconds) per lhotse batch" but is never forwarded to the underlying diarize() call (lines 210-213). Any user who sets this field expecting it to control lhotse batching will have it silently ignored; the default 100,000 s limit is effectively never enforced.
If SortformerEncLabelModel.diarize() accepts a batch_duration kwarg, it needs to be threaded through:

```python
predicted_segments = self.diar_model.diarize(
    audio=audio_paths,
    batch_size=self.inference_batch_size,
    batch_duration=self.batch_duration,
)
```

If the NeMo API does not expose this parameter, the field and its docstring should be removed to avoid confusion.
+1 it doesn't look like batch_duration is being used anywhere atm.
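If the field is kept, one version-tolerant option is to forward only the kwargs the target signature actually accepts. This is a sketch: `call_with_supported_kwargs` and the stand-in `diarize` are hypothetical, not NeMo API:

```python
import inspect

def call_with_supported_kwargs(fn, /, **kwargs):
    """Forward only the kwargs that fn accepts, so optional parameters like
    batch_duration can be threaded through across library versions."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)  # fn takes **kwargs: pass everything
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return fn(**accepted)

def diarize(audio, batch_size=1):  # stand-in for the real API, no batch_duration
    return {"audio": audio, "batch_size": batch_size}

print(call_with_supported_kwargs(
    diarize, audio=["a.wav"], batch_size=4, batch_duration=100_000
))  # batch_duration is dropped: {'audio': ['a.wav'], 'batch_size': 4}
```

The trade-off is that a typo'd kwarg is also silently dropped, so a hard requirement (raise if `batch_duration` is set but unsupported) may be the better design here.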
Force-pushed eafdd10 to 5c3be57
```python
def run_audio_sortformer_benchmark(
    benchmark_results_path: str,
    manifest_path: str,
    model_name: str,
    rttm_out_dir: str | None = None,
    executor: str = "xenna",
    **kwargs,  # noqa: ARG001
) -> dict[str, Any]:
    """Run the audio Sortformer diarization benchmark and collect metrics."""
    benchmark_results_path = Path(benchmark_results_path)
```
This is minor, but it looks like `benchmark_results_path` isn't used by this function, only by `main`, so it can be removed?
```python
    return
device = next(self.diar_model.parameters()).device
try:
    pos_enc.extend_pe(max_len, device, torch.float32)
```
Following up on some Greptile comments, should it always be `float32`?
```python
sm.spkcache_update_period = self.spkcache_update_period
sm.chunk_left_context = self.chunk_left_context
if hasattr(sm, "spkcache_update_period"):
    sm.spkcache_update_period = self.spkcache_update_period
sm.spkcache_len = self.spkcache_len
```
Following up on some Greptile comments, why is there a `hasattr` guard for `spkcache_update_period`? Also, should `spkcache_len` have a guard too?
Older NeMo versions don't have spkcache_update_period on SortformerModules, without the guard it crashes.
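The guard pattern can be sketched with stand-in classes; `OldModules`, `NewModules`, and `configure` below are hypothetical, not NeMo's actual `SortformerModules`:

```python
class OldModules:
    """Stand-in for an older NeMo modules class without the new field."""
    chunk_left_context = 0

class NewModules(OldModules):
    """Stand-in for a newer version that does expose the field."""
    spkcache_update_period = 0

def configure(sm, spkcache_update_period=300, chunk_left_context=1):
    # Guard attributes that only exist on newer versions; setting an
    # unguarded attribute would work in Python but silently create a
    # field the model never reads, so we only set it when it exists.
    if hasattr(sm, "spkcache_update_period"):
        sm.spkcache_update_period = spkcache_update_period
    sm.chunk_left_context = chunk_left_context
    return sm

old = configure(OldModules())
new = configure(NewModules())
print(getattr(old, "spkcache_update_period", None))  # None: guard skipped it
print(new.spkcache_update_period)  # 300
```

By the same reasoning, if `spkcache_len` also only exists on newer versions, it would need the same guard; if it predates the change, the guard is unnecessary.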
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
/ok to test c5564bf
It looks like all requests have been implemented, thanks!
Fix Sortformer tutorial issues and add InferenceSortformerStage benchmark (#1764) (#1866)

* Fix Sortformer tutorial issues and add InferenceSortformerStage benchmark
* Address PR review feedback and update model to v2.1
* Address second round of review feedback
* fixing some more issues

Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Merge origin/main into dev to pick up upstream changes (492 files, +57k/-6k):

- 26.04 staging release
- Generic ASR/TTS audio processing pipeline (#1679)
- Dynamo disaggregated serving + validators (#1813, #1820, #1833, #1834, #1861)
- ReadSpeech audio curation benchmark + tutorials (#1841, #1851, #1870)
- VideoReader path validation, audio waveform leak fixes (#1845, #1765)
- Sortformer tutorial fixes + benchmarks (#1764)
- Generic audio pipeline + qwen3 support (#1827)
- Fern docs (audio + curate-audio sections)

Conflict resolution:

- nemo_curator/stages/audio/__init__.py: kept dev's lazy __getattr__ registry, added main's new ManifestReader and ManifestWriterStage to both __all__ and _LAZY_IMPORTS (now lazy-loaded from nemo_curator.stages.audio.common).
- uv.lock: took main's version (latest dependency resolutions).

Removals propagated from main (pre-merge-base files we no longer need):

- nemo_curator/stages/audio/alm/alm_manifest_writer.py (replaced by ShardedManifestWriterStage)
- nemo_curator/stages/audio/alm/alm_manifest_reader.py
- nemo_curator/backends/experimental/* (refactored away)
- nemo_curator/core/serve.py (replaced by typed serve config)

Verified intact:

- SCOTCH pipeline: speaker_id/, hifi_pipeline/slurm_e2e/ (dev-only additions, untouched).
- Cherry-picked audio PRs (#1853, #3, #1, #1839, integration-test) all present.

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Summary

- Switch `EnsureMonoStage` to use `ffmpeg` instead of `sox` (not installed in nightly container)
- Fix `InferenceSortformerStage.setup()` model-path resolution bug (`_resolve_model_path`)
- Limit `CallHomeReaderStage` workers to 1 via `xenna_stage_spec` (prevents crash on 64-CPU hosts)
- Add `audio_sortformer_benchmark.py` benchmarking script

Closes #1728
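The ffmpeg downmix in the summary boils down to a single command (`-ac 1 -ar 16000` per the flowchart). The helper below is an illustrative sketch, not the tutorial's actual stage code; `ffmpeg_mono_cmd` and `ensure_mono` are hypothetical names:

```python
import subprocess

def ffmpeg_mono_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build the ffmpeg invocation that downmixes to mono (-ac 1)
    and resamples (-ar), mirroring the stage's behavior."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]

def ensure_mono(src: str, dst: str) -> None:
    # check=True raises CalledProcessError if ffmpeg fails
    # (e.g. unreadable input or missing codec).
    subprocess.run(ffmpeg_mono_cmd(src, dst), check=True, capture_output=True)

print(ffmpeg_mono_cmd("call.wav", "call_mono.wav"))
```

Splitting command construction from execution keeps the argument list testable without ffmpeg installed, which matters given that the sox dependency problem was exactly a missing binary in the nightly container.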