
Fix Sortformer tutorial issues and add InferenceSortformerStage benchmark#1764

Merged
sarahyurick merged 5 commits into main from
sortformer-fixes-benchmarks
Apr 23, 2026

Conversation

@melllinia
Member

Summary

  • Fix EnsureMonoStage to use ffmpeg instead of sox (not installed in nightly container)
  • Fix InferenceSortformerStage.setup() model-path resolution bug (_resolve_model_path)
  • Cap CallHomeReaderStage workers to 1 via xenna_stage_spec (prevents crash on 64-CPU hosts)
  • Update README prerequisites and model limitations
  • Add audio_sortformer_benchmark.py benchmarking script

Closes #1728
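
A minimal sketch of the ffmpeg-based downmix from the first bullet, assuming the flags shown in the Greptile flowchart (`-ac 1 -ar 16000`). The function names `build_mono_cmd` and `ensure_mono` are hypothetical and are not taken from the PR; the actual EnsureMonoStage may build its command differently.

```python
import subprocess
from pathlib import Path


def build_mono_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that downmixes to mono and resamples to 16 kHz."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]


def ensure_mono(src: str, dst: str) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_mono_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Keeping the command builder separate from the `subprocess.run` call makes the flags easy to check on hosts where ffmpeg is not installed.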

…mark

Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
@melllinia melllinia requested a review from a team as a code owner April 8, 2026 14:08
@melllinia melllinia requested review from ayushdg and removed request for a team April 8, 2026 14:08
@copy-pr-bot

copy-pr-bot Bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR fixes two runtime bugs in the Sortformer tutorial pipeline (sox→ffmpeg for EnsureMonoStage, model-path resolution in setup()), caps CallHomeReaderStage workers to 1 to prevent crashes on high-CPU hosts, bumps the default model to diar_streaming_sortformer_4spk-v2.1, and adds a new nightly benchmark script with RTF/throughput/segment metrics.

Confidence Score: 4/5

Safe to merge after addressing the chunk_left_context CLI gap; an unresolved P1 from a prior review (Ray cluster not stopped on exception) persists in run.py.

All new findings in this review are P2. However, a P1 from the previous review round (ray_client.stop() unguarded by try/finally in run.py) remains unaddressed, which justifies holding at 4 rather than 5.

tutorials/audio/callhome_diar/run.py — Ray cluster cleanup and the new chunk_left_context parameter exposure.
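
The cleanup concern above can be illustrated with a short sketch. `StubRayClient` and `run_with_cleanup` are hypothetical names introduced here for illustration; only the idea of guarding `stop()` with try/finally comes from the review.

```python
from typing import Callable


class StubRayClient:
    """Hypothetical stand-in for the tutorial's Ray client."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


def run_with_cleanup(client: StubRayClient, pipeline: Callable[[], None]) -> None:
    """Start the client, run the pipeline, and always stop the client."""
    client.start()
    try:
        pipeline()
    finally:
        # Runs on success and on exception, so the cluster is not left up.
        client.stop()
```

The exception still propagates to the caller; the `finally` block only guarantees that `stop()` runs first.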

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_curator/stages/audio/inference/sortformer.py | Model path resolution refactored into _resolve_model_path(), fixing the setup() bug; chunk_left_context added; spkcache_update_period guarded with hasattr; positional encoding extension for long audio added. Mostly clean changes; residual P1 (batch_duration unused) from prior review now gone. |
| tutorials/audio/callhome_diar/run.py | sox→ffmpeg swap, xenna_stage_spec workers cap, model version bump applied. chunk_left_context added to stage but not exposed as CLI arg. ray_client.stop() still not guarded by try/finally (pre-existing P1). |
| benchmarking/scripts/audio_sortformer_benchmark.py | New benchmark script; is_success correctly gated on num_files > 0, addressing prior review feedback. RTF, throughput, and segment count metrics collected. Exception path writes results even on failure. |
| benchmarking/nightly-benchmark.yaml | New audio_sortformer_xenna entry added (disabled) with correct metrics, timeout, GPU/CPU allocation, and success requirements. |
| tutorials/audio/callhome_diar/README.md | Updated prerequisites (sox→ffmpeg), model version (v2→v2.1), and added mono/16kHz limitation note. Consistent with code changes. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CallHomeReaderStage\n(num_workers_per_node=1)\nDiscovers WAV+CHA pairs"] --> B["EnsureMonoStage\n(ffmpeg -ac 1 -ar 16000)\nDownmix stereo → mono"]
    B --> C["InferenceSortformerStage\nsetup_on_node: snapshot_download\n_resolve_model_path → .nemo\n_configure_streaming + _extend_pos_enc"]
    C --> D["DERComputationStage\nCHA ground-truth scoring\ncollar tolerance"]
    D --> E["Results JSON + RTTM files"]

    subgraph "sortformer.py setup flow"
        F["setup_on_node()\nsnapshot_download only"] --> G["_resolve_model_path()\nmodel_path override\n OR sorted .nemo from cache"]
        G --> H["SortformerEncLabelModel\n.restore_from()"]
        H --> I["_configure_streaming()\nchunk_len / chunk_left_context\nchunk_right_context / fifo_len"]
        I --> J["_extend_pos_enc_for_long_audio()\nextend_pe(max_len=30000)"]
    end

Reviews (6): Last reviewed commit: "Merge branch 'main' into sortformer-fixe..."

Comment thread benchmarking/scripts/audio_sortformer_benchmark.py
Comment thread nemo_curator/stages/audio/inference/sortformer.py Outdated
Contributor

@sarahyurick sarahyurick left a comment


LGTM. We can merge after approval from @mohammadaaftabv and Satish.

@sarahyurick sarahyurick added the r1.2.0 Pick this label for auto cherry-picking into r1.2.0 label Apr 13, 2026
Comment thread benchmarking/scripts/audio_sortformer_benchmark.py
Comment thread benchmarking/scripts/audio_sortformer_benchmark.py
Comment thread benchmarking/scripts/audio_sortformer_benchmark.py
Comment thread nemo_curator/stages/audio/inference/sortformer.py Outdated
Comment thread nemo_curator/stages/audio/inference/sortformer.py
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
@melllinia
Member Author

@mohammadaaftabv Could you please take another look? I’ve made the requested changes.

Comment on lines +145 to +147
repo_dir = getattr(self, "_cached_repo_dir", None) or snapshot_download(
    repo_id=self.model_name, cache_dir=self.cache_dir
)
Contributor

This shouldn't need snapshot_download, right? setup_on_node is always called.

Suggested change
- repo_dir = getattr(self, "_cached_repo_dir", None) or snapshot_download(
-     repo_id=self.model_name, cache_dir=self.cache_dir
- )
+ repo_dir = getattr(self, "_cached_repo_dir", None)

Member Author

Workers don't see _cached_repo_dir because of serialization, so the second snapshot_download() call is needed; it only reads from the cache and does not re-download.
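
One plausible reading of this reply, sketched with stand-ins: if the stage object is copied to workers before `setup_on_node()` mutates the driver-side copy, the worker copy never receives the attribute. Here `copy.deepcopy` stands in for the serialization round trip, and `fetch_snapshot` is a hypothetical substitute for `snapshot_download()`; the real executor's behavior is an assumption.

```python
import copy

DOWNLOADS: list[str] = []  # records each lookup so the flow can be inspected


def fetch_snapshot(repo_id: str) -> str:
    """Hypothetical stand-in for snapshot_download(); returns a cache path."""
    DOWNLOADS.append(repo_id)
    return f"/cache/{repo_id}"


class Stage:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def setup_on_node(self) -> None:
        # Sets the attribute on this copy only.
        self._cached_repo_dir = fetch_snapshot(self.model_name)

    def setup(self) -> str:
        # Fall back to the snapshot lookup when the attribute is missing.
        return getattr(self, "_cached_repo_dir", None) or fetch_snapshot(self.model_name)


stage = Stage("org/model")
worker_stage = copy.deepcopy(stage)  # copy taken before setup_on_node runs
stage.setup_on_node()                # mutation is invisible to worker_stage
repo_dir = worker_stage.setup()      # resolves via the fallback lookup
```

With the real `snapshot_download()`, the fallback call would hit the local cache populated by `setup_on_node()`, which matches the "no re-download" point in the reply.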

Comment thread nemo_curator/stages/audio/inference/sortformer.py Outdated
Comment thread benchmarking/scripts/audio_sortformer_benchmark.py Outdated
@melllinia melllinia requested a review from sarahyurick April 23, 2026 11:00
spkcache_update_period: int = 300
spkcache_len: int = 188
inference_batch_size: int = 1
batch_duration: int = 100000
Contributor

P1 batch_duration declared but never passed to diarize()

batch_duration is documented as "Maximum total audio duration (seconds) per lhotse batch" but is never forwarded to the underlying diarize() call (lines 210-213). Any value a user sets for this field, expecting it to control lhotse batching, is silently ignored; the default 100 000 s limit is effectively never enforced.

If SortformerEncLabelModel.diarize() accepts a batch_duration kwarg, it needs to be threaded through:

predicted_segments = self.diar_model.diarize(
    audio=audio_paths,
    batch_size=self.inference_batch_size,
    batch_duration=self.batch_duration,
)

If the NeMo API does not expose this parameter, the field and its docstring should be removed to avoid confusion.
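
Whether a given callable accepts a keyword such as `batch_duration` can be checked at runtime with `inspect.signature`. The sketch below is generic Python and makes no claim about what `SortformerEncLabelModel.diarize()` actually accepts; `accepts_kwarg` and `call_with_supported` are hypothetical helper names.

```python
import inspect
from typing import Any, Callable


def accepts_kwarg(func: Callable[..., Any], name: str) -> bool:
    """Return True if func can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all accepts the name, though it may still ignore it.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_with_supported(func: Callable[..., Any], **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments it does not accept."""
    supported = {k: v for k, v in kwargs.items() if accepts_kwarg(func, k)}
    return func(**supported)
```

Dropping an unsupported argument silently would reproduce the problem described above, so a real stage would more likely log a warning or remove the field, as the comment suggests.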

Contributor

+1 it doesn't look like batch_duration is being used anywhere atm.

Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
@melllinia melllinia force-pushed the sortformer-fixes-benchmarks branch from eafdd10 to 5c3be57 Compare April 23, 2026 17:00
Comment on lines +62 to +71
def run_audio_sortformer_benchmark(
    benchmark_results_path: str,
    manifest_path: str,
    model_name: str,
    rttm_out_dir: str | None = None,
    executor: str = "xenna",
    **kwargs,  # noqa: ARG001
) -> dict[str, Any]:
    """Run the audio Sortformer diarization benchmark and collect metrics."""
    benchmark_results_path = Path(benchmark_results_path)
Contributor

This is minor but it looks like benchmark_results_path isn't used by this function, only main, so it can be removed?

    return
device = next(self.diar_model.parameters()).device
try:
    pos_enc.extend_pe(max_len, device, torch.float32)
Contributor

Following up on some greptile comments, should it always be float32?

Comment on lines -164 to 195
- sm.spkcache_update_period = self.spkcache_update_period
+ sm.chunk_left_context = self.chunk_left_context
+ if hasattr(sm, "spkcache_update_period"):
+     sm.spkcache_update_period = self.spkcache_update_period
  sm.spkcache_len = self.spkcache_len
Contributor

Following up on some greptile comments, why is there a hasattr guard for spkcache_update_period? Also should spkcache_len have a guard too?

Member Author

Older NeMo versions don't have spkcache_update_period on SortformerModules; without the guard, it crashes.
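
The guard can be generalized so that every optional setting gets the same treatment, which also covers the question about `spkcache_len`. This is a sketch only: `apply_supported` is a hypothetical helper, and `SimpleNamespace` stands in for an older `SortformerModules` object that lacks the newer attribute.

```python
from types import SimpleNamespace
from typing import Any


def apply_supported(target: Any, overrides: dict[str, Any]) -> list[str]:
    """Set only attributes that already exist on target; return the skipped names."""
    skipped: list[str] = []
    for name, value in overrides.items():
        if hasattr(target, name):
            setattr(target, name, value)
        else:
            skipped.append(name)
    return skipped


# Stand-in for an older module that has spkcache_len but not spkcache_update_period.
old_module = SimpleNamespace(spkcache_len=0)
skipped = apply_supported(
    old_module,
    {"spkcache_update_period": 300, "spkcache_len": 188},
)
```

Returning the skipped names lets the caller log which settings were not applied instead of ignoring them silently.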

Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
@sarahyurick
Contributor

/ok to test c5564bf

@sarahyurick sarahyurick dismissed mohammadaaftabv’s stale review April 23, 2026 17:41

It looks like all requests have been implemented, thanks!

@sarahyurick sarahyurick enabled auto-merge (squash) April 23, 2026 17:42
@sarahyurick sarahyurick merged commit 4b4b584 into main Apr 23, 2026
45 checks passed
sarahyurick added a commit that referenced this pull request Apr 23, 2026
…mark (#1764) (#1866)

* Fix Sortformer tutorial issues and add InferenceSortformerStage benchmark

* Address PR review feedback and update model to v2.1

* Address second round of review feedback

* fixing some more issues

---------

Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Jorjeous added a commit that referenced this pull request Apr 27, 2026
Merge origin/main into dev to pick up upstream changes (492 files, +57k/-6k):
- 26.04 staging release
- Generic ASR/TTS audio processing pipeline (#1679)
- Dynamo disaggregated serving + validators (#1813, #1820, #1833, #1834, #1861)
- ReadSpeech audio curation benchmark + tutorials (#1841, #1851, #1870)
- VideoReader path validation, audio waveform leak fixes (#1845, #1765)
- Sortformer tutorial fixes + benchmarks (#1764)
- Generic audio pipeline + qwen3 support (#1827)
- Fern docs (audio + curate-audio sections)

Conflict resolution:
- nemo_curator/stages/audio/__init__.py: kept dev's lazy __getattr__ registry,
  added main's new ManifestReader and ManifestWriterStage to both __all__ and
  _LAZY_IMPORTS (now lazy-loaded from nemo_curator.stages.audio.common).
- uv.lock: took main's version (latest dependency resolutions).

Removals propagated from main (pre-merge-base files we no longer need):
- nemo_curator/stages/audio/alm/alm_manifest_writer.py (replaced by ShardedManifestWriterStage)
- nemo_curator/stages/audio/alm/alm_manifest_reader.py
- nemo_curator/backends/experimental/* (refactored away)
- nemo_curator/core/serve.py (replaced by typed serve config)

Verified intact:
- SCOTCH pipeline: speaker_id/, hifi_pipeline/slurm_e2e/ (dev-only additions, untouched).
- Cherry-picked audio PRs (#1853, #3, #1, #1839, integration-test) all present.

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>

Labels

r1.2.0 Pick this label for auto cherry-picking into r1.2.0

Development

Successfully merging this pull request may close these issues.

Add benchmarks for InferenceSortformerStage

3 participants