feat(tutorials/readspeech): add interactive Jupyter notebook tutorial #1870

Merged
sarahyurick merged 2 commits into NVIDIA-NeMo:main from shubhamNvidia:pr/notebook
Apr 24, 2026
Conversation

@shubhamNvidia (Contributor)

Summary

  • Add interactive Jupyter notebook walkthrough for DNS Challenge Read Speech audio curation pipeline
  • Provides step-by-step execution with visualization of quality score distributions
  • Demonstrates threshold tuning to control quality vs. data retention tradeoffs

What's Included

  • tutorials/audio/readspeech/readspeech_tutorial.ipynb - Interactive notebook covering:
    1. Dataset download and manifest creation
    2. Pipeline stages: Mono → VAD → Band Filter → UTMOS → SIGMOS → Speaker Separation
    3. Quality filtering with configurable thresholds
    4. Visualization of intermediate outputs and score distributions
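The score-threshold stages in the list above can be sketched as a chain of predicates over per-sample score dicts. This is a toy illustration, not the NeMo Curator API; the stage names and sample fields are assumptions, and only the numeric thresholds (VAD duration, UTMOS, SIGMOS) from the PR's flowchart are modeled, with the Mono, Band Filter, and Speaker Separation stages omitted.

```python
# Toy sketch of the tutorial's staged filtering idea (NOT the NeMo Curator
# API): each stage is a predicate over a sample dict, and a sample survives
# only if it passes every stage in order. Thresholds mirror the PR flowchart.

def make_threshold_stage(key, minimum):
    """Return a stage that keeps samples whose `key` score is >= `minimum`."""
    return lambda sample: sample.get(key, 0.0) >= minimum

STAGES = [
    make_threshold_stage("duration_s", 2.0),    # VAD keeps clips of 2-60 s
    make_threshold_stage("utmos", 3.4),         # UTMOS >= 3.4
    make_threshold_stage("sigmos_ovrl", 3.5),   # SIGMOS OVRL >= 3.5
    make_threshold_stage("sigmos_noise", 4.0),  # SIGMOS NOISE >= 4.0
]

def run_pipeline(samples):
    """Apply every stage in sequence; return the surviving samples."""
    return [s for s in samples if all(stage(s) for stage in STAGES)]

samples = [
    {"duration_s": 5.0, "utmos": 3.9, "sigmos_ovrl": 3.7, "sigmos_noise": 4.2},
    {"duration_s": 5.0, "utmos": 3.1, "sigmos_ovrl": 3.7, "sigmos_noise": 4.2},
]
kept = run_pipeline(samples)
print(len(kept))  # 1: the second sample fails the UTMOS >= 3.4 stage
```

Modeling stages as composable predicates makes the threshold-tuning experiments in the later cells a matter of rebuilding the stage list with different minimums.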

Test Plan

  • Open notebook in Jupyter and run all cells
  • Verify plots render correctly
  • Confirm pipeline stages execute without errors
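The last check in the Test Plan ends with JSONL output in the result directory. The notebook's own load_jsonl_results helper is not shown in this PR, so as an assumption about the format (one JSON object per line), a minimal stdlib loader looks like:

```python
# Minimal stdlib sketch of reading JSONL pipeline output. The notebook's
# load_jsonl_results helper is not shown in the PR, so the one-object-per-line
# format here is an assumption for illustration.
import io
import json

def load_jsonl(fp):
    """Parse one JSON object per non-empty line from a file-like object."""
    return [json.loads(line) for line in fp if line.strip()]

sample = io.StringIO('{"utmos": 3.8}\n{"utmos": 3.2}\n')
rows = load_jsonl(sample)
print(len(rows))  # 2 records parsed
```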

@shubhamNvidia shubhamNvidia requested review from a team as code owners April 24, 2026 17:58
@shubhamNvidia shubhamNvidia requested review from meatybobby and removed request for a team April 24, 2026 17:58
copy-pr-bot Bot commented Apr 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shubhamNvidia (Contributor, Author)

/ok to test 972b1c1

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR adds an interactive Jupyter notebook tutorial (tutorials/audio/readspeech/readspeech_tutorial.ipynb) that walks through the DNS Challenge Read Speech audio curation pipeline, covering dataset download, all filter stages, quality-score visualization, and threshold tuning. The .secrets.baseline is regenerated to track the new notebook's embedded plot images in place of stale multimodal notebook entries.

Confidence Score: 5/5

Safe to merge; all findings are P2 style suggestions that do not affect correctness.

No P0 or P1 issues found. The two P2 comments cover a potentially misleading threshold-sensitivity chart scope and an unguarded Ray cleanup path — neither blocks functionality. The secrets baseline update is a clean regeneration.

No files require special attention.

Important Files Changed

  • tutorials/audio/readspeech/readspeech_tutorial.ipynb: New interactive tutorial notebook for the DNS Challenge Read Speech pipeline; two minor P2 issues: the threshold-sensitivity chart misleads by covering only passing samples, and the Ray cluster lifecycle has no cleanup on failure.
  • .github/workflows/config/.secrets.baseline: Baseline regenerated via detect-secrets scan; swaps stale multimodal notebook entries (base64 plot images) for the new audio tutorial notebook entries; timestamp updated to the current run.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cell 2: Config & Imports] --> B[Cell 4: CreateInitialManifestReadSpeechStage\nstandalone preview]
    A --> C[Cell 6: Full Pipeline]
    C --> D[CreateInitialManifestReadSpeechStage\nauto_download=True]
    D --> E[AudioDataFilterStage]
    E --> E1[Mono Conversion 48 kHz]
    E1 --> E2[VAD 2-60 s]
    E2 --> E3[Band Filter full_band]
    E3 --> E4[UTMOS >= 3.4]
    E4 --> E5[SIGMOS OVRL >= 3.5 / NOISE >= 4.0]
    E5 --> E6[Speaker Separation]
    E6 --> F[AudioToDocumentStage]
    F --> G[JsonlWriter -> RESULT_DIR]
    G --> H[Cell 9: load_jsonl_results]
    H --> I[Cell 11: Score Distributions]
    H --> J[Cell 13: Band Classification]
    H --> K[Cell 15: Speaker Distribution]
    H --> L[Cell 17: Threshold Sensitivity]
    G --> M[Cell 19: ray_client.stop]
```
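The "Cell 17: Threshold Sensitivity" node in the flowchart sweeps a quality threshold over the scored results. A hedged sketch of that idea, with made-up UTMOS scores and candidate thresholds (the notebook's actual cell is not reproduced here):

```python
# Sketch of a threshold-sensitivity sweep: for each candidate UTMOS threshold,
# report what fraction of scored samples would be retained. Scores and
# thresholds below are illustrative, not from the notebook.

def retention_curve(scores, thresholds):
    """For each threshold, return the fraction of scores at or above it."""
    n = len(scores)
    return {t: sum(s >= t for s in scores) / n for t in thresholds}

scores = [2.9, 3.2, 3.5, 3.8, 4.1, 4.4]
curve = retention_curve(scores, [3.0, 3.4, 4.0])
for t, kept in sorted(curve.items()):
    print(f"UTMOS >= {t}: {kept:.0%} retained")
```

Plotting retained fraction against threshold is what exposes the quality vs. data-retention tradeoff the Summary describes: raising the threshold improves average quality but monotonically shrinks the kept set.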

Reviews (2): Last reviewed commit: "Merge branch 'main' into pr/notebook"

"**What you'll learn:**\n",
"1. Download and inspect the dataset\n",
"2. Run each filter stage and examine intermediate outputs\n",
"3. Visualize quality score distributions\n",

P2 Internal codename leaked into public tutorial

The comments reference "Xenna" — which appears to be an internal project codename — in two places. This will be confusing to external contributors and users who won't know what "Xenna" refers to. These should be replaced with a neutral description.

The second occurrence (# Use the default executor (Xenna), matching pipeline.py CLI defaults.) should similarly be reworded to something like # Use the default executor, matching pipeline.py CLI defaults.

"cell_type": "markdown",
"metadata": {},
"source": [
"# DNS Challenge Read Speech \u2014 Interactive Tutorial\n",

P2 Pass rate denominator may be misleading

The pass rate is calculated as len(results) / MAX_SAMPLES, but MAX_SAMPLES is an upper bound on how many files to download — the pipeline may have actually processed fewer than MAX_SAMPLES inputs (e.g., if the dataset has fewer matching files, or some are skipped). Using MAX_SAMPLES as the denominator can silently understate the true pass rate.

Consider tracking the actual number of inputs from the manifest stage output and using that as the denominator, or at least adding a note that the denominator is the requested cap, not the actual input count.
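The reviewer's suggestion can be sketched in a few lines: derive the denominator from the manifest's actual input count rather than the download cap. The variable names (manifest_entries, results, MAX_SAMPLES) stand in for the notebook's real ones and the counts are invented for illustration.

```python
# Sketch of the suggested fix: use the manifest's actual input count as the
# pass-rate denominator, not the MAX_SAMPLES download cap. All values here
# are hypothetical stand-ins for the notebook's variables.

MAX_SAMPLES = 100                   # requested download cap (upper bound only)
manifest_entries = list(range(87))  # pretend the manifest stage yielded 87 inputs
results = list(range(52))           # pretend 52 samples survived filtering

actual_inputs = len(manifest_entries)
pass_rate = len(results) / actual_inputs   # 52/87: the true pass rate
capped_rate = len(results) / MAX_SAMPLES   # 52/100: understates it

print(f"true pass rate:   {pass_rate:.1%}")
print(f"cap-based 'rate': {capped_rate:.1%}")
```

Whenever fewer files than MAX_SAMPLES are actually downloaded, the cap-based figure is strictly smaller than the true rate, which is exactly the silent understatement the review flags.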

Add Jupyter notebook walkthrough for DNS Challenge Read Speech audio
curation pipeline with step-by-step execution and visualization.

Also update secrets baseline for notebook image false positives.
@sarahyurick (Contributor)

/ok to test e26ad56

@sarahyurick sarahyurick merged commit 6cdd923 into NVIDIA-NeMo:main Apr 24, 2026
23 checks passed
Jorjeous added a commit that referenced this pull request Apr 27, 2026
Merge origin/main into dev to pick up upstream changes (492 files, +57k/-6k):
- 26.04 staging release
- Generic ASR/TTS audio processing pipeline (#1679)
- Dynamo disaggregated serving + validators (#1813, #1820, #1833, #1834, #1861)
- ReadSpeech audio curation benchmark + tutorials (#1841, #1851, #1870)
- VideoReader path validation, audio waveform leak fixes (#1845, #1765)
- Sortformer tutorial fixes + benchmarks (#1764)
- Generic audio pipeline + qwen3 support (#1827)
- Fern docs (audio + curate-audio sections)

Conflict resolution:
- nemo_curator/stages/audio/__init__.py: kept dev's lazy __getattr__ registry,
  added main's new ManifestReader and ManifestWriterStage to both __all__ and
  _LAZY_IMPORTS (now lazy-loaded from nemo_curator.stages.audio.common).
- uv.lock: took main's version (latest dependency resolutions).

Removals propagated from main (pre-merge-base files we no longer need):
- nemo_curator/stages/audio/alm/alm_manifest_writer.py (replaced by ShardedManifestWriterStage)
- nemo_curator/stages/audio/alm/alm_manifest_reader.py
- nemo_curator/backends/experimental/* (refactored away)
- nemo_curator/core/serve.py (replaced by typed serve config)

Verified intact:
- SCOTCH pipeline: speaker_id/, hifi_pipeline/slurm_e2e/ (dev-only additions, untouched).
- Cherry-picked audio PRs (#1853, #3, #1, #1839, integration-test) all present.

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>