feat(tutorials/readspeech): add interactive Jupyter notebook tutorial #1870
sarahyurick merged 2 commits into NVIDIA-NeMo:main
/ok to test 972b1c1
Greptile Summary
This PR adds an interactive Jupyter notebook tutorial.
Confidence Score: 5/5
Safe to merge; all findings are P2 style suggestions that do not affect correctness.
No P0 or P1 issues found. The two P2 comments cover a potentially misleading threshold-sensitivity chart scope and an unguarded Ray cleanup path; neither blocks functionality. The secrets baseline update is a clean regeneration. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Cell 2: Config & Imports] --> B[Cell 4: CreateInitialManifestReadSpeechStage\nstandalone preview]
A --> C[Cell 6: Full Pipeline]
C --> D[CreateInitialManifestReadSpeechStage\nauto_download=True]
D --> E[AudioDataFilterStage]
E --> E1[Mono Conversion 48 kHz]
E1 --> E2[VAD 2-60 s]
E2 --> E3[Band Filter full_band]
E3 --> E4[UTMOS >= 3.4]
E4 --> E5[SIGMOS OVRL >= 3.5 / NOISE >= 4.0]
E5 --> E6[Speaker Separation]
E6 --> F[AudioToDocumentStage]
F --> G[JsonlWriter -> RESULT_DIR]
G --> H[Cell 9: load_jsonl_results]
H --> I[Cell 11: Score Distributions]
H --> J[Cell 13: Band Classification]
H --> K[Cell 15: Speaker Distribution]
H --> L[Cell 17: Threshold Sensitivity]
G --> M[Cell 19: ray_client.stop]
Reviews (2): Last reviewed commit: "Merge branch 'main' into pr/notebook"
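One way to address the unguarded Ray cleanup path mentioned in the summary is to wrap the pipeline run in `try`/`finally`, so the client is stopped even when a stage raises. The sketch below is illustrative only: `FakeRayClient`, its `start` method, and `run_with_cleanup` are hypothetical stand-ins, and only the `stop` call corresponds to the `ray_client.stop` step shown in the flowchart.

```python
class FakeRayClient:
    """Hypothetical stand-in for the notebook's ray_client object."""

    def __init__(self):
        self.running = False

    def start(self):
        # Assumed counterpart to stop(); not confirmed by the PR.
        self.running = True

    def stop(self):
        self.running = False


def run_with_cleanup(client, run_pipeline):
    """Run the pipeline callable and always stop the client afterwards."""
    client.start()
    try:
        return run_pipeline()
    finally:
        # Executes on success and on failure, so no cluster is left running.
        client.stop()


def failing_pipeline():
    raise RuntimeError("simulated stage failure")


client = FakeRayClient()
try:
    run_with_cleanup(client, failing_pipeline)
except RuntimeError:
    pass  # the failure still propagates to the caller

print(client.running)
```

In a notebook the cleanup usually lives in its own cell, so this pattern mainly helps when the pipeline run and the shutdown share a cell or a helper function.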
"**What you'll learn:**\n",
"1. Download and inspect the dataset\n",
"2. Run each filter stage and examine intermediate outputs\n",
"3. Visualize quality score distributions\n",
Internal codename leaked into public tutorial
The comments reference "Xenna" — which appears to be an internal project codename — in two places. This will be confusing to external contributors and users who won't know what "Xenna" refers to. These should be replaced with a neutral description.
The second occurrence (# Use the default executor (Xenna), matching pipeline.py CLI defaults.) should similarly be reworded to something like # Use the default executor, matching pipeline.py CLI defaults.
"cell_type": "markdown",
"metadata": {},
"source": [
"# DNS Challenge Read Speech \u2014 Interactive Tutorial\n",
Pass rate denominator may be misleading
The pass rate is calculated as len(results) / MAX_SAMPLES, but MAX_SAMPLES is an upper bound on how many files to download — the pipeline may have actually processed fewer than MAX_SAMPLES inputs (e.g., if the dataset has fewer matching files, or some are skipped). Using MAX_SAMPLES as the denominator can silently understate the true pass rate.
Consider tracking the actual number of inputs from the manifest stage output and using that as the denominator, or at least adding a note that the denominator is the requested cap, not the actual input count.
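A minimal sketch of the suggested change, assuming the actual input count can be read from the manifest stage output. The variable `num_inputs`, the helper `pass_rate`, and all numbers here are illustrative and not taken from the notebook.

```python
def pass_rate(num_passed, num_inputs):
    """Fraction of actually processed inputs that survived all filters."""
    if num_inputs <= 0:
        raise ValueError("num_inputs must be positive")
    return num_passed / num_inputs


MAX_SAMPLES = 100  # requested download cap (illustrative value)
num_inputs = 80    # hypothetical count of entries the manifest stage produced
results = [{"id": i} for i in range(40)]  # hypothetical surviving records

rate_vs_cap = len(results) / MAX_SAMPLES              # denominator is the cap
rate_vs_inputs = pass_rate(len(results), num_inputs)  # denominator is real inputs

print(f"vs cap: {rate_vs_cap:.0%}, vs actual inputs: {rate_vs_inputs:.0%}")
```

With these made-up numbers the cap-based rate is lower than the input-based rate, which is the understatement the comment describes.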
Add Jupyter notebook walkthrough for DNS Challenge Read Speech audio curation pipeline with step-by-step execution and visualization. Also update secrets baseline for notebook image false positives.
/ok to test e26ad56
Merge origin/main into dev to pick up upstream changes (492 files, +57k/-6k):
- 26.04 staging release
- Generic ASR/TTS audio processing pipeline (#1679)
- Dynamo disaggregated serving + validators (#1813, #1820, #1833, #1834, #1861)
- ReadSpeech audio curation benchmark + tutorials (#1841, #1851, #1870)
- VideoReader path validation, audio waveform leak fixes (#1845, #1765)
- Sortformer tutorial fixes + benchmarks (#1764)
- Generic audio pipeline + qwen3 support (#1827)
- Fern docs (audio + curate-audio sections)

Conflict resolution:
- nemo_curator/stages/audio/__init__.py: kept dev's lazy __getattr__ registry, added main's new ManifestReader and ManifestWriterStage to both __all__ and _LAZY_IMPORTS (now lazy-loaded from nemo_curator.stages.audio.common).
- uv.lock: took main's version (latest dependency resolutions).

Removals propagated from main (pre-merge-base files we no longer need):
- nemo_curator/stages/audio/alm/alm_manifest_writer.py (replaced by ShardedManifestWriterStage)
- nemo_curator/stages/audio/alm/alm_manifest_reader.py
- nemo_curator/backends/experimental/* (refactored away)
- nemo_curator/core/serve.py (replaced by typed serve config)

Verified intact:
- SCOTCH pipeline: speaker_id/, hifi_pipeline/slurm_e2e/ (dev-only additions, untouched).
- Cherry-picked audio PRs (#1853, #3, #1, #1839, integration-test) all present.

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
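For readers unfamiliar with the lazy `__getattr__` registry mentioned in the conflict resolution, the following is a rough sketch of the general pattern (module-level `__getattr__`, PEP 562). It is not the actual contents of `nemo_curator/stages/audio/__init__.py`: the helper `build_lazy_module` and the demo module name are hypothetical, and standard-library targets stand in for the real entries so the sketch runs without NeMo Curator installed.

```python
import importlib
import types

# Stand-in mapping: exported name -> module that defines it.
# Per the commit message, real entries would map names such as
# "ManifestReader" to "nemo_curator.stages.audio.common".
_LAZY_IMPORTS = {
    "OrderedDict": "collections",
    "JSONDecoder": "json",
}


def build_lazy_module(name, lazy_imports):
    """Create a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)
    module.__all__ = sorted(lazy_imports)

    def __getattr__(attr):
        target = lazy_imports.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(target), attr)
        setattr(module, attr, value)  # cache so later lookups skip this hook
        return value

    # Module attribute lookup falls back to a __getattr__ found in the
    # module namespace when normal lookup fails.
    module.__getattr__ = __getattr__
    return module


lazy_audio = build_lazy_module("lazy_audio_demo", _LAZY_IMPORTS)
print("OrderedDict" in vars(lazy_audio))  # not resolved yet
print(lazy_audio.OrderedDict)             # triggers the import
```

Keeping a new name in both `__all__` and the lazy mapping, as the merge did, means star-imports and attribute access agree on what the package exports.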
Summary
What's Included
tutorials/audio/readspeech/readspeech_tutorial.ipynb - Interactive notebook covering:
Test Plan