Release v0.7.6 #29
Thank you for raising your pull request and contributing to voscript.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #29 +/- ##
==========================================
- Coverage 91.32% 91.18% -0.15%
==========================================
Files 75 76 +1
Lines 2906 3016 +110
==========================================
+ Hits 2654 2750 +96
- Misses 252 266 +14
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92a7af54ad
    # whisperx: forced word-level alignment via wav2vec2.
    # 3.3.1 is the newest non-yanked WhisperX release that still supports the
    # current torch 2.6 / numpy<2 / SciPy 1.11.x runtime without moving to the
    # WhisperX 3.3.3+ numpy>=2 or 3.8.x torch~=2.8 dependency stack.
    whisperx==3.3.1
    # Docker installs whisperx==3.3.1 with --no-deps after this file is resolved.
    # WhisperX 3.3.1 still hard-pins faster-whisper==1.1.0 / ctranslate2<4.5.0,
    # which would replace the cuDNN9-compatible ASR stack above and look for
    # cuDNN8 runtime libraries in the CUDA 12.4 + cuDNN9 base image.
    nltk>=3.9,<4.0
Restore WhisperX to non-Docker installs
The local quickstart still tells users to install only `pip install -r app/requirements.txt`, but this file now contains only comments for WhisperX and no installable `whisperx` requirement. Those environments will start without the package and then hit `ModuleNotFoundError` at the `import whisperx` in the diarization alignment path; the Dockerfile's separate `pip install --no-deps whisperx==3.3.1` does not cover local venvs or CI jobs that consume this requirements file directly.
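One way to make the failure described above easier to diagnose is to check for the package at startup instead of at the first alignment call. The sketch below is illustrative only: `require_module` and the hint text are hypothetical and not part of voscript.

```python
import importlib.util


def require_module(name: str, install_hint: str) -> None:
    """Raise a descriptive error if a top-level module cannot be found.

    Hypothetical helper for illustration; not part of the project.
    """
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            f"Required module '{name}' is not installed. {install_hint}"
        )


# Example: fail fast in a local venv where only requirements.txt was installed.
# require_module("whisperx", "Run: pip install --no-deps whisperx==3.3.1")
```

A check like this does not replace restoring an installable requirement for local installs; it only turns a late `ModuleNotFoundError` into an early, actionable message.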
    data, sample_rate = sf.read(audio_path, dtype="float32", always_2d=True)
    waveform = torch.from_numpy(data.T.copy())
Avoid loading entire embedding audio into memory
For long recordings, `sf.read()` decodes the whole embedding source before any diarization-turn filtering, and `data.T.copy()` immediately creates another full-size copy. The previous path loaded only each bounded turn with `torchaudio.load(frame_offset, num_frames)`, so large accepted uploads or caller-provided raw embedding sources can now exhaust RAM before embeddings start; consider keeping segmented reads or gating the full-file fast path by duration/size.
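The two options suggested above (segmented reads, or gating the fast path by size) can be sketched with plain arithmetic. The helper names and the byte budget below are assumptions made for illustration, not code from this PR.

```python
def turn_frame_range(start_s: float, end_s: float,
                     sample_rate: int, total_frames: int) -> tuple:
    """Convert a diarization turn in seconds to (start_frame, num_frames),
    clamped to the bounds of the file."""
    start = max(0, int(start_s * sample_rate))
    stop = min(total_frames, int(end_s * sample_rate))
    return start, max(0, stop - start)


def full_load_fits(total_frames: int, channels: int,
                   max_bytes: int, bytes_per_sample: int = 4) -> bool:
    """Estimate whether a float32 full-file load stays under max_bytes.

    The factor of 2 accounts for the transposed copy made right after
    decoding, so two full-size buffers exist at the peak.
    """
    return 2 * total_frames * channels * bytes_per_sample <= max_bytes


# A bounded read could then pass the range to soundfile, for example:
#   sf.read(path, start=start, frames=num_frames, dtype="float32", always_2d=True)
```

With a hypothetical 512 MiB budget, one hour of mono 16 kHz audio passes the gate while the same hour in stereo does not, so the stereo file would take the per-turn path.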
Pull request overview
Release v0.7.6 updates the runtime to improve health stability during GPU cleanup, make WhisperX alignment more reliable/isolated, and reduce embedding/audio loading overhead, with accompanying unit tests and documentation/changelog updates.
Changes:
- Add `WHISPERX_ALIGN_DEVICE` support and cache/reuse WhisperX alignment models by language/model/device.
- Prefer single-pass audio loading for embedding extraction (soundfile → slice per diarization turn), with a torchaudio fallback.
- Adjust GPU cleanup to avoid full Python GC during active job phases; update hallucination guard for short “stock outro” segments; bump docs/version/changelog and update dependency/runtime baselines.
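The alignment-model reuse in the first bullet can be pictured as a small cache keyed by the `(language, model, device)` triple. This is a minimal sketch under assumptions: `AlignModelCache` and the injected `loader` callable are hypothetical, and the PR's actual implementation may differ.

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (language, model_name, device)


class AlignModelCache:
    """Load each alignment model once per (language, model, device) key."""

    def __init__(self, loader: Callable[[str, str, str], Any]) -> None:
        self._loader = loader  # hypothetical loader, injected by the caller
        self._models: Dict[CacheKey, Any] = {}

    def get(self, language: str, model_name: str, device: str) -> Any:
        key = (language, model_name, device)
        if key not in self._models:
            # Cache miss: load once and keep the result for later jobs.
            self._models[key] = self._loader(language, model_name, device)
        return self._models[key]

    def clear(self) -> None:
        """Drop all cached models, e.g. when the pipeline unloads."""
        self._models.clear()
```

Including the device in the key means that switching `WHISPERX_ALIGN_DEVICE` between runs cannot return a model that lives on the wrong device.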
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/test_provider_registry.py | Adds coverage for alignment-device caching/reuse and the embedding single-load path; updates embedding timing test expectations. |
| tests/unit/test_pipeline_model_lifecycle.py | Extends lifecycle/unload tests to cover alignment cache/device/model state. |
| tests/unit/test_main_lifespan.py | Updates OpenAPI version assertion to 0.7.6. |
| tests/unit/test_job_runtime.py | Adds tests for CUDA cache flushing behavior with/without full Python GC by phase. |
| tests/unit/test_dependency_runtime_baseline.py | Adds regression tests to enforce faster-whisper/ctranslate2 pins and Docker-only whisperx install strategy. |
| tests/unit/test_asr_repetition_guard.py | Adds tests for the new short single-segment “stock outro” hallucination filter. |
| doc/security.zh.md | Updates documented “current version” reference to 0.7.6. |
| doc/security.en.md | Updates documented “current version” reference to 0.7.6. |
| doc/quickstart.zh.md | Documents WHISPERX_ALIGN_DEVICE and clarifies CPU default for zh alignment. |
| doc/quickstart.en.md | Documents WHISPERX_ALIGN_DEVICE and clarifies CPU default for zh alignment. |
| doc/configuration.zh.md | Bumps version references to 0.7.6 and documents WHISPERX_ALIGN_DEVICE + validation wording. |
| doc/configuration.en.md | Bumps version references to 0.7.6 and documents WHISPERX_ALIGN_DEVICE + validation wording. |
| doc/changelog.zh.md | Adds v0.7.6 release notes (deps/runtime/health/alignment/embedding). |
| doc/changelog.en.md | Adds v0.7.6 release notes (deps/runtime/health/alignment/embedding). |
| app/requirements.txt | Moves ASR stack to cuDNN9-compatible faster-whisper/ctranslate2, adds soundfile/nltk; removes whisperx pin (Docker installs it separately). |
| app/providers/embedding/default.py | Implements soundfile full-load + per-turn slicing path with fallback to torchaudio segment loading and adds load-timing logs. |
| app/providers/diarization/default.py | Adds WHISPERX_ALIGN_DEVICE resolution and alignment model caching/reuse keyed by language/model/device. |
| app/providers/asr/default.py | Adds a “stock outro” marker-based single-segment hallucination filter. |
| app/pipeline/orchestrator.py | Tracks and unloads alignment cache/device/model state alongside other pipeline models. |
| app/infra/job_runtime.py | Avoids full Python GC at active job phases; keeps full GC for idle-unload; factors GC into helper. |
| app/Dockerfile | Installs whisperx==3.3.1 with --no-deps after resolving requirements to avoid dependency replacement. |
| app/config.py | Bumps APP_VERSION to 0.7.6 and adds WHISPERX_ALIGN_DEVICE env config. |
| .env.example | Documents WHISPERX_ALIGN_DEVICE default and recommended usage. |
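The `app/infra/job_runtime.py` row above describes skipping full Python GC during active job phases while keeping it for idle unload. A rough sketch of that gating follows; the phase name `idle_unload`, the function name, and the injected callback are assumptions rather than the project's code.

```python
import gc
from typing import Callable, Optional

# Hypothetical phase names; the project's real phase identifiers are not shown here.
IDLE_PHASES = frozenset({"idle_unload"})


def release_gpu_memory(
    phase: str,
    empty_cuda_cache: Optional[Callable[[], None]] = None,
) -> bool:
    """Flush the CUDA cache, running a full Python GC only when idle.

    Returns True if a full gc.collect() was performed.
    """
    ran_full_gc = phase in IDLE_PHASES
    if ran_full_gc:
        # A full collection can pause the process, so keep it out of active phases.
        gc.collect()
    if empty_cuda_cache is not None:
        # In a real deployment this would typically be torch.cuda.empty_cache.
        empty_cuda_cache()
    return ran_full_gc
```

Injecting the cache-flush callable keeps the sketch runnable without a GPU and makes the phase behavior easy to unit test.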
    except Exception as exc:
        logger.warning(
            "Falling back to torchaudio segment loading for embedding audio: %s",
            exc,
        )
        info = torchaudio.info(audio_path)
        native_sr = info.sample_rate
    def _load_full_waveform(audio_path: str):
        """Load normalized audio once with libsndfile to avoid per-turn torch decode."""

        load_started = time.perf_counter()
        data, sample_rate = sf.read(audio_path, dtype="float32", always_2d=True)
        waveform = torch.from_numpy(data.T.copy())
        logger.info(
            "embedding_audio_load_timing backend=soundfile elapsed_s=%.3f sample_rate=%d channels=%d frames=%d",
            time.perf_counter() - load_started,
            sample_rate,
            waveform.shape[0],
            waveform.shape[1],
        )
    def _outro_marker_score(normalized_text: str) -> tuple[int, float]:
        if not normalized_text:
            return 0, 0.0

        matched = {
            marker for marker in _OUTRO_HALLUCINATION_MARKERS if marker in normalized_text
        }
        marker_chars = sum(len(marker) for marker in matched)
        return len(matched), marker_chars / len(normalized_text)
c50fabd to 439a5d2
❤️ Great PR @MapleEve ❤️ The growth of the project is inseparable from user feedback and contribution. Thanks for your contribution!
Change Type
Description of Change
Release v0.7.6 with health, WhisperX alignment runtime, stock outro hallucination guard, and embedding audio slicing/runtime reliability fixes.
Public documentation and changelog updates use anonymized validation wording only. Internal validation is described by behavior category, without sample names, job IDs, speaker IDs, hosts, paths, or logs.
Verification
- PYSEC-2022-42969, CVE-2026-1839.
- `ruff check app/ --ignore E501` passed.
- `ruff format --check app/` passed.
- `PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest tests/unit/ tests/test_security.py -v --tb=short --no-header` passed in a CI-equivalent Python 3.11 environment.
- `git diff --check` passed.