Release v0.7.6 #29

Merged
MapleEve merged 10 commits into main from fix/health-unblock-gpu-cleanup
May 7, 2026

@MapleEve MapleEve commented May 7, 2026

Change Type

  • fix
  • docs
  • chore

Description of Change

Release v0.7.6 with health, WhisperX alignment runtime, stock outro hallucination guard, and embedding audio slicing/runtime reliability fixes.

Public documentation and changelog updates use anonymized validation wording only. Internal validation is described by behavior category, without sample names, job IDs, speaker IDs, hosts, paths, or logs.

Verification

  • Public release scan passed.
  • Dependency audit passed with the accepted ignores: PYSEC-2022-42969, CVE-2026-1839.
  • ruff check app/ --ignore E501 passed.
  • ruff format --check app/ passed.
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest tests/unit/ tests/test_security.py -v --tb=short --no-header passed in a CI-equivalent Python 3.11 environment.
  • git diff --check passed.

Copilot AI review requested due to automatic review settings May 7, 2026 04:02

github-actions Bot commented May 7, 2026

👍 @MapleEve

Thank you for raising your pull request and contributing to voscript.
Please make sure you have followed our contributing guidelines. We will review it as soon as possible.
If you encounter any problems, please feel free to contact us.


codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 86.17886% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.18%. Comparing base (80b61dd) to head (439a5d2).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| app/nltk/tokenize/punkt.py | 86.36% | 6 Missing ⚠️ |
| app/providers/diarization/default.py | 83.33% | 5 Missing ⚠️ |
| app/providers/embedding/default.py | 87.50% | 3 Missing ⚠️ |
| app/infra/job_runtime.py | 60.00% | 2 Missing ⚠️ |
| app/providers/asr/default.py | 90.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
- Coverage   91.32%   91.18%   -0.15%     
==========================================
  Files          75       76       +1     
  Lines        2906     3016     +110     
==========================================
+ Hits         2654     2750      +96     
- Misses        252      266      +14     
Flag Coverage Δ
unit 91.18% <86.17%> (-0.15%) ⬇️
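
The headline percentages follow from the hit and line counts in the diff table. Below is a minimal sketch of that arithmetic, assuming coverage is simply hits divided by lines. The 123-line patch size is not stated in the report; it is back-solved here from the reported 86.17886% and the 17 missing lines.

```python
# Counts copied from the Codecov diff table.
BASE_HITS, BASE_LINES = 2654, 2906   # main
HEAD_HITS, HEAD_LINES = 2750, 3016   # this PR

# "Lines" column of the per-file table (missing lines per file).
MISSING_BY_FILE = {
    "app/nltk/tokenize/punkt.py": 6,
    "app/providers/diarization/default.py": 5,
    "app/providers/embedding/default.py": 3,
    "app/infra/job_runtime.py": 2,
    "app/providers/asr/default.py": 1,
}
REPORTED_PATCH_PCT = 86.17886


def percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of lines hit."""
    return 100.0 * hits / lines


base_pct = percent(BASE_HITS, BASE_LINES)
head_pct = percent(HEAD_HITS, HEAD_LINES)
delta = head_pct - base_pct

patch_missing = sum(MISSING_BY_FILE.values())
# Assumption: the patch size is inferred from the reported percentage.
patch_lines = round(patch_missing / (1 - REPORTED_PATCH_PCT / 100))
patch_pct = percent(patch_lines - patch_missing, patch_lines)

print(f"base={base_pct:.3f}% head={head_pct:.3f}% delta={delta:+.3f}")
print(f"patch: {patch_lines - patch_missing}/{patch_lines} lines = {patch_pct:.5f}%")
```

The computed values agree with the reported figures to within display rounding, and the per-file missing counts sum to the 17 lines quoted in the summary.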

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92a7af54ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/requirements.txt Outdated
Comment on lines +41 to +46
# whisperx: forced word-level alignment via wav2vec2.
# 3.3.1 is the newest non-yanked WhisperX release that still supports the
# current torch 2.6 / numpy<2 / SciPy 1.11.x runtime without moving to the
# WhisperX 3.3.3+ numpy>=2 or 3.8.x torch~=2.8 dependency stack.
whisperx==3.3.1
# Docker installs whisperx==3.3.1 with --no-deps after this file is resolved.
# WhisperX 3.3.1 still hard-pins faster-whisper==1.1.0 / ctranslate2<4.5.0,
# which would replace the cuDNN9-compatible ASR stack above and look for
# cuDNN8 runtime libraries in the CUDA 12.4 + cuDNN9 base image.
nltk>=3.9,<4.0

P2: Restore WhisperX to non-Docker installs

The local quickstart still tells users to install only pip install -r app/requirements.txt, but this file now contains only comments for WhisperX and no installable whisperx requirement. Those environments will start without the package and then hit ModuleNotFoundError at the import whisperx in the diarization alignment path; the Dockerfile's separate pip install --no-deps whisperx==3.3.1 does not cover local venvs or CI jobs that consume this requirements file directly.
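
One way to make the failure mode described here easier to diagnose is to check for the package before the alignment path imports it. The following is a minimal sketch, assuming a hypothetical helper named require_module (it is not part of voscript) built on the standard importlib machinery.

```python
import importlib
import importlib.util
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import `name`, or raise an error that says how to install it.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            f"Optional dependency {name!r} is not installed. "
            f"Install it with: {install_hint}"
        )
    return importlib.import_module(name)


if __name__ == "__main__":
    try:
        # The install command mirrors the Dockerfile step quoted in the review.
        whisperx = require_module(
            "whisperx", "pip install --no-deps whisperx==3.3.1"
        )
    except RuntimeError as exc:
        print(exc)
```

This does not fix the missing requirement itself; it only turns a bare ModuleNotFoundError into a message that points local users at the separate install step.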

Comment on lines +27 to +28
data, sample_rate = sf.read(audio_path, dtype="float32", always_2d=True)
waveform = torch.from_numpy(data.T.copy())

P2: Avoid loading entire embedding audio into memory

For long recordings, sf.read() decodes the whole embedding source before any diarization-turn filtering, and data.T.copy() immediately creates another full-size copy. The previous path loaded only each bounded turn with torchaudio.load(frame_offset, num_frames), so large accepted uploads or caller-provided raw embedding sources can now exhaust RAM before embeddings start; consider keeping segmented reads or gating the full-file fast path by duration/size.
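
The suggestion to gate the full-file fast path by duration or size can be sketched as a pre-check on the decoded size. This is an illustration only: the function names and the 512 MiB budget are assumptions, and the estimate assumes float32 samples held twice (once by sf.read(), once by data.T.copy()).

```python
FLOAT32_BYTES = 4
# Hypothetical budget; the PR does not specify one.
DEFAULT_BUDGET_BYTES = 512 * 1024 * 1024


def estimate_full_load_bytes(frames: int, channels: int, copies: int = 2) -> int:
    """Approximate peak RAM for decoding a whole file as float32.

    `copies=2` models the decoded array plus the transposed copy.
    """
    return frames * channels * FLOAT32_BYTES * copies


def should_load_full_file(
    frames: int, channels: int, budget_bytes: int = DEFAULT_BUDGET_BYTES
) -> bool:
    """True when the single-pass path fits the budget; else use segmented reads."""
    return estimate_full_load_bytes(frames, channels) <= budget_bytes


if __name__ == "__main__":
    sample_rate = 16_000
    for hours in (0.5, 3, 12):
        frames = int(hours * 3600 * sample_rate)
        size_mib = estimate_full_load_bytes(frames, channels=1) / 2**20
        path = "full" if should_load_full_file(frames, 1) else "segmented"
        print(f"{hours:>4} h mono @16 kHz: ~{size_mib:,.0f} MiB -> {path}")
```

In practice the frame and channel counts could be read from the file header (for example with soundfile.info) before any samples are decoded, so the check itself stays cheap.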

Copilot AI left a comment

Pull request overview

Release v0.7.6 updates the runtime to improve health stability during GPU cleanup, make WhisperX alignment more reliable/isolated, and reduce embedding/audio loading overhead, with accompanying unit tests and documentation/changelog updates.

Changes:

  • Add WHISPERX_ALIGN_DEVICE support and cache/reuse WhisperX alignment models by language/model/device.
  • Prefer single-pass audio loading for embedding extraction (soundfile → slice per diarization turn), with a torchaudio fallback.
  • Adjust GPU cleanup to avoid full Python GC during active job phases; update hallucination guard for short “stock outro” segments; bump docs/version/changelog and update dependency/runtime baselines.
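
The caching described in the first bullet can be pictured as a dictionary keyed by (language, model, device). This is a minimal sketch under stated assumptions: AlignModelCache and the loader callable are hypothetical and do not reproduce the code in app/providers/diarization/default.py.

```python
from typing import Any, Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[str], str]  # (language, model name, device)


class AlignModelCache:
    """Reuse alignment models instead of reloading them per job."""

    def __init__(self, loader: Callable[[str, Optional[str], str], Any]) -> None:
        self._loader = loader
        self._models: Dict[CacheKey, Any] = {}

    def get(self, language: str, model_name: Optional[str], device: str) -> Any:
        key = (language, model_name, device)
        if key not in self._models:
            # Cache miss: load once, then serve every later request from memory.
            self._models[key] = self._loader(language, model_name, device)
        return self._models[key]

    def clear(self) -> None:
        """Drop all cached models, e.g. when the pipeline unloads."""
        self._models.clear()


if __name__ == "__main__":
    calls = []

    def fake_loader(language, model_name, device):
        calls.append((language, model_name, device))
        return object()

    cache = AlignModelCache(fake_loader)
    first = cache.get("zh", None, "cpu")
    again = cache.get("zh", None, "cpu")   # same key: reused
    other = cache.get("zh", None, "cuda")  # different device: separate entry
    print(first is again, first is other, len(calls))
```

Including the device in the key matters once WHISPERX_ALIGN_DEVICE can differ from the ASR device: a model loaded for one device is never handed to a caller that asked for another.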

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_provider_registry.py | Adds coverage for alignment-device caching/reuse and the embedding single-load path; updates embedding timing test expectations. |
| tests/unit/test_pipeline_model_lifecycle.py | Extends lifecycle/unload tests to cover alignment cache/device/model state. |
| tests/unit/test_main_lifespan.py | Updates OpenAPI version assertion to 0.7.6. |
| tests/unit/test_job_runtime.py | Adds tests for CUDA cache flushing behavior with/without full Python GC by phase. |
| tests/unit/test_dependency_runtime_baseline.py | Adds regression tests to enforce faster-whisper/ctranslate2 pins and Docker-only whisperx install strategy. |
| tests/unit/test_asr_repetition_guard.py | Adds tests for the new short single-segment “stock outro” hallucination filter. |
| doc/security.zh.md | Updates documented “current version” reference to 0.7.6. |
| doc/security.en.md | Updates documented “current version” reference to 0.7.6. |
| doc/quickstart.zh.md | Documents WHISPERX_ALIGN_DEVICE and clarifies CPU default for zh alignment. |
| doc/quickstart.en.md | Documents WHISPERX_ALIGN_DEVICE and clarifies CPU default for zh alignment. |
| doc/configuration.zh.md | Bumps version references to 0.7.6 and documents WHISPERX_ALIGN_DEVICE + validation wording. |
| doc/configuration.en.md | Bumps version references to 0.7.6 and documents WHISPERX_ALIGN_DEVICE + validation wording. |
| doc/changelog.zh.md | Adds v0.7.6 release notes (deps/runtime/health/alignment/embedding). |
| doc/changelog.en.md | Adds v0.7.6 release notes (deps/runtime/health/alignment/embedding). |
| app/requirements.txt | Moves ASR stack to cuDNN9-compatible faster-whisper/ctranslate2, adds soundfile/nltk; removes whisperx pin (Docker installs it separately). |
| app/providers/embedding/default.py | Implements soundfile full-load + per-turn slicing path with fallback to torchaudio segment loading and adds load-timing logs. |
| app/providers/diarization/default.py | Adds WHISPERX_ALIGN_DEVICE resolution and alignment model caching/reuse keyed by language/model/device. |
| app/providers/asr/default.py | Adds a “stock outro” marker-based single-segment hallucination filter. |
| app/pipeline/orchestrator.py | Tracks and unloads alignment cache/device/model state alongside other pipeline models. |
| app/infra/job_runtime.py | Avoids full Python GC at active job phases; keeps full GC for idle-unload; factors GC into helper. |
| app/Dockerfile | Installs whisperx==3.3.1 with --no-deps after resolving requirements to avoid dependency replacement. |
| app/config.py | Bumps APP_VERSION to 0.7.6 and adds WHISPERX_ALIGN_DEVICE env config. |
| .env.example | Documents WHISPERX_ALIGN_DEVICE default and recommended usage. |
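
The app/infra/job_runtime.py entry describes skipping the full Python GC during active job phases while keeping it for idle unload. Here is a minimal sketch of that idea, assuming hypothetical phase names and an injected empty_cache callable that stands in for the CUDA cache flush (in a PyTorch runtime that would typically be torch.cuda.empty_cache).

```python
import gc
from typing import Callable

# Hypothetical phase names; the PR thread does not list the real ones.
IDLE_PHASES = frozenset({"idle_unload"})


def release_gpu_memory(phase: str, empty_cache: Callable[[], None]) -> bool:
    """Flush the accelerator cache; run a full gc.collect() only when idle.

    Returns True if a full Python GC pass was performed.
    """
    ran_full_gc = phase in IDLE_PHASES
    if ran_full_gc:
        # A full collection blocks the calling thread while it runs, so it is
        # reserved for phases where no job is waiting on it.
        gc.collect()
    empty_cache()
    return ran_full_gc


if __name__ == "__main__":
    flushed = []
    for phase in ("transcribing", "idle_unload"):
        did_gc = release_gpu_memory(phase, lambda: flushed.append(phase))
        print(f"{phase}: full_gc={did_gc}")
```

Passing the flush function in as a parameter keeps the sketch runnable without a GPU and makes the phase logic easy to unit-test.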

Comment thread app/requirements.txt Outdated
Comment on lines +42 to +45
# Docker installs whisperx==3.3.1 with --no-deps after this file is resolved.
# WhisperX 3.3.1 still hard-pins faster-whisper==1.1.0 / ctranslate2<4.5.0,
# which would replace the cuDNN9-compatible ASR stack above and look for
# cuDNN8 runtime libraries in the CUDA 12.4 + cuDNN9 base image.
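
A regression check along the lines the file summary attributes to tests/unit/test_dependency_runtime_baseline.py might parse the requirements text and assert that whisperx appears only in comments. The sketch below is an assumption about what such a check could look like, not the project's actual test; the sample text is abridged from the quoted lines.

```python
import re
from typing import List

SAMPLE_REQUIREMENTS = """\
# whisperx: forced word-level alignment via wav2vec2.
# Docker installs whisperx==3.3.1 with --no-deps after this file is resolved.
nltk>=3.9,<4.0
"""


def installable_names(requirements_text: str) -> List[str]:
    """Return normalized package names from non-comment requirement lines."""
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower().replace("_", "-"))
    return names


def test_whisperx_is_not_resolved_from_requirements() -> None:
    names = installable_names(SAMPLE_REQUIREMENTS)
    assert "whisperx" not in names
    assert "nltk" in names


if __name__ == "__main__":
    test_whisperx_is_not_resolved_from_requirements()
    print(installable_names(SAMPLE_REQUIREMENTS))
```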
Comment on lines +49 to +55
    except Exception as exc:
        logger.warning(
            "Falling back to torchaudio segment loading for embedding audio: %s",
            exc,
        )
        info = torchaudio.info(audio_path)
        native_sr = info.sample_rate
Comment on lines +23 to +35
def _load_full_waveform(audio_path: str):
    """Load normalized audio once with libsndfile to avoid per-turn torch decode."""

    load_started = time.perf_counter()
    data, sample_rate = sf.read(audio_path, dtype="float32", always_2d=True)
    waveform = torch.from_numpy(data.T.copy())
    logger.info(
        "embedding_audio_load_timing backend=soundfile elapsed_s=%.3f sample_rate=%d channels=%d frames=%d",
        time.perf_counter() - load_started,
        sample_rate,
        waveform.shape[0],
        waveform.shape[1],
    )
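
Once the waveform is loaded, each diarization turn can be cut out by converting its start and end times to frame indices. Below is a minimal sketch of that conversion, assuming turns are (start, end) pairs in seconds. turn_to_frames is a hypothetical helper, not code from the PR; with a channels-first tensor like the one above, the slice would be waveform[:, start:stop].

```python
from typing import Tuple


def turn_to_frames(
    start_s: float, end_s: float, sample_rate: int, total_frames: int
) -> Tuple[int, int]:
    """Map a turn in seconds to a clamped [start, stop) frame range."""
    start = max(0, int(round(start_s * sample_rate)))
    stop = min(total_frames, int(round(end_s * sample_rate)))
    # An empty or inverted turn collapses to a zero-length range.
    return start, max(start, stop)


if __name__ == "__main__":
    sample_rate, total_frames = 16_000, 160_000  # 10 s of audio
    for turn in [(0.5, 2.0), (9.5, 12.0), (3.0, 3.0)]:
        start, stop = turn_to_frames(*turn, sample_rate, total_frames)
        print(turn, "->", (start, stop), f"{(stop - start) / sample_rate:.2f} s")
```

Clamping to the file length keeps a turn that runs past the end of the audio from producing an out-of-range slice.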
Comment on lines +66 to +74
def _outro_marker_score(normalized_text: str) -> tuple[int, float]:
    if not normalized_text:
        return 0, 0.0

    matched = {
        marker for marker in _OUTRO_HALLUCINATION_MARKERS if marker in normalized_text
    }
    marker_chars = sum(len(marker) for marker in matched)
    return len(matched), marker_chars / len(normalized_text)
@MapleEve MapleEve force-pushed the fix/health-unblock-gpu-cleanup branch from c50fabd to 439a5d2 May 7, 2026 04:56
@MapleEve MapleEve merged commit 6d5242a into main May 7, 2026
8 checks passed

github-actions Bot commented May 7, 2026

❤️ Great PR @MapleEve ❤️

The growth of the project is inseparable from user feedback and contributions. Thanks for your contribution!

@MapleEve MapleEve deleted the fix/health-unblock-gpu-cleanup branch May 7, 2026 05:15