Skip to content

Add transcription artifact manifest#33

Merged
MapleEve merged 3 commits intomainfrom
feat/v0.8.0a-artifacts
May 7, 2026
Merged

Add transcription artifact manifest#33
MapleEve merged 3 commits intomainfrom
feat/v0.8.0a-artifacts

Conversation

@MapleEve
Copy link
Copy Markdown
Owner

@MapleEve MapleEve commented May 7, 2026

Summary

  • Add an optional public-safe artifacts manifest to completed transcription results.
  • Document compatibility rules for missing or unknown artifacts fields in zh/en API and configuration docs.
  • Add unit coverage for persisted manifests and local lightweight dependency fallbacks.

Validation

  • Public release scan passed.
  • ruff check app/ tests/conftest.py --ignore E501
  • ruff format --check app/ tests/conftest.py
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest tests/unit/ tests/test_security.py -v --tb=short --no-header (150 passed)

Copilot AI review requested due to automatic review settings May 7, 2026 20:42
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

👍 @MapleEve

Thank you for raising your pull request and contributing to voscript.
Please make sure you have followed our contributing guidelines. We will review it as soon as possible.
If you encounter any problems, please feel free to connect with us.
非常感谢您提出拉取请求并为 voscript 做出贡献,请确保您已经遵循了我们的贡献指南,我们会尽快审查它。
如果您遇到任何问题,请随时与我们联系。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @MapleEve's task in 4m 53s —— View job


PR Review: Add transcription artifact manifest

Todo List

  • Read REVIEW.md for review guidelines
  • Examine changed files and understand the artifact manifest feature
  • Check for privacy/security leaks
  • Check model lifecycle and fallback behavior
  • Check HTTP API compatibility
  • Check test coverage
  • Check English/Chinese documentation synchronization
  • Post review findings

Review Summary

No actionable issues found. This PR adds an optional artifacts manifest to completed transcription results following VoScript's privacy-first and backward-compatibility principles.

What Was Reviewed

✅ Privacy and Security

  • No path leaks: The manifest exposes only filename (e.g., result.json, emb_SPEAKER_00.npy), not absolute paths or artifact_dir values
  • No sensitive data: Does not expose job IDs, tokens, hosts, or debug information
  • Safe speaker labels: Uses safe_speaker_label() validation before embedding labels into filenames
  • Test coverage confirms: test_runner_persists_artifacts_and_cleans_generated_audio explicitly asserts str(tmp_path) not in json.dumps(persisted_result["artifacts"])

✅ HTTP API Compatibility

  • Optional field: The artifacts field is added as optional; existing clients can ignore it
  • Backward compatible: Documentation explicitly states clients must tolerate missing artifacts fields
  • No breaking changes: Existing response contracts (segments, speaker_map, unique_speakers, params) remain unchanged
  • Consistent with existing patterns: Follows the same optional-field principle as words, alignment, and warning

✅ Test Coverage

  • Regression coverage added: Two test assertions verify the manifest structure in test_pipeline_runner.py
    • test_runner_persists_artifacts_and_cleans_generated_audio (lines 494-524): validates full manifest with speaker embeddings
    • test_artifact_result_contract_keeps_status_speaker_label_and_optional_alignm (lines 609-619): validates minimal manifest without embeddings
  • Path safety verified: Test confirms no local paths leak into persisted artifacts
  • No live E2E required: Unit tests adequately cover the manifest assembly logic

✅ Documentation Synchronization

  • English/Chinese parity maintained: Both api.zh.md and api.en.md document the new artifacts field with equivalent content
  • Configuration docs updated: Both configuration.zh.md and configuration.en.md list artifacts as an optional field
  • Changelog synchronized: Both changelog.zh.md and changelog.en.md describe the feature consistently
  • Compatibility guidance clear: All docs emphasize that clients must tolerate missing or unknown artifacts fields

✅ Model Lifecycle

  • No model lifecycle impact: The artifact manifest is assembled in the artifacts stage after all model operations complete
  • No GPU/CUDA concerns: The code only builds metadata structures; no model loading, unloading, or device management

Architecture Notes

  • The manifest uses a three-tier categorization (stable, optional, experimental) that provides a clear evolution path for future artifact types
  • The ArtifactManifestEntry dataclass cleanly separates public-safe metadata from internal runtime paths
  • The build_artifact_manifest() function provides a single composition point for manifest assembly

Recommendation: Approve and merge. This PR follows VoScript's security and compatibility guidelines without introducing actionable risks.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.24%. Comparing base (e0e6f06) to head (49edb8e).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   91.18%   91.24%   +0.06%     
==========================================
  Files          76       76              
  Lines        3016     3039      +23     
==========================================
+ Hits         2750     2773      +23     
  Misses        266      266              
Flag Coverage Δ
unit 91.24% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a public-safe, optional artifacts manifest to completed transcription results so clients can discover result-adjacent files (e.g., result.json, speaker embeddings) without exposing host-local paths, and documents compatibility expectations for missing/unknown fields.

Changes:

  • Introduce ArtifactManifestEntry + build_artifact_manifest() (versioned as artifact_manifest.v1) and embed the manifest in completed transcription payloads.
  • Persist and validate the manifest via unit tests (including “no local path leakage” assertions).
  • Expand zh/en API + configuration docs and update changelogs to describe the new optional field and compatibility rules.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/test_pipeline_runner.py Adds assertions for the artifacts manifest in API results and persisted result.json, including path-safety checks.
tests/conftest.py Extends lightweight stubs (notably numpy + fastapi) to support more import/runtime scenarios in minimal environments.
doc/configuration.zh.md Documents artifacts as an optional, public-safe completed-result field and clarifies compatibility expectations.
doc/configuration.en.md Same as above for English documentation.
doc/changelog.zh.md Notes the new optional artifacts manifest feature in the Unreleased section.
doc/changelog.en.md Same as above for English changelog.
doc/api.zh.md Adds artifacts to example payloads and documents the manifest semantics/compatibility.
doc/api.en.md Same as above for the English API docs.
app/providers/artifacts/default.py Builds and attaches the artifacts manifest (stable entries: result.json + per-speaker embedding files).
app/pipeline/contracts/artifacts.py Defines the manifest contract (ARTIFACT_MANIFEST_VERSION, ArtifactManifestEntry, build_artifact_manifest).
app/pipeline/contracts/init.py Re-exports new artifacts-manifest contract symbols.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/conftest.py Outdated
def __getitem__(self, item):
if isinstance(item, tuple):
value = self
for index in item:
@MapleEve MapleEve merged commit 7b49fb0 into main May 7, 2026
8 checks passed
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

❤️ Great PR @MapleEve ❤️

The growth of project is inseparable from user feedback and contribution, thanks for your contribution!
项目的成长离不开用户反馈和贡献,感谢您的贡献!

@MapleEve MapleEve deleted the feat/v0.8.0a-artifacts branch May 7, 2026 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants