feat(m11): GenerationMetadata for pipeline provenance tracking #49
Add GenerationMetadata Pydantic model (spec §4.11) that captures per-clip
pipeline provenance: TTS backend, voice family, mix mode, normalization
strategy, breathiness flag, and final speaker state snapshots. Written to
{clip_id}.json under the `generation_metadata` key. Backward-compatible:
existing V1 clips without the key still validate (the field is Optional and defaults to None).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
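For readers skimming the thread, the backward-compatibility claim can be illustrated with a reduced sketch. It assumes Pydantic v2; the two `GenerationMetadata` fields and the `clip_id` field are a hypothetical subset chosen for illustration, not the actual schema in `synthbanshee/labels/schema.py`.

```python
from typing import Optional

from pydantic import BaseModel


class GenerationMetadata(BaseModel):
    # Hypothetical subset of fields; the real model carries more.
    tts_backend: str
    mix_mode_used: str


class ClipMetadata(BaseModel):
    clip_id: str  # assumed field name, for illustration only
    # Optional with a None default, so payloads without the key still validate.
    generation_metadata: Optional[GenerationMetadata] = None


# A V1-style payload with no generation_metadata key.
v1_clip = ClipMetadata.model_validate({"clip_id": "clip_0001"})

# A newer payload that carries provenance, round-tripped through JSON.
v3_clip = ClipMetadata(
    clip_id="clip_0002",
    generation_metadata=GenerationMetadata(
        tts_backend="backend_x", mix_mode_used="sequential"
    ),
)
restored = ClipMetadata.model_validate_json(v3_clip.model_dump_json())
```

Because the default is `None` rather than a required field, older JSON files need no migration.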
Pull request overview
Adds per-clip pipeline provenance tracking by introducing a new GenerationMetadata Pydantic model and wiring it into label metadata generation and CLI output, with tests validating backward compatibility and JSON roundtrips.
Changes:
- Introduce `GenerationMetadata` and add optional `generation_metadata` to `ClipMetadata`
- Thread `generation_metadata` through `LabelGenerator.generate_clip_metadata()` and CLI clip-metadata writing
- Add unit tests covering construction/serialization and backward-compat parsing
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `synthbanshee/labels/schema.py` | Adds `GenerationMetadata` model and optional field on `ClipMetadata` |
| `synthbanshee/labels/generator.py` | Adds `generation_metadata` parameter passthrough into `ClipMetadata` creation |
| `synthbanshee/labels/__init__.py` | Exports `GenerationMetadata` from the labels package |
| `synthbanshee/cli.py` | Constructs `GenerationMetadata` from runtime speaker/turn state and attaches to clip metadata |
| `tests/unit/test_generation_metadata.py` | Adds unit tests for the new model and backward compatibility |
…None defaults
- tts_backend and voice_family are now per-speaker dicts (not scalar), capturing mixed-provider scenes (M9a) without information loss.
- mix_mode_used is computed from actual MixedScene.mix_modes (new field) populated by SceneMixer, instead of hardcoded "SEQUENTIAL".
- Version fields (text_normalization_version, prosody_controller_version, timing_controller_version) default to None instead of "" to distinguish "not tracked" from "tracked as empty."
- MixedScene gains a mix_modes field (list[str]) populated by the mixer.
- Tests expanded from 8 to 14: mixer mix_modes, Counter-based dominant mode, LabelGenerator passthrough, per-speaker maps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
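The difference between "not tracked" and "tracked as empty" survives JSON serialization, which is why the `None` default matters. A small sketch with a plain dict, where the key names follow the commit message and the speaker IDs and values are placeholders:

```python
import json

# Hypothetical payload: per-speaker maps plus two version fields.
generation_metadata = {
    "tts_backend": {"SPK_A": "backend_x", "SPK_B": "backend_y"},
    "voice_family": {"SPK_A": "family_1", "SPK_B": "family_2"},
    "text_normalization_version": None,  # not tracked -> JSON null
    "prosody_controller_version": "",    # tracked, but empty string
}

# Round-trip through JSON, as would happen when writing {clip_id}.json.
restored = json.loads(json.dumps(generation_metadata))
```

After the round trip the untracked field is still present as a key with a null value, and the per-speaker maps keep one entry per speaker instead of collapsing a mixed-provider scene into a single scalar.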
…ic test data
- mix_mode_used uses lowercase values consistent with MixMode.value (e.g. "sequential", "overlap", "barge_in") instead of uppercase.
- Test speaker_state_serialized keys match real SpeakerState.to_metadata_dict() output (rate_offset, pitch_offset_st, volume_offset_db, breathiness_level).
- Renamed misleading test; assert key presence explicitly for null case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
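To make the casing point concrete, here is a sketch of how an enum's `.value` yields the lowercase strings quoted above. The member names and the helper `record_mix_modes` are assumptions for illustration; only the three string values come from the commit message.

```python
from enum import Enum


class MixMode(Enum):
    # Member names are assumed; the values are the ones quoted in the commit.
    SEQUENTIAL = "sequential"
    OVERLAP = "overlap"
    BARGE_IN = "barge_in"


def record_mix_modes(segment_modes):
    """Return the per-turn mode strings (hypothetical helper)."""
    return [mode.value for mode in segment_modes]


recorded = record_mix_modes([MixMode.SEQUENTIAL, MixMode.BARGE_IN])
```

Storing `.value` rather than `.name` keeps the metadata consistent with whatever the mixer itself emits.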
pr-agent-context report: This run includes a patch coverage gap on PR #49 in repository https://github.com/DataHackIL/SynthBanshee
Address the patch coverage gaps below, then push all of these changes in a single commit.
# Patch coverage
Patch test coverage is 90.32%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/cli.py: 560, 561, 577
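As a sanity check on the figure, the reported percentage and the three uncovered lines are consistent with a patch of 31 executable lines. That total is inferred here, not stated in the report:

```python
uncovered = 3      # lines 560, 561, 577 in synthbanshee/cli.py
reported = 90.32   # patch coverage percentage from the report

# Solve uncovered / total = 1 - reported/100 for total, then round.
total = round(uncovered / (1 - reported / 100))
covered = total - uncovered
recomputed = round(100 * covered / total, 2)
```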
Pull request overview
Adds structured pipeline provenance tracking to per-clip label metadata (M11 / spec §4.11) by introducing a GenerationMetadata model, propagating it through label generation and CLI output, and extending mixer outputs to record per-turn mix modes.
Changes:
- Introduces `GenerationMetadata` (Pydantic) and adds optional `generation_metadata` to `ClipMetadata`.
- Extends `MixedScene` and `SceneMixer.mix_sequential()` to capture per-turn `mix_modes`.
- Builds and attaches `GenerationMetadata` in the CLI; adds unit tests for model/serialization/backward-compat.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `synthbanshee/labels/schema.py` | Adds `GenerationMetadata` model and optional `generation_metadata` on `ClipMetadata`. |
| `synthbanshee/labels/generator.py` | Threads `generation_metadata` through `generate_clip_metadata()`. |
| `synthbanshee/labels/__init__.py` | Exports `GenerationMetadata`. |
| `synthbanshee/cli.py` | Constructs `GenerationMetadata` from pipeline state and writes it into clip metadata JSON. |
| `synthbanshee/script/types.py` | Extends `MixedScene` with `mix_modes` per turn. |
| `synthbanshee/tts/mixer.py` | Populates `MixedScene.mix_modes` from `MixMode.value` for each segment. |
| `tests/unit/test_generation_metadata.py` | Adds unit tests for `GenerationMetadata`, `ClipMetadata` compat/roundtrip, and mixer mix-mode capture. |
    # Final speaker state: capture post-last-update state by replaying
    # the last turn's update on top of the pre-render snapshot.
    # speaker_state_snapshot is captured *before* update() in the renderer,
    # so it reflects the state going into the last turn, not the state after.
    # We need the post-update state for provenance; use the SpeakerState
    # objects still alive in the renderer — but they're local to render_scene().
    # Instead, collect the last snapshot per speaker and note it represents
    # the pre-render state of the final turn (the best we have without
    # a renderer API change).
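The fallback this comment settles on, keeping the last snapshot seen for each speaker, can be sketched as follows. The `(speaker_id, snapshot)` pair structure and the function name are hypothetical; the snapshot keys are two of those mentioned in the commit messages.

```python
def last_snapshot_per_speaker(turns):
    """Collect the most recent pre-render snapshot for each speaker.

    `turns` is assumed to be an ordered iterable of (speaker_id, snapshot)
    pairs, where each snapshot was taken *before* that turn's update().
    """
    final = {}
    for speaker_id, snapshot in turns:
        # Later turns overwrite earlier ones, so the last one wins.
        final[speaker_id] = snapshot
    return final


turns = [
    ("SPK_A", {"rate_offset": 0.0, "pitch_offset_st": 0.0}),
    ("SPK_B", {"rate_offset": 0.1, "pitch_offset_st": -1.0}),
    ("SPK_A", {"rate_offset": 0.2, "pitch_offset_st": 1.5}),
]
final_states = last_snapshot_per_speaker(turns)
```

As the comment notes, each stored snapshot describes the state going into that speaker's final turn, not the state after it.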
    Captures which TTS provider, voice, SSML parameters, mixer settings,
    preprocessing steps, and augmentation config were used to generate a clip.
    def test_dominant_mix_mode_from_counter(self) -> None:
        """Counter.most_common gives deterministic dominant mode."""
        modes = ["sequential", "overlap", "sequential", "barge_in"]
        dominant = Counter(modes).most_common(1)[0][0]
        assert dominant == "sequential"

        modes2 = ["overlap", "overlap", "barge_in"]
        dominant2 = Counter(modes2).most_common(1)[0][0]
        assert dominant2 == "overlap"
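One caveat on "deterministic": when counts tie, `Counter.most_common` returns elements in the order they were first encountered, so the result depends on turn order. If an order-independent result is wanted, an explicit tie-break is one option. The helper below is a hypothetical sketch, not code from this PR, and assumes a non-empty list.

```python
from collections import Counter


def dominant_mode(modes):
    """Most frequent mode; ties are broken alphabetically, not by input order."""
    counts = Counter(modes)
    # Sort key: highest count first, then lexicographically smallest name.
    return min(counts, key=lambda mode: (-counts[mode], mode))


# With a 1-1 tie, most_common follows first-encounter order.
tie_a = Counter(["overlap", "sequential"]).most_common(1)[0][0]
tie_b = Counter(["sequential", "overlap"]).most_common(1)[0][0]

# The explicit tie-break gives the same answer for both orderings.
stable_a = dominant_mode(["overlap", "sequential"])
stable_b = dominant_mode(["sequential", "overlap"])
```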
        _mode_counts = Counter(mixed.mix_modes)
        _dominant_mix_mode = _mode_counts.most_common(1)[0][0]
    else:
        _dominant_mix_mode = "SEQUENTIAL"
    # Dominant mix mode from actual mixer output.
    if mixed.mix_modes:
        _mode_counts = Counter(mixed.mix_modes)
        _dominant_mix_mode = _mode_counts.most_common(1)[0][0]
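The guard on `mixed.mix_modes` matters because `most_common(1)` returns an empty list for an empty `Counter`, so indexing `[0]` would raise `IndexError`. A standalone sketch of the same logic, where the function name and the lowercase default are assumptions chosen to match the later lowercase convention:

```python
from collections import Counter


def dominant_mix_mode(mix_modes, default="sequential"):
    """Return the most frequent mix mode, or `default` when none were recorded."""
    if not mix_modes:
        # Covers both an empty list and None.
        return default
    return Counter(mix_modes).most_common(1)[0][0]


with_modes = dominant_mix_mode(["overlap", "overlap", "barge_in"])
without_modes = dominant_mix_mode([])
```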
* docs: update tracker (M11/M13/M15 done) + fix wiki frontmatter
  - Mark M11, M13, M15 as Done in V3 implementation tracker (PRs #49–#51)
  - Update V3.1 recommended-order note: only M16 and M12 remain
  - Fix 4 wiki pages: review_state human-authored → human-reviewed, remove extra created/updated fields not in splendor schema
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: fix GenerationMetadata type: dataclass → Pydantic BaseModel
  The implementation uses a Pydantic BaseModel, not a dataclass. Update both mentions in the V3 design doc to match the code. Addresses COPILOT-1 on PR #53.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `GenerationMetadata` Pydantic model (spec §4.11) capturing per-clip pipeline provenance: TTS backend, voice family, mix mode, normalization strategy, breathiness flag, and final speaker state snapshots
- Added to `ClipMetadata` as an optional `generation_metadata` field, written to `{clip_id}.json`
- Backward-compatible: existing clips without the key still validate (field defaults to `None`)

Changes
| File | Change |
|---|---|
| `synthbanshee/labels/schema.py` | `GenerationMetadata` model; optional field on `ClipMetadata` |
| `synthbanshee/labels/generator.py` | `generate_clip_metadata()` accepts and passes through `generation_metadata` |
| `synthbanshee/labels/__init__.py` | Export `GenerationMetadata` |
| `synthbanshee/cli.py` | Build `GenerationMetadata` from pipeline state (TTS backend, voice family, speaker states) and attach to clip metadata |
| `tests/unit/test_generation_metadata.py` | Unit tests |

Test plan
- `pytest tests/unit/test_generation_metadata.py`: 8/8 passed
- `pytest tests/unit/`: 1264/1264 passed
- `ruff check`: all passed
- `mypy synthbanshee/`: no new errors (pre-existing errors in `script/generator.py` unrelated)

🤖 Generated with Claude Code