feat(m11): GenerationMetadata for pipeline provenance tracking by shaypal5 · Pull Request #49 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-01T07:17:13Z

Summary

- Adds a `GenerationMetadata` Pydantic model (spec §4.11) capturing per-clip pipeline provenance: TTS backend, voice family, mix mode, normalization strategy, breathiness flag, and final speaker state snapshots
- Wires it into `ClipMetadata` as an optional `generation_metadata` field, written to {clip_id}.json
- Backward-compatible: existing V1 clips without the key still validate (the field defaults to None)
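
A minimal sketch of how an optional provenance field can stay backward-compatible, assuming Pydantic v2. The field names, types, and placeholder values below are illustrative assumptions drawn from the summary, not the actual contents of `synthbanshee/labels/schema.py`:

```python
from typing import Dict, Optional

from pydantic import BaseModel, Field


class GenerationMetadata(BaseModel):
    """Per-clip pipeline provenance (field names are assumptions)."""

    tts_backend: str
    voice_family: str
    mix_mode_used: str
    normalization_strategy: str
    breathiness_enabled: bool = False
    final_speaker_states: Dict[str, Dict[str, float]] = Field(default_factory=dict)


class ClipMetadata(BaseModel):
    """Trimmed-down stand-in for the real clip metadata model."""

    clip_id: str
    # Optional with a None default, so V1 JSON without this key still validates.
    generation_metadata: Optional[GenerationMetadata] = None


# A V1-style payload with no generation_metadata key parses cleanly.
legacy_clip = ClipMetadata.model_validate({"clip_id": "clip_0001"})

# A new-style clip survives a JSON roundtrip unchanged.
new_clip = ClipMetadata(
    clip_id="clip_0002",
    generation_metadata=GenerationMetadata(
        tts_backend="example_backend",  # placeholder value
        voice_family="example_voice",  # placeholder value
        mix_mode_used="sequential",
        normalization_strategy="example_strategy",  # placeholder value
    ),
)
restored = ClipMetadata.model_validate_json(new_clip.model_dump_json())
```

Because the default is `None` rather than an empty model, a reader of the JSON can tell a clip that predates provenance tracking from one that was generated with it.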

Changes

| File | Change |
| --- | --- |
| `synthbanshee/labels/schema.py` | New `GenerationMetadata` model; optional field on `ClipMetadata` |
| `synthbanshee/labels/generator.py` | `generate_clip_metadata()` accepts and passes through `generation_metadata` |
| `synthbanshee/labels/__init__.py` | Export `GenerationMetadata` |
| `synthbanshee/cli.py` | Build `GenerationMetadata` from pipeline state (TTS backend, voice family, speaker states) and attach to clip metadata |
| `tests/unit/test_generation_metadata.py` | 8 unit tests: model construction, serialization roundtrip, backward compat, JSON output |

Test plan

- `pytest tests/unit/test_generation_metadata.py`: 8/8 passed
- `pytest tests/unit/`: 1264/1264 passed
- `ruff check`: all passed
- `mypy synthbanshee/`: no new errors (pre-existing errors in script/generator.py are unrelated)

🤖 Generated with Claude Code

Add GenerationMetadata Pydantic model (spec §4.11) that captures per-clip pipeline provenance: TTS backend, voice family, mix mode, normalization strategy, breathiness flag, and final speaker state snapshots. Written to {clip_id}.json under the `generation_metadata` key. Backward-compatible: existing V1 clips without the key still validate (the field is optional and defaults to None).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds per-clip pipeline provenance tracking by introducing a new GenerationMetadata Pydantic model and wiring it into label metadata generation and CLI output, with tests validating backward compatibility and JSON roundtrips.

Changes:

- Introduce `GenerationMetadata` and add optional `generation_metadata` to `ClipMetadata`
- Thread `generation_metadata` through `LabelGenerator.generate_clip_metadata()` and CLI clip-metadata writing
- Add unit tests covering construction/serialization and backward-compat parsing

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `synthbanshee/labels/schema.py` | Adds `GenerationMetadata` model and optional field on `ClipMetadata` |
| `synthbanshee/labels/generator.py` | Adds `generation_metadata` parameter passthrough into `ClipMetadata` creation |
| `synthbanshee/labels/__init__.py` | Exports `GenerationMetadata` from the labels package |
| `synthbanshee/cli.py` | Constructs `GenerationMetadata` from runtime speaker/turn state and attaches to clip metadata |
| `tests/unit/test_generation_metadata.py` | Adds unit tests for the new model and backward compatibility |

…None defaults

- tts_backend and voice_family are now per-speaker dicts (not scalar), capturing mixed-provider scenes (M9a) without information loss.
- mix_mode_used is computed from the actual MixedScene.mix_modes (new field) populated by SceneMixer, instead of a hardcoded "SEQUENTIAL".
- Version fields (text_normalization_version, prosody_controller_version, timing_controller_version) default to None instead of "" to distinguish "not tracked" from "tracked as empty."
- MixedScene gains a mix_modes field (list[str]) populated by the mixer.
- Tests expanded from 8 to 14: mixer mix_modes, Counter-based dominant mode, LabelGenerator passthrough, per-speaker maps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
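
The per-speaker maps and the None-versus-empty distinction described in this commit could look roughly like the sketch below. `RenderedTurn`, `per_speaker_maps`, and `describe_version` are hypothetical names introduced for illustration; the real CLI code may gather this information differently:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RenderedTurn:
    """Hypothetical per-turn record; the real pipeline type may differ."""

    speaker_id: str
    tts_backend: str
    voice_family: str


def per_speaker_maps(turns: List[RenderedTurn]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build speaker_id -> backend and speaker_id -> voice family maps."""
    backends: Dict[str, str] = {}
    voices: Dict[str, str] = {}
    for turn in turns:
        # Assumes a speaker keeps one backend/voice per scene; a later turn
        # would overwrite an earlier one if that assumption were violated.
        backends[turn.speaker_id] = turn.tts_backend
        voices[turn.speaker_id] = turn.voice_family
    return backends, voices


def describe_version(version: Optional[str]) -> str:
    """None means the version was never tracked; "" is a tracked, empty value."""
    if version is None:
        return "not tracked"
    return f"tracked as {version!r}"


turns = [
    RenderedTurn("speaker_a", "backend_one", "voice_x"),
    RenderedTurn("speaker_b", "backend_two", "voice_y"),
    RenderedTurn("speaker_a", "backend_one", "voice_x"),
]
backends, voices = per_speaker_maps(turns)
```

A scalar `tts_backend` would have to pick one of the two backends above; the dict keeps both.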

…ic test data

- mix_mode_used uses lowercase values consistent with MixMode.value (e.g. "sequential", "overlap", "barge_in") instead of uppercase.
- Test speaker_state_serialized keys match real SpeakerState.to_metadata_dict() output (rate_offset, pitch_offset_st, volume_offset_db, breathiness_level).
- Renamed misleading test; assert key presence explicitly for null case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
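
As a rough illustration of why deriving the stored strings from `MixMode.value` keeps casing consistent, the sketch below uses stand-in definitions. The three string values come from the commit message above; the enum member names, the trimmed `MixedScene`, and the `record_modes` helper are assumptions, not the project's actual code:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MixMode(Enum):
    # Values match the commit message; member names are assumed.
    SEQUENTIAL = "sequential"
    OVERLAP = "overlap"
    BARGE_IN = "barge_in"


@dataclass
class MixedScene:
    """Stand-in for the real MixedScene; only the new field is modelled."""

    mix_modes: List[str] = field(default_factory=list)


def record_modes(segment_modes: List[MixMode]) -> MixedScene:
    """Store one lowercase mode string per mixed segment."""
    return MixedScene(mix_modes=[mode.value for mode in segment_modes])


scene = record_modes([MixMode.SEQUENTIAL, MixMode.BARGE_IN, MixMode.SEQUENTIAL])
```

Since every stored string is a `MixMode.value`, it can be mapped back with `MixMode(value)`, which an uppercase literal such as "SEQUENTIAL" would not allow.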

github-actions · 2026-05-01T10:31:23Z

pr-agent-context report:

This run includes a patch coverage gap on PR #49 in repository https://github.com/DataHackIL/SynthBanshee

Address the patch coverage gaps below, then push all of these changes in a single commit.

# Patch coverage

Patch test coverage is 90.32%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/cli.py: 560, 561, 577

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25211149316 attempt 1
Comment timestamp: 2026-05-01T10:30:34.864264+00:00
PR head commit: d4db0520ced4e7dcfbe5f56358514a6a3cc1ebce

Copilot

Pull request overview

Adds structured pipeline provenance tracking to per-clip label metadata (M11 / spec §4.11) by introducing a GenerationMetadata model, propagating it through label generation and CLI output, and extending mixer outputs to record per-turn mix modes.

Changes:

- Introduces `GenerationMetadata` (Pydantic) and adds optional `generation_metadata` to `ClipMetadata`.
- Extends `MixedScene` and `SceneMixer.mix_sequential()` to capture per-turn `mix_modes`.
- Builds and attaches `GenerationMetadata` in the CLI; adds unit tests for model/serialization/backward-compat.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `synthbanshee/labels/schema.py` | Adds `GenerationMetadata` model and optional `generation_metadata` on `ClipMetadata`. |
| `synthbanshee/labels/generator.py` | Threads `generation_metadata` through `generate_clip_metadata()`. |
| `synthbanshee/labels/__init__.py` | Exports `GenerationMetadata`. |
| `synthbanshee/cli.py` | Constructs `GenerationMetadata` from pipeline state and writes it into clip metadata JSON. |
| `synthbanshee/script/types.py` | Extends `MixedScene` with `mix_modes` per turn. |
| `synthbanshee/tts/mixer.py` | Populates `MixedScene.mix_modes` from `MixMode.value` for each segment. |
| `tests/unit/test_generation_metadata.py` | Adds unit tests for `GenerationMetadata`, `ClipMetadata` compat/roundtrip, and mixer mix-mode capture. |

+    # Final speaker state: collect the last snapshot per speaker.
+    # speaker_state_snapshot is captured *before* update() in the renderer,
+    # so it reflects the state going into a speaker's final turn, not the
+    # state after it. The post-update SpeakerState objects are local to
+    # render_scene() and cannot be reached from here, so the pre-render
+    # snapshot of the final turn is the best we have without a renderer
+    # API change.
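
The "last snapshot per speaker" idea in the comment above might be sketched as follows. The dict-shaped turn records and the `last_snapshot_per_speaker` helper are hypothetical; only the `speaker_state_snapshot` name and the snapshot keys appear in this PR:

```python
from typing import Any, Dict, List


def last_snapshot_per_speaker(turns: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Keep the most recent pre-render snapshot seen for each speaker."""
    latest: Dict[str, Dict[str, float]] = {}
    for turn in turns:
        snapshot = turn.get("speaker_state_snapshot")
        if snapshot is None:
            continue  # turn carried no snapshot; keep the earlier one, if any
        # Copy so later mutation of the turn record cannot alter provenance.
        latest[turn["speaker_id"]] = dict(snapshot)
    return latest


turns = [
    {"speaker_id": "speaker_a", "speaker_state_snapshot": {"rate_offset": 0.0}},
    {"speaker_id": "speaker_b", "speaker_state_snapshot": {"rate_offset": 0.1}},
    {"speaker_id": "speaker_a", "speaker_state_snapshot": {"rate_offset": 0.2}},
    {"speaker_id": "speaker_b", "speaker_state_snapshot": None},
]
final_states = last_snapshot_per_speaker(turns)
```

Each stored value is the state going into that speaker's last snapshotted turn, which matches the limitation the comment describes.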


+    Captures which TTS provider, voice, SSML parameters, mixer settings,
+    preprocessing steps, and augmentation config were used to generate a clip.


+    def test_dominant_mix_mode_from_counter(self) -> None:
+        """Counter.most_common gives deterministic dominant mode."""
+        modes = ["sequential", "overlap", "sequential", "barge_in"]
+        dominant = Counter(modes).most_common(1)[0][0]
+        assert dominant == "sequential"
+
+        modes2 = ["overlap", "overlap", "barge_in"]
+        dominant2 = Counter(modes2).most_common(1)[0][0]
+        assert dominant2 == "overlap"
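
The test above covers clear majorities. Two edge cases are worth spelling out: an empty list, and a tie. The sketch below wraps the same `Counter` call in a hypothetical `dominant_mix_mode` helper; the lowercase "sequential" default is an assumption based on the later commit that switched to `MixMode.value` strings:

```python
from collections import Counter
from typing import Sequence


def dominant_mix_mode(modes: Sequence[str], default: str = "sequential") -> str:
    """Return the most frequent mode, or `default` when no modes were recorded."""
    if not modes:
        return default
    # Counter.most_common orders equal counts by first occurrence
    # (Python 3.7+), so a tie resolves to the mode that appeared first.
    return Counter(modes).most_common(1)[0][0]


tie_result = dominant_mix_mode(["overlap", "sequential"])
empty_result = dominant_mix_mode([])
majority_result = dominant_mix_mode(["sequential", "overlap", "sequential", "barge_in"])
```

With a tie, the result depends on turn order, so two scenes with the same mode counts can report different dominant modes.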


+        _mode_counts = Counter(mixed.mix_modes)
+        _dominant_mix_mode = _mode_counts.most_common(1)[0][0]
+    else:
+        _dominant_mix_mode = "SEQUENTIAL"


+    # Dominant mix mode from actual mixer output.
+    if mixed.mix_modes:
+        _mode_counts = Counter(mixed.mix_modes)
+        _dominant_mix_mode = _mode_counts.most_common(1)[0][0]


- Mark M11, M13, M15 as Done in V3 implementation tracker (PRs #49–#51)
- Update V3.1 recommended-order note: only M16 and M12 remain
- Fix 4 wiki pages: review_state human-authored → human-reviewed, remove extra created/updated fields not in splendor schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update tracker (M11/M13/M15 done) + fix wiki frontmatter

  - Mark M11, M13, M15 as Done in V3 implementation tracker (PRs #49–#51)
  - Update V3.1 recommended-order note: only M16 and M12 remain
  - Fix 4 wiki pages: review_state human-authored → human-reviewed, remove extra created/updated fields not in splendor schema

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: fix GenerationMetadata type: dataclass → Pydantic BaseModel

  The implementation uses a Pydantic BaseModel, not a dataclass. Update both mentions in the V3 design doc to match the code. Addresses COPILOT-1 on PR #53.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 1, 2026 07:17

shaypal5 added this to the M11 milestone May 1, 2026

shaypal5 added the enhancement New feature or request label May 1, 2026

Copilot started reviewing on behalf of shaypal5 May 1, 2026 07:17 View session

Copilot AI reviewed May 1, 2026

Comment thread synthbanshee/cli.py Outdated

Comment thread synthbanshee/cli.py

Comment thread synthbanshee/labels/schema.py Outdated

Comment thread tests/unit/test_generation_metadata.py Outdated

Comment thread tests/unit/test_generation_metadata.py Outdated

shaypal5 and others added 2 commits May 1, 2026 13:24

Copilot AI review requested due to automatic review settings May 1, 2026 10:27

Copilot started reviewing on behalf of shaypal5 May 1, 2026 10:27 View session

shaypal5 merged commit a40b418 into main May 1, 2026
8 checks passed

shaypal5 deleted the feat/m11-generation-metadata branch May 1, 2026 10:31

Copilot AI reviewed May 1, 2026

shaypal5 mentioned this pull request May 1, 2026

docs: update implementation tracker — M11, M13, M15 now done #53

Merged

shaypal5 merged 3 commits into main from feat/m11-generation-metadata
