feat(m15): SSML prosody tuning with research-validated Hebrew parameters by shaypal5 · Pull Request #51 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-01T18:58:25Z

Summary

- Update speaker YAML style_maps to research-consensus prosody values (rate, pitch, volume per intensity level) derived from Amir et al. (2003), T-RES Hebrew emotional prosody, and Gelfer (2005) F0 ranges
- Add 2.0 semitone F0 drift bound to SpeakerState with f0_drift_exceeded property for cross-clip monitoring
- Implement turn-level quality gates (synthbanshee/tts/quality_gates.py): sustained-vowel detection (>2.8s reject), F0 guardrails (male [80,180] Hz, female [150,290] Hz), click detection (DC-offset jumps)
- Wire quality gates into TTSRenderer.render_scene() with verbose logging on failure
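The 2.0 semitone drift bound can be illustrated with a minimal sketch. It assumes drift is measured as the semitone distance between a current F0 and a baseline F0, both in Hz; how `SpeakerState` actually accumulates drift is not shown in this PR, and the helper names below are hypothetical.

```python
import math

# Bound named in the PR summary (2.0 semitones).
MAX_F0_DRIFT_ST = 2.0


def semitone_drift(f0_hz: float, baseline_hz: float) -> float:
    """Signed distance in semitones between f0_hz and baseline_hz."""
    return 12.0 * math.log2(f0_hz / baseline_hz)


def drift_exceeded(f0_hz: float, baseline_hz: float,
                   bound_st: float = MAX_F0_DRIFT_ST) -> bool:
    """True when the absolute drift is larger than the bound."""
    return abs(semitone_drift(f0_hz, baseline_hz)) > bound_st


if __name__ == "__main__":
    baseline = 120.0
    three_st_up = baseline * 2 ** (3 / 12)  # 3 semitones above baseline
    print(drift_exceeded(three_st_up, baseline))  # outside the 2.0 st bound
    print(drift_exceeded(121.0, baseline))        # well inside the bound
```

A symmetric bound on the absolute value treats upward and downward drift the same way, which matches the "±2.0 st" wording used later in this thread.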

Changes by file

| File | Change |
|---|---|
| `configs/speakers/*.yaml` (6 files) | Updated style_map rate/pitch/volume to research consensus; replaced `angry`/`sad` styles with `General` |
| `configs/examples/*.yaml` (6 files) | Same updates for example speaker configs |
| `synthbanshee/tts/quality_gates.py` | New: three quality gate implementations + composite runner |
| `synthbanshee/tts/speaker_state.py` | Added `MAX_F0_DRIFT_ST` constant and `f0_drift_exceeded` property |
| `synthbanshee/tts/renderer.py` | Wire quality gates post-render; warn on F0 drift exceeded |
| `tests/unit/test_quality_gates.py` | New: 19 tests for all quality gates |
| `tests/unit/test_speaker_state.py` | 7 new tests for F0 drift bound |
| `tests/unit/test_config.py` | Updated assertions to match new `General` style |

Test plan

- pytest tests/unit/: 1322 passed
- ruff check: all checks passed
- mypy synthbanshee/tts/quality_gates.py synthbanshee/tts/renderer.py synthbanshee/tts/speaker_state.py: success, no issues
- Pre-commit hooks pass (ruff, ruff-format, mypy, yaml check)

🤖 Generated with Claude Code

Tunes TTS prosody to research-consensus values from three independent reports (Amir et al., T-RES, Gelfer 2005) and adds turn-level quality gates to reject unrealistic renders before mixing.

Changes:
- Update all speaker YAML style_maps (rate, pitch, volume) to match the consensus table in wiki/topics/research-synthesis.md (lines 93-99)
- Replace 'angry'/'sad' express-as styles with 'General' (M14 confirmed express-as is not supported for he-IL voices)
- Add MAX_F0_DRIFT_ST (2.0 st) bound and f0_drift_exceeded property to SpeakerState for cross-clip drift monitoring
- New synthbanshee/tts/quality_gates.py module with three gates:
  - Sustained-vowel detection (>2.8 s reject)
  - F0 guardrails (male [80,180] Hz, female [150,290] Hz)
  - Click detection (DC-offset jumps)
- Wire quality gates into TTSRenderer.render_scene() with verbose logging
- Add comprehensive unit tests (19 new tests in test_quality_gates.py, 7 new tests in test_speaker_state.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
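As a rough illustration of the F0 guardrail gate, the sketch below checks a single F0 statistic against the per-sex ranges quoted above. It assumes the gate compares a median F0 in Hz; the statistic, function name, and return shape used in `quality_gates.py` are not shown in this PR and are hypothetical here.

```python
# Ranges quoted in the PR: male [80, 180] Hz, female [150, 290] Hz.
F0_BOUNDS_HZ = {
    "male": (80.0, 180.0),
    "female": (150.0, 290.0),
}


def check_f0_guardrail(median_f0_hz: float, sex: str) -> tuple[bool, str]:
    """Return (passed, reason). `reason` is empty when the gate passes.

    Assumption: the gate is evaluated on one summary value per turn
    (here called the median F0); the real module may use another statistic.
    """
    low, high = F0_BOUNDS_HZ[sex]
    if low <= median_f0_hz <= high:
        return True, ""
    reason = (
        f"median F0 {median_f0_hz:.1f} Hz outside "
        f"[{low:.0f}, {high:.0f}] Hz for {sex} speaker"
    )
    return False, reason


if __name__ == "__main__":
    print(check_f0_guardrail(120.0, "male"))
    print(check_f0_guardrail(320.0, "female"))
```

Note that the two ranges overlap between 150 and 180 Hz, so a value in that band passes for either speaker sex.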

Copilot

Pull request overview

This PR tunes Hebrew SSML prosody controls (rate/pitch/volume) in speaker configs and introduces post-render turn-level audio validation (quality gates) plus a bounded cross-turn F0 drift monitor to catch unrealistic renders early in the TTS pipeline.

Changes:

- Updated multiple speaker + example YAML style_map entries to research-consensus prosody parameters and standardized styles to "General".
- Added MAX_F0_DRIFT_ST / f0_drift_exceeded to SpeakerState and integrated drift warnings into TTSRenderer.render_scene().
- Added synthbanshee/tts/quality_gates.py (sustained vowel, F0 guardrails, click detection) and unit tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| `synthbanshee/tts/quality_gates.py` | New module implementing turn-level audio validation gates and a composite runner. |
| `synthbanshee/tts/renderer.py` | Runs quality gates after each turn render and logs gate failures; warns on accumulated F0 drift. |
| `synthbanshee/tts/speaker_state.py` | Adds a 2.0 semitone drift bound constant and an `f0_drift_exceeded` property. |
| `tests/unit/test_quality_gates.py` | New unit test coverage for all quality gates and the composite runner. |
| `tests/unit/test_speaker_state.py` | Adds unit tests covering the drift bound constant and property behavior. |
| `tests/unit/test_config.py` | Updates assertions to reflect `"General"` style usage at intensity 5/3. |
| `configs/speakers/speaker_VIC_F_25-40_004.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/speakers/speaker_SW_F_30-45_003.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/speakers/speaker_SW_F_30-45_002.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/speakers/speaker_BEN_M_40-55_005.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/speakers/speaker_BEN_M_40-55_004.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/speakers/speaker_AGG_M_30-45_003.yaml` | Updates prosody parameters per intensity level; standardizes style to `"General"`. |
| `configs/examples/speaker_VIC_F_25-40_003.yaml` | Mirrors speaker prosody updates in example config. |
| `configs/examples/speaker_VIC_F_25-40_002.yaml` | Mirrors speaker prosody updates in example config; updates narrative comments accordingly. |
| `configs/examples/speaker_SW_F_30-45_001.yaml` | Mirrors speaker prosody updates in example config. |
| `configs/examples/speaker_BEN_M_40-55_003.yaml` | Mirrors speaker prosody updates in example config. |
| `configs/examples/speaker_AGG_M_30-45_002.yaml` | Mirrors speaker prosody updates in example config. |
| `configs/examples/speaker_AGG_M_30-45_001.yaml` | Mirrors speaker prosody updates in example config. |

…rsity

Self-review fixes:

1. Quality gates now retry on failure (up to quality_gate_retries=2 re-renders with different random seeds) before accepting a failed turn. Failures are persisted in DialogueTurn.quality_gate_failures for downstream observability.
2. Click detection threshold raised from 0.05 to 0.15 (avoids false positives on plosive transients /p/, /t/, /k/), and an isolated-spike criterion added: a diff event counts as a click only if the surrounding ±3 samples are below threshold, which distinguishes single-sample DC jumps from multi-sample bursts.
3. F0 drift warning now prints the actual numeric bound (±2.0 st) instead of the class name.
4. Added quality_gate_failures field to DialogueTurn so gate results are persisted in output metadata.
5. Added quality_gates=True and quality_gate_retries=2 params to render_scene() so callers can disable gates for fast batch runs.
6. Restored inter-speaker prosody variation: each speaker instance now samples a different point within the research consensus ranges, preserving perceptual diversity while staying within validated bounds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
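The isolated-spike criterion from item 2 can be sketched as follows. This is an illustration under stated assumptions, not the code in `quality_gates.py`: it assumes samples are floats normalized to [-1, 1], that "diff" means the absolute first difference, and that "surrounding ±3 samples" refers to the neighbouring diff values. The function name is hypothetical.

```python
import numpy as np


def count_isolated_clicks(samples: np.ndarray,
                          threshold: float = 0.15,
                          window: int = 3) -> int:
    """Count sample-to-sample jumps above `threshold` whose neighbouring
    jumps (within +/- `window` positions) are not themselves above it."""
    diffs = np.abs(np.diff(samples))
    clicks = 0
    for i in np.flatnonzero(diffs > threshold):
        lo = max(0, i - window)
        hi = min(len(diffs), i + window + 1)
        # Neighbouring diffs, excluding the candidate itself.
        neighbours = np.concatenate([diffs[lo:i], diffs[i + 1:hi]])
        if np.all(neighbours <= threshold):
            clicks += 1
    return clicks


if __name__ == "__main__":
    # A single DC step: one large diff with quiet neighbours.
    step = np.zeros(100)
    step[50:] = 0.5
    print(count_isolated_clicks(step))

    # A multi-sample burst (plosive-like): adjacent large diffs.
    burst = np.zeros(100)
    burst[40:48] = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
    print(count_isolated_clicks(burst))
```

In this sketch the DC step is counted once, while the burst is ignored because every large diff has at least one large neighbour.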

- Fix sustained-vowel duration calculation to account for frame overlap: duration = frame_len/sr + (N-1)*hop/sr (was N*hop/sr, underestimating)
- Rename test_agg_sustained_i5_may_exceed → test_agg_sustained_i5_stays_within_bound to clarify that the drift target is never exceeded (exponential convergence)

Other Copilot comments (click detection, reject behavior, F0 drift warning) were already addressed in the previous commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
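The corrected duration formula can be checked with a short worked example. The frame length, hop, and sample rate below are illustrative values chosen for the arithmetic, not parameters confirmed by the PR, and the function names are hypothetical.

```python
def run_duration_s(n_frames: int, frame_len: int, hop: int, sr: int) -> float:
    """Time spanned by n_frames consecutive overlapping frames:
    one full frame plus (n_frames - 1) hops."""
    if n_frames <= 0:
        return 0.0
    return frame_len / sr + (n_frames - 1) * hop / sr


def old_duration_s(n_frames: int, hop: int, sr: int) -> float:
    """Previous estimate (N * hop / sr), which ignores the tail of the last frame."""
    return n_frames * hop / sr


if __name__ == "__main__":
    # Illustrative analysis settings (assumptions, not taken from the PR).
    sr, frame_len, hop, n = 16_000, 1024, 256, 100
    corrected = run_duration_s(n, frame_len, hop, sr)
    previous = old_duration_s(n, hop, sr)
    # The gap is constant: (frame_len - hop) / sr, independent of n.
    print(corrected, previous, corrected - previous)
```

The old estimate falls short by (frame_len - hop)/sr whenever frames overlap, so a run sitting just over the 2.8 s limit could previously slip through the sustained-vowel gate.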

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

- Fix _wav_bytes_to_samples docstring to not claim PCM16-only (accepts any WAV subtype readable by soundfile)
- Log actual retries_attempted count instead of max retries configured

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-01T19:23:58Z

pr-agent-context report:

This run includes patch coverage gaps on PR #51 in repository https://github.com/DataHackIL/SynthBanshee

Address the patch coverage gaps below, then push all of these changes in a single commit.

# Patch coverage

Patch test coverage is 94.78%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/tts/quality_gates.py: 88, 166, 183, 187, 254, 293
- synthbanshee/tts/renderer.py: 349

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25229232317 attempt 1
Comment timestamp: 2026-05-01T19:23:09.827900+00:00
PR head commit: dd6041ba1a6d5cb5e4842d5522fefe2baefb2579

- Mark M11, M13, M15 as Done in V3 implementation tracker (PRs #49–#51)
- Update V3.1 recommended-order note: only M16 and M12 remain
- Fix 4 wiki pages: review_state human-authored → human-reviewed, remove extra created/updated fields not in splendor schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update tracker (M11/M13/M15 done) + fix wiki frontmatter

  - Mark M11, M13, M15 as Done in V3 implementation tracker (PRs #49–#51)
  - Update V3.1 recommended-order note: only M16 and M12 remain
  - Fix 4 wiki pages: review_state human-authored → human-reviewed, remove extra created/updated fields not in splendor schema

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: fix GenerationMetadata type — dataclass → Pydantic BaseModel

  The implementation uses a Pydantic BaseModel, not a dataclass. Update both mentions in the V3 design doc to match the code. Addresses COPILOT-1 on PR #53.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…oor + helium range (#90)

The bisect on PR #86 showed the residual sp_it_a_0001 WER regression (0.322 vs 04-15's 0.056) is caused by M7 SpeakerState drift compounding with #51's M15 style_map values, producing effective pitch +14 % to +17 % and rate 1.27-1.33x at high-intensity turns. That range simultaneously sounds cartoonish to listeners (May-3 listening test "helium / oompa-loompa") and trips Whisper-large-v3's silence-detection heuristic: the classic length-ratio collapse to ~0.7 that hid the bug for weeks.

This PR ships a partial fix: a runtime effective-prosody cap that addresses the canonical Whisper-backdoor fingerprint and the helium-range pitch concern, plus the two detection layers Shay asked for to catch this class of regression in the future. It does NOT fully restore high-intensity WER to the pre-#51 baseline; see #89 for the follow-up workstream.

## Tier-3 Whisper validation (`sp_it_a_0001`)

| variant | dur | WER | length_ratio | hyp / ref |
|---|---:|---:|---:|---:|
| 04-15 reference | 155.9 s | 0.056 | 1.009 | 236 / 234 |
| post-#86 main (no cap) | 146.6 s | 0.322 | 0.709 | 166 / 234 |
| this PR (cap active) | 149.1 s | **0.129** | **0.906** | 212 / 234 |

- Length-ratio recovers above the qa-report --asr 0.85 threshold.
- WER reduced 2.5x (0.322 -> 0.129) but still above the 04-15 baseline of 0.056. Failure mode shifts from silence-detector trip (~30 % of words missing) to substitution noise, a distinct mechanism requiring a paired listening test to fix without breaking M15 naturalness calibration. Tracked in #89 with insights and four proposed approaches.

## The fix: effective-prosody runtime cap

`synthbanshee/tts/renderer._apply_effective_prosody_cap` clamps post-state, post-randomization prosody before SSML emission:

- pitch in [-3.0, +2.0] st (~ +/- 12 % Azure)
- rate in [0.85, 1.20]
- volume left to the existing +/-50 % Azure clamp (Whisper internally normalizes loudness, per #82's lever probe; not a Whisper-trip dimension).
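A minimal sketch of such a clamp, using the pitch and rate ranges listed above. The real `_apply_effective_prosody_cap` signature and its event model are not shown in this thread, so the function name, arguments, and event dictionaries here are hypothetical.

```python
# Cap ranges quoted in the commit message.
PITCH_CAP_ST = (-3.0, 2.0)   # semitones
RATE_CAP = (0.85, 1.20)      # rate multiplier


def _clamp(value: float, bounds: tuple[float, float]) -> float:
    low, high = bounds
    return min(max(value, low), high)


def cap_prosody(pitch_st: float, rate: float) -> tuple[float, float, list[dict]]:
    """Clamp pitch and rate, returning the capped values plus one event
    per dimension that was actually changed (hypothetical event shape)."""
    events: list[dict] = []
    capped_pitch = _clamp(pitch_st, PITCH_CAP_ST)
    capped_rate = _clamp(rate, RATE_CAP)
    if capped_pitch != pitch_st:
        events.append({"dimension": "pitch", "requested": pitch_st, "applied": capped_pitch})
    if capped_rate != rate:
        events.append({"dimension": "rate", "requested": rate, "applied": capped_rate})
    return capped_pitch, capped_rate, events


if __name__ == "__main__":
    # A high-intensity turn in the problematic range described above.
    print(cap_prosody(2.8, 1.30))
    # A neutral turn passes through unchanged with no events.
    print(cap_prosody(0.0, 1.0))
```

Returning the events alongside the capped values lets a caller both log a warning and record the activation per turn.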
Caps are anchored to the pre-#51 effective envelope, which produced the 04-15 reference clips with WER 0.04-0.08. Tighter caps would diverge further from M15 listening-test calibration; looser caps would re-trip Whisper. Each cap activation logs a warning and is recorded per turn.

## Detection layer 1: static prosody-cap activations in metadata

- `DialogueTurn.effective_prosody_caps` carries per-turn cap events.
- `cli.py` rolls them up into `ClipMetadata.generation_metadata.effective_prosody_caps` (new `EffectiveProsodyCapEvent` model in labels/schema.py).
- `qa-report` surfaces a new "Effective-Prosody Cap Activations (#87)" table per clip; it runs on every batch, no Azure / Whisper required.

Tier-3 render of sp_it_a_0001 recorded 14 cap activations across 7 high-intensity turns; metadata example in PR description.

## Detection layer 2: `qa-report --asr` Whisper backdoor check

New `synthbanshee/package/asr_sanity.py` provides a lazy-loaded `WhisperRunner` and `compute_asr_metrics`. `qa-report --asr` runs Whisper-large-v3 on every clip in a directory and flags clips whose length-ratio falls below `--asr-min-length-ratio` (default 0.85; the #87 fingerprint sat at ~0.71). Heavy dependencies are isolated in the new `eval-asr` optional extra so normal generation/QA stays light.

Per the policy decision documented in CLAUDE.md ("ASR sanity check policy"), Tier-3 ASR sanity is local-only (not in CI) for now; see GH issue #88 for the deferred CI re-evaluation triggers.

## Tests

- tests/unit/test_effective_prosody_cap.py: 11 tests covering the helper unit, render_utterance integration, and render_scene event propagation to DialogueTurn.
- tests/unit/test_qa.py::TestProsodyCapRollup: 3 tests verifying cap-event aggregation in qa-report.
- tests/unit/test_asr_sanity.py: 11 tests covering normalize_for_wer, AsrMetrics threshold semantics, and bracket-line stripping in the reference parser. Heavy Whisper inference is exercised by the Tier-3 local run, not these tests.
- 1687 unit tests pass (1662 baseline + 25 new); ruff + mypy clean.

## Docs

- CLAUDE.md: new "ASR sanity check policy" section + "What NOT to do" bullets pinning the cap thresholds and the Tier-3 local-only policy.
- pyproject.toml: new `eval-asr` optional extra.

Reduces #87 (does not fully close; see #89 for the residual WER work).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
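The length-ratio figures in the Tier-3 validation table can be reproduced from the hyp / ref word counts. The sketch below assumes length_ratio is the hypothesis word count divided by the reference word count, which agrees with all three table rows; the function names are hypothetical and this is not the `compute_asr_metrics` implementation.

```python
def length_ratio(hyp_words: int, ref_words: int) -> float:
    """Hypothesis word count divided by reference word count."""
    return hyp_words / ref_words


def below_min_length_ratio(ratio: float, min_ratio: float = 0.85) -> bool:
    """True when a clip would be flagged by the 0.85 default threshold."""
    return ratio < min_ratio


if __name__ == "__main__":
    # (hyp, ref) word counts from the sp_it_a_0001 validation table.
    rows = {
        "04-15 reference": (236, 234),
        "post-#86 main (no cap)": (166, 234),
        "this PR (cap active)": (212, 234),
    }
    for name, (hyp, ref) in rows.items():
        ratio = length_ratio(hyp, ref)
        print(f"{name}: {ratio:.3f} flagged={below_min_length_ratio(ratio)}")
```

Only the uncapped post-#86 render falls below the 0.85 default, consistent with the table's statement that the capped render recovers above the threshold.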

Copilot AI review requested due to automatic review settings May 1, 2026 18:58

shaypal5 added this to the M15 milestone May 1, 2026

shaypal5 added the enhancement New feature or request label May 1, 2026

Copilot started reviewing on behalf of shaypal5 May 1, 2026 18:58 View session

Copilot AI reviewed May 1, 2026

Copilot AI review requested due to automatic review settings May 1, 2026 19:13

Copilot started reviewing on behalf of shaypal5 May 1, 2026 19:13 View session

Copilot AI reviewed May 1, 2026

Comment thread synthbanshee/tts/quality_gates.py Outdated

Comment thread synthbanshee/tts/renderer.py

Comment thread synthbanshee/tts/quality_gates.py

shaypal5 merged commit 7d30492 into main May 1, 2026
6 checks passed

shaypal5 deleted the feat/m15-prosody-tuning branch May 1, 2026 20:15

shaypal5 mentioned this pull request May 1, 2026

docs: update implementation tracker — M11, M13, M15 now done #53

Merged

shaypal5 mentioned this pull request May 6, 2026

fix(tts): #87 follow-up — test rate-floor lift to address residual sp_it WER gap (R) #91

Closed

shaypal5 merged 4 commits into main from feat/m15-prosody-tuning
