fix(tts): #91 - rate-floor lift R experiment (sp_it_a_0001) #95
Merged
Single-knob change: lift `_EFFECTIVE_RATE_MIN` from 0.85 to 0.95 to test whether VIC's I4/I5 slowdown is what drives the residual Whisper WER gap on `sp_it_a_0001` after PR #90.

PR #90's symmetric pitch cap closed most of the #87 WER gap (0.322 → 0.129). PR #89 (B) falsified pitch as the residual driver. R is the next single-knob lever: VIC at high intensity currently floors at rate 0.85 (style_map I5 0.90 × baseline 1.0 × drift toward 0.87 → clamps to 0.85). Hypothesis: that floor still trips Whisper's silence-detection heuristic.

Lifting the floor risks flattening VIC's deliberate distress cue, so the paired native-speaker listening test is the merge gate.

Tier-3 ASR sanity (local) on `sp_it_a_0001` (seed 1101, single render):

| variant | dur | WER | len_r |
|---|---:|---:|---:|
| 04-15 baseline (target) | 155.9s | 0.056 | 1.013 |
| PR #90 reference (cap on) | 149.1s | 0.129 | 0.910 |
| B falsification (per-role pitch cap) | 149.1s | 0.129 | 0.910 |
| R (rate floor 0.95), this PR | 143.8s | 0.052 | 1.009 |

WER pass criterion (≤ 0.10) cleared with margin; R matches the 04-15 baseline. Listening test pending: VIC at I4/I5 must still sound distressed for the cap relaxation to be acceptable per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
This PR adjusts the TTS “effective prosody” runtime cap to support issue #91’s R experiment, raising the effective rate floor to reduce Whisper WER regressions observed on high-intensity turns (notably sp_it_a_0001) while keeping the pitch caps and rate ceiling unchanged.
Changes:
- Lift `_EFFECTIVE_RATE_MIN` in `synthbanshee/tts/renderer.py` from `0.85` to `0.95`, with updated in-code rationale tying the change to #91.
- Add a unit-test “anchor” assertion to loudly catch accidental reverts of the new floor value.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `synthbanshee/tts/renderer.py` | Raises the effective rate floor constant and documents the #91 rationale/risks inline. |
| `tests/unit/test_effective_prosody_cap.py` | Adds a literal-value anchor test ensuring the new floor stays at 0.95. |
pr-agent-context report: This is a refreshed snapshot of the current PR state.
This run includes a patch coverage gap on PR #95 in repository https://github.com/DataHackIL/SynthBanshee
Address the patch coverage gaps below, then push all of these changes in a single commit.
# Patch coverage
Patch test coverage is 50%; please raise it to 100%. These are the uncovered code lines:
- `synthbanshee/tts/renderer.py`: 70
This was referenced May 7, 2026
Hypothesis (#91 R)
PR #90's symmetric pitch cap recovered `sp_it_a_0001` WER from 0.322 → 0.129, but left a residual gap to the 04-15 baseline (0.056). #89 falsified pitch as the residual driver. R is the next single-knob lever: lift `_EFFECTIVE_RATE_MIN` from `0.85` → `0.95`, on the hypothesis that VIC's I4/I5 slowdown, currently flooring at 0.85, is what still trips Whisper's silence-detection heuristic.

The risk this PR explicitly traded against: VIC slowdown at high intensity is a deliberate distress cue. Flooring rate at 0.95 may flatten it. The native-speaker listening test was the merge gate; Tier-3 ASR alone is not sufficient.
What this PR ships
A single-constant change plus an anchor test:
- `synthbanshee/tts/renderer.py`: `_EFFECTIVE_RATE_MIN = 0.85` → `0.95`. Comment updated to anchor the new value to #91's R rationale. Pitch cap (`_EFFECTIVE_PITCH_*`) and rate ceiling (`_EFFECTIVE_RATE_MAX`) untouched.
- `tests/unit/test_effective_prosody_cap.py`: `test_rate_floor_anchored_to_0_95_for_issue_91`, a literal-value anchor so an inadvertent revert is caught loudly. Existing rate-floor assertions reference the imported `_EFFECTIVE_RATE_MIN` symbol and auto-track the new value.

Diff: 2 files, +16 / -1.
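The difference between a symbol-tracking assertion and a literal-value anchor can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: the constants are redefined locally rather than imported from `synthbanshee/tts/renderer.py`, `clamp_rate` and `test_rate_floor_tracks_symbol` are hypothetical names, and the 1.20 ceiling is the value quoted in this PR's clamp breakdown.

```python
# Stand-ins for the constants in synthbanshee/tts/renderer.py. They are
# redefined here only so the sketch runs on its own; the real test imports them.
_EFFECTIVE_RATE_MIN = 0.95  # value after this PR (was 0.85)
_EFFECTIVE_RATE_MAX = 1.20  # ceiling quoted in this PR's clamp breakdown


def clamp_rate(rate: float) -> float:
    """Hypothetical helper: clamp a requested rate into the effective band."""
    return max(_EFFECTIVE_RATE_MIN, min(_EFFECTIVE_RATE_MAX, rate))


def test_rate_floor_tracks_symbol() -> None:
    # Symbol-based assertion: still passes if someone changes the constant,
    # because both sides of the comparison move together.
    assert clamp_rate(0.50) == _EFFECTIVE_RATE_MIN


def test_rate_floor_anchored_to_0_95_for_issue_91() -> None:
    # Literal-value anchor: fails if the floor is reverted to 0.85.
    assert _EFFECTIVE_RATE_MIN == 0.95
```

The symbol-based test checks that the clamp honours whatever the floor is; only the literal anchor pins the floor to the value chosen for #91.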
Tier-3 ASR sanity (local)
`sp_it_a_0001` re-rendered at seed 1101 into `data/m17_pr89_R_rate_floor/` (5 VIC turns re-rendered against Azure due to new floor; AGG and pitch SSML unchanged → cache hits). Whisper-large-v3 + jiwer over four variants:

WER pass criterion (≤ 0.10) cleared with margin; R reaches WER 0.052, equivalent to the 04-15 baseline. Length-ratio fully recovered to 1.009 (vs PR #90's 0.910).
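For readers unfamiliar with the two metrics, here is a rough sketch of how a word error rate and a length ratio can be computed and checked against the ≤ 0.10 criterion. It is a pure-Python stand-in, not the jiwer-based pipeline used for the numbers above; all function names are hypothetical, and the length-ratio definition (hypothesis word count over reference word count) is an assumption about what `len_r` measures.

```python
WER_PASS_THRESHOLD = 0.10  # pass criterion stated in this PR


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def length_ratio(reference: str, hypothesis: str) -> float:
    """Assumed definition: hypothesis word count over reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return len(hyp) / len(ref)


def passes_wer_gate(reference: str, hypothesis: str) -> bool:
    """True when the transcript meets the WER pass criterion."""
    return wer(reference, hypothesis) <= WER_PASS_THRESHOLD
```

Under this definition, a length ratio well below 1.0 means the ASR output dropped words relative to the script, which is consistent with the silence-detection hypothesis.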
Cap-clamp breakdown (this render)
19 cap activations across the scene. Rate-floor clamps are the new behaviour from this PR:
Plus the unchanged PR #90 ceiling clamps on AGG turns (rate 1.20, pitch +2.00) — 11 events on AGG at I3-I5.
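A tally like the one described here could be produced along the following lines. This is a sketch under assumptions: the `Turn` type, the function names, and the example turns are all hypothetical, and the pitch floor of -2.00 st is inferred from the cap being described as symmetric. Only the rate floor (0.95), rate ceiling (1.20), and pitch ceiling (+2.00) come from this PR.

```python
from collections import Counter
from dataclasses import dataclass

RATE_MIN, RATE_MAX = 0.95, 1.20          # floor from this PR, ceiling from PR #90
PITCH_MIN_ST, PITCH_MAX_ST = -2.00, 2.00  # floor assumed symmetric to the ceiling


@dataclass(frozen=True)
class Turn:
    """Hypothetical per-turn prosody request before capping."""
    role: str        # "VIC" or "AGG"
    intensity: int   # 1..5
    rate: float
    pitch_st: float  # semitones


def clamp_turn(turn: Turn) -> tuple[float, float, list[str]]:
    """Clamp rate and pitch, returning the capped values and the caps that fired."""
    events: list[str] = []
    rate, pitch = turn.rate, turn.pitch_st
    if rate < RATE_MIN:
        rate = RATE_MIN
        events.append("rate_floor")
    elif rate > RATE_MAX:
        rate = RATE_MAX
        events.append("rate_ceiling")
    if pitch < PITCH_MIN_ST:
        pitch = PITCH_MIN_ST
        events.append("pitch_floor")
    elif pitch > PITCH_MAX_ST:
        pitch = PITCH_MAX_ST
        events.append("pitch_ceiling")
    return rate, pitch, events


def tally_clamps(turns: list[Turn]) -> Counter:
    """Count cap activations keyed by (role, cap name)."""
    counts: Counter = Counter()
    for turn in turns:
        _, _, events = clamp_turn(turn)
        counts.update((turn.role, event) for event in events)
    return counts


# Made-up turns for illustration only; these are not the sp_it_a_0001 values.
EXAMPLE_TURNS = [
    Turn("VIC", 5, rate=0.87, pitch_st=2.4),
    Turn("VIC", 4, rate=0.90, pitch_st=1.0),
    Turn("AGG", 5, rate=1.30, pitch_st=2.5),
    Turn("AGG", 2, rate=1.05, pitch_st=0.5),
]
```

Calling `tally_clamps(EXAMPLE_TURNS)` groups activations by role and cap, which is the shape needed to separate the new VIC rate-floor clamps from the unchanged AGG ceiling clamps.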
Listening test — completed (2026-05-07)
Native-speaker A/B by @shaypal5 between PR #90 reference and R candidate, focused on VIC at I3–I4.
Verdict: parity, not flattening. Both renders sound the same. In both, VIC at I3–I5 does not sound distressed — only the +2.0 st pitch ceiling is audible as an intensity cue, while the rate movement (whether floored at 0.85 or 0.95) is below perceptual threshold.
Verbatim: "both sound the same, which in both cases doesn't sound very distressed, just a robot whose pitch is a bit higher."
What that means for R
R does not regress vs the May-3-calibrated PR #90 cap — naturalness parity holds. The "VIC sounds distressed" bar wasn't being met by main either; that's a deeper TTS issue (rate + pitch knobs are not the right levers for distress in Azure he-IL voices) which is now tracked separately as #97.
Reframed merge rationale: R is a strict WER win at zero perceptual cost. Decoupling the WER fix (this PR) from the distress investigation (#97) is the right call — the two need different levers and shouldn't be gated on each other.
What R does NOT fix
Test plan
- `pytest tests/unit -q`: 1688 passed locally (1687 pre-PR + 1 new anchor test).
- `ruff check synthbanshee/tts/renderer.py tests/unit/test_effective_prosody_cap.py`: clean.
- `mypy synthbanshee/tts/renderer.py tests/unit/test_effective_prosody_cap.py`: clean.

References

`sp_it_a_0001` (0.052 ≈ baseline 0.056).

🤖 Generated with Claude Code