fix(tts): #91 — rate-floor lift R experiment (sp_it_a_0001) by shaypal5 · Pull Request #95 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-07T05:54:33Z

Hypothesis (#91 R)

PR #90's symmetric pitch cap recovered sp_it_a_0001 WER from 0.322 → 0.129, but left a residual gap to the 04-15 baseline (0.056). #89 falsified pitch as the residual driver. R is the next single-knob lever: lift _EFFECTIVE_RATE_MIN from 0.85 → 0.95, on the hypothesis that VIC's I4/I5 slowdown — currently flooring at 0.85 — is what still trips Whisper's silence-detection heuristic.

The risk this PR explicitly traded against: VIC slowdown at high intensity is a deliberate distress cue. Flooring rate at 0.95 may flatten it. The native-speaker listening test was the merge gate — Tier-3 ASR alone is not sufficient.

What this PR ships

A single-constant change plus an anchor test:

file	change
`synthbanshee/tts/renderer.py`	`_EFFECTIVE_RATE_MIN = 0.85` → `0.95`. Comment updated to anchor the new value to #91's R rationale. Pitch cap (`_EFFECTIVE_PITCH_*`) and rate ceiling (`_EFFECTIVE_RATE_MAX`) untouched.
`tests/unit/test_effective_prosody_cap.py`	Add `test_rate_floor_anchored_to_0_95_for_issue_91` — a literal-value anchor so an inadvertent revert is caught loudly. Existing rate-floor assertions reference the imported `_EFFECTIVE_RATE_MIN` symbol and auto-track the new value.

Diff: 2 files, +16 / -1.
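
As a rough illustration of why the literal anchor matters, the sketch below contrasts a symbol-tracking assertion with a literal-value anchor. The constant names, the values 0.95 and 1.20, and the anchor test's name come from this PR; `clamp_effective_rate`, `test_slow_rate_is_lifted_to_floor`, and the test bodies are hypothetical and may not match the actual code in `renderer.py`.

```python
# Values quoted in this PR; the helper below is a hypothetical stand-in
# for whatever renderer.py actually does with these constants.
_EFFECTIVE_RATE_MIN = 0.95  # was 0.85 before this PR
_EFFECTIVE_RATE_MAX = 1.20  # rate ceiling, untouched by this PR


def clamp_effective_rate(rate: float) -> float:
    """Clamp a composed speaking rate into the allowed band (assumed helper)."""
    return min(_EFFECTIVE_RATE_MAX, max(_EFFECTIVE_RATE_MIN, rate))


def test_slow_rate_is_lifted_to_floor() -> None:
    # Symbol-tracking assertion: it keeps passing even if the constant
    # is reverted to 0.85, because it compares against the symbol itself.
    assert clamp_effective_rate(0.5) == _EFFECTIVE_RATE_MIN


def test_rate_floor_anchored_to_0_95_for_issue_91() -> None:
    # Literal anchor: fails loudly if the floor drifts away from 0.95.
    assert _EFFECTIVE_RATE_MIN == 0.95
```

Under these assumptions, reverting the constant would leave the first test green and turn only the anchor test red, which is the behaviour the PR description asks for.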

Tier-3 ASR sanity (local)

sp_it_a_0001 re-rendered at seed 1101 into data/m17_pr89_R_rate_floor/ (5 VIC turns re-rendered against Azure due to new floor; AGG and pitch SSML unchanged → cache hits). Whisper-large-v3 + jiwer over four variants:

variant	dur	WER	length_ratio	hyp / ref
04-15 baseline (target)	155.9s	0.056	1.013	236 / 233
PR #90 reference (cap on)	149.1s	0.129	0.910	212 / 233
B falsification (per-role pitch cap)	149.1s	0.129	0.910	212 / 233
R (rate floor 0.95) — this PR	143.8s	0.052	1.009	235 / 233

WER pass criterion (≤ 0.10) cleared with margin; R reaches WER 0.052, equivalent to the 04-15 baseline. Length-ratio fully recovered to 1.009 (vs PR #90's 0.910).
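
The table's derived columns can be re-checked with a small sketch. It assumes `length_ratio` is hypothesis word count divided by reference word count (which reproduces the reported values) and uses a plain word-level edit distance as a stand-in for the jiwer call; `Variant`, `word_error_rate`, and `WER_PASS_MAX` are illustrative names, not project code.

```python
from dataclasses import dataclass

WER_PASS_MAX = 0.10  # pass criterion quoted in this PR


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution (or match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1] / len(ref)


@dataclass(frozen=True)
class Variant:
    name: str
    wer: float       # as reported in the table
    hyp_words: int
    ref_words: int

    @property
    def length_ratio(self) -> float:
        # Assumed definition: hypothesis words / reference words.
        return self.hyp_words / self.ref_words

    @property
    def passes(self) -> bool:
        return self.wer <= WER_PASS_MAX


VARIANTS = [
    Variant("04-15 baseline", 0.056, 236, 233),
    Variant("PR #90 reference", 0.129, 212, 233),
    Variant("B falsification", 0.129, 212, 233),
    Variant("R (rate floor 0.95)", 0.052, 235, 233),
]

for v in VARIANTS:
    print(f"{v.name:22s} WER={v.wer:.3f} len_r={v.length_ratio:.3f} pass={v.passes}")
```

Only the baseline and R rows clear the 0.10 bar under this check; the WER values themselves are taken from the table, not recomputed.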

Cap-clamp breakdown (this render)

19 cap activations across the scene. Rate-floor clamps are the new behaviour from this PR:

turn	role	I	dim	pre	post
04	VIC	2	rate floor	0.912	0.950
06	VIC	2	rate floor	0.901	0.950
08	VIC	3	rate floor	0.878	0.950
10	VIC	3	rate floor	0.846	0.950
12	VIC	3	rate floor	0.819	0.950
14	VIC	4	rate floor	0.833	0.950
14	VIC	4	pitch ceil	+2.29	+2.00
16	VIC	3	rate floor	0.819	0.950

Plus the unchanged PR #90 ceiling clamps on AGG turns (rate 1.20, pitch +2.00) — 11 events on AGG at I3-I5.
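
The VIC rows above can be reproduced with a minimal clamp-and-log sketch. The rate bounds (0.95, 1.20) and the +2.00 st pitch ceiling come from this PR; the -2.0 st pitch floor is assumed from the "symmetric pitch cap" wording, and `ClampEvent` and `clamp_with_event` are hypothetical names rather than the renderer's actual API.

```python
from typing import NamedTuple, Optional, Tuple

RATE_FLOOR, RATE_CEIL = 0.95, 1.20
PITCH_FLOOR_ST, PITCH_CEIL_ST = -2.0, 2.0  # floor assumed symmetric to the ceiling


class ClampEvent(NamedTuple):
    turn: int
    dim: str
    pre: float
    post: float


def clamp_with_event(
    turn: int, dim: str, value: float, lo: float, hi: float
) -> Tuple[float, Optional[ClampEvent]]:
    """Clamp value into [lo, hi]; return an event only if the cap fired."""
    post = min(hi, max(lo, value))
    if post == value:
        return value, None
    side = "floor" if value < lo else "ceil"
    return post, ClampEvent(turn, f"{dim} {side}", value, post)


# Pre-clamp VIC rates by turn, copied from the table above.
VIC_RATES = {4: 0.912, 6: 0.901, 8: 0.878, 10: 0.846, 12: 0.819, 14: 0.833, 16: 0.819}

events = []
for turn, rate in VIC_RATES.items():
    _, event = clamp_with_event(turn, "rate", rate, RATE_FLOOR, RATE_CEIL)
    if event is not None:
        events.append(event)

# Turn 14 also exceeds the pitch ceiling (+2.29 st before the cap).
_, event = clamp_with_event(14, "pitch", 2.29, PITCH_FLOOR_ST, PITCH_CEIL_ST)
if event is not None:
    events.append(event)

AGG_EVENTS = 11  # reported count, not itemised in the PR
print(len(events), "VIC events;", len(events) + AGG_EVENTS, "total")
```

The eight VIC events plus the 11 reported AGG events account for the 19 activations cited for this render.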

Listening test — completed (2026-05-07)

Native-speaker A/B by @shaypal5 between PR #90 reference and R candidate, focused on VIC at I3–I4.

Verdict: parity, not flattening. Both renders sound the same. In both, VIC at I3–I5 does not sound distressed — only the +2.0 st pitch ceiling is audible as an intensity cue, while the rate movement (whether floored at 0.85 or 0.95) is below perceptual threshold.

Verbatim: "both sound the same, which in both cases doesn't sound very distressed, just a robot whose pitch is a bit higher."

What that means for R

R does not regress vs the May-3-calibrated PR #90 cap — naturalness parity holds. The "VIC sounds distressed" bar wasn't being met by main either; that's a deeper TTS issue (rate + pitch knobs are not the right levers for distress in Azure he-IL voices) which is now tracked separately as #97.

Reframed merge rationale: R is a strict WER win at zero perceptual cost. Decoupling the WER fix (this PR) from the distress investigation (#97) is the right call — the two need different levers and shouldn't be gated on each other.

What R does NOT fix

VIC distress cue at I3–I5: see #97 (TTS distress cue absent at I3–I5: rate + pitch are not sufficient signal). Out of scope here.
AGG aggression at I3–I5 — not exercised by this listening test; may or may not have the same gap.

Test plan

pytest tests/unit -q — 1688 passed locally (1687 pre-PR + 1 new anchor test).
ruff check synthbanshee/tts/renderer.py tests/unit/test_effective_prosody_cap.py — clean.
mypy synthbanshee/tts/renderer.py tests/unit/test_effective_prosody_cap.py — clean.
Tier-3 ASR sanity (local) — see four-way table above; R clears the 0.10 WER bar and matches the 04-15 baseline. Length-ratio 1.009, no silence-detector trip.
Native-speaker listening test (paired A/B): completed 2026-05-07. Verdict: parity vs the PR #90 reference; no naturalness regression. Distress-cue absence is a pre-existing problem tracked in #97.

References

Closes #91 (#87 follow-up: test rate-floor lift to address the residual sp_it WER gap, R).
Surfaces #97 (TTS distress cue absent at I3–I5: rate + pitch are not sufficient signal): the distress cue at I3–I5 is not addressable through the cap layer's prosody knobs; a different lever (style, breathiness, disfluency, alt voice model) is needed. PR #95 banks the WER win without blocking on that work.
Builds on #90 (#87 partial: effective-prosody cap addresses Whisper backdoor + helium range); the pitch ceiling and rate ceiling are retained from there.
#89 (B falsification; #87 follow-up to close the residual sp_it WER gap, 0.129, toward baseline, 0.056, without sacrificing M15 calibration): closed on the basis that pitch is not the residual driver; this PR confirms rate is.
#87 (investigate(tts): #83 residual, Whisper WER regression on high-intensity (I3+) Tier A clips): the original Whisper WER regression. PR #95 fully closes the WER gap on sp_it_a_0001 (0.052 ≈ baseline 0.056).
CLAUDE.md "ASR sanity check policy" — local-only Tier-3 run completed; paired listening test completed.

🤖 Generated with Claude Code

Single-knob change: lift `_EFFECTIVE_RATE_MIN` from 0.85 to 0.95 to test whether VIC's I4/I5 slowdown is what drives the residual Whisper WER gap on `sp_it_a_0001` after PR #90.

PR #90's symmetric pitch cap closed most of the #87 WER gap (0.322 → 0.129). PR #89 (B) falsified pitch as the residual driver. R is the next single-knob lever: VIC at high intensity currently floors at rate 0.85 (style_map I5 0.90 × baseline 1.0 × drift toward 0.87 → clamps to 0.85). Hypothesis: that floor still trips Whisper's silence-detection heuristic.

Lifting the floor risks flattening VIC's deliberate distress cue; the paired native-speaker listening test is the merge gate.

Tier-3 ASR sanity (local) on `sp_it_a_0001` (seed 1101, single render):

| variant                              |    dur |   WER | len_r |
|--------------------------------------|-------:|------:|------:|
| 04-15 baseline (target)              | 155.9s | 0.056 | 1.013 |
| PR #90 reference (cap on)            | 149.1s | 0.129 | 0.910 |
| B falsification (per-role pitch cap) | 149.1s | 0.129 | 0.910 |
| R (rate floor 0.95, this PR)         | 143.8s | 0.052 | 1.009 |

WER pass criterion (≤ 0.10) cleared with margin; R matches the 04-15 baseline. Listening test pending: VIC at I4/I5 must still sound distressed for the cap relaxation to be acceptable per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adjusts the TTS “effective prosody” runtime cap to support issue #91’s R experiment, raising the effective rate floor to reduce Whisper WER regressions observed on high-intensity turns (notably sp_it_a_0001) while keeping the pitch caps and rate ceiling unchanged.

Changes:

Lift _EFFECTIVE_RATE_MIN in synthbanshee/tts/renderer.py from 0.85 to 0.95, with updated in-code rationale tying the change to #91.
Add a unit-test “anchor” assertion to loudly catch accidental reverts of the new floor value.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`synthbanshee/tts/renderer.py`	Raises the effective rate floor constant and documents the #91 rationale/risks inline.
`tests/unit/test_effective_prosody_cap.py`	Adds a literal-value anchor test ensuring the new floor stays at `0.95`.


github-actions · 2026-05-07T06:52:40Z

pr-agent-context report:

This is a refreshed snapshot of the current PR state.

This run includes a patch coverage gap on PR #95 in repository https://github.com/DataHackIL/SynthBanshee

Address the patch coverage gaps below, then push all of these changes in a single commit.

# Patch coverage

Patch test coverage is 50%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/tts/renderer.py: 70

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: schedule
Workflow run: 25480717421 attempt 1
Comment timestamp: 2026-05-07T06:52:07.628445+00:00
PR head commit: b398b4ff68a7ac08d8912aea2c83c7619fc11d65

shaypal5 added this to the M17 milestone May 7, 2026

shaypal5 added the bugfix and comp: tts (TTS rendering, SSML, Azure/Google providers) labels May 7, 2026


Copilot AI review requested due to automatic review settings May 7, 2026 06:11

Copilot started reviewing on behalf of shaypal5 May 7, 2026 06:11

Copilot AI reviewed May 7, 2026


shaypal5 mentioned this pull request May 7, 2026

TTS distress cue absent at I3–I5: rate + pitch are not sufficient signal #97

Open

shaypal5 marked this pull request as ready for review May 7, 2026 20:46

shaypal5 merged commit f3b86c4 into main May 7, 2026
10 checks passed

shaypal5 deleted the fix/m17-rate-floor-lift branch May 7, 2026 20:48

This was referenced May 7, 2026

investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87

Open

spike(tts): #97 — Azure express-as fearful style A/B at I4–I5 on sp_it_a_0001 #98

Closed
