fix(tts): #83 partial — revert inter-word breaks (low-intensity scenes only) by shaypal5 · Pull Request #86 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-05T20:41:41Z

Problem

Whisper-large-v3 WER on Tier A clips dropped from ~0.04–0.08 (pre-2026-04-16) to ~0.286 on current main (#83). The same scene config also produces 63% longer audio (#81 — 121s → 198s on sp_neu_a_0001).

PR #82 falsified loudness as the cause via a controlled lever probe. #83 was spun out as the actual ASR fix.

Bisect — partial result

Same scene, same random_seed: 1401, same Whisper-large-v3 invocation across all rows. Two scenes (N=2): one low-intensity (intensity arc almost all I1), one high-intensity (I1→I5 arc).

scene	variant	duration	WER	length-ratio
`sp_neu_a_0001` (low-intensity, arc `[1,1,1,2,1]`)	04-15 reference	121.0 s	0.079	1.000
`sp_neu_a_0001`	current `main` HEAD	197.8 s	0.286	0.753
`sp_neu_a_0001`	this PR (revert #70 + #71)	122.9 s	0.048	0.995
`sp_it_a_0001` (high-intensity, arc `[1,2,3,4,5,4,3]`)	04-15 reference	155.9 s	0.056	1.009
`sp_it_a_0001`	this PR (revert #70 + #71)	146.6 s	0.322	0.709

What this PR fixes

sp_neu_a_0001 (low-intensity): fully resolved. WER 0.286 → 0.048 (better than the 04-15 reference's 0.079). Length-ratio returns to ~1.0.
investigate(tts): same scene config now yields 63% longer audio (121s → 198s) — confirm new duration regime is intentional #81 duration regression on sp_neu_a_0001: fully resolved. 197.8s → 122.9s ≈ 04-15 reference's 121.0s.

What this PR does NOT fix

sp_it_a_0001 (high-intensity): residual regression remains. WER 0.322 vs 04-15 reference's 0.056 — same Whisper-silence-detector fingerprint (length-ratio 0.709, ~28% of words missing). The Hebrew text is byte-identical between the 04-15 reference and the revert branch (verified via diff), so the residual gap is purely audio rendering.

There is therefore a second TTS-side change in the post-2026-04-15 window that fires at I3+. PR #86 does not identify or fix it.

Why #70 was the dominant low-intensity cause

<break time="50ms"/> between every word pair does two things at every word boundary:

Adds a small silence. Across ~100–200 word boundaries per clip, the cumulative silence is large, and Whisper's silence-detection heuristic treats the audio as fragmented short utterances → drops context → over-emits silence tokens.
Resets the prosody pipeline at every word. Words can no longer flow into each other prosodically — cadence becomes uniform and unnaturally rhythmic.

Native-speaker A/B confirms the perceptual signature: "very over-pausy", "very robotic content tone", "weird, rhythmic, equal spacing between words" — all consistent with mechanical 50 ms inter-word silences regardless of natural prosody.

#71 is reverted alongside #70 because most of #71 only existed to defuse an SSML parse error (#67) that #70 itself introduced. However, #71 also bundled three independent hardenings unrelated to #70 — those have been re-introduced individually in this PR's second commit (see "Restored hardenings" below).

Bisect scope and limitations (be honest)

Step 2 of the bisect was not run. Step 1 (revert fix(tts): insert inter-word <break> tags to prevent Hebrew word merging #70 + fix(tts): Azure SSML parsing error on adjacent break elements (#67) #71) fully resolved both regressions on sp_neu_a_0001, so feat(m15): SSML prosody tuning with research-validated Hebrew parameters #51 and fix(config): halve pitch escalation at I4–I5 to eliminate helium effect #68 were not additionally reverted. Their residual contribution to WER on low-intensity scenes is not bounded by this work; the arithmetic argument that feat(m15): SSML prosody tuning with research-validated Hebrew parameters #51's I1 rate slowdown (0.95–0.97×) cannot account for the regressions is plausible but not empirically verified.
N=2 evidence base. Two scenes were tested. The high-intensity scene revealed that the bisect's central claim of "sole material cause" was overscoped — there is at least one residual cause, tracked separately in investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87.
Original suspect list missed PR fix(mixer): #65 Lombard spectral tilt at I4–I5 #74 (Lombard spectral tilt at I4–I5). That PR is now the top suspect for the residual cause; investigated in investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87.

Restored hardenings (from #71, unrelated to #70)

The initial revert in this PR was a wholesale git revert of #70 + #71. PR #71 actually bundled three independent hardenings — only one (the adjacent-break merge) was caused by #70. The second commit on this branch restores the other two and narrows the third:

hardening	independent of #70?	status in this PR
`_sanitize_text` (strip XML 1.0 control chars from LLM output)	Yes — independent bug class	restored + regression test
Azure prosody-range clamping (`±50%` pitch, `-50%..+200%` rate) + warning logs	Yes — independent bug class (e.g. `speaker_BYS_F_6-10_001.yaml` has `pitch_delta_st=+9` → `+54%` unclamped)	restored + regression tests
Adjacent-break merge in `_apply_phrase_prosody`	No, but a residual case (phrase break_after → next phrase break_before) survives the #70 revert	restored, narrowed to the surviving case + regression test

These were independently flagged by Copilot's review on this PR (3 review threads, all resolved).

Trade-off — #62 is reopened

#70 was solving #62 (Hebrew word merging — "most generated speech unintelligible to a native Hebrew speaker" per 2026-05-03 listening test). Reverting #70 brings the original word-merging problem back. This is a deliberate trade-off:

The Whisper WER regression is a 6× degradation that blocks all ASR-based eval.
Hebrew word-merging intelligibility is the listener's call and needs A/B comparison against alternative mitigations.

#62 is reopened with the following alternative-mitigation list. Any future attempt should be validated with both a native-speaker A/B listening test (the original failure mode is invisible to ASR metrics) and a Whisper WER regression check on sp_neu_a_0001 (the prior failure mode is invisible to a listener-only test).

Add nikud (vowel diacritics) to high-error words via the Stage 1b normalization lexicon. Targets the actual cause (missing vowel disambiguation) rather than treating every word boundary as ambiguous.
Explicit <phoneme alphabet="ipa"> tags for known-problematic words, scoped to the specific failures rather than applied globally.
Test Google Chirp 3 HD as primary for he-IL — Azure he-IL appears to be the engine that needs the most help; Chirp may handle Hebrew word boundaries differently.
If breaks are revisited — try shorter durations (10–20 ms) AND only at boundaries where merging is empirically observed (not every word). Note that the prosody-reset side-effect of <break> may persist even at low durations, so a listening test is required regardless of duration.

Changes per file

file	change	reason
`synthbanshee/tts/ssml_builder.py`	reverted `_inject_word_breaks` and per-word break logic; restored `_sanitize_text`, prosody clamping (with warning logs), and a narrowed adjacent-break merge	partial revert — keep #70's removal, keep #71's unrelated hardenings
`tests/unit/test_tts.py`	dropped tests for the removed per-word-break behaviour; added `TestSSMLInvariants` class with five new tests pinning the restored hardenings + the #83 regression rule	`test_no_per_word_breaks_in_default_ssml` is the regression test that would catch any future re-introduction of #70's approach
`tests/unit/test_phrase_prosody.py`	restored pre-#70 assertion shapes	text-injection no longer creates `<break>` children in the no-phrases case

No prosody / SSML defaults changed. Loudness contract from #82 remains in effect.

Test plan

pytest tests/unit -q — 1662 passed locally
ruff check synthbanshee/tts/ tests/unit/ — clean
mypy synthbanshee/tts/ssml_builder.py — clean
CI: all 6 checks green on the first pushed commit (d07e144); restoration commit (df65198) re-running CI now
Reproduce WER gap on the existing pre-rendered pair (no render cost) — confirmed 0.079 vs 0.286
Render sp_neu_a_0001 on this branch (low-intensity, 12 turns, cache-miss against Azure) — duration 122.9 s, WER 0.048
Render sp_it_a_0001 on this branch (high-intensity, 17 turns, cache-miss) — duration 146.6 s, WER 0.322 (residual cause confirmed; tracked in investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87)
All three Copilot review threads resolved by the restoration commit

References

Closes investigate(tts): same scene config now yields 63% longer audio (121s → 198s) — confirm new duration regime is intentional #81 — sp_neu duration regression (197.8s → 122.9s) is fully resolved by this PR.
Does NOT close investigate(tts): Whisper WER regression on Tier A clips is NOT loudness-driven — bisect M15/#70/#71 prosody changes #83 — only resolves it for low-intensity scenes. The high-intensity residual is tracked in investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87.
Opens investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87 — Whisper WER regression on high-intensity (I3+) Tier A clips, with the residual investigation plan.
Reopens bug(tts): Hebrew word merging makes speech unintelligible #62 — Hebrew word merging; alternative-mitigation list above.
Builds on fix(preprocessing): #78 define loudness contract + metadata trail (does NOT recover Whisper — see #83) #82's loudness contract (orthogonal — does not regress that fix).

…#67) (#71)" This reverts commit 0bdb217.

…rd merging (#70)" This reverts commit d0c273b.

Copilot

Pull request overview

This PR reverts the SSML “inter-word <break> injection” behavior (and its test suite) to restore pre-#70/#71 TTS output characteristics, addressing the reported Whisper WER and scene-duration regressions in the Hebrew TTS pipeline.

Changes:

Revert SSMLBuilder back to direct text injection (removing word-boundary <break> insertion and related SSML normalization logic).
Remove unit tests that asserted word-break injection and adjacent-break merging behavior.
Update phrase-prosody unit tests to match the restored “plain text” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`synthbanshee/tts/ssml_builder.py`	Reverts SSML generation logic to remove inter-word break injection and associated handling.
`tests/unit/test_tts.py`	Drops tests that validated inter-word breaks / adjacent-break merging / sanitization behavior.
`tests/unit/test_phrase_prosody.py`	Updates assertions to match non-break-injected phrase prosody output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…regression test The initial revert in this branch (revert PR #70 + PR #71 wholesale) was too aggressive: PR #71 bundled three independent hardenings, only one of which was caused by PR #70. This commit restores the two hardenings that have nothing to do with inter-word break injection, and narrows the third to the residual case that survives #70's revert. Also adds the #83 regression test that pins the bisect finding. Restored from #71 (and explicitly verified to NOT re-introduce #83): - `_sanitize_text` + `_XML_INVALID_CHARS_RE` regex. Defends against an LLM emitting XML 1.0 control characters that otherwise make the SSML unparseable by Azure. Independent bug class from per-word breaks. - Azure-range prosody clamping in `_semitones_to_percent`, `_rate_to_string`, `_volume_to_string`, plus warning logs on clamp activation. `speaker_BYS_F_6-10_001.yaml` ships `pitch_delta_st=+9` → +54% unclamped, which Azure rejects. Independent bug class. - Adjacent `<break>` merging in `_apply_phrase_prosody`, narrowed to the phrase-after / phrase-before case (the only adjacent-break source that survives #70's revert). The original #71 logic also had a word-break branch that is no longer reachable. Added: - `test_no_per_word_breaks_in_default_ssml` regression test pinned to #83. The default multi-word SSML must not contain `<break>` tags; per-word break injection (PR #70) tripped Whisper's silence-detection heuristic and produced the WER regression. Any future Hebrew word- merge mitigation (#62) must not re-introduce per-word breaks. - `test_text_with_invalid_xml_chars_sanitized`, `test_prosody_pitch_clamped_to_azure_range`, `test_prosody_rate_clamped_to_azure_range`, `test_prosody_volume_clamped_to_azure_range`, `test_adjacent_phrase_breaks_are_merged` — pin the restored hardenings. All three of these were independently flagged by Copilot's review on this PR (resolves three Copilot review threads). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-05T21:34:22Z

pr-agent-context report:

This is a refreshed snapshot of the current PR state.

This run includes a patch coverage gap on PR #86 in repository https://github.com/DataHackIL/SynthBanshee

Address the patch coverage gaps below, then push all of these changes in a single commit.

# Patch coverage

Patch test coverage is 53.33%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/tts/ssml_builder.py: 115, 118, 119, 120, 121, 137, 141

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: schedule
Workflow run: 25403543471 attempt 1
Comment timestamp: 2026-05-05T21:33:48.144278+00:00
PR head commit: df651981211de9d2d8db65caf992a31dac8f58a4

…oor + helium range (#90) The bisect on PR #86 showed the residual sp_it_a_0001 WER regression (0.322 vs 04-15's 0.056) is caused by M7 SpeakerState drift compounding with #51's M15 style_map values, producing effective pitch +14 % to +17 % and rate 1.27-1.33x at high-intensity turns. That range simultaneously sounds cartoonish to listeners (May-3 listening test "helium / oompa- loompa") and trips Whisper-large-v3's silence-detection heuristic — the classic length-ratio collapse to ~0.7 that hid the bug for weeks. This PR ships a partial fix: a runtime effective-prosody cap that addresses the canonical Whisper-backdoor fingerprint and the helium- range pitch concern, plus the two detection layers Shay asked for to catch this class of regression in the future. It does NOT fully restore high-intensity WER to the pre-#51 baseline — see #89 for the follow-up workstream. ## Tier-3 Whisper validation (`sp_it_a_0001`) | variant | dur | WER | length_ratio | hyp / ref | |---|---:|---:|---:|---:| | 04-15 reference | 155.9 s | 0.056 | 1.009 | 236 / 234 | | post-#86 main (no cap) | 146.6 s | 0.322 | 0.709 | 166 / 234 | | this PR (cap active) | 149.1 s | **0.129** | **0.906** | 212 / 234 | - Length-ratio recovers above the qa-report --asr 0.85 threshold. - WER reduced 2.5x (0.322 -> 0.129) but still above the 04-15 baseline of 0.056. Failure mode shifts from silence-detector trip (~30 % of words missing) to substitution noise — distinct mechanism requiring a paired listening test to fix without breaking M15 naturalness calibration. Tracked in #89 with insights and four proposed approaches. ## The fix — effective-prosody runtime cap `synthbanshee/tts/renderer._apply_effective_prosody_cap` clamps post- state, post-randomization prosody before SSML emission: - pitch in [-3.0, +2.0] st (~ +/- 12 % Azure) - rate in [0.85, 1.20] - volume left to the existing +/-50 % Azure clamp (Whisper internally normalizes loudness, per #82's lever probe — not a Whisper-trip dimension). Caps are anchored to the pre-#51 effective envelope, which produced the 04-15 reference clips with WER 0.04-0.08. Tighter caps would diverge further from M15 listening-test calibration; looser caps would re-trip Whisper. Each cap activation logs a warning and is recorded per turn. ## Detection layer 1 — static prosody-cap activations in metadata - `DialogueTurn.effective_prosody_caps` carries per-turn cap events. - `cli.py` rolls them up into `ClipMetadata.generation_metadata.effective_prosody_caps` (new `EffectiveProsodyCapEvent` model in labels/schema.py). - `qa-report` surfaces a new "Effective-Prosody Cap Activations (#87)" table per clip — runs on every batch, no Azure / Whisper required. Tier-3 render of sp_it_a_0001 recorded 14 cap activations across 7 high-intensity turns; metadata example in PR description. ## Detection layer 2 — `qa-report --asr` Whisper backdoor check New `synthbanshee/package/asr_sanity.py` provides a lazy-loaded `WhisperRunner` and `compute_asr_metrics`. `qa-report --asr` runs Whisper-large-v3 on every clip in a directory, flags clips whose length-ratio falls below `--asr-min-length-ratio` (default 0.85 — the #87 fingerprint sat at ~0.71). Heavy dependencies isolated in the new `eval-asr` optional extra so normal generation/QA stays light. Per the policy decision documented in CLAUDE.md ("ASR sanity check policy"), Tier-3 ASR sanity is local-only (not in CI) for now — see GH issue #88 for the deferred CI re-evaluation triggers. ## Tests - tests/unit/test_effective_prosody_cap.py: 11 tests covering the helper unit, render_utterance integration, and render_scene event propagation to DialogueTurn. - tests/unit/test_qa.py::TestProsodyCapRollup: 3 tests verifying cap-event aggregation in qa-report. - tests/unit/test_asr_sanity.py: 11 tests covering normalize_for_wer, AsrMetrics threshold semantics, and bracket-line stripping in the reference parser. Heavy Whisper inference is exercised by the Tier-3 local run, not these tests. - 1687 unit tests pass (1662 baseline + 25 new); ruff + mypy clean. ## Docs - CLAUDE.md: new "ASR sanity check policy" section + "What NOT to do" bullets pinning the cap thresholds and the Tier-3 local-only policy. - pyproject.toml: new `eval-asr` optional extra. Reduces #87 (does not fully close — see #89 for the residual WER work). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ft (#96) * docs: sync .agent-plan.md to current state (M16 done, M17 in flight) The state tracker was last updated 2026-04-22 and claimed the active task was M8b. Since then M8b, M9a, M9b, M10a, M10b, M11, M13, M14, M15, M16 have all merged, and M17 (automated evaluation) has had its design (#73), Phase A spike (#77, #79), and a wave of bug-fix PRs (#82, #85, #86, #90) land — anyone reading this file got an 8-milestone-stale picture. Rewrote: - Current system state — list every merged V3 milestone with one-line summaries; promoted the loudness contract (#78) and effective-prosody cap (#87) to the architectural-invariants list since both are load-bearing for any agent editing TTS or preprocessing code. - Active / next task — replaced "M8b" with the M17 ASR regression thread, noted PR #90 as the partial fix and #91 (rate-floor lift R) as the queued next experiment, including the WER ≤ 0.10 + listening-test pass criterion. - Open threads table — new section listing the threads agents most often need to know about (M12 gate, M17 full automation, #62 word merging, #72 SSML parse, #88 CI ASR deferral). - Context pointers — added splendor-brief as the orientation entrypoint and the .venv-vs-~/.local PATH trap. - CI / Workflow notes — added the Tier-3 ASR local-only policy summary so PRs touching audio-rendering files don't merge without it. This file is a quick-orientation summary — added a header line marking the design-doc tracker as authoritative when details disagree, so the next drift gets caught in a tracker diff rather than a stale summary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: delete .agent-plan.md (Option A; supersedes prior commit) Self-review of commit 4361497 (the .agent-plan.md rewrite) caught that the file fundamentally fails the duplication test: milestone status duplicates docs/audio_generation_v3_design.md → Implementation Tracker; open-thread state duplicates GitHub Issues + splendor brief; pointers duplicate AGENTS.md; and the only non-duplicate section (architectural invariants) is already covered in AGENTS.md (FLOAT subtype line 72, MixedScene shift line 37, validate_audio peak line 85, full #78 loudness contract line 35, full #87 effective-prosody cap line 73). The rewrite also reproduced the "if details disagree, the tracker wins" disclaimer antipattern — the same one PR #94's review removed from docs/spec.md §3.1 just one PR ago. A docs file whose explicit charter is "I am allowed to be wrong relative to the canonical source" is structured drift bait. The original .agent-plan.md got 15 days stale; "rewrite more carefully" was the wrong response. Changes: - Delete .agent-plan.md (75-line summary that duplicated load-bearing state held authoritatively elsewhere). - Update .claude/skills/open-feature-pr.md step 2 to drop the ".agent-plan.md" fallback for milestone-ID inference; the branch name + parent issue's milestone field already cover it, and pointing at the design-doc tracker as authoritative is more durable than pointing at a manually-maintained summary. - Splendor maintenance: source forget src-9d9759e5ad... --apply. Removes the orphan source manifest, wiki summary page, and wiki index entry. Residual cross-references in 5 planning tasks and 5 wiki pages remain — splendor surfaces them but doesn't auto-clean; they'll regenerate on next ingest of those sources. Nothing added to AGENTS.md: the cross-cutting rules from .agent-plan.md are already there. The remaining "invariants" (5-tuple mixer API post-M8a, audible_* timeline use, MixMode no-audio-deps, _peak_limit vs _normalize_peak naming) are mixer-internal details that belong in module docstrings, not global agent rules — and AGENTS.md has its own M8a drift bug (line 71 still says "4-tuple") that's better fixed in a dedicated PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(agents): fix M3a/M8a/#74 drift in mixer Segment API description AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a 4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py` with six named fields (`wav_bytes`, `amount_s`, `speaker_id`, `rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong since at least M8a (added `mix_mode`); #74's Lombard tilt then added `intensity`. Per the dataclass docstring, the named-fields move from positional tuple was deliberate so call sites and reviewers can't transpose args silently. Surfaced during PR #96 review (delete-.agent-plan.md) — the original .agent-plan.md and AGENTS.md disagreed about the segment API; turns out they were both stale, and the design-doc tracker (#74 row line 219) shows the dataclass move that neither AGENTS.md nor .agent-plan.md caught. Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of docs/spec.md that ingest --changed picked up) is intentionally NOT in this commit — it's 16 state/wiki/planning files of churn, scoped for a dedicated splendor maintenance follow-up rather than mixed into a doc fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(agents): cross-reference Lombard tilt as #65 (issue), not #74 (PR) Copilot's review of commit b20eaac caught a cross-reference convention break: the codebase consistently uses #65 (the issue) when referencing the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84, 88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c (line 215). PR #74 is the implementation that closed the issue, but the cross-link convention is to point at the issue. One-character fix: #74 → #65 in the M3a TTS bullet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 added 2 commits May 5, 2026 23:29

Revert "fix(tts): Azure SSML parsing error on adjacent break elements (…

99eeb84

…#67) (#71)" This reverts commit 0bdb217.

Revert "fix(tts): insert inter-word <break> tags to prevent Hebrew wo…

d07e144

…rd merging (#70)" This reverts commit d0c273b.

Copilot AI review requested due to automatic review settings May 5, 2026 20:41

shaypal5 added this to the M17 milestone May 5, 2026

shaypal5 added bugfix comp: tts TTS rendering, SSML, Azure/Google providers labels May 5, 2026

shaypal5 mentioned this pull request May 5, 2026

bug(tts): Hebrew word merging makes speech unintelligible #62

Open

Copilot started reviewing on behalf of shaypal5 May 5, 2026 20:42 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread synthbanshee/tts/ssml_builder.py

Comment thread synthbanshee/tts/ssml_builder.py

Comment thread synthbanshee/tts/ssml_builder.py

shaypal5 mentioned this pull request May 5, 2026

investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87

Open

shaypal5 changed the title ~~fix(tts): #83 revert inter-word break tags — root cause of Whisper WER & duration regressions~~ fix(tts): #83 partial — revert inter-word breaks (low-intensity scenes only) May 5, 2026

This comment has been minimized.

Sign in to view

shaypal5 merged commit a910f2e into main May 5, 2026
6 checks passed

shaypal5 deleted the investigate/83-whisper-wer-bisect branch May 5, 2026 21:35

This was referenced May 6, 2026

fix(tts): #87 follow-up — close residual sp_it WER gap (0.129) toward baseline (0.056) without sacrificing M15 calibration #89

Closed

fix(tts): #87 partial — effective-prosody cap addresses Whisper backdoor + helium range #90

Merged

This was referenced May 6, 2026

tts: aggregate Hebrew TTS naturalness backlog from 2026-05-06 listening test #92

Open

docs: delete .agent-plan.md + fix AGENTS.md M3a/M8a/#74 mixer-API drift #96

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tts): #83 partial — revert inter-word breaks (low-intensity scenes only)#86

fix(tts): #83 partial — revert inter-word breaks (low-intensity scenes only)#86
shaypal5 merged 3 commits into
mainfrom
investigate/83-whisper-wer-bisect

shaypal5 commented May 5, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Bisect — partial result

What this PR fixes

What this PR does NOT fix

Why #70 was the dominant low-intensity cause

Bisect scope and limitations (be honest)

Restored hardenings (from #71, unrelated to #70)

Trade-off — #62 is reopened

Changes per file

Test plan

References

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shaypal5 commented May 5, 2026 •

edited

Loading