fix(tts): Azure SSML parsing error on adjacent break elements (#67) by shaypal5 · Pull Request #71 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-03T22:59:40Z

Summary

Root cause: PR fix(tts): insert inter-word <break> tags to prevent Hebrew word merging #70's inter-word <break time="50ms"/> elements interact with phrase prosody break_before_ms (from "menace"/"slow"/"break_before" hints) to create adjacent <break> elements. Azure's SSML parser rejects this pattern with error 0x80045003.
Fix: Merge adjacent <break> elements instead of emitting consecutive breaks. Word-boundary breaks (50ms) are replaced by the phrase break; semantic breaks (e.g. a preceding break_after) are summed to preserve both pause intents.
Scope: Only affects high-intensity turns (I3–I5) in violent scene templates that use hints with break_before_ms > 0.

Changes

File	Change
`synthbanshee/tts/ssml_builder.py`	Merge adjacent breaks in `_apply_phrase_prosody` (word-boundary → replace, semantic → sum); clamp prosody values to Azure ranges with `logging.warning`; add `_sanitize_text` for XML-invalid chars as defense-in-depth
`tests/unit/test_tts.py`	9 regression tests: adjacent-break merging, duration semantics (replace vs sum), prosody clamping, text sanitization, full Hebrew I5 integration with niqqud and realistic AGG speaker params

Design decisions

Replace vs sum: A word-boundary break (50ms) is purely structural — the phrase break subsumes its intent. But a semantic break (e.g. 250ms break_after) represents intentional dramatic pause that should be preserved alongside the new phrase's break_before.
Clamping warns, doesn't silently swallow: If prosody values hit Azure's ceiling, that's a speaker-config or state-drift bug upstream. The warning makes it visible in logs.
_sanitize_text stays in the builder: Ideally invalid chars should be rejected at the LLM parsing boundary, but defense-in-depth here ensures the SSML layer never produces unparseable output regardless of upstream bugs.

Test plan

pytest tests/ — all 1688 tests pass (9 new)
ruff check — clean
mypy — clean
Manual: run Azure TTS on previously failing scenes (sp_it_a_0002, sp_sv_a_0001, sp_sv_a_0002, sp_neg_a_0002) to confirm they render successfully

Closes #67

🤖 Generated with Claude Code

…arse error (#67) Azure's SSML parser rejects adjacent <break> elements (error 0x80045003). PR #70's inter-word breaks interact with phrase prosody break_before/break_after to create adjacent breaks in high-intensity turns with "menace"/"slow" hints. Fix: merge adjacent breaks (use max duration) instead of emitting consecutive break elements. Also add prosody value clamping to Azure's documented ranges and text sanitization for XML 1.0 invalid characters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…w test Address self-review feedback: - Merge logic now distinguishes word-boundary breaks (replace) from semantic breaks like break_after (sum durations) to preserve dramatic pause intent - Prosody clamping now logs a warning instead of silently swallowing — makes upstream speaker config bugs visible - Remove redundant ET.fromstring validation (ET.tostring can't produce invalid XML from a valid tree; _sanitize_text handles the only failure path) - Add Hebrew regression test with actual AGG I5 parameters and niqqud text - Add test for break_after → break_before summing on consecutive phrases Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-03T23:07:37Z

pr-agent-context report:

This run includes a patch coverage gap on PR #71 in repository https://github.com/DataHackIL/SynthBanshee

Address the patch coverage gaps below, then push all of these changes in a single commit.

# Patch coverage

Patch test coverage is 97.06%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/tts/ssml_builder.py: 178

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25293417025 attempt 1
Comment timestamp: 2026-05-03T23:06:49.561142+00:00
PR head commit: db82021071d3a0abba3da4add5496235ceb3ae85

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

sp_it_a_0003.yaml (intimate_terror_financial_control then jealousy_surveillance templates, seed 1103) consistently failed with Azure TTS SSML parse error 0x80045003 ("TurnStarted; Received audio size: 0 bytes"). Both templates exhibited the same failure under different slots, suggesting the cause is upstream (in the SSML the script generator emits for IT-typology content in this seed range), not template-specific. Replaced with sp_neg_a_0003 (negative_argument_deescalation template, seed 1303), which renders cleanly. Final 10-clip spike set: 2 IT / 3 NEG / 3 NEU / 2 SV. The IT failure is itself a finding for the report — it suggests PR #67 / #71's SSML escaping work isn't fully covering the LLM output distribution and warrants a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…sults Final report captures: ASR — PASS strong on all three runs (10 clips, paired comparison): - openai/whisper-large-v3 (HF MPS): median WER 0.064 - ivrit-ai/whisper-large-v3 (HF MPS): median WER 0.036 ← preferred - ivrit-ai/whisper-large-v3-ct2 int8 (faster-whisper CPU): median WER 0.044 Δ int8 vs MPS = +0.008, well within ε=0.02 → CI plan validated. Bonus: int8 CT2 dodges the only hallucination loop (sp_it_a_0002_00) seen on HF MPS; opens the door to using CT2 in dev/QA, not just CI. Hallucination detectors empirically validated on the 10 clips: trigram_repeat_ratio > 1.5 catches the outlier alone (ratio 2.00 vs 1.00 for all other 9 clips). Replaces the wrong-by-one-thousandth length-ratio threshold from PR #77 review. UTMOS — FAIL but with much richer evidence: - With paired same-5 comparison, all five degradations (4 SNR + lowpass) push UTMOS below clean. Direction is correct. - But max separation 0.207, well below 0.5 gate. - White-noise severity is INVERTED (more noise → higher UTMOS). - New clips score ~0.9 MOS points higher than originals — UTMOS is sensitive to TTS pipeline drift, dominating any within-clip degradation signal. Still NO-GO; recommend turn-segmented Option A re-spike before any E2 work. Carry-over finding: Azure TTS SSML rejected sp_it_a_0003 with two different intimate_terror templates. PR #67/#71 SSML escaping work doesn't cover this case; tracking as a separate follow-up. Replaced with sp_neg_a_0003 in the spike set. Reproducibility settled: greedy decoding on MPS is byte-stable run-to-run given the same normalization. Outlier WER 0.144 → 0.127 from PR #77 to this report is fully explained by the new RTL-mark strip in normalize_for_wer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…regression test The initial revert in this branch (revert PR #70 + PR #71 wholesale) was too aggressive: PR #71 bundled three independent hardenings, only one of which was caused by PR #70. This commit restores the two hardenings that have nothing to do with inter-word break injection, and narrows the third to the residual case that survives #70's revert. Also adds the #83 regression test that pins the bisect finding. Restored from #71 (and explicitly verified to NOT re-introduce #83): - `_sanitize_text` + `_XML_INVALID_CHARS_RE` regex. Defends against an LLM emitting XML 1.0 control characters that otherwise make the SSML unparseable by Azure. Independent bug class from per-word breaks. - Azure-range prosody clamping in `_semitones_to_percent`, `_rate_to_string`, `_volume_to_string`, plus warning logs on clamp activation. `speaker_BYS_F_6-10_001.yaml` ships `pitch_delta_st=+9` → +54% unclamped, which Azure rejects. Independent bug class. - Adjacent `<break>` merging in `_apply_phrase_prosody`, narrowed to the phrase-after / phrase-before case (the only adjacent-break source that survives #70's revert). The original #71 logic also had a word-break branch that is no longer reachable. Added: - `test_no_per_word_breaks_in_default_ssml` regression test pinned to #83. The default multi-word SSML must not contain `<break>` tags; per-word break injection (PR #70) tripped Whisper's silence-detection heuristic and produced the WER regression. Any future Hebrew word- merge mitigation (#62) must not re-introduce per-word breaks. - `test_text_with_invalid_xml_chars_sanitized`, `test_prosody_pitch_clamped_to_azure_range`, `test_prosody_rate_clamped_to_azure_range`, `test_prosody_volume_clamped_to_azure_range`, `test_adjacent_phrase_breaks_are_merged` — pin the restored hardenings. All three of these were independently flagged by Copilot's review on this PR (resolves three Copilot review threads). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s only) (#86) * Revert "fix(tts): Azure SSML parsing error on adjacent break elements (#67) (#71)" This reverts commit 0bdb217. * Revert "fix(tts): insert inter-word <break> tags to prevent Hebrew word merging (#70)" This reverts commit d0c273b. * restore(tts): reinstate hardenings from #71 unrelated to #70 + add #83 regression test The initial revert in this branch (revert PR #70 + PR #71 wholesale) was too aggressive: PR #71 bundled three independent hardenings, only one of which was caused by PR #70. This commit restores the two hardenings that have nothing to do with inter-word break injection, and narrows the third to the residual case that survives #70's revert. Also adds the #83 regression test that pins the bisect finding. Restored from #71 (and explicitly verified to NOT re-introduce #83): - `_sanitize_text` + `_XML_INVALID_CHARS_RE` regex. Defends against an LLM emitting XML 1.0 control characters that otherwise make the SSML unparseable by Azure. Independent bug class from per-word breaks. - Azure-range prosody clamping in `_semitones_to_percent`, `_rate_to_string`, `_volume_to_string`, plus warning logs on clamp activation. `speaker_BYS_F_6-10_001.yaml` ships `pitch_delta_st=+9` → +54% unclamped, which Azure rejects. Independent bug class. - Adjacent `<break>` merging in `_apply_phrase_prosody`, narrowed to the phrase-after / phrase-before case (the only adjacent-break source that survives #70's revert). The original #71 logic also had a word-break branch that is no longer reachable. Added: - `test_no_per_word_breaks_in_default_ssml` regression test pinned to #83. The default multi-word SSML must not contain `<break>` tags; per-word break injection (PR #70) tripped Whisper's silence-detection heuristic and produced the WER regression. Any future Hebrew word- merge mitigation (#62) must not re-introduce per-word breaks. - `test_text_with_invalid_xml_chars_sanitized`, `test_prosody_pitch_clamped_to_azure_range`, `test_prosody_rate_clamped_to_azure_range`, `test_prosody_volume_clamped_to_azure_range`, `test_adjacent_phrase_breaks_are_merged` — pin the restored hardenings. All three of these were independently flagged by Copilot's review on this PR (resolves three Copilot review threads). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 added type: fix Bug fix comp: tts TTS rendering, SSML, Azure/Google providers labels May 3, 2026

Copilot AI review requested due to automatic review settings May 3, 2026 22:59

Copilot started reviewing on behalf of shaypal5 May 3, 2026 23:00 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 3, 2026

View reviewed changes

shaypal5 merged commit 0bdb217 into main May 3, 2026
6 checks passed

shaypal5 deleted the fix/ssml-parse-errors-67 branch May 3, 2026 23:21

shaypal5 mentioned this pull request May 4, 2026

bug(tts): SSML parse error 0x80045003 — unable to reproduce, root cause unknown #72

Open

6 tasks

shaypal5 mentioned this pull request May 5, 2026

investigate(tts): #83 residual — Whisper WER regression on high-intensity (I3+) Tier A clips #87

Open

shaypal5 mentioned this pull request May 6, 2026

fix(tts): #87 partial — effective-prosody cap addresses Whisper backdoor + helium range #90

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tts): Azure SSML parsing error on adjacent break elements (#67)#71

fix(tts): Azure SSML parsing error on adjacent break elements (#67)#71
shaypal5 merged 2 commits into
mainfrom
fix/ssml-parse-errors-67

shaypal5 commented May 3, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

github-actions Bot commented May 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Design decisions

Test plan

Uh oh!

This comment has been minimized.

github-actions Bot commented May 3, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shaypal5 commented May 3, 2026 •

edited

Loading