fix(tts): Azure SSML parsing error on adjacent break elements (#67) #71
Merged
…arse error (#67)

Azure's SSML parser rejects adjacent <break> elements (error 0x80045003). PR #70's inter-word breaks interact with phrase prosody break_before/break_after to create adjacent breaks in high-intensity turns with "menace"/"slow" hints.

Fix: merge adjacent breaks (use max duration) instead of emitting consecutive break elements. Also add prosody value clamping to Azure's documented ranges and text sanitization for XML 1.0 invalid characters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…w test

Address self-review feedback:
- Merge logic now distinguishes word-boundary breaks (replace) from semantic breaks like break_after (sum durations) to preserve dramatic pause intent
- Prosody clamping now logs a warning instead of silently swallowing, which makes upstream speaker config bugs visible
- Remove redundant ET.fromstring validation (ET.tostring can't produce invalid XML from a valid tree; _sanitize_text handles the only failure path)
- Add Hebrew regression test with actual AGG I5 parameters and niqqud text
- Add test for break_after → break_before summing on consecutive phrases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
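The replace-versus-sum rule described in this commit can be sketched as follows. This is a minimal illustration, not the code in `_apply_phrase_prosody`: the `Break` dataclass, the `kind` labels, and the `merge_breaks` and `render` helpers are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Break:
    """A pending pause. `kind` is "word" for a word-boundary break, else "semantic".

    Hypothetical type used only for this sketch.
    """
    ms: int
    kind: str


def merge_breaks(prev: Break, new: Break) -> Break:
    """Collapse two adjacent breaks into one so no consecutive <break> tags are emitted."""
    if prev.kind == "word":
        # A word-boundary break carries no intent of its own: the new break replaces it.
        return Break(new.ms, new.kind)
    # Both pauses are intentional (e.g. break_after then break_before): sum them.
    return Break(prev.ms + new.ms, "semantic")


def render(b: Break) -> str:
    """Serialize a single break as an SSML element."""
    return f'<break time="{b.ms}ms"/>'


word_then_phrase = merge_breaks(Break(50, "word"), Break(400, "semantic"))
after_then_before = merge_breaks(Break(300, "semantic"), Break(400, "semantic"))
```

Here `word_then_phrase` keeps only the 400 ms phrase pause, while `after_then_before` carries the combined 700 ms, so a single `<break>` element is emitted in both cases.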
pr-agent-context report: This run includes a patch coverage gap on PR #71 in repository https://github.com/DataHackIL/SynthBanshee
Address the patch coverage gaps below, then push all of these changes in a single commit.
# Patch coverage
Patch test coverage is 97.06%; please raise it to 100%. These are the uncovered code lines:
- synthbanshee/tts/ssml_builder.py: 178
shaypal5 added a commit that referenced this pull request on May 5, 2026
sp_it_a_0003.yaml (intimate_terror_financial_control then jealousy_surveillance
templates, seed 1103) consistently failed with Azure TTS SSML parse error
0x80045003 ("TurnStarted; Received audio size: 0 bytes"). Both templates
exhibited the same failure under different slots, suggesting the cause is
upstream (in the SSML the script generator emits for IT-typology content
in this seed range), not template-specific.
Replaced with sp_neg_a_0003 (negative_argument_deescalation template, seed
1303), which renders cleanly. Final 10-clip spike set: 2 IT / 3 NEG / 3 NEU
/ 2 SV. The IT failure is itself a finding for the report — it suggests
PR #67 / #71's SSML escaping work isn't fully covering the LLM output
distribution and warrants a follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request on May 5, 2026
…sults

Final report captures:

ASR: PASS, strong on all three runs (10 clips, paired comparison):
- openai/whisper-large-v3 (HF MPS): median WER 0.064
- ivrit-ai/whisper-large-v3 (HF MPS): median WER 0.036 ← preferred
- ivrit-ai/whisper-large-v3-ct2 int8 (faster-whisper CPU): median WER 0.044

Δ int8 vs MPS = +0.008, well within ε=0.02 → CI plan validated. Bonus: int8 CT2 dodges the only hallucination loop (sp_it_a_0002_00) seen on HF MPS; opens the door to using CT2 in dev/QA, not just CI.

Hallucination detectors empirically validated on the 10 clips: trigram_repeat_ratio > 1.5 catches the outlier alone (ratio 2.00 vs 1.00 for all other 9 clips). Replaces the wrong-by-one-thousandth length-ratio threshold from PR #77 review.

UTMOS: FAIL, but with much richer evidence:
- With paired same-5 comparison, all five degradations (4 SNR + lowpass) push UTMOS below clean. Direction is correct.
- But max separation 0.207, well below 0.5 gate.
- White-noise severity is INVERTED (more noise → higher UTMOS).
- New clips score ~0.9 MOS points higher than originals: UTMOS is sensitive to TTS pipeline drift, dominating any within-clip degradation signal.

Still NO-GO; recommend turn-segmented Option A re-spike before any E2 work.

Carry-over finding: Azure TTS SSML rejected sp_it_a_0003 with two different intimate_terror templates. PR #67/#71 SSML escaping work doesn't cover this case; tracking as a separate follow-up. Replaced with sp_neg_a_0003 in the spike set.

Reproducibility settled: greedy decoding on MPS is byte-stable run-to-run given the same normalization. Outlier WER 0.144 → 0.127 from PR #77 to this report is fully explained by the new RTL-mark strip in normalize_for_wer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
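The commit message names `trigram_repeat_ratio` but does not define it. One reading that is consistent with a value of 1.00 for a clean transcript is "total word trigrams divided by distinct word trigrams". The sketch below implements that assumed definition; the function body and the 1.5 threshold constant are illustrative and may differ from the detector used in the report.

```python
HALLUCINATION_THRESHOLD = 1.5  # threshold quoted in the commit message


def trigram_repeat_ratio(transcript: str) -> float:
    """Total word trigrams over distinct word trigrams (assumed definition).

    A transcript with no repeated trigram scores exactly 1.0; a looping
    transcript scores higher because the same trigrams recur.
    """
    words = transcript.split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        # Fewer than three words: nothing can repeat.
        return 1.0
    return len(trigrams) / len(set(trigrams))


def looks_like_loop(transcript: str) -> bool:
    """Flag transcripts whose ratio exceeds the threshold."""
    return trigram_repeat_ratio(transcript) > HALLUCINATION_THRESHOLD


clean = "the quick brown fox jumps over the lazy dog"
looping = "thank you very much " * 4
```

Under this definition `clean` scores 1.0 and is not flagged, while `looping` has 14 trigrams but only 4 distinct ones and is flagged.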
shaypal5 added a commit that referenced this pull request on May 5, 2026
…regression test

The initial revert in this branch (revert PR #70 + PR #71 wholesale) was too aggressive: PR #71 bundled three independent hardenings, only one of which was caused by PR #70. This commit restores the two hardenings that have nothing to do with inter-word break injection, and narrows the third to the residual case that survives #70's revert. Also adds the #83 regression test that pins the bisect finding.

Restored from #71 (and explicitly verified to NOT re-introduce #83):
- `_sanitize_text` + `_XML_INVALID_CHARS_RE` regex. Defends against an LLM emitting XML 1.0 control characters that otherwise make the SSML unparseable by Azure. Independent bug class from per-word breaks.
- Azure-range prosody clamping in `_semitones_to_percent`, `_rate_to_string`, `_volume_to_string`, plus warning logs on clamp activation. `speaker_BYS_F_6-10_001.yaml` ships `pitch_delta_st=+9` → +54% unclamped, which Azure rejects. Independent bug class.
- Adjacent `<break>` merging in `_apply_phrase_prosody`, narrowed to the phrase-after / phrase-before case (the only adjacent-break source that survives #70's revert). The original #71 logic also had a word-break branch that is no longer reachable.

Added:
- `test_no_per_word_breaks_in_default_ssml` regression test pinned to #83. The default multi-word SSML must not contain `<break>` tags; per-word break injection (PR #70) tripped Whisper's silence-detection heuristic and produced the WER regression. Any future Hebrew word-merge mitigation (#62) must not re-introduce per-word breaks.
- `test_text_with_invalid_xml_chars_sanitized`, `test_prosody_pitch_clamped_to_azure_range`, `test_prosody_rate_clamped_to_azure_range`, `test_prosody_volume_clamped_to_azure_range`, `test_adjacent_phrase_breaks_are_merged`: pin the restored hardenings.

All three of these were independently flagged by Copilot's review on this PR (resolves three Copilot review threads).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
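A rough sketch of the clamp-and-warn behaviour for pitch follows. Two things are assumptions here rather than facts from the repository: the 6% per semitone factor is inferred from the "+9 → +54%" figure quoted above, and the ±50% limit is a reading of Azure's prosody documentation (pitch within 0.5x to 1.5x of the original) that should be verified before use. The `clamp` and `semitones_to_percent` names are stand-ins for the private helpers listed in the commit message.

```python
import logging

logger = logging.getLogger(__name__)

PERCENT_PER_SEMITONE = 6.0        # assumed: implied by "+9 st -> +54%"
PITCH_PCT_RANGE = (-50.0, 50.0)   # assumed Azure limit; verify against the docs


def clamp(value: float, lo: float, hi: float, name: str) -> float:
    """Clamp `value` into [lo, hi] and log a warning whenever clamping changes it."""
    clamped = max(lo, min(hi, value))
    if clamped != value:
        # Surfacing the clamp makes upstream speaker-config bugs visible.
        logger.warning("%s=%s outside [%s, %s]; clamped to %s", name, value, lo, hi, clamped)
    return clamped


def semitones_to_percent(delta_st: float) -> str:
    """Convert a semitone offset to a signed percent string for <prosody pitch=...>."""
    pct = clamp(delta_st * PERCENT_PER_SEMITONE, *PITCH_PCT_RANGE, name="pitch")
    return f"{pct:+.0f}%"


in_range = semitones_to_percent(-2)   # within the assumed limits
too_high = semitones_to_percent(9)    # the speaker-config case from the commit message
```

With these assumptions the `pitch_delta_st=+9` case is emitted as `+50%` together with a warning, instead of the unclamped `+54%`.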
shaypal5 added a commit that referenced this pull request on May 5, 2026
…s only) (#86)

* Revert "fix(tts): Azure SSML parsing error on adjacent break elements (#67) (#71)"

  This reverts commit 0bdb217.

* Revert "fix(tts): insert inter-word <break> tags to prevent Hebrew word merging (#70)"

  This reverts commit d0c273b.

* restore(tts): reinstate hardenings from #71 unrelated to #70 + add #83 regression test

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- PR #70's inter-word `<break time="50ms"/>` elements interact with phrase prosody `break_before_ms` (from "menace"/"slow"/"break_before" hints) to create adjacent `<break>` elements. Azure's SSML parser rejects this pattern with error `0x80045003`.
- Merge adjacent `<break>` elements instead of emitting consecutive breaks. Word-boundary breaks (50ms) are replaced by the phrase break; semantic breaks (e.g. a preceding `break_after`) are summed to preserve both pause intents.
- … `break_before_ms > 0`.

Changes
- `synthbanshee/tts/ssml_builder.py`: merge adjacent breaks in `_apply_phrase_prosody` (word-boundary → replace, semantic → sum); clamp prosody values to Azure ranges with `logging.warning`; add `_sanitize_text` for XML-invalid chars as defense-in-depth
- `tests/unit/test_tts.py`: …

Design decisions
- … `break_after`) represents intentional dramatic pause that should be preserved alongside the new phrase's `break_before`.
- `_sanitize_text` stays in the builder: ideally, invalid chars should be rejected at the LLM parsing boundary, but defense-in-depth here ensures the SSML layer never produces unparseable output regardless of upstream bugs.

Test plan
- `pytest tests/`: all 1688 tests pass (9 new)
- `ruff check`: clean
- `mypy`: clean

Closes #67
🤖 Generated with Claude Code
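The sanitization step in the description can be illustrated with the XML 1.0 `Char` production, which allows only tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The sketch below is a guess at the shape of such a helper, not the repository's `_sanitize_text`; the names `XML_INVALID_CHARS_RE` and `sanitize_text` are stand-ins.

```python
import re
import xml.etree.ElementTree as ET

# Complement of the XML 1.0 `Char` production: matches every character
# that may not appear in an XML 1.0 document.
XML_INVALID_CHARS_RE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_text(text: str) -> str:
    """Drop characters that XML 1.0 forbids (for example NUL or vertical tab)."""
    return XML_INVALID_CHARS_RE.sub("", text)


# Build a tiny SSML-like tree from text containing two forbidden control characters.
speak = ET.Element("speak")
speak.text = sanitize_text("hello\x00 \x0bworld")
ssml = ET.tostring(speak, encoding="unicode")

# After sanitizing, the serialized document parses back without error.
round_tripped = ET.fromstring(ssml).text
```

Tab, line feed, and carriage return pass through unchanged because XML 1.0 permits them; only the forbidden code points are removed.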