fix(mixer): #65 Lombard spectral tilt at I4–I5 #74
Volume increases at high intensity were applied via pure amplitude scaling (SSML volume_delta_db + per-turn RMS gain), which sounds like microphone proximity rather than a raised voice. Real shouting boosts high-frequency energy (Lombard effect). Add a post-TTS RBJ high-shelf biquad at 2.5 kHz applied in SceneMixer.mix_sequential after RMS gain: +2.0 dB at I4, +3.5 dB at I5, no-op for I1–I3. The intensity is threaded through the segment tuple as a 6th element. I1–I3 turns are bit-exact unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
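The shelf described in this commit can be sketched with the RBJ Audio EQ Cookbook high-shelf formulas. This is a minimal pure-Python illustration, not the code in `synthbanshee/tts/mixer.py`: the function names, the 16 kHz sample rate, and the shelf slope of 1.0 are assumptions made for the example; only the 2.5 kHz corner and the +2.0 dB / +3.5 dB gains come from the commit message.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000       # assumption: the mixer's real _TARGET_SR is not stated here
SHELF_CORNER_HZ = 2_500.0     # from the commit message
BOOST_DB = {4: 2.0, 5: 3.5}   # from the commit message: I4 and I5 only


def highshelf_coeffs(gain_db, f0_hz=SHELF_CORNER_HZ, fs_hz=SAMPLE_RATE_HZ, slope=1.0):
    """Return normalized (b0, b1, b2, a1, a2) for an RBJ cookbook high shelf."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((a + 1.0 / a) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(a) * alpha
    b0 = a * ((a + 1) + (a - 1) * cos_w0 + k)
    b1 = -2 * a * ((a - 1) + (a + 1) * cos_w0)
    b2 = a * ((a + 1) + (a - 1) * cos_w0 - k)
    a0 = (a + 1) - (a - 1) * cos_w0 + k
    a1 = 2 * ((a - 1) - (a + 1) * cos_w0)
    a2 = (a + 1) - (a - 1) * cos_w0 - k
    # Normalize so that a0 == 1.
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


def apply_biquad(samples, coeffs):
    """Filter a sequence of floats with a Direct Form I biquad (zero initial state)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def magnitude_db(coeffs, freq_hz, fs_hz=SAMPLE_RATE_HZ):
    """Evaluate the filter's gain in dB at a single frequency."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs_hz)  # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))
```

With these formulas the gain is 0 dB at DC and equals the requested boost at Nyquist, so low-frequency content is left close to unchanged while the upper band is lifted. A 0 dB gain collapses to an identity filter, although skipping the filter call entirely is what keeps I1–I3 turns bit-exact.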
Pull request overview
Adds a post-TTS Lombard-style spectral tilt in the mixer so high-intensity turns (I4/I5) sound more like raised speech, which fits the existing TTS pipeline where Azure he-IL style tags are unavailable and scene-level shaping happens after per-turn rendering.
Changes:
- Adds a 2.5 kHz RBJ high-shelf in `SceneMixer` and applies it for I4/I5 after RMS gain.
- Threads `turn.intensity` through `TTSRenderer.render_scene()` into mixer segment data.
- Updates mixer/metadata/integration tests and design docs for the new intensity-aware mixing step.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `tests/unit/test_mixer.py` | Updates segment tuples and adds Lombard tilt unit/integration tests. |
| `tests/unit/test_generation_metadata.py` | Updates mixer segment fixtures for the new tuple shape. |
| `tests/integration/test_multi_speaker.py` | Updates helper scene construction to pass intensity slot through mixer tuples. |
| `synthbanshee/tts/renderer.py` | Extends mixer segment payload to include `turn.intensity`. |
| `synthbanshee/tts/mixer.py` | Implements the high-shelf filter and applies it during sequential mixing. |
| `docs/audio_generation_v3_design.md` | Documents the new Lombard spectral-tilt design and placement in the pipeline. |
…uted biquad

Apply senior-dev self-review feedback to the #65 Lombard PR before merging:

- Replace the 6-element segment tuple with a `Segment` dataclass. Named fields remove transposition risk and make the call sites self-documenting; the positional init keeps test boilerplate small.
- Drop `intensity: int | None` in favour of `intensity: int` (default 1 = no boost). `None` silently skipping the effect was a footgun if a future call site forgot to wire intensity through.
- Pre-compute the two RBJ high-shelf biquad coefficient sets at module load. The trig was previously recomputed for every I4/I5 segment.
- Drop the misleading `sample_rate=_TARGET_SR` default from `_apply_lombard_tilt`; the function only operates on already-resampled mixer output.
- Tighten the spectral test: split HF/LF energy at 3.5 kHz (above the shelf knee, in the asymptotic-gain region) instead of at the 2.5 kHz corner.
- Add a regression test for out-of-range intensities (-1, 0, 6, 99) so the no-op contract is explicit.
- Skip 50 ms instead of ~12 ms when measuring low-band preservation, so the filter startup transient doesn't bias the assertion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
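The dataclass move and the explicit no-op contract from this commit can be pictured roughly as follows. This is a sketch under assumptions, not the definition in `synthbanshee/tts/mixer.py`: the six field names are the ones a later commit message in this thread lists for `Segment`, but every type annotation, the `mix_mode` placeholder default, and the helper `lombard_boost_db` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Boost per intensity level, as stated in the PR; every other level is a no-op.
LOMBARD_BOOST_DB = {4: 2.0, 5: 3.5}


@dataclass
class Segment:
    """One rendered turn handed to the mixer (field types are assumptions)."""

    wav_bytes: bytes
    amount_s: float
    speaker_id: str
    rms_target_dbfs: float
    mix_mode: str = "sequential"  # hypothetical placeholder; the real type is not shown in the PR
    intensity: int = 1            # default 1 = no boost, per the commit message


def lombard_boost_db(intensity: int) -> Optional[float]:
    """Hypothetical helper: shelf gain in dB, or None when no tilt applies."""
    return LOMBARD_BOOST_DB.get(intensity)
```

Because the lookup is a plain dictionary `get`, out-of-range values such as -1, 0, 6, or 99 fall through to `None` without a separate range check, which is one simple way to make the no-op contract testable.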
Self-review pass — pushed
) Addresses Copilot review feedback on PR #74: the existing mixer-level tests would not catch a regression where render_scene stops forwarding each turn's intensity into the Segment list (e.g. accidental intensity=1 hard-code at line 387). Verified by injecting that exact regression locally — the new test fails with the expected diff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addressing Copilot's two review comments in
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #74 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a 4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py` with six named fields (`wav_bytes`, `amount_s`, `speaker_id`, `rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong since at least M8a (added `mix_mode`); #74's Lombard tilt then added `intensity`. Per the dataclass docstring, the named-fields move from positional tuple was deliberate so call sites and reviewers can't transpose args silently. Surfaced during PR #96 review (delete-.agent-plan.md) — the original .agent-plan.md and AGENTS.md disagreed about the segment API; turns out they were both stale, and the design-doc tracker (#74 row line 219) shows the dataclass move that neither AGENTS.md nor .agent-plan.md caught. Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of docs/spec.md that ingest --changed picked up) is intentionally NOT in this commit — it's 16 state/wiki/planning files of churn, scoped for a dedicated splendor maintenance follow-up rather than mixed into a doc fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot's review of commit b20eaac caught a cross-reference convention break: the codebase consistently uses #65 (the issue) when referencing the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84, 88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c (line 215). PR #74 is the implementation that closed the issue, but the cross-link convention is to point at the issue. One-character fix: #74 → #65 in the M3a TTS bullet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ft (#96)

* docs: sync .agent-plan.md to current state (M16 done, M17 in flight)

The state tracker was last updated 2026-04-22 and claimed the active task was M8b. Since then M8b, M9a, M9b, M10a, M10b, M11, M13, M14, M15, M16 have all merged, and M17 (automated evaluation) has had its design (#73), Phase A spike (#77, #79), and a wave of bug-fix PRs (#82, #85, #86, #90) land; anyone reading this file got an 8-milestone-stale picture.

Rewrote:
- Current system state: list every merged V3 milestone with one-line summaries; promoted the loudness contract (#78) and effective-prosody cap (#87) to the architectural-invariants list since both are load-bearing for any agent editing TTS or preprocessing code.
- Active / next task: replaced "M8b" with the M17 ASR regression thread, noted PR #90 as the partial fix and #91 (rate-floor lift R) as the queued next experiment, including the WER ≤ 0.10 + listening-test pass criterion.
- Open threads table: new section listing the threads agents most often need to know about (M12 gate, M17 full automation, #62 word merging, #72 SSML parse, #88 CI ASR deferral).
- Context pointers: added splendor-brief as the orientation entrypoint and the .venv-vs-~/.local PATH trap.
- CI / Workflow notes: added the Tier-3 ASR local-only policy summary so PRs touching audio-rendering files don't merge without it.

This file is a quick-orientation summary. Added a header line marking the design-doc tracker as authoritative when details disagree, so the next drift gets caught in a tracker diff rather than a stale summary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: delete .agent-plan.md (Option A; supersedes prior commit)

Self-review of commit 4361497 (the .agent-plan.md rewrite) caught that the file fundamentally fails the duplication test: milestone status duplicates docs/audio_generation_v3_design.md → Implementation Tracker; open-thread state duplicates GitHub Issues + splendor brief; pointers duplicate AGENTS.md; and the only non-duplicate section (architectural invariants) is already covered in AGENTS.md (FLOAT subtype line 72, MixedScene shift line 37, validate_audio peak line 85, full #78 loudness contract line 35, full #87 effective-prosody cap line 73).

The rewrite also reproduced the "if details disagree, the tracker wins" disclaimer antipattern, the same one PR #94's review removed from docs/spec.md §3.1 just one PR ago. A docs file whose explicit charter is "I am allowed to be wrong relative to the canonical source" is structured drift bait. The original .agent-plan.md got 15 days stale; "rewrite more carefully" was the wrong response.

Changes:
- Delete .agent-plan.md (75-line summary that duplicated load-bearing state held authoritatively elsewhere).
- Update .claude/skills/open-feature-pr.md step 2 to drop the ".agent-plan.md" fallback for milestone-ID inference; the branch name + parent issue's milestone field already cover it, and pointing at the design-doc tracker as authoritative is more durable than pointing at a manually-maintained summary.
- Splendor maintenance: source forget src-9d9759e5ad... --apply. Removes the orphan source manifest, wiki summary page, and wiki index entry. Residual cross-references in 5 planning tasks and 5 wiki pages remain; splendor surfaces them but doesn't auto-clean, and they'll regenerate on next ingest of those sources.

Nothing added to AGENTS.md: the cross-cutting rules from .agent-plan.md are already there. The remaining "invariants" (5-tuple mixer API post-M8a, audible_* timeline use, MixMode no-audio-deps, _peak_limit vs _normalize_peak naming) are mixer-internal details that belong in module docstrings, not global agent rules, and AGENTS.md has its own M8a drift bug (line 71 still says "4-tuple") that's better fixed in a dedicated PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(agents): fix M3a/M8a/#74 drift in mixer Segment API description

AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a 4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py` with six named fields (`wav_bytes`, `amount_s`, `speaker_id`, `rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong since at least M8a (added `mix_mode`); #74's Lombard tilt then added `intensity`. Per the dataclass docstring, the named-fields move from positional tuple was deliberate so call sites and reviewers can't transpose args silently.

Surfaced during PR #96 review (delete-.agent-plan.md): the original .agent-plan.md and AGENTS.md disagreed about the segment API; turns out they were both stale, and the design-doc tracker (#74 row line 219) shows the dataclass move that neither AGENTS.md nor .agent-plan.md caught.

Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of docs/spec.md that ingest --changed picked up) is intentionally NOT in this commit. It's 16 state/wiki/planning files of churn, scoped for a dedicated splendor maintenance follow-up rather than mixed into a doc fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(agents): cross-reference Lombard tilt as #65 (issue), not #74 (PR)

Copilot's review of commit b20eaac caught a cross-reference convention break: the codebase consistently uses #65 (the issue) when referencing the Lombard high-shelf; see synthbanshee/tts/mixer.py lines 6, 80, 84, 88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c (line 215). PR #74 is the implementation that closed the issue, but the cross-link convention is to point at the issue. One-character fix: #74 → #65 in the M3a TTS bullet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Closes #65.
Problem
At I4–I5 the volume increase was applied via pure amplitude scaling (`StyleEntry.volume_delta_db` + per-turn RMS gain in the mixer). Both are amplitude-only operations: they don't change the spectral envelope, so the result sounds like the speaker moved closer to the microphone, not like they raised their voice. Azure he-IL voices don't honour `<mstts:express-as>` (disabled in M14), so SSML can't request a shouting style; the fix has to be a post-TTS DSP step.
Solution
Add a Lombard-effect spectral tilt as a post-TTS step in `SceneMixer.mix_sequential`:

- RBJ high-shelf biquad at 2.5 kHz: +2.0 dB at I4, +3.5 dB at I5, no-op for I1–I3 (and `intensity=None`).
- Applied after RMS gain and before the M14 edge fades.
- Intensity is threaded through the `mix_sequential` segment tuple as a 6th element (`int | None`).

No clipping is performed inside the shelf; preprocessing's peak limiter handles ceiling enforcement, consistent with `_apply_rms_gain`.

Changes per file

| File | Change |
|---|---|
| `synthbanshee/tts/mixer.py` | Add `_apply_lombard_tilt()` and `_highshelf_biquad()`; extend `mix_sequential` segment tuple to 6 elements; apply tilt after RMS gain |
| `synthbanshee/tts/renderer.py` | Thread `turn.intensity` into the segment tuple |
| `tests/unit/test_mixer.py` | Add `TestLombardTilt` and `TestLombardInMixer` classes; update existing 5-tuples to 6-tuples |
| `tests/unit/test_generation_metadata.py` | |
| `tests/integration/test_multi_speaker.py` | |
| `docs/audio_generation_v3_design.md` | |

Test plan

- `pytest`, full suite: 1698 passed
- `ruff check`: passes (incl. pre-commit `ruff format`)
- `mypy`: passes on changed files
- `test_i5_boosts_high_frequencies`
- `test_i5_boost_exceeds_i4`
- I1–I3 and `None` are bit-exact passthrough (`test_low_intensity_passes_through`, `test_none_intensity_passes_through`)
- `test_low_band_largely_preserved`
- `test_i5_does_not_clip_typical_signal`

Out of scope

- `augment/preprocessing.py` is untouched (the tilt is a TTS-output characteristic).

🤖 Generated with Claude Code