fix(mixer): #65 Lombard spectral tilt at I4–I5 by shaypal5 · Pull Request #74 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-04T07:42:28Z

Closes #65.

Problem

At I4–I5 the volume increase was applied via pure amplitude scaling
(StyleEntry.volume_delta_db + per-turn RMS gain in the mixer). Both are
amplitude-only operations — they don't change the spectral envelope, so the
result sounds like the speaker moved closer to the microphone, not like they
raised their voice. Azure he-IL voices don't honour <mstts:express-as>
(disabled in M14), so SSML can't request a shouting style; the fix has to be
a post-TTS DSP step.

Solution

Add a Lombard-effect spectral tilt as a post-TTS step in SceneMixer.mix_sequential:

An RBJ high-shelf biquad at 2.5 kHz, slope S=1.
+2.0 dB at I4, +3.5 dB at I5; no-op for I1–I3 (and intensity=None).
Runs after per-turn RMS gain (so the boost shapes the already-loud signal)
and before the M14 edge fades.
The intensity is threaded through the mix_sequential segment tuple as a
6th element (int | None).

No clipping is performed inside the shelf — preprocessing's peak limiter
handles ceiling enforcement, consistent with _apply_rms_gain.

Changes per file

File	Change
`synthbanshee/tts/mixer.py`	Add `_apply_lombard_tilt()` and `_highshelf_biquad()`; extend `mix_sequential` segment tuple to 6 elements; apply tilt after RMS gain
`synthbanshee/tts/renderer.py`	Pass `turn.intensity` into the segment tuple
`tests/unit/test_mixer.py`	New `TestLombardTilt` and `TestLombardInMixer` classes; update existing 5-tuples to 6-tuples
`tests/unit/test_generation_metadata.py`	Update segment tuples to 6 elements
`tests/integration/test_multi_speaker.py`	Update segment tuples to 6 elements
`docs/audio_generation_v3_design.md`	New §4.2c documenting the Lombard tilt

Test plan

pytest — full suite: 1698 passed
ruff check — passes (incl. pre-commit ruff format)
mypy — passes on changed files
Spectral verification: I5 raises HF/LF (>2.5 kHz vs. <2.5 kHz) energy ratio by ≥ 50 % vs. baseline white noise (test_i5_boosts_high_frequencies)
Monotonicity: I5 boost > I4 boost (test_i5_boost_exceeds_i4)
No regression at low intensity: I1, I2, I3, and None are bit-exact passthrough (test_low_intensity_passes_through, test_none_intensity_passes_through)
Low-frequency content preserved (test_low_band_largely_preserved)
No clipping introduced for typical-amplitude turns (test_i5_does_not_clip_typical_signal)

Out of scope

Formant shifting (more complex; deferred per issue bug(tts): volume increase sounds like mic proximity, not raised voice #65).
SSML / TTS parameter changes.
Preprocessing changes (augment/preprocessing.py is untouched — the tilt is a TTS-output characteristic).

🤖 Generated with Claude Code

Volume increases at high intensity were applied via pure amplitude scaling (SSML volume_delta_db + per-turn RMS gain), which sounds like microphone proximity rather than a raised voice. Real shouting boosts high-frequency energy (Lombard effect). Add a post-TTS RBJ high-shelf biquad at 2.5 kHz applied in SceneMixer.mix_sequential after RMS gain: +2.0 dB at I4, +3.5 dB at I5, no-op for I1–I3. The intensity is threaded through the segment tuple as a 6th element. I1–I3 turns are bit-exact unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a post-TTS Lombard-style spectral tilt in the mixer so high-intensity turns (I4/I5) sound more like raised speech, which fits the existing TTS pipeline where Azure he-IL style tags are unavailable and scene-level shaping happens after per-turn rendering.

Changes:

Adds a 2.5 kHz RBJ high-shelf in SceneMixer and applies it for I4/I5 after RMS gain.
Threads turn.intensity through TTSRenderer.render_scene() into mixer segment data.
Updates mixer/metadata/integration tests and design docs for the new intensity-aware mixing step.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`tests/unit/test_mixer.py`	Updates segment tuples and adds Lombard tilt unit/integration tests.
`tests/unit/test_generation_metadata.py`	Updates mixer segment fixtures for the new tuple shape.
`tests/integration/test_multi_speaker.py`	Updates helper scene construction to pass intensity slot through mixer tuples.
`synthbanshee/tts/renderer.py`	Extends mixer segment payload to include `turn.intensity`.
`synthbanshee/tts/mixer.py`	Implements the high-shelf filter and applies it during sequential mixing.
`docs/audio_generation_v3_design.md`	Documents the new Lombard spectral-tilt design and placement in the pipeline.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…uted biquad Apply senior-dev self-review feedback to the #65 Lombard PR before merging: - Replace the 6-element segment tuple with a Segment dataclass. Named fields remove transposition risk and make the call sites self-documenting; the positional init keeps test boilerplate small. - Drop intensity: int | None in favour of intensity: int (default 1 = no boost). None silently skipping the effect was a footgun if a future call site forgot to wire intensity through. - Pre-compute the two RBJ high-shelf biquad coefficient sets at module load. The trig was previously recomputed for every I4/I5 segment. - Drop the misleading sample_rate=_TARGET_SR default from _apply_lombard_tilt; the function only operates on already-resampled mixer output. - Tighten the spectral test: split HF/LF energy at 3.5 kHz (above the shelf knee, in the asymptotic-gain region) instead of at the 2.5 kHz corner. - Add a regression test for out-of-range intensities (-1, 0, 6, 99) so the no-op contract is explicit. - Skip 50 ms instead of ~12 ms when measuring low-band preservation, so the filter startup transient doesn't bias the assertion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 · 2026-05-04T07:55:07Z

Self-review pass — pushed `d0649ba`

Did a critical self-review and applied the following fixes before requesting merge:

API

Replaced the 6-element segment tuple with a Segment dataclass. Named fields remove transposition risk; positional init keeps test call sites tight.
Dropped intensity: int | None in favour of intensity: int (default 1 = no boost). None silently skipping the effect was a footgun if a future call site forgot to wire intensity through.

Performance

Precomputed the two RBJ biquad coefficient sets at module load via _precompute_lombard_coeffs(). The trig was previously recomputed for every I4/I5 segment in the hot loop.

Cleanups

Dropped the misleading sample_rate=_TARGET_SR default from _apply_lombard_tilt — the function only operates on already-resampled mixer output.
Clarified the docstring: shelf gain is asymptotic (full +2.0/+3.5 dB above the knee, half-gain at the 2.5 kHz corner).

Tests

Tightened spectral assertion: HF/LF split at 3500 Hz (above the shelf knee, in the asymptotic region) instead of at the 2500 Hz corner where the response is in transition. Threshold tuned to 1.5× — close to the analytically expected 1.7× and well above noise.
Added regression for out-of-range intensities (-1, 0, 6, 99) so the no-op contract is explicit.
Skip 50 ms (not 12 ms) when measuring low-band preservation so the filter startup transient doesn't bias the assertion.

Verified

pytest: 1698 passed
ruff: clean (incl. pre-commit ruff format)
mypy: clean

) Addresses Copilot review feedback on PR #74: the existing mixer-level tests would not catch a regression where render_scene stops forwarding each turn's intensity into the Segment list (e.g. accidental intensity=1 hard-code at line 387). Verified by injecting that exact regression locally — the new test fails with the expected diff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 · 2026-05-04T08:04:27Z

Addressing Copilot's two review comments in 81078ef:

mixer.py — breaking API change: not a true backward-compatibility concern. mix_sequential() is internal — the only production caller is TTSRenderer.render_scene(), which is updated in this PR. Tests are also updated. After the self-review refactor (d0649ba) the input is now a Segment dataclass rather than a tuple, so a stale 5-tuple caller would fail with a clear TypeError: Segment.__init__() missing arguments (not the cryptic unpack error Copilot called out). Resolving as out-of-scope — there are no external programmatic callers of this internal helper.

renderer.py:384 — no test for turn.intensity plumbing: legitimate gap. Added test_render_scene_forwards_turn_intensity_to_mixer in tests/unit/test_tts.py — it spies on SceneMixer.mix_sequential, runs three turns at intensities [1, 5, 4], and asserts the resulting Segment list carries those exact values in order. Verified the test catches the exact regression Copilot warned about: locally hard-coding intensity=1 in renderer.py:387 makes the test fail with assert [1, 1, 1] == [1, 5, 4]. Reverted the injection; pushing the test only.

github-actions · 2026-05-04T08:06:38Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #74 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25307975324 attempt 1
Comment timestamp: 2026-05-04T08:05:49.220983+00:00
PR head commit: 81078effaff146b6369628cf81b79bc67283ef53

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a 4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py` with six named fields (`wav_bytes`, `amount_s`, `speaker_id`, `rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong since at least M8a (added `mix_mode`); #74's Lombard tilt then added `intensity`. Per the dataclass docstring, the named-fields move from positional tuple was deliberate so call sites and reviewers can't transpose args silently. Surfaced during PR #96 review (delete-.agent-plan.md) — the original .agent-plan.md and AGENTS.md disagreed about the segment API; turns out they were both stale, and the design-doc tracker (#74 row line 219) shows the dataclass move that neither AGENTS.md nor .agent-plan.md caught. Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of docs/spec.md that ingest --changed picked up) is intentionally NOT in this commit — it's 16 state/wiki/planning files of churn, scoped for a dedicated splendor maintenance follow-up rather than mixed into a doc fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot's review of commit b20eaac caught a cross-reference convention break: the codebase consistently uses #65 (the issue) when referencing the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84, 88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c (line 215). PR #74 is the implementation that closed the issue, but the cross-link convention is to point at the issue. One-character fix: #74 → #65 in the M3a TTS bullet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ft (#96) * docs: sync .agent-plan.md to current state (M16 done, M17 in flight) The state tracker was last updated 2026-04-22 and claimed the active task was M8b. Since then M8b, M9a, M9b, M10a, M10b, M11, M13, M14, M15, M16 have all merged, and M17 (automated evaluation) has had its design (#73), Phase A spike (#77, #79), and a wave of bug-fix PRs (#82, #85, #86, #90) land — anyone reading this file got an 8-milestone-stale picture. Rewrote: - Current system state — list every merged V3 milestone with one-line summaries; promoted the loudness contract (#78) and effective-prosody cap (#87) to the architectural-invariants list since both are load-bearing for any agent editing TTS or preprocessing code. - Active / next task — replaced "M8b" with the M17 ASR regression thread, noted PR #90 as the partial fix and #91 (rate-floor lift R) as the queued next experiment, including the WER ≤ 0.10 + listening-test pass criterion. - Open threads table — new section listing the threads agents most often need to know about (M12 gate, M17 full automation, #62 word merging, #72 SSML parse, #88 CI ASR deferral). - Context pointers — added splendor-brief as the orientation entrypoint and the .venv-vs-~/.local PATH trap. - CI / Workflow notes — added the Tier-3 ASR local-only policy summary so PRs touching audio-rendering files don't merge without it. This file is a quick-orientation summary — added a header line marking the design-doc tracker as authoritative when details disagree, so the next drift gets caught in a tracker diff rather than a stale summary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: delete .agent-plan.md (Option A; supersedes prior commit) Self-review of commit 4361497 (the .agent-plan.md rewrite) caught that the file fundamentally fails the duplication test: milestone status duplicates docs/audio_generation_v3_design.md → Implementation Tracker; open-thread state duplicates GitHub Issues + splendor brief; pointers duplicate AGENTS.md; and the only non-duplicate section (architectural invariants) is already covered in AGENTS.md (FLOAT subtype line 72, MixedScene shift line 37, validate_audio peak line 85, full #78 loudness contract line 35, full #87 effective-prosody cap line 73). The rewrite also reproduced the "if details disagree, the tracker wins" disclaimer antipattern — the same one PR #94's review removed from docs/spec.md §3.1 just one PR ago. A docs file whose explicit charter is "I am allowed to be wrong relative to the canonical source" is structured drift bait. The original .agent-plan.md got 15 days stale; "rewrite more carefully" was the wrong response. Changes: - Delete .agent-plan.md (75-line summary that duplicated load-bearing state held authoritatively elsewhere). - Update .claude/skills/open-feature-pr.md step 2 to drop the ".agent-plan.md" fallback for milestone-ID inference; the branch name + parent issue's milestone field already cover it, and pointing at the design-doc tracker as authoritative is more durable than pointing at a manually-maintained summary. - Splendor maintenance: source forget src-9d9759e5ad... --apply. Removes the orphan source manifest, wiki summary page, and wiki index entry. Residual cross-references in 5 planning tasks and 5 wiki pages remain — splendor surfaces them but doesn't auto-clean; they'll regenerate on next ingest of those sources. Nothing added to AGENTS.md: the cross-cutting rules from .agent-plan.md are already there. The remaining "invariants" (5-tuple mixer API post-M8a, audible_* timeline use, MixMode no-audio-deps, _peak_limit vs _normalize_peak naming) are mixer-internal details that belong in module docstrings, not global agent rules — and AGENTS.md has its own M8a drift bug (line 71 still says "4-tuple") that's better fixed in a dedicated PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(agents): fix M3a/M8a/#74 drift in mixer Segment API description AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a 4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py` with six named fields (`wav_bytes`, `amount_s`, `speaker_id`, `rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong since at least M8a (added `mix_mode`); #74's Lombard tilt then added `intensity`. Per the dataclass docstring, the named-fields move from positional tuple was deliberate so call sites and reviewers can't transpose args silently. Surfaced during PR #96 review (delete-.agent-plan.md) — the original .agent-plan.md and AGENTS.md disagreed about the segment API; turns out they were both stale, and the design-doc tracker (#74 row line 219) shows the dataclass move that neither AGENTS.md nor .agent-plan.md caught. Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of docs/spec.md that ingest --changed picked up) is intentionally NOT in this commit — it's 16 state/wiki/planning files of churn, scoped for a dedicated splendor maintenance follow-up rather than mixed into a doc fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(agents): cross-reference Lombard tilt as #65 (issue), not #74 (PR) Copilot's review of commit b20eaac caught a cross-reference convention break: the codebase consistently uses #65 (the issue) when referencing the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84, 88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c (line 215). PR #74 is the implementation that closed the issue, but the cross-link convention is to point at the issue. One-character fix: #74 → #65 in the M3a TTS bullet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 added bugfix type: fix Bug fix comp: tts TTS rendering, SSML, Azure/Google providers comp: mixer SceneMixer, MixMode, gap controller labels May 4, 2026

Copilot AI review requested due to automatic review settings May 4, 2026 07:42

shaypal5 added bugfix type: fix Bug fix comp: tts TTS rendering, SSML, Azure/Google providers comp: mixer SceneMixer, MixMode, gap controller labels May 4, 2026

Copilot started reviewing on behalf of shaypal5 May 4, 2026 07:43 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 4, 2026

View reviewed changes

Comment thread synthbanshee/tts/mixer.py Outdated

Comment thread synthbanshee/tts/renderer.py

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings May 4, 2026 08:04

Copilot started reviewing on behalf of shaypal5 May 4, 2026 08:04 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

Comment thread synthbanshee/tts/mixer.py

shaypal5 merged commit 4d4df10 into main May 4, 2026
13 of 14 checks passed

shaypal5 deleted the fix/lombard-tilt-65 branch May 4, 2026 08:17

shaypal5 mentioned this pull request May 7, 2026

docs: delete .agent-plan.md + fix AGENTS.md M3a/M8a/#74 mixer-API drift #96

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mixer): #65 Lombard spectral tilt at I4–I5#74

fix(mixer): #65 Lombard spectral tilt at I4–I5#74
shaypal5 merged 3 commits into
mainfrom
fix/lombard-tilt-65

shaypal5 commented May 4, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

shaypal5 commented May 4, 2026

Uh oh!

This comment has been minimized.

shaypal5 commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 4, 2026

Problem

Solution

Changes per file

Test plan

Out of scope

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

shaypal5 commented May 4, 2026

Self-review pass — pushed d0649ba

API

Performance

Cleanups

Tests

Verified

Uh oh!

This comment has been minimized.

shaypal5 commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Self-review pass — pushed `d0649ba`