Skip to content

fix(mixer): #65 Lombard spectral tilt at I4–I5#74

Merged
shaypal5 merged 3 commits into
mainfrom
fix/lombard-tilt-65
May 4, 2026
Merged

fix(mixer): #65 Lombard spectral tilt at I4–I5#74
shaypal5 merged 3 commits into
mainfrom
fix/lombard-tilt-65

Conversation

@shaypal5
Copy link
Copy Markdown
Member

@shaypal5 shaypal5 commented May 4, 2026

Closes #65.

Problem

At I4–I5 the volume increase was applied via pure amplitude scaling
(StyleEntry.volume_delta_db + per-turn RMS gain in the mixer). Both are
amplitude-only operations — they don't change the spectral envelope, so the
result sounds like the speaker moved closer to the microphone, not like they
raised their voice. Azure he-IL voices don't honour <mstts:express-as>
(disabled in M14), so SSML can't request a shouting style; the fix has to be
a post-TTS DSP step.

Solution

Add a Lombard-effect spectral tilt as a post-TTS step in SceneMixer.mix_sequential:

  • An RBJ high-shelf biquad at 2.5 kHz, slope S=1.
  • +2.0 dB at I4, +3.5 dB at I5; no-op for I1–I3 (and intensity=None).
  • Runs after per-turn RMS gain (so the boost shapes the already-loud signal)
    and before the M14 edge fades.
  • The intensity is threaded through the mix_sequential segment tuple as a
    6th element (int | None).

No clipping is performed inside the shelf — preprocessing's peak limiter
handles ceiling enforcement, consistent with _apply_rms_gain.

Changes per file

File Change
synthbanshee/tts/mixer.py Add _apply_lombard_tilt() and _highshelf_biquad(); extend mix_sequential segment tuple to 6 elements; apply tilt after RMS gain
synthbanshee/tts/renderer.py Pass turn.intensity into the segment tuple
tests/unit/test_mixer.py New TestLombardTilt and TestLombardInMixer classes; update existing 5-tuples to 6-tuples
tests/unit/test_generation_metadata.py Update segment tuples to 6 elements
tests/integration/test_multi_speaker.py Update segment tuples to 6 elements
docs/audio_generation_v3_design.md New §4.2c documenting the Lombard tilt

Test plan

  • pytest — full suite: 1698 passed
  • ruff check — passes (incl. pre-commit ruff format)
  • mypy — passes on changed files
  • Spectral verification: I5 raises HF/LF (>2.5 kHz vs. <2.5 kHz) energy ratio by ≥ 50 % vs. baseline white noise (test_i5_boosts_high_frequencies)
  • Monotonicity: I5 boost > I4 boost (test_i5_boost_exceeds_i4)
  • No regression at low intensity: I1, I2, I3, and None are bit-exact passthrough (test_low_intensity_passes_through, test_none_intensity_passes_through)
  • Low-frequency content preserved (test_low_band_largely_preserved)
  • No clipping introduced for typical-amplitude turns (test_i5_does_not_clip_typical_signal)

Out of scope

🤖 Generated with Claude Code

Volume increases at high intensity were applied via pure amplitude scaling
(SSML volume_delta_db + per-turn RMS gain), which sounds like microphone
proximity rather than a raised voice. Real shouting boosts high-frequency
energy (Lombard effect).

Add a post-TTS RBJ high-shelf biquad at 2.5 kHz applied in
SceneMixer.mix_sequential after RMS gain: +2.0 dB at I4, +3.5 dB at I5,
no-op for I1–I3. The intensity is threaded through the segment tuple as a
6th element. I1–I3 turns are bit-exact unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaypal5 shaypal5 added bugfix type: fix Bug fix comp: tts TTS rendering, SSML, Azure/Google providers comp: mixer SceneMixer, MixMode, gap controller labels May 4, 2026
Copilot AI review requested due to automatic review settings May 4, 2026 07:42
@shaypal5 shaypal5 added bugfix type: fix Bug fix comp: tts TTS rendering, SSML, Azure/Google providers comp: mixer SceneMixer, MixMode, gap controller labels May 4, 2026
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a post-TTS Lombard-style spectral tilt in the mixer so high-intensity turns (I4/I5) sound more like raised speech, which fits the existing TTS pipeline where Azure he-IL style tags are unavailable and scene-level shaping happens after per-turn rendering.

Changes:

  • Adds a 2.5 kHz RBJ high-shelf in SceneMixer and applies it for I4/I5 after RMS gain.
  • Threads turn.intensity through TTSRenderer.render_scene() into mixer segment data.
  • Updates mixer/metadata/integration tests and design docs for the new intensity-aware mixing step.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/test_mixer.py Updates segment tuples and adds Lombard tilt unit/integration tests.
tests/unit/test_generation_metadata.py Updates mixer segment fixtures for the new tuple shape.
tests/integration/test_multi_speaker.py Updates helper scene construction to pass intensity slot through mixer tuples.
synthbanshee/tts/renderer.py Extends mixer segment payload to include turn.intensity.
synthbanshee/tts/mixer.py Implements the high-shelf filter and applies it during sequential mixing.
docs/audio_generation_v3_design.md Documents the new Lombard spectral-tilt design and placement in the pipeline.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread synthbanshee/tts/mixer.py Outdated
Comment thread synthbanshee/tts/renderer.py
@github-actions

This comment has been minimized.

…uted biquad

Apply senior-dev self-review feedback to the #65 Lombard PR before merging:

- Replace the 6-element segment tuple with a Segment dataclass.  Named fields
  remove transposition risk and make the call sites self-documenting; the
  positional init keeps test boilerplate small.
- Drop intensity: int | None in favour of intensity: int (default 1 = no
  boost).  None silently skipping the effect was a footgun if a future call
  site forgot to wire intensity through.
- Pre-compute the two RBJ high-shelf biquad coefficient sets at module load.
  The trig was previously recomputed for every I4/I5 segment.
- Drop the misleading sample_rate=_TARGET_SR default from
  _apply_lombard_tilt; the function only operates on already-resampled
  mixer output.
- Tighten the spectral test: split HF/LF energy at 3.5 kHz (above the shelf
  knee, in the asymptotic-gain region) instead of at the 2.5 kHz corner.
- Add a regression test for out-of-range intensities (-1, 0, 6, 99) so the
  no-op contract is explicit.
- Skip 50 ms instead of ~12 ms when measuring low-band preservation, so the
  filter startup transient doesn't bias the assertion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaypal5
Copy link
Copy Markdown
Member Author

shaypal5 commented May 4, 2026

Self-review pass — pushed d0649ba

Did a critical self-review and applied the following fixes before requesting merge:

API

  • Replaced the 6-element segment tuple with a Segment dataclass. Named fields remove transposition risk; positional init keeps test call sites tight.
  • Dropped intensity: int | None in favour of intensity: int (default 1 = no boost). None silently skipping the effect was a footgun if a future call site forgot to wire intensity through.

Performance

  • Precomputed the two RBJ biquad coefficient sets at module load via _precompute_lombard_coeffs(). The trig was previously recomputed for every I4/I5 segment in the hot loop.

Cleanups

  • Dropped the misleading sample_rate=_TARGET_SR default from _apply_lombard_tilt — the function only operates on already-resampled mixer output.
  • Clarified the docstring: shelf gain is asymptotic (full +2.0/+3.5 dB above the knee, half-gain at the 2.5 kHz corner).

Tests

  • Tightened spectral assertion: HF/LF split at 3500 Hz (above the shelf knee, in the asymptotic region) instead of at the 2500 Hz corner where the response is in transition. Threshold tuned to 1.5× — close to the analytically expected 1.7× and well above noise.
  • Added regression for out-of-range intensities (-1, 0, 6, 99) so the no-op contract is explicit.
  • Skip 50 ms (not 12 ms) when measuring low-band preservation so the filter startup transient doesn't bias the assertion.

Verified

  • pytest: 1698 passed
  • ruff: clean (incl. pre-commit ruff format)
  • mypy: clean

@github-actions

This comment has been minimized.

)

Addresses Copilot review feedback on PR #74: the existing mixer-level tests
would not catch a regression where render_scene stops forwarding each
turn's intensity into the Segment list (e.g. accidental intensity=1
hard-code at line 387).  Verified by injecting that exact regression
locally — the new test fails with the expected diff.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 08:04
@shaypal5
Copy link
Copy Markdown
Member Author

shaypal5 commented May 4, 2026

Addressing Copilot's two review comments in 81078ef:

mixer.py — breaking API change: not a true backward-compatibility concern. mix_sequential() is internal — the only production caller is TTSRenderer.render_scene(), which is updated in this PR. Tests are also updated. After the self-review refactor (d0649ba) the input is now a Segment dataclass rather than a tuple, so a stale 5-tuple caller would fail with a clear TypeError: Segment.__init__() missing arguments (not the cryptic unpack error Copilot called out). Resolving as out-of-scope — there are no external programmatic callers of this internal helper.

renderer.py:384 — no test for turn.intensity plumbing: legitimate gap. Added test_render_scene_forwards_turn_intensity_to_mixer in tests/unit/test_tts.py — it spies on SceneMixer.mix_sequential, runs three turns at intensities [1, 5, 4], and asserts the resulting Segment list carries those exact values in order. Verified the test catches the exact regression Copilot warned about: locally hard-coding intensity=1 in renderer.py:387 makes the test fail with assert [1, 1, 1] == [1, 5, 4]. Reverted the injection; pushing the test only.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #74 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25307975324 attempt 1
Comment timestamp: 2026-05-04T08:05:49.220983+00:00
PR head commit: 81078effaff146b6369628cf81b79bc67283ef53

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread synthbanshee/tts/mixer.py
@shaypal5 shaypal5 merged commit 4d4df10 into main May 4, 2026
13 of 14 checks passed
@shaypal5 shaypal5 deleted the fix/lombard-tilt-65 branch May 4, 2026 08:17
shaypal5 added a commit that referenced this pull request May 7, 2026
AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a
4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual
current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py`
with six named fields (`wav_bytes`, `amount_s`, `speaker_id`,
`rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong
since at least M8a (added `mix_mode`); #74's Lombard tilt then added
`intensity`. Per the dataclass docstring, the named-fields move from
positional tuple was deliberate so call sites and reviewers can't
transpose args silently.

Surfaced during PR #96 review (delete-.agent-plan.md) — the original
.agent-plan.md and AGENTS.md disagreed about the segment API; turns out
they were both stale, and the design-doc tracker (#74 row line 219)
shows the dataclass move that neither AGENTS.md nor .agent-plan.md
caught.

Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of
docs/spec.md that ingest --changed picked up) is intentionally NOT in
this commit — it's 16 state/wiki/planning files of churn, scoped for a
dedicated splendor maintenance follow-up rather than mixed into a doc
fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 7, 2026
Copilot's review of commit b20eaac caught a cross-reference convention
break: the codebase consistently uses #65 (the issue) when referencing
the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84,
88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c
(line 215). PR #74 is the implementation that closed the issue, but
the cross-link convention is to point at the issue.

One-character fix: #74#65 in the M3a TTS bullet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 7, 2026
…ft (#96)

* docs: sync .agent-plan.md to current state (M16 done, M17 in flight)

The state tracker was last updated 2026-04-22 and claimed the active task
was M8b. Since then M8b, M9a, M9b, M10a, M10b, M11, M13, M14, M15, M16
have all merged, and M17 (automated evaluation) has had its design (#73),
Phase A spike (#77, #79), and a wave of bug-fix PRs (#82, #85, #86, #90)
land — anyone reading this file got an 8-milestone-stale picture.

Rewrote:
- Current system state — list every merged V3 milestone with one-line
  summaries; promoted the loudness contract (#78) and effective-prosody
  cap (#87) to the architectural-invariants list since both are
  load-bearing for any agent editing TTS or preprocessing code.
- Active / next task — replaced "M8b" with the M17 ASR regression thread,
  noted PR #90 as the partial fix and #91 (rate-floor lift R) as the
  queued next experiment, including the WER ≤ 0.10 + listening-test pass
  criterion.
- Open threads table — new section listing the threads agents most often
  need to know about (M12 gate, M17 full automation, #62 word merging,
  #72 SSML parse, #88 CI ASR deferral).
- Context pointers — added splendor-brief as the orientation entrypoint
  and the .venv-vs-~/.local PATH trap.
- CI / Workflow notes — added the Tier-3 ASR local-only policy summary
  so PRs touching audio-rendering files don't merge without it.

This file is a quick-orientation summary — added a header line marking
the design-doc tracker as authoritative when details disagree, so the
next drift gets caught in a tracker diff rather than a stale summary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: delete .agent-plan.md (Option A; supersedes prior commit)

Self-review of commit 4361497 (the .agent-plan.md rewrite) caught that
the file fundamentally fails the duplication test: milestone status
duplicates docs/audio_generation_v3_design.md → Implementation Tracker;
open-thread state duplicates GitHub Issues + splendor brief; pointers
duplicate AGENTS.md; and the only non-duplicate section
(architectural invariants) is already covered in AGENTS.md (FLOAT
subtype line 72, MixedScene shift line 37, validate_audio peak line 85,
full #78 loudness contract line 35, full #87 effective-prosody cap
line 73).

The rewrite also reproduced the "if details disagree, the tracker wins"
disclaimer antipattern — the same one PR #94's review removed from
docs/spec.md §3.1 just one PR ago. A docs file whose explicit charter
is "I am allowed to be wrong relative to the canonical source" is
structured drift bait. The original .agent-plan.md got 15 days stale;
"rewrite more carefully" was the wrong response.

Changes:
- Delete .agent-plan.md (75-line summary that duplicated load-bearing
  state held authoritatively elsewhere).
- Update .claude/skills/open-feature-pr.md step 2 to drop the
  ".agent-plan.md" fallback for milestone-ID inference; the branch
  name + parent issue's milestone field already cover it, and pointing
  at the design-doc tracker as authoritative is more durable than
  pointing at a manually-maintained summary.
- Splendor maintenance: source forget src-9d9759e5ad... --apply.
  Removes the orphan source manifest, wiki summary page, and wiki
  index entry. Residual cross-references in 5 planning tasks and 5
  wiki pages remain — splendor surfaces them but doesn't auto-clean;
  they'll regenerate on next ingest of those sources.

Nothing added to AGENTS.md: the cross-cutting rules from .agent-plan.md
are already there. The remaining "invariants" (5-tuple mixer API
post-M8a, audible_* timeline use, MixMode no-audio-deps, _peak_limit vs
_normalize_peak naming) are mixer-internal details that belong in
module docstrings, not global agent rules — and AGENTS.md has its own
M8a drift bug (line 71 still says "4-tuple") that's better fixed in a
dedicated PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(agents): fix M3a/M8a/#74 drift in mixer Segment API description

AGENTS.md "TTS" section claimed `SceneMixer.mix_sequential` took a
4-tuple `(wav_bytes, pause_s, speaker_id, rms_target_dbfs)`. The actual
current API is the `Segment` dataclass in `synthbanshee/tts/mixer.py`
with six named fields (`wav_bytes`, `amount_s`, `speaker_id`,
`rms_target_dbfs`, `mix_mode`, `intensity`). The doc has been wrong
since at least M8a (added `mix_mode`); #74's Lombard tilt then added
`intensity`. Per the dataclass docstring, the named-fields move from
positional tuple was deliberate so call sites and reviewers can't
transpose args silently.

Surfaced during PR #96 review (delete-.agent-plan.md) — the original
.agent-plan.md and AGENTS.md disagreed about the segment API; turns out
they were both stale, and the design-doc tracker (#74 row line 219)
shows the dataclass move that neither AGENTS.md nor .agent-plan.md
caught.

Splendor re-ingest of AGENTS.md (and the opportunistic re-ingest of
docs/spec.md that ingest --changed picked up) is intentionally NOT in
this commit — it's 16 state/wiki/planning files of churn, scoped for a
dedicated splendor maintenance follow-up rather than mixed into a doc
fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(agents): cross-reference Lombard tilt as #65 (issue), not #74 (PR)

Copilot's review of commit b20eaac caught a cross-reference convention
break: the codebase consistently uses #65 (the issue) when referencing
the Lombard high-shelf — see synthbanshee/tts/mixer.py lines 6, 80, 84,
88, 108, 183, 322, and docs/audio_generation_v3_design.md §4.2c
(line 215). PR #74 is the implementation that closed the issue, but
the cross-link convention is to point at the issue.

One-character fix: #74#65 in the M3a TTS bullet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix comp: mixer SceneMixer, MixMode, gap controller comp: tts TTS rendering, SSML, Azure/Google providers type: fix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(tts): volume increase sounds like mic proximity, not raised voice

2 participants