
fix(config): halve pitch escalation at I4–I5 to eliminate helium effect #68

Merged
shaypal5 merged 2 commits into main from fix/pitch-escalation-i4-i5 on May 3, 2026



@shaypal5 shaypal5 commented May 3, 2026

Closes #64.

Summary

  • A listening test revealed that VIC female pitch at I5 reached +6 st (≈36%, taking 1 st ≈ 6%), far exceeding the M15 research consensus of +4–10%. Root cause: a semitone/percent unit conversion error in the M15 calibration.
  • Halves pitch_delta_st across all 10 speaker YAMLs (6 female VIC/SW, 4 male AGG/BEN)
  • Reduces SpeakerState pitch drift targets so accumulated cross-turn pitch doesn't compound the problem
  • Tightens MAX_F0_DRIFT_ST from 2.0 → 1.5 semitones
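
To make the unit mismatch concrete, here is a minimal sketch of converting between percent change in F0 and semitones, assuming standard equal-temperament semitones (12 per octave). The function names are illustrative and are not taken from the SynthBanshee codebase.

```python
import math


def percent_to_semitones(percent: float) -> float:
    """Convert a relative F0 change in percent to semitones (12 per octave)."""
    return 12.0 * math.log2(1.0 + percent / 100.0)


def semitones_to_percent(semitones: float) -> float:
    """Convert a pitch shift in semitones to a relative F0 change in percent."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


# The research range quoted in this PR (+4% to +10%), expressed in semitones.
low_st = percent_to_semitones(4.0)
high_st = percent_to_semitones(10.0)
print(f"+4% to +10% is about +{low_st:.2f} to +{high_st:.2f} st")

# The old I5 value (+6 st), expressed in percent with the exact formula.
print(f"+6 st is about +{semitones_to_percent(6.0):.1f}%")
```

With the exact formula, +4–10% corresponds to roughly +0.68 to +1.65 st, and +6 st to about +41%. The 36% figure quoted in this PR comes from the linear rule of thumb 1 st ≈ 6%, which slightly understates larger shifts.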

Before / After (VIC female, 210 Hz baseline, I5 after 3 escalating turns)

| Component | Before | After |
| --- | --- | --- |
| style_map pitch_delta_st | +6 st (36%) | +3 st (18%) |
| SpeakerState drift | +0.9 st | +0.6 st |
| Total pitch shift | +6.9 st → 297 Hz | +3.6 st → 255 Hz |
| Perceptual | Helium / oompa-loompa | Within natural female range |
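
The Hz figures above appear to follow from the linear approximation 1 st ≈ 6% applied to the 210 Hz baseline. The sketch below reproduces them and, for comparison, also computes the exact equal-temperament result. Both helper names are hypothetical and serve only to illustrate the arithmetic.

```python
def shifted_f0_linear(baseline_hz: float, shift_st: float,
                      percent_per_st: float = 6.0) -> float:
    """Approximate shifted F0 using the rule of thumb 1 st ~ 6%."""
    return baseline_hz * (1.0 + shift_st * percent_per_st / 100.0)


def shifted_f0_exact(baseline_hz: float, shift_st: float) -> float:
    """Shifted F0 using the exact equal-temperament ratio 2 ** (st / 12)."""
    return baseline_hz * 2.0 ** (shift_st / 12.0)


BASELINE_HZ = 210.0  # VIC female baseline from the table

for label, total_st in (("before", 6.9), ("after", 3.6)):
    linear = shifted_f0_linear(BASELINE_HZ, total_st)
    exact = shifted_f0_exact(BASELINE_HZ, total_st)
    print(f"{label}: +{total_st} st -> {linear:.0f} Hz (linear), {exact:.0f} Hz (exact)")
```

Under the exact formula the shifted pitch is somewhat higher (about 313 Hz before and 259 Hz after), so the perceptual conclusion in the table is unchanged.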

Changes per file

| File | Change |
| --- | --- |
| configs/examples/speaker_VIC_F_25-40_002.yaml | I1:-2→0, I2:-1→0, I3:+2→+1, I4:+4→+2, I5:+6→+3 |
| configs/examples/speaker_VIC_F_25-40_003.yaml | I1:-1→0, I4:+3→+2, I5:+5→+3 |
| configs/speakers/speaker_VIC_F_25-40_004.yaml | I1:-2→0, I2:-1→0, I3:+2→+1, I4:+4→+2, I5:+6→+3 |
| configs/examples/speaker_SW_F_30-45_001.yaml | I5:+5→+3 |
| configs/speakers/speaker_SW_F_30-45_002.yaml | I4:+3→+2, I5:+5→+3 |
| configs/speakers/speaker_SW_F_30-45_003.yaml | I1:-2→0, I2:-1→0, I3:+2→+1, I4:+4→+2, I5:+6→+3 |
| configs/examples/speaker_AGG_M_30-45_001.yaml | I5:+3→+2 |
| configs/examples/speaker_BEN_M_40-55_003.yaml | I5:+3→+2 |
| configs/speakers/speaker_BEN_M_40-55_004.yaml | I5:+3→+2 |
| configs/speakers/speaker_BEN_M_40-55_005.yaml | I5:+3→+2 |
| synthbanshee/tts/speaker_state.py | AGG/VIC pitch drift targets reduced; MAX_F0_DRIFT_ST 2.0→1.5 |
| tests/unit/test_speaker_state.py | Updated drift bound assertions |

Test plan

  • All 1435 tests pass
  • ruff check — passed
  • ruff format — passed
  • mypy — passed
  • Re-generate listening test clips after merge to verify improvement

🤖 Generated with Claude Code

…ct (#64)

Listening test (2026-05-03) revealed cartoonish pitch at high intensity:
VIC female reached +6 st (=36%) at I5, far exceeding the M15 consensus
range of +4–10%. The discrepancy was a unit conversion error — research
ranges are in percent but style_map values are in semitones (1 st ≈ 6%).

Changes:
- Speaker YAMLs: cap female VIC/SW pitch_delta_st to +3 st at I5 (was +5/+6)
- Speaker YAMLs: cap male AGG/BEN pitch_delta_st to +2 st at I5 (was +3)
- SpeakerState: reduce AGG pitch drift targets (I4: 1.5→1.0, I5: 2.0→1.5)
- SpeakerState: reduce VIC pitch drift targets (I4: 0.8→0.5, I5: 1.0→0.7)
- MAX_F0_DRIFT_ST: tighten from 2.0 to 1.5 semitones
- Update unit tests for new drift bound

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
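
As a rough illustration of how per-role drift targets and a global drift cap might interact, here is a minimal sketch. Only the numeric targets and the `MAX_F0_DRIFT_ST` value come from this commit message. The dictionary layout, the `next_drift` function, the `rate` parameter, and the zero target for lower intensities are assumptions and do not reflect the actual `SpeakerState` implementation.

```python
MAX_F0_DRIFT_ST = 1.5  # new cap from this PR (was 2.0)

# Drift targets in semitones at I4 and I5 (values from the commit message).
# The dict structure itself is hypothetical.
PITCH_DRIFT_TARGET_ST = {
    "AGG": {4: 1.0, 5: 1.5},
    "VIC": {4: 0.5, 5: 0.7},
}


def next_drift(current_st: float, role: str, intensity: int,
               rate: float = 0.3) -> float:
    """Move accumulated drift a fraction of the way toward the target, then clamp.

    Intensities without a listed target are assumed to pull drift back to 0.
    """
    target = PITCH_DRIFT_TARGET_ST.get(role, {}).get(intensity, 0.0)
    updated = current_st + rate * (target - current_st)
    return max(-MAX_F0_DRIFT_ST, min(MAX_F0_DRIFT_ST, updated))


# Accumulate drift over three escalating turns for a VIC speaker.
drift = 0.0
for intensity in (3, 4, 5):
    drift = next_drift(drift, "VIC", intensity)
    print(f"I{intensity}: drift = {drift:+.2f} st")
```

In a scheme like this, lowering the targets limits how far drift can climb during escalation, while the cap bounds the accumulated value regardless of how the targets are configured.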
@shaypal5 added the type: fix (Bug fix), comp: tts (TTS rendering, SSML, Azure/Google providers), and comp: config (Pydantic config models (scene, speaker, run)) labels on May 3, 2026
Copilot AI review requested due to automatic review settings May 3, 2026 20:38

… stale comments

Self-review follow-up for #64:
- Female VIC/SW I5 pitch_delta_st reduced from +3 st (~18%) to +2 st (~12%),
  bringing it closer to the M15 consensus range of 4–10%
- SW_001/SW_002 I1 pitch corrected from -1 st (-6%) to 0 (within -3% to +2%)
- Added pitch clamp (±12 st) in renderer.py to prevent unbounded drift when
  speaker_state + randomization stack up
- Fixed stale docstring in f0_drift_exceeded (2.0 → 1.5)
- Fixed stale comment in VIC_003 about old pitch range

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
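
The ±12 st pitch clamp described in this commit can be sketched as follows. This is an illustrative guess at the shape of such a clamp, not the code in `renderer.py`. The function name, its parameters, and the assumption that the three contributions are simply summed are all hypothetical.

```python
MAX_TOTAL_PITCH_ST = 12.0  # clamp bound mentioned in the commit message


def clamp_total_pitch(style_delta_st: float, drift_st: float,
                      jitter_st: float = 0.0,
                      limit_st: float = MAX_TOTAL_PITCH_ST) -> float:
    """Sum the pitch contributions and clamp the result to [-limit_st, +limit_st]."""
    total = style_delta_st + drift_st + jitter_st
    return max(-limit_st, min(limit_st, total))


# Typical case: well inside the bound, so the sum passes through unchanged.
print(clamp_total_pitch(2.0, 0.7, 0.1))

# Pathological case: stacked contributions are limited to the bound.
print(clamp_total_pitch(10.0, 1.5, 3.0))
```

A clamp this wide acts as a safety net rather than a tuning knob: with the values in this PR the sum stays far below one octave, so the clamp should only trigger if a misconfigured YAML or extreme randomization stacks up.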

github-actions Bot commented May 3, 2026

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #68 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25290315136 attempt 1
Comment timestamp: 2026-05-03T20:45:38.956618+00:00
PR head commit: 307f1297b7c45bf02c23032983ec820bea572214


Development

Successfully merging this pull request may close these issues.

bug(tts): pitch escalation at I4–I5 is cartoonish (helium effect)
