Discovered while re-rendering sp_neu_a_0001 from its scene config on current main for #78.
Evidence
Same scene config (configs/scenes/she_proves/sp_neu_a_0001.yaml, 12 turns, intensity arc [1,1,1,2,1], AGG_M_30-45_001 + VIC_F_25-40_002), rendered before and after the prosody changes that landed in late April / early May:
| Date | Code path | Duration | Vs scene target_duration_minutes: 3.0 (180 s) | Vs She-Proves window (3–6 min) |
|---|---|---|---|---|
| 2026-04-15 | pre-M15 / pre-#70 / pre-#68 | 121.0 s | 33% under target | below lower bound |
| 2026-05-05 | current main | 197.8 s | 10% over target | within range |
The new duration is actually closer to the spec target than the old one. But the +63% delta is large enough that downstream code that estimated wall-cost or filename-budget on Tier A clips against the old duration regime is now off.
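The percentages above can be re-derived from the two measured durations. The following is a minimal sketch: the function name `duration_report` and the constants are illustrative and are not taken from the codebase.

```python
TARGET_S = 3.0 * 60          # target_duration_minutes: 3.0 from the scene config
WINDOW_S = (3 * 60, 6 * 60)  # She-Proves window, 3 to 6 minutes


def duration_report(duration_s: float) -> dict:
    """Compare one clip duration against the target and the allowed window."""
    vs_target_pct = (duration_s - TARGET_S) / TARGET_S * 100
    if duration_s < WINDOW_S[0]:
        window = "below lower bound"
    elif duration_s > WINDOW_S[1]:
        window = "above upper bound"
    else:
        window = "within range"
    return {"vs_target_pct": round(vs_target_pct, 1), "window": window}


old, new = 121.0, 197.8
print(duration_report(old))
print(duration_report(new))
# Relative change between the two renders, measured against the old duration.
print(f"delta: {new - old:+.1f} s ({(new - old) / old * 100:+.0f}%)")
```

Any downstream budget that hard-codes a duration could be checked the same way against both regimes.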
Likely cause (not yet confirmed)
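Cumulative effect of recent prosody work, in rough order of suspected magnitude:
- … style_map.
- … <break> tags to prevent Hebrew word merging. Each break adds a small pause; for ~100 Hebrew words per turn over 12 turns, even a 50 ms median break adds ~60 s to the clip.
A controlled bisect on the same scene config (cached LLM script + cached SSML where possible, only the rendering path changing) will identify the dominant contributor cheaply.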
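One suspected contributor is the pause added by per-word break tags. As a rough plausibility check before running the bisect, the sketch below multiplies out an assumed ~100 words per turn, 12 turns, and a 50 ms median break, with one break per word. All three inputs are estimates, and `added_pause_seconds` is a hypothetical helper, not project code.

```python
def added_pause_seconds(words_per_turn: int, turns: int, break_ms: float) -> float:
    """Total pause time if one break of break_ms follows every word (assumption)."""
    return words_per_turn * turns * break_ms / 1000


estimate = added_pause_seconds(words_per_turn=100, turns=12, break_ms=50)
observed_delta = 197.8 - 121.0  # current main minus the 2026-04-15 render

print(f"estimated added pause: {estimate:.1f} s")
print(f"observed delta:        {observed_delta:.1f} s")
print(f"share of delta:        {estimate / observed_delta:.0%}")
```

Under these assumed inputs the breaks alone would account for about 60 s of the 76.8 s delta, which is why the bisect is still needed to confirm the ordering.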
Why this is not a #78 dependency
#78 is purely about loudness (peak / RMS). Duration is independent. But both regressions came in the same window of PRs and surface together when re-rendering Tier A clips, so it's worth tracking the duration delta separately so the loudness fix doesn't accidentally take ownership of "why are clips longer now."
Decision needed
Two questions:
- Is the new duration regime intentional? Per CLAUDE.md, She-Proves clips should be 3–6 min. 198 s = 3.3 min satisfies the lower bound; 121 s did not. If the team agrees the new duration is correct, this issue closes as "investigate, document, no code change."
- Does anything downstream budget on duration? E.g. label-generator phase boundaries, augmentation event placement, M17 evaluation runtime estimates. If yes, those budgets need to be re-derived against current TTS output.
Reproduction
.venv/bin/synthbanshee generate -c configs/scenes/she_proves/sp_neu_a_0001.yaml \
-o /tmp/duration_repro -p she_proves
.venv/bin/python -c "import soundfile as sf; print(sf.info('/tmp/duration_repro/agg_m_30-45_001/sp_neu_a_0001_00.wav'))"
Old reference at data/m2a_wettest/agg_m_30-45_001/sp_neu_a_0001_00.wav is 121.0 s.
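To compare the fresh render against the old reference in one step, something like the following could be used. It is a sketch that assumes both files exist at the paths given above and that `soundfile` is installed in the venv; `duration_seconds` and `describe_delta` are hypothetical helper names.

```python
from pathlib import Path


def duration_seconds(path: Path) -> float:
    """Return the clip duration in seconds as reported by soundfile."""
    import soundfile as sf  # imported lazily so describe_delta works without it

    return sf.info(str(path)).duration


def describe_delta(old_s: float, new_s: float) -> str:
    """Format the absolute and relative change between two durations."""
    delta = new_s - old_s
    return f"{old_s:.1f} s -> {new_s:.1f} s ({delta:+.1f} s, {delta / old_s:+.0%})"


old = Path("data/m2a_wettest/agg_m_30-45_001/sp_neu_a_0001_00.wav")
new = Path("/tmp/duration_repro/agg_m_30-45_001/sp_neu_a_0001_00.wav")

if old.exists() and new.exists():  # skip quietly when the clips are not on disk
    print(describe_delta(duration_seconds(old), duration_seconds(new)))
```

Run it from the repository root so the relative reference path resolves.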