feat: per-scene sfx via elevenlabs (milestone a of #25)#26

Merged: cuio merged 1 commit into main from feat/storyline-sfx-elevenlabs on Apr 27, 2026

cuio commented on Apr 27, 2026

First audio-bed retention lever from the SFX/Music + Gemini plan (#25). Per-scene generated sound effects land on the existing SFX lane that PR #20's placeholders wired up.

Summary

| Area | What |
|---|---|
| Backend | `generateSoundEffect()` ElevenLabs client · SFX manifest (append-only, three anchors) · 4 new storyline routes (suggest / generate / get / delete) · assembler emits `<audio data-track-index="3">` per manifest entry |
| Frontend | New 🔊 Add SFX per-card action · two new stacks per card (amber "proposals" + emerald "SFX on lane") · HTML5 audio audition + Remove on each applied entry |
| Tests | 10 new anchor-math cases: scene-start / scene-end (with SFX-longer-than-scene clamp) / accent-word (mid / last / past-last / negative / empty narration fallback) |

How it composes

[🔊 Add SFX] click on a card
        │
        ▼
POST /storyline/sfx-suggest   ← Haiku 4.5, ~$0.001
        │
        ▼ returns 1-3 { prompt, durationS, anchor, rationale }
[Amber proposal stack on the card]
        │ user clicks 🔊 Generate
        ▼
POST /storyline/sfx-generate  ← ElevenLabs Sound Generation, 1 credit
        │
        ▼ writes assets/sfx/<sceneId>-<id>.mp3, appends manifest
[Emerald "SFX on lane" stack — HTML5 audio audition]
        │ next assemble reads the manifest
        ▼
<audio data-track-index="3" data-timeline-group="sfx">  on the SFX lane
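
As a rough illustration of the last step in the flow, the sketch below renders one `<audio>` element per resolved manifest entry. Only `data-track-index="3"` and `data-timeline-group="sfx"` come from this PR; the `ResolvedSfx` shape, the function names, and the `id`, `src`, `data-start`, and `data-duration` attributes are assumptions made for illustration, not the assembler's actual output.

```typescript
// Hypothetical shape: an SFX entry whose anchor has already been resolved
// to an absolute start time (in seconds) on the master timeline.
interface ResolvedSfx {
  id: string;
  src: string; // e.g. a path under assets/sfx/
  startS: number;
  durationS: number;
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a single SFX clip. The track index and timeline group match the
// PR description; the remaining attributes are assumed names.
function renderSfxAudio(sfx: ResolvedSfx): string {
  const attrs = [
    `id="${escapeAttr(sfx.id)}"`,
    `src="${escapeAttr(sfx.src)}"`,
    `data-track-index="3"`,
    `data-timeline-group="sfx"`,
    `data-start="${sfx.startS}"`,
    `data-duration="${sfx.durationS}"`,
  ];
  return `<audio ${attrs.join(" ")}></audio>`;
}

// Render every clip for the SFX lane, one element per line.
function renderSfxLane(entries: ResolvedSfx[]): string {
  return entries.map(renderSfxAudio).join("\n");
}
```

Keeping the renderer a pure string function makes it easy to unit-test without running a full assemble.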

Architecture notes

  1. Manifest, not script.json fields. SFX entries live in assets/sfx/sfx.manifest.json. The assembler reads it at assemble time. Keeps script.json purely about narration + scene visual decisions.
  2. Three anchors, no free-form offsets: scene-start, accent-word, scene-end. Covers ~95% of cinematic usage. Resist adding offsets until users ask.
  3. Soft delete only. DELETE /storyline/sfx/:entryId removes the manifest entry but leaves the audio file on disk, so the user can recover it by hand-editing the manifest JSON.
  4. Suggest and generate are separate routes. Lets the user scan multiple Haiku ideas before paying ElevenLabs credits to generate one.
  5. Cost-tracked end-to-end. Suggest logs script.storyline.sfx.suggest (Haiku); generate logs script.storyline.sfx.generate (ElevenLabs). The CostBadge already aggregates by op.
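
Notes 1 and 3 can be sketched as a few pure helpers over an in-memory manifest. The three anchor names and the `assets/sfx/<sceneId>-<id>.mp3` naming come from this PR; the `SfxManifestEntry` fields and every function name here are hypothetical, and the real `manifest.ts` may be shaped differently.

```typescript
// Anchor names come from the PR; all field names below are assumptions.
type SfxAnchorKind = "scene-start" | "accent-word" | "scene-end";

interface SfxManifestEntry {
  id: string;
  sceneId: string;
  file: string;
  prompt: string;
  durationS: number;
  anchor: SfxAnchorKind;
}

interface SfxManifest {
  entries: SfxManifestEntry[];
}

// File naming follows the assets/sfx/<sceneId>-<id>.mp3 pattern.
function sfxFilePath(sceneId: string, id: string): string {
  return `assets/sfx/${sceneId}-${id}.mp3`;
}

// Append-only write path: returns a new manifest with the entry at the end.
function appendEntry(manifest: SfxManifest, entry: SfxManifestEntry): SfxManifest {
  return { entries: [...manifest.entries, entry] };
}

// Soft delete: drops the manifest entry only. No file is touched here,
// so the mp3 stays on disk and can be re-linked by hand.
function removeEntry(manifest: SfxManifest, entryId: string): SfxManifest {
  return { entries: manifest.entries.filter((e) => e.id !== entryId) };
}

// Group entries by scene, as an assembler might do once per assemble.
function groupByScene(manifest: SfxManifest): Map<string, SfxManifestEntry[]> {
  const grouped = new Map<string, SfxManifestEntry[]>();
  for (const entry of manifest.entries) {
    const bucket = grouped.get(entry.sceneId) ?? [];
    bucket.push(entry);
    grouped.set(entry.sceneId, bucket);
  }
  return grouped;
}
```

Because none of these helpers touch the filesystem, reading and writing `sfx.manifest.json` can stay in a thin wrapper around them.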

Test plan

  • 729 core tests pass (was 719; +10 anchor-math)
  • 281 studio tests pass
  • Lint, format, typecheck clean
  • Live verify: Storyline tab loads 16 scene cards, each has the 🔊 Add SFX button, zero console errors
  • Manual: click 🔊 Add SFX on s04 (the long-narration hook). Confirm 1-3 amber proposals appear with prompt/duration/anchor/rationale.
  • Manual: click 🔊 Generate on one. Confirm the file lands at assets/sfx/s04-sfx-….mp3, an emerald "SFX on lane" row appears with playable audio, and the master HTML's SFX lane shows the new clip on the timeline.
  • Manual: Remove the entry. Confirm the manifest entry is gone but the file stays on disk.

Milestone A of #25 — what's next

| Milestone | When | What |
|---|---|---|
| B Music | next | Multi-scene background music tracks via Eleven v3 Music polled-job pattern |
| C Gemini render review | after B | Analyse the rendered MP4 → structured retention feedback + scroll-risk windows |
| D Image analysis | after C | Auto-detect role / vibe / suggested treatment on upload |
| E Per-scene scroll test | after D | "Would they scroll?" frame-sample prediction with one-change fix proposal |
| F Retention map | last | Horizontal strip at the top of the Storyline tab: green/amber/red squares per scene |

Each subsequent milestone reuses the manifest pattern + applyPatch pipeline that this PR puts in place.

🤖 Generated with Claude Code

The first audio-bed retention lever from the SFX/Music + Gemini plan
(#25). Per-scene generated sound effects landing on the existing SFX
lane that PR #20 placeholders wired.

**Backend** (`packages/core`)

- `elevenlabs/sfx.ts` — `generateSoundEffect(apiKey, prompt, opts)` mirrors
  the existing `synthesize()` shape: returns mp3 bytes + format. Clamps
  duration to 0.5..22 seconds and prompt influence to 0..1. Surfaces `ElevenLabsError`
  with the raw HTTP detail so the studio can show a meaningful 502.
- `script/sfx/manifest.ts` — append-only manifest at
  `assets/sfx/sfx.manifest.json`. Three anchors (scene-start /
  accent-word / scene-end). Pure-helper `resolveSfxStart` does the
  anchor → absolute-time math; `resolveSfxStartForScene` wraps it with
  scene + cursor lookup so the assembler can call it directly.
- `assemble.ts` reads the manifest once per assemble, groups by sceneId,
  and emits `<audio data-track-index="3" data-timeline-group="sfx">`
  elements alongside the existing voiceover audio. The existing SFX
  lane placeholder picks them up natively, with no runtime/timeline changes.
- 4 new storyline routes:
    POST  /storyline/sfx-suggest    Haiku 4.5 → 1-3 prompt+anchor ideas
    POST  /storyline/sfx-generate   ElevenLabs → mp3 + manifest append
    GET   /storyline/sfx            current manifest
    DELETE /storyline/sfx/:entryId  manifest remove (audio file kept)

  Suggest and generate are cost-tracked (`script.storyline.sfx.suggest` / `.generate`).
  Suggest is ~$0.001/call; generate is one ElevenLabs SFX credit.
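
The clamping described for `generateSoundEffect` could look roughly like the sketch below. The 0.5..22 and 0..1 ranges are taken from this commit message; the `SfxOptions` shape and the `clampSfxOptions` name are assumptions, and the real `opts` type may carry other fields.

```typescript
// Hypothetical options shape; only the two clamped ranges come from the PR.
interface SfxOptions {
  durationS?: number;
  promptInfluence?: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Normalise caller-supplied options before building the ElevenLabs request.
// Fields the caller left undefined stay undefined.
function clampSfxOptions(opts: SfxOptions): SfxOptions {
  const out: SfxOptions = {};
  if (opts.durationS !== undefined) {
    out.durationS = clamp(opts.durationS, 0.5, 22);
  }
  if (opts.promptInfluence !== undefined) {
    out.promptInfluence = clamp(opts.promptInfluence, 0, 1);
  }
  return out;
}
```

Clamping rather than rejecting keeps a slightly out-of-range Haiku suggestion usable instead of failing the generate call.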

**Frontend** (`packages/studio`)

- New `🔊 Add SFX` per-card AI action. Click → Haiku returns 1-3 prompt
  ideas, rendered in a new amber-tinted suggestion stack below the
  regular Haiku stack. Each row shows label + prompt + duration +
  anchor + rationale + 🔊 Generate button. Clicking Generate calls
  the ElevenLabs route, writes the file, refreshes the manifest, and
  re-reads the script (so the SFX immediately lands on the timeline's
  SFX lane).
- New emerald "SFX on lane" stack on each card lists already-applied
  SFX with HTML5 `<audio controls>` audition + Remove button.
- `StorylineTab` adds three independent state maps (proposed / applied /
  generationStatus) so the UI stays responsive while ElevenLabs
  generates one and the user previews another.
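
One way to picture the three independent state maps is as immutable maps where each update copies only the map it touches. This is a framework-free sketch, not the actual `StorylineTab` code: the key formats, value types, and function names are all assumptions.

```typescript
type GenerationStatus = "idle" | "generating" | "done" | "error";

// Hypothetical state shape mirroring proposed / applied / generationStatus.
interface SfxUiState {
  proposed: ReadonlyMap<string, readonly string[]>; // sceneId -> proposal prompts
  applied: ReadonlyMap<string, readonly string[]>; // sceneId -> applied entry ids
  generationStatus: ReadonlyMap<string, GenerationStatus>; // proposal key -> status
}

function emptyState(): SfxUiState {
  return {
    proposed: new Map<string, readonly string[]>(),
    applied: new Map<string, readonly string[]>(),
    generationStatus: new Map<string, GenerationStatus>(),
  };
}

// Replace the proposals for one scene; the other two maps are reused as-is.
function setProposals(state: SfxUiState, sceneId: string, prompts: readonly string[]): SfxUiState {
  const proposed = new Map(state.proposed);
  proposed.set(sceneId, prompts);
  return { ...state, proposed };
}

// Update one proposal's generation status without touching proposals or
// applied entries, so previews elsewhere are not re-rendered or blocked.
function setStatus(state: SfxUiState, key: string, status: GenerationStatus): SfxUiState {
  const generationStatus = new Map(state.generationStatus);
  generationStatus.set(key, status);
  return { ...state, generationStatus };
}
```

Because a status change leaves the `proposed` and `applied` references untouched, a slow generate on one card does not force the other stacks to update.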

**Tests** (`packages/core`)

- 10 anchor-math cases covering scene-start / scene-end (with SFX
  longer than the scene → clamps to sceneStart, never preceding it),
  accent-word interpolation (0, mid, last, past-last clamps to last,
  negative clamps to 0, empty narration falls back to scene-start),
  custom focal-line fractions.
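
The anchor cases listed above can be sketched as a small pure function. This is an illustrative reconstruction from the test descriptions, not the real `resolveSfxStart`: the name `sketchResolveSfxStart`, the `SceneTiming` shape, and the assumption that words are evenly spaced across the scene are all hypothetical, and custom focal-line fractions are not modelled.

```typescript
type SfxAnchor =
  | { kind: "scene-start" }
  | { kind: "scene-end" }
  | { kind: "accent-word"; wordIndex: number };

// Hypothetical timing input: absolute scene start, duration, narration size.
interface SceneTiming {
  startS: number;
  durationS: number;
  wordCount: number;
}

// Resolve an anchor to an absolute start time in seconds.
function sketchResolveSfxStart(
  anchor: SfxAnchor,
  scene: SceneTiming,
  sfxDurationS: number,
): number {
  switch (anchor.kind) {
    case "scene-start":
      return scene.startS;
    case "scene-end":
      // Align the end of the SFX with the end of the scene, but never
      // start before the scene does (SFX longer than the scene).
      return Math.max(scene.startS, scene.startS + scene.durationS - sfxDurationS);
    case "accent-word": {
      // Empty narration (or an unusable index) falls back to scene-start.
      if (scene.wordCount <= 0 || !Number.isFinite(anchor.wordIndex)) {
        return scene.startS;
      }
      // Negative indices clamp to the first word, past-last to the last.
      const last = scene.wordCount - 1;
      const index = Math.min(Math.max(Math.trunc(anchor.wordIndex), 0), last);
      // Assumption: word i begins at fraction i / wordCount of the scene.
      return scene.startS + (index / scene.wordCount) * scene.durationS;
    }
  }
}
```

For a scene starting at 10 s that lasts 8 s with 4 narration words, a 2 s `scene-end` effect starts at 16 s under this sketch, and a 12 s one is clamped to 10 s.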

Tests: 729 core (+10) + 281 studio passing. Lint, format, typecheck
clean. Verified live: 16 scene cards render, 16 "🔊 Add SFX" buttons
visible, zero console errors.

Plan: #25 (Milestone A of six). Next milestones (B Music, C Gemini
render review, D Image analysis, E Scroll test, F Retention map) layer
on this same SFX manifest pattern + applyPatch pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cuio merged commit 27cc52c into main on Apr 27, 2026