Holocron v1.3.0
AI voiceover. Adds /holocron:voice-act and the voice-actor agent — the first MCP integration in Holocron, wiring up the official elevenlabs-mcp server (TTS, voice search, voice details, voice previews, subscription checks). Four modes: audition (compare 3–5 candidate voices side-by-side for casting), soundbite (confirm a locked voice with a clean canonical render), scene (multi-voice render of a finished scene with attribution-driven speaker switching), and line (single-line re-take with stability/style overrides). Per-character voice assignments live in a new world/voice-cast.md, generated from a template by /holocron:new. Renders land in inserts/audio/ with .transcript.txt sidecars listing every voice ID and rendering parameter. Every render is also logged to an append-only render log inside voice-cast.md.
The manuscript stays clean. Chapter files are never modified — no SSML, no emotion tags, no inline directives. All per-render direction (stability, style, speed, pacing) lives in the in-flight ElevenLabs prompt the agent synthesises from prose context plus the character profile plus craft/voice.md's declared narrative register. The same chapter file feeds /holocron:write, /holocron:critic, /holocron:edit, /holocron:build, and /holocron:voice-act without compromise.
The pattern mirrors v0.7.0's script-renderer (PNG glyph inserts via Pillow): text → external pipeline → media artifact in inserts/<media-type>/ with a sidecar. Three-layer dispatch (preflight → cast lookup → MCP call), fail-fast on missing dependencies, conservative-on-fabrication (never picks a voice ID silently — audition produces candidates, the writer locks the winner manually).
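The three-layer dispatch described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the function names, the shape of the `cast` mapping, and the `call_tts` callable that stands in for the MCP call are all hypothetical.

```python
from typing import Callable, Mapping


class PreflightError(RuntimeError):
    """Raised when a required dependency is missing (fail fast)."""


def preflight(env: Mapping[str, str], cast_file_exists: bool) -> None:
    # Layer 1: stop before any work is done if a dependency is missing.
    if not env.get("ELEVENLABS_API_KEY"):
        raise PreflightError("ELEVENLABS_API_KEY is not set")
    if not cast_file_exists:
        raise PreflightError("world/voice-cast.md not found; start from the template")


def lookup_voice(cast: Mapping[str, str], character: str) -> str:
    # Layer 2: only a voice ID the writer has locked may be used.
    # Never fall back to a guessed or default voice.
    voice_id = cast.get(character, "").strip()
    if not voice_id:
        raise LookupError(f"{character} has no locked voice; run an audition first")
    return voice_id


def render_line(
    character: str,
    text: str,
    cast: Mapping[str, str],
    env: Mapping[str, str],
    cast_file_exists: bool,
    call_tts: Callable[[str, str], str],
) -> str:
    preflight(env, cast_file_exists)
    voice_id = lookup_voice(cast, character)
    # Layer 3: hand off to the external pipeline (hypothetical stand-in for the MCP call).
    return call_tts(voice_id, text)
```

Each layer raises instead of guessing, so a missing API key or an uncast character stops the run before any credits are spent.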
Added
- agents/voice-actor.md: new specialist agent (opus, magenta). Frontmatter declares the ElevenLabs MCP tool surface (mcp__elevenlabs__text_to_speech, search_voices, get_voice, text_to_voice, list_models, check_subscription) plus the standard file tools. System prompt covers the 6-step pipeline (preflight → path resolution → mode dispatch → prompt synthesis → MCP call → output artifact + render log), per-mode procedures for audition / soundbite / scene / line, prompt-synthesis heuristics (anger → lower stability + raise style; sorrow → raise stability + lower speed slightly; etc.), conservative discipline (never invents a voice ID, never modifies prose), and four triggering examples covering common writer phrasings.
- skills/voice-act/skill.md: slash-command entry point, /holocron:voice-act <audition|soundbite|scene|line> <args>. Documents the four modes, prerequisites (MCP install, API key, optional ffmpeg for scene stitching), the workflow (audition → manual lock → soundbite confirm → scene), the re-casting protocol (--archive flag), and what the skill does NOT do (no SSML in chapter files, no voice-cloning of the author, no soundscape generation, no audiobook stitching across chapters).
- templates/standalone-story/world/voice-cast.md: new per-project cast template. Sections: Narrator voice settings, per-character cast entries (voice ID, accent/origin, speech-pattern notes, settings overrides, audition log), append-only render log, cast-change protocol. YAML frontmatter for indexing; prose-first body for human editing.
- templates/standalone-story/inserts/audio/.gitkeep: output directory placeholder. Mirrors the v0.7.0 inserts/.gitkeep pattern for scripts.
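The prompt-synthesis heuristics named for agents/voice-actor.md could look something like the sketch below. Only the directions (anger lowers stability and raises style; sorrow raises stability and lowers speed slightly) come from these notes. The default values, the delta sizes, the clamping ranges, and every name in the code are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceSettings:
    # Default values are placeholders, not documented defaults.
    stability: float = 0.5
    style: float = 0.3
    speed: float = 1.0


# Delta sizes are illustrative guesses; only their signs come from the notes.
EMOTION_DELTAS = {
    "anger": {"stability": -0.2, "style": +0.2},
    "sorrow": {"stability": +0.2, "speed": -0.05},
}


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def adjust_for_emotion(base: VoiceSettings, emotion: str) -> VoiceSettings:
    """Return a copy of `base` nudged for the given emotion; unknown emotions leave it unchanged."""
    deltas = EMOTION_DELTAS.get(emotion, {})
    return replace(
        base,
        stability=_clamp(base.stability + deltas.get("stability", 0.0), 0.0, 1.0),
        style=_clamp(base.style + deltas.get("style", 0.0), 0.0, 1.0),
        # The speed bounds here are assumed, not taken from the ElevenLabs docs.
        speed=_clamp(base.speed + deltas.get("speed", 0.0), 0.5, 2.0),
    )
```

Keeping the adjustments in a lookup table means a per-call override such as --stability can simply replace the computed value afterwards, and the chapter file is never touched.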
Changed
- agents/new-project.md: Project Structure block updated to include world/voice-cast.md and the inserts/ directory tree (audio + aurebesh / etc.). No new wizard phase: voicing is post-write, optional, and orthogonal to the project-bootstrap flow.
- templates/standalone-story/README.md: new Audio (optional) section near the bottom, pointing the writer at /holocron:voice-act and world/voice-cast.md. Framed as an optional add-on, not a required step.
- skills/dashboard/skill.md: Workshop tab gains a Voiceover section: cast-completion status (N of M characters cast), rendered MP3 count from inserts/audio/, recent entries from the voice-cast.md render log, link to the cast file, CTA to audition uncast characters. Files-read list expands with world/voice-cast.md and inserts/audio/*.mp3.
- README.md (plugin root): Status block bumped to v1.3.0 with a one-paragraph summary of the new feature. Commands table adds the /holocron:voice-act row. New Dependencies subsection paragraph documents the ElevenLabs MCP install (uvx / pip) and the ELEVENLABS_API_KEY env-var setup. Project-structure block updated for inserts/audio/ and world/voice-cast.md.
- .claude-plugin/plugin.json, .claude-plugin/marketplace.json: version bumped to 1.3.0.
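As a rough illustration of the figures the dashboard's Voiceover section reports, the sketch below computes the "N of M characters cast" summary and the rendered MP3 count. It assumes the cast has already been parsed into a plain mapping of character name to voice ID (empty when uncast). That mapping, the function name, and the returned keys are hypothetical; the dashboard skill may gather these numbers quite differently.

```python
from pathlib import Path


def voiceover_status(cast: dict, project_root: Path) -> dict:
    """Summarise cast completion and rendered audio for one project."""
    total = len(cast)
    done = sum(1 for voice_id in cast.values() if voice_id)

    # inserts/audio/ may not exist in projects that never opted in to voiceover.
    audio_dir = project_root / "inserts" / "audio"
    mp3s = sorted(audio_dir.glob("*.mp3")) if audio_dir.is_dir() else []

    return {
        "cast_summary": f"{done} of {total} characters cast",
        "uncast": sorted(name for name, voice_id in cast.items() if not voice_id),
        "rendered_mp3_count": len(mp3s),
    }
```

The "uncast" list is what a call-to-action for auditioning remaining characters would be built from.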
Notes
- First MCP consumer in the plugin. Holocron's external pipelines (font install + glyph render, EPUB build) all run through Python helpers via Bash. The voice-actor agent introduces a different pattern: MCP tool calls dispatched directly from the agent. The skill / agent frontmatter declares the specific tool surface (mcp__elevenlabs__text_to_speech, etc.) so permissions are scoped tight.
- Why audition is the entry point. A locked voice that doesn't survive contact with extended dialogue is the failure mode that wastes the most credits. Audition produces 3–5 candidate soundbites at once; the writer listens and picks; the lock is then the writer's explicit edit to voice-cast.md. The agent never picks a voice silently, even when only one candidate would clearly fit the character profile.
- Why the manuscript stays clean. Holocron is built on plain-markdown chapter files that every agent reads: writer, chapter-reviewer, editor, critique, continuity, the critic personas, and now voice-actor. If voice-actor wrote SSML into chapters, every other agent would have to learn to ignore it. The cleaner architecture is: chapters are prose; emotion/pacing for renders is synthesised at render time from prose context plus the character profile plus craft/voice.md. The trade-off is that the writer can't fine-tune the audio render by adding inline markers, but they can fine-tune by overriding --stability / --style per-call, or by editing the per-character entries in voice-cast.md.
- Credit accounting. ElevenLabs free tier ships 10k credits/month (~20 soundbites, ~3 scene renders). The agent surfaces credit usage in every reply via check_subscription. Before dispatching, scene mode flags any render that would exceed 10k credits.
- Backwards compatibility. Existing v1.2.0 projects work without modification: world/voice-cast.md and inserts/audio/ are simply absent. If the writer runs /holocron:voice-act in such a project, the agent halts and points them at the template. No automatic file creation in old projects; the writer opts in.
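The pre-dispatch credit check described under credit accounting can be sketched as below. This is a hedged illustration only: it assumes cost scales linearly with character count at one credit per character, which is not stated in these notes, and it takes the remaining balance as a plain integer rather than guessing at the shape of the check_subscription response. All function and key names are hypothetical.

```python
import math


def estimate_credits(lines: list[str], credits_per_char: float = 1.0) -> int:
    # Assumption: one credit per character; adjust for the model actually in use.
    chars = sum(len(line) for line in lines)
    return math.ceil(chars * credits_per_char)


def budget_check(lines: list[str], remaining_credits: int, flag_above: int = 10_000) -> dict:
    """Decide, before any render is dispatched, whether the scene needs a warning."""
    needed = estimate_credits(lines)
    return {
        "needed": needed,
        "remaining_after": remaining_credits - needed,
        "flagged": needed > flag_above,              # render alone exceeds the threshold
        "affordable": needed <= remaining_credits,   # balance covers the render
    }
```

Separating "flagged" from "affordable" lets the agent warn about an unusually large scene even when the balance could technically cover it.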
Deferred (candidates for v1.3.x or v1.4)
- Voice cloning of the author for narration (ElevenLabs supports it via voice_clone; out of scope for v1.3.0, a separate feature class).
- Soundscape generation for scene ambience (the MCP server supports text_to_sound_effects; out of scope).
- Full audiobook assembly: per-scene MP3s stitched into per-chapter audio with normalised levels, M4B output, chapter markers. Natural v1.4 once per-scene renders are reliable in real-world use.
- EPUB-with-embedded-audio build (EPUB 3 media overlays): possible on the /holocron:build side but requires significant scaffolding; out of scope.
- Live-render-during-write (writer agent triggers a render preview after each scene). Deferred until the cost/latency profile is understood.