Holocron v1.3.0

@github-actions github-actions released this 11 May 05:07
· 8 commits to main since this release

AI voiceover. Adds /holocron:voice-act and the voice-actor agent — the first MCP integration in Holocron, wiring up the official elevenlabs-mcp server (TTS, voice search, voice details, voice previews, subscription checks). Four modes: audition (compare 3–5 candidate voices side-by-side for casting), soundbite (confirm a locked voice with a clean canonical render), scene (multi-voice render of a finished scene with attribution-driven speaker switching), and line (single-line re-take with stability/style overrides). Per-character voice assignments live in a new world/voice-cast.md, generated from a template by /holocron:new. Renders land in inserts/audio/ with .transcript.txt sidecars listing every voice ID and rendering parameter. Every render is also logged to an append-only render log inside voice-cast.md.
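
As a rough illustration of the sidecar idea, the sketch below writes a `.transcript.txt` file next to a rendered MP3 and records the voice ID and rendering parameters for each segment. The line layout, the `segments` structure, and the function names are assumptions made for this example; the release notes do not specify the actual sidecar format.

```python
from pathlib import Path
from typing import Any, Mapping, Sequence


def sidecar_path(audio_path: Path) -> Path:
    # scene-01.mp3 -> scene-01.transcript.txt, in the same directory.
    return audio_path.with_name(audio_path.stem + ".transcript.txt")


def write_sidecar(audio_path: Path, segments: Sequence[Mapping[str, Any]]) -> Path:
    """Write a plain-text sidecar listing voice ID and parameters per segment.

    Each segment is a hypothetical dict with the keys
    'speaker', 'voice_id', 'params' and 'text'.
    """
    lines = []
    for seg in segments:
        # Sort parameters so the sidecar is stable across runs.
        params = ", ".join(f"{k}={v}" for k, v in sorted(seg["params"].items()))
        lines.append(f"[{seg['speaker']}] voice_id={seg['voice_id']} {params}")
        lines.append(seg["text"])
        lines.append("")  # blank line between segments
    path = sidecar_path(audio_path)
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Keeping the parameters next to the audio file means a render can be reproduced or audited later without opening the cast file.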

The manuscript stays clean. Chapter files are never modified — no SSML, no emotion tags, no inline directives. All per-render direction (stability, style, speed, pacing) lives in the in-flight ElevenLabs prompt the agent synthesises from prose context plus the character profile plus craft/voice.md's declared narrative register. The same chapter file feeds /holocron:write, /holocron:critic, /holocron:edit, /holocron:build, and /holocron:voice-act without compromise.

The pattern mirrors v0.7.0's script-renderer (PNG glyph inserts via Pillow): text → external pipeline → media artifact in inserts/<media-type>/ with a sidecar. Three-layer dispatch (preflight → cast lookup → MCP call), fail-fast on missing dependencies, conservative-on-fabrication (never picks a voice ID silently — audition produces candidates, the writer locks the winner manually).
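
The three-layer dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual logic: the exception classes, `RenderRequest`, and the injected `tts` callable are hypothetical stand-ins, and the real agent performs the final step through MCP tool calls rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


class PreflightError(RuntimeError):
    """A required dependency is missing (fail-fast)."""


class UncastCharacterError(LookupError):
    """The character has no locked voice ID in the cast."""


@dataclass(frozen=True)
class RenderRequest:
    character: str
    text: str


def preflight(api_key: Optional[str], cast: Optional[Mapping[str, str]]) -> None:
    # Layer 1: stop before any paid call if something is missing.
    if not api_key:
        raise PreflightError("ELEVENLABS_API_KEY is not set")
    if cast is None:
        raise PreflightError("world/voice-cast.md not found")


def lookup_voice(cast: Mapping[str, str], character: str) -> str:
    # Layer 2: never guess a voice; an uncast character is an error.
    voice_id = cast.get(character)
    if not voice_id:
        raise UncastCharacterError(f"{character} has no locked voice; run an audition")
    return voice_id


def render(request: RenderRequest,
           api_key: Optional[str],
           cast: Optional[Mapping[str, str]],
           tts: Callable[[str, str], bytes]) -> bytes:
    preflight(api_key, cast)
    assert cast is not None  # guaranteed by preflight
    voice_id = lookup_voice(cast, request.character)
    # Layer 3: the external call (stand-in for the MCP tool call).
    return tts(voice_id, request.text)
```

Ordering the layers this way means a missing key or an uncast character is reported before any credits are spent.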

Added

  • agents/voice-actor.md — new specialist agent (opus, magenta). Frontmatter declares the ElevenLabs MCP tool surface (mcp__elevenlabs__text_to_speech, search_voices, get_voice, text_to_voice, list_models, check_subscription) plus the standard file tools. System prompt covers the 6-step pipeline (preflight → path resolution → mode dispatch → prompt synthesis → MCP call → output artifact + render log), per-mode procedures for audition / soundbite / scene / line, prompt-synthesis heuristics (anger → lower stability + raise style; sorrow → raise stability + lower speed slightly; etc.), conservative discipline (never invents a voice ID, never modifies prose), and four triggering examples covering common writer phrasings.
  • skills/voice-act/skill.md — slash-command entry point. /holocron:voice-act <audition|soundbite|scene|line> <args>. Documents the four modes, prerequisites (MCP install, API key, optional ffmpeg for scene stitching), the workflow (audition → manual lock → soundbite confirm → scene), re-casting protocol (--archive flag), and what the skill does NOT do (no SSML in chapter files, no voice-cloning of the author, no soundscape generation, no audiobook stitching across chapters).
  • templates/standalone-story/world/voice-cast.md — new per-project cast template. Sections: Narrator voice settings, per-character cast entries (voice ID, accent/origin, speech-pattern notes, settings overrides, audition log), append-only render log, cast-change protocol. YAML frontmatter for indexing; prose-first body for human editing.
  • templates/standalone-story/inserts/audio/.gitkeep — output directory placeholder. Mirrors the v0.7.0 inserts/.gitkeep pattern for scripts.
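
The prompt-synthesis heuristics mentioned for agents/voice-actor.md (anger lowers stability and raises style; sorrow raises stability and lowers speed slightly) could look something like the sketch below. Only the directions of the adjustments come from the notes above. The numeric deltas, the default settings, the clamp ranges, and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    # Hypothetical defaults; real per-character values would come from voice-cast.md.
    stability: float = 0.5
    style: float = 0.0
    speed: float = 1.0


# Directions follow the release notes; magnitudes are illustrative guesses.
ADJUSTMENTS = {
    "anger": {"stability": -0.15, "style": +0.20},
    "sorrow": {"stability": +0.15, "speed": -0.05},
}


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def apply_emotion(base: VoiceSettings, emotion: str) -> VoiceSettings:
    """Return adjusted settings; unknown emotions leave the base untouched."""
    deltas = ADJUSTMENTS.get(emotion, {})
    return VoiceSettings(
        stability=clamp(base.stability + deltas.get("stability", 0.0), 0.0, 1.0),
        style=clamp(base.style + deltas.get("style", 0.0), 0.0, 1.0),
        # Assumed speed range; check the provider's documented limits.
        speed=clamp(base.speed + deltas.get("speed", 0.0), 0.7, 1.2),
    )
```

Because the adjustment is computed from a base and returned as a new value, explicit per-call overrides can still be applied afterwards without touching the cast entry.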

Changed

  • agents/new-project.md — Project Structure block updated to include world/voice-cast.md and the inserts/ directory tree (audio + aurebesh / etc.). No new wizard phase — voicing is post-write, optional, and orthogonal to the project-bootstrap flow.
  • templates/standalone-story/README.md — new Audio (optional) section near the bottom, pointing the writer at /holocron:voice-act and world/voice-cast.md. Framed as an optional add-on, not a required step.
  • skills/dashboard/skill.md — Workshop tab gains a Voiceover section: cast-completion status (N of M characters cast), rendered MP3 count from inserts/audio/, recent entries from the voice-cast.md render log, link to the cast file, CTA to audition uncast characters. Files-read list expands with world/voice-cast.md and inserts/audio/*.mp3.
  • README.md (plugin root) — Status block bumped to v1.3.0 with a one-paragraph summary of the new feature. Commands table adds the /holocron:voice-act row. A new paragraph in the Dependencies subsection documents the ElevenLabs MCP install (uvx / pip) and the ELEVENLABS_API_KEY env-var setup. Project-structure block updated for inserts/audio/ and world/voice-cast.md.
  • .claude-plugin/plugin.json, .claude-plugin/marketplace.json — version bumped to 1.3.0.
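
For the dashboard's Voiceover section, the cast-completion status and rendered MP3 count amount to two small computations, sketched here. The cast is modelled as a plain mapping from character name to voice ID (empty or `None` when uncast), which is an assumption; how the dashboard skill actually parses voice-cast.md is not described in these notes.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def cast_completion(cast: Mapping[str, Optional[str]]) -> str:
    """Format the 'N of M characters cast' status line."""
    done = sum(1 for voice_id in cast.values() if voice_id)
    return f"{done} of {len(cast)} characters cast"


def uncast_characters(cast: Mapping[str, Optional[str]]) -> List[str]:
    """Characters still needing an audition, sorted for stable display."""
    return sorted(name for name, voice_id in cast.items() if not voice_id)


def count_renders(audio_dir: Path) -> int:
    """Count rendered MP3 files; a missing directory counts as zero."""
    if not audio_dir.is_dir():
        return 0
    return sum(1 for p in audio_dir.glob("*.mp3") if p.is_file())
```

Treating a missing inserts/audio/ directory as zero renders keeps the dashboard usable in projects that have never run the voice-act command.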

Notes

  • First MCP consumer in the plugin. Holocron's external pipelines (font install + glyph render, EPUB build) all run through Python helpers via Bash. The voice-actor agent introduces a different pattern — MCP tool calls dispatched directly from the agent. The skill / agent frontmatter declares the specific tool surface (mcp__elevenlabs__text_to_speech, etc.) so permissions are scoped tight.
  • Why audition is the entry point. A locked voice that doesn't survive contact with extended dialogue is the failure mode that wastes the most credits. Audition produces 3–5 candidate soundbites at once; the writer listens and picks; the lock is then the writer's explicit edit to voice-cast.md. The agent never picks a voice silently — even when only one candidate would clearly fit the character profile.
  • Why the manuscript stays clean. Holocron is built on plain-markdown chapter files that every agent reads — writer, chapter-reviewer, editor, critique, continuity, the critic personas, and now voice-actor. If voice-actor wrote SSML into chapters, every other agent would have to learn to ignore it. The cleaner architecture is: chapters are prose; emotion/pacing for renders is synthesised at render time from prose context plus the character profile plus craft/voice.md. The trade-off is that the writer can't fine-tune the audio render by adding inline markers — but they can fine-tune by overriding --stability / --style per-call, or by editing the per-character entries in voice-cast.md.
  • Credit accounting. ElevenLabs free tier ships 10k credits/month (~20 soundbites, ~3 scene renders). The agent surfaces credit usage in every reply via check_subscription. Scene mode flags when a render will exceed 10k credits before dispatching.
  • Backwards compatibility. Existing v1.2.0 projects work without modification: world/voice-cast.md and inserts/audio/ are simply absent. If the writer runs /holocron:voice-act in such a project, the agent halts and points them at the template. Nothing is created automatically in old projects; the writer opts in.
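
The credit accounting note above describes a pre-dispatch check in scene mode. A minimal sketch of that kind of check follows. The 10k threshold comes from the note; the cost model of one credit per character is an assumption (actual pricing depends on the model and plan), and `remaining` stands in for whatever balance check_subscription reports.

```python
import math
from dataclasses import dataclass
from typing import Sequence

SCENE_WARN_THRESHOLD = 10_000  # threshold named in the release notes


@dataclass(frozen=True)
class CreditCheck:
    estimated: int
    remaining: int
    exceeds_threshold: bool
    exceeds_balance: bool


def estimate_credits(text: str, credits_per_char: float = 1.0) -> int:
    # Assumed cost model: a flat rate per character of input text.
    return math.ceil(len(text) * credits_per_char)


def check_scene(lines: Sequence[str], remaining: int) -> CreditCheck:
    """Estimate a scene's cost before dispatching any render call."""
    estimated = sum(estimate_credits(line) for line in lines)
    return CreditCheck(
        estimated=estimated,
        remaining=remaining,
        exceeds_threshold=estimated > SCENE_WARN_THRESHOLD,
        exceeds_balance=estimated > remaining,
    )
```

Running the estimate over the whole scene first lets the agent warn once, before the first line is rendered, rather than failing partway through a multi-voice render.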

Deferred (candidates for v1.3.x or v1.4)

  • Voice cloning of the author for narration (ElevenLabs supports it via voice_clone; out of scope for v1.3.0 — separate feature class).
  • Soundscape generation for scene ambience (the MCP server supports text_to_sound_effects; out of scope).
  • Full audiobook assembly: per-scene MP3s stitched into per-chapter audio with normalised levels, M4B output, and chapter markers. A natural v1.4 candidate once per-scene renders are reliable in real-world use.
  • EPUB-with-embedded-audio build (EPUB 3 media overlays) — possible on the /holocron:build side but requires significant scaffolding; out of scope.
  • Live-render-during-write (writer agent triggers a render preview after each scene). Deferred until the cost/latency profile is understood.