A macOS app that turns a narration script into a motion graphics video. Each sentence becomes a scene with theme-aware layout, generated text-to-speech audio, and a live preview. Export to MP4 when ready.
```sh
npm install
cd packages/pipeline && npm run build
cd packages/shell && npx electron-vite dev
```

```
┌─────────────────────────────────────────────────┐
│ [☰ Menu]                    [Theme] [⤓ Render]  │ TopBar
├──────┬──────────────────────────────┬───────────┤
│ ▢ 1  │ ┌──────────────────────────┐ │           │
│ ▢ 2◄ │ │ Scene preview (webview)  │ │  Theme    │
│ ▢ 3  │ └──────────────────────────┘ │  panel    │
│ ▢ 4  │ ┌───────────────┬──────────┐ │  (toggle) │
│ ▢ 5  │ │               │ Layout ▾ │ │           │
│      │ │ Scene script  │ ⟳ Regen  │ │           │
│      │ │               │          │ │           │
│      │ └───────────────┴──────────┘ │           │
└──────┴──────────────────────────────┴───────────┘
```
- Top bar: project menu (left), Theme toggle and Render button (right).
- Left sidebar: one tile per scene (numbered, color-coded by archetype, with a thumbnail when the scene has been rendered). Click to seek the preview.
- Center: live `@remotion/player` preview of the selected scene + per-scene script editor + scene-level layout dropdown + scene-level Regenerate button.
- Right (toggleable): theme picker. Selecting a different theme reveals an Apply theme button that runs a deterministic theme swap.
The app makes a clear split between operations that cost AI tokens and operations that don't. The pipeline's change-detector chooses the cheapest path that still produces a correct result.
Operations that don't cost AI tokens:

| Operation | What runs |
|---|---|
| Open / save / list projects | Filesystem read/write of `project.json` |
| Archetype classification + layout pick | `archetype-mapper.ts` heuristics (numbers → stats, `?` → rhetorical, …) |
| Theme swap (theme-only change) | `theme-patcher.ts` — string-replaces theme imports in the composition |
| Preview server | `npx vite` serving the project's `player.html` |
| Render to MP4 | `npx remotion render` |
| Scene thumbnails | `npx remotion still` per scene |
| Composition update on add/remove | `MyVideo.tsx` regenerated by template logic |
| TTS audio | Calls the Inworld API — costs Inworld credits, not Claude tokens |
| Scene removal | Delete scene file + composition update |
Operations that cost AI tokens:

| Operation | What runs |
|---|---|
| Initial scene generation | One full Claude Code session writes all scene `.tsx` files |
| Edit scene text → regenerate that scene | Targeted regen — only the changed scenes are sent to the AI |
| Change a scene's layout | Targeted regen — affected scenes only |
| Add a new scene | Targeted regen for added scenes + composition update |
The change detector returns a ChangeSet describing exactly which scenes need AI work, which need new TTS, and whether a theme swap can shortcut everything else. Theme-only changes never go through AI; sentence edits never re-render unaffected scenes.
The app is split into `@remotion-app/pipeline` (pure Node.js) and `@remotion-app/shell` (Electron). The pipeline has zero Electron imports — it communicates through `EventEmitter` events and method calls.

Why: The shell may change. Today it's Electron; tomorrow it could be a native macOS app (Swift + WKWebView) or a Tauri app. By keeping all video production logic in a standalone Node.js package, a shell swap only requires rewriting the thin IPC bridge — not the scaffolding, TTS, AI generation, preview server, rendering, or thumbnail code.

The coupling point is a single file: `packages/shell/src/main/ipc-bridge.ts`. It's the only file that imports both Electron and the pipeline.
Both audio generation and AI scene generation are behind interfaces, not hardcoded to any vendor.
Why: These are the two external dependencies most likely to change. TTS is a commodity — Inworld today, ElevenLabs or user-recorded audio tomorrow. AI scene generation is Claude Code today, but could be a different CLI, a direct API call, or a local model. By putting both behind interfaces (AudioProvider and AIProvider), swapping a vendor means writing one new class, not touching the pipeline or shell.
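The two seams can be sketched as TypeScript interfaces. The `AIProvider` argument shape (`{ cwd, systemPrompt, userMessage, onProgress }`) comes from this README; everything else here — method names, return types, and the per-word duration heuristic — is illustrative, not the real code.

```typescript
interface SceneAudio {
  path: string;        // where the MP3 ends up
  durationSec: number; // measured length, used to time the scene
}

interface AudioProvider {
  generateAudio(sentence: string, outPath: string): Promise<SceneAudio>;
}

interface AIProvider {
  generate(opts: {
    cwd: string;                          // project dir the backend writes into
    systemPrompt: string;
    userMessage: string;
    onProgress: (event: unknown) => void; // streamed to the UI
  }): Promise<void>;
}

// Hypothetical duration heuristic a stub provider might use (0.4s per word).
function estimateDurationSec(sentence: string): number {
  return sentence.trim().split(/\s+/).length * 0.4;
}

// Swapping vendors means writing one new class, e.g. a silent stub for tests:
class SilentAudioProvider implements AudioProvider {
  async generateAudio(sentence: string, outPath: string): Promise<SceneAudio> {
    return { path: outPath, durationSec: estimateDurationSec(sentence) };
  }
}
```

A stub like this is also what makes the pipeline testable without spending Inworld credits or Claude tokens.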
Built-in audio providers:
- `InworldProvider` — calls the Inworld TTS API, writes MP3s, measures durations
- `FileAudioProvider` — copies user-supplied MP3s, measures durations
Built-in AI providers:
- `ClaudeCodeProvider` — spawns the Claude Code CLI with `--print --output-format stream-json --verbose`. Captures the session ID from the `system.init` event and passes it as `--resume` on subsequent turns for multi-turn context.
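The session-ID capture reduces to scanning stdout for the init event. A sketch — the field names (`type`/`subtype`/`session_id`) reflect the CLI's stream-json events but should be treated as assumptions and verified against your CLI version:

```typescript
interface StreamEvent {
  type?: string;
  subtype?: string;
  session_id?: string;
}

// Scan stream-json stdout lines for the system.init event and return its
// session ID, to be passed back as --resume on the next turn.
function extractSessionId(stdoutLines: string[]): string | undefined {
  for (const line of stdoutLines) {
    let event: StreamEvent;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // tolerate non-JSON noise on stdout
    }
    if (event.type === "system" && event.subtype === "init" && event.session_id) {
      return event.session_id;
    }
  }
  return undefined;
}
```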
`change-detector.ts` compares the current project state against the last generated snapshot and returns a `ChangeSet`. The pipeline routes work to the cheapest path that produces a correct result:
- No changes → skip everything.
- Theme only → `patchTheme` (deterministic file edit, no AI, no TTS).
- Sentence edits → new TTS + AI scene regen for affected scenes.
- Layout-only edits → AI scene regen for affected scenes (no new TTS).
- Added scenes → TTS + AI for new scenes + composition update.
- Removed scenes → delete scene files + composition update (no AI).
Thumbnails are regenerated at the end of any change so the sidebar stays in sync.
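The routing above can be sketched as a pure function over a minimal `ChangeSet`. The real shape in `change-detector.ts` may differ; the field names and the `planWork` helper here are illustrative:

```typescript
interface ChangeSet {
  themeChanged: boolean;      // theme differs from the snapshot
  editedScenes: number[];     // sentence text changed -> TTS + AI regen
  layoutOnlyScenes: number[]; // layout changed -> AI regen, no TTS
  addedScenes: number[];      // new scenes -> TTS + AI + composition update
  removedScenes: number[];    // deleted -> file removal + composition update
}

type PlanStep = "skip" | "patch-theme" | "tts" | "ai-regen" | "update-composition" | "thumbnails";

function planWork(c: ChangeSet): PlanStep[] {
  const sceneWork =
    c.editedScenes.length + c.layoutOnlyScenes.length +
    c.addedScenes.length + c.removedScenes.length > 0;

  if (!sceneWork && !c.themeChanged) return ["skip"];

  const steps: PlanStep[] = [];
  if (c.themeChanged && !sceneWork) {
    // Theme-only change shortcuts everything: one deterministic file edit.
    steps.push("patch-theme");
  } else {
    if (c.editedScenes.length || c.addedScenes.length) steps.push("tts");
    if (c.editedScenes.length || c.layoutOnlyScenes.length || c.addedScenes.length)
      steps.push("ai-regen");
    // Removal also deletes the scene files before the composition update.
    if (c.addedScenes.length || c.removedScenes.length) steps.push("update-composition");
  }
  // Thumbnails are refreshed after any change so the sidebar stays in sync.
  steps.push("thumbnails");
  return steps;
}
```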
The user edits scene text per-scene, not as one large script. Each scene is a freeform text block, classified by archetype (hook, stats, rhetorical, contrast, enumeration, cta, statement) which determines its default layout. The user can override the layout per scene.
The whole-script view is derived (`scenes.map(s => s.sentence).join("\n")`) and exists only for change detection and pipeline compatibility.
The preview is a `<webview>` pointing at a Vite dev server that serves a small `player.html` / `player.tsx` mounting `<Player>` with the project's main composition. The webview reloads with `?from=N` to seek to scene N.
Why: Earlier versions embedded Remotion Studio with CSS injection to hide the chrome. Switching to @remotion/player gave us a clean canvas with no UI to strip, full control over which composition mounts, and easier seeking via URL params. Vite serves the project files directly, so HMR still picks up scene/theme edits without restarting anything.
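The `?from=N` handshake is just URL construction on the shell side and query parsing on the player side. Both helpers below are hypothetical names, a sketch of the scheme rather than the real code:

```typescript
// Shell side: build the webview URL for a given scene index.
function previewUrl(devServerBase: string, sceneIndex: number): string {
  const url = new URL("player.html", devServerBase);
  url.searchParams.set("from", String(sceneIndex));
  return url.toString();
}

// Player side: read the scene index back out of location.search,
// defaulting to scene 0 when the param is absent or malformed.
function sceneFromLocation(search: string): number {
  const n = Number(new URLSearchParams(search).get("from") ?? "0");
  return Number.isFinite(n) && n >= 0 ? n : 0;
}
```

On the player side the index would then be mapped to a start frame and handed to the `<Player>` ref's seek method.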
When the user pastes a script, each sentence is classified into an archetype which maps to a default layout. The mapper is a pure function (no AI call) using simple heuristics:
- contains a number → `stats`
- ends with `?` → `rhetorical`
- first sentence and short → `hook`
- last sentence → `cta`
- contrast words (`but`, `however`, …) → `contrast`
- list words (`first`, `second`, …) → `enumeration`
- default → `statement`
Adjacent layout duplicates are resolved automatically — no two consecutive scenes get the same layout.
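The heuristics above amount to an ordered chain of checks. A sketch — the word lists are abbreviated and the exact precedence and thresholds in `archetype-mapper.ts` may differ (the "short" cutoff of 8 words here is an assumption):

```typescript
type Archetype =
  | "hook" | "stats" | "rhetorical" | "contrast"
  | "enumeration" | "cta" | "statement";

// Classify one sentence given its position in the script.
// Rules are tried in the order listed in the README; first match wins.
function classify(sentence: string, index: number, total: number): Archetype {
  const s = sentence.trim().toLowerCase();
  if (/\d/.test(s)) return "stats";
  if (s.endsWith("?")) return "rhetorical";
  if (index === 0 && s.split(/\s+/).length <= 8) return "hook";
  if (index === total - 1) return "cta";
  if (/\b(but|however|instead)\b/.test(s)) return "contrast";
  if (/\b(first|second|third|finally)\b/.test(s)) return "enumeration";
  return "statement";
}
```

Adjacent-duplicate resolution would run downstream of this, over the layouts the archetypes map to.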
The default AI provider spawns the Claude Code CLI rather than calling the Anthropic API directly.
Why: Claude Code already has the user's API key, model preferences, and authentication configured. It handles file writing, error recovery, and the complex system prompt from `skill.md`. The `stream-json` output gives structured progress events for the UI.

This is a provider choice, not an architectural commitment. The `AIProvider` interface accepts `{ cwd, systemPrompt, userMessage, onProgress }` — any backend that can generate Remotion scene files from a prompt can be plugged in.

The Electron renderer uses Zustand for UI-only state (selected scene, theme panel open, render status, etc.). Pipeline state flows in through IPC events. Project persistence is handled by the pipeline's `project-manager.ts`.
- Node.js >= 20
- npm >= 9
- Claude Code CLI installed and authenticated
- `INWORLD_API_KEY` in `~/.env` (for TTS), or use `FileAudioProvider` with your own MP3s