Cathode is a local-first explainer-video pipeline with three main surfaces:
- a React + FastAPI control room for the current workspace-first UI
- a legacy Streamlit app for the older manual step-by-step path
- an MCP server for agent/client-driven runs
It turns rough notes, source text, or a finished script into a local project folder plus a rendered MP4, and it now supports classic, hybrid, and motion-first composition modes.
Cathode now has four practical lanes:
- React/FastAPI control room: fill in Brief Studio, hit the primary button, watch the background job/logs, then land on the final MP4.
- Legacy Streamlit app: use the older manual step-by-step path when you want a more explicit scene-by-scene workflow.
- MCP workflow: call `make_video` from an agent or client and let Cathode build the local project in the background.
- Live demo workflow: launch or attach to a real app, capture fresh footage, review it, then feed the approved clips into Cathode for final render.
If you only remember one thing, remember this:
- most users only need the React/FastAPI app or MCP path
- the packaged live-demo skill is for cases where real UI footage is the story
- the scene editor is there for surgical fixes, not because the happy path should feel heavy
- brief-driven storyboard generation with `source_mode` and `composition_mode`
- image scenes, video scenes, and Remotion-backed motion scenes
- a one-button GUI background job path plus storyboard-only/manual editing when you want it
- scene-by-scene narration, prompt, media, preview, and operator-log editing
- persisted demo-target metadata and reviewed footage manifests for live-demo runs
- local MP4 render through `ffmpeg` or Remotion, depending on the resolved render backend
- MCP tools and web API job routes for agent/client-driven video generation
Use this for the current workspace-based UI.
```bash
./start.sh --react
```

The main workspaces are:

- Brief
- Scenes
- Render
- Queue
- Settings
In Brief Studio, there are now two clearly separate actions:
- primary path: start the full background video run
- secondary path: generate or rebuild only the storyboard
If demo-target context or reviewed footage is present, the GUI prefers the hybrid path automatically unless you explicitly choose something else.
Use this when you want the older manual step-by-step flow.
```bash
./start.sh
```

This is still supported, but the React/FastAPI control room is the more current operator surface.
Use this when an agent or client should drive Cathode programmatically.
```bash
/opt/homebrew/bin/python3.10 cathode_mcp_server.py --transport stdio
```

The core tool is `make_video`. It can inspect a bounded workspace, accept explicit source files, persist demo-target metadata, and accept reviewed `footage_paths` / `footage_manifest` inputs for mixed-media demos.
The React GUI and the MCP path now converge on the same persisted background-job model instead of maintaining separate orchestration logic.
The web stack also exposes the same job model through POST /api/jobs/make-video.
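A client hitting that route might assemble the request like the hedged sketch below; the body fields (`project_name`, `brief`, `composition_mode`) are illustrative assumptions, not a documented schema:

```python
import json
from urllib import request

def build_make_video_payload(project_name: str, brief_text: str,
                             composition_mode: str = "classic") -> dict:
    """Assemble an illustrative JSON body for POST /api/jobs/make-video.

    Field names are assumptions for the sketch, not Cathode's real contract.
    """
    return {
        "project_name": project_name,
        "brief": brief_text,
        "composition_mode": composition_mode,
    }

def submit_job(payload: dict, host: str = "127.0.0.1", port: int = 9321):
    """POST the payload to the FastAPI job route (requires the server to be running)."""
    req = request.Request(
        f"http://{host}:{port}/api/jobs/make-video",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return request.urlopen(req)

payload = build_make_video_payload("demo_project", "Explain the release in 60 seconds")
print(json.dumps(payload))
```

The useful part is that this is the same persisted job model the GUI and MCP path use, so a returned job id can be polled the same way regardless of which surface started it.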
Use this when the video should prove a real running product.
The packaged skill now lives in:
- `skills/cathode-project-demo/`
- `.claude/skills/cathode-project-demo/`
Its flow is:
- bootstrap Cathode
- prepare a live capture session
- launch or attach to the target app
- capture fresh states in a real browser
- review the footage
- hand approved clips into Cathode
- render
This path is capture-first and review-first. It does not assume existing README screenshots are good enough.
The QC pass is supposed to run inside Codex or Claude as a spawned reviewer sub-agent looking at extracted images only, not as some separate external “vision model” workflow. The reviewer prompt should stay tiny and human, more like “hey, check out my demo vid” than a schema dump.
The parent agent should save that raw reviewer reply, translate it into Cathode’s structured accept / warn / retry observations, and then let the deterministic review rules decide retries and handoff safety.
In practice, that review loop is:
- `extract_review_frames.py` creates the image bundle.
- A spawned worker sub-agent sees only those frames plus the short gut-check prompt.
- The parent agent saves the raw reply, seeds `review_observations.template.json` with `init_review_observations.py`, fills the structured observations, and runs `review_bundle.py`.
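The translation step can be pictured as a tiny classifier over the raw reply. This is a hedged sketch: the keyword heuristics and the observation shape are assumptions for illustration, not the packaged scripts' actual logic.

```python
# Hypothetical translation of a raw reviewer reply into a structured observation.
# Verdict keywords and the dict shape are illustrative assumptions.
RETRY_HINTS = ("blank", "error page", "wrong screen", "cut off")
WARN_HINTS = ("blurry", "slow", "cluttered")

def translate_reply(raw_reply: str) -> dict:
    text = raw_reply.lower()
    if any(hint in text for hint in RETRY_HINTS):
        verdict = "retry"
    elif any(hint in text for hint in WARN_HINTS):
        verdict = "warn"
    else:
        verdict = "accept"
    # Keep the raw reply alongside the verdict so the deterministic
    # review rules can audit how the observation was derived.
    return {"verdict": verdict, "raw_reply": raw_reply}

print(translate_reply("Looks good, but frame 3 is a blank error page.")["verdict"])  # → retry
```

The point of the split is that the sub-agent stays conversational while retries and handoff safety remain a deterministic decision in `review_bundle.py`.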
The packaged live-demo lane now also has a real capture driver and retry-plan tool:
- `capture_live_demo.py`: run a Playwright-backed walkthrough from a capture plan and keep raw browser video, trace, screenshots, and a step manifest.
- `apply_retry_actions.py`: mutate the capture plan from bounded retry actions before rerunning capture.
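To make the retry-plan idea concrete, here is a hedged sketch of applying bounded actions to a capture plan. The plan shape and action names (`increase_wait`, `skip_step`) are illustrative, not `apply_retry_actions.py`'s real schema:

```python
import copy

def apply_retry_actions(plan: dict, actions: list[dict]) -> dict:
    """Return a new capture plan with bounded retry actions applied.

    The original plan is left untouched so a failed rerun can be diffed
    against the previous attempt.
    """
    new_plan = copy.deepcopy(plan)
    for action in actions:
        step = new_plan["steps"][action["step_index"]]
        if action["kind"] == "increase_wait":
            step["wait_ms"] = step.get("wait_ms", 0) + action["extra_ms"]
        elif action["kind"] == "skip_step":
            step["skip"] = True
    return new_plan

plan = {"steps": [{"url": "/login", "wait_ms": 500}, {"url": "/dashboard"}]}
patched = apply_retry_actions(
    plan, [{"kind": "increase_wait", "step_index": 0, "extra_ms": 1000}]
)
print(patched["steps"][0]["wait_ms"])  # → 1500
```

Keeping the actions bounded to a known vocabulary is what lets the reviewer loop retry captures without handing an agent free-form control over the browser.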
Cathode no longer stops at still-image and clip-only storyboards.
- `classic`: image + video scenes, with `ffmpeg` as the default final render backend
- `hybrid`: mix image, video, and motion scenes in one project; Remotion is the default render backend when the local toolchain is available
- `motion_only`: build the project around motion scenes plus narration
Motion scenes are template-first in the current product and render through the local Remotion toolchain bundled in frontend/. The React app only exposes motion and hybrid options when the Remotion toolchain is actually runnable on this machine.
Important timing rule: narration audio is still the source of truth. Cathode computes scene durations, video trim/speed/hold behavior, and the Remotion manifest from the same timing contract, so hybrid renders stay in sync instead of drifting.
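The timing contract can be sketched as a running sum over narration durations; the field names below are illustrative, not Cathode's real manifest schema:

```python
def timing_contract(narration_seconds: list[float]) -> list[dict]:
    """Derive per-scene start times and durations from narration audio lengths.

    Narration is the source of truth: every scene's duration is its narration
    duration, and start times are the cumulative sum, so ffmpeg and Remotion
    renders computed from the same timeline cannot drift apart.
    """
    timeline, start = [], 0.0
    for index, duration in enumerate(narration_seconds):
        timeline.append({"scene": index, "start": round(start, 3), "duration": duration})
        start += duration
    return timeline

for entry in timing_contract([4.2, 6.0, 3.5]):
    print(entry)
```

Video trim/speed/hold behavior and the Remotion manifest are then derived from this single timeline rather than computed independently per backend.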
- Product demo: `docs/assets/__storyboard-demo.mp4`
- LocalLLaMA short demo: `docs/assets/localllama-demo.mp4`
- Mixed-media workflow clip: `docs/assets/ui-workflow-clip.mp4`
- Brief Studio screenshot: `docs/assets/brief-studio-focus.png`
- Motion scene workspace screenshot: `docs/assets/motion-scene-focus.png`
- Render workspace screenshot: `docs/assets/render-finished-focus.png`
- Sample prompt brief: `docs/demo-brief.md`
Cathode is env-driven on purpose.
- `OPENAI_API_KEY`: OpenAI storyboard and optional OpenAI TTS
- `ANTHROPIC_API_KEY`: Anthropic storyboard
- `REPLICATE_API_TOKEN`: Qwen image generation, Replicate-backed image edit, and Chatterbox voice
- `CATHODE_LOCAL_IMAGE_MODEL`: optional local Hugging Face image generation for image scenes
- `ELEVENLABS_API_KEY`: ElevenLabs narration
- `DASHSCOPE_API_KEY` or `ALIBABA_API_KEY`: optional DashScope image edit
- `CATHODE_LOCAL_VIDEO_COMMAND` and/or `CATHODE_LOCAL_VIDEO_ENDPOINT`: optional local video generation for video scenes
- `CATHODE_LOCAL_VIDEO_MODEL`: optional local model label or path passed through to that backend
- Node + the installed frontend workspace: local Remotion motion/hybrid rendering
- Kokoro remains the always-available local voice option
Only configured providers appear in the UI. If you leave a key out, the UI stays quieter.
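The provider-gating idea can be sketched as a simple env lookup; the mapping below is illustrative, not Cathode's actual settings code:

```python
import os

# Illustrative mapping from env vars to UI-visible providers.
# The real wiring lives in Cathode's settings layer and may differ.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
    "elevenlabs": "ELEVENLABS_API_KEY",
}

def configured_providers(env: dict = os.environ) -> list[str]:
    # Kokoro is local TTS, so it is always available regardless of keys.
    return ["kokoro"] + [name for name, key in PROVIDER_KEYS.items() if env.get(key)]

print(configured_providers({"OPENAI_API_KEY": "sk-..."}))  # → ['kokoro', 'openai']
```
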
Cathode is local-first, not cloud-hosted.
- the app runs locally
- projects live under `projects/<project>/`
- previews and final renders happen locally
- uploaded stills and clips stay local
- Kokoro is local TTS
- video scenes can use a local generation backend when configured
- motion and hybrid renders happen locally through Remotion when available
- persisted job state and logs live under `projects/<project>/.cathode/jobs/`
For visuals, the built-in AI image path can run either through Replicate or through a configured local Hugging Face Qwen model. If neither is configured, you can still upload stills yourself. Video scenes can come from reviewed footage, the live-demo agent path, or a configured local video backend. Motion scenes render through the local Remotion layer when the frontend toolchain is installed.
Cathode can run Qwen Image locally in two ways:
- `torch` runtime for CUDA, CPU, or MPS through Hugging Face `diffusers`
- `mlx` runtime for Apple Silicon through `mflux`
Typical CUDA / generic torch setup:
```bash
/opt/homebrew/bin/python3.10 -m pip install -r requirements-local-image.txt
export CATHODE_LOCAL_IMAGE_RUNTIME=torch
export CATHODE_LOCAL_IMAGE_MODEL=Qwen/Qwen-Image-2512
```

Typical Apple Silicon MLX setup:
```bash
uv tool install --upgrade mflux
export CATHODE_LOCAL_IMAGE_RUNTIME=mlx
export CATHODE_LOCAL_IMAGE_MODEL=Qwen/Qwen-Image-2512
export CATHODE_LOCAL_IMAGE_MLX_MODEL=mlx-community/Qwen-Image-2512-8bit
```

Auto mode keeps the single product-facing provider in the UI and picks MLX on Apple Silicon when `mflux` is installed; otherwise it falls back to the torch path.
Optional tuning:
```bash
export CATHODE_LOCAL_IMAGE_RUNTIME=auto
export CATHODE_LOCAL_IMAGE_DEVICE=auto
export CATHODE_LOCAL_IMAGE_DTYPE=auto
export CATHODE_LOCAL_IMAGE_STEPS=50
export CATHODE_LOCAL_IMAGE_TRUE_CFG_SCALE=4.0
export CATHODE_LOCAL_IMAGE_MLX_CACHE_LIMIT_GB=
export CATHODE_LOCAL_IMAGE_MLX_LOW_RAM=0
```

Cathode keeps local video generation generic and env-driven rather than baking in one model family.
Configure one of these:
- `CATHODE_LOCAL_VIDEO_COMMAND`: Cathode runs a local command and passes scene data through env vars such as `CATHODE_VIDEO_PROMPT`, `CATHODE_VIDEO_OUTPUT_PATH`, `CATHODE_VIDEO_DURATION_SECONDS`, `CATHODE_VIDEO_MODEL`, and `CATHODE_VIDEO_REQUEST_JSON`.
- `CATHODE_LOCAL_VIDEO_ENDPOINT`: Cathode sends a JSON POST request with `prompt`, `output_path`, `duration_seconds`, `width`, `height`, `fps`, `scene`, and `brief`.
Your local backend can satisfy the request in any of these ways:
- write the clip directly to `CATHODE_VIDEO_OUTPUT_PATH` / the request `output_path`
- return JSON with `output_path`
- return JSON with `url`
- return JSON with `b64_json`
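A minimal command-backed wrapper satisfying the first option might look like the sketch below; the placeholder bytes stand in for a real model call, and everything beyond the documented env var names is an assumption:

```python
import pathlib

def run_wrapper(env: dict) -> str:
    """Sketch of a CATHODE_LOCAL_VIDEO_COMMAND wrapper.

    Reads the documented env vars and writes the clip to the requested
    output path, which is the simplest way to satisfy the contract.
    """
    prompt = env["CATHODE_VIDEO_PROMPT"]
    out_path = pathlib.Path(env["CATHODE_VIDEO_OUTPUT_PATH"])
    duration = float(env.get("CATHODE_VIDEO_DURATION_SECONDS", "4"))
    # A real wrapper would invoke a local model here (selected via
    # CATHODE_VIDEO_MODEL) with `prompt` and `duration`; a placeholder
    # byte stands in for actual encoded video.
    out_path.write_bytes(b"\x00")  # placeholder, not a valid MP4
    return str(out_path)

# Invoked by Cathode as a subprocess, e.g.:
#   CATHODE_LOCAL_VIDEO_COMMAND='python /path/to/wrapper.py'
# with the env vars above set per scene.
```
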
Typical setup looks like this:
```bash
CATHODE_LOCAL_VIDEO_COMMAND='python /path/to/local_video_wrapper.py'
CATHODE_LOCAL_VIDEO_MODEL=/models/wan
```

Or:

```bash
CATHODE_LOCAL_VIDEO_ENDPOINT=http://127.0.0.1:8787/generate
CATHODE_LOCAL_VIDEO_MODEL=wan2.1
```

React control room:

```bash
./start.sh --react
```

Legacy Streamlit path:

```bash
./start.sh
```

Manual React + FastAPI run:

```bash
/opt/homebrew/bin/python3.10 -m uvicorn server.app:app --host 127.0.0.1 --port 9321 --reload
npm run dev --prefix frontend -- --host 127.0.0.1 --port 9322
```

Manual Streamlit run:

```bash
/opt/homebrew/bin/python3.10 -m streamlit run app.py --server.port 8517
```

The default Streamlit port is 8517. Override it with `STREAMLIT_PORT` when using `./start.sh`.
React mode uses `CATHODE_API_PORT` for FastAPI (default 9321) and `CATHODE_FRONTEND_PORT` for Vite (default 9322).
Final render now uses direct ffmpeg orchestration and auto-prefers hardware H.264 encoders when the local ffmpeg build supports them. Override with CATHODE_VIDEO_ENCODER or force CPU fallback with CATHODE_DISABLE_HW_ENCODER=1.
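The encoder preference can be sketched like this; `h264_videotoolbox` and `libx264` are common ffmpeg encoder names used for illustration, and the real selection logic may differ:

```python
def resolve_encoder(env: dict, hw_encoders: list[str]) -> str:
    """Sketch of the described preference order: explicit override first,
    then a hardware H.264 encoder unless hardware encoding is disabled,
    then the CPU fallback."""
    if env.get("CATHODE_VIDEO_ENCODER", "auto") != "auto":
        return env["CATHODE_VIDEO_ENCODER"]
    if env.get("CATHODE_DISABLE_HW_ENCODER") == "1":
        return "libx264"
    return hw_encoders[0] if hw_encoders else "libx264"

print(resolve_encoder({}, ["h264_videotoolbox"]))  # → h264_videotoolbox
```
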
When Remotion is available and the project resolves to motion_only or hybrid, Cathode can switch the final render backend to Remotion automatically.
Cathode also ships as an MCP server.
Run over stdio:
```bash
/opt/homebrew/bin/python3.10 cathode_mcp_server.py --transport stdio
```

Run over Streamable HTTP:

```bash
CATHODE_MCP_PORT=8765 /opt/homebrew/bin/python3.10 cathode_mcp_server.py --transport streamable-http
```

Docker:

```bash
docker build -t cathode-mcp .
docker run --rm -p 8765:8765 cathode-mcp
```

Primary MCP tools:

- `make_video`
- `get_job_status`
- `cancel_job`
- `rerun_stage`
- `list_projects`

Primary MCP resources:

- `project://{project_name}/plan`
- `project://{project_name}/artifacts`
macOS:
```bash
brew install python@3.10 ffmpeg espeak-ng
```

Ubuntu / Debian:

```bash
sudo apt-get install python3.10 ffmpeg espeak-ng
```

Then install the Python dependencies:

```bash
/opt/homebrew/bin/python3.10 -m pip install -r requirements.txt
```

Copy `.env.example` to `.env` and fill in only what you need.
Example:
```
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
REPLICATE_API_TOKEN=
ELEVENLABS_API_KEY=
DASHSCOPE_API_KEY=
ALIBABA_API_KEY=
IMAGE_EDIT_PROVIDER=
IMAGE_EDIT_MODEL=qwen/qwen-image-edit-2511
CATHODE_LOCAL_IMAGE_MODEL=
CATHODE_LOCAL_IMAGE_RUNTIME=auto
CATHODE_LOCAL_IMAGE_MLX_MODEL=mlx-community/Qwen-Image-2512-8bit
CATHODE_LOCAL_IMAGE_DEVICE=auto
CATHODE_LOCAL_IMAGE_DTYPE=auto
CATHODE_LOCAL_IMAGE_STEPS=50
CATHODE_LOCAL_IMAGE_TRUE_CFG_SCALE=4.0
CATHODE_LOCAL_IMAGE_NEGATIVE_PROMPT=
CATHODE_LOCAL_IMAGE_MLX_CACHE_LIMIT_GB=
CATHODE_LOCAL_IMAGE_MLX_LOW_RAM=0
STREAMLIT_PORT=8517
CATHODE_VIDEO_ENCODER=auto
CATHODE_DISABLE_HW_ENCODER=0
CATHODE_LOCAL_VIDEO_COMMAND=
CATHODE_LOCAL_VIDEO_ENDPOINT=
CATHODE_LOCAL_VIDEO_MODEL=
CATHODE_LOCAL_VIDEO_API_KEY=
CATHODE_LOCAL_VIDEO_TIMEOUT_SECONDS=900
```

Every Cathode project stores:
- a normalized brief
- composition mode
- storyboard scenes
- image, clip, motion, audio, and preview paths
- render metadata
- demo-target metadata under `meta.agent_demo_profile`
- optional style references
- optional reviewed footage manifest
- persisted background job metadata and logs under `.cathode/jobs/`
`projects/<project>/plan.json` is the source of truth.
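A quick way to inspect a plan, assuming illustrative field names (`scenes`, `composition_mode`) rather than a documented schema:

```python
import json
import pathlib

def summarize_plan(plan_path: str) -> str:
    """Read a plan.json and summarize its composition mode and scene count.

    The keys used here are assumptions for the sketch; the real plan schema
    may name things differently.
    """
    plan = json.loads(pathlib.Path(plan_path).read_text())
    scenes = plan.get("scenes", [])
    mode = plan.get("composition_mode", "classic")
    return f"{mode}: {len(scenes)} scene(s)"
```

Because the plan file is the single source of truth, a summary like this reflects exactly what every downstream stage (render, MCP resources, job reruns) will see.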
The core brief still revolves around:
- source mode
- composition mode
- goal
- audience
- target length
- tone
- visual style
- source material
- optional footage notes
- optional demo-target context (`workspace_path`, `app_url`, `launch_command`, `expected_url`, `repo_url`, `flow_hints`)
For live demos, add reviewed `footage_paths` or `footage_manifest` instead of only prose.
- image scenes hold for narration duration
- video scenes trim to narration duration
- short clips can freeze on the last frame to stay in sync
- motion scenes render from normalized template props through Remotion
- reviewed footage clips can be copied into `clips/` and auto-assigned to `video` scenes
- final render uses `ffmpeg` or Remotion based on the resolved render backend
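The trim/hold rule above can be sketched as a single decision per clip; the key names are illustrative:

```python
def fit_clip(clip_seconds: float, narration_seconds: float) -> dict:
    """Fit a video clip to its narration: trim a long clip, freeze the
    last frame of a short one so the scene stays in sync with audio."""
    if clip_seconds >= narration_seconds:
        return {"action": "trim", "play_seconds": narration_seconds, "hold_seconds": 0.0}
    return {
        "action": "hold_last_frame",
        "play_seconds": clip_seconds,
        "hold_seconds": round(narration_seconds - clip_seconds, 3),
    }

print(fit_clip(3.0, 5.0))
# → {'action': 'hold_last_frame', 'play_seconds': 3.0, 'hold_seconds': 2.0}
```
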
```bash
python3.10 batch_regenerate.py
python3.10 batch_regenerate.py --projects demo_one,demo_two
python3.10 batch_regenerate.py --dry-run
```

Run the tests:

```bash
PYTHONPATH=. /opt/homebrew/bin/python3.10 -m pytest -q
```

Repository layout:

- `app.py`
- `batch_regenerate.py`
- `cathode_mcp_server.py`
- `core/`
- `core/remotion_render.py`
- `frontend/`
- `server/`
- `prompts/`
- `skills/`
- `tests/`
- `docs/assets/`
- `projects/`
- `output/`
MIT



