An all-in-one AI pipeline for creating cinematic, documentary-style videos, from a single topic to a fully packaged, YouTube-ready project.
API Status: I've personally tested this with the Replicate and Gemini APIs; those are the battle-tested paths. fal.ai and ElevenLabs support is implemented but not fully verified, so it may have rough edges. PRs welcome!
ContentMachine automates the entire documentary video production workflow using state-of-the-art AI. Give it a topic, and it handles everything: researching real historical stories, planning scenes, generating images, creating video clips, writing narration scripts, generating voiceover audio, and producing YouTube metadata and thumbnails, all packaged into a clean ZIP ready for your video editor.
Built for content creators, documentarians, educators, and hobbyists who want to produce high-quality, cinematic content without a full production team.
I built this as a personal all-in-one pipeline: easy enough to run locally, flexible enough to swap AI providers, and powerful enough to produce publish-ready assets in one session.
ContentMachine runs a step-by-step pipeline with a clean UI to monitor, pause, and resume at any stage.

```
                     Topic Input
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 1. STORY GENERATION                                 │
│    LLM finds 4 real, documented historical stories  │
│    with cinematic potential; you pick one           │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 2. SCENE PLANNING                                   │
│    LLM builds a full cinematic shot list with       │
│    smart pacing: durations adapt per video model    │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 3. IMAGE GENERATION                                 │
│    4 variations per scene (establishing, intimate,  │
│    detail, atmospheric); select the best one        │
│    All images saved as real PNG/JPG files in ZIP    │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 4. VIDEO GENERATION                                 │
│    Image-to-video, 2 scenes at a time               │
│    Multiple models available; select best clip      │
│    Browse previous versions with ← → arrows         │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 5. AUDIO (optional)                                 │
│    ElevenLabs TTS narration + SFX per scene         │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│ 6. EXPORT                                           │
│    YouTube metadata · multi-select thumbnails       │
│    Full ZIP: videos + images/selected + images/all  │
│    + audio + script + restorable project.json       │
└─────────────────────────────────────────────────────┘
```
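
The step-by-step flow above, with pause/resume and videos generated two scenes at a time, can be illustrated with a small batch runner. This is a minimal sketch under stated assumptions, not ContentMachine's actual code: `runInBatches`, the `state` object and the `generate` callback are hypothetical names (in the real app, progress like this presumably lives in the Zustand store, `pipelineStore.js`).

```javascript
// Hypothetical sketch: generate scenes in fixed-size batches (2 at a time for
// videos) and check a pause flag between batches, so a run can stop cleanly and
// later resume with only the unfinished scenes.
async function runInBatches(scenes, generate, { batchSize = 2, state }) {
  // state.done holds the ids of finished scenes and survives a pause.
  const pending = scenes.filter((s) => !state.done.has(s.id));
  for (let i = 0; i < pending.length; i += batchSize) {
    if (state.paused) return "paused"; // only ever stops between batches
    const batch = pending.slice(i, i + batchSize);
    // Run the batch concurrently; a failed scene stays pending for the next resume.
    const results = await Promise.allSettled(batch.map(async (s) => generate(s)));
    results.forEach((r, j) => {
      if (r.status === "fulfilled") state.done.add(batch[j].id);
    });
  }
  return state.done.size === scenes.length ? "complete" : "incomplete";
}
```

Resuming is just calling `runInBatches` again with the same `state`: scenes already in `state.done` are skipped, so nothing is regenerated (or paid for) twice.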
The default aesthetic uses seamless glossy porcelain mannequins: figures are always fully clothed in period-accurate outfits, including explicitly named footwear (e.g. "iron-buckled brown leather knee boots"), with no visible joints, stands, or supports. Environments are photorealistic, with ray tracing and cinematic lighting. This is a good starting point for YouTube-focused creators, since it avoids realistic-looking scenes that could be read as altered footage.
The visual style is fully customisable: expand Advanced → Customize System Prompts on the start page to edit the image prompt rules for any character type. Pair this with the Character Base Images feature (see below) to lock in a consistent look across every scene.
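
Custom system prompts boil down to per-stage overrides that fall back to the built-in defaults. A minimal sketch, assuming hypothetical stage keys and placeholder default text (the real defaults are much longer); in the browser the `custom` object would be persisted with `localStorage`, which is left out here so the sketch runs anywhere.

```javascript
// Hypothetical stage keys and placeholder defaults, for illustration only.
const DEFAULT_PROMPTS = {
  scenePlanning: "Build a cinematic shot list for the chosen story.",
  imagePrompts: "Seamless glossy porcelain mannequins in period-accurate outfits.",
};

// Use the user's custom prompt for a stage; blank or missing falls back to the default.
function resolvePrompt(stage, custom) {
  const text = custom[stage];
  return typeof text === "string" && text.trim() !== "" ? text : DEFAULT_PROMPTS[stage];
}

// "Reset to default": return a copy of the overrides without this stage.
function resetPrompt(stage, custom) {
  const { [stage]: _removed, ...rest } = custom;
  return rest;
}
```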
Note: Replicate and Gemini are the tested providers. fal.ai is a work in progress; contributions welcome.

LLM providers (stories, scene plans, prompts, scripts, metadata):

| Provider | Models |
|---|---|
| fal.ai (WIP) | Claude 3.5 Sonnet |
| Gemini (direct) | Gemini 3 Flash (recommended), Gemini 3.1 Pro, Gemini 3 Pro, Gemini 2.5 Flash, Gemini 2.5 Pro |
| Replicate | Gemini 2.5 Flash, Gemini 3 Flash, Gemini 3.1 Pro, Claude 3.5 Sonnet |
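
The LLM stages return structured JSON, and long responses can get cut off mid-output. One common best-effort repair, sketched here as an assumption rather than ContentMachine's actual logic (`repairJson` is a hypothetical name), is to strip Markdown fences, close an unterminated string, drop a dangling comma and append the missing closing brackets.

```javascript
// Best-effort repair of truncated LLM JSON (a common technique, not necessarily
// the one ContentMachine uses).
function repairJson(raw) {
  // Strip Markdown code fences that LLMs often wrap around JSON.
  let text = raw.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/`{3}\s*$/, "").trim();
  try {
    return JSON.parse(text); // already valid: nothing to repair
  } catch {
    // fall through to the repair pass
  }
  const closers = []; // expected closing brackets, innermost last
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  if (inString) {
    if (escaped) text = text.slice(0, -1); // drop a dangling backslash
    text += '"'; // close the cut-off string
  }
  text = text.replace(/[\s,]+$/, ""); // remove a trailing comma
  if (text.endsWith(":")) text += " null"; // key whose value was cut off
  text += closers.reverse().join("");
  return JSON.parse(text); // still throws if the damage is worse than truncation
}
```

This only fixes truncation; a key cut off mid-name or otherwise invalid JSON still makes `JSON.parse` throw, at which point re-requesting the LLM output is the more reliable fallback.
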

Image providers:

| Provider | Models |
|---|---|
| fal.ai (WIP) | Flux Pro, Flux 2 Pro, Flux Schnell, Nano Banana Pro, Qwen Image 2512, Z-Image Base, Ideogram V3, SD 3.5 Large |
| Replicate | Flux 2 Pro, Flux 1.1 Pro, Nano Banana Pro (Gemini), Imagen 4 |
| Gemini (direct) | Gemini 3 Pro Image Preview (2K native output) |

Video providers (image-to-video):

| Provider | Model | Notes |
|---|---|---|
| fal.ai (WIP) | LTX-2 image-to-video | Not fully verified |
| Replicate | LTX-2 Pro | 6, 8 or 10s, with generated audio |
| Replicate | LTX-2 Fast | 6–20s in 2s steps; planner favours 12–20s |
| Replicate | Kling v3 | 3–15s, standard/pro mode, AI audio |
| Replicate | Kling v2.5 Turbo Pro | 5s or 10s only |
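
Because each model only accepts certain clip lengths, every duration in the scene plan has to land on an allowed value. The planner is already told these constraints; an extra safety net that snaps any out-of-range value to the nearest allowed duration might look like the sketch below. The model keys and function names are hypothetical, not real Replicate model identifiers.

```javascript
// Inclusive numeric range helper: range(6, 20, 2) -> [6, 8, ..., 20].
function range(start, end, step) {
  const out = [];
  for (let v = start; v <= end; v += step) out.push(v);
  return out;
}

// Allowed clip durations in seconds, taken from the table above (ascending).
const ALLOWED_DURATIONS = {
  "ltx-2-pro": [6, 8, 10],
  "ltx-2-fast": range(6, 20, 2),
  "kling-v3": range(3, 15, 1),
  "kling-v2.5-turbo-pro": [5, 10],
};

// Snap a requested duration to the nearest allowed value (ties go to the shorter clip).
function snapDuration(model, requested) {
  const allowed = ALLOWED_DURATIONS[model];
  if (!allowed) throw new Error(`Unknown video model: ${model}`);
  return allowed.reduce((best, d) =>
    Math.abs(d - requested) < Math.abs(best - requested) ? d : best
  );
}
```

For example, a 12-second scene planned for LTX-2 Pro becomes 10 seconds, and an 8-second scene for Kling v2.5 Turbo Pro becomes 10 seconds.
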

Audio providers:

| Provider | Capability |
|---|---|
| ElevenLabs | Scene-by-scene narration voiceover + SFX generation |
| Local TTS | Bring your own (QWEN TTS, Kokoro, etc.); zero cost |

A 4:30 (four-and-a-half-minute) documentary video produced with ContentMachine cost me approximately $28 USD:

| Component | Provider / Model Used | Notes |
|---|---|---|
| Story + Scene Planning + Scripts | Gemini 3 Flash Preview (Gemini API) | Very cheap |
| Scene Images + Thumbnail | Nano Banana Pro / gemini-3-image-preview (Replicate) | Medium |
| Video Clips | LTX-2 Pro (Replicate) | Largest cost driver |
| Narrator TTS | QWEN TTS (local) | Free |
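
As a rough rule of thumb, that works out to 28 / 4.5 ≈ $6.22 per finished minute. The sketch below scales the reported figure linearly and estimates scene count from an average clip length; both are simplifying assumptions (real cost depends on scene count, clip durations, regenerations and model choice), and the function names are hypothetical.

```javascript
// Reference point reported above: a 4:30 video cost about $28.
const REFERENCE_SECONDS = 4 * 60 + 30; // 270 s
const REFERENCE_COST_USD = 28;

// Naive linear estimate, assuming the same models as the reference run
// (Gemini 3 Flash LLM, Nano Banana Pro images, LTX-2 Pro video, local TTS).
function estimateCostUsd(targetSeconds) {
  return (REFERENCE_COST_USD / REFERENCE_SECONDS) * targetSeconds;
}

// Rough scene count for a target length, given an average clip duration.
function estimateSceneCount(targetSeconds, avgClipSeconds) {
  return Math.ceil(targetSeconds / avgClipSeconds);
}
```

For example, `estimateCostUsd(600)` gives roughly $62 for a 10-minute video, and `estimateSceneCount(270, 8)` gives 34 scenes for a 4:30 video built from 8-second clips.
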

Tips to reduce cost:
- Use `gemini-2.5-flash` (non-preview) for the LLM: higher quota, fewer rate limits
- Use fal.ai LTX-2 instead of Replicate LTX-2 Pro for cheaper video (once fal.ai is fully verified)
- Use Flux Schnell for faster, cheaper image generation
- Use a free local TTS tool for zero audio cost
- Use LTX-2 Fast on Replicate for longer scenes at a similar price point

Prerequisites:
- Node.js 18+
- API keys for at least one LLM provider and one image provider

```bash
# Clone
git clone https://github.com/Saganaki22/ContentMachine
cd ContentMachine

# Install all dependencies
npm install

# Start both backend and frontend
npm run dev
```

The app runs at http://localhost:5173, with the backend API at http://localhost:3000.
Open the Settings panel (gear icon, top right) and paste your API keys. They are saved in your browser's localStorage and automatically pushed to the backend on each session startup. No .env file is required for local use.

Where to get API keys:

| Provider | Link |
|---|---|
| fal.ai | fal.ai/dashboard/keys |
| Replicate | replicate.com/account/api-tokens |
| Gemini | aistudio.google.com/api-keys |
| ElevenLabs | elevenlabs.io/app/settings/api-keys |
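
The startup key sync described above can be sketched as follows. `storage` stands in for `window.localStorage` (anything with a `getItem(key)` method), the `apiKey:<provider>` storage keys are hypothetical, and `postKeys` is a placeholder for whatever backend call the app actually makes.

```javascript
// Hypothetical provider ids and storage-key scheme, for illustration only.
const PROVIDERS = ["fal", "replicate", "gemini", "elevenlabs"];

// Read whichever keys the user has saved; skip providers never configured.
function collectKeys(storage) {
  const keys = {};
  for (const p of PROVIDERS) {
    const value = storage.getItem(`apiKey:${p}`);
    if (value) keys[p] = value;
  }
  return keys;
}

// On app startup, push the saved keys to the backend in one call.
async function syncKeysOnStartup(storage, postKeys) {
  const keys = collectKeys(storage);
  if (Object.keys(keys).length === 0) return false; // nothing to send
  await postKeys(keys);
  return true;
}
```

Keeping keys in localStorage is convenient for local use, but any script running on the same origin can read them, so this pattern is not suited to a shared or public deployment.
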

To build and run for production:

```bash
npm run build
npm run start
```

Project structure:

```
ContentMachine/
├── backend/
│   ├── server.js              Express API server (200mb body limit)
│   └── routes/
│       ├── claude.js          LLM: stories, scene plans, prompts, scripts, metadata
│       ├── images.js          Image generation: fal.ai / Replicate / Gemini
│       ├── videos.js          Video generation + status polling
│       ├── elevenlabs.js      TTS narration + SFX generation
│       ├── thumbnail.js       Thumbnail image generation
│       ├── export.js          ZIP packaging (streams to browser)
│       ├── session.js         Auto-save sessions to output/ folder
│       └── settings.js        API key management
│
├── output/                    Auto-saved sessions (one folder per session)
│   └── session_YYYY-MM-DD_xxx/
│       ├── session.json       Restorable project state
│       ├── images/selected/   Chosen image per scene (PNG/JPG)
│       ├── images/all/        All 4 generated variants per scene
│       ├── images/history/    Previously regenerated image versions
│       ├── videos/            Current selected video per scene (MP4)
│       ├── videos/history/    Previously regenerated video versions
│       └── thumbnails/        Generated thumbnail images + history
│
└── frontend/src/
    ├── pages/
    │   ├── StorySelect.jsx        Step 1: topic input, story selection, aspect ratio, character images, advanced prompts
    │   ├── SceneImages.jsx        Step 2: image generation + selection + export
    │   ├── VideoGeneration.jsx    Step 3: video generation + narration script
    │   ├── AudioGeneration.jsx    Step 4: TTS voiceover (optional)
    │   └── Export.jsx             Step 5: thumbnail, metadata, ZIP export
    ├── components/
    │   ├── Layout.jsx             Header, nav, settings drawer, session browser, footer
    │   ├── ImageModal.jsx         Full-screen image viewer with history navigation
    │   ├── VideoModal.jsx         Full-screen video viewer with history navigation
    │   └── ExportModal.jsx        Shared export modal (available from images page onwards)
    ├── store/
    │   └── pipelineStore.js       Zustand global state + all async actions
    ├── services/
    │   └── api.js                 Axios client for all backend calls
    └── workers/
        ├── zipImporter.worker.js      ZIP extraction off main thread (JSZip + base64)
        └── jsonSerializer.worker.js   (legacy, retained for reference)
```

- 6-step guided pipeline: story → scenes → images → videos → audio → export
- 4 image variations per scene: establishing, intimate, detail, atmospheric
- Batch processing: images generated scene by scene, videos 2 at a time
- Model-aware scene planning: the LLM adapts allowed durations and pacing to match the selected video model's constraints
- Aspect ratio support: 16:9 (landscape) and 9:16 (portrait/TikTok/Reels); passed via API parameters, never written into prompts
- Resolution locked to 1080p: ensures consistent quality across all video models
- Pause / Resume at any point: safe to stop mid-batch and continue later
- Regenerate any individual image or video clip without re-running the pipeline
- Regenerate All: re-runs image generation for all scenes in one click
- Select All / Deselect All for video clips in one click
- Per-video download: download any individual video clip directly from its card
- Auto-retry: Gemini 429 rate limits and Replicate interruptions are handled automatically with exponential backoff
- JSON repair: truncated or malformed LLM output is auto-repaired before parsing
- Live scene count estimate: the estimated scene count is shown on the start screen as you adjust the video length
- Version history for images, videos, and thumbnails: every time you regenerate, the previous version is saved automatically
- ← → arrow navigation: browse all previous versions of any image, video clip, or thumbnail in the full-screen modal
- Select any version: the version you are viewing when you click Select is the one that gets used; you are never locked into the latest regeneration
- Prompt saved per version: the exact prompt used for each version is shown and preserved; editing the prompt before regenerating updates it in the project
- History survives export: all previous image versions are included in the ZIP (`images/history/`) and round-trip through session save/load
- LTX-2 Pro: 6, 8 or 10s, generated audio
- LTX-2 Fast: 6–20s in 2s steps; the scene planner is biased toward 12–20s to make full use of the model
- Kling v3: 3–15s in whole seconds, standard/pro mode, AI audio, uses start image
- Kling v2.5 Turbo Pro: 5s or 10s only, fast turnaround, uses start image
- Auto-save sessions: the app automatically saves your entire session to the `output/` folder on the backend after every image batch, every completed video, and after thumbnails generate; a 60-second fallback timer catches anything in between
- Session browser: click the clock icon in the header to browse all auto-saved sessions by date; click any session to restore it instantly, or delete sessions you no longer need
- Images and videos saved as real files: auto-saved sessions store every generated image and video on disk (`images/all/`, `images/selected/`, `images/history/`, `videos/`, `videos/history/`, `thumbnails/`), so there is no base64 bloat and files are immediately viewable in your file explorer
- ZIP export: export at any stage (even from the images page, before generating videos). The ZIP contains:
  - `images/selected/scene_NN.jpg`: your chosen image per scene
  - `images/all/scene_NN_vN.jpg`: all 4 generated variants per scene
  - `images/history/`: previously regenerated image versions
  - `videos/scene_NN_v1.mp4`, `scene_NN_selected.mp4`: all generated video versions per scene
  - `videos/history/`: previously regenerated video versions
  - `audio/`, `thumbnail/selected/`, `thumbnail/all/`
  - `project.json`: fully restorable project state (no base64; images are the real files)
- ZIP import: load a ZIP back into the app; extraction runs in a Web Worker so the UI never freezes, and all images, videos, and thumbnails are restored so you continue exactly where you left off
- Load project: the Load button (folder icon) accepts both `.zip` and `.json` files and navigates automatically to the furthest completed step
- Safe load: loading a project never triggers new API requests or charges
- Browser persistence: Zustand state survives page reloads via localStorage
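
The auto-retry behaviour (Gemini 429 rate limits, Replicate interruptions) follows the usual exponential backoff pattern: wait base × 2^attempt between retries, up to a cap. A minimal sketch, assuming errors expose an HTTP `status` field; `withRetry`, `backoffDelay` and the default delays are hypothetical, and production code often adds random jitter as well.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fn, retrying retryable failures with exponential backoff.
// By default only HTTP 429 counts as retryable; pass isRetryable to widen it
// (e.g. to also cover dropped connections or interrupted predictions).
async function withRetry(
  fn,
  { retries = 5, isRetryable = (e) => e?.status === 429, wait = sleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err; // give up
      await wait(backoffDelay(attempt));
    }
  }
}
```
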

Note on video URL expiry: Some video providers (including Replicate) delete generated videos from their servers within a few hours of generation. Once the URL expires, the video is gone. Always export your ZIP, or let the auto-save session capture the video, before closing the tab. Images are always saved as real files and never expire.
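
The auto-save behaviour from the feature list (immediate saves after image batches, completed videos and thumbnails, plus a 60-second fallback timer) can be sketched as a small class. `AutoSaver` and its methods are hypothetical names, and `save` stands in for whatever writes the session to `output/`.

```javascript
// Hypothetical auto-saver: save immediately after key pipeline milestones, and
// let a fallback timer save any other changes that happened in between.
class AutoSaver {
  constructor(save, fallbackMs = 60000) {
    this.save = save; // async function that persists the session
    this.fallbackMs = fallbackMs;
    this.dirty = false;
    this.timer = null;
  }
  // Any state change marks the session dirty for the fallback timer.
  markDirty() {
    this.dirty = true;
  }
  // Image batch done, video completed, thumbnails generated: save right away.
  async onMilestone() {
    this.dirty = false;
    await this.save();
  }
  // Fallback path: only writes if something changed since the last save.
  async flushIfDirty() {
    if (!this.dirty) return false;
    this.dirty = false;
    await this.save();
    return true;
  }
  start() {
    this.timer = setInterval(() => this.flushIfDirty().catch(console.error), this.fallbackMs);
  }
  stop() {
    clearInterval(this.timer);
  }
}
```
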
- Saved in localStorage: keys are auto-loaded into the backend on every session start, so there is no need to re-enter them
- Per-key Clear button: the red Clear button removes a key from localStorage and the backend instantly
- Validate before saving: the Test button checks that each key is valid before storing it
- Upload reference images: upload a male and/or female character reference (JPG, PNG, or WebP, max 10 MB each) on the start page
- Works for any character style: mannequins, realistic humans, anime characters, or anything else; describe the style in the optional Character style text field and the model follows it
- Sent with every scene: reference images are included with every image generation request, so the model preserves the character's proportions, tone, and hair across all scenes
- Scene clothing always overridden: each scene still gets its own era-correct clothing and pose from the scene plan; only the character's appearance is locked in
- Model-aware delivery: Nano Banana Pro (Replicate and fal) and Gemini receive the actual images as multimodal input; all other models receive a text consistency hint instead
- Resets with "Start Fresh": character images and description are cleared when starting a new project
- Advanced System Prompts: an expandable section on the start page with editable textareas for all 7 pipeline stages (story selection, scene planning, image prompts, video prompts, narration script, YouTube metadata, thumbnail prompts)
- Pre-filled with defaults: each textarea shows the prompt currently in use, so you know exactly what to change
- Reset to default: one click restores any stage to its original prompt
- Custom prompts persist: saved to localStorage, they survive page reloads
- ZIP export: available from the images page onwards; no need to complete every step before exporting
- Restorable project.json inside the ZIP: load the ZIP back into the app at any time to continue where you left off
- Multi-select thumbnails: pick one or several thumbnails for export
- Thumbnail lightbox with history: view full size, browse previously regenerated versions with arrows, and select/deselect from the lightbox
- Generate thumbnail without metadata: thumbnail generation works even if the metadata step was skipped
- YouTube metadata: 4 title options, SEO description, tags, and chapter timestamps, all editable before export
- ContentMachine branding: clean dark UI, step indicator in header, GitHub link always visible
- Start Fresh: a red button with a confirmation dialog clears all progress safely
- Example topics: pre-filled story suggestions with year/category tags
- Inline video preview: completed video clips play on hover directly in the card grid
- Video prompt editor: edit the motion prompt for any scene before regenerating

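
Model-aware delivery of character reference images, combined with passing the aspect ratio as a request parameter rather than writing it into the prompt, could look roughly like the sketch below. The model identifiers and the `aspect_ratio` / `reference_images` field names are hypothetical placeholders, not the actual Replicate, fal.ai or Gemini request schemas.

```javascript
// Models that take reference images as multimodal input (per the list above);
// the identifiers are hypothetical, not real provider model slugs.
const MULTIMODAL_MODELS = new Set(["nano-banana-pro", "gemini-3-pro-image-preview"]);

// Build a hypothetical image request for one scene.
function buildImageRequest({ model, scenePrompt, aspectRatio, characterRefs = [], characterStyle = "" }) {
  const request = {
    model,
    prompt: scenePrompt, // the scene plan supplies clothing, pose and setting
    aspect_ratio: aspectRatio, // a parameter, never written into the prompt
  };
  if (characterRefs.length === 0) return request;
  if (MULTIMODAL_MODELS.has(model)) {
    request.reference_images = characterRefs; // the actual images, on every scene
  } else {
    // Text-only models get a consistency hint instead of the images.
    const style = characterStyle ? ` (${characterStyle})` : "";
    request.prompt += ` Keep the same character appearance${style} as in previous scenes; only the clothing changes.`;
  }
  return request;
}
```
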
Feel free to open a PR! Some areas that would benefit most from contributions:
- fal.ai verification: testing and fixing the fal.ai image and video paths end to end
- New video models: adding support for additional Replicate or fal.ai video models
- New image models: expanding the image provider/model list
- Deployment config: Docker, Railway, Render, or Fly.io setup
- Bug fixes & polish: anything you find while using it

Licensed under the Apache License 2.0.

