Build a professional-grade, fully offline video editing application that runs entirely in the web browser, combining the power of modern web APIs (WebCodecs, WebGPU) with AI capabilities (speech recognition, natural language processing) to create an editing experience that rivals desktop applications like DaVinci Resolve and Final Cut Pro.
Enable anyone to edit videos professionally using natural language commands, without installing software, uploading files to servers, or requiring technical expertise.
- All video processing happens locally in the browser
- No file uploads to external servers
- No tracking or data collection
- User maintains complete control over their media
- Frame-accurate timeline editing with multi-track support
- Industry-standard features: transitions, effects, color correction
- Keyboard shortcuts matching professional NLE workflows
- Export to interchange formats (FCPXML, EDL, AAF) for handoff to other tools
- Natural Language Editing: "Cut from 1:30 to 2:45 and add a crossfade"
- Automatic Transcription: Generate word-level transcripts with speaker identification
- Smart Content Analysis: Find scenes, detect silence, identify speakers
- Motion Graphics Generation: Create lower thirds and titles from text descriptions
- Chat-based interface lowers the learning curve for beginners
- Visual preview of every AI command before execution
- Suggested commands guide users through common tasks
- Progressive disclosure: simple for beginners, powerful for experts
- Leverage WebCodecs for hardware-accelerated video decode/encode
- Use WebGPU for real-time effects and compositing
- Lightweight bundle (~75KB core) vs traditional FFmpeg.wasm (~30MB+)
- Proxy workflow for smooth timeline playback
- ✅ No installation required - runs in browser
- ✅ Cross-platform (works on any OS with a modern browser)
- ✅ Natural language editing interface
- ✅ Privacy-first (no cloud uploads)
- ❌ Slightly slower export due to browser constraints
- ❌ Limited codec support (no ProRes/DNxHD)
- ✅ Fully offline after initial model download
- ✅ No subscription required
- ✅ No file size or duration limits
- ✅ Complete privacy (files never leave device)
- ❌ Requires modern hardware (8GB+ RAM)
- ❌ Initial AI model downloads (200-800MB)
- ✅ Professional features (multi-track, effects, color grading)
- ✅ AI transcription and content analysis
- ✅ Frame-accurate editing
- ✅ Industry interchange format export
- ✅ Unlimited undo/redo with transactions
- YouTube/TikTok Editors: Quick cuts, remove silence, auto-captions
- Podcasters: Transcribe episodes, create video versions with waveforms
- Live Streamers: Edit VODs, create highlight reels
- Rough Cut Assembly: Quick edits with chat commands, export FCPXML for final polish
- Transcription Services: Generate accurate transcripts with speaker labels
- QC & Review: Annotate videos with markers and notes
- Students: Learn video editing without expensive software
- Educators: Create lecture videos with auto-generated captions
- Tutorial Creators: Demonstrate editing techniques in-browser
- Marketing Teams: Edit social media content without file uploads
- Internal Comms: Create training videos with privacy compliance
- Remote Teams: Collaborative editing without cloud dependencies
Unlike chatbots that produce unreliable text, this system uses strict JSON schema validation to ensure every AI command is:
- Parseable and executable
- Validated before execution
- Reversible (full undo/redo)
- Reviewable (preview before commit)
All edits work at the frame level, not seconds:
- Timecode-aware (supports drop-frame and non-drop)
- Zero rounding errors
- Professional precision matching desktop NLEs
Every command is wrapped in a transaction:
- Atomic operations (all-or-nothing)
- Full state snapshots for undo
- Labeled undo stack ("Undo: LLM command - Increase brightness")
- Rollback on errors
Not just a text transcript—every word is:
- Linked to video timecode
- Clickable for instant seek
- Editable with frame-accurate timing adjustments
- Exportable as SRT/VTT with speaker labels
Export to formats used by industry pros:
- FCPXML: Import into Final Cut Pro
- EDL: Universal cut list format
- Premiere XML: Import into Adobe Premiere
- AAF: Audio interchange for Pro Tools
- Includes handles, markers, and metadata
- ✅ Edit a 10-minute video start-to-finish using only chat commands
- ✅ Export professional-quality output with subtitles
- ✅ Hand off FCPXML to a professional editor for final polish
- ✅ Complete privacy (never upload files)
- ✅ < 5 second page load (excluding AI models)
- ✅ 30fps timeline playback for 1080p proxies
- ✅ < 100ms command parse time
- ✅ 95%+ command accuracy (validated JSON)
- ✅ Prove browser-based professional editing is viable
- ✅ Demonstrate AI + deterministic systems working together
- ✅ Show privacy-preserving AI applications
- ✅ Create open-source foundation others can build on
- 75KB vs 30MB: Dramatically faster initial load
- Native WebCodecs: Hardware acceleration out of the box
- Long-form support: No memory constraints from WASM
- Modern: Built for 2025+ browsers, not legacy compatibility
- 100% browser-native: No server required
- State-of-art models: Whisper, CLIP, Phi-3
- Progressive loading: Download only what you need
- Active ecosystem: Regular updates and improvements
- Deterministic: LLMs can be unpredictable, validation ensures reliability
- Type safety: Catch errors before execution
- Testability: Validate commands in unit tests
- Documentation: Schema serves as API contract
- ✅ Basic timeline with playback
- ✅ Cut, trim, split operations
- ✅ Simple chat commands
- ✅ MP4 export
- ✅ Multi-track editing
- ✅ Transitions and effects
- ✅ Color correction
- ✅ Audio mixing
- ✅ Whisper transcription
- ✅ Speaker diarization
- ✅ Scene detection
- ✅ Natural language commands
- ✅ FCPXML export
- ✅ EDL export
- ✅ SRT/VTT subtitles
- ✅ Project JSON format
- ⏳ Motion graphics templates
- ⏳ Multicam editing
- ⏳ Advanced color grading
- ⏳ Audio ducking & normalization
- Q: Can we maintain 30fps playback for 4K content?
- A: Use aggressive proxy generation (480p) and OffscreenCanvas optimization
- Q: What if the LLM produces invalid commands?
- A: Strict JSON schema validation + fallback deterministic parser for critical operations
- Q: What about older browsers or mobile?
- A: Graceful degradation: show compatibility warning, suggest Chrome/Edge 94+
- Q: 800MB of models is a lot to download
- A: Progressive loading, user choice (tiny vs accurate), persistent cache
This project needs expertise in:
- WebCodecs/WebGPU: Optimize rendering pipeline
- ML/AI: Improve transcription accuracy, add features
- Video Production: UX feedback from professional editors
- Accessibility: Ensure keyboard navigation, screen readers
- Performance: Profile and optimize bottlenecks
MIT
- Diffusion Studio: For pioneering browser-native video editing
- Transformers.js: For bringing AI models to the browser
- WebCodecs/WebGPU teams: For making this possible
- Professional editors: For feedback on workflow requirements
Built with the belief that professional creative tools should be accessible, private, and run anywhere a browser does.