AI-native video editor in your browser. Tell it what you want in plain English — it edits the video for you.
Fizz is a multi-track timeline editor (React + Flask) powered by Google Gemini. Instead of clicking through menus, you chat with an AI assistant that can arrange clips, generate captions, remove filler words, dub your video into other languages, and auto-switch camera angles in podcasts — all from a single conversation.
- Auto-Captions: Transcribes speech with Whisper and places timed subtitle overlays on the timeline. Captions can be translated into 40+ languages via Google Translate.
- Translated Voice-Over: Dubs your video into another language with synced text-to-speech audio that matches the original pacing.
- Talking-Head Cleanup: Automatically detects and removes silences, filler words ("um", "uh"), false starts, and stutters using FFmpeg silence detection and Gemini content analysis.
- Auto-Switch Cameras: For podcasts recorded with two camera angles, performs speaker diarization to detect who's talking and cuts to the matching camera while keeping a single shared audio track.
- Full Timeline Editing: Drag-and-drop clips, trim, split, reorder, multi-track layering, zoom, and real-time video preview with caption overlays.
- Export: Render to H.265 MP4, or export Premiere Pro / Final Cut Pro XML together with all media files.
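Auto-Captions starts from the timed segments a Whisper transcription produces (faster-whisper yields segment objects with `start`, `end`, and `text` attributes). As an illustration only, here is a minimal sketch that serializes such segments into the standard SRT subtitle format. The `(start, end, text)` tuple shape and both helper names are assumptions, not Fizz's own code.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render timed transcript segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # faster-whisper segments could be adapted with: [(s.start, s.end, s.text) for s in segments]
    print(segments_to_srt([(0.0, 1.5, " Hello there."), (1.5, 3.25, " Welcome back.")]))
```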
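For Talking-Head Cleanup, FFmpeg's `silencedetect` audio filter logs `silence_start` and `silence_end` lines to stderr, for example when run as `ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`. The sketch below parses that log and inverts the silences into the spans worth keeping. The threshold values, the `padding` parameter, and both function names are hypothetical; Fizz's actual cut logic (which also uses Gemini to find filler words) is not shown.

```python
import re
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)

# silencedetect writes lines such as:
#   [silencedetect @ 0x...] silence_start: 12.34
#   [silencedetect @ 0x...] silence_end: 13.9 | silence_duration: 1.56
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(stderr: str, total_duration: float) -> List[Interval]:
    """Collect (start, end) silence intervals from silencedetect log output."""
    silences: List[Interval] = []
    start: Optional[float] = None
    for line in stderr.splitlines():
        m = _START.search(line)
        if m:
            # Clamp in case FFmpeg reports a slightly negative start time.
            start = max(0.0, float(m.group(1)))
            continue
        m = _END.search(line)
        if m and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    if start is not None:  # silence still open when the log ends
        silences.append((start, total_duration))
    return silences


def keep_intervals(silences: List[Interval], total_duration: float,
                   padding: float = 0.1) -> List[Interval]:
    """Invert silences into spans to keep, leaving `padding` seconds around speech."""
    keep: List[Interval] = []
    cursor = 0.0
    for s_start, s_end in silences:
        # Pad only on sides that border speech, not at the very start or end of the file.
        cut_start = s_start + padding if s_start > 0 else 0.0
        cut_end = s_end - padding if s_end < total_duration else total_duration
        if cut_end <= cut_start:
            continue  # too short to trim once padding is applied
        if cut_start > cursor:
            keep.append((cursor, cut_start))
        cursor = cut_end
    if cursor < total_duration:
        keep.append((cursor, total_duration))
    return keep


if __name__ == "__main__":
    log = (
        "[silencedetect @ 0x1] silence_start: 2.0\n"
        "[silencedetect @ 0x1] silence_end: 4.0 | silence_duration: 2.0\n"
    )
    print(keep_intervals(parse_silences(log, 10.0), 10.0, padding=0.25))
    # [(0.0, 2.25), (3.75, 10.0)]
```

The resulting keep list is what a trim step would splice together; padding keeps cuts from clipping the first and last syllables around each pause.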
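Once diarization has labeled who speaks when, Auto-Switch Cameras reduces to a small planning problem. A hypothetical sketch, assuming diarization output as `(speaker, start, end)` turns and a fixed speaker-to-camera mapping: consecutive turns on the same camera merge into one shot, and turns shorter than `min_shot` seconds are absorbed into the current shot to avoid jittery cuts. Only video shots are planned; the shared audio track is left untouched.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, float, float]  # hypothetical: (speaker_label, start_s, end_s)
Shot = Tuple[int, float, float]  # hypothetical: (camera_index, start_s, end_s)


def plan_camera_cuts(turns: List[Turn], speaker_to_camera: Dict[str, int],
                     min_shot: float = 2.0) -> List[Shot]:
    """Turn diarized speaker turns into camera shots, absorbing very short turns."""
    shots: List[Shot] = []
    for speaker, start, end in sorted(turns, key=lambda t: t[1]):
        camera = speaker_to_camera[speaker]
        if shots:
            prev_cam, prev_start, prev_end = shots[-1]
            # Same camera, or a turn too short to justify a cut: extend the current shot.
            if camera == prev_cam or end - start < min_shot:
                shots[-1] = (prev_cam, prev_start, max(prev_end, end))
                continue
            if end <= prev_end:
                continue  # turn fully covered by the current shot (crosstalk)
            # Start the new shot where the previous one ends so shots butt together.
            start = prev_end
        shots.append((camera, start, end))
    return shots


if __name__ == "__main__":
    turns = [("A", 0.0, 5.0), ("B", 5.5, 6.0), ("A", 6.0, 9.0), ("B", 9.0, 15.0)]
    print(plan_camera_cuts(turns, {"A": 0, "B": 1}))
    # [(0, 0.0, 9.0), (1, 9.0, 15.0)]
```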
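The H.265 MP4 render ultimately comes down to an FFmpeg invocation. The flags Fizz passes are not documented here, so the CRF, preset, and AAC audio choices below are assumptions; the sketch only shows a typical libx265 command built as an argument list, which can be handed to `subprocess.run` without shell quoting.

```python
import shlex
from typing import List


def h265_export_command(src: str, dst: str, crf: int = 28, preset: str = "medium") -> List[str]:
    """Build an FFmpeg argument list that encodes `src` to an H.265/HEVC MP4."""
    return [
        "ffmpeg", "-y",               # overwrite dst without prompting
        "-i", src,
        "-c:v", "libx265",            # HEVC encoder (FFmpeg must be built with libx265)
        "-crf", str(crf),             # constant-quality mode; lower means higher quality
        "-preset", preset,            # speed vs. compression trade-off
        "-tag:v", "hvc1",             # lets Apple players recognize HEVC in MP4
        "-c:a", "aac",
        "-movflags", "+faststart",    # put the moov atom first for faster web playback
        dst,
    ]


if __name__ == "__main__":
    print(shlex.join(h265_export_command("timeline.mp4", "export.mp4")))
```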
- You upload media (video, audio, images) into the browser-based editor.
- You tell the AI what to do — e.g. "add Spanish captions", "remove silences", "switch cameras based on who's speaking".
- Gemini picks the right tools via function calling, and the backend executes them (extracting audio, running Whisper, calling Google Translate, generating TTS, building FFmpeg filter graphs) while streaming live progress back to you.
- The timeline updates in real time. Preview the result, make adjustments, and export.
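The tool-calling loop can be pictured as a registry of backend functions keyed by name: when Gemini returns a function call, the backend looks up the name, runs it with the model-supplied arguments, and forwards progress as it goes. In this sketch the call is modeled as a plain `{"name": ..., "args": ...}` dict rather than the Gemini SDK's own types, and `add_captions` / `remove_silences` are hypothetical stubs that only yield progress messages.

```python
from typing import Any, Callable, Dict, Iterator

ToolFn = Callable[..., Iterator[str]]

# Hypothetical registry: tool name -> backend function the model may invoke.
TOOLS: Dict[str, ToolFn] = {}


def tool(func: ToolFn) -> ToolFn:
    """Register a backend function under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def add_captions(language: str = "en") -> Iterator[str]:
    # Stub: real work (audio extraction, Whisper, translation) omitted.
    yield "Extracting audio"
    yield "Transcribing with Whisper"
    if language != "en":
        yield f"Translating captions to {language}"
    yield "Placing caption overlays"


@tool
def remove_silences(min_silence: float = 0.5) -> Iterator[str]:
    # Stub: real FFmpeg work omitted.
    yield f"Detecting silences longer than {min_silence}s"
    yield "Cutting silent spans"


def dispatch(call: Dict[str, Any]) -> Iterator[str]:
    """Run one model-issued function call, yielding progress messages as they happen."""
    name = call.get("name")
    if name not in TOOLS:
        yield f"Unknown tool: {name}"
        return
    yield from TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    for message in dispatch({"name": "add_captions", "args": {"language": "es"}}):
        print(message)
```

Because each tool is a generator, progress reaches the client step by step instead of only after the whole pipeline finishes.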
The AI handles complex multi-step pipelines (transcription → translation → TTS → speed matching → timeline placement) that would take hours to do manually.
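One way to implement the speed-matching step, assuming it is done with FFmpeg's `atempo` audio filter, is to divide the TTS clip length by the original segment length and split that factor into stages, because older FFmpeg releases accept only factors between 0.5 and 2.0 per `atempo` instance. The function name is hypothetical.

```python
from typing import List


def atempo_chain(tts_duration: float, target_duration: float) -> str:
    """Build an FFmpeg atempo filter chain that stretches TTS audio to a target length."""
    if tts_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    # >1 speeds up (TTS is longer than the slot), <1 slows down.
    factor = tts_duration / target_duration
    stages: List[float] = []
    # Keep each stage inside 0.5..2.0, the range older FFmpeg builds require.
    while factor > 2.0:
        stages.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        stages.append(0.5)
        factor /= 0.5
    stages.append(factor)
    return ",".join(f"atempo={s:.6g}" for s in stages)


if __name__ == "__main__":
    print(atempo_chain(10.0, 2.0))  # atempo=2,atempo=2,atempo=1.25
```

For example, a 6-second TTS clip that must fit a 4-second segment yields `atempo=1.5`, which could be applied with `ffmpeg -i tts.mp3 -filter:a atempo=1.5 fitted.mp3`.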
- Node.js 18+ (required by Vite 5)
- Python 3.8+
- FFmpeg installed and on your PATH
- A Google Gemini API key (free keys are available from Google AI Studio)
```bash
# Backend
cd backend
pip install -r requirements.txt
pip install faster-whisper deep-translator gTTS Pillow
echo GOOGLE_API_KEY=your_key_here > .env
python app.py
# → runs on http://localhost:5000

# Frontend (new terminal, from the repository root)
cd frontend
npm install
npm run dev
# → runs on http://localhost:5173
```

That's it. Open http://localhost:5173, upload a video, and start chatting.
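If the app does not start, a quick preflight check can rule out the usual suspects from the prerequisites. This is a hypothetical helper, not part of the repository: it assumes it is run from the repository root and that the key lives in `backend/.env` as written by the `echo` command.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Dict, List


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no quoting or `export` support)."""
    values: Dict[str, str] = {}
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values


def preflight(backend_dir: Path = Path("backend")) -> List[str]:
    """Return a list of setup problems; an empty list means the basics look fine."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    for tool in ("ffmpeg", "ffprobe", "node", "npm"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    env = read_env_file(backend_dir / ".env")
    if not (env.get("GOOGLE_API_KEY") or os.environ.get("GOOGLE_API_KEY")):
        problems.append("GOOGLE_API_KEY missing from backend/.env and the environment")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "All prerequisites found.")
```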
| Layer | Stack |
|---|---|
| Frontend | React 18, Vite 5, vanilla CSS |
| Backend | Python 3, Flask, SSE streaming |
| AI | Google Gemini 2.5 Flash (chat + function calling + audio analysis) |
| Speech-to-Text | faster-whisper (CTranslate2 Whisper) |
| Translation | Google Translate (deep-translator) + Gemini fallback |
| Text-to-Speech | gTTS |
| Video Processing | FFmpeg / ffprobe (CLI) |
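The backend streams progress to the browser over Server-Sent Events. On the wire, each SSE event is a few `field: value` lines (here `event:` and `data:`) terminated by a blank line. A minimal, framework-agnostic sketch of that framing follows; the `progress`/`done` event names and the JSON payload shape are assumptions, not Fizz's actual protocol.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def sse_event(data: Any, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event: optional `event:` line, JSON `data:` line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data: line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_progress(messages: Iterable[str]) -> Iterator[str]:
    """Wrap progress messages as 'progress' events, then emit a final 'done' event."""
    for step, message in enumerate(messages, start=1):
        yield sse_event({"step": step, "message": message}, event="progress")
    yield sse_event({"status": "ok"}, event="done")


if __name__ == "__main__":
    for chunk in stream_progress(["Extracting audio", "Transcribing"]):
        print(chunk, end="")
```

In Flask, a generator like `stream_progress` can be returned as `Response(stream_progress(messages), mimetype="text/event-stream")`, and the browser can consume it with `EventSource`.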
License: MIT
