Fizz

AI-native video editor in your browser. Tell it what you want in plain English — it edits the video for you.

Fizz is a multi-track timeline editor (React + Flask) powered by Google Gemini. Instead of clicking through menus, you chat with an AI assistant that can arrange clips, generate captions, remove filler words, dub your video into other languages, and auto-switch camera angles in podcasts — all from a single conversation.

What It Can Do

Auto-Captions — Transcribes speech with Whisper and places timed subtitle overlays on the timeline. Supports 40+ languages via Google Translate.
Translated Voice-Over — Dubs your video into another language with synced text-to-speech audio that matches original pacing.
Talking-Head Cleanup — Detects and removes silences, filler words ("um", "uh"), false starts, and stutters automatically using FFmpeg silence detection + Gemini content analysis.
Auto-Switch Cameras — For podcasts with two camera angles: performs speaker diarization to detect who's talking and cuts between cameras accordingly, with a shared audio track.
Full Timeline Editing — Drag-and-drop clips, trim, split, reorder, multi-track layering, zoom, and real-time video preview with caption overlays.
Export — Render to H.265 MP4 or export as Premiere Pro / Final Cut Pro XML with all media files.
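As an illustration of the silence-detection half of Talking-Head Cleanup, the sketch below parses the log lines that FFmpeg's `silencedetect` filter writes to stderr (for example from `ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`) and inverts them into the spans worth keeping. This is not taken from the Fizz codebase: the function names, the sample log, the threshold values, and the `padding` parameter are all assumptions chosen for the example.

```python
import re

# Sample stderr lines in the format FFmpeg's silencedetect filter logs.
# The timestamps are made up for this example.
SAMPLE_LOG = """\
[silencedetect @ 0x55d] silence_start: 2.5
[silencedetect @ 0x55d] silence_end: 4.0 | silence_duration: 1.5
[silencedetect @ 0x55d] silence_start: 9.25
[silencedetect @ 0x55d] silence_end: 10.0 | silence_duration: 0.75
"""

START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log):
    """Return (start, end) pairs for each silence reported in the log."""
    silences = []
    current_start = None
    for line in log.splitlines():
        match = START_RE.search(line)
        if match:
            current_start = float(match.group(1))
            continue
        match = END_RE.search(line)
        if match and current_start is not None:
            silences.append((current_start, float(match.group(1))))
            current_start = None
    return silences


def keep_segments(silences, duration, padding=0.1):
    """Invert silences into spans to keep.

    `padding` seconds of each silence are preserved on both sides so that
    cuts do not clip the start or end of a word (hypothetical default).
    """
    segments = []
    cursor = 0.0
    for start, end in sorted(silences):
        cut_start = start + padding
        cut_end = end - padding
        if cut_end <= cut_start:
            continue  # silence is too short to cut once padded
        if cut_start > cursor:
            segments.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments
```

Each resulting `(start, end)` pair could then become one trimmed clip on the timeline. Filler words and false starts need the transcript-level analysis the feature list attributes to Gemini, which this sketch does not cover.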

---

How It Works

1. You upload media (video, audio, images) into the browser-based editor.
2. You tell the AI what to do, e.g. "add Spanish captions", "remove silences", "switch cameras based on who's speaking".
3. Gemini picks the right tools via function calling, and the backend executes them (extracting audio, running Whisper, calling Google Translate, generating TTS, building FFmpeg filter graphs) while streaming live progress back to you.
4. The timeline updates in real time. Preview the result, make adjustments, and export.
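The function-calling and progress-streaming steps can be sketched as a small dispatcher: the model names a tool and its arguments, the backend looks the tool up in a registry, and each progress update is serialized as a Server-Sent Events frame. The tool names, arguments, and payload fields below are hypothetical and are not Fizz's actual tool set; the Gemini call itself is left out, so `run_function_call` simply receives the name and arguments the model would return.

```python
import json

# Hypothetical tool registry: names and signatures are illustrative only.
TOOLS = {}


def tool(fn):
    """Register a generator function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add_captions(language="en"):
    # A real tool would run transcription here; this one only reports progress.
    yield {"step": "transcribe", "progress": 0.5}
    yield {"step": "place_captions", "progress": 1.0, "language": language}


@tool
def remove_silences(min_silence=0.5):
    yield {"step": "detect_silence", "progress": 0.5}
    yield {"step": "cut", "progress": 1.0, "min_silence": min_silence}


def sse_event(payload):
    """Serialize one payload as a Server-Sent Events frame."""
    return f"data: {json.dumps(payload)}\n\n"


def run_function_call(name, args):
    """Dispatch one model-requested call and stream its progress as SSE frames."""
    fn = TOOLS.get(name)
    if fn is None:
        yield sse_event({"error": f"unknown tool: {name}"})
        return
    for update in fn(**args):
        yield sse_event({"tool": name, **update})
    yield sse_event({"tool": name, "done": True})
```

In a Flask route, a generator like this could be returned as `Response(run_function_call(name, args), mimetype="text/event-stream")`, which is one common way to serve SSE from Flask.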

The AI handles complex multi-step pipelines (transcription → translation → TTS → speed matching → timeline placement) that would take hours to do manually.
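The speed-matching step in that pipeline can be illustrated with FFmpeg's `atempo` audio filter, which changes tempo without changing pitch. The README does not say how Fizz computes the adjustment, so treat this as one plausible approach: the function name and the choice to keep every stage between 0.5 and 2.0 (a range all FFmpeg versions accept) are assumptions.

```python
def atempo_chain(tts_duration, target_duration, lo=0.5, hi=2.0):
    """Build an FFmpeg atempo filter string that fits TTS audio into target_duration.

    Factors outside [lo, hi] are split across several chained atempo stages.
    """
    if tts_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    factor = tts_duration / target_duration  # >1 speeds up, <1 slows down
    steps = []
    while factor > hi:
        steps.append(hi)
        factor /= hi
    while factor < lo:
        steps.append(lo)
        factor /= lo
    steps.append(factor)
    return ",".join(f"atempo={step:.4f}" for step in steps)
```

For a 6-second TTS clip that must fit a 4-second caption slot, `atempo_chain(6.0, 4.0)` returns `atempo=1.5000`, which could be passed to FFmpeg with `-filter:a`. Large speed-ups tend to sound unnatural, so a real pipeline might also cap the factor or shorten the translated text.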

Quick Start

Prerequisites

Node.js v16+
Python 3.8+
FFmpeg installed and on your PATH
A Google Gemini API key (free to obtain from Google AI Studio)

Setup

# Backend
cd backend
pip install -r requirements.txt
pip install faster-whisper deep-translator gTTS Pillow
echo GOOGLE_API_KEY=your_key_here > .env
python app.py
# → runs on http://localhost:5000

# Frontend (new terminal)
cd frontend
npm install
npm run dev
# → opens at http://localhost:5173

That's it. Open http://localhost:5173, upload a video, and start chatting.

Tech Stack

| Layer | Stack |
| --- | --- |
| Frontend | React 18, Vite 5, vanilla CSS |
| Backend | Python 3, Flask, SSE streaming |
| AI | Google Gemini 2.5 Flash (chat + function calling + audio analysis) |
| Speech-to-Text | faster-whisper (CTranslate2 Whisper) |
| Translation | Google Translate (deep-translator) + Gemini fallback |
| Text-to-Speech | gTTS |
| Video Processing | FFmpeg / ffprobe (CLI) |

License

MIT
