chat-first AI video editor. drop raw footage, talk to it, get an editable timeline in 60 seconds. desktop-native, powered entirely by gemini's multimodal stack.
20-hour hackathon build. spec lives in docs/DESIGN.md.
- open the tauri app: cd apps/desktop && pnpm tauri dev
- drop samples/sintel-30s.mp4 into the assets panel
- wait ~45s and watch the three pipeline badges go green (transcript + frames + timeline build in parallel, automatically, no button click)
- click the asset → real editable timeline materializes → chat "add a caption at 5 seconds that says the magic" → it just does it
- hit Export MP4 → server-side ffmpeg burns captions in → file lands in your downloads
full recorded-walkthrough script + judging criteria hooks + "do not click X" warnings live in docs/DEMO.md.
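a rough sketch of the "three badges go green in parallel" idea, assuming an asyncio-based backend. the stage names come from the demo steps; `run_stage`, `ingest`, and the status strings are hypothetical placeholders, not the real pipeline code.

```python
import asyncio

# stage names mirror the three pipeline badges in the demo (assumption: one task per badge)
STAGES = ("transcript", "frames", "timeline")


async def run_stage(name: str, asset_path: str, status: dict[str, str]) -> str:
    """Hypothetical stand-in for one pipeline stage."""
    status[name] = "running"
    await asyncio.sleep(0)  # placeholder for the real async work
    status[name] = "green"
    return f"{name}:{asset_path}"


async def ingest(asset_path: str) -> tuple[dict[str, str], list[str]]:
    """Kick off every stage at once as soon as an asset is dropped."""
    status = {name: "pending" for name in STAGES}
    # gather runs the stages concurrently and returns results in STAGES order
    results = await asyncio.gather(
        *(run_stage(name, asset_path, status) for name in STAGES)
    )
    return status, list(results)


if __name__ == "__main__":
    final_status, _ = asyncio.run(ingest("samples/sintel-30s.mp4"))
    print(final_status)
```

the shared `status` dict is what a UI could poll to colour the badges. a real build would likely push updates over a socket instead.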
| if you're doing... | read this first |
|---|---|
| architecture / engineering decisions | docs/DESIGN.md |
| frontend, UI, UX, design generation | docs/PRODUCT-DESIGN.md |
| picking what to build next | docs/PLAN.md (after design sign-off) |
| understanding what we're NOT doing | docs/NOT-DOING.md |
| writing code (standards, setup, commands) | AGENTS.md |
| claude code-specific bits | CLAUDE.md |
tauri 2 rust desktop · python fastapi backend in k8s (kind locally) · seaweedfs in-cluster storage · gemini 2.5/3.x + veo 3.1 + imagen 4 + nano banana + lyria 3 · groq whisper · pexels.
the core concept is the project: the top-level container that everything else hangs off, with one project per piece of work.
each project holds assets. assets are mainly raw video content (the raw video and audio), and we can also attach custom sources for context.
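one way to picture the project / asset relationship, as a minimal sketch. every class and field name here (`Project`, `Asset`, `context_sources`, ...) is an assumption for illustration, not the schema from docs/DESIGN.md.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One piece of raw content inside a project (hypothetical shape)."""
    asset_id: str
    path: str
    kind: str = "video"  # raw video, which carries its own audio


@dataclass
class Project:
    """Top-level container: assets plus optional context sources."""
    project_id: str
    name: str
    assets: list[Asset] = field(default_factory=list)
    context_sources: list[str] = field(default_factory=list)

    def add_asset(self, asset: Asset) -> None:
        # reject duplicate ids so an asset is only tracked once per project
        if any(existing.asset_id == asset.asset_id for existing in self.assets):
            raise ValueError(f"asset {asset.asset_id} already in project")
        self.assets.append(asset)


project = Project(project_id="p1", name="sintel cut")
project.add_asset(Asset(asset_id="a1", path="samples/sintel-30s.mp4"))
project.context_sources.append("brief.md")  # hypothetical custom context source
```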
we have a few features:
- thumbnail generation
- Q&A with the content
- transcript generation
- search
- b-roll generation (or any content generation)
- transition generation (again through veo 3)
- background audio generation (through gemini)
we need to provide some tools to our agents. some ideas so far:
- zoom
- changing color tones
- cropping
- camera stabilization
- cut clip
- remove clip
- move clip
- apply transitions
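a minimal sketch of how the clip-level tools (cut, remove, move) could look as plain functions the agent calls by name. the `Clip` shape, the `TOOLS` registry, and `call_tool` are all assumptions for illustration. the real tool interface is not specified here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    clip_id: str
    start: float     # position on the timeline, in seconds
    duration: float  # length in seconds


def cut_clip(timeline: list[Clip], clip_id: str, at: float) -> list[Clip]:
    """Split one clip in two at timeline time `at` (no-op if `at` is outside it)."""
    out: list[Clip] = []
    for clip in timeline:
        end = clip.start + clip.duration
        if clip.clip_id == clip_id and clip.start < at < end:
            out.append(replace(clip, duration=at - clip.start))
            out.append(Clip(f"{clip_id}-b", at, end - at))
        else:
            out.append(clip)
    return out


def remove_clip(timeline: list[Clip], clip_id: str) -> list[Clip]:
    return [clip for clip in timeline if clip.clip_id != clip_id]


def move_clip(timeline: list[Clip], clip_id: str, new_start: float) -> list[Clip]:
    moved = [replace(c, start=new_start) if c.clip_id == clip_id else c for c in timeline]
    return sorted(moved, key=lambda c: c.start)


# hypothetical registry: the agent picks a tool name and passes keyword arguments
TOOLS = {"cut_clip": cut_clip, "remove_clip": remove_clip, "move_clip": move_clip}


def call_tool(timeline: list[Clip], name: str, **args) -> list[Clip]:
    return TOOLS[name](timeline, **args)
```

each tool returns a new timeline instead of mutating the old one, which would make undo and "show me what the agent changed" cheap to add.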
the chat is the main thing, so it needs to be big. we want an AI-native view made up of three parts: the timeline, the preview, and the chat.
we need to be able to change audio levels, shape the audio, and add sound effects.
text tools need to be ready as well, including things like text alignment.
we have a big feature called auto edit. that is our killer feature.
the timeline needs a retention graph, i.e. a predicted (best-guess) retention curve.
short-form repurposing: turn the content into short-form scenarios.
highlight generation for the intro, for high retention.
AI captions added onto the content itself, including suggestions for what should be added. this is built on top of the transcripts, and we need caption styles for it as well.
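since captions sit on top of transcripts, here is a small sketch that turns transcript segments into SRT text. the segment shape (`start`, `end`, `text` in seconds) is an assumption about what the transcript step returns. styling is out of scope for this sketch.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    # SRT separates cues with a blank line
    return "\n\n".join(blocks) + "\n"


captions = to_srt([{"start": 5.0, "end": 7.5, "text": " the magic "}])
```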
review video functionality, a sort of user-acceptance review by editors.
section labelling for the video, e.g. chapters for youtube.
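for the youtube case, section labels can be rendered as a chapter list for the video description. a minimal sketch, assuming sections arrive as (start_seconds, label) pairs. the function names are hypothetical. youtube expects the first chapter to start at 0:00, which the sketch checks.

```python
def chapter_timestamp(seconds: int) -> str:
    """Format whole seconds as M:SS, or H:MM:SS once past an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_chapters(sections: list[tuple[int, str]]) -> str:
    """Render (start_seconds, label) pairs as one chapter per line."""
    ordered = sorted(sections, key=lambda section: section[0])
    if not ordered or ordered[0][0] != 0:
        raise ValueError("first section must start at 0 seconds")
    return "\n".join(f"{chapter_timestamp(start)} {label}" for start, label in ordered)


chapters = format_chapters([(95, "the heist"), (0, "intro"), (3725, "outro")])
```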
timeline heatmap showing the strong moments, weak moments, etc.
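a tiny sketch of the heatmap side, assuming something upstream already produces one engagement score in [0, 1] per timeline bucket. the thresholds, the score source, and the function names are all made up for illustration.

```python
def label_moments(scores: list[float], strong: float = 0.7, weak: float = 0.3) -> list[str]:
    """Bucket each score into strong / neutral / weak (thresholds are placeholders)."""
    labels = []
    for score in scores:
        if score >= strong:
            labels.append("strong")
        elif score <= weak:
            labels.append("weak")
        else:
            labels.append("neutral")
    return labels


def merge_runs(labels: list[str], bucket_seconds: float = 1.0) -> list[tuple[float, float, str]]:
    """Collapse consecutive identical labels into (start, end, label) spans."""
    spans: list[tuple[float, float, str]] = []
    for index, label in enumerate(labels):
        start = index * bucket_seconds
        end = start + bucket_seconds
        if spans and spans[-1][2] == label:
            spans[-1] = (spans[-1][0], end, label)  # extend the current span
        else:
            spans.append((start, end, label))
    return spans


spans = merge_runs(label_moments([0.9, 0.8, 0.5, 0.1, 0.2]))
```

the merged spans are what the timeline would paint. the same per-bucket scores could feed the retention graph guess.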
auto voice over
# prerequisites (macOS — adapt for linux)
brew install kind kubectl skaffold docker rustup uv pnpm
# spin up local k8s cluster
kind create cluster --name frameos --config infra/kind-config.yaml
kubectl create namespace frameos
# load API keys into a k8s secret
kubectl -n frameos create secret generic api-keys \
  --from-literal=GEMINI_API_KEY="$GEMINI_API_KEY" \
  --from-literal=GROQ_API_KEY="$GROQ_API_KEY" \
  --from-literal=PEXELS_API_KEY="$PEXELS_API_KEY"
# deploy backend + seaweedfs
kubectl apply -f infra/k8s/
# install deps
cd backend && uv sync && cd ..
cd desktop && pnpm install && cd ..
# run (two terminals)
cd infra && skaffold dev # backend live-reload
cd desktop && pnpm tauri dev # desktop app
full setup, day-to-day commands, coding standards, testing, and git workflow are in AGENTS.md.