Dub any video into another language — locally, offline, and for free.
VideoDubber is a local/offline-first desktop app that transcribes a video, translates the transcript, re-voices it with text-to-speech, time-aligns the new audio to the original timing, mixes it back over the (optionally ducked) background, and renders a finished dubbed video — with optional soft, burned-in, or sidecar subtitles.
Everything runs on your machine by default. No cloud account, no API keys, no per-minute billing. Cloud providers are an opt-in future enhancement, never a requirement.
Just want to use VideoDubber? Head to the Releases page and grab the installer for your OS — no Python, Node, or FFmpeg required. Each build is fully self-contained: it bundles the app, the pipeline engine, all three AI workers, and FFmpeg.
| Your machine | File to download | First launch |
|---|---|---|
| Mac — Apple Silicon (M1/M2/M3/M4) | VideoDubber_<ver>_aarch64.dmg |
Double-click to open (signed + notarized). |
| Windows (64-bit) | VideoDubber_<ver>_x64-setup.exe (or _x64_en-US.msi) |
SmartScreen → More info → Run anyway. |
| Linux (64-bit) | VideoDubber_<ver>_amd64.AppImage / .deb |
chmod +x *.AppImage && ./VideoDubber*.AppImage, or sudo dpkg -i *.deb. |
| Mac — Intel | VideoDubber_<ver>_x64.dmg |
Double-click to open (signed + notarized). |
Which Mac do I have? → About This Mac: "Apple M…" = Apple Silicon, "Intel" = Intel.
The macOS builds are signed with an Apple Developer ID and notarized by Apple, so VideoDubber opens with a normal double-click — no security prompt, no Terminal, no right-click:
- Open the
.dmgand drag VideoDubber into your Applications folder. - Open VideoDubber from Applications. That's it.
Building an unsigned fork (no Apple signing secrets configured)? macOS will quarantine it; the one-time unlock and the full signing/notarization setup are documented in
docs/APPLE_SIGNING.md.
Not every platform is necessarily attached to a given release — builds are added as they're ready, so check the Releases page for the installers currently available.
First launch runs a one-time wizard that downloads the AI models for the languages you choose (the only thing not in the installer); after that the app works fully offline. To update, download a newer release from the Releases page — in-app auto-update arrives in a later version (then toggleable in Settings → Updates).
Bundle internals:
docs/PRODUCTION.md· auto-update design:docs/AUTOUPDATE.md.
Cutting a release locally? Releases are built on the maintainer's own Mac + Windows (CI is opt-in per OS). The short version:
pnpm package:sidecars && pnpm app:build # self-contained installer for this OS
# macOS — one command (build + deep-sign + notarize + upload):
SIDECARS=1 UPLOAD=1 bash scripts/package/release-macos.sh
# other OSes — upload to the shared draft release:
bash scripts/package/release-upload.sh upload <artifact> # Windows: release-upload.ps1Full runbook (per-OS steps, signing, opt-in CI): docs/RELEASING.md;
macOS deep-sign rationale + troubleshooting: docs/APPLE_SIGNING.md.
The rest of this README covers developing from source.
Commercial dubbing services and cloud STT/MT/TTS APIs charge per minute of audio and per character translated. For long videos, batches, or iterative editing, that adds up fast — and your media leaves your machine.
VideoDubber flips that model:
- $0 marginal cost. Local engines (faster-whisper, Argos Translate, Piper) run on your CPU/GPU. Dub as much as you want.
- Private by default. Your video, audio, and transcripts never leave your computer unless you explicitly enable a cloud provider.
- Offline-capable. Once models are downloaded, no network is required.
- Cloud is optional. A future
cloud-enhancedmode lets you opt specific steps into higher-quality cloud providers (per-key, per-step), but the default is always local. Seedocs/PROVIDERS.md.
- 8-step dubbing pipeline: probe → extract-audio → STT → translation → TTS → alignment → audio-mix → render.
- Local speech-to-text via faster-whisper (word timestamps, language auto-detect).
- Local machine translation via Argos Translate (offline neural MT).
- Local text-to-speech via Piper, with graceful fallbacks to system TTS
(macOS
say, Linuxespeak-ng) and a dev silent/sine generator so the pipeline always completes. - Optional higher-quality Vietnamese neural voice (VieNeu‑TTS v3‑Turbo) as a downloadable engine pack — see the VieNeu setup guide.
- Smart time alignment: stretches/compresses TTS within configurable speed/overflow limits and flags segments that need review.
- Audio mixing with optional original background audio, ducking, and TTS gain.
- Subtitles: none,
.srtsidecar,.vttsidecar, embedded soft subtitles, or burned-in (with style controls). - Resumable pipeline: steps are skipped if their output artifacts already exist; retry a single step to re-run it and everything downstream.
- Editable transcript: review and correct translated segments, re-synthesize a single segment without re-running the whole job.
- Dual mode: run the Angular UI in a plain browser (no Rust needed), or build the full Tauri 2 native desktop app.
- Live progress over Server-Sent Events (SSE).
┌──────────────────────────────────────────┐
│ videodubber-desktop │
│ Angular 18 UI ──(SSE + HTTP)──┐ │
│ in a browser OR in Tauri 2 │ │
└─────────────┬───────────────────┘ │
│ HTTP / SSE │
▼ │
┌──────────────────────────────────────────┐ │
│ @videodubber/node-orchestrator :5100 │ │ Tauri commands
│ resumable pipeline · provider registry · │◄┘ (reqwest proxy)
│ workspace store · SSE events │
└───┬─────────┬─────────┬──────────┬────────┘
│ │ │ │
FFmpeg (argv) │ HTTP │ HTTP │ HTTP │ (in-process)
▼ ▼ ▼ ▼
┌───────────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ media-worker │ │ STT │ │ Transl │ │ TTS │
│ FFmpeg/ffprobe│ │ :5101 │ │ :5102 │ │ :5103 │
│ (Node TS) │ │whisper │ │ Argos │ │ Piper │
└───────────────┘ └────────┘ └────────┘ └────────┘
@videodubber/shared— TypeScript types + subtitle/language/pipeline utilities, imported by every TS component.@videodubber/media-worker— Node FFmpeg/ffprobe wrapper implementingMediaService(probe, extract audio, render). Used in-process by the orchestrator.@videodubber/node-orchestrator(port 5100) — the brain. HTTP engine that drives the pipeline, talks to the three Python workers, manages per-project workspaces, and streams progress via SSE.- Python workers (FastAPI + uvicorn): STT 5101, Translation 5102, TTS 5103.
videodubber-desktop— Angular 18 standalone UI inside a Tauri 2 shell.
Full details, the 8-step flow, and the data model are in
docs/ARCHITECTURE.md.
| Requirement | Version / notes |
|---|---|
| OS | macOS 12+, Windows 10/11, or a modern Linux. (Validated end-to-end on macOS arm64.) |
| Node.js | 20.11+ (LTS). Provides global fetch and ES2022. |
| pnpm | 9.x. Enable via corepack enable or npm i -g pnpm. |
| Python | 3.11–3.13 (3.13 verified working — faster-whisper/ctranslate2, argostranslate, Piper all have wheels). 3.10 works. brew install python@3.13 (or @3.12). The project uses a per-project interpreter (a .venv or PYTHON_PATH), so your system python3 version doesn't matter — see switching Python. |
| FFmpeg + ffprobe | Required at run time for probe / extract / mix / render. Install per OS — see docs/LOCAL_SETUP.md. For burned-in subtitles you need an FFmpeg built with libass (the subtitles filter). macOS Homebrew's default ffmpeg omits it — use brew install ffmpeg-full and set FFMPEG_PATH/FFPROBE_PATH. The other subtitle modes (soft / sidecar) work with any FFmpeg. |
| Disk | ~1–2 GB for Node deps + Python venvs + models (a small/base Whisper model, one Argos pair, one Piper voice). |
| Rust (optional) | Only needed to build/run the native Tauri desktop app (pnpm app). Install via rustup. The browser dev mode does not need Rust. |
Local models (downloaded by the setup script): a faster-whisper model (default
small; base is a fast CPU choice), an Argos language package (default en → vi),
and optionally a Piper voice for high-quality TTS. See
docs/MODEL_SETUP.md.
# 0. Clone, then enable pnpm
corepack enable
# 1. Install all TypeScript/Node workspace deps
pnpm install
# 2. One-time local setup: create Python venvs, install worker deps,
# pre-cache the whisper model, install an Argos pair, fetch a Piper voice.
# (Network step. Everything is individually skippable — see the script header.)
bash scripts/setup-local-models.sh # Windows: pwsh scripts/setup-local-models.ps1
# 3. Verify your environment (Node, pnpm, Python, ffmpeg, workers, models)
pnpm verify
# 4. Run EVERYTHING with ONE command, then open http://localhost:1420
pnpm dev # foreground (Ctrl-C stops everything)
# ...or detached, so your terminal returns:
pnpm start # start the whole stack in the background
pnpm stop # stop the whole stack (single command)Then open http://localhost:1420 in your browser, pick a video, choose source/target languages, and run the pipeline.
Prefer a native desktop window? Run
pnpm app(needs Rust) — it opens the app and auto-starts/stops all backend services for you. See Running the app below.
First run is slower if models still need to download. Re-runs are fully offline.
VideoDubber's UI is plain Angular 18 talking to the orchestrator over HTTP/SSE, so you can run it two ways. Either way, "everything" = the 3 Python workers + the Node orchestrator + the UI.
pnpm app # needs Rust (rustup). Opens the VideoDubber window.This is the intended end-user experience. The Tauri 2 shell:
- On open: automatically starts the backend (orchestrator + STT/translation/TTS
workers) — you do not run
pnpm dev. - On quit: automatically stops all of them. Close the window → everything shuts down.
It also adds real native commands (pick_video_file, "open output folder", …) that proxy
to the orchestrator. Auto-management is controlled by VIDEODUBBER_MANAGE_SERVICES
(default on; set to 0 if you'd rather run the backend yourself). Implementation:
apps/desktop/src-tauri/src/sidecar.rs. See
docs/DESKTOP_APP.md for the simple install & use guide.
pnpm dev # foreground: workers + orchestrator + Angular UI. Ctrl-C stops all.ng serve hosts the UI on http://localhost:1420; it calls the orchestrator at
http://127.0.0.1:5100 and subscribes to SSE for progress. Native-only conveniences
degrade gracefully outside Tauri. No Rust toolchain needed — great for development.
| Goal | Command |
|---|---|
| Start everything, foreground (Ctrl-C to stop) | pnpm dev |
| Start everything, detached (terminal returns) | pnpm start |
| Stop everything (any start method) | pnpm stop |
| Open the native desktop app (auto start/stop) | pnpm app |
| Backend only (no UI), foreground | pnpm services |
pnpm stopis port-based — it reliably stops the whole stack (UI 1420, orchestrator 5100, workers 5101–5103) however it was started.- Windows: use the PowerShell equivalents —
pwsh scripts/start.ps1,pwsh scripts/stop.ps1,pwsh scripts/dev.ps1. - Put machine-specific paths in a
.env(copy from.env.example) —FFMPEG_PATH,PYTHON_PATH,PIPER_*, ports, etc. The start scripts load it automatically.
Building a release installer (
pnpm app:build) additionally needs app icons (pnpm tauri icon …) and, for a fully standalone installer, bundling the Python workers — seedocs/DESKTOP_APP.mdanddocs/ROADMAP.md.
| Command | What it does |
|---|---|
pnpm install |
Install all TS/Node workspace dependencies. |
bash scripts/setup-local-models.sh |
Create Python venvs + install worker deps + download models. (.ps1 on Windows.) |
pnpm verify |
Run scripts/verify-environment.ts: checks Node/pnpm/Python/ffmpeg/workers/models. |
pnpm dev |
Start the full stack (3 workers + orchestrator + Angular UI), foreground. |
pnpm start |
Start the full stack detached (background); terminal returns. |
pnpm stop |
Stop everything (port-based; works for any start method). |
pnpm app |
Open the native desktop app (Tauri; auto starts/stops services). Needs Rust. |
pnpm app:build |
Build a native desktop installer/bundle (needs Rust + app icons). |
pnpm services |
Start only the backend (workers + orchestrator), no UI. |
pnpm dev:workers |
Start only the 3 Python workers (5101/5102/5103). |
pnpm dev:orchestrator |
Start only the Node orchestrator (5100). |
pnpm dev:desktop |
Start only the Angular UI (ng serve, port 1420). |
pnpm build |
Build the TS packages + media-worker. |
pnpm typecheck |
Type-check every workspace package. |
pnpm test |
Run unit tests (shared utils, media-worker, orchestrator). |
pnpm lint |
ESLint over the TypeScript sources. |
Environment overrides for
scripts/dev.sh:SKIP_WORKERS=1,SKIP_UI=1. Forscripts/setup-local-models.sh:SKIP_VENVS=1,SKIP_MODELS=1,SKIP_WHISPER=1,SKIP_ARGOS=1,SKIP_PIPER=1, plusFASTER_WHISPER_MODEL,ARGOS_FROM/ARGOS_TO,PIPER_VOICE.
Copy .env.example to .env and adjust ports, binary paths, and model settings as
needed. All values have sensible defaults; the app runs fully offline with none set.
Key environment variables (see .env.example for the full list):
| Variable | Default | Purpose |
|---|---|---|
ORCHESTRATOR_URL |
http://127.0.0.1:5100 |
Node orchestrator HTTP engine. |
STT_WORKER_URL |
http://127.0.0.1:5101 |
faster-whisper STT worker. |
TRANSLATION_WORKER_URL |
http://127.0.0.1:5102 |
Argos Translate worker. |
TTS_WORKER_URL |
http://127.0.0.1:5103 |
Piper/system/fallback TTS worker. |
VIDEODUBBER_PROJECTS_DIR |
~/VideoDubber/projects |
Per-project workspaces. |
FFMPEG_PATH / FFPROBE_PATH |
PATH lookup | FFmpeg binaries. |
PYTHON_PATH |
python3 |
Interpreter for the workers. |
FASTER_WHISPER_MODEL |
small |
Whisper model size. |
PIPER_BINARY_PATH / PIPER_VOICE_MODEL_PATH |
(unset) | Enable the Piper TTS engine. |
Optional cloud keys (OPENAI_API_KEY, DEEPL_API_KEY,
GOOGLE_APPLICATION_CREDENTIALS, AZURE_SPEECH_KEY/AZURE_SPEECH_REGION,
ELEVENLABS_API_KEY) are only used by the future cloud-enhanced mode and are never
required. See docs/PROVIDERS.md.
| Doc | Contents |
|---|---|
docs/DESKTOP_APP.md |
Simple install & use guide for the desktop app + auto start/stop of services. |
docs/PRODUCTION.md |
The fully self-contained installer: what's bundled vs. downloaded on first run, the first-run wizard, prod sidecar lifecycle, storage & sizes. |
docs/RELEASING.md |
Release runbook: cut a release locally (pnpm package:sidecars + pnpm app:build → release-upload.{sh,ps1}); macOS deep-sign + notarize via release-macos.sh; updater keys; opt-in per-OS CI. |
docs/AUTOUPDATE.md |
How auto-update works (endpoint, pubkey, signature verification), the auto/manual setting, manual checks, rollback. |
docs/ARCHITECTURE.md |
Components, pipeline flow, data model, workspace layout, HTTP API, Tauri commands, SSE model, service lifecycle. |
docs/LOCAL_SETUP.md |
Node/pnpm/Python/FFmpeg/Rust setup; running, starting & stopping each service. |
docs/MODEL_SETUP.md |
Whisper / Argos / Piper models: download, storage, troubleshooting. |
docs/PROVIDERS.md |
Local defaults, provider interfaces, optional cloud adapters + data flow. |
docs/TROUBLESHOOTING.md |
Every error code, common failures, and fixes. |
docs/ROADMAP.md |
Planned features (diarization, source separation, voice cloning, packaging…). |
- TTS quality depends on the chosen Piper voice; without a Piper binary/voice the worker falls back to system TTS or a silent/sine placeholder.
- No speaker diarization yet — all segments use a single voice.
- No source separation (music/voice) yet; ducking is a volume reduction, not stem isolation.
- Argos language coverage and quality vary by pair; some pairs are not available.
- Alignment uses time-stretching within limits; very dense speech may overflow and get flagged for review.
- Voice cloning is intentionally excluded (see disclaimer + roadmap).
See docs/ROADMAP.md for what's planned.
VideoDubber is a tool. You are responsible for how you use it.
- Only dub videos you own or have explicit permission to process. Respect copyright, licensing, and platform terms of service.
- Translations and synthetic voices can be inaccurate. Review output before publishing, especially for sensitive or factual content.
- Voice cloning is not included. VideoDubber uses generic TTS voices. Any future
voice-cloning capability would require explicit, documented consent from the
person whose voice is involved and a legal review (see
docs/ROADMAP.md). Do not use this software to impersonate anyone. - No warranty. Provided "as is" under the MIT License (see
LICENSE).
The project jianchang512/stt (GPL-3.0) was
studied as a reference only while designing the local STT/dubbing concept. No
GPL-licensed code was copied; VideoDubber is original work and does not depend on that
project at run time. Full statement in NOTICE.md and
docs/ARCHITECTURE.md.